How to Debug Regular Expressions Step by Step
After years of staring at regex patterns that should work but don't, I developed a systematic debugging approach. Here's my step-by-step method for finding and fixing regex bugs.
This is the regex cheat sheet I keep bookmarked. After years of writing patterns at Šikulovi s.r.o., I have compiled the syntax I actually use daily, plus the gotchas that used to trip me up.
I will be honest: I spent years being intimidated by regular expressions. They looked like line noise. But once I started using them daily for log parsing and data validation at Šikulovi s.r.o., something clicked. Now I reach for regex instinctively whenever I need to find or transform text patterns.
This is the cheat sheet I wish I had when starting out. I have organized it the way I actually think about regex - starting with the basics I use constantly, then building up to the patterns that took me longer to internalize.
Let me start with the fundamentals. Most characters match themselves literally, but some have special meaning that I had to memorize. These special characters trip up beginners constantly.
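Here is a minimal TypeScript sketch of the difference. Plain letters match themselves. The dot is one of the special characters, along with * + ? ^ $ ( ) [ ] { } | and \. By default it matches any single character except a line break.

```typescript
// Plain letters match themselves literally.
const literal = /cat/;
// "." is special: it matches any single character except a line terminator.
const wildcard = /c.t/;

console.log(literal.test("concatenate")); // true: "cat" appears inside the word
console.log(literal.test("cot"));         // false: no literal "cat"
console.log(wildcard.test("cot"));        // true: "." matched "o"
console.log(wildcard.test("c\nt"));       // false: "." does not match a newline by default
```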
Quantifiers are where regex gets really powerful. They tell the engine how many times to match the preceding character or group. I use * (zero or more) and + (one or more) constantly, but the lazy versions (*? and +?) have saved me countless debugging sessions.
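A small TypeScript sketch of the common quantifiers. The test strings are arbitrary. Anchors are used so that each pattern must describe the whole string.

```typescript
// Each quantifier applies to the token immediately before it.
const star = /^ab*c$/;        // "b" zero or more times
const plus = /^ab+c$/;        // "b" one or more times
const optional = /^colou?r$/; // "u" zero or one time
const range = /^\d{2,4}$/;    // between two and four digits

console.log(star.test("ac"));         // true
console.log(plus.test("ac"));         // false: + needs at least one "b"
console.log(plus.test("abbbc"));      // true
console.log(optional.test("color"));  // true
console.log(optional.test("colour")); // true
console.log(range.test("12345"));     // false: five digits is too many
```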
Anchors bit me constantly when I was learning. They match positions, not characters. Forgetting anchors means matching the wrong part of the string. Adding unnecessary anchors means failing to match valid input.
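A minimal TypeScript illustration of both failure modes. Without anchors, the pattern happily matches part of the input. With ^ and $, the whole string has to fit.

```typescript
const unanchored = /\d{3}/;
const anchored = /^\d{3}$/;

console.log(unanchored.test("abc1234")); // true: finds "123" in the middle
console.log(anchored.test("abc1234"));   // false: the whole string must be exactly 3 digits
console.log(anchored.test("123"));       // true
```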
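These shorthands are the ones I use most: \d for digits, \w for word characters (letters, digits, and underscore), and \s for whitespace. Their uppercase versions match the opposite: \D matches any non-digit, \W any non-word character, and \S any non-whitespace character. I probably type \d fifty times a day.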
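A quick TypeScript sketch of the shorthands on a made-up line of text. The last call shows an uppercase class (\D) at work.

```typescript
const line = "Order 42 shipped\tto Bob";

console.log(line.match(/\d+/g));        // ["42"]
console.log(line.match(/\w+/g));        // ["Order", "42", "shipped", "to", "Bob"]
console.log(line.replace(/\s+/g, " ")); // "Order 42 shipped to Bob" (the tab is collapsed)
console.log(line.replace(/\D/g, ""));   // "42": \D removes every non-digit
```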
Groups let me extract parts of a match, which I need constantly for parsing. Named groups (?<name>...) are a game changer for readability - I switched to them a few years ago and never looked back.
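A minimal TypeScript sketch of named groups used for parsing. The log format and the `parseLogLine` helper are hypothetical, made up for illustration. Named groups require an ES2018+ runtime.

```typescript
// Hypothetical log format: "<date> <LEVEL> <message>".
const LOG_LINE = /^(?<date>\d{4}-\d{2}-\d{2}) (?<level>[A-Z]+) (?<message>.*)$/;

function parseLogLine(line: string): { date: string; level: string; message: string } | null {
  const match = line.match(LOG_LINE);
  if (!match?.groups) return null;
  // Named groups read much better than match[1], match[2], match[3].
  const { date, level, message } = match.groups;
  return { date, level, message };
}

console.log(parseLogLine("2024-03-15 ERROR Disk full"));
// { date: "2024-03-15", level: "ERROR", message: "Disk full" }
console.log(parseLogLine("not a log line")); // null
```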
Flags change how the whole pattern behaves. The i flag makes matching case-insensitive; I forget it constantly and then wonder why my pattern does not match. The g flag is essential for finding all matches, not just the first.
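A short TypeScript sketch of how i and g change the result of `String.prototype.match()`. The sample text is made up.

```typescript
const messages = "Error: disk full. error: retrying. ERROR: giving up.";

console.log(messages.match(/error/g));      // ["error"]: case-sensitive, lowercase only
console.log(messages.match(/error/gi));     // ["Error", "error", "ERROR"]
console.log(messages.match(/error/i)?.[0]); // "Error": without g, only the first match
```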
Email validation is where I learned that perfect regex does not exist. The RFC allows crazy stuff like quoted strings and comments. This pattern covers the vast majority of real-world email addresses:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
It is not perfect (nothing is for email), but it catches obvious typos without rejecting valid addresses. For production, I honestly just send a verification email - that is the only real validation.
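A minimal TypeScript sketch that wraps the pattern above in a check. The helper name `looksLikeEmail` is hypothetical, and trimming whitespace first is an assumption. It is only a typo filter; the verification email still does the real work.

```typescript
const EMAIL = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: trims surrounding whitespace, then applies the pattern.
function looksLikeEmail(input: string): boolean {
  return EMAIL.test(input.trim());
}

console.log(looksLikeEmail("jane.doe+news@example.co.uk")); // true
console.log(looksLikeEmail(" jane@example.com "));          // true after trimming
console.log(looksLikeEmail("jane@example"));                // false: no top-level domain
console.log(looksLikeEmail("jane doe@example.com"));        // false: space in the local part
```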
URL patterns are tricky because URLs can contain so many characters. This is the pattern I use for basic extraction:
https?:\/\/[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=%]*
For strict validation, I prefer using the URL constructor in JavaScript - try/catch tells me if it is valid. But for finding URLs in text, regex works great.
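A minimal TypeScript sketch that combines both ideas: the extraction pattern above (with a g flag added so `matchAll` can find every URL) and the `URL` constructor inside try/catch for validation. The helper names are hypothetical.

```typescript
// The extraction pattern from above, plus the g flag required by matchAll().
const URL_IN_TEXT = /https?:\/\/[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=%]*/g;

// Hypothetical helper: every URL-looking substring in a block of text.
function extractUrls(text: string): string[] {
  return Array.from(text.matchAll(URL_IN_TEXT), (m) => m[0]);
}

// Hypothetical helper: the URL constructor throws a TypeError on unparseable input.
function isValidUrl(candidate: string): boolean {
  try {
    new URL(candidate);
    return true;
  } catch {
    return false;
  }
}

console.log(extractUrls("Docs at https://example.com/guide?x=1 and http://test.org/a today"));
// ["https://example.com/guide?x=1", "http://test.org/a"]
console.log(isValidUrl("https://example.com/guide?x=1")); // true
console.log(isValidUrl("not a url"));                     // false
```

Two caveats apply. The URL constructor accepts any scheme (mailto:, javascript:, and so on), so check `new URL(candidate).protocol` if only http and https are allowed. The extraction class also contains `.` and `,`, so a URL at the end of a sentence can pick up the trailing punctuation.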
Phone numbers are a nightmare because every country does them differently. I usually normalize to digits only before validation, but here are patterns for when you need to accept formatted input:
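A minimal TypeScript sketch of the normalize-first approach. The 10-digit rule is a hypothetical example of a US-style national number, not a universal check; adapt it per country. Both helper names are made up.

```typescript
// Strip everything that is not a digit: spaces, dashes, parentheses, dots, "+".
function normalizePhone(raw: string): string {
  return raw.replace(/\D/g, "");
}

// Hypothetical rule for illustration: exactly 10 digits after normalization.
function isTenDigitPhone(raw: string): boolean {
  return /^\d{10}$/.test(normalizePhone(raw));
}

console.log(normalizePhone("(555) 123-4567"));   // "5551234567"
console.log(isTenDigitPhone("555.123.4567"));    // true
console.log(isTenDigitPhone("+1 555 123 4567")); // false: the country code makes it 11 digits
```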
I parse dates from logs constantly. These patterns handle the most common formats. Note that they do not validate that the date is real (like Feb 30) - that is better done after extraction with a proper date library.
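A minimal TypeScript sketch of extract-then-validate. It assumes ISO-style YYYY-MM-DD dates; the pattern and the `extractRealDates` helper are illustrative, not taken from the original list. The built-in `Date` stands in for "a proper date library": `Date.UTC` rolls impossible dates over (Feb 30 becomes early March), so comparing the components back rejects them.

```typescript
// Assumed log format: ISO-style YYYY-MM-DD dates anywhere in the line.
const ISO_DATE = /\b(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\b/g;

// Hypothetical helper: returns only the dates that exist on the calendar.
function extractRealDates(line: string): string[] {
  const results: string[] = [];
  for (const m of line.matchAll(ISO_DATE)) {
    const { year, month, day } = m.groups!;
    const y = Number(year);
    const mo = Number(month);
    const d = Number(day);
    const date = new Date(Date.UTC(y, mo - 1, d));
    // If the components changed, Date rolled an invalid date over, so reject it.
    if (date.getUTCFullYear() === y && date.getUTCMonth() === mo - 1 && date.getUTCDate() === d) {
      results.push(m[0]);
    }
  }
  return results;
}

console.log(extractRealDates("2024-02-29 ok, 2023-02-29 bad, 2024-02-30 bad"));
// ["2024-02-29"]: 2024 is a leap year, 2023 is not, and Feb 30 never exists
```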
Password validation is a great example of using multiple lookaheads. Each (?=...) asserts a different requirement without consuming characters:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
This enforces 8+ characters with lowercase, uppercase, digit, and special character. I adjust based on the project's security requirements; sometimes simpler is better for user experience.
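A quick TypeScript check of the pattern above against a few made-up candidates. Note the last one: the lookaheads only check for required characters, while the final character class also restricts which characters are allowed at all.

```typescript
const STRONG_PASSWORD = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

const candidates = ["Passw0rd!", "password", "Passw0rd", "Pa$1", "Passw0rd!#"];
for (const pw of candidates) {
  console.log(pw, STRONG_PASSWORD.test(pw));
}
// Passw0rd!   true
// password    false: no uppercase, digit, or special character
// Passw0rd    false: no special character
// Pa$1        false: shorter than 8 characters
// Passw0rd!#  false: "#" is not in the allowed character set
```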
This concept took me a while to internalize. Greedy quantifiers match as much as possible, lazy quantifiers match as little as possible. Adding ? after a quantifier makes it lazy.
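A minimal TypeScript sketch of the difference on a made-up string with two bracketed items.

```typescript
const tagged = "[first] and [second]";

// Greedy: .+ runs to the end, then backtracks only as far as the LAST "]".
console.log(tagged.match(/\[.+\]/)?.[0]);  // "[first] and [second]"
// Lazy: .+? stops at the FIRST "]" that lets the match succeed.
console.log(tagged.match(/\[.+?\]/)?.[0]); // "[first]"
console.log(tagged.match(/\[.+?\]/g));     // ["[first]", "[second]"]
```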
I learned about catastrophic backtracking the hard way when a regex hung our server. These tips come from real production pain:
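As an illustration (a textbook example, not the pattern from that production incident), here is the classic shape behind catastrophic backtracking: a quantified group that itself contains a quantifier. The sketch only runs short inputs, so it finishes instantly. It shows that the flattened pattern accepts exactly the same strings.

```typescript
// Nested quantifier: on a long run of "a" followed by a character that cannot match,
// the engine tries exponentially many ways to split the run between the inner and outer "+".
const risky = /^(a+)+$/;
// Same set of matching strings, but only one way to match each input.
const safe = /^a+$/;

// Keep inputs short: never run `risky` against long, untrusted, non-matching text.
for (const sample of ["a", "aaaa", "aaab", "", "b"]) {
  console.log(JSON.stringify(sample), risky.test(sample), safe.test(sample));
}
```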
Since I work primarily in JavaScript/TypeScript at Šikulovi s.r.o., I use these methods daily. test() for validation, match()/matchAll() for extraction, replace() for transformation.
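A compact TypeScript sketch of those four methods on one made-up string. `matchAll()` throws a TypeError if the regex lacks the g flag.

```typescript
const entries = "user=alice id=7; user=bob id=42";

// test(): yes/no validation
console.log(/^user=\w+/.test(entries)); // true

// match() with g: every matched substring, without capture groups
console.log(entries.match(/id=\d+/g)); // ["id=7", "id=42"]

// matchAll(): every match WITH its capture groups (the regex must have the g flag)
for (const m of entries.matchAll(/user=(\w+) id=(\d+)/g)) {
  console.log(m[1], m[2]); // "alice 7", then "bob 42"
}

// replace(): $1 refers to the first capture group
console.log(entries.replace(/id=(\d+)/g, "id=<$1>")); // "user=alice id=<7>; user=bob id=<42>"
```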
I have made every single one of these mistakes at some point. Now I keep this list in mind when debugging patterns that do not work:
The g (global) flag finds ALL matches in the string instead of stopping at the first. Without it, match() returns only the first result - which confused me for years until I understood the difference.
Escape with a backslash: \. matches a literal period, \* matches an asterisk. I still occasionally forget to escape and wonder why my pattern matches everything.
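The same rule bites when a regex is built from a string, for example from user input. A widely copied helper escapes every metacharacter first. This is a minimal TypeScript sketch; the name `escapeRegExp` is a convention, not a built-in.

```typescript
// Prefix every regex metacharacter with a backslash so arbitrary text is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$4.99";
console.log(escapeRegExp(price)); // \$4\.99

const unescaped = new RegExp(price);
const escaped = new RegExp(escapeRegExp(price));
console.log(unescaped.test("Total: $4.99")); // false: "$" acts as an end-of-input anchor
console.log(escaped.test("Total: $4.99"));   // true
console.log(escaped.test("Total: $4x99"));   // false: "." is now a literal dot
```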
.* is greedy - it gobbles up as many characters as possible. .*? is lazy - it takes as few as possible. I learned this distinction after spending hours debugging a pattern that matched too much.
Regex cannot reliably match nested structures, and I wish someone had told me this earlier. It cannot count nesting depth, so for HTML, XML, or JSON I always use a proper parser now.
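A small TypeScript sketch of how this goes wrong with parentheses, followed by the parser alternative for JSON (`JSON.parse`). The sample strings are made up.

```typescript
const nested = "(a(b)c)";
// Lazy: stops at the first ")" and cuts the outer group in half.
console.log(nested.match(/\(.*?\)/)?.[0]);   // "(a(b)"
// Greedy: looks right here, but only by accident...
console.log(nested.match(/\(.*\)/)?.[0]);    // "(a(b)c)"
console.log("(a)(b)".match(/\(.*\)/)?.[0]);  // "(a)(b)": ...it runs across two sibling groups

// A real parser tracks depth for you; for JSON that is just JSON.parse.
const config = JSON.parse('{"a":{"b":{"c":1}}}');
console.log(config.a.b.c); // 1
```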
Add the i flag: /pattern/i in JavaScript. This is probably the flag I forget most often when debugging patterns that should match but don't.
A lookahead checks if something follows without including it in the match. (?=abc) is positive (must be followed by abc), (?!abc) is negative (must NOT be followed by abc). I use them for password validation and conditional matching.
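A minimal TypeScript sketch of both kinds of lookahead. The "px" and "admin" rules are made-up examples.

```typescript
// Positive lookahead: digits that are followed by "px"; "px" is checked but not consumed.
const pxValue = /\d+(?=px)/;
console.log("width: 100px".match(pxValue)?.[0]); // "100"

// Negative lookahead: any word-character username except exactly "admin".
const notAdmin = /^(?!admin$)\w+$/;
console.log(notAdmin.test("admin"));         // false
console.log(notAdmin.test("alice"));         // true
console.log(notAdmin.test("administrator")); // true: only the exact word is excluded
```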
Founder of CodeUtil. Web developer building tools I actually use. When I'm not coding, I experiment with productivity techniques (with mixed success).