
Regex Tester Online: Build, Test, and Debug Regular Expressions

I used to debug regex by trial and error in my code. Compile, test, fail, repeat. Now I test patterns live before writing a single line. Here's how I actually use this thing, plus the patterns I copy-paste constantly.

2024-06-28 · 11 min
Related tool: Regex Tester

Use the tool alongside this guide for hands-on practice.

Why I stopped debugging regex in my code

I've wasted so many hours debugging regex patterns directly in my codebase. Write pattern, run code, doesn't work, tweak pattern, run again. The feedback loop was painfully slow.

Now I paste my test string into the regex tester, build the pattern while watching matches highlight in real time, and only then copy it into my code. Saves me 10-15 minutes every single time. At Šikulovi s.r.o. we process a lot of text - log files, user input validation, data extraction. This tool gets opened at least once a day.

How I build patterns step by step

Here's my actual workflow. I start with a tiny piece of my target string, get that matching, then add complexity one token at a time. Trying to write the full pattern in one go is how I used to waste hours.

  • Start small - match one literal word first
  • Add character classes one at a time
  • Test with edge cases BEFORE adding more complexity
  • Add capture groups only when I actually need the values
  • Set flags explicitly - never assume defaults
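A minimal sketch of that workflow in JavaScript. The log line, its format, and the variable names are made up for illustration; they are not from a real project.

```javascript
// Hypothetical log line, used only for illustration.
const line = "2024-06-28 14:03:11 ERROR payment failed";

// Step 1: match one literal word.
const step1 = /ERROR/;

// Step 2: add character classes for the timestamp in front of it.
const step2 = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ERROR/;

// Step 3: anchor the pattern and add capture groups for the values needed.
const step3 = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (ERROR|WARN|INFO) (.+)$/;

// Check every intermediate step against the same test string.
const stepResults = [step1, step2, step3].map((re) => re.test(line));
console.log(stepResults);

// Only the final step exposes the captured values.
const [, date, time, level, message] = line.match(step3);
console.log({ date, time, level, message });
```

None of these patterns use the g flag, so calling test() repeatedly keeps no state between calls.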

Regex flags explained

Flags change everything. I've had patterns that worked perfectly until someone added the multiline flag and broke the whole thing. Here's what each flag actually does:

  • g (global) - Find all matches, not just the first. Without this, you get one match and stop.
  • i (case insensitive) - A matches a. Obvious but I forget it constantly.
  • m (multiline) - ^ and $ match line starts/ends, not just string start/end. This one bites me.
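  • s (dotAll) - Makes . match newlines too. By default . matches everything EXCEPT line terminators (\n, \r, and the Unicode line and paragraph separators).
  • u (unicode) - Treats the pattern and input as Unicode code points instead of UTF-16 code units. Need this for emoji and other characters outside the Basic Multilingual Plane, and for \p{...} property escapes.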
  • y (sticky) - Matches only at lastIndex position. I rarely use this one honestly.
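Here is a small sketch of what each flag changes, using a made-up two-line string. The variable names are illustrative only.

```javascript
const text = "Error one\nerror two";

// i + no g: match() returns only the first match.
const first = text.match(/error/i)[0];
// g + i: match() returns every match.
const all = text.match(/error/gi);

// m: ^ matches at the start of each line, not just the start of the string.
const withM = text.match(/^error/gim);
const withoutM = text.match(/^error/gi);

// s: . also matches the newline between the two lines.
const withoutS = /one.error/.test(text);
const withS = /one.error/s.test(text);

// u: an emoji is one code point but two UTF-16 code units.
const withoutU = /^.$/.test("😀");
const withU = /^.$/u.test("😀");

// y: the match must start exactly at lastIndex ("two" starts at index 16).
const sticky = /two/y;
sticky.lastIndex = 16;
const stickyHit = sticky.test(text);
const stickyMiss = /two/y.test(text); // lastIndex is 0, and "two" is not there

console.log({ first, all, withM, withoutM, withoutS, withS, withoutU, withU, stickyHit, stickyMiss });
```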

The mistakes that always get me

Greedy quantifiers. Every. Single. Time. I write .* expecting it to stop at the first match, and it gobbles up everything to the LAST match. Classic example: trying to match HTML tags with <.*> and watching it match from the first < to the very last > on the line (or in the whole document if the s flag is on).

  • Greedy .* - Use .*? (lazy) or be more specific about what you want
  • Forgetting to escape special chars - . matches ANY character, \. matches a literal dot
  • ^ and $ in multiline mode - They match at the start and end of every line, not just start/end of string
  • Forgetting the g flag - match() returns only the first result, and replace() changes only the first match
  • Catastrophic backtracking - Nested quantifiers like (a+)+ can freeze your browser
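The first two mistakes are easy to reproduce. This is a small sketch with a made-up HTML snippet; the strings and variable names are only for illustration.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs to the last > on the line, so this matches the whole string.
const greedy = html.match(/<.*>/)[0];

// Lazy: .*? stops at the first > it can.
const lazy = html.match(/<.*?>/g);

// More specific: "anything except >" gives the same tags without backtracking.
const specific = html.match(/<[^>]*>/g);

// Unescaped dot: . matches any character, so "1.5" as a pattern also matches "125".
const unescaped = /1.5/.test("125");
const escaped = /1\.5/.test("125");

console.log({ greedy, lazy, specific, unescaped, escaped });
```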

Patterns I copy-paste constantly

I have a text file of regex patterns I've collected over the years. Here are the ones I actually use regularly:

  • Email (simple): ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
  • URL: https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=%]+
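  • IPv4 (shape only, also accepts 999.999.999.999): \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
  • Date (YYYY-MM-DD, format only, no range check): \d{4}-\d{2}-\d{2}
  • Time (HH:MM:SS, format only, no range check): \d{2}:\d{2}:\d{2}
  • Hex color (6-digit form only): #[0-9A-Fa-f]{6}\b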
  • Phone (international): \+?[0-9]{1,4}[-.\s]?\(?[0-9]{1,3}\)?[-.\s]?[0-9]{1,4}[-.\s]?[0-9]{1,9}
  • Whitespace trim: ^\s+|\s+$ (with replace and the g flag, otherwise only the leading whitespace is removed)
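Two of these patterns need a little help in real code. In the sketch below, isValidIPv4 is a hypothetical helper name, and the anchored ^...$ variant of the IPv4 pattern is my assumption for validating a whole string instead of searching inside a longer one.

```javascript
// Trim needs the g flag so both alternatives get replaced.
const trimRe = /^\s+|\s+$/g;
const trimmedBoth = "  hello world \n".replace(trimRe, "");
// Without g, only the first alternative that matches (the leading spaces) is removed.
const trimmedOne = "  hello world \n".replace(/^\s+|\s+$/, "");

// The IPv4 pattern only checks the shape, so the octet range is checked in code.
const ipv4Shape = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

function isValidIPv4(s) {
  if (!ipv4Shape.test(s)) return false;
  // Every octet must be in the range 0-255.
  return s.split(".").every((octet) => Number(octet) <= 255);
}

console.log(JSON.stringify(trimmedBoth), JSON.stringify(trimmedOne));
console.log(isValidIPv4("192.168.1.1"), isValidIPv4("999.1.1.1"), isValidIPv4("1.2.3"));
```

Splitting the work this way keeps the pattern readable; encoding the 0-255 range in the regex itself is possible but much longer.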

Capture groups and backreferences

Capture groups are probably my most-used regex feature. Parentheses () create a group that you can reference later - either in the pattern itself or in the replacement string.

Say I want to swap first and last name. Pattern: (\w+) (\w+) with replacement $2 $1 turns 'John Doe' into 'Doe John'. I use this for reformatting log lines all the time.

  • (pattern) - Capturing group, accessible as $1, $2, etc.
  • (?:pattern) - Non-capturing group, just for grouping without capturing
  • (?<name>pattern) - Named capture group, accessible as $<name> in the replacement string and \k<name> inside the pattern
  • \1, \2 - Backreference within the same pattern (match same text again)
  • $1, $2 - Reference captured text in replacement string
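The same group syntax in JavaScript, as a short sketch. The sample strings and group names are made up for illustration.

```javascript
// Numbered groups in the replacement string.
const swapped = "John Doe".replace(/(\w+) (\w+)/, "$2 $1");

// Named groups in the replacement string.
const named = "John Doe".replace(/(?<first>\w+) (?<last>\w+)/, "$<last>, $<first>");

// Named groups read from the match object.
const m = "2024-06-28".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
const { year, month, day } = m.groups;

// Backreference inside the pattern: \1 must repeat whatever group 1 captured.
const repeatedWord = /\b(\w+) \1\b/;
const doubled = repeatedWord.test("this is is a test");
const notDoubled = repeatedWord.test("this is a test");

console.log({ swapped, named, year, month, day, doubled, notDoubled });
```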

Lookahead and lookbehind

These are advanced but incredibly useful. They let you match based on what comes before or after, without including it in the match. Took me years to use these confidently.

Real example: I needed to match prices but only in EUR, not USD. Pattern: \d+(?=\s*EUR) matches '100' in '100 EUR' but not '100 USD'. The EUR isn't included in the match, just used as a condition.

  • (?=pattern) - Positive lookahead: match if followed by pattern
  • (?!pattern) - Negative lookahead: match if NOT followed by pattern
  • (?<=pattern) - Positive lookbehind: match if preceded by pattern
  • (?<!pattern) - Negative lookbehind: match if NOT preceded by pattern
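A quick sketch of all four lookarounds in JavaScript. The price strings are invented for the example. Note the extra \b in the negative lookahead: that is my addition, because without it the engine can backtrack to a shorter number ("10" out of "100") that is not directly followed by EUR.

```javascript
const prices = "100 EUR, 250 USD, 75EUR";

// Positive lookahead: digits followed by optional whitespace and EUR.
const eurAmounts = prices.match(/\d+(?=\s*EUR)/g);

// Negative lookahead: whole numbers NOT followed by EUR.
// The \b on both sides stops the engine from matching part of a number.
const nonEurAmounts = prices.match(/\b\d+\b(?!\s*EUR)/g);

const costs = "cost $40 or 30 EUR";

// Positive lookbehind: digits preceded by a dollar sign.
const afterDollar = costs.match(/(?<=\$)\d+/g);

// Negative lookbehind: numbers NOT preceded by a dollar sign.
const notAfterDollar = costs.match(/(?<!\$)\b\d+/g);

console.log({ eurAmounts, nonEurAmounts, afterDollar, notAfterDollar });
```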

When NOT to use regex

Regex is powerful but it's not always the right tool. I've learned this the hard way after writing unmaintainable patterns that nobody (including future me) could understand.

  • HTML parsing - Use a proper parser. Regex cannot handle nested tags correctly.
  • JSON/XML parsing - Built-in parsers exist for a reason.
  • Complex validation - Sometimes a simple if/else is clearer than a 200-char pattern.
  • When readability matters more than cleverness - If colleagues can't review it, don't use it.
  • Performance-critical code with complex patterns - Measure first, regex can be slow.

Real examples from my projects

Here are actual regex problems I solved recently at Šikulovi s.r.o.:

  • Extract order IDs from email subjects: ORD-\d{6} matched "ORD-123456"
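  • Clean phone numbers: [^0-9+] with the g flag and an empty replacement strips everything except digits and +
  • Find console.log statements: console\.log\([^)]*\) for code cleanup before production (stops at the first closing parenthesis, so calls with nested parentheses are only partly matched)
  • Parse Apache log IPs: ^(\S+) captures the first non-whitespace chunk
  • Validate Czech postal codes: ^\d{3}\s?\d{2}$ matches "123 45" or "12345" (the anchors matter for validation, otherwise any string containing five digits passes)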
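Here is roughly how those patterns look in JavaScript. All input strings are made up for the example, and the ^...$ anchors on the postal code pattern are an assumption I add so that it validates the whole string.

```javascript
// Extract order IDs from an email subject (hypothetical subject line).
const subject = "Re: Your order ORD-123456 has shipped (was ORD-654321)";
const orderIds = subject.match(/ORD-\d{6}/g);

// Clean a phone number: the g flag is needed to strip every unwanted character.
const cleanedPhone = "+420 (777) 123-456".replace(/[^0-9+]/g, "");

// Parse the client IP from an Apache-style log line (hypothetical line).
const logLine = '203.0.113.7 - - [28/Jun/2024:14:03:11 +0200] "GET / HTTP/1.1" 200 512';
const ip = logLine.match(/^(\S+)/)[1];

// Validate a Czech postal code; anchored so the whole string must match.
const zipRe = /^\d{3}\s?\d{2}$/;
const zipChecks = ["123 45", "12345", "1234", "123456"].map((z) => zipRe.test(z));

console.log({ orderIds, cleanedPhone, ip, zipChecks });
```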

Performance tips

Regex can be slow. I once brought down a server with a poorly written pattern that caused catastrophic backtracking. Here's what I learned:

  • Avoid nested quantifiers like (a+)+ - These can cause exponential backtracking
  • Be specific - [a-z]+ is faster than .+ when you know the content
  • Anchor when possible - ^pattern can fail at position 0 instead of being retried at every position in the string
  • Compile once, use many times - In loops, create the regex outside the loop
  • Test with large inputs - What works on 10 chars might freeze on 10,000
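A sketch of two of these tips. countErrors and the [ERROR] line prefix are hypothetical names for illustration. The nested-quantifier pattern is shown but deliberately never run against the long input, because that is exactly the case that can hang.

```javascript
// Compile once: the regex is created outside the loop.
// No g flag, so test() keeps no lastIndex state between calls.
const ERROR_LINE = /^\[ERROR\]/;

function countErrors(lines) {
  let count = 0;
  for (const line of lines) {
    if (ERROR_LINE.test(line)) count += 1;
  }
  return count;
}

// Nested quantifier: many ways to split a run of a's between the inner and outer +.
const risky = /^(a+)+$/; // shown for comparison only, not executed on the long input
// Same accepted strings, but only one way to match a run of a's.
const safe = /^a+$/;

// Large "almost matching" input: 10,000 a's followed by one b.
const largeInput = "a".repeat(10000) + "b";
const safeResult = safe.test(largeInput);

console.log(countErrors(["[ERROR] disk full", "ok", "[ERROR] timeout", "warn [ERROR]"]));
console.log(safeResult);
```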

Testing methodology

Before I put any regex in production code, I test it with at least these cases:

  • Happy path - The exact input I expect
  • Edge cases - Empty string, single character, very long input
  • Almost matches - Inputs that look similar but should NOT match
  • Unicode - If applicable, test with emoji and accented characters
  • Line breaks - Does your pattern handle \n and \r\n correctly?
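One way to keep those cases around is a small table of inputs and expected results. This is a minimal sketch, not a test framework: runCases is a hypothetical helper, and the anchored date pattern is just an example target.

```javascript
// Pattern under test: an anchored version of the YYYY-MM-DD pattern.
const dateRe = /^\d{4}-\d{2}-\d{2}$/;

const cases = [
  { input: "2024-06-28", expected: true },               // happy path
  { input: "", expected: false },                        // edge case: empty string
  { input: "2024-06-28".repeat(1000), expected: false }, // edge case: very long input
  { input: "2024-6-28", expected: false },               // almost match: one-digit month
  { input: "x2024-06-28", expected: false },             // almost match: leading junk
  { input: "\uFF12\uFF10\uFF12\uFF14-06-28", expected: false }, // full-width digits: \d is ASCII-only in JS
  { input: "2024-06-28\n", expected: false },            // line break: $ does not skip a trailing \n in JS
];

// Returns the cases whose actual result differs from the expected one.
function runCases(re, list) {
  return list.filter((c) => re.test(c.input) !== c.expected);
}

const failures = runCases(dateRe, cases);
console.log(failures.length === 0 ? "all cases pass" : failures);
```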

FAQ

What regex flavor is supported?

JavaScript's RegExp engine. Same syntax you'd use in Node.js or browser JavaScript. Most patterns work across languages, but some features like lookbehind have varying support.

How do I debug replacements?

Use capture groups with parentheses, then reference them as $1, $2 in your replacement. For example, (\w+)@(\w+) with replacement '$1 at $2' turns 'user@domain' into 'user at domain'.

Why is my pattern not matching anything?

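Common issues: forgot to escape special characters (. needs to be \.), pattern is case-sensitive but input is different case (add i flag), or a ^ or $ anchor is tied to the whole string when the target sits on its own line (add m flag). If you get one match where you expected several, you're missing the g flag.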

How do I match across multiple lines?

Two options: use the s flag to make . match newlines, or use [\s\S] which matches any character including newlines. Also consider the m flag if you need ^ and $ to match line boundaries.
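A short sketch of the three options, with made-up input strings:

```javascript
const text = "BEGIN\nline one\nline two\nEND";

// Without s, . stops at the first newline, so nothing matches.
const withoutS = /BEGIN(.*)END/.test(text);

// Option 1: the s flag lets . cross line breaks.
const withS = /BEGIN(.*)END/s.exec(text)[1];

// Option 2: [\s\S] matches any character, no flag needed.
const withClass = /BEGIN([\s\S]*)END/.exec(text)[1];

// The m flag: ^ and $ match at every line boundary.
const pairs = "a=1\nb=2".match(/^\w+=\d+$/gm);
const pairsWithoutM = "a=1\nb=2".match(/^\w+=\d+$/g);

console.log({ withoutS, withS, withClass, pairs, pairsWithoutM });
```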

Can I use this regex in Python/PHP/Java?

Most basic patterns work across languages. Watch out for: escaping differences (in Java string literals every backslash is doubled, so \d is written "\\d"), flag syntax varies, and some features like lookbehind have different support levels.

How do I match special characters literally?

Escape them with backslash: \. for dot, \* for asterisk, \? for question mark, \[ for bracket, \\ for backslash itself. The special chars are: . * + ? ^ $ { } [ ] \ | ( )
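When the text to match comes from a variable, escaping by hand is not an option. A common approach is a small helper like the one below; escapeRegExp is a name I picked for the sketch, not a built-in function.

```javascript
// Put a backslash in front of every character that is special in a pattern.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1=2 (approx.)";
const literal = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp("a.b"));                    // a\.b
console.log(literal.test("so 1+1=2 (approx.) ok"));  // true
console.log(new RegExp(escapeRegExp("a.b")).test("axb")); // false
```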

What causes catastrophic backtracking?

Nested quantifiers like (a+)+ or patterns with multiple ways to match the same text. The regex engine tries every possible combination, which can be exponential. Be specific about what you're matching.

Martin Šikula

Founder of CodeUtil. Web developer building tools I actually use. When I'm not coding, I experiment with productivity techniques (with mixed success).
