Regex Cheat Sheet 2026: Complete Reference for Regular Expressions

This is the regex cheat sheet I keep bookmarked. After years of writing patterns at Šikulovi s.r.o., I have compiled the syntax I actually use daily, plus the gotchas that used to trip me up.

2025-12-08 · 10 min
Related tool: Regex Tester

Use the tool alongside this guide for hands-on practice.

My relationship with regex

I will be honest - I spent years being intimidated by regular expressions. They looked like line noise. But once I started using them daily for log parsing and data validation at Šikulovi s.r.o., something clicked. Now I reach for regex instinctively whenever I need to find or transform text patterns.

This is the cheat sheet I wish I had when starting out. I have organized it the way I actually think about regex - starting with the basics I use constantly, then building up to the patterns that took me longer to internalize.

Basic character matching

Let me start with the fundamentals. Most characters match themselves literally, but some have special meaning that I had to memorize. These special characters trip up beginners constantly.

  • abc matches the exact string "abc"
  • . matches any single character except newline
  • \. matches a literal period (escape special characters with backslash)
  • [abc] matches any one character: a, b, or c
  • [^abc] matches any character except a, b, or c
  • [a-z] matches any lowercase ASCII letter
  • [A-Za-z0-9] matches any ASCII alphanumeric character
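A minimal sketch of the list above in TypeScript. The sample strings and variable names are made up for illustration; the point is only to show each rule returning true or false.

```typescript
// Patterns taken from the list above, tested against invented sample strings.
const literal = /abc/;
const anyChar = /a.c/;
const literalDot = /a\.c/;
const vowelSet = /[aeiou]/;
const notDigits = /[^0-9]/;

const basicResults = {
  literal: literal.test("xxabcxx"),       // true: "abc" appears in the string
  dotMatchesAny: anyChar.test("a-c"),     // true: . matches "-"
  dotSkipsNewline: anyChar.test("a\nc"),  // false: . does not match a newline
  escapedDotMiss: literalDot.test("a-c"), // false: \. only matches a real period
  escapedDotHit: literalDot.test("a.c"),  // true
  set: vowelSet.test("rhythm"),           // false: no a, e, i, o, or u
  negatedSet: notDigits.test("12345"),    // false: every character is a digit
};

console.log(basicResults);
```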

Quantifiers

Quantifiers are where regex gets really powerful. These tell the engine how many times to match something. I use * and + constantly, but the lazy versions (*? and +?) saved me countless debugging sessions.

  • * matches 0 or more times (greedy)
  • + matches 1 or more times (greedy)
  • ? matches 0 or 1 time (optional)
  • {n} matches exactly n times
  • {n,} matches n or more times
  • {n,m} matches between n and m times
  • *? +? ?? lazy versions (match as few as possible)
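To make the quantifiers concrete, here is a small sketch that runs each one against a short invented string and records what it captures. Note how * happily matches an empty string.

```typescript
// Each entry shows the text a quantifier actually consumes.
const run = "aaa";

const quantifierMatches = {
  star: "b".match(/a*/)?.[0],              // "": zero occurrences still counts as a match
  plus: run.match(/a+/)?.[0],              // "aaa"
  optional: "color".match(/colou?r/)?.[0], // "color": the "u" is optional
  exact: run.match(/a{2}/)?.[0],           // "aa"
  atLeast: run.match(/a{2,}/)?.[0],        // "aaa"
  range: "aaaaa".match(/a{2,3}/)?.[0],     // "aaa": greedy, so it takes the upper bound
  lazyPlus: run.match(/a+?/)?.[0],         // "a": lazy, so it stops after one
};

console.log(quantifierMatches);
```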

Anchors and boundaries

Anchors bit me constantly when I was learning. They match positions, not characters. Forgetting anchors means matching the wrong part of the string. Adding unnecessary anchors means failing to match valid input.

  • ^ matches start of string (or line with m flag)
  • $ matches end of string (or line with m flag)
  • \b matches word boundary
  • \B matches non-word boundary
  • (?=...) positive lookahead (followed by)
  • (?!...) negative lookahead (not followed by)
  • (?<=...) positive lookbehind (preceded by)
  • (?<!...) negative lookbehind (not preceded by)
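A short sketch of anchors, boundaries, and lookarounds in action. The inputs are invented; what matters is that none of these constructs consume characters, they only constrain where a match may sit.

```typescript
const wholeNumber = /^\d+$/; // anchored: the entire string must be digits
const anyNumber = /\d+/;     // unanchored: digits anywhere are enough

const anchorResults = {
  anchoredHit: wholeNumber.test("12345"),       // true
  anchoredMiss: wholeNumber.test("123abc"),     // false
  unanchored: anyNumber.test("123abc"),         // true: matches the "123" part
  perLine: "one\ntwo".match(/^\w+$/gm),         // ["one", "two"]: m flag makes ^ and $ per line
  wordBoundary: /\bcat\b/.test("concatenate"),  // false: "cat" is inside a longer word
  lookahead: "12px 5em".match(/\d+(?=px)/)?.[0],                 // "12": "px" is checked, not captured
  negLookahead: "foo.js bar.ts".match(/\b\w+\.(?!js)\w+/)?.[0],  // "bar.ts"
  lookbehind: "$40 and 15 items".match(/(?<=\$)\d+/)?.[0],       // "40"
  negLookbehind: "$40 and 15 items".match(/(?<![$\d])\d+/)?.[0], // "15"
};

console.log(anchorResults);
```

The negative lookbehind excludes a preceding digit as well as "$". Without that, the engine would skip the "4" and then happily match the "0" of "$40".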

Character classes

These shorthands are the ones I use most. \d for digits, \w for word characters, \s for whitespace. Their uppercase versions match the opposite. I probably type \d fifty times a day.

  • \d matches any digit [0-9]
  • \D matches any non-digit [^0-9]
  • \w matches word character [A-Za-z0-9_]
  • \W matches non-word character
  • \s matches whitespace (space, tab, newline)
  • \S matches non-whitespace
  • \t matches tab
  • \n matches newline
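As a rough illustration, the shorthands above can pull a log line apart. The log line itself is invented for this sketch.

```typescript
// A made-up log line with a tab between the level and the message.
const logLine = "2025-12-08 ERROR\tdisk_full code=507";

const classResults = {
  digits: logLine.match(/\d+/g),             // ["2025", "12", "08", "507"]
  words: logLine.match(/\w+/g),              // ["2025", "12", "08", "ERROR", "disk_full", "code", "507"]
  fields: logLine.split(/\s+/),              // ["2025-12-08", "ERROR", "disk_full", "code=507"]
  digitsOnly: logLine.replace(/\D/g, ""),    // "20251208507": \D strips every non-digit
  hasTab: /\t/.test(logLine),                // true
  fieldCount: logLine.match(/\S+/g)?.length, // 4: runs of non-whitespace
};

console.log(classResults);
```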

Groups and capturing

Groups let me extract parts of a match, which I need constantly for parsing. Named groups (?<name>...) are a game changer for readability - I switched to them a few years ago and never looked back.

  • (abc) capturing group, matches "abc" and remembers it
  • (?:abc) non-capturing group, matches but does not capture
  • (?<name>abc) named capturing group
  • \1 \2 backreference to a captured group (\k<name> for a named group)
  • (a|b) alternation, matches "a" or "b"
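Here is a small sketch of numbered groups, named groups, and backreferences in TypeScript. The strings are invented, and \k<quote> is JavaScript's syntax for referring back to a named group.

```typescript
const ISO = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = "Released 2025-12-08".match(ISO);

const groupResults = {
  byNumber: dateMatch?.[1],         // "2025": groups are numbered from 1
  byName: dateMatch?.groups?.month, // "12": same data, easier to read
  // \1 must repeat whatever group 1 captured, which finds doubled words.
  repeatedWord: "it was was fine".match(/\b(\w+) \1\b/)?.[0], // "was was"
  // The closing quote must be the same character as the opening one.
  matchingQuotes: /^(?<quote>['"]).*\k<quote>$/.test("'ok'"),       // true
  mismatchedQuotes: /^(?<quote>['"]).*\k<quote>$/.test("\"mixed'"), // false
  // A non-capturing group adds no entry to the result array.
  nonCapturingLength: "cats".match(/^(?:cat|dog)s?$/)?.length, // 1: only the full match
};

console.log(groupResults);
```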

Common regex flags

Flags change how the whole pattern behaves. I forget the i flag constantly and wonder why my pattern does not match. The g flag is essential for finding all matches, not just the first.

  • g global: find all matches, not just the first
  • i case-insensitive matching
  • m multiline: ^ and $ match line start/end
  • s dotall: . matches newline characters
  • u unicode: enable full Unicode matching
  • y sticky: match only at lastIndex position
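The effect of each flag is easiest to see side by side. This sketch runs the same invented two-line string through patterns that differ only in their flags.

```typescript
const report = "Error: disk\nerror: net";

const flagResults = {
  firstOnly: report.match(/error/)?.[0],      // "error": case-sensitive, first hit only
  globalInsensitive: report.match(/error/gi), // ["Error", "error"]
  lineStarts: report.match(/^error/gim),      // ["Error", "error"]: ^ matches at each line
  stringStartOnly: report.match(/^error/gi),  // ["Error"]: without m, ^ is start of string
  dotWithoutS: /disk.error/.test(report),     // false: . will not cross the newline
  dotAll: /disk.error/s.test(report),         // true
  emojiWithoutU: /^.$/.test("😀"),            // false: the emoji is two UTF-16 code units
  emojiWithU: /^.$/u.test("😀"),              // true: u treats it as one code point
};

// Sticky: the match must begin exactly at lastIndex.
const sticky = /net/y;
sticky.lastIndex = report.indexOf("net");
const stickyHit = sticky.test(report);  // true
sticky.lastIndex = 0;
const stickyMiss = sticky.test(report); // false: "net" does not start at index 0

console.log(flagResults, stickyHit, stickyMiss);
```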

Email validation pattern

Email validation is where I learned that perfect regex does not exist. The RFC allows crazy stuff like quotes and comments. This pattern covers the vast majority of real-world addresses I run into:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

It is not perfect (nothing is for email): it catches obvious typos, but it still rejects some technically valid addresses, such as those with quoted local parts. For production, I honestly just send a verification email - that is the only real validation.
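A minimal sketch of using the pattern as a first-pass check. The looksLikeEmail helper and the sample addresses are hypothetical, and trimming the input first is my own assumption about how the value arrives.

```typescript
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: a cheap sanity check, not proof that the mailbox exists.
function looksLikeEmail(input: string): boolean {
  return EMAIL_PATTERN.test(input.trim());
}

const emailResults = {
  plain: looksLikeEmail("martin@example.com"),           // true
  plusTag: looksLikeEmail("user+tag@mail.example.co.uk"), // true
  noAtSign: looksLikeEmail("no-at-sign.example.com"),     // false
  noTld: looksLikeEmail("user@localhost"),                // false: no dot plus TLD
  // Allowed by the RFC, rejected here: a known limitation of the pattern.
  quotedLocal: looksLikeEmail('"quoted name"@example.com'), // false
};

console.log(emailResults);
```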

URL matching pattern

URL patterns are tricky because URLs can contain so many characters. This is the pattern I use for basic extraction:

https?:\/\/[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=%]*

For strict validation, I prefer using the URL constructor in JavaScript - try/catch tells me if it is valid. But for finding URLs in text, regex works great.
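Both approaches can be sketched in a few lines. extractUrls and isValidUrl are hypothetical helper names; the URL constructor is the standard global available in browsers and Node.js.

```typescript
// The extraction pattern from above, with the g flag so match() returns every hit.
const URL_PATTERN = /https?:\/\/[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=%]*/g;

// Hypothetical helper: find URL-shaped substrings in free text.
function extractUrls(text: string): string[] {
  return text.match(URL_PATTERN) ?? [];
}

// Hypothetical helper: strict check that delegates to the URL parser.
function isValidUrl(candidate: string): boolean {
  try {
    new URL(candidate); // throws a TypeError on input it cannot parse
    return true;
  } catch {
    return false;
  }
}

const found = extractUrls("See https://example.com/docs?page=2 and http://test.org for details");
console.log(found);                             // ["https://example.com/docs?page=2", "http://test.org"]
console.log(isValidUrl("https://example.com")); // true
console.log(isValidUrl("not a url"));           // false
```

One thing to watch: because the pattern allows dots in the host and path, a URL that ends a sentence will be captured with the trailing period attached, so the result may need trimming.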

Phone number patterns

Phone numbers are a nightmare because every country does them differently. I usually normalize to digits only before validation, but here are patterns for when you need to accept formatted input:

  • US: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
  • International: \+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}
  • Digits only: ^\d{10,15}$
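A sketch of the normalize-then-validate approach next to the formatted US pattern. normalizePhone is a hypothetical helper, the phone numbers are invented, and the ^ and $ anchors on the US pattern are my addition so it validates a whole field rather than finding a number inside text.

```typescript
// US pattern from the list, anchored here (an assumption) to validate a whole input.
const US_PHONE = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: strip everything except digits, then check the length.
function normalizePhone(input: string): string | null {
  const digits = input.replace(/\D/g, "");
  return /^\d{10,15}$/.test(digits) ? digits : null;
}

const phoneResults = {
  usParens: US_PHONE.test("(555) 123-4567"),        // true
  usDots: US_PHONE.test("555.123.4567"),            // true
  usTooShort: US_PHONE.test("555-1234"),            // false
  normalizedUs: normalizePhone("(555) 123-4567"),   // "5551234567"
  normalizedIntl: normalizePhone("+420 123 456 789"), // "420123456789"
  rejected: normalizePhone("12345"),                // null: fewer than 10 digits
};

console.log(phoneResults);
```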

Date and time patterns

I parse dates from logs constantly. These patterns handle the most common formats. Note that they do not validate that the date is real (like Feb 30) - that is better done after extraction with a proper date library.

  • ISO 8601: \d{4}-\d{2}-\d{2}
  • US format: \d{2}/\d{2}/\d{4}
  • Time 24h: \d{2}:\d{2}(:\d{2})?
  • Datetime: \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
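The extract-first, validate-second flow might look like the sketch below. It uses the built-in Date object as a stand-in for a proper date library, and extractRealDates is a hypothetical helper name. The \b boundaries are my addition so the pattern does not match inside longer digit runs.

```typescript
const ISO_DATE = /\b(\d{4})-(\d{2})-(\d{2})\b/g;

// Hypothetical helper: return only the ISO dates that exist on the calendar.
function extractRealDates(log: string): string[] {
  const found: string[] = [];
  for (const m of log.matchAll(ISO_DATE)) {
    const year = Number(m[1]);
    const month = Number(m[2]);
    const day = Number(m[3]);
    // Date.UTC rolls invalid dates over (Feb 30 becomes Mar 2),
    // so a round-trip comparison reveals dates that are not real.
    const date = new Date(Date.UTC(year, month - 1, day));
    const isReal =
      date.getUTCFullYear() === year &&
      date.getUTCMonth() === month - 1 &&
      date.getUTCDate() === day;
    if (isReal) found.push(m[0]);
  }
  return found;
}

const realDates = extractRealDates("2025-02-30 bad, 2025-12-08 ok, 2024-02-29 leap day");
console.log(realDates); // ["2025-12-08", "2024-02-29"]
```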

Password strength validation

Password validation is a great example of using multiple lookaheads. Each (?=...) asserts a different requirement without consuming characters:

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

This enforces 8+ characters with lowercase, uppercase, digit, and special character. I adjust based on the project's security requirements - sometimes simpler is better for user experience.
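One way to see what each lookahead contributes is to split the combined pattern into separate checks, which also allows telling the user what is missing. This is only a sketch: REQUIREMENTS and missingRequirements are hypothetical names, and the sample passwords are invented.

```typescript
const STRONG_PASSWORD = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

// The same rules as the combined pattern, one per entry.
const REQUIREMENTS: Array<[string, RegExp]> = [
  ["lowercase letter", /[a-z]/],
  ["uppercase letter", /[A-Z]/],
  ["digit", /\d/],
  ["special character", /[@$!%*?&]/],
  ["8+ allowed characters", /^[A-Za-z\d@$!%*?&]{8,}$/],
];

// Hypothetical helper: list the rules a password fails.
function missingRequirements(password: string): string[] {
  return REQUIREMENTS
    .filter(([, pattern]) => !pattern.test(password))
    .map(([label]) => label);
}

console.log(STRONG_PASSWORD.test("Passw0rd!")); // true
console.log(missingRequirements("password"));   // ["uppercase letter", "digit", "special character"]
// "#" is not in the allowed set, so the combined pattern rejects this one too.
console.log(missingRequirements("Passw0rd#"));  // ["special character", "8+ allowed characters"]
```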

Greedy vs lazy matching

This concept took me a while to internalize. Greedy quantifiers match as much as possible, lazy quantifiers match as little as possible. Adding ? after a quantifier makes it lazy.

  • <.*> greedy: matches "<div>content</div>" entirely
  • <.*?> lazy: matches "<div>" only
  • Use lazy quantifiers when extracting content between delimiters
  • Greedy is often faster when you expect long matches
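The difference is easy to verify. This sketch runs greedy, lazy, and negated-class variants over two invented strings.

```typescript
const html = "<div>content</div>";
const quoted = 'say "one" and "two"';

const greedyLazyResults = {
  greedy: html.match(/<.*>/)?.[0],           // "<div>content</div>"
  lazy: html.match(/<.*?>/)?.[0],            // "<div>"
  negatedClass: html.match(/<[^>]*>/g),      // ["<div>", "</div>"]
  greedyQuotes: quoted.match(/".*"/)?.[0],   // "\"one\" and \"two\"": runs to the last quote
  lazyQuotes: quoted.match(/".*?"/g),        // ["\"one\"", "\"two\""]
};

console.log(greedyLazyResults);
```

A negated class like [^>]* is a third option: it cannot run past the delimiter in the first place, so it needs no backtracking to give the same result as the lazy version here.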

Performance tips

I learned about catastrophic backtracking the hard way when a regex hung our server. These tips come from real production pain:

  • Avoid nested quantifiers like (a+)+ which cause catastrophic backtracking
  • Use anchors (^ $) when matching whole strings
  • Prefer specific character classes over .
  • Use non-capturing groups (?:) when you do not need the captured text
  • Compile regex once and reuse when matching multiple strings
  • Consider atomic groups or possessive quantifiers if your engine supports them (JavaScript's built-in RegExp does not)
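A sketch of two of these tips: flattening a nested quantifier, and compiling a pattern once outside a loop. The names and log lines are hypothetical, and the inputs are deliberately short so that even the risky pattern finishes instantly.

```typescript
// Risky: on a long run of "a" followed by a character that breaks the match,
// a backtracking engine may try exponentially many ways to split the run.
const risky = /^(a+)+$/;
// Accepts exactly the same strings with a single quantifier.
const safe = /^a+$/;

// Compiled once at module level instead of on every loop iteration.
const LOG_LEVEL = /^(?:INFO|WARN|ERROR)\b/;

// Hypothetical helper: count lines that start with a known log level.
function countLeveled(lines: string[]): number {
  let count = 0;
  for (const line of lines) {
    if (LOG_LEVEL.test(line)) count++;
  }
  return count;
}

console.log(risky.test("aaa"), safe.test("aaa")); // true true
console.log(risky.test("aab"), safe.test("aab")); // false false
console.log(countLeveled(["INFO ok", "debug x", "ERROR bad"])); // 2
```

LOG_LEVEL deliberately has no g flag. A global or sticky regex carries lastIndex between test() calls, which makes reuse across different strings error-prone.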

JavaScript regex methods

Since I work primarily in JavaScript/TypeScript at Šikulovi s.r.o., I use these methods daily. test() for validation, match()/matchAll() for extraction, replace() for transformation.

  • test() returns true/false if pattern matches
  • exec() returns match array with groups and index
  • match() returns all matches (with g flag) or first match
  • matchAll() returns iterator of all matches with groups (requires the g flag)
  • replace() replaces matches with a string or function
  • split() splits string by pattern matches
  • search() returns index of first match or -1
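All seven methods on one invented input, as a quick side-by-side sketch:

```typescript
const line = "id=42 name=ada id=7";

const firstPair = /(\w+)=(\w+)/.exec(line);

const methodResults = {
  test: /id=\d+/.test(line),                     // true
  execIndex: firstPair?.index,                   // 0
  execGroups: [firstPair?.[1], firstPair?.[2]],  // ["id", "42"]
  matchFirst: line.match(/id=\d+/)?.[0],         // "id=42"
  matchGlobal: line.match(/id=\d+/g),            // ["id=42", "id=7"]
  matchAllKeys: [...line.matchAll(/(\w+)=(\w+)/g)].map((m) => m[1]), // ["id", "name", "id"]
  replaceString: line.replace(/id=(\d+)/g, "id=#$1"),                // "id=#42 name=ada id=#7"
  replaceFn: line.replace(/\d+/g, (d) => String(Number(d) * 2)),     // "id=84 name=ada id=14"
  split: line.split(/\s+/),                      // ["id=42", "name=ada", "id=7"]
  searchHit: line.search(/name/),                // 6
  searchMiss: line.search(/zzz/),                // -1
};

console.log(methodResults);
```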

Common mistakes to avoid

I have made every single one of these mistakes at some point. Now I keep this list in mind when debugging patterns that do not work:

  • Forgetting to escape special characters like . * + ? [ ] ( ) { } \ ^ $ |
  • Using .* when you mean a narrower class such as [^"]* or [^>]* (greedy dot matches too much)
  • Not anchoring patterns that should match whole strings
  • Overcomplicating patterns that could be simpler
  • Not testing edge cases like empty strings or special characters
  • Assuming all regex flavors are identical (they differ between languages)
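The escaping mistake bites hardest when a pattern is built from user input. Below is a sketch of a widely used escaping helper. escapeRegExp is a hypothetical name defined here, not a built-in assumed to exist, and the last two lines show why the empty-string edge case deserves a test.

```typescript
// Hypothetical helper: prefix every regex metacharacter with a backslash
// so the input is matched literally.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1.5";

const mistakeResults = {
  unescaped: new RegExp(userInput).test("125"),             // true: "." matched the "2"
  escaped: new RegExp(escapeRegExp(userInput)).test("125"), // false
  escapedHit: new RegExp(escapeRegExp(userInput)).test("v1.5"), // true
  escapedText: escapeRegExp("a.b*c"),                       // "a\\.b\\*c", i.e. a\.b\*c
  // Edge case: * accepts an empty string, + does not.
  emptyWithStar: /^\d*$/.test(""),                          // true
  emptyWithPlus: /^\d+$/.test(""),                          // false
};

console.log(mistakeResults);
```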

FAQ

What does the g flag do in regex?

The g (global) flag finds ALL matches in the string instead of stopping at the first. Without it, match() returns only the first result - which confused me for years until I understood the difference.

How do I match a literal dot or other special character?

Escape with a backslash: \. matches a literal period, \* matches an asterisk. I still occasionally forget to escape and wonder why my pattern matches everything.

What is the difference between .* and .*? in regex?

.* is greedy - it gobbles up as many characters as possible. .*? is lazy - it takes as few as possible. I learned this distinction after spending hours debugging a pattern that matched too much.

Can regex match nested structures like HTML?

No, and I wish someone had told me this earlier. Classic regular expressions cannot count nesting depth, so they cannot reliably match arbitrarily nested structures. For HTML, XML, or JSON, I always use a proper parser now.

How do I make regex case-insensitive?

Add the i flag: /pattern/i in JavaScript. This is probably the flag I forget most often when debugging patterns that should match but don't.

What is a lookahead in regex?

A lookahead checks if something follows without including it in the match. (?=abc) is positive (must be followed by abc), (?!abc) is negative (must NOT be followed by abc). I use them for password validation and conditional matching.

Martin Šikula

Founder of CodeUtil. Web developer building tools I actually use. When I'm not coding, I experiment with productivity techniques (with mixed success).
