Email Regex: Why It Is Harder Than You Think

I've written probably 20 different email regex patterns in my career. Most of them were wrong. Here's what I learned after years of getting it wrong, and the patterns that actually work.

2024-06-11, 9 min read

How I learned email validation the hard way

Okay so funny story. Back in 2019 I'm sitting at my desk at Šikulovi s.r.o. working on some random CSS thing when Petr calls. You know Petr? The bakery chain guy. Anyway he's upset because our signup form rejected his email. I ask him to spell it out: john.o'brien, then a plus tag, then a .co.uk domain.

My regex choked on literally everything. The apostrophe in O'Brien. The plus sign Gmail uses for filtering. The .co.uk extension which apparently needs special handling. I spent the next three hours rewriting everything while maintaining a text file called emails_that_broke_prod.txt which I still update to this day.

So yeah that's how I learned that email validation is deceptively annoying. The RFC 5322 spec (the official email standard) allows insane stuff. "hello world"@example.com is valid. With the space in quotes. Comments in parentheses work somehow. I don't pretend to understand all of it anymore.

The pattern that actually works

Here's what I've been copying into projects for years: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

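Not RFC-perfect. Don't care. Catches structural mistakes like a missing TLD or a stray space. It won't catch a misspelled but well-formed domain like gmial.com, which happens constantly (I see it maybe 3-4 times per week on high-traffic sites); that needs a separate check. Handles user+tag@gmail.com for filtering. Works with weird TLDs like .photography and .co.uk. One gap: the apostrophe from the O'Brien story isn't in the character class, so add ' to the first bracket if your users need it.

Quick breakdown: ^ starts the match, [a-zA-Z0-9._%+-]+ gets the username with allowed special chars (found out percent signs are valid from a user in 2021), @ is the at sign, [a-zA-Z0-9.-]+ matches the domain, \.[a-zA-Z]{2,} ensures there's a proper TLD with 2+ letters, $ ends it.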

The $ is important btw. Had a form where someone typed "user@example.com please call me back" and it passed because there was nothing stopping text after the email. Oops.
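Here's a minimal sketch of the simple pattern in use, including a copy without the trailing $ to show what the anchor buys you. The helper name isValidEmail and the sample addresses are made up for illustration; they're not from any real project.

```javascript
// The simple pattern from this article, anchored at both ends.
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Same pattern without the trailing $, only here to demonstrate the bug.
const EMAIL_RE_NO_END = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;

// Hypothetical helper: trims surrounding whitespace, then tests the format.
function isValidEmail(input) {
  return typeof input === "string" && EMAIL_RE.test(input.trim());
}

// Illustrative inputs (assumed, not real addresses).
const samples = [
  "user+tag@gmail.com",
  "user@example.photography",
  "user@example.co.uk",
  "user@gmail",
  "user@example.com please call me back",
];

for (const s of samples) {
  console.log(JSON.stringify(s), "->", isValidEmail(s));
}

// The unanchored copy happily accepts trailing junk.
console.log(
  "without $:",
  EMAIL_RE_NO_END.test("user@example.com please call me back")
);
```

The first three samples pass, the last two fail, and the unanchored copy accepts the "please call me back" input because it only needs a valid-looking prefix.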

The strict version nobody needs

One German insurance company wanted to block double dots like user..name@example.com (which is actually invalid per spec so fair enough). Gave them this monster:

^(?!.*\.\.)[a-zA-Z0-9](?:[a-zA-Z0-9._%+-]{0,62}[a-zA-Z0-9])?@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z]{2,})+$

Used it twice in five years. The simple one works for everyone else.
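If you want to see what each piece of a strict pattern does, here's a sketch that builds one from commented parts. It's a variant with an explicit lookahead up front so consecutive dots are rejected; that lookahead and the name STRICT_EMAIL_RE are assumptions of this sketch, so treat it as a starting point and not a spec-complete validator.

```javascript
// Built from string pieces so each part can carry its own comment.
const STRICT_EMAIL_RE = new RegExp(
  "^(?!.*\\.\\.)" +                                       // no two dots in a row anywhere
  "[a-zA-Z0-9](?:[a-zA-Z0-9._%+-]{0,62}[a-zA-Z0-9])?" +   // local part: 1-64 chars, alphanumeric at both ends
  "@" +
  "[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" +       // first domain label: 1-63 chars, no edge hyphens
  "(?:\\.[a-zA-Z]{2,})+$"                                 // one or more letters-only labels, e.g. .co.uk
);

// Illustrative inputs (assumed, not real addresses).
const checks = [
  "user.name@example.com",
  "user..name@example.com",
  "a".repeat(64) + "@example.com",
  "a".repeat(65) + "@example.com",
];

for (const c of checks) {
  const shown = c.length > 40 ? c.slice(0, 12) + "...(" + c.length + " chars)" : c;
  console.log(shown, "->", STRICT_EMAIL_RE.test(c));
}
```

One limitation worth knowing: every label after the first must be letters only, so a domain like mail.example2.com is rejected by this variant.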

My embarrassing mistakes

Rejected user+tag@gmail.com because plus signs looked wrong. Gmail made plus addressing mainstream! Rejected an address whose TLD wasn't exactly 3 letters, because that's all my TLD check allowed. Rejected a short address like a@b.co for being "too short" even though it's perfectly valid.

Also had a guy whose address had four subdomains in front of the actual domain! No idea why but hey it's his email and it works. Learned not to question these things.

Oh and John.Doe@Example.COM which taught me to treat emails as case-insensitive. Strictly speaking only the domain is guaranteed to be case-insensitive, but practically every provider ignores case in the part before the @ too. That one I learned pretty early at least.

What should actually fail

No @ at all - obvious. Just @example.com with nothing before - who are you?? Just email@ with nothing after - going where exactly? Double dots like user..name@example.com - invalid per RFC. Starting with a dot like .user@example.com - also invalid. Domain starting with hyphen like user@-example.com - nope.

If your pattern lets any of these through go fix it. Fair warning: the simple pattern above lets the last three through, so add extra checks if those matter to you.
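A quick way to check any pattern against this list is a tiny "leak" report. The sketch below runs the simple pattern from earlier; the function name findLeaks and the exact sample strings are made up for illustration.

```javascript
// The simple pattern from earlier in this article.
const SIMPLE_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// One illustrative input per failure case listed above.
const SHOULD_FAIL = [
  "userexample.com",         // no @ at all
  "@example.com",            // nothing before the @
  "email@",                  // nothing after the @
  "user..name@example.com",  // double dots
  ".user@example.com",       // leading dot
  "user@-example.com",       // domain starts with a hyphen
];

// Hypothetical helper: returns the inputs a pattern wrongly accepts.
function findLeaks(pattern, cases) {
  return cases.filter((c) => pattern.test(c));
}

const leaks = findLeaks(SIMPLE_RE, SHOULD_FAIL);
console.log("accepted but should fail:", leaks);
```

Run against the simple pattern, this reports the double-dot, leading-dot, and leading-hyphen cases as leaks. Swap in your own pattern and keep tightening until the list comes back empty, or decide that those three are acceptable for your form.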

Browser validation is unreliable

Yeah input type="email" exists. Chrome does it one way, Firefox slightly different, Safari is Safari. I always validate server-side too because people disable JavaScript and pentesters definitely don't run your frontend.

Had a security audit once where the guy curled garbage directly to our API. Backend validation saved us there.
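For the backend side, here's a framework-free sketch of what that check can look like. The function name validateSignupBody, the result shape, and the 254-character cap (a commonly cited upper bound for an address) are assumptions of this example, so adapt them to whatever your server actually uses.

```javascript
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const MAX_EMAIL_LENGTH = 254; // assumed cap, commonly cited for email addresses

// Hypothetical helper: takes an already-parsed request body and returns
// either { ok: true, email } or { ok: false, error }.
function validateSignupBody(body) {
  const raw = body && body.email;

  // Someone curling the API can send arrays, numbers, or nothing at all.
  if (typeof raw !== "string") {
    return { ok: false, error: "email must be a string" };
  }

  const email = raw.trim();
  if (email.length === 0 || email.length > MAX_EMAIL_LENGTH) {
    return { ok: false, error: "email has an invalid length" };
  }

  if (!EMAIL_RE.test(email)) {
    return { ok: false, error: "email format looks wrong" };
  }

  return { ok: true, email };
}

console.log(validateSignupBody({ email: "  user@example.com  " }));
console.log(validateSignupBody({ email: ["user@example.com"] }));
console.log(validateSignupBody(null));
```

The type check comes first on purpose: garbage sent straight to the API often isn't even a string, and calling trim() on it would throw.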

International emails caught me off guard

Munich project 2022. Client's coworker had an address starting with mü, umlaut and all. My ASCII-only regex rejected it. Except RFC 6531 made international emails valid years ago??

So now пользователь@example.com is real. So is user@例え.jp. In JavaScript you need the u flag to use Unicode classes like \p{L}, or you can convert the domain to Punycode (which only helps with the domain, not the part before the @). Honestly I usually just ask if they have an alternate ASCII email. Not elegant but works.
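If you do want to accept these, here's a sketch of the simple pattern with the ASCII ranges swapped for Unicode property escapes. The pattern name UNICODE_EMAIL_RE and the müller sample are assumptions of this example, and it's deliberately permissive: it checks shape, not whether the mail server on the other end supports internationalized addresses.

```javascript
// ASCII-only pattern from earlier, for comparison.
const ASCII_EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Same shape, but letters (\p{L}), combining marks (\p{M}) and digits (\p{N})
// from any script are allowed. The u flag is required for \p{...} to work.
const UNICODE_EMAIL_RE =
  /^[\p{L}\p{M}\p{N}._%+-]+@[\p{L}\p{M}\p{N}.-]+\.\p{L}{2,}$/u;

// Illustrative inputs; the first one is a made-up address.
const intlSamples = [
  "müller@example.de",
  "пользователь@example.com",
  "user@例え.jp",
  "user@example.com",
];

for (const s of intlSamples) {
  console.log(
    s,
    "| ascii:", ASCII_EMAIL_RE.test(s),
    "| unicode:", UNICODE_EMAIL_RE.test(s)
  );
}
```

Including \p{M} matters because an ü can arrive either as one precomposed character or as a plain u followed by a combining mark, depending on the user's keyboard and OS.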

Nobody uses the weird RFC features

A fully RFC-5322 compliant regex is like 6000 characters. The spec allows quoted strings with spaces, inline comments, all sorts of chaos. In eight years I've never seen a real user with "john doe"@example.com as their actual email.

If one shows up someday I'll handle it manually. Not building a regex monster for a 0.001% case.

What actually matters

Regex catches format errors and structural typos. Verification emails catch fake addresses. Don't be too strict or users get frustrated.

Built a typo suggester that asks "Did you mean gmail.com?" when someone types gmial.com. Catches mistakes before they bounce. Clients love it because it reduces support tickets.
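The idea behind a suggester like that can be sketched with plain edit distance against a short list of popular domains. This is not my production code: the domain list, the function names, and the distance threshold of 2 are all assumptions chosen for the example.

```javascript
// Assumed list of popular domains; extend it for your own audience.
const COMMON_DOMAINS = [
  "gmail.com",
  "yahoo.com",
  "outlook.com",
  "hotmail.com",
  "icloud.com",
];

// Classic dynamic-programming Levenshtein distance, one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: returns a suggested domain, or null if the domain
// is already known or nothing is close enough.
function suggestDomain(email, maxDistance = 2) {
  const at = email.lastIndexOf("@");
  if (at < 0) return null;
  const domain = email.slice(at + 1).toLowerCase();
  if (COMMON_DOMAINS.includes(domain)) return null;

  let best = null;
  let bestDistance = maxDistance + 1;
  for (const candidate of COMMON_DOMAINS) {
    const d = editDistance(domain, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return best;
}

const suggestion = suggestDomain("user@gmial.com");
if (suggestion) console.log("Did you mean " + suggestion + "?");
```

The threshold is 2 because a swapped pair of letters, like gmial for gmail, counts as two edits in plain Levenshtein distance. Keep it as a suggestion the user can dismiss and never auto-correct, since some people really do own lookalike domains.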

The same pattern works in other languages: Python uses re.match(), PHP uses preg_match(), Java needs extra backslashes as usual. Test everything with a collection of weird emails before shipping. I keep 47 test cases in a file from past production incidents. Five minutes of testing beats Friday evening hotfixes.

FAQ

What is the best regex pattern for email validation?

I use ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ for most projects. It's not perfect, but it catches obvious format errors without rejecting the valid emails people actually have. Good enough for 99% of cases.

Should I use regex or a library for email validation?

For quick form validation? Regex is fine. For anything serious, use a library like validator.js. Trust me, they've already handled the edge cases you haven't thought of.

Why does my email regex reject valid addresses?

Probably too strict. I've made this mistake a dozen times. Check if you're allowing plus signs (user+tag@), long TLDs like .photography, and subdomains. Test with real emails!

Is regex enough for email validation?

Nope. Regex just checks format. The email could be fake or mistyped. If it matters, send a verification email. That's the only way to know it actually works.

How do I validate email addresses in JavaScript?

Simple: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email). Returns true or false. I've got this pattern memorized at this point.

Can regex validate all RFC 5322 compliant emails?

Mostly, but the pattern would be insane. The spec allows quoted strings, comments (which can nest, and most regex engines can't match arbitrary nesting), all sorts of weird stuff. Just use a simple pattern and handle the rare exceptions manually.

Martin Šikula

Founder of CodeUtil. Web developer building tools I actually use. When I'm not coding, I experiment with productivity techniques (with mixed success).
