Regular expressions (regex) are the industry-standard tool for searching, validating, and transforming text. Once you learn the dozen or so building blocks — character classes, quantifiers, anchors, groups, and lookarounds — you can solve in one line what would otherwise take dozens of lines of procedural code. This guide takes you from zero to advanced patterns with copy-paste-ready snippets for email, URL, phone, IP, hex color, ISO date, JWT, and more — all testable in our Regex Tester.
What Is a Regular Expression?
A regular expression is a pattern describing a set of strings. The math behind it goes back
to Stephen Cole Kleene in 1951, and today nearly every modern language has a regex engine in
its standard library or ecosystem: JavaScript (RegExp), Python (re), Java, .NET, Go, Rust,
PHP (PCRE), and Perl, as well as shell tools like grep, sed, and awk.
📖 Definition — Regex answers the question "does this string match this pattern, and if so, where?" Engines also support capture groups for extraction and replacement for transformation.
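As a rough illustration of the three jobs named in the definition (match and locate, extract, replace), here is a minimal Python sketch. The sample sentence and the variable names are made up for this example.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 66 shipped on 2026-01-15"
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = date_pattern.search(text)

# 1. Does it match, and where?
found = match is not None
location = match.span()          # (start, end) offsets of the match

# 2. Extraction with capture groups.
year, month, day = match.groups()

# 3. Transformation with replacement (groups referenced as \1, \2, \3).
rewritten = date_pattern.sub(r"\3/\2/\1", text)

print(found, location, (year, month, day), rewritten)
```

`search` scans for the first match anywhere in the string; `match.span()` reports the offsets that answer the "where" part of the question.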
Core Syntax Cheat Sheet
| Category | Token | Meaning | Example |
|---|---|---|---|
| Character class | [abc] | a, b, or c | [aeiou] |
| | [a-z] | range | [A-Za-z0-9] |
| | [^abc] | negation | [^0-9] |
| Shorthand | \d | digit ([0-9]) | \d{4} |
| | \w | word char ([A-Za-z0-9_]) | \w+ |
| | \s | whitespace | \s* |
| | . | any char (except \n by default) | a.b |
| Anchor | ^ | start | ^Hello |
| | $ | end | !$ |
| | \b | word boundary | \bword\b |
| Quantifier | * | 0 or more | a* |
| | + | 1 or more | \d+ |
| | ? | 0 or 1 (optional) | colou?r |
| | {n} | exactly n | \d{4} |
| | {n,m} | n to m | \d{2,5} |
| Group | (...) | capturing | (\d+) |
| | (?:...) | non-capturing | (?:abc){2} |
| | (?<name>...) | named capture | (?<year>\d{4}) |
| Alternation | a|b | a or b | cat|dog |
| Escape | \x | escape special char x | \. |
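One way to get a feel for the cheat sheet is to run each token against a string that should or should not match. The sketch below does that with Python's `re.search`; the `found` helper and the test strings are hypothetical, chosen only to exercise the table rows.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, sample text, expected result)
checks = [
    (r"[aeiou]", "rhythm", False),       # no a, e, i, o, or u present
    (r"[^0-9]", "12345", False),         # every character is a digit
    (r"\d{4}", "year 2026", True),       # four digits in a row
    (r"a.b", "a\nb", False),             # dot skips newline by default
    (r"^Hello", "Say Hello", False),     # not at the start
    (r"!$", "Done!", True),              # ends with !
    (r"\bword\b", "swordfish", False),   # no word boundary around "word"
    (r"colou?r", "color", True),         # the u is optional
    (r"\d{2,5}", "7", False),            # needs at least two digits
    (r"(?:abc){2}", "abcabc", True),     # group repeated twice
    (r"cat|dog", "hotdog", True),        # alternation
    (r"\.", "no dots here", False),      # escaped dot is a literal dot
]

for pattern, text, expected in checks:
    assert found(pattern, text) == expected, (pattern, text)
print("all cheat-sheet checks passed")
```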
Flags That Change Behavior
| Flag | Effect | JS | Python |
|---|---|---|---|
| g | find all matches, not just first | ✅ | use re.findall / re.finditer |
| i | case-insensitive | ✅ | re.IGNORECASE |
| m | multiline (^/$ per line) | ✅ | re.MULTILINE |
| s | dotall (. matches \n) | ✅ | re.DOTALL |
| u | full Unicode + named props | ✅ | Unicode matching is the default in Py3; \p{...} needs the third-party regex module |
| x | verbose (whitespace + comments) | ❌ | re.VERBOSE |
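The Python column of the table can be sketched as follows. The three-line sample string is invented for the example; the flag constants are the standard ones from the `re` module.

```python
import re

text = "Foo\nfoo\nBAR"

# i: case-insensitive (re.findall plays the role of JavaScript's g flag).
case_sensitive = re.findall(r"foo", text)
case_insensitive = re.findall(r"foo", text, re.IGNORECASE)

# m: ^ and $ match at every line boundary, not only at the string ends.
whole_string = re.findall(r"^\w+$", text)
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

# s: the dot also matches a newline.
dot_default = re.search(r"Foo.foo", text)
dot_all = re.search(r"Foo.foo", text, re.DOTALL)

# x: whitespace and comments inside the pattern are ignored.
iso_date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

print(case_insensitive, per_line, dot_all is not None)
```

Flags can be combined with `|`, for example `re.IGNORECASE | re.MULTILINE`.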
Practical Patterns Library
| Use Case | Pattern | Notes |
|---|---|---|
| Email (pragmatic) | ^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$ | Covers ~99% of real addresses; not RFC 5322 complete |
| HTTP(S) URL | ^https?:\/\/[\w.-]+(?:\:\d+)?(?:\/[^\s]*)?$ | Matches http and https; tighten with allowed-TLD list if needed |
| US phone | ^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$ | Use libphonenumber for international |
| IPv4 (strict 0–255) | ^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$ | Validates each octet; still accepts a leading zero such as 01 |
| Hex color | ^#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$ | Supports 3/4/6/8 forms |
| ISO 8601 date | ^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+\-]\d{2}:\d{2})?)?$ | Date or full date-time |
| UUID v4 | ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ | Strict v4 |
| Slug | ^[a-z0-9]+(?:-[a-z0-9]+)*$ | SEO-safe URL slug |
| JWT | ^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$ | Three Base64URL parts |
| Credit card (Visa) | ^4\d{12}(?:\d{3})?$ | 13 or 16 digits |
| Strong password | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w]).{12,}$ | Requires upper/lower/digit/symbol, 12+ chars |
Drop any of these into the Regex Tester, paste your sample text, and watch matches highlight in real time. The tool also exports working JavaScript and Python snippets.
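If you prefer to check a few of the library patterns from code, a minimal Python sketch might look like this. The `PATTERNS` dictionary and `is_valid` helper are hypothetical names. Two choices here are assumptions rather than part of the table: `re.ASCII` keeps `\w` and `\d` limited to ASCII (closer to JavaScript's default behavior), and `fullmatch` is used because in Python `$` alone also matches just before a trailing newline.

```python
import re

# Patterns copied from the table above; names are illustrative only.
PATTERNS = {
    "email": r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$",
    "ipv4": r"^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$",
    "hex_color": r"^#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$",
    "slug": r"^[a-z0-9]+(?:-[a-z0-9]+)*$",
}

# Compile once; re.ASCII is an assumption made for this sketch.
COMPILED = {name: re.compile(p, re.ASCII) for name, p in PATTERNS.items()}

def is_valid(kind: str, value: str) -> bool:
    """Validate the whole string against one of the named patterns."""
    return COMPILED[kind].fullmatch(value) is not None

print(is_valid("email", "jane.doe+tag@example.co.uk"))
print(is_valid("ipv4", "256.1.1.1"))
print(is_valid("slug", "regex-cheat-sheet"))
```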
Groups, Captures & Backreferences
# Pattern: (\w+) (\w+)
# Input: John Smith
Group 0 (full match) = "John Smith"
Group 1 = "John"
Group 2 = "Smith"
# Backreference: (\w+)\s+\1 finds repeated words like "the the".
# Named (JS): (?<first>\w+)\s+(?<last>\w+)
# Named (Python): (?P<first>\w+)\s+(?P<last>\w+)
# JS: match.groups.first
# PY: m.group("first")
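The same ideas can be sketched as runnable Python. The sample sentences are made up; note that Python's `re` module spells named groups `(?P<name>...)`.

```python
import re

# Numbered groups: group 0 is the full match.
m = re.search(r"(\w+) (\w+)", "John Smith")
full, first, last = m.group(0), m.group(1), m.group(2)

# Backreference: \1 must repeat whatever group 1 captured.
# The \b anchors are an addition here to avoid matching inside longer words.
dup = re.search(r"\b(\w+)\s+\1\b", "Paris in the the spring")
repeated_word = dup.group(1)

# Named groups, Python syntax.
named = re.search(r"(?P<first>\w+)\s+(?P<last>\w+)", "John Smith")
as_dict = named.groupdict()

print(full, first, last, repeated_word, as_dict)
```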
Lookahead & Lookbehind
Zero-width assertions match a position without consuming characters:
| Syntax | Meaning | Example |
|---|---|---|
| (?=...) | positive lookahead | \d+(?= dollars) |
| (?!...) | negative lookahead | \d+(?! cents) |
| (?<=...) | positive lookbehind | (?<=\$)\d+ |
| (?<!...) | negative lookbehind | (?<!\\)" |
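The four assertions in the table can be tried in Python as below. The sample strings are invented. One behavior worth seeing for yourself: because of backtracking, `\d+(?! cents)` still matches part of a number that is followed by " cents"; adding `\b` before the lookahead is one possible way to prevent that.

```python
import re

price_text = "5 dollars, 100 cents, $42"

# Positive lookahead: digits followed by " dollars" (not included in the match).
dollars = re.findall(r"\d+(?= dollars)", price_text)

# Positive lookbehind: digits preceded by a literal dollar sign.
after_dollar_sign = re.findall(r"(?<=\$)\d+", price_text)

# Negative lookahead, naive: "100" fails, so the engine backs off to "10".
naive_not_cents = re.findall(r"\d+(?! cents)", price_text)

# Negative lookahead with a word boundary: whole numbers only.
strict_not_cents = re.findall(r"\d+\b(?! cents)", price_text)

# Negative lookbehind: quotes that are not escaped with a backslash.
unescaped_quotes = re.findall(r'(?<!\\)"', r'say \"hi\" "now"')

print(dollars, after_dollar_sign, naive_not_cents, strict_not_cents)
```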
Search & Replace
// JavaScript
"John Smith".replace(/(\w+) (\w+)/, "$2, $1"); // "Smith, John"
# Python
import re
re.sub(r"(\w+) (\w+)", r"\2, \1", "John Smith")  # "Smith, John"
// Named groups (JS):
"2026-01-15".replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, "$<d>/$<m>/$<y>");
// "15/01/2026"
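For completeness, the named-group replacement can also be written in Python, where the replacement string refers to groups as `\g<name>`. The helper names below are hypothetical; the second helper shows that `re.sub` also accepts a function when the replacement has to be computed.

```python
import re

ISO_DATE = re.compile(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})")

def to_day_first(text: str) -> str:
    """Rewrite YYYY-MM-DD dates as DD/MM/YYYY using named groups."""
    return ISO_DATE.sub(r"\g<d>/\g<m>/\g<y>", text)

def double_numbers(text: str) -> str:
    """Replace every run of digits with twice its value."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

print(to_day_first("2026-01-15"))
print(double_numbers("3 apples, 10 pears"))
```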
Performance & Catastrophic Backtracking
Naive patterns can blow up exponentially. The classic offender is nested quantifiers like
(a+)+b on input aaaaaaaaaaaaaaaaaaaaa! — the engine tries every partition before
giving up.
🚫 Avoid nested unbounded quantifiers over the same character class ((a+)+, (\d*)*). On a backtracking engine they can make matching time exponential in the input length, a vector for ReDoS denial-of-service attacks.
- Prefer possessive quantifiers or atomic groups (where supported) to suppress backtracking.
- Use specific character classes instead of . (for example, [^"]* inside quoted strings).
- Anchor when possible (^, $) to bound the search.
- Use RE2-style engines (Go's regexp, Rust's regex crate) for guaranteed linear-time matching on untrusted input.
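To see the effect on a backtracking engine such as Python's `re`, you can time a failing match for the nested pattern against a flat equivalent. This is a rough sketch: the `seconds_to_fail` helper is a hypothetical name, the input sizes are kept deliberately small, and actual timings depend on your machine and interpreter.

```python
import re
import time

def seconds_to_fail(pattern: str, text: str) -> tuple[bool, float]:
    """Return (matched, elapsed seconds) for one anchored match attempt."""
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    elapsed = time.perf_counter() - start
    return matched, elapsed

# Keep n small: each extra "a" can roughly double the work for (a+)+b.
for n in (10, 14, 18):
    text = "a" * n + "!"
    _, nested = seconds_to_fail(r"(a+)+b", text)   # nested quantifier
    _, flat = seconds_to_fail(r"a+b", text)        # same language, no nesting
    print(f"n={n:2d}  nested={nested:.6f}s  flat={flat:.6f}s")
```

Both patterns accept exactly the same strings, so rewriting the nested form as `a+b` removes the problem without changing what matches.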
Engine Differences (JS, Python, PCRE)
| Feature | JS (V8) | Python re | PCRE / Perl | RE2 / Go |
|---|---|---|---|---|
| Lookbehind | ✅ (variable-width) | ✅ (fixed-width) | ✅ | ❌ |
| Named groups | ✅ (?<n>) | ✅ (?P<n>) | ✅ | ✅ |
| Backreferences | ✅ | ✅ | ✅ | ❌ |
| Recursion | ❌ | ❌ | ✅ | ❌ |
| Linear-time guarantee | ❌ | ❌ | ❌ | ✅ |
| Unicode props \p{} | ✅ (with u) | ❌ in re (✅ via third-party regex module) | ✅ | partial |
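A quick way to probe what a given engine accepts is to try compiling the pattern. The sketch below does this for the Python `re` column only; `compiles` is a hypothetical helper, and results for other engines have to be checked in those engines.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

probes = {
    "fixed-width lookbehind": r"(?<=\$)\d+",
    "variable-width lookbehind": r"(?<=\$+)\d+",
    "named group, Python syntax": r"(?P<year>\d{4})",
    "named group, JS syntax": r"(?<year>\d{4})",
    "backreference": r"(\w+)\s+\1",
    "Unicode property": r"\p{L}+",
}

for label, pattern in probes.items():
    print(f"{label:28s} {'ok' if compiles(pattern) else 'rejected'}")
```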
Common Mistakes
| Mistake | Symptom | Fix |
|---|---|---|
| Greedy .* across HTML | matches whole page | Use .*? (lazy) or specific class |
| Forgetting to escape . | matches any char | \. for literal dot |
| Missing ^/$ | partial validation passes | Anchor full-string validators |
| Parsing HTML/JSON with regex | nested structures break | Use a real parser (DOM/JSON.parse) |
| Hardcoding TLD length | fails for .museum, .travel | Use {2,} |
| Building from user input | ReDoS denial-of-service | Escape the input (e.g. re.escape), time out, or use RE2 |
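Several of these mistakes are easy to reproduce. The following Python sketch uses invented sample strings to contrast the symptom with the fix for the greedy, unescaped-dot, missing-anchor, and user-input rows.

```python
import re

# Greedy vs lazy across markup.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)    # one match spanning both tags
lazy = re.findall(r"<b>.*?</b>", html)     # one match per tag

# Unescaped dot matches any character.
loose_dot = re.search(r"example.com", "exampleXcom") is not None
literal_dot = re.search(r"example\.com", "exampleXcom") is not None

# Missing anchors let partial matches pass validation.
unanchored = re.search(r"\d{4}", "abc12345") is not None
anchored = re.fullmatch(r"\d{4}", "abc12345") is not None

# User input used as a pattern: escape it so it is treated literally.
user_input = "1+1=2 (really?)"
haystack = "Is 1+1=2 (really?) true"
raw_hit = re.search(user_input, haystack) is not None
escaped_hit = re.search(re.escape(user_input), haystack) is not None

print(greedy, lazy, loose_dot, literal_dot, unanchored, anchored, raw_hit, escaped_hit)
```

Escaping makes user text safe to embed as a literal; it does not make a user-supplied pattern safe, which is where timeouts or a linear-time engine come in.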
Tools
- 🔧 Regex Tester — Live highlighting, group inspection, JS & Python code export.
- 🔧 JSON Formatter — Use after extracting data with capture groups.
- 🔧 URL Encoder — Encode regex output for query strings.
Frequently Asked Questions
Is there a regex that fully validates email addresses?
Strictly speaking, an RFC 5322-conformant email regex spans hundreds of characters and almost no one uses it. The pragmatic ^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$ covers ~99% of real-world addresses and is close to what many form validators use.
Should I use regex to parse HTML?
No. HTML is a nested (context-free) language, while classic regular expressions describe only regular languages, so they cannot track arbitrarily deep tag nesting. Use a real parser (DOMParser, BeautifulSoup, cheerio). Regex is fine for tiny extraction tasks but breaks on real markup.
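As a small sketch of the parser route, Python's standard-library `html.parser` can collect link targets without any regex. The `LinkCollector` class and `extract_links` function are hypothetical names for this example; BeautifulSoup or a browser's DOMParser would serve the same purpose.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, however the markup is nested."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html: str) -> list[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

sample = '<p><a href="/a">A <b>bold</b></a> <A HREF=\'/b\' class="x">B</A></p>'
print(extract_links(sample))
```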
What is "catastrophic backtracking"?
It is what happens when a backtracking engine explores exponentially many ways to match before it can report failure. Patterns like (a+)+ cause it, and it can freeze servers; OWASP calls the resulting attack ReDoS. Use atomic groups, RE2 engines, or input-size limits to defend.
How do I make my regex case-insensitive?
Use the i flag in JavaScript (/foo/i) and Perl/PCRE, re.IGNORECASE in Python. Inline modifier (?i) works in many flavors.
What does the dot (.) actually match?
Any single character except a newline by default. With the s (dotall) flag, it also matches \n. To match any character including newlines without the flag, use an explicit class such as [\s\S].
Are regex engines all the same?
No. JS, Python, PCRE/Perl, .NET, Go, Rust, and POSIX BRE/ERE all differ. Lookbehind, recursion, and named-group syntax vary. Test with the engine your code will run on.
References
- 📄 MDN — Regular Expressions
- 📄 Python re module documentation
- 📄 PCRE2 syntax reference
- 📄 RE2 syntax wiki (linear-time engine)
- 📄 Regular-Expressions.info — comprehensive reference
- 📄 OWASP — ReDoS attack
Regex looks cryptic but reduces to a small alphabet of character classes, quantifiers, anchors,
groups, and lookarounds. Memorize the cheat sheet, copy patterns from the library above, anchor
your validators with ^ and $, avoid nested unbounded quantifiers, and always test
against edge cases in our Regex Tester before shipping.

