The Ultimate Guide to Regular Expressions

Regular expressions (regex) are the industry-standard tool for searching, validating, and transforming text. Once you learn the dozen or so building blocks — character classes, quantifiers, anchors, groups, and lookarounds — you can solve in one line what would otherwise take dozens of lines of procedural code. This guide takes you from zero to advanced patterns with copy-paste-ready snippets for email, URL, phone, IP, hex color, ISO date, JWT, and more — all testable in our Regex Tester.

📑 Table of Contents

What Is a Regular Expression?
Core Syntax Cheat Sheet
Flags That Change Behavior
Practical Patterns Library
Groups, Captures & Backreferences
Lookahead & Lookbehind
Search & Replace
Performance & Catastrophic Backtracking
Engine Differences (JS, Python, PCRE)
Common Mistakes
Tools
FAQ
References

What Is a Regular Expression?

A regular expression is a pattern describing a set of strings. The math behind it goes back to Stephen Cole Kleene in 1951, and today nearly every mainstream language has a regex engine in its standard library or a de facto standard package: JavaScript (RegExp), Python (re), Java, .NET, Go, Rust (the regex crate), PHP (PCRE), Perl, and even shell tools like grep, sed, and awk.

📖 Definition — Regex answers the question "does this string match this pattern, and if so, where?" Engines also support capture groups for extraction and replacement for transformation.
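The three jobs in the definition above (match, locate, extract or replace) can be sketched with Python's standard `re` module. The sample text and the date pattern are illustrative assumptions, not part of the original guide.

```python
import re

text = "Order 66 shipped on 2026-01-15"
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Does the string match, and if so, where?
m = date_pattern.search(text)
found = m is not None
position = m.span()            # (start, end) indices of the match

# Capture groups extract the pieces.
year, month, day = m.groups()

# Replacement transforms the text.
masked = date_pattern.sub("<date>", text)

print(found, position, year, month, day, masked)
```

`search` scans the whole string, while `match` only tries at the start and `fullmatch` requires the entire string to match.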

Core Syntax Cheat Sheet

Category	Token	Meaning	Example
Character class	`[abc]`	a, b, or c	`[aeiou]`
	`[a-z]`	range	`[A-Za-z0-9]`
	`[^abc]`	negation	`[^0-9]`
Shorthand	`\d`	digit (`[0-9]`)	`\d{4}`
	`\w`	word char (`[A-Za-z0-9_]`)	`\w+`
	`\s`	whitespace	`\s*`
	`.`	any char (except `\n` by default)	`a.b`
Anchor	`^`	start	`^Hello`
	`$`	end	`!$`
	`\b`	word boundary	`\bword\b`
Quantifier	`*`	0 or more	`a*`
	`+`	1 or more	`\d+`
	`?`	0 or 1 (optional)	`colou?r`
	`{n}`	exactly n	`\d{4}`
	`{n,m}`	n to m	`\d{2,5}`
Group	`(...)`	capturing	`(\d+)`
	`(?:...)`	non-capturing	`(?:abc){2}`
	`(?<name>...)`	named capture	`(?<year>\d{4})`
Alternation	`a|b`	a or b	`cat|dog`
Escape	`\x`	escape special char	`\.`
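A few rows of the cheat sheet in action, as a minimal Python sketch. The sample strings are made up for illustration.

```python
import re

# Character class + quantifier: runs of one or more vowels.
vowels = re.findall(r"[aeiou]+", "queue up")

# Optional character: matches both spellings, but not the typo.
spellings = re.findall(r"colou?r", "color colour colr")

# Word boundaries keep "cat" from matching inside "concatenate".
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

# Non-capturing group, alternation, and an exact repeat count.
pairs = re.fullmatch(r"(?:ab|cd){2}", "abcd") is not None

# Escaped dot matches only a literal dot.
literal_dot = re.findall(r"\d\.\d", "1.5 and 2x5")

print(vowels, spellings, whole_words, pairs, literal_dot)
```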

Flags That Change Behavior

Flag	Effect	JS	Python
`g`	find all matches, not just first	✅	use `re.findall` / `re.finditer`
`i`	case-insensitive	✅	`re.IGNORECASE`
`m`	multiline (`^/$` per line)	✅	`re.MULTILINE`
`s`	dotall (`.` matches `\n`)	✅	`re.DOTALL`
`u`	Unicode mode + `\p{...}` property escapes	✅	Unicode matching is default in Py3 (no `\p{}` in `re`)
`x`	verbose (whitespace + comments)	—	`re.VERBOSE`
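The Python column of the table can be exercised as follows. This is a small sketch using only the standard `re` module; the sample text is an assumption.

```python
import re

text = "Hello\nworld"

# re.IGNORECASE: case-insensitive matching.
ci = re.findall(r"hello", text, re.IGNORECASE)

# re.MULTILINE: ^ and $ match at every line boundary.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# re.DOTALL: the dot also matches the newline.
without_dotall = re.search(r"Hello.world", text)
with_dotall = re.search(r"Hello.world", text, re.DOTALL)

# re.VERBOSE: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.VERBOSE)

# Python has no g flag; findall and finditer already return every match.
all_digits = re.findall(r"\d", "a1b2c3")

print(ci, line_starts, without_dotall, with_dotall, all_digits)
```

Flags can be combined with `|`, for example `re.IGNORECASE | re.MULTILINE`.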

Practical Patterns Library

Use Case	Pattern	Notes
Email (pragmatic)	`^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$`	Covers ~99% of real addresses; not RFC 5322 complete
HTTP(S) URL	`^https?:\/\/[\w.-]+(?::\d+)?(?:\/[^\s]*)?$`	Tighten with allowed-TLD list if needed
US phone	`^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$`	Use `libphonenumber` for international
IPv4 (strict 0–255)	`^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$`	Validates each octet; accepts a leading zero in two-digit octets such as `01`
Hex color	`^#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$`	Supports 3/4/6/8 forms
ISO 8601 date	`^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+\-]\d{2}:\d{2})?)?$`	Date or full date-time
UUID v4	`^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$`	Strict v4, lowercase only; add the `i` flag to accept uppercase
Slug	`^[a-z0-9]+(?:-[a-z0-9]+)*$`	SEO-safe URL slug
JWT	`^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$`	Three Base64URL parts
Credit card (Visa)	`^4\d{12}(?:\d{3})?$`	13 or 16 digits
Strong password	`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w]).{12,}$`	Requires upper/lower/digit/symbol, 12+ chars

Drop any of these into the Regex Tester, paste your sample text, and watch matches highlight in real time. The tool also exports working JavaScript and Python snippets.
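One way to wire a few of the library patterns into reusable validators is sketched below. The `PATTERNS` dictionary and the `is_valid` helper are hypothetical names, and alternation is written as a plain `|`. Using `re.ASCII` and `fullmatch` is a design choice of this sketch: it keeps `\d` to `0-9` and rejects a trailing newline that `$` alone would tolerate in Python.

```python
import re

# Patterns from the table above; names and structure here are illustrative.
PATTERNS = {
    "ipv4": r"^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$",
    "hex_color": r"^#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$",
    "slug": r"^[a-z0-9]+(?:-[a-z0-9]+)*$",
    "visa": r"^4\d{12}(?:\d{3})?$",
}

# Compile once; re.ASCII restricts \d and \w to ASCII characters.
COMPILED = {name: re.compile(p, re.ASCII) for name, p in PATTERNS.items()}

def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole value matches the named pattern."""
    return COMPILED[kind].fullmatch(value) is not None

print(is_valid("ipv4", "192.168.1.1"))
print(is_valid("ipv4", "256.1.1.1"))
print(is_valid("hex_color", "#FFAA00"))
print(is_valid("slug", "hello-world"))
```

A regex check like the Visa pattern only validates shape; it does not verify a checksum or that the card exists.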

Groups, Captures & Backreferences

# Pattern: (\w+) (\w+)
# Input:   John Smith

Group 0 (full match) = "John Smith"
Group 1              = "John"
Group 2              = "Smith"

# Backreference: \b(\w+)\s+\1\b finds repeated words like "the the".
# Named: (?<first>\w+)\s+(?<last>\w+)   (Python's re spells it (?P<first>...))
#         JS:   match.groups.first
#         PY:   m.group("first")
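The same group mechanics in runnable Python, as a minimal sketch. The sentence used for the repeated-word search is an illustrative assumption.

```python
import re

# Numbered groups: group(0) is the full match.
m = re.search(r"(\w+) (\w+)", "John Smith")
full, first, last = m.group(0), m.group(1), m.group(2)

# Backreference: \1 must repeat whatever group 1 captured.
# With a single capturing group, findall returns that group's text.
repeated = re.findall(r"\b(\w+)\s+\1\b", "it was the the best of of times")

# Named groups use (?P<name>...) in Python's re module.
named = re.search(r"(?P<first>\w+)\s+(?P<last>\w+)", "John Smith")
as_dict = named.groupdict()

print(full, first, last, repeated, as_dict)
```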

Lookahead & Lookbehind

Zero-width assertions match a position without consuming characters:

Syntax	Meaning	Example
`(?=...)`	positive lookahead	`\d+(?= dollars)`
`(?!...)`	negative lookahead	`\b\d+\b(?! cents)`
`(?<=...)`	positive lookbehind	`(?<=\$)\d+`
`(?<!...)`	negative lookbehind	`(?<!\\)"`
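Each of the four assertions can be tried in Python as sketched below, with made-up sample text. The sketch also shows a pitfall with negative lookahead: without word boundaries, the engine backtracks to a shorter number that happens to satisfy the assertion.

```python
import re

text = "5 dollars, 20 cents, $300"

# Positive lookahead: digits followed by " dollars" (the suffix is not consumed).
dollars = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead without boundaries: "20" fails, so the engine
# backtracks to "2", which is followed by "0" rather than " cents".
naive = re.findall(r"\d+(?! cents)", text)

# Word boundaries force whole numbers, which removes the partial match.
not_cents = re.findall(r"\b\d+\b(?! cents)", text)

# Positive lookbehind: digits preceded by a dollar sign.
after_sign = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: double quotes not preceded by a backslash.
quoted = r'say \"hi\" "now"'
unescaped = [m.start() for m in re.finditer(r'(?<!\\)"', quoted)]

print(dollars, naive, not_cents, after_sign, unescaped)
```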

Search & Replace

// JavaScript
"John Smith".replace(/(\w+) (\w+)/, "$2, $1");   // "Smith, John"

# Python
import re
re.sub(r"(\w+) (\w+)", r"\2, \1", "John Smith")  # "Smith, John"

// Named groups (JS):
"2026-01-15".replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, "$<d>/$<m>/$<y>");
// "15/01/2026"
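For comparison, a Python sketch of the named-group replacement, plus two `re.sub` features the snippets above do not show: a function as the replacement, and `count` to limit how many matches are replaced. The `double` helper and sample strings are hypothetical.

```python
import re

# Named groups are referenced as \g<name> in a Python replacement string.
iso = "2026-01-15"
dmy = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<d>/\g<m>/\g<y>",
    iso,
)

# A function replacement receives the match object and returns the new text.
def double(match: re.Match) -> str:
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# count=1 replaces only the first match (similar to omitting g in JavaScript).
first_only = re.sub(r"a", "A", "banana", count=1)

print(dmy, doubled, first_only)
```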

Performance & Catastrophic Backtracking

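Naive patterns can take exponential time. The classic offender is a nested quantifier such as (a+)+b run against input like aaaaaaaaaaaaaaaaaaaaa!: because no b ever appears, a backtracking engine tries every way of splitting the run of a's between the inner and outer + before giving up, so each extra character roughly doubles the work.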

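🚫 Avoid nested unbounded quantifiers over the same character class ((a+)+, (\d*)*). On input that almost matches, they make matching time grow exponentially with input length, which is what ReDoS (regular expression denial of service) attacks exploit.

Prefer possessive quantifiers or atomic groups to suppress backtracking where the engine supports them (PCRE, Java, and Python 3.11+ do; JavaScript does not).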

Use specific character classes instead of . — e.g., [^"]* inside quoted strings.

Anchor when possible (^, $) to bound the search.

Use RE2-style automaton engines (RE2, Go's regexp package, Rust's regex crate) for guaranteed linear-time matching on untrusted input.
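Two of the defenses above, removing the nested quantifier and capping input size, can be sketched in Python as follows. `MAX_INPUT_LENGTH` and `safe_fullmatch` are hypothetical names, and the limit is an arbitrary assumption to tune per application.

```python
import re

MAX_INPUT_LENGTH = 1_000  # hypothetical limit; tune for your application

# (a+)+b and a+b accept exactly the same strings, but the second has no
# nested quantifier, so a failed match cannot explode into exponential work.
RISKY = re.compile(r"(a+)+b")
SAFE = re.compile(r"a+b")

def safe_fullmatch(pattern: re.Pattern, text: str) -> bool:
    """Reject oversized input before handing it to a backtracking engine."""
    if len(text) > MAX_INPUT_LENGTH:
        raise ValueError("input too long for regex validation")
    return pattern.fullmatch(text) is not None

# The rewritten pattern fails quickly on the classic attack string.
print(safe_fullmatch(SAFE, "a" * 25 + "!"))
```

A length cap limits the damage but does not make a risky pattern safe; rewriting the pattern or switching to a linear-time engine addresses the cause.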

Engine Differences (JS, Python, PCRE)

Feature	JS (V8)	Python `re`	PCRE / Perl	RE2 / Go
Lookbehind	✅ (variable-width)	✅ (fixed-width)	✅	❌
Named groups	✅ `(?<n>)`	✅ `(?P<n>)`	✅	✅
Backreferences	✅	✅	✅	❌
Recursion	❌	❌	✅	❌
Linear-time guarantee	❌	❌	❌	✅
Unicode props `\p{}`	✅ (with `u`)	❌ in `re`; ✅ via third-party `regex` module	✅	partial
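The Python column can be checked directly by asking `re` whether it accepts a pattern. The `compiles` helper below is a hypothetical convenience, and the sketch only probes Python's standard `re` module, not the other engines in the table.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed_lookbehind = compiles(r"(?<=\$)\d+")       # fixed width: accepted
variable_lookbehind = compiles(r"(?<=\$+)\d+")   # variable width: rejected by re
python_named = compiles(r"(?P<year>\d{4})")      # Python spelling: accepted
js_named = compiles(r"(?<year>\d{4})")           # JS/PCRE spelling: rejected by re
backreference = compiles(r"(\w+)\s+\1")          # accepted by backtracking engines

print(fixed_lookbehind, variable_lookbehind, python_named, js_named, backreference)
```

Running the equivalent probe in the target engine before shipping is a cheap way to catch portability problems early.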

Common Mistakes

Mistake	Symptom	Fix
Greedy `.*` across HTML	matches whole page	Use `.*?` (lazy) or specific class
Forgetting to escape `.`	matches any char	`\.` for literal dot
Missing `^`/`$`	partial validation passes	Anchor full-string validators
Parsing HTML/JSON with regex	nested structures break	Use a real parser (DOM/JSON.parse)
Hardcoding TLD length	fails for `.museum`, `.travel`	Use `{2,}`
Building patterns from user input	ReDoS or unintended metacharacters	Escape the input, enforce a timeout, or use RE2
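Several rows of the table are easy to reproduce. The sketch below uses made-up inputs to show the greedy versus lazy difference, the unescaped dot, the missing anchors, and `re.escape` for user-supplied text.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the last closing tag; lazy .*? stops at the first.
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# An unescaped dot matches any character.
loose = re.search(r"example.com", "exampleXcom") is not None
strict = re.search(r"example\.com", "exampleXcom") is not None

# Without anchors, a validator accepts any string that merely contains a match.
unanchored = re.search(r"\d{4}", "abc1234xyz") is not None
anchored = re.fullmatch(r"\d{4}", "abc1234xyz") is not None

# re.escape neutralizes metacharacters in user-supplied text.
user_input = "1+1=2 (really?)"
literal = re.compile(re.escape(user_input))
found = literal.search("proof: 1+1=2 (really?)") is not None

print(greedy, lazy, loose, strict, unanchored, anchored, found)
```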

Tools

🔧 Regex Tester — Live highlighting, group inspection, JS & Python code export.
🔧 JSON Formatter — Use after extracting data with capture groups.
🔧 URL Encoder — Encode regex output for query strings.

Frequently Asked Questions

Is there a regex that fully validates email addresses?

Strictly speaking, an RFC 5322-conformant email regex spans hundreds of characters and almost no one uses it. The pragmatic ^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$ covers ~99% of real-world addresses and is typical of what form validators use. To confirm an address actually works, send a verification email.

Should I use regex to parse HTML?

No. HTML has arbitrarily nested structure, which regular expressions cannot track in general. Use a real HTML parser (DOMParser, BeautifulSoup, cheerio). Regex is fine for tiny extraction tasks on markup you control but breaks on real-world pages.

What is "catastrophic backtracking"?

When an engine explores exponentially many combinations to fail a match. Caused by patterns like (a+)+. It can freeze servers — the OWASP term is ReDoS. Use atomic groups, RE2 engines, or input-size limits to defend.

How do I make my regex case-insensitive?

Use the i flag in JavaScript (/foo/i) and Perl/PCRE, re.IGNORECASE in Python. Inline modifier (?i) works in many flavors.

What does the dot (.) actually match?

Any single character except a newline by default. With the s (dotall) flag, it also matches \n. If you need to match every character regardless of flags, use an explicit class such as [\s\S].

Are regex engines all the same?

No. JS, Python, PCRE/Perl, .NET, Go, Rust, and POSIX BRE/ERE all differ. Lookbehind, recursion, and named-group syntax vary. Test with the engine your code will run on.

References

📄 MDN — Regular Expressions
📄 Python re module documentation
📄 PCRE2 syntax reference
📄 RE2 syntax wiki (linear-time engine)
📄 Regular-Expressions.info — comprehensive reference
📄 OWASP — ReDoS attack

Regex looks cryptic but reduces to a small alphabet of character classes, quantifiers, anchors, groups, and lookarounds. Memorize the cheat sheet, copy patterns from the library above, anchor your validators with ^ and $, avoid nested unbounded quantifiers, and always test against edge cases in our Regex Tester before shipping.
