Tutorials · March 22, 2026 · 12 min read

The Ultimate Guide to Regular Expressions

Master regex patterns from beginner to advanced. Learn email validation, URL matching, phone number parsing, and more with practical examples.


Regular expressions (regex) are the industry-standard tool for searching, validating, and transforming text. Once you learn the dozen or so building blocks — character classes, quantifiers, anchors, groups, and lookarounds — you can solve in one line what would otherwise take dozens of lines of procedural code. This guide takes you from zero to advanced patterns with copy-paste-ready snippets for email, URL, phone, IP, hex color, ISO date, JWT, and more — all testable in our Regex Tester.

What Is a Regular Expression?

A regular expression is a pattern describing a set of strings. The math behind it goes back to Stephen Cole Kleene in 1951, and today nearly every modern language ships or has a standard regex engine: JavaScript (RegExp), Python (re), Java, .NET, Go, Rust (the regex crate), PHP (PCRE), Perl, as well as shell tools like grep, sed, and awk.

📖 Definition — Regex answers the question "does this string match this pattern, and if so, where?" Engines also support capture groups for extraction and replacement for transformation.
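As a rough illustration of the "does it match, and where?" question, here is a minimal Python sketch using the standard re module. The sample string and variable names are made up for this example.

```python
import re

text = "Order 1234 shipped on 2026-03-22"

# search() scans the string and returns the first match, or None.
m = re.search(r"\d{4}-\d{2}-\d{2}", text)

found = m is not None              # does the string match?
where = m.span() if m else None    # where? (start, end) offsets
value = m.group(0) if m else None  # the matched text itself

print(found, where, value)
```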

Core Syntax Cheat Sheet

Category          Token          Meaning                            Example
Character class   [abc]          a, b, or c                         [aeiou]
                  [a-z]          range                              [A-Za-z0-9]
                  [^abc]         negation                           [^0-9]
Shorthand         \d             digit ([0-9])                      \d{4}
                  \w             word char ([A-Za-z0-9_])           \w+
                  \s             whitespace                         \s*
                  .              any char (except \n by default)    a.b
Anchor            ^              start                              ^Hello
                  $              end                                !$
                  \b             word boundary                      \bword\b
Quantifier        *              0 or more                          a*
                  +              1 or more                          \d+
                  ?              0 or 1 (optional)                  colou?r
                  {n}            exactly n                          \d{4}
                  {n,m}          n to m                             \d{2,5}
Group             (...)          capturing                          (\d+)
                  (?:...)        non-capturing                      (?:abc){2}
                  (?<name>...)   named capture                      (?<year>\d{4})
Alternation       a|b            a or b                             cat|dog
Escape            \x             escape special char                \.
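To see a few of these tokens in action, the following Python sketch runs some of the example patterns from the cheat sheet against short strings. The input strings are invented for illustration.

```python
import re

# colou?r : the "u" is optional, so both spellings match.
spellings = re.findall(r"colou?r", "color and colour")

# \bword\b : whole-word matches only; "password" does not count.
whole_words = re.findall(r"\bword\b", "word password word!")

# [^0-9]+ : runs of characters that are not digits.
non_digits = re.findall(r"[^0-9]+", "ab12cd")

# \d{2,5} : two to five digits; greedy, so it takes as many as it can.
digit_runs = re.findall(r"\d{2,5}", "7 42 1234567")

# (?:abc){2} : the non-capturing group repeated exactly twice.
repeated = re.fullmatch(r"(?:abc){2}", "abcabc") is not None
```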

Flags That Change Behavior

Flag   Effect                              JS    Python
g      find all matches, not just first    ✅    use findall
i      case-insensitive                    ✅    re.IGNORECASE
m      multiline (^/$ per line)            ✅    re.MULTILINE
s      dotall (. matches \n)               ✅    re.DOTALL
u      full Unicode + named props          ✅    Unicode matching is the default in Py3; \p{} needs the regex module
x      verbose (whitespace + comments)     ❌    re.VERBOSE
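The Python column of the table can be sketched as follows. This is a small demonstration using only the standard re module; the sample text is an assumption made for the example.

```python
import re

text = "Hello\nsay hello\nHELLO"

# g: Python has no global flag; findall()/finditer() return every match.
# i: re.IGNORECASE makes the match case-insensitive.
all_hellos = re.findall(r"hello", text, flags=re.IGNORECASE)

# m: with re.MULTILINE, ^ and $ match at every line boundary.
line_starts = re.findall(r"^hello", text, flags=re.IGNORECASE | re.MULTILINE)

# s: with re.DOTALL, "." also matches the newline.
without_s = re.search(r"Hello.say", text)
with_s = re.search(r"Hello.say", text, flags=re.DOTALL)

# x: re.VERBOSE ignores unescaped whitespace and allows comments.
year_month = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.VERBOSE)
```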

Practical Patterns Library

Use Case              Pattern                                                                       Notes
Email (pragmatic)     ^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$                                                Covers most real-world addresses; not RFC 5322 complete
HTTP(S) URL           ^https?:\/\/[\w.-]+(?::\d+)?(?:\/[^\s]*)?$                                    Matches http and https; tighten with an allowed-TLD list if needed
US phone              ^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$                                         Use libphonenumber for international numbers
IPv4 (strict 0–255)   ^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$                 Validates each octet; accepts leading zeros (e.g. 01)
Hex color             ^#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$                         Supports 3/4/6/8-digit forms
ISO 8601 date         ^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+\-]\d{2}:\d{2})?)?$   Date or full date-time; does not range-check fields
UUID v4               ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$         Strict v4, lowercase only (add the i flag for uppercase)
Slug                  ^[a-z0-9]+(?:-[a-z0-9]+)*$                                                    SEO-safe URL slug
JWT                   ^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$                              Three Base64URL parts
Credit card (Visa)    ^4\d{12}(?:\d{3})?$                                                           13 or 16 digits
Strong password       ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w]).{12,}$                             Requires upper/lower/digit/symbol, 12+ chars

Drop any of these into the Regex Tester, paste your sample text, and watch matches highlight in real time. The tool also exports working JavaScript and Python snippets.
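If you would rather check a few of the library patterns in code, one possible shape is a small lookup of compiled patterns. This is a sketch, not a definitive validator: the dictionary keys and the is_valid helper are hypothetical names, and only four of the patterns are included.

```python
import re

# Patterns taken from the library; the dictionary keys are made up.
PATTERNS = {
    "email": r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$",
    "ipv4": r"^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$",
    "hex_color": r"^#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$",
    "slug": r"^[a-z0-9]+(?:-[a-z0-9]+)*$",
}
COMPILED = {name: re.compile(p) for name, p in PATTERNS.items()}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the named pattern in full."""
    # fullmatch() is used because in Python "$" also matches just before
    # a trailing newline, so "a@b.co\n" would pass with match() alone.
    return COMPILED[kind].fullmatch(value) is not None
```

For example, is_valid("ipv4", "192.168.1.255") is accepted while is_valid("ipv4", "256.1.1.1") is rejected, because no alternative of the octet group can consume 256.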

Groups, Captures & Backreferences

# Pattern: (\w+) (\w+)
# Input:   John Smith

Group 0 (full match) = "John Smith"
Group 1              = "John"
Group 2              = "Smith"

# Backreference: \b(\w+)\s+\1\b finds repeated words like "the the".
# Named: (?<first>\w+)\s+(?<last>\w+)
#         JS:   match.groups.first
#         PY:   m.group("first")
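The same ideas in runnable Python might look like the sketch below. Note that Python's re module spells named groups (?P<name>...) rather than (?<name>...); the sample sentence is invented for the example.

```python
import re

m = re.search(r"(?P<first>\w+) (?P<last>\w+)", "John Smith")

full = m.group(0)          # whole match
first = m.group("first")   # group looked up by name
last = m.group(2)          # the same groups are also numbered
as_dict = m.groupdict()    # all named groups at once

# Backreference: \1 must repeat exactly what group 1 captured.
# The \b anchors keep "the then" from counting as a repeated word.
doubled = re.findall(r"\b(\w+)\s+\1\b", "it was the the best of of times")
```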

Lookahead & Lookbehind

Zero-width assertions match a position without consuming characters:

Syntax      Meaning               Example
(?=...)     positive lookahead    \d+(?= dollars)
(?!...)     negative lookahead    \b\d+\b(?! cents)
(?<=...)    positive lookbehind   (?<=\$)\d+
(?<!...)    negative lookbehind   (?<!\\)"
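A short Python sketch of the four assertions follows; the sample sentence is an assumption made for illustration. The negative lookahead is wrapped in \b because a bare \d+(?! cents) can backtrack: on "50 cents" it would give up the 0 and happily match "5".

```python
import re

text = "Paid $40 for 3 tickets, 15 dollars for snacks and 50 cents tip"

# Positive lookahead: digits followed by " dollars" (which is not consumed).
before_dollars = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead: whole numbers NOT followed by " cents".
not_cents = re.findall(r"\b\d+\b(?! cents)", text)

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar_sign = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: a double quote not preceded by a backslash.
unescaped_quotes = re.findall(r'(?<!\\)"', r'say \"hi\" to "Bob"')
```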

Search & Replace

// JavaScript
"John Smith".replace(/(\w+) (\w+)/, "$2, $1");   // "Smith, John"

# Python
import re
re.sub(r"(\w+) (\w+)", r"\2, \1", "John Smith")  # "Smith, John"

// Named groups (JS):
"2026-01-15".replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, "$<d>/$<m>/$<y>");
//   "15/01/2026"
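For completeness, here is a Python counterpart to the named-group replacement, plus two variations that often come up: a replacement function and a limit on the number of substitutions. The helper name double is hypothetical.

```python
import re

# Named groups in a Python replacement string are written \g<name>.
iso = "2026-01-15"
dmy = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<d>/\g<m>/\g<y>",
    iso,
)


# A function can compute each replacement from the match object.
def double(match: re.Match) -> str:
    return str(int(match.group(0)) * 2)


doubled = re.sub(r"\d+", double, "3 apples and 12 pears")

# count=1 stops after the first match, much like JavaScript's
# replace() when the pattern has no g flag.
first_only = re.sub(r"a", "A", "banana", count=1)
```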

Performance & Catastrophic Backtracking

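Naive patterns can blow up exponentially. The classic offender is nested quantifiers like (a+)+b on input aaaaaaaaaaaaaaaaaaaaa!: because no b ever appears, a backtracking engine tries every way of splitting the run of a's between the inner and outer quantifier before giving up, and the number of splits roughly doubles with each extra character.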

🚫 Avoid nested unbounded quantifiers over the same character class ((a+)+, (\d*)*). They turn linear input into exponential CPU, a vector for ReDoS denial-of-service attacks.

  • Prefer possessive quantifiers or atomic groups (where supported) to suppress backtracking.
  • Use specific character classes instead of . (for example, [^"]* inside quoted strings).
  • Anchor when possible (^, $) to bound the search.
  • Use automata-based engines in the RE2 family (RE2 itself, Go's regexp package, Rust's regex crate) for guaranteed linear-time matching on untrusted input.
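Two of the cheaper defenses, removing the nesting and capping input size, can be sketched in Python as below. The MAX_LEN value and the safe_search helper are hypothetical; pick a limit that fits your own inputs.

```python
import re

NESTED = re.compile(r"(a+)+b")  # risky: nested unbounded quantifiers
FLAT = re.compile(r"a+b")       # matches the same text without nesting

MAX_LEN = 1_000  # hypothetical limit, not a recommendation


def safe_search(pattern: re.Pattern, text: str):
    """Refuse oversized input before handing it to a backtracking engine."""
    if len(text) > MAX_LEN:
        raise ValueError("input too long for regex matching")
    return pattern.search(text)


# On short inputs both patterns agree on whether there is a match,
# so the flat one can replace the nested one when group 1 is not needed.
samples = ["ab", "aaab", "b", "aaa!", "xaab"]
agree = all(bool(NESTED.search(s)) == bool(FLAT.search(s)) for s in samples)
```

The length cap does not make a bad pattern safe on its own; it only bounds how much work a single call can trigger.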

Engine Differences (JS, Python, PCRE)

Feature                 JS (V8)               Python re                      PCRE / Perl          RE2 / Go
Lookbehind              ✅ (variable-width)   ✅ (fixed-width)               ✅                   ❌
Named groups            (?<n>)                (?P<n>)                        (?<n>) or (?P<n>)    (?P<n>)
Backreferences          ✅                    ✅                             ✅                   ❌
Recursion               ❌                    ❌                             ✅                   ❌
Linear-time guarantee   ❌                    ❌                             ❌                   ✅
Unicode props \p{}      ✅ (with u)           via third-party regex module   ✅                   partial
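The named-group row is a common stumbling block when moving a pattern from JavaScript to Python. The sketch below shows one naive way to translate the syntax; the function name is hypothetical, and it assumes the pattern contains no escaped or bracketed "(?<" text, which it would rewrite by mistake.

```python
import re


def js_named_groups_to_python(pattern: str) -> str:
    """Rewrite (?<name>...) as (?P<name>...), leaving lookbehinds alone."""
    # (?<= and (?<! are lookbehinds, so the negative lookahead skips them.
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


js_pattern = r"(?<year>\d{4})-(?<month>\d{2})(?<!00)"
py_pattern = js_named_groups_to_python(js_pattern)

m = re.fullmatch(py_pattern, "2026-03")
```

Treat this as a convenience for simple patterns, and re-test the converted pattern on the target engine.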

Common Mistakes

Mistake                         Symptom                     Fix
Greedy .* across HTML           matches whole page          Use .*? (lazy) or a specific class
Forgetting to escape .          matches any char            \. for literal dot
Missing ^/$                     partial validation passes   Anchor full-string validators
Parsing HTML/JSON with regex    nested structures break     Use a real parser (DOM/JSON.parse)
Hardcoding TLD length           fails for .museum, .travel  Use {2,}
Building from user input        ReDoS denial-of-service     Sanitize, time out, or use RE2
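Three of these mistakes are easy to reproduce. The Python sketch below uses made-up sample strings; re.escape is shown as one way to sanitize user-supplied text before it becomes part of a pattern.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the LAST </b>; lazy .*? stops at the first one.
greedy = re.findall(r"<b>(.*)</b>", html)
lazy = re.findall(r"<b>(.*?)</b>", html)

# An unescaped dot matches any character, so "1x5" slips through.
loose = re.fullmatch(r"1.5", "1x5") is not None
strict = re.fullmatch(r"1\.5", "1x5") is not None

# re.escape() neutralises metacharacters in user-supplied text
# before it is embedded in a pattern.
user_input = "a+b (draft)"
literal = re.compile(re.escape(user_input))
found = literal.search("notes on a+b (draft) v2") is not None
```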

Tools

  • 🔧 Regex Tester — Live highlighting, group inspection, JS & Python code export.
  • 🔧 JSON Formatter — Use after extracting data with capture groups.
  • 🔧 URL Encoder — Encode regex output for query strings.

Frequently Asked Questions

Is there a regex that fully validates email addresses?

Strictly speaking, an RFC 5322-conformant email regex spans hundreds of characters and almost no one uses it. The pragmatic ^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$ covers the vast majority of real-world addresses and is adequate for most form validation.

Should I use regex to parse HTML?

No. HTML allows arbitrarily nested elements, which a regular expression cannot describe in general. Use an HTML parser instead (DOMParser, BeautifulSoup, cheerio). Regex is fine for tiny extraction tasks but breaks on real markup.

What is "catastrophic backtracking"?

It is what happens when a backtracking engine explores exponentially many ways to match before it can report failure. Patterns like (a+)+ cause it. It can freeze servers; OWASP calls the resulting attack ReDoS (regular expression denial of service). Defend with atomic groups, RE2-style engines, or input-size limits.

How do I make my regex case-insensitive?

Use the i flag in JavaScript (/foo/i) and Perl/PCRE, and re.IGNORECASE in Python. The inline modifier (?i) works in many flavors, including Python and PCRE.

What does the dot (.) actually match?

Any single character except a newline by default. With the s (dotall) flag, it also matches \n. If you need to match truly any character without that flag, spell it out with an explicit class such as [\s\S].

Are regex engines all the same?

No. JS, Python, PCRE/Perl, .NET, Go, Rust, and POSIX BRE/ERE all differ. Lookbehind, recursion, and named-group syntax vary. Test with the engine your code will run on.


Regex looks cryptic but reduces to a small alphabet of character classes, quantifiers, anchors, groups, and lookarounds. Memorize the cheat sheet, copy patterns from the library above, anchor your validators with ^ and $, avoid nested unbounded quantifiers, and always test against edge cases in our Regex Tester before shipping.
