🧩 Regex Cheatsheet & Builder

Last updated: June 16, 2026

Click patterns to insert them into the builder. Test your regex live against sample text.

12 Regex Patterns Every Developer Should Have Memorised (and How to Build Your Own)

Regex is one of those skills that separates developers who spend ten minutes on a text-parsing problem from those who spend two hours. The patterns themselves are not complicated — they follow a surprisingly small set of rules. What makes regex feel hard is the sheer density of symbols packed into one line. This guide cuts through that noise. You will get the most useful patterns ready to copy, an explanation of how each one actually works, and a mental model for building new ones from scratch.

1. The Email Pattern That Actually Works (Mostly)

The address syntax defined in RFC 5321 and RFC 5322 is notoriously bizarre: a quoted local part with an IP-literal domain, such as "quoted string"@[127.0.0.1], is technically valid. For practical web validation, this pattern covers the vast majority of real addresses:

[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}

Breaking it down: the first character class [a-zA-Z0-9._%+\-]+ matches one or more alphanumeric characters plus the symbols commonly found in the local part. The literal @ divides it. The domain portion matches a hostname with dots, and \.[a-zA-Z]{2,} requires a TLD of at least two letters. It handles .co.uk and .photography equally well.
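
A minimal sketch of how the pattern might be used for form validation in JavaScript. The helper name looksLikeEmail is hypothetical, and the ^ and $ anchors are an addition that is not part of the pattern as printed: without them the regex would also match an address embedded in a longer string.

```javascript
// Anchored version of the email pattern; the anchors are an assumption added
// so that the whole input must be an address rather than merely contain one.
const EMAIL_RE = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: true when the input looks like an email address.
function looksLikeEmail(input) {
  return EMAIL_RE.test(input);
}

console.log(looksLikeEmail("jane.doe+news@example.co.uk")); // true
console.log(looksLikeEmail("jane@studio.photography"));     // true
console.log(looksLikeEmail("no-at-sign.example.com"));      // false
console.log(looksLikeEmail("jane@localhost"));              // false (no dot plus TLD)
```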

2. IPv4 Address With Range Validation

A naive pattern like \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} will match 999.999.999.999. The correct pattern validates each octet's numeric range:

(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)

The alternation inside each octet group handles three cases: 250–255, 200–249, and 0–199. This is a great example of using alternation (|) to encode numeric constraints that character classes alone cannot express.
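
As a rough illustration, the octet alternation can be stored once and reused four times. The names OCTET and isIPv4 are hypothetical, and the anchors are an addition: unanchored, the pattern would still find 56.1.1.1 inside 256.1.1.1.

```javascript
// One octet: 250-255, 200-249, or 0-199 (the same alternation as the pattern above).
const OCTET = "(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)";

// Three "octet + dot" groups followed by a final octet. The ^ and $ anchors
// are an assumption added for whole-string validation.
const IPV4_RE = new RegExp(`^(?:${OCTET}\\.){3}${OCTET}$`);

// Hypothetical helper: true when the whole input is a dotted-quad IPv4 address.
function isIPv4(input) {
  return IPV4_RE.test(input);
}

console.log(isIPv4("192.168.0.1"));     // true
console.log(isIPv4("255.255.255.255")); // true
console.log(isIPv4("256.1.1.1"));       // false
console.log(isIPv4("999.999.999.999")); // false
console.log(isIPv4("1.2.3"));           // false
```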

3. ISO Date (YYYY-MM-DD) With Month Validation

\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

The month group (?:0[1-9]|1[0-2]) prevents month 13 or 00. The day group handles days 01–09, 10–29, and 30–31. Note that this pattern still accepts impossible dates such as February 30th or April 31st. Rejecting those requires calendar logic beyond regex, which is worth knowing so you do not waste time trying.
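
One way to cover the calendar gap is to let the regex check the shape and let Date check the day. This is only a sketch: the helper name isRealIsoDate is hypothetical, and the anchors and capturing groups are additions to the pattern above.

```javascript
// Same month and day ranges as above, anchored, with capturing groups so the
// three numbers can be handed to Date.
const ISO_DATE_RE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: true only for dates that exist on the calendar.
function isRealIsoDate(input) {
  const m = ISO_DATE_RE.exec(input);
  if (m === null) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // setUTCFullYear rolls impossible dates forward (Feb 30 lands in March),
  // so a changed month or day after the round trip means the date is invalid.
  const d = new Date(0);
  d.setUTCFullYear(year, month - 1, day);
  return d.getUTCMonth() === month - 1 && d.getUTCDate() === day;
}

console.log(isRealIsoDate("2024-02-29")); // true (leap year)
console.log(isRealIsoDate("2023-02-29")); // false
console.log(isRealIsoDate("2024-02-30")); // false
console.log(isRealIsoDate("2024-13-01")); // false (rejected by the regex)
```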

4. URL Matching — Greedy Enough to Be Useful

https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9]{2,6}(?:\/[^\s]*)?

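The s? makes https and http both valid. The optional (?:www\.)? skips a leading www. prefix; any other subdomain is absorbed by the character class that follows, which includes the dot. The {2,6} quantifier caps the final label at six characters, so longer TLDs such as .photography are cut short; raise the upper bound if you need them. The path segment (?:\/[^\s]*)? uses a negated character class ("everything that is not whitespace"), which is far more practical than trying to enumerate valid URL characters.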
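
A short sketch of using the pattern to pull URLs out of free text. The helper name extractUrls and the sample sentence are made up for illustration; the g flag is added so every occurrence is returned.

```javascript
// The URL pattern from above with the g flag, so matchAll can walk the text.
const URL_RE = /https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9]{2,6}(?:\/[^\s]*)?/g;

// Hypothetical helper: returns every URL-looking substring in the text.
function extractUrls(text) {
  return Array.from(text.matchAll(URL_RE), (m) => m[0]);
}

const sample = "Docs at https://www.example.com/guide?page=2 and http://api.example.org.";
console.log(extractUrls(sample));
// [ 'https://www.example.com/guide?page=2', 'http://api.example.org' ]
```

Because the path part accepts any non-whitespace character, punctuation that directly follows a path (a closing bracket or a full stop) ends up inside the match, so callers may want to trim it. The six-character cap on the last label also shows up here: https://example.photography/x is matched only as far as https://example.photog.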

5. US Phone Number — Multiple Formats at Once

(?:\+?1[\s\-.])?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}

Users enter phone numbers in a dozen different ways: (555) 123-4567, 555.123.4567, +1-555-123-4567. This pattern uses character classes like [\s\-.] to accept any reasonable separator and wraps optional parts in (?:...)?. The trick is to be permissive in the separator position but strict about digit counts.
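
To illustrate "permissive separators, strict digit counts", here is a small sketch that runs several formats through an anchored version of the pattern. The helper name isUsPhone is hypothetical, the anchors are an addition, and the country-code prefix is written as an ordinary optional group.

```javascript
// Anchored phone pattern: optional +1 prefix, optional parentheses around the
// area code, and an optional space, hyphen, or dot between digit groups.
const US_PHONE_RE = /^(?:\+?1[\s\-.])?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}$/;

// Hypothetical helper: true when the input has a US-style 3-3-4 digit layout.
function isUsPhone(input) {
  return US_PHONE_RE.test(input);
}

for (const candidate of ["(555) 123-4567", "555.123.4567", "+1-555-123-4567", "5551234567"]) {
  console.log(candidate, isUsPhone(candidate)); // true for all four
}
console.log(isUsPhone("555-123-456"));  // false (last group too short)
console.log(isUsPhone("555-1234-567")); // false (digit groups are 3-4-3)
```

Note that the parentheses are optional independently of each other, so an unbalanced input such as (555 123-4567 also passes; that is the price of keeping the pattern short.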

6. Hex Color Codes — Both 3 and 6 Digit Forms

#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})\b

The alternation tries the 3-digit form first. When more hex digits follow, the \b word boundary fails and the engine backtracks to the 6-digit branch. The same boundary prevents matching #ffffff1. Adding the i flag lets you drop the A-F from the character class, simplifying it to #(?:[0-9a-f]{3}|[0-9a-f]{6})\b.
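
A brief sketch of the simplified pattern scanning a CSS fragment. The helper name findHexColors and the sample declaration are illustrative only.

```javascript
// Simplified form with the i flag, plus g so that every color is returned.
const HEX_COLOR_RE = /#(?:[0-9a-f]{3}|[0-9a-f]{6})\b/gi;

// Hypothetical helper: lists the hex colors found in a string.
function findHexColors(css) {
  return css.match(HEX_COLOR_RE) ?? [];
}

console.log(findHexColors("color: #FFF; background: #1a2b3c; border: #ffffff1;"));
// [ '#FFF', '#1a2b3c' ]  (the 7-digit value is skipped by the word boundary)
```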

7. JWT Token Structure

[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+

A signed JWT in its usual compact form is three Base64url-encoded segments separated by dots. This pattern validates that structure: three non-empty segments using the Base64url alphabet (which uses - and _ instead of + and /). It will not verify the signature cryptographically, but it will catch malformed tokens before you bother decoding them. Encrypted tokens (JWE) have five segments and need a different pattern.
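
The "check the shape before decoding" idea could look roughly like this. It is a sketch that assumes a Node.js runtime (Buffer with the "base64url" encoding); the helper names looksLikeJwt and readJwtHeader are hypothetical, and nothing here verifies a signature.

```javascript
// Anchored structural check: three non-empty Base64url segments.
const JWT_SHAPE_RE = /^[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+$/;

// Hypothetical helper: shape check only, no cryptography.
function looksLikeJwt(token) {
  return JWT_SHAPE_RE.test(token);
}

// Hypothetical helper: decodes the header segment, but only when the shape
// check has passed. Returns null for anything that is not decodable JSON.
function readJwtHeader(token) {
  if (!looksLikeJwt(token)) return null;
  const headerSegment = token.split(".")[0];
  try {
    // Assumption: Node.js Buffer; browsers would need a different decoder.
    return JSON.parse(Buffer.from(headerSegment, "base64url").toString("utf8"));
  } catch {
    return null;
  }
}

// Build a throwaway token for the demo; the payload and signature are dummies.
const header = Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })).toString("base64url");
const demoToken = `${header}.eyJzdWIiOiIxIn0.c2ln`;

console.log(looksLikeJwt(demoToken));   // true
console.log(readJwtHeader(demoToken));  // { alg: 'HS256', typ: 'JWT' }
console.log(readJwtHeader("not.a-jwt")); // null (only two segments)
```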

8. Lookaheads — The Regex Superpower Most Developers Skip

Lookaheads match a position, not characters. They do not consume input. This makes them invaluable for complex validations like password rules:

^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{8,}$

Each lookahead, such as (?=.*[A-Z]), scans forward from the current position asking "does the rest of the string contain an uppercase letter somewhere?" The three lookaheads all run from the same starting position (the start anchor ^), and then .{8,} actually consumes the characters. You can stack as many lookaheads as you need without changing what gets matched.
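
A possible way to use the stacked-lookahead pattern alongside per-rule checks, so a form can say which rule failed. The RULES list and the helper name failedRules are hypothetical; the single-regex form and the rule list are meant to agree on every input.

```javascript
// The stacked-lookahead pattern from above: one pass/fail answer.
const PASSWORD_RE = /^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{8,}$/;

// The same conditions, one regex each (hypothetical structure), which makes
// it possible to report the specific rule that was not met.
const RULES = [
  { name: "uppercase letter", re: /[A-Z]/ },
  { name: "digit", re: /\d/ },
  { name: "symbol from !@#$%", re: /[!@#$%]/ },
  { name: "at least 8 characters", re: /^.{8,}$/ },
];

// Hypothetical helper: names of the rules the password does not satisfy.
function failedRules(password) {
  return RULES.filter((rule) => !rule.re.test(password)).map((rule) => rule.name);
}

console.log(PASSWORD_RE.test("Passw0rd!")); // true
console.log(failedRules("Passw0rd!"));      // []
console.log(PASSWORD_RE.test("password1!")); // false
console.log(failedRules("password1!"));      // [ 'uppercase letter' ]
```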

9. Named Capturing Groups — Readable Extraction

Instead of referencing groups by number (\1, \2), name them:

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

In JavaScript you access them via match.groups.year. This makes regex that parses structured data dramatically more maintainable, especially when groups are reordered later.
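
For instance, extraction with the named groups above might be sketched as follows. The helper name parseDate and the sample sentence are illustrative.

```javascript
const DATE_RE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Hypothetical helper: pulls the first date out of a string, or returns null.
function parseDate(text) {
  const match = DATE_RE.exec(text);
  if (match === null) return null;
  // Group names become property names on match.groups.
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseDate("Released on 2026-06-16."));
// { year: 2026, month: 6, day: 16 }

// Named groups can also be used in replacement strings via $<name>.
console.log("2026-06-16".replace(DATE_RE, "$<day>/$<month>/$<year>")); // 16/06/2026
```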

10. Non-Greedy Quantifiers Save You From Matching Too Much

Consider extracting content between HTML tags. Greedy <b>(.*)</b> on <b>bold</b> text <b>more</b> matches from the first <b> to the last </b>. Lazy <b>(.*?)</b> correctly matches each pair. The ? after a quantifier changes its behaviour from "match as much as possible" to "match as little as possible".
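
The difference is easy to see by running both forms against the sample string from the paragraph above:

```javascript
const html = "<b>bold</b> text <b>more</b>";

// Greedy: .* runs to the end of the string, then backs up to the last </b>.
const greedy = Array.from(html.matchAll(/<b>(.*)<\/b>/g), (m) => m[1]);

// Lazy: .*? stops at the first </b> it can reach.
const lazy = Array.from(html.matchAll(/<b>(.*?)<\/b>/g), (m) => m[1]);

console.log(greedy); // [ 'bold</b> text <b>more' ]
console.log(lazy);   // [ 'bold', 'more' ]
```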

11. The Slug Pattern — URL-Safe Strings

^[a-z0-9]+(?:-[a-z0-9]+)*$

This enforces the common CMS convention: lowercase letters and numbers, with hyphens as separators but never at the start or end, never doubled. The non-capturing group (?:-[a-z0-9]+)* allows zero or more hyphenated segments after the first word.
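
A sketch of pairing the validator with a generator. The slugify helper is hypothetical and deliberately simple: it assumes ASCII titles and drops every other character.

```javascript
const SLUG_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Hypothetical helper: one simple way to turn a title into a string that
// passes SLUG_RE. Assumption: ASCII input; accented letters are discarded.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // every run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");    // trim hyphens left at either end
}

const slug = slugify("  Hello, World! 12 Regex Tips ");
console.log(slug);               // hello-world-12-regex-tips
console.log(SLUG_RE.test(slug)); // true

console.log(SLUG_RE.test("my--post")); // false (doubled hyphen)
console.log(SLUG_RE.test("-my-post")); // false (leading hyphen)
```

A title with no letters or digits slugifies to the empty string, which SLUG_RE rejects, so the caller still needs a fallback for that case.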

12. Backreferences for Duplicate Detection

\b(\w+)\s+\1\b

The group (\w+) captures a word. \1 matches the exact same text again. This classic pattern finds repeated words like "the the" in text. Backreferences can also pair up HTML tags: <([a-z]+)[^>]*>.*?<\/\1> requires the closing tag name to repeat the captured opening name, though it still breaks on nested tags of the same type.
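
As an illustration, the duplicate-word pattern could drive a simple proofreading check. The helper name findRepeatedWords is hypothetical; the g and i flags are additions so that every repeat is found and "The the" counts as one.

```javascript
// Duplicate-word pattern with g (all matches) and i (ignore case).
const REPEATED_WORD_RE = /\b(\w+)\s+\1\b/gi;

// Hypothetical helper: reports each repeated word and where it starts.
function findRepeatedWords(text) {
  return Array.from(text.matchAll(REPEATED_WORD_RE), (m) => ({
    word: m[1],
    index: m.index,
  }));
}

console.log(findRepeatedWords("Paris in the the spring is is lovely"));
// [ { word: 'the', index: 9 }, { word: 'is', index: 24 } ]
```

The leading \b matters: without it, "this is" would be reported, because the "is" at the end of "this" is followed by another "is".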

Building Your Own Patterns: A Mental Model

Start from the structure, not the characters. Ask: what are the distinct sections of what I'm matching? Define each section as either a fixed literal, a character class, or a quantified group. Then add anchors (^ and $) only if you need the entire string to match rather than a substring. Use non-capturing groups (?:...) by default and only upgrade to capturing groups when you actually need to extract a value. Test with both valid and invalid inputs — a regex that accepts everything it should is only half done.
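
The structure-first approach can be sketched by naming each section and composing them. Everything specific here is an assumption made for illustration: the log-line format, the parts object, and the parseLogLine helper are invented, not taken from the builder.

```javascript
// Each section is a fixed literal, a character class, or a quantified group.
const parts = {
  date: "\\d{4}-\\d{2}-\\d{2}", // quantified character classes
  level: "(?:INFO|WARN|ERROR)", // alternation of fixed literals, non-capturing
  message: ".+",                // anything up to the end of the line
};

// Anchored because the whole line must match. Capturing groups wrap only the
// values that are actually extracted.
const LOG_LINE_RE = new RegExp(`^(${parts.date}) (${parts.level}) (${parts.message})$`);

// Hypothetical helper: returns the three fields, or null for a non-matching line.
function parseLogLine(line) {
  const m = LOG_LINE_RE.exec(line);
  return m === null ? null : { date: m[1], level: m[2], message: m[3] };
}

// Test with inputs that should match and inputs that should not.
console.log(parseLogLine("2026-06-16 ERROR disk full"));
// { date: '2026-06-16', level: 'ERROR', message: 'disk full' }
console.log(parseLogLine("2026-06-16 DEBUG verbose output")); // null
console.log(parseLogLine("ERROR disk full"));                 // null
```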

The patterns above cover the vast majority of practical validation and extraction tasks. With the builder above, you can click any building block, combine patterns, and test instantly against real input — which is the fastest way to develop an intuition that no amount of reading can fully replace.

FAQ

What is the difference between a greedy and a lazy quantifier?
A greedy quantifier like * or + tries to match as many characters as possible, then backtracks if needed. A lazy quantifier like *? or +? matches as few characters as possible and expands only if the rest of the pattern fails to match. For example, <b>(.*)<\/b> against '<b>one</b> and <b>two</b>' matches everything from the first tag to the last closing tag (greedy), while <b>(.*?)<\/b> correctly finds two separate matches (lazy).
How do lookaheads and lookbehinds work, and do they consume characters?
Lookaheads and lookbehinds are zero-width assertions: they check for a condition at the current position without advancing the match cursor or consuming any characters. A positive lookahead (?=abc) means 'the next characters must be abc, but do not include them in the match'. A negative lookahead (?!abc) means 'the next characters must NOT be abc'. Lookbehinds do the same thing but look at what came before the current position. Lookaheads are commonly stacked at the start of a pattern to enforce multiple conditions on the same string simultaneously.
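
Two small sketches of zero-width assertions in JavaScript. The sample strings are made up, and lookbehind support is assumed (it is available in current Node.js and evergreen browsers, but not in some older engines).

```javascript
// Positive lookbehind: match an amount only when a dollar sign precedes it.
// The "$" is checked but not included in the result.
const prices = "Cost: $42.50, was $60".match(/(?<=\$)\d+(?:\.\d{2})?/g);
console.log(prices); // [ '42.50', '60' ]

// Negative lookahead: words starting with "foo" that do not continue with "bar".
const words = "foobar foobaz foo".match(/\bfoo(?!bar)\w*/g);
console.log(words); // [ 'foobaz', 'foo' ]
```
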
Why does my email regex reject valid addresses or accept invalid ones?
Email validation with regex is inherently approximate. The full address grammar in RFC 5322 (together with the SMTP rules in RFC 5321) allows quoted strings, comments, IP address literals in brackets, and other forms that almost never appear in practice. Most production systems use a simple regex that accepts any reasonable-looking address (local part + @ + domain + TLD) and then verify the address by sending a confirmation email. A regex that is too strict rejects real users; one that is too loose lets garbage through. The pattern included in this builder covers nearly all real-world addresses while ignoring exotic edge cases.
What regex flags should I use and when?
The most commonly needed flags are: g (global) to find all matches instead of stopping after the first; i (case-insensitive) so you do not need to write [a-zA-Z] everywhere; m (multiline) to make ^ and $ match the start and end of each line rather than the whole string; and s (dotall) to make the dot . match newline characters as well. In JavaScript, flags are passed as the second argument to new RegExp() or as the suffix in a literal like /pattern/gi. Combining g and i is very common for search-and-replace operations.
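
The effect of each flag can be seen on one small, made-up sample string:

```javascript
const log = "Error: disk full\nerror: retrying\nok";

// i only: stops at the first match.
const firstOnly = log.match(/error/i)[0];    // "Error"

// g + i: every match, regardless of case.
const all = log.match(/error/gi);            // ["Error", "error"]

// Without m, ^ matches only at the start of the whole string.
const startOfString = log.match(/^error/gi); // ["Error"]

// With m, ^ also matches at the start of each line.
const startOfLine = log.match(/^error/gim);  // ["Error", "error"]

// Without s the dot refuses to cross the newline; with s it matches it.
const dotStopsAtNewline = /full.error/i.test(log); // false
const dotAll = /full.error/is.test(log);           // true

// Constructor form: flags are the second argument.
const viaConstructor = new RegExp("error", "gi");

console.log(firstOnly, all, startOfString, startOfLine, dotStopsAtNewline, dotAll);
console.log(viaConstructor.flags); // "gi"
```
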
How do named capturing groups work and how do I access them?
Named capturing groups use the syntax (?<name>...) and let you reference a captured value by a readable label instead of a numeric index. In JavaScript, after calling regexp.exec() or string.match() without the g flag, the captured values are available on the .groups property of the result object, for example result.groups.year for a group named year. With the g flag, match() returns only the matched strings, so use string.matchAll() when you need the groups for every match. Named groups make complex patterns far easier to maintain because renaming or reordering groups does not break references elsewhere in your code.
Can regex validate nested structures like HTML or JSON?
Not reliably. Regular expressions describe regular languages, which by definition cannot handle arbitrarily nested or recursive structures. HTML and JSON both have nesting that can go to any depth, which requires a pushdown automaton (i.e., a proper parser) to handle correctly. Regex can extract simple, non-nested fragments from HTML — like the content of a single tag — but it will fail on nested tags of the same type. For serious HTML or JSON processing, use a dedicated parser such as the browser's DOMParser for HTML or JSON.parse() for JSON. Use regex only for pre-validation or extracting known-flat patterns.
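
A quick sketch of both halves of that advice: a lazy regex losing track of nested tags, and JSON.parse used as the validator for JSON. The helper name isValidJson and the sample inputs are illustrative.

```javascript
// A regex pairs the first <div> with the first </div>, ignoring nesting.
const nestedHtml = "<div>outer <div>inner</div> tail</div>";
const captured = nestedHtml.match(/<div>(.*?)<\/div>/)[1];
console.log(captured); // "outer <div>inner"  (not the real content of the outer div)

// Hypothetical helper: let the real parser decide whether the text is JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"a": {"b": [1, 2, {"c": null}]}}')); // true
console.log(isValidJson('{"a": {"b": [1, 2}'));                // false
```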