Regular Expression Helper

Enter a regex pattern and an optional test string to see the matches and a token-by-token explanation of what the pattern does.


Frequently Asked Questions

What is a lookahead in regex?

A positive lookahead (?=...) asserts that the pattern inside matches immediately after the current position, without consuming any characters. For example, \w+(?=@) matches the username part of an email address without including the @ sign in the match.
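The zero-width behavior is easy to see in Python's re module. This is a minimal sketch, and the sample address is made up:

```python
import re

text = "Contact alice@example.com for access."

# \w+ consumes the word; (?=@) only checks that "@" comes next and does not consume it.
usernames = re.findall(r"\w+(?=@)", text)
print(usernames)  # ['alice']

# Because the lookahead is zero-width, the match ends just before the "@".
m = re.search(r"\w+(?=@)", text)
print(text[m.end()])  # @
```

Note that \w does not match a dot, so for an address like john.doe@example.com this pattern captures only "doe".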

What is the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, ?, {n,}) match as much as possible. Their lazy (non-greedy) variants (*?, +?, ??, {n,}?) match as little as possible. On the string <b>text</b>, the greedy pattern <.+> matches the whole string, while the lazy pattern <.+?> matches just <b>.

How do I match a literal dot in regex?

A bare . in regex matches any character except a newline. To match a literal period, escape it with a backslash (\.). So to match a domain name such as example.com, use example\.com.
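A short Python sketch shows the difference between an unescaped and an escaped dot. It also uses re.escape, which backslash-escapes regex metacharacters in a literal string; the output shown assumes Python 3.7 or later, where only special characters are escaped:

```python
import re

loose = re.compile(r"example.com")    # "." matches any character except newline
strict = re.compile(r"example\.com")  # "\." matches only a literal period

print(bool(loose.fullmatch("exampleXcom")))   # True: the dot matched "X"
print(bool(strict.fullmatch("exampleXcom")))  # False
print(bool(strict.fullmatch("example.com")))  # True

# re.escape builds a pattern that matches the given text literally.
print(re.escape("example.com"))  # example\.com
```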

Regular Expressions: A Practical Guide for Developers

Regular expressions (often called regex or regexp) are sequences of characters that define a search pattern for matching, locating, and manipulating text. They are among the most powerful and widely available tools in a programmer's toolkit: nearly every mainstream programming language supports them natively or through its standard library, and command-line tools like grep, sed, and awk are built around them. Despite their cryptic appearance, regular expressions follow a consistent logic, and mastering them turns tedious string-manipulation tasks into concise, expressive one-liners.

The Building Blocks of Regex

Regular expression syntax is built from a small set of core constructs. Literal characters match themselves — the pattern hello matches the string "hello" anywhere it appears. The dot (.) is a wildcard that matches any single character except a newline. Character classes enclosed in square brackets ([aeiou]) match any one character from the set. Negated character classes ([^aeiou]) match any character not in the set. Shorthand character classes cover common sets: \d matches any digit, \w matches any word character (letter, digit, or underscore), \s matches any whitespace character, and their uppercase equivalents (\D, \W, \S) match the inverse.
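The character classes above can be tried with Python's re.findall, which returns every non-overlapping match. The sample strings are arbitrary. One caveat: in Python 3, \w and \d in str patterns also match non-ASCII letters and digits unless the re.ASCII flag is used.

```python
import re

sample = "Room 42, floor 7_b"

print(re.findall(r"[aeiou]", "regex"))   # ['e', 'e']
print(re.findall(r"[^aeiou]", "regex"))  # ['r', 'g', 'x']
print(re.findall(r"\d", sample))         # ['4', '2', '7']
print(re.findall(r"\w+", sample))        # ['Room', '42', 'floor', '7_b']
print(re.findall(r"\S+", sample))        # ['Room', '42,', 'floor', '7_b']
print(re.findall(r"r.o", "rxo r\no"))    # ['rxo']  (the dot does not match "\n")
```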

Quantifiers control how many times a pattern element repeats. The asterisk (*) means zero or more times, the plus (+) means one or more times, the question mark (?) means zero or one time (making the element optional), and curly braces specify exact counts or ranges: {3} means exactly 3 times, {2,5} means 2 to 5 times, and {3,} means 3 or more times. Anchors constrain where in the string a match can occur without consuming any characters: ^ matches the start of the string and $ matches the end (in multiline mode they also match at the start and end of each line), \b matches a word boundary, and \B matches any position that is not a word boundary.
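Here is a minimal Python sketch of the quantifiers and anchors just described. re.MULTILINE is Python's name for multiline mode, and the test strings are made up:

```python
import re

# {3} means exactly three digits; fullmatch requires the whole string to match.
print(bool(re.fullmatch(r"\d{3}", "123")))   # True
print(bool(re.fullmatch(r"\d{3}", "1234")))  # False

# {2,5} is greedy: a run of seven x's yields one 5-long match, then a 2-long one.
print(re.findall(r"x{2,5}", "x xx xxxxxxx"))  # ['xx', 'xxxxx', 'xx']

# \b on both sides matches "cat" only as a whole word.
print(re.findall(r"\bcat\b", "cat concat cats cat."))  # ['cat', 'cat']

# ^ anchors to the start of the string, or of every line with re.MULTILINE.
lines = "alpha\nbeta"
print(re.findall(r"^\w+", lines))                # ['alpha']
print(re.findall(r"^\w+", lines, re.MULTILINE))  # ['alpha', 'beta']
```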

Groups, Alternation, and Backreferences

Parentheses create capturing groups that serve two purposes. First, they group pattern elements so a quantifier applies to the whole group: in colou?r the ? applies only to the u (so it matches both "color" and "colour"), whereas (ha)+ repeats the entire "ha" and matches "hahaha". Second, they capture the matched text for use in replacements or later in the same pattern. Backreferences let a pattern refer to a previously captured group: (\w+)\s+\1 matches a word that appears twice in a row separated by whitespace, because \1 refers to whatever the first group captured. Adding word boundaries, as in \b(\w+)\s+\1\b, keeps it from matching text such as "the the" inside "the theory".
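The following Python sketch contrasts quantifying a single character with quantifying a group, and then uses a backreference to collapse doubled words. The sentence being cleaned is an invented example:

```python
import re

# Without a group, "?" applies only to the preceding "u".
print(bool(re.fullmatch(r"colou?r", "color")), bool(re.fullmatch(r"colou?r", "colour")))  # True True

# With a group, "+" repeats the whole "ha"; without one, only the "a" repeats.
print(re.fullmatch(r"(ha)+", "hahaha").group())  # hahaha
print(re.fullmatch(r"ha+", "hahaha"))            # None
print(re.fullmatch(r"ha+", "haaa").group())      # haaa

# \1 must repeat exactly what group 1 captured; \b stops partial-word matches.
doubled = re.compile(r"\b(\w+)\s+\1\b")
print(doubled.sub(r"\1", "this is is a test test"))  # this is a test
print(doubled.search("the theory"))                  # None
```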

Non-capturing groups (?:...) provide grouping without capturing, which is useful when you need a group for a quantifier but don't need to reference the match. Named capturing groups, written (?P<name>...) in Python or (?<name>...) in JavaScript and .NET, assign names to captured groups, making both the pattern and any replacement strings that refer to it easier to read. The alternation operator (|) matches either the pattern to its left or the pattern to its right: cat|dog matches either "cat" or "dog". Alternation has the lowest precedence of any regex operator, so gra|ey means "gra" or "ey", not "gray" or "grey". To limit alternation to part of a pattern, group it: gr(a|e)y matches both "gray" and "grey".
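A Python sketch of these three features follows. Python's re.findall returns the captured group instead of the whole match when a pattern contains exactly one group, which is a practical reason to reach for (?:...). The date format is an illustrative assumption.

```python
import re

# One capturing group: findall returns the group's last repetition, not the match.
print(re.findall(r"(ab)+", "abab x ab"))    # ['ab', 'ab']
print(re.findall(r"(?:ab)+", "abab x ab"))  # ['abab', 'ab']

# Named groups use Python's (?P<name>...) syntax and \g<name> in replacements.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.match("2024-05-17")
print(m.group("year"), m["month"])  # 2024 05
print(date.sub(r"\g<day>/\g<month>/\g<year>", "Due 2024-05-17"))  # Due 17/05/2024

# Alternation has the lowest precedence; a group limits its scope.
print(re.findall(r"gra|ey", "gray grey"))      # ['gra', 'ey']
print(re.findall(r"gr(?:a|e)y", "gray grey"))  # ['gray', 'grey']
```

The last line uses a non-capturing group so that findall returns whole matches; with gr(a|e)y it would return just 'a' and 'e'.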

Practical Regex Patterns for Common Tasks

Email validation with regex is notoriously tricky, because the full RFC 5322 email specification is extremely complex, but a practical approximation for most use cases is [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}. URL matching is similarly complex, but (?:https?:\/\/)?[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]* covers the most common cases. Phone number patterns vary by country, but a flexible US phone pattern is \(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4}).
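The email and phone patterns above can be tried in Python as sketched below. The helper name looks_like_email is hypothetical. Using fullmatch is a design choice for validation: search would also accept an address buried inside other text. Like the pattern itself, this is an approximation. It rejects some valid addresses, such as user@localhost.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
PHONE = re.compile(r"\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})")

def looks_like_email(s: str) -> bool:
    # Hypothetical helper: fullmatch anchors both ends of the string.
    return EMAIL.fullmatch(s) is not None

print(looks_like_email("user.name+tag@example.co.uk"))  # True
print(looks_like_email("a@b.c"))                        # False: TLD needs 2+ letters

# The three capturing groups pull out area code, exchange, and line number.
print(PHONE.fullmatch("(555) 123-4567").groups())  # ('555', '123', '4567')
print(PHONE.fullmatch("555.123.4567").groups())    # ('555', '123', '4567')
```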

For developers, the most frequently useful regex patterns are often simpler: finding calls to a function in source code (functionName\([^)]*\), which stops at the first closing parenthesis and so truncates nested calls), validating that a string is a hex color (#[0-9A-Fa-f]{3}(?:[0-9A-Fa-f]{3})?, anchored with ^ and $ or run as a full-string match so that a value like #12345 is rejected), extracting numbers from a string (\d+(?:\.\d+)?), stripping simple HTML tags (<[^>]+>), or matching a specific log format. Building a personal library of tested regex patterns for your domain saves you from reinventing them from scratch.
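The following Python sketch runs several of these everyday patterns on made-up inputs. It shows why the hex-color check needs a full-string match, and it shows the nested-call limitation of the function-call pattern. The function name print stands in for functionName.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{3}(?:[0-9A-Fa-f]{3})?")

# search finds "#123" inside "#12345"; fullmatch rejects it, as validation requires.
print(HEX_COLOR.search("#12345").group())    # #123
print(HEX_COLOR.fullmatch("#12345"))         # None
print(bool(HEX_COLOR.fullmatch("#1a2B3c")))  # True

print(re.findall(r"\d+(?:\.\d+)?", "3 items at 4.50 each"))  # ['3', '4.50']
print(re.sub(r"<[^>]+>", "", "<p>Hello <b>world</b></p>"))  # Hello world

# [^)]* stops at the first ")", so the nested call is cut short.
print(re.findall(r"print\([^)]*\)", 'print("a"); print(len(y))'))  # ['print("a")', 'print(len(y)']
```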

Greedy vs. Lazy Matching

By default, regex quantifiers are greedy — they match as much as possible while still allowing the overall pattern to match. The pattern <.+> applied to the string <b>hello</b> will match the entire string from the first < to the last >, not just the first tag. To get lazy (non-greedy) matching that takes as few characters as possible, append a ? to the quantifier: <.+?> matches each tag individually. Understanding the greedy/lazy distinction is essential for working with HTML, XML, JSON, and any structured text where delimiters appear multiple times.
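A three-line Python comparison makes the difference concrete. The third pattern is the negated-class tag pattern from the list of common patterns. It cannot run past a > at all, so it needs no lazy quantifier.

```python
import re

html = "<b>hello</b>"

print(re.findall(r"<.+>", html))     # ['<b>hello</b>']  greedy: first "<" to last ">"
print(re.findall(r"<.+?>", html))    # ['<b>', '</b>']   lazy: shortest match each time
print(re.findall(r"<[^>]+>", html))  # ['<b>', '</b>']   negated class: stops at ">"
```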

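Catastrophic backtracking is a performance trap in which a poorly constructed pattern forces a backtracking regex engine to try exponentially many combinations before concluding that no match exists. The classic example is a nested quantifier such as (a+)+$ applied to a long run of a's followed by a character that breaks the match (for example, "aaaaaaaaaaaaaaaaaaaaaaaa!"). The run of a's can be divided between the inner and outer + in exponentially many ways, and the engine tries every one. In applications that let users supply regex patterns (such as search features), or that run a vulnerable pattern against attacker-controlled input, this can be exploited for ReDoS (Regular Expression Denial of Service) attacks. Several practices help prevent these issues: testing patterns against adversarial inputs, avoiding nested quantifiers over the same characters, and using possessive quantifiers or atomic groups where the engine supports them.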
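The following Python sketch times the nested pattern against an equivalent flat one on short "almost matching" inputs. Input sizes are kept small on purpose, because each extra a roughly doubles the work for the nested pattern, and the printed timings will vary by machine. The rewrite a+$ is one safe alternative. Python 3.11 and later also support atomic groups and possessive quantifiers in re, but this sketch does not rely on them.

```python
import re
import time

# Nested quantifier: a run of a's can be split between the inner and outer "+"
# in exponentially many ways, and the engine tries them all before failing.
VULNERABLE = re.compile(r"(a+)+$")
# Matches the same strings, with only one way to match each run of a's.
SAFE = re.compile(r"a+$")

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for pattern.match(text)."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

for n in (10, 14, 18):
    attack = "a" * n + "!"  # almost matches, but the trailing "!" forces failure
    _, slow = time_match(VULNERABLE, attack)
    _, fast = time_match(SAFE, attack)
    print(f"n={n:2d}  nested: {slow:.5f}s  flat: {fast:.5f}s")
```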

Regex Across Programming Languages

While regex syntax is largely consistent across languages, there are important dialects and differences to be aware of. JavaScript regex is built into the language syntax (/pattern/flags). It has always supported lookaheads, it added lookbehinds and named groups in ES2018, and it still lacks features found in some other engines, such as possessive quantifiers and atomic groups. Python's re module uses a Perl-style syntax similar to PCRE (Perl-Compatible Regular Expressions), although it is a separate engine, and offers named groups and a verbose mode (re.VERBOSE) for commenting complex patterns. Grep on the command line uses POSIX basic regex by default and POSIX extended regex with -E, and GNU grep adds Perl-compatible regex support via -P. Regex support in SQL databases varies widely. PostgreSQL provides POSIX-style regular expressions through operators such as ~ and functions such as regexp_replace. MySQL 8.0 implements REGEXP and REGEXP_LIKE using the ICU library. SQLite recognizes the REGEXP operator but provides no implementation unless the application or a loaded extension supplies one. Always check the regex documentation for the specific language or tool you are using, as assuming cross-language portability can lead to subtle bugs.
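Python's verbose mode is worth seeing once, because it makes long patterns self-documenting. The log format below ("YYYY-MM-DD HH:MM:SS LEVEL message") is a hypothetical example, not a standard. In re.VERBOSE mode, whitespace and # comments outside character classes are ignored.

```python
import re

# Hypothetical log format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_LINE = re.compile(
    r"""
    ^(?P<date>\d{4}-\d{2}-\d{2})          # date
    \s+
    (?P<time>\d{2}:\d{2}:\d{2})           # time
    \s+
    (?P<level>DEBUG|INFO|WARNING|ERROR)   # severity
    \s+
    (?P<message>.*)$                      # rest of the line
    """,
    re.VERBOSE,
)

m = LOG_LINE.match("2024-05-17 12:34:56 ERROR Disk full")
print(m.group("level"), "|", m.group("message"))  # ERROR | Disk full
```

The (?P<name>...) group syntax here is Python-specific. JavaScript spells it (?<name>...) and has no verbose flag, which is one concrete example of the portability issues described above.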