Regular Expression Helper
Enter a regex pattern and optional test string to see matches and a token-by-token explanation of what your regex does.
Frequently Asked Questions
What is a lookahead in regex?
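A positive lookahead (?=...) asserts that the pattern inside must follow the current position, without consuming characters. For example, \w+(?=@) matches the run of word characters immediately before the @ in an email address, without including the @ sign in the match. Because \w does not match dots, a username like john.doe needs a broader class such as [\w.]+ before the lookahead.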
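A minimal Python sketch of the lookahead, using the standard re module. The sample addresses are made up for illustration.

```python
import re

# Hypothetical sample addresses, used only for illustration.
simple = "alice@example.com"
dotted = "john.doe@example.com"

# \w+ consumes word characters; (?=@) only checks that "@" comes next
# and leaves it out of the match.
simple_user = re.search(r"\w+(?=@)", simple).group(0)

# \w does not match ".", so only the part after the dot is found here.
dotted_partial = re.search(r"\w+(?=@)", dotted).group(0)

# A broader character class picks up the whole local part.
dotted_full = re.search(r"[\w.]+(?=@)", dotted).group(0)

print(simple_user)     # alice
print(dotted_partial)  # doe
print(dotted_full)     # john.doe
```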
What is the difference between greedy and lazy quantifiers?
Greedy quantifiers (*, +, ?, {n,}, {n,m}) match as much as possible. Lazy (non-greedy) variants (*?, +?, ??, {n,}?, {n,m}?) match as little as possible. On <b>text</b>, the greedy pattern <.+> matches the whole string, while the lazy pattern <.+?> matches just <b>.
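The difference is easy to check in Python. This sketch runs both forms against the same tag example:

```python
import re

html = "<b>text</b>"

# Greedy: .+ grabs as much as it can, then backs off to the last ">".
greedy = re.search(r"<.+>", html).group(0)

# Lazy: .+? grows one character at a time and stops at the first ">".
lazy = re.search(r"<.+?>", html).group(0)

# With the lazy form, findall returns each tag separately.
tags = re.findall(r"<.+?>", html)

print(greedy)  # <b>text</b>
print(lazy)    # <b>
print(tags)    # ['<b>', '</b>']
```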
How do I match a literal dot in regex?
A bare . in regex matches any character except a newline. To match a literal period, escape it with a backslash (\.), so to match a domain like example.com, use example\.com. Inside a character class the dot is already literal: [.] also matches only a period.
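A short Python sketch of why the escape matters. The string "exampleXcom" is a made-up input that an unescaped dot wrongly accepts.

```python
import re

unescaped = re.compile(r"example.com")   # "." matches any character
escaped = re.compile(r"example\.com")    # "\." matches only a period

# The unescaped pattern accepts a string with no period at all.
loose_hit = unescaped.search("exampleXcom") is not None
strict_hit = escaped.search("exampleXcom") is not None
real_hit = escaped.search("visit example.com today") is not None

# re.escape builds the escaped form from a literal string.
auto_escaped = re.escape("example.com")

print(loose_hit, strict_hit, real_hit)  # True False True
print(auto_escaped)                     # example\.com
```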
Regular Expressions: A Practical Guide for Developers
Regular expressions — often called regex or regexp — are sequences of characters that define a search pattern for matching, locating, and manipulating text. They are among the most powerful and universally available tools in a programmer's toolkit, supported natively in nearly every programming language and built into command-line tools like grep, sed, and awk. Despite their cryptic appearance at first glance, regular expressions follow a consistent logic, and mastering them transforms what would be tedious string manipulation tasks into concise, expressive one-liners.
The Building Blocks of Regex
Regular expression syntax is built from a small set of core constructs. Literal characters match themselves — the pattern hello matches the string "hello" anywhere it appears. The dot (.) is a wildcard that matches any single character except a newline. Character classes enclosed in square brackets ([aeiou]) match any one character from the set. Negated character classes ([^aeiou]) match any character not in the set. Shorthand character classes cover common sets: \d matches any digit, \w matches any word character (letter, digit, or underscore), \s matches any whitespace character, and their uppercase equivalents (\D, \W, \S) match the inverse.
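As a rough illustration of these constructs, the Python sketch below applies each one to a small made-up string with re.findall:

```python
import re

# Character class and its negation.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Shorthand classes: digits, word characters, non-whitespace runs.
digits = re.findall(r"\d", "Order 66")
words = re.findall(r"\w+", "snake_case 42!")
chunks = re.findall(r"\S+", "a  b\tc")

# The dot matches any single character except a newline.
dot_hits = re.findall(r"h.t", "hat hot h\nt")

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(digits)      # ['6', '6']
print(words)       # ['snake_case', '42']
print(chunks)      # ['a', 'b', 'c']
print(dot_hits)    # ['hat', 'hot']
```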
Quantifiers control how many times a pattern element repeats. The asterisk (*) means zero or more times, the plus (+) means one or more times, the question mark (?) means zero or one time (making the element optional), and curly braces specify exact ranges: {3} means exactly 3, {2,5} means 2 to 5 times, {3,} means 3 or more times. Anchors constrain where in the string a match can occur: ^ matches the start of the string (or line in multiline mode), $ matches the end, \b matches a word boundary, and \B matches a non-word boundary.
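The quantifiers and anchors described above can be sketched in Python as follows. The test strings are illustrative only.

```python
import re

# {3} means exactly three digits; fullmatch requires the whole string to match.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
too_many = re.fullmatch(r"\d{3}", "1234") is not None

# {2,5} accepts between two and five repetitions.
in_range = re.fullmatch(r"\d{2,5}", "12345") is not None
too_few = re.fullmatch(r"\d{2,5}", "1") is not None

# ? makes the preceding element optional.
spellings = re.findall(r"colou?r", "color colour")

# \b restricts matches to whole words, so "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# ^ matches only at the start of the string unless MULTILINE is set.
text = "one two\nthree four"
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.MULTILINE)

print(spellings)    # ['color', 'colour']
print(whole_words)  # ['cat', 'cat']
print(first_only)   # ['one']
print(each_line)    # ['one', 'three']
```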
Groups, Alternation, and Backreferences
Parentheses create capturing groups that serve two purposes. First, they group pattern elements so that a quantifier applies to the whole group: ha+ matches "haaa" (only the a repeats), but (ha)+ matches "hahaha". Second, they capture the matched text for use in replacements or later in the same match. Backreferences let a pattern refer to previously captured groups within the same pattern: (\w+)\s+\1 matches a word that appears twice in a row separated by whitespace, because \1 refers to whatever the first group captured.
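A small Python sketch of both roles of parentheses. The sample sentence is made up, and the cleanup with re.sub is one possible use of the backreference, not the only one.

```python
import re

# Grouping: the quantifier applies to the whole group "ha".
grouped = re.fullmatch(r"(ha)+", "hahaha") is not None
# Without the group, + repeats only the "a".
ungrouped = re.fullmatch(r"ha+", "hahaha") is not None

# Backreference: \1 must repeat exactly what group 1 captured.
sentence = "it was the the best"
doubled = re.search(r"(\w+)\s+\1", sentence)
repeated_word = doubled.group(1)

# In a replacement string, \1 inserts the captured text once.
cleaned = re.sub(r"(\w+)\s+\1", r"\1", sentence)

print(grouped, ungrouped)  # True False
print(repeated_word)       # the
print(cleaned)             # it was the best
```

Note that without word boundaries this pattern can also match inside words (for example the "is is" in "this is"); wrapping it as \b(\w+)\s+\1\b is a common way to avoid that.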
Non-capturing groups (?:...) provide grouping without capturing, which is useful when you need grouping for quantifiers or alternation but don't need to reference the match. Named capturing groups, written (?P<name>...) in Python or (?<name>...) in JavaScript and .NET, assign names to captured groups, making patterns and replacement strings more readable. The alternation operator (|) matches either the pattern to its left or the pattern to its right: cat|dog matches either "cat" or "dog". Alternation has the lowest precedence of any regex operator, so gra|ey means "gra" or "ey"; to limit the alternatives to part of a pattern, wrap them in a group: gr(a|e)y matches both "gray" and "grey".
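The sketch below shows named groups, a non-capturing group, and alternation precedence in Python. The date format and group names (year, month, day) are assumptions chosen for the example.

```python
import re

# Named groups (Python syntax): each part of the date gets a label.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.search("released 2024-03-15")
year = m.group("year")

# \g<name> refers to a named group in the replacement string.
reordered = date.sub(r"\g<day>/\g<month>/\g<year>", "2024-03-15")

# Non-capturing group: findall returns whole matches.
whole = re.findall(r"gr(?:a|e)y", "gray grey griy")
# Capturing group: findall returns only the captured part.
captured = re.findall(r"gr(a|e)y", "gray grey")

# Without a group, | splits the entire pattern into "gra" or "ey".
ungrouped_gray = re.fullmatch(r"gra|ey", "gray") is not None
ungrouped_ey = re.fullmatch(r"gra|ey", "ey") is not None

print(year)       # 2024
print(reordered)  # 15/03/2024
print(whole)      # ['gray', 'grey']
print(captured)   # ['a', 'e']
print(ungrouped_gray, ungrouped_ey)  # False True
```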
Practical Regex Patterns for Common Tasks
Email validation with regex is notoriously tricky — the full RFC 5322 email specification is extremely complex — but a practical approximation for most use cases is [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}. URL matching is similarly complex but (?:https?:\/\/)?[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!