Java Regex Syntax Reference | W3Docs Learn Java

A regular expression is a small pattern language for describing text. Java implements it in the java.util.regex package, and the syntax is the Perl-style dialect shared by most modern languages — with one Java-specific wrinkle: every backslash in a pattern must be doubled in a Java string literal, because the compiler eats one before the regex engine ever sees it. This chapter is a reference to that syntax: the building blocks you combine to match, search, and validate text.

If you are new to regular expressions in Java, start with the Java Regex Introduction, then come back here when you need the full syntax cheat sheet.

Literals and metacharacters

Most characters in a pattern match themselves: cat matches the three letters c, a, t. The power comes from metacharacters, characters with special meaning that you combine into rules. The fourteen characters the engine can treat specially are:

. ^ $ * + ? ( ) [ ] { } | \

To match one of these literally, escape it with a backslash. Remember the double-backslash rule for Java source: a regex \. is written "\\." in code.

Pattern.matches("a.c", "abc");   // true  — '.' matches any char
Pattern.matches("a.c", "a.c");   // true  — '.' also matches a literal dot
Pattern.matches("a\\.c", "abc"); // false — '\.' matches ONLY a literal dot
Pattern.matches("a\\.c", "a.c"); // true
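
As a minimal sketch of the escaping rule above, the following compares an unescaped +, a hand-escaped \+, and Pattern.quote, a java.util.regex method not otherwise covered in this chapter that escapes a whole string at once. The class name EscapeDemo and the helper matchesLiterally are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {
  // Hypothetical helper: true if the whole input equals the literal text.
  // Pattern.quote wraps the literal so its metacharacters lose their meaning.
  static boolean matchesLiterally(String literal, String input) {
    return Pattern.matches(Pattern.quote(literal), input);
  }

  public static void main(String[] args) {
    // The Java literal "\\." is two characters long: a backslash and a dot.
    System.out.println("\\.".length());                   // 2
    // Unescaped, 1+1 means "one or more 1, then another 1".
    System.out.println(Pattern.matches("1+1", "1+1"));    // false
    System.out.println(Pattern.matches("1+1", "111"));    // true
    // Escaped by hand or via Pattern.quote, only the literal text matches.
    System.out.println(Pattern.matches("1\\+1", "1+1"));  // true
    System.out.println(matchesLiterally("1+1", "1+1"));   // true
    System.out.println(matchesLiterally("1+1", "111"));   // false
  }
}
```

Hand-escaping is usually enough for a fixed pattern; quoting tends to be more useful when the literal text comes from user input.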

Character classes

A character class in square brackets matches any one character from a set. Ranges use a hyphen, and a leading ^ negates the set.

"[aeiou]"     // any one lowercase vowel
"[a-z]"       // any one lowercase letter
"[A-Za-z0-9]" // any letter or digit
"[^0-9]"      // any character that is NOT a digit

Java also offers predefined classes as shorthand. These are the ones you reach for constantly:

Shorthand	Equivalent	Matches
`.`	—	Any character except a line terminator
`\d`	`[0-9]`	A digit
`\D`	`[^0-9]`	A non-digit
`\w`	`[a-zA-Z0-9_]`	A word character
`\W`	`[^a-zA-Z0-9_]`	A non-word character
`\s`	`[ \t\n\x0B\f\r]`	A whitespace character
`\S`	`[^\s]`	A non-whitespace character

The uppercase form is always the negation of the lowercase form. See Java Regex Character Classes for the full set, including POSIX and Unicode classes.
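
The "Equivalent" column is meant literally: by default the shorthands cover ASCII only. The sketch below checks this for \w and then repeats the test with Pattern.UNICODE_CHARACTER_CLASS, a standard compile flag that this chapter does not otherwise discuss. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class ShorthandDemo {
  // Default behavior: \w is exactly [a-zA-Z0-9_].
  static boolean asciiWord(String s) {
    return Pattern.compile("\\w").matcher(s).matches();
  }

  // With UNICODE_CHARACTER_CLASS, \w also accepts non-ASCII letters.
  static boolean unicodeWord(String s) {
    return Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
  }

  public static void main(String[] args) {
    String eAcute = "\u00e9"; // the letter e with an acute accent
    System.out.println(asciiWord("x"));         // true
    System.out.println(asciiWord("_"));         // true
    System.out.println(asciiWord("-"));         // false
    System.out.println(asciiWord(eAcute));      // false: outside [a-zA-Z0-9_]
    System.out.println(unicodeWord(eAcute));    // true
  }
}
```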

Quantifiers: greedy, reluctant, possessive

A quantifier says how many times the preceding element may repeat. By default quantifiers are greedy — they grab as much as possible, then back off if the rest of the pattern needs it. Add ? to make a quantifier reluctant (match as little as possible), or + to make it possessive (grab and never give back).

Quantifier	Meaning
`*`	Zero or more
`+`	One or more
`?`	Zero or one (optional)
`{n}`	Exactly `n`
`{n,}`	At least `n`
`{n,m}`	Between `n` and `m`, inclusive

"\\d{3}"     // exactly three digits
"\\d{2,4}"   // two to four digits
"a+"         // one or more 'a'
"colou?r"    // matches "color" and "colour"
"<.+>"       // greedy:    on "<a><b>" matches the whole "<a><b>"
"<.+?>"      // reluctant: on "<a><b>" matches just "<a>"

For a deeper look at greedy, reluctant, and possessive behavior, read Java Regex Quantifiers.
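
The snippets above show greedy and reluctant matching but not the possessive form. As a rough sketch, the three modes can be compared on one input; the class name QuantifierModes, the helper first, and the sample string are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModes {
  // Hypothetical helper: the first match of regex in input, or null if none.
  static String first(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String input = "xfooxxfoo";
    // Greedy: .* takes everything, then backs off until "foo" fits at the end.
    System.out.println(first(".*foo", input));   // xfooxxfoo
    // Reluctant: .*? takes as little as possible before "foo".
    System.out.println(first(".*?foo", input));  // xfoo
    // Possessive: .*+ takes everything and never gives any of it back,
    // so nothing is left for "foo" and the match fails.
    System.out.println(first(".*+foo", input));  // null
  }
}
```

The possessive result looks like a bug at first glance, but it is the intended trade: no backtracking means a failing match is rejected quickly.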

Anchors, boundaries, and alternation

Anchors match a position, not a character. ^ is the start of input (or of each line, in multiline mode), $ is the end, and \b is a word boundary: the zero-width spot where a word character (\w) meets a non-word character (\W) or the edge of the input. Alternation with | matches either side.

"^Hello"      // "Hello" only at the start
"\\.txt$"     // ".txt" only at the end
"\\bcat\\b"   // "cat" as a whole word, not inside "category"
"cat|dog"     // "cat" or "dog"
"^(cat|dog)$" // the whole string is exactly "cat" or "dog"

Note that | has very low precedence: ^cat|dog$ means (^cat)|(dog$), not ^(cat|dog)$. Wrap alternatives in a group when you want anchors to apply to both.
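
The precedence rule is easiest to see with find(), which does not anchor anything on its own. The following is a small sketch; the class name AnchorDemo, the helper found, and the test strings are illustrative choices, not part of any API.

```java
import java.util.regex.Pattern;

class AnchorDemo {
  // Hypothetical helper: does regex match anywhere in input?
  // find() adds no implicit anchors, so the pattern's own anchors decide.
  static boolean found(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(found("^cat|dog$", "catnap"));     // true:  ^cat alone is enough
    System.out.println(found("^cat|dog$", "hotdog"));     // true:  dog$ alone is enough
    System.out.println(found("^(cat|dog)$", "catnap"));   // false: both anchors apply
    System.out.println(found("^(cat|dog)$", "dog"));      // true
    System.out.println(found("\\bcat\\b", "a cat sat"));  // true:  whole word
    System.out.println(found("\\bcat\\b", "category"));   // false: no boundary after "cat"
  }
}
```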

Groups, backreferences, and inline flags

Parentheses create a capturing group — the engine remembers what each group matched, numbered left to right starting at 1. (?:...) is a non-capturing group when you only need to apply a quantifier. A backreference \1 matches the same text the first group captured. Inline flags like (?i) change matching behavior without a separate Pattern.compile flag.

"(\\d{4})-(\\d{2})"   // group 1 = year, group 2 = month
"(?:ab)+"             // repeats "ab" without capturing it
"(\\w+) \\1"          // a word followed by itself ("the the")
"(?i)java"            // case-insensitive: matches "Java", "JAVA"
"(?m)^line"           // multiline: ^ matches at each line start
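
A pattern with groups is only useful once the captured text is read back. The sketch below shows one way to do that with Matcher.group, by number and by name, using the (?<name>...) named-group syntax. The class GroupDemo, its helper methods, and the sample sentences are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
  // Hypothetical helper: reads numbered groups 1 and 2 from the first match.
  static String yearMonth(String text) {
    Matcher m = Pattern.compile("(\\d{4})-(\\d{2})").matcher(text);
    return m.find() ? m.group(1) + "/" + m.group(2) : null;
  }

  // Same pattern with named groups, read back by name instead of number.
  static String yearByName(String text) {
    Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(text);
    return m.find() ? m.group("year") : null;
  }

  // The backreference \1 must repeat the exact text that group 1 captured.
  static String firstDoubledWord(String text) {
    Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(text);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    System.out.println(yearMonth("Released 1995-08, patched 1996-01")); // 1995/08
    System.out.println(yearByName("Released 1995-08"));                 // 1995
    System.out.println(firstDoubledWord("it was the the best"));        // the
    System.out.println(firstDoubledWord("no repeats here"));            // null
  }
}
```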

Capturing groups have their own chapter — Java Regex Groups covers named groups and how to read captured text out of a Matcher. Inline flags such as (?i) and (?m) are the in-pattern equivalents of the Pattern.compile flags described in Java Regex Flags.
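
To check that an inline flag and its Pattern.compile counterpart behave the same, a sketch along these lines can help. It uses (?m) and Pattern.MULTILINE; the class name FlagDemo, its helpers, and the three-line sample text are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
  // Hypothetical helper: counts matches of an already compiled pattern.
  static int count(Pattern p, String input) {
    Matcher m = p.matcher(input);
    int n = 0;
    while (m.find()) n++;
    return n;
  }

  static int noFlag(String text) {
    return count(Pattern.compile("^line"), text);
  }

  static int inlineMultiline(String text) {
    return count(Pattern.compile("(?m)^line"), text);
  }

  static int compiledMultiline(String text) {
    return count(Pattern.compile("^line", Pattern.MULTILINE), text);
  }

  public static void main(String[] args) {
    String text = "line one\nline two\nlast line";
    System.out.println(noFlag(text));             // 1: ^ matches only at the start of input
    System.out.println(inlineMultiline(text));    // 2: ^ also matches after each newline
    System.out.println(compiledMultiline(text));  // 2: same result via the compile flag
  }
}
```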

A worked example: the constructs in action

This program exercises a digit class with a quantifier, anchored alternation, a backreference, greedy versus reluctant matching, the \w+ shorthand, and an inline case-insensitive flag — all against java.util.regex only. The syntax here drives the Pattern and Matcher API covered in Java Regex Pattern and Matcher.


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSyntaxDemo {
  public static void main(String[] args) {
    // 1. A character class plus a quantifier: \d{4} matches exactly four digits.
    Pattern year = Pattern.compile("\\d{4}");
    Matcher ym = year.matcher("Released in 1995, remastered 2011.");
    System.out.print("years found:");
    while (ym.find()) System.out.print(" " + ym.group());
    System.out.println();

    // 2. Alternation under matches(): the whole input must match one alternative.
    System.out.println("cat|dog on 'dog'    : " + Pattern.matches("cat|dog", "dog"));
    System.out.println("cat|dog on 'catnap' : " + Pattern.matches("cat|dog", "catnap"));

    // 3. Groups capture; backreferences match what a group captured.
    Matcher dbl = Pattern.compile("\\b(\\w+) \\1\\b").matcher("the the end is is near");
    System.out.print("doubled words:");
    while (dbl.find()) System.out.print(" " + dbl.group(1));
    System.out.println();

    // 4. Greedy vs. reluctant quantifiers on the same input.
    String html = "<a><b>";
    System.out.println("greedy <.+>  : " + first("<.+>", html));
    System.out.println("lazy   <.+?> : " + first("<.+?>", html));

    // 5. Predefined class \w: count runs of word characters, split by \W characters.
    Matcher words = Pattern.compile("\\w+").matcher("ab, cd-ef!");
    int n = 0;
    while (words.find()) n++;
    System.out.println("\\w+ tokens   : " + n);

    // 6. Case-insensitive flag via an inline (?i) construct.
    System.out.println("(?i)java on 'JAVA': " + Pattern.matches("(?i)java", "JAVA"));
  }

  static String first(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : "(no match)";
  }
}

What to take from the run:

\d{4} found both 1995 and 2011 because find() scans for every match in the input. A class plus a quantifier (\d repeated {4} times) is the canonical way to match a fixed-width field. The doubled backslash in "\\d{4}" is the Java string literal producing the single-backslash regex the engine wants.
Pattern.matches("cat|dog", "dog") returned true but the same pattern on "catnap" returned false — matches() implicitly anchors the whole input, so even though cat appears in catnap, the trailing nap is left unmatched and the overall match fails.
The backreference \1 turned (\w+) \1 into "a word followed by the identical word," which is why it reported the and is — the two stutters — and ignored every word that was not immediately repeated. Backreferences match captured text, not the pattern again.
On the same <a><b> input, greedy <.+> swallowed the entire string while reluctant <.+?> stopped at the first >, yielding just <a>. This contrast is behind one of the most common regex fixes: add ? to a quantifier when it grabs too much.
\w+ counted 3 tokens in ab, cd-ef! — ab, cd, and ef — because ,, -, and ! are all \W (non-word) characters that break a run of word characters. The inline (?i) flag then matched java against JAVA, showing flags can live inside the pattern itself rather than only in Pattern.compile.
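
The difference between matches() and find() noted above extends to a third Matcher method, lookingAt(), which anchors at the start only. A minimal sketch of the three, with the class MatchModes and its helper names invented for illustration:

```java
import java.util.regex.Pattern;

class MatchModes {
  // The entire input must match the pattern.
  static boolean whole(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // The pattern must match starting at the beginning, but may stop early.
  static boolean prefix(String regex, String input) {
    return Pattern.compile(regex).matcher(input).lookingAt();
  }

  // The pattern may match at any position.
  static boolean anywhere(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(whole("cat|dog", "catnap"));     // false: "nap" is left over
    System.out.println(prefix("cat|dog", "catnap"));    // true:  starts with "cat"
    System.out.println(prefix("cat|dog", "hotdog"));    // false: does not start with either
    System.out.println(anywhere("cat|dog", "hotdog"));  // true:  "dog" appears inside
  }
}
```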

Practice

On the input '<a><b>', why does the regex '<.+>' match the whole string '<a><b>' while '<.+?>' matches only '<a>'?

'.+' is greedy and consumes as much as possible before backtracking, while '.+?' is reluctant and stops at the first character that lets the rest of the pattern match
The '?' in '.+?' makes the dot optional, so it matches fewer characters by skipping some
'<.+>' uses a capturing group and '<.+?>' does not, which changes how much text is returned
The two patterns are equivalent, and the difference in output is non-deterministic