W3docs

Java Pattern and Matcher Classes

Compile patterns and match input in Java with the Pattern and Matcher classes.

Regular expressions in Java live in the java.util.regex package, and almost all of that package is two classes: Pattern and Matcher. A Pattern is a compiled regular expression — the rule. A Matcher is the engine that runs that rule against one piece of input and reports what it found. Separating the two lets you compile an expression once and reuse it across thousands of inputs, which is the difference between fast regex code and slow regex code.

Pattern: compile once, reuse forever

Pattern.compile(String regex) parses the regex syntax and returns an immutable, thread-safe Pattern. Compilation is the expensive step, so do it once, typically in a static final field, and share the result. The convenience methods on String (matches, replaceAll, split) compile the pattern again on every call (split skips this only for simple literal delimiters), which is fine for a one-off but wasteful in a loop.

// Good: compile once, reuse
private static final Pattern EMAIL =
    Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[a-z]{2,}");

// Wasteful in a loop: String.matches recompiles every iteration
for (String s : lines) {
  if (s.matches("[\\w.+-]+@[\\w.-]+\\.[a-z]{2,}")) { /* ... */ }
}

Note the doubled backslashes: \\w in Java source is the regex token \w, because the backslash must first survive Java's own string escaping.
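
As a minimal runnable sketch of the compile-once advice, the class below keeps the pattern in a static final field and reuses it for every line. The names EmailFilter and countEmails and the sample addresses are illustrative assumptions, not part of the original snippet.

```java
import java.util.List;
import java.util.regex.Pattern;

class EmailFilter {
    // Compiled once when the class loads; Pattern is immutable and safe to share.
    private static final Pattern EMAIL =
        Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[a-z]{2,}");

    // Counts the lines that are, in their entirety, an email-shaped string.
    static long countEmails(List<String> lines) {
        return lines.stream()
                    .filter(s -> EMAIL.matcher(s).matches())
                    .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("alice@example.com", "not an email", "bob@test.org");
        System.out.println(countEmails(lines)); // 2
    }
}
```

Only the cheap step, creating a Matcher, runs once per line; the parse of the regex text happens a single time.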

Matcher: the stateful engine

A Pattern holds no input and no position — it is just the rule. Calling pattern.matcher(input) produces a Matcher bound to that input, and the Matcher carries all the mutable state: the current search position, the bounds of the last match, and the captured groups. Because it is stateful, a Matcher is not thread-safe; give each thread its own.
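
To make the statefulness concrete, here is a small sketch (the class name MatcherStateDemo and the sample strings are assumptions for illustration) in which one shared Pattern produces two Matchers, each tracking its own position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherStateDemo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns the first word of `first` and the second word of `second`,
    // showing that each Matcher advances independently of the other.
    // Assumes `first` has at least one word and `second` at least two.
    static String demo(String first, String second) {
        Matcher a = WORD.matcher(first);
        Matcher b = WORD.matcher(second);
        a.find();   // a now sits on its first match
        b.find();
        b.find();   // b has advanced to its second match
        return a.group() + "," + b.group();
    }

    public static void main(String[] args) {
        System.out.println(demo("red green", "one two three")); // red,two
    }
}
```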

Method | What it does
--- | ---
matches() | Tests whether the entire input matches the pattern
lookingAt() | Tests whether the input matches starting at the beginning (need not reach the end)
find() | Finds the next match anywhere in the input; returns true and advances
group() / group(n) | Returns the whole match, or capture group n
start() / end() | Index of the first character of the match, and one past the last
replaceAll(repl) | Replaces every match, with $1, $2 back-references to groups
reset() | Rewinds the matcher to position zero (optionally with new input)
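
The following sketch exercises several of these methods on one Matcher. The class name MatcherMethodsDemo and the input "42abc7" are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Records what matches(), lookingAt(), and a find() loop report for one input.
    static List<String> run(String input) {
        Matcher m = DIGITS.matcher(input);
        List<String> out = new ArrayList<>();
        out.add("matches=" + m.matches());       // whole input must be digits
        out.add("lookingAt=" + m.lookingAt());   // only the start must be digits
        m.reset();                               // rewind before scanning with find()
        while (m.find()) {
            // start() is inclusive, end() is one past the last matched character
            out.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("42abc7"));
        // [matches=false, lookingAt=true, 42[0,2), 7[5,6)]
    }
}
```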

find() versus matches(): the most common confusion

matches() is anchored to the whole string — it returns true only if the pattern consumes the entire input. find() is a scanner: it looks for the pattern anywhere and can be called repeatedly to walk every occurrence.

Pattern p = Pattern.compile("\\d+");
System.out.println(p.matcher("abc123").matches());     // false — whole string isn't digits
System.out.println(p.matcher("abc123").find());        // true  — found "123" inside

Matcher m = p.matcher("a1 b22 c333");
while (m.find()) {
  System.out.println(m.group() + " @ " + m.start());   // prints "1 @ 1", "22 @ 4", "333 @ 8", one per line
}

A frequent bug is calling group() before a successful matches()/find() — that throws IllegalStateException, because there is no match to read yet.
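
A short sketch of that failure and the usual guard follows. The class and method names (GroupBeforeMatchDemo, firstNumber, throwsWithoutMatch) are hypothetical, introduced only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeMatchDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Safe form: read group() only after find() has returned true.
    static String firstNumber(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns true if group() throws when no match has been attempted yet.
    static boolean throwsWithoutMatch(String input) {
        Matcher m = DIGITS.matcher(input);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsWithoutMatch("abc123")); // true
        System.out.println(firstNumber("abc123"));        // 123
        System.out.println(firstNumber("abc"));           // null
    }
}
```

The exception is thrown even though the input contains digits, because a fresh Matcher has not searched anything yet.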

Capturing groups, named groups, and replacement

Parentheses in a regex create capturing groups, numbered left to right starting at 1 (group 0 is the whole match). Java also supports named groups with (?<name>...), which you read back via group("name") — far more readable than counting parentheses. In replacement strings, $1 and ${name} insert what a group captured.

Pattern date = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = date.matcher("2024-11-30");
if (m.matches()) {
  System.out.println(m.group("year"));   // 2024
  System.out.println(m.group(3));        // 30  (third numbered group)
}
// Reformat using back-references
System.out.println("2024-11-30".replaceFirst(
    "(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1"));   // 30/11/2024

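The snippet above uses numbered back-references; the same reformatting can be sketched with the ${name} form mentioned earlier. The class name DateReformat, the method toDayFirst, and the sample sentence are assumptions for illustration.

```java
import java.util.regex.Pattern;

class DateReformat {
    private static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Rewrites every YYYY-MM-DD date in the text as DD/MM/YYYY,
    // referring to the captures by name instead of by number.
    static String toDayFirst(String text) {
        return DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("due 2024-11-30, shipped 2024-12-02"));
        // due 30/11/2024, shipped 02/12/2024
    }
}
```
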
Flags: case-insensitive, multiline, and more

Pass flags as the second argument to Pattern.compile, combined with |. The most used are CASE_INSENSITIVE, MULTILINE (so ^/$ match at line breaks), and DOTALL (so . matches newlines too). The same flags can be set inline with (?i), (?m), (?s).

Pattern p = Pattern.compile("error", Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("FATAL ERROR").find());   // true

// Equivalent inline form:
Pattern.compile("(?i)error");
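
To see what MULTILINE and DOTALL change, the sketch below counts matches of the same expression with and without each flag. The class name FlagsDemo, the helper count, and the three-line sample text are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // Counts how many times the pattern is found in the text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String log = "ERROR one\ninfo\nerror two";

        // Without MULTILINE, ^ matches only at the very start of the input.
        System.out.println(count(Pattern.compile("^error", Pattern.CASE_INSENSITIVE), log)); // 1
        // With MULTILINE, ^ also matches after each line break.
        System.out.println(count(
            Pattern.compile("^error", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE), log));  // 2
        // Inline flags give the same result.
        System.out.println(count(Pattern.compile("(?im)^error"), log));                      // 2

        // Without DOTALL, . stops at a newline; with it, .* can span lines.
        System.out.println(count(Pattern.compile("one.*two"), log));                         // 0
        System.out.println(count(Pattern.compile("one.*two", Pattern.DOTALL), log));         // 1
    }
}
```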

A worked example: parsing a log line with one compiled pattern

This program compiles a date pattern once and puts a Matcher through its paces — scanning every date in a string with find(), contrasting find() with matches(), reformatting via group back-references, splitting on whitespace, and reading values out of named groups.
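
The interactive listing itself is not reproduced here. What follows is a sketch of a program consistent with that description, not the original code: the log string is inferred from the reformatted output quoted in the notes, and the class name LogDates, the helpers countDates and reformat, and the ADDRESS pattern are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogDates {
    // Compiled once and reused by every method below.
    private static final Pattern DATE =
        Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
    // Assumed shape for the named-group part of the example.
    private static final Pattern ADDRESS =
        Pattern.compile("(?<user>[\\w.]+)@(?<host>[\\w.-]+)");

    // Walks every date with find(), printing the match, its index, and group 1.
    static int countDates(String log) {
        Matcher m = DATE.matcher(log);
        int total = 0;
        while (m.find()) {
            System.out.println(m.group() + " at index " + m.start() + ", year " + m.group(1));
            total++;
        }
        return total;
    }

    // Turns each YYYY-MM-DD into DD/MM/YYYY via numbered back-references.
    static String reformat(String log) {
        return DATE.matcher(log).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        String log = "start 2024-01-15 build 2024-11-30 done";
        System.out.println("total: " + countDates(log));                                  // total: 2
        System.out.println("matches whole log? " + DATE.matcher(log).matches());          // false
        System.out.println("matches one date? " + DATE.matcher("2024-01-15").matches());  // true
        System.out.println(reformat(log)); // start 15/01/2024 build 30/11/2024 done
        System.out.println("a  b   c".split("\\s+").length);                              // 3

        Matcher nm = ADDRESS.matcher("alice@w3docs.com");
        if (nm.matches()) {
            System.out.println(nm.group("user") + " / " + nm.group("host")); // alice / w3docs.com
        }
    }
}
```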


What to take from the run:

  • find() is a scanner with memory. The while (m.find()) loop located both dates, at index 6 and index 23, because each call resumes from where the previous match ended. That is how you enumerate every occurrence, and it is why the total came out to 2.
  • matches() is all-or-nothing. matches whole log? printed false because the dates are buried in other text, while matches one date? printed true because "2024-01-15" is the entire input. Reach for find() to locate, matches() to validate.
  • Numbered groups are addressable mid-match. Inside the loop, m.group(1) returned just the four-digit year (2024) while m.group() returned the whole date — group 0 is the match, groups 1..n are the parenthesized captures left to right.
  • Back-references rearrange text. replaceAll("$3/$2/$1") turned each YYYY-MM-DD into DD/MM/YYYY, producing start 15/01/2024 build 30/11/2024 done — the engine substituted each captured group into the replacement template.
  • Named groups read like fields. Splitting "a b c" on \s+ collapsed the runs of spaces into 3 parts, and the named pattern let nm.group("user") and nm.group("host") pull alice and w3docs.com out by name instead of by fragile position numbers.

Practice

You call 'pattern.matcher(input)' to get a Matcher, then immediately call 'matcher.group()' without calling 'find()' or 'matches()' first. What happens?