Java Pattern and Matcher Classes
Compile patterns and match input in Java with the Pattern and Matcher classes.
Regular expressions in Java live in the java.util.regex package, and almost all of that package is two classes: Pattern and Matcher. A Pattern is a compiled regular expression — the rule. A Matcher is the engine that runs that rule against one piece of input and reports what it found. Separating the two lets you compile an expression once and reuse it across thousands of inputs, which is the difference between fast regex code and slow regex code.
Pattern: compile once, reuse forever
Pattern.compile(String regex) parses the regex syntax and returns an immutable, thread-safe Pattern. Compilation is the expensive step, so do it once, typically in a static final field, and share the result. The convenience methods on String (matches, replaceAll, split) compile the regex again on each call (some JDKs special-case split on a simple literal delimiter), which is fine for a one-off but wasteful in a loop.
// Good: compile once, reuse
private static final Pattern EMAIL =
    Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[a-z]{2,}");
// Wasteful in a loop: String.matches recompiles every iteration
for (String s : lines) {
    if (s.matches("[\\w.+-]+@[\\w.-]+\\.[a-z]{2,}")) { /* ... */ }
}
Note the doubled backslashes: \\w in Java source is the regex token \w, because the backslash must first survive Java's own string escaping.
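To make the two layers of escaping concrete, here is a minimal sketch. The class and method names are illustrative only. It contrasts an escaped dot, which reaches the regex engine as \. and matches only a literal dot, with an unescaped dot, which matches any single character.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // Java source "a\\.b" is the regex a\.b: the dot is literal.
    static final Pattern LITERAL_DOT = Pattern.compile("a\\.b");
    // An unescaped dot is the regex wildcard: any single character.
    static final Pattern ANY_CHAR = Pattern.compile("a.b");

    static boolean literalDotMatches(String s) {
        return LITERAL_DOT.matcher(s).matches();
    }

    static boolean anyCharMatches(String s) {
        return ANY_CHAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\w".length());           // 2: a backslash and a 'w'
        System.out.println(literalDotMatches("a.b")); // true
        System.out.println(literalDotMatches("axb")); // false
        System.out.println(anyCharMatches("axb"));    // true
    }
}
```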
Matcher: the stateful engine
A Pattern holds no input and no position — it is just the rule. Calling pattern.matcher(input) produces a Matcher bound to that input, and the Matcher carries all the mutable state: the current search position, the bounds of the last match, and the captured groups. Because it is stateful, a Matcher is not thread-safe; give each thread its own.
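One way to respect that split is sketched below, with a hypothetical helper named countNumbers: the Pattern lives in a shared static field, and every call creates its own Matcher, so two threads never touch the same mutable state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedPatternDemo {
    // The immutable Pattern is safe to share between threads.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Each call creates its own Matcher, so no mutable state is shared.
    static int countNumbers(String input) {
        Matcher m = DIGITS.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] results = new int[2];
        Thread t1 = new Thread(() -> results[0] = countNumbers("a1 b22 c333"));
        Thread t2 = new Thread(() -> results[1] = countNumbers("no digits here"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(results[0] + " " + results[1]); // 3 0
    }
}
```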
| Method | What it does |
|---|---|
| matches() | Tests whether the entire input matches the pattern |
| lookingAt() | Tests whether the input matches starting at the beginning (need not reach the end) |
| find() | Finds the next match anywhere in the input; returns true and advances |
| group() / group(n) | Returns the whole match, or capture group n |
| start() / end() | Index of the first character of the match, and one past the last |
| replaceAll(repl) | Replaces every match, with $1, $2 back-references to groups |
| reset() | Rewinds the matcher to position zero (optionally with new input) |
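The less familiar rows of the table can be exercised with a short sketch. The helper names here are made up for illustration; the Matcher calls themselves (lookingAt, reset with new input, start, replaceAll) are the ones listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // lookingAt() anchors at the start only.
    static boolean startsWithDigits(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // matches() must consume the entire input.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // reset(newInput) lets one Matcher be reused across several inputs.
    static String firstNumberOfEach(String... inputs) {
        Matcher m = DIGITS.matcher("");
        StringBuilder sb = new StringBuilder();
        for (String input : inputs) {
            m.reset(input);
            if (m.find()) {
                sb.append(m.group()).append('@').append(m.start()).append(' ');
            }
        }
        return sb.toString().trim();
    }

    // replaceAll() replaces every match in the input.
    static String maskDigits(String s) {
        return DIGITS.matcher(s).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigits("123abc"));                 // true
        System.out.println(isAllDigits("123abc"));                      // false
        System.out.println(firstNumberOfEach("x7 y88", "abc", "12z")); // 7@1 12@0
        System.out.println(maskDigits("x7 y88"));                       // x# y#
    }
}
```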
find() versus matches(): the most common confusion
matches() is anchored to the whole string — it returns true only if the pattern consumes the entire input. find() is a scanner: it looks for the pattern anywhere and can be called repeatedly to walk every occurrence.
Pattern p = Pattern.compile("\\d+");
System.out.println(p.matcher("abc123").matches()); // false: the whole string isn't digits
System.out.println(p.matcher("abc123").find());    // true: found "123" inside
Matcher m = p.matcher("a1 b22 c333");
while (m.find()) {
    System.out.println(m.group() + " @ " + m.start()); // prints "1 @ 1", "22 @ 4", "333 @ 8"
}
A frequent bug is calling group() before a successful matches() or find(). That throws IllegalStateException, because there is no match to read yet.
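A small sketch of both the failure and one way to guard against it follows. The class and method names are hypothetical. The guard simply reads group() only after find() has returned true.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeGroupDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits, or empty if there is none.
    static Optional<String> firstNumber(String input) {
        Matcher m = DIGITS.matcher(input);
        // Only read group() after find() has reported success.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Reproduces the bug: group() with no prior match attempt.
    static boolean groupBeforeFindThrows(String input) {
        Matcher m = DIGITS.matcher(input);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("abc123"));           // Optional[123]
        System.out.println(firstNumber("no digits"));        // Optional.empty
        System.out.println(groupBeforeFindThrows("abc123")); // true
    }
}
```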
Capturing groups, named groups, and replacement
Parentheses in a regex create capturing groups, numbered left to right starting at 1 (group 0 is the whole match). Java also supports named groups with (?<name>...), which you read back via group("name") — far more readable than counting parentheses. In replacement strings, $1 and ${name} insert what a group captured.
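The ${name} form in a replacement string can be sketched as follows. The class name, the method name, and the sample text are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class NamedReplacementDemo {
    private static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Rewrites every YYYY-MM-DD in the text as DD/MM/YYYY.
    static String toDayFirst(String text) {
        // ${name} in the replacement refers to the named group.
        return ISO_DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("released 2024-11-30")); // released 30/11/2024
    }
}
```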
Pattern date = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = date.matcher("2024-11-30");
if (m.matches()) {
    System.out.println(m.group("year")); // 2024
    System.out.println(m.group(3));      // 30 (third numbered group)
}
// Reformat using back-references
System.out.println("2024-11-30".replaceFirst(
    "(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1")); // 30/11/2024
Flags: case-insensitive, multiline, and more
Pass flags as the second argument to Pattern.compile, combined with |. The most used are CASE_INSENSITIVE, MULTILINE (so ^/$ match at line breaks), and DOTALL (so . matches newlines too). The same flags can be set inline with (?i), (?m), (?s).
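MULTILINE and DOTALL are easiest to see on input that contains line breaks. The sketch below uses a made-up three-line log string and a hypothetical countMatches helper. It combines two flags with | and compares the results with and without each flag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String log = "error: disk\nwarn: cpu\nERROR: net";

        // Without MULTILINE, ^ matches only at the very start of the input.
        Pattern plain = Pattern.compile("^error", Pattern.CASE_INSENSITIVE);
        // With MULTILINE, ^ also matches right after each line break.
        Pattern perLine = Pattern.compile("^error",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(countMatches(plain, log));   // 1
        System.out.println(countMatches(perLine, log)); // 2

        // DOTALL lets . match the newline between "disk" and "warn".
        Pattern noDotAll = Pattern.compile("disk.warn");
        Pattern dotAll = Pattern.compile("disk.warn", Pattern.DOTALL);
        System.out.println(noDotAll.matcher(log).find()); // false
        System.out.println(dotAll.matcher(log).find());   // true
    }
}
```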
Pattern p = Pattern.compile("error", Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("FATAL ERROR").find()); // true
// Equivalent inline form:
Pattern.compile("(?i)error");
A worked example: parsing a log line with one compiled pattern
This program compiles a date pattern once and puts a Matcher through its paces — scanning every date in a string with find(), contrasting find() with matches(), reformatting via group back-references, splitting on whitespace, and reading values out of named groups.
What to take from the run:
- find() is a scanner with memory. The while (m.find()) loop located both dates, at index 6 and index 22, because each call resumes from where the previous match ended. That is how you enumerate every occurrence, and it is why the total came out to 2.
- matches() is all-or-nothing. "matches whole log?" printed false because the dates are buried in other text, while "matches one date?" printed true because "2024-01-15" is the entire input. Reach for find() to locate, matches() to validate.
- Numbered groups are addressable mid-match. Inside the loop, m.group(1) returned just the four-digit year (2024) while m.group() returned the whole date: group 0 is the match, groups 1..n are the parenthesized captures left to right.
- Back-references rearrange text. replaceAll("$3/$2/$1") turned each YYYY-MM-DD into DD/MM/YYYY, producing start 15/01/2024 build 30/11/2024 done: the engine substituted each captured group into the replacement template.
- Named groups read like fields. Splitting "a b c" on \s+ collapsed the runs of spaces into 3 parts, and the named pattern let nm.group("user") and nm.group("host") pull alice and w3docs.com out by name instead of by fragile position numbers.
Practice
You call 'pattern.matcher(input)' to get a Matcher, then immediately call 'matcher.group()' without calling 'find()' or 'matches()' first. What happens?