W3docs

Java Regex Groups and Captures

Capture parts of matched text in Java regex with parentheses, numbered groups, and named groups.

Java Regex Groups and Captures

A regular expression doesn't just tell you whether a string matches—it can carve the match into pieces you can read back. Those pieces are groups. By wrapping part of a pattern in parentheses you create a capturing group, and once a Matcher succeeds you pull each group out by number or by name. This chapter covers numbered groups, named groups, backreferences, non-capturing groups, and how all of them feed into replacements.

Numbered Capturing Groups

Every pair of parentheses in a pattern opens a capturing group, numbered left to right by their opening (. Group 0 is special: it's always the entire match. So (\d{4})-(\d{2})-(\d{2}) gives you four groups—the whole date plus year, month, and day.

Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher m = p.matcher("2026-05-30");
if (m.matches()) {
    System.out.println(m.group(0)); // 2026-05-30 (entire match)
    System.out.println(m.group(1)); // 2026
    System.out.println(m.group(2)); // 05
    System.out.println(m.group(3)); // 30
}

groupCount() returns the number of capturing groups excluding group 0, so the pattern above reports 3. Reading a group index that doesn't exist throws IndexOutOfBoundsException.

Named Groups

Counting parentheses gets brittle as patterns grow. Named groups, written (?<name>...), let you read a capture by a readable label instead of an index. The names must be valid Java identifiers and unique within the pattern.

Pattern p = Pattern.compile("(?<user>[\\w.]+)@(?<host>[\\w.]+)");
Matcher m = p.matcher("[email protected]");
if (m.matches()) {
    System.out.println(m.group("user")); // ada
    System.out.println(m.group("host")); // math.org
}

Named groups are still numbered underneath, so m.group(1) and m.group("user") return the same text. The name is purely for your readability.

Backreferences

A backreference matches the same text a previous group already captured. Inside the pattern you write \1 for group 1 (or \k<name> for a named group). This is how you detect repetition—say, a doubled word—within a single match.

// \b(\w+)\s+\1\b  matches a word followed by the same word again
Pattern p = Pattern.compile("\\b(\\w+)\\s+\\1\\b");
Matcher m = p.matcher("the the end");
if (m.find()) {
    System.out.println(m.group(1)); // the
}

Note the double backslash in Java source: \\1 in the string becomes \1 in the actual regex. A backreference can only match after its group has captured, so the group must appear earlier in the pattern.

Capturing in Replacements

String.replaceAll, Matcher.replaceAll, and appendReplacement all understand group references in the replacement text. Use $1, $2, ... for numbered groups and ${name} for named ones. This turns regex into a small reordering and templating tool.

ReferenceMeaning in replacement
$0The entire match
$1, $2, ...Numbered capturing groups
${name}A named capturing group
\$A literal dollar sign
// Reorder "First Last" into "Last, First"
String out = "Ada Lovelace".replaceAll("(\\w+)\\s+(\\w+)", "$2, $1");
System.out.println(out); // Lovelace, Ada

If you need a literal $ or \ in the output, escape it as \\$ or \\\\ in the Java string.

Non-Capturing Groups

Sometimes you need parentheses only to group an alternation or apply a quantifier—not to capture. A non-capturing group (?:...) does exactly that: it groups without consuming a group number, which keeps your indices clean and the engine slightly faster.

// Group the protocol alternation, but capture only the host
Pattern p = Pattern.compile("(?:https?|ftp)://(\\S+)");
Matcher m = p.matcher("https://w3docs.com");
if (m.find()) {
    System.out.println(m.group(1)); // w3docs.com  (group 1, not 2)
}

Because (?:https?|ftp) is non-capturing, the host is group 1 rather than group 2. An optional capturing group that doesn't participate in the match returns null, so always null-check optional groups before using them.

A Runnable Example

The program below exercises every kind of group in one run: numbered date parts, named email fields, a doubled-word backreference, group references in two replacements, a non-capturing protocol group, and an optional group that comes back null.

java— editable, runs on the server

What to take from the run:

  • The two date lines show numbered groups: group(0) is the whole match while group(1..3) are year, month, and day captured by position.
  • The email line proves named groups read the same text as indices, and groupCount=2 counts only the named captures, never group 0.
  • Doubled: the and Doubled: sat come from the \1 backreference matching whatever the word group just captured—each repeated word is found independently.
  • Swapped: Lovelace, Ada, Turing, Alan shows $2, $1 reordering each name pair, while Masked: card [4111] [2222] shows the ${num} named reference templating the output.
  • The non-capturing (?:https?|ftp) keeps the host at group 1 (w3docs.com), and the optional fraction group prints frac=null because it never participated in matching 42.

Practice

Practice

In the pattern (?:https?|ftp)://(\S+), which group holds the host portion that (\S+) captures?