Java Regex Groups and Captures
Capture parts of matched text in Java regex with parentheses, numbered groups, and named groups.
A regular expression doesn't just tell you whether a string matches—it can carve the match into pieces you can read back. Those pieces are groups. By wrapping part of a pattern in parentheses you create a capturing group, and once a Matcher succeeds you pull each group out by number or by name. This chapter covers numbered groups, named groups, backreferences, non-capturing groups, and how all of them feed into replacements.
Numbered Capturing Groups
Every plain pair of parentheses in a pattern opens a capturing group, and groups are numbered left to right by the position of their opening (. Group 0 is special: it's always the entire match. So (\d{4})-(\d{2})-(\d{2}) gives you four groups: the whole date plus year, month, and day.
Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher m = p.matcher("2026-05-30");
if (m.matches()) {
    System.out.println(m.group(0)); // 2026-05-30 (entire match)
    System.out.println(m.group(1)); // 2026
    System.out.println(m.group(2)); // 05
    System.out.println(m.group(3)); // 30
}
groupCount() returns the number of capturing groups excluding group 0, so the pattern above reports 3. Reading a group index that doesn't exist throws IndexOutOfBoundsException.
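As a small sketch of both points, the class below loops from 1 to groupCount() to read every capture, then deliberately asks for a group that does not exist. The class name GroupCountDemo and the helper dateParts are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Hypothetical helper: joins groups 1..groupCount() of a full match with '/'.
    static String dateParts(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) { // groupCount() excludes group 0
            if (i > 1) {
                sb.append('/');
            }
            sb.append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dateParts("2026-05-30")); // 2026/05/30

        Matcher m = DATE.matcher("2026-05-30");
        if (m.matches()) {
            try {
                m.group(4); // the pattern only defines groups 0..3
            } catch (IndexOutOfBoundsException e) {
                System.out.println("no group 4");
            }
        }
    }
}
```

Looping up to groupCount() rather than hard-coding 3 keeps the code correct if the pattern later gains or loses a group.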
Named Groups
Counting parentheses gets brittle as patterns grow. Named groups, written (?<name>...), let you read a capture by a readable label instead of an index. A name must start with a letter, contain only ASCII letters and digits (underscores are not allowed), and be unique within the pattern.
Pattern p = Pattern.compile("(?<user>[\\w.]+)@(?<host>[\\w.]+)");
Matcher m = p.matcher("ada@math.org");
if (m.matches()) {
    System.out.println(m.group("user")); // ada
    System.out.println(m.group("host")); // math.org
}
Named groups are still numbered underneath, so m.group(1) and m.group("user") return the same text. The name is purely for your readability.
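A quick way to confirm that names and indices point at the same captures is to compare them directly. The sketch below does that and also shows what happens when you ask for a name the pattern never defined; NamedGroupDemo and sameByNameAndIndex are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern EMAIL =
            Pattern.compile("(?<user>[\\w.]+)@(?<host>[\\w.]+)");

    // Hypothetical helper: true when lookups by name and by index agree.
    static boolean sameByNameAndIndex(String input) {
        Matcher m = EMAIL.matcher(input);
        return m.matches()
                && m.group("user").equals(m.group(1))
                && m.group("host").equals(m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(sameByNameAndIndex("ada@math.org")); // true

        Matcher m = EMAIL.matcher("ada@math.org");
        if (m.matches()) {
            try {
                m.group("domain"); // no group with this name in the pattern
            } catch (IllegalArgumentException e) {
                System.out.println("unknown group name");
            }
        }
    }
}
```

Note the different exception types: an unknown name raises IllegalArgumentException, while an out-of-range index raises IndexOutOfBoundsException.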
Backreferences
A backreference matches the same text a previous group already captured. Inside the pattern you write \1 for group 1 (or \k<name> for a named group). This is how you detect repetition—say, a doubled word—within a single match.
// \b(\w+)\s+\1\b matches a word followed by the same word again
Pattern p = Pattern.compile("\\b(\\w+)\\s+\\1\\b");
Matcher m = p.matcher("the the end");
if (m.find()) {
    System.out.println(m.group(1)); // the
}
Note the double backslash in Java source: \\1 in the string becomes \1 in the actual regex. A backreference can only match after its group has captured, so the group must appear earlier in the pattern.
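The named form \k<name> works the same way as \1. As a sketch, the class below uses it to collect every doubled word in a sentence by calling find() in a loop; NamedBackrefDemo, doubledWords, and the sample sentence are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedBackrefDemo {
    // \k<word> must match the exact text that the group named "word" captured.
    private static final Pattern DOUBLED =
            Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b");

    // Hypothetical helper: returns each word that is immediately repeated.
    static List<String> doubledWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) { // each find() resumes after the previous match
            found.add(m.group("word"));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(doubledWords("the the cat sat sat down")); // [the, sat]
    }
}
```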
Capturing in Replacements
String.replaceAll, Matcher.replaceAll, and appendReplacement all understand group references in the replacement text. Use $1, $2, ... for numbered groups and ${name} for named ones. This turns regex into a small reordering and templating tool.
| Reference | Meaning in replacement |
|---|---|
| $0 | The entire match |
| $1, $2, ... | Numbered capturing groups |
| ${name} | A named capturing group |
| \$ | A literal dollar sign |
// Reorder "First Last" into "Last, First"
String out = "Ada Lovelace".replaceAll("(\\w+)\\s+(\\w+)", "$2, $1");
System.out.println(out); // Lovelace, Ada
If you need a literal $ or \ in the output, escape it as \\$ or \\\\ in the Java string.
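The sketch below puts three replacement cases side by side: a named reference with ${num}, an escaped literal dollar sign, and Matcher.quoteReplacement, which escapes every $ and \ for you when the replacement text should be taken literally. The class and method names and the sample strings are illustrative assumptions.

```java
import java.util.regex.Matcher;

class ReplacementDemo {
    // ${num} inserts the text captured by the group named "num".
    static String bracketNumbers(String s) {
        return s.replaceAll("(?<num>\\d+)", "[${num}]");
    }

    // "\\$" in Java source is \$ in the replacement: a literal dollar sign,
    // followed here by $1, the first captured group.
    static String addDollarSign(String s) {
        return s.replaceAll("(\\d+)", "\\$$1");
    }

    // quoteReplacement makes the whole replacement literal, so "$1" is not a group reference.
    static String replaceLiterally(String s, String replacement) {
        return s.replaceAll("\\d+", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(bracketNumbers("card 4111 2222"));   // card [4111] [2222]
        System.out.println(addDollarSign("cost 42"));           // cost $42
        System.out.println(replaceLiterally("id 7", "$1"));     // id $1
    }
}
```

Reaching for quoteReplacement is the safer choice whenever the replacement text comes from user input or another source you do not control.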
Non-Capturing Groups
Sometimes you need parentheses only to group an alternation or apply a quantifier, not to capture. A non-capturing group (?:...) does exactly that: it groups without consuming a group number, which keeps your indices clean and spares the engine from recording a capture it will never use.
// Group the protocol alternation, but capture only the host
Pattern p = Pattern.compile("(?:https?|ftp)://(\\S+)");
Matcher m = p.matcher("https://w3docs.com");
if (m.find()) {
    System.out.println(m.group(1)); // w3docs.com (group 1, not 2)
}
Because (?:https?|ftp) is non-capturing, the host is group 1 rather than group 2. An optional capturing group that doesn't participate in the match returns null, so always null-check optional groups before using them.
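To make the null case concrete, here is a minimal sketch with an optional fraction group. The pattern, the class name OptionalGroupDemo, and the helper fraction are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Group 1: whole part. Group 2: fraction digits, inside an optional non-capturing wrapper.
    private static final Pattern NUMBER = Pattern.compile("(\\d+)(?:\\.(\\d+))?");

    // Hypothetical helper: returns the fraction digits, or "none" when group 2 did not participate.
    static String fraction(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        String frac = m.group(2); // null when the optional part was skipped
        return frac == null ? "none" : frac;
    }

    public static void main(String[] args) {
        System.out.println(fraction("3.14")); // 14
        System.out.println(fraction("42"));   // none
    }
}
```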
A Runnable Example
The program below exercises every kind of group in one run: numbered date parts, named email fields, a doubled-word backreference, group references in two replacements, a non-capturing protocol group, and an optional group that comes back null.
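A minimal sketch of such a program follows. The input strings, the class name GroupsTour, the report helper, and the Date, Parts, Email, Host, and Number labels are assumptions, chosen so the output lines up with the notes that follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsTour {
    // Builds the output lines so they can be printed or inspected.
    static List<String> report() {
        List<String> out = new ArrayList<>();

        // Numbered groups: group 0 is the whole match, 1..3 are the parts.
        Matcher date = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher("2026-05-30");
        if (date.matches()) {
            out.add("Date: " + date.group(0));
            out.add("Parts: year=" + date.group(1) + " month=" + date.group(2)
                    + " day=" + date.group(3));
        }

        // Named groups: read one by name and one by index to show they agree.
        Matcher email = Pattern.compile("(?<user>[\\w.]+)@(?<host>[\\w.]+)")
                .matcher("ada@math.org");
        if (email.matches()) {
            out.add("Email: user=" + email.group("user") + " host=" + email.group(2)
                    + " groupCount=" + email.groupCount());
        }

        // Backreference: \1 must repeat whatever group 1 captured.
        Matcher doubled = Pattern.compile("\\b(\\w+)\\s+\\1\\b")
                .matcher("the the cat sat sat down");
        while (doubled.find()) {
            out.add("Doubled: " + doubled.group(1));
        }

        // Group references in replacements: numbered, then named.
        out.add("Swapped: "
                + "Ada Lovelace, Alan Turing".replaceAll("(\\w+)\\s+(\\w+)", "$2, $1"));
        out.add("Masked: " + "card 4111 2222".replaceAll("(?<num>\\d+)", "[${num}]"));

        // Non-capturing protocol group: the host stays at index 1.
        Matcher url = Pattern.compile("(?:https?|ftp)://(\\S+)").matcher("https://w3docs.com");
        if (url.find()) {
            out.add("Host: " + url.group(1));
        }

        // Optional group that does not participate: group(2) is null.
        Matcher number = Pattern.compile("(\\d+)(?:\\.(\\d+))?").matcher("42");
        if (number.matches()) {
            out.add("Number: whole=" + number.group(1) + " frac=" + number.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```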
What to take from the run:
- The two date lines show numbered groups: group(0) is the whole match while group(1..3) are year, month, and day captured by position.
- The email line proves named groups read the same text as indices, and groupCount=2 counts the pattern's two capturing groups (both named here), never group 0.
- The Doubled: the and Doubled: sat lines come from the \1 backreference matching whatever the word group just captured; each repeated word is found independently.
- Swapped: Lovelace, Ada, Turing, Alan shows $2, $1 reordering each name pair, while Masked: card [4111] [2222] shows the ${num} named reference templating the output.
- The non-capturing (?:https?|ftp) keeps the host at group 1 (w3docs.com), and the optional fraction group prints frac=null because it never participated in matching 42.
Practice
In the pattern (?:https?|ftp)://(\S+), which group holds the host portion that (\S+) captures?