W3docs

Java Regex Quantifiers

How Java regex quantifiers (*, +, ?, {n,m}) behave in greedy, reluctant, and possessive modes.

Java Regex Quantifiers

A quantifier says how many times the thing before it may repeat. * means "zero or more", + means "one or more", ? means "zero or one", and {n,m} spells out an exact range. That much is shared by every regex flavor. What trips people up in Java is that each quantifier comes in three modesgreedy, reluctant, and possessive — and the mode, not the count, decides how the regex engine in java.util.regex walks the input.

The four basic quantifiers

Attach a quantifier to a single character, a character class, or a group, and it controls the repetition of whatever is immediately to its left:

QuantifierRepeats the preceding elementExampleMatches
*zero or more timesab*cac, abc, abbbc
+one or more timesab+cabc, abbc (not ac)
?zero or one timecolou?rcolor, colour
{n}exactly n times\d{4}a 4-digit year
{n,}n or more times\d{2,}two or more digits
{n,m}between n and m times\d{3,5}3 to 5 digits
Pattern.matches("ab*c", "abbbc");   // true  — three b's
Pattern.matches("ab+c", "ac");      // false — '+' needs at least one b
Pattern.matches("colou?r", "color");// true  — the 'u' is optional
Pattern.matches("\\d{3,5}", "1234");// true  — four digits is within 3..5

Greedy is the default

A bare quantifier is greedy: it consumes as much input as it possibly can, then backtracks — giving characters back one at a time — until the rest of the pattern can match. This is why a naive <.+> against HTML swallows far more than one tag:

String html = "<b>one</b>";
Matcher m = Pattern.compile("<.+>").matcher(html);
m.find();
m.group(); // "<b>one</b>"  — '.+' ate everything, then backed up to the last '>'

The engine first grabbed the whole string, found no trailing >, and walked backward until a > lined up — landing on the very last one.

Reluctant: add ? to take as little as possible

Append a ? to any quantifier (*?, +?, ??, {n,m}?) and it becomes reluctant (also called lazy): it matches the fewest repetitions first and expands only when forced. That is what you usually want when scanning delimited tokens:

String html = "<b>one</b>";
Matcher m = Pattern.compile("<.+?>").matcher(html);
m.find();
m.group(); // "<b>"  — stopped at the first '>'

Same pattern, one extra character, opposite behavior: greedy <.+> returns the whole string while reluctant <.+?> returns just the first tag.

Possessive: add + and never give back

Append a + (*+, ++, ?+, {n,m}+) and the quantifier becomes possessive: it grabs as much as it can like a greedy one, but it refuses to backtrack. If the rest of the pattern then fails, the whole match fails — there is no walking backward to rescue it.

// Possessive '.++' eats the final '>' too and won't return it, so no '>' is left
Pattern.compile("<.++>").matcher("<b>one</b>").find(); // false

Why give up that flexibility? Speed and safety. Because a possessive quantifier never reconsiders, it cannot fall into catastrophic backtracking — the exponential blow-up that freezes a thread on patterns like (a+)+b fed a long run of a with no b. Possessive a++b answers "no match" almost instantly.

ModeSyntaxStrategyBacktracks?
GreedyX*take the most, then back off as neededyes
ReluctantX*?take the least, then add as neededyes
PossessiveX*+take the most and keep itno

A worked example: all three modes side by side

This program runs the same input through greedy, reluctant, and possessive quantifiers, then exercises the {n,m} range and a group quantifier. Everything here is pure java.util.regex from the JDK.

java— editable, runs on the server

What to take from the run:

  • Greedy <.+> printed <b>one</b><i>two</i> — the whole string. It consumed everything, then backtracked to the last >, which is exactly why greedy patterns over-match across delimiters.
  • Reluctant <.+?> printed <b> from the identical input. The lone ? flipped the strategy from "most" to "least", stopping at the first > — the fix for tag-by-tag scanning.
  • Possessive <.++> printed matches=false. It swallowed the final > and refused to give it back, so the trailing > in the pattern had nothing to match and the whole attempt failed — the price of never backtracking.
  • \d{3,5} rejected 12 (no match, too few digits), accepted 123 and 12345 whole, and on 1234567 matched only 12345 (len 5) — the upper bound 5 capped it even though more digits were available.
  • The group pattern (\w+\s*){2,3} matched alpha beta gamma — three words, its maximum — proving the quantifier applied to the entire parenthesized group, and a++b returned false instantly on a long run of a with no b, showing how possessive quantifiers sidestep catastrophic backtracking.

Practice

Practice

Against the input '<b>one</b>', what does the Java pattern '<.+?>' (a reluctant quantifier) match on the first call to find()?

  • ' — the engine scans from the right and returns the last tag