W3docs

Java Regular Expressions Introduction

An introduction to regular expressions in Java with the java.util.regex package.


A regular expression (regex) is a compact pattern that describes a set of strings. In Java, the java.util.regex package turns these patterns into a small, fast matching engine you can point at any text — to validate input, search for substrings, extract fields, or rewrite content. This chapter lays out the pieces before you start writing patterns of your own.

What a Regular Expression Is

A regex is just a string written in a special syntax, but Java does not re-read that string every time you use it. Instead, you compile the pattern once into a Pattern object, then apply it to input through a Matcher. Compilation turns the string into an internal structure of matching nodes, so reusing one Pattern across many inputs is far cheaper than recompiling the same regex each time.

Patterns describe structure: a literal like cat matches exactly those letters, while metacharacters describe shapes. For example, \d is any digit, + means "one or more", and . means "any character except a line terminator". Combine them and you can describe phone numbers, emails, or log lines in a single line of code.

import java.util.regex.Pattern;

public class FirstPattern {
    public static void main(String[] args) {
        // Compile the pattern once; reuse the result.
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(digits.matcher("abc123").find()); // true
        System.out.println(digits.matcher("hello").find());  // false
    }
}

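Note the double backslash: \d in regex must be written \\d in a Java string literal, because the backslash is also Java's own escape character. The compiler turns the literal "\\d" into the two characters \d, which is what the regex engine receives.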
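The same escaping rule matters whenever a metacharacter has to be matched literally. The minimal sketch below uses an illustrative class name (EscapingDemo) and made-up sample strings. It shows the length of the literal "\\d", an escaped dot, and Pattern.quote, which marks its whole argument as literal text:

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    public static void main(String[] args) {
        // The Java literal "\\d" holds two characters: a backslash and the letter d.
        String regex = "\\d";
        System.out.println(regex.length()); // 2

        // An unescaped dot matches any character, so "a.c" also accepts "abc".
        System.out.println(Pattern.matches("a.c", "abc"));   // true
        // Escaping the dot ("\\." in Java source) restricts it to a literal '.'.
        System.out.println(Pattern.matches("a\\.c", "abc")); // false
        System.out.println(Pattern.matches("a\\.c", "a.c")); // true

        // Pattern.quote wraps the text so every character is matched literally.
        String literal = Pattern.quote("1+1=2");
        System.out.println(Pattern.matches(literal, "1+1=2")); // true
        // Unquoted, '+' acts as a quantifier, so the same text no longer matches itself.
        System.out.println(Pattern.matches("1+1=2", "1+1=2")); // false
    }
}
```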

Pattern and Matcher

Two classes do almost all the work. Pattern is the compiled, reusable, thread-safe blueprint. Matcher is the stateful engine that runs that blueprint against one specific input. It tracks the current position in the text, what each group captured, and where the last match landed. Create a fresh Matcher for each input, and never share one across threads.
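One way to follow that advice is shown in the minimal sketch below. It assumes a hypothetical five-digit code rule, and the names SharedPattern, ZIP, and isZip are illustrative. The compiled Pattern sits in a static final field and is shared freely, while every call creates its own Matcher:

```java
import java.util.List;
import java.util.regex.Pattern;

public class SharedPattern {
    // Hypothetical rule: exactly five digits. A Pattern is immutable, so one instance can be shared.
    private static final Pattern ZIP = Pattern.compile("\\d{5}");

    // Each call creates a new Matcher, so callers never share match state.
    static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("12345", "1234", "98765", "abcde");
        // parallelStream() may call isZip from several threads at once.
        long valid = inputs.parallelStream().filter(SharedPattern::isZip).count();
        System.out.println(valid); // 2
    }
}
```

The parallel stream is safe here because the only object the threads share is the immutable Pattern. Each thread keeps its Matcher private.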

| Type / method | Role |
|---|---|
| Pattern | The compiled pattern. Immutable, thread-safe, reusable. |
| Matcher | Applies a Pattern to one input. Holds match state. |
| Pattern.compile(regex) | Builds a Pattern from a regex string. |
| pattern.matcher(input) | Returns a Matcher bound to that input. |
| String.matches(regex) | One-shot helper that compiles and full-matches in a single call. |

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternAndMatcher {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w+@\\w+\\.\\w+");
        // Placeholder address; example.com is a reserved example domain.
        Matcher m = p.matcher("ping me at someone@example.com please");
        if (m.find()) {
            System.out.println("Found email: " + m.group());
        }
    }
}
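The one-shot helpers in the table are convenient, but String.matches and Pattern.matches compile the regex again on every call. The sketch below uses an illustrative class name and made-up inputs. It shows that the three forms give the same whole-input result, and then hoists compile() out of a loop so the pattern is built only once:

```java
import java.util.regex.Pattern;

public class OneShotVsCompiled {
    public static void main(String[] args) {
        String input = "2025";

        // Three equivalent whole-input checks for the same regex.
        boolean viaString = input.matches("\\d{4}");                           // compiles on each call
        boolean viaStatic = Pattern.matches("\\d{4}", input);                  // compiles on each call
        boolean viaCompiled = Pattern.compile("\\d{4}").matcher(input).matches();
        System.out.println(viaString + " " + viaStatic + " " + viaCompiled); // true true true

        // In a loop, compile once outside the loop and reuse the Pattern.
        Pattern year = Pattern.compile("\\d{4}");
        int count = 0;
        for (String s : new String[] {"1999", "20x5", "2025", "99"}) {
            if (year.matcher(s).matches()) {
                count++;
            }
        }
        System.out.println(count); // 2
    }
}
```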

find() vs matches()

A very common beginner mistake is confusing the two main ways to run a pattern. matches() requires the entire input to match the pattern from start to end. find() scans for any substring that matches, and can be called repeatedly to walk every occurrence. lookingAt() sits between them: it anchors at the start but does not require matching to the end.

| Method | Anchored at start? | Must match to end? | Repeatable? |
|---|---|---|---|
| matches() | yes | yes | no |
| lookingAt() | yes | no | no |
| find() | no | no | yes |

import java.util.regex.Pattern;

public class FindVsMatches {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        System.out.println(p.matcher("42").matches());        // true  (whole input)
        System.out.println(p.matcher("age 42").matches());    // false (extra text)
        System.out.println(p.matcher("age 42").find());       // true  (substring)
        System.out.println(p.matcher("age 42").lookingAt());  // false (no digit at start)
    }
}
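Because find() is repeatable, the usual idiom is a while loop that keeps calling it on one Matcher until it returns false. The short sketch below uses made-up sample text and an illustrative class name. It also shows lookingAt() succeeding on input that starts with a match, where matches() fails:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoop {
    public static void main(String[] args) {
        Pattern number = Pattern.compile("\\d+");
        Matcher m = number.matcher("3 apples, 12 pears, 7 plums");

        // Each successful find() resumes scanning where the previous match ended.
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        System.out.println(found); // [3, 12, 7]

        // The input starts with a digit, so lookingAt() succeeds,
        // but matches() fails because the rest is not digits.
        System.out.println(number.matcher("3 apples").lookingAt()); // true
        System.out.println(number.matcher("3 apples").matches());   // false
    }
}
```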

Common Syntax You Build On

Most real patterns are assembled from a small vocabulary of building blocks: character classes, predefined shortcuts, quantifiers, and anchors. Learning these gets you most of the way.

| Construct | Meaning | Example |
|---|---|---|
| . | Any single character (except a line terminator, unless DOTALL is set) | a.c matches abc, axc |
| \d \w \s | Digit, word character, whitespace | \d\d matches 42 |
| [abc] | Any one of a, b, c | [aeiou] matches a vowel |
| [^abc] | Any character except a, b, c | [^0-9] matches a non-digit |
| * + ? | Zero or more, one or more, zero or one | ab+ matches ab, abb |
| {n} {n,m} | Exactly n, between n and m (inclusive) | \d{3} matches 555 |
| ^ $ | Start, end of input (of each line with MULTILINE) | ^Hi matches a leading Hi |
| (...) | Capturing group | (\d{4}) captures four digits |

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupsExample {
    public static void main(String[] args) {
        // Two capturing groups: year and month.
        Pattern date = Pattern.compile("(\\d{4})-(\\d{2})");
        Matcher m = date.matcher("Released 2025-11 to users");
        if (m.find()) {
            System.out.println("Full: " + m.group(0)); // 2025-11
            System.out.println("Year: " + m.group(1)); // 2025
            System.out.println("Month: " + m.group(2)); // 11
        }
    }
}
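The rows of the syntax table can be checked directly. The sketch below uses illustrative sample strings and names (SyntaxTour, allMatches). Each Pattern.matches call tests one construct against a whole input. The last part shows the "input/line" distinction for ^: it anchors only at the start of the input by default, and after every line terminator as well when the MULTILINE flag is set:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxTour {
    // Collects every substring that p finds in text.
    static List<String> allMatches(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("a.c", "axc"));        // true: . is any character
        System.out.println(Pattern.matches("[aeiou]", "e"));      // true: one vowel
        System.out.println(Pattern.matches("[^0-9]", "7"));       // false: 7 is a digit
        System.out.println(Pattern.matches("ab+", "abbb"));       // true: one or more b
        System.out.println(Pattern.matches("colou?r", "color"));  // true: u is optional
        System.out.println(Pattern.matches("\\d{2,4}", "12345")); // false: five digits is too many

        String text = "alpha one\nbeta two";
        // Default mode: ^ matches only at the start of the whole input.
        System.out.println(allMatches(Pattern.compile("^\\w+"), text)); // [alpha]
        // MULTILINE: ^ also matches right after each line terminator.
        System.out.println(allMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // [alpha, beta]
    }
}
```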

Worked Example

The program below compiles one phone-number pattern and exercises the whole API on it: it walks every match with find(), reads capturing groups and match positions, contrasts find() with matches(), and rewrites the text with replaceAll(). Run it to watch the engine in action.
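A minimal sketch of such a program follows. It assumes the pattern (\d{3})-(\d{4}), a hypothetical sample sentence containing two phone numbers, and an illustrative class name (PhoneNumbers):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNumbers {
    public static void main(String[] args) {
        // Assumed pattern: three digits, a hyphen, four digits, with each part captured.
        Pattern phone = Pattern.compile("(\\d{3})-(\\d{4})");
        String text = "Call 555-1234 or 555-9876 today."; // hypothetical sample text

        // Walk every match; find() returns false once no matches remain.
        Matcher m = phone.matcher(text);
        while (m.find()) {
            System.out.println(m.group() + " groups: " + m.group(1) + ", " + m.group(2)
                    + " span: " + m.start() + "-" + m.end());
        }
        // Prints:
        // 555-1234 groups: 555, 1234 span: 5-13
        // 555-9876 groups: 555, 9876 span: 17-25

        // matches() needs the whole input to fit the pattern.
        System.out.println(phone.matcher(text).matches());       // false
        System.out.println(phone.matcher("555-1234").matches()); // true

        // replaceAll rewrites every match in a single pass.
        System.out.println(phone.matcher(text).replaceAll("XXX-XXXX"));
        // Call XXX-XXXX or XXX-XXXX today.
    }
}
```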


What to take from the run:

  • find() is called in a loop and succeeds twice. The same Matcher walks the text one occurrence at a time until find() returns false.
  • group(1) and group(2) return the parenthesized sub-parts (555 and 1234), while group() with no argument returns the whole match.
  • start() and end() report the character offsets of each match, which is how you would highlight or slice the original text.
  • matches() on the full sentence prints false because the pattern does not cover the whole string, while "555-1234" alone prints true. This confirms that matches() succeeds only when the whole input matches.
  • replaceAll("XXX-XXXX") rewrites every match in one pass, producing the masked sentence and showing how patterns drive text transformation.

Practice

What is the key difference between Matcher.matches() and Matcher.find()?