Java Regex Character Classes | W3Docs Learn Java

A character class is the part of a regular expression that matches one character out of a set of allowed characters. Whenever you write \d, [aeiou], or [^0-9], you are using a character class. Character classes are the smallest building block of almost every useful pattern, and Java's java.util.regex engine gives you three kinds: ready-made shorthands like \d, custom sets you spell out in [...], and named classes like \p{Lower} (POSIX-style, ASCII by default) and \p{L} (Unicode). Get these right and the rest of regex falls into place.

This chapter covers all three kinds (custom bracket classes with ranges and negation, the predefined shorthands, and Unicode/POSIX named classes) along with the set operations union, intersection, and subtraction, then ties them together in one runnable program. If you are new to Java regex, read the regex introduction and regex syntax chapters first.

Custom classes: brackets, ranges, and negation

The square-bracket class [...] matches any single character listed inside it. Spell out the characters one by one, or use a hyphen to express a range. Put ^ right after the opening bracket to negate the set, that is, to match any character that is not listed.

"[abc]"       // matches one 'a', 'b', or 'c'
"[a-z]"       // any one lowercase ASCII letter (a range)
"[A-Za-z0-9]" // any ASCII letter or digit (three ranges in one class)
"[^aeiou]"    // any single character that is NOT a lowercase vowel

Inside a class most metacharacters lose their special meaning, so you rarely need to escape them. A literal ], ^, -, or \ is the exception: escape those, or position them so they cannot be misread (a - first or last is a literal hyphen, not a range). In Java, [ and && are also special inside a class, because they start a nested class and an intersection respectively.

java.util.regex.Pattern p = java.util.regex.Pattern.compile("[-+0-9]");
System.out.println(p.matcher("-").find());   // true: leading - is literal
System.out.println(p.matcher("*").find());   // false
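The escaping and positioning rules above can be checked with a short sketch. The class name ClassEscapeDemo and the helper matchesOne are hypothetical names chosen for illustration; only Pattern.matches comes from java.util.regex.

```java
import java.util.regex.Pattern;

class ClassEscapeDemo {
    // True if `regex` matches the whole of `input` (here always one character).
    static boolean matchesOne(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The regex [\]\\] is a class holding an escaped ] and an escaped backslash.
        System.out.println(matchesOne("[\\]\\\\]", "]"));  // true
        System.out.println(matchesOne("[\\]\\\\]", "\\")); // true

        // ^ negates only when it comes right after the opening bracket.
        System.out.println(matchesOne("[a^]", "^"));       // true

        // A hyphen between two characters is a range; placed last it is a literal.
        System.out.println(matchesOne("[a-c]", "b"));      // true
        System.out.println(matchesOne("[a-c]", "-"));      // false
        System.out.println(matchesOne("[ac-]", "-"));      // true

        // . and * are ordinary characters inside a class.
        System.out.println(matchesOne("[.*]", "*"));       // true
        System.out.println(matchesOne("[.*]", "x"));       // false
    }
}
```

When in doubt, escaping a character inside a class is harmless, so "[\\-\\^]" is a safe if noisier way to spell a class of hyphen and caret.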

Predefined classes: the shorthands

Java ships shorthands for the sets you reach for constantly. Each lowercase form has an uppercase complement that matches everything the lowercase form does not.

Shorthand	Matches	Equivalent class (default mode)
`.`	any character except line terminators	—
`\d`	a digit	`[0-9]`
`\D`	a non-digit	`[^0-9]`
`\w`	a word character	`[a-zA-Z_0-9]`
`\W`	a non-word character	`[^a-zA-Z_0-9]`
`\s`	a whitespace character	`[ \t\n\x0B\f\r]`
`\S`	a non-whitespace character	`[^\s]`
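The equivalences in the table can be checked by masking the same input with a shorthand and with its spelled-out class. This is a minimal sketch: ShorthandDemo and mask are hypothetical names, the patterns are compiled without flags, and the doubled backslashes are Java string-literal escaping.

```java
class ShorthandDemo {
    // Replace every character matched by `regex` with '#'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String input = "id_7 ok 9!";

        // Shorthand and spelled-out class give the same result.
        System.out.println(mask("\\d", input));           // id_# ok #!
        System.out.println(mask("[0-9]", input));         // id_# ok #!
        System.out.println(mask("\\w", input));           // #### ## #!
        System.out.println(mask("[a-zA-Z_0-9]", input));  // #### ## #!

        // Uppercase forms are the complements.
        System.out.println(mask("\\W", input));           // id_7#ok#9#
        System.out.println(mask("\\s", input));           // id_7#ok#9!
        System.out.println(mask("\\S", input));           // #### ## ##
    }
}
```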

Warning

In Java source the backslash itself must be escaped, so the regex \d is written "\\d" in a string literal, and \w becomes "\\w". Forgetting the second backslash is one of the most common Java regex mistakes: "\d" is not a valid string escape, so the Java compiler rejects it before the regex engine ever sees it.

String digits = "\\d+";          // regex is \d+  (one or more digits)
String word   = "\\w+";          // regex is \w+
"abc123".replaceAll("\\d", "#"); // "abc###"

Negation vs. intersection vs. union

A bracket class is a union by default: list [abcxyz] and you match any of the six. Nesting one class inside another is also a union. Java adds the && operator for intersection (match only characters in both sets). There is no separate subtraction operator; you subtract by intersecting with a negated class.

Construct	Meaning
`[a-d[m-p]]`	union: `a`–`d` or `m`–`p`
`[a-z&&[aeiou]]`	intersection: lowercase letters that are vowels
`[a-z&&[^aeiou]]`	subtraction: lowercase letters that are not vowels (consonants)

"[a-z&&[^aeiou]]"  // all lowercase consonants
"[\\p{L}&&[^\\p{Lu}]]" // any letter that is not uppercase

Negation with ^ flips an entire class; intersection with && narrows it. Reaching for the right one keeps patterns readable instead of piling up alternations.
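To see the three operations side by side, the sketch below filters one string through each construct from the table. SetOpsDemo and keep are hypothetical names introduced for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetOpsDemo {
    // Keep only the characters of `input` that the one-character class matches.
    static String keep(String charClass, String input) {
        StringBuilder kept = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(input);
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "banana-Mop";

        // Union: a-d or m-p.
        System.out.println(keep("[a-d[m-p]]", input));      // bananaop
        // Intersection: lowercase letters that are vowels.
        System.out.println(keep("[a-z&&[aeiou]]", input));  // aaao
        // Subtraction: lowercase letters minus the vowels.
        System.out.println(keep("[a-z&&[^aeiou]]", input)); // bnnp
        // Negation: everything that is not a lowercase letter.
        System.out.println(keep("[^a-z]", input));          // -M
    }
}
```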

Unicode and POSIX named classes

For anything beyond ASCII, use the \p{...} family; \P{...} is the negated form. It covers two groups: POSIX-style names such as \p{Lower}, which are ASCII-only unless you set UNICODE_CHARACTER_CLASS, and Unicode categories and properties such as \p{L}, which always cover the full Unicode range. They are essential once your input contains accented letters or non-Latin scripts, or when you simply want intent-revealing names.

Class	Matches
`\p{Lower}`	a lowercase ASCII letter (`[a-z]`); any Unicode lowercase letter with `UNICODE_CHARACTER_CLASS`
`\p{Punct}`	an ASCII punctuation character
`\p{Alnum}`	an ASCII letter or digit
`\p{L}`	any Unicode letter (no flag needed)
`\p{IsDigit}`	a Unicode digit

// Make \w, \d, \s honor full Unicode, not just ASCII:
java.util.regex.Pattern uni =
    java.util.regex.Pattern.compile("\\w+", java.util.regex.Pattern.UNICODE_CHARACTER_CLASS);
System.out.println(uni.matcher("café").results().count()); // 1: "café" is one word
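The difference between the POSIX-style names and the Unicode categories shows up as soon as the input leaves ASCII. The following is a small sketch, with NamedClassDemo and matches as hypothetical names; non-ASCII characters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

class NamedClassDemo {
    // True if `regex`, compiled with `flags`, matches the whole of `input`.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        int uni = Pattern.UNICODE_CHARACTER_CLASS;
        String eAcute = "\u00E9";      // U+00E9, e with acute accent
        String arabicThree = "\u0663"; // U+0663, Arabic-Indic digit three

        // POSIX name: ASCII-only by default, Unicode-aware with the flag.
        System.out.println(matches("\\p{Lower}", eAcute, 0));   // false
        System.out.println(matches("\\p{Lower}", eAcute, uni)); // true

        // Unicode category and binary property: no flag needed.
        System.out.println(matches("\\p{L}", eAcute, 0));            // true
        System.out.println(matches("\\p{IsLowercase}", eAcute, 0));  // true
        System.out.println(matches("\\p{IsDigit}", arabicThree, 0)); // true

        // The shorthand \d follows the same flag as the POSIX names.
        System.out.println(matches("\\d", arabicThree, 0));   // false
        System.out.println(matches("\\d", arabicThree, uni)); // true

        // \P{...} is the negated form.
        System.out.println(matches("\\P{L}", "7", 0));        // true
    }
}
```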

A worked example: every class kind on one string

This program builds a tiny helper, count, that reports how many characters of a sample string a one-character class matches. Running the same string through digit, word, whitespace, range, negation, intersection, the dot, and a POSIX class makes the relationships between them concrete — and shows a class doing real work inside a larger pattern.


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
  // Count how many chars of `input` match the single-class `regex`.
  static long count(String regex, String input) {
    return Pattern.compile(regex).matcher(input).results().count();
  }

  public static void main(String[] args) {
    String sample = "Order #A7! cost $19.50, ship-by 2026-05-30.";
    System.out.println("sample: " + sample);
    System.out.println();

    // 1. Predefined classes: \d digits, \w word chars, \s whitespace.
    System.out.println("\\d (digits)      : " + count("\\d", sample));
    System.out.println("\\w (word chars)  : " + count("\\w", sample));
    System.out.println("\\s (whitespace)  : " + count("\\s", sample));

    // 2. Custom ranges: [0-9] matches the same characters as \d here.
    System.out.println("[0-9] (range)    : " + count("[0-9]", sample));
    System.out.println("[A-Za-z] letters : " + count("[A-Za-z]", sample));

    // 3. Negation: [^0-9] and \D are two spellings of the same complement.
    System.out.println("[^0-9] non-digit : " + count("[^0-9]", sample));
    System.out.println("\\D non-digit     : " + count("\\D", sample));

    // 4. Intersection: vowels that are also lowercase letters.
    System.out.println("[a-z&&[aeiou]]   : " + count("[a-z&&[aeiou]]", sample));

    // 5. The dot matches any char except line terminators by default.
    System.out.println(". (any char)     : " + count(".", sample));

    // 6. A class inside a real pattern: find a simple SKU like A7.
    Pattern sku = Pattern.compile("[A-Z]\\d");
    Matcher m = sku.matcher(sample);
    if (m.find()) {
      System.out.println("first [A-Z]\\d    : " + m.group());
    }

    // 7. POSIX-style \p{...} names: \p{Punct} is ASCII punctuation by default.
    System.out.println("\\p{Punct} count  : " + count("\\p{Punct}", sample));
  }
}

What to take from the run:

\d and [0-9] both report 13 for this string. In Java they are equivalent by default for any input; only the UNICODE_CHARACTER_CLASS flag widens \d to non-ASCII digits. A predefined class is just a built-in name for a set you could spell out by hand, so use the shorthand for readability.
\D and [^0-9] both report 30, the complement of the 13 digits in a 43-character string (13 + 30 = 43). An uppercase shorthand and a negated bracket class are two spellings of the same complement.
[a-z&&[aeiou]] reports 3 — only the lowercase vowels, which are the e of Order, the o of cost, and the i of ship-by. The leading O of Order is uppercase, so the intersection excludes it. The point: && narrows a class to the characters in both sets rather than uniting them, so uppercase vowels never qualify.
The dot matched 43, every character including spaces and punctuation, because this single-line string has no line terminator for . to stop at. The dot is the broadest class of all, which is exactly why you usually replace it with a tighter one.
The [A-Z]\d pattern found A7: character classes are not only for counting — combined into a sequence they validate structure, here a letter immediately followed by a digit. \p{Punct} separately found 9 punctuation marks, showing the named POSIX classes work alongside the shorthands.
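The arithmetic in these notes can be re-checked with a few lines. This sketch uses a find() loop instead of Matcher.results(), which is only available from Java 9, so it should also run on Java 8; ComplementCheck is a hypothetical class name and its count helper is a stand-in for the one in the program above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComplementCheck {
    // Count matches of `regex` in `input` with a plain find() loop.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sample = "Order #A7! cost $19.50, ship-by 2026-05-30.";

        // A class and its complement together cover every character.
        int digits = count("\\d", sample);
        int nonDigits = count("\\D", sample);
        System.out.println(digits + " + " + nonDigits + " = " + sample.length()); // 13 + 30 = 43

        // For this ASCII sample, word chars, whitespace and punctuation do not overlap.
        int word = count("\\w", sample);
        int space = count("\\s", sample);
        int punct = count("\\p{Punct}", sample);
        System.out.println(word + " + " + space + " + " + punct + " = " + (word + space + punct)); // 29 + 5 + 9 = 43
    }
}
```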

Practice

In a Java regex, which character class matches any single lowercase letter that is also a vowel?

[a-z&&[aeiou]]: the && operator forms the intersection of the lowercase range and the vowel set
[a-z||aeiou]: the || operator unions the two sets so only common members remain
[a-z[aeiou]]: nesting one class in another keeps only the characters in both
[^a-z&&aeiou]: the leading ^ restricts the class to the vowels

Where to go next

A character class matches one character; the next step is controlling how many of them you match and what to do with the result:

Regex quantifiers — add +, *, ?, and {n,m} so a class matches a run of characters.
Capturing groups — wrap a class in (...) to pull a matched substring back out.
Pattern and Matcher — the API used in every example above, in depth.
Regex flags — CASE_INSENSITIVE, UNICODE_CHARACTER_CLASS, and friends that change how classes behave.