Java Regex Character Classes
Predefined and custom character classes in Java regex — \d, \w, \s, ranges, negations, and intersections.
A character class is the part of a regular expression that matches one character out of a set of allowed characters. Whenever you write \d, [aeiou], or [^0-9], you are using a character class. They are the smallest building block of almost every useful pattern, and Java's java.util.regex engine gives you three kinds: ready-made shorthands like \d, custom sets you spell out in [...], and named classes written \p{...}, such as the POSIX-style \p{Lower} and the Unicode category \p{L}. Get these right and the rest of regex falls into place.
Custom classes: brackets, ranges, and negation
The square-bracket class [...] matches any single character listed inside it. Spell out the characters one by one, or use a hyphen to express a range. Put ^ right after the opening bracket to negate the set, so that it matches any character that is not listed.
"[abc]" // matches one 'a', 'b', or 'c'
"[a-z]" // any one lowercase letter (a range)
"[A-Za-z0-9]" // any letter or digit (three ranges in one class)
"[^aeiou]" // any single character that is NOT a lowercase vowel
Inside a class most metacharacters lose their special meaning, so you rarely need to escape them. The exceptions are a literal ], ^, -, or \: escape those, or position them so they cannot be misread (a - placed first or last is a literal hyphen, not a range). Java also gives [ and && a meaning inside a class (nested class and intersection), so escape a literal [ as well.
java.util.regex.Pattern p = java.util.regex.Pattern.compile("[-+0-9]");
System.out.println(p.matcher("-").find()); // true: leading - is literal
System.out.println(p.matcher("*").find()); // false
Predefined classes: the shorthands
Java ships shorthands for the sets you reach for constantly. Each lowercase form has an uppercase complement that matches everything the lowercase form does not.
| Shorthand | Matches | Equivalent class |
|---|---|---|
| . | any character except line terminators | none |
| \d | a digit | [0-9] |
| \D | a non-digit | [^0-9] |
| \w | a word character | [a-zA-Z_0-9] |
| \W | a non-word character | [^a-zA-Z_0-9] |
| \s | a whitespace character | [ \t\n\x0B\f\r] |
| \S | a non-whitespace character | [^\s] |
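The equivalences in the table can be checked mechanically. As a minimal sketch, assuming default flags (no UNICODE_CHARACTER_CLASS), the hypothetical helper sameOnAscii below tests every ASCII character against two one-character patterns and reports whether they always agree. The class and method names are illustrative, and the patterns are written with doubled backslashes because they are Java string literals.

```java
import java.util.regex.Pattern;

class ShorthandEquivalence {
    // Hypothetical helper: true if both patterns accept exactly the same
    // characters in the ASCII range 0..127.
    static boolean sameOnAscii(String first, String second) {
        Pattern a = Pattern.compile(first);
        Pattern b = Pattern.compile(second);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnAscii("\\d", "[0-9]"));                  // true
        System.out.println(sameOnAscii("\\D", "[^0-9]"));                 // true
        System.out.println(sameOnAscii("\\w", "[a-zA-Z_0-9]"));           // true
        System.out.println(sameOnAscii("\\W", "[^a-zA-Z_0-9]"));          // true
        System.out.println(sameOnAscii("\\s", "[ \\t\\n\\x0B\\f\\r]"));   // true
        System.out.println(sameOnAscii("\\S", "[^\\s]"));                 // true
        // Sanity check: two genuinely different classes disagree.
        System.out.println(sameOnAscii("\\d", "\\w"));                    // false
    }
}
```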
Remember that in Java source the backslash itself must be escaped, so the regex \d is written "\\d" in a string literal. This double-backslash trips up nearly every beginner.
String digits = "\\d+"; // regex is \d+ (one or more digits)
String word = "\\w+"; // regex is \w+ (one or more word characters)
"abc123".replaceAll("\\d", "#"); // "abc###"
Negation vs. intersection vs. union
A bracket class is a union by default: write [abcxyz] and you match any one of the six characters. Java adds set operators inside classes. Nesting one class inside another, as in [a-d[m-p]], is also a union. The && operator forms an intersection (match only characters in both sets), and intersecting with a negated class, as in [a-z&&[^aeiou]], subtracts one set from another.
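As a minimal sketch of these operators, the hypothetical helper keep below returns the characters of a string that a one-character class accepts, so the effect of each operator is visible at a glance. The class name, helper name, and sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassSetOps {
    // Hypothetical helper: concatenates every character of text that the class matches.
    static String keep(String charClass, String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(text);
        while (m.find()) {
            out.append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Map of Peru"; // sample input chosen for this sketch
        System.out.println(keep("[a-d[m-p]]", text));           // union: "apo"
        System.out.println(keep("[a-z&&[aeiou]]", text));       // intersection: "aoeu"
        System.out.println(keep("[a-z&&[^aeiou]]", text));      // subtraction: "pfr"
        System.out.println(keep("[\\p{L}&&[^\\p{Lu}]]", text)); // letters minus uppercase: "apoferu"
    }
}
```

The uppercase M and P never appear in the output: the first three classes only contain lowercase ranges, and the last one subtracts uppercase letters explicitly.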
| Construct | Meaning |
|---|---|
| [a-d[m-p]] | union: a–d or m–p |
| [a-z&&[aeiou]] | intersection: lowercase letters that are vowels |
| [a-z&&[^aeiou]] | subtraction: lowercase letters that are not vowels (consonants) |

"[a-z&&[^aeiou]]" // all lowercase consonants
"[\\p{L}&&[^\\p{Lu}]]" // any letter that is not uppercase
Negation with ^ flips an entire class; intersection with && narrows it. Reaching for the right one keeps patterns readable instead of piling up alternations.
Unicode and POSIX named classes
For anything beyond ASCII, use the \p{...} family. These named classes cover Unicode categories such as \p{L} as well as POSIX-style groups such as \p{Lower}; the POSIX-style groups are ASCII-only unless the UNICODE_CHARACTER_CLASS flag is set. \P{...} is the negated form. They are essential once your input contains accented letters or non-Latin scripts, and useful whenever you want intent-revealing names.
| Class | Matches |
|---|---|
| \p{Lower} | a lowercase ASCII letter ([a-z]) |
| \p{Punct} | an ASCII punctuation character |
| \p{Alnum} | an ASCII letter or digit |
| \p{L} | any Unicode letter (no flag needed) |
| \p{IsDigit} | a Unicode digit |
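The difference between the ASCII-only POSIX classes and the Unicode categories shows up as soon as the input has an accented letter. The sketch below is a small illustration under that assumption; the class UnicodeClassDemo and the helper matchesAll are hypothetical names, and the word is written with a Unicode escape so the result does not depend on source-file encoding.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // Hypothetical helper: true if the whole input matches regex under the given flags.
    static boolean matchesAll(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" with an acute accent on the final e
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        // POSIX class, default flags: ASCII only, so the accented letter fails.
        System.out.println(matchesAll("\\p{Lower}+", 0, word));       // false
        // Unicode categories need no flag.
        System.out.println(matchesAll("\\p{Ll}+", 0, word));          // true
        System.out.println(matchesAll("\\p{L}+", 0, word));           // true
        // Shorthand \w is ASCII-only by default...
        System.out.println(matchesAll("\\w+", 0, word));              // false
        // ...and Unicode-aware once the flag is set, as is \p{Lower}.
        System.out.println(matchesAll("\\w+", unicode, word));        // true
        System.out.println(matchesAll("\\p{Lower}+", unicode, word)); // true
    }
}
```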
// Make \w, \d, \s honor full Unicode, not just ASCII:
java.util.regex.Pattern uni =
java.util.regex.Pattern.compile("\\w+", java.util.regex.Pattern.UNICODE_CHARACTER_CLASS);
System.out.println(uni.matcher("café").results().count()); // 1: "café" is one word
A worked example: every class kind on one string
This program builds a tiny helper, count, that reports how many characters of a sample string a one-character class matches. Running the same string through digit, word, whitespace, range, negation, intersection, the dot, and a POSIX class makes the relationships between them concrete — and shows a class doing real work inside a larger pattern.
What to take from the run:
- \d and [0-9] both report 13 for this string: they are exactly equivalent for ASCII input. A predefined class is just a built-in name for a set you could spell out by hand, so use the shorthand for readability.
- \D and [^0-9] both report 30, the complement of the 13 digits in a 43-character string (13 + 30 = 43). An uppercase shorthand and a negated bracket class are two spellings of the same complement.
- [a-z&&[aeiou]] reports 3, counting only the lowercase vowels. The leading O of Order is uppercase, so it is not among the matches: && narrows a class to the intersection of two sets rather than uniting them.
- The dot matched 43, every character including spaces and punctuation, because this single-line string has no line terminator for . to stop at. The dot is the broadest class of all, which is exactly why you usually replace it with a tighter one.
- The [A-Z]\d pattern found A7: character classes are not only for counting. Combined into a sequence they validate structure, here a letter immediately followed by a digit.
- \p{Punct} separately found 9 punctuation marks, showing that the named POSIX classes work alongside the shorthands.
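A counting helper of the kind described can be sketched as follows. The sample string here is hypothetical and much shorter than the one in the run discussed above, so its counts are different; the class name ClassCounter and the helpers count and firstMatch are likewise illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassCounter {
    // Hypothetical helper: number of non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical helper: first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sample = "Bay C4: 2 kg, OK!"; // hypothetical 17-character sample

        System.out.println(count("\\d", sample));             // 2
        System.out.println(count("[0-9]", sample));           // 2, same set spelled out
        System.out.println(count("\\D", sample));             // 15 = 17 - 2
        System.out.println(count("[^0-9]", sample));          // 15
        System.out.println(count("\\w", sample));             // 10
        System.out.println(count("\\s", sample));             // 4
        System.out.println(count("[a-z&&[aeiou]]", sample));  // 1: only the 'a'; 'O' is uppercase
        System.out.println(count(".", sample));               // 17: no line terminators to skip
        System.out.println(count("\\p{Punct}", sample));      // 3: ':' ',' '!'
        System.out.println(firstMatch("[A-Z]\\d", sample));   // C4
    }
}
```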
Practice
In a Java regex, which character class matches any single lowercase letter that is also a vowel?