W3docs

Java Regex Character Classes

Predefined and custom character classes in Java regex — \d, \w, \s, ranges, negations, and intersections.

A character class is the part of a regular expression that matches one character out of a set of allowed characters. Whenever you write \d, [aeiou], or [^0-9], you are using a character class. They are the smallest building block of almost every useful pattern, and Java's java.util.regex engine gives you three kinds: ready-made shorthands like \d, custom sets you spell out in [...], and named classes written \p{...}, such as the POSIX-style \p{Lower} and the Unicode category \p{L}. Get these right and the rest of regex falls into place.

Custom classes: brackets, ranges, and negation

The square-bracket class [...] matches any single character listed inside it. Spell out the characters one by one, or use a hyphen to express a range. Put ^ right after the opening bracket to negate the set — match any character that is not listed.

"[abc]"      // matches one 'a', 'b', or 'c'
"[a-z]"      // any one lowercase letter (a range)
"[A-Za-z0-9]" // any letter or digit (three ranges in one class)
"[^aeiou]"   // any single character that is NOT a vowel
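To see which characters each of these classes actually picks out, here is a minimal sketch. The class name BracketClassDemo and the helper matched are hypothetical names introduced for illustration; the helper simply concatenates every one-character match found in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketClassDemo {
    // Hypothetical helper: concatenates every single-character match
    // of charClass found in input, in order.
    public static String matched(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(matched("[abc]", "a cab bed"));    // acabb
        System.out.println(matched("[a-z]", "Hi 5!"));        // i
        System.out.println(matched("[A-Za-z0-9]", "Hi 5!"));  // Hi5
        // Negation matches anything not listed, so the space is kept too.
        System.out.println(matched("[^aeiou]", "audio x"));   // d x
    }
}
```

Note that the negated class matches the space as well as the consonants: "not a vowel" means every other character, not only letters.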

Inside a class most metacharacters lose their special meaning, so you rarely need to escape them. The exceptions are ], ^, -, and \, plus [ and &&, which Java uses for nested classes and intersection. Escape those when you mean them literally, or position them so they cannot be misread (a - first or last is a literal hyphen, not a range).

java.util.regex.Pattern p = java.util.regex.Pattern.compile("[-+0-9]");
System.out.println(p.matcher("-").find());   // true: leading - is literal
System.out.println(p.matcher("*").find());   // false
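The escaping rules can be checked with a short sketch. The class name BracketEscapes and the helper accepts are hypothetical; the helper uses Pattern.matches, so it returns true only when the whole input is one character accepted by the class.

```java
import java.util.regex.Pattern;

public class BracketEscapes {
    // Hypothetical helper: true when the entire input matches the class,
    // i.e. the input is exactly one accepted character.
    public static boolean accepts(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // Metacharacters such as . * + ? ( ) are literal inside brackets.
        System.out.println(accepts("[.*+?()]", "*"));  // true
        System.out.println(accepts("[.*+?()]", "x"));  // false: '.' is no wildcard here

        // Escaped ], ^, - and \ are literal; each escape needs \\ in Java source.
        String tricky = "[\\]\\^\\-\\\\]";
        System.out.println(accepts(tricky, "]"));      // true
        System.out.println(accepts(tricky, "\\"));     // true: a single backslash
        System.out.println(accepts(tricky, "a"));      // false

        // A ^ that is not the first character is an ordinary character.
        System.out.println(accepts("[a^]", "^"));      // true
    }
}
```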

Predefined classes: the shorthands

Java ships shorthands for the sets you reach for constantly. Each lowercase form has an uppercase complement that matches everything the lowercase form does not.

Shorthand   Matches                                  Equivalent class
.           any character except line terminators
\d          a digit                                  [0-9]
\D          a non-digit                              [^0-9]
\w          a word character                         [a-zA-Z_0-9]
\W          a non-word character                     [^a-zA-Z_0-9]
\s          a whitespace character                   [ \t\n\x0B\f\r]
\S          a non-whitespace character               [^\s]

Remember that in Java source the backslash itself must be escaped, so the regex \d is written "\\d" in a string literal. This double-backslash trips up nearly every beginner.

String digits = "\\d+";          // regex is \d+  (one or more digits)
String word   = "\\w+";          // regex is \w+
"abc123".replaceAll("\\d", "#"); // "abc###"
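As a sketch of the shorthands doing everyday work, the three helpers below strip non-digits, split on whitespace, and test for an all-word-character string. The class and method names are hypothetical, chosen only for this illustration; the String methods replaceAll, split, and matches are standard.

```java
public class ShorthandDemo {
    // \D removes everything that is not a digit.
    public static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ splits on runs of spaces, tabs, and newlines.
    public static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \w+ accepts only letters, digits, and underscore (ASCII by default).
    public static boolean isWordOnly(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("+1 (555) 010-9999"));           // 15550109999
        System.out.println(words("  tab\tand\nnewline  ").length);     // 3
        System.out.println(isWordOnly("user_42"));                     // true
        System.out.println(isWordOnly("user-42"));                     // false: '-' is not a word character
    }
}
```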

Negation vs. intersection vs. union

A bracket class is a union by default: list [abcxyz] and you match any of the six. Java adds set operations inside classes. Nesting one class inside another also forms a union, the && operator forms an intersection (match only characters in both sets), and intersecting with a negated class subtracts that class from the set.

Construct          Meaning
[a-d[m-p]]         union: a through d, or m through p
[a-z&&[aeiou]]     intersection: lowercase letters that are vowels
[a-z&&[^aeiou]]    subtraction: lowercase letters that are not vowels (consonants)

"[a-z&&[^aeiou]]"      // all lowercase consonants
"[\\p{L}&&[^\\p{Lu}]]" // any letter that is not uppercase

Negation with ^ flips an entire class; intersection with && narrows it. Reaching for the right one keeps patterns readable instead of piling up alternations.
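The three set operations can be compared side by side on the same input. This is a minimal sketch; ClassSetOps and keep are hypothetical names, and keep just collects the characters a class matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassSetOps {
    // Hypothetical helper: returns the characters of input that charClass matches.
    public static String keep(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p.
        System.out.println(keep("[a-d[m-p]]", "badge mop xyz"));        // badmop
        // Intersection: lowercase letters that are also vowels.
        System.out.println(keep("[a-z&&[aeiou]]", "Java Regex"));       // aaee
        // Subtraction: lowercase letters minus the vowels.
        System.out.println(keep("[a-z&&[^aeiou]]", "Java Regex"));      // vgx
        // Any letter that is not uppercase.
        System.out.println(keep("[\\p{L}&&[^\\p{Lu}]]", "Java Regex 9")); // avaegex
    }
}
```

The subtraction result excludes J, R, and the space because they were never in [a-z] to begin with; the negated inner class only removes vowels from what the outer range already allows.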

Unicode and POSIX named classes

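For anything beyond ASCII, use the \p{...} family; \P{...} is the negated form. It covers two groups: Unicode categories and properties such as \p{L}, which are always Unicode-aware, and POSIX-style names such as \p{Lower}, which are ASCII-only unless you compile with UNICODE_CHARACTER_CLASS. They are essential once your input contains accented letters or non-Latin scripts, and useful whenever you want intent-revealing names.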

Class        Matches
\p{Lower}    a lowercase ASCII letter ([a-z])
\p{Punct}    an ASCII punctuation character
\p{Alnum}    an ASCII letter or digit
\p{L}        any Unicode letter (no flag needed)
\p{IsDigit}  a Unicode digit

// Make \w, \d, \s honor full Unicode, not just ASCII:
java.util.regex.Pattern uni =
    java.util.regex.Pattern.compile("\\w+", java.util.regex.Pattern.UNICODE_CHARACTER_CLASS);
System.out.println(uni.matcher("café").results().count()); // 1: the single match is the whole word "café" (without the flag it would stop at "caf")
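Because a match count alone does not reveal where a match stops, the sketch below uses whole-string matching to contrast the default ASCII behavior with UNICODE_CHARACTER_CLASS. The class and method names are hypothetical; the non-ASCII characters are written as Unicode escapes so the source stays encoding-independent.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {
    private static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // True when the whole string consists of default (ASCII) word characters.
    public static boolean asciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    // True when the whole string consists of Unicode-aware word characters.
    public static boolean unicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";   // "café" with a precomposed é
        System.out.println(asciiWord(cafe));              // false: é is outside [a-zA-Z_0-9]
        System.out.println(unicodeWord(cafe));            // true
        System.out.println(cafe.matches("\\p{L}+"));      // true: \p{L} needs no flag
        System.out.println(cafe.matches("\\p{Alpha}+"));  // false: POSIX names default to ASCII

        String arabicThree = "\u0663";   // ARABIC-INDIC DIGIT THREE
        System.out.println(arabicThree.matches("\\d"));           // false by default
        System.out.println(arabicThree.matches("\\p{IsDigit}"));  // true
    }
}
```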

A worked example: every class kind on one string

This program builds a tiny helper, count, that reports how many characters of a sample string a one-character class matches. Running the same string through digit, word, whitespace, range, negation, intersection, the dot, and a POSIX class makes the relationships between them concrete — and shows a class doing real work inside a larger pattern.
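A helper of the kind described could look like the sketch below. It is not the page's original program: the class name CountDemo, the helper firstMatch, and above all the SAMPLE string are stand-ins assumed for illustration, so the counts it prints differ from the figures quoted for the original 43-character string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountDemo {
    // Stand-in sample (an assumption, not the original string): 30 characters.
    public static final String SAMPLE = "Order A7: 3 items, cost $42.50";

    // Counts how many characters of input a one-character class matches.
    public static int count(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Returns the first match of a larger pattern, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(count("\\d", SAMPLE));             // 6
        System.out.println(count("[0-9]", SAMPLE));           // 6, same set spelled out
        System.out.println(count("\\D", SAMPLE));             // 24
        System.out.println(count("[^0-9]", SAMPLE));          // 24 (6 + 24 = 30)
        System.out.println(count("\\w", SAMPLE));             // 21
        System.out.println(count("\\s", SAMPLE));             // 5
        System.out.println(count("[a-z]", SAMPLE));           // 13
        System.out.println(count("[a-z&&[aeiou]]", SAMPLE));  // 4
        System.out.println(count(".", SAMPLE));               // 30
        System.out.println(count("\\p{Punct}", SAMPLE));      // 4
        // A class inside a larger pattern: a capital letter followed by a digit.
        System.out.println(firstMatch("[A-Z]\\d", SAMPLE));   // A7
    }
}
```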

What to take from the run:

  • \d and [0-9] both report 13 for this string. By default the two are exactly equivalent; only the UNICODE_CHARACTER_CLASS flag widens \d to non-ASCII digits. A predefined class is just a built-in name for a set you could spell out by hand, so use the shorthand for readability.
  • \D and [^0-9] both report 30, the complement of the 13 digits in a 43-character string (13 + 30 = 43). An uppercase shorthand and a negated bracket class are two spellings of the same complement.
  • [a-z&&[aeiou]] reports 3: only lowercase vowels are counted, so an uppercase vowel such as the leading O of Order is excluded. && narrows a class to the intersection of two sets rather than uniting them.
  • The dot matched 43, every character including spaces and punctuation, because this single-line string has no line terminator for . to stop at. The dot is the broadest class of all, which is exactly why you usually replace it with a tighter one.
  • The [A-Z]\d pattern found A7: character classes are not only for counting — combined into a sequence they validate structure, here a letter immediately followed by a digit. \p{Punct} separately found 9 punctuation marks, showing the named POSIX classes work alongside the shorthands.

Practice

In a Java regex, which character class matches any single lowercase letter that is also a vowel?