How to Count Words in a String in Java
Count the number of words in a Java string using split and a regex matcher.
Counting the words in a string sounds trivial — split on spaces and count the pieces — but the naive version breaks the moment the input has leading, trailing, or repeated spaces. This chapter shows the idiomatic ways to count words in Java, the edge cases that trip people up, and which approach to reach for.
Split on whitespace (the reliable default)
The standard recipe is to trim the string, then split it on one or more whitespace characters with the regex \s+:
String text = " The quick brown fox ";
String trimmed = text.trim();
int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
System.out.println(words); // 4
Two details make this correct. The trim() removes the leading and trailing spaces, because split("\\s+") on a string that starts with whitespace produces an empty leading element. The isEmpty() guard handles blank input: splitting "" returns an array of length 1, not 0, so without the check a blank string would wrongly report one word.
\\s is how the regex \s is written inside a Java string literal; \s matches whitespace characters such as spaces, tabs, and newlines. The + means "one or more," so any run of whitespace counts as a single separator.
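The recipe above can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the class name WordCounter, the method name countWords, and the choice to treat null as zero words are assumptions made for illustration.

```java
public class WordCounter {

    // Counts whitespace-separated words.
    // Assumption of this sketch: null is treated as "no words" and returns 0.
    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // Guard needed because "".split("\\s+") returns [""] (length 1).
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords(" The quick brown fox ")); // 4
        System.out.println(countWords("one\ttwo\nthree"));        // 3
        System.out.println(countWords("   "));                    // 0
        System.out.println(countWords(""));                       // 0
    }
}
```

Keeping the trim and the guard in one method means callers cannot forget either step.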
Avoid the naive single-space split
It is tempting to write text.split(" ").length, but that splits on exactly one space and breaks on real-world input:
String text = " The quick brown fox ";
System.out.println(text.split(" ").length); // 5, not 4
Every doubled space and every leading space creates an empty element, inflating the count. Splitting on " " is only safe when you already know the words are separated by exactly one space with no extra spacing at either end, which is rarely guaranteed.
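To see where the extra elements come from, it can help to print the array itself. The sketch below uses a hypothetical input with two leading spaces, a doubled space between the first two words, and one trailing space; the class and method names are illustrative only.

```java
import java.util.Arrays;

public class NaiveSplitDemo {

    // The naive approach: split on exactly one space character.
    static String[] naiveSplit(String text) {
        return text.split(" ");
    }

    // The robust approach: trim, then split on runs of whitespace.
    // (Blank input would still need an isEmpty() guard; omitted here.)
    static String[] whitespaceSplit(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical input: 2 leading spaces, 2 spaces after "The", 1 trailing space.
        String text = "  The  quick brown fox ";

        String[] naive = naiveSplit(text);
        System.out.println(Arrays.toString(naive)); // [, , The, , quick, brown, fox]
        System.out.println(naive.length);           // 7

        String[] robust = whitespaceSplit(text);
        System.out.println(Arrays.toString(robust)); // [The, quick, brown, fox]
        System.out.println(robust.length);           // 4
    }
}
```

The trailing space adds nothing to the naive count because Java's split drops trailing empty strings; only leading and interior empties survive.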
Count tokens with a regex matcher
Instead of splitting, you can match word tokens directly and count the matches. This sidesteps the empty-element problem entirely:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Matcher m = Pattern.compile("\\w+").matcher(text);
int count = 0;
while (m.find()) {
    count++;
}
\w+ matches runs of word characters (by default ASCII letters, digits, and underscore). Because it looks for words rather than for separators, leading, trailing, and repeated whitespace are irrelevant, so no guard is needed. The trade-off is that \w excludes punctuation, so a hyphenated "well-known" counts as two tokens. Choose the pattern to match your definition of a "word."
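As an illustration of choosing a pattern to fit your definition of a word, the sketch below compares \w+ with one possible alternative that keeps hyphenated and apostrophe-joined words together. The alternative pattern, the class name TokenCounter, and the sample sentence are assumptions for this example, not a recommended standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenCounter {

    // Runs of word characters: ASCII letters, digits, underscore by default.
    static final Pattern WORD_CHARS = Pattern.compile("\\w+");

    // Illustrative alternative: Unicode letters/digits, optionally joined
    // by a single hyphen or apostrophe (e.g. well-known, doesn't).
    static final Pattern JOINED_WORDS =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*");

    // Counts how many times the pattern matches in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "A well-known fox doesn't wait.";

        // A, well, known, fox, doesn, t, wait
        System.out.println(countMatches(WORD_CHARS, text));   // 7

        // A, well-known, fox, doesn't, wait
        System.out.println(countMatches(JOINED_WORDS, text)); // 5
    }
}
```

Compiling each Pattern once as a constant avoids recompiling the regex on every call.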
| Approach | Handles extra/edge spaces | Empty-string safe | Note |
|---|---|---|---|
| split(" ") | No | No | Breaks on repeated spaces |
| trim().split("\\s+") | Yes | With isEmpty() guard | The go-to default |
| Pattern \w+ matcher | Yes | Yes (counts 0) | Splits on punctuation |
A complete worked example
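The listing for this example is not reproduced in the text, so the following is a sketch that prints the four output lines discussed in this section. The input string (two leading spaces, a three-space run between "quick" and "brown", two trailing spaces), the blank input, and all class and method names are assumptions chosen so that the counts match the reported run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountDemo {

    // Naive approach: split on exactly one space.
    static int naiveSplit(String text) {
        return text.split(" ").length;
    }

    // Reliable default: trim, guard against blank input, split on whitespace runs.
    static int whitespaceSplit(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Token matching: count runs of word characters.
    static int regexMatcher(String text) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed input: 2 leading spaces, 3 spaces before "brown", 2 trailing spaces.
        String text = "  The quick   brown fox  ";
        // Assumed blank input: three spaces.
        String blank = "   ";

        System.out.println("Naive split length: " + naiveSplit(text));      // 8
        System.out.println("Whitespace split: " + whitespaceSplit(text));   // 4
        System.out.println("Regex matcher: " + regexMatcher(text));         // 4
        System.out.println("Blank input words: " + whitespaceSplit(blank)); // 0
    }
}
```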
What to take from the run:
- Naive split length: 8 proves that split(" ") overcounts badly because every leading space and every extra space inside a run produces an empty array element.
- Whitespace split: 4 shows that trim().split("\\s+") collapses every run of whitespace and gives the correct count.
- Regex matcher: 4 confirms the \w+ matcher reaches the same answer without any trimming or guards.
- Blank input words: 0 demonstrates why the isEmpty() guard matters: without it, blank input would wrongly report one word.
- The two correct methods agree on 4, so what separates them is robustness and the definition of a "word," not the result on input without punctuation.
Practice
Why does splitting on a single space overcount words when the string has repeated or leading spaces?