W3docs

How to Count Words in a String in Java

Count the number of words in a Java string using split and a regex matcher.

Counting the words in a string sounds trivial — split on spaces and count the pieces — but the naive version breaks the moment the input has leading, trailing, or repeated spaces. This chapter shows the idiomatic ways to count words in Java, the edge cases that trip people up, and which approach to reach for.

Split on whitespace (the reliable default)

The standard recipe is to trim the string, then split it on one or more whitespace characters with the regex \s+:

String text = "  The quick   brown fox  ";
String trimmed = text.trim();
int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
System.out.println(words); // 4

Two details make this correct. The trim() call matters because split("\\s+") on a string that starts with whitespace produces an empty leading element (trailing empty elements are discarded by split automatically). The isEmpty() guard handles blank input: splitting "" returns an array of length 1, not 0, so without the check a blank string would wrongly report one word.
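Both pitfalls are easy to see by splitting without the trim or the guard. The following is a minimal sketch; the class and method names (SplitPitfalls, splitUntrimmed) are illustrative and not part of any library:

```java
import java.util.Arrays;

class SplitPitfalls {
    // Hypothetical helper: splits on runs of whitespace WITHOUT trimming first.
    static String[] splitUntrimmed(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // Leading whitespace yields an empty first element.
        System.out.println(Arrays.toString(splitUntrimmed("  The quick"))); // [, The, quick]

        // The empty string splits into a one-element array holding "".
        System.out.println(splitUntrimmed("").length); // 1
    }
}
```

The first result has three elements for a two-word input, and the second reports a length of 1 for an input with no words at all, which is exactly what trim() and isEmpty() protect against.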

The "\\s" in the Java string literal is how you write the regex \s, which matches whitespace characters such as spaces, tabs, and newlines. The + means "one or more," so any run of whitespace counts as a single separator.
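If you need the recipe in more than one place, it can be wrapped in a small helper. This is a sketch under a few assumptions: the names WordCounter and countWords are made up for illustration, and treating null as zero words is a design choice rather than something the recipe requires:

```java
class WordCounter {
    // Hypothetical helper: counts whitespace-separated words in text.
    static int countWords(String text) {
        if (text == null) {
            return 0; // assumption: null input is treated as having no words
        }
        String trimmed = text.trim();
        // Guard against blank input, then split on runs of whitespace.
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Tabs, newlines, and repeated spaces all act as one separator.
        System.out.println(countWords("one\ttwo\nthree   four")); // 4
        // Whitespace-only input trims down to the empty string.
        System.out.println(countWords("   ")); // 0
    }
}
```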

Avoid the naive single-space split

It is tempting to write text.split(" ").length, but that splits on exactly one space and breaks on real-world input:

String text = "  The quick   brown fox  ";
System.out.println(text.split(" ").length); // 8, not 4

Every doubled space and every leading space creates an empty element, inflating the count. Splitting on " " is only safe when you already know the input is a single space–delimited line with no extra spacing — which is rarely guaranteed.
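Printing the array itself shows where the extra elements come from. A small sketch, with NaiveSplitDemo and naiveSplit as illustrative names:

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Hypothetical helper: the naive split on exactly one space.
    static String[] naiveSplit(String text) {
        return text.split(" ");
    }

    public static void main(String[] args) {
        String[] parts = naiveSplit("  The quick   brown fox  ");
        // Empty strings appear for the two leading spaces and inside the triple space.
        System.out.println(Arrays.toString(parts)); // [, , The, quick, , , brown, fox]
        System.out.println(parts.length); // 8
    }
}
```

The two trailing spaces add nothing here only because split drops trailing empty strings; the four empty elements at the front and in the middle remain and are counted as words.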

Count tokens with a regex matcher

Instead of splitting, you can match word tokens directly and count the matches. This sidesteps the empty-element problem entirely:

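import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "  The quick   brown fox  ";
// Match each run of word characters and count the matches.
Matcher m = Pattern.compile("\\w+").matcher(text);
int count = 0;
while (m.find()) {
    count++;
}
System.out.println(count); // 4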

\w+ matches runs of word characters (by default ASCII letters, digits, and underscore). Because it looks for words rather than for separators, leading, trailing, and repeated whitespace are irrelevant, so no guard is needed. The trade-off is that \w excludes punctuation, so a hyphenated "well-known" counts as two tokens, and without the Pattern.UNICODE_CHARACTER_CLASS flag accented letters such as é are not matched either. Choose the pattern to match your definition of a "word."
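To illustrate how the pattern changes the count, the sketch below compares \w+ with one possible alternative that keeps internal hyphens and apostrophes inside a token. The alternative pattern, the class name TokenPatterns, and the sample sentence are assumptions chosen for illustration, not a recommended definition of a word:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPatterns {
    // Plain word characters: letters, digits, underscore.
    static final Pattern WORD_CHARS = Pattern.compile("\\w+");

    // Illustrative alternative: Unicode letters or digits, optionally joined
    // by a single hyphen or apostrophe (so "well-known" and "don't" stay whole).
    static final Pattern HYPHEN_AWARE =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*");

    // Counts how many times the pattern matches in the text.
    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "A well-known fact: don't panic";
        System.out.println(count(WORD_CHARS, text));   // 7
        System.out.println(count(HYPHEN_AWARE, text)); // 5
    }
}
```

With \w+ the tokens are A, well, known, fact, don, t, panic; the second pattern yields A, well-known, fact, don't, panic. Neither is more correct in general; it depends on what you want to call a word.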

| Approach | Handles extra/edge spaces | Empty-string safe | Note |
|---|---|---|---|
| split(" ") | No | No | Breaks on repeated spaces |
| trim().split("\\s+") | Yes | With isEmpty() guard | The go-to default |
| Pattern \w+ matcher | Yes | Yes (counts 0) | Splits on punctuation |

A complete worked example
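
The program below is a minimal sketch that puts the approaches side by side. It assumes the same sample string used earlier in this chapter and a whitespace-only string for the blank case; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCountExample {
    // Naive approach: split on exactly one space.
    static int naiveCount(String text) {
        return text.split(" ").length;
    }

    // Reliable default: trim, guard against blank input, split on whitespace runs.
    static int whitespaceCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Matcher approach: count runs of word characters.
    static int regexCount(String text) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "  The quick   brown fox  ";
        System.out.println("Naive split length: " + naiveCount(text));    // 8
        System.out.println("Whitespace split: " + whitespaceCount(text)); // 4
        System.out.println("Regex matcher: " + regexCount(text));         // 4
        System.out.println("Blank input words: " + whitespaceCount("   ")); // 0
    }
}
```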

What to take from the run:

  • Naive split length: 8 proves that split(" ") overcounts badly because every leading space and every space inside a run produces an extra empty array element.
  • Whitespace split: 4 shows that trim().split("\\s+") collapses every run of whitespace and gives the correct count.
  • Regex matcher: 4 confirms the \w+ matcher reaches the same answer without any trimming or guards.
  • Blank input words: 0 demonstrates why the isEmpty() guard matters — without it, blank input would wrongly report one word.
  • Both correct methods agree on 4, so the difference between them is robustness and definition of a "word," not the result on clean input.

Practice

Why does splitting on a single space overcount words when the string has repeated or leading spaces?