How to Count Words in a String in Java

Counting the words in a string sounds trivial — split on spaces and count the pieces — but the naive version breaks the moment the input has leading, trailing, or repeated spaces. This chapter shows the idiomatic ways to count words in Java, the edge cases that trip people up, and which approach to reach for. It builds on the basics in Java Strings and String split and join.

Split on whitespace (the reliable default)

The standard recipe is to trim the string, then split it on one or more whitespace characters with the regex \s+:

String text = "  The quick   brown fox  ";
String trimmed = text.trim();
int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
System.out.println(words); // 4

Two details make this correct. trim() removes leading and trailing whitespace, which matters because split("\\s+") on a string that starts with whitespace produces an empty first element. The isEmpty() guard handles blank input: splitting "" returns an array of length 1 (holding the empty string itself), not 0, so without the check a blank string would wrongly report one word.

"\\s" is how the regex \s is written inside a Java string literal, because the backslash itself must be escaped. \s matches whitespace characters such as spaces, tabs, and newlines. The + means "one or more," so any run of whitespace counts as a single separator.
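
To see both details in isolation, the sketch below runs the split once without trim() and once without the guard, then wraps the full recipe in a helper. The class name SplitPitfalls and the method countWords are illustrative names chosen for this example, not part of any library.

```java
import java.util.Arrays;

class SplitPitfalls {
    // Counts words with the trim-then-split recipe described above.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // No trim(): the leading whitespace yields an empty first element.
        String[] untrimmed = "  The quick".split("\\s+");
        System.out.println(Arrays.toString(untrimmed)); // [, The, quick]

        // No isEmpty() guard: splitting "" still returns one (empty) element.
        System.out.println("".split("\\s+").length); // 1

        // Tabs and newlines are separators too, and each run collapses into one.
        System.out.println(countWords("one\ttwo \n three")); // 3
        System.out.println(countWords("   "));               // 0
    }
}
```

Keeping the recipe in one small method means the trim and the guard cannot be forgotten at individual call sites.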

Avoid the naive single-space split

It is tempting to write text.split(" ").length, but that splits on exactly one space and breaks on real-world input:

String text = "  The quick   brown fox  ";
System.out.println(text.split(" ").length); // 8, not 4

The two leading spaces produce two empty leading elements, and the run of three internal spaces adds two more, so the array is ["", "", "The", "quick", "", "", "brown", "fox"], eight elements instead of four. (Java's split discards trailing empty strings, which is why the two trailing spaces add nothing.) Every leading space and every extra space inside a run inflates the count. Splitting on " " is only safe when you already know the input is a single line delimited by single spaces with no padding, which is rarely guaranteed.
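
Printing the array makes the empty elements visible. This is a small sketch for inspection only; NaiveSplitDemo and naivePieces are names made up for the example.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Returns the raw pieces produced by splitting on exactly one space.
    static String[] naivePieces(String text) {
        return text.split(" ");
    }

    public static void main(String[] args) {
        String[] pieces = naivePieces("  The quick   brown fox  ");
        System.out.println(Arrays.toString(pieces)); // [, , The, quick, , , brown, fox]
        System.out.println(pieces.length);           // 8

        // Single-spaced input with no padding is the one case where it works.
        System.out.println(naivePieces("The quick brown fox").length); // 4
    }
}
```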

Count tokens with a regex matcher

Instead of splitting, you can match word tokens directly and count the matches. This sidesteps the empty-element problem entirely:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// text is the sample string defined in the earlier examples.
Matcher m = Pattern.compile("\\w+").matcher(text);
int count = 0;
while (m.find()) {
    count++;
}

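\w+ matches runs of word characters, which in Java means ASCII letters, digits, and the underscore by default. Because it looks for words rather than for separators, leading, trailing, and repeated whitespace are irrelevant, and no guard is needed. The trade-off is that \w excludes punctuation, so a hyphenated "well-known" counts as two tokens, as does a contraction like "don't". Non-ASCII letters such as é are also excluded unless the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS. Choose the pattern to match your definition of a "word."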
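
How much the pattern matters is easiest to see by running several patterns over the same sentence. The sketch below is one possible comparison, not a recommendation: TokenPatterns and countMatches are hypothetical names, and the third pattern is just one assumed definition of a word that keeps hyphens and apostrophes inside it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPatterns {
    // Counts non-overlapping matches of the given regex in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "A well-known fox doesn't wait.";

        // \w+ breaks at the hyphen and at the apostrophe.
        System.out.println(countMatches("\\w+", text)); // 7

        // \S+ treats every run of non-whitespace as one word.
        System.out.println(countMatches("\\S+", text)); // 5

        // Word characters, optionally joined by a single ' or -.
        System.out.println(countMatches("\\w+(?:['-]\\w+)*", text)); // 5
    }
}
```

The last two patterns agree on the count here but not on the tokens: \S+ keeps the trailing period attached to "wait." while the third pattern leaves it out.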

If you need to tokenize a longer document or a stream rather than a single line, the Java StringTokenizer chapter covers a class built specifically for breaking text into tokens.

Approach	Handles extra/edge spaces	Empty-string safe	Note
`split(" ")`	No	No	Breaks on repeated spaces
`trim().split("\\s+")`	Yes	With `isEmpty()` guard	The go-to default
`Pattern \w+` matcher	Yes	Yes (counts 0)	Splits on punctuation
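
The "Empty-string safe" column can be checked directly. The following sketch wraps each approach in a small method (the class and method names are invented for this example) and runs all three on an empty string, an all-space string, and a double-spaced pair of words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlankInputCheck {
    // Split on exactly one space, with no trimming or guard.
    static int naive(String text) {
        return text.split(" ").length;
    }

    // Trim, guard against empty input, then split on runs of whitespace.
    static int trimSplit(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Count runs of word characters instead of splitting.
    static int matcher(String text) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"", "   ", "one  two"}) {
            System.out.printf("\"%s\": naive=%d trimSplit=%d matcher=%d%n",
                    input, naive(input), trimSplit(input), matcher(input));
        }
        // "": naive=1 trimSplit=0 matcher=0
        // "   ": naive=0 trimSplit=0 matcher=0
        // "one  two": naive=3 trimSplit=2 matcher=2
    }
}
```

The naive split is inconsistent as well as wrong: it reports 1 for "" but 0 for "   ", because in the all-space case every piece is a trailing empty string and split discards them all.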

A complete worked example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount {
    public static void main(String[] args) {
        String text = "  The quick   brown fox  ";

        // 1. Naive split on a single space (buggy on extra spaces).
        int naive = text.split(" ").length;
        System.out.println("Naive split length: " + naive);

        // 2. Trim then split on one-or-more whitespace.
        String trimmed = text.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        System.out.println("Whitespace split:   " + words);

        // 3. Regex matcher counting word tokens.
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int matched = 0;
        while (m.find()) {
            matched++;
        }
        System.out.println("Regex matcher:      " + matched);

        // 4. Empty / blank input is handled safely.
        String blank = "   ";
        String bt = blank.trim();
        int blankWords = bt.isEmpty() ? 0 : bt.split("\\s+").length;
        System.out.println("Blank input words:  " + blankWords);
    }
}

What to take from the run:

Naive split length: 8 shows that split(" ") overcounts badly, because every leading space and every extra space inside a run produces an empty array element.
Whitespace split: 4 shows that trim().split("\\s+") collapses every run of whitespace and gives the correct count.
Regex matcher: 4 confirms the \w+ matcher reaches the same answer without any trimming or guards.
Blank input words: 0 demonstrates why the isEmpty() guard matters — without it, blank input would wrongly report one word.
The two correct methods agree on 4 for this input, so the difference between them is robustness and their definition of a "word," not the result on simple text.

Practice

Why does splitting on a single space overcount words when the string has repeated or leading spaces?

Splitting on a single space produces empty string elements for each extra space, which inflate the array length
The split method always adds one to the count
Strings in Java cannot contain more than one space
The length field counts characters, not array elements