Java StringTokenizer
Split strings into tokens in Java with the legacy StringTokenizer class, and when to prefer String.split instead.
java.util.StringTokenizer is the JDK's original "break a string into pieces" class: a simple tokenizer whose delimiters are individual characters, and which has been in the platform since 1.0. The Javadoc itself recommends against it for new code: "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."
That's the headline, and it's accurate — but StringTokenizer still turns up in real codebases and has a few honest niches. This chapter teaches it briefly, then says clearly when to reach for String.split or Scanner instead.
The basic loop
A StringTokenizer is an Enumeration of substrings:
StringTokenizer st = new StringTokenizer("apple banana cherry");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
// apple
// banana
// cherry
The default delimiter set is whitespace: space, tab, newline, carriage return and form feed (" \t\n\r\f"). Runs of adjacent delimiters count as one, and leading or trailing delimiters are skipped, so "  a   b  " yields exactly two tokens.
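The skipping behaviour is easiest to see next to a regex split. In this sketch (class and method names are illustrative), tokens are collected from a string with leading, trailing and repeated whitespace, and the same string is then split with split("\\s+"), which reports a leading empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class DefaultDelimiters {
    // Collects every token produced with the default delimiter set " \t\n\r\f".
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String messy = "  a \t b\n";
        // Leading, trailing and repeated whitespace never produces an empty token.
        System.out.println(tokenize(messy));                       // [a, b]
        // split("\\s+") keeps a leading empty string when the input starts with whitespace.
        System.out.println(Arrays.toString(messy.split("\\s+")));  // [, a, b]
    }
}
```

If you do use split for whitespace, calling trim() first avoids that leading empty element.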
Custom delimiters
The two-argument constructor takes a string whose characters are each treated as a separate delimiter, not as one multi-character delimiter:
StringTokenizer st = new StringTokenizer("one,two;three|four", ",;|");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
// one
// two
// three
// four
This is the design quirk that catches people. new StringTokenizer(input, ", ") treats , and space each as a delimiter, not the two-character sequence ", ". If you want multi-character delimiters, you've outgrown StringTokenizer.
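A minimal sketch of the quirk, using a hypothetical list of city names: with ", " as the delimiter set, the space inside each name is a delimiter too, while split(", ") treats the comma-space pair as one separator. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSetVsString {
    // Each character of delims is its own delimiter.
    static List<String> tokenize(String input, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String cities = "New York, Los Angeles, San Diego";
        // ',' and ' ' are separate delimiters, so multi-word names fall apart.
        System.out.println(tokenize(cities, ", "));
        // [New, York, Los, Angeles, San, Diego]

        // split treats ", " as one two-character separator.
        System.out.println(Arrays.toString(cities.split(", ")));
        // [New York, Los Angeles, San Diego]
    }
}
```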
Returning the delimiters themselves
The three-argument constructor controls whether delimiters are themselves emitted as tokens:
StringTokenizer st = new StringTokenizer("a+b*c", "+*", true);
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
// a
// +
// b
// *
// c
This is the one feature that String.split doesn't replicate directly: you'd build it with a Matcher and a regex, or with a lookaround pattern passed to split. For very simple expression parsing (the kind of thing that doesn't justify pulling in a lexer), this overload still earns its keep.
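For comparison, here is a sketch of the regex route. The pattern is an assumption chosen for this example: one alternative matches a run of non-operator characters, the other a single + or *. Inside a character class both symbols are literal, so no escaping is needed. The last line of main shows the lookaround form of split, which cuts just before and just after each operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorTokens {
    // A run of operands, or exactly one operator.
    private static final Pattern TOKEN = Pattern.compile("[^+*]+|[+*]");

    static List<String> tokenize(String expr) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a+b*c")); // [a, +, b, *, c]
        // Zero-width splits: after an operator (lookbehind) or before one (lookahead).
        System.out.println(Arrays.toString("a+b*c".split("(?<=[+*])|(?=[+*])")));
        // [a, +, b, *, c]
    }
}
```

Both work, but neither is as short as the three-argument constructor, which is the point the paragraph above makes.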
Counting tokens up front
countTokens() reports how many tokens remain (i.e. would be produced by repeated nextToken() calls). It does not consume them.
StringTokenizer st = new StringTokenizer("a b c d");
int n = st.countTokens(); // 4
while (st.hasMoreTokens()) st.nextToken();
n = st.countTokens();     // 0
This is occasionally useful for sizing an output array before filling it. With String.split there is nothing to size: the returned array already has the right length.
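A small sketch of the "size, then fill" pattern, with a hypothetical helper named toArray. Because countTokens() does not consume anything, the same tokenizer can fill the array afterwards. The second call in main is a reminder that this approach still drops empty fields, unlike split.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class CountThenFill {
    // Sizes the array with countTokens(), then fills it with nextToken().
    static String[] toArray(String input, String delims) {
        StringTokenizer st = new StringTokenizer(input, delims);
        String[] out = new String[st.countTokens()];
        for (int i = 0; i < out.length; i++) {
            out[i] = st.nextToken();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toArray("a b c d", " "))); // [a, b, c, d]
        // The empty field between the commas is skipped...
        System.out.println(Arrays.toString(toArray("a,,b", ",")));     // [a, b]
        // ...whereas split keeps it.
        System.out.println(Arrays.toString("a,,b".split(",")));        // [a, , b]
    }
}
```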
Changing the delimiter mid-stream
A lesser-known feature: nextToken(String newDelims) replaces the delimiter set and then returns the next token. The new set persists for subsequent hasMoreTokens()/nextToken() calls until you change it again.
StringTokenizer st = new StringTokenizer("key1=value1; key2=value2; key3=value3");
while (st.hasMoreTokens()) {
    String pair = st.nextToken("; ").trim(); // tokens separated by ';' or space
    System.out.println(pair);
}
For ad-hoc one-off parsing this can be neat. For anything maintainable it's confusing: readers don't expect a delimiter set to change inside a loop.
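The loop above passes the same set on every call, so nothing actually changes mid-stream. A sketch where the set really does switch, parsing a hypothetical "user:role,role,..." line, also exposes the trap: after the first token the position sits on the ':' that ended it, and that character is only skipped if it is also in the new set. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SwitchDelimiters {
    // Parses a hypothetical "user:role,role,..." line. Assumes at least one role.
    static List<String> parse(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ":");
        // Reads the user name; the position now sits on the ':' that ended it.
        out.add(st.nextToken());
        // Switch to ":," for the first role. Keeping ':' in the set means
        // the leftover ':' is skipped as a delimiter.
        out.add(st.nextToken(":,"));
        // The new set persists, so plain nextToken() keeps splitting on ':' and ','.
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // The pitfall: switching to "," alone leaves ':' inside the next token.
    static String firstRoleWithCommaOnly(String line) {
        StringTokenizer st = new StringTokenizer(line, ":");
        st.nextToken();           // user name
        return st.nextToken(","); // starts at the ':' that is no longer a delimiter
    }

    public static void main(String[] args) {
        System.out.println(parse("alice:admin,editor,viewer"));
        // [alice, admin, editor, viewer]
        System.out.println(firstRoleWithCommaOnly("alice:admin,editor,viewer"));
        // :admin
    }
}
```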
Why String.split is usually better
The reasons to prefer String.split (or Pattern.compile(...).split) for new code:
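- Real regex delimiters. Multi-character delimiters, character classes and alternation are all natural. StringTokenizer only handles single-character delimiters.
- Empty tokens are visible. "a,,b".split(",") returns ["a", "", "b"]; StringTokenizer skips the empty token silently. For CSV-shaped input, "the second field was blank" is information you usually need. One caveat: trailing empty strings are dropped unless you pass a negative limit, as in split(",", -1).
- Returns an array. Easy to index, easy to convert to a List, easy to stream.
- A fast path for simple delimiters. When the delimiter is a single character that isn't a regex metacharacter (such as "," or ";"), String.split bypasses the regex engine entirely. Any other pattern is compiled on every call, so precompile it with Pattern.compile if you split in a hot loop.
- Easier to test, easier to read. A tokenizer loop reads like 1996; parts = csv.split(",") reads like the intent.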
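A short sketch exercising the split behaviours from the list: interior empty fields, trailing empty fields with and without a negative limit, a regex delimiter, and conversion to a List. The SEPARATOR pattern and the fields helper are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitBehaviour {
    // Hypothetical separator: ',' or ';' with optional whitespace around it.
    // Compiled once and reused, which matters when splitting in a loop.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*[,;]\\s*");

    static List<String> fields(String line) {
        return Arrays.asList(SEPARATOR.split(line));
    }

    public static void main(String[] args) {
        // Interior empty fields are kept.
        System.out.println(Arrays.toString("a,,b".split(",")));      // [a, , b]
        // Trailing empty fields are removed by default...
        System.out.println(Arrays.toString("a,b,,".split(",")));     // [a, b]
        // ...and kept when the limit is negative.
        System.out.println(Arrays.toString("a,b,,".split(",", -1))); // [a, b, , ]
        // A regex delimiter, with the result as a List.
        System.out.println(fields("x , y;z"));                       // [x, y, z]
    }
}
```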
Why you might still pick it
A short list of cases where StringTokenizer is still defensible:
- Streaming over a very long string where you want to consume it token by token without holding an array of all the tokens. StringTokenizer doesn't allocate the result up front; split does.
- Returning delimiters as tokens for the simplest possible tokeniser, without reaching for Pattern/Matcher.
- Maintaining old code where the rest of the file uses it and consistency is the kindest thing for the next reader.
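As a sketch of the streaming case, the hypothetical sum method below adds up whitespace-separated integers one token at a time. No String[] of all the tokens is ever built, although the input string itself is of course still held in memory.

```java
import java.util.StringTokenizer;

public class StreamingSum {
    // Parses and adds each token as it is produced; no token array is built.
    static long sum(String numbers) {
        long total = 0;
        StringTokenizer st = new StringTokenizer(numbers);
        while (st.hasMoreTokens()) {
            total += Long.parseLong(st.nextToken());
        }
        return total;
    }

    public static void main(String[] args) {
        // Build a long input: "1 2 3 ... 100000 ".
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 100_000; i++) {
            sb.append(i).append(' ');
        }
        System.out.println(sum(sb.toString())); // 5000050000
    }
}
```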
For everything else: use split.
A worked example
The program below tokenises three inputs three different ways, side by side with the equivalent split calls, so the behavioural differences are visible:
Two takeaways from the output. In case 1, split reports the empty cell between green and blue; the tokenizer doesn't. In case 2, the tokenizer breaks mix at every ; and at every space (which is why each key=value arrives with no spaces), while split("; ") keeps the spaces inside the values. Whichever behaviour you want, the code should state it explicitly, and split makes that much easier.
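To reproduce the first two cases yourself, here is a minimal sketch. The inputs are hypothetical, picked to match the description above (a blank cell between green and blue, and a value containing a space), and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SideBySide {
    static List<String> tokens(String input, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Prints the tokenizer result and the split result for the same input.
    static void compare(String label, String input, String delims, String regex) {
        System.out.println(label);
        System.out.println("  tokenizer: " + tokens(input, delims));
        System.out.println("  split:     " + Arrays.asList(input.split(regex)));
    }

    public static void main(String[] args) {
        compare("case 1", "red,green,,blue", ",", ",");
        //   tokenizer: [red, green, blue]
        //   split:     [red, green, , blue]
        compare("case 2", "name=Ada Lovelace; lang=en", "; ", "; ");
        //   tokenizer: [name=Ada, Lovelace, lang=en]
        //   split:     [name=Ada Lovelace, lang=en]
    }
}
```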
What's next
So far we've split strings into pieces. Often what you really need is to turn a string into a number, a boolean, or another primitive — and the other direction, primitives back into strings. That round-trip has its own set of helpers and pitfalls. Continue to Java String conversions.