Java StringTokenizer Class | W3Docs Learn Java

java.util.StringTokenizer is the JDK's original "break a string into pieces" class — a one-character-at-a-time tokenizer that's been in the platform since 1.0. The Javadoc itself recommends against it for new code: "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."

That's the headline, and it's accurate — but StringTokenizer still turns up in real codebases and has a few honest niches. This chapter teaches it briefly, then says clearly when to reach for String.split or Scanner instead.

The basic loop

StringTokenizer implements Enumeration<Object>, but in practice you iterate with hasMoreTokens() and nextToken(), which returns a String:

StringTokenizer st = new StringTokenizer("apple banana cherry");
while (st.hasMoreTokens()) {
  System.out.println(st.nextToken());
}
// apple
// banana
// cherry

The default delimiter set is whitespace: space, tab, newline, carriage return, and form feed. Adjacent delimiters are collapsed: " a b " yields exactly two tokens.
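As a quick check of the collapsing rule, here is a minimal sketch that collects the tokens of a padded input. The class name WhitespaceDemo and the helper tokens are made up for illustration; only StringTokenizer itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WhitespaceDemo {
  // Collects every remaining token into a list (illustrative helper).
  static List<String> tokens(String input) {
    List<String> out = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(input); // default delimiters: " \t\n\r\f"
    while (st.hasMoreTokens()) {
      out.add(st.nextToken());
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokens("  a \t\n b  ")); // [a, b]
    System.out.println(tokens("   "));          // []
  }
}
```

Leading, trailing, and repeated whitespace all disappear, so an input made only of delimiters produces no tokens at all.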

Custom delimiters

The second constructor takes a string whose characters are each treated as a delimiter — not as a delimiter string:

StringTokenizer st = new StringTokenizer("one,two;three|four", ",;|");
while (st.hasMoreTokens()) {
  System.out.println(st.nextToken());
}
// one
// two
// three
// four

This is the design quirk that catches people. new StringTokenizer(input, ", ") treats , and space each as a delimiter, not the two-character sequence ", ". If you want multi-character delimiters, you've outgrown StringTokenizer.
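To make the quirk concrete, the sketch below runs the same input through a tokenizer with the delimiter set ", " and through a split on the literal two-character string. The class and method names are hypothetical; Pattern.quote is used so the delimiter is treated as plain text rather than as a regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class DelimiterSetDemo {
  // Every character of delims is a delimiter on its own.
  static List<String> byTokenizer(String input, String delims) {
    List<String> out = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(input, delims);
    while (st.hasMoreTokens()) {
      out.add(st.nextToken());
    }
    return out;
  }

  // The whole delimiter string must match; Pattern.quote disables regex metacharacters.
  static List<String> byLiteral(String input, String delimiter) {
    return Arrays.asList(input.split(Pattern.quote(delimiter)));
  }

  public static void main(String[] args) {
    String input = "New York, Los Angeles, Chicago";
    System.out.println(byTokenizer(input, ", ")); // [New, York, Los, Angeles, Chicago]
    System.out.println(byLiteral(input, ", "));   // [New York, Los Angeles, Chicago]
  }
}
```

The tokenizer breaks the city names apart because the space inside "New York" is a delimiter in its own right.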

Returning the delimiters themselves

The three-argument constructor controls whether delimiters are themselves emitted as tokens:

StringTokenizer st = new StringTokenizer("a+b*c", "+*", true);
while (st.hasMoreTokens()) {
  System.out.println(st.nextToken());
}
// a
// +
// b
// *
// c

This is the one feature that String.split has no direct switch for: you'd need a look-around regex or a Matcher loop. For very simple expression parsing (the kind of thing that doesn't justify pulling in a lexer) this overload still earns its keep.
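For comparison, here is one possible way to get the same tokens out of String.split, using zero-width look-arounds so the operators are not consumed. This is a sketch, not the only formulation; the class and method names are illustrative, and the operator set [+*] is hard-coded to match the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class ReturnDelimsDemo {
  // Baseline: the three-argument constructor emits '+' and '*' as tokens.
  static List<String> withTokenizer(String expr) {
    List<String> out = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(expr, "+*", true);
    while (st.hasMoreTokens()) {
      out.add(st.nextToken());
    }
    return out;
  }

  // Splits at the zero-width positions just after or just before an operator.
  static List<String> withSplit(String expr) {
    return Arrays.asList(expr.split("(?<=[+*])|(?=[+*])"));
  }

  public static void main(String[] args) {
    System.out.println(withTokenizer("a+b*c")); // [a, +, b, *, c]
    System.out.println(withSplit("a+b*c"));     // [a, +, b, *, c]
  }
}
```

The regex version works, but a reader has to decode two look-arounds to see what the boolean flag says in one word.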

Counting tokens up front

countTokens() reports how many tokens remain (i.e. would be produced by repeated nextToken() calls). It does not consume them.

StringTokenizer st = new StringTokenizer("a b c d");
int n = st.countTokens();        // 4
while (st.hasMoreTokens()) st.nextToken();
n = st.countTokens();            // 0

This is occasionally useful when allocating an output array of the right size — though with String.split you'd just call .split(...).length.
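A minimal sketch of that array-sizing pattern follows. The class name CountTokensDemo and the helper toArray are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class CountTokensDemo {
  // Sizes the array from countTokens(), then fills it token by token.
  static String[] toArray(String input) {
    StringTokenizer st = new StringTokenizer(input);
    String[] out = new String[st.countTokens()]; // counting does not advance the tokenizer
    for (int i = 0; i < out.length; i++) {
      out[i] = st.nextToken();
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(toArray("a b c d"))); // [a, b, c, d]
  }
}
```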

Changing the delimiter mid-stream

A less-known feature: nextToken(String newDelims) re-sets the delimiter set for that call onward. Once changed, the new set persists for subsequent hasMoreTokens()/nextToken() calls until you change it again.

StringTokenizer st = new StringTokenizer("key1=value1; key2=value2; key3=value3");
while (st.hasMoreTokens()) {
  // Switches from the default whitespace set to ';' or space; the set then persists.
  String pair = st.nextToken("; ");
  System.out.println(pair);
}
// key1=value1
// key2=value2
// key3=value3

For ad-hoc one-off parsing this can be neat. For anything maintainable it's confusing — readers don't expect a delimiter set to change inside a loop.
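One subtlety worth seeing in code: the character that ended the previous token is not consumed, so if it is missing from the new delimiter set it becomes part of the next token. The sketch below shows this on a key=value input; SwitchDelimsDemo and firstPair are hypothetical names.

```java
import java.util.StringTokenizer;

class SwitchDelimsDemo {
  // Returns {key, rawValueToken} for the first pair of a "k=v;k=v" string.
  static String[] firstPair(String input) {
    StringTokenizer st = new StringTokenizer(input);
    String key = st.nextToken("="); // delimiter set is now "="
    String raw = st.nextToken(";"); // delimiter set is now ";", so '=' is ordinary text again
    return new String[] { key, raw };
  }

  public static void main(String[] args) {
    String[] pair = firstPair("k1=v1;k2=v2");
    System.out.println(pair[0]);              // k1
    System.out.println(pair[1]);              // =v1  (the leading '=' survives)
    System.out.println(pair[1].substring(1)); // v1
  }
}
```

Having to strip the leftover '=' by hand is a good illustration of why switching delimiter sets tends to confuse readers.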

Why `String.split` is usually better

The reasons to prefer String.split (or Pattern.compile(...).split) for new code:

Real regex delimiters. Multi-character delimiters, character classes, alternation — all natural. StringTokenizer only handles single-character delimiters.
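Empty tokens are visible. "a,,b".split(",") returns ["a", "", "b"]. StringTokenizer skips the empty token silently. For CSV-shaped input, "the second field was blank" is information you usually need. (Note that split drops trailing empty strings by default; pass a negative limit, as in split(",", -1), to keep them.)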
Returns an array. Easy to index, easy to convert to List, easy to stream.
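Performance is rarely a reason to avoid it. String.split has a fast path for single-character literal delimiters that bypasses the regex engine; for anything more complex, compile the Pattern once and reuse it.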
Easier to test, easier to read. A tokenizer loop reads like 1996; parts = csv.split(",") reads like the intent.
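The empty-token behaviour in the list above can be sketched as follows. One detail to keep in mind is that split keeps empty strings in the middle but removes trailing ones unless given a negative limit. The class name SplitEmptiesDemo and the helper allFields are illustrative.

```java
import java.util.Arrays;

class SplitEmptiesDemo {
  // Keeps every field, including trailing empty ones, by passing a negative limit.
  static String[] allFields(String line) {
    return line.split(",", -1);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString("a,,b".split(",")));  // [a, , b]
    System.out.println(Arrays.toString("a,b,,".split(","))); // [a, b]
    System.out.println(Arrays.toString(allFields("a,b,,"))); // [a, b, , ]
  }
}
```

For CSV-shaped data where the column count matters, the negative-limit form is usually the one you want.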

Why you might still pick it

A short list of cases where StringTokenizer is still defensible:

Streaming over a very long string where you want to consume token-by-token without holding an array of all of them. StringTokenizer doesn't allocate the result up front; split does.
Returning delimiters as tokens for the simplest possible tokeniser, without reaching for Pattern/Matcher.
Maintaining old code where the rest of the file uses it and consistency is the kindest thing for the next reader.

For everything else: use split.

A worked example

The program below tokenises three inputs three different ways, side by side with the equivalent split calls, so the behavioural differences are visible:


import java.util.*;

public class TokenizerDemo {
  public static void main(String[] args) {
    String csv  = "red,green,,blue";
    String mix  = "key1=value1; key2=value2; key3=value3";
    String expr = "a + b * c";

    // --- 1. Empty tokens are dropped by StringTokenizer but kept by split ---
    System.out.println("-- empty tokens --");
    print("tokenizer: ", tokens(new StringTokenizer(csv, ",")));
    print("split:     ", List.of(csv.split(",")));

    // --- 2. StringTokenizer treats each char of "; " as its own delimiter ---
    System.out.println("\n-- multi-char delimiters --");
    print("tokenizer: ", tokens(new StringTokenizer(mix, "; ")));   // splits on ';' OR ' '
    print("split:     ", List.of(mix.split("; ")));                 // splits on the two-char "; "

    // --- 3. Returning the delimiters themselves: easy with the 3-arg ctor ---
    System.out.println("\n-- delimiters as tokens --");
    // Space is not in the delimiter set, so operand tokens keep their spaces ("a ", " b ", " c").
    print("tokenizer: ", tokens(new StringTokenizer(expr, "+*", true)));
    // The split equivalent uses a regex with a look-around; here's the simpler tokenizer.
  }

  // Drains the tokenizer into a list so it can be printed in one go.
  static List<String> tokens(StringTokenizer st) {
    List<String> out = new ArrayList<>();
    while (st.hasMoreTokens()) out.add(st.nextToken());
    return out;
  }

  static void print(String label, List<String> xs) {
    System.out.println(label + xs);
  }
}

Two takeaways from the output. In case 1, split reports the empty cell between green and blue ([red, green, , blue]); the tokenizer collapses it ([red, green, blue]).

In case 2 both happen to produce the same tokens, but for different reasons: the tokenizer breaks mix at every ; and every space independently ("; " means "either character is a delimiter"), while split("; ") matches the literal two-character sequence "; ". They agree here only because every separator in mix is exactly "; ". Change the input to "k1=v1 ;k2=v2" and the two diverge immediately: the tokenizer still yields k1=v1 and k2=v2, while split("; ") finds no match and returns the whole string as a single element. Whichever behaviour you want, the code should be explicit about it, and that's much easier with split.

What's next

So far we've split strings into pieces. Often what you really need is to turn a string into a number, a boolean, or another primitive — and the other direction, primitives back into strings. That round-trip has its own set of helpers and pitfalls. Continue to Java String conversions.

Practice

Which of the following is a **real behavioural difference** between `StringTokenizer` and `String.split`?

`StringTokenizer` silently drops empty tokens between adjacent delimiters; `String.split` returns them as empty strings
`StringTokenizer` accepts only regex delimiters; `String.split` accepts only fixed-character delimiters
`StringTokenizer` returns a `String[]` while `String.split` returns an `Enumeration`
`StringTokenizer` is thread-safe and `String.split` is not