W3docs

Java String split() and join()

Split Java strings on delimiters with split() and combine arrays of strings with String.join().


String.split and String.join are the two methods you'll reach for whenever a delimiter-separated value needs to become a list, or a list needs to become a delimiter-separated value. They cover the lion's share of CSV parsing, header splitting, path building, and a thousand small log-line shape changes. They're also the methods most often misused, because the first argument to split is a regex, not a literal — a fact that has tripped every Java developer at least once.

split(regex) — string into pieces

The simplest call splits on a delimiter and returns a String[]:

String[] parts = "red,green,blue".split(",");
// ["red", "green", "blue"]

That argument is a regular expression. For an ordinary punctuation character such as a comma, the regex form looks exactly like the literal form, which is why most uses look harmless. The trouble starts when the delimiter is a regex metacharacter:

"127.0.0.1".split(".");      // []: '.' matches *any* character, so every char is a delimiter
"127.0.0.1".split("\\.");    // ["127", "0", "0", "1"]: escape the dot
"x|y|z".split("|");          // ["x", "|", "y", "|", "z"]: a bare '|' matches the empty string, so it splits between every char
"x|y|z".split("\\|");        // ["x", "y", "z"]

The metacharacters that need escaping for a literal split: ., |, \, (, ), [, ], {, }, +, *, ?, ^, $. A safer pattern for literal multi-character delimiters is Pattern.quote:

String delim = "::";
String[] xs = "a::b::c".split(Pattern.quote(delim));   // ["a", "b", "c"]

Pattern.quote wraps the input in \Q...\E so every character inside is taken literally, regex metacharacters and all.
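If the delimiter comes from configuration or user input, it can help to hide the quoting behind a small helper. The following is a minimal sketch; the class name LiteralSplit and the method splitLiteral are made up for illustration and are not part of the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Hypothetical helper: treat delim as plain text, never as a regex.
    static String[] splitLiteral(String input, String delim) {
        return input.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("127.0.0.1", ".")));  // [127, 0, 0, 1]
        System.out.println(Arrays.toString(splitLiteral("x|y|z", "|")));      // [x, y, z]
        System.out.println(Arrays.toString(splitLiteral("a::b::c", "::")));   // [a, b, c]
    }
}
```

The caller never has to remember which characters are metacharacters, because nothing in the delimiter is interpreted.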

The limit argument: empty trailing fields

split(regex, limit) controls how many splits happen and what happens to trailing empty fields:

  • limit > 0 — at most limit pieces; the last one contains the remainder unsplit.
  • limit == 0 — unlimited splits; trailing empty strings are removed.
  • limit < 0 — unlimited splits; trailing empty strings are kept.

That middle behaviour is the silent surprise. A CSV row whose last three fields are empty parses with the wrong shape by default:

"a,b,,,".split(",");      // ["a", "b"]               — trailing empties stripped
"a,b,,,".split(",", -1);  // ["a", "b", "", "", ""]   — trailing empties kept
"a,b,,,".split(",",  3);  // ["a", "b", ",,"]          — third element absorbs the rest

For any tabular data — CSVs, TSVs, log lines with fixed field count — pass -1. The day a field is legitimately empty at the end of a row is the day a missing -1 becomes a wrong-shape parse downstream.
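One way to make a wrong-shape row fail loudly instead of silently is to check the field count right after the split. This is a sketch only: FixedRow, parseRow, and the choice of IllegalArgumentException are assumptions for illustration, not something the JDK provides.

```java
class FixedRow {
    // Hypothetical helper: split a row and insist on an exact field count.
    static String[] parseRow(String row, int expectedFields) {
        String[] cells = row.split(",", -1);   // -1: keep trailing empty fields
        if (cells.length != expectedFields) {
            throw new IllegalArgumentException(
                    "expected " + expectedFields + " fields, got " + cells.length + ": " + row);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(parseRow("a,b,,,", 5).length);   // 5
        System.out.println("a,b,,,".split(",").length);     // 2, the default limit drops the empties
    }
}
```

With the default limit the same check would reject a perfectly valid row, which is usually how a missing -1 gets noticed.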

Streams and lists

The natural follow-on:

List<String> parts = Arrays.asList(csv.split(","));         // fixed-size, backed by the array
List<String> mutable = new ArrayList<>(parts);              // copy into a growable list

// Or via streams, with cheap transformation along the way:
List<Integer> ints = Pattern.compile(",")
    .splitAsStream("1,2,3,4")
    .map(String::trim)
    .map(Integer::parseInt)
    .toList();   // Stream.toList() requires Java 16+; on older JDKs use .collect(Collectors.toList())

Pattern.compile(regex).splitAsStream(input) is the lazy-stream alternative when you want to map/filter without materialising the array first. For one-off splits, String#split is fine; for a delimiter you reuse a lot, pre-compiling the Pattern once and reusing it skips repeated compilation.
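A reused delimiter might be kept in a static final field, roughly as below. The names CommaSplitter, COMMA, fields, and ints are hypothetical, and the pattern is an assumption too: it also swallows whitespace around each comma, which a plain "," would not.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CommaSplitter {
    // Compiled once; Pattern instances are immutable and safe to share across threads.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Array form, keeping trailing empty fields.
    static String[] fields(String line) {
        return COMMA.split(line, -1);
    }

    // Stream form: parse each piece as it comes out of the splitter.
    static List<Integer> ints(String line) {
        return COMMA.splitAsStream(line)
                .map(Integer::parseInt)
                .collect(Collectors.toList());
    }
}
```

Calling CommaSplitter.ints("1, 2 ,3") yields the list [1, 2, 3] without an explicit trim step, because the whitespace is part of the delimiter.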

String.join — pieces into a string

The opposite direction is String.join, added in Java 8. The first argument is the delimiter, the rest are the parts — either as varargs or as any Iterable<? extends CharSequence>:

String csv = String.join(",", "red", "green", "blue");      // "red,green,blue"
String csv2 = String.join(",", List.of("red", "green", "blue"));

List<String> tags = List.of("java", "strings", "split");
String hashtags  = "#" + String.join(" #", tags);            // "#java #strings #split"

This replaces the old loop-with-conditional-comma pattern entirely. It is also efficient when the parts are already in hand: recent JDK implementations work out the total length up front and fill a single buffer, though that is an implementation detail rather than a documented guarantee.
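For comparison, here is roughly what the loop-with-conditional-separator idiom looked like. The class JoinDemo and the method joinWithLoop are illustrative names only; the point is that String.join produces the same result in one call.

```java
import java.util.List;

class JoinDemo {
    // The pre-Java-8 idiom: append the separator before every element except the first.
    static String joinWithLoop(String delim, List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(delim);
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> colours = List.of("red", "green", "blue");
        System.out.println(joinWithLoop(",", colours));   // red,green,blue
        System.out.println(String.join(",", colours));    // red,green,blue
    }
}
```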

String.join happily accepts an empty delimiter:

String concatenated = String.join("", "a", "b", "c");        // "abc"

That's occasionally what you want; for non-trivial concatenation prefer StringBuilder.

Collectors.joining for streams

When the pieces are arriving from a stream pipeline, Collectors.joining is the matching collector:

String list = users.stream()
    .map(User::name)
    .collect(Collectors.joining(", "));
// "Ada, Linus, Grace"

String pretty = users.stream()
    .map(User::name)
    .collect(Collectors.joining(", ", "[", "]"));
// "[Ada, Linus, Grace]"

The three-argument form takes delimiter, prefix, and suffix. It's the idiomatic way to render a list as "(a, b, c)"-style output without manually trimming a trailing comma.
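A self-contained variant, using plain strings instead of a User type, might look like the sketch below. Bracketed and render are made-up names, and the trim-and-skip-blanks step is an assumption added to show a pipeline feeding the collector.

```java
import java.util.List;
import java.util.stream.Collectors;

class Bracketed {
    // Hypothetical helper: render the non-blank names as "[a, b, c]".
    static String render(List<String> names) {
        return names.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(render(List.of("Ada", " Linus ", "", "Grace")));  // [Ada, Linus, Grace]
        System.out.println(render(List.of()));                               // []
    }
}
```

Note that an empty stream still produces the prefix and suffix, so the result is "[]" rather than an empty string.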

Beware regex collisions in split

A handful of subtle traps the JDK does not warn you about:

  • Pipe (|) is alternation. "a|b".split("|") does not do what you think.
  • Dot is "any character". "1.2.3".split(".") returns an empty array.
  • split always returns a non-null array. Even "".split(",") returns [""], not [] — useful to know when iterating.
  • Empty regex "" matches between every character. "abc".split("") returns ["a", "b", "c"]. Not a bug; sometimes useful.

If in doubt, use Pattern.quote(delim).
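The traps above can be folded into one defensive wrapper. This is a sketch under stated assumptions: SafeSplit is a hypothetical class, and mapping empty input to an empty list (rather than a list holding one empty string) is a design choice that may or may not suit your data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SafeSplit {
    // Hypothetical helper: literal delimiter, trailing empties kept,
    // and empty input mapped to an empty list instead of [""].
    static List<String> split(String input, String delim) {
        if (input.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(input.split(Pattern.quote(delim), -1));
    }

    public static void main(String[] args) {
        System.out.println(split("1.2.3", "."));      // [1, 2, 3]
        System.out.println(split("a|b", "|"));        // [a, b]
        System.out.println(split("", ","));           // []
        System.out.println("".split(",").length);     // 1, the raw call returns [""]
    }
}
```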

replace vs replaceAll is the same trap

While we're on the topic of regex-as-string arguments: String#replace(target, replacement) treats both arguments literally. String#replaceAll(regex, replacement) treats the first as a regex and the second as a replacement template, in which $1-style group references and backslash escapes are special. Despite the names, both replace every occurrence; the only difference is how the arguments are parsed. Most of the time you want replace, not replaceAll.
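A short side-by-side makes the difference concrete. The class name ReplaceDemo and the sample strings are arbitrary; Matcher.quoteReplacement is the JDK's counterpart to Pattern.quote for the replacement side.

```java
import java.util.regex.Matcher;

class ReplaceDemo {
    public static void main(String[] args) {
        String version = "1.2.3";
        System.out.println(version.replace(".", "-"));        // 1-2-3  (literal dot)
        System.out.println(version.replaceAll(".", "-"));     // -----  (regex dot matches every char)
        System.out.println(version.replaceAll("\\.", "-"));   // 1-2-3  (escaped dot)

        // In a replaceAll replacement, "$5" is read as a reference to group 5,
        // so price.replaceAll("X", "$5") throws because no such group exists.
        String price = "cost: X";
        System.out.println(price.replace("X", "$5"));                               // cost: $5
        System.out.println(price.replaceAll("X", Matcher.quoteReplacement("$5")));  // cost: $5
    }
}
```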

A worked example

The example program parses three rows of pseudo-CSV (with mid-line empties and missing trailing fields), normalises the data, then renders it back out with both String.join and Collectors.joining. The two split calls illustrate the limit = -1 rule.
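A program along those lines could be sketched as follows. The rows are made-up sample data, the names SplitJoinDemo, parse, and normalise are hypothetical, and the "n/a" placeholder is an arbitrary choice for making empty cells visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitJoinDemo {
    // limit = -1 keeps trailing empty cells, so every row has the same shape.
    static String[] parse(String row) {
        return row.split(",", -1);
    }

    // Trim each cell and replace empties with a visible placeholder.
    static List<String> normalise(String[] cells) {
        return Arrays.stream(cells)
                .map(String::trim)
                .map(c -> c.isEmpty() ? "n/a" : c)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // One complete row, one with a mid-line empty, one with two trailing empties.
        List<String> rows = List.of(
                "1, ada,admin,london,active",
                "2,linus,,helsinki,active",
                "3,grace,admin,,");

        for (String row : rows) {
            System.out.println(parse(row).length + " cells <- " + row);
        }
        // The same last row with the default limit: the trailing empties vanish.
        System.out.println("3,grace,admin,,".split(",").length + " cells with the default limit");

        for (String row : rows) {
            List<String> cells = normalise(parse(row));
            System.out.println(String.join(" | ", cells));
            System.out.println(cells.stream().collect(Collectors.joining(", ", "[", "]")));
        }

        System.out.println(Arrays.toString("1.2.3".split(".")));     // unescaped dot
        System.out.println(Arrays.toString("1.2.3".split("\\.")));   // escaped dot
    }
}
```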


Read the first three lines of output: every row reports 5 cells, including the row with two trailing empties. Without the -1, that last row would arrive as 3 cells and the missing-field logic later in your program would never run. The 1.2.3 lines at the bottom are the single best demonstration of why . needs escaping in a regex split — what looks like a one-character delimiter is in fact "match any character".

What's next

That closes Part 9 — you've got a working grasp of how String is built, how the pool works, why immutability matters, the two mutable buffers, formatting, comparison, conversion, and the split/join round-trip. The next part is one of Java's most powerful — and most argued-about — features: generics. They turn collections from "container of Objects you have to cast" into "container of a specific type the compiler checks for you", and that single idea reaches into almost every modern Java API. Continue to Java Generics intro.

Practice


You're parsing a CSV row with a fixed schema of 5 fields. The last two fields are sometimes empty. Which call to `split` gives you exactly 5 strings every time?