W3docs

Java String Class

A deeper look at Java's String class — its design, internal structure, and core methods.

String is one of the most heavily used reference types in Java. You met it on day one (String name = "Ada";) and it slid into your code without much ceremony. This part of the book digs underneath that ceremony. The class has more depth than its surface suggests: a fixed internal layout, language-level syntax that other types don't get, a memory pool that affects identity, and a deliberate design choice (immutability) that ripples through threading, hashing, and security.

This chapter is the map. The rest of Part 9 fills in each region.

What kind of thing is a String?

A String is a regular Java object: java.lang.String, in the same package as Object and Integer. The class is final, so you can't subclass it, and every method that "modifies" a string actually returns a new one (or the same object when there is nothing to change). The original is never touched.

String greeting = "hello";
greeting.toUpperCase();          // returns "HELLO" — discarded
System.out.println(greeting);    // still prints "hello"
greeting = greeting.toUpperCase();
System.out.println(greeting);    // now prints "HELLO"

That return-a-new-one habit is the most important fact about the class. It's covered in depth in Java String immutability.

Language-level treatment

Two pieces of syntax are reserved for String and aren't extensible to other types:

  • String literals: "hello" produces a String object directly, without new. Identical literals are deduplicated and share a single object in the string pool.
  • The + operator: overloaded for strings, so "a" + "b" is "ab". Mixed expressions like "score: " + 42 also work, because Java converts the non-String operand to a String (string conversion, not boxing).
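Both pieces of syntax can be seen in a few lines. The sketch below is illustrative only; the class name LiteralAndConcatDemo and the helper label are made up for this example.

```java
class LiteralAndConcatDemo {
    // Hypothetical helper: concatenates a String with an int.
    // The int operand goes through string conversion before joining.
    static String label(int score) {
        return "score: " + score;
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = "hello";

        // Identical literals refer to one pooled object, so == is true here.
        System.out.println(a == b);        // true

        System.out.println(label(42));     // score: 42

        // + is evaluated left to right, so operand order matters.
        System.out.println(1 + 2 + "x");   // 3x  (int addition happens first)
        System.out.println("x" + 1 + 2);   // x12 (left operand is already a String)
    }
}
```

The last two lines are a common source of surprises: once the left operand is a String, every following + is a concatenation.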

Behind the scenes, javac compiles + chains to StringBuilder calls (Java 8 and earlier) or to an invokedynamic call site bootstrapped by StringConcatFactory (Java 9 and later), so a single concatenation expression rarely needs a hand-written builder. The compiler knows what to do.

Internal layout

Before Java 9, every String held a char[] — two bytes per character regardless of content. Java 9 introduced compact strings: the backing array is now a byte[], plus a one-byte coder field that records whether the bytes are Latin-1 (one byte per character) or UTF-16 (two bytes). For text that fits in Latin-1 — most code, configuration, identifiers, plain English — this roughly halves the memory footprint without changing the API.

You can't see the field, you can't change it, you don't need to think about it. But it's why string-heavy programs in JDK 9+ use noticeably less heap than they did on JDK 8.
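The coder field itself is not reachable from application code, but the rule it records is easy to state. The following is a rough sketch of that rule, not the JDK's implementation; the class name CompactStringSketch and the helper fitsLatin1 are assumptions made for illustration.

```java
class CompactStringSketch {
    // Hypothetical helper: true when every UTF-16 code unit is in the
    // Latin-1 range (0x00 to 0xFF), i.e. the text could be stored
    // with one byte per character.
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("user.name=Ada"));   // true: plain ASCII
        System.out.println(fitsLatin1("caf\u00E9"));       // true: U+00E9 is in Latin-1
        System.out.println(fitsLatin1("\u20AC 10"));       // false: U+20AC (euro sign) is not
    }
}
```

A single character outside Latin-1 is enough to make the whole string use two bytes per code unit.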

The core method families

The String API is large but organizes into a handful of recognisable groups:

Inspection. length(), isEmpty(), isBlank(), charAt(i), codePointAt(i), hashCode().

Searching. indexOf, lastIndexOf, contains, startsWith, endsWith, matches.

Extracting. substring(start), substring(start, end), chars(), codePoints(), toCharArray().

Transforming. toUpperCase(), toLowerCase(), trim(), strip(), replace, replaceAll, replaceFirst, concat.

Splitting and joining. split, String.join — covered in split() and join().

Formatting. String.format, the instance formatted method, and printf-style output — covered in String formatting.

Comparison. equals, equalsIgnoreCase, compareTo, compareToIgnoreCase, contentEquals — covered in String comparison.

Conversion. valueOf (static), toString (instance), parsing helpers on Integer, Double, etc. — covered in String conversions.

Whole-API listings sit in the JDK Javadoc. The skill is recognising which family you reach for, not memorising every overload.
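As a quick taste of three families that later chapters cover in depth (splitting and joining, formatting, comparison), here is a small sketch. The class name FamiliesDemo, the helper normalize, and the sample input are invented for this example.

```java
import java.util.Locale;

class FamiliesDemo {
    // Hypothetical helper: splits on commas, trims and lower-cases each
    // piece, then joins the pieces back together with '|'.
    static String normalize(String csv) {
        String[] parts = csv.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim().toLowerCase(Locale.ROOT);
        }
        return String.join("|", parts);
    }

    public static void main(String[] args) {
        // Splitting and joining
        System.out.println(normalize(" Ada , GRACE,Linus "));   // ada|grace|linus

        // Formatting
        System.out.println(String.format("%s has %d chars", "Ada", "Ada".length())); // Ada has 3 chars

        // Comparison
        System.out.println("ada".equalsIgnoreCase("ADA"));       // true
        System.out.println("apple".compareTo("banana") < 0);     // true: "apple" sorts first
    }
}
```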

Strings are sequences of UTF-16 code units

charAt(i) and length() count UTF-16 code units, not Unicode characters. For text inside the Basic Multilingual Plane (the bulk of common scripts), one char = one character and the distinction never matters. For supplementary characters — most emoji, some CJK extensions, ancient scripts — a single user-visible character occupies two chars, a surrogate pair.

String emoji = "🙂";
System.out.println(emoji.length());          // 2 — two code units
System.out.println(emoji.codePointCount(0, emoji.length())); // 1 — one code point

If you need to iterate by Unicode code point, use codePoints() or codePointAt. For most ASCII-ish use cases — splitting CSVs, formatting log lines, comparing identifiers — length() and charAt are exactly what you want.
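Iterating by code point can be sketched as follows. The class name CodePointDemo and the helper countCodePoints are illustrative; the emoji is written with Unicode escapes so the file compiles regardless of source encoding.

```java
class CodePointDemo {
    // Hypothetical helper: number of Unicode code points in the whole string.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "hi" followed by U+1F642, stored as the surrogate pair D83D DE42
        String text = "hi\uD83D\uDE42";

        System.out.println(text.length());           // 4: UTF-16 code units
        System.out.println(countCodePoints(text));   // 3: Unicode code points

        // codePoints() yields one int per code point; charCount reports
        // how many chars that code point occupies in the string.
        text.codePoints().forEach(cp ->
            System.out.println(Integer.toHexString(cp) + " uses " + Character.charCount(cp) + " char(s)"));
        // prints:
        // 68 uses 1 char(s)
        // 69 uses 1 char(s)
        // 1f642 uses 2 char(s)
    }
}
```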

Mutable cousins: StringBuilder and StringBuffer

When you need to build up a string piece by piece, repeated += allocates a new String on every step. The standard library ships two mutable companions for that case:

  • StringBuilder — fast, single-threaded.
  • StringBuffer — same API, synchronised methods, useful only when more than one thread writes to the same buffer.

They have parallel APIs and parallel chapters in this part of the book.
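A minimal sketch of the piece-by-piece case, using StringBuilder in a loop instead of repeated +=. The class name BuilderDemo and the helper joinNumbers are made up for this example.

```java
class BuilderDemo {
    // Hypothetical helper: builds "1, 2, ..., n" with one mutable buffer
    // rather than allocating a new String on every iteration.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                sb.append(", ");
            }
            sb.append(i);          // append(int) converts the number to text
        }
        return sb.toString();      // one final String at the end
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(5));   // 1, 2, 3, 4, 5
    }
}
```

Swapping StringBuilder for StringBuffer would compile unchanged, since the two share the same method names; only the synchronisation differs.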

A worked example

A small exercise that touches the most common families — inspection, searching, extracting, transforming, and conversion — on the same input. Read the output line by line; each call illustrates one tool from the list above.
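The interactive listing is not reproduced in this text. A minimal sketch in the same spirit might look like the following; the class name StringTour, the helpers nameOf and yearOf, and the sample input are assumptions, not the original code.

```java
class StringTour {
    // Extracting: the text before the first comma, without surrounding whitespace.
    static String nameOf(String line) {
        return line.substring(0, line.indexOf(',')).trim();
    }

    // Conversion: the text after the first comma, parsed as an int.
    static int yearOf(String line) {
        return Integer.parseInt(line.substring(line.indexOf(',') + 1).trim());
    }

    public static void main(String[] args) {
        String line = "  Ada Lovelace, 1815  ";

        System.out.println(line.length());                // inspection: 22
        System.out.println(line.isEmpty());               // inspection: false
        System.out.println(line.indexOf(','));            // searching: 14
        System.out.println(line.contains("Love"));        // searching: true
        System.out.println(nameOf(line));                 // extracting: Ada Lovelace
        System.out.println(nameOf(line).toUpperCase());   // transforming: ADA LOVELACE
        System.out.println(yearOf(line) + 1);             // conversion: 1816
        System.out.println("[" + line + "]");             // [  Ada Lovelace, 1815  ]
    }
}
```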


The last line is the punchline. After every transformation we used, line itself is byte-identical to the literal we started with — proof that the return-a-new-one model is real, not just a documentation note.

What's next

Strings come with a memory model that's unique in the standard library: identical literals share storage, and you can opt arbitrary strings into that shared pool with one method call. Continue to Java String pool.
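As a small preview, the "one method call" is intern(). The sketch below assumes a class name InternDemo and a helper buildHello, both invented here, to contrast a pooled literal with a string created at run time.

```java
class InternDemo {
    // Hypothetical helper: assembles "hello" at run time, so the result
    // is a fresh object rather than the pooled literal.
    static String buildHello() {
        return new StringBuilder("hel").append("lo").toString();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = buildHello();

        System.out.println(built.equals(literal));       // true: same characters
        System.out.println(built == literal);            // false: different objects
        System.out.println(built.intern() == literal);   // true: intern() returns the pooled instance
    }
}
```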

Practice

Which statement about Java's `String` class is true?