Java String Class
A deeper look at Java's String class — its design, internal structure, and core methods.
String is the most-used reference type in Java by a wide margin. You met it on day one — String name = "Ada"; — and it slid into your code without much ceremony. This part of the book digs underneath that ceremony. The class has more depth than its surface suggests: a fixed internal layout, language-level syntax that other types don't get, a memory pool that affects identity, and a deliberate design choice (immutability) that ripples through threading, hashing, and security.
This chapter is the map. The rest of Part 9 fills in each region.
What kind of thing is a String?
A String is a regular Java object — java.lang.String, in the same package as Object and Integer. It's final, so you can't subclass it, and every method that "modifies" a string actually returns a new one. The original is never touched.
String greeting = "hello";
greeting.toUpperCase(); // returns "HELLO" — discarded
System.out.println(greeting); // still prints "hello"
greeting = greeting.toUpperCase();
System.out.println(greeting); // now prints "HELLO"
That return-a-new-one habit is the most important fact about the class. It's covered in depth in Java String immutability.
Language-level treatment
Two pieces of syntax are reserved for String and aren't extensible to other types:
- String literals: "hello" produces a String object directly, without new. Identical literals are also interned, so they share a single object in the string pool.
- The + operator: overloaded for strings, so "a" + "b" is "ab". Even mixed expressions like "score: " + 42 work, because Java converts the non-String operand to a String before concatenating (string conversion, not boxing).
Behind the scenes, javac translates + chains for you: older compilers emit StringBuilder calls, and since JDK 9 the default is an invokedynamic call to StringConcatFactory. You rarely need to hand-write a StringBuilder for a single concatenation expression.
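A minimal sketch of both pieces of syntax follows. The class name LiteralAndConcatDemo and the helper label are made up for illustration; the behaviour shown is standard Java.

```java
class LiteralAndConcatDemo {
    // Hypothetical helper: builds a label with the + operator.
    // The int operand goes through string conversion before concatenation.
    static String label(int score) {
        return "score: " + score;
    }

    public static void main(String[] args) {
        String a = "hello";             // literal, taken from the string pool
        String b = "hello";             // identical literal, same pooled object
        String c = new String("hello"); // explicit new: a distinct object

        System.out.println(a == b);      // true  (same reference)
        System.out.println(a == c);      // false (different objects)
        System.out.println(a.equals(c)); // true  (same contents)

        System.out.println(label(42));   // score: 42

        // + is evaluated left to right, so operand order matters.
        System.out.println(1 + 2 + "x"); // 3x
        System.out.println("x" + 1 + 2); // x12
    }
}
```

The last two lines show why mixed expressions deserve a second look: arithmetic happens first only while both operands are still numbers.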
Internal layout
Before Java 9, every String held a char[] — two bytes per character regardless of content. Java 9 introduced compact strings: the backing array is now a byte[], plus a one-byte coder field that records whether the bytes are Latin-1 (one byte per character) or UTF-16 (two bytes). For text that fits in Latin-1 — most code, configuration, identifiers, plain English — this roughly halves the memory footprint without changing the API.
You can't see the field, you can't change it, you don't need to think about it. But it's why string-heavy programs in JDK 9+ use noticeably less heap than they did on JDK 8.
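The coder field is private, but the condition it records can be approximated from outside. The sketch below assumes the rule described above (every char must fit in one Latin-1 byte, that is, be at most U+00FF); fitsLatin1 is a hypothetical helper, not a JDK method, and it only illustrates which strings would qualify for the one-byte layout.

```java
class CompactStringCheck {
    // Hypothetical helper: true when every UTF-16 code unit is <= 0xFF,
    // i.e. the text could be stored as one Latin-1 byte per character.
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(ch -> ch <= 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("plain English"));      // true
        System.out.println(fitsLatin1("caf\u00e9"));          // true  (U+00E9 is Latin-1)
        System.out.println(fitsLatin1("\u65e5\u672c\u8a9e")); // false (CJK characters)
        System.out.println(fitsLatin1("caf\u00e9 \u2603"));   // false (one snowman is enough)
    }
}
```

The choice is made per string, so a single character outside Latin-1 puts the whole string in the two-byte form.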
The core method families
The String API is large but organizes into a handful of recognisable groups:
Inspection. length(), isEmpty(), isBlank(), charAt(i), codePointAt(i), hashCode().
Searching. indexOf, lastIndexOf, contains, startsWith, endsWith, matches.
Extracting. substring(start), substring(start, end), chars(), codePoints(), toCharArray().
Transforming. toUpperCase(), toLowerCase(), trim(), strip(), replace, replaceAll, replaceFirst, concat.
Splitting and joining. split, String.join — covered in split() and join().
Formatting. String.format, the instance formatted method, and printf-style output — covered in String formatting.
Comparison. equals, equalsIgnoreCase, compareTo, compareToIgnoreCase, contentEquals — covered in String comparison.
Conversion. valueOf (static), toString (instance), parsing helpers on Integer, Double, etc. — covered in String conversions.
Whole-API listings sit in the JDK Javadoc. The skill is recognising which family you reach for, not memorising every overload.
Strings are sequences of UTF-16 code units
charAt(i) and length() count UTF-16 code units, not Unicode characters. For text inside the Basic Multilingual Plane (the bulk of common scripts), one char = one character and the distinction never matters. For supplementary characters — most emoji, some CJK extensions, ancient scripts — a single user-visible character occupies two chars, a surrogate pair.
String emoji = "🙂";
System.out.println(emoji.length()); // 2: two code units
System.out.println(emoji.codePointCount(0, emoji.length())); // 1: one code point
If you need to iterate by Unicode code point, use codePoints() or codePointAt. For most ASCII-ish use cases (splitting CSVs, formatting log lines, comparing identifiers), length() and charAt are exactly what you want.
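As a rough sketch of code point iteration, the example below walks a string that mixes BMP and supplementary characters. The class name CodePointDemo and the helper codePointLength are illustrative; the emoji is written as an escaped surrogate pair so the source file stays ASCII.

```java
class CodePointDemo {
    // Hypothetical helper: number of Unicode code points in the string.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "hi\uD83D\uDE42"; // "hi" followed by U+1F642

        System.out.println(text.length());         // 4 (code units)
        System.out.println(codePointLength(text)); // 3 (code points)

        // Print each code point in hex with the number of chars it occupies.
        text.codePoints().forEach(cp ->
            System.out.println(Integer.toHexString(cp) + " " + Character.charCount(cp)));
        // 68 1
        // 69 1
        // 1f642 2
    }
}
```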
Mutable cousins: StringBuilder and StringBuffer
When you need to build up a string piece by piece, repeated += allocates a new String on every step. The standard library ships two mutable companions for that case:
- StringBuilder: fast, unsynchronised, intended for use by a single thread.
- StringBuffer: same API, synchronised methods, useful only when more than one thread writes to the same buffer.
They have parallel APIs and parallel chapters in this part of the book.
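A minimal sketch of the piece-by-piece case, assuming a simple comma-separated list is what is being built. The class name BuilderDemo and the method countTo are hypothetical.

```java
class BuilderDemo {
    // Hypothetical helper: joins 0..n-1 with ", " using one mutable buffer
    // instead of allocating a new String on every loop iteration.
    static String countTo(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(i); // append(int) converts the number for us
        }
        return sb.toString(); // one immutable String at the end
    }

    public static void main(String[] args) {
        System.out.println(countTo(5)); // 0, 1, 2, 3, 4
    }
}
```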
A worked example
A small exercise that touches the most common families — inspection, searching, extracting, transforming, and conversion — on the same input. Read the output line by line; each call illustrates one tool from the list above.
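The listing below is one possible version of such an exercise, offered as a sketch. The input text, the class name StringTour, and the helpers nameOf and yearOf are assumptions chosen for illustration; strip() and isBlank() require Java 11 or later.

```java
class StringTour {
    // Hypothetical helper: text before the first comma, outer whitespace removed.
    static String nameOf(String line) {
        String trimmed = line.strip();
        return trimmed.substring(0, trimmed.indexOf(','));
    }

    // Hypothetical helper: the integer that follows the first comma.
    static int yearOf(String line) {
        String trimmed = line.strip();
        return Integer.parseInt(trimmed.substring(trimmed.indexOf(',') + 1).strip());
    }

    public static void main(String[] args) {
        String line = "  Ada Lovelace, 1815  ";

        // Inspection
        System.out.println(line.length());              // 22
        System.out.println(line.isBlank());             // false

        // Searching
        System.out.println(line.contains("Lovelace"));  // true
        System.out.println(line.indexOf(','));          // 14

        // Extracting and transforming
        System.out.println(nameOf(line));               // Ada Lovelace
        System.out.println(nameOf(line).toUpperCase()); // ADA LOVELACE

        // Conversion
        System.out.println(yearOf(line) + 1);           // 1816

        // The original, after all of the above
        System.out.println("[" + line + "]");           // [  Ada Lovelace, 1815  ]
    }
}
```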
The last line is the punchline. After every transformation we used, line itself is byte-identical to the literal we started with — proof that the return-a-new-one model is real, not just a documentation note.
What's next
Strings come with a memory model that's unique in the standard library: identical literals share storage, and you can opt arbitrary strings into that shared pool with one method call. Continue to Java String pool.