Java Character Streams
Read and write text in Java with Reader, Writer, FileReader, FileWriter, and character encoding considerations.
The previous chapter covered byte streams, the raw layer where everything is a byte. That layer is right for binary data and wrong for text. A UTF-8 character can be one, two, three, or four bytes; UTF-16 uses two-byte code units, with surrogate pairs for anything beyond the Basic Multilingual Plane; even ASCII text needs a "this is ASCII" decision somewhere. Calling InputStream.read() on text and casting the result to char works only if you're lucky and the file uses one byte per character, and the moment someone writes "é" or "日" or "🎉", the lucky version corrupts the data.
The character stream hierarchy exists to keep that decoding out of your code. Reader and Writer deal in char, not byte. The bridge classes — InputStreamReader and OutputStreamWriter — take a Charset and do the conversion. Get the charset right at the bridge, and every layer above it works with decoded text.
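A minimal sketch of the difference, with an in-memory byte array standing in for a file; the class and method names are illustrative, not from any library. castEachByte does the lucky-cast version, and decodeUtf8 lets an InputStreamReader decode with an explicit charset:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ByteCastVsReader {
    // The "lucky" version: every byte becomes one char, which is only right for one-byte-per-char text.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // The bridge version: InputStreamReader decodes the bytes as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory source
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // "café" is 5 bytes: é takes two
        System.out.println(castEachByte(utf8).equals("caf\u00c3\u00a9")); // true: the string "cafÃ©"
        System.out.println(decodeUtf8(utf8).equals("caf\u00e9"));         // true: the string "café"
    }
}
```

The cast turns the two bytes of é (0xC3 0xA9) into two unrelated characters, "Ã" and "©"; the bridge turns them back into one.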
The Reader contract
Reader mirrors InputStream: two abstract methods (read(char[], int, int) and close()) with conveniences built on top:
```java
int read();                              // next char as an int 0..65535, or -1 at end
int read(char[] buf);                    // read up to buf.length chars; return count or -1
int read(char[] buf, int off, int len);  // read into a slice
String readLine();                       // only on BufferedReader, not on Reader itself
long transferTo(Writer out);             // Java 10+: pipe straight to a sink
```

Two subtle differences from the byte side. First, the unit is char (a 16-bit UTF-16 code unit), not byte. Second, read() returns 0..65535 for a code unit and -1 at end of stream: the same sentinel trick as InputStream, but with a wider legal range (InputStream.read() returns 0..255).
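A sketch of the usual bulk-read loop over a Reader, using a StringReader as the source so it runs without a file (ReaderLoops, drain, and copy are hypothetical names). The key detail is appending only the n chars the call reports, since the buffer may be only partly filled:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class ReaderLoops {
    // Bulk reads through read(char[], int, int): the result is a count, or -1 at end.
    // bufferSize must be positive.
    static String drain(Reader in, int bufferSize) {
        char[] buf = new char[bufferSize];
        StringBuilder out = new StringBuilder();
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n); // only the first n slots hold fresh data
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Java 10+: transferTo copies everything that is left into a Writer.
    static String copy(Reader in) {
        StringWriter sink = new StringWriter();
        try {
            in.transferTo(sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(new StringReader("hello, reader"), 4)); // hello, reader
        System.out.println(copy(new StringReader("piped")));             // piped
    }
}
```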
A char is not always one "character". Characters outside the Basic Multilingual Plane (U+10000 and up: most emoji, ancient scripts) take two UTF-16 code units, a surrogate pair. If you process text in fixed-size char chunks (say, reading 100 chars at a time), a chunk boundary can fall between the two halves of a pair. For line-oriented text this rarely matters; for character-level processing of arbitrary Unicode, work in code points (String.codePoints()).
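A small illustration, assuming the sample string "a🎉b" (written with Unicode escapes in the source so it compiles under any source encoding): it is four chars but three code points, and cutting it into two-char chunks, as a read loop with a two-slot buffer would, leaves the emoji's high surrogate at the end of one chunk and its low surrogate at the start of the next. The names are illustrative:

```java
import java.util.Arrays;

final class SurrogatePairs {
    // Code points of s: a supplementary character comes back as a single int.
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    // Cuts s into chunks of at most `size` chars, the way a fixed-size char buffer would.
    static String[] charChunks(String s, int size) {
        int count = (s.length() + size - 1) / size;
        String[] chunks = new String[count];
        for (int i = 0; i < count; i++) {
            chunks[i] = s.substring(i * size, Math.min(s.length(), (i + 1) * size));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "a\uD83C\uDF89b"; // "a🎉b"
        System.out.println(text.length());                     // 4 chars
        System.out.println(codePoints(text).length);           // 3 code points
        System.out.println(Arrays.toString(codePoints(text))); // [97, 127881, 98]

        String[] chunks = charChunks(text, 2); // "a" + high surrogate, then low surrogate + "b"
        System.out.println(Character.isHighSurrogate(chunks[0].charAt(1))); // true
        System.out.println(Character.isLowSurrogate(chunks[1].charAt(0)));  // true
    }
}
```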
The Writer contract
Writer mirrors OutputStream:
```java
void write(int c);                         // writes the low 16 bits as one char
void write(char[] buf);
void write(char[] buf, int off, int len);
void write(String s);                      // convenience: writes a whole String
void write(String s, int off, int len);
Writer append(CharSequence csq);           // chainable: w.append("a").append("b")
void flush();
void close();                              // flushes first, then closes
```

write(String) is the convenience you'll use most: most text I/O is a small number of large writes (a JSON body, a generated report) rather than character-by-character output.
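A quick tour of those calls against a StringWriter, which keeps everything in memory; the class name WriterCalls and the sample text are made up for the example:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class WriterCalls {
    static String demo() {
        StringWriter sw = new StringWriter();
        try (Writer w = sw) {
            w.write('[');                        // write(int): the char's low 16 bits
            w.write("header,body,footer", 7, 4); // write(String, off, len): "body"
            w.append(']').append('\n');          // append returns the Writer, so calls chain
            w.write("done");                     // write(String): the everyday case
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // StringWriter never actually throws
        }
        return sw.toString();                    // a StringWriter is still readable after close()
    }

    public static void main(String[] args) {
        System.out.print(demo()); // prints "[body]" then "done" on the next line
    }
}
```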
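append comes from the Appendable interface, which Writer and StringBuilder both implement. It accepts any CharSequence (a String, a StringBuilder, a CharBuffer), and code written against Appendable can write into either a Writer or a StringBuilder, depending on a flag. It's the same append method StringBuilder itself has, shared through the interface.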
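A sketch of that pattern, assuming a hypothetical helper writeGreeting that is written once against Appendable and then pointed at either a StringBuilder or a StringWriter. Because Appendable.append is declared to throw IOException, the helper wraps it even though neither target here actually throws:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class AppendableTargets {
    // Written once against Appendable, so the caller decides where the text goes.
    static <A extends Appendable> A writeGreeting(A out, CharSequence name) {
        try {
            out.append("hello, ").append(name).append('!');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        boolean toWriter = args.length > 0; // hypothetical flag choosing the target
        Appendable target = toWriter ? new StringWriter() : new StringBuilder();
        System.out.println(writeGreeting(target, "world")); // hello, world!
    }
}
```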
Concrete character streams
| Class | What it wraps |
|---|---|
| FileReader / FileWriter | A file on disk, decoded or encoded as text. |
| CharArrayReader / CharArrayWriter | An in-memory char[]. |
| StringReader / StringWriter | An in-memory String (StringWriter collects its output in a StringBuffer). |
| BufferedReader / BufferedWriter | A buffered view of another Reader/Writer. |
| InputStreamReader / OutputStreamWriter | Bridge classes: a Reader/Writer over an underlying byte stream, with a Charset. |
| PrintWriter | A Writer decorator that adds print, println, and printf. |
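Two of the in-memory classes in action, as a sketch with made-up names: a PrintWriter over a StringWriter builds a small numbered report with printf, and a BufferedReader over a StringReader splits it back into lines. No charset is involved, because the text never leaves char form:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

final class InMemoryStreams {
    // PrintWriter over a StringWriter: printf formatting into a String, no file involved.
    static String report(List<String> names) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            for (int i = 0; i < names.size(); i++) {
                out.printf("%d. %s%n", i + 1, names.get(i)); // %n is the platform line separator
            }
        }
        return buffer.toString();
    }

    // BufferedReader over a StringReader: split in-memory text into lines.
    static List<String> lines(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return in.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = report(List.of("ada", "grace"));
        System.out.println(lines(text)); // [1. ada, 2. grace]
    }
}
```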
The bridge classes are the structural point of the whole hierarchy. Every character stream that talks to a file, socket, or pipe is, underneath, a byte stream plus a charset. FileReader is a thin subclass of InputStreamReader that opens a FileInputStream for you; FileWriter likewise extends OutputStreamWriter over a FileOutputStream.
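To see the "byte stream plus a charset" shape directly, this sketch pushes the same four-char string through an OutputStreamWriter into a ByteArrayOutputStream under three charsets. The encode helper is a hypothetical name; UTF_16BE is used instead of UTF_16 so no byte-order mark is added:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BridgeBytes {
    // The Writer side speaks chars; the OutputStreamWriter encodes them into the byte sink.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // close() flushed the encoder's pending bytes
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "café": 4 chars
        System.out.println(encode(cafe, StandardCharsets.UTF_8).length);      // 5
        System.out.println(encode(cafe, StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(encode(cafe, StandardCharsets.UTF_16BE).length);   // 8
    }
}
```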
The charset trap
The classic Java I/O bug:
```java
// WRONG in any code that might run on more than one machine
try (FileReader in = new FileReader("data.txt")) { /* ... */ }
try (FileWriter out = new FileWriter("data.txt")) { /* ... */ }
```

The no-charset constructors use the JVM's default charset. Before Java 18 that default came from the OS locale at startup: on a developer Mac it's almost always UTF-8, on a Linux server with a C locale it can be US-ASCII, and on Windows with an English install it's Cp1252. The "works on my Mac, broken on the production box" bug is exactly this constructor. Java 18 (JEP 400) made UTF-8 the default on every platform, but code running on older JVMs, or on a JVM started with -Dfile.encoding=COMPAT, still picks up the locale's encoding.
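The failure can be reproduced on one machine by decoding the same UTF-8 bytes under different charsets. This sketch (class and method names are illustrative) uses ISO-8859-1 as a stand-in for Cp1252, since both map the two bytes of é (0xC3 0xA9) to the same characters, and it also prints Charset.defaultCharset(), which is what the no-argument constructors pick up on the JVM running it:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetTrap {
    // Decodes the same bytes under whichever charset the caller (or a default) picks.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset); // malformed input becomes U+FFFD, no exception
    }

    public static void main(String[] args) {
        // What the no-argument FileReader/FileWriter constructors use on this JVM
        System.out.println("default charset: " + Charset.defaultCharset());

        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // as written by a UTF-8 machine
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8).equals("caf\u00e9"));            // true
        System.out.println(decodeAs(utf8, StandardCharsets.ISO_8859_1).equals("caf\u00c3\u00a9")); // true: "cafÃ©"
        System.out.println(decodeAs(utf8, StandardCharsets.US_ASCII).equals("caf\ufffd\ufffd"));   // true
    }
}
```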
Pass a charset explicitly:
```java
// Right
try (FileReader in = new FileReader("data.txt", StandardCharsets.UTF_8)) { /* ... */ }
try (FileWriter out = new FileWriter("data.txt", StandardCharsets.UTF_8)) { /* ... */ }
```

(The constructors taking a Charset were added in Java 11. Before that, you had to build the bridge yourself, new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8), or call Files.newBufferedReader(path, charset), which has required an explicit charset since Java 7; its one-argument form, added in Java 8, always decodes as UTF-8.)
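A round-trip sketch of the explicit-charset fix: it writes to a temporary file with the Java 11 FileWriter(File, Charset) constructor and reads the file back both with FileReader(File, Charset) and with the older hand-built bridge. The class and method names are hypothetical, and the temp file is deleted afterwards:

```java
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExplicitCharsetFiles {
    // Reads everything left in a Reader through a small char buffer.
    private static String drain(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Writes text to a temp file and reads it back two ways; both readers must agree.
    static String roundTrip(String text) {
        try {
            Path path = Files.createTempFile("explicit-charset", ".txt");
            try {
                // Java 11+: FileWriter(File, Charset)
                try (Writer out = new FileWriter(path.toFile(), StandardCharsets.UTF_8)) {
                    out.write(text);
                }
                // Java 11+: FileReader(File, Charset)
                String viaFileReader;
                try (Reader in = new FileReader(path.toFile(), StandardCharsets.UTF_8)) {
                    viaFileReader = drain(in);
                }
                // Pre-Java 11 equivalent: build the bridge by hand
                String viaBridge;
                try (Reader in = new InputStreamReader(
                        new FileInputStream(path.toFile()), StandardCharsets.UTF_8)) {
                    viaBridge = drain(in);
                }
                if (!viaFileReader.equals(viaBridge)) {
                    throw new IllegalStateException("readers disagree");
                }
                return viaFileReader;
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("na\u00efve caf\u00e9").equals("na\u00efve caf\u00e9")); // true
    }
}
```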
The java.nio.file.Files factories go further and default to UTF-8 when you pass no charset:

```java
String text = Files.readString(path);             // UTF-8 by default (Java 11+)
BufferedReader r = Files.newBufferedReader(path); // UTF-8 by default (Java 8+)
```

If you're starting fresh, use the Files factories. If you're touching legacy FileReader/FileWriter code, the cheapest fix is adding StandardCharsets.UTF_8 as the second argument.
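A sketch of the Files route, assuming a temporary file as the target (FilesFactories and roundTrip are illustrative names). Files.writeString and Files.readString need Java 11; the charset is passed on the write only to make it visible, and both reads rely on the no-argument UTF-8 default:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class FilesFactories {
    // Writes text to a temp file, then reads it back whole and line by line.
    static List<String> roundTrip(String text) {
        try {
            Path path = Files.createTempFile("files-factories", ".txt");
            try {
                // The charset is spelled out here only to make it visible
                Files.writeString(path, text, StandardCharsets.UTF_8);

                // No charset argument: decodes as UTF-8
                String whole = Files.readString(path);
                if (!whole.equals(text)) {
                    throw new IllegalStateException("readString returned different text");
                }

                // Also UTF-8 when no charset is given
                List<String> lines = new ArrayList<>();
                try (BufferedReader in = Files.newBufferedReader(path)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        lines.add(line);
                    }
                }
                return lines;
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("caf\u00e9\nna\u00efve").size()); // 2
    }
}
```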
The bridge classes directly
You need InputStreamReader and OutputStreamWriter whenever the byte source or sink isn't a file (a zip entry, a socket, an HTTP response body, System.in, an InflaterInputStream) and you want text in or out of it:
```java
// Read text from System.in as UTF-8.
// Note: closing this reader also closes System.in.
try (BufferedReader stdin = new BufferedReader(
        new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
    String line = stdin.readLine();
}

// Read the response body of an HttpURLConnection as text.
// `connection` is an HttpURLConnection opened elsewhere; UTF-8 is assumed here,
// though the server's Content-Type header may name a different charset.
try (BufferedReader resp = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    resp.lines().forEach(System.out::println);
}
```

The shape is always the same: byte stream → InputStreamReader(stream, charset) → optional BufferedReader → your code.
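The same shape works for any byte source. As a self-contained stand-in for a network stream, this sketch (hypothetical class and method names) compresses text through OutputStreamWriter → GZIPOutputStream and reads it back through GZIPInputStream → InputStreamReader → BufferedReader, with the charset pinned at both bridges:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipText {
    // chars -> OutputStreamWriter (UTF-8) -> GZIPOutputStream -> byte array
    static byte[] gzipText(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // closing the writer also finished the gzip trailer
    }

    // byte array -> GZIPInputStream -> InputStreamReader (UTF-8) -> BufferedReader -> lines
    static List<String> gunzipLines(byte[] gz) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gz = gzipText("hello\ncaf\u00e9\n");
        System.out.println(gunzipLines(gz).size()); // 2
    }
}
```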
A worked example: text in three shapes
The program below writes a small UTF-8 text file containing ASCII, accented characters, and a multi-byte emoji, then reads it back four ways: as a String, character by character, line by line through a BufferedReader, and through the legacy FileReader(charset) constructor. The example also shows the bridge-class shape working over a ByteArrayInputStream so you can see where Reader and InputStream meet.
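A minimal sketch of such a program follows. The file content is an assumption chosen to match the three lines described in the notes (hello, café, 🎉 party, with no trailing newline), the target is a temporary file, and the wrong-charset decode uses ISO-8859-1; the class and field names are illustrative:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class TextThreeShapes {
    // Hypothetical sample content: hello / café / 🎉 party
    static final String CONTENT = "hello\ncaf\u00e9\n\uD83C\uDF89 party";

    static final class Result {
        long bytesOnDisk;
        String whole, byChar, legacy, bridged, wrongCharset;
        final List<String> lines = new ArrayList<>();
    }

    // Pulls every char out of a Reader, one read() call at a time.
    static String drain(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c); // the emoji arrives as two calls: high then low surrogate
        }
        return sb.toString();
    }

    // The bridge over an in-memory byte source, with whatever charset the caller picks.
    static String decode(byte[] raw, Charset charset) throws IOException {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            return drain(in);
        }
    }

    static Result run() {
        Result r = new Result();
        try {
            Path path = Files.createTempFile("text-shapes", ".txt");
            try {
                Files.writeString(path, CONTENT, StandardCharsets.UTF_8);
                r.bytesOnDisk = Files.size(path);

                // 1. The whole file as one String
                r.whole = Files.readString(path, StandardCharsets.UTF_8);
                // 2. Character by character
                try (Reader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                    r.byChar = drain(in);
                }
                // 3. Line by line; readLine() strips the terminator
                try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        r.lines.add(line);
                    }
                }
                // 4. Legacy FileReader with an explicit charset (Java 11+)
                try (Reader in = new FileReader(path.toFile(), StandardCharsets.UTF_8)) {
                    r.legacy = drain(in);
                }

                byte[] raw = Files.readAllBytes(path);
                r.bridged = decode(raw, StandardCharsets.UTF_8);           // right charset
                r.wrongCharset = decode(raw, StandardCharsets.ISO_8859_1); // wrong charset: mojibake
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("chars=" + CONTENT.length()
                + " codePoints=" + CONTENT.codePointCount(0, CONTENT.length())
                + " bytes=" + r.bytesOnDisk); // chars=19 codePoints=18 bytes=22
        System.out.println(r.lines.size());   // 3
        System.out.println(r.wrongCharset.equals(CONTENT)); // false
    }
}
```

For that content, the counts come out as 19 chars, 18 code points, and 22 bytes on disk.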
What to take from the run:
- The file on disk was larger than content.length(). The String's length() counts each \n and counts the 🎉 emoji as two UTF-16 code units (that's what a Java char measures); UTF-8 encodes the emoji as four bytes and é as two, so the byte count is bigger. The same logical text is one number in chars, another in bytes, and another in code points. Knowing which one you mean is half the battle with charset bugs.
- The char-by-char loop reassembled exactly the same string. The Reader API handled UTF-8 decoding for you: the single emoji shows up as two read() calls because of UTF-16 surrogates, but you never had to think about byte boundaries.
- BufferedReader.readLine() returned three lines: hello, café, 🎉 party. That's the text-oriented vocabulary: line by line, terminator-aware (it handles \n, \r, and \r\n), and built on top of the bridge class. Every API call in this chapter and the next ultimately reduces to "decode bytes through a charset and serve characters."
- The direct InputStreamReader(new ByteArrayInputStream(raw), UTF_8) block shows the structural shape: byte source on the inside, charset at the bridge, character API on the outside. Swap ByteArrayInputStream for socket.getInputStream() and the rest is identical, which is why HTTP and database client code keeps converging on the same idiom.
- The final block decoded the same bytes with the wrong charset. The accented é and the emoji both came out as garbage: the textbook mojibake bug. The bytes on disk were fine; the charset at the bridge was wrong. That's why pinning the charset explicitly is the single most useful habit in Java text I/O.
What's next
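Both byte and character streams default to one-at-a-time I/O. On a raw FileInputStream every read() call is a system call; the character bridges read ahead into a small internal byte buffer, but a loop of single-char read() calls still pays the full call overhead for every character. The next chapter, Java Buffered Streams, covers the Buffered* decorators (an in-memory buffer between your code and the OS) and the readLine() API that lives there.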
Practice
Why do `new FileReader(path)` and `new FileWriter(path)` (no charset argument) cause 'works on my machine, broken on the server' bugs?