W3docs

Reading Files in Java

Read text and binary files in Java using FileReader, BufferedReader, Scanner, Files.readString, and streams.

There are five common ways to read a text file in Java, and the right choice depends almost entirely on the file's size and what you want to do with the contents. This chapter walks through the five from simplest to most flexible, then finishes with FileReader, the low-level character reader behind the classic idiom:

  1. Files.readString(path) — whole file as one String.
  2. Files.readAllLines(path) — whole file as a List<String>.
  3. Files.readAllBytes(path) — whole file as a byte[].
  4. Files.lines(path) — file as a Stream<String>, lazy.
  5. BufferedReader / Scanner — classic decorators, full control.

Pick the smallest tool that fits. Reading a 4 GB log with Files.readString ends in an OutOfMemoryError; reading a 12-line config with a BufferedReader and a while loop takes six lines of code where one would do.

Files.readString(path) — whole file, one call

String text = Files.readString(Path.of("config.json"), StandardCharsets.UTF_8);

Added in Java 11. Returns the full file as a String. Like the other java.nio.file.Files helpers, the one-argument overload always decodes UTF-8 regardless of the platform default charset; passing the Charset explicitly still documents intent. Throws IOException if the file doesn't exist, can't be read, or contains bytes that aren't valid in the charset; throws OutOfMemoryError if the file is too large to hold in memory (for example, larger than 2 GB).

Use when: the file is "small enough": config files, JSON payloads, MDX chapters, anything you'd be willing to read in a single editor window. A common rule of thumb is anything under a few megabytes.
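A minimal runnable sketch of the one-call read, assuming a throwaway temp file stands in for config.json (the class name, file name and contents are made up for illustration). It also shows the missing-file failure mode: on the default file system, Files.readString throws NoSuchFileException, a subclass of IOException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ReadStringDemo {

    // Reads the whole file into one String, decoding it as UTF-8.
    static String load(Path path) throws IOException {
        return Files.readString(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for config.json, created in the temp directory.
        Path config = Files.createTempFile("config", ".json");
        Files.writeString(config, "{\"port\": 8080}\n", StandardCharsets.UTF_8);

        System.out.print(load(config));      // {"port": 8080}

        Files.delete(config);
        try {
            load(config);                    // the file no longer exists
        } catch (NoSuchFileException e) {
            System.out.println("missing: " + e.getFile());
        }
    }
}
```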

Files.readAllLines(path) — list of lines

List<String> lines = Files.readAllLines(Path.of("hosts.txt"), StandardCharsets.UTF_8);

Returns a List<String> of the file's lines, with line terminators stripped. The Javadoc leaves it unspecified whether that list is modifiable, so copy it before sorting or editing it. Same memory profile as readString plus the List overhead: the whole file is held in memory.

Use when: you want to index by line number, sort the file, or feed lines into a for (String line : lines) loop without setting up streams.
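A short sketch of both uses named above, indexing by line number and sorting, assuming hypothetical hosts.txt contents written to a temp file. Because the returned list's modifiability is unspecified, the sort works on a copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReadAllLinesDemo {

    // Returns the file's lines sorted alphabetically.
    // readAllLines does not promise a modifiable list, so copy before sorting.
    static List<String> sortedLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(path, StandardCharsets.UTF_8));
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical hosts.txt contents, mixing \r\n and \n terminators.
        Path hosts = Files.createTempFile("hosts", ".txt");
        Files.writeString(hosts, "web-02\r\ndb-01\nweb-01\n", StandardCharsets.UTF_8);

        List<String> lines = Files.readAllLines(hosts, StandardCharsets.UTF_8);
        // Random access by line number; terminators are already stripped.
        for (int i = 0; i < lines.size(); i++) {
            System.out.println((i + 1) + ": " + lines.get(i));
        }
        System.out.println(sortedLines(hosts)); // [db-01, web-01, web-02]

        Files.delete(hosts);
    }
}
```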

Files.readAllBytes(path) — raw bytes

byte[] raw = Files.readAllBytes(Path.of("photo.png"));

The byte equivalent. No Charset because no decoding happens. Use for binary files (images, archives, executables) or when you need to compute a hash or pipe bytes into a ByteArrayInputStream.
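A sketch of the hashing use case, assuming a tiny hypothetical binary file. MessageDigest and HexFormat are standard JDK classes (HexFormat needs Java 17 or later); the bytes are also re-exposed through a ByteArrayInputStream, as the paragraph above mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class ReadAllBytesDemo {

    // Returns the SHA-256 digest of the file as a lowercase hex string.
    static String sha256Hex(Path path) throws IOException, NoSuchAlgorithmException {
        byte[] raw = Files.readAllBytes(path);          // no Charset: bytes stay bytes
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(raw);
        return HexFormat.of().formatHex(digest);        // HexFormat needs Java 17+
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical binary file: the first four bytes of a PNG signature.
        Path file = Files.createTempFile("photo", ".bin");
        Files.write(file, new byte[] {(byte) 0x89, 'P', 'N', 'G'});

        System.out.println(sha256Hex(file));

        // The same bytes re-exposed as an InputStream for APIs that want one.
        try (InputStream in = new ByteArrayInputStream(Files.readAllBytes(file))) {
            System.out.println(in.read() == 0x89); // true: read() returns 0..255
        }
        Files.delete(file);
    }
}
```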

Files.lines(path) — lazy stream

try (Stream<String> lines = Files.lines(Path.of("app.log"), StandardCharsets.UTF_8)) {
  long errors = lines.filter(l -> l.contains("ERROR")).count();
}

Of the Files one-call helpers, this is the only one that scales to arbitrarily large files. The Stream<String> is lazy (lines are read on demand, not all at once) and connects directly to the Part 12 pipeline vocabulary (filter, map, count, toList).

Two non-negotiables:

  • try-with-resources is required. The stream owns an open file handle; if you never close it, the handle stays open (garbage collection won't release it promptly), and a busy server will eventually run out of file descriptors.
  • Don't reuse the stream after a terminal op. Streams are single-use.

Use when: the file is too big for readAllLines, or you want the line-by-line transform to compose with the rest of your stream pipeline.
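A sketch of the two non-negotiables in practice, assuming hypothetical app.log contents. Each method opens its own stream inside try-with-resources, and firstError uses the short-circuiting findFirst, so no further lines are pulled once a match turns up (the reader may still have buffered a little ahead). Per the Files.lines Javadoc, an I/O error hit during iteration surfaces as an UncheckedIOException from the stream operation rather than a checked IOException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

public class LinesDemo {

    // Counts lines containing "ERROR"; try-with-resources closes the file handle.
    static long countErrors(Path log) throws IOException {
        try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
            return lines.filter(l -> l.contains("ERROR")).count();
        }
    }

    // Returns the first ERROR line. findFirst short-circuits, so the stream
    // stops pulling lines as soon as one matches.
    static Optional<String> firstError(Path log) throws IOException {
        try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
            return lines.filter(l -> l.contains("ERROR")).findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical app.log contents.
        Path log = Files.createTempFile("app", ".log");
        Files.writeString(log, "INFO boot\nERROR disk full\nINFO ok\nERROR timeout\n",
                StandardCharsets.UTF_8);

        System.out.println(countErrors(log));               // 2
        System.out.println(firstError(log).orElse("none")); // ERROR disk full
        Files.delete(log);
    }
}
```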

BufferedReader.readLine() — the classic

BufferedReader is the workhorse underneath several of the modern helpers (Files.readAllLines and Files.lines among them). It reads the underlying source in large chunks into an in-memory buffer so that readLine() doesn't issue one system call per character.

try (BufferedReader in = Files.newBufferedReader(Path.of("hosts.txt"), StandardCharsets.UTF_8)) {
  String line;
  while ((line = in.readLine()) != null) {
    System.out.println(line);
  }
}

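Files.newBufferedReader(path) is the modern factory and decodes UTF-8 unless you pass a Charset. The classic version is new BufferedReader(new FileReader("hosts.txt")), which uses the platform default charset, and that default is only guaranteed to be UTF-8 from Java 18 onward. Pin it with the two-argument constructor new FileReader("hosts.txt", StandardCharsets.UTF_8), available since Java 11. The readLine() contract is: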

  • Returns the next line without its terminator (\n, \r, or \r\n).
  • Returns null at end of file. The loop condition (line = readLine()) != null is the established idiom.
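A small sketch that checks the readLine() contract above against a hypothetical file mixing all three terminators, including a last line with no terminator at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {

    // Collects every line using the classic readLine() loop.
    static List<String> readLines(Path path) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            // readLine() strips the terminator and returns null at end of file.
            while ((line = in.readLine()) != null) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file using \n, \r\n and \r; the last line has no terminator.
        Path file = Files.createTempFile("mixed", ".txt");
        Files.writeString(file, "alpha\nbeta\r\ngamma\rdelta", StandardCharsets.UTF_8);

        System.out.println(readLines(file)); // [alpha, beta, gamma, delta]
        Files.delete(file);
    }
}
```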

BufferedReader is also a Stream<String> producer: reader.lines() returns a lazy Stream<String> backed by the reader. Files.lines is essentially this pattern, with the stream's close() wired up to close the file.
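A sketch of reader.lines() feeding a pipeline, assuming hypothetical file contents. Here it is the reader, not the stream, that goes into try-with-resources, because the reader is what holds the file open. Stream.toList() needs Java 16 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReaderLinesDemo {

    // Returns the trimmed, non-blank lines of the file.
    static List<String> nonBlankLines(Path path) throws IOException {
        // The reader owns the file handle, so the reader is the resource to close.
        try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return in.lines()
                     .map(String::strip)          // drop leading/trailing whitespace
                     .filter(s -> !s.isEmpty())   // skip blank lines
                     .toList();                   // Stream.toList() needs Java 16+
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("hosts", ".txt");   // hypothetical contents
        Files.writeString(file, "  web-01 \n\n db-01\n", StandardCharsets.UTF_8);
        System.out.println(nonBlankLines(file)); // [web-01, db-01]
        Files.delete(file);
    }
}
```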

Scanner — token-by-token parsing

Scanner reads text by tokens — words, integers, doubles, lines, even regex matches — and is the right tool for reading structured input where the units aren't whole lines.

try (Scanner sc = new Scanner(Files.newBufferedReader(Path.of("nums.txt")))) {
  while (sc.hasNextInt()) {
    int n = sc.nextInt();
    System.out.println(n * n);
  }
}

Scanner is slower than BufferedReader because it parses: it runs regular expressions and allocates a String per token. For plain line-by-line processing, prefer BufferedReader. For typed tokens out of a small file (numbers, words, CSV-ish input), Scanner saves you writing the parsing layer yourself.
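A sketch of the CSV-ish case, assuming a hypothetical "name,age,score" format with no quoting. useDelimiter makes commas and whitespace both act as separators, and useLocale(Locale.ROOT) keeps nextDouble from misreading "1.5" on machines whose default locale uses a decimal comma. The Row record is illustrative and needs Java 16 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerCsvDemo {

    // One parsed row of the hypothetical "name,age,score" format.
    record Row(String name, int age, double score) {}

    static List<Row> parse(Path path) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (Scanner sc = new Scanner(Files.newBufferedReader(path, StandardCharsets.UTF_8))) {
            sc.useDelimiter("[,\\s]+");  // commas or any whitespace separate tokens
            sc.useLocale(Locale.ROOT);   // "1.5" parses as a double in every locale
            while (sc.hasNext()) {
                // Arguments are evaluated left to right: name, then age, then score.
                rows.add(new Row(sc.next(), sc.nextInt(), sc.nextDouble()));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("people", ".csv");  // hypothetical data
        Files.writeString(file, "alice,30,1.5\nbob,25,2.0\n", StandardCharsets.UTF_8);
        for (Row r : parse(file)) {
            System.out.println(r.name() + " is " + r.age() + ", score " + r.score());
        }
        Files.delete(file);
    }
}
```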

There's a full chapter on Scanner later in this part — this is the read-a-file flavour.

FileReader — the raw character reader

try (FileReader in = new FileReader("notes.txt", StandardCharsets.UTF_8)) {
  int c;
  while ((c = in.read()) != -1) {
    System.out.print((char) c);
  }
}

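FileReader is an InputStreamReader over a FileInputStream: it decodes bytes into characters, but it has no line awareness and no character buffer for you to read from, so every read() call goes through the decoder. You pass the Charset explicitly (Java 11+), or it falls back to the platform default, which is only guaranteed to be UTF-8 from Java 18. It's the layer the classic new BufferedReader(new FileReader(...)) idiom is built on; in application code you almost never use it directly and instead wrap it in a BufferedReader.

It can still make sense when you only need the first few characters of a file and then stop, such as checking a short header, where wrapping it in a BufferedReader buys little.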
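A sketch of that "read a few characters and stop" case, assuming a hypothetical shell script whose header we want to check. It uses the bulk read(char[], off, len) overload rather than one read() per character; since a single call may return fewer characters than requested, it loops until the buffer is full or the file ends.

```java
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileReaderPeekDemo {

    // Reads at most `limit` characters from the start of the file, then stops.
    static String peek(Path path, int limit) throws IOException {
        char[] buf = new char[limit];
        int filled = 0;
        try (FileReader in = new FileReader(path.toFile(), StandardCharsets.UTF_8)) {
            while (filled < limit) {
                int n = in.read(buf, filled, limit - filled);
                if (n == -1) break;           // end of file reached before the limit
                filled += n;
            }
        }
        return new String(buf, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical script whose first line is a shebang header.
        Path script = Files.createTempFile("script", ".sh");
        Files.writeString(script, "#!/bin/sh\necho hello\n", StandardCharsets.UTF_8);

        System.out.println(peek(script, 9));                  // #!/bin/sh
        System.out.println(peek(script, 9).startsWith("#!")); // true
        Files.delete(script);
    }
}
```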

Which one to use

| Scenario | Pick |
|---|---|
| Small file you want as a single String | Files.readString |
| Small file you want as a List<String> | Files.readAllLines |
| Binary file (image, archive) | Files.readAllBytes |
| Any file with a stream-style transform | Files.lines (inside try-with-resources) |
| Line-by-line loop, full control | Files.newBufferedReader + readLine |
| Typed tokens (ints, words, regex matches) | Scanner |
| One character at a time, tiny file | FileReader |

The right default for the "I just want to load this small text file" case is Files.readString. The right default for "process this giant log without blowing memory" is Files.lines.
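Those two defaults can be combined into one helper that picks a reader by file size. This is only a sketch: the 4 MB SMALL_FILE_LIMIT is a hypothetical cut-off, not a JDK constant, and the right value depends on your heap and your files. String.lines() (Java 11+) splits on the same \n, \r and \r\n terminators as readLine, so both branches count the same lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class PickReader {

    // Hypothetical cut-off between "small" and "big"; tune it for your workload.
    static final long SMALL_FILE_LIMIT = 4L * 1024 * 1024; // 4 MB

    // Counts lines containing `needle`, choosing the reader by file size.
    static long countMatching(Path path, String needle) throws IOException {
        if (Files.size(path) <= SMALL_FILE_LIMIT) {
            // Small: load it whole, then split into lines in memory.
            return Files.readString(path, StandardCharsets.UTF_8)
                        .lines()
                        .filter(l -> l.contains(needle))
                        .count();
        }
        // Big: stream it lazily; try-with-resources releases the file handle.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(l -> l.contains(needle)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("app", ".log");  // hypothetical log
        Files.writeString(file, "INFO ok\nERROR boom\nERROR again\n", StandardCharsets.UTF_8);
        System.out.println(countMatching(file, "ERROR")); // 2
        Files.delete(file);
    }
}
```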

A worked example: same file, five readers

The program below writes a small text file, then reads it five different ways — readString, readAllLines, Files.lines filtered through a Predicate<String> from Part 12's vocabulary, BufferedReader.readLine, and Scanner for tokenised integers. Each block prints what it got so you can see the shapes side by side.
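The interactive program itself did not survive in this copy of the page. What follows is a minimal reconstruction consistent with the description, not the original code; the class name, method names and file contents are hypothetical. Each reader lives in its own static method so the shapes can be compared side by side.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class FiveReaders {

    // 1. Whole file as one String.
    static String readWhole(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    // 2. Whole file as an indexable List<String>, terminators stripped.
    static List<String> readAll(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // 3. Lazy stream filtered through a Predicate<String>; the stream owns the handle.
    static long countErrors(Path file) throws IOException {
        Predicate<String> isError = line -> line.startsWith("ERROR");
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(isError).count();
        }
    }

    // 4. Classic readLine(), deliberately stopping after three lines.
    static List<String> firstThree(Path file) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            for (int i = 0; i < 3; i++) {
                String line = in.readLine();
                if (line == null) break;   // end of file before three lines
                out.add(line);
            }
        }
        return out;
    }

    // 5. Scanner: skip lines until one starts with an integer, then sum int tokens.
    static int sumInts(Path file) throws IOException {
        try (Scanner sc = new Scanner(Files.newBufferedReader(file, StandardCharsets.UTF_8))) {
            while (sc.hasNext() && !sc.hasNextInt()) {
                sc.nextLine();             // discard the rest of a non-numeric line
            }
            int sum = 0;
            while (sc.hasNextInt()) {
                sum += sc.nextInt();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: four log lines followed by two lines of integers.
        Path file = Files.createTempFile("five-readers", ".txt");
        Files.writeString(file,
                "INFO server started\nERROR disk full\nINFO request handled\nERROR timeout\n10 20 30\n40 50\n",
                StandardCharsets.UTF_8);

        System.out.print("readString:\n" + readWhole(file));
        List<String> lines = readAll(file);
        System.out.println("readAllLines: " + lines.size() + " lines, first = " + lines.get(0));
        System.out.println("Files.lines errors: " + countErrors(file));
        System.out.println("readLine x3: " + firstThree(file));
        System.out.println("Scanner sum: " + sumInts(file));

        Files.delete(file);
    }
}
```

With this sample file the sketch should report 6 lines, 2 errors, and a Scanner sum of 150.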


What to take from the run:

  • Files.readString returned the whole file as one String — easy and exactly what you want for small configs and templates. For a 4 GB log it would have thrown OutOfMemoryError.
  • Files.readAllLines returned an indexable List<String> with terminators stripped. lines.get(0) worked because the list is materialised in memory; you couldn't do that with a stream.
  • Files.lines(file) was opened inside try-with-resources because the stream owns the file handle. The pipeline .filter(isError).count() is the same shape as anything from Part 12 — only the source changed.
  • BufferedReader.readLine() returned null at end of file. The for loop here stopped at three on purpose, but the production idiom is while ((line = in.readLine()) != null).
  • Scanner skipped lines that didn't start with an integer, then read tokens with nextInt() until it ran out. The same Scanner could have read doubles (nextDouble), regex matches (findInLine), or BigIntegers — that's why it costs more per token than BufferedReader does per line.

What's next

The next chapter, Writing Files in Java, covers the writing side of the same APIs — Files.writeString, Files.write, BufferedWriter, PrintWriter, and the StandardOpenOption flags (APPEND, CREATE_NEW, TRUNCATE_EXISTING) that decide how an existing file is handled.

Practice

You need to process a 5 GB server log line by line, counting how many lines contain the word `ERROR`. Which reader is the right pick?