Java Byte Streams

Read and write binary data in Java with InputStream, OutputStream, FileInputStream, and FileOutputStream.

Chapter 1 introduced the java.io design as a stack of decorators: a raw stream at the bottom, layers of functionality wrapped around it, the highest layer exposing the API you call. The first six chapters of this part lived at the top of that stack — Files.readString, Files.lines, Files.writeString. This chapter drops one layer down to the byte-oriented abstraction the whole stack is built on: InputStream and OutputStream.

Every file, socket, pipe, and in-memory buffer in java.io is — at the bottom — a stream of bytes. Even a UTF-8 text file is bytes on disk; the "this is text" view comes from a Reader layered on top of an InputStream. Knowing the byte API matters when the data isn't text (images, audio, archives, network protocols), when you need to copy bytes without decoding them, and when you want to understand what the higher-level APIs are really doing.

The InputStream contract

InputStream is an abstract class with a single abstract method:

public abstract int read() throws IOException;

It returns the next byte as an int in the range 0..255, or -1 when the stream is exhausted. The int is not a mistake: a byte in Java is signed (-128..127), but the stream contract is unsigned, so the wider return type makes "end of stream" (-1) distinguishable from a real byte value (0xFF reads back as 255, not -1).
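A minimal sketch of that contract, using an in-memory ByteArrayInputStream so nothing touches the disk. The class and method names (UnsignedReadDemo, readTwice) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class UnsignedReadDemo {
    // Reads twice from a stream that holds the single byte 0xFF.
    static int[] readTwice() throws IOException {
        try (InputStream in = new ByteArrayInputStream(new byte[] { (byte) 0xFF })) {
            int first = in.read();   // the byte itself, widened as an unsigned value
            int second = in.read();  // nothing left, so the end-of-stream sentinel
            return new int[] { first, second };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = readTwice();
        System.out.println(r[0] + " then " + r[1]);  // prints "255 then -1"

        // Narrowing to byte before the comparison loses the distinction:
        // the real 0xFF now compares equal to -1.
        System.out.println((byte) r[0] == -1);       // prints "true"
    }
}
```

The second print is the trap: once the value is narrowed to a byte, a genuine 0xFF and the end-of-stream marker are indistinguishable.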

Four more methods are defined on top of read() and are what you usually call:

int read(byte[] buf);                  // read up to buf.length bytes; return count or -1
int read(byte[] buf, int off, int len); // same, into a slice
byte[] readAllBytes();                  // Java 9+: read everything into a byte[]
long transferTo(OutputStream out);       // Java 9+: pipe straight to a sink, no copy loop

readAllBytes() is the convenience for small files; transferTo is the convenience for copying without decoding. For everything else there's the buffered-read loop, which is the canonical shape:

byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
  out.write(buf, 0, n);                 // n bytes, not buf.length — the last chunk is short
}

Two things to internalise. First, read(byte[]) returns how many bytes were actually read, which is not always buf.length. The last read of a file is almost always partial, and on sockets and pipes any read can come back short; treating the buffer as full corrupts the data. Second, read() and read(byte[]) are blocking: they return once at least one byte is available or the stream ends. A slow disk or a slow socket makes them wait; it does not make them return early with nothing.
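To see why the loop must use n, here is a sketch with a deliberately awkward source. TrickleInputStream is a hypothetical wrapper written for this example (it is not a JDK class) that hands back at most three bytes per call, the way a slow pipe or socket might.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ShortReadDemo {
    // Hypothetical wrapper: never returns more than 3 bytes per read call.
    static class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    // The canonical loop: write exactly the n bytes each read delivered.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new TrickleInputStream(new ByteArrayInputStream(data)), sink);
        System.out.println(copied + " bytes copied, intact: "
                + Arrays.equals(data, sink.toByteArray()));
    }
}
```

Every read here is short, yet the copy is byte-for-byte identical because the loop never assumes a full buffer. Swapping out.write(buf, 0, n) for out.write(buf) would write 8192 bytes per iteration, most of them zeros.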

The OutputStream contract

The mirror class is OutputStream, also one abstract method:

public abstract void write(int b) throws IOException;

It writes the low 8 bits of b and ignores the rest. The other methods you'll call are:

void write(byte[] buf);                    // write the whole array
void write(byte[] buf, int off, int len);  // write a slice — this is the one you usually want
void flush();                               // push buffered data to the OS
void close();                               // flush + release resources

flush() only matters if the stream buffers. Raw FileOutputStream doesn't — every write calls the OS — so flush is a no-op. BufferedOutputStream (next chapter) is where buffering, and the need to flush, live.

On a buffering stream, close() calls flush() first. That's why "forgot to close the buffered stream" silently truncates the file: the tail of the data is sitting in an in-memory buffer waiting for a flush that never comes.
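Both behaviours can be observed without a file. The sketch below wraps a ByteArrayOutputStream in a BufferedOutputStream and checks how many bytes have reached the sink before and after flush(); FlushDemo and run are illustrative names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class FlushDemo {
    // Returns { sink size before flush, sink size after flush, first byte stored }.
    static int[] run() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink);

        out.write(0x1FF);                   // only the low 8 bits (0xFF) are kept
        out.write(new byte[] { 1, 2, 3 });

        int before = sink.size();           // nothing has reached the sink yet
        out.flush();
        int after = sink.size();            // all four bytes pushed through

        int firstByte = sink.toByteArray()[0] & 0xFF;
        out.close();
        return new int[] { before, after, firstByte };
    }

    public static void main(String[] args) throws IOException {
        int[] r = run();
        System.out.println("before flush: " + r[0]);   // prints "before flush: 0"
        System.out.println("after flush:  " + r[1]);   // prints "after flush:  4"
        System.out.println("first byte:   " + r[2]);   // prints "first byte:   255"
    }
}
```

If the program exited between the two size checks without flushing or closing, the sink would stay empty. With a file as the sink, that is the truncated-file symptom.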

Concrete byte streams

The concrete subclasses you'll actually instantiate:

| Class | What it wraps |
| --- | --- |
| FileInputStream / FileOutputStream | A file on disk. Opens a file descriptor. |
| ByteArrayInputStream / ByteArrayOutputStream | An in-memory byte[]. Useful for tests and for capturing output. |
| BufferedInputStream / BufferedOutputStream | A buffered view of another stream. |
| PipedInputStream / PipedOutputStream | A producer/consumer pipe between threads. |
| DataInputStream / DataOutputStream | Layered on a byte stream to read/write primitives portably. |
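As an illustration of the layering, here is a sketch that stacks DataOutputStream on a ByteArrayOutputStream and reads the result back through DataInputStream. The record layout (an int, a double, a string) and the names DataStreamDemo, encode, and decode are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamDemo {
    // Serialises three fields into a byte[] using the DataOutput encoding.
    static byte[] encode(int id, double price, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);        // 4 bytes, big-endian
            out.writeDouble(price);  // 8 bytes
            out.writeUTF(name);      // 2-byte length prefix + modified UTF-8
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + " " + price + " " + name;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encode(42, 9.5, "widget");
        System.out.println(encoded.length + " bytes");  // prints "20 bytes"
        System.out.println(decode(encoded));            // prints "42 9.5 widget"
    }
}
```

Swapping the ByteArray streams for FileOutputStream and FileInputStream changes where the bytes go without touching the encode and decode logic.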

FileInputStream and FileOutputStream are the raw file streams. They are unbuffered: every read()/write() is one syscall. That's catastrophic for byte-at-a-time loops — millions of syscalls — and merely fine for chunked reads with an 8 KB or larger buffer. The buffered chapter is what makes the byte-at-a-time API affordable.

// Raw, unbuffered — fine for chunked reads
try (FileInputStream in = new FileInputStream("photo.jpg")) {
  byte[] buf = new byte[8192];
  int n;
  while ((n = in.read(buf)) != -1) { /* process buf[0..n] */ }
}

// Equivalent one-liner: Files.readAllBytes is Java 7+; Path.of is Java 11+ (use Paths.get before that)
byte[] all = Files.readAllBytes(Path.of("photo.jpg"));

Files.readAllBytes is the right call for small files; for anything that might not fit in memory, the chunked loop is the safe shape.

Three patterns worth memorising

The three things you do with byte streams over and over:

// 1. Copy a file
try (InputStream in  = Files.newInputStream(src);
     OutputStream out = Files.newOutputStream(dst)) {
  in.transferTo(out);                                 // Java 9+: no manual loop
}
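// Java 7+ one-liner: Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
// (without REPLACE_EXISTING, Files.copy throws if dst already exists)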

// 2. Read everything into memory
byte[] all = Files.readAllBytes(path);                 // small-file shortcut

// 3. Build a byte[] you don't know the size of in advance
ByteArrayOutputStream baos = new ByteArrayOutputStream();
in.transferTo(baos);
byte[] bytes = baos.toByteArray();

ByteArrayOutputStream is the "grow as you go" byte sink. It's the natural way to collect a stream whose length isn't known up front, the same job readAllBytes() does for you. It never throws on write (until you run out of heap) and its close() is a no-op, which makes it the standard test fixture for "capture what this writer produced."
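A sketch of that test-fixture use. writeHeader is hypothetical code under test, invented for this example: it writes a four-byte magic number followed by a version byte to whatever OutputStream it is given.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

class CaptureDemo {
    // Hypothetical code under test: knows only that it has an OutputStream.
    static void writeHeader(OutputStream out, int version) throws IOException {
        out.write(new byte[] { 'D', 'E', 'M', 'O' });  // magic number
        out.write(version);                            // low 8 bits only
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        writeHeader(captured, 2);

        // Inspect exactly what was produced, with no file involved.
        System.out.println(Arrays.toString(captured.toByteArray()));
        // prints "[68, 69, 77, 79, 2]"
    }
}
```

Because writeHeader takes an OutputStream rather than a path, production code can pass a file stream while the test passes the in-memory sink.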

When to reach for byte streams

The honest answer: when the data isn't text. Anything binary — images, audio, video, archives (.zip, .tar), executables, protocol buffers, custom file formats — is bytes and stays bytes.

When the data is text, prefer the character-stream side (Reader/Writer, next chapter) or the modern Files.readString / Files.lines. Reading a text file as raw bytes and decoding by hand is the standard way to invent your own charset bug — UTF-8 multi-byte characters get split across read() calls and you reassemble them wrong. The Reader layer exists precisely so you don't have to think about that.
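The split-character bug is easy to reproduce in memory. In the sketch below (all names are illustrative), the string "café" is five bytes in UTF-8 because é takes two; decoding it in four-byte chunks cuts that character in half, while an InputStreamReader carries the partial character across reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class SplitCharDemo {
    // Wrong: decode each chunk on its own. A multi-byte character that
    // straddles a chunk boundary is decoded as two malformed fragments.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Right: let a Reader do the decoding; it buffers incomplete sequences.
    static String decodeWithReader(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[4];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";  // "café"; the last character is 2 bytes in UTF-8
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("per-chunk matches: " + decodePerChunk(data, 4).equals(text));
        System.out.println("reader matches:    " + decodeWithReader(data).equals(text));
    }
}
```

The per-chunk version happens to work whenever the boundary falls between characters, which is why this bug tends to show up only on some inputs.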

A worked example: copy, hash, and capture

The program below exercises the byte-stream API end to end. It writes a small binary file (a header plus some payload), reads it back chunk by chunk into a checksum, copies it to a second file with transferTo, and captures another copy into a ByteArrayOutputStream so you can see the in-memory sink in action. The temp files clean themselves up on exit.
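A minimal sketch of a program matching that description, not the page's original listing. The four-byte header, the 1000-byte payload, the CRC32 checksum, and all names are assumptions, chosen so that the file comes out at 1004 bytes and is read through a 256-byte buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;

class ByteStreamTour {
    // Writes an assumed layout: 4-byte header plus 1000-byte payload.
    static void writeSample(Path file) throws IOException {
        byte[] payload = new byte[1000];
        for (int i = 0; i < payload.length; i++) payload[i] = (byte) i;
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(new byte[] { 'D', 'E', 'M', 'O' });
            out.write(payload);
        }
    }

    // Reads the file in 256-byte chunks; returns { crc value, chunk count }.
    static long[] checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[256];
        long chunks = 0;
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);   // n, not buf.length
                chunks++;
            }
        }
        return new long[] { crc.getValue(), chunks };
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("bytes-src", ".bin");
        Path dst = Files.createTempFile("bytes-dst", ".bin");
        src.toFile().deleteOnExit();
        dst.toFile().deleteOnExit();

        writeSample(src);
        long[] result = checksum(src);
        System.out.printf("crc=%08x, read in %d chunks%n", result[0], result[1]);

        // File-to-file copy with the JDK's own loop.
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            System.out.println("copied " + in.transferTo(out) + " bytes");
        }

        // Same call, in-memory sink.
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream in = Files.newInputStream(dst)) {
            in.transferTo(captured);
        }
        System.out.println("in-memory copy matches: "
                + Arrays.equals(Files.readAllBytes(src), captured.toByteArray()));

        // A real 0xFF byte, then end of stream.
        try (InputStream in = new ByteArrayInputStream(new byte[] { (byte) 0xFF })) {
            System.out.println(in.read());
            System.out.println(in.read());
        }
    }
}
```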

What to take from the run:

  • The write side used Files.newOutputStream — a Files-flavour factory that returns a plain OutputStream. Once you have it, the API is the same one Java's had since 1.0. The factory just spares you constructing FileOutputStream and worrying about open options.
  • The read loop used n, not buf.length, when calling crc.update. The "read in N chunks" line of output shows why: the buffer was 256 bytes and the file was 1004 bytes, so the last chunk held only 236 bytes. Using buf.length would have hashed 20 stale bytes left over from the previous chunk along with the real data.
  • in.transferTo(out) is the JDK's tested copy loop. The default implementation is the same read/write loop with an internal buffer, so don't expect a speed-up from it alone, although some stream classes override it with a faster path. What you reliably gain is one line instead of five and no chance of writing buf.length where n belongs. Reach for it any time you'd otherwise write a while ((n = in.read(buf)) != -1) loop with no other logic inside.
  • ByteArrayOutputStream plugged straight into transferTo. It looks like a file but lives in memory — the same API. That symmetry is what makes java.io testable: pass a ByteArrayInputStream for the source, a ByteArrayOutputStream for the sink, and you can unit-test code that "writes to a file" without touching the disk.
  • The final block printed 255 then -1. That's the contract: 0xFF is a legal byte value and reads back as 255; -1 is the out-of-band sentinel that says "no more bytes." Storing the return value in a byte (instead of an int) and comparing == -1 would silently treat a real 0xFF as end-of-stream. Always store the result in an int and compare to -1 before casting.

What's next

Bytes are the right abstraction for binary data. The next chapter, Java Character Streams, covers the parallel hierarchy for text — Reader and Writer, charset bridging, and why "just new FileReader(path)" is the classic source of "works on my machine, broken on the server" bugs.

Practice

What does `InputStream.read()` return when the stream contains a single byte with value `0xFF`, and what does it return on the next call?