Java Byte Streams: InputStream and OutputStream

Chapter 1 introduced the java.io design as a stack of decorators: a raw stream at the bottom, layers of functionality wrapped around it, the highest layer exposing the API you call. The first six chapters of this part lived at the top of that stack — Files.readString, Files.lines, Files.writeString. This chapter drops one layer down to the byte-oriented abstraction the whole stack is built on: InputStream and OutputStream.

Every file, socket, pipe, and in-memory buffer in java.io is — at the bottom — a stream of bytes. Even a UTF-8 text file is bytes on disk; the "this is text" view comes from a Reader layered on top of an InputStream. Knowing the byte API matters when the data isn't text (images, audio, archives, network protocols), when you need to copy bytes without decoding them, and when you want to understand what the higher-level APIs are really doing.

The `InputStream` contract

InputStream is a one-method abstract class. The one method is:

public abstract int read() throws IOException;

It returns the next byte as an int in the range 0..255, or -1 when the stream is exhausted. The int is not a mistake: a byte in Java is signed (-128..127), but the stream contract is unsigned, so the wider return type makes "end of stream" (-1) distinguishable from a real byte value (0xFF reads back as 255, not -1).
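The signed/unsigned split shows up as soon as you look inside a byte[] that a bulk read filled: the elements are signed Java bytes, while read() reports the same data as 0..255. A minimal sketch of the conversion follows; the class name and the toUnsigned helper are illustrative, not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class UnsignedByteDemo {
  // Hypothetical helper: widen a signed Java byte to the 0..255 value read() reports.
  static int toUnsigned(byte b) {
    return b & 0xFF;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {(byte) 0xFF, 0x7F};
    try (InputStream in = new ByteArrayInputStream(data)) {
      System.out.println(in.read());                   // 255: read() is already unsigned
    }
    System.out.println(data[0]);                       // -1: the same byte seen as a signed byte
    System.out.println(toUnsigned(data[0]));           // 255: masked back to the stream's view
    System.out.println(Byte.toUnsignedInt(data[0]));   // 255: the JDK's own helper (Java 8+)
  }
}
```

Masking with 0xFF (or calling Byte.toUnsignedInt) is the usual step before comparing buffer contents against values such as 0xFF or 0x80.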

Four more methods are defined on top of read() and are what you usually call:

int read(byte[] buf);                  // read up to buf.length bytes; return count or -1
int read(byte[] buf, int off, int len); // same, into a slice
byte[] readAllBytes();                  // Java 9+: read everything into a byte[]
long transferTo(OutputStream out);       // Java 9+: pipe straight to a sink, no copy loop

readAllBytes() is the convenience for small files; transferTo is the convenience for copying without decoding. For everything else there's the buffered-read loop, which is the canonical shape:

byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
  out.write(buf, 0, n);                 // n bytes, not buf.length — the last chunk is short
}

Two things to internalise. First, read(byte[]) returns how many bytes were actually read, which is not always buf.length. The last read is almost always partial, and on sockets and pipes any read can come back short; treating the buffer as full corrupts the data. Second, read() and read(byte[]) are blocking: they wait until at least one byte is available or the stream ends, however slow the disk or socket is.
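When a format calls for exactly N bytes (a fixed-size header, say), a single read(byte[]) is not enough, because it may return fewer. The sketch below loops until the buffer is full. The readFully helper and the trickle test double are hypothetical names for this illustration; trickle simulates a slow source by handing out one byte per call.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {
  // Hypothetical helper: keep reading until buf is full, or fail if the stream ends first.
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int filled = 0;
    while (filled < buf.length) {
      int n = in.read(buf, filled, buf.length - filled);
      if (n == -1) throw new EOFException("stream ended after " + filled + " bytes");
      filled += n;
    }
  }

  // Test double (an assumption for illustration): returns at most one byte per bulk read.
  static InputStream trickle(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {10, 20, 30, 40, 50};

    byte[] once = new byte[5];
    int n = trickle(data).read(once);              // a single call gets only part of the data
    System.out.println("single read returned " + n);

    byte[] full = new byte[5];
    readFully(trickle(data), full);                // the loop keeps going until all 5 arrive
    System.out.println(Arrays.toString(full));
  }
}
```

On Java 9 and later, in.readNBytes(buf, 0, buf.length) performs the same loop and returns the number of bytes it managed to read.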

Skipping, peeking, and rewinding

InputStream also defines a handful of methods you reach for less often but should recognise:

long skip(long n);     // discard up to n bytes without copying them anywhere
int  available();      // bytes you can read right now without blocking — an estimate, not a length
boolean markSupported();
void mark(int readAheadLimit);  // remember this position
void reset();                    // jump back to the last mark

Two traps live here. available() is not the size of the stream: for a file it often matches the remaining length, but for a socket it is "bytes already buffered," which can be 0 mid-transfer. Never write new byte[in.available()] and assume you read the whole thing. And mark/reset only work if markSupported() returns true; a raw FileInputStream returns false, so wrap it in a BufferedInputStream (covered in the buffered chapter) when you need to peek ahead and back up.
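A typical use of mark/reset is sniffing a file's magic number and then handing the untouched stream to a parser. The sketch below assumes a hypothetical startsWith helper and reuses the 0xCAFEBABE header purely as sample data.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class PeekDemo {
  // Hypothetical helper: report whether the stream starts with the given bytes,
  // then rewind so the caller still sees the stream from its first byte.
  static boolean startsWith(InputStream in, byte[] magic) throws IOException {
    if (!in.markSupported()) {
      throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
    }
    in.mark(magic.length);                          // we will read at most magic.length bytes
    byte[] head = new byte[magic.length];
    int n = in.readNBytes(head, 0, head.length);    // Java 9+: loops over short reads
    in.reset();                                     // back to the marked position
    return n == magic.length && Arrays.equals(head, magic);
  }

  public static void main(String[] args) throws IOException {
    byte[] magic = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
    byte[] data  = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 1, 2, 3};
    try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
      System.out.println(startsWith(in, magic));    // true
      System.out.println(in.read());                // 202 (0xCA): the peek consumed nothing
    }
  }
}
```

The readAheadLimit passed to mark is a promise about how far you will read before calling reset; reading past it may invalidate the mark.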

The `OutputStream` contract

The mirror class is OutputStream, also one abstract method:

public abstract void write(int b) throws IOException;

It writes the low 8 bits of b and ignores the upper 24. The other methods you will use are:

void write(byte[] buf);                    // write the whole array
void write(byte[] buf, int off, int len);  // write a slice — this is the one you usually want
void flush();                               // push buffered data to the OS
void close();                               // flush + release resources
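The low-8-bits rule for write(int) is easy to check against an in-memory sink. This is a small sketch; writeInts is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;

public class LowEightBitsDemo {
  // Hypothetical helper: write each int with write(int) and return what actually landed.
  static byte[] writeInts(int... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : values) {
      out.write(v);                      // only the low 8 bits of v are stored
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] bytes = writeInts(0x41, 0x141, -1);
    for (byte b : bytes) {
      System.out.printf("%02X ", b);     // 41 41 FF
    }
    System.out.println();
  }
}
```

0x141 and 0x41 produce the same byte, and -1 (all bits set) lands as 0xFF, which is why passing the int you got from read() straight to write() round-trips correctly.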

flush() only matters if the stream buffers. Raw FileOutputStream doesn't (every write goes straight to the OS), so its flush() is a no-op. BufferedOutputStream (covered in the buffered chapter) is where buffering, and the need to flush, live.

On a buffered stream, close() calls flush() first. That's why "forgot to close the buffered stream" silently truncates the file: the tail of the data is sitting in an in-memory buffer waiting for a flush that never comes.
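The effect of a missing flush can be observed without touching the disk by putting a ByteArrayOutputStream underneath a BufferedOutputStream. A minimal sketch; sizesAroundFlush is a hypothetical helper that reports how many bytes the sink has seen before and after the flush.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
  // Returns {bytes visible in the sink before flush, bytes visible after flush}.
  static int[] sizesAroundFlush() throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    BufferedOutputStream out = new BufferedOutputStream(sink);   // default-sized buffer
    out.write(new byte[]{1, 2, 3});      // small write: stays in the buffer
    int before = sink.size();
    out.flush();                         // pushes the buffered bytes down to the sink
    int after = sink.size();
    out.close();
    return new int[]{before, after};
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = sizesAroundFlush();
    System.out.println("before flush: " + sizes[0]);   // 0
    System.out.println("after flush:  " + sizes[1]);   // 3
  }
}
```

If the program exited between the write and the flush, the sink would never see those three bytes; try-with-resources avoids that by guaranteeing the close.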

Concrete byte streams

The concrete subclasses you'll actually instantiate:

| Class | What it wraps |
| --- | --- |
| `FileInputStream` / `FileOutputStream` | A file on disk. Opens a file descriptor. |
| `ByteArrayInputStream` / `ByteArrayOutputStream` | An in-memory `byte[]`. Useful for tests and for capturing output. |
| `BufferedInputStream` / `BufferedOutputStream` | A buffered view of another stream. |
| `PipedInputStream` / `PipedOutputStream` | A producer/consumer pipe between threads. |
| `DataInputStream` / `DataOutputStream` | Layered on a byte stream to read/write primitives portably. |
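As a quick illustration of the last row, the sketch below round-trips a few primitives through DataOutputStream and DataInputStream over an in-memory buffer. The record layout (an id, a price, a name) is invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DataStreamDemo {
  // Hypothetical record layout: int id, double price, length-prefixed name.
  static byte[] encode(int id, double price, String name) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(sink)) {
      out.writeInt(id);         // 4 bytes, big-endian
      out.writeDouble(price);   // 8 bytes
      out.writeUTF(name);       // 2-byte length prefix + modified UTF-8 bytes
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = encode(42, 9.5, "pen");
    System.out.println(bytes.length + " bytes");     // 17 bytes: 4 + 8 + 2 + 3

    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      // Fields must be read back in the order they were written.
      System.out.println(in.readInt() + " " + in.readDouble() + " " + in.readUTF());
    }
  }
}
```

The format carries no field names or types, so the reader has to know the layout; that is the trade-off for its compactness.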

FileInputStream and FileOutputStream are the raw file streams. They are unbuffered: every read()/write() is one syscall. That's catastrophic for byte-at-a-time loops — millions of syscalls — and merely fine for chunked reads with an 8 KB or larger buffer. The buffered chapter is what makes the byte-at-a-time API affordable.

// Raw, unbuffered — fine for chunked reads
try (FileInputStream in = new FileInputStream("photo.jpg")) {
  byte[] buf = new byte[8192];
  int n;
  while ((n = in.read(buf)) != -1) { /* process buf[0] through buf[n - 1] */ }
}

// One-liner that loads the whole file into memory (Files.readAllBytes is Java 7+; Path.of is Java 11+)
byte[] all = Files.readAllBytes(Path.of("photo.jpg"));

Files.readAllBytes is the right call for small files; for anything that might not fit in memory, the chunked loop is the safe shape.
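The cost of byte-at-a-time reads on an unbuffered stream can be made visible by counting how many calls reach the underlying stream. The sketch below uses an in-memory ByteArrayInputStream as a stand-in for a file, so the numbers are read calls rather than real syscalls; CountingInputStream and the helper methods are hypothetical names for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallCounter {
  // Hypothetical wrapper: counts every read call that reaches the wrapped stream.
  static class CountingInputStream extends FilterInputStream {
    int calls = 0;
    CountingInputStream(InputStream in) { super(in); }
    @Override public int read() throws IOException {
      calls++;
      return super.read();
    }
    @Override public int read(byte[] b, int off, int len) throws IOException {
      calls++;
      return super.read(b, off, len);
    }
  }

  // Consume the stream one byte at a time, the pattern the text warns about.
  static void drainByteAtATime(InputStream in) throws IOException {
    while (in.read() != -1) { /* discard */ }
  }

  static int callsUnbuffered(int size) throws IOException {
    CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
    drainByteAtATime(raw);                                   // every read() hits the raw stream
    return raw.calls;
  }

  static int callsBuffered(int size) throws IOException {
    CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
    drainByteAtATime(new BufferedInputStream(raw, 8192));    // raw stream is hit once per refill
    return raw.calls;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("unbuffered: " + callsUnbuffered(100_000) + " calls");
    System.out.println("buffered:   " + callsBuffered(100_000) + " calls");
  }
}
```

With a real FileInputStream underneath, each counted call would correspond to a trip to the operating system, which is where the difference becomes expensive.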

Three patterns worth memorising

The three things you do with byte streams over and over:

// 1. Copy a file
try (InputStream in  = Files.newInputStream(src);
     OutputStream out = Files.newOutputStream(dst)) {
  in.transferTo(out);                                 // Java 9+: no manual loop
}
// Java 7+ one-liner: Files.copy(src, dst);  (add StandardCopyOption.REPLACE_EXISTING if dst may already exist)

// 2. Read everything into memory
byte[] all = Files.readAllBytes(path);                 // small-file shortcut

// 3. Build a byte[] you don't know the size of in advance
ByteArrayOutputStream baos = new ByteArrayOutputStream();
in.transferTo(baos);
byte[] bytes = baos.toByteArray();

ByteArrayOutputStream is the "grow as you go" byte sink. It is the natural tool for collecting a stream whose length isn't known up front, which is the same job readAllBytes() does for you. It never throws on write (until you run out of heap) and its close() does nothing, which makes it the standard test fixture for "capture what this writer produced."
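The "capture what this writer produced" idea looks like this in practice. The writeRecord method is a hypothetical piece of code under test; the only requirement is that it accepts an OutputStream rather than opening a file itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CaptureDemo {
  // Hypothetical code under test: writes one "name=value" line to whatever sink it is given.
  static void writeRecord(OutputStream out, String name, int value) throws IOException {
    out.write((name + "=" + value + "\n").getBytes(StandardCharsets.US_ASCII));
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();   // stands in for a file
    writeRecord(sink, "retries", 3);
    writeRecord(sink, "timeout", 30);

    byte[] produced = sink.toByteArray();                       // snapshot of everything written
    System.out.print(new String(produced, StandardCharsets.US_ASCII));
    System.out.println(produced.length + " bytes captured");
  }
}
```

In production the same method would receive a stream from Files.newOutputStream; the test never needs a temp file.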

When to reach for byte streams

The honest answer: when the data isn't text. Anything binary — images, audio, video, archives (.zip, .tar), executables, protocol buffers, custom file formats — is bytes and stays bytes.

When the data is text, prefer the character-stream side (Reader/Writer, next chapter) or the modern Files.readString / Files.lines. Reading a text file as raw bytes and decoding by hand is the standard way to invent your own charset bug — UTF-8 multi-byte characters get split across read() calls and you reassemble them wrong. The Reader layer exists precisely so you don't have to think about that.
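The split-character bug is easy to reproduce with a deliberately tiny buffer. In the sketch below, decodeChunkByChunk is the buggy hand-rolled approach and decodeWithReader lets an InputStreamReader carry partial characters across reads; both method names are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class SplitCharDemo {
  // Buggy on purpose: each chunk is decoded on its own, so a multi-byte character
  // that straddles a chunk boundary is decoded as two broken halves.
  static String decodeChunkByChunk(byte[] data, int chunkSize) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (InputStream in = new ByteArrayInputStream(data)) {
      byte[] buf = new byte[chunkSize];
      int n;
      while ((n = in.read(buf)) != -1) {
        sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
      }
    }
    return sb.toString();
  }

  // The Reader keeps decoder state between reads, so boundaries do not matter.
  static String decodeWithReader(byte[] data, int chunkSize) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
      char[] buf = new char[chunkSize];
      int n;
      while ((n = r.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    String text = "h\u00E9llo";                            // the e-acute takes two bytes in UTF-8
    byte[] data = text.getBytes(StandardCharsets.UTF_8);   // 6 bytes for 5 characters
    System.out.println(decodeChunkByChunk(data, 2).equals(text));   // false
    System.out.println(decodeWithReader(data, 2).equals(text));     // true
  }
}
```

With a chunk size of 2 the two bytes of the accented character land in different chunks, and each half decodes to a replacement character instead of the letter.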

A worked example: copy, hash, and capture

The program below exercises the byte-stream API end to end. It writes a small binary file (a header plus some payload), reads it back chunk by chunk into a checksum, copies it to a second file with transferTo, and captures another copy into a ByteArrayOutputStream so you can see the in-memory sink in action. The temp files clean themselves up on exit.


import java.io.*;
import java.nio.file.*;
import java.util.zip.CRC32;

public class ByteStreamsDemo {
  public static void main(String[] args) throws IOException {
    Path src = Files.createTempFile("bytes-src-", ".bin");
    Path dst = Files.createTempFile("bytes-dst-", ".bin");
    src.toFile().deleteOnExit();
    dst.toFile().deleteOnExit();

    // --- 1. Write a small binary file: 4-byte header + 1000 bytes of payload ---
    try (OutputStream out = Files.newOutputStream(src)) {
      out.write(new byte[]{(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
      byte[] payload = new byte[1000];
      for (int i = 0; i < payload.length; i++) payload[i] = (byte) (i % 256);
      out.write(payload);
    }
    System.out.println("wrote " + Files.size(src) + " bytes");

    // --- 2. Read it back chunk by chunk, computing a CRC32 along the way ---
    CRC32 crc = new CRC32();
    try (InputStream in = Files.newInputStream(src)) {
      byte[] buf = new byte[256];                    // small on purpose: forces multiple reads
      int n;
      int chunks = 0;
      while ((n = in.read(buf)) != -1) {
        crc.update(buf, 0, n);                       // n, not buf.length — last chunk is short
        chunks++;
      }
      System.out.println("read in " + chunks + " chunks; crc32 = "
          + Long.toHexString(crc.getValue()));
    }

    // --- 3. Copy src -> dst with transferTo (no manual loop) ---
    try (InputStream  in  = Files.newInputStream(src);
         OutputStream out = Files.newOutputStream(dst)) {
      long copied = in.transferTo(out);
      System.out.println("transferTo copied " + copied + " bytes");
    }
    System.out.println("dst size = " + Files.size(dst));

    // --- 4. Capture an in-memory copy with ByteArrayOutputStream ---
    byte[] all;
    try (InputStream in = Files.newInputStream(src);
         ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
      in.transferTo(baos);
      all = baos.toByteArray();
    }
    System.out.printf("in-memory copy length = %d; first 4 bytes = %02X %02X %02X %02X%n",
        all.length, all[0], all[1], all[2], all[3]);

    // --- 5. End-of-stream sentinel: read() returns -1, never 255 ---
    try (InputStream in = new ByteArrayInputStream(new byte[]{(byte) 0xFF})) {
      int first  = in.read();                        // 255
      int second = in.read();                        // -1
      System.out.println("first read  = " + first  + "  (the byte 0xFF as unsigned int)");
      System.out.println("second read = " + second + "  (sentinel: end of stream)");
    }
  }
}

What to take from the run:

The write side used Files.newOutputStream — a Files-flavour factory that returns a plain OutputStream. Once you have it, the API is the same one Java's had since 1.0. The factory just spares you constructing FileOutputStream and worrying about open options.
The read loop passed n, not buf.length, to crc.update. The reason shows in the "read in N chunks" line of output: the buffer was 256 bytes and the file was 1004 bytes, so the last chunk was short. Using buf.length would have hashed stale bytes left in the buffer by the previous read.
in.transferTo(out) is the JDK's tested copy loop. Its default implementation is the same buffered read/write loop you would write by hand, and individual stream classes are free to override it with something faster, so you lose nothing by using it and it is one line instead of five. Reach for it any time you'd otherwise write a while ((n = in.read(buf)) != -1) loop with no other logic inside.
ByteArrayOutputStream plugged straight into transferTo. It looks like a file but lives in memory — the same API. That symmetry is what makes java.io testable: pass a ByteArrayInputStream for the source, a ByteArrayOutputStream for the sink, and you can unit-test code that "writes to a file" without touching the disk.
The final block printed 255 then -1. That's the contract: 0xFF is a legal byte value and reads back as 255; -1 is the out-of-band sentinel that says "no more bytes." Storing the return value in a byte (instead of an int) and comparing == -1 would silently treat a real 0xFF as end-of-stream. Always store the result in an int and compare to -1 before casting.
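The byte-versus-int mistake from the last point can be demonstrated in a few lines. Both counting methods below are hypothetical; the first narrows the result to a byte before the comparison, the second follows the contract.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SentinelBugDemo {
  // Buggy on purpose: after the cast, a real 0xFF byte equals -1 and ends the loop early.
  static int countBuggy(InputStream in) throws IOException {
    int count = 0;
    byte b;
    while ((b = (byte) in.read()) != -1) {
      count++;
    }
    return count;
  }

  // Correct: keep the int, compare to -1, and cast only when you use the value.
  static int countCorrect(InputStream in) throws IOException {
    int count = 0;
    int b;
    while ((b = in.read()) != -1) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {1, 2, (byte) 0xFF, 4, 5};
    System.out.println(countBuggy(new ByteArrayInputStream(data)));     // 2: stops at the 0xFF
    System.out.println(countCorrect(new ByteArrayInputStream(data)));   // 5
  }
}
```

The buggy version raises no exception and gives no warning; it just quietly truncates any input that happens to contain 0xFF, which binary data routinely does.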

What's next

Bytes are the right abstraction for binary data. The next chapter, Java Character Streams, covers the parallel hierarchy for text — Reader and Writer, charset bridging, and why "just new FileReader(path)" is the classic source of "works on my machine, broken on the server" bugs.

Practice

What does `InputStream.read()` return when the stream contains a single byte with value `0xFF`, and what does it return on the next call?

- `255` first (the byte `0xFF` as an unsigned int), then `-1` on the next call as the end-of-stream sentinel
- `-1` first (the byte `0xFF` as a signed byte cast to int), then `-1` again at end-of-stream
- `0xFF` first, then throws `EOFException` on the next call
- `255` first, then `0` to indicate the stream is empty

Practice

In the loop `while ((n = in.read(buf)) != -1) out.write(buf, 0, n);`, why pass `n` rather than `buf.length` to `write`?

- `read(byte[])` returns how many bytes were actually read, which is often fewer than `buf.length` on the final chunk; using `buf.length` would write stale bytes past the real data
- `buf.length` is illegal as the third argument to `write`
- They are always equal, so it makes no difference which you pass
- `n` is the offset into `buf`, and `buf.length` is the count
