Java DataInput and DataOutput Streams

Read and write Java primitive types in a portable binary format with DataInputStream and DataOutputStream.

So far in this part: bytes (raw or buffered) for arbitrary binary data, characters for text. There's a third use case the previous chapters don't cover — writing a Java int, double, or boolean to a file and reading it back as the same type, in a format another JVM (running on a different OS, with a different default byte order) will agree on.

This is what DataInputStream and DataOutputStream exist for. They're decorators that sit on top of any byte stream and add typed read/write methods: writeInt, writeDouble, writeUTF, readInt, readDouble, readUTF. The binary format is documented, fixed, big-endian, and portable across every JVM ever shipped.

What you write is what you read

DataOutputStream exposes a method per primitive type:

void writeBoolean(boolean v);    //  1 byte (0 or 1)
void writeByte(int v);            //  1 byte (low 8 bits)
void writeShort(int v);           //  2 bytes, big-endian
void writeChar(int v);            //  2 bytes, big-endian (UTF-16 code unit)
void writeInt(int v);             //  4 bytes, big-endian
void writeLong(long v);           //  8 bytes, big-endian
void writeFloat(float v);         //  4 bytes, IEEE 754
void writeDouble(double v);       //  8 bytes, IEEE 754
void writeUTF(String s);          //  modified UTF-8 with a 2-byte length prefix

DataInputStream has the matching readInt, readLong, readUTF, and so on. The contract is symmetric: write an int with writeInt, read it back with readInt, get the same number, every time, on every JVM, on every operating system.
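The symmetry can be checked without touching the file system. The following is a minimal sketch, assuming an in-memory ByteArrayOutputStream as the underlying stream; the class name RoundTripDemo and the sample values are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class RoundTripDemo {
    // Writes one value of several types into an in-memory byte array.
    static byte[] write() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(42);          // 4 bytes
            out.writeLong(1L << 40);   // 8 bytes
            out.writeDouble(3.14);     // 8 bytes
            out.writeBoolean(true);    // 1 byte
            out.writeUTF("alice");     // 2-byte length + 5 bytes
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order with the matching methods.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            long l = in.readLong();
            double d = in.readDouble();
            boolean b = in.readBoolean();
            String s = in.readUTF();
            return i + " " + l + " " + d + " " + b + " " + s;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write();
        System.out.println(data.length + " bytes");   // 28 bytes
        System.out.println(read(data));               // 42 1099511627776 3.14 true alice
    }
}
```

Swapping any two read calls in read() would still consume bytes without complaint, which is why the order has to match the writer exactly.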

Three things to internalise:

  1. The format has no field separators. A file with writeInt(42); writeUTF("alice"); writeDouble(3.14) is 4 + 2 + 5 + 8 = 19 bytes laid down with no markers between them. You must read in the same order with the same types. There is no schema, no self-description, no recovery if you guess wrong.

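  2. writeUTF is modified UTF-8. The prefix is an unsigned 16-bit length, so a string can occupy at most 65,535 encoded bytes (writeUTF throws UTFDataFormatException beyond that), and U+0000 is encoded as two bytes (0xC0 0x80) instead of the standard one byte. Characters outside the Basic Multilingual Plane are written as two three-byte surrogates rather than one four-byte sequence. The format is therefore incompatible with plain UTF-8: you can't read a writeUTF string with a Reader. Use it only when both sides are Java.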

  3. Big-endian, always. Native machine byte order varies (x86 is little-endian, network protocols are big-endian) but DataOutputStream writes big-endian unconditionally. That's what makes the format portable. If you need little-endian for a protocol you don't control, use java.nio.ByteBuffer instead — it has a settable byte order.
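The three points above can be inspected byte by byte. This is a small sketch, assuming in-memory streams; the class LayoutDemo and its hex helper are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class LayoutDemo {
    // Formats bytes as space-separated two-digit hex, e.g. "00 2A".
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // The three-field example from point 1: int, UTF string, double.
    static byte[] threeFields() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(42);
            out.writeUTF("alice");
            out.writeDouble(3.14);
        }
        return bytes.toByteArray();
    }

    // A string holding only the NUL character, encoded by writeUTF.
    static byte[] modifiedUtf8Nul() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("\u0000");
        }
        return bytes.toByteArray();
    }

    // The int 1 in little-endian order, via ByteBuffer's settable byte order.
    static byte[] littleEndianOne() {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(1).array();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = threeFields();
        System.out.println(data.length);   // 19
        System.out.println(hex(data));     // starts 00 00 00 2A 00 05 61 6C 69 63 65
        System.out.println(hex(modifiedUtf8Nul()));                          // 00 02 C0 80
        System.out.println(hex("\u0000".getBytes(StandardCharsets.UTF_8)));  // 00
        System.out.println(hex(littleEndianOne()));                          // 01 00 00 00
    }
}
```

The first four bytes, 00 00 00 2A, are the big-endian int 42 with the most significant byte first; nothing marks where the int ends and the string's length prefix begins.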

When to reach for data streams

Two cases:

  • You control both sides and want a simple, compact, platform-portable binary format: a "save file" for a small Java game, a fixture file for a unit test, a cache that doesn't need to outlive the JVM version. The format is straightforward to write and parse, and you don't pull in a serialization library.
  • You're reading a file format that happens to use the Java data-stream layout. Class files (.class), RandomAccessFile-formatted records, some .jar index files. These were all written with DataOutputStream because the JDK builds the format itself.

When you need cross-language interop (Python, Go, JS), reach for JSON, Protocol Buffers, or MessagePack instead. When you need versioning and schema evolution, ObjectOutputStream (next chapter) is closer — but it's heavier and has its own pitfalls.

The end-of-file rule

Where InputStream.read() returns -1 at end of stream, DataInputStream.readInt() (and friends) throws EOFException. There's no in-band sentinel — a legal int can be any 32-bit value, including -1, so the only way to signal end-of-stream is the exception.

try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(path)))) {
  try {
    while (true) {
      int x = in.readInt();
      process(x);
    }
  } catch (EOFException e) {
    // normal end of stream
  }
}

That try/catch for normal termination is the idiomatic shape. It's unusual for the JDK to make a control-flow signal out of an exception, but the typed-read API has no other option — there's no value to return that isn't also a valid int.
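To see why no sentinel value would work, the same loop can be run over data that contains -1. A minimal sketch, assuming in-memory streams; EofDemo and its helper methods are hypothetical names for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class EofDemo {
    // Writes the given ints back to back, with no count and no terminator.
    static byte[] write(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    // Reads ints until the stream runs out; EOFException ends the loop.
    static List<Integer> readAll(byte[] data) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                values.add(in.readInt());
            }
        } catch (EOFException e) {
            // normal end of stream
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(write(-1, 7, -1)));   // [-1, 7, -1]
    }
}
```

One caveat with this shape: readInt also throws EOFException when the stream ends partway through an int, so the loop cannot distinguish a clean end from a file that was cut off mid-value.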

For files where you control the format, the better pattern is to write a length prefix at the front:

out.writeInt(n);
for (int i = 0; i < n; i++) out.writeInt(values[i]);

Then the read side loops n times and never has to catch EOFException for control flow.
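A sketch of both sides of the count-prefix pattern, assuming an int array as the payload and in-memory streams; CountPrefixDemo is a hypothetical name, and the negative-count check is an added precaution rather than part of the format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class CountPrefixDemo {
    // Writes the element count first, then the elements.
    static byte[] write(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    // Reads the count, then exactly that many ints.
    static int[] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            if (n < 0) {
                // Guard against a corrupt or hostile prefix before allocating.
                throw new IOException("negative element count: " + n);
            }
            int[] values = new int[n];
            for (int i = 0; i < n; i++) values[i] = in.readInt();
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] back = read(write(new int[] {10, -1, 30}));
        System.out.println(java.util.Arrays.toString(back));   // [10, -1, 30]
    }
}
```

With this layout an EOFException on the read side is a real error, because it means the file holds fewer values than its prefix promised.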

Buffer before you decorate

DataInputStream does not buffer. Every readInt is forwarded to the underlying stream as one or more read() calls, and depending on the JDK version that can be one call per byte. If that underlying stream is a FileInputStream, a single readInt can cost up to four syscalls. Always wrap with BufferedInputStream first:

// Right
DataInputStream  in  = new DataInputStream(new BufferedInputStream(Files.newInputStream(path)));
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(path)));

That's the standard three-deep stack: file → buffered → data. The same order applies to writing. Skip the buffer and you pay the syscall-per-byte cost from the buffered-streams chapter, multiplied by the number of bytes per primitive.

A worked example: a tiny binary record format

The program below defines a minimal binary record — an int id, a UTF name, a double score, a boolean active — and writes a few records to a temp file with DataOutputStream. It reads them back with DataInputStream using both the count-prefix pattern and the EOFException pattern, and finally shows the format-mismatch failure mode where the reader and writer disagree on field types.
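A minimal sketch of a program matching that description follows. The class name RecordFormatDemo, the sample records (alice, bob, carol), and the choice to run the mismatch case against an in-memory buffer rather than the temp file are all assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class RecordFormatDemo {
    // One record: int id, UTF name, double score, boolean active.
    static final class Rec {
        final int id; final String name; final double score; final boolean active;
        Rec(int id, String name, double score, boolean active) {
            this.id = id; this.name = name; this.score = score; this.active = active;
        }
        @Override public String toString() {
            return id + ":" + name + ":" + score + ":" + active;
        }
    }

    static void writeRec(DataOutputStream out, Rec r) throws IOException {
        out.writeInt(r.id);
        out.writeUTF(r.name);
        out.writeDouble(r.score);
        out.writeBoolean(r.active);
    }

    // Same order and same types as writeRec; arguments evaluate left to right.
    static Rec readRec(DataInputStream in) throws IOException {
        return new Rec(in.readInt(), in.readUTF(), in.readDouble(), in.readBoolean());
    }

    static void writeFile(Path path, List<Rec> recs) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeInt(recs.size());             // count prefix
            for (Rec r : recs) writeRec(out, r);
        }
    }

    static DataInputStream open(Path path) throws IOException {
        return new DataInputStream(new BufferedInputStream(Files.newInputStream(path)));
    }

    // Pattern 1: trust the count prefix.
    static List<Rec> readByCount(Path path) throws IOException {
        List<Rec> recs = new ArrayList<>();
        try (DataInputStream in = open(path)) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) recs.add(readRec(in));
        }
        return recs;
    }

    // Pattern 2: discard the count and read until EOFException.
    static List<Rec> readUntilEof(Path path) throws IOException {
        List<Rec> recs = new ArrayList<>();
        try (DataInputStream in = open(path)) {
            in.readInt();
            while (true) recs.add(readRec(in));
        } catch (EOFException e) {
            // normal end of stream
        }
        return recs;
    }

    // Mismatch: the writer emits two ints, the reader asks for one long.
    static long mismatch() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(42);
            out.writeInt(99);
        }
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readLong();   // succeeds: eight bytes are eight bytes
        }
    }

    public static void main(String[] args) throws IOException {
        List<Rec> recs = List.of(new Rec(1, "alice", 91.5, true),
                                 new Rec(2, "bob", 78.0, false),
                                 new Rec(3, "carol", 84.25, true));
        Path path = Files.createTempFile("records", ".bin");
        try {
            writeFile(path, recs);
            System.out.println("size = " + Files.size(path) + " bytes");
            System.out.println("by count:  " + readByCount(path));
            System.out.println("until EOF: " + readUntilEof(path));
            System.out.println("mismatch:  " + mismatch());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

With these sample records the file is 62 bytes: 4 for the count, then 20, 18, and 20 for the three records. The mismatch read returns 180388626531, which is 42 shifted left 32 bits plus 99.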

What to take from the run:

  • The file size came out to exactly the bytes you'd predict by adding up the typed widths: 4 (count) + per-record (4 + length-prefixed UTF + 8 + 1). No padding, no separators. A data-stream file is the bytes laid down, nothing else.
  • Both read patterns produced the same three records. The count-prefix pattern is the better one when you're designing the format; the EOFException pattern is what you fall back to when you can't change the writer and the format is open-ended.
  • The format-mismatch block wrote two ints and read one long. The bytes on disk (00 00 00 2A 00 00 00 63) were valid for either interpretation, and DataInputStream has no way to tell which one was intended. Both reads succeed byte for byte, yet the long that comes back is a value the writer never meant. That's the cost of a schema-free binary format: discipline at the boundary is the only protection.
  • Every stream was wrapped Files.newInputStream → BufferedInputStream → DataInputStream (and the same on the write side). Skip the buffer and readInt becomes four syscalls; the data-stream layer is purely format conversion and adds no buffering of its own.
  • writeUTF was used for the name. The format is fine for inter-Java communication and useless for anything else — don't pick it for a config file you might one day read in Python. For "Java only and I want it small," it's the right tool; for "anyone else might read this," go to JSON or Protobuf.

What's next

Data streams handle one primitive at a time and require the reader to know the format. The next chapter, Java PrintWriter, goes back to the character side and covers the Writer decorator that adds print, println, and printf — the API you've been using on System.out since chapter 1, finally as the file-writer it always was.

Practice

A file was written by `DataOutputStream` on a Linux x86 server (little-endian native byte order) with `out.writeInt(1)`. What does `DataInputStream.readInt()` return on a Windows ARM laptop reading the same file?