Java DataInput and DataOutput Streams
Read and write Java primitive types in a portable binary format with DataInputStream and DataOutputStream.
So far in this part: bytes (raw or buffered) for arbitrary binary data, characters for text. There's a third use case the previous chapters don't cover: writing a Java int, double, or boolean to a file and reading it back as the same type, in a format another JVM (running on a different OS, on hardware with a different native byte order) will agree on.
This is what DataInputStream and DataOutputStream exist for. They're decorators that sit on top of any byte stream and add typed read/write methods: writeInt, writeDouble, writeUTF, readInt, readDouble, readUTF. The binary format is documented, fixed, big-endian, and portable across every JVM ever shipped.
What you write is what you read
DataOutputStream exposes one write method per primitive type, plus writeUTF for strings:
```java
void writeBoolean(boolean v); // 1 byte (0 or 1)
void writeByte(int v);        // 1 byte (low 8 bits)
void writeShort(int v);       // 2 bytes, big-endian (low 16 bits)
void writeChar(int v);        // 2 bytes, big-endian (UTF-16 code unit)
void writeInt(int v);         // 4 bytes, big-endian
void writeLong(long v);       // 8 bytes, big-endian
void writeFloat(float v);     // 4 bytes, IEEE 754 bits, big-endian
void writeDouble(double v);   // 8 bytes, IEEE 754 bits, big-endian
void writeUTF(String s);      // modified UTF-8 with a 2-byte length prefix
```
DataInputStream has the matching readBoolean, readInt, readLong, readUTF, and so on. The contract is symmetric: write an int with writeInt, read it back with readInt, and you get the same number, every time, on every JVM, on every operating system.
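To see the widths and the symmetric contract in one place, the sketch below writes one value of each type into an in-memory `ByteArrayOutputStream` and reads the same bytes back in the same order. The class name `TypedRoundTrip` and the sample values are made up for illustration; the stream classes and methods are the standard `java.io` ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class TypedRoundTrip {

    // Writes one value of each type and returns the raw bytes.
    static byte[] writeAll() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(true);     // 1 byte
            out.writeShort(-2);         // 2 bytes
            out.writeChar('J');         // 2 bytes
            out.writeInt(42);           // 4 bytes
            out.writeLong(1L << 40);    // 8 bytes
            out.writeFloat(1.5f);       // 4 bytes
            out.writeDouble(Math.PI);   // 8 bytes
            out.writeUTF("h\u00e9llo"); // 2-byte length + 6 bytes (the e-acute takes 2)
        }
        return bytes.toByteArray();
    }

    // Reads the values back with the matching methods, in exactly the same order.
    static String readAll(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readBoolean() + " " + in.readShort() + " " + in.readChar() + " "
                    + in.readInt() + " " + in.readLong() + " " + in.readFloat() + " "
                    + in.readDouble() + " " + in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeAll();
        System.out.println(data.length + " bytes"); // 37 bytes
        System.out.println(readAll(data));
    }
}
```

The byte count is exactly the sum of the per-type widths from the list: 29 bytes of primitives plus 8 for the string.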
Three things to internalise:
- The format has no field separators. A file written with `writeInt(42); writeUTF("alice"); writeDouble(3.14)` is 4 + 2 + 5 + 8 = 19 bytes (the 2 is `writeUTF`'s length prefix) laid down with no markers between them. You must read in the same order with the same types. There is no schema, no self-description, and no recovery if you guess wrong.
- `writeUTF` is modified UTF-8. The prefix is an unsigned 16-bit length, so a string can encode to at most 65,535 bytes (longer ones make `writeUTF` throw `UTFDataFormatException`). `U+0000` is encoded as two bytes (`0xC0 0x80`) instead of the standard one, and characters outside the Basic Multilingual Plane are written as two 3-byte surrogates instead of one 4-byte sequence. Between the length prefix and those differences, the result is not plain UTF-8: you can't read a `writeUTF` string with a `Reader`. Use it only when both sides are Java.
- Big-endian, always. Native machine byte order varies (x86 and most ARM systems run little-endian, while network byte order is big-endian), but `DataOutputStream` writes big-endian unconditionally. That's what makes the format portable. If you need little-endian for a protocol you don't control, use `java.nio.ByteBuffer` instead: it has a settable byte order.
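All three points can be checked directly by dumping bytes. The sketch below is illustrative only: `DataFormatFacts` and its helper methods are hypothetical names, and the little-endian comparison uses `ByteBuffer.order(ByteOrder.LITTLE_ENDIAN)` as the alternative mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class DataFormatFacts {

    // Formats bytes as space-separated upper-case hex, e.g. "00 2A".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Point 1: no separators, so the size is just the sum of the field widths.
    static int recordSize() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(42);
            out.writeUTF("alice");
            out.writeDouble(3.14);
        }
        return bytes.size();
    }

    // Point 2: modified UTF-8 adds a 2-byte length and encodes U+0000 as C0 80.
    static String modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return hex(bytes.toByteArray());
    }

    // Point 3: DataOutputStream is always big-endian...
    static String bigEndianInt(int v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(v);
        }
        return hex(bytes.toByteArray());
    }

    // ...while ByteBuffer lets you pick the byte order.
    static String littleEndianInt(int v) {
        return hex(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(recordSize());                                   // 19
        System.out.println(modifiedUtf8("a\0b"));                           // 00 04 61 C0 80 62
        System.out.println(hex("a\0b".getBytes(StandardCharsets.UTF_8)));   // 61 00 62
        System.out.println(bigEndianInt(1));                                // 00 00 00 01
        System.out.println(littleEndianInt(1));                             // 01 00 00 00
    }
}
```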
When to reach for data streams
Two cases:
- You control both sides and want a simple, compact, platform-portable binary format. A "save file" for a small Java game, a fixture file for a unit test, a cache you can throw away whenever its record layout changes. The format is straightforward to write and parse, and you don't pull in a serialization library.
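- You're reading a file format that happens to use the Java data-stream layout. The class-file format (`.class`) is big-endian and stores its strings as modified UTF-8 with a 2-byte length, which is exactly what `DataInput` expects. `RandomAccessFile` implements `DataInput` and `DataOutput` itself, so records written with its `writeInt`/`writeUTF` methods read back cleanly through `DataInputStream`.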
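Because `RandomAccessFile` implements `DataOutput`, a record it writes is byte-for-byte what `DataOutputStream` would produce. A minimal sketch, assuming a throwaway temp file and a hypothetical `(int id, UTF name)` record; the class name is illustrative.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

final class RandomAccessInterop {

    // Writes the record with RandomAccessFile, then reads it back with DataInputStream.
    static String writeWithRafReadWithStream() throws IOException {
        Path tmp = Files.createTempFile("raf-record", ".bin");
        try {
            try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
                raf.writeInt(7);      // DataOutput.writeInt: 4 bytes, big-endian
                raf.writeUTF("bob");  // DataOutput.writeUTF: 2-byte length + modified UTF-8
            }
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(tmp)))) {
                int id = in.readInt();
                String name = in.readUTF();
                return id + ":" + name;
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeWithRafReadWithStream()); // 7:bob
    }
}
```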
When you need cross-language interop (Python, Go, JS), reach for JSON, Protocol Buffers, or MessagePack instead. When you need versioning and schema evolution, ObjectOutputStream is closer, but it's heavier and has its own pitfalls.
The end-of-file rule
Where InputStream.read() returns -1 at end of stream, DataInputStream.readInt() (and friends) throws EOFException. There's no in-band sentinel — a legal int can be any 32-bit value, including -1, so the only way to signal end-of-stream is the exception.
```java
try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(path)))) {
    try {
        while (true) {
            int x = in.readInt();
            process(x);
        }
    } catch (EOFException e) {
        // normal end of stream
    }
}
```
That try/catch for normal termination is the idiomatic shape. It's unusual for the JDK to make a control-flow signal out of an exception, but the typed-read API has no other option: there's no value to return that isn't also a valid int.
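One consequence of that design: `readInt` throws `EOFException` whenever fewer than four bytes remain, so the loop cannot tell a clean end from a file whose last value was cut short. The sketch below shows it in memory; `EofLoop` is a hypothetical name, and the two trailing zero bytes stand in for an interrupted write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EofLoop {

    // The idiomatic "read until EOFException" loop.
    static List<Integer> readAllInts(byte[] data) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                values.add(in.readInt());
            }
        } catch (EOFException e) {
            // clean end OR a truncated final int: the exception looks the same
        }
        return values;
    }

    static byte[] ints(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] clean = ints(1, -1, 3);                            // -1 is an ordinary value here
        byte[] truncated = Arrays.copyOf(clean, clean.length + 2); // two stray bytes at the end
        System.out.println(readAllInts(clean));     // [1, -1, 3]
        System.out.println(readAllInts(truncated)); // [1, -1, 3]: the stray bytes vanish silently
    }
}
```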
For files where you control the format, the better pattern is to write a length prefix at the front:
```java
out.writeInt(n);
for (int i = 0; i < n; i++) out.writeInt(values[i]);
```
Then the read side loops n times and never has to catch EOFException for control flow.
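Filling in both sides of the count-prefix pattern gives a reader where an `EOFException` is no longer normal: it can only mean the file is shorter than its own header claims. A sketch under those assumptions, with `CountPrefixed` as a hypothetical class name and byte arrays standing in for files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

final class CountPrefixed {

    // Writer: the element count first, then exactly that many ints.
    static byte[] write(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    // Reader: loops exactly n times; an EOFException here signals a damaged file.
    static int[] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            if (n < 0) throw new IOException("negative count: " + n);
            int[] values = new int[n];
            for (int i = 0; i < n; i++) values[i] = in.readInt();
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] back = read(write(new int[] {10, 20, 30}));
        System.out.println(Arrays.toString(back)); // [10, 20, 30]
    }
}
```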
Buffer before you decorate
DataInputStream does not buffer. Every readInt turns into reads on the underlying stream: depending on the JDK version, either one single-byte read() per byte or one bulk read per value. If that underlying stream goes straight to a file (a FileInputStream, or what Files.newInputStream returns), that's up to four syscalls per readInt and at least one per primitive. Always wrap with BufferedInputStream first:
```java
// Right: file -> buffered -> data
DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(path)));
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(path)));
```
That's the standard three-deep stack: file → buffered → data. The same order applies to writing. Skip the buffer and you pay the per-call syscall cost from the buffered-streams chapter on every primitive you read or write.
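A complete version of the stack, as a sketch: `BufferedDataStack` and `writeAndSum` are hypothetical names, and the temp file is only for illustration. The detail worth noticing is that try-with-resources closes the outermost `DataOutputStream`, which flushes the `BufferedOutputStream` underneath; forget to close (or flush) and the tail of the data can stay in the buffer instead of reaching the file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class BufferedDataStack {

    // Writes the doubles 1..n through file -> buffered -> data, then sums them back.
    static double writeAndSum(Path path, int n) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            for (int i = 1; i <= n; i++) out.writeDouble(i);
        } // closing the outer stream flushes the buffer and closes the file

        double sum = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            for (int i = 0; i < n; i++) sum += in.readDouble();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doubles", ".bin");
        try {
            System.out.println(writeAndSum(tmp, 1000)); // 500500.0
            System.out.println(Files.size(tmp));        // 8000
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```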
A worked example: a tiny binary record format
The program below defines a minimal binary record — an int id, a UTF name, a double score, a boolean active — and writes a few records to a temp file with DataOutputStream. It reads them back with DataInputStream using both the count-prefix pattern and the EOFException pattern, and finally shows the format-mismatch failure mode where the reader and writer disagree on field types.
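A minimal sketch of such a program, covering the write and both read patterns (the format mismatch is a separate, smaller demonstration). It assumes Java 16+ for the `record` syntax; `RecordFile`, `Player`, and the sample rows are hypothetical, chosen only to match the field list above.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class RecordFile {

    // Hypothetical record layout: int id, UTF name, double score, boolean active.
    record Player(int id, String name, double score, boolean active) {}

    static void writeAll(Path path, List<Player> players) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeInt(players.size()); // count prefix
            for (Player p : players) {
                out.writeInt(p.id());
                out.writeUTF(p.name());
                out.writeDouble(p.score());
                out.writeBoolean(p.active());
            }
        }
    }

    // Same order, same types as writeAll (arguments are evaluated left to right).
    static Player readOne(DataInputStream in) throws IOException {
        return new Player(in.readInt(), in.readUTF(), in.readDouble(), in.readBoolean());
    }

    // Pattern 1: trust the count prefix.
    static List<Player> readCounted(Path path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            int n = in.readInt();
            List<Player> players = new ArrayList<>(n);
            for (int i = 0; i < n; i++) players.add(readOne(in));
            return players;
        }
    }

    // Pattern 2: skip the count and read records until EOFException.
    static List<Player> readUntilEof(Path path) throws IOException {
        List<Player> players = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            in.readInt(); // ignore the count
            while (true) players.add(readOne(in));
        } catch (EOFException e) {
            // end of stream
        }
        return players;
    }

    public static void main(String[] args) throws IOException {
        List<Player> players = List.of(
                new Player(1, "alice", 91.5, true),
                new Player(2, "bob", 78.25, false),
                new Player(3, "carol", 88.0, true));
        Path tmp = Files.createTempFile("players", ".bin");
        try {
            writeAll(tmp, players);
            // 4 (count) + per record: 4 + (2 + name bytes) + 8 + 1; the names are ASCII
            long predicted = 4;
            for (Player p : players) predicted += 4 + 2 + p.name().length() + 8 + 1;
            System.out.println(Files.size(tmp) + " bytes, predicted " + predicted); // 62 bytes, predicted 62
            System.out.println(readCounted(tmp).equals(players));                  // true
            System.out.println(readUntilEof(tmp).equals(players));                 // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```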
What to take from the run:
- The file size came out to exactly the bytes you'd predict by adding up the typed widths: 4 (count) + per-record (4 + length-prefixed UTF + 8 + 1). No padding, no separators. A data-stream file is the bytes laid down, nothing else.
- Both read patterns produced the same three records. The count-prefix pattern is the better one when you're designing the format; the EOFException pattern is what you fall back to when you can't change the writer and the format is open-ended.
- The format-mismatch block wrote two `int`s and read one `long`. The bytes on disk (`00 00 00 2A 00 00 00 63`) were valid for either interpretation, and `DataInputStream` has no way to tell which one was meant. Both readings decode the same eight bytes without complaint; only one of them matches what the writer intended. That's the cost of a schema-free binary format: discipline at the boundary is the only protection.
- Every stream was wrapped `Files.newInputStream` → `BufferedInputStream` → `DataInputStream` (and the same on the write side). The data-stream layer is purely format conversion and adds no buffering of its own, so skipping the buffer puts syscalls on every primitive you read.
- `writeUTF` was used for the name. The format is fine for Java-to-Java communication and a poor fit for anything else: don't pick it for a config file you might one day read in Python. For "Java only and I want it small," it's the right tool; for "anyone else might read this," go to JSON or Protobuf.
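The mismatch is easy to reproduce in memory. In this sketch (class and method names are hypothetical) the writer emits the ints 42 and 99, and a reader that wrongly expects a `long` gets `42 × 2^32 + 99` with no exception at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class FormatMismatch {

    static byte[] writeTwoInts(int a, int b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(a);
            out.writeInt(b);
        }
        return bytes.toByteArray();
    }

    // The reader wrongly believes the data holds one long. Nothing fails.
    static long readAsLong(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeTwoInts(42, 99);                // 00 00 00 2A 00 00 00 63
        long wrong = readAsLong(data);
        System.out.println(wrong);                         // 180388626531
        System.out.println(wrong == ((42L << 32) | 99));   // true
    }
}
```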
What's next
Data streams handle one primitive at a time and require the reader to know the format. The next chapter, Java PrintWriter, goes back to the character side and covers the Writer decorator that adds print, println, and printf: the same methods you've been calling on System.out (a PrintStream) since chapter 1, now writing characters to a file or any other Writer.
Practice
A file was written by `DataOutputStream` on a Linux x86 server (little-endian native byte order) with `out.writeInt(1)`. What does `DataInputStream.readInt()` return on a Windows ARM laptop reading the same file?