W3docs

Java Deserialization

Deserialize Java objects from bytes with ObjectInputStream and understand the security pitfalls of deserialization.

Deserialization is the mirror of the previous chapter: given a stream of bytes produced by ObjectOutputStream, reconstruct the object graph. The API is ObjectInputStream.readObject(), and the mechanism is — for "trusted bytes" — almost as simple as the write side. The complication is that deserialization is the part of the serialization design with the well-publicised security problem; the second half of this chapter is about that.

try (ObjectInputStream in = new ObjectInputStream(
         new BufferedInputStream(Files.newInputStream(path)))) {
  User u = (User) in.readObject();                   // throws ClassNotFoundException, IOException
}

That's the minimal recipe. The reader parses the bytes, resolves each class by name through its class loader, allocates instances without calling the constructors of the Serializable classes, fills in the fields directly, and returns the root of the graph typed as Object. You cast it to the type you expect.
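To try the recipe without touching the file system, the sketch below writes an object to an in-memory buffer and reads it back. The RoundTripDemo class, its nested User class, and the helper names toBytes and fromBytes are illustrative assumptions, not part of the chapter's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class RoundTripDemo {

  // Hypothetical example class; any Serializable class would do.
  static class User implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    User(String name, int age) {
      this.name = name;
      this.age = age;
    }
  }

  // Write side: serialize one object graph into a byte array.
  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  // Read side: readObject() is typed as Object, so the caller narrows the result.
  static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] bytes = toBytes(new User("Ada", 36));
    User copy = (User) fromBytes(bytes);
    System.out.println(copy.name + " " + copy.age);   // prints "Ada 36"
  }
}
```

The byte array stands in for the file used in the snippet above; swapping ByteArrayInputStream for Files.newInputStream(path) gives the file-based version.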

What readObject returns

It returns the root object of the graph the writer wrote. The static return type is Object — the reader can't know the type at compile time — so a cast is part of the idiom:

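Object raw = in.readObject();
if (raw instanceof User u) {                         // pattern match (Java 16+), recommended
  process(u);
} else {
  // raw is null if the writer serialized null, so don't call getClass() on it blindly
  String actual = (raw == null) ? "null" : raw.getClass().getName();
  throw new IOException("expected User, got " + actual);
}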

That instanceof check (or an explicit getClass() check) is the only place in normal code where you can verify the stream contained what you thought it would. Skip it and a crafted stream can hand you a different type; your code then throws a ClassCastException at the cast, with nothing in the message to tell you why.
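If the same check is needed in several places, it can be wrapped in a small helper. The following is a minimal sketch; the TypedReadDemo class and the readAs and toBytes names are hypothetical, and the choice of InvalidObjectException for the mismatch is one reasonable option rather than a convention the JDK mandates.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class TypedReadDemo {

  // Read one object and check its runtime type before returning it.
  // A null root is rejected too, because isInstance(null) is false.
  static <T> T readAs(ObjectInputStream in, Class<T> expected)
      throws IOException, ClassNotFoundException {
    Object raw = in.readObject();
    if (!expected.isInstance(raw)) {
      String actual = (raw == null) ? "null" : raw.getClass().getName();
      throw new InvalidObjectException(
          "expected " + expected.getName() + ", got " + actual);
    }
    return expected.cast(raw);
  }

  // Convenience wrapper for in-memory bytes.
  static <T> T readAs(byte[] bytes, Class<T> expected)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return readAs(in, expected);
    }
  }

  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    byte[] bytes = toBytes("hello");
    System.out.println(readAs(bytes, String.class));   // prints "hello"
    try {
      readAs(bytes, Integer.class);
    } catch (InvalidObjectException e) {
      // prints "expected java.lang.Integer, got java.lang.String"
      System.out.println(e.getMessage());
    }
  }
}
```

Note that the helper only checks the type of the root after the whole graph has been built. It is a correctness check, not a defence against hostile streams.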

Two checked exceptions

readObject declares two:

  • ClassNotFoundException — the stream named a class (com.example.User) that the reader's class loader can't find. You wrote User to disk; the reader's classpath doesn't include User; the deserializer can't reconstruct it.
  • IOException — anything else: truncated stream, wrong magic header, schema mismatch (InvalidClassException), stream corruption (StreamCorruptedException).

The schema-mismatch case is the common one. InvalidClassException is thrown when the reader's version of the class has a different serialVersionUID than the one in the stream. That usually happens because the class declares no explicit serialVersionUID, so the default computed from the class's structure changed when the class evolved between write and read, or because a declared UID was changed, deliberately or by accident. The message names the class and both UIDs; that's how you debug it.
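Because the specific failures are all subclasses of IOException, a reader that wants to tell them apart has to catch the narrow types first. The sketch below is one way to do that; the StreamFailureDemo class, the classify method, and the result labels are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;
import java.util.Arrays;

final class StreamFailureDemo {

  // Try to read one object and report which failure mode, if any, occurred.
  static String classify(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      in.readObject();
      return "ok";
    } catch (ClassNotFoundException e) {
      return "missing class";          // class named in the stream is not on the classpath
    } catch (InvalidClassException e) {
      return "schema mismatch";        // e.g. serialVersionUID differs
    } catch (StreamCorruptedException e) {
      return "corrupt stream";         // e.g. wrong magic header
    } catch (EOFException e) {
      return "truncated stream";       // bytes ended before the object did
    } catch (IOException e) {
      return "other I/O failure";      // must come last: the types above are subclasses
    }
  }

  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] good = toBytes("hello");

    byte[] badHeader = good.clone();
    badHeader[0] = 0x00;                                       // break the 0xACED magic number

    byte[] truncated = Arrays.copyOf(good, good.length - 1);   // drop the last byte

    System.out.println(classify(good));        // prints "ok"
    System.out.println(classify(badHeader));   // prints "corrupt stream"
    System.out.println(classify(truncated));   // prints "truncated stream"
  }
}
```

The header check happens in the ObjectInputStream constructor, which is why the constructor call sits inside the same try as readObject().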

Constructors don't run

This is the bit that surprises everyone: deserialization does not call your class's constructors. The JDK allocates a raw instance of the class, then fills in the fields directly from the bytes. The only constructor that runs is the no-argument constructor of the nearest non-Serializable superclass, which is often just Object. Any invariants you established in your own constructor (required-non-null fields, integer-in-range checks, idempotent initialisation) are silently bypassed.

class User implements Serializable {
  private static final long serialVersionUID = 1L;
  String name;
  int age;
  User(String name, int age) {
    if (age < 0) throw new IllegalArgumentException("age >= 0");   // never runs on read
    this.name = name;
    this.age = age;
  }
}

Hand-craft a byte stream where age = -1, run readObject, and you'll get a User with age == -1. The constructor was skipped. If you need a class invariant to survive deserialization, you have to add a readObject hook:

private void readObject(ObjectInputStream in)
    throws IOException, ClassNotFoundException {
  in.defaultReadObject();                            // do the normal field-by-field read
  if (age < 0) throw new InvalidObjectException("age must be >= 0");
}

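The signature is exact where it matters: the method must be private, return void, be named readObject, and take a single ObjectInputStream parameter. The throws clause shown is the conventional one. It's a private method the JDK looks up by reflection; there's no interface to declare, and a method that doesn't match is silently ignored. If you write it correctly, it runs in place of the default field read for that class, and the check after defaultReadObject() gives you a clean failure on bad data.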
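The difference the hook makes can be seen by round-tripping the same bad value through a class with the hook and a class without it. This is a minimal sketch: InvariantDemo, PlainUser, and GuardedUser are hypothetical names, and assigning the package-private field directly stands in for a hand-crafted byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class InvariantDemo {

  // Validates only in the constructor, so the check never runs on read.
  static class PlainUser implements Serializable {
    private static final long serialVersionUID = 1L;
    int age;

    PlainUser(int age) {
      if (age < 0) throw new IllegalArgumentException("age must be >= 0");
      this.age = age;
    }
  }

  // Same class plus a readObject hook that repeats the check on read.
  static class GuardedUser implements Serializable {
    private static final long serialVersionUID = 1L;
    int age;

    GuardedUser(int age) {
      if (age < 0) throw new IllegalArgumentException("age must be >= 0");
      this.age = age;
    }

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      if (age < 0) throw new InvalidObjectException("age must be >= 0");
    }
  }

  static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    PlainUser plain = new PlainUser(1);
    plain.age = -1;                                   // simulate a crafted stream
    PlainUser back = (PlainUser) roundTrip(plain);
    System.out.println("plain age = " + back.age);    // prints "plain age = -1"

    GuardedUser guarded = new GuardedUser(1);
    guarded.age = -1;
    try {
      roundTrip(guarded);
    } catch (InvalidObjectException e) {
      System.out.println("rejected: " + e.getMessage());   // prints "rejected: age must be >= 0"
    }
  }
}
```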

transient fields after the read

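transient fields aren't in the stream, so the reader leaves them at their default values: null for references, 0 for numerics, false for booleans. static fields aren't in the stream either; they belong to the class rather than the instance, so deserialization doesn't touch them at all. The reconstructed object has those defaults for its transient fields, which is the rule from the serialization chapter, stated from the read side.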

For caches, that's fine. For required fields you marked transient to avoid persisting (a Connection, a worker Thread, a derived Map), the deserialized instance is in an "incomplete" state until you finish initialising it. The readObject hook is the place to do that:

private void readObject(ObjectInputStream in)
    throws IOException, ClassNotFoundException {
  in.defaultReadObject();
  this.cache = new ConcurrentHashMap<>();            // rebuild the transient
}

Same hook, different reason — the previous section used it for validation; this one uses it for initialisation.
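A side-by-side comparison makes the "incomplete state" concrete. In the sketch below (TransientDemo, BareSession, and RebuiltSession are hypothetical names), both classes initialise the transient cache with a field initializer, but only the one with the hook gets a usable cache back after the read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TransientDemo {

  // No hook: the field initializer is part of construction, so it does not run on read.
  static class BareSession implements Serializable {
    private static final long serialVersionUID = 1L;
    final String user;
    transient Map<String, String> cache = new ConcurrentHashMap<>();

    BareSession(String user) {
      this.user = user;
    }
  }

  // With hook: the transient field is rebuilt after the default read.
  static class RebuiltSession implements Serializable {
    private static final long serialVersionUID = 1L;
    final String user;
    transient Map<String, String> cache = new ConcurrentHashMap<>();

    RebuiltSession(String user) {
      this.user = user;
    }

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      cache = new ConcurrentHashMap<>();
    }
  }

  static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    BareSession bare = (BareSession) roundTrip(new BareSession("ada"));
    System.out.println("bare cache = " + bare.cache);          // prints "bare cache = null"

    RebuiltSession rebuilt = (RebuiltSession) roundTrip(new RebuiltSession("ada"));
    System.out.println("rebuilt cache = " + rebuilt.cache);    // prints "rebuilt cache = {}"
  }
}
```

One design consequence: a transient field that the hook reassigns cannot be declared final, since the hook is an ordinary method rather than a constructor.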

The security problem

Here is the warning that drives modern Java's stance on this whole API: deserialization can execute arbitrary code.

The reason: deserialization is "instantiate any class the bytes name, then run its readObject hook." Many classes in the JDK and on a typical classpath have readObject hooks that do consequential things: initialise a thread, open a file, build an object graph that triggers side effects via hashCode/equals. A carefully crafted stream can chain such readObject calls together (a "gadget chain") so that, on the right classpath, the chain ends with Runtime.getRuntime().exec(...).

This isn't theoretical. The 2015 Apache Commons Collections RCE, the WebSphere/JBoss/Jenkins/WebLogic vulnerabilities disclosed from late 2015 onward, and most of the "Java deserialization" CVEs since are this exact pattern: the attacker gives you bytes; you call readObject on them; their gadget chain runs in your process.

The rule that came out of all of this:

Never call readObject on bytes you do not fully control.

"Fully control" means: you wrote them, on the same machine, into a file or pipe nobody else can touch. The moment the bytes cross any kind of trust boundary — a network socket, a user upload, a queue message — ObjectInputStream is the wrong tool. Use JSON or Protocol Buffers; those formats don't instantiate classes by name.
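The key point, that code in the stream's classes runs before your own code sees the result, can be shown with a harmless stand-in. In this sketch the SideEffectDemo, Expected, and Surprise names are hypothetical, and the hook only flips a flag; a real gadget would do something far less benign at the same moment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SideEffectDemo {

  // Flipped by the hook below; stands in for whatever a real gadget would do.
  static volatile boolean hookRan = false;

  // The type the reader believes it is receiving.
  static class Expected implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "ok";
  }

  // A different Serializable class that happens to be on the classpath.
  static class Surprise implements Serializable {
    private static final long serialVersionUID = 1L;

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      hookRan = true;                       // runs during readObject(), before any cast
    }
  }

  // A reader that casts straight to the type it expects.
  static String readExpected(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      Expected value = (Expected) in.readObject();
      return "read " + value.name;
    } catch (ClassCastException e) {
      return "ClassCastException";          // too late: the hook has already run
    } catch (IOException | ClassNotFoundException e) {
      return "failed: " + e;
    }
  }

  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    String outcome = readExpected(toBytes(new Surprise()));
    System.out.println(outcome + ", hookRan = " + hookRan);
    // prints "ClassCastException, hookRan = true"
  }
}
```

This is why a cast or instanceof check on the result is not a security control: by the time it fails, every hook in the graph has already executed.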

ObjectInputFilter: the partial mitigation

Java 9 added ObjectInputFilter, a hook that lets you reject classes during deserialization. Set a process-wide filter at startup and any class outside the allowlist raises InvalidClassException before its readObject hook runs:

ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
    "com.example.*;java.util.*;!*"                   // allow these packages; reject everything else
);
ObjectInputFilter.Config.setSerialFilter(filter);

This narrows the attack surface — a gadget that needs a class outside the allowlist can't trigger. It does not make deserialization safe; gadgets exist inside java.util.*, and the allowlist has to include classes you didn't write. Use it as defence in depth, not as a primary control. The primary control is still "don't deserialize untrusted bytes."
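Besides the process-wide setting, a filter can also be attached to a single stream with ObjectInputStream.setObjectInputFilter (Java 9 and later). The sketch below uses a hand-written lambda instead of a pattern string so that the allowlist is explicit; FilterDemo, Allowed, Unexpected, and readFiltered are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class FilterDemo {

  static class Allowed implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;

    Allowed(String label) {
      this.label = label;
    }
  }

  static class Unexpected implements Serializable {
    private static final long serialVersionUID = 1L;
  }

  // Allowlist: accept Allowed and String, reject every other class.
  static final ObjectInputFilter ALLOWLIST = info -> {
    Class<?> c = info.serialClass();
    if (c == null) {
      return ObjectInputFilter.Status.UNDECIDED;   // checks that carry no class
    }
    boolean allowed = (c == Allowed.class || c == String.class);
    return allowed ? ObjectInputFilter.Status.ALLOWED : ObjectInputFilter.Status.REJECTED;
  };

  static Object readFiltered(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      in.setObjectInputFilter(ALLOWLIST);          // set before the first readObject()
      return in.readObject();
    }
  }

  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    Allowed ok = (Allowed) readFiltered(toBytes(new Allowed("fine")));
    System.out.println("accepted: " + ok.label);   // prints "accepted: fine"
    try {
      readFiltered(toBytes(new Unexpected()));
    } catch (InvalidClassException e) {
      System.out.println("rejected by filter");    // the class never gets instantiated
    }
  }
}
```

A per-stream filter keeps the allowlist next to the code that knows which types it expects, which tends to allow a much narrower list than a process-wide pattern.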

For new code, the answer remains JSON.

A worked example: round-trip, evolution, and a failure

The program below extends the example from the serialization chapter by reading the bytes back. It deserializes the Department/Employee graph, verifies the back-references reconnected, demonstrates the transient field coming back as null, and finishes with the version-mismatch failure mode: a stream written with one serialVersionUID and read by a class with a different one.

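The listing that follows is not the chapter's original program; it is a minimal sketch written to match the description above. The WorkedExampleSketch class, the field names, and the sample data are assumptions. The version mismatch is simulated by patching the serialVersionUID bytes inside the stream (the hypothetical asWrittenWithOldUid helper), since a single program cannot load two versions of the same class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WorkedExampleSketch {

  static class Department implements Serializable {
    private static final long serialVersionUID = 2L;   // pretend an older release used 1L
    final String name;
    final List<Employee> employees = new ArrayList<>();

    Department(String name) {
      this.name = name;
    }
  }

  static class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final Department department;     // back-reference to the owning department
    transient String passwordHash;   // never written to the stream

    Employee(String name, Department department, String passwordHash) {
      this.name = name;
      this.department = department;
      this.passwordHash = passwordHash;
      department.employees.add(this);
    }
  }

  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  // Simulates a stream written by an older Department (serialVersionUID = 1) by
  // patching the 8-byte UID that follows the class name in the class descriptor.
  static byte[] asWrittenWithOldUid(byte[] bytes) {
    byte[] name = Department.class.getName().getBytes(StandardCharsets.UTF_8);
    byte[] copy = bytes.clone();
    for (int i = 0; i + name.length + 8 <= copy.length; i++) {
      if (Arrays.equals(Arrays.copyOfRange(copy, i, i + name.length), name)) {
        copy[i + name.length + 7] = 1;   // low byte of the big-endian long: 2 -> 1
        return copy;
      }
    }
    throw new IllegalStateException("class name not found in stream");
  }

  static Department sample() {
    Department eng = new Department("Engineering");
    new Employee("Ada", eng, "hash-a");
    new Employee("Linus", eng, "hash-b");
    new Employee("Grace", eng, "hash-c");
    return eng;
  }

  public static void main(String[] args) throws Exception {
    byte[] bytes = toBytes(sample());

    Object raw = fromBytes(bytes);
    if (!(raw instanceof Department)) {
      throw new IOException("expected Department, got " + raw);
    }
    Department d = (Department) raw;
    for (Employee e : d.employees) {
      System.out.println(e.name + " sameDepartment=" + (e.department == d)
          + " passwordHash=" + e.passwordHash);
    }

    try {
      fromBytes(asWrittenWithOldUid(bytes));
    } catch (InvalidClassException e) {
      System.out.println("version mismatch: " + e.getMessage());
    }
  }
}
```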

What to take from the run:

  • readObject() reconstructed the full Department graph in one call. The list of Employees came back populated, each Employee.department pointer was set correctly, and the back-reference (employee → same department instance) was preserved as object identity, not a copy. That last point is what makes serialization "graph-shaped" rather than "tree-shaped" — the JDK tracked which references it had seen and rewired them.
  • The instanceof Department d check was the gate that turned a raw Object into a typed Department. Without it, a stream containing a different type would have failed at the (Department) raw cast with ClassCastException — uglier and harder to diagnose. The instanceof form is the idiom.
  • All three passwordHash fields came back as null. Marking the field transient excluded it from the stream; the reader had no value to assign, so the field stayed at its default. That's the rule from the serialization chapter, confirmed here in the read direction.
  • The version-mismatch block produced the InvalidClassException you should expect: the stream said "UID = 1" and the class said "UID = 2," so the JDK refused to instantiate. The error message names both UIDs — that's how you find out which class drifted. Production-grade code declares serialVersionUID explicitly and bumps it only when the change is incompatible.
  • Nothing in this example called any Employee or Department constructor. The objects came into existence via reflection, fields filled in directly. Any constructor-time validation (if (salary < 0) throw ...) was bypassed; if you need it to run on the read side, that's what the private readObject hook is for. The Practice question at the bottom drills that point.

What's next

Serialization and deserialization closed out the streaming side of java.io — bytes, characters, and graphs of objects, all written as streams. The next chapter, Java NIO Overview, steps up to a different API family: java.nio and java.nio.file. NIO replaces some of java.io, complements the rest, and is the home of the modern Path and Files classes that the file-related chapters have been quietly using already.

Practice

A class invariant — 'salary must be greater than 0' — is enforced in the constructor of a `Serializable` class. An attacker hands your server a serialized byte stream where the salary field is encoded as -1. What happens when your code calls `readObject()`?