W3docs

Java Serialization

Serialize Java objects to bytes with the Serializable interface, ObjectOutputStream, and serialVersionUID.

The previous chapters covered streams of content: bytes, characters, primitives, lines. Serialization is one step up the ladder: a stream of objects. You call writeObject(someObject) and the JDK walks the entire reference graph from that object, encodes the non-transient instance fields of every reachable object as bytes, and writes the result to the stream. On the read side, readObject() reconstructs the graph.

That's a big claim with a big asterisk attached. Serialization works, has worked since Java 1.1, and you'll see it in old codebases (RMI, EJB, session replication, some caching layers). But the design has well-known problems — fragile versioning, security holes, tight coupling between persistence and class shape — and Oracle has been publicly trying to retire it for years. For new code, the answer is almost always JSON or Protocol Buffers. This chapter is here so you can read and maintain the code that already exists.

The mechanism

Three pieces:

  1. The Serializable marker interface. A class declares it can be serialized by implementing java.io.Serializable. The interface has no methods; it's a flag the JDK checks at runtime.
  2. ObjectOutputStream. A decorator that wraps any OutputStream and adds writeObject(Object). It's the engine that walks the graph and writes the bytes.
  3. ObjectInputStream (next chapter). The mirror that reads the bytes and reconstructs the graph.
class User implements Serializable {                 // the marker
  private static final long serialVersionUID = 1L;
  String name;
  int age;
  User(String name, int age) { this.name = name; this.age = age; }
}

try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
  out.writeObject(new User("alice", 30));            // the user is now on disk
}

That's the minimal recipe. The class implements Serializable; the writer is ObjectOutputStream; the call is writeObject. On the next read of that file (covered in the next chapter) you get a User instance back.
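The recipe above can be assembled into a standalone program. The following is a minimal sketch: the wrapper class MinimalWrite and the temp-file location are assumptions made only so the example runs on its own.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

class MinimalWrite {
  // Same shape as the User class above; nested and static so the file stands alone.
  static class User implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;
    User(String name, int age) { this.name = name; this.age = age; }
  }

  // Writes one User to a temp file (hypothetical location) and returns its path.
  static Path writeUser() throws IOException {
    Path path = Files.createTempFile("user", ".bin");
    try (OutputStream file = Files.newOutputStream(path);
         ObjectOutputStream out = new ObjectOutputStream(file)) {
      out.writeObject(new User("alice", 30));
    }
    return path;
  }

  public static void main(String[] args) throws IOException {
    Path path = writeUser();
    System.out.println("wrote " + Files.size(path) + " bytes to " + path);
    Files.delete(path); // clean up the temp file
  }
}
```

The nested class is declared static on purpose: a non-static inner class holds a hidden reference to its enclosing instance, which would be pulled into the stream too.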

What gets written

Everything reachable from the object, by default:

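  • Every non-transient, non-static field, read by reflection. Within each class the order is canonical rather than declaration order: primitive fields first, then reference fields, each group sorted by name.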
  • Recursively, every object those fields reference.
  • For each class involved, a descriptor (the class name, field types, and serialVersionUID) so the reader can validate the format.

The format is binary, self-describing (it carries class metadata), and not human-readable. It is also specific to the Java type system: the bytes encode class names, field names, type signatures, and inheritance hierarchies that mean nothing outside Java. This is the cardinal limitation: a User.bin file cannot be read by Python, Go, or JavaScript without a custom parser.
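One way to see the class metadata is to serialize into a byte array and search it for the names. This is a rough sketch: the Point class and the StreamContents wrapper are hypothetical, and decoding the bytes as ISO-8859-1 is only a convenient trick for finding ASCII text in binary data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class StreamContents {
  // Hypothetical class used only for this illustration.
  static class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    int x = 3;
    String label = "origin";
  }

  // Serializes any object into an in-memory byte array.
  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(o);
    }
    return buf.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = serialize(new Point());
    // ISO-8859-1 maps every byte to one char, so ASCII names survive intact.
    String text = new String(bytes, StandardCharsets.ISO_8859_1);
    System.out.println("class name in stream:  " + text.contains(Point.class.getName()));
    System.out.println("field name in stream:  " + text.contains("label"));
    System.out.println("field value in stream: " + text.contains("origin"));
  }
}
```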

transient: fields you don't want serialized

A field marked transient is skipped during serialization. The reader sees it as the default value for its type — null, 0, false. Use it for:

  • Caches that can be rebuilt: transient Map<String, Result> cache;
  • Fields that don't make sense across JVMs: transient Thread worker;, transient Connection db;
  • Sensitive data you don't want on disk: transient String password;
class Session implements Serializable {
  private static final long serialVersionUID = 1L;
  String userId;
  long createdAt;
  transient byte[] sessionToken;                     // never gets written
}

The deserialized Session will have sessionToken == null. Your code has to handle the field being absent after reconstruction.

Static fields are also skipped — static belongs to the class, not the instance, so it isn't part of the per-object state.
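The effect of transient is easiest to see with a round trip. The sketch below reuses the Session shape from above and reads the bytes back with ObjectInputStream (covered properly in the next chapter) purely to inspect the result; the TransientDemo wrapper and roundTrip helper are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
  static class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    String userId;
    long createdAt;
    transient byte[] sessionToken; // skipped on write
  }

  // Serializes to memory and immediately deserializes a copy.
  static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(s);
    }
    try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
      return (Session) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Session s = new Session();
    s.userId = "u-42";
    s.createdAt = 1000L;
    s.sessionToken = new byte[] {1, 2, 3};

    Session copy = roundTrip(s);
    System.out.println("userId survives:      " + copy.userId);
    System.out.println("sessionToken is null: " + (copy.sessionToken == null));
  }
}
```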

serialVersionUID: declare it explicitly

Every serializable class has a serialVersionUID — a 64-bit version number written into the stream and checked against the class on the read side. If they don't match, deserialization throws InvalidClassException.

You should always declare it:

private static final long serialVersionUID = 1L;

If you don't, the serialization runtime computes one by hashing the class shape: the class name, its interfaces, its fields, and the signatures of its non-private methods and constructors. Add a field, rename a public method, or change a method's return type, and the computed UID changes. Code that wrote User.bin with last week's class can't read it with this week's class. You won't catch this in unit tests because both sides see the same class. You will catch it in production when a user upgrades.

Declaring the UID explicitly puts you in control. Bump it manually only when you have made an incompatible change. (See Serializable Javadoc for the full evolution rules — they are intricate.)
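To check which UID a class actually carries, java.io.ObjectStreamClass can report it. A small sketch, with two hypothetical classes, one declaring the UID and one leaving it to be computed:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class UidDemo {
  // Declares its UID explicitly, as recommended above.
  static class Declared implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
  }

  // Declares nothing, so the runtime derives a UID from the class shape.
  static class Computed implements Serializable {
    String name;
  }

  // Returns the UID the serialization runtime would write for this class.
  static long uidOf(Class<?> c) {
    return ObjectStreamClass.lookup(c).getSerialVersionUID();
  }

  public static void main(String[] args) {
    System.out.println("declared: " + uidOf(Declared.class));
    // The computed value is a hash; it changes if the class shape changes.
    System.out.println("computed: " + uidOf(Computed.class));
  }
}
```

The JDK's serialver command-line tool prints the same value for a compiled class, which helps when a UID has to be pinned on a class that already has data on disk.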

What you can change between versions

The rules for "compatible" changes are surprisingly strict. Roughly:

  • Safe: adding new fields, removing transient/static fields, expanding access (private → public).
  • Unsafe: removing non-transient fields, changing a field's type, changing a class's serialVersionUID, changing the inheritance chain.

The point: the on-disk bytes are coupled to the shape of the class hierarchy, not just the data. Long-term storage formats need their own schema. Serialization is fine for short-lived caches and intra-JVM transport, brittle for anything that has to outlive a deploy.

The whole graph, including cycles

writeObject follows every reference. If User holds a Team and the Team holds a List<User> that includes the first User, the cycle is handled: the JDK tracks the identity of every object it writes and, when it encounters one a second time, writes a back-reference instead of recursing again. The reconstructed graph on the other side has the same identity relationships.
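The User and Team cycle described above can be sketched as follows. The field names, the CycleDemo wrapper, and the roundTrip helper are assumptions for illustration; the read side is used only to check identity in the reconstructed graph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class CycleDemo {
  static class Team implements Serializable {
    private static final long serialVersionUID = 1L;
    List<User> members = new ArrayList<>();
  }

  static class User implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    Team team; // User -> Team -> members -> User closes the cycle
    User(String name, Team team) { this.name = name; this.team = team; }
  }

  // Serializes to memory and immediately deserializes a copy.
  static User roundTrip(User u) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(u);
    }
    try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
      return (User) in.readObject();
    }
  }

  // True if the copy is a new object whose team still points back at that same copy.
  static boolean cycleSurvives() throws IOException, ClassNotFoundException {
    Team team = new Team();
    User alice = new User("alice", team);
    team.members.add(alice);
    User copy = roundTrip(alice);
    return copy != alice && copy.team.members.get(0) == copy;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("cycle preserved: " + cycleSurvives());
  }
}
```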

That's powerful and a footgun. A serializable object pulls in everything it can reach — and if any of those reachable objects isn't Serializable, the write fails with NotSerializableException naming the offending type. The fix is one of: implement Serializable on the offender, mark the field transient, or restructure the class to not hold the reference.
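A short sketch of the failure and of the transient fix. Settings, Broken, and Fixed are hypothetical classes invented for this illustration, and tryWrite is a helper that reports the outcome as a string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NotSerializableDemo {
  // Hypothetical type that does not implement Serializable.
  static class Settings {
    String theme = "dark";
  }

  // Holds a live reference to a non-serializable object: the write fails.
  static class Broken implements Serializable {
    private static final long serialVersionUID = 1L;
    Settings settings = new Settings();
  }

  // Same field marked transient: the reference is skipped and the write succeeds.
  static class Fixed implements Serializable {
    private static final long serialVersionUID = 1L;
    transient Settings settings = new Settings();
  }

  // Attempts an in-memory write and describes what happened.
  static String tryWrite(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return "ok";
    } catch (NotSerializableException e) {
      return "failed on " + e.getMessage(); // the message is the offending class name
    } catch (IOException e) {
      return "I/O error: " + e;
    }
  }

  public static void main(String[] args) {
    System.out.println("Broken: " + tryWrite(new Broken()));
    System.out.println("Fixed:  " + tryWrite(new Fixed()));
  }
}
```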

Security: never deserialize untrusted bytes

This is mostly a next-chapter topic, but the consequence shapes the write side too. Java's serialization format runs code on the reader during deserialization: the readObject and readResolve hooks of whatever classes the stream names, plus the no-arg constructor of the nearest non-serializable superclass. Crafted byte streams have been used for remote code execution against most major Java app servers. The rule that has emerged from years of CVEs:

Do not deserialize bytes from any source you do not fully control.

On the write side this means: don't design protocols where one party serializes data with ObjectOutputStream and another deserializes it with ObjectInputStream. Use JSON or Protocol Buffers across trust boundaries; reserve serialization for "same JVM, same class loader, same trust domain" use cases.

When to use serialization (and when not)

Reach for it when:

  • You need to checkpoint a graph of objects in the same JVM for restart recovery.
  • You're working with an existing framework (RMI, JMX, EJB, some session replication) that requires it.
  • You want a 10-line implementation for a "save game" file you can break compatibility on any time.

Don't reach for it when:

  • The format has to outlive a deploy. Use a schema-versioned format instead (JSON + a version field, Protobuf, Avro).
  • The data crosses a trust boundary. Use JSON or Protobuf.
  • Another language has to read or write the data. The Java serialization format is Java-only.

For most new code, writing JSON with Jackson's ObjectMapper.writeValueAsString(obj) is the better choice. It's schemaless-but-flexible, human-readable, and parseable from any language.

A worked example: writing a graph of records

The program below defines two simple serializable types, Department and Employee, with a back-reference (each Employee knows its Department, and each Department keeps a list of its Employees — a cycle). It writes the graph with ObjectOutputStream, dumps the byte count, and shows the NotSerializableException you get when a non-serializable field sneaks in. Reading the bytes back is the next chapter; here we focus on the write side.
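A minimal sketch of such a program follows. Only Department, Employee, BadEmployee, Settings, the variable eng, and the values "alice" and "hash-A" come from this chapter's description; the wrapper class GraphWriteDemo, the field names, the hire helper, and the other two employees are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class GraphWriteDemo {
  static class Department implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<Employee> employees = new ArrayList<>();
    Department(String name) { this.name = name; }

    // Creates an employee that points back at this department (the cycle).
    Employee hire(String employeeName, String passwordHash) {
      Employee e = new Employee(employeeName, this, passwordHash);
      employees.add(e);
      return e;
    }
  }

  static class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final Department department;   // back-reference to the owning department
    transient String passwordHash; // assumed field name; excluded from the stream
    Employee(String name, Department department, String passwordHash) {
      this.name = name;
      this.department = department;
      this.passwordHash = passwordHash;
    }
  }

  static class Settings { }        // deliberately not Serializable

  static class BadEmployee implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "mallory";
    Settings settings = new Settings(); // the stray non-serializable reference
  }

  static Department buildGraph() {
    Department eng = new Department("engineering");
    eng.hire("alice", "hash-A");
    eng.hire("bob", "hash-B");
    eng.hire("carol", "hash-C");
    return eng;
  }

  static byte[] serialize(Object root) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(root);
    }
    return buf.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Department eng = buildGraph();
    byte[] bytes = serialize(eng); // one call writes the whole graph
    System.out.println("bytes written: " + bytes.length);
    System.out.printf("header: %02X %02X %02X %02X%n",
        bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF, bytes[3] & 0xFF);

    // ISO-8859-1 keeps each byte as one char, so ASCII text is searchable.
    String text = new String(bytes, StandardCharsets.ISO_8859_1);
    System.out.println("contains \"alice\":  " + text.contains("alice"));
    System.out.println("contains \"hash-A\": " + text.contains("hash-A"));

    try {
      serialize(new BadEmployee());
    } catch (NotSerializableException e) {
      System.out.println("NotSerializableException: " + e.getMessage());
    }
  }
}
```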

What to take from the run:

  • One writeObject(eng) call serialized the Department, all three Employees, the back-references from Employee to Department, and the list inside Department. That's the headline feature of serialization: graphs, not records. Cycles handled, identity preserved, no manual walking.
  • The first four bytes were AC ED 00 05 — the Java serialization "magic number" and stream version. Every serialized file starts with these. If you see this header on a file you found in production, you're looking at ObjectOutputStream output.
  • The byte dump contained "alice" (a non-transient field) and did not contain "hash-A" (a transient field). Marking a field transient is the supported way to exclude it. Sensitive fields (passwords, tokens, session keys) should be marked transient.
  • The BadEmployee write threw NotSerializableException and the message named Settings, the exact non-serializable type. That's how you find offenders: try to write, read the exception, fix the named class (or mark the field transient). The check runs at write time on each object actually reached, not once per class at compile time, so one stray non-serializable reference is enough.
  • serialVersionUID = 1L was declared on every serializable class. The current run wouldn't notice if it were missing, but a future you who refactors the class and tries to load an old file with the new code would notice immediately. Declare it; bump it deliberately when you make an incompatible change.
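The header check from the second point can be turned into a small helper. This is a sketch: the class and method names are hypothetical, while STREAM_MAGIC and STREAM_VERSION are the constants defined in java.io.ObjectStreamConstants.

```java
import java.io.ObjectStreamConstants;

class MagicHeader {
  // Returns true if the data starts with AC ED 00 05, the serialization stream header.
  static boolean looksLikeJavaSerialization(byte[] data) {
    if (data.length < 4) {
      return false;
    }
    int magic = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
    int version = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
    // STREAM_MAGIC is a short (0xACED), so mask it before comparing as an int.
    return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
        && version == ObjectStreamConstants.STREAM_VERSION;
  }

  public static void main(String[] args) {
    byte[] serialized = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73};
    byte[] json = "{\"name\":\"alice\"}".getBytes();
    System.out.println(looksLikeJavaSerialization(serialized));
    System.out.println(looksLikeJavaSerialization(json));
  }
}
```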

What's next

This chapter covered writing — Serializable, ObjectOutputStream, the graph traversal, the format. Reading and reconstructing the graph is the mirror operation with its own set of pitfalls (the security one being the biggest). That's the next chapter, Java Deserialization.

Practice

An `Employee` class has a `transient String sessionToken` field. The token is `'abc123'` at serialization time. After deserialization in a new JVM, what is the value of `sessionToken` on the reconstructed object?