W3docs

Java LinkedHashSet

Use LinkedHashSet in Java to maintain insertion order while keeping HashSet's near-constant-time operations.

Java LinkedHashSet

LinkedHashSet<E> is HashSet<E> with one extra promise: when you iterate, you get elements in the order you first inserted them. The hash-table machinery is identical — same buckets, same load factor, same near-constant-time add, remove, contains — but each entry carries two extra pointers (before, after) that thread the entries into a doubly-linked list as they're added. Iteration walks that list, not the bucket array.

If you want hash-set performance and deterministic, predictable iteration order, LinkedHashSet is the answer. It's almost a free upgrade for the cases where HashSet's unspecified order has bitten you.

The "first-insertion wins" rule

The order is fixed by the first time an element is inserted. Re-adding an existing element doesn't move it:

Set<String> s = new LinkedHashSet<>();
s.add(\"a\");
s.add(\"b\");
s.add(\"c\");
s.add(\"a\");   // already present — returns false, order unchanged
System.out.println(s);   // [a, b, c]

That makes it the right tool for "remember the order tags arrived" or "log unique events in time order." If you remove an element and re-add it, it goes to the end of the list — the position was tied to the current insertion, and the new one is the only one left.

The cost: pointers and pointers

The extra ordering machinery has a cost. Each entry stores not just (hash, key, next-in-bucket) like HashSet, but (hash, key, next-in-bucket, before, after). That's two extra references per element — roughly an extra 16 bytes on a 64-bit JVM. For a set of 10 million Longs, that's around 160 MB extra. For most application code it's nothing; for cache-sized data structures, it matters.

In exchange, you get O(1) on every operation (same as HashSet) plus a stable iteration order that doesn't depend on the load factor, the rehash, the hash spread, or the JVM version.

Iteration cost is proportional to size, not capacity

There's a subtle bonus over HashSet: walking a LinkedHashSet follows the linked list, so it visits exactly size entries. Iterating a HashSet walks every bucket, so it visits roughly capacity slots — including empty ones. For a sparsely populated set, that can be a significant difference. If you build a set, expand it well past the elements you'll keep, then iterate often, LinkedHashSet can actually iterate faster.

When to pick it

The decision flow:

  • Order doesn't matter, you just need fast membershipHashSet. Smaller, simpler.
  • You want insertion order rememberedLinkedHashSet. Same speed for add/contains, predictable iteration.
  • You want sorted orderTreeSet. Different algorithm, log-time operations.

The most common reason to pick LinkedHashSet is defensive: you're building a public API that returns a Set, and you don't want callers depending on HashSet's arbitrary order. A LinkedHashSet is the kindest thing you can return — it has the same contract as a Set, but iteration is reproducible across runs and JVMs, which makes user-visible output stable and tests easier to write.

A worked example: unique tags in arrival order

The program below builds two sets from the same stream of tag inputs: one with HashSet, one with LinkedHashSet. The HashSet iteration order depends on the JVM (it's stable-but-arbitrary for a given JVM); the LinkedHashSet order is exactly the order the unique elements first appeared. Then it shows the "remove and re-add" rule, and finally builds an order-preserving deduplicator that's two lines long.

java— editable, runs on the server

What to read out of the run:

  • The LinkedHashSet printed the unique events in the order they first appeared. The HashSet printed them in some other order entirely — whatever the bucket layout dictated.
  • Re-adding \"a\" did nothing to the order. Removing it and re-adding it moved it to the end. The first insertion is the one that anchors the position.
  • The order-preserving deduplicator is a one-liner once you know the trick: collect into a LinkedHashSet, then back to a list.
  • The 10-element scan through a 2 000 000-bucket LinkedHashSet walked exactly 10 entries; a HashSet of the same shape would have scanned every empty bucket in between.

What's next

The third standard Set implementation gives you something neither HashSet nor LinkedHashSet can: sorted iteration and the ability to ask range questions like "every tag between a and m." TreeSet is up next.

Practice

Practice

What does `LinkedHashSet` give you that plain `HashSet` doesn't?