Java HashSet
Use the hash-table-backed HashSet for fast unordered sets in Java.
Java HashSet
HashSet<E> is the implementation you reach for first when you want a set. It's backed by a hash table — internally, it's a HashMap with a dummy value — so add, remove, and contains are expected O(1): the cost is a hash of the element plus one or two equality checks, regardless of how many elements are already in the set. That's the property that makes hash sets the right answer for "have I seen this before?" questions, deduplication passes, and any membership check that would be quadratic against a List.
What "near-constant time" actually means
Constant-time isn't free; it's amortised. Every operation does roughly this:
- Compute
e.hashCode(). Mash the high and low bits together so a hash like0x...0000doesn't collapse into bucket 0. - Look up the bucket at
bucketIndex = hash & (table.length - 1). - Walk the bucket's linked chain (or, since Java 8, a small balanced tree if the chain got long) calling
equalsuntil you find the element or hit the end.
Step 3 is where the cost goes wrong if your hashCode is bad. With a sensible hash, the chain is one or two elements long; with a constant hash, it's every element you ever inserted. That's the difference between O(1) and O(n) per operation.
Capacity, load factor, and the rehash
A HashSet has a backing array of buckets. Two constructor parameters control it:
- Initial capacity — the starting bucket count. Defaults to 16. Rounded up to a power of two.
- Load factor — the ratio of elements to buckets at which the table doubles in size. Defaults to 0.75.
When size / capacity exceeds the load factor, the set rehashes: it allocates a new array twice as big and re-buckets every element. A rehash is O(n) — that's the cost that gets amortised across the O(1) inserts before it. Pre-sizing a set you know will hold ~1 000 000 elements saves you twenty doublings:
Set<Long> ids = new HashSet<>(1_500_000); // skip the doublings up to ~1MSmaller load factors (e.g. 0.5) waste memory but reduce collisions; larger ones (e.g. 0.9) pack tighter but make chains longer. The default 0.75 is a balance Sun calibrated decades ago and it still holds up — don't touch it without a benchmark.
Null, ordering, thread safety
Three rules:
- One
nullelement is allowed.HashSetstores it in bucket 0 with a special hash of 0. That's a deliberate convenience —Map.of/Set.ofandTreeSetboth forbidnull. - No iteration order is guaranteed. The order changes when the table rehashes and isn't even consistent across JVMs. If you need insertion order, use LinkedHashSet; if you need sorted order, use TreeSet.
- Not thread-safe. Concurrent mutation will corrupt the structure. For multi-threaded code use
ConcurrentHashMap.newKeySet()(aSetview of a concurrent map) or wrap inCollections.synchronizedSet.
hashCode is your responsibility
Putting your own class into a HashSet only works if you override hashCode and equals consistently. The contract from Object:
- If
a.equals(b)thena.hashCode() == b.hashCode(). - If
a.hashCode() == b.hashCode(),a.equals(b)may still be false (a collision).
Breaking the first half of the contract is the most common source of "I added it, but contains returns false" bugs. Modern IDEs and the record keyword generate both methods for you — use them.
record Tag(String name) {} // hashCode/equals auto-generated
Set<Tag> tags = new HashSet<>();
tags.add(new Tag(\"java\"));
System.out.println(tags.contains(new Tag(\"java\"))); // trueThe mutable-element trap
A subtler bug: storing a mutable object, then mutating it after insertion. The hash code that decided which bucket the element lives in was computed at insert time; the object is now in the "wrong" bucket and contains walks a chain that doesn't include it.
StringBuilder sb = new StringBuilder(\"a\");
Set<StringBuilder> set = new HashSet<>();
set.add(sb);
sb.append(\"b\");
set.contains(sb); // probably false — same object, different hashThe fix isn't to be clever; it's to put immutable elements in hash sets. String, Integer, your own records, freshly snapshotted DTOs. If you need a set keyed by some mutable state, key by an immutable projection of it.
A worked example: dedup, membership, and capacity
The program below demonstrates the four reasons you reach for a HashSet: deduplication, fast membership tests, set algebra, and the cost of a bad hashCode.
What to take away:
- The dedup loop is O(n) — every
addis constant-time, and the finalunique.size()is the number of distinct inputs. - A
containsin a 1 000 000-element set returned in microseconds. That's the property that makesHashSetthe membership-test tool of the JDK. - The
Tagrecord getsequals/hashCodefor free, so twoTag(\"java\")objects collapse to one element. - The
StringBuilderexample is the trap: same object, mutated after insertion, set now confused. Put immutable elements in hash sets.
What's next
HashSet doesn't promise any iteration order. If you need to remember the order you inserted elements in — say you're building a tag list and the user expects to see tags in the order they were added — the right tool is LinkedHashSet. That's the next chapter.
Practice
You insert your own class `Customer` into a `HashSet`, then look it up and `contains` returns `false` for a `Customer` that should be equal to one you inserted. What's the most likely cause?