Java HashSet Class | W3Docs Learn Java

HashSet<E> is the implementation you reach for first when you want a set. It's backed by a hash table (internally, a HashMap with a dummy value), so add, remove, and contains run in expected O(1) time: the cost is a hash of the element plus one or two equality checks, regardless of how many elements are already in the set. That's the property that makes hash sets the right answer for "have I seen this before?" questions, deduplication passes, and any membership check that would be linear against a List (and quadratic when repeated for every element).
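
A minimal sketch of the "have I seen this before?" pattern: Set.add returns false when the element is already present, so one call both tests and records membership. The names SeenBefore and firstDuplicate are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class SeenBefore {
    // Returns the first value that occurs a second time, or null if all values are distinct.
    static String firstDuplicate(String[] input) {
        Set<String> seen = new HashSet<>();
        for (String s : input) {
            // add() returns false when s was already in the set
            if (!seen.add(s)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] words = { "go", "java", "python", "java", "go" };
        System.out.println(firstDuplicate(words)); // java
    }
}
```

The same loop over an ArrayList would need a contains scan per element, which is where the quadratic cost comes from.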

What "near-constant time" actually means

Constant-time isn't free; it's amortised. Every operation does roughly this:

1. Compute e.hashCode(). Mash the high and low bits together so a hash like 0x...0000 doesn't collapse into bucket 0.
2. Look up the bucket at bucketIndex = hash & (table.length - 1).
3. Walk the bucket's linked chain (or, since Java 8, a small balanced tree if the chain got long) calling equals until you find the element or hit the end.
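
The first two steps can be sketched in a few lines. The spreading function used here, h ^ (h >>> 16), is what OpenJDK's HashMap has done since Java 8; treat it as an implementation detail rather than a documented guarantee. BucketIndexSketch, spread, and bucketIndex are illustrative names, not JDK API.

```java
class BucketIndexSketch {
    // Step 1: fold the high 16 bits into the low 16 so they influence the bucket index.
    static int spread(Object e) {
        int h = (e == null) ? 0 : e.hashCode();
        return h ^ (h >>> 16);
    }

    // Step 2: mask with (length - 1); this works because the table length is a power of two.
    static int bucketIndex(Object e, int tableLength) {
        return spread(e) & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16;
        // Integer.hashCode() is the int value itself; the low 4 bits of 0x10000 are all zero.
        Integer e = 0x10000;
        System.out.println(e.hashCode() & (tableLength - 1)); // 0 without spreading
        System.out.println(bucketIndex(e, tableLength));      // 1 with spreading
        System.out.println(bucketIndex(null, tableLength));   // 0: null hashes to 0
    }
}
```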

Step 3 is where the cost goes wrong if your hashCode is bad. With a sensible hash, the chain is one or two elements long; with a constant hash, it's every element you ever inserted. That's the difference between O(1) and O(n) per operation.
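
One way to see the difference without relying on noisy timings is to count equals calls. The sketch below uses a hypothetical Key class whose hashCode can be switched to a constant, and counts how many equals calls a single lookup of an absent element triggers. The exact counts depend on HashMap internals, so only the comparison between the two runs is meaningful.

```java
import java.util.HashSet;
import java.util.Set;

class HashQualityDemo {
    static int equalsCalls = 0;

    // Hypothetical element type; constantHash switches between a sensible and a degenerate hashCode.
    static final class Key {
        final int id;
        final boolean constantHash;
        Key(int id, boolean constantHash) { this.id = id; this.constantHash = constantHash; }
        @Override public boolean equals(Object o) {
            equalsCalls++; // count every comparison the set performs
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return constantHash ? 42 : Integer.hashCode(id); }
    }

    // Fills a set with n keys, then counts the equals calls made by one lookup of an absent key.
    static int equalsCallsForMiss(int n, boolean constantHash) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new Key(i, constantHash));
        equalsCalls = 0;
        set.contains(new Key(-1, constantHash));
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("sensible hash: " + equalsCallsForMiss(1_000, false));
        System.out.println("constant hash: " + equalsCallsForMiss(1_000, true));
    }
}
```

With the sensible hash the lookup touches at most a handful of elements; with the constant hash every element shares one bucket, so the lookup has to compare against far more of them.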

Capacity, load factor, and the rehash

A HashSet has a backing array of buckets. Two constructor parameters control it:

Initial capacity — the starting bucket count. Defaults to 16. Rounded up to a power of two.
Load factor — the ratio of elements to buckets at which the table doubles in size. Defaults to 0.75.

When size / capacity exceeds the load factor, the set rehashes: it allocates a new array twice as big and re-buckets every element. A rehash is O(n); that's the cost that gets amortised across the O(1) inserts before it. Pre-sizing a set you know will hold ~1 000 000 elements saves you the seventeen doublings it takes to grow from 16 buckets to 2 097 152:

Set<Long> ids = new HashSet<>(1_500_000); // skip the doublings up to ~1M
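
The arithmetic behind pre-sizing can be checked with a small model. This is a simplified sketch of the growth rule (double whenever capacity × load factor is below the element count), not the JDK's actual resize code; ResizeCount, doublings, and capacityFor are hypothetical helper names.

```java
class ResizeCount {
    // Counts how many times a table starting at initialCapacity must double
    // before 'elements' entries fit under the load-factor threshold.
    static int doublings(int initialCapacity, float loadFactor, int elements) {
        long capacity = initialCapacity;
        int count = 0;
        while (capacity * loadFactor < elements) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    // A constructor argument large enough that 'elements' entries never trigger a resize.
    static int capacityFor(int elements, float loadFactor) {
        return (int) Math.ceil(elements / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(doublings(16, 0.75f, 1_000_000));        // 17
        System.out.println(doublings(2_097_152, 0.75f, 1_000_000)); // 0
        System.out.println(capacityFor(1_000_000, 0.75f));          // 1333334
    }
}
```

Any initial capacity of at least 1 333 334 is rounded up to 2 097 152 buckets, which is why 1_500_000 is enough. On Java 19 and later, HashSet.newHashSet(int numElements) does this calculation for you.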

Smaller load factors (e.g. 0.5) waste memory but reduce collisions; larger ones (e.g. 0.9) pack tighter but make chains longer. The default 0.75 is a balance Sun calibrated decades ago and it still holds up — don't touch it without a benchmark.

Null, ordering, thread safety

Three rules:

One null element is allowed. HashSet stores it in bucket 0 with a special hash of 0. That's a deliberate convenience: Map.of/Set.of reject null outright, and a TreeSet with natural ordering throws NullPointerException when you add it.
No iteration order is guaranteed. The order changes when the table rehashes and isn't even consistent across JVMs. If you need insertion order, use LinkedHashSet; if you need sorted order, use TreeSet.
Not thread-safe. Unsynchronised concurrent mutation can corrupt the structure or silently lose updates. For multi-threaded code use ConcurrentHashMap.newKeySet() (a Set view of a concurrent map) or wrap in Collections.synchronizedSet.
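
The three rules can be exercised in one short program. This is a sketch that assumes Java 9 or later (for Set.of and List.of); SetRulesDemo and its helper methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class SetRulesDemo {
    // True if the given action throws NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Adds perThread distinct integers from each of 'threads' threads into a concurrent set.
    static int concurrentAdds(int threads, int perThread) {
        Set<Integer> shared = ConcurrentHashMap.newKeySet();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) shared.add(base + i);
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }

    public static void main(String[] args) {
        Set<String> hash = new HashSet<>();
        hash.add(null);
        System.out.println(hash.contains(null));                              // true
        System.out.println(throwsNpe(() -> Set.of((String) null)));           // true
        System.out.println(throwsNpe(() -> new TreeSet<String>().add(null))); // true (natural ordering)

        Set<String> ordered = new LinkedHashSet<>(List.of("c", "a", "b"));
        System.out.println(ordered);                                          // [c, a, b]

        System.out.println(concurrentAdds(4, 10_000));                        // 40000
    }
}
```

Swapping the concurrent set for a plain HashSet in concurrentAdds is the quickest way to see lost updates, though the failure is nondeterministic and may not show up on every run.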

`hashCode` is your responsibility

Putting your own class into a HashSet only works if you override hashCode and equals consistently. The contract from Object:

If a.equals(b) then a.hashCode() == b.hashCode().
If a.hashCode() == b.hashCode(), a.equals(b) may still be false (a collision).

Breaking the first half of the contract is the most common source of "I added it, but contains returns false" bugs. Modern IDEs and the record keyword generate both methods for you — use them.

record Tag(String name) {}            // hashCode/equals auto-generated
Set<Tag> tags = new HashSet<>();
tags.add(new Tag("java"));
System.out.println(tags.contains(new Tag("java"))); // true
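
For contrast, here is a sketch of what happens without the generated methods. BrokenCustomer overrides equals only, so it inherits Object's identity-based hashCode; Customer derives hashCode from the same field equals compares. Both classes are hypothetical examples written for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ContractDemo {
    // Broken: equals is overridden, hashCode is inherited from Object (identity-based).
    static final class BrokenCustomer {
        final String email;
        BrokenCustomer(String email) { this.email = email; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenCustomer && ((BrokenCustomer) o).email.equals(email);
        }
    }

    // Fixed: hashCode is derived from the same field that equals compares.
    static final class Customer {
        final String email;
        Customer(String email) { this.email = email; }
        @Override public boolean equals(Object o) {
            return o instanceof Customer && ((Customer) o).email.equals(email);
        }
        @Override public int hashCode() { return Objects.hash(email); }
    }

    static boolean fixedLookupWorks() {
        Set<Customer> set = new HashSet<>();
        set.add(new Customer("a@example.com"));
        return set.contains(new Customer("a@example.com"));
    }

    public static void main(String[] args) {
        Set<BrokenCustomer> broken = new HashSet<>();
        broken.add(new BrokenCustomer("a@example.com"));
        // Almost certainly false: the two instances have unrelated identity hashes.
        System.out.println(broken.contains(new BrokenCustomer("a@example.com")));
        System.out.println(fixedLookupWorks()); // true
    }
}
```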

The mutable-element trap

A subtler bug: storing an object whose hashCode depends on mutable fields, then mutating it after insertion. The hash that decided which bucket the element lives in was computed at insert time; once you change a field the hash relies on, the object is in the "wrong" bucket and contains walks a chain that doesn't include it — even though it's the very same reference.

class Box {
    int n;
    Box(int n) { this.n = n; }
    @Override public boolean equals(Object o) {
        return o instanceof Box b && b.n == n;
    }
    @Override public int hashCode() { return Integer.hashCode(n); }
}

Box box = new Box(1);
Set<Box> set = new HashSet<>();
set.add(box);
box.n = 2;                  // mutate a field hashCode depends on
System.out.println(set.contains(box)); // false — element is now in the wrong bucket

Note that this only bites when hashCode reads mutable state. StringBuilder, for example, uses identity hashing, so mutating it never moves it between buckets — but relying on that is fragile. The fix isn't to be clever; it's to put immutable elements in hash sets. String, Integer, your own records, freshly snapshotted DTOs. If you need a set keyed by some mutable state, key by an immutable projection of it.
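
A minimal sketch of the "immutable projection" idea, assuming a hypothetical BoxKey type that snapshots the field you care about at insertion time. On Java 16 and later a record would do the same job with less code; a final class is used here only to keep the sketch runnable on older versions.

```java
import java.util.HashSet;
import java.util.Set;

class ImmutableKeyDemo {
    // Mutable object we want to track; it deliberately has no equals/hashCode on mutable state.
    static final class Box {
        int n;
        Box(int n) { this.n = n; }
    }

    // Immutable projection used as the set element: the value is copied once and never changes.
    static final class BoxKey {
        final int n;
        BoxKey(Box box) { this.n = box.n; }
        @Override public boolean equals(Object o) {
            return o instanceof BoxKey && ((BoxKey) o).n == n;
        }
        @Override public int hashCode() { return Integer.hashCode(n); }
    }

    static boolean snapshotSurvivesMutation() {
        Box box = new Box(1);
        Set<BoxKey> keys = new HashSet<>();
        BoxKey snapshot = new BoxKey(box);
        keys.add(snapshot);
        box.n = 2; // mutating the source does not touch the snapshot
        return keys.contains(snapshot) && keys.contains(new BoxKey(new Box(1)));
    }

    public static void main(String[] args) {
        System.out.println(snapshotSurvivesMutation()); // true
    }
}
```

The trade-off is that the set now answers "was a box with n == 1 recorded?" rather than tracking the live object, which is usually what a membership check wants anyway.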

A worked example: dedup, membership, and capacity

The program below demonstrates four things covered above: deduplication, fast membership tests, records as elements, and the mutable-element trap.


import java.util.*;

public class HashSetShowcase {
  // Element whose hashCode depends on a mutable field -- the trap waiting to happen.
  static class Box {
    int n;
    Box(int n) { this.n = n; }
    @Override public boolean equals(Object o) { return o instanceof Box b && b.n == n; }
    @Override public int hashCode() { return Integer.hashCode(n); }
    @Override public String toString() { return "Box(" + n + ")"; }
  }

  public static void main(String[] args) {
    // --- 1. Deduplication of a stream of inputs ---
    String[] raw = { "java", "Java", "java", "python", "java", "go", "go" };
    Set<String> unique = new HashSet<>();
    for (String s : raw) unique.add(s);
    System.out.println("input size:  " + raw.length);
    System.out.println("unique size: " + unique.size());
    System.out.println("unique:      " + unique);

    // --- 2. Membership against a 1M-element set ---
    Set<Integer> big = new HashSet<>(1_500_000); // pre-sized
    for (int i = 0; i < 1_000_000; i++) big.add(i);
    long t0 = System.nanoTime();
    boolean found = big.contains(999_999);
    long t1 = System.nanoTime();
    System.out.println("\ncontains in 1M-set: " + found + "  (" + (t1 - t0) + " ns)");

    // --- 3. The records-as-elements pattern ---
    record Tag(String name) {}
    Set<Tag> tags = new HashSet<>();
    tags.add(new Tag("java"));
    tags.add(new Tag("java")); // duplicate by equals -> not added
    System.out.println("\ntag set: " + tags + "  size=" + tags.size());

    // --- 4. The mutable-element trap ---
    Box box = new Box(1);
    Set<Box> bad = new HashSet<>();
    bad.add(box);
    box.n = 2;                 // change a field hashCode depends on
    System.out.println("\nafter mutation, contains(box)? " + bad.contains(box)
        + "   (same object, but now in the wrong bucket)");
  }
}

What to take away:

The dedup loop is O(n): every add is expected constant-time, and the final unique.size() is the number of distinct inputs (4 here, because "java" and "Java" differ by case).
A contains on a 1 000 000-element set typically returns within microseconds, even as a single cold call timed with System.nanoTime(). That's the property that makes HashSet the JDK's go-to membership-test tool.
The Tag record gets equals/hashCode for free, so two Tag("java") objects collapse to one element.
The Box example is the trap: the same object, mutated after insertion so its hashCode changed, now reports contains(box) == false. Put immutable elements in hash sets.
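
Set algebra is another common reason to reach for a HashSet, and the program above does not exercise it. A minimal sketch using the standard bulk operations addAll, retainAll, and removeAll follows; SetAlgebra and its three helpers are illustrative names, and each helper copies its first argument so the inputs are left untouched.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetAlgebra {
    // Elements in a or b.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Elements in both a and b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Elements in a that are not in b.
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3);
        Set<Integer> b = Set.of(3, 4);
        // Wrapped in TreeSet only so the printed order is predictable.
        System.out.println(new TreeSet<>(union(a, b)));        // [1, 2, 3, 4]
        System.out.println(new TreeSet<>(intersection(a, b))); // [3]
        System.out.println(new TreeSet<>(difference(a, b)));   // [1, 2]
    }
}
```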

What's next

HashSet doesn't promise any iteration order. If you need to remember the order you inserted elements in — say you're building a tag list and the user expects to see tags in the order they were added — the right tool is LinkedHashSet. That's the next chapter.

Practice

You insert your own class `Customer` into a `HashSet`, then look it up and `contains` returns `false` for a `Customer` that should be equal to one you inserted. What's the most likely cause?

`Customer` overrides `equals` but not `hashCode` (or they're inconsistent), so the lookup hits a different bucket
`HashSet` only accepts elements that implement `Comparable`
The set rehashed during insertion and dropped the element
`HashSet.contains` uses reference equality, not `equals`