Java Set Interface
Collections of unique elements in Java with the Set interface and its core implementations.
Java Set Interface
Set<E> is Collection<E> with one rule on top: no duplicates. The interface itself adds almost no new methods — it inherits add, remove, contains, size, iterator, and the rest. What it changes is what those methods promise: add(e) returns false (and changes nothing) if an equal element is already in the set; two sets are equals if they contain the same elements regardless of order; and hashCode is the sum of element hashes so equal sets always agree.
That tiny contract — "elements are unique" — turns out to be exactly what you need for membership tests, deduplication, intersection/union, and a thousand other patterns that read poorly with a List.
What counts as a duplicate
A Set decides duplication by equals (and, in hash-based sets, hashCode as the bucketing function). It's not pointer equality, and it's not "looks the same when printed." If you put your own class into a Set you must override both methods together; the equals and hashCode chapter covered the contract.
Set<String> tags = new HashSet<>();
tags.add("java");
tags.add("java"); // add returns false; the set still has one element
tags.add(new String("java")); // also false — equals, not referenceA common trap: storing mutable elements and then mutating them while they live in a hash-based set. The element's hash changes, the set looks for it in the wrong bucket, and contains starts lying. The rule: if you put an object in a hash set, treat it as effectively immutable from that point on. TreeSet has the same trap with compareTo.
The four standard implementations
java.util ships four implementations of Set that cover almost every use case:
| Class | Backing structure | Ordering | Null elements | Typical use |
|---|---|---|---|---|
HashSet | hash table | none | one null allowed | the default — fastest membership tests |
LinkedHashSet | hash table + doubly-linked list | insertion order | one null allowed | when iteration order must match insert order |
TreeSet | red-black tree | natural / comparator order | no null | when you need sorted iteration or range queries |
EnumSet | bit vector | enum declaration order | no null | a Set<MyEnum> — tiny, fast, ordered |
The first three are covered in the next three chapters; EnumSet is covered later in the book alongside EnumMap.
The bulk operations: union, intersection, difference
A Set inherits four bulk operations from Collection, and on a Set they take on textbook set-theory meanings:
addAll(other)— union (in place). After the call,acontains every element of either side.retainAll(other)— intersection (in place). After the call,acontains only elements that were also inother.removeAll(other)— difference (in place). After the call,acontains only elements that were not inother.containsAll(other)— subset test. Returnstrueif every element ofotheris ina.
These mutate the receiver. If you need a non-destructive version, copy first: Set<E> u = new HashSet<>(a); u.addAll(b);.
Equality and hash codes of sets
The Set contract for equals is unusual: two sets are equal if they contain the same elements, regardless of order or implementation type. A HashSet, a LinkedHashSet, and a TreeSet of the same elements all compare equal to each other.
Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new TreeSet<>(List.of(3, 2, 1));
System.out.println(a.equals(b)); // trueThat's why the bulk operations and the immutable factory methods can move between implementations freely — only the rule "same elements" is observed.
Read-only and immutable factories
Since Java 9 the easiest way to make a small, fixed set is Set.of(...):
Set<String> primary = Set.of("red", "green", "blue");Set.of returns an immutable set that refuses null elements and throws if you pass a duplicate at construction time. For a defensive snapshot of an existing set, use Set.copyOf(existing) — also immutable, also rejects nulls.
For a view that hides mutations (the original can still be modified, but callers can't add through the view) use Collections.unmodifiableSet(s). The unmodifiable collections chapter later in this part covers when to pick which.
A worked example: deduplication, set algebra, and order
The program below uses all four implementations, demonstrates the bulk operations on real data, and shows how iteration order differs between HashSet, LinkedHashSet, and TreeSet.
What to take from the run:
addreturnedfalsefor every duplicate\"java\"— that's how you write a deduplicator in two lines.- Union, intersection, and difference are one
addAll/retainAll/removeAlleach — copy first if you don't want to lose the original. - The three implementations contain the same elements and compare equal, but iterate in very different orders. Pick the implementation by the order you need, not by reflex.
What's next
The most common Set implementation — and the one you reach for unless you have a reason not to — is hash-table-backed. HashSet is up next; we'll cover load factor, the rehash, and what "near-constant time" means in practice.
Practice
What does `set.add(x)` return when `x` is already an element of `set`, according to the `Set` contract?