W3docs

Java Set Interface

Collections of unique elements in Java with the Set interface and its core implementations.

Java Set Interface

Set<E> is Collection<E> with one rule on top: no duplicates. The interface itself adds almost no new methods — it inherits add, remove, contains, size, iterator, and the rest. What it changes is what those methods promise: add(e) returns false (and changes nothing) if an equal element is already in the set; two sets are equals if they contain the same elements regardless of order; and hashCode is the sum of element hashes so equal sets always agree.

That tiny contract — "elements are unique" — turns out to be exactly what you need for membership tests, deduplication, intersection/union, and a thousand other patterns that read poorly with a List.

What counts as a duplicate

A Set decides duplication by equals (and, in hash-based sets, hashCode as the bucketing function). It's not pointer equality, and it's not "looks the same when printed." If you put your own class into a Set you must override both methods together; the equals and hashCode chapter covered the contract.

Set<String> tags = new HashSet<>();
tags.add("java");
tags.add("java");      // add returns false; the set still has one element
tags.add(new String("java")); // also false — equals, not reference

A common trap: storing mutable elements and then mutating them while they live in a hash-based set. The element's hash changes, the set looks for it in the wrong bucket, and contains starts lying. The rule: if you put an object in a hash set, treat it as effectively immutable from that point on. TreeSet has the same trap with compareTo.

The four standard implementations

java.util ships four implementations of Set that cover almost every use case:

ClassBacking structureOrderingNull elementsTypical use
HashSethash tablenoneone null allowedthe default — fastest membership tests
LinkedHashSethash table + doubly-linked listinsertion orderone null allowedwhen iteration order must match insert order
TreeSetred-black treenatural / comparator orderno nullwhen you need sorted iteration or range queries
EnumSetbit vectorenum declaration orderno nulla Set<MyEnum> — tiny, fast, ordered

The first three are covered in the next three chapters; EnumSet is covered later in the book alongside EnumMap.

The bulk operations: union, intersection, difference

A Set inherits four bulk operations from Collection, and on a Set they take on textbook set-theory meanings:

  • addAll(other)union (in place). After the call, a contains every element of either side.
  • retainAll(other)intersection (in place). After the call, a contains only elements that were also in other.
  • removeAll(other)difference (in place). After the call, a contains only elements that were not in other.
  • containsAll(other)subset test. Returns true if every element of other is in a.

These mutate the receiver. If you need a non-destructive version, copy first: Set<E> u = new HashSet<>(a); u.addAll(b);.

Equality and hash codes of sets

The Set contract for equals is unusual: two sets are equal if they contain the same elements, regardless of order or implementation type. A HashSet, a LinkedHashSet, and a TreeSet of the same elements all compare equal to each other.

Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new TreeSet<>(List.of(3, 2, 1));
System.out.println(a.equals(b));   // true

That's why the bulk operations and the immutable factory methods can move between implementations freely — only the rule "same elements" is observed.

Read-only and immutable factories

Since Java 9 the easiest way to make a small, fixed set is Set.of(...):

Set<String> primary = Set.of("red", "green", "blue");

Set.of returns an immutable set that refuses null elements and throws if you pass a duplicate at construction time. For a defensive snapshot of an existing set, use Set.copyOf(existing) — also immutable, also rejects nulls.

For a view that hides mutations (the original can still be modified, but callers can't add through the view) use Collections.unmodifiableSet(s). The unmodifiable collections chapter later in this part covers when to pick which.

A worked example: deduplication, set algebra, and order

The program below uses all four implementations, demonstrates the bulk operations on real data, and shows how iteration order differs between HashSet, LinkedHashSet, and TreeSet.

java— editable, runs on the server

What to take from the run:

  • add returned false for every duplicate \"java\" — that's how you write a deduplicator in two lines.
  • Union, intersection, and difference are one addAll/retainAll/removeAll each — copy first if you don't want to lose the original.
  • The three implementations contain the same elements and compare equal, but iterate in very different orders. Pick the implementation by the order you need, not by reflex.

What's next

The most common Set implementation — and the one you reach for unless you have a reason not to — is hash-table-backed. HashSet is up next; we'll cover load factor, the rehash, and what "near-constant time" means in practice.

Practice

Practice

What does `set.add(x)` return when `x` is already an element of `set`, according to the `Set` contract?