W3docs

Searching Java Collections

Find elements in Java collections with contains, indexOf, binarySearch, and stream-based search.

Searching Java Collections

"Is this element in the collection?" sounds like one question, but Java answers it half a dozen different ways with different costs and different return types. Knowing which to use turns a 50-millisecond hot loop into a 50-microsecond one. This chapter is a tour of the search methods on collections, the maps that wrap them, and the static helpers in Collections.

The cost-aware mental model

Every search method's cost is determined by the underlying collection, not by the call site. Pick the right collection up front and your searches are free; pick the wrong one and no clever method call will save you.

Collectioncontains / lookupWhy
HashSet, LinkedHashSet, HashMap.keySet()O(1) expectedHash-bucket lookup
TreeSet, TreeMap.keySet()O(log n)Red-black tree
ArrayList, LinkedList, VectorO(n)Linear scan
Sorted ArrayList + Collections.binarySearchO(log n)Binary search on indexed list
LinkedList + Collections.binarySearchO(n)Binary search must index — O(n) per step

Two rules of thumb:

  1. If you contains a lot, use a Set. Building a HashSet from a List and querying it is almost always faster than list.contains in a loop.
  2. If the data is sorted and indexed, use Collections.binarySearch. It pays off after about 30 elements on most JVMs.

Collection.contains(o)

Every Collection has it. The semantics are equality-based:

boolean has = list.contains("alpha");        // uses .equals

For a List, this is a linear scan — O(n). For a HashSet, it's a hash-bucket lookup — O(1) expected. For a TreeSet, it's an O(log n) tree walk. The method signature is the same; the cost is not.

null is allowed (the method returns whether the collection contains any null element), unless the collection rejects null outright — TreeSet with natural ordering, EnumSet, ConcurrentHashMap.keySet().

List.indexOf and lastIndexOf

Lists support more than just "yes/no" — they return the position:

int firstA = list.indexOf("alpha");          // -1 if absent
int lastA  = list.lastIndexOf("alpha");

Linear scan from the front (or back). For an ArrayList<String> of a thousand elements, this is fine. For a million, build a Map<String, Integer> once and query it.

Map.containsKey, containsValue, get, getOrDefault

The map-specific search methods divide cleanly:

map.containsKey("alpha");                    // O(1) for HashMap, O(log n) for TreeMap
map.get("alpha");                             // returns the value or null
map.getOrDefault("alpha", 0);                 // returns the value or your default
map.containsValue("v");                       // O(n) — scans every entry

containsValue is the trap. It walks every entry every time. If you find yourself calling it more than once, build an inverse map (Map<V, K>) or a Set<V> of values once and query that.

getOrDefault is a small but important pattern shift: it replaces the old Integer n = map.get(k); if (n == null) n = 0; idiom with one line, and the default is only used when the key is absent — not when the value is null. (For "absent or null," use Objects.requireNonNullElse(map.get(k), 0).)

Collections.binarySearch

Binary search on a sorted list:

List<String> sorted = new ArrayList<>(...);
Collections.sort(sorted);
int hit  = Collections.binarySearch(sorted, "delta");      // 2  (some index)
int miss = Collections.binarySearch(sorted, "zeta");       // negative

Two preconditions:

  1. The list must be sorted in the order the search will use. If you sorted with a comparator, pass the same comparator to binarySearch. Mismatched orders produce nonsense (silently — no exception).
  2. The list should be indexed (ArrayList, not LinkedList). On a linked list, binary search is O(n log n), worse than linear scan.

The return value encodes both "found" and "where to insert":

int i = Collections.binarySearch(sorted, key);
if (i >= 0) {
  // key is at index i
} else {
  int insertAt = -i - 1;
  sorted.add(insertAt, key);             // keeps the list sorted
}

That -i - 1 arithmetic is the way every "find or insert" routine in the JDK handles a miss. It's worth memorising.

Collections.frequency and disjoint

Two helpers that wrap common search patterns:

int n = Collections.frequency(coll, "alpha");        // how many times "alpha" appears
boolean none = Collections.disjoint(a, b);           // no element of a is in b

frequency is O(n). For repeated queries with different targets, count once with a stream into a Map<T, Long> instead.

disjoint is implemented cleverly: it iterates the smaller collection and checks contains on the larger if the larger is a Set, swapping arguments under the hood. So Collections.disjoint(largeList, smallSet) is O(largeList) — and faster than rolling your own.

Streams handle "find the first matching element" with findFirst / findAny, and "is there any match" with anyMatch / allMatch / noneMatch:

Optional<Person> match = people.stream()
    .filter(p -> p.age() >= 18 && p.name().startsWith("A"))
    .findFirst();

boolean any = people.stream().anyMatch(p -> p.age() >= 65);
boolean all = people.stream().allMatch(p -> p.age() >= 0);
boolean non = people.stream().noneMatch(p -> p.age() < 0);

Streams short-circuit on findFirst and anyMatch — they stop as soon as a match is found. They're the cleanest answer for predicate-based search. They are not faster than contains for equality search on the right data structure — a HashSet.contains will always beat stream().anyMatch(x -> x.equals(target)).

Optional<T> deserves its own attention (it has a chapter in the functional programming part). For now: findFirst().isPresent() is the cleanest "did we find anything?" expression for a predicate.

LinkedHashSet for "contains and order"

A common pattern: you need fast contains and insertion-order iteration. LinkedHashSet is the answer:

LinkedHashSet<String> seen = new LinkedHashSet<>();
for (String line : input) {
  if (seen.add(line)) System.out.println(line);    // print first occurrences only
}

add returns true only the first time. The set rejects duplicates in O(1) and preserves insertion order for iteration. That's the right tool for "deduplicate while keeping order" — neither HashSet (loses order) nor ArrayList (slow contains) is as good.

A worked example: comparing five search strategies on the same data

The program below seeds 100 000 strings into different collections and times five lookup strategies for 1 000 random hits: ArrayList.contains, HashSet.contains, TreeSet.contains, Collections.binarySearch on a sorted list, and stream().anyMatch.

java— editable, runs on the server

What to take from the run:

  • HashSet.contains and Collections.binarySearch on the sorted ArrayList are dramatically faster than ArrayList.contains for repeated lookups. The hash table wins for "any equality," the binary search wins when the data must be kept sorted for other reasons too.
  • TreeSet.contains is close behind, but not free — every lookup walks a tree of depth ~log₂(100 000) ≈ 17, with cache misses for tree pointers.
  • stream().anyMatch for equality search is the worst option here: same O(n) as list.contains but with extra allocation overhead per query. Use it for predicates, not for plain equality on a list.
  • The "missing key" call returned a negative value, and -i - 1 gave the index where "zzz" would be inserted to keep the list sorted. That's the same convention TreeMap.subMap and Arrays.binarySearch use.

What's next

You've now covered iteration, ordering, sorting, and searching — the four mechanical operations the collections framework exists to make easy. The final chapter in this part is the modern story for the one thing none of them touched: immutability. Java Unmodifiable Collections covers List.of, Set.of, Map.of, and the Collections.unmodifiable* wrappers — when each one is the right choice, and why a "defensive copy" pattern that used to span four lines now fits in one.

Practice

Practice

You call `Collections.binarySearch(sortedList, key)` and the result is `-5`. What index would `key` need to be inserted at to keep the list sorted?