Searching Java Collections
Find elements in Java collections with contains, indexOf, binarySearch, and stream-based search.
Searching Java Collections
"Is this element in the collection?" sounds like one question, but Java answers it half a dozen different ways with different costs and different return types. Knowing which to use turns a 50-millisecond hot loop into a 50-microsecond one. This chapter is a tour of the search methods on collections, the maps that wrap them, and the static helpers in Collections.
The cost-aware mental model
Every search method's cost is determined by the underlying collection, not by the call site. Pick the right collection up front and your searches are free; pick the wrong one and no clever method call will save you.
| Collection | contains / lookup | Why |
|---|---|---|
HashSet, LinkedHashSet, HashMap.keySet() | O(1) expected | Hash-bucket lookup |
TreeSet, TreeMap.keySet() | O(log n) | Red-black tree |
ArrayList, LinkedList, Vector | O(n) | Linear scan |
Sorted ArrayList + Collections.binarySearch | O(log n) | Binary search on indexed list |
LinkedList + Collections.binarySearch | O(n) | Binary search must index — O(n) per step |
Two rules of thumb:
- If you
containsa lot, use aSet. Building aHashSetfrom aListand querying it is almost always faster thanlist.containsin a loop. - If the data is sorted and indexed, use
Collections.binarySearch. It pays off after about 30 elements on most JVMs.
Collection.contains(o)
Every Collection has it. The semantics are equality-based:
boolean has = list.contains("alpha"); // uses .equalsFor a List, this is a linear scan — O(n). For a HashSet, it's a hash-bucket lookup — O(1) expected. For a TreeSet, it's an O(log n) tree walk. The method signature is the same; the cost is not.
null is allowed (the method returns whether the collection contains any null element), unless the collection rejects null outright — TreeSet with natural ordering, EnumSet, ConcurrentHashMap.keySet().
List.indexOf and lastIndexOf
Lists support more than just "yes/no" — they return the position:
int firstA = list.indexOf("alpha"); // -1 if absent
int lastA = list.lastIndexOf("alpha");Linear scan from the front (or back). For an ArrayList<String> of a thousand elements, this is fine. For a million, build a Map<String, Integer> once and query it.
Map.containsKey, containsValue, get, getOrDefault
The map-specific search methods divide cleanly:
map.containsKey("alpha"); // O(1) for HashMap, O(log n) for TreeMap
map.get("alpha"); // returns the value or null
map.getOrDefault("alpha", 0); // returns the value or your default
map.containsValue("v"); // O(n) — scans every entrycontainsValue is the trap. It walks every entry every time. If you find yourself calling it more than once, build an inverse map (Map<V, K>) or a Set<V> of values once and query that.
getOrDefault is a small but important pattern shift: it replaces the old Integer n = map.get(k); if (n == null) n = 0; idiom with one line, and the default is only used when the key is absent — not when the value is null. (For "absent or null," use Objects.requireNonNullElse(map.get(k), 0).)
Collections.binarySearch
Binary search on a sorted list:
List<String> sorted = new ArrayList<>(...);
Collections.sort(sorted);
int hit = Collections.binarySearch(sorted, "delta"); // 2 (some index)
int miss = Collections.binarySearch(sorted, "zeta"); // negativeTwo preconditions:
- The list must be sorted in the order the search will use. If you sorted with a comparator, pass the same comparator to
binarySearch. Mismatched orders produce nonsense (silently — no exception). - The list should be indexed (
ArrayList, notLinkedList). On a linked list, binary search is O(n log n), worse than linear scan.
The return value encodes both "found" and "where to insert":
int i = Collections.binarySearch(sorted, key);
if (i >= 0) {
// key is at index i
} else {
int insertAt = -i - 1;
sorted.add(insertAt, key); // keeps the list sorted
}That -i - 1 arithmetic is the way every "find or insert" routine in the JDK handles a miss. It's worth memorising.
Collections.frequency and disjoint
Two helpers that wrap common search patterns:
int n = Collections.frequency(coll, "alpha"); // how many times "alpha" appears
boolean none = Collections.disjoint(a, b); // no element of a is in bfrequency is O(n). For repeated queries with different targets, count once with a stream into a Map<T, Long> instead.
disjoint is implemented cleverly: it iterates the smaller collection and checks contains on the larger if the larger is a Set, swapping arguments under the hood. So Collections.disjoint(largeList, smallSet) is O(largeList) — and faster than rolling your own.
Stream-based search
Streams handle "find the first matching element" with findFirst / findAny, and "is there any match" with anyMatch / allMatch / noneMatch:
Optional<Person> match = people.stream()
.filter(p -> p.age() >= 18 && p.name().startsWith("A"))
.findFirst();
boolean any = people.stream().anyMatch(p -> p.age() >= 65);
boolean all = people.stream().allMatch(p -> p.age() >= 0);
boolean non = people.stream().noneMatch(p -> p.age() < 0);Streams short-circuit on findFirst and anyMatch — they stop as soon as a match is found. They're the cleanest answer for predicate-based search. They are not faster than contains for equality search on the right data structure — a HashSet.contains will always beat stream().anyMatch(x -> x.equals(target)).
Optional<T> deserves its own attention (it has a chapter in the functional programming part). For now: findFirst().isPresent() is the cleanest "did we find anything?" expression for a predicate.
LinkedHashSet for "contains and order"
A common pattern: you need fast contains and insertion-order iteration. LinkedHashSet is the answer:
LinkedHashSet<String> seen = new LinkedHashSet<>();
for (String line : input) {
if (seen.add(line)) System.out.println(line); // print first occurrences only
}add returns true only the first time. The set rejects duplicates in O(1) and preserves insertion order for iteration. That's the right tool for "deduplicate while keeping order" — neither HashSet (loses order) nor ArrayList (slow contains) is as good.
A worked example: comparing five search strategies on the same data
The program below seeds 100 000 strings into different collections and times five lookup strategies for 1 000 random hits: ArrayList.contains, HashSet.contains, TreeSet.contains, Collections.binarySearch on a sorted list, and stream().anyMatch.
What to take from the run:
HashSet.containsandCollections.binarySearchon the sortedArrayListare dramatically faster thanArrayList.containsfor repeated lookups. The hash table wins for "any equality," the binary search wins when the data must be kept sorted for other reasons too.TreeSet.containsis close behind, but not free — every lookup walks a tree of depth ~log₂(100 000) ≈ 17, with cache misses for tree pointers.stream().anyMatchfor equality search is the worst option here: same O(n) aslist.containsbut with extra allocation overhead per query. Use it for predicates, not for plain equality on a list.- The "missing key" call returned a negative value, and
-i - 1gave the index where"zzz"would be inserted to keep the list sorted. That's the same conventionTreeMap.subMapandArrays.binarySearchuse.
What's next
You've now covered iteration, ordering, sorting, and searching — the four mechanical operations the collections framework exists to make easy. The final chapter in this part is the modern story for the one thing none of them touched: immutability. Java Unmodifiable Collections covers List.of, Set.of, Map.of, and the Collections.unmodifiable* wrappers — when each one is the right choice, and why a "defensive copy" pattern that used to span four lines now fits in one.
Practice
You call `Collections.binarySearch(sortedList, key)` and the result is `-5`. What index would `key` need to be inserted at to keep the list sorted?