Java Stream Intermediate Operations
Transform Java streams lazily with filter, map, flatMap, sorted, distinct, peek, limit, and skip.
An intermediate operation takes a stream and returns another stream. It records what should happen to each element when the pipeline eventually runs; it doesn't run anything on its own. You chain them; the chain stays cold until a terminal pulls the first element through. That laziness is what makes a 30-line pipeline cost less than its parts, what makes infinite sources tractable, and what makes the choice of operation more about clarity than about avoiding work — adjacent intermediates fuse into a single pass.
This chapter is a tour of every intermediate you'll write. Each entry has the same shape: what it does, what its callback's type is, whether it's stateless or stateful, and the one or two gotchas that decide whether the pipeline is correct.
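To see the laziness for yourself, here is a minimal sketch (the class name LazinessDemo and the logging lambdas are illustrative, not part of any API). It builds a pipeline, records that it was built, and only then calls a terminal. The lambdas log on purpose so the run order is visible, which is exactly the kind of side effect you would avoid in real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazinessDemo {
    // Builds a pipeline, notes when it was built, then runs it.
    // The lambdas log so the order of execution can be inspected afterwards.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
                .filter(n -> { log.add("filter " + n); return n % 2 == 0; })
                .map(n -> { log.add("map " + n); return n * 10; });
        log.add("pipeline built");                 // no lambda has run yet
        List<Integer> result = pipeline.toList();  // the terminal pulls elements through
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "pipeline built" entry comes first, and the filter and map entries interleave element by element (filter 1, filter 2, map 2, ...) rather than running as two separate passes.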
filter — keep what matches
Drops elements that fail a Predicate<T>:
List<Integer> evens = nums.stream()
    .filter(n -> n % 2 == 0)
    .toList();
Stateless, lazy, preserves order. The predicate must be side-effect free: if it mutates anything visible, parallel pipelines will surprise you and even sequential ones become hard to read.
filter doesn't change the element type. To both keep a subset and change the type, use filter then map, or mapMulti (Java 16+) for the rare case where one input becomes zero-or-one differently-typed output.
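As a rough illustration of the two options, the sketch below pulls the Integer values out of a mixed list both ways. The class and method names are made up for the example; it assumes Java 16+ for mapMulti, toList, and pattern matching in instanceof.

```java
import java.util.List;

class FilterThenMap {
    // filter keeps the element type (Object); map is what changes it.
    static List<Integer> withFilterAndMap(List<Object> mixed) {
        return mixed.stream()
                .filter(o -> o instanceof Integer)
                .map(o -> (Integer) o)
                .toList();
    }

    // mapMulti does both steps at once: each input emits zero or one Integer.
    static List<Integer> withMapMulti(List<Object> mixed) {
        return mixed.stream()
                .<Integer>mapMulti((o, downstream) -> {
                    if (o instanceof Integer i) downstream.accept(i);
                })
                .toList();
    }

    public static void main(String[] args) {
        List<Object> mixed = List.of(1, "two", 3, 4.0);
        System.out.println(withFilterAndMap(mixed)); // [1, 3]
        System.out.println(withMapMulti(mixed));     // [1, 3]
    }
}
```

For most code the filter-then-map form reads more clearly; mapMulti earns its place when the test and the conversion are naturally one step.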
map — transform each element
Applies a Function<T, R> to every element, producing a stream of R:
List<Integer> lengths = words.stream()
    .map(String::length)
    .toList();
Stateless, lazy, preserves order, one-in one-out. Use the primitive specialisations when the result is numeric:
- mapToInt, mapToLong, mapToDouble → primitive stream (no boxing, sum() available).
- mapToObj on a primitive stream → back to Stream<R>.
int totalLength = words.stream().mapToInt(String::length).sum();
flatMap — replace each element with a stream of others
A Function<T, Stream<R>> that "unpacks" each element into multiple outputs (or none, or one):
List<List<String>> grouped = List.of(List.of("a", "b"), List.of("c"));
List<String> flat = grouped.stream()
    .flatMap(List::stream)
    .toList(); // [a, b, c]
The mental model: "every element becomes a sub-stream, and flatMap concatenates them." It's how you go from a stream of containers (Stream<List<T>>) to a stream of contents (Stream<T>), how you expand each text into its words, and how you turn a stream of Optional<T> into a stream of present values (via Optional::stream).
There are primitive specialisations too — flatMapToInt, flatMapToLong, flatMapToDouble — for fan-out into a primitive stream.
A common confusion: map(s -> s.split(" ")) gives Stream<String[]>, a stream of arrays rather than a flat stream of words. To flatten, use flatMap(s -> Arrays.stream(s.split(" "))).
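The two flattening patterns mentioned above, tokenising text and unwrapping Optional values, can be sketched as follows. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FlatMapDemo {
    // Each line becomes a sub-stream of its words; flatMap concatenates them.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .toList();
    }

    // Optional::stream yields one element if present, none if empty.
    static List<String> present(List<Optional<String>> maybe) {
        return maybe.stream()
                .flatMap(Optional::stream)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(words(List.of("a b", "c")));  // [a, b, c]
        System.out.println(present(List.of(Optional.of("x"), Optional.empty(), Optional.of("y")))); // [x, y]
    }
}
```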
mapMulti — push zero, one, or many elements per input
mapMulti (Java 16+) is a more efficient flatMap for cases where each input produces a small, variable number of outputs and building a per-element Stream is overkill:
people.stream()
    .<String>mapMulti((p, downstream) -> {
        if (p.age() >= 18) downstream.accept(p.name());
        if (p.email() != null) downstream.accept(p.email());
    })
    .forEach(System.out::println);
Use flatMap when you naturally have a stream/list to emit; use mapMulti when you'd otherwise build a tiny one-or-two-element stream per input just to satisfy flatMap's signature.
distinct — drop duplicates
Removes equal elements using equals / hashCode:
List<String> unique = words.stream().distinct().toList();
Stateful: to know whether an element is a duplicate, distinct has to remember every element it has already emitted. On an ordered stream it keeps the first occurrence. On an unordered stream it may keep any one of the equal elements, which lets parallel pipelines do less coordination. On an infinite stream that memory grows without bound, so you almost never want distinct without a limit upstream.
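A small sketch of both points: distinct relying on equals/hashCode, and a limit placed upstream on an infinite source. The Person record and the method names are hypothetical; a record is used because it gets value-based equals and hashCode for free.

```java
import java.util.List;
import java.util.stream.Stream;

class DistinctDemo {
    // Records derive equals/hashCode from their components,
    // which is what distinct uses to spot duplicates.
    record Person(String name, int age) {}

    static List<Person> uniquePeople(List<Person> people) {
        return people.stream().distinct().toList();
    }

    // The limit sits upstream, so distinct only ever sees ten elements.
    static List<Integer> remainders() {
        return Stream.iterate(0, n -> n + 1)
                .limit(10)
                .map(n -> n % 3)
                .distinct()
                .toList();
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Alice", 30), new Person("Bob", 25), new Person("Alice", 30));
        System.out.println(uniquePeople(people)); // the second Alice is dropped
        System.out.println(remainders());         // [0, 1, 2]
    }
}
```

Moving the limit below distinct in remainders would be a mistake with a larger limit: once the three possible remainders have been seen, distinct would keep consuming the infinite source looking for a fourth.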
sorted — order the elements
Two forms — natural order and a Comparator<T>:
List<String> az = words.stream().sorted().toList();
List<String> byLen = words.stream().sorted(Comparator.comparingInt(String::length)).toList();
Stateful and a full barrier: sorted must buffer every element before it can emit one. That makes it the most expensive intermediate and one to use deliberately. Putting it before a limit(n) does not save work, because the pipeline still has to see every input to know which n to keep. (For a "top N" pipeline where N is much smaller than the total, prefer a bounded PriorityQueue; otherwise sort once, collect with Collectors.toList(), and take subList(0, N).)
Also: do not call sorted on a stream from an infinite source. It never emits anything and eventually runs out of memory.
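For the "top N" case, here is one way the bounded PriorityQueue idea might look next to the plain sorted-then-limit version. This is a sketch under simple assumptions (Integer elements, largest-first result); the class and method names are invented for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopN {
    // Straightforward version: buffers and sorts every element, then keeps n.
    static List<Integer> largestBySorting(List<Integer> nums, int n) {
        return nums.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .toList();
    }

    // Bounded-heap version: never holds more than n elements at once.
    static List<Integer> largestByHeap(List<Integer> nums, int n) {
        if (n <= 0) return List.of();
        // Min-heap: the smallest value kept so far sits on top.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (int x : nums) {
            if (heap.size() < n) {
                heap.add(x);
            } else if (x > heap.peek()) {
                heap.poll();   // evict the smallest kept value
                heap.add(x);
            }
        }
        // Only the n survivors are sorted for presentation.
        return heap.stream().sorted(Comparator.reverseOrder()).toList();
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(5, 1, 9, 3, 7, 9);
        System.out.println(largestBySorting(nums, 3)); // [9, 9, 7]
        System.out.println(largestByHeap(nums, 3));    // [9, 9, 7]
    }
}
```

The heap version does more bookkeeping per element but keeps memory at n, which is the trade worth making only when n is small compared with the input.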
peek — observe without changing
A Consumer<T> that fires for each element pulled through. Returns the stream unchanged:
words.stream()
    .peek(s -> System.out.println("seen: " + s))
    .filter(s -> s.length() > 3)
    .toList();
For debugging only. peek runs lazily and exactly once per pulled element, so it's a useful window onto laziness and short-circuiting:
Stream.iterate(1, n -> n + 1)
    .peek(n -> System.out.println("considered " + n))
    .filter(n -> n > 100)
    .findFirst(); // pulls 1..101: peek fires 101 times, then stops
Don't put real logic in a peek. The stream implementation may skip peek entirely when it can produce the result without traversing the elements (since Java 9, count() on a source of known size is the documented example), and on parallel streams the order in which peek fires is undefined.
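One documented case where peek may not fire at all is count() on a source whose size is already known. The sketch below counts how often peek runs; the class name is illustrative. On Java 9 and later the implementation is permitted to skip the traversal, so the peek counter may be 0 rather than 3, and the code deliberately does not assume either.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PeekSkipped {
    // Returns { element count, number of times peek fired }.
    static long[] run() {
        AtomicInteger peeks = new AtomicInteger();
        long count = List.of("a", "b", "c").stream()
                .peek(s -> peeks.incrementAndGet())
                .count(); // may be answered from the list's size without traversal
        return new long[] { count, peeks.get() };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("count=" + r[0] + ", peek calls=" + r[1]);
    }
}
```

Whatever the peek counter shows on your JDK, the count itself is always 3, which is the reason any logic you depend on should not live in peek.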
limit(n) — keep at most n elements
Stops the pipeline after n elements have made it through:
List<Integer> firstFive = Stream.iterate(1, i -> i + 1).limit(5).toList();
Stateful (it counts) and short-circuiting (once n elements have passed, nothing more is pulled from upstream). On an ordered stream it keeps the first n. On an unordered parallel stream it keeps some n, with no guarantee which, and a parallel limit on an ordered stream pays for the ordering. If you don't care which n you get, stream.unordered().limit(n) can be faster in parallel.
The standard pattern for taming any infinite source: every Stream.iterate / Stream.generate ends in either a limit, a 3-arg iterate bound, or a short-circuiting terminal like findFirst.
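The three ways of bounding an infinite source listed above might look like this side by side. The doubling sequence and the names are only for illustration; the three-argument Stream.iterate requires Java 9+.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class BoundedInfinite {
    // 1. An explicit limit on the number of elements.
    static List<Integer> withLimit() {
        return Stream.iterate(1, n -> n * 2).limit(5).toList();
    }

    // 2. A three-argument iterate: the middle predicate ends the stream.
    static List<Integer> withThreeArgIterate() {
        return Stream.iterate(1, n -> n <= 16, n -> n * 2).toList();
    }

    // 3. A short-circuiting terminal that stops at the first match.
    static Optional<Integer> withFindFirst() {
        return Stream.iterate(1, n -> n * 2).filter(n -> n > 10).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(withLimit());           // [1, 2, 4, 8, 16]
        System.out.println(withThreeArgIterate()); // [1, 2, 4, 8, 16]
        System.out.println(withFindFirst());       // Optional[16]
    }
}
```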
skip(n) — drop the first n
The complement of limit. Drops the first n elements, then emits the rest:
List<Integer> rest = nums.stream().skip(2).toList(); // drops nums[0], nums[1]
Stateful (it counts down). On an ordered stream the meaning is exact; on a parallel ordered stream it pays an ordering cost. Together with limit, it gives you "paged" access:
list.stream().skip(page * pageSize).limit(pageSize).toList();
That works, but for large skip over a List it's still O(skip + limit). A direct list.subList(...) is cheaper if you have the List in hand.
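Here is a sketch of the two paging approaches as helper methods. The names are hypothetical, and both assume page and pageSize are non-negative. The subList variant clamps its bounds so that a page past the end returns an empty list instead of throwing.

```java
import java.util.List;

class Paging {
    // Stream version: walks past the skipped elements one by one.
    static <T> List<T> pageWithStream(List<T> items, int page, int pageSize) {
        return items.stream()
                .skip((long) page * pageSize) // widen first to avoid int overflow
                .limit(pageSize)
                .toList();
    }

    // subList version: jumps straight to the page and returns a view, not a copy.
    static <T> List<T> pageWithSubList(List<T> items, int page, int pageSize) {
        int from = (int) Math.min((long) page * pageSize, items.size());
        int to = (int) Math.min((long) from + pageSize, items.size());
        return items.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(pageWithStream(items, 1, 4));  // [4, 5, 6, 7]
        System.out.println(pageWithSubList(items, 2, 4)); // [8, 9]
    }
}
```

Because subList returns a view backed by the original list, copy it (for example with List.copyOf) if the underlying list may change while the page is in use.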
takeWhile / dropWhile — prefix-based windowing
Two stateful intermediates (Java 9+) that act on a prefix of the stream. takeWhile is also short-circuiting; dropWhile is not:
// take elements while predicate holds, stop at the first miss
List<Integer> small = Stream.of(1, 2, 3, 10, 4, 5)
    .takeWhile(n -> n < 5)
    .toList(); // [1, 2, 3]
// drop elements while predicate holds, then emit the rest
List<Integer> rest = Stream.of(1, 2, 3, 10, 4, 5)
    .dropWhile(n -> n < 5)
    .toList(); // [10, 4, 5]
These are not filter. filter tests every element. takeWhile stops at the first failure and discards everything after it, including later elements that would have passed filter. dropWhile stops testing at the first failure and emits everything from there on. On a sorted stream they're how you express "everything up to threshold" cheaply.
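To make the contrast with filter concrete, the sketch below runs all three over the same data, then uses takeWhile to cut off an infinite ascending source at a threshold. The class and method names are illustrative.

```java
import java.util.List;
import java.util.stream.Stream;

class PrefixOps {
    static final List<Integer> DATA = List.of(1, 2, 3, 10, 4, 5);

    // filter tests every element, so 4 and 5 survive.
    static List<Integer> filtered() {
        return DATA.stream().filter(n -> n < 5).toList();
    }

    // takeWhile stops at 10 and never looks at 4 or 5.
    static List<Integer> taken() {
        return DATA.stream().takeWhile(n -> n < 5).toList();
    }

    // dropWhile discards the leading run, then passes everything through untested.
    static List<Integer> dropped() {
        return DATA.stream().dropWhile(n -> n < 5).toList();
    }

    // On an ascending infinite source, takeWhile is what makes the pipeline finish.
    static List<Integer> squaresBelow(int threshold) {
        return Stream.iterate(1, n -> n + 1)
                .map(n -> n * n)
                .takeWhile(sq -> sq < threshold)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(filtered());       // [1, 2, 3, 4, 5]
        System.out.println(taken());          // [1, 2, 3]
        System.out.println(dropped());        // [10, 4, 5]
        System.out.println(squaresBelow(50)); // [1, 4, 9, 16, 25, 36, 49]
    }
}
```

Swapping takeWhile for filter in squaresBelow would never terminate, because filter has no way to know that no later square will be below the threshold.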
boxed / asLongStream / asDoubleStream — move between primitive worlds
Primitive streams have a few intermediates of their own for widening to another primitive stream or crossing back into the object world:
IntStream.range(0, 5).boxed().toList(); // Stream<Integer> [0, 1, 2, 3, 4]
IntStream.range(0, 3).asLongStream().sum(); // 0L + 1L + 2L
IntStream.range(0, 3).asDoubleStream().average();
boxed is the bridge from primitive to Stream<Integer/Long/Double>. The reverse is mapToInt/mapToLong/mapToDouble.
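A round trip through these bridges, as a minimal sketch with made-up method names. Note that average() returns an OptionalDouble, because an empty stream has no average.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.IntStream;

class PrimitiveBridges {
    // Object stream -> IntStream: no boxing, and sum() becomes available.
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // IntStream -> Stream<Integer> via boxed().
    static List<Integer> boxedRange(int n) {
        return IntStream.range(0, n).boxed().toList();
    }

    // IntStream -> Stream<String> via mapToObj.
    static List<String> labels(int n) {
        return IntStream.range(0, n).mapToObj(i -> "item-" + i).toList();
    }

    // IntStream -> DoubleStream via asDoubleStream(), then a primitive terminal.
    static OptionalDouble mean(int n) {
        return IntStream.range(0, n).asDoubleStream().average();
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("ab", "cde"))); // 5
        System.out.println(boxedRange(3));                     // [0, 1, 2]
        System.out.println(labels(2));                         // [item-0, item-1]
        System.out.println(mean(3));                           // OptionalDouble[1.0]
    }
}
```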
Stateless vs. stateful — why it matters
| Stateless | Stateful |
|---|---|
| filter | distinct |
| map / mapToX | sorted |
| flatMap / mapMulti | limit |
| peek | skip |
| boxed / asLongStream / asDoubleStream | takeWhile / dropWhile |
Stateful intermediates need to remember something across elements. sorted has to buffer everything. distinct has to remember every emitted element. limit and skip need a counter. That makes them more expensive (especially in parallel) and worth using deliberately.
Order matters — fuse, filter early, transform late
Because adjacent intermediates fuse into one element-by-element pass, the order in which you write them determines how much work the pipeline does:
// Good: filter first, then the expensive map runs only on survivors.
people.stream()
    .filter(p -> p.age() >= 18)
    .map(this::expensiveLookup)
    .toList();
// Bad: every element pays for the map, then most are thrown away.
people.stream()
    .map(this::expensiveLookup)
    .filter(r -> r.score() > 0.5)
    .toList();
The general rule: filter early, transform late, sort once, distinct once. The stream library does not reorder your intermediates; you do.
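You can measure the difference by counting calls. In the sketch below, Person and expensiveLookup are hypothetical stand-ins (the "lookup" just upper-cases a name and bumps a counter), and both pipelines are written to produce the same result so that only the amount of work differs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class FilterEarly {
    record Person(String name, int age) {}

    // Counts calls to the stand-in "expensive" function.
    static final AtomicInteger lookups = new AtomicInteger();

    // Hypothetical stand-in for an expensive call; it only upper-cases the name.
    static String expensiveLookup(Person p) {
        lookups.incrementAndGet();
        return p.name().toUpperCase();
    }

    // Filter first: the lookup runs only for adults.
    static List<String> filterFirst(List<Person> people) {
        return people.stream()
                .filter(p -> p.age() >= 18)
                .map(FilterEarly::expensiveLookup)
                .toList();
    }

    // Map first: the lookup runs for everyone, then most results are discarded.
    static List<String> mapFirst(List<Person> people) {
        return people.stream()
                .map(p -> Map.entry(p, expensiveLookup(p)))
                .filter(e -> e.getKey().age() >= 18)
                .map(Map.Entry::getValue)
                .toList();
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person("Ann", 34), new Person("Ben", 12),
                new Person("Cy", 17), new Person("Dee", 52));
        lookups.set(0);
        System.out.println(filterFirst(people) + " lookups=" + lookups.get()); // [ANN, DEE] lookups=2
        lookups.set(0);
        System.out.println(mapFirst(people) + " lookups=" + lookups.get());    // [ANN, DEE] lookups=4
    }
}
```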
A worked example: the whole vocabulary in one pipeline
The program below builds a stream from a small list, walks every operation we've covered, prints the result of each, and proves laziness/short-circuiting with peek plus an infinite iterate.
What to take from the run:
- filter and map are the workhorses; the other one-in-one-out intermediates (mapToInt, mapToObj, boxed) are the cheap currency-conversions between object and primitive streams.
- flatMap and mapMulti are how one input becomes several outputs. The Stream.of("a b") -> Arrays.stream(split(...)) shape is the canonical "tokenise" pattern; mapMulti is the cheaper choice when you'd otherwise build a tiny stream per element.
- distinct and sorted are stateful: distinct had to remember every previously-emitted Person to drop the duplicate "Alice", and sorted had to buffer the whole input. That's why both are placed deliberately, usually once, and usually late.
- peek fired once per pulled element on the infinite iterate: there were exactly as many "considered N" lines as elements findFirst had to look at. Without short-circuiting that pipeline would never terminate.
- The two lookup-counting blocks at the end made the order rule concrete. Filtering first ran the expensive transform on far fewer elements than mapping first. That tradeoff is yours to set.
What's next
Intermediates record the shape of the work; nothing runs until a terminal pulls. The next chapter, Java Stream Terminal Operations, is the full vocabulary of terminals — forEach, count, min/max, findFirst/findAny, anyMatch/allMatch/noneMatch, reduce, toArray, toList, and the gateway to the chapter after — collect.