W3docs

Java Fork/Join Framework

Split work recursively and parallelize it with the Fork/Join framework — ForkJoinPool, RecursiveTask, work-stealing.

Java Fork/Join Framework

A regular thread pool is great for "many independent tasks." It's not great for "one big task that can be split recursively into smaller versions of itself." For that second shape — divide-and-conquer work — Java has a specialised executor: the ForkJoinPool. It's the pool behind parallelStream, CompletableFuture.supplyAsync (when no executor is given), and any code you write with RecursiveTask/RecursiveAction.

The trick that makes ForkJoinPool special is work-stealing: every worker has its own deque, and when its own deque is empty it steals a task from the bottom of another worker's deque. The result is automatic load-balancing — fast workers help slow workers without any coordination.

When to reach for it

Fork/join is the right tool for:

  • Recursive divide-and-conquer. Quicksort, mergesort, tree traversal, recursive numeric algorithms, matrix multiply by halving.
  • CPU-bound work that's parallelisable into roughly equal pieces.
  • Work whose granularity adapts: split if the chunk is big, do directly if it's small.

It's the wrong tool for:

  • I/O-bound work. A blocked worker doesn't steal — and the pool's default size is your CPU count. Block one worker and you've lost a core.
  • Independent unrelated tasks. A regular ThreadPoolExecutor is simpler and just as fast for that shape.
  • Tasks that depend on a fixed external schedule. Use ScheduledExecutorService.

The Java 7 streams team's mental model: "if you'd write a parallelStream to do this, fork/join is the right shape directly."

The three classes

ForkJoinPool pool;                                    // the executor
RecursiveTask<V>;                                     // an abstract task returning V
RecursiveAction;                                      // an abstract task returning nothing

You extend RecursiveTask or RecursiveAction, override compute(), decide inside compute() whether to split or to do the work directly, and call fork()/join() on the sub-tasks.

class Sum extends RecursiveTask<Long> {
  private static final int THRESHOLD = 1000;
  private final long[] data;
  private final int lo, hi;

  Sum(long[] data, int lo, int hi) {
    this.data = data; this.lo = lo; this.hi = hi;
  }

  @Override
  protected Long compute() {
    int len = hi - lo;
    if (len <= THRESHOLD) {
      long s = 0;
      for (int i = lo; i < hi; i++) s += data[i];
      return s;
    }
    int mid = lo + len / 2;
    Sum left  = new Sum(data, lo, mid);
    Sum right = new Sum(data, mid, hi);
    left.fork();                                      // schedule left to run on another worker
    long rightResult = right.compute();               // run right on this worker (avoid extra task)
    long leftResult  = left.join();                   // wait for left
    return leftResult + rightResult;
  }
}

ForkJoinPool pool = new ForkJoinPool();
long total = pool.invoke(new Sum(data, 0, data.length));

The shape — check threshold → split → fork one half → compute the other → join — is the canonical fork/join idiom. The "compute one half here instead of fork-ing both" trick avoids creating an unnecessary task and is a small but real win.

The threshold matters

The single most important decision: when to stop splitting. Too small a threshold and you create thousands of tasks for trivial pieces — the overhead dominates the work. Too large and you can't fully use the cores — many workers stand idle while one chews through a big chunk.

Rules of thumb:

  • The task body should take at least 10 microseconds. Below that, task-management overhead is comparable to the work.
  • Make the threshold a constant you can tune. 100, 1000, 10000 are typical for primitive arrays; the right number depends on the per-element cost.
  • For very small inputs, fall back to a purely serial implementation. The split-and-fork overhead is wasted on inputs that fit in cache.

fork(), join(), invoke()

The three operations on a RecursiveTask:

MethodBehaviour
fork()Schedule the task on the current pool; return immediately
join()Wait for the task and return its result (or throw what it threw)
invoke()Combination of fork + join for this thread — synchronous
compute()Run the body directly on the calling thread (no fork)

In the pattern above, left.fork(); right.compute(); left.join(); does the right thing — fork one half to another worker, run the other half here, then wait for the fork to finish.

You should not write left.fork(); right.fork(); left.join(); right.join();. The right side is forked and the current worker waits — there's no thread of execution available to actually run right until the worker reaches join. The combination wastes the current worker's time slot.

The common pool

ForkJoinPool.commonPool() is a JVM-wide shared pool, sized to Runtime.getRuntime().availableProcessors() - 1 by default. It powers:

  • Stream.parallelStream()
  • CompletableFuture.supplyAsync(supplier) (the no-executor overload)
  • Arrays.parallelSort()

You can configure the common pool's size via the system property java.util.concurrent.ForkJoinPool.common.parallelism at JVM start. You should not use the common pool for I/O — a single blocking call ties up a worker that the whole JVM shares.

Work-stealing in pictures

worker-1 deque:  [t1 t2 t3 t4]            (it forked these; t4 just got pushed)
worker-2 deque:  []                       (empty — workers steal)
worker-3 deque:  [t10 t11]                (still has its own)

worker-2 finds its deque empty; steals t1 from the BOTTOM of worker-1's deque
worker-1 keeps pulling its own tasks from the TOP

The double-ended queue is the heart of the design: workers push and pop from one end (LIFO — locality of reference for cache hits), thieves take from the other end (FIFO — minimal contention with the owner). It's why fork/join performs so well: workers rarely contend on each other's data structures even under heavy load.

A worked example: parallel sum vs. serial

The program below sums a 10-million-element array two ways — serial loop and fork/join recursion — and prints the wall-clock for each.

java— editable, runs on the server

What to take from the run:

  • The fork/join version was several times faster than the serial loop. On an N-core machine the upper bound is roughly — the actual number was lower because the JVM, the GC, and other JVM threads also wanted CPU, and the threshold work isn't perfectly balanced. Still, a substantial speedup for a few lines of recursive code.
  • Both sums were equal. That's the partition-and-merge correctness check: each leaf summed its non-overlapping slice; the combine step (l + r) added them; no double-counting or data races because each leaf wrote to its own local variable.
  • The SumTiny variant with threshold 10 was slower than the serial loop. With 10M elements split down to chunks of 10, you create about 2M tasks — and the task-management overhead dwarfs the actual addition work. The threshold is a real knob; bench it on representative input.
  • The left.fork(); long r = right.compute(); long l = left.join(); pattern used one fewer task than fork(); fork(); join(); join(); would. The current worker has free time during the compute() — using it for one of the halves saves an entire task allocation. That's the small but cumulative win on many real workloads.
  • ForkJoinPool.commonPool() was the executor used in this demo. For a one-off run, the common pool is fine. For a long-running program that mixes fork/join work with stream parallel calls and async futures, give the heavy fork/join workload its own pool — the common pool is meant for small bursts, not steady-state heavy compute.

What's next

The next chapter, Java Concurrent Collections, covers the data structures designed to be hammered by multiple threads — ConcurrentHashMap, CopyOnWriteArrayList, BlockingQueue, and the rest of java.util.concurrent.

Practice

Practice

Inside `compute()` of a `RecursiveTask`, you have two sub-tasks `left` and `right`. Which call pattern is the canonical fork/join idiom?