W3docs

Walking File Trees in Java

Recursively traverse directories in Java with Files.walk, Files.find, and the FileVisitor interface.

The previous chapter ended with Files.walk(dir) — the Stream<Path> form of "give me every file under this directory." That's the fast tool for the common case. This chapter covers the lower-level alternative, Files.walkFileTree, which lets you control the traversal in ways the stream form can't: handle I/O errors per-file, skip whole subtrees mid-walk, run code on directory exit as well as entry, and short-circuit on a match.

Use Files.walk for "list everything." Use Files.walkFileTree for "do something at each step, with control over the step."

Three walking APIs

The catalogue, in order of how often you'll reach for them:

| API | Returns | When to use |
| --- | --- | --- |
| Files.walk(dir) | Stream<Path> | Most common: filter/map/forEach over every entry |
| Files.find(dir, depth, biPredicate) | Stream<Path> | Same, with an attribute-aware predicate (isDirectory, mtime) |
| Files.walkFileTree(dir, visitor) | Path (the start) | You need pre/post-visit hooks, per-file error handling, or to abort the walk |

The first two are good enough for 90% of "find me all the .log files" code. walkFileTree is what you reach for when the answer is "and then delete the directory afterwards" or "stop walking the moment I find the one I'm looking for."
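
As a minimal sketch of the "find me all the .log files" case with the two stream APIs (the class name LogFinder and the demo file names are illustrative assumptions, not part of the chapter):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LogFinder {
  // Files.walk: stream every entry, then filter on the path.
  static List<Path> logsViaWalk(Path root) throws IOException {
    try (Stream<Path> s = Files.walk(root)) {   // the stream holds open directories, so close it
      return s.filter(Files::isRegularFile)
              .filter(p -> p.getFileName().toString().endsWith(".log"))
              .sorted()
              .collect(Collectors.toList());
    }
  }

  // Files.find: the predicate also receives the attributes the walker already read.
  static List<Path> logsViaFind(Path root) throws IOException {
    try (Stream<Path> s = Files.find(root, Integer.MAX_VALUE,
        (p, attrs) -> attrs.isRegularFile() && p.getFileName().toString().endsWith(".log"))) {
      return s.sorted().collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical demo tree: app/server.log and notes.txt under a temp directory.
    Path root = Files.createTempDirectory("logs-demo");
    Files.createDirectories(root.resolve("app"));
    Files.writeString(root.resolve("app").resolve("server.log"), "started\n");
    Files.writeString(root.resolve("notes.txt"), "not a log\n");
    System.out.println(logsViaWalk(root));
    System.out.println(logsViaFind(root));
  }
}
```

Both methods return the same list here; Files.find simply avoids a second attribute lookup per entry.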

FileVisitor and SimpleFileVisitor

Files.walkFileTree takes a FileVisitor<Path> — an interface with four methods the walker calls at specific moments:

FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException;   // entering a directory
FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException;           // each non-directory entry
FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException;               // I/O failure on a specific file
FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException;             // leaving the directory (after all children)

The order matters: for a directory d with children [a, b/, c], the calls are preVisitDirectory(d), visitFile(a), preVisitDirectory(b), ... postVisitDirectory(b), visitFile(c), postVisitDirectory(d). The post* hook is what makes recursive delete possible — you can't delete a directory until you've deleted its contents.

SimpleFileVisitor<Path> is the helper class that implements all four methods with sensible defaults (continue on success, throw on failure). Subclass it and override only the methods you care about:

class LogPrinter extends SimpleFileVisitor<Path> {
  @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
    System.out.println(f);
    return FileVisitResult.CONTINUE;
  }
}
Files.walkFileTree(root, new LogPrinter());

That's the minimum viable visitor.
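
The call order described earlier can be observed with a visitor that records each hook as it fires. This is a small sketch; the class name CallOrderDemo and the one-entry-per-directory layout are assumptions chosen so the result doesn't depend on the order in which the filesystem lists siblings:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class CallOrderDemo {
  // Walks root and returns one "hook:name" string per callback, in call order.
  static List<String> trace(Path root) throws IOException {
    List<String> events = new ArrayList<>();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override public FileVisitResult preVisitDirectory(Path d, BasicFileAttributes a) {
        events.add("pre:" + d.getFileName());
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
        events.add("file:" + f.getFileName());
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
        if (e != null) throw e;                 // keep the default "fail loudly" behaviour
        events.add("post:" + d.getFileName());
        return FileVisitResult.CONTINUE;
      }
    });
    return events;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical layout: d/ contains only b/, and b/ contains only x.txt.
    Path d = Files.createTempDirectory("order-demo").resolve("d");
    Files.createDirectories(d.resolve("b"));
    Files.writeString(d.resolve("b").resolve("x.txt"), "x");
    System.out.println(trace(d));
    // prints [pre:d, pre:b, file:x.txt, post:b, post:d]
  }
}
```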

FileVisitResult: four signals

Every visitor method returns a FileVisitResult telling the walker what to do next:

| Value | Effect |
| --- | --- |
| CONTINUE | Normal: go to the next entry |
| SKIP_SUBTREE | (From preVisitDirectory only) Skip this directory and its children entirely |
| SKIP_SIBLINGS | Stop visiting the rest of the current directory; resume at the parent's next sibling |
| TERMINATE | Stop the walk completely |

SKIP_SUBTREE is the one you'll reach for most often: "don't descend into .git/ or node_modules/." Return it from preVisitDirectory when the directory's name matches; the walker then skips the directory's entries and does not call postVisitDirectory for it:

@Override public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes a) {
  String name = dir.getFileName() == null ? "" : dir.getFileName().toString();
  if (name.equals(".git") || name.equals("node_modules")) {
    return FileVisitResult.SKIP_SUBTREE;
  }
  return FileVisitResult.CONTINUE;
}
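
To see the pruning end to end, here is a hedged sketch that wraps the same preVisitDirectory logic in a runnable class. SkipDirsDemo, listFiles, and the demo file names are illustrative assumptions:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class SkipDirsDemo {
  static final Set<String> SKIPPED = Set.of(".git", "node_modules");

  // Returns the relative paths of all files under root, ignoring SKIPPED directories.
  static List<String> listFiles(Path root) throws IOException {
    List<String> files = new ArrayList<>();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes a) {
        Path name = dir.getFileName();          // null for a filesystem root such as "/"
        if (name != null && SKIPPED.contains(name.toString())) {
          return FileVisitResult.SKIP_SUBTREE;  // nothing below this directory is visited
        }
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
        // Normalise separators so the output looks the same on Windows.
        files.add(root.relativize(f).toString().replace('\\', '/'));
        return FileVisitResult.CONTINUE;
      }
    });
    Collections.sort(files);                    // sibling order is filesystem-dependent
    return files;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("skip-demo");
    Files.createDirectories(root.resolve(".git"));
    Files.createDirectories(root.resolve("node_modules").resolve("pkg"));
    Files.createDirectories(root.resolve("src"));
    Files.writeString(root.resolve(".git").resolve("config"), "[core]");
    Files.writeString(root.resolve("node_modules").resolve("pkg").resolve("index.js"), "//");
    Files.writeString(root.resolve("src").resolve("Main.java"), "class Main {}");
    Files.writeString(root.resolve("README.md"), "# demo");
    System.out.println(listFiles(root));        // prints [README.md, src/Main.java]
  }
}
```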

TERMINATE is the "found it, stop" signal — useful when you're searching for the first matching file and don't want to walk the rest:

@Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
  if (f.getFileName().toString().equals("target.txt")) {
    found = f;
    return FileVisitResult.TERMINATE;
  }
  return FileVisitResult.CONTINUE;
}

The Stream form has no equivalent signal. Files.walk(...).filter(...).findFirst() does short-circuit, because the stream is populated lazily, but the pipeline can only stop pulling entries: it can't prune a subtree on the way to the match or run code as each directory is left. When the search needs either of those, walkFileTree is the better fit.
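
The visitFile snippet above assigns to a found field it doesn't declare. One way to package it, as a sketch under the assumption of a hypothetical FirstMatch helper class, is to keep the result in a one-element array and return it as an Optional:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

class FirstMatch {
  // Returns the first file named fileName that the walk encounters, if any.
  static Optional<Path> find(Path root, String fileName) throws IOException {
    Path[] found = new Path[1];                 // holder the anonymous class can write to
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
        if (f.getFileName().toString().equals(fileName)) {
          found[0] = f;
          return FileVisitResult.TERMINATE;     // no further callbacks after this
        }
        return FileVisitResult.CONTINUE;
      }
    });
    return Optional.ofNullable(found[0]);
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical demo tree: deep/er/target.txt under a temp directory.
    Path root = Files.createTempDirectory("first-demo");
    Path dir = Files.createDirectories(root.resolve("deep").resolve("er"));
    Files.writeString(dir.resolve("target.txt"), "here");
    System.out.println(find(root, "target.txt").isPresent());  // prints true
    System.out.println(find(root, "missing.txt").isPresent()); // prints false
  }
}
```

If several files share the name, which one comes back depends on the order in which the filesystem lists directory entries.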

Per-file error handling

visitFile and preVisitDirectory are only called when the JDK could read the entry's attributes. If it can't (permission denied, a directory that can't be opened, a race where the file was deleted mid-walk), visitFileFailed is called instead with the exception. By default SimpleFileVisitor re-throws, which aborts the walk:

@Override public FileVisitResult visitFileFailed(Path f, IOException e) throws IOException {
  throw e;                                          // default behaviour
}

For a tolerant walker (log and keep going), override it:

@Override public FileVisitResult visitFileFailed(Path f, IOException e) {
  System.err.println("skipping " + f + ": " + e.getMessage());
  return FileVisitResult.CONTINUE;
}

Files.walk(...) doesn't have this hook — it throws an UncheckedIOException from inside the stream the moment it hits a bad entry, and the stream is dead after that. For long-running scanners over filesystems you don't fully control, that's another reason to reach for walkFileTree.
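
A tolerant scanner usually wants to remember what it skipped rather than only print it. The sketch below, with a hypothetical TolerantWalker class, collects failures in a map. To trigger a failure deterministically it walks a start path that doesn't exist, which the walker reports through visitFileFailed because the attributes can't be read:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TolerantWalker extends SimpleFileVisitor<Path> {
  final List<Path> visited = new ArrayList<>();
  final Map<Path, IOException> failures = new LinkedHashMap<>();

  @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
    visited.add(f);
    return FileVisitResult.CONTINUE;
  }

  // Record the failure and keep walking instead of re-throwing.
  @Override public FileVisitResult visitFileFailed(Path f, IOException e) {
    failures.put(f, e);
    return FileVisitResult.CONTINUE;
  }

  // A non-null exception here means listing the directory ended early.
  @Override public FileVisitResult postVisitDirectory(Path d, IOException e) {
    if (e != null) failures.put(d, e);
    return FileVisitResult.CONTINUE;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("tolerant-demo");
    Files.writeString(root.resolve("ok.txt"), "fine");
    TolerantWalker w = new TolerantWalker();
    Files.walkFileTree(root, w);
    Files.walkFileTree(root.resolve("missing"), w);  // reported via visitFileFailed
    System.out.println("visited:  " + w.visited.size());
    System.out.println("failures: " + w.failures.keySet());
  }
}
```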

The canonical use case: recursive delete

Files.delete removes a directory only when it is empty; otherwise it fails, typically with DirectoryNotEmptyException. To remove a tree you have to delete the leaves first, then the directories that contained them. walkFileTree is the right shape for this: visitFile deletes the file, postVisitDirectory deletes the directory once all its children are gone:

Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
  @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) throws IOException {
    Files.delete(f);
    return FileVisitResult.CONTINUE;
  }
  @Override public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
    if (e != null) throw e;                          // propagate I/O failures from descent
    Files.delete(d);
    return FileVisitResult.CONTINUE;
  }
});

This is the JDK's "delete a directory tree" recipe. Every codebase that needs it ends up with some version of this 10-line block. Save a copy in a utility class once and reuse.
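
One possible shape for that utility class is sketched below. The name PathUtils and the choice to treat a missing root as a no-op are assumptions for illustration, not something the JDK provides:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class PathUtils {
  private PathUtils() {}                        // static helpers only

  // Deletes root and everything under it; does nothing if root doesn't exist.
  static void deleteRecursively(Path root) throws IOException {
    if (Files.notExists(root, LinkOption.NOFOLLOW_LINKS)) return;
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) throws IOException {
        Files.delete(f);                        // leaves first (symlinks are deleted, not followed)
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
        if (e != null) throw e;                 // listing d failed, so it may not be empty
        Files.delete(d);                        // all children are gone by now
        return FileVisitResult.CONTINUE;
      }
    });
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("delete-demo");
    Path nested = Files.createDirectories(root.resolve("a").resolve("b"));
    Files.writeString(nested.resolve("leaf.txt"), "bye");
    deleteRecursively(root);
    System.out.println(Files.exists(root));     // prints false
  }
}
```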

Symbolic links

By default, Files.walkFileTree and Files.walk don't follow symbolic links. That's the safe default: it prevents infinite loops on a symlink that points to its own ancestor. To follow them, pass FileVisitOption.FOLLOW_LINKS:

Files.walkFileTree(root, EnumSet.of(FileVisitOption.FOLLOW_LINKS),
    Integer.MAX_VALUE, visitor);

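When you opt in, the walker detects cycles for you: it tracks the directories it is currently inside, and if a link leads back to one of them it reports that entry to visitFileFailed with a FileSystemLoopException instead of descending again. (Files.walk with FOLLOW_LINKS throws the same exception wrapped in an UncheckedIOException.) That saves you from writing the cycle detection yourself.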
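
With walkFileTree, a detected cycle arrives at visitFileFailed as a FileSystemLoopException, so a visitor can note the loop and carry on. The following sketch assumes a platform where the process may create symbolic links (typical on Linux and macOS); LinkLoopDemo and its method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.FileSystemLoopException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

class LinkLoopDemo {
  // Builds root/data.txt plus root/loop, a symlink pointing back at root.
  static Path buildLoop() throws IOException {
    Path root = Files.createTempDirectory("loop-demo");
    Files.writeString(root.resolve("data.txt"), "x");
    Files.createSymbolicLink(root.resolve("loop"), root);  // may be unsupported or need privileges
    return root;
  }

  // Returns the files visited; paths that would re-enter an ancestor go into loops.
  static List<Path> walkFollowingLinks(Path root, List<Path> loops) throws IOException {
    List<Path> files = new ArrayList<>();
    Files.walkFileTree(root, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
        new SimpleFileVisitor<Path>() {
          @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
            files.add(f);
            return FileVisitResult.CONTINUE;
          }
          @Override public FileVisitResult visitFileFailed(Path f, IOException e) throws IOException {
            if (e instanceof FileSystemLoopException) {
              loops.add(f);                     // cycle: record it and move on
              return FileVisitResult.CONTINUE;
            }
            throw e;                            // anything else still aborts the walk
          }
        });
    return files;
  }

  public static void main(String[] args) throws IOException {
    Path root = buildLoop();
    List<Path> loops = new ArrayList<>();
    System.out.println("files: " + walkFollowingLinks(root, loops));
    System.out.println("loops: " + loops);
  }
}
```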

A worked example: tree-print, skip-subtree, recursive delete

The program below builds a small directory tree with a couple of subdirectories (one of which we want to skip), files at multiple depths, then walks it three ways. First, a tree-printer with SimpleFileVisitor that skips .git. Second, a "find first match" with TERMINATE. Third, the canonical recursive-delete pattern that removes the entire tree at the end.
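
A minimal sketch of such a program follows. It assumes a hypothetical layout of a.txt, .git/config, sub/b.txt and sub/nested/c.txt; the class and method names are illustrative and not taken from the original listing:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class WalkDemo {
  // Assumed layout: a.txt, .git/config, sub/b.txt, sub/nested/c.txt
  static Path build() throws IOException {
    Path root = Files.createTempDirectory("walk-demo");
    Files.createDirectories(root.resolve(".git"));
    Files.createDirectories(root.resolve("sub").resolve("nested"));
    Files.writeString(root.resolve("a.txt"), "a");
    Files.writeString(root.resolve(".git").resolve("config"), "[core]");
    Files.writeString(root.resolve("sub").resolve("b.txt"), "b");
    Files.writeString(root.resolve("sub").resolve("nested").resolve("c.txt"), "c");
    return root;
  }

  // Walk 1: indented listing; .git is pruned in preVisitDirectory.
  static List<String> tree(Path root) throws IOException {
    List<String> lines = new ArrayList<>();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      int depth = 0;
      @Override public FileVisitResult preVisitDirectory(Path d, BasicFileAttributes a) {
        if (d.getFileName().toString().equals(".git")) return FileVisitResult.SKIP_SUBTREE;
        lines.add("  ".repeat(depth++) + d.getFileName() + "/");
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
        lines.add("  ".repeat(depth) + f.getFileName());
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
        if (e != null) throw e;
        depth--;                                // leaving d: undo the indent
        return FileVisitResult.CONTINUE;
      }
    });
    return lines;
  }

  // Walk 2: stop at the first file with the given name; null if there is none.
  static Path findFirst(Path root, String name) throws IOException {
    Path[] found = new Path[1];
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) {
        if (!f.getFileName().toString().equals(name)) return FileVisitResult.CONTINUE;
        found[0] = f;
        return FileVisitResult.TERMINATE;
      }
    });
    return found[0];
  }

  // Walk 3: files go in visitFile, directories in postVisitDirectory.
  static void deleteTree(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override public FileVisitResult visitFile(Path f, BasicFileAttributes a) throws IOException {
        Files.delete(f);
        return FileVisitResult.CONTINUE;
      }
      @Override public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
        if (e != null) throw e;
        Files.delete(d);
        return FileVisitResult.CONTINUE;
      }
    });
  }

  public static void main(String[] args) throws IOException {
    Path root = build();
    tree(root).forEach(System.out::println);
    System.out.println("found: " + findFirst(root, "c.txt"));
    deleteTree(root);
    System.out.println("root still exists: " + Files.exists(root));
  }
}
```

The order in which a.txt, .git and sub appear in the listing depends on the filesystem, so the printed tree can differ between runs on different machines.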

What to take from the run:

  • The preVisitDirectory hook returned SKIP_SUBTREE the moment it saw .git. The walker never descended into the directory; the config file under it was never visited. That's the right tool for "ignore these conventional directories" — .git, node_modules, target, dist, anything else your project doesn't want walked. The Stream<Path> form can't do this without producing the entries and filtering them out, which still costs the directory read.
  • The order of calls for sub/ was preVisitDirectory(sub), visitFile(b.txt), preVisitDirectory(nested), visitFile(c.txt), postVisitDirectory(nested), postVisitDirectory(sub). The post* hooks fire after all descendants have been processed; that's the depth-first contract, and it's what makes the recursive-delete pattern possible.
  • The "find first" walk returned TERMINATE from visitFile the moment c.txt showed up. Everything after that (the remaining entries in nested/, the rest of sub/, the rest of root/) was never visited. On a small tree the saving is invisible; on a large tree where the match turns up early, it's the difference between visiting every entry and visiting only the ones that precede the match.
  • The recursive delete had two halves. visitFile deleted leaves; postVisitDirectory deleted the (now-empty) directories. The walker's depth-first order guaranteed every child was visited before its parent's postVisitDirectory, so Files.delete(d) always saw an empty directory. Trying to delete the directory in preVisitDirectory would fail because the children are still there; trying to delete it with Files.delete(root) at the end would fail for the same reason. The post* hook is the whole point of the visitor API.
  • Throughout, SimpleFileVisitor was the base class and we overrode only the methods we needed. visitFileFailed was left at its default (throw), which for these temp-file demos is fine. For a scanner over a real filesystem you don't fully control — say, a virus scanner walking /, where files might be deleted out from under you — override visitFileFailed to log and CONTINUE.

What's next

Part 13 ends here. Files have been written, read, opened, copied, moved, deleted, walked, serialized. Streams have been buffered, decorated, formatted, mapped, channeled. The next part, Date and Time, moves to a completely different problem: representing instants, durations, calendar dates, time zones, and the formatting and parsing thereof — java.time, the modern API that replaced java.util.Date and Calendar.

Practice

You need to delete a directory tree containing 50 files across 10 nested subdirectories. Which `FileVisitor` hook implementation removes each directory only after its children are gone?