Java Scanner Class
Parse primitives and strings from text input in Java with the Scanner class — nextInt, nextLine, useDelimiter.
BufferedReader.readLine() from the buffered-streams chapter is the right tool when the input is line-oriented and you want each line as a String. Scanner is the right tool when the input is a stream of tokens — integers, doubles, words separated by whitespace, or fields separated by a regex you choose. It's the parser that the JDK ships built-in.
Scanner is also the class most introductory Java tutorials use for reading from the keyboard. new Scanner(System.in) and you have a working interactive program in two lines. That convenience comes with one well-known pitfall — the nextInt/nextLine trap — which this chapter is mostly about.
What Scanner parses
The token-reading methods, paired with their hasNext predicates:
boolean hasNext(); String next(); // a whitespace-delimited token
boolean hasNextInt(); int nextInt(); // a token parsed as int
boolean hasNextLong(); long nextLong();
boolean hasNextDouble(); double nextDouble();
boolean hasNextBoolean(); boolean nextBoolean();
boolean hasNextLine(); String nextLine(); // the rest of the current line
The contract is identical across the typed methods: hasNextX() checks whether the next token can be parsed as X without consuming it; nextX() consumes it. A mismatch (nextInt() when the token is "hello") throws InputMismatchException and leaves the offending token unconsumed. Calling nextX() at end-of-stream throws NoSuchElementException.
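A minimal sketch of that contract, using a String source so it runs without any input. The class and method names here (ScannerContractDemo, tryNextInt) are made up for illustration.

```java
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

class ScannerContractDemo {

    /** Describes what happens when nextInt() is called on the given input. */
    static String tryNextInt(String input) {
        Scanner s = new Scanner(input); // String source: nothing to close
        try {
            return "int " + s.nextInt();
        } catch (InputMismatchException e) {
            // Must be caught first: it is a subclass of NoSuchElementException.
            // The mismatched token was not consumed, so next() can still read it.
            return "mismatch, token still available: " + s.next();
        } catch (NoSuchElementException e) {
            return "no more tokens";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNextInt("42"));    // int 42
        System.out.println(tryNextInt("hello")); // mismatch, token still available: hello
        System.out.println(tryNextInt(""));      // no more tokens
    }
}
```

Because a mismatch leaves the token in place, a retry loop has to skip it with next() before asking again; otherwise it would fail on the same token forever.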
A token is, by default, a maximal run of non-whitespace characters. The default delimiter pattern matches whitespace as defined by Character.isWhitespace(): spaces, tabs, newlines, and friends. You can change it with useDelimiter(...).
Constructors
new Scanner(InputStream source); // typical: System.in
new Scanner(InputStream source, Charset charset); // explicit charset (preferred for files)
new Scanner(Path source, Charset charset); // open a file by path
new Scanner(String source); // parse a literal String — great for tests
new Scanner(Readable source); // wrap any Readable (Reader, CharBuffer, ...)
Same rule as the rest of java.io/java.nio: always pass an explicit charset when reading bytes. The no-charset constructors use the JVM's default charset, which is UTF-8 from JDK 18 onward and platform-dependent on earlier releases.
try (Scanner s = new Scanner(path, StandardCharsets.UTF_8)) {
    while (s.hasNextInt()) {
        process(s.nextInt());
    }
}
Closing the Scanner closes the underlying stream. Don't close a Scanner wrapping System.in: closing it closes System.in, and any further reads in the same JVM will fail.
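For a version of the snippet above that runs on its own, here is a sketch that writes a temporary file and reads it back with an explicit charset. It assumes Java 11 or later (for Files.writeString and the Scanner(Path, Charset) constructor); the class and method names are illustrative, and process(...) is replaced by a running sum.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class ScannerFileDemo {

    /** Sums integer tokens from a UTF-8 file, stopping at the first non-integer token. */
    static long sumLeadingInts(Path path) {
        long sum = 0;
        // try-with-resources closes the Scanner and the file it opened.
        try (Scanner s = new Scanner(path, StandardCharsets.UTF_8)) {
            while (s.hasNextInt()) {
                sum += s.nextInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    /** Writes the text to a temporary file, sums it, and deletes the file. */
    static long sumOfText(String text) {
        try {
            Path tmp = Files.createTempFile("scanner-demo", ".txt");
            try {
                Files.writeString(tmp, text, StandardCharsets.UTF_8);
                return sumLeadingInts(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfText("10 20\n30\n")); // prints 60
    }
}
```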
The nextInt / nextLine trap
One of the most frequently asked Java questions on Stack Overflow.
Scanner s = new Scanner(System.in);
System.out.print("age: "); int age = s.nextInt();
System.out.print("name: "); String name = s.nextLine();
Type 30, hit Enter, then Alice, hit Enter. Expected: age=30, name=Alice. Actual: age=30, name="".
The reason: nextInt() reads the digits 30 and stops. It leaves the trailing \n in the input buffer. The next nextLine() reads everything up to the next newline — which is right there, immediately — and returns the empty string before the user has a chance to type anything.
The fix is one of:
int age = s.nextInt(); s.nextLine(); // explicit "skip to end of line"
String name = s.nextLine();
or, more robustly, parse the whole line yourself:
int age = Integer.parseInt(s.nextLine().trim()); // always reads the full line
String name = s.nextLine();
The second pattern is the one I reach for in real code. Mixing token-reading methods (nextInt, nextDouble, next) with line-reading (nextLine) is a recipe for off-by-one bugs; pick one and stick with it. Either parse line-by-line with nextLine, or parse token-by-token with next* and call nextLine only for the explicit "skip the rest of this line" purpose.
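The trap can be reproduced without a keyboard by feeding Scanner the same keystrokes as a String. The sketch below does that; the class and method names are illustrative, and "30\nAlice\n" stands in for what the user would type.

```java
import java.util.Scanner;

class NextIntNextLineTrap {

    /** Mixes nextInt() with nextLine(): the name comes back empty. */
    static String buggy(String input) {
        Scanner s = new Scanner(input);
        int age = s.nextInt();      // consumes "30", leaves the line break
        String name = s.nextLine(); // reads the empty rest of the "30" line
        return "age=" + age + ", name='" + name + "'";
    }

    /** Fix 1: explicitly skip the rest of the line after nextInt(). */
    static String fixedWithSkip(String input) {
        Scanner s = new Scanner(input);
        int age = s.nextInt();
        s.nextLine();               // discard the leftover line break
        String name = s.nextLine();
        return "age=" + age + ", name='" + name + "'";
    }

    /** Fix 2: read whole lines only and parse the number yourself. */
    static String fixedLineOriented(String input) {
        Scanner s = new Scanner(input);
        int age = Integer.parseInt(s.nextLine().trim());
        String name = s.nextLine();
        return "age=" + age + ", name='" + name + "'";
    }

    public static void main(String[] args) {
        String typed = "30\nAlice\n";
        System.out.println(buggy(typed));             // age=30, name=''
        System.out.println(fixedWithSkip(typed));     // age=30, name='Alice'
        System.out.println(fixedLineOriented(typed)); // age=30, name='Alice'
    }
}
```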
hasNext is the loop condition
The shape of every Scanner loop:
while (s.hasNextInt()) { // predicate, no exception
    int n = s.nextInt(); // consume
    process(n);
}
hasNextInt() returns false at end-of-stream and when the next token isn't an integer, so the loop ends cleanly on EOF and on a non-numeric token (which is often the right thing, e.g. when the trailing footer is non-numeric). If you want to fail loudly instead, use hasNext() and let nextInt() throw InputMismatchException on mismatch:
while (s.hasNext()) {
    int n = s.nextInt(); // throws if the token isn't an int
    process(n);
}
Same end-of-stream check, different behaviour on bad tokens.
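To see the two loop shapes side by side, here is a small sketch that collects the integers into a list instead of calling process(...). The names readLenient and readStrict are made up for this example.

```java
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;

class ScannerLoopShapes {

    /** Stops quietly at the first non-integer token or at end of input. */
    static List<Integer> readLenient(String input) {
        List<Integer> out = new ArrayList<>();
        Scanner s = new Scanner(input);
        while (s.hasNextInt()) {
            out.add(s.nextInt());
        }
        return out;
    }

    /** Stops at end of input; a non-integer token raises InputMismatchException. */
    static List<Integer> readStrict(String input) {
        List<Integer> out = new ArrayList<>();
        Scanner s = new Scanner(input);
        while (s.hasNext()) {
            out.add(s.nextInt());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1 2 3 -- end --";
        System.out.println(readLenient(input)); // [1, 2, 3]
        try {
            readStrict(input);
        } catch (InputMismatchException e) {
            System.out.println("strict loop rejected the footer");
        }
    }
}
```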
Custom delimiters
The default delimiter is whitespace. For CSV-ish input you can change it:
s.useDelimiter(",|\\R"); // comma or any line break
\R (written \\R in a Java string literal) is the regex for "any line-break sequence" (\n, \r\n, \r, plus the Unicode line separators). The combined pattern splits on commas and line breaks, so 1,2,3\n4,5,6 yields six tokens.
That said: for real CSV, use a CSV library. Scanner doesn't handle quoted fields, escaped commas, or embedded newlines. For the simple cases — a list of numbers, a space-delimited config — it's perfect.
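A short sketch of both halves of that advice: the same delimiter splits clean input correctly and mangles a quoted field. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDelimiterDemo {

    /** Splits the input on commas and line breaks and returns the raw tokens. */
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Scanner s = new Scanner(input).useDelimiter(",|\\R");
        while (s.hasNext()) {
            out.add(s.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Clean input: six tokens, as described above.
        System.out.println(tokens("1,2,3\n4,5,6")); // [1, 2, 3, 4, 5, 6]

        // A quoted field containing a comma is cut in two: three tokens, not two.
        System.out.println(tokens("\"a,b\",c"));    // ["a, b", c]
    }
}
```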
The locale gotcha
nextDouble() parses with the default locale's decimal separator. On a German JVM, 3.14 fails (3,14 is the German form). On a US JVM, 3,14 fails.
For machine-readable input, force the parser locale:
s.useLocale(Locale.ROOT); // dot as decimal separator
double x = s.nextDouble(); // now parses "3.14"
Locale.ROOT is the "neutral" locale, the convention for parsing data files that aren't meant for humans. Forgetting this is a common reason a CSV reader works in development and fails in CI: the dev box and the CI box have different default locales.
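The effect is easy to check without changing the JVM default, because useLocale can be set per Scanner. The sketch below uses hasNextDouble() to ask whether a token would parse under a given locale; the class and helper names are illustrative, and the results assume the JDK's standard locale data.

```java
import java.util.Locale;
import java.util.Scanner;

class ScannerLocaleDemo {

    /** True if the token parses as a double under the given locale. */
    static boolean canParseDouble(String token, Locale locale) {
        Scanner s = new Scanner(token).useLocale(locale);
        return s.hasNextDouble();
    }

    public static void main(String[] args) {
        System.out.println(canParseDouble("3.14", Locale.ROOT));    // true
        System.out.println(canParseDouble("3.14", Locale.GERMANY)); // false
        System.out.println(canParseDouble("3,14", Locale.GERMANY)); // true
        System.out.println(canParseDouble("3,14", Locale.US));      // false
    }
}
```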
Scanner vs BufferedReader
| | Scanner | BufferedReader |
|---|---|---|
| Reads | tokens (typed) | lines (String) |
| Speed | slow (regex on every token) | fast |
| Convenience | high (nextInt etc.) | low (you parse yourself) |
| Right for | small inputs, interactive prompts, tests | large files, log processing, hot loops |
Rule of thumb: if the input is from a human and you want types, use Scanner. If the input is a file and you want lines, use BufferedReader. For competitive-programming-sized inputs (millions of tokens), BufferedReader + StringTokenizer is an order of magnitude faster than Scanner.
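For reference, the BufferedReader + StringTokenizer combination looks roughly like this. It is a sketch that assumes whitespace-separated integers and does no error handling beyond what Integer.parseInt provides; the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class FastTokenSum {

    /** Sums whitespace-separated integers, reading line by line. */
    static long sumInts(Reader source) throws IOException {
        long sum = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // StringTokenizer splits on whitespace by default, without regex.
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    sum += Integer.parseInt(st.nextToken());
                }
            }
        }
        return sum;
    }

    /** Convenience overload for in-memory text. */
    static long sumInts(String text) {
        try {
            return sumInts(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInts("1 2 3\n4 5\n")); // prints 15
    }
}
```

For keyboard or piped input, the Reader would be an InputStreamReader over System.in with an explicit charset.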
A worked example: parsing a small text format
The program below parses a small space-delimited text file with three fields per line (id name score) using Scanner. It demonstrates the hasNextInt() loop, the locale fix for nextDouble(), the nextInt/nextLine trap and its resolution, and finally useDelimiter for a CSV-like alternative.
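The original listing is not reproduced in this text. What follows is a sketch reconstructed from the description and the notes on the run, not the author's program: the class name, method names, and sample rows (other than Alice, 97.5, and the -- end -- footer) are assumptions. The input is supplied as Strings rather than a file so the example runs on its own.

```java
import java.util.Locale;
import java.util.Scanner;

class ScannerWorkedExample {

    /** Reads "id name score" records until the next token is not an int. */
    static String readRecords(Scanner s) {
        s.useLocale(Locale.ROOT);  // pin the locale so "97.5" parses everywhere
        StringBuilder out = new StringBuilder();
        while (s.hasNextInt()) {   // false at a non-integer token and at end of input
            int id = s.nextInt();
            String name = s.next();
            double score = s.nextDouble();
            out.append(id).append(' ').append(name).append(' ').append(score).append('\n');
        }
        return out.toString();
    }

    /** The trap: nextInt() leaves the line break, so nextLine() returns "". */
    static String buggyName(String input) {
        Scanner s = new Scanner(input);
        s.nextInt();
        return s.nextLine();
    }

    /** The line-oriented fix: read whole lines and parse the number yourself. */
    static String fixedName(String input) {
        Scanner s = new Scanner(input);
        Integer.parseInt(s.nextLine().trim());
        return s.nextLine();
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the footer line is deliberately non-numeric.
        String spaced = "1 Alice 97.5\n2 Bob 88.0\n3 Carol 91.25\n-- end --\n";
        System.out.print(readRecords(new Scanner(spaced)));

        String typed = "30\nAlice\n";
        System.out.println("name='" + buggyName(typed) + "'"); // name=''
        System.out.println("name='" + fixedName(typed) + "'"); // name='Alice'

        // Same token-reading code, different delimiter.
        String csv = "1,Alice,97.5\n2,Bob,88.0\n";
        Scanner c = new Scanner(csv).useDelimiter("," + "|" + "\\R");
        System.out.print(readRecords(c));
    }
}
```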
What to take from the run:
- The first read parsed each record's three fields, of three different types, in three lines of code. The token-based API is genuinely convenient when the input is shaped like tokens: no regex, no String.split, no manual Integer.parseInt. That's the case for Scanner.
- useLocale(Locale.ROOT) was the line that made 97.5 parseable. Without it, the parser uses the JVM default locale; on a machine where that's German, 97.5 would throw InputMismatchException. For machine-readable input, always pin the locale.
- The buggy/fixed split for the trap printed name='' then name='Alice'. The bug was real (nextInt() left the \n in the buffer), and the line-oriented fix (Integer.parseInt(s.nextLine().trim())) was the cleanest way to avoid mixing the two read styles. Pick a style and stick with it.
- The useDelimiter("," + "|" + "\\R") block parsed comma-separated rows with the same token-reading code, just with a different delimiter. The same caveat applies as in the prose: this works for clean CSV and breaks on real-world CSV with quoted fields. Use a real CSV library for anything that came out of Excel.
- The mixed-input footer (-- end --) showed why hasNextInt() is the right loop condition: it returned false at the first non-integer token and the loop exited cleanly. Switching to hasNext() would have let the loop continue until nextInt() threw; both shapes are useful, depending on whether a non-integer token is "we're done" or "the input is broken."
What's next
PrintWriter (the previous chapter) and Scanner are the character-oriented input/output classes most introductory Java code uses. The next chapter, Java PrintStream, covers the byte-oriented sibling of PrintWriter — and explains why System.out and System.err are PrintStreams instead of PrintWriters.
Practice
On a JVM with German as the default locale, you call `scanner.nextDouble()` to parse '3.14' from a config file. What happens, and what's the fix?