Java Scanner Class for Input Parsing

BufferedReader.readLine() from the buffered streams chapter is the right tool when the input is line-oriented and you want each line as a String. Scanner is the right tool when the input is a stream of tokens — integers, doubles, words separated by whitespace, or fields separated by a regex you choose. It's the parser that the JDK ships built-in.

Scanner is also the class most introductory Java tutorials use for reading from the keyboard: write new Scanner(System.in) and you have a working interactive program in two lines. That convenience comes with one well-known pitfall, the nextInt/nextLine trap, which takes up much of this chapter.

What `Scanner` parses

The token-reading methods, paired with their hasNext predicates:

boolean hasNext();        String  next();           // a whitespace-delimited token
boolean hasNextInt();     int     nextInt();        // a token parsed as int
boolean hasNextLong();    long    nextLong();
boolean hasNextDouble();  double  nextDouble();
boolean hasNextBoolean(); boolean nextBoolean();
boolean hasNextLine();    String  nextLine();       // the rest of the current line

The contract is identical across the typed methods: hasNextX() checks whether the next token can be parsed as X without consuming it; nextX() consumes it. Mismatch (nextInt() when the token is "hello") throws InputMismatchException. End-of-stream throws NoSuchElementException.
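A minimal sketch of that contract, using a String-backed Scanner so it runs without any input. The class name ScannerContractDemo and the helper tryNextInt are made up for illustration; only the Scanner calls and exception types come from the JDK.

```java
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

public class ScannerContractDemo {
  /** Hypothetical helper: reports what nextInt() does on the given input. */
  static String tryNextInt(String input) {
    Scanner s = new Scanner(input);            // String source, nothing to close
    try {
      return "value " + s.nextInt();
    } catch (InputMismatchException e) {
      // InputMismatchException is a subclass of NoSuchElementException,
      // so it must be caught first. The bad token is not consumed:
      // next() can still retrieve it.
      return "mismatch on '" + s.next() + "'";
    } catch (NoSuchElementException e) {
      return "no more tokens";
    }
  }

  public static void main(String[] args) {
    System.out.println(tryNextInt("42"));      // value 42
    System.out.println(tryNextInt("hello"));   // mismatch on 'hello'
    System.out.println(tryNextInt("   "));     // no more tokens
  }
}
```

The detail worth noticing is the middle case: after a mismatch the offending token stays in the input, so a retry loop has to skip it with next() or it will hit the same token again.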

A token is, by default, a maximal run of non-whitespace characters. The default delimiter pattern is \p{javaWhitespace}+, which matches whatever Character.isWhitespace considers whitespace: spaces, tabs, newlines, and friends. You can change it with useDelimiter(...).

Constructors

new Scanner(InputStream source);                     // typical: System.in
new Scanner(InputStream source, Charset charset);    // explicit charset (preferred for files)
new Scanner(Path source, Charset charset);           // open a file by path
new Scanner(String source);                          // parse a literal String — great for tests
new Scanner(Readable source);                        // wrap any Readable (Reader, CharBuffer, ...)

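Same rule as the rest of java.io/java.nio: always pass an explicit charset when reading bytes. The no-charset constructors fall back to the JVM's default charset, which is the platform encoding before Java 18 and UTF-8 from Java 18 onward (unless overridden with file.encoding). The Charset-typed overloads shown here were added in Java 10; older code passes the charset name as a String instead.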

try (Scanner s = new Scanner(path, StandardCharsets.UTF_8)) {
  while (s.hasNextInt()) {
    process(s.nextInt());
  }
}

Closing the Scanner closes the underlying stream. Don't close a Scanner wrapping System.in — closing it closes System.in, and any further reads in the same JVM will fail.
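To see the close propagation without sacrificing System.in, here is a small sketch that wraps an in-memory stream. TrackingStream and closesSource are hypothetical names invented for this example; the stream simply records whether close() was called on it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ScannerCloseDemo {
  /** Hypothetical helper: an in-memory stream that remembers whether close() was called. */
  static class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream(String text) {
      super(text.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Returns true if closing the Scanner also closed the stream it wraps. */
  static boolean closesSource() {
    TrackingStream in = new TrackingStream("1 2 3");
    try (Scanner s = new Scanner(in, StandardCharsets.UTF_8)) {
      s.nextInt();                             // read one token, then leave the block
    }
    return in.closed;
  }

  public static void main(String[] args) {
    System.out.println("source closed by Scanner: " + closesSource());
  }
}
```

For System.in the usual approach is therefore to create one Scanner, share it for the lifetime of the program, and never put it in a try-with-resources block.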

The `nextInt` / `nextLine` trap

One of the most frequently asked Java questions on Stack Overflow.

Scanner s = new Scanner(System.in);
System.out.print("age: ");  int age = s.nextInt();
System.out.print("name: "); String name = s.nextLine();

Type 30, hit Enter, then Alice, hit Enter. Expected: age=30, name=Alice. Actual: age=30, name="".

The reason: nextInt() reads the digits 30 and stops. It leaves the trailing \n in the input buffer. The next nextLine() reads everything up to the next newline — which is right there, immediately — and returns the empty string before the user has a chance to type anything.

The fix is one of:

int age = s.nextInt(); s.nextLine();                 // explicit "skip to end of line"
String name = s.nextLine();

or, more robustly, parse the whole line yourself:

int age = Integer.parseInt(s.nextLine().trim());     // always reads the full line
String name = s.nextLine();

The second pattern is the one I reach for in real code. Mixing token-reading methods (nextInt, nextDouble, next) with line-reading (nextLine) is a recipe for off-by-one bugs; pick one and stick with it. Either parse line-by-line with nextLine, or parse token-by-token with next* and call nextLine only for the explicit "skip the rest of this line" purpose.
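One way the line-oriented style might look once bad input has to be handled is sketched below. LineInput and readInt are hypothetical names, and the retry-on-bad-line policy is an assumption of this example rather than something Scanner provides. The input is simulated with a String so the sketch runs unattended.

```java
import java.util.Scanner;

public class LineInput {
  /**
   * Hypothetical helper: prompts and reads whole lines until one parses as an int.
   * Throws NoSuchElementException (from nextLine) if the input ends first.
   */
  static int readInt(Scanner s, String prompt) {
    while (true) {
      System.out.print(prompt);
      String line = s.nextLine().trim();       // always consumes the full line
      try {
        return Integer.parseInt(line);
      } catch (NumberFormatException e) {
        System.out.println("not a whole number: '" + line + "'");
      }
    }
  }

  public static void main(String[] args) {
    // Simulated keyboard input: one bad line, then the age, then the name.
    Scanner s = new Scanner("thirty\n30\nAlice\n");
    int age = readInt(s, "age: ");
    System.out.print("name: ");
    String name = s.nextLine();
    System.out.println();
    System.out.println("age=" + age + " name=" + name);
  }
}
```

Because every read goes through nextLine, there is never a leftover newline to trip over, and a bad line is discarded as a whole instead of staying in the input.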

`hasNext` is the loop condition

The shape of every Scanner loop:

while (s.hasNextInt()) {                             // predicate, no exception
  int n = s.nextInt();                               // consume
  process(n);
}

hasNextInt() returns false at end-of-stream and when the next token isn't an integer — so the loop ends cleanly on EOF and on a non-numeric token (which is often the right thing, e.g. when the trailing footer is non-numeric). If you want to fail loudly instead, use hasNext() and let nextInt() throw InputMismatchException on mismatch:

while (s.hasNext()) {
  int n = s.nextInt();                               // throws if the token isn't an int
  process(n);
}

Same end-of-stream check, different behaviour on bad tokens.
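A third shape, not covered by the two loops above, is to skip bad tokens instead of stopping or throwing. The sketch below combines both predicates; SkipBadTokens and intsOnly are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SkipBadTokens {
  /** Hypothetical helper: collects every int token and silently drops the rest. */
  static List<Integer> intsOnly(String input) {
    List<Integer> out = new ArrayList<>();
    try (Scanner s = new Scanner(input)) {
      while (s.hasNext()) {                    // any token left?
        if (s.hasNextInt()) {
          out.add(s.nextInt());                // it is an int: consume and keep it
        } else {
          s.next();                            // not an int: consume and discard it
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(intsOnly("10 x 20 y 30"));   // [10, 20, 30]
  }
}
```

The else branch matters: hasNextInt() never consumes anything, so without the s.next() call the loop would spin forever on the first non-integer token.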

Custom delimiters

The default delimiter is whitespace. For CSV-ish input you can change it:

s.useDelimiter(",|\\R");                             // comma or any line break

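\\R is how the regex \R is written in a Java string literal; it matches any line break sequence (\n, \r\n, \r, plus the Unicode line separators). The combined pattern splits on commas and line breaks, so 1,2,3\n4,5,6 yields six tokens. Because the pattern matches one delimiter at a time, two delimiters in a row (an empty field or a blank line) produce an empty token.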

That said: for real CSV, use a CSV library. Scanner doesn't handle quoted fields, escaped commas, or embedded newlines. For the simple cases — a list of numbers, a space-delimited config — it's perfect.
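Both halves of that claim can be checked with a short sketch: the clean input splits into six tokens, and a quoted field containing a comma is torn apart. DelimiterDemo and tokens are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
  /** Hypothetical helper: splits the input on commas and line breaks. */
  static List<String> tokens(String input) {
    List<String> out = new ArrayList<>();
    try (Scanner s = new Scanner(input)) {
      s.useDelimiter(",|\\R");                 // comma or any line break
      while (s.hasNext()) {
        out.add(s.next());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Clean input: six tokens, as expected.
    System.out.println(tokens("1,2,3\n4,5,6"));

    // A quoted field with an embedded comma is split in two,
    // because Scanner knows nothing about CSV quoting.
    System.out.println(tokens("1,\"Smith, Jane\",97.5"));
  }
}
```

The second call returns four tokens instead of three, with the quote characters still attached, which is exactly the failure a CSV library exists to prevent.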

The locale gotcha

nextDouble() parses with the default locale's decimal separator. On a German JVM, 3.14 fails (3,14 is the German form). On a US JVM, 3,14 fails.

For machine-readable input, force the parser locale:

s.useLocale(Locale.ROOT);                            // dot as decimal separator; comma still groups thousands
double x = s.nextDouble();                           // now parses "3.14"

Locale.ROOT is the "neutral" locale, the convention for parsing data files that aren't meant for humans. Forgetting this is a common reason a parser works in development and fails in CI: the dev box and the CI box have different default locales.
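The effect can be reproduced on any machine by setting the Scanner's locale explicitly instead of relying on the JVM default. In this sketch LocaleDemo and parsesAsDouble are hypothetical names, and Locale.GERMANY stands in for "a German JVM".

```java
import java.util.Locale;
import java.util.Scanner;

public class LocaleDemo {
  /** Hypothetical helper: would nextDouble() accept this token under the given locale? */
  static boolean parsesAsDouble(String token, Locale locale) {
    try (Scanner s = new Scanner(token)) {
      s.useLocale(locale);
      return s.hasNextDouble();
    }
  }

  public static void main(String[] args) {
    System.out.println("3.14 / GERMANY: " + parsesAsDouble("3.14", Locale.GERMANY)); // false
    System.out.println("3,14 / GERMANY: " + parsesAsDouble("3,14", Locale.GERMANY)); // true
    System.out.println("3.14 / ROOT:    " + parsesAsDouble("3.14", Locale.ROOT));    // true
    System.out.println("3,14 / ROOT:    " + parsesAsDouble("3,14", Locale.ROOT));    // false
  }
}
```

Using hasNextDouble() as the probe keeps the demo exception-free; calling nextDouble() on the rejected tokens would throw InputMismatchException.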

`Scanner` vs `BufferedReader`

	`Scanner`	`BufferedReader`
Reads	tokens (typed)	lines (`String`)
Speed	slow (regex on every token)	fast
Convenience	high (`nextInt` etc.)	low (you parse yourself)
Right for	small inputs, interactive prompts, tests	large files, log processing, hot loops

Rule of thumb: if the input is from a human and you want types, use Scanner. If the input is a file and you want lines, use BufferedReader. For competitive-programming-sized inputs (millions of tokens), BufferedReader + StringTokenizer is typically several times faster than Scanner, sometimes by an order of magnitude, because it avoids running a regex on every token.
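For reference, the BufferedReader + StringTokenizer combination might look like the sketch below. FastIntSum and sumInts are hypothetical names, and the example assumes the input contains only whitespace-separated integers; anything else makes Integer.parseInt throw NumberFormatException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

public class FastIntSum {
  /** Hypothetical helper: sums whitespace-separated ints, reading line by line. */
  static long sumInts(Reader source) {
    long sum = 0;
    try (BufferedReader in = new BufferedReader(source)) {
      String line;
      while ((line = in.readLine()) != null) {
        // StringTokenizer splits on space, tab, newline, CR and form feed by default.
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
          sum += Integer.parseInt(st.nextToken());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);       // keep the sketch's signature simple
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumInts(new StringReader("1 2 3\n4 5\n\n6\n")));   // 21
  }
}
```

The trade-off is visible in the code: there is no hasNextInt() safety net and no locale handling, which is where most of the speed comes from.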

A worked example: parsing a small text format

The program below parses a small space-delimited text file with one record per line and three fields per record (id, name, score) using Scanner. It demonstrates the hasNextInt() loop, the locale fix for nextDouble(), the nextInt/nextLine trap and its resolution, and finally useDelimiter for a CSV-like alternative.


import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.Locale;
import java.util.Scanner;

public class ScannerDemo {
  record Score(int id, String name, double value) {}

  public static void main(String[] args) throws IOException {
    // --- 1. Write a tiny text file: id name score, one record per line ---
    Path file = Files.createTempFile("scanner-", ".txt");
    file.toFile().deleteOnExit();
    Files.writeString(file,
        "1 alice 97.5\n" +
        "2 bob   42.0\n" +
        "3 caro  88.25\n",
        StandardCharsets.UTF_8);

    // --- 2. Read it back, token by token ---
    System.out.println("reading tokens (Locale.ROOT for the dot decimal):");
    try (Scanner s = new Scanner(file, StandardCharsets.UTF_8)) {
      s.useLocale(Locale.ROOT);                            // dot decimal separator regardless of host locale
      while (s.hasNextInt()) {
        Score score = new Score(s.nextInt(), s.next(), s.nextDouble());
        System.out.println("  " + score);
      }
    }

    // --- 3. The nextInt / nextLine trap and the fix ---
    System.out.println("\nthe nextInt/nextLine trap:");
    String synthetic = "30\nAlice\n";
    try (Scanner s = new Scanner(synthetic)) {
      int age = s.nextInt();
      String nameBuggy = s.nextLine();                     // returns "": the rest of the "30" line is empty
      System.out.println("  buggy : age=" + age + " name='" + nameBuggy + "'");
    }
    try (Scanner s = new Scanner(synthetic)) {
      int age = Integer.parseInt(s.nextLine().trim());     // line-oriented all the way
      String nameFixed = s.nextLine();
      System.out.println("  fixed : age=" + age + " name='" + nameFixed + "'");
    }

    // --- 4. A custom delimiter for comma/newline-separated input ---
    System.out.println("\nCSV-ish input with useDelimiter:");
    String csv = "1,alice,97.5\n2,bob,42.0\n3,caro,88.25\n";
    try (Scanner s = new Scanner(csv)) {
      s.useDelimiter(",|\\R");                            // comma OR any newline
      s.useLocale(Locale.ROOT);
      while (s.hasNextInt()) {
        Score score = new Score(s.nextInt(), s.next(), s.nextDouble());
        System.out.println("  " + score);
      }
    }

    // --- 5. Mismatch demo: hasNextInt() guards a row of mixed-type tokens ---
    System.out.println("\nhasNextInt() guards the loop and exits cleanly on the footer:");
    String mixed = "10 20 30 -- end --";
    try (Scanner s = new Scanner(mixed)) {
      int sum = 0;
      while (s.hasNextInt()) sum += s.nextInt();           // stops at '--'
      System.out.println("  numeric sum = " + sum + "  (footer ignored)");
    }
  }
}

What to take from the run:

The first read parsed three records, each with three fields of different types, in three lines of code. The token-based API is genuinely convenient when the input is shaped like tokens: no regex, no String.split, no manual Integer.parseInt. That's the case for Scanner.
useLocale(Locale.ROOT) was the line that made 97.5 parseable on every host. Without it, the parser uses the JVM default locale; on a machine where that's German, nextDouble() would throw InputMismatchException on 97.5. For machine-readable input, always pin the locale.
The buggy/fixed split for the trap printed name='' then name='Alice'. The bug was real — nextInt() left the \n in the buffer — and the line-oriented fix (Integer.parseInt(s.nextLine().trim())) was the cleanest way to avoid mixing the two read styles. Pick a style and stick with it.
The useDelimiter(",|\\R") block parsed comma-separated rows with the same token-reading code, just with a different delimiter. The same caveat applies as in the prose: this works for clean CSV and breaks on real-world CSV with quoted fields. Use a real CSV library for anything that came out of Excel.
The mixed-input footer (-- end --) showed why hasNextInt() is the right loop condition: it returned false at the first non-integer token and the loop exited cleanly. Switching to hasNext() would have let the loop continue until nextInt() threw — both shapes are useful, depending on whether a non-integer token is "we're done" or "the input is broken."

What's next

PrintWriter (the previous chapter) and Scanner are the character-oriented input/output classes most introductory Java code uses. The next chapter, Java PrintStream, covers the byte-oriented sibling of PrintWriter — and explains why System.out and System.err are PrintStreams instead of PrintWriters.

Practice

On a JVM with German as the default locale, you call `scanner.nextDouble()` to parse '3.14' from a config file. What happens, and what's the fix?

A. `InputMismatchException` because German uses comma as decimal separator. Fix: call `scanner.useLocale(Locale.ROOT)` before the parse so the dot is accepted
B. It returns `3.14`: `Scanner` always uses the dot decimal separator regardless of the JVM locale
C. It returns `314.0`: the dot is treated as a grouping separator
D. It returns `3.0`: the `.14` part is skipped silently because it can't be parsed
