W3docs

Java Characters (char)

Represent single characters in Java with the char primitive and use the Character wrapper class for utility methods.

The char primitive holds a single 16-bit unsigned value: one UTF-16 code unit. For Latin, Cyrillic, Greek, Arabic, most CJK, and many other scripts, one char holds a whole character. For emoji and some less common scripts, one user-perceived character takes two chars (a surrogate pair).

char literals

A character literal is one character enclosed in single quotes:

char a = 'A';
char z = 'z';
char digit = '7';
char punct = '!';

You can also use escape sequences and Unicode escapes:

char newline = '\n';
char tab = '\t';
char quote = '\'';
char back = '\\';
char copy = '\u00A9';   // ©
char pi = '\u03C0';     // π
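
As a small sketch of how escapes map to numeric values, the class below prints the UTF-16 code unit behind a few literals. The class name CharLiteralSketch and the helper codeOf are made up for this illustration.

```java
public class CharLiteralSketch {
    // Hypothetical helper: widens a char to int to expose its UTF-16 code unit.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char newline = '\n';
        char copy = '\u00A9'; // copyright sign
        char pi = '\u03C0';   // Greek small letter pi

        System.out.println(codeOf(newline)); // 10
        System.out.println(codeOf(copy));    // 169 (hex A9)
        System.out.println(codeOf(pi));      // 960 (hex 3C0)
    }
}
```

A Unicode escape and the literal character compile to the same value, so the escape form is mainly useful when the source file's encoding cannot be trusted.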

char is a 16-bit integer

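A char value is, internally, an unsigned 16-bit integer (0 to 65,535). You can do arithmetic on it and convert to and from int. Arithmetic promotes a char to int, so assigning the result back to a char needs an explicit cast: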

char c = 'A';
int code = c;             // 65 — implicit widening to int
char next = (char) (c + 1); // 'B'
char digit5 = (char) ('0' + 5); // '5'
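
The same arithmetic can be wrapped in small helpers. This is a minimal sketch that assumes ASCII input only; CharArithmeticSketch, digitValue, and shiftUpper are illustrative names, not part of any library.

```java
public class CharArithmeticSketch {
    // Converts an ASCII digit character to its int value, e.g. '7' becomes 7.
    static int digitValue(char c) {
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("not an ASCII digit: " + c);
        }
        return c - '0'; // char operands are promoted to int before subtraction
    }

    // Shifts an uppercase ASCII letter by n positions, wrapping around the alphabet.
    // Any other character is returned unchanged.
    static char shiftUpper(char c, int n) {
        if (c < 'A' || c > 'Z') {
            return c;
        }
        int offset = Math.floorMod(c - 'A' + n, 26); // floorMod keeps the result in 0..25
        return (char) ('A' + offset);                // cast back from int to char
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7'));     // 7
        System.out.println(shiftUpper('A', 1));  // B
        System.out.println(shiftUpper('Z', 1));  // A
        System.out.println(shiftUpper('A', -1)); // Z
    }
}
```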

The classic trick for testing character ranges:

boolean isUpper = c >= 'A' && c <= 'Z';
boolean isDigit = c >= '0' && c <= '9';

The Character wrapper

Character is the wrapper class. It has dozens of static utility methods for classifying and converting characters — more reliable than the c >= '0' style above because they're Unicode-aware:

Character.isLetter('A');         // true
Character.isDigit('7');          // true
Character.isLetterOrDigit('é');  // true
Character.isWhitespace(' ');     // true
Character.isUpperCase('A');      // true
Character.isLowerCase('a');      // true
Character.toUpperCase('a');      // 'A'
Character.toLowerCase('A');      // 'a'
Character.getNumericValue('7');  // 7
Character.toString('A');         // "A"

Prefer the Character methods over manual range checks when you might encounter non-ASCII text.
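
To see where the two approaches diverge, the sketch below runs a manual range check and Character.isDigit on an ASCII digit and on the Arabic-Indic digit three (U+0663). DigitCheckSketch and isAsciiDigit are hypothetical names chosen for this example.

```java
public class DigitCheckSketch {
    // Manual range check: only recognises the ASCII digits 0 through 9.
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        char ascii = '7';
        char arabicIndic = '\u0663'; // ARABIC-INDIC DIGIT THREE

        System.out.println(isAsciiDigit(ascii));                    // true
        System.out.println(Character.isDigit(ascii));               // true

        System.out.println(isAsciiDigit(arabicIndic));              // false
        System.out.println(Character.isDigit(arabicIndic));         // true
        System.out.println(Character.getNumericValue(arabicIndic)); // 3
    }
}
```

The range check remains the right tool when only ASCII digits should be accepted, for example when parsing a strict protocol field.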

A char is one UTF-16 code unit, not always one user character

This is the subtle part. Java strings are UTF-16. For characters with code points up to U+FFFF (the Basic Multilingual Plane — most languages, most punctuation), one code point fits in one char. For characters above U+FFFF — most emoji, ancient scripts, and some musical symbols — two chars are needed (a surrogate pair).

String emoji = "🎉";
System.out.println(emoji.length());                            // 2: a surrogate pair
System.out.println(emoji.codePointCount(0, emoji.length()));   // 1: one code point

If your code processes user-supplied text and might encounter emoji or rare scripts, prefer code-point-aware methods (String.codePoints(), Character.toString(int) in Java 11 and later, Character.isLetter(int)) over the char-based ones.
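
A minimal sketch of code-point-aware processing follows. The names CodePointSketch, countCodePoints, and describe are illustrative, and the emoji is written as its surrogate-pair escapes so the example does not depend on the source file's encoding.

```java
import java.util.stream.Collectors;

public class CodePointSketch {
    // Counts code points rather than UTF-16 code units.
    static int countCodePoints(String s) {
        return (int) s.codePoints().count();
    }

    // Formats each code point in the U+XXXX notation, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "a" followed by the party popper emoji U+1F389, written as a surrogate pair.
        String text = "a\uD83C\uDF89";

        System.out.println(text.length());         // 3 (code units)
        System.out.println(countCodePoints(text)); // 2 (code points)
        System.out.println(describe(text));        // U+0061 U+1F389
    }
}
```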

char[] — character arrays

A char[] is sometimes used for performance-sensitive text processing or for passwords (you can zero out the array after use, whereas you can't zero a String):

char[] greeting = {'H', 'e', 'l', 'l', 'o'};
String s = new String(greeting);     // "Hello"

char[] back = s.toCharArray();
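
The zeroing idea can be sketched with Arrays.fill. This is an illustration only: PasswordArraySketch and wipe are hypothetical names, and real credential handling involves more than clearing one array.

```java
import java.util.Arrays;

public class PasswordArraySketch {
    // Overwrites every element with the NUL character so the secret
    // no longer sits in this array.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] greeting = {'H', 'e', 'l', 'l', 'o'};
        String s = new String(greeting); // the String copies the array contents

        wipe(greeting);

        System.out.println((int) greeting[0]); // 0: the array was cleared
        System.out.println(s);                 // Hello: the String copy is unaffected
    }
}
```

Because new String(char[]) copies the data, a secret that has been turned into a String stays in memory even after the array is wiped, which is why the text is kept as a char[] in the first place.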

A demonstration

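
As a stand-in sketch for a runnable demonstration (not the page's original example), the class below classifies each char of a string with the Character methods. CharDemo and classify are hypothetical names, and the char-based loop assumes the input stays within the Basic Multilingual Plane.

```java
public class CharDemo {
    // Returns counts in the order: letters, digits, whitespace, other.
    static int[] classify(String text) {
        int[] counts = new int[4];
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                counts[0]++;
            } else if (Character.isDigit(c)) {
                counts[1]++;
            } else if (Character.isWhitespace(c)) {
                counts[2]++;
            } else {
                counts[3]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = classify("Hello, World 42!");
        System.out.println("letters:    " + counts[0]); // 10
        System.out.println("digits:     " + counts[1]); // 2
        System.out.println("whitespace: " + counts[2]); // 2
        System.out.println("other:      " + counts[3]); // 2
    }
}
```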

What's next

Java Math Class — the static-method library for arithmetic that goes beyond +, -, *, /.

Practice

Which statements about char in Java are correct?