Character classes let a regular expression distinguish between kinds of characters, for example digits versus letters.
Let’s start with a practical case. Imagine you have a phone number like +3(522)865-42-76 and want to turn it into digits only (35228654276). To do that, you need to find and remove everything that is not a digit. Character classes help with exactly that.
A character class is a special notation that matches any symbol from a certain set.
We will start with the “digit” class. It is written as \d and matches any single digit.
In the example below, let’s find the first digit:
let str = "+3(522)865-42-76";
let regexp = /\d/;
console.log(str.match(regexp)); // 3
Without the g flag, the regular expression looks only for the first match, that is, the first digit \d.
Adding the g flag will enable finding all the digits, like this:
let str = "+3(522)865-42-76";
let regexp = /\d/g;
console.log(str.match(regexp)); // array of matches: 3,5,2,2,8,6,5,4,2,7,6
// join them to get the digits-only phone number:
console.log(str.match(regexp).join('')); // 35228654276
That was the character class for digits. There are other character classes as well.
The most used character classes are as follows:
- \d (from “digit”): a digit, that is, a character from 0 to 9.
- \s (from “space”): a whitespace character. This includes spaces, tabs (\t), newlines (\n), and a few rarer characters such as \v, \f, and \r.
- \w (from “word”): a letter of the Latin alphabet, a digit, or an underscore (_). Non-Latin letters don't belong to this class.
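As a quick check of the three classes listed above, the sketch below tests a few sample characters against each one. The helper name `classify` and the sample characters are illustrative and not part of the original text.

```javascript
// Hypothetical helper: reports which of the three classes a single character belongs to.
function classify(ch) {
  return {
    digit: /\d/.test(ch),
    space: /\s/.test(ch),
    word: /\w/.test(ch),
  };
}

console.log(classify("7"));  // { digit: true, space: false, word: true }
console.log(classify("\t")); // { digit: false, space: true, word: false }
console.log(classify("_"));  // { digit: false, space: false, word: true }
console.log(classify("я"));  // { digit: false, space: false, word: false }
```

Note that a digit is also a word character, while the Cyrillic letter matches none of the three classes.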
A regular expression can contain both ordinary symbols and character classes.
For example, CSS\d matches the string CSS followed by a digit:
let str = "It is CSS3?";
let regexp = /CSS\d/;
console.log(str.match(regexp)); // CSS3
Multiple character classes can be used, like this:
console.log("It is HTML5!".match(/\s\w\w\w\w\d/)); // ' HTML5'
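The same idea extends to other fixed-shape patterns. As a minimal sketch, assume a time is written as two digits, a colon, and two more digits. The sample string and variable names are made up for illustration.

```javascript
// Sample text invented for this example.
const text = "Breakfast at 09:00, dinner at 19:30.";

// \d\d:\d\d means: two digits, a literal colon, then two more digits.
// The g flag asks for all matches instead of only the first one.
const timePattern = /\d\d:\d\d/g;

const times = text.match(timePattern);
console.log(times); // [ '09:00', '19:30' ]
```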
Every character class has an “inverse class”, denoted by the same letter in uppercase.
“Inverse” means that it matches all the other characters:
- \D - non-digit. Matches any character except \d (for instance, a letter).
- \S - non-space. Matches any character except \s (for instance, a letter).
- \W - non-word character. Matches any character except \w (for instance, a non-Latin letter or a space).
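Inverse classes offer a shorter route to the digits-only phone number from the beginning of the chapter. Rather than collecting the digits and joining them, you can remove every non-digit. The sketch below uses the standard string method `replace`. The function name `digitsOnly` is illustrative.

```javascript
// Hypothetical helper: strips everything that is not a digit.
function digitsOnly(phone) {
  // \D matches any non-digit; the g flag replaces all of them, not only the first.
  return phone.replace(/\D/g, "");
}

console.log(digitsOnly("+3(522)865-42-76")); // 35228654276
```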
A dot (.) is a special character class that matches “any character except a newline”.
For example:
console.log("W".match(/./)); // W
In the example below, the dot is in the middle of a regexp:
let regexp = /HTM.5/;
console.log("HTML5".match(regexp)); // HTML5
console.log("HTM-5".match(regexp)); // HTM-5
console.log("HTM 5".match(regexp)); // HTM 5 (space is a character, too)
So, the dot means “any character”, but not the “absence of a character”.
There must be a character for it to match:
console.log("HTM5".match(/HTM.5/)); // null, no match, as there's no character for the dot
By default, a dot does not match the newline character \n.
For example, the regexp W.D matches W, then any character except a newline \n, then D:
console.log("W\nD".match(/W.D/)); // null (no match)
Sometimes you want a dot to mean literally “any character”, including a newline.
The s flag does that. If a regexp has it, then a dot matches any character at all:
console.log("W\nD".match(/W.D/s)); // W\nD (match!)
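To see the difference side by side, the sketch below runs the same pattern with and without the s flag. `dotAll` is the standard read-only property that reports whether the flag is set. The variable names are illustrative.

```javascript
const withoutS = /W.D/;
const withS = /W.D/s;

// dotAll is true only when the s flag is present.
console.log(withoutS.dotAll); // false
console.log(withS.dotAll);    // true

// A newline between W and D matches only with the s flag.
console.log(withoutS.test("W\nD")); // false
console.log(withS.test("W\nD"));    // true

// Any other character between them matches either way.
console.log(withoutS.test("W-D")); // true
console.log(withS.test("W-D"));    // true
```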
Pay special attention to spaces. To a human reader, the strings 1-5 and 1 - 5 look almost the same. But if a regexp doesn’t take the spaces into account, it may fail to match.
Here is an attempt to find digits separated by a hyphen:
console.log("1 - 5".match(/\d-\d/)); // null, no match!
Now, let’s fix it by adding spaces in the regular expression \d - \d, like here:
console.log("1 - 5".match(/\d - \d/)); // 1 - 5, now it works
// or we can use the \s class:
console.log("1 - 5".match(/\d\s-\s\d/)); // 1 - 5, also works
A space is a character, equal in importance to any other character. You cannot add or remove spaces in a regexp and expect it to work the same way. In a regexp, all characters matter.
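A short check makes the point concrete: each of the two patterns below matches only the spelling it was written for. The sample strings come from the text above. The variable names are illustrative.

```javascript
const tight = /\d-\d/;    // no spaces around the hyphen
const spaced = /\d - \d/; // exactly one space on each side of the hyphen

console.log(tight.test("1-5"));    // true
console.log(tight.test("1 - 5"));  // false

console.log(spaced.test("1 - 5")); // true
console.log(spaced.test("1-5"));   // false
```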