The word boundary \b matches a position where one side is a word character and the other side is not.
Word boundaries are especially handy when you need to match a sequence of letters or digits as a standalone word, or when you want to make sure that the sequence sits at the start or the end of a longer run of characters.
Once the regular expression engine comes across \b, it checks whether the position within the string is a word boundary.
Three kinds of positions qualify as word boundaries:
- At the beginning of the string, if the first character is a word character \w.
- Between two characters within the string, where one is a word character \w and the other is not.
- At the end of the string, if the last character is a word character \w.
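The three kinds of positions listed above can be made visible by collecting every index at which \b matches. The sketch below is only an illustration: the helper name wordBoundaryPositions is hypothetical, and the sample string is chosen so that all three cases occur.

```javascript
// Hypothetical helper (not part of any library): returns every index
// in `text` at which the zero-width assertion \b succeeds.
function wordBoundaryPositions(text) {
  const positions = [];
  // matchAll requires the g flag; each match of \b is an empty string,
  // so only its index is of interest.
  for (const match of text.matchAll(/\b/g)) {
    positions.push(match.index);
  }
  return positions;
}

const found = wordBoundaryPositions("Welcome to W3");
console.log(found); // [ 0, 7, 8, 10, 11, 13 ]
// 0  -> beginning of the string, first character is \w
// 7, 8, 10, 11 -> between a word character and a space
// 13 -> end of the string, last character is \w
```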
For example, the regexp \bW3\b matches in Welcome to W3, where W3 is a standalone word, but not in Welcome to W3docs, where W3 is immediately followed by other word characters.
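A minimal check of this claim, using the two strings from the sentence above (the variable names are illustrative only):

```javascript
// W3 stands alone: a boundary precedes W and the string ends after 3.
const inStandalone = /\bW3\b/.test("Welcome to W3");

// W3 is followed by the word character d, so the closing \b fails.
const inLongerWord = /\bW3\b/.test("Welcome to W3docs");

console.log(inStandalone); // true
console.log(inLongerWord); // false
```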
Take a string such as Welcome, W3! as an example. It matches the \bWelcome\b pattern, for three reasons:
- At the beginning of the string, the first \b test matches.
- Then the word Welcome matches.
- Afterward, the \b test matches again, at the position right after Welcome.
The \bW3\b pattern also matches, but \bWel\b does not, as there isn’t any word boundary after l. The W3!\b pattern doesn’t match either: the exclamation sign is not \w, so no word boundary follows it.
Here is an example:
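The checks described above can be sketched as follows. The sample string Welcome, W3! is an assumption made for this illustration (the text only implies a string that starts with Welcome and ends with W3 and an exclamation sign).

```javascript
// Assumed sample string for this sketch.
const sample = "Welcome, W3!";

// match() without the g flag returns the first match or null.
const welcomeMatch = sample.match(/\bWelcome\b/); // boundary at the start and before the comma
const w3Match = sample.match(/\bW3\b/);           // boundary after the space and before "!"
const welMatch = sample.match(/\bWel\b/);         // no boundary between "l" and "c"
const bangMatch = sample.match(/W3!\b/);          // "!" is not \w, so no boundary at the end

console.log(welcomeMatch && welcomeMatch[0]); // Welcome
console.log(w3Match && w3Match[0]);           // W3
console.log(welMatch);                        // null
console.log(bangMatch);                       // null
```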
You can use \b both with words and digits.
For instance, the \b\d\d\b pattern searches for standalone two-digit numbers.
That is, it searches for two-digit numbers surrounded by characters other than \w, such as punctuation or spaces.
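As a short illustration, the pattern can be run against strings that mix one-, two-, and three-digit numbers. The input strings are made up for this sketch.

```javascript
const twoDigit = /\b\d\d\b/g;

// Only numbers with exactly two digits stand between word boundaries.
const spaced = "1 23 456 78".match(twoDigit);
console.log(spaced); // [ '23', '78' ]

// Punctuation separates numbers just as well as spaces do.
const commaSeparated = "12,34,56".match(twoDigit);
console.log(commaSeparated); // [ '12', '34', '56' ]
```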
Another important thing to note: \b does not work for non-Latin alphabets.
The reason is that \w means a Latin letter (a-z or A-Z), a digit, or an underscore. Therefore, the test doesn’t work for other characters, such as Chinese characters, Cyrillic letters, and so on.
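The limitation is easy to reproduce. In the sketch below, the Cyrillic and Chinese sample words are illustrative choices; since none of their characters count as \w, no position in those strings qualifies as a word boundary.

```javascript
// Latin letters are \w, so the boundaries around the word are found.
const latinMatch = /\bHello\b/.test("Hello, world");

// Cyrillic letters are not \w: there is no boundary before or after the word.
const cyrillicMatch = /\bпривет\b/.test("привет, мир");

// The same holds for Chinese characters.
const chineseMatch = /\b你好\b/.test("你好");

console.log(latinMatch);    // true
console.log(cyrillicMatch); // false
console.log(chineseMatch);  // false
```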