Lookahead and Lookbehind

Sometimes you need to find only those matches for a pattern that are preceded or followed by another pattern.

Specific syntaxes are used to meet that goal. They are known as lookahead and lookbehind. Together they are called lookaround.

As a rule, a lookaround pattern matches characters and then gives up the match, returning only the result: match or no match. That’s why lookarounds are considered assertions. They don’t consume characters in the string; they only state whether a match is possible at the current position.

To be more precise, let’s try to find the price in the following string: 1 lesson costs 15€. Here you can see a number followed by the € sign.

Lookahead

The syntax of lookahead is the following:

X(?=Y)

It tells the engine to look for X, but to match only when it is followed by Y. Any pattern can be used in place of X and Y.

For an integer number that is followed by €, the regular expression will be \d+(?=€). Here is how it looks:

let str = "1 lesson costs 15€";
console.log(str.match(/\d+(?=€)/)); // 15, the number 1 is ignored, as it is not followed by the sign €

The lookahead is just a test, hence the contents of the parentheses (?=...) are not included in the result 15.

While looking for X(?=Y), the regular expression engine finds X and then checks whether there is Y right after it. If there is no Y, that potential match is skipped, and the search goes on.
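To make the "just a test" behavior concrete, here is a minimal sketch that compares a lookahead with a plain pattern on the same string. The variable names are chosen for illustration only.

```javascript
const str = "1 lesson costs 15€";

// Lookahead: € must follow, but it is not part of the match.
const withLookahead = str.match(/\d+(?=€)/);

// Plain pattern: € is consumed and becomes part of the match.
const withoutLookahead = str.match(/\d+€/);

console.log(withLookahead[0]);    // 15
console.log(withoutLookahead[0]); // 15€

// Both matches start at the same position in the string.
console.log(withLookahead.index === withoutLookahead.index); // true
```

Both patterns find the same place in the string; they differ only in whether the € sign ends up in the result.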

A pattern like X(?=Y)(?=Z) searches for X followed by Y and Z at the same time: both lookaheads are checked at the same position, right after X. This is possible only when Y and Z are not mutually exclusive.

Here is an example:

let str = "1 lesson costs 15€";
console.log(str.match(/\d+(?=\s)(?=.*15)/)); // 1
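As a rough illustration of the difference between compatible and mutually exclusive conditions, the sketch below runs two double-lookahead patterns against the same string. The second pattern is a made-up counterexample, not something taken from the text above.

```javascript
const str = "1 lesson costs 15€";

// Both conditions can hold for "1": a space follows, and "15" appears later.
const compatible = str.match(/\d+(?=\s)(?=.*15)/);
console.log(compatible[0]); // 1

// The next character cannot be a space and € at once, so nothing matches.
const exclusive = str.match(/\d+(?=\s)(?=€)/);
console.log(exclusive); // null
```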

Negative Lookahead

Now, imagine you need to get the quantity instead of the price from the same string. In our case, it’s a number \d+ that is not followed by €.

You can use the negative lookahead for that purpose.

The syntax of the negative lookahead is X(?!Y). It searches for X, but only if it is not followed by Y, like here:

let str = "2 lessons cost 30€";
console.log(str.match(/\d+(?!€)/)); // 2 (skipping the price)
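One caveat is worth sketching: the example works because match without the g flag returns only the first match. With the g flag, the engine can backtrack inside the price and report part of it. The stricter pattern below is one possible workaround, shown as an assumption rather than the only fix.

```javascript
const str = "2 lessons cost 30€";

// With the g flag, the engine backtracks from "30" to "3",
// which is followed by "0" rather than €, so "3" is reported too.
const naive = str.match(/\d+(?!€)/g);
console.log(naive); // [ '2', '3' ]

// Also forbidding a following digit rules out partial numbers.
const strict = str.match(/\d+(?![\d€])/g);
console.log(strict); // [ '2' ]
```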

Lookbehind

As noted above, lookahead allows adding a condition for what is ahead. Now, let’s discover lookbehind. The same logic works here: lookbehind allows adding a condition for what is behind. In other words, it allows matching a pattern only if there is something before it.

Lookbehind can also be positive and negative. The positive lookbehind syntax is (?<=Y)X, meaning that X will be matched only if there is Y before it. The syntax of the negative lookbehind is (?<!Y)X, meaning that X will be matched only if there is no Y before it.

Let’s check out an example:

let str = "1 lesson costs $15";
// the dollar sign is escaped \$
console.log(str.match(/(?<=\$)\d+/)); // 15 (the standalone number 1 is skipped)

In the example above, the price is given in US dollars. If you want to get the quantity instead, use the negative lookbehind, like this:

let str = "2 lessons cost $30";
console.log(str.match(/(?<!\$)\d+/)); // 2 (the price is skipped)
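As with negative lookahead, this relies on match returning only the first match. The sketch below shows what happens with the g flag, and one possible tightening of the pattern. It assumes an environment that supports lookbehind (ES2018 or later).

```javascript
const str = "2 lessons cost $30";

// "3" is rejected because $ precedes it, but "0" is preceded by "3",
// so the negative lookbehind passes and "0" is reported.
const naive = str.match(/(?<!\$)\d+/g);
console.log(naive); // [ '2', '0' ]

// Also forbidding a preceding digit prevents matching inside a number.
const strict = str.match(/(?<![\$\d])\d+/g);
console.log(strict); // [ '2' ]
```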

Capturing Groups

Usually, the contents inside the lookaround parentheses don’t become part of the result.

But there are situations when it’s necessary to capture the lookaround expression or just a part of it. This is possible by wrapping that part in additional parentheses.

The currency sign (€|kr) is captured along with the amount, in the example below:

let str = "1 lesson costs 15€";
let regexp = /\d+(?=(€|kr))/; // additional parentheses around €|kr
console.log(str.match(regexp)); // 15, €

The same works with lookbehind, as in this example:

let str = "1 lesson costs $15";
let regexp = /(?<=(\$|£))\d+/;
console.log(str.match(regexp)); // 15, $
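Since match returns the full match followed by the captured groups, the two values can be pulled apart directly. The sketch below shows array destructuring and, as an alternative, a named group; the group name currency is chosen here purely for illustration.

```javascript
const str = "1 lesson costs 15€";

// match() returns [fullMatch, group1, ...], so destructuring gives both parts.
const [amount, sign] = str.match(/\d+(?=(€|kr))/);
console.log(amount, sign); // 15 €

// A named group inside the lookahead is exposed through match.groups.
const named = str.match(/\d+(?=(?<currency>€|kr))/);
console.log(named[0], named.groups.currency); // 15 €
```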

Summary

The lookaround syntaxes are used for matching a pattern that is preceded or followed by another one. Lookahead is useful for matching something depending on the context after it, and lookbehind depending on the context before it.

For simple regular expressions, the same thing can be done manually: match everything, in any context, and then filter the matches in a loop.

This works because str.match (without the g flag) and str.matchAll return matches as arrays with the index property, so you know exactly where each match is in the text and can check its context.
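The manual approach can be sketched as follows. This is a rough equivalent of \d+(?=€) for the example string, with illustrative variable names.

```javascript
const str = "1 lesson costs 15€";

// Match every number, then keep only those immediately followed by €.
const prices = [];
for (const match of str.matchAll(/\d+/g)) {
  const end = match.index + match[0].length; // position right after the number
  if (str[end] === "€") {
    prices.push(match[0]);
  }
}

console.log(prices); // [ '15' ]
```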

But, lookaround is much more convenient, especially for more complex regular expressions.



