Recursive Regular Expressions

\"Yo The regular expressions we use in our daily lives are actually not that “regular.” Most of the languages support some kind of extended regular expressions that are computationally more powerful than the “regular” regular expressions as defined by the formal language theory.

For instance, the so often used capture buffers add auxiliary storage to the regular expressions that allow them to match an arbitrary pattern repeatedly. Or look-ahead assertions that allow the regular expression engine to peek ahead before it making a decision.

The Perl programming language has an especially rich with regex engine. One of the engine’s features is the lazy regular subexpressions. The lazy regular subexpressions are expressed as (??{ code }), where the “code” is arbitrary Perl code that gets executed when the moment this subexpression may match.

This allows us to construct something really interesting – we can define a regular expression that has itself in the “code” part. The result is a recursive regular expression!

One of the classical problems that a regular expression can’t match is the language 0n1n, i.e., a string with a number of zeroes followed by an equal number of ones. Surprisingly, using the lazy regular subexpressions this problem becomes tractable!

Here is a Perl regular expression that matches 0n1n:

 my $regex = qr/0(??{$regex})*1/;

This regular expression matches a 0 followed by itself zero or more times, followed by a one. If the itself part doesn’t match, then the string this regular expression matches is 01. If the itself part matches once, the string this regular expression matches is 0011. If it matches two times, the string is 000111, …, etc.

Read More:


About Amit Agarwal

4 Comments on “Recursive Regular Expressions”

2 Trackbacks on “Recursive Regular Expressions”