Lookahead and Back-references

Ruby's regular expression language lets you match again the text that an earlier group captured (back-references), and look ahead in the string without consuming the characters you examine (lookahead). These mechanisms let you write regular expressions whose matches depend on the context around them.

What Is Lookahead?

Lookahead is like peeking ahead in the string without taking those characters out of the character stream. Because a lookahead does not consume characters, the characters it examines can still be matched by later elements of the regular expression. In other words, a lookahead is zero-width: it either succeeds or fails at the current position, but it never moves that position forward.
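
The following minimal sketch makes the zero-width behavior visible; the sample strings are invented for illustration. The lookahead confirms that bar follows foo, but the matched text is only foo, so a later part of the pattern is still free to match bar itself.

```ruby
# A lookahead checks what follows without consuming it.
s = "foobar"

# (?=bar) requires "bar" to come next, but "bar" is not part of the match.
m = s.match(/foo(?=bar)/)
puts m[0]          # => foo
puts m.end(0)      # => 3 (the match stops before "bar")

# Because "bar" was not consumed, the pattern can go on to match it.
m2 = s.match(/foo(?=bar)bar/)
puts m2[0]         # => foobar

# If the text after "foo" is not "bar", the whole match fails.
p "foobaz".match(/foo(?=bar)/)  # => nil
```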

An Example of Lookahead

In this example, two words separated by a colon are matched and captured in groups. The pattern is set up, however, so that the second word must begin with bar: the lookahead peeks forward to confirm this before the regex matches and the groups are captured.

A lookahead group uses the syntax (?=...). Whatever is inside this group must match at the current position, but the characters it matches are not removed from the character stream. You can use any regular expression elements and quantifiers inside a lookahead group, including other lookahead groups and other groupings.
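
Because a lookahead consumes nothing, several lookaheads can sit at the same position, each testing a different condition with its own quantifiers before the main pattern runs. The sketch below assumes a hypothetical rule not taken from this article (at least 8 characters, at least one digit, at least one uppercase letter), and the helper name strong_enough? is made up for illustration.

```ruby
# Each lookahead is tried at the start of the string (right after \A).
# None of them consume characters, so .{8,} still sees the whole string.
STRONG = /\A(?=.*\d)(?=.*[A-Z]).{8,}\z/

# Hypothetical helper used only for this illustration.
def strong_enough?(password)
  STRONG.match?(password)
end

puts strong_enough?("hunter22")   # => false (no uppercase letter)
puts strong_enough?("Hunter22")   # => true
puts strong_enough?("Hu2")        # => false (too short)
```

Writing each condition as a separate lookahead keeps the conditions independent of one another and of the order in which the required characters appear.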

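 #!/usr/bin/env ruby
 
 strings = [
   "foo:bar",
   "foo:baritone",
   "foo:barbell",
   "foo:baz"       # skipped: the second word doesn't start with "bar"
 ]
 
 strings.each do |s|
   # (?=bar) checks that "bar" comes next without consuming it,
   # so (\w+) still captures the whole second word.
   if s =~ /(foo):(?=bar)(\w+)/
     puts "#{$1} #{$2}"   # e.g. "foo baritone"
   end
 end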
 

Back-references

Back-references let you refer to a group captured earlier in the match without repeating its contents. A back-reference such as \1 matches the exact text that the group captured, not just any text that fits the group's pattern, which makes it useful when the same text must appear more than once in a string.
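
A short sketch of a back-reference at work: here \1 finds a word typed twice in a row. The sample sentences are invented for illustration.

```ruby
text = "Paris in the the spring"

# (\w+) captures a word; \1 then requires that very same word again.
# \b keeps the match on whole words, so "the theory" would not match.
if text =~ /\b(\w+) \1\b/
  puts "Repeated word: #{$1}"   # => Repeated word: the
end

# By contrast, (\w+) (\w+) matches any two words, repeated or not.
p "Paris in".match?(/\b(\w+) (\w+)\b/)   # => true
p "Paris in".match?(/\b(\w+) \1\b/)      # => false
```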

In regular expressions with multiple groups, the groups are numbered by the position of their opening parentheses, from left to right. To find the back-reference number of a group, count opening parentheses from the left until you reach that group's opening parenthesis. Only ordinary grouping parentheses count; other constructs that use parentheses, such as the lookahead group (?=...), do not.
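
To see the numbering rule in practice, this sketch matches an invented date string with nested groups and a lookahead. MatchData#captures lists the groups in the order of their opening parentheses. The (?=-) lookahead is redundant here and is included only to show that it does not take a number.

```ruby
m = "2014-03-15".match(/((\d+)-(\d+))(?=-)-(\d+)/)

# Groups are numbered by the position of their opening parenthesis:
#   1: ((\d+)-(\d+))  outer group, opens first
#   2: (\d+)          year, opens second
#   3: (\d+)          month
#   4: (\d+)          day; (?=-) is skipped when counting
p m.captures   # => ["2014-03", "2014", "03", "15"]
p m[4]         # => "15"
```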

An Example of Back-references

The following example matches two words separated by a colon, but only if both words are the same. It uses two ordinary groups: an outer group that captures the whole word:word pair, and an inner group that captures just the first word. The outer group's opening parenthesis comes first, so the entire pair is back-reference #1; the inner group's opening parenthesis is second, so the first word is back-reference #2, and \2 requires the second word to repeat it.

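 #!/usr/bin/env ruby
 
 strings = [
   "bar:baz",
   "han:luke",
   "foo:foo"
 ]
 
 # Only foo:foo will match.
 # \2 must repeat exactly the text captured by group 2 (the first word);
 # \b stops a partial word such as "xfoo:foo" from matching.
 strings.each do |s|
   if /\b((\w+):\2)\b/ =~ s
     puts "#{$1} #{$2}"   # prints "foo:foo foo"
   end
 end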
 

©2014 About.com. All rights reserved.