Regular Expressions: Grouping

Using the Parentheses Operator

Some of the most powerful and useful operators in the Ruby regex syntax are the grouping operators. While a quantifier applies only to the element immediately before it, the grouping operators allow you to combine multiple elements into a single element. They also allow you to extract individual groups after the match has been made.
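To illustrate extracting groups after a match, here is a minimal sketch. The date string and the variable names are made up for this example; match, the index operator on the match result, and captures are standard Ruby.

```ruby
# Hypothetical input chosen for illustration only.
date = "2014-03-15"

# Three groups: year, month and day.
m = date.match(/(\d+)-(\d+)-(\d+)/)

year  = m[1]        # text matched by the first group
month = m[2]        # text matched by the second group
day   = m[3]        # text matched by the third group
parts = m.captures  # all groups at once, as an array of strings

puts parts.inspect
```

Index 0 of the match result holds the whole matched text, so the groups are numbered starting at 1.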

The Parentheses Operator

The parentheses operator is used to group a number of elements into a single element. For example, without grouping, the most direct way to match the string "abcabcabc" is to spell it out in full as /abcabcabc/. Unfortunately, that's limiting. If, for example, the string "abcabc" must also match, this regex fails. The solution? Group one of the "abc" sequences and use a quantifier.

What this regex really wants to do is match the sequence "abc" repeated one or more times. The string "abcabcabc" should match, but "abcabc" should as well. The regex /(abc)+/, which makes use of the parentheses operator, can be used to achieve this. The parentheses group the three elements a, b and c into a single element. The quantifier + then matches one or more repetitions of the combined element abc.
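As a quick check of the idea, the sketch below tries a few strings against the grouped pattern. The \A and \z anchors are an addition for this illustration and are not in the text above; without them, /(abc)+/ would also match any longer string that merely contains "abc". The match? method assumes Ruby 2.4 or newer.

```ruby
# Anchored so the whole string must be made of repeated "abc" groups.
# The anchors are an assumption added for this example.
pattern = /\A(abc)+\z/

# Pair each sample string with whether the pattern matches it.
results = %w[abc abcabc abcabcabc abcab xyz].map do |s|
  [s, pattern.match?(s)]
end

results.each { |s, ok| puts "#{s}: #{ok}" }
```

The first three strings match; "abcab" and "xyz" do not, since neither is a whole number of "abc" repetitions.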

Putting it Into Practice

The grouping operators get a lot of use and more complex regexen are often an indecipherable soup of parentheses. It can be said that the parentheses grouping operator is the backbone of all non-trivial regexen. Not only are single-level parentheses used, but parentheses with quantifiers are also used inside of other parentheses with other quantifiers.
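A small sketch of quantified parentheses nested inside other quantified parentheses follows. The input format (rows of comma-separated numbers, each row ended by a semicolon) is hypothetical and chosen only to keep the pattern short.

```ruby
# Inner group: a number followed by a comma, repeated zero or more times.
# Outer group: one whole row (numbers, then a semicolon), repeated one or more times.
pattern = /\A((\d+,)*\d+;)+\z/

samples = ["1,2,3;4,5;", "7;", "1,,2;", "1,2"]
matched = samples.map { |s| !pattern.match(s).nil? }

samples.zip(matched).each { |s, ok| puts "#{s.inspect}: #{ok}" }

# A quantified group keeps only its last repetition as its capture.
last_row = "1,2,3;4,5;".match(pattern)[1]
puts last_row.inspect
```

Because a repeated group retains only its final repetition, a pattern like this is good for checking the overall format but not for collecting every row; that is where a method such as scan becomes useful.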

In the following example, the programmer wants to extract the numeric information from an IP address. Given 192.168.0.1, the programmer wants the array [192, 168, 0, 1].

First, the match method is used to check that the IP address follows the expected format. Note that this can still match invalid IP addresses such as 456.777.323.546 (the numbers cannot be higher than 255), so a separate range check is still needed later.

Next, the scan method is used to extract the strings of digits separated by periods. For each match it finds, scan returns a list of the groups in that match. These extracted groups are referred to as "captures," so the overall result is a list of lists. Since each match here has only one group, it's safe to call flatten to turn the result into a single flat list. However, if there were multiple captures per match that you wanted to keep separate, you would have to use map and the index operator instead.
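The multiple-capture case could look something like the sketch below. The key=value string and the variable names are hypothetical; only scan, map and the index operator come from the text.

```ruby
# Hypothetical input: key=value pairs separated by semicolons.
config = "host=example;port=8080;retries=3"

# Two groups per match, so scan returns a list of two-element lists.
pairs = config.scan(/(\w+)=(\w+)/)

# flatten would mix keys and values together, so pick each capture by index.
keys   = pairs.map { |pair| pair[0] }
values = pairs.map { |pair| pair[1] }

puts keys.inspect
puts values.inspect
```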

Lastly, the example program uses map to create integers out of the strings.

#!/usr/bin/env ruby

ip = "192.168.0.1"

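# Check the overall shape: four runs of digits separated by literal periods.
# The periods are escaped; an unescaped . would match any character.
if ip.match(/\A(\d+\.){3}\d+\z/).nil?
  puts "Not an IP address"
  exit
end

# Each scan match has one capture, so flatten yields a flat list of digit strings.
numbers = ip.scan(/(\d+)\.?/).
  flatten.
  map { |n| n.to_i }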

puts numbers.inspect
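
The program above stops short of the range check mentioned earlier. One way it might be added is sketched below; the helper name valid_ipv4? is hypothetical and not part of the original program, and the format regex is this sketch's own assumption about what counts as well-formed.

```ruby
# Hypothetical helper: true only for four dot-separated numbers, each 0 to 255.
def valid_ipv4?(ip)
  # Format check first: exactly four runs of digits joined by literal periods.
  return false if ip.match(/\A(\d+\.){3}\d+\z/).nil?

  # Then the range check on each extracted number.
  ip.scan(/(\d+)\.?/).flatten.map { |n| n.to_i }.all? { |n| n.between?(0, 255) }
end

puts valid_ipv4?("192.168.0.1")
puts valid_ipv4?("456.777.323.546")
```

Keeping the format check and the range check as separate steps mirrors the structure of the example program: the regex decides whether the string looks like an address, and ordinary integer comparison decides whether the numbers are sensible.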

©2014 About.com. All rights reserved.