More Than Static Regular Expressions

Procedural Living Regular Expressions

I came across a challenge on Reddit and thought it could be implemented in an interesting way. The challenge is to implement something similar to Time#strftime. There are a number of goals you can pursue in challenges such as these. You can opt for the fastest implementation, the shortest, or even the strangest. I like to go for the DRYest, which often ends up being the most elegant way of doing things.

The Challenge in a Nutshell

As I said before, the challenge is to, more or less, re-implement the Time#strftime method. This method takes a format string such as "%M/%d/%y" and replaces each code with the corresponding part of the time object: in this case the month, day and year, respectively. It's not practical to re-implement this unless you have a really, really good reason, but these types of challenges often have no real purpose other than to exercise your programming ability.
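For reference, here is the built-in method in action. Note that the real Time#strftime uses %m for the month and %M for the minute, whereas the codes used in this challenge (as implemented below) use %M for the month and %m for the minute. The timestamp is an arbitrary value chosen for illustration.

```ruby
# A fixed, arbitrary timestamp so the output is predictable.
SAMPLE_TIME = Time.new(2014, 3, 9, 15, 4, 5)

# Built-in codes: %m = month, %d = day, %y = two-digit year.
puts SAMPLE_TIME.strftime("%m/%d/%y")   # prints 03/09/14

# Built-in %M is the minute, not the month.
puts SAMPLE_TIME.strftime("%H:%M")      # prints 15:04
```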

Attempt 1: Build a Hash

Ruby is about as dynamic as you can get. The types of things you can do at runtime boggle my mind sometimes. Coming from languages like C++, where you have little to no information about objects at runtime, the ability to not only query objects but build entire classes or methods programmatically kind of knocked a few screws loose when I first learned of it. If we're going to remain DRY, we're going to have to do a few things programmatically.

First, we essentially have two things here. We have a list of strings we want to match (the formatting codes) and some actions we want to perform when Ruby encounters them, each of which generates the correct piece of time information. My first instinct is to make a hash of these things (using the "correct" hash syntax, of course), with the keys being the strings we want to match and each value a proc that does the work. So here it is; let's take a detailed look at it.


#!/usr/bin/env ruby

module TimeFormatter
  FORMAT_CODES = {
    l: proc{|t| "%03d" % (t.usec / 1000) },
    s: proc{|t| "%02d" % t.sec },
    m: proc{|t| "%02d" % t.min },
    h: proc{|t| t.hour % 12 },
    H: proc{|t| t.hour },
    c: proc{|t| t.hour < 12 ? 'AM' : 'PM' },
    d: proc{|t| t.day },
    M: proc{|t| t.month },
    y: proc{|t| t.year }
  }

  FORMAT_REGEXP = Regexp.union(FORMAT_CODES.keys.map{|str| "%#{str}" })

  def format(s)
    s.gsub(FORMAT_REGEXP) do|code|
      FORMAT_CODES[code[-1].to_sym].call self
    end
  end
end

puts Time.now.extend(TimeFormatter).format(ARGV[0])

There are a few things going on here. First, we bundled up this functionality in a module. We can either extend specific objects with it or, if we want the functionality all across the program, include it in the Time class itself.
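The difference between the two options can be sketched with a small stand-in module. Stamp and its stamp method are hypothetical names used only for this illustration; they are not part of the solution.

```ruby
# Hypothetical mixin, used only to show the two ways of attaching a module.
module Stamp
  def stamp
    "#{year}-#{month}-#{day}"
  end
end

# extend: only this one object gains the method.
one = Time.new(2014, 3, 9).extend(Stamp)
puts one.stamp                                  # prints 2014-3-9
puts Time.new(2014, 3, 9).respond_to?(:stamp)   # prints false

# include: every Time object gains the method.
class Time
  include Stamp
end
puts Time.new(2014, 3, 9).respond_to?(:stamp)   # prints true
```

Extending a single object keeps the one-letter methods out of every other Time in the program, which is why the listing above uses extend.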

Next, in a constant we define our hash. The keys are the codes we're looking for in symbol form (minus the % character). Each value is a proc that takes a single Time object and spits out the data for the substitution. This looks pretty good, but it's not "good enough," as you'll see below.

The regular expression itself is formed by calling Regexp.union on the list of keys, each prepended with the % character. Essentially, this makes a regular expression that matches any one of the patterns. For example, Regexp.union(['a','b']) will produce a regular expression equivalent to /a|b/.
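A quick sketch of what Regexp.union produces. The strings here are arbitrary examples; the second call shows that string arguments are escaped, so any regexp metacharacters in them match literally.

```ruby
# Regexp.union accepts an array of strings (or regexps) and joins them with |.
def union_source(patterns)
  Regexp.union(patterns).source
end

puts union_source(['a', 'b'])       # prints a|b

# String arguments are escaped, so the dot only matches a literal dot.
puts union_source(['%l', 'a.b'])    # prints %l|a\.b
```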

Finally, the actual substitution is performed in the format method. The gsub method is called with a block, the last character of each matched code is extracted using string indexing and converted to a symbol, and the matching proc is looked up in the hash and called with the time object (self, in this case, since we're in a mixin module). No check for a missing key is needed, because the regular expression was built from the hash keys and can only match codes that exist. It just works. This is pretty DRY, but there's a better (in my opinion) way.
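The block form of gsub can be seen in isolation with a cut-down table. CODES, CODES_REGEXP and demo_format are hypothetical names for this sketch; the two entries are copied from the hash above.

```ruby
# Hypothetical two-entry table standing in for FORMAT_CODES.
CODES = {
  H: proc { |t| t.hour },
  m: proc { |t| "%02d" % t.min }
}
CODES_REGEXP = Regexp.union(CODES.keys.map { |k| "%#{k}" })

def demo_format(time, template)
  # The block receives each match (e.g. "%H"); its return value is
  # converted to a string and spliced into the result.
  template.gsub(CODES_REGEXP) { |code| CODES[code[-1].to_sym].call(time) }
end

# "%q" is not in the table, so the regexp never matches it and it is left alone.
puts demo_format(Time.new(2014, 3, 9, 15, 4), "%H:%m %q")   # prints 15:04 %q
```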

Attempt 2: Don't Fight Ruby

After looking at the first attempt, I came to the realization that I was essentially recreating methods using a hash. Since Ruby is so dynamic, you can actually query a module for the methods that have been defined in it, even from within the code defining the module. We don't have to use a hash here; we can just use methods. And since the methods exist within the mixin module, they have a concept of "self" and no longer need an argument to get to the time. The result is something DRYer and much more elegant.


#!/usr/bin/env ruby

module TimeFormatter
  def l; "%03d" % (usec / 1000); end
  def s; "%02d" % sec; end
  def m; "%02d" % min; end
  def h; hour % 12; end
  def H; hour; end
  def c; hour < 12 ? 'AM' : 'PM'; end
  def d; day; end
  def M; month; end
  def y; year; end

  REGEXP = Regexp.union instance_methods.map{|m| "%#{m}" }

  def format(s)
    s.gsub(REGEXP) {|k| send k[-1] }
  end
end

puts Time.now.extend(TimeFormatter).format(ARGV[0])

The first thing that stands out is all the methods defined on one line. While this is typically not the "Ruby way," I thought it was best, or else the module would just drag on and on. "One line" methods typically take up four lines, whitespace included. And remember that almost anywhere a newline exists in Ruby, you can replace it with a semicolon. As for the methods themselves, they're almost identical to the procs we defined in the hash. But now, since they have a concept of "self," they're a bit shorter.
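One detail worth checking in the one-liners above: hour % 12 evaluates to 0 at midnight and at noon, where a 12-hour clock would normally show 12. Whether that matters depends on what the challenge expects, which isn't stated here. If 12 is wanted, a possible adjustment looks like this (twelve_hour is a hypothetical helper, not part of the original solution):

```ruby
# Hypothetical helper, not part of the original solution.
def twelve_hour(hour)
  # Shift so that 0 and 12 both map to 12, and 13..23 map to 1..11.
  (hour + 11) % 12 + 1
end

puts [0, 1, 11, 12, 13, 23].map { |h| twelve_hour(h) }.inspect
# prints [12, 1, 11, 12, 1, 11]
```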

Now, how to access the methods at runtime? In this case, the correct way is instance_methods. Things defined with def something and not def self.something end up as instance methods of the module or class you're in, and the naked call to instance_methods is sending a message to the "self" of the current scope. The "self" of the scope of a module definition is the module object itself, so this is the equivalent of calling TimeFormatter.instance_methods. I know, this stuff can make your head spin. I recommend the book Metaprogramming Ruby by Paolo Perrotta if you want to know more about how this stuff works inside Ruby.
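Here is a small sketch of what a bare instance_methods call sees at different points in a module body. Snapshot and its empty methods are hypothetical. The results are sorted before printing because the order of the returned list shouldn't be relied on.

```ruby
# Hypothetical module showing what instance_methods sees at each point.
module Snapshot
  def a; end
  def b; end

  # self is Snapshot here, so this is the same as Snapshot.instance_methods.
  SEEN = instance_methods

  def c; end   # defined after SEEN was captured, so it is not in SEEN
end

puts Snapshot::SEEN.sort.inspect               # prints [:a, :b]
puts Snapshot.instance_methods.sort.inspect    # prints [:a, :b, :c]
```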

So we get our list of methods using instance_methods and do more or less the same thing as last time using Regexp.union. However, you'll notice that the format method (defined after the REGEXP constant, so it's not included in the list returned by instance_methods) is much simpler. It just gsubs each pattern with whatever is returned when you send the code's final character to self as a method call. The self object knows what to do with itself, essentially.
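The send call is what turns a matched code into a method call. A minimal sketch, using a hypothetical TinyCodes module with two of the one-letter methods from above:

```ruby
# Hypothetical mixin with two one-letter "code" methods.
module TinyCodes
  def H; hour; end
  def M; month; end
end

def tiny_time
  # Arbitrary timestamp, extended with the mixin.
  Time.new(2014, 3, 9, 15).extend(TinyCodes)
end

# send accepts a String or a Symbol naming the method to call.
puts tiny_time.send("H")    # prints 15
puts tiny_time.send(:M)     # prints 3
```

Because the regular expression only matches codes that were defined as methods, send is never handed an arbitrary method name from the format string.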

What Do You Think?

Can you come up with something better? This is a challenge, after all. Maybe I'm way off base here, maybe I missed a much better, much more elegant solution? Or maybe a faster solution? Give it a try!

©2014 About.com. All rights reserved.