Creating Hashes from Arrays

Imagine the following situation: you've just collected some tabular data. You have one array of column names, and then an array of arrays representing the rows. However, the code you need to interface with expects an array of hashes, keys being column names and values being the value for that row and column. What's the easiest way to make these hashes?

The Example Data

Before we start, let's begin with some sample data. A problem like this is hard to work through without some data to play with.


#!/usr/bin/env ruby

columns = [ "Name", "Average", "Math", "Science", "English" ]
rows = [
	[ "Alice", 0, 95, 90, 85 ],
	[ "Bob", 0, 65, 70, 90 ],
	[ "Carol", 0, 95, 80, 85 ],
	[ "Dave", 0, 75, 90, 90 ]
]

The columns array holds the column names. These will be the keys for the final hashes. The rows array is an array of arrays, holding the actual tabular data. The objective, once again, is to end up with an array of hashes, one per row, where each hash's keys are the column names and its values are the data from that row.

The Loop Method

The first thing that pops into your head would be to use a loop. Loop over the rows, loop over the columns, match them up and voila, done! OK, this works, but it's not very Ruby-like. Here is the solution, and it's not too pretty.


correlated = []
# Walk the rows by index
(0...rows.length).each do |row|
	h = {}

	# Pair each column name with the value at the same index in this row
	(0...columns.length).each do |column|
		h[columns[column]] = rows[row][column]
	end

	correlated << h
end

puts correlated.inspect

This does work. But it looks like C code. Nested loops? Lots of indexing? There's an easier way, as long as you know about the splat operator, the Hash.[] method (which is a kind of voodoo in itself, since it's not obvious what it does at first) and the zip method from Array. It's a bit more involved, but more elegant. There's a tradeoff here. To a newbie Rubyist, the above example would probably be best. It does work, and everyone can understand it.

The Zip and Splat Method

First, we need to get a handle on the Hash.[] class method. This method will create a hash instance with initial keys and values based on the parameter list. The parameters are alternating keys and values, so if we were to call Hash[ :key, 'value' ], this method call (it is a method call, just an odd overloaded operator on a class object, something you don't see very often) will produce the hash { :key => 'value' }. Already you might be able to tell how this can help us.
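As a small sketch of that behavior (the grades variable and its contents are made up for illustration), alternating arguments become key and value pairs, and an odd number of arguments leaves the last key without a value, which Ruby rejects:

```ruby
# Alternating keys and values become the pairs of the new hash.
grades = Hash[ "Math", 95, "Science", 90 ]
puts grades["Math"]      # 95
puts grades["Science"]   # 90

# An odd number of arguments leaves "Science" without a value,
# so Hash.[] raises ArgumentError instead of guessing.
begin
  Hash[ "Math", 95, "Science" ]
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```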

Next, the Array#zip method. This will "zip" two arrays together, making an array of arrays, with each inner array holding the elements of the two component arrays at the same index. It's kind of hard to put into words, so it's best to just see it in action.


puts [1,2,3].zip( %w{a b c} ).inspect
# [[1, "a"], [2, "b"], [3, "c"]]

So, index 0 of the resulting array holds a two-element array made of whatever was at index 0 in each of the component arrays. The same goes for index 1, and so on. We're going to be using this to zip the column names up with the actual data. This is essentially what that loop was doing, but zip does the same thing more succinctly.
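Applied to the sample data, a quick sketch looks like this. The order matters: the array you call zip on supplies the first element of each pair, so the column names need to be the receiver if they are to end up as keys. The single row variable here is just the first row of the sample data.

```ruby
columns = [ "Name", "Average", "Math", "Science", "English" ]
row = [ "Alice", 0, 95, 90, 85 ]

# The receiver (columns) supplies the first element of each pair.
pairs = columns.zip(row)
puts pairs.inspect
# [["Name", "Alice"], ["Average", 0], ["Math", 95], ["Science", 90], ["English", 85]]

# If the argument is shorter than the receiver, the gaps are filled with nil.
puts [1, 2, 3].zip([4]).inspect
# [[1, 4], [2, nil], [3, nil]]
```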

And now the final piece of the puzzle. The Hash.[] method takes a flat parameter list, like Hash[ 'a', 'b', 'c', 'd' ]. This is almost what we have, but not quite: we have an array of two-element arrays. How do we get these array elements into the parameter list? The flatten method and the splat operator.

The splat operator (the asterisk) is one of the less used operators in Ruby. When used in a method call, it will "splat" an Array into actual method parameters. So calling foo(*[1,2,3]) is the equivalent of calling foo(1,2,3). It turns arrays into parameter lists. It's the bridge between data structures and parameter lists. Parameter lists are one of the few things in Ruby that aren't objects and can't be manipulated in the normal ways. The splat operator acts as that bridge.
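Here is a minimal sketch of the splat in a method call. The describe method is a hypothetical helper invented for this illustration, not something from Ruby itself.

```ruby
# Hypothetical helper, defined only to show how arguments arrive.
def describe(name, math, science)
  "#{name}: math #{math}, science #{science}"
end

args = [ "Alice", 95, 90 ]

# The splat expands the array into three separate arguments,
# exactly as if we had written describe("Alice", 95, 90).
puts describe(*args)
# Alice: math 95, science 90

# The same trick feeds a flat array of keys and values to Hash.[].
flat = [ "Math", 95, "Science", 90 ]
scores = Hash[ *flat ]
puts scores["Science"]   # 90
```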

So, we have everything we need. We can zip the values together and we can create the hashes with the aid of the Hash.[] method and the splat operator. Here it is all put together.


correlated = rows.map{|r| Hash[ *columns.zip(r).flatten ] }

That's it. One line. A short line too. Without much line noise (non-alphanumeric characters that make it difficult to read, a reference to actual line noise on terminal programs operating over modems). Is this any better? This depends on who your audience is. For new Ruby programmers, the first solution would probably be best. For experienced programmers, the second is more compact and functionally the same thing.
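To check that the two approaches really are the same thing, here is a self-contained sketch that builds the hashes both ways from a shortened copy of the sample data and compares them. The column names are the receiver of zip so that they become the keys; the looped and zipped variable names are only for this comparison.

```ruby
columns = [ "Name", "Average", "Math", "Science", "English" ]
rows = [
  [ "Alice", 0, 95, 90, 85 ],
  [ "Bob", 0, 65, 70, 90 ]
]

# Loop version: pair each column name with the value at the same index.
looped = []
rows.each do |row|
  h = {}
  columns.each_with_index { |name, i| h[name] = row[i] }
  looped << h
end

# Zip and splat version: zip names with values, flatten the pairs
# into one list, and splat that list into Hash.[].
zipped = rows.map { |r| Hash[ *columns.zip(r).flatten ] }

puts zipped == looped        # true
puts zipped.first["Math"]    # 95
puts zipped.last["Name"]     # Bob
```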

After pulling this trick out of some old code, it seems that later versions of Ruby have made this even easier, letting Hash.[] take an array of key and value pairs directly, with no flatten or splat needed. So Ruby 1.9.3 can simply say rows.map{|r| Hash[ columns.zip(r) ] }.
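A short sketch of the pair-based form follows. Passing pairs also sidesteps a weakness of flatten: if a cell is itself an array, flatten spills its contents into the parameter list. The Scores column and nested row below are hypothetical data made up to show that case, and Array#to_h is an addition not covered above, available from Ruby 2.1 onward.

```ruby
columns = [ "Name", "Average", "Math", "Science", "English" ]
rows = [
  [ "Alice", 0, 95, 90, 85 ],
  [ "Bob", 0, 65, 70, 90 ]
]

# Hash.[] accepts the zipped array of [key, value] pairs as-is.
correlated = rows.map { |r| Hash[ columns.zip(r) ] }
puts correlated.first["English"]   # 85

# Assuming Ruby 2.1 or later, Array#to_h gives the same result.
with_to_h = rows.map { |r| columns.zip(r).to_h }
puts with_to_h == correlated       # true

# Hypothetical row whose second cell is itself an array.
nested_columns = [ "Name", "Scores" ]
nested_row = [ "Alice", [ 95, 90 ] ]

# The pair form keeps the nested array intact as a single value.
nested = Hash[ nested_columns.zip(nested_row) ]
puts nested["Scores"].inspect      # [95, 90]

# flatten would produce five arguments here, an odd count,
# so the splat form raises ArgumentError.
begin
  Hash[ *nested_columns.zip(nested_row).flatten ]
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```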

©2014 About.com. All rights reserved.