
Splitting Strings

Unless user input is a single word or number, that input will need to be split, or turned into a list of strings or numbers. For instance, if a program asks for your full name, including middle initial, it will first need to split that input into three separate strings before it can work with your individual first, middle and last name. This is achieved using the String#split method.

In its most basic form, String#split takes a single argument: the field delimiter as a string. The delimiter is removed from the output, and an array of the strings found between delimiters is returned. So, in the following example, assuming the user enters their name correctly, you should receive a 3-element Array from the split.


#!/usr/bin/env ruby
print "What is your full name? "
full_name = gets.chomp

name = full_name.split(' ')
puts "Your first name is #{name.first}"
puts "Your last name is #{name.last}"

If we run this program and enter a name, we'll get the expected results. Also note that name.first and name.last are conveniences. The name variable is an Array, and those two method calls are equivalent to name[0] and name[-1] respectively.


$ ruby split.rb
What is your full name? Michael C. Morin
Your first name is Michael
Your last name is Morin
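
To make that equivalence concrete, here is a small sketch that uses a hard-coded sample name in place of user input (the sample string is only an illustration):

```ruby
# A hard-coded sample name stands in for user input.
name = "Michael C. Morin".split(' ')

p name                       # the full Array returned by split
p name.first == name[0]      # first is shorthand for index 0
p name.last  == name[-1]     # last is shorthand for index -1
```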

But String#split is a little bit smarter than you'd think. If the argument to String#split is a string, it is used as the delimiter. If that string is a single space (as we used), however, String#split infers that you mean to split on any run of whitespace and that you also want leading whitespace ignored. So, if we were to give it some slightly malformed input such as Michael C. Morin with extra spaces between the words, String#split would still do what is expected. That's the only special case when you pass a String as the first argument.
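
The difference is easiest to see by comparing the single-space String with a regular expression that matches exactly one space. This is a minimal sketch; the messy sample input and the variable names are assumptions for illustration.

```ruby
# Sample input with leading and repeated spaces.
messy = "  Michael   C.   Morin"

# Single-space String delimiter: runs of whitespace are treated as one
# delimiter and leading whitespace is ignored.
by_space_string = messy.split(' ')

# A Regexp matching exactly one space gets no special treatment, so
# empty strings appear wherever two spaces are adjacent.
by_space_regexp = messy.split(/ /)

p by_space_string
p by_space_regexp
```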

Regular Expression Delimiters

However, you can also pass a regular expression as the first argument. Here, String#split becomes a bit more flexible, and we can make our little name-splitting code a bit smarter. We don't want the period at the end of the middle initial. We know it's a middle initial, and the database won't want a period there, so we can remove it while we split. When String#split matches a regular expression, it does exactly the same thing as when it matches a string delimiter: it removes the matched text from the output and splits the string at that point. So we can evolve our example a little bit.


$ cat split.rb
#!/usr/bin/env ruby
print "What is your full name? "
full_name = gets.chomp

name = full_name.split(/\.?\s+/)
puts "Your first name is #{name.first}"
puts "Your middle initial is #{name[1]}"
puts "Your last name is #{name.last}"
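
To see what that pattern consumes, the sketch below applies it to a few hard-coded names (the sample strings and variable names are assumptions for illustration). Note that the pattern only removes a period that is followed by whitespace.

```ruby
# Optional period followed by one or more whitespace characters.
pattern = /\.?\s+/

with_period     = "Michael C. Morin".split(pattern)   # period after the initial is consumed
without_period  = "Michael C Morin".split(pattern)    # also works when no period is present
trailing_period = "Michael C. Morin.".split(pattern)  # final period stays: no whitespace follows it

p with_period
p without_period
p trailing_period
```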

Default Field Separator

Ruby is not real big on the "special variables" that you might find in languages like Perl, but String#split does use one you need to be aware of. This is the default field separator variable, also known as $; (the record separator is a different variable, $/). It's a global, something you don't often see in Ruby, so changing it might affect other parts of the code; make sure you change it back afterward. All this variable does is act as the default value for the first argument to String#split. By default, this variable is nil. When both the first argument and $; are nil, String#split behaves as if it had been passed a single-space string.
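
One way to keep a change to $; from leaking is to wrap it in a method that restores the old value in an ensure clause. The helper name below is hypothetical, and this is only a sketch: recent Ruby versions may print a deprecation warning when $; is set to a non-nil value, so passing the delimiter explicitly is usually the simpler choice.

```ruby
# Hypothetical helper: temporarily sets $; and always restores it.
def split_with_default_separator(str, separator)
  saved = $;
  $; = separator
  str.split              # no argument, so split falls back to $;
ensure
  $; = saved             # restore the old value even if split raises
end

fields = split_with_default_separator("a,b,c", ",")
after_restore = "a b c".split   # $; is back to its old value here

p fields
p after_restore
```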

Zero-Length Delimiters

If the delimiter passed to String#split is a zero-length string or regular expression, String#split acts a bit differently. It removes nothing at all from the original string and splits on every character. This turns the string into an array of one-character strings, one for each character in the original. This can be useful for iterating over the string, and it was used in Ruby versions before 1.9.x and 1.8.7 (which backported a number of features from 1.9.x) to iterate over the characters in a string without worrying about breaking up multi-byte Unicode characters. However, if what you really want to do is iterate over a string and you're using 1.8.7, 1.9.x or later, you should probably use String#each_char instead.


#!/usr/bin/env ruby

str = "She turned me into a newt!"
str.split('').each do |c|
  puts c
end
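
For comparison, here is a minimal sketch showing that the zero-length split and the character-oriented methods produce the same array on a modern Ruby (the short sample string and variable names are assumptions for illustration):

```ruby
str = "newt!"

via_split     = str.split('')        # zero-length delimiter
via_each_char = str.each_char.to_a   # enumerator converted to an Array
via_chars     = str.chars            # shorthand that returns an Array

p via_split
p via_split == via_each_char && via_each_char == via_chars
```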

Limiting The Length of the Returned Array

So back to our name parsing example: what if someone has a space in their last name? For instance, Dutch surnames can often begin with "van" (meaning "of" or "from"). We only really want a 3-element array, so we can use the second argument to String#split that we have so far ignored. The second argument is expected to be an integer (a Fixnum in older versions of Ruby, an Integer since Ruby 2.4). If this argument is positive, the returned array will contain at most that many elements, and the last element will hold the rest of the string unsplit. So in our case, we would want to pass 3 for this argument.


#!/usr/bin/env ruby
print "What is your full name? "
full_name = gets.chomp

name = full_name.split(/\.?\s+/, 3)
puts "Your first name is #{name.first}"
puts "Your middle initial is #{name[1]}"
puts "Your last name is #{name.last}"

And if we run this again and give it a Dutch name, it will act as expected.


$ ruby split.rb
What is your full name? Vincent Willem van Gogh
Your first name is Vincent
Your middle initial is Willem
Your last name is van Gogh

However, if this argument is negative (any negative number), there is no limit on the number of elements in the output array, and any trailing delimiters appear as zero-length strings at the end of the array. Without a negative limit, those trailing empty strings are dropped. This is demonstrated in the following IRB snippet.


:001 > "this,is,a,test,,,,".split(',', -1)
 => ["this", "is", "a", "test", "", "", "", ""] 
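
The three behaviors of the limit argument can be put side by side. This sketch reuses the string from the IRB snippet; the variable names are assumptions for illustration.

```ruby
csv_line = "this,is,a,test,,,,"

no_limit       = csv_line.split(',')       # trailing empty fields are dropped
positive_limit = csv_line.split(',', 2)    # at most 2 elements; the remainder is left unsplit
negative_limit = csv_line.split(',', -1)   # no limit; trailing empty fields are kept

p no_limit
p positive_limit
p negative_limit
```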

©2014 About.com. All rights reserved.