As you may already know, strings in Ruby are first-class objects that provide a number of methods for queries and manipulation. (If you don't already know, check out the Using Strings tutorial.) One of the most basic string manipulation actions is to split a string into multiple substrings. You would do this, for example, if you have a string like "foo, bar, baz" and you want the three strings "foo", "bar", and "baz". The split method of the String class can accomplish this for you.
The Basic Usage of 'split'
The most basic usage of the split method is to split a string on a single character or a static sequence of characters. If split's first argument is a string, that entire string is used as the delimiter. Just as the comma separates the fields in comma-delimited data, whatever string you pass marks the boundaries between the pieces of the result.
str = "foo,bar,baz"
puts str.split( "," )
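As a small illustration of the point above, the sketch below shows that a multi-character string argument is treated as one delimiter rather than as a set of separate characters. The sample strings and variable names are made up for this example.

```ruby
# A string delimiter is matched as a whole sequence.
single = "foo,bar,baz".split(",")
p single      # prints ["foo", "bar", "baz"]

# "=>" splits only where both characters appear together, in that order.
pairs = "a=>1=>b".split("=>")
p pairs       # prints ["a", "1", "b"]

# "=" and ">" on their own are not delimiters here, so nothing is split.
untouched = "a=b>c".split("=>")
p untouched   # prints ["a=b>c"]
```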
Add Flexibility With Regular Expressions
However, there are easier ways to delimit the string. Using a regular expression as your delimiter makes the split method a lot more flexible. Again, take for example the string "foo, bar,baz". There is a space after the first comma, but not after the second. If the string "," is used as a delimiter, the space will still exist at the beginning of the "bar" string. If the string ", " is used (with a space after the comma), it will only match the first comma, as the second comma doesn't have a space after it. It's very limiting.
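To make the limitation concrete, here is a short sketch that applies both string delimiters to the example string. The variable names are only for illustration.

```ruby
str = "foo, bar,baz"

# Splitting on "," leaves the leading space on the second field.
by_comma = str.split(",")
p by_comma         # prints ["foo", " bar", "baz"]

# Splitting on ", " only matches the first comma.
by_comma_space = str.split(", ")
p by_comma_space   # prints ["foo", "bar,baz"]
```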
The solution to this problem is to use a regular expression as your delimiter argument instead of a string. Regular expressions allow you to match not only static sequences of characters, but also indeterminate numbers of characters and optional characters.
Writing Regular Expressions
When writing a regular expression for your delimiter, the first step is to describe in words what the delimiter is. In this case, the phrase "a comma that might be followed by one or more spaces" is reasonable. There are two elements to this regex: the comma and the optional spaces. The spaces will use the * (star, or asterisk) quantifier, which means "zero or more." The element immediately preceding the star is matched zero or more times. For example, the regex /a*/ will match a sequence of zero or more 'a' characters. Putting the two elements together gives the regex /, */, a comma followed by zero or more spaces.
str = "foo, bar,baz"
puts str.split( /, */ )
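The following sketch shows what the array looks like when inspected with p instead of puts, and checks that the same regex also copes with several spaces after a comma. The second input string is an assumption added for illustration.

```ruby
str = "foo, bar,baz"

# The comma plus any spaces after it are consumed as the delimiter.
fields = str.split(/, */)
p fields   # prints ["foo", "bar", "baz"]

# The star is greedy, so a run of spaces after a comma is removed as well.
wide = "foo,   bar,baz".split(/, */)
p wide     # prints ["foo", "bar", "baz"]
```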
Limiting the Number of Splits
Imagine a comma-separated value string such as "10,20,30,This is an arbitrary string". This format is three numbers followed by a comment column. This comment column can contain arbitrary text, including text with commas in it. To prevent split from splitting the text of this column, we can set a maximum number of fields to return. Note that this will only work if the comment string with the arbitrary text is the last column of the table.
To limit the number of splits the split method will perform, pass the number of fields in the string as a second argument to the split method. Once that many fields have been produced, the rest of the string is returned unsplit as the last field.
str = "10,20,30,Ten, Twenty and Thirty"
puts str.split( /, */, 4 )
10
20
30
Ten, Twenty and Thirty
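For comparison, this sketch runs the same split with and without the limit argument, so the effect on the comment column is visible. The variable names are illustrative only.

```ruby
str = "10,20,30,Ten, Twenty and Thirty"

# Without a limit, the comma inside the comment is treated as a delimiter too.
unlimited = str.split(/, */)
p unlimited   # prints ["10", "20", "30", "Ten", "Twenty and Thirty"]

# With a limit of 4, the remainder of the string becomes the fourth field.
limited = str.split(/, */, 4)
p limited     # prints ["10", "20", "30", "Ten, Twenty and Thirty"]
```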
Knowing the Limitations
The split method has some rather large limitations. Take for example the string '10,20,"Bob, Eve and Mallory",30'. What's intended is two numbers, followed by a quoted string (that may contain commas), and then another number. The split method cannot correctly separate this string into fields. In order to do this, the string scanner has to be stateful, which means it can remember whether it's inside a quoted string or not. The split scanner is not stateful, so it cannot solve problems like this one.
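To illustrate what "stateful" means here, the sketch below first shows how split breaks the quoted field apart, then walks the string one character at a time while remembering whether it is inside quotes. The split_quoted method is a hypothetical helper written for this example, not part of Ruby, and it makes simplifying assumptions: it does not handle escaped quotes or other edge cases of real CSV data.

```ruby
str = '10,20,"Bob, Eve and Mallory",30'

# split has no notion of quotes, so the quoted field is cut in two.
naive = str.split(",")
p naive    # prints ["10", "20", "\"Bob", " Eve and Mallory\"", "30"]

# Hypothetical helper: split on commas that are outside double quotes.
def split_quoted(line)
  fields = []
  current = +""        # unfrozen empty string for the field being built
  in_quotes = false    # the state that split does not keep
  line.each_char do |ch|
    if ch == '"'
      in_quotes = !in_quotes   # toggle the state and drop the quote itself
    elsif ch == "," && !in_quotes
      fields << current        # a comma outside quotes ends the field
      current = +""
    else
      current << ch
    end
  end
  fields << current            # the last field has no trailing comma
  fields
end

parsed = split_quoted(str)
p parsed   # prints ["10", "20", "Bob, Eve and Mallory", "30"]
```

The only difference from a plain split on commas is the in_quotes flag, which is exactly the piece of memory the split method lacks.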