What's new in Ruby 2.1.0: Strings

Freezing and Scrubbing Strings

This article is part of a series. To read more, see What's new in Ruby 2.1?.

Strings. You cannot avoid them. Every program uses strings of some kind. There are a few changes and tweaks you can use to your advantage in Ruby 2.1.0.

Frozen Strings

Strings are a big issue in Ruby when it comes to performance. Every time a string literal is evaluated, a new String object is created and the string data is copied into it. This means that if you have a string literal in your method, a new string object is created every time you call the method. This is a relatively expensive operation.

Most of the time, you can get around this using symbols. A symbol can be thought of as a single instance of a string. Every time you refer to a symbol of the same name, you refer to that single instance of Symbol. For example, I can say :sasquatch in one method and refer to :sasquatch again in a totally unrelated method, and both times they'll refer to the same Symbol object. Only one object exists for each symbol.
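The difference between string literals and symbols can be checked with a short sketch. The method names string_literal and symbol_literal are hypothetical, chosen only for this illustration, and the sketch assumes the file does not enable frozen string literals (an option added in later Ruby versions).

```ruby
# Hypothetical helper methods, named only for this illustration.
# Assumes frozen string literals are NOT enabled for this file.
def string_literal
  "sasquatch"   # a new String object is allocated on every call
end

def symbol_literal
  :sasquatch    # always the same Symbol object
end

# equal? compares object identity, not contents.
puts "strings share one object: #{string_literal.equal?(string_literal)}"
puts "symbols share one object: #{symbol_literal.equal?(symbol_literal)}"

# == compares contents, so the two separate strings still match.
puts "string contents match: #{string_literal == string_literal}"
```

Under those assumptions, the first line should report false and the other two true: the strings are equal in content but live in separate objects, while the symbol is one shared object.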

Symbols are fast, but they're not strings. You can't split a symbol, or modify it, or do much of anything with it beyond simple inspection without converting it to a string (which kind of defeats the purpose of using a symbol). Symbols work well as names, as in "the thing called :sasquatch," but manipulating them in any way is out of the question. What's needed is something in between a symbol and a mutable string. This is what frozen strings are for.
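As a small illustration of that limitation, the sketch below tries to split a symbol. The symbol name :big_foot_sighting is an arbitrary example.

```ruby
name = :big_foot_sighting   # arbitrary example symbol

# Symbol does not implement split, so this check is false.
puts name.respond_to?(:split)

# Converting to a String first makes the string methods available,
# at the cost of building a String object for the conversion.
words = name.to_s.split("_")
p words
```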

During the Ruby 2.1 preview releases, you were able to create frozen strings using "some string"f. Notice the f suffix. That syntax was removed before the final 2.1.0 release, and the more standard "some string".freeze method syntax was optimized instead. A string literal frozen this way is created once and is immutable just like a symbol, but all of the traditional string methods (barring those that modify the string) are available. While frozen strings are not new, the optimization of the .freeze method on string literals is new.
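A minimal sketch of the .freeze behavior, assuming Ruby 2.1 or later. The method name frozen_greeting is hypothetical and used only for illustration.

```ruby
# Hypothetical method name, used only for illustration.
def frozen_greeting
  "some string".freeze   # literal + .freeze: Ruby 2.1+ can reuse one frozen object
end

first  = frozen_greeting
second = frozen_greeting

puts first.frozen?          # is the string immutable?
puts first.equal?(second)   # identity check: do both calls return one object?

# Non-mutating methods still work and return new strings.
puts first.upcase
p first.split(" ")

# Mutating methods raise an error. Newer Rubies raise FrozenError,
# which is a subclass of RuntimeError, so this rescue covers both.
begin
  first << "!"
rescue RuntimeError => e
  puts "cannot modify: #{e.class}"
end
```

On Ruby 2.1 and later, both checks at the top should print true, since every call to the method hands back the same frozen object rather than allocating a new one.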

Scrubbing Strings

Scrubbing strings is important when working with multibyte encoded strings (which Ruby defaults to now). "Scrubbing" a string means removing or replacing invalid byte sequences while keeping all valid byte sequences intact. This matters when you send strings downstream to "dumber" clients that might display invalid data as garbage.

The String#scrub method replaces all invalid byte sequences in a string and returns the new string. It can be called with no arguments like "string".scrub, which replaces each invalid sequence with the Unicode replacement character U+FFFD (or with ? in non-Unicode encodings). To "delete" invalid sequences instead, pass an empty string: "string".scrub(""). You can also call it with a simple string replacement, such as "abc\u3042\x81".scrub("X"), which replaces the invalid sequence \x81 with X, producing the string "abc\u3042X". You can also use a block, and the docs for the method give a very useful technique for debugging: a block that replaces all invalid sequences with their hex equivalent so you can see the invalid sequences without delving into the Ruby debugger or IRB: "abc\u3042\xE3\x80".scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' }. This will produce the string "abc\u3042<e380>".
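The calls to String#scrub can be tried side by side in a short sketch, assuming Ruby 2.1 or later and a UTF-8 source file (the default). The helper name show_invalid_bytes is hypothetical; its block is the one from the method's documentation.

```ruby
# "\x81" on its own is not a valid UTF-8 byte sequence.
dirty = "abc\u3042\x81"

puts dirty.valid_encoding?            # false: the string contains invalid bytes

default_scrub = dirty.scrub           # invalid bytes become U+FFFD
deleted       = dirty.scrub("")       # invalid bytes are dropped
replaced      = dirty.scrub("X")      # invalid bytes become "X"

puts default_scrub == "abc\u3042\uFFFD"
puts deleted == "abc\u3042"
puts replaced == "abc\u3042X"

# Hypothetical helper name; wraps the debugging block from the scrub docs.
# Each invalid byte sequence is replaced with its hex digits in angle brackets.
def show_invalid_bytes(text)
  text.scrub { |bytes| "<" + bytes.unpack("H*")[0] + ">" }
end

# "\xE3\x80" is a truncated three-byte UTF-8 sequence.
puts show_invalid_bytes("abc\u3042\xE3\x80") == "abc\u3042<e380>"
```

All of these return new strings. The original string is left untouched, and String#scrub! is the in-place variant.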

©2014 About.com. All rights reserved.