
Dealing With Large Directories


It's tempting to gather all of your directory entries into an Array and iterate over them, but there's a good reason not to do this with large directories. Imagine a very large directory that holds thousands of images, for example the frames of a movie about to be encoded. At 30 frames per second, a 90-minute film comes to 162,000 frames, so the number of files in this directory is going to be very large. Gathering all of these filenames into an Array does two undesirable things.

First, it taxes the operating system's file system code. Walking the directory means a lot of code running in kernel mode, and that code often can't be preempted. The entire system can slow down while the directory is being walked at full speed.

Second, it taxes Ruby's object system by creating one huge Array and many String objects. This can make your program consume large amounts of memory and make creating and destroying objects more expensive. Avoid it if possible.

The solution is to use Dir.open, which works much like File.open. You pass it a block, and the Dir is closed automatically at the end of the block, just as a File is with File.open. Within the block you use the read and seek methods to read entries or to move the pointer. However, instead of reading text or binary data, you'll be reading filenames, one per call. Walking through a directory this way means you never have to gather all of the filenames into one large Array.
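
As a minimal sketch of what moving the pointer looks like, the snippet below saves the current position with Dir#tell, reads one entry, and then returns to the saved position with Dir#seek. The helper name read_first_entry_twice is hypothetical and exists only for illustration. The order in which entries come back depends on the file system, so the sketch only checks that the same entry is read twice.

```ruby
# Hypothetical helper (not from the article): reads the first entry,
# moves the pointer back to where it started, and reads the entry again.
def read_first_entry_twice(path)
  Dir.open(path) do |dir|
    start = dir.tell   # current position in the directory stream
    first = dir.read   # next filename as a String, or nil at the end
    dir.seek(start)    # move the pointer back to the saved position
    again = dir.read   # reads the same entry a second time
    [first, again]
  end
end

first, again = read_first_entry_twice(".")
puts "first read: #{first}, after seek: #{again}"
```

Dir#rewind is also available when you simply want to go back to the beginning.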

The following example prints every entry in the directory named by the first command-line argument. Note that the block form of Dir.open is used. If no block is passed, the method simply returns the Dir object and you must close it yourself later.

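 #!/usr/bin/env ruby
 
 # Print each entry of the directory given as the first argument.
 Dir.open(ARGV[0]) do |dir|
   # Dir#read returns the next filename, or nil at the end of the directory.
   while (file = dir.read)
     puts file
   end
 end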
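
For comparison, here is a rough sketch of the non-block form mentioned above, where you are responsible for closing the Dir yourself. The helper name print_entries and its out parameter are assumptions made for this illustration. An ensure clause is one reasonable way to guarantee the close happens even when an exception is raised part way through.

```ruby
# Hypothetical helper (not from the article): prints every entry using
# the non-block form of Dir.open, closing the handle by hand.
def print_entries(path, out = $stdout)
  dir = Dir.open(path)   # no block given, so the Dir object is returned
  begin
    while (name = dir.read)
      out.puts name
    end
  ensure
    dir.close            # runs even if an exception is raised above
  end
end

print_entries(".")
```

The block form is usually the better choice, since it does the same cleanup with less code.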
 

Alternatively, you can use the Dir#each method, which yields each filename to a block. This does approximately the same thing.

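 #!/usr/bin/env ruby
 
 # Print each entry of the directory given as the first argument.
 Dir.open(ARGV[0]) do |dir|
   # Dir#each yields one filename at a time.
   dir.each do |file|
     puts file
   end
 end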
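
To tie this back to the movie-frame scenario, the sketch below counts the files with a given extension without ever holding the full list of names in memory. The helper name count_by_extension is hypothetical, and skipping the "." and ".." entries is a choice made for this example, not something the scripts above do.

```ruby
# Hypothetical helper (not from the article): counts entries whose
# extension matches ext, looking at one filename at a time.
def count_by_extension(path, ext)
  count = 0
  Dir.open(path) do |dir|
    dir.each do |name|
      next if name == "." || name == ".."   # skip the special entries
      count += 1 if File.extname(name).downcase == ext
    end
  end
  count
end

puts count_by_extension(".", ".png")
```

Only a single counter and the current filename are alive at any moment, which is the whole point of reading the directory incrementally.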
 

©2014 About.com. All rights reserved.