Using Glob with Directories


In the previous article in this series, we discussed the basics of the Dir class. Particularly useful was the Dir.foreach method for iterating over all files in a directory. However, this only goes so far. What if you want to iterate over only some files (for example, just XML files) in a directory?

This is where Dir.glob and its cousins come in. By "globbing" a number of files, you can use pattern matching to select just the files you want. Glob patterns look a little like regular expressions, but they are not regular expressions. They're very limited compared to Ruby's regular expressions and are more closely related to shell expansion wildcards.

We'll start off with an example. The following glob will match all files ending in .rb in the current directory. It uses a single wildcard, the asterisk. The asterisk will match zero or more characters, so any visible file ending in .rb will match this glob. The one exception is hidden files: by default, wildcards do not match a leading dot, so a file called simply .rb, with nothing before the dot, is skipped unless you pass the File::FNM_DOTMATCH flag. The glob method will return all files that match the globbing rules as an array, which can be saved for later use or iterated over.

 #!/usr/bin/env ruby
 # Print the name of every .rb file in the current directory
 Dir.glob('*.rb').each do |f|
   puts f
 end
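One detail worth checking for yourself is how the asterisk treats hidden files. In Ruby, wildcards skip names that begin with a dot unless the File::FNM_DOTMATCH flag is passed as the second argument. The sketch below builds a throwaway directory to show the difference; the method name glob_with_and_without_dotfiles and the file names are made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: create a visible and a hidden .rb file in a
# temporary directory, then glob with and without File::FNM_DOTMATCH.
def glob_with_and_without_dotfiles
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'app.rb'), '')
    File.write(File.join(dir, '.rb'), '')
    Dir.chdir(dir) do
      {
        default:  Dir.glob('*.rb').sort,
        dotmatch: Dir.glob('*.rb', File::FNM_DOTMATCH).sort
      }
    end
  end
end

result = glob_with_and_without_dotfiles
puts result[:default].inspect   # ["app.rb"]
puts result[:dotmatch].inspect  # [".rb", "app.rb"]
```

The results are sorted before comparison because the order in which a glob returns files can depend on the system.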

There are only a few wildcards to learn. Below is a list of the wildcards, and following is a code example demonstrating their use.

  • * - Match zero or more characters. A glob consisting of only the asterisk and no other characters or wildcards will match all files in the current directory, except hidden files whose names begin with a dot. The asterisk is usually combined with a file extension or other characters to narrow down the search.

  • ** - Match directories recursively. Written as **/ (followed by a slash), it descends into the directory tree and finds files in sub-directories of the current directory as well as in the current directory itself. This wildcard is explored in the example code.

  • ? - Match any one character. This is useful for finding files whose names are in a particular format. For example, 5 characters and a .xml extension could be expressed as ?????.xml.

  • [a-z] - Match any one character in the character set. The set can be either a list of characters or a range separated with the hyphen character. Character sets use the same syntax and behave the same way as character sets in regular expressions, including negation with a leading caret ([^a-z]).

  • {a,b} - Match pattern a or b. Though this looks like a regular expression quantifier, it isn't. For example, in a regular expression, the pattern a{1,2} will match 1 or 2 'a' characters. In globbing, it will match the string a1 or a2. Other patterns can be nested inside of this construct.
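The wildcards in the list above can be tried out against a small set of known files. The following is a minimal sketch, not a definitive reference: demo_globs is a hypothetical helper, and the file names are invented so that each pattern has something to match.

```ruby
require 'tmpdir'

# Hypothetical helper: create a few empty files in a temporary
# directory and return a hash of glob pattern => sorted matches.
def demo_globs
  names    = %w[a1.txt a2.txt b1.txt notes.xml abcde.xml photo.jpg photo.png]
  patterns = ['*', '?????.xml', '[ab]1.txt', 'a{1,2}.txt', 'photo.{jpg,png}']

  Dir.mktmpdir do |dir|
    names.each { |name| File.write(File.join(dir, name), '') }
    Dir.chdir(dir) do
      patterns.each_with_object({}) do |pattern, matches|
        matches[pattern] = Dir.glob(pattern).sort
      end
    end
  end
end

demo_globs.each do |pattern, files|
  puts "#{pattern} => #{files.inspect}"
end
# ?????.xml       => ["abcde.xml", "notes.xml"]
# [ab]1.txt       => ["a1.txt", "b1.txt"]
# a{1,2}.txt      => ["a1.txt", "a2.txt"]
# photo.{jpg,png} => ["photo.jpg", "photo.png"]
```

Note that a{1,2}.txt picks up a1.txt and a2.txt, which is the brace behavior described in the list rather than a regular expression quantifier.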

One thing to consider is case sensitivity. It's up to the operating system and its filesystem to determine whether TEST.txt and TeSt.TxT refer to the same file. On Linux and other systems with case-sensitive filesystems, these are different files. On Windows, these will refer to the same file.
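If a script needs to behave the same way on both kinds of system, one option is to spell out both cases with character sets instead of relying on the operating system. The sketch below assumes you are looking for .txt files; text_files_any_case is a hypothetical helper name, and the file names are invented.

```ruby
require 'tmpdir'

# Hypothetical helper: list .txt files in dir regardless of how the
# extension is capitalized, using one character set per letter.
def text_files_any_case(dir)
  Dir.chdir(dir) { Dir.glob('*.[tT][xX][tT]').sort }
end

Dir.mktmpdir do |dir|
  %w[TEST.TXT notes.txt image.png].each do |name|
    File.write(File.join(dir, name), '')
  end
  puts text_files_any_case(dir).inspect  # ["TEST.TXT", "notes.txt"]
end
```

This only covers the extension; the part of the name matched by the asterisk is accepted in any case already.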

One final thing to note is the Dir[globstring] convenience method. This is functionally the same as Dir.glob(globstring) and also reads naturally (you are indexing a directory, much like an array). For this reason, you may see Dir[] more often than Dir.glob, but for a plain pattern they do the same thing.
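A quick way to convince yourself that the two forms agree is to run both against the same files. In this sketch, compare_bracket_and_glob is a hypothetical helper and the file names are made up. It also shows that Dir[] accepts several patterns as separate arguments, while Dir.glob takes them as an array. One difference to keep in mind is that Dir[] has no flags argument, so flags such as File::FNM_DOTMATCH require Dir.glob.

```ruby
require 'tmpdir'

# Hypothetical helper: run the same patterns through Dir[] and Dir.glob
# in a temporary directory and return the sorted results side by side.
def compare_bracket_and_glob
  Dir.mktmpdir do |dir|
    %w[a.rb b.rb c.txt].each { |name| File.write(File.join(dir, name), '') }
    Dir.chdir(dir) do
      {
        brackets:      Dir['*.rb'].sort,
        glob:          Dir.glob('*.rb').sort,
        many_brackets: Dir['*.rb', '*.txt'].sort,
        many_glob:     Dir.glob(['*.rb', '*.txt']).sort
      }
    end
  end
end

results = compare_bracket_and_glob
puts results[:brackets] == results[:glob]            # true
puts results[:many_brackets] == results[:many_glob]  # true
```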

The following example program demonstrates several of these patterns, alone and in combination.

 #!/usr/bin/env ruby
 # Get all .xml files
 Dir['*.xml']
 # Get all files with 5 characters and a .jpg extension
 Dir['?????.jpg']
 # Get all jpg, png and gif images
 Dir['*.{jpg,png,gif}']
 # Descend into the directory tree and get all jpg images
 # Note: this will also find jpg images in the current directory
 Dir['**/*.jpg']
 # Descend into all directories starting with Uni and find all
 # jpg images.
 # Note: this only descends down one directory
 Dir['Uni*/*.jpg']
 # Descend into all directories starting with Uni and all
 # subdirectories of directories starting with Uni and find
 # all .jpg images
 Dir['Uni*/**/*.jpg']
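
To see how the recursive patterns differ, it helps to run them against a directory tree whose contents you already know. The sketch below is one way to do that; recursive_glob_demo is a hypothetical helper, and the directory and file names (Unicorn, Other, deep and so on) are invented for the demonstration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: build a small tree of empty .jpg files in a
# temporary directory and return the sorted matches for three globs.
def recursive_glob_demo
  files = %w[top.jpg Unicorn/a.jpg Unicorn/deep/b.jpg Other/c.jpg]

  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      files.each do |path|
        FileUtils.mkdir_p(File.dirname(path))
        File.write(path, '')
      end
      {
        everywhere:   Dir['**/*.jpg'].sort,
        one_level:    Dir['Uni*/*.jpg'].sort,
        uni_and_subs: Dir['Uni*/**/*.jpg'].sort
      }
    end
  end
end

found = recursive_glob_demo
puts found[:everywhere].inspect
# ["Other/c.jpg", "Unicorn/a.jpg", "Unicorn/deep/b.jpg", "top.jpg"]
puts found[:one_level].inspect     # ["Unicorn/a.jpg"]
puts found[:uni_and_subs].inspect  # ["Unicorn/a.jpg", "Unicorn/deep/b.jpg"]
```

The first result includes top.jpg because **/ also matches zero directories, which is why the recursive pattern picks up files in the current directory too.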

©2014 About.com. All rights reserved.