ARGF, a Shortcut for Scripts

A common task in system scripts is to read every file named in the argument list and process them all as a single stream of bytes. For example, suppose you keep the expenses of each university department in its own CSV file. A script that tabulates the total expenses can either open and read each file in turn, or simply use the ARGF shortcut.

Preparing for ARGF

Before you start using ARGF, know that it assumes all non-file arguments have been removed from ARGV. You could do this by hand, simply removing any string from ARGV that begins with a dash character -, but this is crude and ham-fisted: your script is then incapable of referring to any file whose name starts with a dash. For example, somescript.rb -a -b -- -a (where the last -a refers to a file) will not work with this naive method. So, before using ARGF, it's best to properly parse the options out of ARGV.
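To make the problem concrete, here is a minimal sketch of the naive method. The helper name naive_file_arguments is made up for this illustration and is not part of Ruby or any library.

```ruby
# Naive approach: treat every argument that starts with a dash as an option.
# (naive_file_arguments is a hypothetical helper, for illustration only.)
def naive_file_arguments(args)
  args.reject { |arg| arg.start_with?('-') }
end

# The arguments from: somescript.rb -a -b -- -a
p naive_file_arguments(['-a', '-b', '--', '-a'])  # prints []
```

The file named -a is thrown away along with the real options, so there would be nothing left for ARGF to read.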

Option parsing can be done simply with either the OptionParser library from Ruby's standard library or one of its many replacements. The following example defines a few dummy options and prints whatever remains in ARGV so we can test this out.


#!/usr/bin/env ruby
require 'optparse'

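options = {}
optparse = OptionParser.new do |opts|
  opts.on('-h', '--help', 'Display onscreen help') do
    puts opts
    exit
  end

  # Dummy switches; each one just records that it was seen
  opts.on('-a') { options[:a] = true }
  opts.on('-b') { options[:b] = true }
  opts.on('-c') { options[:c] = true }
end

# Parse the options and remove them (and any "--" separator) from ARGV
optparse.parse!

# Whatever remains should be file names
puts ARGV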

When called like ./optparsetest.rb -a -b -- test -a we expect to see test and the second -a remain in ARGV. Since that -a occurs after the double dash, by convention it is not a command-line option. And indeed, this program prints test and -a, each on its own line, when run with those arguments. So, now that we've cleared the options out of ARGV, it's time to hand what's left to ARGF.
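The hand-off from option parsing to ARGF might look something like the following sketch. The -n/--number switch and the method names parse_options! and print_lines are invented for this example; only OptionParser and ARGF themselves come from Ruby.

```ruby
require 'optparse'

# Parses a hypothetical -n/--number switch out of argv (destructively)
# and returns the options. Whatever is left in argv is file names.
def parse_options!(argv = ARGV)
  options = { number: false }
  parser = OptionParser.new do |opts|
    opts.on('-n', '--number', 'Prefix each line with its line number') do
      options[:number] = true
    end
  end
  parser.parse!(argv)
  options
end

# Prints every line from source, numbered if requested.
# source defaults to ARGF, but anything with each_line will do.
def print_lines(options, source = ARGF, out = $stdout)
  source.each_line.with_index(1) do |line, index|
    out.print(options[:number] ? "#{index}: #{line}" : line)
  end
end

# In a real script the last two lines would be:
#   options = parse_options!
#   print_lines(options)
```

Because parse_options! runs first, ARGF only ever sees the arguments that OptionParser left behind.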

Start Using ARGF

You can now start using ARGF as you would any other IO object. It handles opening each file in turn and gluing their contents together into one stream. If ARGV is empty, ARGF reads from standard input instead. So, if you want to read the contents of every file into a single string, you can simply say ARGF.read and Ruby will handle the rest.
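As a small sketch of reading everything at once, the following method counts the bytes and newlines in the whole stream. The name stream_stats is made up for this example.

```ruby
# Returns [byte_count, newline_count] for everything the source yields.
# (stream_stats is a hypothetical helper; source defaults to ARGF.)
def stream_stats(source = ARGF)
  data = source.read  # one String holding every file's contents, in order
  [data.bytesize, data.count("\n")]
end

# In a script run as: ./stats.rb first.txt second.txt
#   bytes, lines = stream_stats
#   puts "#{bytes} bytes, #{lines} lines"
```

Note that this pulls every file into memory at once, which is fine for small inputs but may not be what you want for very large ones.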

For the university example above, a more realistic approach is to read the files line by line using ARGF.each_line.
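Here is a rough sketch of what that could look like. It assumes each line has the made-up form description,amount with no header row and no quoted fields, which the original files may well not follow; real CSV data is better handled with Ruby's csv library.

```ruby
# Sums the amount column of every line in the stream.
# Assumed (hypothetical) line format: "description,amount".
def total_expenses(source = ARGF)
  total = 0.0
  source.each_line do |line|
    next if line.strip.empty?  # ignore blank lines
    _description, amount = line.chomp.split(',')
    total += Float(amount)     # raises if the amount is missing or malformed
  end
  total
end

# In a script run as: ./total.rb physics.csv chemistry.csv
#   puts total_expenses
```

The method never needs to know where one department's file ends and the next begins. A production script would probably also use BigDecimal or integer cents rather than floats for money.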

Duck Typing

ARGF is not an IO object. It does, however, provide most of the methods that IO objects provide for reading. So, if it walks like a duck and quacks like a duck, you can treat it as a duck. Not every IO method is available on ARGF, but the ones you'd want are all represented.

  • binmode and binmode? - binmode puts ARGF into binary mode, and binmode? reports whether it is in binary mode. In binary mode Ruby performs no newline or encoding conversion and treats the content as raw bytes. If you're dealing with binary data or text from another operating system (Windows and Linux use different line ending characters, for example), you may want to set binmode.
  • close - While you cannot close ARGF entirely, this method will close the current file, and future reads from ARGF will read from the next file in the list. Standard input is never closed.
  • each and each_line - These are the main ways to iterate over all data in the ARGF stream. each is an alias for each_line: both split the stream into lines and yield them one at a time to the passed block, moving from file to file as needed. An optional argument changes the line separator.
  • eof? - Returns true if the current file has been read to its end and has no more bytes to read.
  • file - Returns the current file as an IO or File object ($stdin when reading from standard input).
  • rewind - Moves back to the first byte of the current file. It does not return to files that ARGF has already finished reading.
  • skip - Go to the next file in the list. It has no effect if there are no more files.
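
As a sketch of the file-by-file methods, the following collects only the first line of each file and uses skip to jump ahead. The method name first_lines is made up for this example. ARGF.gets and ARGF.filename are not in the list above, but both are standard ARGF methods.

```ruby
# Returns a Hash mapping each file name to that file's first line.
# (first_lines is a hypothetical helper, for illustration only.)
def first_lines
  firsts = {}
  while (line = ARGF.gets)
    firsts[ARGF.filename] = line.chomp
    ARGF.skip  # done with this file; the next gets reads from the next one
  end
  firsts
end

# In a script run as: ./heads.rb physics.csv chemistry.csv
#   first_lines.each { |name, line| puts "#{name}: #{line}" }
```

The loop ends when gets returns nil, which happens once there are no more files left to move on to.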

©2014 About.com. All rights reserved.