
Fibers

Lightweight Concurrency

A fiber is a lightweight concurrency primitive. Ordered from most expensive to cheapest, processes come first, followed by threads, with fibers very close to the bottom. But to understand how fibers work, you first have to understand a bit about multitasking operating systems.

Multitasking Operating Systems

The primary job of a modern multitasking operating system is to manage multiple tasks (processes, jobs, threads, etc.). Your computer has limited CPU resources: it doesn't have enough cores to run every live process simultaneously, so it has to multitask. To do this, the OS programs a timer interrupt to fire at a set interval (say 100Hz), and each time the interrupt fires the OS pauses the current task, loads another task onto the CPU and lets it run. This is called a context switch, and it takes a certain amount of time. The more tasks you have, the more context switches there will be and the less time the CPU will spend doing actual work. This technique is called preemptive multitasking.

Additionally, creating a task costs memory. Every new process needs its own translation tables for the MMU (Memory Management Unit) as well as its own stack and heap, and every new thread needs at least its own stack. On a Linux system, launching a program (such as from the shell) creates a whole new process that is independent of its parent, while the lighter-weight threads allow a second task to execute in the same memory as another. Processes are more expensive than threads because more memory has to be allocated and the context switches take longer.
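The point that threads execute in the same memory as their creator can be seen from Ruby itself. The following is only a minimal sketch; the helper name collect_from_threads is made up for illustration.

```ruby
# Hypothetical helper: start `count` threads that all write into one shared array.
def collect_from_threads(count)
  results = []      # lives in the memory of the code that starts the threads
  lock = Mutex.new  # shared memory means the threads must coordinate writes

  threads = count.times.map do |i|
    Thread.new do
      # Every thread sees the very same `results` object
      lock.synchronize { results << i * i }
    end
  end

  threads.each(&:join) # wait for every thread to finish
  results.sort         # threads may finish in any order, so sort for a stable result
end

p collect_from_threads(4) # prints [0, 1, 4, 9]
```

A separate process would instead work on its own copy of the array, and the parent would never see the child's changes.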

Fibers are a way to avoid most of this. They're not useful for parallel programming, that is, for getting more than one routine to truly execute at the same time, but they are useful for concurrent programming, or allowing two independent routines to run side by side. This is done by implementing cooperative multitasking. With preemptive multitasking the OS schedules new tasks to run whether the current task is busy or not, and indeed the task doesn't even know when it's being paused to let another one run. With cooperative multitasking, the current job explicitly gives up control so that another job can take its turn. Since the operating system is not involved in the switch, fibers are very cheap, and you can create a large number of them to segment your work.
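To make cooperative multitasking concrete, here is a small sketch of a round-robin loop that runs several fibers in turn. The helper run_cooperatively and the job format are assumptions made for this illustration, not part of Ruby.

```ruby
# Hypothetical helper: run each job as a fiber, round-robin, until all are done.
# Each job is a [name, steps] pair; the log records the order the work happened in.
def run_cooperatively(jobs)
  log = []

  fibers = jobs.map do |name, steps|
    Fiber.new do
      steps.times do |i|
        log << "#{name} step #{i + 1}"
        Fiber.yield # explicitly give up control to whoever resumed us
      end
      :done         # value of the block, returned by the final resume
    end
  end

  until fibers.empty?
    fiber = fibers.shift
    # resume returns :done only once the fiber's block has finished
    fibers.push(fiber) unless fiber.resume == :done
  end

  log
end

puts run_cooperatively([["a", 2], ["b", 3]])
```

Nothing here is interrupted by a timer. Each job runs until it calls Fiber.yield, and a job that never yields would starve the others, which is the trade-off of the cooperative approach.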

A Practical Example

While fibers are not useful for parallel programming, they are very useful when working with a producer-consumer model. In this model some tasks are producers: they make or read data for the consumer to ultimately use. Since producers are often simple loops, they can easily be abstracted out into a fiber.

Say you have a large CSV file you want to consume (for sample data, see Lahman's Baseball Database). You could write the CSV parsing code directly into your consumer loop, but that means the consumer is tightly coupled to the CSV. You could write a class that produces parsed CSV lines, but that is a bunch of code overhead and a lot more work for what should be a simple loop. Enter the Fiber, a handy way to implement this with little more than a block.


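#!/usr/bin/env ruby
require 'fiber'
require 'csv'

producer = Fiber.new do
  # Open the file; the block form closes it when the loop finishes
  File.open(ARGV[0]) do |csv|
    # Skip the first line, which holds the column names
    csv.readline

    until csv.eof?
      # parse_line returns a single row as an array of fields
      Fiber.yield CSV.parse_line(csv.readline)
    end
  end
end

while producer.alive?
  row = producer.resume

  # The final resume runs the fiber to its end and returns nil, not a row
  break if row.nil?

  # Do something with the line
  puts row.join(',')
end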

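In this example, the fiber opens a file and reads single lines. It returns parsed CSV lines as arrays. Every time there is a Fiber.yield call, the execution of the fiber stops and the value is passed to the code that called the fiber's resume method. The fiber only ends when its block returns, so it is still alive after the last line has been yielded. One more resume runs it to completion and returns the block's final value (nil here) rather than a row, which the consumer must not treat as data. Only after that does the alive? method return false.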
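The interplay between Fiber.yield, resume and alive? is easier to see in isolation. The sketch below uses a made-up helper, trace_fiber, that records what each resume returns and whether the fiber is still alive afterwards.

```ruby
require 'fiber' # needed for Fiber#alive? on older Rubies; harmless on newer ones

# Hypothetical helper: resume a small fiber three times and record
# [value returned by resume, alive? afterwards] for each call.
def trace_fiber
  fiber = Fiber.new do
    Fiber.yield :first
    Fiber.yield :second
    :block_result # value of the block, handed back by the last resume
  end

  trace = []
  3.times do
    value = fiber.resume
    trace << [value, fiber.alive?]
  end
  trace
end

p trace_fiber
# prints [[:first, true], [:second, true], [:block_result, false]]
```

Note that the fiber still reports itself alive after the second yield. It takes a third resume to run off the end of the block, and that resume returns the block's own value.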

So what is the advantage of this? First, it decouples the specifics of dealing with a CSV file from the inner loop, so the code that actually matters can be much cleaner. Second, the decoupled producer code is physically near the consumer code. It's not off in some class in another file. It's not some generic CSV parsing class with 100 methods you'll never use. It's right there, easy to be seen and understood. Another advantage is that the producer loop is simpler. It's a true loop: it doesn't need instance variables to remember its state, since the fiber has its own stack and instruction pointer.
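The claim about state can be illustrated with a producer that has to remember something between values. In this sketch, numbered_lines and consume are hypothetical names. The counter is an ordinary local variable, and it survives from one resume to the next because the fiber keeps its own stack.

```ruby
# Hypothetical producer: yields [line_number, text] pairs for non-blank lines.
def numbered_lines(lines)
  Fiber.new do
    number = 0 # plain local variable, preserved between resumes
    lines.each do |text|
      next if text.strip.empty?
      number += 1
      Fiber.yield [number, text]
    end
    nil        # tells the consumer there is nothing left
  end
end

# Hypothetical consumer: collect pairs until the producer returns nil.
def consume(producer)
  pairs = []
  while (pair = producer.resume)
    pairs << pair
  end
  pairs
end

p consume(numbered_lines(["a", "", "b"])) # prints [[1, "a"], [2, "b"]]
```

A class-based version would need an instance variable for the counter and another for the current position in the input. Here both live in the loop itself.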


©2014 About.com. All rights reserved.