Thread Pools

In the previous example, we scheduled 10 threads and then just let them go until they were finished. This can be useful, but it is often more useful to fire up 10 or so threads, periodically monitor their status, clean up the completed ones and start new threads in their place. This is referred to as a "thread pool," and it's for when you have a lot of tasks to complete but don't want to start a thread for each and every one of them right away. For example, you have 500 downloads to do, but starting 500 concurrent HTTP connections (perhaps to the same server) would be both inefficient and rude.


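#!/usr/bin/env ruby
require 'thread'
require 'pp'

# Each job is a placeholder: a random number of seconds to sleep.
jobs = Array.new(20) { rand(10) }
max_threads = 3
threads = []

puts "Jobs:"
pp jobs

until jobs.empty? && threads.empty?
  # Top up the pool while jobs remain and there is room for more threads.
  until jobs.empty? || threads.size == max_threads
    threads << Thread.new do
      # Note: popping inside the thread is the race condition discussed below.
      job = jobs.pop
      puts "Sleeping #{job} seconds"
      sleep job
    end
  end

  # Avoid a busy loop while waiting for threads to finish.
  sleep 1

  # Join only threads that have already finished, so join never blocks.
  threads.each do |t|
    t.join unless t.status
  end

  # Drop finished threads from the pool.
  threads.delete_if { |t| !t.status }
end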

In this example, you have a "thread pool," the threads array. This thread pool can hold max_threads threads at a time. The jobs array holds a list of tasks that must be completed (again, using placeholder sleep calls). The main loop will continue until both the jobs array and thread pool are empty (until all jobs have been launched, and until all job threads have finished).

The first inner loop makes sure there are always max_threads jobs running unless the jobs array is empty. Each thread grabs a job from the list and gets working on it. There is, however, a bug here called a "race condition" where more than one thread tries to access the jobs list simultaneously (or, since threads in the standard Ruby interpreter don't actually run Ruby code simultaneously, almost simultaneously) and the jobs list becomes corrupted, or two threads end up with the same job. Because each thread pops its job only after it has started, the main loop can also see a non-empty jobs array and start more threads than there are jobs left, leaving a thread with nil instead of a job. To see how this is resolved, see the next article on mutexes (mutual exclusion locks).
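As a rough illustration of the problem above, here is one possible way to sidestep the race without a mutex (this is not the mutex-based fix the next article covers): the main thread pops each job itself and hands it to the new thread as an argument, so only one thread ever touches the jobs array. The run_pool method name, the done queue and the short sleep values are assumptions made up for this sketch.

```ruby
require 'thread'

# Hypothetical helper: runs each job (a number of seconds to sleep) in its
# own thread, with at most max_threads threads alive at a time.
def run_pool(jobs, max_threads)
  jobs = jobs.dup          # leave the caller's array untouched
  threads = []
  done = Queue.new         # thread-safe collector for finished jobs

  until jobs.empty? && threads.empty?
    until jobs.empty? || threads.size == max_threads
      job = jobs.pop                      # popped by the main thread only
      threads << Thread.new(job) do |j|   # each thread receives its own job
        sleep j
        done << j
      end
    end

    sleep 0.01

    threads.each { |t| t.join unless t.status }
    threads.delete_if { |t| !t.status }
  end

  # Every thread has finished, so draining the queue cannot block.
  Array.new(done.size) { done.pop }
end

finished = run_pool([0.03, 0.01, 0.02, 0.01, 0.02], 2)
puts "Finished #{finished.size} jobs"
```

Since a thread is only created once a job has actually been taken from the array, no thread can end up with nil or with a job that another thread already has.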

The main thread then sleeps. The sleep isn't strictly required, but without it the main loop would check for finished threads continually and take 100% of your CPU resources (on a single core), so throwing some kind of sleep in there is a good idea, even if it's just for a fraction of a second. The final two loops join finished threads and delete them from the thread pool. Both of these loops make use of the Thread#status method. If they were to simply call join as in the previous example, the main thread would block until the thread being queried finished, even if other threads had already finished and jobs were still waiting. Instead, by checking a thread's status first and only joining it if it's finished, no time is wasted. The Thread#status method returns false if the thread finished normally and nil if it terminated with an exception (both of which evaluate to false in a boolean expression).
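To make the Thread#status values concrete, here is a small sketch that starts three throwaway threads and reports their status. The sample_statuses method and the thread names are hypothetical, chosen only for this demonstration.

```ruby
# Hypothetical helper: returns the Thread#status of three sample threads.
def sample_statuses
  sleeper  = Thread.new { sleep }   # never finishes on its own
  finished = Thread.new { :done }   # finishes normally
  failed   = Thread.new do
    Thread.current.report_on_exception = false  # keep the demo quiet
    raise "boom"
  end

  finished.join
  begin
    failed.join                     # join re-raises the thread's exception
  rescue RuntimeError
    # expected: the thread terminated with an exception
  end
  Thread.pass until sleeper.status == "sleep"

  result = {
    sleeper:  sleeper.status,
    finished: finished.status,
    failed:   failed.status
  }

  sleeper.kill                      # clean up the thread that is still alive
  sleeper.join
  result
end

p sample_statuses
```

The live thread reports the string "sleep", the thread that finished normally reports false, and the thread that raised reports nil. Note that joining the failed thread re-raises its exception in the calling thread, which is why the sketch rescues it.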

This type of thread pooling is commonplace. When you have a large number of IO-bound tasks (or CPU-bound tasks and are using JRuby), it can seriously speed up your Ruby programs.
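As a hedged illustration of that speedup, the sketch below times the same placeholder tasks run one after another and then run by a few worker threads. It swaps in a different technique from the example above: a fixed set of workers pulling from Ruby's thread-safe Queue class. The fake_io, time_sequential and time_pooled names are made up for this sketch, and sleep again stands in for real IO.

```ruby
require 'thread'
require 'benchmark'

# Placeholder for an IO-bound task (hypothetical): just sleeps.
def fake_io(seconds)
  sleep seconds
end

# Runs every task in the calling thread, one after another.
def time_sequential(durations)
  Benchmark.realtime { durations.each { |d| fake_io(d) } }
end

# Runs the tasks on worker_count threads that share a single Queue.
def time_pooled(durations, worker_count)
  Benchmark.realtime do
    queue = Queue.new
    durations.each { |d| queue << d }
    queue.close   # once closed and empty, pop returns nil instead of blocking

    workers = Array.new(worker_count) do
      Thread.new do
        while (d = queue.pop)
          fake_io(d)
        end
      end
    end
    workers.each(&:join)
  end
end

durations = Array.new(9) { 0.05 }
puts "Sequential: #{time_sequential(durations).round(2)}s"
puts "Pooled (3): #{time_pooled(durations, 3).round(2)}s"
```

Because the tasks spend their time waiting rather than computing, the pooled run should finish in roughly a third of the sequential time here. Exact timings will vary from machine to machine.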

©2014 About.com. All rights reserved.