
Error: "Mechanize::RobotsDisallowedError: Robots access is disallowed for URL"


The "Mechanize::RobotsDisallowedError: Robots access is disallowed for URL" error is raised by Mechanize when robots.txt is honored and you have tried to fetch a page forbidden by robots.txt.

What Causes the Error?

The robots.txt file is a file in the root directory of many web servers. Its purpose is to tell robots, web crawlers, spiders and bots which directories or URLs on the server they should not visit. Obeying robots.txt is completely optional, so your Mechanize program is free to ignore it. Ignoring it is the default behaviour; you must turn checking on manually with the Mechanize#robots= method.

agent = Mechanize.new
agent.robots = true

If, after you turn robots.txt checking on, you try to visit any URL that the robots.txt file for that domain disallows, you will get the error "Mechanize::RobotsDisallowedError: Robots access is disallowed for URL".
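
To make the idea of a "disallowed" URL concrete, here is a rough sketch of how Disallow rules can be matched against a path. It is not how Mechanize implements the check internally. The sample robots.txt contents and the helper names disallowed_prefixes and disallowed? are invented for illustration, and the matching is deliberately simplified: it only looks at rules under "User-agent: *", compares by plain prefix, and ignores Allow lines and wildcards.

```ruby
# Made-up robots.txt contents, used only for this illustration.
SAMPLE_ROBOTS_TXT = <<~ROBOTS
  # Keep all robots out of these paths
  User-agent: *
  Disallow: /something/
  Disallow: /private.html
ROBOTS

# Hypothetical helper: collect the Disallow values that apply to every
# user agent ("*"). Rules for specific user agents are skipped.
def disallowed_prefixes(robots_txt)
  prefixes = []
  applies = false
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip      # drop comments and whitespace
    next if line.empty?
    field, value = line.split(':', 2).map(&:strip)
    next if value.nil?                    # not a "field: value" line
    case field.downcase
    when 'user-agent'
      applies = (value == '*')
    when 'disallow'
      prefixes << value if applies && !value.empty?
    end
  end
  prefixes
end

# Hypothetical helper: a path is disallowed if it starts with any prefix.
def disallowed?(path, robots_txt)
  disallowed_prefixes(robots_txt).any? { |prefix| path.start_with?(prefix) }
end

puts disallowed?('/something/disallowed.txt', SAMPLE_ROBOTS_TXT)  # prints true
puts disallowed?('/index.html', SAMPLE_ROBOTS_TXT)                # prints false
```

With robots.txt checking turned on, a request for a path like the first one is the kind that raises the error.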

How do I Fix the Error?

Since robots.txt works on the honor system, you can simply turn off this behavior.

agent.robots = false

Alternatively, you can catch the error and act accordingly.

agent = Mechanize.new
agent.robots = true

begin
  page = agent.get('http://example.com/something/disallowed.txt')
rescue Mechanize::RobotsDisallowedError => e
  puts "Page disallowed by robots.txt"
end

©2014 About.com. All rights reserved.