The "Mechanize::RobotsDisallowedError: Robots access is disallowed for URL" error is raised when Mechanize is configured to honor robots.txt and you try to fetch a page that robots.txt forbids.
What Causes the Error?
The robots.txt file is a plain-text file in the root directory of many web servers. Its purpose is to tell robots, web crawlers, spiders and bots which directories or URLs on the server they should not visit. Obeying robots.txt is entirely voluntary, and by default Mechanize ignores it; you must turn checking on manually. To turn it on, use the Mechanize#robots= method.
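For illustration, a hypothetical robots.txt that forbids all crawlers from a /something/ directory (matching the example URL used later in this article) might look like this:

```
User-agent: *
Disallow: /something/
```

Each Disallow rule applies to any URL path beginning with the given prefix, for every crawler matched by the preceding User-agent line.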
agent = Mechanize.new
agent.robots = true
If, after turning robots.txt checking on, you try to visit any of the URLs disallowed by that domain's robots.txt file, you will get the error "Mechanize::RobotsDisallowedError: Robots access is disallowed for URL".
How do I Fix the Error?
Since robots.txt works on the honor system, you can simply turn off this behavior.
agent.robots = false
Alternatively, you can catch the error and act accordingly.
agent = Mechanize.new
agent.robots = true
begin
  page = agent.get('http://example.com/something/disallowed.txt')
  #...
rescue Mechanize::RobotsDisallowedError => e
  puts "Page disallowed by robots.txt"
  #...
end