Finding Links

Most likely, you'll want to find links and click on them. Three methods help you find links: Page#links, Page#link_with and Page#links_with. They also typify the pattern the Page class uses for finding and interacting with links, frames, forms and other elements.

Page#links returns an array of all links on the page. The array contains instances of the class Mechanize::Page::Link. You can use it either to iterate over all links or to search for links manually in ways that Page#link_with cannot. The following snippet fetches Google's homepage, iterates over all links, and prints the text of each one.


require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.google.com/')

# Print the text of every link on the page
page.links.each do |link|
  puts link.text
end

If you need a specific link, or a list of links with a common attribute, use Page#link_with or Page#links_with. The two differ only in what they return: Page#link_with returns the first matching link found, and Page#links_with returns an array of all matching links. Typically, you'll search for links by link text or link URL (its "href"). The following snippet demonstrates how to find a link with the text "News", click on it, and then find all links that lead to www.nytimes.com.


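require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.google.com/')

# Find the first link whose text is exactly "News" and follow it
# (link_with returns nil if nothing matches)
news = page.link_with(:text => 'News').click

# Print the text of every link whose href starts with the nytimes.com URL
news.links_with(:href => %r{^http://www\.nytimes\.com}i).each do |link|
  puts link.text
end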

This snippet illustrates the two typical ways of searching for links: by text and by href. Both the text and the href can be matched with either a String (in which case a simple comparison is used) or a Regexp (remember that %r{} is another way to write a Regexp literal).

This pattern is repeated throughout the Page class, so it's worth investigating a little further. How does it work? For every key in the hash argument to any of these methods, Mechanize sends that key to each link as a method call and compares the result with that key's value. For example, if you call page.link_with(:text => 'News'), Mechanize iterates over every link and does the equivalent of link.text, then compares the result with the string 'News'. In other words, it searches by doing the equivalent of 'News' === link.text on each link and returns the first one for which this evaluates to true. Note that it uses the "triple equals" operator, which also returns true when the left-hand side is a Regexp that matches the text.
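
The matching described above can be sketched in plain Ruby, without Mechanize installed. This is only an illustration of the idea, not Mechanize's actual source: SketchLink, SKETCH_LINKS, sketch_links_with and sketch_link_with are hypothetical names standing in for Mechanize::Page::Link and the Page finder methods, and the URLs are made up.

```ruby
# Hypothetical stand-in for Mechanize::Page::Link (illustration only).
SketchLink = Struct.new(:text, :href)

# Made-up links standing in for the links on a fetched page.
SKETCH_LINKS = [
  SketchLink.new('News',      'http://news.example.com/'),
  SketchLink.new('Images',    '/images'),
  SketchLink.new('More news', '/news/more')
].freeze

# Roughly what links_with does: send each key to the link as a method
# call and compare the result with that key's value using ===.
def sketch_links_with(links, criteria)
  links.select do |link|
    criteria.all? { |key, value| value === link.public_send(key) }
  end
end

# link_with runs the same search but returns only the first match.
def sketch_link_with(links, criteria)
  sketch_links_with(links, criteria).first
end

# A String value must equal the text exactly.
puts sketch_link_with(SKETCH_LINKS, text: 'News').href

# A Regexp value matches any text the pattern matches.
puts sketch_links_with(SKETCH_LINKS, text: /news/i).map(&:text).inspect
```

Because every criterion must match, passing several keys at once narrows the search, for example text: /news/i together with href: %r{\A/} to keep only relative links.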

The Mechanize::Page::Link class is rather bare, so it responds to only a few messages that are useful when searching for links. These are the ones of interest.

  • href - The target URL of the link. This is most often matched with a Regexp, since the target may be an absolute or relative URL.
  • text - The text of the link. Since the exact text of a link is often known, this is a very common way to search for links.
  • dom_id - The id attribute of the link tag. As a shorthand, you can also use the :id key. Mechanize renames this to :dom_id, since id was historically a method defined on every Ruby object. This is a common way to find links that have ids defined, but not all links do.
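
To see why a Regexp is usually the better fit for href, consider the same target written both as an absolute and as a relative URL. The following is a small plain-Ruby sketch; the href values and the helper name hrefs_matching are made up for illustration, and the === comparison mirrors what the finder methods do with each link's href.

```ruby
# Made-up href values: the same page linked absolutely and relatively,
# plus one unrelated link.
SAMPLE_HREFS = [
  'http://www.example.com/news/today',
  '/news/today',
  '/sports'
].freeze

# Hypothetical helper: keep the hrefs that the criterion matches with ===.
def hrefs_matching(hrefs, criterion)
  hrefs.select { |href| criterion === href }
end

# A String criterion only matches an identical href, so the absolute
# form of the URL is missed.
puts hrefs_matching(SAMPLE_HREFS, '/news/today').inspect

# A Regexp anchored at the end of the string matches both forms.
puts hrefs_matching(SAMPLE_HREFS, %r{/news/today\z}).inspect
```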

One final thing to note is that these finder methods accept a block and yield what they return. In other words, instead of doing this:


links = page.links_with(:text => 'Something')
links.each do |link|
  #...
end

You can do this:


page.links_with(:text => 'Something') do |links|
  links.each do |link|
    #...
  end
end

This doesn't technically add any functionality to Mechanize, but it does allow you more freedom in how you want your code to read.
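
For readers curious how a method can both yield and return the same value, here is a minimal sketch in plain Ruby. The method name sketch_words_with and its data are hypothetical; the sketch only mirrors the yield-then-return shape and is not Mechanize's implementation.

```ruby
# Made-up data standing in for the links on a page.
SKETCH_WORDS = %w[news images maps newsletter].freeze

# Finds the matching items, yields them to a block if one was given,
# and returns them either way.
def sketch_words_with(pattern)
  found = SKETCH_WORDS.grep(pattern)
  yield found if block_given?
  found
end

# Without a block: use the return value.
result = sketch_words_with(/news/)
puts result.inspect

# With a block: the same array is passed to the block.
sketch_words_with(/news/) do |words|
  words.each { |word| puts word }
end
```

Both call styles see the same array, so choosing between them is purely a matter of how you want the calling code to read.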

©2014 About.com. All rights reserved.