Mechanize

The Mechanize library automates interaction with websites. It can follow links, populate form fields, and submit forms. It also maintains a history of visited URLs that can be queried.

Example

  require 'mechanize'
  require 'logger'

  agent = Mechanize.new
  agent.log = Logger.new "mech.log"
  agent.user_agent_alias = 'Mac Safari'

  page = agent.get "http://www.google.com/"
  search_form = page.form_with :name => "f"
  search_form.field_with(:name => "q").value = "Hello"

  search_results = agent.submit search_form
  puts search_results.body

Issues with mechanize

If you think you have a bug with mechanize, but aren’t sure, please file a ticket at github.com/tenderlove/mechanize/issues

Here are some common problems you may experience with mechanize:

Problems connecting to SSL sites

Mechanize defaults to validating SSL certificates using the default CA certificates for your platform. At this time, Windows users do not have integration between the OS default CA certificates and OpenSSL. # explains how to download and use Mozilla’s CA certificates to allow SSL sites to work.

Problems with content-length

Some sites return an incorrect content-length value. Unlike a browser, mechanize raises an error when the content-length header does not match the response length since it does not know if there was a connection problem or if the mismatch is a server bug.

The error raised, Mechanize::ResponseReadError, can be converted to a parsed Page, File, etc. depending upon the content-type:

  agent = Mechanize.new
  uri = URI 'http://example/invalid_content_length'

  begin
    page = agent.get uri
  rescue Mechanize::ResponseReadError => e
    page = e.force_parse
  end

Constants

VERSION

The version of Mechanize you are using.

AGENT_ALIASES

Supported User-Agent aliases for use with user_agent_alias=. The description in parentheses is for informative purposes and is not part of the alias name.

  • Linux Firefox (3.6.1)

  • Linux Konqueror (3)

  • Linux Mozilla

  • Mac Firefox (3.6)

  • Mac Mozilla

  • Mac Safari (5)

  • Mac Safari 4

  • Mechanize (default)

  • Windows IE 6

  • Windows IE 7

  • Windows IE 8

  • Windows IE 9

  • Windows Mozilla

  • iPhone (3.0)

Example:

  agent = Mechanize.new
  agent.user_agent_alias = 'Mac Safari'

Attributes

history_added[RW]

Callback which is invoked with the page that was added to history.

Public Class Methods

new()

Creates a new mechanize instance. If a block is given, the created instance is yielded to the block for setting up pre-connection state such as SSL parameters or proxies:

  agent = Mechanize.new do |a|
    a.proxy_host = 'proxy.example'
    a.proxy_port = 8080
  end

  # File lib/mechanize.rb, line 148
  def initialize
    @agent = Mechanize::HTTP::Agent.new
    @agent.context = self
    @log = nil

    # attr_accessors
    @agent.user_agent = AGENT_ALIASES['Mechanize']
    @watch_for_set    = nil
    @history_added    = nil

    # attr_readers
    @pluggable_parser = PluggableParser.new

    @keep_alive_time  = 0

    # Proxy
    @proxy_addr = nil
    @proxy_port = nil
    @proxy_user = nil
    @proxy_pass = nil

    @html_parser = self.class.html_parser

    @default_encoding = nil
    @force_default_encoding = false

    # defaults
    @agent.max_history = 50

    yield self if block_given?

    @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
  end
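
The listing assigns defaults, yields the new instance to the block, and only afterwards applies the proxy settings, which is why values set inside the block take effect. A minimal sketch of that ordering, using a hypothetical SketchAgent class that is not part of Mechanize and has an invented applied_proxy reader for inspection:

```ruby
# Hypothetical stand-in for an agent; not part of Mechanize.
class SketchAgent
  attr_accessor :proxy_addr, :proxy_port
  attr_reader :applied_proxy

  def initialize
    # Defaults are assigned first.
    @proxy_addr = nil
    @proxy_port = nil

    # The caller's block may override the defaults.
    yield self if block_given?

    # Settings are applied once, after the block has run.
    @applied_proxy = [@proxy_addr, @proxy_port]
  end
end

agent = SketchAgent.new do |a|
  a.proxy_addr = 'proxy.example'
  a.proxy_port = 8080
end

p agent.applied_proxy
```

Without a block, the sketch applies the defaults unchanged.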

Public Instance Methods

back()

Equivalent to the browser back button. Returns the previous page visited.

  # File lib/mechanize.rb, line 189
  def back
    @agent.history.pop
  end

click(link)

If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched.

  # File lib/mechanize.rb, line 290
  def click link
    case link
    when Page::Link then
      referer = link.page || current_page()
      if @agent.robots
        if (referer.is_a?(Page) and referer.parser.nofollow?) or
           link.rel?('nofollow') then
          raise RobotsDisallowedError.new(link.href)
        end
      end
      if link.noreferrer?
        href = @agent.resolve(link.href, link.page || current_page)
        referer = Page.new
      else
        href = link.href
      end
      get href, [], referer
    when String, Regexp then
      if real_link = page.link_with(:text => link)
        click real_link
      else
        button = nil
        form = page.forms.find do |f|
          button = f.button_with(:value => link)
          button.is_a? Form::Submit
        end
        submit form, button if form
      end
    else
      referer = current_page()
      href = link.respond_to?(:href) ? link.href :
        (link['href'] || link['src'])
      get href, [], referer
    end
  end

content_encoding_hooks()

A list of hooks to call before reading response header ‘content-encoding’.

Each hook is called with the agent making the request, the URI of the request, the response, and an IO containing the response body.

  # File lib/mechanize.rb, line 256
  def content_encoding_hooks
    @agent.content_encoding_hooks
  end

current_page()

Returns the latest page loaded by Mechanize.

  # File lib/mechanize.rb, line 196
  def current_page
    @agent.current_page
  end

Also aliased as: page

delete(uri, query_params = {}, headers = {})

DELETE uri with query_params and headers:

  delete('http://example/', {'q' => 'foo'}, {})

  # File lib/mechanize.rb, line 372
  def delete(uri, query_params = {}, headers = {})
    page = @agent.fetch(uri, :delete, headers, query_params)
    add_to_history(page)
    page
  end

download(uri, io_or_filename, parameters = [], referer = nil, headers = {})

GETs uri and writes it to io_or_filename without recording the request in the history. If io_or_filename does not respond to #write, it is used as a file name. parameters, referer and headers are used as in #get.

By default, if the Content-type of the response matches a Mechanize::File or Mechanize::Page parser, the response body will be loaded into memory before being saved. See # for details on changing this default.

For alternate ways of downloading files see Mechanize::FileSaver and Mechanize::DirectorySaver.
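
The source listing for download decides between an IO and a file name by checking respond_to? :write, and closes only the handles it opened itself. A minimal sketch of that pattern with the standard library; write_body is a hypothetical helper, not part of Mechanize:

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper, not part of Mechanize: writes body to an IO-like
# object or, failing that, to a file with the given name.
def write_body(io_or_filename, body)
  caller_owns_io = io_or_filename.respond_to?(:write)
  io = caller_owns_io ? io_or_filename : File.open(io_or_filename, 'wb')
  io.write body
ensure
  # Only close handles opened here; a caller-supplied IO stays open.
  io.close if io && !caller_owns_io
end

buffer = StringIO.new
write_body buffer, 'hello'

Dir.mktmpdir do |dir|
  path = File.join(dir, 'out.bin')
  write_body path, 'hello'
  puts File.binread(path) == buffer.string
end
```

Leaving a caller-supplied IO open lets the caller keep writing to it or rewind it afterwards.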

  # File lib/mechanize.rb, line 340
  def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
    page = transact do
      get uri, parameters, referer, headers
    end

    io = if io_or_filename.respond_to? :write then
           io_or_filename
         else
           open io_or_filename, 'wb'
         end

    case page
    when Mechanize::File then
      io.write page.body
    else
      body_io = page.body_io

      until body_io.eof? do
        io.write body_io.read 16384
      end
    end

    page
  ensure
    io.close if io and not io_or_filename.respond_to? :write
  end

get(uri, parameters = [], referer = nil, headers = {})

GET the uri with the given request parameters, referer and headers.

The referer may be a URI or a page.
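
When no referer is given, the source listing for get picks a default: an absolute http or https URI gets a blank page as referer, while anything else falls back to the current page. A minimal sketch of that branch; default_referer is a hypothetical helper, and symbols stand in for Page objects:

```ruby
# Hypothetical helper mirroring the branch in the listing: an absolute
# http(s) URI gets a blank referer, anything else falls back to the
# current page. Symbols stand in for Page objects here.
def default_referer(uri, current_page)
  if uri.to_s =~ %r{\Ahttps?://}
    :blank_page
  else
    current_page || :blank_page
  end
end

p default_referer('http://example/', :current)
p default_referer('/relative/path', :current)
p default_referer('/relative/path', nil)
```

Relative URIs need the current page so they can be resolved against it, which is presumably why only they inherit it by default.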

  # File lib/mechanize.rb, line 384
  def get(uri, parameters = [], referer = nil, headers = {})
    method = :get

    referer ||=
      if uri.to_s =~ %r{\Ahttps?://}
        Page.new
      else
        current_page || Page.new
      end

    # FIXME: Huge hack so that using a URI as a referer works.  I need to
    # refactor everything to pass around URIs but still support
    # Mechanize::Page#base
    unless Mechanize::Parser === referer then
      referer = if referer.is_a?(String) then
                  Page.new URI(referer)
                else
                  Page.new referer
                end
    end

    # fetch the page
    headers ||= {}
    page = @agent.fetch uri, method, headers, parameters, referer
    add_to_history(page)
    yield page if block_given?
    page
  end

get_file(url)

GET url and return only its contents

  # File lib/mechanize.rb, line 416
  def get_file(url)
    get(url).body
  end

head(uri, query_params = {}, headers = {})

HEAD uri with query_params and headers:

  head('http://example/', {'q' => 'foo'}, {})

  # File lib/mechanize.rb, line 425
  def head(uri, query_params = {}, headers = {})
    page = @agent.fetch uri, :head, headers, query_params

    yield page if block_given?

    page
  end

history()

The history of this mechanize run

  # File lib/mechanize.rb, line 205
  def history
    @agent.history
  end

max_history()

Maximum number of items allowed in the history. The default setting is 50 pages. Note that the size of the history multiplied by the maximum response body size

  # File lib/mechanize.rb, line 214
  def max_history
    @agent.history.max_size
  end

max_history=(length)

Sets the maximum number of items allowed in the history to length.

Setting the maximum history length to nil makes the history size unlimited. Take care when doing this: mechanize stores response bodies in memory for pages and in the temporary files directory for other responses, so for a long-running mechanize program the history can grow quite large.

See also the discussion under #
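
The effect of a history limit, and of lifting it with nil, can be sketched with a plain array. SketchHistory is a hypothetical class, not Mechanize's own history implementation, and it assumes that the oldest entries are the ones discarded when the limit is exceeded:

```ruby
# Hypothetical stand-in for a history list; not Mechanize's own class.
class SketchHistory
  attr_accessor :max_size
  attr_reader :pages

  def initialize(max_size = 50)
    @max_size = max_size
    @pages = []
  end

  def push(page)
    @pages << page
    # Assumption: the oldest entries are the ones discarded.
    @pages.shift while @max_size && @pages.length > @max_size
    self
  end
end

history = SketchHistory.new(2)
%w[a b c].each { |page| history.push page }
p history.pages

history.max_size = nil # unlimited: nothing is discarded
%w[d e].each { |page| history.push page }
p history.pages
```

With a limit of 2 only the two most recent entries survive; once the limit is nil the list grows with every push.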

  # File lib/mechanize.rb, line 228
  def max_history= length
    @agent.history.max_size = length
  end

page()

Alias for: current_page

post(uri, query={}, headers={})

POST to the given uri with the given query. The query is specified by either a string, or a list of key-value pairs represented by a hash or an array of arrays.
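
A hash and an array of arrays describe the same key-value pairs. The sketch below shows this with the standard library's URI.encode_www_form; it only illustrates that the two shapes are equivalent and makes no claim about how Mechanize encodes the request internally:

```ruby
require 'uri'

hash_query  = { 'foo' => 'bar' }
array_query = [%w[foo bar]]

# Both shapes describe the same key-value pairs, so they encode identically.
puts URI.encode_www_form(hash_query)
puts URI.encode_www_form(array_query)
```

An array of arrays is the shape to reach for when the same key must appear more than once or when the order of pairs matters.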

Examples:

  agent.post 'http://example.com/', "foo" => "bar"

  agent.post 'http://example.com/', [%w[foo bar]]

  agent.post('http://example.com/', "<message>hello</message>",
             'Content-Type' => 'application/xml')

  # File lib/mechanize.rb, line 446
  def post(uri, query={}, headers={})
    return request_with_entity(:post, uri, query, headers) if String === query

    node = {}
    # Create a fake form
    class << node
      def search(*args); []; end
    end

post_connect_hooks()

A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
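
Since the hooks are kept in a list, registering one amounts to appending a callable that accepts those four arguments. A minimal sketch with stand-in values; run_post_connect_hooks is a hypothetical runner, not part of Mechanize, and symbols stand in for the agent and response objects:

```ruby
# Hypothetical runner; the four arguments follow the description above.
def run_post_connect_hooks(hooks, agent, uri, response, body)
  hooks.each { |hook| hook.call agent, uri, response, body }
end

seen = []
hooks = []
hooks << lambda { |agent, uri, response, body| seen << [uri, body.length] }

run_post_connect_hooks hooks, :agent, 'http://example/', :response, 'hello'
p seen
```

Hooks run in the order they were appended, so a later hook can rely on side effects of an earlier one.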

  # File lib/mechanize.rb, line 269
  def post_connect_hooks
    @agent.post_connect_hooks
  end

pre_connect_hooks()

A list of hooks to call before retrieving a response. Hooks are called with the agent and the request to be performed.

  # File lib/mechanize.rb, line 277
  def pre_connect_hooks
    @agent.pre_connect_hooks
  end

search(*args)

  # File lib/mechanize.rb, line 452
  def search(*args); []; end

visited?(url)

Returns a visited page for the url passed in, otherwise nil

  # File lib/mechanize.rb, line 235
  def visited? url
    url = url.href if url.respond_to? :href

    @agent.visited_page url
  end

Also aliased as: visited_page

visited_page(url)

Returns whether or not a url has been visited

Alias for: visited?

Generated with the Darkfish Rdoc Generator 1.1.6.