The Mechanize library is used for automating interactions with a website. It can follow links, populate form fields and submit forms, and it maintains a history of visited URLs that can be queried.
  require 'mechanize'
  require 'logger'

  agent = Mechanize.new
  agent.log = Logger.new "mech.log"
  agent.user_agent_alias = 'Mac Safari'

  page = agent.get "http://www.google.com/"
  search_form = page.form_with :name => "f"
  search_form.field_with(:name => "q").value = "Hello"

  search_results = agent.submit search_form
  puts search_results.body
If you think you have found a bug in mechanize but aren’t sure, please file a ticket at github.com/tenderlove/mechanize/issues
Here are some common problems you may experience with mechanize:
Mechanize defaults to validating SSL certificates using the default CA certificates for your platform. At this time, Windows users do not have integration between the OS default CA certificates and OpenSSL. # explains how to download and use Mozilla’s CA certificates to allow SSL sites to work.
Some sites return an incorrect content-length value. Unlike a browser, mechanize raises an error when the content-length header does not match the length of the response, because it cannot tell whether the mismatch was caused by a connection problem or by a server bug.
The error raised, Mechanize::ResponseReadError, can be converted to a parsed Page, File, etc. depending upon the content-type:
  agent = Mechanize.new
  uri = URI 'http://example/invalid_content_length'

  begin
    page = agent.get uri
  rescue Mechanize::ResponseReadError => e
    page = e.force_parse
  end
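The idea behind this check can be sketched in plain Ruby without the gem. `LengthMismatchError` and `read_body` below are hypothetical names, not Mechanize API; the sketch only shows how keeping the received body on the error lets the caller decide to use it anyway, in the spirit of `force_parse`.

```ruby
# Hypothetical error type (not part of Mechanize) that keeps whatever body
# was received, so the caller can decide whether to use it.
class LengthMismatchError < StandardError
  attr_reader :body

  def initialize(expected, body)
    @body = body
    super("expected #{expected} bytes, received #{body.bytesize}")
  end
end

# Raise when the Content-Length header disagrees with the bytes received,
# instead of silently accepting a possibly truncated body.
def read_body(headers, body)
  declared = headers['content-length']
  if declared && Integer(declared) != body.bytesize
    raise LengthMismatchError.new(Integer(declared), body)
  end
  body
end

begin
  body = read_body({ 'content-length' => '10' }, 'hello')
rescue LengthMismatchError => e
  body = e.body # fall back to the bytes actually received
end
```

Raising by default and offering an explicit fallback keeps a truncated download from being mistaken for a complete one.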
The version of Mechanize you are using.
Supported User-Agent aliases for use with user_agent_alias=. The description in parentheses is informational only and is not part of the alias name.
Linux Firefox (3.6.1)
Linux Konqueror (3)
Linux Mozilla
Mac Firefox (3.6)
Mac Mozilla
Mac Safari (5)
Mac Safari 4
Mechanize (default)
Windows IE 6
Windows IE 7
Windows IE 8
Windows IE 9
Windows Mozilla
iPhone (3.0)
Example:
  agent = Mechanize.new
  agent.user_agent_alias = 'Mac Safari'
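Because the parenthesized text is not part of the alias, the string to pass to `user_agent_alias=` is the list entry with any trailing `(...)` removed. A small sketch of that rule; `alias_name` is a hypothetical helper, and only a few entries from the list are shown.

```ruby
# A few entries copied from the alias list; text in parentheses is
# descriptive only.
DOC_ENTRIES = [
  'Linux Firefox (3.6.1)',
  'Mac Safari (5)',
  'Mac Safari 4',
  'Mechanize (default)',
  'Windows IE 9',
  'iPhone (3.0)'
].freeze

# Hypothetical helper: strip a trailing "(...)" note to recover the alias.
def alias_name(entry)
  entry.sub(/\s*\([^)]*\)\z/, '')
end

ALIAS_NAMES = DOC_ENTRIES.map { |entry| alias_name(entry) }
# => ["Linux Firefox", "Mac Safari", "Mac Safari 4", "Mechanize", "Windows IE 9", "iPhone"]
```

`Mac Safari 4` has no parentheses, so the `4` is part of that alias name, while `Mac Safari (5)` is used as `'Mac Safari'`.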
Creates a new mechanize instance. If a block is given, the created instance is yielded to the block for setting up pre-connection state such as SSL parameters or proxies:
  agent = Mechanize.new do |a|
    # set_proxy assigns the proxy address and port read by initialize below
    a.set_proxy 'proxy.example', 8080
  end
  # File lib/mechanize.rb, line 148
  def initialize
    @agent = Mechanize::HTTP::Agent.new
    @agent.context = self
    @log = nil

    # attr_accessors
    @agent.user_agent = AGENT_ALIASES['Mechanize']
    @watch_for_set = nil
    @history_added = nil

    # attr_readers
    @pluggable_parser = PluggableParser.new

    @keep_alive_time = 0

    # Proxy
    @proxy_addr = nil
    @proxy_port = nil
    @proxy_user = nil
    @proxy_pass = nil

    @html_parser = self.class.html_parser

    @default_encoding = nil
    @force_default_encoding = false

    # defaults
    @agent.max_history = 50

    yield self if block_given?

    @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
  end
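The listing shows why proxy settings belong in the block: `yield self` runs before `@agent.set_proxy`, so values assigned in the block are applied once construction finishes. The same ordering in isolation, using a hypothetical `SketchClient` class rather than Mechanize itself:

```ruby
# Hypothetical class (not Mechanize) using the same "configure in a block,
# apply afterwards" order as Mechanize#initialize.
class SketchClient
  attr_accessor :proxy_addr, :proxy_port
  attr_reader :applied_proxy

  def initialize
    @proxy_addr = nil
    @proxy_port = nil

    yield self if block_given?

    # Read the settings once, after the block has run. In this sketch,
    # assignments made after construction are never applied.
    @applied_proxy = @proxy_addr ? [@proxy_addr, @proxy_port] : nil
  end
end

configured = SketchClient.new do |c|
  c.proxy_addr = 'proxy.example'
  c.proxy_port = 8080
end
configured.applied_proxy # => ["proxy.example", 8080]

late = SketchClient.new
late.proxy_addr = 'proxy.example'
late.applied_proxy # => nil
```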
Equivalent to the browser back button. Returns the previous page visited.
  # File lib/mechanize.rb, line 189
  def back
    @agent.history.pop
  end
If the parameter is a String or Regexp, finds the link on the current page whose text matches it and clicks it; if no link matches, finds a submit button with that value and submits its form. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched.
  # File lib/mechanize.rb, line 290
  def click link
    case link
    when Page::Link then
      referer = link.page || current_page()
      if @agent.robots
        if (referer.is_a?(Page) and referer.parser.nofollow?) or
           link.rel?('nofollow') then
          raise RobotsDisallowedError.new(link.href)
        end
      end
      if link.noreferrer?
        href = @agent.resolve(link.href, link.page || current_page)
        referer = Page.new
      else
        href = link.href
      end
      get href, [], referer
    when String, Regexp then
      if real_link = page.link_with(:text => link)
        click real_link
      else
        button = nil
        form = page.forms.find do |f|
          button = f.button_with(:value => link)
          button.is_a? Form::Submit
        end
        submit form, button if form
      end
    else
      referer = current_page()
      href = link.respond_to?(:href) ? link.href :
        (link['href'] || link['src'])
      get href, [], referer
    end
  end
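The type dispatch in `click` can be reproduced without the gem. `click_target` below is a hypothetical helper that only resolves which href would be fetched; it omits robots handling, `noreferrer`, and the fallback to submit buttons, and it matches String or Regexp targets against link text with `===`.

```ruby
# Hypothetical stand-in for Mechanize::Page::Link.
Link = Struct.new(:text, :href)

# Mirrors the case/when in Mechanize#click, returning the href to fetch.
def click_target(target, page_links)
  case target
  when Link
    target.href
  when String, Regexp
    # String === text tests equality; Regexp === text tests a pattern match.
    match = page_links.find { |link| target === link.text }
    match && match.href
  else
    # Other objects: use #href if present, else an 'href' or 'src' attribute.
    target.respond_to?(:href) ? target.href : (target['href'] || target['src'])
  end
end

links = [Link.new('Home', '/'), Link.new('Next page', '/p/2')]
click_target('Home', links)                   # => "/"
click_target(/next/i, links)                  # => "/p/2"
click_target({ 'src' => '/logo.png' }, links) # => "/logo.png"
```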
A list of hooks to call before reading the response header ‘content-encoding’.
The hook is called with the agent making the request, the URI of the request, the response, and an IO containing the response body.
  # File lib/mechanize.rb, line 256
  def content_encoding_hooks
    @agent.content_encoding_hooks
  end
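A hook receives the four arguments described above. The sketch below simulates the call with stand-in values (a Hash for the response, a StringIO for the body) so it runs without the gem; the `x-gzip-custom` label is made up for illustration. In real use the lambda would be appended to `agent.content_encoding_hooks` and called by Mechanize.

```ruby
require 'stringio'
require 'uri'

# Stand-in for agent.content_encoding_hooks: an ordered list of callables.
content_encoding_hooks = []

# Hypothetical hook: relabel a nonstandard encoding name as gzip so that
# the normal decoding path can handle the body.
content_encoding_hooks << lambda do |_agent, _uri, response, _body_io|
  if response['content-encoding'] == 'x-gzip-custom'
    response['content-encoding'] = 'gzip'
  end
end

# Simulated call with the documented arguments: agent, URI, response, body IO.
response = { 'content-encoding' => 'x-gzip-custom' }
body_io = StringIO.new('')
content_encoding_hooks.each do |hook|
  hook.call(nil, URI('http://example/'), response, body_io)
end

response['content-encoding'] # => "gzip"
```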
Returns the latest page loaded by Mechanize.
  # File lib/mechanize.rb, line 196
  def current_page
    @agent.current_page
  end
DELETE uri with query_params and headers:
delete('http://example/', {'q' => 'foo'}, {})
  # File lib/mechanize.rb, line 372
  def delete(uri, query_params = {}, headers = {})
    page = @agent.fetch(uri, :delete, headers, query_params)
    add_to_history(page)
    page
  end
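To see what the example call targets, an equivalent request can be built with Ruby's standard `Net::HTTP` without sending it. This assumes `query_params` are form-encoded into the URI's query string; the sketch does not call Mechanize.

```ruby
require 'net/http'
require 'uri'

# Assumption: query_params become the query string of the DELETE request.
uri = URI('http://example/')
uri.query = URI.encode_www_form('q' => 'foo')

request = Net::HTTP::Delete.new(uri) # built only; nothing is sent
request.method # => "DELETE"
request.path   # => "/?q=foo"
```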
GETs uri and writes it to io_or_filename without recording the request in the history. If io_or_filename does not respond to # it will be used as a file name. parameters, referer and headers are used as in #.
By default, if the Content-type of the response matches a Mechanize::File or Mechanize::Page parser, the response body will be loaded into memory before being saved. See # for details on changing this default.
For alternate ways of downloading files see Mechanize::FileSaver and Mechanize::DirectorySaver.
  # File lib/mechanize.rb, line 340
  def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
    page = transact do
      get uri, parameters, referer, headers
    end

    io = if io_or_filename.respond_to? :write then
           io_or_filename
         else
           open io_or_filename, 'wb'
         end

    case page
    when Mechanize::File then
      io.write page.body
    else
      body_io = page.body_io

      until body_io.eof? do
        io.write body_io.read 16384
      end
    end

    page
  ensure
    io.close if io and not io_or_filename.respond_to? :write
  end
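The write loop at the end of `download` can be reused on its own. A sketch with a hypothetical `copy_body` helper: it accepts any object responding to `write`, or a file name that it opens in binary mode and closes afterwards, and it copies in 16 KB chunks so a large body is never read into a single string. It uses `File.open` rather than `Kernel#open`, which would treat a name beginning with `|` as a command to run.

```ruby
require 'stringio'
require 'tmpdir'

CHUNK_SIZE = 16_384 # same chunk size as the listing above

# Hypothetical helper: copy body_io to io_or_filename in chunks. Objects that
# respond to #write are used as-is; anything else is treated as a file name.
def copy_body(body_io, io_or_filename)
  io = io_or_filename.respond_to?(:write) ? io_or_filename : File.open(io_or_filename, 'wb')
  io.write(body_io.read(CHUNK_SIZE)) until body_io.eof?
  nil
ensure
  # Only close what this method opened; caller-supplied IOs stay open.
  io.close if io && !io_or_filename.respond_to?(:write)
end

# Usage: copy into an in-memory IO and into a file given by name.
out = StringIO.new
copy_body(StringIO.new('a' * 40_000), out)
out.string.bytesize # => 40000

saved = Dir.mktmpdir do |dir|
  path = File.join(dir, 'body.bin')
  copy_body(StringIO.new('xyz'), path)
  File.binread(path)
end
saved # => "xyz"
```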
GET the uri with the given request parameters, referer and headers.
The referer may be a URI or a page.
  # File lib/mechanize.rb, line 384
  def get(uri, parameters = [], referer = nil, headers = {})
    method = :get

    referer ||=
      if uri.to_s =~ %r{\Ahttps?://}
        Page.new
      else
        current_page || Page.new
      end

    # FIXME: Huge hack so that using a URI as a referer works. I need to
    # refactor everything to pass around URIs but still support
    # Mechanize::Page#base
    unless Mechanize::Parser === referer then
      referer = if referer.is_a?(String) then
                  Page.new URI(referer)
                else
                  Page.new referer
                end
    end

    # fetch the page
    headers ||= {}
    page = @agent.fetch uri, method, headers, parameters, referer
    add_to_history(page)
    yield page if block_given?
    page
  end
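The referer defaulting at the top of `get` can be modelled with plain values. In this hypothetical `default_referer` helper, `nil` stands in for the blank `Page.new` that Mechanize uses, a String referer is parsed with `URI()`, and the final wrapping in a Page is omitted.

```ruby
require 'uri'

# Hypothetical model of Mechanize#get's referer defaulting (nil stands in
# for the blank Page.new used by Mechanize).
def default_referer(uri, referer, current_page)
  # Absolute http(s) URLs get no referer; relative ones use the current page.
  referer ||= (uri.to_s =~ %r{\Ahttps?://} ? nil : current_page)
  # A String referer is parsed into a URI.
  referer.is_a?(String) ? URI(referer) : referer
end

default_referer('http://example/', nil, :current)               # => nil
default_referer('/relative', nil, :current)                     # => :current
default_referer('/relative', 'http://ref.example/', :current)   # a URI::HTTP
```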
GET url and return only its contents.
  # File lib/mechanize.rb, line 416
  def get_file(url)
    get(url).body
  end
HEAD uri with query_params and headers:
head('http://example/', {'q' => 'foo'}, {})
  # File lib/mechanize.rb, line 425
  def head(uri, query_params = {}, headers = {})
    page = @agent.fetch uri, :head, headers, query_params

    yield page if block_given?

    page
  end
The history of this mechanize run.
  # File lib/mechanize.rb, line 205
  def history
    @agent.history
  end
Maximum number of items allowed in the history. The default setting is 50 pages. Note that the size of the history multiplied by the maximum response body size gives a rough upper bound on how much memory and temporary-file space the history can hold.
  # File lib/mechanize.rb, line 214
  def max_history
    @agent.history.max_size
  end
Sets the maximum number of items allowed in the history to length.
Setting the maximum history length to nil makes the history size unlimited. Take care when doing this: mechanize stores response bodies in memory for pages and in the temporary files directory for other responses. For a long-running mechanize program, this can become quite large.
See also the discussion under #
  # File lib/mechanize.rb, line 228
  def max_history= length
    @agent.history.max_size = length
  end
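The trade-off described above can be seen in a small bounded-history sketch. `BoundedHistory` is hypothetical, not `Mechanize::History`; it assumes the oldest entries are dropped first once `max_size` is exceeded and treats `nil` as unlimited.

```ruby
# Hypothetical bounded history (not Mechanize::History). The oldest entries
# are dropped once max_size is exceeded; nil means no limit.
class BoundedHistory
  attr_reader :max_size

  def initialize(max_size = 50)
    @entries = []
    @max_size = max_size
  end

  # Changing the limit trims immediately in this sketch.
  def max_size=(length)
    @max_size = length
    trim
  end

  def push(page)
    @entries.push(page)
    trim
    self
  end

  def to_a
    @entries.dup
  end

  private

  def trim
    @entries.shift while @max_size && @entries.size > @max_size
  end
end

history = BoundedHistory.new(2)
history.push('a').push('b').push('c')
history.to_a # => ["b", "c"]

history.max_size = nil # unlimited: nothing is dropped
5.times { |i| history.push(i) }
history.to_a.size # => 7

history.max_size = 3
history.to_a # => [2, 3, 4]
```

With a limit of nil, every entry stays reachable, which is exactly why memory use grows without bound in a long-running program.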
POST to the given uri with the given query. The query is specified by either a string, or a list of key-value pairs represented by a hash or an array of arrays.
Examples:
  agent.post 'http://example.com/', "foo" => "bar"
  agent.post 'http://example.com/', [%w[foo bar]]
  agent.post('http://example.com/', "<message>hello</message>",
             'Content-Type' => 'application/xml')
  # File lib/mechanize.rb, line 446
  def post(uri, query={}, headers={})
    return request_with_entity(:post, uri, query, headers) if String === query

    node = {}
    # Create a fake form
    class << node
      def search(*args); []; end
    end
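For hash and array-of-arrays queries, the fields end up form-encoded. Ruby's `URI.encode_www_form` shows that both forms used in the examples encode to the same pairs, and why the array form is useful: it keeps order and permits repeated keys. This illustrates the encoding only and assumes an `application/x-www-form-urlencoded` body; String queries bypass it and are passed to `request_with_entity`, as the listing shows.

```ruby
require 'uri'

# Hash and array-of-pairs queries form-encode to the same body.
URI.encode_www_form('foo' => 'bar') # => "foo=bar"
URI.encode_www_form([%w[foo bar]])  # => "foo=bar"

# Array pairs keep order and allow repeated keys, which a Hash cannot.
URI.encode_www_form([%w[tag a], %w[tag b]]) # => "tag=a&tag=b"
```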
A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
  # File lib/mechanize.rb, line 269
  def post_connect_hooks
    @agent.post_connect_hooks
  end
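Because a post-connect hook sees the agent, URI, response and body after each response, it is a natural place for logging. The sketch simulates one call with stand-in objects (`FakeResponse` is hypothetical) so it runs without the gem; with Mechanize, the lambda would be appended to `agent.post_connect_hooks` and invoked by the library.

```ruby
require 'uri'

# Hypothetical stand-in for an HTTP response; only #code is used here.
FakeResponse = Struct.new(:code)

access_log = []
post_connect_hooks = []

# Hook receiving the documented arguments: agent, URI, response, body.
post_connect_hooks << lambda do |_agent, uri, response, body|
  access_log << "#{response.code} #{uri} (#{body.bytesize} bytes)"
end

# Simulate the library invoking the hooks after one response.
post_connect_hooks.each do |hook|
  hook.call(nil, URI('http://example/'), FakeResponse.new('200'), '<html></html>')
end

access_log # => ["200 http://example/ (13 bytes)"]
```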
A list of hooks to call before retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
  # File lib/mechanize.rb, line 277
  def pre_connect_hooks
    @agent.pre_connect_hooks
  end
Returns the visited page for the url passed in, or nil if it has not been visited.
  # File lib/mechanize.rb, line 235
  def visited? url
    url = url.href if url.respond_to? :href

    @agent.visited_page url
  end
Returns whether or not a url has been visited.
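`visited?` accepts either a URL or anything with an `href` (such as a link) and returns the stored page or nil. A sketch of that lookup using a hypothetical `VisitIndex` keyed by URL string; Mechanize's own index may normalize URLs differently.

```ruby
# Hypothetical link-like object; anything responding to #href works.
Anchor = Struct.new(:href)

# Hypothetical index (not Mechanize's) mapping URL strings to pages.
class VisitIndex
  def initialize
    @pages = {}
  end

  def record(url, page)
    @pages[url.to_s] = page
  end

  # Returns the recorded page for url, or nil if it was never visited.
  def visited?(url)
    url = url.href if url.respond_to?(:href)
    @pages[url.to_s]
  end
end

index = VisitIndex.new
index.record('http://example/', :home_page)
index.visited?('http://example/')             # => :home_page
index.visited?(Anchor.new('http://example/')) # => :home_page
index.visited?('http://example/other')        # => nil
```

Returning the page rather than `true` lets one call answer both "was it visited?" and "what did it contain?".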