An HTTP (and local disk access) user agent. This class is an implementation detail and is subject to change at any time.
A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response and the response body.
A list of hooks to call before making a request. Hooks are called with the agent and the request to be performed.
Follow HTML meta refresh and HTTP Refresh. If set to :anywhere, meta refresh tags outside of the head element will also be followed.
Follow an HTML meta refresh that has no “url=” in the content attribute.
Defaults to false to prevent infinite refresh loops.
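The two settings above can be illustrated with a small sketch. It is not Mechanize's own parser: parse_refresh, follow_refresh? and REFRESH_PATTERN are hypothetical names, and the pattern only covers the common "delay; url=target" form.

```ruby
# Hypothetical helper, not Mechanize's parser: splits a Refresh value into a
# delay in seconds and an optional target URL.
REFRESH_PATTERN = /\A\s*(\d+(?:\.\d+)?)\s*(?:;\s*url\s*=\s*(['"]?)(.*?)\2)?\s*\z/i

def parse_refresh(content)
  match = REFRESH_PATTERN.match(content)
  return nil unless match

  delay = match[1].to_f
  href  = match[3]          # nil when the content has no "url=" part
  [delay, href, href.nil?]  # third element: does the refresh point at itself?
end

# A refresh without "url=" reloads the same page, so it is only followed
# when follow_self is enabled (the guard against refresh loops).
def follow_refresh?(content, follow_self: false)
  delay, _href, link_self = parse_refresh(content)
  return false unless delay
  return false if link_self && !follow_self

  true
end
```

With these definitions, follow_refresh?("3") is false unless follow_self: true is passed, which mirrors the default described above.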
Controls how this agent deals with redirects. The following values are allowed:
:all, true | All 3xx redirects are followed (default) |
:permanent | Only 301 Moved Permanently redirects are followed |
false | No redirects are followed |
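The table above can be read as a small decision function. The sketch below is a simplified stand-in for the agent's redirect handling; follow_redirect? is a hypothetical name, and the responses are built by hand only for illustration.

```ruby
require 'net/http'

# Decides whether a redirect response should be followed for a given
# redirect_ok setting. Values other than false, nil and :permanent behave
# like :all.
def follow_redirect?(redirect_ok, response)
  case redirect_ok
  when false, nil then false
  when :permanent then Net::HTTPMovedPermanently === response
  else true
  end
end

moved = Net::HTTPMovedPermanently.new('1.1', '301', 'Moved Permanently')
found = Net::HTTPFound.new('1.1', '302', 'Found')

follow_redirect?(:all, found)       # followed
follow_redirect?(:permanent, found) # not followed, only 301 qualifies
follow_redirect?(:permanent, moved) # followed
follow_redirect?(false, moved)      # not followed
```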
Responses larger than this will be written to a Tempfile instead of stored in memory. Setting this to nil disables creation of Tempfiles.
Creates a new Mechanize HTTP user agent. The user agent is an implementation detail of mechanize and its API may change at any time.
# File lib/mechanize/http/agent.rb, line 122
def initialize
  @conditional_requests = true
  @context = nil
  @content_encoding_hooks = []
  @cookie_jar = Mechanize::CookieJar.new
  @follow_meta_refresh = false
  @follow_meta_refresh_self = false
  @gzip_enabled = true
  @history = Mechanize::History.new
  @ignore_bad_chunking = false
  @keep_alive = true
  @max_file_buffer = 100_000 # bytes (about 100 kB) for response bodies
  @open_timeout = nil
  @post_connect_hooks = []
  @pre_connect_hooks = []
  @read_timeout = nil
  @redirect_ok = true
  @redirection_limit = 20
  @request_headers = {}
  @robots = false
  @user_agent = nil
  @webrobots = nil

  # HTTP Authentication
  @auth_store = Mechanize::HTTP::AuthStore.new
  @authenticate_parser = Mechanize::HTTP::WWWAuthenticateParser.new
  @authenticate_methods = Hash.new do |methods, uri|
    methods[uri] = Hash.new do |realms, auth_scheme|
      realms[auth_scheme] = []
    end
  end
  @digest_auth = Net::HTTP::DigestAuth.new
  @digest_challenges = {}

  # SSL
  @pass = nil

  @scheme_handlers = Hash.new { |h, scheme|
    h[scheme] = lambda { |link, page|
      raise Mechanize::UnsupportedSchemeError, scheme
    }
  }

  @scheme_handlers['http'] = lambda { |link, page| link }
  @scheme_handlers['https'] = @scheme_handlers['http']
  @scheme_handlers['relative'] = @scheme_handlers['http']
  @scheme_handlers['file'] = @scheme_handlers['http']

  @http = Net::HTTP::Persistent.new 'mechanize'
  @http.idle_timeout = 5
  @http.keep_alive = 300
end
Adds credentials user, pass for uri. If realm is set the credentials are used only for that realm. If realm is not set the credentials become the default for any realm on that URI.
domain and realm are exclusive as NTLM does not follow RFC 2617. If domain is given it is only used for NTLM authentication.
# File lib/mechanize/http/agent.rb, line 183
def add_auth uri, user, password, realm = nil, domain = nil
  @auth_store.add_auth uri, user, password, realm, domain
end
Creates a new output IO by reading input_io in read_size chunks. If the output is over the max_file_buffer size a Tempfile with name is created.
If a block is provided, each chunk of input_io is yielded for further processing.
# File lib/mechanize/http/agent.rb, line 1116
def auto_io name, read_size, input_io
  out_io = StringIO.new

  out_io.set_encoding Encoding::BINARY if out_io.respond_to? :set_encoding

  until input_io.eof? do
    if StringIO === out_io and use_tempfile? out_io.size then
      new_io = make_tempfile name
      new_io.write out_io.string
      out_io = new_io
    end

    chunk = input_io.read read_size
    chunk = yield chunk if block_given?

    out_io.write chunk
  end

  out_io.rewind

  out_io
end
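The switch from memory to a Tempfile can be tried outside of Mechanize with a standalone sketch. buffered_copy is a hypothetical name and max_buffer stands in for max_file_buffer; unlike make_tempfile, this version does not unlink the file.

```ruby
require 'stringio'
require 'tempfile'

# Copies input_io in read_size chunks, starting in memory and moving to a
# Tempfile once the buffered size reaches max_buffer. A nil max_buffer keeps
# everything in memory.
def buffered_copy(input_io, read_size, max_buffer)
  out_io = StringIO.new
  out_io.set_encoding Encoding::BINARY

  until input_io.eof?
    if out_io.is_a?(StringIO) && max_buffer && out_io.size >= max_buffer
      file = Tempfile.new('sketch')
      file.binmode
      file.write out_io.string
      out_io = file
    end

    out_io.write input_io.read(read_size)
  end

  out_io.rewind
  out_io
end
```

Because the size check runs before each chunk is written, the in-memory buffer can overshoot the limit by at most one chunk before the switch happens.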
Equivalent to the browser back button. Returns the most recent page visited.
# File lib/mechanize/http/agent.rb, line 336
def back
  @history.pop
end
Path to an OpenSSL CA certificate file
# File lib/mechanize/http/agent.rb, line 1007
def ca_file
  @http.ca_file
end
Sets the path to an OpenSSL CA certificate file
# File lib/mechanize/http/agent.rb, line 1012
def ca_file= ca_file
  @http.ca_file = ca_file
end
The SSL certificate store used for validating connections
# File lib/mechanize/http/agent.rb, line 1017
def cert_store
  @http.cert_store
end
Sets the SSL certificate store used for validating connections
# File lib/mechanize/http/agent.rb, line 1022
def cert_store= cert_store
  @http.cert_store = cert_store
end
The client X509 certificate
# File lib/mechanize/http/agent.rb, line 1027
def certificate
  @http.certificate
end
Sets the client certificate to given X509 certificate. If a path is given the certificate will be loaded and set.
# File lib/mechanize/http/agent.rb, line 1033
def certificate= certificate
  certificate = if OpenSSL::X509::Certificate === certificate then
                  certificate
                else
                  OpenSSL::X509::Certificate.new File.read certificate
                end

  @http.certificate = certificate
end
# File lib/mechanize/http/agent.rb, line 397
def connection_for uri
  case uri.scheme.downcase
  when 'http', 'https' then
    return @http
  when 'file' then
    return Mechanize::FileConnection.new
  end
end
Decodes a gzip-encoded body_io. If it cannot be decoded, a raw inflate is tried; if that also fails, the error is raised.
# File lib/mechanize/http/agent.rb, line 410
def content_encoding_gunzip body_io
  log.debug('gzip response') if log

  zio = Zlib::GzipReader.new body_io
  out_io = auto_io 'mechanize-gunzip', 16384, zio
  zio.finish

  return out_io
rescue Zlib::Error => gz_error
  log.warn "unable to gunzip response: #{gz_error} (#{gz_error.class})" if
    log

  body_io.rewind
  body_io.read 10

  begin
    log.warn "trying raw inflate on response" if log
    return inflate body_io, -Zlib::MAX_WBITS
  rescue Zlib::Error => e
    log.error "unable to inflate response: #{e} (#{e.class})" if log
    raise
  end
ensure
  # do not close a second time if we failed the first time
  zio.close if zio and not (zio.closed? or gz_error)
  body_io.close unless body_io.closed?
end
Decodes a deflate-encoded body_io. If it cannot be decoded, a raw inflate is tried; if that also fails, the error is raised.
# File lib/mechanize/http/agent.rb, line 442
def content_encoding_inflate body_io
  log.debug('deflate body') if log

  return inflate body_io
rescue Zlib::Error
  log.error('unable to inflate response, trying raw deflate') if log

  body_io.rewind

  begin
    return inflate body_io, -Zlib::MAX_WBITS
  rescue Zlib::Error => e
    log.error("unable to inflate response: #{e}") if log
    raise
  end
ensure
  body_io.close
end
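The fallback exists because some servers send a raw deflate stream without the zlib header. A minimal standalone sketch of the same idea, working on strings instead of IO objects, might look like this; inflate_with_fallback is a hypothetical name.

```ruby
require 'zlib'

# Tries a zlib-wrapped inflate first. If the header check fails, retries as
# a raw deflate stream (negative window bits disable the zlib header).
def inflate_with_fallback(data)
  Zlib::Inflate.inflate(data)
rescue Zlib::Error
  inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  begin
    inflater.inflate(data)
  ensure
    inflater.close
  end
end

wrapped = Zlib::Deflate.deflate('hello')
raw     = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
                       .deflate('hello', Zlib::FINISH)

inflate_with_fallback(wrapped) # decoded on the first attempt
inflate_with_fallback(raw)     # decoded by the raw fallback
```

A second Zlib::Error raised by the fallback is not rescued, so data that is neither form still surfaces as an error to the caller.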
Returns the latest page loaded by the agent
# File lib/mechanize/http/agent.rb, line 343
def current_page
  @history.last
end
# File lib/mechanize/http/agent.rb, line 461
def disable_keep_alive request
  request['connection'] = 'close' unless @keep_alive
end
# File lib/mechanize/http/agent.rb, line 465
def enable_gzip request
  request['accept-encoding'] = if @gzip_enabled
                                 'gzip,deflate,identity'
                               else
                                 'identity'
                               end
end
Retrieves uri and parses it into a page or other object according to PluggableParser. If the URI is an HTTP or HTTPS scheme URI the given HTTP method is used to retrieve it, along with the HTTP headers, request params and HTTP referer.
redirects tracks the number of redirects experienced when retrieving the page. If it is over the redirection_limit an error will be raised.
# File lib/mechanize/http/agent.rb, line 210
def fetch uri, method = :get, headers = {}, params = [],
          referer = current_page, redirects = 0
  referer_uri = referer ? referer.uri : nil

  uri = resolve uri, referer

  uri, params = resolve_parameters uri, method, params

  request = http_request uri, method, params

  connection = connection_for uri

  request_auth request, uri

  disable_keep_alive request
  enable_gzip request

  request_language_charset request
  request_cookies request, uri
  request_host request, uri
  request_referer request, uri, referer_uri
  request_user_agent request
  request_add_headers request, headers

  pre_connect request

  # Consult robots.txt
  if robots && uri.is_a?(URI::HTTP)
    robots_allowed?(uri) or raise Mechanize::RobotsDisallowedError.new(uri)
  end

  # Add If-Modified-Since if page is in history
  page = visited_page(uri)

  if (page = visited_page(uri)) and page.response['Last-Modified']
    request['If-Modified-Since'] = page.response['Last-Modified']
  end if(@conditional_requests)

  # Specify timeouts if given
  connection.open_timeout = @open_timeout if @open_timeout
  connection.read_timeout = @read_timeout if @read_timeout

  request_log request

  response_body_io = nil

  # Send the request
  begin
    response = connection.request(uri, request) { |res|
      response_log res

      response_body_io = response_read res, request, uri

      res
    }
  rescue Mechanize::ChunkedTerminationError => e
    raise unless @ignore_bad_chunking

    response = e.response
    response_body_io = e.body_io
  end

  hook_content_encoding response, uri, response_body_io

  response_body_io = response_content_encoding response, response_body_io if
    request.response_body_permitted?

  post_connect uri, response, response_body_io

  page = response_parse response, response_body_io, uri

  response_cookies response, uri, page

  meta = response_follow_meta_refresh response, uri, page, redirects
  return meta if meta

  case response
  when Net::HTTPSuccess
    if robots && page.is_a?(Mechanize::Page)
      page.parser.noindex? and raise Mechanize::RobotsDisallowedError.new(uri)
    end

    page
  when Mechanize::FileResponse
    page
  when Net::HTTPNotModified
    log.debug("Got cached page") if log
    visited_page(uri) || page
  when Net::HTTPRedirection
    response_redirect response, method, page, redirects, headers, referer
  when Net::HTTPUnauthorized
    response_authenticate(response, page, uri, request, headers, params,
                          referer)
  else
    raise Mechanize::ResponseCodeError.new(page, 'unhandled response')
  end
end
# File lib/mechanize/http/agent.rb, line 676
def get_meta_refresh response, uri, page
  return nil unless @follow_meta_refresh

  if page.respond_to?(:meta_refresh) and
     (redirect = page.meta_refresh.first) then
    [redirect.delay, redirect.href] unless
      not @follow_meta_refresh_self and redirect.link_self
  elsif refresh = response['refresh']
    delay, href, link_self = Mechanize::Page::MetaRefresh.parse refresh, uri
    raise Mechanize::Error, 'Invalid refresh http header' unless delay
    [delay.to_f, href] unless
      not @follow_meta_refresh_self and link_self
  end
end
# File lib/mechanize/http/agent.rb, line 362
def hook_content_encoding response, uri, response_body_io
  @content_encoding_hooks.each do |hook|
    hook.call self, uri, response, response_body_io
  end
end
# File lib/mechanize/http/agent.rb, line 473
def http_request uri, method, params = nil
  case uri.scheme.downcase
  when 'http', 'https' then
    klass = Net::HTTP.const_get(method.to_s.capitalize)

    request ||= klass.new(uri.request_uri)
    request.body = params.first if params

    request
  when 'file' then
    Mechanize::FileRequest.new uri
  end
end
Reset connections that have not been used in this many seconds
# File lib/mechanize/http/agent.rb, line 1097
def idle_timeout
  @http.idle_timeout
end
Sets the connection idle timeout for persistent connections
# File lib/mechanize/http/agent.rb, line 1102
def idle_timeout= timeout
  @http.idle_timeout = timeout
end
# File lib/mechanize/http/agent.rb, line 1139
def inflate compressed, window_bits = nil
  inflate = Zlib::Inflate.new window_bits

  out_io = auto_io 'mechanize-inflate', 1024, compressed do |chunk|
    inflate.inflate chunk
  end

  inflate.finish

  out_io
ensure
  inflate.close
end
# File lib/mechanize/http/agent.rb, line 1153
def log
  @context.log
end
# File lib/mechanize/http/agent.rb, line 1189
def make_tempfile name
  io = Tempfile.new name
  io.unlink
  io.binmode if io.respond_to? :binmode
  io
end
# File lib/mechanize/http/agent.rb, line 347
def max_history
  @history.max_size
end
# File lib/mechanize/http/agent.rb, line 351
def max_history=(length)
  @history.max_size = length
end
Invokes hooks added to post_connect_hooks after a response is returned and the response body is handled.
Yields the agent, the uri for the request, the response and the response body.
# File lib/mechanize/http/agent.rb, line 375
def post_connect uri, response, body_io # :yields: agent, uri, response, body
  @post_connect_hooks.each do |hook|
    begin
      hook.call self, uri, response, body_io.read
    ensure
      body_io.rewind
    end
  end
end
Invokes hooks added to pre_connect_hooks before a request is made. Yields the agent and the request that will be performed to each hook.
# File lib/mechanize/http/agent.rb, line 389
def pre_connect request # :yields: agent, request
  @pre_connect_hooks.each do |hook|
    hook.call self, request
  end
end
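A hook is any object that responds to call, typically a lambda taking the agent and the request. The sketch below reproduces the calling convention with a stand-in agent; run_pre_connect and the X-Trace header are hypothetical and chosen only for illustration.

```ruby
require 'net/http'

# Calls every hook with the agent and the pending request, in order, the
# same way pre_connect does.
def run_pre_connect(hooks, agent, request)
  hooks.each { |hook| hook.call agent, request }
  request
end

# A hook that tags each outgoing request with an extra header.
hooks = [lambda { |agent, request| request['X-Trace'] = 'on' }]

request = run_pre_connect(hooks, Object.new, Net::HTTP::Get.new('/index.html'))
request['X-Trace']
```

Because hooks run before the request is sent, a hook can still add or change headers on the request it receives.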
An OpenSSL private key or the path to a private key
# File lib/mechanize/http/agent.rb, line 1044
def private_key
  @http.private_key
end
Sets the client’s private key
# File lib/mechanize/http/agent.rb, line 1049
def private_key= private_key
  private_key = if OpenSSL::PKey::PKey === private_key then
                  private_key
                else
                  OpenSSL::PKey::RSA.new File.read(private_key), @pass
                end

  @http.private_key = private_key
end
URI for a proxy connection
# File lib/mechanize/http/agent.rb, line 310
def proxy_uri
  @http.proxy_uri
end
# File lib/mechanize/http/agent.rb, line 487
def request_add_headers request, headers = {}
  @request_headers.each do |k,v|
    request[k] = v
  end

  headers.each do |field, value|
    case field
    when :etag then request["ETag"] = value
    when :if_modified_since then request["If-Modified-Since"] = value
    when Symbol then
      raise ArgumentError, "unknown header symbol #{field}"
    else
      request[field] = value
    end
  end
end
# File lib/mechanize/http/agent.rb, line 504
def request_auth request, uri
  base_uri = uri + '/'
  schemes = @authenticate_methods[base_uri]

  if realm = schemes[:digest].find { |r| r.uri == base_uri } then
    request_auth_digest request, uri, realm, base_uri, false
  elsif realm = schemes[:iis_digest].find { |r| r.uri == base_uri } then
    request_auth_digest request, uri, realm, base_uri, true
  elsif realm = schemes[:basic].find { |r| r.uri == base_uri } then
    user, password, = @auth_store.credentials_for uri, realm.realm
    request.basic_auth user, password
  end
end
# File lib/mechanize/http/agent.rb, line 518
def request_auth_digest request, uri, realm, base_uri, iis
  challenge = @digest_challenges[realm]

  user, password, = @auth_store.credentials_for uri, realm.realm
  uri.user = user
  uri.password = password

  auth = @digest_auth.auth_header uri, challenge.to_s, request.method, iis
  request['Authorization'] = auth
end
# File lib/mechanize/http/agent.rb, line 539
def request_host request, uri
  port = [80, 443].include?(uri.port.to_i) ? nil : uri.port
  host = uri.host

  request['Host'] = [host, port].compact.join ':'
end
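The effect of the port check is easiest to see on a few URIs. The sketch below isolates the same expression in a standalone helper; host_header is a hypothetical name.

```ruby
require 'uri'

# Builds the Host header value: the port is left out when it is 80 or 443
# and appended otherwise.
def host_header(uri)
  port = [80, 443].include?(uri.port.to_i) ? nil : uri.port
  [uri.host, port].compact.join(':')
end

host_header(URI('http://example.com/'))      # host only
host_header(URI('https://example.com/'))     # host only
host_header(URI('http://example.com:8080/')) # host and port
```

Note that the check does not look at the scheme, so a URI such as http://example.com:443/ also produces a Host value without the port.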
# File lib/mechanize/http/agent.rb, line 546
def request_language_charset request
  request['accept-charset'] = 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'
  request['accept-language'] = 'en-us,en;q=0.5'
end
Log specified headers for the request
# File lib/mechanize/http/agent.rb, line 552
def request_log request
  return unless log

  log.info("#{request.class}: #{request.path}")

  request.each_header do |k, v|
    log.debug("request-header: #{k} => #{v}")
  end
end
Sets a Referer header. Fragment part is removed as demanded by RFC 2616 14.36, and user information part is removed just like major browsers do.
# File lib/mechanize/http/agent.rb, line 565
def request_referer request, uri, referer
  return unless referer
  return if 'https'.casecmp(referer.scheme) == 0 and
            'https'.casecmp(uri.scheme) != 0
  if referer.fragment || referer.user || referer.password
    referer = referer.dup
    referer.fragment = referer.user = referer.password = nil
  end
  request['Referer'] = referer
end
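The same rules can be exercised on plain URI objects. This is a sketch under the assumption that only the URI handling matters; referer_for is a hypothetical helper that returns the cleaned referer, or nil when none should be sent.

```ruby
require 'uri'

# Returns the URI to send as Referer for a request to target, or nil when
# no Referer should be sent (no referer, or an https page linking to a
# non-https target).
def referer_for(target, referer)
  return nil unless referer
  return nil if referer.scheme.casecmp('https') == 0 &&
                target.scheme.casecmp('https') != 0

  cleaned = referer.dup
  # Drop the fragment and any user information before sending.
  cleaned.fragment = cleaned.user = cleaned.password = nil
  cleaned
end

source = URI('https://user:pw@example.com/a#frag')
referer_for(URI('https://example.com/next'), source) # cleaned https URI
referer_for(URI('http://example.com/next'), source)  # nil, https to http
```

Working on a copy keeps the original referer URI, and the page it belongs to, unchanged.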
# File lib/mechanize/http/agent.rb, line 576
def request_user_agent request
  request['User-Agent'] = @user_agent if @user_agent
end
# File lib/mechanize/http/agent.rb, line 580
def resolve(uri, referer = current_page)
  referer_uri = referer && referer.uri
  if uri.is_a?(URI)
    uri = uri.dup
  elsif uri.nil?
    if referer_uri
      return referer_uri
    end
    raise ArgumentError, "absolute URL needed (not nil)"
  else
    url = uri.to_s.strip
    if url.empty?
      if referer_uri
        return referer_uri.dup.tap { |u| u.fragment = nil }
      end
      raise ArgumentError, "absolute URL needed (not #{uri.inspect})"
    end

    url.gsub!(/[^#{0.chr}-#{126.chr}]/) { |match|
      if RUBY_VERSION >= "1.9.0"
        Mechanize::Util.uri_escape(match)
      else
        begin
          sprintf('%%%X', match.unpack($KCODE == 'UTF8' ? 'U' : 'C').first)
        rescue ArgumentError
          # workaround for ruby 1.8 with -Ku but ISO-8859-1 characters in
          # URIs. See #227. I can't wait to drop 1.8 support
          sprintf('%%%X', match.unpack('C').first)
        end
      end
    }

    escaped_url = Mechanize::Util.html_unescape(
      url.split(/((?:%[0-9A-Fa-f]{2})+|#)/).each_slice(2).map { |x, y|
        "#{WEBrick::HTTPUtils.escape(x)}#{y}"
      }.join('')
    )

    begin
      uri = URI.parse(escaped_url)
    rescue
      uri = URI.parse(WEBrick::HTTPUtils.escape(escaped_url))
    end
  end

  scheme = uri.relative? ? 'relative' : uri.scheme.downcase
  uri = @scheme_handlers[scheme].call(uri, referer)

  if referer_uri
    if uri.path.length == 0 && uri.relative?
      uri.path = referer_uri.path
    end
  end

  uri.path = '/' if uri.path.length == 0

  if uri.relative?
    raise ArgumentError, "absolute URL needed (not #{uri})" unless
      referer_uri

    if referer.respond_to?(:bases) && referer.parser &&
       (lbase = referer.bases.last) && lbase.uri && lbase.uri.absolute?
      base = lbase
    else
      base = nil
    end

    uri = referer_uri + (base ? base.uri : referer_uri) + uri
    # Strip initial "/.." bits from the path
    uri.path.sub!(/^(\/\.\.)+(?=\/)/, '')
  end

  unless ['http', 'https', 'file'].include?(uri.scheme.downcase)
    raise ArgumentError, "unsupported scheme: #{uri.scheme}"
  end

  uri
end
# File lib/mechanize/http/agent.rb, line 659
def resolve_parameters uri, method, parameters
  case method
  when :head, :get, :delete, :trace then
    if parameters and parameters.length > 0
      uri.query ||= ''
      uri.query << '&' if uri.query.length > 0
      uri.query << Mechanize::Util.build_query_string(parameters)
    end

    return uri, nil
  end

  return uri, parameters
end
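For methods without a request body, the parameters end up in the query string; for everything else they are passed through as the body. The sketch below shows that split on its own. with_query is a hypothetical name, and URI.encode_www_form stands in for Mechanize::Util.build_query_string, which may encode values differently.

```ruby
require 'uri'

# Returns [uri, body_params]. For HEAD, GET, DELETE and TRACE the params are
# appended to the query string and no body params remain.
def with_query(uri, method, params)
  return [uri, params] unless [:head, :get, :delete, :trace].include?(method)
  return [uri, nil] if params.nil? || params.empty?

  extra  = URI.encode_www_form(params)
  merged = uri.dup
  merged.query = [uri.query, extra].compact.reject(&:empty?).join('&')
  [merged, nil]
end

with_query(URI('http://example.com/search?q=ruby'), :get, [['page', '2']])
with_query(URI('http://example.com/form'), :post, [['name', 'value']])
```

Unlike the method above, this version works on a copy of the URI rather than appending to the existing query string in place.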
# File lib/mechanize/http/agent.rb, line 691
def response_authenticate(response, page, uri, request, headers, params,
                          referer)
  www_authenticate = response['www-authenticate']

  unless www_authenticate = response['www-authenticate'] then
    message = 'WWW-Authenticate header missing in response'
    raise Mechanize::UnauthorizedError.new(page, nil, message)
  end

  challenges = @authenticate_parser.parse www_authenticate

  unless @auth_store.credentials? uri, challenges then
    message = "no credentials found, provide some with #add_auth"
    raise Mechanize::UnauthorizedError.new(page, challenges, message)
  end

  if challenge = challenges.find { |c| c.scheme =~ /^Digest$/ } then
    realm = challenge.realm uri

    auth_scheme = if response['server'] =~ /Microsoft-IIS/ then
                    :iis_digest
                  else
                    :digest
                  end

    existing_realms = @authenticate_methods[realm.uri][auth_scheme]

    if existing_realms.include? realm
      message = 'Digest authentication failed'
      raise Mechanize::UnauthorizedError.new(page, challenges, message)
    end

    existing_realms << realm
    @digest_challenges[realm] = challenge
  elsif challenge = challenges.find { |c| c.scheme == 'NTLM' } then
    existing_realms = @authenticate_methods[uri + '/'][:ntlm]

    if existing_realms.include?(realm) and not challenge.params then
      message = 'NTLM authentication failed'
      raise Mechanize::UnauthorizedError.new(page, challenges, message)
    end

    existing_realms << realm

    if challenge.params then
      type_2 = Net::NTLM::Message.decode64 challenge.params

      user, password, domain = @auth_store.credentials_for uri, nil

      type_3 = type_2.response({ :user => user, :password => password,
                                 :domain => domain },
                               { :ntlmv2 => true }).encode64

      headers['Authorization'] = "NTLM #{type_3}"
    else
      type_1 = Net::NTLM::Message::Type1.new.encode64
      headers['Authorization'] = "NTLM #{type_1}"
    end
  elsif challenge = challenges.find { |c| c.scheme == 'Basic' } then
    realm = challenge.realm uri

    existing_realms = @authenticate_methods[realm.uri][:basic]

    if existing_realms.include? realm then
      message = 'Basic authentication failed'
      raise Mechanize::UnauthorizedError.new(page, challenges, message)
    end

    existing_realms << realm
  else
    message = 'unsupported authentication scheme'
    raise Mechanize::UnauthorizedError.new(page, challenges, message)
  end

  fetch uri, request.method.downcase.to_sym, headers, params, referer
end
# File lib/mechanize/http/agent.rb, line 768
def response_content_encoding response, body_io
  length = response.content_length ||
    case body_io
    when Tempfile, IO then
      body_io.stat.size
    else
      body_io.length
    end

  return body_io if length.zero?

  out_io = case response['Content-Encoding']
           when nil, 'none', '7bit' then
             body_io
           when 'deflate' then
             content_encoding_inflate body_io
           when 'gzip', 'x-gzip' then
             content_encoding_gunzip body_io
           else
             raise Mechanize::Error,
               "unsupported content-encoding: #{response['Content-Encoding']}"
           end

  out_io.flush
  out_io.rewind

  out_io
rescue Zlib::Error => e
  message = "error handling content-encoding #{response['Content-Encoding']}:"
  message << " #{e.message} (#{e.class})"
  raise Mechanize::Error, message
ensure
  begin
    if Tempfile === body_io and
       (StringIO === out_io or out_io.path != body_io.path) then
      body_io.close!
    end
  rescue IOError
    # HACK ruby 1.8 raises IOError when closing the stream
  end
end
# File lib/mechanize/http/agent.rb, line 837
def response_follow_meta_refresh response, uri, page, redirects
  delay, new_url = get_meta_refresh(response, uri, page)
  return nil unless delay
  new_url = new_url ? resolve(new_url, page) : uri

  raise Mechanize::RedirectLimitReachedError.new(page, redirects) if
    redirects + 1 > @redirection_limit

  sleep delay
  @history.push(page, page.uri)
  fetch new_url, :get, {}, [],
        Mechanize::Page.new, redirects
end
# File lib/mechanize/http/agent.rb, line 851
def response_log response
  return unless log

  log.info("status: #{response.class} #{response.http_version} " "#{response.code} #{response.message}")

  response.each_header do |k, v|
    log.debug("response-header: #{k} => #{v}")
  end
end
# File lib/mechanize/http/agent.rb, line 862
def response_parse response, body_io, uri
  @context.parse uri, response, body_io
end
# File lib/mechanize/http/agent.rb, line 866
def response_read response, request, uri
  content_length = response.content_length

  if use_tempfile? content_length then
    body_io = make_tempfile 'mechanize-raw'
  else
    body_io = StringIO.new
  end

  body_io.set_encoding Encoding::BINARY if body_io.respond_to? :set_encoding
  total = 0

  begin
    response.read_body { |part|
      total += part.length

      if StringIO === body_io and use_tempfile? total then
        new_io = make_tempfile 'mechanize-raw'

        new_io.write body_io.string

        body_io = new_io
      end

      body_io.write(part)
      log.debug("Read #{part.length} bytes (#{total} total)") if log
    }
  rescue EOFError => e
    # terminating CRLF might be missing, let the user check the document
    raise unless response.chunked? and total.nonzero?

    body_io.rewind
    raise Mechanize::ChunkedTerminationError.new(e, response, body_io, uri,
                                                 @context)
  rescue Net::HTTP::Persistent::Error => e
    body_io.rewind
    raise Mechanize::ResponseReadError.new(e, response, body_io, uri,
                                           @context)
  end

  body_io.flush
  body_io.rewind

  raise Mechanize::ResponseCodeError.new(response, uri) if
    Net::HTTPUnknownResponse === response

  content_length = response.content_length

  unless Net::HTTP::Head === request or Net::HTTPRedirection === response then
    raise EOFError, "Content-Length (#{content_length}) does not match " "response body length (#{body_io.length})" if
      content_length and content_length != body_io.length
  end

  body_io
end
# File lib/mechanize/http/agent.rb, line 923
def response_redirect(response, method, page, redirects, headers,
                      referer = current_page)
  case @redirect_ok
  when true, :all
    # shortcut
  when false, nil
    return page
  when :permanent
    return page unless Net::HTTPMovedPermanently === response
  end

  log.info("follow redirect to: #{response['Location']}") if log

  raise Mechanize::RedirectLimitReachedError.new(page, redirects) if
    redirects + 1 > @redirection_limit

  redirect_method = method == :head ? :head : :get

  # Make sure we are not copying over the POST headers from the original request
  ['Content-Length', 'Content-MD5', 'Content-Type'].each do |key|
    headers.delete key
  end

  @history.push(page, page.uri)
  new_uri = resolve response['Location'].to_s, page

  fetch new_uri, redirect_method, headers, [], referer, redirects + 1
end
Retry non-idempotent requests?
# File lib/mechanize/http/agent.rb, line 315
def retry_change_requests
  @http.retry_change_requests
end
Sets whether non-idempotent requests are retried
# File lib/mechanize/http/agent.rb, line 321
def retry_change_requests= retri
  @http.retry_change_requests = retri
end
# File lib/mechanize/http/agent.rb, line 961
def robots= value
  require 'webrobots' if value
  @webrobots = nil if value != @robots
  @robots = value
end
Tests if this agent is allowed to access uri, consulting the site’s robots.txt.
# File lib/mechanize/http/agent.rb, line 971
def robots_allowed? uri
  return true if uri.request_uri == '/robots.txt'

  webrobots.allowed? uri
end
Opposite of robots_allowed?
# File lib/mechanize/http/agent.rb, line 979
def robots_disallowed? url
  !robots_allowed? url
end
Returns an error object if there is an error in fetching or parsing robots.txt of the site url.
# File lib/mechanize/http/agent.rb, line 985
def robots_error(url)
  webrobots.error(url)
end
Raises the error if there is an error in fetching or parsing robots.txt of the site url.
# File lib/mechanize/http/agent.rb, line 991
def robots_error!(url)
  webrobots.error!(url)
end
Removes robots.txt cache for the site url.
# File lib/mechanize/http/agent.rb, line 996
def robots_reset(url)
  webrobots.reset(url)
end
Sets the proxy address, port, user, and password. addr should be a host, with no “http://”; port may be a port number, a service name or a port number string.
# File lib/mechanize/http/agent.rb, line 1162
def set_proxy addr, port, user = nil, pass = nil
  unless addr and port then
    @http.proxy = nil

    return
  end

  unless Integer === port then
    begin
      port = Socket.getservbyname port
    rescue SocketError
      begin
        port = Integer port
      rescue ArgumentError
        raise ArgumentError, "invalid value for port: #{port.inspect}"
      end
    end
  end

  proxy_uri = URI "http://#{addr}"
  proxy_uri.port = port
  proxy_uri.user = user if user
  proxy_uri.password = pass if pass

  @http.proxy = proxy_uri
end
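The port handling is the least obvious part of set_proxy, so here it is isolated as a standalone sketch. resolve_port is a hypothetical name; the lookup by service name depends on the local services database, so the comments avoid promising a particular result for names.

```ruby
require 'socket'

# Accepts an Integer, a service name or a numeric string and returns the
# port as an Integer, raising ArgumentError for anything else.
def resolve_port(port)
  return port if Integer === port

  begin
    Socket.getservbyname port
  rescue SocketError
    begin
      Integer port
    rescue ArgumentError
      raise ArgumentError, "invalid value for port: #{port.inspect}"
    end
  end
end

resolve_port(3128)   # already an Integer, returned as is
resolve_port('8080') # numeric string, converted to 8080
```

Trying the service lookup first means a string that is both a known service name and parseable as a number resolves through the services database.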
SSL version to use
# File lib/mechanize/http/agent.rb, line 1060
def ssl_version
  @http.ssl_version
end
Sets the SSL version to use
# File lib/mechanize/http/agent.rb, line 1065
def ssl_version= ssl_version
  @http.ssl_version = ssl_version
end
# File lib/mechanize/http/agent.rb, line 1196
def use_tempfile? size
  return false unless @max_file_buffer
  return false unless size

  size >= @max_file_buffer
end
# File lib/mechanize/http/agent.rb, line 327
def user_agent= user_agent
  @webrobots = nil if user_agent != @user_agent
  @user_agent = user_agent
end
A callback for additional certificate verification. See OpenSSL::SSL::SSLContext#verify_callback
The callback can be used for debugging or to ignore errors by always returning true. Specifying nil uses the default method that was valid when the SSLContext was created.
# File lib/mechanize/http/agent.rb, line 1075
def verify_callback
  @http.verify_callback
end
Sets the certificate verify callback
# File lib/mechanize/http/agent.rb, line 1080
def verify_callback= verify_callback
  @http.verify_callback = verify_callback
end
How to verify SSL connections. Defaults to VERIFY_PEER
# File lib/mechanize/http/agent.rb, line 1085
def verify_mode
  @http.verify_mode
end
Sets the mode for verifying SSL connections
# File lib/mechanize/http/agent.rb, line 1090
def verify_mode= verify_mode
  @http.verify_mode = verify_mode
end