Parent: StringScanner
The base class for all Scanners.
It is a subclass of Ruby’s great StringScanner, so StringScanner's scanning methods (scan, getch, eos?, and so on) can be called directly inside every scanner.
It is also Enumerable, so you can use it like an Array of Tokens:
```ruby
require 'coderay'

c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
for text, kind in c_scanner
  puts text if kind == :operator
end
# prints: (*==)++;
```
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
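The Enumerable behaviour comes from defining #each and including Enumerable. Here is a minimal stdlib-only sketch of the same idea. `TinyScanner` is hypothetical, and it yields `[text, kind]` pairs directly instead of going through CodeRay's Tokens class:

```ruby
require 'strscan'

# Hypothetical scanner: a StringScanner subclass that is also Enumerable.
class TinyScanner < StringScanner
  include Enumerable

  # Yields text, kind for every token; only three kinds are recognised.
  def each
    reset # start from the beginning on every traversal
    until eos?
      if (text = scan(/\w+/))
        yield text, :ident
      elsif (text = scan(/\s+/))
        yield text, :space
      else
        yield getch, :operator # any other single character
      end
    end
  end
end

scanner = TinyScanner.new("a += b;")
ops = scanner.select { |_text, kind| kind == :operator }.map(&:first)
first_ident = scanner.find { |_text, kind| kind == :ident }
```

Because `each` yields two values, Enumerable packs them into `[text, kind]` arrays for methods such as `select` and `find`.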
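ScanError: raised if a Scanner fails while scanning.
DEFAULT_OPTIONS: the default options for all scanner classes.
Subclasses define their own DEFAULT_OPTIONS constant; #initialize merges the options passed to new into self.class::DEFAULT_OPTIONS.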
The encoding used internally by this scanner.
```ruby
# File lib/coderay/scanner.rb, line 89
def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
```
The typical filename suffix for this scanner’s language.
```ruby
# File lib/coderay/scanner.rb, line 84
def file_extension extension = lang
  @file_extension ||= extension.to_s
end
```
Create a new Scanner.
code is the input String and is handled by the superclass StringScanner.
options is a Hash with Symbols as keys. It is merged with the default options of the class, so options given here override the defaults.
If options[:tokens] is given, it is used to store the scanned tokens; otherwise, a new Tokens object is used.
```ruby
# File lib/coderay/scanner.rb, line 143
def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end

  @options = self.class::DEFAULT_OPTIONS.merge options

  super self.class.normalize(code)

  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  setup
end
```
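The constructor does three things: it refuses to instantiate the abstract base class, merges the caller's options over the subclass's DEFAULT_OPTIONS, and picks up an optional token container. A stdlib-only sketch of that shape follows. `BaseScanner`, `WordScanner`, and their options are hypothetical, and a plain Array stands in for Tokens:

```ruby
require 'strscan'

# Hypothetical abstract base mirroring the shape of Scanner#initialize.
class BaseScanner < StringScanner
  DEFAULT_OPTIONS = {}.freeze

  attr_reader :options, :tokens

  def initialize(code = '', options = {})
    if self.class == BaseScanner
      raise NotImplementedError, 'BaseScanner cannot scan anything; use a subclass.'
    end
    # self.class:: finds the subclass's own constant; caller options win.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
    super(code.to_s)
    # Use the caller's container if given, otherwise a fresh Array.
    @tokens = options[:tokens] || []
  end
end

# Hypothetical subclass with its own defaults.
class WordScanner < BaseScanner
  DEFAULT_OPTIONS = { skip_spaces: true, max_len: 80 }.freeze
end

s = WordScanner.new('x y', max_len: 10)
```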
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. It is intended to always return a new object, though an empty String is returned as-is.
```ruby
# File lib/coderay/scanner.rb, line 69
def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?

  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
```
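The normalization relies on String#encode with the `universal_newline`, `undef: :replace`, and `invalid: :replace` options. The sketch below mirrors normalize and encode_with_encoding in one function. `normalize_sketch` is hypothetical, and as a stated simplification it treats invalid input as BINARY instead of calling guess_encoding:

```ruby
# Hypothetical stand-in for Scanner.normalize / encode_with_encoding.
# Assumption: invalid input is treated as BINARY rather than guessed.
def normalize_sketch(code, target = Encoding::UTF_8)
  code = code.to_s unless code.is_a?(String)
  return code if code.empty?

  if code.encoding == target && code.valid_encoding?
    # Already valid in the target encoding: only convert CRLF / CR to LF.
    code.gsub(/\r\n?/, "\n")
  else
    source = code.encoding == target ? Encoding::BINARY : code.encoding
    code.encode(target, source,
                universal_newline: true, undef: :replace, invalid: :replace)
  end
end

latin1 = "caf\xE9\r\n".dup.force_encoding(Encoding::ISO_8859_1)
broken = "\xFFok".dup.force_encoding(Encoding::UTF_8)
```

Bytes that cannot be converted become U+FFFD, the default replacement character for a Unicode target.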
Converts code to target_encoding with UNIX newlines, replacing invalid and undefined characters. A string that is already valid in the target encoding only has its newlines converted; if it claims the target encoding but is invalid, its real encoding is guessed first.
```ruby
# File lib/coderay/scanner.rb, line 100
def encode_with_encoding code, target_encoding
  if code.encoding == target_encoding
    if code.valid_encoding?
      return to_unix(code)
    else
      source_encoding = guess_encoding code
    end
  else
    source_encoding = code.encoding
  end
  # print "encode_with_encoding from #{source_encoding} to #{target_encoding}"
  code.encode target_encoding, source_encoding, :universal_newline => true, :undef => :replace, :invalid => :replace
end
```
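Guesses the encoding of s by piping its first 1024 characters through the file utility (file -b --mime -) and looking up the charset it reports; falls back to Encoding::BINARY if Encoding.find raises an ArgumentError.
```ruby
# File lib/coderay/scanner.rb, line 118
def guess_encoding s
  #:nocov:
  IO.popen("file -b --mime -", "w+") do |file|
    file.write s[0, 1024]
    file.close_write
    begin
      Encoding.find file.gets[/charset=([-\w]+)/, 1]
    rescue ArgumentError
      Encoding::BINARY
    end
  end
  #:nocov:
end
```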
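Shelling out to `file` requires that utility to be installed. A portable alternative, which is a different technique from the one above and not CodeRay's, is to try candidate encodings in order and keep the first under which the bytes are valid. `guess_encoding_sketch` and its candidate list are hypothetical. It checks the whole string, because cutting a fixed-size sample can split a multibyte character and make valid UTF-8 look invalid:

```ruby
# Hypothetical candidate list; ISO-8859-1 accepts any byte sequence,
# so it works as a last resort before BINARY.
CANDIDATES = [Encoding::UTF_8, Encoding::ISO_8859_1].freeze

# Returns the first candidate under which the bytes of s are valid.
def guess_encoding_sketch(s, candidates = CANDIDATES)
  found = candidates.find { |enc| s.dup.force_encoding(enc).valid_encoding? }
  found || Encoding::BINARY
end
```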
The string in binary encoding.
To be used with #pos, which is the index of the byte the scanner will scan next.
```ruby
# File lib/coderay/scanner.rb, line 243
def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
```
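The binary copy matters because StringScanner#pos counts bytes, while String#[] on a UTF-8 string counts characters. This small stdlib-only demonstration shows the two disagreeing as soon as a multibyte character has been scanned:

```ruby
require 'strscan'

s = "héllo"              # 5 characters, 6 bytes in UTF-8
scanner = StringScanner.new(s)
scanner.scan(/hé/)

pos = scanner.pos        # a byte offset, not a character offset
char_slice = s[0, pos]   # slices by characters, so it overshoots
byte_slice = s.b[0, pos] # slices by bytes, as binary_string allows
```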
The current column position of the scanner, starting with 1. See also: #line.
```ruby
# File lib/coderay/scanner.rb, line 234
def column pos = self.pos
  return 1 if pos <= 0
  # Fallback -1 when no newline precedes pos, so first-line columns are 1-based too
  pos - (binary_string.rindex("\n", pos - 1) || -1)
end
```
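The column is the distance from the last newline before pos. The following sketch uses a hypothetical free function, `column_at`, on a binary string. It assumes a fallback of -1 when no newline precedes pos, which is what makes a position on the first line come out as pos + 1:

```ruby
# Hypothetical helper mirroring the column arithmetic on a byte string.
def column_at(bytes, pos)
  return 1 if pos <= 0
  # Index of the last newline strictly before pos, or -1 if there is none.
  pos - (bytes.rindex("\n", pos - 1) || -1)
end

s = "ab\ncd".b
```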
Traverse the tokens.
```ruby
# File lib/coderay/scanner.rb, line 217
def each &block
  tokens.each(&block)
end
```
The default file extension for this scanner.
```ruby
# File lib/coderay/scanner.rb, line 178
def file_extension
  self.class.file_extension
end
```
The plugin ID for this scanner.
```ruby
# File lib/coderay/scanner.rb, line 173
def lang
  self.class.lang
end
```
The current line position of the scanner, starting with 1. See also: #column.
Beware: this is implemented inefficiently, because it counts the newlines from the start of the string on every call. Use it for debugging only.
```ruby
# File lib/coderay/scanner.rb, line 227
def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
```
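If line numbers were needed often, one option would be to record every newline offset once and answer each query with a binary search. This is not what CodeRay does. `LineIndex` is a hypothetical helper, and the usage below checks it against the counting approach:

```ruby
# Hypothetical helper: precompute newline byte offsets once, then find
# the line for any position with Array#bsearch_index.
class LineIndex
  def initialize(bytes)
    @newlines = []
    i = -1
    while (i = bytes.index("\n", i + 1))
      @newlines << i
    end
  end

  # 1-based line number for byte offset pos.
  def line(pos)
    return 1 if pos <= 0
    # Number of newlines strictly before pos.
    before = @newlines.bsearch_index { |nl| nl >= pos } || @newlines.size
    before + 1
  end
end

s = "a\nb\nc".b
idx = LineIndex.new(s)
naive = ->(pos) { pos <= 0 ? 1 : s[0...pos].count("\n") + 1 }
```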
Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.
```ruby
# File lib/coderay/scanner.rb, line 160
def reset
  super
  reset_instance
end
```
Set a new string to be scanned.
```ruby
# File lib/coderay/scanner.rb, line 166
def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
```
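Both #reset and #string= call super and then a reset_instance hook, so subclasses extend the hook rather than overriding reset itself. This stdlib-only sketch shows the pattern. `HookedScanner` and `CountingScanner` are hypothetical, and a plain newline fix stands in for normalize:

```ruby
require 'strscan'

# Hypothetical base class: reset and string= delegate cleanup to a hook.
class HookedScanner < StringScanner
  attr_reader :tokens

  def initialize(code = '')
    super(code)
    @tokens = []
  end

  def reset
    super
    reset_instance
  end

  def string=(code)
    super(code.to_s.gsub(/\r\n?/, "\n")) # stand-in for normalize
    reset_instance
  end

  protected

  def reset_instance
    @tokens.clear
  end
end

# Hypothetical subclass extending the hook and keeping the parent's cleanup.
class CountingScanner < HookedScanner
  attr_reader :resets

  protected

  def reset_instance
    super
    @resets = (@resets || 0) + 1
  end
end

s = CountingScanner.new("a\r\nb")
s.tokens << :x
s.scan(/a/)
s.reset
```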
Scans the code and returns all tokens in a Tokens object. source may be a String, an Array of Strings (scanned as one joined string, with the tokens split back into one part per element), or nil (rescan the current string).
```ruby
# File lib/coderay/scanner.rb, line 183
def tokenize source = nil, options = {}
  options = @options.merge(options)
  @tokens = options[:tokens] || @tokens || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  case source
  when Array
    self.string = self.class.normalize(source.join)
  when nil
    reset
  else
    self.string = self.class.normalize(source)
  end

  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end

  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
```
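The control flow of tokenize can be sketched with the standard library alone: dispatch on the kind of source, run the scan loop, and wrap any failure in a scanner-specific error that keeps the original backtrace. `WordsScanner` and `SketchScanError` are hypothetical. Unlike the real method, this sketch does not split the tokens back into parts for Array input:

```ruby
require 'strscan'

# Hypothetical error class standing in for ScanError.
class SketchScanError < StandardError; end

class WordsScanner < StringScanner
  def tokenize(source = nil)
    case source
    when Array then self.string = source.join # parts scanned as one string
    when nil   then reset                     # rescan the current string
    else            self.string = source.to_s
    end

    tokens = []
    begin
      scan_tokens(tokens)
    rescue StandardError => e
      # Re-raise with context, keeping the original backtrace.
      raise SketchScanError, "Error in #{self.class}#scan_tokens: #{e.message}", e.backtrace
    end
    @cached_tokens = tokens
  end

  protected

  # Collects words, skips whitespace, fails on anything else.
  def scan_tokens(tokens)
    until eos?
      if (word = scan(/\w+/))
        tokens << word
      elsif !scan(/\s+/)
        raise "unexpected #{getch.inspect}"
      end
    end
    tokens
  end
end

sc = WordsScanner.new("")
```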
Raises a ScanError with additional status information.
```ruby
# File lib/coderay/scanner.rb, line 281
def raise_inspect msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller
  raise ScanError, <<-EOE % [
***ERROR in %s: %s (after %d tokens)

tokens:
%s

current line: %d column: %d pos: %d
matched: %p state: %p
bol? = %p, eos? = %p

surrounding code:
%p ~~ %p

***ERROR***
  EOE
    File.basename(caller[0]),
    msg,
    tokens.respond_to?(:size) ? tokens.size : 0,
    tokens.respond_to?(:last) ? tokens.last(10).map { |t| t.inspect }.join("\n") : '',
    line, column, pos,
    matched, state, bol?, eos?,
    binary_string[pos - ambit, ambit],
    binary_string[pos, ambit],
  ], backtrace
end
```
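The value of this report lies in the surrounding-code window on either side of pos. The sketch below builds a similar, shorter message for any StringScanner. `inspect_failure` is hypothetical. Unlike the listing above, it clamps the window start at 0, so a failure near the beginning of the input still shows the preceding bytes:

```ruby
require 'strscan'

# Hypothetical helper: a raise_inspect-style diagnostic string.
def inspect_failure(scanner, msg, tokens, ambit = 10)
  pos = scanner.pos
  bytes = scanner.string.b
  before = bytes[[pos - ambit, 0].max...pos] # clamp for pos < ambit
  after = bytes[pos, ambit]
  format("ERROR: %s (after %d tokens)\nlast tokens: %s\npos: %d  eos? = %p\nsurrounding code: %p ~~ %p",
         msg, tokens.size, tokens.last(3).inspect, pos, scanner.eos?, before, after)
end

sc = StringScanner.new("foo = bar")
sc.scan(/foo /)
report = inspect_failure(sc, "boom", ["foo", " "])
```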
Resets the scanner.
```ruby
# File lib/coderay/scanner.rb, line 274
def reset_instance
  @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens]
  @cached_tokens = nil
  @binary_string = nil if defined? @binary_string
end
```
Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.
```ruby
# File lib/coderay/scanner.rb, line 314
def scan_rest
  rest = self.rest
  terminate
  rest
end
```
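The rest/terminate pair returns the same text as scan_until(/\z/) and leaves the scanner at the end of the input. A quick stdlib check, using a hypothetical free-function version of the method:

```ruby
require 'strscan'

# Hypothetical free-function version of scan_rest.
def scan_rest(scanner)
  rest = scanner.rest  # everything after the current position
  scanner.terminate    # move to end of string
  rest
end

a = StringScanner.new("key: value\n")
a.scan(/\w+:/)
b = StringScanner.new("key: value\n")
b.scan(/\w+:/)

via_rest  = scan_rest(a)
via_until = b.scan_until(/\z/)
```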
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!
```ruby
# File lib/coderay/scanner.rb, line 269
def scan_tokens tokens, options  # :doc:
  raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
end
```
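A subclass typically implements scan_tokens as an `until eos?` loop that tries one pattern after another, appends each match with `<<`, consumes a single character when nothing matches, and returns the tokens. The sketch below is stdlib-only. `AbstractScanner` and `CalcScanner` are hypothetical, a plain Array replaces Tokens, and the `options` argument is accepted but unused:

```ruby
require 'strscan'

# Hypothetical abstract base with the same NotImplementedError guard.
class AbstractScanner < StringScanner
  def tokenize
    reset
    scan_tokens([], {})
  end

  protected

  def scan_tokens(tokens, options)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

# Hypothetical subclass: appends [text, kind] pairs and returns them.
class CalcScanner < AbstractScanner
  protected

  def scan_tokens(tokens, options)
    until eos?
      if (text = scan(/\d+/))
        tokens << [text, :integer]
      elsif (text = scan(/[-+*\/()]/))
        tokens << [text, :operator]
      elsif (text = scan(/\s+/))
        tokens << [text, :space]
      else
        tokens << [getch, :error] # never stall on unexpected input
      end
    end
    tokens
  end
end

toks = CalcScanner.new("1 + 23").tokenize
```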
Generated with the Darkfish Rdoc Generator 1.1.6.