HTML::Selector

Selects HTML elements using CSS 2 selectors.

The Selector class uses CSS selector expressions to match and select HTML elements.

For example:

  selector = HTML::Selector.new "form.login[action=/login]"

creates a new selector that matches any form element with the class login and an attribute action with the value /login.

Matching Elements

Use the #match method to determine if an element matches the selector.

For simple selectors, the method returns an array containing that element, or nil if the element does not match. For complex selectors (see below), the method returns an array of all matched elements, or nil if no match is found.

For example:

  if selector.match(element)
    puts "Element is a login form"
  end

Selecting Elements

Use the #select method to select all matching elements, starting with one element and going through all of its children in depth-first order.

This method returns an array of all matching elements, or an empty array if no match is found.

For example:

  selector = HTML::Selector.new "input[type=text]"
  matches = selector.select(element)
  matches.each do |match|
    puts "Found text field with name #{match.attributes['name']}"
  end
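
The traversal can be sketched without the library. The following is a minimal, standalone illustration, assuming a hypothetical Node struct (not the html-scanner node class) and a block in place of a parsed selector. Like the select source listed later on this page, it descends only into nodes that did not match.

```ruby
# Hypothetical stand-in for an HTML element; not the html-scanner node class.
Node = Struct.new(:name, :attributes, :children)

# Collect the nodes for which the block returns true, visiting the tree
# depth-first in document order with an explicit stack.
def select_nodes(root)
  matches = []
  stack = [root]
  while (node = stack.pop)
    if yield(node)
      matches << node
    else
      # Push children reversed so that the first child is popped first.
      stack.concat(node.children.reverse)
    end
  end
  matches
end

user     = Node.new("input", { "type" => "text", "name" => "user" }, [])
submit   = Node.new("input", { "type" => "submit" }, [])
email    = Node.new("input", { "type" => "text", "name" => "email" }, [])
fieldset = Node.new("fieldset", {}, [submit, email])
form     = Node.new("form", {}, [user, fieldset])

found = select_nodes(form) do |node|
  node.name == "input" && node.attributes["type"] == "text"
end
found.each { |node| puts "Found text field with name #{node.attributes['name']}" }
```

The result is always an array, so callers can iterate over it without a nil check.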

Expressions

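Selectors can match elements using any of the following criteria:

  • name - Match an element by its name (tag name), for example p. Use * to match any element.

  • #id - Match an element by its identifier (the id attribute).

  • .class - Match an element by class name; every class name listed must be present.

  • [attr] - Match an element that has the given attribute.

  • [attr=value] - Match an element whose attribute has the given value (more operators are described below).

  • :pseudo-class - Match an element by pseudo class, such as :nth-child or :empty.

  • :not(expr) - Match an element that does not match the negation expression.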

When using a combination of the above, the element name comes first followed by identifier, class names, attributes, pseudo classes and negation in any order. Do not separate these parts with spaces! Space separation is used for descendant selectors.

For example:

  selector = HTML::Selector.new "form.login[action=/login]"

The matched element must be of type form and have the class login. It may have other classes, but the class login is required to match. It must also have an attribute called action with the value /login.

This selector will match the following element:

  <form class="login form" method="post" action="/login">

but will not match the element:

  <form method="post" action="/logout">
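
To make the matching rule concrete, here is a small standalone sketch. It assumes the selector has already been parsed into a tag name and a list of attribute/pattern pairs, similar in spirit to what the match source on this page iterates over. The constants and the simple_match? helper are hypothetical names used only for illustration.

```ruby
# Hypothetical parsed form of "form.login[action=/login]":
# a tag name plus [attribute name, pattern] pairs.
TAG_NAME = "form"
CONDITIONS = [
  ["class",  /(^|\s)login($|\s)/], # class list contains the word "login"
  ["action", /^\/login$/]          # action equals "/login" in full
].freeze

# True if the tag name agrees and every attribute condition matches.
# A missing attribute is treated as an empty string, which neither
# pattern above accepts.
def simple_match?(name, attributes)
  return false unless name == TAG_NAME
  CONDITIONS.all? { |attr, pattern| pattern.match?(attributes[attr].to_s) }
end

login  = { "class" => "login form", "method" => "post", "action" => "/login" }
logout = { "method" => "post", "action" => "/logout" }

puts simple_match?("form", login)  # the first element above
puts simple_match?("form", logout) # the second element above
```

The second element fails on both conditions: it has no class attribute, and its action is /logout.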

Attribute Values

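Several operators are supported for matching attributes:

  • [name] - Match an element that has the attribute, whatever its value.

  • [name=value] - Match an element whose attribute value equals value in full.

  • [name^=value] - Match an element whose attribute value begins with value.

  • [name$=value] - Match an element whose attribute value ends with value.

  • [name*=value] - Match an element whose attribute value contains value.

  • [name~=value] - Match an element whose attribute value is a space-separated list in which one word is value.

  • [name|=value] - Match an element whose attribute value has value as its first space-separated item.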

For example, the following two selectors match the same element:

  #my_id
  [id=my_id]

and so do the following two selectors:

  .my_class
  [class~=my_class]
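
The equivalence is easier to see when each operator is written as a regular expression. The sketch below is a standalone approximation modeled on the attribute_match source shown later on this page. The attribute_regexp name is hypothetical, and only string values and a subset of the operators are handled.

```ruby
# Standalone sketch: build a Regexp for an attribute operator and a
# string value. Not the library method; a subset for illustration.
def attribute_regexp(operator, value)
  escaped = Regexp.escape(value)
  case operator
  when "="  then /^#{escaped}$/            # whole value
  when "~=" then /(^|\s)#{escaped}($|\s)/  # one space-separated word
  when "^=" then /^#{escaped}/             # prefix
  when "$=" then /#{escaped}$/             # suffix
  when "*=" then /#{escaped}/              # substring
  else raise ArgumentError, "unsupported operator #{operator}"
  end
end

by_class = attribute_regexp("~=", "my_class")
puts by_class.match?("first my_class last") # a word among several
puts by_class.match?("my_class_2")          # only a prefix of a word

by_id = attribute_regexp("=", "my_id")
puts by_id.match?("my_id")
```

Under this reading, .my_class is the ~= test applied to the class attribute, and #my_id is the = test applied to the id attribute.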

Alternatives, siblings, children

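Complex selectors use a combination of expressions to match elements:

  • expr1, expr2 - Match any element that matches either expression (alternatives).

  • expr1 expr2 - Match elements that match expr2 and are descendants of an element matching expr1.

  • expr1 > expr2 - Match elements that match expr2 and are children of an element matching expr1.

  • expr1 + expr2 - Match an element that matches expr2 and immediately follows an element matching expr1.

  • expr1 ~ expr2 - Match elements that match expr2 and follow an element matching expr1 among its siblings.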

Since child and sibling selectors may match more than one element given the first element, the #match method may return more than one match.
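
The difference between the two sibling forms, and why one of them can return several elements, can be sketched as follows. This is a simplified standalone model: the El struct and both helper methods are hypothetical, siblings are a flat array, and plain strings stand in for text nodes.

```ruby
# Hypothetical element type; plain strings in the list are text nodes.
El = Struct.new(:name, :id)

# Sibling elements after the given index, skipping text nodes.
def following_elements(siblings, index)
  siblings.drop(index + 1).grep(El)
end

# "+" form: only the element immediately after the first one is tested.
def immediately_following(siblings, index, name)
  candidate = following_elements(siblings, index).first
  candidate && candidate.name == name ? [candidate] : nil
end

# "~" form: every later sibling element is tested.
def all_following(siblings, index, name)
  found = following_elements(siblings, index).select { |el| el.name == name }
  found.empty? ? nil : found
end

siblings = ["\n", El.new("h1", "title"), "\n", El.new("p", "intro"),
            El.new("ul", "list"), "text", El.new("p", "outro")]

puts immediately_following(siblings, 1, "p").map(&:id).inspect
puts all_following(siblings, 1, "p").map(&:id).inspect
puts immediately_following(siblings, 1, "ul").inspect
```

Both helpers return nil rather than an empty array when nothing matches, mirroring the convention described for match.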

Pseudo classes

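Pseudo classes were introduced in CSS 3. They are most often used to select elements in a given position:

  • :root - Match the element only if it is the root element (it has no parent element).

  • :empty - Match the element only if it has no child elements and no text content other than whitespace.

  • :content(string) - Match the element only if its text content, with leading and trailing whitespace stripped, equals string.

  • :only-child - Match the element only if it is the only child element of its parent.

  • :only-of-type - Match the element only if it is the only child element of its parent with that element name.

  • :first-child, :last-child - Match the element only if it is the first or last child element of its parent.

  • :first-of-type, :last-of-type - As above, counting only sibling elements with the same name.

  • :nth-child(b) - Match the element only if it is the b-th child element of its parent, counting from 1.

  • :nth-child(an+b) - Match the element only if its position among its sibling elements is an+b for some n. The keywords odd and even are also accepted.

  • :nth-last-child - Like :nth-child, but counting from the last child.

  • :nth-of-type, :nth-last-of-type - Like :nth-child and :nth-last-child, but counting only sibling elements with the same name.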

As you can see, the :nth-child pseudo class and its variants can get quite tricky, and the CSS specification doesn’t do a much better job of explaining them. But after reading the examples and trying a few combinations, they are easy to figure out.

For example:

  table tr:nth-child(odd)

Selects every second row in the table starting with the first one.

  div p:nth-child(4)

Selects the paragraph that is the fourth child of the div. Other child elements are counted too, so the fourth paragraph is not selected if other elements precede it.

  div p:nth-of-type(4)

Selects the fourth paragraph in the div, counting only paragraphs, and ignoring all other elements.

  div p:nth-of-type(-n+4)

Selects the first four paragraphs, ignoring all others.
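
The arithmetic behind these expressions can be checked with a few lines of Ruby. The sketch below follows the an+b definition from the CSS specification (a 1-based position matches when it equals a*n + b for some integer n >= 0). It is not the nth_child method listed on this page, and the nth_position? name is hypothetical.

```ruby
# True if the 1-based position equals a*n + b for some integer n >= 0.
def nth_position?(a, b, position)
  return position == b if a.zero?
  diff = position - b
  # n = diff / a must be a whole, non-negative number.
  (diff % a).zero? && diff / a >= 0
end

positions = (1..8).to_a
puts positions.select { |i| nth_position?(2, 1, i) }.inspect  # odd:  [1, 3, 5, 7]
puts positions.select { |i| nth_position?(0, 4, i) }.inspect  # 4:    [4]
puts positions.select { |i| nth_position?(-1, 4, i) }.inspect # -n+4: [1, 2, 3, 4]
```

With -n+4, n runs over 0, 1, 2, 3 and yields positions 4, 3, 2, 1, which is why the expression selects the first four elements.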

And you can always select an element that matches one set of rules but not another using :not. For example:

  p:not(.post)

Matches all paragraphs that do not have the class post.

Substitution Values

You can use substitution with identifiers, class names and element values. A substitution takes the form of a question mark (?) and uses the next value in the argument list following the CSS expression.

The substitution value may be a string or a regular expression. All other values are converted to strings.

For example:

  selector = HTML::Selector.new "#?", /^\d+$/

matches any element whose identifier consists of one or more digits.
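
How a substitution value might become a pattern can be sketched as follows. This standalone snippet is modeled on the identifier handling in the simple_selector source shown later on this page. The id_pattern name is hypothetical.

```ruby
# Sketch: turn a substitution value for "#?" into a pattern that is
# tested against the id attribute.
def id_pattern(value)
  # A regular expression is used as given.
  return value if value.is_a?(Regexp)
  # Anything else is converted to a string and matched in full.
  Regexp.new("^#{Regexp.escape(value.to_s)}$")
end

digits = id_pattern(/^\d+$/)
exact  = id_pattern(42)

puts digits.match?("2024")   # consists of digits only
puts digits.match?("item-7") # contains other characters
puts exact.match?("42")      # 42 was converted to "42"
```

Escaping the string form keeps characters such as . or + in an identifier from being read as regular expression syntax.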

See www.w3.org/TR/css3-selectors/

Public Class Methods

for_class(cls) => selector

Creates a new selector for the given class name.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 216
      def for_class(cls)
        self.new([".?", cls])
      end

for_id(id) => selector

Creates a new selector for the given id.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 225
      def for_id(id)
        self.new(["#?", id])
      end

new(selector, *values) => selector

Creates a new selector from a CSS 2 selector expression.

The first argument is the selector expression. All other arguments are used for value substitution.

Raises ArgumentError if the selector expression is empty or cannot be parsed. An invalid attribute operation raises InvalidSelectorError.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 241
    def initialize(selector, *values)
      raise ArgumentError, "CSS expression cannot be empty" if selector.empty?
      @source = ""
      values = values[0] if values.size == 1 && values[0].is_a?(Array)

      # We need a copy to determine if we failed to parse, and also
      # preserve the original pass by-ref statement.
      statement = selector.strip.dup

      # Create a simple selector, along with negation.
      simple_selector(statement, values).each { |name, value| instance_variable_set("@#{name}", value) }

      @alternates = []
      @depends = nil

      # Alternative selector.
      if statement.sub!(/^\s*,\s*/, "")
        second = Selector.new(statement, values)
        @alternates << second
        # If there are alternate selectors, we group them in the top selector.
        if alternates = second.instance_variable_get(:@alternates)
          second.instance_variable_set(:@alternates, [])
          @alternates.concat alternates
        end
        @source << " , " << second.to_s
      # Sibling selector: create a dependency into second selector that will
      # match element immediately following this one.
      elsif statement.sub!(/^\s*\+\s*/, "")
        second = next_selector(statement, values)
        @depends = lambda do |element, first|
          if element = next_element(element)
            second.match(element, first)
          end
        end
        @source << " + " << second.to_s
      # Adjacent selector: create a dependency into second selector that will
      # match all elements following this one.
      elsif statement.sub!(/^\s*~\s*/, "")
        second = next_selector(statement, values)
        @depends = lambda do |element, first|
          matches = []
          while element = next_element(element)
            if subset = second.match(element, first)
              if first && !subset.empty?
                matches << subset.first
                break
              else
                matches.concat subset
              end
            end
          end
          matches.empty? ? nil : matches
        end
        @source << " ~ " << second.to_s
      # Child selector: create a dependency into second selector that will
      # match a child element of this one.
      elsif statement.sub!(/^\s*>\s*/, "")
        second = next_selector(statement, values)
        @depends = lambda do |element, first|
          matches = []
          element.children.each do |child|
            if child.tag? && subset = second.match(child, first)
              if first && !subset.empty?
                matches << subset.first
                break
              else
                matches.concat subset
              end
            end
          end
          matches.empty? ? nil : matches
        end
        @source << " > " << second.to_s
      # Descendant selector: create a dependency into second selector that
      # will match all descendant elements of this one. Note,
      elsif statement =~ /^\s+\S+/ && statement != selector
        second = next_selector(statement, values)
        @depends = lambda do |element, first|
          matches = []
          stack = element.children.reverse
          while node = stack.pop
            next unless node.tag?
            if subset = second.match(node, first)
              if first && !subset.empty?
                matches << subset.first
                break
              else
                matches.concat subset
              end
            elsif children = node.children
              stack.concat children.reverse
            end
          end
          matches.empty? ? nil : matches
        end
        @source << " " << second.to_s
      else
        # The last selector is where we check that we parsed
        # all the parts.
        unless statement.empty? || statement.strip.empty?
          raise ArgumentError, "Invalid selector: #{statement}"
        end
      end
    end

Public Instance Methods

match(element, first?) => array or nil

Matches an element against the selector.

For a simple selector this method returns an array with the element if the element matches, nil otherwise.

For a complex selector (sibling and descendant) this method returns an array with all matching elements, nil if no match is found.

Use first_only=true if you are only interested in the first element.

For example:

  if selector.match(element)
    puts "Element is a login form"
  end

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 365
    def match(element, first_only = false)
      # Match element if no element name or element name same as element name
      if matched = (!@tag_name || @tag_name == element.name)
        # No match if one of the attribute matches failed
        for attr in @attributes
          if element.attributes[attr[0]] !~ attr[1]
            matched = false
            break
          end
        end
      end

      # Pseudo class matches (nth-child, empty, etc).
      if matched
        for pseudo in @pseudo
          unless pseudo.call(element)
            matched = false
            break
          end
        end
      end

      # Negation. Same rules as above, but we fail if a match is made.
      if matched && @negation
        for negation in @negation
          if negation[:tag_name] == element.name
            matched = false
          else
            for attr in negation[:attributes]
              if element.attributes[attr[0]] =~ attr[1]
                matched = false
                break
              end
            end
          end
          if matched
            for pseudo in negation[:pseudo]
              if pseudo.call(element)
                matched = false
                break
              end
            end
          end
          break unless matched
        end
      end

      # If element matched but depends on another element (child,
      # sibling, etc), apply the dependent matches instead.
      if matched && @depends
        matches = @depends.call(element, first_only)
      else
        matches = matched ? [element] : nil
      end

      # If this selector is part of the group, try all the alternative
      # selectors (unless first_only).
      if !first_only || !matches
        @alternates.each do |alternate|
          break if matches && first_only
          if subset = alternate.match(element, first_only)
            if matches
              matches.concat subset
            else
              matches = subset
            end
          end
        end
      end

      matches
    end

next_element(element, name = nil)

Return the next element after this one. Skips sibling text nodes.

With the name argument, returns the next element with that name, skipping other sibling elements.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 495
    def next_element(element, name = nil)
      if siblings = element.parent.children
        found = false
        siblings.each do |node|
          if node.equal?(element)
            found = true
          elsif found && node.tag?
            return node if (name.nil? || node.name == name)
          end
        end
      end
      nil
    end

select(root) => array

Selects and returns an array with all matching elements, beginning with one node and traversing through all children depth-first. Returns an empty array if no match is found.

The root node may be any element in the document, or the document itself.

For example:

  selector = HTML::Selector.new "input[type=text]"
  matches = selector.select(element)
  matches.each do |match|
    puts "Found text field with name #{match.attributes['name']}"
  end

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 455
    def select(root)
      matches = []
      stack = [root]
      while node = stack.pop
        if node.tag? && subset = match(node, false)
          subset.each do |match|
            matches << match unless matches.any? { |item| item.equal?(match) }
          end
        elsif children = node.children
          stack.concat children.reverse
        end
      end
      matches
    end

select_first(root)

Similar to #select, but returns only the first matching element. Returns nil if no element matches the selector.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 473
    def select_first(root)
      stack = [root]
      while node = stack.pop
        if node.tag? && subset = match(node, true)
          return subset.first if !subset.empty?
        elsif children = node.children
          stack.concat children.reverse
        end
      end
      nil
    end

Protected Instance Methods

attribute_match(equality, value)

Create a regular expression to match an attribute value based on the equality operator (=, ^=, |=, etc).

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 689
    def attribute_match(equality, value)
      regexp = value.is_a?(Regexp) ? value : Regexp.escape(value.to_s)
      case equality
        when "=" then
          # Match the attribute value in full
          Regexp.new("^#{regexp}$")
        when "~=" then
          # Match a space-separated word within the attribute value
          Regexp.new("(^|\s)#{regexp}($|\s)")
        when "^="
          # Match the beginning of the attribute value
          Regexp.new("^#{regexp}")
        when "$="
          # Match the end of the attribute value
          Regexp.new("#{regexp}$")
        when "*="
          # Match substring of the attribute value
          regexp.is_a?(Regexp) ? regexp : Regexp.new(regexp)
        when "|=" then
          # Match the first space-separated item of the attribute value
          Regexp.new("^#{regexp}($|\s)")
        else
          raise InvalidSelectorError, "Invalid operation/value" unless value.empty?
          # Match all attributes values (existence check)
          //
      end
    end

next_selector(statement, values)

Called to create a dependent selector (sibling, descendant, etc). Passes the remainder of the statement, which is eventually reduced to an empty string, and an array of substitution values.

This method is called from four places, so it helps to put it here for reuse. The only logic deals with the need to detect comma separators (alternate) and apply them to the selector group of the top selector.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 802
    def next_selector(statement, values)
      second = Selector.new(statement, values)
      # If there are alternate selectors, we group them in the top selector.
      if alternates = second.instance_variable_get(:@alternates)
        second.instance_variable_set(:@alternates, [])
        @alternates.concat alternates
      end
      second
    end

nth_child(a, b, of_type, reverse)

Returns a lambda that can match an element against the nth-child pseudo class, given the following arguments:

  • a — Value of a part.

  • b — Value of b part.

  • of_type — True to test only elements of this type (of-type).

  • reverse — True to count in reverse order (last-).

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 724
    def nth_child(a, b, of_type, reverse)
      # a = 0 means select at index b, if b = 0 nothing selected
      return lambda { |element| false } if a == 0 && b == 0
      # a < 0 and b < 0 will never match against an index
      return lambda { |element| false } if a < 0 && b < 0
      b = a + b + 1 if b < 0   # b < 0 just picks last element from each group
      b -= 1 unless b == 0  # b == 0 is same as b == 1, otherwise zero based
      lambda do |element|
        # Element must be inside parent element.
        return false unless element.parent && element.parent.tag?
        index = 0
        # Get siblings, reverse if counting from last.
        siblings = element.parent.children
        siblings = siblings.reverse if reverse
        # Match element name if of-type, otherwise ignore name.
        name = of_type ? element.name : nil
        found = false
        for child in siblings
          # Skip text nodes/comments.
          if child.tag? && (name == nil || child.name == name)
            if a == 0
              # Shortcut when a == 0 no need to go past count
              if index == b
                found = child.equal?(element)
                break
              end
            elsif a < 0
              # Only look for first b elements
              break if index > b
              if child.equal?(element)
                found = (index % a) == 0
                break
              end
            else
              # Otherwise, break if child found and count ==  an+b
              if child.equal?(element)
                found = (index % a) == b
                break
              end
            end
            index += 1
          end
        end
        found
      end
    end

only_child(of_type)

Creates an only-child lambda. Pass true for of_type to consider only sibling elements of the same type.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 774
    def only_child(of_type)
      lambda do |element|
        # Element must be inside parent element.
        return false unless element.parent && element.parent.tag?
        name = of_type ? element.name : nil
        other = false
        for child in element.parent.children
          # Skip text nodes/comments.
          if child.tag? && (name == nil || child.name == name)
            unless child.equal?(element)
              other = true
              break
            end
          end
        end
        !other
      end
    end

simple_selector(statement, values, can_negate = true)

Creates a simple selector given the statement and array of substitution values.

Returns a hash with the values tag_name, attributes, pseudo (classes) and negation.

Called the first time with can_negate true to allow negation. Called a second time with false since negation cannot be negated.

     # File lib/action_controller/vendor/html-scanner/html/selector.rb, line 522
    def simple_selector(statement, values, can_negate = true)
      tag_name = nil
      attributes = []
      pseudo = []
      negation = []

      # Element name. (Note that in negation, this can come at
      # any order, but for simplicity we allow if only first).
      statement.sub!(/^(\*|[[:alpha:]][\w\-]*)/) do |match|
        match.strip!
        tag_name = match.downcase unless match == "*"
        @source << match
        "" # Remove
      end

      # Get identifier, class, attribute name, pseudo or negation.
      while true
        # Element identifier.
        next if statement.sub!(/^#(\?|[\w\-]+)/) do |match|
          id = $1
          if id == "?"
            id = values.shift
          end
          @source << "##{id}"
          id = Regexp.new("^#{Regexp.escape(id.to_s)}$") unless id.is_a?(Regexp)
          attributes << ["id", id]
          "" # Remove
        end

        # Class name.
        next if statement.sub!(/^\.([\w\-]+)/) do |match|
          class_name = $1
          @source << ".#{class_name}"
          class_name = Regexp.new("(^|\s)#{Regexp.escape(class_name)}($|\s)") unless class_name.is_a?(Regexp)
          attributes << ["class", class_name]
          "" # Remove
        end

        # Attribute value.
        next if statement.sub!(/^\[\s*([[:alpha:]][\w\-:]*)\s*((?:[~|^$*])?=)?\s*('[^']*'|"[^"]*"|[^\]]*)\s*\]/) do |match|
          name, equality, value = $1, $2, $3
          if value == "?"
            value = values.shift
          else
            # Handle single and double quotes: strip a matching pair.
            value.strip!
            if (value[0] == '"' || value[0] == "'") && value[0] == value[-1]
              value = value[1..-2]
            end
          end
          @source << "[#{name}#{equality}'#{value}']"
          attributes << [name.downcase.strip, attribute_match(equality, value)]
          "" # Remove
        end

        # Root element only.
        next if statement.sub!(/^:root/) do |match|
          pseudo << lambda do |element|
            element.parent.nil? || !element.parent.tag?
          end
          @source << ":root"
          "" # Remove
        end

        # Nth-child including last and of-type.
        next if statement.sub!(/^:nth-(last-)?(child|of-type)\((odd|even|(\d+|\?)|(-?\d*|\?)?n([+\-]\d+|\?)?)\)/) do |match|
          reverse = $1 == "last-"
          of_type = $2 == "of-type"
          @source << ":nth-#{$1}#{$2}("
          case $3
            when "odd"
              pseudo << nth_child(2, 1, of_type, reverse)
              @source << "odd)"
            when "even"
              pseudo << nth_child(2, 2, of_type, reverse)
              @source << "even)"
            when /^(\d+|\?)$/  # b only
              b = ($1 == "?" ? values.shift : $1).to_i
              pseudo << nth_child(0, b, of_type, reverse)
              @source << "#{b})"
            when /^(-?\d*|\?)?n([+\-]\d+|\?)?$/
              # A bare "n" means a = 1; a bare "-n" means a = -1.
              a = ($1 == "?" ? values.shift :
                   $1 == "" ? 1 : $1 == "-" ? -1 : $1).to_i
              b = ($2 == "?" ? values.shift : $2).to_i
              pseudo << nth_child(a, b, of_type, reverse)
              @source << (b >= 0 ? "#{a}n+#{b})" : "#{a}n#{b})")
            else
              raise ArgumentError, "Invalid nth-child #{match}"
          end
          "" # Remove
        end
        # First/last child (of type).
        next if statement.sub!(/^:(first|last)-(child|of-type)/) do |match|
          reverse = $1 == "last"
          of_type = $2 == "of-type"
          pseudo << nth_child(0, 1, of_type, reverse)
          @source << ":#{$1}-#{$2}"
          "" # Remove
        end
        # Only child (of type).
        next if statement.sub!(/^:only-(child|of-type)/) do |match|
          of_type = $1 == "of-type"
          pseudo << only_child(of_type)
          @source << ":only-#{$1}"
          "" # Remove
        end

        # Empty: no child elements or meaningful content (whitespaces
        # are ignored).
        next if statement.sub!(/^:empty/) do |match|
          pseudo << lambda do |element|
            empty = true
            for child in element.children
              if child.tag? || !child.content.strip.empty?
                empty = false
                break
              end
            end
            empty
          end
          @source << ":empty"
          "" # Remove
        end
        # Content: match the text content of the element, stripping
        # leading and trailing spaces.
        next if statement.sub!(/^:content\(\s*(\?|'[^']*'|"[^"]*"|[^)]*)\s*\)/) do |match|
          content = $1
          if content == "?"
            content = values.shift
          elsif (content[0] == '"' || content[0] == "'") && content[0] == content[-1]
            content = content[1..-2]
          end
          @source << ":content('#{content}')"
          content = Regexp.new("^#{Regexp.escape(content.to_s)}$") unless content.is_a?(Regexp)
          pseudo << lambda do |element|
            text = ""
            for child in element.children
              unless child.tag?
                text << child.content
              end
            end
            text.strip =~ content
          end
          "" # Remove
        end

        # Negation. Create another simple selector to handle it.
        if statement.sub!(/^:not\(\s*/, "")
          raise ArgumentError, "Double negatives are not missing feature" unless can_negate
          @source << ":not("
          negation << simple_selector(statement, values, false)
          raise ArgumentError, "Negation not closed" unless statement.sub!(/^\s*\)/, "")
          @source << ")"
          next
        end

        # No match: moving on.
        break
      end

      # Return hash. The keys are mapped to instance variables.
      {:tag_name=>tag_name, :attributes=>attributes, :pseudo=>pseudo, :negation=>negation}
    end

Generated with the Darkfish Rdoc Generator 1.1.6.