Scrubyt::PostProcessor

Post processing results after the extraction

Some tasks cannot be carried out during evaluation. One example is the ensure_presence_of_pattern constraint: evaluation proceeds top to bottom, so at a given point it is not yet known whether the pattern currently being evaluated will have a child pattern. Another example is removing unneeded results caused by evaluating multiple filters.

The sole purpose of this class is to execute these post-processing tasks.

Public Class Methods

apply_post_processing(root_pattern)

This is just a convenience method to call all the post-processing functionality and checks.

# File lib/scrubyt/output/post_processor.rb, line 18
def self.apply_post_processing(root_pattern)
  ensure_presence_of_pattern_full(root_pattern)      
  remove_multiple_filter_duplicates(root_pattern) if root_pattern.children[0].filters.size > 1
  report_if_no_results(root_pattern) if root_pattern.evaluation_context.extractor.get_mode != :production
end

ensure_presence_of_pattern_full(pattern)

Apply the ensure_presence_of_pattern constraint to the full extractor, that is, to the given pattern and recursively to all of its descendants.

# File lib/scrubyt/output/post_processor.rb, line 27
def self.ensure_presence_of_pattern_full(pattern)
  ensure_presence_of_pattern(pattern)
  pattern.children.each {|child| ensure_presence_of_pattern_full(child)}
end
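
The recursion above handles a pattern first and then descends into its children. A minimal standalone sketch of that traversal shape follows. The `Node` struct and the `visit_full` helper are hypothetical stand-ins for illustration only, not part of scrubyt, and a block takes the place of `ensure_presence_of_pattern`.

```ruby
# Hypothetical stand-in for a scrubyt pattern: a name plus child nodes.
Node = Struct.new(:name, :children)

# Apply the block to the node first, then recurse into each child,
# mirroring the shape of ensure_presence_of_pattern_full.
def visit_full(node, &check)
  check.call(node)
  node.children.each { |child| visit_full(child, &check) }
end

tree = Node.new(:root, [
  Node.new(:book, [Node.new(:title, []), Node.new(:price, [])]),
  Node.new(:footer, [])
])

# Record the order in which nodes are visited.
visited = []
visit_full(tree) { |node| visited << node.name }
p visited
```

Every node is visited exactly once, parents before children, which is why the constraint can only be checked once the whole tree has been built.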

remove_multiple_filter_duplicates(pattern)

Remove unneeded results of a pattern (caused by evaluating multiple filters). See, for example, the B&N scenario: the book titles are extracted twice for every pattern (since both examples generate the same XPath for them), but since only one of the two results ever has a price, the other is discarded.

# File lib/scrubyt/output/post_processor.rb, line 37
def self.remove_multiple_filter_duplicates(pattern)
  remove_multiple_filter_duplicates_intern(pattern) if pattern.parent_of_leaf
  pattern.children.each {|child| remove_multiple_filter_duplicates(child)}
end
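
The idea behind the B&N scenario can be illustrated with plain hashes. This is only a sketch of the concept, assuming made-up result records with `:title` and `:price` keys. It does not reproduce what `remove_multiple_filter_duplicates_intern` actually does, which is not shown on this page.

```ruby
# Hypothetical result records: each title was extracted twice,
# but only one record of each pair carries a price.
results = [
  { title: "Book A", price: "$10" },
  { title: "Book A", price: nil },
  { title: "Book B", price: nil },
  { title: "Book B", price: "$12" }
]

# Group records by title and keep the most complete record of each
# group, i.e. the one with the most non-nil fields.
deduplicated = results
  .group_by { |record| record[:title] }
  .map { |_title, group| group.max_by { |record| record.values.compact.size } }

p deduplicated
```

After this step each title appears once, and the surviving record is the one that has a price.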

report_if_no_results(root_pattern)

Log a warning if the extractor did not extract anything. This is probably because the structure of the page changed or because of some rather nasty bug. In any case, something is going wrong, and we need to inform the user about it!

# File lib/scrubyt/output/post_processor.rb, line 47
def self.report_if_no_results(root_pattern)
  results_found = false
  root_pattern.children.each {|child| return if (child.result.childmap.size > 0)}
  
  Scrubyt.log :WARNING, [
    "The extractor did not find any result instances. Most probably this is wrong.",
    "Check your extractor and if you are sure it should work, report a bug!"
  ]
end
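
The check above returns early as soon as one direct child of the root has a non-empty childmap; the local `results_found` is assigned but never read. The same test can be sketched as a predicate using `Enumerable#none?`. `SketchResult`, `SketchPattern` and `no_results?` are hypothetical names for this illustration, and the childmap is assumed to be any collection that responds to `size`.

```ruby
# Hypothetical stand-ins: a result exposes a childmap, a pattern has a
# result and a list of children.
SketchResult  = Struct.new(:childmap)
SketchPattern = Struct.new(:result, :children)

# True when no direct child of the root produced any result instance.
def no_results?(root_pattern)
  root_pattern.children.none? { |child| child.result.childmap.size > 0 }
end

empty_root = SketchPattern.new(nil, [SketchPattern.new(SketchResult.new([]), [])])
full_root  = SketchPattern.new(nil, [SketchPattern.new(SketchResult.new([{ title: "x" }]), [])])

# Only the empty extractor triggers the warning.
warn "The extractor did not find any result instances." if no_results?(empty_root)
warn "The extractor did not find any result instances." if no_results?(full_root)
```

Like the original, this inspects only the direct children of the root pattern, not deeper levels of the tree.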

