class Bio::Blast::Report
Bio::Blast::Report
Parsed results of a BLAST run in tab-delimited or XML output format. According to the MEGABLAST document (README.mbl), each line of a tab-delimited report consists of the following fields:
Query id, Subject id, percent identity, alignment length, number of mismatches (not including gaps), number of gap openings, start of alignment in query, end of alignment in query, start of alignment in subject, end of alignment in subject, expected value, bit score.
For XML output, see the following DTDs.
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entity.mod
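As a minimal sketch of the twelve tab-delimited fields listed above, a single '-m 8' line can be split into named values in plain Ruby. TabRow and parse_tab_row are hypothetical names used only for illustration and are not part of BioRuby; the sample line is made up.

```ruby
# Hypothetical illustration; TabRow and parse_tab_row are not BioRuby API.
TabRow = Struct.new(:query_id, :subject_id, :percent_identity, :align_len,
                    :mismatches, :gap_openings, :query_from, :query_to,
                    :subject_from, :subject_to, :evalue, :bit_score)

# Split one tab-delimited line and convert the numeric columns.
def parse_tab_row(line)
  f = line.chomp.split("\t")
  TabRow.new(f[0], f[1], f[2].to_f, f[3].to_i, f[4].to_i, f[5].to_i,
             f[6].to_i, f[7].to_i, f[8].to_i, f[9].to_i,
             f[10].strip.to_f, f[11].to_f)
end

# Made-up sample line with the twelve columns in order.
SAMPLE_TAB_LINE = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-50\t380\n"

row = parse_tab_row(SAMPLE_TAB_LINE)
puts "#{row.query_id} vs #{row.subject_id}: #{row.percent_identity}% over #{row.align_len} columns"
```

The column order here simply follows the field list above; a real report may contain many such lines per query.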
Constants
- DELIMITER
for Bio::FlatFile support (only for XML data)
- FLATFILE_SPLITTER
Flatfile splitter for NCBI BLAST XML format. It is internally used when reading BLAST XML. Normally, users do not need to use it directly.
Attributes
database name or title (String)
Returns an Array of Bio::Blast::Report::Iteration objects.
Returns a Hash containing execution parameters. Valid keys are: 'matrix', 'expect', 'include', 'sc-match', 'sc-mismatch', 'gap-open', 'gap-extend', 'filter'
program name (e.g. “blastp”) (String)
query definition line (String)
query ID (String)
query length (Integer)
reference (String)
When the report contains results for multiple query sequences, returns an array of Bio::Blast::Report objects corresponding to the multiple queries. Otherwise, returns nil.
Note for “No hits found”: When no hits are found for a query sequence, the result for that query is completely absent from the result XML; no information is available, including the query ID and query definition. The only trace is a skipped iteration number. Consequently, if the no-hit query is the last query, it cannot be detected, because the result XML is identical to the XML produced without that query.
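The limitation described in the note can be illustrated with a small stand-alone sketch. It assumes iteration numbers start at 1 and that only the numbers present in the XML are known; skipped_iteration_numbers is a hypothetical helper, not a BioRuby method.

```ruby
# Hypothetical helper, not BioRuby API: given the iteration numbers that
# actually appear in the XML, list the numbers that were skipped.
def skipped_iteration_numbers(present)
  return [] if present.empty?
  (1..present.max).to_a - present
end

# Query 3 produced no hits, so its iteration number is missing.
p skipped_iteration_numbers([1, 2, 4])

# If a fourth query had no hits, nothing in the numbers reveals it.
p skipped_iteration_numbers([1, 2, 3])
```

Because only gaps below the largest observed number can be seen, a trailing no-hit query leaves no trace, which matches the note.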
BLAST version (e.g. “blastp 2.2.18 [Mar-02-2008]”) (String)
Public Class Methods
Pass BLAST output from 'blastall -m 7' or '-m 8' as a String. The format is auto-detected.
# File lib/bio/appl/blast/report.rb, line 85
def initialize(data, parser = nil)
  @iterations = []
  @parameters = {}
  case parser
  when :xmlparser         # format 7
    xmlparser_parse(data)
    @reports = blastxml_split_reports
  when :rexml             # format 7
    rexml_parse(data)
    @reports = blastxml_split_reports
  when :tab               # format 8
    tab_parse(data)
  when false
    # do not parse, creates an empty object
  else
    auto_parse(data)
  end
end
Parses XML (-m 7) output using REXML.
# File lib/bio/appl/blast/report.rb, line 59
def self.rexml(data)
  self.new(data, :rexml)
end
Parses tab-delimited (-m 8) output using the tab-delimited parser.
# File lib/bio/appl/blast/report.rb, line 64
def self.tab(data)
  self.new(data, :tab)
end
Parses XML (-m 7) output using XMLParser.
# File lib/bio/appl/blast/report.rb, line 54
def self.xmlparser(data)
  self.new(data, :xmlparser)
end
Public Instance Methods
Length of BLAST db
# File lib/bio/appl/blast/report.rb, line 191
def db_len; statistics['db-len']; end
Number of sequences in BLAST db
# File lib/bio/appl/blast/report.rb, line 189
def db_num; statistics['db-num']; end
Iterates over each Bio::Blast::Report::Hit object of the last Iteration. Shortcut for the last iteration's hits (for blastall).
# File lib/bio/appl/blast/report.rb, line 167
def each_hit
  @iterations.last.each do |x|
    yield x
  end
end
Iterates over each Bio::Blast::Report::Iteration object (for blastpgp).
# File lib/bio/appl/blast/report.rb, line 159
def each_iteration
  @iterations.each do |x|
    yield x
  end
end
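The difference between iterating over all iterations and using the last-iteration shortcuts can be sketched with two tiny stand-in classes. MiniIteration and MiniReport are hypothetical and only mimic the delegation shown in the listings; they are not the BioRuby classes.

```ruby
# Hypothetical stand-ins, not the BioRuby classes.
class MiniIteration
  attr_reader :hits

  def initialize(hits)
    @hits = hits
  end

  # Yields each hit of this iteration.
  def each(&block)
    @hits.each(&block)
  end
end

class MiniReport
  def initialize(iterations)
    @iterations = iterations
  end

  # Yields every iteration in order (useful for multi-round searches).
  def each_iteration(&block)
    @iterations.each(&block)
  end

  # Yields only the hits of the last iteration.
  def each_hit(&block)
    @iterations.last.each(&block)
  end

  # Returns only the hits of the last iteration.
  def hits
    @iterations.last.hits
  end
end

report = MiniReport.new([MiniIteration.new(%w[a b]), MiniIteration.new(%w[c])])
report.each_iteration { |it| puts "iteration with #{it.hits.size} hit(s)" }
report.each_hit { |h| puts "last-iteration hit: #{h}" }
```

In this sketch the hits of earlier rounds are reachable only through each_iteration, which is why the shortcut methods suit single-round searches.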
Effective search space
# File lib/bio/appl/blast/report.rb, line 195
def eff_space; statistics['eff-space']; end
Limit of request to Entrez : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 156
def entrez_query; @parameters['entrez-query']; end
Karlin-Altschul parameter H
# File lib/bio/appl/blast/report.rb, line 201
def entropy; statistics['entropy']; end
Expectation threshold (-e) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 140
def expect; @parameters['expect']; end
Filtering options (-F) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 152
def filter; @parameters['filter']; end
Gap extension cost (-E) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 150
def gap_extend; @parameters['gap-extend']; end
Gap opening cost (-G) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 148
def gap_open; @parameters['gap-open']; end
Returns an Array of Bio::Blast::Report::Hit objects of the last iteration. Shortcut for the last iteration's hits.
# File lib/bio/appl/blast/report.rb, line 176
def hits
  @iterations.last.hits
end
Effective HSP length
# File lib/bio/appl/blast/report.rb, line 193
def hsp_len; statistics['hsp-len']; end
Inclusion threshold (-h) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 142
def inclusion; @parameters['include']; end
Karlin-Altschul parameter K
# File lib/bio/appl/blast/report.rb, line 197
def kappa; statistics['kappa']; end
Karlin-Altschul parameter Lambda
# File lib/bio/appl/blast/report.rb, line 199
def lambda; statistics['lambda']; end
Matrix used (-M) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 138
def matrix; @parameters['matrix']; end
Returns a String (or nil) containing the execution message of the last iteration (typically “CONVERGED”). Shortcut for the last iteration's message, useful for checking for 'CONVERGED'.
# File lib/bio/appl/blast/report.rb, line 206
def message
  @iterations.last.message
end
PHI-BLAST pattern : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 154
def pattern; @parameters['pattern']; end
Match score for NT (-r) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 144
def sc_match; @parameters['sc-match']; end
Mismatch score for NT (-q) : shortcuts for @parameters
# File lib/bio/appl/blast/report.rb, line 146
def sc_mismatch; @parameters['sc-mismatch']; end
Returns a Hash containing execution statistics of the last iteration. Valid keys are: 'db-num', 'db-len', 'hsp-len', 'eff-space', 'kappa', 'lambda', 'entropy'. Shortcut for the last iteration's statistics.
# File lib/bio/appl/blast/report.rb, line 184
def statistics
  @iterations.last.statistics
end
Private Instance Methods
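# File lib/bio/appl/blast/report.rb, line 68
def auto_parse(data)
  # Only the first line of the data is tested. Note that the '?' is not
  # escaped, so '<' is optional and any first line containing "xml"
  # selects the XML parsers.
  if /<?xml/.match(data[/.*/])
    if defined?(XMLParser)
      xmlparser_parse(data)
      @reports = blastxml_split_reports
    else
      rexml_parse(data)
      @reports = blastxml_split_reports
    end
  else
    tab_parse(data)
  end
end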
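The format auto-detection can be sketched in isolation as a check of the first line for an XML declaration. detect_blast_format is a hypothetical helper, not part of BioRuby, and this sketch deliberately escapes the '?' so that a literal "<?xml" is required; that is a choice made here, not a description of the library.

```ruby
# Hypothetical helper, not BioRuby API.
# Returns :xml when the first line contains an XML declaration, else :tab.
def detect_blast_format(data)
  first_line = data[/.*/]   # text up to (not including) the first newline
  first_line =~ /<\?xml/ ? :xml : :tab
end

puts detect_blast_format(%(<?xml version="1.0"?>\n<BlastOutput>))
puts detect_blast_format("q1\ts1\t98.50\t200\n")
```

Testing only the first line keeps detection cheap even for large reports, at the cost of assuming the declaration is not preceded by blank lines.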
(private method) In new BLAST XML (blastall >= 2.2.14), results of multiple queries are stored in <Iteration> elements. This method splits the iterations into multiple Bio::Blast::Report objects and returns them as an array.
# File lib/bio/appl/blast/report.rb, line 390
def blastxml_split_reports
  unless self.iterations.find { |iter| iter.query_id || iter.query_def || iter.query_len } then
    # traditional BLAST XML format, or blastpgp result.
    return nil
  end

  # new BLAST XML format (blastall 2.2.14 or later)
  origin = self
  reports = []
  prev_iternum = 0
  firsttime = true

  orig_iters = self.iterations
  orig_iters.each do |iter|
    blast = self.class.new(nil, false)
    # When no hits found, the iteration is skipped in NCBI BLAST XML.
    # So, filled with empty report object.
    if prev_iternum + 1 < iter.num then
      ((prev_iternum + 1)...(iter.num)).each do |num|
        empty_i = Iteration.new
        empty_i.num = num
        empty_i.instance_eval {
          if firsttime then
            @query_id = origin.query_id
            @query_def = origin.query_def
            @query_len = origin.query_len
            firsttime = false
          end
        }
        empty = self.class.new(nil, false)
        empty.instance_eval {
          # query_* are copied from the empty_i
          @query_id = empty_i.query_id
          @query_def = empty_i.query_def
          @query_len = empty_i.query_len
          # others are copied from the origin
          @program = origin.program
          @version = origin.version
          @reference = origin.reference
          @db = origin.db
          @parameters.update(origin.parameters)
          # the empty_i is added to the iterations
          @iterations.push empty_i
        }
        reports.push empty
      end
    end

    blast.instance_eval {
      if firsttime then
        @query_id = origin.query_id
        @query_def = origin.query_def
        @query_len = origin.query_len
        firsttime = false
      end
      # query_* are copied from the iter
      @query_id = iter.query_id if iter.query_id
      @query_def = iter.query_def if iter.query_def
      @query_len = iter.query_len if iter.query_len
      # others are copied from the origin
      @program = origin.program
      @version = origin.version
      @reference = origin.reference
      @db = origin.db
      @parameters.update(origin.parameters)
      # rewrites hit's query_id, query_def, query_len
      iter.hits.each do |h|
        h.query_id = @query_id
        h.query_def = @query_def
        h.query_len = @query_len
      end
      # the iter is added to the iterations
      @iterations.push iter
    }

    prev_iternum = iter.num
    reports.push blast
  end #orig_iters.each

  # This object's iterations is set as first report's iterations
  @iterations.clear
  if rep = reports.first then
    @iterations = rep.iterations
  end

  return reports
end
# File lib/bio/appl/blast/rexml.rb, line 25
def rexml_parse(xml)
  dom = REXML::Document.new(xml)
  rexml_parse_program(dom)
  dom.elements.each("*//Iteration") do |e|
    @iterations.push(rexml_parse_iteration(e))
  end
end
# File lib/bio/appl/blast/rexml.rb, line 87
def rexml_parse_hit(e)
  hit = Hit.new
  hash = {}
  hit.query_id = @query_id
  hit.query_def = @query_def
  hit.query_len = @query_len
  e.elements.each do |h|
    case h.name
    when 'Hit_hsps'
      h.elements.each("Hsp") do |s|
        hit.hsps.push(rexml_parse_hsp(s))
      end
    else
      hash[h.name] = h.text
    end
  end
  hit.num = hash['Hit_num'].to_i
  hit.hit_id = hash['Hit_id']
  hit.len = hash['Hit_len'].to_i
  hit.definition = hash['Hit_def']
  hit.accession = hash['Hit_accession']
  return hit
end
# File lib/bio/appl/blast/rexml.rb, line 111
def rexml_parse_hsp(e)
  hsp = Hsp.new
  hash = {}
  e.each_element_with_text do |h|
    hash[h.name] = h.text
  end
  hsp.num = hash['Hsp_num'].to_i
  hsp.bit_score = hash['Hsp_bit-score'].to_f
  hsp.score = hash['Hsp_score'].to_i
  hsp.evalue = hash['Hsp_evalue'].to_f
  hsp.query_from = hash['Hsp_query-from'].to_i
  hsp.query_to = hash['Hsp_query-to'].to_i
  hsp.hit_from = hash['Hsp_hit-from'].to_i
  hsp.hit_to = hash['Hsp_hit-to'].to_i
  hsp.pattern_from = hash['Hsp_pattern-from'].to_i
  hsp.pattern_to = hash['Hsp_pattern-to'].to_i
  hsp.query_frame = hash['Hsp_query-frame'].to_i
  hsp.hit_frame = hash['Hsp_hit-frame'].to_i
  hsp.identity = hash['Hsp_identity'].to_i
  hsp.positive = hash['Hsp_positive'].to_i
  hsp.gaps = hash['Hsp_gaps'].to_i
  hsp.align_len = hash['Hsp_align-len'].to_i
  hsp.density = hash['Hsp_density'].to_i
  hsp.qseq = hash['Hsp_qseq']
  hsp.hseq = hash['Hsp_hseq']
  hsp.midline = hash['Hsp_midline']
  return hsp
end
# File lib/bio/appl/blast/rexml.rb, line 55
def rexml_parse_iteration(e)
  iteration = Iteration.new
  e.elements.each do |i|
    case i.name
    when 'Iteration_iter-num'
      iteration.num = i.text.to_i
    when 'Iteration_hits'
      i.elements.each("Hit") do |h|
        iteration.hits.push(rexml_parse_hit(h))
      end
    when 'Iteration_message'
      iteration.message = i.text
    when 'Iteration_stat'
      i.elements["Statistics"].each_element_with_text do |s|
        k = s.name.sub(/Statistics_/, '')
        v = s.text =~ /\D/ ? s.text.to_f : s.text.to_i
        iteration.statistics[k] = v
      end
    # for new BLAST XML format
    when 'Iteration_query-ID'
      iteration.query_id = i.text
    when 'Iteration_query-def'
      iteration.query_def = i.text
    when 'Iteration_query-len'
      iteration.query_len = i.text.to_i
    end
  end #case i.name
  return iteration
end
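The way statistics values are typed in the listing above (Integer when the text is all digits, Float otherwise) can be sketched on its own. coerce_statistic is a hypothetical name for illustration, and the sample values are made up.

```ruby
# Hypothetical helper, not BioRuby API: mirrors the rule
# "any non-digit character means Float, otherwise Integer".
def coerce_statistic(text)
  text =~ /\D/ ? text.to_f : text.to_i
end

p coerce_statistic("12345")   # all digits, so an Integer
p coerce_statistic("0.041")   # contains '.', so a Float
p coerce_statistic("1e-05")   # contains 'e' and '-', so a Float
```

One consequence of this rule is that a whole-number value written with a decimal point is returned as a Float.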
# File lib/bio/appl/blast/rexml.rb, line 33
def rexml_parse_program(dom)
  hash = {}
  dom.root.each_element_with_text do |e|
    name, text = e.name, e.text
    case name
    when 'BlastOutput_param'
      e.elements["Parameters"].each_element_with_text do |p|
        xml_set_parameter(p.name, p.text)
      end
    else
      hash[name] = text if text.strip.size > 0
    end
  end
  @program = hash['BlastOutput_program']
  @version = hash['BlastOutput_version']
  @reference = hash['BlastOutput_reference']
  @db = hash['BlastOutput_db']
  @query_id = hash['BlastOutput_query-ID']
  @query_def = hash['BlastOutput_query-def']
  @query_len = hash['BlastOutput_query-len'].to_i
end
# File lib/bio/appl/blast/format8.rb, line 20
def tab_parse(data)
  iteration = Iteration.new
  @iterations.push(iteration)
  @query_id = @query_def = data[/\S+/]

  query_prev = ''
  target_prev = ''
  hit_num = 1
  hsp_num = 1
  hit = ''
  data.each_line do |line|
    ary = line.chomp.split("\t")
    query_id, target_id, hsp = tab_parse_hsp(ary)
    if query_prev != query_id or target_prev != target_id
      hit = Hit.new
      hit.num = hit_num
      hit_num += 1
      hit.query_id = hit.query_def = query_id
      hit.accession = hit.definition = target_id
      iteration.hits.push(hit)
      hsp_num = 1
    end
    hsp.num = hsp_num
    hsp_num += 1
    hit.hsps.push(hsp)
    query_prev = query_id
    target_prev = target_id
  end
end
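The grouping rule in tab_parse (a new hit starts whenever the query ID or target ID differs from the previous line) can be sketched with Enumerable#chunk_while. group_rows_into_hits is a hypothetical helper that returns plain hashes rather than Hit objects, and the sample rows are made up and shortened to three columns.

```ruby
# Hypothetical helper, not BioRuby API: group consecutive rows that share
# the same query and target IDs, the way HSP lines are collected into hits.
def group_rows_into_hits(lines)
  rows = lines.map { |l| l.chomp.split("\t") }
  rows.chunk_while { |a, b| a[0] == b[0] && a[1] == b[1] }.map do |hsps|
    { query_id: hsps.first[0], target_id: hsps.first[1], hsp_count: hsps.size }
  end
end

SAMPLE_ROWS = [
  "q1\ts1\t99.0\n",
  "q1\ts1\t90.0\n",   # same pair as the previous line: second HSP of the hit
  "q1\ts2\t80.0\n",
  "q1\ts1\t70.0\n"    # same pair as earlier but not adjacent: a new hit
].freeze

group_rows_into_hits(SAMPLE_ROWS).each { |h| p h }
```

Only adjacent lines are merged, so the sketch relies on the report listing all HSPs of one query and target pair together.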
# File lib/bio/appl/blast/format8.rb, line 50
def tab_parse_hsp(ary)
  query_id, target_id, percent_identity, align_len, mismatch_count, gaps,
  query_from, query_to, hit_from, hit_to, evalue, bit_score = *ary

  hsp = Hsp.new
  hsp.align_len = align_len.to_i
  hsp.gaps = gaps.to_i
  hsp.query_from = query_from.to_i
  hsp.query_to = query_to.to_i
  hsp.hit_from = hit_from.to_i
  hsp.hit_to = hit_to.to_i
  hsp.evalue = evalue.strip.to_f
  hsp.bit_score = bit_score.to_f

  hsp.percent_identity = percent_identity.to_f
  hsp.mismatch_count = mismatch_count.to_i

  return query_id, target_id, hsp
end
Sets the parameter for the given key to val, converting numeric parameters to Float or Integer.
# File lib/bio/appl/blast/xmlparser.rb, line 119
def xml_set_parameter(key, val)
  #labels = {
  #  'matrix'       => 'Parameters_matrix',
  #  'expect'       => 'Parameters_expect',
  #  'include'      => 'Parameters_include',
  #  'sc-match'     => 'Parameters_sc-match',
  #  'sc-mismatch'  => 'Parameters_sc-mismatch',
  #  'gap-open'     => 'Parameters_gap-open',
  #  'gap-extend'   => 'Parameters_gap-extend',
  #  'filter'       => 'Parameters_filter',
  #  'pattern'      => 'Parameters_pattern',
  #  'entrez-query' => 'Parameters_entrez-query',
  #}
  k = key.sub(/\AParameters\_/, '')
  @parameters[k] =
    case k
    when 'expect', 'include'
      val.to_f
    when /\Agap\-/, /\Asc\-/
      val.to_i
    else
      val
    end
end
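The key normalization and type conversion performed by xml_set_parameter can be tried out with a stand-alone sketch that returns the pair instead of storing it. coerce_parameter is a hypothetical name, not a BioRuby method, and the sample values are made up.

```ruby
# Hypothetical helper, not BioRuby API: strips the "Parameters_" prefix
# and converts the value according to the parameter name.
def coerce_parameter(key, val)
  k = key.sub(/\AParameters_/, '')
  v = case k
      when 'expect', 'include' then val.to_f   # thresholds become Floats
      when /\Agap-/, /\Asc-/   then val.to_i   # costs and scores become Integers
      else val                                 # everything else stays a String
      end
  [k, v]
end

p coerce_parameter('Parameters_expect', '10')
p coerce_parameter('Parameters_gap-open', '11')
p coerce_parameter('Parameters_matrix', 'BLOSUM62')
```

Returning a pair keeps the sketch free of instance state; the listed method writes the converted value into @parameters instead.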
# File lib/bio/appl/blast/xmlparser.rb, line 33
def xmlparser_parse(xml)
  parser = XMLParser.new
  def parser.default; end

  begin
    tag_stack = Array.new
    hash = Hash.new

    parser.parse(xml) do |type, name, data|
      case type
      when XMLParser::START_ELEM
        tag_stack.push(name)
        hash.update(data)
        case name
        when 'Iteration'
          iteration = Iteration.new
          @iterations.push(iteration)
        when 'Hit'
          hit = Hit.new
          hit.query_id = @query_id
          hit.query_def = @query_def
          hit.query_len = @query_len
          @iterations.last.hits.push(hit)
        when 'Hsp'
          hsp = Hsp.new
          @iterations.last.hits.last.hsps.push(hsp)
        end
      when XMLParser::END_ELEM
        case name
        when /^BlastOutput/
          xmlparser_parse_program(name,hash)
          hash = Hash.new
        when /^Parameters$/
          xmlparser_parse_parameters(hash)
          hash = Hash.new
        when /^Iteration/
          xmlparser_parse_iteration(name, hash)
          hash = Hash.new
        when /^Hit/
          xmlparser_parse_hit(name, hash)
          hash = Hash.new
        when /^Hsp$/
          xmlparser_parse_hsp(hash)
          hash = Hash.new
        when /^Statistics$/
          xmlparser_parse_statistics(hash)
          hash = Hash.new
        end
        tag_stack.pop
      when XMLParser::CDATA
        if hash[tag_stack.last].nil?
          hash[tag_stack.last] = data unless data.strip.empty?
        else
          hash[tag_stack.last].concat(data) if data
        end
      when XMLParser::PI
      end
    end
  rescue XMLParserError
    line = parser.line
    column = parser.column
    print "Parse error at #{line}(#{column}) : #{$!}\n"
  end
end
# File lib/bio/appl/blast/xmlparser.rb, line 167
def xmlparser_parse_hit(tag, hash)
  hit = @iterations.last.hits.last
  case tag
  when 'Hit_num'
    hit.num = hash[tag].to_i
  when 'Hit_id'
    hit.hit_id = hash[tag].clone
  when 'Hit_def'
    hit.definition = hash[tag].clone
  when 'Hit_accession'
    hit.accession = hash[tag].clone
  when 'Hit_len'
    hit.len = hash[tag].clone.to_i
  end
end
# File lib/bio/appl/blast/xmlparser.rb, line 183
def xmlparser_parse_hsp(hash)
  hsp = @iterations.last.hits.last.hsps.last
  hsp.num = hash['Hsp_num'].to_i
  hsp.bit_score = hash['Hsp_bit-score'].to_f
  hsp.score = hash['Hsp_score'].to_i
  hsp.evalue = hash['Hsp_evalue'].to_f
  hsp.query_from = hash['Hsp_query-from'].to_i
  hsp.query_to = hash['Hsp_query-to'].to_i
  hsp.hit_from = hash['Hsp_hit-from'].to_i
  hsp.hit_to = hash['Hsp_hit-to'].to_i
  hsp.pattern_from = hash['Hsp_pattern-from'].to_i
  hsp.pattern_to = hash['Hsp_pattern-to'].to_i
  hsp.query_frame = hash['Hsp_query-frame'].to_i
  hsp.hit_frame = hash['Hsp_hit-frame'].to_i
  hsp.identity = hash['Hsp_identity'].to_i
  hsp.positive = hash['Hsp_positive'].to_i
  hsp.gaps = hash['Hsp_gaps'].to_i
  hsp.align_len = hash['Hsp_align-len'].to_i
  hsp.density = hash['Hsp_density'].to_i
  hsp.qseq = hash['Hsp_qseq']
  hsp.hseq = hash['Hsp_hseq']
  hsp.midline = hash['Hsp_midline']
end
# File lib/bio/appl/blast/xmlparser.rb, line 150
def xmlparser_parse_iteration(tag, hash)
  case tag
  when 'Iteration_iter-num'
    @iterations.last.num = hash[tag].to_i
  when 'Iteration_message'
    @iterations.last.message = hash[tag].to_s
  # for new BLAST XML format
  when 'Iteration_query-ID'
    @iterations.last.query_id = hash[tag].to_s
  when 'Iteration_query-def'
    @iterations.last.query_def = hash[tag].to_s
  when 'Iteration_query-len'
    @iterations.last.query_len = hash[tag].to_i
  end
end
# File lib/bio/appl/blast/xmlparser.rb, line 144
def xmlparser_parse_parameters(hash)
  hash.each do |k, v|
    xml_set_parameter(k, v)
  end
end
# File lib/bio/appl/blast/xmlparser.rb, line 99
def xmlparser_parse_program(tag, hash)
  case tag
  when 'BlastOutput_program'
    @program = hash[tag]
  when 'BlastOutput_version'
    @version = hash[tag]
  when 'BlastOutput_reference'
    @reference = hash[tag]
  when 'BlastOutput_db'
    @db = hash[tag].strip
  when 'BlastOutput_query-ID'
    @query_id = hash[tag]
  when 'BlastOutput_query-def'
    @query_def = hash[tag]
  when 'BlastOutput_query-len'
    @query_len = hash[tag].to_i
  end
end
# File lib/bio/appl/blast/xmlparser.rb, line 207
def xmlparser_parse_statistics(hash)
  labels = {
    'db-num'    => 'Statistics_db-num',
    'db-len'    => 'Statistics_db-len',
    'hsp-len'   => 'Statistics_hsp-len',
    'eff-space' => 'Statistics_eff-space',
    'kappa'     => 'Statistics_kappa',
    'lambda'    => 'Statistics_lambda',
    'entropy'   => 'Statistics_entropy'
  }
  labels.each do |k,v|
    case k
    when 'db-num', 'db-len', 'hsp-len'
      @iterations.last.statistics[k] = hash[v].to_i
    else
      @iterations.last.statistics[k] = hash[v].to_f
    end
  end
end