class Bio::Blast::RPSBlast::Report
NCBI RPS Blast (Reverse Position Specific Blast) default output parser.
It supports the default (-m 0 option) output of the “rpsblast” command.
Because this class inherits from Bio::Blast::Default::Report, almost all methods behave the same as in Bio::Blast::Default::Report. Only DELIMITER (and RS) and a few methods differ.
When Bio::FlatFile is used (for example, Bio::FlatFile.open), an rpsblast result generated from multiple query sequences is automatically split into multiple Bio::Blast::RPSBlast::Report objects, one per query sequence.
Note for multi-fasta results WITH using Bio::FlatFile: Each split result is concatenated with the header of the result, which describes the RPS-BLAST version and database information, if possible.
Note for multi-fasta results WITHOUT using Bio::FlatFile: When parsing the output of an rpsblast command run on multi-fasta sequences WITHOUT using Bio::FlatFile, each query's result is stored as an “iteration” of PSI-Blast. This behavior may change in the future.
Note for nucleotide results: This class has not been tested with nucleotide queries and/or nucleotide databases.
Constants
- DELIMITER
Delimiter of each entry for RPS-BLAST.
- DELIMITER_OVERRUN
(Integer) excess read size included in DELIMITER.
- FLATFILE_SPLITTER
Flatfile splitter for RPS-BLAST reports. It is used internally when reading RPS-BLAST reports. Normally, users do not need to use it directly.
Note for Windows: RPS-BLAST results generated on Microsoft Windows may not be parsed correctly because of differing line-ending codes. As a workaround, convert the line endings from Windows (DOS) to UNIX before parsing.
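The Windows workaround can be applied in Ruby before the text reaches the parser. The following is a minimal sketch, not part of BioRuby: the helper name normalize_newlines is hypothetical, and it assumes the whole report has already been read into a string.

```ruby
# Hypothetical helper (not part of BioRuby): converts Windows (CRLF) and
# lone CR line endings to UNIX (LF) line endings.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Report text as it might look when produced on Windows (made-up content).
windows_text = "Query= seq1\r\n\r\n         (120 letters)\r\n"
unix_text = normalize_newlines(windows_text)
```

The converted string can then be written back to a file and read with Bio::FlatFile, or passed to the parser directly.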
Public Class Methods
Creates a new Report object from a string.
Using Bio::FlatFile.open (or some other methods) is recommended instead of using this method directly. Refer to the Bio::Blast::RPSBlast::Report documentation for more information.
Note for multi-fasta results WITHOUT using Bio::FlatFile: When parsing the output of an rpsblast command run on multi-fasta sequences WITHOUT using Bio::FlatFile, each query's result is stored as an “iteration” of PSI-Blast. This behavior may change in the future.
Note for nucleotide results: This class has not been tested with nucleotide queries and/or nucleotide databases.
# File lib/bio/appl/blast/rpsblast.rb, line 173
def initialize(str)
  str = str.sub(/\A\s+/, '')
  # remove trailing entries for sure
  str.sub!(/\n(RPS\-BLAST.*)/m, "\n")
  @entry_overrun = $1
  @entry = str
  data = str.split(/(?:^[ \t]*\n)+/)

  if data[0] and /\AQuery\=/ !~ data[0] then
    format0_split_headers(data)
  end
  @iterations = format0_split_search(data)
  format0_split_stat_params(data)
end
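The first steps of the constructor can be tried outside BioRuby to see what they produce. The sketch below reuses the two regular expressions from the source above on a made-up, heavily abbreviated text; SAMPLE_OUTPUT and split_report_text are hypothetical names, and the sample is not real rpsblast output.

```ruby
# Made-up, abbreviated stand-in for rpsblast output holding two entries.
SAMPLE_OUTPUT = <<~TEXT
  RPS-BLAST x.y.z

  Database: example_db

  Query= seq1

  RPS-BLAST x.y.z

  Query= seq2
TEXT

# Hypothetical helper mirroring the first steps of the constructor.
def split_report_text(text)
  str = text.sub(/\A\s+/, '')
  # Cut off everything from the next "RPS-BLAST" header onwards and keep it
  # as the overrun, which belongs to the following entry.
  overrun = str.sub!(/\n(RPS-BLAST.*)/m, "\n") ? $1 : nil
  # Split the remaining entry into chunks at runs of blank lines.
  chunks = str.split(/(?:^[ \t]*\n)+/)
  [chunks, overrun]
end

chunks, overrun = split_report_text(SAMPLE_OUTPUT)
# chunks  holds the header, database and query chunks of the first entry
# overrun holds the text of the second entry, starting with "RPS-BLAST"
```

Because the first chunk here does not start with "Query=", the real constructor would go on to call format0_split_headers on these chunks.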
Public Instance Methods
Returns the definition of the query. For a result of multi-fasta input, the first query's definition is returned (the same as iterations.first.query_def).
# File lib/bio/appl/blast/rpsblast.rb, line 191
def query_def
  iterations.first.query_def
end
Returns the length of the query. For a result of multi-fasta input, the first query's length is returned (the same as iterations.first.query_len).
# File lib/bio/appl/blast/rpsblast.rb, line 198
def query_len
  iterations.first.query_len
end
Private Instance Methods
Splits headers into the first line, reference, query line and database line.
# File lib/bio/appl/blast/rpsblast.rb, line 206
def format0_split_headers(data)
  @f0header = data.shift
  @f0references = []
  while data[0] and /\ADatabase\:/ !~ data[0]
    @f0references.push data.shift
  end
  @f0database = data.shift
  # In special case, a void line is inserted after database name.
  if /\A +[\d\,]+ +sequences\; +[\d\,]+ total +letters\s*\z/ =~ data[0] then
    @f0database.concat "\n"
    @f0database.concat data.shift
  end
end
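The header splitting can be followed more easily on a small array of chunks. The sketch below is a standalone rewrite of the same logic that returns a hash instead of setting instance variables; the method name split_headers and the chunk contents are made up for illustration.

```ruby
# Hypothetical standalone version of the header splitting shown above.
# It consumes chunks from the front of +data+ and returns what it found.
def split_headers(data)
  header = data.shift
  references = []
  # Everything before the "Database:" chunk is treated as reference text.
  references << data.shift while data[0] && data[0] !~ /\ADatabase:/
  database = data.shift
  # Special case: the sequence/letter counts arrive as a separate chunk
  # when a blank line follows the database name.
  if data[0] =~ /\A +[\d,]+ +sequences; +[\d,]+ total +letters\s*\z/
    database = "#{database}\n#{data.shift}"
  end
  { header: header, references: references, database: database }
end

# Made-up chunks, as produced by splitting a report at blank lines.
header_chunks = [
  "RPS-BLAST x.y.z\n",
  "Reference: (citation text)\n",
  "Database: example_db\n",
  "           1,234 sequences; 567,890 total letters\n",
  "Query= seq1\n"
]
header_info = split_headers(header_chunks)
# header_chunks now starts at the "Query=" chunk.
```

After the call, only the search results remain in the array, which is the state format0_split_search expects.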
Splits the search results.
# File lib/bio/appl/blast/rpsblast.rb, line 221
def format0_split_search(data)
  iterations = []
  dummystr = 'Searching..................................................done'
  if r = data[0] and /^Searching/ =~ r then
    dummystr = data.shift
  end
  while r = data[0] and /^Query\=/ =~ r
    iterations << Iteration.new(data, dummystr)
  end
  iterations
end
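The loop above terminates only because each Iteration.new call removes the chunks it parses from the front of data. The sketch below illustrates that pattern with a stand-in class. StubIteration and split_search are hypothetical, the rule that a stub consumes chunks up to the next "Query=" chunk is an assumption made for this example rather than the behavior of the real Iteration class, and the "Searching" line is simply discarded instead of being passed on.

```ruby
# Hypothetical stand-in for Iteration: takes the "Query=" chunk and every
# following chunk up to (but not including) the next "Query=" chunk.
class StubIteration
  attr_reader :chunks

  def initialize(data)
    @chunks = [data.shift]
    @chunks << data.shift while data[0] && data[0] !~ /^Query=/
  end
end

# Simplified version of the splitting loop, one stub per "Query=" chunk.
def split_search(data)
  iterations = []
  # Drop a leading "Searching...done" chunk if there is one.
  data.shift if data[0] && data[0] =~ /^Searching/
  iterations << StubIteration.new(data) while data[0] && data[0] =~ /^Query=/
  iterations
end

# Made-up chunks for a report with two queries and no trailing statistics.
search_chunks = [
  "Searching..........done\n",
  "Query= seq1\n", "hits for seq1\n",
  "Query= seq2\n", "hits for seq2\n"
]
stub_iterations = split_search(search_chunks)
```

With two "Query=" chunks in the input, the result holds two stub objects, which corresponds to the case where each query of a multi-fasta run is stored as one iteration.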