class Bio::Blast

Description

The Bio::Blast class contains methods for running local or remote BLAST searches, as well as for parsing of the output of such BLASTs (i.e. the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.

Usage

require 'bio'

# To run an actual BLAST analysis:
#   1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'swissprot',
                                         '-e 0.0001', 'genomenet')
#or:
local_blast_factory = Bio::Blast.local('blastn','/path/to/db')

#   2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(sequence_text)

# Then, to parse the report, see Bio::Blast::Report

See also

References

Attributes

blastall[RW]

Full path for blastall. (default: 'blastall').

db[RW]

Database name (-d option for blastall)

filter[RW]

Filter option for blastall -F (T or F).

format[RW]

Output report format for blastall -m

0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML Blast outpu;, 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binery [intege].

matrix[RW]

Substitution matrix for blastall -M

options[R]

Options for blastall

output[R]

Returns a String containing blast execution output in as is the #format.

parser[W]
program[RW]

Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx

server[R]

Server to submit the BLASTs to

Public Class Methods

local(program, db, options = '', blastall = nil) click to toggle source

This is a shortcut for ::new:

Bio::Blast.local(program, database, options)

is equivalent to

Bio::Blast.new(program, database, options, 'local')

Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the local database

  • options: blastall options \

(see www.genome.jp/dbget-bin/show_man?blast2)

  • blastall: full path to blastall program (e.g. “/opt/bin/blastall”; DEFAULT: “blastall”)

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb, line 79
def self.local(program, db, options = '', blastall = nil)
  f = self.new(program, db, options, 'local')
  if blastall then
    f.blastall = blastall
  end
  f
end
new(program, db, opt = [], server = 'local') click to toggle source

Creates a Bio::Blast factory object.

To run any BLAST searches, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options and the server to use. E.g.

blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')

Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the (local or remote) database

  • options: blastall options \

(see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (e.g. 'genomenet'; DEFAULT = 'local')

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb, line 317
def initialize(program, db, opt = [], server = 'local')
  @program  = program
  @db       = db

  @blastall = 'blastall'
  @matrix   = nil
  @filter   = nil

  @output   = ''
  @parser   = nil
  @format   = nil

  @options = set_options(opt, program, db)
  self.server = server
end
remote(program, db, option = '', server = 'genomenet') click to toggle source

::remote does exactly the same as ::new, but sets the remote server 'genomenet' as its default.


Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the remote database

  • options: blastall options \

(see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (DEFAULT = 'genomenet')

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb, line 97
def self.remote(program, db, option = '', server = 'genomenet')
  self.new(program, db, option, server)
end
reports(input, parser = nil) { |e| ... } click to toggle source

Bio::Blast.report parses given data, and returns an array of report (Bio::Blast::Report or Bio::Blast::Default::Report) objects, or yields each report object when a block is given.

Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).


Arguments:

Returns

Undefiend when a block is given. Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.

# File lib/bio/appl/blast.rb, line 114
def self.reports(input, parser = nil)
  begin
    istr = input.to_str
  rescue NoMethodError
    istr = nil
  end
  if istr then
    input = StringIO.new(istr)
  end
  raise 'unsupported input data type' unless input.respond_to?(:gets)

  # if proper parser is given, emulates old behavior.
  case parser
  when :xmlparser, :rexml
    ff = Bio::FlatFile.new(Bio::Blast::Report, input)
    if block_given? then
      ff.each do |e|
        yield e
      end
      return []
    else
      return ff.to_a
    end
  when :tab
    istr = input.read unless istr
    rep = Report.new(istr, parser)
    if block_given? then
      yield rep
      return []
    else
      return [ rep ]
    end
  end

  # preparation of the new format autodetection rule if needed
  if !defined?(@@reports_format_autodetection_rule) or
      !@@reports_format_autodetection_rule then
    regrule = Bio::FlatFile::AutoDetect::RuleRegexp
    blastxml = regrule[ 'Bio::Blast::Report',
                        /\<\!DOCTYPE BlastOutput PUBLIC / ]
    blast    = regrule[ 'Bio::Blast::Default::Report',
                        /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                        /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tab      = regrule[ 'Bio::Blast::Report_tab',
                        /^([^\t]*\t){11}[^\t]*$/ ]
    auto = Bio::FlatFile::AutoDetect[ blastxml,
                                      blast,
                                      tblast,
                                      tab
                                    ]
    # sets priorities
    blastxml.is_prior_to blast
    blast.is_prior_to tblast
    tblast.is_prior_to tab
    # rehash
    auto.rehash
    @@report_format_autodetection_rule = auto
  end

  # Creates a FlatFile object with dummy class
  ff = Bio::FlatFile.new(Object, input)
  ff.dbclass = nil

  # file format autodetection
  3.times do
    break if ff.eof? or
      ff.autodetect(31, @@report_format_autodetection_rule)
  end
  # If format detection failed, assumed to be tabular (-m 8)
  ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

  if block_given? then
    ff.each do |entry|
      yield entry
    end
    ret = []
  else
    ret = ff.to_a
  end
  ret
end
reports_xml(input, parser = nil) { |r| ... } click to toggle source

Note that this is the old implementation of ::reports. The aim of this method is keeping compatibility for older BLAST XML documents which might not be parsed by the new ::reports nor Bio::FlatFile. (Though we are not sure whether such documents exist or not.)

::reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.

It can be used only for XML format. For default (-m 0) format, consider using Bio::FlatFile, or ::reports.


Arguments:

Returns

Undefiend when a block is given. Otherwise, an Array containing Bio::Blast::Report objects.

# File lib/bio/appl/blast.rb, line 220
def self.reports_xml(input, parser = nil)
  ary = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
    next if xml.empty?          # skip trailing no hits
    rep = Report.new(xml, parser)
    if rep.reports then
      if block_given?
        rep.reports.each { |r| yield r }
      else
        ary.concat rep.reports
      end
    else
      if block_given?
        yield rep
      else
        ary.push rep
      end
    end
  end
  return ary
end

Public Instance Methods

option() click to toggle source

Returns options of blastall

# File lib/bio/appl/blast.rb, line 374
def option
  # backward compatibility
  Bio::Command.make_command_line(options)
end
option=(str) click to toggle source

Set options for blastall

# File lib/bio/appl/blast.rb, line 380
def option=(str)
  # backward compatibility
  self.options = Shellwords.shellwords(str)
end
options=(ary) click to toggle source

Sets options for blastall

# File lib/bio/appl/blast.rb, line 255
def options=(ary)
  @options = set_options(ary)
end
query(query) click to toggle source

This method submits a sequence to a BLAST factory, which performs the actual BLAST.

# example 1
seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
report = blast_factory.query(seq)

# example 2
str <<END_OF_FASTA
>lcl|MySequence
MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
END_OF_FASTA
report = blast_factory.query(str)

Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.


Arguments:

  • query (required): single- or multiple-FASTA formatted sequence(s)

Returns

a Bio::Blast::Report (or Bio::Blast::Default::Report) object when single query is given. When multiple sequences are given as the query, it returns an array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If it can not parse result, nil will be returnd.

# File lib/bio/appl/blast.rb, line 358
def query(query)
  case query
  when Bio::Sequence
    query = query.output(:fasta)
  when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
    query = query.to_fasta('query', 70)
  else
    query = query.to_s
  end

  @output = self.__send__("exec_#{@server}", query)
  report = parse_result(@output)
  return report
end
server=(str) click to toggle source

Sets server to submit the BLASTs to. The exec_xxxx method should be defined in Bio::Blast or Bio::Blast::Remote::Xxxx class.

# File lib/bio/appl/blast.rb, line 265
def server=(str)
  @server = str
  begin
    m = Bio::Blast::Remote.const_get(@server.capitalize)
  rescue NameError
    m = nil
  end
  if m and !(self.is_a?(m)) then
    # lazy include Bio::Blast::Remote::XXX module
    self.class.class_eval { include m }
  end
  return @server
end

Private Instance Methods

exec_genomenet_tab(query) click to toggle source

This method is obsolete.

Runs genomenet with '-m 8' option. Note that the format option is overwritten.

# File lib/bio/appl/blast.rb, line 496
def exec_genomenet_tab(query)
  warn "Bio::Blast#server=\"genomenet_tab\" is deprecated."
  @format = 8
  exec_genomenet(query)
end
exec_local(query) click to toggle source

Local execution of blastall

# File lib/bio/appl/blast.rb, line 486
def exec_local(query)
  cmd = make_command_line
  @output = Bio::Command.query_command(cmd, query)
  return @output
end
make_command_line() click to toggle source

makes command line.

# File lib/bio/appl/blast.rb, line 479
def make_command_line
  cmd = make_command_line_options
  cmd.unshift @blastall
  cmd
end
make_command_line_options() click to toggle source

returns an array containing NCBI BLAST options

# File lib/bio/appl/blast.rb, line 456
def make_command_line_options
  set_options
  cmd = []
  if @program
    cmd.concat([ '-p', @program ])
  end
  if @db
    cmd.concat([ '-d', @db ])
  end
  if @format
    cmd.concat([ '-m', @format.to_s ])
  end
  if @matrix
    cmd.concat([ '-M', @matrix ]) 
  end
  if @filter
    cmd.concat([ '-F', @filter ]) 
  end
  ncbiopts = NCBIOptions.new(@options)
  ncbiopts.make_command_line_options(cmd)
end
parse_result(str) click to toggle source

parses result

# File lib/bio/appl/blast.rb, line 438
def parse_result(str)
  if @format.to_i == 0 then
    ary = Bio::FlatFile.open(Bio::Blast::Default::Report,
                             StringIO.new(str)) { |ff| ff.to_a }
    case ary.size
    when 0
      return nil
    when 1
      return ary[0]
    else
      return ary
    end
  else
    Report.new(str, @parser)
  end
end
set_options(opt = nil, program = nil, db = nil) click to toggle source
# File lib/bio/appl/blast.rb, line 387
def set_options(opt = nil, program = nil, db = nil)
  opt = @options unless opt

  # when opt is a String, splits to an array
  begin
    a = opt.to_ary
  rescue NameError #NoMethodError
    # backward compatibility
    a = Shellwords.shellwords(opt)
  end
  ncbiopt = NCBIOptions.new(a)

  if fmt = ncbiopt.get('-m') then
    @format = fmt.to_i
  else
    dummy = Bio::Blast::Report #dummy to load XMLParser or REXML
    if defined?(XMLParser) or defined?(REXML)
      @format ||= 7
    else
      @format ||= 8
    end
  end

  mtrx = ncbiopt.get('-M')
  @matrix = mtrx if mtrx
  fltr = ncbiopt.get('-F')
  @filter = fltr if fltr

  # special treatment for '-p'
  if program then
    @program = program
    ncbiopt.delete('-p')
  else
    program = ncbiopt.get('-p')
    @program = program if program
  end

  # special treatment for '-d'
  if db then
    @db = db
    ncbiopt.delete('-d')
  else
    db = ncbiopt.get('-d')
    @db = db if db
  end

  # returns an array of string containing options
  return ncbiopt.options
end