class Bio::NCBI::REST

Description

The Bio::NCBI::REST class provides a REST client for the NCBI E-Utilities.

Entrez utilities index:

Constants

NCBI_INTERVAL

NCBI asks that any series of more than 100 requests be run on weekends, or between 9 pm and 5 am Eastern Time on weekdays. This restriction is not yet implemented in BioRuby.

Minimum interval between requests: 1/3 second. NCBI's restriction is: “Make no more than 3 requests every 1 second.”

Public Class Methods

efetch(*args)
# File lib/bio/io/ncbirest.rb, line 352
def self.efetch(*args)
  self.new.efetch(*args)
end
einfo()
# File lib/bio/io/ncbirest.rb, line 340
def self.einfo
  self.new.einfo
end
esearch(*args)
# File lib/bio/io/ncbirest.rb, line 344
def self.esearch(*args)
  self.new.esearch(*args)
end
esearch_count(*args)
# File lib/bio/io/ncbirest.rb, line 348
def self.esearch_count(*args)
  self.new.esearch_count(*args)
end

Public Instance Methods

efetch(ids, hash = {}, step = 100)

Retrieves database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “omim”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “gbc”, “medline”, “count”,…

  • step: maximum number of entries retrieved at a time

Returns

String

# File lib/bio/io/ncbirest.rb, line 316
def efetch(ids, hash = {}, step = 100)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  opts = default_parameters.merge({ "retmode"  => "text" })
  opts.update(hash)

  case ids
  when Array
    list = ids
  else
    list = ids.to_s.split(/\s*,\s*/)
  end

  result = ""
  0.step(list.size, step) do |i|
    opts["id"] = list[i, step].join(',')
    unless opts["id"].empty?
      response = ncbi_post_form(serv, opts)
      result += response.body
    end
  end
  return result.strip
  #return result.strip.split(/\n\n+/)
end
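The batching step in the source above can be tried without any network access. The following is only a sketch, not BioRuby code: batch_ids is a hypothetical helper that mirrors how efetch normalizes ids (an Array, or a comma-separated String) and groups them into comma-joined chunks of at most step entries, one chunk per request.

```ruby
# Hypothetical helper (not part of BioRuby): mirrors how efetch
# normalizes its ids argument and groups it into batches of `step`.
def batch_ids(ids, step = 100)
  # Accept an Array as-is; otherwise split a comma-separated string.
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  # Each batch would become the "id" parameter of one request.
  list.each_slice(step).map { |chunk| chunk.join(',') }
end

# Three IDs with step = 2 give two batches.
puts batch_ids("185041, J00231,AAA52805", 2).inspect
# => ["185041,J00231", "AAA52805"]
```

With the default step of 100, a list of up to 100 IDs is sent as a single request.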
einfo()

Lists the NCBI database names using the E-Utils (einfo) service, for example:

pubmed protein nucleotide nuccore nucgss nucest structure genome
books cancerchromosomes cdd gap domains gene genomeprj gensat geo
gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
popset probe proteinclusters pcassay pccompound pcsubstance snp
taxonomy toolkit unigene unists

Usage

ncbi = Bio::NCBI::REST.new
ncbi.einfo

Bio::NCBI::REST.einfo

Returns

Array of Strings (database names)

# File lib/bio/io/ncbirest.rb, line 180
def einfo
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
  opts = default_parameters.merge({})
  response = ncbi_post_form(serv, opts)
  result = response.body
  list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
  return list
end
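To see what the regular expression in einfo extracts, it can be applied to a small XML fragment. This is a sketch under assumptions: SAMPLE_EINFO_XML is a hand-written, abbreviated stand-in for a real einfo reply, and parse_db_names is a hypothetical helper name.

```ruby
# Hand-written, abbreviated stand-in for an einfo reply (assumed shape).
SAMPLE_EINFO_XML = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Hypothetical helper using the same pattern as the einfo source:
# capture the text of every <DbName> element, then flatten the
# one-element capture arrays returned by String#scan.
def parse_db_names(xml)
  xml.scan(/<DbName>(.*?)<\/DbName>/m).flatten
end

puts parse_db_names(SAMPLE_EINFO_XML).inspect
# => ["pubmed", "protein", "nuccore"]
```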
esearch(str, hash = {}, limit = nil, step = 10000)

Searches an NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “taxonomy”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “medline”, “count”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field:

      • “titl”: Title [TI]

      • “tiab”: Title/Abstract [TIAB]

      • “word”: Text words [TW]

      • “auth”: Author [AU]

      • “affl”: Affiliation [AD]

      • “jour”: Journal [TA]

      • “vol”: Volume [VI]

      • “iss”: Issue [IP]

      • “page”: First page [PG]

      • “pdat”: Publication date [DP]

      • “ptyp”: Publication type [PT]

      • “lang”: Language [LA]

      • “mesh”: MeSH term [MH]

      • “majr”: MeSH major topic [MAJR]

      • “subh”: MeSH subheadings [SH]

      • “mhda”: MeSH date [MHDA]

      • “ecno”: EC/RN Number [RN]

      • “si”: Secondary source ID [SI]

      • “uid”: PubMed ID (PMID) [UI]

      • “fltr”: Filter [FILTER] [SB]

      • “subs”: Subset [SB]

    • reldate: 365

    • mindate: 2001

    • maxdate: 2002/01/01

    • datetype: “edat”

  • limit: maximum number of entries to be returned (0 for unlimited; nil for the “retmax” value in the hash or the internal default value (=100))

  • step: maximum number of entries retrieved at a time
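The rules for limit can be written out as a small function. This is a sketch, not BioRuby code: resolve_limit is a hypothetical helper, and total_count stands in for the value that esearch_count would return for the query.

```ruby
# Hypothetical helper (not part of BioRuby) spelling out how the
# effective limit is chosen:
#   explicit limit > hash["retmax"] > internal default of 100,
#   with 0 meaning "as many entries as the query matches".
def resolve_limit(limit, hash, total_count)
  limit ||= hash["retmax"].to_i if hash["retmax"]
  limit ||= 100                      # internal default
  limit = total_count if limit == 0  # 0 means unlimited
  limit
end

puts resolve_limit(nil, {}, 5000)                # => 100
puts resolve_limit(nil, { "retmax" => 5 }, 5000) # => 5
puts resolve_limit(0, { "retmax" => 5 }, 5000)   # => 5000
```

Because 0 is truthy in Ruby, an explicit limit of 0 is not replaced by “retmax” and therefore always means “unlimited”.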

Returns

array of entry IDs, or the number of results (Integer) when “rettype” is “count”

# File lib/bio/io/ncbirest.rb, line 247
def esearch(str, hash = {}, limit = nil, step = 10000)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = default_parameters.merge({ "term" => str })
  opts.update(hash)

  case opts["rettype"]
  when "count"
    count = esearch_count(str, opts)
    return count
  else
    retstart = 0
    retstart = hash["retstart"].to_i if hash["retstart"]

    limit ||= hash["retmax"].to_i if hash["retmax"]
    limit ||= 100 # default limit is 100
    limit = esearch_count(str, opts) if limit == 0   # unlimit

    list = []
    0.step(limit, step) do |i|
      retmax = [step, limit - i].min
      opts.update("retmax" => retmax, "retstart" => i + retstart)
      response = ncbi_post_form(serv, opts)
      result = response.body
      list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
    end
    return list
  end
end
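The paging loop in the source above can be examined in isolation. The sketch below is not BioRuby code: esearch_windows is a hypothetical helper that only lists the “retstart” and “retmax” values the loop would send for a given limit and step, without contacting NCBI.

```ruby
# Hypothetical helper (not part of BioRuby): reproduces the paging
# arithmetic of esearch and returns the request windows it implies.
def esearch_windows(limit, step = 10000, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min   # never ask for more than remains
    windows << { "retstart" => i + retstart, "retmax" => retmax }
  end
  windows
end

# 25 entries in pages of 10: two full pages and one page of 5.
esearch_windows(25, 10).each { |w| puts w.inspect }
```

One consequence of iterating with 0.step(limit, step): when limit is an exact multiple of step, the last window has a “retmax” of 0.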
esearch_count(str, hash = {})

Counts the entries matching the query using the E-Utils (esearch) service, with “rettype” forced to “count”.

Arguments

same as esearch method

Returns

number of results (Integer)

# File lib/bio/io/ncbirest.rb, line 278
def esearch_count(str, hash = {})
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = default_parameters.merge({ "term" => str })
  opts.update(hash)
  opts.update("rettype" => "count")
  response = ncbi_post_form(serv, opts)
  result = response.body
  count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
  return count
end
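The two regular expressions used by esearch and esearch_count can be checked against a small XML fragment. This is a sketch under assumptions: SAMPLE_ESEARCH_XML is a hand-written, abbreviated stand-in for an esearch reply, and parse_count and parse_ids are hypothetical helper names.

```ruby
# Hand-written, abbreviated stand-in for an esearch reply (assumed shape).
SAMPLE_ESEARCH_XML = <<~XML
  <eSearchResult>
    <Count>1234</Count>
    <RetMax>2</RetMax>
    <RetStart>0</RetStart>
    <IdList>
      <Id>185041</Id>
      <Id>185042</Id>
    </IdList>
  </eSearchResult>
XML

# Same pattern as esearch_count: the first <Count> element, as an Integer.
def parse_count(xml)
  xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
end

# Same pattern as esearch: the text of every <Id> element.
def parse_ids(xml)
  xml.scan(/<Id>(.*?)<\/Id>/m).flatten
end

puts parse_count(SAMPLE_ESEARCH_XML)      # => 1234
puts parse_ids(SAMPLE_ESEARCH_XML).inspect # => ["185041", "185042"]
```

If the reply contains no <Count> element, first returns nil and nil.to_i yields 0, so a malformed reply is indistinguishable from an empty result.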

Private Instance Methods

default_parameters()

(Private) default parameters


Returns

Hash

# File lib/bio/io/ncbirest.rb, line 117
def default_parameters
  Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS
end
ncbi_access_wait(wait = NCBI_INTERVAL)

(Private) Sleeps until allowed to access.


Arguments:

  • wait: minimum interval, in seconds, between two accesses (optional; defaults to NCBI_INTERVAL)

Returns

(undefined)

# File lib/bio/io/ncbirest.rb, line 100
def ncbi_access_wait(wait = NCBI_INTERVAL)
  @@last_access_mutex ||= Mutex.new
  @@last_access_mutex.synchronize {
    if @@last_access
      duration = Time.now - @@last_access
      if wait > duration
        sleep wait - duration
      end
    end
    @@last_access = Time.now
  }
  nil
end
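The same waiting strategy can be sketched as a stand-alone class. This is not BioRuby code: Throttle is a hypothetical name, and it keeps its state in instance variables rather than the class variables used above.

```ruby
# Hypothetical stand-alone throttle (not part of BioRuby) using the same
# idea as ncbi_access_wait: remember the time of the last access and
# sleep for whatever remains of the interval.
class Throttle
  def initialize(interval)
    @interval = interval
    @mutex = Mutex.new
    @last_access = nil
  end

  # Blocks until at least @interval seconds have passed since the
  # previous call, then records the new access time.
  def wait
    @mutex.synchronize do
      if @last_access
        elapsed = Time.now - @last_access
        sleep(@interval - elapsed) if @interval > elapsed
      end
      @last_access = Time.now
    end
    nil
  end
end

throttle = Throttle.new(1.0 / 3.0)  # roughly 3 calls per second
started = Time.now
4.times { throttle.wait }           # first call passes, three calls wait
puts format("4 calls took %.2f s", Time.now - started)
```

Holding the mutex while sleeping serializes concurrent callers, so threads sharing one throttle cannot exceed the rate together.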
ncbi_check_parameters(opts)

(Private) Checks the parameters that NCBI requires. Raises an error if the email or tool parameter is missing or empty.

NCBI announces that “Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.”


Arguments:

  • (required) opts: Hash containing parameters

Returns

(undefined)

# File lib/bio/io/ncbirest.rb, line 148
def ncbi_check_parameters(opts)
  #return if Time.now < Time.gm(2010,5,31)
  if opts['email'].to_s.empty? then
    raise 'Set email parameter for the query, or set Bio::NCBI.default_email = "(your email address)"'
  end
  if opts['tool'].to_s.empty? then
    raise 'Set tool parameter for the query, or set Bio::NCBI.default_tool = "(your tool name)"'
  end
  nil
end
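The effect of this check on a parameter hash can be illustrated as follows. This is a sketch, not BioRuby code: check_entrez_parameters is a hypothetical stand-alone function applying the same rule, and the tool name and email address are placeholders.

```ruby
# Hypothetical stand-alone check (not part of BioRuby) applying the same
# rule as ncbi_check_parameters: "email" and "tool" must be non-empty.
def check_entrez_parameters(opts)
  %w[email tool].each do |key|
    raise "Set #{key} parameter for the query" if opts[key].to_s.empty?
  end
  nil
end

opts = { "db" => "nuccore", "tool" => "my_script" }  # no email yet

begin
  check_entrez_parameters(opts)
rescue RuntimeError => e
  puts "rejected: #{e.message}"
end

# With both values present the check passes and returns nil.
check_entrez_parameters(opts.merge("email" => "user@example.org"))
```

As the error messages in the source indicate, BioRuby users can avoid passing these values on every call by setting Bio::NCBI.default_email and Bio::NCBI.default_tool once.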
ncbi_post_form(serv, opts)

(Private) Sends query to NCBI.


Arguments:

  • (required) serv: (String) server URI string

  • (required) opts: (Hash) parameters

Returns

the response object returned by Bio::Command.post_form (callers read its body)

# File lib/bio/io/ncbirest.rb, line 127
def ncbi_post_form(serv, opts)
  ncbi_check_parameters(opts)
  ncbi_access_wait
  #$stderr.puts opts.inspect
  response = Bio::Command.post_form(serv, opts)
  response
end