class Bio::NCBI::REST
Description
The Bio::NCBI::REST class provides a REST client for the NCBI E-Utilities.
Entrez utilities index:
Constants
- NCBI_INTERVAL
NCBI asks that any series of more than 100 requests be run on weekends or between 9 pm and 5 am Eastern Time on weekdays. This restriction is not yet implemented in BioRuby.
The client waits 1/3 second between requests. NCBI's restriction is: “Make no more than 3 requests every 1 second.”
Public Class Methods
  # File lib/bio/io/ncbirest.rb, line 352
  def self.efetch(*args)
    self.new.efetch(*args)
  end

  # File lib/bio/io/ncbirest.rb, line 340
  def self.einfo
    self.new.einfo
  end

  # File lib/bio/io/ncbirest.rb, line 344
  def self.esearch(*args)
    self.new.esearch(*args)
  end

  # File lib/bio/io/ncbirest.rb, line 348
  def self.esearch_count(*args)
    self.new.esearch_count(*args)
  end
Public Instance Methods
Retrieves database entries for the given IDs using the E-Utils (efetch) service.
For information on the possible arguments, see
Usage
  ncbi = Bio::NCBI::REST.new
  ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
  ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
  ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

  Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
  Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
  Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})
Arguments:
- ids: list of NCBI entry IDs (required)
- hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
  - db: "sequences", "nucleotide", "protein", "pubmed", "omim", …
  - retmode: "text", "xml", "html", …
  - rettype: "gb", "gbc", "medline", "count", …
- step: maximum number of entries retrieved at a time
- Returns: String (the stripped, concatenated response bodies)
  # File lib/bio/io/ncbirest.rb, line 316
  def efetch(ids, hash = {}, step = 100)
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    opts = default_parameters.merge({ "retmode" => "text" })
    opts.update(hash)

    case ids
    when Array
      list = ids
    else
      list = ids.to_s.split(/\s*,\s*/)
    end

    result = ""
    0.step(list.size, step) do |i|
      opts["id"] = list[i, step].join(',')
      unless opts["id"].empty?
        response = ncbi_post_form(serv, opts)
        result += response.body
      end
    end
    return result.strip
    #return result.strip.split(/\n\n+/)
  end
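The batching step in efetch can be tried in isolation. The following is a minimal sketch, not the library's code: `batch_ids` is a hypothetical helper name, and it uses `Array#each_slice` in place of the `0.step` loop, which should produce the same non-empty batches.

```ruby
# Hypothetical helper (not part of BioRuby): normalizes the ids argument
# the same way efetch does, then groups the IDs into comma-joined batches
# of at most `step` entries.
def batch_ids(ids, step = 100)
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  list.each_slice(step).map { |slice| slice.join(",") }
end

# Each printed line corresponds to one "id" parameter, i.e. one request.
batch_ids("185041, J00231,AAA52805", 2).each { |batch| puts batch }
```

Each batch becomes one request, so `step` trades the number of requests against the size of each one.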
Lists the NCBI database names using the E-Utils (einfo) service.
pubmed protein nucleotide nuccore nucgss nucest structure genome books cancerchromosomes cdd gap domains gene genomeprj gensat geo gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc popset probe proteinclusters pcassay pccompound pcsubstance snp taxonomy toolkit unigene unists
Usage
  ncbi = Bio::NCBI::REST.new
  ncbi.einfo

  Bio::NCBI::REST.einfo
- Returns: array of strings (database names)
  # File lib/bio/io/ncbirest.rb, line 180
  def einfo
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
    opts = default_parameters.merge({})
    response = ncbi_post_form(serv, opts)
    result = response.body
    list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
    return list
  end
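To see what the `scan` and `flatten` calls in einfo do, the same pattern can be applied to a string offline. The XML below is an invented sample that only resembles an einfo response. The real response is assumed to carry more elements and many more names.

```ruby
# Invented sample resembling an einfo response (an assumption of this
# sketch, not captured output from NCBI).
sample = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# scan returns one single-element array per match because the pattern has
# one capture group; flatten turns that into a flat list of names.
db_names = sample.scan(/<DbName>(.*?)<\/DbName>/m).flatten
puts db_names.inspect
```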
Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.
For information on the possible arguments, see
- eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
- www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.section.pubmedhelp.Search_Field_Descrip
Usage
  ncbi = Bio::NCBI::REST.new
  ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
  ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
  ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

  Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
  Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
  Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})
Arguments:
- str: query string (required)
- hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
  - db: "sequences", "nucleotide", "protein", "pubmed", "taxonomy", …
  - retmode: "text", "xml", "html", …
  - rettype: "gb", "medline", "count", …
  - retmax: integer (default 100)
  - retstart: integer
  - field:
    - "titl": Title [TI]
    - "tiab": Title/Abstract [TIAB]
    - "word": Text words [TW]
    - "auth": Author [AU]
    - "affl": Affiliation [AD]
    - "jour": Journal [TA]
    - "vol": Volume [VI]
    - "iss": Issue [IP]
    - "page": First page [PG]
    - "pdat": Publication date [DP]
    - "ptyp": Publication type [PT]
    - "lang": Language [LA]
    - "mesh": MeSH term [MH]
    - "majr": MeSH major topic [MAJR]
    - "subh": MeSH subheadings [SH]
    - "mhda": MeSH date [MHDA]
    - "ecno": EC/RN Number [rn]
    - "si": Secondary source ID [SI]
    - "uid": PubMed ID (PMID) [UI]
    - "fltr": Filter [FILTER] [SB]
    - "subs": Subset [SB]
  - reldate: 365
  - mindate: 2001
  - maxdate: 2002/01/01
  - datetype: "edat"
- limit: maximum number of entries to be returned (0 for unlimited; nil to use the "retmax" value in the hash, or the internal default of 100 if none is given)
- step: maximum number of entries retrieved at a time
- Returns: array of entry IDs or a number of results
  # File lib/bio/io/ncbirest.rb, line 247
  def esearch(str, hash = {}, limit = nil, step = 10000)
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    opts = default_parameters.merge({ "term" => str })
    opts.update(hash)

    case opts["rettype"]
    when "count"
      count = esearch_count(str, opts)
      return count
    else
      retstart = 0
      retstart = hash["retstart"].to_i if hash["retstart"]

      limit ||= hash["retmax"].to_i if hash["retmax"]
      limit ||= 100 # default limit is 100
      limit = esearch_count(str, opts) if limit == 0   # unlimit

      list = []
      0.step(limit, step) do |i|
        retmax = [step, limit - i].min
        opts.update("retmax" => retmax, "retstart" => i + retstart)
        response = ncbi_post_form(serv, opts)
        result = response.body
        list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
      end
      return list
    end
  end
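The paging arithmetic in esearch (how `limit`, `step`, and `retstart` turn into per-request `retmax` and `retstart` values) can be checked without touching the network. This is a minimal sketch: `page_windows` is a hypothetical helper, not a BioRuby method. Unlike the loop in the listing, it skips the empty trailing window that arises when `limit` is an exact multiple of `step`.

```ruby
# Hypothetical helper (not part of BioRuby): lists the [retstart, retmax]
# pairs needed to fetch `limit` IDs in pages of at most `step` entries,
# starting from the offset `retstart`.
def page_windows(limit, step, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min
    # Assumption of this sketch: drop windows that would request 0 IDs.
    next if retmax <= 0
    windows << [i + retstart, retmax]
  end
  windows
end

page_windows(250, 100).each do |start, max|
  puts "retstart=#{start} retmax=#{max}"
end
```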
- Arguments: same as the esearch method
- Returns: the number of results (Integer)
  # File lib/bio/io/ncbirest.rb, line 278
  def esearch_count(str, hash = {})
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    opts = default_parameters.merge({ "term" => str })
    opts.update(hash)
    opts.update("rettype" => "count")
    response = ncbi_post_form(serv, opts)
    result = response.body
    count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
    return count
  end
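The count extraction relies on taking only the first `<Count>` match. The sketch below applies the same expression to an invented sample. The nested `<Count>` inside `<TranslationStack>` is an assumption about the response shape, included only to show why `first` matters.

```ruby
# Invented sample resembling an esearch response (an assumption of this
# sketch, not captured output from NCBI).
sample = <<~XML
  <eSearchResult>
    <Count>42</Count>
    <TranslationStack>
      <TermSet><Term>tardigrada[All Fields]</Term><Count>7</Count></TermSet>
    </TranslationStack>
  </eSearchResult>
XML

# Only the first <Count> is used; later, nested ones are ignored.
count = sample.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
puts count

# With no <Count> element at all, first is nil and nil.to_i is 0.
missing = "<eSearchResult/>".scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
puts missing
```

A result of 0 therefore cannot be told apart from a response that lacked a `<Count>` element.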
Private Instance Methods
(Private) Returns the default parameters.
- Returns: Hash of default parameters (Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS)
  # File lib/bio/io/ncbirest.rb, line 117
  def default_parameters
    Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS
  end
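The public methods combine these defaults with their own settings and the caller's hash using `merge` followed by `update`. The sketch below illustrates that precedence. `DEFAULTS` and `build_opts` are hypothetical stand-ins, and the contents of the real `Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS` may differ.

```ruby
# Hypothetical stand-in for Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS; the
# values here are invented for illustration.
DEFAULTS = { "tool" => "mytool", "email" => "me@example.org" }.freeze

# Mirrors the order used in efetch: shared defaults, then the
# method-level default, then the caller's hash, so the caller wins.
def build_opts(hash = {})
  opts = DEFAULTS.merge({ "retmode" => "text" }) # merge returns a new Hash
  opts.update(hash)
  opts
end

puts build_opts({ "db" => "protein", "retmode" => "xml" }).inspect
```

Because `merge` returns a new hash, the shared defaults are not modified by any single call.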
(Private) Sleeps until access is allowed.
Arguments:
- wait: wait unit time in seconds (defaults to NCBI_INTERVAL)
- Returns: (undefined)
  # File lib/bio/io/ncbirest.rb, line 100
  def ncbi_access_wait(wait = NCBI_INTERVAL)
    @@last_access_mutex ||= Mutex.new
    @@last_access_mutex.synchronize {
      if @@last_access
        duration = Time.now - @@last_access
        if wait > duration
          sleep wait - duration
        end
      end
      @@last_access = Time.now
    }
    nil
  end
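The same mutex-guarded wait can be written as a small standalone object. This is a sketch under assumptions: `Throttle` is a hypothetical class, not part of BioRuby. It keeps its state per instance instead of in class variables, and takes an injectable clock and sleeper so the logic can be exercised without real delays.

```ruby
# Hypothetical standalone throttle (not part of BioRuby). It remembers the
# time of the last access and sleeps for the rest of the interval.
class Throttle
  # clock and sleeper default to the real time source and Kernel#sleep.
  def initialize(interval, clock: -> { Time.now.to_f }, sleeper: ->(s) { sleep(s) })
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @mutex = Mutex.new
    @last_access = nil
  end

  # Blocks until at least `interval` seconds have passed since the
  # previous call, then records the current time.
  def wait
    @mutex.synchronize do
      if @last_access
        elapsed = @clock.call - @last_access
        @sleeper.call(@interval - elapsed) if @interval > elapsed
      end
      @last_access = @clock.call
    end
    nil
  end
end

throttle = Throttle.new(1.0 / 3)
3.times { throttle.wait } # the second and third calls pause briefly
```

Holding the mutex while sleeping, as the listing also does, makes concurrent callers queue up one interval apart.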
(Private) Checks the parameters that NCBI requires. Raises an error if the email or tool parameter is missing or empty.
NCBI announces that “Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.”
Arguments:
- (required) opts: Hash containing parameters
- Returns: (undefined)
  # File lib/bio/io/ncbirest.rb, line 148
  def ncbi_check_parameters(opts)
    #return if Time.now < Time.gm(2010,5,31)
    if opts['email'].to_s.empty? then
      raise 'Set email parameter for the query, or set Bio::NCBI.default_email = "(your email address)"'
    end
    if opts['tool'].to_s.empty? then
      raise 'Set tool parameter for the query, or set Bio::NCBI.default_tool = "(your tool name)"'
    end
    nil
  end
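The check treats a missing key and an empty string alike because `nil.to_s` is `""`. The following minimal sketch shows that behavior. `check_required!` is a hypothetical helper, and it raises `ArgumentError` where the listing raises a plain `RuntimeError`.

```ruby
# Hypothetical standalone version of the check (not part of BioRuby).
# nil.to_s is "", so a missing key and an empty value are both rejected.
def check_required!(opts, keys = %w[email tool])
  keys.each do |key|
    raise ArgumentError, "Set #{key} parameter for the query" if opts[key].to_s.empty?
  end
  nil
end

# Passes silently when both values are present.
check_required!({ "email" => "me@example.org", "tool" => "mytool" })

begin
  check_required!({ "email" => "me@example.org" }) # "tool" is missing
rescue ArgumentError => e
  puts e.message
end
```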
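(Private) Sends a query to NCBI after checking the required parameters and waiting for the access interval.
Arguments:
- (required) serv: (String) server URI string
- (required) opts: (Hash) parameters
- Returns: the response object returned by Bio::Command.post_form (callers read its body)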
  # File lib/bio/io/ncbirest.rb, line 127
  def ncbi_post_form(serv, opts)
    ncbi_check_parameters(opts)
    ncbi_access_wait
    #$stderr.puts opts.inspect
    response = Bio::Command.post_form(serv, opts)
    response
  end