class Bio::Locations
Description
The Bio::Locations class is a container for Bio::Location objects: creating a Bio::Locations object (based on a GenBank style position string) will spawn an array of Bio::Location objects.
Usage

    locations = Bio::Locations.new('join(complement(500..550), 600..625)')
    locations.each do |loc|
      puts "class = " + loc.class.to_s
      puts "range = #{loc.from}..#{loc.to} (strand = #{loc.strand})"
    end
    # Output would be:
    #   class = Bio::Location
    #   range = 500..550 (strand = -1)
    #   class = Bio::Location
    #   range = 600..625 (strand = 1)

    # For the following three location strings, print the span and range
    ['one-of(898,900)..983',
     'one-of(5971..6308,5971..6309)',
     '8050..one-of(10731,10758,10905,11242)'].each do |loc|
      location = Bio::Locations.new(loc)
      puts location.span
      puts location.range
    end
GenBank location descriptor classification
Definition of the position notation of the GenBank location format
According to the GenBank manual 'gbrel.txt', position notations were classified into 10 patterns - (A) to (J).
3.4.12.2 Feature Location

The second column of the feature descriptor line designates the location of the feature in the sequence. The location descriptor begins at position 22. Several conventions are used to indicate sequence location.

Base numbers in location descriptors refer to numbering in the entry, which is not necessarily the same as the numbering scheme used in the published report. The first base in the presented sequence is numbered base 1. Sequences are presented in the 5' to 3' direction.

Location descriptors can be one of the following:

(A) 1. A single base;
(B) 2. A contiguous span of bases;
(C) 3. A site between two bases;
(D) 4. A single base chosen from a range of bases;
(E) 5. A single base chosen from among two or more specified bases;
(F) 6. A joining of sequence spans;
(G) 7. A reference to an entry other than the one to which the feature belongs (i.e., a remote entry), followed by a location descriptor referring to the remote sequence;
(H) 8. A literal sequence (a string of bases enclosed in quotation marks).

Description commented with pattern IDs.

(C) A site between two residues, such as an endonuclease cleavage site, is indicated by listing the two bases separated by a caret (e.g., 23^24).

(D) A single residue chosen from a range of residues is indicated by the number of the first and last bases in the range separated by a single period (e.g., 23.79). (I) The symbols < and > indicate that the end point of the range is beyond the specified base number.

(B) A contiguous span of bases is indicated by the number of the first and last bases in the range separated by two periods (e.g., 23..79). (I) The symbols < and > indicate that the end point of the range is beyond the specified base number. Starting and ending positions can be indicated by base number or by one of the operators described below.

Operators are prefixes that specify what must be done to the indicated sequence to locate the feature. The following are the operators available, along with their most common format and a description.

(J) complement (location): The feature is complementary to the location indicated. Complementary strands are read 5' to 3'.

(F) join (location, location, .. location): The indicated elements should be placed end to end to form one contiguous sequence.

(F) order (location, location, .. location): The elements are found in the specified order in the 5' to 3' direction, but nothing is implied about the rationality of joining them.

(F) group (location, location, .. location): The elements are related and should be grouped together, but no order is implied.

(E) one-of (location, location, .. location): The element can be any one, but only one, of the items listed.
Reduction strategy of the position notations
Attributes
locations: (Array) An Array of Bio::Location objects.
operator: (Symbol or nil) Operator. nil (means :join), :order, or :group (obsolete).
Public Class Methods
Parses a GenBank style position string and returns a Bio::Locations object, which contains a list of Bio::Location objects.
locations = Bio::Locations.new('join(complement(500..550), 600..625)')
Arguments:
- (required) str: GenBank style position string
Returns: Bio::Locations object

    # File lib/bio/location.rb, line 346
    def initialize(position)
      @operator = nil
      if position.is_a? Array
        @locations = position
      else
        position = gbl_cleanup(position) # preprocessing
        @locations = gbl_pos2loc(position) # create an Array of Bio::Location objects
      end
    end
Public Instance Methods
Returns true if other is equal to self. Otherwise, returns false.
Arguments:
- (required) other: any object
Returns: true or false

    # File lib/bio/location.rb, line 381
    def ==(other)
      return true if super(other)
      return false unless other.instance_of?(self.class)
      if self.locations == other.locations and
          self.operator == other.operator then
        true
      else
        false
      end
    end
Returns the nth Bio::Location object.

    # File lib/bio/location.rb, line 400
    def [](n)
      @locations[n]
    end
Converts relative position in the locus to position in the whole of the DNA sequence.
This method can, for example, be used to relate positions in an RNA to those in the DNA sequence. With the optional :aa flag, the given position is treated as an amino acid position, and the absolute position of the first nucleotide of its codon is returned.
    loc = Bio::Locations.new('complement(12838..13533)')
    puts loc.absolute(10)       # => 13524
    puts loc.absolute(10, :aa)  # => 13506

Arguments:
- (required) position: nucleotide position within locus
- :aa: flag to be used if position is an amino acid position rather than a nucleotide position
Returns: position within the whole of the sequence

    # File lib/bio/location.rb, line 490
    def absolute(n, type = nil)
      case type
      when :location
        ;
      when :aa
        n = (n - 1) * 3 + 1
        rel2abs(n)
      else
        rel2abs(n)
      end
    end
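The :aa branch only rescales the input before the usual relative-to-absolute lookup. The following is a minimal sketch of that arithmetic for the single complement location used in the example; `codon_start`, `complement_rel2abs` and the two constants are hypothetical names for illustration, not part of BioRuby.

```ruby
# Hypothetical helper (not BioRuby API): relative position of the first
# nucleotide of the codon for amino acid number `aa` (1-based).
def codon_start(aa)
  (aa - 1) * 3 + 1
end

# Bounds of the single location 'complement(12838..13533)'.
LOCUS_FROM = 12838
LOCUS_TO   = 13533

# On the minus strand, relative position 1 is the last base of the range.
def complement_rel2abs(n)
  return nil if n < 1 || n > LOCUS_TO - LOCUS_FROM + 1
  LOCUS_TO - (n - 1)
end

puts codon_start(10)                      # => 28
puts complement_rel2abs(10)               # => 13524
puts complement_rel2abs(codon_start(10))  # => 13506
```

The sketch handles one segment only; the library walks all locations in turn.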
Iterates on each Bio::Location object.
    # File lib/bio/location.rb, line 393
    def each
      @locations.each do |x|
        yield(x)
      end
    end
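Evaluates equality of Bio::Locations objects by comparing their sorted locations. Returns true or false, or nil if other is not a Bio::Locations object.

    # File lib/bio/location.rb, line 364
    def equals?(other)
      if ! other.kind_of?(Bio::Locations)
        return nil
      end
      if self.sort == other.sort
        return true
      else
        return false
      end
    end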
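Because equals? compares sorted locations while == compares the locations arrays directly, the two can disagree when the same locations appear in a different order. A small sketch of that difference, using a stand-in Struct instead of Bio::Location; `EqLoc`, `same_order?` and `same_set?` are hypothetical names, and the ordering by from and then to is an assumption made for the sketch.

```ruby
# Stand-in for Bio::Location (assumption: ordered by from, then to).
EqLoc = Struct.new(:from, :to, :strand) do
  def <=>(other)
    [from, to] <=> [other.from, other.to]
  end
end

# Order-sensitive comparison, in the spirit of ==.
def same_order?(a, b)
  a == b
end

# Order-insensitive comparison, in the spirit of equals?.
def same_set?(a, b)
  a.sort == b.sort
end

FORWARD  = [EqLoc.new(500, 550, -1), EqLoc.new(600, 625, 1)].freeze
REVERSED = FORWARD.reverse.freeze

p same_order?(FORWARD, REVERSED)  # => false
p same_set?(FORWARD, REVERSED)    # => true
```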
Returns the first Bio::Location object.

    # File lib/bio/location.rb, line 405
    def first
      @locations.first
    end

Returns the last Bio::Location object.

    # File lib/bio/location.rb, line 410
    def last
      @locations.last
    end
Returns the length of the spliced RNA.

    # File lib/bio/location.rb, line 429
    def length
      len = 0
      @locations.each do |x|
        if x.sequence
          len += x.sequence.size
        else
          len += (x.to - x.from + 1)
        end
      end
      len
    end
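As a rough illustration of the summation, here is a sketch with a stand-in Struct; `LenLoc` and `spliced_length` are hypothetical names, not BioRuby API. Each location contributes its inclusive span, unless it carries a literal sequence, in which case the sequence size is used.

```ruby
# Stand-in for Bio::Location with only the fields the sum needs.
LenLoc = Struct.new(:from, :to, :sequence)

def spliced_length(locs)
  locs.sum { |x| x.sequence ? x.sequence.size : x.to - x.from + 1 }
end

exons = [LenLoc.new(500, 550), LenLoc.new(600, 625)]
puts spliced_length(exons)                                   # => 77
puts spliced_length(exons + [LenLoc.new(700, 700, 'atgc')])  # => 81
```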
Similar to span, but returns a Range object min..max
    # File lib/bio/location.rb, line 423
    def range
      min, max = span
      min..max
    end
Converts absolute position in the whole of the DNA sequence to relative position in the locus.
This method can for example be used to relate positions in a DNA-sequence with those in RNA. In this use, the optional ':aa'-flag returns the position of the associated amino-acid rather than the nucleotide.
    loc = Bio::Locations.new('complement(12838..13533)')
    puts loc.relative(13524)        # => 10
    puts loc.relative(13506, :aa)   # => 10

Arguments:
- (required) position: nucleotide position within whole of the sequence
- :aa: flag that lets method return position in amino acid coordinates
Returns: position within the location

    # File lib/bio/location.rb, line 458
    def relative(n, type = nil)
      case type
      when :location
        ;
      when :aa
        if n = abs2rel(n)
          (n - 1) / 3 + 1
        else
          nil
        end
      else
        abs2rel(n)
      end
    end
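With :aa, the relative nucleotide position is mapped to a residue number by integer division, so the three nucleotides of one codon share one value. A minimal sketch of that step for a single complement location; `complement_abs2rel` and `residue_number` are hypothetical names, not BioRuby API.

```ruby
# Relative position of absolute base `n` inside complement(from..to);
# nil when `n` lies outside the location.
def complement_abs2rel(n, from = 12838, to = 13533)
  return nil if n < from || n > to
  to - n + 1
end

# 1-based residue number for a 1-based relative nucleotide position.
# Integer division maps positions 1..3 to 1, 4..6 to 2, and so on.
def residue_number(rel)
  (rel - 1) / 3 + 1
end

rel = complement_abs2rel(13506)
puts rel                  # => 28
puts residue_number(rel)  # => 10
puts residue_number(27)   # => 9
```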
Returns an Array containing overall min and max position [min, max] of this Bio::Locations object.
    # File lib/bio/location.rb, line 416
    def span
      span_min = @locations.min { |a,b| a.from <=> b.from }
      span_max = @locations.max { |a,b| a.to <=> b.to }
      return span_min.from, span_max.to
    end
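The span ignores strand and the order of the locations: it is simply the smallest from and the largest to. A sketch with a stand-in Struct; `SpanLoc` and `overall_span` are hypothetical names, not BioRuby API.

```ruby
# Stand-in for Bio::Location with only the two coordinates.
SpanLoc = Struct.new(:from, :to)

def overall_span(locs)
  [locs.min_by(&:from).from, locs.max_by(&:to).to]
end

# Deliberately listed out of order.
locs = [SpanLoc.new(600, 625), SpanLoc.new(500, 550)]
min, max = overall_span(locs)
p [min, max]   # => [500, 625]
p(min..max)    # => 500..625
```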
String representation.
Note: In some cases, it cannot tell “complement(join(…))” from “join(complement(..))”, or “complement(order(…))” from “order(complement(..))”.
Returns: String

    # File lib/bio/location.rb, line 511
    def to_s
      return '' if @locations.empty?
      complement_join = false
      locs = @locations
      if locs.size >= 2 and locs.inject(true) do |flag, loc|
          # check if each location is complement
          (flag && (loc.strand == -1) && !loc.xref_id)
        end and locs.inject(locs[0].from) do |pos, loc|
          if pos then
            (pos >= loc.from) ? loc.from : false
          else
            false
          end
        end then
        locs = locs.reverse
        complement_join = true
      end
      locs = locs.collect do |loc|
        lt = loc.lt ? '<' : ''
        gt = loc.gt ? '>' : ''
        str = if loc.from == loc.to then
                "#{lt}#{gt}#{loc.from.to_i}"
              elsif loc.carat then
                "#{lt}#{loc.from.to_i}^#{gt}#{loc.to.to_i}"
              else
                "#{lt}#{loc.from.to_i}..#{gt}#{loc.to.to_i}"
              end
        if loc.xref_id and !loc.xref_id.empty? then
          str = "#{loc.xref_id}:#{str}"
        end
        if loc.strand == -1 and !complement_join then
          str = "complement(#{str})"
        end
        if loc.sequence then
          str = "replace(#{str},\"#{loc.sequence}\")"
        end
        str
      end
      if locs.size >= 2 then
        op = (self.operator || 'join').to_s
        result = "#{op}(#{locs.join(',')})"
      else
        result = locs[0]
      end
      if complement_join then
        result = "complement(#{result})"
      end
      result
    end
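The ambiguity comes from the heuristic at the top of to_s: when every location is on the minus strand and the from values never increase, the whole list is written as one complement around a join, whatever the input looked like. A simplified sketch of that decision follows; `StrLoc` and `sketch_to_s` are hypothetical names, and the sketch leaves out xref_id, the < and > flags, the caret form, replace() and operators other than join.

```ruby
# Stand-in for Bio::Location with coordinates and strand only.
StrLoc = Struct.new(:from, :to, :strand)

def sketch_to_s(locs)
  all_minus  = locs.all? { |l| l.strand == -1 }
  descending = locs.each_cons(2).all? { |a, b| a.from >= b.from }
  if locs.size >= 2 && all_minus && descending
    # Written as a single complement around a join, in ascending order.
    inner = locs.reverse.map { |l| "#{l.from}..#{l.to}" }
    "complement(join(#{inner.join(',')}))"
  else
    parts = locs.map do |l|
      s = "#{l.from}..#{l.to}"
      l.strand == -1 ? "complement(#{s})" : s
    end
    parts.size >= 2 ? "join(#{parts.join(',')})" : parts.first.to_s
  end
end

# The same two minus-strand locations could have come from either
# 'complement(join(1..10,20..30))' or
# 'join(complement(20..30),complement(1..10))'.
minus = [StrLoc.new(20, 30, -1), StrLoc.new(1, 10, -1)]
puts sketch_to_s(minus)
# => complement(join(1..10,20..30))

mixed = [StrLoc.new(500, 550, -1), StrLoc.new(600, 625, 1)]
puts sketch_to_s(mixed)
# => join(complement(500..550),600..625)
```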
Private Instance Methods
Convert the absolute position to the relative position
    # File lib/bio/location.rb, line 684
    def abs2rel(n)
      return nil unless n > 0 # out of range

      cursor = 0
      @locations.each do |x|
        if x.sequence
          len = x.sequence.size
        else
          len = x.to - x.from + 1
        end
        if n < x.from or n > x.to then
          cursor += len
        else
          if x.strand < 0 then
            return x.to - (n - cursor - 1)
          else
            return n + cursor + 1 - x.from
          end
        end
      end
      return nil # out of range
    end
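To make the cursor bookkeeping easier to follow, here is a standalone sketch of the same walk over two segments, equivalent to 'join(complement(500..550), 600..625)'. `AbsSeg` and `sketch_abs2rel` are hypothetical names, not BioRuby API, and locations carrying a literal sequence are left out.

```ruby
# Stand-in for Bio::Location.
AbsSeg = Struct.new(:from, :to, :strand)

def sketch_abs2rel(segs, n)
  return nil unless n > 0
  cursor = 0 # bases contributed by the segments already skipped
  segs.each do |x|
    len = x.to - x.from + 1
    if n < x.from || n > x.to
      cursor += len
    elsif x.strand < 0
      return x.to - (n - cursor - 1)
    else
      return n + cursor + 1 - x.from
    end
  end
  nil # n is not covered by any segment
end

segs = [AbsSeg.new(500, 550, -1), AbsSeg.new(600, 625, 1)]
p sketch_abs2rel(segs, 550)  # => 1
p sketch_abs2rel(segs, 500)  # => 51
p sketch_abs2rel(segs, 600)  # => 52
p sketch_abs2rel(segs, 625)  # => 77
p sketch_abs2rel(segs, 575)  # => nil
```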
Preprocessing to clean up the position notation.
    # File lib/bio/location.rb, line 565
    def gbl_cleanup(position)
      # sometimes position contains white spaces...
      position = position.gsub(/\s+/, '')

      # select one base # (D) n.m
      # .. n m :
      # <match> $1 ( $2 $3 not )
      position.gsub!(/(\.{2})?\(?([<>\d]+)\.([<>\d]+)(?!:)\)?/) do |match|
        if $1
          $1 + $3 # ..(n.m) => ..m
        else
          $2 # (?n.m)? => n
        end
      end

      # select the 1st location # (E) one-of()
      # <match> .. one-of ($2 ,$3 )
      position.gsub!(/(\.{2})?one-of\(([^,]+),([^)]+)\)/) do |match|
        if $1
          $1 + $3.gsub(/.*,(.*)/, '\1') # ..one-of(n,m) => ..m
        else
          $2 # one-of(n,m) => n
        end
      end

      ## substitute order(), group() by join() # (F) group(), order()
      #position.gsub!(/(order|group)/, 'join')

      return position
    end
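The two substitutions can be tried outside the library. The sketch below copies the regular expressions from the listing above into a standalone method; `cleanup_sketch` is a hypothetical name, and the method is only an illustration of how patterns (D) and (E) are reduced to plain positions.

```ruby
def cleanup_sketch(position)
  position = position.gsub(/\s+/, '')

  # (D) n.m: keep the first base at the start of a range,
  # the last base at the end of a range.
  position = position.gsub(/(\.{2})?\(?([<>\d]+)\.([<>\d]+)(?!:)\)?/) do
    $1 ? $1 + $3 : $2
  end

  # (E) one-of(): keep the first item at the start of a range,
  # the last item at the end of a range.
  position.gsub(/(\.{2})?one-of\(([^,]+),([^)]+)\)/) do
    $1 ? $1 + $3.gsub(/.*,(.*)/, '\1') : $2
  end
end

puts cleanup_sketch('(23.45)..600')                          # => 23..600
puts cleanup_sketch('145..(155.167)')                        # => 145..167
puts cleanup_sketch('one-of(898,900)..983')                  # => 898..983
puts cleanup_sketch('8050..one-of(10731,10758,10905,11242)') # => 8050..11242
puts cleanup_sketch('join(complement(500..550), 600..625)')
# => join(complement(500..550),600..625)
```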
Parse position notation and create Location objects.
    # File lib/bio/location.rb, line 598
    def gbl_pos2loc(position)
      ary = []

      case position

      when /^(join|order|group)\((.*)\)$/ # (F) join()
        if $1 != "join" then
          @operator = $1.intern
        end
        position = $2

        join_list = [] # sub positions to join
        bracket = [] # position with bracket
        s_count = 0 # stack counter

        position.split(',').each do |sub_pos|
          case sub_pos
          when /\(.*\)/
            join_list << sub_pos
          when /\(/
            s_count += 1
            bracket << sub_pos
          when /\)/
            s_count -= 1
            bracket << sub_pos
            if s_count == 0
              join_list << bracket.join(',')
            end
          else
            if s_count == 0
              join_list << sub_pos
            else
              bracket << sub_pos
            end
          end
        end

        join_list.each do |pos|
          ary << gbl_pos2loc(pos)
        end

      when /^complement\((.*)\)$/ # (J) complement()
        position = $1
        gbl_pos2loc(position).reverse_each do |location|
          ary << location.complement
        end

      when /^replace\(([^,]+),"?([^"]*)"?\)/ # (K) replace()
        position = $1
        sequence = $2
        ary << gbl_pos2loc(position).first.replace(sequence)

      else # (A, B, C, G, H, I)
        ary << Location.new(position)
      end

      return ary.flatten
    end
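The core difficulty in the join branch is splitting on commas without breaking nested operators apart. The sketch below shows one way to do that with a character-level depth counter; this is a different, simplified technique from the split-and-regroup loop in the listing, and `split_top_level` is a hypothetical name, not BioRuby API.

```ruby
# Split on commas that are not enclosed in parentheses.
def split_top_level(str)
  parts = []
  depth = 0
  current = +''
  str.each_char do |ch|
    case ch
    when '(' then depth += 1
    when ')' then depth -= 1
    end
    if ch == ',' && depth.zero?
      parts << current
      current = +''
    else
      current << ch
    end
  end
  parts << current unless current.empty?
  parts
end

p split_top_level('complement(500..550),600..625')
# => ["complement(500..550)", "600..625"]
p split_top_level('complement(join(1..10,20..30)),40..50')
# => ["complement(join(1..10,20..30))", "40..50"]
```

Each resulting piece would then be parsed recursively, as gbl_pos2loc does with the entries of join_list.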
Convert the relative position to the absolute position
    # File lib/bio/location.rb, line 660
    def rel2abs(n)
      return nil unless n > 0 # out of range

      cursor = 0
      @locations.each do |x|
        if x.sequence
          len = x.sequence.size
        else
          len = x.to - x.from + 1
        end
        if n > cursor + len
          cursor += len
        else
          if x.strand < 0
            return x.to - (n - cursor - 1)
          else
            return x.from + (n - cursor - 1)
          end
        end
      end
      return nil # out of range
    end
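A standalone sketch of the same walk may help when checking positions by hand. It uses two segments equivalent to 'join(complement(500..550), 600..625)'; `RelSeg` and `sketch_rel2abs` are hypothetical names, not BioRuby API, and locations carrying a literal sequence are left out.

```ruby
# Stand-in for Bio::Location.
RelSeg = Struct.new(:from, :to, :strand)

def sketch_rel2abs(segs, n)
  return nil unless n > 0
  cursor = 0 # relative positions consumed by earlier segments
  segs.each do |x|
    len = x.to - x.from + 1
    if n > cursor + len
      cursor += len
    else
      offset = n - cursor - 1
      # Minus-strand segments are counted from their far end.
      return x.strand < 0 ? x.to - offset : x.from + offset
    end
  end
  nil # n is longer than the spliced length
end

segs = [RelSeg.new(500, 550, -1), RelSeg.new(600, 625, 1)]
p sketch_rel2abs(segs, 1)   # => 550
p sketch_rel2abs(segs, 51)  # => 500
p sketch_rel2abs(segs, 52)  # => 600
p sketch_rel2abs(segs, 77)  # => 625
p sketch_rel2abs(segs, 78)  # => nil
```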