Class Bio::Tree
In: lib/bio/db/newick.rb, lib/bio/tree.rb
Parent: Object

This class stores a phylogenetic tree.

Internally, it is based on the Bio::Pathway class. However, users cannot access the underlying Bio::Pathway object directly.

This is an alpha version. Incompatible changes may be made frequently.

Classes and Modules

Class Bio::Tree::Edge
Class Bio::Tree::NoPathError
Class Bio::Tree::Node

Constants

DEFAULT_OPTIONS = { :indent => ' ' }   default options

Attributes

options  [RW]  tree options; mainly used for tree output
root  [RW]  root node of this tree (used by some methods even when the tree is unrooted)

Public Class methods

Creates a new phylogenetic tree. When no arguments are given, it creates a new empty tree. When a Tree object is given, it copies the tree. Note that the new tree shares Node and Edge objects with the given tree.

[Source]

     # File lib/bio/tree.rb, line 259
259:     def initialize(tree = nil)
260:       # creates an undirected adjacency list graph
261:       @pathway = Bio::Pathway.new([], true)
262:       @root = nil
263:       @options = {}
264:       self.concat(tree) if tree
265:     end
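
The sharing of Node and Edge objects described above can be illustrated without BioRuby. The following is a minimal sketch that uses plain Ruby hashes as a stand-in for the tree's adjacency list; `sketch_node` is a hypothetical class made up for this example and is not Bio::Tree::Node.

```ruby
# Hypothetical node class for this sketch only; it is not Bio::Tree::Node.
sketch_node = Class.new do
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

a = sketch_node.new('A')
b = sketch_node.new('B')

# Undirected adjacency list: node => { neighbor => edge }
original = { a => { b => 1.5 }, b => { a => 1.5 } }

# Copy the adjacency structure, but reuse the very same node objects as keys.
copy = {}
original.each { |node, adjacent| copy[node] = adjacent.dup }

# Removing the edge from the copy leaves the original structure untouched...
copy[a].delete(b)
copy[b].delete(a)

# ...but renaming a node through the copy is visible in the original,
# because both structures refer to the same object.
copy.keys.first.name = 'A (renamed)'
```

If independent node objects are needed, they would have to be duplicated explicitly before being added to the new tree.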

Public Instance methods

Adds a new edge to the tree. Returns the newly added edge. If the edge already exists, it is overwritten with the new one.

[Source]

     # File lib/bio/tree.rb, line 368
368:     def add_edge(source, target, edge = Edge.new)
369:       @pathway.append(Bio::Relation.new(source, target, edge))
370:       edge
371:     end

Adds a node to the tree. Returns self. If the node already exists, it does nothing.

[Source]

     # File lib/bio/tree.rb, line 389
389:     def add_node(node)
390:       @pathway.graph[node] ||= {}
391:       self
392:     end

Shows the adjacency matrix representation of the tree. The matrix covers only the given nodes. If nodes is nil or is omitted, it acts the same as tree.adjacency_matrix(tree.nodes). If a block is given, it yields source, target, and edge for each edge and stores the value returned by the block. Without a block, it stores the edge itself. Returns a matrix object.

[Source]

     # File lib/bio/tree.rb, line 760
760:     def adjacency_matrix(nodes = nil,
761:                          default_value = nil,
762:                          diagonal_value = nil) #:yields: source, target, edge
763:       nodes ||= self.nodes
764:       size = nodes.size
765:       hash = {}
766:       nodes.each_with_index { |x, i| hash[x] = i }
767:       # prepares an matrix
768:       matrix = Array.new(size, nil)
769:       matrix.collect! { |x| Array.new(size, default_value) }
770:       (0...size).each { |i| matrix[i][i] = diagonal_value }
771:       # fills the matrix from each edge
772:       self.each_edge do |source, target, edge|
773:         i_source = hash[source]
774:         i_target = hash[target]
775:         if i_source and i_target then
776:           val = block_given? ? (yield source, target, edge) : edge
777:           matrix[i_source][i_target] = val
778:           matrix[i_target][i_source] = val
779:         end
780:       end
781:       Matrix.rows(matrix, false)
782:     end
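
The filling logic can be tried out on plain data. The sketch below assumes an edge list of `[source, target, edge]` triples and returns nested arrays instead of a Matrix object; `adjacency_rows` is a hypothetical helper name, not part of Bio::Tree.

```ruby
# Edge list of an undirected tree: [source, target, edge value]
edges = [[:root, :a, 0.1], [:root, :b, 0.2], [:b, :c, 0.3]]
nodes = [:root, :a, :b, :c]

# Hypothetical helper mirroring the steps of adjacency_matrix.
def adjacency_rows(nodes, edges, default_value = nil, diagonal_value = nil)
  index = {}
  nodes.each_with_index { |node, i| index[node] = i }
  size = nodes.size
  rows = Array.new(size) { Array.new(size, default_value) }
  size.times { |i| rows[i][i] = diagonal_value }
  edges.each do |source, target, edge|
    i = index[source]
    j = index[target]
    next unless i && j # skip edges touching nodes outside the list
    value = block_given? ? yield(source, target, edge) : edge
    rows[i][j] = value
    rows[j][i] = value # undirected: mirror across the diagonal
  end
  rows
end

# Store 1 for every edge, 0 elsewhere (a 0/1 adjacency matrix).
rows = adjacency_rows(nodes, edges, 0, 0) { |_source, _target, _edge| 1 }
# rows == [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
```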

Returns an array of adjacent nodes of the given node.

[Source]

     # File lib/bio/tree.rb, line 319
319:     def adjacent_nodes(node)
320:       h = @pathway.graph[node]
321:       h ? h.keys : []
322:     end

Gets all ancestral nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 695
695:     def ancestors(node, root = nil)
696:       root ||= @root
697:       (self.path(root, node) - [ node ]).reverse
698:     end

Gets the adjacent children nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 638
638:     def children(node, root = nil)
639:       root ||= @root
640:       path = self.path(root, node)
641:       result = self.adjacent_nodes(node)
642:       result -= path
643:       result
644:     end
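
Both ancestors and children are derived from the path between the root and the node. The following stand-alone sketch shows the idea on a plain adjacency hash; the helper names (`bfs_path`, `ancestors_of`, `children_of`, `parent_of`) are hypothetical, and the breadth-first search is a simplified substitute for the Bio::Pathway search that assumes a connected tree.

```ruby
# Undirected adjacency list for the tree
#       root
#      /    \
#     x      c
#    / \
#   a   b
graph = {
  root: [:x, :c],
  x:    [:root, :a, :b],
  a:    [:x],
  b:    [:x],
  c:    [:root]
}

# Breadth-first search for the path from start to goal (both inclusive).
# Assumes that the two nodes are connected.
def bfs_path(graph, start, goal)
  previous = { start => nil }
  queue = [start]
  until queue.empty?
    node = queue.shift
    break if node == goal
    graph[node].each do |adjacent|
      next if previous.key?(adjacent)
      previous[adjacent] = node
      queue << adjacent
    end
  end
  result = []
  node = goal
  while node
    result.unshift(node)
    node = previous[node]
  end
  result
end

# Ancestors: the path from the root without the node itself, nearest first.
def ancestors_of(graph, node, root)
  (bfs_path(graph, root, node) - [node]).reverse
end

# Children: adjacent nodes that do not lie on the path from the root.
def children_of(graph, node, root)
  graph[node] - bfs_path(graph, root, node)
end

# Parent: the second-to-last node on the path from the root.
def parent_of(graph, node, root)
  bfs_path(graph, root, node)[-2]
end

ancestors_of(graph, :a, :root) # [:x, :root]
children_of(graph, :x, :root)  # [:a, :b]
parent_of(graph, :a, :root)    # :x
```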

Clears all nodes and edges. Returns self. Note that options and root are also cleared.

[Source]

     # File lib/bio/tree.rb, line 277
277:     def clear
278:       initialize
279:       self
280:     end

Removes all edges connected with the node. Returns self. If the node does not exist, raises IndexError.

[Source]

     # File lib/bio/tree.rb, line 403
403:     def clear_node(node)
404:       unless self.include?(node)
405:         raise IndexError, 'the node does not exist'
406:       end
407:       @pathway.relations.delete_if do |rel|
408:         rel.node.include?(node)
409:       end
410:       @pathway.graph[node].each_key do |k|
411:         @pathway.graph[k].delete(node)
412:       end
413:       @pathway.graph[node].clear
414:       self
415:     end

Replaces each edge by each block's return value. Returns self.

[Source]

     # File lib/bio/tree.rb, line 507
507:     def collect_edge! #:yields: source, target, edge
508:       @pathway.relations.each do |rel|
509:         newedge = yield rel.node[0], rel.node[1], rel.relation
510:         rel.relation = newedge
511:         @pathway.append(rel, false)
512:       end
513:       self
514:     end

Replaces each node by each block's return value. Returns self.

[Source]

     # File lib/bio/tree.rb, line 487
487:     def collect_node! #:yields: node
488:       tr = {}
489:       self.each_node do |node|
490:         tr[node] = yield node
491:       end
492:       # replaces nodes in @pathway.relations
493:       @pathway.relations.each do |rel|
494:         rel.node.collect! { |node| tr[node] }
495:       end
496:       # re-generates @pathway from relations
497:       @pathway.to_list
498:       # adds orphan nodes
499:       tr.each_value do |newnode|
500:         @pathway.graph[newnode] ||= {}
501:       end
502:       self
503:     end

Concatenates the other tree. If the same edge exists, the edge in other is used. Returns self. The result is unspecified if other isn't a Tree object. Note that the Node and Edge objects in the other tree are shared with the concatenated tree.

[Source]

     # File lib/bio/tree.rb, line 574
574:     def concat(other)
575:       #raise TypeError unless other.kind_of?(self.class)
576:       other.each_node do |node|
577:         self.add_node(node)
578:       end
579:       other.each_edge do |node1, node2, edge|
580:         self.add_edge(node1, node2, edge)
581:       end
582:       self
583:     end

Gets all descendent nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 650
650:     def descendents(node, root = nil)
651:       root ||= @root
652:       distance, route = @pathway.breadth_first_search(root)
653:       d = distance[node]
654:       result = []
655:       distance.each do |key, val|
656:         if val > d then
657:           x = key
658:           while x = route[x]
659:             if x == node then
660:               result << key
661:               break
662:             end
663:             break if distance[x] <= d
664:           end
665:         end
666:       end
667:       result
668:     end
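
The listing relies on two tables from a breadth-first search: the hop count of every node from the root and each node's predecessor. The sketch below reproduces that idea on a plain adjacency hash. `breadth_first_search` and `descendents_of` are hypothetical stand-ins written for this example and make no claim about the actual Bio::Pathway implementation.

```ruby
graph = {
  root: [:x, :c],
  x:    [:root, :a, :b],
  a:    [:x],
  b:    [:x],
  c:    [:root]
}

# Returns [distance, route]: hop counts from root and each node's predecessor.
def breadth_first_search(graph, root)
  distance = { root => 0 }
  route = {}
  queue = [root]
  until queue.empty?
    node = queue.shift
    graph[node].each do |adjacent|
      next if distance.key?(adjacent)
      distance[adjacent] = distance[node] + 1
      route[adjacent] = node
      queue << adjacent
    end
  end
  [distance, route]
end

def descendents_of(graph, node, root)
  distance, route = breadth_first_search(graph, root)
  depth = distance[node]
  distance.keys.select do |candidate|
    # Only nodes deeper than `node` can be its descendents.
    next false unless distance[candidate] > depth
    # Walk towards the root until `node` is reached or its depth is passed.
    x = candidate
    x = route[x] while x != node && distance[x] > depth
    x == node
  end
end

descendents_of(graph, :x, :root) # [:a, :b]
```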

Returns the distance between node1 and node2. Raises an error if the edges do not contain distance values. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 617
617:     def distance(node1, node2)
618:       distance = 0
619:       self.each_edge_in_path(node1, node2) do |source, target, edge|
620:         distance += get_edge_distance(edge)
621:       end
622:       distance
623:     end

Calculates the distance matrix of the given nodes. If nodes is nil or is omitted, it acts the same as tree.distance_matrix(tree.leaves). Returns a matrix object. The result is unspecified for cyclic trees. Note 1: The diagonal values of the matrix are 0. Note 2: If a distance cannot be calculated, nil is set.

[Source]

     # File lib/bio/tree.rb, line 731
731:     def distance_matrix(nodes = nil)
732:       nodes ||= self.leaves
733:       matrix = []
734:       nodes.each_index do |i|
735:         row = []
736:         nodes.each_index do |j|
737:           if i == j then
738:             distance = 0
739:           elsif r = matrix[j] and val = r[i] then
740:             distance = val
741:           else
742:             distance = (self.distance(nodes[i], nodes[j]) rescue nil)
743:           end
744:           row << distance
745:         end
746:         matrix << row
747:       end
748:       Matrix.rows(matrix, false)
749:     end
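
The matrix is symmetric, so each pairwise distance only needs to be computed once and can then be reused for the mirrored cell. The sketch below illustrates this with nested arrays and a weighted adjacency hash. `tree_distance` and `distance_rows` are hypothetical helpers, and the depth-first distance search is a simplification chosen for brevity rather than the path-based method used by Bio::Tree.

```ruby
# Weighted undirected adjacency list: node => { neighbor => branch length }
graph = {
  root: { x: 1.0, c: 4.0 },
  x:    { root: 1.0, a: 2.0, b: 3.0 },
  a:    { x: 2.0 },
  b:    { x: 3.0 },
  c:    { root: 4.0 }
}

# Sum of branch lengths on the unique path between two nodes of a tree,
# or nil when the nodes are not connected.
def tree_distance(graph, from, to, visited = {})
  return 0 if from == to
  visited[from] = true
  graph[from].each do |adjacent, length|
    next if visited[adjacent]
    rest = tree_distance(graph, adjacent, to, visited)
    return length + rest if rest
  end
  nil
end

def distance_rows(graph, nodes)
  rows = []
  nodes.each_index do |i|
    row = nodes.each_index.map do |j|
      if i == j
        0
      elsif j < i
        rows[j][i] # already computed; the matrix is symmetric
      else
        tree_distance(graph, nodes[i], nodes[j])
      end
    end
    rows << row
  end
  rows
end

rows = distance_rows(graph, [:a, :b, :c])
# rows == [[0, 5.0, 7.0], [5.0, 0, 8.0], [7.0, 8.0, 0]]
```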

Iterates over each edge of this tree. Returns self.

[Source]

     # File lib/bio/tree.rb, line 299
299:     def each_edge #:yields: source, target, edge
300:       @pathway.relations.each do |rel|
301:         yield rel.node[0], rel.node[1], rel.relation
302:       end
303:       self
304:     end

Iterates over each edge from node1 to node2. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 603
603:     def each_edge_in_path(node1, node2)
604:       path = self.path(node1, node2)
605:       source = path.shift
606:       path.each do |target|
607:         edge = self.get_edge(source, target)
608:         yield source, target, edge
609:         source = target
610:       end
611:       self
612:     end

Iterates over each node of this tree.

[Source]

     # File lib/bio/tree.rb, line 293
293:     def each_node(&x) #:yields: node
294:       @pathway.graph.each_key(&x)
295:       self
296:     end

Iterates over each edge connected to the given node. Returns self.

The method name "each_out_edge" comes from the Boost Graph Library.

[Source]

     # File lib/bio/tree.rb, line 343
343:     def each_out_edge(source) #:yields: source, target, edge
344:       h = @pathway.graph[source]
345:       h.each { |key, val| yield source, key, val } if h
346:       self
347:     end

Returns all edges as an array of [ node0, node1, edge ].

[Source]

     # File lib/bio/tree.rb, line 307
307:     def edges
308:       @pathway.relations.collect do |rel|
309:         [ rel.node[0], rel.node[1], rel.relation ]
310:       end
311:     end

Returns an edge from source to target. If source and target are not adjacent nodes, returns nil.

[Source]

     # File lib/bio/tree.rb, line 360
360:     def get_edge(source, target)
361:       h = @pathway.graph[source]
362:       h ? h[target] : nil
363:     end

Gets the distance value from the given edge. Returns a float, any other numeric value, or nil. If the edge has no distance method, the edge itself is returned as the distance.

[Source]

     # File lib/bio/tree.rb, line 101
101:     def get_edge_distance(edge)
102:       begin
103:         dist = edge.distance
104:       rescue NoMethodError
105:         dist = edge
106:       end
107:       dist
108:     end
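
This fallback means that plain numbers can serve as edges. A minimal sketch of the same duck-typing pattern follows; `sketch_edge` is a hypothetical Struct standing in for an edge class with a distance attribute and is not Bio::Tree::Edge.

```ruby
# Hypothetical edge class exposing #distance; not Bio::Tree::Edge.
sketch_edge = Struct.new(:distance)

# Ask the edge for its distance; fall back to the object itself when it
# does not respond to #distance (for example a Float, or nil).
def edge_distance(edge)
  edge.distance
rescue NoMethodError
  edge
end

edge_distance(sketch_edge.new(0.25)) # 0.25
edge_distance(0.5)                   # 0.5
edge_distance(nil)                   # nil
```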

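Gets the distance string from the given edge. Returns a string or nil. If the edge has no distance_string method, edge.to_s is returned (or nil when the edge is nil).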

[Source]

     # File lib/bio/tree.rb, line 112
112:     def get_edge_distance_string(edge)
113:       begin
114:         dist = edge.distance_string
115:       rescue NoMethodError
116:         dist = (edge ? edge.to_s : nil)
117:       end
118:       dist
119:     end

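Returns a new Edge whose distance is the sum of the distances of edge1 and edge2. If only one of the edges has a distance, that distance is used. If neither has a distance, an Edge without a distance is returned.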

[Source]

     # File lib/bio/tree.rb, line 122
122:     def get_edge_merged(edge1, edge2)
123:       dist1 = get_edge_distance(edge1)
124:       dist2 = get_edge_distance(edge2)
125:       if dist1 and dist2 then
126:         Edge.new(dist1 + dist2)
127:       elsif dist1 then
128:         Edge.new(dist1)
129:       elsif dist2 then
130:         Edge.new(dist2)
131:       else
132:         Edge.new
133:       end
134:     end

Gets the bootstrap value of the given node. Returns nil if the node has no bootstrap method.

[Source]

     # File lib/bio/tree.rb, line 238
238:     def get_node_bootstrap(node)
239:       begin
240:         node.bootstrap
241:       rescue NoMethodError
242:         nil
243:       end
244:     end

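Gets the bootstrap string of the given node. Returns nil if the node has no bootstrap_string method.

[Source]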

     # File lib/bio/tree.rb, line 246
246:     def get_node_bootstrap_string(node)
247:       begin
248:         node.bootstrap_string
249:       rescue NoMethodError
250:         nil
251:       end
252:     end

Finds a node in the tree by the given name and returns the node. If the node is not found, returns nil. If multiple nodes with the same name exist, one of them is returned (which one is unspecified).

[Source]

     # File lib/bio/tree.rb, line 377
377:     def get_node_by_name(str)
378:       self.each_node do |node|
379:         if get_node_name(node) == str
380:           return node
381:         end
382:       end
383:       nil
384:     end

Gets the name of the node. If the node has no name method, node.to_s is returned.

[Source]

     # File lib/bio/tree.rb, line 230
230:     def get_node_name(node)
231:       begin
232:         node.name
233:       rescue NoMethodError
234:         node.to_s
235:       end
236:     end

If the node exists, returns true. Otherwise, returns false.

[Source]

     # File lib/bio/tree.rb, line 396
396:     def include?(node)
397:       @pathway.graph[node] ? true : false
398:     end

Inserts a new node between adjacent nodes node1 and node2. The old edge between node1 and node2 is changed to the edge between new_node and node2. The edge between node1 and new_node is newly created.

If new_distance is specified, the distance between node1 and new_node is set to new_distance, and distance between new_node and node2 is set to tree.get_edge(node1, node2).distance - new_distance.

Returns self. If node1 and node2 are not adjacent, raises IndexError.

If new_node already exists in the tree, the tree becomes cyclic. In addition, if an edge between new_node and node1 (or node2) already exists, it is overwritten.

[Source]

     # File lib/bio/tree.rb, line 827
827:     def insert_node(node1, node2, new_node, new_distance = nil)
828:       unless edge = self.get_edge(node1, node2) then
829:         raise IndexError, 'nodes not found or two nodes are not adjacent'
830:       end
831:       new_edge = Edge.new(new_distance)
832:       self.remove_edge(node1, node2)
833:       self.add_edge(node1, new_node, new_edge)
834:       if new_distance and old_distance = get_edge_distance(edge) then
835:         old_distance -= new_distance
836:         begin
837:           edge.distance = old_distance
838:         rescue NoMethodError
839:           edge = old_distance
840:         end
841:       end
842:       self.add_edge(new_node, node2, edge)
843:       self
844:     end
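
The way the old distance is split can be checked on plain data. The sketch below assumes a weighted adjacency hash in which the edge values are the distances themselves; `insert_node_sketch` is a hypothetical helper written for this illustration.

```ruby
# Weighted undirected adjacency list with one edge a--b of length 1.0
graph = { a: { b: 1.0 }, b: { a: 1.0 } }

def insert_node_sketch(graph, node1, node2, new_node, new_distance = nil)
  unless graph[node1] && graph[node1].key?(node2)
    raise IndexError, 'nodes not found or two nodes are not adjacent'
  end
  # Remove the old edge in both directions and remember its distance.
  old_distance = graph[node1].delete(node2)
  graph[node2].delete(node1)
  # The remaining part keeps whatever is left of the old distance.
  rest = (new_distance && old_distance) ? old_distance - new_distance : old_distance
  graph[new_node] ||= {}
  graph[node1][new_node] = graph[new_node][node1] = new_distance
  graph[new_node][node2] = graph[node2][new_node] = rest
  graph
end

insert_node_sketch(graph, :a, :b, :m, 0.25)
# graph == { a: { m: 0.25 }, b: { m: 0.75 }, m: { a: 0.25, b: 0.75 } }
```

After the call, a and b are no longer adjacent, so a second call with the same node pair raises IndexError.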

If node is nil, returns an array of all leaves (nodes connected with one edge). Otherwise, gets all descendent leaf nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 676
676:     def leaves(node = nil, root = nil)
677:       unless node then
678:         nodes = []
679:         self.each_node do |x|
680:           nodes << x if self.out_degree(x) == 1
681:         end
682:         return nodes
683:       else
684:         root ||= @root
685:         self.descendents(node, root).find_all do |x|
686:           self.adjacent_nodes(x).size == 1
687:         end
688:       end
689:     end

Gets the lowest common ancestor of the two nodes. If root isn't specified or root is nil, @root is used. Returns a Node object or nil. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 704
704:     def lowest_common_ancestor(node1, node2, root = nil)
705:       root ||= @root
706:       distance, route = @pathway.breadth_first_search(root)
707:       x = node1; r1 = []
708:       begin; r1 << x; end while x = route[x]
709:       x = node2; r2 = []
710:       begin; r2 << x; end while x = route[x]
711:       return (r1 & r2).first
712:     end
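
The listing collects the chain of predecessors from each node up to the root and returns the first node common to both chains. A minimal sketch of that step follows, assuming the predecessor table (`route`) has already been produced by a breadth-first search from the root; `lca_sketch` is a hypothetical name.

```ruby
# Predecessor table for the tree root -> (x -> (a, b), c)
route = { x: :root, c: :root, a: :x, b: :x }

def lca_sketch(route, node1, node2)
  # Chain of nodes from the given node up to the root, nearest first.
  chain = lambda do |node|
    result = []
    while node
      result << node
      node = route[node]
    end
    result
  end
  # Array#& keeps the order of the first array, so the first common
  # element is the common ancestor closest to node1.
  (chain.call(node1) & chain.call(node2)).first
end

lca_sketch(route, :a, :b) # :x
lca_sketch(route, :a, :c) # :root
```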

newick(options = {})

Alias for output_newick

Returns all nodes as an array.

[Source]

     # File lib/bio/tree.rb, line 283
283:     def nodes
284:       @pathway.graph.keys
285:     end

Returns number of edges in the tree.

[Source]

     # File lib/bio/tree.rb, line 314
314:     def number_of_edges
315:       @pathway.relations.size
316:     end

Returns number of nodes in the tree.

[Source]

     # File lib/bio/tree.rb, line 288
288:     def number_of_nodes
289:       @pathway.nodes
290:     end

Returns the number of edges connected to the given node.

The method name "out_degree" comes from the Boost Graph Library.

[Source]

     # File lib/bio/tree.rb, line 353
353:     def out_degree(source)
354:       h = @pathway.graph[source]
355:       h ? h.size : 0
356:     end

Returns all edges connected to the given node, as an array of [ source, target, edge ] arrays.

The method name "out_edges" comes from the Boost Graph Library.

[Source]

     # File lib/bio/tree.rb, line 329
329:     def out_edges(source)
330:       h = @pathway.graph[source]
331:       if h
332:         h.collect { |key, val| [ source, key, val ] }
333:       else
334:         []
335:       end
336:     end

Returns formatted text of the tree. Currently supported formats are :newick, :nhx, and :phylip_distance_matrix. Raises an error for any other format.

[Source]

     # File lib/bio/db/newick.rb, line 235
235:     def output(format, *arg, &block)
236:       case format
237:       when :newick
238:         output_newick(*arg, &block)
239:       when :nhx
240:         output_nhx(*arg, &block)
241:       when :phylip_distance_matrix
242:         output_phylip_distance_matrix(*arg, &block)
243:       else
244:         raise 'Unknown format'
245:       end
246:     end

Returns a Newick formatted string. If a block is given, the nodes are sorted using the block (in the same manner as Enumerable#sort).

Available options:

:indent            indent string; set false to disable (default: ' ')
:bootstrap_style   :disabled disables bootstrap representations; :traditional for traditional style; :molphy for Molphy style (default)

[Source]

     # File lib/bio/db/newick.rb, line 203
203:     def output_newick(options = {}, &block) #:yields: node1, node2
204:       root = @root
205:       root ||= self.nodes.first
206:       return '();' unless root
207:       __to_newick([], root, 0, :__to_newick_format_leaf, options, &block) +
208:         __to_newick_format_leaf(root, Edge.new, options) +
209:         ";\n"
210:     end
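
For readers unfamiliar with the format, the recursive structure of a Newick string can be sketched in a few lines. This is a simplified illustration on a plain adjacency hash, not the BioRuby implementation: it ignores the :indent and :bootstrap_style options, and `to_newick_sketch` is a hypothetical name.

```ruby
# Weighted undirected adjacency list: node name => { neighbor => branch length }
graph = {
  'root' => { 'x' => 1.0, 'C' => 4.0 },
  'x'    => { 'root' => 1.0, 'A' => 2.0, 'B' => 3.0 },
  'A'    => { 'x' => 2.0 },
  'B'    => { 'x' => 3.0 },
  'C'    => { 'root' => 4.0 }
}

# Serializes the subtree below `node`; `parent` is the node we came from.
# An optional block sorts sibling nodes, as with Enumerable#sort.
def to_newick_sketch(graph, node, parent = nil, &order)
  children = graph[node].keys - [parent]
  children = children.sort(&order) if order
  return node.to_s if children.empty? # a leaf is written as its name
  inner = children.map do |child|
    "#{to_newick_sketch(graph, child, node, &order)}:#{graph[node][child]}"
  end
  "(#{inner.join(',')})#{node}"
end

newick = to_newick_sketch(graph, 'root') + ';'
# newick == "((A:2.0,B:3.0)x:1.0,C:4.0)root;"

reversed = to_newick_sketch(graph, 'root') { |n1, n2| n2 <=> n1 } + ';'
# reversed == "((B:3.0,A:2.0)x:1.0,C:4.0)root;"
```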

Returns an NHX (New Hampshire eXtended) formatted string. If a block is given, the nodes are sorted using the block (in the same manner as Enumerable#sort).

Available options:

:indent            indent string; set false to disable (default: ' ')

[Source]

     # File lib/bio/db/newick.rb, line 223
223:     def output_nhx(options = {}, &block) #:yields: node1, node2
224:       root = @root
225:       root ||= self.nodes.first
226:       return '();' unless root
227:       __to_newick([], root, 0,
228:                   :__to_newick_format_leaf_NHX, options, &block) +
229:         __to_newick_format_leaf_NHX(root, Edge.new, options) +
230:         ";\n"
231:     end

Generates a PHYLIP-style distance matrix as a string. If nodes is not given, all leaves in the tree are used. If the names of some of the given (or default) nodes are not defined or are empty, the names are automatically generated.

[Source]

     # File lib/bio/db/newick.rb, line 256
256:     def output_phylip_distance_matrix(nodes = nil, options = {})
257:       nodes = self.leaves unless nodes
258:       names = nodes.collect do |x|
259:         y = get_node_name(x)
260:         y = sprintf("%x", x.__id__.abs) if y.empty?
261:         y
262:       end
263:       m = self.distance_matrix(nodes)
264:       Bio::Phylip::DistanceMatrix.generate(m, names, options)
265:     end

Gets the parent node of the node. If root isn't specified or root is nil, @root is used. Returns a Node object or nil. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 629
629:     def parent(node, root = nil)
630:       root ||= @root
631:       self.path(root, node)[-2]
632:     end

Gets the path from node1 to node2. Returns an array of nodes, including node1 and node2. If node1 and/or node2 do not exist, IndexError is raised. If node1 and node2 are not connected, NoPathError is raised. The result is unspecified for cyclic trees.

[Source]

     # File lib/bio/tree.rb, line 590
590:     def path(node1, node2)
591:       raise IndexError, 'node1 not found' unless @pathway.graph[node1]
592:       raise IndexError, 'node2 not found' unless @pathway.graph[node2]
593:       return [ node1 ] if node1 == node2
594:       step, path = @pathway.bfs_shortest_path(node1, node2)
595:       unless path[0] == node1 and path[-1] == node2 then
596:         raise NoPathError, 'node1 and node2 are not connected'
597:       end
598:       path
599:     end

Removes an edge between source and target. Returns self. If the edge does not exist, raises IndexError.

[Source]

     # File lib/bio/tree.rb, line 448
448:     def remove_edge(source, target)
449:       unless self.get_edge(source, target) then
450:         raise IndexError, 'edge not found'
451:       end
452:       fwd = [ source, target ]
453:       rev = [ target, source ]
454:       @pathway.relations.delete_if do |rel|
455:         rel.node == fwd or rel.node == rev
456:       end
457:       h = @pathway.graph[source]
458:       h.delete(target) if h
459:       h = @pathway.graph[target]
460:       h.delete(source) if h
461:       self
462:     end

Removes each edge for which the block returns a true value (neither nil nor false). Returns self.

[Source]

     # File lib/bio/tree.rb, line 466
466:     def remove_edge_if #:yields: source, target, edge
467:       removed_rel = []
468:       @pathway.relations.delete_if do |rel|
469:         if yield rel.node[0], rel.node[1], rel.relation then
470:           removed_rel << rel
471:           true
472:         end
473:       end
474:       removed_rel.each do |rel|
475:         source = rel.node[0]
476:         target = rel.node[1]
477:         h = @pathway.graph[source]
478:         h.delete(target) if h
479:         h = @pathway.graph[target]
480:         h.delete(source) if h
481:       end
482:       self
483:     end

Removes the given node from the tree. All edges connected with the node are also removed. Returns self. If the node does not exist, raises IndexError.

[Source]

     # File lib/bio/tree.rb, line 421
421:     def remove_node(node)
422:       self.clear_node(node)
423:       @pathway.graph.delete(node)
424:       self
425:     end

Removes each node for which the block returns a true value (neither nil nor false). All edges connected with the removed nodes are also removed. Returns self.

[Source]

     # File lib/bio/tree.rb, line 430
430:     def remove_node_if
431:       all = self.nodes
432:       all.each do |node|
433:         if yield node then
434:           self.clear_node(node)
435:           @pathway.graph.delete(node)
436:         end
437:       end
438:       self
439:     end

Removes all nodes that are neither branches nor leaves. That is, removes nodes connected with exactly two edges. For each removed node, the two adjacent edges are merged and a new edge is created. Returns the removed nodes. Note that orphan nodes are kept unchanged.

[Source]

     # File lib/bio/tree.rb, line 790
790:     def remove_nonsense_nodes
791:       hash = {}
792:       self.each_node do |node|
793:         hash[node] = true if @pathway.graph[node].size == 2
794:       end
795:       hash.each_key do |node|
796:         adjs = @pathway.graph[node].keys
797:         edges = @pathway.graph[node].values
798:         new_edge = get_edge_merged(edges[0], edges[1])
799:         @pathway.graph[adjs[0]].delete(node)
800:         @pathway.graph[adjs[1]].delete(node)
801:         @pathway.graph.delete(node)
802:         @pathway.append(Bio::Relation.new(adjs[0], adjs[1], new_edge))
803:       end
804:       #@pathway.to_relations
805:       @pathway.relations.reject! do |rel|
806:         hash[rel.node[0]] or hash[rel.node[1]]
807:       end
808:       return hash.keys
809:     end
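
The merging step can be sketched on a weighted adjacency hash as follows. This is an illustration under simplifying assumptions: every edge value is a numeric distance, the graph is a tree, and `remove_degree_two_nodes` is a hypothetical helper name.

```ruby
# Node :m sits on the branch between :a and :b and has exactly two edges.
graph = {
  a: { m: 1.0 },
  m: { a: 1.0, b: 2.0 },
  b: { m: 2.0, c: 0.5, d: 0.5 },
  c: { b: 0.5 },
  d: { b: 0.5 }
}

def remove_degree_two_nodes(graph)
  # Decide up front which nodes go, then splice them out one by one.
  removed = graph.keys.select { |node| graph[node].size == 2 }
  removed.each do |node|
    (left, d1), (right, d2) = graph[node].to_a
    graph[left].delete(node)
    graph[right].delete(node)
    graph.delete(node)
    # The merged edge spans both old edges, so the distances add up.
    graph[left][right] = graph[right][left] = d1 + d2
  end
  removed
end

removed = remove_degree_two_nodes(graph)
# removed == [:m]; :a and :b are now joined by an edge of length 3.0
```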

Gets the sub-tree consisting of the given nodes. nodes must be an array of nodes. Nodes that do not exist in the original tree are ignored. Returns a Tree object. Note that the sub-tree shares Node and Edge objects with the original tree.

[Source]

     # File lib/bio/tree.rb, line 522
522:     def subtree(nodes)
523:       nodes = nodes.find_all do |x|
524:         @pathway.graph[x]
525:       end
526:       return self.class.new if nodes.empty?
527:       # creates subtree
528:       new_tree = self.class.new
529:       nodes.each do |x|
530:         new_tree.add_node(x)
531:       end
532:       self.each_edge do |node1, node2, edge|
533:         if new_tree.include?(node1) and new_tree.include?(node2) then
534:           new_tree.add_edge(node1, node2, edge)
535:         end
536:       end
537:       return new_tree
538:     end
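
In graph terms, the result is the subgraph induced by the given nodes: only edges with both ends in the node list survive. A minimal sketch on a weighted adjacency hash, with `induced_subtree` as a hypothetical helper name:

```ruby
graph = {
  root: { x: 1.0, c: 4.0 },
  x:    { root: 1.0, a: 2.0, b: 3.0 },
  a:    { x: 2.0 },
  b:    { x: 3.0 },
  c:    { root: 4.0 }
}

def induced_subtree(graph, nodes)
  kept = nodes.select { |node| graph.key?(node) } # unknown nodes are ignored
  kept.each_with_object({}) do |node, sub|
    # Keep only edges whose other end is also in the node list.
    sub[node] = graph[node].select { |adjacent, _edge| kept.include?(adjacent) }
  end
end

sub = induced_subtree(graph, [:x, :a, :c, :missing])
# sub == { x: { a: 2.0 }, a: { x: 2.0 }, c: {} }
```

Note that :c ends up as an orphan node here, because its only neighbor (:root) is not in the list.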

Gets the sub-tree consisting of the given nodes and all internal nodes on the paths between them. nodes must be an array of nodes. Nodes that do not exist in the original tree are ignored. Returns a Tree object. The result is unspecified for cyclic trees. Note that the sub-tree shares Node and Edge objects with the original tree.

[Source]

     # File lib/bio/tree.rb, line 548
548:     def subtree_with_all_paths(nodes)
549:       hash = {}
550:       nodes.each { |x| hash[x] = true }
551:       nodes.each_index do |i|
552:         node1 = nodes[i]
553:         (0...i).each do |j|
554:           node2 = nodes[j]
555:           unless node1 == node2 then
556:             begin
557:               path = self.path(node1, node2)
558:             rescue IndexError, NoPathError
559:               path = []
560:             end
561:             path.each { |x| hash[x] = true }
562:           end
563:         end
564:       end
565:       self.subtree(hash.keys)
566:     end

Returns the total distance of all edges. Raises an error if some edges do not contain distance values.

[Source]

     # File lib/bio/tree.rb, line 716
716:     def total_distance
717:       distance = 0
718:       self.each_edge do |source, target, edge|
719:         distance += get_edge_distance(edge)
720:       end
721:       distance
722:     end
