bio-0.5.2: A bioinformatics library

Safe Haskell: Safe-Inferred

Bio.Alignment.AlignData

Description

Data structures and helper functions for calculating alignments

There are two ways to view an alignment: either as a list of edits (i.e., insertions, deletions, or substitutions), or as a set of sequences with inserted gaps.

The edit-list approach is perhaps the more restrictive model, since it doesn't generalize to multiple alignments.

The gap approach is more general, and probably more commonly used by other software (see e.g. the ACE file format).

Data types for gap-based alignments

data Dir

Constructors

Fwd 
Rev 

Instances

type Gaps = [Offset]

type Alignment a = [(Offset, Dir, Sequence a, Gaps)]

Helper functions

extractGaps :: SeqData -> (SeqData, Gaps)

Gaps are encoded as '*' characters. This function removes them and returns the sequence together with the list of gap positions. Note that gaps are positioned relative to the *gapped* sequence (in contrast to stmassembler/Cluster.hs).
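
As a rough illustration of the behaviour described above, here is a minimal sketch. It is not the library's implementation: it uses a plain String in place of SeqData and an Int in place of Offset, and it assumes positions are zero-based. The name extractGapsSketch is hypothetical.

```haskell
-- Stand-ins for the library's types (assumptions for this sketch only).
type Offset = Int
type Gaps   = [Offset]

-- Remove every '*' and report where the '*' characters were,
-- counted from zero in the original (gapped) input.
extractGapsSketch :: String -> (String, Gaps)
extractGapsSketch s =
  ( filter (/= '*') s
  , [i | (i, c) <- zip [0 ..] s, c == '*']
  )
```

Because the positions refer to the gapped input, consecutive gaps get consecutive offsets rather than sharing a single offset in the ungapped sequence.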

Data types for edit-based alignments

data Edit

An Edit is either the insertion, the deletion, or the replacement of a character.

Constructors

Ins Chr 
Del Chr 
Repl Chr Chr 

Instances

type EditList = [Edit]

An alignment is a sequence of edits.

type SubstMx t a = (Chr, Chr) -> a

A substitution matrix gives scores for replacing one character with another. Typically, it will be symmetric. It is type-tagged with the alphabet, Nuc or Amino.
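
Since SubstMx is just a function on character pairs, a simple match/mismatch matrix can be sketched as below. The type synonyms are copied from this page; the tag type NucTag, the name simpleMx, and the scores 1 and -1 are hypothetical choices for illustration, not values taken from the library.

```haskell
import Data.Word (Word8)

type Chr = Word8
type SubstMx t a = (Chr, Chr) -> a

-- Hypothetical phantom tag standing in for the library's nucleotide tag.
data NucTag

-- Score 1 for identical characters and -1 otherwise (symmetric by construction).
simpleMx :: SubstMx NucTag Int
simpleMx (x, y)
  | x == y    = 1
  | otherwise = -1
```

The tag parameter t does not appear on the right-hand side of the synonym, so it only serves to keep matrices for different alphabets apart at the type level.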

type Selector a = [(a, Edit)] -> a

A Selector consists of a zero element, and a function that chooses a possible Edit operation, and generates an updated result.
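
One plausible Selector, given here only as a sketch, keeps the best score among the candidate edits and ignores which edit produced it. The Edit and Selector declarations mirror this page; bestScore is a hypothetical name and is not claimed to be one of the library's selectors.

```haskell
import Data.Word (Word8)

type Chr = Word8
data Edit = Ins Chr | Del Chr | Repl Chr Chr

type Selector a = [(a, Edit)] -> a

-- Pick the highest score among the candidates.
-- Partial: 'maximum' fails on an empty candidate list.
bestScore :: Selector Int
bestScore = maximum . map fst
```

A selector that also tracked the chosen edits would use a richer result type, for example a pair of a score and an edit list.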

type Chr = Word8

The sequence element type, used in alignments.

Helper functions

columns :: Selector a -> a -> Sequence b -> Sequence b -> [[a]]

Calculate a set of columns containing scores. This represents the columns of the alignment matrix, but will only require linear space for score calculation.
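
The linear-space idea can be sketched independently of the library's types: each column of the alignment matrix depends only on the column before it, so a fold that carries a single column is enough to obtain the final score. The code below is a simplified global-alignment scorer over String with Int scores and a linear gap penalty. The name globalScore and its signature are hypothetical, and it does not reproduce how columns itself is implemented.

```haskell
import Data.List (zipWith4)

-- Global alignment score, keeping one column of the matrix at a time.
-- 'sub' scores a pair of characters; 'gap' is the (usually negative) gap score.
globalScore :: (Char -> Char -> Int) -> Int -> String -> String -> Int
globalScore sub gap xs ys = last (foldl nextCol firstCol ys)
  where
    -- Column 0: aligning each prefix of xs against the empty string.
    firstCol = scanl (+) 0 (map (const gap) xs)

    -- Build the next column from the previous one and the next character of ys.
    nextCol [] _ = []
    nextCol prev@(p0 : rest) y = col
      where
        col = (p0 + gap) : zipWith4 cell xs prev rest col
        -- diag: previous column, one row up; left: previous column, same row;
        -- up: current column, one row up (read lazily from 'col' itself).
        cell x diag left up =
          maximum [diag + sub x y, left + gap, up + gap]
```

With a match score of 1, a mismatch score of -1 and a gap score of -1, aligning "GATTACA" against itself gives 7, since every position can be matched.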

eval :: SubstMx t a -> a -> Edit -> a

Evaluate an Edit based on a SubstMx and a gap penalty.
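
Given the signature, a natural reading is that insertions and deletions cost the gap penalty while replacements are looked up in the matrix. The sketch below follows that reading. It is an assumption about the behaviour rather than the library's source, and evalSketch is a hypothetical name.

```haskell
import Data.Word (Word8)

type Chr = Word8
data Edit = Ins Chr | Del Chr | Repl Chr Chr
type SubstMx t a = (Chr, Chr) -> a

-- Gaps (Ins/Del) score the gap penalty; replacements score via the matrix.
evalSketch :: SubstMx t a -> a -> Edit -> a
evalSketch _  gap (Ins _)    = gap
evalSketch _  gap (Del _)    = gap
evalSketch mx _   (Repl x y) = mx (x, y)
```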

isRepl :: Edit -> Bool

True if the Edit is a Repl.

on :: (t1 -> t1 -> t) -> (t2 -> t1) -> t2 -> t2 -> t

showalign :: [Edit] -> [Char]

toStrings :: EditList -> (String, String)

Turn an alignment into sequences with - representing gaps (for checking, filtering out the - characters should return the original sequences, provided - isn't part of the sequence alphabet).
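
The conversion can be sketched as mapping each edit to one alignment column and unzipping the result. Which sequence receives the gap for Ins and for Del is an assumption here (Ins puts the gap in the first string, Del in the second); the library may use the opposite convention. The name toStringsSketch is hypothetical.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

type Chr = Word8
data Edit = Ins Chr | Del Chr | Repl Chr Chr

-- One output column per edit; '-' marks a gap.
-- Assumed convention: Ins gaps the first string, Del gaps the second.
toStringsSketch :: [Edit] -> (String, String)
toStringsSketch = unzip . map column
  where
    column (Ins c)    = ('-', w2c c)
    column (Del c)    = (w2c c, '-')
    column (Repl a b) = (w2c a, w2c b)
    w2c = chr . fromIntegral
```

Filtering the '-' characters out of either result recovers the corresponding ungapped sequence, which is the check suggested above.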