ChangeLog - FASTA v35


$Name: fa35_03_06 $ - $Id: changes_v35.html,v 1.5 2008/02/19 08:50:13 wrp Exp $


Summary - Major Changes in FASTA version 35 (August, 2007)

  1. Accurate shuffle-based statistics for searches of small libraries (or pairwise comparisons).
  2. Inclusion of lalign35 (SIM) into FASTA3. Accurate statistics for lalign35 alignments. plalign has been replaced by lalign35 and ps_lav.
  3. Two new global alignment programs: ggsearch35 and glsearch35.

January 25, 2008

Support protein queries and sequence libraries that contain 'O' (pyrrolysine) and 'U' (selenocysteine). ('J' was supported already). Currently, 'O' is mapped automatically to 'K' and 'U' to 'C'.
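The residue mapping described above can be illustrated with a minimal Python sketch. This is not FASTA's own code: the table name `RARE_RESIDUE_MAP` and the function `map_rare_residues` are hypothetical, and only the two substitutions stated in the entry ('O' to 'K', 'U' to 'C') are applied.

```python
# Hypothetical illustration of the mapping described in this entry;
# the names below are assumptions, not identifiers from the FASTA source.
RARE_RESIDUE_MAP = str.maketrans({"O": "K", "U": "C"})


def map_rare_residues(seq: str) -> str:
    """Replace pyrrolysine (O) with lysine (K) and selenocysteine (U) with cysteine (C)."""
    return seq.translate(RARE_RESIDUE_MAP)


if __name__ == "__main__":
    # 'J' is left untouched because the entry says it was already supported.
    print(map_rare_residues("MKUOLJ"))  # prints MKCKLJ
```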

December 10, 2007

Provide encoded annotation information with -m 9c alignment summaries. The encoded alignment information makes it much simpler to highlight changes in critical residues.

August 22, 2007

A new program, lav2svg, is available; it creates SVG (Scalable Vector Graphics) output. SVG files are more easily edited with Adobe Illustrator than PostScript (lav2ps) files.

July 25, 2007 CVS fa35_02_02

Change default gap penalties for OPTIMA5 matrix to -20/-2 from -24/-4.

July 23, 2007

Add code to support sub-sequence ranges for "library" sequences - necessary for fully functional prss (ssearch35) and lalign35. For all programs, it is now possible to specify a subset of both the query and the library, e.g.
lalign35 -q mchu.aa:1-74 mchu.aa:75-148
Note, however, that the subset range given for the library is applied to every sequence in the library, not just the first, so each sequence is trimmed to the same range. This probably makes sense only if the library contains a single sequence (the same is true for the query sequence file).
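The "file:start-end" notation and its effect on a multi-sequence library can be sketched in Python as follows. This is an illustration only, assuming 1-based inclusive coordinates; the helper names `split_range`, `apply_range`, and `subset_library` are hypothetical and do not come from the FASTA source.

```python
import re
from typing import List, Optional, Tuple

# Assumed syntax: "<file>:<start>-<end>", as in "mchu.aa:1-74".
_RANGE_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)-(?P<end>\d+)$")

Range = Optional[Tuple[int, int]]


def split_range(spec: str) -> Tuple[str, Range]:
    """Split 'file:start-end' into (file, (start, end)); no range gives (file, None)."""
    m = _RANGE_RE.match(spec)
    if m is None:
        return spec, None
    start, end = int(m.group("start")), int(m.group("end"))
    if start < 1 or end < start:
        raise ValueError(f"invalid sub-sequence range in {spec!r}")
    return m.group("path"), (start, end)


def apply_range(seq: str, rng: Range) -> str:
    """Return the sub-sequence, assuming 1-based inclusive coordinates."""
    if rng is None:
        return seq
    start, end = rng
    return seq[start - 1:end]


def subset_library(seqs: List[str], rng: Range) -> List[str]:
    """The same range is applied to every library sequence, as the note above warns."""
    return [apply_range(s, rng) for s in seqs]


if __name__ == "__main__":
    print(split_range("mchu.aa:75-148"))
    print(subset_library(["ABCDEF", "UVWXYZ"], (1, 3)))
```

The second call shows why a range is usually meaningful only for single-sequence files: both sequences are cut at the same positions regardless of their content.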

July 3, 2007 CVS fa35_02_01

Merge of previous fasta34 with development version fasta35.

June 26, 2007

Add amino-acid 'J' for 'I' or 'L'.

Add the VT160 matrix of Mueller and Vingron ((2000) J. Comp. Biol. 7:761-776), selected with "-s VT160", and the OPTIMA_5 matrix (Kann et al. (2000) Proteins 41:498-503).

June 7, 2007

ggsearch35(_t), glsearch35(_t) can now use PSSMs.

May 30, 2007 CVS fa35_01_04

Addition of ps_lav, which can be used to plot the lav output of lalign35 -m 11:
lalign35 -m 11 | ps_lav
This pipeline replaces plalign (from FASTA2).

May 2, 2007

The labels on the alignment scores are much more informative (and more diverse). In the past, alignment scores looked like:
>>gi|121716|sp|P10649|GSTM1_MOUSE Glutathione S-transfer  (218 aa)
 s-w opt: 1497  Z-score: 1857.5  bits: 350.8 E(): 8.3e-97
Smith-Waterman score: 1497; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)
^^^^^^^^^^^^^^
where the highlighted text was either: "Smith-Waterman" or "banded Smith-Waterman". In fact, scores were calculated in other ways, including global/local for fasts and fastf. With the addition of ggsearch35, glsearch35, and lalign35, there are many more ways to calculate alignments: "Smith-Waterman" (ssearch and protein fasta), "banded Smith-Waterman" (DNA fasta), "Waterman-Eggert", "trans. Smith-Waterman", "global/local", "trans. global/local", "global/global (N-W)". The last option is a global/global alignment, but with the affine gap penalties used in the Smith-Waterman algorithm.

April 19, 2007 CVS fa34t27br_lal_3

Two new programs, ggsearch35(_t) and glsearch35(_t), are now available. ggsearch35(_t) calculates an alignment score that is global in the query and global in the library; glsearch35(_t) calculates an alignment that is global in the query and local in the library sequence. The latter program is designed for global alignments to domains. Both programs assume that scores are normally distributed. This appears to be an excellent approximation for ggsearch35 scores, but the distribution is somewhat skewed for global/local (glsearch) scores. ggsearch35(_t) only compares the query to library sequences that are between 80% and 125% of the length of the query; glsearch limits comparisons to library sequences that are longer than 80% of the query. Initial results suggest that there is relatively little length dependence of scores over this range (scores go down dramatically outside these ranges).
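The length limits described above can be written as two small predicates. This Python sketch is an illustration under assumptions, not code from FASTA: the function names are hypothetical, and whether the 80% and 125% bounds are inclusive is not stated in the entry (here the ggsearch bounds are treated as inclusive and the glsearch bound as strict).

```python
def ggsearch_length_ok(query_len: int, library_len: int) -> bool:
    """True if the library sequence is between 80% and 125% of the query length.

    Integer arithmetic avoids floating-point rounding at the boundaries.
    Inclusive bounds are an assumption.
    """
    return 80 * query_len <= 100 * library_len <= 125 * query_len


def glsearch_length_ok(query_len: int, library_len: int) -> bool:
    """True if the library sequence is longer than 80% of the query length."""
    return 100 * library_len > 80 * query_len


if __name__ == "__main__":
    query_len = 200
    for lib_len in (150, 160, 250, 400):
        print(lib_len,
              ggsearch_length_ok(query_len, lib_len),
              glsearch_length_ok(query_len, lib_len))
```

For a 200-residue query, a 400-residue library sequence is skipped by the global/global filter but still compared by the global/local one, which fits the stated purpose of glsearch: aligning a query globally to a domain inside a longer sequence.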

March 29, 2007 CVS fa34t27br_lal_1

At last, the lalign (SIM) algorithm has been moved from FASTA21 to FASTA35. Currently, only lalign35 is available (in May, a plalign equivalent was released). The statistical estimates for lalign35 should be much more accurate than those from the earlier lalign, because lambda and K are estimated from shuffles. In addition, all programs can now generate accurate statistical estimates with shuffles if the library has fewer than 500 sequences. If the library contains more than 500 sequences, and the sequences are related, then the -z 11 option should be used.
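The idea behind shuffle-based statistics is that shuffled copies of a sequence keep its length and residue composition but lose its biological order, so their scores approximate the distribution expected by chance. The Python sketch below illustrates only that first step, under assumptions: it does not estimate lambda and K, it is not FASTA's implementation, and the names `use_shuffle_statistics`, `shuffled_copies`, and `SHUFFLE_LIBRARY_THRESHOLD` are hypothetical. The value 500 is the library size quoted in the entry.

```python
import random
from typing import List

# Library size quoted in the entry; the constant name is an assumption.
SHUFFLE_LIBRARY_THRESHOLD = 500


def use_shuffle_statistics(n_library_seqs: int) -> bool:
    """True when the library is small enough that shuffles would be used."""
    return n_library_seqs < SHUFFLE_LIBRARY_THRESHOLD


def shuffled_copies(seq: str, n: int, rng: random.Random) -> List[str]:
    """Return n uniformly shuffled copies of seq (same length and composition)."""
    copies = []
    for _ in range(n):
        residues = list(seq)
        rng.shuffle(residues)  # in-place shuffle of the residue list
        copies.append("".join(residues))
    return copies


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    if use_shuffle_statistics(1):  # e.g. a pairwise comparison
        for copy in shuffled_copies("MKWVTFISLLFLFSSAYS", 3, rng):
            print(copy)
```

In a real search, each shuffled copy would be aligned against the query and the resulting scores used to fit the statistical parameters; that fitting step is deliberately left out here.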
FASTA v34 Change Log