The Ragbag package is a set of classes for setting up a virtual sequence contig without the need for writing Biojava code. It's a sort of Ensembl for Dummies. It uses ComponentFeatures to effect this.
What would I use it for?
Any situation where you need to provide a sequence assembled from other sequences and associated annotations, e.g. a DAS server. More sophisticated products such as Ensembl offer more features and superior performance, but need a brain the size of a planet to set up. Ragbag is not intended to compete with those packages, just to provide an easier way of setting up a minimalistic annotated sequence.
How do I use it?
To use Ragbag, you only need to understand two structures. Every directory represents a sequence.
A directory that doesn't have a file named "Map" is assumed to have a single sequence file and (optionally) a directory named "Annotation" containing files holding features to be annotated onto that sequence. The sequence file will be instantiated into a SimpleSequence object and all the features present in files in the Annotation directory will be applied as features on this SimpleSequence object.
If a directory has a "Map" file, then that file contains, in an XML format, all the mappings of sequences in that directory onto the SimpleAssembly object that the directory is associated with. These sequences can be sequence files or directories that themselves represent other virtual sequences, of either the SimpleSequence or the SimpleAssembly type. Any files in the Annotation directory will be applied as features onto the SimpleAssembly directly.
A hierarchy of sequences can be constructed by merely constructing a directory tree and editing the Map files.
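The directory convention described above can be checked with a few lines of plain Java before handing a tree to Ragbag. The sketch below is not part of Ragbag and does not call any Ragbag or Biojava API. The class RagbagLayoutSketch and its methods are hypothetical names invented for illustration. It assumes only what is stated above: a directory containing a file named "Map" is an assembly, any other directory is a single sequence, and an optional "Annotation" subdirectory holds feature files. For simplicity it descends only into subdirectories and ignores component sequence files that sit directly in an assembly directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of Ragbag) that reports how a directory
 * tree would be interpreted under the layout convention, e.g.:
 *
 *   chr1/            has "Map"  -> assembly
 *     Map
 *     Annotation/    features applied to the assembly
 *     contigA/       no "Map"   -> single sequence
 *       seq.embl
 *     contigB/       no "Map"   -> single sequence
 *       Annotation/  features applied to that sequence
 */
final class RagbagLayoutSketch {

    /** A directory with a file named "Map" is treated as an assembly. */
    static boolean isAssembly(Path dir) {
        return Files.isRegularFile(dir.resolve("Map"));
    }

    /** True if the directory has an "Annotation" subdirectory. */
    static boolean hasAnnotation(Path dir) {
        return Files.isDirectory(dir.resolve("Annotation"));
    }

    /** Returns one indented line per sequence directory, root first. */
    static List<String> describe(Path root) throws IOException {
        List<String> out = new ArrayList<>();
        describe(root, 0, out);
        return out;
    }

    private static void describe(Path dir, int depth, List<String> out) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(dir.getFileName())
            .append(isAssembly(dir) ? " [assembly]" : " [sequence]");
        if (hasAnnotation(dir)) {
            line.append(" +annotation");
        }
        out.add(line.toString());

        // Only assemblies contain nested sequence directories.
        if (!isAssembly(dir)) {
            return;
        }
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                boolean isAnnotationDir = p.getFileName().toString().equals("Annotation");
                if (Files.isDirectory(p) && !isAnnotationDir) {
                    children.add(p);
                }
            }
        }
        Collections.sort(children); // stable, name-ordered output
        for (Path child : children) {
            describe(child, depth + 1, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : describe(root)) {
            System.out.println(line);
        }
    }
}
```

Running it on the root directory of a tree prints one line per sequence directory, which makes a missing Map file or a misplaced Annotation directory easy to spot before the tree is loaded.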
From a software perspective, instantiating a RagbagAssembly on the root directory will instantiate the whole contig in memory. You do have to pass to RagbagAssembly a RagbagSequenceFactory object to determine the type of sequence object it will use and the type of cache it will utilize to manage your resources.
Current Limitations and Future Plans
The EMBL, GAME, Genbank and XFF formats are currently supported. GFF will eventually be supported, but the current Biojava GFF parser appears rather different from the other parsers. Further improvements in lazy instantiation and caching may be explored. It should now be possible to annotate your virtual contig by editing the EMBL/Genbank files within the directory structure with Artemis!
Under active development. Not warranted to work for any particular or even general purpose. Suited to coders with a death wish. VERY EXPERIMENTAL CODE!!!!