We have several resources to help those wishing to learn the arcane craft of dataset creation. A good place to start is the LST File Classes, which give a good overview of the basics of the dataset structure as well as detailed instructions on coding races. Another good resource is the my_dataset directory in the data folder. This is a framework for a PCGen dataset, meant for beginners who want to take their first steps in entering their own custom data into PCGen and have found that the built-in List Editors are too limiting for what they intend to do. It contains some commented-out examples and a lot of advice and tips for putting your own sets together. You can get up and running quickly by renaming the elements in this set to suit your needs. Once you start delving deep into LST code, you will want to keep the LST index handy, since most LST tags are documented there with details on their use.
Some tags are labeled New or Updated, while others are labeled Deprecated, Obsolete, or Removed.