Parsifal is a minimal non-validating XML parser written in ANSI C. Parsifal implements a subset of SAX2, including namespace support.
Parsifal can be used for parsing XML-based messages (such as SOAP and RSS) and for application-specific data processing, e.g. config files and data files. Parsifal can also be used for document-oriented processing (e.g. XHTML) and for parsing modular documents, because it is a conforming non-validating XML 1.0 parser that supports features such as internal and external general entities, DTD parameter entities and default attributes. These features make it possible, for example, to process XHTML modules (e.g. xhtml1-transitional.dtd). Parsifal is ideal for processing large data files and streams: it is SAX-based, consumes very little memory and, being written in C, is fast enough for most purposes.
Using Parsifal in place of large XML processing libraries (e.g. libxml, Xerces), or even in place of the small Expat (which is considerably bigger and more complicated), can be justified in limited-memory environments and in applications that require a bundled parser. If you need higher-level tools, for example a library supporting DTD validation or DOM/XPath processing, you should of course look at other libraries.
You can download Parsifal including source, documentation and samples from here
Supported SAX events | |
---|---|
startDocument/endDocument | |
startElement/endElement | Thorough namespace support in startElement/endElement callbacks. Also supports getting attributes by name or by index using methods similar to SAX attributes handling. |
characters | |
ignorableWhitespace | |
comment | |
startCDATA/endCDATA | |
processingInstruction | |
errorHandler | |
startDTD/endDTD | |
defaultHandler | for miscellaneous character data |
xmlDecl | Reports the XML declaration (<?xml version="1.0" ... ?>) |
skippedEntity | |
resolveEntity/externalEntityParsed | for parsing external entities/external DTDs |
startEntity/endEntity | |
elementDecl, attributeDecl, entityDecl... | Used for reporting DTD declarations |
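
The callback model behind the events listed above can be sketched in plain C. This is a minimal, self-contained illustration that does not include Parsifal's headers: the Demo* types and the functions demo_attr_at, demo_attr_named and demo_run are hypothetical names invented for this sketch, not Parsifal's API (the manual page documents the real handler signatures). It shows handlers receiving application state through a user pointer, attribute lookup by name or by index, and a stand-in "parser" that simply fires the events in document order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; NOT Parsifal's API. */
typedef struct {
    const char *qname;
    const char *value;
} DemoAttr;

/* One function pointer per event; each receives the application's own
   state through the user pointer and returns 0 to continue. */
typedef struct {
    int (*startElement)(void *user, const char *qname,
                        const DemoAttr *atts, size_t natts);
    int (*endElement)(void *user, const char *qname);
    int (*characters)(void *user, const char *chars, size_t len);
} DemoHandlers;

/* Attribute lookup by index; NULL when the index is out of range. */
const DemoAttr *demo_attr_at(const DemoAttr *atts, size_t natts, size_t i)
{
    return i < natts ? &atts[i] : NULL;
}

/* Attribute lookup by name (linear scan); NULL when absent. */
const DemoAttr *demo_attr_named(const DemoAttr *atts, size_t natts,
                                const char *qname)
{
    size_t i;
    for (i = 0; i < natts; i++)
        if (strcmp(atts[i].qname, qname) == 0)
            return &atts[i];
    return NULL;
}

/* Application state updated by the callbacks. */
typedef struct {
    int depth;          /* current nesting level */
    int max_depth;      /* deepest level seen */
    int elements;       /* number of start tags */
    int with_id;        /* start tags carrying an "id" attribute */
    size_t text_bytes;  /* bytes reported through characters */
} DemoStats;

static int on_start(void *user, const char *qname,
                    const DemoAttr *atts, size_t natts)
{
    DemoStats *s = (DemoStats *)user;
    (void)qname;
    s->elements++;
    if (++s->depth > s->max_depth)
        s->max_depth = s->depth;
    if (demo_attr_named(atts, natts, "id") != NULL)
        s->with_id++;
    return 0;
}

static int on_end(void *user, const char *qname)
{
    (void)qname;
    ((DemoStats *)user)->depth--;
    return 0;
}

static int on_chars(void *user, const char *chars, size_t len)
{
    (void)chars;
    ((DemoStats *)user)->text_bytes += len;
    return 0;
}

/* Stand-in for a parser: fires, in document order, the events a SAX
   parser would report for <feed lang="en"><item id="1">hi</item></feed>. */
DemoStats demo_run(void)
{
    static const DemoAttr feed_atts[] = { { "lang", "en" } };
    static const DemoAttr item_atts[] = { { "id", "1" } };
    DemoHandlers h;
    DemoStats s = { 0, 0, 0, 0, 0 };

    h.startElement = on_start;
    h.endElement = on_end;
    h.characters = on_chars;

    h.startElement(&s, "feed", feed_atts, 1);
    h.startElement(&s, "item", item_atts, 1);
    h.characters(&s, "hi", 2);
    h.endElement(&s, "item");
    h.endElement(&s, "feed");
    return s;
}
```

Calling demo_run() from your own main returns a DemoStats with two elements, a maximum depth of 2, one element carrying an id attribute and 2 bytes of text. Because state travels through the user pointer rather than globals, the same handlers can serve several parses at once.
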
Supported SAX properties/features |
---|
http://xml.org/sax/features/namespaces |
http://xml.org/sax/features/namespace-prefixes |
http://xml.org/sax/features/external-general-entities |
see XMLFlags for info on Parsifal specific properties |
Read the manual page. Examine the samples that come with the download.
Sample | Description (see README in each sample dir for more info) | |
---|---|---|
elements.c | Simple example that outputs elements from stdin to stdout with some indentation. | README |
zenstory.c zenstory.h | Despite its name, demonstrates some real-world SAX parsing techniques. | README |
canonxml.c | Turns an input XML file into canonical XML (linefeeds turned into character references, attributes sorted etc.). Used by the xmltest OASIS XML test suite parser. | README |
winurl.c | Uses the Windows urlmon.dll for simple parsing of URLs; only the input source handling is Windows-specific, the rest is OS-independent. | README |
xmltest.c | OASIS XML test suite parser | README |
test_pool.c | Demonstrates XMLVector, XMLStringbuf and XMLPool usage. (These are ADTs that are used internally by Parsifal but can be used in your application too; this example has nothing to do with XML parsing.) | |
I've done some Parsifal benchmarking on my Dell Inspiron 8200 laptop:
An in-memory 11 MB UTF-8 encoded file, test.rdf, with only dummy startElement, endElement and characters handlers set, gets parsed in about 0.66 sec (namespaces on). Not bad, though parsing performance can be measured in many ways; in a messaging environment with small documents, for example, a parser's fast initialization and small memory usage can lead to better overall performance.
Expat-1.95.6 parses test.rdf in about 0.45 sec (namespaces on), but oh the complexity of it... Note also that although Parsifal is slower than Expat, Parsifal provides more thorough information for some events; namespace information, XMLParser_GetNamedItem and other helper routines make Parsifal easier to use for some parsing tasks.
There are also many areas in Parsifal left to optimize, so more optimizations are expected in the future.
NOTE: An 11 MB doc is a relatively large XML doc, and if it gets parsed in less than a second on my test system, parsing should be fast enough for most purposes; for example, the 654 KB 1998statistics.xml (http://www.ibiblio.org/xml/examples/1998statistics.xml) gets parsed in about 0.038 sec! I've also parsed very large docs (like a 256 MB file) with Parsifal without problems. The Freshmeat project dump fm-projects.rdf gets parsed in 4.5 secs! (It is about 83 MB, from http://download.freshmeat.net/backend/; you should get the compressed .bz2 file if you're interested in that.)
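
To compare the timings above on one scale, they can be converted to throughput. The sketch below only does that arithmetic on the figures quoted in the text; it assumes 1 MB = 1024 KB, treats the "about" timings as exact, and the function names are made up for this example.

```c
#include <stdio.h>

/* Throughput in MB/s for a document of size_mb megabytes parsed in
   the given number of seconds. */
double throughput_mb_per_s(double size_mb, double seconds)
{
    return size_mb / seconds;
}

/* Prints one line per measurement quoted in the benchmark notes. */
void print_throughputs(void)
{
    printf("Parsifal, test.rdf (11 MB):          %.1f MB/s\n",
           throughput_mb_per_s(11.0, 0.66));
    printf("Expat-1.95.6, test.rdf (11 MB):      %.1f MB/s\n",
           throughput_mb_per_s(11.0, 0.45));
    printf("Parsifal, 1998statistics.xml:        %.1f MB/s\n",
           throughput_mb_per_s(654.0 / 1024.0, 0.038));
    printf("Parsifal, fm-projects.rdf (83 MB):   %.1f MB/s\n",
           throughput_mb_per_s(83.0, 4.5));
}
```

On these numbers Parsifal works out to roughly 17 to 18 MB/s across the three files, against roughly 24 MB/s for Expat on test.rdf. Handler work, I/O and hardware all change such figures, so they are only a rough guide.
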
Copyright © 2002-2004 Toni Uusitalo.
Send mail, suggestions and bug reports to
Last modified: 11.08.2004 23:53