Parsifal
XML Parser C library

Abstract


Parsifal is a minimal non-validating XML parser written in ANSI C. Parsifal implements a subset of SAX2, including namespace support.

Parsifal can be used for parsing XML-based messages (such as SOAP and RSS) and for application-specific data processing, e.g. config files and data files. It can also be used for document-oriented processing (e.g. XHTML) and for parsing modular documents, because it is a conforming non-validating XML 1.0 parser that supports features such as internal and external general entities, DTD parameter entities and default attributes. These features make it possible, for example, to process XHTML modules (e.g. xhtml1-transitional.dtd). Parsifal is well suited to processing large data files and streams: it is SAX based, consumes very little memory and, being written in C, is fast enough for most purposes.

Using Parsifal in place of large XML processing libraries (e.g. libxml, Xerces), or even in place of the small Expat (which is considerably bigger and more complicated), can be justified in limited-memory environments and in applications that require a bundled parser. If you need higher-level tools, for example a library supporting DTD validation or DOM/XPath processing, you should of course look at other libraries.

You can download Parsifal including source, documentation and samples from here



Features


Supported SAX events
startDocument/endDocument
startElement/endElement: Thorough namespace support in the startElement/endElement callbacks. Attributes can be retrieved by name or by index using methods similar to SAX attribute handling.
characters
ignorableWhitespace
comment
startCDATA/endCDATA
processingInstruction
errorHandler
startDTD/endDTD
defaultHandler: for miscellaneous character data
xmlDecl: reports the XML declaration (<?xml version="1.0" ...?>)
skippedEntity
resolveEntity/externalEntityParsed: for parsing external entities and external DTDs
startEntity/endEntity
elementDecl, attributeDecl, entityDecl...: used for reporting DTD declarations
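
The callback model behind the events listed above can be illustrated with a deliberately tiny, self-contained sketch. It is not Parsifal's API: the ToyHandlers struct, the toy_scan function and the stats_* callbacks are hypothetical names invented for this illustration, and the scanner only understands plain start tags, end tags and text (no attributes, comments, entities or namespaces). What it does show is why a SAX-style parser needs so little memory: the handlers update a few counters and no document tree is ever built.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler table, loosely modelled on SAX. NOT Parsifal's API. */
typedef struct {
    void (*startElement)(void *userData, const char *name);
    void (*endElement)(void *userData, const char *name);
    void (*characters)(void *userData, const char *chars, size_t len);
    void *userData;
} ToyHandlers;

/* Scans a toy subset of XML (<a>, </a> and text only) and fires callbacks.
   Returns 0 on success, -1 on an unterminated, empty or over-long tag. */
int toy_scan(const char *xml, const ToyHandlers *h)
{
    char name[64];
    const char *p = xml;

    while (*p) {
        if (*p == '<') {
            int isEnd = (p[1] == '/');
            const char *start = p + (isEnd ? 2 : 1);
            const char *gt = strchr(start, '>');
            size_t len;

            if (!gt) return -1;
            len = (size_t)(gt - start);
            if (len == 0 || len >= sizeof name) return -1;
            memcpy(name, start, len);
            name[len] = '\0';
            if (isEnd) {
                if (h->endElement) h->endElement(h->userData, name);
            } else {
                if (h->startElement) h->startElement(h->userData, name);
            }
            p = gt + 1;
        } else {
            /* Everything up to the next '<' is reported as character data. */
            const char *next = strchr(p, '<');
            size_t len = next ? (size_t)(next - p) : strlen(p);
            if (h->characters) h->characters(h->userData, p, len);
            p += len;
        }
    }
    return 0;
}

/* Example user data: a few counters instead of a document tree. */
typedef struct {
    int elements;
    int depth;
    int maxDepth;
    size_t textBytes;
} ToyStats;

void stats_start(void *ud, const char *name)
{
    ToyStats *s = (ToyStats *)ud;
    (void)name;
    s->elements++;
    if (++s->depth > s->maxDepth) s->maxDepth = s->depth;
}

void stats_end(void *ud, const char *name)
{
    (void)name;
    ((ToyStats *)ud)->depth--;
}

void stats_chars(void *ud, const char *chars, size_t len)
{
    (void)chars;
    ((ToyStats *)ud)->textBytes += len;
}
```

To try it, fill a ToyHandlers with the three stats_* callbacks and a pointer to a zeroed ToyStats, then pass it to toy_scan together with the input string. For the real handler signatures and parser setup, refer to the Parsifal manual page.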


XML 1.0 features that are not currently supported by Parsifal:



Supported SAX properties/features
http://xml.org/sax/features/namespaces
http://xml.org/sax/features/namespace-prefixes
http://xml.org/sax/features/external-general-entities
See XMLFlags for information on Parsifal-specific properties.


Supported XML encodings



When compiled with GNU libiconv support:


see also Notes about encodings


Licence


Parsifal is released to the public domain and is provided "AS IS," without a warranty of any kind. Use at your own risk. See COPYING. Note that even though Parsifal is public domain software, GNU libiconv uses the GPL licence, and that will affect your software too if you use libiconv.

Conformance


Parsifal accepts only well-formed XML documents and, despite its small size, enforces strict rules for XML tag names, namespace declarations, the XML declaration etc. See the OASIS XML test suite results.

How to use


Read the manual page. Examine the samples that come with the download.


Sample: Description (see README in each sample dir for more info)
elements.c: Simple example that outputs elements from stdin to stdout with some indentation. README
zenstory.c, zenstory.h: Despite its name, demonstrates some real-world SAX parsing techniques. README
canonxml.c: Turns an input XML file into canonical XML (linefeeds turned into character references, attributes sorted etc.). Used by the xmltest OASIS XML test suite parser. README
winurl.c: Uses the Windows urlmon.dll for simple parsing of URLs; only the input source handling is Windows specific, otherwise OS independent. README
xmltest.c: OASIS XML test suite parser. README
test_pool.c: Demonstrates XMLVector, XMLStringbuf and XMLPool usage. (These are ADTs that are used internally by Parsifal but can be used in your application too. This example has nothing to do with XML parsing.)
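
The canonical form produced by canonxml.c is described above only in outline (linefeeds turned into character references, attributes sorted). The sketch below shows those two steps in isolation. It assumes the escaping rules of James Clark's Canonical XML as used for the xmltest output (&, <, >, the double quote, tab, linefeed and carriage return are replaced), and the names canon_escape, ToyAttr and sort_attrs are hypothetical; none of this is taken from canonxml.c itself.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute pair used for illustration; not a Parsifal type. */
typedef struct {
    const char *name;
    const char *value;
} ToyAttr;

/* Writes the escaped form of src into dst (capacity dstSize, including the
   terminating NUL). Returns the escaped length, or (size_t)-1 if dst is
   too small. */
size_t canon_escape(const char *src, char *dst, size_t dstSize)
{
    size_t used = 0;

    for (; *src; src++) {
        const char *rep = NULL;
        char one[2];
        size_t n;

        switch (*src) {
            case '&':  rep = "&amp;";  break;
            case '<':  rep = "&lt;";   break;
            case '>':  rep = "&gt;";   break;
            case '"':  rep = "&quot;"; break;
            case '\t': rep = "&#9;";   break;
            case '\n': rep = "&#10;";  break;
            case '\r': rep = "&#13;";  break;
            default:   /* ordinary character: copy it through unchanged */
                one[0] = *src;
                one[1] = '\0';
                rep = one;
                break;
        }
        n = strlen(rep);
        if (used + n + 1 > dstSize) return (size_t)-1;
        memcpy(dst + used, rep, n);
        used += n;
    }
    if (used + 1 > dstSize) return (size_t)-1;
    dst[used] = '\0';
    return used;
}

/* Orders attributes by name with a plain byte-wise comparison. */
static int cmp_attr(const void *a, const void *b)
{
    return strcmp(((const ToyAttr *)a)->name, ((const ToyAttr *)b)->name);
}

void sort_attrs(ToyAttr *attrs, size_t count)
{
    qsort(attrs, count, sizeof *attrs, cmp_attr);
}
```

A writer built on these helpers would call sort_attrs once per start tag and then pass each attribute value and each run of character data through canon_escape before printing it.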
 


Performance


I've done some Parsifal benchmarking on my Dell Inspiron 8200 laptop:


An in-memory, UTF-8 encoded 11 MB file (test.rdf) gets parsed in about 0.66 sec with namespaces on and only dummy startElement, endElement and characters handlers set. Not bad, though parsing performance can be measured in many ways; in a messaging environment with small documents, for example, a parser's fast initialization time and small memory usage can lead to better overall performance.

Expat-1.95.6 parses test.rdf in about 0.45 sec (namespaces on), but oh the complexity of it... Note also that although Parsifal is slower than Expat, Parsifal provides more thorough information for some events; namespace information and XMLParser_GetNamedItem and other helper routines make Parsifal easier for some parsing tasks.

There are also many areas in Parsifal that can still be optimized, so more optimizations are expected in the future.

NOTE: An 11 MB document is a relatively large XML document, and if it is parsed in less than a second on my test system, parsing should be fast enough for most purposes; for example, the 654 KB 1998statistics.xml (http://www.ibiblio.org/xml/examples/1998statistics.xml) gets parsed in about 0.038 sec! I've also parsed very large documents (like a 256 MB file) with Parsifal without problems. The Freshmeat project dump fm-projects.rdf gets parsed in 4.5 secs! (It is about 83 MB and available from http://download.freshmeat.net/backend/; you should get the compressed .bz2 file if you're interested in that.)
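
To put the timings above on a common scale, they can be converted to throughput. The sketch below does only that arithmetic: the sizes and times are the ones quoted in this section, the helper names throughput_mb_per_sec and print_throughputs are made up for this example, and 1 MB is assumed to be 1024 KB when converting the 654 KB file.

```c
#include <stdio.h>

/* Megabytes parsed per second; returns 0 for a non-positive time. */
double throughput_mb_per_sec(double megabytes, double seconds)
{
    return seconds > 0.0 ? megabytes / seconds : 0.0;
}

/* Prints the throughput for each measurement quoted in this section. */
void print_throughputs(void)
{
    printf("Parsifal, test.rdf (11 MB, 0.66 s):            %.1f MB/s\n",
           throughput_mb_per_sec(11.0, 0.66));
    printf("Expat-1.95.6, test.rdf (11 MB, 0.45 s):        %.1f MB/s\n",
           throughput_mb_per_sec(11.0, 0.45));
    printf("Parsifal, 1998statistics.xml (654 KB, 0.038 s): %.1f MB/s\n",
           throughput_mb_per_sec(654.0 / 1024.0, 0.038));
    printf("Parsifal, fm-projects.rdf (83 MB, 4.5 s):       %.1f MB/s\n",
           throughput_mb_per_sec(83.0, 4.5));
}
```

By this arithmetic the three Parsifal measurements come out at roughly 16.7, 16.8 and 18.4 MB/s, and the Expat measurement at roughly 24.4 MB/s, so the Parsifal figures are consistent with each other across very different file sizes.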


ChangeLog


ChangeLog is here. You might want to read API changes too.


Copyright © 2002-2004 Toni Uusitalo.
Send mail, suggestions and bug reports to

Last modified: 11.08.2004 23:53