pypersrc implementation

©2005  Jim E. Brooks  http://www.jimbrooks.org/hypersrc/

See also pypersrc design.


Python and C++ integration

pypersrc is written mostly in Python, with a small percentage in C++ for speed and to take advantage of the C++ STL. Python can be extended directly by C functions but not by C++ objects, so each C++ class is reached through C functions. pypersrc starts as a C++ executable, not as a Python script file. The C++ extensions surface in Python as modules with a "pycpp_" prefix. In the following example, Python imports the C++ extension with "import pycpp_greedydict"; GreedyDict_New() is a C function in src/cc/GreedyDict.cc.

src/py/GreedyDict.py:

import pycpp_greedydict

class GreedyDict:
    """ Dictionary that greedily finds the longest matching key. """

    def __init__( s, hintKeySize=15 ):
        pycpp_greedydict.GreedyDict_New( s, hintKeySize )

src/cc/GreedyDict.cc:

/*****************************************************************************
 * Interface to Python: Create a new GreedyDict C++ object.
 *****************************************************************************/
PyObject*
GreedyDict_New( PyObject* self_, PyObject* args )
{
    ...
        GreedyDict* greedyDict = new GreedyDict( hintKeySize );
}

C++ classes that extend Python

DequeChar

The lex code, which is written in Python, needs an efficient deque. Source files are lexed char-by-char using a deque. Python doesn't provide an efficient deque so this C++ class was written. It provides an interface to Python over a C++ STL deque<char>. Examples of methods are: IfEmpty(), PushHead(), PopHead(), PushTail(), etc.
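To make the interface concrete, here is a minimal pure-Python stand-in for DequeChar. It is only a sketch: the real class is C++ over deque<char>, and the class name DequeCharSketch, the argument shapes, and the IndexError on over-reading are assumptions. Only the method names come from this page. It models the interface with collections.deque from the standard library.

```python
from collections import deque
from itertools import islice


class DequeCharSketch:
    """Hypothetical Python model of the DequeChar interface (not the C++ code)."""

    def __init__(self, text=""):
        self._deq = deque(text)

    def IfEmpty(self):
        return len(self._deq) == 0

    def Size(self):
        return len(self._deq)

    def PushHead(self, chars):
        # extendleft() reverses its argument, so reverse first to keep order.
        self._deq.extendleft(reversed(chars))

    def PushTail(self, chars):
        self._deq.extend(chars)

    def PeekHead(self, count=1):
        """Return the first `count` chars without removing them."""
        if count > len(self._deq):
            raise IndexError("peeked past end of deque")  # assumed behavior
        return "".join(islice(self._deq, count))

    def PopHead(self, count=1):
        """Remove and return the first `count` chars."""
        if count > len(self._deq):
            raise IndexError("popped past end of deque")  # assumed behavior
        return "".join(self._deq.popleft() for _ in range(count))
```

With the deque holding "#ifdef DEBUG", PeekHead(6) returns "#ifdef" and leaves the contents untouched, while PopHead(6) returns the same text and leaves " DEBUG" behind.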

GreedyMap

GreedyDict and GreedyMap are the same thing; the two names follow Python and C++ naming conventions, respectively. GreedyDict is also the name of a C++ typedef for GreedyMap<PyObject*>.

GreedyMap was written because the lexer needs a special associative container that can match keys against a given pattern both greedily and partially. GreedyMap arose from the problem of lexing C keywords in a C source file. For example, the C keyword "#ifdef" should be greedily matched, not just "#if". Partial matching is needed at the same time. For example, the stored key "#ifdef" (not just "#if") must match the pattern "#ifdef DEBUG".

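The definitions of the two special matching behaviors of GreedyMap:

Greedy matching: when more than one stored key matches the pattern, the longest key wins.

Partial matching: a stored key matches if it equals the leftmost part of the pattern; the pattern may continue past the end of the key.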

As an example of both greedy and partial matching, assuming { "ab", "abc", "abcd" } are stored in a GreedyMap, the given pattern "abc123" will be matched with the key "abc". Why? First note that key "ab" matches the leftmost part of the pattern. But since we're greedy, matching continues to the key "abc" which also forms a partial match. But the "d" in the key "abcd" causes a mismatch with the pattern.
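The example above can be checked with a short pure-Python model of the two matching rules. This is a sketch under assumptions, not the GreedyMap implementation: the class name GreedyDictSketch, the plain dict storage, and raising KeyError on a miss are all hypothetical. Only Find() and item assignment are taken from the Python code on this page.

```python
class GreedyDictSketch:
    """Hypothetical model of greedy + partial key matching (not the C++ code)."""

    def __init__(self):
        self._items = {}

    def __setitem__(self, key, val):
        self._items[key] = val

    def Find(self, pattern):
        """Return (key, val) for the longest stored key that is a prefix of pattern.

        Raises KeyError when no stored key matches (assumed behavior).
        """
        # Greedy: try the longest candidate prefix first, then shrink.
        # Partial: only the leftmost part of the pattern has to equal the key.
        for end in range(len(pattern), 0, -1):
            candidate = pattern[:end]
            if candidate in self._items:
                return (candidate, self._items[candidate])
        raise KeyError(pattern)


if __name__ == "__main__":
    gd = GreedyDictSketch()
    for key in ("ab", "abc", "abcd"):
        gd[key] = key.upper()
    print(gd.Find("abc123"))
```

Shrinking the pattern from the right is the simplest way to express "longest matching prefix"; it costs one dictionary lookup per pattern character, which is acceptable when patterns are no longer than the longest keyword.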

The Lex object initializes greedyDictKeywords with C keywords. Later, when a source file is being loaded and lexed, LexKeyword() extracts a string from the deque holding the source file and calls GreedyDict to try to match it with a C keyword. Partial matching helps LexKeyword() because it simply extracts enough characters from the deque to match the longest C keyword, which is usually more characters than the keyword actually present. In the case of lexing "#ifdef DEBUG", the chars " DEBUG" are the excess.

class LexBase:
...
        # Used to match keywords using "greedy" searches.
        # Eg, to find "#ifdef" instead of just "#if".
        s.greedyDictKeywords = GreedyDict( s.keywordMax )
        for (key,val) in s.keywordDict.items():
            s.greedyDictKeywords[key] = val

    def LexKeyword( s, tokenPrev ):
        """ Lex a possible keyword.  Returns (token,keyword) or (None,None). """
        # Greedy match.
        try:
            try:
                text = s.deq.PeekHead( s.keywordMax )
            except:
                text = s.deq.PeekHead( s.deq.Size() )  # peeked past EOF
            (key,val) = s.greedyDictKeywords.Find( text )
            s.deq.PopHead( len(key) )
            return (val,key)
        except:
            return (None,None)
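The peek, match, and pop sequence can also be tried in isolation. The sketch below is a simplified, self-contained rendering of that flow and not the pypersrc code: it uses collections.deque in place of DequeChar, and the token names, the keyword table, and the helper names find_greedy and lex_keyword are hypothetical.

```python
from collections import deque
from itertools import islice

# Hypothetical keyword table; "#include" makes the longest keyword 8 chars.
KEYWORDS = {"#if": "TOKEN_IF", "#ifdef": "TOKEN_IFDEF", "#include": "TOKEN_INCLUDE"}
KEYWORD_MAX = max(len(key) for key in KEYWORDS)


def find_greedy(keywords, text):
    """Return the longest key that is a prefix of `text`, or None."""
    for end in range(len(text), 0, -1):
        if text[:end] in keywords:
            return text[:end]
    return None


def lex_keyword(deq):
    """Return (token, keyword) and consume the keyword, or (None, None)."""
    # Peek up to KEYWORD_MAX chars; islice simply stops early near EOF.
    text = "".join(islice(deq, KEYWORD_MAX))
    key = find_greedy(KEYWORDS, text)
    if key is None:
        return (None, None)
    for _ in range(len(key)):
        deq.popleft()  # pop only the matched chars; the excess stays queued
    return (KEYWORDS[key], key)
```

For a deque holding "#ifdef DEBUG", the peek yields "#ifdef D", the greedy match picks "#ifdef" over "#if", and only those six characters are popped, so " DEBUG" remains for the next lexing step.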

In the C++ code, GreedyMap is a class template (GreedyMap.hh) with methods for inserting, finding, and iterating by callback through key/value pairs. GreedyDict is just a typedef for GreedyMap<PyObject*> (GreedyDict is also the name of a Python class). GreedyDict.cc provides the C functions that interface the Python code with the GreedyMap object.


Last modified: Mon Feb 20 23:13:39 EST 2006