Automatic Memory Management in newLISP
Lutz Mueller, 2005-09-26 rev 12
During expression evaluation, newLISP, like any other interactive language system, constantly generates new memory objects as intermediate evaluation results and de-references existing memory objects through new assignments or changes to their contents. If this un-referenced memory were not deleted, then newLISP would run out of memory over time.
To understand newLISP's type of automatic memory management it is necessary to review the traditional methods employed by other languages.
Traditional automatic memory management
In most programming languages automatic memory management is realized by a process called Garbage Collection, in which allocated but unused memory is occasionally freed again. When memory is allocated it is registered in some form. Some other process, typically working asynchronously to the normal statement evaluation, investigates the allocated memory pool for unused parts, which can be recycled for future use. This process is typically triggered by some memory allocation limit or happens synchronously between steps of the normal evaluation process.
In traditional garbage collection schemes two types of algorithms are employed:
(1) The mark and sweep type registers each allocated memory piece. Once in a while a mark algorithm flags each memory piece in the allocated pool that is directly or indirectly referenced by a named object (a variable) in the system. The sweep phase then de-allocates (frees) all un-referenced memory.
(2) A reference counting scheme registers each allocated memory piece together with a reference count. This reference count gets incremented or decremented during expression evaluation. Whenever a reference count reaches zero this piece of memory can be freed.
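The two schemes can be sketched on a toy heap. This is only an illustration in Python, not code from any LISP system; the names Cell, mark_and_sweep, retain and release are invented for this sketch.

```python
class Cell:
    """A toy memory object that may reference other cells."""
    def __init__(self, name):
        self.name = name
        self.refs = []        # cells this cell points to
        self.marked = False   # flag used by mark and sweep
        self.count = 0        # counter used by reference counting


def mark(cell):
    """Mark phase: flag everything reachable from `cell`."""
    if not cell.marked:
        cell.marked = True
        for child in cell.refs:
            mark(child)


def mark_and_sweep(heap, roots):
    """Return the cells that survive; everything unmarked is swept."""
    for cell in heap:
        cell.marked = False
    for root in roots:
        mark(root)
    return [cell for cell in heap if cell.marked]


def retain(cell):
    """Reference counting: a new reference to `cell` was created."""
    cell.count += 1


def release(cell, freed):
    """Reference counting: a reference was dropped; free at zero."""
    cell.count -= 1
    if cell.count == 0:
        freed.append(cell.name)
        for child in cell.refs:
            release(child, freed)
        cell.refs = []


# Mark and sweep: a is a root and points to b; c is unreachable.
a, b, c = Cell("a"), Cell("b"), Cell("c")
a.refs.append(b)
print([x.name for x in mark_and_sweep([a, b, c], roots=[a])])  # ['a', 'b']

# Reference counting: one variable holds a, and a holds b.
retain(a)
retain(b)
freed = []
release(a, freed)   # the variable goes away
print(freed)        # ['a', 'b']
```

In the sketch, mark and sweep has to visit the whole heap to find the one dead cell, while reference counting frees cells at the moment the last reference is dropped but has to update a counter on every reference change.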
Over time many elaborate schemes based on those two principles, or on combinations of them, have been tried. The first algorithms were developed during the implementation of LISP; many of the more elaborate schemes were invented during the development of the Smalltalk language. The history of Smalltalk-80 is an exciting account of the challenges of memory management in an interactive programming language; see [Glenn Krasner, 1983. Smalltalk-80, Bits of History, Words of Advice]. A more recent overview of garbage collection methods can be found in [Richard Jones, Rafael Lins, 1996. Garbage Collection, Algorithms for Automatic Dynamic Memory Management].
One Reference Only (ORO) memory management
Memory management in newLISP is different from memory management in other dynamic languages and is based on a One Reference Only rule. Memory is never marked or reference counted; instead, the decision to delete a newly created memory object is taken right after it has been created.
Empirical studies of LISP have shown that most LISP cells are not shared but can be reclaimed immediately during the evaluation process. newLISP does this by pushing a reference to each created memory object onto a result stack. When a higher evaluation level is reached, these memory objects can be deleted. Note that this should not be confused with One-bit Reference Counting: no bits are set to mark objects as sticky or not. Except for some optimizations for primitives like set, define and eval, all evaluation results get pushed on the result stack for delayed deletion at the next higher evaluation level.
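The result stack idea can be modeled in a few lines. The following is a simplified Python sketch and not newLISP's actual evaluator: the names Cell, result_stack and evaluate, the tuple encoding of expressions, and the restriction to "+" and "*" are all assumptions made for the illustration.

```python
import math


class Cell:
    """Toy memory object; `live` counts cells not yet deleted."""
    live = 0

    def __init__(self, value):
        self.value = value
        Cell.live += 1

    def delete(self):
        Cell.live -= 1


result_stack = []


def evaluate(expr):
    """Evaluate a number or a tuple such as ("+", 1, ("*", 2, 3))."""
    if not isinstance(expr, tuple):
        cell = Cell(expr)
        result_stack.append(cell)      # every result goes on the stack
        return cell
    base = len(result_stack)           # remember this level's stack index
    op, *args = expr
    values = [evaluate(arg).value for arg in args]
    result = Cell(sum(values) if op == "+" else math.prod(values))
    # Back at the higher level: delete what the lower levels left behind.
    while len(result_stack) > base:
        result_stack.pop().delete()
    result_stack.append(result)        # deleted later by the caller's level
    return result


top = evaluate(("+", 1, ("*", 2, 3)))
print(top.value, Cell.live)    # prints "7 1"
result_stack.pop().delete()    # the top level cleans up the final result
print(Cell.live)               # prints "0"
```

No cell carries a mark bit or a counter in this model; the position on the result stack alone decides when a cell is deleted.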
Except for symbols and object (context) references, newLISP follows the One Reference Only (ORO) rule. This means that every memory object not referenced by a symbol or an object reference is obsolete at the next higher evaluation level. It also means that all objects in the system (except for symbols and contexts) are passed to user-defined functions by copying them rather than by referencing them. This is called Passing Parameters by Value, as opposed to Passing Parameters by Reference.
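The effect of value passing can be imitated in Python, which normally passes references, by copying each argument explicitly. The helper call_by_value and the function append_zero are hypothetical names for this sketch only.

```python
import copy


def call_by_value(func, *args):
    """Call `func` with deep copies of its arguments (models value passing)."""
    return func(*(copy.deepcopy(arg) for arg in args))


def append_zero(lst):
    """Modify the parameter; under value passing only the copy changes."""
    lst.append(0)
    return lst


data = [1, [2, 3]]
result = call_by_value(append_zero, data)
print(data)     # [1, [2, 3]]
print(result)   # [1, [2, 3], 0]
```

The caller's list is untouched because the function only ever saw a copy, so no memory object ends up referenced from two places.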
The ORO rule simplifies not only memory management but also other aspects of the language. Traditional LISP users have to deal with two types of equality: one for copied memory objects, the other for references. This distinction is not necessary in newLISP, where all objects are copied. But the rule also has disadvantages: LISP cells constantly have to be allocated and freed again. newLISP optimizes this process by allocating cell memory in bigger chunks from the host operating system. LISP cells then get requested from a free cell list and recycled into it. Very few CPU instructions (pointer assignments) are needed to unlink a free cell or re-insert a deleted cell.
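A free cell list fed by chunk allocation can be sketched as follows. This is a Python model of the idea, not newLISP's C source; CHUNK_SIZE, Cell and CellPool are hypothetical names, and the chunk size is kept tiny so the behavior is easy to follow.

```python
CHUNK_SIZE = 4  # assumed value for the demonstration only


class Cell:
    __slots__ = ("contents", "next")

    def __init__(self):
        self.contents = None
        self.next = None


class CellPool:
    def __init__(self):
        self.free_list = None        # head of the singly linked free list
        self.chunks_allocated = 0

    def _allocate_chunk(self):
        """Obtain CHUNK_SIZE cells at once and link them into the free list."""
        for _ in range(CHUNK_SIZE):
            cell = Cell()
            cell.next = self.free_list
            self.free_list = cell
        self.chunks_allocated += 1

    def get_cell(self, contents):
        """Unlink the first free cell; only pointer assignments are needed."""
        if self.free_list is None:
            self._allocate_chunk()
        cell = self.free_list
        self.free_list = cell.next
        cell.next = None
        cell.contents = contents
        return cell

    def delete_cell(self, cell):
        """Re-insert a deleted cell at the head of the free list."""
        cell.contents = None
        cell.next = self.free_list
        self.free_list = cell


pool = CellPool()
cells = [pool.get_cell(i) for i in range(CHUNK_SIZE)]
pool.delete_cell(cells[0])
recycled = pool.get_cell("x")
print(recycled is cells[0], pool.chunks_allocated)  # True 1
```

The deleted cell is handed out again without any new request to the operating system; a further chunk is allocated only when the free list is empty.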
The overall effect of this simplified memory management is a speed-up of the evaluation process compared to traditional LISPs and a smaller LISP implementation footprint. The higher frequency of cell creation / deletion in newLISP is more than compensated for by the missing garbage collection overhead. Only during error conditions is a simple mark and sweep algorithm employed to free un-referenced cells.
Performance considerations with value-passing
Passing parameters by value (memory copy) instead of by reference poses a potential disadvantage when dealing with large lists. For practical purposes the overhead needed to copy the list is small compared to the processing done on the list and can be neglected. To achieve maximum performance with very large lists, newLISP has a group of destructive functions, and the list can be enclosed in an object (context) and passed by context reference.
Several list-manipulating functions work on the list in a destructive manner and do not create a new list as a return value. For example, push can be used instead of cons when creating lists, and nth-set instead of set-nth when changing the contents of a large list. While cons and set-nth return a new memory object of the changed list, push, pop and nth-set change the existing list and return only a copy of the new / old element.
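The difference between the two families of functions can be shown with Python analogues. The functions cons_like and push_like are invented for this sketch; they only mimic the behavior described above and are not the newLISP primitives themselves.

```python
def cons_like(element, lst):
    """Non-destructive: build and return a new list; `lst` is untouched."""
    return [element] + lst


def push_like(element, lst):
    """Destructive: insert into `lst` itself and return only the element."""
    lst.insert(0, element)
    return element


big = [2, 3]
new_list = cons_like(1, big)
print(new_list, big)          # [1, 2, 3] [2, 3]

returned = push_like(1, big)
print(returned, big)          # 1 [1, 2, 3]
```

With a large list, the first form creates a second list of the same size on every call, while the second form changes the one existing list.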
Context objects in newLISP are passed by reference and are the best choice when passing big lists or string buffers by reference.
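Enclosing a big list in an object that is itself passed by reference can be modeled like this. The class Context here is a hypothetical Python stand-in for a newLISP context, and add_item is an invented helper.

```python
class Context:
    """Hypothetical stand-in for a context that encloses a big list."""

    def __init__(self, data):
        self.data = data


def add_item(ctx, item):
    """Receives only a reference to the context; the list is never copied."""
    ctx.data.append(item)
    return len(ctx.data)


buffer = Context(list(range(100000)))
enclosed = buffer.data
size = add_item(buffer, -1)
print(size)                        # 100001
print(buffer.data is enclosed)     # True
```

The function works on the same list the caller holds, so the cost of the call does not grow with the size of the list.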
In practice the overhead created by copying parameters is small compared to other processing, and it is more than compensated for by the faster performance of the ORO type of memory management in other areas.
Memory and datatypes in newLISP
String memory is allocated from and freed to the host OS directly whenever the cells referencing the strings are requested from or returned to the cell-memory chunks. This means that newLISP handles cell memory more efficiently than string memory. It is often a better approach to use symbols rather than strings for efficient processing. For example, when handling natural language it is far more efficient to handle natural language words as symbols in a separate name-space than as strings. The 'spam-filter' program in the newLISP source distribution is a good example of this. newLISP can handle millions of symbols without degrading performance. Programmers coming from other programming languages frequently overlook that symbols in LISP are for much more than variables or object references; they are a useful data type in themselves, which in many cases can replace strings.
Integer numbers and double floating point numbers are stored directly in newLISP's LISP cells and do not need a separate memory allocation / de-allocation.
Matrix functions in newLISP allocate memory for matrix space, then perform matrix operations like multiplication or inversion more efficiently on those matrices before converting them back to LISP cells and freeing the memory space used by the matrices. This greatly speeds up the processing of matrices.
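The data flow described here (copy into a dedicated matrix buffer, compute, convert back) can be sketched in Python. Nested lists stand in for LISP cells and a flat array of doubles for the matrix space; the function names are invented, and the sketch shows only the conversion steps, not the speed-up of a native implementation.

```python
from array import array


def to_buffer(rows):
    """Copy a list-of-lists matrix into one flat array of doubles."""
    n, m = len(rows), len(rows[0])
    return array("d", (x for row in rows for x in row)), n, m


def from_buffer(buf, n, m):
    """Convert the flat buffer back into nested lists."""
    return [list(buf[i * m:(i + 1) * m]) for i in range(n)]


def multiply(a_rows, b_rows):
    a, n, k = to_buffer(a_rows)
    b, k2, m = to_buffer(b_rows)
    if k != k2:
        raise ValueError("inner dimensions differ")
    c = array("d", [0.0]) * (n * m)
    for i in range(n):
        for j in range(m):
            c[i * m + j] = sum(a[i * k + p] * b[p * m + j] for p in range(k))
    # The buffers a, b and c are released once this function returns.
    return from_buffer(c, n, m)


print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```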
References
- Glenn Krasner, 1983. Smalltalk-80, Bits of History, Words of Advice
Addison-Wesley Publishing Company
- Richard Jones, Rafael Lins, 1996. Garbage Collection, Algorithms for Automatic Dynamic Memory Management
John Wiley & Sons
Copyright © 2004, Lutz Mueller http://newlisp.org . All rights reserved.