Issue 14

by Felix Winkelmann

2010-11-29 +0000

0. Introduction

Welcome to issue 14 of the Chicken Gazette, slapped together at the last minute by Felix.

1. The Hatching Farm

2. Core development

Some work has been done on the "fat-symbols" branch which adds a additional slot to symbol objects containing a precomputed hash value. This is required for something called "lazy gensyms", a clever trick (by R. Kent Dybvig, not by me, of course) to create names of "gensym" (uninterned generated symbols used heavily in syntax-expansion and the compiler) only on demand, that is, when they are printed or the name is explicitly requested. Everything seems to work but the performance improvement is disappointing - it seems to be that hashing symbol-names has neglible cost and that the additional memory requirement for the extra slot possibly negates any improvements gained by the reduced symbol-name generation. Needs more benchmarking.

The change-request for umask (umask support) has been discussed a little and will (hopefully) be voted upon soon. We waited long enough for this ground-breaking feature!

3. Chicken Talk

Alan Post is putting final touches on his genturfahi packrat parser and asked about the most efficient way to store character offsets into an utf8 string. Various people responded to this, posting ideas or even substantial code examples (like this by the one and only Joerg Wittenberger). Felix also took part in this discussion but quickly became confused, forgetting the original problem - well, that's how we know him, eh, me. Boy, this is confusing... Anyway, this is still an open issue, since working with arbitrary sub-sections of strings without requiring a copy of the subsequence is a reappearing problem.

Alan also asked whether an extension can install both a dynamically loadable library and an executable with the same name. Well it turns out, you can!

Unfortunately the large bodies of code generated by the parser results in excessive compile-times, as reported by Alan. This is really critical, and he works heavily on changing the generated code patterns to reduce the amount of code. Wether this is just inherent in the code-generation style or whether chicken just does a bad job at optimizing this still has to be resolved. As Alan kindly pointed out, the fellas on #chicken did their best in suggesting improvements.

David Dreisigmeyer reported problems using Paredit with the cluck or quack emacs extensions - it's unclear what can be done here. Perhaps a reader can suggest something? If yes, drop into #chicken or write on the mailing list.

Hugo Arregui posted code for defining "memoizing" procedures and many people suggested improvements and gave comments.

And last but not least Daishi Kato asked for help with a shortcoming in the csv egg, which Ivan quickly fixed.

4. Omelette Recipes

This week I'm going talk about the fast-generic extension, a fascinating little tool for generating efficient dispatching code for generic functions. The egg is a port of Thant Tessman's generic code, written originally for Chez Scheme. It defines two macros for defining types and declaring generic functions over these types. This is best described by an example:

(define-type <list> list?)
(define-type <vector> vector?)

(define-generic (element-ref (<list> lst) i) (list-ref lst i))
(define-generic (element-ref (<vector> vec) i) (vector-ref vec i))

(element-ref '(a b c) 1)         ==> b
(element-ref '#(99 100 101) 2)   ==> 101

define-type defines a named type that is characterized by a predicate and define-generic creates a function definition that can be specialized for certain argument types.

Types can have subtype-relationship by adding a third parameter to define-type:

(define-type <vector> vector?)

(define (odd-sized-vector? x)
  (and (vector? x) (odd? (vector-length x))))

(define-type <odd-sized-vector> odd-sized-vector? <vector>)

(define-generic (pairwise (<vector> v))
  (chop (vector->list v) 2))

(define-generic (pairwise (<odd-sized-vector> v))
  (error "I'm sorry, Dave, but I can't do that."))

The interesting part is when we look at the implementation of this. Expanding the generic function definition above results in

  (##core#begin
    ;; specialized code
    (define (pairwise<<odd-sized-vector>> v)
      (error "I'm sorry, Dave, but I can't do that."))
    ;; dispatching procedure
    (define (pairwise v)
      (let* ((g155 '##fast-generic#unset) (g156 g155))
	(or (and (odd-sized-vector? v)
		 (begin (set! g156 (pairwise<<odd-sized-vector>> v)) #t))
	    (and (#%vector? v) (begin (set! g156 (pairwise<<vector>> v)) #t)))
	(if (#%eq? g155 g156)
	  (fast-generic#generic-error
	    'pairwise
	    (#%list v)
	    '((pairwise<<vector>> <vector>)
	      (pairwise<<odd-sized-vector>> <odd-sized-vector>)))
	  g156)))
    (define-compiler-syntax
      pairwise
      (lambda (x2 r2 c2)
	(if (#%eq? 1 (#%length (#%cdr x2)))
	  (let* ((arg-list (#%cdr x2))
		 (result (#%gensym))
		 (unset (#%gensym))
		 (args (#%map (lambda _ (#%gensym)) arg-list))
		 (method-body
		   (fast-generic-compile-time#build-or-clause
		     (#%get 'pairwise 'generic-trees)
		     args
		     args
		     result
		     r2)))
	    (#%list
	      (r2 'let*)
	      (#%cons
		(#%list
		  unset
		  (#%list
		    (r2 'cons)
		    (#%list (r2 'quote) 'unset)
		    (#%list (r2 'quote) '())))
		(#%cons (#%list result unset) (#%map #%list args arg-list)))
	      method-body
	      (#%list
		(r2 'if)
		(#%list (r2 'eq?) result unset)
		(#%list
		  (r2 'generic-error)
		  (#%list (r2 'quote) 'pairwise)
		  (#%cons (r2 'list) args)
		  (#%list
		    (r2 'quote)
		    '((pairwise<<vector>> <vector>)
		      (pairwise<<odd-sized-vector>> <odd-sized-vector>))))
		result)))
	  x2))))

Disregarding all the internal decorations, we see that this defines two procedures (a specialized version for this particular argument type and a dispatching procedure that performs the type checks and calls the suitable specialized procedure) and an ugly "compiler macro", which will replace direct calls to the generic with the dispatching code in-line. Interpreted code will ignore this and call the dispatching procedure, but compiled code will perform the dispatching at the call site, which gives the optimizer a lot of opportunities to reduce the checks or even eliminate them completely for literal arguments.

Subsequent definitions of generic procedures will overwrite the dispatching procedure and redefine the compiler-syntax and thus create more differentiated dispatching code with each new generic.

This makes generic functions and the required dispatching very fast. Another nice property is that inheritance is defined "conceptually", not necessarily "structurally". If you need the latter, something like coops is probably more suitable.

There are some caveats, though (as usual): the type-predicates have to be chosen with care and should be disjoint or funny things may happen. Also, excessive use of generic procedures can result in code-bloat. Only direct calls will be optimized and only if they appear textually after the definition of the generic. Non-direct calls will still refer to the dispatcher, so they will work. Since the types and generics are registered at compile-time, generic-procedures can be exported but dispatching code can not be created outside of the current compilation unit.

Still, for internal code that needs to dispatch this is very handy and has been successfully used in the sequences extension to define generic operations on various sequence-like data types.

For more information about this, check out Thant Tessman's paper on the original implementation, which can be found here.

5. About the Chicken Gazette

The Gazette is produced weekly by a volunteer from the Chicken community. The latest issue can be found at http://gazette.call-cc.org or you can follow it in your feed reader at http://gazette.call-cc.org/feed.atom. If you'd like to write an issue, check out the instructions and come and find us in #chicken on Freenode!

The chicken image used in the logo is kindly provided and © 2010 by Manfred Wischner