Issue 15

by Alaric Snell-Pym and Moritz Heidkamp

2010-12-07 +0000

0. Introduction

Welcome to issue 15 of the Chicken Gazette.

1. The Hatching Farm

In his everlasting quest to provide the best documentation tooling of all, Jim Ursetto created yet another chicken-doc spin-off. Luckily it is not called yacdso. Please welcome manual-labor!
Ivan Raikov went on and fixed an edge case problem in the npdiff egg, leading to the release of version 1.14. While he was at it he also released version 1.11 of format-textdiff fixing some bugs in the context diff handler.
Peter Lane updated his dataset-utils and classifiers eggs. If machine learning or data mining is your thing you might want to check those out!
Felix Winkelmann released version 0.2 of the handy progress-indicators extension.
Lastly, our daring Peter Bex wrestled with openssl once again. The new release 1.5 should hopefully fix the remaining issues we had with the Chicken Wiki.

2. Core development

Chicken's master branch has seen two bug fix commits this week. One of them fixed a problem with thread-sleep! which failed when it was passed an inexact/flonum sleep-time. Thanks to Karel Miklav for reporting this! The other one fixed a bug which caused all kinds of trouble when cons got redefined. This was noticed by David Steiner when he tried following the code examples in SICP using Chicken.

The experimental branch has been busy experimenting: In the aftermath of the jbogenturfa'i debugging session (see below), Felix decided to remove the lambda-lifter entirely since he figured it was rather ineffective at lifting lambdas. The scrutinizer on the other hand got some dents flattened and is here to stay. Finally, the aforementioned manual-labor is the official (experimental) manual generator now!

3. Chicken Talk

Readers of issue 11 may remember the T-DOSE picture gallery. Be informed that it now contains a few additional pictures, some of which are quite okay!

The chicken-users list was dominated by Alan Post's ongoing effort to create a Lojban parser this week, spanning across two threads. In the first of them he details his progress on optimizing the code's compilation while in the second he reports on problems he encountered when running it for the first time. As it turned out, they were caused by using equal? on a list containing circular data resulting in an endless recursion and finally in a stack overflow. A change request is imminent to ameliorate this situation.

Christian Kellermann kindly announced the all new Gazette authorship organisation mode. Check it out if you are interested in contributing or helping with a future Gazette issue!

4. Omelette Recipes

Today we're going to look at the amb egg. This egg implements the amb operator, much-loved as an example of the use of continuations to alter control flow in exciting ways, unusual evaluation orders, and a mind-altering image of a world where computers exploit parallel universes or quantum mechanics to explore multiple branches of a recursion.

However, amb is sometimes a useful tool; there's been a few cases where I've wished it was available in projects I've done in other languages, and I'm going to simplify one of them into an example.

But first, let's look at what amb does (and a little about how it works). Basically, amb is a form (it can be a macro or a procedure, and the difference in effect is immaterial for our purposes in this recipe) that has a variable number of arguments and, in principle, returns them all in separate "threads"; it can be thought of as a bit like POSIX's fork(), except lightweight. However, the intent is not to exploit parallelism (most implementations of amb do not provide any kind of pre-emption; each "thread" runs until it terminates, then lets another run), but to explore different possible control flows. As such, there is an amb-assert form that, if its argument is #f, kills the current "thread" and runs another.

So every time your program performs an amb, multiple threads pop into existence; and whenever it performs an amb-assert, some threads are culled. The principle is, whenever you have a point in your program where a choice must be made, you can use amb to have the program split up and explore every possible choice; and when it turns out, further down the line, to have been the wrong choice, you can kill it, freeing the CPU to explore another choice.

This is usually demonstrated with a program that solves a logic puzzle of the form "Peter, Jim, Moritz and Felix are hacking on the Chicken core. Peter is working only on source files written in Scheme. Moritz works only on posix.scm, whoever works on the expander also works on the compiler, you can't work on the scheduler without also working on csi, ...and so on... So, who introduced bug #232?". Which is all well and good for an undergraduate programming exercise, but who encounters such problems in real life?

Well, the general class of problem is a "tree search problem". The idea is you have some situation that can be modelled as a tree of possibilities (potentially infinite!), with solutions dotted around the tree; and our goal is to find perhaps all the solutions, perhaps just any solution, or perhaps the solution nearest to the root of the tree. Practical examples include finding a record in a B-Tree (where the tree structure actually corresponds to physical index blocks on disk), solving logic puzzles (where the tree structure is a purely logical tree of possible situations, some of which may yield solutions), or choosing the best move to make in a game (where the tree represents possible states of the game, and the moves that can move between states).

These kinds of algorithms are normally written as recursive functions, which take whatever represents a position on the tree as an argument, check to see if there's a solution there (and handle it appropriately if so, which may involve terminating the search if we've found enough solutions), then work out all the positions below this one in the tree and recursively call the function on each in turn. This forces a "depth-first" search, as the recursion will only bottom out if a solution is found (that terminates the search entirely) or a subtree is totally exhausted of options. Going for a "breadth-first" search, where every child of the current position is checked for solutions before starting to recurse down into the children of the children, is a little harder to implement; rather than just writing a recursive function that explores the tree, one must keep track of a queue of subtrees that will need exploring in future, pushing newly-found subtrees onto the back of the queue rather than exploring them as soon as they're found. However, breadth-first searches will find the solutions nearest to the top of the tree first, and will not flounder if they encounter infinite subtrees with no solutions in, so they are often more attractive than depth-first.

However, rather than writing a depth-first recursive search function, or a breadth-first search function using a queue, you can also implement these search problems as a simple function using amb. One benefit of this is that one can swap the amb implementation used from depth-first to breadth-first to choose the best search strategy, without changing your program - but that's a side benefit. The real benefit is, of course, clearer, more maintainable code - which makes the amb egg worth its memory usage in leaked diplomatic cables!

Let's look at a real example from my sordid past. I once wrote an app (in C, alas) to help people navigate the UK's complex welfare system, which basically lets people apply to receive money back from the government if they meet certain criteria. There's a lot of different benefits, each of which with different eligibility rules, and complex rules to work out how much benefit you're entitled to based on your circumstances. It's a nightmare, and exactly the sort of thing computers are good at. So I wrote a program to work it out. Now, working out eligibility for a benefit, and how much can be claimed, involves asking the user a lot of questions, which is tiresome. Many of these questions are shared between different benefits (or different branches of the complex flowchart you have to follow to work out some of the hairier benefits), and also, many of these questions need only be asked if certain paths are explored (questions about your child need only be asked if you're looking into benefits that are meant to help with the costs of raising children - which are only worth exploring at all if you actually have children; and the ones relating to pregnancy, well, there's no point in asking about any of them if the answer to "What is your biological gender?" was "Male".) We want to ask the minimum number of questions we can - so we shouldn't ask the same question twice, and we shouldn't ask questions unless we need the answers. The first problem can be solved by keeping a database of answers, and only asking the question if the desired information isn't already in the database; the second problem has to be solved by organising the control flow of the process to ask questions that can eliminate entire branches of enquiry up-front. Which implies a tree search!

As it happens, in C, I wrote a horrible nest of tangled-looking code that basically followed all the rules to work out what benefits you were entitled to. It worked, but it was hard to maintain - and the benefits rules change frequently. Sometimes the order of questions being asked or calculations being performed mattered (you needed to ask things up front to eliminate possibilities) and sometimes it was irrelevant, as I had to put them in some order, and only my comments in the code made it clear which was which. A big part of the problem was that I had to arrange the computation around the asking of the questions; this was fine when we could ask a single question that would choose the path to take (if the client is male, don't ask about pregnancy), but sometimes we had to work out several benefits that were available, working out each case in turn (which was complex, as claiming some benefits altered your eligibility for others) to see which one would work out best for the client - rejecting any they turned out to not be eligible for. The resulting code was messy indeed.

But enough prose - let's get to an example. With some code!

I can't remember the details of the UK welfare system as of the late 1990s, and it was incredibly tedious anyway. So we'll work on a fictitious over-simplified example, to get to the heart of the matter easily.

Let's say we have these benefits available:

Low Income Credit, which is payable to people who earn less than £10,000 a year, and whose partner (if the have one) earns less than £15,000 a year. You get given £(30,000 - total income of person/couple) / 10 per year, split into monthly payments.
Housing Benefit, which is payable to people or couples who earn less than £20,000 a year between them, or £15,000 if they have children, and who live in rented accomodation, and are not claiming Low Income Credit. You get £2,500 a year, in monthly payments, if you have children or earn less than £15,000; £2,000 a year if you do not have children and earn more than £15,000.
Carer's Allowance, which is payable to people whose partners are so ill that they need help with basic household tasks. The partner must not be earning any income for this to be claimed. It's a flat £1,000 a year. However, being an "allowance" rather than a "credit" or a "benefit", it counts as income so may affect other benefits.
Family Credit, which is available to the parent of a child (as long as the other parent is not also claiming it for the same child). It's a flat £1,000 a year, unless you (and your partner, if you have one) together earn more than £30,000 a year.

Now, on to the code. To avoid asking the same question more than once, we can keep a global alist of asked questions:

 (use srfi-1)

 (define *asked-questions* '())

 (define (ask question)
   (let ((existing (assoc question *asked-questions*)))
     (if existing
	 (cdr existing)
	 (begin
	   (write question) (newline)
	   (let ((answer (read-line)))
	     (set! *asked-questions*
		   (cons (cons question answer)
			 *asked-questions*))
	     answer)))))

As we use an actual global variable, this state will be preserved even if we backtrack with amb.

We can wrap that in a few functions that ask useful questions:

 (define (income allowances)
   (+ allowances (string->number
      (ask "What is your income, not including any allowances? Enter 0 for none"))))

 (define (has-partner?)
   (string=? (ask "Do you have a partner?") "yes"))

 (define (partner-is-ill?)
   (if (has-partner?)
       (string=? (ask "Does your partner need help around the house?") "yes")
       #f))

 (define (renting?)
   (string=? (ask "Do you rent your home?") "yes"))

 (define (partner-income)
   (if (has-partner?)
       (string->number
	(ask "What is your partner's income, including any allowances? Enter 0 for none"))
       0))

 (define (family-income allowances)
   (+ (income allowances) (partner-income)))

 (define (num-children)
   (string->number (ask "How many children do you have?")))

Now, clearly, "effective income" is the actual earnings of the person or couple, plus any "allowances" they receive, and what other benefits are being claimed may affect the computation of a benefit. So we will phrase our functions to work out the benefits as having an argument which is a record giving details of what else they're claiming. We can then write our basic benefit calculators as fairly direct translations of the rules, using amb-assert from the amb egg to reject any invalid benefits:

 (use amb)

 (define-record benefits
   claimed allowances)

 (define (claim-benefit existing-benefits benefit amount allowance?)
   (make-benefits
    (cons (cons benefit amount)
	  (benefits-claimed existing-benefits))
    (if allowance?
	(+ amount (benefits-allowances existing-benefits))
	(benefits-allowances existing-benefits))))

 (define (claiming? existing-benefits benefit)
   (assq benefit (benefits-claimed existing-benefits)))

 (define (compute-lic other-benefits)
   (amb-assert (< (income (benefits-allowances other-benefits)) 10000))
   (amb-assert (not (claiming? other-benefits 'hb)))
   (if (has-partner?)
       (amb-assert (< (partner-income) 15000)))
   (claim-benefit
    other-benefits
    'lic (/ (- 30000 (family-income (benefits-allowances other-benefits))) 10) #f))

 (define (compute-hb other-benefits)
   (if (zero? (num-children))
       (amb-assert (< (family-income (benefits-allowances other-benefits)) 20000))
       (amb-assert (< (family-income (benefits-allowances other-benefits)) 25000)))
   (amb-assert (renting?))
   (amb-assert (not (claiming? other-benefits 'lic)))
   (claim-benefit
    other-benefits
    'hb (if (and (zero? (num-children)) (> (family-income) 15000))
	    2000
	    2500) #f))

 (define (compute-ca other-benefits)
   (amb-assert (zero? (partner-income)))
   (amb-assert (partner-is-ill?))
   (claim-benefit
    other-benefits
    'ca 1000 #t))

 (define (compute-fc other-benefits)
   (amb-assert (not (zero? (num-children))))
   (amb-assert (< (family-income (benefits-allowances other-benefits)) 30000))
   (claim-benefit
    other-benefits
    'fc 1000 #f))

Having done that, we can try out all the possibilities using amb to decide whether we try to claim each benefit or not:

 (define (compute-benefits)
   (let* ((initial-state (make-benefits '() 0))
	  ;; Compute allowances
	  (claim-ca (amb #t #f))
	  (after-ca (if claim-ca (compute-ca initial-state) initial-state))
	  ;; Compute other benefits
	  (claim-fc (amb #t #f))
	  (after-fc (if claim-fc (compute-fc after-ca) after-ca))
	  (claim-hb (amb #t #f))
	  (after-hb (if claim-hb (compute-hb after-fc) after-fc))
	  (claim-lic (amb #t #f))
	  (after-lic (if claim-lic (compute-lic after-hb) after-hb)))
     after-lic))

What does compute-benefits actually do? For each benefit, it "splits the world" using amb into two versions - one where we try and claim this benefit, and one where we don't. We compute Carer's Allowance first, as it is an allowance which might affect later calculations; we can't compute any income-dependent benefits until we've decided if we're claiming this one or not, so it has to go first. The order of the others was picked arbitrarily.

compute-benefits then returns the final state of affairs as a benefits record, but it will return several times with different possible final states. We need to find them all, and pick the one with the highest benefits earned. The way to do that is with amb-collect, which takes an expression and makes a list of all the results that expression returns in different threads:

 (define (total-benefits benefits)
   (fold (lambda (claim sum-so-far)
	   (+ (cdr claim) sum-so-far))
	 0
	 (benefits-claimed benefits)))

 (define (best-benefits-to-claim)
   (let ((options
	  (sort
	   (amb-collect (compute-benefits))
	   (lambda (a b)
	     (< (total-benefits b)
		(total-benefits a))))))
     (display "Your best option is to claim the following: ")
     (write (benefits-claimed (car options)))
     (newline)))

Running (best-benefits-to-claim) will ask you some questions (if it's the first time you've run it, at any rate) and then suggest how much to claim of which benefits:

#;22> (best-benefits-to-claim)
"Do you have a partner?"
#;22> no
"How many children do you have?"
#;22> 2
"What is your income, not including any allowances? Enter 0 for none"
#;22> 15000
"Do you rent your home?"
#;22> yes
Your best option is to claim the following: ((hb . 2500) (fc . 1000))

You can try again if you clear the *asked-questions* store:

#;25> (set! *asked-questions* '())
#;26> (best-benefits-to-claim)
"Do you have a partner?"
#;26> yes
"What is your partner's income, including any allowances? Enter 0 for none"
#;26> 0
"Does your partner need help around the house?"
#;26> yes
"How many children do you have?"
#;26> 2
"What is your income, not including any allowances? Enter 0 for none"
#;26> 14500
"Do you rent your home?"
#;26> yes
Your best option is to claim the following: ((hb . 2500) (fc . 1000) (ca . 1000))

The resulting code is highly extensible. The entire complexities of choosing which benefits to go for are reduced to listing the requirements of each benefit inside its definition, using amb-asserts, then stacking up the benefits in the let* inside compute-benefits. If compute-benefits became much more complex, I'd be inclined to put the benefits functions into two lists (one for allowances, one for the rest, to ensure allowances are covered first) and iterate through them.

This example is a bit simplified and contrived - imagine each benefit has tens to hundreds of computation steps, many of which involve asking a question, and many of which depend on the results of previous calculations or assumptions; and imagine there are fifteen different benefits of varying complexity levels. And then imagine that the rules for each benefit change periodically, so you need to minimize the amount of duplication of information in your formulation of the rules.

Aren't you glad to be a smug Lisp weenie?

5. About the Chicken Gazette

The Gazette is produced weekly by a volunteer from the Chicken community. The latest issue can be found at http://gazette.call-cc.org or you can follow it in your feed reader at http://gazette.call-cc.org/feed.atom. If you'd like to write an issue, consult the wiki for the schedule and instructions!