Issue 13

by Peter Bex

2010-11-22 +0000

0. Introduction

Welcome to issue 13 of the Chicken Gazette!

1. The Hatching Farm

2. Core development

The following things have been changed in Chicken core by Felix:

A workaround was added to the scheduler for a problem with threads that accidentally get blocked on a filedescriptor but are not marked as blocked for I/O, which may help fix a few race conditions.

The ,g debugging toplevel command was fixed and some more feedback was added to it so it shows the variable being fetched.

The scrutinizer was updated to give a warning when a one-armed if is used in tail-position, as suggested on chicken-users by Jörg Wittenberg.

Also suggested by Jörg Wittenberg, make-mutex was changed so it's shorter and doesn't store the thread in the mutex.

The definition of ##sys#halt was modified to accept an error message argument, to make it match the call sites.

As suggested by Moritz Heidkamp, an SRFI-0 feature "irregex-is-core-unit" was added to the new Chicken to make it easier for egg authors to check whether it's running under an older or a newer Chicken.

A new equal=? procedure was added, which works like equal? except that it ignores exactness when comparing numbers.

The Chicken team has been discussing a feature request by Mario Goulart for adding umask support to Chicken's posix unit. This will be the first feature to go through the new Change Request procedure outlined by Felix. Anyone who would like to take part in shaping Chicken's evolution is free to add her or his insights about this functionality to the ticket or on the corresponding thread on the chicken-hackers mailinglist!

3. Chicken Talk

Just before the previous gazette was issued, Nicolas Pelletier posted a follow-up to an old thread of his about an overflow problem with the current-time procedure. He reported that he had tested the changes Felix made in the "flonum-milliseconds" branch, and they were working perfectly for him.

The latter half of the week saw a flurry of messages on the list:

On Wednesday, Jörg Wittenberg posted some ideas about how to improve Chicken's debugging information so that it would be able to detect a difficult problem he had encountered (but already solved). Felix responded that he agrees the scrutinizer could be smarter about issuing warnings and promptly added a change to the experimental branch which implements such a warning.

Later it also turned out that Jörg had been using the -local option in an attempt to optimize his code, but didn't fully foresee the consequences of that. The new version of Chicken does more aggressive local optimizations, which broke some of his assumptions because it now inlines certain procedures. The lesson: specific optimizations like -local are tricky things which change the semantics of Scheme, and hence they can break code later on when they're improved, so don't use them unless you're absolutely sure you need them!

Jörg then quickly followed with another post about "the hardship of upgrading Chicken from 4.6.1 to 4.6.3" in which he points out some possible regressions in the latter version, and he also mentions that srfi-18's make-mutex procedure could use some improvements. Again, Felix responded that he's lost his taste for square brackets (cue collective sigh of relief) and that he will integrate those improvements.

On Thursday, Felix posted a message to announce his "sequences" egg which adds generic operations on various "sequence" types like lists, strings etcetera, inspired by Common Lisp's sequences. He said he'd like to get some feedback on this API, which he got -- in spades! Alan Post, Thomas Chust, John Cowan, Hans Nowak and Jörg Wittenberg all responded to his request for comments, asking for the addition of several new procedures as well as sequence comprehensions, possibly based on SRFI-42. There was also a short discussion about the naming conventions in the sequence API.

On Friday, Daishi Kato posted a bugreport for the format-textdiff egg which Ivan Raikov dealt with swiftly.

Jeronimo Pellegrini also posted about a problem with threads which he ran into while working on an implementation of thread pools for didactic purposes. He also mentioned he found a bug in the error handling of the deadlock detection code. Jörg chimed in to tell him that he also noticed that Chicken's behaviour of how thread port buffering of newlines had changed recently.

On Saturday another new thread (!) was started by Alan Post in which he reported a bug in Chicken's keyword argument handling. He created a ticket in Trac to help track this bug, but with the help of Alex and Felix he found out it was not a bug in Chicken but in his own code; string->symbol does not produce keyword objects even when the string ends with a colon. After he changed his code to use string->keyword everything worked as it should. Keywords can be confusing things: they're not quite the same as symbols because they're self-evaluating, yet symbol? returns #t.

4. Omelette Recipes

With this week's omelette recipe we'll dip our toes in some of the more advanced uses of the insanely great PostgreSQL database. With the postgresql egg I've been trying to unlock some advanced features through a high-level API, so it becomes a breeze to use them!

But let's start simple. First, we create a database and matching user account so we can mess around with impunity:

 $ psql -U postgres  # or psql -U pgsql postgres (depending on OS/package)
 psql (8.4.4)
 Type "help" for help.
 
 postgres=# CREATE USER gazette WITH PASSWORD 'whatever';
 CREATE ROLE
 postgres=# CREATE DATABASE gazette WITH OWNER gazette;
 CREATE DATABASE

If the server is running, but the connection cannot be established, you might need to configure pg_hba.conf to allow connections. Please see the pg_hba.conf section of the Postgres manual for more info. The easiest is to set "trust" for "local" connections, assuming you can secure access to the unix domain socket file.

If you ever need a reminder about SQL syntax, you can type "\h CREATE USER" or even just "\h CREATE".

Now, let's fire up Chicken and connect to the database:

(use postgresql)
(define con (connect '((dbname . "gazette")
                       (user . "gazette")
		       (password . "whatever"))))

The alist you pass to the connect procedure can contain all the options described in the list of PQconnectdbParams parameter keywords. You can also pass a connection string as accepted by PQconnectdb, which is just a space-separated string that looks like this: "dbname=gazette user=gazette password=whatever"

Let us assume we're going to store daily average temperature measurements as part of a weather service. Assuming we have several stations across the nation that measure the temperature and we don't want to mess around with datetime types (as those are somewhat problematic in Chicken in my opinion), we could set up a table like this:

(query con (conc "CREATE TABLE weather "
                 "( year integer, "
                 "  month integer, "
		 "  day integer, "
		 "  avg_temps float[], "
                 "  PRIMARY KEY (year, month, day))"))

Disclaimer: Note that the datatypes we're going to define are probably not the best way to deal with this dataset. It's intended as just a simple demonstration. Good database design is still an art, and what's good design depends on how you're going to use your data! Rule of thumb: think of the queries you're going to do most often and optimise for those. It's just like choosing datatypes or algorithms in a programming language...

The temperatures column contains arrays of floating point numbers. This maps to a vector in Chicken:

;; Store temperatures at five measuring stations on 22 November
(query con (conc "INSERT INTO weather (year, month, day, avg_temps) "
                 "VALUES ($1, $2, $3, $4)")
           2010 11 22 (vector 6 7.5 5.2 4.6 5.4))

The dollar signs are placeholders, on which the positional value corresponding to the number after the dollar gets inserted. These values are the extra arguments to query following the query string. The library is smart enough to know how to convert vectors to Postgres native arrays. By the way, always use placeholders like this. Never ever splice values directly into the query string! You'd risk a visit from little Bobby Tables...

When we request our data back into Scheme, we get a vector again:

(value-at (query con (conc "SELECT avg_temps "
                           "FROM weather "
                           "WHERE year = $1 AND month = $2 AND day = $3")
                     2010 11 22))
=>
#(6 7.5 5.2 4.6 5.4)

value-at can return any value located at any row or column from a result set matrix, but by default, it'll pick the first column of the first row.

Now let's suppose that we want to store not only the average temperature, but also the amount of rain in millimeters that fell that day. We could just add another column, but let's say we prefer to keep the data provided by each measuring station together. In order to do so, we'll create a custom "composite type":

(query con "CREATE type weatherdata AS ( rainfall float, avg_temp float )")

And now we'll need to convert our existing data to make use of this new type. For safety, we'll wrap it in a transaction so if we make a mistake we won't end up with a half-converted or corrupted table. If you don't grok the UPDATE query, just skip it and assume it converts properly from the old to the new representation, using NULL values for the rainfall of the converted records.

(with-transaction con
  (lambda ()
    (query con "ALTER TABLE weather ADD COLUMN data weatherdata[]")
    (query con (conc "UPDATE weather w "
                     ;; Set the data to a generated row array:
                     "SET data=arrays.r "
		     "FROM (SELECT array_agg(ROW(NULL, t[i])::weatherdata) AS r, "
		     "             flat.year, flat.month, flat.day "

		     ;; This next SELECT returns the valid indices for each
		     ;; array and that array, so you'd get something like this:
		     ;;
		     ;;  1 | {a, b}    | year | month | day
		     ;;  2 | {a, b}    | year | month | day
		     ;;  1 | {c, d, e} | year | month | day
		     ;;  2 | {c, d, e} | year | month | day
		     ;;  3 | {c, d, e} | year | month | day
		     ;; ... etc ...

		     "      FROM (SELECT generate_subscripts(avg_temps, 1) i, "
		     "                   avg_temps t, year, month, day "
                     "            FROM weather "
                     "            ORDER BY i) AS flat "
                     "      GROUP BY flat.year, flat.month, flat.day) AS arrays "
		     
		     ;; Match up the data belonging to the row being updated
                     "WHERE arrays.year = w.year "
                     "  AND arrays.month = w.month "
                     "  AND arrays.day = w.day"))
    (query con "ALTER TABLE weather DROP COLUMN avg_temps")))

It's a little convoluted, but that's one of the disadvantages of working with arrays; all SQL operations are set-based so you will need to jump through some hoops to convert back and forth.

The with-transaction protects us from stupid mistakes. If an exception is thrown inside, it will automatically roll back the transaction, and since Postgres supports transactions almost everywhere, the alter table and update statements will be undone. Anyone else using the table in the meanwhile will not even notice anything was going on with it.

Anyway, we can now add some new values to the database for the next day, and retrieve what we have so far:

(query con (conc "INSERT INTO weather (year, month, day, data)"
                 "VALUES ($1, $2, $3, $4)")
           2010 11 23 '#((11 4.2) (8.3 5.8) (6.0 7.1) (7.3 6.2) (12 4)))
(row-fold-right cons '() (query con (conc "SELECT year, month, day, data "
                                          "FROM weather ORDER BY day")))
=>
((2010 11 22 #((#<sql-null-type> 6)
               (#<sql-null-type> 7.5)
               (#<sql-null-type> 5.2)
               (#<sql-null-type> 4.6)
               (#<sql-null-type> 5.4)))
 (2010 11 23 #((4.2 11) (5.8 8.3) (7.1 6) (6.2 7.3) (4 12))))

Row-fold-right simply performs a right fold of the supplied procedure over the resulting rows, with the column values provided in a list, in the order of the SELECT list's fields.

Now, the composite values from our weatherdata type are represented as ordinary lists. The position in the list corresponds to the position in the "ROW" type's constructor syntax, which itself corresponds to the ordering of the types in the CREATE TYPE statement we entered earlier. Of course this is not the most convenient way to deal with types, so let's improve the situation by creating an abstract type in Scheme to match the one in SQL, using the defstruct egg:

(use defstruct)
(defstruct weather-data rainfall average-temperature)

;; For debugging purposes:
(define-record-printer (weather-data w out)
  (fprintf out "{rain: ~A temp: ~A}"
               (weather-data-rainfall w)
               (weather-data-average-temperature w)))

This struct somehow needs to be serialized to an SQL row value (you can look up the exact expected syntax in the Composite Value Input section of the Postgres manual):

(define (weather-data-unparser conn w)
  (sprintf "(~A, ~A)"
           (weather-data-rainfall w)
           (weather-data-average-temperature w)))

In case of more complex types, you can also re-use the list-unparser procedure, which simply unparses all elements in a list. This is highly recommended when you are unparsing string values, since those may contain characters which have special meaning in the ROW syntax. That would look like this:

(define (weather-data-unparser conn w)
  (list-unparser conn (list (weather-data-rainfall w)
                            (weather-data-average-temperature w))))

While we're at it, let's also write a weather-data parser which can extract a weather data object from a string describing a ROW returned by the database:

(define weather-data-parser
  (let ((constituent-parser (make-composite-parser
                              (list numeric-parser numeric-parser))))
    (lambda (str)
      (let ((l (constituent-parser str)))
        (make-weather-data rainfall: (car l)
	                   average-temperature: (cadr l))))))

And now we hook these into the connection object:

(update-type-parsers! con
  (cons (cons "weatherdata" weather-data-parser)
        (type-parsers con)))

(update-type-unparsers! con
  (cons (cons weather-data? weather-data-unparser)
        (type-unparsers con)))

The first part says the "weatherdata" type (which you can find in the pg_type system table) can be parsed by the weather-data-parser procedure. Some magic ensures that parsers for arrays and other composite types containing elements of this type are available. If you need to, you could override the parser for weatherdata arrays by providing a parser for the type called "_weatherdata".

The second part says that when query receives an argument for which the weather-data? predicate returns #t, it should be unparsed by the weather-data-unparser procedure.

Finally, we can work with a more readable (if more verbose) interface:

(query con (conc "INSERT INTO weather (year, month, day, data)"
                 "VALUES ($1, $2, $3, $4)")
           2010 11 24 (vector (make-weather-data rainfall: 12
	                                         average-temp: 6)
	                      (make-weather-data rainfall: 11.2
			                         average-temp: 4)
			      (make-weather-data rainfall: 10.1
			                         average-temp: 4.7)
			      (make-weather-data rainfall: 9.4
			                         average-temp: 2.5)
			      (make-weather-data rainfall: 11
                                                 average-temp: 4.3)))
(row-fold-right cons '() (query con (conc "SELECT year, month, day, data "
                                          "FROM weather ORDER BY day")))
=>
((2010 11 22 #({rain: #<sql-null-type> temp: 6}
               {rain: #<sql-null-type> temp: 7.5}
               {rain: #<sql-null-type> temp: 5.2}
               {rain: #<sql-null-type> temp: 4.6}
               {rain: #<sql-null-type> temp: 5.4}))
 (2010 11 23 #({rain: 11 temp: 4.2}
               {rain: 8.3 temp: 5.8}
               {rain: 6 temp: 7.1}
               {rain: 7.3 temp: 6.2}
               {rain: 12 temp: 4}))
 (2010 11 24 #({rain: 12 temp: 6}
               {rain: 11.2 temp: 4}
               {rain: 10.1 temp: 4.7}
               {rain: 9.4 temp: 2.5}
               {rain: 11 temp: 4.3})))

If you want the parsers to be available for all connections, just parameterize the default-type-parsers and default-type-unparsers.

Whew! This omelette recipe grew a little longer than I expected it to become, but I hope it shows that PostgreSQL is an extensible database, and as such can be used a lot like Lispy languages; you can extend it to fit your program's problem domain, and best of all: you can integrate those database-extensions right into Chicken!

For those who are motivated to go further down the rabbit hole, you can try to figure out how to define type hierarchies or custom base types; atomic types which have their own custom "reader syntax" inside Postgres. You'd write those readers/writers in C or another language embedded in Postgres. Any takers for making Chicken one of those?

5. About the Chicken Gazette

The Gazette is produced weekly by a volunteer from the Chicken community. The latest issue can be found at http://gazette.call-cc.org or you can follow it in your feed reader at http://gazette.call-cc.org/feed.atom. If you'd like to write an issue, check out the instructions and come and find us in #chicken on Freenode!

The chicken image used in the logo is kindly provided and © 2010 by Manfred Wischner