
There are often debates, forever at risk of developing into flamewars, about the relative merits of static and dynamic typing in popular programming languages. I feel I should air an issue that doesn’t get mentioned often: a number of features of languages that are typically considered statically-typed amount to dynamic typing.

In common object-oriented languages, there is frequently a mechanism for determining the type of an object at runtime. In Java, for instance, there is the “instanceof” operator. This can be combined with typecasting, autoboxing etc. to facilitate what amounts to dynamic typing. Unlike in, say, C, typecasting is type-safe: if an object does not conform to a particular type, then an exception will be raised. This is unequivocally dynamic typing. If one were feeling silly, one could type all method parameters and return types as “Object” and, using these features, write a working, type-safe program. Many other statically-typed object-oriented programming languages contain similar features: Delphi has (or had; I’ve not touched it since version 7) “is” and “as”, and I believe C# has the same basic idea, which isn’t all that surprising given the personalities behind that language; Go has type assertions that fulfil the same ends, and the “empty interface” offers a catch-all type, along the same lines as Java’s “Object” type.
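
For instance, in Go the empty interface plus a type switch gives exactly this flavour of run-time type checking. This is just an illustrative sketch of my own, not code from any particular project:

import "strconv"

// interface{} acts as a catch-all type, much like Java's Object; a type
// switch (or a type assertion) recovers the concrete type at run time.
// A plain assertion such as v.(string) panics if it fails, which is the
// same kind of dynamic check that Java surfaces as a ClassCastException.
func describe(v interface{}) string {
    switch x := v.(type) {
    case string:
        return "a string: " + x
    case int:
        return "an int: " + strconv.Itoa(x)
    default:
        return "something else"
    }
}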

Slightly more controversially, I would argue that the method overriding typically referred to as “polymorphism” in these languages amounts to a form of dynamic typing. That is, the behaviour embodied in a particular method call (some of which, at least in theory, can be considered part of the type) depends not only on the interface that it refers to, but also on the run-time type of the object. This is handled by the language; it doesn’t need to be an explicit concern of the programmer.
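
In Go terms (again, a sketch of my own rather than anyone’s real code), the method body that actually runs is selected by the value’s run-time type, not by anything visible in the caller’s static types:

// The caller sees only the Shape interface; which Area body runs is
// decided by the dynamic type stored in each interface value.
type Shape interface {
    Area() float64
}

type Circle struct{ R float64 }
type Square struct{ Side float64 }

func (c Circle) Area() float64 { return 3.14159 * c.R * c.R }
func (s Square) Area() float64 { return s.Side * s.Side }

func totalArea(shapes []Shape) float64 {
    var sum float64
    for _, s := range shapes {
        sum += s.Area() // dispatched on the run-time type of s
    }
    return sum
}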

There are probably other examples that one could come up with. I think these demonstrate that there is value to dynamic typing beyond saving a few keystrokes on variable declarations, or what have you, and that even if one decides to use a static language to write a program, one can take advantage of this where appropriate. The exceptions are C and the ML-derived languages and their kin, where, to my knowledge, this sort of thing isn’t present.

I’ve just got a Lisp (well, Lisp or Scheme, or somewhere in between) interpreter that I wrote in Go to a stable state. It doesn’t have very many features and it’s hardly documented at all. It’s also, most likely, full of bugs. You can grab it here. This post will briefly summarise some of my experiences in writing the thing.

Defining the basic behaviour is quite straightforward, and requires significantly less ceremony than in many other languages, particularly static ones. The type system, in particular, is quite friendly to this task. For instance, defining an environment (a mapping from symbols to values) is as simple as

type Environment map[Symbol]Any

where Symbol is just an alias for string, and Any is an alias for the empty interface. Not bad, although to get the full behaviour for lexical scoping, some mechanism for threading environments together had to be found. I came up with a Context object that links an environment to a “parent” environment; looking up a symbol is then a search outwards through this structure. When Lisp manipulates environments, it is actually affecting Context objects. The Context type doubles as an interpreter: creation via NewContext is for threading environments together, while creation via New creates a Context that can be used as an interpreter.
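
The parent linking might look roughly like this. It is only a sketch of the idea: the field and method names are my own rather than the library’s, it assumes, for simplicity, a single parent per Context, and the type aliases from above are restated so the sketch stands alone:

type Symbol string
type Any interface{}
type Environment map[Symbol]Any

// Context links an environment to the one enclosing it.
type Context struct {
    env    Environment
    parent *Context
}

// Lookup walks outwards through the chain of enclosing environments
// until the symbol is found or the chain runs out.
func (c *Context) Lookup(s Symbol) (Any, bool) {
    for ctx := c; ctx != nil; ctx = ctx.parent {
        if v, ok := ctx.env[s]; ok {
            return v, true
        }
    }
    return nil, false
}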

Of course, functions are a crucial part of any Lisp system. Anything that implements the Function interface can be called by Lisp:


type Function interface {
    Apply(args Any) Any
}

The args parameter there is a list.

Two such types are provided. There is a Primitive type, which is just a function with the same signature as that Apply method. There is also a closure type, which is not directly accessible from outside the library and is created by lambda expressions within the language. A function, WrapPrimitive, takes a Go function of up to five arguments and returns a Primitive that parses the arguments passed in and does some basic error checking. A WrapPrimitives function takes a map from strings to such functions and returns an Environment object. The primitives.go file demonstrates how pleasant these two functions make writing primitive functions for this Lisp system. Functions created by lambda expressions have proper tail calls.
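
The Primitive type, for instance, need be little more than the following. This is a guess at the shape rather than the library’s exact code:

// A Primitive is just a Go function with the Apply signature; giving it
// an Apply method that calls itself lets it satisfy the Function interface.
type Primitive func(args Any) Any

func (p Primitive) Apply(args Any) Any {
    return p(args)
}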

I defined the syntax in terms of two other libraries I had written: a simple Thompson NFA (à la grep or awk) and a PEG parser. This made the description of the syntax feel quite declarative. It certainly felt easier to write than previous forays into parsing Lisp expressions, and given how easy Lisp expressions are to parse, that means it was pretty darned easy. A few things are missing, like the quoting shorthand syntax. Evaluation proceeds in two steps: an expansion step, where applicable macros are executed (note: this is currently untested), and an eval step, which is a pretty naïve interpreter design. The built-in syntactic forms include quote, if, lambda, set!, define and begin; I believe the rest can be implemented in terms of the language itself, but we’ll see. I haven’t yet written a standard library for it, to make programming in it even vaguely pleasant.
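
To make the shape of that eval step concrete, here is a toy sketch, reusing the illustrative Context and Symbol types from the environment sketch above. It is not the interpreter’s actual code; it only handles two of the special forms, and assumes a two-armed if:

// Pair is a toy cons cell; the real list representation may differ.
type Pair struct{ Car, Cdr Any }

// Eval dispatches on the expression's dynamic type: symbols are looked up
// in the current Context, lists starting with a known special form are
// handled specially, and everything else evaluates to itself.
func (c *Context) Eval(expr Any) Any {
    switch e := expr.(type) {
    case Symbol:
        v, _ := c.Lookup(e)
        return v
    case *Pair:
        if op, ok := e.Car.(Symbol); ok {
            switch op {
            case "quote":
                return e.Cdr.(*Pair).Car
            case "if":
                rest := e.Cdr.(*Pair)
                if c.Eval(rest.Car) != false {
                    return c.Eval(rest.Cdr.(*Pair).Car)
                }
                return c.Eval(rest.Cdr.(*Pair).Cdr.(*Pair).Car)
            }
        }
        // Anything else would be treated as an ordinary function application:
        // evaluate the operator and the operands, then call Apply.
        return nil
    default:
        return expr
    }
}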

So far, of the things people seem to moan about most in Go, I haven’t missed generics at all, and exceptions only slightly. Having to manually check whether an error has been returned all the time does get tiresome, but honestly the rest of the language more than makes up for their absence. I essentially added exceptions into the Lisp interpreter, which sounds OK, but they are a bit humble really compared to conditions/restarts or continuations. When I tested out recursion not in tail position, I was greeted with lots of paging rather than any error informing me that I’d blown the stack, so I may need to add that check manually as well.

Go is a pretty good language to write an interpreter for a dynamic language in. Given that it has garbage collection, first-class functions, a hashmap type and support for dynamic types built-in, you’re off to a running start.

OK, so we should all be familiar with Paley’s Watchmaker argument. That is, if you see a watch in a field, then you know someone made it; ergo, if you see some critter, you know someone made that as well. Praise be to God, and so on.

We all know that evolution through natural selection etc. can adequately explain the diversity of life. So that must mean that Paley’s argument is wrong, right? So if we see a watch in a field, that watch must have evolved!

Well, that’s the basis of memetics: take things that we know have been designed, and claim that they have instead evolved. In the paper that started memetics off, Viruses of the Mind, Dawkins does this quite explicitly when he uses computer viruses.

This strikes me as completely bizarre. As Dawkins is fond of saying, evolution through natural selection etc. gives us the appearance of design, without requiring a designer. However, to take his example, computer viruses do have designers, for they are computer programs and such programs need human authors. Therefore we do not need clever theories to explain their properties (beyond the mechanistic properties of those programs), we can just point to some hacker and say “there’s your cause.”

The claim that patents protect the small inventor from the big company is a common defence of the patent system, but I believe the reasoning behind it is erroneous.

I can understand why such a rationale would be put forward. After all, the converse, that a system of state-backed monopolies would help the formation of… um… monopolies and therefore not help the little guy, would not be a good selling point. The rhetoric around this has become quite sophisticated, with a lot of time and effort expended on making the arguments as emotionally appealing as possible, which can make them feel somewhat hard to counter. I shall nonetheless give a fairly crude example of one such argument; my counter-arguments apply to more sophisticated versions of it as well.

2 man start up spends a few years generating brilliant product x. It uses new and clever ways of doing things. A big company comes along, steal the idea, puts a 300 man team on it for a few months and gets to market first.

Link

As you might be able to gather if you click the link, I have already responded. There I gave some reasons why such a rationale would not normally apply.

So, there are some problems with this statement as it applies to the real world.

Most forms of technological development are incremental. While there are revolutionary, “game changing” inventions, most build upon previous work, and most of the game changers are themselves accretions of other technologies: think of the internal combustion engine, a synthesis of atomisers (from perfume), swamp gas detectors (a useless invention) and pistons from steam engines. A new entrant with some fabulous invention is therefore quite likely to infringe in some way on existing patents. Furthermore, incremental inventions themselves open up new opportunities: high-pressure steam engines were a prerequisite for some of the more important developments of the industrial revolution, and they were held back at least ten years by patents. Claiming that the patent system still leaves room for newcomers is analogous to saying that land ownership is available to all at no cost because of homesteading, an argument rendered null and void once all the land is owned. If one is a new entrant in a field that is swamped with patents, then one is at the whim of the incumbents, and so patents help the big guy.

Then there is the issue I brought up in my response above: the large incumbent company is seriously outnumbered by the small challengers, so it would be prohibitively expensive for it to imitate all of their inventions. It must therefore wait for a particular invention to prove successful before it is worth imitating, at which point it is too late, because unless the incumbent is a heavily entrenched monopolist (for instance, one that has been granted lots of patents), the new entrant has already carved out a market for itself.

In reality, small companies do not benefit from patents. Restricting competition does not help the challenger who holds the advantage of a better idea; it helps the incumbent, who has run out of them.

Scheme’s DO is generally treated with some disdain. Yet I often find myself recreating it with named LET.

So,


(let next ([i 0] [res '()])
  (if (= i 4)
    res
    (next (+ i 1) (cons i res))))

Is equivalent to:

(do ([i 0 (+ i 1)]
     [res '() (cons i res)])
  [(= i 4) res])

Several weeks ago, I got on the train and sat down. The seat was a lot less comfortable than it should have been, so I reached down, and lo! I found a folder containing some very sensitive information that appeared to belong to a government employee in training.

A few days ago, I was registering with the NYT, and was denied the option of using my regular (rather heavily obfuscated) password by a foolish password policy:

Password can only contain letters a-z, numbers 0-9, periods ., underscores _, and hyphens -.

Now that’s bad enough, and this subject has been discussed at length by others, so I sent them an email about it. This is the reply I got (note that I’ve masked some parts):

Thank you for contacting NYTimes.com.

To help you get immediate access to our Web site, we have reset your password.

Please log in with the following information:
Member ID:
paulucy

Password:
************
Please go directly to the following Web address (URL) to sign out of
NYTimes.com :
http://www.nytimes.com/signout

Then click on the link to “log in”.

On the page that follows, look at the right part of your screen for
the section marked “Log In Now”
(please make sure that you do not register again )
and enter your Member ID and Password as shown above.

Please note that your browser must accept cookies from NYTimes .com in
order to go beyond the log-in screen.

And remember to click on the Log In button to submit your request.

Also, we suggest that you immediately select a new password in the
Your Profile area of our Member Center at:
http://www.nytimes.com/profile

While you are on this page, please take a moment and select a Secret
Question & Answer, if you haven’t already. This will make retrieving
your password easier in the future.

We recommend that you write down your log in information.

Regards,
Pam White
NYTimes.com
Customer Service
http://www.nytimes.com/help

That’s right: they responded to a query about the limitations of the password system by giving me the password of another user. Not only was it completely useless to me, it also violated someone else’s privacy and security.

In both cases I have done my best to get the issues corrected, beginning with informing the people affected by the lapse.

Security is hard to do, and people make mistakes. Maybe I’ve just been very lucky in being the person discovering these leaks, rather than being the victim of them, but who knows? Is everyone as nice as me when they find stuff like this?

Parallel processing is easy in the shell.

a | b | c

Will run a, b and c in parallel, passing data from one to the next. There is some sequencing here, implicit in the way each of the commands does its business. If you are passing a file through commands that process one line at a time, there will be a lovely sequence of lines flowing down the pipes, in staggered formation, and the number of lines being processed at once approaches the number of commands in the chain.

It sounds like a good idea to make these chains really long if you want to maximise that easy parallelism. However, there’s only so much you can do passing data along a single line. Named pipes get you around that, and there’s some support already: the tee command, along with process substitution, lets you “fan out” a tree of pipes. This gives us even more parallelism, but your endpoints are then probably going to have to be files, which means waiting for all the fun to stop before continuing. There needs to be a way to close these trees back up and turn them into graphs. Then you can fan out, getting different bits of data out of the input, and merge those bits back together to form the output. Sounds easy to me!

I propose two mechanisms: an easy way of constructing temporary named pipes and referring to them later, and a command to merge these named pipes together. I shall call the first utility leaf, referring to the bottom of a pipe tree, and I shall call the graph constructor eet, to reflect its role as (sort of) the inverse of tee.

leaf creates some temporary pipes and binds their paths to the variable names passed in. eet takes a command to execute, with pipe variable names appearing where file names would, each suffixed with @; when the command finishes, eet deletes the pipes and unsets the associated variables.

e.g.

leaf p 
cat /proc/cpuinfo  \
  | tee >(egrep 'physical\ id' | cut -d : -f 2 > $p) \
        | egrep 'core\ id' | cut -d : -f 2 \
        | eet paste p@ - | sort -u | wc -l

Tells you how many processors there REALLY are.

I’ve done a tentative implementation below. This represents a couple of hours’ work, so it’s understandably awful. It sort of works.

Update: the command to unset the pipe variables after they’ve been used didn’t do anything; I’ve now wrapped it in an eval, which seems to do the trick (I know…)


# leaf: create a temporary named pipe for each variable name given,
# and export each name bound to its pipe's path.
function leaf {
  for arg in "$@"; do
    local pipe=$(make-leaf-pipe)
    eval "export $arg=$pipe"
  done
}


# make-leaf-pipe: create a fresh FIFO under a scratch directory and
# print its path.
function make-leaf-pipe {
  local tmp
  if [ -n "$TMPDIR" ]; then tmp=$TMPDIR/pipe-graphs
                       else tmp=/tmp/pipe-graphs; fi
  if [ ! -e "$tmp" ]; then mkdir "$tmp"; fi
  local pipe=$(mktemp -u "$tmp"/XXXXXXXXXXXXX)
  mkfifo "$pipe"
  echo "$pipe"
}


# eet: run the given command, replacing every argument of the form name@
# with the path of the pipe that leaf bound to $name; afterwards remove
# the pipes and unset the variables.
function eet {
  local pipen= pipes= argt= arglst= cmd=$1
  shift
  for arg in "$@"; do
    case "$arg" in
      *@) argt=$(sed 's/\(.*\)@/\1/' <<< $arg)  # strip the trailing @
          pipen="$pipen $argt"                  # remember the variable name
          argt=$(eval "echo \$$argt")           # dereference it to get the pipe path
          pipes="$pipes $argt"
          arglst="$arglst $argt"
        ;;
      *)  arglst="$arglst $arg"
        ;;
    esac
  done
  $cmd $arglst
  rm $pipes
  eval "unset $pipen"
}

Reading through R6RS. It seems to actually present a really nice system for programming in, even if it’s pretty big compared to R5RS. It feels seriously serious now. Looking through the library section on syntax-case, I found this:

Using datum->syntax, it is even possible to break hygiene entirely and write macros in the style of old Lisp macros.  The lisp-transformer procedure defined below creates a transformer that converts its input into a datum, calls the programmer’s procedure on this datum, and converts the result back into a syntax object scoped where the original macro use appeared.

(define lisp-transformer
  (lambda (p)
    (lambda (x)
      (syntax-case x ()
        [(kwd . rest)
         (datum->syntax #'kwd
           (p (syntax->datum x)))]))))

It’s nice to know that, after all the time and effort that went into ensuring hygiene, it’s that easy to break.

There was an article in last Friday’s Guardian by Simon Jenkins on the subject of mathematics education. It came to the conclusion that one doesn’t need to know maths in order to lead a productive life. He’s right, but not in the way he thinks.

The following issue contained lots of (semi) angry rebuttals from mathematicians. Again, in their own ways they were right, but I think they miss the point.

The main points that Jenkins makes are that mathematics has no economic value, and that it is some form of traditionalism, coupled with underhanded lobbying, that gives maths its prominent place in the curriculum. The rebuttals focus on refuting these specific points, by, for instance, pointing out that maths graduates go into finance, or questioning the evil conspiracy Jenkins conjured up. Both sides assume that mathematics is useful, or is geared towards a useful aim. It might be, as a side effect, but that’s not why mathematicians do it (to paraphrase Feynman). The point is that it is fun, although you probably wouldn’t get that from a maths lesson.

That mathematics isn’t necessary to lead a normal life is adequately demonstrated by the fact that nobody is taught mathematics in school. Instead, they are taught things like algebra, arithmetic, calculus, geometry and so on. These things are to maths as spelling is to poetry. Mathematics is spotting patterns, solving puzzles, and playing around with entirely imaginary constructs in a logical manner. It is a fundamentally creative endeavour.  This is the sort of thing that many people profoundly enjoy doing (Sudoku is rather popular, I hear), but without realising that they’re doing maths.

Oh, of course it’s not investigating the Riemann Hypothesis or anything Hard and Professional like that, but people enjoy painting without expecting to come up with a Sistine Chapel ceiling, or what have you. The difference is that art lessons actually involve painting, so you can get a feel for whether or not you enjoy it, while maths lessons, for most people, teach them nothing other than that they hated maths lessons.

This is mostly cribbed from here, but that’s only because this Lockhart chap so successfully encapsulated what I have intuited for years, but was unable to express. Read it.

OK, so there’s this fun thing called object-oriented programming. Scheme is a (mostly) functional programming language, but objects can be put in there, and since Scheme is a LISP, adding syntactic niceties to support them shouldn’t be too difficult. Whilst I’m only semi-sold on the idea of OOP as a language paradigm (accepting it as one requires, in my case, a very loose definition of “language”), I think it is more important as an issue of design. OOP makes it easier to design certain kinds of large systems. This is less obvious when the language does not make such abstractions easy (GTK, anyone?).

Now, I’m certainly not the first to try this idea. Common Lisp has CLOS, which is based upon many other attempts at this concept, and there are versions of CLOS for Scheme. CLOS is both powerful and flexible, and it manages to gel well with the rest of the language. So, why would I want to re-implement it?

Well, I don’t like the way selectors work. This is in part because of the “fitting nicely into the rest of the language” thing. In order to access a member of an object in any Simula-derived language, you do something like:

object.member

In LISP it looks like

(member object)

There isn’t a huge amount of difference, syntactic issues aside. However, the first describes the situation in a better way: it imparts more semantic information. I can see this decision is based upon the way LISP encapsulates abstract data. That is, a data type is defined by the operations which can be performed on it. This works well when we’re just talking about data, but objects aren’t just data.

Objects describe behaviour as well as state. When you send a message to an object, it does something with its data in a well-defined way. So there should be a way of more closely associating methods to objects. CLOS gets around this by ignoring the issue. It does so with multi-methods, which are cool, but aren’t message-passing. A neat hack is still just a hack.

This reason doesn’t sound very pragmatic. I could argue that in large systems it certainly is pragmatic to be able to group related concepts together closely, but I’ll try a different tack. Let’s say you wanted to find the member of a member of an object. Again, in Simula:

object.member1.member2

And in LISP:

(member2 (member1 object))

Starting to seem less practical, what?

Now let’s return to the whole method thing. In CLOS, multi-methods and objects are related, but the binding is rather weak. If I were going to call a method on an object in Simula, I’d do something like:

object.method(arg)

And I could chain them together:

object.method1(arg1).method2(arg2)

But in LISP, it would be:

(method object arg)

and

(method2 (method1 object arg1) arg2)

Which is starting to get silly. Add in selectors, from above, and a single line of Simula is now a nested parenthesis hellhole.

Message-passing has been tried in LISP, and it was a failure because it didn’t work with higher-order procedures. That is:

(send object 'method arg)

Would need to be wrapped in a lambda if it is to be used with a map or what have you. But this was Common Lisp. Scheme is a Lisp-1, in that procedures and variables occupy the same namespace (which I feel is a more reasonable approach in a language where code and data are literally the same thing). This means that you don’t need to do anything special to call a procedure that is merely the value of a variable. That is, with f bound to a procedure:

(f x)

Will call it in Scheme, but to get the same result in Common Lisp, you’d have to do:

(funcall f x)

So, in Scheme, methods can be treated simply as members of objects, so:

(lookup object '(member))

Can return some data on the object, and:

((lookup object '(method)) arg)

Will call that method.

To get members of members you could do something like:

(lookup object '(member1 member2))

Of course, you can replace that lookup with a reader macro, so that, for instance:

#$object.member

is equivalent to:

(lookup object '(member))

And so a more bearable interface to the system is available. That makes message-passing look like this:

(#$object.method arg)

Which is lovely.

Another thing this abstraction offers the programmer, as part of tying the object more closely to its members, is that a method can exist within the local scope of the object in which it is defined. That means you don’t need to pass the object as an argument to the method, and you don’t need to worry about what accessors an object has: you can just use the data on self directly. This makes it easier to reason about how objects are going to behave, as well as easily enabling higher-order procedures as a means of combination:

(map #$object.method collection)

Works as expected.

This system does not, however, account at all for the chaining I mentioned above:

object.method1(arg1).method2(arg2)

is very difficult to do with just the #$ syntax. You could achieve it with:

((lookup ((lookup object '(method1)) arg1) '(method2)) arg2)

or, using #$:

(let ((res (#$object.method1 arg1)))
  (#$res.method2 arg2))

But, in either case, that looks even worse than the CLOS method and, worse still, it forces a leak in our lovely #$ abstraction. However, because we’re using macros, we can try processing the syntax to obviate the need for this hack. Simply deciding that:

#$(object.method1 arg1 .method2 arg2)

is the same as the example given above, sorts out some of our worries. Of course, the decision about whether to call or return a method from this statement is complicated when that method takes no parameters. In that case, the easiest way to handle it is for the system to always return a method if it is not given any parameters. This way, the programmer (who is presumably familiar with the object’s interface) can signal that they wish that method to be invoked by wrapping the thing in parentheses:

(#$(object.method1 arg1 .method2 arg2))

will call whatever method2 returned, passing no parameters.

I have an implementation that does all I’ve said above, working in Guile, although I’m not quite happy with it. It’s about 100 lines (71 opening parentheses), and although it contains elements unrelated to the primary focus of this post, I think it could be a great deal simpler. This discussion also leaves out important issues in object-oriented design, such as data-hiding, polymorphism and inheritance, which is something my object system leaves out as well, at present. I am working on rectifying both situations, and I shall post my results here, along with the final code.