On syntax and semantics

October 1, 2012 at 9:00 am
filed under Coding

My latest hobby-horse is syntax, and our fixation on it as a profession. And yet I am going to violate my own principles to bring it up, and this probably won’t be the last time. If it makes you feel any better, I feel like I’m belaboring the point big time. Nevertheless I’ve had to repeat variations on this a few times, and I wanted to clarify my thoughts on the issue.

My feeling about syntax is that, relative to how much value it delivers, at least on average, it receives a ridiculous amount of attention. The first thing people bring up when they see a Lisp is that the parentheses are hard. I agree that they are hard if you are not familiar with them. (Whether you want to learn Lisp and whether you ought to learn Lisp are separate questions, orthogonal to this one.) But you should not confuse unfamiliarity with difficulty. And you should not confuse your unfamiliarity with a bit of syntax with the merit of a language. The mere presence of a learning curve ought not to be a deal-breaker.

That said, syntax can have value. Syntax which is relatively concise while being semantically precise has value. Granting precision, concision matters because of one property of code in particular: how often it's read. Beyond that, we should fixate less on how easy or hard it is to write a given piece of code, and consider instead how easy it is to read, which is a separate question from how easy it is to read without knowing the language.

Let’s talk examples.


I use Java at work sometimes, and early on I noticed that Java has very poor support for literals. The closest you can get to a map literal is something like this, so-called double-brace initialization:

Map<String, String> map = new HashMap<String, String>() {{
    put("foo", "bar");
    put("baz", "quux");
}};

It’s essentially the same as a declaration followed by initialization. How often do you see this syntax? I have hardly ever seen it myself. Does it actually save any space? It’s just as easy to write this:

Map<String, String> map = new HashMap<String, String>();
map.put("foo", "bar");
map.put("baz", "quux");
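To make the comparison concrete, here is a small self-contained sketch (the class and method names are hypothetical) that builds the same map both ways. The two maps compare equal, but the double-brace form is not free: the outer braces declare an anonymous subclass of HashMap, and the inner braces are that subclass's instance initializer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class comparing two ways to populate a Java map.
class MapInitDemo {
    // Double-brace initialization: an anonymous HashMap subclass whose
    // instance initializer calls put.
    static Map<String, String> doubleBrace() {
        return new HashMap<String, String>() {{
            put("foo", "bar");
            put("baz", "quux");
        }};
    }

    // Plain declaration followed by explicit put calls.
    static Map<String, String> plain() {
        Map<String, String> map = new HashMap<String, String>();
        map.put("foo", "bar");
        map.put("baz", "quux");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(doubleBrace().equals(plain()));                 // true
        System.out.println(doubleBrace().getClass().equals(HashMap.class)); // false: anonymous subclass
    }
}
```

Same contents, one extra class: the plain version is the one most Java code actually uses.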

In Clojure, maps are a bit different, in that they are immutable. And typically instead of creating a new map and “adding” to it, you just write a literal:

(def mymap {:foo "bar" :baz "quux"})
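Java's standard library can approximate the immutability, though not the literal. The sketch below assumes a hypothetical assoc helper, named loosely after Clojure's function of the same name. It copies the whole map rather than sharing structure the way Clojure's persistent maps do, so it only illustrates the semantics ("adding" returns a new map and leaves the old one alone), not the performance.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ImmutableMapDemo {
    // Hypothetical helper: returns a new read-only map with key -> value added.
    // The original map is untouched. Unlike Clojure, this copies every entry.
    static <K, V> Map<K, V> assoc(Map<K, V> m, K key, V value) {
        Map<K, V> copy = new HashMap<K, V>(m);
        copy.put(key, value);
        return Collections.unmodifiableMap(copy);
    }

    // Roughly (def mymap {:foo "bar" :baz "quux"}), built one step at a time.
    static Map<String, String> myMap() {
        Map<String, String> empty = Collections.emptyMap();
        return assoc(assoc(empty, "foo", "bar"), "baz", "quux");
    }

    public static void main(String[] args) {
        Map<String, String> m = myMap();
        Map<String, String> m2 = assoc(m, "foo", "changed");
        System.out.println(m.get("foo"));  // bar: the original is unchanged
        System.out.println(m2.get("foo")); // changed
        try {
            m.put("x", "y");               // read-only view rejects mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```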

Is writing the Java code above a real hardship? In the past I’ve complained about it pretty loudly to anyone who’d listen. But no, as much as I love to rag on Java’s verbosity, it’s not a big deal. It’s not fun, but if it comes down to a choice between simplicity and robustness versus, well, easy, maybe we should pick the former.

Another question we can ask, which is actually pretty important, is whether this changes the sorts of programs we can write. And I don’t think it does.

As the size of your codebase increases, your coworkers read your code many, many times. They do this as part of debugging, refactoring, or adding new features. And the more code somebody has to read to understand your intent, the slower going it is.

Literals are one feature that can make it substantially easier to discern your intent. [1 2 3] is a vector in Clojure. []string{"foo", "bar"} is a slice in Go. And so on. The boilerplate evaporates. The important code, the code which is ostensibly why you’re writing this function or that module, remains. And I’d suggest that any language which lacks something like a map or list literal is unlikely to treat these as first-class constructs, meaning there’s likely more rigmarole, semantic and syntactic, to make them work.
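For comparison, Java does have a literal-like syntax, but only for arrays: the array initializer. Lists have to go through a library call such as Arrays.asList, which wraps an array in a fixed-size List. A small sketch (class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

class LiteralDemo {
    // Array initializer: the closest thing Java has to a built-in collection literal.
    static String[] words() {
        String[] words = {"foo", "bar"};
        return words;
    }

    // Lists go through a library call. Arrays.asList returns a fixed-size List
    // backed by the array; add and remove throw UnsupportedOperationException.
    static List<Integer> numbers() {
        return Arrays.asList(1, 2, 3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words())); // [foo, bar]
        System.out.println(numbers());                // [1, 2, 3]
    }
}
```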

It’s a lot like a foreign language. When you don’t know a word, you have to resort to indirection. You have to talk around the word you don’t know to convey intent. It’s a stick with a handle on one end, and the other end has a flat metal piece. In your native language, you can just call it what it is: it’s a shovel. It’s safe to assume any native speaker of English knows what a shovel is. A literal is a way to convey succinctly what something is. Your interlocutor (BAM!) can focus on the meaning you’re trying to convey instead of the presentation.

Incidentally, pattern matching (coupled with algebraic data types like in Haskell) is a similar idea. Instead of trying to say what something isn’t, or deduce what it is based on a collection of attributes, you can say what it is you want:

maybeToString :: Maybe Int -> String
maybeToString (Just x) = "Your number was: " ++ show x
maybeToString Nothing  = "You had nothing!"

This function takes a Maybe Int, which holds either an Int (wrapped in Just) or Nothing. The example is in Haskell because Clojure does not have pattern matching built in.
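For comparison only, Java's closest everyday analog to Maybe is java.util.Optional (Java 8 and later). The sketch below is not pattern matching: the two cases are handled by map and orElse rather than by matching on constructors, which is exactly the indirection the Haskell version avoids.

```java
import java.util.Optional;

class MaybeDemo {
    // Rough analog of maybeToString: Optional.of(x) stands in for Just x,
    // Optional.empty() for Nothing.
    static String maybeToString(Optional<Integer> maybe) {
        return maybe.map(x -> "Your number was: " + x)
                    .orElse("You had nothing!");
    }

    public static void main(String[] args) {
        System.out.println(maybeToString(Optional.of(42)));  // Your number was: 42
        System.out.println(maybeToString(Optional.empty())); // You had nothing!
    }
}
```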

Map, filter, reduce

I’ve griped about this before, so let’s just rehash the old example.

List<Integer> intList = Arrays.asList(1, 2, 3, 4, 5, 6);
List<Integer> evenIntsList = new ArrayList<Integer>();
for (Integer i : intList) {
  if (i % 2 == 0) {
    evenIntsList.add(i);
  }
}

Every time someone reads this loop, they have to re-derive its meaning. It’s not at all hard to imagine a more sophisticated comparison, or a combination of a filter (the if statement) and a transformation (squaring, say). And forget trying to parallelize an arbitrary loop: as written, it is inherently and inextricably sequential.
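To see how quickly the intent gets buried, here is a sketch of that more sophisticated case: one loop that both filters (keeps even numbers) and transforms (squares them). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopDemo {
    // Filter and transformation are interleaved in one loop body; a reader has
    // to trace the mutation of result to recover "squares of the even numbers".
    static List<Integer> evenSquares(List<Integer> input) {
        List<Integer> result = new ArrayList<Integer>();
        for (Integer i : input) {
            if (i % 2 == 0) {      // the filter
                result.add(i * i); // the transformation
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4, 5, 6))); // [4, 16, 36]
    }
}
```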

(filter even? (range 1 7))

The operation names what I am doing exactly and succinctly. And because we’ve abstracted how this happens, it lends itself to a parallel implementation. It’s an abstraction but it’s a very economical one which allows us to write a simpler program. A program which uses simple yet powerful components has a different starting point from one where you have to reinvent and re-implement abstractions like map and filter as a matter of course.
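Java 8 later added library support for the same idea through streams. A sketch of both the plain filter from the Clojure example and the filter-plus-square combination described above (class and method names are hypothetical). Calling parallel() asks for parallel evaluation, which is possible because the pipeline names each step instead of spelling out a sequential loop; since IntStream.range is ordered, the collected list keeps its order either way.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamDemo {
    // Like (filter even? (range 1 7)): IntStream.range excludes the upper
    // bound, as Clojure's range does.
    static List<Integer> evens() {
        return IntStream.range(1, 7)
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // Filter plus transformation, each named as its own step.
    static List<Integer> evenSquares() {
        return IntStream.range(1, 7)
                .parallel()                // may be evaluated in parallel
                .filter(i -> i % 2 == 0)   // keep even numbers
                .map(i -> i * i)           // square them
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens());       // [2, 4, 6]
        System.out.println(evenSquares()); // [4, 16, 36]
    }
}
```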

Bad syntax

Most of the time bad syntax is just unfamiliar or unwieldy syntax. It might grate on you, but in a language like Java the compiler catches your dumb typing errors. IDEs and snippets can help you out, as well.

But sometimes (sometimes!) bad syntax is easily written incorrectly, even by someone relatively familiar with the language. An example is the classic C/C++ declaration int* a, b. The * binds to the variable name, not to the type, so only a is a pointer to an int; b is a regular int. This isn’t so bad once you learn the trick, but it does require attention to detail. It’s not a huge deal, although losing track of what is and isn’t a pointer can, in general, be a source of bugs.
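Java inherits a version of the same trap for arrays, because it accepts C-style brackets on the variable name as well as on the type. A minimal sketch (class and method names are hypothetical): if q were an array, the assignment q = 7 would not compile.

```java
class DeclaratorDemo {
    static String describe() {
        int[] x, y;  // brackets on the type: both x and y are int[]
        int p[], q;  // brackets on the name: only p is int[]; q is a plain int
        x = new int[] {1, 2};
        y = new int[] {3};
        p = new int[] {4, 5, 6};
        q = 7;
        return x.length + "," + y.length + "," + p.length + "," + q;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // 2,1,3,7
    }
}
```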

The worst kind is the kind where the code compiles and runs, and is in every respect valid but completely incorrect. I don’t mean to pick on a language I’m unfamiliar with, but the post “CoffeeScript: less typing, bad readability” gives a great example. A misplaced space or a missing pair of parentheses yields code that is syntactically valid but behaves differently than intended. The compiler can’t help you. And it’s not inconceivable that someone familiar with the syntax might accidentally drop in an extra space.
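Java is not immune to this category either. One classic example, sketched below with hypothetical names: a stray semicolon after an if condition is an empty statement, so the block that follows runs unconditionally. It compiles and runs; only a lint warning (javac’s -Xlint:empty) or a careful reader will catch it.

```java
class StraySemicolonDemo {
    // Intended: label n as "positive" only when n > 0.
    static String classify(int n) {
        String label = "not positive";
        if (n > 0); // stray semicolon: the if now controls an empty statement
        {
            label = "positive"; // this block always runs, whatever n is
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(classify(5));  // positive
        System.out.println(classify(-5)); // positive, which is wrong
    }
}
```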

What have we learned?

Syntax, combined with smart semantics, can make a substantial difference. But one thing you should notice about most of these examples is that the syntax is there as a complement to robust or interesting semantics. If Java added type inference, it would make the code much easier to read, but it would not be the same as filter. Rather, simple, precise semantics can enable you to reduce the syntax — the representation of your program — to something cleaner or simpler.

But as easy as it is to bikeshed, I don’t think we should do it nearly as much as we do. It’s a terrible antipattern to fixate on syntax, especially for languages we don’t know, as if what’s easy to type has much correlation with clarity or robustness. What’s easy isn’t always what’s worth doing.

Furthermore, if you’re at all interested in substantially different results, you’re going to have to expose yourself to substantially different constructs. And anything remarkably different in that respect is likely to differ in both meaning and presentation. I think we should care more about what’s easy to debug or understand, subjective feelings on expressivity aside. In this respect, composing pure functions is an example of a simpler, easier-to-understand construct, whether you represent it with nested parentheses in Lisp, $ or . in Haskell, or whatever language you happen to be using.
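In Java 8 and later, java.util.function.Function offers compose and andThen, which play roughly the role of Haskell’s . operator. A sketch with hypothetical function names; the only difference between the two combinators is the order in which the functions are applied.

```java
import java.util.function.Function;

class ComposeDemo {
    static final Function<Integer, Integer> SQUARE = x -> x * x;
    static final Function<Integer, Integer> INC = x -> x + 1;

    // Like Haskell's (inc . square): apply SQUARE first, then INC.
    static final Function<Integer, Integer> INC_AFTER_SQUARE = INC.compose(SQUARE);

    // andThen reads left to right: apply INC first, then SQUARE.
    static final Function<Integer, Integer> SQUARE_AFTER_INC = INC.andThen(SQUARE);

    public static void main(String[] args) {
        System.out.println(INC_AFTER_SQUARE.apply(3)); // 10
        System.out.println(SQUARE_AFTER_INC.apply(3)); // 16
    }
}
```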
