Thoughts on: Thinking in Data (talk)

October 3, 2012 at 9:00 am
filed under Coding

While taking the dog on his evening walk, I listened to Thinking in Data. I only just now had a chance to review the slides, so I’m jotting down some preliminary thoughts.

I think one of the most brilliant points in the talk was the part where he and his team cobbled together a way to visualize their data. It’s possible most people have some way to do this already. For instance, it’s common enough to have some diagnostic readout of live data for a running service or process. But taking raw data as it went through the system and piping it through some HTML formatting is a stroke of awesome.

Maps

The one point which gave me a nice shot of endorphins was hearing about how idiomatic it is to use maps for everything. This is actually what I was already doing as I was hacking on a dumb little text adventure engine.

It’s liberating, in an interesting way, to create data in this ad-hoc way. I’m used to some amount of overhead in doing this; even in other dynamic languages, you often have to figure out some kind of schema, like what your object model is or at least the order the fields in a tuple are in.

When you’re talking about a map, well, you just cobble it together. Moreover, the fields in it are orthogonal. And if you really need the safety, you can namespace them. That’s actually a huge deal. When it’s this lightweight to create new representations of data, and you’re not trying to create this mini-language around each piece of data, you can spend more time thinking about your functions.
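To make that concrete, here is a minimal sketch of what cobbling a map together looks like. The room and every key in it are invented for illustration (loosely in the spirit of a text adventure), not taken from the talk.

```clojure
;; Hypothetical room for a text adventure; every key here is made up.
(def room
  {:name  "Cellar"
   :desc  "A damp cellar. Stairs lead up."
   :exits {:up :kitchen}})

;; Adding a field needs no schema change; assoc returns a new map.
(def lit-room (assoc room :lit? true))

;; Namespaced keywords keep fields owned by different subsystems from colliding.
(def tagged-room
  (assoc room
         :adventure.render/color   :grey
         :adventure.audio/ambience :dripping))

(get-in room [:exits :up])             ; => :kitchen
(:lit? lit-room)                       ; => true
(:adventure.render/color tagged-room)  ; => :grey
(contains? room :lit?)                 ; => false, the original map is unchanged
```

The extra keys sit alongside the existing ones without disturbing any code that only cares about :name or :exits, which is roughly what I mean by the fields being orthogonal.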

Also, under normal circumstances, it might be a lousy idea to pass around a monolithic data structure. But some of that stems from performance issues — pass by value, deep copies — and some of that stems from mutability issues. We don’t have those here, if we’re careful to isolate side effects. There’s still value in encapsulation, and separating things which have no conceptual relationship. But if it’s all related, you don’t have to worry nearly as much about how you’re going to aggregate a bunch of objects, each with their own language and representation of the data. Let your data be data.
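A small sketch of why passing one big structure around is less scary here. The game state and the function names are hypothetical; the point is only that each function returns a new value and leaves its argument alone.

```clojure
;; Hypothetical game state; all keys are invented for this sketch.
(def state
  {:player {:location :cellar :inventory #{:lamp}}
   :turn   0})

(defn pick-up
  "Returns a new state with item added to the player's inventory."
  [state item]
  (update-in state [:player :inventory] conj item))

(defn next-turn
  "Returns a new state with the turn counter advanced by one."
  [state]
  (update-in state [:turn] inc))

;; Thread the whole state through both functions.
(def later (-> state (pick-up :key) next-turn))

(= #{:lamp :key} (get-in later [:player :inventory]))  ; => true
(:turn later)                                          ; => 1
(= #{:lamp} (get-in state [:player :inventory]))       ; => true, original untouched
```

Nothing is deep-copied: the persistent maps share whatever did not change, so handing the whole state to every function costs very little.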

Testing

Testing is near and dear to my heart, and it’s fun to actually know enough about Clojure to have context for the testing pieces. That aside, not a lot of it was particularly new to me, considering this is my profession.

It did give me some ideas. One of the things he talked about was being able to drive everything through data. That’s a reminder of how valuable it is to have a simple representation of your data, and also of what thinking about integration testing up front can get you.

Integration tests have a reputation for being huge, flaky, bulky, and a productivity black hole. And that’s rightly so! Some of that is inherent to the problem: as you go bigger, as you go up to a whole system running, you have to contend with what it means to set up, tear down, time out, pass/fail, and so on. You have to deal with configuring a machine with an operating system and maintaining it.

But some of it is not inherent. Some of it is because we have to use tools which are one-size-fits-all, because we didn’t build testing into the app in the first place. And doing that post-hoc is a lot harder, typically less well staffed, and so on. Maybe some of that is a sensible trade-off, considering how hard testing is to get right in terms of cost/efficiency.

BUT.

When your life is easier because of all the other goodness, you get some solid benefits, like the ability to replay an entire game. Your game really is just a function of the initial inputs, and conversely, a sequence of input states can drive your game to a given state. Then you watch.
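Here is a rough sketch of the replay idea, assuming the game can be written as a pure step function from a state and one input to the next state. The step function, the state shape, and the inputs below are all made up; the talk’s actual game is surely richer.

```clojure
;; Hypothetical pure step function: state + one input -> next state.
(defn step
  [state input]
  (case input
    :north (update-in state [:y] inc)
    :south (update-in state [:y] dec)
    :east  (update-in state [:x] inc)
    :west  (update-in state [:x] dec)
    state))  ; unknown inputs leave the state alone

(def initial {:x 0 :y 0})
(def recorded-inputs [:north :north :east :south])

;; The final state is just a fold over the recorded inputs.
(reduce step initial recorded-inputs)
;; => {:x 1, :y 1}

;; reductions keeps every intermediate state, which is the "then you watch" part.
(reductions step initial recorded-inputs)
;; => ({:x 0, :y 0} {:x 0, :y 1} {:x 0, :y 2} {:x 1, :y 2} {:x 1, :y 1})
```

Because step has no side effects, replaying the same inputs from the same initial state always lands in the same place, so a recorded session doubles as a regression test.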

It’s pretty slick. And it makes me wonder if there’s not some way to apply this in the future, to other products which fall under my purview. It might be too hard with your average imperative language, to be honest, but we’ll see about that.

Testing side effects

Typically this involves mocking, which he mentions as maybe not the best go-to technique. I found this slightly ironic; binding makes this more seamless than it has any right to be, despite otherwise being black magic and possibly a bad idea except under test conditions or when You Know What You’re Doing.

And this is without having looked into any of the Clojure test functions, mind you; it’s possible that this is essentially what the mocking frameworks do, only smarter. This talk has done more to increase my interest in that than a variety of others I’ve listened to.
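For the curious, this is roughly what I mean by using binding as a poor man’s mock. It is only a sketch: send-email! and notify-winner are hypothetical names, and binding only works because the var is explicitly declared dynamic.

```clojure
;; Hypothetical side-effecting dependency, declared dynamic so binding can rebind it.
(defn ^:dynamic send-email!
  [address body]
  (println "Really sending to" address))

(defn notify-winner
  [player]
  (send-email! (:email player) "You won!"))

;; In a test, swap the real function for one that records its arguments.
(def sent (atom []))

(binding [send-email! (fn [address body]
                        (swap! sent conj [address body]))]
  (notify-winner {:email "ada@example.com"}))

@sent
;; => [["ada@example.com" "You won!"]]
```

The rebinding is thread-local and disappears when the binding form exits. For vars that are not dynamic, clojure.core also has with-redefs, which temporarily replaces the root value instead.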

Anyway, having to resort to mocking isn’t actually too bad, considering how much you’ve won already from choosing purity, immutability, and composability. Predictable, composable functions which isolate side effects are a heck of a lot easier to test with confidence.

But the piece where you separate out a decision-making function which is then an input to a function which actually has side effects was an interesting twist.

My first thought was just to write a function which took two functions. Using higher-order functions to make a function’s behavior parametric is a common pattern, but, uh, that’s basically what an if expression is in this case! By putting these two functions on effectively the same level (by not making one of them “below” the other in the call chain) you enhance the testability of both.
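As I understand the idea, it looks something like the following sketch. The names decide and perform! and the shape of the decision map are my own invention, not the speaker’s code.

```clojure
;; Pure decision: looks at the state and returns a description of what to do.
(defn decide
  [state]
  (if (<= (:health state) 0)
    {:action :game-over :message "You have died."}
    {:action :continue}))

;; Side effect: acts on a decision and knows nothing about how it was made.
(defn perform!
  [decision]
  (when (= :game-over (:action decision))
    (println (:message decision)))
  (:action decision))

;; The caller wires them together; neither sits below the other in the call chain.
(perform! (decide {:health 0}))
;; prints "You have died." and returns :game-over
```

decide can be tested with plain equality checks on maps, and perform! can be tested by handing it hand-written decisions, so only the thin wiring at the end ever needs a mock.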

Further watching

There seem to be a lot of good Clojure talks on InfoQ, actually, and my list of talks to listen to or watch is building up just from that source.

It’s a real shame so few other sites make their talks available as MP3s; there were a number of blip.tv talks that I wanted to listen to while walking the dog, commuting, etc., and there wasn’t an obvious way to do it. YouTube would be a nice candidate, too, but I challenge you to find an option that is both non-shady and functional by searching for YouTube to MP3 conversion.

Next up is watching, or perhaps just listening to, DSLs in Clojure.

  • For downloading videos from any site in a variety of formats, I use Firefox with the Video DownloadHelper extension; it is the least shady way I have found of getting that kind of material for offline use.
    Cheers
