Perils of portability

I had fun speaking at QCon in London earlier this month with a talk on the Cloud track entitled the Perils of Portability.

This had some Governmenty stuff in but was mainly part rant, part hope for the future of cloud infrastructure. I had some great conversations with people afterwards who had felt some of the same pain, which was nice to know. I also somehow managed to get 120 slides into a 40 minute presentation, which I think is a personal record.

The videos will be available at some point in the not too distant future too.

Going fast in government

About a month ago I had the good fortune of speaking at the London Web Performance meetup. This was one of the first talks I’ve done about our work at The Government Digital Service since the launch of GOV.UK back in October. The topic was all about moving quickly in a large organisation (The UK Civil Service is about 450,000 people so I think it counts) and featured just a handful of technical and organisational tricks we used.

March madness

With only a week or so to go before the end of February, it’s looking like March might be a little busy.

  • I’m speaking at QCon, in London on Wednesday 6th on Clouds in Government - Perils of Portability (which in hindsight is probably the silliest title for a talk I’ve ever used)
  • On the 15th and 16th of March I’ll be at Devopsdays, again in London. I’ve been helping out with organising the event and I’m very much looking forward to going along after seeing all the work being put in.
  • And last but not least I’m heading to Boston for the rather exciting Monitorama from the 26th until the 30th. Looking forward to meeting up in person with quite a few folks I’ve spoken to over the last year or two.

If you’re going to be at any of these events (QCon and Devopsdays still have tickets available I think) then let me know.

Django and Rails presentation from QCon

I had great fun back in November at the QCon conference in San Francisco. As well as curating one of the tracks and catching up with people in the area I managed to give the following talk.

In hindsight it might have been a bit odd to try and cover both Rails and Django examples in the one presentation but it was quite good fun putting together code examples using both of them at the same time. As well as a large set of tips, tricks and tools I settled on a few things that I think any web (or other) framework should support out of the box.

  • A debug toolbar
  • Transparent caching support
  • Hooks for instrumentation
  • Configurable logging

my personal package repository

I’m a big fan of system packages for lots of reasons and have often ended up rolling my own Debian package repository at work, or working with others who have done so. Recently I finally got round to setting up a personal package repo. More interesting than the repo itself is probably the toolchain I used, oh and the rather nice Bootstrap-based styling.

nice looking package repository

The source code for everything is on GitHub, although not much documentation exists yet. In the middle are a few shell scripts that generate the repo. Around them is a Vagrant box (which makes it easier to build packages for different architectures or distros) and some Rake commands:

<code>bundle exec rake -T
rake recipes:build[recipe]  # Build a package from one of the available recipes
rake recipes:list           # List available recipes
rake repo:build             # Build the repository</code>

The recipes commands allow for building new packages based on scripts. A few examples are included which use fpm, but you could use anything. The repo:build command triggers the Debian repository to be rebuilt.

The Vagrant configuration shares various folders between the guest and host, which opens up a few useful features. One is that I can drop any old Debian package into the debs folder, run the repo:build command, and it will appear in my repository. The other useful capability is that the resulting repo is shared back to the host, which means I can check it into Git and, in my case, push it up to Heroku.
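The relevant part of a Vagrantfile for that two-way sharing might look something like this (the folder names and box are assumptions for illustration, using the v2 config format):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  # Drop ready-made .deb files in here on the host and the guest sees them.
  config.vm.synced_folder "debs", "/home/vagrant/debs"
  # The generated repository is written here, so it lands back on the host
  # ready to commit to Git and push to Heroku.
  config.vm.synced_folder "repo", "/home/vagrant/repo"
end
```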

On the forge

I’ve been spending a bit of time recently pushing a few Puppet modules to the Forge. This is Puppet Labs’ attempt to make a central repository of reusable Puppet modules. I started doing it as a bit of an experiment, to find out what I liked and what worked, and I decided to write up a few opinions.

So far I’ve shipped the following modules:

Quite a few of these started as forks of other modules but have evolved quite a bit towards being more reusable.

I’ve also started sending pull requests for modules that basically do what I want but don’t always play well with others.

Improved tools

It turns out the experience is mainly a pleasurable one, partly down to the much improved tooling around Puppet. Specifically I’m making extensive use of:

  • Rspec Puppet - for writing tests for module behaviours
  • Librarian Puppet - dependency management for modules
  • Puppet spec helper - conventions and helpers for testing modules
  • Travis CI - easy continuous integration for module code
  • Vagrant - manage virtual machines, useful for smoke testing on different distributions

Lots of those tools make testing Puppet modules both easier and more useful. Here’s an example of one of the above modules being tested. Note that it’s run across Ruby 1.8.7, 1.9.2 and 1.9.3 and Puppet versions 2.7.17, 2.7.18 and 3.0.1, for a total of 9 builds. Handily the Redis module mentioned also had a test suite. The pull request includes changes to that, and Travis automatically tested the pull request for the module’s author.
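A .travis.yml producing that kind of 3×3 matrix might look like the following (the PUPPET_VERSION variable name is an assumption; it would be read by the Gemfile to pin the puppet gem):

```yaml
language: ruby
rvm:
  - 1.8.7
  - 1.9.2
  - 1.9.3
env:
  - PUPPET_VERSION=2.7.17
  - PUPPET_VERSION=2.7.18
  - PUPPET_VERSION=3.0.1
script: bundle exec rake spec
```

Travis expands the rvm and env lists into one build per combination, which is where the 9 builds come from.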


Using modules from the Forge really forces you to think about reusability. The pull request mentioned above for the Redis module, for instance, replaced an explicit mention of the build-essential package with the puppetlabs/gcc class from the Forge. This makes the module less self-contained, but without that change the module is incompatible with any other module that also uses that common package. I also went back and replaced explicit references to wget and build-essential in my Riemann module.

As a rule of thumb: for a specific module, only include resources that are unique to the software the module manages. Anything else should live in another module, declared as a dependency in the Modulefile.
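In Puppet terms that rule looks roughly like this hypothetical sketch, using the gcc example from above:

```puppet
# Before: the module declares a common package itself, which conflicts with
# any other module that also declares build-essential.
package { 'build-essential':
  ensure => installed,
}

# After: depend on the shared puppetlabs/gcc module instead, and record the
# dependency in the Modulefile so the Forge knows about it.
include gcc
```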

This can feel a little much when you’re replacing a simple Package resource with a whole new module but it has two advantages I care about. As well as the ability to use the module with other third party modules more easily it also makes it more likely that the module will work cross platform.

What’s missing?

I’d like to see a few things improved when it comes to the Forge.

  • I’d like to be able to publish a new version of a module without having to use the web interface. The current workflow involves running a build command, then uploading the generated artifact via a web form after logging in.
  • I’d like to see best practice module development guides front and centre on the Forge. Lots of modules won’t work with other modules and I think that’s fixable.
  • Integration with puppet-lint would be nice, giving some indication of whether the authors care about the Puppet styleguide.
  • A command line search interface would be useful. And turns out to exist. Thanks @a1cy for the heads up.
  • The Forge tracks number of downloads, but as a publisher I don’t know how often my modules have been downloaded.
  • And finally I’d like to see more people using it.


Last week we shipped GOV.UK. Over the last year we’ve built a team to build a website. Now we’re busy building a culture too. I’ve got so much that needs writing up about everything we’ve been up to. Hopefully I’ll make a start in the next week or so.

Tale Of A Grok Pattern

I’m all of a sudden adding lots more code to GitHub. Here’s the latest project, grok patterns for logstash. At the moment this repo only contains one new pattern but I’m hoping to add more, and maybe even for others to add more too.

First, a bit of background. Logstash is the excellent, open source, log aggregation and processing framework. It takes inputs from various configurable places, processes them with filters and then outputs the results. So maybe you’ll take inputs from various application log files and output them into an Elasticsearch index for easy searching, or output the same inputs to Graphite and StatsD to get graphs of rates. One of the most powerful filters in logstash is the grok filter. It takes a grok pattern and parses out the information contained in the text into fields that can be more easily used by outputs. This post serves hopefully as both an explanation of why and an example of how you might do that.
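A minimal logstash configuration wiring those three stages together might look something like this (the file path, metric name and patterns directory are made-up examples, and COMBINEDAPACHELOG is one of the standard shipped patterns):

```
input {
  file {
    type => "web"
    path => "/var/log/app/access.log"
  }
}
filter {
  grok {
    type         => "web"
    pattern      => "%{COMBINEDAPACHELOG}"
    patterns_dir => "./patterns"
  }
}
output {
  elasticsearch { }
  statsd {
    increment => "web.requests"
  }
}
```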

The problem

Rails logs are horrible, that is until you install the excellent lograge output formatter. That gives you lines like:

<code>GET /jobs/833552.json format=json action=jobs#show status=200 duration=58.33 view=40.43 db=15.26</code>

This contains loads of useful information that’s easily parsable by a developer. We have the HTTP status code, the rails controller and information about response time too. A grok filter lets us teach logstash about that information too. The working grok filter for filtering this line looks like this:

The solution

<code>LOGRAGE %{WORD:method}%{SPACE}%{DATA}%{SPACE}action=%{WORD:controller}#%{WORD:action}%{SPACE}status=%{INT:status}%{SPACE}duration=%{NUMBER:duration}%{SPACE}view=%{NUMBER:view}(%{SPACE}db=%{NUMBER:db})?%{GREEDYDATA}</code>
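To make the mechanics concrete, here’s a toy re-implementation of what grok does under the hood: each %{PATTERN:field} expands to a named-capture regular expression. The base patterns here are deliberately simplified stand-ins for the real grok pattern library; this is not how logstash itself implements it.

```ruby
# Simplified versions of the standard grok base patterns.
PATTERNS = {
  "WORD"       => '\w+',
  "SPACE"      => '\s+',
  "DATA"       => '.*?',
  "INT"        => '\d+',
  "NUMBER"     => '\d+(?:\.\d+)?',
  "GREEDYDATA" => '.*',
}

# Expand %{PATTERN:field} references into a Regexp with named captures.
def grok_to_regexp(pattern)
  src = pattern.gsub(/%\{(\w+)(?::(\w+))?\}/) do
    base = PATTERNS.fetch(Regexp.last_match(1))
    name = Regexp.last_match(2)
    name ? "(?<#{name}>#{base})" : base
  end
  Regexp.new(src)
end

LOGRAGE = '%{WORD:method}%{SPACE}%{DATA}%{SPACE}action=%{WORD:controller}#%{WORD:action}' \
          '%{SPACE}status=%{INT:status}%{SPACE}duration=%{NUMBER:duration}' \
          '%{SPACE}view=%{NUMBER:view}(%{SPACE}db=%{NUMBER:db})?%{GREEDYDATA}'

line  = 'GET /jobs/833552.json format=json action=jobs#show status=200 duration=58.33 view=40.43 db=15.26'
match = grok_to_regexp(LOGRAGE).match(line)
puts match[:method]     # GET
puts match[:controller] # jobs
puts match[:status]     # 200
```

The named captures are exactly the fields logstash would hand on to its outputs.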

That was worked out pretty much with a bit of trial and error and use of the logstash Java binary, using stdin and stdout inputs and outputs. It works, but getting there wasn’t that much fun, and proving it works outside a running logstash setup was tricky. Enter Rspec and the grok implementation in pure Ruby. The project above contains an Rspec matcher for use when testing grok filters for logstash. I’ll probably extract that into a gem at some point but you’ll get the idea. Now we can write tests like these:

<code>the lograge grok pattern
  with a standard lograge log line
    should have the correct http method value
    should have the correct value for the request duration
    should have the correct value for the request view time
    should have the correct controller and action
    should have the correct value for db time
  without the db time
    should have the correct value for the request view time
  with a post request
    should have the correct http method value

Finished in 0.01472 seconds
7 examples, 0 failures</code>

The tests themselves are just basic Rspec with most of the work done in the custom matcher. This not only means I can be a bit more confident that my grok pattern works, it also provides a much nicer framework for writing more patterns for other log formats. Parsing rules like this are one area where test driven development is a huge boon in my experience. And with tests comes continuous integration, in this case via Travis.

I’ll hopefully find myself writing more patterns and tests for them, and if anyone wants to send pull requests to start collecting working grok patterns together, so much the better.

Riemann Puppet Module

Thanks to an errant tweet I started playing with Riemann again. It ticks lots of boxes for me, from the Clojure-based configuration as code to the overloadable dashboard application. What started as using Puppet and Vagrant to investigate Riemann turned into a full blown tool and module writing exercise, resulting in two related projects on GitHub.

  • garethr-riemann is a Puppet module for installing and configuring Riemann. It allows for easily specifying your own server configuration and dashboard views.
  • riemann-vagrant is a Vagrantfile and other code which uses above puppet module to setup a local testing environment.

I like this combination, a separate Puppet module along with a vagrant powered test bed. I’ve written a reasonable rspec based test suite to check the module but it’s always easier to be able to run vagrant provision as well to check everything is working. This also turned out to be the perfect opportunity to use Librarian-Puppet to manage the dependencies and eventually to ship the module to the Puppet Forge.
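For anyone who hasn’t used Librarian-Puppet: dependencies live in a Puppetfile alongside the module. Something like the following hypothetical example (the gcc dependency here is illustrative, not the module’s actual dependency list):

```
forge "http://forge.puppetlabs.com"

mod "garethr/riemann"
mod "puppetlabs/gcc"
```

Running librarian-puppet install then resolves and fetches the modules into a local directory, which is also handy for pointing Vagrant’s Puppet provisioner at.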

The Story

A few weeks ago a website I maintain for third party hosted Vagrant base boxes disappeared from the internet for a few days. This was completely my fault; the (lovely) hosting people had unfortunately closed down the service they had in beta and I’d been so busy that I hadn’t had a chance to move it elsewhere.

The original version of the site (I had the code and good backups of the data) was a pretty simple Django application, but I’d used it to experiment (read over-engineer) with various bits of tech including Varnish, Solr, some ORM caching and lots more. This had been great, but it made it less portable. I had everything described in Puppet, but with virtually no spare time I decided to go a different route.

I threw a flat version of the site up on GitHub, served it using Nginx on Heroku and added a quick Fork me on GitHub badge to the top. Suggest a box moved from being a web form to a pull request. It’s fair to say I did this pretty quickly and made a good few typos on the way. But within a couple of weeks I’d had 8 pull requests either fixing my bugs, removing dead boxes or adding new ones.

What I’m going to take from this is: if you’re building a community project that’s aimed at developers, throw the content on GitHub. In my case I have the entire site on there too, but I think that’s secondary. Pull requests are much better than any content management system or workflow you’re likely to build, and even more importantly the time to implement something drops hugely.

With all the spare time I don’t have I’ll be thinking about a content management model using GitHub for content, pull requests for workflow and post commit hooks for loading that content into a site or service somewhere.