On the forge

I’ve been spending a bit of time recently pushing a few Puppet modules to the Forge. This is Puppet Labs’ attempt to make a central repository of reusable Puppet modules. I started doing it as a bit of an experiment, to find out what I liked and what worked, and I decided to write up a few opinions.

So far I’ve shipped the following modules:

Quite a few of these started as forks of other modules but have evolved quite a bit towards being more reusable.

I’ve also started sending pull requests for modules that basically do what I want but don’t always play well with others.

Improved tools

It turns out the experience is mainly a pleasurable one, partly down to the much improved tooling around Puppet. Specifically I’m making extensive use of:

  • Rspec Puppet - for writing tests for module behaviour
  • Librarian Puppet - dependency management for modules
  • Puppet spec helper - conventions and helpers for testing modules
  • Travis CI - easy continuous integration for module code
  • Vagrant - manage virtual machines, useful for smoke testing on different distributions

Lots of those tools make testing Puppet modules both easier and more useful. Here’s an example of one of the above modules being tested. Note that it’s run across Ruby 1.8.7, 1.9.2 and 1.9.3 and Puppet versions 2.7.17, 2.7.18 and 3.0.1 for a total of 9 builds. Handily the Redis module mentioned also had a test suite. The pull request includes changes to that, and Travis automatically tested the pull request for the module’s author.
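The 9 builds are just every Ruby version paired with every Puppet version. As a rough sketch of that matrix (the constant names are mine, not anything Travis uses), it can be spelled out in a few lines of Ruby:

```ruby
# Illustrative only: the version lists come from the post, the constant
# names are hypothetical and not part of any Travis configuration.
RUBIES  = %w[1.8.7 1.9.2 1.9.3]
PUPPETS = %w[2.7.17 2.7.18 3.0.1]

# Every Ruby version paired with every Puppet version.
BUILDS = RUBIES.product(PUPPETS)

BUILDS.each do |ruby, puppet|
  puts "ruby #{ruby} / puppet #{puppet}"
end

puts "#{BUILDS.size} builds in total"
```

Adding one more Ruby or Puppet version adds a whole row or column of builds, which is why the matrix grows quickly.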


Using modules from the Forge really forces you to think about reusability. The pull request mentioned above for the Redis module, for instance, replaced an explicit mention of the build-essential package with the “puppetlabs/gcc” class from the Forge. This makes the module less self-contained, but without that change the module is incompatible with any other module that also uses that common package. I also went back and replaced explicit references to wget and build-essential in my Riemann module.

As a rule of thumb, a module should only include resources that are unique to the software it manages. Anything else should be in another module, with a dependency declared in the Modulefile.

This can feel a little much when you’re replacing a simple Package resource with a whole new module, but it has two advantages I care about. It makes the module easier to use alongside other third party modules, and it makes it more likely that the module will work cross platform.

What’s missing?

I’d like to see a few things improved when it comes to the Forge.

  • I’d like to be able to publish a new version of a module without having to use the web interface. The current workflow involves running a build command, then uploading the generated artifact via a web form after logging in.
  • I’d like to see best practice module development guides front and centre on the Forge. Lots of modules won’t work with other modules and I think that’s fixable.
  • Integration with puppet-lint would be nice, giving some indication of whether the authors care about the Puppet style guide.
  • A command line search interface would be useful. And turns out to exist. Thanks @a1cy for the heads up.
  • The Forge tracks number of downloads, but as a publisher I don’t know how often my modules have been downloaded.
  • And finally I’d like to see more people using it.


Last week we shipped GOV.UK. Over the last year we’ve built a team to build a website. Now we’re busy building a culture too. I’ve got so much that needs writing up about everything we’ve been up to. Hopefully I’ll make a start in the next week or so.

Tale Of A Grok Pattern

I’m all of a sudden adding lots more code to GitHub. Here’s the latest project, grok patterns for logstash. At the moment this repo only contains one new pattern but I’m hoping to add more, and maybe even for others to add more too.

First, a bit of background. Logstash is the excellent, open source, log aggregation and processing framework. It takes inputs from various configurable places, processes them with filters and then outputs the results. So maybe you’ll take inputs from various application log files and output them into an Elasticsearch index for easy searching, or output the same inputs to graphite and statsd to get graphs of rates. One of the most powerful filters in logstash is the grok filter. It takes a grok pattern and parses out information contained in the text into fields that can be more easily used by outputs. This post serves hopefully as both an explanation of why and an example of how you might do that.

The problem

Rails logs are horrible, that is until you install the excellent lograge output formatter. That gives you lines like:

GET /jobs/833552.json format=json action=jobs#show status=200 duration=58.33 view=40.43 db=15.26

This contains loads of useful information that’s easily parsable by a developer. We have the HTTP status code, the rails controller and information about response time too. A grok filter lets us teach logstash about that information too. The working grok filter for filtering this line looks like this:

The solution

LOGRAGE %{WORD:method}%{SPACE}%{DATA}%{SPACE}action=%{WORD:controller}#%{WORD:action}%{SPACE}status=%{INT:status}%{SPACE}duration=%{NUMBER:duration}%{SPACE}view=%{NUMBER:view}(%{SPACE}db=%{NUMBER:db})?%{GREEDYDATA}

That was worked out pretty much with a bit of trial and error and use of the logstash java binary, using stdin and stdout inputs and outputs. It works, but getting there wasn’t that much fun, and proving it works outside a running logstash setup was tricky. Enter rspec and the grok implementation in pure Ruby. The project above contains an Rspec matcher for use when testing grok filters for logstash. I’ll probably extract that into a gem at some point but you’ll get the idea. Now we can write tests like these:

the lograge grok pattern
  with a standard lograge log line
    should have the correct http method value
    should have the correct value for the request duration
    should have the correct value for the request view time
    should have the correct controller and action
    should have the correct value for db time
  without the db time
    should have the correct value for the request view time
  with a post request
    should have the correct http method value

Finished in 0.01472 seconds
7 examples, 0 failures

The tests themselves are just basic Rspec with most of the work done in the custom matcher. This not only means I can be a bit more confident that my grok pattern works, it also provides a much nicer framework for writing more patterns for other log formats. Parsing rules like this are one area where test driven development is a huge boon in my experience. And with tests comes continuous integration, in this case via Travis.
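To get a feel for what the pattern captures without logstash or the grok library, here’s a rough sketch that hand-translates it into a plain Ruby regular expression. The translation is my approximation of the grok primitives (WORD as \w+, SPACE as \s*, NUMBER as an optionally signed decimal). It isn’t the project’s matcher, and the parse_lograge helper name is made up for illustration.

```ruby
# Approximate, hand-written translation of the LOGRAGE grok pattern.
# NUMBER stands in for grok's %{NUMBER}; this is an assumption, not the
# exact definition shipped with grok.
NUMBER = '[+-]?\d+(?:\.\d+)?'

LOGRAGE = Regexp.new(
  '(?<method>\w+)\s*.*?\s*' +
  'action=(?<controller>\w+)#(?<action>\w+)\s*' +
  'status=(?<status>[+-]?\d+)\s*' +
  "duration=(?<duration>#{NUMBER})\\s*" +
  "view=(?<view>#{NUMBER})" +
  "(?:\\s*db=(?<db>#{NUMBER}))?" +   # db time is optional, as in the pattern
  '.*'
)

# Hypothetical helper: returns a hash of field name => captured string,
# or nil if the line doesn't look like a lograge line.
def parse_lograge(line)
  match = LOGRAGE.match(line)
  match ? match.named_captures : nil
end

line = 'GET /jobs/833552.json format=json action=jobs#show ' \
       'status=200 duration=58.33 view=40.43 db=15.26'

parse_lograge(line).each do |field, value|
  puts "#{field}: #{value}"
end
```

All captured values come back as strings, so anything feeding graphs would still need converting to numbers.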

I’ll hopefully find myself writing more patterns and tests for them, and if anyone wants to send pull requests so we can start collecting working grok patterns together, so much the better.

Riemann Puppet Module

Thanks to an errant tweet I started playing with Riemann again. It ticks lots of boxes for me, from the Clojure to configuration as code and the overloadable dashboard application. What started as using Puppet and Vagrant to investigate Riemann turned into a full blown tool and module writing exercise, resulting in two related projects on GitHub.

  • garethr-riemann is a Puppet module for installing and configuring Riemann. It allows for easily specifying your own server configuration and dashboard views.
  • riemann-vagrant is a Vagrantfile and other code which uses the above Puppet module to set up a local testing environment.

I like this combination, a separate Puppet module along with a vagrant powered test bed. I’ve written a reasonable rspec based test suite to check the module but it’s always easier to be able to run vagrant provision as well to check everything is working. This also turned out to be the perfect opportunity to use Librarian-Puppet to manage the dependencies and eventually to ship the module to the Puppet Forge.

The Vagrantbox.es Story

A few weeks ago now Vagrantbox.es (a website I maintain for third party hosted Vagrant base boxes) disappeared from the internet for a few days. This was completely my fault: the (lovely) hosting people ep.io had unfortunately closed down the service they had in beta and I’d been so busy that I hadn’t had a chance to move it elsewhere.

The original version of the site (I had the code and good backups of the data) was a pretty simple Django application, but I’d used it to experiment (read over-engineer) with various bits of tech including Varnish, Solr, some ORM caching and lots more. This had been great, but it made it less portable. I had everything described in Puppet, but with virtually no spare time I decided to go a different route.

I threw a flat version of the site up on GitHub, served it using Nginx on Heroku and added a quick Fork me on GitHub badge to the top. Suggest a box moved from being a web form to a pull request. It’s fair to say I did this pretty quickly and made a good few typos on the way. But within a couple of weeks I’ve had 8 pull requests fixing my bugs, removing dead boxes and adding new ones.

What I’m going to take from this is, if you’re building a community project that’s aimed at developers, then throw the content on GitHub. In my case I have the entire site on there too but I think that’s secondary. Pull requests are much better than any content management system or workflow you’re likely to build, and even more importantly the time to implement something drops hugely.

With all the spare time I don’t have I’ll be thinking about a content management model using GitHub for content, pull requests for workflow and post commit hooks for loading that content into a site or service somewhere.

Static Sites With Nginx On Heroku

I have a few static sites on Heroku but in one case in particular I already had quite an involved nginx configuration - mainly 410s for some previous content and a series of redirects from older versions of the site. The common way of having static sites on Heroku appears to be to use a simple Rack middleware, but that would have meant reimplementing lots of boring redirect logic.

Heroku buildpacks are great. The newer Cedar stack is no longer tied to a particular language or framework; instead all of the discovery and knowledge about particular software is put into a buildpack. As well as the Heroku provided list it’s possible to write your own. Or in this case use one someone has created earlier.

I’ve just moved Vagrantbox.es over to Heroku due to the closure of a previous service. In doing that, instead of the simple database backed app, I’ve simply thrown all the content onto GitHub. This means anyone can fork the content and send pull requests. Hopefully this should mean I pay a bit more attention to suggestions and new boxes.

The repository is a nice simple example of using the mentioned Heroku Nginx buildpack too. You just run the following command to create a new Heroku application.

heroku create --stack cedar --buildpack http://github.com/essh/heroku-buildpack-nginx.git

And then in typical Heroku fashion use a git remote to deploy changes and updates. The repository is split into a www folder with the site content and a conf folder with the nginx configuration. The only clever parts involve the use of an ERB template for the nginx configuration file so we can pick up the correct port. We also use 1 worker process and don’t automatically daemonize the process - Heroku deals with this itself.
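As a rough illustration of the ERB trick, here’s a minimal sketch that renders an nginx configuration with the port filled in. Heroku hands the port to the process in the PORT environment variable. The template contents, the render_nginx_conf helper and the 5000 fallback are my own stand-ins, not the actual files from the repository or the buildpack.

```ruby
require 'erb'

# Hypothetical, cut-down template. The real conf folder will contain more.
NGINX_TEMPLATE = <<~CONF
  daemon off;             # Heroku manages the process itself
  worker_processes 1;

  events { worker_connections 1024; }

  http {
    server {
      listen <%= port %>;
      root www;           # assumed location of the site content
    }
  }
CONF

# Render the template with the given port substituted in.
def render_nginx_conf(port)
  ERB.new(NGINX_TEMPLATE).result_with_hash(port: port)
end

# Fall back to 5000 locally (an arbitrary choice) when PORT isn't set.
puts render_nginx_conf(ENV.fetch('PORT', '5000'))
```

The point is simply that the port isn’t known until the process starts, so the configuration has to be generated at boot rather than committed as a static file.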

Self Contained JRuby Web Applications

Several things seemed to come together at once to make me want to hack on this particular project. In no particular order:

The Thoughtworks Technology Radar said the following:

Embedding a servlet container, such as Jetty, inside a Java application has many advantages over running the application inside a container. Testing is relatively painless because of the simple startup, and the development environment is closer to production. Nasty surprises like mismatched versions of libraries or drivers are eliminated by not sharing across multiple applications. While you will have to manage and monitor multiple Java Virtual Machines in production using this model, we feel the advantages offered by the simplicity and isolation are significant.

I’ve been getting more interested in JRuby anyway, partly because we’re finding ourselves using both Ruby and Scala at work, and maintaining a single target platform makes sense to me. Throw in the potential for interop between those languages and it’s certainly worth investigating.

Play 2.0 shipped and currently only provides the ability to create a self contained executable with bundled web server. Creating WAR files for more traditional application servers is on the roadmap but interestingly wasn’t deemed essential for the big 2.0 release. I had a nice chat with Martyn Inglis at work about some of the nice side effects of this setup.

And throw in the fact that every time I have to configure straight Ruby applications for production environments I get cross. I know where all the bits and pieces are buried and can do it well, but with so many moving parts it’s absolutely no fun whatsoever.

Warbler, the JRuby tool for creating WAR files from Ruby source, has just added the ability to embed Jetty to the master branch.

I decided to take all of this for a quick spin, and the resulting code is up on GitHub.

This is the simplest Rack application possible; it just prints Hello Jetty. And the README covers how to install and run it so I won’t duplicate that information here.
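For anyone who hasn’t seen one, a Rack application is just an object that responds to call and returns a status, headers and body. The following is a sketch of roughly what such an app looks like, not a copy of the code in the repository. The HELLO_APP name is made up, and the app is called directly here so the example runs without Rack or Jetty installed.

```ruby
# A minimal Rack-style application: takes the request environment and
# returns [status, headers, body]. HELLO_APP is a hypothetical name.
HELLO_APP = lambda do |_env|
  body = 'Hello Jetty'
  headers = {
    'Content-Type'   => 'text/plain',
    'Content-Length' => body.bytesize.to_s
  }
  [200, headers, [body]]
end

# In a real config.ru this would be followed by: run HELLO_APP
# Here we call it by hand with an empty environment instead.
status, headers, body = HELLO_APP.call({})
puts status
puts headers['Content-Type']
puts body.join
```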

But I will print some nearly meaningless and unscientific benchmarks because, hey, who doesn’t like those?

⚡ ab -c 50 -n 5000 http://localhost:8090/

Server Software:        Jetty(8.y.z-SNAPSHOT)
Server Hostname:        localhost
Server Port:            8090

Document Path:          /
Document Length:        16 bytes

Concurrency Level:      50
Time taken for tests:   1.827 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      555999 bytes
HTML transferred:       80144 bytes
Requests per second:    2736.47 [#/sec] (mean)
Time per request:       18.272 [ms] (mean)
Time per request:       0.365 [ms] (mean, across all concurrent requests)
Transfer rate:          297.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   2.2      1      18
Processing:     1   16   7.7     15      61
Waiting:        0   14   7.2     13      57
Total:          2   18   7.5     17      61

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     19
  75%     21
  80%     22
  90%     27
  95%     30
  98%     42
  99%     52
 100%     61 (longest request)

Running the same test on the same machine but using Ruby 1.9.2-p290 and Thin gives:

Server Software:        thin
Server Hostname:        localhost
Server Port:            9292

Document Path:          /
Document Length:        16 bytes

Concurrency Level:      50
Time taken for tests:   3.125 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      620620 bytes
HTML transferred:       80080 bytes
Requests per second:    1600.16 [#/sec] (mean)
Time per request:       31.247 [ms] (mean)
Time per request:       0.625 [ms] (mean, across all concurrent requests)
Transfer rate:          193.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       9
Processing:     3   31   6.4     33      52
Waiting:        3   25   6.4     28      47
Total:          4   31   6.4     33      52

Percentage of the requests served within a certain time (ms)
  50%     33
  66%     34
  75%     34
  80%     35
  90%     38
  95%     41
  98%     46
  99%     50
 100%     52 (longest request)

2736 requests per second on JRuby/Jetty vs 1600 on Ruby/Thin. As noted this isn’t meaningfully useful, in that it’s a hello world example and I’ve not tried to pick the fastest stacks on either side. I’m more bothered about it not being slower, because the main reason to pursue this approach is simplicity. Having a single self-contained artefact that contains all its dependencies, including a production web server, is very appealing.

I’m hoping to give this a go with some less trivial applications, and probably more importantly look to compare a production stack based around these self-contained executables vs the dependency chain that is modern Ruby application stacks.

Thanks to Nick Sieger for both writing Warbler and for helping with a few questions on the JRuby mailing list and on Twitter. Thanks also to James Abley for a few pointers on Java system properties.

Recent Projects And Talks

I’ve been pretty busy with all things GOV.UK recently but I’ve managed to get a few bits of unrelated code up and a few talks in. I’m still pretty busy so here’s a list of some of them rather than a proper blog post.

  • Puppet Data Mining talk from last week’s PuppetCamp in Edinburgh.
  • Introducing Web Operations talk I gave at work to give my mainly non-development colleagues an idea about what it’s all about.
  • Learning from building GOV.UK talk I gave a month back or so to Cambridge Geek Night. We did an excellent full project retrospective after the beta launch and this lists some of the things we learnt.

After someone bugged me on Twitter I realised the small bit of code we’ve been using for our Nagios dashboard wasn’t out in the wild. So introducing Nash, a very simple high level check dashboard which screenscrapes Nagios and runs happily on Heroku.

Although I’ve not been writing too much on here I’ve been keeping Devops Weekly going each week for over a year now. I’ve just crossed 3000 subscribers which is pretty neat for a pet project.

Dashboards At Gov.Uk

This is a bit of a cheat blog post really. I’ve been crazy busy all month with little time for anything except work (specifically shipping the first release of www.gov.uk). I have had a little time to blog over on the Cabinet Office blog though, about work we’ve done with dashboards.


If you’re ever looking for good little hack projects dashboards are perfect, and often hugely useful once up and running. Convincing people of this before you have a few in the office might be hard - so just build something simple in a lunch break and find a screen to put it on. We’ve had great feedback from ours, both from people wandering through the office and from our colleagues who have a better idea of what’s going on.
