Riemann Puppet Module

Thanks to an errant tweet I started playing with Riemann again. It ticks lots of boxes for me, from the Clojure to the configuration as code and the overloadable dashboard application. What started as using Puppet and Vagrant to investigate Riemann turned into a full-blown tool and module writing exercise, resulting in two related projects on GitHub.

  • garethr-riemann is a Puppet module for installing and configuring Riemann. It allows for easily specifying your own server configuration and dashboard views.
  • riemann-vagrant is a Vagrantfile and other code which uses the above Puppet module to set up a local testing environment.

I like this combination: a separate Puppet module along with a Vagrant-powered test bed. I’ve written a reasonable rspec-based test suite to check the module, but it’s always easier to be able to run vagrant provision as well to check everything is working. This also turned out to be the perfect opportunity to use Librarian-Puppet to manage the dependencies and eventually to ship the module to the Puppet Forge.

The Vagrantbox.es Story

A few weeks ago now Vagrantbox.es (a website I maintain for third-party hosted Vagrant base boxes) disappeared from the internet for a few days. This was completely my fault: the (lovely) hosting people ep.io had unfortunately closed down the service they had in beta, and I’d been so busy that I hadn’t had the chance to move it elsewhere.

The original version of the site (I had the code and good backups of the data) was a pretty simple Django application, but I’d used it to experiment (read over-engineer) with various bits of tech including Varnish, Solr, some ORM caching and lots more. This had been great, but it made it less portable. I had everything described in Puppet, but with virtually no spare time I decided to go a different route.

I threw a flat version of the site up on GitHub, served it using Nginx on Heroku and added a quick “Fork me on GitHub” badge to the top. Suggesting a box moved from being a web form to being a pull request. It’s fair to say I did this pretty quickly and made a good few typos on the way. But within a couple of weeks I’ve had 8 pull requests fixing my bugs, removing dead boxes and adding new ones.

What I’m going to take from this is that if you’re building a community project aimed at developers, you should throw the content on GitHub. In my case I have the entire site on there too, but I think that’s secondary. Pull requests are much better than any content management system or workflow you’re likely to build, and even more importantly the time to implement something drops hugely.

With all the spare time I don’t have I’ll be thinking about a content management model using GitHub for content, pull requests for workflow and post commit hooks for loading that content into a site or service somewhere.
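Here is a minimal Python sketch of the "post commit hook" half of that model. It assumes a GitHub push webhook, whose JSON payload lists `added`, `modified` and `removed` paths for each commit (assumed here to be listed oldest first). The `content/` prefix and the `content_changes` name are hypothetical. The function works out which content files a loader would need to refresh or delete.

```python
import json


def content_changes(payload, prefix="content/"):
    """Summarise which content files a push touched.

    Returns (to_load, to_delete) as sorted lists of paths under `prefix`.
    Later commits win, so a file added then removed ends up only in to_delete.
    """
    to_load, to_delete = set(), set()
    for commit in payload.get("commits", []):  # assumed oldest first
        for path in commit.get("added", []) + commit.get("modified", []):
            if path.startswith(prefix):
                to_load.add(path)
                to_delete.discard(path)
        for path in commit.get("removed", []):
            if path.startswith(prefix):
                to_delete.add(path)
                to_load.discard(path)
    return sorted(to_load), sorted(to_delete)


if __name__ == "__main__":
    example = json.loads("""
    {"commits": [
      {"added": ["content/boxes/new.md"], "modified": ["README"], "removed": []},
      {"added": [], "modified": [], "removed": ["content/boxes/dead.md"]}
    ]}
    """)
    print(content_changes(example))
```

A real hook would also need to verify the request came from GitHub before acting on it.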

Static Sites With Nginx On Heroku

I have a few static sites on Heroku but in one case in particular I already had quite an involved nginx configuration - mainly 410s for some previous content and a series of redirects from older versions of the site. The common way of having static sites on Heroku appears to be to use a simple Rack middleware, but that would have meant reimplementing lots of boring redirect logic.

Heroku buildpacks are great. The newer Cedar stack is no longer tied to a particular language or framework; instead, all of the discovery and knowledge about particular software is put into a buildpack. As well as the Heroku-provided list, it’s possible to write your own, or in this case use one someone has created earlier.

I’ve just moved Vagrantbox.es over to Heroku due to the closure of a previous service. In doing that, instead of the simple database-backed app, I’ve simply thrown all the content onto GitHub. This means anyone can fork the content and send pull requests. Hopefully this should mean I pay a bit more attention to suggestions and new boxes.

The repository is a nice simple example of using the mentioned Heroku Nginx buildpack too. You just run the following command to create a new Heroku application.

heroku create --stack cedar --buildpack http://github.com/essh/heroku-buildpack-nginx.git

And then, in typical Heroku fashion, use a git remote to deploy changes and updates. The repository is split into a www folder with the site content and a conf folder with the nginx configuration. The only clever parts involve the use of an ERB template for the nginx configuration file so we can pick up the correct port. We also use 1 worker process and don’t automatically daemonize the process, as Heroku deals with this itself.
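The repository does this with an ERB template. The sketch below is a rough Python analogue using `string.Template`, not the buildpack's actual mechanism. It fills in the `PORT` environment variable Heroku assigns, and keeps `worker_processes 1` and `daemon off`. The 410 and redirect locations are hypothetical stand-ins for the kind of rules mentioned earlier, and the config is deliberately incomplete (no `mime.types`, logging and so on).

```python
import os
from string import Template

# Hypothetical, minimal nginx.conf; $port is the only template placeholder.
NGINX_TEMPLATE = Template("""\
daemon off;            # Heroku supervises the process itself
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    server {
        listen $port;
        root www;

        # Content that has gone for good (hypothetical path)
        location /old-feature/ { return 410; }

        # Redirect from an older URL scheme (hypothetical paths)
        location = /boxes.html { return 301 /; }
    }
}
""")


def render_nginx_conf(env=None):
    """Render the config using the port Heroku passes in the PORT variable."""
    env = os.environ if env is None else env
    port = int(env.get("PORT", "5000"))  # local fallback port is an assumption
    return NGINX_TEMPLATE.substitute(port=port)


if __name__ == "__main__":
    print(render_nginx_conf())
```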

Self Contained Jruby Web Applications

Several things seemed to come together at once to make me want to hack on this particular project. In no particular order:

The Thoughtworks Technology Radar said the following:

Embedding a servlet container, such as Jetty, inside a Java application has many advantages over running the application inside a container. Testing is relatively painless because of the simple startup, and the development environment is closer to production. Nasty surprises like mismatched versions of libraries or drivers are eliminated by not sharing across multiple applications. While you will have to manage and monitor multiple Java Virtual Machines in production using this model, we feel the advantages offered by the simplicity and isolation are significant.

I’ve been getting more interested in JRuby anyway, partly because we’re finding ourselves using both Ruby and Scala at work, and maintaining a single target platform makes sense to me. Throw in the potential for interop between those languages and it’s certainly worth investigating.

Play 2.0 shipped and currently only provides the ability to create a self contained executable with bundled web server. Creating WAR files for more traditional application servers is on the roadmap but interestingly wasn’t deemed essential for the big 2.0 release. I had a nice chat with Martyn Inglis at work about some of the nice side effects of this setup.

And throw in the fact that every time I have to configure straight Ruby applications for production environments I get cross. I know where all the bits and pieces are buried and can do it well, but with so many moving parts it’s absolutely no fun whatsoever.

Warbler, the JRuby tool for creating WAR files from Ruby source, has just added the ability to embed Jetty to the master branch.

I decided to take all of this for a quick spin, and the resulting code is up on GitHub.

This is the simplest Rack application possible; it just prints Hello Jetty. The README covers how to install and run it, so I won’t duplicate that information here.

But I will print some nearly meaningless and unscientific benchmarks because, hey, who doesn’t like those?

⚡ ab -c 50 -n 5000 http://localhost:8090/

Server Software:        Jetty(8.y.z-SNAPSHOT)
Server Hostname:        localhost
Server Port:            8090

Document Path:          /
Document Length:        16 bytes

Concurrency Level:      50
Time taken for tests:   1.827 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      555999 bytes
HTML transferred:       80144 bytes
Requests per second:    2736.47 [#/sec] (mean)
Time per request:       18.272 [ms] (mean)
Time per request:       0.365 [ms] (mean, across all concurrent requests)
Transfer rate:          297.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   2.2      1      18
Processing:     1   16   7.7     15      61
Waiting:        0   14   7.2     13      57
Total:          2   18   7.5     17      61

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     19
  75%     21
  80%     22
  90%     27
  95%     30
  98%     42
  99%     52
 100%     61 (longest request)

Running the same test on the same machine but using Ruby 1.9.2-p290 and Thin gives the following.

Server Software:        thin
Server Hostname:        localhost
Server Port:            9292

Document Path:          /
Document Length:        16 bytes

Concurrency Level:      50
Time taken for tests:   3.125 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      620620 bytes
HTML transferred:       80080 bytes
Requests per second:    1600.16 [#/sec] (mean)
Time per request:       31.247 [ms] (mean)
Time per request:       0.625 [ms] (mean, across all concurrent requests)
Transfer rate:          193.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       9
Processing:     3   31   6.4     33      52
Waiting:        3   25   6.4     28      47
Total:          4   31   6.4     33      52

Percentage of the requests served within a certain time (ms)
  50%     33
  66%     34
  75%     34
  80%     35
  90%     38
  95%     41
  98%     46
  99%     50
 100%     52 (longest request)

2736 requests per second on JRuby/Jetty vs 1600 on Ruby/Thin. As noted, this isn’t meaningfully useful: it’s a hello world example and I’ve not tried to pick the fastest stacks on either side. I’m more bothered about it not being slower, because the main reason to pursue this approach is simplicity. Having a single self-contained artefact that contains all its dependencies, including a production web server, is very appealing.
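As a quick check on the arithmetic, here is a small Python helper that pulls headline figures out of ab’s text report. The function and field names are my own, and the regexes assume ab’s standard report layout as shown above. It also recomputes ab’s mean "time per request", which is concurrency × elapsed time ÷ completed requests, and the JRuby/Jetty vs Ruby/Thin ratio.

```python
import re


def parse_ab(output):
    """Extract a few headline numbers from ApacheBench (ab) output."""
    patterns = {
        "concurrency": r"Concurrency Level:\s+(\d+)",
        "seconds": r"Time taken for tests:\s+([\d.]+) seconds",
        "requests": r"Complete requests:\s+(\d+)",
        "rps": r"Requests per second:\s+([\d.]+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"missing field in ab output: {key}")
        result[key] = float(match.group(1))
    return result


# Excerpts of the two reports above.
JETTY = """Concurrency Level:      50
Time taken for tests:   1.827 seconds
Complete requests:      5000
Requests per second:    2736.47 [#/sec] (mean)"""

THIN = """Concurrency Level:      50
Time taken for tests:   3.125 seconds
Complete requests:      5000
Requests per second:    1600.16 [#/sec] (mean)"""

jetty, thin = parse_ab(JETTY), parse_ab(THIN)
# Mean time per request in ms: concurrency * elapsed * 1000 / requests.
print(round(1000 * jetty["concurrency"] * jetty["seconds"] / jetty["requests"], 2))  # 18.27
print(round(jetty["rps"] / thin["rps"], 2))  # 1.71
```

So on this trivial test the JRuby/Jetty stack handled roughly 1.7 times the requests per second. ab reports 18.272 ms rather than 18.27 because it uses a more precise elapsed time than the rounded figure it prints.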

I’m hoping to give this a go with some less trivial applications and, probably more importantly, to compare a production stack based around these self-contained executables with the dependency chain that makes up modern Ruby application stacks.

Thanks to Nick Sieger for both writing Warbler and for helping with a few questions on the JRuby mailing list and on Twitter. Thanks also to James Abley for a few pointers on Java system properties.

Recent Projects And Talks

I’ve been pretty busy with all things GOV.UK recently but I’ve managed to get a few bits of unrelated code up and a few talks in. I’m still pretty busy so here’s a list of some of them rather than a proper blog post.

  • Puppet Data Mining talk from last week’s PuppetCamp in Edinburgh.
  • Introducing Web Operations talk I gave at work to give my mainly non-development colleagues an idea of what it’s all about.
  • Learning from building GOV.UK talk I gave a month back or so to Cambridge Geek Night. We did an excellent full project retrospective after the beta launch and this lists some of the things we learnt.

After someone bugged me on Twitter I realised the small bit of code we’ve been using for our Nagios dashboard wasn’t out in the wild. So introducing Nash, a very simple high-level check dashboard which screen-scrapes Nagios and runs happily on Heroku.

Although I’ve not been writing too much on here I’ve been keeping Devops Weekly going each week for over a year now. I’ve just crossed 3000 subscribers which is pretty neat for a pet project.

Dashboards At GOV.UK

This is a bit of a cheat blog post really. I’ve been crazy busy all month with little time for anything except work (specifically shipping the first release of www.gov.uk). I have had a little time to blog over on the Cabinet Office blog though, about work we’ve done with dashboards.


If you’re ever looking for good little hack projects, dashboards are perfect, and often hugely useful once up and running. Convincing people of this before you have a few in the office might be hard, so just build something simple in a lunch break and find a screen to put it on. We’ve had great feedback from ours, both from people wandering through the office and from our colleagues, who have a better idea of what’s going on.

What's Jekyll?

Jekyll is a static site generator, an open-source tool for creating simple yet powerful websites of all shapes and sizes. From the project’s readme:

Jekyll is a simple, blog aware, static site generator. It takes a template directory […] and spits out a complete, static website suitable for serving with Apache or your favorite web server. This is also the engine behind GitHub Pages, which you can use to host your project’s page or blog right here from GitHub.

It’s an immensely useful tool and one we encourage you to use here with Hyde.

Find out more by visiting the project on GitHub.
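To illustrate the general idea of "a template directory in, a static website out", here is a tiny Python sketch. It is not how Jekyll works: Jekyll is Ruby, uses Liquid templates and parses real YAML front matter. This toy instead reads simple `key: value` lines between `---` markers at the top of each page, drops the page body into a single layout and writes the result to an output directory. All names are hypothetical.

```python
from pathlib import Path
from string import Template

LAYOUT = Template(
    "<html><head><title>$title</title></head><body>$content</body></html>\n"
)


def split_front_matter(text):
    """Split '---'-delimited key: value lines from the body (not full YAML)."""
    meta = {}
    if text.startswith("---\n"):
        header, sep, body = text[4:].partition("\n---\n")
        if sep:  # only treat it as front matter if the block is closed
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            return meta, body
    return meta, text


def build(source_dir, output_dir):
    """Render every .html page in source_dir into output_dir via LAYOUT."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(source_dir).glob("*.html")):
        meta, body = split_front_matter(page.read_text())
        html = LAYOUT.substitute(title=meta.get("title", page.stem), content=body)
        (out / page.name).write_text(html)
```

The output is plain files, which is why any web server, or GitHub Pages, can serve it directly.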

Talking To Jenkins From Campfire With Hubot

In what turned out to be a productive holiday hacking with languages I’d not used before, I got round to writing some CoffeeScript on Node.js. This was more to do with scratching a personal itch than pure experimentation. I had a play with Janky (GitHub’s Jenkins/Hubot mashup) and found it a little opinionated on the Jenkins side, although the Campfire integration is excellent. Looking at the Jenkins commands in hubot-scripts, though, I found those even more opinionated.

The magic of open source, though, is that you can just fix things and then ask nice people if they like what you’ve done. I set about writing a few more general commands and lo, they’ve been quickly merged upstream.

These add:

  • A command to list all your Jenkins jobs and the current state
  • A command to trigger a normal build
  • A command to trigger a build with a list of parameters
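The merged commands are CoffeeScript. To keep one language across this page’s examples, here is a hedged Python sketch of the same three operations against Jenkins’ JSON API, not the hubot-scripts code itself. `/api/json`, `/job/NAME/build` and `/job/NAME/buildWithParameters` are standard Jenkins endpoints, and the job state comes from each job’s ball colour, where an `_anime` suffix means a build is in progress. The state labels and function names are my own, and authentication and CSRF crumbs are ignored.

```python
import json
from urllib.parse import quote, urlencode

# Jenkins reports job state as a ball colour (labels here are my own).
COLOUR_STATES = {
    "blue": "PASSING",
    "red": "FAILING",
    "yellow": "UNSTABLE",
    "aborted": "ABORTED",
    "disabled": "DISABLED",
    "notbuilt": "NOT BUILT",
}


def describe_jobs(api_json):
    """Format the body of GET <jenkins>/api/json as one line per job."""
    lines = []
    for job in json.loads(api_json).get("jobs", []):
        colour = job.get("color", "")
        running = colour.endswith("_anime")  # animated ball = build running
        base = colour[: -len("_anime")] if running else colour
        state = COLOUR_STATES.get(base, base.upper())
        lines.append(f"{state}{' (building)' if running else ''} {job['name']}")
    return lines


def build_url(base_url, job, params=None):
    """URL to POST to for a normal build, or a parameterised one."""
    job_path = f"{base_url.rstrip('/')}/job/{quote(job)}"
    if not params:
        return f"{job_path}/build"
    return f"{job_path}/buildWithParameters?{urlencode(params)}"
```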

campfire window showing jenkins tasks

This was made much easier by first looking at the previous Jenkins commands, and then looking at other scripts in the hubot-scripts repository. The best way of learning a new language/framework is still on the shoulders of others.

I’ve got a few other good ideas for Jenkins related commands. I want to add a filter command to the jobs list, both by name and by current state. For longer running jobs I also want to report whether a build is currently running. And then maybe get information about a specific job, like the last few runs or similar. Any other requests or ideas most welcome.
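The proposed filter command could look something like the minimal Python sketch below. It operates on the `jobs` list from Jenkins’ `/api/json`, where each job has a `name` and a `color`. The function name, the use of colour names as states, and treating `running` as a special state (any `_anime` colour) are all assumptions about how such a command might work.

```python
import re


def filter_jobs(jobs, name_pattern=None, state=None):
    """Return names of Jenkins jobs matching a name regex and/or a state."""
    selected = []
    for job in jobs:
        colour = job.get("color", "")
        running = colour.endswith("_anime")
        base = colour[: -len("_anime")] if running else colour
        if name_pattern and not re.search(name_pattern, job["name"]):
            continue
        if state == "running":
            if not running:  # "running" matches any in-progress build
                continue
        elif state and base != state:  # otherwise compare colour names
            continue
        selected.append(job["name"])
    return selected
```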

EC2 Tasks For Fabric

For running ad-hoc commands across a small number of servers you really can’t beat Fabric. It requires nothing other than SSH on the servers, is generally just a one-line install and puts next to no syntactic fluff between you and the commands you want running. It’s much more of a Swiss Army knife to Capistrano’s bread knife.

I’ve found myself doing more and more EC2 work of late and have finally gotten around to making my life easier when using Fabric with Amazon instances. The result of a bit of hacking is Cloth (also available on PyPI). It contains some utility functions and a few handy tasks for loading host details from the EC2 API and using them in your Fabric tasks. No more static lists of host names that constantly need updating in your fabfile.

Specifically, with a fabfile that looks like:

#! /usr/bin/env python
from cloth.tasks import *

You can run:

fab all list

And get something like:

instance-name-1 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-2 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-3 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-4 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-5 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-6 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-7 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
instance-name-8 (xx.xxx.xxx.xx, xxx.xx.xx.xx)
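Cloth’s own code isn’t reproduced here. As a hedged sketch of the idea, this shows roughly how a `list`-style output can be built from EC2 API data. It assumes a response shaped like boto3’s `describe_instances()` (Reservations containing Instances, each with `Tags`, `State`, `PublicIpAddress` and `PrivateIpAddress`), and that the display name lives in the `Name` tag. The function is hypothetical, only running instances are shown (my choice), and pagination is ignored.

```python
def list_instances(response):
    """Return 'name (public_ip, private_ip)' lines for running instances."""
    lines = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Assumption: only running instances are interesting here.
            if instance.get("State", {}).get("Name") != "running":
                continue
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            name = tags.get("Name", instance["InstanceId"])  # fall back to ID
            public = instance.get("PublicIpAddress", "-")
            private = instance.get("PrivateIpAddress", "-")
            lines.append(f"{name} ({public}, {private})")
    return sorted(lines)


# Usage with real credentials (requires boto3 and AWS configuration):
#   import boto3
#   print("\n".join(list_instances(boto3.client("ec2").describe_instances())))
```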

And then you could run:

fab -P all uptime

And you’d happily get the load averages and uptime for all your EC2 instances.

A few more tricks are documented in the GitHub README, including filtering the list by a regex and some convention based mapping to Fabric roles. I’ll hopefully add a few more features as I need them and generally tidy up a few things but I’m pretty happy with it so far.
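Cloth’s README documents its own conventions. The sketch below is just one hypothetical way to get the same effect, assuming instance names of the form `<role>-<n>` (for example `web-1`). It filters `(name, host)` pairs by a regex and builds a role-to-hosts dict, the shape Fabric 1.x expects for `env.roledefs`.

```python
import re
from collections import defaultdict


def filter_hosts(instances, pattern):
    """Keep (name, host) pairs whose instance name matches the regex."""
    regex = re.compile(pattern)
    return [(name, host) for name, host in instances if regex.search(name)]


def roles_from_names(instances):
    """Map '<role>-<n>' names to {role: [hosts]} (hypothetical convention)."""
    roles = defaultdict(list)
    for name, host in instances:
        role, sep, _ = name.rpartition("-")
        if sep:  # names without a '-' get no role
            roles[role].append(host)
    return dict(roles)


# In a Fabric 1.x fabfile this could feed env.roledefs, e.g.
#   from fabric.api import env
#   env.roledefs = roles_from_names(instances)
```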

First Experience Building Something With Clojure

I nearly always try and grab some time over Christmas to try something new and this year I’d been planning on spending some time with Clojure. I have several friends who are big fans, but dipping in and out of a book hadn’t really worked. What I needed was an itch to scratch.

I stuck with a domain I’m pretty familiar with for this first project, namely building a little web application. It renders a web page, makes HTTP requests, parses JSON into native data structures and does a bit of data juggling. Nothing fancy or overly ambitious, I was mainly interested in picking up the syntax, understanding common libraries and getting something built. Here’s what I’ve got:

Dashboard for Jenkins builds

Jenkins has various API endpoints, but most dashboards I’ve seen concentrate on showing you the current status of all the builds. This is hugely useful when it comes to the simple case of continuous integration, but I’ve also been using Jenkins for other automation tasks, and making extensive use of parameterized builds. What the dashboard above concentrates on is showing recent builds for a specific job, along with the parameters used to run them.
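The dashboard itself is Clojure (Noir, clj-http, cheshire). To keep one language across this page’s examples, here is a rough Python equivalent of its core data juggling, not a port of the actual code. It assumes the JSON returned by a URL like `/job/NAME/api/json?tree=builds[number,result,timestamp,actions[parameters[name,value]]]`; the `tree` query is a standard Jenkins API feature. It also assumes builds are listed newest first and that `result` is null while a build is still running. The function name and output shape are my own. Jenkins build timestamps are milliseconds since the Unix epoch, hence the division by 1000.

```python
import json
from datetime import datetime, timezone


def recent_builds(job_json, limit=5):
    """Summarise the most recent builds of a job, including their parameters."""
    summaries = []
    for build in json.loads(job_json).get("builds", [])[:limit]:
        params = {}
        for action in build.get("actions", []):
            # Only parameter actions carry a "parameters" list; others may be {}.
            for p in (action or {}).get("parameters", []):
                params[p["name"]] = p.get("value")
        # Jenkins timestamps are milliseconds since the Unix epoch.
        started = datetime.fromtimestamp(build["timestamp"] / 1000, tz=timezone.utc)
        summaries.append({
            "number": build["number"],
            "result": build.get("result") or "RUNNING",  # null while in progress
            "started": started.strftime("%Y-%m-%d %H:%M UTC"),
            "parameters": params,
        })
    return summaries
```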

Overall it was a fun project. Clojure made much more sense to me building this application than it had from simple examples. The Noir web framework is excellent and proved easy enough to jump into and simple enough that I could read the source code if I was interested in how something worked. The Leiningen build tool made getting started, downloading and managing dependencies and running tests and the application itself easy.

What I didn’t find particularly appealing was the lack of a strong standard library, coupled with the difficulty of tracking down suitable libraries. JSON parsing, dealing with HTTP requests and date handling are very common activities in web programming, and all of them needed me to jump around looking for the best way of dealing with the common case. I settled on clj-http, cheshire and using Java interop for date formatting. clj-http suffered from having lots of forks on GitHub to navigate through. I started with clojure-json before discovering it had been deprecated. And neither clj-time nor date-clj appeared to support unix timestamps, as far as I could tell from the source. Throw in some uncertainty over the status of clojure-contrib, which isn’t well documented on the official site, and getting started takes some effort.

The working code for this is already up on GitHub and I’d be interested in any Clojure experts showing me the error of my ways.