Devops At Barcamp Cambridge

I’m at Barcamp Cambridge this weekend and decided to do a short talk on devops. It’s still a term that not too many people have come across, and something that lots of people building websites should think about.

I put the slides together this morning, so much of it will be familiar to people who have been reading the same blog posts as me over the last year or so. For anyone else, whether just finding out about the whole world of operations or an old-hand sysadmin looking for likeminded people, hopefully the slides will be useful.

DIBI Twitter Aggregator

So, DIBI is just over a week away and lots of people seem to be getting excited. Personally I’m really looking forward to it. I get a nice trip back up to my old home of Newcastle and get to talk geek to an audience of likeminded (and not so likeminded) web professionals. I’m also going to catch up with quite a few people I haven’t seen in quite a while.

Anyway, Twitter is at its best at conferences like this. It’s perfect for the “where to go now” situation. But I also like going back in time after the fact and looking at what people said, especially to get feedback about my talk. With that in mind I’ve built a little aggregator for all things #dibi. So far that appears to be a mixture of excitement and travel chaos.

I specifically wanted something that ran on the server, which discounted an awful lot of pure JavaScript versions. I think Twitter clients generally do a good job of the right now anyway. I also like to collect data for mining later, assuming I have the time. After a number of discussions on Twitter, and then some wandering around GitHub, I found TwitterEngine, which is fantastic. I’ve hacked it around quite a bit into something more general purpose. I’ve added some caching, some other App Engine performance tweaks and some monitoring. I’ll commit those to GitHub when I get the chance. I’d forgotten quite how much fun App Engine development was.

I still want to add a few more features before the conference. The styles work quite well on my Android but less well (text too small) on my iPod Touch. So I’ll fix that first. Any other requests?

Hadoop Hive Web Interface

I’ve been playing with Hive recently and liking what I’ve found. In theory at least it provides a very nice, simple way of getting into analysing large data sets. To make it even easier to show other people what you’re up to, Hive has a nascent web interface with a little documentation on the wiki.

[Image: the Hive web UI]

On the one hand it’s rather simple at this point, but that should be easy enough to prettify given a bit of time. The bigger problem was getting it working in the first place. What follows worked for me using the latest Cloudera packages on Debian testing. I’m assuming you already have Hive and Hadoop installed; the basic packages worked fine for me here.

Next up you’ll need the JDK (not just the JRE), as there is some compilation that goes on the first time you run the web interface.

<% syntax_colorize :bash, type=:coderay do %>
apt-get install ant sun-java6-jdk
<% end %>

Next up I had to modify the installed /etc/hive/conf/hive-site.xml file as follows:

I changed this:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hivevar/metastore/metadb/</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<% end %>

To this. Note the hivevar path doesn’t exist so I’m not sure if this was a typo in the source.

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hive/var/metastore/metadb/</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<% end %>

I also changed the following section regarding the metastore name:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/${user.name}_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<% end %>

To this, with a fixed name. When using the above configuration the file was actually called ${user.name} rather than my username being substituted in. Elsewhere this seems to work fine.

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<% end %>

I’m not convinced the above two changes are needed but have left them here just in case. The main tricky part is making sure a load of environment variables are correctly set. The following worked for me:

<% syntax_colorize :bash, type=:coderay do %>
export ANT_LIB=/usr/share/ant/lib
export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun
<% end %>

All being well that should allow you to run the hive command with the web interface like so:

<% syntax_colorize :bash, type=:coderay do %>
hive --service hwi
<% end %>

That should bring up a webserver on port 9999 where you should see something similar to the screenshot above.
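If the interface refuses to start, the environment is the first thing I’d check. Here is a minimal sketch of that check in Python. The variable names come from the exports above, but the `missing_hive_env` helper is purely my own illustration and not part of Hive or the Cloudera packages.

```python
import os

# Variable names taken from the exports above; the check itself is illustrative.
REQUIRED_VARS = ["ANT_LIB", "HIVE_HOME", "HADOOP_HOME", "JAVA_HOME"]


def missing_hive_env(environ=None):
    """Return the required variables that are unset or do not point at a directory."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value or not os.path.isdir(value):
            problems.append(name)
    return problems


if __name__ == "__main__":
    for name in missing_hive_env():
        print("check %s" % name)
```

An empty list means all four variables point at real directories. It says nothing about whether they are the right directories.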

More Django Project Templates

Quite a while ago I released some handy scripts for building up Django project layouts. Part of the reason behind this was to kick off discussions about ideal project layouts and maybe even get a few user submitted layouts into the project. Paster makes this sort of thing really easy to do and I was interested in what people might come up with.

Well, the team over at The Chicago Tribune have done just that, creating a branch of the original project and adding a very rich example project to it.

I’ve finally gotten around to merging that back into my branch and releasing a new version to PyPI. It’s got a good number of PostgreSQL commands that might be of use, as well as several Amazon S3 and EC2 examples. Even if none of these complete templates fit what you’re looking for exactly, there are lots of smaller Fabric recipes worth taking a look at.

You can read all about what is in the project over on their developer blog, which is filled with other interesting web and operations goodness.

So, huge thanks to Chris, Brian, Ryan and Joe for some great work.

Sandbox Your Ruby Gems

I’m a huge fan of virtualenv for Python. It’s a simple tool that lets you have an isolated Python environment into which you can install libraries via setuptools. It makes experimenting with different versions of code easier and avoids lots of problems with hard to find bugs caused by unknown third party conflicts.

Sandbox aims to do exactly the same for Ruby. It isolates your gem installation from the system libraries and provides a script to activate the named environment. The GitHub repo has been around since 2008, yet only has 37 followers. I’m not sure whether that is because another more popular and feature rich solution exists or because Ruby people haven’t yet found this technique useful. For me, finding tools that work the same way as those I already like in other languages makes me happy.

Static Generator For Web Services

My latest on-a-train project is Dumper, a static generator for web services. I’m a huge fan of Nanoc and tools like Jekyll for building websites. But I spend at least as much time building small web services. I wanted something super simple that would let me expose data I had access to as a read only web service.

At the moment that means using a MySQL database, specifying a SQL query and running a Python script. Hey presto, you have lots and lots of XML and JSON files representing your data. Dumper provides hooks for you to customise the output or even override the database layer. It should be possible, if you were so inclined, to replace the MySQL backend with another database or other type of data store. Hopefully some of these might end up in my branch at a later date.

At a basic level all you need is a config file which looks a little like:

<% syntax_colorize :ini, type=:coderay do %>
[Dumper]
path: people
index: id
backend: mysql

[Database]
sql: SELECT id, name FROM people
host: localhost
username: root
password:
database: dumper
<% end %>

And then run a command line application against that file:

<% syntax_colorize :bash, type=:coderay do %>
dumper -c people.ini dump
<% end %>

The application supports a number of flags for specifying where you want the files to be generated, what your config file is called and to clean up any generated files if you want to try again. The output will let you know which files have been updated, which deleted and which added too. If you’d rather have a single file but with all your records in then that’s easy too - just add something to the config file.
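To give a feel for the general idea, here is a rough sketch of a query-to-static-files step in Python. This is not Dumper’s actual code. The `fetch_rows` and `dump_rows` names are made up for illustration, and I’ve swapped in an in-memory SQLite database in place of MySQL so the example runs without a server.

```python
import json
import os
import sqlite3
import tempfile


def fetch_rows(connection, sql):
    """Run the query and return each record as a plain dict keyed by column name."""
    cursor = connection.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, record)) for record in cursor.fetchall()]


def dump_rows(rows, index, out_dir):
    """Write one JSON file per row, named after the value in the index column."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        path = os.path.join(out_dir, "%s.json" % row[index])
        with open(path, "w") as handle:
            json.dump(row, handle, sort_keys=True)
        written.append(path)
    return written


if __name__ == "__main__":
    # SQLite stands in for the MySQL backend here (an assumption for the sketch).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE people (id INTEGER, name TEXT)")
    db.execute("INSERT INTO people VALUES (1, 'Ada')")
    people = fetch_rows(db, "SELECT id, name FROM people")
    print(dump_rows(people, "id", tempfile.mkdtemp()))
```

Point any web server at the output directory and you have a read only service of sorts, with each record at a predictable path.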

It’s somewhat early days for Dumper, and I’ve not seen anything similar, so there are definitely some rough edges that could do with some work. All of that will really come down to how much use it gets. I’d appreciate any feedback from anyone with a similar itch to scratch too.

Piston And Sanitising Json Callbacks

I’m a big fan of Piston, the Django app for creating RESTful web services. As part of a project at work I ended up looking through the source code, mainly at some of the neat tricks for serialising objects. While poking around I came across something that, to my mind, wanted fixing. This being open source, rather than just file a bug report I set up a Bitbucket account and got hacking.

<% syntax_colorize :python, type=:coderay do %>
def render(self, request):
    cb = request.GET.get('callback')
    seria = simplejson.dumps(self.construct(), cls=DateTimeAwareJSONEncoder, ensure_ascii=False, indent=4)

    # Callback
    if cb:
        return '%s(%s)' % (cb, seria)

    return seria
<% end %>

Can you spot the problem? Note the use of the callback passed in the query string arguments and then used without any checking in the output.

What we really want to do is something like this:

<% syntax_colorize :python, type=:coderay do %>
if is_valid_jsonp_callback_value(cb):
<% end %>

Which is exactly what has just gone into the code for Piston. This article contains lots of background information about why JSONP callbacks can be a security hole, and helpfully provides a nice Python module to help with the sanitisation. Nice to be on the authors list for something I’m using actively.

Thanks to Jesper Noehr for Piston, for some pointers on Bitbucket and for quickly taking the patch. If you’re accepting a callback on your site or application, especially if it’s a public service, you really want to do something like this or you just might have an exploitable security hole.
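For a flavour of what such a check can look like, here is a deliberately simple sketch. It is not the module mentioned above and not what went into Piston. The `looks_like_safe_callback` and `wrap_jsonp` names are hypothetical, and the rule (dotted ASCII identifiers only) is my own assumption, which may well be stricter or looser than the real thing.

```python
import re

# One or more ASCII identifiers joined by dots, e.g. "app.handlers.onLoad".
_CALLBACK_RE = re.compile(
    r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*"
)


def looks_like_safe_callback(value):
    """True only if the whole value is a dotted identifier; empty or None is rejected."""
    return bool(value) and _CALLBACK_RE.fullmatch(value) is not None


def wrap_jsonp(callback, payload):
    """Wrap the serialised payload in the callback only when the callback passes the check."""
    if looks_like_safe_callback(callback):
        return "%s(%s)" % (callback, payload)
    return payload
```

Anything that fails the check falls back to plain JSON rather than being echoed into the response.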

Mysql Support For Cucumber Nagios

I just noticed Lindsay had committed the amqp steps for cucumber-nagios and remembered I hadn’t mentioned on here some other work I’ve been doing on the same project. We use MySQL quite a bit at work and I’ve been wanting to extend our monitoring for a while. So I set about thinking how that would work with cucumber-nagios. What I’ve come up with looks something like this:

Feature: localhost
  To make sure the rest of the system is in order
  Our database server should not be overloaded

  Scenario: check running processes count
    Given I have a MySQL server on localhost
    And I use the username root
    Then it should have less than 10 processes

  Scenario: check queries per second
    Given I have a MySQL server on localhost
    And I use the username root
    Then it should have less than 200 select queries per second
    Then it should have less than 300 queries per second
    Then it should have less than 5 slow queries per second
    Then it should have at least 10 queries per second

The numbers, username details and host details are all variables, so you can write scenarios for your specific deployments. The tests over time are based on a very short-lived sampling mechanism which I’ve yet to test in anger. I’m not sure just yet if this approach will lead to too many false positives, but we’ll have to see.
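The sampling idea boils down to reading a counter twice and dividing the difference by the time between reads. Here it is sketched in Python rather than the Ruby of the actual steps. The `per_second` function and the `read_counter` callable are hypothetical stand-ins for whatever fetches a status counter from the server.

```python
import time


def per_second(read_counter, interval=1.0, sleep=time.sleep):
    """Sample an ever-increasing counter twice and return its rate per second."""
    first = read_counter()
    sleep(interval)
    second = read_counter()
    return (second - first) / float(interval)
```

The `sleep` argument is only there so the wait can be swapped out in tests. With a very short interval a single burst of queries can dominate the reading, which is where my worry about false positives comes from.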

This mysql gmetric script gave me lots of the ideas for individual steps. I’ll be writing more about some work I’ve been doing with cucumber-nagios and ganglia soon as well.

For the moment, if anyone wants to try these steps out you can either check out my cucumber-nagios fork or just grab the steps from the mysql_steps.rb file. Any feedback much appreciated.

New Nanoc Powered Blog

It’s taken longer than I would have liked but I’ve finally gotten around to relaunching this site on nanoc.

After looking through lots of code from the nanoc showcase I had a pretty good feel for how I wanted things to work, and I then used the excellent nanoc3_blog template to get started. I’ve hacked around quite a bit with the code to get things how I wanted them: using Less to make the CSS more manageable, Coderay for lovely syntax highlighting, and making everything default to Textile rather than Markdown. I’ve also written an import script for my old blog (in Python) and another one so I can use Tumblr if I want to create items on here (in Ruby).

Nanoc really is a joy to work with and I’m hoping that alone will get me back into writing more frequently than I have done for a while. The fact I can just write in Vim or WriteRoom or whatever editor I have to hand feels nice. And using Git, Rake and Rsync completes my little toolset. Everything is still served via Nginx.

I’ve thrown all the code, including all the content, up on GitHub for anyone interested. Back to writing.

On blogging platforms

Every now and again I feel the need for a change, and I spent a little time tonight looking at different blogging software. I’m currently running a custom Django app I wrote a good while ago, more as an excuse to play with Django than anything. Previously I’ve used WordPress, Textpattern and even Radiant. But I’m coming to the conclusion that what I want doesn’t exist. In theory that means an opportunity for someone to enter the market; in reality I think it might just be me. So, what do I want?

  • Customisable URL patterns and redirects. Tumblr did well until this point but I have years of content on nice URLs and I’m not going to throw that away.
  • Command line interface for posting snippets
  • Nice looking web based API (bonus points to Typepad here for some nifty features like PubSubHubbub)
  • iPod touch app
  • Real control over the design, not just pre-made templates unless you pay for a pro account
  • Hackable (this could mean anything, but I know it when I see it. So not WordPress then.)
  • Export plain text out
  • Use my own domain name

I have a sneaking suspicion what I might be thinking about is a private Tumblr blog used as a datasource for Nanoc3. But that relies on me having the time to build that, as I can’t find anyone who might have written such a thing. Maybe one day.