Dibi Video

The videos from the DIBI conference are now up on Vimeo. Lots of good stuff and more to come. The one disadvantage of a two-track conference is that you miss half the talks, so when I get a chance I’ll be catching up with those I didn’t get to see.

Very Simple Custom Ganglia Metrics

Logging useful information from running systems for monitoring purposes is pretty important if you want to see how your software is behaving in the real world. It’s one thing to test something locally, another to test something under load on a testing environment and quite something else to watch production code while running.

The numbers can be useful for checking newly released code isn’t having a detrimental effect on performance, observing what changes in load are doing to systems over time and planning for future capacity growth.

Creating log files, aggregating files from multiple machines and then analysing the results is one approach. Another is using something like Ganglia. Ganglia is great for trending data over time, and ties in nicely to Nagios for reporting. Installing the monitoring daemon on machines and generally getting the default checks (memory, disk, network, etc.) up and running is nice and easy. From there using the gmetric command line to create custom metrics (say checking some MySQL statistics) is again straightforward.

So far, so good. The only issue I’ve run into was creating custom metrics on the fly from a machine outside the network. For bonus points these metrics were nothing to do with the machine on which they were collected, but to do with the system overall. More specifically the metrics were web site performance data gathered via some cucumber and celerity scripts.

For this I knocked up a tiny web service wrapper around the gmetric command line. It’s very feature light at the moment (I only needed it to collect time based stats at regular intervals) but it could be made more featureful and expose the rest of the gmetric API if needs be. It does it using a very simple URL scheme:

<% syntax_colorize :bash, type=:coderay do %> /{metric-name}/{metric-value}/ <% end %>

So for example I can create metrics on the fly simply using an HTTP client or a web browser.

<% syntax_colorize :bash, type=:coderay do %>
/GarethsCommuteTime/3600/
/ExternalPageLoad/2.005/
<% end %>

The code is up on GitHub and is completely self contained. I’ve been running it mainly using Spawning but any small WSGI server could suffice. I looked very briefly at the API for Ganglia but found the gmetric approach to be much simpler.
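To give a flavour of the approach, here is a minimal sketch of such a wrapper. It is not the code from the GitHub repository: the function names `parse_metric_path` and `gmetric_command` are made up for illustration, and sending every value as a `float` is an assumption. It relies only on gmetric's `--name`, `--value` and `--type` options.

```python
import subprocess


def parse_metric_path(path):
    """Split '/{metric-name}/{metric-value}/' into (name, value), or return None."""
    parts = [part for part in path.split("/") if part]
    if len(parts) != 2:
        return None
    name, raw_value = parts
    try:
        value = float(raw_value)
    except ValueError:
        return None
    return name, value


def gmetric_command(name, value):
    """Build the argument list for gmetric (assumption: every metric is a float)."""
    return ["gmetric", "--name", name, "--value", str(value), "--type", "float"]


def application(environ, start_response):
    """Tiny WSGI app: record the metric named in the URL path via gmetric."""
    parsed = parse_metric_path(environ.get("PATH_INFO", ""))
    if parsed is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"expected /{metric-name}/{metric-value}/\n"]
    # An argument list (no shell) keeps the URL-supplied name from being interpreted.
    subprocess.check_call(gmetric_command(*parsed))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]
```

Passing the arguments as a list rather than a single shell string means a metric name taken from the URL is never interpreted by a shell, which matters for a service reachable from outside the network.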

And if you’re a Ganglia expert and know a much better way of doing this then let me know. Ganglia is awesome, and collecting metrics is both useful and fun (for me at least) but it’s not always obvious how to get into creating simple custom metrics which tell you something about your own application code.

Devops Twitter Aggregator

I’ve been hacking on App Engine again and have thrown up a simple Twitter aggregator for devops. It’s again based on TwitterEngine with an increasing number of additions and changes.

As well as just the tweets I want to build a few other small features. The first of these is link extraction, so at the moment you’ll see recent links at the top of the page. I’ll hopefully make that a little more useful, with better browsing and converting short URLs into the real ones. I also have vague plans for providing exports, listing people talking about devops and some useful graphs to track general activity around devops.
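Link extraction itself needs very little code. The following is a rough sketch rather than the aggregator's actual implementation: the `extract_links` name and the deliberately loose regular expression are assumptions for illustration, and it makes no attempt to expand short URLs.

```python
import re

# Loose pattern: anything starting with http:// or https:// up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(tweet_text):
    """Return the URLs found in a tweet, minus any trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(tweet_text)]
```

Stripping trailing punctuation is a pragmatic choice, since tweets often end a sentence directly after a link.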

<img src="http://image-host.appspot.com/i/img?id=agppbWFnZS1ob3N0cg0LEgVJbWFnZRjphAEM" alt="Devops aggregator on Twitter">

I’m really interested to see if the whole discussion around the term devops grows over the next year or so, and I’m hoping this will make it a little easier to see that change happen.

Installing Integrity On Debian/Ubuntu

I’ve been playing with Integrity again as a simple continuous integration server and have installed it on a few Debian and Ubuntu machines in the last few weeks. The current site has good installation instructions for the Ruby side of things but leaves it as an exercise for the installer to make sure all the system level dependencies are installed.

So probably as much for me in the future, here is what I had to install to get the installation instructions to work for me.

<% syntax_colorize :bash, type=:coderay do %>
apt-get install build-essential
apt-get install ruby
apt-get install rdoc
apt-get install sqlite3
apt-get install libdbd-sqlite3-ruby
apt-get install libdataobjects-sqlite3-ruby1.8
apt-get install libsqlite3-dev
apt-get install libxml2
apt-get install libxml2-dev
apt-get install libxsl-dev
apt-get install libopenssl-ruby
<% end %>

I also needed to install the following package on Ubuntu:

<% syntax_colorize :bash, type=:coderay do %> apt-get install ruby1.8-dev <% end %>

If you want to use a database other than the default SQLite then you won’t need the SQLite packages above and I’ll assume you know what you’re doing.

You're Going To Need A Bigger Toolbox

I’m just getting back from Newcastle after getting to present at the first Design It Build It conference. It was great to be back up in Newcastle and to see lots of familiar faces. As with most conferences it was also good to meet new people (especially those for whom it was their first conference) and to listen to people talking about interesting stuff. Personal highlights for me were David Singleton from Last.fm and Michael Brunton Spall from The Guardian going through really interesting case studies from their respective organisations. It’s the sort of gritty content it’s often hard to come by. Speaking did mean I missed out on most of the design track unfortunately, but videos should be available soon and by all accounts it sounds like the larger designer crowd went home happy.

I think my presentation at DIBI went OK. I’d got a little bit carried away with cramming content in, which meant it felt rushed at times and I still went over by 5 minutes into my Q&A time. But hey, a few people said nice things afterwards.

I wanted to tell the world (of web developers) about as many different tools as I could. I think most people who read this blog have probably come across most of this software, heck you might be committing code to it or already using it in production. But lots of people haven’t ever heard of Memcached, never mind Cassandra or RabbitMQ. And more important than the specific software are the different types of tools available: small web servers, message queues, HTTP caches, etc. Conferences are a good place to find and educate people. Hopefully I managed to do just that.

One thing I hope I got across: I’m not for a moment saying you shouldn’t be using tools you know and love. Nor am I saying you should jump in and start using lots of crazy software. But keeping an eye on new developments can serve you well when it comes to deciding whether the best approach is really to build something from scratch. I’ve spoken to several people over the last few weeks who were starting to write simple queuing systems using cron and MySQL, or using a hand rolled file system based caching setup. And in both cases I think they would be better served by existing tools.

My slides had a lot of links in them and I mentioned during my talk that I’d put a list of them somewhere:

Small Web Servers


Search Engines

Message Queues

Non-relational Data Stores

Data Mining

Functional Testing

Server Provisioning

Devops At Barcamp Cambridge

I’m at Barcamp Cambridge this weekend and decided to do a short talk on devops. It’s still a term that not too many people have come across and something that lots of people building websites should think about.

I put the slides together this morning so much of it will be familiar to people who have been reading the same blog posts as me over the last year or so. For anyone else either just finding out about the whole world of operations or old hand sysadmins finding likeminded people hopefully the slides will be useful.

DIBI Twitter Aggregator

So, DIBI is just over a week away and lots of people seem to be getting excited. Personally I’m really looking forward to it. I get a nice trip back up to my old home of Newcastle and get to talk geek to an audience of likeminded (and not so likeminded) web professionals. I’m also going to catch up with quite a few people I haven’t seen in quite a while.

Anyway, Twitter is at its best at conferences like this. It’s perfect for the where to go now situation. But I also like going back in time after the fact and looking at what people said, especially to get feedback about my talk. With that in mind I’ve built a little aggregator for all things #dibi. So far that appears to be a mixture of excitement and travel chaos.

I specifically wanted something that ran on the server, which discounted an awful lot of pure javascript versions. I think Twitter clients generally do a good job of the right now anyway. I also like to collect data for mining later, assuming I have the time. After a number of discussions on Twitter, then just some wandering around GitHub I found TwitterEngine which is fantastic. I’ve hacked it around quite a bit into something more general purpose. I’ve added some caching, some other App Engine performance tweaks and some monitoring. I’ll commit those to GitHub when I get the chance. I’d forgotten quite how much fun App Engine development was.

I still want to add a few more features before the conference. The styles work quite well on my Android but less well (text too small) on my iPod Touch. So I’ll fix that first. Any other requests?

Hadoop Hive Web Interface

I’ve been playing with Hive recently and liking what I’ve found. In theory at least it provides a very nice, simple way of getting into analysing large data sets. To make it even easier to show other people what you’re up to, Hive has a nascent web interface with a little documentation on the wiki.

image of hive web ui

On the one hand it’s rather simple at this point, but that should be easy enough to prettify given a bit of time. The bigger problem was getting it working in the first place. What follows worked for me using the latest Cloudera packages on Debian testing. I’m assuming you already have Hive and Hadoop installed; the basic packages worked fine for me here.

Next up you’ll need the JDK (not just the JRE) as there is some compilation that will go on the first time you run the web interface.

<% syntax_colorize :bash, type=:coderay do %> apt-get install ant sun-java6-jdk <% end %>

Next up I had to modify the installed /etc/hive/conf/hive-site.xml file as follows:

I changed this:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hivevar/metastore/metadb/</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<% end %>

To this. Note the hivevar path doesn’t exist so I’m not sure if this was a typo in the source.

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hive/var/metastore/metadb/</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<% end %>

I also changed the following section regarding the metastore name:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/${user.name}_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<% end %>

To this, with a fixed name. When using the above configuration the file was actually called ${user.name} rather than my username being substituted in. Elsewhere this seems to work fine.

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<% end %>

I’m not convinced the above two changes are needed but have left them here just in case. The main tricky part is making sure a load of environment variables are correctly set. The following worked for me:

<% syntax_colorize :bash, type=:coderay do %>
export ANT_LIB=/usr/share/ant/lib
export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun
<% end %>
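Since a missing or mistyped variable is the most likely thing to go wrong here, a small check before launching Hive can save some head scratching. This is only a sketch: the `missing_hive_env` helper is hypothetical, and the list of variables simply mirrors the exports above (PATH is left out because it is a list of directories rather than a single one).

```python
import os

# Variables from the exports above that should each point at one directory.
REQUIRED_VARS = ["ANT_LIB", "HIVE_HOME", "HADOOP_HOME", "JAVA_HOME"]


def missing_hive_env(environ=None):
    """Return a list of human readable problems with the Hive environment."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


if __name__ == "__main__":
    for problem in missing_hive_env():
        print(problem)
```

Run in the same shell you intend to start Hive from, an empty output means every variable is set and points at a directory that exists.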

All being well that should allow you to run the hive command with the web interface like so:

<% syntax_colorize :bash, type=:coderay do %> hive --service hwi <% end %>

That should bring up a webserver on port 9999 where you should see something similar to the screenshot above.

More Django Project Templates

Quite a while ago I released some handy scripts for building up Django project layouts. Part of the reason behind this was to kick off discussions about ideal project layouts and maybe even get a few user submitted layouts into the project. Paster makes this sort of thing really easy to do and I was interested in what people might come up with.

Well, the team over at The Chicago Tribune have done just that, creating a branch of the original project and adding a very rich example project to it.

I’ve finally gotten around to merging that back into my branch and releasing a new version to PyPI. It’s got a good number of PostgreSQL commands that might be of use, as well as several Amazon S3 and EC2 examples. Even if none of these complete templates fit what you’re looking for exactly, there are lots of smaller Fabric recipes worth taking a look at.

You can read all about what is in the project over on their developer blog, which is filled with other interesting web and operations goodness.

So, huge thanks to Chris, Brian, Ryan and Joe for some great work.

Sandbox Your Ruby Gems

I’m a huge fan of virtualenv for Python. It’s a simple tool that lets you have an isolated Python environment into which you can install libraries via setuptools. It makes experimenting with different versions of code easier and avoids lots of problems with hard to find bugs caused by unknown third party conflicts.

Sandbox aims to do exactly the same for Ruby. It isolates your gem installation from the system libraries and provides a script to activate the named environment. The GitHub repo has been around since 2008, yet only has 37 followers. I’m not sure whether that is because another more popular and feature rich solution exists or because Ruby people haven’t yet found this technique useful. For me finding tools that work the same way as those I already like in other languages makes me happy.