Git Pre-Receive Hook For Integrity

I’m getting married rather soon so time has been somewhat short (in a good way) for just hacking on stuff, but I’ve finally found a little bit of time to play with something I’ve been mulling over for a while: a continuous deployment workflow using the Integrity continuous integration server.

I’m hoping to have an incredibly simple but fully operational example available at some point, mainly to act as a good discussion point. For now here’s my current pre-receive hook.
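
The shape of such a hook is pretty simple: Git feeds each pushed ref to pre-receive on stdin, and a non-zero exit rejects the push. As a rough sketch (with a placeholder work tree and test command, rather than the actual hook) it might look something like this:

<% syntax_colorize :python, type=:coderay do %>
#!/usr/bin/env python
# A sketch of a pre-receive hook, not the real thing: git passes
# "<old-sha> <new-sha> <refname>" lines on stdin, and a non-zero exit
# rejects the whole push. The work tree path and the test command are
# placeholders.
import subprocess
import sys

WORK_TREE = "/tmp/pre-receive-build"  # assumed to already exist

for line in sys.stdin:
    old_sha, new_sha, refname = line.split()
    if refname != "refs/heads/master":
        continue  # only gate the branch we deploy from
    # Check the pushed revision out into a scratch work tree
    ok = subprocess.call(
        ["git", "--work-tree=" + WORK_TREE, "checkout", "-f", new_sha]) == 0
    # Then run the tests against that checkout
    ok = ok and subprocess.call(["./run-tests"], cwd=WORK_TREE) == 0
    if not ok:
        sys.stderr.write("Build failed for %s; rejecting push\n" % new_sha)
        sys.exit(1)
<% end %>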

Python: What To Use?

My friend Jamie Rumbelow has started a new project and decided to use Python. He asked a great question over on Stack Overflow which basically came down to: what should I use for my first proper Python web application project? After a quick prompt on Twitter I decided to have a go. I’ve cross-posted my answer below, mainly because it took as long as a typical blog post to write.

Frameworks

OK, so I’m a little biased here as I currently make extensive use of Django and organise the Django User Group in London, so bear that in mind when reading the following.

Start with Django because it’s a great gateway drug. Lots of documentation and literature, a very active community of people to talk to and lots of example code around the web.

That’s a completely non-technical reason. Pylons is probably purer in terms of Python philosophy (being much more a collection of discrete bits and pieces) but lots of the technical stuff comes down to personal preference, at least until you get further into Python. Compare the very active Django tag on Stack Overflow with those for Pylons or TurboGears, though, and I’d argue getting started is simply easier with Django, irrespective of anything to do with code.

Personally I default to Django, but find that an increasing amount of the time I actually opt for simpler microframeworks (think Sinatra rather than Rails). There are lots to choose from (good list here). I tend to use MNML (because I wrote parts of it and it’s tiny) but others are actively developed. I tend to do this for small, stupid web services which are then strung together with a Django project in the middle serving people.

Worth noting here is App Engine. You have to work within its limitations and it’s not designed for everything, but it’s a great way to just play with Python and get something up and working quickly. It makes a great testbed for learning and experimentation.
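
To give a flavour, a handler in the App Engine Python SDK is only a few lines; this is the standard hello-world shape from the SDK docs rather than anything of mine:

<% syntax_colorize :python, type=:coderay do %>
# The classic App Engine Python SDK hello world, per the SDK docs.
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.out.write("Hello from App Engine")

# Map URL patterns to handlers
application = webapp.WSGIApplication([("/", MainPage)], debug=True)

if __name__ == "__main__":
    run_wsgi_app(application)
<% end %>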

Mongo/ORM

On the MongoDB front you’ll likely want to look at the basic Python MongoDB library, PyMongo, first to see if it has everything you need. If you really do want something a little more ORM-like then MongoEngine might be what you’re looking for. A bunch of folks are also working on making Django specifically integrate more seamlessly with NoSQL backends. Some of that is destined for future Django releases, but django-nonrel has code now.
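
To give a flavour of the driver, basic PyMongo usage looks roughly like this (the database, collection and field names are just made up for illustration):

<% syntax_colorize :python, type=:coderay do %>
# Basic PyMongo usage; database, collection and fields are invented
# for the example.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client.blog
# Documents are just dicts; no schema to declare up front
db.posts.insert_one({"title": "Python: What To Use?", "tags": ["python"]})
post = db.posts.find_one({"tags": "python"})  # matches within the array
print(post["title"])
<% end %>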

For relational data SQLAlchemy is good if you want something standalone. Django’s ORM is also excellent if you’re using Django.
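
As a standalone SQLAlchemy sketch (the model here is invented for the example, using the declarative style from recent versions against an in-memory SQLite database):

<% syntax_colorize :python, type=:coderay do %>
# A minimal standalone SQLAlchemy example; the User model is made up.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)  # issues the CREATE TABLE

session = sessionmaker(bind=engine)()
session.add(User(name="jamie"))
session.commit()
print(session.query(User).filter_by(name="jamie").one().id)
<% end %>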

API

The most official OAuth library is python-oauth2, which handily has a Django example as part of its docs.
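
Basic usage is pleasantly small, something like the following based on the library’s own examples (the key, secret and URL are obviously placeholders):

<% syntax_colorize :python, type=:coderay do %>
# Signing a request with python-oauth2, per the library's own examples.
# The key, secret and URL here are placeholders.
import oauth2 as oauth

consumer = oauth.Consumer(key="your-app-key", secret="your-app-secret")
client = oauth.Client(consumer)  # handles the OAuth signing for you
resp, content = client.request("https://api.example.com/protected", "GET")
print(resp["status"])
<% end %>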

Piston is a Django app which provides lots of tools for building APIs. It has the advantage of being pretty active, well maintained and in production all over the place. Other projects exist too, including Dagny, an early attempt to create something akin to Rails’ RESTful resources.

In reality any Python framework (or even just raw WSGI code) should be reasonably good for this sort of task.
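
To show quite how little code that can mean, here’s a complete JSON-returning service in raw WSGI using only the standard library:

<% syntax_colorize :python, type=:coderay do %>
# A complete web service in raw WSGI, standard library only.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # environ carries the request; start_response sets status and headers
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"status": "ok"}']

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
<% end %>

Point curl at http://localhost:8000/ and you get JSON back, no framework required.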

Testing

Python has unittest as part of its standard library, and unittest2 (which ships in Python 2.7 and has been backported to earlier versions) adds a number of improvements. Some people also like Nose, an alternative test runner with some additional features. Twill is also nice: it’s “a simple scripting language for Web browsing”, so handy for some functional testing. Freshen is a port of Cucumber to Python. I haven’t yet got round to using it in anger, but a quick look now suggests it’s much better than when I last looked.
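
For anyone who hasn’t seen it, a unittest test case is only a few lines (the function under test here is just a toy example):

<% syntax_colorize :python, type=:coderay do %>
# The smallest useful unittest example; slugify is a toy function
# so there is something to test.
import unittest

def slugify(text):
    return text.lower().strip().replace(" ", "-")

class SlugifyTests(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_clean_strings_pass_through(self):
        self.assertEqual(slugify("hello"), "hello")

if __name__ == "__main__":
    unittest.main()
<% end %>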

I actually also use Ruby for high-level testing of Python apps and APIs because I love the combination of Celerity and Cucumber. But I’m weird and get funny looks from other Python people for this.

Message Queues

For a message queue, whatever language I’m using, I now always use RabbitMQ. I’ve had some success with StompServer in the past but Rabbit is awesome. Don’t worry that it’s not itself written in Python; neither are PostgreSQL, Nginx or MongoDB, all for good reason. What you care about are the libraries available. What you’re looking for here is py-amqplib, a low-level library for talking AMQP (the protocol for talking to Rabbit as well as other message queues). I’ve also used Carrot, which is easier to get started with and provides a nicer API. Think Bunny in Ruby if you’re familiar with that.
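
Publishing a message with py-amqplib looks roughly like the following; the queue name is made up, and I’m going from memory on the exact keyword arguments:

<% syntax_colorize :python, type=:coderay do %>
# Publishing with py-amqplib; the queue name is invented for the example.
from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="localhost:5672", userid="guest",
                       password="guest", virtual_host="/")
chan = conn.channel()
# Declare a durable queue and publish to it via the default exchange
chan.queue_declare(queue="tasks", durable=True, auto_delete=False)
msg = amqp.Message("hello", delivery_mode=2)  # 2 = persistent
chan.basic_publish(msg, exchange="", routing_key="tasks")
chan.close()
conn.close()
<% end %>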

Environment

Whatever bits and pieces you decide to use from the Python ecosystem, I’d recommend getting to know pip and virtualenv (Fabric is also cool, though not essential, and these docs are out of date on that tool). Think about using Ruby without gem, Bundler or RVM and you’ll be heading in the right direction.

DIBI Videos

The videos from the DIBI conference are now up on Vimeo. Lots of good stuff, and more to come. The one disadvantage of a two-track conference is that you miss half the talks, so when I get a chance I’ll be catching up with those I didn’t get to see.

Very Simple Custom Ganglia Metrics

Logging useful information from running systems for monitoring purposes is pretty important if you want to see how your software is behaving in the real world. It’s one thing to test something locally, another to test something under load in a testing environment, and quite another to watch production code actually running.

The numbers can be useful for checking newly released code isn’t having a detrimental effect on performance, observing what changes in load are doing to systems over time and planning for future capacity growth.

Creating log files, aggregating them from multiple machines and then analysing the results is one approach. Another is using something like Ganglia. Ganglia is great for trending data over time, and ties in nicely with Nagios for reporting. Installing the monitoring daemon on machines and getting the default checks (memory, disk, network, etc.) up and running is nice and easy. From there, using the gmetric command-line tool to create custom metrics (say, checking some MySQL statistics) is again straightforward.

So far, so good. The only issue I’ve run into was creating custom metrics on the fly from a machine outside the network. For bonus points, these metrics had nothing to do with the machine on which they were collected, but with the system overall. More specifically, the metrics were web site performance data gathered via some Cucumber and Celerity scripts.

For this I knocked up a tiny web service wrapper around the gmetric command-line tool. It’s very feature-light at the moment (I only needed it to collect time-based stats at regular intervals) but it could be made more featureful and expose the rest of the gmetric API if need be. It uses a very simple URL scheme:

<% syntax_colorize :bash, type=:coderay do %>
/{metric-name}/{metric-value}/
<% end %>

So for example I can create metrics on the fly simply using an HTTP client or a web browser.

<% syntax_colorize :bash, type=:coderay do %>
/GarethsCommuteTime/3600/
/ExternalPageLoad/2.005/
<% end %>

The code is up on GitHub and is completely self-contained. I’ve been running it mainly under Spawning, but any small WSGI server should suffice. I looked very briefly at the API for Ganglia itself but found the gmetric approach much simpler.
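
For illustration, the shape of the wrapper is roughly the following; this is a hedged sketch rather than the actual code from GitHub, and it assumes gmetric is on the PATH:

<% syntax_colorize :python, type=:coderay do %>
# A sketch of the gmetric web service wrapper, not the actual GitHub
# code; it treats every value as a double for simplicity.
import subprocess
from wsgiref.simple_server import make_server

def app(environ, start_response):
    parts = [p for p in environ["PATH_INFO"].split("/") if p]
    if len(parts) != 2:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"expected /{metric-name}/{metric-value}/\n"]
    name, value = parts
    # Shell out to the gmetric command-line tool to record the metric
    subprocess.call(["gmetric", "--name", name,
                     "--value", value, "--type", "double"])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("recorded %s=%s\n" % (name, value)).encode("utf-8")]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
<% end %>

With that running, hitting /GarethsCommuteTime/3600/ shells out to gmetric for you.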

And if you’re a Ganglia expert and know a much better way of doing this then let me know. Ganglia is awesome, and collecting metrics is both useful and fun (for me at least), but it’s not always obvious how to get started creating simple custom metrics which tell you something about your own application code.

Devops Twitter Aggregator

I’ve been hacking on App Engine again and have thrown up a simple Twitter aggregator for devops. It’s again based on TwitterEngine, with an increasing number of additions and changes.

As well as just the tweets, I want to build a few other small features. The first of these is link extraction, so at the moment you’ll see recent links at the top of the page. I’ll hopefully make that a little more useful, with better browsing and short URLs converted into the real ones. I also have vague plans for providing exports, listing people talking about devops, and some useful graphs to track general activity around devops.

<img src="http://image-host.appspot.com/i/img?id=agppbWFnZS1ob3N0cg0LEgVJbWFnZRjphAEM" alt="Devops aggregator on Twitter">

I’m really interested to see if the whole discussion around the term devops grows over the next year or so, and I’m hoping this will make it a little easier to see that change happen.

Installing Integrity On Debian/Ubuntu

I’ve been playing with Integrity again as a simple continuous integration server and have installed it on a few Debian and Ubuntu machines in the last few weeks. The current site has good installation instructions for the Ruby side of things, but leaves it as an exercise for the installer to make sure all the system-level dependencies are installed.

So, probably as much for future me as for anyone else, here’s what I had to install to get the installation instructions to work for me:

<% syntax_colorize :bash, type=:coderay do %>
apt-get install build-essential
apt-get install ruby
apt-get install rdoc
apt-get install sqlite3
apt-get install libdbd-sqlite3-ruby
apt-get install libdataobjects-sqlite3-ruby1.8
apt-get install libsqlite3-dev
apt-get install libxml2
apt-get install libxml2-dev
apt-get install libxslt1-dev
apt-get install libopenssl-ruby
<% end %>

I also needed to install the following package on Ubuntu:

<% syntax_colorize :bash, type=:coderay do %>
apt-get install ruby1.8-dev
<% end %>

If you want to use a database other than the default SQLite then you won’t need those packages and I’ll assume you know what you’re doing.

You're Going To Need A Bigger Toolbox

I’m just getting back from Newcastle after presenting at the first Design It Build It conference. It was great to be back up in Newcastle and to see lots of familiar faces. As with most conferences it was also good to meet new people (especially those for whom it was their first conference) and to listen to people talking about interesting stuff. Personal highlights for me were David Singleton from Last.fm and Michael Brunton-Spall from The Guardian going through really interesting case studies from their respective organisations; it’s the sort of gritty content that’s often hard to come by. Speaking did mean I missed out on most of the design track, unfortunately, but videos should be available soon and by all accounts it sounds like the larger designer crowd went home happy.

I think my presentation at DIBI went OK. I’d got a little bit carried away with cramming content in, which meant it felt rushed at times and I still ran five minutes over into my Q&A time. But hey, a few people said nice things afterwards.

I wanted to tell the world (of web developers) about as many different tools as I could. I think most people who read this blog have probably come across most of this software; heck, you might be committing code to it or already using it in production. But lots of people haven’t ever heard of Memcached, never mind Cassandra or RabbitMQ. And more important than the specific software are the different types of tools available: small web servers, message queues, HTTP caches, etc. Conferences are a good place to find and educate people. Hopefully I managed to do just that.

One thing I hope I got across: I’m not for a moment saying you shouldn’t be using tools you know and love. Nor am I saying you should jump in and start using lots of crazy software. But keeping an eye on new developments can serve you well when it comes to deciding whether the best approach really is to build something from scratch. I’ve spoken to several people over the last few weeks who were starting to write simple queueing systems using cron and MySQL, or using a hand-rolled filesystem-based caching setup. In both cases I think they would be better served by existing tools.

My slides had a lot of links in them and I mentioned during my talk that I’d put a list of them somewhere:

- Small Web Servers
- Caching
- Search Engines
- Message Queues
- Non-relational Data Stores
- Data Mining
- Functional Testing
- Server Provisioning

Devops At Barcamp Cambridge

I’m at BarCamp Cambridge this weekend and decided to do a short talk on devops. It’s still a term that not too many people have come across, and something that lots of people building websites should think about.

I put the slides together this morning, so much of it will be familiar to people who have been reading the same blog posts as me over the last year or so. For anyone else, whether you’re just finding out about the whole world of operations or you’re an old-hand sysadmin looking for like-minded people, hopefully the slides will be useful.

DIBI Twitter Aggregator

So, DIBI is just over a week away and lots of people seem to be getting excited. Personally I’m really looking forward to it. I get a nice trip back up to my old home of Newcastle and get to talk geek to an audience of like-minded (and not so like-minded) web professionals. I’m also going to catch up with quite a few people I haven’t seen in quite a while.

Anyway, Twitter is at its best at conferences like this. It’s perfect for the “where to go now?” situation. But I also like going back in time after the fact and looking at what people said, especially to get feedback about my talk. With that in mind I’ve built a little aggregator for all things #dibi. So far that appears to be a mixture of excitement and travel chaos.

I specifically wanted something that ran on the server, which discounted an awful lot of pure JavaScript versions. I think Twitter clients generally do a good job of the right now anyway. I also like to collect data for mining later, assuming I have the time. After a number of discussions on Twitter, and then just some wandering around GitHub, I found TwitterEngine, which is fantastic. I’ve hacked it around quite a bit into something more general-purpose. I’ve added some caching, some other App Engine performance tweaks and some monitoring. I’ll commit those to GitHub when I get the chance. I’d forgotten quite how much fun App Engine development is.

I still want to add a few more features before the conference. The styles work quite well on my Android phone but less well (text too small) on my iPod Touch, so I’ll fix that first. Any other requests?

Hadoop Hive Web Interface

I’ve been playing with Hive recently and liking what I’ve found. In theory at least it provides a very nice, simple way of getting into analysing large data sets. To make it even easier to show other people what you’re up to, Hive has a nascent web interface, with a little documentation on the wiki.

[Screenshot: the Hive web interface]

On the one hand it’s rather simple at this point, but it should be easy enough to prettify given a bit of time. The bigger problem was getting it working in the first place. What follows worked for me using the latest Cloudera packages on Debian testing. I’m assuming you already have Hive and Hadoop installed; the basic packages worked fine for me here.

Next up you’ll need the JDK (not just the JRE), as there is some compilation that goes on the first time you run the web interface.

<% syntax_colorize :bash, type=:coderay do %>
apt-get install ant sun-java6-jdk
<% end %>

Then I had to modify the installed /etc/hive/conf/hive-site.xml file as follows.

I changed this:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hivevar/metastore/metadb/</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<% end %>

To this. Note that the hivevar path doesn’t exist, so I’m not sure if this was a typo in the source:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hive/var/metastore/metadb/</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<% end %>

I also changed the following section regarding the metastore name:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/${user.name}_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<% end %>

To this, with a fixed name. When using the above configuration the file was actually called ${user.name} rather than my username being substituted in. Elsewhere this substitution seems to work fine:

<% syntax_colorize :xml, type=:coderay do %>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<% end %>

I’m not convinced the above two changes are strictly needed, but I’ve left them here just in case. The main tricky part is making sure a load of environment variables are correctly set. The following worked for me:

<% syntax_colorize :bash, type=:coderay do %>
export ANT_LIB=/usr/share/ant/lib
export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun
<% end %>

All being well, that should allow you to run the hive command with the web interface like so:

<% syntax_colorize :bash, type=:coderay do %>
hive --service hwi
<% end %>

That should bring up a webserver on port 9999 where you should see something similar to the screenshot above.