DJUGL February

The next Django User Group London is in two weeks’ time. You can register over on Eventwax. So far we have Brad talking, with more speakers to be announced shortly. Hope to see a few people there.

The rise of the in-house team?

I was just thinking about the Design it Build it conference later in the year (full disclosure: I’m speaking), specifically about the people speaking on the developer track. Between myself, Michael Brunton-Spall from The Guardian, David Singleton from Last.fm and Emma Persky from Gumtree, four of the six speakers work on in-house teams. Not early stage start-ups, not large software/advertising companies, not as freelancers, but on a development team in a reasonably sized company.

My original background was working in agencies, followed by a stint working for myself, and I’m constantly interested in the different facets of the web software industry. I think conferences and magazines aimed at your average interested web developer or designer play an interesting role in what people perceive as normal. If all you see are people who work as freelancers, you start to think that must be way cooler than whatever it is you’re doing at the time. I remember attending the first @media event and being surprised at the small number of people from larger agencies. Everyone was from smaller boutique places, or Yahoo!, or a freelancer. Now lots of people I see at events are involved in startups.

Interestingly, none of it is about a particular language or framework either. Thinking about it as I type, I believe the four of us spend our day jobs mainly using different languages (Java, Python, PHP, Perl). But I bet we all work in environments where we use other languages at least occasionally, or at least the people around us do. Mixed environments are commonplace in companies that have been around a good while and run on software. They are far less common elsewhere, with startups using whatever is cool (let’s build a mobile search engine in Haskell, anyone?) and small agencies often using whatever they built their first client website with (probably PHP).

What I’m really interested in though is the type of topics that are going to be talked about. Last.fm vs the Xbox, scaling The Guardian, my rambling thoughts on a modern toolbox for developers beyond your average LAMP or .NET stack. This is the sort of thing I’m interested in. It’s the sort of problem I like having. It’s also, I think, the sort of material that doesn’t get a showing at many mainstream conferences. I’m hoping it’s all going to be fairly practical too - things that people can take away and apply whatever their role.

RabbitMQ support for Cucumber-nagios

I’ve been doing more operations related work of late and am starting to use cucumber-nagios for various monitoring tasks. Nagios might not have the most attractive of web interfaces, but it’s so simple to get clients up and running and to extend to do what you need. Cucumber, meanwhile, has a lovely, text-based user interface. And although I’m mainly working with Python at the moment, cucumber-nagios (written in Ruby) really is the easiest way I’ve found of writing simple functional tests.

Cucumber-nagios is the creation of Lindsay Holmwood, and after several brief conversations over Twitter I set about adding a feature I wanted for my own monitoring setup: support for keeping an eye on RabbitMQ.

At the moment the code is in a fork on GitHub, but I’m hoping that once any rough edges have been ironed out and a few people have kicked the tyres it will make its way into trunk. If you want to use this with an existing project straight away, you can always drop the contents of amqp_steps.rb into your feature steps file after installing the amqp gem.

I’ve included a little documentation in the fork as well with a quick example:

Feature: github.com
  To make sure the rest of the system is in order
  All our message queues must not be backed up
  Scenario: test queue
    Given I have a AMQP server on rabbit.github.com
    And I want to check on the fork queue
    Then it should have less than 400 messages
    Then it should have at least 5 consumers
    Then it should have less than 50 messages per consumer

My main use case was keeping an eye on a known queue’s size and number of consumers. I’m sure I’m missing some features at the moment, so any feedback is much appreciated.

Processing large files with sed and awk

I found myself using a couple of powerful but underused command line applications this week and felt like sharing.

My problem involved a large text file with over three million lines and a script that processed the file line by line, in this case running a SQL query against a remote database.

My script didn’t try to process everything in one go, instead taking off large chunks and processing them in turn, then stopping and printing out the number of lines processed. This was mainly so I could keep an eye on it and make sure it wasn’t having a detrimental effect on other systems. But once I’d run the script once (and processed the first quarter of a million records or so) I wanted to run it again, minus the first batch of lines. For this I used sed. The following command creates a new file with the contents of the original file, minus the first 254263 lines.

sed '1,254263d' original.txt > new.txt
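Incidentally, tail can do the same job if you find sed’s address syntax hard to remember: the `-n +N` form starts output at line N, so the offset is the number of skipped lines plus one.

```shell
# Start printing at line 254264, i.e. skip the first 254263 lines.
# Equivalent to: sed '1,254263d' original.txt > new.txt
tail -n +254264 original.txt > new.txt
```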

I could then run my script with the input from new.txt and not have to reprocess the deleted lines. My next problem came when the network connection between the box running the script and the database dropped out. The script printed out the contents of the last line successfully processed, so what I wanted was a new file with all the contents of the old file after that line. The following awk command does just that, assuming the last line processed was f251f9ee0b39beb5b5c4675ed4802113.

awk '/^f251f9ee0b39beb5b5c4675ed4802113/{f=1;next}f' original.txt > new.txt

Now, I could have made the script that did the work more complicated and ensured it dealt with these cases. But that would have involved much more code, and the original scripts were only a handful of lines of throwaway code. For one-off jobs like this a quick dive into the command line seemed more prudent.

Speaking at DIBI

I’ll be heading back up to Newcastle in April to give a talk at what’s shaping up to be a good-looking conference to kick off the year. DIBI is trying to please everyone, with both frontend- and backend-focused streams.

Created for both sides of the web coin, DIBI brings together designers and developers for an unusual two-track web conference. World renowned speakers leading in their fields of work will talk about all things web. Taking place in Newcastle upon Tyne, (it’s oop north) at The Sage Gateshead on the 28th April 2010, we’re bringing both sides of the web world together with some awesome speakers.

I’m not a big fan of making a point of dividing frontend and backend work. You nearly always end up with JavaScript-dominated horribleness (because we only had a frontend person available) or a so-called content management system that means all sites have to look the same except for the colour palette. So I’m hoping lots of crossover happens and interesting conversations abound.

Oh, and if you’re wondering what I’ll be speaking about, it’s probably going to be something about all the cool tools you could and should be using when building or looking after web applications. I’ll probably be doing my best to convince people to look outside the comfort of the LAMP or C#/MSSQL stacks and realise that the future for lots of web developers might just be more devops.

Dreque

I’ve just found Dreque from Samuel Stauffer on GitHub. It’s yet another take on the whole messaging thing, which is definitely seeing a lot of activity at the back end of this year. It uses Redis on the backend and looks really rather nice:

Submitting jobs:

<code>from dreque import Dreque

def some_job(argument):
    pass

dreque = Dreque("127.0.0.1")
dreque.enqueue("queue", some_job, argument="foo")</code>

Worker:

<code>from dreque import DrequeWorker
worker = DrequeWorker(["queue"], "127.0.0.1")
worker.work()</code>

DJUGL December

As mentioned at the last event I’ve taken over organising the Django User Group London event from Rob. Tickets are now available for the next event which is going to be on the 3rd of December at The Guardian offices in Kings Cross.

You can sign up on Eventwax.

Erlang Screencasts

I’ve been trying to learn Erlang for a while. What I actually mean is that it’s been on my list of things to learn for months, along with all sorts of other incredibly interesting bits and pieces. I spend a little time on it at home, but the majority of my learning time is now spent commuting to London and back most days. Sometimes I’m even going all the way to Swindon, which gives me even longer to not learn Erlang.

The main problem with learning something new on the train is space. Reading a book (or my new Kindle) or just using my laptop is fine. Trying to do both at once is nearly impossible (I’ve tried). So I’ve decided to give another approach a try, namely screencasts.

I’ve only done the first Erlang in Practice episode so far, but I was hugely impressed with the content and general presentation. $5 doesn’t seem bad at all either. The episode was half an hour long but took me a little longer, probably closer to 45 minutes, as I was playing along at home and typing out the code examples as I went. I also got sidetracked messing with my vim configuration at the same time, but hey. This makes them perfect for my hour-long commute. The full series is 8 episodes long, and with luck I’ll be able to work through them this week.

So, good job Kevin Smith and Pragmatic for a nice, accessible start to Erlang. All I need to do now is find something interesting to hack on in Erlang.

Django Committers

I’ve been lurking on the django-developers mailing list for the last couple of weeks, and that provided an excuse to play with the new Twitter Lists feature. So here’s a list of djangocommitters on Twitter. If I missed someone, do let me know. There is a chance you won’t be able to see this if you’re not on the beta yet, sorry!

Problems Installing Hadoop 0.20 and Dumbo 0.21 on Ubuntu

The Hadoop wiki has a great introduction to installing this piece of software, which I wanted to do so I could have a play with Dumbo. The Dumbo docs also have a good getting started section, which includes a few patches that need to be applied.

Dumbo can be considered to be a convenient Python API for writing MapReduce programs

Unfortunately it’s not quite that simple, at least on Ubuntu Jaunty. Hadoop now uses Java 6, but if you just follow the instructions on the wikis you’ll hit a problem when you run “ant package”: a third party application (Apache Forrest) requires Java 1.5. Once you fix that, the build script will complain that you also need to install Forrest. Here’s what I did to get everything working:

pre. sudo apt-get install ant sun-java5-jdk

pre. su - hadoop
wget http://mirrors.dedipower.com/ftp.apache.org/forrest/apache-forrest-0.8.tar.gz
tar xzf apache-forrest-0.8.tar.gz
cd /usr/local/hadoop
patch -p0 < /path/to/HADOOP-1722.patch
patch -p0 < /path/to/HADOOP-5450.patch
patch -p0 < /path/to/MAPREDUCE-764.patch
ant package -Djava5.home=/usr/lib/jvm/java-1.5.0-sun -Dforrest.home=/home/hadoop/apache-forrest-0.8/

With all that out of the way you should be able to run the simple examples found on the rather excellent dumbotics blog. If you’re using the Cloudera distribution, or once Hadoop 0.21 gets a release, these problems will disappear, but in the meantime hopefully this saves someone else a bit of head scratching.