Static Generator For Web Services

My latest on-a-train project is Dumper, a static generator for web services. I’m a huge fan of Nanoc and tools like Jekyll for building websites, but I spend at least as much time building small web services. I wanted something super simple that would let me expose data I had access to as a read-only web service.

At the moment that means using a MySQL database, specifying a SQL query and running a Python script. Hey presto, you have lots and lots of XML and JSON files representing your data. Dumper provides hooks for you to customise the output or even override the database layer. It should be possible, if you were so inclined, to replace the MySQL backend with another database or other type of data store. Hopefully some of these might end up in my branch at a later date.

At a basic level all you need is a config file which looks a little like:

<% syntax_colorize :ini, type=:coderay do %>
[Dumper]
path: people
index: id
backend: mysql

[Database]
sql: SELECT id, name FROM people
host: localhost
username: root
password:
database: dumper
<% end %>

And then run a command line application against that file:

<% syntax_colorize :bash, type=:coderay do %>
dumper -c people.ini dump
<% end %>

The application supports a number of flags for specifying where you want the files to be generated, what your config file is called, and whether to clean up any generated files if you want to try again. The output will let you know which files have been updated, which deleted and which added. If you’d rather have a single file containing all your records then that’s easy too - just add something to the config file.
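To make the idea concrete, here is a rough sketch of what a dump step like this boils down to. It is not Dumper's own code: the function name `dump_records`, the one-file-per-record layout and the `<index>.json` naming are all assumptions made for illustration, and the rows are passed in directly rather than fetched from MySQL.

```python
import json
import os


def dump_records(rows, path, index):
    """Write one JSON file per row into `path`.

    Hypothetical helper: each file is named after the value of the
    row's `index` column, e.g. people/1.json for index "id".
    """
    os.makedirs(path, exist_ok=True)
    written = []
    for row in rows:
        filename = os.path.join(path, '%s.json' % row[index])
        with open(filename, 'w') as handle:
            json.dump(row, handle, indent=4)
        written.append(filename)
    return written


if __name__ == '__main__':
    # Stand-in for the result of "SELECT id, name FROM people".
    people = [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': 'Bob'}]
    for name in dump_records(people, 'people', 'id'):
        print('wrote', name)
```

Once the files exist, any plain web server can serve them, which is the whole appeal of generating a read-only service ahead of time.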

It’s somewhat early days for Dumper, and I’ve not seen anything similar, so there are definitely some rough edges that could do with some work. All of that will really come down to how much use it gets. I’d appreciate any feedback from anyone with a similar itch to scratch too.

Piston And Sanitising Json Callbacks

I’m a big fan of Piston, the Django app for creating RESTful web services. As part of a project at work I ended up looking through the source code, mainly at some of the neat tricks for serialising objects. While poking around I came across something that, to my mind, wanted fixing. This being open source, rather than just file a bug report I set up a Bitbucket account and got hacking.

<% syntax_colorize :python, type=:coderay do %>
def render(self, request):
    cb = request.GET.get('callback')
    seria = simplejson.dumps(self.construct(), cls=DateTimeAwareJSONEncoder, ensure_ascii=False, indent=4)

    # Callback
    if cb:
        return '%s(%s)' % (cb, seria)

    return seria
<% end %>

Can you spot the problem? Note the use of the callback passed in the query string arguments and then used without any checking in the output.

What we really want to do is something like this:

<% syntax_colorize :python, type=:coderay do %>
if is_valid_jsonp_callback_value(cb):
<% end %>

Which is exactly what has just gone into the code for Piston. This article contains lots of background information about why JSONP callbacks can be a security hole, and helpfully provides a nice Python module to help with the sanitisation. Nice to be on the authors list for something I’m using actively.

Thanks to Jesper Noehr for Piston, for some pointers on bitbucket and for quickly taking the patch. If you’re accepting a callback on your site or application, especially if it’s a public service, you really want to do something like this or you just might have an exploitable security hole.
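If you just want a feel for what such a check involves, here is a minimal sketch. It is not the module mentioned above and not what Piston ships: `is_simple_callback` and `render_jsonp` are hypothetical names, and the rule used (dotted ASCII identifiers only) is deliberately stricter than a full JavaScript identifier check, which would also have to deal with reserved words and Unicode.

```python
import json
import re

# Assumption: only accept dotted ASCII identifiers such as
# "handle" or "jQuery123.handle". Anything else is rejected.
_CALLBACK_RE = re.compile(
    r'[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*'
)


def is_simple_callback(name):
    """Return True if `name` looks like a plain dotted identifier."""
    return bool(name) and _CALLBACK_RE.fullmatch(name) is not None


def render_jsonp(data, callback=None):
    """Wrap the JSON in the callback only when the callback is safe."""
    body = json.dumps(data)
    if callback and is_simple_callback(callback):
        return '%s(%s)' % (callback, body)
    return body


if __name__ == '__main__':
    print(render_jsonp({'a': 1}, 'handle'))
    print(render_jsonp({'a': 1}, 'alert(document.cookie);//'))
```

In this sketch an unsafe callback silently falls back to plain JSON. Returning an error response instead is an equally reasonable choice.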

Mysql Support For Cucumber Nagios

I just noticed Lindsay had committed the amqp steps for cucumber-nagios and remembered I hadn’t mentioned on here some other work I’ve been doing on the same project. We use MySQL quite a bit at work and I’ve been wanting to extend our monitoring for a while. So I set about thinking how that would work with cucumber-nagios. What I’ve come up with looks something like this:

Feature: localhost
  To make sure the rest of the system is in order
  Our database server should not be overloaded

  Scenario: check running processes count
    Given I have a MySQL server on localhost
    And I use the username root
    Then it should have less than 10 processes

  Scenario: check queries per second
    Given I have a MySQL server on localhost
    And I use the username root
    Then it should have less than 200 select queries per second
    Then it should have less than 300 queries per second
    Then it should have less than 5 slow queries per second
    Then it should have at least 10 queries per second

The numbers, username details and host details are all variables, so you can write scenarios for your specific deployments. The tests over time are based on a very short-lived sampling mechanism which I’ve yet to test in anger. I’m not sure just yet if this approach will lead to too many false positives, but we’ll have to see.
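The sampling idea is easiest to see in code. The steps themselves are Ruby, so the Python below is only a sketch of the general technique and not a port of them: read an ever-increasing counter twice, a short interval apart, and divide the difference by the elapsed time. The name `rate_per_second` and its parameters are hypothetical, and the counter is supplied as a callable so nothing here depends on a particular MySQL client.

```python
import time


def rate_per_second(read_counter, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Estimate a per-second rate from two samples of a counter.

    `read_counter` is any callable returning a number that only
    ever goes up (for example a query counter read from the server).
    """
    start_value = read_counter()
    start_time = clock()
    sleep(interval)
    end_value = read_counter()
    elapsed = clock() - start_time
    if elapsed <= 0:
        raise ValueError('elapsed time must be positive')
    return (end_value - start_value) / elapsed


if __name__ == '__main__':
    # Fake counter that grows by 50 on every read, for demonstration.
    state = {'count': 0}

    def fake_counter():
        state['count'] += 50
        return state['count']

    print(rate_per_second(fake_counter, interval=0.1))
```

With a window this short, a single burst of queries can push the figure over a threshold, which is exactly where the worry about false positives comes from. A longer interval smooths that out at the cost of a slower check.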

This mysql gmetric script gave me lots of the ideas for individual steps. I’ll be writing more about some work I’ve been doing with cucumber-nagios and ganglia soon as well.

For the moment, if anyone wants to try these steps out you can either check out my cucumber-nagios fork or just grab the steps from the mysql_steps.rb file. Any feedback much appreciated.

New Nanoc Powered Blog

It’s taken longer than I would have liked but I’ve finally gotten around to relaunching this site on nanoc.

After looking through lots of code from the nanoc showcase I had a pretty good feel for how I wanted things to work, and I then used the excellent nanoc3_blog template to get started. I’ve hacked around quite a bit with the code to get things how I wanted them: using Less to make the CSS more manageable, Coderay for lovely syntax highlighting, and making everything default to Textile rather than Markdown. I’ve also written import scripts for my old blog (in Python) and another one so I can use Tumblr if I want to create items on here (in Ruby).

Nanoc really is a joy to work with and I’m hoping that alone will get me back into writing more frequently than I have done for a while. The fact I can just write in Vim or WriteRoom or whatever editor I have to hand feels nice. And Git, Rake and Rsync complete my little toolset. Everything is still served via Nginx.

I’ve thrown all the code, including all the content, up on GitHub for anyone interested. Back to writing.

On blogging platforms

Every now and again I feel the need for a change, and I spent a little time tonight looking at different blogging software. I’m currently running a custom Django app I wrote a good while ago, more as an excuse to play with Django than anything. Previously I’ve used WordPress, Textpattern and even Radiant. But I’m coming to the conclusion that what I want doesn’t exist. In theory that means an opportunity for someone to enter the market; in reality I think it might just be me. So, what do I want?

  • Customisable URL patterns and redirects. Tumblr did well until this point but I have years of content on nice URLs and I’m not going to throw that away.
  • Command line interface for posting snippets
  • Nice looking web based API (bonus points to Typepad here for some nifty features like PubSubHubbub)
  • iPod touch app
  • Real control over the design, not just pre-made templates unless you pay for a pro account
  • Hackable (this could mean anything, but I know it when I see it. So not WordPress then.)
  • Export plain text out
  • Use my own domain name

I have a sneaking suspicion that what I might be thinking about is a private Tumblr blog used as a data source for Nanoc3. But that relies on me having the time to build it, as I can’t find anyone who might have written such a thing. Maybe one day.

DJUGL February

The next Django User Group London is in two weeks’ time. You can register over on Eventwax. So far we have Brad talking, with more speakers to be announced shortly. Hope to see a few people there.

The rise of the in-house team?

I was just thinking about the Design it Build it conference later in the year (full disclosure: I’m speaking), specifically the people speaking on the developer track. Between myself, Michael Brunton-Spall from The Guardian, David Singleton from Last.fm and Emma Persky from Gumtree, four of the six speakers work on in-house teams. Not early stage start-ups, not large software/advertising companies, not as freelancers, but in a reasonably sized company on a development team.

My original background was working in agencies, and then a stint working for myself, and I’m constantly interested by the different facets of the web software industry. I think conferences or magazines aimed at your average interested web developer or designer play an interesting role in what people perceive as normal. If all you see are people who work as freelancers you start to think that must be way cooler than whatever it is you’re doing at the time. I remember attending the first @media event and being surprised at the small number of people from larger agencies. Everyone was from smaller boutique places, or Yahoo!, or a freelancer. Now lots of people I see at events are involved in startups.

Interestingly as well, none of it is about a particular language or framework. Thinking about it as I type, I think the four of us spend our day jobs mainly using different languages (Java, Python, PHP, Perl). But I bet we all work in environments where we use other languages at least occasionally, or at least the people around us do. Mixed environments are commonplace in companies that have been around a good while and run on software. They are far less common elsewhere, with startups using whatever is cool (let’s build a mobile search engine in Haskell, anyone?) and small agencies often using whatever they built their first client website with (probably PHP).

What I’m really interested in though is the type of topics that are going to be talked about. Last.fm vs the Xbox, Scaling the Guardian, my rambling thoughts on a modern toolbox for developers beyond your average LAMP or .NET stack. This is the sort of thing I’m interested in. It’s the sort of problems I like having. It’s also, I think, the sort of stuff that doesn’t get a showing at many mainstream conferences. I’m hoping it’s all going to be fairly practical too - things that people can take away and apply, whatever role they have.

RabbitMQ support for Cucumber-nagios

I’ve been doing more operations related work of late and am starting to use Cucumber-nagios for various monitoring tasks. Nagios might not be the most attractive of web interfaces but it’s so simple to get clients up and running and extend to do what you need. Cucumber however has a lovely, text based, user interface. And although I’m mainly working with Python at the moment cucumber-nagios (written in Ruby) really is the easiest way I’ve found of writing simple functional tests.

Cucumber-nagios is the creation of Lindsay Holmwood and after several brief conversations over Twitter I set about adding a feature I wanted for my own monitoring setup. Namely support for keeping an eye on RabbitMQ.

At the moment the code is in a fork on GitHub, but I’m hoping that once any rough edges have been ironed out and a few people have kicked the tyres it will make its way into trunk. If you want to use this with an existing project straight away you can always drop the contents of amqp_steps.rb into your feature steps file after installing the amqp gem.

I’ve included a little documentation in the fork as well with a quick example:

Feature: github.com
  To make sure the rest of the system is in order
  All our message queues must not be backed up
  Scenario: test queue
    Given I have a AMQP server on rabbit.github.com
    And I want to check on the fork queue
    Then it should have less than 400 messages
    Then it should have at least 5 consumers
    Then it should have less than 50 messages per consumer

My main use case was to keep an eye on a known queue size and number of consumers. I’m sure I’m missing some features at the moment so any feedback much appreciated.

Processing large files with sed and awk

I found myself using a couple of powerful but underused command line applications this week and felt like sharing.

My problem involved a large text file with over three million lines and a script that processed the file line by line, in this case running a SQL query against a remote database.

My script didn’t try to process everything in one go, instead taking off large chunks and processing them in turn, then stopping and printing out the number of lines processed. This was mainly so I could keep an eye on it and make sure it wasn’t having a detrimental effect on other systems. But once I’d run the script once (and processed the first quarter of a million records or so) I wanted to run it again, except without the first batch of lines. For this I used sed. The following command creates a new file with the contents of the original file, minus the first 254263 lines.

sed '1,254263d' original.txt > new.txt

I could then run my script with the input from new.txt and not have to reprocess the deleted lines. My next problem came when the network connection between the box running the script and the database dropped out. The script printed out the contents of the last line successfully processed, so what I wanted was a new file with all the contents of the old file past that line. The following awk command does just that, assuming the last line processed was f251f9ee0b39beb5b5c4675ed4802113. It sets a flag when it sees a line starting with that value, skips that line, and prints every line after it.

awk '/^f251f9ee0b39beb5b5c4675ed4802113/{f=1;next}f' original.txt > new.txt
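If the one-liners look cryptic, it can help to see the same two operations spelled out. The Python below is only an illustration of what the commands do, not something I ran at the time, and `skip_first` and `after_marker` are hypothetical names.

```python
import itertools


def skip_first(lines, count):
    """Yield everything after the first `count` lines.

    Mirrors: sed '1,COUNTd'
    """
    return itertools.islice(lines, count, None)


def after_marker(lines, prefix):
    """Yield the lines that follow a line starting with `prefix`.

    Mirrors: awk '/^PREFIX/{f=1;next}f'
    Any line starting with the prefix is itself skipped, and nothing
    is yielded until the first such line has been seen.
    """
    seen = False
    for line in lines:
        if line.startswith(prefix):
            seen = True
            continue
        if seen:
            yield line


if __name__ == '__main__':
    sample = ['aaa\n', 'bbb\n', 'ccc\n', 'ddd\n']
    print(list(skip_first(sample, 2)))
    print(list(after_marker(sample, 'bbb')))
```

Both functions work on any iterable of lines, including an open file, so they stream through a three million line file without loading it into memory.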

Now I could have made the script that did the work more complicated and ensured it dealt with these cases. But it would have involved much more code, and the original script was only a handful of lines of throwaway code. For one-off jobs like this a quick dive into the command line seemed more prudent.

Speaking at DIBI

I’ll be heading back up to Newcastle in April to give a talk at what’s shaping up to be a good looking conference to kick off the year with. DIBI is trying to please everyone, with both front and backend focused streams.

Created for both sides of the web coin, DIBI brings together designers and developers for an unusual two-track web conference. World renowned speakers leading in their fields of work will talk about all things web. Taking place in Newcastle upon Tyne, (it’s oop north) at The Sage Gateshead on the 28th April 2010, we’re bringing both sides of the web world together with some awesome speakers.

I’m not a big fan of making a point of dividing frontend and backend work. You nearly always end up with JavaScript-dominated horribleness (because we only had a front end person available) or a so-called content management system that means all sites have to look the same except for the colour palette. So I’m hoping lots of crossover stuff happens and interesting conversations abound.

Oh, and if you’re wondering what I’ll be speaking about it’s probably going to be something about all the cool tools you could and should be using when building or looking after web applications. I’ll probably be doing my best to convince people to look outside the comfort of the LAMP or C#/MSSQL stacks and realise the future for lots of web developers might just be more devops.