It's the Data we Want

A spreadsheet. A CSV file. Whatever is in use internally. Made available to people like us under a suitable license.

I feel a little self-absorbed quoting myself (from a recent Refresh Cambridge discussion) but I did like the turn of phrase. What I was rambling on about was Cambridge County mapping data, after a question from a nice chap from the council about what “new, exciting map technology” we’d like to see. But it applies to any data whatsoever that you’re trying to make public, be it government or otherwise.

What a few other people and I were talking about, and one of the things that has been discussed as part of the Rewired State group, is that it’s all about the data, not necessarily about a nice web-based API.

Now I’ve written and spoken about the need for well-designed APIs being treated as part of the user interface. But remember interface design, and by association API design, isn’t easy. API design is often about building manageable flexibility. A public API is often about managing the flow of data you control out to third parties. As well as the information itself, it might include limitations on usage, request rate or storage. A public API codifies how that information can be accessed. APIs also have to tread a fine line between making it easy for you to solve your problem, and making it easy for everyone else to solve their completely different problems. These compromises are design.

But not everything needs an API. Sometimes it’s just about the data, and the best way of getting at that data is as raw as possible. Government data is an easy sell here, as it is (or rather should be) our data. It’s also for the most part interesting to read rather than write (historical council tax data, or population data for instance). Raw data can generally be provided quicker than via an API. It doesn’t need fragile computer systems or extensive manual labour. It doesn’t need particularly clever computing resources. Just upload a spreadsheet or a CSV file to a sensible URL on a known, regular basis and away we go.
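To give a flavour of how little is needed on the consuming side, here is a minimal Python sketch. The sample data, its column names and the `households_per_year` helper are all made up for illustration; they stand in for whatever a council might actually publish.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a published CSV file.
# The column names and figures are invented for illustration.
SAMPLE_CSV = """year,band,households
2007,A,1200
2007,B,950
2008,A,1250
2008,B,975
"""


def households_per_year(csv_text):
    """Total the households column for each year in the CSV text."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["year"]] += int(row["households"])
    return dict(totals)


print(households_per_year(SAMPLE_CSV))
```

With a real file, the text would probably come from something like `urllib.request.urlopen(url).read().decode("utf-8")`, where the URL is wherever the data owner chooses to publish it.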

And giving data like this away to the development community is likely to have a few additional benefits if that data is useful (it probably is to someone). We’ll happily write software libraries, or create APIs over the top of it for you. We’ll also write all sorts of useful tools using the data in ways no one else thought of. So if you’re sat on a load of data that’s not core to your business, or is meant to be public anyway, then let’s start talking publicly about how to just get this out on the web quickly and cheaply, rather than spending lots of your time and money on something fancy.

Thoughts on the whole XHTML/HTML5 affair

I wasn’t going to write anything about the whole XHTML2 thing. I noted its passing, got a nice message on Twitter and thought that would be it. But no. The web standards world exploded. I honestly didn’t see that coming.

Let’s get a few things straight:

  • I use XHTML 1.0 for this site. In fact I’ve been using it for the majority of things for most of my professional life.
  • I don’t serve content with an XML MIME type. Neither does anyone else. It’s a complete non-issue. Ignore it.
  • At my last job we used HTML 4. It meant I had to remember not to close my image elements, which bugged me, but not too much. I still quoted everything. Closed everything I could. And only used lowercase element names.
  • My latest two pet projects are using HTML5. I’m still closing everything (including image elements, yay), quoting everything and lowercasing everything.

Web Standards are interesting, in that they are standards for both implementors (browser makers) and for authors (us). I like coding standards in programming languages too; it’s one of the things I love about Python and PEP8. But with these standards it’s not about making your code work, it’s about shared conventions and readability. So common spacing, UPPERCASE for constants and leading caps for class names, for instance. It’s also about having a tool to check everyone is adhering to standards, like FxCop for .NET. If everyone writes code in the same way it’s easier to read, write and to pick up someone else’s code. You can do that with HTML, but you have to do that with XHTML.

Now the whole HTML 4.0 vs XHTML 1.0 thing has come up lots of times, on mailing lists, at conferences as well as down the pub. I know on occasion me, Drew, Rachel and Jeremy side against Simon and Nat on the issue. But what’s interesting is that I think we all agree on all the typographical conventions stuff. My former colleagues with a passion for front end standards and HTML 4 did the same thing. I even remember Simon looking for ways to validate against HTML 4 but also to check for all lower case elements, closed paragraphs and the like.

Which brings me to the reason why I use XHTML: the validator enforces my preferred coding standards for HTML - lowercase elements, quoted attributes and closed elements. That’s it. Not much really. I know it’s marketing XHTML rather than technical XHTML. I don’t care. Or rather I do care, I just make a conscious pragmatic decision based on a small personal advantage. I’m pedantic and I like having a tool chain which enforces that, so XHTML suits my style.
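As a rough illustration of the kind of checking I mean, here is a small Python sketch. It is not the W3C validator or anything like it; it leans on the standard library XML parser, which already refuses unclosed elements and unquoted attributes, and adds a lowercase check on top. The `check_markup` name is made up for this example.

```python
import xml.etree.ElementTree as ET


def check_markup(fragment):
    """Return a list of style problems found in an XHTML-style fragment."""
    try:
        # An XML parser rejects unclosed elements and unquoted attributes.
        root = ET.fromstring(fragment)
    except ET.ParseError as error:
        return ["not well-formed: %s" % error]
    # Anything that parses is then checked for lowercase element names.
    return [
        "uppercase element name: %s" % element.tag
        for element in root.iter()
        if element.tag != element.tag.lower()
    ]


print(check_markup('<p><img src="a.png" alt="x" /></p>'))
print(check_markup('<P>hi</P>'))
```

It only handles a single fragment without namespaces or named entities such as `&nbsp;`, so treat it as a demonstration of the idea rather than a replacement for a proper validator.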

The markup language debate is being talked about in terms of pragmatists vs purists. But ignoring the people who both really understood and really wanted XHTML2, it’s mainly the pragmatists arguing amongst themselves now. Some of them are big company people, others working for themselves. Some have standards or academic leanings, others are rooted in commercial web design. Some people probably work on huge long term projects, others relatively small sites and apps. And I think it’s these cultural differences that are the root of arguments now. So blog posts coming out saying the same thing but arguing with other people give a strange impression of disagreement. Throw in that the web lends itself to popular blogs gathering a crowd of like-minded people around them and hey presto we have people feeling unfairly put upon and getting agitated.

What a storm in a teacup. Who doesn’t genuinely think the best approach is to use whatever you’re using now for most projects, investigate HTML5 as time permits, and then expect to start using HTML5 in bits and pieces in the short to medium term, with the timing mainly dependent on your target audience?

In my opinion the only genuine problem that this saga has highlighted is the fear, uncertainty and doubt around all flavours of HTML amongst a large number of web professionals. People don’t get this stuff at all. With the added resources soon to be put into the HTML5 working group at the W3C, the outreach and education side of the project has to have just as much love and attention as the spec itself.

Pants Python Code

One of the projects that came out of the Django Dash recently was PyPants which I’m finding very cool.

It’s basically a quality tracking service for Python modules. For instance my recent UrlTest module has a page on PyPants, scoring a good B grade after some cleanup work earlier today.

Under the hood I think it’s probably CheeseCake, which is available as a command line application, maybe with a hint of PyLint thrown in. But the nice interface, as well as the tracking of scores over time, really adds something. GitHub has been credited by some as making sharing code more fun; I’m hoping projects like PyPants can do the same for quality in Python code.

Congrats to Eric Holscher, Travis Cline, and Nathan Borror on a fantastic addition to the Python community.

Urltest on PyPi

I’ve been meaning to add some of my code to the Python Package Index for a while and have finally gotten around to it with Urltest, my simple DSL for testing WSGI apps.

You can now find it on the Python Package Index and install it using setuptools with:

pre. easy_install urltest

At the moment I’ve not added any categorisation or detailed description to the setup.py file; I’ll be doing that soon. I wanted to get it working with the absolute minimum file, which turned out to look like:

pre. #!/usr/bin/env python
from setuptools import setup, find_packages
setup(
    name = "urltest",
    version = "0.1",
    author = "Gareth Rushgrove",
    author_email = "[email protected]",
    url = "",
    packages = find_packages('src'),
    package_dir = {'': 'src'},
)

Uploading it to PyPI itself was incredibly simple, partly as I was already using setuptools for local installation.

pre. python setup.py register
python setup.py bdist_egg upload

Let me know if anyone uses this and gets it working. I’ll be adding more details and maybe even some more features when I get the chance. Once I do that I’ll probably work on a few more packages as well.

Ant for Web Developers II - Restart Apache

Following on from yesterday’s first useful Ant task, here’s another commonly used task - restarting a remote service. I’ve used Apache in this example, but it could be any service running on your remote machine and it doesn’t have to be the restart command.

In order to do this we’ll use the sshexec task, which has a third party library dependency. This is the same third party library needed for the scp task in yesterday’s post.

You first need to download JSch and then compile the source using Ant. Just run ant dist in the downloaded folder and you should get a .jar file in the /dist/lib folder. Save this .jar file as jsch.jar to a folder in your home directory, ~/.ant/lib, where Ant can automatically load it. Alternatively you can run ant with the -lib option to load libraries from a different location.

pre. ant -lib /tools/ant-libraries

With that out of the way let’s have a look at the task.


<?xml version="1.0" encoding="UTF-8"?>

And running it is as simple as:

pre. ant restart-apache

One potential issue with tasks like this is storing the password in the build file in plain text. The task we’re using can also use key authentication if you’re happy using ssh keys. Alternatively you can set properties on the command line each time you run ant, like so:

pre. ant restart-apache -Dpassword={password}

Ant for Web Developers I - Backup Config File

I occasionally get carried away with Apache Ant. For those that haven’t come across it, Ant is a build tool written in Java, using an XML syntax to describe a series of repeatable tasks. For your typical web standards savvy, dynamic language favouring web developer types, that description is probably all they (think they) need to know. It’s Java. It’s XML. It’s only really useful in the context of building software (dull).

But I think Ant is a particularly handy tool to have around for anyone working on even simple websites. A couple of strong use cases come to mind:

  • If you’re working in a team environment then build files are hugely useful when introducing new people to the team, or when moving people around. Getting code up and running at the start of a project, or if you join the team part way through, can be tricky. A well written build file can automate this.
  • Even when working on projects on my own I tend to write simple build files. The main reason is so I don’t forget how to do something. How do you deploy this particular site? How do you run the test suite or generate the documentation? Build files can encapsulate this, and unlike documentation that might be out of date, the build file is what actually gets executed to do the job in question.

So with all that in mind I’m going to try and do a series of posts each covering a single task, aiming to cover things that your regular web developer will find useful. If anyone has any requests or questions let me know either by email or in the comments.

Our first task lets us back up a file from our remote web server; in this case it’s the apache2.conf file used to set up Apache. Obviously it could be any file you want to get hold of. The example below has a couple of properties for the username and hostname of the remote machine. Save the following snippet into a file called build.xml and place it anywhere you like on your machine.


<?xml version="1.0" encoding="UTF-8"?>

Running the task, once you have Ant installed (it comes already installed on OS X and is generally available in whatever Linux package management system you prefer), is as simple as typing the following into a console.

pre. ant backup-apacheconf

This should download the apache2.conf file to your local machine, into the same directory as your build file.

The above task requires that you have scp installed on your machine, which is pretty likely if you’re using OS X or Linux. Ant comes with an inbuilt scp task, but it requires you to install a separate Java library. If you’re happy doing that then you can write tasks like:


<?xml version="1.0" encoding="UTF-8"?>

Less CSS

Ruby people really don’t like CSS, do they? But Less is actually pretty cool. It’s basically an attempt to bootstrap features, specifically Variables, Mixins, Operations and Nested Rules, into CSS. The best part about this is it uses CSS syntax and a simple one step compiler. I’d be interested to know what the folks at the W3C think about this.

So for instance you can do:

pre. /* LESS */
@brand_color: #4D926F;
#header { color: @brand_color; }
h2 { color: @brand_color; }

and compile it down to:

pre. /* CSS */
#header { color: #4D926F; }
h2 { color: #4D926F; }
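To show that the variables part of that compile step is conceptually simple, here is a toy sketch in Python. It is emphatically not how Less itself works (Less is a Ruby tool and does far more); `expand_variables` is a made-up helper that assumes `@name: value;` definitions sit on their own lines before they are used.

```python
import re


def expand_variables(source):
    """Replace @name references with values from '@name: value;' lines."""
    variables = {}
    output = []
    for line in source.splitlines():
        # A definition line looks like: @brand_color: #4D926F;
        definition = re.match(r"\s*@([\w-]+)\s*:\s*(.+?)\s*;\s*$", line)
        if definition:
            variables[definition.group(1)] = definition.group(2)
            continue
        # Unknown names (such as @media) are left untouched.
        output.append(
            re.sub(
                r"@([\w-]+)",
                lambda match: variables.get(match.group(1), match.group(0)),
                line,
            )
        )
    return "\n".join(output)


LESS_SOURCE = """@brand_color: #4D926F;
#header { color: @brand_color; }
h2 { color: @brand_color; }"""

print(expand_variables(LESS_SOURCE))
```

Mixins, operations and nested rules would need a real parser, which is where the actual compiler earns its keep.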

Message Queues at Cambridge Geek Night

Last night was the first Cambridge Geek Night and saw 35 people or so fill a room above a pub to listen to a few short talks and converse with fellow geeks. I had the pleasure of giving the first talk, a short introduction to using message queues for web developers.

I got lots of good questions from interested people and by the sounds of things it had the desired effect: getting people unfamiliar with message queues to go out and have a play with some of the cool software available to solve their problems.

Overall the night was definitely a success. Suitably geeky conversations. A chance to meet new people as well as old friends. Good job Vero and David for organising the event and here’s to the next one.

How to Decide on Your Next Programming Language

Neil Crosby got me thinking yesterday about which language to learn/play with next by tweeting:

so, lovely people of the interwebs. What webly language should I be spending my time learning then? Right now, I’m all about the PHP.

Neil appears to have gone for Python, but more specifically I’m interested in how you decide what to learn next, and in how you improve the likelihood of seeing it through and being able to add it to your toolbox. Personally I’ve messed around with a wide range of languages but I would say I’m proficient in only a few of those.

So here goes with a list of questions to ask yourself.

  • Do you have a small pet project you can use the new language on straight away?
  • Is the language increasing or decreasing in popularity?
  • Does the language overlap with what you already know in terms of applicability?
  • Is it a different style of programming to what you’re used to? Say a pure functional language if all you normally do is object orientated.
  • Is the language in demand in the jobs market? If not why not?
  • Could you use the language in your current job if you were allowed to?
  • What are the learning materials like? Are there books or websites that teach the language without passing on bad practices?
  • What is the community around a language like? Do they have an IRC room where beginners can ask questions without being mocked?
  • Does the language have an interactive mode? Sorry but I find anything without to be a chore to learn.
  • What is it going to cost you in terms of time? Remember some languages are bigger than others.
  • What is it going to cost you in terms of money? Do you need specialist software, or hardware, or licenses?
  • Do you have friends or acquaintances who use the language and who will help you out with pointing you in the right direction of resources or helping review your code?
  • Do you have somewhere you can go to meet other people who write the language? That might be a formal user group or it might just be a more general pub meetup that you know a few people attend.
  • How is the language represented on GitHub? Seeing what other people build and being able to read working code is hugely useful when learning anything.

I don’t think all of these apply to everyone or apply all the time, but it’s worth considering and rejecting them when they don’t.

Feel free to disagree in the comments or, even better, add extra ones. Or alternatively just cut to the chase and tell me what I should learn next.

update

I’ve updated my vanity domain with a bit of information in case anyone might be interested in my services.

The short version is I’m on the lookout for future projects, probably on a freelance or contract basis, but if it’s particularly interesting then maybe a full time position. Basically I’m in quite a nice position and able to wander about a bit looking for something cool to do.

If you’re reading this site then you know what floats my boat. Python, testing, automation, system design, maybe getting into Ruby or another language properly, etc. I’d particularly like to help people get started with testing, continuous integration or automated deployment and the like.