Devops Twitter Aggregator

I’ve been hacking on appengine again and have thrown up a simple twitter aggregator for devops. It’s again based on TwitterEngine with an increasing number of additions and changes.

As well as just the tweets I want to build a few other small features. The first of which is link extraction, so at the moment you’ll see recent links a the top of the page. I’ll hopefully make that a little more useful, with better browing and converting short urls into the real ones. I also have vague plans for providing exports, listing people talking about devops and some useful graphs to track general activity around devops.

<img src=“http://image-host.appspot.com/i/img?id=agppbWFnZS1ob3N0cg0LEgVJbWFnZRjphAEM" alt=“Devops aggregator on Twitter”

I’m really interested to see if the whole discussion around the term devops grows over the next year or so, and I’m hoping this will make it a little easier to see that change happen.

Installing Integrity On Debian/Ubuntu

I’ve been playing with Integrity again as a simple continuous integration server and have installed it on a few debian and ubuntu machines in the last few weeks. The current site has good installation instructions for the Ruby side of things but leaves it as an excercise for the installer to make sure all the system level dependencies are installed.

So probably as much for me in the future, here is what I had to install to get the installation instructions to work for me.

<% syntax_colorize :bash, type=:coderay do %> apt-get install build-essential apt-get install ruby apt-get install rdoc apt-get install sqlite3 apt-get install libdbd-sqlite3-ruby apt-get install libdataobjects-sqlite3-ruby1.8 apt-get install libsqlite3-dev apt-get install libxml2 apt-get install libxml2-dev apt-get install libxsl-dev apt-get install libopenssl-ruby <% end %>

I also needed to install the following package on Ubuntu:

<% syntax_colorize :bash, type=:coderay do %> apt-get install ruby1.8-dev <% end %>

If you want to use a database other than the default SQLite then you won’t need those packages and I’ll assume you know what you’re doing.

You're Going To Need A Bigger Toolbox

I’m just getting back from Newcastle after getting to present at the first Design It Build It conference. It was great to be back up in Newcastle and to see lots of familiar faces. As with most conferences it was also good to meet new people (especially those for whom it was their first conference) and to listen to people talking about interesting stuff. Personal highlights for me were David Singleton from Last.fm and Michael Brunton Spall from The Guardian going through really interesting case studies from their respective organisations. It’s the sort of gritty content it’s often hard to come by. Speaking did mean I missed out on most of the design track unfrotunately, but videos should be available soon and by all accounts it sounds like the larger designer crowd went home happy.

I think my presentation at DIBI went OK. I’d got a little bit carried away with cramming content in, which meant it felt rushed at times and I still went over by 5 minutes into my Q&A time. But hey, a few people said nice things afterwards.

I wanted to tell the world (of web developers) about as many different tools as I could. I think most people who read this blog have probably come across most of this software, heck you might be commiting code to it or already using it in production. But lots of people haven’t ever heard of Memcached, never mind Cassandra or RabbitMQ. And more inportant than the specific software is the differnt types of tools available. Small web servers, message queues, HTTP caches, etc. Conferences are a good place to find and educate people. Hopefully I managed to do just that.

One thing I hope I got across, I’m not for a moment saying you shouldn’t be using tools you know and love. Nor am I saying you should jump in and start using lots of crazy software. But keeping an eye on new developments can serve you well when it comes to deciding whether the best approach is really to build something from scratch. I’ve spoken to several people over the last few weeks who where starting to write simple queing systems using cron and mysql, or using a hand rolled file system based caching setup. And in both cases I think they would be better served by existing tools.

My slides had a lot of links in them and I mentioned during my talk I’m put a list of them somewhere:

Small Web Servers

Caching

Search Engines

Message Queues

Non-relational Data Stores

Data Mining

Functional Testing

Server Provisioning

Devops At Barcamp Cambridge

I’m at barcamp cambridge this weekend and decided to do a short talk on devops. It’s still a term that not too many people have come across and something that lots of people building websites should think about.

I put the slides together this morning so much of it will be familiar to people who have been reading the same blog posts as me over the last year or so. For anyone else either just finding out about the whole world of operations or old hand sysadmins finding likeminded people hopefully the slides will be useful.

DIBI Twitter Aggregator

So, DIBI is just over a week away and lots of people seem to be getting excited. Personally I’m really looking forward to it. I get a nice trip back up to my old home of Newcastle and get to talk geek to an audience of likeminded (and not so likeminded) web professionals. I’m also going to catch up with quite a few people I haven’t seen in quite a while.

Anyway, Twitter is at it’s best at conferences like this. It’s perfect for the where to go now situation. But I also like going back in time after the fact and looking at what people said, especially to get feedback about my talk. With that in mind I’ve build a little aggregator for all things #dibi. So far that appears to be a mixture of excitement and travel chaos.

I specifically wanted something that ran on the server, which discounted an awful lot of pure javascript versions. I think Twitter clients generally do a good job of the right now anyway. I also like to collect data for mining later, assuming I have the time. After a number of discussions on Twitter, then just some wandering around GitHub I found TwitterEngine which is fantastic. I’ve hacked it around quite a bit into something more general purpose. I’ve added some caching, some other App Engine performance tweaks and some monitoring. I’ll commit those to GitHub when I get the chance. I’d forgotten quite how much fun App Engine development was.

I still want to add a few more features before the conference. The styles work quite well on my Android but less well (text too small) on my iPod Touch. So I’ll fix that first. Any other requests?

Hadoop Hive Web Interface

I’ve been playing with Hive recently and liking what I’ve found. In theory at least it provides a very nice, simple way of getting into analysing large data sets. To make it even easier to show other people what you’re up to Hive has a nascent web interface with a little documentation on the wiki

image of hive web ui

On the one hand it’s rather simple at this point, but that should be easily enought to prettify given a bit of time. The bigger problem was getting it working in the first place. What follows worked for me using the latest cloudera packages on debian testing. I’m assuming you already have Hive and Hadoop installed, the basic packages worked fine for me here.

Next up you’ll need the JDK (not just the JRE) as their is some compilation that will go on the first time you run the web interface.

<% syntax_colorize :bash, type=:coderay do %> apt-get install ant sun-java6-jdk <% end %>

Next up I had to modify the installed /etc/hive/conf/hive-site.xml file as follows:

I changed this:

<% syntax_colorize :xml, type=:coderay do %> hive.metastore.uris file:///var/lib/hivevar/metastore/metadb/ Comma separated list of URIs of metastore servers. The first server that can be connected to will be used. <% end %>

To this. Note the hivevar path doesn’t exist so I’m not sure if this was a typo in the source.

<% syntax_colorize :xml, type=:coderay do %> hive.metastore.uris file:///var/lib/hive/var/metastore/metadb/ Comma separated list of URIs of metastore servers. The first server that can be connected to will be used. <% end %>

I also change the following section regarding the metastore name:

<% syntax_colorize :xml, type=:coderay do %> javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=/var/lib/hive/metastore/${user.name}_db;create=true JDBC connect string for a JDBC metastore <% end %>

To this, with a fixed name. When using the above confirguration the file was actually called ${user.name} rather than my username being subsituted in. Elsewhere this seems to work fine.

<% syntax_colorize :xml, type=:coderay do %> javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true JDBC connect string for a JDBC metastore <% end %>

I’m not convinced the above two changes are needed but have left them here just in case. The main tricky part is making sure a load of environment variables are correctly set. The following worked for me:

<% syntax_colorize :bash, type=:coderay do %> export ANT_LIB=/usr/share/ant/lib export HIVE_HOME=/usr/lib/hive export HADOOP_HOME=/usr/lib/hadoop export PATH=$PATH:$HADOOP_HOME/bin export JAVA_HOME=/usr/lib/jvm/java-6-sun <% end %>

All being well that should allow you to run the hive command with the web interface like so:

<% syntax_colorize :bash, type=:coderay do %> hive –service hwi <% end %>

That should bring up a webserver on port 9999 where you should see something similar to the screenshot above.

More Django Project Templates

Quite a while ago I released some handy scripts for building up Django project layouts. Part of the reason behind this was to kick off discussions about ideal pproject layouts and maybe even get a few user submitted layouts into the project. Paster makes this soft of thing really easy to do and I was interested in what people might come up with.

Well, the team over at The Chicargo Tribune have done just that, creating a branch of the original project and adding a very rich example project to it

I’ve finally gotten around to merging that back into my branch and releasing a new version to PyPi. It’s got a good number of PostGreSQL commands that might be of use, as well as several Amazon S3 and EC2 examples. Even if none of these complete templates fit what you’re looking for exactly their are lots of smaller fabric recipes worth taking a look at.

You can read all about what is in the project over on their developer blog, which if filled with other interesting web and operations goodness.

So, huge thanks to Chris, Brian, Ryan and Joe for some great work.

Sandbox Your Ruby Gems

I’m a huge fan of virtualenv for Python. It’s a simple tool that lets you have an isolated python environment into which you can install libraries via setup tools. It makes experimenting with different versions of code easier and avoids lots of problems with hard to find bugs caused by unknown third party conflicts.

Sandbox aims to do exactly the same for Ruby. It isolates your gem installation from the system libraries and providesa script to activate the named environment. The GitHub repo has been around since 2008, yet only has 37 followers. I’m not sure whether that is because another more popular and feature rich solution exists or because Ruby people haven’t yet found this technique useful. For me finding tools that work the same way as those I already like in other languages makes me happy.

Static Generator For Web Services

My latest on a train project is Dumper, a static generator for web services. I’m a huge fan of Nanoc and tools like Jekyl for building websites. But I spend at least as much time building small webservicds. I wanted something super simple that would let me expose data I had access to as a read only web service.

At the moment that means using a mysql database, specifying a SQL query and running a python script. Hey presto you have lots and lots of XML and JSON files representing your data. Dumper provides hooks for you to customise the ourput or even overide the database layer. It should be possible if you were so inclined to replace the mysql backend with another database, or other type of data store. Hopefully some of these might end up in my branch at a later date.

At a basic level all you need is a config file which looks a little like:

<% syntax_colorize :ini, type=:coderay do %> [Dumper] path: people index: id backend: mysql

[Database] sql: SELECT id, name FROM people host: localhost username: root password: database: dumper <% end %>

And then run a command line application against that file:

<% syntax_colorize :bash, type=:coderay do %> dumper -c people.ini dump <% end %>

The application supports a number of flags for specifying where you want the files to be generated, what your config file is called and to clean up any generated files if you want to try again. The output will let you know which files have been updated, which deleted and which added too. If you’d rather have a single file but with all your records in then that’s easy too - just add something to the config file.

It’s somewhat early days for Dumper, and I’ve not seen anything similar so their are definately some rough edges that could do with some work. All of that will really come down to how much use it gets. I’d appreciate any feedback from anyone with a similar itch to scratch too.

Piston And Sanitising Json Callbacks

I’m a big fan of Piston, the django app for creating RESTful web services. As part of a project at work I ended up looking through the source code, mainly at some of the neat tricks of serialisation of objects. While poking around I came across something in my mind that wanted fixing. This being open source rather than just file a bug report I setup a bitbucket account and got hacking.

<% syntax_colorize :python, type=:coderay do %> def render(self, request): cb = request.GET.get(‘callback’) seria = simplejson.dumps(self.construct(), cls=DateTimeAwareJSONEncoder, ensure_ascii=False, indent=4)

    # Callback
    if cb:
        return '%s(%s)' % (cb, seria)

    return seria

<% end %>

Can you spot the problem? Note the use of the callback passed in the query string arguments and then used without any checking in the output.

What we really want to do is something like this:

<% syntax_colorize :python, type=:coderay do %> if is_valid_jsonp_callback_value(cb): <% end %>

Which is exactly what has just gone into the code for Piston. This article contains lots of background information about why JSONP callbacks can be a security hole, and helpfully provides a nice Python module to help with the sanitisation. Nice to be on the authors list for something I’m using actively.

Thanks to Jesper Noehr for Piston, for some pointers on bitbucket and for quickly taking the patch. If you’re accepting a callback on your site or application, especially if it’s a public service, you really want to do something like this or you just might have an exploitable security hole.