Chef Hello World

I’ve been playing with Chef recently, in particular the solo variant. The new job at FreeAgent meant setting up new development virtual machines, and rather than just jotting down instructions I decided to script everything. I’d been wanting an excuse to take a look at Chef for a while and it’s certainly suited to this sort of job.

Unfortunately the getting started documentation isn’t yet great. I’m pretty sure this will improve over time; I had exactly the same problem with the Puppet docs a year ago. The main problem I had was that I wanted to know how to use it, not just how to download someone else’s cookbook. What I wanted was the absolute simplest thing that would work: a Hello World example for Chef, if you will. I’ll say now that I’m not an expert, and there may be ways of doing this that are even simpler, but this works for me. And before someone mentions knife or rake tasks: a generator isn’t simpler. It might be better when you know what’s going on, but until you do it’s a big-ass abstraction that will just get in the way.

All my sample cookbook is going to do is install a single package, curl. I’m going to assume you already have Chef installed for this. The documentation did an OK job of that, although I’m relatively familiar with installing gems. I did find that the default system packages, on Ubuntu at least, were way out of date. Either get the packages direct from Opscode or use the gem.

First create a directory and file structure that looks like this:

-- config
   -- node.json
   -- solo.rb
-- cookbooks
   -- example
      -- recipes
         -- default.rb

When you run the chef-solo command you need to tell it a few bits of information. The minimum appears to be just telling it where to find the cookbook we’re going to create. I think you can call this file anything you like, but in the tree above it’s called solo.rb.

cookbook_path File.expand_path(File.join(File.dirname(__FILE__), '..', 'cookbooks'))

Next up are the details of the given node. In our very simple case this is just a list of the recipes we want to run when we execute chef-solo. Put the following content in the node.json file in the config directory as indicated above:

{
  "run_list": [ "recipe[example]" ]
}

Last up we want to create a cookbook. Now you can go and download example cookbooks from all over the place. This is great for learning new tricks and commands, but for me, at least to begin with, most of them were more complicated than I needed for my simple use case. Lots of options. Lots of knowing the package names on different distros. I’m just calling this cookbook example. That means the folder in the cookbooks folder is called example and the run list above references example. Feel free to change this to whatever you like, or create new cookbooks with different names. Inside that folder we create a recipes folder, and inside that we create a default.rb file with the following content:

package "curl"

And that’s it. A biggish directory structure, three files, each with about one line of content. Simple.

Now to run all that just issue the following command:

sudo chef-solo -c config/solo.rb -j config/node.json

This should output various messages to the console about what Chef is doing and, when it’s finished, you should find curl has been installed. Try adding another line to the recipe for another package (or even a gem) and rerun the chef-solo command (a sketch of what that might look like is below). Now go read the docs for all the other cool things you can do.
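Purely as an illustration, an extended default.rb might look something like this; the extra package and gem names are just examples, and gem_package is the Chef resource for installing rubygems:

package "curl"
package "git-core"

# gem_package installs a rubygem rather than a distro package
gem_package "rake"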

Working For FreeAgent

So, I’ve just accepted a new job at FreeAgent, the rather snappy online small business accounting startup. I’ll be starting in about a month, when I get back from devopsdays in Hamburg.

I’m joining a pretty close-knit group of designers and developers working on a pretty well loved piece of software. I’ve known Roan (one of the directors) for years, ever since the first Highland Fling conference I think. And the rest of the team seem a good mix of technical smarts and professionalism. Hopefully I can live up to their already high standards.

I’m looking forward to jumping into a decent sized Ruby project and getting properly reacquainted with Rails as well. I’ve spent most of the previous few years in the Python world so that should be a nice challenge. I’m constantly impressed with the number of high quality tools written in Ruby, and Rails 3 looks a genuinely nice piece of software. Hopefully, with a bit more idiomatic Ruby under my belt, more time on my hands and some new real world problems to solve, I’ll increase my open source contributions and get back to writing again as well. I’ll likely be spending most of my time on those good developer staples: new feature development, supporting existing code and scaling systems. I’ll also be involved in some operations stuff if I get the chance, introducing software I’m a huge fan of and automating everything I can get my hands on.

The rest of the team are based up in Edinburgh but I’ll be working from home most of the time. I’m quite looking forward to working from Cambridge after having lived here for a few years. I’ll also get to see various friends in Edinburgh that I don’t get the chance to meet up with very often, and still have time to make it to London for the odd event or get-together. I’ll likely be looking into coworking spaces or desk shares in and around Cambridge, so let me know if you have any recommendations.

Given this week’s pubstandards is apparently all about weddings, birthdays and new jobs (and I have all three within a month) it would be rude not to go along for at least one drink. If you’re around London I might see you there.

Script Running Web Interface With Websockets

After something of a break I found a bit of time for some hacking at the weekend and decided to scratch a personal itch. I like writing shell scripts for everything from checking on things to deploying code. For anything that is more than just executing one command, anything with detailed output, or anything that takes a while, I like to be able to see what’s going on. I also like to (OK, only sometimes) let other people run those commands, and not all of those people want to check something out and run a console. The web makes a pretty good interface for this sort of situation.

Enter Bolt. It’s my first proper go at using websockets for communication between client and server, and my first stab at EventMachine. The code at the moment needs some improvement (tests, configurability, deployment options) but it works as a proof of concept.

It’s pretty simple, designed for running a single command at the push of a button and showing the results scroll past as they happen.
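Bolt itself is up on GitHub; purely as a sketch of the idea (this isn’t Bolt’s actual code), streaming a command’s output over a websocket with EventMachine and the em-websocket gem looks something like this, with the script path and port made up:

require 'rubygems'
require 'eventmachine'
require 'em-websocket'

# Relays anything a subprocess writes to stdout down a websocket
class CommandStreamer < EventMachine::Connection
  def initialize(websocket)
    @websocket = websocket
  end

  def receive_data(data)
    @websocket.send(data)
  end
end

EventMachine.run do
  EventMachine::WebSocket.start(:host => '0.0.0.0', :port => 8080) do |ws|
    ws.onopen do
      # Run the script, streaming its output to the browser as it happens
      EventMachine.popen('./myscript.sh', CommandStreamer, ws)
    end
  end
end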

Bolt script running interface

Hopefully it’s useful to a few more people, or at least it will be when I clean it up a bit. For more information head over to the GitHub page.

Git Pre-Receive Hook For Integrity

I’m getting married rather soon so time has been somewhat short (in a good way) for just hacking on stuff, but I’ve finally found a little bit of time to play with something I’ve been mulling over for a while: namely, a continuous deployment workflow using the Integrity continuous integration server.

I’m hoping to have an incredibly simple but fully operational example available at some point - mainly to act as a good discussion point. For now here’s my current pre-receive hook.

Python: What To Use?

My friend Jamie Rumbelow has started a new project and decided to use Python. He asked a great question over on Stack Overflow, which basically came down to: what should I use for my first proper Python web application project? After a quick prompting on Twitter I decided to have a go. I’ve cross-posted my answer below, mostly because it took as long as a typical blog post to write.

Frameworks

OK, so I’m a little biased here: I currently make extensive use of Django and organise the Django User Group in London, so bear that in mind when reading the following.

Start with Django because it’s a great gateway drug. Lots of documentation and literature, a very active community of people to talk to and lots of example code around the web.

That’s a completely non-technical reason. Pylons is probably purer in terms of Python philosophy (being much more a collection of discrete bits and pieces) but lots of the technical stuff is personal preference, at least until you get into Python more. Compare the very active Django tag on Stack Overflow with those for Pylons or TurboGears, though, and I’d argue getting started is simply easier with Django, irrespective of anything to do with code.

Personally I default to Django, but find that an increasing amount of the time I actually opt for simpler micro frameworks (think Sinatra rather than Rails). There are lots to choose from (good list here). I tend to use MNML (because I wrote parts of it and it’s tiny) but others are actively developed. I tend to do this for small, stupid web services which are then strung together with a Django project in the middle serving people.

Worth noting here is appengine. You have to work within its limitations and it’s not designed for everything, but it’s a great way to just play with Python and get something up and working quickly. It makes a great testbed for learning and experimentation.

Mongo/ORM

On the MongoDB front you’ll likely want to look at the basic Python Mongo library first to see if it has everything you need. If you really do want something a little more ORM-like then mongoengine might be what you’re looking for. A bunch of folks are also working on making Django specifically integrate more seamlessly with NoSQL backends. Some of that is aimed at future Django releases, but django-nonrel has code now.

For relational data SQLAlchemy is good if you want something standalone. Django’s ORM is also excellent if you’re using Django.

API

The most official OAuth library is python-oauth2, which handily has a Django example as part of its docs.

Piston is a Django app which provides lots of tools for building APIs. It has the advantage of being pretty active, well maintained and in production all over the place. Other projects exist too, including Dagny, which is an early attempt to create something akin to RESTful resources in Rails.

In reality any Python framework (or even just raw WSGI code) should be reasonably good for this sort of task.

Testing

Python has unittest as part of its standard library, and unittest2 is in Python 2.7 (but has been backported to previous versions too). Some people also like Nose, which is an alternative test runner with some additional features. Twill is also nice: it’s “a simple scripting language for Web browsing”, so it’s handy for some functional testing. Freshen is a port of Cucumber to Python. I haven’t yet gotten round to using it in anger, but a quick look now suggests it’s much better than when I last looked.

I actually also use Ruby for high-level testing of Python apps and APIs because I love the combination of Celerity and Cucumber. But I’m weird and get funny looks from other Python people for this.

Message Queues

For a message queue, whatever language I’m using, I now always use RabbitMQ. I’ve had some success with stompserver in the past, but Rabbit is awesome. Don’t worry that it’s not itself written in Python; neither are PostgreSQL, Nginx or MongoDB - all for good reason. What you care about are the libraries available. What you’re looking for here is py-amqplib, which is a low level library for talking AMQP (the protocol for talking to Rabbit as well as other message queues). I’ve also used Carrot, which is easier to get started with and provides a nicer API. Think Bunny in Ruby if you’re familiar with that.

Environment

Whatever bits and pieces you decide to use from the Python ecosystem, I’d recommend getting to know pip and virtualenv (note that Fabric is also cool, but not essential, and these docs are out of date on that tool). Think about using Ruby without gem, bundler or rvm and you’ll be going in the right direction.
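If you haven’t met them before, the basic workflow looks something like this (the environment and package names are just examples):

virtualenv mynewproject           # create an isolated Python environment
source mynewproject/bin/activate  # use it for this shell session
pip install Django                # installs into the environment, not system-wide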

DIBI Videos

The videos from the DIBI conference are now up on Vimeo. Lots of good stuff and more to come. The one disadvantage of a two-track conference is that you miss half the talks, so when I get a chance I’ll be catching up with those talks I didn’t get to see.

Very Simple Custom Ganglia Metrics

Logging useful information from running systems for monitoring purposes is pretty important if you want to see how your software is behaving in the real world. It’s one thing to test something locally, another to test something under load in a testing environment, and quite something else to watch production code while it’s running.

The numbers can be useful for checking newly released code isn’t having a detrimental effect on performance, observing what changes in load are doing to systems over time and planning for future capacity growth.

Creating log files, aggregating files from multiple machines and then analysing the results is one approach. Another is using something like Ganglia. Ganglia is great for trending data over time, and ties in nicely with Nagios for reporting. Installing the monitoring daemon on machines and generally getting the default checks (memory, disk, network, etc.) up and running is nice and easy. From there, using the gmetric command line tool to create custom metrics (say, checking some MySQL statistics) is again straightforward.
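For example, something like the following (the metric name and value here are made up) publishes a one-off custom metric that Ganglia will then graph:

gmetric --name "mysql_slow_queries" --value 12 --type uint32 --units "queries"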

So far, so good. The only issue I’ve run into was creating custom metrics on the fly from a machine outside the network. For bonus points, these metrics had nothing to do with the machine on which they were collected, but with the system overall. More specifically, the metrics were web site performance data gathered via some Cucumber and Celerity scripts.

For this I knocked up a tiny web service wrapper around the gmetric command line tool. It’s very feature-light at the moment (I only needed it to collect time-based stats at regular intervals) but it could be made more featureful and expose the rest of the gmetric API if need be. It uses a very simple URL scheme:

/{metric-name}/{metric-value}/

So for example I can create metrics on the fly simply using an HTTP client or a web browser.

/GarethsCommuteTime/3600/
/ExternalPageLoad/2.005/
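From a script or cron job that makes recording a metric a one-liner (the host and port here are made up):

curl http://monitoring.example.com:8000/ExternalPageLoad/2.005/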

The code is up on GitHub and is completely self contained. I’ve been running it mainly using Spawning, but any small WSGI server should suffice. I looked very briefly at the API for Ganglia but found the gmetric approach to be much simpler.

And if you’re a Ganglia expert and know a much better way of doing this then let me know. Ganglia is awesome, and collecting metrics is both useful and fun (for me at least), but it’s not always obvious how to get started creating simple custom metrics which tell you something about your own application code.

Devops Twitter Aggregator

I’ve been hacking on appengine again and have thrown up a simple twitter aggregator for devops. It’s again based on TwitterEngine with an increasing number of additions and changes.

As well as just the tweets I want to build a few other small features. The first of these is link extraction, so at the moment you’ll see recent links at the top of the page. I’ll hopefully make that a little more useful, with better browsing and converting short URLs into the real ones. I also have vague plans for providing exports, listing people talking about devops and some useful graphs to track general activity around devops.

<img src="http://image-host.appspot.com/i/img?id=agppbWFnZS1ob3N0cg0LEgVJbWFnZRjphAEM" alt="Devops aggregator on Twitter">

I’m really interested to see if the whole discussion around the term devops grows over the next year or so, and I’m hoping this will make it a little easier to see that change happen.

Installing Integrity On Debian/Ubuntu

I’ve been playing with Integrity again as a simple continuous integration server and have installed it on a few Debian and Ubuntu machines in the last few weeks. The current site has good installation instructions for the Ruby side of things but leaves it as an exercise for the installer to make sure all the system level dependencies are in place.

So, probably as much for future me as anyone else, here is what I had to install to get the installation instructions to work:

apt-get install build-essential
apt-get install ruby
apt-get install rdoc
apt-get install sqlite3
apt-get install libdbd-sqlite3-ruby
apt-get install libdataobjects-sqlite3-ruby1.8
apt-get install libsqlite3-dev
apt-get install libxml2
apt-get install libxml2-dev
apt-get install libxslt1-dev
apt-get install libopenssl-ruby

I also needed to install the following package on Ubuntu:

apt-get install ruby1.8-dev

If you want to use a database other than the default SQLite then you won’t need those packages and I’ll assume you know what you’re doing.

You're Going To Need A Bigger Toolbox

I’m just getting back from Newcastle after getting to present at the first Design It Build It conference. It was great to be back up in Newcastle and to see lots of familiar faces. As with most conferences it was also good to meet new people (especially those for whom it was their first conference) and to listen to people talking about interesting stuff. Personal highlights for me were David Singleton from Last.fm and Michael Brunton-Spall from The Guardian going through really interesting case studies from their respective organisations. It’s the sort of gritty content that’s often hard to come by. Speaking did mean I missed out on most of the design track unfortunately, but videos should be available soon and by all accounts it sounds like the larger designer crowd went home happy.

I think my presentation at DIBI went OK. I’d got a little bit carried away with cramming content in, which meant it felt rushed at times and I still went five minutes over into my Q&A time. But hey, a few people said nice things afterwards.

I wanted to tell the world (of web developers) about as many different tools as I could. I think most people who read this blog have probably come across most of this software; heck, you might be committing code to it or already using it in production. But lots of people haven’t ever heard of Memcached, never mind Cassandra or RabbitMQ. And more important than the specific software are the different types of tools available: small web servers, message queues, HTTP caches, etc. Conferences are a good place to find and educate people. Hopefully I managed to do just that.

One thing I hope I got across: I’m not for a moment saying you shouldn’t be using tools you know and love. Nor am I saying you should jump in and start using lots of crazy software. But keeping an eye on new developments can serve you well when it comes to deciding whether the best approach really is to build something from scratch. I’ve spoken to several people over the last few weeks who were starting to write simple queueing systems using cron and MySQL, or using a hand-rolled file system based caching setup. And in both cases I think they would be better served by existing tools.

My slides had a lot of links in them, and I mentioned during my talk that I’d put a list of them somewhere:

Small Web Servers

Caching

Search Engines

Message Queues

Non-relational Data Stores

Data Mining

Functional Testing

Server Provisioning