Why Developers Should Care About System Packages

First a bit of background. I’m a software developer (lately in Ruby and a tiny bit of Java, previously in Python, C# and PHP; yes, I get around a bit), but I have spent enough time looking after production hardware (mainly Debian, Solaris and recently a bit of RHEL) to have a feel for sysadmin work. I even have friends who are systems administrators. I mainly use a shiny Apple laptop for my development work, but I actually execute all the code on Linux virtual machines. The aim of this post is to bridge a divide, not start a flame war about specific tools.

I’m writing this partly to address a tweet I made that, in hindsight, needed more than 140 characters. Actually a number of my recent tweets have been on the same theme, so I should probably be more constructive. What I’m seeing recently is an increase in the number of ways I’m being asked to install software, and for me at least that’s annoying.

  1. Several projects will ask you to do something like curl http://bit.ly/installsh | sh, which downloads a shell script and executes it.
  2. Some will insist I have git installed.
  3. A new framework might come with its own package manager.

I’m a polyglot programmer (so I shouldn’t care about #3) who uses git for everything (scratch #2) and who writes little bash scripts to make my life easier (exactly like #1). So I understand exactly how and why these solutions appear fine. And for certain circumstances they are, in particular for local development on a machine owned and maintained by one person. But on a production machine, and even on my clean and tidy virtual machines, none of these cut it for me in most cases.

Most developers I know have only a passing awareness of packaging, so I’m going to have an aside to introduce some cool tricks. I think this is one place where sysadmins go wrong: they assume developers understand their job and know the various tools intimately.

System Package Tips

I’m going to show examples using the Debian tools, so these apply to Debian and Ubuntu distros. RPM and Yum have similar commands too; I just happen to know debs better.

List all installed packages

This one is a bit obvious; it’s probably going to be available in anyone’s home-grown package management system. But if you’re installing software by hand using git or a shell script then you can’t even ask the machine what is installed.

dpkg -l

List files from package

I love this one. Have you ever installed a package and wondered where the config files are? You can sort of guess based on your understanding of the OS file system layout, but this command is handy.

dpkg -L lynx
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/lynx
/usr/share/doc/lynx/copyright
/usr/share/doc/lynx/changelog.gz
/usr/share/doc/lynx/changelog.Debian.gz

Where did that file come from?

Got a file on disk and you’re not sure where it came from? Ask the system package manager. The more everything is installed from packages, the more useful this becomes.

dpkg -S /bin/netstat
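
On a typical Debian or Ubuntu box that should come back with something like this (the owning package, then the file):

net-tools: /bin/netstat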

Unmet dependencies

At the heart of a good package system is the ability to map dependencies and to have unmet dependencies installed as needed. Having tools to query that tree is useful in various places.

apt-cache unmet

Will give you output a little like the following:

Package libdataobjects-sqlite3-ruby1.9.1 version 0.10.1.1-1 has an unmet dep:
 Depends: libdataobjects-ruby1.9

What needs upgrading?

The apticron tool can alert you to packages that are now out of date. It’s easy to set it up to email you each day for each host, telling you about packages that need upgrading. Remember that the reason one of these packages has an update available could be a documented security bug, which makes knowing about it quickly even more important.

apticron report [Fri, 19 Jan 2007 18:42:01 -0800]
========================================================================

apticron has detected that some packages need upgrading on: 

    faustus.example.com
    [ 1.2.3.4 ]

The following packages are currently pending an upgrade:

    xfree86-common 4.3.0.dfsg.1-14sarge3
    libice6 4.3.0.dfsg.1-14sarge3
    libsm6 4.3.0.dfsg.1-14sarge3
    xlibs-data 4.3.0.dfsg.1-14sarge3
    libx11-6 4.3.0.dfsg.1-14sarge3
    libxext6 4.3.0.dfsg.1-14sarge3
    libxpm4 4.3.0.dfsg.1-14sarge3

I’m really not an expert on using debs but even I find these tools useful, and you don’t get the same capabilities when you use anything else.

Good and bad examples

Still here? Good. I’m going to pick on a few pieces of software to give examples of what I mean. I actively use all of this software and think it’s brilliant, earth-shattering stuff. I’m not dissing the software, so any fanboys reading can kindly hold off attacking me; I’m one of you.

RabbitMQ (Erlang)

The nice folk building the RabbitMQ message queue provide downloads of the source code as well as various system packages. Knowing that some people will want to use the latest and greatest version of the application, they also host up-to-date Debian packages in their own package repo, with details on their site.
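
If you’ve never pointed apt at a third-party repository before, it’s only a couple of lines. The repository URL, key location and list file name below are purely illustrative; the project’s site has the real details.

# add the vendor's repository to apt's sources (URL is illustrative)
echo "deb http://packages.example.com/debian testing main" | sudo tee /etc/apt/sources.list.d/rabbitmq.list
# import their signing key so apt can verify the packages, then install as normal
wget -O - http://packages.example.com/signing-key.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install rabbitmq-server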

Chef (Ruby)

The Chef configuration management system also provides multiple methods to install their software. For people already familiar and happy with it, they provide everything as a Ruby gem. If you prefer system packages they have those too. They also provide their own deb repo for people to grab the latest software.

Cloudera Hadoop (Java)

Before I found the Cloudera Hadoop packages I remember having great fun manually applying patches to get everything working. Cloudera do exactly the same as the above two developers, namely host their own debs.

RVM

RVM is a fantastic way of managing multiple ruby versions and multiple isolated sets of gems. But it’s also probably the first place I saw the install from remote shell script approach.

bash < <( curl http://rvm.beginrescueend.com/releases/rvm-install-head )

I like to do the same things on my development machine as I do in production, and the main problem I have with RVM is that it’s so useful I want it everywhere. I’d prefer it if the system wide install had some sort of option to install the rubies from packages rather than compiling everything on the machine (meaning you need a full set of compile tools installed everywhere), or if we could automate the creation of those packages using RVM.

Solr

You’ll probably find packages for the Solr search server in recent distros. It’s hugely popular, predominantly because it’s a fantastic piece of software. But every time I have a look at the system packages I can’t quite get them to work, or they are out of date. I now know my way around Solr setup relatively well and just end up creating my own packages, and I’ve spoken to other folk who have done the same. The Solr documentation recommends downloading a zip file to get started and I can’t see any mention of the packages. My guess is the packages aren’t maintained as part of the core development, which is a quick way to get them out of sync with current progress.

Enough beating up on my fellow developers

System packages aren’t blameless. I think the culture often seen in Debian of splitting the developer from the package maintainer is part of the problem. This manifests in various ways, all negative:

  • Out of date packages. The biggest complaint from developers about system packages is nearly always that they are out of date. Maintainers should more readily release packaging scripts (ideally back to the project) so people can easily roll their own.
  • The documentation around packaging is either fantastic or terrible, depending on what you want to do and who you are. It turns out making your own packages (using something like checkinstall) is actually quite easy; there’s a quick sketch just after this list.
  • I think the official Debian docs focus on the role of package maintainer, rather than trying to push that knowledge downstream to the developers. That doesn’t make them bad; it just means we need documentation aimed at a developer just getting started with packaging their software.
  • Developers hosting their own package repository and asking people to point at that is also quite easy. The projects I praised above all do it nicely. But simple attractive documentation is hard to come by.
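
To make the checkinstall point above concrete, here’s roughly what rolling your own deb from a source tarball looks like. It’s a sketch: the package name and version are made up for the example.

# build from source as usual, but let checkinstall run the install step
# and wrap the result up as a .deb instead of scattering files by hand
./configure --prefix=/usr
make
sudo checkinstall --pkgname=mytool --pkgversion=1.0 --default

# once installed, the package can be queried, upgraded or removed like any other
dpkg -l | grep mytool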

What to do

First up, let’s talk more about the distribution and installation of software. And let’s do that in the spirit of making things better for everyone involved. The ongoing spat between Ruby and Debian people is just counterproductive. This would be a good article if it didn’t lead with:

This system (apt-get) is out-dated and leads to major headaches. Avoid it for Ruby-related packages. We do Ruby, we know what’s best. Trust us.

We need better documentation aimed at developers. I’m going to try and write some brief tutorials soon (otherwise I’d feel like this rant was just me complaining) but I’m not an expert. I’ll happily help promote or collate good material as well. Maybe it already exists and I just can’t find it?

I’m a git user and a big GitHub fan, but one of the features of Launchpad I really like is the Personal Package Archive. This lets you upload source code and have it automatically built into a package. It’s specific to Ubuntu, but that’s understandable given Launchpad is also operated by Canonical. What I’d like is the same feature in GitHub, but one that allowed building debs and RPMs for different architectures. Alternatively, a webhook-based third-party service that could do the same would be awesome (anyone fancy building one? I might pitch in). The only real advantage of it being GitHub is that it would make packages immediately cool, which hopefully you all now realise they are.

My Default Recipes For Vagrant Virtual Machines

I’ve written about Vagrant previously and the more I use it the more it impresses me and the more it changes how I work. For those that haven’t yet used Vagrant, the brief summary is: it’s a way of managing, creating and destroying headless VirtualBox virtual machines. So when I’m sat at my computer and I want a new 32 bit virtual machine based on Maverick I just type:

vagrant init maverick32
vagrant up

It has some other magic tricks as well, like automatically setting up NFS shares between the host and guest and allowing you to specify ports to forward in the configuration file. You access the machine via ssh, either using the handy vagrant ssh command or by using vagrant ssh-config to dump the relevant configuration to place in ~/.ssh/config.

I’ve been using virtualisation for a few years, initially purely for testing and experimentation, and then eventually for all my development. I’d have a few VMware images, I’d use snapshots and occasionally roll back, but I very rarely created new virtual machines. It was quite a manual process. With vagrant that’s changing. Every time I start investigating a new tool or new technology or work on a pet project I create a new virtual machine. That way I know exactly what I’m dealing with, and with vagrant the cost of doing that is the 30s waiting for the new machine to boot.

Or rather it would be if I didn’t then have to install and configure the same few things on every machine. Pretty much whatever I might be doing I found myself installing the same things, namely zsh, vim, git and utils like ack, wget, curl and lynx. This is exactly what the provisioning support in vagrant is for, so I set out to use chef to do this for me.

I decided to use a remote tar file for the recipes. I’m not really bothered about managing a chef server just for my personal virtual machines, but I did want to have a canonical source of the cookbooks that wasn’t local to just one of my machines. Plus this means anyone else who shares my opinions about what you want on a new virtual machine can use them too.
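
Creating that canonical source is nothing fancy. Assuming your cookbooks live in a cookbooks directory, it’s just a case of tarring them up and putting the result somewhere chef-solo can fetch it over HTTP (the GitHub downloads page in my case):

# bundle the cookbooks up and upload the tarball somewhere web accessible
tar -czf cookbooks.tar.gz cookbooks/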

My Vagrantfile now looks like this:

Vagrant::Config.run do |config|
  config.vm.box = "maverick32"
  config.vm.provisioner = :chef_solo
  config.chef.recipe_url = "http://cloud.github.com/downloads/garethr/chef-repo/cookbooks.tar.gz"
  config.chef.add_recipe "garethr"
  config.chef.cookbooks_path = [:vm, "cookbooks"]
  config.chef.json.merge!({ :garethr => {
      :ohmyzsh => "https://github.com/garethr/oh-my-zsh.git",
      :dotvim => "https://github.com/garethr/dotvim.git"
    }})
end

You can see the cookbook on GitHub at github.com/garethr/chef-repo. By default it uses the official oh-my-zsh repo and the vim configuration from jtimberman. My own versions are very minor personal preference modifications of those. The Vagrantfile example above shows how you can override the defaults and use your own configs instead if you choose.

One question I was asked about this approach was why I didn’t just create a basebox with all these things installed by default, which would reduce the time taken on first boot as software wouldn’t have to be installed each time. However it would also mean maintaining the baseboxes myself, and as I use different Linux distributions and versions this would be a headache. While doing this and working with vagrant I’ve been thinking about the ecosystem around the tool and I’m planning on writing my thoughts on that subject over the next week or so.

Solr Libraries and Good API Design

I’m a huge Solr fan. Once you understand what it does (it’s a search engine, which means more than you think) and how it works you spot lots of thorny problems that map to its features really well. In my experience it’s also very fast and very stable once installed and set up. Oh, and the community support is great as well.

When I talk to some folks about Solr all they can think about is full text search. The main reason for this, I think, is a number of poor libraries. I’ve come across lots of Python or Ruby libraries that simply say you don’t have to know anything about Solr, just install this code and you get full text search! This works about as well as using the default MySQL or Apache configs: nowhere near as well as if you get your hands dirty even a little. Some of the Ruby gems even ship the Solr jar file in the gem. Now you don’t even need to know Solr exists. You take a generic configuration and run it using a rake task, behind which is some unknown Java application server. Good luck debugging that when it goes wrong; that’s one hell of a leaky abstraction.

In better news, I’ve now found two excellent Solr libraries, ones that start with the assumption that you know what you’re doing or are happy to learn about the tools you’re using. All you really want from a library is a good API that maps to how you write in that language.

Delsolr (Ruby)

The delsolr API is beautiful. It seamlessly merges the worlds of Ruby and Solr in a way that’s easy to write and easy to guess. It’s also clever: the design accepts that new features might be added to Solr before the library is updated, or that the library might not support every use case or option. In these cases you can still pass information through to Solr directly.

Solr’s interface is based around URLs, so any library is really just giving you an interface for creating those URLs. Writing the following in Ruby:

rsp = solr.query('standard',
               :query => '*:*',
               :filters => {:status => 'Active'},
               :facets => [{:field => 'project'}])

Results in the following URL:

/select?q=*:*&wt=ruby&facet=true&facet.field=project&fq=status:Active

If you already know Solr and how to construct URLs for searches by hand you’ll immediately get the Ruby code. You can probably even guess how to pass other params like sort or order.

Another nice touch is that you can use either hashes or Lucene search syntax for each attribute. So:

:filters => {:status => 'Active'}

Is the same as:

:filters => 'status:Active'

Sunburnt (Python)

Sunburnt is a Python Solr interface from the nice folks at Timetric. I’ve not had a chance to use this library in anger, as it was released after I’d done quite a bit of Python and Solr work in an old job, but I’d definitely use it now. The API looks like:

rsp = solr.query('*:*').filter(status='Active').facet_by('project').execute()

It’s based around chaining so again you can probably guess how to make further queries from even this simple example.

Both Sunburnt and Delsolr also support adding documents to the index.

Uses

Once you understand facets and the usefulness of filter queries you see lots of places where Solr is useful apart from text search. Lots of ecommerce operations use faceted search interfaces; I’m sure everyone has spent time clicking through nested hierarchies and watching the numbers (showing the number of matching products) next to the links decrease. You can build these interfaces using SQL but it’s incredibly expensive and gets out of hand quickly. Caching only helps a bit, due to the number of permutations in all but the smallest stores or simplest products. It’s a similar problem with tagging; it’s pretty easy to kill your database.
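
Because Solr’s interface is just URLs, one of those faceted category pages boils down to a single request. The field names here are made up for the example:

# hypothetical product catalogue: count in-stock laptops per brand and colour
curl 'http://localhost:8983/solr/select?q=category:laptops&fq=in_stock:true&facet=true&facet.field=brand&facet.field=colour&rows=0'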

But it’s not just things that have the word search in them that you can map Solr to. Two good examples are Timetric (from whom the Sunburnt library comes) and the Guardian Content API. Both of these present lots of read data straight from Solr with great success and without database-killing performance issues. Solr can really be seen as a simple place to denormalise your data, one advantage being that it keeps your database schema clean.

Learning More

Solr could do with better documentation for beginners. The wiki is an excellent reference once you know how to write schema and configuration files, but I think the getting started section sacrifices introducing configuration in favour of getting people searching quicker. The example schema and solrconfig files that ship with Solr are also amazingly useful references (officially the best commented XML I’ve ever seen) but also intimidating to beginners. The Drupal community appear to be writing some good docs that fill this gap though; here are a few links that I’d recommend:

Heroku For...

With the success of Heroku, both in terms of the recent sale and the fact it’s awesome, it was always just a matter of time before other languages and frameworks got into the platform as a service game. Here are all the ones I know about so far, many of them in or entering beta testing at the moment. Any others I’m missing?

Update: Thanks for all the comments on here and on Hacker News; I’ve updated this list with all the suggestions.

Ruby

Python

PHP

.NET

Java (JVM)

Node.js

RingoJS

Multi Platform

A Vagrant Ecosystem

As mentioned loudly and repeatedly on here and on Twitter, I love vagrant. While writing a chef cookbook to bootstrap my virtual machines I started thinking about how things around vagrant could help it be more useful. These might be things I’m going to do, or ideally get involved with others to do. If anyone has any other ideas or suggestions please leave comments; I definitely think this is the time for discussion.

Baseboxes

I don’t really want to have to maintain baseboxes but I want access to lots of them. I’m sure some people will want a Ruby on Rails in a box, but all I really care about is having access to recent 32 and 64 bit vanilla Linux distributions. I want a good source for trusted baseboxes. At the moment the approach is to look on the wiki, then look on the mailing list, then search the web, then create your own (even using VeeWee it’s still a little fiddly). I’ve managed to find good lucid, maverick and debian boxes, but have had problems with centos and a few others. Part of this is the recent rate of change of both vagrant and now VirtualBox (both good things); part of it is the lack of reviews and shared experiences around baseboxes.

What I’d love to see is a single place where anyone can post a link to a basebox and vagrant users can come along and assign metadata about whether it worked and on what hardware, vagrant version, virtual box version, etc. It could even act as a tracker, counting downloads of boxes to gauge popularity.

Templated Vagrantfiles

As mentioned previously I have a chef cookbook I use to bootstrap all my new virtual machines. My process is therefore: vagrant init, make some manual changes to the Vagrantfile (or copy it from elsewhere), vagrant up. I’m lazy and want a nicer way to reuse Vagrantfiles or to script their creation.

I started out thinking that the ability to point the init command at a template and to provide context on the command line might be a good idea. Now I’m wondering whether we just need a command line application which allows for writing or modifying the Vagrantfile? Something like:

vagrant config vm.provisioner=:chef_solo
vagrant config chef.recipe_url=http://cloud.github.com/downloads/garethr/chef-repo/cookbooks.tar.gz

Hosted cookbooks

I dissed the idea of a Ruby on Rails in a box basebox above but I still want to be able to let people more easily share custom configuration for specialist applications. But what I’d prefer would be people sharing packaged cookbooks, a bit like I’ve done for my default virtual machine setup. Again the beauty of this is it’s pretty much just sharing a URL to a tar.gz file. This makes more sense to me at least than random people connecting to my chef server (I shouldn’t know about their machines) and lowers the barrier to entry for those not interested in hosting their own chef server or using the opscode platform for local virtual machines.

I’m also not talking here about just sharing individual cookbooks like cookbooks.opscode.com, but rather a packaged collection of individual recipes designed for a specific purpose. A fully working Solr instance, a Django application server using Apache/mod_wsgi, etc.

Many of the points about baseboxes above would work here too I think. Having a good community resource which points to lots of cookbook tar files. Allowing people to feed back about what works for them. I’ve mainly talked about Chef here as that’s what vagrant initially shipped with; with the Puppet provisioner now ready to go the same would stand for Puppet manifests too.

Smoke Testing With Cucumber On Sysadvent

I wrote a quick article last week for the excellent sysadvent advent calendar. Smoke Testing Deployments with Cucumber talks a bit more about using a few of my favourite tools to check whether a deployment just broke anything important.

Sinatra On Glassfish Example

I magically turned into a Java developer last week for a bit when I had to do some integration with a SOAP based API that really, really wanted me to write Java on the client as well. That led me down the route of having a good look at JRuby (which I’ve used before, mainly for testing using Celerity) and in particular how easy it was to use native Java classes in JRuby (very, very easy as it turns out).

All that meant I’ll probably end up writing a nice JRuby application in the not too distant future, and not knowing too much about running such a thing in a production environment I thought I’d take a quick look. I went with Glassfish as the application server for no other reason than it took my fancy. I’d definitely be interested in hearing about any positive or negative experiences people may have with it or other similar servers. My quick look turned into running a tiny Sinatra application.

First install the required gems for our little experiment. You’ll obviously need JRuby, which is sort of the point; I’d recommend using RVM for that.
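
If you don’t already have JRuby around, RVM makes it a two-liner (a minimal sketch, assuming RVM itself is already installed):

rvm install jruby
rvm use jruby

With JRuby active, the gem install below pulls Sinatra and Warbler into JRuby’s own gem set.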

gem install sinatra warbler

Now create a Sinatra app. OK, it could be any Ruby Rack-based application but I like Sinatra. First we need a rackup file.

# config.ru
require 'init'

set :run, false
set :environment, :production

run Sinatra::Application

Now for our application itself.

# init.rb
require 'rubygems'
require 'sinatra'
get '/' do
  "Hello World!"
end

Warbler is the gem we’re going to use to create a WAR file, which is basically an all-in-one bundle of our application and its dependencies that we can deploy to a Java application server.

# config/warble.rb
Warbler::Config.new do |config|
  config.dirs = %w(config)
  config.includes = FileList["init.rb"]
  config.gems += ["sinatra"]
  config.gems -= ["rails"]
  config.gem_dependencies = true
end

Now we’re ready to generate our WAR file.

warble

This should create a file called sample.war or similar. Then just deploy that to your application server and away you go. I got this working very easily with Glassfish which seemed to be the recommended tool for such things. Installing Glassfish was time consuming but well documented here. Uploading to Glassfish was done via the web interface for the moment. I just selected a Ruby project from the deployment drop down and uploaded the war file.
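
If you’d rather avoid the web interface, Glassfish also ships with a command line admin tool that can deploy the same war. A sketch, assuming the default domain is running and the war is in the current directory:

# deploy the generated war from the command line
asadmin deploy sample.war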

Devops Weekly

I’ve really been enjoying Ruby Weekly recently; it’s an email newsletter by Peter Cooper which brings the latest Ruby related news and articles to your inbox.

I have to admit to being sceptical at first about the format; I think I unsubscribed from most email newsletters many years ago, moving my reading to RSS and then Twitter. But I’ve actually found a regularly appearing email a great way to catch up with the goings on. I think the reasons for that change of heart are:

  • There is simply more being written now than even a few years ago, and I’m interested in more topics.
  • For a short time Hacker News solved this problem for me, but now it’s too high volume for me to subscribe and read everything.
  • I now manage my email inbox pretty well, I’m not always at inbox zero but I can’t remember the last time it was over 20.
  • RSS is great, but it’s always a limited view unless you follow 1000s of feeds.
  • Twitter is great but unless you read every tweet you’ll miss some important bit of news.

I think Ruby Weekly works because it’s collated by someone with taste. Peter spends a bit of time putting together a small number of high quality links so I don’t have to. With Devops Weekly I’m hoping to do the same for a different niche.

So head over to devopsweekly.com to sign up. I’m not 100% sure yet when the first issue will go out but expect it in the next few weeks, and after that I’ll try and get one issue out each week. Any ideas are more than welcome, as are any news or articles that you think should go in.

Books For People Interested In Devops

Before starting with FreeAgent I decided I should spend a bit more time with Ruby and set about building something I’d been thinking about for a while. I’ve just launched the first of my related pet projects, so I thought I’d better link to it from here.

Devops Books is exactly what it sounds like: a list of books that people interested in the whole devops concept should read. It’s not a complete list just yet and I’ll try and keep it up to date as new interesting books get released. Any suggestions to add to the list are most welcome.

It gave me a proper excuse to use Heroku, which has been very pleasant. Under the hood it’s a very simple Sinatra application using the Mustache template language. The majority of the code is a build script that pulls down information from the Amazon API and mixes it with my own content. It then creates a JSON document that is used as the basis for generating the pages. I love static generators, and hand rolling your own for a particular domain is often worthwhile. It means I have the JSON already lying around if I want to make a JSONP style badge, for instance.

I plan on using the code as a bit of learning tool. Try out some different testing approaches, maybe add in GEOIP detection, make some of the commands I’m running into Rake tasks, that sort of thing. Given how easy it will be to throw up little sites like this using the generator I have a few other similar things in mind too.

Why You Should Be Using Virtualisation

My main development machine for a while has been an Apple laptop. From looking around at conferences, offices and user groups I know I’m not alone. But I don’t really run code on my Mac very often, certainly not for work. I might edit the code on my Mac but I execute it in a virtualised Linux environment matching (as closely as possible) the production environment it’s going to end up in. This blog post is an attempt to explain why this is a good idea to the majority of people who develop on a Mac (or a Windows machine) and deploy to something else. This isn’t language specific either. You might be working on small PHP web sites or huge Python applications; you’ll still one day run into the same issues.

Why is virtualisation a good idea?

Bugs happen, but catching them early, way before they even hit your shared source code repository, makes them much less of an issue. Catching bugs only after a live release, when they affect customers and cost someone money, is bad. And if your release is a long period of time after the work was done then fixing them is harder to boot. These are just some of the reasons we’re all fond of unit testing and continuous integration.

But if you’re running those tests against code executing on different hardware, on a different operating system, with different low level libraries or a different web server version or a different database server then you are not going to catch all the problems. If you take this to an extreme then you can only get rid of all of these problems by giving each developer a full production stack of their very own. This is obviously impossibly expensive for anything past the most trivial setup. But eliminating even some of these issues makes it more likely you’ll catch bugs early and less likely you’ll have a bug on your hands that you can’t recreate locally. So we’ll aim for production like rather than a 100% copy.

Here’s a real example: a case insensitive file system. Grab a terminal prompt on your Mac and type the following in an empty directory. Then do the same on a typical Linux machine. All we’re doing is using touch to create a couple of files.

touch Test
touch test
ls

On your Mac you’ll probably see:

Test

On your Linux box you’re more likely to get:

Test
test

What? The Mac treated Test and test as meaning the same thing. It won’t let you have a file called test and one called Test in the same place; it’s case insensitive. But the Linux machine didn’t have this problem. Now imagine we’re not dealing with empty files called test but either files your running code is creating at run time (a file cache maybe, or a user uploaded file) or, even more interestingly, your source code files. Let’s say you have git on a Linux box in the corner of the office and someone checks in two files from a Linux machine called Pages_controller.rb and pages_controller.rb. What happens when you get these to your Mac? I haven’t actually tried this but it’s not going to end well. And imagine debugging this sort of issue. If you think this is all hypothetical, I know about this little quirk exactly because I saw someone trying to fix a bug related to it.
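
If you want to see for yourself, setting up the scenario on a Linux box only takes a few commands (I’m making no promises about what your Mac does with the resulting clone):

# create a repository containing two files that differ only by case
git init case-test && cd case-test
touch Pages_controller.rb pages_controller.rb
git add . && git commit -m "two files differing only by case"
# now clone this on a case insensitive machine and watch what checkout does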

What if the bug was because you had one version of libxml on your local development machine and a different one on production? Up to that point you might not even know what libxml was or how it got on your shiny Apple laptop.
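
If your development environment is a Linux VM built from system packages, like the production box, then answering that question is a one-liner on each machine (this is where the dpkg tricks from earlier in this post earn their keep):

dpkg -l | grep libxml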

How many people can genuinely say they have never had a bug they could recreate on live and not on their development machine? Same code, different behaviour. Load and data often play a part in bugs like this as well but these can be isolated and tests created in at least some cases. Being pragmatic what we’re aiming for isn’t to eliminate all differences, it’s to get rid of those that are easy to eliminate.

How can I do this?

Virtualisation tools used to be cumbersome and expensive and generally not aimed at consumers. I’ve used both VMware Fusion and VirtualBox on my Mac and, even compared to a few years ago, these tools are increasingly easy to use. And VirtualBox is free and open source too. On top of that we now have tools like vagrant, which I’ll give a quick example of here.

Vagrant for those that haven’t come across it yet describes itself thus:

Vagrant uses Oracle’s VirtualBox to build configurable, lightweight, and portable virtual machines dynamically

What it really is is a tool for quickly and painlessly building virtual machines based on sensible default configurations, and then providing programmatic hooks for more advanced configuration. For instance you’ll have a configuration file to describe which ports you want forwarded, and you can use Chef to install packages when the VM first boots. Once you have it installed it’s as easy as this to use:

vagrant box add lucid32 http://files.vagrantup.com/lucid32.box
vagrant init lucid32
vagrant up

The first line downloads a 32bit Ubuntu disk image but you’ll only need to do that once. You’ll find lots of images for different distros too. The next two lines create and then boot a new headless virtual machine. That’s it.

vagrant ssh

Will let you jump straight into an SSH session with the new machine. For an idea of what else it can do, here’s the help output:

Tasks:
  vagrant box                        # Commands to manage system boxes
  vagrant destroy                    # Destroy the environment, deleting the create...
  vagrant halt                       # Halt the running VMs in the environment
  vagrant help [TASK]                # Describe available tasks or one specific task
  vagrant init [box_name] [box_url]  # Initializes the current folder for Vagrant u...
  vagrant package                    # Package a Vagrant environment for distribution
  vagrant provision                  # Rerun the provisioning scripts on a running VM
  vagrant reload                     # Reload the environment, halting it then rest...
  vagrant resume                     # Resume a suspended Vagrant environment.
  vagrant ssh                        # SSH into the currently running Vagrant envir...
  vagrant ssh_config                 # outputs .ssh/config valid syntax for connect...
  vagrant status                     # Shows the status of the current Vagrant envi...
  vagrant suspend                    # Suspend a running Vagrant environment.
  vagrant up                         # Creates the Vagrant environment
  vagrant version                    # Prints the Vagrant version information

I’ll leave it there as this post is more of a rant than a tutorial, but I might write more about using vagrant later. In the meantime read the web site for a pretty simple walkthrough. And don’t be put off by the fact it’s written in Ruby or that the example shows a Rails app; this is a great tool whatever language you’re going to be using on the virtual machine.

Arguments against

I see too few developers doing this for it to just be about a lack of awareness. Lots of developers who aren’t doing this might already be running local virtual machines for cross browser testing, for instance. Here are a few complaints I’ve heard and what I think the answers are.

Speed

If something is slow and you don’t have as much RAM as you can get into your machine then sort that out before complaining about anything. Running a few extra operating systems inside your main operating system is obviously going to be intensive, so don’t scrimp on your tools. Also, the defaults when setting up new virtual machines in VirtualBox or VMware Fusion at least are quite minimal. Try increasing the amount of RAM you let them use or giving them access to more processors. I can genuinely say I’ve had a problem with this once, and the real solution was changing the code, not throwing away all the advantages of virtualisation. If you’re doing some crazy real time video processing thing then your mileage will vary, but then you probably want a faster machine anyway.
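
With VirtualBox at least, giving an existing machine more resources is a one-liner while it’s powered off (the VM name here is just an example):

# bump the VM to 2GB of RAM and two virtual CPUs
VBoxManage modifyvm "dev-vm" --memory 2048 --cpus 2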

Lower level than you’re used to

As a PHP/Ruby/Python developer why should I have to care about Apache? I just write code!

This argument just bugs me, but I do know part of that is me. I need to know how all the bits work and fit together, and I accept that not everyone does, or indeed needs to understand everything. But someone on your team probably wants to know this stuff and, importantly, be able to tell others how they should do things. It’s pretty common for developers to set up a development environment for a pure frontend developer or a designer so they can make changes and commit CSS or new templates. This is no different. Most designers don’t need to know about the software environment in detail; it’s easier for them to defer to a domain expert. If a developer just wants to write code then they too should defer to someone who does know about the lower level when it comes to their development environment.

Something else to setup

This argument has some merit. We’re all busy, and downing tools to set up something for you and your team is time consuming. And I think until pretty recently the time taken and the knowledge needed was a genuine barrier. Personally I’ve tended to have few problems, but then I’m familiar enough with Linux administration to avoid some common pitfalls. Problems with setting up port forwarding or shared folders can be pretty irritating when you want to work on a pressing bug or shiny new feature. But with tools like vagrant providing a simple interface to do this, I think this is hopefully a thing of the past.

Developers workstations should be personal

I agree up to a point here. Discussions of standardising individual developer tools turn into holy flame wars over whether everyone should use some IDE, Vim or Emacs (answer: Vim). This is pointless. File managers, utilities, text editors, terminal styles, host operating system. All of these and more should be up to the individual developers. But in the same way you generally don’t allow individual developers to use a new language no one knows without at least some discussion, why would this be different for the web server or operating system on which you’ll be running that code in production? Most of the time it’s not that developers make a conscious decision to use a different version either. It’s more likely that they will take the path of least resistance and follow a tutorial or just use a package manager. And it’s likely that if you ask the question “what specific version of Apache are you using on your development machine” they won’t know the answer.

Conclusions

I’ve not even gone into some of the other advantages of virtualisation here. Being able to snapshot your environment at any point and roll back an entire virtual machine like you do your code is hugely handy. As is the ability to create virtual machines that you can share with other members of your team. No more do new employees have to spend the first week installing dependencies and just getting code running.

I’m certainly not the only person doing this and it’s not a new idea. But it’s never been easier or cheaper. And with an increasing move towards virtualisation or cloud computing production environments it’s even easier to share good practices with your friendly systems administrators.

I’ve re-enabled comments on this blog after something of a break and I’d love to hear what people think, positive and negative.