Logging Django Performance

I’ve been doing some basic performance profiling work with Ruby on Rails recently, and one tool I found very useful was request-log-analyzer. It’s a relatively simple command line application that you can point at your Rails application log files, and it outputs lots of information in aggregate: request duration averages, details of the SQL queries run, and so on. When working on a recent Django project I wanted a tool to do the same thing, and ended up writing Timelog.

I did a bit of research to see if I could find something similar. Here are a few other interesting tools that didn’t quite do what I wanted:

  • Django Slow Log - logs things like memory usage and load averages
  • Django Dump Slow - similar, but designed to log only slow requests rather than everything; it also needs a Redis backend
  • Zamboni Middleware - very similar to what I wanted, but it’s not a separate module and I didn’t find anything to analyse the results

Timelog is very similar to the middleware in Zamboni; the only real difference is that I’m using the new logging support in Django 1.3. More interesting is the bundled management command, which will output something like:

+---------------------------+--------+--------+-------+---------+---------+-------+-----------------+
| view                      | method | status | count | minimum | maximum | mean  | stdev           |
+---------------------------+--------+--------+-------+---------+---------+-------+-----------------+
| boxes.views.BoxDetailView | GET    | 200    | 9430  | 0.14    | 0.28    | 0.21  | 0.0700037118541 |
+---------------------------+--------+--------+-------+---------+---------+-------+-----------------+
| boxes.views.BoxListView   | GET    | 200    | 66010 | 0.17    | 0.28    | 0.232 | 0.0455415351076 |
+---------------------------+--------+--------+-------+---------+---------+-------+-----------------+
| django.views.static.serve | GET    | 200    | 61295 | 0.00    | 0.02    | 0.007 | 0.0060574669888 |
+---------------------------+--------+--------+-------+---------+---------+-------+-----------------+
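The middleware side of this is small. Here’s a rough sketch of a request-timing middleware in the same spirit; this is my own illustration of the approach, not the actual timelog source:

```python
import logging
import time

logger = logging.getLogger(__name__)


class TimingMiddleware(object):
    """Sketch of a request-timing middleware (not the real timelog code).

    Records a start time on the way in, then logs method, path,
    status code and duration on the way out.
    """

    def process_request(self, request):
        request._start_time = time.time()

    def process_response(self, request, response):
        if hasattr(request, '_start_time'):
            duration = time.time() - request._start_time
            logger.debug('%s %s %s %.3f', request.method, request.path,
                         response.status_code, duration)
        return response
```

The logged line is then trivially parseable later, which is all the analysis command needs.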

At the moment I’ve not done any real benchmarking or optimisation of the script, but it will chew through a 300,000 line (20MB) log file in under 20 seconds on my ageing MacBook.
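Computing the columns in the table is straightforward once the durations are grouped by view. Here’s a minimal sketch of the aggregation step, my own illustration rather than the actual analyze_timelog code, using the population standard deviation:

```python
import math
from collections import defaultdict


def aggregate(log_entries):
    """Group (view, duration) pairs and compute per-view statistics.

    A sketch of the aggregation only, not the real analyze_timelog
    implementation; stdev here is the population standard deviation.
    """
    by_view = defaultdict(list)
    for view, duration in log_entries:
        by_view[view].append(duration)

    stats = {}
    for view, times in by_view.items():
        count = len(times)
        mean = sum(times) / count
        variance = sum((t - mean) ** 2 for t in times) / count
        stats[view] = {
            'count': count,
            'minimum': min(times),
            'maximum': max(times),
            'mean': mean,
            'stdev': math.sqrt(variance),
        }
    return stats
```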

The code for Timelog is on GitHub at github.com/garethr/django-timelog and I’ve uploaded it to PyPI as well. You can install it with the usual tools like pip:

pip install django-timelog

Once installed you need to do a little configuration to get things working. First add the middleware to MIDDLEWARE_CLASSES in your settings file:

MIDDLEWARE_CLASSES = (
  'timelog.middleware.TimeLogMiddleware',
)

Next add timelog to your INSTALLED_APPS list. This is purely for management command discovery.

INSTALLED_APPS = (
  'timelog',
)

Then configure the logger you want to use. This really depends on what you want to do; the Django 1.3 logging setup is pretty powerful. Here’s how I’ve got logging set up, as an example:

TIMELOG_LOG = '/path/to/logs/timelog.log'

LOGGING = {
  'version': 1,
  'formatters': {
    'plain': {
      'format': '%(asctime)s %(message)s',
    },
  },
  'handlers': {
    'timelog': {
      'level': 'DEBUG',
      'class': 'logging.handlers.RotatingFileHandler',
      'filename': TIMELOG_LOG,
      'maxBytes': 1024 * 1024 * 5,  # 5 MB
      'backupCount': 5,
      'formatter': 'plain',
    },
  },
  'loggers': {
    'timelog.middleware': {
      'handlers': ['timelog'],
      'level': 'DEBUG',
      'propagate': False,
     }
  }
}

In order for the analyser script to work correctly you’ll need to use the plain formatter and register a handler for the timelog.middleware logger.

With that all configured, try hitting your application a few times. You should see a log file created at the location specified in TIMELOG_LOG. Fill it up with a few lines and then run the analyze_timelog management command:

python manage.py analyze_timelog

This should output something similar to the table above, but with your own timings and views. The management command currently allows you to point the analyzer at a different file, say one from a backup, or a file you’ve pulled down from production but want to analyze locally. I’ll likely add some filters as time permits, for things like excluding certain views or only showing requests within a given date range.

The table above, showing the view function, is good for big-picture trends, but if you want to dig down into a particular path you can pass an argument so the paths aren’t reversed to view names:

python manage.py analyze_timelog --noreverse

This should give you something more like:

+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+
| view                             | method | status | count | minimum | maximum | mean  | stdev            |
+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+
| /assets/css/main.css             | GET    | 200    | 61295 | 0.00    | 0.02    | 0.007 | 0.0060574669888  |
+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+
| /bob                             | GET    | 404    | 4715  | 0.01    | 0.01    | 0.01  | 0.0              |
+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+
| /                                | GET    | 200    | 66010 | 0.17    | 0.28    | 0.232 | 0.0455415351076  |
+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+
| /__debug__/m/css/toolbar.min.css | GET    | 304    | 4715  | 0.00    | 0.00    | 0.0   | 0.0              |
+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+
| /14/                             | GET    | 200    | 9430  | 0.14    | 0.28    | 0.21  | 0.0700037118541  |
+----------------------------------+--------+--------+-------+---------+---------+-------+------------------+

Debugging HTTP Headers with RedBot

I’ve been using the Vagrantbox.es site as a bit of a playground recently, and I’ve been meaning to blog about some of the overengineering I’ve been doing. Here’s a small starter.

Getting the headers returned by your web server correct is both easy to do and easy to forget about. Unless you go actively looking for headers with curl or similar you’ll probably miss them, and even then will you spot an incorrect header by eye? RedBot is an excellent online tool that not only shows you the headers but makes recommendations about what might be missing, invalid or things you haven’t considered.

For instance the RedBot results for vagrantbox.es look like this:

Debugging information from Redbot

Or at least they do now, after a few tweaks to my nginx configuration. In particular I’ve added:

add_header Vary Accept-Encoding;

I’d assumed that enabling gzip compression with gzip_vary would have set this automatically, and never checked. The way RedBot provides a checklist of recommendations is very handy.
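The kind of rule RedBot flagged for me is easy to express in code. Here’s a toy illustration of that one check (my own sketch, not RedBot’s implementation): a gzip-compressed response should declare Vary: Accept-Encoding so caches don’t serve compressed content to clients that can’t handle it.

```python
def check_vary_header(headers):
    """Flag a gzip-compressed response missing Vary: Accept-Encoding.

    A toy illustration of one check RedBot performs; `headers` is a
    plain dict of response header names to values.
    """
    encoding = headers.get('Content-Encoding', '').lower()
    vary = [v.strip().lower() for v in headers.get('Vary', '').split(',')]
    if encoding == 'gzip' and 'accept-encoding' not in vary:
        return 'missing Vary: Accept-Encoding'
    return 'ok'
```

RedBot runs dozens of checks like this, which is exactly why it beats eyeballing curl output.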

As well as checking the page itself you can also check all the assets associated with a page by adding a query string argument. For instance here is the assets view for Vagrantbox.es. RedBot also provides a JSON encoded version of the result, which might be useful in a CI system. If you’d rather run your own instance of the software you can; the code can be found on GitHub at mnot.github.com/redbot/. It currently doesn’t work with HTTPS resources, but that’s about the only thing I’d like to see added.

Python On Cloudfoundry

For those that haven’t yet had a look, Cloudfoundry from VMware is two things, one of which is nice and one of which is very cool indeed:

  • On one hand it’s a platform as a service, allowing you to easily deploy Ruby, Java and Node.js applications to cloudfoundry.com.
  • On the other hand it’s an open source project with all the code on Github allowing you to run the entire stack wherever you like.

I’m pretty interested in the latter. Its API could in theory become a de facto standard for application and service buildouts, in the same way as we’re seeing the EC2 API spread outside AWS for managing infrastructure (and arguably how we’re using Chef and Puppet for managing the things installed on that infrastructure). The really interesting bit is the fact it’s all open source. Not only does that mean you can run it on your own infrastructure (including, as I’m doing, on a virtual machine on my laptop), but it’s also designed so that new services (like Redis, MySQL, RabbitMQ), new runtimes (like Ruby 1.8, Ruby 1.9, Java) and new frameworks (like Rails, Sinatra, Spring) can be added easily.

I’m running vcap on a vagrant managed VirtualBox instance, but you could install it anywhere you like. I used these chef recipes to get everything installed. I ran into an issue with the mysql service not starting correctly that I fixed and I needed to manually install chef into the rvm gemset part way through the install, but the recipes pretty much just worked.

Looking for an excuse to have a route through the vcap code I decided to see if I could add rudimentary support for Python applications. After a day of noodling around I’ve forked the code and sent a few pull requests back with it basically working. This required changes to the vmc client, to the vcap service and like all good open source contributions to the test suite.

Thanks hugely to existing pull requests (mainly the ones for adding PHP support) it’s not taken long at all. The Sinatra and Rails support which shipped with the first release from VMware supports using Bundler to define gem dependencies that can be pulled in at deploy time. I’m hoping it shouldn’t be too much effort to do the same with pip and virtualenv for each deployed Python application. I think I can also see how to support Python 3 and how to add Django as a supported framework.

I had huge fun, but you might not at this early stage in the project. I’m relatively happy with reading and writing Ruby, futzing with rvm, debugging installation woes and hunting down service configuration problems. The best tool for working out what was happening was generally tailing the logs in /tmp/vcap-run/ and finding the code that wrote a given message. If you just want something to work I’d wait a little while, if you like the sound of the above it’s a pretty nice codebase to play around in. I’d love to eventually see some official contributor documentation and some hints and tips on things like running the tests. At the moment flicking through reported issues and pull requests on GitHub is the best place to start.

Vagrant Plugin For Interacting With Vagrantbox.es

After Patrick released Sahara, a nifty extension for the Vagrant command line tool, I’ve been meaning to put together a similar extension for interacting with the growing list of base boxes on vagrantbox.es.

After a bit of hacking this morning I’ve just pushed out an initial release of the vagrantboxes gem and you can find the source code and some documentation on GitHub.

The extension adds a vagrantboxes namespace to the vagrant command line tool which provides a few useful commands. Specifically, you can list all the available boxes, do text searches, show the full details of a box and, most handily of all, add a box that takes your fancy to your local base box store, all without having to worry about the URLs of the boxes if you don’t want to.

Here’s an example of a simple search:

>> vagrant vagrantboxes search centos
3    centos 5.5                http://dl.dropbox.com/u/15307300/vagrant-0.7-centos-64-base.box
6    opscode centos 5          http://opscode-vagrant-boxes.s3.amazonaws.com/centos5-gems.box
7    opscode ubuntu 10.04      http://opscode-vagrant-boxes.s3.amazonaws.com/ubuntu10.04-gems.box
9    puppet centos 5.5 64      http://puppetlabs.s3.amazonaws.com/pub/centos5_64.box
10   puppet centos 4 64        http://puppetlabs.s3.amazonaws.com/pub/centos4_64.box
21   centos 5.6 32             http://yum.mnxsolutions.com/vagrant/centos_56_32.box

And another quick example, this time of show printing the full description. In the future I might look to store more structured metadata and make this more useful.

>> vagrant vagrantboxes show 18
puppet debian lenny 64
http://puppetlabs.s3.amazonaws.com/pub/debian_lenny_64.box

Debian Lucid 64 bit vagrant box. Puppet 2.6.6 installed and ready to provision using the Puppet provisioner in Vagrant.

For good sample modules, try the jeffmccune-motd and jeffmccune-mockbuild modules posted to http://forge.puppetlabs.com/

Created by Ken Barber, [email protected]

This proved a good excuse to delve into the Vagrant source code, which is pretty readable for the most part once you understand the libraries it’s built upon. It’s simple enough to extend with commands like this too, which bodes well for other more useful additions in the future.

If anyone has any problems with this extension do let me know. Error handling currently consists of returning blank output rather than sensible error codes or messages, and as I’ve yet to add a test suite there might (will) be a few bugs lying around. I’ll try and make it better behaved in the next week or two, but reasoned it’s useful straight away.

Version Control And Deployment Of Cron Jobs

A recent question on Twitter prompted me to write a quick blog post about managing cron jobs. As more and more people want to automate provisioning and deployment of web applications, some items that were maybe previously managed manually come into the fold.

Cron jobs are interesting because you may prefer to see them as part of the infrastructure (like Apache or MySQL) or as part of your application code. I think both are valid, sometimes at the same time. For instance, you might have a cron job which deals with scheduled database backups; all that requires is the database and the backup script to be present. At other times your cron jobs might require your entire application stack, for instance a rake task which uses a Rails application model, or a Django management command.

Configuration Management and Cron

Both Chef and Puppet provide a cron resource type. This is particularly handy for things like database backup scripts or ganglia gmetric scripts. You probably want these scripts and cron jobs to be installed on machines that have the related software installed, and you’re probably already describing this in your Chef recipes or Puppet manifests. If you’re not already using one of these tools using them to manage just your cron jobs is a nice way of starting out.

Using the Puppet Cron Type looks like this:

<code>cron { 'command':
  command => "/usr/local/sbin/command",
  hour    => 2,
  minute  => 0,
}</code>

And the equivalent Chef resource looks like:

<code>cron "describe your job" do
  hour "2"
  minute "0"
  command "/usr/local/sbin/command"
end</code>

The important part is that by describing your cron jobs in code you can then version control them easily, and both Chef and Puppet have mechanisms to push these jobs out to be installed by the relevant hosts. With cron jobs you might not want all your machines to be running the same jobs for instance.

Using Whenever

An alternative, or complementary, approach to versioning and deploying cron jobs is to tie them in with your application code. This makes sense when those jobs are tightly coupled to your application, for instance Rails-specific rake tasks or Django management commands. Whenever is a tool I’ve been using recently that makes this pretty simple. You describe your cron jobs in a file called schedule.rb like so:

<code>every 3.hours do
  command "/usr/local/sbin/command"
end</code>

And then running the provided whenever command line application will generate a working crontab. Whenever ships with Capistrano integration and some useful shortcuts for running rake tasks or accessing Ruby code, but it’s simple to write your own command shortcuts without having to write Ruby code too.

Other Approaches

I have seen some tools which replace cron completely, but I’ve never liked that idea much. Cron works pretty well, and is clever enough to deal with things like daylight saving time and leap years intelligently if needed. I know other folks who are centralising all regular jobs into something like Jenkins. I can see advantages to that, although I’ve yet to find a really nice way of automating it outside the GUI or manually modifying configuration files.

Creating A Cucumber Nagios Package With Fpm

I’ve written before about why I like system packages, but even I’ll admit that the barriers to creating them mean I don’t use them for everything. FPM, however, is making it much easier, to the point where I’m starting to create a few packages I can reuse on projects. I thought a write-up of how I’m doing that for cucumber-nagios might be useful.

For those that haven’t seen it yet, FPM (or Effing Package Management) is a tool that helps you build packages, like RPMs and DEBs, quickly. It can take in gems, Python packages, Node.js npm files or just plain directories and files, and from those create one or more system packages. Let’s have a look at a full example with a Ruby gem. I really like using cucumber-nagios, whatever platform or language I happen to be using at the time. I have a number of Django projects with cucumber-nagios checks, for instance, so being able to avoid worrying about most of the Ruby install is useful.

In order to use FPM you’ll need to install it. It’s available as a Ruby gem, so let’s start there. I’m not going to delve into setting up a RubyGems environment as it’s annoying and covered for most platforms elsewhere on the internet.

<code>gem install fpm</code>

First off, let’s install the cucumber-nagios gem, along with all its dependencies, into a local folder on my build machine. This might be a virtual machine or a separate machine in your cluster; it should be running the same OS as the intended production machines, however. The following examples are from Ubuntu, but it’s much the same for RPM-based distros.

<code>gem install --no-ri --no-rdoc --install-dir ~/tmp/gems cucumber-nagios</code>

Cucumber-nagios has a large number of dependencies, so we’re going to need to create packages for all of those too. FPM will cleverly deal with maintaining the specified dependencies though. We’ll use find to do this quickly, and then instruct FPM to convert each gem to a deb. You could obviously do this line by line if you prefer.

<code>find ~/tmp/gems/cache -name '*.gem' | xargs -rn1 fpm -s gem -t deb -a all</code>

That should leave us with lots of new .deb files that we can have a closer look at:

<code>dpkg --info rubygem-cucumber-nagios_0.9.0_i686.deb
dpkg -c rubygem-cucumber-nagios_0.9.0_i686.deb</code>

If you already have a private apt repository set up then just upload these packages and away you go. I’d like to use apt for installing them so it can manage all the dependencies for me. If not, I’ll show you briefly how to create a local file system repo; it’s not much more work to create a shared repo available over HTTP.

First create a directory to store our packages and move our newly created .deb files into it. You’ll need to be able to execute some of these commands as root but given the topic I’m assuming you’ll be able to deal with that.

<code>mkdir /usr/local/repo
cp *.deb /usr/local/repo</code>

For the next part you’ll need to install the dpkg development tools, and then create a file that can be read by apt when it’s looking for packages it can install.

<code>apt-get install dpkg-dev
dpkg-scanpackages /usr/local/repo /dev/null | gzip -9c > /usr/local/repo/Packages.gz</code>

Now add your new package repo to your apt sources and update your package cache.

<code>echo "deb file:/usr/local/repo ./" >> /etc/apt/sources.list
apt-get update</code>

At this point everything should be up and running. Let’s do a search in our repo:

<code>apt-cache search rubygem-
rubygem-amqp - AMQP client implementation in Ruby/EventMachine
rubygem-builder - Builders for MarkUp.
rubygem-bundler - The best way to manage your application's dependencies
rubygem-cucumber - cucumber-0.10.2
rubygem-cucumber-nagios - Systems testing plugin for Nagios using Cucumber.
rubygem-diff-lcs - Provides a list of changes that represent the difference between two sequenced collections.
rubygem-eventmachine - Ruby/EventMachine library
rubygem-extlib - Support library for DataMapper and Merb
rubygem-gherkin - gherkin-2.3.6
rubygem-highline - HighLine is a high-level command-line IO library.
rubygem-json - JSON Implementation for Ruby
rubygem-mechanize - The Mechanize library is used for automating interaction with websites
rubygem-net-ssh - Net::SSH: a pure-Ruby implementation of the SSH2 client protocol.
rubygem-nokogiri - Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser
rubygem-rack - a modular Ruby webserver interface
rubygem-rack-test - Simple testing API built on Rack
rubygem-rspec - rspec-2.5.0
rubygem-rspec-core - rspec-core-2.5.1
rubygem-rspec-expectations - rspec-expectations-2.5.0
rubygem-rspec-mocks - rspec-mocks-2.5.0
rubygem-templater - Templater has the ability to both copy files from A to B and also to render templates using ERB
rubygem-term-ansicolor - Ruby library that colors strings using ANSI escape sequences
rubygem-webrat - Ruby Acceptance Testing for Web applications</code>

And finally let’s install cucumber-nagios from our shiny new package.

<code>apt-get install rubygem-cucumber-nagios</code>

If everything has worked out you should be able to run the cucumber-nagios-gen command to create a new project. Note that the path may be different; in the case of Debian-based distros the gem bin directory is not on the PATH.

<code>/usr/lib/ruby/gems/1.8/bin/cucumber-nagios-gen project test
Generating with project generator:
     [ADDED]  .gitignore
     [ADDED]  .bzrignore
     [ADDED]  Gemfile
     [ADDED]  README
     [ADDED]  features/steps
     [ADDED]  features/support</code>

Devops Weekly Archive

Since I launched it back in November my Devops Weekly email has been pretty well received I think. Folks on twitter seem to like it, as do a few people I’ve met at recent events who have said nice things. One thing a few people have been after though is an online archive, either because email just doesn’t work for them or more often because they missed out on the earlier issues.

With a little time this weekend, and the help of Jekyll, I’ve just added the seventeen issues to date to the new Devops Weekly archive. I’d have liked to do something a bit more useful, maybe introduce per-link tags or a nifty search feature, but I’m pretty busy at present and reason this should serve the main purpose of getting the links online.

Collecting Metrics With Ganglia And Friends

I had the pleasure of speaking at Cambridge Geek Night on Monday again, the topic of conversation being using Ganglia to collect more than just base system metrics.

The audience of web developers, the odd sysadmin and business folk seemed to enjoy it and we had lots of time for questions at the end. The main point I tried to get across was that Ganglia makes a great platform for ad-hoc metrics gathering due to the ability to just throw values at it and get time series graphs without any extra configuration.

The slides include a few bits of code I’ve been using that I’ll throw onto GitHub as a proper project when I get the time. They are very simple Python servers, one of which accepts metrics information over HTTP while the other uses TCP. Both are really handy for getting more people to add hooks into applications.
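The TCP variant only takes a few lines. Here’s a sketch along the same lines (my own illustration, not the code from the talk) that accepts "name value" lines and hands them off to the gmetric command line tool; the port number is an assumption:

```python
import socketserver  # SocketServer in Python 2
import subprocess


def parse_metric(line):
    """Split a "name value" line into a (name, float) pair."""
    name, value = line.split()
    return name, float(value)


class MetricHandler(socketserver.StreamRequestHandler):
    """Forward each received metric line to Ganglia via gmetric."""

    def handle(self):
        for raw in self.rfile:
            name, value = parse_metric(raw.decode())
            subprocess.call(['gmetric', '--name', name,
                             '--value', str(value), '--type', 'float'])


# to run, listening on a port of your choosing (8649 here is arbitrary):
# socketserver.TCPServer(('0.0.0.0', 8649), MetricHandler).serve_forever()
```

The appeal is that anything that can open a socket and write a line can now emit a metric, no Ganglia libraries required on the sending side.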

Vagrant At The Guardian

As recent blog posts on here make clear, I’m a big fan of Vagrant. And when Michael asked if I’d fancy talking to some of his colleagues at The Guardian about how I use it I really couldn’t say no.

I gave a short talk, running through the following slides, and running a few demos showing creating, destroying and provisioning new machines.

More interesting, I thought, were the questions and conversations that followed. We talked a little about how Vagrant might fit into a continuous integration setup. Another aspect some of the systems folk took to was how flexible the network configuration is, and whether they might be able to use it to test firewall configurations well before the final push into a production environment. It’s not something I’ve been doing but it sounds feasible and useful in some organisations. If anyone is doing interesting things with Vagrant’s network config I’d be interested to know.

Devops Isn't A Methodology

I was reading Devops is a poorly executed scam and just couldn’t resist a reply. Not because of the entertaining title, but because I both agree and disagree quite strongly with parts of the post. Read it first if you haven’t already. And yes I know I’m feeding the internet.

I’m going to pick parts of the post out and then comment. Hopefully I’m not quoting these in any way out of context.

“It’s got the potential to make a handful of people a lot of money in the same way that Agile did, but nobody is really executing on.”

People are pretty aware of this fact I think, but watch what happens when people post on the mailing lists or turn up at community events with a purely marketing hat on. They just get no traction and even damage their product brand amongst the early adopters. The fact the term is starting to get used in job adverts and marketing materials isn’t really being driven by the people talking about what devops might or might not be. I think the main reason for this is that most of the people I talk to in person or online are actually pretty happy with their jobs and generally work inside companies rather than as independent consultants. They have often reached an age where they want to improve within a given field but would like a wider network than their current colleagues to discuss things with.

“How do you implement Devops?”

I don’t think you do. The comparisons with Agile are interesting from a community point of view, but Scrum is a methodology; to me at least devops isn’t, it’s just a banner or tag under which interesting conversations are happening. The argument that “You should be doing this anyway. Not earth shattering.” is a good thing. You’d be surprised by how many people don’t do all the things they should be doing, especially in small and young companies. And one of the reasons for that is no one bothered writing a list of these things down anywhere and then discussing them. I’m not saying this huge list exists or even whether it should, but the discussion is happening.

“The underlying problem, however, is that dev and ops have different goals”

This is spot on. I think this maybe does get missed in talk that focuses more on tools, but not in the wider discussion happening about business improvements. Devops quite literally brings those two things together. You’ll always have individual goals, but where you have separate operations and development teams they should share the same fundamental goals.

“Developers develop in the same environment production runs in If you deploy to Linux, you develop on Linux. No more of this coding on your Macbook Pro and deploying to Ubuntu: that is why you can’t have nice things.”

Yes, yes and yes again. I’m definitely from the developer side of the tracks, I’m constantly telling people this, and it’s definitely something I don’t see enough people doing. What I’d love is for all the operations people to state this to their development team and, most importantly, to help them set that up. Simply saying “work like me or I won’t let you near the production machines” is just being obstructive. Educating and helping with tooling builds those bridges and trust. And with trust comes the access the developers want. And fewer stupid bugs, and fewer deployment issues with differing package dependencies.

“None of this amounts to a methodology, as the Devops people would have you believe.”

I’m still unsure which Devops people are saying it’s a bona fide methodology. I see the word used sometimes, but generally in passing and not, I think, meant how you mean it here. And I don’t think I’ve heard people speak about it that way in person. “Scrum methodology” returns more than 113,000 results on Google. “Devops methodology” returns about 150, some of which 404 and half of which are aggregators pointing at the other half.

“The Devops movement smells of a scam in the making”

Some company or other is definitely going to be scammed into paying over the odds for a consultant because they used the word Devops in the sales pitch. That will have next to nothing to do with what I’d see as the Devops movement and everything to do with human nature (and sales people).