Your Own PyPi server

So one of the problems with using pip or easy_install as part of an automated deployment process is that they rely on an internet connection. More than that, they rely on PyPI being up, as it’s a centralised system, unlike the many apt package mirrors.

The best solution seems to be to host your own PyPI-compliant server. Not only can you load all the third party modules you use onto it, but you can also upload any internal applications or libraries you like. By running this on your local network you ensure you’re not dependent on PyPI or an internet connection.

At the moment I’m playing with Chishop, which is a Django application for maintaining a PyPI-compatible server. Another alternative, if that doesn’t work out, is EggBasket.

To install from your own PyPI server you can specify the location of your Chishop instance with the -i flag.

pre. easy_install -i http://localhost:8000/ PACKAGE_NAME

This will fall back to the PyPi server if it doesn’t find the relevant package. If you want to stop that behaviour and make sure you have a local package then you can limit the hosts with the -H flag like so.

pre. easy_install -H localhost:8000 -i http://localhost:8000/ PACKAGE_NAME

I’m not yet sure how to do this with pip; if someone wants to enlighten me in the comments then I’d be most grateful.
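One hedged suggestion: pip does take a similar-looking -i/--index-url flag, so something along these lines may work, though I haven’t tested whether the fallback behaviour matches easy_install:

pre. pip install -i http://localhost:8000/ PACKAGE_NAME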

Fabric, Django, Git, Apache, mod_wsgi, virtualenv and pip deployment

I’ve been playing with automating Django deployments again, this time using Fabric. I found a number of examples on the web but none of them quite fit the bill for me. I don’t like serving directly from a repository; I like to have either a package or a tar file I can use to say “that is what went to the server”. I also like having a quick rollback command, as well as being able to deploy a particular version of the code when the need arises. And I wanted to go from a clean Ubuntu install (plus SSH) to a running Django application in one command from the local development machine. The Apache side of things is nicely documented in this Gist, which made a good starting point.

I’m still missing a few things in this setup, mind, and at the moment you still have to set up your local machine yourself. I’ll probably create a paster template and another fabfile to do that. The instructions are a little rough at the moment too, and I’ve left the database out of it as everyone has their own preference.

This particular fabric file makes setting up and deploying a Django application much easier, but it does make a few assumptions: namely that you’re using Git, Apache and mod_wsgi, and that you’re on Debian or Ubuntu. You should also have Django installed on your local machine, and SSH installed on both the local machine and any servers you want to deploy to.

Note that I’ve used the name project_name throughout this example. Replace this with whatever your project is called.

First step is to create your project locally:

pre. mkdir project_name
cd project_name
django-admin.py startproject project_name

Now add a requirements file so pip knows to install Django. You’ll probably add other required modules to it later. Create a file called requirements.txt and save it at the top level with the following contents:

pre. Django

Then save this fabfile.py file in the top level directory which should give you:

pre. project_name
    fabfile.py
    requirements.txt
    project_name
        __init__.py
        manage.py
        settings.py
        urls.py

You’ll need a WSGI file called project_name.wsgi, where project_name is the name you gave to your Django project. It will probably look like the following, depending on your specific paths and the location of your settings module:

pre. import os
import sys

# put the Django project on sys.path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../")))

os.environ["DJANGO_SETTINGS_MODULE"] = "project_name.settings"

from django.core.handlers.wsgi import WSGIHandler
application = WSGIHandler()

Last but not least you’ll want a virtualhost file for Apache, which will look something like the following. Save this as project_name in the inner directory. You’ll want to change /path/to/project_name/ to the location on the remote server you intend to deploy to.

pre. <VirtualHost *:80>
    WSGIDaemonProcess project_name-production user=project_name group=project_name threads=10 python-path=/path/to/project_name/lib/python2.6/site-packages
    WSGIProcessGroup project_name-production
    WSGIScriptAlias / /path/to/project_name/releases/current/project_name/project_name.wsgi

    <Directory /path/to/project_name/releases/current/project_name>
        Order deny,allow
        Allow from all
    </Directory>

    ErrorLog /var/log/apache2/error.log
    LogLevel warn
    CustomLog /var/log/apache2/access.log combined
</VirtualHost>

Now create a file called .gitignore, containing the following. This prevents the compiled Python code from being included in the repository and in the archive we use for deployment.

pre. *.pyc

You should now be ready to initialise a git repository in the top level project_name directory.

pre. git init
git add .gitignore project_name
git commit -m "Initial commit"

All of that should leave you with:

pre. project_name
    .git
    .gitignore
    requirements.txt
    fabfile.py
    project_name
        __init__.py
        project_name
        project_name.wsgi
        manage.py
        settings.py
        urls.py

In reality you might prefer to keep your wsgi files and virtual host files elsewhere. The fabfile has a variable (config.virtualhost_path) for this case. You’ll also want to set the hosts that you intend to deploy to (config.hosts) as well as the user (config.user).

The first task we’re interested in is called setup. It installs all the required software on the remote machine, then deploys your code and restarts the webserver.

pre. fab local setup

After you’ve made a few changes and committed them to the master Git branch, you can run the following to deploy them.

pre. fab local deploy

If something is wrong then you can rollback to the previous version.

pre. fab local rollback

Note that this only allows you to roll back to the release immediately before the latest one. If you want to pick an arbitrary release then you can use the following, where 20090727170527 is the timestamp of an existing release.

pre. fab local deploy_version:20090727170527

If you want to ensure your tests run before you make a deployment then you can do the following.

pre. fab local test deploy

The actual fabfile looks like this. I’ve uploaded a Gist of it, along with the docs, so if you want to improve it please clone it.

pre. # globals

config.project_name = 'project_name'

# environments

def local():
    "Use the local virtual server"
    config.hosts = ['172.16.142.130']
    config.path = '/path/to/project_name'
    config.user = 'garethr'
    config.virtualhost_path = "/"

# tasks

def test():
    "Run the test suite and bail out if it fails"
    local("cd $(project_name); python manage.py test", fail="abort")

def setup():
    """
    Setup a fresh virtualenv as well as a few useful directories, then run
    a full deployment
    """
    require('hosts', provided_by=[local])
    require('path')
    sudo('aptitude install -y python-setuptools')
    sudo('easy_install pip')
    sudo('pip install virtualenv')
    sudo('aptitude install -y apache2')
    sudo('aptitude install -y libapache2-mod-wsgi')
    # we want rid of the default apache config
    sudo('cd /etc/apache2/sites-available/; a2dissite default;')
    run('mkdir -p $(path); cd $(path); virtualenv .;')
    run('cd $(path); mkdir releases; mkdir shared; mkdir packages;', fail='ignore')
    deploy()

def deploy():
    """
    Deploy the latest version of the site to the servers, install any
    required third party modules, install the virtual host and
    then restart the webserver
    """
    require('hosts', provided_by=[local])
    require('path')
    import time
    config.release = time.strftime('%Y%m%d%H%M%S')
    upload_tar_from_git()
    install_requirements()
    install_site()
    symlink_current_release()
    migrate()
    restart_webserver()

def deploy_version(version):
    "Specify a specific version to be made live"
    require('hosts', provided_by=[local])
    require('path')
    config.version = version
    run('cd $(path); rm releases/previous; mv releases/current releases/previous;')
    run('cd $(path); ln -s $(version) releases/current')
    restart_webserver()

def rollback():
    """
    Limited rollback capability. Simply loads the previously current
    version of the code. Rolling back again will swap between the two.
    """
    require('hosts', provided_by=[local])
    require('path')
    run('cd $(path); mv releases/current releases/_previous;')
    run('cd $(path); mv releases/previous releases/current;')
    run('cd $(path); mv releases/_previous releases/previous;')
    restart_webserver()

# helpers - these are called by other functions rather than directly

def upload_tar_from_git():
    "Create an archive from the current Git master branch and upload it"
    require('release', provided_by=[deploy, setup])
    local('git archive --format=tar master | gzip > $(release).tar.gz')
    run('mkdir $(path)/releases/$(release)')
    put('$(release).tar.gz', '$(path)/packages/')
    run('cd $(path)/releases/$(release) && tar zxf ../../packages/$(release).tar.gz')
    local('rm $(release).tar.gz')

def install_site():
    "Add the virtualhost file to apache"
    require('release', provided_by=[deploy, setup])
    sudo('cd $(path)/releases/$(release); cp $(project_name)$(virtualhost_path)$(project_name) /etc/apache2/sites-available/')
    sudo('cd /etc/apache2/sites-available/; a2ensite $(project_name)')

def install_requirements():
    "Install the required packages from the requirements file using pip"
    require('release', provided_by=[deploy, setup])
    run('cd $(path); pip install -E . -r ./releases/$(release)/requirements.txt')

def symlink_current_release():
    "Symlink our current release"
    require('release', provided_by=[deploy, setup])
    run('cd $(path); rm releases/previous; mv releases/current releases/previous;', fail='ignore')
    run('cd $(path); ln -s $(release) releases/current')

def migrate():
    "Update the database"
    require('project_name')
    run('cd $(path)/releases/current/$(project_name); ../../../bin/python manage.py syncdb --noinput')

def restart_webserver():
    "Restart the web server"
    sudo('/etc/init.d/apache2 restart')

What's new in Django 1.1

With the release candidate for Django 1.1 out the door I decided to have a quick look at what’s new. This isn’t a complete list, rather the bits I found most interesting.

Conditional Views

Django now has much better support for conditional view processing using the standard ETag and Last-Modified HTTP headers. This means you can now easily short-circuit view processing by testing less-expensive conditions. For many views this can lead to a serious improvement in speed and reduction in bandwidth.

A nice set of decorators for dealing with ETags and Last-Modified headers. Again very simple to use and set up, and a simple way of squeezing a little more performance out of your application.
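As a rough sketch of how the new condition decorator works (the Entry model and its published field here are invented for illustration), you compute a Last-Modified value cheaply and let Django short-circuit the rest of the view:

pre. from django.http import HttpResponse
from django.views.decorators.http import condition
from myapp.models import Entry  # hypothetical model with a 'published' datetime field

def latest_entry(request):
    # cheap query returning the datetime of the most recent entry
    return Entry.objects.latest('published').published

@condition(last_modified_func=latest_entry)
def front_page(request):
    # only reached if the client's cached copy is out of date
    return HttpResponse('...')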

Admin Actions

The basic workflow of Django’s admin is, in a nutshell, “select an object, then change it.” This works well for a majority of use cases. However, if you need to make the same change to many objects at once, this workflow can be quite tedious. In these cases, Django’s admin lets you write and register “actions” – simple functions that get called with a list of objects selected on the change list page.

Anything that makes the admin a little more powerful and a little more flexible is a good idea in my book. Admin actions allow you to run code over multiple objects at once: simply select them with a checkbox, then select an action to run. This is worth it for the delete action alone, but you can write your own actions simply enough as well (for instance approving a batch of comments, or archiving a set of articles).
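A minimal sketch of a custom action, assuming a hypothetical Article model with a status field:

pre. from django.contrib import admin
from myapp.models import Article  # hypothetical app and model

def make_published(modeladmin, request, queryset):
    # one bulk UPDATE across everything ticked on the change list
    queryset.update(status='published')
make_published.short_description = "Mark selected articles as published"

class ArticleAdmin(admin.ModelAdmin):
    actions = [make_published]

admin.site.register(Article, ArticleAdmin)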

Editable Admin List Items

You can now make fields editable on the admin list views via the new list_editable admin option. These fields will show up as form widgets on the list pages, and can be edited and saved in bulk.

Another time-saving admin addition, this time making some fields editable from the change list rather than the object view. For quick changes, especially to boolean fields, I think this again is a nice addition.
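Sticking with the hypothetical Article model from above, it’s a one-liner on the ModelAdmin (fields in list_editable also need to appear in list_display):

pre. class ArticleAdmin(admin.ModelAdmin):
    list_display = ('title', 'published')
    # 'published' becomes a widget, editable in bulk from the change list
    list_editable = ('published',)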

Unmanaged Models

You can now control whether or not Django creates database tables for a model using the managed model option. This defaults to True, meaning that Django will create the appropriate database tables in syncdb and remove them as part of the reset command. That is, Django manages the database table’s lifecycle. If you set this to False, however, no database table creation or deletion will be performed automatically for this model. This is useful if the model represents an existing table or a database view that has been created by some other means.

I particularly like this addition. One of the issues I had with Django was some of the built-in assumptions, in particular that you’d be using a SQL database backend. Using unmanaged models looks like a great approach to using an alternative database like CouchDB, Tokyo Tyrant or MongoDB, or to representing a webservice interface in your application.
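A sketch of what that looks like, assuming an existing table created by some other system:

pre. from django.db import models

class LegacyArticle(models.Model):
    title = models.CharField(max_length=100)

    class Meta:
        managed = False               # syncdb and reset leave this table alone
        db_table = 'legacy_article'   # hypothetical pre-existing table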

I’m sure I’ll have missed a few other interesting changes or additions. Anyone else have a favourite?

Asteroid - simple app for running scripts and recording the results

Asteroid is a simple web interface for running scripts and recording the results. It’s like a much simpler and more general purpose version of something like Cruise Control. You can get the code on Github.

Asteroid Dashboard

I built it to solve two main problems:

  • It’s sometimes useful to have a historical record of a script’s execution, in particular whether it passed or failed and what the output was. Just running a command line script probably doesn’t give you that. It’s also useful to have a more graphical interface for those members of the team who don’t use the command line.
  • When working in a team you often want to run scripts against shared infrastructure, for instance deploying a testing release or running a test suite. Seeing what is running at present helps with that.

So it should be useful for running deployments, running test suites, running backups, etc. It currently doesn’t have scheduling or similar built in, but as everything is triggered by hitting a URL it would be simple enough to use cron for something like that. It should also be useful whatever language you write your scripts in; rake, ant, shell scripts, etc. At the end of the day it just executes a command at the console.

Requirements

Asteroid uses the Django Python framework under the hood.

You’ll also need a database. The default in the shipped settings is to use sqlite but this should work with any database supported by Django.

You’ll also need a decent web browser. I’ve gone and used HTML5 as an experiment and with this being a developer tool I’m hoping to stick with it. It would be easy enough to convert the templates if this is a problem however.

The application has an optional message queue backend which can be enabled in the settings file. This is used to improve the responsiveness of the application, as well as to allow commands to be executed on a remote machine, rather than on the box Asteroid is running on.

Other AMQP-compliant message queues should work, but it’s currently only tested with RabbitMQ.

If you are intending to do any development on Asteroid, or just want to look more closely at the code, I’d recommend installing

Usage Instructions

You should be able to just download Asteroid and run it from wherever you put it, once you set up the database.

cd asteroid/configs/common
manage.py syncdb
manage.py runserver

This should bring the local web server up on port 8000 so visit http://localhost:8000 and see.

If you’re using the message queue backend you’ll need to run the listener script in order to get your commands executed. At the moment that means modifying a constant in the listener script (asteroid/bin/asteroid_listen.py) to point at a running message queue instance.

cd asteroid/bin
./asteroid_listen.py

Once you’re up and running you should be able to add commands via the admin interface at http://localhost:8000/admin/. The username and password should be those you added when creating the database via the syncdb command above.

The development configs include a few additional applications (mentioned above) which I use for testing and debugging. You can run the test suite like so:

cd asteroid/configs/development
manage.py test --coverage

Todo

This is an early release that just about works for me. I can already see a number of areas I’d like to clean up a little or extend. For instance:

  • Other deployment options, including a WSGI file and a spawning startup script.
  • Use a database migration system to make upgrades easier.
  • Make the message queue listener script more robust.
  • Make the command entry more robust; it sometimes takes a bit of fiddling to get something to run correctly.
  • Formalise running scripts on remote machines, including support for running on multiple machines.
  • Paging for long lists of commands or runs.

Notes

I’m pretty happy with how it’s shaping up so far. Under the hood it works by having the web app put a JSON document on the message queue. The JSON contains the command to be run and a callback URL. The script listening to the message queue picks up the message, runs the command, and posts a JSON document back to the webhook url. It keeps the web interface snappy, as well as meaning it can show which commands are currently in progress at any given time. It also has the side benefit of meaning you can execute commands on a remote machine, as the listener doesn’t care where it’s running.
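To illustrate the shape of that exchange (the field names here are made up rather than Asteroid’s actual format), the queued document is conceptually something like:

pre. message = {
    "command": "fab local deploy",                 # what the listener should execute
    "callback": "http://localhost:8000/runs/42/",  # where to POST the output and status
}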

As noted above I have a few ideas of where I want to take it, but I’m going to try using it for a bit and see how that goes. If anyone else finds it useful then do let me know.

It's the Data we Want

A spreadsheet. A CSV file. Whatever is in use internally. Made available to people like us under a suitable license.

I feel a little self-absorbed quoting myself (from a recent Refresh Cambridge discussion) but I did like the turn of phrase. What I was rambling on about was Cambridge County mapping data, after a question from a nice chap from the council about what “new, exciting map technology” we’d like to see. But it applies to any data you’re trying to make public whatsoever, be it government or otherwise.

What myself and a few other people were talking about, and one of the things that has been discussed as part of the Rewired State group, is that it’s all about the data, not necessarily about a nice web based API.

Now I’ve written and spoken about the need for well-designed APIs to be treated as part of the user interface. But remember that interface design, and by association API design, isn’t easy. API design is often about building manageable flexibility. A public API is often about managing the flow of data you control out to third parties; as well as the information itself it might include limitations on usage, or request rate, or storage. A public API codifies how that information can be accessed. APIs also have to tread a fine line between making it easy for you to solve your problem, and making it easy for everyone else to solve their completely different problems. These compromises are design.

But not everything needs an API. Sometimes it’s just about the data, and the best way of getting at that data is as raw as possible. Government data is an easy sell here, as it is (or rather should be) our data. It’s also for the most part interesting to read rather than write (historical council tax data, or population data for instance). Raw data can generally be provided quicker than via an API. It doesn’t need fragile computer systems or extensive manual labour. It doesn’t need particularly clever computing resources. Just upload a spreadsheet or a CSV file to a sensible URL on a known, regular basis and away we go.

And giving data like this away to the development community is likely to have a few additional benefits if that data is useful (it probably is to someone). We’ll happily write software libraries, or create APIs over the top of it for you. We’ll also write all sorts of useful tools using the data in ways no one else thought of. So if you’re sat on a load of data that’s not core to your business, or is meant to be public anyway, then let’s start talking publicly about how to just get it out on the web quickly and cheaply, rather than spending lots of your time and money on something fancy.

Thoughts on the whole XHTML/HTML5 affair

I wasn’t going to write anything about the whole XHTML2 thing. I noted its passing, got a nice message on Twitter and thought that would be it. But no. The web standards world exploded. I honestly didn’t see that coming.

Let’s get a few things straight:

  • I use XHTML 1.0 for this site. In fact I’ve been using it for the majority of things for most of my professional life.
  • I don’t serve content with an XML mime type. Neither does anyone else. It’s a complete non-issue. Ignore it.
  • At my last job we used HTML 4. It meant I had to remember not to close my image elements, which bugged me, but not too much. I still quoted everything. Closed everything I could. And only used lowercase element names.
  • My latest two pet projects are using HTML5. I’m still closing everything (including image elements, yay), quoting everything and lowercasing everything.

Web standards are interesting in that they are standards for both implementors (browser makers) and for authors (us). I like coding standards in programming languages too; it’s one of the things I love about Python and PEP8. But with these standards it’s not about making your code work, it’s about shared conventions and readability. So common spacing, UPPERCASE for constants and leading caps for class names, for instance. It’s also about having a tool to check everyone is adhering to the standards, like pep8.py or FxCop for .NET. If everyone writes code in the same way it’s easier to read, write and pick up someone else’s code. You can do that with HTML, but you have to do that with XHTML.

Now the whole HTML 4.0 vs XHTML 1.0 thing has come up lots of times, on mailing lists, at conferences as well as down the pub. I know on occasion me, Drew, Rachel and Jeremy side against Simon and Nat on the issue. But what’s interesting is that I think we all agree on all the typographical conventions stuff. My former colleagues with a passion for front end standards and HTML 4 did the same thing. I even remember Simon looking for ways to validate against HTML 4 but also to check for all lower case elements, closed paragraphs and the like.

Which brings me to the reason why I use XHTML: The validator enforces my preferred coding standards for HTML - lowercase elements, quoted attributes and closed elements. That’s it. Not much really. I know it’s marketing XHTML rather than technical XHTML. I don’t care. Or rather I do care, I just make a conscious pragmatic decision based on a small personal advantage. I’m both pedantic and like having a tool chain which enforces that, XHTML suits my style.

The markup language debate is being talked about in terms of pragmatists vs purists. But ignoring the people who both really understood and really wanted XHTML2, it’s mainly the pragmatists arguing amongst themselves now. Some of them are big company people, others working for themselves. Some have standards or academic leanings, others are rooted in commercial web design. Some people probably work on huge long term projects, others relatively small sites and apps. And I think it’s these cultural differences that are the root of arguments now. So blog posts coming out saying the same thing but arguing with other people give a strange impression of disagreement. Throw in that the web lends itself to popular blogs gathering a crowd of like-minded people around them and hey presto we have people feeling unfairly put upon and getting agitated.

What a storm in a teacup. Who doesn’t genuinely think the best approach is to use whatever you’re using now for most projects, investigate HTML5 as time permits, and then expect to start using HTML5 in bits and pieces in the short to medium term, with timing mainly dependent on your target audience?

In my opinion the only genuine problem that this saga has highlighted is the fear, uncertainty and doubt around all flavours of HTML amongst a large number of web professionals. People don’t get this stuff at all. With the added resources soon to be put into the HTML5 working group at the W3C this outreach and education side of the project has to have just as much love and attention as the spec itself.

Pants Python Code

One of the projects that came out of the Django Dash recently was PyPants which I’m finding very cool.

Urltest on PyPants

It’s basically a quality tracking service for Python modules. For instance my recent UrlTest module has a page on PyPants, scoring a good B grade after some cleanup work earlier today.

Under the hood I think it’s probably CheeseCake which is available as a command line application, maybe with a hint of PyLint and pep8.py thrown in. But the nice interface, as well as tracking of scores over time, really add something. GitHub has been credited by some as making sharing code more fun, I’m hoping projects like PyPants can do the same for quality in Python code.
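If you want the raw score locally, Cheesecake ships a command line tool you can point at a package; from memory the invocation is something like the following, so treat it as a sketch:

pre. cheesecake_index --name=urltest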

Congrats to Eric Holscher, Travis Cline, and Nathan Borror on a fantastic addition to the Python community.

Urltest on PyPi

I’ve been meaning to add some of my code to the Python Package Index for a while and have finally gotten around to it with Urltest, my simple DSL for testing WSGI apps.

You can now find it at pypi.python.org/pypi/urltest and install it using setuptools with:

pre. easy_install urltest

At the moment I’ve not added any categorisation or detailed description to the setup.py file; I’ll be doing that soon. I wanted to get it working with the absolute minimum setup.py file, which turned out to look like:

pre. #!/usr/bin/env python
from setuptools import setup, find_packages

setup(
    name = "urltest",
    version = "0.1",
    author = "Gareth Rushgrove",
    author_email = "[email protected]",
    url = "http://github.com/garethr/urltest",
    packages = find_packages('src'),
    package_dir = {'': 'src'},
)

Uploading it to PyPI itself was incredibly simple, partly as I was already using setuptools for local installation.

pre. python setup.py register
python setup.py bdist_egg upload

Let me know if anyone uses this and gets it working. I’ll be adding more details and maybe even some more features when I get the chance. Once I do that I’ll probably work on a few more packages as well.

Ant for Web Developers II - Restart Apache

Following on from yesterday’s first useful Ant task, here’s another commonly used one: restarting a remote service. I’ve used Apache in this example, but it could be any service running on your remote machine, and it doesn’t have to be the restart command.

In order to do this we’ll use the sshexec task, which has a third party library dependency. This is the same third party library needed for the scp task in yesterday’s post.

You first need to download JSch and then compile the source using Ant. Just run ant dist in the downloaded folder and you should get a .jar file in the /dist/lib folder. Save this .jar file as jsch.jar to the ~/.ant/lib folder in your home directory, where Ant can automatically load it. Alternatively you can run ant with the -lib option to load libraries from a different location.

pre. ant -lib /tools/ant-libraries

With that out of the way let’s have a look at the task.

pre. <?xml version="1.0" encoding="UTF-8"?>
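<!-- A sketch of the rest of the build file: the project name and property
     values here are assumptions, so change them to suit your setup -->
<project name="webdev-tasks" default="restart-apache" basedir=".">
    <property name="host" value="example.com"/>
    <property name="username" value="garethr"/>
    <property name="password" value="password"/>

    <target name="restart-apache" description="Restart Apache on the remote server">
        <!-- sshexec needs jsch.jar on Ant's library path, as described above -->
        <sshexec host="${host}"
                 username="${username}"
                 password="${password}"
                 trust="true"
                 command="/etc/init.d/apache2 restart"/>
    </target>
</project>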

And running it is as simple as:

pre. ant restart-apache

One potential issue with tasks like this is storing the password in the build file in plain text. The sshexec task can also use key authentication if you’re happy using SSH keys. Alternatively you can set properties on the command line each time you run ant, like so.

pre. ant restart-apache -Dpassword={password}

Ant for Web Developers I - Backup Config File

I occasionally get carried away with Apache Ant. For those that haven’t come across it, Ant is a build tool written in Java, using an XML syntax to describe a series of repeatable tasks. To your typical web standards savvy, dynamic language favouring web developer types, that description is probably all they (think they) need to know. It’s Java. It’s XML. It’s only really useful in the context of building software (dull).

But I think Ant is a particularly handy tool to have around for anyone working on even simple websites. A couple of strong use cases come to mind:

  • If you’re working in a team environment then build files are hugely useful when introducing new people to the team, or when moving people around. Getting code up and running at the start of a project, or if you join the team part way through, can be tricky. A well written build file can automate this.
  • Even when working on projects on my own I tend to write simple build files. The main reason is so I don’t forget how to do something. How do you deploy this particular site? How do you run the test suite or generate the documentation? Build files can encapsulate this, and rather than documentation that might be out of date the build file will be executed to do that job in question.

So with all that in mind I’m going to try and do a series of posts, each covering a single task, aiming to cover things that your regular web developer will find useful. If anyone has any requests or questions, let me know either by email or in the comments.

Our first task lets us back up a file from our remote web server; in this case it’s the apache2.conf file used to configure Apache, but obviously it could be any file you want to get hold of. The example below has a couple of properties for the username and hostname of the remote machine. Save the following snippet into a file called build.xml and place it anywhere you like on your machine.

pre. <?xml version="1.0" encoding="UTF-8"?>
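<!-- A sketch of the rest of the build file: the property values are assumptions -->
<project name="webdev-tasks" default="backup-apacheconf" basedir=".">
    <property name="username" value="garethr"/>
    <property name="host" value="example.com"/>

    <target name="backup-apacheconf" description="Copy the remote apache config locally">
        <!-- shells out to the scp binary, so no extra Java libraries are needed -->
        <exec executable="scp">
            <arg value="${username}@${host}:/etc/apache2/apache2.conf"/>
            <arg value="."/>
        </exec>
    </target>
</project>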

Running the task, once you have Ant installed (it comes preinstalled on OS X and is generally available in whatever Linux package management system you prefer), is as simple as typing the following into a console.

pre. ant backup-apacheconf

This should download the apache2.conf file to your local machine, into the same directory as your build file.

The above task requires that you have scp installed on your machine, which is pretty likely if you’re using OS X or Linux. Ant comes with an inbuilt scp task, but it requires you to install a separate Java library. If you’re happy doing that then you can write tasks like:

pre. <?xml version="1.0" encoding="UTF-8"?>
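<!-- A sketch using Ant's built-in scp task: it needs the jsch library
     installed, and the property values here are assumptions -->
<project name="webdev-tasks" default="backup-apacheconf" basedir=".">
    <property name="username" value="garethr"/>
    <property name="host" value="example.com"/>
    <property name="password" value="password"/>

    <target name="backup-apacheconf" description="Copy the remote apache config locally">
        <scp file="${username}@${host}:/etc/apache2/apache2.conf"
             todir="."
             password="${password}"
             trust="true"/>
    </target>
</project>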