Support for Rev=Canonical

There has been lots of talk recently about URL shortening. Services like TinyURL have been around for a good while, offering shortened versions of URLs like tinyurl.com/dd7w2m which are easier to put in a tweet or an email. The problem with this is that not only does the shorter version mask any information about the destination, but if TinyURL or one of the other shortening services goes away, or loses control of it’s domain name, a large number of links are going to stop working the way they should.

Kellan in particular has been proposing some simple steps that might get us out of this hole. You can read more about the ideas behind using Rev=Canonical and try out the future (maybe) of these services at revcanonical.appspot.com.

The nicest thing from my point of view about this idea is how simple it is to implement. This blog is running a custom Django based blogging engine called Train.

The posts on this site exist at urls like the following: morethanseven.net/2009/04/04/mixing-it-programming-language-choice/. With only a small view function, a change to a template and the addition of a url this blog should now work with Kellans new url shortener.

I decided to use the ids for the articles on the blog as the key for the short versions. So if you were to visit morethanseven.net/284 you would get the article above. I decided to issue a redirect from the short version to the long version in the end rather than serve duplicate content with the canonical link, not sure which way is probably best though.

The markup for each article contains the required link in the head of the document:

pre.

And the django view looks something like this:

def tiny(request, id):
    "Provide tiny urls based on ids for articles"
    # get the article or throw a 404
    article = get_object_or_404(Article, id=id, status='live')
    url = article.get_absolute_url()
    # redirect to the relevant
    return HttpResponsePermanentRedirect(url)

All in all, incredible simply to implement, especially in something like Rails or Django which make this sort of wire up urls to view stuff easy. So what’s stopping you adding this to you site or current project? If enough people just do it we can make the web a slightly better place in reasonably short order.

Mixing it Up - Programming Language Choice

So the Register article about Twitter seems to have kicked over yet another Ruby/Rails doesn’t scale debate - mainly it seems from people who haven’t read any of the back story or the real meat of the story. For anyone catching up I’d suggest reading this recent interview with three of the Twitter developers. Ikai Lan made some particular good points about people who don’t RTFM and the comments are well worth reading too. Tony Arcieri, of Reia fame, took another approach and wondered why non of the open source message queues every got a look in

What it really all comes down to is that Twitter are using more than one language to write their systems in. What I don’t understand is why this is a shock to anyone, or why it’s a bad think? Google appear to use Java and Python for most things. Yahoo! use Java and PHP (and C and Perl I think?). Microsoft use VB and C/C**/C# and probably F#. At work we use .NET and Python for different things.

Big companies have been using multiple languages and platforms for good and bad reasons for ever. Sometimes it’s about legacy systems, but often it’s about using the right tool for the job. I think lots of people jumping in on the debate do so from a point of view that everyone uses one language for everything, mainly because most personal projects or small agency style jobs do just that. Why overcomplicate smaller projects with the need for people to know more than one language? In a small general purpose team it’s also going to make recruiting and getting people up to speed much harder.

But for startups doing interesting things it’s potentially both more efficient and more interesting to use multiple platforms. I think Dopplr might be mainly Ruby with a smattering of Erlang, all built around an ApacheMQ message queue, and Matt has talked about that architecture at various conferences without being called out on it.

So for your next personal project why not pick a handy message queue (personally I like RabbitMQ and StompServer), and at least two complimentary languages and see how it changes the architecture of what you build? Mix PHP with an Erlang backend or go for Twitters Ruby/Scala mix. It might very well be overkill for that blog or todo list application you had in mind, but it just might teach you more about picking the right tools for the job when you come across non-trivial problems.

Google Search New Features? Timeline Search and Wonder Wheel

Looks like I’m being experimented on. I just got a strange Web button appearing on a search today and decided to click it. It revealed a host of new Google features (at least I’ve never seen em before), including various filters and visualisation tools.

The entertainingly named Wonder Wheel was my first click. It’s a visualisation which shows related search terms. I’d love to have access to that data via an API as well.

Example of Wonder Wheel search on Google

The timeline view, and time based filters, look more useful for most search activities. Just getting hold of recent content on a particular search term is the sort of thing blog search engines have sended to do well in the past and that Twitter search has been showing off.

I hope I keep these features during whatever testing is going on and that they all go live soon. It’s a pretty big improvement if you ask me.

Github Links to Lines of Code

Just saw this and thought it was cool. You can link to a specific line, or set of lines on GitHub. All you need to do is append something like #L17-24 to specify highlighting lines 17 to 24.

Hacker Posts

I’ve been playing again with App Engine, and going back to an on/off pet project that I’ve build variations of for a while.

Hacker Posts

It’s basically a pretty straightforward aggregation platform, taking content from a number of feeds and creating relationships between the items. It’s mainly an experiment in creating a decent size site on App Engine - it can be surprising how many urls you can get out of a good corpus of data:

  • 400 items, each with a unique url
  • each item has a variable number of tags generated, at the moment that amounts to about 1000 unique urls
  • each month and year of content gets a url so that’s another 4 or so
  • each day for which we have items we also get a url, so that’s another 90 or so

So starting off with 1500 or so pages isn’t bad. The site also grows over time as new items are posted or I add more feeds. Everything else I’ve done with App Engine has been more application focused so seeing how a content site performs is interesting in and of itself.

The data in question I used for this first experiment was the feeds I could find from the top posters on Hacker News. Hence the name Hacker Posts. That left me with 35 feeds:

I have a couple of other communities or events that I’d like to do the same thing around, as well as a few features I want to add to the underlying software. The nice thing with App Engine is rolling new instances out is as simple as running a command.

Webapp custom filters

The webapp framework wish ships with App Engine uses the Django templating system by default, but without Django apps doesn’t support the same mechanism for loading template tags and filters. This is how to do it though using a few webapp.template methods.

Simple WSGI Middleware (for App Engine)

WSGI is the Web Server Gateway Interface. It is a specification for web servers and application servers to communicate with web applications (though it can also be used for more than that). It is a Python standard, described in detail in PEP 333.

For Ruby people WSGI is the Rack in Python. In fact it was one of the inspirations behind Rack. Rack descriptions itself as:

Rack provides an minimal interface between webservers supporting Ruby and Ruby frameworks.

Which I think is a clearer explanation, except in WSGI’s case we replace Ruby with Python.

As well as being able to write WSGI middleware for Django or Pylons we can also write WSGI middleware for App Engine applications - which is what I spent some time doing today. For the most part I found the examples and documentation interesting but overkill for what I needed to do. Specifically I wanted a piece of middleware which modified the response content, adding extra content into the response. Most of the examples I found didn’t focus on middleware, or where full blown examples making them hard to follow.

So for anyone looking for a simple example of WSGI middleware which adds content into the response here goes. I used the WebOb framework because it provides a nicer interface to the request and response objects and it’s included in the standard App Engine SDK. The following sample middleware simple adds Hello World to the end of every response.

pre. from webob import Request class SimpleMiddleware(object): “Example middleware that appends a message to all 200 html responses” def init(self, app): self.app = app def call(self, environ, start_response): # deal with webob request and response objects # due to a nicer interface req = Request(environ) resp = req.get_response(self.app) # add a string to the end of the body body = resp.body + “Hello World” # set the body to the new copy resp.body = body return resp(environ, start_response)

In reality you might want to append something to a specific place in the response, or introduce conditionals. This is easy enough to do by parsing the initial value of resp.body in the example above.

To use the middleware in your application you simple wrap your current WSGIApplication instance with the middleware class.

pre. application = webapp.WSGIApplication(ROUTES, debug=settings.DEBUG)

  1. add simple middleware application = SimpleMiddleware(application) run_wsgi_app(application)

WSGI middleware is both a useful place for common functionality to live in your App Engine application as well as being a handy tool for anyone working across multiple Python frameworks to share code.

XMPP and Queues in App Engine via Jaiku? Not quite yet

So JauikuEngine, the open source, App Engine based, version of Jaiku is now available for everyone to look at. I found the repo a couple of days ago but it was restricted to project members. The main reason I want to hunt through the code is to have a look at what I’m guessing will be API’s available in a soon to be released version of App Engine - with specific interest in anything to do with XMPP, queues and offline processing.

Well it looks like I’m out of luck for the moment at least. In the settings file I found the following two snippets though:

pre. #

  1. XMPP / IM #
  2. Enabling IM will require a bit more than just making this True, please
  3. read the docs at http://code.google.com/p/jaikuengine/wiki/im_support IM_ENABLED = False
  4. This is the id (JID) of the IM bot that you will use to communicate with
  5. users of the IM interface IM_BOT = ‘[email protected]
  6. Turn on test mode for IM IM_TEST_ONLY = False
  7. JIDs to allow when testing live XMPP so you don’t spam all your users IM_TEST_JIDS = []

And another for queues:

pre. #

  1. Task Queue #
  2. Enabling the queue will allow you to process posts with larger numbers
  3. of followers but will require you to set up a cron job that will continuously
  4. ping a special url to make sure the queue gets processed QUEUE_ENABLED = True
  5. The secret to use for your cron job that processes your queue QUEUE_VENDOR_SECRET = ‘SECRET’

The only problem appears to be that the page referenced, code.google.com/p/jaikuengine/wiki/im_support, currently says:

TODO (termie): describe how to get IM working with Jaiku Engine

So termie, or anyone else on the inside, I’d love to know how to get this up and running?

I’ll be having a better look through the code when I get a chance. This was just the first thing I jumped on before heading out the door. I love Open Source.

App Engine Remote API calls

Not sure how I missed this but apparently App Engine (as of 1.1.9) supports remote access to your live data store. This means you can create administration applications more easily by running them locally, rather than within the limitations of the live platform. You can even run a local python prompt with access to your live datastore which is pretty neat.

Services Vs Applications: Does Rails Encourage SOA Better Than Django?

Building larger applications tends to mean splitting your codebase up some how into manageable chunks. I’m quite interested in what I see as different approaches in the Rails and Django communities:

Django tends to recommend building Reusable Apps and we have sites like Django Pluggables to catalog what’s available. You then grab a few of these applications from the web or write your own, add them run them all together as part of a single application. Pinax is probabaly the poster child for this approach. The 0.5.1 release for instance appears to have 41 individual reusable apps, many written by other people and projects.

The Rails community tends to talk more about RESTful service orientated architectures, with things like ActiveResource making this sort of thing easier. So rather than your manageable chunks being within your application they’re separate instances in their own right.

I’d be interested in hearing from more people about their experiences, in particular if you’ve gone against the grain so to speak.