Testing Vagrant runs with Cucumber

I’ve been a big fan of Vagrant since it’s initial release and still find myself using it for various tasks.

Recently I’ve been using it to test collections of Puppet modules. For a single host vagrant-serverspec is excellent. Simply install the plugin, add a provisioner and write your serverspec tests. The serverspec provisioner looks like the following:

config.vm.provision :serverspec do |spec|
  spec.pattern = '*_spec.rb'
end

But I also found myself wanting to test behaviour from the host (serverspec tests are run on the guest), and also wanted to write tests that checked the behaviour of a multi-box setup. I started by simply writing some Cucumber tests which I ran locally, but I decided I wanted this integrated with vagrant. Enter vagrant-cucumber-host. This implements a new vagrant provisioner which runs a set of cucumber features locally.

config.vm.provision :cucumber do |cucumber|
  cucumber.features = []
end

Just drop your features in the features folder and run vagrant provision. If you just want to run the cucumber features, without any of the other provisioners running you can use:

vagrant provision --provision-with cucumber

Another advantage of writing this as a vagrant plugin is that it uses the Ruby bundled with vagrant, meaning you just install the plugin rather than faff about with a local Ruby install.

A couple of other vagrant plugins that I’ve used to make the testing setup easier are vagrant-hostsupdater and vagrant-hosts. Both help with managing hosts files, which makes writing tests without knowing the IP addresses easier.

Buy vs Build your Monitoring System

At the excellent London Devops meetup last week I asked what was apparently a controversial question:

should you just use software as a service monitoring products rather than integrate lots of open source tools?

This got a few people worked up and I promised a blog post.

Note that I wrote a post listing lots of open source monitoring tools not that long ago. And I’ve been to both the Monitorama events about open source monitoring. And have a bunch of Puppet modules for open source monitoring tools. I’m a fan of both open source and of open source monitoring. Please don’t read this as an attack on either, and particularly on the work of awesome people working on great open source monitoring products.

Some assumptions

  1. No one product exists that does everything. I think this is true for SaaS as much as for open source.
  2. Lets work with about 200 hosts. This is a somewhat arbitrary number I know, some people will have more and others less.
  3. If it saves money we’ll pay yearly, rather than monthly or hourly.
  4. We could probably get some volume discounts from some of the suppliers, but we’ll use list prices for this post.

Show me the money

So what would it cost to get up and running with a state of the art software as a service monitoring system? In order to do this we need to choose our software. For this post that means I’m going to pick products I’ve used (sometimes only a bit) and like. This isn’t a comprehensive study of all the alternatives I’m afraid - though feel free to write your own alternative blog posts.

  • New Relic provides a crazy amount of data about the running of both your servers and your applications. This includes application performance data, errors, low level metrics and even rolled up method or database query performance. $149 per host per month for our 200 hosts gives us $29,800 per month.

  • Librato Metrics provides a fantastic way of storing arbitrary time series data. We’re already storing lots of data in New Relic but Metrics provides us with less opinionated software so we can use it for anything, for instance number of logins or searches or other business level metrics. We’ll go for a plan with 200 data sources, 100 metrics each and at 10 second resolution for a cost of $3,860 per month.

  • Pagerduty is all about the alerts side of monitoring. Most of the other SaaS tools we’ve chosen integrate with it so we can make sure we get actionable emails and SMS messages to the right people at the right time. Our plan costs $18 per person per month, so lets say we have 30 people at a cost of $540 per month.

  • Papertrail is all about logs. Simple setup your servers with syslog and Papertrail will collect, analyze and store all your log messages. You get a browser based interface, search tools and the ability to setup alerts. We like lots of logs so we’ll have a plan for 2 weeks of search, 1 year archive and 100GB month of log traffic. That all costs $575 per month.

  • Sentry is all about exceptions. We could be simply logging these and sending them to Papertrail but Sentry provides tools for tracking and rolling up occurences. We’ll go for a plan with 90 days of history and 200 events per minute at a cost of $199 a month.

  • Pingdom used to provide a very simple external check service, but now they have added more complex multistage checks as well as real user monitoring to the basic ping. We’ll choose the plan with 250 checks, 20 Real User Monitoring sites and 500 SMS alerts for $107 a month.

How much!

In total that all comes to $35,080 (£20,922) per month, or $420,960 (£251,062) per year.

Now the first reaction of lots of people will be that’s a lot of money and it is. But remember open source isn’t free either. We need to pay for:

  • The servers we run our monitoring software on
  • The people to operate those servers
  • The people to install and configure our monitoring software
  • The office space and other costs of employing people (like management and hiring)

I think people with the ability to build software tend to forget they are expensive, whether as a contractor or as a full time member of staff. And people without management experience tend to forget costs like insurance, rent, management overhead, recruitment, etc.

And probably more important than these for some people we need to consider:

  • The time taken to build a good open source monitoring system

The time needed to put together a good monitoring stack based on for instance logstash, kibana, riemann, sensu, graphite and collectd isn’t small. And don’t forget the number of other moving parts like redis, rabbitmq and elasticsearch that need installing configuring and maintaining. That probably means compromising in the short term or shipping later. In a small team how core is building your monitoring stack to what you do as a business?

But I can’t use SaaS

For some people, using a software as a service product just isn’t going to cut it. Here’s a list of reasons I can think of:

  • Regulation constrains where your data can be stored, for instance it’s not allowed out of the country
  • Sheer size of infrastructure, although you may be able to get a volume discount it might not be enough

I think everything else is a cost/benefit issue or personal preference (or bias). Happy to add more to that list, but I don’t think it’s a very long list.

Conclusions

I’ve purposefully not talked about the quality of the tools here, just the cost. I’ve also not mentioned that it’s likely not an all or nothing decision, lots of people will mix SaaS products and open source tools.

Whether taking a SaaS approach will be quicker, cheaper or better will depend on your specific business context. But try and make that about the organisation and not about the technology.

If you’ve never used the current crop of SaaS monitoring tools (and not just the one’s mentioned above) then I think you’re missing out. Even if you stick with a mainly open source monitoring stack you might look at your tools a bit differently after you’ve experimented with some of the commercial competition.

A template for Puppet modules

A little while ago I published a template writing your own puppet modules. It’s very opinionated but comes out of the box with lots of the tools you eventually find and add to your tool box. I’m posting this as it came up at the recent Configuration Management Camp and after discussing it I realised I hadn’t actually wrote anything about it anywhere.

What do you get?

  • A simple install, config, service class pattern
  • Unit tests with rspec-puppet
  • Rake tasks for linting and syntax checking
  • Integration tests using Beaker
  • A Modulefile to provide Forge metadata
  • Command line tools to upload to the Forge with blacksmith
  • A README based on the Puppetlabs documentation standards
  • Travis CI configuration based on the official Puppetlabs support matrix
  • A Guardfile which can run all the tests when you change manifests

Obviously you can choose not to use parts of this, or even delete aspects, but I find that approach much quicker than starting from scratch or copying files from previous modules and changing names.

How can I use it?

Simple. The following will install the module skeleton to ~/.puppet/var/puppet-module/skeleton. This turns out to be picked up by the Puppet module tool.

git clone https://github.com/garethr/puppet-module-skeleton 
cd puppet-module-skeleton
find skeleton -type f | git checkout-index --stdin --force --prefix="$HOME/.puppet/var/puppet-module/" --

With that in place you can then just run the following to create a new module, where puppet-ntp is the name of our new module.

puppet module generate puppet-ntp

We use puppet module like this rather than just copying the files because otherwise you would have to rename everything from class names to test assertions. The skeleton actually contains erb templates in places, and running puppet module generate results in the module name being available to those templates.

Now what?

Assuming you have run the above commands you should have a folder called puppet-ntp in your current directory. cd into that and then install the dependencies:

bundle install

Bundler is a dependency manager for Ruby. If you don’t already have it installed you should be able to do so with the following:

gem install bundler

Now you have the dependencies why not run the full test suite? This checks syntax, lints the Puppet code and runs the unit tests.

bundle exec rake test

Unit tests give fast feedback and help make sure the code you write is going to do what you intend, but they aren’t actually applying the manifests to a real machine. For that you want an integration test. You’ll need Vagrant installed for this next step. Lets run those as well with:

bundle exec rspec spec/acceptance

This will take a while, especially the first time. This uses Beaker to download a virtual machine from Puppetlabs (if you don’t already have it) and then brings up a new machine, applies a simple manifest, runs the acceptance tests and then destroys the machine.

The CONTRIBUTING.md file has more information for running the test suite.

What’s new?

I’ve recently added a Guardfile to help with testing. You can run this with:

bundle exec guard

Now in a separate tab or pane make a change to any of the manifests. The tests should run automatically in the tab or pane where guard is running.

Can you add this new tool?

Probably. Although I started the repo a few other people have contributed code or made improvements already. Just sent a pull request or open an issue.

Code coverage for Puppet modules

One of my favourite topics for a while now has been infrastructure as code. Part of that involves introducing well understood programming techniques to infrastructure - from test driven design, to refactoring and version control. One tool I’m fond of (even with it’s potential to be misused) is code coverage. I’d been meaning to go code spelunking to see if this could be done for testing Puppet modules.

The functionality is now in master for rspec-puppet and so anyone feeling brave can use it now, or if you must wait for the 2.0.0 release. The actual implementation is inspired by the same functionality in ChefSpec written by Seth Vargo. Lots of the how came from here, and the usage is very similar.

How to use it?

First add (or hopefully change) your Gemfile line item for rspec-puppet to the following:

gem "rspec-puppet", :git => 'https://github.com/rodjek/rspec-puppet.git'

Then all you need to do is include the following line anywhere in a spec.rb file in your spec directory.

at_exit { RSpec::Puppet::Coverage.report! }

What do I get?

Here’s an example module, including a file called coverage_spec.rb. When running the test suite with rake spec you now get coverage details like so:

Total resources:   24
Touched resources: 8
Resource coverage: 33.33%

Untouched resources:
  Class[Nginx]
  File[preferences.d]
  Anchor[apt::update]
  Class[Apt::Params]
  File[sources.list]
  Exec[Required packages: 'debian-keyring debian-archive-keyring' for nginx]
  Anchor[apt::source::nginx]
  Class[Apt::Update]
  File[configure-apt-proxy]
  Apt::Key[Add key: 7BD9BF62 from Apt::Source nginx]
  Anchor[apt::key/Add key: 7BD9BF62 from Apt::Source nginx]
  Anchor[apt::key 7BD9BF62 present]
  File[nginx.list]
  Exec[apt_update]
  File[sources.list.d]
  Exec[e407f76c6e349fc397947a4a49260a9320196cb1]

Here’s the output on Travis CI as well for a recent build.

Why is this useful?

I’ve already found coverage useful when writing tests for a few of my puppet modules. The information about the total number of resouces is interesting (and potentially an indicator of complexity) but the list of untouched resources is the main useful part. These represent both information about what your module is doing, and potential things you might want to test.

I’m hoping to find some more time to make this even better, providing more information about untouched resources, adding some configuration options and hopefully to integrate with the Coveralls API.

Shell provisioner for Test Kitchen

As of a few weeks ago Test Kitchen has a shell provisioner as well as the original Chef provisioners. This opens up all sorts of interesting testing potential.

If you’ve not already seen Test Kitchen, probably because you’re not using Chef, it’s a tool for integration testing infrastructure code. Configured by a simple YAML file it will setup a matrix of virtual machines, using Virtualbox, AWS, OpenStack and more, run some setup code (normally applying Chef recipes) and then run a test suite (with support for Bats, ShUnit2, Rspec and Serverspec). It’s all very pluggable. With the addition of the shell provisioner it’s useful to just about anyone. To try and prove that here’s a hello world style example.

Dependencies

First we need to install Test Kitchen. We’ll use vagrant and virtualbox for our example too so we need a few extra dependencies. I’m going to assume you have bundler installed, if not you may be able to do so with gem install bundler but as the number of ways of setting a ruby environment up is greater than the number of people on the planet I’ll have to defer to instructions elsewhere for getting that far.

First create a file called Gemfile with the following contents:

source "https://rubygems.org"

gem "test-kitchen", :git => "https://github.com/test-kitchen/test-kitchen.git"
gem "kitchen-vagrant"
gem "vagrant-wrapper"

Then run:

bundle install

This should install the above software. Note that the shell provisioner is not yet in an official release so where installing direct from GitHub for the moment.

Configuration

Next we’ll tell Test Kitchen what we want to do. As much for demonstration purposes I’m going to grab one of the Puppetlabs boxes. This is just plain Vagrant so feel free to substitude the box and box_url for alternatives you already have installed locally. Otherwise the first run will take a little longer as it downloads a large file.

Pull all of the following in a file called `.kitchen.yml’.

---
driver:
  name: vagrant

provisioner:
  name: shell

platforms:
  - name: puppet-precise64
    driver_config:
      box: puppet-precise64
      box_url: http://puppet-vagrant-boxes.puppetlabs.com/ubuntu-server-12042-x64-vbox4210.box

suites:
  - name: default

The shell provisioner is going to look for a file called bootstrap.sh by default. You can overide this but we’ll leave it for the moment. Our bootstrap script is going to do something very simple, install the ntp package. But the important part is it could do anything; run Salt, run Ansible, run Puppet, execute any arbitrary code we choose. In this case our script is completely self contained but if it needed some additional files we could put them in a directory called data and they would be copied to the newly created virtual machine under /tmp/kitchen.

#!/bin/bash

apt-get install ntp -y

Tests

The last step is to write a test. I’m suddently finding lots of excuses to use Serverspec so we’ll use that, but if you prefer you can use pretty much anything. The following file should be saved as test/integration/default/serverspec/ntp_spec.rb. Note the default in the path which matches our suite above in the .kitchen.yml file. Test Kitchen allows for multiple suites all with separate tests based on a strong set of file path conventions.

require 'serverspec'

include Serverspec::Helper::Exec
include Serverspec::Helper::DetectOS

RSpec.configure do |c|
  c.before :all do
    c.path = '/sbin:/usr/sbin'
  end
end

describe package('ntp') do
  it { should be_installed }
end

describe service('ntp') do
  it { should be_enabled }
  it { should be_running }
end

Running the tests

With all of that in place we’re ready to run our tests.

bundle exec kitchen test

This should:

  • download the virtual machine image if you don’t already have it locally
  • create a new virtual machine based on the image
  • run the bootstrap.sh script
  • run our serverspec test suite

The real power comes from doing this iteratively as you work on code, probably code more complex than a simple one-line bash script. You can also test across multiple virtual machines at a time, for instance different operating systems or different machine roles. The kitchen command line tool provides lots of help too, with the ability to login to machines, verify that specific combinations of platform and suite are working and print lots of diagnotic information to aid development.

Hopefully this will make it into a release soon, and we’ll see more involved examples using higher level tools and more documentation. But even now I’d be looking at Test Kitchen for any infrastructure testing you might be doing.

Testing Packer created images with serverspec

Packer provides a great way of describing the steps for creating a virtual machine image. But it doesn’t have a built-in way of verifying those images.

Serverspec provides a nice framework for writing tests against infrastructure, asserting the operation of services or the installation of packages.

I’m interested at the moment in building continous delivery pipelines for infrastructure components and have a simple working example of testing Packer with Serverspec on Github. The example uses the AWS builder and the Puppet provisioner but the approach should work with other combinations.

This doesn’t represent a complete infrastructure pipeline, but it does demonstrate an approach to automating one particular component - building base images.

Testing

In our example I’m using the Puppetlabs NTP module to install and configure NTP. Once the Puppet provisioner has run, but before we build the AMI (or other virtal machine image) we run a test suite. For our example the tests are pretty simple:

describe package('ntp') do
  it { should be_installed }
end

describe service('ntp') do
  it { should be_enabled   }
  it { should be_running   }
end

If the tests fail, Packer will stop and the AMI won’t be built. The combination of storing the code (Packer template) alongside a test suite (Serverspec) and building a new AMI whenever you change the code, makes this setup perfect for continuous integration.

Wercker builds

As an example of a continuous integration setup the repository contains a wercker.yml configuration file for the excellent Wercker service. Wercker makes setting up multi-step built pipelines easy and nicely configurable via a simple text file in your repository.

The Wercker build for this project is public. Currently the build involves downloading Packer, running packer validate to check the template and eventually running packer build to boot an instance and run our serverspec tests.

Making the web secure, one unit test at a time

Originally written as part of Sysadvent 2013.

Writing automated tests for your code is one of those things that, once you have gotten into it, you never want to see code without tests ever again. Why write pages and pages of documentation about how something should work when you can write tests to show exactly how something does work? Looking at the number and quality of testing tools and frameworks (like cucumber, rspec, Test Kitchen, Server Spec, Beaker, Casper and Jasmine to name a few) that have popped up in the last year or so I’m obviously not the only person who has a thing for testing utilities.

One of the other things I am interested in is web application security, so this post is all about using the tools and techniques from unit testing to avoid common web application security issues. I’m using Ruby in the examples but you could quickly convert these to other languages if you desire.

Any port in a storm

Lets start out with something simple. Accidentally exposing applications on TCP ports can lead to data loss or introduce a vector for attack. Maybe your main website is super secure, but you left the port for your database open to the internet. It’s the server configuration equivalent of forgetting to lock the back door.

Nmap is a tool lots of people will be familiar with for spanning for open ports. As well as a command line interface Nmap also has good library support in lots of languages so lets try and write a simple tests suite around it.

require "tempfile"
require "nmap/program"
require "nmap/xml"

describe "the scanme.nmap.org website" do
  file = Tempfile.new("nmap.xml")
  before(:all) do
    Nmap::Program.scan do |nmap|
      nmap.xml = file.path
      nmap.targets = "scanme.nmap.org"
    end
  end

  @open_ports = []
  Nmap::XML.new("scan.xml") do |xml|
    xml.each_host do |host|
      host.each_port do |port|
        @open_ports << port.number if port.state == :open
      end
    end
  end
end

With the above code in place we can then write tests like:

it "should have two ports open" do
 @open_ports.should have(2).items
end

it "should have port 80 open" do
 @open_ports.should include(80)
end

it "should have port 22 closed" do
 @open_ports.should_not include(22)
end

We can run these manually, but also potentially as part of a continuous integration build or constantly as part of a monitoring suite.

Run the Guantlt

We had to do quite a bit of work wrapping Nmap before we could write the tests above. Wouldn’t it be nice if someone had already wrapped lots of useful security minded tools for us? Gauntlt is pretty much just that, it’s a security testing framework based on cucumber which currently supports curl, nmap, sslyze, sqlmap, garmr and a bunch more tools in master. Lets do something more advanced than our port scanning test above by testing a URL for a SQL injection vulnerability.

@slow
Feature: Run sqlmap against a target
  Scenario: Identify SQL injection vulnerabilities
    Given "sqlmap" is installed
    And the following profile:
      | name       | value                                      |
      | target_url | http://localhost/sql-injection?number_id=1 |
    When I launch a "sqlmap" attack with:
      """
      python <sqlmap_path> -u <target_url> —dbms sqlite —batch -v 0 —tables
      """
    Then the output should contain:
      """
      sqlmap identified the following injection points
      """
    And the output should contain:
      """
      [2 tables]
      +-----------------+
      | numbers         |
      | sqlite_sequence |
      +-----------------+
      """

The Gauntlt team publish lots of examples like this one alongside the source code, so getting started is easy. Gauntlt is very powerful, but as you’ll see from the example above you need to know quite a bit about the underlying tools it is using. In the case above you need to know the various arguments to sqlmap and also how to interpret the output.

Enter Prodder

Prodder is a tool I put together to automate a few specific types of security testing. In many ways it’s very similar to Gauntlt; it uses the cucumber testing framework and uses some of the same tools (like nmap and sslyze) under the hood. However rather than a general purpose security framework like Gauntlt, Prodder is higher level and very opinionated. Here’s an example:

Feature: SSL
  In order to ensure secure connections
  I want to check the SSL configuration of my servers
  Background:
    Given "sslyze.py" is installed
    Scenario: Check SSLv2 is disabled
      When we test using the "sslv2" protocol
      Then the exit status should be 0
      And the output should contain "SSLv2 disabled"

    Scenario: Check certificate is trusted
      When we check the certificate
      Then the output should contain "Certificate is Trusted"
      And the output should match /OK — (Common|Subject

Alternative) Name Matches/ And the output should not contain “Signature Algorithm: md5” And the output should not contain “Signature Algorithm: md2” And the output should contain “Key Size: 2048”

    Scenario: Check certificate renegotiations
      When we test certificate renegotiation
      Then the output should contain "Client-initiated

Renegotiations: Rejected” And the output should contain “Secure Renegotiation: Supported”

    Scenario: Check SSLv3 is not using weak ciphers
      When we test using the "sslv3" protocol
      Then the output should not contain "Anon"
      And the output should not contain "96bits"
      And the output should not contain "40bits"
      And the output should not contain " 0bits"

This is a little higher level than the Gauntlt example — it’s not exposing the workings of sslyze that is doing the actual testing. All you need is an understanding of SSL certifcates. Even if you’re not an expert on SSL you can accept the aforementioned opinions of Prodder about what good looks like. Prodder currently contains steps and exampes for port scanning, SSL certificates and security minded HTTP headers. If you already have a cucumber based test suite (including one based on Gauntlt) you can reuse the step definitions in that too.

I’m hoping to build upon Prodder, adding more types of tests and getting agreement on the included opinions from the wider systems administration community. By having a default set of shared assertions about the expected security of out system we can more easily move onto new projects, safe in the knowledge that a test will fail if someone messes up our once secure configuration.

I’m convinced, what should I do next?

As well as trying out some of the above tools and techniques for yourself I’d recommend encouraging more security conversations in your development and operations teams. Here’s a few places to start with:

Introducing Hyde

Hyde is a brazen two-column Jekyll theme that pairs a prominent sidebar with uncomplicated content. It’s based on Poole, the Jekyll butler.

Built on Poole

Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by @mdo. Poole, and every theme built on it (like Hyde here) includes the following:

  • Complete Jekyll setup included (layouts, config, 404, RSS feed, posts, and example page)
  • Mobile friendly design and development
  • Easily scalable text and component sizing with rem units in the CSS
  • Support for a wide gamut of HTML elements
  • Related posts (time-based, because Jekyll) below each post
  • Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)

Hyde features

In addition to the features of Poole, Hyde adds the following:

  • Sidebar includes support for textual modules and a dynamically generated navigation with active link support
  • Two orientations for content and sidebar, default (left sidebar) and reverse (right sidebar), available via <body> classes
  • Eight optional color schemes, available via <body> classes

Head to the readme to learn more.

Browser support

Hyde is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.

Download

Hyde is developed on and hosted with GitHub. Head to the GitHub repository for downloads, bug reports, and features requests.

Thanks!

Looking into monitoring and logging tools

Originally published on Medium.

We have a bunch of internal mailing lists at work, and on one of them someone asked:

we’re looking into monitoring/logging tools…

I ended up writing a bit of a long reply which a few people found useful, so I thought I’d repost it here for posterity. I’m sure this will date but I think it’s a reasonable snapshot of the state of open source monitoring tools at the end of 2013.

Simply put, think about four elements and you won’t be far off on the technical front. Miss one and you’re probably in trouble.

  • logs
  • metric storage
  • metric collection
  • monitoring checks

For logs, some combination of syslog at one end and elasticsearch and Kibana at the other are probably the state of the open source art at the moment. The shipping around is more interesting; Logstash is improving constantly, Heka is an similar alternative from Mozilla, and Fluentd looks nice too.

For pure metrics it’s all about Graphite, which is both awesome and perilous. Not much else really competes in the open source world at present. Maybe OpenTSB (is you’re into a Hadoop stack.)

For collecting metrics on boxes I’d probably look at collectd or diamond both of which have pros and cons but work well. Statsd is also useful here for different types of metric collection and aggregation. Ganglia is interesting too, it combines some aspects of the metrics collection tools with an integrated storage and visualisation tool similar to Graphite.

Monitoring checks is a bit more painful. I’ve been experimenting with Sensu in hope of not installing Nagios. Nagios works but it’s just a bit ungainly. But you do need somewhere to write checks against metrics or other aspects of your system and to issue alerts.

At this point everyone loves dashboards, and Dashing is particularly lovely. Graphiti and Tasseo for Graphite are useful too.

For bonus points things like Flapjack and Reimann provide some interesting extra capabilities around alert control or real time monitoring respectively.

And for that elusive top of the class grade take a look at Kale, which provides anomaly detection on top of Graphite and Elasticsearch .

You might be thinking that’s a lot of moving parts and you’d be right. If you’re a small project running all of that is too much overhead, turning to something like Zabbix might be more sensible.

Depending on money/sensitivity/control issues lots of nice and not so nice commercial products exist. Circonus, Splunk, New Relic, Boundary and Librato Metrics are all lovely in different ways and provide part of the puzzle.

And that’s just the boring matter of tools. Now you get into alert design and other gnarly people stuff.

If you got this far you should watch all the Monitorama videos too.

Platform as a Service and the network gap

Originally published on Medium.

I’m a big fan of the Platform as a Service (PaaS) model of operating web application infrastructure. But I’m a much bigger user and exponent of Infrastructure as a Service (IaaS) products within my current role working for the UK Government. This post describes why that is, and hopefully helps anyone else inside other large enterprise organisations reason about the advantages and disadvantages, and helps PaaS vendors and developers understand what I personally thing is a barrier to adoption in that type of organisation.

A quick word of caution, I don’t know every product inside out. It’s very possible a PaaS product exists that deals with the problems I will describe. If you know of such a product do let me know.

A simple use case

PaaS products make for the very best demos. Have a working application? Deployment is probably as simple as:

git push azure master 

Your app has started to run slowly because visitors are flooding in? Just scale out with something like:

heroku ps:scale web+2

The amount of complexity being hidden is astounding and the ability to move incredibly quickly is obvious for anyone with experience of doing this in a more traditional organisation.

A not so simple use case

Even small systems are often being built out of many small services these days. Many large organisations have been up to this for a while under the banner of Service Orientated Architecture. I’m a big fan of this approach, in my view it moves operational and organisational complexity back into the development team where its impact can often be minimised by automation. But that’s a topic for another post.

In a PaaS world having many services is fine. We just have more applications running on the Platform which can be independently scaled out to meet our needs. But services need to communicate with each other somehow, and this is where our problems start. We’ll keep things simple here by assuming communication is over HTTPS (which should be pretty typical) but I don’t think other protocols make the problem I have go away. The same problem applies if you’re using a SaaS database for example.

It’s the network, stupid

Over what network does my HTTPS internal service call travel? The internet? The internal PaaS vendor’s network? If the latter, is my traffic travelling over the same network as other clients on the platform? Maybe I’m running my own PaaS in-house. But do I trust everyone else in my very large organisation and want my traffic on the same network as other things I don’t even know about? Even if it’s just me do I want internal service traffic mixing with requests coming from the internet? And are all my services created equally with regards what they can and cannot access?

Throw in questions like: is the PaaS supplier running on infrastructure provided by a public IaaS suppliers who you don’t have a relationship with and you start to question the suitability of the current public PaaS products for building secure service based systems.

A journey into Enterprise Architectures

You might be thinking, pah, what’s the worst that can happen? If you work for a small company or a shiny startup that might be completely valid. If on the other hand you’re working in a regulated environment (say PCI) or dealing with large volumes of highly sensitive information you’re very likely to have to build systems that provide layers of trust, and to be doing inspection, filtering and integrity checking as requests flow between those layers.

Imagine that I have a service dealing with some sensitive data. If I control the infrastructure (virtualised or not, IaaS provided or not) I’ll make sure that service endpoint isn’t available to anything that doesn’t need access to it via my network configuration. If I’m being more thorough I’ll filter traffic through some sort of proxy that does checking of the content; It should be JSON (or XML), it should meet this schema, It shouldn’t exceed this rate, it shouldn’t exceed this payload size or response size, etc. That is before anything even reaches the services application. And that’s on top of SSL and maybe client certificates.

If I don’t control the infrastructure, for example when running on a PaaS, I lose some of the ability to have the network protect me. I can probably get some of this back by running my own PaaS on my own infrastructure, but without awareness and a nice interface to that functionality at the PaaS layer I’m going to lose lots of the benefits of running the PaaS in the first place. It’s nice that I can scale my application out, but if new instances can’t connect to the required backend services without some additional network configuration that’s invisible to the PaaS what use is that?

The question becomes; how to implement security layers within existing PaaS products (without changing them). And my answer is “I don’t know”. Yet.

Why isn’t SSL enough?

SSL doesn’t help as much as you’d like to think here because if I’m an attacker what I’m probably going to attack is your buggy code rather than the transport mechanism. SSL doesn’t protect you from SQL injection or unpatched software or zero-day exploits. If the only thing that my backend service will talk to is my frontend application, an attacker has to compromise two things rather than just ignore the frontend and go after the data. Throw in a filter as described above and it’s really three things that need to be overcome.

The PaaS/IaaS interface

I think part of the solution lies in exposing some of the underlying infrastructure via the PaaS interface. IaaS is often characterised as compute, storage and network. In my experience everyone forgets the network part. In a PaaS world I don’t want to be exposed to storage details (I just want it to appear infinite and pay for what I use) or virtual machines (I just care about computing power, say RAM, not the number of machines I’m running on) but I think I do, sometimes, want to be exposed to the (virtual) network configuration.

Hopefully someone working on OpenShift or CloudFoundry or Azure or Heroku or DotCloud or insert PaaS here is already working on this. If not maybe this post will prompt someone to do so.