Shell provisioner for Test Kitchen

As of a few weeks ago Test Kitchen has a shell provisioner as well as the original Chef provisioners. This opens up all sorts of interesting testing potential.

If you’ve not already seen Test Kitchen, probably because you’re not using Chef, it’s a tool for integration testing infrastructure code. Configured by a simple YAML file, it will set up a matrix of virtual machines, using VirtualBox, AWS, OpenStack and more, run some setup code (normally applying Chef recipes) and then run a test suite (with support for Bats, ShUnit2, RSpec and Serverspec). It’s all very pluggable. With the addition of the shell provisioner it’s useful to just about anyone. To try to prove that, here’s a hello-world-style example.

Dependencies

First we need to install Test Kitchen. We’ll use Vagrant and VirtualBox for our example too, so we need a few extra dependencies. I’m going to assume you have Bundler installed; if not, you may be able to install it with gem install bundler, but as the number of ways of setting up a Ruby environment is greater than the number of people on the planet, I’ll have to defer to instructions elsewhere for getting that far.

First create a file called Gemfile with the following contents:

source "https://rubygems.org"

gem "test-kitchen", :git => "https://github.com/test-kitchen/test-kitchen.git"
gem "kitchen-vagrant"
gem "vagrant-wrapper"

Then run:

bundle install

This should install the above software. Note that the shell provisioner is not yet in an official release, so we’re installing directly from GitHub for the moment.
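
If everything installed correctly, the kitchen command line tool should now be available:

bundle exec kitchen version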

Configuration

Next we’ll tell Test Kitchen what we want to do. Mainly for demonstration purposes, I’m going to grab one of the Puppetlabs boxes. This is just plain Vagrant, so feel free to substitute the box and box_url for alternatives you already have installed locally. Otherwise the first run will take a little longer as it downloads a large file.

Put all of the following in a file called .kitchen.yml.

---
driver:
  name: vagrant

provisioner:
  name: shell

platforms:
  - name: puppet-precise64
    driver_config:
      box: puppet-precise64
      box_url: http://puppet-vagrant-boxes.puppetlabs.com/ubuntu-server-12042-x64-vbox4210.box

suites:
  - name: default
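
Before going further you can check that Test Kitchen has picked up the configuration by listing the instances it will manage. Instance names combine the suite and platform names, so here we get a single default-puppet-precise64 instance:

bundle exec kitchen list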

The shell provisioner is going to look for a file called bootstrap.sh by default. You can override this but we’ll leave it for the moment. Our bootstrap script is going to do something very simple: install the ntp package. But the important part is that it could do anything: run Salt, run Ansible, run Puppet, execute any arbitrary code we choose. In this case our script is completely self-contained, but if it needed some additional files we could put them in a directory called data and they would be copied to the newly created virtual machine under /tmp/kitchen.

#!/bin/bash

apt-get install ntp -y
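
To sketch how the data directory works (the data/ntp.conf file here is purely illustrative, something you would create alongside your .kitchen.yml), the bootstrap script could make use of the files copied onto the machine:

#!/bin/bash

apt-get install ntp -y

# Anything placed in the project's data directory is copied to the
# instance under /tmp/kitchen, so a custom config can be applied:
cp /tmp/kitchen/data/ntp.conf /etc/ntp.conf
service ntp restart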

Tests

The last step is to write a test. I’m suddenly finding lots of excuses to use Serverspec so we’ll use that, but if you prefer you can use pretty much anything. The following file should be saved as test/integration/default/serverspec/ntp_spec.rb. Note the default in the path, which matches our suite above in the .kitchen.yml file. Test Kitchen allows for multiple suites, all with separate tests, based on a strong set of file path conventions.

require 'serverspec'

include Serverspec::Helper::Exec
include Serverspec::Helper::DetectOS

RSpec.configure do |c|
  c.before :all do
    c.path = '/sbin:/usr/sbin'
  end
end

describe package('ntp') do
  it { should be_installed }
end

describe service('ntp') do
  it { should be_enabled }
  it { should be_running }
end
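
The file path convention extends to additional suites. A second suite added to .kitchen.yml (the web name here is illustrative):

suites:
  - name: default
  - name: web

would have its tests picked up from test/integration/web/serverspec in just the same way.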

Running the tests

With all of that in place we’re ready to run our tests.

bundle exec kitchen test

This should:

  • download the virtual machine image if you don’t already have it locally
  • create a new virtual machine based on the image
  • run the bootstrap.sh script
  • run our serverspec test suite

The real power comes from doing this iteratively as you work on code, probably code more complex than a simple one-line bash script. You can also test across multiple virtual machines at a time, for instance different operating systems or different machine roles. The kitchen command line tool provides lots of help too, with the ability to log in to machines, verify that specific combinations of platform and suite are working and print lots of diagnostic information to aid development.
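
For example, rather than running the whole cycle each time, you can drive the individual phases separately while iterating:

bundle exec kitchen converge   # create the machine and run bootstrap.sh
bundle exec kitchen verify     # run the serverspec suite against it
bundle exec kitchen login      # open a shell on the machine for debugging
bundle exec kitchen destroy    # throw the machine away when done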

Hopefully this will make it into a release soon, and we’ll see more involved examples using higher level tools and more documentation. But even now I’d be looking at Test Kitchen for any infrastructure testing you might be doing.

Testing Packer created images with serverspec

Packer provides a great way of describing the steps for creating a virtual machine image. But it doesn’t have a built-in way of verifying those images.

Serverspec provides a nice framework for writing tests against infrastructure, asserting the operation of services or the installation of packages.

I’m interested at the moment in building continous delivery pipelines for infrastructure components and have a simple working example of testing Packer with Serverspec on Github. The example uses the AWS builder and the Puppet provisioner but the approach should work with other combinations.

This doesn’t represent a complete infrastructure pipeline, but it does demonstrate an approach to automating one particular component - building base images.

Testing

In our example I’m using the Puppetlabs NTP module to install and configure NTP. Once the Puppet provisioner has run, but before we build the AMI (or other virtual machine image), we run a test suite. For our example the tests are pretty simple:

describe package('ntp') do
  it { should be_installed }
end

describe service('ntp') do
  it { should be_enabled   }
  it { should be_running   }
end

If the tests fail, Packer will stop and the AMI won’t be built. The combination of storing the code (Packer template) alongside a test suite (Serverspec) and building a new AMI whenever you change the code, makes this setup perfect for continuous integration.
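
In the Packer template this comes down to provisioner ordering. A simplified sketch of the relevant section (file names are illustrative, and it assumes Ruby and the serverspec gem are already available on the instance):

"provisioners": [
  { "type": "puppet-masterless", "manifest_file": "manifests/site.pp" },
  { "type": "file",  "source": "spec", "destination": "/tmp/spec" },
  { "type": "shell", "inline": ["cd /tmp && rspec spec"] }
]

Because each provisioner must succeed before the next runs, a failing rspec exit code aborts the build before the image is created.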

Wercker builds

As an example of a continuous integration setup, the repository contains a wercker.yml configuration file for the excellent Wercker service. Wercker makes setting up multi-step build pipelines easy and nicely configurable via a simple text file in your repository.

The Wercker build for this project is public. Currently the build involves downloading Packer, running packer validate to check the template and eventually running packer build to boot an instance and run our serverspec tests.
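
Locally that boils down to the same two commands the build runs (the template name here is illustrative):

packer validate packer.json
packer build packer.json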

Making the web secure, one unit test at a time

Originally written as part of Sysadvent 2013.

Writing automated tests for your code is one of those things that, once you have gotten into it, you never want to see code without tests ever again. Why write pages and pages of documentation about how something should work when you can write tests to show exactly how something does work? Looking at the number and quality of testing tools and frameworks (like cucumber, rspec, Test Kitchen, Server Spec, Beaker, Casper and Jasmine to name a few) that have popped up in the last year or so I’m obviously not the only person who has a thing for testing utilities.

One of the other things I am interested in is web application security, so this post is all about using the tools and techniques from unit testing to avoid common web application security issues. I’m using Ruby in the examples but you could quickly convert these to other languages if you desire.

Any port in a storm

Let’s start out with something simple. Accidentally exposing applications on TCP ports can lead to data loss or introduce a vector for attack. Maybe your main website is super secure, but you left the port for your database open to the internet. It’s the server configuration equivalent of forgetting to lock the back door.

Nmap is a tool lots of people will be familiar with for scanning for open ports. As well as a command line interface, Nmap also has good library support in lots of languages, so let’s try to write a simple test suite around it.

require "tempfile"
require "nmap/program"
require "nmap/xml"

describe "the scanme.nmap.org website" do
  file = Tempfile.new("nmap.xml")
  before(:all) do
    Nmap::Program.scan do |nmap|
      nmap.xml = file.path
      nmap.targets = "scanme.nmap.org"
    end

    # Parse the XML report the scan wrote out, collecting the open
    # ports so the examples below can make assertions against them
    @open_ports = []
    Nmap::XML.new(file.path) do |xml|
      xml.each_host do |host|
        host.each_port do |port|
          @open_ports << port.number if port.state == :open
        end
      end
    end
  end
end

With the above code in place we can then write tests like:

it "should have two ports open" do
  @open_ports.should have(2).items
end

it "should have port 80 open" do
  @open_ports.should include(80)
end

it "should have port 22 closed" do
  @open_ports.should_not include(22)
end

We can run these manually, but also potentially as part of a continuous integration build or constantly as part of a monitoring suite.
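
Assuming the setup code and the examples above are saved together in a spec file (the name here is illustrative), running them is a standard RSpec invocation:

rspec scan_spec.rb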

Run the Gauntlt

We had to do quite a bit of work wrapping Nmap before we could write the tests above. Wouldn’t it be nice if someone had already wrapped lots of useful security-minded tools for us? Gauntlt is pretty much just that: a security testing framework based on cucumber which currently supports curl, nmap, sslyze, sqlmap, garmr and a bunch more tools in master. Let’s do something more advanced than our port scanning test above by testing a URL for a SQL injection vulnerability.

@slow
Feature: Run sqlmap against a target
  Scenario: Identify SQL injection vulnerabilities
    Given "sqlmap" is installed
    And the following profile:
      | name       | value                                      |
      | target_url | http://localhost/sql-injection?number_id=1 |
    When I launch a "sqlmap" attack with:
      """
      python <sqlmap_path> -u <target_url> --dbms sqlite --batch -v 0 --tables
      """
    Then the output should contain:
      """
      sqlmap identified the following injection points
      """
    And the output should contain:
      """
      [2 tables]
      +-----------------+
      | numbers         |
      | sqlite_sequence |
      +-----------------+
      """

The Gauntlt team publish lots of examples like this one alongside the source code, so getting started is easy. Gauntlt is very powerful, but as you’ll see from the example above you need to know quite a bit about the underlying tools it is using. In the case above you need to know the various arguments to sqlmap and also how to interpret the output.
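
Running an attack file is then a one-liner once the gem is installed; the file name here is illustrative:

gem install gauntlt
gauntlt sql_injection.attack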

Enter Prodder

Prodder is a tool I put together to automate a few specific types of security testing. In many ways it’s very similar to Gauntlt: it uses the cucumber testing framework and some of the same tools (like nmap and sslyze) under the hood. However, rather than being a general purpose security framework like Gauntlt, Prodder is higher level and very opinionated. Here’s an example:

Feature: SSL
  In order to ensure secure connections
  I want to check the SSL configuration of my servers
  Background:
    Given "sslyze.py" is installed
    Scenario: Check SSLv2 is disabled
      When we test using the "sslv2" protocol
      Then the exit status should be 0
      And the output should contain "SSLv2 disabled"

    Scenario: Check certificate is trusted
      When we check the certificate
      Then the output should contain "Certificate is Trusted"
      And the output should match /OK - (Common|Subject Alternative) Name Matches/
      And the output should not contain "Signature Algorithm: md5"
      And the output should not contain "Signature Algorithm: md2"
      And the output should contain "Key Size: 2048"

    Scenario: Check certificate renegotiations
      When we test certificate renegotiation
      Then the output should contain "Client-initiated Renegotiations: Rejected"
      And the output should contain "Secure Renegotiation: Supported"

    Scenario: Check SSLv3 is not using weak ciphers
      When we test using the "sslv3" protocol
      Then the output should not contain "Anon"
      And the output should not contain "96bits"
      And the output should not contain "40bits"
      And the output should not contain " 0bits"

This is a little higher level than the Gauntlt example: it’s not exposing the workings of sslyze, which does the actual testing. All you need is an understanding of SSL certificates. Even if you’re not an expert on SSL you can accept the aforementioned opinions of Prodder about what good looks like. Prodder currently contains steps and examples for port scanning, SSL certificates and security-minded HTTP headers. If you already have a cucumber-based test suite (including one based on Gauntlt) you can reuse the step definitions in that too.

I’m hoping to build upon Prodder, adding more types of tests and getting agreement on the included opinions from the wider systems administration community. By having a default set of shared assertions about the expected security of our systems, we can more easily move onto new projects, safe in the knowledge that a test will fail if someone messes up our once secure configuration.

I’m convinced, what should I do next?

As well as trying out some of the above tools and techniques for yourself, I’d recommend encouraging more security conversations in your development and operations teams.

Introducing Hyde

Hyde is a brazen two-column Jekyll theme that pairs a prominent sidebar with uncomplicated content. It’s based on Poole, the Jekyll butler.

Built on Poole

Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by @mdo. Poole, and every theme built on it (like Hyde here), includes the following:

  • Complete Jekyll setup included (layouts, config, 404, RSS feed, posts, and example page)
  • Mobile friendly design and development
  • Easily scalable text and component sizing with rem units in the CSS
  • Support for a wide gamut of HTML elements
  • Related posts (time-based, because Jekyll) below each post
  • Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)

Hyde features

In addition to the features of Poole, Hyde adds the following:

  • Sidebar includes support for textual modules and a dynamically generated navigation with active link support
  • Two orientations for content and sidebar, default (left sidebar) and reverse (right sidebar), available via <body> classes
  • Eight optional color schemes, available via <body> classes

Head to the readme to learn more.

Browser support

Hyde is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.

Download

Hyde is developed on and hosted with GitHub. Head to the GitHub repository for downloads, bug reports, and feature requests.

Thanks!

Looking into monitoring and logging tools

Originally published on Medium.

We have a bunch of internal mailing lists at work, and on one of them someone asked:

we’re looking into monitoring/logging tools…

I ended up writing a bit of a long reply which a few people found useful, so I thought I’d repost it here for posterity. I’m sure this will date but I think it’s a reasonable snapshot of the state of open source monitoring tools at the end of 2013.

Simply put, think about four elements and you won’t be far off on the technical front. Miss one and you’re probably in trouble.

  • logs
  • metric storage
  • metric collection
  • monitoring checks

For logs, some combination of syslog at one end and Elasticsearch and Kibana at the other are probably the state of the open source art at the moment. The shipping around is more interesting: Logstash is improving constantly, Heka is a similar alternative from Mozilla, and Fluentd looks nice too.

For pure metrics it’s all about Graphite, which is both awesome and perilous. Not much else really competes in the open source world at present. Maybe OpenTSDB if you’re into a Hadoop stack.

For collecting metrics on boxes I’d probably look at collectd or Diamond, both of which have pros and cons but work well. Statsd is also useful here for different types of metric collection and aggregation. Ganglia is interesting too; it combines some aspects of the metrics collection tools with an integrated storage and visualisation tool similar to Graphite.

Monitoring checks are a bit more painful. I’ve been experimenting with Sensu in the hope of not installing Nagios. Nagios works but it’s just a bit ungainly. But you do need somewhere to write checks against metrics or other aspects of your system and to issue alerts.

At this point everyone loves dashboards, and Dashing is particularly lovely. Graphiti and Tasseo for Graphite are useful too.

For bonus points, things like Flapjack and Riemann provide some interesting extra capabilities around alert control and real-time monitoring respectively.

And for that elusive top-of-the-class grade, take a look at Kale, which provides anomaly detection on top of Graphite and Elasticsearch.

You might be thinking that’s a lot of moving parts, and you’d be right. If you’re a small project, running all of that is too much overhead, and turning to something like Zabbix might be more sensible.

Depending on money/sensitivity/control issues lots of nice and not so nice commercial products exist. Circonus, Splunk, New Relic, Boundary and Librato Metrics are all lovely in different ways and provide part of the puzzle.

And that’s just the boring matter of tools. Now you get into alert design and other gnarly people stuff.

If you got this far you should watch all the Monitorama videos too.

Platform as a Service and the network gap

Originally published on Medium.

I’m a big fan of the Platform as a Service (PaaS) model of operating web application infrastructure. But I’m a much bigger user and exponent of Infrastructure as a Service (IaaS) products within my current role working for the UK Government. This post describes why that is; hopefully it will help anyone else inside other large enterprise organisations reason about the advantages and disadvantages, and help PaaS vendors and developers understand what I personally think is a barrier to adoption in that type of organisation.

A quick word of caution: I don’t know every product inside out. It’s very possible a PaaS product exists that deals with the problems I describe. If you know of such a product, do let me know.

A simple use case

PaaS products make for the very best demos. Have a working application? Deployment is probably as simple as:

git push azure master 

Your app has started to run slowly because visitors are flooding in? Just scale out with something like:

heroku ps:scale web+2

The amount of complexity being hidden is astounding and the ability to move incredibly quickly is obvious for anyone with experience of doing this in a more traditional organisation.

A not so simple use case

Even small systems are often being built out of many small services these days. Many large organisations have been up to this for a while under the banner of Service Orientated Architecture. I’m a big fan of this approach, in my view it moves operational and organisational complexity back into the development team where its impact can often be minimised by automation. But that’s a topic for another post.

In a PaaS world having many services is fine. We just have more applications running on the Platform which can be independently scaled out to meet our needs. But services need to communicate with each other somehow, and this is where our problems start. We’ll keep things simple here by assuming communication is over HTTPS (which should be pretty typical) but I don’t think other protocols make the problem I have go away. The same problem applies if you’re using a SaaS database for example.

It’s the network, stupid

Over what network does my internal HTTPS service call travel? The internet? The PaaS vendor’s internal network? If the latter, is my traffic travelling over the same network as that of other clients on the platform? Maybe I’m running my own PaaS in-house. But do I trust everyone else in my very large organisation, and want my traffic on the same network as other things I don’t even know about? Even if it’s just me, do I want internal service traffic mixing with requests coming from the internet? And are all my services created equal with regard to what they can and cannot access?

Throw in questions like whether the PaaS supplier is running on infrastructure provided by a public IaaS supplier you don’t have a relationship with, and you start to question the suitability of the current public PaaS products for building secure service-based systems.

A journey into Enterprise Architectures

You might be thinking, pah, what’s the worst that can happen? If you work for a small company or a shiny startup that might be completely valid. If on the other hand you’re working in a regulated environment (say PCI) or dealing with large volumes of highly sensitive information you’re very likely to have to build systems that provide layers of trust, and to be doing inspection, filtering and integrity checking as requests flow between those layers.

Imagine that I have a service dealing with some sensitive data. If I control the infrastructure (virtualised or not, IaaS-provided or not) I’ll make sure that service endpoint isn’t available to anything that doesn’t need access to it, via my network configuration. If I’m being more thorough I’ll filter traffic through some sort of proxy that checks the content: it should be JSON (or XML), it should meet this schema, it shouldn’t exceed this rate, it shouldn’t exceed this payload size or response size, and so on. That is before anything even reaches the service’s application. And that’s on top of SSL and maybe client certificates.
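
To give a flavour of that kind of content checking, here’s a minimal sketch of it as Rack middleware. The checks and limits are purely illustrative, and in practice this would live in a dedicated proxy layer rather than inside the application stack:

require "rack"

# Illustrative filter: reject anything that isn't JSON or that exceeds
# a payload limit, before the request reaches the application itself.
class RequestFilter
  MAX_BODY_BYTES = 64 * 1024 # illustrative limit

  def initialize(app)
    @app = app
  end

  def call(env)
    request = Rack::Request.new(env)
    if request.post? && request.content_type != "application/json"
      return [415, { "Content-Type" => "text/plain" }, ["JSON only\n"]]
    end
    if request.content_length.to_i > MAX_BODY_BYTES
      return [413, { "Content-Type" => "text/plain" }, ["Payload too large\n"]]
    end
    @app.call(env)
  end
end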

If I don’t control the infrastructure, for example when running on a PaaS, I lose some of the ability to have the network protect me. I can probably get some of this back by running my own PaaS on my own infrastructure, but without awareness and a nice interface to that functionality at the PaaS layer I’m going to lose lots of the benefits of running the PaaS in the first place. It’s nice that I can scale my application out, but if new instances can’t connect to the required backend services without some additional network configuration that’s invisible to the PaaS what use is that?

The question becomes: how do we implement security layers within existing PaaS products (without changing them)? And my answer is “I don’t know”. Yet.

Why isn’t SSL enough?

SSL doesn’t help as much as you’d like to think here because if I’m an attacker what I’m probably going to attack is your buggy code rather than the transport mechanism. SSL doesn’t protect you from SQL injection or unpatched software or zero-day exploits. If the only thing that my backend service will talk to is my frontend application, an attacker has to compromise two things rather than just ignore the frontend and go after the data. Throw in a filter as described above and it’s really three things that need to be overcome.

The PaaS/IaaS interface

I think part of the solution lies in exposing some of the underlying infrastructure via the PaaS interface. IaaS is often characterised as compute, storage and network. In my experience everyone forgets the network part. In a PaaS world I don’t want to be exposed to storage details (I just want it to appear infinite and pay for what I use) or virtual machines (I just care about computing power, say RAM, not the number of machines I’m running on) but I think I do, sometimes, want to be exposed to the (virtual) network configuration.

Hopefully someone working on OpenShift or CloudFoundry or Azure or Heroku or DotCloud or insert PaaS here is already working on this. If not maybe this post will prompt someone to do so.

Web application security tools

I’ve become increasingly interested in web application security issues over the last year or so. Working in Government will do that to you. And I’ve come to the conclusion that a) there are lots of good open source security tools, b) many of them are terribly packaged and c) most developers don’t use any of them.

I’ve been having related conversations at recent events I’ve made it along to, including Devopsdays London which featured some good open spaces discussions on the subject. Security is one of those areas that, for many organisations, is basically outsourced to third party penetration testing firms or consultants. Specialists definitely have a role to play, but with a move towards increasingly rapid releases I think in-house security testing and monitoring is going to get more and more important.

A collection of security tools

I’ve started to build a collection of tools on GitHub, along with a Vagrant setup to test them out. Full instructions are available in that repository, but the short version is that you can run one command and have one virtual machine filled with security testing tools and, if useful, another machine running a vulnerable web application with which to test.

I’ll add more tools as I discover them or as people file issues or pull requests.

What about Backtrack?

When I started investigating tools for security and penetration testing, most roads led to Backtrack. This is a complete Linux distribution packed with a huge number of security tools, including many if not all of the above. Why then did I write puppet code rather than create a Vagrant box from Backtrack? Firstly, Backtrack is probably great if you’re a professional penetration tester, but in my view the barrier to entry of installing a new distribution is too high for most developers. And with a view to using some of these tools as part of monitoring systems, I don’t always want a separate virtual machine. I want to be able to install the tools wherever I want. A good configuration management tool gives you that portability, and Vagrant gives you all the benefits of a local virtual machine.

Future plans

As mentioned, I’d like to expand how some of these tools are used to include automated monitoring of applications, maybe look at ways of extracting data for metrics, or possibly write a Sensu plugin or two. The first step to that is probably breaking down the monolithic puppet manifest into separate modules for each tool. Along the way I can add support for more operating systems as required. I’ve already done that for the wackopicko module, which is up on the Forge.

I’m also soliciting any and all feedback, especially from developers who don’t do any security related testing but feel like they should.

Government Service Design Manual

I’ve not been writing many blog posts lately, but I have been doing quite a bit of writing elsewhere. One of the things I’ve had a hand in at work is the new Government Service Design Manual. This is the work of many people I work with, as well as others further afield. It’s intended to be a good starting place to find information about building high quality digital services.

The manual is in beta and we’re looking for as much feedback as possible on the whole thing. It’s already proving useful and a good way of framing the scope of discussions, but it has lots of room for improvement.

If you’re reading this post I’m going to wager your interest lies in or around devops flavoured content. I’ve written several guides in this area and I’d love any and all feedback on them.

If you’re interested in the background to this endeavour then a couple of blog posts from some of my colleagues might be of interest too. First, Richard Pope talks about how the manual came about, and here’s a post from Andrew Greenway about the beta testing of the service standard.

The source for all this is on GitHub, so if you prefer you can just send a pull request. Or I’m happy to get emails or comments on this post. In particular, if people have good references or next steps for these guides then let me know, as several of them are lacking in that area.

Perils of portability

I had fun speaking at QCon in London earlier this month with a talk on the Cloud track entitled the Perils of Portability.

This had some Governmenty stuff in it, but was mainly part rant, part hope for the future of cloud infrastructure. I had some great conversations with people afterwards who felt some of the same pain, which was nice to know. I also somehow managed to get 120 slides into a 40 minute presentation, which I think is a personal record.

The videos will be available at some point in the not too distant future too.

Going fast in government

About a month ago I had the good fortune of speaking at the London Web Performance meetup. This was one of the first talks I’ve done about our work at The Government Digital Service since the launch of GOV.UK back in October. The topic was all about moving quickly in a large organisation (the UK Civil Service is about 450,000 people, so I think it counts) and featured just a handful of technical and organisational tricks we used.