Tuesday, 24 February 2015


Just a quick note on what DevOps means to me. At heart I think it's two things:

  1. Developers need to take responsibility for what happens in production. This goes across definition of done (devs need to make sure the appropriate automated checks are in place so that the team will know both when it's not working and, as far as possible, why it's not working) and also across support; developers should be on support, feeling the pain of poor operational performance and monitoring.
  2. Operations work needs to be automated. Ideally nothing should ever be changed manually in production; everything should be done by an automated process that runs against multiple environments with an automated build, check & deploy process fast enough to use to deploy a fix when production's on fire.
    Automation is a form of development, and consequently requires the same disciplines and skills as any other development; automation code needs to be as well factored and well tested as any other form of code.
In other words, a lot of ops work is development and developers need to be doing ops work. Which does not mean there is no room for specialisation; but like a US undergraduate degree your ops major should have a minor in dev and your dev major should have a minor in ops. In addition they should be on the same team, working together (hopefully pairing) to bring both their specialities to bear on the problem of making the product work seamlessly in production.

Wednesday, 17 September 2014

Homebrew & Finder-launched Applications

Recently had an issue where scripts launched from IntelliJ did not have my Homebrew-installed executables on their PATH in Snow Leopard. Fixed it with the following:

sudo sh -c 'echo "setenv PATH /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin" >> /etc/launchd.conf'

and restarting. No guarantees for any other machine / OS! YMMV.

Tuesday, 31 December 2013

Running a service on a restricted port using IP Tables

Common problem - you need to run up a service (e.g. an HTTP server) on a privileged port below 1024 (e.g. port 80). You don't want to run it as root, because you're not that stupid. You don't want to run some quite complicated other thing you might misconfigure and whose features you don't actually need (I'm looking at you, Apache HTTPD) as a proxy just to achieve this end. What to do?

Well, you can run up your service on an unrestricted port like 8080 as a user with restricted privileges, and then do NAT via IP Tables to redirect TCP traffic from a restricted port (e.g. 80) to that unrestricted one:

iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080

However, this isn't quite complete - if you are on the host itself this rule will not apply, so you still can't get to the service on the restricted port. To work around this I have so far found you need to add an OUTPUT rule. As it's an OUTPUT rule it *must* be restricted to only the IP address of the local box - otherwise you'll find requests apparently to other servers are being re-routed to localhost on the unrestricted port. For the loopback adapter this looks like this:

iptables -t nat -A OUTPUT -p tcp -d 127.0.0.1 --dport 80 -j REDIRECT --to-ports 8080

If you want a comprehensive solution, you'll have to add the same rule over and over for the IP addresses of all network adapters on the host. This can be done in Puppet as so:

define localiptablesredirect($to_port) {
  $local_ip_and_from_port = split($name,'-')
  $local_ip  = $local_ip_and_from_port[0]
  $from_port = $local_ip_and_from_port[1]

  exec { "iptables-redirect-localport-${local_ip}-${from_port}":
    command => "/sbin/iptables -t nat -A OUTPUT -p tcp -d ${local_ip} --dport ${from_port} -j REDIRECT --to-ports ${to_port}; service iptables save",
    user    => 'root',
    group   => 'root',
    unless  => "/sbin/iptables -S -t nat | grep -q 'OUTPUT -d ${local_ip}/32 -p tcp -m tcp --dport ${from_port} -j REDIRECT --to-ports ${to_port}' 2>/dev/null",
  }
}

define iptablesredirect($to_port) {
  $from_port = $name
  if ($from_port != $to_port) {
    exec { "iptables-redirect-port-${from_port}":
      command => "/sbin/iptables -t nat -A PREROUTING -p tcp --dport ${from_port} -j REDIRECT --to-ports ${to_port}; service iptables save",
      user    => 'root',
      group   => 'root',
      unless  => "/sbin/iptables -S -t nat | grep -q 'PREROUTING -p tcp -m tcp --dport ${from_port} -j REDIRECT --to-ports ${to_port}' 2>/dev/null",
    }

    $interface_names = split($::interfaces, ',')
    $interface_addresses_and_incoming_port = inline_template('<%= @interface_names.map{ |interface_name| scope.lookupvar("ipaddress_#{interface_name}") }.reject{ |ipaddress| ipaddress == :undefined }.uniq.map{ |ipaddress| "#{ipaddress}-#{@from_port}" }.join(" ") %>')
    $interface_addr_and_incoming_port_array = split($interface_addresses_and_incoming_port, ' ')

    localiptablesredirect { $interface_addr_and_incoming_port_array:
      to_port => $to_port,
    }
  }
}

iptablesredirect { '80':
  to_port => 8080,
}

Monday, 30 December 2013

Fixing Duplicate Resource Definitions for Defaulted Parameterised Defines in Puppet

Recently I have been working on a puppet module which defines a new resource which in turn requires a certain directory to exist, as so:

define mything ($log_dir='/var/log/mythings') {

  notify { "${name} installed!": }

  file { $log_dir:
    ensure => directory,
  }

  file { "${log_dir}/${name}":
    ensure  => directory,
    require => File[$log_dir],
  }
}

As you can see the log directory is parameterised with a default, combining flexibility with ease of use.

As it happens there's no reason why multiple of these mythings shouldn't be installed on the same host, as so:

mything { "thing1": }
mything { "thing2": }

But of course that causes puppet to bomb out:
Duplicate definition: File[/var/log/mythings] is already defined

The solution I've found is to realise a virtual resource defined in an unparameterised class, as so:

define mything ($log_dir='/var/log/mythings') {

  notify { "${name} installed!": }

  include mything::defaultlogging

  File <| title == $log_dir |>

  file { "${log_dir}/${name}":
    ensure  => directory,
    require => File[$log_dir],
  }
}

class mything::defaultlogging {
  @file { '/var/log/mythings':
    ensure => directory,
  }
}

Now the following works:
mything { "thing1": }
mything { "thing2": }

If we want to override and use a different log directory as follows:
mything { "thing3":
  log_dir => '/var/log/otherthing',
}
we get this error:
Could not find dependency File[/var/log/otherthing] for File[/var/log/otherthing/thing3] at /etc/puppet/modules/mything/manifests/init.pp:12

This just means we need to define the new log directory as so:
$other_log_dir = '/var/log/otherthing'

@file { $other_log_dir:
  ensure => directory,
}

mything { "thing3":
  log_dir => $other_log_dir,
}
and all is good. Importantly, applying this manifest will not create the default /var/log/mythings directory.

Tuesday, 10 December 2013

H2 & HSQLDB for Simulating Oracle

H2 & HSQLDB are two Java in-memory databases. They both offer a degree of support for simulating an Oracle database in your tests. This post describes the pros and cons of each.


H2

How to setup:

import org.h2.Driver;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

DataSource dataSource = new DriverManagerDataSource(
    "jdbc:h2:mem:test;MODE=Oracle;DB_CLOSE_DELAY=-1", "sa", ""); // the database name "test" is illustrative

DB_CLOSE_DELAY is vital here or the database is deleted whenever the number of connections drops to zero - a highly unintuitive situation.


Pros

In general I've found I had to make fewer compromises on my SQL syntax, and my DDL syntax in particular, using H2's Oracle compatibility mode. For instance it supports sequences, and making the default value of a column a select from a sequence, which HSQLDB does not.


Cons

The transaction capabilities are not as good as HSQLDB's. Specifically, if you use MVCC=true in the connection string then H2 does not support a transaction isolation level of serializable, only read committed. If you do not set MVCC=true then a transaction isolation level of serializable does work, but only by taking a full table lock, which is not at all how Oracle does it.


HSQLDB

How to setup:

import org.hsqldb.jdbc.JDBCDriver;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate; 
import org.springframework.jdbc.datasource.DriverManagerDataSource;

DataSource dataSource = new DriverManagerDataSource("jdbc:hsqldb:mem:test", "SA", ""); // the database name "test" is illustrative
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
jdbcTemplate.execute("set database sql syntax ORA TRUE;");
jdbcTemplate.execute("set database transaction control MVCC;");


Pros

MVCC with a transaction isolation level of serializable works as expected - other transactions can continue to write whilst a transaction sees only the state of the DB as of when it started.


Cons

Support for Oracle syntax, particularly in DDL, is patchy - for instance, I was unable to make the default value of a column a select from a sequence, which works fine in Oracle.

Friday, 25 October 2013

CAP Theorem 2 - The Basic Tradeoff

WARNING - on further reading I'm not at all sure the below is accurate. Take it with a large pinch of salt as part of my learning experience...


You can't sacrifice Availability, so you have to choose between being Consistent and being Partition Tolerant. But only in the event of a network partition! You can be Partition Tolerant and still be Consistent when no partition is occurring.

Following up from my previous post on CAP Theorem, I'm going to discuss what in practical terms the CAP trade-off means.

A is non-negotiable - a truly CP data store is a broken idea

Remember, "Available" doesn't mean "working", "Available" means "doesn't hang indefinitely". In the event of a network partition a truly CP data store will simply hang on a request until it has heard from all its replicas. A system that hangs indefinitely is a chocolate teapot. Poorly written clients will also hang indefinitely, leaving users sitting staring at some equivalent of the Microsoft sand timer. In the end someone (a well written client, or just the poor schmuck staring at his non-responsive computer) will decide to time the operation out and give up, leaving them in the same not-working state as with a CA system, but with the additional worry that they've no idea what happened to the request they sent.

Hang on, there are CP data stores out there aren't there?

No, not really - not as I understand CAP theorem, anyway. See below!

The choice is between CA and AP

In fact it can be reduced to a very, very simple trade-off - in the event of a network partition, do I want the data store to continue to work or do I want the data store to remain consistent?

CA means a single point of failure

CA is the simplest model. It's what we get when we run up a single node ACID data store - it's either there, working and consistent or it isn't. There are ways to add a measure of redundancy to it in the form of read-only slaves with a distributed lock, but fundamentally if a network partition occurs between them and the master then the master has to stop accepting writes if it is to remain consistent with the slave.

It's a model that means outages are essentially guaranteed. If that's acceptable then it's nice and easy for developers to work with; but it's rarely acceptable.

Which leaves AP

Nearly all data stores are used in scenarios where there is a desire to avoid outages entirely, in so far as is possible (human error notwithstanding). That means having multiple copies of state on machines connected by a network, which means network partitions can and will happen. Which means needing to be available and tolerant of those partitions.

Oh noes! No consistency! Sounds dreadful...

The important point to remember here is that the loss of Consistency implied by Partition Tolerance (i.e. Continuing to Work) only has to be accepted in the event of a partition. This is what lots of so-called "CP" systems are trying to do - remain consistent whilst the network is healthy, and only become inconsistent in the event of a partition.

Wednesday, 23 October 2013

CAP Theorem

WARNING - on further reading I'm not at all sure the below is accurate. Take it with a large pinch of salt as part of my learning experience...

I've been a bit confused over the meaning of the C, A and P of CAP theorem. I think I've got it sussed now, so this post is my attempt to encapsulate that knowledge and get it out there for someone to correct if I'm still wrong!

C - Consistent

This is the easy one - I think I've always understood this, though I'm sure there are nuances to it. If you write some data then, so long as anyone anywhere trying to read it does not get an error and no-one else has independently updated it in the meantime, they will see the same data you wrote.

A - Available

Took me a while to get this one; I was thinking of it in terms of whether a system is up or not. That's not what Available means in this context. All it means is "able to return a response in a timely manner". That response could be as simple as a refusal to allow a new TCP connection - that's a response, and a timely one. An HTTP system returning 500 errors is available. If you're not timing out trying to communicate with the system, it's available, no matter how unhelpful the responses you are getting back are.

In contrast, a system is unavailable when a client gets nothing back at all and is left waiting until it times out (you've got a timeout set up, right? Right?). Stick a Thread.sleep(Long.MAX_VALUE) in your HTTP handling code and your system is unavailable. Put a firewall in the way that quietly drops all response packets and you're unavailable.
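That distinction is easy to demonstrate with a few lines of Python (a toy sketch, not any real data store): a server that accepts the TCP connection but never responds leaves the client hanging, and only a client-side timeout turns that indefinite wait into a detectable failure.

```python
import socket

# A server that accepts connections (the kernel completes the handshake
# thanks to the listen backlog) but never sends a single byte back.
server = socket.socket()
server.bind(("127.0.0.1", 0))      # bind to any free port
server.listen(1)
port = server.getsockname()[1]

# The connection itself succeeds - the system looks "up"...
client = socket.create_connection(("127.0.0.1", port), timeout=0.2)
try:
    client.recv(1)                 # ...but nothing ever arrives,
    timed_out = False
except socket.timeout:             # so the 0.2s timeout fires.
    timed_out = True

print(timed_out)  # True
client.close()
server.close()
```

Without the timeout argument, `recv` would block forever - the "staring at the sand timer" state described above.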

P - Partition Tolerant

There are two aspects to this one. The first is the obvious one - a network partition occurs, so that two nodes in a cluster are unable to communicate, without either getting a chance to sign off from the other beforehand. What was less obvious to me at first is that a node that crashes is also an example of a partition - not, as I naively thought, an example of being unavailable. The other nodes in the cluster cannot distinguish between "crashed" and "network issue somewhere between us".

A system is Partition Tolerant if a) it has more than one node and b) it can handle transactions without returning an error in the event that those nodes cannot communicate.


It should be obvious that network partitions can always happen wherever a cluster exists with multiple nodes that hold their own copy of state and that need to communicate over a network in order to maintain consistent state. CAP theorem says that when that partition happens, one of C, A and P has to be sacrificed. And now it should be fairly clear why. When a client attempts to write to a cluster which is partitioned, that write will arrive at one side or the other of the partition, and the system will have to do one of three things:

Wait for the Partition to Heal (CP)

The simple solution to maintain consistency is for the node getting the write to wait, and not return a response until it knows that write has been committed on all nodes. Obviously this sacrifices availability - the partition may never heal, or not for a prohibitively long time. However, data will be consistent and no errors are returned, so we have a rather useless Partition Tolerance.

Discard the Write and return an Error (CA)

Option two is to return an error. Consistency is maintained by the simple expedient of not changing state at all. The system is available - it's returning errors in a timely manner. However it's not partition tolerant - indeed it's questionable whether there's any benefit over a single node data store. By having more than one node and a network connection the chances of failure are simply increased. A single node data store is CA - it's either there or not.

Accept the Write (AP)

The system is available and partition tolerant - no hanging, no error returned. The cost is that it is not consistent - the state on either side of the partition is different, and someone reading from the other side of the partition will not see the write. A dynamo-style store with read/write quora covering fewer than half the nodes has sacrificed C in return for A and P.
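The three options above can be sketched in a few lines of Python. This is a toy model, not any real database: two replicas, a partition leaving only one side reachable, and one of three strategies applied to an incoming write (the "wait" branch returns a string where a real CP store would actually hang).

```python
class Replica:
    """A single node holding its own copy of state."""
    def __init__(self):
        self.value = None

def write(replicas, reachable, value, strategy):
    """Handle a write when only `reachable` replicas can be contacted."""
    if len(reachable) == len(replicas):     # no partition: all replicas updated
        for r in replicas:
            r.value = value
        return "ok"
    if strategy == "wait":                  # CP: block until the partition heals
        return "hangs (unavailable)"
    if strategy == "error":                 # CA: refuse, leaving state consistent
        return "error returned, state unchanged"
    if strategy == "accept":                # AP: apply on the reachable side only
        for r in reachable:
            r.value = value
        return "ok, but replicas now disagree"

a, b = Replica(), Replica()
print(write([a, b], [a], "x", "accept"))    # only replica a is reachable
print(a.value, b.value)                     # x None - the two sides disagree
```

A reader talking to replica `b` after this write sees stale data - exactly the loss of C described above.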

It's Not That Simple

Of course it isn't - the C, A and P qualities are not binary, they are a continuum, and data stores can make trade-offs between them. A dynamo-style store can choose to sacrifice some tolerance to a partition in return for more consistency by setting quora at a level of n/2 + 1. A system could tolerate mild unavailability in the hope of the partition healing quickly. A store can vote up masters so that consistency is only sacrificed between partitioned halves, not between all nodes. You get the idea.
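The quorum arithmetic behind that n/2 + 1 figure is simple enough to sketch: with n replicas, a write acknowledged by w nodes and a read consulting r nodes are guaranteed to overlap on at least one node (and so see the latest write) exactly when r + w > n. Majority quora satisfy this by construction.

```python
def quorums_overlap(n, r, w):
    # Any r-node read set and any w-node write set must share
    # at least one node if and only if r + w > n.
    return r + w > n

n = 5
majority = n // 2 + 1                           # 3 for a 5-node cluster
print(quorums_overlap(n, majority, majority))   # True - reads see the latest write
print(quorums_overlap(n, 2, 2))                 # False - a stale read is possible
```

Dropping r and w below the majority buys more availability and partition tolerance (fewer nodes need to be reachable), at the cost of exactly the consistency guarantee the overlap provides.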