Tuesday, 26 February 2013

Script to Install Public Key on Multiple Hosts

Here's a little script that can upload a public key onto a server - you could run it for multiple servers at the same time. Requires sshpass to be installed.

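#!/bin/bash
# Prompt for the password without echoing it to the terminal
read -r -s -p "Password: " passw; echo

public_key=$(cat ~/.ssh/id_rsa.pub)

# Appends the public key to authorized_keys on host $1 unless it is already there
function updateKey {
  sshpass -p "$passw" ssh -oLogLevel=Error -oStrictHostKeyChecking=no "$1" "mkdir -p ~/.ssh; chmod og=,u=rwx ~/.ssh; if [ ! -f ~/.ssh/authorized_keys ]; then touch ~/.ssh/authorized_keys; fi; if ! grep -Fxq \"$public_key\" ~/.ssh/authorized_keys; then echo \"$public_key\" >> ~/.ssh/authorized_keys; fi"
  if [ $? -ne 0 ]; then echo "Failed to update key on $1"; else echo "Updated key on $1"; fi
}

# Assumed invocation: hosts are passed as arguments and updated in parallel
for host in "$@"; do updateKey "$host" & done
wait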

Tuesday, 19 February 2013

Specifications, Tests & Code

This is a quick reaction to various things I've read recently, most immediately this tweet:


I think the observations in this article by Bertrand Meyer about the limits of testing are entirely correct. In any even vaguely complex system you cannot begin to test all the combinations of inputs and outputs. That's why we focus on testing what we think are the important cases and what we think are the boundary conditions. I agree with him that as such the tests are not the specification and cannot be. So I don't think we can just replace "Test" with "Spec" and solve the problem.

(I should stick in a caveat here - I read a tweet by Ben Goldacre recently saying that people who rebut tweets in blog posts (or newspaper articles) are being prats, because a tweet by its nature is going to lack subtlety and depth of argument. I dare say that Kevlin Henney would mount a staunch defence of what he actually meant, perhaps along the lines of Martin Fowler's "Specification by Example" essay which acknowledges that a specification by example will be necessarily incomplete with the rest of the specification to be inferred from it.)

I think there's a useful analogy with real science. A specification is the equivalent of a theory; F=ma, for instance, or E=mc². A test is the equivalent of an experiment; for a given set of controlled inputs, it measures the actual output against that predicted by the theory. And the running system is the equivalent of the real world. Just as in science, the tests (experiments) cannot prove the specification (theory) holds in the runtime system (real world), they can only disprove it. A black swan event can still occur (and anyone who has ever written software will have encountered bugs in well-tested software arising from inputs the tester had not anticipated and so had not tested for).
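
To make the black swan point concrete, here is a minimal sketch in Java. The class and method names are hypothetical and not from any of the articles mentioned. The two anticipated cases agree with the "theory" that the method returns the mean of its arguments, while an input nobody thought to try refutes it.

```java
// Hypothetical example: Midpoint and midpoint are illustrative names only.
final class Midpoint {
    // Intended contract: the mean of a and b, rounded towards zero.
    static int midpoint(int a, int b) {
        return (a + b) / 2; // a + b silently overflows for large inputs
    }

    public static void main(String[] args) {
        // The anticipated "experiments" all agree with the theory.
        System.out.println(midpoint(2, 4) == 3);     // true
        System.out.println(midpoint(-10, 10) == 0);  // true
        // The black swan: the mean of two equal positive numbers comes out negative.
        System.out.println(midpoint(Integer.MAX_VALUE, Integer.MAX_VALUE)); // -1
    }
}
```

Any number of passing tests on small values leaves the overflow case undetected; only the one failing input says anything decisive about the code.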

The analogy breaks down in two respects; firstly, a correct but failing experiment in science means that it's time to re-evaluate the theory, because reality isn't subject to error, whereas often in programming it means that the running system is not behaving as actually desired.

Secondly, in science the theory (specification) is something a human being writes and understands and is obviously distinctly separate from the real world (runtime system); it may or may not accurately represent it. This leads me on to the second article that prompted me to write this post; Leslie Lamport arguing that we need formal specifications in addition to code. To me a specification is a formal, logically precise, human readable statement of precisely how a system is expected to operate under all conditions. So far so in agreement. However, once you've got such a thing, I think it should be possible to compile it into a form a computer can execute, and the name for human readable text that can be compiled into a form that a computer can execute is "source code".

I do not accept at all the notion that the specification states "what and why" and the code states "how". Code is written at multiple levels of abstraction, typically represented by functions. I would argue that the why, what and how are encoded in these abstraction layers. For any given function, the function name states "what", the context of the parent function in which it is called states "why" and the body of the function states "how". As you move up and down the call stack, these roles change.
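
A small sketch of how those roles shift between layers, using an invented invoicing domain. Every name here is an assumption made for illustration.

```java
import java.util.List;

// Hypothetical domain: none of these names come from the post.
final class Invoice {
    // "What": total the invoice. "How": the body below.
    static long totalInPence(List<Long> linePrices, boolean loyalCustomer) {
        long subtotal = sum(linePrices);
        // This call site is the "why" for applyLoyaltyDiscount.
        return loyalCustomer ? applyLoyaltyDiscount(subtotal) : subtotal;
    }

    static long sum(List<Long> prices) {
        long total = 0;
        for (long price : prices) {
            total += price;
        }
        return total;
    }

    // One level down, the name is now the "what" and this body the "how".
    static long applyLoyaltyDiscount(long pence) {
        return pence - pence / 10; // 10% off, an arbitrary figure
    }
}
```

Read from totalInPence, the discount is part of the "how"; read from applyLoyaltyDiscount, the same call is the "why".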

Which I think raises the question - if the runtime system (real world) is actually compiled from the specification (the theory) and the tests (experiments) are written to validate the specification (theory) is correct, haven't we got a circular argument? How can the tests ever fail? And why do we even need them?

I think the answer is that most of the time in programming we have two levels of specification. One exists in our heads or in a requirements document or a user story; it's informal, it doesn't cover all the cases, it may even be self-contradictory or downright impossible at times, but it's essentially "correct" in the sense that it captures what we actually want this system to do. That's the one we use to write our tests with. Then we have to create the formal specification of what it should actually do under all circumstances, by writing the code. Our tests are about validating that the formal specification actually specifies what we were hoping it would specify.

Monday, 4 February 2013

Maven Logging Config for Libraries & Applications

A quick dump of my standard Maven poms for both libraries & applications.

Basic theory - pipe everything to SLF4J & use Logback as the SLF4J implementation.

A library should ONLY have a dependency on slf4j-api - it should not use classes in any logging implementation.

Libraries:


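<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.acme</groupId>
  <artifactId>my_library</artifactId>
  <version>1.2.3</version>

  <properties>
    <slf4jversion>1.7.1</slf4jversion>
  </properties>

  <repositories>
    <repository>
      <id>version99</id>
      <url>http://version99.qos.ch/</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- The only compile-scope logging dependency -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4jversion}</version>
      <scope>compile</scope>
    </dependency>

    <!-- Test scope only: Logback plus bridges routing other logging APIs to SLF4J -->
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.0.7</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>${slf4jversion}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jul-to-slf4j</artifactId>
      <version>${slf4jversion}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>uk.org.lidalia</groupId>
      <artifactId>jul-to-slf4j-config</artifactId>
      <version>1.0.0</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <version>${slf4jversion}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>99-empty</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>99-empty</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>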


  
  
  


Application:

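<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.acme</groupId>
  <artifactId>my_application</artifactId>
  <version>1.2.3</version>

  <properties>
    <slf4jversion>1.7.1</slf4jversion>
  </properties>

  <repositories>
    <repository>
      <id>version99</id>
      <url>http://version99.qos.ch/</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- The only compile-scope logging dependency -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4jversion}</version>
      <scope>compile</scope>
    </dependency>

    <!-- Runtime scope: Logback plus bridges routing other logging APIs to SLF4J -->
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.0.7</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>${slf4jversion}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jul-to-slf4j</artifactId>
      <version>${slf4jversion}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>uk.org.lidalia</groupId>
      <artifactId>jul-to-slf4j-config</artifactId>
      <version>1.0.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <version>${slf4jversion}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>99-empty</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>99-empty</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>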


  
  
  
    
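Logback encoder pattern for the application (the %X{...} entries print MDC values):

    <pattern>%d [%thread] %-5level %logger{36} CLIENTID=%X{CLIENTID} SESSIONID=%X{SESSIONID} USERID=%X{USERID} TRANSACTIONID=%X{TRANSACTIONID} - %msg%n</pattern>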
    
  
  
    
  
  

Wednesday, 24 October 2012

Scala's Maligned Type System

Around the net you can find a lot of criticisms of Scala's type system, often focussing on this signature:

def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That

Before we start, let's make one thing clear - I am not arguing that this is trivial for a Java developer to read. There are a lot of barriers to legibility for a Java dev here, which I'll try and unpick below. My issue with this is as an example of type system complexity, since as far as I can see the difficulties with legibility here are nothing to do with the type system - at least for a Java developer. What's going on here, from a type perspective, is no more complicated than Java's generics permits.

In an attempt to prove this, let's "undo" the non-type system syntax features that make this difficult to read for a Java developer.

1) Prefix types rather than postfix types
Scala puts types after variable / method signatures rather than before. Let's switch back to the Java way:
That ++ [B >: A, That] (TraversableOnce[B] that)(implicit CanBuildFrom[List[A], B, That] bf)

2) Multiple parameter lists
Scala allows multiple parameter lists for a function/method, so let's collapse them into one:
That ++ [B >: A, That] (TraversableOnce[B] that, implicit CanBuildFrom[List[A], B, That] bf)

3) Implicit parameters
Scala allows implicit parameters - let's lose that keyword:
That ++ [B >: A, That] (TraversableOnce[B] that, CanBuildFrom[List[A], B, That] bf)

4) Operators as method names
Scala allows operators as method names - let's be a bit more Java about it:
That addAll [B >: A, That] (TraversableOnce[B] that, CanBuildFrom[List[A], B, That] bf)

5) Generic Type Param position
Scala puts the generic type params between the method name and the parameter list, Java puts them first. Let's put it the Java way around:
[B >: A, That] That addAll(TraversableOnce[B] that, CanBuildFrom[List[A], B, That] bf)

6) Generic Type Param declaration
Scala uses [ and ] around its generic types, Java uses < and >. Let's go Java:

<B >: A, That> That addAll(TraversableOnce<B> that, CanBuildFrom<List<A>, B, That> bf)

7) Generic Type Param bounds declaration
Scala uses >: and <: where Java uses super and extends to set type bounds. Java version:
<B super A, That> That addAll(TraversableOnce<B> that, CanBuildFrom<List<A>, B, That> bf)

We're now into a nearly valid Java signature for a method on a type List<A>. The only bit of this which does not compile is <B super A> - for reasons I don't fully understand Java does not support a lower bound for a type parameter on a method. Switch it to an upper bound however, and Java's perfectly happy:

interface List<A> {
    <B extends A, That> That addAll(TraversableOnce<B> that, CanBuildFrom<List<A>, B, That> bf);
} 

Conceptually, upper and lower bounds mirror each other, so neither is more complex than the other. So it's possible to express the type ideas in the example at the top almost entirely in pure Java.
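
Java does have lower bounds, just not on method type parameters: a wildcard can carry one (`? super A`). A minimal sketch follows, with hypothetical names, showing the legal form next to the rejected one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, named for illustration only.
final class LowerBounds {
    // Legal: a lower-bounded wildcard. The target may hold A or any supertype of A.
    static <A> void copyInto(List<A> source, List<? super A> target) {
        target.addAll(source);
    }

    // Not legal Java: a lower bound on a method type parameter.
    // static <A, B super A> List<B> widen(List<A> source) { ... }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Number> numbers = new ArrayList<>();
        List<Object> objects = new ArrayList<>();
        copyInto(ints, numbers);
        copyInto(ints, objects);
        System.out.println(numbers.size() + " " + objects.size()); // 3 3
    }
}
```

The wildcard form cannot name the supertype, which is why it cannot stand in for B in a signature that also has to return something built from B.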

Remember I'm focussed purely on the type system here - the method declaration we've ended up with is still complicated and has legibility issues with names. It's just that you can produce the same complications in Java - there's been no added complexity from Scala. Here's a verbose version that might be easier to read:

interface List<E> {
    <SubE extends E, ReturnCollectionType>
    ReturnCollectionType addAll(
            TraversableOnce<SubE> itemsToAdd,
            CollectionFactory<List<E>, SubE, ReturnCollectionType> factory);
}

The irony is that there are aspects of the Scala type system that are significantly different to Java, and so are susceptible to accusations of a type system in overdrive. It's just that this method signature, used as an example of Scala's allegedly baroque type system, shows none of them.

Personally I find generic type parameters easier in Scala than Java - I don't know how much time I've wasted trying to work out how to make javac accept complicated generic signatures, with hopelessly illegible compile errors about "capture#1 of ?" not matching. I have a suspicion that many of the complaints about the Scala type system either apply as much and often more to the Java type system, or are more a function of the vocabulary Scala developers use to describe the generic types being unfamiliar to Java developers who might otherwise recognise a concept familiar to them from Java.

For instance, to be told "List[+A] means List is covariant in A" is daunting. To find out that all this means is that effectively the declaration
val elements: List[Number] = List[Integer](1, 2, 3)
 in Scala is always equivalent to the declaration
List<? extends Number> elements = new ArrayList<Integer>();
 in Java is much less daunting - Java developers are already aware that wildcards allow covariance and that as such List<? extends Number> is a valid supertype of List<Integer>. The Scala form is actually simpler & more intuitive by not requiring the wildcard type bound on every reference & instead declaring the variance rule on the type itself.
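
The Java side of that comparison can be sketched as follows (class and method names are hypothetical). The wildcard has to be repeated at each use site where covariance is wanted, and the covariant view is read-only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; only the wildcard usage matters.
final class Variance {
    // Use-site variance: accepts List<Integer>, List<Double>, List<Number>, ...
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        ints.add(2);
        ints.add(3);
        List<? extends Number> elements = ints; // allowed: covariant view
        // List<Number> invariant = ints;       // rejected: List<Integer> is not a List<Number>
        // elements.add(4);                     // rejected: cannot add through the covariant view
        System.out.println(sum(elements)); // 6.0
    }
}
```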

Monday, 24 September 2012

Reproducing Boolean Logic in Pure Scala

One of Scala's big claims is that the syntax is sufficiently powerful to allow programmers to produce control structures as APIs that would require dedicated language support in most languages.

It occurred to me that it would be interesting (if admittedly quixotic) to put this to the test on the most basic control structure - the humble if/else with booleans. The goal here is to implement all the normal boolean logic you would get in Java or C or the like, but without using any boolean keywords. Instead of true, false, if and else I will use true1, false1, if1 and else1 (since Scala does have these keywords, we can't use the most natural form). This is partly for fun, partly to learn Scala, partly to explore the power of the language. The full listings are at the bottom if you're not interested in the workings.

First cut - implement some basic operations on objects representing true and false. By and large this isn't really doing anything you couldn't do using an Enum in Java, apart from the fact that Scala allows you to define methods with names that would be operators. The other exception is the unary_! method; by prefixing ! with unary_ Scala allows us to put the method call before the instance it is being called on, allowing the conventional !true and !false format. By making it a sealed class we ensure no third instance of Boolean escapes into the wild.

The tests are in BooleanTest.scala and the code that makes them pass is in Boolean.scala, both linked at the end of this post.


Then we need to add the shortcut && and || to the mix. For this we use Scala's support for call by name, by prefixing the argument type to the function with =>. This means Scala will not evaluate the expression passed as this argument to the function until it is referenced within the function, unlike normal behaviour where the expression is evaluated prior to the function being called. The linked listings test for this and then make those tests pass.


The last complication is implementing if/elseif/else behaviour. The standard form of an if statement looks very like a function taking two arguments - a boolean, followed by a function to evaluate if the boolean is true. Scala allows functions to have multiple argument lists, which allows us to mimic this form in Scala's version of a static function - one defined on a singleton object. Let's call it if1. From that function we can then return an instance of a type IfResult on which we can further call either the function elseif or the function else1 - see the tests for examples. Using the call by name form we can prevent branches being executed when we don't want them to be (and test for this, naturally). The actual choice as to what to execute can be done in the true1 and false1 instances, by using the classic strategy of polymorphism rather than conditionals.

Again, the tests and the implementation are in the listings linked at the end of this post.


All told I'm pretty impressed - the only ugliness I failed to massage away was the need to have a dot between the end of the previous branch and the elseif function call. The one element of functionality I couldn't reproduce is a return statement from the enclosing function - since it happens in an expression it simply returns from that expression, and you get a compile error. If one were programming in a functional style this wouldn't be a major issue.

(Of course, there's a certain irony in reimplementing if/else if/else by using polymorphism - which is the purist OO way of avoiding if/else in the first place...)
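
For readers who want to see the polymorphic dispatch without the Scala listings, here is a rough Java analogue. It is not the code from this post: Java has no call-by-name parameters or operator methods, so `Supplier` stands in for `=>` arguments and ordinary method names stand in for the operators. All names (`Bool`, `TRUE`, `FALSE`, `and`, `or`, `not`, `choose`) are hypothetical, and the sketch does use the language's built-in `this`/method dispatch, just no `if`, `else`, `true` or `false`.

```java
import java.util.function.Supplier;

// A Java analogue, not the Scala implementation described above.
abstract class Bool {
    static final Bool TRUE = new Bool() {
        Bool and(Supplier<Bool> other) { return other.get(); }
        Bool or(Supplier<Bool> other) { return this; }   // short-circuits: other is never evaluated
        Bool not() { return FALSE; }
        <T> T choose(Supplier<T> ifTrue, Supplier<T> ifFalse) { return ifTrue.get(); }
    };
    static final Bool FALSE = new Bool() {
        Bool and(Supplier<Bool> other) { return this; }  // short-circuits: other is never evaluated
        Bool or(Supplier<Bool> other) { return other.get(); }
        Bool not() { return TRUE; }
        <T> T choose(Supplier<T> ifTrue, Supplier<T> ifFalse) { return ifFalse.get(); }
    };

    // Private constructor: only the two instances above can exist.
    private Bool() {}

    abstract Bool and(Supplier<Bool> other);
    abstract Bool or(Supplier<Bool> other);
    abstract Bool not();
    // Evaluates exactly one of the two branches.
    abstract <T> T choose(Supplier<T> ifTrue, Supplier<T> ifFalse);

    public static void main(String[] args) {
        String branch = TRUE.and(() -> FALSE.not())
                .choose(() -> "then", () -> "else");
        System.out.println(branch); // then
    }
}
```

The private constructor plays the role of the sealed class, and each instance decides which branch to run, so no conditional appears anywhere in the class.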

Here are links to the complete test & implementation listings:
BooleanTest.scala
Boolean.scala

Friday, 16 December 2011

REST and HATEOAS - Problems with Clients

At Level 3 of the REST maturity model, the client is meant to bind to two things - the entry URI for the system and the media type(s) of the entities sent and received.

The client is then meant to retrieve any other URIs from links embedded in the entities - links it can recognize by semantically meaningful rel attributes. In theory this means that the provider of the service can arbitrarily change the URI patterns at which its resources reside, as the client can instantly and programmatically react to this change.

I have a couple of concerns with this theory, which I'm going to explore in this blog post.

  1. Programming such a client is more arduous

    Imagine that as a client there's some specific resource you want to retrieve - perhaps a representation of a customer. If you are obeying the principles of HATEOAS you go to the root URI, and retrieve the following resource:

    {
      ...
      "link": {
        "rel":"customer",
        "href-pattern":"/customers/${customerName}"
      }
      ...
    }
    
    You then find the link with a rel of customer, get the href-pattern attribute, and use that to build the URI to the customer you want. Then you make sure you have configured a sensible HTTP cache for the root resource, otherwise that extra request to the root resource is an expensive overhead on every request for a customer.

    Or you behave badly, disobey HATEOAS and hard code the URI pattern:

    /customers/${customerName}

  2. URIs cannot be hidden from the client

    Many programming languages provide a means to enforce which elements are public, and hence suitable for clients to bind to, and which are not - normally with visibility modifiers of some form. There are frequently ways for clients to get round these restrictions, but at a bare minimum it makes it very clear to the client that they are not meant to be doing it and upgrades are likely to break them.

    There is no way to do this with a URI. If a badly behaved client decides to bind to some resource specific URI deep in your API, there's nothing to stop them and indeed nothing beyond documentation & knowledge of HATEOAS (which in my experience is still pretty limited in the general marketplace of software developers) to suggest to them that they are doing the wrong thing.

    In addition, unlike a programming library, which is upgraded at the client's whim so that the client can test thoroughly that the upgrade has broken nothing, a remote service may change without the client's knowledge; in this sense binding to a private API in a library is much safer than binding to a private API in a remote service.

    In practice this means that if the publishers of the service do decide to change the URI patterns they must do so in the knowledge that they may be breaking badly behaved clients; clients who may well not consider themselves to be behaving badly.

    There are circumstances where this may be acceptable. Those offering a free and desirable service to the general public may well take the attitude that if a client breaks because the client has not obeyed the HATEOAS contract, that's the client's problem. This is a nice situation to be in, but there are other circumstances in which it is more difficult to be so ruthless.

    If, as is common, the service is either an internal or external business-to-business service then in my experience if things stop working it's the people who changed something who are held responsible. Asserting that it is the client's fault for being insufficiently robust rarely cuts it with the top brass. The onus is generally on the people wishing to make a change to ensure it will not break their clients, rather than on clients to be robust. This is made particularly difficult as there is no way from the server side to tell whether a client is well behaved or not - they still hit the same URIs. Only actually looking at their code can show you whether or not you will break them before you do so. Alternatively, if the service is to the general public but requires payment the client is a paying customer, which makes their unhappiness a significantly bigger deal.
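
To give a feel for the extra client-side work described in point 1, here is a minimal sketch of the well-behaved lookup. `LinkResolver` and its methods are hypothetical names, and the map is assumed to have been parsed from the cached root resource; fetching, caching, JSON parsing and percent-encoding of the value are all left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side helper; class and method names are assumptions.
final class LinkResolver {
    private final Map<String, String> hrefPatternsByRel;

    // In a real client this map would be built from the root resource's links.
    LinkResolver(Map<String, String> hrefPatternsByRel) {
        this.hrefPatternsByRel = new HashMap<>(hrefPatternsByRel);
    }

    // Looks up the pattern for a rel and fills in one ${name} placeholder.
    String resolve(String rel, String name, String value) {
        String pattern = hrefPatternsByRel.get(rel);
        if (pattern == null) {
            throw new IllegalArgumentException("No link with rel " + rel);
        }
        return pattern.replace("${" + name + "}", value);
    }

    public static void main(String[] args) {
        Map<String, String> links = new HashMap<>();
        links.put("customer", "/customers/${customerName}");
        LinkResolver resolver = new LinkResolver(links);
        System.out.println(resolver.resolve("customer", "customerName", "acme")); // /customers/acme
    }
}
```

Even this stripped-down version is noticeably more code than concatenating a name onto a hard-coded "/customers/" prefix, which is the temptation described above.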

That combination of it being more arduous for your customer to create a HATEOAS client and there being nothing to discourage them from binding to specific URI patterns seems to me fairly toxic. In practice it seems to me that a real world service is quite likely to have to treat its URIs as part of its public API.

Monday, 7 November 2011

Apache HTTPD - settings for use as a pure proxy

When using Apache HTTPD as a pure proxy to an application server, it may be useful to set AllowEncodedSlashes to "On" and set nocanon on the ProxyPass. This has the effect that the URIs are passed to the application server "as is" without Apache doing any security check on them or otherwise attempting to correct them. Naturally this puts the onus on the origin server to be secure.

It may also be useful to set retry=0. By default after a failure to get a response from the origin server HTTPD caches the fact that the origin server is unavailable for a minute. This is a pain when automating deployments. Setting retry=0 makes it genuinely proxy every call down to the origin server regardless.

    AllowEncodedSlashes On
    ProxyPreserveHost On
    ProxyPass / http://localhost:8080/ retry=0 nocanon
    ProxyPassReverse / http://localhost:8080/

Along the same lines, when using Tomcat to serve RESTful requests it may be useful to allow encoded slashes. This is turned off by default because if you have a servlet that serves up files it may allow an attacker to retrieve arbitrary files from your server using ../../ type paths. If you are mapping all URLs to servlet(s) that do not do this then you can re-enable them using the following command line arguments:

-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
or by adding the following to $CATALINA_HOME/conf/catalina.properties:
org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
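
The risk behind that default can be sketched in a few lines of Java. This is an illustration with made-up names and paths, not Tomcat's own code: once %2F is decoded, what looked like a single path segment becomes a ../../ traversal, so a file-serving servlet has to check where the resolved path ends up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical file-serving check; names and paths are illustrative.
final class EncodedSlashDemo {
    // Returns true if the requested name, once decoded, stays inside baseDir.
    static boolean staysInside(Path baseDir, String rawName) {
        String decoded;
        try {
            decoded = URLDecoder.decode(rawName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
        Path resolved = baseDir.resolve(decoded).normalize();
        return resolved.startsWith(baseDir);
    }

    public static void main(String[] args) {
        Path base = Paths.get("/srv/files");
        System.out.println(staysInside(base, "report.txt"));             // true
        System.out.println(staysInside(base, "..%2F..%2Fetc%2Fpasswd")); // false
    }
}
```

A servlet that never maps request paths onto the file system has nothing like this to get wrong, which is the case in which re-enabling encoded slashes is reasonable.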