Wednesday, 24 October 2012

Scala's Maligned Type System

Around the net you can find a lot of criticisms of Scala's type system, often focussing on this signature:

def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That

Before we start, let's make one thing clear - I am not arguing that this is trivial for a Java developer to read. There are a lot of barriers to legibility for a Java dev here, which I'll try and unpick below. My issue with this is as an example of type system complexity, since as far as I can see the difficulties with legibility here are nothing to do with the type system - at least for a Java developer. What's going on here, from a type perspective, is no more complicated than Java's generics permits.

In an attempt to prove this, let's "undo" the non-type system syntax features that make this difficult to read for a Java developer.

1) Prefix types rather than postfix types
Scala puts types after variable / method signatures rather than before. Let's switch back to the java way:
That ++ [B >: A, That] (TraversableOnce[B] that)(implicit CanBuildFrom[List[A], B, That] bf)

2) Multiple parameter lists
Scala allows multiple parameter lists for a function/method, so lets collapse them into one:
That ++ [B >: A, That] (TraversableOnce[B] that, implicit CanBuildFrom[List[A], B, That] bf)

3) Implicit parameters
Scala allows implicit parameters - let's lose that keyword:
That ++ [B >: A, That] (TraversableOnce[B] that, CanBuildFrom[List[A], B, That] bf)

4) Operators as method names
Scala allows operators as method names - let's be a bit more Java about it:
That addAll [B >: A, That] (TraversableOnce[B] that, CanBuildFrom[List[A], B, That] bf)

5) Generic Type Param position
Scala puts the generic type params between the method name and the parameter list, Java puts them first. Let's put it the Java way around:
[B >: A, That] That addAll(TraversableOnce[B] that, CanBuildFrom[List[A], B, That] bf)

6) Generic Type Param declaration
Scala uses [ and ] around its generic types, Java uses < and >. Let's go Java:

<B >: A, That> That addAll(TraversableOnce<B> that, CanBuildFrom<List<A>, B, That> bf)

7) Generic Type Param bounds declaration
Scala uses >: and <: where Java uses super and extends to set type bounds. Java version:
<B super A, That> That addAll(TraversableOnce<B> that, CanBuildFrom<List<A>, B, That> bf)

We're now into a nearly valid Java signature for a method on a type List<A>. The only bit of this which does not compile is <B super A> - for reasons I don't fully understand Java does not support  a lower bound for a type parameter on a method. Switch it to an upper bound however, and Java's perfectly happy:

interface List<A> {
    <B extends A, That> That addAll(TraversableOnce<B> that, CanBuildFrom<List<A>, B, That> bf);

Conceptually upper and lower bounds are basically equivalent. So it's possible to express the type ideas in the example at the top almost entirely in pure Java.

Remember I'm focussed purely on the type system here - the method declaration we've ended up with is still complicated and has legibility issues with names. It's just that you can produce the same complications in Java - there's been no added complexity from Scala. Here's a verbose version that might be easier to read:

interface List<E> {
    <SubE extends E, ReturnCollectionType>
    ReturnCollectionType addAll(
            TraversableOnce<SubE> itemsToAdd,
            CollectionFactory<List<E>, SubE, ReturnCollectionType> factory);

The irony is that there are aspects of the Scala type system that are significantly different to Java, and so are susceptible to accusations of a type system in overdrive. It's just that this method signature, used as an example of Scala's allegedly baroque type system, shows none of them.

Personally I find generic type parameters easier in Scala than Java - I don't know how much time I've wasted trying to work out how to make javac accept complicated generic signatures, with hopelessly illegible compile errors about "? capture 1 of 3"  not matching. I have a suspicion that many of the complaints about the Scala type system either apply as much and often more to the Java type system, or are more a function of the vocabulary Scala developers use to describe the generic types being unfamiliar to Java developers who might otherwise recognise a concept familiar to them from Java.

For instance, to be told "List[A+] means List is covariant in A" is daunting. To find out that all this means is that effectively the declaration
val elements: List[Number] = new List(1, 2, 3)
 in Scala is always equivalent to the declaration
List<? extends Number> elements = new ArrayList<Integer>();
 in Java is much less daunting - Java developers are already aware that wildcards allow covariance and that as such List<? extends Number> is a valid supertype of List<Integer>. The Scala form is actually simpler & more intuitive by not requiring the wildcard type bound on every reference & instead declaring the variance rule on the type itself.

Monday, 24 September 2012

Reproducing Boolean Logic in Pure Scala

One of Scala's big claims is that the syntax is sufficiently powerful to allow programmers to produce control structures as APIs that would require dedicated language support in most languages.

It occurred to me that it would be interesting (if admittedly quixotic) to put this to the test on the most basic control structure - the humble if/else with booleans. The goal here is to implement all the normal boolean logic you would get in Java or C or the like, but without using any boolean keywords. Instead of true, false, if and else I will use true1, false1, if1 and else1 (since Scala does have these keywords, we can't use the most natural form). This is partly for fun, partly to learn Scala, partly to explore the power of the language. The full listings are at the bottom if you're not interested in the workings.

First cut - implement some basic operations on objects representing true and false. By and large this isn't really doing anything you couldn't do using an Enum in Java, with the obvious exception of the fact Scala allows you to define methods with names that would be operators. The only exception is the unary_! method; by prefixing ! with unary_ Scala allows us to put the method call before the instance it is being called on, allowing the conventional !true and !false format. By making it a sealed class we ensure no third instance of Boolean escapes into the wild.

Here are the tests:

And here's the code to make them pass:

Then we need to add the shortcut && and || to the mix. For this we use Scala's support for call by name, by prefixing the argument type to the function with =>. This means Scala will not evaluate the expression passed as this argument to the function until it is referenced within the function, unlike normal behaviour where the expression is evaluated prior to the function being called. We'll test for this:

And make those tests pass:

The last complication is implementing if/elseif/else behaviour. The standard form of an if statement looks very like a function taking two arguments - a boolean, followed by a function to evaluate if the boolean is true. Scala allows functions to have multiple argument lists, which allows us to mimic this form in Scala's version of a static function - one defined on a singleton object. Let's call it if1. From that function we can then return an instance of a type IfResult on which we can further call either the function elseif or the function else1 - see the tests for examples. Using the call by name form we can prevent branches being executed when we don't want them to be (and test for this, naturally). The actual choice as to what to execute can be done in the true1 and false1 instances, by using the classic strategy of polymorphism rather than conditionals.

Here are the tests:

And here's the implementation:

All told I'm pretty impressed - the only ugliness I failed to massage away was the need to have a dot between the end of the previous branch and the elseif function call. The one element of functionality I couldn't reproduce is a return statement from the enclosing function - since it happens in an expression it simply returns from that expression, and you get a compile error. If one were programming in a functional style this wouldn't be a major issue.

(Of course, there's a certain irony in reimplementing if/else if/else by using polymorphism - which is the purist OO way of avoiding if/else in the first place...)

Here are links to the complete test & implementation listings: