Error handling when enforcing invariants in the construction of statically checked types
tl;dr - error handling would be a lot less painful if the compiler could coerce
an Either[A, B]
to an A
or a B
at compile time if the program would not
type check if it was an Either[A, B]
, would type check if it was an A
or a
B
, and whether it was in fact a Left[A, B]
or a Right[A, B]
was decidable
at compile time.
Here’s a problem that’s been bothering me for a long time - how to error handle
when enforcing invariants in the construction of statically checked types. It’s
the nature of types to constrain the set of values they can represent.
Consequently you have to construct an instance of a type from one or more other,
often more general, types, and in one form or another it will frequently
therefore be possible to call that constructor with values that are not in the
set of valid values for that type. At which point the attempt to construct that
instance of the type will hopefully be rejected, to prevent invalid program
state occurring. For instance, Java has a URI
class with a constructor that
accepts a String
. Since the String
cannot be guaranteed to be a valid URI
,
the constructor throws URISyntaxException
if it’s not a valid URI
.
This introduces a problem, and one I’ve written about before.
Ideally in a statically type checked language the type checker will tell you
about the different values a function can return. And the URI
constructor
function does just that, via the much hated checked exception; the constructor
will “return” either a valid URI
or the exception, and the type checker will
force you to do something about it.
In more functional languages such as Scala or Haskell we might represent that
via a genuine return type - an Either[URISyntaxException, URI]
, or a
Try[URI]
, or perhaps Option[URI]
if we’re not interested in giving much
feedback on what went wrong. Which is all fine and dandy if the input is
untrusted. I like the type checker to remind me that if I take some input from a
user and try and turn it into a URI I need to handle the case when the user has
failed to oblige.
However, it becomes a lot more irritating in the case where we know the input is valid. Whether it’s because we’re using the language to configure something, or writing a test, or just jamming a little bit of code together to do something right now that doesn’t need to take inputs and validate them, there are times when you just want to hardcode that value and so there’s no error handling to be done.
Let’s say we define a type to represent the set of natural numbers:
object Natural {
def apply(natural: Int): Either[IllegalArgumentException, Natural] =
if (natural > 0) Right(new Natural(natural))
else Left(new IllegalArgumentException(s"$natural is not a Natural number"))
}
case class Natural private (natural: Int) extends AnyVal
Cool - there’s only one way to construct it, which enforces the invariant we want, and the type checker tells us it may not succeed. And let’s define a function that will use it:
def sumOfFirst(n: Natural): Natural = ???
OK, let’s test it:
test("sum of first 4 natural numbers") {
sumOfFirst(Natural(4)) mustBe Natural(10)
}
And… compiler says no. I’m trying to pass an
Either[IllegalArgumentException, Natural]
to a function that expects a plain
Natural
, and I’m trying to compare the resulting plain Natural
with an
Either[IllegalArgumentException, Natural]
. Not going to work. But making the
compiler happy is going to make a bit of a mess of what was a fairly legible
test:
test("sum of first 4 natural numbers") {
Natural(4).map(sumOfFirst) mustBe Natural(10)
}
So now you’ve got the additional cognitive load of mapping over an Either
to
call the function you wanted to test, plus the fact that the types you are
actually comparing in the assertion are both
Either[IllegalArgumentException, Natural]
rather than the simple Natural
you
were interested in. Not nice.
In the URI case mentioned earlier, Java provides a get out of jail option:
public static URI create(String str)
does not throw a checked exception, it will only fail at runtime if the string is invalid. And we could do likewise for Natural. But that’s a bit unsatisfying, because in the process we’ve opened up the possibility that the programmer abuses that facility, and the type checker has thereby lost some of its ability to prove our programs correct.
Scala offers some interesting string parsing options where the compiler can start statically checking the contents of a String to create an instance of a constrained type and report any failure at compile time; Jon Pretty has made this significantly easier with http://co.ntextu.al/.
However, it occurs to me that there might be a more general way of doing this.
What if, in cases where a pure function returns a result of type Either[A, B]
,
but is being used in a way that means the compiler expects its result to be a
plain A
or a plain B
, and it is called in such a way that its arguments are
fully known at compile time, the compiler could actually call the function at
compile time and check whether it returns a Left(a: A)
or a Right(b: B)
? And
if it returns the correct one, permit its use directly as an A
or B
? And if
not, report the other as the text of a compile error?
This would then work entirely seamlessly, with no library code, and would not
require any String parsing for cases such as our Natural
above that are not
naturally stringy, so long as your constructor function enforces its invariants
and reports the result as an Either
. The first version of the test case above
would happily pass, as both the constructions of Natural
could be proved at
compile time to return a Right(n: Natural)
and hence be coerced to a Natural
since that would type check and the Either[IllegalArgumentException, Natural]
type would not.
The compiler would of course be clever enough to still treat the output as an
Either[A, B]
if it were being used in a context that required it to be an
Either[A, B]
- indeed, that would be the default. Only if the required type of
the output was not satisfied by the type Either[A, B]
but was satisfied by
either the type A
or the type B
would it bother to start trying to work out
whether it was decidable at compile time, and if so which it was.