2021-11-08
Algebraic data types (ADT’s) are the kind of programming language feature that is hard to live without once gotten used to. Unfortunately, Nix ostensibly does force you to live without ADT’s, but let’s see if we can add them back in.
In this post, I will show you three ways of emulating ADT’s, and when to choose which. This post was inspired by my work on PureNix, and aims to explain why pattern matching is implemented the way it is.
This snippet of Haskell/PureScript code will serve as the basis of our discussion for now:
data These a b
= This a
| That b
| These a b
foo :: These Int Int -> Int
This a) = a + 1
foo (That b) = if b > 0 then b else 0
foo (These a b) = a * b foo (
The most natural and common encoding of an ADT is as what’s often called a tagged union. In Nix, that would mean an attribute set of
{"This"; field0 = a; };
This = a: { tag = "That"; field0 = b; };
That = b: { tag = "These"; field0 = a; field1 = b; };
These = a: b: { tag =
foo = x:if x.tag == "This" then x.field0 + 1 else
if x.tag == "That" then if x.field0 > 0 then x.field0 else 0 else
if x.tag == "These" then x.field0 * x.field1 else
builtins.throw "pattern match error";
}
And sure, this technically is a correct implementation of the Haskell code, but it’s not exactly elegant or ergonomic. The pattern matching in particular is hard to read, hard to write, and doesn’t compose very nicely either. Imagine that we’re writing a library, and this is how we ask users to pattern match on the data that we supply them? Yuk.
Fortunately, we can do better.
The crucial insight is that Nix’s attribute sets are actually much more suitable for encoding pattern matching than they are for encoding constructors. With the perspective that pattern matches are attribute sets of continuations, constructors simply become the inverse – field selectors and applications. It’s wonderful dualities like these that make functional programmers fancy themselves mystics.
{
This = a: match: match.This a;
That = b: match: match.That b;
These = a: b: match: match.These a b;
foo = x: x {1;
This = a: a + if b > 0 then b else 0;
That = b:
These = a: b: a * b;
}; }
This is how you want to emulate ADT’s and pattern matching in Nix. It’s readable, composes wonderfully, and even gives some pretty sensible errors if you make what would otherwise be a type error or pattern match failure. The general principle is the same as Church encoding, but by using attribute sets instead of lambda’s, we get named, unordered branches.
But if this attribute set-based Church encoding is so great, why isn’t this how pattern matching works in PureNix?
Up until now, this post has been about practical tips that you can apply when writing Nix code. From here on out however, things are going to be less practical (or at least, I hope that you don’t actually have to write code like this by hand). The goal is to explain and motivate the concepts behind the pattern matching implementation in PureNix.
Languages with first-class support for pattern matching typically offer many more features than just individually matching on every constructor. In PureScript, a patterns can consist of literals, wildcards, ignores, as-patterns, guards, nested patterns, and tupled patterns. We could try to extend the continuation-based implementation above to support some of that, but when you want to support all of these features you end up sacrificing the things that made it so nice in the first place. So, let’s go back to the drawing board.
Fundamentally, a pattern does two things;
It can do these things multiple times and in any order, but ultimately it always reaches one of two outcomes:
A full pattern match tries the given patterns in order and if none apply, it throws an error.
Implementation-wise, we start by capturing a failure continuation, and from there it’s just a soup of if
-expressions and let
-bindings. Here is (a cleaned up version of) what PureNix actually outputs. For the ADT’s, we’re back to our initial naive tagged union implementation, but you can see how patterns are now named and composable:
{"This"; field0 = value0; };
This = value0: { tag = "That"; field0 = value0; };
That = value0: { tag = "These"; field0 = value0; field1 = value1; };
These = value0: value1: { tag =
foo = v:let
if v.tag == "This" then let a = v.field0; in a + 1 else fail;
pattern0 = fail: if v.tag == "That" then let b = v.field0; in if b > 0 then b else 0 else fail;
pattern1 = fail: if v.tag == "These" then let a = v.field0; b = v.field1; in a * b else fail;
pattern2 = fail: builtins.throw "Pattern match failure in src/Main.purs at 16:1 - 16:28";
patternFail = in
pattern0 (pattern1 (pattern2 patternFail)); }
Of course, this example doesn’t really show how we now support more complicated patterns than before. Let’s fix that:
bar :: These (These Int Int) Int -> Int
These a@(These b _) c) | b > c = foo a
bar (= 0 bar _
This snippet compiles to:
bar = v: let
pattern0 = fail: if v.tag == "These" && v.field0.tag == "These"
then
let
a = v.field0;
b = v.field0.field0;
c = v.field1;in
if b > c then foo a else fail
else fail;
0;
pattern1 = fail: builtins.throw "Pattern match failure in src/Main.purs at 21:1 - 21:40";
patternFail = in
pattern0 (pattern1 patternFail);
Here in pattern0
we can see
fail
continuation allows us to fail in multiple places during the scrutinization of the pattern.In Section 2 we saw an elegant way of encoding ADT’s and pattern matching as attribute sets of continuations. It’s very readable, pleasant to work with, and works great for most cases. Unfortunately, it doesn’t actually have the power to emulate all of the pattern matching features you expect in a modern functional language.
In that case, we instead define patterns one-by-one, and compose them by capturing a failure continuation. At this point, the code is not the kind of code you’d typically be happy writing by hand, but the ideas can still be useful in other contexts.
Thanks for reading, I hope you got something out of it.