2020-05-10
One of my eventual goals with this blog is a post, or series of posts self-importantly titled Effective Haskell1. It’s a response to the recent Simple/Boring/Junior Haskell movement. It’s true that when writing Haskell in production, you need to be cautious not to accrue accidental complexity. My issue with the argument is the implication that we’re giving something up. Instead, I’d argue that using “advanced” features like GADTs, Type Families and even type classes is often unnecessary in the first place2, and a failure to find a simpler solution.
I’m not sure if I’ll ever actually finish a single post in Effective Haskell, but what I’m going to do in the meantime is use this blog to collect examples of what I think would classify as Effective Haskell.
In this inaugural post, we will be studying a technique (ostensibly) for iterating over record fields and adding metadata to them, without using Template Haskell or Generics. I’m calling it a descriptor because that’s what it reminds me of, but if somebody else has named it before, or just knows a more fitting name, please let me know.
Our initial motivating example is from where I initially stumbled into this technique.
The goal of the library in this example is to give a way to define graphics pipelines in a safe and efficient way. The user of the library formulates the global arguments (uniforms in GL parlance) to a pipeline on the Haskell side as a data type:
data Blinn f = Blinn
viewPos :: f (V3 Float)
{ lightAmbient :: f (V3 Float)
, lightDiffuse :: f (V3 Float)
, lightSpecular :: f (V3 Float)
,...
shininess :: f Float
, mvpMatrices :: MVP f
, }
The shaders (GPU code) are written in a different language, and we can only compile them manually at runtime. From this Haskell record, we then need to:
All of that is handled by createProgram
, which creates our pipeline and only requires that the user pass a descriptor of the Blinn
record:
log) <- createProgram "glsl/common.vert" "glsl/blinn.frag" $
(program, -> Blinn <$> f viewPos 0 "viewPos"
\f <*> f lightAmbient 0.2 "light.ambient"
<*> f lightDiffuse 1 "light.diffuse"
<*> f lightSpecular 1 "light.specular"
-- (lines omitted)
<*> f shininess 32 "material.shininess"
<*> descMvp (\f' -> f (f' . mvpMatrices))
The descriptor is the function from the second line downward. This descriptor contains, for every field, the respective field accessor, an initial value, and the name of the variable in the shader code. If createProgram
succeeds it returns a Program Blinn
value, and now the field accessors of the original struct can e.g. be used as type-safe update functions:
$ do
withProgram program $= x
objViewPos . mvp) $= modelMat
(model . mvp) $= viewMat
(view drawMesh mesh
The point is this; a descriptor is an ordinary value, that you can have users provide about their data type. You can then use this value to do complicated things, in a type-safe manner, without the library ever having seen the original type, and without the user having to be aware of the machinery.
What the descriptor ultimately looks like will depend on the application3, but the principles stay the same. You might already be able to discern a lot by looking at how they’re used above, but let’s see how they work.
Let’s say we’re writing a library that provides a way to ask for data on the command line. We expose a function that, given a label, parsing function, and verification function, asks for and yields a single value:
ask :: String -> (String -> Maybe a) -> (a -> Bool) -> IO a
= go where
ask label parse check = do
go putStrLn $ "What's your " <> label <> "?"
<$> getLine >>= \case
parse Just r | check r -> return r
-> putStrLn "Invalid response" >> go _
And the user then uses Applicative
to assemble into a function asking for an entire record:
data Person = Person
pName :: String
{ pAge :: Int
,
}
askPerson :: IO Person
= Person
askPerson <$> ask "name" Just ((>1) . length . words)
<*> ask "age" readMaybe (\a -> a >= 18 && a <= 99)
This is the basis for our descriptor. The arguments to ask
in askPerson
describe general properties of the fields of Person
that might be useful in other contexts. We turn askPerson
into a descriptor as follows:
ask
so it takes a general field
function as an argumentfield
, the respective record field accessorIO
into any Applicative
descPerson :: Descriptor Person
= Person
descPerson field <$> field pName "name" Just ((>1) . length . words)
<*> field pAge "age" readMaybe (\a -> a >= 18 && a <= 99)
The here Descriptor
is a type synonym that forces us to be polymorphic:
type Descriptor s = forall m. Applicative m
=> (forall a. (s -> a)
-> String
-> (String -> Maybe a)
-> (a -> Bool)
-> m a
)-> m s
And that’s it. As a first exercies, we can use Descriptor
to construct something equivalent to the askPerson
we defined above:
askDesc :: Descriptor p -> IO p
= desc (const ask)
askDesc desc
askPerson :: IO Person
= askDesc descPerson askPerson
But what have we gained? We could swap out ask
for a similar function, of course. But there is a point to passing the record field accessor to field
; it allows us to work with existing data. For example, we can perform just the validation:
validate :: Descriptor p -> p -> [String]
= execWriter $ desc $ \field lbl _ p ->
validate desc pers let a = field pers
in unless (p a) (tell ["Invalid " <> lbl]) $> a
λ> validate descPerson (Person "aa" 45)
["Invalid name"]
Or, we can enumerate all the fields in a descriptor:
fields :: Descriptor p -> [String]
= execWriter $ desc $ \_ lbl _ _ ->
fields desc $> undefined tell [lbl]
λ> fields descPerson
["name", "age"]
That’s the gist of a descriptor; a function applied, to each field of a record, with some arguments, polymorphic over any applicative. How you structure the field
function depends on what you use the descriptor for, but this outlines the general idea.
It’s a pretty neat trick, but unfortunately, there are some issues here:
Descriptor
s that still type check:descNonsense :: Descriptor Person
= pure $ Person "太郎" 3 descNonsense _
undefined
to make the fields
definition above type check.validate
is a bit contrived; only outputting a list of invalid fields is hard to deal with safely.All of that is solved when we use Higher-Kinded Data (HKD), which is where this technique really comes into its own.
Higher-Kinded Data is a pattern where you parameterize record fields over some functor, like this:
data HPerson f = HPerson
hName :: f String
{ hAge :: f Int
, }
With HKD, HPerson Identity
is equivalent to the original Person
record, but we also get HPerson Maybe
that might have missing fields, HPerson (Const a)
that has a value of type a
for every field, etc.
We can apply the idea of the descriptor to HKD almost verbatim. Our new descPerson
and askDesc
look pretty much the same at the term level:
descHPerson :: HDescriptor HPerson
= HPerson
descHPerson field <$> field hName "name" Just ((> 1) . length . words)
<*> field hAge "age" readMaybe (\a -> a > 18 && a < 99)
askHDesc :: HDescriptor s -> IO (s Identity)
= desc $ \_ lbl parse check -> Identity <$> ask lbl parse check askHDesc desc
In HDescriptor s
our s
is now also polymorphic over the base functor. This is its type:
type HDescriptor s = forall m f. Applicative m
=> (forall a. (forall g. s g -> g a) -- or, equivalently, Field s a, see below
-> String
-> (String -> Maybe a)
-> (a -> Bool)
-> m (f a)
)-> m (s f)
Let’s revisit the issues with the non-HKD approach.
HDescriptor
cannot choose the underlying functor, it has to use field
to construct it. We can no longer construct a nonsensical HDescriptor
without explicitly using undefined
.fields
using Proxy
instead of undefined
:hfields :: HDescriptor p -> [String]
= execWriter $ desc $ \_ lbl _ _ -> tell [lbl] $> Proxy hfields desc
validate
:hvalidate :: HDescriptor s -> s Identity -> s Maybe
= runIdentity $ desc $ \f _ _ check ->
hvalidate desc s <&> (\a -> if check a then Just a else Nothing) f s
Here’s something that we couldn’t do at all before. Imagine that we get a HPerson (Const String)
from, say, a web form. We can then use the HDescriptor
to parse and check each field individually.
hParseCheck :: HDescriptor s -> s (Const String) -> s Maybe
= runIdentity $ desc $ \f _ parse check -> pure $
hParseCheck desc s case parse $ getConst (f s) of
Just r | check r -> Just r
-> Nothing _
When you use HKD, you typically want to be able to map
/traverse
/<*>
the fields of your record. There are libraries like higgledy
, barbies
, barbies-th
, or hkd
that help you derive the required instances (and other nice things). We can show that a descriptor gives you the same power:
dmap :: HDescriptor s ->
forall a. f a -> g a) -> s f -> s g
(= runIdentity $
dmap desc fn s $ \f _ _ _ -> pure $ fn (f s)
desc
dtraverse :: Applicative m => HDescriptor s ->
forall a. f a -> m (g a)) -> s f -> m (s g)
(=
dtraverse desc fn s $ \f _ _ _ -> fn (f s)
desc
dpure :: HDescriptor s ->
forall a. f a) -> s f
(= runIdentity $
dpure desc a $ \_ _ _ _ -> pure a
desc
dliftA2 :: HDescriptor s ->
forall x. f x -> g x -> h x) -> s f -> s g -> s h
(= runIdentity $
dliftA2 desc fn sf sg $ \f _ _ _ -> pure $ fn (f sf) (f sg) desc
This doesn’t necessarily mean that descriptors compete with the libraries above. The actual use cases are different, descriptors work best when you have to provide an interface to library users and don’t want to force them to use Template Haskell, Generics, or dependencies.
Briefly, before we continue: every record field accessor of an HKD has type forall f. s f -> f a
. To avoid having to quantify the f
every time, we’re going to assign it a type signature:
type Field s a = forall f. s f -> f a
For example, hName :: Field HPerson String
and hAge :: Field HPerson Int
.
One of the issues with normal Storable
-based FFI is that, even if you define a Storable
instance for a user-defined struct, you cannot perform any field-wise updates on it. With HKD we can, as follows:
data MyStruct f = MyStruct
versionMajor :: f Int
{ versionMinor :: f Int
, frictionCoefficient :: f Double
, baconNumber :: f Word8
,
}
data SPtr struct = SPtr
sBase :: Ptr ()
{ sOffsets :: struct (Const Int)
,
}
setField :: Storable a => SPtr struct -> Field struct a -> a -> IO ()
SPtr base offsets) field = poke ptr
setField (where
= plusPtr base . getConst . field $ offsets ptr
As you can see, the trick is to use MyStruct (Const Int)
to store the offset of every field. We can then update a single field using
1
setField ptr baconNumber
-- Or, if you want to get fancy,
let ($=) :: Storable a => Field struct a -> a -> ReaderT (SPtr struct) IO ()
$=) = ...
(flip runReaderT ptr $ do
$= 2
versionMajor $= 1 versionMinor
The field accessors of MyStruct
now double as field accessors for our foreign struct. I’m leaving getField
as an exercise, but it works the same way.
SPtr
Where does the SPtr
actually come from? As you might have guessed, we can make one with a descriptor.
<- newSPtr $ \field -> MyStruct
ptr <$> field versionMajor 1
<*> field versionMinor 9
<*> field frictionCoefficient 0.9
<*> field baconNumber 0
The second argument to field
is the initial value of each field.
newSPtr
traverses the constructor, creating the record of the offsets for each field. It then malloc
s the total size, and assigns each field its initial value:
newSPtr :: SDescriptor s -> IO (SPtr s)
= do
newSPtr desc <- mallocBytes size
base $ \f a -> poke (plusPtr base . getConst . f $ offsets) a $> Proxy
desc pure (SPtr base offsets)
where
= flip runState 0 $
(offsets, size) $ \_ a -> state (\s -> (Const s, s + sizeOf a)) desc
What makes SDescriptor
different from our previous descriptors is that it has a type class constraint on the field
function:
type SDescriptor struct = forall m f. Applicative m
=> ( forall a. Storable a
=> Field struct a
-> a
-> m (f a)
)-> m (struct f)
This means that, as soon as one of the fields of MyStruct
is not Storable
, you cannot write a SDescriptor
for it. Conversely, the existence of the SDescriptor MyStruct
proves that every field of MyStruct
is Storable
. For example, you could not add a String
field to MyStruct
, since String
aren’t Storable
. We’ll look into how you might deal with strings in the section on arrays below.
The initial example already hinted at the fact that structs/descriptors can be nested. The data definition is fairly straightforward, no different from how you would normally do it with HKD:
data MySuperStruct f = MySuperStruct
someInt :: f Int
{ nestedData :: MySubStruct f
, }
As for the descriptor itself, you simply call the descriptor for the nested struct in the place it occurs, but you’ll have to prepend the record field accessor as follows:
descMySuperStruct :: SDescriptor MySuperStruct
= MySuperStruct
descMySuperStruct field <$> field someInt 1
<*> descMySubStruct (\subField -> field (subField . nestedData))
As a final thought, let’s think about how to approach structs that contain arrays. This will be just one of the ways to tackle it, but there are ways to go e.g. statically known sizes.
The trick here is to give our records two4 functor parameters:
data Image fArr fPrim = Image
imgW :: fPrim Int
{ imgH :: fPrim Int
, imgData :: fArr Word8
, }
Correspondingly, our descriptor now takes two function arguments, with the one for arrays taking an extra one indicating the size:
myArrStructDescriptor :: ArrDescriptor MyStructWithArrays
= MyStructWithArrays
myArrStructDescriptor array field <$> field imgW 99
<*> field imgH 99
<*> array imgData (99 * 99 * 3) 0
I’ll give the type of ArrDescriptor
below for completeness’ sake, but even more than before it’s not about the specifics of this approach, but the general idea; you can have multiple field
-style functions. In this case the difference is between primitive updates in fPrim
and indexed updates in fArr
, but you could, for example, also have read/write-only fields.
type ArrDescriptor struct = forall m fArr fField. Applicative m
=> ( forall a. Storable a
=> (forall gArr gField. struct gArr gField -> gArr a)
-> Int
-> a
-> m (fArr a)
)-> ( forall a. Storable a
=> (forall gArr gField. struct gArr gField -> gField a)
-> a
-> m (fField a)
)-> m (struct fArr fField)
When you’re in the trenches of a tutorial like this, it can be hard to see the forest for the trees. Especially when working with nested structs and arrays, our types got pretty involved. However, I hope I have also been able to convince you that when this approach works, it can work really well. The library author (person who defines the descriptor) gets a lot of power, and the user (person who implements the descriptor) only has to define a single generic traversal. Furthermore, since we aren’t using any existing abstractions, we get to completely tailor it to our own needs, as you saw in the array example.
Ultimately, I’m not sure if the ideas here are going to be useful for many people. I have worked with libraries that horribly over-complicated their FFI so I know there are at least some people who might find this useful, but that’s not really the point of this post. Most importantly, I think it’s a neat example of how we can write wonderful abstract interfaces with just RankNTypes
and some polymorphism in the right places.
If you have any questions or criticism, feel free to contact me.