I enjoy using a type system to help make sure my term-level code is unimpeachably correct. This is where my interest in writing the singletons library came from. This library allows you to write some dependently typed code in Haskell, using singleton types. I didn’t invent this idea, but I did write a nice library to remove some of the pain of using this encoding. SHE can be considered an ancestor of singletons.
At my Haskell Symposium (2012) presentation of the singletons work, an attendee asked if singleton generation works for higher-order functions, like map. I innocently answered “yes”, at which point Conor McBride, sitting in the back, stood up and said “I don’t believe you!” I wasn’t lying — singletons does indeed handle higher-order functions. However, Conor’s skepticism isn’t unfounded: a “singletonized” higher-order function isn’t so useful.
This blog post explores why singletonized higher-order functions aren’t useful and suggests defunctionalization as a way to fix the problem.
Before we get too much further, this blog post is a literate Haskell file, so we have some necessary throat-clearing:
> {-# LANGUAGE TemplateHaskell, DataKinds, PolyKinds, TypeFamilies,
> GADTs, FlexibleContexts, RankNTypes, TypeOperators #-}
> import Prelude hiding (map)
> import Data.Singletons
I should also warn that some of the code rendered as Haskell in this blog post does not have bird-tracks; such code is display-only and not intended as part of the executable program. I should finally note that this code does not compile with the current HEAD of GHC (but it does compile with 7.6.1, at least). The new ambiguity check overzealously flags some of the code here as inherently ambiguous, which it is not. I have filed bug report #7804.
Introduction to singletons
First off, what is a singleton? I will give a brief introduction here, but I refer you to the singletons paper on the subject. A singleton is a type with exactly one value. Let’s make a singleton for the natural numbers:
> data Nat = Zero | Succ Nat
> $(genSingletons [''Nat])
The second line above generates the following singleton definition for Nat:
data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSucc :: SNat n -> SNat (Succ n)
(Well, it doesn’t quite generate that, but let’s pretend it does. See the paper [or use -ddump-splices!] for more details.) According to this definition, there is exactly one value for every SNat n. For example, the type SNat (Succ Zero) has one value: SSucc SZero. This is interesting because it means that once we identify a value, say in a case expression, we also know a type index. This interplay between term-level matching and type-level information is what makes singletons enable something like dependently typed programming.
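To make this idea concrete, here is a small sketch of my own (not generated by the library, and using the pretend SNat definition above): a length-indexed vector built by matching on an SNat. In each equation, the match on the singleton refines the type index n, so the result type is fully determined.

data Vec :: * -> Nat -> * where
  VNil  :: Vec a Zero
  VCons :: a -> Vec a n -> Vec a (Succ n)

-- Matching on the SNat reveals the index: in the first equation GHC
-- learns n ~ Zero, and in the second that n ~ Succ m for some m.
replicateVec :: SNat n -> a -> Vec a n
replicateVec SZero     _ = VNil
replicateVec (SSucc n) x = VCons x (replicateVec n x)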
The singletons library provides a singleton for [], but with alphanumeric names. We can pretend this is the definition:
data SList :: [k] -> * where
  SNil  :: SList '[]
  SCons :: Sing h -> SList t -> SList (h ': t)
The Sing in there (the first parameter to SCons) is defined thus:
data family Sing (a :: k)
Using a data family allows GHC to choose the correct singleton type, depending on the kind k. An instance of this family is defined for every singleton type we create. So, actually, the list type built into the singletons library is more like
data instance Sing (list :: [k]) where
  SNil  :: Sing '[]
  SCons :: Sing h -> Sing t -> Sing (h ': t)
The singletons library also provides a synonym SList to refer to this instance of the Sing family. Again, you may find more clarity in the singletons paper, which spends a little more time drawing all of this out.
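As a quick illustration (again a sketch of my own, using the pretend definitions above), the type index of a singleton list records the exact contents of the list:

-- The index '[Zero, Succ Zero] mirrors the value, constructor for constructor.
zeroOne :: Sing '[Zero, Succ Zero]
zeroOne = SCons SZero (SCons (SSucc SZero) SNil)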
Singleton first-order functions
We can singletonize more than just datatypes. We can singletonize functions. Consider the following predecessor function on Nats, defined in such a way that we get the singleton definition generated for us:
> $(singletons [d|
> pred :: Nat -> Nat
> pred Zero = Zero
> pred (Succ n) = n
> |])
A definition of pred that works on singleton Nats is generated for us. It looks something like
sPred :: SNat n -> SNat ???
sPred SZero = SZero
sPred (SSucc n) = n
The problem is those pesky ??? marks. Because the type indices of a singleton mirror the computation of singleton values, every function on singletons must be mirrored at the type level. So, to define sPred, we must have the type family Pred as well:
type family Pred (n :: Nat) :: Nat
type instance Pred Zero = Zero
type instance Pred (Succ n) = n
sPred :: SNat n -> SNat (Pred n)
...
The singletons library generates both the type-level Pred and the singletonized sPred.
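For example (a sketch of my own, using the definitions above), the index of sPred’s result is computed by the Pred family:

-- Pred (Succ Zero) reduces to Zero, so this type checks.
predOne :: SNat Zero
predOne = sPred (SSucc SZero)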
Singleton higher-order functions
But what about map?
> $(singletons [d|
> map :: (a -> b) -> [a] -> [b]
> map _ [] = []
> map f (h : t) = f h : map f t
> |])
The singletons library generates these definitions (but with some extra class constraints that don’t concern us):
type family Map (f :: k1 -> k2) (list :: [k1]) :: [k2]
type instance Map f '[] = '[]
type instance Map f (h ': t) = (f h) ': (Map f t)
sMap :: (forall a. Sing a -> Sing (f a)) -> Sing list -> Sing (Map f list)
sMap _ SNil = SNil
sMap f (SCons h t) = SCons (f h) (sMap f t)
Whoa! What’s the bizarre type doing in sMap? The forall declares that the function passed into sMap must be valid for any a. That’s not so strange, when we think about the fact that the index a must be isomorphic to the term Sing a. We’re used to having functions that work for any term. Here, because of the relationship between term values and type indices, the function must also work for any type index a. This is particularly important, because Map will apply f to all the as in the list list. If we leave off the forall, the function won’t type check.
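For contrast, here is what the rank-1 signature would look like (a sketch of my own, shown only to illustrate the failure):

-- This does NOT work: a is chosen once, by the caller, but sMap must
-- apply its argument at the index of every element of the list.
sMapBad :: (Sing a -> Sing (f a)) -> Sing list -> Sing (Map f list)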
This is all well and good, but this definition of sMap isn’t useful. This is because the type of that first parameter is quite restrictive. We must have a function of that type, and f must be inferrable. Let’s look at some examples. We can write the following just fine:
> sOne = SSucc SZero
> sTwo = SSucc sOne
> sThree = SSucc sTwo
> sNums = SCons sOne $ SCons sTwo $ SCons sThree SNil -- [1,2,3]
>
> two_three_four = sMap sSucc sNums
(sSucc is a so-called “smart” constructor. It is equivalent to SSucc, but adds extra class constraints that don’t concern us here. See Section 3.1 of the singletons paper.) The type of SSucc is forall a. Sing a -> Sing (Succ a), so the call to sMap type checks just fine. SSucc is perfect here. Let’s try something else:
zero_one_two = sMap sPred sNums
The type of sPred is forall n. Sing n -> Sing (Pred n), as written above, so one would think all is good. All is not good. The problem is that Pred is a type family, not a regular type constructor like good old Succ. Thus, GHC does not (and cannot, with good reason) infer that f in the type of sMap should be Pred:
Couldn't match type `Pred t1' with `t t1'
Expected type: Sing Nat t1 -> Sing Nat (t t1)
Actual type: Sing Nat t1 -> Sing Nat (Pred t1)
The reason this inference is bogus is that GHC will not let a type variable unify with a type family. In its internal constraint-solving machinery, GHC assumes that all type variables are both injective and generative. Injective means that from an assumption of t a ~ t b (where ~ denotes type equality), we can derive a ~ b. Generative means that from an assumption of t a ~ s b, we can derive t ~ s. Type families, in general, have neither property. So, GHC won’t let a type variable unify with a type family.
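To see why type families lack these properties, consider a small example of my own (not from the post):

type family F (a :: *) :: *
type instance F Int  = Bool
type instance F Char = Bool

-- F Int and F Char both reduce to Bool, so F Int ~ F Char holds even
-- though Int ~ Char does not: F is not injective. If a type variable t
-- were allowed to unify with F, GHC's rule "t a ~ t b implies a ~ b"
-- would wrongly derive Int ~ Char.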
This problem — called the saturation requirement of type families — is what Conor was thinking about when he disbelieved that singletons handled map.
Defunctionalization
Over lunch while at ICFP, I had the good fortune of sitting with Tillmann Rendel, and we got to talking about this problem. He suggested that I think about defunctionalization. I have thought about this, and I think it’s the answer to the problem.
Defunctionalization is an old technique of dealing with higher-order functions. The idea is that, instead of making a closure or other pointer to code, represent a function with some symbol that can be looked up and linked to the code later. Danvy and Nielsen wrote a more recent paper explaining how the whole thing works. One drawback of the technique that they outline is that defunctionalization tends to require whole-program translation. That is, the transformation requires looking at the entire codebase to do the translation. This is generally necessary so that the table of function “symbols”, encoded as an algebraic datatype, can be matched on. However, in Haskell, we have open type functions, so this problem does not limit us.
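To recall what the classic technique looks like at the term level, here is a small sketch of my own (the names FunTag, applyTag, and mapTag are mine, not part of any library): each function is replaced by a first-order tag, and a single apply function links tags back to code.

data FunTag = TagPred | TagSucc   -- one tag per function we want to pass around

applyTag :: FunTag -> Nat -> Nat
applyTag TagPred Zero     = Zero
applyTag TagPred (Succ n) = n
applyTag TagSucc n        = Succ n

-- A first-order map takes a tag instead of a function:
mapTag :: FunTag -> [Nat] -> [Nat]
mapTag _ []      = []
mapTag f (h : t) = applyTag f h : mapTag f t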
Another drawback of defunctionalization is that it is generally poorly-typed. If we are just using some symbol to denote a function, how can we be sure that a function application is well-typed? Pottier and Gauthier address this issue in their paper on the topic by using generalized algebraic datatypes (GADTs). But, given the way type families work in Haskell, we don’t need the power of GADTs to do this for us.
Encoding defunctionalization in Haskell type families
At the heart of any defunctionalization scheme is an apply function:
type family Apply (f :: k1 -> k2) (a :: k1) :: k2
But wait, we don’t really want Apply to have that kind, because then we would have to pass Pred in as the function, and Pred all on its own is unsaturated. What we need is some symbol that can represent Pred:
type family Apply (f :: *) (a :: k1) :: k2
data PredSym :: *
type instance Apply PredSym n = Pred n
This is progress. We can now pass PredSym around all by itself, and when we apply it, we get the desired behavior. But, this is weakly kinded. We would like to be able to define many symbols akin to PredSym, and we would like GHC to be able to make sure that we use these symbols appropriately — that is, we don’t say Apply PredSym '[Int, Bool].
Yet, we still want to be able to create new symbols at will. So, we want to use data declarations to create the symbols. Thus, the kind of these symbols must end in -> *. But, we have complete freedom as to what appears to the left of that arrow. We will use this definition to store the kinds:
> data TyFun :: * -> * -> *
Now, we can make our richly kinded Apply:
> type family Apply (f :: (TyFun k1 k2) -> *) (x :: k1) :: k2
> data PredSym :: (TyFun Nat Nat) -> *
> type instance Apply PredSym x = Pred x
This one works. But, we want it to work also for real type constructors (like Succ), not just type families. We have to wrap these type constructors in an appropriately kinded wrapper:
> data TyCon :: (k1 -> k2) -> (TyFun k1 k2) -> *
> type instance Apply (TyCon tc) x = tc x
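As a quick sanity check (my own, with hypothetical names), we can ask GHC to verify the reductions via equality constraints:

-- Both constraints are discharged by reducing Apply:
applyPredOK :: (Apply PredSym (Succ Zero) ~ Zero) => ()
applyPredOK = ()

applySuccOK :: (Apply (TyCon Succ) Zero ~ Succ Zero) => ()
applySuccOK = ()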
Then, we define a new version of sMap that works with Apply:
> type family DFMap (f :: (TyFun k1 k2) -> *) (ls :: [k1]) :: [k2]
> type instance DFMap f '[] = '[]
> type instance DFMap f (h ': t) = (Apply f h ': DFMap f t)
sDFMap :: forall (f :: TyFun k1 k2 -> *) (ls :: [k1]).
          (forall a. Sing a -> Sing (Apply f a)) ->
          Sing ls -> Sing (DFMap f ls)
sDFMap _ SNil = SNil
sDFMap f (SCons h t) = SCons (f h) (sDFMap f t)
We’re close, but we’re not there yet. This sDFMap function has a major problem: it is inherently ambiguous. The type variable f appears only inside of type family applications, and so there’s no way for GHC to infer its value. This problem has a straightforward solution: use a proxy.
> data Proxy a = P
sDFMap :: forall (f :: TyFun k1 k2 -> *) (ls :: [k1]).
          Proxy f -> (forall a. Sing a -> Sing (Apply f a)) ->
          Sing ls -> Sing (DFMap f ls)
sDFMap _ _ SNil = SNil
sDFMap p f (SCons h t) = SCons (f h) (sDFMap p f t)
This one really does work, but we can still do better. The problem is that some plumbing is exposed. When calling this version of sDFMap with sPred, we still have to explicitly create the proxy argument and give it the correct type, even though we would like to be able to infer it from sPred. The trick is that, while we do need to have f exposed in the type of sDFMap, the location where it is exposed doesn’t matter. It can actually be in an argument to the callback function. This next, final version also contains those pesky class constraints that we’ve been trying to avoid this whole time.
> sDFMap :: forall (f :: TyFun k1 k2 -> *) (ls :: [k1]).
> SingKind (KindParam :: OfKind k2) =>
> (forall a. Proxy f -> Sing a -> Sing (Apply f a)) ->
> Sing ls -> Sing (DFMap f ls)
> sDFMap _ SNil = sNil
> sDFMap f (SCons h t) = sCons (f P h) (sDFMap f t)
To call this function, we will need to create wrappers around, say, sPred and sSucc that explicitly relate the functions to their defunctionalization symbols:
> sPred' :: Proxy PredSym -> Sing n -> Sing (Pred n)
> sPred' _ = sPred
> sSucc' :: Proxy (TyCon Succ) -> Sing n -> Sing (Succ n)
> sSucc' _ = sSucc
Now, finally, we can use sDFMap as desired:
> two_three_four' = sDFMap sSucc' sNums
> zero_one_two = sDFMap sPred' sNums
Conclusions
This whole thing is a bit of a hack, but it’s one that seems to follow a nice pattern that could be automated. In particular, I believe it should be relatively straightforward to incorporate this kind of encoding into a future version of singletons. This would allow more code to be singletonized and support dependently typed programming better.
One clear drawback of this approach is that the arity of a defunctionalized function must be a part of the encoding. In some future world with kind families, it may be possible to generalize the arities. One idea I have for a future blog post is to grapple with higher-arity and partially applied functions, which may be a bit icky. And, what about defunctionalizing higher-order functions?
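For what it’s worth, here is one hedged sketch of how a binary function might be encoded by currying the symbols (the names Plus, PlusSym0, and PlusSym1 are hypothetical; this shows the rough shape, not a worked-out design):

type family Plus (m :: Nat) (n :: Nat) :: Nat
type instance Plus Zero     n = n
type instance Plus (Succ m) n = Succ (Plus m n)

-- One symbol for Plus itself, and one for Plus applied to its first argument:
data PlusSym0 :: TyFun Nat (TyFun Nat Nat -> *) -> *
data PlusSym1 :: Nat -> TyFun Nat Nat -> *

type instance Apply PlusSym0 m     = PlusSym1 m
type instance Apply (PlusSym1 m) n = Plus m n

Note how the arity is baked into the kind of PlusSym0, which is exactly the drawback mentioned above.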
The question at the end of all of this is: could this be an approach to desaturating type families in Haskell? In other words, could an approach based on the ideas here be incorporated into GHC to make all of this Just Work? I’ve thought about it a bit, but not long enough or hard enough to really have an opinion. What do you think?
And, if I write this functionality into singletons, will it be useful to you? We all have limited time, and it’s helpful to know if such an improvement will be put to use.
Comments

It is in fact not quite true that you’re not using GADTs (in comparison with the work of François Pottier and Nadji Gauthier): the use you make of kind-level data is precisely analogous to what GADTs are at the type level.
When you write
type family Apply (f :: k1 -> k2) (a :: k1) :: k2
this should be understood as a kind-polymorphic definition of the form
type family Apply k1 k2 (f :: k1 -> k2) (a :: k1) :: k2
In this style, the PredSym case for Apply would be:
type instance Apply k1 k2 PredSym x = Pred x
Now why is this definition well-typed? Because when matching on PredSym, you learn the kind equalities (k1 = Nat) and (k2 = Nat), which allows you to refine your context and give a more precise type to this branch. This is exactly how GADT type-checking is understood.
There is something even more interesting about your construction. The usual understanding of defunctionalization (typed or not) is as a non-modular transform: you need to define the apply function on all the tags in one single place, so it cannot easily be extended afterwards with new functions. The paper by Pottier and Gauthier claims that this is an orthogonal problem, as using *extensible* sum types (as the exception type in ML) instead of the usual closed GADTs would allow after-the-fact extension of the apply function. It’s not too hard to say that in a “Future Work” section, but in practice I’m not aware of real-world uses of extensible GADTs for modular defunctionalization (Leo P. White has an implementation of extensible sum types as a patch for OCaml, that handles GADTs, https://sites.google.com/site/ocamlopen/ , but to my knowledge the dots have not been connected).
On the contrary, your approach seems naturally modular and extensible. How can this be simpler at the kind level than at the type level? It’s not. It’s just that we have become used to working with type-system features supporting extensibility (addition of new types, new type classes, new instances, and now new instances of type families), so the type-level language of Haskell happens to support exactly this style of extensible GADTs necessary for a modular defunctionalization. This is very good for your present work (it moves defunctionalization from “essentially unusable” to “barely usable”), but it is also maybe a sign of trouble to come: extensible variants are generally harder to handle than simple variants, and surely they’ll find a way to bite Haskell metatheorists back at some point.
Thanks for adding some texture to the background behind this technique.
My understanding of the mechanics of type family instances differs from yours, subtly. In the PredSym instance of Apply, I believe GHC interprets the source as meaning the following:
type instance Apply Nat Nat PredSym x = Pred x
Thus, the instance is well kinded without the need to reason about equalities. (Equalities of the nature you mention — equating, say, k1 with Nat — don’t currently exist in GHC.)
I do certainly agree that this technique just happens to work with Haskell’s extensible type system, and the open kind *, in particular. And, yes, there are some problems with extensibility and modularity, particularly when dealing with overlapping type families.
Lastly, I most certainly agree with your comment “barely usable” as applied to this technique. It’s an idea that might work well with code generators (i.e. the singletons library), but it would be very painful to use directly.
Utilized in the new version of Vinyl, ~0.4, as mentioned in the talk Programming in Vinyl (BayHac 2014).
> Generative means that from an assumption of t a ~ s b, we can derive t ~ s.
I believe it should be `t a ~ s a => t ~ s`
I disagree here — I did mean (t a ~ s b) implies (t ~ s). There is no need for `a` and `b` to be the same. Is there a case you’re worried about?
I’ve returned to the article and I think you’re right. I just always thought of generativity as the property I described above.