I just returned from yet another fantastic ICFP week. I always have a blast at the conference. It would be easy to take the conversations I had there and turn them into a full-blown research program that would easily take me to next year’s ICFP… and then rinse and repeat!
But in this post, I want to focus on one particlar thorn in my side that started bothering me more than it had before: type families. I’ve realized, in the course of many conversations, that I don’t really understand them. This post will be more questions than answers as I try to write down what ails me.
This post is Literate Haskell.
> {-# LANGUAGE TypeFamilies, DataKinds, UndecidableInstances, GADTs,
> MultiParamTypeClasses, FunctionalDependencies,
> FlexibleInstances, FlexibleContexts #-}
> module WhatAreTypeFamilies where
Total type families are OK
A total type family feels quite comfortable. By total here, I mean a type family whose equations cover all the possible cases and which is guaranteed to terminate. That is, a total type family is properly a function on types. Those other dependently typed languages have functions on types, and they seem to work nicely. I am completely unbothered by total type families.
Non-covering type families are strange
A non-covering type family is a type family whose patterns do not cover the whole space. Let’s consider closed and open families separately, because the issues that come up are different.
Closed type families
For example:
> type family F1 a where
> F1 Int = Bool
Note that F1
is closed – we won’t ever be adding more equations. F1
is clearly non-covering. Yet, the type F1 Char
is perfectly well formed.
> sillyId :: F1 Char -> F1 Char
> sillyId x = x
But sillyId
can’t ever be called on a value. This doesn’t stop us from doing chicanery with F1
though.
> data SillyGadt a where
> MkSillyGadt1 :: Int -> SillyGadt (F1 Char)
> MkSillyGadt2 :: Float -> SillyGadt (F1 Double)
Now, even though F1 Char
is nonsense, SillyGadt (F1 Char)
is inhabited. And GHC distinguishes between F1 Char
and F1 Double
.
In GHC, a type family that can’t reduce is considered “stuck”. Such a type is effectively a new type, equal only to itself, like empty datatypes a programmer might declare. The only thing different about something like F1 Char
from something like Maybe Char
is that F1 Char
cannot be used in a type pattern (that is, the left-hand side of a type family equation or data family instance). That’s the only difference. So, when we declare F1
, we create an infinite family of unique types containing F1
applied to all types of kind *
, except for Int
, because F1 Int
is identically Bool
.
I find this treatment of non-covering closed type families rather bizarre. Compare against term-level behavior. When a term-level function is non-covering, passing an argument for which there are no equations immediately results in a call to error
. Yet the same thing in types behaves very differently.
I’m not the only one who finds this strange: Lennart Augustsson posted GHC bug #9636 about this very issue. Indeed, my current views on the subject are strongly influenced by Lennart’s comments.
But, sometimes you want stuckness
Sadly, sometimes authors want stuck type families. I know of two examples.
- The bowels of GHC defines this:
type family Any :: k where {}
Thus,
Any
is a closed type family, available at any kind, with no equations. GHC makes convenient use forAny
. Consider the expressionlength []
. This expression has some pointless polymorphism in it:length
can be applied to a list of any type, and[]
can be a list of any type. Yet, GHC must choose precisely which type these polymorphic definitions are specialized to. Not to play favorites, GHC choosesAny :: *
. Such pointless polymorphism can indeed happen at any kind, soAny
is polymorphic in its return kind. (Before kind polymorphism,Any
existed as a special form. It’s not so special any more.)It’s quite important that
Any
is a type family, not a datatype. If it were a datatype (and we forget about the weirdness of a datatype polymorphic in its return kind), then it could be matched in type patterns, making this strangeness possible:type family ThreeBools (b :: Bool) :: Nat where ThreeBools 'True = 1 ThreeBools 'False = 2 ThreeBools Any = 3
Bizarre! And yet this was possible in GHC 7.6 and below (modulo the closed type family, which didn’t exist yet). And it caused problems.
Of course, with
Any
, we really want it to be stuck. We don’t wantAny
to reduce to some other type, and we don’t want uses ofAny
to be errors. - In his work implementing a units-of-measure library via type-checker plugins, Adam Gundry uses closed type families to define type functions whose behavior is specified only via a plugin, not by GHC’s built-in type family reduction rules. He wanted empty closed type families specifically, and so implemented them within GHC. The idea is that units-of-measure with their integer powers form an abelian group, which can be solved via custom machinery but not by GHC’s rather limited left-to-right rewriting used in type family reduction.
Once again, we want a closed type family that is stuck but is not an error. The existence of multiple such things got me ruminating on all of this some time ago. I attempted to figure this all out in a post on a GHC ticket, but that didn’t get me very far. And I won’t claim to have made any real progress since that post.
Open type families
Having a stuck open type family makes moderately more sense than a stuck closed type family. Suppose we have this:
> type family F2 a
> type instance F2 Int = Bool
Because new equations can be added to open type families at any time, talking about F2 Char
isn’t utterly bogus. But it might be nice to have to assert that an equation for F2 Char
is in scope before we use it in a type.
Injective type families
Injective type families are a new feature that just got merged with GHC head. Jan Stolarek, Simon Peyton Jones, and I had a paper at Haskell Symposium about them.
The idea is that you can put an injectivity annotation on a type family declaration, which is rather like a functional dependency. (The injective type family code in this post is not runnable, as it works only with GHC HEAD. The feature will be available with GHC 8.0, due out early 2016.)
type family F3 a = r | r -> a where
F3 Int = Bool
F3 Bool = Int
F3 a = a
Note that if you know the value of F3 a
, then you can discover the value of a
: that’s what injectivity is all about.
Now consider this:
type family F4 a = r | r -> a where
F4 (Maybe a) = a
Looks injective, right? Wrong. The problem is that, if this were injective, we could take a proof that F4 a ~ F4 b
and get a proof of a ~ b
. Now, consider F4 (Maybe (F4 Int))
. This reduces to F4 Int
. Thus, we have F4 (Maybe (F4 Int)) ~ F4 Int
. And now, by injectivity, we get Maybe (F4 Int) ~ Int
. That is, a Maybe <blah>
equals an Int
. Terrible!
So, in our implementation of injectivity, we exclude type families like F4
. But it’s really unexpected that F4
should be rejected (at least to me). The problem, at its core, is about partiality. Note that we needed F4 Int
– an un-covered case – to make the bad equality proof.
Looping type families are strange
At first blush, it seems that a looping type families cause chiefly one problem: a looping GHC. But I’m very unconcerned about a looping GHC. After all, if a user has written a non-terminating type family, that amounts to a non-terminating program to be run at compile time. No one should be very surprised when a program that contains a loop does, in fact, loop.
The problem is that looping type families can actually create effectively infinite types. GHC has various protections against actually running off to infinity, but a clever programmer can defeat these protections. And infinite types can cause unsoundness. Indeed, while writing the closed type family paper, we uncovered one such unsoundness, closing bug #8162.
However, even with #8162 gone, we still have a lingering problem. Consider
> type family F5 a b where
> F5 a a = 'True
> F5 a b = 'False
This is a straightforward equality test, and it works in many cases. But if we ask F5 a (Maybe a)
(for some unknown a
), the type family doesn’t reduce. It really should! There is no way a finite a
can be the same as Maybe a
. But we can have an infinite a
, instantiated to MaybeInf
:
> type family MaybeInf where MaybeInf = Maybe MaybeInf
MaybeInf
here is a perfectly well-formed nullary (0-argument) type family. If the a
in F5 a (Maybe a)
were instantiated with MaybeInf
, then reducing to 'True
would be correct. And thus GHC can’t reduce F5 a (Maybe a)
to 'False
, as one might like.
And this isn’t purely an academic concern. I don’t have links I can share, but this has bitten real programmers doing real work. Once again, the problem boils down to partiality: GHC must be conservative about reducing closed type families (such as F5
) because of the possibility that someone, somewhere will write a non-terminating type family. If F5 a (Maybe a)
were to reduce to 'False
, I’m confident that a clever enough person could use this behavior to implement unsafeCoerce
. Similar concerns prevent certain type families from being considered injective – see the paper for the example. If we could somehow be sure that there were no non-terminating type families, these concerns would go away.
Contrast with functional dependencies
Simply, functional dependencies don’t suffer from these problems. Let’s look at the issue around injectivity, as it’s representative. First, recast in terms of functional dependencies:
> class F4' a r | a -> r, r -> a
> instance F4' (Maybe a) a
Note the bidirectional functional dependency. The left-to-right dependency is the built-in dependency inherent in type families. The right-to-left dependency arises from the injectivity annotation.
Our problem case above was F4 (Maybe (F4 Int))
. This case maps to
fundep :: (F4' Int f4_int, F4' (Maybe f4_int) result) => ()
fundep = ()
GHC helpfully tells us
Couldn't match type ‘Maybe f4_int’ with ‘Int’
arising from a functional dependency between:
constraint ‘F4' Int f4_int’
arising from the type signature for
fundep :: (F4' Int f4_int, F4' (Maybe f4_int) result) => Int
instance ‘F4' (Maybe a) a’
So GHC notices that there is a problem. But even if it didn’t, we’d be OK. That’s because the F4' Int f4_int
constraint essentially asserts that there is some value for F4' Int
. Of course, there isn’t any such value, and so fundep
would never be callable. Indeed, we can’t write any more instances of F4'
, as anything we could write would violate the dependencies.
The bottom line here is that there is no way to get into trouble here because the constraints necessary to use bogus functional dependencies can never be satisfied.
A radical proposal
So, what should we do about all of this? I’m inspired by the non-problems with functional dependencies. I thus propose the following. (Actually, I don’t. This is too radical. But I’m trying to provoke discussion. So take this all with a heap of salt.)
- Remove unassociated open type families. They’re quite awkward anyway – either they should be closed and define some sort of concrete function, or they will need a class constraint nearby to be useful.
- Require class constraints to use associated type families. That is, if
F a
is associated with a classC a
, writingF Int
without aC Int
constraint in scope is an error. This means that the use of an associated type family in the right-hand side of another associated type family will require the class constraint in the instance context. - Require closed type families to be total, with an improved termination checker. Naturally, closed type families would be unable to call partial type families.
- Devise a mechanism to associate a closed type family with a class. This would allow for partial closed type families, which indeed are useful.
The bottom line to all of this is that any partial type family should be associated with a class constraint. If the constraint can be satisfied, this means that the type family is defined and well-behaved at the arguments provided. With this, I believe all the problems above go away.
Another way to see this proposal is that functional dependencies are the Right Way to do this, and type families (at least partial ones) are wonky. But type families have much better syntax, and I’d like to keep that.
So, where to go from here? I’m not sure. This proposal is pretty radical, but it could potentially be implemented with the requisite deprecation period, etc. But, mostly, I’m curious to see what conversation this all provokes.