Haskell allows what I will call coincident overlap among type family instances. Coincident overlap occurs when two (or more) type family instances might be applicable to a given type family usage site, but they would evaluate to the same right-hand side. This post, inspired by Andy Adams-Moran’s comment to an earlier blog post, explores coincident overlap and how to extend it (or not!) to branched instances.
This post is a literate Haskell file (though there really isn’t that much code). Paste it into a .lhs file and load it into GHCi. Because the post uses branched instances, you will need the HEAD version of GHC. (Branched instances will be included in GHC 7.8, but not before.)
> {-# LANGUAGE TypeFamilies, DataKinds, GADTs, TypeOperators #-}
Our examples will be over Bool
s, and we need some way to get GHC to evaluate our type families. The easiest way is to use the following singleton GADT:
> data SBool :: Bool -> * where
> SFalse :: SBool False
> STrue :: SBool True
Conflict checking with type family instances
When compiling type family instances, GHC checks the instances for conflicts. To know if two instances conflict (i.e., could both match the same usage site), GHC unifies the two left-hand sides. For example, the following code is bad and is rejected:
type family F x
type instance F Int = Bool
type instance F a = Double
Compiling the above instances gives the following error message:
Conflicting family instance declarations:
F Int -- Defined at ...
F a -- Defined at ...
This check is a good thing, because otherwise it would be possible to equate two incompatible types, such as Int
and Bool
.
Coincident overlap among unbranched instances
Here is a nice example of how coincident overlap is useful:
> type family (x :: Bool) && (y :: Bool) :: Bool
> type instance False && a = False -- 1
> type instance True && b = b -- 2
> type instance c && False = False -- 3
> type instance d && True = d -- 4
Although the first two equations fully define the &&
operation, the last two instances allow GHC to reduce a use of &&
that could not otherwise be reducible. For example:
> and1 :: SBool a -> SBool (True && a)
> and1 x = x
>
> and2 :: SBool a -> SBool (a && True)
> and2 x = x
and1
uses the second instance of &&
, but and2
requires the fourth instance. If we comment that instance out, and2
fails to compile, because GHC cannot figure out that a && True
must be a
for all values of a
. For various good reasons, perhaps to be explored in another post, GHC does not do case analysis on types during type inference.
How does GHC know that overlap is coincident? During the conflict check, GHC looks for a substitution that unifies two potentially-conflicting instances. In our case, the fourth and first instances would conflict under the substitution {a |-> True, d |-> False}
. However, after finding the unifying substitution, GHC checks the right-hand sides under that same substitution. If they are the same, then GHC considers the overlap to be coincident and allows the instance pair. In our case, applies the substitution {a |-> True, d |-> False}
to False
and d
and discovers that both are False
, and so the instances are allowed.
Coincident overlap within branched instances
When thinking about branched instances and coincident overlap, there are two possibilities to consider: coincident overlap within a branched instance and coincident overlap among two separate branched instances. Let’s consider the first case here.
Imagine we define ||
analogously to &&
, but using one branched instance:
> type family (x :: Bool) || (y :: Bool) :: Bool
> type instance where
> False || a = a -- 1
> True || b = True -- 2
> c || False = c -- 3
> d || True = True -- 4
Now, let’s consider simplifying the type e || False
. The first two branches don’t match, but the third does. Now, following the rule for branched instance simplification (as stated in the Haskell wiki), we check to see if any previous branches might be applicable to e || False
, for any possible instantiation of e
. The first branch certainly might apply, and so e || False
fails to simplify. This is surely counterintuitive, because the third branch matches e || False
exactly!
Just to prove this behavior, I tried running this code through GHC:
bar :: SBool a -> SBool (a || False)
bar x = x
Here is the error:
Couldn't match type ‛a’ with ‛a || 'False’
At first blush, it seems I’ve missed something important here in the implementation — allowing coincident overlap within a branched instance. But, there is a major problem with such overlap in this setting. Let’s think about how coincident overlap would work in this setting. After selecting the third branch to simplify e || False
(with the substitution {c |-> e}
), GHC checks to see if any previous branch could be applicable to e || False
. The first branch, False || a
, unifies with e || False
, so it might be applicable later on. The unifying substitution is {a |-> False, e |-> False}
. Now, if we wanted to check for coincident overlap, we would apply both substitutions ({c |-> e}
and {a |-> False, e |-> False}
) to the right-hand sides. In this case, we would see that both right-hand sides would become False
, and it seems we should allow the simplification of e || False
to e
.
Let’s try a harder case. What if we want to simplify (G f) || False
, for some type family G
? The third branch matches, with the substitution {c |-> G f}
. Now, we check earlier branches for applicability. The first branch is potentially applicable, if G f
simplifies to False
. But, we can’t produce a substitution over type variables to witness to check right-hand sides. In this case, it wouldn’t be hard to imagine a substitution like {(G f) |-> False}
, but that’s a slippery slope to slide down. What if f
appears multiple times in the type, perhaps under different type family applications? How do we deal with this? There may well be an answer, but it would be subtle and likely quite fragile from the programmer’s point of view. So, we decided to ignore the possibility of coincident overlap within a branch. We were unable to come up with a compelling example of why anyone would want this feature, it seemed hard to get right, and we can just write ||
using separate instances, anyway.
Coincident overlap between branched instances
Consider the following (contrived) example:
type family F (x :: Bool) (y :: Bool) :: *
type instance where
F False True = Int
F a a = Int
F b c = Bool
type instance F d True = Int
Is this set of instances allowable? Is that your final answer?
I believe that this set of instances wouldn’t produce any conflicts. Anything that matches F d True
would have to match one of the first two branches of the branched instance, meaning that the right-hand sides coincide. However, it is difficult to reason about such cases, for human and GHC alike. So, for the sake of simplicity, we also forbid coincident overlap whenever even one instance is branched. This means that
type instance F Int = Bool
and
type instance where F Int = Bool
are very subtly different.
Andy Adams-Moran’s example
This post was partially inspired by Andy Adams-Moran’s comment in which Andy poses this example, paraphrased slightly:
> data T a
> type family Equiv x y :: Bool
> type instance where
> Equiv a a = True -- 1
> Equiv (T b) (T c) = True -- 2
> Equiv (t d) (t e) = Equiv d e -- 3
> Equiv f g = False -- 4
Alas, this does not work as well as one would like. The problem is that we cannot simplify, say, Equiv (T a) (T b)
to True
, because this matches the second branch but might later match the first branch. Simplifying this use would require coincident overlap checking within branched instances. We could move the first branch to the third position, and that would help, but not solve the problem. With that change, Equiv x x
would not simplify, until the head of x
were known.
So, is this the “compelling example” we were looking for? Perhaps. Andy, why do you want this? Can your use case be achieved with other means? Do you (anyone out there) have a suggestion for how to deal with coincident overlap within branched instances in a simple, easy-to-explain manner?