Monthly Archives: December 2012

Ordered overlapping type family instances

I am pleased to announce that I have pushed my implementation of ordered overlapping type family instances to GHC HEAD.

This blog post is a literate Haskell file. Copy and paste into a .lhs file to try out this code. This file will only compile with GHC HEAD, however.

We need some header formalities:

> {-# LANGUAGE TypeFamilies, DataKinds, PolyKinds, TypeOperators #-}
> import Prelude hiding (zipWith)

The Problem

When writing term-level functions, it is natural to write a series of equations, each using a sequence of patterns to select which equation should be triggered when calling the function. Critically for this discussion, the first matching equation is used. Let’s use a particularly favorite function of mine as an example:

> import Prelude hiding (zipWith)
> 
> zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
> zipWith f (a:as) (b:bs) = (f a b):(zipWith f as bs)
> zipWith _ _      _      = []

Let’s try to naively write this function at the type level on promoted lists:

type family ZipWith (f :: a -> b -> c) (as :: [a]) (bs :: [b]) :: [c]
type instance ZipWith f (a ': as) (b ': bs) = (f a b) ': (ZipWith f as bs)
type instance ZipWith f as        bs        = '[]

Urk. We get the following error:

Conflicting family instance declarations:
  ZipWith k k k f ((':) k a as) ((':) k b bs)
  ZipWith k k k f as bs

(The repetition of the variable k is incorrect, and has been reported as GHC bug #7524. This is not the issue we are pursuing here, however.)

The problem is really that type instance declarations are essentially unordered. The order in which they appear in a file is irrelevant to GHC. Relatedly, a programmer can define instances of the same type family in multiple modules. With separate compilation, the lack of ordering and the overlap check are necessary for type soundness. This is quite different from term-level function definition equations. All equations defining the same function not only have to be in the same module, but they must be one right after another.

The particular example here has an easy solution. Because we are matching over a closed kind ([a] at the kind level), we could simply expand out the different cases we wish to match against. However, this solution is not possible when matching over an open kind, such as *. We’ll see a useful example of overlap involving * shortly.

The Solution

GHC HEAD now contains an implementation for ordered overlapping type family instances. The example above can be written thus:

> type family ZipWith (f :: a -> b -> c) (as :: [a]) (bs :: [b]) :: [c]
> type instance where
>   ZipWith f (a ': as) (b ': bs) = (f a b) ': (ZipWith f as bs)
>   ZipWith f as        bs        = '[]

More interestingly, we can now define this:

> type family Equals (a :: k) (b :: k) :: Bool
> type instance where
>   Equals a a = True
>   Equals a b = False

Ordered overlapping type family instances allow us to define a general, write-once use-everywhere Boolean equality at the type level. Yay!

This new form of type family instance also seems to close the biggest known gap between the expressivity of functional dependencies and type families: functional dependencies have long supported overlap (through OverlappingInstances or IncoherentInstances) that type families could not. Although functional dependencies’ overlap requires ordering based on specificity and type families’ overlap is based on an explicit ordering, it would seem that any code that took advantage of functional dependencies and overlap can now be translated to use overlapping type families.

Details

  • type instance where does not work with associated types. Class instances can be sprinkled across modules, and having this form of overlap appear across modules would not be type safe in the presence of separate compilation.
  • type instance where does not work with associated types, even when the overlap is intended to exist just within one instance. There is no great reason for this restriction, but it seemed unimportant. Yell if this matters to you.
  • Choosing which equation in a group to use is somewhat delicate. For example, consider the Equals type family. What if we want to simplify Equals a Int? Well, we can’t. That’s because a might (sometimes) be instantiated to Int, and if we simplified Equals a Int to False, we would have a type soundness issue. So, perhaps counterintuitively, we can’t simplify even Equals a b to False until a and b are known.

This GHC wiki page gives an outline of how to get GHC compiling on your machine so you can play with this feature in HEAD. I don’t imagine it will be in 7.6.2, but I believe it will be in 7.8.1, whenever that is released. Enjoy, and please post any feedback!

Acknowledgments

Many thanks to Simon Peyton Jones, Dimitrios Vytiniotis, and Stephanie Weirich for getting me started and helping me think through some of the finer points.

A formalization of GHC’s core language

There have been a handful of papers about System FC, the internal language used within GHC ([1] [2] [3] [4], etc.). Each of these papers uses a different characterization of FC, with variations among the definitions and judgments. Unsurprisingly, each of these formalisms differs also from the actual implementation of FC within GHC. At ICFP, Simon PJ asked me to look at GHC’s implementation of System FC, and write it up using formal notation. The result of this work (developed using Ott) now lives in the GHC repo, in docs/core-spec. There are comments in various places reminding GHC contributors to update the formalization as they update the implementation.

Do you have a stake in this work? Are you planning on reasoning about GHC’s core language? Check out the formalism here. Any feedback is welcome!

Decidable Propositional Equality in Haskell

A recent discussion on the cvs-ghc mailing list touched on propositional decidable equality in Haskell. This post will explain propositional equality and consider different encodings of this idea in Haskell.

Preface

This blog post is a literate Haskell file, compatible with GHC 7.6.1. As usual, we need some initial declarations to get off the ground.

> {-# LANGUAGE DataKinds, PolyKinds, GADTs, TypeFamilies,
>              LambdaCase #-}
> {-# OPTIONS_GHC -fwarn-incomplete-patterns #-}
> 
> data Nat = Zero | Succ Nat

Throughout the post, I will be talking about equality on Nats, but the ideas here extend to any types that would admit a good instance of Eq.

We will need length-indexed vectors to make interesting use of the ideas here:

> data Vec :: * -> Nat -> * where
>   VNil  :: Vec a Zero
>   VCons :: a -> Vec a n -> Vec a (Succ n)
> 
> safeHead :: Vec a (Succ n) -> a
> safeHead (VCons h _) = h
> 
> safeTail :: Vec a (Succ n) -> Vec a n
> safeTail (VCons _ t) = t

Note that there is no need for another clause for safeHead or safeTail because the types forbid, for example, safeHead VNil.

Propositional Equality

We are all (hopefully?) familiar with Boolean equality:

> boolEq :: Nat -> Nat -> Bool
> boolEq Zero     Zero     = True
> boolEq (Succ a) (Succ b) = boolEq a b
> boolEq _        _        = False

We can even lift this idea to the type level:

> type family BoolEq (a :: Nat) (b :: Nat) :: Bool
> type instance BoolEq Zero Zero = True
> type instance BoolEq (Succ a) (Succ b) = BoolEq a b
> type instance BoolEq Zero (Succ x) = False
> type instance BoolEq (Succ x) Zero = False

Let’s try to write a function to retrieve the second element from a vector:

cadr :: Vec a n -> a
cadr v = safeHead (safeTail v)

We get an error:

Couldn't match type `n' with 'Succ ('Succ n0)

Naturally, GHC can’t confidently apply safeHead and safeTail because we don’t know that v has at least 2 elements.

Let’s try again:

cadr :: (BoolEq n (Succ (Succ n')) ~ True) => Vec a n -> a
cadr v = safeHead (safeTail v)

Still doesn’t work:

Could not deduce (n ~ 'Succ ('Succ n0))
from the context (BoolEq n ('Succ ('Succ n')) ~ 'True)

The problem is that GHC doesn’t know that our Boolean equality function really shows the equality of two types.

This is a contrived example, though. Instead, let’s consider a program passing around explicit evidence of whether or not a list has at least two elements. If the list doesn’t, the function should return a supplied default.

To pass around evidence of a type-level Boolean quantity, we need the singleton type for Booleans:

> data SBool :: Bool -> * where
>   STrue  :: SBool True
>   SFalse :: SBool False

(If you’ve never seen singleton types before, my singletons paper on the subject contains helpful information.)

cadr :: SBool (BoolEq n (Succ (Succ n'))) -> a -> Vec a n -> a
cadr evidence deflt v = case evidence of
  STrue -> safeHead (safeTail v)
  SFalse -> deflt

Still, no go:

Could not deduce (n ~ 'Succ ('Succ n0))
from the context (BoolEq n ('Succ ('Succ n')) ~ 'True)
  bound by a pattern with constructor
             STrue :: SBool 'True,

In the end, this last example is the same as the previous. Pattern-matching on the SBool just brings the equality (BoolEq n (Succ (Succ n'))) ~ True into the context.

We need to convert Boolean equality to propositional equality, which is denoted by ~. Propositional equality is an equality among types that GHC can make use of in type checking code. To work with propositional equality, we need to make it first class, instead of just a constraint.

> data PropEq :: k -> k -> * where
>   Refl :: PropEq x x

Let’s now try to write a conversion function from Boolean equality to propositional equality:

boolToProp :: (BoolEq a b ~ True) => PropEq a b
boolToProp = Refl

Same old problem:

Could not deduce (a ~ b)
from the context (BoolEq a b ~ 'True)

What we need to do is to build up the propositional equality from pieces that GHC can easily verify are indeed equal. We need an inductive proof that our definition of Boolean equality is correct for any natural number. To write such a proof, we will need to do case analysis on a and b. To do that, in turn, we will need a singleton over natural numbers.

> data SNat :: Nat -> * where
>   SZero :: SNat Zero
>   SSucc :: SNat n -> SNat (Succ n)

Now, let’s write the inductive proof:

boolToProp :: (BoolEq a b ~ True) => SNat a -> SNat b -> PropEq a b
boolToProp SZero SZero = Refl
boolToProp (SSucc x') (SSucc y') = boolToProp x' y'

Oops:

Could not deduce (n ~ 'Succ n)
...
Could not deduce (n1 ~ 'Succ n1)

The problem is that we are returning the result of boolToProp x' y' directly from boolToProp, even though x' and y' have different types than SSucc x and SSucc y. The solution is to use a pattern match on the result from the recursive call. Let’s call the type index associated with x' to be a' and that with y' to be b'. Then, the recursive call gives us (Refl :: PropEq a' b'). If we pattern match on this, we get the propositional equality a' ~ b' into the context. This can be used to show Succ a' ~ Succ b' (which is really just a ~ b), so we can now use Refl once again, though at a different type:

> boolToProp :: (BoolEq a b ~ True) => SNat a -> SNat b -> PropEq a b
> boolToProp SZero SZero = Refl
> boolToProp (SSucc a') (SSucc b') =
>   case boolToProp a' b' of
>     Refl -> Refl

Great. Except now we get this warning:

Pattern match(es) are non-exhaustive
In an equation for `boolToProp':
    Patterns not matched:
        SZero (SSucc _)
        (SSucc _) SZero

The problem is that there is no possible way out in these cases, so we’ll just have to put undefined:

boolToProp SZero (SSucc _) = undefined
boolToProp (SSucc _) SZero = undefined

Wait. Now there’s a new problem:

Couldn't match type 'False with 'True
Inaccessible code in ...
Couldn't match type 'False with 'True
Inaccessible code in ...

GHC rightly determines that these cases are impossible. Why are they impossible? Because we know that BoolEq a b ~ True. In these cases, that wouldn’t be the case, so GHC can’t match False with True.

But now we are in a quandary. Without the extra matches, we get a warning (due to -fwarn-incomplete-patterns, which you should never leave home without). With the matches, we get an error. That’s silly. And others agree that it’s silly, filing bug report #3927. According to the commentary on the bug report, Simon PJ says, “The more complaints the more likely I am to put off other things to do this one!” So, at the risk of pestering dear Simon, if you are annoyed by this, please complain! The best way to complain is simply to add yourself to the Cc list of the bug report. If enough people do this, the bug will get fixed sooner. Or, even better, try to fix it yourself!

So, where does this leave us? I can’t stand a warning in my code, so we’ll suppress it with this:

> boolToProp _ _ = error "bug 3927"

Let’s try to write cadr one last time, this time armed with boolToProp:

> cadr :: SBool (BoolEq n (Succ (Succ n'))) -> SNat n
>      -> SNat n' -> a -> Vec a n -> a
> cadr evidence n n' deflt v = case evidence of
>   STrue -> case boolToProp n (SSucc (SSucc n')) of
>     Refl -> safeHead (safeTail v)
>   SFalse -> deflt

It works! Hurrah!

The sad part here is that, to make it work, we needed to pass around two SNats and perform an O(n) operation (at runtime – the boolToProp “proof” runs!) to prove to GHC that the operation is valid. Can we do better?

Decidable Propositional Equality

Yes, we can.

The problem lies in the fact that we branch on a witness of Boolean equality. There is an alternative: decidable propositional equality. The idea is that instead of just type-level Booleans, decidable equality stores either evidence that two types are equal or evidence that they are not. We know how to write evidence that two types a and b are equal: PropEq a b. What’s the opposite of PropEq a b? It’s the statement that PropEq a b implies falsehood. In Haskell, we can represent falsehood with an empty type.

> data Falsehood
> type Not a = a -> Falsehood

Now, we can define decidable equality in Haskell:

> type DecidableEquality (a :: k) (b :: k) = Either (PropEq a b) (Not (PropEq a b))

We can even write a function to decide equality over Nats. Because this function produces runtime evidence, it uses singletons to work with types.

> decEq :: SNat a -> SNat b -> DecidableEquality a b
> decEq SZero SZero = Left Refl
> decEq (SSucc x') (SSucc y') = case decEq x' y' of
>   Left Refl -> Left Refl
>   Right contra -> Right (\case Refl -> contra Refl)

There’s a little magic going on here, so let’s pause and reflect. The first equation is straightforward. In the second, we recur on a' and b'. If those in fact are equal, we still unpack the Left Refl result and create a new Left Refl at a different type. We’ve seen this before, so it’s OK.

But what’s going on with Right? Once again, let’s call the type index associated with x' to be a', and likewise with y' and b'. The return type of decEq x' y' is DecidableEquality a' b'. Because we’ve matched with Right, we know that contra must have type Not (PropEq a' b'), synonymous with PropEq a' b' -> Falsehood. We must produce something of type PropEq a b -> Falsehood. So, we write a lambda-case pattern match on the PropEq a b to get the equality a ~ b. Because a is Succ a' and b is Succ b', GHC can use a ~ b to derive a' ~ b', and thus we can call contra with Refl :: PropEq a' b'. Whew.

Now, we deal with the failure cases. If we know that, say, a is Zero and b is Succ b', then GHC rightly figures out that PropEq a b (which expands to PropEq Zero (Succ b')) is an empty type.

decEq SZero (SSucc _) = Right (\case {})
decEq (SSucc _) SZero = Right (\case {})

No go:

parse error on input `}'

GHC does not support empty pattern matches. (UPDATE [Jan 4, 2013]: Simon PJ has implemented empty pattern matches in HEAD. Yay!) A feature request to support these was submitted as bug report #2431. Drat. Explicitly pattern matching on Refl gives us an inaccessible code error (correctly), so we are forced to do this dreadful workaround:

> decEq SZero (SSucc _) = Right (\_ -> error "bug 2431")
> decEq (SSucc _) SZero = Right (\_ -> error "bug 2431")

(Incidentally, this bug seems much easier to fix than #3927, so if you have a little time, go for it!)

So, now that we’ve defined decEq, how can we use it? Let’s write a wrapper around safeHead with evidence. We’ll first need a way to eliminate Falsehood. From logic, we learn that falsehood implies anything, neatly expressed in this type:

> exFalsoQuodlibet :: Falsehood -> a

Unfortunately, even with a manifestly empty type, we can’t use an empty pattern match. So, we do this:

> exFalsoQuodlibet = \case _ -> error "bug 2431"

Here is the type for our new head operation:

> safeHead' :: DecidableEquality n Zero -> SNat n -> a -> Vec a n -> a

In this example, as opposed to above, we reason about whether n is Zero. Because we set the example up this way, the evidence when n is not Zero will be necessary for a complete definition of the safeHead' operation.

> safeHead' (Left Refl) _ deflt _ = deflt
> safeHead' (Right contra) n _ v = case n of
>   SZero -> exFalsoQuodlibet (contra Refl)
>   SSucc _ -> safeHead v

Note that this definition is complete: we can use the primitives that we have built up to eliminate the impossible case match. Of course, those primitives had to avoid empty pattern matches. However, it is easy to imagine a future where bugs #2431 and #3927 are gone and we can define this without any partial features of Haskell.

I should also note that we can’t use exFalsoQuodlibet (contra Refl) in the SSucc _ case. GHC rightly complains Couldn't match type 'Succ n1 with 'Zero in the use of Refl.

Fun stuff, no?