Tag Archives: overlapping-tyfams

Coincident Overlap in Type Families

Haskell allows what I will call coincident overlap among type family instances. Coincident overlap occurs when two (or more) type family instances might be applicable to a given type family usage site, but they would evaluate to the same right-hand side. This post, inspired by Andy Adams-Moran’s comment to an earlier blog post, explores coincident overlap and how to extend it (or not!) to branched instances.

This post is a literate Haskell file (though there really isn’t that much code). Paste it into a .lhs file and load it into GHCi. Because the post uses branched instances, you will need the HEAD version of GHC. (Branched instances will be included in GHC 7.8, but not before.)

> {-# LANGUAGE TypeFamilies, DataKinds, GADTs, TypeOperators #-}

Our examples will be over Bools, and we need some way to get GHC to evaluate our type families. The easiest way is to use the following singleton GADT:

> data SBool :: Bool -> * where
>   SFalse :: SBool False
>   STrue  :: SBool True

Conflict checking with type family instances

When compiling type family instances, GHC checks the instances for conflicts. To know if two instances conflict (i.e., could both match the same usage site), GHC unifies the two left-hand sides. For example, the following code is bad and is rejected:

type family F x
type instance F Int = Bool
type instance F a   = Double

Compiling the above instances gives the following error message:

Conflicting family instance declarations:
  F Int -- Defined at ...
  F a -- Defined at ...

This check is a good thing, because otherwise it would be possible to equate two incompatible types, such as Int and Bool.

Coincident overlap among unbranched instances

Here is a nice example of how coincident overlap is useful:

> type family (x :: Bool) && (y :: Bool) :: Bool
> type instance False && a     = False   -- 1
> type instance True  && b     = b       -- 2
> type instance c     && False = False   -- 3
> type instance d     && True  = d       -- 4

Although the first two equations fully define the && operation, the last two instances allow GHC to reduce a use of && that could not otherwise be reducible. For example:

> and1 :: SBool a -> SBool (True && a)
> and1 x = x
> 
> and2 :: SBool a -> SBool (a && True)
> and2 x = x

and1 uses the second instance of &&, but and2 requires the fourth instance. If we comment that instance out, and2 fails to compile, because GHC cannot figure out that a && True must be a for all values of a. For various good reasons, perhaps to be explored in another post, GHC does not do case analysis on types during type inference.

How does GHC know that overlap is coincident? During the conflict check, GHC looks for a substitution that unifies two potentially-conflicting instances. In our case, the fourth and first instances would conflict under the substitution {a |-> True, d |-> False}. However, after finding the unifying substitution, GHC checks the right-hand sides under that same substitution. If they are the same, then GHC considers the overlap to be coincident and allows the instance pair. In our case, applies the substitution {a |-> True, d |-> False} to False and d and discovers that both are False, and so the instances are allowed.

Coincident overlap within branched instances

When thinking about branched instances and coincident overlap, there are two possibilities to consider: coincident overlap within a branched instance and coincident overlap among two separate branched instances. Let’s consider the first case here.

Imagine we define || analogously to &&, but using one branched instance:

> type family (x :: Bool) || (y :: Bool) :: Bool
> type instance where
>   False || a     = a     -- 1
>   True  || b     = True  -- 2
>   c     || False = c     -- 3
>   d     || True  = True  -- 4

Now, let’s consider simplifying the type e || False. The first two branches don’t match, but the third does. Now, following the rule for branched instance simplification (as stated in the Haskell wiki), we check to see if any previous branches might be applicable to e || False, for any possible instantiation of e. The first branch certainly might apply, and so e || False fails to simplify. This is surely counterintuitive, because the third branch matches e || False exactly!

Just to prove this behavior, I tried running this code through GHC:

bar :: SBool a -> SBool (a || False)
bar x = x

Here is the error:

Couldn't match type ‛a’ with ‛a || 'False’

At first blush, it seems I’ve missed something important here in the implementation — allowing coincident overlap within a branched instance. But, there is a major problem with such overlap in this setting. Let’s think about how coincident overlap would work in this setting. After selecting the third branch to simplify e || False (with the substitution {c |-> e}), GHC checks to see if any previous branch could be applicable to e || False. The first branch, False || a, unifies with e || False, so it might be applicable later on. The unifying substitution is {a |-> False, e |-> False}. Now, if we wanted to check for coincident overlap, we would apply both substitutions ({c |-> e} and {a |-> False, e |-> False}) to the right-hand sides. In this case, we would see that both right-hand sides would become False, and it seems we should allow the simplification of e || False to e.

Let’s try a harder case. What if we want to simplify (G f) || False, for some type family G? The third branch matches, with the substitution {c |-> G f}. Now, we check earlier branches for applicability. The first branch is potentially applicable, if G f simplifies to False. But, we can’t produce a substitution over type variables to witness to check right-hand sides. In this case, it wouldn’t be hard to imagine a substitution like {(G f) |-> False}, but that’s a slippery slope to slide down. What if f appears multiple times in the type, perhaps under different type family applications? How do we deal with this? There may well be an answer, but it would be subtle and likely quite fragile from the programmer’s point of view. So, we decided to ignore the possibility of coincident overlap within a branch. We were unable to come up with a compelling example of why anyone would want this feature, it seemed hard to get right, and we can just write || using separate instances, anyway.

Coincident overlap between branched instances

Consider the following (contrived) example:

type family F (x :: Bool) (y :: Bool) :: *
type instance where
  F False True = Int
  F a     a    = Int
  F b     c    = Bool
type instance F d True = Int

Is this set of instances allowable? Is that your final answer?

I believe that this set of instances wouldn’t produce any conflicts. Anything that matches F d True would have to match one of the first two branches of the branched instance, meaning that the right-hand sides coincide. However, it is difficult to reason about such cases, for human and GHC alike. So, for the sake of simplicity, we also forbid coincident overlap whenever even one instance is branched. This means that

type instance F Int = Bool

and

type instance where F Int = Bool

are very subtly different.

Andy Adams-Moran’s example

This post was partially inspired by Andy Adams-Moran’s comment in which Andy poses this example, paraphrased slightly:

> data T a
> type family Equiv x y :: Bool
> type instance where
>    Equiv a      a     = True        -- 1
>    Equiv (T b)  (T c) = True        -- 2
>    Equiv (t d)  (t e) = Equiv d e   -- 3
>    Equiv f      g     = False       -- 4

Alas, this does not work as well as one would like. The problem is that we cannot simplify, say, Equiv (T a) (T b) to True, because this matches the second branch but might later match the first branch. Simplifying this use would require coincident overlap checking within branched instances. We could move the first branch to the third position, and that would help, but not solve the problem. With that change, Equiv x x would not simplify, until the head of x were known.

So, is this the “compelling example” we were looking for? Perhaps. Andy, why do you want this? Can your use case be achieved with other means? Do you (anyone out there) have a suggestion for how to deal with coincident overlap within branched instances in a simple, easy-to-explain manner?

Ordered overlapping type family instances

I am pleased to announce that I have pushed my implementation of ordered overlapping type family instances to GHC HEAD.

This blog post is a literate Haskell file. Copy and paste into a .lhs file to try out this code. This file will only compile with GHC HEAD, however.

We need some header formalities:

> {-# LANGUAGE TypeFamilies, DataKinds, PolyKinds, TypeOperators #-}
> import Prelude hiding (zipWith)

The Problem

When writing term-level functions, it is natural to write a series of equations, each using a sequence of patterns to select which equation should be triggered when calling the function. Critically for this discussion, the first matching equation is used. Let’s use a particularly favorite function of mine as an example:

> import Prelude hiding (zipWith)
> 
> zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
> zipWith f (a:as) (b:bs) = (f a b):(zipWith f as bs)
> zipWith _ _      _      = []

Let’s try to naively write this function at the type level on promoted lists:

type family ZipWith (f :: a -> b -> c) (as :: [a]) (bs :: [b]) :: [c]
type instance ZipWith f (a ': as) (b ': bs) = (f a b) ': (ZipWith f as bs)
type instance ZipWith f as        bs        = '[]

Urk. We get the following error:

Conflicting family instance declarations:
  ZipWith k k k f ((':) k a as) ((':) k b bs)
  ZipWith k k k f as bs

(The repetition of the variable k is incorrect, and has been reported as GHC bug #7524. This is not the issue we are pursuing here, however.)

The problem is really that type instance declarations are essentially unordered. The order in which they appear in a file is irrelevant to GHC. Relatedly, a programmer can define instances of the same type family in multiple modules. With separate compilation, the lack of ordering and the overlap check are necessary for type soundness. This is quite different from term-level function definition equations. All equations defining the same function not only have to be in the same module, but they must be one right after another.

The particular example here has an easy solution. Because we are matching over a closed kind ([a] at the kind level), we could simply expand out the different cases we wish to match against. However, this solution is not possible when matching over an open kind, such as *. We’ll see a useful example of overlap involving * shortly.

The Solution

GHC HEAD now contains an implementation for ordered overlapping type family instances. The example above can be written thus:

> type family ZipWith (f :: a -> b -> c) (as :: [a]) (bs :: [b]) :: [c]
> type instance where
>   ZipWith f (a ': as) (b ': bs) = (f a b) ': (ZipWith f as bs)
>   ZipWith f as        bs        = '[]

More interestingly, we can now define this:

> type family Equals (a :: k) (b :: k) :: Bool
> type instance where
>   Equals a a = True
>   Equals a b = False

Ordered overlapping type family instances allow us to define a general, write-once use-everywhere Boolean equality at the type level. Yay!

This new form of type family instance also seems to close the biggest known gap between the expressivity of functional dependencies and type families: functional dependencies have long supported overlap (through OverlappingInstances or IncoherentInstances) that type families could not. Although functional dependencies’ overlap requires ordering based on specificity and type families’ overlap is based on an explicit ordering, it would seem that any code that took advantage of functional dependencies and overlap can now be translated to use overlapping type families.

Details

  • type instance where does not work with associated types. Class instances can be sprinkled across modules, and having this form of overlap appear across modules would not be type safe in the presence of separate compilation.
  • type instance where does not work with associated types, even when the overlap is intended to exist just within one instance. There is no great reason for this restriction, but it seemed unimportant. Yell if this matters to you.
  • Choosing which equation in a group to use is somewhat delicate. For example, consider the Equals type family. What if we want to simplify Equals a Int? Well, we can’t. That’s because a might (sometimes) be instantiated to Int, and if we simplified Equals a Int to False, we would have a type soundness issue. So, perhaps counterintuitively, we can’t simplify even Equals a b to False until a and b are known.

This GHC wiki page gives an outline of how to get GHC compiling on your machine so you can play with this feature in HEAD. I don’t imagine it will be in 7.6.2, but I believe it will be in 7.8.1, whenever that is released. Enjoy, and please post any feedback!

Acknowledgments

Many thanks to Simon Peyton Jones, Dimitrios Vytiniotis, and Stephanie Weirich for getting me started and helping me think through some of the finer points.

Variable-arity zipWith

At ICFP in September, an interesting problem was posed: Is it possible to define a variable-arity zipWith in Haskell using GHC 7.6.1? Can we leverage the new expressivity in promoted types and kinds to do away with zipWith3, zipWith4 and friends? The answer turns out to be yes.

Let’s start by enabling a bunch of non-controversial language options and declaring the module:

{-# LANGUAGE TypeFamilies, ExplicitForAll, DataKinds, GADTs,
    	     MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

module ZipWith where

import Prelude hiding (zipWith)

Though promotion is not strictly necessary to pull this off, it turns out to be convenient for GHC to kind-check our code. We define the natural numbers to use at the kind level:

data Nat = Zero | Succ Nat

Now, we need to start thinking about what the type of a variable-arity zipWith must be. Clearly, it will need to take the function to apply and a bunch of lists, but the number of lists is not known when we write the type. We correspondingly don’t know how many arguments the function itself should take. We’ve narrowed our type down to f -> <dragons>, for some function type f. The dragons will have to be some type-level function that evaluates to the correct sequence of arrows and argument types, based on the type substituted for f.

Examples may help here:

  • If f is a -> b, then the dragons should be [a] -> [b].
  • If f is a -> b -> c, then the dragons should be [a] -> [b] -> [c].
  • and so on.

OK. That’s not too hard. We essentially want to map the type-level [] operator over the components of the type of f. However, a problem lurks: what if the final result type is itself an arrow? In the first example above, there is nothing stopping b from being d -> e. This turns out to be a fundemental ambiguity in variable-arity zipWith. Let’s explore this for a moment.

We’ll need a three-argument function to make the discussion interesting. Here is such a function:

splotch :: Int -> Char -> Double -> String
splotch a b c = (show a) ++ (show b) ++ (show c)

Now, there are two conceivable ways to apply splotch with zipWith:

*ZipWith> :t zipWith2 splotch
zipWith2 splotch :: [Int] -> [Char] -> [Double -> String]
*ZipWith> :t zipWith3 splotch
zipWith3 splotch :: [Int] -> [Char] -> [Double] -> [String]

(Here, zipWith2 is really just the zipWith in the Prelude.)

In general, there is no way for an automated system to know which one of these possibilities we want, so it is sensible to have to provide a number to the dragons, which we’ll now name Listify. This number is the number of arguments to the function f. Here is the definition for Listify:

-- Map the type constructor [] over the types of arguments and return value of
-- a function
type family Listify (n :: Nat) (arrows :: *) :: *
type instance Listify (Succ n) (a -> b) = [a] -> Listify n b
type instance Listify Zero a = [a]

Now it would seem we can write the type of zipWith. Except, when we think about it, we realize that the operation of zipWith will have to be different depending on the choice for n in Listify. Because this n is a type, it is not available at runtime. We will need some runtime value that the implementation of zipWith can branch on.

Furthermore, we will need to convince GHC that we’re not doing something very silly, like trying Listify (Succ (Succ (Succ Zero))) (Int -> Int). So, we create a GADT that achieves both of these goals. A value from this GADT will both be a runtime witness controlling how zipWith should behave and will assert at compile time that the argument to Listify is appropriate:

-- Evidence that a function has at least a certain number of arguments
data NumArgs :: Nat -> * -> * where
  NAZero :: NumArgs Zero a
  NASucc :: NumArgs n b -> NumArgs (Succ n) (a -> b)

oneArg = NASucc NAZero
twoArgs = NASucc oneArg
threeArgs = NASucc twoArgs

Finally, we can give the type for zipWith:

zipWith :: NumArgs numArgs f -> f -> Listify numArgs f

Note that, though this zipWith is variable-arity, we still have to tell it the desired arity. More on this point later.

Once we have the type, we still need an implementation, which will need to be recursive both in the length of the lists and the number of arguments. When we think about recursion in the number of arguments to f, currying comes to the rescue… almost. Consider the following:

zipWith threeArgs splotch [1,2] ['a','b'] [3.5,4.5]

We would like a recursive call to come out to be something like

zipWith twoArgs  ['a','b'] [3.5,4.5]

The problem is that there is no replacement for <splotch ??> that works. We want to apply (splotch 1) to the first members of the lists and to apply (splotch 2) to the second members. What we really need is to take a list of functions to apply. Let’s call the function that works with list of functions listApply. Then, the recursive call would look like

listApply twoArgs [splotch 1, splotch 2] ['a','b'] [3.5,4.5]

With such a listApply function, we can now implement zipWith:

zipWith numArgs f = listApply numArgs (repeat f)

The type and implementation of listApply is perhaps a little hard to come up with, but otherwise unsurprising.

-- Variable arity application of a list of functions to lists of arguments
-- with explicit evidence that the number of arguments is valid
listApply :: NumArgs n a -> [a] -> Listify n a
listApply NAZero fs = fs
listApply (NASucc na) fs = listApply na . apply fs
  where apply :: [a -> b] -> [a] -> [b]
        apply (f:fs) (x:xs) = (f x : apply fs xs)
        apply _      _      = []

And now we’re done. Here are some examples of it all working:

example1 = listApply (NASucc NAZero) (repeat not) [False,True]
example2 = listApply (NASucc (NASucc NAZero)) (repeat (+)) [1,3] [4,5]

example3 = zipWith twoArgs (&&) [False, True, False] [True, True, False]
example4 = zipWith twoArgs (+) [1,2,3] [4,5,6]

example5 = zipWith threeArgs splotch [1,2,3] ['a','b','c'] [3.14, 2.1728, 1.01001]

But wait: can we do better? The zipWith built here still needs to be told what its arity should be. Notwithstanding the ambiguity mentioned above, can we somehow infer this arity?

I have not come up with a way to do this in GHC 7.6.1. But, I happen to (independently) be working on an extension to GHC to allow ordering among type family instance equations, just like equations for term-level functions are ordered. GHC will try the first equation and then proceed to other only if the first doesn’t match. The details are beyond the scope of this post (but will hopefully appear later), but you can check out the GHC wiki page on the subject. The following example should hopefully make sense:

-- Count the number of arguments of a function
type family CountArgs (f :: *) :: Nat
type instance where
  CountArgs (a -> b) = Succ (CountArgs b)
  CountArgs result = Zero

This function counts the number of arrows in a function type. Note that this cannot be defined without ordered equations, because there is no way in GHC 7.6.1 to say that result (the type variable in the last equation) is not an arrow.

Now, all we need to do is to be able to make the runtime witness of the argument count implicit through the use of a type class:

-- Use type classes to automatically infer NumArgs
class CNumArgs (numArgs :: Nat) (arrows :: *) where
  getNA :: NumArgs numArgs arrows
instance CNumArgs Zero a where
  getNA = NAZero
instance CNumArgs n b => CNumArgs (Succ n) (a -> b) where
  getNA = NASucc getNA

Here is the new, implicitly specified variable-arity zipWith:

{-# LANGUAGE ScopedTypeVariables #-}
-- Variable arity zipWith, inferring the number of arguments and using
-- implicit evidence of the argument count.
-- Calling this requires having a concrete return type of the function to
-- be applied; if it's abstract, we can't know how many arguments the function
-- has. So, zipWith (+) ... won't work unless (+) is specialized.
zipWith' :: forall f. CNumArgs (CountArgs f) f => f -> Listify (CountArgs f) f
zipWith' f = listApply (getNA :: NumArgs (CountArgs f) f) (repeat f)

This version does compile and work with my enhanced version of GHC. Expect to see ordered type family instances coming to a GHC near you soon!