Category Archives: Code

Coincident Overlap in Type Families

Haskell allows what I will call coincident overlap among type family instances. Coincident overlap occurs when two (or more) type family instances might be applicable to a given type family usage site, but they would evaluate to the same right-hand side. This post, inspired by Andy Adams-Moran’s comment to an earlier blog post, explores coincident overlap and how to extend it (or not!) to branched instances.

This post is a literate Haskell file (though there really isn’t that much code). Paste it into a .lhs file and load it into GHCi. Because the post uses branched instances, you will need the HEAD version of GHC. (Branched instances will be included in GHC 7.8, but not before.)

> {-# LANGUAGE TypeFamilies, DataKinds, GADTs, TypeOperators #-}

Our examples will be over Bools, and we need some way to get GHC to evaluate our type families. The easiest way is to use the following singleton GADT:

> data SBool :: Bool -> * where
>   SFalse :: SBool False
>   STrue  :: SBool True

Conflict checking with type family instances

When compiling type family instances, GHC checks the instances for conflicts. To know if two instances conflict (i.e., could both match the same usage site), GHC unifies the two left-hand sides. For example, the following code is bad and is rejected:

type family F x
type instance F Int = Bool
type instance F a   = Double

Compiling the above instances gives the following error message:

Conflicting family instance declarations:
  F Int -- Defined at ...
  F a -- Defined at ...

This check is a good thing, because otherwise it would be possible to equate two incompatible types, such as Int and Bool.

Coincident overlap among unbranched instances

Here is a nice example of how coincident overlap is useful:

> type family (x :: Bool) && (y :: Bool) :: Bool
> type instance False && a     = False   -- 1
> type instance True  && b     = b       -- 2
> type instance c     && False = False   -- 3
> type instance d     && True  = d       -- 4

Although the first two equations fully define the && operation, the last two instances allow GHC to reduce a use of && that could not otherwise be reducible. For example:

> and1 :: SBool a -> SBool (True && a)
> and1 x = x
> 
> and2 :: SBool a -> SBool (a && True)
> and2 x = x

and1 uses the second instance of &&, but and2 requires the fourth instance. If we comment that instance out, and2 fails to compile, because GHC cannot figure out that a && True must be a for all values of a. For various good reasons, perhaps to be explored in another post, GHC does not do case analysis on types during type inference.

How does GHC know that overlap is coincident? During the conflict check, GHC looks for a substitution that unifies two potentially-conflicting instances. In our case, the fourth and first instances would conflict under the substitution {a |-> True, d |-> False}. However, after finding the unifying substitution, GHC checks the right-hand sides under that same substitution. If they are the same, then GHC considers the overlap to be coincident and allows the instance pair. In our case, applies the substitution {a |-> True, d |-> False} to False and d and discovers that both are False, and so the instances are allowed.

Coincident overlap within branched instances

When thinking about branched instances and coincident overlap, there are two possibilities to consider: coincident overlap within a branched instance and coincident overlap among two separate branched instances. Let’s consider the first case here.

Imagine we define || analogously to &&, but using one branched instance:

> type family (x :: Bool) || (y :: Bool) :: Bool
> type instance where
>   False || a     = a     -- 1
>   True  || b     = True  -- 2
>   c     || False = c     -- 3
>   d     || True  = True  -- 4

Now, let’s consider simplifying the type e || False. The first two branches don’t match, but the third does. Now, following the rule for branched instance simplification (as stated in the Haskell wiki), we check to see if any previous branches might be applicable to e || False, for any possible instantiation of e. The first branch certainly might apply, and so e || False fails to simplify. This is surely counterintuitive, because the third branch matches e || False exactly!

Just to prove this behavior, I tried running this code through GHC:

bar :: SBool a -> SBool (a || False)
bar x = x

Here is the error:

Couldn't match type ‛a’ with ‛a || 'False’

At first blush, it seems I’ve missed something important here in the implementation — allowing coincident overlap within a branched instance. But, there is a major problem with such overlap in this setting. Let’s think about how coincident overlap would work in this setting. After selecting the third branch to simplify e || False (with the substitution {c |-> e}), GHC checks to see if any previous branch could be applicable to e || False. The first branch, False || a, unifies with e || False, so it might be applicable later on. The unifying substitution is {a |-> False, e |-> False}. Now, if we wanted to check for coincident overlap, we would apply both substitutions ({c |-> e} and {a |-> False, e |-> False}) to the right-hand sides. In this case, we would see that both right-hand sides would become False, and it seems we should allow the simplification of e || False to e.

Let’s try a harder case. What if we want to simplify (G f) || False, for some type family G? The third branch matches, with the substitution {c |-> G f}. Now, we check earlier branches for applicability. The first branch is potentially applicable, if G f simplifies to False. But, we can’t produce a substitution over type variables to witness to check right-hand sides. In this case, it wouldn’t be hard to imagine a substitution like {(G f) |-> False}, but that’s a slippery slope to slide down. What if f appears multiple times in the type, perhaps under different type family applications? How do we deal with this? There may well be an answer, but it would be subtle and likely quite fragile from the programmer’s point of view. So, we decided to ignore the possibility of coincident overlap within a branch. We were unable to come up with a compelling example of why anyone would want this feature, it seemed hard to get right, and we can just write || using separate instances, anyway.

Coincident overlap between branched instances

Consider the following (contrived) example:

type family F (x :: Bool) (y :: Bool) :: *
type instance where
  F False True = Int
  F a     a    = Int
  F b     c    = Bool
type instance F d True = Int

Is this set of instances allowable? Is that your final answer?

I believe that this set of instances wouldn’t produce any conflicts. Anything that matches F d True would have to match one of the first two branches of the branched instance, meaning that the right-hand sides coincide. However, it is difficult to reason about such cases, for human and GHC alike. So, for the sake of simplicity, we also forbid coincident overlap whenever even one instance is branched. This means that

type instance F Int = Bool

and

type instance where F Int = Bool

are very subtly different.

Andy Adams-Moran’s example

This post was partially inspired by Andy Adams-Moran’s comment in which Andy poses this example, paraphrased slightly:

> data T a
> type family Equiv x y :: Bool
> type instance where
>    Equiv a      a     = True        -- 1
>    Equiv (T b)  (T c) = True        -- 2
>    Equiv (t d)  (t e) = Equiv d e   -- 3
>    Equiv f      g     = False       -- 4

Alas, this does not work as well as one would like. The problem is that we cannot simplify, say, Equiv (T a) (T b) to True, because this matches the second branch but might later match the first branch. Simplifying this use would require coincident overlap checking within branched instances. We could move the first branch to the third position, and that would help, but not solve the problem. With that change, Equiv x x would not simplify, until the head of x were known.

So, is this the “compelling example” we were looking for? Perhaps. Andy, why do you want this? Can your use case be achieved with other means? Do you (anyone out there) have a suggestion for how to deal with coincident overlap within branched instances in a simple, easy-to-explain manner?

Defunctionalization for the win

I enjoy using a type system to help make sure my term level code is unimpeachably correct. This is where my interest in writing the singletons library came from. This library allows you to write some dependently typed code in Haskell, using singleton types. I didn’t invent this idea, but I did write a nice library to remove some of the pain of using this encoding. SHE can be considered an ancestor of singletons.

At my Haskell Symposium (2012) presentation of the singletons work, an attendee asked if singleton generation works for higher-order functions, like map. I innocently answered “yes”, at which point Conor McBride, sitting in the back, stood up and said “I don’t believe you!” I wasn’t lying — singletons does indeed handle higher-order functions. However, Conor’s skepticism isn’t unfounded: a “singletonized” higher-order function isn’t so useful.

This blog post explores why singletonized higher-order functions aren’t useful and suggests defunctionalization as a way to fix the problem.

Before we get too much further, this blog post is a literate Haskell file, so we have some necessary throat-clearing:

> {-# LANGUAGE TemplateHaskell, DataKinds, PolyKinds, TypeFamilies,
>              GADTs, FlexibleContexts, RankNTypes, TypeOperators #-}
> import Prelude hiding (map)
> import Data.Singletons

I should also warn that some of the code rendered as Haskell in this blog post does not have bird-tracks. This code is not intended as part of the executable code. I should finally note that this code does not compile with the current HEAD of GHC (but it does compile with 7.6.1, at least). The new ambiguity check overzealously flags some of the code here as inherently ambiguous, which it is not. I have filed bug report #7804.

Introduction to singletons

First off, what is a singleton? I will give a brief introduction here, but I refer you to the singletons paper on the subject. A singleton is a type with exactly one value. Let’s make a singleton for the natural numbers:

> data Nat = Zero | Succ Nat
> $(genSingletons [''Nat])

The second line above generates the following singleton definition for Nat:

data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSucc :: SNat n -> SNat (Succ n)

(Well, it doesn’t quite generate that, but let’s pretend it does. See the paper [or use -ddump-splices!] for more details.) According to this definition, there is exactly one value for every SNat n. For example, the type SNat (Succ Zero) has one value: SSucc SZero. This is interesting because it means that once we identify a value, say in a case expression, we also know a type index. This interplay between term-level matching and type-level information is what makes singletons enable something like dependently typed programming.

The singletons library provides a singleton for [], but with alphanumeric names. We can pretend this is the definition:

data SList :: [k] -> * where
  SNil  :: SList '[]
  SCons :: Sing h -> SList t -> SList (h ': t)

The Sing in there (the first parameter to SCons) is defined thus:

data family Sing (a :: k)

Using a data family allows GHC to choose the correct singleton type, depending on the kind k. An instance of this family is defined for every singleton type we create. So, actually, the list type built into the singletons library is more like

data instance Sing (list :: [k]) where
  SNil  :: Sing '[]
  SCons :: Sing h -> Sing t -> Sing (h ': t)

The singletons library also provides a synonym SList to refer to this instance of the Sing family. Again, you may find more clarity in the singletons paper, which spends a little more time drawing all of this out.

Singleton first-order functions

We can singletonize more than just datatypes. We can singletonize functions. Consider the following predecessor function on Nats, defined in such a way that we get the singleton definition generated for us:

> $(singletons [d|
>   pred :: Nat -> Nat
>   pred Zero     = Zero
>   pred (Succ n) = n
>   |])

A definition of pred that works on singleton Nats is generated for us. It looks something like

sPred :: SNat n -> SNat ???
sPred SZero     = SZero
sPred (SSucc n) = n

The problem is those pesky ??? marks. Because the type indices of a singleton mirror the computation of singleton values, every function on singletons must be mirrored at the type level. So, to define sPred, we must have the type family Pred as well:

type family Pred (n :: Nat) :: Nat
type instance Pred Zero     = Zero
type instance Pred (Succ n) = n

sPred :: SNat n -> SNat (Pred n)
...

The singletons library generates both the type-level Pred and the singletonized sPred.

Singleton higher-order functions

But what about map?

> $(singletons [d|
>   map :: (a -> b) -> [a] -> [b]
>   map _ []      = []
>   map f (h : t) = f h : map f t
>   |])

The singletons library generates these definitions (but with some extra class constraints that don’t concern us):

type family Map (f :: k1 -> k2) (list :: [k1]) :: [k2]
type instance Map f '[]      = '[]
type instance Map f (h ': t) = ((f h) ': (Map f t)

sMap :: (forall a. Sing a -> Sing (f a)) -> Sing list -> Sing (Map f list)
sMap _ SNil        = SNil
sMap f (SCons h t) = SCons (f h) (sMap f t)

Whoa! What’s the bizarre type doing in sMap? The forall declares that the function passed into sMap must be valid for any a. That’s not so strange, when we think about the fact the index a must be isomorphic to the term Sing a. We’re used to having functions that work for any term. Here, because of the relationship between term values and type indices, the function must also work for any type index a. This is particularly important, because Map will apply f to all the as in the list list. If we leave off the forall, the function won’t type check.

This is all well and good, but this definition of sMap isn’t useful. This is because the type of that first parameter is quite restrictive. We must have a function of that type, and f must be inferrable. Let’s look at some examples. We can write the following just fine:

> sOne   = SSucc SZero
> sTwo   = SSucc sOne
> sThree = SSucc sTwo
> sNums  = SCons sOne $ SCons sTwo $ SCons sThree SNil -- [1,2,3]
> 
> two_three_four = sMap sSucc sNums

(sSucc is a so-called “smart” constructor. It is equivalent to SSucc, but adds extra class constraints that don’t concern us here. See Section 3.1 of the singletons paper.) The type of SSucc is forall a. Sing a -> Sing (Succ a), so the call to sMap type checks just fine. SSucc is perfect here. Let’s try something else:

zero_one_two = sMap sPred sNums

The type of sPred is forall n. Sing n -> Sing (Pred n), as written above, so one would think all is good. All is not good. The problem is that Pred is a type family, not a regular type constructor like good old Succ. Thus, GHC does not (and cannot, with good reason) infer that f in the type of sMap should be Pred:

Couldn't match type `Pred t1' with `t t1'
Expected type: Sing Nat t1 -> Sing Nat (t t1)
  Actual type: Sing Nat t1 -> Sing Nat (Pred t1)

The reason this inference is bogus is that GHC will not let a type variable unify with a type family. In its internal constraint-solving machinery, GHC assumes that all type variables are both injective and generative. Injective means that from an assumption of t a ~ t b, (where ~ denotes type equality) we can derive a ~ b. Generative means that from an assumption of t a ~ s b, we can derive t ~ s. Type families, in general, have neither property. So, GHC won’t let a type variable unify with a type family.

This problem — called the saturation requirement of type families — is what Conor was thinking about when he disbelieved that singletons handled map.

Defunctionalization

Over lunch while at ICFP, I had the good fortune of sitting with Tillmann Rendel, and we got to talking about this problem. He suggested that I think about defunctionalization. I have thought about this, and I think it’s the answer to the problem.

Defunctionalization is an old technique of dealing with higher-order functions. The idea is that, instead of making a closure or other pointer to code, represent a function with some symbol that can be looked up and linked to the code later. Danvy and Nielsen wrote a more recent paper explaining how the whole thing works. One drawback of the technique that they outline is that defunctionalization tends to require whole-program translation. That is, the transformation requires looking at the entire codebase to do the translation. This is generally necessary so that the table of function “symbols”, encoded as an algebraic datatype, can be matched on. However, in Haskell, we have open type functions, so this problem does not limit us.

Another drawback of defunctionalization is that it is generally poorly-typed. If we are just using some symbol to denote a function, how can we be sure that a function application is well-typed? Pottier and Gauthier address this issue in their paper on the topic by using generalized algebraic datatypes (GADTs). But, given the way type families work in Haskell, we don’t need the power of GADTs to do this for us.

Encoding defunctionalization in Haskell type families

At the heart of any defunctionalization scheme is an apply function:

type family Apply (f :: k1 -> k2) (a :: k1) :: k2

But wait, we don’t really want Apply to have that kind, because then we would have to pass Pred in as the function, and Pred all on its own is unsaturated. What we need is some symbol that can represent Pred:

type family Apply (f :: *) (a :: k1) :: k2
data PredSym :: *
type instance Apply PredSym n = Pred n

This is progress. We can now pass PredSym around all by itself, and when we apply it, we get the desired behavior. But, this is weakly kinded. We would like to be able to define many symbols akin to PredSym, and we would like GHC to be able to make sure that we use these symbols appropriately — that is, we don’t say Apply PredSym '[Int, Bool].

Yet, we still want to be able to create new symbols at will. So, we want to use data declarations to create the symbols. Thus, the kind of these symbols must end in -> *. But, we have complete freedom as to what appears to the left of that arrow. We will use this definition to store the kinds:

> data TyFun :: * -> * -> *

Now, we can make our richly kinded Apply:

> type family Apply (f :: (TyFun k1 k2) -> *) (x :: k1) :: k2
> data PredSym :: (TyFun Nat Nat) -> *
> type instance Apply PredSym x = Pred x

This one works. But, we want it to work also for real type constructors (like Succ), not just type families. We have to wrap these type constructors in an appropriately kinded wrapper:

> data TyCon :: (k1 -> k2) -> (TyFun k1 k2) -> *
> type instance Apply (TyCon tc) x = tc x

Then, we define a new version of sMap that works with Apply:

> type family DFMap (f :: (TyFun k1 k2) -> *) (ls :: [k1]) :: [k2]
> type instance DFMap f '[]      = '[]
> type instance DFMap f (h ': t) = (Apply f h ': DFMap f t)

sDFMap :: forall (f :: TyFun k1 k2 -> *) (ls :: [k1]).
          (forall a. Sing a -> Sing (Apply f a)) -> Sing ls -> Sing (DFMap f ls)
sDFMap _ SNil        = SNil
sDFMap f (SCons h t) = SCons (f h) (sDFMap f t)

We’re close, but we’re not there yet. This sDFMap function has a major problem: it is inherently ambiguous. The type variable f appears only inside of type family applications, and so there’s no way for GHC to infer its value. This problem has a straightforward solution: use a proxy.

> data Proxy a = P

sDFMap :: forall (f :: TyFun k1 k2 -> *) (ls :: [k1]).
          Proxy f -> (forall a. Sing a -> Sing (Apply f a)) ->
          Sing ls -> Sing (DFMap f ls)
sDFMap _ _ SNil        = SNil
sDFMap p f (SCons h t) = SCons (f h) (sDFMap p f t)

This one really does work, but we can still do better. The problem is that some plumbing is exposed. When calling this version of sDFMap with sPred, we still have to explicitly create the proxy argument and give it the correct type, even though we would like to be able to infer it from sPred. The trick is that, while we do need to have f exposed in the type of sDFMap, the location where it is exposed doesn’t matter. It can actually be in an argument to the callback function. This next, final version also contains those pesky class constraints that we’ve been trying to avoid this whole time.

> sDFMap :: forall (f :: TyFun k1 k2 -> *) (ls :: [k1]).
>           SingKind (KindParam :: OfKind k2) =>
>           (forall a. Proxy f -> Sing a -> Sing (Apply f a)) ->
>           Sing ls -> Sing (DFMap f ls)
> sDFMap _ SNil        = sNil
> sDFMap f (SCons h t) = sCons (f P h) (sDFMap f t)

To call this function, we will need to create wrappers around, say, sPred and sSucc that explicitly relate the functions to their defunctionalization symbols:

> sPred' :: Proxy PredSym -> Sing n -> Sing (Pred n)
> sPred' _ = sPred
> sSucc' :: Proxy (TyCon Succ) -> Sing n -> Sing (Succ n)
> sSucc' _ = sSucc

Now, finally, we can use sDFMap as desired:

> two_three_four' = sDFMap sSucc' sNums
> zero_one_two = sDFMap sPred' sNums

Conclusions

This whole thing is a bit of a hack, but it’s one that seems to follow a nice pattern that could be automated. In particular, I believe it should be relatively straightforward to incorporate this kind of encoding into a future version of singletons. This would allow more code to be singletonized and support dependently typed programming better.

One clear drawback of this approach is that the arity of a defunctionalized function must be a part of the encoding. In some future world with kind families, it may be possible to generalize the arities. One idea I have for a future blog post is to grapple with higher-arity and partially applied functions, which may be a bit icky. And, what about defunctionalizing higher-order functions?

The question at the end of all of this is: could this be an approach to desaturating type families in Haskell? In other words, could an approach based on the ideas here be incorporated into GHC to make all of this Just Work? I’ve thought about it a bit, but not long enough or hard enough to really have an opinion. What do you think?

And, if I write this functionality into singletons, will it be useful to you? We all have limited time, and it’s helpful to know if such an improvement will be put to use.

Variable-arity zipWith

At ICFP in September, an interesting problem was posed: Is it possible to define a variable-arity zipWith in Haskell using GHC 7.6.1? Can we leverage the new expressivity in promoted types and kinds to do away with zipWith3, zipWith4 and friends? The answer turns out to be yes.

Let’s start by enabling a bunch of non-controversial language options and declaring the module:

{-# LANGUAGE TypeFamilies, ExplicitForAll, DataKinds, GADTs,
    	     MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

module ZipWith where

import Prelude hiding (zipWith)

Though promotion is not strictly necessary to pull this off, it turns out to be convenient for GHC to kind-check our code. We define the natural numbers to use at the kind level:

data Nat = Zero | Succ Nat

Now, we need to start thinking about what the type of a variable-arity zipWith must be. Clearly, it will need to take the function to apply and a bunch of lists, but the number of lists is not known when we write the type. We correspondingly don’t know how many arguments the function itself should take. We’ve narrowed our type down to f -> <dragons>, for some function type f. The dragons will have to be some type-level function that evaluates to the correct sequence of arrows and argument types, based on the type substituted for f.

Examples may help here:

  • If f is a -> b, then the dragons should be [a] -> [b].
  • If f is a -> b -> c, then the dragons should be [a] -> [b] -> [c].
  • and so on.

OK. That’s not too hard. We essentially want to map the type-level [] operator over the components of the type of f. However, a problem lurks: what if the final result type is itself an arrow? In the first example above, there is nothing stopping b from being d -> e. This turns out to be a fundemental ambiguity in variable-arity zipWith. Let’s explore this for a moment.

We’ll need a three-argument function to make the discussion interesting. Here is such a function:

splotch :: Int -> Char -> Double -> String
splotch a b c = (show a) ++ (show b) ++ (show c)

Now, there are two conceivable ways to apply splotch with zipWith:

*ZipWith> :t zipWith2 splotch
zipWith2 splotch :: [Int] -> [Char] -> [Double -> String]
*ZipWith> :t zipWith3 splotch
zipWith3 splotch :: [Int] -> [Char] -> [Double] -> [String]

(Here, zipWith2 is really just the zipWith in the Prelude.)

In general, there is no way for an automated system to know which one of these possibilities we want, so it is sensible to have to provide a number to the dragons, which we’ll now name Listify. This number is the number of arguments to the function f. Here is the definition for Listify:

-- Map the type constructor [] over the types of arguments and return value of
-- a function
type family Listify (n :: Nat) (arrows :: *) :: *
type instance Listify (Succ n) (a -> b) = [a] -> Listify n b
type instance Listify Zero a = [a]

Now it would seem we can write the type of zipWith. Except, when we think about it, we realize that the operation of zipWith will have to be different depending on the choice for n in Listify. Because this n is a type, it is not available at runtime. We will need some runtime value that the implementation of zipWith can branch on.

Furthermore, we will need to convince GHC that we’re not doing something very silly, like trying Listify (Succ (Succ (Succ Zero))) (Int -> Int). So, we create a GADT that achieves both of these goals. A value from this GADT will both be a runtime witness controlling how zipWith should behave and will assert at compile time that the argument to Listify is appropriate:

-- Evidence that a function has at least a certain number of arguments
data NumArgs :: Nat -> * -> * where
  NAZero :: NumArgs Zero a
  NASucc :: NumArgs n b -> NumArgs (Succ n) (a -> b)

oneArg = NASucc NAZero
twoArgs = NASucc oneArg
threeArgs = NASucc twoArgs

Finally, we can give the type for zipWith:

zipWith :: NumArgs numArgs f -> f -> Listify numArgs f

Note that, though this zipWith is variable-arity, we still have to tell it the desired arity. More on this point later.

Once we have the type, we still need an implementation, which will need to be recursive both in the length of the lists and the number of arguments. When we think about recursion in the number of arguments to f, currying comes to the rescue… almost. Consider the following:

zipWith threeArgs splotch [1,2] ['a','b'] [3.5,4.5]

We would like a recursive call to come out to be something like

zipWith twoArgs  ['a','b'] [3.5,4.5]

The problem is that there is no replacement for <splotch ??> that works. We want to apply (splotch 1) to the first members of the lists and to apply (splotch 2) to the second members. What we really need is to take a list of functions to apply. Let’s call the function that works with list of functions listApply. Then, the recursive call would look like

listApply twoArgs [splotch 1, splotch 2] ['a','b'] [3.5,4.5]

With such a listApply function, we can now implement zipWith:

zipWith numArgs f = listApply numArgs (repeat f)

The type and implementation of listApply is perhaps a little hard to come up with, but otherwise unsurprising.

-- Variable arity application of a list of functions to lists of arguments
-- with explicit evidence that the number of arguments is valid
listApply :: NumArgs n a -> [a] -> Listify n a
listApply NAZero fs = fs
listApply (NASucc na) fs = listApply na . apply fs
  where apply :: [a -> b] -> [a] -> [b]
        apply (f:fs) (x:xs) = (f x : apply fs xs)
        apply _      _      = []

And now we’re done. Here are some examples of it all working:

example1 = listApply (NASucc NAZero) (repeat not) [False,True]
example2 = listApply (NASucc (NASucc NAZero)) (repeat (+)) [1,3] [4,5]

example3 = zipWith twoArgs (&&) [False, True, False] [True, True, False]
example4 = zipWith twoArgs (+) [1,2,3] [4,5,6]

example5 = zipWith threeArgs splotch [1,2,3] ['a','b','c'] [3.14, 2.1728, 1.01001]

But wait: can we do better? The zipWith built here still needs to be told what its arity should be. Notwithstanding the ambiguity mentioned above, can we somehow infer this arity?

I have not come up with a way to do this in GHC 7.6.1. But, I happen to (independently) be working on an extension to GHC to allow ordering among type family instance equations, just like equations for term-level functions are ordered. GHC will try the first equation and then proceed to other only if the first doesn’t match. The details are beyond the scope of this post (but will hopefully appear later), but you can check out the GHC wiki page on the subject. The following example should hopefully make sense:

-- Count the number of arguments of a function
type family CountArgs (f :: *) :: Nat
type instance where
  CountArgs (a -> b) = Succ (CountArgs b)
  CountArgs result = Zero

This function counts the number of arrows in a function type. Note that this cannot be defined without ordered equations, because there is no way in GHC 7.6.1 to say that result (the type variable in the last equation) is not an arrow.

Now, all we need to do is to be able to make the runtime witness of the argument count implicit through the use of a type class:

-- Use type classes to automatically infer NumArgs
class CNumArgs (numArgs :: Nat) (arrows :: *) where
  getNA :: NumArgs numArgs arrows
instance CNumArgs Zero a where
  getNA = NAZero
instance CNumArgs n b => CNumArgs (Succ n) (a -> b) where
  getNA = NASucc getNA

Here is the new, implicitly specified variable-arity zipWith:

{-# LANGUAGE ScopedTypeVariables #-}
-- Variable arity zipWith, inferring the number of arguments and using
-- implicit evidence of the argument count.
-- Calling this requires having a concrete return type of the function to
-- be applied; if it's abstract, we can't know how many arguments the function
-- has. So, zipWith (+) ... won't work unless (+) is specialized.
zipWith' :: forall f. CNumArgs (CountArgs f) f => f -> Listify (CountArgs f) f
zipWith' f = listApply (getNA :: NumArgs (CountArgs f) f) (repeat f)

This version does compile and work with my enhanced version of GHC. Expect to see ordered type family instances coming to a GHC near you soon!