Tag Archives: functional dependencies

What are type families?

I just returned from yet another fantastic ICFP week. I always have a blast at the conference. It would be easy to take the conversations I had there and turn them into a full-blown research program that would easily take me to next year’s ICFP… and then rinse and repeat!

But in this post, I want to focus on one particlar thorn in my side that started bothering me more than it had before: type families. I’ve realized, in the course of many conversations, that I don’t really understand them. This post will be more questions than answers as I try to write down what ails me.

This post is Literate Haskell.

> {-# LANGUAGE TypeFamilies, DataKinds, UndecidableInstances, GADTs,
>              MultiParamTypeClasses, FunctionalDependencies,
>              FlexibleInstances, FlexibleContexts #-}
> module WhatAreTypeFamilies where

Total type families are OK

A total type family feels quite comfortable. By total here, I mean a type family whose equations cover all the possible cases and which is guaranteed to terminate. That is, a total type family is properly a function on types. Those other dependently typed languages have functions on types, and they seem to work nicely. I am completely unbothered by total type families.

Non-covering type families are strange

A non-covering type family is a type family whose patterns do not cover the whole space. Let’s consider closed and open families separately, because the issues that come up are different.

Closed type families

For example:

> type family F1 a where
>   F1 Int = Bool

Note that F1 is closed – we won’t ever be adding more equations. F1 is clearly non-covering. Yet, the type F1 Char is perfectly well formed.

> sillyId :: F1 Char -> F1 Char
> sillyId x = x

But sillyId can’t ever be called on a value. This doesn’t stop us from doing chicanery with F1 though.

> data SillyGadt a where
>   MkSillyGadt1 :: Int   -> SillyGadt (F1 Char)
>   MkSillyGadt2 :: Float -> SillyGadt (F1 Double)

Now, even though F1 Char is nonsense, SillyGadt (F1 Char) is inhabited. And GHC distinguishes between F1 Char and F1 Double.

In GHC, a type family that can’t reduce is considered “stuck”. Such a type is effectively a new type, equal only to itself, like empty datatypes a programmer might declare. The only thing different about something like F1 Char from something like Maybe Char is that F1 Char cannot be used in a type pattern (that is, the left-hand side of a type family equation or data family instance). That’s the only difference. So, when we declare F1, we create an infinite family of unique types containing F1 applied to all types of kind *, except for Int, because F1 Int is identically Bool.

I find this treatment of non-covering closed type families rather bizarre. Compare against term-level behavior. When a term-level function is non-covering, passing an argument for which there are no equations immediately results in a call to error. Yet the same thing in types behaves very differently.

I’m not the only one who finds this strange: Lennart Augustsson posted GHC bug #9636 about this very issue. Indeed, my current views on the subject are strongly influenced by Lennart’s comments.

But, sometimes you want stuckness

Sadly, sometimes authors want stuck type families. I know of two examples.

  1. The bowels of GHC defines this:
    type family Any :: k where {}

    Thus, Any is a closed type family, available at any kind, with no equations. GHC makes convenient use for Any. Consider the expression length []. This expression has some pointless polymorphism in it: length can be applied to a list of any type, and [] can be a list of any type. Yet, GHC must choose precisely which type these polymorphic definitions are specialized to. Not to play favorites, GHC chooses Any :: *. Such pointless polymorphism can indeed happen at any kind, so Any is polymorphic in its return kind. (Before kind polymorphism, Any existed as a special form. It’s not so special any more.)

    It’s quite important that Any is a type family, not a datatype. If it were a datatype (and we forget about the weirdness of a datatype polymorphic in its return kind), then it could be matched in type patterns, making this strangeness possible:

    type family ThreeBools (b :: Bool) :: Nat where
      ThreeBools 'True  = 1
      ThreeBools 'False = 2
      ThreeBools Any    = 3

    Bizarre! And yet this was possible in GHC 7.6 and below (modulo the closed type family, which didn’t exist yet). And it caused problems.

    Of course, with Any, we really want it to be stuck. We don’t want Any to reduce to some other type, and we don’t want uses of Any to be errors.

  2. In his work implementing a units-of-measure library via type-checker plugins, Adam Gundry uses closed type families to define type functions whose behavior is specified only via a plugin, not by GHC’s built-in type family reduction rules. He wanted empty closed type families specifically, and so implemented them within GHC. The idea is that units-of-measure with their integer powers form an abelian group, which can be solved via custom machinery but not by GHC’s rather limited left-to-right rewriting used in type family reduction.

    Once again, we want a closed type family that is stuck but is not an error. The existence of multiple such things got me ruminating on all of this some time ago. I attempted to figure this all out in a post on a GHC ticket, but that didn’t get me very far. And I won’t claim to have made any real progress since that post.

Open type families

Having a stuck open type family makes moderately more sense than a stuck closed type family. Suppose we have this:

> type family F2 a
> type instance F2 Int = Bool

Because new equations can be added to open type families at any time, talking about F2 Char isn’t utterly bogus. But it might be nice to have to assert that an equation for F2 Char is in scope before we use it in a type.

Injective type families

Injective type families are a new feature that just got merged with GHC head. Jan Stolarek, Simon Peyton Jones, and I had a paper at Haskell Symposium about them.

The idea is that you can put an injectivity annotation on a type family declaration, which is rather like a functional dependency. (The injective type family code in this post is not runnable, as it works only with GHC HEAD. The feature will be available with GHC 8.0, due out early 2016.)

type family F3 a = r | r -> a where
  F3 Int  = Bool
  F3 Bool = Int
  F3 a    = a

Note that if you know the value of F3 a, then you can discover the value of a: that’s what injectivity is all about.

Now consider this:

type family F4 a = r | r -> a where
  F4 (Maybe a) = a

Looks injective, right? Wrong. The problem is that, if this were injective, we could take a proof that F4 a ~ F4 b and get a proof of a ~ b. Now, consider F4 (Maybe (F4 Int)). This reduces to F4 Int. Thus, we have F4 (Maybe (F4 Int)) ~ F4 Int. And now, by injectivity, we get Maybe (F4 Int) ~ Int. That is, a Maybe <blah> equals an Int. Terrible!

So, in our implementation of injectivity, we exclude type families like F4. But it’s really unexpected that F4 should be rejected (at least to me). The problem, at its core, is about partiality. Note that we needed F4 Int – an un-covered case – to make the bad equality proof.

Looping type families are strange

At first blush, it seems that a looping type families cause chiefly one problem: a looping GHC. But I’m very unconcerned about a looping GHC. After all, if a user has written a non-terminating type family, that amounts to a non-terminating program to be run at compile time. No one should be very surprised when a program that contains a loop does, in fact, loop.

The problem is that looping type families can actually create effectively infinite types. GHC has various protections against actually running off to infinity, but a clever programmer can defeat these protections. And infinite types can cause unsoundness. Indeed, while writing the closed type family paper, we uncovered one such unsoundness, closing bug #8162.

However, even with #8162 gone, we still have a lingering problem. Consider

> type family F5 a b where
>   F5 a a = 'True
>   F5 a b = 'False

This is a straightforward equality test, and it works in many cases. But if we ask F5 a (Maybe a) (for some unknown a), the type family doesn’t reduce. It really should! There is no way a finite a can be the same as Maybe a. But we can have an infinite a, instantiated to MaybeInf:

> type family MaybeInf where MaybeInf = Maybe MaybeInf

MaybeInf here is a perfectly well-formed nullary (0-argument) type family. If the a in F5 a (Maybe a) were instantiated with MaybeInf, then reducing to 'True would be correct. And thus GHC can’t reduce F5 a (Maybe a) to 'False, as one might like.

And this isn’t purely an academic concern. I don’t have links I can share, but this has bitten real programmers doing real work. Once again, the problem boils down to partiality: GHC must be conservative about reducing closed type families (such as F5) because of the possibility that someone, somewhere will write a non-terminating type family. If F5 a (Maybe a) were to reduce to 'False, I’m confident that a clever enough person could use this behavior to implement unsafeCoerce. Similar concerns prevent certain type families from being considered injective – see the paper for the example. If we could somehow be sure that there were no non-terminating type families, these concerns would go away.

Contrast with functional dependencies

Simply, functional dependencies don’t suffer from these problems. Let’s look at the issue around injectivity, as it’s representative. First, recast in terms of functional dependencies:

> class F4' a r | a -> r, r -> a
> instance F4' (Maybe a) a

Note the bidirectional functional dependency. The left-to-right dependency is the built-in dependency inherent in type families. The right-to-left dependency arises from the injectivity annotation.

Our problem case above was F4 (Maybe (F4 Int)). This case maps to

fundep :: (F4' Int f4_int, F4' (Maybe f4_int) result) => ()
fundep = ()

GHC helpfully tells us

Couldn't match type ‘Maybe f4_int’ with ‘Int’
arising from a functional dependency between:
  constraint ‘F4' Int f4_int’
    arising from the type signature for
                   fundep :: (F4' Int f4_int, F4' (Maybe f4_int) result) => Int
  instance ‘F4' (Maybe a) a’

So GHC notices that there is a problem. But even if it didn’t, we’d be OK. That’s because the F4' Int f4_int constraint essentially asserts that there is some value for F4' Int. Of course, there isn’t any such value, and so fundep would never be callable. Indeed, we can’t write any more instances of F4', as anything we could write would violate the dependencies.

The bottom line here is that there is no way to get into trouble here because the constraints necessary to use bogus functional dependencies can never be satisfied.

A radical proposal

So, what should we do about all of this? I’m inspired by the non-problems with functional dependencies. I thus propose the following. (Actually, I don’t. This is too radical. But I’m trying to provoke discussion. So take this all with a heap of salt.)

  1. Remove unassociated open type families. They’re quite awkward anyway – either they should be closed and define some sort of concrete function, or they will need a class constraint nearby to be useful.
  2. Require class constraints to use associated type families. That is, if F a is associated with a class C a, writing F Int without a C Int constraint in scope is an error. This means that the use of an associated type family in the right-hand side of another associated type family will require the class constraint in the instance context.
  3. Require closed type families to be total, with an improved termination checker. Naturally, closed type families would be unable to call partial type families.
  4. Devise a mechanism to associate a closed type family with a class. This would allow for partial closed type families, which indeed are useful.

The bottom line to all of this is that any partial type family should be associated with a class constraint. If the constraint can be satisfied, this means that the type family is defined and well-behaved at the arguments provided. With this, I believe all the problems above go away.

Another way to see this proposal is that functional dependencies are the Right Way to do this, and type families (at least partial ones) are wonky. But type families have much better syntax, and I’d like to keep that.

So, where to go from here? I’m not sure. This proposal is pretty radical, but it could potentially be implemented with the requisite deprecation period, etc. But, mostly, I’m curious to see what conversation this all provokes.