Tuesday, January 29, 2013

On safety and abstraction

Haskell does not have full-fledged dependent types. For example, there is no way to create naturals as a subset of integers. One way to sidestep this restriction is creating an abstract datatype. Define
newtype Natural = N Integer
and add a collection of methods (addition, printing etc.). Clients can interact with this type only via these methods. They cannot construct a pathological value such as N (-1), which could violate some invariants and crash the program.

Furthermore, we can change the implementation later on (for example, to data Natural = Zero | Succ Natural) and the clients will not notice, as long as we do not change the protocol.

It seems that abstract datatypes give us two things at once: safety from invalid values and modularity. I think this is a bad conflation. These two concerns should be separated.

Subsets

Without support of dependent types we can at least label where they should be. Define
newtype Subset a = Subset a

fromSubset :: Subset a -> a
fromSubset (Subset a) = a

unsafeToSubset :: a -> Subset a
unsafeToSubset = Subset
and change the definition of naturals to
newtype Natural = N (Subset Integer)
Here Subset a is an unspecified, context-dependent, subset of type a.

This way safety and modularity is separated. If you match on N it means that you break the module abstraction: your code depends on the implementation details. If, furthermore, you use Subset, you have to guarantee you're not constructing pathological values.

The same thing happens in restricted IO from GHC's manual, which can be defined with:
newtype RIO a = RIO (Subset (IO a))

Quotients

In naturals, unconstrained use of constructor N might lead to nonsensical objects. Sometimes there's a dual situation: unconstrained use of pattern matching might lead to invalid results.
For example, rationals are pairs of integers p/q, but we want 1/2 and 2/4 to be indistinguishable - the "numerator" operation breaks abstraction. In this post I talked about this phenomenon with functions represented by algorithms.
newtype Quotient a = Quotient a

toQuotient :: a -> Quotient a
toQuotient = Quotient

unsafeFromQuotient :: Quotient a -> a
unsafeFromQuotient (Quotient a) = a
Note that the direction of functions is reversed. Going from an object to quotient is well-defined, but when you go the other way you have to guarantee consistent behavior for equivalent answers.

Free groups (see sigfpe's post) can be defined with
newtype FreeGroup x = FreeGroup (Quotient [Either x x])
which means that an element of a free group is represented by a list, but with some identifications: [Right a, Left a] is equivalent to [] since aa-1=1.

What about rationals? An explicit version of
data Rational = Rational Integer Integer
is
data Rational = Rational (Quotient (Integer, Subset Integer))

Here there are both mechanisms: equivalent pairs must be equal, and the denominator must be nonzero. Both constructing and deconstructing rationals is unsafe in full generality; the type tells you exactly in what manner.

Final thoughts

In my opinion marking quotients and subsets should become a custom, even in absence of dependent types. It makes things explicit where they should be.

If you think those newtypes are littering too much, you can at least make them transparent with
type Subset a = a
type Quotient a = a
and let your type definitions speak more.

Finally, while both Subset a and Quotient a are ambiguous (they do not specify the subset or the partition), you can make an indication with a phantom tag:
data NonNegative
newtype Subset' p a = Subset' a
type Natural = Subset' NonNegative Integer
but handling tags gets troublesome rather fast (there is no automatic way to reason on them).

Followers