Jakub Arnold's Blog


Using Phantom Types in Haskell for Extra Safety - Part 2

I’ve received a lot of reactions to the previous blog post about Phantom Types over the past two days, which is why I’ve decided to summarize what I’ve learned in another blog post.

First, here’s a summary of the problem from the previous post. We have a Message which can be either PlainText or Encrypted, and we’ve used Phantom Types to enforce this distinction in the type system:

data Message a = Message String

data PlainText
data Encrypted

-- Type signatures only; the implementations are omitted here.
send :: Message Encrypted -> IO ()
encrypt :: Message PlainText -> Message Encrypted
decrypt :: Message Encrypted -> Message PlainText
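To make the summary concrete, here is a minimal runnable sketch. The cipher is a toy Caesar shift chosen only for illustration (it is not real encryption), and the helpers `newMessage`, `caesar` and `contents`, as well as the printing `send` body, are hypothetical stand-ins rather than code from the previous post.

```haskell
-- The type parameter 'a' never appears on the right-hand side: it is a phantom.
data Message a = Message String

-- Empty types used purely as type-level tags.
data PlainText
data Encrypted

-- Hypothetical smart constructor: every new message starts out as plain text.
newMessage :: String -> Message PlainText
newMessage = Message

-- Toy Caesar shift over code points. Illustration only, NOT real encryption.
caesar :: Int -> String -> String
caesar n = map (toEnum . (+ n) . fromEnum)

encrypt :: Message PlainText -> Message Encrypted
encrypt (Message m) = Message (caesar 3 m)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message m) = Message (caesar (-3) m)

-- Placeholder for the real network call: it just prints the ciphertext.
send :: Message Encrypted -> IO ()
send (Message m) = putStrLn ("sending: " ++ m)

-- Hypothetical accessor for inspecting a message with any tag.
contents :: Message a -> String
contents (Message m) = m
```

With this in place, `send (encrypt (newMessage "hello"))` type-checks, while `send (newMessage "hello")` is rejected at compile time because `Message PlainText` is not `Message Encrypted`.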

Can newtype do the same?

Many people mentioned that we could use Haskell’s newtype to do the same. Here’s how that would look:

data Message = Message String
newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message

-- Type signatures only; the implementations are omitted here.
send :: EncryptedMessage -> IO ()
encrypt :: PlainTextMessage -> EncryptedMessage
decrypt :: EncryptedMessage -> PlainTextMessage

This example would work perfectly fine, and it’s how you’d probably solve this in a statically typed language with no option for representing Phantom Types.
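Filled in with a toy Caesar-shift cipher (an assumption for illustration, not real encryption) and a placeholder `send` that only prints, the newtype version might look like this. Note that every function has to unwrap and rewrap two layers of constructors.

```haskell
data Message = Message String

-- Two unrelated wrapper types; the compiler won't let us mix them up.
newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message

-- Toy Caesar shift over code points. Illustration only, NOT real encryption.
caesar :: Int -> String -> String
caesar n = map (toEnum . (+ n) . fromEnum)

-- Each function unwraps both layers, transforms the text, and rewraps it.
encrypt :: PlainTextMessage -> EncryptedMessage
encrypt (PlainTextMessage (Message m)) = EncryptedMessage (Message (caesar 3 m))

decrypt :: EncryptedMessage -> PlainTextMessage
decrypt (EncryptedMessage (Message m)) = PlainTextMessage (Message (caesar (-3) m))

-- Placeholder for the real network call: it just prints the ciphertext.
send :: EncryptedMessage -> IO ()
send (EncryptedMessage (Message m)) = putStrLn ("sending: " ++ m)
```

The safety guarantee is the same as with Phantom Types: passing a `PlainTextMessage` to `send` is a compile-time error.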

But there’s one downside to this solution. Our new PlainTextMessage and EncryptedMessage are no longer related, which means we can’t directly write a single function that operates on both of them. Why would we need that? I’m glad you asked! Here’s how a simple length function would look in Haskell:

import Prelude hiding (length)  -- avoid clashing with the Prelude's own length

length :: [a] -> Int
length [] = 0
length (_:xs) = 1 + length xs

To calculate the length of a list, we don’t care what is in the list. In the same way, if we want to calculate a messageLength, we don’t care whether the message has been encrypted or not; we just want to count the characters. This is dead simple with Phantom Types, but more awkward with the newtype solution, since PlainTextMessage and EncryptedMessage are two unrelated types rather than two instances of one parameterized type.

messageLength :: Message a -> Int
messageLength (Message m) = length m

As you can see, we simply ignore the type parameter a of the Message type and calculate the length of the inner String.
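A quick check that this single definition really accepts both tagged types. The values `greeting` and `secret` are hypothetical samples, and the "ciphertext" is just a made-up string.

```haskell
data Message a = Message String

data PlainText
data Encrypted

-- The phantom parameter is ignored, so one definition covers every tag.
messageLength :: Message a -> Int
messageLength (Message m) = length m

-- Hypothetical sample values, one per tag.
greeting :: Message PlainText
greeting = Message "hello"

secret :: Message Encrypted
secret = Message "khoor"  -- made-up "ciphertext"
```

Both `messageLength greeting` and `messageLength secret` type-check against the same definition, with no instances or per-tag code.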

We could achieve the same in the newtype solution using type classes, but it would be more complicated than necessary. Phantom Types just fit this problem more naturally.
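For comparison, here is roughly what the type class route could look like. The class name `HasText` and its method `rawText` are made up for this sketch; the point is the extra class and the one instance per wrapper type that the phantom version doesn't need.

```haskell
data Message = Message String

newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message

-- Hypothetical class: any message wrapper we can read the raw text out of.
class HasText m where
  rawText :: m -> String

-- One instance per wrapper type.
instance HasText PlainTextMessage where
  rawText (PlainTextMessage (Message s)) = s

instance HasText EncryptedMessage where
  rawText (EncryptedMessage (Message s)) = s

-- Generic over both wrappers, but only through the class constraint.
messageLength :: HasText m => m -> Int
messageLength = length . rawText
```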

GADTs

Some people have noted that we could achieve the same thing using GADTs (Generalised Algebraic Data Types), an extension to Haskell’s type system. I didn’t want to dive into this at first, since GADTs are much harder to understand for non-Haskell programmers, but let’s look at a simple implementation of this example.

{-# LANGUAGE GADTs #-}

data Encrypted
data PlainText

data Message a where
  EncryptedMessage :: String -> Message Encrypted
  PlainTextMessage :: String -> Message PlainText

The difference here is that we’re creating typed value constructors that automatically fix the resulting type of the Message. For example, EncryptedMessage "hello" automatically has the type Message Encrypted. This might seem the same as the newtype solution above, but with GADTs we can still write a generic messageLength function, much as we did previously.

messageLength :: Message a -> Int
messageLength (EncryptedMessage m) = length m
messageLength (PlainTextMessage m) = length m

The difference is that we need to pattern match on both constructors. An implementation of the send function might look something like this:

send :: Message Encrypted -> IO ()
send (EncryptedMessage m) = putStrLn ("sending: " ++ m)  -- stand-in for the real "magic"

If you’re a bit familiar with Haskell, you might think that this function is not total and could fail with a non-exhaustive pattern match error. In fact it can’t, because its argument has the type Message Encrypted, and the only constructor that produces a Message Encrypted is EncryptedMessage. Calling it with a PlainText message is a type error:

send (PlainTextMessage "hello") -- type error

This is one of the beauties of GADTs. If you’re interested in learning more about them, I recommend reading the Haskell Wiki page on GADTs, among the many other resources available. I’ll probably write a follow-up article dedicated to GADTs, since they’re such a rich feature.
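Putting the GADT pieces together, a minimal runnable sketch might look like the following. As before, the Caesar-shift `caesar` helper and the printing `send` body are placeholders for illustration, not real encryption or networking.

```haskell
{-# LANGUAGE GADTs #-}

data Encrypted
data PlainText

-- Each constructor fixes the type index of the Message it builds.
data Message a where
  EncryptedMessage :: String -> Message Encrypted
  PlainTextMessage :: String -> Message PlainText

-- Toy Caesar shift over code points. Illustration only, NOT real encryption.
caesar :: Int -> String -> String
caesar n = map (toEnum . (+ n) . fromEnum)

-- A Message PlainText can only have been built with PlainTextMessage,
-- so this single equation covers every possible input.
encrypt :: Message PlainText -> Message Encrypted
encrypt (PlainTextMessage m) = EncryptedMessage (caesar 3 m)

decrypt :: Message Encrypted -> Message PlainText
decrypt (EncryptedMessage m) = PlainTextMessage (caesar (-3) m)

-- Placeholder for the real network call: it just prints the ciphertext.
send :: Message Encrypted -> IO ()
send (EncryptedMessage m) = putStrLn ("sending: " ++ m)

-- Generic over the index, at the cost of matching every constructor.
messageLength :: Message a -> Int
messageLength (EncryptedMessage m) = length m
messageLength (PlainTextMessage m) = length m
```

As with the phantom version, `send (PlainTextMessage "hello")` is rejected at compile time.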

Tell don’t ask™

Patrick Dlogan actually took the time to write an article in response to mine, showing a solution in which messages know how to encrypt themselves, which lets you get rid of the if check in a dynamic language. Here’s a similar response from the comments on Lobste.rs:

Message = Struct.new(:text) do
  def ciphertext
    @ciphertext ||= begin
      # encrypt plain text logic
    end
  end
end

def send_message(message)
  # send using message.ciphertext
end

We could label both of these solutions as applications of the tell don’t ask™ principle. Here, it means that instead of encrypting the message first and then sending it out, the encryption happens as part of sending the message.

Here’s how something similar might look in Haskell. We’re simply doing the encryption when sending the message.

-- someMagic and encrypt stand in for the real sending and encryption code
send :: Message -> IO ()
send (Message m) = someMagic (encrypt m)

Now this might make sense in some cases, but what if there is more than one place where a message can get encrypted? We could solve that by making encrypt do nothing for already encrypted messages, but there are downsides to doing that.
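Here is one way the "do nothing for already encrypted messages" idea might look, as a sketch. The two-constructor `Message` type and the toy Caesar `caesar` cipher are assumptions made for illustration, not code from either response.

```haskell
-- Without a phantom type, the message's state lives in a runtime constructor.
data Message = PlainText String | Encrypted String
  deriving (Eq, Show)

-- Toy Caesar shift over code points. Illustration only, NOT real encryption.
caesar :: Int -> String -> String
caesar n = map (toEnum . (+ n) . fromEnum)

-- Idempotent: an already encrypted message passes through untouched.
encrypt :: Message -> Message
encrypt (PlainText m) = Encrypted (caesar 3 m)
encrypt msg@(Encrypted _) = msg

-- Tell, don't ask: send encrypts whatever it is handed.
send :: Message -> IO ()
send msg = case encrypt msg of
  Encrypted c -> putStrLn ("sending: " ++ c)
  -- The compiler can't see that this branch is impossible; we rule it out ourselves.
  PlainText _ -> error "unreachable: encrypt never returns PlainText"
```

Note the last branch of `send`: because the state is a runtime constructor rather than part of the type, the guarantee that `send` only ever sees ciphertext now rests on our own reasoning instead of the compiler.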

First of all, it’s important to realize that this restructures how the program works. If encrypt can fail, we’ve effectively moved that failure to a different place. If encrypt throws an exception that has to be handled, that error handling now needs to happen in the caller of send (assuming it’s not something we can deal with right in place).
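A minimal sketch of that shift, assuming a hypothetical `encrypt` that reports failure through `Either` (here it simply rejects empty input, purely for illustration): once `send` encrypts internally, its own type has to carry the possible failure back to every caller.

```haskell
data Message = Message String

-- Hypothetical failure mode: refuse to encrypt an empty message.
-- The "cipher" is a toy Caesar shift, NOT real encryption.
encrypt :: String -> Either String String
encrypt "" = Left "cannot encrypt an empty message"
encrypt m  = Right (map (toEnum . (+ 3) . fromEnum) m)

-- send now does the encryption itself, so it inherits encrypt's failure
-- and has to hand it back to whoever called send.
send :: Message -> IO (Either String ())
send (Message m) = case encrypt m of
  Left err -> pure (Left err)
  Right c  -> Right <$> putStrLn ("sending: " ++ c)
```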

Another, more important reason why this isn’t always possible is that the code for constructing messages might be outside of our control. Say that all of the logic is hidden in a library which you can’t change for various reasons, or that these are just data types you’re receiving from an API.

The library could still make use of Phantom Types to safely tag the values on the type level, while you wouldn’t be able to apply this tell don’t ask approach, since the encrypt logic is not in your control.

I guess the TL;DR here is that by using the type system in a smart way, we can add checks that are verified at compile time and increase the safety of our programs. It’s not a technique for restructuring or redesigning a portion of the codebase.
