Jakub Arnold's Blog


Using Phantom Types for Extra Safety

If you’ve been programming in a dynamic language, you’ve probably heard that type systems can catch more errors before your application even gets run. The more powerful the type system is, the more you can express in it. And because we’re talking about Haskell, we have a great number of tools at our disposal when trying to express things in terms of the types.

Why is this important? Sometimes a function has an expectation about the value that it’s receiving. In most imperative languages those expectations are implicit and it is up to the programmer to uphold them, as in the following Ruby example:

def foo(bar)
  bar.baz
end

In this example the function foo implicitly expects an object which is not nil. If you call foo(nil), you’ll get an exception at runtime. To combat this we usually write unit tests to verify that our system will never get into a state where the function is passed a nil. This is a very simple example, so let’s take a look at a more complicated one.

Imagine you’re writing a service which receives messages from users, encrypts them, and sends them on through an unsecured channel. The messages are both being sent and received as base64 encoded strings, so you can’t easily tell if a message has been encrypted by just inspecting it.

Here’s how we could represent the message in Haskell and in Ruby, just so that we can compare the code.

data Message = Message String

class Message
  attr_accessor :text

  def initialize(text)
    @text = text
  end
end

Now this is all well and good, but we also want to keep track of whether the message has been encrypted or is still in plain text. To do this in Haskell we’ll use a simple Algebraic Data Type, while in Ruby we’ll add an additional attribute called encrypted, which will default to false.

data Message = PlainText String | Encrypted String

class Message
  attr_accessor :text, :encrypted

  def initialize(text)
    @text = text
    @encrypted = false
  end
end

While the Haskell version is less verbose, it doesn’t give us many more safety guarantees at this point. Let’s say we want to define a function which sends a message. We want it to accept only a message that has been encrypted, since sending a plain text message is unsafe and should not be allowed.

def send_message(message, recipient)
  if message.encrypted
    # send logic
  else
    raise ArgumentError, "Can’t send a plain text message"
  end
end

send :: Message -> Recipient -> IO ()
send (Encrypted m) recipient = undefined  -- placeholder: some magic with m
send (PlainText _) _ = error "Can't send a plain text message"

It doesn’t really matter how we choose to represent this in Haskell. Even if we used a Maybe or Either to handle the failure, we would still have to handle it at runtime. This means the function needs to be tested for the edge case where we pass in a message in an invalid state, and we would also need to test the error handling. This is as far as we can go with Ruby, since there’s no way to enforce more structure in the program.
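To make the Either variant concrete, here is a rough sketch. The names Recipient, SendError, NotEncrypted and checkSendable are invented for this illustration, and the actual delivery is replaced with a putStrLn.

```haskell
module Main where

-- Recipient and SendError are assumptions made for this sketch.
type Recipient = String

data Message = PlainText String | Encrypted String

data SendError = NotEncrypted
  deriving (Eq, Show)

-- Returns the payload if the message is safe to send.
-- The failure is now explicit, but it is still detected at runtime.
checkSendable :: Message -> Either SendError String
checkSendable (Encrypted m) = Right m
checkSendable (PlainText _) = Left NotEncrypted

send :: Message -> Recipient -> IO ()
send message recipient =
  case checkSendable message of
    -- Stand-in for the real delivery logic.
    Right m  -> putStrLn ("sending " ++ m ++ " to " ++ recipient)
    Left err -> putStrLn ("refusing to send: " ++ show err)

main :: IO ()
main = do
  send (Encrypted "c29tZSBjaXBoZXJ0ZXh0") "alice"
  send (PlainText "hello") "alice"
```

Both calls in main compile, and the second one only fails once the program runs, so the Left branch is exactly the kind of code that still needs its own test.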

But wouldn’t it be much nicer if a program that tries to call send with a PlainText message were rejected by the type checker? Such a program is not valid in our business domain and it shouldn’t compile. If we manage to do that, we can save ourselves both the error handling and the tests for it.

To be able to do this we need to express the relationship between the Encrypted message and the send function at the type level. The trick that allows us to do this is called Phantom Types, but to understand those, first let’s take a look at simple parametric data types in Haskell. They are very similar to templates or generics in C++/C#/Java and many other languages. Here’s a simple parametric type:

data Maybe a = Just a | Nothing

The a on the left side is simply a type parameter. If we create a value such as Just (3 :: Int), it has the type Maybe Int: the parameter a is filled in with the type of the value we wrap.

Phantom Types

A type is called a Phantom Type if it has a type parameter which appears only on the left-hand side and is not used by any of the value constructors. Here’s how we would need to modify our Message type to make it into a Phantom Type.

data Message a = Message String

This allows us to have things like Message Int, Message String, Message (Maybe Char), and so on. In itself it might not look appealing, since no matter what type we use it will still have a single value constructor which works with Strings. But let’s expand this further by adding two empty data types, one for each type of the message.
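A small sketch of what this means in practice. The helper names asInt, asMaybeChar, payload and onlyInt are made up for the illustration.

```haskell
module Main where

data Message a = Message String

-- The same constructor builds a value at any instantiation of the parameter.
asInt :: Message Int
asInt = Message "hello"

asMaybeChar :: Message (Maybe Char)
asMaybeChar = Message "hello"

-- The parameter never affects the stored value, so the payload
-- can be read no matter what the parameter is.
payload :: Message a -> String
payload (Message s) = s

-- The parameter is still checked: this accepts only Message Int.
onlyInt :: Message Int -> String
onlyInt = payload

main :: IO ()
main = do
  putStrLn (onlyInt asInt)
  -- putStrLn (onlyInt asMaybeChar)  -- rejected by the type checker
```

The two values hold the same String at runtime, yet the type checker treats them as different types, and that difference is all the technique relies on.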

data Encrypted
data PlainText

This gives us an option to create both Message Encrypted and Message PlainText types. Remember that even if we’re not using the type parameter in any of the constructors, it is still verified by the type system, which means we can change our send function to have the following signature.

send :: Message Encrypted -> Recipient -> IO ()
encrypt :: Message PlainText -> Message Encrypted
decrypt :: Message Encrypted -> Message PlainText

The last thing we need to do to make this completely safe is to make the constructor for Message private and export only a function for creating a new value of the type. Because code outside the module can’t pattern match to extract the inner value or call the constructor directly, the only way to change the state of a Message is through our encrypt and decrypt functions. The function for creating a new Message could look something like this:

newMessage :: String -> Message PlainText
newMessage s = Message s

Now armed with the power of Phantom Types, the following would be rejected by the type system, making it impossible to send plain-text messages.

send (newMessage "hello!") "[email protected]"
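Putting the pieces of this section together, the module could look roughly like the sketch below. Several parts are assumptions added for the illustration: Recipient is taken to be a String, reverse stands in for a real cipher (it is not encryption), ciphertext is a hypothetical read-only accessor, and send only prints instead of talking to a network.

```haskell
-- Message.hs
module Message
  ( Message      -- the type is exported, its constructor is not
  , PlainText
  , Encrypted
  , Recipient
  , newMessage
  , encrypt
  , decrypt
  , ciphertext
  , send
  ) where

-- Empty types, used only as type-level tags.
data PlainText
data Encrypted

data Message a = Message String

-- Assumption: a recipient is identified by a plain String.
type Recipient = String

-- The only way for other modules to create a Message.
newMessage :: String -> Message PlainText
newMessage s = Message s

-- Placeholder transformation: reversing a string is NOT encryption.
encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (reverse s)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message s) = Message (reverse s)

-- Hypothetical accessor: reading the encrypted payload is harmless.
ciphertext :: Message Encrypted -> String
ciphertext (Message s) = s

-- Stand-in for real delivery; accepts encrypted messages only.
send :: Message Encrypted -> Recipient -> IO ()
send m recipient =
  putStrLn ("sending " ++ ciphertext m ++ " to " ++ recipient)
```

With a module like this imported, send (encrypt (newMessage "hello!")) recipient type checks, while passing the result of newMessage directly to send fails because PlainText does not match Encrypted.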

A similar thing could also be implemented using Generalised Algebraic Data Types (GADTs), but that’s outside the scope of this article. If you’re interested in learning more, I recommend checking out the Haskell Wiki article about Phantom Types, which has some great examples, or the WikiBooks entry.

Update: As was just pointed out in the comments on Lobste.rs, it’s worth noting that all of these safety guarantees come for free. The types are erased once the program has been type checked and compiled, so there is no runtime overhead. This might not be obvious to people used to programming in dynamic languages.
