If you’ve been programming in a dynamic language, you’ve probably heard that type systems can catch more errors before your application even gets run. The more powerful the type system is, the more you can express in it. And because we’re talking about Haskell, we have a great number of tools at our disposal when trying to express things in terms of the types.
Why is this important? Sometimes a function has an expectation about the value that it’s receiving. In most imperative languages those expectations are implicit and it is up to the programmer to uphold them, as in the following Ruby example:
def foo(bar)
  bar.baz
end
In this example the function foo implicitly expects an object which is not nil. If you call foo(nil), you’ll get an exception at runtime. To combat this we usually write unit tests to verify that our system will never get into a state in which the function is passed nil. This is a very simple example, so let’s take a look at a more complicated one.
Imagine you’re writing a service which receives messages from users, encrypts them, and sends them on through an unsecured channel. The messages are both being sent and received as base64 encoded strings, so you can’t easily tell if a message has been encrypted by just inspecting it.
Here’s how we could represent the message in Haskell and in Ruby, just so that we can compare the code.
data Message = Message String
class Message
  attr_accessor :text

  def initialize(text)
    @text = text
  end
end
Now this is all well and good, but we also want to keep track of whether the message has been encrypted or is still in plain text. To do this in Haskell we’ll use a simple Algebraic Data Type, while in Ruby we’ll add an additional attribute, encrypted, which will default to false.
data Message = PlainText String | Encrypted String
class Message
  attr_accessor :text, :encrypted

  def initialize(text)
    @text = text
    @encrypted = false
  end
end
While the Haskell version is less verbose, it doesn’t give us many more safety guarantees at this point. Let’s say we want to define a function which sends a message. We want it to accept only a message that has been encrypted, since sending a plain text message is unsafe and should not be allowed.
def send_message(message, recipient)
  if message.encrypted
    # send logic
  else
    raise ArgumentError, "Can’t send a plain text message"
  end
end
send :: Message -> Recipient -> IO ()
send (Encrypted m) recipient = some magic with m
send (PlainText _) _ = undefined
It doesn’t really matter how we choose to represent this in Haskell. Even if we used Either to handle the failure, we would still have to handle it at runtime. This means that the function needs to be tested for the edge case in which we pass in a message in an invalid state, and we would also need to test the error handling. This is as far as we can go with Ruby, since there’s no way to enforce more structure in the program.
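To make the point about Either concrete, here is a minimal sketch. The names Recipient, SendError and sendChecked are made up for illustration, and the actual sending is replaced by a putStrLn. The type now admits that sending can fail, but the check itself still runs at runtime and the caller must handle the Left case.

```haskell
-- Hypothetical names throughout; Recipient is assumed to be a plain String.
type Recipient = String

data Message = PlainText String | Encrypted String

data SendError = NotEncrypted
  deriving (Show, Eq)

-- The failure is visible in the type, but it is still detected at runtime.
sendChecked :: Message -> Recipient -> Either SendError (IO ())
sendChecked (Encrypted m) recipient =
  Right (putStrLn ("sending " ++ m ++ " to " ++ recipient))
sendChecked (PlainText _) _ = Left NotEncrypted

main :: IO ()
main =
  case sendChecked (PlainText "hello!") "email@example.com" of
    Left err     -> putStrLn ("refused: " ++ show err)
    Right action -> action
```

Both branches of the case expression need tests, which is exactly the work we would like the type checker to take off our hands.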
But wouldn’t it be much nicer if a program that tries to call send with a PlainText message were rejected by the type checker? Such a program is not valid in our business domain and it shouldn’t compile. If we manage to do that, we can save ourselves the error handling, and also the tests for the error case.
To be able to do this we need to express the relationship between the Encrypted message and the send function at the type level. The trick that allows us to do this is called Phantom Types, but to understand those, let’s first take a look at simple parametric data types in Haskell. They are very similar to templates or generics in C++/C#/Java and many other languages.
Here’s a simple parametric type:
data Maybe a = Just a | Nothing
The a on the left side is simply a type parameter. If we choose to create a value such as Just 3, it would have a type such as Maybe Int (the literal 3 is polymorphic, so the exact numeric type depends on context).
A type is called a Phantom Type if it has a type parameter which only appears on the left hand side, but is not used by any of the value constructors. Here’s how we could modify our Message type to make it into a Phantom Type.
data Message a = Message String
This allows us to have types like Message Int, Message String, Message (Maybe Char), and so on. In itself it might not look appealing, since no matter what type we use it will still have a single value constructor which takes a String. But let’s expand this further by adding two empty data types, one for each type of the message.
data Encrypted
data PlainText
This gives us the option to create both Message Encrypted and Message PlainText types. Remember that even if we’re not using the type parameter in any of the constructors, it is still verified by the type system, which means we can give our send function, along with encrypt and decrypt, the following signatures.
send :: Message Encrypted -> Recipient -> IO ()
encrypt :: Message PlainText -> Message Encrypted
decrypt :: Message Encrypted -> Message PlainText
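As a rough sketch of how these signatures fit together in a single file, the block below fills in placeholder bodies. Recipient is assumed to be a String alias, the "cipher" is plain string reversal (a stand-in, not real encryption), and send just prints instead of talking to a network.

```haskell
-- Assumptions: Recipient is a String alias, and string reversal stands in
-- for a real cipher. Neither comes from the original article.
type Recipient = String

data Encrypted
data PlainText

-- The parameter a never appears on the right-hand side.
data Message a = Message String

encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (reverse s)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message s) = Message (reverse s)

send :: Message Encrypted -> Recipient -> IO ()
send (Message s) recipient = putStrLn ("sending " ++ s ++ " to " ++ recipient)

main :: IO ()
main = do
  let plain = Message "hello!" :: Message PlainText
  send (encrypt plain) "email@example.com"
  -- send plain "email@example.com"
  -- The line above does not type check: Message PlainText is not Message Encrypted.
```

Note that the constructor is the same for both states; only the phantom parameter tells them apart, and only the type checker ever looks at it.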
The last thing we would need to do to make this completely safe is to make the value constructor of Message private and only export a function for creating a new instance of the type. This makes it impossible to change the state of a Message in any way other than by using our encrypt and decrypt functions, because you wouldn’t be able to use pattern matching to extract the inner value. The function for creating a new Message could look something like this:
newMessage :: String -> Message PlainText
newMessage s = Message s
Now armed with the power of Phantom Types, the following would be rejected by the type system, making it impossible to send plain-text messages.
send (newMessage "hello!") "email@example.com"
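Hiding the constructor is done through the module’s export list. The layout below is one possible sketch, not the author’s actual code: the module name, the Recipient alias and the function bodies are assumptions, and string reversal again stands in for real encryption.

```haskell
-- Message.hs: a hypothetical module layout.
module Message
  ( Message      -- the type is exported, its constructor is not
  , Encrypted
  , PlainText
  , Recipient
  , newMessage
  , encrypt
  , decrypt
  , send
  ) where

type Recipient = String   -- assumption: recipients are plain strings

data Encrypted
data PlainText

data Message a = Message String

-- The only way for client code to obtain a Message.
newMessage :: String -> Message PlainText
newMessage s = Message s

-- Toy placeholder (string reversal), NOT real encryption.
encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (reverse s)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message s) = Message (reverse s)

send :: Message Encrypted -> Recipient -> IO ()
send (Message s) recipient = putStrLn ("sending " ++ s ++ " to " ++ recipient)
```

A client that imports this module can write send (encrypt (newMessage "hello!")) "email@example.com", but it can neither build a Message Encrypted directly (the constructor is not in scope) nor pass the result of newMessage straight to send.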
A similar thing could also be implemented using Generalised Algebraic Data Types (GADTs), but that’s outside the scope of this article. If you’re interested in learning more, I recommend checking out the Haskell Wiki article about Phantom Types, which has some great examples, or the WikiBooks entry.
Update: As was just pointed out in the comments on Lobste.rs, it’s worth noting that all of these safety guarantees come for free. The types are stripped once the program type checks and compiles, so there is no runtime overhead. This might not be obvious to people used to programming in dynamic languages.