I’ve received a lot of reactions to the previous blog post about Phantom Types over the past two days, which is why I’ve decided to summarize what I’ve learned in another blog post.
First, here’s a summary of the problem from the previous post. We have a Message which can be either PlainText or Encrypted. We’ve used Phantom Types to enforce this in the type system:
    data Message a = Message String

    data PlainText
    data Encrypted

    send :: Message Encrypted -> IO ()
    encrypt :: Message PlainText -> Message Encrypted
    decrypt :: Message Encrypted -> Message PlainText
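To make the signatures above concrete, here is a minimal sketch with the bodies filled in. The ROT13 "cipher" and the helper rot13 are assumptions made purely for illustration (the previous post does not specify how encryption works), and send just prints the payload.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, ord)

-- The type parameter `a` never appears on the right-hand side,
-- which is what makes it a phantom type.
data Message a = Message String

data PlainText
data Encrypted

-- Hypothetical toy cipher: ROT13 on ASCII letters, everything else untouched.
rot13 :: Char -> Char
rot13 c
  | isAsciiLower c = rotate 'a'
  | isAsciiUpper c = rotate 'A'
  | otherwise      = c
  where
    rotate base = chr (ord base + (ord c - ord base + 13) `mod` 26)

encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (map rot13 s)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message s) = Message (map rot13 s)

-- Placeholder for the real sending logic: just prints the payload.
send :: Message Encrypted -> IO ()
send (Message s) = putStrLn ("sending: " ++ s)
```

With these definitions, send (encrypt (Message "hello")) type checks, while send (Message "hello" :: Message PlainText) is rejected by the compiler.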
Can newtype do the same?
Many people mentioned that we could use Haskell’s newtype to do the same. Here’s how that would look.
    data Message = Message String

    newtype PlainTextMessage = PlainTextMessage Message
    newtype EncryptedMessage = EncryptedMessage Message

    send :: EncryptedMessage -> IO ()
    encrypt :: PlainTextMessage -> EncryptedMessage
    decrypt :: EncryptedMessage -> PlainTextMessage
This example would work perfectly fine, and it’s how you’d probably solve this in a statically typed language with no option for representing Phantom Types.
But there’s one downside to this solution. Our new PlainTextMessage and EncryptedMessage types are no longer related, which means we can’t write a single function that operates on both of them. Why would we need that? I’m glad you asked!
Here’s how a simple
length function would look in Haskell.
    length :: [a] -> Int
    length [] = 0
    length (x:xs) = 1 + length xs
In order to calculate the length of a list, we do not care what is in the list.
In the same way, if we wanted to calculate a messageLength, we don’t care whether the message has been encrypted or not, we just want to count the characters. This is dead simple with Phantom Types, but it is much harder with the newtype solution, since PlainTextMessage and EncryptedMessage are parametrically (is that even a word?) not the same thing.
    messageLength :: Message a -> Int
    messageLength (Message m) = length m
As you can see, we simply ignore the type parameter a of the Message and calculate the length of the inner String. We could achieve the same in the newtype solution using type classes, but it would be unnecessarily complicated. Phantom types are just a more natural fit here.
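For comparison, here is a rough sketch of what the type class route could look like for the newtype solution. The class HasText and its method messageText are hypothetical names introduced only for this example.

```haskell
data Message = Message String

newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message

-- Hypothetical class that exposes the underlying text of any message wrapper.
class HasText m where
  messageText :: m -> String

-- Each newtype needs its own instance, even though the bodies are identical.
instance HasText PlainTextMessage where
  messageText (PlainTextMessage (Message s)) = s

instance HasText EncryptedMessage where
  messageText (EncryptedMessage (Message s)) = s

-- Works for both wrappers, but only because of the class constraint.
messageLength :: HasText m => m -> Int
messageLength = length . messageText
```

It works, but every new wrapper requires another instance, whereas the phantom type version needs nothing beyond the single two-line function.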
Some people have noted that we could achieve the same thing using GADTs (Generalised Algebraic Data Types), which are an extension to Haskell’s type system. I didn’t want to dive into this at first, since GADTs are much harder to understand for non-Haskell programmers, but let’s show a simple implementation of this example.
    {-# LANGUAGE GADTs #-}  -- GADT syntax needs this GHC extension

    data Encrypted
    data PlainText

    data Message a where
      EncryptedMessage :: String -> Message Encrypted
      PlainTextMessage :: String -> Message PlainText
The difference here is that we’re basically creating typed value constructors which automatically enforce the resulting type of the Message. For example, if we write EncryptedMessage "hello", it will automatically have the type Message Encrypted. This might seem the same as the newtype solution mentioned above, but by using GADTs we can still write a generic messageLength function with the same type signature as before.
    messageLength :: Message a -> Int
    messageLength (EncryptedMessage m) = length m
    messageLength (PlainTextMessage m) = length m
The difference here is that we need to pattern match on both of the constructors. An implementation of the send function might look something like this:
    send :: Message Encrypted -> IO ()
    send (EncryptedMessage m) = undefined  -- some magic goes here
If you’re a bit familiar with Haskell, you might be thinking that this function is not total and could produce a non-exhaustive pattern match error. But in fact it can’t, because it expects its argument to be of the type Message Encrypted. If you try to call it with a PlainText message, it would be a type error.
    send (PlainTextMessage "hello") -- type error
This is one of the beauties of GADTs. If you’re interested in learning more about them, I recommend reading the Haskell Wiki, as well as many other resources. I’ll probably write another followup article that explains just GADTs, because they’re such a rich feature.
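Putting the GADT pieces from this section together, a self-contained sketch could look like the following. The body of encrypt (reversing the string) and the pure helper render are assumptions added only so the example can be run and checked. They are not part of the original design.

```haskell
{-# LANGUAGE GADTs #-}

data Encrypted
data PlainText

-- Each constructor fixes the type parameter of the Message it builds.
data Message a where
  EncryptedMessage :: String -> Message Encrypted
  PlainTextMessage :: String -> Message PlainText

-- Generic over the tag, so both constructors have to be handled.
messageLength :: Message a -> Int
messageLength (EncryptedMessage m) = length m
messageLength (PlainTextMessage m) = length m

-- Toy stand-in for real encryption (assumption): reverse the text.
encrypt :: Message PlainText -> Message Encrypted
encrypt (PlainTextMessage m) = EncryptedMessage (reverse m)

-- Only one case is needed: PlainTextMessage can never have
-- the type Message Encrypted, so this match is exhaustive.
render :: Message Encrypted -> String
render (EncryptedMessage m) = "sending: " ++ m

send :: Message Encrypted -> IO ()
send = putStrLn . render
```

Note that encrypt and render each match on a single constructor, yet both are total, because the type signature rules out the other constructor.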
Tell don’t ask™
Patrick Dlogan actually took the time to write an article as a reaction to the previous post, where he shows a solution in which messages know how to encrypt themselves, which allows you to get rid of the if check in a dynamic language. Here’s also a similar response from the comments:
    Message = Struct.new(:text) do
      def ciphertext
        @ciphertext ||= # encrypt plain text logic
      end
    end

    def send_message(message)
      # send using message.ciphertext
    end
We could label both of these solutions as applications of the tell don’t ask™ principle. Basically, instead of performing the encryption first and then sending the message out, the encryption step runs as part of sending the message.
Here’s how something similar might look in Haskell. We’re simply doing the encryption when sending the message.
    send :: Message -> IO ()
    send (Message m) = someMagic (encrypt m)
Now this might make sense in some cases, but what if there is more than one place where a message can get encrypted? We could solve that by making encrypt do nothing for already encrypted messages, but there are downsides to this approach.
First of all, it’s important to realize that this restructures how the program works. If encrypt is something that can fail, we’ve effectively moved that failure to a different place. If encrypt was throwing an exception that had to be handled, that error handling now needs to happen at the place of the send (assuming it’s not something we can deal with right in place).
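The point about failure moving around can be sketched with Either instead of exceptions. Everything here is an assumption made for illustration: the EncryptError type, the rule that empty messages fail to encrypt, the reversing "cipher", and the fact that both send functions return the text they would send rather than performing IO.

```haskell
data PlainText
data Encrypted

newtype Message a = Message String

data EncryptError = EmptyMessage
  deriving (Show, Eq)

-- Toy stand-in for real encryption: reverses the text, fails on empty input.
encryptText :: String -> Either EncryptError String
encryptText "" = Left EmptyMessage
encryptText s  = Right (reverse s)

-- Tell-don't-ask style: encryption happens inside the send step,
-- so the send step itself can now fail and callers must handle it there.
sendTelling :: String -> Either EncryptError String
sendTelling plain = fmap ("sent: " ++) (encryptText plain)

-- Phantom type style: the failure stays at the encryption site...
encrypt :: Message PlainText -> Either EncryptError (Message Encrypted)
encrypt (Message s) = fmap Message (encryptText s)

-- ...and sending an already encrypted message cannot fail.
sendTyped :: Message Encrypted -> String
sendTyped (Message s) = "sent: " ++ s
```

In the first style every caller of the send function has to deal with EncryptError. In the second, only the code that calls encrypt does.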
Another, more important reason why this wouldn’t always be possible is that the code for constructing messages might be outside of our control. Say that all of the logic is hidden in a library which you can’t change for various reasons, or these are just some data types you’re receiving from an API.
The library could still make use of Phantom Types to safely tag the values on
the type level, while you wouldn’t be able to apply this tell don’t ask
approach, since the
encrypt logic is not in your control.
I guess the TL;DR here is that by using the type system in a smart way we can add checks that are verified at compile time and increase the safety of our programs. It’s not a technique for re-structuring or re-designing a portion of the codebase.