I’ve received a lot of reactions to the previous blog post about Phantom Types over the past two days, which is why I’ve decided to summarize what I’ve learned in another blog post.
First, here’s a summary of the problem from the previous post. We have a Message which can be either PlainText or Encrypted. We’ve used Phantom Types to enforce this in the type system:
data Message a = Message String
data PlainText
data Encrypted
send :: Message Encrypted -> IO ()
encrypt :: Message PlainText -> Message Encrypted
decrypt :: Message Encrypted -> Message PlainText
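To make the summary concrete, here is a minimal sketch that fills in those signatures so the module compiles. The bodies are assumptions for illustration only: reversing the string stands in for a real cipher, printing stands in for sending, and plainText and payload are hypothetical helpers that were not part of the original post.

```haskell
-- Type-level tags: no constructors, they exist only to label a Message.
data PlainText
data Encrypted

-- The parameter `a` never appears on the right-hand side, hence "phantom".
data Message a = Message String

-- Hypothetical helper: the way this sketch creates a plain-text message.
plainText :: String -> Message PlainText
plainText = Message

-- Placeholder cipher: reversing the string stands in for real encryption.
encrypt :: Message PlainText -> Message Encrypted
encrypt (Message m) = Message (reverse m)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message m) = Message (reverse m)

-- Placeholder transport: printing stands in for actually sending.
send :: Message Encrypted -> IO ()
send (Message m) = putStrLn m

-- Hypothetical accessor, used here only to inspect the wrapped String.
payload :: Message a -> String
payload (Message m) = m
```

With these definitions, send (encrypt (plainText "hello")) type checks, while send (plainText "hello") is rejected by the compiler because Message PlainText is not Message Encrypted.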
Can newtype do the same?
Many people mentioned that we could use Haskell’s newtype to do the same. Here’s how that would look:
data Message = Message String
newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message
send :: EncryptedMessage -> IO ()
encrypt :: PlainTextMessage -> EncryptedMessage
decrypt :: EncryptedMessage -> PlainTextMessage
This example would work perfectly fine, and it’s how you’d probably solve this in a statically typed language with no option for representing Phantom Types.
But there’s one downside to this solution. Our new PlainTextMessage and EncryptedMessage are no longer related, which means we can’t write a function that operates on both of them. Why would we need that? I’m glad you asked!
Here’s how a simple length function would look in Haskell.
length :: [a] -> Int
length [] = 0
length (x:xs) = 1 + length xs
In order to calculate the length of a list, we do not care what is in the list. In the same way, if we wanted to calculate a messageLength, we wouldn’t care whether the message has been encrypted or not; we’d just want to count the characters. This is dead simple with Phantom Types, but more awkward with the newtype solution, since PlainTextMessage and EncryptedMessage are parametrically (is that even a word?) not the same thing.
messageLength :: Message a -> Int
messageLength (Message m) = length m
As you can see, we simply ignore the type parameter a of the Message type and calculate the length of the inner String.
We could achieve the same in the newtype solution using type classes, but it would be unnecessarily more complicated. Phantom types just fit this problem more naturally.
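For comparison, here is a rough sketch of what the type class route could look like. The class name HasMessage and its method toMessage are made up for this example; they are one possible encoding, not the only one.

```haskell
data Message = Message String

newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message

-- Hypothetical class: anything that wraps a Message can hand it back.
class HasMessage m where
  toMessage :: m -> Message

instance HasMessage PlainTextMessage where
  toMessage (PlainTextMessage msg) = msg

instance HasMessage EncryptedMessage where
  toMessage (EncryptedMessage msg) = msg

-- Works for both wrappers, but only because each one got its own instance.
messageLength :: HasMessage m => m -> Int
messageLength wrapped =
  case toMessage wrapped of
    Message s -> length s
```

It works, but every new kind of message needs another instance, whereas the phantom type version of messageLength needed nothing extra.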
GADTs
Some people have noted that we could achieve the same thing using GADTs (Generalised Algebraic Data Types), which is an extension to Haskell’s type system. I didn’t want to dive into this at first, since GADTs are much harder to understand for non-Haskell programmers, but let’s show a simple implementation of this example.
data Encrypted
data PlainText
data Message a where
  EncryptedMessage :: String -> Message Encrypted
  PlainTextMessage :: String -> Message PlainText
The difference here is that we’re basically creating typed value constructors which automatically enforce the resulting type of the Message. For example, if we write EncryptedMessage "hello", it will automatically have the type Message Encrypted. This might seem the same as the newtype solution mentioned above, but by using GADTs we can still write a generic messageLength function, much as we did previously.
messageLength :: Message a -> Int
messageLength (EncryptedMessage m) = length m
messageLength (PlainTextMessage m) = length m
The difference here is that we need to pattern match on both of the constructors. An implementation of the send function might look something like this.
send :: Message Encrypted -> IO ()
send (EncryptedMessage m) = -- some magic
If you’re a bit familiar with Haskell, you might be thinking that this function is not total and could produce a non-exhaustive pattern match error. But in fact it can’t, because it expects its argument to be of the type Message Encrypted. If you try to call it with a PlainText message, it will be a type error.
send (PlainTextMessage "hello") -- type error
This is one of the beauties of GADTs. If you’re interested in learning more about them, I recommend reading the Haskell Wiki page as well as many others. I’ll probably write another followup article that explains just GADTs, just because they’re such a rich feature.
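For reference, the GADT snippets from this section fit together into one module roughly like this. The GADTs language pragma is required by GHC; the bodies of encrypt and send are placeholders I’m assuming for the sake of a compilable sketch (string reversal instead of a cipher, printing instead of sending).

```haskell
{-# LANGUAGE GADTs #-}

data Encrypted
data PlainText

-- Each constructor fixes the type parameter of the Message it builds.
data Message a where
  EncryptedMessage :: String -> Message Encrypted
  PlainTextMessage :: String -> Message PlainText

-- Generic over the tag, so both constructors have to be matched.
messageLength :: Message a -> Int
messageLength (EncryptedMessage m) = length m
messageLength (PlainTextMessage m) = length m

-- Only PlainTextMessage can have the type Message PlainText,
-- so a single clause is exhaustive. Reversal is a placeholder cipher.
encrypt :: Message PlainText -> Message Encrypted
encrypt (PlainTextMessage m) = EncryptedMessage (reverse m)

-- Likewise, only EncryptedMessage can reach this function.
-- Printing is a placeholder for the real transport.
send :: Message Encrypted -> IO ()
send (EncryptedMessage m) = putStrLn m
```

Here send (encrypt (PlainTextMessage "hello")) compiles, and send (PlainTextMessage "hello") is still a type error, exactly as described above.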
Tell don’t ask™
Patrick Dlogan actually took the time to write an article as a reaction to mine, where he shows a solution in which messages know how to encrypt themselves, which allows you to get rid of the if check in a dynamic language. Here’s also a similar response from comments on Lobste.rs.
Message = Struct.new(:text) do
  def ciphertext
    @ciphertext ||= # encrypt plain text logic
  end
end

def send_message(message)
  # send using message.ciphertext
end
We could label both of these solutions as a kind of tell don’t ask™ principle. Basically what it means is that instead of performing the encryption first, and then sending the message out, the encryption step is being run directly when sending the message.
Here’s how something similar might look in Haskell. We’re simply doing the encryption when sending the message.
send :: Message -> IO ()
send (Message m) = someMagic (encrypt m)
Now this might make sense in some cases, but what if there is more than one place where a message can get encrypted? We could solve that by making encrypt do nothing for already encrypted messages, but there are downsides to doing that.
First of all, it’s important to realize that this is restructuring how the program works. If encrypt is something that can fail, we’ve effectively moved that failure to a different place. If encrypt was throwing an exception that had to be handled, that error handling now needs to happen at the call site of send (assuming it’s not something we can deal with right in place).
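Here is a small sketch of that shift. I’m using Either instead of an exception to make the failure visible in the types; EncryptError, the rule that empty messages can’t be encrypted, and the bodies of encrypt and someMagic are all assumptions made up for this example.

```haskell
data Message = Message String

-- Hypothetical error type for this sketch.
data EncryptError = EmptyMessage
  deriving (Eq, Show)

-- Placeholder cipher that can fail: it rejects empty input
-- and otherwise just reverses the string.
encrypt :: String -> Either EncryptError String
encrypt "" = Left EmptyMessage
encrypt s  = Right (reverse s)

-- Placeholder transport: printing stands in for sending.
someMagic :: String -> IO ()
someMagic = putStrLn

-- Because encryption now happens inside send, the failure
-- shows up in send's result and every caller has to handle it.
send :: Message -> IO (Either EncryptError ())
send (Message m) =
  case encrypt m of
    Left err         -> pure (Left err)
    Right ciphertext -> Right <$> someMagic ciphertext
```

In the phantom type version, send :: Message Encrypted -> IO () has nothing to report, because whoever produced the Message Encrypted already dealt with the failure.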
Another, more important reason why this wouldn’t always be possible is that the code for constructing messages might be outside of our control. Say that all of the logic is hidden in a library which you can’t change for various reasons, or these are just some data types you’re receiving from an API.
The library could still make use of Phantom Types to safely tag the values on the type level, while you wouldn’t be able to apply this tell don’t ask approach, since the encrypt logic is not in your control.
I guess the TL;DR here is that by using the type system in a smart way we can add additional checks that are verified at compile time and that increase the safety of our programs. It’s not a technique for re-structuring or re-designing a portion of the codebase.