Haskell typeclasses by example

Arithmetic expressions and a simple virtual machine

This presentation is based on University of Pennsylvania course CIS194, Spring 2013 edition, Homework 5.

This course is online and is a good tool for teaching yourself Haskell.


By Elze Hamilton, @elze , elze.hamilton@gmail.com

What follows is the code for the Parser and StackVM modules. It was written by the authors of CIS194 Homework 5, and provided with the homework. Parser parses strings into arithmetic expressions, and StackVM runs a "virtual machine". I'm including it here because this presentation is generated from a Jupyter IHaskell notebook, and this code is necessary for the code in the notebook to run. Feel free to ignore the next three slides for the purposes of this presentation. The presentation starts at Slide 5.

The code for Parser.hs could be found at Parser.hs

The code for StackVM.hs could be found at StackVM.hs

In [1]:
:extension TypeSynonymInstances
:extension FlexibleInstances
In [2]:
-- Applicative parser for infix arithmetic expressions without any
-- dependency on hackage. Builds an explicit representation of the
-- syntax tree to fold over using client-supplied semantics.
module Parser (parseExp) where
import Control.Applicative hiding (Const)
import Control.Arrow
import Data.Char
import Data.Monoid
import Data.List (foldl')

-- Building block of a computation with some state of type @s@
-- threaded through it, possibly resulting in a value of type @r@
-- along with some updated state.
newtype State s r = State (s -> Maybe (r, s))

-- Expressions
data Expr = Const Integer
          | Add Expr Expr
          | Mul Expr Expr
            deriving Show

instance Functor (State s) where
    fmap f (State g) = State $ fmap (first f) . g

instance Applicative (State s) where
    pure x = State $ \s -> Just (x, s)
    State f <*> State g = State $ \s ->
                          case f s of
                            Nothing -> Nothing
                            Just (r, s') -> fmap (first r) . g $ s'

instance Alternative (State s) where
    empty = State $ const Nothing
    State f <|> State g = State $ \s -> maybe (g s) Just (f s)

-- A parser threads some 'String' state through a computation that
-- produces some value of type @a@.
type Parser a = State String a

-- Parse one numerical digit.
digit :: Parser Integer
digit = State $ parseDigit
    where parseDigit [] = Nothing
          parseDigit s@(c:cs)
              | isDigit c = Just (fromIntegral $ digitToInt c, cs)
              | otherwise = Nothing

-- Parse an integer. The integer may be prefixed with a negative sign.
num :: Parser Integer
num = maybe id (const negate) <$> optional (char '-') <*> (toInteger <$> some digit)
    where toInteger = foldl' ((+) . (* 10)) 0

-- Parse a single white space character.
space :: Parser ()
space = State $ parseSpace
    where parseSpace [] = Nothing
          parseSpace s@(c:cs)
              | isSpace c = Just ((), cs)
              | otherwise = Nothing

-- Consume zero or more white space characters.
eatSpace :: Parser ()
eatSpace = const () <$> many space

-- Parse a specific character.
char :: Char -> Parser Char
char c = State parseChar
    where parseChar [] = Nothing
          parseChar (x:xs) | x == c = Just (c, xs)
                           | otherwise = Nothing

-- Parse one of our two supported operator symbols.
op :: Parser (Expr -> Expr -> Expr)
op = const Add <$> (char '+') <|> const Mul <$> (char '*')

-- Succeed only if the end of the input has been reached.
eof :: Parser ()
eof = State parseEof
    where parseEof [] = Just ((),[])
          parseEof _  = Nothing

-- Parse an infix arithmetic expression consisting of integers, plus
-- signs, multiplication signs, and parentheses.
parseExpr :: Parser Expr
parseExpr = eatSpace *>
            ((buildOp <$> nonOp <*> (eatSpace *> op) <*> parseExpr) <|> nonOp)
    where buildOp x op y = x `op` y
          nonOp = char '(' *> parseExpr <* char ')' <|> Const <$> num

-- Run a parser over a 'String' returning the parsed value and the
-- remaining 'String' data.
execParser :: Parser a -> String -> Maybe (a, String)
execParser (State f) = f

-- Run a parser over a 'String' returning the parsed value.
evalParser :: Parser a -> String -> Maybe a
evalParser = (fmap fst .) . execParser

-- Parse an arithmetic expression using the supplied semantics for
-- integral constants, addition, and multiplication.
parseExp :: (Integer -> a) -> (a -> a -> a) -> (a -> a -> a) -> String -> Maybe a
parseExp con add mul = (convert <$>) . evalParser (parseExpr <* eof)
    where convert (Const x) = con x
          convert (Add x y) = add (convert x) (convert y)
          convert (Mul x y) = mul (convert x) (convert y)
Use <|>
maybe (g s) Just (f s)
Why Not:
f s <|> g s
Redundant $
State $ parseDigit
Why Not:
State parseDigit
Redundant $
State $ parseSpace
Why Not:
State parseSpace
Use void
const () <$> many space
Why Not:
Control.Monad.void (many space)
Redundant bracket
const Add <$> (char '+')
Why Not:
const Add <$> char '+'
Redundant bracket
const Mul <$> (char '*')
Why Not:
const Mul <$> char '*'
In [3]:
module StackVM (StackVal(..), StackExp(..), Stack, Program, stackVM) where

-- Values that may appear in the stack. Such a value will also be
-- returned by the stackVM program execution function.
data StackVal = IVal Integer | BVal Bool | Void deriving Show

-- The various expressions our VM understands.
data StackExp = PushI Integer
              | PushB Bool
              | Add
              | Mul
              | And
              | Or
                deriving Show

type Stack   = [StackVal]
type Program = [StackExp]

-- Execute the given program. Returns either an error message or the
-- value on top of the stack after execution.
stackVM :: Program -> Either String StackVal
stackVM = execute []

errType :: String -> Either String a
errType op = Left $ "Encountered '" ++ op ++ "' opcode with ill-typed stack."

errUnderflow :: String -> Either String a
errUnderflow op = Left $ "Stack underflow with '" ++ op ++ "' opcode."

-- Execute a program against a given stack.
execute :: Stack -> Program -> Either String StackVal
execute [] []                               = Right Void
execute (s:_) []                            = Right s

execute s (PushI x : xs)                    = execute (IVal x : s) xs
execute s (PushB x : xs)                    = execute (BVal x : s) xs

execute (IVal s1 : IVal s2 : ss) (Add : xs) = execute (s':ss) xs
    where s' = IVal (s1 + s2)
execute (_:_:_) (Add:_)                     = errType "Add"
execute _ (Add:_)                           = errUnderflow "Add"

execute (IVal s1:IVal s2:ss) (Mul : xs)     = execute (s':ss) xs
    where s' = IVal (s1 * s2)
execute (_:_:_) (Mul:_)                     = errType "Mul"
execute _ (Mul:_)                           = errUnderflow "Mul"

execute (BVal s1:BVal s2:ss) (And : xs)     = execute (s':ss) xs
    where s' = BVal (s1 && s2)
execute (_:_:_) (And:_)                     = errType "And"
execute _ (And:_)                           = errUnderflow "And"

execute (BVal s1 : BVal s2 : ss) (Or : xs)  = execute (s':ss) xs
    where s' = BVal (s1 || s2)
execute (_:_:_) (Or:_)                      = errType "Or"
execute _ (Or:_)                            = errUnderflow "Or"

test = stackVM [PushI 3, PushI 5, Add]

Let's say we want to write a calculator in Haskell. All it will do is add and multiply integers.

Maybe we started by modeling the domain with the following data type of arithmetic expressions:

In [4]:
data ExprT  = Const Integer
             | Add ExprT ExprT
             | Mul ExprT ExprT
             deriving (Show, Eq)

This type, ExprT, is capable of representing expressions involving integer constants, addition, and multiplication.

On the right hand side there are three data constructors:

  • Const, which simply wraps an Integer in an ExprT,
  • Add, which constructs an ExprT from two other ExprT's, and represents addition
  • Mul, which also constructs an ExprT from two other ExprT's, and represents multiplication

For example, the expression (2 + 3) × 4 would be represented by the value

Mul (Add (Const 2) (Const 3)) (Const 4)

But now let's say we want to also make this expression work on Booleans.

Let's say we thought of a way to interpret Integers as Booleans, and we want to construct expressions from them using the same methods.

For example:

  • if an Integer is > 0, it is interpreted as True, otherwise False
  • An Add for two Booleans is an or
  • A Mul for two Booleans is an and

What we want is the expression Mul (Add (Const (-2)) (Const 3)) (Const 4) to be interpreted differently, depending on what type the argument represents. If 2, 3, and 4 represent Integers, we want the expression to evaluate to 4 (because (-2 + 3) x 4 = 4). If 2, 3, and 4 represent Booleans, we want this expression to evaluate to True (because (False || True) && True == True).

So, we want methods Const, Add and Mul to have different meanings depending on what the argument type is.

This is where typeclasses help.

A typeclass defines a collection of methods that a type should implement if it wants to have an instance of the typeclass.

In our example, we can write a typeclass for our methods like this:

In [5]:
class Expr a where
  const :: Integer -> a
  add :: a -> a -> a
  mul :: a -> a -> a

This typeclass, Expr, has methods const, add, and mul.

const takes an Integer and converts it to a value of type a.

add takes two values of type a and returns an a.

mul takes two values of type a and returns an a.

(The Haskell syntax a -> a -> a can be read as "(a -> a) -> a", or in other words, the argument is a function that takes a value of type a and returns a value of type a, and the result is a value of type a. But since all functions in Haskell are one-argument functions, a two-argument function is a composition of two one-argument functions, a. k. a. curried functions.)

Notice that this syntax only lists method signatures, i.e. the types of the arguments and the result. It says nothing about what the methods should do. That's the job of the types that define the instances of those typeclasses.

To create expressions out of integers, let's write an instance of typeclass Expr for Integer:

In [6]:
instance Expr Integer where
  const x = x
  add x y = x + y
  mul x y = x * y
In [7]:
add (const 2) (const 9)

To construct boolean expressions out of integers, let's write an instance of typeclass Expr for Bool.

Let's say that a Bool const from an Integer is true if the integer is positive, and false otherwise.

To add two booleans, we OR them, and to multiply them, we AND them.

In [8]:
instance Expr Bool where
  const x = x > 0  
  add x y = x || y
  mul x y = x && y
In [9]:
const 5 :: Bool
In [10]:
const (-2) :: Bool

As we see, we need to specify the type at the end if the expression (as in :: Bool) so the const method would be called as a Bool.

We can define Expr typeclass for even more types. Here is a type MinMax, which is the same as Integer, except that addition for it means taking the maximum of two integers, and multiplication means taking the minimum of the two integers.

In [11]:
newtype MinMax = MinMax Integer deriving (Eq, Ord, Show)

instance Expr MinMax where
  const x = MinMax x
  add x y = max x y
  mul x y = min x y
In [12]:
add (const 5) (const (-2)) :: MinMax
MinMax 5
In [13]:
mul (const 5) (const (-2)) :: MinMax
MinMax (-2)

So, these methods "const", "add" and "mul", implemented differently for each type, can give us a different result (appropriate for that type) from the same expression.

The homework provided us with a Parser module that has a function parseExp. This function will parse a string that contains an expression consisting of usual numbers as well as addition and multiplication signs. It will convert it to an expression consisting of const, add and mul methods. Given those methods, it will perform the appropriate operations and give us a result.

E.g. it can take a string "(3 * -4) + 5" and give us a result "7", because that's the result of these operations.

But for that, parseExp needs us to give methods const, add and mul so it would know how to add and multiply those values.

Thus, we call parseExp like this:

In [14]:
parseExp const add mul "(3 * -4) + 5"
Just (-7)

Why "Just"? If we look at the parseExp type description

In [15]:
:t parseExp
parseExp :: forall a. (Integer -> a) -> (a -> a -> a) -> (a -> a -> a) -> String -> Maybe a

we'll see it returns Maybe a. The definition of Maybe a is

data Maybe a = Just a | Nothing

Which means that if there is a value, you'll get a Just , otherwise you'll get a Nothing .

For example, if you give parseExp an invalid string (multiplication sign out of place)

In [16]:
parseExp const add mul "(3 -4 *) + 5"

Just like we got an Integer from parseExp, we can get a Bool or any other type for which we defined an instance of the Expr typeclass, because we have provided the methods const, add and mul.

In [17]:
parseExp const add mul "(3 * -4) + 5" :: Maybe Bool
Just True

(because const 3 && const (-4) == True && False == False and False || const 5 == False || True == True)

In [18]:
parseExp const add mul "(3 * -4) + 5" :: Maybe MinMax
Just (MinMax 5)

What does this give us?

If we want to implement a function that takes a string of arithmetic expressions and produces the answer as an Integer, a Boolean, or a MinMax, we don't need to write 3 different functions.

We only need a function like parseExp that will replace * and + with appropriate methods. And we supply the meaning of those methods via typeclass instances.

You can implement addition and multiplication for even more complex types this way.

A more complex type for which we can write an instance of the Expr typeclass (i.e. implement addition and multiplication), is Program.

A Program consists of a list of StackExp's, or stack expressions.

A StackExp represents an instruction for the virtual machine.

It is defined in StackVM.h, which was written by the homework authors and provided to us.

data StackExp = PushI Integer
    | PushB Bool
    | Add
    | Mul
    | And
    | Or
    deriving Show

type Stack = [StackVal] type Program = [StackExp]

(StackVal is a type representing a "stack value", but we don't need to know much about it, because it's for the StackVM internal usage.)

Program is a synonym for [StackExp], or a list of stack expressions.

An example of a Program would be

[PushI 3, PushI 4, Add]

It pushes 3 on the stack, pushes 4 on the stack, and adds them.

A longer example would be

[PushI 1, PushI 2, PushI 3, PushI 4, Add, Mul, Add]

It does the following:

  • Pushes 3 on the stack,
  • Pushes 4 on the stack,
  • Adds them,
  • Pushes 2 on the stack,
  • Multiplies the result of the previous addition (7) by 2,
  • Pushes 1 on the stack,
  • Adds the result of the previous multiplication (14) to 1.

What does it mean to add two Programs?

You concatenate their lists and append an Add at the end.

What does it mean to multiply two Programs?

You concatenate their lists and append a Mul at the end.

What is a const for a Program?

(Let's consider only the simplest Programs that consist of PushI, Add and Mul StackExps, omitting the boolean StackExps for the sake of simplicity.)

A const for a Program is a list consisting of just PushI x

In [34]:
instance Expr Program where
      const x = [PushI x]
      add x y = x ++ y ++ [Add]
      mul x y = x ++ y ++ [Mul]
In [35]:
add [PushI 3, PushI 4, Add] [PushI 1, PushI 2, PushI 3, PushI 4, Add, Mul, Add]
[PushI 3,PushI 4,Add,PushI 1,PushI 2,PushI 3,PushI 4,Add,Mul,Add,Add]

We can execute Programs with stackVM (provided by the homework authors) to make sure our typeclass methods do the right thing.

(Right means the result is an Either type. An Either type wraps the result in either the Left or a Right, and it is often used to wrap error messages in a Left and good values in a Right. It is a way for stackVM to avoid throwing exceptions and return a nice error message instead.

IVal is a StackVal type, and it is used internally by stackVM. We only see it in the result.)

In [36]:
stackVM [PushI 3, PushI 4, Add]
Right (IVal 7)
In [37]:
stackVM [PushI 1, PushI 2, PushI 3, PushI 4, Add, Mul, Add]
Right (IVal 15)
In [38]:
stackVM (add [PushI 3, PushI 4, Add] [PushI 1, PushI 2, PushI 3, PushI 4, Add, Mul, Add])
Right (IVal 22)
In [44]:
stackVM [PushI 5]
Right (IVal 5)

Finally, we can write a function compile, which will take an arithmetic expression as a string, and return a Maybe Program. (Maybe is necessary because the expression may be invalid, in which case it will return a Nothing.)

In [39]:
compile :: String -> Maybe Program
compile s = parseExp const add mul s :: Maybe Program
In [40]:
compile "(3 * -4) + 5"
Just [PushI 3,PushI (-4),Mul,PushI 5,Add]

To sum up, it's pretty neat that an arithmetic expression like "(3 * -4) + 5" can be interpreted as an integer, a boolean, or a stack of instructions for a virtual machine, depending on three simple methods that we implemented. That's the power of typeclasses.

We can run the Program generated by compile like this.

In [41]:
fmap stackVM (compile "(3 * -4) + 5")

which is equivalent to this:

In [42]:
stackVM <$> compile "(3 * -4) + 5"
Just (Right (IVal (-7)))

fmap or <$> is a method provided by Functor, and Maybe is a Functor, so we can't simply call stackVM on the result of compile -- we need to fmap it.


Elze Hamilton