Custom data types in Haskell

Hopefully after the first post in this series you can quickly write up simple functions over existing Haskell types and also understand basic polymorphism as well as function composition. We’ll now start digging into creating custom types and how to handle these in our functions. Lets start by creating a Tree type:

data Tree a = Leaf | Node a (Tree a) (Tree b)

When defining elements of the type above you can do so in the following manner:

sampleTree = Node 2 (Node 1 Leaf Leaf) (Node 3 Leaf)

Which represents the Tree:

      2
     / \
    /   \
   1     3
  / \   / \
 *   * *   *

We now want to write a function that can calculate the maximum depth of a Tree and when we write this function we must not forget to handle all of the cases that compose the data type Tree. Which means handling the Leaf and handling the Node x y z, like so:

maxdepth :: Tree a -> Int
maxdepth Leaf = 0
maxdepth (Node _ l r) = 1 + max (maxdepth l) (maxdepth r)

As you can see pattern matching on your custom data type is extremely easy to read and is just like doing so with any built in type. We introduced the use of the _ variable which is how you handle variables when you don’t care to use them in your functions calculations. We’re using the max function from Prelude to handle calculating the maximum of those two possible choices in a Tree.

Lets look at how to print an existing tree with in Haskell. So firstly if you try to just show the current sampleTree you’ll find that ghci shell will actually complain it can’t show this new element:

No instance for (Show (Tree Integer)) arising from a use of `print' Possible fix: add an instance declaration for (Show (Tree Integer)) In a stmt of an interactive GHCi command: print it

The above error is basically trying to tell you that Haskell doesn’t know how to “show” the data type you just created. One easy thing to do is to just let Haskell derive a basic representation for your type by adding the following to the declaration of the type:

data Tree a = Leaf | Node a (Tree a) (Tree b)
    deriving (Show)

With that you can now get a simple representation of your data like so:

Main> sampleTree Node 1 (Node 2 Leaf Leaf) (Node 3 (Node 4 Leaf Leaf) Leaf)

But you can also define exactly how you’d like to represent your trees. This is done by defining an instance of the class Show for your datatype and then defining the function show for your type, something like so:

padding :: (Num a) => a -> String
padding 0 = ""
padding n = " " ++ padding(n-1)

showTree :: (Show a, Num b) => Tree a -> b -> String
showTree Leaf n = (padding n) ++ "."
showTree (Node a l r ) n = let showl = showTree l (n+4) in
                           let showr = showTree r (n+4) in
                           let showc = (padding n) ++ (show a) in
                           showc ++ "\n" ++ showl ++ "\n" ++ showr

instance (Show a) => Show (Tree a) where
    show a = showTree a 0

I’ve introduced here the concept of typeclasses, they are heavily used in conjunction with polymorphism to better describe the type of elements that can be used with the current function. Typeclasses seem like class definitions but they’re much more powerful in the sense that you are defining an abstract operation that needs to be defined per type that wants to be usable by certain functions. In the code above you’ll notice how the padding function is expressing that it can accept any a as long as its an implementation of the typeclass Num.

Basically you’re telling Haskell that to show a Tree a type you want Haskell to represent it in the manner a specific manner by giving Haskell the definition of the show function you’d rather use. Now when we try to reference our sampleTree you’ll get a more readable representation like so:

Main> sampleTree 1 2 . . 3 4 . . .

If we wanted we could even extend this show implementation to draw a few ASCII lines and make the tree a bit easier to read. We’ll leave that as an exercise for the reader.

Lets find something more interesting function to write and we’ll start by writing up a insert function takes a Tree and an element and inserts the Tree with the new element inserted while at least making sure that the nodes are in order in the tree so that if we print the tree in order it will print the elements in order

insert :: (Ord a) => a -> Tree a -> Tree a
insert a Leaf = Node a Leaf Leaf
insert a (Node b l r) = if a < b
                            then Node b (insert a l) r
                            else Node b l (insert a r)

Again we used the typeclass Ord which is define as:

class  (Eq a) => Ord a  where
   compare              :: a -> a -> Ordering
   (<), (<=), (>=), (>) :: a -> a -> Bool
   max, min             :: a -> a -> a

        -- Minimal complete definition:
        --      (<=) or compare
        -- Using compare can be more efficient for complex types.
    compare x y
         | x == y    =  EQ
         | x <= y    =  LT
         | otherwise =  GT

    x <= y           =  compare x y /= GT
    x <  y           =  compare x y == LT
    x >= y           =  compare x y /= LT
    x >  y           =  compare x y == GT

-- note that (min x y, max x y) = (x,y) or (y,x)
    max x y
         | x >= y    =  x
         | otherwise =  y
    min x y
         | x <  y    =  x
         | otherwise =  y

Basically the definition tells you that defining the compare function is enough to for the other functions be inferred from.

One last thing about defining data types is the ability to create synonyms for existing types. This is done using the type keyword and allows you to make your functions more readable. Here are a few examples:

type String = [Char]

type Name = String
data Address = None | Addr String
type Person = (Name,Address)

type StringList = [String]