Below explanation is by Anupam Jain. Here’s how to derive final tagless from first principles - 1. Final tagless is all about recursive data types. 2. The usual way to encode recursive data types is to encode the recursion manually. For example - ```haskell data Tree = Leaf | Node Tree Tree ``` Here the definition for `Tree` mentions `Tree`. This is called the initial encoding. 3. A common trick for any recursion is to parametrise the recursive type. ```haskell data Tree a = Leaf | Node a a ``` 4. Once you parametrise, you can also convert this to a class. ```haskell class Tree t where leaf :: t node :: t -> t -> t ``` This is called final encoding. 5. How would you add values to this tree structure? Consider a tree with values at leaf levels, you could use multi parameter type classes like below but that has its own can of worms - ```haskell class Tree t v where leaf :: v -> t node :: t -> t -> t ``` One problem here is that the node function still has no information about the leaf values, which causes issues with type inference. 6. A better solution is to use higher order type classes and `tag` the type of the tree - ```haskell class Tree t where leaf :: forall v. -> t v node :: forall v. t v -> t v -> t v ``` 7. Now it’s possible to add things that only work on some types of trees, for example sum all the leaves of the tree that works only for Integer trees - ```haskell sum :: t Int -> Int ``` This is the final tagless style. 8. Note that you can do the same thing with GADTs. ```haskell data Tree v where Leaf :: forall v. v -> Tree v Node :: forall v. Tree v -> Tree v -> Tree v ``` and ```haskell sum :: Tree Int -> Int ``` But GADTs aren't in `Haskell98`, and were considered “advanced” when final tagless was proposed.