Below explanation is by Anupam Jain.

Here’s how to derive final tagless from first principles -

1. Final tagless is all about recursive data types.
2. The usual way to encode recursive data types is to encode the recursion manually. For example -

```haskell
data Tree = Leaf | Node Tree Tree
```
Here the definition for `Tree` mentions `Tree`. This is called the initial encoding.

3. A common trick for any recursion is to parametrise the recursive type.

```haskell
data Tree a = Leaf | Node a a
```

4. Once you parametrise, you can also convert this to a class.

```haskell
class Tree t where
  leaf :: t
  node :: t -> t -> t
```

This is called final encoding.

5. How would you add values to this tree structure? Consider a tree with values at leaf levels, you could use multi parameter type classes like below but that has its own can of worms -

```haskell
class Tree t v where
  leaf :: v -> t
  node :: t -> t -> t
```

One problem here is that the node function still has no information about the leaf values, which causes issues with type inference.

6. A better solution is to use higher order type classes and `tag` the type of the tree -

```haskell
class Tree t where
  leaf :: forall v. -> t v
  node :: forall v. t v -> t v -> t v
```

7. Now it’s possible to add things that only work on some types of trees, for example sum all the leaves of the tree that works only for Integer trees -

```haskell
sum :: t Int -> Int
```

This is the final tagless style.

8. Note that you can do the same thing with GADTs.

```haskell
data Tree v where
  Leaf :: forall v. v -> Tree v
  Node :: forall v. Tree v -> Tree v -> Tree v
```
and

```haskell
sum :: Tree Int -> Int
```

But GADTs aren't in `Haskell98`, and were considered “advanced” when final tagless was proposed.