Monday, May 5, 2008

Red-Black Tree in 2 hours

Many computer science students have heard of red-black trees. If you use any container class library that has a map (ordered associative container) like STL you probably know that red-black trees are a popular implementation of a map. It may be non-obvious why the constraints associated with red-black trees cause them to remain roughly balanced but what is even more daunting is the implementation of such trees in an imperative language like C!

While working on an implementation for a recipe in my forth coming "Mathematica Cookbook" I found a functional implementation in Haskell (postscript) by Chris Okasaki. I know this has been said a million times before but it never ceases to amaze me how succinct and beautiful the functional approach can be. Using the Haskell solution as a guide, I was able to develop a complete red-black implementation in Mathematica in under 2 hours. This may not sound that impressive but consider that the referenced paper does not show how to implemented a remove operation. Even without the need for a remove, how many programmers could take a red-black tree written in say C and translate it into a completely working implementation in Java in under 2 hrs? I suppose a few can but this is really not a post about bragging rights. The functional approach to software development is just god-damn beautiful at so many levels and it is this and not my hacker abilities which made this exercise possible.

Now, there are caveats. There always are. Any C implementation of a red-black tree is bound to have numerous optimizations and a purely functional solution will not fare well for every application of a map (but surprisingly it is competitive for many. I'll post some C++ comparisons when I have a chance.)

If you'd like to learn more about red-black trees but don't know Haskell I would highly recommend learning the minimum of Haskell you need to understand Okasaki's paper instead of trying to learn about them by digging into a C implementation first.

11 comments:

Anonymous said...

The problem with Okasaki's solution is that there is no remove operation that can be performed with same Big-O complexity as a corresponding C-language implementation of remove. Okasaki hand waves around it by saying that you always want to remove elements from an RB Tree in the reverse order that it was inserted in, which is just totally lame. You can do this in Haskell by turning the RB Tree into a Monad, but that just makes the data structure imperative. However a fair comparison with C must implement the remove with same Big-O complexity, possibly as a Monad if not 100% functional.

Sal Mangano said...

Luckily Mathematica is not a pure functional language so I think there are ways for me to cheat and get good performance with out losing most of the simplicity. I have not proved this to myself yet but I'll post again when I know more.

However, a solution that can't remove is not completely useless. There are lots of problems that need a map where the map remains constant after initialization or only grows. Of course, if this is not your problem than this is not much help.

The real world is messy!!

Kalani Thielen said...

Sal, I think that your point about reading the Haskell implementation and grokking it quickly hits the nail on the head. The implementation may not have the same time/space complexity characteristics of the optimized C version, but as a vehicle to convey the idea of red-black trees, Okasaki's Haskell implementation is superior.

In fact, the most common example I've used to illustrate this point is QuickSort, which has an incredibly brief implementation in Haskell that clearly conveys the main idea. To get the best performance, though, you perform a number of "mostly dumb" transformations on the Haskell program to yield the more efficient C implementation (sort arrays instead of linked lists, choose the pivot at random rather than from the front, reuse/mutate a temporary array for the pivot-filter step, etc). The important lesson there for CS students (or practitioners) is to try to understand and solve the problem with the most natural notation, and only worry about optimization once the solution is well understood.

BTW, Okasaki has a very popular book on purely-functional data structures:

http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1210039860&sr=8-1

He recently posted a 10-year retrospective on lambda-the-ultimate:

http://lambda-the-ultimate.org/node/2665

It says a lot about the affect of this book that Peter van Roy (coauthor of "SICP 2.0") was so influenced by it early on.

Sal Mangano said...

Thanks Kalani. I just ordered that book from Amazon (unfortuntaly no Kindle edition yet).

Anonymous said...

Back in the early 90s (prior to the STL) I implemented a Red/Black tree in C++. A more functional style than my approach at the time definitely would have made it easier to implement (which C++ templates goes a long way in helping). However, my clearest recollection is that deletes were the hardest things to get right!

Anonymous said...

Binstock and Rex's "Practical Algorithms" implements complete red-black trees, and most other forms of trees. And it follows them up with a complete ISAM implementation to demo their use. Code is entirely in C.

jherber said...

red black tree implementation used by scala:

https://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/library/scala/collection/immutable/RedBlack.scala?rev=9646

Sal Mangano said...

jherber,

Thanks for sharing that. How does the del operation perform in practice for in order and reverse order deletions? random deletions?

Unknown said...

In response to anonymous on the lack of a remove:

You are wrong here, I think. It is just as easy to implement remove with O(log n) complexity.

As an example:
http://www.cse.unsw.edu.au/~dons/data/RedBlackTree.html

And, I don't see anywhere in the paper where Okasaki says you always want to remove in the reverse order.

Matt R said...

My guess is that the anonymous first commenter was referring to Okasaki's blog post on Ten Years of Purely Functional Data Structures, where he says, "...I only describe insertions, not deletions. There were two main reasons for that omission. First, deletions are much messier than insertions, and the introduction did not seem like the right place for a lot of messy details. Second, and more importantly, probably 95+% of uses of functional red-black trees don't need deletions anyway. Because the trees are immutable, when you want to delete something, you can simply revert back to a previous version of the tree from before the unwanted item was inserted. This is almost always both the easiest and the right thing to do. The exception is when you inserted something else that you want to keep after the item to be deleted, but this is extremely rare."


I'm implementing this in Java, and I have to confess that I'd also really want my red-black trees to support deletion in arbitrary order.

Unknown said...

Okay, but Okasaki didn't say that deletion is never needed. He also didn't say that it can't be done with the smae big-O as in C.

In fact there's no fundamental problem with deletion, as the link I posted shows. And, the added complexity is not as bad as Okasaki suggests, in my opinion.

Leaving it out was more a matter of keeping the presentation concise, and then later I think he was overly defensive about it.