Latest Publications

Sorting algorithms: Heap sort

It’s been a while since I did the last post (merge sort) in my sorting algorithms in PHP series (new readers start with Sorting algorithms: bubblesort) so I thought it about time that I add the next part. This time I’m going for heap sort. This algorithm has some similarities with merge sort, mainly the recursive nature, the divide and conquer idea: heap sort treats an array like a tree of elements instead of just one big pool of elements. Other than that, like some of the other sort algorithms presented it can be sorted in place, leading to less memory use – however it is not a stable sort, as the elements can quite easily change order.

In details, heap sort works by imposing a hierarchical tree structure on the set of elements and that the structure satisfies the heap requirement: that any given node be of equal to or less value than it’s parent. This is actually only a requirement for one type of heaps, the max-heaps – the other type, min-heaps, work in the opposite way, namely by having a parent node be always equal to or less than a child node. The reason this is significant is that you will always know which element is the largest: it’s the root node (in a max-heap, still). Hence, if we have a heap in proper order, we can then take the root node out of the heap and put it at the end of the array to sort. After bringing the heap in order again, we can repeat this, popping off the next largest element and putting it at the second to last position in the array.

Laying it out in steps, it looks like:

  1. heapify array
  2. pop root element off and store at end of array
  3. heapify remaining heap
  4. repeat step 2-3 while constantly moving one step back when storing popped element

And that’s it! Well, it obviously gets a bit harder once you get into the details – and without those, no sorting :) As with merge sort, it’s the recursive function that’s most important to the algorithm: the heapify thing. This is where the time will be spent so this is where it’s most important that things are optimized and that the algorithm works out.

So how does the heapify function work? It’s rather simple: it compares a node with it’s two leaves and if one of the is (or both are) larger than the root a swap is made – the largest leaf takes the place of the root node and vice versa. Because of the hierarchical structure of the heap, every node can have at most two leaves. If a swap is made, the heapify function recursively calls itself with the index of the swapped out leaf (the previous root, now a leaf) because this may now be a new root node in another local heap which then has to be checked. If on the other hand a swap is not made the heapify function simply returns: the heap is in order.

In terms of sorting, this means that you first heapify the array once and then of course once per element you pop off the array. As the algorithm swaps out the root node (the one to be popped off the heap) with the last leaf in the array the heapify function then has to be called to get the heap back into shape. Because the last element of the heap is smaller than any of it’s ancestors, the heapify function will recursively call itself a maximum of log2(n)-1 times. As the heap grows smaller, the number of recursions drops but this obviously happens on a log2 scale as well so this is not a major factor in the speed – the biggest speed factor lies in a combination of the fact that sorting a heap when a new root is introduced, given that the heap has previously been heapified, only takes log2(n) moves and that the first heapification of an array is quite fast too.

Enough talk, now code.

The Code

<?php
class HeapSort extends BaseSort
{
    public function sortFunction()
    {
        $array = $this->store;
        $heap_length = count($array);
        $this->buildHeap($array, $heap_length);
        while ($heap_length)
        {
            $temp = $array[0];
            $array[0] = $array[$heap_length - 1];
            $array[$heap_length - 1] = $temp;
            $heap_length--;
            $this->heapify($array, 1, $heap_length);
        }
        $this->store = $array;
    }
    private function buildHeap(&$array, $heap_length)
    {
        for ($i = floor($heap_length / 2); $i > 0; $i--)
        {
            $this->heapify($array, $i, $heap_length);
        }
    }
    private function heapify(&$array, $index, $heap_length)
    {
        $largest = $index;
        $comp = $array[$largest-1];
        $r = $index << 1;
        $l = $r++;
        if (($l <= $heap_length) && ($array[$l-1] > $comp))
        {
            $largest = $l;
            $comp = $array[$l-1];
        }
        if (($r <= $heap_length) && ($array[$r-1] > $comp))
        {
            $largest = $r;
            $comp = $array[$r-1];
        }
        if ($largest != $index)
        {
            $temp = $array[$index-1];
            $array[$index-1] = $comp;
            $array[$largest-1] = $temp;
            $this->heapify($array, $largest, $heap_length);
        }
    }
}

Quick walkthrough: first, the heap is built (or rather, the array is treated like a heap that’s out of order) after which the main loop is entered. Here, with each iteration, the root node is popped of the array (shifted, in PHP terms) and the heap is heapified again. The main action takes part in the heapify() method: it’s called with the heap, the index of the node to heapify and the length of the heap. The method checks the indexed node against it’s leaves and if any of them are larger than it a swap is made and the method is called recursively.

I’ve experimented a bit with optimizing the code and the version above is the fastest I’ve got it to. The biggest payoffs came from passing the array by reference instead of using a property of the class as well as passing the array length into functions – I hadn’t expected either to be a speed improvement but going that way shaved off 5 (five) seconds of the running time when sorting 100,000 elements. Which, I guess, leads me to the results.

Results

1000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.000878 seconds. Memory used: 64360 bytes
Running Heapsort sort. Algorithm took: 0.089941 seconds. Memory used: 64360 bytes

At 1,000 elements the algorithm it’s running in about 1.25 times the time of the merge sort implementation I made.

10000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.012138 seconds. Memory used: 665804 bytes
Running Heapsort sort. Algorithm took: 1.208370 seconds. Memory used: 665800 bytes

At 10,000 elements it’s running in 1.5 times the merge sort – pretty good but not quite the same.

100000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.222491 seconds. Memory used: 6524576 bytes
Running Heapsort sort. Algorithm took: 14.997653 seconds. Memory used: 6524560 bytes

With 100,000 elements the pattern is fairly clear in terms of how the algorithm is doing – a bit less than a straight O(n log n). Final test is 1,000,000 elements.

1000000 elements to sort.
Sanity check. PHP native sort() algorithm took: 3.602368 seconds. Memory used: 64194572 bytes
Running Heapsort sort. Algorithm took: 188.843620 seconds. Memory used: 64194576 bytes

Increasing the number of elements to sort by an order of magnitude increases the run time by 12.5 – which is pretty good (compare to the internal PHP algorithm which increases by a factor 15).

Another Plone book review

Cover of Plone 3 MultimediaI’ve been given the opportunity to review another Plone book – this time it’s Plone 3 Multimedia by Tom Gross. The book will be a nice followup to the Plone 3.3 Products Development Cookbook I’ll be reviewing shortly – my plan is to use the cookbook as a foundation and then use the multimedia book on top of that. That should hopefully provide a good basis for evaluating both books and thus for reviewing them.

The teaser for the multimedia book states:

From watermarked images, integrated Silverlight-applications over geotagged content and rich podcasts to protected video-on-demand solutions this book provides a rich repository of tools and techniques to add full multimedia power to Plone. This step-by-step guide will show you how to collaborate with many external web resources to build a powerful interactive Plone site that perfectly meet your needs.

If it’s caught your interest already, check the publishers site for details – links are above.

Typo3 site fail

Have a look at the image below – a screenshot from a website I came across. It’s a quite clear, massive fail (unless you’re fluent in whatever font is used). This is something that can happen to any site, if you’re messing with I18N or L10N – something goes completely screwy. However, it’s also one of those things that – when using a professional CMS – just should work without massive headaches or hidden traps. Typo3 wants be a professional CMS, so, following the logic, this shouldn’t happen (or the site author should have massive warning bells ringing).

Website used to promote TYPO3 Kochbuch

What went wrong here? Well, several things. First off, the site uses Typo3 to generate images containing text – and uses that for links. In itself a horrible idea that’s very bad for SEO but is used by Typo3 so users can get “that special font”. It wouldn’t be so bad though if it wasn’t for the fact the there’s either a mismatch between encoding somewhere in the system or the font needed can’t be found or used (I’m assuming that the install hasn’t been hacked and someone switched the font used to wingdings). If the system had just been outputting text and setting the wrong encoding then at least users would be able to manually switch. As is, there’s just no way of figuring out what the text is supposed to be. Your best bet is looking at the html source and trying to find out by looking at the link titles – not exactly what you want users to do.

Now, to add an extra bit of irony to this: the site name is Typo3 Experts and they’re hyping their Typo3 cookbook. Would you buy it after seeing this site?

More reviews to come

Cover of Plone 3.3 Products Development CookbookNot the books referred to in my last post although it could be (once I’ve read them :) ). Nope, I’ll be reviewing another book from Packt Publishing (my previous review was of TYPO3 Multimedia Cookbook): this one is Plone 3 Products Development Cookbook by Juan Pablo Giménez and Marcos F. Romero. It looks very interesting as far as I can tell: the strategy of the book is to create a working news site step by step, with each step being a recipe. So you can use bits from the book if you just happen to have a specific problem but your site is doing fine otherwise, or you can take it as a guide on how to go about creating sites with Plone.

With a bit of luck, I should have the book in two weeks and knowing my schedule it’ll be between three and four weeks after that before the review is on the site here.

Geek birthdays

As a kid I always liked the hard packages the most, for birthdays and christmas. Those were the ones you could play with and immerse yourself in right away. Fun fun fun! As opposed to clothes, which you can … wear. And that’s about it. As I grew older, the softer presents got more interesting, but there’s enough of the child left me to appreciate a gook set of hard, geeky birthday presents: books! Strictly speaking they’re not entirely hard (a bit wobbly, I suppose) but they’re firmly in the “hard” packages category.

So what am I on about? My recent birthday, I got designing with web standards by Jeffrey Zeldman and Ethan Marcotte, Mastering Regular Expressions by Jeffrey E.F. Friedl, and Learning Python by Mark Lutz. I’m a happy little boy :D If I manage to, might write a review or two as well :)