It's been a while since I did the last post (merge sort) in my sorting algorithms in PHP series (new readers start with Sorting algorithms: bubblesort) so I thought it about time that I add the next part. This time I'm going for heap sort. This algorithm has some similarities with merge sort, mainly the recursive nature, the divide and conquer idea: heap sort treats an array like a tree of elements instead of just one big pool of elements. Other than that, like some of the other sort algorithms presented it can be sorted in place, leading to less memory use - however it is not a stable sort, as the elements can quite easily change order.
In details, heap sort works by imposing a hierarchical tree structure on the set of elements and that the structure satisfies the heap requirement: that any given node be of equal to or less value than it's parent. This is actually only a requirement for one type of heaps, the max-heaps - the other type, min-heaps, work in the opposite way, namely by having a parent node be always equal to or less than a child node. The reason this is significant is that you will always know which element is the largest: it's the root node (in a max-heap, still). Hence, if we have a heap in proper order, we can then take the root node out of the heap and put it at the end of the array to sort. After bringing the heap in order again, we can repeat this, popping off the next largest element and putting it at the second to last position in the array.
Laying it out in steps, it looks like:
- heapify array
- pop root element off and store at end of array
- heapify remaining heap
- repeat step 2-3 while constantly moving one step back when storing popped element
And that's it! Well, it obviously gets a bit harder once you get into the details - and without those, no sorting :) As with merge sort, it's the recursive function that's most important to the algorithm: the heapify thing. This is where the time will be spent so this is where it's most important that things are optimized and that the algorithm works out.
So how does the heapify function work? It's rather simple: it compares a node with it's two leaves and if one of the is (or both are) larger than the root a swap is made - the largest leaf takes the place of the root node and vice versa. Because of the hierarchical structure of the heap, every node can have at most two leaves. If a swap is made, the heapify function recursively calls itself with the index of the swapped out leaf (the previous root, now a leaf) because this may now be a new root node in another local heap which then has to be checked. If on the other hand a swap is not made the heapify function simply returns: the heap is in order.
In terms of sorting, this means that you first heapify the array once and then of course once per element you pop off the array. As the algorithm swaps out the root node (the one to be popped off the heap) with the last leaf in the array the heapify function then has to be called to get the heap back into shape. Because the last element of the heap is smaller than any of it's ancestors, the heapify function will recursively call itself a maximum of log~2~(n)-1 times. As the heap grows smaller, the number of recursions drops but this obviously happens on a log~2~ scale as well so this is not a major factor in the speed - the biggest speed factor lies in a combination of the fact that sorting a heap when a new root is introduced, given that the heap has previously been heapified, only takes log~2~(n) moves and that the first heapification of an array is quite fast too.
Enough talk, now code.
The Code
<?php
class HeapSort extends BaseSort
{
public function sortFunction()
{
$array = $this->store;
$heap_length = count($array);
$this->buildHeap($array, $heap_length);
while ($heap_length)
{
$temp = $array[0];
$array[0] = $array[$heap_length - 1];
$array[$heap_length - 1] = $temp;
$heap_length--;
$this->heapify($array, 1, $heap_length);
}
$this->store = $array;
}
private function buildHeap(&$array, $heap_length)
{
for ($i = floor($heap_length / 2); $i > 0; $i--)
{
$this->heapify($array, $i, $heap_length);
}
}
private function heapify(&$array, $index, $heap_length)
{
$largest = $index;
$comp = $array[$largest-1];
$r = $index << 1;
$l = $r++;
if (($l <= $heap_length) && ($array[$l-1] > $comp))
{
$largest = $l;
$comp = $array[$l-1];
}
if (($r <= $heap_length) && ($array[$r-1] > $comp))
{
$largest = $r;
$comp = $array[$r-1];
}
if ($largest != $index)
{
$temp = $array[$index-1];
$array[$index-1] = $comp;
$array[$largest-1] = $temp;
$this->heapify($array, $largest, $heap_length);
}
}
}
Quick walkthrough: first, the heap is built (or rather, the array is treated like a heap that's out of order) after which the main loop is entered. Here, with each iteration, the root node is popped of the array (shifted, in PHP terms) and the heap is heapified again. The main action takes part in the heapify() method: it's called with the heap, the index of the node to heapify and the length of the heap. The method checks the indexed node against it's leaves and if any of them are larger than it a swap is made and the method is called recursively.
I've experimented a bit with optimizing the code and the version above is the fastest I've got it to. The biggest payoffs came from passing the array by reference instead of using a property of the class as well as passing the array length into functions - I hadn't expected either to be a speed improvement but going that way shaved off 5 (five) seconds of the running time when sorting 100,000 elements. Which, I guess, leads me to the results.
Results
1000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.000878 seconds. Memory used: 64360 bytes
Running Heapsort sort. Algorithm took: 0.089941 seconds. Memory used: 64360 bytes
At 1,000 elements the algorithm it's running in about 1.25 times the time of the merge sort implementation I made.
10000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.012138 seconds. Memory used: 665804 bytes
Running Heapsort sort. Algorithm took: 1.208370 seconds. Memory used: 665800 bytes
At 10,000 elements it's running in 1.5 times the merge sort - pretty good but not quite the same.
100000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.222491 seconds. Memory used: 6524576 bytes
Running Heapsort sort. Algorithm took: 14.997653 seconds. Memory used: 6524560 bytes
With 100,000 elements the pattern is fairly clear in terms of how the algorithm is doing - a bit less than a straight O(n log n). Final test is 1,000,000 elements.
1000000 elements to sort.
Sanity check. PHP native sort() algorithm took: 3.602368 seconds. Memory used: 64194572 bytes
Running Heapsort sort. Algorithm took: 188.843620 seconds. Memory used: 64194576 bytes
Increasing the number of elements to sort by an order of magnitude increases the run time by 12.5 - which is pretty good (compare to the internal PHP algorithm which increases by a factor 15).