It's been a while since I did the last post (merge
sort) in my
sorting algorithms in PHP series (new readers start with Sorting
algorithms:
bubblesort)
so I thought it about time that I add the next part. This time I'm going
for heap sort. This algorithm
has some similarities with merge sort, mainly the recursive nature, the
divide and conquer idea: heap sort treats an array like a tree of
elements instead of just one big pool of elements. Other than that, like
some of the other sort algorithms presented it can be sorted in place,
leading to less memory use - however it is not a stable sort, as the
elements can quite easily change order.
In details, heap sort works by imposing a hierarchical tree structure on
the set of elements and that the structure satisfies the heap
requirement: that any given node be of equal to or less value than it's
parent. This is actually only a requirement for one type of heaps, the
max-heaps - the other type, min-heaps, work in the opposite way, namely
by having a parent node be always equal to or less than a child node.
The reason this is significant is that you will always know which
element is the largest: it's the root node (in a max-heap, still).
Hence, if we have a heap in proper order, we can then take the root node
out of the heap and put it at the end of the array to sort. After
bringing the heap in order again, we can repeat this, popping off the
next largest element and putting it at the second to last position in
the array.
Laying it out in steps, it looks like:
- heapify array
- pop root element off and store at end of array
- heapify remaining heap
- repeat step 2-3 while constantly moving one step back when storing
popped element
And that's it! Well, it obviously gets a bit harder once you get into
the details - and without those, no sorting :) As with merge sort, it's
the recursive function that's most important to the algorithm: the
heapify thing. This is where the time will be spent so this is where
it's most important that things are optimized and that the algorithm
works out.
So how does the heapify function work? It's rather simple: it compares a
node with it's two leaves and if one of the is (or both are) larger than
the root a swap is made - the largest leaf takes the place of the root
node and vice versa. Because of the hierarchical structure of the heap,
every node can have at most two leaves. If a swap is made, the heapify
function recursively calls itself with the index of the swapped out leaf
(the previous root, now a leaf) because this may now be a new root node
in another local heap which then has to be checked. If on the other hand
a swap is not made the heapify function simply returns: the heap is in
order.
In terms of sorting, this means that you first heapify the array once
and then of course once per element you pop off the array. As the
algorithm swaps out the root node (the one to be popped off the heap)
with the last leaf in the array the heapify function then has to be
called to get the heap back into shape. Because the last element of the
heap is smaller than any of it's ancestors, the heapify function will
recursively call itself a maximum of log~2~(n)-1 times. As the heap
grows smaller, the number of recursions drops but this obviously happens
on a log~2~ scale as well so this is not a major factor in the speed -
the biggest speed factor lies in a combination of the fact that sorting
a heap when a new root is introduced, given that the heap has previously
been heapified, only takes log~2~(n) moves and that the first
heapification of an array is quite fast too.
Enough talk, now code.
The Code
<?php
class HeapSort extends BaseSort
{
public function sortFunction()
{
$array = $this->store;
$heap_length = count($array);
$this->buildHeap($array, $heap_length);
while ($heap_length)
{
$temp = $array[0];
$array[0] = $array[$heap_length - 1];
$array[$heap_length - 1] = $temp;
$heap_length--;
$this->heapify($array, 1, $heap_length);
}
$this->store = $array;
}
private function buildHeap(&$array, $heap_length)
{
for ($i = floor($heap_length / 2); $i > 0; $i--)
{
$this->heapify($array, $i, $heap_length);
}
}
private function heapify(&$array, $index, $heap_length)
{
$largest = $index;
$comp = $array[$largest-1];
$r = $index << 1;
$l = $r++;
if (($l <= $heap_length) && ($array[$l-1] > $comp))
{
$largest = $l;
$comp = $array[$l-1];
}
if (($r <= $heap_length) && ($array[$r-1] > $comp))
{
$largest = $r;
$comp = $array[$r-1];
}
if ($largest != $index)
{
$temp = $array[$index-1];
$array[$index-1] = $comp;
$array[$largest-1] = $temp;
$this->heapify($array, $largest, $heap_length);
}
}
}
Quick walkthrough: first, the heap is built (or rather, the array is
treated like a heap that's out of order) after which the main loop is
entered. Here, with each iteration, the root node is popped of the array
(shifted, in PHP terms) and the heap is heapified again. The main action
takes part in the heapify() method: it's called with the heap, the index
of the node to heapify and the length of the heap. The method checks the
indexed node against it's leaves and if any of them are larger than it a
swap is made and the method is called recursively.
I've experimented a bit with optimizing the code and the version above
is the fastest I've got it to. The biggest payoffs came from passing the
array by reference instead of using a property of the class as well as
passing the array length into functions - I hadn't expected either to be
a speed improvement but going that way shaved off 5 (five) seconds of
the running time when sorting 100,000 elements. Which, I guess, leads me
to the results.
Results
1000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.000878 seconds. Memory used: 64360 bytes
Running Heapsort sort. Algorithm took: 0.089941 seconds. Memory used: 64360 bytes
At 1,000 elements the algorithm it's running in about 1.25 times the
time of the merge sort implementation I made.
10000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.012138 seconds. Memory used: 665804 bytes
Running Heapsort sort. Algorithm took: 1.208370 seconds. Memory used: 665800 bytes
At 10,000 elements it's running in 1.5 times the merge sort - pretty
good but not quite the same.
100000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.222491 seconds. Memory used: 6524576 bytes
Running Heapsort sort. Algorithm took: 14.997653 seconds. Memory used: 6524560 bytes
With 100,000 elements the pattern is fairly clear in terms of how the
algorithm is doing - a bit less than a straight O(n log n). Final test
is 1,000,000 elements.
1000000 elements to sort.
Sanity check. PHP native sort() algorithm took: 3.602368 seconds. Memory used: 64194572 bytes
Running Heapsort sort. Algorithm took: 188.843620 seconds. Memory used: 64194576 bytes
Increasing the number of elements to sort by an order of magnitude
increases the run time by 12.5 - which is pretty good (compare to the
internal PHP algorithm which increases by a factor 15).