The next sorting algorithm in this series is the Merge
Sort.
Of the algorithms covered so far, this algorithm comes closest to the
Shell Sort
algorithm.
The reason for this lies in how the comparisons are done in both
algorithms: instead of working on the entire array, both the shell sort
and the merge sort work by breaking the array to sort into smaller
arrays and then sorting those first.
To be more specific, merge sort works by taking an array, breaking it
down to the smallest parts, and then combining these parts. When
combining the parts, there's a good chance that one part can just be
appended to the other, which means that there's one comparison and a
merge to be done. However, when the second part cannot be appended to
the first part, the two parts must be merged together. Here, however,
merge sort has an advantage in that both parts to be merged are already
sorted, which makes the merge easier.
The algorithm looks roughly like this:
- Check if the array contains more than 1 element.
- If yes, break the array into two parts and go back to 1.
- If no, return the array - it is now sorted (there's 1 element in it,
so it's trivially sorted).
- Compare the last element of the first returned array with the first
element of the second return array.
- If the first array element from step 4 is less or equal to the
second array element from step 4, then append the second array on to
the first array.
- If the first array element from step 4 is not less or equal to the
second array element, then merge the two arrays.
- Return the appended/merged array - it is now sorted.
The sub-part - the merging function - looks as follows:
- Initialize an empty array.
- Check if both arrays to merge have elements. If not, skip to 4.
- Compare the first elements of both arrays. Remove the lower one from
it's array and append it to the result array. Continue from 2.
- If either array to be sorted still has elements left, append it to
the result array.
- Return the result array - it's a sorted merge of the two provided
arrays.
As can be gathered from this description, there are two parts to the
algorithm: a recursive function, that breaks up the provided array into
ever smaller parts, before recombining them, and a merge function, that
sorts and merges the arrays to be recombined.
The Code
<?php
class MergeSort extends BaseSort
{
public function sortFunction()
{
$this->store = $this->breakdownArray($this->store);
}
public function breakdownArray(array $array)
{
$count = count($array);
if ($count > 1)
{
$half = $count / 2;
$left = $this->breakdownArray(array_slice($array, 0, $half));
$right = $this->breakdownArray(array_slice($array, $half));
if (end($left) < reset($right))
{
return array_merge($left, $right);
}
else
{
return $this->arraySortMerge($left, $right);
}
}
else
{
return $array;
}
}
public function arraySortMerge(array $left, array $right)
{
$return = array();
$count_left = count($left);
$count_right = count($right);
$l = $r = 0;
while ($l < $count_left && $r < $count_right)
{
if ($left[$l] <= $right[$r])
{
$return[] = $left[$l];
$l++;
}
else
{
$return[] = $right[$r];
$r++;
}
}
if ($l < $count_left)
{
return array_merge($return, array_slice($left, $l));
}
else
{
return array_merge($return, array_slice($right, $r));
}
}
}
Here the structure of the algorithm should be clear: the recursive
function at the top, then the merge function below.
The most important part to optimize in this algorithm is the merge
function - this is where most of the time in the sorting will be spent.
When I wrote up the algorithm, this was the hardest part to create: a
function that merges two already sorted arrays and takes advantage of
this fact. First, I considered extracting the first variables of the
arrays, and then reinsert the one not used (only one variable at a time
should be append to the result array, the other should be checked again
next loop). This was much slower than what the algorithm should run
like, so I went over it a couple of times, trying to improve. The next
version took advantage of the
array_shift
function - this function shifts out the first element of an array and
then reorders the array, so the new first element can be reference via
\$array[0]. This improved the speed of the algorithm by a factor 10, but
it was still running very slow. In fact, the algorithm was running at
O(n^2^) and not the O(n log n) it should. So, I went back to the code
and rethought things: instead of changing the array, it might be faster
to just use indexes for the merge - which indeed was the case. The code
you can see above actually runs in much better time - it's in O(n log n)
now.
The moral should be clear: if you're doing algorithms like this and
you'll be throwing a lot of data after them, you need to do as little
modifying to data structures as possible. Creating a new array by
appending elements doesn't take a lot of time, but shifting elements out
of one and reordering it will.
There's another lesson to be learned as well: the algorithm was running
in O(n2) even with a proper merge function! So, while in theory the
worst case scenario of merge sort is O(n log n) this very much depends
upon the implementation.
Results
At 1,000 elements
1000 elements to sort.Sanity check. PHP native sort() algorithm took: 0.002649 seconds. Memory used: 48264 bytes
Running MergeSort sort. Algorithm took: 0.022447 seconds. Memory used: 48264 bytes
This is pretty good time, less than 10 times slower than PHP itself.
Next, 10,000 elements:
10000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.035824 seconds. Memory used: 505704 bytes
Running MergeSort sort. Algorithm took: 0.268344 seconds. Memory used: 505744 bytes
About 11 times slower for 10 times as many elements. 100,000 elements:
100000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.541093 seconds. Memory used: 4924456 bytes
Running MergeSort sort. Algorithm took: 3.329802 seconds. Memory used: 4924504 bytes
About 12 times slower, at 10 times as many elements. The algorithm is
running steady with running times incrementing as expected for an O(n
log n) sort. Final test, 1,000,000 elements:
1000000 elements to sort.
Sanity check. PHP native sort() algorithm took: 7.504231 seconds. Memory used: 48194472 bytes
Running MergeSort sort. Algorithm took: 41.222942 seconds. Memory used: 48194504 bytes
Sorting one million elements in 41 seconds is not that bad, considering
that the implementation is in PHP, an interpreted language!
Like the shell sort algorithm, this one could actually be used in a
production environment (if you can't use the sort() functions in php).
It's slightly faster than the shell sort as well, so we're clearly
moving into the realm of effective algorithms here.