Latest Publications

An end to censorship

Google announced yesterday that they will stop censoring search results on google.cn. There can only be one response to that: finally! It is definitely the right way to go, as the Chinese government is hell-bent on ignoring all concerns from foreigners (as well as from critical voices in their own country), meaning that any “influencing them slowly” is futile.

  • Google Reader
  • Delicious
  • Digg
  • Blogger Post
  • LinkedIn
  • Slashdot
  • StumbleUpon
  • Twitter
  • WordPress
  • Share/Bookmark

Alarms in Ubuntu: update

In Alarms in Ubuntu, I published a script that lets you set an alarm from the command line, nice and easy. One thing was lacking though: visual notification of the alarm, so if you happen to be away from the computer when the alarm sounds you’ll still see the dialog box. To achieve that I’ve modified the script, added an extra one, so here’s the new and shinier alarm script.

First, the alarm script:

#!/bin/bash
if [ "$1" = '' ]
then
    echo "No arguments for for alarm! Supply with time and optionally message
example:
    alarm 7:45
    alarm 19:59
    alarm '3pm + 3 day'
    alarm 2010-09-18
    alarm 'now + 5 minutes' 'go do ... stuff'
    "
    exit 1
else
    message="alarm time reached"
    [ "$2" = '' ] && message=$2
    if `echo aplay -q /home/fake51/Downloads/gqold.wav \&\& ddisplay \"$message\" | at $1 2>\&1 > /dev/null`
    then
        exit 0
    else
        echo "Setting alarm failed"
        exit 1
    fi
fi

Now, the major difference to the previous script lies in the script accepting a message, setting a default message, and then using ddisplay to display a message box.
Now, ddisplay is not a linux command – it’s the second part of this scripting excercise.

#!/bin/bash
if [ "$1" = "" ]
then
    exit 1
else
    export DISPLAY=:0
    zenity --warning --text="$1"
    exit 0
fi

This script makes use of the zenity command – which basically displays GTK+ dialogs. The ‘warning’ option makes zenity display a normal dialog box on top of everything, while the ‘text’ option is obviously the text to display. Hence, pass a text string to ddisplay and you’ll get a dialog box with it – and that’s what the first script does, thus playing the alarm sound and popping up a dialog box when the sound is done.
The reason for adding the extra script is that ‘at’ schedules commands to run – so putting the dialog box code in a function in the alarm script isn’t an option. One could try sticking the zenity command straight in the ‘echo’ piped to ‘at’, but to run ‘zenity’ from a script, you typically need to set a few environment variables (these are set if you run zenity straight from the command line, but not necessarily if run by cron or at). On the plus side, ddisplay can be reused for other scripts as well, simplifying them.

  • Google Reader
  • Delicious
  • Digg
  • Blogger Post
  • LinkedIn
  • Slashdot
  • StumbleUpon
  • Twitter
  • WordPress
  • Share/Bookmark

Sorting algorithms: Merge sort

The next sorting algorithm in this series is the Merge Sort. Of the algorithms covered so far, this algorithm comes closest to the Shell Sort algorithm. The reason for this lies in how the comparisons are done in both algorithms: instead of working on the entire array, both the shell sort and the merge sort work by breaking the array to sort into smaller arrays and then sorting those first.

To be more specific, merge sort works by taking an array, breaking it down to the smallest parts, and then combining these parts. When combining the parts, there’s a good chance that one part can just be appended to the other, which means that there’s one comparison and a merge to be done. However, when the second part cannot be appended to the first part, the two parts must be merged together. Here, however, merge sort has an advantage in that both parts to be merged are already sorted, which makes the merge easier.

The algorithm looks roughly like this:

  1. Check if the array contains more than 1 element.
  2. If yes, break the array into two parts and go back to 1.
  3. If no, return the array – it is now sorted (there’s 1 element in it, so it’s trivially sorted).
  4. Compare the last element of the first returned array with the first element of the second return array.
  5. If the first array element from step 4 is less or equal to the second array element from step 4, then append the second array on to the first array.
  6. If the first array element from step 4 is not less or equal to the second array element, then merge the two arrays.
  7. Return the appended/merged array – it is now sorted.

The sub-part – the merging function – looks as follows:

  1. Initialize an empty array.
  2. Check if both arrays to merge have elements. If not, skip to 4.
  3. Compare the first elements of both arrays. Remove the lower one from it’s array and append it to the result array. Continue from 2.
  4. If either array to be sorted still has elements left, append it to the result array.
  5. Return the result array – it’s a sorted merge of the two provided arrays.

As can be gathered from this description, there are two parts to the algorithm: a recursive function, that breaks up the provided array into ever smaller parts, before recombining them, and a merge function, that sorts and merges the arrays to be recombined.

The Code

<?php

class MergeSort extends BaseSort
{
    public function sortFunction()
    {
        $this->store = $this->breakdownArray($this->store);
    }

    public function breakdownArray(array $array)
    {
        $count = count($array);
        if ($count > 1)
        {
            $half = $count / 2;
            $left = $this->breakdownArray(array_slice($array, 0, $half));
            $right = $this->breakdownArray(array_slice($array, $half));

            if (end($left) < reset($right))
            {
                return array_merge($left, $right);
            }
            else
            {
                return $this->arraySortMerge($left, $right);
            }
        }
        else
        {
            return $array;
        }

    }

    public function arraySortMerge(array $left, array $right)
    {
        $return = array();
        $count_left = count($left);
        $count_right = count($right);
        $l = $r = 0;
        while ($l < $count_left && $r < $count_right)
        {
            if ($left[$l] <= $right[$r])
            {
                $return[] = $left[$l];
                $l++;
            }
            else
            {
                $return[] = $right[$r];
                $r++;
            }
        }
        if ($l < $count_left)
        {
            return array_merge($return, array_slice($left, $l));
        }
        else
        {
            return array_merge($return, array_slice($right, $r));
        }
    }
}

Here the structure of the algorithm should be clear: the recursive function at the top, then the merge function below.

The most important part to optimize in this algorithm is the merge function – this is where most of the time in the sorting will be spent. When I wrote up the algorithm, this was the hardest part to create: a function that merges two already sorted arrays and takes advantage of this fact. First, I considered extracting the first variables of the arrays, and then reinsert the one not used (only one variable at a time should be append to the result array, the other should be checked again next loop). This was much slower than what the algorithm should run like, so I went over it a couple of times, trying to improve. The next version took advantage of the array_shift function – this function shifts out the first element of an array and then reorders the array, so the new first element can be reference via $array[0]. This improved the speed of the algorithm by a factor 10, but it was still running very slow. In fact, the algorithm was running at O(n2) and not the O(n log n) it should. So, I went back to the code and rethought things: instead of changing the array, it might be faster to just use indexes for the merge – which indeed was the case. The code you can see above actually runs in much better time – it’s in O(n log n) now.

The moral should be clear: if you’re doing algorithms like this and you’ll be throwing a lot of data after them, you need to do as little modifying to data structures as possible. Creating a new array by appending elements doesn’t take a lot of time, but shifting elements out of one and reordering it will.

There’s another lesson to be learned as well: the algorithm was running in O(n2) even with a proper merge function! So, while in theory the worst case scenario of merge sort is O(n log n) this very much depends upon the implementation.

Results

At 1,000 elements

1000 elements to sort.Sanity check. PHP native sort() algorithm took: 0.002649 seconds. Memory used: 48264 bytes
Running MergeSort sort. Algorithm took: 0.022447 seconds. Memory used: 48264 bytes

This is pretty good time, less than 10 times slower than PHP itself. Next, 10,000 elements:

10000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.035824 seconds. Memory used: 505704 bytes
Running MergeSort sort. Algorithm took: 0.268344 seconds. Memory used: 505744 bytes

About 11 times slower for 10 times as many elements. 100,000 elements:

100000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.541093 seconds. Memory used: 4924456 bytes
Running MergeSort sort. Algorithm took: 3.329802 seconds. Memory used: 4924504 bytes

About 12 times slower, at 10 times as many elements. The algorithm is running steady with running times incrementing as expected for an O(n log n) sort. Final test, 1,000,000 elements:

1000000 elements to sort.
Sanity check. PHP native sort() algorithm took: 7.504231 seconds. Memory used: 48194472 bytes
Running MergeSort sort. Algorithm took: 41.222942 seconds. Memory used: 48194504 bytes

Sorting one million elements in 41 seconds is not that bad, considering that the implementation is in PHP, an interpreted language!
Like the shell sort algorithm, this one could actually be used in a production environment (if you can’t use the sort() functions in php). It’s slightly faster than the shell sort as well, so we’re clearly moving into the realm of effective algorithms here.

  • Google Reader
  • Delicious
  • Digg
  • Blogger Post
  • LinkedIn
  • Slashdot
  • StumbleUpon
  • Twitter
  • WordPress
  • Share/Bookmark

Checking Apache logs

Had an interesting find today as I was looking over the Apache logs for PL PHP:

85.190.0.3 - - [07/Jan/2010:09:55:55 +0100] "CONNECT 213.92.8.7:31204 HTTP/1.0" 200 4679 "-" "-"
85.190.0.3 - - [07/Jan/2010:09:55:58 +0100] "POST http://213.92.8.7:31204/ HTTP/1.0" 404 1311 "-" "-"

I had a couple of those in the logs and was rather wondering what it was – seemed fairly odd. Some quick googling didn’t show anything apart from others having the same confusion over this. The explanation turned out to be simple, though: if you’re connected to freenode (an IRC network), they’ll scan your IP checking for open proxies. They do that to defend themselves against DDoS attacks, so it’s hard to get annoyed by it … would be nice if the user agent would specify something like it, though.

How I found out? Whois the ip the traffic comes from, and you’ll see the following message:

remarks:        ****************************************************
remarks:        ****************************************************
remarks:        If you see portscans/abuse from 85.190.0.3
remarks:        Please read http://freenode.net/policy.shtml#proxies
remarks:        ****************************************************
remarks:        ****************************************************

Good old whois, always providing for the info-needy.

  • Google Reader
  • Delicious
  • Digg
  • Blogger Post
  • LinkedIn
  • Slashdot
  • StumbleUpon
  • Twitter
  • WordPress
  • Share/Bookmark

Sorting algorithms: Selection sort

1000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.002461 seconds. Memory used: 48264 bytes
Running SelectionSort sort. Algorithm took: 0.273260 seconds. Memory used: 48272 bytes

1000 elements to sort.Sanity check. PHP native sort() algorithm took: 0.002461 seconds. Memory used: 48264 bytesRunning SelectionSort sort. Algorithm took: 0.273260 seconds. Memory used: 48272 bytesNext in line of this series: the Selection Sort. This algorithm is not too unlike Insertion Sort, in that it works by putting one element in the right place, at a time. More specifically, selection sort works by making n-1 passes over an n-sized array, each time finding the smallest element and putting it in position n of the array. On the surface it might also look like the bubble sort algorithm: like that, the selection sort does a load of comparisons per pass over the array. However, where bubble sort makes n-1 comparisons per pass of the array, selection sort makes (n-1)-m comparisons per pass, where m is the current pass. In other words, the more passes done, the less comparisons to do.

As a set of steps:

  • Step 1: set index of array to 0
  • Step2: find smallest element of array, searching from array index and up
  • Step 3: swap smallest element with element at array index
  • Step 4: increase array index

The code

The code for this algorithm is fairly simple – as one might expect. It looks like:

<?php
class SelectionSort extends BaseSort
{
    public function sortFunction()
    {
        $count = count($this->store);
        for ($i = 0; $i < $count - 1; $i++)
        {
            $comparison = $this->store[$i];
            $index = $i;
            for ($ii = $i+1; $ii < $count; $ii++)
            {
                if ($this->store[$ii] < $comparison)
                {
                    $index = $ii;
                    $comparison = $this->store[$ii];
                }
            }
            $this->store[$index] = $this->store[$i];
            $this->store[$i] = $comparison;
        }
    }
}

Again, there’s the typical two-loop structure: outer loop and inner loop, matching position in array and sorting comparisons. Looking a bit closer, one can see that this algorithm is stable and that it doesn’t use extra memory to sort in (apart from a few extra variables).

Results

1,000 elements first:

1000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.002461 seconds. Memory used: 48264 bytes
Running SelectionSort sort. Algorithm took: 0.273260 seconds. Memory used: 48272 bytes

At 1,000 elements, it’s 13 times slower than the shell sort. It’s about twice as slow as the insertion sort algorithm. Next try is 10,000 elements.

10000 elements to sort.
Sanity check. PHP native sort() algorithm took: 0.035268 seconds. Memory used: 505704 bytes
Running SelectionSort sort. Algorithm took: 27.733544 seconds. Memory used: 505704 bytes

100 times slower than at 1,000 elements – the algorithm runs in O(n2) as one would expect, given how it works. This is not an algorithm to choose if you’ve got a lot of elements to sort and limited time. However, the memory footprint does make it nice if you’ve got limited resources.
Compared to the other algorithms, selection sort runs in about half the time of bubble sort and twice the time of insertion sort. And obviously, massively slower than shell sort.

  • Google Reader
  • Delicious
  • Digg
  • Blogger Post
  • LinkedIn
  • Slashdot
  • StumbleUpon
  • Twitter
  • WordPress
  • Share/Bookmark