
Beginning ActionScript

Set off by a job interview I got, I’ve started learning ActionScript. It’s not something I’ve dealt with before, apart from decompiling Flash scripts to debug them. There have been some obstacles before I could really dig into it, the first being getting an ActionScript compiler installed on Ubuntu. As always, though, someone else had already had the same problem, solved it, and blogged about it. Turns out all you have to do is grab the Adobe Flex SDK, unzip it, chmod the mxmlc file to make it executable, and you’re on your way.

Of course, some people demand an IDE and it turns out that Adobe has been heeding their calls: you can get the Adobe Flex Builder as a plugin for Eclipse. Getting that to work on Ubuntu 9.04 was a bit more work than the above, but again: others have already solved that problem. Get a Java runtime environment, get the newest build of Eclipse (the one in the Ubuntu default sources is too old), get the plugin from Adobe.

After that, I could start having a look at the language, and it is indeed obvious that it’s based on ECMAScript: it’s fairly straightforward if you know JavaScript, which is a plus for me. There are some big differences from the start, though, such as the strange requirement to wrap things in a package … while still allowing stuff outside the package in the same file. Also, since only one public class is allowed inside the package block, there’s no real need to force the class name to match the name of the file containing it: whatever the public class inside the package in the file is, THAT’S what gets instantiated.

I’m guessing that whoever designed ActionScript liked a lot about JavaScript but was scared witless by all the weak typing and the passing around of functions, etc., so strong typing was (almost) enforced on all variables, classes and what have you (the ActionScript compiler will complain about missing types but will compile nonetheless …).

So far, the best part about ActionScript is how easily you get going. The worst part is my lack of a debugger (can’t blame the language for that, though). On that note, I find it rather strange that Flex Builder gives me rubbish error output, whereas the command-line compiler pinpoints the problems.

Anyway, here’s my first try at writing something in ActionScript: line drawing with random colors [Flash file].

And here’s the code for it:

package
{
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.events.MouseEvent;

    // Document class: the class name has to match the file name (drawing.as)
    public class drawing extends Sprite
    {
        private var _mouse_moving:Boolean = false;
        private var _line:Line;

        public function drawing()
        {
            // stage is null until the object is on the display list, so wait for it
            if (stage)
            {
                init();
            }
            else
            {
                addEventListener(Event.ADDED_TO_STAGE, init);
            }
        }

        private function init(event:Event = null):void
        {
            removeEventListener(Event.ADDED_TO_STAGE, init);
            stage.addEventListener(MouseEvent.MOUSE_DOWN, mouseDown);
            stage.addEventListener(MouseEvent.MOUSE_UP, mouseUp);
            stage.addEventListener(MouseEvent.MOUSE_MOVE, mouseMove);
        }

        public function mouseMove(event:MouseEvent):void
        {
            // Only redraw while the mouse button is held down
            if (_mouse_moving)
            {
                _line.drawLine(event.stageX, event.stageY);
            }
        }

        public function mouseDown(event:MouseEvent):void
        {
            _mouse_moving = true;
            // Random 24-bit RGB color: one 0-255 value per channel
            var color:uint = uint((int(Math.random() * 256) << 16) |
                                  (int(Math.random() * 256) << 8) |
                                  int(Math.random() * 256));
            _line = new Line(color, 2, event.stageX, event.stageY);
            addChild(_line);
        }

        public function mouseUp(event:MouseEvent):void
        {
            _mouse_moving = false;
        }
    }
}

import flash.display.Shape;

// Helper class outside the package block: only visible inside this file
class Line extends Shape
{
    private var _color:uint;
    private var _thickness:Number;
    private var _start_x:Number;
    private var _start_y:Number;

    public function Line(color:uint, line_thickness:Number, start_x:Number, start_y:Number)
    {
        _color = color;
        _thickness = line_thickness;
        _start_x = start_x;
        _start_y = start_y;
    }

    // Draw a straight line from the fixed start point to (end_x, end_y),
    // clearing the previous one so the line follows the mouse
    public function drawLine(end_x:Number, end_y:Number):void
    {
        graphics.clear();

        graphics.lineStyle(_thickness, _color);
        graphics.moveTo(_start_x, _start_y);
        graphics.lineTo(end_x, end_y);
    }
}

Ranting about PHP

Came across this blog post on PHP today. Seems to me the author needs a fair amount of sleep and then a cup of coffee or two …

First,

… it all started with me trying to be clever with array_map and array_filter.

Being clever is not a good thing. In my experience it always leads to crappy code that you’ll regret having written when the time comes round for code maintenance. Whenever the thought strikes you (“I’ll come up with a clever solution for this”), think twice, then step back and redesign.

Secondly,

I knew array_filter existed and what it was all about since before, however I started working with something requiring array_map first, all well and OK, array_map looks like this: array_map(’callback’, Array). So then I assumed I could use array_filter in the same fashion, big mistake.

As Henrik goes on to point out, PHP is not consistent in its function signatures, which is a problem. A much bigger problem is that someone with 5 years of self-proclaimed PHP-hacking experience doesn’t check function definitions but wastes time assuming either 1) that function definitions are the same for similar functions (which he apparently knows is not the case) or 2) that he knows the functions already when he doesn’t. How many tools are there for checking PHP function definitions easily? A simple search keyword in Firefox would have solved this in 30 secs.
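For reference, here’s the inconsistency in a few lines of PHP. array_map takes the callback first and the array second, while array_filter takes the array first and the callback second. This is a minimal sketch with made-up sample numbers. The closures need PHP 5.3+ and the short array syntax PHP 5.4+.

```php
<?php
$numbers = [1, 2, 3, 4, 5, 6];  // sample data

// array_map: callback FIRST, then the array.
$squares = array_map(function ($n) { return $n * $n; }, $numbers);

// array_filter: array FIRST, then the callback.
// Keys are preserved, so array_values() reindexes the result.
$evens = array_values(array_filter($numbers, function ($n) { return $n % 2 === 0; }));

echo implode(', ', $squares), PHP_EOL;  // 1, 4, 9, 16, 25, 36
echo implode(', ', $evens), PHP_EOL;    // 2, 4, 6
```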

Thirdly,

So the class method containing both the array_map and array_filter callbacks has only one argument: $name, which contains a partial string to match against users’ full names or usernames. I wanted to use that variable inside the array_filter callback.

What was I thinking? I knew that it wouldn’t work, of course it wouldn’t since callbacks are just an ugly hack bolted on top of PHP, hell you’re calling the name of the callback in the form of a string. I’ve got no idea how the interpreter is actually evaluating this stuff and I don’t want to know, I’d rather keep what I have in my stomach where it is.

Yes, callbacks are nasty, but they provide quite a lot of functionality. You can use the name of a function, an object method, the result of create_function or even (from PHP 5.3) a PHP closure. But of course, that doesn’t directly solve Henrik’s problem: passing that extra variable to array_filter or array_map. Here there obviously is a problem that could use a nice and simple design, such as allowing an extra variable to be passed. Workarounds are possible, though:

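<?php
class dummy
{
    private $_partial;
    private $_friends; // the names to search through (illustrative)

    public function __construct(array $friends)
    {
        $this->_friends = $friends;
    }

    public function getFriendsByPartial($partial)
    {
        // stash the extra argument where the callback can reach it
        $this->_partial = $partial;
        return array_filter($this->_friends, array($this, 'callback'));
    }

    public function callback($input)
    {
        // keep entries containing $this->_partial, ignoring case
        return stripos($input, $this->_partial) !== false;
    }
}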

Is that pretty? No. Does it work? Pretty sure it does. Does it keep things inside the class? Yup. Did it take me longer to write than it took me to read Henriks post? No.
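The PHP 5.3 closures mentioned above offer a neater workaround. A closure can import `$partial` from the surrounding scope with `use`, so there’s no need for an object property at all. Here’s a minimal sketch, with a made-up list of names standing in for the user data from Henrik’s post:

```php
<?php
// Hypothetical sample data standing in for users' full names/usernames.
$friends = ['Henrik Andersson', 'henrika', 'Peter Jensen', 'Anna Henriksen'];
$partial = 'henrik';

// The closure captures $partial via `use`, so the extra argument
// reaches the array_filter callback without any class machinery.
$matches = array_filter($friends, function ($name) use ($partial) {
    return stripos($name, $partial) !== false;  // case-insensitive match
});

echo implode(', ', $matches), PHP_EOL;  // Henrik Andersson, henrika, Anna Henriksen
```

Note that array_filter keeps the original keys (0, 1 and 3 here). Wrap it in array_values() if you need a reindexed list.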

However, this was apparently not clean enough, so instead of doing something simple like this, the alternative was writing a function using foreach and stripos. Only Henrik couldn’t remember (after 5 years) that there are both strpos and stripos (and, again, checking the function list was not an option), so he wasted more time.

I probably lost over an hour just on that one and it’s happened to me at least 5 times, yes roughly one time per year, just this fucking shitty little function. Why oh why are there two of them? You have to be rainman to navigate this swamp, of course I should’ve used stripos()! What an idiot I am right? Why couldn’t my pea brain remember that, especially since it’s happened to me so many times before? I must truly be a complete moron.

I don’t think he’s a moron for not remembering the difference between strpos and stripos (or rather, that both exist). If anything is at play here, it’s the belief that you can actually remember all the functions in PHP. So when the ironic moral comes

When it comes to PHP, don’t try to be clever, and most importantly of all, know your functions, all 5000 of them.

it’s very hard not to just shrug it off and think that programmer’s arrogance got to Henrik. Don’t ever think you can know all of the functions in PHP: look them up before you start abusing them, and you’ll avoid spending more than 5 minutes on the thing that cost Henrik hours.
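Looking the two up would have shown the one-letter difference. strpos() is case-sensitive and stripos() is not. Both return the offset of the first match, or false if there is none. The small sketch below uses an invented sample string and also shows the classic trap of a match at offset 0:

```php
<?php
$haystack = 'Henrik Andersson';  // invented sample string

// strpos() is case-sensitive, so lowercase 'henrik' is not found at all.
var_dump(strpos($haystack, 'henrik'));   // bool(false)

// stripos() ignores case and finds the match at offset 0.
var_dump(stripos($haystack, 'henrik'));  // int(0)

// 0 is falsy, so compare against false with !== instead of writing
// if (stripos(...)), which would treat a match at 0 as "not found".
$found = stripos($haystack, 'henrik') !== false;
var_dump($found);                        // bool(true)
```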

I’m glad, though, that in the end he did gain the insight of not trying to be clever.

Refactoring Infosys

The last week or more has seen quite a bit of activity for Infosys – I’ve been frenzied, to put it mildly :) Almost every core class has had major changes and the framework is fundamentally different now. It’s been lots of fun, lots of learning involved – here are some thoughts on the process:

1. Get some overview, create a gameplan

As with all other areas of development (and most work, really), step 1 should be getting an overview of the current state of things and of what needs to be done. If the project is your own (and Infosys, at this point, is pretty much 100% me), there’s a fair chance you’ll want to just ‘dig in’ and start refactoring to improve the framework straight away, because ‘you know how it works already and how to improve it’. This is a bad idea, because it will inevitably lead straight back to where you are now: refactoring things you’re not quite happy with.

In the case of Infosys, lots of core classes were singletons. I don’t have a problem with that as such (see my post on the matter), but I was thinking of moving away from it in favour of dependency injection. My main reasons for dropping singletons were related to testing (using PHPUnit, to be specific), where singletons can be a pain (although they don’t have to be), and to closing classes off as much as possible, decoupling them from one another.

I started out by going through the core code to see what would work better as plain objects, and decided on a limited set of classes: moving some functionality out of the request handler into a request object, and changing the session and log classes. However, I started refactoring sooner than I should have, believing that I had found a new and better design. The problem was that I didn’t actually look at the entire framework core with an objective eye. Specifically, I opted to keep the database class a singleton (this app will, for the foreseeable future, ONLY NEED ONE CONNECTION! SO DON’T EVEN BOTHER!). I thought it problematic to pass a single database object around. If your app is database driven, you want to create the object when initializing the app, instead of finding out very late in the process that you don’t have a database connection. Passing it around, though, meant handing it to objects that had no reason to have a copy (a request handler shouldn’t touch the database) just so they could pass it on to the ones that need it. As you can probably guess, I got over this problem, because the alternatives were a) using a registry (which I had been doing, but also refactored, again to improve decoupling and testability) or b) direct access to the singleton (which I did for a short period, until I decided that really wasn’t a very good idea).

The moral here is that I ended up doing more work than I should have done. I should have realised that it would suit Infosys better to not use singletons at all and gone for a more unified refactoring of the core classes. Instead I first refactored code to access the database through the classic getInstance method of the database class, and then afterwards refactored code to just pass an instance of the database class around. The end product is the same, I just spent more time doing it.
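To make the two steps concrete, here’s a stripped-down sketch of both styles. Database, UserModelSingleton and UserModel are hypothetical stand-ins, not the actual Infosys classes. query() merely records the SQL instead of talking to a server.

```php
<?php
// Hypothetical stand-in for a database class; not the real Infosys API.
class Database
{
    private static $instance = null;
    public $queries = [];

    // Singleton access: every caller shares one global instance.
    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new Database();
        }
        return self::$instance;
    }

    public function query($sql)
    {
        $this->queries[] = $sql;  // record the query instead of running it
        return true;
    }
}

// Step one: the dependency is fetched from inside the method.
class UserModelSingleton
{
    public function findAll()
    {
        return Database::getInstance()->query('SELECT * FROM users');
    }
}

// Step two: the dependency is passed in through the constructor.
class UserModel
{
    private $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function findAll()
    {
        return $this->db->query('SELECT * FROM users');
    }
}

$db = new Database();          // created once when the app starts...
$model = new UserModel($db);   // ...and handed to whoever needs it
$model->findAll();
echo count($db->queries), PHP_EOL;  // 1
```

The injected version is the one that helps with testing. A test can hand UserModel its own fake database object, whereas the singleton version always reaches for the one global instance.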

2. Decide on the overall design

One thing that I’ve learned from working with several different frameworks, and with different versions of the same frameworks, is that consistency rules. Creating new features or fixing bugs in old ones is so much easier if the code and the surrounding framework are in harmony and if the framework has a driving idea behind it.

One problem with creating a framework over time is that as you learn, you’ll be adding new stuff to it: new ways of doing things, or new code that handles some task better. Unless you really strive to keep everything consistent, you’ll end up with different ways of doing related things in the framework, and this makes creating new features a hassle. This was one of my concerns when refactoring the core classes of the Infosys framework: if half the core classes were refactored into normal classes and the other half stayed singletons, I would constantly have to guess which was which. Should I create a new instance of Log? Or should I ask the Registry for the instance? As noted before, I refactored the singletons, but I still have a minor problem here:

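class Someclass
{
    private $db; // the shared database object, handed in from outside

    public function test()
    {
        // the database object gets passed around, while a Log is just created
        $log = new Log($this->db);
    }
}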

The issue in the code above is that the Log class handles both writing to file and writing to the database (the first is used for errors and debug stuff, while the latter is used for logging normal application actions) so the Log class needs a database connection. However, because I’m only dealing with one connection that I want to reuse, the database object is always passed around, while log objects are just created. As both classes constitute core framework code, it would be easier if they worked the same way: less to remember. I could obviously just have opted for creating one instance of the Log class but there wouldn’t be much point in passing it around: it does very little and does not need to persist (quite the contrary, in fact).
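Here’s a rough sketch of that arrangement. All class and method names are assumed for illustration. The fake Database just records inserted rows, and the log file goes to the system temp directory.

```php
<?php
// Minimal stand-ins; none of these names are the actual Infosys API.
class Database
{
    public $rows = [];

    public function insert($table, array $row)
    {
        $this->rows[] = [$table, $row];  // pretend to INSERT into $table
    }
}

class Log
{
    private $db;
    private $file;

    public function __construct(Database $db, $file)
    {
        $this->db = $db;
        $this->file = $file;
    }

    // Normal application actions are logged to the database...
    public function action($message)
    {
        $this->db->insert('log', ['message' => $message, 'time' => time()]);
    }

    // ...while errors and debug output are appended to a file.
    public function error($message)
    {
        file_put_contents($this->file, date('c') . " {$message}\n", FILE_APPEND);
    }
}

// The shared Database is passed around; a Log is cheap to create where
// it's needed and can simply be discarded afterwards.
$db = new Database();
$log = new Log($db, tempnam(sys_get_temp_dir(), 'log'));
$log->action('user 42 logged in');
$log->error('searchform template missing');
echo count($db->rows), PHP_EOL;  // 1
```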

3. Simplicity over cleverness

To me, part of refactoring is bringing everything together under one consistent idea. You’re removing the clutter, bringing things in line with how they should be. Implicit in this is that you should go for the simple design: that clever idea you had at some point is probably the reason you’re refactoring things now. I would even take this one step further: if you ever have the choice between clever and simple, avoid shooting yourself in the foot by choosing clever. There’s a 95% chance you’ll regret going with clever later.

An example of this from Infosys would be the reuse of a searchform. Prior to refactoring, MVC apps used view classes to hold all output. That was not in itself a clever solution, but unfortunately not a very good one either, as some view classes grew past 2,000 lines. Also, controllers called view functions and then returned to the request handler, instead of just telling the request handler which template should be used and being done with it. The clever part came in reusing template bits, specifically a fairly big searchform containing lots of inputs. The reuse was handled by a private method in a view class, so the template would just call the private method.

Now, when refactoring to use templates loaded into a generic page class rather than having a view class for each MVC, this becomes a problem: how do you get at the shared template in a nice and easy way? Templates could always require it, but it would be really nasty to see a require fail in the middle of rendering a template. It would also be possible to hand the searchform over to a helper class, which might in the end be the best solution. However, the searchform would then be removed from its normal place, which is the templates folder for the given app.

Now, I may very well go with the latter idea (haven’t quite decided yet), as I’m already using a view helper and keeping other bits of template for reuse in there. The problem is obviously that I didn’t take this path for all the reused templates from the start: had I done that, I wouldn’t be facing this problem now but could just move on to the next bit of code that needs work. Again, a decision to go for a clever solution instead of simplicity and consistency has ended up biting me in the rear.
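Here is one way the helper route could look, as a hedged sketch. The ViewHelper class, its partial() method and the searchform.php file name are all hypothetical. The helper still loads the file from a templates directory, so the searchform can stay where templates normally live. A missing file is detected up front, before any output is produced, rather than by a require() failing mid-render.

```php
<?php
// Hypothetical view helper; class, method and file names are assumptions.
class ViewHelper
{
    private $templateDir;

    public function __construct($templateDir)
    {
        $this->templateDir = rtrim($templateDir, '/');
    }

    // Render a shared template fragment and return it as a string.
    // A missing file throws before any output is produced.
    public function partial($name, array $vars = [])
    {
        $file = "{$this->templateDir}/{$name}.php";
        if (!is_file($file)) {
            throw new RuntimeException("Template not found: {$file}");
        }
        extract($vars, EXTR_SKIP);  // expose $vars to the template as locals
        ob_start();
        include $file;
        return ob_get_clean();
    }
}

// Usage: write a tiny searchform template to a temp directory, then render it.
$dir = sys_get_temp_dir() . '/viewhelper_demo';
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents(
    "{$dir}/searchform.php",
    '<form><input name="q" value="<?php echo htmlspecialchars($query); ?>"></form>'
);

$helper = new ViewHelper($dir);
echo $helper->partial('searchform', ['query' => 'Berlin']), PHP_EOL;
```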

4. Test, test, test

One of the key things I wanted to implement in this drive to improve Infosys is unit tests. Again, seeing as Infosys is the product of pretty much just me alone, I have no problem diving into the code and changing some fundamental thing to make it better. This is a good and a bad thing – but luckily the bad consequences can be pretty much nullified with tests. First, it’s obviously a good thing because it means Infosys doesn’t have to stick to using bad code (well, it obviously has to stick with however good the code I create is but the point is that if I know some designs or ideas to be better than the existing Infosys code, I’ll just change it). As a consequence, using Infosys is now easier and more consistent than just two weeks ago. On the bad side, I’ve completely broken Infosys many times over during the last two weeks. If I didn’t at the same time implement unit tests for all the things I change, I’d have no idea if I was introducing bugs in the code while refactoring it.

So what I’ve been doing is writing tests every time I change code. When refactoring the database class, I created a database test class to go along with it, so I’d know that it still worked the same way even if the insides changed radically. At times this approach created some minor problems: I’d be refactoring one class and have to stop in order to refactor a dependency. The end effect is truly awesome, though. So far I’ve written 154 tests and can be fairly certain that if something underlying or unexpected changes in the tested classes, I’ll know about it as soon as I run the tests, which happens every time I change code.

Better still, in case someone else at some point decides to join in on the development of Infosys it will be a breeze to install the system and then do a systems test to make sure that everything is working as it should. If any tests break, you’ll know immediately that there’s a problem in the system – and you won’t need to dig around for hours looking for the problem.

And of course, there’s another added bonus: while writing tests for the Infosys code, I’ve come across things that weren’t quite as I expected them to be. In other words, writing tests has helped me fix bugs I didn’t even know I had, bugs that had never shown up in normal use. One particularly nasty bug went something like this:

...
$select->setWhere('field', '=', $value);
return $this->findBySelectMany($select);
...

There’s nothing wrong as such with the above code; in fact, it hasn’t changed at all. However, the underlying code worked in a way that made it dangerous. Specifically, if something was wrong with the input, the function would fail by returning a boolean false, which the code above simply ignores. I had no idea this code was flawed until I created a test to check it. Now, however, the underlying code throws exceptions if the input is bad, so errors can no longer be ignored silently.
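Here is a hedged sketch of the exception-throwing approach. The Select class below only borrows the setWhere() name from the snippet above. Its operator whitelist and error messages are assumptions, not the actual Infosys code.

```php
<?php
// Hypothetical select builder: only the setWhere() name comes from the post.
class Select
{
    private static $operators = ['=', '!=', '<', '>', '<=', '>=', 'LIKE'];
    private $where = [];

    public function setWhere($field, $operator, $value)
    {
        // Reject bad input loudly instead of returning false.
        if (!is_string($field) || $field === '') {
            throw new InvalidArgumentException('Field name must be a non-empty string');
        }
        if (!in_array($operator, self::$operators, true)) {
            throw new InvalidArgumentException("Unsupported operator: {$operator}");
        }
        $this->where[] = [$field, $operator, $value];
        return $this;
    }

    public function getWhere()
    {
        return $this->where;
    }
}

$select = new Select();
$select->setWhere('username', '=', 'henrik');

$message = null;
try {
    $select->setWhere('username', '=>', 'henrik');  // typo in the operator
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage();
}
echo $message, PHP_EOL;  // Unsupported operator: =>
```

Returning false here would let the caller carry on with an incomplete query. The exception forces the mistake to surface, whether in a test or in production.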

5. Keep it together

Refactoring Infosys has been a ton of fun so far, and I’ve learned a lot through it. However, while doing it, it’s important to keep in mind that there’s a reason for doing this: refactoring in itself is pointless; it has to be done to achieve something. For Infosys this something is the need to implement a greater range of features, and to do so more easily (the issue list on GitHub is fairly long). If I didn’t aim for a specific goal, I would fall into the DIY trap: there’s always more to do and it can always be done better. Sooner or later you’ll tire of your never-ending project and put it aside. The way to avoid this is to set clear goals and reach them one by one. Then you can measure your progress, and you actually stand a chance of getting done.

Migrating servers

This week and the weekend before that I’ve had the chance to experience the joy of migrating servers. BeWelcome has opted for a new production server, as a) the new contract is cheaper and b) the new server has better hardware. Based on that experience, there are a number of conclusions to draw:

  • Before you do anything that looks like migrating, make sure that everything on the server is documented.
  • Detail everything that needs to be done before starting.
  • After that, make sure that everything is backed up.
  • Start moving things to the new server in good time. Last minute means errors.
  • Make sure you go through all the phases: preparation, migration, cleanup.

Getting ready for the server migration, I tried to get down details of everything that needed to be moved. This proved VERY hard, because we had no documentation on what was running on the server. Sure, I know that we’re running a website using an Apache webserver and utilizing PHP and MySQL. However, what do we use to send out emails? How are we handling normal backups? Which users are needed, which are outdated and should be removed? Most of these things weren’t documented anywhere, so there was a whole lot of detective work to do. And while this can be interesting, you don’t want to lose data because of something as simple as this.

There are obviously further consequences of the lack of documentation: which scripts should be running automatically? The ones running now, or fewer? Should they run with the privileges they have now, or should they actually be restrained? While you don’t necessarily want to deal with this during a server migration, this is actually when you’re most likely to look closer at what’s running and how, unless you’re doing an actual integrity check of the server (and how often do you do those? As often as you should, or less?).

Another point that didn’t exactly go to plan for our move was backing things up. I didn’t give it much thought, as the data would be present on the old server during the migration and afterwards too. However, we were handing the old server back to the provider, so the data would obviously not be available for long; in fact, it was available for only one day after the migration was done. We didn’t lose anything critical, but there were a few things we didn’t manage to set up properly before the old settings were gone. This relates to the point on documentation: some things were not documented properly, so we didn’t know they were needed on the new server. By the time we found out, it was too late to go back and check.

Now, one way of handling backup would of course be to just dd the entire disk and be done with it. That’s a good idea if you have the space for it and can transfer the image off your server in good time. Even easier, if you’re just migrating servers and will be mirroring the old setup, you can simply copy the old system and put it on the new server. In our case, though, we were moving from Debian 4 to Debian 5. Copying the system wasn’t an option, and to make sure that apps were working properly, we needed to get them set up first. Exactly what do you back up then? On a Linux setup, probably /etc, /home, some of /var … and again, you see the problem of lacking documentation. Guesswork isn’t really good enough here.

Because we weren’t done with the migration until a day before handing the old server back, there wasn’t a lot of time for cleaning up either. This wasn’t too big a problem with regard to the old server. Files were deleted and the system was reinstalled a couple of times, so nothing should remain unless someone reads the sectors manually or goes after magnetic traces, neither of which is very likely. With regard to the new server, there is more cleaning up to do. Because of the lacking documentation, some things were set up in less than perfect ways, and some data was copied over without ending up in the right places. And because the migration took place so late, there’s no original setup to look at for comparison.

For next time

One thing is for sure – I’ve definitely learned my lesson from this experience. I’m currently busy writing up documentation on what we’re running, where everything is, how to go about setting it up, etc. It should result in two things:

  1. Proper documentation of what we use and what we have
  2. Proper documentation on how to move it

Together, the two should allow for a fairly easy time when handling the next migration. Not sure I want to be part of that, but at the very least I should make sure anyone taking up the job after me has an easier time. Chances are, though, that I’ll be needing the documentation myself before another migration given the daily maintenance tasks …

PHP optimizing

I’ve read various tips about optimizing PHP code and at first I happily took them in. Later on, having read other points of view, I started to wonder a bit about some of the optimizations and later still I realised that some things are not just dubious but plain wrong. Yes, I’m talking about strings here.

If you’ve looked into optimizing PHP code and browsed the net on the topic, you will invariably have come across tips amounting to “use single quotes wherever possible instead of double quotes”. You will almost certainly also have come across articles or blog posts telling you that this is bogus. Initially, looking at the advice, you might think it makes sense: PHP parses double-quoted strings to see if they contain variables that need to be inserted. However, if you roll your own tests of this advice, you’ll soon see two things. First, it is not at all clear that single-quoted strings are faster than double-quoted strings. Second, if there is any gain in speed, you won’t notice it until you’re running millions of iterations. Literally, millions of string operations.

With this in mind, I don’t care at all about single or double quoted strings, I just use whatever … which turns out to be mainly double-quoted strings, because of inserted variables. You see, the thing that I do care about is code legibility. Take the following two pieces of code – which is more legible?

$string = "some text and " . $variable;
$string = "some text and {$variable}";

To me there’s no doubt: the latter is by far more readable. I don’t have to jump in and out of strings, I know that it’s all one string and I can read it as such. I know that a lot of others have a different coding style though, preferring the former. This got me thinking: is the case the same for the two strings above as for strings without variables? In other words: is there a difference in performance when variables enter the picture?

To answer my question, I wrote a quick test:

$count = 20000000;
$start = microtime(true);
$string = '';
for ($i = 0; $i < $count; $i++)
{
    $string = "test string" . $i . "test string" . $i . "test string" . $i . "some string";
}
$middle = microtime(true);
$string = '';
for ($i = 0; $i < $count; $i++)
{
    $string = "test string{$i}test string{$i}test string{$i}some string";
}
$end = microtime(true);
echo "First {$count} iterations took: " . ($middle - $start) . PHP_EOL;
echo "Last {$count} iterations took: " . ($end - $middle) . PHP_EOL;

I ran it first with a million iterations and just one variable and got about a second for each loop. Seeing as this is rather low, I upped the count to 20 million, which gave a more interesting result: there was a difference of about a second between the two loops. Trying a couple more times, the difference stayed, so I wondered if there was something significant there. I figured that if anything, more variables should make it more obvious, which it did. The first test took about 20 secs for both loops, and the second set of iterations (two variables) took 33 and 34 secs. Adding another variable took me to 46 and 48 seconds for each 20 million loops; it seemed that finally I might be seeing some actual difference. Then I switched the places of the loops, and suddenly the loop that had been faster was the slower one.

At this point, there are two options: look at the memory usage of the loops to see how that plays into things (maybe the difference seen previously can be traced to that) or call it quits. Looking at the results, calling it quits is the only reasonable thing to do: 20 million iterations and no clear result to show for it … other than that when variables are in the picture, strings take a lot longer to process. That doesn’t leave you with a lot, as you’d have to reduce the number of variables in the strings to improve speed.

And again: 1 million iterations with 1 variable amounts to 1 (one) second. 3 variables puts it at 2.1 seconds. Don’t ever bother with string optimizations. Not worth it unless you’re doing many, MANY millions of iterations.
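If you do want to repeat the experiment, the result flipping when the loops were swapped suggests ordering or warm-up effects. One option is to alternate which variant runs first over several rounds and compare medians. Below is a minimal sketch under those assumptions. It uses two variables per string and a small iteration count so it finishes quickly; raise $count for real measurements.

```php
<?php
// Time one variant: concatenation with the . operator.
function runConcat($count)
{
    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        $s = "test string" . $i . "test string" . $i . "some string";
    }
    return microtime(true) - $start;
}

// Time the other variant: variables interpolated into one string.
function runInterpolate($count)
{
    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        $s = "test string{$i}test string{$i}some string";
    }
    return microtime(true) - $start;
}

function median(array $xs)
{
    sort($xs);
    $n = count($xs);
    $mid = (int) floor($n / 2);
    return $n % 2 === 1 ? $xs[$mid] : ($xs[$mid - 1] + $xs[$mid]) / 2;
}

// Sanity check: both forms must build exactly the same string.
$i = 42;
$same = ("test string" . $i . "test string" . $i . "some string")
    === "test string{$i}test string{$i}some string";

$count = 200000;  // small so the sketch runs quickly; raise for real numbers
$rounds = 5;
$times = ['concat' => [], 'interpolate' => []];

for ($r = 0; $r < $rounds; $r++) {
    // Alternate which variant runs first, so ordering and warm-up effects
    // are spread over both instead of favouring one of them.
    if ($r % 2 === 0) {
        $times['concat'][] = runConcat($count);
        $times['interpolate'][] = runInterpolate($count);
    } else {
        $times['interpolate'][] = runInterpolate($count);
        $times['concat'][] = runConcat($count);
    }
}

printf("concat median:      %.3fs\n", median($times['concat']));
printf("interpolate median: %.3fs\n", median($times['interpolate']));
```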