PHP unit-testing

Lately I've begun work on Infosys again - a mixture of new features, bug-fixes and an insane amount of refactoring. Because Infosys is a pet project of mine, partly done to experiment and learn, I have no problem redoing all the core parts now and then - just for kicks. However, the application also needs to work at the end of the day, and unit-tests seem a fairly good idea (well, essentially they always do, but sometimes more than others), so I've set about updating and creating more unit-tests for Infosys.

Unit-testing is fairly easy to get going with if you've designed your app properly - which essentially means you've avoided singletons and globals, and used dependency injection. And you've either gone with OOP or a functional paradigm (when your unit is a script 2,000 lines long running in global scope, unit-testing just isn't an option).

The problem

However, even given the above, there may still be a snag or two. For instance, there is still the good old question of the database. Also, if you happen to have implementation details that are of protected visibility, you might face some problems. And, of course, you might just be in the middle of converting your app to a more reasonable architecture.

Specifically, for Infosys, the problems I'm facing are:

lack of data for testing database related classes
lack of access to internals of classes
tests must not change live data

The part of the code I'm working on now is an Active Record pattern hierarchy of data-classes. Most of the methods in the classes I want to test deal with data, one way or the other - so there's just no testing without data. Had I been using a Data Mapper pattern, I wouldn't have had the problem, seeing as then I'd just be able to directly set the data in the objects. Instead I went for a hierarchy along the lines of:

class DBObject
{
  // all db data goes here
  protected $storage = array();

  // various code here, including data loading stuff such as:
  public function findById($id)
  {
    // code that queries the database goes here
  }
}

class User extends DBObject
{
  // various code here that calls inherited methods
}

Now, the code reuse is nice and I'm happy with that - however, I'm less happy with not being able to set data directly on the objects. Actually, I'm unhappy about not being able to set important data like IDs, which are shielded off from the outside. Even if I were able to do that, though, I would still face another problem: the data classes use inherited methods to load other objects, and they also keep track of state through protected properties. On top of that there's then the typical problem of avoiding tests messing up the database.

Lack of data - solution

The most obvious solution to the problem is to mock or stub the database access. Luckily, the data classes are all injected with a DB object that handles all database queries - this means that if I replace the DB properly, the data classes will never know the difference but I can write and run exactly the tests I want, providing the data objects with the data I need.
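Before bringing PHPUnit into it, the idea itself can be sketched with a hand-written stand-in. Everything below is an assumption made for illustration: StubDB, SimpleDBObject, the constructor injection and the get() accessor are hypothetical and not the actual Infosys code. The point is only that a data class which receives its DB object from the outside cannot tell a canned one from the real one.

```php
<?php
// Hypothetical stand-in for the real DB class: same query() method,
// but it returns canned rows instead of talking to a database.
class StubDB
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function query($sql, array $args = array())
    {
        return $this->rows;
    }
}

// Simplified, assumed shape of a data class: the DB object is injected
// through the constructor, so the class never opens its own connection.
class SimpleDBObject
{
    protected $storage = array();
    protected $db;

    public function __construct($db)
    {
        $this->db = $db;
    }

    public function findById($id)
    {
        $rows = $this->db->query('SELECT * FROM users WHERE id = ?', array($id));
        if (count($rows) > 0) {
            $this->storage = $rows[0];
        }
        return $this;
    }

    public function get($field)
    {
        return isset($this->storage[$field]) ? $this->storage[$field] : null;
    }
}

$db   = new StubDB(array(array('id' => 7, 'name' => 'Test User')));
$user = new SimpleDBObject($db);
echo $user->findById(7)->get('name'), "\n"; // prints "Test User"
```

A hand-written stub like this gets tedious once many methods are involved, which is where PHPUnit's generated mocks come in.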

There are different ways to go about mocking out a class: you can either use PHPUnit's built-in methods, or you can subclass the class you need to test. Both have pros and cons - but normally I'd start with PHPUnit's mocks, as one of the pros here is that it's easy to get started with. This means using the inherited getMock() method to create a mock class dynamically. Or, at least it used to mean that: now you can instead use the getMockBuilder() method to chain some methods and end up with the same result. The difference between the two?

// getMock() way
// note: the three trailing flags default to true ("call it"),
// so you pass false to disable the constructor, clone or autoload
$mock = $this->getMock(
  'class_name',
  $methods_to_mock,
  $constructor_params,
  $new_class_name,
  $call_original_constructor,
  $call_original_clone,
  $call_autoload
);

// getMockBuilder() way
$mock = $this->getMockBuilder('class_name')
  ->disableOriginalConstructor()
  ->disableAutoload()
  ->getMock();

If you only need to disable the constructor and autoload, the getMockBuilder() way is so much more intuitive! It makes understanding the test code a lot easier too, as you know what the parameters are for the mock without looking them up.

So, armed with this, I could get to work mocking my DB class to inject it into my data objects. Essentially, the basics I needed were:

// build DB mock
$mock = $this->getMockBuilder('DB')
 ->disableOriginalConstructor()
 ->getMock();

This works and the data objects happily accept the mock. In terms of dealing with data it doesn't do too much, though. To fix that:

// accept calls to query()
$mock->expects($this->any())
 ->method('query')
 ->will($this->returnCallback(array($this, 'mockDBQuery')));

// accept calls to exec()
$mock->expects($this->any())
 ->method('exec')
 ->will($this->returnCallback(array($this, 'mockDBExec')));

The point of the above calls is twofold: first, I can provide data to the objects through their normal channels, and second, I get a chance to react to the SQL being run. As the SQL is mostly auto-generated, it's fairly easy to set up rules for it. Currently I've just got a fairly big method reacting to the various queries run, but I could easily set up testing so that each test declares which queries it expects to be run, and anything else would trigger a fail.
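The "each test declares its expected queries" idea could look roughly like the following. This is a PHPUnit-free sketch under my own assumptions: ExpectingDB and expectQuery() are hypothetical names, and matching on a case-insensitive substring is just one simple rule among many possible ones.

```php
<?php
// Hypothetical helper: a DB stand-in where each test registers the
// queries it expects; any other query fails loudly.
class ExpectingDB
{
    private $expected = array();

    // Register a fragment to look for and the rows to return for it.
    public function expectQuery($fragment, array $rows)
    {
        $this->expected[$fragment] = $rows;
        return $this;
    }

    public function query($sql, array $args = array())
    {
        foreach ($this->expected as $fragment => $rows) {
            if (stripos($sql, $fragment) !== false) {
                return $rows;
            }
        }
        throw new Exception('Unexpected query: ' . $sql);
    }
}

$db = new ExpectingDB();
$db->expectQuery('from `users`', array(array('id' => 1)));

$rows = $db->query('SELECT * FROM `users` WHERE id = ?', array(1));
echo count($rows), "\n"; // prints 1

try {
    $db->query('DELETE FROM `roles`');
} catch (Exception $e) {
    echo $e->getMessage(), "\n"; // prints "Unexpected query: DELETE FROM `roles`"
}
```

Inside a PHPUnit test the thrown exception would simply surface as a failing test, which is the behaviour wanted for undeclared queries.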

One note about the methods above: making a PHPUnit mock return different data for multiple calls to the same method can be hard to figure out, but a bit of googling shows at least two ways:

using the at() method
using a callback

The at() method allows you to specify when a given method should be called and what should happen when it is called. This also allows you to specify that the same method will be called several times with different results - normally PHPUnit just overwrites previous expects() for the same method. With at() you specify the order of calls - however, that is also the downside: you need to specify exactly which order the calls are made in. Should anything change, you then need to change your testing code as well - even though the behaviour of your SUT remains the same (and this is essentially the only thing you should care about).
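To make the ordering problem concrete, here is a small sketch that does not use PHPUnit at all. OrderedStub and DispatchStub are hypothetical classes of my own; the first only imitates the "results by call position" contract and is not how PHPUnit implements at().

```php
<?php
// Hypothetical order-based stub: hands out canned results by call
// position, regardless of what is being asked for.
class OrderedStub
{
    private $results;
    private $calls = 0;

    public function __construct(array $results)
    {
        $this->results = $results;
    }

    public function query($sql)
    {
        return $this->results[$this->calls++];
    }
}

// Hypothetical content-based stub: picks the result from the query
// text, so the order of calls does not matter.
class DispatchStub
{
    public function query($sql)
    {
        return stripos($sql, 'users') !== false ? 'user rows' : 'role rows';
    }
}

// The test was written assuming users are queried first, then roles.
$ordered = new OrderedStub(array('user rows', 'role rows'));
// After a refactoring, the code under test asks for roles first:
echo $ordered->query('SELECT * FROM roles'), "\n"; // prints "user rows"

$dispatch = new DispatchStub();
echo $dispatch->query('SELECT * FROM roles'), "\n"; // prints "role rows"
```

The order-based stub hands role-seeking code the user rows as soon as the call order shifts, while the content-based one keeps working.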

So I'm using the second option, the callback. This works in the same fashion as callbacks normally do in PHP - provide an array with an object and the name of the method to call, and PHPUnit will call that method when the expected test-method is called, providing your return value from the callback to the caller of the test method. That allows me to do something like:

  public function mockDBQuery()
  {
    $args = func_get_args();
    $query = $args[0];
    $arguments = isset($args[1]) && is_array($args[1]) ? $args[1] : array();

    if ($query instanceof Select) {
      $arguments = $query->getArguments();
      $query = $query->assemble();
    }

    if (stripos($query, 'describe `users`') !== false) {
      return $this->returnUserTableInfo();
    }

    if (stripos($query, 'describe `roles`') !== false) {
      return $this->returnRoleTableInfo();
    }

    throw new Exception('Unexpected query: ' . $query);
  }

  protected function returnUserTableInfo()
  {
    return array(
      'data goes here',
    );
  }

  protected function returnRoleTableInfo()
  {
    return array(
      'data goes here',
    );
  }

Similarly, in the exec() method mock I can check that queries look right and come with the correct number of parameters.
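A parameter-count check of that kind might be sketched as below. The function name assertPlaceholderCount() is hypothetical, and the sketch assumes positional "?" placeholders with no literal question marks inside the SQL; named placeholders would need a different rule.

```php
<?php
// Hypothetical check for an exec() callback: the number of "?"
// placeholders in the SQL must match the number of bound arguments.
// Assumes positional placeholders and no literal "?" in the SQL text.
function assertPlaceholderCount($query, array $arguments)
{
    $expected = substr_count($query, '?');
    $actual   = count($arguments);
    if ($expected !== $actual) {
        throw new Exception(
            "Expected $expected arguments, got $actual for query: $query"
        );
    }
    return true;
}

// Two placeholders, two arguments: passes silently.
assertPlaceholderCount('UPDATE `users` SET name = ? WHERE id = ?', array('Bob', 3));

// One placeholder, no arguments: throws.
try {
    assertPlaceholderCount('DELETE FROM `users` WHERE id = ?', array());
} catch (Exception $e) {
    echo $e->getMessage(), "\n";
}
```

Called from the exec() callback, a mismatch would end up as a failed test pointing straight at the offending query.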

Lack of access to internals of classes

The second problem I'm facing is that some of the behaviour of the data objects differs based on internal state. More specifically, if they have been loaded with data from the database, they're aware of that. What it comes down to is essentially awareness of the object ID: it's only set when loading the object or creating it in the database, to make sure it cannot be set or changed from the outside (it would suck pretty bad to load an object, accidentally change the ID, then save it as a new object). While it's a good safeguard to have, it also creates problems, as I'd need to create the object and then load it with data - even though I have no need for that in my tests (it's best to minimize the amount of code in tests, because you want to test as few things at a time as possible, to be able to tell exactly what is breaking).

Lack of access to internals of classes - solution

To get around this problem, I went for the second solution outlined above: subclassing. The beauty of this solution is that you can have your cake and eat it: I am not changing the behaviour of my data objects in any way, yet I still get to set them up exactly as I need them. Example:

class DBObject
{
  protected $has_loaded = false;
  protected $storage = array();

  public function isLoaded()
  {
    return $this->has_loaded;
  }
}

class User extends DBObject
{
  // various code and stuff to test here
}

class UserMock extends User
{
  public function overrideHasLoaded($bool)
  {
    $this->has_loaded = !!$bool;
    return $this;
  }

  public function overrideId($id)
  {
    $this->storage['id'] = $id;
    return $this;
  }
}

The User class doesn't have the overrideHasLoaded() or overrideId() methods, and neither does the DBObject class - so by subclassing User and adding those methods, I'm not changing the behaviour of User, which means the results of my tests will be valid. This way I can get at the internals of the User class without exposing any of it to normal operations. The mock subclasses I create this way live with my tests so they won't get autoloaded by mistake - which could cause problems by exposing an interface that shouldn't exist.
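Putting the subclass to use could look something like this. The classes are repeated so the snippet runs on its own, in a stripped-down form without any DB injection; isPersisted() is a hypothetical method invented purely to have state-dependent behaviour to test.

```php
<?php
class DBObject
{
    protected $has_loaded = false;
    protected $storage = array();

    public function isLoaded()
    {
        return $this->has_loaded;
    }
}

class User extends DBObject
{
    // Hypothetical method under test: only a loaded user with an ID
    // counts as already saved.
    public function isPersisted()
    {
        return $this->isLoaded() && isset($this->storage['id']);
    }
}

// Test-only subclass: adds setters for protected state, changes nothing else.
class UserMock extends User
{
    public function overrideHasLoaded($bool)
    {
        $this->has_loaded = !!$bool;
        return $this;
    }

    public function overrideId($id)
    {
        $this->storage['id'] = $id;
        return $this;
    }
}

$fresh = new UserMock();
var_dump($fresh->isPersisted()); // bool(false)

$loaded = new UserMock();
$loaded->overrideHasLoaded(true)->overrideId(42);
var_dump($loaded->isPersisted()); // bool(true)
```

No database round trip is needed to get the object into its "loaded" state, and the code being exercised is still the unmodified User method.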

Conclusion

There just really isn't any good reason to avoid unit-testing - not even technical ones. In fact, once you get started testing stuff, it turns out it's good fun to figure out how you can make sure your code is tested! So get to it :)
