The last week or more has seen quite a bit of activity for
Infosys - I've
been frenzied, to put it mildly :) Almost every core class has had major
changes and the framework is fundamentally different now. It's been lots
of fun, lots of learning involved - here are some thoughts on the
process:
1. Get some overview, create a gameplan
As with all other areas of development (and most things work, really),
step 1 should be getting an overview of the current state of things and
what needs to be done. If the project is your own (and Infosys, at this
point, is pretty much 100% me) there's a fair chance you'll want to just
'dig in' - start refactoring to improve the framework straight away,
because 'you know how it works already and how to improve it'. This is a
bad idea, because it will inevitably lead straight back to where you are
now: refactoring things you're not quite happy with.
In the case of Infosys, lots of core classes were
singletons
- something I don't have a problem with as such (see my post on the
matter) but was thinking of moving away
from in favour of
dependency
injection.
My main reasons for not using Singletons anymore were related to testing
(using PHPUnit, to be specific) where
singletons can be a pain (although they don't have to be) and to closing
classes off as much as possible, decoupling them from one another.
I started out by looking at the core code and looked at what would
function better as pure objects and decided on a limited set of classes
- moving some functionality out of the request handler into a request
object, changing session and log classes. However, I started refactoring
things sooner than I should have, believing that I had found a new and
better design. The problem was that I didn't actually look at the entire
framework core with an objective eye - specifically, I opted to keep the
database class a singleton (this app will, for any foreseeable future,
ONLY NEED ONE CONNECTION! SO DONT EVEN BOTHER!). I thought it
problematic to pass a single database object around (you want to create
it when initializing your app if it's database driven, instead of
finding out very late in the process that you don't have a database
connection) because it meant handing it to other objects that had no
reason to have a copy (a request handler shouldn't touch the database)
in order to pass it to the ones that need it. As you can probably guess,
though, I got over this problem, because the alternative is a) using a
registry (which I had been doing but also refactored, again to improve
decoupling and testability) or b) direct access to the singleton (which
I did for a short period until I decided that really wasn't a very good
idea).
The moral here is that I ended up doing more work than I should have
done. I should have realised that it would suit Infosys better to not
use singletons at all and gone for a more unified refactoring of the
core classes. Instead I first refactored code to access the database
through the classic getInstance method of the database class, and then
afterwards refactored code to just pass an instance of the database
class around. The end product is the same, I just spent more time doing
it.
2. Decide on the overall design
One thing that I've learned from handling several different frameworks
and frameworks in different versions is that consistency rules. Creating
new features or fixing bugs in old ones is so much easier if the code
and surrounding framework is in harmony and if the framework has a
driving idea behind it.
One problem with creating a framework over time is that as you learn,
you'll be adding new stuff to it. Perhaps some new ways of doing things,
new code that works better for some needed things. Unless you really
strive to keep everything consistent, you'll end up with having
different ways of doing related things in the framework - and this makes
creating new features a hassle. This was one of my concerns when
refactoring the core classes of the Infosys framework: if half the core
classes were refactored to be normal classes and the other half stayed
singletons, I would constantly have to guess at which was which. Should
I create a new instance of Log? Or should I ask the Registry for the
instance? As noted before, I refactored the singletons, but still I have
a minor problem here:
class Someclass
{
public function test()
{
$log = new Log($this->db);
}
}
The issue in the code above is that the Log class handles both writing
to file and writing to the database (the first is used for errors and
debug stuff, while the latter is used for logging normal application
actions) so the Log class needs a database connection. However, because
I'm only dealing with one connection that I want to reuse, the database
object is always passed around, while log objects are just created. As
both classes constitute core framework code, it would be easier if they
worked the same way: less to remember. I could obviously just have opted
for creating one instance of the Log class but there wouldn't be much
point in passing it around: it does very little and does not need to
persist (quite the contrary, in fact).
3. Simplicity over cleverness
Part of refactoring is to me the idea of bringing everything together in
a consistent idea. You're removing the clutter, bringing things in line
with how they should be. Implicit in this idea is also that you should
go for the simple design - that clever idea you got a some point is
probably the reason you're refactoring things now. I would even take
this one step further: if you ever have the choice between clever and
simple, avoid shooting yourself in the foot by choosing clever. There's
a 95% chance you'll regret going with clever later.
An example of this from Infosys would be the reuse of a searchform.
Prior to refactoring, MVC apps used view classes to store all output in
- not in itself a clever solution, but unfortunately not a very good one
either as some view classes reached the 2k+ lines. Also, controllers
called view functions and then returned to the request handler - instead
of just telling the request handler which template should be used and
then be done with it. The clever part came in reusing template bits,
specifically a fairly big searchform containing lots of inputs. The
reuse was handled by having a private method in a view class - so the
template would just call the private method.
Now, when refactoring to use templates loaded into a generic page class
rather than having a view class for each MVC, this becomes a problem:
how to get the shared template in a nice and easy way? Templates could
always require it, but it would really be nasty to see a require fail in
the middle of rendering a template. It would also be possible to hand
over the searchform to a helper class, which might in the end be the
best solution - however, then the searchform will be removed from it's
normal place, which is in the templates folder for the given app.
Now, I may very well go with the latter idea (haven't quite decided yet)
as I'm already using a view helper and keeping other bits of template
for reuse in there. The problem is obviously that I didn't go with this
path for all the reused templates from the start: had I done that, I
wouldn't be facing this problem now but would just be able to move on to
the next bit of code that needs work. Again, a decision to go for a more
clever solution instead of simplicity and consistency has ended up
biting me in the rear.
4. Test, test, test
One of the key things I wanted to implement in this drive to improve
Infosys is unit tests. Again, seeing as Infosys is the product of pretty
much just me alone, I have no problem diving into the code and changing
some fundamental thing to make it better. This is a good and a bad thing
- but luckily the bad consequences can be pretty much nullified with
tests. First, it's obviously a good thing because it means Infosys
doesn't have to stick to using bad code (well, it obviously has to stick
with however good the code I create is but the point is that if I know
some designs or ideas to be better than the existing Infosys code, I'll
just change it). As a consequence, using Infosys is now easier and more
consistent than just two weeks ago. On the bad side, I've completely
broken Infosys many times over during the last two weeks. If I didn't at
the same time implement unit tests for all the things I change, I'd have
no idea if I was introducing bugs in the code while refactoring it.
So, what I've been doing is creating tests every time I've changed code.
When refactoring the database class I created a database test class to
go along with it, so I'd know that it would still work the same way,
even if the insides changed radically. At times this approach created
some minor problems - I was refactoring one class when I had to stop
that in order to refactor a dependency. The end effect is truly awesome
though. So far I've 154 tests done and can be fairly certain that if
something underlying or unexpected changes in the tested classes, I'll
know about it as soon as I run the tests - which happens everytime I
change code.
Better still, in case someone else at some point decides to join in on
the development of Infosys it will be a breeze to install the system and
then do a systems test to make sure that everything is working as it
should. If any tests break, you'll know immediately that there's a
problem in the system - and you won't need to dig around for hours
looking for the problem.
And of course, there's another added bonus: while writing tests for the
Infosys code, I've come across things that weren't quite as I would
expect them to be. In other words, writing tests have helped me fix bugs
I didn't even know I had and that didn't get picked up in the code. One
particularly nasty bug went something like this:
...
$select->setWhere(field, =, value);
return $this->findBySelectMany($select);
...
There's nothing wrong as such with the above code - in fact, it hasn't
changed at all. However, the underlying code worked in a way that made
the above code dangerous. Specifically, if something was bad with the
input, the function would fail with a bool return code. Which the above
code obviously just ignores. I had no idea this code was flawed until I
created a test to check it. Now, however, the underlying code throws
exceptions if the input is bad - no longer any way for code to ignore
errors silently.
5. Keep it together
Refactoring Infosys has been a ton of fun so far, I've learned a lot
through it. However, while doing it it's important to keep in mind that
there's a reason for doing this: refactoring in itself is pointless, it
has to be done to achieve something. For Infosys this something is the
need to be able to implement a greater range of features and do so more
easily (the issue list on GitHub is fairly long). If I didn't try to
attain a specific goal, I would fall into the trap of the DIY person:
there's always more to do and it can always be done better. Sooner or
later you'll tire of your neverending project and put it aside. This is
something to avoid and the way to do that is create clear goals and
attain them one by one. Then you can measure your progress and you will
actually stand a chance of getting done.