Debugging javascript

Over the last week, I've had the joy of debugging some nasty javascript bugs. Generally, I really like javascript, especially since discovering Douglas Crockford's take on it (available in boiled-down form at Amazon). My opinion turned from the typical "Javascript is a ridiculous toy language that doesn't work" to "My goodness, there is real beauty here!". There are, however, some downsides to it - a number in the language itself and then a fair amount connected to it. In fact, by far the worst parts of javascript seem to me to be the implementations of it. Yes, it's time for the good old "I hate you, Microsoft!!"

Bug 1: Ajax post request

The first bug I had to fix was not really a javascript bug. The only buggy part about it, in terms of javascript, was that I was allowed to write buggy code with no alarms going off. The problematic code was sending a POST request through pure javascript - no jQuery magic or anything like it. Because the request needed to be a full-blown multipart/form-data request, the headers were constructed from scratch - this included creating the boundary string for the request. Unfortunately, the code I was using was creating a flawed boundary (in particular, it included a semicolon at the end of the boundary), but all browsers happily sent the request along. And everything worked smoothly until we upgraded the PHP binary on our server: in version 5.3.9, PHP enforces the HTTP standard more strictly, and so it simply drops the incoming POST data if the boundary string isn't formatted properly. Which meant that all of a sudden, the ajax requests dropped like bricks.
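To make the boundary problem concrete, here is a minimal sketch of building a multipart/form-data body by hand. It is not the code from the project: the function names makeBoundary and buildMultipart are made up for illustration, and it only handles plain text fields. The point it shows is that the boundary should consist only of characters that are always safe (here digits, lowercase letters and hyphens), stay within 70 characters, and carry no trailing semicolon or space.

```javascript
// Hypothetical helpers for illustration; not the original code from the post.

// Create a boundary token from characters that are always legal in a
// multipart boundary. The result is 40 characters long (the limit is 70)
// and never ends in a semicolon or a space.
function makeBoundary() {
  var chars = "0123456789abcdefghijklmnopqrstuvwxyz";
  var token = "";
  for (var i = 0; i < 24; i++) {
    token += chars.charAt(Math.floor(Math.random() * chars.length));
  }
  return "----formboundary" + token;
}

// Assemble the Content-Type header value and the body for a set of
// plain text fields. Each part starts with "--" + boundary, and the
// body ends with "--" + boundary + "--". Lines are separated by CRLF.
function buildMultipart(fields, boundary) {
  var lines = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      lines.push("--" + boundary);
      lines.push('Content-Disposition: form-data; name="' + name + '"');
      lines.push("");
      lines.push(String(fields[name]));
    }
  }
  lines.push("--" + boundary + "--");
  lines.push("");
  return {
    contentType: "multipart/form-data; boundary=" + boundary,
    body: lines.join("\r\n")
  };
}
```

In a browser, the result would be used roughly like this: call xhr.setRequestHeader("Content-Type", request.contentType) and then xhr.send(request.body), where request is the object returned by buildMultipart. Note that the boundary in the header is followed by nothing at all - that trailing semicolon is exactly what broke things here.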

How do you debug that? You can see that there's no POST data coming through, but you also know the script sending the data worked until recently - and you know the script hasn't changed. What you don't know is that the server was upgraded - but you learn this after about 1 hour. However, you do know that Chrome sends the data through just fine - but IE and Firefox don't. Where do you start looking for the problem? Nowhere do you get any errors (PHP doesn't notify you that the data from the POST request was dropped - and the browsers don't tell you that the request is faulty), so you have to guess at what the error might be.

What led me to the answer was that Google Chrome detected the problem and fixed it - although silently. If Chrome detects a faulty boundary identifier, it rewrites it before sending the request through. So when I tried Chrome's boundary string, suddenly things worked. A bit of experimentation later and I had the solution - which also led me to the changelog for PHP 5.3.9, where this thing is mentioned.

Bug 2: cloning in IE9

The second bug that caused me massive headaches was cloning DOM elements in IE9. I do the cloning in order to send html back to the server, for converting to PDF. Turns out that it's much, much, MUCH easier to deal with the html through the DOM (and especially through jQuery or other such libraries) than it is to deal with it through PHP. So the main node is cloned, various bits and pieces are removed, and then the content is sent through a request as a text string. How could this go wrong, you ask? Well, turns out that IE9 decided to throw hissy fits and just randomly apply classes to elements. Well, not quite randomly: in the process of cloning, it came across one class name it liked so much that it decided to apply it to about 75% of all elements on the page (but not all the elements - that would have been too consistent).

What fascinates me about this is that all other browsers I have tried work just fine. Yes, that also means IE7 and IE8 (we don't cater to IE6 on the project) - no trouble there. Microsoft actually worked hard to make sure that there were new bugs in IE9. Well, I suppose the world wouldn't have been the same if one of their browsers actually worked as it should.

Of course, this is exaggerating things. Most likely, the fault is mine for cloning elements with IDs. Still, I think Microsoft got the golden rule of interoperability wrong (the one that goes something like: "be lax on your input but strict on your output"): instead of being lax about input, they decided to be strict about it ... while they in general are rather lax about their output (another example of this is sending a response to an ajax request - you had better not think to set the charset to utf8, as that will just result in errors you have no idea how to decipher).

What makes me think that? Well, the fact that my hacky workaround works: grab the string representation of the node to clone, add an extra bit to all IDs, then create a new element from that and start working. And hey presto, problem solved. Do I feel dirty now? Yes. Is my loathing of IE and MS bigger? Yes. Was I fooled again by them, thinking that IE had in fact improved? Yes. Shame on me? Yes.
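For the curious, the workaround looks roughly like the sketch below. This is a reconstruction under assumptions, not the project's code: the names suffixIds and cloneViaString are made up, the regular expression only handles quoted id attributes, and it does not update anything that refers to those IDs (label "for" attributes, "#id" links, and so on).

```javascript
// Hypothetical helpers for illustration; not the original code from the post.

// Append a suffix to every quoted id="..." attribute in an HTML string.
// The leading \s makes sure attributes such as data-id or a class value
// of "id" are left alone. Unquoted ids are not handled.
function suffixIds(html, suffix) {
  return html.replace(/(\sid\s*=\s*)(["'])(.*?)\2/gi, function (match, prefix, quote, id) {
    return prefix + quote + id + suffix + quote;
  });
}

// Browser-only part: build a copy of a node from its string
// representation instead of calling cloneNode. Assumes the node can
// legally live inside a div (so not a tr or td, for example).
function cloneViaString(node, suffix) {
  var container = document.createElement("div");
  container.innerHTML = suffixIds(node.outerHTML, suffix);
  return container.firstChild;
}
```

Calling suffixIds('<div id="main">x</div>', "-copy") gives back the same markup with id="main-copy", so the copy never shares an ID with the original it was made from.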

The twist

I have recently developed a small error-handling script in JS (after reading about a new service that lets you install that in an easy fashion on your site and thinking "I can do that") and have put that up on the site. I had hoped to glean information from this, but it turned out utterly useless for both bugs - because neither of them was a bug in javascript or in my programming. They were bugs in the implementation of javascript in browsers - not in the language itself.
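An error handler of this kind can be very small. The following is only a sketch of the general idea, not the script running on the site: the function name formatErrorReport and the /js-error-log endpoint are invented for the example. It hooks window.onerror and reports each error to the server by requesting an image with the details in the query string.

```javascript
// Hypothetical sketch; the endpoint and function name are assumptions.

// Turn the arguments that window.onerror receives into a query string.
function formatErrorReport(message, url, line) {
  return "message=" + encodeURIComponent(String(message)) +
    "&url=" + encodeURIComponent(String(url || "")) +
    "&line=" + encodeURIComponent(String(line || 0));
}

// Only install the handler when running in a browser.
if (typeof window !== "undefined") {
  window.onerror = function (message, url, line) {
    // Fire-and-forget: requesting an image sends the report to the
    // server without needing an ajax request that could itself fail.
    new Image().src = "/js-error-log?" + formatErrorReport(message, url, line);
    // Returning false lets the browser's normal error handling continue.
    return false;
  };
}
```

A handler like this only sees errors that javascript itself throws, which is why it had nothing to say about either bug: in both cases, as far as the script engine was concerned, everything ran without a hitch.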

In fact, the error handler only served to confuse things, because it picked up on a number of unrelated bugs, including errors in browser plugins and weird script handler issues in the Bing crawler. I don't regret putting it into place, but its output certainly needs a lot of analysing before it gives off any goodies.

In the end, only lucky guessing and some deductions provided answers. Of course, various tools could probably have helped, but I would have needed to know that these would be useful in tracking down the error - so I would still have had to guess at the problem. I have learned a bit more about browsers, and - most importantly - about errors. The worst possible error is the one disguised as success: either your typical silent error (such as dropping the POST vars without mentioning it) or your more atypical everything-is-fine error (where things fail but appearances are kept up).
