OK, I'll jump in on this one with some news and some apologies.
I've been very busy lately: service calls like crazy, new servers to install and set up, my normal job duties on top of that, and, more recently, marriage plans!
I've been working on the parser when I can. (Did you ever get curl to work in another thread? I had mild success with that.)
The parser is coming along, but I'm still having issues. The one I'm seriously concerned about is memory usage. It currently runs through the entire HTML and stores it in a multidimensional array of text, tags, and attributes, which basically ends up putting the page in memory twice (once as the raw HTML and once as the array), and that's a waste of space. I'm probably going to change that so it writes out each tag (or does what that tag is supposed to do) in order AS it parses it, throw in some error handlers, and correct any layout problems at the end of the parsing.
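To make the "handle each tag AS it is parsed" idea concrete, here is a minimal sketch in Python. It is not the actual parser discussed in this thread; it leans on the standard library's html.parser module, and the names StreamingParser, parse_in_chunks, and the sink callback are made up for illustration. The point is only that each tag or text run is handed off immediately instead of being collected into one big array first.

```python
from html.parser import HTMLParser


class StreamingParser(HTMLParser):
    """Hands each tag or text run to `sink` as soon as it is parsed,
    instead of storing the whole page in an array first."""

    def __init__(self, sink):
        super().__init__(convert_charrefs=True)
        # `sink` is a hypothetical callback: sink(kind, name, attrs)
        self.sink = sink

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.sink("open", tag, dict(attrs))

    def handle_endtag(self, tag):
        self.sink("close", tag, {})

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only runs
            self.sink("text", data, {})


def parse_in_chunks(chunks, sink):
    """Feed the page piece by piece so it never has to be held whole."""
    parser = StreamingParser(sink)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()  # flush anything still buffered


if __name__ == "__main__":
    parse_in_chunks(
        ['<p class="a">', "Hello</p>"],
        lambda kind, name, attrs: print(kind, name, attrs),
    )
```

One caveat with this sketch: a text run that straddles two chunks may reach the sink as two separate "text" events, so a real sink would need to cope with that.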
I'm also concerned about standards, since web developers aren't always good developers: they leave tags without closing tags, and such.
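One common way to cope with missing closing tags is to keep a stack of open tags and close whatever is left dangling. The sketch below is a deliberate simplification, not the recovery algorithm from the HTML specification and not the parser from this thread; TagBalancer and VOID_TAGS are hypothetical names, and it again assumes Python's standard html.parser.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Records open/text/close events, adding the closes the page forgot."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags still waiting for a close
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag with no matching open: ignore it
        # Close everything opened after `tag` that was left unclosed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.events.append(("close", top))
            if top == tag:
                break

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data))

    def close(self):
        super().close()
        # End of input: close whatever is still open.
        while self.open_tags:
            self.events.append(("close", self.open_tags.pop()))


if __name__ == "__main__":
    balancer = TagBalancer()
    balancer.feed("<div><b>bold</div><p>end")
    balancer.close()
    for event in balancer.events:
        print(event)
```

This does not know about implied closes (for example, a new li ending the previous li), so real-world pages would need extra rules on top of it.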
Anyway, that's the news. I am making exciting progress, and I note that AT is as well!