Archive for May, 2006

Did you ever need to index an xml doc

Thursday, May 18th, 2006

and preserve the XML information in the index? May I present “the XML Indexer”.

My brother, whose very popular AJAX Bible app has been getting attention, needed an XML index of the KJV Bible. He asked if I could help him get it. We would be parsing the KJV in XML format, and I needed to pull out the reference information for every occurrence of every word. Well, I thought an XML indexer might be useful in more than one capacity, and there wasn’t much on the net or CPAN with the capability to do it. It needed to be light and fast because it was going to be parsing the entire Bible, so a DOM parser was out of the question. So I wrote my own.

xml_indexer.pm is a module that indexes the words in an XML document and preserves the XML information about each occurrence of the word. It’s a little rough around the edges right now, but it works. It uses the expat parser, so it’s light and fast. Look at the bible_index.pl script for an example of how it works. I’ll do a tutorial on it later.

Update:
This baby has been confirmed to parse the entire Bible in Zefania XML format in under 3 minutes. That is a 16 MB file. It spits out a 23 MB index in that space of time. Quite honestly, it surprised me.
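
For anyone curious what the streaming idea looks like, here is a rough sketch in Python. It is not the code from xml_indexer.pm; it just uses Python's standard expat binding to show the same general approach: no DOM, only start/end/character-data callbacks that remember where in the document we are and record that location for each word. The element names `book`, `chapter`, and `verse`, the `n` attribute, and the `index_words` function are all made up for the illustration and are not taken from the real Bible file.

```python
import re
import xml.parsers.expat
from collections import defaultdict

# Hypothetical tokenizer: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def index_words(xml_text):
    """Build {word: [(book, chapter, verse), ...]} in one streaming pass."""
    index = defaultdict(list)
    context = {}   # identifiers of the elements we are currently inside
    buffer = []    # text collected inside the current verse

    def start(name, attrs):
        # Assumed layout: <book n="..."><chapter n="..."><verse n="...">
        if name in ("book", "chapter"):
            context[name] = attrs.get("n")
        elif name == "verse":
            context["verse"] = attrs.get("n")
            buffer.clear()

    def end(name):
        if name == "verse":
            ref = (context.get("book"), context.get("chapter"),
                   context.get("verse"))
            # expat may deliver text in several chunks, so join them first.
            for word in WORD_RE.findall("".join(buffer).lower()):
                index[word].append(ref)
            buffer.clear()
            context.pop("verse", None)

    def chars(data):
        # Only keep text that sits inside a verse element.
        if "verse" in context:
            buffer.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return dict(index)


# Tiny made-up document in the assumed layout.
SAMPLE = """<bible><book n="Genesis"><chapter n="1">
<verse n="1">In the beginning God created the heaven and the earth.</verse>
<verse n="3">And God said, Let there be light: and there was light.</verse>
</chapter></book></bible>"""

sample_index = index_words(SAMPLE)
```

Looking up `sample_index["god"]` gives one reference per occurrence of the word, so a word used twice in a verse is listed twice for that verse. Memory use depends on the size of the index, not on holding a parsed tree of the whole document, which is the reason to skip the DOM for a file this large.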

Mod_Perl 2.0 - Second in a series

Thursday, May 11th, 2006

My next mod_perl article is up:
Mod_Perl 2.0: Writing a Useful Handler. Read it while it’s hot.

Wordpress Templating sucks…

Wednesday, May 3rd, 2006

Now don’t get me wrong. I really like the WordPress UI for posting and managing pages, posts, comments, and spam. However, it is sadly, sadly lacking in the templating department. I discovered this over the last few days while attempting to modify a template. I really dislike the way it all works. You aren’t templating, you’re coding in PHP, and it’s a little obtuse that way. It’s probably just a matter of preference. Perhaps I should write a templating plugin. If that’s possible. I wonder…?