Archive for the 'Data' Category

Thursday, November 9th, 2006

This article was my life saver

Transactions as a debugging tool

Wednesday, June 7th, 2006

Have you ever wanted to test a long SQL DDL script for syntax errors without actually creating your database structure yet? I’ve found the easiest way to do this is with a transaction: simply begin a transaction at the start of the script and roll it back at the end. For example:


-- PostgreSQL DDL script
BEGIN; -- begins our transaction block

CREATE TABLE test_tbl
(
 pk numeric NOT NULL,
 data varchar(128)
);

ROLLBACK; -- roll back everything this script just did
-- COMMIT; -- swap this in for ROLLBACK when you are ready to commit the changes

This lets us test the script for errors without actually applying it to the database. It does rely on the database rolling back DDL inside a transaction, as PostgreSQL does; a database that implicitly commits on DDL statements (MySQL, for example) will keep the changes. The EXPLAIN command can do something similar on some databases, but you would need it on every statement in the script, and some statements will error out if you use EXPLAIN on them. I’ve found the transaction method works best for what I want to do.
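The same idea can be wrapped in a small harness. What follows is only a sketch, and it makes assumptions: it uses SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite also rolls back DDL), and check_ddl is a made-up helper name, not part of any library.

```python
import sqlite3


def check_ddl(conn, statements):
    """Run statements inside one transaction and always roll back.

    Hypothetical helper: returns None if every statement ran,
    otherwise the message of the first error.
    Assumes conn was opened with isolation_level=None so that
    BEGIN and ROLLBACK are fully under our control.
    """
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        # Some errors end the transaction on their own, so check first.
        if conn.in_transaction:
            conn.execute("ROLLBACK")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    good = ["CREATE TABLE test_tbl (pk numeric NOT NULL, data varchar(128))"]
    bad = ["CREATE TABLE test_tbl (pk numeric NOT NULL, data varchar(128),)"]
    print(check_ddl(conn, good))  # no error, and the table is rolled back
    print(check_ddl(conn, bad))   # reports the stray trailing comma
```

With PostgreSQL the plain BEGIN/ROLLBACK script is usually all you need; a wrapper like this only earns its keep if you want the check to run automatically.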

Did you ever need to index an xml doc

Thursday, May 18th, 2006

and preserve the XML information in the index? May I present “the XML Indexer”.

My brother, whose very popular AJAX Bible app has been getting attention, needed an XML index of the KJV Bible. He asked if I could help him get it. We would be parsing the KJV in XML format, and I needed to pull out the reference information for every occurrence of every word. I thought an XML indexer might be useful in more than one capacity, and there wasn’t much on the net or on CPAN that could do it. It needed to be light and fast because it was going to be parsing the entire Bible, so a DOM parser was out of the question. So I wrote my own.

xml_indexer.pm is a module that indexes the words in an XML document and preserves the XML information about each occurrence of a word. It’s a little rough around the edges right now, but it works. It uses the expat parser, so it’s light and fast. Look at the bible_index.pl script for an example of how it works. I’ll do a tutorial on it later.
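To give a feel for the approach, here is a rough sketch of expat-style word indexing. It is not xml_indexer.pm: it is written in Python with the standard xml.parsers.expat module rather than Perl, and the index_words function, the path format, and the element names in the sample document are all assumptions made up for illustration.

```python
import re
import xml.parsers.expat
from collections import defaultdict

WORD_RE = re.compile(r"[A-Za-z']+")


def index_words(xml_text):
    """Map each lowercased word to the element paths where it occurs.

    Hypothetical sketch: a path looks like
    book[name=Genesis]/chapter[n=1]/verse[n=1].
    """
    index = defaultdict(list)
    stack = []    # labels of the currently open elements, outermost first
    pending = []  # text chunks seen since the last start or end tag

    def flush():
        # expat may deliver one text run in several chunks, so join
        # them before splitting into words.
        if pending and stack:
            path = "/".join(stack)
            for word in WORD_RE.findall("".join(pending)):
                index[word.lower()].append(path)
        pending.clear()

    def start(name, attrs):
        flush()
        label = name + "".join(f"[{k}={v}]" for k, v in sorted(attrs.items()))
        stack.append(label)

    def end(name):
        flush()
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = pending.append
    parser.Parse(xml_text, True)
    return dict(index)


if __name__ == "__main__":
    sample = (
        '<book name="Genesis"><chapter n="1">'
        '<verse n="1">In the beginning God created</verse>'
        '<verse n="2">And the earth</verse>'
        "</chapter></book>"
    )
    for word, paths in sorted(index_words(sample).items()):
        print(word, paths)
```

Because the parser streams events and only the stack of open elements is kept in memory, nothing like a DOM tree is ever built. For a large file, the parser's ParseFile method on a file opened in binary mode would avoid loading the whole document as a string.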

Update:
This baby has been confirmed to parse the entire Bible in Zefania XML format in under 3 minutes. That is a 16 MB file, and it spits out a 23 MB index in that time. Quite honestly, it surprised me.