The real power of this is what it allows you to do in your code. Adding metadata is dead simple. You can organize and sort your data in any fashion you want. You can update, change, or expand your data description on the fly with changes to the database. A whole world of possibilities opens up before you. Not every application can benefit from that kind of flexibility, though. Certain applications need the strict control over relationships that a traditional RDBMS design gives. Accounting applications, for instance, rely on strictly defined relationships. But if your application can benefit from this flexibility, it's a huge boon to your development and design.
I was going to use XML to store my data. It offered the following benefits: it was a text file, and there were ready-to-use parsers for it on all the common server scripting platforms. I could use the file anywhere. The first step, of course, was to decide on the XML elements to use and how they would be used in the document. I had to write an XML spec of sorts so I knew how to interpret the file.
First, I needed a root element. XML documents require a single root element in order to be well-formed. We'll call that element the "map" element. After all, this document is going to amount to a sitemap of sorts, which brings us to a side benefit of using XML: I can use the same file to generate a dynamic sitemap should I wish to, and so can anyone else. I can provide this document publicly and anyone can host a way to get anywhere on my site from theirs. Who knows? It may be useful some day. Right now, our document looks like this:

    <?xml version='1.0' ?>
    <map>
    </map>

Next, we need an element that holds all the data about one link. We'll call that the "section" element since it describes a section of the site. We also need elements inside this element to hold all the pieces we need to know to build our menu. In this case we are storing the link, the description, and the name of each link for the menu. Those elements are link, description, and name:

    <?xml version='1.0' ?>
    <map>
      <section>
        <link></link>
        <description></description>
        <name></name>
      </section>
    </map>

The section element can be repeated as many times as necessary. Each section element holds exactly one link, one description, and one name element. That concludes our spec for the XML document.
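To make the spec concrete, here is a minimal sketch in Python (not the PHP used in the rest of this article) that checks a document against the rules above. The sample links, descriptions, and names are placeholders invented for illustration, and `check_sitemap` is a hypothetical helper rather than part of the original project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the links and names are placeholders.
SITEMAP = """<?xml version='1.0' ?>
<map>
  <section>
    <link>/index.php</link>
    <description>Return to the home page</description>
    <name>Home</name>
  </section>
  <section>
    <link>/articles.php</link>
    <description>Browse the article archive</description>
    <name>Articles</name>
  </section>
</map>"""

REQUIRED = ("link", "description", "name")


def check_sitemap(xml_text):
    """Return a list of spec violations; an empty list means the document conforms."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "map":
        problems.append("root element must be <map>")
    for index, section in enumerate(root):
        if section.tag != "section":
            problems.append(f"child {index} is <{section.tag}>, expected <section>")
            continue
        # Each section must hold exactly one of each required child element.
        for tag in REQUIRED:
            count = len(section.findall(tag))
            if count != 1:
                problems.append(f"section {index} has {count} <{tag}> elements, expected 1")
    return problems


print(check_sitemap(SITEMAP))
```

An empty list means every section carries exactly one link, description, and name.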
Next I had to select a parser. My development platform at the time happened to be PHP, so what did PHP offer in the way of XML parsers? PHP actually had two parsers available: one was a SAX parser and the other was a DOM parser. At the time of this project, however, only the SAX parser was included in the default distribution of PHP. So SAX it was.
SAX parsers are event-driven parsers. They are simpler to learn but harder to implement than DOM parsers. Event-driven parsers work by firing an event each time the parser encounters something of interest as it reads through the text. The events that fire are: the start of an element, the end of an element, and character data inside an element.
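To see the order in which these events fire, here is a small sketch using Python's `xml.parsers.expat` module, which is built on the same expat library as PHP's SAX parser. It is only an illustration; `trace_events` is a hypothetical helper name.

```python
import xml.parsers.expat


def trace_events(xml_text):
    """Parse xml_text and return the events in the order the parser fired them."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text in one piece

    def on_text(data):
        # Skip whitespace-only text between elements.
        if data.strip():
            events.append(("text", data))

    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return events


for event in trace_events("<map><section><name>Home</name></section></map>"):
    print(event)
```

The trace shows three start events, one text event for "Home", and then three end events in reverse order of nesting.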
So, you write a handler for each event and the parser calls it as the event fires. First, we need to create a parser object using $parser = xml_parser_create(). Then, we need to create the handler functions and assign them to the parser object using xml_set_element_handler($parser, 'startElement', 'endElement') and xml_set_character_data_handler($parser, 'cData'). We also need to set our parser options using xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0), which turns off case folding so element names keep their original case. Here is our PHP code so far:

    // Begin code to create menu from xml file.
    $varFileXmlFile = "xml_databases/sitemap.xml";
    $xmlFile = fopen($varFileXmlFile, "r");
    $xmlString = fread($xmlFile, filesize($varFileXmlFile));
    $strMenu = "";
    $currentElement = "";
    $name = "1";
    $link = "2";
    $title = "3";

    // function to handle the beginning of an element
    function startElement($parserHandle, $elementName, $attributes) {
        // declare the global variables here
    }

    // function to handle the end of an element
    function endElement($parserHandle, $elementName) {
        // declare global variables here
    }

    // function to handle the data in an element
    function cData($parserHandle, $cdata) {
        // declare global variables here
    }

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, 'startElement', 'endElement');
    xml_set_character_data_handler($parser, 'cData');

Now we need to decide what happens when a start element is reached in our document. Since our goal is to retrieve the info from certain elements when they arrive, all we really need to know from this event is what element we are on. When the parser runs the startElement function it passes in the parser handle, element name and attribute list. All we have to do is add code to the function that stores the element's name in a global variable for other functions to use.

    function startElement($parserHandle, $elementName, $attributes) {
        // declare the global variables here
        global $currentElement;
        // identify current element here
        $currentElement = $elementName;
    }

We declare $currentElement using the global keyword so the function will use the variable declared earlier in the script instead of creating a function-specific variable to store it. We want other functions to be able to access the current element so they know what element they are working on.
We also need to decide what happens when an end element is reached in our document. We only really care about the end of section elements, since this is when we will store the values we gathered from the section's child elements. When the parser fires the endElement function it passes in the parser handle and the element name. So all we have to do is add a test to see if it's a section end element, and then output the values that the CDATA handler stored for us. Additionally, we need to clear the $currentElement variable since we are no longer in that element.

    // function to handle the end of an element
    function endElement($parserHandle, $elementName) {
        // declare global variables here
        global $currentElement;
        global $strMenu;
        global $name;
        global $title;
        global $link;
        $currentElement = "";
        if ($elementName == "section") {
            // one list item per section: the link, its description as the title, and its name
            $strMenu .= "<li><a href=\"$link\" title=\"$title\">";
            $strMenu .= $name . "</a></li>\n";
        }
    }

Again we declared our variables with the global keyword so we would be able to retrieve the data that the CDATA handler stores in them. In our case we want to output list item elements for inclusion in an unordered list later.
The last event we have to handle fires when character data (CDATA) is reached. Character data is the text in the document that is not markup. In other words, it's the data the elements are holding for us. When the parser calls the cData function it passes in the parser handle and the value it retrieved. For our purposes, we need to do something different with the data depending on which element we are inside of. If we are in a link element we store the value in our link variable, and so on for all the other section sub-elements, like this:

    // function to handle the data in an element.
    function cData($parserHandle, $cdata) {
        // declare global variables here
        global $currentElement, $link, $title, $name;
        switch ($currentElement) {
            case "link":
                $link = $cdata;
                break;
            case "description":
                $title = $cdata;
                break;
            case "name":
                $name = $cdata;
                break;
            default:
                break;
        }
    }

Now that we have handled all the events, we are ready to retrieve our XML file. So, we run the parser with the stored string from our XML file.

    $varFileXmlFile = "xml_databases/sitemap.xml";
    $xmlFile = fopen($varFileXmlFile, "r");
    $xmlString = fread($xmlFile, filesize($varFileXmlFile));
    xml_parse($parser, $xmlString, true);
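For comparison, the same handler-based approach can be sketched in Python with `xml.parsers.expat`. This is an illustration under stated assumptions, not a port of the PHP above: the list item markup, the `build_menu` name, and the sample data are assumptions, and the text handler appends rather than overwrites because expat may deliver an element's text in more than one piece.

```python
import xml.parsers.expat
from html import escape


def build_menu(xml_text):
    """Return one HTML list item per <section>, built from expat's events."""
    state = {"current": "", "link": "", "description": "", "name": ""}
    items = []

    def start_element(name, attrs):
        # Remember which element we are inside of.
        state["current"] = name
        if name == "section":
            state.update(link="", description="", name="")

    def end_element(name):
        # We have left the element, so clear the marker.
        state["current"] = ""
        if name == "section":
            items.append('<li><a href="{}" title="{}">{}</a></li>'.format(
                escape(state["link"], quote=True),
                escape(state["description"], quote=True),
                escape(state["name"]),
            ))

    def char_data(data):
        # Only keep text that belongs to one of the three data elements.
        if state["current"] in ("link", "description", "name"):
            state[state["current"]] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)
    return "\n".join(items)


sample = ("<map><section><link>/index.php</link>"
          "<description>Home page</description><name>Home</name></section></map>")
print(build_menu(sample))
```

Keeping the state in a dictionary captured by the handlers plays the role that the global variables play in the PHP version.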
Here is the complete script:

    // Begin code to create menu from xml file.
    $varFileXmlFile = "xml_databases/sitemap.xml";
    $xmlFile = fopen($varFileXmlFile, "r");
    $xmlString = fread($xmlFile, filesize($varFileXmlFile));
    $strMenu = "";
    $currentElement = "";
    $name = "1";
    $link = "2";
    $title = "3";

    // create handler functions here
    // function to handle the beginning of an element
    function startElement($parserHandle, $elementName, $attributes) {
        global $currentElement;
        // identify current element here
        $currentElement = $elementName;
    }

    // function to handle the end of an element
    function endElement($parserHandle, $elementName) {
        global $currentElement, $strMenu, $name, $title, $link;
        $currentElement = "";
        if ($elementName == "section") {
            $strMenu .= "<li><a href=\"$link\" title=\"$title\">";
            $strMenu .= $name . "</a></li>\n";
        }
    }

    // function to handle the data in an element
    function cData($parserHandle, $cdata) {
        global $currentElement, $link, $title, $name;
        switch ($currentElement) {
            case "link":        $link = $cdata;  break;
            case "description": $title = $cdata; break;
            case "name":        $name = $cdata;  break;
            default:            break;
        }
    }

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, 'startElement', 'endElement');
    xml_set_character_data_handler($parser, 'cData');
    xml_parse($parser, $xmlString, true);

Each tag is defined using an element declaration:

    <!ELEMENT tagname (tag contents)>

You need one of these for each of your template tags. The part in parentheses is a list of all the tags which can be inside your tag. Use EMPTY if the tag shouldn't contain any text or other tags, ANY if the tag can contain any kind of content, and (#PCDATA) if it contains parsed character data. Next you need to define what attributes, if any, your tag can have. Each attribute is defined using an attlist declaration:

    <!ATTLIST tagname attributename CDATA #REQUIRED>

CDATA indicates that the attribute's value is character data. You can specify an actual default value in place of the keyword if you wish. The #REQUIRED keyword is one of several which tell validators whether the attribute is fixed (#FIXED), optional (#IMPLIED), or required (#REQUIRED). Lastly, you need to tell the validator where all these tags fit in with the other XHTML 1.1 tags. This is done through the magic of parameter entities. Parameter entities allow you to define a "variable" to represent a block of text in your DTD.
These statements look like this:

    <!ENTITY % Misc.extra "| tagname | second tagname">

This sets the Misc.extra entity, an extension point that the XHTML 1.1 modules leave empty by default, to the string "| tagname | second tagname". It has the effect of adding those tags as accepted content almost anywhere in the body of an XHTML document. For a detailed breakdown of all the entities defined in the XHTML 1.1 modules you can look here. Now that we've defined all our tags, all we have to do is link the XHTML 1.1 modules in with our DTD. We do this by declaring our own parameter entity which points to the XHTML 1.1 DTD and then including that DTD in our custom DTD like so:

    <!ENTITY % xhtml11.dtd PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    %xhtml11.dtd;

The first line sets the xhtml11.dtd entity equal to the contents of the file located at the URL. The second line uses that entity to include the file at the bottom of ours. Because only the first declaration of an entity counts, our Misc.extra definition takes precedence over the empty one in the included modules. Now we have a working DTD for our template language. Great! Now how do we use it? Upload the DTD to a web location for global access, or to somewhere on your intranet if you only need local access. Then write your doctype declarations to point there, like so:

    <!DOCTYPE html PUBLIC "-//PhotoKit//DTD XHTML-PKTMPL 1.0//EN" "http://www.marzhillstudios.com/DTDs/PKTMPL-1.0.dtd">

Everything else in your web document stays the same, with the exception of any template tags you may put in there. It will validate in any standards-based validator, and validating editors can perform real-time error checking and perhaps even command assist for your template tags. Just another piece of value added to your app.

    package pk_cgi;
    require Exporter;
    use strict;

    our @ISA = qw(Exporter);
    our @EXPORT = qw(cgi_request new_header get_cookie_list get_cookie);

    sub new_header {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $header = "Content-type: text/html\n\n";
        return bless(\$header, $class);
    }

What we did here is create a string with that all-important last line in the header. This header is actually completely viable now: we could output it and it would be perfectly acceptable to the requesting application. It does not, however, have any cookies defined in it. Those will be added later if we should want them, using an object method. As you can see, the object creation is almost absurdly simple: just a string blessed into the object. Should you wish, you could also add arguments to set the MIME type to something else.

Now, what about those cookies? How do we handle those? Cookies are handled by lines in the header like this one:

    "Set-Cookie: cookiename=cookievalue\n"

That is a basic cookie header. You could also add some optional parameters:

    "Set-Cookie: cookiename=cookievalue; path=value; expires=value; domain=value\n"

Our method needs to build the Set-Cookie line based on parameters and then prepend the line onto our header object. That prepend is very important, remember, because the line already in the header has to stay last. We will use a hash to pass the cookie's name and value pair into the method. If we have any of the optional parameters to set, we can store those in the hash also.

    sub add_cookie {
        my $self = shift;
        my $Cookie = shift;
        my $String = "Set-Cookie: " . $$Cookie{name} . "=";
        $String .= $$Cookie{value};
        if (exists($$Cookie{path})) {       ## set cookie's path
            $String .= "; path=" . $$Cookie{path};
        }
        if (exists($$Cookie{expires})) {    ## set cookie's expiration
            $String .= "; expires=" . $$Cookie{expires};
        }
        if (exists($$Cookie{domain})) {     ## set cookie's domain
            $String .= "; domain=" . $$Cookie{domain};
        }
        # prepend, so the Content-type line and its blank line stay last
        $$self = $String . "\n" . $$self;
    }

Again the method is absurdly simple: it uses the values from the hash to build the header line and prepends it to the header object's string. What if we want multiple values in our cookies, though? We could just set a whole bunch of cookies, one for each name=value pair we needed, but for some applications this would quickly get unwieldy. What we need is multivalue cookies. Happily, such a thing is possible. We just have to work out a way to separate a cookie's value string into sub name=value pairs. To do this we need one separator between pairs and another to separate each name from its value. These separators are arbitrary, but you probably want to use something that isn't likely to occur in your names or values. Also, the ";" and "=" are not good choices since they are used elsewhere in the HTTP header as separators. I chose the ":" and the "," to act as separators. The colon separates names from values and the comma separates name:value pairs.

Once we have our separators, we need to change our object method so it can recognize whether we are setting a multivalue or a single-value cookie and act accordingly. We still use the hash to pass the values, but this time, if it's a multivalue cookie, the value key of the hash stores a reference to another hash holding the name=value pairs for our multivalue cookie. We need to test for the presence of this hash and generate our Set-Cookie line accordingly. Here is our new object method.

    sub add_cookie {
        my $self = shift;
        my $Cookie = shift;
        my $String = "Set-Cookie: " . $$Cookie{name} . "=";
        if (ref($$Cookie{value}) eq "HASH") {
            # multivalue cookie: name1:value1,name2:value2,...
            my $CookieValue = $$Cookie{value};
            $String .= join(",", map { $_ . ":" . $$CookieValue{$_} } sort keys %$CookieValue);
        } else {
            # single value, stored in the same name:value format
            $String .= $$Cookie{name} . ":" . $$Cookie{value};
        }
        if (exists($$Cookie{path})) {       ## set cookie's path
            $String .= "; path=" . $$Cookie{path};
        }
        if (exists($$Cookie{expires})) {    ## set cookie's expiration
            $String .= "; expires=" . $$Cookie{expires};
        }
        if (exists($$Cookie{domain})) {     ## set cookie's domain
            $String .= "; domain=" . $$Cookie{domain};
        }
        $$self = $String . "\n" . $$self;
    }

We changed several things in order to facilitate retrieval of our cookies later with the new multivalue cookie format. Since we don't know while retrieving a cookie whether it is a multivalue cookie or not, storing the single-value cookie in the same format makes retrieval easier. The new method checks for a hash reference in $$Cookie{value}. If there is a hash reference, it stores the multiple values from that hash. If there isn't, it stores the single value in the same format.

We can now handle setting cookies in our CGI module. All we have left is retrieving the cookies. There are two ways we might want to retrieve the cookies we've set in our application: retrieving a list of all the cookies sent, or retrieving a cookie by name. Retrieving by name is probably the more useful of the two, so let's start with that one. First we need to pull the list of cookies out of the header. Then we need to locate the cookie we want, and finally we need to return that cookie's value or list of name=value pairs. Since we had the foresight to store single values in the same format as multivalues, we made it a little easier on ourselves: we can treat single-value cookies the same as multivalue cookies.
We will pass the cookie ID in as a string. When a browser sends cookies to the server, they get stored in Perl's %ENV hash under the HTTP_COOKIE key. If the key doesn't exist then there were no cookies. So let's get started on that method. We pull the list of cookies out of the header like so:

    sub get_cookie {
        my $CookieId = shift;
        my %CookieVars;
        my @buffer;
        if (exists $ENV{'HTTP_COOKIE'}) {
            # cookies arrive separated by ";" plus optional whitespace
            @buffer = split(/;\s*/, $ENV{'HTTP_COOKIE'});
        } else {
            $CookieVars{Status} = 0;
            return %CookieVars;
        }
    }

A Status value of 0 means no cookies were found. The cookies are stored in a buffer array after being split on the ";". The next thing we need to do is extract the name and value of each cookie and return the cookie we are looking for.

    foreach my $i (@buffer) {
        my ($Name, $Value) = split(/=/, $i, 2);
        if ($CookieId eq $Name) {
            my @buffer2 = split(/,/, $Value);
            foreach my $y (@buffer2) {
                my ($CVar, $CVal) = split(/:/, $y, 2);
                $CookieVars{$CVar} = $CVal;
            }
            $CookieVars{Status} = 1;
            return %CookieVars;
        }
    }

After storing the cookies in the buffer, we step through the buffer and split each cookie on the equal sign, storing the name and the value. Then, using an if statement, we test whether the name is the same as the cookie ID. When we locate the cookie we want, we store its name:value pairs in a hash and return said hash. It doesn't matter whether the cookie had multiple values or not; it still returns a hash. Lastly, if we couldn't find the cookie we were looking for, we set the Status field to 0 in the hash and return the hash. When we retrieve a cookie we can check this Status field: a value of 0 means the cookie did not exist, and a value of 1 means it did.
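The lookup described above can be sketched in Python as follows. This is an illustration rather than a port of the module: the function takes the raw cookie string as a parameter instead of reading the environment itself, and the sample cookie string is invented.

```python
def get_cookie(cookie_id, http_cookie):
    """Look up one cookie by name in a raw Cookie header string.

    Returns a dict with a "Status" key: 1 if the cookie was found, 0 otherwise.
    """
    result = {"Status": 0}
    if not http_cookie:
        return result  # no cookies were sent at all
    for chunk in http_cookie.split(";"):
        # Each chunk looks like "name=value"; strip the space that follows ";".
        name, _, value = chunk.strip().partition("=")
        if name != cookie_id:
            continue
        # The value holds comma-separated name:value pairs.
        for pair in value.split(","):
            key, sep, val = pair.partition(":")
            if sep:
                result[key] = val
        result["Status"] = 1
        return result
    return result  # cookies were sent, but not the one we wanted


raw = "visits=visits:3; prefs=lang:en,theme:dark"
print(get_cookie("prefs", raw))
print(get_cookie("missing", raw))
```

In a CGI script the second argument would typically come from `os.environ.get("HTTP_COOKIE")`.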
Here is the complete method:

    sub get_cookie {
        my $CookieId = shift;
        my %CookieVars;
        if (exists $ENV{'HTTP_COOKIE'}) {
            my @buffer = split(/;\s*/, $ENV{'HTTP_COOKIE'});
            foreach my $i (@buffer) {
                my ($Name, $Value) = split(/=/, $i, 2);
                if ($CookieId eq $Name) {
                    my @buffer2 = split(/,/, $Value);
                    foreach my $y (@buffer2) {
                        my ($CVar, $CVal) = split(/:/, $y, 2);
                        $CookieVars{$CVar} = $CVal;
                    }
                    $CookieVars{Status} = 1;
                    return %CookieVars;
                }
            }
        }
        # no cookies at all, or the requested cookie was not among them
        $CookieVars{Status} = 0;
        return %CookieVars;
    }

Retrieving a list of cookies is actually much easier to do. Here is the code for the method; I'll leave it as an exercise for the reader to interpret it.

    sub get_cookie_list {
        my %cookies;
        return %cookies unless exists $ENV{'HTTP_COOKIE'};
        my @buffer = split(/;\s*/, $ENV{'HTTP_COOKIE'});
        foreach my $i (@buffer) {
            my ($Name, $Value) = split(/=/, $i, 2);
            my @CookieValues = split(/,/, $Value);
            my %CookieVars;
            foreach my $j (@CookieValues) {
                my ($CookieVariable, $CookieValue) = split(/:/, $j, 2);
                $CookieVars{$CookieVariable} = $CookieValue;
            }
            $cookies{$Name} = \%CookieVars;
        }
        return %cookies;
    }

Using the module is also an easy matter. Simply create a new header object using the constructor method, add your cookies using the appropriate object methods, and output the header before printing anything else on your page. A simple script using it is shown below.

    use pk_cgi;

    my $header = pk_cgi->new_header;
    my $new_cookie = {name => "name", value => "some value"};
    $header->add_cookie($new_cookie);
    # get_header is not shown in this article; presumably it returns the header string ($$header)
    print $header->get_header;
    my %cookie = pk_cgi::get_cookie("name");
    print "<?xml version='1.0' encoding='UTF-8'?> <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'> ";
    print " ";

In a later article I will address retrieving GET and POST variables from forms.
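For readers who want to see both directions side by side, here is a rough Python sketch of the cookie format used in this article: building the Set-Cookie lines and parsing a full cookie string back into nested dictionaries. It is an illustration under assumptions, not the module itself. The function names simply mirror the Perl ones, the sample values are invented, and no escaping is done, so names and values are assumed not to contain ";", "=", "," or ":".

```python
def add_cookie(header, cookie):
    """Return header with a Set-Cookie line prepended, so Content-type stays last."""
    value = cookie["value"]
    if isinstance(value, dict):
        # Multivalue cookie: name1:value1,name2:value2,...
        pairs = [f"{key}:{value[key]}" for key in sorted(value)]
    else:
        # Single value, stored in the same name:value format.
        pairs = [f"{cookie['name']}:{value}"]
    line = f"Set-Cookie: {cookie['name']}=" + ",".join(pairs)
    for option in ("path", "expires", "domain"):
        if option in cookie:
            line += f"; {option}={cookie[option]}"
    return line + "\n" + header


def get_cookie_list(http_cookie):
    """Parse a raw Cookie header into {cookie name: {sub-name: sub-value}}."""
    cookies = {}
    for chunk in http_cookie.split(";"):
        name, _, value = chunk.strip().partition("=")
        if not name:
            continue
        pairs = (pair.partition(":") for pair in value.split(","))
        cookies[name] = {key: val for key, sep, val in pairs if sep}
    return cookies


header = "Content-type: text/html\n\n"
header = add_cookie(header, {"name": "prefs",
                             "value": {"theme": "dark", "lang": "en"},
                             "path": "/"})
header = add_cookie(header, {"name": "visits", "value": "3"})
print(header, end="")

# A browser sends back only the name=value part of each cookie, joined with "; ".
print(get_cookie_list("visits=visits:3; prefs=lang:en,theme:dark"))
```

Because each new line is prepended, the most recently added cookie appears first and the Content-type line with its trailing blank line always stays at the end of the header.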
Additional Reading:
* http://www.perldoc.com/
* http://www.cgi101.com/class/
* http://www.ltsw.se/knbase/internet/mime.htp