Thursday, June 30, 2011

SimpleRDFElement Class: an extension of SimpleXML

In my last post, SimpleXML and Namespace Quirks, I complained about how bad the namespace handling is in the SimpleXML framework in PHP.

Since then, I have searched all over the web for pre-made solutions to the problem of parsing RDF/XML that makes heavy use of namespaces, and found nothing that fits my needs. I did find some extensive RDF parsing frameworks written in PHP, but they were way too involved: I don't want to have to install a dozen class files when all I want to do is convert a simple RDF/XML string into triples. I also found some "simple" solutions that were inadequate, like simply replacing the ":" character in the string with an "_" character, so that the parser no longer had to deal with namespaces at all. (This is a terrible solution because namespace prefixes are just "shortcuts" to a URI, and different people can use different prefixes to represent the same namespace.)
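To make the prefix problem concrete, here is a small sketch using only built-in PHP functions. The two sample documents and the helper firstChildInfo() are made up for illustration; they are not part of SimpleRDFElement. Both documents describe the same RDFS class, but one author chose the prefix "rdfs" and the other chose "s", so any approach that keys on the prefix text sees two different tags.

```php
<?php
// Two hypothetical documents that use different prefixes for the same namespaces.
$docA = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
      . ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">'
      . '<rdfs:Class rdf:about="http://example.org/Thing"/></rdf:RDF>';
$docB = '<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
      . ' xmlns:s="http://www.w3.org/2000/01/rdf-schema#">'
      . '<s:Class r:about="http://example.org/Thing"/></r:RDF>';

// Hypothetical helper: report the prefixed name and the expanded URI
// of the first child element of the document root.
function firstChildInfo($xml)
{
    $root = dom_import_simplexml(simplexml_load_string($xml));
    foreach ($root->childNodes as $node) {
        if ($node instanceof DOMElement) {
            return array(
                'qname' => $node->nodeName,                       // prefix:localname
                'uri'   => $node->namespaceURI . $node->localName // expanded form
            );
        }
    }
    return array();
}

$a = firstChildInfo($docA);
$b = firstChildInfo($docB);

// The prefixed names differ, but the expanded URIs are identical.
echo $a['qname'], ' vs ', $b['qname'], "\n";
echo ($a['uri'] === $b['uri']) ? "same URI\n" : "different URI\n";
```

Replacing ":" with "_" would turn these into rdfs_Class and s_Class, and the fact that they mean the same thing would be lost.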

So, having failed to find anything acceptable, I wrote my own solution.

Please enjoy SimpleRDFElement: a class that extends the SimpleXMLElement class, and that can therefore be used elegantly hand-in-hand with existing SimpleXML code.

The source code is one file: simplerdfelement.php (opens as a plain text file)

(Save it as a .php file and include it in your PHP script to use it.)

If you have your RDF-style XML stored as text in a string variable $xml, then you can create your SimpleRDFElement Object this way:

$xmlobj = simplexml_load_string($xml,'SimpleRDFElement');
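The second argument to simplexml_load_string() is the name of a class that extends SimpleXMLElement, and that is the mechanism this call relies on. The sketch below shows the mechanism in isolation with a made-up stand-in class, DemoElement, and a made-up method, shoutName(); neither is part of SimpleRDFElement.

```php
<?php
// Hypothetical stand-in class, used only to show how the class_name argument works.
class DemoElement extends SimpleXMLElement
{
    // An extra method that plain SimpleXMLElement objects do not have.
    public function shoutName()
    {
        return strtoupper($this->getName());
    }
}

$xml = '<catalog><item>first</item><item>second</item></catalog>';
$xmlobj = simplexml_load_string($xml, 'DemoElement');

// The root element, and every element reached from it, is a DemoElement.
$isRootDemo  = $xmlobj instanceof DemoElement;
$isChildDemo = $xmlobj->item[0] instanceof DemoElement;

// So the extra method is available at every level of the tree.
$shout = $xmlobj->item[0]->shoutName();
echo $shout, "\n";
```

Because child elements come back as the same class, the extra methods can be called on any element you navigate to, not just on the object returned by the loader.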

The resulting object, $xmlobj, acts just like a SimpleXMLElement object, except you have a few more methods available:

$xmlobj->getPrefix()
Returns the namespace prefix of the root element of the object, based on the namespace declarations in the XML text.
$xmlobj->getNamespace()
Returns the full URI of the namespace of the root element of the object, based on the namespace declarations in the XML text.
$xmlobj->getFullName()
Returns the fully qualified name of the root element, using the prefix-colon-tagname format, e.g. rdfs:Class
$xmlobj->getFullURI()
Returns the full URI of the root element, using the expanded URI of the namespace followed by the element tag name, e.g. http://www.w3.org/2000/01/rdf-schema#Class
$xmlobj->getChildNodes()
Returns an array of all of the child elements (as SimpleRDFElement Objects) of the current top-level element. Unlike the built-in children() method, this returns all child elements regardless of namespace.
$xmlobj->getAttributes()
Returns an array of all of the attributes (as individual SimpleRDFElement Objects) of the current top-level element. Unlike the built-in attributes() method, this returns all attributes regardless of namespace.
$xmlobj->getTriples()
Returns an array of SimpleRDFTriple objects. SimpleRDFTriple is a simple helper class that defines an object with three properties: tripleSubject, triplePredicate, tripleObject. This method parses the top-level element and constructs triples based on that element, its attributes, and its immediate child elements. It is not recursive.
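
For readers curious how namespace-aware methods like these can be built on top of SimpleXML, here is a rough sketch. It is not the code in simplerdfelement.php: the class name SketchRDFElement is made up, and the approach (delegating to the DOM extension through dom_import_simplexml()) is just one assumed way to get the same results. Only the element-name methods and getChildNodes() are sketched; getAttributes() and getTriples() are left out because their exact behavior is not spelled out above.

```php
<?php
// Hypothetical sketch; NOT the actual SimpleRDFElement implementation.
class SketchRDFElement extends SimpleXMLElement
{
    public function getPrefix()
    {
        return (string) dom_import_simplexml($this)->prefix;
    }

    public function getNamespace()
    {
        return (string) dom_import_simplexml($this)->namespaceURI;
    }

    public function getFullName()
    {
        // nodeName is already in prefix:localname form for prefixed elements.
        return dom_import_simplexml($this)->nodeName;
    }

    public function getFullURI()
    {
        return $this->getNamespace() . dom_import_simplexml($this)->localName;
    }

    public function getChildNodes()
    {
        // Walk the DOM children so that elements in every namespace are included.
        $children = array();
        foreach (dom_import_simplexml($this)->childNodes as $node) {
            if ($node instanceof DOMElement) {
                $children[] = simplexml_import_dom($node, get_class($this));
            }
        }
        return $children;
    }
}

$xml = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
     . ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"'
     . ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
     . '<rdfs:Class rdf:about="http://example.org/Thing">'
     . '<dc:title>Thing</dc:title></rdfs:Class></rdf:RDF>';

$root  = simplexml_load_string($xml, 'SketchRDFElement');
$kids  = $root->getChildNodes();
$class = $kids[0];
$inner = $class->getChildNodes();

echo $class->getFullName(), "\n";
echo $class->getFullURI(), "\n";
echo $inner[0]->getFullName(), "\n";
```

The sample document mixes three namespaces on purpose: the child of rdfs:Class lives in the Dublin Core namespace, which the built-in children() method would skip unless you asked for that namespace explicitly.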
