Social Software powered by Instant Communities

Archive for 'XML'

Importing topic maps with QuaaxTMIO

07.11.2010

Last week, version 0.5.4 of the QuaaxTM PHP Topic Maps engine was released, finally bringing XTM 2.1 read/write support via the QuaaxTMIO library. QuaaxTM now provides a basic tool set of Topic Maps technologies that enables PHP developers to build subject-centric applications on the LAMP stack.

An essential feature is the import of topic maps. Topic maps (TMDM instances) can be serialized to one of the various Topic Maps syntaxes such as XTM 1.0/2.0/2.1, CTM, LTM, AsTMa=, or JTM to allow data interchange. Let’s see how to import a publicly available topic map from Maiana using the QuaaxTMIO library; for the following example I chose Lutz Maicher’s topic map about Donald Duck in XTM 2.1.

QuaaxTM has to be set up as described in the README (you may want to run the PHPUnit tests afterwards). You’ll then need three classes, which you have to require_once:

  • TopicMapSystemFactory.class.php (from lib/phptmapi2.0/core)
  • PHPTMAPITopicMapHandler.class.php (from lib/quaaxtmio/src/in)
  • XTM201TopicMapReader.class.php (from lib/quaaxtmio/src/in)
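
The three includes above could be wrapped in a small helper like the following. This is only a sketch: the function name and the $quaaxTmRoot parameter are hypothetical, and it assumes the lib/ paths listed above are relative to the directory where QuaaxTM was unpacked.

```php
<?php
// Hypothetical helper, not part of QuaaxTM. Assumes $quaaxTmRoot is the
// directory that contains QuaaxTM's lib/ folder.
function requireQuaaxTmImportClasses(string $quaaxTmRoot): array
{
    // The three files named in the list above.
    $files = [
        'lib/phptmapi2.0/core/TopicMapSystemFactory.class.php',
        'lib/quaaxtmio/src/in/PHPTMAPITopicMapHandler.class.php',
        'lib/quaaxtmio/src/in/XTM201TopicMapReader.class.php',
    ];
    $loaded = [];
    foreach ($files as $relativePath) {
        $path = rtrim($quaaxTmRoot, '/') . '/' . $relativePath;
        // Fail with a readable message instead of a fatal require error.
        if (!is_file($path)) {
            throw new RuntimeException("Missing QuaaxTM file: $path");
        }
        require_once $path;
        $loaded[] = $path;
    }
    return $loaded;
}

// Usage (the path is a placeholder for your own installation):
// requireQuaaxTmImportClasses('/path/to/quaaxtm');
```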

First, get a TopicMapSystem:
$tmSystemFactory = TopicMapSystemFactory::newInstance();
// QuaaxTM specific feature
$tmSystemFactory->setFeature(
    VocabularyUtils::QTM_FEATURE_AUTO_DUPL_REMOVAL, false
);
$tmSystem = $tmSystemFactory->newTopicMapSystem();

Import the topic map using the XTM 2.0/2.1 parser and a topic map handler:

$tmHandler = new PHPTMAPITopicMapHandler($tmSystem, 'http://localhost/topicmaps/1');
$reader = new XTM201TopicMapReader($tmHandler);
$reader->readFile(
    'http://maiana.topicmapslab.de/u/lmaicher/tm/ducks/download.xtm'
);

Finally access the imported topic map via PHPTMAPI 2.0:

$tm = $tmSystem->getTopicMap('http://localhost/topicmaps/1');
$topics = $tm->getTopics();
echo count($topics); // 278

xml_set_character_data_handler: Beware of chunks!

19.05.2008

The way PHP's XML parser (based on James Clark's expat) handles character data has an undocumented quirk. The callback function defined for character data, e.g.

xml_set_character_data_handler($this->sax, 'data');
...
private function data($sax, $data) {
    // Append rather than assign: the text may arrive in several chunks
    $this->data .= (string) $data;
}

is called several times, depending on the size of the data (see http://www.php.net/manual/en/function.xml…).

I was able to reproduce the splitting phenomenon for CDATA sections as well as for “simple” strings containing umlauts, as follows:

CDATA sections: “”, content of the CDATA section, “”. That is three iterations with the pattern [empty], [data], [empty].
Simple strings with umlauts, e.g. “Hallö Günther”: “”, “”, “”, “”, “”, “”, “Hall”, “ö Günther”. That is eight iterations with this pattern: everything before the first umlaut is captured in the second-to-last iteration, the rest in the last.

Without the concatenation operator (.) in the assignment, you will wait in vain for the complete character data content. The example above has already been corrected accordingly.
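
A self-contained way to try the accumulate-then-flush pattern is sketched below. The function name collectText and the sample document are made up for illustration. How many chunks the parser actually delivers depends on the PHP version and the underlying XML library, so the sketch only relies on the concatenated result, not on a particular number of calls.

```php
<?php
// Sketch: collect the complete text of each <name> element, no matter how
// many times the character data handler fires for it.
function collectText(string $xml, string $elementName): array
{
    $texts = [];   // finished text values, one per matching element
    $buffer = '';  // text accumulated for the element currently open
    $calls = 0;    // how often the character data handler was invoked

    $parser = xml_parser_create('UTF-8');
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attributes) use (&$buffer) {
            $buffer = ''; // a new element starts: begin with an empty buffer
        },
        function ($parser, $name) use (&$buffer, &$texts, $elementName) {
            if ($name === $elementName) {
                $texts[] = $buffer; // the element is closed, the text is complete
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer, &$calls) {
            $calls++;
            $buffer .= $data; // append: plain assignment would keep only the last chunk
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return ['texts' => $texts, 'calls' => $calls];
}

$xml = '<names><name>Hallö Günther</name>'
     . '<name><![CDATA[Donald & Daisy]]></name></names>';
$result = collectText($xml, 'name');

echo implode(' | ', $result['texts']), "\n"; // Hallö Günther | Donald & Daisy
echo 'handler calls: ', $result['calls'], "\n"; // count varies by environment
```

The value is only treated as complete in the end-element handler, which is the one point where no further chunks for that element can arrive.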


Creative Commons License
This work is licensed under a
Creative Commons Attribution-Share Alike 2.5 License.
t8d blogs with WordPress