Like the underlying libxml library, libxml++ offers three parsers, chosen according to your needs: the DOM, SAX, and TextReader parsers. Their relative advantages and behaviour are explained here.
All of the parsers can parse XML documents directly from a file on disk, a string, or a C++ std::istream. Although the libxml++ API uses only Glib::ustring, and therefore the UTF-8 encoding, libxml++ can parse documents in any encoding, converting to UTF-8 automatically. This conversion does not lose any information, because UTF-8 can represent every Unicode character.
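To illustrate why transcoding to UTF-8 loses nothing, here is a minimal sketch for a single source encoding, ISO-8859-1. It is not how libxml++ performs the conversion (the library does that for you while parsing); the function name latin1_to_utf8 is hypothetical and uses only the C++ standard library.

```cpp
#include <string>

// Hypothetical helper, not part of libxml++: converts ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so each input byte becomes one or two UTF-8 bytes and nothing is dropped.
std::string latin1_to_utf8(const std::string& latin1)
{
  std::string utf8;
  utf8.reserve(latin1.size() * 2);

  for (unsigned char byte : latin1)
  {
    if (byte < 0x80)
    {
      // ASCII characters are encoded identically in UTF-8.
      utf8.push_back(static_cast<char>(byte));
    }
    else
    {
      // Code points 0x80..0xFF need two bytes: 110xxxxx 10xxxxxx.
      utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
      utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
    }
  }
  return utf8;
}
```

For example, the Latin-1 byte 0xE9 (e with acute accent) becomes the two UTF-8 bytes 0xC3 0xA9, while plain ASCII text passes through unchanged.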
Remember that white space is usually significant in XML documents, so the parsers might provide unexpected text nodes that contain only spaces and new lines. The parser does not know whether you care about these text nodes, but your application may choose to ignore them.
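As a sketch of the kind of check an application can apply before discarding such nodes, the helper below tests whether a string consists only of the four characters that XML treats as white space (space, tab, carriage return, line feed). The name is_xml_white_space_only is hypothetical and the helper works on a plain std::string; within libxml++ itself, TextNode::is_white_space() serves this purpose, as the example program in this section shows.

```cpp
#include <string>

// Hypothetical helper, not part of libxml++.
// Returns true if the text contains nothing but XML white space characters.
// Note that an empty string also counts as "white space only" here.
bool is_xml_white_space_only(const std::string& text)
{
  return text.find_first_not_of(" \t\r\n") == std::string::npos;
}
```

An application that only cares about element structure could skip every text node whose content passes this test, while one that preserves formatting (for instance inside mixed content) would keep them.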
The DOM parser parses the whole document at once and stores the structure in memory, available via DomParser::get_document(). With methods such as Document::get_root_node() and Node::get_children(), you may then navigate the hierarchy of XML nodes without restriction, jumping forwards or backwards in the document based on the information that you encounter. Because the whole document is held in memory, the DOM parser uses a relatively large amount of memory.
You should use C++ RTTI (via dynamic_cast<>) to identify the specific node type and to perform actions which are not possible with all node types. For instance, only Elements have attributes. Here is the inheritance hierarchy of node types:
xmlpp::Node:
    xmlpp::Attribute
    xmlpp::ContentNode
        xmlpp::CdataNode
        xmlpp::CommentNode
        xmlpp::ProcessingInstructionNode
        xmlpp::TextNode
    xmlpp::Element
    xmlpp::EntityReference
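The dynamic_cast<> dispatch can be sketched in isolation as follows. The types in the sketch namespace are stand-ins that mirror part of the hierarchy above so that the code compiles without libxml++; they are not the library's classes, and describe() is a hypothetical function. The point it illustrates is ordering: because a TextNode is also a ContentNode, the more derived types must be tested first.

```cpp
#include <string>

// Stand-in types mirroring part of the node hierarchy. They are NOT the
// libxml++ classes; they exist only to keep this sketch self-contained.
namespace sketch
{
struct Node { virtual ~Node() = default; };
struct ContentNode : Node { std::string content; };
struct TextNode : ContentNode {};
struct CommentNode : ContentNode {};
struct Element : Node { std::string name; };

// Identify the concrete node type with dynamic_cast<>.
// Most derived types come first, otherwise the ContentNode test would
// match text and comment nodes before their own branches are reached.
std::string describe(const Node* node)
{
  if (!node)
    return "null";
  if (dynamic_cast<const TextNode*>(node))
    return "text";
  if (dynamic_cast<const CommentNode*>(node))
    return "comment";
  if (dynamic_cast<const ContentNode*>(node))
    return "other content";
  if (const Element* element = dynamic_cast<const Element*>(node))
    return "element " + element->name; // Only elements carry a name here.
  return "other node";
}
} // namespace sketch
```

With the real library the same pattern applies to const xmlpp::Node* pointers, and the Element branch is where attribute access becomes possible.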
Although you may obtain pointers to the Nodes, these Nodes are always owned by their parent Nodes. In most cases that means that the Node will exist, and your pointer will be valid, as long as the Document instance exists.
There are also several methods which can create new child Nodes. By using these, and one of the Document::write_*() methods, you can use libxml++ to build a new XML document.
This example looks in the document for expected elements and then examines them. All these examples are included in the libxml++ source distribution.
File: main.cc
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <libxml++/libxml++.h>

#include <iostream>

void print_indentation(unsigned int indentation)
{
  for(unsigned int i = 0; i < indentation; ++i)
    std::cout << " ";
}

void print_node(const xmlpp::Node* node, unsigned int indentation = 0)
{
  std::cout << std::endl; //Separate nodes by an empty line.

  const xmlpp::ContentNode* nodeContent = dynamic_cast<const xmlpp::ContentNode*>(node);
  const xmlpp::TextNode* nodeText = dynamic_cast<const xmlpp::TextNode*>(node);
  const xmlpp::CommentNode* nodeComment = dynamic_cast<const xmlpp::CommentNode*>(node);

  if(nodeText && nodeText->is_white_space()) //Let's ignore the indenting - you don't always want to do this.
    return;

  const Glib::ustring nodename = node->get_name();

  if(!nodeText && !nodeComment && !nodename.empty()) //Let's not say "name: text".
  {
    print_indentation(indentation);

    const Glib::ustring namespace_prefix = node->get_namespace_prefix();
    if(namespace_prefix.empty())
      std::cout << "Node name = " << nodename << std::endl;
    else
      std::cout << "Node name = " << namespace_prefix << ":" << nodename << std::endl;
  }
  else if(nodeText) //Let's say when it's text. - e.g. let's say what that white space is.
  {
    print_indentation(indentation);
    std::cout << "Text Node" << std::endl;
  }

  //Treat the various node types differently:
  if(nodeText)
  {
    print_indentation(indentation);
    std::cout << "text = \"" << nodeText->get_content() << "\"" << std::endl;
  }
  else if(nodeComment)
  {
    print_indentation(indentation);
    std::cout << "comment = " << nodeComment->get_content() << std::endl;
  }
  else if(nodeContent)
  {
    print_indentation(indentation);
    std::cout << "content = " << nodeContent->get_content() << std::endl;
  }
  else if(const xmlpp::Element* nodeElement = dynamic_cast<const xmlpp::Element*>(node))
  {
    //A normal Element node:

    //line() works only for ElementNodes.
    print_indentation(indentation);
    std::cout << " line = " << node->get_line() << std::endl;

    //Print attributes:
    const xmlpp::Element::AttributeList& attributes = nodeElement->get_attributes();
    for(xmlpp::Element::AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter)
    {
      const xmlpp::Attribute* attribute = *iter;
      print_indentation(indentation);

      const Glib::ustring namespace_prefix = attribute->get_namespace_prefix();
      if(namespace_prefix.empty())
        std::cout << " Attribute " << attribute->get_name() << " = " << attribute->get_value() << std::endl;
      else
        std::cout << " Attribute " << namespace_prefix << ":" << attribute->get_name() << " = " << attribute->get_value() << std::endl;
    }

    const xmlpp::Attribute* attribute = nodeElement->get_attribute("title");
    if(attribute)
    {
      std::cout << "title found: =" << attribute->get_value() << std::endl;
    }
  }

  if(!nodeContent)
  {
    //Recurse through child nodes:
    xmlpp::Node::NodeList list = node->get_children();
    for(xmlpp::Node::NodeList::iterator iter = list.begin(); iter != list.end(); ++iter)
    {
      print_node(*iter, indentation + 2); //recursive
    }
  }
}

int main(int argc, char* argv[])
{
  // Set the global C++ locale to the user-configured locale,
  // so we can use std::cout with UTF-8, via Glib::ustring, without exceptions.
  std::locale::global(std::locale(""));

  std::string filepath;
  if(argc > 1 )
    filepath = argv[1]; //Allow the user to specify a different XML file to parse.
  else
    filepath = "example.xml";

  #ifdef LIBXMLCPP_EXCEPTIONS_ENABLED
  try
  {
  #endif //LIBXMLCPP_EXCEPTIONS_ENABLED
    xmlpp::DomParser parser;
    //parser.set_validate();
    parser.set_substitute_entities(); //We just want the text to be resolved/unescaped automatically.
    parser.parse_file(filepath);
    if(parser)
    {
      //Walk the tree:
      const xmlpp::Node* pNode = parser.get_document()->get_root_node(); //deleted by DomParser.
      print_node(pNode);
    }
  #ifdef LIBXMLCPP_EXCEPTIONS_ENABLED
  }
  catch(const std::exception& ex)
  {
    std::cout << "Exception caught: " << ex.what() << std::endl;
  }
  #endif //LIBXMLCPP_EXCEPTIONS_ENABLED

  return 0;
}