libxml++ - An XML Parser for C++

Murray Cumming


This is an introduction to libxml's C++ binding, with simple examples.

Table of Contents

UTF-8 and Glib::ustring
Compilation and Linking
DOM Parser
SAX Parser
TextReader Parser


libxml++ is a C++ API for the popular libxml XML parser, written in C. libxml is famous for its high performance and compliance to standard specifications, but its C API is quite difficult even for common tasks.

libxml++ presents a simple C++-like API that can achieve common tasks with less code. Unlike some other C++ parsers, it does not try to avoid the advantages of standard C++ features such as namespaces, STL containers or runtime type identification, and it does not try to conform to standard API specifications meant for Java. Therefore libxml++ requires a fairly modern C++ compiler such as g++ 3.

But libxml++ was created mainly to fill the need for an API-stable and ABI-stable C++ XML parser which could be used as a shared library dependency by C++ applications that are distributed widely in binary form. That means that installed applications will not break when new versions of libxml++ are installed on a user's computer. Gradual improvement of the libxml++ API is still possible via non-breaking API additions, and new independent versions of the ABI that can be installed in parallel with older versions. These are the general techniques and principles followed by the GNOME project, of which libxml++ is a part.


libxml++ is packaged by major Linux and *BSD distributions and can be installed from source on Linux and Windows, using any modern compiler, such as g++, SUN Forte, or MSVC++.

For instance, to install libxml++ and its documentation on debian, use apt-get or synaptic like so:

    # apt-get install libxml++2.6-dev libxml++2.6-doc

To check that you have the libxml++ development packages installed, and that your environment is working properly, try pkg-config libxml++-2.6 --modversion.

The source code may be downloaded from . libxml++ is licensed under the LGPL, which allows its use via dynamic linking in both open source and closed-source software. The underlying libxml library uses the even more generous MIT licence.

UTF-8 and Glib::ustring

The libxml++ API takes, and gives, strings in the UTF-8 Unicode encoding, which can support all known languages and locales. This choice was made because, of the encodings that have this capability, UTF-8 is the most commonly accepted choice. UTF-8 is a multi-byte encoding, meaning that some characters use more than 1 byte. But for compatibility, old-fashioned 7-bit ASCII strings are unchanged when encoded as UTF-8, and UTF-8 strings do not contain null bytes which would cause old code to misjudge the number of bytes. For these reasons, you can store a UTF-8 string in a std::string object. However, the std::string API will operate on that string in terms of bytes, instead of characters.

Because Standard C++ has no string class that can fully handle UTF-8, libxml++ uses the Glib::ustring class from the glibmm library. Glib::ustring has almost exactly the same API as std::string, but methods such as length() and operator[] deal with whole UTF-8 characters rather than raw bytes.

There are implicit conversions between std::string and Glib::ustring, so you can use std::string wherever you see a Glib::ustring in the API, if you really don't care about any locale other than English. However, that is unlikely in today's connected world.

glibmm also provides useful API to convert between encodings and locales.

Compilation and Linking

To use libxml++ in your application, you must tell the compiler where to find the include headers and where to find the libxml++ library. libxml++ provides a pkg-config .pc file to make this easy. For instance, the following command will provide the necessary compiler options: pkg-config libxml++-2.6 --cflags --libs

When using autoconf and automake, this is even easier with the PKG_CHECK_MODULES macro in your file. For instance:

    PKG_CHECK_MODULES(SOMEAPP, libxml++-2.6 >= 2.10.0)