libxml2-java

libxml2-java is a Java language binding for the well-known libxml2 library.

Documentation

Javadoc is available online.

Build from Source

You need the essential build tools: Java Development Kit 6 or higher, Gradle, and GNU Make. Most importantly, the libxml2 development package must be installed on your system.

If you need to specify the JDK directory manually instead of relying on the system default location, use the --with-jdk option:

./configure --with-jdk=/opt/local/java

Otherwise, the configure script tries to detect where the JDK is installed on your system.

Mac OS X

$ sudo port install libxml2
$ ./configure 
$ gradle build 

Ubuntu

$ sudo apt-get install libxml2-dev
$ ./configure 
$ gradle build

CentOS

$ sudo yum install libxml2-devel
$ ./configure 
$ gradle build

You may run make yourself as many times as you like, but this step is not required: when you run the Gradle build, a task named processNativeResources executes make for you.

TCMalloc

Because libxml2 frequently allocates small chunks of memory, libxml2-java can be built against Google's TCMalloc for a performance boost.

./configure --with-tcmalloc 

This requires the google/tcmalloc.h header and the tcmalloc library (-ltcmalloc) on your system.

Memory consideration

By default, libxml2-java frees the underlying native resources in Object.finalize(), so you do not have to deal with memory management yourself. However, if you need to release resources explicitly, Document.dispose(), XPathContext.dispose(), and the dispose() method of every node type that implements Disposable will do the job. Note that Document.dispose() frees all child nodes as well.

If, like me, you trust neither the timing of Object.finalize() nor your own discipline in calling dispose() manually, libxml2-java lets you defer de-allocation by calling autoDispose(). This retains the Disposable item in a backing list until you call LibXml.disposeAutoRetainedItems(). If that is still too much of a hassle, call LibXml.setAutoRetainEveryDisposable(), which retains every disposable object automatically until you call LibXml.disposeAutoRetainedItems(). The backing list is not thread-safe and is kept in thread-local storage, so you must call LibXml.disposeAutoRetainedItems() on the same thread that allocated (retained) the items.

Document doc = LibXml.parseString(xml).autoDispose();
// do your job freely.
LibXml.disposeAutoRetainedItems();

or

LibXml.setAutoRetainEveryDisposable();
// use document, xpath without calling dispose() or autoDispose()
LibXml.disposeAutoRetainedItems();
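
The same-thread requirement follows from how a thread-local list behaves. The sketch below is a conceptual illustration only, not libxml2-java's implementation: RetainSketch, its Disposable interface, and both static methods are hypothetical stand-ins written in plain Java 8 so that the behaviour can be tried without the native library.

```java
import java.util.ArrayList;
import java.util.List;

class RetainSketch {
    /** Hypothetical stand-in for an object that owns native memory. */
    interface Disposable {
        void dispose();
    }

    // One list per thread: no locking is needed, but each thread
    // can only see (and free) the items it retained itself.
    private static final ThreadLocal<List<Disposable>> RETAINED =
            ThreadLocal.withInitial(ArrayList::new);

    /** Adds an item to the calling thread's list and returns it. */
    static Disposable retain(Disposable item) {
        RETAINED.get().add(item);
        return item;
    }

    /** Disposes every item retained by the calling thread; returns how many. */
    static int disposeRetained() {
        List<Disposable> items = RETAINED.get();
        int count = items.size();
        for (Disposable item : items) {
            item.dispose();
        }
        items.clear();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        retain(() -> System.out.println("freed by main"));

        Thread worker = new Thread(() -> {
            // The worker's list is empty: the item retained by main is invisible here.
            System.out.println("worker disposed " + disposeRetained());
        });
        worker.start();
        worker.join();

        System.out.println("main disposed " + disposeRetained());
    }
}
```

Run as is, the worker thread reports 0 disposed items and the main thread reports 1, which is why items retained on one thread would leak if the dispose call were made from another.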

Calling LibXml.printTcmallocStat() prints the current native memory allocation status to standard output so that you can inspect it. If libxml2-java was configured without TCMalloc, LibXml.printTcmallocStat() prints nothing.

Examples

  • Print all child elements under the root node.
String xml = "<?xml version=\"1.0\"?><root><item /><item /><item /></root>";

Document doc = LibXml.parseString(xml);
Node rootNode = doc.getRootElement();
for(Node node : rootNode) {
  out.printf("%s: type=%s%n", node.getName(), node.getType());
}
  • Use libxml2-java as the default DocumentBuilder by setting the javax.xml.parsers.DocumentBuilderFactory system property to org.xmlsoft.jaxp.DocumentBuilderFactoryImpl. You can then code against the standard JAXP API.
DocumentBuilder builder = LibXml.createDocumentBuilderFactory().newDocumentBuilder();
// <?xml version="1.0"?><html><head /><body><p>Good morning</p><p>How are you?</p></body></html>
Document doc = builder.parse(new File("sample.xml"));
Assert.assertEquals("html", doc.getDocumentElement().getNodeName());
  • XPath
String xml = "<?xml version=\"1.0\"?>";
xml += "<root>";
xml += "<item>Apple</item>";
xml += "<item tag=\"1\">Bear</item>";
xml += "<item>Cider</item>";
xml += "</root>";

Document doc = LibXml.parseString(xml);
XPathContext ctx = doc.createXPathContext();
XPathObject result = ctx.evaluate("//item[@tag=\"1\"]");

out.println(result.getFirstNode().getChildText()); // Bear

Test

Compatibility with JAXP

SAX

The SAXParserFactory implementation has been tested with:

  • Apache Ant 1.9
      • Building simple projects
      • Building with Ivy
      • Building Android projects
  • Apache Tomcat 7
      • Launched with web.xml, server.xml, and context.xml; my web applications worked as usual

These tests were run by setting the javax.xml.parsers.SAXParserFactory system property to org.xmlsoft.jaxp.SAXParserFactoryImpl and adding libxml2-java.jar to the classpath.
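
The system property can also be set programmatically before the first factory lookup. The following is a minimal sketch using only standard JAXP calls; the class SaxFactorySketch and its methods are hypothetical names for illustration. It selects the libxml2-java factory only when that class can be loaded, so the same code also runs on a plain JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxFactorySketch {
    // Class name taken from this README; everything else is standard JAXP.
    static final String LIBXML2_FACTORY = "org.xmlsoft.jaxp.SAXParserFactoryImpl";

    /** Selects the libxml2-java factory only when its class can be loaded. */
    static boolean selectLibxml2IfPresent() {
        try {
            Class.forName(LIBXML2_FACTORY);
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // jar or native library missing: keep the JDK default
        }
        System.setProperty("javax.xml.parsers.SAXParserFactory", LIBXML2_FACTORY);
        return true;
    }

    /** Counts start-element events using whichever factory JAXP resolves. */
    static int countElements(String xml) {
        final int[] count = {0};
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attrs) {
                            count[0]++;
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("libxml2-java selected: " + selectLibxml2IfPresent());
        System.out.println(countElements("<root><item/><item/></root>"));
    }
}
```

With the JDK's bundled parser the second line printed is 3 (root plus two item elements).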

DOM

The DocumentBuilderFactory implementation has been tested with:

  • Spring Framework 3.2
      • A simple application using Spring Data JPA

These tests were run by setting the javax.xml.parsers.DocumentBuilderFactory system property to org.xmlsoft.jaxp.DocumentBuilderFactoryImpl and adding libxml2-java.jar to the classpath.
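
To confirm that the property actually took effect, it can help to print which factory class JAXP resolved. This is a small sketch with standard JAXP calls only; DomFactoryCheck is a hypothetical helper name, not part of libxml2-java.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomFactoryCheck {
    /** Returns the class name of the factory JAXP resolved for this JVM. */
    static String activeFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Parses a string with the resolved factory and returns the root element name. */
    static String rootName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(activeFactory());
        System.out.println(rootName("<html><head/><body/></html>"));
    }
}
```

Assuming the jar is in the current directory on a Unix-like system, it could be run as `java -Djavax.xml.parsers.DocumentBuilderFactory=org.xmlsoft.jaxp.DocumentBuilderFactoryImpl -cp libxml2-java.jar:. DomFactoryCheck`; if the first line printed is not the org.xmlsoft class, the JDK default is still in use.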

Unit tests

  • BasicTest: Test cases for building a DOM from XML and navigating the DOM tree.
  • JaxpTest: Test cases with DocumentBuilderFactory
  • SaxTest: Test cases with bare and JSR SAX
  • XPathTest: Test cases for XPath APIs.
  • DomManipulationTest: Test cases for creating and updating the DOM.

Performance

libxml2-java is not as fast as I expected. The following is a brief comparison with Apache Xerces, which is bundled with the JDK, using a 100 KB XML document. You can reproduce the comparison by running org.xmlsoft.test.RssTest.
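
For a quick measurement outside the test suite, a rough timing harness along the following lines may be enough. It is only a sketch and does not reproduce RssTest: ParseTimer, the synthetic document, and the round counts are all assumptions made for illustration. It times whichever DocumentBuilderFactory JAXP resolves, so the same code can be run with and without the libxml2-java system property.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

class ParseTimer {
    /** Builds a synthetic, ASCII-only document of at least minBytes bytes. */
    static byte[] sampleXml(int minBytes) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?><rss>");
        int i = 0;
        while (sb.length() < minBytes) {
            sb.append("<item><title>Entry ").append(i++).append("</title></item>");
        }
        return sb.append("</rss>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the average wall-clock nanoseconds per parse over the given rounds. */
    static long averageParseNanos(byte[] xml, int rounds) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            long start = System.nanoTime();
            for (int r = 0; r < rounds; r++) {
                builder.parse(new ByteArrayInputStream(xml));
            }
            return (System.nanoTime() - start) / rounds;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = sampleXml(100 * 1024);   // roughly the 100 KB used above
        averageParseNanos(xml, 50);           // warm-up rounds, result discarded
        System.out.println(averageParseNanos(xml, 200) + " ns per parse");
    }
}
```

This measures only the parse call. Because node data may be materialised lazily, a fair comparison should also walk the resulting tree and read names and text.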

Parsing as DOM

libxml2-java is a simple wrapper around native libxml2. The Document object returned by LibXml.parseFile and its children are lazily initialised on demand; for example, Node.getName() directly calls NewStringUTF(env, xmlNodePtr->name). For that reason, returning the Document object is 2 times faster than with Apache Xerces, but once you start calling Node.getName() or Node.getChildText(), libxml2-java runs at the same speed as, or even slower than, the Apache Xerces implementation.
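
The cost pattern can be pictured with a pure-Java stand-in. This is a conceptual sketch, not the library's code: LazyNameSketch and LazyNode are hypothetical, and the sketch assumes, as the description above suggests, that no String is cached between calls.

```java
import java.nio.charset.StandardCharsets;

class LazyNameSketch {
    /** Hypothetical node that keeps its name as raw UTF-8 bytes, as native memory would. */
    static final class LazyNode {
        private final byte[] rawName;
        int decodeCount = 0; // how many times a Java String has been built

        LazyNode(String name) {
            this.rawName = name.getBytes(StandardCharsets.UTF_8);
        }

        /** Builds a fresh String on every call; nothing is cached. */
        String getName() {
            decodeCount++;
            return new String(rawName, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        LazyNode node = new LazyNode("item"); // "parsing" is cheap: no String is kept
        for (int i = 0; i < 3; i++) {
            node.getName();                   // every access pays for a conversion
        }
        System.out.println(node.decodeCount); // 3
    }
}
```

Under that assumption, code that reads the same name or text repeatedly may benefit from storing the result in a local variable rather than calling the getter again.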

Parsing as SAX

Calling Java methods from native code is inherently slow. Even though libxml2-java caches all core classes, jmethodIDs, and jfieldIDs, and uses CallNonvirtualXXXMethod rather than CallXXXMethod, it is almost 2 times slower than Apache Xerces. In addition, every callback involves many byte/char conversions. As a result, the SAX parsing performance of libxml2-java cannot beat a pure Java implementation. I tried a few tricks to overcome this weakness, but they did not help much.

Notes

  • Make sure the libxml2 library is configured with the --with-threads option.

License

libxml2-java is licensed under the MIT License.
