Simplifying Mapping XML Docs

What’s New in Apache Commons Digester 3.0?

Jessica Thornsby

Apache Commons Digester is one of the oldest Apache Commons components.

Version 3.0 of the Apache Commons Digester project was recently released, featuring a major rewrite of the original implementation. In this interview, we speak to ASF PMC member Simone Tripodi about what’s new in this latest release.

JAXenter: Can you introduce us to the Apache Commons Digester project?

Simone Tripodi: Apache Commons Digester is one of the oldest Apache Commons components. It aims to simplify the task of mapping XML documents to Java objects by configuring actions (Rules, in Digester terminology) that are triggered when certain parsing events occur or, more precisely, when XML element patterns match while the document stream is being ingested. That is the main difference between Apache Commons Digester and other XML mappers: it can not only create objects and set their properties, but also invoke arbitrary methods. Its use cases therefore range from simple XML <-> POJO mapping to, as some users have reported on the mailing list, creating Apache Lucene Documents and indexing them, ingesting large RDF documents and storing the data in triple stores, or processing the result of an XML pipeline (such as Apache Cocoon 3) and storing the data in an RDBMS. The most interesting thing is that all of these operations can be performed without mapping to temporary POJOs!
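To make the pattern/rule idea concrete, here is a minimal sketch that uses only the JDK's SAX parser. It is not the Digester API: the class `PatternRuleSketch`, its `forPattern` method and the `feed/entry/title` pattern are hypothetical names chosen for illustration. The sketch fires an arbitrary callback whenever an element path matches a registered pattern, so data can be collected straight from the stream without temporary POJOs.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical mini-engine (not the Digester API): when a leaf element whose
// path matches a registered pattern ends, its text is handed to a callback.
class PatternRuleSketch extends DefaultHandler {
    private final Map<String, Consumer<String>> rules = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    // Bind an action to an element path such as "feed/entry/title".
    void forPattern(String pattern, Consumer<String> action) {
        rules.put(pattern, action);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.addLast(qName);   // track the current position in the document
        text.setLength(0);     // start collecting this element's text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Look up the rule bound to the current path, e.g. "feed/entry/title".
        Consumer<String> action = rules.get(String.join("/", path));
        if (action != null) {
            action.accept(text.toString().trim());
        }
        path.removeLast();
        text.setLength(0);
    }

    // Streams the XML once and collects titles directly into a list.
    static List<String> collectTitles(String xml) {
        List<String> titles = new ArrayList<>();
        PatternRuleSketch handler = new PatternRuleSketch();
        handler.forPattern("feed/entry/title", titles::add);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return titles;
    }
}
```

Calling `PatternRuleSketch.collectTitles(...)` on a small feed document returns the title texts in document order. The callback could just as well index a document or write to a database, which is the kind of use case described above.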

The Apache Commons Digester component was originally developed inside the Apache Struts project to make ingesting the action descriptors easier; the codebase was then extracted and contributed to what was, at that time, Jakarta Commons. Over its development lifecycle, relevant and influential Apache committers such as Geir Magnusson Jr. (one of the Apache Board Directors), Jason Van Zyl (the Apache Maven creator), Jeanfrancois Arcand (the Asynchronous HTTP Client creator), Rahul Akolkar (W3C’s State Chart XML co-editor, Apache Member), Tim O’Brien (Apache Maven) and Simon Kitching (Apache Commons, Apache MyFaces) have actively contributed to Apache Commons Digester.

JAXenter: The original Apache Commons Digester implementation has been rewritten for the recent 3.0 release. What are the benefits of this rewrite?

Simone: To understand the benefits of Apache Commons Digester 3.0, it is necessary to note the difference between obtaining a Digester instance and then configuring it, and starting from one or more configurations and obtaining a Digester instance from them. Although the two approaches sound similar, the key point is that a Digester instance is not thread-safe, which means that in a multi-threaded application users often have to re-instantiate the Digester and reconfigure it. There’s nothing wrong with that approach, but the configurations are not reusable! The RuleSet interface tries, to some extent, to provide the reuse that configurations lack. However, it doesn’t really represent a configuration:

* it just adds rules to a given Digester instance;
* binding more than one Rule to the same pattern requires the pattern to be repeated once for every rule, which violates the DRY principle;
* Rule semantics are not intuitive, since creating a Rule is tightly coupled to method and constructor arguments.

In the new Digester, the use of RuleSet has been ‘suppressed’ in favor of RulesModule, which allows Rule configurations to be expressed via fluent APIs, making rule semantics simpler to understand. The key feature of Digester 3 is that rule bindings are expressed using an embedded DSL, a collection of APIs that reads more like English than like a programming language!
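The following sketch illustrates why a fluent binder helps with the DRY concern: the pattern is stated once and the rules are chained onto it. It is a self-contained illustration, not the Digester 3 API; `FluentBindingSketch`, `PatternBuilder` and the method signatures are hypothetical and should not be read as the library's own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fluent binder (not the Digester API): it only records, per
// pattern, a readable description of the rules bound to it.
class FluentBindingSketch {
    private final Map<String, List<String>> bindings = new LinkedHashMap<>();

    // The pattern is named once; every chained call below applies to it.
    PatternBuilder forPattern(String pattern) {
        return new PatternBuilder(bindings.computeIfAbsent(pattern, p -> new ArrayList<>()));
    }

    Map<String, List<String>> bindings() {
        return bindings;
    }

    static final class PatternBuilder {
        private final List<String> rules;

        PatternBuilder(List<String> rules) {
            this.rules = rules;
        }

        PatternBuilder createObject(Class<?> type) {
            rules.add("create " + type.getSimpleName());
            return this;
        }

        PatternBuilder setProperties() {
            rules.add("set properties");
            return this;
        }

        PatternBuilder setNext(String parentMethod) {
            rules.add("call parent." + parentMethod);
            return this;
        }
    }

    // Three rules bound to one pattern without repeating the pattern string.
    static Map<String, List<String>> example() {
        FluentBindingSketch binder = new FluentBindingSketch();
        binder.forPattern("feed/entry")
              .createObject(StringBuilder.class)
              .setProperties()
              .setNext("addEntry");
        return binder.bindings();
    }
}
```

With an add-rule-per-call style, the `"feed/entry"` string would appear once for each of the three rules. Here it appears once, and the chain reads roughly as a sentence.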

That said, conservative developers are not forced to migrate their applications; they can still use the older APIs!

Another interesting feature is the improved reporting of Rule binding errors. With older Apache Commons Digester releases, users only discovered Rule binding errors at runtime. The new Digester instead tries, as far as possible, to check pattern/rule bindings during bootstrap, so that exceptions such as classes that cannot be loaded or found in the current ClassLoader are not thrown during parsing.

A detailed list of wrong bindings is reported when the loader attempts to create a new Digester instance, not when the instance is run. This also makes debugging easier: developers get a report of most errors when obtaining a new Digester instance, and don’t need to run a test repeatedly to fix potential bugs one at a time.
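The fail-fast behaviour described here can be sketched in plain Java. This is not Digester's implementation; `BootstrapValidationSketch` and its methods are hypothetical. The sketch checks every binding before any parsing happens and reports all problems together instead of stopping at the first one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader sketch (not the Digester API): bindings are validated
// when a configuration is requested, not while a document is being parsed.
class BootstrapValidationSketch {
    // pattern -> fully qualified name of the class to instantiate on a match
    private final Map<String, String> createRules = new LinkedHashMap<>();

    BootstrapValidationSketch bind(String pattern, String className) {
        createRules.put(pattern, className);
        return this;
    }

    // Checks every binding and returns all problems found, not just the first.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        ClassLoader loader = BootstrapValidationSketch.class.getClassLoader();
        for (Map.Entry<String, String> rule : createRules.entrySet()) {
            try {
                // Resolve the class without initializing it.
                Class.forName(rule.getValue(), false, loader);
            } catch (ClassNotFoundException e) {
                errors.add(rule.getKey() + ": cannot load " + rule.getValue());
            }
        }
        return errors;
    }

    // Fails at bootstrap with the complete error report if anything is wrong.
    Map<String, String> newParserConfig() {
        List<String> errors = validate();
        if (!errors.isEmpty()) {
            throw new IllegalStateException("Invalid bindings:\n" + String.join("\n", errors));
        }
        return new LinkedHashMap<>(createRules);
    }
}
```

In this sketch, a configuration with two unloadable classes produces one exception listing both, so they can be fixed in a single pass.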

JAXenter: Can you tell us about the new universal loader?

Simone: Older Apache Commons Digester versions were able to load Digester Rules via extensions, such as the XML rules descriptor and, more recently, Java 5 annotations that reflect Digester rules. The main disadvantage of this approach was that every extension required its own DigesterLoader, so mixing configurations (some Rules from XML, some from annotations and some bound manually) was not straightforward. In the new Digester version, extensions have been redesigned to inherit from the core Rules binder, so every extension is now a RulesModule. The work of instantiating a new Digester is delegated entirely to a single DigesterLoader, which analyzes all the RulesModule instances and is then ready to create new Digester instances with pre-filled rules!
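As a rough illustration of this design, the sketch below shows a single loader that gathers bindings from several modules once and then hands out a fresh parser for each caller. The types `ModuleSketch`, `LoaderSketch` and `ParserSketch` are hypothetical stand-ins, not the Digester classes, and rules are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a module: any source of bindings (XML descriptor,
// annotations, manual code) only has to implement this one method.
interface ModuleSketch {
    void configure(List<String> bindings);
}

// Hypothetical stand-in for a parser instance; like a real parser it would
// hold mutable parsing state, so each thread should get its own instance.
final class ParserSketch {
    final List<String> rules;

    ParserSketch(List<String> rules) {
        this.rules = rules;
    }
}

// Hypothetical stand-in for the single loader that serves every module type.
final class LoaderSketch {
    private final List<String> bindings;

    private LoaderSketch(List<String> bindings) {
        this.bindings = bindings;
    }

    // Analyze all modules once; the merged configuration is immutable.
    static LoaderSketch newLoader(ModuleSketch... modules) {
        List<String> all = new ArrayList<>();
        for (ModuleSketch module : modules) {
            module.configure(all);
        }
        return new LoaderSketch(Collections.unmodifiableList(all));
    }

    // Cheap to call repeatedly: every call returns a new, pre-configured parser.
    ParserSketch newParser() {
        return new ParserSketch(bindings);
    }
}
```

Because the merged configuration is read-only, the loader in this sketch can be shared across threads while each thread asks it for its own parser, which avoids reconfiguring from scratch every time.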

JAXenter: How has the reusability of Digester configurations been improved in this release?

Simone: Users can now create generic Digester configurations simply by implementing RulesModule, then packaging and redistributing them in separate packages. I’d love it if users started to contribute by sharing their reusable modules! For instance, I can imagine generic use cases such as an Atom parser, where the Atom APIs and Digester rules can be reused in more than one context. Users would be free to use their preferred extension approach: nothing prevents them from providing a set of POJOs annotated with Digester annotations, plus a ready-to-use RulesModule that just needs to be passed to the DigesterLoader. That would save a lot of development time!

JAXenter: What are the next steps for the project?

Simone: As in all Apache projects, in Apache Commons every single component path needs to be reviewed by the Project Management Committee, which decides via unanimous vote which features will, and will not, be included in new releases. Personally, I would like to start experimenting with Digester adapters for formats other than XML, such as JSON and YAML, where the concept of pattern/rule remains valid. I’ll make a new proposal as soon as I’m ready to submit at least a working proof of concept, but I can’t guarantee this will be a feature of a new release because, as I said, Apache projects are characterized by not being a “one man band” but rather being truly community driven. One of our mantras is “community over the code!”

And the new Apache Commons Digester 3 is no exception: many special people contributed to making that release a reality:

* Rahul Akolkar, Luc Maisonobe and Phil Steitz, for mentoring;
* James Carman, who provided the initial idea of building a Digester with fluent APIs;
* Matt Benson, for having influenced the DSL;
* Joerg Schaible, for spending time reviewing with his “compiler zoo”;
* Daniele Testa, who provided the new Digester3 logo.

Thanks, JAXenter, for your interest in the new Apache Commons Digester release, and thanks, Jessica, for the great opportunity to speak about it!

Those interested in knowing more about Apache Commons Digester can visit the documentation site. For any questions, feedback, or contributions, don’t forget to subscribe to the users and developers mailing lists! I hope to hear from you soon!
