Wednesday, December 27, 2006

XML data "duck typing"

What follows is a so-called best practice technique that I've gravitated into as a means to better facilitate the evolvability of my distributed software systems. This was originally posted in the follow-on discussion of a recent Ted Neward article Java, .NET, But Why Together?

One of the most paramount concerns in working with enterprise distributed software systems is the issue of tight coupling between nodes. This particular issue is one reason I prefer messaging vs invoking methods on a remote interface. Interface binding leads to very tight coupling. Tight coupling leads to difficulties in attempting to evolve a distributed system over time. In practice it is seldom feasible to rev tightly coupled nodes in lock step with one another. And the practice of versioning interfaces eventually becomes too burdensome if attempted pervasively.

When messaging is applied with certain tactics it can bring about some significant relief on this front.

Now as I was first starting out with my JMS distributed system development, I was conveying relatively complex XML documents around. These were being described in XSD and then tools like JAXB for Java and XsdObjectGen for .NET were used to generate suitable marshaling code. The documents could then be marshaled into .NET or Java object graphs with ease. This works pretty well (though Ted Neward can speak to some significant limitations and caveats) and the nature of XML Schema, as described by XSD, makes it possible to evolve document schema over time while not disrupting existing code that binds intimately to the resulting object graphs. (Yet one may sometimes still need to regenerate the code when the XSD changes which can require updating deployed code.)

However, I eventually drifted away from this practice and now use an approach that I dub XML data duck typing - a play on the term duck typing that Dave Thomas is well known for using in his Ruby talks.

Basically I was finding that processing nodes tended to have interest in subset information as gleaned from XML documents. This was just a natural evolutionary outcome of organic expansion/enhancement of the distributed software systems. Indeed it was very advantageous to encourage this tendency (a single message could have multiple nuances of effect based on what node processes it).

So I now write components such that they just use XPath to glean the information that they care about from XML documents. I have a technique and a few helper methods that make this a very easy and systematic approach. The upshot of this, though, is that a given node in the distributed system could care less how radically an XML document is evolving over time so long as the info it cares about can continue to be successfully gleaned. If it looks like a duck and quacks like a duck then...

Dropping the code generation tools and just going with XPath-based XML data duck typing has been a breath of fresh air as it breathed a new level of loose coupling into the distributed systems.

BTW, Ted Neward is all over this entire subject matter in a talk he is giving at developer conferences that he dubs XML Services. You can catch him speaking on this at both Java and .NET conferences.

In my distributed application architecture, there isn't really much OOP-think that goes on. Is mostly topology, data flow, event flow, with a lot of emphasis on filtering, routing, bridging, and transformation. OOP is pretty much just relegated to the internal design of distributed components.