Wednesday, December 27, 2006

XML data "duck typing"

What follows is a so-called best practice technique that I've gravitated toward as a way to make my distributed software systems easier to evolve. It was originally posted in the follow-on discussion of a recent Ted Neward article, "Java, .NET, But Why Together?"

One of the paramount concerns in working with enterprise distributed software systems is tight coupling between nodes. This is one reason I prefer messaging over invoking methods on a remote interface: interface binding leads to very tight coupling, and tight coupling makes it difficult to evolve a distributed system over time. In practice it is seldom feasible to rev tightly coupled nodes in lock step with one another, and versioning interfaces eventually becomes too burdensome if attempted pervasively.

When messaging is applied with certain tactics, it can bring significant relief on this front.

When I was first starting out with JMS distributed system development, I was passing relatively complex XML documents around. These were described in XSD, and tools like JAXB for Java and XsdObjectGen for .NET were used to generate suitable marshaling code. The documents could then be marshaled into .NET or Java object graphs with ease. This works pretty well (though Ted Neward can speak to some significant limitations and caveats), and the nature of XML Schema, as described by XSD, makes it possible to evolve a document schema over time without disrupting existing code that binds intimately to the resulting object graphs. (Even so, one may still need to regenerate the code when the XSD changes, which can require updating deployed code.)

However, I eventually drifted away from this practice and now use an approach that I dub XML data duck typing - a play on the term duck typing that Dave Thomas is well known for using in his Ruby talks.

Basically, I was finding that processing nodes tended to be interested in only a subset of the information in an XML document. This was a natural outcome of the organic expansion and enhancement of the distributed software systems. Indeed, it was very advantageous to encourage this tendency: a single message could have multiple nuances of effect depending on which node processes it.

So I now write components that simply use XPath to glean the information they care about from XML documents. I have a technique and a few helper methods that make this an easy and systematic approach. The upshot is that a given node in the distributed system couldn't care less how radically an XML document evolves over time, so long as the information it cares about can still be gleaned successfully. If it looks like a duck and quacks like a duck then...
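A minimal sketch of the idea in Java, using only the JDK's `javax.xml.xpath` API. The `glean` helper, the order documents, and the billing and shipping nodes are hypothetical illustrations, not the author's actual helpers. Each node reads only the fields it cares about through descendant (`//`) expressions, so both keep working when version 2 of the message wraps the order in an envelope and adds new attributes and elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DuckTypedReader {
    // Hypothetical version 1 of an order message.
    static final String V1 =
        "<order id='42'><total>19.99</total>"
        + "<shipTo><postalCode>80202</postalCode></shipTo></order>";

    // Hypothetical version 2: wrapped in an envelope, with new attributes and elements.
    static final String V2 =
        "<envelope version='2'><order id='42' priority='rush'>"
        + "<lines><line sku='A1'/></lines><total currency='USD'>19.99</total>"
        + "<shipTo><name>Ann</name><postalCode>80202</postalCode></shipTo>"
        + "</order></envelope>";

    // Parse a message body into a DOM; checked exceptions are wrapped to keep call sites short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Hypothetical helper: the string value selected by an XPath expression,
    // or a fallback when nothing matches, instead of failing on a shape change.
    static String glean(Document doc, String xpath, String fallback) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            boolean found = (Boolean) xp.evaluate("boolean(" + xpath + ")", doc, XPathConstants.BOOLEAN);
            return found ? xp.evaluate(xpath, doc).trim() : fallback;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
    }

    // A billing node cares only about the order id and total.
    static String billingNode(Document doc) {
        return "bill order " + glean(doc, "//order/@id", "?")
            + " for " + glean(doc, "//order/total", "0.00");
    }

    // A shipping node reads a different subset of the same message.
    static String shippingNode(Document doc) {
        return "ship order " + glean(doc, "//order/@id", "?")
            + " to " + glean(doc, "//shipTo/postalCode", "unknown");
    }

    public static void main(String[] args) {
        for (String xml : new String[] {V1, V2}) {
            Document doc = parse(xml);
            System.out.println(billingNode(doc));
            System.out.println(shippingNode(doc));
        }
    }
}
```

Both versions of the message yield the same billing and shipping results, because neither node depends on where the order sits in the tree or on fields it ignores. The trade-off of this loose binding is that a misspelled path silently returns the fallback rather than failing at compile time, so the fallbacks deserve some thought.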

Dropping the code generation tools in favor of XPath-based XML data duck typing has been a breath of fresh air, as it brought a new level of loose coupling to the distributed systems.

BTW, Ted Neward covers this entire subject in a talk he calls XML Services, which he is giving at developer conferences. You can catch him speaking on it at both Java and .NET conferences.

In my distributed application architecture, there isn't much OOP-think going on. It is mostly about topology, data flow, and event flow, with a lot of emphasis on filtering, routing, bridging, and transformation. OOP is pretty much relegated to the internal design of distributed components.
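The emphasis on filtering and routing can be illustrated with a content-based router, again sketched with the JDK's `javax.xml.xpath` API: each destination is registered with an XPath filter, and a message goes to every destination whose filter evaluates to true. The `XPathRouter` class, the filters, and the queue names are hypothetical; the names are plain strings standing in for JMS destinations, and a real node would hand them to a JMS producer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathRouter {
    // Route table: destination name -> compiled XPath filter, in registration order.
    private final Map<String, XPathExpression> routes = new LinkedHashMap<>();

    // Register a destination that should receive messages matching the filter.
    XPathRouter route(String destination, String filter) {
        try {
            routes.put(destination, XPathFactory.newInstance().newXPath().compile(filter));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad filter: " + filter, e);
        }
        return this;
    }

    // Return every destination whose filter is true for this message.
    List<String> destinationsFor(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, XPathExpression> r : routes.entrySet()) {
            try {
                if ((Boolean) r.getValue().evaluate(doc, XPathConstants.BOOLEAN)) {
                    out.add(r.getKey());
                }
            } catch (XPathExpressionException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        XPathRouter router = new XPathRouter()
            .route("queue.billing", "//order")
            .route("queue.fraud", "//order/total > 1000")
            .route("queue.rush", "//order[@priority='rush']");
        System.out.println(router.destinationsFor("<order id='1'><total>25</total></order>"));
        System.out.println(router.destinationsFor("<order id='2' priority='rush'><total>5000</total></order>"));
    }
}
```

Because each filter only asks whether the message looks like something it cares about, new routes can be added without touching the producers or the other consumers.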


  • I would like to get in touch with you via email. Can you drop me a line so I can write back to you regarding J2EE vs .NET?

    By Anonymous, at 5:25 AM  

  • i believe xml duck typing, as well as conventional duck typing within a scripting vm, makes a whole lot of sense, and my guess is that the reason is our knowledge about our data. when you live with a person and you hear a sound from the kitchen, you are not likely to rush in and demand legally valid identification. you just assume it's that person, and you would be right most of the time. yet in languages/cultures like java, xml, and xml schema, that's what you must do on every single method definition: check that this is an integer and that is a foo object. your programs would run with less most of the time, and there can be a lot of breakage even with narrowly typechecked stuff. btw, to the best of my knowledge, the very term duck typing originated in a python context, and alex martelli is the suspect (see wikipedia).

    By flow, at 11:28 AM  

  • Hi Roger, I love XML messaging too. I rather hate WSDL, though (partly developed by a previous work colleague of ours). I prefer simpler XML than WSDL, and have developed an Interface Language for defining messages and writing stubs for Java and ActionScript.

    Ping me back (

    By John Allen, at 6:07 AM  
