Richard Deadman
It's the latest buzzword in the internet application arena: XML, the
open-standards child of SGML that promises to provide platform- and
language-neutral data encapsulation and to separate application logic from
application data. It's hot. It's powerful. Everyone loves it.
But isn't this exactly what CORBA already provides? In this article we will explore why XML is so hot, show how XML can be used as a distributed computing protocol, and look at its advantages, disadvantages and appropriate uses. We will also speculate on some of the non-technical forces driving the XML phenomenon.
XML is being touted as the perfect partner for Java. Java supports the development of web-aware, platform-neutral applications while XML is a platform-neutral document description meta-language. And documents are a very old form of persistent information dating back to papyrus texts in the Nile delta. Web-centric, vendor-neutral behaviour and data. Like a web start-up, both technologies have great promise, and maybe greater hype. Small wonder that most of the XML tools that are becoming available are written in Java.
What are the advantages? Well, let's take a particular example. Suppose that corporate timesheets are presently created using an HTML editor. This is great in that the creator is free to pick the most appropriate tool on the most appropriate platform for her. As well, anyone can view the timesheets with ubiquitous technology. But how does the corporation collect the information? Humans infer the day/project information from its placement within tables, but this kind of inference is very hard for a computer application to do accurately. However powerful they may be at crunching numbers, computers are not really very smart.
XML offers the computer help. We can define XML tags that state that we are now in a day with a day number attribute, and have embedded tags in that day for each project and number of hours spent on that project. As well, we can now define multiple style-sheets for the XML document that allow payroll and managers to view the same document with different views. The advantages are two-fold: explicit structure for data manipulation and decoupling of the presentation from the data to allow for different views.
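To make the timesheet idea concrete, here is a minimal sketch in Python. The element and attribute names (`timesheet`, `day`, `number`, `project`, `name`, `hours`) are assumptions invented for illustration, not taken from any published DTD. Two small functions stand in for the payroll and manager views of the same document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical timesheet markup: tag and attribute names are
# illustrative assumptions, not part of any real DTD.
TIMESHEET = """\
<timesheet employee="jsmith">
  <day number="1">
    <project name="intranet" hours="5"/>
    <project name="billing" hours="3"/>
  </day>
  <day number="2">
    <project name="intranet" hours="8"/>
  </day>
</timesheet>
"""


def hours_per_project(xml_text):
    """Manager view: total hours booked against each project."""
    totals = defaultdict(float)
    root = ET.fromstring(xml_text)
    for day in root.findall("day"):
        for project in day.findall("project"):
            # Attribute values arrive as strings, so the application
            # converts the hours to a number itself.
            totals[project.get("name")] += float(project.get("hours"))
    return dict(totals)


def hours_per_day(xml_text):
    """Payroll view: total hours worked on each day number."""
    root = ET.fromstring(xml_text)
    return {
        int(day.get("number")): sum(
            float(p.get("hours")) for p in day.findall("project")
        )
        for day in root.findall("day")
    }
```

Because the structure is explicit, neither function has to guess at meaning from table layout, and both read the same unmodified document.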
Wonderful, you say. Better structured documents than HTML can give us.
Of course, despite many predictions, XML will not cause HTML to disappear. There are just too many instances where the data being captured for presentation does not have enough "structural value" to justify the added cost of developing the DTD and XSL. Think of your average home page on the world of gum collecting. There are many cases where the information value or need for management is not sufficient to justify the cost of creating a structure and presentation layer. For these cases, HTML, with its predefined generic structure and presentation, is the logical choice. Indeed, HTML may become one of a list of predefined XML instances, as it is modified slightly to fit the XML syntax.
Microsoft has taken this approach in the definition of CDF, an XML-based channel definition format that it released in March of 1997, when PointCast and channels were all the rage. CDF defines channels and software distribution meta-information, such as logos, abstracts and modification dates. The last modification date is, for instance, an attribute string of format "yyyymmmddhhmmss" that indicates in Greenwich Mean Time when the channel was last changed. It is repeated in several elements, since XML does not support element inheritance.
It would be a simple matter to define the CDF as a CORBA IDL that supports a reusable date type. In fact, DTD to IDL mapping is much simpler than IDL to DTD mapping, since DTDs only support strings.
Here's the briefest backgrounder on CORBA you're ever likely to read. Basically, CORBA is a distributed object framework. It creates local proxy objects for remote objects that your code wishes to talk to. When you send the local proxy object a message, including parameters, the proxy marshals the data to the remote system, calls the same method on the remote object, and returns the result back to you. In-out parameters, remote exceptions, one-way messages, contexts, security frameworks, naming, trader, transaction and other services also exist, of course.
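The proxy mechanism can be sketched in a few lines of Python. This is only an in-process illustration under loud assumptions: JSON stands in for the ORB's real wire encoding, a plain function call stands in for the network, and the class names (`TimesheetService`, `Skeleton`, `Proxy`) are hypothetical. A real ORB generates the stub and skeleton from IDL rather than using dynamic attribute lookup.

```python
import json


class TimesheetService:
    """The 'remote' object. In a real system it lives in another process."""

    def total_hours(self, hours):
        return sum(hours)


class Skeleton:
    """Server side: unmarshals a request and invokes the real object."""

    def __init__(self, target):
        self._target = target

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self._target, request["method"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class Proxy:
    """Client side: looks like the remote object but only marshals calls."""

    def __init__(self, transport):
        # transport is any callable taking request bytes and returning
        # reply bytes; here it is a direct call, not a socket.
        self._transport = transport

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


skeleton = Skeleton(TimesheetService())
service = Proxy(skeleton.handle)
total = service.total_hours([5, 3, 8])  # looks like a local call
```

The calling code never sees the marshaling step, which is the distribution transparency that the rest of the article weighs against XML.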
The magic of CORBA is that your client can be a Java applet running on a Linux PC sending messages to objects in a Smalltalk application on a Solaris server. And the parameters can be basic types (int, String, float, etc.) or any object whose class has an IDL structure mapping. What is IDL? Just as XML allows for a DTD to share the structure of the tag sets, the Interface Definition Language is the language-neutral shared definition of the object types (and the remote messages they support). It is what it says it is: a language-neutral interface definition language. Tools are supplied by CORBA vendors (and JavaSoft) that create stubs and skeletons from IDL specifications.
CORBA is not perfect for all applications. Although free ORBs are available, getting the services or support required often involves licensing both developers and deployed applications, with all the licence-key management overhead and costs that entails. And unlike Java's RMI, automatic behaviour marshaling is not supported. That is, if the IDL says an object of type User is being sent across the wire, and you send an instance of Student (a subclass of User) into the remote method, the object will morph into a User at the remote end, probably with incorrect behaviour for some methods. Not that XML handles this any better. What XML does handle better, however, is:
Capability | Description | XML | CORBA | RMI | Winner (excluding RMI) |
---|---|---|---|---|---|
Platform independent | Can the client and server be on almost any OS and hardware? | yes | yes | yes | tie |
Language independent | Is the application language hidden from the remote system? | yes | yes | no (only supports Java) | tie |
Separate content from presentation | Is the content structure separated from the content and the content separate from how it is displayed? | yes | yes | yes | tie |
Presentation language | How is data presented to the user? | XSL, or hand-crafted code | hand-crafted code | hand-crafted code | XML |
Handle human readable text | Can the document be viewed and edited with low-tech tools? | yes | no | no | XML |
Handle non-string data types | Is there native support for integers, floating point numbers, booleans, etc.? | no (each DTD may define its own mapping to a string-based representation, but there is no validation or parsing support or reuse. Note: work is currently underway to add this to XML through the Document Content Definition proposal) | string, short, int, float, double, boolean, byte, char, enums, structs (limited OO support), union | all serializable objects | CORBA |
Huge data set management | Can the data be larger than the allowable application memory? | yes | no (although with CORBA, data is usually only retrieved on an as-needed basis as part of a remote conversation) | no (see CORBA note) | XML |
Schema versioning | Can an application handle two data streams that were created using different versions of the schema definition? | yes | no | no | XML |
Distribution support | | none (simple messaging can be achieved by parsing transmitted XML documents) | full | partial (naming, remote method invocation, mobile code security); EJB, JTS, JNDI, etc. offer most CORBA services | CORBA |
Distribution transparency | How aware of the distribution technology must the application be? | Not transparent. The application must manually send XML documents into any CORBA call or remote socket. An XML parser must be called for parsing and building the XML document. | Fairly transparent. Once the remote connection is set up, objects may be transmitted by passing them as parameters to remote message calls. Remote exceptions must be handled. | Like CORBA, but in-out data passing is not supported. | CORBA |
Object-oriented | Can the data structure have behaviour associated with it? | no | Partial. The IDL to Java generator creates mobile classes without behaviour. | yes | CORBA |
Inheritance and re-use | Is inheritance supported? Can I extend one node with more-specific additional entities? | no | no (remote objects can inherit, but not marshaled objects) | yes | tie |
Handle cyclic references | If node A refers to node B, can node B refer to node A? | no | no | yes | tie |
Handle shared references | Can nodes B and C both refer to node A? | yes, using entities. Data sets are flat trees. | no. Shared subtrees end up as multiple subtrees. | yes | XML |
Handle lazy pointers | Can data refer to other data that is not available or is to be retrieved later? | Yes. An XML document will still parse with missing entities. The XLL language provides external links much like HTML links. | No. (May be approximated with some sort of value-holder mechanism) | No. (See CORBA) | XML |
Integration with application object model | Does the technology allow the application object model to be used as-is? | no. XML parsers generally support DOM or SAX based object models. Use of the application object model requires mapping functions or adapter layers. | somewhat (the IDL to Java generator creates structure classes that must then be used) | full (any application object that is serializable can be passed as a remote parameter or returned as a return value) | CORBA |
Handling of dynamic data formats | Can the application read a new structure it was not written for and manipulate it? | yes | no | yes | XML |
Technology overhead | What is the client overhead of adding the technology? | Most parsers are 200 to 400K | Most CORBA systems require 200 to 400K. In Java 2 the basic CORBA classes are included. | Part of core from JDK 1.1 on. Need extra packages to support transactions, etc. | tie |
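The CORBA entry for inheritance in the table, and the User/Student case described just before it, can be sketched as follows. This is a deliberately simplified Python model of the behaviour the article describes, not a depiction of any particular ORB: `marshal` and `unmarshal` are hypothetical helpers that copy only the fields named by the declared interface type.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    name: str

    def greeting(self):
        return f"Hello, {self.name}"


@dataclass
class Student(User):
    school: str = ""

    def greeting(self):
        return f"Hello, {self.name} of {self.school}"


def marshal(declared_type, obj):
    """Copy only the fields that the declared interface type knows about."""
    return {f.name: getattr(obj, f.name) for f in fields(declared_type)}


def unmarshal(declared_type, data):
    """Rebuild an instance of the declared type, not the sender's subclass."""
    return declared_type(**data)


student = Student(name="Ada", school="Carleton")
# The (hypothetical) interface declares a User parameter, so that is
# all that survives the trip.
received = unmarshal(User, marshal(User, student))
```

On the receiving side `received` is a plain `User`: the `school` data and the overridden `greeting` behaviour are both gone, which is the morphing the article warns about.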
Non-technical forces are also driving the XML phenomenon. Take Microsoft, for example. It is always dangerous to speculate too much in the political arena. Note, however, that Microsoft has made XML one of the linchpins of its drive to support web-based open systems. Of course there are rumblings that Microsoft is trying to put its own proprietary spin on the technology, but let's give Bill and company the benefit of the doubt.
CORBA has never been endorsed by Microsoft. They were early members of the OMG long before the web was hot. But they always sat on the fence about supporting CORBA and finally decided that it competed too directly with their DCOM (now called simply COM) object model plans. DCOM was the distributed extension of COM, which itself was an evolution of OLE, a technology for allowing Windows applications to interact with each other. Some say that Microsoft wants to control the desktop environment and saw DCOM as a way to extend its desktop presence to a whole network, thereby mandating its operating systems as the de facto standard. They couldn't support CORBA too directly without sabotaging their DCOM message.
Assuming that this analysis is true, what happened next? Well, the web came out of nowhere and threw Microsoft's plans for Blackbird, its proprietary Information Superhighway technology, into disarray. For several months Microsoft appeared to be reacting like IBM at the beginning of the PC revolution. Finally, they ate some crow, or maybe blackbird, and embraced the web.
But, some say, the web is antithetical to Microsoft's philosophy of controlling the technology behind its products. In the web world, open is good, vendor-neutral is good, OS and language independence is good. So, while ActiveX morphed and limped out of OCXs, Java became ubiquitous. Open source and GNU public licences became the flavour du jour. Finally, the thesis goes, Microsoft needed to get in the open game when it came to web-based data standards. The W3C was not going to accept submissions for a channel definition format based on DCOM. But publishing it as a CORBA IDL would be treasonous to its DCOM technology.
Enter XML to save the day. It's an open, vendor-neutral, OS-independent standard spearheaded by the W3C. And it doesn't give the image of betraying DCOM. In fact, paradoxically, it is precisely XML's inability to specify a remote messaging interface (the core of both CORBA and DCOM) that makes it an acceptable remote data format to Microsoft.
Or so the thesis goes. I have no knowledge of Microsoft's decision-making process and so all of this is, obviously, speculation. Flame shields up.
IBM, as well, has come out with a lot of XML support and components. This appears, however, to be more of a support for OS-neutral standards than a push to extend XML into the distributed messaging market. With so many of its own operating systems to support, as well as a stodgy image to update, IBM has a lot at stake in supporting both Java and XML.
Even the OMG has had to support XML, with some obvious reservations. But with the OMG that is probably more of an effort to appear open to new technologies and directions than any belief that it may be a good replacement for the core CORBA technology. Better to say "XML has its uses, and here's how it can fit with CORBA," than to put your hands over your ears and chant "I can't hear you."
There are many application arenas where XML provides huge advantages,
especially in document management and stream-based data manipulation.
Other applications may not benefit as much from XML as they would from other
open web technologies better suited to their needs. Hopefully this article
will help architects and designers get a better understanding of when it
is and is not appropriate to use XML as part of the system architecture.