TOPLink
A Step beyond Object to Data Mapping
Richard Deadman
Abstract
As Java moves beyond cute dancing applets to become a platform
for distributed multi-platform computing, the race to supply the infrastructure
services is starting. This includes not only distributed "CORBAish"
management services (Naming, Event Channel, Security, Trader, Administration)
but also facilities for managing the data that is central to corporate
applications. Several Object to Relational Database mapping frameworks
are now becoming available, but The Object People's TOPLink/J™
goes beyond just mapping the data to also managing it.
So, you've coded your RMI chat applet and impressed your director enough
to be given the task of heading up a team to make the corporate widget
directory available across the network and remotely to sales people.
You look at JDBC and some remote JDBC products and think: great, but shouldn't
I be using an object-oriented methodology and coming up with an object
model instead of populating screens from JDBC-wrapped SQL calls?
Modern OO language, meet entrenched relational database. You've
still got one, haven't you? Object-oriented databases have been around
for years but are not always an available solution, sometimes for legacy
reasons, sometimes because of perceptions of immaturity, sometimes because of
the available skill set, and sometimes because of the lack of a mature OODB query standard.
Whatever the reason, the call has been made -- a relational database it
is.
You assign your best modellers to designing the logical object
model and sit down yourself to solve the physical issues -- object mapping,
transactions, caching, object handle faulting, object identity. Suddenly
you realize that you are going to have to build your own one-off proprietary
pseudo-OO database on top of JDBC, and doing so will involve 40-60%
of the non-GUI coding effort. Surely, hopefully, someone else has
run into this before.
A sure indicator of the seriousness with which Java is entering the
corporate IT market is the number of Java object to JDBC table mapping
tools that are emerging on the market (Sun's upcoming JavaBlend, Thought's
CocoBase, Novera's EPIC dbBlend, ChiMu's FORM, O2's JRB, Objectmatter's
BSF, Software Tree's JDX, 2Link Consulting's dbGen, CrossLogic's Universe).
Some are new, some available for free, but one of the most complete turns
out to be a product that isn't new at all, not at its roots anyway.
The Object People's TOPLink/J™ is both new and old. While not
perfect, it offers a wealth of features that make the management of data
in serious corporate applications much easier. Presently in Beta
(as of February, 1998), TOPLink/J, if priced similarly to its Smalltalk
cousin, will not be a cheap product but will pay for itself in development
time, effort, stability and maintainability. The key to its advanced
standing is that The Object People have done it all before.
Before Java was coined as the name for Sun's Oak project, The Object People
were mentoring the building of pure OO Smalltalk systems by large corporate
customers. They saw a need for a tool to map objects into and out
of the legacy relational databases that were firmly entrenched in many
customers' sites. And so, for over five years, they have been selling
a tool for the Smalltalk market called TOPLink. Over that time, they have
extended the tool and its framework to support the management of objects
as well as the mapping of them to database tables. As a result, TOPLink
supports not only object mapping, but caching, object identity, faulting,
transactions, units of work and some three-tier support.
Object Mapping
The cornerstone of TOPLink, and all the other Object to Relational
Database mapping products now making their way onto the market, is the
ability to automate the mapping of objects into and out of relational tables.
In this regard, TOPLink provides a great deal of flexibility. It
includes six different mapping methods and two accessor methods.
Its mapping methods allow straight table-field-to-instance-variable mapping
but also include the ability to do dictionary-based translations, collection
mappings using a mapping table and contained object mapping from within
the same table. There is not space within this article to explain
all the mapping features provided, but they include a last-resort transformational
mapping which allows you to plug in your own mapping algorithm.
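To make the idea concrete, here is a minimal Java sketch of three of the mapping styles described above: direct, dictionary-based and transformational. The `Mapping`, `DirectMapping`, `DictionaryMapping`, `TransformationMapping` and `Descriptor` classes are hypothetical illustrations, not TOPLink/J's API. A `Map` of column names to values stands in for a JDBC result row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical domain class. Its fields are public because a mapping library
// can only reach public variables or public accessors.
class Widget {
    public String name;
    public String status;
    public long priceCents;
}

// Hypothetical mapping contract: move data from one row (column -> value) into a Widget.
interface Mapping {
    void apply(Map<String, Object> row, Widget target);
}

// Direct mapping: one column is copied unchanged into one instance variable.
class DirectMapping implements Mapping {
    private final String column;
    private final BiConsumer<Widget, Object> setter;

    DirectMapping(String column, BiConsumer<Widget, Object> setter) {
        this.column = column;
        this.setter = setter;
    }

    public void apply(Map<String, Object> row, Widget target) {
        setter.accept(target, row.get(column));
    }
}

// Dictionary-based mapping: a stored code is translated through a lookup table.
class DictionaryMapping implements Mapping {
    private final String column;
    private final Map<Object, Object> dictionary;
    private final BiConsumer<Widget, Object> setter;

    DictionaryMapping(String column, Map<Object, Object> dictionary, BiConsumer<Widget, Object> setter) {
        this.column = column;
        this.dictionary = dictionary;
        this.setter = setter;
    }

    public void apply(Map<String, Object> row, Widget target) {
        setter.accept(target, dictionary.get(row.get(column)));
    }
}

// Transformational mapping: user-supplied code computes the value from the whole
// row. This is the "last resort" when no built-in mapping fits.
class TransformationMapping implements Mapping {
    private final Function<Map<String, Object>, Object> transform;
    private final BiConsumer<Widget, Object> setter;

    TransformationMapping(Function<Map<String, Object>, Object> transform, BiConsumer<Widget, Object> setter) {
        this.transform = transform;
        this.setter = setter;
    }

    public void apply(Map<String, Object> row, Widget target) {
        setter.accept(target, transform.apply(row));
    }
}

// Collects the mappings for one class and builds an instance from a row.
class Descriptor {
    private final List<Mapping> mappings = new ArrayList<>();

    Descriptor add(Mapping mapping) {
        mappings.add(mapping);
        return this;
    }

    Widget build(Map<String, Object> row) {
        Widget widget = new Widget();
        for (Mapping mapping : mappings) {
            mapping.apply(row, widget);
        }
        return widget;
    }
}
```

A descriptor for `Widget` could then combine a direct `NAME` mapping, a `STATUS` code translated through a dictionary such as `"A" -> "Active"`, and a transformation that builds a price from separate dollar and cent columns.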
The price of all this flexibility is, as always, a bit of extra complexity.
While the direct mapping method is simple to use, understanding all six
and when to use them will take some work. Unlike its ancestor,
TOPLink for Smalltalk, TOPLink/J must work within the confines of the Java
language. This means that since TOPLink is simply another class library,
it must rely on public instance variables or public accessors for those
variables. The builder tool at present does not support the most
complex transformational mapping scheme, so this must be specified through
the Java API calls of the framework. As well, the builder tool has
the unfortunate habit of forgetting your accessor method and defaulting
to direct variable access at times.
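The difference between the two accessor methods can be sketched with plain Java reflection. The sketch assumes, as described above, that the library may only use public members. Direct access therefore needs a public variable, while accessor access looks up a public `setXxx` method by naming convention. `Part` and `AttributeAccess` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical model class showing both styles of attribute access.
class Part {
    public String code;   // public variable: direct access works
    private int quantity; // private variable: a library must go through the accessors

    public int getQuantity() { return quantity; }
    public void setQuantity(int quantity) { this.quantity = quantity; }
}

// Sketch of how a mapping library might set an attribute using public lookups only.
class AttributeAccess {
    // Direct variable access: getField() finds public fields only.
    static void setDirect(Object target, String variable, Object value) {
        try {
            Field field = target.getClass().getField(variable);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no public variable " + variable, e);
        }
    }

    // Accessor access: derive "setXxx" from the attribute name by convention.
    static void setViaAccessor(Object target, String attribute, Class<?> type, Object value) {
        String setter = "set" + Character.toUpperCase(attribute.charAt(0)) + attribute.substring(1);
        try {
            Method method = target.getClass().getMethod(setter, type);
            method.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no public accessor " + setter, e);
        }
    }
}
```

If a builder silently switches an attribute like `quantity` from accessor to direct access, the direct lookup fails at run time. That is the failure mode the builder's habit produces.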
Caching and Object Identity
TOPLink allows you to specify a cache on objects fetched out of the
database. This provides for quicker access to objects already fetched
and ensures that two fetches from the database for the same logical object
will result in two handles to the same object. TOPLink provides three
caching schemes: none, a size-limited cache with least-recently-used eviction,
and grow-to-infinity. The need for both of the last two derives from
a lack of weak references in Java (prior to 1.2). TOPLink cannot
know when an object is no longer referenced and so must either keep it
forever or use an algorithm to guess when to remove an object from its
cache. Both schemes have costs. The grow-to-infinity cache costs memory.
The size-limited cache costs extra database accesses and cannot
guarantee object identity, because an evicted object that is still in use
will be fetched again as a second copy.
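The trade-off between the two non-trivial caching schemes can be sketched as an identity map keyed by primary key. This is an illustrative sketch, not TOPLink's implementation. `IdentityCache` is a hypothetical class, a `Function` stands in for the database query, and the least-recently-used variant relies on `LinkedHashMap` in access order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical identity map keyed by primary key.
// maxSize > 0: size-limited cache with least-recently-used eviction.
// maxSize <= 0: grow-to-infinity cache that never evicts.
class IdentityCache<K, V> {
    private final Map<K, V> objects;
    private final Function<K, V> database; // stand-in for a real query
    int databaseReads = 0;

    IdentityCache(int maxSize, Function<K, V> database) {
        this.database = database;
        if (maxSize > 0) {
            // accessOrder = true keeps entries ordered from least to most recently used
            this.objects = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxSize;
                }
            };
        } else {
            this.objects = new HashMap<>();
        }
    }

    // Returns the cached instance if present, so repeated fetches share one object.
    V get(K key) {
        V cached = objects.get(key);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        V fetched = database.apply(key);
        objects.put(key, fetched);
        return fetched;
    }
}
```

Suppose keys 1, 2 and 3 are fetched through a cache of size 2. Key 1 is evicted, so the next fetch of key 1 reads the database again and returns a different instance. That is the loss of object identity described above.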
To ensure that the objects in the cache stay in synch with the database,
TOPLink allows the use of a version field. The framework can then query
the database on the version number and determine whether it must update
a cached value before returning it to the user. As is the pattern
with TOPLink, several database consistency check options are provided.
For instance, if you know that no other programs are updating the database,
version consistency checking can be turned off. At the other end
of the spectrum, the database can always be read, in which case the cache
is only useful for preserving object identity.
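Here is a sketch of the version check. It uses a hypothetical in-memory `Map` in place of the database table and an enum for three of the consistency options mentioned above: trust the cache, compare versions, or always read. The names are illustrative only. Refreshing the cached object in place, rather than replacing it, is what preserves object identity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a database row carrying a version column.
class Row {
    final String description;
    final int version;

    Row(String description, int version) {
        this.description = description;
        this.version = version;
    }
}

// The cached object; refreshes update it in place so its identity never changes.
class CachedWidget {
    String description;
    int version;
}

enum ConsistencyCheck { NONE, CHECK_VERSION, ALWAYS_READ }

class VersionedCache {
    private final Map<Integer, Row> table; // stand-in for the database table
    private final Map<Integer, CachedWidget> cache = new HashMap<>();
    private final ConsistencyCheck mode;

    VersionedCache(Map<Integer, Row> table, ConsistencyCheck mode) {
        this.table = table;
        this.mode = mode;
    }

    CachedWidget read(int id) {
        CachedWidget widget = cache.get(id);
        if (widget == null) {
            widget = new CachedWidget();
            refresh(widget, table.get(id));
            cache.put(id, widget);
            return widget;
        }
        switch (mode) {
            case NONE:
                // No other program updates the database: trust the cache.
                break;
            case CHECK_VERSION: {
                // A real system would fetch only the version column here and
                // read the full row only when the versions differ.
                Row current = table.get(id);
                if (current.version != widget.version) {
                    refresh(widget, current);
                }
                break;
            }
            case ALWAYS_READ:
                // Always read; the cache only preserves object identity.
                refresh(widget, table.get(id));
                break;
        }
        return widget;
    }

    private static void refresh(CachedWidget widget, Row row) {
        widget.description = row.description;
        widget.version = row.version;
    }
}
```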
Faulting
So your object model specifies large trees (or even cyclic graphs)
of object relationships. TOPLink can do the mapping to read these
graphs out of the database, but what if you just want to view the widget description
without also reading (at considerable cost) all the other objects which
are reachable from an instance of Widget? One way is to replace the
object references in your object model with indirect references which can
be used to find related objects using a directory or factory object.
A more transparent and elegant way, however, is to use a pattern found
in other Object-oriented repositories, such as Gemstone. The trick
is to use a place-holder proxy which "faults" in the rest of the tree when
it is first accessed. TOPLink for Smalltalk leverages features of
that language which are not found in Java. Gemstone
accomplishes this in Smalltalk and Java by providing its own VM, which
creates a faulting proxy transparently. Since TOPLink is designed
to work on all Java VMs, the designers were forced instead to use a less
transparent mechanism: the object model must hold an instance of
a ValueHolder instead of the referenced object within the object's state
(instance variables). The ValueHolder is then hidden from the object
model by resolving it to the real object within an accessor
method.
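The ValueHolder idea can be sketched as follows. This `ValueHolder` is a hypothetical stand-in written for illustration, not TOPLink/J's own class. A `Supplier` plays the role of the database query that faults in the referenced object on first access, and the `Widget` accessor hides the holder from the rest of the model.

```java
import java.util.function.Supplier;

// Hypothetical ValueHolder-style placeholder (not TOPLink/J's actual class).
// It stores a way to load the referenced object and "faults" it in on first use.
class ValueHolder<T> {
    private Supplier<T> loader;
    private T value;
    private boolean instantiated;

    ValueHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T getValue() {
        if (!instantiated) {
            value = loader.get(); // e.g. run the query for the related row
            instantiated = true;
            loader = null;        // the query is no longer needed
        }
        return value;
    }

    synchronized boolean isInstantiated() {
        return instantiated;
    }
}

class Manufacturer {
    final String name;

    Manufacturer(String name) {
        this.name = name;
    }
}

class Widget {
    private final String description;
    // The instance variable holds the placeholder, not the Manufacturer itself.
    private final ValueHolder<Manufacturer> manufacturer;

    Widget(String description, ValueHolder<Manufacturer> manufacturer) {
        this.description = description;
        this.manufacturer = manufacturer;
    }

    String getDescription() {
        return description;
    }

    // The accessor hides the holder: callers simply see a Manufacturer.
    Manufacturer getManufacturer() {
        return manufacturer.getValue();
    }

    boolean isManufacturerLoaded() {
        return manufacturer.isInstantiated();
    }
}
```

Viewing a widget's description never touches the manufacturer. The first call to `getManufacturer()` pays for the read, and later calls reuse the loaded object.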
Transactions and Units of Work
TOPLink handles transactions in the traditional manner shared
with other object persistence frameworks. A transaction is opened
in a session, changed objects are added to the transaction and the transaction
is committed. Instead of managing locks, which becomes complex in
three-tier applications, TOPLink just provides optimistic locking.
Using the same internal version field used to detect cache consistency,
TOPLink determines if all the objects to be committed are derived from
the latest information in the database. As with all atomic commits,
the write will either succeed or fail for all objects in the transaction.
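The optimistic commit described above can be sketched as a two-phase check against a hypothetical in-memory table. First, confirm that every changed object was read at the current version. Then write all of them and bump their versions, or write nothing at all. `Transaction`, `StoredRow` and `OptimisticLockException` are illustrative names, not TOPLink classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a row with the version field used for optimistic locking.
class StoredRow {
    final String value;
    final int version;

    StoredRow(String value, int version) {
        this.value = value;
        this.version = version;
    }
}

class OptimisticLockException extends RuntimeException {
    OptimisticLockException(String message) {
        super(message);
    }
}

class Transaction {
    // One pending update: the version the object was read at, and its new value.
    private static final class Change {
        final int id;
        final int readVersion;
        final String newValue;

        Change(int id, int readVersion, String newValue) {
            this.id = id;
            this.readVersion = readVersion;
            this.newValue = newValue;
        }
    }

    private final Map<Integer, StoredRow> table; // stand-in for the database
    private final List<Change> changes = new ArrayList<>();

    Transaction(Map<Integer, StoredRow> table) {
        this.table = table;
    }

    void update(int id, int readVersion, String newValue) {
        changes.add(new Change(id, readVersion, newValue));
    }

    // All-or-nothing: verify every version first, then write everything.
    // A real database would do this inside one transaction, for example with
    // UPDATE ... WHERE ID = ? AND VERSION = ? and a row-count check.
    void commit() {
        synchronized (table) {
            for (Change c : changes) {
                StoredRow current = table.get(c.id);
                if (current == null || current.version != c.readVersion) {
                    throw new OptimisticLockException("row " + c.id + " was changed by another user");
                }
            }
            for (Change c : changes) {
                table.put(c.id, new StoredRow(c.newValue, c.readVersion + 1));
            }
        }
        changes.clear();
    }
}
```

Suppose a second transaction read row 2 before the first one committed. Its commit fails as a whole, and row 1 is left untouched as well.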
Perhaps more interesting are the facilities TOPLink provides for Units
of Work. A unit of work is much like a transaction except for two
points:
-
The changes are made to deep object copies, meaning that changes will not
be seen by other parts of the system sharing the same objects until the
changes have been committed.
-
At commit time, the changed object graphs are searched against the originals
for the deltas and only the actual changed fields are written to the database.
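The two unit-of-work points can be sketched with reflection over public fields. Registering an object hands back a private copy. At commit, each copy is compared with its original, only the changed fields are recorded (standing in for the UPDATE that would be sent), and those fields are merged back. `UnitOfWork` is a hypothetical name. For brevity, the copy here is only one level deep rather than the deep copy described above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class Widget {
    public String name;
    public int price;
    public String colour;
}

// Hypothetical unit of work, for illustration only.
class UnitOfWork {
    // Shared original -> private working copy handed to the caller.
    private final Map<Object, Object> workingCopies = new IdentityHashMap<>();
    // Names of the fields that would go into the UPDATE statements.
    final List<String> writtenFields = new ArrayList<>();

    @SuppressWarnings("unchecked")
    <T> T register(T original) {
        T copy = (T) copyOf(original);
        workingCopies.put(original, copy);
        return copy;
    }

    void commit() {
        try {
            for (Map.Entry<Object, Object> entry : workingCopies.entrySet()) {
                Object original = entry.getKey();
                Object copy = entry.getValue();
                for (Field field : original.getClass().getFields()) {
                    if (Modifier.isStatic(field.getModifiers())) continue;
                    Object after = field.get(copy);
                    if (!Objects.equals(field.get(original), after)) {
                        writtenFields.add(field.getName()); // only the delta is written
                        field.set(original, after);         // then merged into the shared object
                    }
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        workingCopies.clear();
    }

    // One level deep for brevity; a full unit of work would recurse through
    // the object hierarchy it knows about.
    private static Object copyOf(Object original) {
        try {
            Object copy = original.getClass().getDeclaredConstructor().newInstance();
            for (Field field : original.getClass().getFields()) {
                if (!Modifier.isStatic(field.getModifiers())) {
                    field.set(copy, field.get(original));
                }
            }
            return copy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```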
Care must be taken with Units of Work, however, since the cloning of objects
added to the Unit of Work is a deep copy based on the TOPLink-known object
hierarchy information. A large hierarchy will cause a large amount
of object cloning for every object put into the Unit of Work, and two entries
which share an object will result in two clones of the shared sub-object, with
undefined behaviour if both clones are changed.
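The shared sub-object problem is easy to reproduce. In this hypothetical sketch, a naive per-entry deep copy gives two widgets that share one `Vendor` two unrelated vendor clones. For contrast, the sketch also shows one common remedy, which is not attributed to TOPLink: keep an identity map of clones for the whole unit of work so that each original is cloned only once.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class Vendor {
    String name;

    Vendor(String name) {
        this.name = name;
    }
}

class Widget {
    String name;
    Vendor vendor;

    Widget(String name, Vendor vendor) {
        this.name = name;
        this.vendor = vendor;
    }
}

// Hypothetical cloning helpers, for illustration only.
class Cloning {
    // Naive deep copy: each registered object copies its own graph, so a
    // Vendor shared by two widgets ends up with two unrelated clones.
    static Widget naiveCopy(Widget widget) {
        return new Widget(widget.name, new Vendor(widget.vendor.name));
    }

    // Identity-preserving copy: remember every clone made during the unit of
    // work, so one original vendor yields exactly one clone.
    private final Map<Vendor, Vendor> vendorClones = new IdentityHashMap<>();

    Widget sharedAwareCopy(Widget widget) {
        Vendor vendor = vendorClones.computeIfAbsent(widget.vendor, v -> new Vendor(v.name));
        return new Widget(widget.name, vendor);
    }
}
```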
Three Tier Support
Finally, the Beta of TOPLink for Java contained some support for three-tier
access to the database. This support allows multiple clients
to connect to a database through a TOPLink server using different sessions
with different security permissions. A single read session is used
on the server to allow for a unified cache. While this is useful,
managing objects from a server to a client is a similar problem to managing
objects from a database to a server: in both cases caching, faulting
and object identity need to be provided.
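The server-side arrangement can be sketched as one shared read session that owns the cache, plus lightweight per-client sessions that carry their own permissions. `SharedReadSession` and `ClientSession` are hypothetical names, the string permissions are an illustrative assumption, and a `Map` stands in for the database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical server side: one shared read session owns the single cache for all clients.
class SharedReadSession {
    private final Map<Integer, Object> database; // stand-in for the real tables
    private final Map<Integer, Object> cache = new HashMap<>();
    int databaseReads = 0;

    SharedReadSession(Map<Integer, Object> database) {
        this.database = database;
    }

    synchronized Object read(int id) {
        Object cached = cache.get(id);
        if (cached == null) {
            databaseReads++;
            cached = database.get(id);
            cache.put(id, cached);
        }
        return cached;
    }

    synchronized void write(int id, Object value) {
        database.put(id, value); // write through so every client sees the same state
        cache.put(id, value);
    }
}

// Hypothetical per-client session with its own permissions, sharing the server's cache.
class ClientSession {
    private final SharedReadSession server;
    private final Set<String> permissions;

    ClientSession(SharedReadSession server, String... permissions) {
        this.server = server;
        this.permissions = new HashSet<>(Arrays.asList(permissions));
    }

    Object read(int id) {
        require("read");
        return server.read(id);
    }

    void write(int id, Object value) {
        require("write");
        server.write(id, value);
    }

    private void require(String permission) {
        if (!permissions.contains(permission)) {
            throw new SecurityException("session lacks '" + permission + "' permission");
        }
    }
}
```

Note that the client here still holds plain objects. Caching, faulting and identity on the client side of the wire are the part of the problem this sketch, like the Beta, leaves open.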
Limitations and Drawbacks
TOPLink is not perfect. The Beta version tested at the time of
this article had some problems with the TOPLink builder. Re-opening
a project or updating the information for a class or table invariably caused
the accessor mode for the class's variables to be lost. As well,
the class editor screen failed to reopen after being closed. These
are inconveniences. More troublesome was the poor documentation on
the meaning of "read-only" fields. If you mark the variable-to-field
mapping of a primary key as read-only so that it won't be changed, you
get a runtime error. TOPLink requires that primary key fields not be read-only,
since they are used to identify the database row to write. The reason
makes sense, but it is not necessarily intuitive and is not well documented.
Instead of a warning when you define the mapping, you get an exception thrown when you try to write to the database.
Furthermore, only one instance variable should be mapped to any field,
since otherwise TOPLink does not know which variable to resolve back to
the field at write time.
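Both pitfalls could be caught when the mappings are defined rather than at write time. The following hypothetical validator is not a TOPLink feature. It reports a read-only primary key mapping and any column mapped by more than one instance variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical description of one variable-to-field mapping.
class FieldMapping {
    final String variable;
    final String column;
    final boolean readOnly;

    FieldMapping(String variable, String column, boolean readOnly) {
        this.variable = variable;
        this.column = column;
        this.readOnly = readOnly;
    }
}

// Hypothetical definition-time check, for illustration only.
class MappingValidator {
    static List<String> validate(List<FieldMapping> mappings, String... primaryKeyColumns) {
        List<String> keys = Arrays.asList(primaryKeyColumns);
        List<String> problems = new ArrayList<>();
        Map<String, String> variableForColumn = new HashMap<>();
        for (FieldMapping m : mappings) {
            // Primary key fields identify the row to write, so they cannot be read-only.
            if (m.readOnly && keys.contains(m.column)) {
                problems.add("primary key column " + m.column + " is mapped read-only");
            }
            // One variable per column; otherwise the value to write back is ambiguous.
            String previous = variableForColumn.putIfAbsent(m.column, m.variable);
            if (previous != null) {
                problems.add("column " + m.column + " is mapped by both " + previous + " and " + m.variable);
            }
        }
        return problems;
    }
}
```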
The lack of a pessimistic locking scheme means that there is no way
to guarantee that a transaction, once started, will be able to commit (there
never is, but with pessimistic locking, transactions fail only because of network
or database failures, not data collisions). While optimistic locking
allows TOPLink to avoid nasty lock management issues, it does mean that
applications may be forced to make users re-enter whole datasets at times.
At present, TOPLink throws only runtime exceptions, a legacy of its
Smalltalk roots. While this makes sense for true unpredictable errors,
such as faulting-in errors, I would prefer to see transaction and database
errors forced to be caught. As well, its current support for distributed
computing could be enhanced to support client-side units of work, caching
and faulting.
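Checked exceptions matter here because they let the compiler enforce handling. Here is a minimal sketch along the lines of the preference expressed above. It uses hypothetical `Session`, `CommitFailedException` and `CheckedSession` types. An adapter converts a framework's runtime failures into a checked exception, so every caller of `commit()` must decide what to do when a transaction fails.

```java
// Hypothetical checked exception: callers must handle a failed commit.
class CommitFailedException extends Exception {
    CommitFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface Session {
    void commit() throws CommitFailedException;
}

// Hypothetical adapter that turns unchecked commit failures into the checked type.
class CheckedSession implements Session {
    private final Runnable uncheckedCommit; // e.g. a framework call that throws RuntimeException

    CheckedSession(Runnable uncheckedCommit) {
        this.uncheckedCommit = uncheckedCommit;
    }

    @Override
    public void commit() throws CommitFailedException {
        try {
            uncheckedCommit.run();
        } catch (RuntimeException e) {
            throw new CommitFailedException("commit failed: " + e.getMessage(), e);
        }
    }
}

class Committer {
    // Omitting this catch block would be a compile-time error.
    static boolean tryCommit(Session session) {
        try {
            session.commit();
            return true;
        } catch (CommitFailedException e) {
            return false; // e.g. ask the user to re-enter or merge the data
        }
    }
}
```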
Conclusion
The purpose of this article is not to compare the features, costs and
differences of all the object to relational mapping products presently
available. Such a comparison will of necessity be biased in some
way and must be made depending on the needs of each project. Instead,
the article is a review of one of the most complete products, suitable
for industrial-strength Java solutions. While TOPLink is not perfect,
it offers data management services that are more sophisticated than those
of other products in its class. Depending on the persistence sophistication
required by your project, TOPLink may or may not be worth the licensing
and training costs. It offers a wealth of features and good defaults,
allowing the normal things to be done simply and the abnormal things to
be possible. For a project that requires such services, TOPLink will
quickly pay for itself. It certainly sets a standard that other products
are sure to want to follow.