When in Rome
A Guide to the Java Paradigm
Richard Deadman
Whether or not you have programmed in an object-oriented language before,
moving to a new language involves some paradigm shifts in how you think
about structuring programs and solving problems. Often we think in terms
of solutions instead of problems and so ask questions like "How do I pass
a method pointer in Java?", instead of "How do I encapsulate a behaviour
reference in Java?". This is particularly important for C++ programmers
migrating to Java, since the similarity in syntax between the languages
can lead to the assumption that the language paradigm is identical.
In this article, I discuss two classes of problems faced by developers.
The first is what I call "Conceptual Confusion", where a user of one language
carries their assumptions to another language and then gets confused when
their assumptions are invalid. The second class is the desire for new features
to be added to the language. This "Creeping Featurism" generally involves
adding complexity to the language for the sake of mimicking another language's
feature, often no more than syntactic sugar. That is, the proposed feature
may reduce the typing without adding to the power of the language.
Let me warn you of my bias: too many features in a language confuse
me. I find that a simple language based on a single paradigm provides for
less confusion, better maintainability and quicker code development. You
may understand that neat "constant reference" feature, but think of the
person who may have to fix, reuse or extend your code a year from now.
Conceptual Confusion
There are at least four good examples of conceptual confusion that I have
run across. Mostly these affect people migrating from either C++ or Smalltalk.
Virtual Methods
A common mistake is to assume that casting an object in Java changes which
methods are bound to that object. Java uses late binding; that is, the
instance method invoked when a message is sent to an object is determined
by the object's type, not the type of the handle to the object. If the
object is a sub-type of a base-type, then the sub-type's instance method
will always get called, whether or not the handle to the object is declared
as of the base-type, an interface or any sub-type between the base-type
and the subtype. This obviously has to be true when the handle is declared
to be of an interface type. So...
class Foo {
    public String toString() {
        return "An instance of class Foo";
    }
}
class Bar extends Foo {
    public String toString() {
        return "An instance of class Bar";
    }
}
public class FooBar {
    public static void main(String[] args) {
        // The handle is declared as Foo, but the object is a Bar.
        Foo myPersonalFoo = new Bar();
        System.out.println(myPersonalFoo);
    }
}
will print "An instance of class Bar".
It should be up to the object to determine how messages sent to it will
be handled. Note that this is the opposite to how static methods are bound
in Java. More on this later.
Since the default in C++ is for non-virtual methods, some C++ programmers
can inadvertently assume that early binding takes place and not then understand
the behaviour of their code.
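To make the point about casting concrete, here is a minimal sketch. The classes Animal and Dog and the helper method are hypothetical names of my own, used only for illustration; the cast changes the type of the handle, not the method that gets run.

```java
// Hypothetical classes, used only to illustrate late binding.
class Animal {
    String describe() { return "Animal"; }
}

class Dog extends Animal {
    @Override
    String describe() { return "Dog"; }
}

class CastDemo {
    // Returns what describe() answers when the handle is cast up to the base type.
    static String describeThroughBaseHandle(Dog dog) {
        Animal handle = (Animal) dog;   // the cast changes the handle's type only
        return handle.describe();       // still dispatches on the object's class
    }

    public static void main(String[] args) {
        System.out.println(describeThroughBaseHandle(new Dog())); // prints "Dog"
    }
}
```

A C++ programmer expecting non-virtual behaviour would expect "Animal" here.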
All the compiler checks is that the message is supported by the handle's
object type. To send a sub-type's message to an object handle which does
not support that message, you must cast the handle to the proper sub-type:
import java.rmi.server.*;
...
Object vectorContent = aVector.firstElement();
if (vectorContent instanceof ServerRef) {
    // Cast the handle first, then send the sub-type's message.
    System.out.println("Remote call from " +
        ((ServerRef) vectorContent).getHostName());
}
Pass-by-Reference
Every variable in Java is a handle to either a basic type or an object
instance. When you pass an object into a method, or out via a return value,
you are actually passing a copy of the reference; the object pointed to
stays the same. And there is no way to restrict access to that referenced
object (see the discussion below on the "const" feature of C++). So you
can alter the state of the object pointed to (unless it is immutable,
as basic types are), but you cannot change someone else's reference
to that object, since their reference was copied when it was given to you.
This is identical to Smalltalk's parameter passing but contrasts significantly
with the C++ allowed modes:
-
pass-by-reference. As in Smalltalk and Java, but involves passing the pointer
to an object explicitly. The receiver must then explicitly de-reference
the pointer. If you don't "get" pointers, you shouldn't be playing with
C and C++.
-
pass-by-value. A copy is made of the parameter. In Java the equivalent
can be accomplished by cloning the object before sending it.
-
pass-by-reference but seems like a value. This is the most confusing C++
addition, the "Reference". Here you tell the method that you are passing
a reference to the object but you hide the syntax to make it seem like
the object was passed as a value.
As long as you understand the paradigm, you can achieve your desired contract
between objects within Java, with the benefit of a simpler object model.
(Note: With RMI, the rules change for distributed objects. All parameters
which are not themselves remote handles are serialized between the virtual
machines, effectively implementing a pass-by-value paradigm for non-Remote
objects.)
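The copied-reference rule can be sketched as follows. Counter and the two helper methods are hypothetical names chosen for this illustration: one method changes the state of the shared object, the other only rebinds its own copy of the reference.

```java
// Hypothetical mutable class, used only for illustration.
class Counter {
    int value;
    Counter(int value) { this.value = value; }
}

class ReferenceDemo {
    // Changes the state of the object the caller also points to.
    static void increment(Counter c) {
        c.value++;
    }

    // Rebinds only the local copy of the reference; the caller's handle is unaffected.
    static void replace(Counter c) {
        c = new Counter(100);
    }

    public static void main(String[] args) {
        Counter mine = new Counter(1);
        increment(mine);
        System.out.println(mine.value); // prints 2
        replace(mine);
        System.out.println(mine.value); // still prints 2
    }
}
```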
Class/Static Methods
In Smalltalk, classes and instances are both first-class objects and there
is a name space separation between class methods and instance methods.
If an instance wants to call a registration method on its class to add
itself to some class-based cache of instances, it can perform:
addInstanceToCache
self class addInstanceToCache: self.
Subclasses of the base class can override "addInstanceToCache" and they
will be called by instances of the subclass which perform the "addInstanceToCache"
method. In other words, the class methods are dynamically bound in Smalltalk,
as are all methods.
Smalltalkers moving to Java often mistake "static" methods and variables
as being equivalent to "class" methods and variables in Smalltalk. This
is not strictly true. While "static" methods and variables are bound to
a class and not an instance, they are resolved "statically" to the type
of the variable (or named class), by the compiler. If the variable type
and instance class are different, the static method or variable is not
over-ridden at run-time by the instance's class. Hence:
class Bar {
    static String getName() { return "Bar"; }
}
class Baz extends Bar {
    static String getName() { return "Baz"; }
}
public class Foo {
    public static void main(String[] args) {
        Bar myBar = new Baz();
        System.out.println(myBar.getName());
    }
}
will output "Bar" and not "Baz", even though "myBar" refers to an instance of class Baz.
Here is one of Java's great inconsistencies: instance methods involve
late binding but class methods involve early binding. Think of static as
meaning "bound statically to the class at compile time". Now write it out
fifty times.
A similar problem exists in the understanding of Interfaces...
No Static Interface Methods
Interfaces are often seen as contracts or signatures for classes. As such,
interfaces can define instance methods that classes must implement. Obviously
instance variables cannot be defined in an interface, since the interface
does not define behaviour which could operate on such instance variables.
But people are often confused as to why interfaces can define static
variables and cannot define static methods. After all, what is the purpose
of a static variable if there is no static behaviour? And why can't I
use an interface to declare that all classes which implement this interface
will implement these static methods?
Well, the answer gets back to understanding how static variables and
methods work. They are bound at compile time against the class identified
either by class name or variable type. For variables this means that you
can actually use interface state:
char doneCharacterForMyText = java.text.CharacterIterator.DONE;
Interface variables are implicitly public, static and final, and are often
given in uppercase to simulate the C "#define" feature, but they are not
limited to this use. Note that if your class implements multiple interfaces
which define the same static variable, the compiler will report an error as
soon as you refer to that variable by its unqualified name -- this
is as close as you can get to the multiple-inheritance "Diamond of Death".
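A small sketch of the name clash, assuming two hypothetical interfaces Left and Right that both define LIMIT. The unqualified name is ambiguous inside the implementing class, but naming the interface resolves it, because the binding is to the named type at compile time.

```java
// Hypothetical interfaces; their fields are implicitly public static final.
interface Left  { int LIMIT = 10; }
interface Right { int LIMIT = 20; }

class Both implements Left, Right {
    // Writing plain LIMIT here would not compile: the reference is ambiguous.
    static int sum() {
        return Left.LIMIT + Right.LIMIT;  // qualified names pick the interface explicitly
    }
}

class ConstantDemo {
    public static void main(String[] args) {
        System.out.println(Both.sum()); // prints 30
    }
}
```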
For methods, what does it mean to define a static method in the interface?
Well, since static method calls are always bound to the type of the variable,
it means that any calls to these static methods from variables which are
defined as instances of the interface will try to invoke the interface's
static method -- and since by definition interfaces cannot have behaviour,
this would be a problem.
Some argue that this is an argument for static methods and variables
being dynamically bound, but that moves us into the next section...
Creeping Featurism
Some features, like Inner Classes, are conceptually useful and can be added
without altering the language paradigm. Template support, while
really syntactic sugar for automatic code generation, has proven so useful
to data management that, given the anaemic support in Java for different
container types, it is very likely on the list of future features being debated
inside the bowels of Javasoft. However, most of the additional language
features being proposed on various newsgroups and mailing lists can already
be solved within the language without adding the complexity of their syntactic
sugar.
Here is a rundown of some of the features I have heard cries for within
the last year:
-
First class methods
Justification:
Sometimes we need to tell an entity which method to call in response to
an asynchronous event. This is the so-called "callback" function pointer
prevalent in so many C and C++ APIs. Methods should be first class citizens
in Java.
Rebuttal:
Java is an OO language (unlike C++ which is an OO/Procedural mix). While
Java's reflection mechanisms do allow you to find a method object, this
method object is not bound to any particular instance. Since instance methods
are useless out of context of the object in which they reside, you really
need to pass the context of the method, that is the object. But method
pointers would allow me to pass different methods in to the same notifier,
you say? Well, here the adapter pattern (Gamma et al., p. 139) comes in
useful, mixed with the syntactic sugar of Inner classes.
Define a notification interface that the notifier understands and uses
to notify clients. Now the object that needs to be notified can implement
this interface and be passed to the notifier, or an Inner class can be
used to create an adapter to the object that needs to be notified. This
is the basis of Java's Observer/Observable system as well. In fact, now
we can provide more than one notification method simply, which otherwise
would require sending multiple callback function pointers.
An added benefit of registering objects through an observable pattern
is that "user data" does not have to be registered with the event source.
This context information can be saved with the observer object without
having to break encapsulation and expose the data to the observable, which
has no intrinsic interest in the data. As well, a single instance can create
and register multiple observer adapters for each context/event it is interested
in.
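Here is a minimal sketch of the notification interface plus Inner class adapter described above. All of the names (AlarmListener, AlarmSource, Display, the "tag" context) are hypothetical and chosen only for illustration; this is one way to arrange it, not the only one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical notification contract that the notifier understands.
interface AlarmListener {
    void alarmRaised(String source);
    void alarmCleared(String source);
}

// Hypothetical event source; it knows only about the interface.
class AlarmSource {
    private final List<AlarmListener> listeners = new ArrayList<>();

    void addListener(AlarmListener l) { listeners.add(l); }

    void raise(String source) {
        for (AlarmListener l : listeners) l.alarmRaised(source);
    }

    void clear(String source) {
        for (AlarmListener l : listeners) l.alarmCleared(source);
    }
}

// The object that wants to be notified; its own method has an unrelated name.
class Display {
    final List<String> log = new ArrayList<>();

    void show(String text) { log.add(text); }

    // Registers an inner-class adapter that carries its own context ("tag"),
    // so no "user data" has to be handed to the event source.
    void watch(AlarmSource source, final String tag) {
        source.addListener(new AlarmListener() {
            public void alarmRaised(String s)  { show(tag + ": raised by " + s); }
            public void alarmCleared(String s) { show(tag + ": cleared by " + s); }
        });
    }
}

class CallbackDemo {
    public static void main(String[] args) {
        AlarmSource source = new AlarmSource();
        Display display = new Display();
        display.watch(source, "door");
        source.raise("sensor-1");
        source.clear("sensor-1");
        System.out.println(display.log);
        // prints [door: raised by sensor-1, door: cleared by sensor-1]
    }
}
```

Note that the one adapter handles two notification methods, which would take two separate function pointers in a C-style callback API.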
-
Auto-delegation
Justification:
Since Java does not have multiple inheritance, the alternative approach
to coalescing the behaviour of two classes into one subclass is through
delegation. While Java provides interfaces which allow delegation contracting,
the hooking up of the interface implementors to the delegates' methods is
both trivial and laborious. Why not just add a flag to the Java source
code that indicates that this interface's methods should be delegated to
this internal instance variable?
Rebuttal:
This is syntactic sugar in the highest form. No functionality is
being added to the language, just automatic typing. As well, you would
then need to be able to define how to resolve interface method intersections
and how to override automatic delegation. Since the design decisions are
hidden from the designer, side effects of other changes to the object model
may not be transparent. Say you implement both interface A and B and delegate
them to a and b respectively. Now a change in the object model moves a
method from interface A to interface B. Even though your signature hasn't
changed, you now need to recompile your source code since the hidden delegation
decisions are no longer valid.
This is not to say that the automatic generation of code does not have
its place. IDEs use code generation to create GUIs; the BeanBox uses code
generation to glue components together using adapter classes. The difference
is that here the code generation is part of a tool and not part of the
language. An automatic interface delegate tool would be a useful addition
to an IDE toolset. The generated code would be Java source code and could
be intelligently managed within the scope of the application builder.
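For reference, the hand-written delegation that such a flag would replace looks roughly like this. Flyer, Swimmer, Wings, Fins and Duck are hypothetical names used only for this sketch.

```java
// Hypothetical interfaces and delegates, for illustration only.
interface Flyer   { String fly(); }
interface Swimmer { String swim(); }

class Wings implements Flyer {
    public String fly() { return "flapping"; }
}

class Fins implements Swimmer {
    public String swim() { return "paddling"; }
}

// Combines both behaviours without multiple inheritance.
class Duck implements Flyer, Swimmer {
    private final Flyer flyer = new Wings();
    private final Swimmer swimmer = new Fins();

    // Hand-written forwarding methods: trivial, but the decisions are visible.
    public String fly()  { return flyer.fly(); }
    public String swim() { return swimmer.swim(); }
}

class DelegationDemo {
    public static void main(String[] args) {
        Duck duck = new Duck();
        System.out.println(duck.fly() + ", " + duck.swim()); // prints "flapping, paddling"
    }
}
```

The forwarding methods are laborious to type, but each one records explicitly which delegate answers which message.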
-
Support for constant parameters
Justification:
When I send an object to another object as a parameter or return value,
I want to ensure that the other object does not change its state.
By specifying that the variable is of type "const", I can ensure that the
other object does not modify the object (unless, in C++, the method casts
the object to a non-constant variable).
Rebuttal:
If you want to protect the data from unauthorized changes, Java supplies
at least three other options that do not add the conceptual complexity
of const (which is not truly secure in C++ anyway):
-
Cloning. Here you clone the object before passing it to ensure that the
receiver has a copy that is decoupled from the object you are pointing
to. This is expensive but much safer than the C++ "const" feature.
Note, however, that by default cloning in Java involves a shallow copy.
Shallow copy means that only the object is copied, not any objects contained
within the object as instance variables. Since all variables
are handles to other instances or to base types, if the referenced objects
are not also cloned, the original instance and its shallow copy will both
contain handles to the same entities. So the original object and the cloned
object I passed you will share any contained objects, leaving the possibility
of some shared state. Implementing your own deep-copy (which recursively
clones all state to some specified level) is recommended if this is a serious
problem. A simple form of deep-copy is performed by Java's Serialization
facility.
-
Protected interface. Write an interface which does not modify your
object's state and implement the interface within your object. If
the object is passed as a type of this interface, the effect is equivalent
to the C++ "const" feature -- that is you have protection but it can be
cast aside.
-
Protection Proxy (Gamma et al., p. 207). Wrap up the object within
a protection proxy object which checks and disallows certain access on
the object. Here the whole interface is exported, but some methods
will return a "Disallowed" exception. This also allows for capabilities-based
dynamic authorization (i.e. user access can be more finely controlled).
This protection strategy is probably the most secure, but may be the most
work.
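The shallow-copy caveat in the cloning option can be seen in a short sketch. Order and its two copy methods are hypothetical names of my own; the deep copy here goes only one level down, which is enough because the contained Strings are immutable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class with one contained object, for illustration only.
class Order implements Cloneable {
    List<String> items = new ArrayList<>();

    // Default behaviour: Object.clone() copies the fields, so both orders share one list.
    Order shallowCopy() {
        try {
            return (Order) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Order is Cloneable
        }
    }

    // Hand-written deep copy: the contained list is copied as well.
    Order deepCopy() {
        Order copy = shallowCopy();
        copy.items = new ArrayList<>(items);
        return copy;
    }
}

class CloneDemo {
    public static void main(String[] args) {
        Order original = new Order();
        original.items.add("tea");

        Order shallow = original.shallowCopy();
        Order deep = original.deepCopy();
        original.items.add("milk");

        System.out.println(shallow.items.size()); // prints 2: the list is shared
        System.out.println(deep.items.size());    // prints 1: the list was copied
    }
}
```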
To add "const" to the language would not only add to the conceptual complexity,
it would also then require specifying which methods were safe to send to
a "const" version of the class (essentially a parallel class, much like
every class has a parallel array class) -- at least as much work as defining
a protection interface. This is particularly true because Java involves
much more passing of objects instead of raw data types (int, long) than
is often found in C++ programs.
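The protected interface and the protection proxy can be compared side by side in a rough sketch. ReadableAccount, Account, SimpleAccount and GuardedAccount are hypothetical names, and I have used UnsupportedOperationException to stand in for the "Disallowed" exception.

```java
// Hypothetical read-only view: the "protected interface".
interface ReadableAccount {
    int balance();
}

// Hypothetical full interface, including a state-changing method.
interface Account extends ReadableAccount {
    void deposit(int amount);
}

class SimpleAccount implements Account {
    private int balance;
    public int balance() { return balance; }
    public void deposit(int amount) { balance += amount; }
}

// Protection proxy: exports the whole interface but refuses mutation.
class GuardedAccount implements Account {
    private final Account target;

    GuardedAccount(Account target) { this.target = target; }

    public int balance() { return target.balance(); }

    public void deposit(int amount) {
        throw new UnsupportedOperationException("Disallowed");
    }
}

class ConstDemo {
    public static void main(String[] args) {
        SimpleAccount real = new SimpleAccount();
        real.deposit(10);

        // Protected interface: safe by convention, but the cast gets around it.
        ReadableAccount view = real;
        ((Account) view).deposit(5);
        System.out.println(real.balance()); // prints 15

        // Protection proxy: the receiver never holds the real object.
        Account guarded = new GuardedAccount(real);
        try {
            guarded.deposit(1);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // prints "Disallowed"
        }
        System.out.println(guarded.balance()); // prints 15
    }
}
```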
-
In/Out parameters
Justification:
This is the flip-side of the "const" issue. Since Java passes variables
using a copied reference handle, I can send an object to another object,
where it is modified. However if the receiving object wants to replace
the object, my reference still points to the old object. I want to
be able to pass my reference in directly, so that the receiver can modify
my reference directly, pointing me to a new object.
Rebuttal:
There are several ways to accomplish passing references without complicating
the calling semantics of Java with "const", "by-reference" or "by-reference-but-look-like-value".
The easiest way is to wrap up the passed object in a container object,
such as a one-element array of the object being passed. Then when
the receiver changes the object in the array's first slot, the new object
will be accessible to the sending object through the array.
As with many other issues on this list, this is an example of using
OO patterns and techniques to solve problems instead of making the language
more complex and incorporating the techniques into the language semantics.
Kind of like RISC for programming languages.
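The one-element array trick can be sketched in a few lines. The method name and the strings are hypothetical, chosen only to show the receiver swapping in a new object.

```java
class InOutDemo {
    // The receiver replaces the object in slot 0; the caller sees the new object.
    static void replaceDraft(String[] holder) {
        holder[0] = "final";
    }

    public static void main(String[] args) {
        String[] holder = { "draft" };  // wrap the object being passed
        replaceDraft(holder);
        System.out.println(holder[0]);  // prints "final"
    }
}
```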
-
Inlined getters and setters
Justification:
For performance reasons, I would like to have access to my instance variables
"inlined". That is, the byte code is copied from the receiver's class to
the sender's to reduce run-time method lookup and invocation time.
Rebuttal:
Inlining is difficult with polymorphic late-bound languages. You
have to know that the object receiving the message has not overridden your
inlined method, so the method had better be "final". If the method
is final, many Java compilers will allow inlining as an optimization option.
In essence, the inlining decision is removed from the language and moved
to the compiler.
Note however, that inlining may speed up your code at the cost of large
class files. A classic computing time-space trade-off. Large class
files may mean greater download times and even, if your memory is running
low, slower performance due to memory swapping. Optimizing your code is
rarely as simple as it first seems and often has side effects (such as
extensibility and support problems). There are many rules to optimization,
but I like:
-
Don't do it unless absolutely necessary
-
Don't do it yet
-
If you do optimize, make sure you're optimizing the system bottlenecks
(run your system through a call tracer)
-
Multi-valued return parameters.
Justification:
My method can take multiple parameters, so why can't I specify multiple return
values?
Rebuttal:
One of the key rules of Object Oriented technology is that each method
should do one and only one thing. This is a rule that should occasionally
be broken, particularly during distributed interface design. However, as
a fundamental rule, it is a good one, and has led (along with the C and
procedure history of many OO languages) to single-value return for Java
methods. It is certainly semantically simple.
Enough with excuses. Multi-value return is sometimes useful.
But if you are returning a collection of related objects, maybe you should
take a more OO approach and encapsulate the data into an object. If the
data is really unrelated and creating an encapsulation class doesn't make
sense, you can always use arrays, Vectors or some other generic collection
class.
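As a sketch of the encapsulation approach, here is a method that hands back two related values in one object. Range and rangeOf are hypothetical names used only for illustration.

```java
// Hypothetical result class that bundles two related values.
class Range {
    final int min;
    final int max;

    Range(int min, int max) {
        this.min = min;
        this.max = max;
    }
}

class RangeDemo {
    // Returns both extremes of a non-empty array in one object.
    static Range rangeOf(int[] values) {
        int min = values[0];
        int max = values[0];
        for (int v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new Range(min, max);
    }

    public static void main(String[] args) {
        Range r = rangeOf(new int[] { 4, -2, 9 });
        System.out.println(r.min + ".." + r.max); // prints "-2..9"
    }
}
```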
-
Dynamic "static"
Justification:
As I discussed in the previous section, static variables and methods are
bound to the class or interface at compile time. This means that you cannot
have calls to your static methods be overridden by subclass static methods
based on the type of object your variable is pointing to. Dynamic class
variables and methods are a nice clean feature and implemented in languages
such as Smalltalk.
Rebuttal:
The problem is that you would have to either add a new keyword and a new
type of class variable and method (try explaining to someone that we have
both static and dynamic class methods and variables and the rules for using
them with Interfaces) or break almost all code already in existence. The
cows are out of the barn, so closing the barn door doesn't help at this
point.
Satiric Java Feature Request List
I first posted this list to a mailing list in December of 1996 in response
to a thread advocating the addition of Lisp-style multiple return values
to Java methods. Additional contributions have been noted.
<satire>
Golly Gosh, now that we have spent time analyzing the syntactic sugar
needed to add multiple-return values to the language, I have some other
suggestions. I have programmed in Assembler, Machine (PDP-11), Fortran-77,
COBOL, BASIC and other powerful languages and would like to add some other
features to Java that will occasionally make it easier to think in my old
ways.
Here are some features that Java needs now to become a serious programming
language:
-
Goto statement. There's the code I want to do in that other class, so why
can't I just:
goto OtherClass.method()::line;
-
Push and Pop. Heck, in-line assembly code should be allowed. I know where
those values I want are, let me get at 'em.
-
Turn off garbage collection and add free()/delete(). I can do a better
job than any compiler/VM.
-
Pointers. Please, I finally understand these. Give 'em back.
-
Compile-time platform optimization, header files, #define, #pragma.
-
More basic type modifiers. Microsoft has the right idea:
unsigned long FAR PASCAL *data.
-
Friend classes, methods, variables, Vector components. I would like to
set up a Vector and say that only some of its members are available to
other classes.
i.e. Vector data = new Vector(3, 1::private, 2::friend, 3::public);
-
First class methods. Who needs OO after all?
-
More passing mechanisms. Add Pass-by-reference, pass-by-value, pass-by-reference-but-look-like-a-value
(C++ reference).
-
Multiple inheritance. Never used it, but someday I may...
-
Enforced Hungarian notation, which allows us to avoid declaring variables,
something I liked in Fortran. Instead of:
UserInfo user = new UserInfo();
we could have:
cUserInfo_data = new UserInfo();
Mark Wutka (wutka@netcom.com) suggested (http://MetaDigest.XCF.Berkeley.EDU/archive/advanced-java/9611/0834.html)
an enhancement to this for either:
-
Implicit typing. The first time a cUserInfo_data variable is used, the
default constructor is automatically called.
-
"Extended Hungarian notation" that could automatically define the class
within the variable name.
-
More fluff words so that the code is more readable, ala COBOL:
add iNative_first to iNative_second giving iNative_result;
Hmmm, maybe I should just forward the last two to the C++ committee; they're
more responsive and understanding anyway.
</satire>
Conclusion
One of the strengths of great languages like "C", "Lisp", or "Smalltalk"
is their conceptual simplicity. In C, you have pointers and functions and
if you understand these, the rest is straightforward. In fact, a general
programming language can solve any problem; the differences really lie
in the conceptual paradigm they support and the syntactic sugar provided
to make programming less error-prone. Often there is a trade-off between
power and maintainability that must be made by the language designers.
For some domains, such as OS programming, low-level power is imperative
but for the vast majority of domains long-term maintainability, reuse and
extensibility are much more important. As faster processors and better
compilers have become available we have seen the progression of dominant
paradigms migrate from Assembly to Functional to Object-Oriented simply
because the dominant problem has changed from performance to manageability
as the complexity of our systems has increased.
In an effort to keep the language as simple and clean as possible, the
Java language designers purposely chose to provide a simple programming
paradigm. Some exceptions were made, notably the inclusion of basic data
types which are not objects. But overall the language is simple, elegant
and clean.
Programmers used to solving problems using the syntactic features of
other languages often pine for the adoption of those features in Java.
What they miss is that there is a conceptual cost to adding those features,
both in complexity and in paradigm. This is particularly difficult for
C++ programmers since the syntax of Java was purposely modelled on the
syntax of C++, and this often leads new Java programmers to bring their
C++ mindset with them to Java.
To write well-architected Java programs a designer must have a good
understanding of the language paradigm, remember to think of patterns and
not techniques and remember to ask herself "How do I solve this problem?"
rather than "How do I apply this technique native to my old language?"