Internet Security Professional Reference: Java Security

Java’s Interpreted Features

When Java code is compiled, the compiler outputs what is known as Java bytecode. This bytecode is an executable for a specific machine, the JVM, which just happens not to exist, at least in silicon. The bytecode is then run by an interpreter on the actual hardware, which translates each instruction into operations for the target machine and executes them. Because the code is compiled for the virtual machine, it runs on any computer to which the interpreter has been ported, and this solves many of the portability issues. Interpreters, however, have never had a reputation as the performance thoroughbreds needed to survive in today’s marketplace. Java had to overcome a large obstacle to make an interpreted architecture endure.

The solution was to compile to an intermediate stage where the file was still portable across different platforms, but close enough to machine code that interpretation would not produce excessive overhead. In addition, by taking advantage of advanced operating system features such as multithreading, much of the interpreter overhead could be pushed into background processes.

The advantage of compiling to bytecodes is that the resulting executable is machine neutral, but close enough to native code that it runs efficiently on any hardware. Imagine the Java interpreter as tricking the Java bytecode file into thinking that it is running on a JVM. In reality, the machine could be a Sun SPARCstation 20 running Solaris, an Apple/IBM/Motorola PowerPC running Windows NT, or an Intel Pentium running Windows 95, any of which could be sending Java applets to, or receiving code from, any other kind of computer over the Internet.
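The idea of compiling for a machine that exists only in software can be illustrated with a toy stack machine. This is a minimal sketch, not the real JVM: the class name `ToyVm` and the opcodes `PUSH`, `ADD`, `MUL`, and `HALT` are invented for illustration and do not correspond to actual Java bytecode instructions.

```java
// Toy stack machine: a hypothetical illustration, NOT the real JVM instruction set.
class ToyVm {
    // Opcodes invented for this sketch.
    static final int PUSH = 0; // push the int that follows in the program
    static final int ADD  = 1; // pop two values, push their sum
    static final int MUL  = 2; // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    // Interprets a machine-neutral program on whatever hardware runs this class.
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4 for the toy machine.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

The same `program` array would produce the same result on any platform that has a port of `run`, which is the portability argument in miniature.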

Java’s Dynamic Loading Features

Connecting to the Internet makes thousands of computers and programs available to a user. Java is a dynamically extensible system that can incorporate files from the computer’s hard drive, from a computer on the local area network, or from a system across the continent over the Internet. Object-oriented programming and encapsulation mean that a program can bring in the classes it needs dynamically as it runs. As mentioned previously, however, inheritance in C++ (multiple inheritance included) can create a situation in which subclasses must be recompiled whenever a method or variable of their superclass changes.

This recompiling problem arises from the fact that C++ compilers reduce references to class members to numeric values and pre-compute the storage layout of the class. When a member variable or function of a superclass changes, the numeric references and the storage allocation for the class change with it. The only way to keep subclasses capable of calling the methods of the superclass is to recompile them. Recompilation is a passable solution if you are a developer distributing your program wrapped as a single executable. This, however, defeats the idea of object-oriented programming. If you are dynamically linking classes into your code at runtime (classes that may reside on any computer on the Internet), it becomes impossible to ensure that those classes will not change. When they do, your program may no longer function.

Java solves the memory layout problem by deferring symbolic reference resolution to the interpreter at runtime. Rather than creating numeric values for references, the compiler delivers symbolic references to the interpreter. Determining the memory layout of a class is likewise left until runtime. When the interpreter receives a program, it resolves the symbolic references and determines the storage scheme for the class. The performance hit is that the first time a new name is referenced, the interpreter must perform a lookup at runtime, whether or not the object is clearly defined. With the C++ style of compilation, the executable has no lookup overhead and can run the code at full speed if the object is defined; it needs to resort to runtime lookup only when there is an ambiguity, in such cases as polymorphism. Java, however, needs to perform this resolution only once for each reference. The interpreter then reduces the symbolic reference to a numeric one, allowing the system to run at near native code speed.
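The resolve-once behavior described above can be sketched as a small cache that maps a symbolic name to a numeric slot on first use. This is an illustrative model only: `LazyResolver`, `slotOf`, and the `lookups` counter are hypothetical names, and a real interpreter rewrites its own internal tables instead of consulting a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of runtime symbolic reference resolution (not JVM internals).
class LazyResolver {
    // Storage layout, decided when the class is "loaded", not when callers are compiled.
    private final Map<String, Integer> layout = new HashMap<>();
    // References that have already been reduced from a name to a numeric slot.
    private final Map<String, Integer> resolved = new HashMap<>();
    // Counts how many full name lookups were performed.
    int lookups = 0;

    LazyResolver(String... memberNames) {
        for (int i = 0; i < memberNames.length; i++) {
            layout.put(memberNames[i], i);
        }
    }

    // Returns the numeric slot for a symbolic name, resolving it only on first use.
    int slotOf(String symbolicName) {
        Integer slot = resolved.get(symbolicName);
        if (slot == null) {
            lookups++; // the one-time cost of symbolic resolution
            slot = layout.get(symbolicName);
            if (slot == null) {
                throw new IllegalArgumentException("no such member: " + symbolicName);
            }
            resolved.put(symbolicName, slot);
        }
        return slot;
    }

    public static void main(String[] args) {
        LazyResolver oldClass = new LazyResolver("x", "y");
        // The class owner later adds a member in front; the layout shifts.
        LazyResolver newClass = new LazyResolver("id", "x", "y");

        // The caller still refers to "y" by name, so it needs no recompilation.
        System.out.println(oldClass.slotOf("y")); // prints 1
        System.out.println(newClass.slotOf("y")); // prints 2

        newClass.slotOf("y"); // second use hits the cache
        System.out.println(newClass.lookups);     // prints 1
    }
}
```

A C++-style compiler would have baked the slot `1` into the caller, which is why the caller would break when the layout shifted to `2`.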

The benefit of runtime reference resolution is that it enables updated classes to be used without the concern that they will affect your code. If you are linking in a class from a different system, the owner of the original class can freely update the old class without the worry of crashing every Java program that referred to it. The designers of Java knew this was a fundamental requirement if the language was to survive in a distributed systems environment.
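Loading a class by name at runtime can be sketched with the standard `Class.forName` call. This is a minimal example under stated assumptions: it loads a class that ships with the Java library in place of one fetched over a network, and the `DynamicLoadDemo` class and `newListByName` helper are hypothetical names introduced for illustration.

```java
import java.util.List;

// Sketch of dynamic loading: the concrete class is chosen by name while the program runs.
class DynamicLoadDemo {
    // Loads the named class and creates an instance through its no-argument constructor.
    @SuppressWarnings("unchecked")
    static List<String> newListByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (List<String>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and instantiation failures.
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name could come from configuration or the command line.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newListByName(name);
        list.add("loaded at runtime");
        System.out.println(list.getClass().getName() + " -> " + list);
    }
}
```

The calling code is compiled only against the `List` interface, so the implementation named at runtime can be replaced or updated without recompiling the caller.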

This capability to change the classes that make up a program so freely introduces a problem that is not covered by many of the security features mentioned so far, which deal with programs directly accessing file or memory space. The problem, in which a known good class is replaced by a faulty or deliberately malicious class, is a new and difficult one that arises with distributed systems.

In traditional software architectures, all of the code resides on a single disk and remains static until the user of the software changes it manually. In this scenario, the user knows when a change is made, and can test that a new piece of software provides the same level of security and error-free computation before putting it into day-to-day use. If classes are dynamically loaded from across the web each time a program is run, it would be impossible to tell when any single dependent class had been updated. This problem is also discussed in the next section on the execution of class files.
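The text does not prescribe a remedy at this point. Purely as an illustration of what noticing a changed class could involve, the sketch below compares the SHA-256 digest of a class's bytes against a digest recorded when the class was last tested. The `ClassDigestCheck` class, its methods, and the placeholder byte strings are assumptions made for this example and are not part of any mechanism the text describes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: detect that a downloaded class differs from the tested version.
class ClassDigestCheck {
    // Computes the SHA-256 digest of the given bytes.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns true when the bytes still match the digest recorded at test time.
    static boolean unchanged(byte[] classBytes, byte[] trustedDigest) {
        return MessageDigest.isEqual(sha256(classBytes), trustedDigest);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for real class file contents.
        byte[] tested = "class bytes, version 1".getBytes(StandardCharsets.UTF_8);
        byte[] fetched = "class bytes, version 2".getBytes(StandardCharsets.UTF_8);

        byte[] trustedDigest = sha256(tested); // recorded when the class was tested

        System.out.println(unchanged(tested, trustedDigest));  // prints true
        System.out.println(unchanged(fetched, trustedDigest)); // prints false
    }
}
```

A check like this only reports that the bytes differ; deciding whether the new version is safe still requires the kind of testing described above.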