Monday, July 30, 2012

Inside Class Loaders

This series of articles started when I wanted to write a weblog entry about the impact of class loaders in a J2EE server. But the entry grew, because a few basic rules can still produce a complex system, just as in physics, where a few basic components and forces build up something like our universe with all of its stars, black holes, pulsars, galaxies, and planets.

In this part, I want to lay the groundwork on which we can start a discussion about dynamic and modular software systems. Class loaders may seem to be a dry topic, but I think it is one of the topics that separate the junior from the senior software engineer, so bear with me for an exciting journey into the darker corners of Java.

Now you may ask yourself, "Why should I deal with multiple class loaders and their limitations and problems?" The short answer is that you probably have to, one way or the other. Even when you write a simple servlet or JSP program and deploy it within a servlet container, your code is loaded by its very own class loader, which prevents you from accessing other web applications' classes. In addition, many "container-type" applications such as J2EE servers, web containers, NetBeans, and others use custom class loaders in order to limit the impact of the classes provided by a component, and this affects the developers of such components.

As we will see later, even with dynamic class loading, a particular class loader can load a given class type only once. Additional class loaders enable a developer to partition the JVM so that the reduced visibility of a class makes it possible to have multiple, different definitions of the same class type loaded.

Class loaders work like the central bank of each country, which issues its own currency. The border of each country defines where the currency is valid and usable, and that makes it possible to have multiple currencies in the world.

First we need to explain some definitions:

CL: Class loader.
Initial CL: The CL that initiated the loading of the class.
Effective CL: The CL that actually loaded the class.
Class type: The fully qualified class name (package plus class name).
Class: A combination of the class type and effective class loader.
java.lang.Class: A class in the JDK that represents a class (name, fields, methods, etc.).
Symbolic Link: A class type used within the source code, such as superclasses, extended interfaces, variables, parameters, return values, instanceofs, and upcasts.
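The difference between the Initial CL and the Effective CL can be observed directly with getClassLoader(). The following is a minimal sketch, assuming Java 8 or later; the class name InitialVsEffective is hypothetical. It asks a do-nothing custom loader for two classes and prints which loader actually defined each one (the bootstrap loader is reported as null).

```java
// Minimal sketch: the loader we ask (Initial CL) versus the loader that
// actually defines the class (Effective CL). InitialVsEffective is a
// hypothetical name used only for illustration.
class InitialVsEffective {

    // A custom loader with no behavior of its own: the inherited
    // loadClass() delegates every request to the parent first.
    static ClassLoader newInitialLoader() {
        return new ClassLoader(InitialVsEffective.class.getClassLoader()) { };
    }

    // Loads 'name' through 'initial' and returns the resulting class.
    static Class<?> loadThrough(ClassLoader initial, String name) {
        try {
            return initial.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader initial = newInitialLoader();

        // Initiated by our loader, defined by the bootstrap loader,
        // which getClassLoader() reports as null.
        Class<?> list = loadThrough(initial, "java.util.ArrayList");
        System.out.println("ArrayList: effective CL = " + list.getClassLoader());

        // Initiated by our loader, defined by the loader of this class.
        Class<?> self = loadThrough(initial, "InitialVsEffective");
        System.out.println("InitialVsEffective: effective CL = " + self.getClassLoader());
        System.out.println("Initial CL = " + initial);
    }
}
```

In both cases the custom loader only initiates the load; it never becomes the Effective CL because it has nothing to define itself.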

Class loaders and their usage follow a few simple rules:

  • Class loaders are hierarchically organized, where each one has a parent class loader, except the bootstrap class loader (the root).
  • Class loaders should (practically: must) delegate the loading of a class to the parent, but a custom class loader can define for itself when it should do so.
  • A class is defined by its class type and the effective class loader.
  • A class is only loaded once and then cached in the class loader to ensure that the byte code cannot change.
  • Any symbolic links are loaded by the effective class loader (or one of its ancestors), if this is not already done. The JVM can defer this resolution until the class is actually used.
  • An upcast of an instance to another class fails when the class of the instance and the class of the symbolic link do not match (meaning their class loaders do not match).

Now I want to put some meat on these bare-bones rules to make them easier to understand.

Class Loader Organization and Delegation

Before we start, let's look at a typical class loader hierarchy, as illustrated by Figure 1:

Figure 1. Class loader hierarchy example

As shown in Figure 1, the bootstrap class loader (BS) loads the classes from the JVM, as well as extensions to the JDK. The system class loader (CP) loads all of the classes provided by the CLASSPATH environment variable or passed using the -classpath argument to the java command. Finally we have several additional class loaders, where A1-3 are children of the CP, and B1-2 are children of A3. Every class loader (except BS) has a parent class loader, even if no parent is provided explicitly; in the latter case, the CP is automatically set as the parent.
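The hierarchy in Figure 1 can be inspected at runtime through ClassLoader.getParent(). Below is a minimal sketch (the class name LoaderChain is hypothetical, Java 8 or later) that prints the chain starting at the system class loader and checks that a loader created without an explicit parent gets the system class loader as its parent. On JDK 9 and later the chain typically shows an application loader and a platform loader rather than the extension loader of older JDKs, so the printed names will not match Figure 1 exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch that walks a class loader hierarchy up to its root.
class LoaderChain {

    // Names every loader from 'start' up to the root. The bootstrap loader
    // has no Java object (getParent() returns null), so it is added by hand.
    static List<String> describeChain(ClassLoader start) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap");
        return chain;
    }

    // A loader created without an explicit parent gets the system class loader.
    static ClassLoader defaultParentOfCustomLoader() {
        return new ClassLoader() { }.getParent();
    }

    public static void main(String[] args) {
        describeChain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        System.out.println("Parent of a new loader is the system CL: "
                + (defaultParentOfCustomLoader() == ClassLoader.getSystemClassLoader()));
    }
}
```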

The parent relationship alone does not mean much, but it has a big impact on class-loading delegation. The Javadoc of java.lang.ClassLoader specifies that any class loader must first delegate the loading of a class to its parent, and only if this fails does it try to load the class itself. What matters, though, is not which class loader obtains the bytes of the class, but which one calls defineClass(). In this final method, an instance of class java.lang.Class is created and cached in the class loader so that the byte code of a class cannot change on a subsequent request to load the class. This method also checks that the given class name matches the class name in the byte code. Because this method is final, no custom class loader can change this behavior.
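The "delegate first" rule can be spelled out as code. The sketch below uses a hypothetical ParentFirstLoader and overrides loadClass() only to make the order of steps visible. It follows the same order as the inherited java.lang.ClassLoader.loadClass() (cache, parent, then findClass()), so a real loader would normally override only findClass(). It assumes Java 7 or later for getClassLoadingLock().

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "delegate first" order, spelled out step by step.
// ParentFirstLoader is a hypothetical name used only for illustration.
class ParentFirstLoader extends ClassLoader {

    // Names this loader had to look up itself after the parent failed.
    final List<String> triedLocally = new ArrayList<>();

    ParentFirstLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Loaded before by this loader? Reuse the cached class.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Delegate to the parent; a null parent means bootstrap.
                    ClassLoader parent = getParent();
                    c = (parent != null) ? parent.loadClass(name)
                                         : Class.forName(name, false, null);
                } catch (ClassNotFoundException e) {
                    // 3. Only now look for the class ourselves.
                    triedLocally.add(name);
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // No class source in this sketch; a real loader would read the
        // bytes here and hand them to the final defineClass() method.
        throw new ClassNotFoundException(name);
    }

    // Convenience wrapper that reports "not found" as null.
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ParentFirstLoader loader = new ParentFirstLoader(ParentFirstLoader.class.getClassLoader());
        System.out.println(tryLoad(loader, "java.lang.String")); // found by an ancestor
        System.out.println(tryLoad(loader, "no.such.Clazz"));    // not found anywhere
        System.out.println("Looked up locally: " + loader.triedLocally);
    }
}
```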

As previously mentioned, a class loader must delegate the loading of a class (although a developer can override loadClass() and change this behavior). On one hand, if the loading of system classes were not delegated, an application could provide malicious code for JDK classes and introduce a ton of problems. On the other hand, all classes at least extend java.lang.Object, and this class must be resolved, too. Without delegation, the custom class loader would have to load java.lang.Object by itself, and if it cannot, the load fails with a linkage error. These two facts imply that a custom class loader has to delegate class loading. In JDK 1.4, two of the three versions of defineClass() throw a SecurityException if the given class name starts with "java", while the third version is deprecated due to these security concerns.
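The protection of the java namespace can also be observed on a current JDK. In this sketch (ProhibitedNameDemo is a hypothetical name), a custom loader passes deliberately invalid bytes to the defineClass() variant that takes a class name. For a name under java. the call is rejected with a SecurityException before the bytes are examined, while any other name gets past the name check and only fails because the bytes are not a valid class file. This reflects the behavior of recent JDKs; details in JDK 1.4 may differ from what the sketch shows.

```java
// Sketch: defineClass() rejects names in the "java." namespace before it
// looks at the bytes. ProhibitedNameDemo is a hypothetical name.
class ProhibitedNameDemo extends ClassLoader {

    ProhibitedNameDemo() {
        super(ProhibitedNameDemo.class.getClassLoader());
    }

    // Tries to define a class called 'name' from deliberately invalid bytes
    // (four zero bytes, so not even the class-file magic number is right)
    // and reports which check stopped it.
    String tryDefine(String name) {
        byte[] invalid = new byte[4];
        try {
            defineClass(name, invalid, 0, invalid.length);
            return "defined";
        } catch (SecurityException e) {
            return "SecurityException: " + e.getMessage();
        } catch (LinkageError e) {
            // Name accepted, but the bytes are not a valid class file.
            return "LinkageError: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ProhibitedNameDemo loader = new ProhibitedNameDemo();
        System.out.println(loader.tryDefine("java.lang.Evil"));
        System.out.println(loader.tryDefine("com.example.Evil"));
    }
}
```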

I want to stress the fact here that there is a difference between the class loader that starts the process of loading the class and the one that actually loads (defines) the class. Assuming that in our example no class loader delegates the loading of a class to one of its children, any class is either loaded by the Initial CL or by one of its ancestors. Let us assume that a class A contains a symbolic link to class B that in turn contains a symbolic link to class C. The class loader of C can never be a child of the class loader of B or of A. Of course, one should never say "never," and yes, it is possible to break this rule, but like multiple inheritance in C++, this is "black belt" programming.

A more prominent exception to the JDK's "delegate first" model is the class loader for a web container described in the servlet specification. It tries to load a class by itself first, before delegating to the parent. Nevertheless, some classes, such as java.*, javax.*, org.xml.sax.* and others, are still delegated to the parent first. For more information, please check out the Tomcat 5.0 documentation.
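A web-container-style loader inverts the order for most classes. The following is a minimal sketch, not Tomcat's implementation. The hypothetical ChildFirstLoader takes its class bytes from an in-memory map, tries those first, and still delegates names starting with java., javax. and org.xml.sax. to the parent first (a shortened version of the list above). It assumes Java 9 or later (InputStream.readAllBytes()) and that compiled class files can be read back as resources from the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Sketch of a child-first ("web container style") loader. Class bytes come
// from an in-memory map (an assumption that keeps the sketch self-contained);
// names under PARENT_FIRST are still delegated to the parent first.
class ChildFirstLoader extends ClassLoader {
    private static final String[] PARENT_FIRST = { "java.", "javax.", "org.xml.sax." };
    private final Map<String, byte[]> classBytes;

    ChildFirstLoader(ClassLoader parent, Map<String, byte[]> classBytes) {
        super(parent);
        this.classBytes = classBytes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name) && classBytes.containsKey(name)) {
                c = findClass(name);              // 1. own copy first
            }
            if (c == null) {
                c = super.loadClass(name, false); // 2. standard parent-first path
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private static boolean isParentFirst(String name) {
        for (String prefix : PARENT_FIRST) {
            if (name.startsWith(prefix)) return true;
        }
        return false;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = classBytes.get(name);
        if (b == null) throw new ClassNotFoundException(name);
        return defineClass(name, b, 0, b.length);
    }

    // Reads the compiled bytes of a class that is visible on the classpath.
    static byte[] bytesOf(Class<?> c) {
        String resource = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("no class file for " + resource);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Class<?> load(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> bytes = new HashMap<>();
        bytes.put("Widget", bytesOf(Widget.class));
        ChildFirstLoader loader = new ChildFirstLoader(Widget.class.getClassLoader(), bytes);

        Class<?> own = load(loader, "Widget");
        System.out.println("Widget defined by child: " + (own.getClassLoader() == loader));
        System.out.println("Same as parent's Widget: " + (own == Widget.class));
        System.out.println("String from parent chain: " + (load(loader, "java.lang.String") == String.class));
    }
}

// Hypothetical application class used to exercise the loader above.
class Widget { }
```

Even though the parent can see Widget, the child defines its own copy, which is exactly the situation that makes the web container loader an exception to the usual delegation order.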

Class Linking

After a class is defined with defineClass(), it must be linked before it can be used; a class loader can request this through the final resolveClass() method. Between this method call and the first usage of a symbolic link, the class type is loaded by the class loader of the containing class, which acts as the Initial CL. If any linked class (type) cannot be loaded, a linkage error (java.lang.NoClassDefFoundError) is thrown. Keep in mind that the resolution of symbolic links is up to the JVM and can be done anywhere between the loading of the containing class (eager resolution, or C-style) and the first actual usage of the symbolic link (lazy resolution). If a class contains a symbolic link that is never used, the linked class may never be loaded at all, as in this example with JDK 1.4.2 on Windows 2000:

public class M {
    // In JDK 1.4.2 on W2K this class can be used
    // fine even if class O is not available.
    public O mMyInstanceOfO;
}

whereas this class will fail with a linkage error if the class O cannot be loaded:

public class M {
    // In JDK 1.4.2 and W2K the creation of an
    // instance of M will FAIL with
    // a NoClassDefFoundError if class O is not
    // available
    public O mMyInstanceOfO = new O();
}

and to make matters a little bit more complicated, it only fails when an instance is created:

    // Fine because in JDK 1.4.2 on W2K class
    // linking is done lazily
    Class lClassM = Class.forName("M");
    // Fails with NoClassDefFoundError
    Object lObject = lClassM.newInstance();
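This behavior can be reproduced without deleting files by defining M in an isolated loader that cannot see O. In the sketch below, the hypothetical LazyLinkDemo loader has the bootstrap loader as its parent and only knows the bytes of M, so O cannot be resolved from M's Effective CL. On a JVM that resolves lazily, as HotSpot does, loading M succeeds and creating an instance fails with NoClassDefFoundError. Since the point of resolution is up to the JVM, the code also accepts the error surfacing when M is linked. It assumes Java 9 or later and class files that can be read back as resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

// Sketch: defines M from its class file in an isolated loader whose parent
// is the bootstrap loader, so O cannot be found, much as if O.class had been
// deleted. LazyLinkDemo is a hypothetical name.
class LazyLinkDemo extends ClassLoader {
    private final byte[] mBytes;

    LazyLinkDemo(byte[] mBytes) {
        super(null);                 // parent = bootstrap; classpath not visible
        this.mBytes = mBytes;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (name.equals("M")) {
            return defineClass(name, mBytes, 0, mBytes.length);
        }
        throw new ClassNotFoundException(name); // O is missing on purpose
    }

    static byte[] bytesOf(Class<?> c) {
        String resource = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("no class file for " + resource);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports the outcome of loading M and of creating an instance of it.
    static String run() {
        LazyLinkDemo loader = new LazyLinkDemo(bytesOf(M.class));
        Class<?> m;
        try {
            m = loader.loadClass("M");
        } catch (ClassNotFoundException | LinkageError e) {
            return "load failed: " + e;
        }
        try {
            Constructor<?> ctor = m.getDeclaredConstructor();
            ctor.setAccessible(true);            // M is package-private
            ctor.newInstance();
            return "loaded; instance created";
        } catch (InvocationTargetException e) {
            // The constructor ran and hit the unresolvable reference to O.
            return "loaded; instance failed: " + e.getCause().getClass().getSimpleName();
        } catch (LinkageError e) {
            // If linking M already needs O, the error surfaces here instead.
            return "loaded; instance failed: " + e.getClass().getSimpleName();
        } catch (ReflectiveOperationException e) {
            return "reflection failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Stand-ins for the article's classes: M has a symbolic link to O that is
// used in a field initializer.
class O { }

class M {
    public O mMyInstanceOfO = new O();
}
```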

For more information, please read Chapter 12: "Execution" in the Java Language Specification.

Class Definition

To a beginner, a class is identified solely by its class type. As soon as you start to deal with class loaders, this is no longer the case. Provided that class type M is not available to CP, A1 and A2 could load the same class type M with different byte code. Even if the byte code is identical, the JVM treats these as two different classes. To avoid ambiguities, a class is identified by its class type as well as its Effective CL, and I will use the notation <Class Name>-<Class Loader>. So for this case, we have classes M-A1 and M-A2. Imagine we also have another class, Test-A1, with a method upcastM() that looks like this:

public void upcastM(Object pInstance)
        throws Exception {
    M lM = (M) pInstance;
}

Because the class Test is loaded by A1, its symbolic link M is also loaded by A1. So we are going to upcast a given object to M-A1. When this method is called with an instance of the class M-A1 as an argument, it will return successfully, but if it is called with an instance of M-A2, it will throw a ClassCastException because, according to the JVM, it is not the same class. This rule is enforced even with reflection, because both java.lang.Class.newInstance() and java.lang.reflect.Constructor.newInstance() return a reference of type java.lang.Object-BS. Unless only reflection is used during the lifetime of this object, the instance has to be upcast at some point. And even if only reflection is used in order to avoid conflicts, the arguments of a method are still subject to an upcast to the classes in the method signature, so the classes must match; otherwise you get a java.lang.IllegalArgumentException, the reflective counterpart of the ClassCastException.
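The identity rule can be demonstrated by defining the same bytes twice. In this sketch (Java 9 or later; CastTest, SiblingLoader and ClassIdentityDemo are hypothetical names, with CastTest standing in for Test), two sibling loaders with the bootstrap loader as parent play the roles of A1 and A2. The run() method reports whether M-A1 and M-A2 are the same Class object, then the outcome of calling upcastM() on CastTest-A1 with an M-A1 instance and with an M-A2 instance, and finally the outcome of reflectively calling a method declared with an M parameter with an M-A2 instance.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;

class ClassIdentityDemo {
    static byte[] bytesOf(Class<?> c) {
        String resource = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("no class file for " + resource);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object newInstance(Class<?> c) throws ReflectiveOperationException {
        Constructor<?> ctor = c.getDeclaredConstructor();
        ctor.setAccessible(true);                 // the classes are package-private
        return ctor.newInstance();
    }

    // Invokes 'method' on 'target' and names the exception, if any.
    static String outcome(Method method, Object target, Object arg) throws IllegalAccessException {
        try {
            method.invoke(target, arg);
            return "OK";
        } catch (InvocationTargetException e) {
            return e.getCause().getClass().getSimpleName();
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        }
    }

    // Returns { M-A1 == M-A2, upcastM(M-A1), upcastM(M-A2), takeM(M-A2) }.
    static String[] run() {
        try {
            Map<String, byte[]> bytes = Map.of(
                    "M", bytesOf(M.class),
                    "CastTest", bytesOf(CastTest.class));
            ClassLoader a1 = new SiblingLoader(bytes);
            ClassLoader a2 = new SiblingLoader(bytes);
            Class<?> mA1 = a1.loadClass("M");
            Class<?> mA2 = a2.loadClass("M");
            Object test = newInstance(a1.loadClass("CastTest")); // CastTest-A1
            Method upcastM = test.getClass().getMethod("upcastM", Object.class);
            Method takeM = test.getClass().getMethod("takeM", mA1);
            upcastM.setAccessible(true);
            takeM.setAccessible(true);
            return new String[] {
                String.valueOf(mA1 == mA2),
                outcome(upcastM, test, newInstance(mA1)),
                outcome(upcastM, test, newInstance(mA2)),
                outcome(takeM, test, newInstance(mA2)),
            };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}

// Sibling loaders defining classes from the same bytes, with the bootstrap
// loader as parent so the classpath copies are not visible.
class SiblingLoader extends ClassLoader {
    private final Map<String, byte[]> bytes;

    SiblingLoader(Map<String, byte[]> bytes) {
        super(null);
        this.bytes = bytes;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = bytes.get(name);
        if (b == null) throw new ClassNotFoundException(name);
        return defineClass(name, b, 0, b.length);
    }
}

// Stand-ins for the article's M and Test.
class M { }

class CastTest {
    public void upcastM(Object pInstance) {
        M lM = (M) pInstance;   // checked against M as seen by CastTest's loader
    }

    public void takeM(M pM) { } // parameter type is M as seen by CastTest's loader
}
```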

Test

The sample code may help the reader to better understand the concepts described above and, later, to do their own investigations. In order to run the sample code, just extract it in the directory of your choice and execute the ant build script in the classloader.part1.basics directory.

It has three directories: main, version_a, and version_b. The main directory contains the startup class Main.java as well as the custom class loader that loads classes from a given directory. The other two directories each contain one version of M.java and Test.java. The class Main first creates two custom class loaders, each loading classes (after delegating to the parent class loader) from either the version_a or the version_b directory. Then it loads the class M through each of these two class loaders and creates an instance through reflection:

// Create two class loaders: one for each dir.
ClassLoader lClassLoader_A =
    new MyClassLoader( "./build/classes/version_a" );
ClassLoader lClassLoader_B =
    new MyClassLoader( "./build/classes/version_b" );
// Load Class M from first CL and
// create instance
Object lInstance_M_A =
    createInstance( lClassLoader_A, "M" );
// Load Class M from second CL and
// create instance
Object lInstance_M_B =
    createInstance( lClassLoader_B, "M" );
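The article does not reproduce the sample's MyClassLoader or createInstance(). The following is a hypothetical sketch of what a directory-based loader and such a helper might look like (Java 9 or later); it is not the sample's actual code. To keep the sketch self-contained, demo() copies the compiled bytes of a placeholder class Sample into a temporary directory and loads it with the bootstrap loader as parent. This mirrors the fact that version_a and version_b are not on the CLASSPATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of a directory-based loader in the spirit of the
// sample's MyClassLoader (whose real source is not shown here).
class DirectoryClassLoader extends ClassLoader {
    private final Path root;

    DirectoryClassLoader(String dir) {
        // Parent = system CL, as with ClassLoader's no-argument constructor.
        this(dir, ClassLoader.getSystemClassLoader());
    }

    DirectoryClassLoader(String dir, ClassLoader parent) {
        super(parent);
        this.root = Paths.get(dir);
    }

    // Called by the inherited loadClass() only after the parent failed.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] b = Files.readAllBytes(file);
            return defineClass(name, b, 0, b.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    // Hypothetical counterpart of the sample's createInstance().
    static Object createInstance(ClassLoader cl, String name) throws ReflectiveOperationException {
        Constructor<?> ctor = cl.loadClass(name).getDeclaredConstructor();
        ctor.setAccessible(true); // the class may not be public
        return ctor.newInstance();
    }

    static byte[] bytesOf(Class<?> c) {
        String resource = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("no class file for " + resource);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies Sample.class into a fresh directory and loads it from there,
    // with the bootstrap loader as parent so the classpath copy stays hidden.
    static Object demo() {
        try {
            Path dir = Files.createTempDirectory("classes");
            Files.write(dir.resolve("Sample.class"), bytesOf(Sample.class));
            return createInstance(new DirectoryClassLoader(dir.toString(), null), "Sample");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object o = demo();
        System.out.println(o.getClass().getName() + " loaded by " + o.getClass().getClassLoader());
    }
}

// Hypothetical placeholder class, used only to have some bytes to load.
class Sample { }
```

Only findClass() is overridden, so the inherited loadClass() keeps the parent-first delegation described earlier.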

In order to test an upcast, I need a class whose Effective CL is one of the custom class loaders. Because Main is loaded by the CP, it cannot upcast these instances itself, so I use reflection to invoke a method of such a class:

// Check the upcast of an instance of M-A1
// to class M-A1. This test must succeed
// because the CLs match.
try {
    checkUpcast(
        lClassLoader_A, lInstance_M_A );
    System.err.println(
        "OK: Upcast of instance of M-A1"
        + " succeeded to a class of M-A1" );
} catch (ClassCastException cce) {
    System.err.println(
        "ERROR: Upcast of instance of M-A1"
        + " failed to a class of M-A1" );
}
// Check the upcast of an instance of M-A2 to
// class M-A1. This test must fail because
// the CLs do not match.
try {
    checkUpcast(
        lClassLoader_A, lInstance_M_B );
    System.err.println(
        "ERROR: upcast of instance of M-A2"
        + " succeeded to a class of M-A1" );
} catch (ClassCastException cce) {
    System.err.println(
        "OK: upcast of instance of M-A2 failed"
        + " to a class of M-A1" );
}

The checkUpcast() method loads the class Test through reflection and calls the Test.checkUpcast() method, which makes a simple upcast:

private static void checkUpcast(
    ClassLoader pTestCL, Object pInstance )
        throws Exception {
    try {
        Object lTestInstance =
            createInstance( pTestCL, "Test" );
        Method lCheckUpcastMethod =
            lTestInstance.getClass().getMethod(
                "checkUpcast",
                new Class[] { Object.class } );
        lCheckUpcastMethod.invoke(
            lTestInstance,
            new Object[] { pInstance } );
    } catch( InvocationTargetException ite ) {
        throw (ClassCastException)
            ite.getCause();
    }
}

Afterwards, there are some tests that do the same thing but check the upcast restriction against reflection, to ensure that reflection cannot compromise the rules stated at the beginning of the article. The last test checks the linking of symbolic links. On Windows 2000 and JDK 1.4.2, it also shows the lazy loading of classes, because loading the class succeeds whereas creating the instance eventually fails:

// Load a class N that has a symbolic link to
// class O that was removed so that the class
// resolving must fail
try {
    // Upcast ClassLoader to our version in
    // order to access the normally protected
    // loadClass() method with the resolve
    // flag. Even though the resolve flag is set
    // to true, the missing symbolic link is only
    // detected in W2K and JDK 1.4.2 when the
    // instance is created.
    Class lClassN = ( (MyClassLoader)
        lClassLoader_A).loadClass( "N", true );
    // Finally when the instance is created
    // any used symbolic link must be resolved
    // and the creation must fail
    lClassN.newInstance();
    System.err.println(
        "ERROR: Linkage error not thrown even"
        + " though class O is not available for"
        + " class N" );
} catch( NoClassDefFoundError ncdfe ) {
    System.err.println(
        "OK: Linkage error because class O"
        + " could not be found for class N" );
}

Please note that the directory version_a contains a class O.java, because it is needed to compile N.java. However, the Ant build script removes the compiled class O.class before the test is started.

Conclusion

As long as a Java developer does not deal with his or her own class loaders, all classes are loaded by the bootstrap and system class loaders, and there will never be a conflict. Thus, it seems that a class is defined only by its fully qualified class name. As soon as there are sibling class loaders (neither of which is a parent of the other), a class type can be loaded multiple times, with or without different byte code. The class loader also defines the visibility of a class type, because every upcast checks both the class name and the class loader.

To use the currency analogy, you can have several currencies in your wallet, but as soon as you want to use one, the cashier will check whether your money is in the local currency. Still, you can carry these currencies in your pocket wherever you go, and likewise, you can carry around instances of classes even where those classes are unknown or incompatible, as long as the type of the reference is compatible there. Luckily, in Java, java.lang.Object is the superclass of all classes and is loaded by the BS, which is an ancestor of every class loader. This means a reference of type java.lang.Object is always compatible. I think of this as "tunneling" classes from one compatible island to the next, something that is very important in J2EE, as will be shown in a future installment.

My currency analogy is greatly simplified, because it implies that all classes have the same visibility, bounded by the single border of a country. The analogy is based on a two-dimensional world map, whereas with Java class loaders, each level within the class loader hierarchy adds a new layer, building up a three-dimensional space.

Additional class loaders enable a Java developer to write modular applications in which the visibility of classes is restricted, so that multiple versions of a class type can be loaded and managed side by side. Nevertheless, it takes effort to understand the class loaders in use and how classes and class loaders are organized. As with threads, class loading is runtime behavior that is not obvious to the developer, and it takes experience and testing to understand and use it well.

Now that the groundwork is laid, we can finally delve into the usage of class loaders. In the next article, we will see how class loaders can be used in a J2EE application server to manage deployments and what the effects are on invocations through local or remote interfaces. Afterwards, we will see how advanced class loaders make it possible to drop class types or massage the code in order to add "Advices" (AOP) at runtime without changing or recompiling your code.

 

http://onjava.com/pub/a/onjava/2003/11/12/classloader.html
