Terracotta Clustering Development Task Descriptions

Introduction to the Terracotta Clustering Tasks

This page details the work done to enable Terracotta clustering in a branch of Sakai-2-5-x. In particular, it lists the SAK-* JIRA tickets and gives a general description of the change made under each. Unless otherwise stated, all SAK-* tickets are child tasks of the parent ticket SAK-13324.

ComponentManager

To use Terracotta, the custom Sakai ClassLoaders had to be registered with Terracotta. This work was done in SAK-13834. A new TerracottaClassLoader class was created, with no compile-time dependencies on Terracotta, to introduce a new __tc_getClassLoaderName method. Its constructor also registers the ClassLoader with Terracotta, using reflection to avoid compile-time dependencies on Terracotta. A small change was required in ComponentsLoader: it checks whether this Sakai instance should be Terracotta enabled and, if so, uses the TerracottaClassLoader instead of a URLClassLoader. If the property is not set, everything continues to work as it did before.
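The following is a minimal sketch of that idea, not the class committed under SAK-13834. The class name TerracottaClassLoaderSketch, the newComponentLoader helper and the boolean flag are hypothetical, and the Terracotta class and method names looked up by reflection are assumptions recalled from the Terracotta 2.x API that should be checked against the version in use. When Terracotta is not on the classpath, registration is simply skipped.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch only; all names here are hypothetical.
class TerracottaClassLoaderSketch extends URLClassLoader {
    private final String loaderName;
    private final boolean registered;

    TerracottaClassLoaderSketch(URL[] urls, ClassLoader parent, String loaderName) {
        super(urls, parent);
        this.loaderName = loaderName;
        this.registered = registerWithTerracotta();
    }

    // Returns the name under which this loader is known to the cluster.
    public String __tc_getClassLoaderName() {
        return loaderName;
    }

    boolean isRegistered() {
        return registered;
    }

    // Reflection keeps Terracotta off the compile-time classpath.
    // ASSUMPTION: the class and method names below are not taken from the
    // source page; verify them against the Terracotta release being used.
    private boolean registerWithTerracotta() {
        try {
            Class<?> helper = Class.forName("com.tc.object.bytecode.hook.impl.ClassProcessorHelper");
            Class<?> named = Class.forName("com.tc.object.loaders.NamedClassLoader");
            Method register = helper.getMethod("registerGlobalLoader", named);
            register.invoke(null, this);
            return true;
        } catch (Exception | LinkageError e) {
            // Terracotta absent or incompatible: behave like a plain URLClassLoader.
            return false;
        }
    }

    // Stand-in for the check in ComponentsLoader: pick the loader type from a flag.
    static ClassLoader newComponentLoader(URL[] urls, ClassLoader parent,
                                          String name, boolean terracottaEnabled) {
        return terracottaEnabled
                ? new TerracottaClassLoaderSketch(urls, parent, name)
                : new URLClassLoader(urls, parent);
    }
}
```

With the flag off, callers receive an ordinary URLClassLoader, which mirrors the stated goal that nothing changes for deployments that do not enable Terracotta.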

Refactoring MySession and MyLittleSession

To cluster enable Sakai at the Session level, the MySession and MyLittleSession inner classes of SessionComponent had to be made top-level classes. This work was done in SAK-13837. It might have been possible simply to make these classes non-static; however, making them independent classes seemed to make sense given their size and complexity.

Tool Whitelisting

The original approach to partially supporting Terracotta relied on trapping Terracotta exceptions. Testing showed that this would not work, so it was abandoned in favor of whitelisting the tools that are Terracotta enabled. This work was done in SAK-14105. MySession and MyLittleSession were both changed to hold an additional data structure for non-shared data. SessionComponent was changed to provide a common method for deciding whether data should be placed in the shared or the non-shared data structure. The decision is made by obtaining the current tool id and comparing it to a whitelist of Terracotta-enabled tools. This allows system administrators to turn support on or off for any enabled tool.
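The routing decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SAK-14105 code: the class WhitelistedSessionSketch, its method names, and the way the current tool id is passed in as a parameter are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; all names here are hypothetical.
class WhitelistedSessionSketch {
    // Stand-in for the attribute map that Terracotta would share across JVMs.
    private final Map<String, Object> clusteredAttributes = new ConcurrentHashMap<>();
    // Stand-in for the additional, JVM-local map for non-whitelisted tools.
    private final Map<String, Object> localAttributes = new ConcurrentHashMap<>();
    private final Set<String> clusterableToolIds;

    WhitelistedSessionSketch(Set<String> clusterableToolIds) {
        this.clusterableToolIds = clusterableToolIds;
    }

    // Single place where the shared/non-shared decision is made.
    boolean isClusterable(String toolId) {
        return toolId != null && clusterableToolIds.contains(toolId);
    }

    private Map<String, Object> target(String toolId) {
        return isClusterable(toolId) ? clusteredAttributes : localAttributes;
    }

    void setAttribute(String currentToolId, String name, Object value) {
        if (value == null) {
            target(currentToolId).remove(name);
        } else {
            target(currentToolId).put(name, value);
        }
    }

    Object getAttribute(String currentToolId, String name) {
        return target(currentToolId).get(name);
    }

    int clusteredSize() { return clusteredAttributes.size(); }

    int localSize() { return localAttributes.size(); }
}
```

Keeping the decision in one method means an administrator-supplied whitelist is the only thing that changes when a tool is switched between shared and local storage.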

Rounding out the solution

Reaching a successful test of all the work done to this point required a few miscellaneous changes. These changes were checked in under the existing tickets but are not concrete deliverables by themselves. They included a method for resolving the transient fields in MySession and MyLittleSession, and work on an initial tc-config.xml file.

UsageSession

Next, we needed to resolve the potential problem of a UsageSession being incorrect after a user failed over to a new server. This work was done in SAK-13330. The Sakai RequestFilter and UsageSession classes were modified to compare the previous and current server values. If the values differ, a log message reports that a change was detected, the values are updated, and the new data is pushed to the database.
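A minimal sketch of that check, assuming a simple string server id, might look like the following. The class UsageSessionSketch and its members are hypothetical, and the database write is simulated with a counter rather than real persistence.

```java
import java.util.logging.Logger;

// Illustrative sketch only; all names here are hypothetical.
class UsageSessionSketch {
    private static final Logger LOG = Logger.getLogger(UsageSessionSketch.class.getName());

    private String serverId;
    private int databaseWrites; // counts simulated pushes to the database

    UsageSessionSketch(String serverId) {
        this.serverId = serverId;
    }

    // Called once per request; returns true when a server change was detected and recorded.
    boolean checkServer(String currentServerId) {
        if (currentServerId.equals(serverId)) {
            return false; // same server as before, nothing to do
        }
        LOG.info("UsageSession moved from server " + serverId + " to " + currentServerId);
        serverId = currentServerId;
        databaseWrites++; // stand-in for persisting the updated UsageSession
        return true;
    }

    String getServerId() { return serverId; }

    int getDatabaseWrites() { return databaseWrites; }
}
```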

RequestFilter and Coarse Grained Locking

System testing of the changes made to this point showed that normal worksites were not rendering correctly. Debugging suggested that this could be a form of deadlock in the Terracotta synchronization. To fix the problem, a new coarse-grained locking approach was introduced in SAK-14449. The Sakai RequestFilter was modified to synchronize access to the Sakai Session object for the duration of the FilterChain.doFilter() call, and the Terracotta tc-config.xml file was modified to create a Terracotta lock on this synchronized block. The net effect is that all changes to the Terracotta cluster made during a request cycle happen under the same coarse-grained transaction boundary. This change fixed the site rendering problem and was seen as a better design than the fine-grained locking approach that had been used before.
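The shape of the coarse-grained lock can be sketched as below. To keep the example free of servlet dependencies, a Runnable stands in for the FilterChain.doFilter() call, and the class and method names are hypothetical. In a Terracotta deployment the synchronized block only becomes a cluster-wide lock when tc-config.xml declares a lock for the enclosing method, as described above.

```java
// Illustrative sketch only; all names here are hypothetical.
class CoarseLockFilterSketch {

    // 'session' stands in for the Sakai Session object,
    // 'restOfChain' for the FilterChain.doFilter() call.
    static void doFilterUnderSessionLock(Object session, Runnable restOfChain) {
        synchronized (session) {
            // Every change made to clustered data during the request
            // happens inside this single lock scope.
            restOfChain.run();
        }
    }
}
```

Holding one lock for the whole request trades some concurrency on a single session for a single transaction boundary per request.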

Session Maintenance

There is one side effect of the coarse-grained locking approach. Any code that works with the SessionManager without going through the Sakai RequestFilter might encounter problems if it modifies Terracotta clustered object data. One clear example is Maintenance, the background thread of the SessionComponent class. Maintenance runs in the background of a Sakai JVM, checking Session objects and removing them from SessionManager if they have expired. Removing Session objects from the SessionComponent.m_sessions data structure is a modification of clustered data, so it must be covered by a Terracotta lock and transaction boundary. There was also a risk of one Sakai JVM pulling in all the Sessions belonging to all other Sakai JVMs. The MySession and SessionComponent classes were modified in SAK-14581 to address both issues. A system was put in place to track the expected timeout value of each session separately, keyed by session id. The SessionComponent.Maintenance thread first checks this timeout data structure, and only sessions whose timeout value has expired are then queried. Even then, each session object is looked up directly rather than by walking the whole SessionComponent.m_sessions data structure. These changes avoid one Sakai JVM bringing in all the Session objects of every other Sakai JVM. Additionally, the SessionComponent.Maintenance thread can be disabled per JVM instance if problems do occur.
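The timeout-first sweep can be sketched roughly as follows. This is not the SAK-14581 code: the class SessionMaintenanceSketch, its maps and its method names are hypothetical stand-ins, and the plain synchronized block only marks where a configured Terracotta lock would be needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; all names here are hypothetical.
class SessionMaintenanceSketch {
    // Stand-in for the clustered SessionComponent.m_sessions map.
    final Map<String, Object> sessions = new ConcurrentHashMap<>();
    // Separate, lightweight map: session id -> expected expiry time in milliseconds.
    final Map<String, Long> expirations = new ConcurrentHashMap<>();

    void register(String id, Object session, long expiresAtMillis) {
        sessions.put(id, session);
        expirations.put(id, expiresAtMillis);
    }

    // One pass of the maintenance thread; returns how many sessions were removed.
    int sweep(long nowMillis) {
        int removed = 0;
        for (Map.Entry<String, Long> entry : expirations.entrySet()) {
            if (entry.getValue() > nowMillis) {
                continue; // not expired: the session object is never touched
            }
            String id = entry.getKey();
            // Direct lookup by id; the session map itself is never iterated.
            // Under Terracotta this block would need a lock declared in tc-config.xml.
            synchronized (sessions) {
                if (sessions.remove(id) != null) {
                    removed++;
                }
            }
            expirations.remove(id);
        }
        return removed;
    }
}
```

Because only the small expiry map is scanned, a sweep does not fault in session objects that are still live on other JVMs.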

Cluster Enabling the Worksite Setup Tool

John Wiley & Sons, Inc. has customized Worksite Setup. The initial proof of concept for session failover was done by modifying this custom Worksite Setup tool in SAK-14241. A very large part of the changes needed to cluster enable this tool was turning several inner classes into top-level classes. This was necessary for two reasons. First, we could not have an inner (data) class dependent on a service class that we did not want to cluster. Second, we needed to prevent accidental direct field access. Terracotta relies on instrumenting bytecode by intercepting method calls, so any code that accesses a field directly could bypass the Terracotta bytecode instrumentation. Specifically, a parent (enclosing) class can directly access an inner class's fields even if those fields are private, and there is no Java syntax to prevent this. Making these classes top-level classes prevents direct field access and ensures that any field change can be detected by Terracotta clustering. Although most of the SiteAction inner classes are small data classes, there does not seem to be any negative effect from making them all top-level classes. SiteAction is still 12,853 lines long even with the inner classes taken out.
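The Java access rule behind the second reason can be shown with a small, hypothetical example that is unrelated to the actual SiteAction code. The enclosing class writes to a private field of its nested class without going through the setter, and the compiler accepts it.

```java
// Illustrative sketch only; all names here are hypothetical.
class EnclosingToolSketch {

    // Nested data class with a private field and accessor methods.
    static class SiteInfo {
        private String title;

        void setTitle(String title) { this.title = title; }

        String getTitle() { return title; }
    }

    // Compiles without complaint: the enclosing class may touch the nested
    // class's private field directly, bypassing setTitle() entirely.
    static void renameDirectly(SiteInfo info, String newTitle) {
        info.title = newTitle;
    }
}

// If SiteInfo were a top-level class instead, the same statement
// (info.title = newTitle) in another class would be a compile error,
// so every change would have to go through setTitle().
```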

Cluster Enabling the Resources Tool

We thought that a tool taken straight from the main Sakai 2.5.x code base would be a useful example for anyone considering these Terracotta clustering changes. We also thought that a tool a student might use would be useful for performance testing the changes. To accomplish both goals, we decided to cluster enable the Resources tool under SAK-14494. The approach for the Resources tool (the content module and its subprojects in Sakai) turned out to be very similar to that for the Worksite Setup tool. Many of the inner classes needed to be promoted to top-level classes in order to avoid clustering/distributing the corresponding content services. In this case, distributing the services would have been very difficult because they hold an active database connection. In addition to being promoted, many of these classes needed certain fields marked as transient in the Terracotta configuration file. Code then had to be written, either in the config file or in the Java source, to resolve these transient fields. This was common in inner classes that had strong ties to the parent service class they were pulled out of. (This was also true for MySession and MyLittleSession.) While getting all of the functionality of the Resources tool to work with Terracotta clustering turned on, a portion of the Citations tool was also cluster enabled. At first this work seemed complete; however, later system testing showed that the Citations code running inside the Resources tool is not completely cluster enabled. It is enabled enough to run fine while the server is up, but it does not survive a failover if the user is in the middle of citations work. The rest of the Resources tool, while not in dire need of failover, does appear to fail over correctly.
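One way to resolve a transient field in Java code is to look the service up lazily on first use, sketched below under stated assumptions. The names ContentStateSketch, ContentService and ServiceLocator are hypothetical; the Java transient keyword here only mirrors the intent, since the source states that the fields were marked transient in the Terracotta configuration file.

```java
// Illustrative sketch only; all names here are hypothetical.
class ContentStateSketch {

    // Stand-in for a non-clustered service that holds a database connection.
    interface ContentService {
        String describe(String collectionId);
    }

    // Hypothetical per-JVM lookup point for the local service instance.
    static class ServiceLocator {
        private static final ContentService LOCAL = new ContentService() {
            public String describe(String collectionId) {
                return "collection " + collectionId;
            }
        };

        static ContentService lookup() {
            return LOCAL;
        }
    }

    private final String collectionId;          // data intended to be shared
    private transient ContentService service;   // not shared; null on a node that did not set it

    ContentStateSketch(String collectionId) {
        this.collectionId = collectionId;
    }

    // Resolves the transient field on first use, for example after a failover.
    ContentService service() {
        if (service == null) {
            service = ServiceLocator.lookup();
        }
        return service;
    }

    String describe() {
        return service().describe(collectionId);
    }
}
```

Routing every use through service() keeps the data class usable on a node that never set the field.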

Terracotta Configuration and Terracotta Integration Modules

Terracotta must be configured in a very specific and detailed way. The Terracotta configuration file (typically called tc-config.xml) must define the names of the classes to instrument, which methods contain lock/transaction boundaries, and how data that is not cluster enabled is handled (such as defining and resolving transient fields). Initially this work was done in SAK-13908, which defined the first tc-config.xml file and a terracotta-config module to hold this configuration. The file grew unreasonably large and started to contain data specific to particular tools. Given that our goal was to let system administrators turn Terracotta functionality on or off per tool, it did not make sense for the Terracotta configuration of all tools to live in one file. SAK-14675 was created to improve this. It defined a Terracotta Integration Module (TIM) for every Sakai module that required Terracotta configuration. A new xyz-tim sub-module was created for each area (tool-tim, event-tim, content-tim, site-manage-tim). Additionally, the terracotta-config module was modified to be aware of these *-tim modules and to deploy both the tc-config.xml file and the related TIM jar files at the deployer's request. This keeps to the premise of not forcing a Sakai deployer to take Terracotta changes they do not want unless they decide to turn these features on. One caveat is that the *-tim modules depend on the Terracotta repository at Maven build time. This somewhat violates the goal of no compile-time dependencies, but only for configuration file build artifacts, not for any Java code.

Other Jira Tickets

The following are the other JIRA tickets under the parent SAK-13324 ticket that have not been mentioned above. Apart from those marked complete, these tickets are either still open or were closed with no changes because of a change in direction.

  • SAK-13325 Script acceptance tests for session clustering capability
    Still open
  • SAK-13326 Add SessionManager unit tests
    Complete
  • SAK-13327 Script component-level SessionManager performance tests
    Still open
  • SAK-13328 Allow clustered session lookup/initialization
    Closed, no longer relevant
  • SAK-13329 Allow configurable participation in distributed session attribute storage
    Closed, no longer relevant
  • SAK-13331 Refactor Presence to cope with session attribute clustering
    Closed, no longer relevant
  • SAK-13355 Instrument Sakai sessions such that attribute values and context can be logged and reported on
    Complete
  • SAK-13497 Add ComponentManager/ComponentLoader unit tests
    Complete
  • SAK-13832 Add UsageSessionServiceAdapter/UsageSession unit tests
    Still open
  • SAK-13835 Add "local failover" capability to Sessions, ToolSessions, and ContextSessions
    Most of this work changed to whitelist approach (see SAK-14105)
  • SAK-13893 Verify Terracotta-enabled ComponentManager and SessionComponent
    Still open
  • SAK-14240 Refactor BaseUsageSession to avoid casts to UsageSessionService implementation (UsageSessionServiceAdaptor)
    Still open