Sakai Session Clustering via Terracotta High Level Design

Thanks mainly to Josh Holtzman's investigation, it has been observed that Sakai ClassLoader complications make it difficult to bring "traditional" distributed cache technologies to bear on the problem of clustering Sakai sessions. Assuming that refactoring Sakai tools and services away from session usage altogether isn't a viable short-term option, a means for distributing application-defined ClassLoaders is called for. SAK-13330 discusses certain hacks for doing exactly this. Terracotta, though, provides such functionality off the shelf. For more discussion of Terracotta as a session clustering solution for Sakai, see this thread.

The remainder of this document discusses design and implementation issues related to a Terracotta-based session clustering solution. It does not address tool- or service-specific issues except for those which implement the core distribution framework and those whose semantics are unavoidably disrupted by distributed sessions. At this time we are aware of only a single instance of the latter (UsageSession).

Terracotta Object Distribution Constraints and Usage Patterns

Refer to the Terracotta.org documentation for additional information on object "portability" constraints and gotchas.

A portable object is an instance of a class that has been configured for distributed semantics. This configuration is referred to as "instrumentation" because it directs Terracotta to manipulate the class's bytecode.

A portable object is not actually distributed until it is added to the object graph associated with a "shared root" object. A portable object distributed in a "shared root" is not guaranteed to be distributed to client JVMs until the JVM requests a reference to the object. Terracotta may optionally optimize distribution of collection attributes by either eagerly fetching the objects in the collection, or by lazily distributing them as the collection is accessed. Shared "root objects" are typically collections themselves. In our case, we propose configuring the collection of session objects in the Sakai SessionManager as a Terracotta shared root.

The following subsections discuss certain consequences and limitations of such configuration as well as the use of Terracotta technology in general.

Inner classes

It is quite common for Sakai tools and services to implement domain object APIs as non-static inner classes on UI backing beans and service implementations. Initial testing shows that such objects are frequently placed into user sessions; examples include org.sakaiproject.site.tool.SiteAction$MyTool, org.sakaiproject.site.tool.SiteAction$SiteInfo, and org.sakaiproject.site.tool.SiteAction$MyIcon. Even the default implementation of Session itself is factored this way.

Because an instance of a non-static inner class has an implicit reference to the instance of the outer class which created it, the outer class must be portable if the inner class is portable. For more details see the Terracotta "Gotchas" guide.
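As a minimal illustration (a hypothetical backing bean, not the actual SiteAction source), the compiler gives every instance of a non-static inner class a hidden reference to its enclosing instance, so Terracotta cannot make the inner object portable without also dragging the outer object into the cluster:

```java
// Hypothetical example; not actual Sakai source.
public class ExampleToolAction {            // plays the role of a UI backing bean / service impl

    private final String toolTitle = "Example";

    public class ExampleState {             // non-static inner class placed into the session
        private String selectedSiteId;

        public String describe() {
            // Implicit ExampleToolAction.this reference: making ExampleState
            // portable forces ExampleToolAction to be portable as well.
            return toolTitle + ":" + selectedSiteId;
        }
    }

    // Refactoring target: a static nested (or top-level) class has no hidden
    // reference to the enclosing instance, so only it needs to be instrumented.
    public static class ExampleStateTopLevel {
        private String toolTitle;
        private String selectedSiteId;
    }
}
```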

This constraint poses problems because we do not intend, at least during initial development, to give Sakai singleton objects clustered semantics. It is also probably undesirable to instrument such objects without preserving their singleton semantics. For example, if we simply instrumented SiteAction as is, placing a MyTool instance in the session would effectively result in at least two SiteAction instances on every other JVM in the cluster. This may or may not lead to actual application failure, but it is certainly contrary to developer intentions and expectations.

This issue is discussed in slightly more detail below, but essentially we feel it is simpler and less risky to instrument only those objects that generally fit one's mental model of "what goes in the session." As such we are generally willing to accept the risk of refactoring non-static inner classes to top-level classes. FWIW, we generally feel that doing so improves Sakai's implementation on the whole, regardless of Terracotta considerations.

Locking

Update: 10/30/08

Initially it was thought that the most complicated and time-consuming aspect of distributing Sakai sessions via Terracotta would be the issue of object locking. It has since been discovered that a simpler approach to locking will work. Comments below around 'locking' refer to previous versions of this page.

Terracotta requires that a lock be obtained prior to modifying a field of any currently clustered portable object, i.e. a portable object reachable from a shared root. Terracotta locks define transaction scopes as well as limiting concurrent modification access to shared objects. No two threads in a cluster can hold a write lock on the same portable object at the same time; this is the concurrency aspect of Terracotta locks. Terracotta locks also serve as transaction boundaries for modifications made to ANY portable object state that is updated while holding a Terracotta lock. The transactions are guaranteed to operate on a consistent, cluster-wide view of object-graph state, and will be committed in their entirety to the Terracotta server or not at all.

Terracotta supports declaring locks in a variety of ways. The most common is to use the Java synchronized keyword and then declare an autolock section for the method where the keyword appears. Terracotta will then instrument the corresponding bytecode so that a Terracotta cluster lock is obtained and released along the same lines as the corresponding Java object lock.
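For example, a tc-config.xml fragment along these lines (a sketch; the class, method expression, and lock level are illustrative, not the configuration actually shipped) turns the JVM-local monitor acquired by a matching synchronized block into a cluster-wide write lock:

```xml
<!-- Sketch only: the method expression below is illustrative. -->
<locks>
  <autolock>
    <!-- Any synchronized block/method matching this expression also acquires
         a cluster-wide Terracotta lock with the same scope. -->
    <method-expression>* org.sakaiproject.tool.impl.MySession.setAttribute(..)</method-expression>
    <lock-level>write</lock-level>
  </autolock>
</locks>
```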

The initial thinking was that each object in the cluster would need to be individually locked before a change was made to it. After a better understanding of Terracotta locking was obtained, a simpler approach was discovered. All requests (of interest) pass through the Sakai RequestFilter, which places the Sakai Session object in a wrapped HttpRequest object (see the Sakai Request Filter and Sakai Session discussion here for more information on how these two relate). The Request Filter can be used to advantage by obtaining a Terracotta lock on the Session object before the filter chain's doFilter() method is called and releasing it after the method completes. This has the effect of making all changes to the user's Session state for one web request complete in one Terracotta transaction.
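A sketch of the idea (simplified, not the actual org.sakaiproject.util.RequestFilter code, and assuming an autolock has been configured for the synchronized block shown):

```java
// Simplified sketch of the request-scoped locking idea; not the actual RequestFilter.
public class ClusteredRequestFilterSketch implements javax.servlet.Filter {

    public void doFilter(javax.servlet.ServletRequest req,
                         javax.servlet.ServletResponse res,
                         javax.servlet.FilterChain chain)
            throws java.io.IOException, javax.servlet.ServletException {

        Object session = lookupSakaiSession(req); // resolve the Sakai Session for this request

        // With an autolock configured for this block, the JVM monitor on the
        // session doubles as a cluster-wide Terracotta write lock, so every
        // change made to the session graph during the request commits as one
        // Terracotta transaction when the block exits.
        synchronized (session) {
            chain.doFilter(req, res);
        }
    }

    public void init(javax.servlet.FilterConfig config) { }
    public void destroy() { }

    private Object lookupSakaiSession(javax.servlet.ServletRequest req) {
        // Placeholder for the cookie/SessionManager lookup the real filter performs.
        return new Object();
    }
}
```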

A risk to this approach (obtaining the Terracotta lock in the Sakai Request Filter) is that it is not sufficient if anything in Sakai modifies portable object state outside of the Sakai Request Filter. One example of this is the Sakai SessionManager Maintenance thread. This thread tries to detect sessions that have expired and remove them from the SessionManager. In this case, the Maintenance thread must obtain (and release) its own lock. The same principle can be applied to any other case where portable object state is modified outside of the Sakai Request Filter. No other such cases have been discovered so far (although this could simply be due to the small number of cluster-enabled tools).

Fine-grained locking (synchronizing on each portable object before changing it) does not appear to offer any advantage. In the initial implementation it even caused some problems when multiple iFrames tried to render at the same time.

For more detail see the related notes in the Terracotta.org docs. The locking configuration guide is also useful.

Loggers and Other Services/Framework Dependencies

Transience (discussed elsewhere, so skipping for now)

Change Detection for Portable Objects

A key distinction between Terracotta-based and EhCache-based session distribution is that Terracotta does not require explicit object dirtying/flushing to propagate modified object state to the cluster. That is, rather than calling myDistributedCollection.put("myKey", myObject) each time myObject is modified, Terracotta will transparently detect field-level modifications to myObject and ensure their distribution throughout the cluster. This is particularly Good News for distributing JSF session beans should client-side session state not prove to be viable for either technical or scheduling reasons.
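To make the contrast concrete (a schematic example with hypothetical names, not Sakai or EhCache API code):

```java
// Schematic contrast; UserPrefs and the map parameters are hypothetical.
import java.util.Map;

public class ChangeDetectionContrast {

    static class UserPrefs {
        String theme;
    }

    // Explicit-flush style (e.g. a replicated cache): the modified value must be
    // re-put so the cache can serialize and broadcast the change.
    static void explicitFlushStyle(Map<String, UserPrefs> replicatedMap) {
        UserPrefs prefs = replicatedMap.get("user123");
        prefs.theme = "dark";
        replicatedMap.put("user123", prefs); // explicit dirtying/flush step
    }

    // Terracotta DSO: if prefs is reachable from a shared root, the plain field
    // write below (made under an appropriate lock) is detected at the bytecode
    // level and propagated to the cluster; no re-put is required.
    static void terracottaStyle(Map<String, UserPrefs> sharedRootMap) {
        UserPrefs prefs = sharedRootMap.get("user123");
        synchronized (prefs) {
            prefs.theme = "dark";
        }
    }
}
```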

Custom Collections

Terracotta distinguishes between physically and logically managed (i.e. portable) objects. Physically managed object state is distributed by effectively broadcasting field-level changes across the cluster. Logically managed objects are kept in-sync by method replay. HashMap, Hashtable, and HashSet are examples of logically managed objects. Subclasses of logically-managed classes can only be made portable with some restrictions. This does not seem like it will pose a significant problem for Sakai except possibly for JSF-implemented map extensions placed into the session. The two most likely candidates for this type of portability issue in Sakai seem to be org.sakaiproject.cheftool.Configuration and org.sakaiproject.component.app.help.model.HelpContextConfig.

A discussion of the difference between physically and logically managed objects is available here.

An incomplete list of strictly non-portable and unsupported classes is available here.

Terracotta Deployment Model

Hub/spoke (documented elsewhere, so skipping for now)

ComponentManager

Terracotta works fairly simply with most Java applications, but certain things make using it more complicated, and custom ClassLoaders are one of those things. To better understand what might be required, the following post was made on the Terracotta Forums. It turns out that any custom ClassLoader that will load classes for objects that are eventually made portable must be registered with Terracotta. This registration is done using Java reflection and an additional method on the custom ClassLoader. This was first proven with a prototype outside of Sakai, and the change was then made to the Sakai ComponentManager with SAK-13834. Registration of the Sakai ClassLoader is disabled if Terracotta clustering is disabled in a particular Sakai instance.
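A rough sketch of the shape of that change (the Terracotta helper class and method names below are assumptions for illustration only; consult SAK-13834 for the real implementation):

```java
// Rough sketch only. The Terracotta class/method names are assumptions, not the
// actual registration hook; see SAK-13834 for the real change.
public class ComponentClassLoaderRegistrationSketch {

    /** Minimal subclass exposing a stable name Terracotta can identify the loader by. */
    public static class NamedComponentClassLoader extends java.net.URLClassLoader {
        private final String name;

        public NamedComponentClassLoader(String name, java.net.URL[] urls, ClassLoader parent) {
            super(urls, parent);
            this.name = name;
        }

        /** The extra method Terracotta needs in order to key this loader across JVMs. */
        public String getName() {
            return name;
        }
    }

    /** Register the loader reflectively so there is no compile-time Terracotta dependency. */
    public static void registerWithTerracotta(NamedComponentClassLoader loader, boolean clusterEnabled) {
        if (!clusterEnabled) {
            return; // registration is skipped when Terracotta clustering is disabled
        }
        try {
            // Hypothetical registration hook; the real class/method are Terracotta internals.
            Class<?> helper = Class.forName("com.tc.object.bytecode.hook.impl.ClassProcessorHelper");
            helper.getMethod("registerGlobalLoader", ClassLoader.class).invoke(null, loader);
        } catch (Exception e) {
            // Terracotta not on the classpath or API differs; clustering simply stays disabled.
        }
    }
}
```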

SessionManager

DSO Roots

SessionComponent is the default SessionManager implementation. Most Sessions it creates are tracked by creating entries in an internal Map: SessionComponent.m_sessions. We propose configuring this map as a DSO root. In this way, entire Session graphs will be distributed across the cluster.

This entails instrumenting (in the Terracotta sense) the Session, ToolSession, and ContextSession implementations and any objects which instances of those implementations reference, i.e. most objects added to sessions as attribute values.
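In tc-config.xml terms this would look roughly like the following (a sketch; the class and field names reflect the proposal above, and the exact expressions used in the eventual TIMs may differ):

```xml
<!-- Sketch of the proposed DSO configuration; exact names/expressions may differ. -->
<application>
  <dso>
    <roots>
      <!-- The SessionComponent session map becomes the cluster-shared root. -->
      <root>
        <field-name>org.sakaiproject.tool.impl.SessionComponent.m_sessions</field-name>
      </root>
    </roots>
    <instrumented-classes>
      <!-- Session/ToolSession/ContextSession implementations and the objects they reference. -->
      <include>
        <class-expression>org.sakaiproject.tool.impl.MySession</class-expression>
      </include>
      <include>
        <class-expression>org.sakaiproject.tool.impl.MyLittleSession</class-expression>
      </include>
    </instrumented-classes>
  </dso>
</application>
```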

Local vs Distributed Sessions and Attributes

Ideally, it would not be necessary to instrument all attribute values. For example, for performance reasons as well as to phase the overall development effort, it would be convenient if certain attributes could be treated as local and others as distributed. It remains to be seen how feasible this will be in a Terracotta environment, but at this time the proposed solution involves adding a second sessions data structure to SessionComponent and a second attributes data structure to MySession (the default Session impl). The original idea was to add fallback logic to SessionComponent that would trap Terracotta portability exceptions when first adding an attribute to a (shared/portable) Session and register that attribute in the new, "transient" map. This did not work as expected with Terracotta (the portability exception was not "catchable" in the place necessary). Instead a different approach, a tool whitelist, was implemented in SAK-14105. Tools that are listed in the whitelist are assumed to only put Terracotta-portable objects into the session. Tools that are NOT in the whitelist will have all their data stored in a "transient" data structure that mirrors the portable/shared data structure. The check for whether a tool is whitelisted is disabled if Terracotta clustering is disabled in a particular Sakai instance; in that case, no data is stored in the transient/mirror data structure.
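The net effect of the SAK-14105 change can be pictured roughly as follows (a simplified sketch with illustrative names, not the actual SessionComponent/MySession code):

```java
// Simplified sketch of the whitelist routing idea from SAK-14105; not actual Sakai code.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WhitelistedAttributeStoreSketch {

    private final boolean terracottaEnabled;
    private final Set<String> clusteredToolWhitelist;  // tool ids known to put only portable objects in the session
    private final Map<String, Object> sharedAttributes = new ConcurrentHashMap<String, Object>();    // Terracotta-shared map
    private final Map<String, Object> transientAttributes = new ConcurrentHashMap<String, Object>(); // local mirror map

    public WhitelistedAttributeStoreSketch(boolean terracottaEnabled, Set<String> whitelist) {
        this.terracottaEnabled = terracottaEnabled;
        this.clusteredToolWhitelist = whitelist;
    }

    public void setAttribute(String toolId, String name, Object value) {
        // When clustering is disabled, the whitelist check is skipped entirely and
        // everything lives in the (now JVM-local) primary map.
        if (!terracottaEnabled || clusteredToolWhitelist.contains(toolId)) {
            sharedAttributes.put(name, value);      // distributed via the shared root when clustering is on
        } else {
            transientAttributes.put(name, value);   // non-whitelisted tool: stays in this JVM only
        }
    }

    public Object getAttribute(String name) {
        Object value = transientAttributes.get(name);
        return (value != null) ? value : sharedAttributes.get(name);
    }
}
```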

Session "Currentness"

The SessionManager API exposes operations for retrieving and setting references to the current Session and the current ToolSession. There is also a convenience method for retrieving the user ID associated with the current Session. Because ThreadLocalManager cannot be distributed, implementing these methods is potentially problematic for a Terracotta-distributed SessionManager. In reality, though, because we can assume that a given request cycle always completes "on" a particular app server node and that Terracotta is not configured to distribute arbitrary method calls, these methods should not require modification. So long as SessionManager.getSession(String) lookups are effectively distributed by configuring SessionComponent.m_sessions as a DSO, and so long as all entry points to the application, e.g. RequestFilter, WebServlet, correctly set their sessions as "current" (which we have to assume they do, even under a non-distributed configuration), no special code should be required for distributing session "currentness".

Transient Sessions

Edge cases may exist where SessionManager clients create "transient" Sessions. That is, in some cases a Session may be effectively scoped to the current request thread, even if its actual visibility is wider by virtue of having been cached in SessionComponent.m_sessions. Sessions can be considered transient for two reasons:

  1. The Session is never added to SessionComponent.m_sessions, as is the case if a scheduled job or rogue application entry point relies on SessionManager.getCurrentSession() rather than SessionManager.startSession() for session instantiation.
  2. The Session is added to SessionComponent.m_sessions but a corresponding key is not persisted to a response for lookup by a subsequent request.

It is unclear how often either or both of these scenarios occur in Sakai, if at all, but even if they do occur we do not believe they disrupt the session distribution effort since distribution does not alter their functional impact. An argument could possibly be made that distributing such transient sessions in the second case is unnecessary overhead, which is true, but the cost of detecting and reacting to such usage is probably not worth the (potential) efficiency gain.

Session "Stickiness"

Theoretically, Terracotta-backed session distribution obviates the need for session "stickiness", whereby the load balancing mechanism ensures that requests for a particular session are directed to a particular application server, at least until that application server becomes unavailable for whatever reason. Since we propose scoping SessionComponent.m_sessions to the cluster rather than to a JVM, we should be able to service any given request on any app server node. We believe this is a naive expectation, though, since it may not be feasible or desirable to actually distribute the entirety of the Session, particularly during the early stages of this development effort. Therefore, we should assume that sticky sessions will continue to be the normative load-balanced configuration, as it provides the best user experience and allows us to stage the development effort.

That said, the ability to distribute sessions may impact application objects which assume strictly sticky sessions, most notably UsageSession objects which cache (and persist) the ID of the app server node in which they were originally created. This topic is discussed in more detail below.

Session "Maintenance"

SessionComponent.init() instantiates and starts an instance of SessionComponent.Maintenance. The latter is responsible for periodically sweeping SessionComponent.m_sessions, checking for and invalidating inactive Sessions. This mechanism may be problematic for a Terracotta-backed Session distribution solution because distributing SessionComponent.m_sessions implies that per-VM SessionComponent.Maintenance instances will sweep potentially much larger collections, and individual Sessions may receive redundant invalidate() calls. Additionally, there is a risk that querying every Session in the m_sessions collection would fault all data in all sessions across every Sakai JVM into every other Sakai JVM.

On the recommendation of developers at Terracotta, Inc., SAK-14581 was created and resolved. A new session timeout data structure was designed to keep track of Session identifiers and possible timeout values for the corresponding Sessions. When SessionComponent.Maintenance runs, it first checks the session timeout data structure; further checking or invalidation is done only for session ids in this structure that correspond to apparently expired sessions. That further checking will indeed bring the Session into the local Sakai JVM (even if that Session did not originate there). However, this is expected to happen infrequently enough that the overhead of faulting in the Session object can be tolerated. If performance is still a concern, SessionComponent.Maintenance can be set to run on only one Sakai server.
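Schematically, the sweep becomes a cheap scan over the small timeout structure, faulting in a full Session only when its entry looks expired (a sketch with illustrative names; see SAK-14581 for the real change):

```java
// Sketch of the SAK-14581 idea; names are illustrative, not the actual implementation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MaintenanceSweepSketch {

    /** Small shared structure: session id -> latest time (ms) at which the session could expire. */
    private final Map<String, Long> sessionExpirationTimes = new ConcurrentHashMap<String, Long>();

    /** Stand-in for the real shared session map (SessionComponent.m_sessions). */
    private final Map<String, Object> m_sessions = new ConcurrentHashMap<String, Object>();

    void sweep() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> entry : sessionExpirationTimes.entrySet()) {
            if (entry.getValue() <= now) {
                // Only now do we touch the Session itself, faulting it into this JVM
                // if it originated elsewhere, to confirm inactivity and invalidate it.
                Object session = m_sessions.get(entry.getKey());
                invalidateIfReallyExpired(session);
            }
        }
    }

    private void invalidateIfReallyExpired(Object session) {
        // Placeholder for the real inactivity check and Session.invalidate() call.
    }
}
```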

MySession and MyLittleSession

By default, Session, ToolSession and ContextSession are implemented by non-static inner classes in SessionComponent: SessionComponent.MySession implements Session, and SessionComponent.MyLittleSession implements the other two interfaces. This is problematic because we do not necessarily want to distribute SessionComponent itself, which would be necessary if the *Session implementations were to remain non-static inner classes. Thus, in addition to adding portability exception handling to these classes, we propose refactoring them to top-level classes. Although this is prompted specifically by Terracotta constraints, we believe it is also an incremental design improvement.

MySession and MyLittleSession depend on their implicit references to SessionComponent in several locations. We will not list them all here, as it is easy to determine the dependencies using Eclipse's refactoring tools.

When refactoring these classes (see SAK-13324), we used Eclipse's automated refactoring output as a baseline. This output defines constructors that accept a reference to the previously outer class – SessionComponent in this case. This field was generalized to reference SessionManager, which had a ripple effect on direct field references preserved by Eclipse's automated refactoring.

The field storing the SessionManager reference should also be configured as transient in the Terracotta sense since we do not intend to distribute SessionComponent. Several options exist for resolving transient references in a Terracotta environment. Retrieving a SessionComponent reference from ComponentManager via a method in MySession and telling Terracotta to call that method any time it brings the object into the JVM was the approach taken here.
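In tc-config.xml this combination of a transient field plus an on-load resolution method looks roughly like the following (a sketch; the field name and on-load method name assume the refactored MySession described above and are illustrative):

```xml
<!-- Sketch only; actual field/method names depend on the refactored MySession. -->
<instrumented-classes>
  <include>
    <class-expression>org.sakaiproject.tool.impl.MySession</class-expression>
    <on-load>
      <!-- Called whenever Terracotta materializes a MySession in a JVM,
           re-resolving the non-distributed SessionManager reference. -->
      <method>resolveTransientFields</method>
    </on-load>
  </include>
</instrumented-classes>
<transient-fields>
  <field-name>org.sakaiproject.tool.impl.MySession.sessionManager</field-name>
</transient-fields>
```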

In some cases we relocated such semi-complex initialization logic into SessionComponent. If that object is the factory for *Session, it should be responsible for providing all arguments necessary to construct a valid *Session. Requiring that *Session call back into the SessionManager to get a UUID just obscures valid *Session construction semantics and is invalid in any case if SessionComponent dependencies are refactored to SessionManager dependencies. This did not work in all cases, however. When code holding a reference to a Session needs to start a ToolSession or ContextSession, the same UUID-generation capability is needed. To handle these new behaviors after the refactoring to top-level classes, a new mix-in interface was added to SessionComponent; this was done instead of modifying the existing SessionManager interface in order to avoid changing the SessionManager API.

Why Not Just Distribute the SessionComponent Spring Bean?

Instead of configuring SessionComponent.m_sessions as a DSO root, we could leverage Terracotta's existing Spring integration and distribute the SessionComponent bean in its entirety. This might allow us to avoid refactoring MySession and MyLittleSession to top-level classes, and might not be too difficult to pull off, assuming Terracotta can cope with Sakai's semi-unique ApplicationContext initialization process, and assuming it is straightforward to mark the IdManager and ThreadLocalManager dependencies as transient in the Terracotta sense.

We've not chosen to take this approach, though, on the grounds that it doesn't actually simplify the solution. MySession and MyLittleSession as well as the SessionComponent itself still need to be modified to support local attributes and possibly sessions so long as we assume not all objects stuffed into a session need necessarily be distributable. So long as we're "forced" to make such code changes, then, and given that we perceive elimination of non-static inner classes as a Good Thing, we do not perceive a benefit in trading the latter for the potentially increased complexity and risk of a Terracotta-distributed ApplicationContext.

UsageSession

UsageSession is a persistent object attached to the "real" Session. A UsageSession describes the client-server relationship. Of specific interest to us is its reference to the "current" app server's identifier. If this field were always a calculated value, it would not pose a problem. As it stands, though, this value is persisted to the database. Thus session stickiness is baked into the default UsageSession implementation (UsageSessionServiceAdaptor).

As noted above, we still assume sessions will be sticky, but with new fail-over capability. So it's not the stickiness of UsageSessions that really poses a problem for us. It's the issue of detecting what passes for session "migration" in our proposed Terracotta environment. Recall that a given Session will effectively "be in" all JVMs at all times. As such, standard Terracotta field "transience" features will not be helpful for emulating session "migration" in the usual JEE sense.

One solution might implement UsageSession.getServer() to somehow check the current server ID, compare it to the UsageSession's cached server ID, and, if necessary, replace it with the current value, flushing the new state to the database. Such a side-effect-laden design seems distinctly undesirable on principle.

Another solution would leverage the Sakai event subsystem. Under this design, SessionManager.setCurrentSession(), which should be invoked by all application entry points, could fire an event for which UsageSessionServiceAdaptor registers as a local observer. Upon receipt of such an event, UsageSessionServiceAdaptor could attempt to retrieve the current UsageSession from the now-current Session and perform the UsageSession.getServer()-with-side-effects operations described immediately above. This solution has the disadvantage of relative obscurity and of adding overhead to each request cycle. Note that the request cycle overhead could be eliminated by configuration for installations not deploying Terracotta.

A third option simply refuses to persist server IDs with UsageSession instances when running under a Terracotta configuration. The most noticeable drawback to this approach would be degraded functionality in the "Online" tool. It is not known what other effects might be created by null or nonsensical values in SAKAI_SESSION.SESSION_SERVER. For example, some institutions use the SAKAI_SESSION and SAKAI_EVENT tables for data mining purposes, in which case suddenly eliding SESSION_SERVER fields might be a rude surprise. Additionally, several SQL queries implemented by UsageSessionServiceSqlDefault either order results by SESSION_SERVER or have predicates which test the value of SESSION_SERVER, most notably getOpenSessionsOnInvalidServersSql(). ClusterEventTracking also depends on the SESSION_SERVER field to avoid duplicate processing of local events. SessionWarehouseTask queries this field when loading SessionBean objects, but usage of those objects has not been fully investigated. There do not seem to be any compile-time dependencies on the corresponding accessor (SessionBean.getServer()).

A fourth option is to change the RequestFilter in the tool-util module. This filter is used for anyone hitting the Sakai web site. The filter finds the user's Session object (an instance of MySession) from the SessionManager based on the user's cookie, which contains the session id. In this scenario, we can modify the RequestFilter to examine the Session after it has been retrieved from the SessionManager. Once RequestFilter has the Session object, it can try to pull the UsageSession out using the well-known key under which UsageSessionServiceAdaptor places the UsageSession in the Session in the first place. If RequestFilter finds a UsageSession object, it can then compare the server id it reports against the server id obtained from the ConfigurationService. If the two values match, no additional processing is necessary and the RequestFilter continues as normal. If the two server id values differ, RequestFilter can call a new method on UsageSession to update the server id in the UsageSession object. BaseUsageSession (the implementation of UsageSession) should then be modified to support this new set method. The implementation of the set method should also force an update to the corresponding row in the database, similar to how BaseUsageSession.closeSession() currently works (it updates the database as well).
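In outline, the fourth option amounts to something like the following inside the filter (a simplified sketch; the attribute key and the new setter name are placeholders, not the actual SAK-13330 change):

```java
// Outline of option four; simplified sketch, not the actual SAK-13330 implementation.
public class UsageSessionServerIdCheckSketch {

    /** Stand-in for the relevant slice of the UsageSession API. */
    interface UsageSessionView {
        String getServer();
        void setServer(String serverId); // hypothetical new setter that also writes through to the DB row
    }

    void checkServerId(org.sakaiproject.tool.api.Session session, String localServerId) {
        // "usageSessionKey" is a placeholder for the well-known attribute key
        // under which UsageSessionServiceAdaptor stores the UsageSession.
        Object attr = session.getAttribute("usageSessionKey");
        if (attr instanceof UsageSessionView) {
            UsageSessionView usage = (UsageSessionView) attr;
            if (!localServerId.equals(usage.getServer())) {
                // The session has effectively "migrated": record the new server id and
                // persist it, analogous to how BaseUsageSession.closeSession() updates the DB.
                usage.setServer(localServerId);
            }
        }
    }
}
```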

At this time we chose to implement the fourth solution; this is detailed in SAK-13330.

In order to distribute BaseUsageSession without also distributing UsageSessionServiceAdaptor, it is also necessary to refactor BaseUsageSession to a top-level class (this was done as SAK-13833). Presumably we do not intend to distribute UsageSessionServiceAdaptor, on the grounds that doing so would require instrumenting the Spring ApplicationContext, which was rejected above on the (not entirely compelling) grounds of complexity and risk. For example, presumably we do not intend to distribute all the service beans upon which UsageSessionServiceAdaptor depends, which implies a fair amount of transient dependency configuration and uneven semantics (ok, which beans are cluster-scoped singletons again?). If we do not distribute the ApplicationContext but do distribute the UsageSessionServiceAdaptor class, we end up with multiple instances of that class in each JVM which, while possibly not disastrous, is certainly not the original design intent.

Promoting BaseUsageSession to a top-level class probably involves more code changes than ApplicationContext instrumentation, but seems simpler nevertheless. And, as with MySession and MyLittleSession, it seems to contribute an improvement to the code base in general.

We will not go into the details of what the process of promoting BaseUsageSession entails. The notes above re such refactoring for MySession and MyLittleSession should be sufficient to provide a sense of general approach and scope of work.

Configuration

In general, we hope that Terracotta-based session clustering can be easily enabled and disabled without modifications to source code or Spring bean definitions. The following sections categorize the types of changes we anticipate being necessary to cluster Sakai sessions via Terracotta and the corresponding impact on an installation choosing to run without Terracotta or to run with a localized Terracotta configuration, e.g. with more or fewer distributable session attributes.

Session Attribute Value Refactoring

Code changes to enable the portability of any given object which may be added to a Sakai session as an attribute value should not introduce any functional regressions. For example, refactoring MySession to a top-level class should have no impact on how Sakai functions from an end-user perspective.

Such changes may impact code dependencies, though, in any situation where localized code has dependencies on the "old" inner-class factoring. We assume the latter is unlikely, given that part of the point of factoring these classes as inner classes should have been to tightly scope dependencies. However, even if an institution were not to run a Terracotta server, its local code customizations may be impacted by Terracotta-related code changes. At least in the case of MySession, automated regression tests have already been added, but these will not survive entirely intact because the proposed refactoring alters the object construction mechanism.

In general, we hope to avoid adding new locking code to instrumented classes, but in some cases this may be unavoidable (Sakai RequestFilter and SessionComponent.Maintenance). This is potentially undesirable on the grounds that a Terracotta-less installation should not need to pay that synchronization overhead. In reality, though, session attributes are always subject to multi-threaded access, so where locking code must be added, we anticipate that its absence could likely be construed as a bug, regardless of Terracotta considerations.

Framework Support

As described above, certain modifications to ComponentManager and SessionManager implementations are necessary to enable Terracotta-based session distribution. The ComponentManager must register component ClassLoaders with the Terracotta runtime and the SessionManager must cope with session attributes, and possibly entire sessions, which contain or are themselves non-portable objects.

In the case of the ComponentManager, the proposed modification requires a very minor subclass of the existing ClassLoader implementation (to add a Terracotta getName() method). Enabling/disabling Terracotta ClassLoader registration can be done with a simple boolean Java property. The bigger problem in this case is probably the issue of compile-time dependencies on Terracotta. This can generally be side-stepped via reflection, especially because ClassLoader registration is strictly a startup-time activity.

In the case of SessionManager, the proposed changes are more invasive, but the existing implementation isn't factored in a way that would allow a subclass to add special fallback to local session attribute management. And since the whole point of the proposed changes is to preserve existing behaviors, it doesn't seem that any special configuration should be necessary to enable or disable Terracotta support in SessionManager. Again, the primary issue is the additional overhead of checking a Tool id against a whitelist. This can be minimized by enabling/disabling the functionality with the same boolean Java property used for ClassLoader registration mentioned above.

Terracotta Configuration

A Terracotta server or client is configured by a single file: tc-config.xml. In production, clients typically retrieve an authoritative version of this file from the server.

File modularization can be accomplished by importing TIMs (Terracotta Integration Modules) which are typically distributed as Maven-generated jar artifacts containing fragments of a tc-config.xml.

tc-config.xml must explicitly reference the TIMs it wishes to import, which may be hosted below any number of file paths, the latter also being configured in tc-config.xml.

For more information on TIMs, see the configuration guide and the TIM development space.

Our proposed approach, then, is to decompose Sakai object instrumentation into per-module TIMs. This method was documented and completed with SAK-14675. This allows the institution to localize a "master" tc-config.xml relatively easily by selecting from a set of pre-configured configuration bundles that represent the set of objects that should be instrumented in that installation. Deploying the TIMs and creating and managing the master tc-config.xml will be an exercise left to the local installation, although a default version of this file will be available for manual build and deploy.
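A localized master tc-config.xml would then pull in the per-module bundles along these lines (a sketch; the module names, versions, and repository path are illustrative, not the shipped defaults):

```xml
<!-- Sketch of a localized master tc-config.xml fragment; module names, versions,
     and the repository path are illustrative. -->
<clients>
  <modules>
    <!-- Where the Maven-built TIM jars are deployed locally. -->
    <repository>file:///opt/sakai/terracotta/modules</repository>
    <!-- One entry per Sakai module this installation wants instrumented. -->
    <module name="tim-sakai-kernel-sessions" version="1.0.0"/>
    <module name="tim-sakai-resources" version="1.0.0"/>
  </modules>
</clients>
```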

Terracotta Cluster Enabling a Sakai Tool

With the work done in SAK-14105, session attributes are only clustered when they are part of a whitelisted tool. Adding a Sakai tool id to the whitelist is an indication that all the work necessary to configure Terracotta has been done for that tool. Typically this requires at a minimum a new Terracotta Integration Module (TIM, see previous section). It might also require code to resolve transient (non-portable) fields in clustered objects. Resolving transient fields can be done inside the Terracotta config XML file; however, it can sometimes be difficult to get the syntax right, and it is a piece of configuration a developer might forget to update if they are not themselves using Terracotta. The general approach to Terracotta cluster-enabling a tool will be described in a separate guide found here.

Testing

Some testing has been described on this page. Further testing needs to be conducted with the recently Terracotta-cluster-enabled Resources tool (see SAK-14494).