Sakai Session Clustering Problems and Ideas

Update: 10/29/08

The information on this page is provided for historical and background purposes. This page explains the other options that were explored prior to settling on an approach that uses Terracotta to enable cluster failover support. For information on the current approach, please review the Sakai Session Clustering via Terracotta High Level Design page.

That Sakai user sessions do not fail over from node to node in a Sakai cluster has been a source of some frustration for systems administrators and end users. It is also something of a blocker issue for commercial interests hoping to host content and assessment delivery engines on a Sakai platform. Recent discussion on the sakai-dev list indicates that this issue is still open. Excessively large object serialization/replication overhead seems to be the biggest obstacle. Additionally, Sakai does not use container-managed sessions either exclusively or by default, meaning that "traditional" Servlet container-managed session replication likely wouldn't prove to be a viable solution, even if session footprint could be successfully reduced. Instead, as Ian indicated, replacing Sakai sessions' attribute store(s) with a distributed cache might allow us to exercise fine-grained control over the scope and size of replication events while avoiding the need to completely re-imagine/re-write Sakai session management. Thus it seemed natural to collect notes on session replication (see also the discussion of distributing Sakai caches in general).

Currently, this document consists primarily of an under-the-covers look at "sessions" in Sakai intended to help better understand the problem space. As with most framework abstractions, Glenn Golden wrote the authoritative document on Sakai sessions. Glenn's paper on Sakai's request processing is also helpful, especially for understanding the relationship between Sakai and HTTP sessions.

Update: 7/30/08

Distributed caching and other solutions described in this document run afoul of Sakai ClassLoaders. Terracotta is one alternative and is the option we're currently pursuing. Development activity is tracked by subtasks below SAK-13324.

Sakai Sessions vs HttpSessions vs Sakai UsageSessions vs Sakai Presence

Sessions

The Sakai RequestFilter wraps container-managed HttpServletRequests with a custom decorator which caches the container-implemented HttpSession, but allows Sakai to effectively preclude direct access to the latter. Schematically (draft diagram):

A Sakai Session, then, can be thought of as an HttpSession with somewhat more ambiguous semantics. That is, depending on configuration, invoking HttpServletRequest.getSession() may return any of four session "types":

  1. A container-implemented HttpSession scoped to a particular ServletContext, i.e. web application (the "first one" encountered by the user, typically /portal).
  2. A Sakai-implemented Session (which also happens to implement HttpSession) scoped to all web application invocation paths fronted by the Sakai RequestFilter, i.e. for all of Sakai
  3. A Sakai-implemented ContextSession (which also happens to implement HttpSession) scoped to a single webapp (i.e. ServletContext).
  4. A Sakai-implemented ToolSession (which also happens to implement HttpSession) scoped to a particular tool placement.

Note that in all cases except one ("org.sakaiproject.util.RequestFilter.http_session"= CONTAINER_SESSION), the container-provided HttpSession is effectively replaced by a Sakai-constructed Session of configurable scope. The default configuration ("org.sakaiproject.util.RequestFilter.http_session"= TOOL_SESSION) gives clients access to tool placement-scoped sessions, falling back to sessions associated with groups of tools if no placement session exists. Thus, without configuration, objects stored in user sessions are not visible to the Servlet container. As such, enabling container-managed session replication will typically have no effect, or at least not the intended effect.

Forcing Container-Managed Sessions

Whether or not Sakai would behave properly under a container-managed session configuration is not entirely known. Informal experimentation indicates that such configuration does not cause Sakai to break in obvious ways. However, there's nothing "important" in container-managed sessions. For example, after creating a worksite and manipulating the Forums tool in a variety of ways, the container session has only one attribute: portalskin=defaultskin. Contrast this with the Sakai-scoped Session:

sakai.portal.site.2e3f26ab-d3bd-4881-90a2-c6d02b28c6bc=3678ff14-e7cf-4545-8145-9fb563596fd7, sakai.locale.admin=en_US, org.sakaiproject.event.api.UsageSessionService=[], Access.Copyright.Accepted=[], sakai.portal.site.94b0c44e-539f-44fa-b204-78a3318b36ec=7c5e75aa-2033-436c-8871-d7a54f5e64c1, org.sakaiproject.util.EditorConfiguration.enableResourceSearch=false, sakai.portal.site.!gateway=!gateway-100, sakai.portal.site.~admin=~admin-360, sakai.portal.site.d7f18e57-cc7b-4371-8972-aaccb0416c3a=d4bf7ff3-4cb4-4752-80c0-d3b0f6f4506f, attr_preference_is_null=true

The collection of ToolSessions is quite a bit larger yet.

This behavior occurs because Sakai-managed "session" objects of all three scopes are always available from the SessionManager API, even if the RequestFilter has not been configured for Sakai-scoped sessions. That is, under certain configurations, HttpServletRequest.getSession() and SessionManager.getCurrentSession() may not return the same object. Thus, even with container-managed sessions enabled, there is no guarantee that the container will in fact have access to all attributes relevant for a user's current interaction, thereby (potentially) obviating the usefulness of container-managed session replication. Static analysis of the Sakai trunk at r41592 (plus SASH) shows 41 references to Session.setAttribute(), 7 references to ContextSession.setAttribute() (most from RSF libraries), and 307 references to ToolSession.setAttribute(). By contrast, there are only 27 references to HttpServletRequest.getSession() and only 39 references to HttpSession.setAttribute().

Session IDs and Cookies

Sessions are bound to an application server by their IDs (and Path), which are stored client-side in JSESSIONID cookies. For example, upon first accessing http://sakai.university.edu/portal, the browser will receive a Set-Cookie header such as the following, where "sakai-host-01" is the value of sakai.serverId. That property is typically configured in local.properties:

Set-Cookie: JSESSIONID=503d1bcb-c3a5-4f53-8d1f-18bec76d00b2.sakai-host-01; Path=/

The Sakai SessionManager (a SessionComponent by default) maintains an in-memory data structure to track Sessions. This data structure is keyed by Session IDs, excluding the embedded serverId. Thus, the serverId exists simply for request routing at the container level and higher. Elements in this data structure are not persisted to the database in any form. It does not contain direct references to ContextSessions or ToolSessions.
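To make the relationship between the cookie value and the in-memory key concrete, here is a minimal sketch of splitting a JSESSIONID value into its session ID and serverId parts. The class name and the rule that the serverId is everything after the first '.' are assumptions made for illustration; this is not the SessionManager's actual parsing code.

```java
/** Hypothetical helper: splits a Sakai JSESSIONID cookie value into its two parts. */
final class SakaiCookieValue {
    final String sessionId; // the part that keys the in-memory session map
    final String serverId;  // routing hint only; null when no suffix is present

    private SakaiCookieValue(String sessionId, String serverId) {
        this.sessionId = sessionId;
        this.serverId = serverId;
    }

    /** Assumption: the serverId is everything after the first '.' in the value. */
    static SakaiCookieValue parse(String cookieValue) {
        int dot = cookieValue.indexOf('.');
        if (dot < 0) {
            return new SakaiCookieValue(cookieValue, null);
        }
        return new SakaiCookieValue(
                cookieValue.substring(0, dot),
                cookieValue.substring(dot + 1));
    }
}
```

Applied to the Set-Cookie example above, the session map key would be the UUID portion and "sakai-host-01" would be used only for routing.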

A Note on JSESSIONID Handling

Because Sakai sets the JSESSIONID cookie's Path attribute to the root, the browser will send this cookie on any request to the same domain, in this case sakai.university.edu. This contrasts with "normal" usage of JSESSIONIDs in a Tomcat Servlet container, in which JSESSIONID cookies are scoped to paths representing ServletContexts. For example, consider two web applications, webapp-A and webapp-B, deployed to a single Tomcat instance. Each webapp consists of a single servlet which simply outputs a hello world message after ensuring that an HttpSession exists for the current request. Successive requests to each webapp result in the following header exchange:

*http://localhost:8080/webapp-A*

GET /webapp-A HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080207 Ubuntu/7.10 (gutsy) Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=4443AAF15E28F35D799CBF46B583A740; Path=/webapp-A
Content-Length: 13
Date: Fri, 14 Mar 2008 00:39:46 GMT
----------------------------------------------------------
*http://localhost:8080/webapp-B*

GET /webapp-B HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080207 Ubuntu/7.10 (gutsy) Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=A7FFE44805F9323EF0B2EBC2D5887377; Path=/webapp-B
Content-Length: 13
Date: Fri, 14 Mar 2008 00:39:55 GMT
----------------------------------------------------------

Session Attribute Storage

As implemented, each Sakai Session, ContextSession, and ToolSession maintains its own in-memory attribute storage data structure (ConcurrentHashMap). That is, although the latter two objects always have a parent Session, there is no single data structure in which all attributes relevant to an end user's current interaction state are stored.
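A rough sketch of what this means for replication: to see everything relevant to one user, each scope's map has to be visited separately. The class and method names below are hypothetical, and ContextSession is left out for brevity; the sketch only illustrates the "one map per scope" layout, not the real Session implementation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a Sakai Session and its child ToolSessions. */
final class SketchSession {
    // Each scope owns its own attribute map; nothing ties them together.
    final Map<String, Object> attributes = new ConcurrentHashMap<>();
    final Map<String, Map<String, Object>> toolSessions = new ConcurrentHashMap<>();

    /** Returns (creating if needed) the attribute map for one tool placement. */
    Map<String, Object> toolSession(String placementId) {
        return toolSessions.computeIfAbsent(placementId, id -> new ConcurrentHashMap<>());
    }

    /** Flattens every scope into one view; keys are prefixed to avoid collisions. */
    Map<String, Object> snapshotAllAttributes() {
        Map<String, Object> all = new TreeMap<>();
        attributes.forEach((k, v) -> all.put("session/" + k, v));
        toolSessions.forEach((placement, attrs) ->
                attrs.forEach((k, v) -> all.put("tool/" + placement + "/" + k, v)));
        return all;
    }
}
```

Any replication scheme has to do the equivalent of snapshotAllAttributes(), or replicate each map independently.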

HttpSession Contract

Although the default Session, ContextSession and ToolSession implementations all implement the HttpSession interface, they do not satisfy the Servlet specification's requirement that all HttpSessionAttributeListeners "in the web application" be notified of setAttribute() and removeAttribute() calls.

UsageSessions

Sakai UsageSessions are (non-serializable) attributes of Sessions. They are part of the event management API and are typically created at user login. A UsageSessionService, implemented by UsageSessionServiceAdapter, is responsible for creating and persisting UsageSessions. So, while Sessions and UsageSessions are one-to-one, it is the latter that is written to the SAKAI_SESSION table. Thus the values of SAKAI_SESSION.SESSION_ID do not correlate with JSESSIONID cookie value substrings. UsageSession persistent fields:

Field | Note
SESSION_ID | UUID allocated at UsageSession creation. Not equivalent to Session IDs.
SESSION_SERVER | A concatenation of the configured server ID and a timestamp generated at the last initialization of BasicConfigurationService, i.e. the last startup of the Sakai instance to which the UsageSession belongs.
SESSION_USER | A pseudo-foreign key to SAKAI_USER_ID_MAP.USER_ID. Represents the end user associated with this UsageSession.
SESSION_IP | The remote IP associated with this UsageSession.
SESSION_USER_AGENT | A remote client descriptor, e.g. a browser identifier.
SESSION_START | UsageSession creation timestamp.
SESSION_END | UsageSession completion timestamp. A UsageSession is considered "active" if SESSION_START = SESSION_END.
SESSION_ACTIVE | Optimization for queries seeking "open" sessions.

Records in SAKAI_SESSION are used to correlate SAKAI_EVENT records with the remote user's identity, location and platform. Interestingly, not all SAKAI_EVENT records seem to have corresponding SAKAI_SESSION records. For example, in a development database, this query returns 1078 rows, 558 of which have null SAKAI_SESSION.SESSION_ID values:

SELECT E.SESSION_ID event_session_id, S.SESSION_ID session_id FROM SAKAI_EVENT E left outer join SAKAI_SESSION S on E.SESSION_ID = S.SESSION_ID where S.session_id is NULL

UsageSessions are also used in administrative tools for browsing active sessions on an installation-wide basis.

"Logout" of a UsageSession invalidates the current Sakai Session (which technically may or may not actually be the Session to which the UsageSession is bound).

UsageSessions can be used to determine the "actual" user's identity for any Sakai interaction. Tools and services which allow users to impersonate other users do so by modifying fields on Session, but leave UsageSession fields untouched.

Presence

"Presence" denotes the transient intersection of a user, a site, a page, and a tool, i.e. the navigational context targeted by the user's last "click." A site, page, and tool triplet is referred to as a "location ID". Page and tool identifiers are optional fields in a location ID. Internally, the default PresenceService implements "presence" as a protected, non-static inner class on org.sakaiproject.presence.impl.BasePresenceService. Instances of BasePresenceService.Presence have references to a UsageSession and a string representation of a location ID. These Presence objects are bound into Sakai Sessions and are (sort of) persisted to the Sakai database in the SAKAI_PRESENCE table. Since Presence objects reference UsageSessions, they take on the semantics of an application server-bound object. That is, given a Presence, one could theoretically determine not only the tool with which a user is interacting, but also the application server on which a user interaction is occurring. That said, not only is Presence not actually a part of the public PresenceService API, but the Presence-to-UsageSession binding in the Presence class does not appear to be necessary, since Presence only actually has need of the UsageSession's ID. Also note that the Admin On-Line tool (sakai.online) does not report on location-to-app server relationships.

When a user logs in to Sakai and the Presence tool (sakai.presence) is enabled, an iframe below the current site's page navigation causes the browser to issue a request to a "special" PortalHandler, encoding the current site ID into the request's URL. For example, the super user's MyWorkspace page navigation includes the following markup:

<div id="presenceWrapper">
  <div id="presenceTitle">Users present:</div>
  <iframe name="presenceIframe" id="presenceIframe" title="Users present:"
    frameborder="0" marginwidth="0" marginheight="0" scrolling="auto" src="http://localhost:8080/portal/presence/%7Eadmin" >
  </iframe>
</div>

Upon receiving the browser's request for this iframe's content, the portal's PresenceHandler constructs a "virtual" Placement instance, suffixing "-presence" to the current site ID to construct the ID field, and forwards control to the ActiveTool instance representing the sakai.presence tool. The sakai.presence tool, implemented by org.sakaiproject.presence.tool.PresenceTool, then invokes PresenceService.setPresence(), passing the Placement's ID as the location ID. setPresence() finds the current UsageSession, writes its ID and the received location ID through to SAKAI_PRESENCE, constructs a new BasePresenceService.Presence and places that object into the current "sakai.presence.service" ToolSession, using the location ID as the attribute key. Subsequent calls to setPresence() with the same location ID are short-circuited if the current "sakai.presence.service" ToolSession has a Presence attribute cached under that location ID. If the check for a cached Presence object in the session is successful, the Presence's TTL is extended. In all cases, setPresence() also walks the "sakai.presence.service" ToolSession, checking for and removing "expired" Presence instances. Removing a Presence object from the session has the side-effect of deleting the corresponding SAKAI_PRESENCE record.
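The setPresence() flow described above can be modeled roughly as follows. This is a simplified sketch, not the BasePresenceService code: the class name, the use of plain maps in place of the "sakai.presence.service" ToolSession and the SAKAI_PRESENCE table, the explicit `now` parameter, and the choice to extend the TTL before sweeping are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-user model of the setPresence() flow. */
final class PresenceSketch {
    private final long ttlMillis;
    // Stands in for the "sakai.presence.service" ToolSession: location ID -> expiry time.
    final Map<String, Long> toolSession = new ConcurrentHashMap<>();
    // Stands in for SAKAI_PRESENCE: location ID -> UsageSession ID.
    final Map<String, String> presenceTable = new ConcurrentHashMap<>();

    PresenceSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void setPresence(String usageSessionId, String locationId, long now) {
        if (!toolSession.containsKey(locationId)) {
            // No cached Presence: write through to the table first.
            presenceTable.put(locationId, usageSessionId);
        }
        // New or cached, the entry's TTL now runs from this call.
        toolSession.put(locationId, now + ttlMillis);

        // In all cases, sweep expired entries; removal also deletes the record.
        Iterator<Map.Entry<String, Long>> it = toolSession.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= now) {
                it.remove();
                presenceTable.remove(entry.getKey());
            }
        }
    }
}
```

The point to notice for clustering is that the session entry and the database record are kept in step only by code running on the node that holds the session.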

When a new Presence object is created, BasePresenceService.setPresence() fires a "pres.begin" event. When a Presence object is removed from a session Presence.valueUnbound() fires a "pres.end" event. These events are typically consumed by PresenceObservingCouriers the PresenceTool has placed into the ToolSession under the "observer" key. Placing PresenceObservingCouriers in the sessions allows the PresenceTool to avoid creating multiple couriers for a given user session and a given location ID. Note that the Presence and PresenceObservingCouriers objects are placed into different ToolSessions. The former are placed into a ToolSession representing the singleton, i.e. cross-tool, PresenceService. The latter are placed into ToolSessions representing locations. PresenceObservingCouriers are a means for simulating a "push" of new presence state to the client. For example, if the server "knows" that a user is no longer present at a particular location, events can effectively propagate to the client, resulting in refreshed on-screen displays of user presence.

... TODO: This probably deserves a diagram ...

Obstacles and Challenges

Custom Session Attribute Replication Mechanism

I don't think it's going to be valuable to just "try" container managed sessions to see if they'll work. Reimplementing Sakai-managed session storage to delegate to a container-managed session feels like non-trivial work, and we're not at all sure that container-managed session replication will be tunable enough to cope with Sakai's session footprint. (Although Terracotta might help.)

Instead, it seems simpler to focus exclusively on replicating attributes stored by Sakai-managed sessions. One way to do this is exactly what Ian suggested: delegate session attribute storage to the MemoryService, backed by one or more distributed ehcache cache instances. So, for example, assuming a cache structure such as:

  CacheManager ->
    SessionAttributesCache ->
      123456789-attrib-A -> value-A.1
      123456789-attrib-B -> value-B.1
      987654321-attrib-A -> value-A.2
      987654321-attrib-B -> value-B.2

A simplified Session.getAttribute() implementation might look like this:

public Object getAttribute(String name) {
  // Prefer the clustered value; fall back to the node-local map.
  Cache cache = cacheManager.getCache("SessionAttributesCache");
  Object cachedObject = cache.get(getId() + "-" + name);
  return cachedObject == null ? localAttribMap.get(name) : cachedObject;
}

A Session.setAttribute() implementation might look like this (listener callbacks, among other things, not shown):

public void setAttribute(String name, Object value) {
  // Only keys flagged as cacheable are written to the shared cache.
  if ( isCacheableSessionAttributeKey(name) ) {
    cacheManager.getCache("SessionAttributesCache").put(getId() + "-" + name, value);
  } else {
    localAttribMap.put(name, value);
  }
}

A per-session cache instance model is also possible, and without any more complicated code:

  CacheManager ->
      SessionCache-123456789 ->
         attrib-A -> value-A.1
         attrib-B -> value-B.1
      SessionCache-987654321 ->
         attrib-A -> value-A.2
         attrib-B -> value-B.2

public Object getAttribute(String name) {
  // One cache per session; getSessionCachePrefix() is assumed to return "SessionCache-".
  Cache cache = cacheManager.getCache(getSessionCachePrefix() + getId());
  Object cachedObject = cache.get(name);
  return cachedObject == null ? localAttribMap.get(name) : cachedObject;
}

public void setAttribute(String name, Object value) {
  if ( isCacheableSessionAttributeKey(name) ) {
    cacheManager.getCache(getSessionCachePrefix() + getId()).put(name, value);
  } else {
    localAttribMap.put(name, value);
  }
}

The cache instance model will probably be dictated by ehcache's behavior as the cache entry set size grows (e.g. long pauses as the data structure is resized/rehashed?) and the feasibility of dynamically instantiating distributed cache instances.

Update: 3/26/08

After much discussion of the relative merits of session attribute replication vs. a shared "session server," we've put work on the former on hold pending "lightweight" performance testing of the two approaches. Session attribute replication is appealing in that it avoids dependencies on a single point of failure and the overhead of serializing and de-serializing entire sessions on each request, but its configurational complexity and memory footprint implications are worrisome. That mutable objects are placed into the Sakai session is also problematic for a cache-backed session replication approach. A shared "session server," on the other hand, is appealing in its configurational and conceptual simplicity. There's no need to partition the cluster, there's no need to worry about flushing mutable objects in a cache, and application servers need only keep enough sessions in memory to service their current request load. This comes at the cost of reading an arbitrarily large object stream in at the beginning of a request, and writing a similar stream back out on reply. Clever diff algorithms may alleviate the per-request de/serialization overhead, but at that point we're heading in a direction which potentially adds inappropriate levels of complex, custom plumbing to Sakai. In the absence of a clear conceptual "winner," then, we've elected to pause and test each approach under load outside of a Sakai context.

Update: 4/1/08

Initial test results comparing attribute replication and session server performance have been collected into a spreadsheet. We've yet to see what kind of overhead we should expect from a truly distributed cache, so no conclusions can necessarily be drawn yet. So far, though, ehcache seems to be the clear winner, being roughly an order of magnitude faster than either Oracle or MySQL session storage. Ehcache still degrades rather steeply, though, as concurrent user loads increase, at least in a single-node configuration.

Code under test is in contrib. More detailed Grinder output can also be found below that directory.

Update: 4/2/08

Completed testing distributed ehcache in 2- and 3-node configurations for a variety of concurrent user loads, each user having a steadily growing session size up to approximately 100KB. Raw test output is available in a tar; graphs are available in a spreadsheet. Testing with Oracle did not proceed beyond a single-node, 8-user test since anything greater resulted in several hundred error transactions. Since MySQL did not perform significantly better, testing of that option was simply abandoned following a 1-node, 12-user test.

Each user transaction resulted in the creation of one map entry representing a session attribute. The combined serialized size of keys and values was approximately 100B, excluding the overhead of any corresponding data structures. In the case of database-backed session storage, the entire object representing the session is read from and written to storage on each request. Reports charting mean response time against session size take the additional overhead of serializing complete session objects into account. In the case of ehcache-backed session storage, each request generated at least one get on the cache, but only new key/value pairs were written on each request. Tomcats were restarted and caches were individually initialized and discovered prior to each test run.

Load balancing and sticky sessions were provided by Apache mod_jk.

Testing above 16 concurrent users was not feasible given available load generation hardware.

Observations, Conclusions and Outstanding Issues:

  1. Concurrent user load and/or total cache size has a far greater impact on ehcache performance than does the presence of additional nodes
  2. Ehcache's performance signature for a fixed total cache size across varying concurrent loads is not yet known. Given that ehcache generally exhibits near-constant time performance for the duration of any given load, though, we suspect concurrency has a far greater impact than does total cache size.
  3. Although Ehcache's performance degrades rather steeply (slightly less than 1ms per concurrent user or 100K of maximum cache size), it is more than an order of magnitude faster than either database option and degrades much less steeply. Further, we hope that this behavior represents something of a worst-case scenario, if we assume that in general sessions are more read- than write-heavy.
  4. Although a cache-based approach would seem to be the clear winner, and we do not intend to continue a "session server" approach, ehcache's clear degradation with increased load is somewhat concerning, as are certain problems with cache initialization timing. For example, even with each cache configured with an RMIBootstrapCacheLoaderFactory, we repeatedly observed the following behavior, in which a particular cache entry remains effectively invisible to the cluster. This leads us to believe that some ehcache customization/bug-fixing may be necessary. Terracotta may be another solution:
    1. Cache 1 starts up. Client puts something in it.
    2. Cache 2 starts up. Discovers cache 1. Reports a cache size of 1.
    3. Cache 3 starts up. Client puts something in it. put does not propagate to Cache 1 (i.e. still reports cache size of 1).
    4. Client puts something else in Cache 3. put does propagate to Cache 1 (i.e. now reports cache size of 2).

Reacting to Failover

My mental model of session failover assumes "sticky failover" whereby a load balancer directs a given user to a single application server node until that node fails, at which point the balancer selects another node for subsequent requests. This (theoretically) allows us to tolerate an asynchronous replication strategy and avoids excessive disruption to UsageSession behaviors and semantics. However, when a Session is "moved" from one server to another, UsageSessions either need to come along and have their server bindings updated, or they need to be reallocated. Reallocation makes the most sense from a semantic standpoint, but this is problematic because Sessions and UsageSessions may not agree on the current user's identity. Thus it will probably be necessary to re-implement (and modify the semantics of) UsageSession to be portable between app servers and to find a mechanism for mimicking the behavior of HttpSessionActivationListener, which a Servlet container is obligated to invoke when migrating a session from one node to another.

Non-Serializable and Large Attributes

Assuming an ehcache-backed MemoryService (and therefore session attribute store), non-Serializable session attribute values (of which UsageSession is one!) will not be distributable. Those that are distributable or can be made to be distributable may or may not be designed for efficient serialization. Some such objects may not be valuable enough to be replicated, but we at least need to have some idea of the extent to which Sakai sessions are plausibly distributable as is.

Code for intercepting and logging session attribute adds is in contrib: https://source.sakaiproject.org/contrib/unicon/tool/branches/session-clustering/. Methods for retrieving and/or logging per-session setAttribute() metrics are exposed via a JMX bean named bean:name=sessionJournaler. Note that java.util.Collection and java.util.Map types are expanded such that, for example, passing a java.util.ArrayList consisting of three java.lang.String instances to Session.setAttribute() will result in the java.lang.String counter for the given attribute key and the current tool being incremented three times. java.util.Map keys are ignored. As such, these reports represent approximate data points, at best.
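The expansion rule is easier to see in code. The sketch below is not the contrib journaler; the class name is made up, counters are kept per value type only (the real reports also key on attribute name and tool), and recursing into nested collections is an assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-type counter mirroring the expansion rules described above. */
final class AttributeTypeCounter {
    final Map<String, Integer> countsByType = new HashMap<>();

    void record(Object value) {
        if (value instanceof Collection) {
            // Collections are expanded: each element is counted on its own.
            for (Object element : (Collection<?>) value) {
                record(element);
            }
        } else if (value instanceof Map) {
            // Maps are expanded too, but only their values; keys are ignored.
            for (Object mapValue : ((Map<?, ?>) value).values()) {
                record(mapValue);
            }
        } else if (value != null) {
            countsByType.merge(value.getClass().getName(), 1, Integer::sum);
        }
    }
}
```

Because the container types themselves are never counted, a report built this way says nothing about the cost of the ArrayList or HashMap wrappers, which is one reason the numbers are approximate.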

I've yet to complete a full test script with Selenium, but here's an example of output from the session attribute interception collected during a sequence that involved login, worksite creation, forum creation, multiple topic creation, and a failed attempt at topic thread creation. Column legend:

(A) Session ID

(B) Tool ID

(C) Session Attribute Name

(D) Session Attribute Value Type (* == non-Serializable)

(E) Serializable

(F) Count

(G) Cumulative Serialized Size (bytes)

The serialized size listed is a simple sum which accumulates on each call to setAttribute(), even if the attribute value has not actually changed. Some objects are serializable but have zeroes in column G. This probably occurs when none of the object's fields are actually serializable or JDK-implemented serialization fails for any other reason. Calls to removeAttribute() are ignored.
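For reference, a serialized size of this kind can be obtained with plain JDK serialization along the following lines. This is a sketch with a hypothetical class name, not the journaler's code; returning 0 on failure is my reading of how zeroes could end up in column G.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

/** Hypothetical helper that measures JDK-serialized size. */
final class SerializedSize {
    /** Returns the serialized size in bytes, or 0 if serialization fails. */
    static int of(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            // NotSerializableException is an IOException, so it lands here too.
            return 0;
        }
        return bytes.size();
    }
}
```

Note that each measurement includes the object stream header, so summing per-call sizes slightly overstates what a single combined stream would cost.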

A Note on JSF

Clustering JSF presents (at least) two problems:

Replicating the component tree state from request to request

Theoretically, JSF could solve the first problem for us by serializing the component tree state into responses for client-side storage, as would occur when javax.faces.STATE_SAVING_METHOD is set to "client." For whatever reason, though (fear of tampering? bandwidth constraints? serialization bugs?), the only JSF tools currently configured for client-side state persistence are:

  1. Gradebook
  2. Roster
  3. Sections

Absent any new information on why this might be the case, the simplest solution seems to be to simply try turning on client-side state "saving" for other JSF tools. If client-side component state saving is unacceptable for some reason, we'll need to dig a bit to figure out what exactly the framework stashes in the session when configured for server-side component state saving. I assume all the javax.faces.component.UIViewRoot objects we see in the session are the result of server-side component state saving, but I don't yet understand why those objects aren't serializable or by what other means JSF expects them to be replicated by a clustered web application deployment.

Replicating session- and application-scoped backing bean state

Application-scoped managed beans: hopefully, these objects are initialized at app startup and treated as immutable thereafter. We see three application-scoped beans in the current trunk:

  1. Gradebook - loginAsBean (faces-test.xml) - Test configuration, so hopefully can be ignored
  2. JSF "Base" Module - Components (faces-config.xml) - Bridges the java.util.Map and org.sakaiproject.component.api.ComponentManager APIs. Accesses the ComponentManager via its static cover.
  3. Roster - services (faces-config.xml) - Something of a Registry into which JSF injects a variety of Sakai service objects. Effectively a more fine-grained version of the application-scoped "Components" bean in the JSF "Base" Module.

Since all three of these beans are either test objects or immutable after startup, it would seem they could be safely ignored for our purposes. Even so, we should probably review the spec to understand exactly how JSF expects these beans to be treated in a clustered webapp.

I know from development experience that JSF places restrictions on the serializability of managed bean properties, but it doesn't seem as if there are any restrictions on the beans themselves, which seems odd. More spec reading would seem to be in order here. For the time being, though, I'm assuming session-scoped managed beans will need to be carefully reviewed for serializability, especially those that cache domain objects with potentially very large associated graphs. Session-scoped beans which cache Sakai service objects should mark such fields as transient and implement lazy-loading logic to retrieve service references from the ComponentManager cover in the event of null references. This allows session-scoped beans to be both dependency injected and clusterable. None of this, however, addresses the issue of flushing the cache when a session-scoped bean's state changes (we discussed a similar issue above w/r/t Presence). If the framework does not set session beans into the session after each request, I believe we're left with either writing session bean state into the response (see below) or otherwise building phase awareness or request filtering into JSF tools such that session beans are flushed on every request.
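The transient-plus-lazy-loading pattern suggested above might look like the following. Everything here is hypothetical: ServiceRegistry stands in for the static ComponentManager cover, and TopicService and ForumBean are invented for the example. The copyViaSerialization() helper simply simulates what replication to another node would do to the bean.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical service interface; implementations need not be Serializable. */
interface TopicService {
    String titleOf(String topicId);
}

/** Hypothetical stand-in for the static ComponentManager cover. */
final class ServiceRegistry {
    static final Map<String, Object> SERVICES = new ConcurrentHashMap<>();

    static Object get(String name) {
        return SERVICES.get(name);
    }
}

/** Hypothetical session-scoped bean that survives serialization. */
class ForumBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String selectedTopicId;             // small, serializable state
    private transient TopicService topicService; // never written to the stream

    public void setSelectedTopicId(String id) { this.selectedTopicId = id; }
    public String getSelectedTopicId() { return selectedTopicId; }

    /** Still usable as an injection point. */
    public void setTopicService(TopicService service) { this.topicService = service; }

    /** Lazily re-resolves the service after deserialization nulls the field. */
    public TopicService getTopicService() {
        if (topicService == null) {
            topicService = (TopicService) ServiceRegistry.get("topicService");
        }
        return topicService;
    }

    /** Simulates replication by round-tripping the bean through JDK serialization. */
    static ForumBean copyViaSerialization(ForumBean bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bean);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ForumBean) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This keeps the bean clusterable but does nothing about the flushing problem: a replica only sees state changes if something writes the bean back to the session store.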

The Sakai FlowState component is worth consideration as an option for effectively avoiding session-scoped beans altogether (thereby side-stepping the cache flushing problem). This approach does not, however, obviate the requirement that beans to be written into the component tree state implement Serializable.

To-do

The following tasks are listed in an execution order which tries to attack core framework problems first and defer work on tool- and/or service-specific problems. This has the advantage of ensuring session re-implementation does not introduce regressions, since the presence of a session clustering capability should be completely non-disruptive if disabled. System-level performance testing is not listed since we're currently tracking it separately (internally). "Official" system-level QA is also tracked separately. The "Refactor UsageSession" and "Refactor Presence" line items are more logically aligned with "Phase 2" goals ("Per-Tool/Per-Service Refactoring"), but are so fundamental to Sakai session management (at least UsageSession, anyway) that they were included in "Phase 1" as part of the effort to deliver refactored framework session management services.

"Phase 0" - Evaluate Session Replication vs Session Persistence

See "info" block above, dated March 26, 2008.

"Phase 1" - Baselining and Refactoring Framework Session Management

Task

Notes

LOE

Script Acceptance Tests

Using Selenium, record click-through scenarios for each tool. Complete functional tests are out-of-scope, but we need some way to ensure changes to session management do not introduce regressions, and to assist with automated testing of failover scenarios. Estimate is fairly generous, but we've seen Selenium fail semi-randomly in ways that seem somehow related to Sakai framesets.

3d - 5d

Script Component-Level Unit Tests

This is intended to guard against regressions when re-implementing the SessionManager API. Currently, the default implementation has no direct automated tests.

2d

Script Component-Level Performance Tests

A separate effort will verify the system-level performance impact of session clustering against some baseline. This line item is intended to establish a component-level performance baseline, primarily for the SessionManager API, using methods similar to those demonstrated by clustered MemoryService performance tests. The mix of session attributes could be derived from reports generated by executing the scripts developed in the previous line item.

3d - 4d

Clustered Session Lookup/Initialization

Replicating session attributes is only part of the problem. The in-memory map of Session objects must also cluster such that a call to SessionManager.getSession(String) "works" on any node. One solution is to re-implement all Session properties as cacheable name-value pairs. ContextSession and ToolSession would need to be similarly re-implemented, as would collections of references to those objects in Session.

3d - 4d
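
For the clustered session lookup item above, one way to picture "Session properties as cacheable name-value pairs" is sketched below. All class names are hypothetical, and a plain ConcurrentHashMap stands in for whatever distributed cache would actually back the store; this is an illustration of the idea, not Sakai's SessionManager.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Session facade whose properties live in a shared store, not in instance fields.
final class StoreBackedSession {
    private final String id;
    private final Map<String, Object> store;

    StoreBackedSession(String id, Map<String, Object> store) {
        this.id = id;
        this.store = store;
    }

    String getId() {
        return id;
    }

    // Each property is a separate name-value pair keyed by session id.
    String getUserId() {
        return (String) store.get(id + "/userId");
    }

    void setUserId(String userId) {
        store.put(id + "/userId", userId);
    }
}

// One instance per node; all nodes are assumed to share the same store.
final class NodeSessionManager {
    private final Map<String, Object> store;

    NodeSessionManager(Map<String, Object> store) {
        this.store = store;
    }

    StoreBackedSession startSession(String id) {
        store.put(id + "/exists", Boolean.TRUE);
        return new StoreBackedSession(id, store);
    }

    // Works on any node because no node-local map of Session objects is consulted.
    StoreBackedSession getSession(String id) {
        return store.containsKey(id + "/exists") ? new StoreBackedSession(id, store) : null;
    }
}

final class ClusteredLookupDemo {
    public static void main(String[] args) {
        Map<String, Object> shared = new ConcurrentHashMap<>(); // stand-in for a distributed cache
        NodeSessionManager nodeA = new NodeSessionManager(shared);
        NodeSessionManager nodeB = new NodeSessionManager(shared);

        nodeA.startSession("s1").setUserId("admin");
        System.out.println(nodeB.getSession("s1").getUserId());
    }
}
```

ContextSession and ToolSession would presumably need the same treatment, with their keys namespaced under the owning session's id.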

Clustered Session Attributes

Theoretically, we will require more fine-grained control over the Session attributes to be cached locally and those which will be eligible for clustered storage (as shown schematically in code-snippets above). Configuration can be relatively static (i.e. executed at startup, encoded in Spring bean defs), but should be readily localizable. I suspect that having the ability to enable/disable clustered attribute storage at tool-by-tool as well as key-by-key granularities will be useful. Eventually, eliminating key-by-key options seems like a fine goal, since nothing should really be in the session which isn't absolutely necessary for a consistent user experience, IMHO.

2d
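
The tool-by-tool and key-by-key switches described for the clustered session attributes item could be modeled along these lines. ClusteredAttributePolicy is a hypothetical class; in practice it might be populated from Spring bean definitions at startup, but nothing here reflects an existing Sakai configuration API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical policy: decides whether a given tool/key pair is eligible for clustered storage.
final class ClusteredAttributePolicy {
    private final Set<String> clusteredTools = new HashSet<>();
    private final Set<String> excludedKeys = new HashSet<>(); // entries are "toolId|key"

    // Enable clustered attribute storage for every key of a tool.
    ClusteredAttributePolicy clusterTool(String toolId) {
        clusteredTools.add(toolId);
        return this;
    }

    // Opt a single key out again, keeping it node-local.
    ClusteredAttributePolicy excludeKey(String toolId, String key) {
        excludedKeys.add(toolId + "|" + key);
        return this;
    }

    // Attributes of tools that were never enabled stay local by default.
    boolean isClustered(String toolId, String key) {
        return clusteredTools.contains(toolId) && !excludedKeys.contains(toolId + "|" + key);
    }
}
```

A session implementation would consult isClustered(...) inside setAttribute and route the value either to a node-local map or to the clustered store. Defaulting to "local" keeps the clustering capability non-disruptive when nothing is configured.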

Refactor UsageSession

As mentioned elsewhere, some solution for this particular attribute will be necessary since not only is it non-serializable (and implemented by a non-static inner class), but it is by definition bound to a particular app server. One wonders if it is necessary at all, though. If it simply represents a session's current binding to a particular application server, the current user's canonical identity, and tracks a variety of other WWW-oriented attributes (remote IP, user agent, etc), is it really necessary to model this collection of attributes as a dedicated class? Perhaps it is, since objects of this class are persisted to the database, but some development work will be necessary to either:

  1. Completely discard it in favor of additional properties on Session (probably unacceptable)
  2. Refactor to top-level class implementing java.io.Serializable and devise and implement "lifecycle" callbacks to notify UsageSessions of Session migration

3d - 5d
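
Option 2 of the UsageSession item above (a top-level Serializable class plus "lifecycle" callbacks) might take a shape like the following. Both SessionMigrationListener and PortableUsageSession are hypothetical; no such callback contract exists in Sakai, and the field list is only a guess based on the attributes mentioned in the notes.

```java
import java.io.Serializable;

// Hypothetical callback contract invoked by the session framework after failover.
interface SessionMigrationListener {
    void sessionMigrated(String newServerId);
}

// Hypothetical top-level replacement for the non-static inner BaseUsageSession class.
class PortableUsageSession implements Serializable, SessionMigrationListener {
    private static final long serialVersionUID = 1L;

    private final String id;
    private final String userId;
    private final String ipAddress;
    private final String userAgent;
    private String serverId;     // the only state bound to a particular app server
    private int migrationCount;

    PortableUsageSession(String id, String userId, String ipAddress,
                         String userAgent, String serverId) {
        this.id = id;
        this.userId = userId;
        this.ipAddress = ipAddress;
        this.userAgent = userAgent;
        this.serverId = serverId;
    }

    // Rebind to the new node; a real implementation would also update the database row.
    @Override
    public void sessionMigrated(String newServerId) {
        if (!newServerId.equals(serverId)) {
            serverId = newServerId;
            migrationCount++;
        }
    }

    String getId() { return id; }
    String getUserId() { return userId; }
    String getIpAddress() { return ipAddress; }
    String getUserAgent() { return userAgent; }
    String getServerId() { return serverId; }
    int getMigrationCount() { return migrationCount; }
}
```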

Refactor Presence

Since Presence doesn't "need" UsageSession references, the simplest solution here is to promote the class out of BasePresenceService and refactor the UsageSession dependency to a string dependency (UsageSession.getId()). Thus Presence can fairly easily be refactored to implement Serializable. The bigger problem, I think, is propagating Presence state changes across the cluster. For example, calling Presence.setActive() updates an internal m_expireTime field. This modification will not be picked up automatically by most cache implementations. Potential solutions:

  1. Reimplement Presence to store each of its fields as discrete ToolSession attributes.
  2. Fire Events from Presence methods that update state such that a listener object can dirty the appropriate ToolSession attribute.
  3. Inline the event handling of the previous option directly in the Presence implementation.
  4. Encode the event handling of the previous option(s) in an AOP layer wrapped around Presence instances.
  5. Delegate cache-dirtying responsibilities to its client (PresenceService).
    The last option is probably the simplest and least error prone, especially if we take this opportunity to redesign the Presence interface to have value object semantics (note that Presence identity is completely transient; it isn't even written through to SAKAI_PRESENCE).

    The issue of non-Serializable PresenceObservingCouriers in the session will also need to be addressed, but that can be treated as a separate issue from Presence itself. See the "Refactor Non-Serializable Objects" line item below.

2d - 3d
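
The last Presence option above (value object semantics, with the client responsible for dirtying the cache) can be sketched as follows. PresenceValue, RecordingToolSession, and PresenceClient are hypothetical names; RecordingToolSession is a minimal stand-in for a ToolSession whose setAttribute call is assumed to be what triggers replication.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Immutable value object: state changes produce a new instance instead of mutating a field.
final class PresenceValue implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String usageSessionId; // a string id replaces the UsageSession reference
    private final String locationId;
    private final long expireTime;

    PresenceValue(String usageSessionId, String locationId, long expireTime) {
        this.usageSessionId = usageSessionId;
        this.locationId = locationId;
        this.expireTime = expireTime;
    }

    PresenceValue withExpireTime(long newExpireTime) {
        return new PresenceValue(usageSessionId, locationId, newExpireTime);
    }

    String getUsageSessionId() { return usageSessionId; }
    String getLocationId() { return locationId; }
    long getExpireTime() { return expireTime; }
}

// Minimal stand-in for a ToolSession; counts writes so replication triggers are visible.
class RecordingToolSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private int writes;

    void setAttribute(String key, Object value) {
        attributes.put(key, value);
        writes++;
    }

    Object getAttribute(String key) { return attributes.get(key); }
    int getWrites() { return writes; }
}

// The client (playing the PresenceService role) owns the cache-dirtying step.
final class PresenceClient {
    private PresenceClient() {}

    static void setActive(RecordingToolSession session, String key, long timeoutMillis, long now) {
        PresenceValue current = (PresenceValue) session.getAttribute(key);
        // Re-setting the attribute is what makes the change visible to the cache.
        session.setAttribute(key, current.withExpireTime(now + timeoutMillis));
    }
}
```

Because the value object cannot be mutated in place, there is no way to change Presence state without going through setAttribute, which is what makes this option comparatively hard to get wrong.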

"Phase 2" - Per-Tool/Per-Service Refactoring

Task

Notes

LOE

Services in Sessions

Service objects, i.e. Sakai "components," almost certainly have no business being placed into sessions, except possibly as transient properties of session-scoped JSF beans. If placing a service object in the session is truly unavoidable, it is almost certainly inappropriate to cluster its attribute key. In the latter case, some mechanism must exist for lazily locating the requested service object following session migration. We've not yet fully exercised Sakai with session instrumentation enabled, so the only such object of which we're currently aware is org.sakaiproject.announcement.impl.DbAnnouncementService, which the sakai.motd tool places into the session (unsure of the actual session type) under the "service" key.

2d/service

Refactor Non-Static Inner Classes

I anticipate issues serializing instances of non-static inner classes since I believe the JVM will attempt to include the enclosing outer class instance in the object stream, which is almost certainly not what we want, since that outer instance is almost always (if not certainly always) a singleton Sakai service object. Again, test scripts are not yet complete, but so far we've encountered the following such objects:

| Tool | Key | Type | Count |
| --- | --- | --- | --- |
| sakai.sitesetup | toolRegistrationList | org.sakaiproject.site.tool.SiteAction$MyTool | 36 |
| sakai.sitesetup | site.info | org.sakaiproject.site.tool.SiteAction$SiteInfo | 8 |
| sakai.sitesetup | icons | org.sakaiproject.site.tool.SiteAction$MyIcon | 4 |
| sakai.login | org.sakaiproject.event.api.UsageSessionService | org.sakaiproject.event.impl.UsageSessionServiceAdaptor$BaseUsageSession | 1 |
| sakai.presence | 1056ed49-6059-4814-adaf-e9a4a6a967ca-presence | org.sakaiproject.presence.impl.BasePresenceService$Presence | 1 |
| sakai.sitesetup | form_participantToAdd | org.sakaiproject.site.tool.SiteAction$Participant | 1 |
| sakai.presence | ~admin-presence | org.sakaiproject.presence.impl.BasePresenceService$Presence | 1 |

This is potentially quite disruptive work, although we're currently working under the assumption that dependencies on concrete API implementations are minimal.

1d - 5d/type
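
The inner-class concern in the item above can be demonstrated in isolation. In this sketch, SiteActionLike is a hypothetical, non-Serializable outer class standing in for a tool or service; the inner class deliberately reads an outer field so that it is certain to hold a reference to its enclosing instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical outer class; like a singleton service object, it is not Serializable.
class SiteActionLike {
    private final String toolName = "sakai.sitesetup";

    // Non-static inner class: carries an implicit reference to the enclosing instance.
    class InnerTool implements Serializable {
        private static final long serialVersionUID = 1L;
        String id = "inner";

        String owner() {
            return toolName; // uses the enclosing instance
        }
    }

    // Static nested class: no enclosing instance, so only its own fields are written.
    static class NestedTool implements Serializable {
        private static final long serialVersionUID = 1L;
        String id = "nested";
    }
}

final class InnerClassCheck {
    private InnerClassCheck() {}

    // Returns false when some object reachable from the argument is not Serializable.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With these definitions, serializing `new SiteActionLike().new InnerTool()` fails because the enclosing SiteActionLike is dragged into the stream, while `new SiteActionLike.NestedTool()` serializes cleanly. Had the outer class been Serializable, the inner instance would have serialized, but with the whole outer object attached, which is the footprint problem described above.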

Refactor Non-Serializable Objects

Such objects need to be excluded from session replication (and reasonable behavior tested for in failover situations), removed from sessions altogether, or refactored to Serializable implementations. Of particular concern are the JSF component trees. Again, test scripts are not yet complete, but so far we've encountered the following such objects:

| Tool | Key | Type | Count |
| --- | --- | --- | --- |
| sakai.sitesetup | toolRegistrationList | org.sakaiproject.site.tool.SiteAction$MyTool | 36 |
| sakai.sitesetup | site.info | org.sakaiproject.site.tool.SiteAction$SiteInfo | 8 |
| sakai.sitesetup | icons | org.sakaiproject.site.tool.SiteAction$MyIcon | 4 |
| sakai.forums | /jsp/dfReviseTopicSettingsAttach.jsp | javax.faces.component.UIViewRoot | 2 |
| sakai.presence | observer | org.sakaiproject.util.PresenceObservingCourier | 2 |
| sakai.forums | /jsp/discussionForum/forumsOnly/dfForums.jsp | javax.faces.component.UIViewRoot | 2 |
| sakai.motd | service | org.sakaiproject.announcement.impl.DbAnnouncementService | 2 |
| sakai.motd | sakai.menu | org.sakaiproject.cheftool.menu.MenuImpl | 2 |
| sakai.login | org.sakaiproject.event.api.UsageSessionService | org.sakaiproject.event.impl.UsageSessionServiceAdaptor$BaseUsageSession | 1 |
| sakai.iframe.myworkspace | sakai.menu | org.sakaiproject.cheftool.menu.MenuImpl | 1 |
| sakai.presence | 1056ed49-6059-4814-adaf-e9a4a6a967ca-presence | org.sakaiproject.presence.impl.BasePresenceService$Presence | 1 |
| sakai.forums | ForumTool | org.sakaiproject.tool.messageforums.DiscussionForumTool | 1 |
| sakai.iframe.site | sakai.menu | org.sakaiproject.cheftool.menu.MenuImpl | 1 |
| sakai.forums | /jsp/dfReviseForumSettingsAttach.jsp | javax.faces.component.UIViewRoot | 1 |
| sakai.forums | /jsp/discussionForum/message/dfAllMessages.jsp | javax.faces.component.UIViewRoot | 1 |
| sakai.forums | /jsp/dfCompose.jsp | javax.faces.component.UIViewRoot | 1 |
| sakai.sitesetup | form_participantToAdd | org.sakaiproject.site.tool.SiteAction$Participant | 1 |
| sakai.presence | ~admin-presence | org.sakaiproject.presence.impl.BasePresenceService$Presence | 1 |
| sakai.sitesetup | sakai.menu | org.sakaiproject.cheftool.menu.MenuImpl | 1 |
| sakai.iframe.service | sakai.menu | org.sakaiproject.cheftool.menu.MenuImpl | 1 |

1d - 5d/type

Refactor Large Objects

This particular item is difficult to estimate and we're not yet in a position to target particular classes, but we're including it here for completeness as performance testing results may dictate that session footprint be reduced in certain areas.

?

Documentation

Tool developers will need some kind of guidance for coding in a cluster-friendly manner.

2d - 3d