JCR rationale and implications

A place to pull together the ideas around JCR support in Sakai and to look at roadmap issues for deployment. Note that the terminology in the various relevant documents is inconsistent about (a) the base service implementing JSR 170 (which may be extended over time with modest Sakai features) and (b) a higher-level service implementing the current Content Hosting Service, but running on top of the base service implementing JSR 170. Please take care to distinguish which one a document is referring to. Here I refer to (a) as JCR Service and (b) as JCR-CHS.

Why is a JCR-backed Content Service a good idea?

We have a stable service for content already (CHS: Content Hosting Service) - Why change?

CHS is now a large, complex and difficult-to-maintain block of code
- This makes QA all the more difficult, makes issues harder to resolve once they are discovered, and places additional burdens on community developer resources.
- CHS has grown up organically around the particular requirements of Resources, Drop Box, etc., rather than around a mature, abstracted standard.
- While final testing results are not yet in, it's fairly clear that content performance could be much improved by off-loading the bulk of the CHS logic onto a JCR API.
Implementing a standard Java interface should allow us to substitute various commercial and open source content repository implementations, giving us
- some degree of 'future proofing' so if one implementation (say Jackrabbit) becomes noticeably faster than another implementation (say Xythos), institutions can switch without major disruption
- different institutions with different budgets can make different choices concerning price/license/performance issues among alternative repositories without significantly affecting the tools that use content hosting
- a richer set of opportunities for other campus integrations that can work against the same store or provide different views of it
JCR repositories already implement some advanced features that we want (e.g. versioning). Using these implementations, which include at least one compatible open source implementation (Jackrabbit), will save considerable development effort and reduce deployment risk, because large parts of the code are already production tested.
The available JCR implementations all exceed what we have been able to achieve in terms of quality, test coverage, performance, etc.
The JCR API has emerged from a great deal of industry knowledge exchange on the requirements of a content repository interface. It therefore represents a mature, relatively complete service that is noticeably simpler than the current Sakai service, so new development on top of it can be expected to be easier.
Whole new blocks of functionality may become easily available to Sakai if they are built on top of JCR (e.g. Alfresco Document Management).

Cons:

The development of the JCR standard/API/specification is not complete. JSR 170 will be replaced by JSR 283 sometime within the next 2 years.
We lose some control over the service if we wish to comply with the standard and gain the benefits of interoperability.
Different implementations have interpreted the standard differently and some are incompatible with one another, so full interoperability is unlikely.

So what is the proposal and what are the implications for users, production and development?

The long-term vision is that a JCR-backed Content Service will replace Content Hosting Service and all tools manipulating content will use the new service. The new service (JCR Service) will be a minimal shim on top of a JSR 170/JSR 283 interface to a compliant repository. Sakai will ship with Jackrabbit as the default repository implementation accessed through the new service.

Important considerations:

The JCR Service is able to run side by side with the existing Content Hosting Service, but content will not be accessible to both services without additional code (see JCR-CHS below).
An implementation of the current Content Hosting Service on top of JCR (JCR-CHS) can increase data compatibility and allow existing tools to access the JCR Service without modification, but due to the complexity of CHS it would be difficult to maintain and probably prohibitively expensive to develop for multiple versions.
The JCR Service is reasonably loosely coupled and can readily be made to work with 2.5, 2.4, 2.3 and probably 2.2 (it is assumed it will be a primary service from 2.6 onwards).
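As a rough illustration of the two layers discussed above, the following is a minimal sketch only, not the actual Sakai or javax.jcr code. Every name in it (ContentStore, InMemoryStore, JcrServiceSketch, LegacyContentApi, JcrChsSketch, the "/sakai" path prefix) is a hypothetical stand-in chosen for the example. It assumes the JCR Service is a thin delegate over a repository, and that JCR-CHS is an adapter exposing a much-reduced legacy-style API on top of it, so that content written by a legacy tool is visible to a new tool using the JCR Service directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the small slice of a repository used here.
// This is NOT javax.jcr; a real shim would work against the JSR 170 API.
interface ContentStore {
    void put(String path, byte[] data);
    byte[] get(String path); // returns null when nothing is stored at path
}

// In-memory stand-in for a compliant repository (e.g. Jackrabbit or Xythos).
class InMemoryStore implements ContentStore {
    private final Map<String, byte[]> nodes = new HashMap<>();

    @Override
    public void put(String path, byte[] data) {
        nodes.put(path, data.clone());
    }

    @Override
    public byte[] get(String path) {
        byte[] data = nodes.get(path);
        return data == null ? null : data.clone();
    }
}

// (a) "JCR Service": a thin shim that only delegates to the repository.
class JcrServiceSketch {
    private final ContentStore store;

    JcrServiceSketch(ContentStore store) {
        this.store = store;
    }

    void save(String path, byte[] data) {
        store.put(path, data);
    }

    byte[] load(String path) {
        return store.get(path);
    }
}

// Hypothetical, much-reduced legacy API standing in for CHS.
interface LegacyContentApi {
    void storeResource(String resourceId, byte[] content);
    byte[] fetchResource(String resourceId);
}

// (b) "JCR-CHS": the legacy-style API implemented on top of the shim.
class JcrChsSketch implements LegacyContentApi {
    private static final String ROOT = "/sakai"; // assumed id-to-path mapping
    private final JcrServiceSketch jcr;

    JcrChsSketch(JcrServiceSketch jcr) {
        this.jcr = jcr;
    }

    @Override
    public void storeResource(String resourceId, byte[] content) {
        jcr.save(ROOT + resourceId, content);
    }

    @Override
    public byte[] fetchResource(String resourceId) {
        return jcr.load(ROOT + resourceId);
    }
}

class ShimDemo {
    public static void main(String[] args) {
        JcrServiceSketch jcr = new JcrServiceSketch(new InMemoryStore());
        LegacyContentApi chs = new JcrChsSketch(jcr);

        // A legacy tool writes through the CHS-style API...
        chs.storeResource("/group/site1/notes.txt",
                "hello".getBytes(StandardCharsets.UTF_8));

        // ...and a new tool reads the same content through the JCR Service.
        byte[] viaJcr = jcr.load("/sakai/group/site1/notes.txt");
        System.out.println(new String(viaJcr, StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

In this sketch, swapping the repository means supplying a different ContentStore implementation, while the adapter is the only place that knows about the legacy API. The real CHS surface is far larger, which is where the maintenance cost noted above would come from.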

There are some options for shorter-term transition scenarios
A. The new service remains in trunk, does not go into 2.5, and is included in the 2.6 release in Spring 2009

Between now and Spring 2009, we write new tools and convert existing CHS-dependent tools to use the new service. 2.6 becomes a major release with extensive testing of all CHS-dependent tools required and a considerable resource requirement for any bug-fixing across the suite of tools. Also, this is the sort of large project touching many tools which the community has struggled to execute in a timely fashion.
Between now and Spring 2009, we write multiple versions of JCR-CHS so we can support CHS tools on various Sakai versions. We also write CHS unit tests to confirm performance of the various versions. The 2.6 release deprecates CHS in favour of JCR-CHS and introduces JCR Service. For 2009-2010 installations, support for new JCR-based tools becomes possible and legacy tools continue to work (reduced testing load, at the cost of a reduced pace of innovation and significant 'extra' development for the JCR-CHS versions).
...

B. The new service goes into 2.5 release and is switched on by a few institutions

Only JCR Service goes in, and new tools that depend on JCR Service can be run in production by willing institutions. Advantage: production experience is gained earlier.
JCR Service and JCR-CHS go into 2.5 and legacy tools can begin to store content in JCR. Legacy tools migrate to JCR Service as and when development and testing resources become available.
...

C. The new service goes into 2.5.x (the maintenance branch)

This would happen immediately following the release, to keep it out of the release.
This would, however, violate the expectation that a maintenance branch includes bug fixes only.
It would also complicate a future maintenance release of 2.5.
...

D. The new service goes into a new 2.5 branch

It would differ from 2.5.x only in the JCR Service and Jackrabbit implementation.
Either someone would have to volunteer to maintain this branch and do a lot of redundant merges, or a way would have to be worked out for 2.5.x commits to be applied automatically to this other branch as well.
This would be a greater maintenance burden, but it would keep a "cleaner" release while still reducing the burden of testing and development with JCR.
...

E. The new service and Jackrabbit implementation go toward a special release in May or June of this year

Apart from the JCR bits, it would be identical to 2.5.
This would allow more to be done with it: performance testing, JCR-CHS, testing of migration from CHS storage to JCR, and fuller documentation.
This would, however, involve some "forking" of community effort (effort that might otherwise go toward a 2.5 maintenance release).
It would answer the issue of having to wait too long for the JCR Service, while not bringing new risks to the 2.5.0 release.
...

Should we put JCR Service in 2.5? Why? What about JCR-CHS?

There needs to be a path for new services to get into production. The historic practice has been to include the code in a release as 'disabled by default', so that an institution wishing to use the new service in production is required to activate the service and acknowledge the associated risk. But there should be NO risk to other production installations. Putting JCR into 2.5 in the manner proposed would follow this pattern, but this by itself does not create the need for it to be in 2.5.0 (vs. 2.5.1 or 2.6).
JCR Service is already in trunk, so putting it into 2.5.0 (disabled) would keep 2.5 closer to trunk than if JCR Service is left out, reducing the likelihood of merge conflicts with 2.5.x and future maintenance releases.
- A production site that has done extensive testing of 2.5.0 may be reluctant to add it in with a point release (like 2.5.1), as resources may not be available to repeat full testing.
Current plans call for a 2.6 release in Spring 2009, which would be a long time to wait for an important new service.
A JCR service and implementation would require significant testing before it could be fully supported in a release. The standard QA period would not suffice. Having it available for a broad range of testing and exploration is an important measure for the QA of the next release of Sakai.
An institution that wants to deploy tools using the new service would need to wait for summer 2009 or deploy from trunk, which would be a considerable resource overhead and disincentive.
- Tools that institutions may wish to pilot (or even deploy) on top of JCR Service include 'New Email Archive', 'Assignments2', ...
Pre-production load testing (e.g. at Michigan) is likely to take place on releases only, so without JCR Service in the 2.5 release, discovery of a production load issue is deferred or passed to risk-tolerant institutions that activate the JCR Service by moving it in from trunk.

Postscript on JCR vs JSR

My (JRN) understanding of the use of the term JCR matches Wikipedia (based on a cursory scan), i.e. JCR describes the 'class' (or something like that) of standard/specification of which JSR 170 is the first version and JSR 283 is the second version. Part of the reason for referring to JCR Service is that I hope we will develop it to implement JSR 283 as it comes along without fundamentally changing the existing service features.

http://en.wikipedia.org/wiki/Content_repository_API_for_Java

By this terminology both Jackrabbit and Xythos are implementations of the JCR API (current version: JSR 170). They are also colloquially referred to as "JCRs", as in Java Content Repositories.