Overview

There is currently a beta-quality JCR implementation of Content Hosting in trunk. This work was originally done and documented in SAK-10366. There is some good historical information available there too, but it is now deprecated. This page deals specifically with JCR information related to the ContentHostingService and parts of the Resources tool. For general information about JSR-170 support in Sakai, see here.

The very first phase of JCR integration for Resources is an implementation of the existing ContentHostingService API using a JCR backend. With this initial support, the Resources Tool and Sakai DAV are meant to operate as they stand with no changes to their code.

Installing the JCR ContentHostingService

Sakai Trunk

Note: Some work on this is currently occurring in content branch SAK-12105. It should be merged back within a few weeks. SG - Nov 13, 2007

If Sakai is already built and deployed, remove the following directory from the Tomcat deployment:
  - tomcat/components/sakai-content-pack

Then checkout and build the content branch:

```
svn co https://source.sakaiproject.org/svn/content/branches/SAK-12105
mvn clean install sakai:deploy -f SAK-12105/pom.xml -PJCR
```

Sakai 2.5.x

TODO

Sakai 2.4.x

This needs to be built with Maven 2, and almost certainly requires pulling in the DB and Entity modules from trunk. It also requires pulling in util, but we did that above when installing the jackrabbit service. If you are using a different implementation such as Xythos, you will also need to install trunk util as described above.

Checkout and install DB from trunk
```
svn co https://source.sakaiproject.org/svn/db/trunk db
mvn clean install sakai:deploy -f db/pom.xml
```
- If you can, run maven und from 2.4.x db; otherwise remove the following from your Tomcat:
  - tomcat/components/sakai-db-pack
  - tomcat/shared/lib/sakai-db-api
Checkout and install entity from trunk
```
svn co https://source.sakaiproject.org/svn/entity/trunk entity
mvn clean install sakai:deploy -f entity/pom.xml
```
- Again, if you can't use maven und, manually remove the following:
  - tomcat/components/sakai-entity-pack
  - tomcat/shared/lib/sakai-entity-api
At this point there might be two versions of Hibernate in tomcat/shared/lib.
Remove Hibernate 3.1.3 and leave 3.2.5ga.
Checkout and install the new content code
```
svn co https://source.sakaiproject.org/svn/content/branches/SAK-12105
mvn clean install sakai:deploy -f SAK-12105/pom.xml -PJCR
```
- If you can, run maven und from 2.4.x content; otherwise remove the following from your Tomcat:
  - tomcat/components/sakai-content-pack

Content Migration

Unlike some of the other data upgrades that have been performed on Content Hosting, the conversion to a JCR implementation requires copying all of the data over to another repository. Below is a description of the first algorithm we are working on to perform the migration.

This first version:

- Doesn't try to do anything fancy.
- Is not parallelized in any way. It can only run on one node in the cluster at a time.

Depending on how testing goes and the needs of other universities, this may be spiffed up a bit. Otherwise, if it runs in a reasonable amount of time and tests out, it may stand as is.

Migration algorithm in a Nutshell

Create a table to store path and migration status for each item. The migration will run through the table top to bottom, copying the items over and marking them finished when done.
Copy all the entries in CONTENT_COLLECTION to the migration table. Then copy all the entries in CONTENT_RESOURCE to the migration table. This means that we will copy all the collections/folders over first, so they exist when we try to copy the files over.
Start copying items over.
Listen for Content Events that create, edit, or delete content items, and append these to the end of the migration table.
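
The last step above, appending content events to the end of the migration table, can be sketched as follows. This is a minimal illustration in Python with SQLite, not the actual Sakai code (which is Java). The `open_queue` and `record_content_event` names and the `ID` ordering column are hypothetical; the table name, the other columns, and the event names come from this page.

```python
import sqlite3

# Event types that get queued, as named on this page.
QUEUED_EVENTS = {"content.add", "content.write", "content.delete"}


def open_queue(path=":memory:"):
    """Open a database and make sure the migration table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS MIGRATE_CHS_CONTENT_TO_JCR ("
        " ID INTEGER PRIMARY KEY AUTOINCREMENT,"  # hypothetical: preserves row order
        " CONTENT_ID TEXT NOT NULL,"
        " STATUS INTEGER NOT NULL,"
        " EVENT_TYPE TEXT NOT NULL)"
    )
    return conn


def record_content_event(conn, event_type, content_id):
    """Append a not-started (status 0) row for a queued event type.

    Returns True if a row was appended, False if the event was ignored.
    """
    if event_type not in QUEUED_EVENTS:
        return False
    conn.execute(
        "INSERT INTO MIGRATE_CHS_CONTENT_TO_JCR (CONTENT_ID, STATUS, EVENT_TYPE)"
        " VALUES (?, 0, ?)",
        (content_id, event_type),
    )
    conn.commit()
    return True
```

Because rows are only ever appended, a later event for the same item always sits below the earlier ones, which is what lets the migration process the table strictly top to bottom.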

Detailed Description of Algorithm

Each row in the MIGRATE_CHS_CONTENT_TO_JCR table contains the CONTENT_ID, the Status, and the Event Type. It's important that the table be interpreted as existing linearly in time: any row further down the table is expected to model a content event that occurred after the ones above it. The table rows must therefore be processed in order from top to bottom.

| Status | Code |
| --- | --- |
| Not Started | 0 |
| Finished | 1 |

Those are the only two. In the event that we add the ability to run this in parallel on multiple cluster nodes, we will certainly have to add more status types.

| Event Type | Description |
| --- | --- |
| ORIGINAL_MIGRATION | The item was part of the original table copy. |
| content.add | The item was added to the migration table as the result of receiving a content.add event. |
| content.write | Same idea as above, for writes. |
| content.delete | Same idea as above, for deletes. |

Starting Migration
- Check to see if migration has ever started before. This is done by counting the rows in MIGRATE_CHS_CONTENT_TO_JCR. If there are any rows in the table it means the migration has been started previously.
- If the migration is starting for the first time, the existing CHS data is added to the table.
  - The COLLECTION_IDs from CONTENT_COLLECTION are added to MIGRATE_CHS_CONTENT_TO_JCR with a status of 0 and an event type of ORIGINAL_MIGRATION.
  - The RESOURCE_IDs from CONTENT_RESOURCE are added to MIGRATE_CHS_CONTENT_TO_JCR with a status of 0 and an event type of ORIGINAL_MIGRATION.
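
The start-up check and the initial seeding could look roughly like the sketch below. It is written in Python with SQLite for illustration only and is not the actual Sakai code. The `start_migration` name and the `ID` ordering column are assumptions; the table and column names are the ones used on this page.

```python
import sqlite3

STATUS_NOT_STARTED = 0
ORIGINAL_MIGRATION = "ORIGINAL_MIGRATION"


def start_migration(conn):
    """Seed the migration table the first time the migration starts.

    Returns True if the table was seeded, False if a previous run
    had already put rows in it.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS MIGRATE_CHS_CONTENT_TO_JCR ("
        " ID INTEGER PRIMARY KEY AUTOINCREMENT,"  # hypothetical: preserves row order
        " CONTENT_ID TEXT NOT NULL,"
        " STATUS INTEGER NOT NULL,"
        " EVENT_TYPE TEXT NOT NULL)"
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM MIGRATE_CHS_CONTENT_TO_JCR"
    ).fetchone()
    if count > 0:
        return False  # any existing row means the migration was started before

    # Collections go in first so folders exist before their files are copied.
    conn.execute(
        "INSERT INTO MIGRATE_CHS_CONTENT_TO_JCR (CONTENT_ID, STATUS, EVENT_TYPE)"
        " SELECT COLLECTION_ID, ?, ? FROM CONTENT_COLLECTION",
        (STATUS_NOT_STARTED, ORIGINAL_MIGRATION),
    )
    conn.execute(
        "INSERT INTO MIGRATE_CHS_CONTENT_TO_JCR (CONTENT_ID, STATUS, EVENT_TYPE)"
        " SELECT RESOURCE_ID, ?, ? FROM CONTENT_RESOURCE",
        (STATUS_NOT_STARTED, ORIGINAL_MIGRATION),
    )
    conn.commit()
    return True
```

Calling `start_migration` a second time is a no-op, which matches the rule that a non-empty table means the migration has already been started.
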
During Migration
Each round of data migration consists of starting a TimerTask, which fetches n unfinished items from the MIGRATE_CHS_CONTENT_TO_JCR table and copies them to the JCR Repository. The timer tasks all use one Timer, and a task does not start until the previous one finishes. A configurable delay time t specifies how long to wait between batches.
- Fetch the next n unfinished items from the MIGRATE_CHS_CONTENT_TO_JCR table.
- For each item:
  - If the item is a ContentCollection and the event type is ORIGINAL_MIGRATION, content.add, or content.write, copy the ContentCollection to JCR. If the collection already exists in JCR, do not delete and re-add it; just overwrite the metadata properties and remove any properties that are not in the source collection.
  - If the item is a ContentCollection and the event type is content.delete, remove the collection node from JCR. If the collection was later re-added in Resources, the content.add event for it will be further down the queue, so it will be recreated at that point.
  - If the item is a ContentResource and the event type is ORIGINAL_MIGRATION, content.add, or content.write, we delete the file node in JCR and recreate it by copying the resource over from CHS. This differs from the ContentCollection case, where we did not remove the node before recreating it because it was a folder and we did not want to destroy the files and folders inside it. A resource file will never have children. (In a pure JCR world this is possible, but the original ContentHosting has nothing modeled like this.)
  - If the item is a ContentResource and the event type is content.delete, we delete the file node from JCR completely.
  - After operating on the item, we update its row in MIGRATE_CHS_CONTENT_TO_JCR and set the status to 1, finished.
- After finishing all the content items in the batch, we reschedule this TimerTask setting the delay to the configurable batch delay t.
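
The per-item rules above can be condensed into a small sketch. This is an illustration in Python, not the Java TimerTask itself, and it makes several loud assumptions: both repositories are modeled as plain dictionaries keyed by path, the queue is a list of dictionaries, a collection is recognized by a trailing "/" in its id, and a copy event whose source item no longer exists is simply marked finished. The `process_batch` and `is_collection` names are hypothetical. Scheduling and the delay t are left out.

```python
COPY_EVENTS = {"ORIGINAL_MIGRATION", "content.add", "content.write"}


def is_collection(content_id):
    # Assumption: collection (folder) ids end with "/".
    return content_id.endswith("/")


def process_batch(queue, chs, jcr, n):
    """Process up to n unfinished rows in table order.

    queue: list of {"content_id", "status", "event_type"} rows, oldest first
    chs:   dict path -> {"props": {...}, "body": bytes} (stand-in for CHS)
    jcr:   dict path -> node (stand-in for the JCR repository)
    Returns the number of rows handled in this batch.
    """
    batch = [row for row in queue if row["status"] == 0][:n]
    for row in batch:
        cid, event = row["content_id"], row["event_type"]
        if event in COPY_EVENTS:
            source = chs.get(cid)
            if source is None:
                # Assumption: the item was removed after this row was queued;
                # a content.delete row further down the table covers it.
                pass
            elif is_collection(cid):
                # Keep the existing folder node (and its children); replacing
                # the property map overwrites values and drops stale ones.
                node = jcr.setdefault(cid, {"props": {}})
                node["props"] = dict(source["props"])
            else:
                # Files are deleted and recreated from the source.
                jcr.pop(cid, None)
                jcr[cid] = {"props": dict(source["props"]),
                            "body": source.get("body")}
        elif event == "content.delete":
            if is_collection(cid):
                # Removing a folder node removes everything beneath it.
                for key in [k for k in jcr if k.startswith(cid)]:
                    del jcr[key]
            else:
                jcr.pop(cid, None)
        row["status"] = 1  # finished
    return len(batch)
```

Because every branch tolerates a node that already exists (or is already gone), re-running a row that was interrupted part way through gives the same result as running it once.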

Edge cases

What if the server crashes?
If the server crashes during a batch of copies, the copy that was in progress will still be marked 0 in the table when the server starts up again. The copier always handles the case where the node already exists in JCR, so it will just overwrite the node and continue.

How do I switch when the conversion is done?
This is a tricky situation. The migration appears to be done when there are no entries in the table with status 0, but whenever someone causes a content event to happen, a new migration entry appears. The very end of the migration may need to coincide with some sort of downtime to seal the deal and switch implementations.
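
A cut-over check along these lines might be run once content changes have been stopped for the downtime. It is only a sketch in Python with SQLite; the `unfinished_count` and `ready_to_switch` names are hypothetical, and the table and status code are the ones described above.

```python
import sqlite3


def unfinished_count(conn):
    """Count migration rows that are still marked 0 (not started)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM MIGRATE_CHS_CONTENT_TO_JCR WHERE STATUS = 0"
    ).fetchone()
    return count


def ready_to_switch(conn):
    """True when no unfinished rows remain.

    This is only meaningful while no new content events can arrive,
    since each new event appends another status-0 row.
    """
    return unfinished_count(conn) == 0
```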

Testing Migration Integrity

To test the integrity of the migration, a random sample of files and folders will be chosen for comparison. For each sampled item we will fetch the ContentCollection or ContentResource from both implementations and compare the properties, along with some of the methods that determine properties such as conditional release. The byte streams will be compared as well, though perhaps less often, depending on how long each comparison takes.
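
The sampling comparison could be sketched as below. This is an illustration in Python, not the planned test code. It assumes both implementations can be viewed as dictionaries mapping a content id to its properties and, for files, a byte body. The `compare_sample` name and the `byte_check_rate` parameter are hypothetical.

```python
import random


def compare_sample(ids, old_impl, new_impl, sample_size,
                   byte_check_rate=0.1, rng=None):
    """Compare a random sample of items between two implementations.

    Properties are compared for every sampled item; bodies are compared
    only for a fraction of files (byte_check_rate), since that is slower.
    Returns the ids whose copies do not match.
    """
    rng = rng or random.Random()
    id_list = sorted(ids)
    sample = rng.sample(id_list, min(sample_size, len(id_list)))
    mismatches = []
    for cid in sample:
        old, new = old_impl.get(cid), new_impl.get(cid)
        if old is None or new is None or old["props"] != new["props"]:
            mismatches.append(cid)
        elif "body" in old and rng.random() < byte_check_rate:
            if old["body"] != new.get("body"):
                mismatches.append(cid)
    return mismatches
```

Passing a seeded `random.Random` makes a failing sample reproducible, which helps when a mismatch needs to be investigated.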