There is currently a beta-quality JCR implementation of Content Hosting in trunk. This work was originally done and documented in SAK-10366. There is some good historical information available too, but it is now deprecated. This page deals specifically with JCR information related to the ContentHostingService and parts of the Resources tool. For general information about JSR-170 support in Sakai see here.

The very first phase of JCR integration for Resources is an implementation of the existing ContentHostingService API using a JCR backend. With this initial support, the Resources Tool and Sakai DAV are meant to operate as they stand with no changes to their code.

Installing the JCR ContentHostingService

Sakai Trunk

Note: Some work on this is currently occurring in content branch SAK-12105. It should be merged back within a few weeks. SG - Nov 13, 2007

The Content Hosting service on JCR is available in trunk and can be built as part of the framework profile. Deploy the framework with:

No Format

mvn -Pframework clean install sakai:deploy -Dmaven.tomcat.home=/tomcathome; 
rm -rf /tomcathome/components/sakai-content-pack; 
cd content; 
mvn -Pframework-jcr clean install sakai:deploy  -Dmaven.tomcat.home=/tomcathome 

In addition, you can use the full-jcr maven profile to install everything with JCR including the content-providers, etc.

  1. If Sakai is already built and deployed, remove the following directory from the Tomcat deployment:
    tomcat/components/sakai-content-pack
  2. Then checkout and build the content branch
    No Format
    svn co https://source.sakaiproject.org/svn/content/branches/SAK-12105
    mvn clean install sakai:deploy -f SAK-12105/pom.xml -PJCR
    
  3. At the moment there is an issue with the content provider component. After installing, remove components/sakai-content-providers-pack.

Sakai 2.5.x

TODO: This is probably the same as the Sakai Trunk instructions; it just hasn't been tried out yet.

...

  1. Checkout and install DB from trunk
    No Format
    svn co https://source.sakaiproject.org/svn/db/trunk db
    mvn clean install sakai:deploy -f db/pom.xml
    
    • If you can, run maven und from 2.4.x db; otherwise, remove the following from your Tomcat:
      • tomcat/components/sakai-db-pack
      • tomcat/shared/lib/sakai-db-api
  2. Checkout and install entity from trunk
    No Format
    svn co https://source.sakaiproject.org/svn/entity/trunk entity
    mvn clean install sakai:deploy -f entity/pom.xml
    
    • Again, if you can't use maven und, manually remove the following:
      • tomcat/components/sakai-entity-pack
      • tomcat/shared/lib/sakai-entity-api
  3. At this point there might be two versions of Hibernate in tomcat/shared/lib.
    Remove Hibernate 3.1.3 and leave 3.2.5ga.
  4. Checkout and install the new content code
    No Format
    svn co https://source.sakaiproject.org/svn/content/branches/SAK-12105
    mvn clean install sakai:deploy -f SAK-12105/pom.xml -PJCR
    
    • If you can, run maven und from 2.4.x content; otherwise, remove the following from your Tomcat:
      • tomcat/components/sakai-content-pack
  5. At the moment there is an issue with the content provider component. After installing, remove components/sakai-content-providers-pack.

Enabling JCR Content

JCR is disabled by default. To enable it, you can either switch over the component beans or use the JCR Inspector to switch over in real time.

Using JCR Inspector to switch over in real time

The JCR Inspector has controls to switch over to using the JCR Content Service (and switch back). They are located on the Import Legacy CHS Data view. Click on the buttons to switch.

Content Migration

Unlike some of the other data upgrades that have been performed on Content Hosting, the conversion to a JCR implementation requires copying all of the data over to another repository. Below is a description of the first algorithm we are working on to perform the migration.

...

  1. Starting Migration
    • Check to see if migration has ever started before. This is done by counting the rows in MIGRATE_CHS_CONTENT_TO_JCR. If there are any rows in the table it means the migration has been started previously.
    • If the migration is starting for the first time, the existing CHS data is added to the table.
      • The COLLECTION_IDs from CONTENT_COLLECTION are added to MIGRATE_CHS_CONTENT_TO_JCR with a status of 0 and an event type of ORIGINAL_MIGRATION.
      • The RESOURCE_IDs from CONTENT_RESOURCE are added to MIGRATE_CHS_CONTENT_TO_JCR with a status of 0 and an event type of ORIGINAL_MIGRATION.
  2. During Migration
    Each round of data migration consists of starting a TimerTask, which fetches n unfinished items from the MIGRATE_CHS_CONTENT_TO_JCR table and copies them to the JCR repository. The timer tasks all use one Timer, and a task does not start until the previous one finishes. A configurable delay time t specifies how long to wait between batches.
    • Fetch the next n unfinished items from the MIGRATE_CHS_CONTENT_TO_JCR table.
    • For each item:
      • If the item is a ContentCollection and the event type is ORIGINAL_MIGRATION, content.add, or content.write, copy the ContentCollection to JCR. If the collection already exists in JCR, do not delete and re-add it; just overwrite the metadata properties and remove any properties that are not in the source collection.
      • If the item is a ContentCollection and the event type is content.delete, remove the collection node from JCR. If the collection was later re-added in Resources, the content.add event for it will be further down the queue, so it will be recreated in that case.
      • If the item is a ContentResource and the event type is ORIGINAL_MIGRATION, content.add, or content.write, we delete the file node in JCR and recreate it by copying the resource over from CHS. This differs from the ContentCollection case, where we did not remove the node before recreating it: a collection is a folder, and we did not want to destroy the files and folders inside it. A resource file, by contrast, never has children. (In a pure JCR world this is possible, but the original ContentHosting has nothing modeled like this.)
      • If the item is a ContentResource and the event type is content.delete, then we delete the file node from JCR completely.
      • After operating on the item, we update its row in MIGRATE_CHS_CONTENT_TO_JCR and set the status to 1, finished.
    • After finishing all the content items in the batch, we reschedule this TimerTask setting the delay to the configurable batch delay t.
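
The steps above can be sketched in Java as follows. This is a minimal in-memory illustration under stated assumptions, not the actual Sakai code: MigrationRow, MigrationSketch, and the two maps standing in for the legacy CHS store and the JCR repository are hypothetical names introduced only for this example. A real implementation would read and update the MIGRATE_CHS_CONTENT_TO_JCR table through the database and write real JCR nodes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one row of MIGRATE_CHS_CONTENT_TO_JCR. */
class MigrationRow {
    final String contentId;
    final boolean collection; // true = ContentCollection, false = ContentResource
    final String eventType;   // ORIGINAL_MIGRATION, content.add, content.write, content.delete
    int status = 0;           // 0 = unfinished, 1 = finished

    MigrationRow(String contentId, boolean collection, String eventType) {
        this.contentId = contentId;
        this.collection = collection;
        this.eventType = eventType;
    }
}

/** Hypothetical in-memory model of the migration steps described above. */
class MigrationSketch {
    final List<MigrationRow> queue = new ArrayList<>(); // stands in for the table
    final Map<String, String> chs = new HashMap<>();    // legacy CHS: id -> content
    final Map<String, String> jcr = new HashMap<>();    // JCR repository: id -> content

    /** Step 1: seed the queue only if no rows exist, i.e. migration never started. */
    boolean startIfNeeded(Collection<String> collectionIds, Collection<String> resourceIds) {
        if (!queue.isEmpty()) {
            return false; // rows already present: migration was started previously
        }
        for (String id : collectionIds) {
            queue.add(new MigrationRow(id, true, "ORIGINAL_MIGRATION"));
        }
        for (String id : resourceIds) {
            queue.add(new MigrationRow(id, false, "ORIGINAL_MIGRATION"));
        }
        return true;
    }

    /** Step 2: process up to n unfinished rows and return how many were processed. */
    int runBatch(int n) {
        int processed = 0;
        for (MigrationRow row : queue) {
            if (processed == n) {
                break;
            }
            if (row.status != 0) {
                continue; // already finished
            }
            if ("content.delete".equals(row.eventType)) {
                jcr.remove(row.contentId);
            } else if (row.collection) {
                // Overwrite the folder's own entry in place; child entries are
                // separate keys in this model and are left untouched.
                jcr.put(row.contentId, chs.get(row.contentId));
            } else {
                // A resource has no children, so delete and recreate it.
                jcr.remove(row.contentId);
                jcr.put(row.contentId, chs.get(row.contentId));
            }
            row.status = 1; // mark finished
            processed++;
        }
        return processed;
    }
}
```

In this sketch, calling startIfNeeded a second time returns false and leaves the queue untouched, which mirrors the row-count check in step 1. Because every branch of runBatch either overwrites or removes an entry, rerunning an item that is still marked 0 is harmless.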

Edge cases

What if the server crashes?
Suppose the server crashes during a batch of copies. When it starts up, the copy that was in progress will still be marked 0 in the table. The copier always handles the case where the node already exists in JCR for some reason, and will just overwrite it and continue.

How do I switch when the conversion is done?
This is a tricky situation. The migration appears to be done when there are no entries in the table with status 0. But whenever someone causes a content event to happen, a new migration entry appears. The very end of the migration may need to coincide with some sort of downtime to seal the deal and switch implementations.

Is it possible for the content events to be added out of order?
What would happen if someone added a folder and some files in the folder to Content Hosting, and for some reason the content.add for the child file was triggered before the content.add for the parent folder?

I think I'm going to need to add a timestamp column to the migration table, and always sort on it before fetching the next batch of items to copy.
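
A minimal sketch of that idea in Java, assuming each queued row carries a timestamp: QueuedEvent, OrderedFetch, and nextBatch are hypothetical names used only for illustration, and the list stands in for the migration table. Sorting only helps when the parent's event really was recorded with the earlier timestamp; it does not repair events that were logged in the wrong order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical queued content event with the proposed timestamp column. */
class QueuedEvent {
    final String contentId;
    final String eventType;
    final long timestamp; // when the content event was recorded
    int status = 0;       // 0 = unfinished, 1 = finished

    QueuedEvent(String contentId, String eventType, long timestamp) {
        this.contentId = contentId;
        this.eventType = eventType;
        this.timestamp = timestamp;
    }
}

class OrderedFetch {
    /** Returns up to n unfinished events, oldest first. */
    static List<QueuedEvent> nextBatch(List<QueuedEvent> table, int n) {
        return table.stream()
                .filter(e -> e.status == 0)
                // The sort is stable for an ordered stream, so events with
                // equal timestamps keep their original relative order.
                .sorted(Comparator.comparingLong((QueuedEvent e) -> e.timestamp))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

With a database-backed table, the equivalent would presumably be an ORDER BY on the timestamp column in the query that fetches the batch.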

Running the migration

Currently, for testing and development, the only hooks for actually starting/stopping the migration are in the GUI JCRInspector. There will be some hooks added for starting it automatically on Tomcat boot like other upgrade scripts do. The migration hooks are also exposed in an API so they could be triggered via Quartz or a web service, etc.
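
Until the API details are written up, here is a rough sketch of how start and stop hooks could drive the batches with a single java.util.Timer. It is not the actual migration API: MigrationScheduler, its start and stop methods, and the batch callback are hypothetical names chosen for this example. One detail worth noting is that a TimerTask instance cannot be scheduled twice, so "rescheduling" here means scheduling a fresh task after each batch finishes.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.function.IntSupplier;

/** Hypothetical start/stop wrapper around a single Timer. */
class MigrationScheduler {
    private final IntSupplier batch; // runs one batch, returns items processed
    private final long delayMillis;  // the configurable delay t between batches
    private Timer timer;             // null while the migration is stopped

    MigrationScheduler(IntSupplier batch, long delayMillis) {
        this.batch = batch;
        this.delayMillis = delayMillis;
    }

    /** Starts the migration; does nothing if it is already running. */
    synchronized void start() {
        if (timer != null) {
            return;
        }
        timer = new Timer("chs-to-jcr-migration", true); // daemon thread
        scheduleNext(timer, 0L);
    }

    /** Stops the migration; a batch already in progress is allowed to finish. */
    synchronized void stop() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    private void scheduleNext(final Timer owner, long delay) {
        // A TimerTask can only be scheduled once, so create a new one each round.
        owner.schedule(new TimerTask() {
            @Override
            public void run() {
                batch.getAsInt();
                synchronized (MigrationScheduler.this) {
                    // Only queue the next batch if stop() has not been called,
                    // so the next round starts after this one has finished.
                    if (timer == owner) {
                        scheduleNext(owner, delayMillis);
                    }
                }
            }
        }, delay);
    }
}
```

Because the next task is only scheduled at the end of the current one, batches never overlap, which matches the behaviour described for the migration timer. A callback that throws an unchecked exception would terminate the Timer thread, so a real implementation would want to catch and log failures inside the batch.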

TODO add API details

Testing Migration Integrity

...