...

| Event Types | Description |
|---|---|
| ORIGINAL_MIGRATION | The item was part of the original table copy. |
| content.add | The item was added to the migration table as the result of receiving a content.add event. |
| content.write | The item was added to the migration table as the result of receiving a content.write event. |
| content.delete | The item was added to the migration table as the result of receiving a content.delete event. |

Below is a small example of what the table would look like during migration:

| CONTENT_ID | STATUS | EVENT_TYPE |
|---|---|---|
| /myfolder/ | 1 | ORIGINAL_MIGRATION |
| /myfolder/file1.txt | 1 | ORIGINAL_MIGRATION |
| /myfolder/file2.txt | 1 | ORIGINAL_MIGRATION |
| /myfolder/Music/ | 0 | ORIGINAL_MIGRATION |
| /myfolder/Music/whittyBanter.mp3 | 0 | ORIGINAL_MIGRATION |
| /myfolder/Music/moreBanter.mp3 | 0 | content.add |
| /myfolder/file1.txt | 0 | content.write |

In this scenario, we've already copied myfolder and the two files that were in Resources when we started migrating. We still need to copy over the Music folder and my witty banter mp3. Also, since the migration started, I've added even more banter to my Music folder and changed some of the text in one of my files. These changes are now in the queue and will be processed in good time.
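
To make the example concrete, here is a minimal Java sketch that models the seven rows above in memory and picks out the ones still waiting to be copied. The class and method names (QueueRow, ExampleQueue, exampleRows, pending) are hypothetical and exist only for illustration; the real table lives in the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one row of MIGRATE_CHS_CONTENT_TO_JCR.
class QueueRow {
    final String contentId;
    final int status;       // 0 = not yet processed, 1 = finished
    final String eventType;

    QueueRow(String contentId, int status, String eventType) {
        this.contentId = contentId;
        this.status = status;
        this.eventType = eventType;
    }
}

class ExampleQueue {
    // The seven rows from the example table, in queue order.
    static List<QueueRow> exampleRows() {
        List<QueueRow> rows = new ArrayList<>();
        rows.add(new QueueRow("/myfolder/", 1, "ORIGINAL_MIGRATION"));
        rows.add(new QueueRow("/myfolder/file1.txt", 1, "ORIGINAL_MIGRATION"));
        rows.add(new QueueRow("/myfolder/file2.txt", 1, "ORIGINAL_MIGRATION"));
        rows.add(new QueueRow("/myfolder/Music/", 0, "ORIGINAL_MIGRATION"));
        rows.add(new QueueRow("/myfolder/Music/whittyBanter.mp3", 0, "ORIGINAL_MIGRATION"));
        rows.add(new QueueRow("/myfolder/Music/moreBanter.mp3", 0, "content.add"));
        rows.add(new QueueRow("/myfolder/file1.txt", 0, "content.write"));
        return rows;
    }

    // Returns the rows that still need processing, preserving queue order.
    static List<QueueRow> pending(List<QueueRow> rows) {
        List<QueueRow> out = new ArrayList<>();
        for (QueueRow row : rows) {
            if (row.status == 0) {
                out.add(row);
            }
        }
        return out;
    }
}
```

Note that /myfolder/file1.txt appears twice: once as a finished ORIGINAL_MIGRATION row and once as a pending content.write row, so the later edit is copied again when its turn comes.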

  1. Starting Migration
    • Check to see if migration has ever started before. This is done by counting the rows in MIGRATE_CHS_CONTENT_TO_JCR. If there are any rows in the table it means the migration has been started previously.
    • If the migration is starting for the first time, the existing CHS data is added to the table.
      • The COLLECTION_IDs from CONTENT_COLLECTION are added to MIGRATE_CHS_CONTENT_TO_JCR with a status of 0 and an event type of ORIGINAL_MIGRATION.
      • The RESOURCE_IDs from CONTENT_RESOURCE are added to MIGRATE_CHS_CONTENT_TO_JCR with a status of 0 and an event type of ORIGINAL_MIGRATION.
  2. During Migration
    Each round of data migration consists of starting a TimerTask, which fetches N unfinished items from the MIGRATE_CHS_CONTENT_TO_JCR table and copies them to the JCR repository. The timer tasks all share one Timer, and a task does not start until the previous one finishes. A configurable delay time t specifies how long to wait between batches.
    • Fetch the next N unfinished items from the MIGRATE_CHS_CONTENT_TO_JCR table.
    • For each item:
      • If the item is a ContentCollection and the event type is ORIGINAL_MIGRATION, content.add, or content.write, copy the ContentCollection to JCR. If the collection already exists in JCR, do not delete and re-add it; just overwrite the metadata properties and remove any properties that are not in the source collection.
      • If the item is a ContentCollection and the event type is content.delete, remove the collection node from JCR. If the collection was later re-added in Resources, the content.add event for it will be further down the queue, so it will be recreated when that event is processed.
      • If the item is a ContentResource and the event type is ORIGINAL_MIGRATION, content.add, or content.write, we delete the file node in JCR and recreate it by copying the resource over from CHS. This differs from the ContentCollection case, where we did not remove the node before recreating it because it is a folder and we did not want to destroy the files and folders inside it. A resource file will never have children. (In a pure JCR world this is possible, but the original ContentHosting has nothing modeled like this.)
      • If the item is a ContentResource and the event type is content.delete, then we delete the file node from JCR completely.
      • After operating on the item, we update its row in MIGRATE_CHS_CONTENT_TO_JCR and set the status to 1 (finished).
    • After finishing all the content items in the batch, we reschedule this TimerTask, setting the delay to the configurable batch delay t.
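
The two phases above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the actual migration code: MigrationItem and MigrationSketch are hypothetical names, an in-memory list stands in for MIGRATE_CHS_CONTENT_TO_JCR, a set of paths stands in for the JCR repository, collections are recognized by a trailing "/" as in the example table, and metadata copying is not modeled. Because a java.util.TimerTask instance cannot be scheduled twice, the sketch "reschedules" by submitting a fresh task to the same Timer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for one row of MIGRATE_CHS_CONTENT_TO_JCR.
class MigrationItem {
    final String contentId;
    final String eventType;
    boolean finished; // false = status 0, true = status 1

    MigrationItem(String contentId, String eventType) {
        this.contentId = contentId;
        this.eventType = eventType;
    }
}

class MigrationSketch {
    final List<MigrationItem> queue = new ArrayList<>(); // stands in for the migration table
    final Set<String> jcr = new LinkedHashSet<>();       // stands in for node paths in JCR

    // Step 1: seed the queue only if the table has no rows yet.
    void start(List<String> collectionIds, List<String> resourceIds) {
        if (!queue.isEmpty()) {
            return; // rows exist, so the migration was started previously
        }
        for (String id : collectionIds) {
            queue.add(new MigrationItem(id, "ORIGINAL_MIGRATION"));
        }
        for (String id : resourceIds) {
            queue.add(new MigrationItem(id, "ORIGINAL_MIGRATION"));
        }
    }

    // Step 2: process up to batchSize unfinished items; returns how many were processed.
    int runBatch(int batchSize) {
        int processed = 0;
        for (MigrationItem item : queue) {
            if (processed == batchSize) {
                break;
            }
            if (item.finished) {
                continue;
            }
            boolean isCollection = item.contentId.endsWith("/"); // assumption from the example table
            if ("content.delete".equals(item.eventType)) {
                if (isCollection) {
                    // Assumption: removing a folder node also removes everything beneath it.
                    jcr.removeIf(path -> path.startsWith(item.contentId));
                } else {
                    jcr.remove(item.contentId);
                }
            } else if (isCollection) {
                jcr.add(item.contentId); // create, or update in place; children are left alone
            } else {
                jcr.remove(item.contentId); // delete the file node, then recreate it
                jcr.add(item.contentId);
            }
            item.finished = true; // set status to 1
            processed++;
        }
        return processed;
    }

    // Runs one batch after delayMillis, then queues the next batch on the same Timer.
    void schedule(final Timer timer, final int batchSize, final long delayMillis) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                if (runBatch(batchSize) > 0) {
                    MigrationSketch.this.schedule(timer, batchSize, delayMillis);
                } else {
                    timer.cancel(); // simplification assumed here; the source does not say when to stop
                }
            }
        }, delayMillis);
    }
}
```

Scheduling the next task only from inside the running task is one way to guarantee that a batch never starts before the previous one finishes, with the delay t elapsing between them.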

...