A Tour of Sakai Import

The Motivation

The first thing to understand about this code is the motivation behind it. For a long time (maybe since the beginning), Sakai has shipped with a feature in the Site Info tool called "Import from file." The idea is that the maintainer of the site will upload a specially-formatted zip file that contains an archive of all her course materials. The tool works like a wizard, with four screens:

  1. Asks for a file to upload.
  2. Shows a list of the categories of content in your archive and lets you choose which items should be pulled in.
  3. Displays a confirmation screen with a summary of what is about to happen and a "Finish" button.
  4. Displays an "Import Complete" message and an "Ok" button once the import has finished.

The problem with this feature from the get-go was that the only archive files this tool understood were those produced by Sakai's own archival functionality and by Indiana University's legacy Oncourse system. The initial motivation, then, was to find a way to substitute other archive file formats for this one, namely the Blackboard 5.5 archives in use at Texas State University.

All the code for reading and parsing an archive file was in the Site Info tool code itself, in a 12,000+ line file called SiteAction.java. The new design needed to factor the archive parsing and processing out into an independent module and provide the flexibility to support new archive types without breaking the existing support for Indiana's files.

UI Drives the Design

For better or for worse, the user interface of this feature dictated some major requirements. Because it gives you the opportunity to choose categories of content, any new archive parser must be able to provide something that satisfies the concept of content categories. It also implies that there must be a parsing step preceding the final import step.

The Archive Module

During integration week for Sakai 2.3, the import code was moved into the Sakai trunk from contrib, where it had been under development. It was decided that the appropriate place for it to live would be within the existing archive module. The existing archive code includes the ArchiveService, which has been used for importing and exporting in Sakai, including handling the archives in "Import from file."

The long-range plan is for Sakai to support the IMS Common Cartridge specification natively for both importing and exporting. The archive module will gradually acquire and consolidate all these capabilities, but for now it contains two distinct pieces: the original archive service, and the new import architecture. This document exclusively deals with the latter portion of this module.

The import side of things consists of five directories: import-api, import-impl, import-parsers, import-handlers, and import-pack. We will discuss each of these in turn, but first let us address a high-level overview of the approach to archive importing.

The Approach and Key Terminology

One of the important facets of the archive import problem is that archives can come from learning management systems other than Sakai. This means that there is not a clear-cut mapping between resources in an archive and the tools they will inhabit in Sakai. Also, since Sakai represents a large and growing ecosystem of tools, the design for parsing archives should be decoupled from the design for pushing archive resources into the various appropriate tools.

Thus the key architectural feature of the import code is that it divides the problem into two parts. Reading an archive file and extracting the pieces of content is done by exactly one parser. Taking the content pieces and stuffing them into Sakai tools is done by multiple handlers, one handler for each Sakai tool.

Parser

A parser is a class that understands how to read an archive file and extract from it a vendor-neutral collection of content objects.

Handler

A handler takes a vendor-neutral content object and stuffs it into a particular Sakai tool. A handler belongs to one and only one Sakai tool. A handler may be called upon many times in the process of importing a single archive.

A parser is written to accommodate all the specific features of a particular archive format. For example, there is a parser for Blackboard 5.5 archives, and another parser for Blackboard 6.0 archives, because the formats are different. There is yet another parser for IMS Common Cartridge 1.0. Note that only the Common Cartridge and the original Sakai format parsers are included in the Sakai release; other parsers are available in contrib/migration/import.

The key to decoupling the parsing from the handling is that the parser must produce a collection of vendor-neutral content objects. This means that by the time they get to the handler, they are generic, and don't have any vendor-specific formatting. This ensures that the handlers don't have to have any code in them to deal with the particulars of any single format, which in turn means that the same handlers can be used again and again for any archive format, present or future.
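The decoupling described above can be illustrated with a minimal Java sketch. Every name in it (DecouplingSketch, WebLink as written here, ArchiveParser, the two toy parsers, and WebLinkHandler) is hypothetical and much simpler than the real types in import-api; the two "archive formats" are invented one-line text formats. The only point is that two parsers with different input formats emit the same vendor-neutral object, so a single handler can serve both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types for illustration; not the real import-api classes.
class DecouplingSketch {

    /** A vendor-neutral content object: nothing here is tied to one archive format. */
    static final class WebLink {
        final String title;
        final String url;
        WebLink(String title, String url) { this.title = title; this.url = url; }
    }

    /** A parser understands exactly one archive format and emits vendor-neutral objects. */
    interface ArchiveParser {
        List<WebLink> parse(String rawArchive);
    }

    /** Invented format A: one "title|url" pair per line. */
    static final class PipeFormatParser implements ArchiveParser {
        public List<WebLink> parse(String rawArchive) {
            List<WebLink> links = new ArrayList<>();
            for (String line : rawArchive.split("\n")) {
                String[] parts = line.split("\\|", 2);
                if (parts.length == 2) {
                    links.add(new WebLink(parts[0].trim(), parts[1].trim()));
                }
            }
            return links;
        }
    }

    /** Invented format B: one "url title" pair per line, URL first. */
    static final class UrlFirstParser implements ArchiveParser {
        public List<WebLink> parse(String rawArchive) {
            List<WebLink> links = new ArrayList<>();
            for (String line : rawArchive.split("\n")) {
                int space = line.indexOf(' ');
                if (space > 0) {
                    links.add(new WebLink(line.substring(space + 1).trim(), line.substring(0, space)));
                }
            }
            return links;
        }
    }

    /** A handler belongs to one tool and never sees vendor-specific formatting. */
    static final class WebLinkHandler {
        final List<String> stored = new ArrayList<>(); // stands in for the tool's storage

        void handle(WebLink link, String siteId) {
            stored.add(siteId + ": " + link.title + " -> " + link.url);
        }
    }

    public static void main(String[] args) {
        WebLinkHandler handler = new WebLinkHandler();
        // The same handler is called once per content object, whatever the source format.
        for (WebLink link : new PipeFormatParser().parse("Sakai|https://www.sakaiproject.org")) {
            handler.handle(link, "site-1");
        }
        for (WebLink link : new UrlFirstParser().parse("https://www.imsglobal.org IMS Global")) {
            handler.handle(link, "site-1");
        }
        handler.stored.forEach(System.out::println);
    }
}
```

Adding a third archive format in this sketch means writing one more parser; the handler does not change.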

These generic content objects still have types which identify what kind of content they are. Some examples of these content object types are: WebLink, Assessment, Announcement, etc. Each content class must implement an interface called Importable. As a group, these content objects are referred to as importables. An object called an ImportDataSource acts as a container for the group of importables. We can ask an ImportDataSource for the categories for that archive, and ask it for the Importables that match a given list of categories.

Importable

An importable is a content object that implements the interface org.sakaiproject.importer.api.Importable. Importables are extracted as a collection by a parser, and passed one at a time to one or more handlers to be added to Sakai.

ImportDataSource

An ImportDataSource is an object that acts as a container for the Importable objects in an archive. You can think of it as the abstract representation of some archive.
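To make the container idea concrete, here is a small Java sketch of a data source that can report its categories and return the items matching a chosen list of categories. The classes Item and DataSource are simplified stand-ins, not the real org.sakaiproject.importer.api interfaces, and the method name itemsForCategories is an assumption made for this example; only getItemCategories is taken from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for illustration; not the real import-api types.
class DataSourceSketch {

    /** Stand-in for an importable: a content object tagged with a category. */
    static final class Item {
        final String category;
        final String title;
        Item(String category, String title) { this.category = category; this.title = title; }
    }

    /** Stand-in for an ImportDataSource: the abstract view of one parsed archive. */
    static final class DataSource {
        private final List<Item> items = new ArrayList<>();

        void add(Item item) { items.add(item); }

        /** Distinct categories in first-seen order, e.g. for a selection screen. */
        List<String> getItemCategories() {
            Set<String> seen = new LinkedHashSet<>();
            for (Item item : items) {
                seen.add(item.category);
            }
            return new ArrayList<>(seen);
        }

        /** Only the items whose category is in the selected list (hypothetical method name). */
        List<Item> itemsForCategories(List<String> selected) {
            List<Item> result = new ArrayList<>();
            for (Item item : items) {
                if (selected.contains(item.category)) {
                    result.add(item);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        DataSource source = new DataSource();
        source.add(new Item("Announcements", "Welcome"));
        source.add(new Item("Web Links", "Sakai home"));
        source.add(new Item("Announcements", "Midterm"));

        System.out.println(source.getItemCategories());
        for (Item item : source.itemsForCategories(Arrays.asList("Announcements"))) {
            System.out.println(item.title);
        }
    }
}
```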

Sitting between the parsers and the handlers is the ImportService. The default implementation of the ImportService, BasicImportService, keeps a list of available parsers and a list of available handlers, both of which are Spring-injected and configurable in a components.xml file. Here is how a hypothetical client of the ImportService might use it:

ImportService example
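ImportService importService = org.sakaiproject.importer.cover.ImportService.getInstance();
byte[] fileData = fileFromUpload.get();
if (importService.isValidArchive(fileData)) {
    // Step 1: parse the archive into a vendor-neutral data source
    ImportDataSource importDataSource = importService.parseFromFile(fileData);
    // A real client would let the user pick a subset of these categories
    List lst = importDataSource.getItemCategories();
    // Step 2: hand the matching importables (not the category names) to the handlers.
    // getItemsForCategories is assumed here to be the category lookup described above.
    importService.doImportItems(importDataSource.getItemsForCategories(lst), siteId);
}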
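The role the ImportService plays between the parsers and the handlers can also be sketched from the inside. The following is a rough illustration only, not the BasicImportService source: the types Item, Parser, Handler, Service, ToyParser, and RecordingHandler are all hypothetical, as are the method names canHandleType and handle and the "TOY:" file format. It assumes the service picks the first configured parser that recognizes the file, and offers each importable to every handler that claims its type.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration; not the real import-api types.
class ServiceSketch {

    static final class Item {
        final String type;
        final String title;
        Item(String type, String title) { this.type = type; this.title = title; }
    }

    interface Parser {
        boolean isValidArchive(byte[] fileData);
        List<Item> parse(byte[] fileData);
    }

    interface Handler {
        boolean canHandleType(String type);
        void handle(Item item, String siteId);
    }

    /** Holds the configured parsers and handlers, as the injected lists would. */
    static final class Service {
        private final List<Parser> parsers;
        private final List<Handler> handlers;

        Service(List<Parser> parsers, List<Handler> handlers) {
            this.parsers = parsers;
            this.handlers = handlers;
        }

        boolean isValidArchive(byte[] fileData) {
            for (Parser p : parsers) {
                if (p.isValidArchive(fileData)) return true;
            }
            return false;
        }

        /** Uses the first parser that recognizes the file. */
        List<Item> parseFromFile(byte[] fileData) {
            for (Parser p : parsers) {
                if (p.isValidArchive(fileData)) return p.parse(fileData);
            }
            throw new IllegalArgumentException("No configured parser recognizes this archive");
        }

        /** Offers each item to every handler that accepts its type; returns the delivery count. */
        int doImportItems(List<Item> items, String siteId) {
            int deliveries = 0;
            for (Item item : items) {
                for (Handler h : handlers) {
                    if (h.canHandleType(item.type)) {
                        h.handle(item, siteId);
                        deliveries++;
                    }
                }
            }
            return deliveries;
        }
    }

    /** Invented format: "TOY:" followed by comma-separated type=title entries. */
    static final class ToyParser implements Parser {
        private static final String MAGIC = "TOY:";

        public boolean isValidArchive(byte[] fileData) {
            return new String(fileData, StandardCharsets.UTF_8).startsWith(MAGIC);
        }

        public List<Item> parse(byte[] fileData) {
            String body = new String(fileData, StandardCharsets.UTF_8).substring(MAGIC.length());
            List<Item> items = new ArrayList<>();
            for (String entry : body.split(",")) {
                String[] kv = entry.split("=", 2);
                if (kv.length == 2) items.add(new Item(kv[0].trim(), kv[1].trim()));
            }
            return items;
        }
    }

    /** Records what it receives; stands in for a handler that writes into one tool. */
    static final class RecordingHandler implements Handler {
        private final String type;
        final List<String> received = new ArrayList<>();

        RecordingHandler(String type) { this.type = type; }

        public boolean canHandleType(String t) { return type.equals(t); }

        public void handle(Item item, String siteId) { received.add(siteId + "/" + item.title); }
    }

    public static void main(String[] args) {
        RecordingHandler announcements = new RecordingHandler("announcement");
        List<Parser> parsers = new ArrayList<>();
        parsers.add(new ToyParser());
        List<Handler> handlers = new ArrayList<>();
        handlers.add(announcements);
        Service service = new Service(parsers, handlers);

        byte[] upload = "TOY:announcement=Welcome,weblink=Sakai".getBytes(StandardCharsets.UTF_8);
        if (service.isValidArchive(upload)) {
            service.doImportItems(service.parseFromFile(upload), "site-1");
        }
        System.out.println(announcements.received);
    }
}
```

In this sketch an item whose type no handler claims is silently skipped, which mirrors the idea that a site only receives content for the handlers it has configured; whether the real service behaves the same way is not stated here.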

Here are two simple data flow diagrams illustrating the two steps of the import process:

  1. parse an archive file and get back an ImportDataSource
  2. pass a collection of importables to the handlers, which then push content into Sakai

The import-api Project

Let's begin looking at the import-api project. Here's what the files look like exploded out:

Configuring Import

version information

These instructions assume you are using the code for Sakai 2.3.
If you want to configure the import code with Sakai 2.2.x, see 2.2.x Import Instructions.

commercial parsers

Sakai does not come with Blackboard, WebCT, or any other commercial parsers installed. Commercial parsers live in contrib, and must be installed separately. Copy the parser folders you want from https://source.sakaiproject.org/contrib/migration/trunk/import-parsers into your Sakai source code in the /archive/import-parsers directory. You must then configure, build, and deploy the archive module with the new parser(s). You must add dependencies for the additional parsers to the project.xml file in /archive/import-pack.

Edit import-pack/src/webapps/WEB-INF/components.xml

Configure parsers

By default, the only parser that is configured with Sakai is the original Sakai archive parser. The Common Cartridge parser is commented out. You can add Common Cartridge support to import by uncommenting the Common Cartridge bean.

	<!-- File parsers -->
	<bean id="org.sakaiproject.importer.api.ImportFileParser-Sakai"
			class="org.sakaiproject.importer.impl.SakaiArchiveFileParser"
			singleton="false">
	</bean>

<!--
    <bean id="org.sakaiproject.importer.api.ImportFileParser-CommonCartridge"
			class="org.sakaiproject.importer.impl.CommonCartridgeFileParser"
			singleton="false">
	</bean>
-->

You also have to add your chosen parsers to the parsers property of the ImportService bean. Here again, the Sakai archive parser is enabled by default, and the Common Cartridge parser must be uncommented:

<property name="parsers">
    <list>
      <ref bean="org.sakaiproject.importer.api.ImportFileParser-Sakai"/>
 <!-- <ref bean="org.sakaiproject.importer.api.ImportFileParser-CommonCartridge"/> -->
    </list>
</property>

Configure handlers

A handler is code that works on behalf of a Sakai tool and contributes import content to that tool. You must uncomment the handlers that you want available to the import. The original Sakai archive format doesn't use any handlers, because it still uses the ArchiveService to push things into Sakai. The handlers are used by the Common Cartridge parser as well as any other parsers you configure.

	<!-- Handlers -->
<!--
	<bean id="org.sakaiproject.importer.impl.handlers.AnnouncementHandler"
			class="org.sakaiproject.importer.impl.handlers.AnnouncementHandler"
			singleton="true">
		<property name="announcementService">
			<ref bean="org.sakaiproject.announcement.api.AnnouncementService" /> 
		</property>
	</bean>
-->
<!--
	<bean id="org.sakaiproject.importer.impl.handlers.ResourcesHandler"
			class="org.sakaiproject.importer.impl.handlers.ResourcesHandler"
			singleton="true">
	</bean>
-->
<!--
	<bean id="org.sakaiproject.importer.impl.handlers.SamigoHandler"
			class="org.sakaiproject.importer.impl.handlers.SamigoHandler"
			singleton="true">
	</bean>
-->
<!--
	<bean id="org.sakaiproject.importer.impl.handlers.MessageCenterHandler"
			class="org.sakaiproject.importer.impl.handlers.MessageCenterHandler"
			singleton="true">
	</bean>
-->

And just like the parsers, the handlers must be listed in a property of the ImportService bean, in this case resourceHandlers:

<property name="resourceHandlers">
    <list>
      <!-- <ref bean="org.sakaiproject.importer.impl.handlers.AnnouncementHandler" /> -->
      <!-- <ref bean="org.sakaiproject.importer.impl.handlers.ResourcesHandler"/> -->
      <!-- <ref bean="org.sakaiproject.importer.impl.handlers.SamigoHandler"/> -->
      <!-- <ref bean="org.sakaiproject.importer.impl.handlers.MessageCenterHandler"/> -->
    </list>
</property>

Edit import-pack/project.xml

There are two dependencies in import-pack/project.xml that need to be uncommented if you decide to use the Samigo handler. Again, the handlers don't do anything for the original Sakai archive format, so don't bother uncommenting these dependencies unless you are using the Common Cartridge parser or one of the other parsers available at https://source.sakaiproject.org/contrib/migration/trunk/import-parsers.