Tags Administration. Import Tags Jobs

1. IMPORT JOBS

The following three jobs have been developed for the tag import:

Generic Tags Update

This job allows updating of individual tags and collections. Functions include add, edit and delete.

Generic Tags Update job (Full Collection Import)

This job updates a complete collection. Tags are added, edited and deleted so that the collection matches an xml file containing the full collection.

Mesh Tags Update job

This job will import and update the MESH (Medical Subject Headings) collection.

All jobs described above require their input files to be placed in the sakai home folder (usually the sakai folder in tomcat), in the locations set by the properties explained later in this document.

1.1 Generic Tags Update job

This job allows add/edit of collections and add/edit/delete of individual tags. This means a single tag in a 34000 tag collection can be modified without touching the rest of the collection.

Two files are required to perform these functions. Samples are below:

1.1.1 tagcollections.xml:

Here is a full sample in the source:
tags/tags-impl/impl/src/resources/xmlsamples/tagcollections.xml

Here is the structure of a tag collection:

<TagCollection>
   <Name>Star Wars</Name>
   <Description>This is the Starwars tag collection</Description>
   <ExternalSourceName>STARWARS</ExternalSourceName>
   <ExternalSourceDescription>Comes from a standard tags xml file</ExternalSourceDescription>
   <DateRevised>
      <Year>2016</Year>
      <Month>07</Month>
      <Day>12</Day>
   </DateRevised>
</TagCollection>

Fields marked with * are mandatory.

Name*: The name of the collection that displays to the users.
Description: A descriptive text explaining what this collection is about.
ExternalSourceName*: A unique ID that will be used to identify the collection when updating it in the future.
ExternalSourceDescription: Explains where this collection comes from.
DateRevised: The last revision date for this collection. This helps to determine whether the collection needs to be updated.

An existing TagCollection will be updated with the new values as long as its ExternalSourceName remains the same.

TagCollections can’t be deleted by a job. They can be deleted only through the UI when they are empty.

Another sample is:

<?xml version="1.0"?>
<TagCollections>
<TagCollection>
   <Name>Countries</Name>
   <Description>This is the list of countries in the world</Description>
   <ExternalSourceName>COUNTRIES</ExternalSourceName>
</TagCollection>
</TagCollections>

Note this example has no ExternalSourceDescription or DateRevised. That situation is valid.
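
Before placing a tagcollections.xml file in the sakai home folder, it can help to check that every collection carries the mandatory fields. The following Python script is only a sketch of such a check and is not part of the import jobs. The function name find_missing_fields is hypothetical, and treating a blank value as missing is an assumption made here.

```python
import xml.etree.ElementTree as ET

# Mandatory child elements of <TagCollection>, taken from the field list above.
MANDATORY_FIELDS = ("Name", "ExternalSourceName")


def find_missing_fields(xml_text):
    """Return (collection_index, field_name) pairs for missing or blank mandatory fields."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, collection in enumerate(root.iter("TagCollection")):
        for field in MANDATORY_FIELDS:
            value = collection.findtext(field)
            # Assumption: a field that is present but blank counts as missing.
            if value is None or not value.strip():
                problems.append((index, field))
    return problems


# The "Countries" sample shown above.
SAMPLE = """<?xml version="1.0"?>
<TagCollections>
<TagCollection>
   <Name>Countries</Name>
   <Description>This is the list of countries in the world</Description>
   <ExternalSourceName>COUNTRIES</ExternalSourceName>
</TagCollection>
</TagCollections>"""

if __name__ == "__main__":
    # An empty list means no mandatory field is missing.
    print(find_missing_fields(SAMPLE))
```

The optional fields (Description, ExternalSourceDescription, DateRevised) are deliberately not checked, since a file without them is valid.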

1.1.2 tags.xml

Here is a full sample in the source code:

tags/tags-impl/impl/src/resources/xmlsamples/tags.xml

This file allows tags to be added, edited and deleted one by one inside one or more collections.

The following example shows the structure of a tag. Note that many fields are not mandatory; they are designed for future use, based on what some of the most complex tag collections are using.

<Tag Type="DarkSide">
   <Action>delete</Action>
   <ExternalSourceName>STARWARS</ExternalSourceName>
   <ExternalId>SW00002</ExternalId>
   <TagLabel>Anakin Skywalker</TagLabel>
   <Description>Jedi Knight and Sith Lord</Description>
   <DateCreated>
      <Year>2012</Year>
      <Month>05</Month>
      <Day>14</Day>
   </DateCreated>
   <DateRevised>
      <Year>2015</Year>
      <Month>06</Month>
      <Day>30</Day>
   </DateRevised>
   <HierarchyCode></HierarchyCode>
   <AlternativeLabels>Darth Vader</AlternativeLabels>
   <Data>Anakin was a nice guy but went to the dark side</Data>
   <ParentId></ParentId>
</Tag>

An analysis of each value is provided below:

Type: Provides the definition for the tag types. Type will be stored but it is not currently used.
Action: By default (if no value is defined here) the tag will be created if it doesn’t exist or updated if it does. If the value is “delete”, the tag will be deleted.
ExternalSourceName*: Specifies the collection.
ExternalId*: Specifies a unique ID within the collection. If tags are being imported from an external source, this will be the external ID. ExternalSourceName + ExternalId will be used to find the tag, so both are mandatory.
TagLabel*: Text value of the tag that will be displayed.
Description: Provides additional information about the tag to display.
DateCreated: Date the tag was created IN THE EXTERNAL SOURCE. This can be used in the future to process differential updates.
DateRevised: Date the tag was updated IN THE EXTERNAL SOURCE. This is used to skip tags that are already up to date.
HierarchyCode: Provides the location to store the EXTERNAL SOURCE hierarchy code.
AlternativeLabels: Some collections have alternative labels and they will be stored in this location.
Data: Provides the ability to store any additional information that can’t be stored in any other field.
ParentId: Used to generate an internal hierarchy. Specify the ExternalId of the tag that is the parent of this tag.

As stated earlier, some of these fields are provided for future features and use, in case a tool needs them.

Simple tag examples are shown below:

<Tags>
   <Tag>
       <ExternalSourceName>COUNTRIES</ExternalSourceName>
       <ExternalId>AF</ExternalId>
       <TagLabel>Afghanistan</TagLabel>
   </Tag>
   <Tag>
       <ExternalSourceName>COUNTRIES</ExternalSourceName>
       <ExternalId>AL</ExternalId>
       <TagLabel>Albania</TagLabel>
   </Tag>
</Tags>
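
The Action rules described above (create or update by default, delete only when requested) can be illustrated with a small Python sketch. This is not the job's code. It applies a tags.xml document to an in-memory dictionary keyed by ExternalSourceName + ExternalId, and the function name apply_tags_update and the dictionary store are hypothetical. Requiring TagLabel only for create/update is an assumption made for the sketch.

```python
import xml.etree.ElementTree as ET


def apply_tags_update(store, xml_text):
    """Apply a tags.xml document to a dict keyed by (ExternalSourceName, ExternalId)."""
    root = ET.fromstring(xml_text)
    for tag in root.iter("Tag"):
        source = (tag.findtext("ExternalSourceName") or "").strip()
        external_id = (tag.findtext("ExternalId") or "").strip()
        if not source or not external_id:
            raise ValueError("ExternalSourceName and ExternalId are mandatory")
        key = (source, external_id)
        action = (tag.findtext("Action") or "").strip()
        if action == "delete":
            # Delete only when explicitly requested; unknown tags are ignored.
            store.pop(key, None)
        else:
            # Default action: create the tag if missing, update it otherwise.
            label = (tag.findtext("TagLabel") or "").strip()
            if not label:
                raise ValueError("TagLabel is mandatory")
            store[key] = label
    return store


# One tag to create and one tag to delete, in the format shown above.
UPDATE = """<Tags>
   <Tag>
       <ExternalSourceName>COUNTRIES</ExternalSourceName>
       <ExternalId>AF</ExternalId>
       <TagLabel>Afghanistan</TagLabel>
   </Tag>
   <Tag>
       <Action>delete</Action>
       <ExternalSourceName>COUNTRIES</ExternalSourceName>
       <ExternalId>AL</ExternalId>
       <TagLabel>Albania</TagLabel>
   </Tag>
</Tags>"""

if __name__ == "__main__":
    existing = {("COUNTRIES", "AL"): "Albania", ("COUNTRIES", "DZ"): "Algeria"}
    print(apply_tags_update(existing, UPDATE))
```

In this run the tag AF is created, AL is deleted, and DZ stays untouched because it does not appear in the file.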

1.2 Generic Tags Update job (Full Collection Import)

Collections can be created either with jobs and xml files or through the tag administration UI. In either case, once a collection has been created, this job is useful for maintaining it. A full collection xml can be exported, edited in an external editor and then imported again to update the collection. Alternatively, a process can be created that exports from an external source in this format and updates the full collection.
The main difference between this job and the Generic Tags Update job is that all tags that are not in this file will be deleted from the collection. In the Generic Tags Update job, a tag is deleted only when it carries <Action>delete</Action>. In this job, every tag that is not in the file will be erased, so be careful.

This provides an easy process to update a collection. Just load the full collection file and adds, edits and deletes will happen automatically.
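
To preview the effect of a full collection import before running it, the adds, edits and deletes can be worked out by comparing the ExternalIds already in the collection with those in the file. The Python sketch below only illustrates that comparison and is not the job's implementation. The function name plan_full_import is hypothetical, and it assumes the ExternalIds have already been read from both sides.

```python
def plan_full_import(existing_ids, file_ids):
    """Classify ExternalIds by what a full collection import would do with them."""
    existing = set(existing_ids)
    incoming = set(file_ids)
    return {
        # In the file but not yet in the collection: created.
        "add": sorted(incoming - existing),
        # In both: candidates for an edit.
        "update": sorted(incoming & existing),
        # In the collection but missing from the file: deleted.
        "delete": sorted(existing - incoming),
    }


if __name__ == "__main__":
    plan = plan_full_import(["AF", "AL", "DZ"], ["AF", "AD"])
    print(plan)
```

Checking the "delete" list first is a cheap way to catch a truncated or incomplete file before it erases tags.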

1.2.1 Note: How to export an existing collection

The REST services are explained in more detail in the REST API document.
To export an existing collection, call the REST service as in the example shown below:

http://YOURSAKAI/direct/tagservice-admin/downloadCollection.xml?tagcollectionid=THETAGCOLLECTIONID&session=THETAGSSERVICESESSIONID

Where THETAGCOLLECTIONID is the Sakai internal id for that collection and THETAGSSERVICESESSIONID is the session id created to use this service (like a security token). All details are provided in the REST API document.
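
If the export is scripted, the URL can be assembled from its three variable parts. The Python sketch below only builds the URL string in the form shown above and does not perform the request. The function name build_export_url is hypothetical, and the values passed to it are the same placeholders used in this document.

```python
from urllib.parse import urlencode


def build_export_url(base_url, tag_collection_id, session_id):
    """Build the downloadCollection.xml URL in the form shown in the example above."""
    # urlencode escapes any special characters in the two parameter values.
    query = urlencode({"tagcollectionid": tag_collection_id, "session": session_id})
    return f"{base_url.rstrip('/')}/direct/tagservice-admin/downloadCollection.xml?{query}"


if __name__ == "__main__":
    # Placeholder values; replace them with your server, collection id and session id.
    print(build_export_url("http://YOURSAKAI", "THETAGCOLLECTIONID", "THETAGSSERVICESESSIONID"))
```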

This file can be imported by the “Generic Tags Update (Full Collection Import)” job, and all the modifications, additions or deletions will be updated.

As stated previously, you can follow this format to do full updates from other external sources. If you want to create a collection in this way, remember that you first need to create the empty collection, so that you can use the tagCollectionId needed in this file.

1.3 Mesh Tags Update job

This job is a sample of a customized importer. It imports tags from the MESH (Medical Subject Headings) collection: https://www.nlm.nih.gov/mesh/download_mesh.html. If you want to add other importers, use this job as a sample and adapt the code to import other collections.

The job uses the mesh.xml file as it is, and maintains the MESH collection by adding, editing and deleting tags.