Search Management

Repository OSID API

Javadoc can be found here: Repository OSID API

A quick overview of the issue

The OSID Repository specification, when applied to our application, has some holes that we're trying to figure out how to fill. Generally speaking, the consumer (e.g., Sakai) has no way to get information from the producer (the OSID implementation) about the current state of the search.

In reality, the consumer might want to know quite a few things, and we need to deal with some edge cases. To wit:

  • How many results are potentially returnable? In our context, this is the "8,473 matches found" number. This is the (theoretical) number of times you can call nextAsset before hasNextAsset returns false.
  • What's really going on if something fails? There are several possibilities.
    • The search just plain failed, because things on the other end are down.
    • The search failed because you've timed out, or your login information is bad/stale.
    • The search went ok, but we can't get the next asset because the network died.
    • The search went ok, and we'll be happy to give you the next asset if you just wait a few seconds for us to fetch the next batch from wherever it is we're fetching them (the asynchronous problem).

So...the client will likely want to know:

  • What, if any, errors have been thrown and what do they mean?
  • How many results are potentially returnable?
  • How many results are available to me right now?

All these data are crucial to the creation of a search client that can deal with asynchronous searches, but the OSID gives us little room to pass this information from the producer to the consumer.
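To make the gap concrete, here is a minimal Java sketch of the state a consumer would like to read alongside the AssetIterator. `SearchFailure`, `SearchProgress` and their methods are hypothetical names invented for illustration. Nothing like them exists in the Repository OSID, which is exactly the problem described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical causes for the failure cases listed above; NOT part of the OSID.
// "Wait a few seconds for the next batch" is deliberately not an error here:
// it is the state where fewer results are available than were found.
enum SearchFailure {
    REMOTE_UNAVAILABLE,   // things on the other end are down
    SESSION_INVALID,      // timed out, or login information is bad/stale
    NETWORK_LOST          // search went ok, but the network died while fetching
}

// Hypothetical snapshot of the three things the client wants to know.
final class SearchProgress {
    private final int totalMatches;   // potentially returnable, e.g. "8,473 matches found"
    private final int fetched;        // fetched so far, whether or not consumed yet
    private final List<SearchFailure> errors;

    SearchProgress(int totalMatches, int fetched, List<SearchFailure> errors) {
        if (fetched < 0 || fetched > totalMatches) {
            throw new IllegalArgumentException("fetched must be between 0 and totalMatches");
        }
        this.totalMatches = totalMatches;
        this.fetched = fetched;
        this.errors = Collections.unmodifiableList(new ArrayList<>(errors));
    }

    int totalMatches() { return totalMatches; }
    List<SearchFailure> errors() { return errors; }

    // Results available right now, given how many the client has already consumed.
    int availableNow(int consumed) { return Math.max(0, fetched - consumed); }

    // The asynchronous case: more results exist, none are ready, nothing has failed.
    boolean mustWait(int consumed) {
        return errors.isEmpty() && availableNow(consumed) == 0 && consumed < totalMatches;
    }
}
```

With these three values a client can tell "wait and retry" apart from "finished" and from "failed", which hasNextAsset() and nextAsset() alone cannot express.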

Some things we need to think about as we do this:

  • What are the semantics of the hasNextAsset method? true could mean "I know I'm supposed to have at least one more to give to you" or "I have one to give to you right now." False could mean "You've asked for all the assets the underlying system promised I could get" or "I can't give you one right now, but maybe later..."
  • What are the semantics and exception-throwing behavior of the nextAsset method? When does it throw an exception – only when something is wrong, or also when we need to wait for more results? Do we absolutely require a call to hasNextAsset before every nextAsset, or will nextAsset fail gracefully (throw an exception) if you ask for an asset past the end of the iterator?
  • What is the full list of possible desired exceptions to be thrown? How can we map these onto existing OSID exceptions/return values?

Possible approaches

Following are options for managing an asynchronous search.

Option 1 : AssetIterator only

Run an asynchronous search with minimal search properties and return an empty AssetIterator from getAssetsBySearch(). To get Assets, the Consumer would use AssetIterator methods hasNextAsset() and nextAsset(). The pageSize search property is used to maintain a certain number of Asset objects in an AssetIterator.

Out of Band Agreements Required

  • Search Properties
    • guid :: the key to the user's session state
    • sortBy :: selected sort method (rank, title, date, etc.)
    • searchSourceIds :: identifiers for search sources to be searched
    • pageSize :: how many records to display per page
  • AssetIterator method Exceptions and behavior
    • hasNextAsset() returns true if the AssetIterator cursor position is less than the total number of records found (even though that many results may not have been fetched and added to the AssetIterator as yet) and false if the AssetIterator cursor position is at the total number of records found.
    • hasNextAsset() does not throw any "out-of-band" exceptions.
    • nextAsset() returns the next Asset if it is presently in the AssetIterator. If the next Asset is not in the AssetIterator, but hasNextAsset() resolves to true, nextAsset() throws a RepositoryException with the OPERATION_FAILED message. This indicates that the Asset is available but has not yet been fetched. If hasNextAsset() resolves to false, nextAsset() returns null.
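A minimal provider-side sketch of these agreements follows. `Asset`, `AssetIterator` and `RepositoryException` are simplified stand-ins modelled on the org.osid.repository types (the real interfaces declare more members), and `AsyncAssetIterator`, `setTotalFound()` and `addFetched()` are hypothetical names for the hooks a background fetcher would use.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins modelled on org.osid.repository (not the real interfaces).
interface Asset { String getDisplayName(); }

class RepositoryException extends Exception {
    static final String OPERATION_FAILED = "Operation failed ";
    RepositoryException(String message) { super(message); }
}

interface AssetIterator {
    boolean hasNextAsset() throws RepositoryException;
    Asset nextAsset() throws RepositoryException;
}

// Sketch of the Option 1 semantics. A background fetcher (not shown) calls
// setTotalFound() and addFetched() as results arrive from the search sources.
class AsyncAssetIterator implements AssetIterator {
    private final List<Asset> fetched = new ArrayList<>();
    private int totalFound = 0;   // "N matches found"
    private int cursor = 0;       // index of the next Asset to hand out

    synchronized void setTotalFound(int total) { totalFound = total; }
    synchronized void addFetched(Asset asset) { fetched.add(asset); }

    // True while the cursor is below the total found, whether or not the
    // next Asset has actually been fetched yet. Never throws.
    @Override
    public synchronized boolean hasNextAsset() { return cursor < totalFound; }

    @Override
    public synchronized Asset nextAsset() throws RepositoryException {
        if (cursor < fetched.size()) {
            return fetched.get(cursor++);           // Asset is already here
        }
        if (hasNextAsset()) {
            // Asset exists but is not fetched yet: the agreed "try again later" signal.
            throw new RepositoryException(RepositoryException.OPERATION_FAILED);
        }
        return null;                                // past the end of the results
    }
}
```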

How Consumer gets an Asset

  • Call getAssetsBySearch() with the appropriate search properties.
  • Call hasNextAsset() on the returned AssetIterator:
    • if true - call nextAsset():
      • if RepositoryException::OPERATION_FAILED - wait for some time and restart the process by calling hasNextAsset().
      • if null - no more Assets left.
      • else - Consumer gets an Asset.
    • else - no more Assets left.
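The consumer procedure above could be sketched as the loop below, again with simplified stand-ins for the OSID types. The `maxWaits` bound on consecutive retries is an assumption added so the loop cannot spin forever; the agreement itself only says to wait and try again.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins modelled on org.osid.repository (not the real interfaces).
interface Asset { String getDisplayName(); }

class RepositoryException extends Exception {
    static final String OPERATION_FAILED = "Operation failed ";
    RepositoryException(String message) { super(message); }
}

interface AssetIterator {
    boolean hasNextAsset() throws RepositoryException;
    Asset nextAsset() throws RepositoryException;
}

final class OptionOneConsumer {
    private OptionOneConsumer() {}

    // Collects every Asset the iterator will hand out, waiting whenever the
    // provider signals OPERATION_FAILED ("found but not yet fetched").
    // maxWaits (a hypothetical safeguard) caps consecutive waits.
    static List<Asset> drain(AssetIterator it, long waitMillis, int maxWaits)
            throws RepositoryException, InterruptedException {
        List<Asset> received = new ArrayList<>();   // the Consumer's own paging store
        int consecutiveWaits = 0;
        while (it.hasNextAsset()) {
            Asset asset;
            try {
                asset = it.nextAsset();
            } catch (RepositoryException e) {
                boolean notYetFetched = RepositoryException.OPERATION_FAILED.equals(e.getMessage());
                if (!notYetFetched || ++consecutiveWaits > maxWaits) {
                    throw e;                        // a real failure, or we gave up waiting
                }
                Thread.sleep(waitMillis);
                continue;                           // restart at hasNextAsset()
            }
            if (asset == null) {
                break;                              // no more Assets left
            }
            consecutiveWaits = 0;
            received.add(asset);
        }
        return received;
    }
}
```

The `received` list is where the Consumer has to keep the records it has been given, which is the paging burden listed under Disadvantages.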

Advantages

  • Relatively simple for the Consumer and Provider.

Disadvantages

  • Consumer gets no search status information - e.g. how many records have been found? How many records have been fetched? Has an error or timeout occurred?
  • Consumer is forced to handle paging (saving the records returned).

Option 2 : Search Status Assets from getAssetsBySearch()

Run an asynchronous search or get status on an already running search by specifying the getStatus flag search property. If getStatus is set, getAssetsBySearch() will return the status of the Consumer's running search in an Asset of Type SearchStatus. If no search is running, getAssetsBySearch() will return null or throw a RepositoryException. If getStatus is not set, getAssetsBySearch() will initiate a search.

Out of Band Agreements Required

  • Search Properties
    • guid :: the key to the user's session state
    • sortBy :: selected sort method (rank, title, date, etc.)
    • searchSourceIds :: identifiers for search sources to be searched
    • pageSize :: how many records to display per page
    • startRecord :: starting record to display
    • numRecordsToDisplay :: number of records to display - combined with startRecord, the Provider gets the range of records to get (e.g. records 12-54).
    • getStatus :: if not set, initiate a search; if set, get status on a running search.
  • SearchStatus Asset Type
    • basically a Map of fields providing status information on an asynchronous search.
    • Fields could be:
      • databaseName :: name of a database being searched
      • status :: status notification (searching, fetching, ready, error, timeout, etc.) for a given database
      • numRecordsFound :: number of records found for a given database
      • numRecordsFetched :: number of records fetched for a given database
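The Option 2 search properties could be assembled as below. This sketch uses a plain `Map<String, Serializable>` in place of org.osid.shared.Properties, and encoding getStatus as a `Boolean` is an assumption; only the key names come from the agreement above. The hypothetical `lastRecord()` helper shows the range arithmetic: startRecord 12 with numRecordsToDisplay 43 covers records 12-54.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helpers for the Option 2 agreements; not an OSID API.
final class OptionTwoProperties {
    private OptionTwoProperties() {}

    // Builds the search properties for one request. getStatus is modelled as
    // a Boolean flag; how "set" is encoded would be part of the agreement.
    static Map<String, Serializable> searchProperties(String guid, String sortBy,
            String[] searchSourceIds, int pageSize, int startRecord,
            int numRecordsToDisplay, boolean getStatus) {
        Map<String, Serializable> props = new HashMap<>();
        props.put("guid", guid);                          // key to the user's session state
        props.put("sortBy", sortBy);                      // rank, title, date, ...
        props.put("searchSourceIds", searchSourceIds.clone());
        props.put("pageSize", pageSize);
        props.put("startRecord", startRecord);
        props.put("numRecordsToDisplay", numRecordsToDisplay);
        props.put("getStatus", getStatus);                // false: start a search; true: get status
        return props;
    }

    // Last record in the requested (inclusive) range.
    static int lastRecord(int startRecord, int numRecordsToDisplay) {
        return startRecord + numRecordsToDisplay - 1;
    }
}
```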

How Consumer gets an Asset

  • Call getAssetsBySearch() with getStatus not set - this initiates the search.
  • Call getAssetsBySearch() with getStatus set:
    • inspect the returned SearchStatus: if status is "ready" for any database, call getAssetsBySearch() with getStatus not set.
      • This returns an AssetIterator with the number of records specified through search properties.
      • The Consumer then calls (optionally) AssetIterator.hasNextAsset() followed by AssetIterator.nextAsset() to get an Asset.
    • inspect the returned SearchStatus: if status is "searching" or "fetching" for any database, you must wait for some time and try the get status process again to retrieve Assets from these databases.
    • inspect the returned SearchStatus: if status is "error" or "timeout" for any database, the search has completed with an error and there will be no results from that database.
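Reading the SearchStatus fields and sorting databases into the three cases above might look like this. `DbStatus`, `DatabaseStatus` and `OptionTwoConsumer` are hypothetical names, and the lower-case status strings and `Integer` counts inside the Asset's map are assumed encodings.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the per-database fields in a SearchStatus Asset.
enum DbStatus { SEARCHING, FETCHING, READY, ERROR, TIMEOUT }

final class DatabaseStatus {
    final String databaseName;
    final DbStatus status;
    final int numRecordsFound;
    final int numRecordsFetched;

    DatabaseStatus(String databaseName, DbStatus status,
                   int numRecordsFound, int numRecordsFetched) {
        this.databaseName = databaseName;
        this.status = status;
        this.numRecordsFound = numRecordsFound;
        this.numRecordsFetched = numRecordsFetched;
    }

    // Builds an entry from the Asset's Map-like content; assumes lower-case
    // status strings ("ready", "timeout", ...) and Integer counts.
    static DatabaseStatus fromMap(Map<String, Object> fields) {
        String status = (String) fields.get("status");
        return new DatabaseStatus(
                (String) fields.get("databaseName"),
                DbStatus.valueOf(status.toUpperCase(Locale.ROOT)),
                (Integer) fields.get("numRecordsFound"),
                (Integer) fields.get("numRecordsFetched"));
    }
}

final class OptionTwoConsumer {
    private OptionTwoConsumer() {}

    static final EnumSet<DbStatus> READY = EnumSet.of(DbStatus.READY);                      // fetch now
    static final EnumSet<DbStatus> PENDING = EnumSet.of(DbStatus.SEARCHING, DbStatus.FETCHING); // poll again
    static final EnumSet<DbStatus> FAILED = EnumSet.of(DbStatus.ERROR, DbStatus.TIMEOUT);    // no results

    // Names of the databases whose status falls in the given group.
    static List<String> databasesIn(List<DatabaseStatus> statuses, EnumSet<DbStatus> group) {
        List<String> names = new ArrayList<>();
        for (DatabaseStatus s : statuses) {
            if (group.contains(s.status)) {
                names.add(s.databaseName);
            }
        }
        return names;
    }

    // The search is over once no database is still searching or fetching.
    static boolean searchFinished(List<DatabaseStatus> statuses) {
        return databasesIn(statuses, PENDING).isEmpty();
    }
}
```

Grouping databases by status, rather than returning one verdict for the whole search, lets the Consumer fetch from the ready databases while it keeps polling the ones still searching or fetching.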

Advantages

  • Consumer has access to detailed search status information.
  • Consumer can request specific records to display (e.g. 1-10, 12-34, 44-last, etc.).

Disadvantages

  • The getAssetsBySearch() method becomes overloaded, providing both search result records (its primary purpose) and search status.
  • The AssetIterator.hasNextAsset() method becomes largely redundant, since the returned AssetIterator holds only the records the Consumer explicitly requested.

Option 3 : Search Status Properties Type

This option would be exactly like Option 1 (AssetIterator only), with the addition of Search Status fields in the Repository Properties object. A search would be initiated by calling getAssetsBySearch(), which returns an empty AssetIterator. Access to Assets would be controlled through the AssetIterator methods hasNextAsset() and nextAsset(). The Provider would update the Search Status fields in the Repository's Properties object, which the Consumer could then inspect by calling Repository.getPropertiesByType(Type searchStatusPropertiesType).

Out of Band Agreements Required

  • Search Properties
    • guid :: the key to the user's session state
    • sortBy :: selected sort method (rank, title, date, etc.)
    • searchSourceIds :: identifiers for search sources to be searched
    • pageSize :: how many records to display per page
    • startRecord :: starting record to display
  • Search Status Properties Type
    • basically a Map of fields providing status information on an asynchronous search.
    • Fields could be:
      • databaseName :: name of a database being searched
      • status :: status notification (searching, fetching, ready, error, timeout, etc.) for a given database
      • numRecordsFound :: number of records found for a given database
      • numRecordsFetched :: number of records fetched for a given database

How Consumer gets an Asset

  • Call getAssetsBySearch() with the appropriate search properties.
  • Call hasNextAsset() on the returned AssetIterator:
    • if true - call nextAsset():
      • if RepositoryException::OPERATION_FAILED - check Repository searchStatusProperties for search status.
        • proceed according to search status - see above.
      • if null - no more Assets left.
      • else - Consumer gets an Asset.
    • else - no more Assets left.
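A sketch of the Option 3 consumer loop, again using simplified stand-ins for the OSID types. The `statusProperties` supplier stands in for a call to Repository.getPropertiesByType(searchStatusPropertiesType), and representing its contents as one `Map` per database is an assumption. Stopping once no database is still searching or fetching is one possible reading of "proceed according to search status", not an agreed rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-ins modelled on org.osid.repository (not the real interfaces).
interface Asset { String getDisplayName(); }

class RepositoryException extends Exception {
    static final String OPERATION_FAILED = "Operation failed ";
    RepositoryException(String message) { super(message); }
}

interface AssetIterator {
    boolean hasNextAsset() throws RepositoryException;
    Asset nextAsset() throws RepositoryException;
}

final class OptionThreeConsumer {
    private OptionThreeConsumer() {}

    // statusProperties is a hypothetical stand-in for reading the Search Status
    // Properties; each map holds one database's fields (databaseName, status, ...).
    static List<Asset> drain(AssetIterator it,
                             Supplier<List<Map<String, String>>> statusProperties,
                             long waitMillis) throws RepositoryException, InterruptedException {
        List<Asset> received = new ArrayList<>();
        while (it.hasNextAsset()) {
            Asset asset;
            try {
                asset = it.nextAsset();
            } catch (RepositoryException e) {
                if (!RepositoryException.OPERATION_FAILED.equals(e.getMessage())) {
                    throw e;                        // a real failure
                }
                if (!anyStillWorking(statusProperties.get())) {
                    break;                          // assumed: nothing more will arrive
                }
                Thread.sleep(waitMillis);
                continue;                           // restart at hasNextAsset()
            }
            if (asset == null) {
                break;                              // no more Assets left
            }
            received.add(asset);
        }
        return received;
    }

    // True if any database reports "searching" or "fetching".
    static boolean anyStillWorking(List<Map<String, String>> databases) {
        for (Map<String, String> db : databases) {
            String status = db.get("status");
            if ("searching".equals(status) || "fetching".equals(status)) {
                return true;
            }
        }
        return false;
    }
}
```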

Advantages

  • Consumer has access to detailed search status information.
  • The AssetIterator and the getAssetsBySearch() method remain "true" to their purpose.

Disadvantages

  • Consumer is forced to handle paging (saving the records returned).