Need effiecient way to determine pub-view setting for large sites
Description
Activity

David Horwitz February 14, 2010 at 4:10 AM
see https://sakaiproject.atlassian.net/browse/KNL-278#icft=KNL-278 for progress on this

David Horwitz January 25, 2010 at 4:56 AM
MAINTANCE TEAM: Unassigned so these get reviewed by the Maintance Team

Jim Eng September 25, 2009 at 10:45 AM
The work for this will need to be done in kernel. I propose that we change the implementation of the ContentHostingService.isPubView(String) method. It would first check whether a list of pubview realms have been cached for the site (in thread-local cache so it has a request-scope). If so, it would use that list to determine pub-view settings for the current (and subsequent) resources. If not, the method would call a new method in SecurityService, as follows:
pubviewRealms = SecurityService.listRealmsWithinContainer(AuthzGroupService.ANON_ROLE, AUTH_RESOURCE_READ, siteCollectionRef);
That method will return a set of strings, which may be empty but never null. That set will be used to calculate whether the current item is public, and it will be cached in thread-local cache for use in answering whether other items in the same site are public.

Jim Eng September 25, 2009 at 10:38 AM
This query returns the desired realms:
select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM where SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY and SAKAI_REALM.REALM_ID LIKE ? and FUNCTION_KEY in (select FUNCTION_KEY from SAKAI_REALM_FUNCTION where FUNCTION_NAME = ?) and ROLE_KEY in (select ROLE_KEY from SAKAI_REALM_ROLE where ROLE_NAME = ?);
The first variable is the realm-id of the site-level resources collection with a question-mark appended. The second variable is the function name ("content.read" in this case). The third is the role name (".anon", in this case).
It might be a good idea to add a "force index (SAKAI_REALM.REALM_ID)" clause to the query, although tests indicate that Oracle and MySQL used that index for queries against database tables with large numbers of realms defined.

Jim Eng September 25, 2009 at 10:33 AM
The list view in the Resources tool in Sakai 2.x has a column to show who can see each resource or folder. Potential values include the entire site, members of particular groups, and the public.
Items at any level may be made public, so the question ("is this item public?") must be asked for virtually every item in a site. Determining whether resources and folders are open to the public is relatively expensive because it must be repeated so often. If a site contains lots of resources, the expanded view can results in dozens, hundreds or thousands of queries. The only short cut at this time is that the query can be avoided for items that are inside a folder that is public since they will also be public.
The SQL query is against the various realms tables. It asks whether the anonymous role has the "content.read" permission for the realm associated with a particular resource or folder (or for the realm of any folder that contains the item).
We could improve on that if the SecurityService (or AuthzGroupService?) interface had a method to ask a different question: Are there any realms within this site for which the anonymous role has "content.read" permission (and if so, what are they)?
This query would likely be implemented with a SQL "like" operator and a one-sided wildcard. This is actually a join of information from three or four tables, but the query would be asking for a list of sakai_realm.realm_id values like '/content/group/<side-id>/%' (or like '/content/group-user/<user-id>/%' for workspaces, and it's possible there are other variations) for which there are entries in the sakai_realm_rl_fn table where the role_key identifies the anonymous role and the function_key corresponds to "content.read". The result would be a list of realm identifiers.
That would allow Content Hosting to get a list of realms (possibly empty) for items that are public. Content Hosting could then avoid making dozens or hundreds or thousands of individual queries by checking that list of public realms. An alternative to adding a specific API method for this would be to execute that query in the SecurityService and do some form of caching so the Security Service could answer this question hundreds of times without doing hundreds of individual queries.
When users navigate to the Resources tool in sites with many resources OR when users expand-all in sites with many resources, it results in many many queries to the realms tables, checking to see if the anonymous role has "content.read" permission for the realm related to each resource and folder. We need a way to reduce the impact of these queries on database performance.