Idea for increasing the size of resource id

Description

Looking back at some old tickets
https://sakaiproject.atlassian.net/browse/SAK-11816#icft=SAK-11816
https://sakaiproject.atlassian.net/browse/SAK-11020#icft=SAK-11020

The field resource_id (and related fields) in the content tables has remained a small 255 characters since the beginning of Sakai. The main problem now is that this limits the length of file names and directory structures since that entire information is stored in the resource id. Previously it was also a problem for URL's but that was "fixed" in 2.7. (Even though the fix involved actually creating a file which introduced problems of it's own such as importing a tool like Lessons without importing it's content)

It looks like historically it's remained that low because of Mysql. Prior to 5.0.3 the varchar was limited to 255 characters. That makes sense. Then the index was also limited to 737 bytes (255*3 for UTF-8). However in more recent versions, the index of InnoDB tables is allowed to be as large as 3072 bytes (1024*3) and the varchar limitation is up to 65K. Oracle still has the 4K limit which is larger than both and should be safe to increase now, but should probably still stay in the same limit for consistency.

So it seems like we could safely increase these id's where they are used up to the MySQL maximum of 1024, giving us a little more room on these ids. Anything else seems like it would require a more creative solution (which may or may not be worked on)

Any feedback on this?

http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_large_prefix

Activity

Show:

Matthew Jones May 19, 2017 at 8:51 AM

I'm not too sure about the creative/hash solution, it seems like it would require work and conversions. I'm sure there's even more creative solutions as the way we store content in the database is really inefficient but the focus on that ended years ago with JCR. With MySQL 5.7.7+ the large_prefix is the default so that makes it easy, it looks like anyone lower would have to set it in their configuration. This is the instructions on Drupal for utf8mb4 support (which is semi related to this) https://www.drupal.org/node/2754539

It might be possible for Sakai to detect if large prefixes are available and use a different hibernate mapping (and also different limits) while also warning the user they should set these for increased limits. Or we could just make it a requirement for future versions of Sakai to just set this?

Neal Caidin March 29, 2017 at 3:04 PM

The team generally does not consider feature requests to be of "critical" priority. Any reason to make an exception on this one? Although if a PR is coming soon the priority level may not matter so much.

Adam Marshall March 10, 2017 at 4:52 AM

yes and thrice yes Matthew. Please please make this happen. We are always having to apologise to staff when they hit the limit.

For some reason our staff love uploading documents called: "A Microsoft Word document containing text that describes in great detail crop rotation in the home counties of England in the 15th century.doc" into a folder called "A folder that contains documents pertaining to the informational transfer of key concepts related to but not restricted to crop rotation in the home counties of England in the 15th century". I think you get the picture!

Matthew Buckett March 10, 2017 at 3:58 AM

The more complex solution is to hash the resource ID and use that as a key/lookup. That way we don't have to worry about index limits on both the resource_id and the in_collection columns.

Matthew Jones March 9, 2017 at 3:54 PM
Edited

I had this as an old email, mostly putting it up here as an to get more comments/feedback so I don't lose it. This might be a little different if we went to utf8mb4 by default but should still allow for a little more room. Anything else just seems like it gets to be too complicated?

Details

Priority

Components

Assignee

Reporter

Created March 9, 2017 at 3:52 PM
Updated August 3, 2023 at 9:13 AM