Issues
- Mail digests can cause DB crash (SAK-11841, Resolved; Aaron Zeckoski)
- Accessibility: Image has non-descriptive label (Page Order Helper) (SAK-11669, Resolved; joshua.ryan@asu.edu)
- XML Entities in CHS (and elsewhere) consume too much memory (SAK-11432, Resolved)
- Null events are generated to SAKAI_EVENT (SAK-11082, Resolved)
- Null events are generated to SAKAI_EVENT (SAK-11081, Resolved)
- Add contextId (siteId) field to Event (SAK-10801, Resolved; Stephen Marquard)
- Event.event column is too small (SAK-10481, Resolved; Glenn R. Golden)
- Add generic dao version to the master properties file (currently 0.9.1) (SAK-10363, Resolved; IvoG)
- Create a single database data loading format (SAK-9923, Resolved)
- Presence events are only logged if presence list is enabled (SAK-8499, Resolved; Nuno Grilo)
- When new sites are created, Wiki Home Page permissions are not the same as Wiki Default_template permissions (SAK-8234, Resolved; Harriet Truscott)
- JSF PageRenderer internationalization (SAK-7655, Resolved; Ray Davis)
- sakai-calendar-tool internationalization (SAK-7654, Resolved; Beth Kirschner)
- Resources needs further i18n (SAK-7590, Resolved)
- FormattedText matches too many times for tags where one is a substring of the other (i and img) (SAK-7588, Resolved)
- System files created on server startup regardless of auto.ddl (SAK-7295, Resolved)
- Empty warning from AgentHelperImpl (SAK-7111, Resolved; Huong Nguyen)
- JSF error occurs in hidden field: lastSubmittedDate in deliverAssessment.jsp (SAK-7106, Resolved; Huong Nguyen)
- Accessing an assessment via the publishedUrl: after logging in successfully, you are sent to the portal page instead of the assessment page (SAK-7104, Resolved; Huong Nguyen)
- Null pointer exception when filling out a hier wizard containing categories without descriptions (SAK-7055, Resolved; Christopher Maurer)
- Glossary Long Description not being copied when site is duplicated (SAK-7028, Resolved; Brad Anderson)
- Gradebook throws deadlock exceptions when updating grades (SAK-6999, Resolved; Oliver Heyer)
- Adding comment to page with full-stop in name fails with error (SAK-6995, Resolved; Raad Al-Rawi)
- a patch for SAK-6933: failed to saveTotalScore (SAK-6982, Resolved; Daisy Flemming)
- HibernateOptimisticLockingFailureException (SAK-6694, Resolved; Stephen Marquard)
- IndexOutOfBoundsException while authoring assessment (SAK-6692, Resolved; SAMIGO TEAM)
- Duplicate component ID 'selectIndexForm:selectTable:_id33' found in view (SAK-6690, Resolved; Huong Nguyen)
- NumberFormatException publishing an assessment (SAK-6688, Resolved; Huong Nguyen)
- Less verbose logging (SAK-6566, Resolved; Michelle Wagner)
- a Samigo patch for 2.2.x to address SAK-6563, SAK-6516, SAK-6493 (SAK-6564, Resolved)
- Modification to CharonPortal to link from institutional logo (SAK-6559, Resolved; James Renfro)
- NPE from isDisplayByRole (SAK-6552, Resolved; Michelle Wagner)
- NPE in ResourcesAction.getBrowseItems (SAK-6551, Resolved)
- NPE in PagedResourceActionII.prepPage (SAK-6550, Resolved; Stephen Marquard)
- NPE from getGradebooks (SAK-6549, Resolved; Stephen Marquard)
- NPE from setGradebookUid (SAK-6547, Resolved)
- Can't remove resource file/folder (under Oracle) (SAK-6541, Resolved)
- Slow response times when accessing grading (SAK-6540, Resolved)
- Need to create unique constraint for MFR_UNREAD_STATUS_T table (SAK-6497, Resolved; Chen Wen)
- OSP tool grouping in project-course sites (SAK-6487, Resolved)
- OSP portfolio can't add a new page (SAK-6387, Resolved; Brad Anderson)
- Need error checking for a properly formatted email address (SAK-6302, Resolved; IvoG)
- No Add option on Portfolio Tool (SAK-6299, Resolved; Christopher Maurer)
- Select Evaluator throws exception (SAK-6298, Resolved)
- Portfolio Worksite Tools Break After Removing User by Admin (SAK-6267, Resolved; Christopher Maurer)
- Deadlock from concurrent site join attempts (SAK-6194, Resolved)
- Hibernate-related exception (SAK-6192, Resolved; Oliver Heyer)
- RSS titles include linefeed which is displayed in FF (SAK-6140, Resolved; Stephen Marquard)
- RWiki throws exception when accessing an RSS page while not logged in (SAK-6139, Resolved; Stephen Marquard)
- Multiple errors maybe importing file (SAK-6133, Resolved; Huong Nguyen)
CARET is seeing an issue related to the mail digests. Here is what we think we have figured out at this point:
PROBLEM:
org.sakaiproject.email.impl.BaseDigestService.java
This is loading the entire digest table every 1 second. We have 6 app servers so our DB is getting hammered pretty hard (it is our slowest query).
sendDigests has this (the key here is the empty catch block):
}
// if in use, missing, whatever, skip on
catch (Throwable any)
{
}
finally
{
    if (edit != null)
    {
        cancel(edit);
        edit = null;
    }
}
This causes the digest process to attempt to send the digest over and over until the locks are cleared from the database.
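To make the effect of the empty catch block concrete, here is a minimal sketch (not the BaseDigestService code) in which a failed attempt is recorded instead of being silently discarded. The class `DigestSendSketch`, the method `trySend`, and the `failures` list are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DigestSendSketch {
    // Hypothetical record of failed attempts; a real service would write to its log instead.
    final List<String> failures = new ArrayList<String>();

    /**
     * Runs one send attempt. Returns true on success. On failure the reason is
     * recorded, so a digest that fails on every pass becomes visible to an
     * administrator instead of being retried silently.
     */
    boolean trySend(String digestId, Runnable send) {
        try {
            send.run();
            return true;
        } catch (RuntimeException e) {
            failures.add(digestId + ": " + e.getMessage());
            return false;
        }
    }
}
```

With something like this in place, a digest that is stuck behind a stale lock would leave one entry per failed pass, which is much easier to spot than a slow query in the database logs.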
org.sakaiproject.util.BaseDbSingleStorage.java
editResource in here seems to be the place where we attempt to update the status of the digest but cannot because of the locks. This is also where something else troublesome happens:
String sessionId = UsageSessionService.getSessionId();
if (sessionId == null)
{
    sessionId = "";
}
This sessionId gets set to empty string for all the digests.
Down a little further:
org.sakaiproject.db.impl.BasicSqlService.java
dbWrite(String sql, Object[] fields, String lastField, Connection callerConnection, boolean failQuiet) calls on to prepareStatement(PreparedStatement pstmt, Object[] fields)
In this method we do the following:
if (fields[i] == null || (fields[i] instanceof String && ((String) fields[i]).length() == 0))
{
    // treat a Java null as an SQL null,
    // and ALSO treat a zero-length Java string as an SQL null
    // This makes sure that Oracle vs MySQL use the same value
    // for null.
    pstmt.setObject(pos, null);
This converts the sessionId into a null. That would not be a problem if the code never failed while processing the digest. Even if it does fail, we are still supposed to be OK because we have code which cleans up locks. Unfortunately, in org.sakaiproject.cluster.impl.SakaiClusterService.java we have code like this in run():
// Delete any orphaned locks from closed or missing sessions.
statement = clusterServiceSql.getOrphanedLockSessionsSql();
List sessions = m_sqlService.dbRead(statement);
if (sessions.size() > 0) {
Here is the SQL that goes with that:
select distinct l.USAGE_SESSION_ID from SAKAI_LOCKS l left join SAKAI_SESSION s on l.USAGE_SESSION_ID = s.SESSION_ID where s.SESSION_ACTIVE is null
Unfortunately, this does not seem to be able to deal with a null session_id, so the locks with a null session_id stay in the table and are never cleared. This leads to huge load and growing logs, which eventually cause the database to run out of disk space and die.
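The chain described above can be sketched in a few lines. This is an in-memory illustration, not Sakai code: `toSqlValue` mirrors the rule quoted from prepareStatement, and `sqlEquals` models the standard SQL behaviour that a comparison involving NULL is never true. The assumption (not confirmed by the excerpts above) is that the cleanup eventually removes locks with an equality match on USAGE_SESSION_ID. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullSessionLockSketch {
    /** Mirrors the quoted rule: a Java null or a zero-length string is bound as SQL NULL. */
    static Object toSqlValue(Object field) {
        if (field == null || (field instanceof String && ((String) field).length() == 0)) {
            return null;
        }
        return field;
    }

    /** SQL-style equality: any comparison involving NULL does not evaluate to true. */
    static boolean sqlEquals(Object left, Object right) {
        return left != null && right != null && left.equals(right);
    }

    /** Counts the lock rows that a predicate like "USAGE_SESSION_ID = ?" would match. */
    static int matchingLocks(List<Object> lockSessionIds, Object boundSessionId) {
        int count = 0;
        for (Object id : lockSessionIds) {
            if (sqlEquals(id, boundSessionId)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The digest code has no usage session, so its "" session id is stored as NULL.
        Object digestSession = toSqlValue("");
        List<Object> locks = new ArrayList<Object>(Arrays.asList((Object) "session-a", digestSession));
        System.out.println(matchingLocks(locks, "session-a"));   // the normal lock is found
        System.out.println(matchingLocks(locks, digestSession)); // the NULL lock is never matched
    }
}
```

Under that assumption, a lock written with an empty session id can be listed as orphaned but never matched by an equality-based delete, which is consistent with the locks staying in the table.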
POSSIBLE SOLUTION (this is what we are doing at CARET but I would like to see this patched into trunk and 2.4.x if people think it is worthwhile):
1) Increase the timing on the digest check to once every 12 hours (I cannot see any reason to have it any shorter than this but I might be missing something, is there a reason?). Make this configurable in sakai.properties. This should be using a Timer instead of a thread.
2) Add in a check to the cluster service which purges all locks which are over X hours old. I am thinking something like 24 hours or maybe less for the default. This would be configurable in sakai.properties.
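As a rough sketch of point 1, the digest check could be driven by a java.util.Timer whose period is read from configuration, with 12 hours as the default. This is not the patch applied at CARET. The property name `digest.check.period.hours` and the class `DigestTimerSketch` are hypothetical, and a plain java.util.Properties object stands in for sakai.properties.

```java
import java.util.Properties;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;

class DigestTimerSketch {
    // Hypothetical property name; not an existing Sakai setting.
    static final String PERIOD_PROPERTY = "digest.check.period.hours";
    static final long DEFAULT_PERIOD_HOURS = 12;

    /** Reads the check period from the properties, falling back to 12 hours. */
    static long periodMillis(Properties props) {
        long hours = DEFAULT_PERIOD_HOURS;
        String raw = props.getProperty(PERIOD_PROPERTY);
        if (raw != null) {
            try {
                long parsed = Long.parseLong(raw.trim());
                if (parsed > 0) {
                    hours = parsed;
                }
            } catch (NumberFormatException e) {
                // Unparseable value: keep the default.
            }
        }
        return TimeUnit.HOURS.toMillis(hours);
    }

    /** Schedules the digest check on a daemon Timer instead of a hand-rolled polling thread. */
    static Timer schedule(final Runnable digestCheck, long periodMillis) {
        Timer timer = new Timer("digest-check", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    digestCheck.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would cancel the Timer, so report it and carry on.
                    System.err.println("digest check failed: " + e);
                }
            }
        }, periodMillis, periodMillis);
        return timer;
    }
}
```

A caller would do something like `DigestTimerSketch.schedule(checkTask, DigestTimerSketch.periodMillis(props))` at startup and call `cancel()` on the returned Timer at shutdown.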
We are also seeing locks in the tables as old as 1.5 years (early 2006). This tells me that the current cleanup code is not completely trustworthy. I think this needs more investigation in the long run, but in order to keep our servers from dying this fix should be ok in the short run.
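The age-based purge from point 2 can be sketched in memory as follows. A real implementation would issue a delete against SAKAI_LOCKS; here a small `Lock` class with illustrative field names stands in for a row, and `StaleLockPurgeSketch` and `purgeOlderThan` are hypothetical names. The point of the sketch is that the rule depends only on the lock's age, so it also removes locks whose session id is null.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.List;

class StaleLockPurgeSketch {
    /** Minimal stand-in for a lock row; field names are illustrative only. */
    static final class Lock {
        final String recordId;
        final String usageSessionId; // may be null, as described above
        final Instant lockTime;

        Lock(String recordId, String usageSessionId, Instant lockTime) {
            this.recordId = recordId;
            this.usageSessionId = usageSessionId;
            this.lockTime = lockTime;
        }
    }

    /**
     * Removes every lock taken before (now - maxAge) and returns how many were removed.
     * The session id is deliberately ignored.
     */
    static int purgeOlderThan(List<Lock> locks, Duration maxAge, Instant now) {
        Instant cutoff = now.minus(maxAge);
        int removed = 0;
        for (Iterator<Lock> it = locks.iterator(); it.hasNext();) {
            if (it.next().lockTime.isBefore(cutoff)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The maximum age (24 hours in the proposal) would come from configuration in the same way as the digest check period.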
(from Stephen Marquard)
The cleanup code you're looking at in trunk (SAK-9857) is new for 2.5 - as you say, the 2-4-x and earlier cleanup code is indeed not trustworthy. We also have some of those (though none with null USAGE_SESSION_IDs).
It would probably be a good idea to add:
DELETE FROM SAKAI_LOCKS;
to the 2.4 > 2.5 conversion scripts.
Some final notes for everyone out there now that the issue is solved for us:
1) The dates in the XML blob that stores digests got out of sync for us, and this was causing even worse problems. The code was checking and seeing that the date range was not what it expected, and therefore it would update the row in the DB for each one that did not match (unfortunately it updated it to the same thing that was already there). This basically ensured it would loop forever, and the updates would crush our DB server, with us seeing NumAppServers * NumCorruptDigestEntries updates per second.
With the fixes in, our load has dropped way down into easily acceptable levels.
If you see something like this happening (you are probably only going to notice it in the DB logs when you start to run out of space), the solution is either to correct the dates in the XML or to fix the XML by removing the <message> tags (basically emptying out everything in the XML blob except a couple of tags). This will be much harder or impossible if you are running trunk because of the new encoding, so be aware of that.