Michael's Proposed 2008 Release Practices

New Concept

  • This is an idea for discussion, please feel free to edit, comment, etc.
  • An MS Word version of this is attached.

Goals

What we want out of the Sakai release process:

A. The "quality" of a release should be clear to someone downloading it from the website.
B. A "final" release should be extremely high quality, with very few known issues (which would typically be limited to certain software/hardware configurations or to optional features).
C. A final release should run "trouble-free" in production, assuming proper configuration of the application and the underlying software stack/database, and appropriate hardware.
D. Final releases should undergo thorough quality assurance testing, including performance testing.
E. A release should be supported for two years to allow schools to go through 2 academic years without upgrading.
F. We should not impede schools who wish to pull from the .x branches.
G. (less sure about this one) A release should come with configuration recommendations that provide a starting point for setting up an environment for running Sakai.

Current state

We're currently releasing Sakai twice a year. Each of these is a features release (.0), which introduces significant new features. There are no beta releases. We do not put out a maintenance release. We do put out security releases that only contain fixes related to the security issues. We officially support the current release and the previous release. By support we mean that bug fixes are generally merged into the code branches for those releases.

The Sakai .0 release has too many issues to be a good choice for running in production. Most full production schools are therefore running from the post-release branch (.x), which contains many bug fixes but has not been through community quality assurance, although each institution is testing the particular mixture of fixes they are running locally. This creates the ironic situation that the release on the sakaiproject.org site is not usable by newcomers, and you have to be an insider to get a different version.

This practice has evolved for a variety of reasons and most community members have been comfortable with this. One of the benefits of the current approach is that it allows relatively more energy, in the short run at least, to be focused on feature development rather than on bug fixes and performance enhancements. As the community changes (more non-developers using Sakai and more developers in full production mode) and production concerns become more prominent, it makes sense to ask if we should revise these community practices.

Problems with current state

  • There is no release readily available that is production-ready and the .0 release is not labeled as beta (or some other moniker that would indicate its possible unsuitability for production use). This combination is likely to lead to people having a bad experience with Sakai.
  • A new release every 6 months and our "1 version back" support policy mean that you must upgrade Sakai every year or run an unsupported version (and fix your own bugs).
  • There is no good documentation for configuring the Sakai stack to run in production. Schools are left to experiment to get it right, sometimes under heavy duress.
  • Tools have extremely inconsistent performance profiles, which makes it difficult to know which are "safe" to include in a production environment.

Potential solutions

Understanding what we want the solution to be takes us to the important question of what our overall goals for Sakai are. Do we want to be delivering production-ready enterprise software or are we happy with providing an excellent framework for the development of your own CLE? If the former, we need to make changes to our process. If the latter, we need to make this clearer to the community and the world and rely on our commercial partners to close the gap for those schools who want an "off the shelf" CLE.

In the spirit of good open source sharing and the recent attention around Zimbra, I took a look at their release process:

It's interesting to see how this has evolved over time. The time from beta to planned general availability in version 3.0 was 1 month. The actual time was nearly three months. A similar schedule persists until version 5.0, the current release. We now see a 6 month timeframe from the first beta to planned general availability. Why is that? I don't know but I'm going to ask Zimbra and find out.

Here's a link to Plone's release process, to pick another application at random. Again, there are formal release candidates prior to the final release. Here's what they say about these releases:

"Beta releases and Release Candidates are normally released for production testing, but should not be used on mission-critical sites."

They also do formal maintenance releases, which focus on performance enhancements, bug fixes, security fixes and UI enhancements.

Subversion also provides an interesting discussion of release numbering and stabilization.

Vocabulary and Potential Solutions

To make vocabulary easier, I suggest the following definitions:

Beta: The first feature-complete release ready for testing outside the Sakai development community. This will be useful for previews and perhaps small pilots. It will be known to have bugs, performance issues and areas of UI that need polishing. Localization and documentation are not complete. There may be more than one beta version depending on how quickly bugs get fixed.
Release Candidate: A version with the potential to be a final product, unless fatal bugs or performance issues emerge in continued testing. This is a version that the more development-savvy users might run in production, especially during a "summer" term. There will generally be more than 1 release candidate, as it is highly unlikely that we quash all the bugs in one round.
General Availability: The version of the product that we recommend for enterprise deployment, even if you don't have resources to manage bug fixes on your own.
Maintenance Release: Despite our best efforts, we can't eliminate all the bugs or performance bottlenecks. And security threats may emerge. These could lead to a release that fixes things found in wide production deployment.

I don't know as much as others do, but I would say that 2.4.0 was at the level of a first Release Candidate (RC1). If we were to cut a new tag from 2.4.x today it would be RC2.

Which brings me to some potential solutions:

Potential Solution #1: A series of releases of increasing quality, ending with a .0 release and continuing with maintenance releases as needed. We make beta and release candidates available publicly for those who want to evaluate or pilot Sakai. We highly discourage anyone from running anything but the final release or, at minimum, RC2+.

This is a good solution, in my view, if we can marshal resources to test the interim releases. We also need a performance-testing suite that can be centrally run on these releases. I'm not sure what the impact is on release management (but see "Release Frequency" below).
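
The "final release or, at minimum, RC2+" guidance in Solution #1 can be sketched as a small check. This is only an illustration: the label format ("2.5.0-beta1", "2.5.0-rc2", "2.5.0", "2.5.1") and the names Stage, classify and ok_for_production are assumptions made for this example, not existing Sakai tags or tools.

```python
import re
from enum import IntEnum


class Stage(IntEnum):
    """Release stages from the vocabulary above, in order of increasing quality."""
    BETA = 1
    RELEASE_CANDIDATE = 2
    GENERAL_AVAILABILITY = 3
    MAINTENANCE = 4


# Hypothetical label format, assumed for illustration only:
# "2.5.0-beta1", "2.5.0-rc2", "2.5.0" (final), "2.5.1" (maintenance).
LABEL = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(beta|rc)(\d+))?$")


def classify(label: str) -> tuple[Stage, int]:
    """Return the stage of a release label and its beta/RC sequence number (0 if none)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    patch = int(match.group(3))
    kind, number = match.group(4), match.group(5)
    if kind == "beta":
        return Stage.BETA, int(number)
    if kind == "rc":
        return Stage.RELEASE_CANDIDATE, int(number)
    if patch == 0:
        return Stage.GENERAL_AVAILABILITY, 0
    return Stage.MAINTENANCE, 0


def ok_for_production(label: str) -> bool:
    """Apply the Solution #1 guidance: final or maintenance release, or at minimum RC2+."""
    stage, number = classify(label)
    if stage >= Stage.GENERAL_AVAILABILITY:
        return True
    return stage is Stage.RELEASE_CANDIDATE and number >= 2


if __name__ == "__main__":
    for label in ["2.5.0-beta1", "2.5.0-rc1", "2.5.0-rc2", "2.5.0", "2.5.1"]:
        print(label, ok_for_production(label))
```

The point of the sketch is that the release label itself carries the quality signal (goal A), so a newcomer or a download page could apply the guidance without insider knowledge.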

Potential Solution #2: Continue as we do today but explicitly create a maintenance release based on the fixes we get from production. We recommend to the external community that they do not run a .0 release in production and wait for an official maintenance release.

The distinction between this and #1 is we are acknowledging that many folks will run the release in production and we're counting on that for much of our performance testing (at least).

I don't like this as much because (a) it risks putting our developing institutions in production fire-fighting mode rather than focusing their energy on pre-deployment testing or long range feature development and (b) it is relatively non-standard in the industry and, therefore, likely to be confusing to newcomers.

Potential Solution #3: Do #1 without an official maintenance release.

To me this is a question of resources, release stability and release frequency. If we are releasing stable versions of Sakai every 6 months then I think a maintenance release is less valuable. There are folks that would benefit from it, of course, but if we were short on resources this is what I would sacrifice.

Potential Solution #4: Change nothing.

I'm sure there are other variations we can imagine. When I think about Sakai in 2009, I think we need to be following something quite close to #1. How we get there from where we are today is an important question, but I don't think it should affect our establishing that as the goal.

One key enabler to achieving that goal is to establish a centralized set of load test scripts that we can "easily" run against Sakai. Another key enabler is a substantial increase in the amount of unit testing. If we can't get resources in the community to provide real testing on beta/RC releases then I prefer solution #2. This would be disappointing, though, in my view.

Release Frequency

Note that none of these solutions addresses goal E (supporting releases for 2 academic years, more or less). We obviously have two choices here: make less frequent releases or support more releases. I think the former is the only realistic choice at this point in time. I think there are other benefits to a longer cycle as well, including the ability to take more time with requirements gathering, unit and performance test development, documentation, etc.

If more frequent releases are desirable, I would advocate establishing a new release process first and then increasing the tempo as we are able to improve efficiency.
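
The arithmetic behind the two choices can be sketched as follows. The function name and the simple model (a release stays supported while it is among the newest N releases) are assumptions made for illustration; the 6-month interval and the "current plus previous" policy come from the description of the current state above.

```python
def support_window_months(release_interval_months: int, supported_releases: int) -> int:
    """Months a release stays supported, counted from its own release date.

    Assumed model: a release is supported while it is among the newest
    `supported_releases` releases, so it drops out of support when the
    release that many steps later ships.
    """
    return release_interval_months * supported_releases


# Current practice: a release every 6 months, current and previous supported.
current = support_window_months(6, 2)

# Two ways to reach the 24-month goal:
fewer_releases = support_window_months(12, 2)  # yearly releases, same policy
more_support = support_window_months(6, 4)     # same tempo, four supported releases

print(current, fewer_releases, more_support)
```

Under this model the current practice gives a 12-month window, which is why a school must upgrade every year to stay supported. Reaching 24 months means either halving the release tempo or doubling the number of branches that receive merged fixes.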

What is in the Release

Sakai is a kernel plus a set of tools. Some of these tools are required (or close to it) and others are optional. I'm not going to start that conversation here. But I think it is important to realize that getting to a new release process will require us to prioritize which tools/components get additional attention for (performance and unit) testing. We're not going to be able to bite off all of Sakai in one piece.

I haven't thought this part through very well but I can imagine a "Sakai Basic" release that contains only those tools that are fundamentally needed in a CLE. Resources would definitely be in. The Poll tool would probably be out (at least as of today). This is not a judgment of the quality of the tool but, rather, of the degree to which it is needed on the vast majority of campuses. A separate facility for incorporating additional tools would be provided, along with information about the testing that has gone into them. In any case, anything that appeared in this Basic package would have been rigorously tested.

I'm not sure if it is possible to completely disentangle this conversation from the release process conversation. Probably not because, at the end of the day, this is about a trade-off among functionality, quality and frequency. But I'm out of time at the moment and I think this is enough to chew on for now.