The Good Journal #3: Upgrade all the things!

As a professional techy weirdo, I must say that running hundreds of servers inside a Kubernetes cluster can be quite the hassle when it comes to upgrades. And let’s not forget the diverse range of servers, from large enterprise-sized ones down to tiny two-dudes-sharing-their-banjo-playing setups, each of which requires individual attention.
To limit the risk, we upgrade our consumer-sized environments at night, in batches. This usually goes off without a hitch, but as with all things tech-related, sometimes you run into unexpected problems. If a Kubernetes node goes down or some other infrastructure issue arises, the upgrade script halts after completing the current task. That works fine for outright failures, but I failed to consider degraded performance. When the infrastructure is merely slow, the upgrades slow to a crawl, and some of our consumers end up facing an unavailable environment because their server stays in maintenance mode for an extended period.
To prevent such scenarios from occurring in the future, I implemented a time check function within the script. The script now checks the time before running each upgrade, and if it’s outside the designated hours for upgrades, it’ll hold off until the scheduled time arrives.
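For the curious, here is a rough sketch of what such a time check could look like in Python. It is not our actual script: the window hours (01:00 to 05:00) and the names `in_window`, `seconds_until_window` and `run_batch` are made up for illustration, and `upgrade` stands in for whatever actually upgrades a single server.

```python
import time
from datetime import datetime, time as dtime, timedelta

# Hypothetical maintenance window; the real hours are not stated in this post.
WINDOW_START = dtime(1, 0)
WINDOW_END = dtime(5, 0)


def in_window(now, start=WINDOW_START, end=WINDOW_END):
    """Return True if `now` is inside the window (start inclusive, end exclusive)."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # The window crosses midnight, e.g. 23:00 to 04:00.
    return t >= start or t < end


def seconds_until_window(now, start=WINDOW_START, end=WINDOW_END):
    """Return how many seconds to wait until the window opens (0 if already open)."""
    if in_window(now, start, end):
        return 0.0
    next_start = datetime.combine(now.date(), start)
    if next_start <= now:
        # Today's window has already passed, so wait for tomorrow's.
        next_start += timedelta(days=1)
    return (next_start - now).total_seconds()


def run_batch(servers, upgrade, clock=datetime.now, sleep=time.sleep):
    """Upgrade servers one by one, checking the clock before each upgrade."""
    for server in servers:
        wait = seconds_until_window(clock())
        if wait > 0:
            # Outside the designated hours: hold off until the next window.
            sleep(wait)
        upgrade(server)
```

The check runs before each upgrade rather than once per batch, so a slow night simply pauses the batch between servers instead of pushing maintenance into the daytime. Passing `clock` and `sleep` as parameters is only there to make the sketch easy to test without actually waiting.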
Our philosophy is to take things slow but steady when it comes to upgrades. We don’t jump on the latest version immediately; instead, we wait until our infrastructure is running smoothly, the developers have put out a bunch of nice fixes for the server release and we’re certain there aren’t any major bugs in the release we’re pushing into production. This is why we’re generally behind in terms of versions. Nextcloud releases new versions every few months, and while this is great, they’re not always immediately stable enough for production use. Hence, even for our consumers, we choose caution over haste.
We have received requests for a more rolling release schedule, which does sound like a lot of fun. However, implementing such a plan would require a considerable amount of work that we currently cannot handle from a support perspective. Even with a zero-support policy, we’ve learned through experience that we’re inherently too nice to ignore user requests for help. As a result, we spend too many hours providing support to our free users when we’ve publicly stated that we wouldn’t.
Our dedication to being good and helping out our users means we often spend too much time assisting them, even when their issues are specific to their own setup. We’re slowly getting used to reminding our free-tier users that we can’t offer extended support for general platform usage. As much as we’d like to, we don’t have the resources to do so.
In conclusion, while a wild, rolling, beta testing upgrade track sounds like an exciting prospect, it’s not something we can provide for the foreseeable future. Our focus remains on providing stable, reliable service to our customers.