KernelCare Blog - Rackspace rebooted their Docker servers, maybe something they could have avoided?

A few days ago, Rackspace’s Carina team carried out scheduled maintenance to address multiple issues in the Linux kernel that affected Carina users. The fix required a reboot, and the team worked diligently to update all of its Docker servers. Although it took a little longer than anticipated, the job was completed and all containers were restarted (see here). The update and the required reboot meant an anticipated downtime of about 5-7 minutes, and customers were encouraged to back up their data and check their containers’ restart settings.
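
One way to check restart settings before a reboot like this is to look at each container's restart policy. Below is a minimal Python sketch. It assumes you pass it the JSON printed by `docker inspect` (for example `docker inspect $(docker ps -q)`), and the function name is hypothetical. It flags containers whose `HostConfig.RestartPolicy.Name` is `no`, or empty, which some older Docker versions report for the default. Docker will not start those containers again on its own after the host comes back up.

```python
import json


def containers_without_restart_policy(inspect_json: str) -> list[str]:
    """Return names of containers Docker will not restart automatically.

    `inspect_json` is assumed to be the JSON array printed by `docker inspect`.
    A RestartPolicy name of "no" (or "", which older Docker versions may
    report for the default) means the container stays stopped after a reboot.
    """
    flagged = []
    for container in json.loads(inspect_json):
        policy = (
            container.get("HostConfig", {})
            .get("RestartPolicy", {})
            .get("Name", "")
        )
        if policy in ("", "no"):
            # docker inspect reports names with a leading slash, e.g. "/web".
            flagged.append(container.get("Name", "").lstrip("/"))
    return flagged


if __name__ == "__main__":
    sample = """[
      {"Name": "/web", "HostConfig": {"RestartPolicy": {"Name": "always", "MaximumRetryCount": 0}}},
      {"Name": "/worker", "HostConfig": {"RestartPolicy": {"Name": "no", "MaximumRetryCount": 0}}}
    ]"""
    print(containers_without_restart_policy(sample))  # ['worker']
```

You can give flagged containers a policy such as `unless-stopped` with `docker update --restart`.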

It should be noted that Carina is still in free beta, and Rackspace advises against running production workloads on it at this time. Nonetheless, suppose you are a service provider and don't want to reboot your customers’ containers. A reboot causes 5-7 minutes of service interruption for the applications running on those containers, many of them business-critical. You can easily avoid that with KernelCare. A single-line installation of KernelCare patches the host kernel that your containers share without a reboot. From then on, it keeps your servers online by automatically applying patches to the running kernel.

Docker adoption has been growing rapidly, and now that containers have taken center stage, kernel issues that affect them will become more common. Patches will need to be applied to resolve them, which makes rebootless kernel updates more important than ever. With hypervisors, only a small subset of host kernel bugs affect the security of virtual machines, because each VM runs its own guest kernel. Containers, however, share the host's kernel, so a kernel bug is far more likely to affect them. This blog post by Vijay Pandurangan, the director of engineering at Twitter, is an interesting read on how one such bug was discovered and fixed by the team at Twitter.
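
The difference comes down to which kernel a workload runs on. A process inside a container reports the host's kernel release, because the container has no kernel of its own. The minimal Python sketch below shows this, and the function name is illustrative. Run it on the host and then inside any container on that host, and the two values should match. A VM would instead report its own guest kernel.

```python
import platform


def running_kernel_release() -> str:
    """Return the release string of the kernel this process runs on.

    Inside a container this is the host's kernel: every container on the
    host reports the same value and is exposed to the same kernel bugs.
    """
    return platform.release()


if __name__ == "__main__":
    print(running_kernel_release())
```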

So, unless you are okay with rebooting all your customers’ containers and interrupting service for your clients, KernelCare is a must. You can try it here. Even our free 30-day trial will update all your kernels instantly.


Comments 5

Guest - Jean-Francois on Tuesday, 19 April 2016 19:12

Rackspace servers are running on a custom-compiled kernel.

Please correct me if I'm wrong, but I believe KernelCare can only patch the kernels provided by supported distributions?
Guest - Inna Gordin on Tuesday, 19 April 2016 21:48

Hi Jean-Francois, thanks for your comment. We work with individual service providers that run custom-compiled kernels, and we make our patches work for the kernels they need. Hope this helps.
Guest - Ivan Grynenko on Friday, 29 April 2016 23:36

Docker has no high availability? Really? When we patch our hypervisors (XenServer), customer VMs just migrate to a new server. KVM and VMware also have the capability to live migrate a VM without interruption. The point about kernel patching is taken, though. Thank you for an excellent article!
WisiKlo WisiKlo on Saturday, 30 April 2016 16:17

Docker doesn't even have a suspend/resume mechanism yet. CRIU should allow that soon, but it isn't there yet, at least not at production quality.
Also, it's not so much about live migrations. Live migrations have costs associated with them, whether central storage or the time they take, especially if data has to be migrated as well. Imagine having to live migrate thousands of servers; it is just not a good solution at scale. Even AWS doesn't offer live migration for its VMs.
Guest - Netz0 on Monday, 02 May 2016 16:42
There is the cgroup freezer, which is similar: https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt
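
The cgroup freezer Netz0 mentions is also what `docker pause` uses on Linux. Below is a minimal Python sketch of driving it directly. It assumes a cgroup-v1 hierarchy with the freezer controller mounted and a caller-supplied group directory. The function name is hypothetical. On cgroup v2, the equivalent interface is a `cgroup.freeze` file instead. Note that freezing only pauses tasks in memory. Unlike a CRIU checkpoint, it does not carry them across a host reboot.

```python
import errno
import time
from pathlib import Path


def set_freezer_state(cgroup_dir: str, state: str,
                      retries: int = 10, delay: float = 0.1) -> bool:
    """Ask the cgroup-v1 freezer to move a cgroup to `state`.

    `cgroup_dir` is assumed to be a directory such as
    /sys/fs/cgroup/freezer/<group>; `state` is "FROZEN" or "THAWED".
    Returns True once freezer.state reads back `state`, or False if it is
    still in transition (e.g. "FREEZING") after `retries` attempts.
    """
    if state not in ("FROZEN", "THAWED"):
        raise ValueError("state must be 'FROZEN' or 'THAWED'")
    state_file = Path(cgroup_dir) / "freezer.state"
    for _ in range(retries):
        try:
            # Re-writing FROZEN retries a freeze that stopped at FREEZING.
            state_file.write_text(state)
        except OSError as exc:
            # Some kernels report a partially completed freeze as EBUSY.
            if exc.errno != errno.EBUSY:
                raise
        if state_file.read_text().strip() == state:
            return True
        time.sleep(delay)
    return False
```

For example, `set_freezer_state("/sys/fs/cgroup/freezer/mygroup", "FROZEN")` would freeze a hypothetical group named `mygroup`. This needs root on a real host.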