While downtime is probably inevitable (even in extreme cases, such as on a spaceship, where it could cost billions of dollars, or in an online trading system, where it could easily cost much more than that), it should happen as rarely as possible. Many factors contribute to that, some of which are outside CloudLinux's control, such as data center reliability. That includes not only obvious things such as an uninterruptible power supply and well-connected networking, but also things such as good air conditioning. The author once saw a server room disaster caused by a broken A/C unit that simply caught fire, turning its plastic parts into huge amounts of black smoke and ash and blowing the result right into the room, thanks to a large, fast fan which, ironically, was still working perfectly. Not an experience one forgets easily, I must say!
So what can CloudLinux do to help increase that famous metric? Two simple things:
- Protect those web sites from each other by wisely distributing the available hardware resources between them.
- Be stable and secure: immune to attacks and exploits, and not prone to crashing.
In theory, one can write correct, bug-free software. In practice, it's just as impossible as flying (wake up, Neo. The Matrix has you). Software stability is the result of an endless battle between developers fixing bugs and those same developers adding more bugs. Well, they like to refer to those as features, but it is a truth universally acknowledged that features and bugs come bundled together. That is why every respectable software development cycle has a phase called “feature freeze”, during which developers add only fixes, not bugs.
Sometimes this phase runs in parallel with development: some developers continue to add more stuff, while others cherry-pick bug fixes from that stream. This is exactly how the -stable branches work in the mainline Linux kernel: after releasing a certain kernel version (say 3.1), the developers keep hacking on the next one (3.2), while people like Greg Kroah-Hartman collect bug fixes and periodically release stable kernels like 3.1.1, 3.1.2 and so on.
Linux vendors then do the same thing, branching their kernels off a specific mainline kernel version and adding more and more bug and security fixes. One vendor that is particularly good at this is Red Hat. For their Enterprise Linux kernels, they usually take a kernel and then marinate it for at least six months of testing and fixing. The result? A kernel that is much more stable than the mainline one.
Thanks to the open source model, CloudLinux stands on the shoulders of Red Hat. We take RHEL6 (Red Hat Enterprise Linux, version 6) kernels and put our stuff on top of them. This improves stability and security. What is more, it lets us concentrate on our real job, providing a good platform for shared hosting, while leaving the complex job of maintaining a stable and secure kernel to Red Hat's excellent kernel team. Improve your servers' stability. Go CloudLinux!
