I have had multiple people asking when we will release a new kernel to fix this vulnerability. I am happy to say that NONE of our kernels are vulnerable. We have tested and verified that they are all safe and not exploitable via that vulnerability.
A new beta version of CageFS is available. It fixes multiple crashes in CageFS-FUSE that were causing spontaneous 500 errors. In addition, it adds full support for LiteSpeed with cPanel and for the InterWorx control panel.
It also includes multiple other bug fixes and improvements:
Issues when sending mail via exim were resolved
Issues related to cPanel CageFS plugin were resolved
Improved detection of LiteSpeed, with automatic configuration of CageFS settings in LiteSpeed
CageFS jail API was improved for better backward compatibility.
“top” is one of the most useful commands in any sysadmin's arsenal. It is an amazing tool that lets you find issues with a server, and by far the best bird's-eye view of a server you can get. Any time someone has a problem with server performance or stability, the first thing I do is run top. While the same information can be gathered by running a multitude of other commands, having it all in one place, constantly refreshing, is very helpful. This guide will walk you through top in a typical shared hosting setting.
Let's take a look:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
47660 mysql      6 -10 1052m 233m 4784 S  2.8  3.0   1:11.31 mysqld
47374 nobody    15   0  305m 141m 2836 S  0.0  1.8   0:03.71 httpd
49478 nobody    15   0  304m 141m 2864 S  0.0  1.8   0:00.34 httpd
49781 nobody    15   0  305m 141m 2320 S  0.0  1.8   0:03.09 httpd
39462 nobody    15   0  305m 140m 2408 S  0.0  1.8   0:07.94 httpd
50142 nobody    18   0  305m 140m 2320 S  1.4  1.8   0:02.84 httpd
Right away we can see that the server was rebooted recently, just over an hour ago (top - 02:12:43 up 1:12), and that the load is subsiding (load average: 3.93, 4.29, 12.40). Everyone knows that a “high” load average is bad and a low one is good. The three values are the load averages over the last 1, 5 and 15 minutes, respectively. The general rule of thumb is that a load average higher than the number of cores on the server is a bad sign; if it is lower, things are OK. Things are a bit more complex on CloudLinux servers due to CPU throttling, but let's continue discussing top instead.
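The load-average rule of thumb above is easy to turn into a quick check. Here is a minimal shell sketch. The function name check_load is ours, and it deliberately ignores the CloudLinux CPU-throttling caveat just mentioned:

```shell
# Rule of thumb: a load average above the number of CPU cores is a warning
# sign. awk does the floating-point comparison that bash cannot do natively.
check_load() {
  local load=$1 cores=$2
  if awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l + 0 > c + 0) }'; then
    echo "overloaded"
  else
    echo "ok"
  fi
}

# Live check: the first field of /proc/loadavg is the 1-minute average,
# and nproc prints the number of available processing units.
if [ -r /proc/loadavg ]; then
  read -r load1 _ < /proc/loadavg
  echo "1-min load $load1 on $(nproc) cores: $(check_load "$load1" "$(nproc)")"
fi
```

For example, check_load 3.93 2 prints "overloaded", because a load of 3.93 exceeds 2 cores. On an 8-core machine the same load is "ok".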
So, once we have looked at the load averages and found them high, we know that something might be going on with the server. More often than not, top can help you pinpoint the issue.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8842 mysql     11 -10 1274m 178m 4076 S  0.0  2.3   0:46.40 mysqld
27379 root      18   0  232m 123m 3768 D  0.0  1.6   0:00.93 httpd
22731 a19a5a1   16   0  184m  53m 3692 D  0.0  0.7   0:01.13 php
23384 a19a5a1   16   0  184m  53m 3900 D  0.0  0.7   0:01.05 php
19443 a1a16ala  18   0  209m  51m 2772 D  0.0  0.7   0:01.52 php
21254 a19a5a1   16   0  184m  51m 3692 D  0.0  0.7   0:01.26 php
24659 a19a5a1   18   0  182m  48m 3612 D  0.0  0.6   0:00.70 php
20674 a1p16ple  16   0  168m  48m 3604 D  0.3  0.6   0:01.57 php
20796 user      16   0  179m  48m 3540 D  0.7  0.6   0:00.80 php
21179 user      18   0  179m  48m 3540 D  0.0  0.6   0:00.70 php
Load averages are high, and you can see 82.2%wa. “wa” stands for IO wait: the percentage of time the CPU spends waiting for IO (usually disks) to respond. Notice also that most processes are in state D (uninterruptible sleep), which usually means they are blocked waiting for disk IO. Let's look at the topmost process in the top results: it is mysqld. This is a pretty easy giveaway that something is going on with MySQL. The processes are sorted by CPU usage, and MySQL is IO intensive. So, if MySQL is actively using CPU and IO wait is high, there is a good chance that the issue is with MySQL.
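To read the wa value without eyeballing the screen, you can run top in batch mode and extract it. This is a rough sketch that assumes the CPU summary line looks like one of the two layouts named in the comments. The io_wait helper is hypothetical and not part of top:

```shell
# Pull the "wa" (IO wait) percentage out of top's CPU summary line.
# Handles the older "82.2%wa" layout and the newer "82.2 wa" one.
io_wait() {
  sed -n 's/.*[ ,]\([0-9.]*\)[% ]wa.*/\1/p'
}

# Live usage: in batch mode the first iteration reports averages since
# boot, so take two iterations one second apart and keep the last one:
#   top -b -n 2 -d 1 | grep 'Cpu(s)' | tail -n 1 | io_wait
```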
Running mysqladmin processlist can help you pinpoint the exact problem. Sometimes it might be a corrupted database, sometimes a bad query... If you train your eye, you can find the cause of the issue within seconds of looking at the results.
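mysqladmin processlist prints a table of server threads, and the Time column shows how many seconds each thread has been in its current state. A small filter can surface the long-running queries. This sketch assumes the default column order (Id, User, Host, db, Command, Time, State, Info). The name slow_threads and the 10-second default are our own choices:

```shell
# Show non-sleeping MySQL threads whose Time column exceeds a threshold,
# reading the "|"-separated table printed by `mysqladmin processlist`.
# awk fields: $2 Id, $3 User, $4 Host, $5 db, $6 Command, $7 Time, ...
slow_threads() {
  local limit=${1:-10}
  awk -F'|' -v limit="$limit" '
    $2 ~ /[0-9]/ && $6 !~ /Sleep/ && $7 + 0 > limit + 0 { print }'
}

# Live usage (credentials are site-specific):
#   mysqladmin -u root -p processlist | slow_threads 30
```

Adding -v (--verbose) to mysqladmin prints the full query text instead of a truncated one, like SHOW FULL PROCESSLIST.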
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23058 root      16   0 10476 1044  752 S  3.3  0.0   0:40.76 /usr/sbin/lveps -t
25203 root      18   0 39136 6000 1568 D  2.0  0.1   0:00.18 /usr/local/cpanel/bin/dcpumon
23048 root      15   0 13412 1716  748 S  1.0  0.0   0:12.96 top -c
IO wait is at 97.3%, and at that level the system would be barely responsive. Pretty much all the memory is taken and ~2.5GB of swap is used. With a trained eye, you would see right away that “something” is using up all the memory. Pressing SHIFT-M sorts processes by memory usage, showing which processes are eating up all the RAM and often giving you enough info on what to do next.
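The same sort is available outside of top, which is handy when the server is too sluggish for an interactive session. Here is a minimal sketch using ps from procps to list the biggest memory consumers. The top_mem name and the column choice are our own:

```shell
# Non-interactive counterpart of pressing SHIFT-M in top: list processes
# with the largest resident memory (RSS, in KiB) first. The optional
# argument is the number of lines to print, header included.
top_mem() {
  ps -eo pid,user,rss,vsz,comm --sort=-rss | head -n "${1:-10}"
}

top_mem 5
```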
Here is a slightly more complex one:
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
950397 user123   16   0  183m  19m 7276 S  7.0  0.5   0:00.08 php
950396 user321   16   0     0    0    0 Z  4.4  0.0   0:00.05 php <defunct>
950312 nobody    15   0  216m 113m 2020 S  2.6  2.9   0:00.25 httpd
946406 nobody    15   0  217m 114m 2032 S  1.8  2.9   0:01.03 httpd
950314 nobody    18   0  217m 114m 1992 S  1.8  2.9   0:00.14 httpd
  7166 root      10  -5     0    0    0 S  0.9  0.0  38:39.15 kondemand/2
  8194 root      10  -5     0    0    0 S  0.9  0.0  67:35.63 kondemand/6
  8195 root      10  -5     0    0    0 S  0.9  0.0  93:49.75 kondemand/7
IO wait is pretty high (45.5%), and swap is in use, but not much: just short of 0.5GB. It might appear that all the RAM is used up as well (3917884k used out of 4023964k total), but that is not the case.
There is about 1.5GB of memory “cached”, and almost 300MB used for buffers. “Cached” memory is memory used to speed up disk IO: the Linux kernel caches file data so that, instead of re-reading the same file from disk, it can take it directly from RAM. If the kernel sees that there is not enough RAM, it frees up caches and buffers. In this case, the kernel decided it was more efficient to move some processes to swap (most likely those that had been sleeping for a long time) than to purge the cache. As long as swap usage doesn't change much over a few minutes, swap is most likely not the reason for the high IO wait.
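You can do the cache arithmetic yourself from /proc/meminfo. This sketch subtracts buffers and page cache from used memory. The app_used_kb name is ours. On newer kernels (3.14 and later) the MemAvailable field in /proc/meminfo gives a better estimate of memory available to applications:

```shell
# Estimate memory used by processes, excluding kernel buffers and page
# cache, from /proc/meminfo-style input (all values in kB).
# "^Cached:" is anchored so that the SwapCached line is not picked up.
app_used_kb() {
  awk '/^MemTotal:/ { t = $2 } /^MemFree:/ { f = $2 }
       /^Buffers:/  { b = $2 } /^Cached:/  { c = $2 }
       END { print t - f - b - c }'
}

if [ -r /proc/meminfo ]; then
  app_used_kb < /proc/meminfo
fi
```

As an illustration, take MemTotal 4023964 kB and MemFree 106080 kB (the text's 3917884k used). Add Buffers 300000 kB and Cached 1500000 kB, which are assumed round figures rather than this server's real ones. The function then prints 2117884, so only about 2GB is actually held by processes.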
Looking at the topmost processes in the list doesn't bring up anything that is easy to identify as an issue. While IO-bound processes often use a lot of CPU as well, that wasn't the case here. This is where you can use another good tool, iotop (it can be installed via yum install iotop), which shows you the processes using IO. In this particular case it was rsync causing the high IO wait:
Total DISK READ: 2.89 M/s | Total DISK WRITE: 212.68 K/s
951134 be/3 root 6.93 M/s 0.00 B/s 0.00 % 99.99 % rsync -rlptD --exclude=*/proc/* --delete /v~ /backup/cpbackup/daily/dirs/_var_lib_mysql_
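When iotop's batch output gets long, a small filter can pick out the top IO consumer. This sketch assumes the default batch-mode column layout shown above. busiest_io is a hypothetical helper, and column layouts can differ between iotop versions:

```shell
# Pick the busiest process (highest IO%) out of `iotop -b` output.
# Assumed columns: TID PRIO USER DISK-READ unit DISK-WRITE unit SWAPIN % IO % COMMAND
# so awk field 10 is the IO percentage; header and total lines are skipped.
busiest_io() {
  awk '$9 == "%" && $11 == "%"' | sort -k10,10 -nr | head -n 1
}

# Live usage (iotop needs root): -o shows only processes doing IO,
# -b is batch (non-interactive) mode, -n 3 stops after three samples.
#   iotop -o -b -n 3 | busiest_io
```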
Here are two more examples where the issue at hand is not as clear-cut, yet top still gives enough info to point you in the right direction. The first one is from my previous blog post:
si, or software interrupts, were the issue. Once again, this helps rule out quite a few scenarios and shows what to “google” for. In this case, the USB modules were the issue.
I first saw this issue on an AWS “micro” instance (the top header is taken from a forum post on Amazon's site, as I couldn't find my own), but I have also seen it recently on a customer's server running Xen.
The key here is the CPU time “stolen” (st):
st - Steal Time. The amount of CPU ’stolen’ from this virtual machine by the hypervisor for other tasks (such as running another virtual machine) – a fairly recent addition to the top command, introduced with the increased virtualization focus in modern operating systems (man top).
In the last case, moving the VM to another server with more resources solved the issue.
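top derives st, si and wa from the counters in /proc/stat, so you can compute them yourself from two samples. This sketch assumes the Linux /proc/stat layout, in which steal is the eighth value on the aggregate cpu line. The cpu_share name is ours:

```shell
# Percentage of CPU time spent in one /proc/stat category between two
# samples of the aggregate "cpu" line. awk indexes ($1 / x[1] is "cpu"):
#   2 user 3 nice 4 system 5 idle 6 iowait 7 irq 8 softirq 9 steal
# Guest time (10, 11) is already counted in user/nice, so it is left out.
cpu_share() {
  local field=$1 before=$2 after=$3
  awk -v f="$field" -v a="$before" -v b="$after" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    printf "%.1f\n", (total > 0) ? 100 * (y[f] - x[f]) / total : 0
  }'
}

# Live usage: sample twice, five seconds apart.
#   s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat)
#   echo "st: $(cpu_share 9 "$s1" "$s2")%  si: $(cpu_share 8 "$s1" "$s2")%"
```

Index 8 gives si and 6 gives wa, so the same helper covers the other two cases discussed above.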
As you can see, top is quite useful if you cross-reference all the data it presents.
I am happy to introduce the ability to run the 2.6.32 kernel from CloudLinux 6 on CloudLinux 5 servers.
The 2.6.32 kernel is a major improvement, representing more than 5 years of development. It is generally faster and more stable than the 2.6.18 kernel shipped with CloudLinux 5.
To install the kernel on your CloudLinux 5 server, please run:
# yum update rhn-setup --enablerepo=cloudlinux-updates-testing
# /usr/sbin/normal-to-hybrid
# reboot
To convert back to a regular CloudLinux 5 server:
# /usr/sbin/hybrid-to-normal
# reboot
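After the reboot it is worth confirming which kernel actually booted. This is a trivial sketch based on uname -r. The version prefixes are the ones named above, while the kernel_line helper and its messages are our own:

```shell
# Report which kernel line is running: the hybrid setup should show a
# 2.6.32 kernel, a regular CloudLinux 5 server a 2.6.18 one.
# An explicit release string can be passed in for testing.
kernel_line() {
  local release=${1:-$(uname -r)}
  case "$release" in
    2.6.32*) echo "hybrid (2.6.32)" ;;
    2.6.18*) echo "CloudLinux 5 (2.6.18)" ;;
    *)       echo "other: $release" ;;
  esac
}

kernel_line
```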
The hybrid kernel doesn't correctly recognize xvda devices (it uses sda instead) on Xen PV installations, causing a kernel panic on boot.
There is an issue on systems with CageFS 2 installed; please refrain from installing on such servers until the second beta.
The CageFS 3 beta is not yet available for this kernel (it should become available next week).
Load averages are calculated the same way as in CentOS. This means that if one site is being limited, it will cause the load average to increase, even if there is no increase in actual CPU usage and the server is not overloaded.
Memory limits are enabled by default: there is no ubc setting, and you cannot run lvectl ubc disable. To disable memory limits, set them to 0.
There is no separate kernel-xen package; the regular kernel package will be installed instead. It satisfies the Xen requirements and supports paravirtualization.
For about 15 years now, in addition to traditional shared and dedicated hosting, service providers have been offering VPS hosting. VPS stands for Virtual Private Server; another acronym used for this is VDS (Virtual Dedicated Server). A VPS is something in between shared and dedicated hosting, closer to the 'dedicated' end of the range.
A VPS is functionally equivalent to a separate physical server: it is tuned to the individual customer's needs, has the privacy of a separate physical server, and so on. The trick is that multiple VPSes (tens or even hundreds of them) reside on a single real server, sharing its hardware resources. This reduces the need for hardware, rack space and electricity, thus lowering the price for the customer.
Technically, VPS functionality is implemented as an additional layer between the hardware and the software. Usually, a VPS is either a virtual machine (as in VMware, Xen, KVM) or a container (as in OpenVZ, Solaris Zones or LXC). Virtual machines use hypervisor technology, and every VM-based VPS runs a full software stack, including the OS kernel and a set of drivers for (virtual) hardware, the very same way your usual server does.
Containers are a bit different and more lightweight: only a single OS kernel is running, and VPSes run only userspace software (i.e. no kernels, drivers etc. of their own). Now, if you have read up to this point and are not asleep, you deserve a small and nice reward: go get yourself a cup of coffee or whatever your favourite drink is. Got it? Let's continue.
Both containers and virtual machines have some overhead: this is the price one has to pay for splitting a piece of real hardware into multiple smaller pieces of virtual hardware. This partitioning is done via isolation, and the isolation overhead is much lower for containers than for VMs, but it's still there.
The other thing worth mentioning besides the isolation overhead is the importance of fair resource management. Since multiple VPSes share the same set of hardware resources (CPU, RAM, disk, networking), care has to be taken to divide these resources in a fair manner, so that no single VPS abuses a resource, say, by eating up all the disk space or saturating the disk I/O channel.
Now, let's consider the ideal VPS solution, for which the isolation is bullet-proof (which is almost true for VMs) and doesn't cost us more than a penny (which is almost true for containers), plus the resource management is just perfect, all ruled by none other than King Solomon in all his great wisdom. Even with that ideal VPS solution in place, there is trouble around the corner.
First of all, every VPS runs (at least) the complete set of userspace software, which needs to be maintained. That is, unlike in shared hosting, you have to take care of software updates (if you care about security, for instance). You need to configure auxiliary components like the system logger or cron daemon. Basically, you need to be a full-scale sysadmin, or hire one, which is not cheap at all. It could be useful if you are a sysadmin, since you can recompile PHP with your favorite options and patches (which is great fun!), but if you're not... then the need to manage your OS yourself is more of a burden, if not a nightmare.
Don't forget that every VPS needs a distinct public IP address. I hear you say "It's not my problem!", and I tend to agree: your HSP should take care of this. Still, considering the growing IPv4 address shortage and the less-than-warm acceptance of IPv6, this might be your problem, too.
Overall, most of the problems that you have with dedicated hosting also apply to VPS hosting. Dedicated hosting, though, is used by bigger web sites, which have a proper sysadmin and other resources, and the knowledge needed to manage operating system instances. A VPS owner is usually not as rich or as big, but she faces the same set of problems as a dedicated server owner.
So, which way to go if you want VPS benefits (like proper isolation and resources controlled by a Solomon) but don't like its maintenance burden? Still don't know the answer? Oh, come on, it's pretty obvious! Go CLOUDLINUX! Amen.
The second beta of CageFS 3.0 was released. This version:
Adds automatic detection of the LiteSpeed web server (stand-alone only)
Fixes caching bug in CageFS-FUSE etcfs
Improves stability and performance of CageFS-FUSE
Fixes various bugs related to mounting/unmounting of filesystems
Corrects handling of umask
To update:
# yum update cagefs cagefs-fuse lve liblve liblve-devel lve-wrappers --enablerepo=cloudlinux-updates-testing
* LiteSpeed + cPanel, and LiteSpeed using Apache's httpd config file, are currently not supported. We are working to add support for them.
It is finally here! CageFS 3.0 is ready for adventurous souls who want to put the latest and greatest on their servers. Right now it is only available for CloudLinux 5.x; CloudLinux 6.x support will be coming soon. Here are some of the highlights of the new version:
Better namespace handling, requiring only a fraction of the mount points compared to CageFS 2.x
CageFS-FUSE provides virtualized /etc & /var/log -- increasing security, and decreasing complexity of maintaining CageFS
Caged directories are no longer visible in /proc/mounts, solving all related issues with cPanel
Management plugins for cPanel and Plesk will be installed automatically (other control panels coming soon)
Automatic detection of cPanel and Plesk (other control panels coming soon), with automatic configuration to adjust for the running system
Improved command line tools
The current version should work with LiteSpeed and/or with custom control panels, with some additional configuration.
More information on CageFS, how to install it or update from previous version can be found here:
We have released multiple packages to CloudLinux Production repositories:
lve-utils 0.6-10
Added lvectl destroy <id> -- for lve0.8.44 and later kernels
Disabled error message regarding UBC for CloudLinux 6
lve-stats 0.8-2
Added separate config file for lveinfo and lvechart to hold read only database access
Fixed an issue on CloudLinux 6 where CPU usage was shown higher than the NCPU limit allows
Added style option to lveinfo and lvechart to normalize CPU if style is set to user
mod_hostinglimits 0.8-2 (RPM only, for Plesk, InterWorx, ISP Manager users, not for cPanel/DirectAdmin)
Changed error messages to provide URL to full description of the error
Added LVERetryAfter - send a Retry-After header
Added LVEParseMode - there are three modes of operation (CONF, PATH, OWNER)
Added LVEPathRegexp - regular expression for extracting the username from the path (PATH mode)
Additionally, we moved the LVE Manager plugins for Plesk, DirectAdmin, InterWorx and ISP Manager into production:
plesk-lvemanager 0.1-7
interworx-lvemanager 0.1-3
da-lvemanager 0.1-6
isp-lvemanager 0.1-7
All the packages should be available via regular yum update
Just in time for the new year, we have a new version of lve-stats. It brings a few enhancements and bug fixes. We have added admin and user styles for charting and lveinfo output. The new user style makes sure that users see CPU charts and output normalized relative to 100% usage.
We also improved our integration with MySQL and PostgreSQL. You can now create /etc/sysconfig/lvestats.readonly and provide a user that has read-only (SELECT) access to the history table in the lvestats database. That file will be used by lveinfo and lvechart to generate end-user statistics.
This way you can make /etc/sysconfig/lvestats readable only by root, preventing users from inserting or deleting any info in the lvestats database.
Please note that this issue doesn't exist with SQLite, which is used by default.
Additionally, we fixed CPU reporting on CloudLinux 6. Previously, lve-stats could record CPU usage higher than the NCPU limit makes possible.
We are pretty much done with 2011. There are just a few days left. It was a good year. We got to 6000+ servers running CloudLinux (a 4x increase since 2010), our revenues are up more than 1,000%, our team doubled in size and we gained hundreds and hundreds of new customers.
We achieved quite a few of the goals that we had set:
Signed up cPanel and Parallels as distributors making sure that their software is well integrated with CloudLinux
Released CloudLinux 6.x
Implemented memory limits
Drastically improved stability and speed of our repositories by making them fully redundant and geographically distributed
Released plugins for cPanel, Plesk, DirectAdmin, ISPmanager and InterWorx
I am very thankful to our customers and partners for making it possible. It is a pleasure working with you, and we appreciate your support. We strive to make sure that your systems are stable and secure – and we will continue working toward this goal. Next year we will:
Provide an ability to use CloudLinux 6 kernel on CloudLinux 5 servers
Release production versions of CageFS and make it the best way to secure a shared hosting server
Release stable version of MySQL governor
Improve logging to let you pinpoint the actual issue, giving your customers more visibility into what went wrong.
Introduce IO limits, CPU weights, physical memory limits, and limit for a number of processes
Provide better integration with control panels, including email notifications and different limits per plan
Centralized interface to see data about resource usage for all your servers
And hopefully, some other features that you will demand from us as we move forward.
Please accept my warmest greetings. I hope you will have a terrific New Year.
Knowing your site's audience is of paramount importance for every webmaster. There are lots of tools to aid in gaining that precious information, from simple ones like to amazingly pretty Web 2.0 creatures like . But sometimes all you need is a UNIX shell and some knowledge of tools like awk, grep and sed.
In most installations Apache uses the so-called "Combined" log format, which is good enough to contain most of the needed info. On most Linux distributions Apache log files are stored in /var/log/httpd/, and the one we are interested in is called access_log. The "Combined" log format is defined in the main Apache configuration file, httpd.conf, in the following way:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Looks a bit scary, but in fact it's not that awful. If you check the Apache documentation you will find all the gory details (there are also some docs on itself). What's important now is that every log line consists of 9 fields. Some of them (the timestamp, the quoted request line and the User-Agent) contain spaces themselves, so awk, which splits on spaces, sees more than 9 fields. Keep this in mind when counting field numbers. To get an idea of what it looks like, let's take a look at the actual file:
We are interested in a few fields only. Say, the first item (10.20.30.40) is the IP address of the client fetching the page. The item in square brackets is a timestamp. After the word GET there's the name of the file the user wants. The number 200 is the HTTP return code, which means 'OK' in this case (404, say, means 'Document not found', etc.). The URL in quotes is the so-called referer: this is where the user (the user's browser, actually) comes from, and it can be either your own site or some external site like www.google.com. Finally, the last field is the User-Agent, i.e. a browser identification string, which happens to be Firefox 8 here (in a couple of years it's gonna be Firefox 18, or even Firefox 24; well, time flies).
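Since awk splits on runs of whitespace, the field numbers it assigns are easiest to see on a sample: the IP is $1, the requested file is $7 and the status code is $9. The line here is made up for illustration and shaped like the combined format described above:

```shell
# A made-up line in the "combined" format (hypothetical data, for illustration).
line='10.20.30.40 - - [21/Nov/2011:10:15:32 +0200] "GET /index.html HTTP/1.1" 200 5120 "http://www.google.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0"'

# Print the first 12 fields exactly as awk numbers them: quotes and square
# brackets are not special to awk, only runs of spaces separate fields.
echo "$line" | awk '{ for (i = 1; i <= 12; i++) printf "$%d = %s\n", i, $i }'
```

Among other things, this shows why the timestamp takes two fields ($4 and $5) and why the request line starts with a stray quote in $6.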
So, we have a huge file filled with lines like the one above. What can we get from it (besides eye floaters)? All sorts of things you never ever knew you could dig from a common log! Let's start crunching! First of all, let's assign a shell variable called LOG a value pointing to your log file, so we don't have to type its name a good hundred times:
Code
$ LOG=/var/log/httpd/access_log
Please note that the '$' at the beginning of a line is the shell prompt. It marks input lines, i.e. you need to type in everything that comes after the '$'; the '$' itself is printed by the shell. Lines without a '$' at the beginning are output. Now, how big is the file? How many lines (let's call them records) do we have?
Code
$ wc -l < $LOG
5217775
Pretty big: more than 5 million lines! And how far back does it go?
Code
$ head -1 $LOG | awk '{print $4, $5}'
Here "head -1" means "give us the first line of the file", and the awk statement means "print the fourth and fifth fields of the line", which contain the timestamp. Using similar awk statements you can get the list of IP addresses of all users who tried to access your website. I'm sure you do not want to view all five million addresses flooding the screen (remember the terminal from the Matrix movie? Oh, good old days!).
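The same idea extends to the end of the file, giving the whole time span the log covers. In this sketch, log_span is a hypothetical helper, and it assumes the combined format, where the timestamp lands in awk fields 4 and 5:

```shell
# Print the time span covered by an access log: the timestamps of the
# first and last records (awk fields 4 and 5 in the combined format).
log_span() {
  local log=$1
  printf 'first: %s\n' "$(head -n 1 "$log" | awk '{print $4, $5}')"
  printf 'last:  %s\n' "$(tail -n 1 "$log" | awk '{print $4, $5}')"
}

LOG=${LOG:-/var/log/httpd/access_log}
if [ -r "$LOG" ]; then
  log_span "$LOG"
fi
```

Keep in mind that with log rotation, the first record only goes back to the last rotation.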
I guess you would like to see only the unique IP addresses (and maybe even the top ten of them; it's nice to know your adoring fans!):
Code
$ awk '{print $1}' $LOG | sort | uniq
We had to sort the list of addresses from awk first, because uniq only omits repeated lines that are adjacent. Still, the list is way too long (for curious readers who want to know exactly how long, just add "| wc -l" to the end of the above command; the number is huge, somewhere around 1 million lines). We can sort it once again to view only the top 10 customers (their IP addresses, actually):
Code
$ awk '{print $1}' $LOG | sort | uniq -c | sort -nr | head
Here we add the -c (count) option to the uniq command so it also outputs the number of repeated lines. Then the second sort command ("sort -nr") does a numerical (rather than alphabetical) sort in reverse order (bigger numbers first). This gives us a list of IPs and their frequencies. Finally, the head command limits the output to the first 10 lines (10 is the default; if you want the top 25, simply use "head -25"). You can run the same kind of awk query to get the top 10 most popular files (pages) of your web site. The file name is the seventh field, so:
Code
$ awk '{print $7}' $LOG | sort | uniq -c | sort -nr | head
Often the most popular file is /favicon.ico, the small web icon you usually see in a browser tab. The next thing we want is to limit the statistics to the last month only (who cares what was going on on your web site in 1812!). grep helps a lot here! By adding a grep command we keep only the lines that contain the string "/Nov/2011":
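The text doesn't show the exact command, but one way to combine the month filter with the file ranking is the pipeline below. In this sketch, top_files_in_month is a hypothetical helper, and the month string follows the /Nov/2011 pattern above:

```shell
# Rank the most requested files, counting only records that contain the
# given month string (e.g. "/Nov/2011"). Arguments: month, log file, and
# optionally how many lines to show (default 10).
top_files_in_month() {
  local month=$1 log=$2 n=${3:-10}
  grep -F "$month" "$log" | awk '{ print $7 }' | sort | uniq -c | sort -nr | head -n "$n"
}

# Usage: top_files_in_month "/Nov/2011" "$LOG"
```

Because grep matches the string anywhere in the line, a request URL that happens to contain /Nov/2011 would also be counted. Appending the colon that follows the year in the timestamp, as in "/Nov/2011:", narrows the match.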
Sometimes it's important to know the HTTP return codes. For most sites they are mostly 200 (OK) and 404 (not found). There are other return codes too, and it helps to know them when investigating web server problems:
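Here is a sketch of such a status-code tally, assuming the combined format, where awk sees the status code as field 9. The second helper lists which URLs returned a given code, which is handy for chasing 404s. Both function names are ours:

```shell
# Tally HTTP status codes (awk field 9 in the combined format).
status_codes() {
  awk '{ print $9 }' "$1" | sort | uniq -c | sort -nr
}

# Most frequently requested URLs (field 7) that returned a given code.
urls_with_status() {
  awk -v s="$2" '$9 == s { print $7 }' "$1" | sort | uniq -c | sort -nr | head -n "${3:-10}"
}

# Usage:
#   status_codes "$LOG"
#   urls_with_status "$LOG" 404
```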
You can squeeze many other interesting tidbits out of your logs if you know a bit of that awk-grep-sort-uniq kung fu. Actually, all of the above was pretty basic, just to show you how easy and simple it is; much more sophisticated queries can be performed. Finally, feel free to share your own Apache log parsing recipes in the comments.
It was a regular Monday morning, busy as usual. Then one client popped up with a very interesting problem. His server had recently become unresponsive and had a very high load, and he was wondering why CloudLinux wasn't stopping the issue.
I quickly logged into the server (the unresponsiveness became obvious right away) and ran top. I start with top pretty much every time someone has an “overload” issue with a server. Running top was the right thing to do this time as well, as it gave me an idea of where to look next. si was at 70%: something was wrong, really wrong.
si stands for the % of CPU time used to handle software interrupts. On most servers you would rarely see si using more than 2 to 4% of CPU. A software interrupt is an asynchronous signal that needs to be handled by some code. Software interrupts are normal and happen all the time: for example, on each timer tick, or when the network card receives a packet of data that needs software to process it.
My next step was to see which software interrupts are the most frequent on this system, and might be causing the issue.
# cat /proc/interrupts
CPU0 CPU1
0: 1566845520 60143 IO-APIC-edge timer
1: 1 2 IO-APIC-edge i8042
8: 0 1 IO-APIC-edge rtc
9: 0 0 IO-APIC-level acpi
12: 1 3 IO-APIC-edge i8042
50: 226 0 PCI-MSI hda_intel
169: 0 0 IO-APIC-level uhci_hcd:usb5
209: 0 0 IO-APIC-level uhci_hcd:usb4
217: 0 0 IO-APIC-level ehci_hcd:usb1, uhci_hcd:usb2
225: 13475 111221010 IO-APIC-level uhci_hcd:usb3, ata_piix
233: 62 327366496 PCI-MSI eth0
NMI: 493362 580033
LOC: 1566919097 1566920751
RES: 27519611 15339092
ERR: 0
MIS: 0
The first column is the IRQ (interrupt request) number, and the CPU0 and CPU1 columns show how many times the interrupt was handled by each CPU. The next column is the type of the interrupt, which is not important in this case, and the last column lists the drivers (modules) listening for the interrupt.
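Raw /proc/interrupts counters are totals since boot, so what matters is how fast they grow. This sketch diffs two snapshots and ranks IRQs by the number of new interrupts. irq_rates is a hypothetical helper, and it assumes the header line has one CPUn column per processor, as above:

```shell
# Compare two snapshots of /proc/interrupts and print, per IRQ line, how
# many interrupts arrived in between (summed over all CPUs), busiest first.
irq_rates() {
  awk '
    FNR == 1 { ncpu = NF; next }              # header line: CPU0 CPU1 ...
    {
      sum = 0
      for (i = 2; i <= ncpu + 1 && i <= NF; i++) sum += $i
      if (FNR == NR) { old[$1] = sum; next }  # first file: remember totals
      desc = ""
      for (i = ncpu + 2; i <= NF; i++) desc = desc " " $i
      printf "%s %.0f%s\n", $1, sum - old[$1], desc
    }' "$1" "$2" | sort -k2,2 -nr
}

# Live usage:
#   cat /proc/interrupts > /tmp/irq.1; sleep 10; cat /proc/interrupts > /tmp/irq.2
#   irq_rates /tmp/irq.1 /tmp/irq.2 | head
```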
I knew that timer could be ignored; it increments on each clock tick.
LOC stands for local timer interrupts and can be ignored as well.
RES stands for rescheduling interrupts, and it looked fine.
The other two IRQ numbers that were very active were 225 and 233.
225 was used by uhci_hcd:usb3, ata_piix while 233 was used by eth0.
This is a web server, so I expected lots of network traffic, and high IRQ activity for eth0 was normal.
IRQ 225 didn't look as good. uhci_hcd is used for USB, and ata_piix is the driver for your standard ATA hard disk. Disk activity (based on iotop and iostat output) wasn't that high, but the counter was increasing very fast. Could it be an interrupt storm caused by some conflict between the two devices?
Well, USB is not needed on a web server, so it was easy to test.
# rmmod uhci_hcd ohci_hcd ehci_hcd
unloaded the USB-related modules, and now:
# cat /proc/interrupts
CPU0 CPU1
0: 1567154926 60143 IO-APIC-edge timer
1: 1 2 IO-APIC-edge i8042
8: 0 1 IO-APIC-edge rtc
9: 0 0 IO-APIC-level acpi
12: 1 3 IO-APIC-edge i8042
50: 226 0 PCI-MSI hda_intel
225: 13475 111239531 IO-APIC-level ata_piix
233: 62 327404778 PCI-MSI eth0
NMI: 493550 580292
LOC: 1567228505 1567230167
RES: 27526034 15340183
ERR: 0
MIS: 0
USB was no more. The system became responsive, si dropped to 3%, and the load average dropped as well. It was a conflict between USB and ATA. I added the USB modules to the blacklist so they wouldn't be loaded after a reboot. That was done by adding the following lines to /etc/modprobe.d/blacklist.conf:
# disable usb
blacklist uhci_hcd
blacklist ohci_hcd
blacklist ehci_hcd
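Blacklisting only stops modprobe from auto-loading these modules through their aliases. It does nothing for drivers built into the kernel, and a blacklisted module can still be loaded explicitly. So it is worth confirming after the next reboot that the modules really stayed out. Here is a small sketch that checks lsmod output; the helper name is ours:

```shell
# Check lsmod-style text on stdin for the USB host-controller modules.
# Prints any that are loaded and succeeds only if at least one was found.
usb_hcd_loaded() {
  awk '$1 ~ /^(uhci|ohci|ehci)_hcd$/ { print $1; found = 1 } END { exit !found }'
}

# Usage after a reboot:
#   lsmod | usb_hcd_loaded && echo "USB host-controller modules are still loaded"
```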
While the situation is highly unusual, and probably a sign of faulty hardware or BIOS, it raised an interesting question: should USB be disabled on servers? For most web servers USB is not in use* anyway. Is there harm in having those modules loaded? First of all, they take up a little memory. Worse, on some motherboards (like in this example) they might share an IRQ number with another device, and that is bad: each time such an interrupt happens, both interrupt handlers wake up and have to check which device the interrupt is for. That wastes CPU cycles. It might not be as bad as in this case (this one was caused by a hardware issue), but it is still a waste.
It makes a lot of sense to disable any hardware not in use – it might give some extra breathing space for the server.
* If you use a KVM (keyboard/video/mouse) switch, it might use USB for console access; disabling USB is not recommended in this case.
One of the features of CloudLinux we are targeting and caring for is stability. We understand that our customers' profits greatly depend on the quality of service (QoS) they can provide. Users expect their web sites to be up and running no matter what. Every single minute of site unavailability increases the site owner's frustration, which eventually turns into bad reviews and customers running off while screaming loudly. Yes, you should worry now, because they run away from you to that other hosting service provider, and they are taking their money with them!
While downtime is probably inevitable (even in extreme cases, such as on a space ship, where it could cost billions of dollars, or in an online trading system, where it could easily cost much more than that), it should happen as rarely as possible. There are many factors to that, some of which are out of CloudLinux's control, such as data center reliability. That includes not only obvious things such as uninterruptible power supply and well-connected networking, but also things such as good air conditioning. The author once saw a server room disaster caused by a broken A/C unit which caught fire, transforming its plastic parts into gross amounts of black smoke and ash and blowing the resulting products right into the room, thanks to a huge and fast fan which, ironically, was still working perfectly. Not an experience one forgets easily, I must say!
So what can CloudLinux do to help increase that famous metric? Two simple things:
Protect those web sites from each other, by wisely distributing available hardware resources between them.
Be stable and secure, immune to attacks and exploits, do not crash.
The first item looks pretty complicated, and it is actually the very core of CloudLinux technology. It deserves a few blog entries, or maybe even a book (not a thin one at all!). What I'd like to discuss now is the second item (stability, security and no exploits).
In theory, one can write correct and bug-free software. In practice, it's just as impossible as flying (wake up, Neo. The Matrix has you). Software stability is the result of an endless battle between developers fixing bugs and the same developers adding more bugs. Well, they like to refer to those as , but it is a truth universally acknowledged that features and bugs come bundled together. That is why every respectable software development cycle has a phase called “feature freeze”, during which developers only add fixes but not bugs.
Sometimes this phase runs in parallel with development: some developers continue to add more stuff, while others cherry-pick bug fixes from that stream. This is exactly how the -stable branches work in the mainline Linux kernel: after releasing a certain kernel version (say 3.1), developers keep working on the next one (3.2), while people like Greg Kroah-Hartman collect bug fixes and periodically release stable kernels like 3.1.1, 3.1.2 and so on.
Linux vendors then do the same thing, branching their kernels off a specific mainline kernel version and adding more and more bug and security fixes. One vendor that is particularly good at this is Red Hat. With their Enterprise Linux kernels, they usually take a kernel and then marinate it for at least six months, testing and fixing. The result? A kernel that is much more stable than the mainline one.
Thanks to the open source model, CloudLinux stands on the shoulders of Red Hat. We take RHEL6 (Red Hat Enterprise Linux, version 6) kernels and put our stuff on top of them. This is a way to improve stability and security. What's more, it lets us concentrate on our real job, providing a good platform for shared hosting, while leaving the complex job of maintaining a stable and secure kernel to Red Hat's excellent kernel team. Improve your servers' stability. Go CloudLinux!
It's no secret nowadays that every server has at least one IP address (which is also tied to a domain name). Simply put, to access a site you need to know either its domain name or its IP address.
A web server is identified by an IP address and a port number (which defaults to 80 for http and 443 for https). But what if we want to host more than one web site? Ages ago that meant having multiple IP addresses on a server, one per web site, or using non-standard port numbers (like …). Neither solution is a bargain: IP address space is limited, and a non-standard port number means users have to memorize it.
The glorious solution was described in version 1.1 of the HTTP protocol standard, and it is pretty simple. A web browser must add a “Host:” header to every HTTP request, with its value set to the hostname part of the address the user typed in the location field. Now, when you browse a site, the request containing the “Host:” header is sent to that site's IP address. In the end, the same IP can serve other sites too, maybe hundreds of them. Yes, this is how shared web hosting works. There is usually one web server listening on one IP/port combination, which reads the “Host:” header and serves content based on its value.
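To make the idea concrete, here is a minimal bash sketch of what a name-based virtual host lookup does with the “Host:” value. The hostnames, paths, the `VHOSTS` table and the `docroot_for_host` function are all hypothetical; a real web server such as Apache does this through its own configuration, not through a script like this.

```shell
#!/usr/bin/env bash
# Hypothetical name-based virtual host table (bash 4+ associative array):
# every hostname taken from the "Host:" header maps to its own document root.
declare -A VHOSTS=(
  [example.com]=/home/alice/public_html
  [example.org]=/home/bob/public_html
)
DEFAULT_DOCROOT=/var/www/html   # served when no virtual host matches

# Print the document root for a raw "Host:" header value.
docroot_for_host() {
  local host="${1%%:*}"   # strip an optional ":port" suffix (IPv6 literals not handled)
  host="${host,,}"        # hostnames are case-insensitive, so compare in lowercase
  if [[ -n "$host" && -n "${VHOSTS[$host]+set}" ]]; then
    printf '%s\n' "${VHOSTS[$host]}"
  else
    printf '%s\n' "$DEFAULT_DOCROOT"
  fi
}
```

For example, `docroot_for_host Example.ORG:8080` prints `/home/bob/public_html`. Against a real server you can send the header yourself with `curl -H "Host: example.com" http://SERVER_IP/`, where SERVER_IP is a placeholder for your server's address.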
A small talk-nerdy-to-me part for those who are eager to know all the details. In shared hosting there's one copy of the Apache web server running plus, let's say, PHP installed either as mod_php or as a CGI executable. That one copy of Apache handles all incoming HTTP requests for all the websites sharing the server: receiving, processing, sending back static files (say, an image), running scripts, logging and so on. When Apache runs this way, it runs as a single user, usually www or apache. At the moment, Apache + mod_php is the fastest way of running PHP scripts, as the PHP interpreter is persistent and in-process, leaving the two other options (CGI + SuEXEC and per-user FastCGI) far behind.
All of the above might seem like a brilliant scheme at first glance, but there are a number of drawbacks. Since one HTTP server instance serves multiple sites, high load on one of those sites slows down all the others. And that's the mild scenario; a more realistic one is a DDoS attack on one site that leaves not only that site but all the others dead and unresponsive. This happens because there is no way to control resources (CPU time, disk and network bandwidth, RAM used, etc.) on a per-site basis.
Another problem with shared hosting is security, or rather the lack of it. In a shared environment, different web sites are just different directories, usually with little or no separation between them. All the web site accounts and the Apache user are members of the same group, so the Apache user has read (and sometimes even write) access to all the files of all the websites on the same server. If one site is hacked, the others are most likely vulnerable, too. Worse, an evil hacker can simply become a customer, get a legitimate account on the box, and then upload PHP scripts to access the other websites hosted there (including all the PHP scripts that contain usernames and passwords for all the MySQL databases). Already having a panic attack? Clutching your head and wondering what to do?
Until recently, the simple answer was 'go for dedicated hosting' (which means a separate physical server for every web site). Thanks to modern technologies and the dark powers of electricity, these days those servers can be virtual dedicated ones: multiple virtual machines running on top of one physical server (yes, remember those VPSes you've heard of?). Either way, this looks very much like a trip back to the 1990s (along with NAFTA, the Hubble telescope, MMORPGs and the Red Hot Chili Peppers): a single IP per web server, administration and management nightmares, and so on. Do we REALLY have to choose between good old one-server-per-site hosting and shared hosting?
In CloudLinux, we can have the best of both worlds. It is shared hosting, but with decent resource controls applied to make web sites more independent of each other. In a nutshell, CloudLinux makes the web server processes that handle requests for a specific web site subject to the resource constraints specified for that site. Remember the “Host:” header that tells a web server which site a client wants? Whenever Apache on CloudLinux sees it, it jumps into a so-called LVE (lightweight virtual environment) dedicated to that site. With CloudLinux, you can host multiple sites in the same shared manner but have things about as safe as with dedicated hosting, specifying priorities and resource constraints for each site. If one site gets a huge increase in traffic, it won't affect the others and make them slow and unresponsive (if the system is configured properly). You need to give it a try. You will be amazed. I guarantee it.
The new version logs all processes currently in memory when there is not enough memory. This should provide better information about what is causing the server to overload.
To install:
We could use your help. We'd like you to talk with your server provider, tell them that you are using CloudLinux and that it would be great if they offered CloudLinux OS as a pre-installed option. There are lots of advantages for them, and for you as well. We know that the majority of CloudLinux users are strong advocates for our operating system, so we thought: who better to talk with them than you? We have over 100 data center partners offering CloudLinux pre-installed, but yours may not be one of them. Your datacenter may not even know that you are using CloudLinux, or about the stability benefits it offers. Please tell them.
Thanks, in advance, for your help.
The beta version of LVE Manager for ISPmanager is available. It provides a web UI to manage, monitor and adjust LVE settings for the ISPmanager control panel.
More info can be found here: http://www.cloudlinux.com/docs/ispmanager_lvemanager.php
Two modes to calculate load averages with ability to switch between them
Fix for OOM/hung task issue
IO Priorities
Code
# yum install kernel
If you have a PAE, xen or Enterprise kernel, use the corresponding package name instead of kernel: kernel-PAE, kernel-xen or kernel-ent.
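Picking the right package name can be scripted. The sketch below is a hypothetical helper, `kernel_package_for_release`, that maps the running kernel's release string to a package name. It assumes EL5-style flavour suffixes at the end of `uname -r` (for example `...lve0.8.42PAE` or `...xen`); the `ent` suffix in particular is an assumption.

```shell
#!/usr/bin/env bash
# Map a kernel release string (as printed by `uname -r`) to the yum package
# name for that kernel flavour. Assumes the flavour is a suffix of the release.
kernel_package_for_release() {
  local release="$1"
  case "$release" in
    *PAE) echo kernel-PAE ;;   # e.g. 2.6.18-374.el5.lve0.8.42PAE
    *xen) echo kernel-xen ;;   # e.g. 2.6.18-374.el5.lve0.8.42xen
    *ent) echo kernel-ent ;;   # assumed suffix for the Enterprise flavour
    *)    echo kernel ;;       # plain kernel
  esac
}
```

With it, `yum install "$(kernel_package_for_release "$(uname -r)")"` installs the flavour that matches the running kernel.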
To change NCPU on the fly, use:
Code
# lvectl set LVE_ID --ncpu N --force
The --force option applies the NCPU change on the fly. Please don't use it with kernels prior to lve0.8.42, as it can crash the system.
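Because --force is unsafe on older kernels, a wrapper can check the running kernel first. This is a minimal bash sketch with hypothetical helper names (`version_ge`, `lve_at_least`); it assumes the LVE version appears in `uname -r` as `lveX.Y.Z`, as in the kernel names in these announcements. It compares versions in plain bash rather than with `sort -V`, since older EL5 coreutils may not support that option.

```shell
#!/usr/bin/env bash
# Succeed if dotted numeric version $1 >= $2 (missing parts count as 0).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)          # split on dots using the local IFS
  local i n=$(( ${#a[@]} > ${#b[@]} ? ${#a[@]} : ${#b[@]} ))
  for (( i = 0; i < n; i++ )); do
    local x=${a[i]:-0} y=${b[i]:-0}
    (( 10#$x > 10#$y )) && return 0
    (( 10#$x < 10#$y )) && return 1
  done
  return 0                        # all parts equal
}

# Succeed if kernel release $1 carries an LVE version >= $2.
lve_at_least() {
  local re='lve([0-9]+(\.[0-9]+)*)'
  [[ $1 =~ $re ]] || return 1     # not an LVE kernel at all
  version_ge "${BASH_REMATCH[1]}" "$2"
}
```

Then `lve_at_least "$(uname -r)" 0.8.42 && lvectl set LVE_ID --ncpu N --force` only passes --force when the running kernel is lve0.8.42 or newer (LVE_ID and N as above).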
Load Averages
CloudLinux calculates load averages in a modified way, because processes can wait on CPU due to LVE limits rather than due to a lack of CPU resources. Previously, our LA algorithm ignored uninterruptible processes. You can now switch to another LA algorithm that accounts for uninterruptible processes by running:
Code
sysctl -w kernel.full_loadavg=1
Switching to that mode will cause higher load averages during intervals of high IO activity. This should be useful on cPanel servers that have high IO wait without a high load average.
You can always switch back by running:
Code
sysctl -w kernel.full_loadavg=0
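To see which processes the two modes disagree about, you can tally process states. The hypothetical helper `count_load_states` below counts R (running or runnable) and D (uninterruptible, usually waiting on IO) entries read from `ps` output; with kernel.full_loadavg=1 the D-state processes are the extra ones taken into account. This is an illustration only, not how the kernel itself computes load averages.

```shell
#!/usr/bin/env bash
# Count R (running/runnable) and D (uninterruptible) states from stdin,
# one ps STAT field per line; other states (S, Z, T, ...) are ignored.
count_load_states() {
  local stat running=0 uninterruptible=0
  while read -r stat; do
    case "$stat" in
      R*) running=$((running + 1)) ;;
      D*) uninterruptible=$((uninterruptible + 1)) ;;
    esac
  done
  printf 'running=%d uninterruptible=%d\n' "$running" "$uninterruptible"
}
```

Running `ps -eo stat= | count_load_states` prints the two counts in the form `running=N uninterruptible=M`; a large uninterruptible count during IO-heavy periods is when the two modes diverge the most.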
IO Priorities
While we are still planning to release IO limits by the end of this year, this release introduces IO priorities. Each LVE has an IO priority of 100 by default (the highest possible).
You can lower that priority, causing a particular LVE to be de-prioritized IO-wise. This means that if the server is short on resources, that LVE will get fewer IO operations than other LVEs.
IO priorities work only with the CFQ IO scheduler.
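You can check which scheduler a disk uses by reading /sys/block/<device>/queue/scheduler, where the active one is shown in brackets (for example `noop anticipatory deadline [cfq]`). The helpers below (`active_scheduler`, `uses_cfq`) are a hypothetical bash sketch that parses that line.

```shell
#!/usr/bin/env bash
# Print the bracketed (active) scheduler from a sysfs scheduler line,
# e.g. "noop anticipatory deadline [cfq]" -> "cfq". Fails if none is marked.
active_scheduler() {
  local -a words
  local w
  read -r -a words <<< "$1"       # split without glob expansion of "[cfq]"
  for w in "${words[@]}"; do
    if [[ $w == \[*\] ]]; then
      w=${w#\[}                   # drop the leading "["
      printf '%s\n' "${w%\]}"     # drop the trailing "]"
      return 0
    fi
  done
  return 1
}

# Succeed if the given scheduler line shows CFQ as active.
uses_cfq() {
  [[ "$(active_scheduler "$1")" == cfq ]]
}
```

For example, `uses_cfq "$(cat /sys/block/sda/queue/scheduler)" || echo "sda is not using CFQ"`, where `sda` is just an example device name. Writing a scheduler name into that file as root, e.g. `echo cfq > /sys/block/sda/queue/scheduler`, switches it at runtime.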
New kernel 2.6.18-374.3.1.el5.lve0.8.44 is available from our cloudlinux-updates-testing repository. It includes everything from lve0.8.43 kernel as well as:
Rebased to upstream kernel 028stab094.3 including security and bug fixes RHSA-2011:1212
New kernel 2.6.18-374.el5.lve0.8.43 is available from our cloudlinux-updates-testing repository. It includes everything from the lve0.8.42 kernel, as well as a fix for a bug present since the lve0.8.42 kernel:
Fix for lve_list_next race condition
# yum install kernel-2.6.18-374.el5.lve0.8.43 --enablerepo=cloudlinux-updates-testing
If you have a PAE, xen or Enterprise kernel, use the corresponding package name instead of kernel: kernel-PAE, kernel-xen or kernel-ent.
New kernel 2.6.18-374.el5.lve0.8.42 is available from our cloudlinux-updates-testing repository. The kernel brings changes from the upstream kernel, as well as a number of new features and improvements, including the ability to change the number of cores per LVE without a reboot, and IO priorities. IO limits will be implemented in future versions.
Two modes to calculate load averages with ability to switch between them
Fix for OOM/hung task issue
IO Priorities
fs.proc_super_gid was added to specify a group of users that can see all processes
# yum install kernel-2.6.18-374.el5.lve0.8.42 --enablerepo=cloudlinux-updates-testing
If you have a PAE, xen or Enterprise kernel, use the corresponding package name instead of kernel: kernel-PAE, kernel-xen or kernel-ent.
To change NCPU on the fly, use:
# lvectl set LVE_ID --ncpu N --force
The --force option applies the NCPU change on the fly. Please don't use it with kernels prior to lve0.8.42, as it can crash the system.
Load Averages
CloudLinux calculates load averages in a modified way, because processes can wait on CPU due to LVE limits rather than due to a lack of CPU resources. Previously, our LA algorithm ignored uninterruptible processes. You can now switch to another LA algorithm that accounts for uninterruptible processes by running:
sysctl -w kernel.full_loadavg=1
Switching to that mode will cause higher load averages during intervals of high IO activity. This should be useful on cPanel servers that have high IO wait without a high load average. You can always switch back by running:
sysctl -w kernel.full_loadavg=0
IO Priorities
While we are still planning to release IO limits by the end of this year, this release introduces IO priorities. Each LVE has an IO priority of 100 by default (the highest possible). You can lower that priority, causing a particular LVE to be de-prioritized IO-wise. This means that if the server is short on resources, that LVE will get fewer IO operations than other LVEs. IO priorities work only with the CFQ IO scheduler.
I am happy to announce the release of CloudLinux OS 6.1. The new version comes with a 2.6.32 kernel and updated Apache and PHP packages, and brings us in line with the EL distribution. All features available in CloudLinux OS 5.x should be available in CloudLinux OS 6.1.
New ISO images with CloudLinux OS 6.1 are available, and conversion scripts are compatible with CentOS 6.0 and RHEL 6.1. You can find images and conversion scripts here:
We recently introduced an issue where CPU usage would show up as 100% in lve-stats output and resource usage plugins, even though the account wasn't using that much. The issue only affected the displayed usage; no accounts were wrongly restricted.
lve-stats 0.6-7 was released to fix the issue. To update, please run:
New kernel 2.6.32-231.6.1.lve0.9.8 is available. The kernel fixes the issue with low default shared memory limits inside LVE that was causing issues for eAccelerator.
We are happy to announce CloudLinux 6.1 Beta. This release brings us in line with RHEL 6.x and CentOS 6.x. The standard conversion scripts (centos2cl and cpanel2cl) were upgraded to handle conversion of RHEL/CentOS 6.x servers, and ISO images for CloudLinux 6.1 are available for download at http://www.cloudlinux.com/downloads
CloudLinux 6.1 comes with a new kernel based on 2.6.32. The new kernel is more efficient than the one available in CloudLinux 5.x. The new version will no longer create migration threads, and in most cases, should outperform CloudLinux 5.x
There are a few known issues with CloudLinux 6:
lve-stats doesn't work; it should be fixed within days
CageFS and MySQL governor are not yet available on CloudLinux 6.1 - we plan to introduce compatible versions in a few weeks
Everything else should be fully functional. We are waiting for our upstream kernel (OpenVZ) to be moved to 'stable' from 'beta', and we should announce a stable release of CloudLinux 6.1 soon after. We expect this within the next 30 days.
Please try CloudLinux 6.x in your environments. A license is required, so please feel free to register another account/request another trial license if needed.
I am happy to announce the public beta of CageFS 2.0 (formerly known as SecureLVE). CageFS is compatible with cPanel, as well as the majority of RPM-based control panels. DirectAdmin support is coming soon.
CageFS is a virtualized file system and a set of tools to contain each user in its own 'cage'. Each customer will have its own fully functional CageFS, with all the system files, tools, etc.
The benefits of CageFS are:
Only safe binaries are available to users
Users will not see any other users, and will have no way to detect the presence of other users or their usernames on the server
Users will not be able to see server configuration files, such as Apache config files.
At the same time, the user's environment will be fully functional, and the user should not feel restricted in any way. No adjustments to users' scripts are needed.
CageFS will limit any script execution done via:
Apache (suexec, suPHP, mod_fcgid, mod_fastcgi)
LiteSpeed Web Server
Cron Jobs
SSH
Any other PAM enabled service (requires additional configuration)
Note: mod_php is not supported; MPM ITK requires a custom patch
Compared to SecureLVE, CageFS has the following improvements:
No changes to /etc/passwd file, no longer requires custom shell
Support for any PAM enabled service
Enable All/Disable All modes with white listing
Single binary to control all CageFS operations
cPanel support
Faster & better skeleton update procedures
Prefixes used in /var/cagefs to better scale in environments with large number of customers
Namespaces for better security
Improved skeleton configuration via multiple config files
Automatic mount point file generation
Numerous other bug fixes and performance improvements
The memory limits in CloudLinux are confusing at best. First of all, they count virtual memory allocated by processes instead of physical memory. And virtual memory use can be much higher, as Linux is very efficient at sharing the same physical memory among multiple processes. We plan to add physical memory limits in the future; yet this is not the only issue with memory limits.
No matter whether we limit physical or virtual memory, there will always be some guesswork in determining whether a script error was due to a memory limit or due to permissions, configuration errors or errors in the script itself. Such errors are the primary reason we ship CloudLinux with memory limits disabled by default. Memory limits are useful, and can often save a server from overloading, swapping and going down. Yet they can also cause errors that most sysadmins don't connect to memory limits right away.
When software (such as the PHP interpreter or the mod_fcgid daemon) tries to allocate memory from the system, LVE can prevent that from happening. It does so the same way the OS would when there is not enough memory. Most applications that fail to allocate memory will fail themselves, and it looks pretty much as if they failed due to a bug or some other error. The distinction is very small, and usually comes as part of a cryptic error message and a strange exit code. For websites, such errors usually show up as error 500, which means that the script used to serve the request failed due to some error. In this case it usually means that the PHP interpreter failed (the same way it would fail on a bad PHP script). Basically, PHP or some other component fails, for whatever reason, and error 500 is served. Not much for CL to do here.
Sometimes it gets even worse. Recently a customer complained about mail() not working in a PHP script. It had worked before, but stopped working after CloudLinux was installed. We knew that CloudLinux 'never' does something like that, and were totally baffled. The error was reproducible. Running a PHP script that tried to send email would come back with:
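To see the gap between the two numbers for yourself, compare a process's VmSize (virtual) and VmRSS (resident, physical) fields in /proc/<pid>/status; they correspond to the VIRT and RES columns in top. The `vm_vs_rss` function below is a small hypothetical helper that prints both.

```shell
#!/usr/bin/env bash
# Print virtual (VmSize) and resident (VmRSS) memory of a process, in kB,
# from /proc/<pid>/status. Fails without a PID, for a missing PID, or for
# processes without a user address space (e.g. kernel threads).
vm_vs_rss() {
  [[ -n "$1" ]] || return 2
  awk '/^VmSize:/ { vsz = $2 }
       /^VmRSS:/  { rss = $2 }
       END { if (vsz == "") exit 1
             printf "virtual=%s kB resident=%s kB\n", vsz, rss }' "/proc/$1/status"
}
```

For example, `vm_vs_rss "$(pgrep -o httpd)"` shows both values for the oldest httpd process; the virtual figure is the one the current memory limits count.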
Quote
Warning: mail() [function.mail]: Could not execute mail delivery program '/usr/sbin/sendmail -t -i'
Switching back to the CentOS kernel (which disables LVE) would solve the problem. It took us some time to stumble upon the idea that it might be memory limits. Once we did, it took a minute to verify. There was enough memory to run the PHP interpreter, but not enough for sendmail to run on top of it. Hence sendmail would fail, and PHP would print that message. Increasing the memory limit removed the issue. There is an easy way to figure out whether an issue relates to memory limits. All you need to do is run:
Code
# lveinfo --by-fault=mem --display-username
If the user whose script failed is in the list, some script of that user hit the memory limit within the past 10 minutes. Run the script again, re-check lveinfo (note that it takes 1 minute to update), and you know for sure. The same information can be obtained from /proc/lve/list.
Of course this is not enough, and we plan to do more. We want to create a sophisticated notification system, so that both the admin and the user are notified when memory limits are reached. Additionally, we are researching whether we can detect at run time, at the web server level, when one of the processes used to serve a request hits the memory limit, and whether we can intervene and serve our own error message in such cases. We are still researching it, and if it turns out to be possible, it would be a nice way to remove the confusion.
New kernel 2.6.18-338.19.1.el5.lve0.8.36 is available now. The kernel brings changes from the upstream kernel (OpenVZ 028stab092.2) and improves load average reporting. The kernel has the following bug fixes and enhancements:
Rebase on 238.19.1 rhel5.6 update (security and bug fixes)
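The check-rerun-recheck routine above can be wrapped in a couple of bash functions. This is a hypothetical sketch: `user_hit_mem_limit` and `recheck_mem_fault` are made-up names, and it assumes lveinfo prints whitespace-separated columns with the username as one field, since the exact output layout isn't described here.

```shell
#!/usr/bin/env bash
# Succeed if USER ($1) appears as an exact field in lveinfo-style output
# read from stdin (assumes whitespace-separated columns).
user_hit_mem_limit() {
  awk -v u="$1" '{ for (i = 1; i <= NF; i++) if ($i == u) found = 1 }
                 END { exit !found }'
}

# Re-run the failing command, wait for lveinfo to refresh, then check.
recheck_mem_fault() {
  local user="$1"; shift
  "$@"            # e.g. the PHP script that failed; its own exit status is ignored
  sleep 65        # lveinfo data takes about a minute to update; add a margin
  lveinfo --by-fault=mem --display-username | user_hit_mem_limit "$user"
}
```

For example, `recheck_mem_fault alice php /home/alice/public_html/mail-test.php` (user name and script path are placeholders) succeeds if alice shows up in the memory-fault list after the re-run.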
DRBD is compiled in xen x86_64 kernels
OpenVZ kernel related bugfixes
Improved load averages
In previous kernel versions, load averages didn't include processes in D state (uninterruptible sleep). Those are the processes waiting on IO. This kernel fixes the issue. You can switch back to the old behavior by running:
sysctl -w kernel.full_loadavg=0
or add it to /etc/sysctl.conf and run sysctl -p.
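If you manage several servers, the edit to /etc/sysctl.conf can be made idempotent, so running it twice doesn't leave duplicate lines. `set_sysctl_conf` below is a hypothetical bash helper: it rewrites an existing `key = value` line for the given key, or appends one if none exists.

```shell
#!/usr/bin/env bash
# Set KEY = VALUE in a sysctl.conf-style file (default /etc/sysctl.conf):
# replace the first existing assignment of KEY, drop later duplicates,
# or append the assignment if KEY is absent. Comment lines are left alone.
set_sysctl_conf() {
  local key="$1" value="$2" file="${3:-/etc/sysctl.conf}" tmp rc
  [[ -e "$file" ]] || : > "$file"
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    { line = $0; sub(/^[ \t]+/, "", line) }          # compare without leading blanks
    index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ {
      if (!done) print k " = " v                     # replace the first assignment
      done = 1; next                                 # and drop any later duplicates
    }
    { print }                                        # keep every other line as is
    END { if (!done) print k " = " v }               # key absent: append it
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Running `set_sysctl_conf kernel.full_loadavg 0 && sysctl -p` as root then persists the setting and applies it.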