I have doubled the amount of RAM on the server, going from 1 to 2 GB. This should definitely help with the hangs. I have also added remote management again, using a Real Weasel PCI card. If there is any trouble with the machine now, I should have a fair chance of fixing it from home or wherever I am.
I took some pictures of the current server setup. On top you can see the backup server for DNS/MX, which also hosts the 4-port USB serial adapter used for management.
Saturday, September 27, 2008
Friday, September 26, 2008
OK, so we still have hangs. The blog neppe.no (which uses a lot of memory for PHP) and ZFS are the culprits. I had to undo some "fixes" to get neppe.no working at all. Until neppe.no uses less memory, or the machine gets more memory, we are going to have occasional hangs. When the hangs happen, processes sometimes die. Because of this, I have extended the monitoring on Totem to also check things like the mail queue length, and that Amavis and Postgrey are up and running. This should help keep services running and mail flowing.
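For the curious, a mail queue check of the kind mentioned above could look roughly like this. It is a sketch, not the actual check running on Totem. It assumes Postfix, whose `mailq` prints either "Mail queue is empty" or a summary line such as "-- 12 Kbytes in 3 Requests.", and it uses the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The thresholds `WARN_AT` and `CRIT_AT` are made-up numbers.

```python
import re
import subprocess
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds; tune them to the normal mail traffic on the server.
WARN_AT = 50
CRIT_AT = 200


def queue_length(mailq_output: str) -> Optional[int]:
    """Return the number of queued messages from Postfix-style mailq output,
    or None if the output could not be understood."""
    if "Mail queue is empty" in mailq_output:
        return 0
    # Summary line looks like "-- 12 Kbytes in 3 Requests." (assumed Postfix format).
    match = re.search(r"in (\d+) Requests?\.", mailq_output)
    return int(match.group(1)) if match else None


def check(mailq_output: str) -> Tuple[int, str]:
    """Map the queue length to a Nagios status code and a one-line message."""
    n = queue_length(mailq_output)
    if n is None:
        return UNKNOWN, "MAILQ UNKNOWN - could not parse mailq output"
    if n >= CRIT_AT:
        return CRITICAL, f"MAILQ CRITICAL - {n} messages queued"
    if n >= WARN_AT:
        return WARNING, f"MAILQ WARNING - {n} messages queued"
    return OK, f"MAILQ OK - {n} messages queued"


def main() -> int:
    """Run mailq, print the status line and return the exit code for Nagios."""
    try:
        result = subprocess.run(["mailq"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"MAILQ UNKNOWN - {exc}")
        return UNKNOWN
    status, message = check(result.stdout)
    print(message)
    return status
```

Used as a plugin, the script would end with `sys.exit(main())`. Checking that Amavis and Postgrey are alive would be a separate check, for example one that looks for their processes or tries to connect to the ports they listen on.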
Tuesday, September 23, 2008
It seems the recent server hangs are related to rewrite rules in Apache and to one big PHP page being generated several times in parallel. This makes the server use a lot of swap within a short time, which, from the users' point of view, looks like a hang. I have made some changes to the setup so that it hopefully doesn't happen again.
Sunday, September 14, 2008
Wednesday, September 10, 2008
Over the last few weeks we've had some hangs, lasting up to an hour or two, which I still have not been able to track down. They could be related to the fact that we are now using ZFS. In an attempt to avoid the hangs, I upgraded the host OS (on the physical server) today. Unfortunately, it did not come back up after the reboot, and this happened on a busy day at work, so it took some time before I could look at it and find out what was wrong. According to Nagios, we were down from 12:46 to 20:52. And yes, I do get an SMS message if the server goes down.