Server down now back online, is there a log or something to find the problem?


Posted by chasebug, 07-06-2009, 11:12 PM
So my server went down for about 30 minutes today until it got rebooted. Where can I find logs on my server that can tell me what happened to the server prior to being down?
Posted by Rekhatitus, 07-06-2009, 11:23 PM
Have you checked /var/log/messages?
Posted by alanzkorner, 07-06-2009, 11:46 PM
Hello, first of all, you should monitor the server's load for a while using top, w and similar commands. It is also a good idea to write a small shell script and add it to cron so that, when the load goes above a set limit (say 10), it mails you and restarts Apache. Check /var/log/messages as Rekha mentioned, and check the Apache error log at /usr/local/apache/logs/error_log around the time the server went down; it might give you a clue. Check whether any partition is full using df -h, and check whether there is enough free RAM, or whether it is fully consumed, using free -m. Alan
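The checks Alan lists can be run by hand right after the server comes back. As a minimal sketch, the disk check can also be scripted by parsing `df -P` output. The function name full_partitions and the 90% threshold are illustrative assumptions, not something from the thread:

```shell
#!/bin/bash
# Sketch only: list partitions whose usage is at or above a threshold.
# full_partitions and the default of 90 are hypothetical names/values.

# Reads `df -P` style output on stdin and prints "mountpoint usage%" for
# every filesystem whose capacity column is >= the threshold given as $1.
full_partitions() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "93%" -> "93"
        if (use + 0 >= t) print $6, $5
    }'
}

# Typical use on the server (prints nothing when no partition is that full):
#   df -P | full_partitions 90
#   free -m
```

Mount points that contain spaces would need more careful parsing than this sketch does.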
Posted by Winstyn, 07-07-2009, 12:38 AM
I have a question, though. What parts of your server were down? SSH, Apache, everything? Does your provider offer KVM over IP or IPMI? Whenever the server seems to go down, it is a good idea to check IPMI; if only the network has crashed, IPMI will tell you why. This will give you the last 500 lines in the dmesg log. If you think the server crashed you can do the following: This will return any lines with panic in them, useful for looking for a kernel panic. Also, is this a physical server or a VPS?
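The two commands in the post above did not survive in this archive. One plausible reading, offered only as an assumption, is a `dmesg | tail` for the last 500 lines and a `grep` for "panic". The helper name panic_lines below is hypothetical:

```shell
#!/bin/bash
# Assumed reconstruction, not the poster's original commands.

# Last 500 lines of the kernel ring buffer:
#   dmesg | tail -n 500

# Print every line of a log file that mentions "panic" (case-insensitive),
# prefixed with its line number.
panic_lines() {
    grep -in 'panic' "$1"
}

# Typical use:
#   panic_lines /var/log/messages
#   dmesg | grep -i 'panic'
```

Note that the kernel ring buffer is cleared on reboot, so after a crash the log file on disk is usually the more useful of the two.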
Posted by chasebug, 07-07-2009, 03:44 AM
I was connected to the server with ssh, running the top command. The loads were all very low before getting disconnected and server being unpingable. Do you know of any free scripts that can monitor my httpd/mysql status and email me if they're down? I checked /var/log/messages, there was nothing at around the time server went down. I couldn't ping the server, can't access any sites on it and can't connect via ssh, so I don't know which services were down. This is a physical server.
Posted by khunj, 07-07-2009, 04:36 AM
Try or
Posted by inspiron, 07-07-2009, 05:13 AM
Have you also looked at the Apache error log?
Posted by alanzkorner, 07-07-2009, 05:37 AM
    #!/bin/bash
    #######
    set -x   # debug tracing; comment this out once the script works
    #echo "========================================="
    #echo "Welcome to Load checking Script of Alan"
    #echo "========================================="
    #command=$(w | grep "load average:" | awk '{print $10,$11,$12}')
    # Read the three load averages from the text after "load average:", so the
    # fields do not shift when the uptime part of the line changes length.
    loads=$(w | grep "load average:" | sed 's/.*load average: *//; s/,//g')
    currentavgload=$(echo "$loads" | awk '{print $1}')
    currentavgload1=$(echo "$loads" | awk '{print $2}')
    currentavgload2=$(echo "$loads" | awk '{print $3}')
    # Integer part of the 1-minute load, for the numeric comparison below.
    currentload=$(echo "$currentavgload" | cut -d "." -f 1)
    hostname=$(hostname)
    #echo "Server's Load Average is $command "
    if [ "$currentload" -gt "7" ]; then
        msg="SERVER LOAD IS HIGH on $hostname. The 1-minute, 5-minute and 15-minute load averages are $currentavgload, $currentavgload1 and $currentavgload2."
        echo "$msg"
        echo "$msg" | mail -s "IMPORTANT!! HIGH LOAD on $hostname." alan@servadm.com
    else
        exit 0
    fi
    #if [ "$currentload" -gt "14" ];
    #then
    #/usr/bin/killall -9 httpd
    #else
    #exit 0
    #fi

Copy the contents to a file named load-check.sh and keep it somewhere, say in the /root/scripts folder. This should help you. You should change the email address to yours. Then add it to cron with crontab -e:

    */5 * * * * /bin/bash /root/scripts/load-check.sh

If the load goes above 7 it will mail you; if it goes above 14 it will kill all httpd instances and mail you too. It checks the load every 5 minutes; you can change the timing. First make sure the script runs by hand with bash load-check.sh, and only then add it to cron. Thanks, Alan John
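As an alternative sketch of the same load check, the load averages can be read from /proc/loadavg on Linux, which avoids depending on the column layout of `w`. The function name load_exceeds, the optional file argument and the mail address in the usage comment are illustrative assumptions, not part of Alan's script:

```shell
#!/bin/bash
# Sketch only: compare the integer part of the 1-minute load average
# with a threshold. load_exceeds is a hypothetical helper name.

# $1 = threshold (integer), $2 = file to read (defaults to /proc/loadavg).
# Returns 0 (true) when the 1-minute load is greater than the threshold,
# 1 when it is not, and 2 when the file cannot be read.
load_exceeds() {
    local threshold="$1" file="${2:-/proc/loadavg}"
    local one five fifteen rest
    read -r one five fifteen rest < "$file" || return 2
    [ "${one%%.*}" -gt "$threshold" ]
}

# Typical use from cron (the address is a placeholder):
#   load_exceeds 7 && uptime | mail -s "High load on $(hostname)" you@example.com
```

Like the script above, this compares whole numbers only, so a load of 7.9 does not count as above 7.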
Posted by alanzkorner, 07-07-2009, 05:42 AM
Hello, You will also have to copy the script to /bin. One correction: to kill httpd when the load goes above 14, you have to remove the # at the start of each line in the last part of the script. You should then also have a script that checks whether httpd is running and starts it if it is not, or else you will be in trouble.

    if [ "$currentload" -gt "14" ]; then
        /usr/bin/killall -9 httpd
    else
        exit 0
    fi

Thanks, Alan
Last edited by alanzkorner; 07-07-2009 at 05:48 AM. Reason: incomplete explanation
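The companion check Alan describes, starting httpd again if it is not running, might look like the sketch below. The function names are hypothetical, and the start command is passed in as arguments because it differs between systems; `service httpd start` in the usage comment is an assumption about the server's init scripts.

```shell
#!/bin/bash
# Sketch only: start a service when no process with that exact name exists.

# True when at least one process is named exactly $1.
is_running() {
    pgrep -x "$1" > /dev/null 2>&1
}

# $1 = process name; remaining arguments = command that starts it.
ensure_running() {
    local name="$1"
    shift
    if is_running "$name"; then
        return 0
    fi
    echo "$name is not running, starting it"
    "$@"
}

# Typical use from cron:
#   ensure_running httpd service httpd start
```

Run from cron, the echoed line ends up in the mail cron sends to the crontab owner, which doubles as a notification that the service had to be started.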
Posted by alanzkorner, 07-07-2009, 05:51 AM
Free scripts are available for monitoring purposes. The best one is Nagios, but you will have to install it, and you might need professional help for that. Thanks, Alan
Posted by aneesadmin, 07-07-2009, 07:38 AM
Check /var/log/messages for the word "restart". You will find something like "syslogd restart". Closely examine the log entries just before that line. They should give you a clue about what really happened.
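To look at what was logged just before the reboot, grep's -B option prints the lines preceding each match. A minimal sketch, where the function name before_restart and the default of 20 context lines are illustrative assumptions:

```shell
#!/bin/bash
# Sketch only: show the log lines leading up to each "restart" entry.

# $1 = log file, $2 = number of preceding lines to show (default 20).
before_restart() {
    grep -B "${2:-20}" 'restart' "$1"
}

# Typical use:
#   before_restart /var/log/messages 50
```

If the last entries before the restart line are ordinary and then stop abruptly, that by itself suggests a hard hang or power loss rather than a clean shutdown.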
Posted by WHR-Abner, 07-07-2009, 07:47 AM
Did you find anything in the dmesg logs as mentioned by "khunj"? The kernel buffers should output common I/O errors. If your NOC offers KVM over IP, that is the best way to check the status, if the server goes unresponsive suddenly.
Posted by koithara, 07-07-2009, 08:37 AM
Hi, You can get a copy of a load alert script from: http://www.crazyadmins.com/forum/vie....php?f=15&t=50 It also gives you a summary of all processes currently running on the server, which at times is very helpful in digging out the root causes of load issues.
