Servers crashing, out of memory, kills process
Posted: Thu May 10, 2012 5:23 pm
I already posted this in the Previous Versions forum, but that place is a ghost town, and I don't think this problem is specific to the version of ZM, so I'm going to post it here as well.
I have five servers running Ubuntu 8.04 LTS Server x64 and ZM 1.23.3. Each server records 16 cameras 24/7. A few of them have been running for over a year (with weekly reboots) with very few problems, and the others for probably almost a full year now. I don't know exactly which ones, as I didn't keep track of when I built them.
Anyway, two of my servers are crashing fairly often, 3-5 times a week, due to out-of-memory errors. The process that gets killed is typically mysqld (or something like it), but sometimes it's a ZM process. When a process gets killed, the server stops recording video and stops hosting the website for active monitoring. The weird thing is that these two servers have both run for over a year with a scheduled weekly reboot and no frequent problems; it's only recently that they started misbehaving.
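(For what it's worth, I believe the OOM killer writes its victims to the kernel log, so something like the following should confirm exactly which process is getting killed each time. I'm assuming the standard Ubuntu log locations here:)

  dmesg | grep -iE "out of memory|killed process"
  grep -iE "out of memory|oom-killer" /var/log/syslog /var/log/kern.log

That should at least tell me whether it's mysqld or one of the ZM capture/analysis processes that gets hit first.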
When they crash, I just reboot the server. Usually they come back up and start recording fine, but sometimes when I try to go to the ZM webpage, I get "An error has occurred and this operation cannot continue. For full details check your web logs for the code 'B83AB6'". The code is different every time (if I refresh the page, the code changes). To fix this, I run a repair on the Frames and Events tables using the phpMyAdmin web tool, and then all seems well.
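(In case anyone wants to try the same repair without phpMyAdmin: I assume mysqlcheck can do the same thing from the command line. Here 'zm' and 'zmuser' are just the ZoneMinder defaults for the database name and account; yours may differ:)

  mysqlcheck --repair -u zmuser -p zm Events Frames
  # or repair every table in the database at once:
  mysqlcheck --repair -u zmuser -p zm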
So how do I fix this crashing issue? I've tried a combination of throwing more physical memory at the servers and adjusting SHMALL and SHMMAX. I've been running the same versions of MySQL, ZM, and Apache since I built the systems, and they never get software updates, so I don't think anything has changed that could be causing this.
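(If it helps anyone diagnose this, I think the current kernel limits and actual shared memory usage can be checked with something like the following. As far as I know, each ZM monitor grabs its own shared memory segment, so ipcs should show roughly one per camera:)

  cat /proc/sys/kernel/shmmax   # max size of a single segment, in bytes
  cat /proc/sys/kernel/shmall   # total shared memory allowed, in pages
  ipcs -m                       # segments currently allocated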
I've started rebooting the servers daily now, and that has prevented them from running out of memory (for now...), but that certainly is not a fix. Is it possible that the database just becomes too big of a mess and consumes more and more memory as it's used, until eventually I start having problems like this? Do I need to simply reformat and rebuild these servers once a year? I'm not sure what else to try.
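(If the database really is part of the problem, I guess a query like this would at least show how big the Events and Frames tables have gotten. Again, 'zm' and 'zmuser' are just the default database name and account:)

  mysql -u zmuser -p -e "
    SELECT table_name,
           table_rows,
           ROUND((data_length + index_length) / 1024 / 1024) AS approx_mb
    FROM information_schema.tables
    WHERE table_schema = 'zm'
    ORDER BY approx_mb DESC;"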
I'm not a Linux guru by any means; I barely scrape by with my general tech knowledge. These systems were experimentally built by my boss, who knows more about Linux than I do but is certainly not an expert either. We're both scratching our heads on this one and don't know what to do. Neither of us is really sure what to set SHMALL and SHMMAX to; several sources claim one specific way to do it while other sources contradict it, so it seems to be a matter of opinion, and the optimal settings may vary from system to system.
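(The convention I've seen suggested most often is that SHMMAX is in bytes, SHMALL is in pages (usually 4096 bytes each), and SHMMAX shouldn't be set much above about half of RAM. I honestly don't know how authoritative that is, so treat these numbers as an example only, sized for a 4 GB box allowing up to 2 GB of shared memory:)

  sudo sysctl -w kernel.shmmax=2147483648   # bytes
  sudo sysctl -w kernel.shmall=524288       # pages (2147483648 / 4096)
  # to make it stick across reboots, add the same two settings to /etc/sysctl.conf
  # and reload with: sudo sysctl -p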
So does anyone have any ideas here?