I have five servers running Ubuntu 8.04 LTS Server x64 and ZM 1.23.3. Each server records 16 cameras 24/7.
Two of my servers are crashing fairly often, 3-5 times a week, due to out-of-memory errors. The process that gets killed is typically mysqld (or something like it), but sometimes it's a ZM process. When the process gets killed, the server stops recording video and stops hosting the website for active monitoring. The weird thing is that these two servers have both been running for over a year with a scheduled weekly reboot and no frequent problems, and it is only recently that they started misbehaving.
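One way to narrow down which process is actually growing before the kernel kills something would be to log per-process resident memory at intervals (from cron, say) and compare the snapshots. The following is only a minimal sketch, assuming a Linux /proc filesystem and a Python 3 interpreter; on a release as old as 8.04 it would likely need adjusting for the older Python that ships with it. The function names are made up for illustration.

```python
import os

def parse_vmrss_kb(status_text):
    """Return (name, rss_kb) from the text of a /proc/<pid>/status file."""
    name, rss_kb = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     12345 kB"; kernel threads have no such line.
            rss_kb = int(line.split()[1])
    return name, rss_kb

def top_processes(limit=10, proc_dir="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest resident size first."""
    if not os.path.isdir(proc_dir):
        return []
    rows = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_dir, entry, "status")) as fh:
                name, rss_kb = parse_vmrss_kb(fh.read())
        except (IOError, OSError):
            continue  # the process exited while we were looking
        rows.append((rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]

if __name__ == "__main__":
    for rss_kb, pid, name in top_processes():
        print("%8d kB  pid %-6d %s" % (rss_kb, pid, name))
```

Appending this output to a file every few minutes should show whether mysqld, a ZM process, or something else entirely keeps climbing between reboots.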
When they crash, I just reboot the server. Usually they come back up and start recording fine, but sometimes when I try to go to the ZM webpage, I get "An error has occurred and this operation cannot continue. For full details check your web logs for the code 'B83AB6'". The code is different every time. When that happens, I run a repair on the frames and events tables using the phpMyAdmin web tool and all is well.
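For anyone who wants to do the same repair without phpMyAdmin, the command-line equivalent would be something like MySQL's mysqlcheck tool. This is a hedged sketch only: the database name "zm", the table names "Frames" and "Events", and the user "zmuser" are assumptions based on ZM's usual defaults, not something confirmed for these servers, and the helper function is hypothetical.

```python
import subprocess

def build_repair_command(database, tables, user):
    """Build the argument list for a mysqlcheck repair of the given tables."""
    # "-p" with no value makes mysqlcheck prompt for the password interactively.
    return ["mysqlcheck", "--repair", "--user=" + user, "-p", database] + list(tables)

if __name__ == "__main__":
    # Assumed ZM defaults; adjust to match the actual installation.
    cmd = build_repair_command("zm", ["Frames", "Events"], "zmuser")
    print(" ".join(cmd))
    # Uncomment to actually run the repair (it will prompt for the password):
    # subprocess.call(cmd)
```

The sketch only prints the command by default, so it can be checked before anything touches the database.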
I've tried a combination of throwing more physical memory at the servers and adjusting the SHMALL and SHMMAX kernel settings. I've been running the same versions of MySQL, ZM, and Apache since I built the system, and it never gets software updates, so I don't think anything has changed that could be causing this.
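To sanity-check the SHMMAX and SHMALL values, it may help to compare what the kernel currently allows against a rough estimate of what the cameras need. The sketch below assumes Linux, where /proc/sys/kernel/shmmax is in bytes and /proc/sys/kernel/shmall is in pages. The per-monitor estimate (width x height x bytes per pixel x buffer count, ignoring overhead) is the rule of thumb usually quoted for ZM, and the resolution and buffer count shown are hypothetical placeholders, not the real settings of these cameras.

```python
import os

def shm_limits_bytes(shmmax_text, shmall_text, page_size):
    """Return (largest single segment, total shared memory) in bytes."""
    max_segment = int(shmmax_text.strip())       # shmmax is already in bytes
    total = int(shmall_text.strip()) * page_size  # shmall is counted in pages
    return max_segment, total

def estimate_monitor_bytes(width, height, bytes_per_pixel, buffer_count):
    """Rough ring-buffer size for one monitor, ignoring per-frame overhead."""
    return width * height * bytes_per_pixel * buffer_count

if __name__ == "__main__":
    shmmax_path = "/proc/sys/kernel/shmmax"
    shmall_path = "/proc/sys/kernel/shmall"
    if os.path.exists(shmmax_path) and os.path.exists(shmall_path):
        with open(shmmax_path) as f_max, open(shmall_path) as f_all:
            max_seg, total = shm_limits_bytes(
                f_max.read(), f_all.read(), os.sysconf("SC_PAGE_SIZE"))
        print("largest segment allowed: %d bytes" % max_seg)
        print("total shared memory allowed: %d bytes" % total)
    # Hypothetical camera: 640x480, 24-bit colour, 50-frame buffer.
    per_monitor = estimate_monitor_bytes(640, 480, 3, 50)
    print("estimated need for 16 such monitors: %d bytes" % (16 * per_monitor))
```

If the estimate for all 16 monitors is well under the total limit, shared memory settings are probably not what is exhausting RAM, and the growth is more likely in an ordinary process.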
I've started rebooting the servers daily now and that has prevented them from running out of memory (for now...), but that certainly is not a fix. Is it possible that the database just becomes too big of a mess over time and consumes more and more memory, so that eventually I start having problems like this? Do I need to simply reformat and rebuild these servers once a year? I'm not sure what else to try.
I'm not a Linux guy by any means; I barely scrape by with my general tech knowledge... These systems were experimentally built by my boss, who knows more about Linux than I do but is certainly not an expert either. We're both scratching our heads on this one and don't know what to do.
Ideas?