Released 1.36.25 The Memory Remains

Discussions related to the 1.36.x series of ZoneMinder
lightguy48
Posts: 110
Joined: Sun Nov 15, 2015 7:19 pm

Re: Released 1.36.25 The Memory Remains

Post by lightguy48 »

Thanks for fixing the issue with the keyframe interval. Looks good, no more repeating warnings.
keithp
Posts: 16
Joined: Sat Aug 06, 2022 12:44 am

Re: Released 1.36.25 The Memory Remains

Post by keithp »

keithp wrote: Fri Sep 09, 2022 5:26 pm
iconnor wrote: Fri Sep 09, 2022 4:44 pm You have a filter deleting the events as they are created.
I turned it off for now.
Circling back to this (forgive me for being short, but I lost my entire post and I don't feel like retyping it)...

Things are much better now and I've isolated what I think are the two issues:

1) zmc crashes are VERY bad... the camera "Maximum Image Buffer Size" setting has been the root cause of zmc crashing. I had to set my 720p cam to 5000 and my 1080p cams to 8000. That stopped the errors about the queue being full and the imminent process crash that would follow. Once that happens, you don't get an end stamp on that segment, and the purgewhenfull filter, as written, will randomly delete events. I still had two OOM events kill a zmc, but they don't appear to have affected much. In the last 24 hours I think I've had fewer than 10 events without end stamps, but running that query sometimes blows up the load (a sketch of that check is at the end of this post), which brings me to my next point...

2) event lookup... anything over 100 events was blowing up my system load (a known problem that I heard will be fixed in .26) and results in needing to restart apache2 to drop the load so the system is usable again. I can typically view an hour's worth of events, but a day sometimes is a no-go and a month definitely is. This made it hard to troubleshoot why the purgewhenfull filter was not working, but I've found a "fix" for that...
[Screenshot attachment: Screenshot_20220910_193552.png]
...if you look at that last line: instead of the default, I use "and Start Date/Time less than or equal to -1 day". That was the main thing that got this working. Limiting to 25 events also prevents the system load from blowing up as a result of the SQL query.
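
As a rough illustration, here is what that filter more or less boils down to on the database side. This is a sketch, not the exact SQL ZoneMinder generates; the Events table and StartDateTime column names are from recent 1.36 schemas, so verify against your own install.

Code:

#!/usr/bin/env python3
# Sketch of the query the filter settings above roughly correspond to:
# only events older than a day, capped at 25 rows per run.
from datetime import datetime, timedelta

cutoff = datetime.now() - timedelta(days=1)  # "Start Date/Time <= -1 day"
query = (
    "SELECT Id FROM Events "
    f"WHERE StartDateTime <= '{cutoff:%Y-%m-%d %H:%M:%S}' "
    "ORDER BY StartDateTime ASC "  # oldest first, as a purge filter would
    "LIMIT 25;"                    # the 25-event cap from the screenshot
)
print(query)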

I'm feeling much better about 1.36.25. I'm going to go back to my 1.34.x snapshot and upgrade again so things are cleaner going forward, but the system is usable and stable. I was able to go from 90% back down to 83% in 3 hours, so I'm not at risk of running out of space, but I want to see if the system can get back down to the 55% target before I do that rebuild.
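
And for reference, the "events without end stamps" check from point 1, as a minimal sketch. The EndDateTime column name is from recent 1.36 schemas, and the "zm" database name and mysql-client credentials (e.g. in ~/.my.cnf) are assumptions.

Code:

#!/usr/bin/env python3
# Count events per monitor that never got an end stamp (i.e. zmc died
# mid-recording). Assumes the ZoneMinder database is named "zm" and the
# mysql client can find credentials on its own.
import subprocess

QUERY = (
    "SELECT MonitorId, COUNT(*) AS no_end_stamp "
    "FROM Events WHERE EndDateTime IS NULL "
    "GROUP BY MonitorId;"
)
subprocess.run(["mysql", "zm", "-e", QUERY], check=False)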
keithp
Posts: 16
Joined: Sat Aug 06, 2022 12:44 am

Re: Released 1.36.25 The Memory Remains

Post by keithp »

I think I may have spoken too soon. A zmc memory leak appears to be back. This was yesterday...
[Screenshot attachment: Screenshot_20220912_103923.png]
I turned off the disk purge routine since I thought it might be the cause, but now this is from today...
[Screenshot attachment: Screenshot_20220913_114353.png]
iconnor
Posts: 3197
Joined: Fri Oct 29, 2010 1:43 am
Location: Toronto

Re: Released 1.36.25 The Memory Remains

Post by iconnor »

They are all in D state... waiting on disks... likely...

If something gets slow, the queues fill up. Which is why we have MaxImageBuffers...
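
A quick way to check for those D-state processes yourself is a small Linux-only sketch like the one below (ps aux shows the same thing in its STAT column):

Code:

#!/usr/bin/env python3
# List processes in D state (uninterruptible sleep, usually waiting on I/O).
import os

for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open(f'/proc/{pid}/stat') as f:
            data = f.read()
    except OSError:
        continue  # process exited while we were scanning
    # The command name is parenthesized and may contain spaces,
    # so parse around the parentheses instead of splitting blindly.
    comm = data[data.index('(') + 1:data.rindex(')')]
    state = data[data.rindex(')') + 2]
    if state == 'D':
        print(pid, comm)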
keithp
Posts: 16
Joined: Sat Aug 06, 2022 12:44 am

Re: Released 1.36.25 The Memory Remains

Post by keithp »

iconnor wrote: Wed Sep 14, 2022 2:07 pm They are all in D state... waiting on disks... likely...

If something gets slow, the queues fill up. Which is why we have MaxImageBuffers...
Interesting... so when the "queue" fills up, is that why I'm seeing the process consume so much memory? Something that comes to mind with MaxImageBuffers is that I've had to turn it up very high to stop getting the messages about the queue being full. That's all fine, except that the calculated max RAM use based on that number is greater than the system memory, so oversubscribing appears to be causing the OOM killer to kick in because I'm actually passing a physical limit (rough math below). The thing I don't understand now: if this is about disk access, why am I seeing these conditions more now in 1.36.x?
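
To put rough numbers on that, here is a back-of-envelope sketch assuming the worst case where every queued image is a decoded 32-bit frame; real packet-queue entries may be compressed and much smaller, but it shows how fast the ceiling grows:

Code:

#!/usr/bin/env python3
# Worst-case RAM ceiling for the image buffer: decoded frames at
# 4 bytes/pixel (32-bit color) times the buffer count.
def buffer_gib(width: int, height: int, buffers: int, bpp: int = 4) -> float:
    return width * height * bpp * buffers / 2**30

print(f"720p  x 5000 buffers: {buffer_gib(1280, 720, 5000):.1f} GiB")
print(f"1080p x 8000 buffers: {buffer_gib(1920, 1080, 8000):.1f} GiB")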

After my last post, I went back to record mode for all cameras and turned off analysis. I also turned the purge process back on. Initially that was great, but then the same issue came back within a couple of hours. Cameras go down every so often saying unable to stream or camera not connected. I don't see any queue errors in syslog, but I do see the OOM killer taking out a zmc process.

The only thing I can think of that is causing a problem now is the purge process. Maybe that database routine is bogging down the disk access (a quick way to check is sketched below)? I'm going to turn it off again since I'm back down to 60%. If there is something else you suggest, please let me know.
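
One quick way to test the disk theory is to watch how busy the drives are while the purge runs. A minimal sketch that samples /proc/diskstats (iostat -x from the sysstat package reports the same numbers):

Code:

#!/usr/bin/env python3
# Sample /proc/diskstats twice and report the fraction of wall time each
# device spent with I/O in flight (field 13 of diskstats, in ms).
import time

def io_ms():
    busy = {}
    with open('/proc/diskstats') as f:
        for line in f:
            fields = line.split()
            name = fields[2]
            if name.startswith(('loop', 'ram')):
                continue  # skip pseudo-devices
            busy[name] = int(fields[12])  # ms spent doing I/O
    return busy

INTERVAL = 5
before = io_ms()
time.sleep(INTERVAL)
after = io_ms()
for dev, ms in sorted(after.items()):
    pct = (ms - before.get(dev, ms)) / (INTERVAL * 1000)
    print(f'{dev}: {pct:.0%} busy')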