Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
Just to clarify some things: I don't know what the issue is for _your_ server. There are several possibilities.
All that is happening here is that the consumers of the buffer (the motion detection and writing-to-disk parts) are not keeping up with capture. In 1.36.4 we will now just wait instead of dropping the packets, which might be OK, but we need to understand WHY the consumer is not keeping up. There are four possibilities:
1. DB hanging. Analysis creates events and writes to the DB, and MySQL does stupid locking. I may have mistakenly added some foreign keys to the DB thinking that this was generally a good idea. It might not have been. I'm thinking of removing them.
2. Disk IO being slow. Finalising an mp4 is kinda intensive. It might take a second or more, during which we are still capturing. Not sure what I can do about that. Other disk IO, like logging, could also affect the results.
3. A race condition. I think I've found and fixed most of them, but it is entirely possible that there is another.
4. Not enough CPU. In some cases I have seen the conversion from yuv420p (what we get back from the decoder) to rgba (what we do our motion detection on) take way longer than the actual decoding. I don't know why and will have to do more investigation. Also, long term, we should be doing our motion detection on the yuv420 image instead. That would cut our RAM needs by a factor of 4 and remove the conversion step entirely.
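To make the "wait instead of drop" change a bit more concrete, here is a minimal Python sketch of a bounded queue sitting between a capture producer and a slower analysis consumer. It is only an illustration of the general pattern, not ZoneMinder's actual C++ packet queue; the names `capture`, `analyse` and `run`, and the queue size and delay, are made up for the example.

```python
import queue
import threading
import time


def capture(q, packets, wait):
    """Producer: push packets into a bounded queue.

    wait=False drops a packet when the queue is full;
    wait=True blocks until the consumer frees a slot.
    Returns the number of dropped packets.
    """
    dropped = 0
    for packet in packets:
        if wait:
            q.put(packet)              # blocks while the queue is full
        else:
            try:
                q.put_nowait(packet)   # raises queue.Full instead of blocking
            except queue.Full:
                dropped += 1
    q.put(None)                        # sentinel: tell the consumer to stop
    return dropped


def analyse(q, per_packet_delay, done):
    """Consumer: stands in for motion detection and writing to disk."""
    while True:
        packet = q.get()
        if packet is None:
            break
        time.sleep(per_packet_delay)   # pretend analysis is slow
        done.append(packet)


def run(wait, n_packets=50, max_queue=5, delay=0.002):
    """Run one producer and one consumer; return (dropped, analysed)."""
    q = queue.Queue(maxsize=max_queue)
    done = []
    consumer = threading.Thread(target=analyse, args=(q, delay, done))
    consumer.start()
    dropped = capture(q, range(n_packets), wait)
    consumer.join()
    return dropped, len(done)


for wait in (False, True):
    dropped, analysed = run(wait)
    print(f"wait={wait}: dropped={dropped}, analysed={analysed}")
```

With `wait=True` nothing is lost, but capture stalls whenever the consumer is behind, which is why finding out why the consumer is slow still matters.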
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
Bearded_Beef wrote: ↑Tue Jun 08, 2021 6:45 pm
I ended up completely nuking my setup and went back to 1.34
Yeah. The excessive logging filled up my disk and I had to clean that up (about 10-20 events logged per second; both the log files themselves and the DB Log table were huge). Modifying options to stop the errors then leads to higher CPU and memory usage, and I still get some events logged (would need more tuning).
All of that on top of a memory leak makes zoneminder unusable at this point.
The increased CPU and memory requirements need to be fixed, as well as the memory leak and the excessive logging.
I'm going to try going back to 1.34, I have a DB backup but not sure how hard it'll be to restore and downgrade to 1.34.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
After tweaking MariaDB and doing some other minor tweaks to the storage (switching the I/O scheduler from mq-deadline to none and reducing the zstd compression ratio), plus increasing the overclock on my overworked ARM SBC, I've managed to drastically decrease the number of dropped frames on my device. Now, even when a user is reviewing footage via Montage Review, it rarely complains, with every camera set to a max image buffer size of 200.
I've most likely been hitting 1, 2 and 4 from iconnor's list due to heavy CPU use from BTRFS extent compression (which limits I/O bandwidth and causes analysis to fall behind), coupled with memory leaks causing additional CPU load for zram compression/decompression of swapped-out pages. The tweaks to MariaDB alone had a big effect on latency, even before everything else.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
My experience with btrfs for ZM has not been good. First off, it is no good for MySQL, so don't put MySQL on it. Beyond that, for event storage it pegs one CPU, even after turning off copy-on-write and all the other features.
I need to get around to testing zfs. Mostly I just use ext4 and it works great.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
Just wanted to add to this that I have also modified my MariaDB/InnoDB configuration for better disk access, and while Zoneminder seemed more responsive, the memory issue remains. One odd occurrence, however: at one point Zoneminder hit 100% RAM usage, then after a few minutes came back down on its own and operated just fine for a couple of days. An odd fluke, considering it was up to restarting every 12 hours.
ergamus wrote: ↑Wed Jun 09, 2021 1:09 pm
So what is it? A race condition? The analysis part of ZM being jammed up because it's waiting for INSERTs to hit the database? Can this be solved by optimizing for I/O? f*** it, I'll put the database on a separate SSD to fix this if necessary. The way you've written your post it seems like it's unfixable.
edit: I've done some tweaks to my config. Increase buffer pool size, allowed up to 50% of buffer for write buffering (ALL, Inserts/Deletes/Updates..), increase I/O capacity values to better reflect storage, disabled doublewrite buffering, doubled the amount of read/write/purge I/O threads. Let's see if it improves the situation.
edit2: Off the bat Zoneminder is more responsive, bit too early to tell about memory leaks. It almost seems like it's reduced CPU usage. Worth optimizing your MySQL/MariaDB settings!
edit3: Memory still exceeding any limits set by the user. Back to the */6 hour crontab restart entry.
For now, though, I am using a custom script to monitor RAM usage on my server and restart Zoneminder when there is less than 100 MB of RAM free, so as to minimize restarts where possible but still prevent a total lock-up.
Not a great solution but if anyone would like the script, I can provide it.
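For anyone who wants to roll their own in the meantime, a watchdog along those lines could look roughly like the Python sketch below. This is not the script mentioned above, just a guess at the general shape. Assumptions: a Linux host with `/proc/meminfo`, `MemAvailable` as the "free" figure, `zmpkg.pl` on the PATH with permission to restart ZM, and an arbitrary 30 second polling interval.

```python
import subprocess
import time

THRESHOLD_KIB = 100 * 1024              # "less than 100 MB free", as described above
RESTART_CMD = ["zmpkg.pl", "restart"]   # assumption: on PATH, run as a permitted user
CHECK_INTERVAL_S = 30                   # assumption: arbitrary polling interval


def available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def should_restart(meminfo_text, threshold_kib=THRESHOLD_KIB):
    """True when available memory has fallen below the threshold."""
    return available_kib(meminfo_text) < threshold_kib


def watch():
    """Poll /proc/meminfo forever and restart ZoneMinder when memory runs low."""
    while True:
        with open("/proc/meminfo") as f:
            text = f.read()
        if should_restart(text):
            subprocess.run(RESTART_CMD, check=False)
        time.sleep(CHECK_INTERVAL_S)
```

Calling `watch()` from a systemd service or a screen session would start the loop. Like the cron restart, it only works around the memory growth and does not fix it.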
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
Caveat: I've not really looked at the code and I'm sure there's more forum posts I could read to further educate myself, but I still think I'm on to why performance took a massive nosedive in the upgrade to this new version.
My setup abbreviated:
ZFS raid
32GB ECC RAM
Intel Xeon 6 core Coffee Lake
semi-recent nvidia GPU
12x 4 to 5MP 20FPS monitors
So I've fiddled with quite a few settings and I think the following is the issue, at least for me:
Some non-trivial amount of decoding and motion detection is being done for monitors that are in the mode "Record" even though... as I understand it, there's no need for motion detection (and per https://zoneminder.readthedocs.io/en/st ... nitor.html it's explicitly disabled for this type of monitor)
Here are my logs saying motion detection is enabled (even though it shouldn't be):
06/15/21 22:14:44.905540 zmc_m1[3872521].DB2-zm_packetqueue.cpp/461 [waiting. Queue size 473 it == end? 0]
06/15/21 22:14:44.905578 zmc_m1[3872522].DB2-zm_packetqueue.cpp/489 [At end]
06/15/21 22:14:44.905584 zmc_m1[3872522].DB3-zm_monitor.cpp/1751 [Motion detection is enabled signal(1) signal_change(0) trigger state(Cancel) image index 472]
06/15/21 22:14:44.905587 zmc_m1[3872522].DB3-zm_monitor.cpp/1756 [Have event lock]
06/15/21 22:14:44.905590 zmc_m1[3872522].DB1-zm_monitor.cpp/1855 [Waiting for decode]
Since my image sources are large, when they "happen" to be decoded and then have some amount of motion detection run on them, the system can't keep up and the memory usage skyrockets.
I understand that some amount of decoding needs to be done for a live stream of a monitor, but when I leave "Analysis enabled" on (since I want events!) and turn off "Decoding enabled", my RAM and CPU usage becomes trivial. If I add a duplicate monitor in function "Monitor", I can roughly re-create the performance and functionality of the previous version of Zoneminder, with the following exception: when "Analysis enabled" is on but "Decoding enabled" is off, the events never end, even though my "Section length" is set to 600.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
One small suggestion: in 1.36.4, when we wait for the buffer to free up, we get an error message for every packet:
Unable to free up older packets. Waiting.
As this happens for every packet, this generates a new write into the DB, and it effectively acts as a DoS when the system is overwhelmed. Not to mention it makes the log a bit messy as you need to now filter through tens of thousands of unnecessary messages. I know if you move from 1.34.* to 1.36 without tweaking the buffer values, and you have a lot of cameras, this means potentially millions of error messages in the log before you figure it out.
So instead of:
2021-06-16 16:50:40 zmc_m3 14172 ERR Unable to free up older packets. Waiting. zm_packetqueue.cpp 136
2021-06-16 16:50:40 zmc_m3 14172 ERR Unable to free up older packets. Waiting. zm_packetqueue.cpp 136
2021-06-16 16:50:40 zmc_m3 14172 ERR Unable to free up older packets. Waiting. zm_packetqueue.cpp 136
2021-06-16 16:50:40 zmc_m3 14172 ERR Unable to free up older packets. Waiting. zm_packetqueue.cpp 136
2021-06-16 16:50:40 zmc_m3 14172 ERR Unable to free up older packets. Waiting.
...
Maybe add a toggle to the error message, so it is logged once when waiting starts and once when the queue frees up, potentially keeping the old per-packet behavior for debug logging:
2021-06-16 16:50:56 zmc_m3 14172 ERR Queue freed. Buffering new packets. (or a counter maybe; Packets skipped: 160)
2021-06-16 16:50:40 zmc_m3 14172 ERR Unable to free up older packets. Waiting.
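The toggle idea could be sketched like this in Python. It is purely illustrative: ZoneMinder's logging is C++ and this is not its API. The `WaitLogger` class and the "Packets waited" counter wording are made up for the example.

```python
class WaitLogger:
    """Log once when the queue starts blocking and once when it clears."""

    def __init__(self, emit=print):
        self.emit = emit       # where log lines go (print by default)
        self.waiting = False   # True while the queue is known to be full
        self.count = 0         # packets that had to wait in this episode

    def queue_full(self):
        """Call for every packet that finds the queue full."""
        self.count += 1
        if not self.waiting:
            self.waiting = True
            self.emit("ERR Unable to free up older packets. Waiting.")

    def queue_freed(self):
        """Call when space becomes available again."""
        if self.waiting:
            self.emit(
                "ERR Queue freed. Buffering new packets. "
                f"(Packets waited: {self.count})"
            )
            self.waiting = False
            self.count = 0


lines = []
log = WaitLogger(emit=lines.append)
for _ in range(160):
    log.queue_full()
log.queue_freed()
print("\n".join(lines))
```

However many packets wait, each episode produces two log lines (and two DB writes) instead of one per packet.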
Last edited by ergamus on Wed Jun 16, 2021 3:29 pm, edited 1 time in total.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
I just want to report the same issue with ZM 1.36. The Docker images I am using are ZM 1.34 (https://hub.docker.com/r/dlandon/zoneminder) and ZM 1.36 (https://hub.docker.com/r/dlandon/zoneminder.unraid), and this is what I get
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
How are we doing with 1.36.5 ? It should improve analysis consuming and freeing ram.
Yes yes I know I haven't finished all the release notifications.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
After starting ZM, if I peg the CPU usage for say, two minutes to the point where analysis starts to fall behind, the memory use of the zmc process for each monitor will exceed the max specified under Buffers and stay there. Currently from a maximum of ~1.5GiB, one zmc process is using 3.25GiB, the other 2.63GiB. This is for a 1080p camera, 24bit color with a maximum buffer size of 200 frames. The memory usage is permanent, it doesn't decrease until the monitor is restarted or reaped by oomd.
The memory increase is happening after zmc starts waiting for frames (Unable to free up older packets. Waiting.). This is on 1.36.5 on Fedora 34, aarch64. I'll report back if the memory keeps increasing beyond a certain point and flushing over into swap.
edit: One weird thing: this doesn't always happen. I've repeated this CPU exhaustion scenario about 4 times in a row now after a [zmpkg.pl restart], but the memory usage is fixed at ~1.49GiB for one monitor, while the other is staying at 2.36GiB but not increasing.
edit: After about half an hour of using Montage Review to.. review footage: 5GiB memory usage between both zmc processes + an additional 1.3GiB in swap usage and an extra 25k dropped frames messages in the log.
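For reference, the raw size of a 200-frame buffer of decoded 1080p images can be estimated with a few lines of Python. This is back-of-the-envelope arithmetic only: it counts uncompressed frames and ignores compressed packets, shared memory and any other allocations zmc makes, so it is not how ZoneMinder itself accounts for memory. The helper name `buffer_bytes` is made up for the example.

```python
GIB = 1024 ** 3

# Bytes per pixel for common raw formats.
BYTES_PER_PIXEL = {
    "yuv420p": 1.5,  # full-resolution Y plane plus quarter-resolution U and V
    "rgb24": 3,      # "24 bit color"
    "rgba": 4,       # "32 bit color"
}


def buffer_bytes(width, height, pixel_format, frames):
    """Raw size in bytes of `frames` uncompressed images."""
    return int(width * height * BYTES_PER_PIXEL[pixel_format] * frames)


for fmt in BYTES_PER_PIXEL:
    size = buffer_bytes(1920, 1080, fmt, 200)
    print(f"{fmt:8s} {size / GIB:.2f} GiB")
```

Under those assumptions, 200 frames come to about 1.16 GiB at 24 bit and about 1.54 GiB at 32 bit, so the 3.25GiB and 2.63GiB figures above are well beyond what the decoded frames alone would need.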
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
Have you found 24 bit color less taxing than 32?
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
The setting worked fine for me on 1.34, where it helped reduce memory usage. All my cameras are outputting 24bit, so I figured I might as well shave 8bit per pixel off.
However, now in 1.36 I'm wondering if I'm making my problem worse. Since image conversion is more CPU expensive (30-50% usage on 1.34 versus 50-80% on 1.36), adding another step (32->24bit) makes little sense. I'll test going back to 32bit, see if it helps with memory leakage. I think I remember a dev or a wiki entry somewhere that stated most image processing algorithms are optimized around 32bit colorspace, with any other choice just increasing CPU usage.
edit: Just made the problem worse, went back to 24 bit.
edit: Load average on 1.36.5 is consistently higher.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
OK. I thought I'd read the same.
But I suspect if your cameras really are 24 bit, that's where to set them, as you have found.
Thanks.
Re: Memory Leak/Increase and Choppy events due to Maximum Image Buffer Size
1.36.5 has helped big time and resolved the buffering and RAM usage issues for me. I still get log errors saying “Unable to free up older packets. Waiting.” and “You have set the max video packets in the queue to 150. The queue is full. Either Analysis is not keeping up or your camera's keyframe interval is larger than this setting. We are dropping packets.” But video seems to be smooth, with no more issues like before.
Only thing I’ve noticed is every once in a great while one or all of my cameras will say “unable to stream” and I have to restart zm. It happened all the time on 1.36.4, but very rarely on 1.36.5. Besides some page layout and display issues on mobile, I’m very happy with this update. Thank you!!!