ZMaudit Causes Buffer Overruns

Forum for questions and support relating to the 1.26.x releases only.
Locked
dvarapala
Posts: 54
Joined: Sat Nov 06, 2010 2:30 pm

ZMaudit Causes Buffer Overruns

Post by dvarapala »

Here's a known issue for which I'm trying to find a solution.

In the Wiki it states:
"the process of scanning the database and file system may take a long time and impact performance."
On my i7 system with 16GB of RAM this is indeed the case. I am running a background "delete when disk full" filter and using OPT_FAST_DELETE; when zmaudit kicks in and starts deleting image files from the disk, I get sporadic "buffer overrun" messages in the log.

All 8 cores typically run at no more than 50% utilization, so CPU is not the issue. According to the output from iotop, the bottleneck is I/O bandwidth: during the periods when the buffer overruns occur, I see the "rm" command using up 99% of the I/O bandwidth. The zma daemons end up waiting for disk I/O, which slows them down so that they cannot process incoming frames quickly enough, leading to the overruns.

So now I'm starting to play around with ionice to try to lower the I/O scheduling priority of the "rm" commands that zmaudit.pl issues. Is there anything else I should try? For those of you who have solved this problem, what worked for you? Is there a standard solution that I have somehow missed?
knight-of-ni
Posts: 2406
Joined: Thu Oct 18, 2007 1:55 pm
Location: Shiloh, IL

Re: ZMaudit Causes Buffer Overruns

Post by knight-of-ni »

Well, knowing how you have your underlying storage array set up would help.

Here are some generic suggestions to try:
1) Limit your filter results. I've got 7 VGA resolution cameras and limiting the results to 200 works for me. The end result is that the filter deletes smaller chunks more often rather than a bunch all at once.
2) Use a dedicated drive for your video storage. You are already doing this, right?
3) Use a faster drive/array for your video storage.
4) Use a different filesystem for your video storage. I use XFS for my Mythtv storage because delete operations do not lock the filesystem like ext4, but tbh I don't know how well XFS works with lots of small files rather than large files.
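
The idea behind suggestion 1 (delete in small chunks, more often) can be sketched on its own. This is only an illustration in Python, not how the ZoneMinder filter or zmaudit.pl actually delete events; `delete_in_batches`, `batch_size`, and `pause_s` are hypothetical names and values chosen for the example.

```python
import os
import time
from itertools import islice


def delete_in_batches(paths, batch_size=200, pause_s=0.5):
    """Delete files in small batches, sleeping between batches.

    Hypothetical helper: batch_size and pause_s are illustrative values,
    not ZoneMinder settings.
    """
    it = iter(paths)
    deleted = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for p in batch:
            try:
                os.remove(p)
                deleted += 1
            except FileNotFoundError:
                pass  # already gone; nothing to do
        time.sleep(pause_s)  # give other writers a chance at the disk
    return deleted
```

The pause between batches is what spreads the I/O load out; with no pause this degenerates into one big burst again.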
dvarapala
Posts: 54
Joined: Sat Nov 06, 2010 2:30 pm

Re: ZMaudit Causes Buffer Overruns

Post by dvarapala »

My filesystem is ext4 on multiple LVM volumes. The ZM events have their own LVM volume composed of two 2TB Seagate Barracuda 7200RPM drives.

Although I was able to lower the I/O priority of zmaudit.pl to Idle Class using ionice, that didn't solve the problem; it turns out that any other process that does a big burst of I/O to the ZM volume - including the ext4 journaling daemon and even other zma instances - can also slow zma down enough to cause overruns.
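
For reference, moving an already running process into the idle I/O class is done with `ionice -c 3 -p PID`. A minimal Python sketch that builds that command line follows; `idle_io_command` is a hypothetical helper name, and whether the idle class has any effect depends on the I/O scheduler in use.

```python
def idle_io_command(pid):
    """Build the ionice command that moves `pid` into the idle class (class 3)."""
    return ["ionice", "-c", "3", "-p", str(pid)]
```

The resulting list can be passed to `subprocess.run(..., check=True)` with the PID of zmaudit.pl. As noted above, this only demotes that one process; bursts from other writers to the same volume are unaffected.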

Next I tried the "frame server" option. On paper, this should be an ideal solution: farm the disk I/O out to a separate process (zmf), leaving the analysis daemons free to continue processing frames even if the disk I/O takes longer than usual.

Update: I figured out that, in addition to enabling the frame server option, I also need to set the buffer size option in the ZM config, since the default buffer size is too small for all my 3MP cameras. So far, so good - now we'll see if this makes the buffer overruns go away. :)
dvarapala
Posts: 54
Joined: Sat Nov 06, 2010 2:30 pm

Re: ZMaudit Causes Buffer Overruns

Post by dvarapala »

After playing with the Frame Server option for a while, I have the following observations:

I found that I need to allocate a lot of buffer space for the Unix domain socket that carries the frames between zma and zmf. Even 16MB of buffer space per process isn't enough to absorb all of the peaks; occasionally the buffer fills up, the socket write fails, and zma goes back to writing the frames itself, which of course leads to more overruns. In addition, increasing the socket buffer size wastes memory, because every zma/zmf pair uses the same buffer settings, even if the camera is 640x480 @ 3fps. Thus, while farming out the disk writes in this way does help, it's not the ultimate solution.
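
The failure mode described above (buffer fills up, socket write fails) is easy to reproduce in isolation. The following Python sketch is not ZoneMinder code: it assumes a stream-type Unix domain socket pair whose reader never drains, and `fill_until_full` is a hypothetical name. It sets `SO_SNDBUF` on the writing side and counts how many bytes the kernel accepts before a non-blocking write would block.

```python
import socket


def fill_until_full(sndbuf_bytes, chunk=64 * 1024, cap=256 * 1024 * 1024):
    """Write to an unread Unix socket pair until the kernel refuses more data.

    Returns the number of bytes accepted before the write would block.
    `cap` is only a safety limit for this sketch.
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Request a send buffer size; the kernel may adjust or cap the value.
        writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
        writer.setblocking(False)
        payload = b"\0" * chunk
        accepted = 0
        while accepted < cap:
            try:
                accepted += writer.send(payload)
            except BlockingIOError:
                break  # buffer full: this is where a fallback path would kick in
        return accepted
    finally:
        writer.close()
        reader.close()
```

On Linux the requested size is limited by `net.core.wmem_max`, so asking for a very large buffer may silently give less than requested. Whatever the size, the buffer is a fixed ceiling, which is why a long enough stall on the reading side eventually overflows it.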

So I decided to try an experiment: instead of using a separate process to do the writing, I modified zma to add a "Writer" thread that writes the JPEG files to disk. Rather than writing out the images in the main analysis thread, zma queues up the images for writing by this new thread, which keeps the main thread from blocking. The queue is allowed to grow as needed and should absorb peaks of any practical size. It also avoids wasting buffer space on monitors which don't need it.
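
The general shape of such a write queue plus background writer thread can be sketched as follows. This is an illustration in Python, not the actual zma modification (which would be C++ inside zma); `FrameWriter`, `submit`, and `close` are hypothetical names, and error handling is left out.

```python
import queue
import threading


class FrameWriter:
    """Background thread that writes queued (path, data) pairs to disk."""

    _STOP = object()  # sentinel telling the thread to exit

    def __init__(self):
        self._queue = queue.Queue()  # unbounded: grows to absorb bursts
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, path, data):
        """Called by the analysis thread; returns without touching the disk."""
        self._queue.put((path, data))

    def close(self):
        """Write everything still queued, then stop the thread."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            path, data = item
            with open(path, "wb") as f:
                f.write(data)
```

The producer only pays for a queue insert, so a slow disk shows up as a longer queue rather than as a stalled analysis loop. The trade-off is that an unbounded queue turns a sustained stall into memory growth.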

Currently the image buffers are allocated from the heap, so there is a potential for heap contention as documented here. Since the buffers are allocated by one thread (the analysis thread) and freed by another (the writer thread), this is a textbook scenario for heap contention. I may try adding a look-aside list for the image buffers to avoid it.
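
A look-aside list is just a small free list of reusable buffers that is consulted before going to the allocator. The Python sketch below shows the structure only; Python cannot demonstrate the heap contention itself, and `BufferPool`, `acquire`, `release`, and `max_free` are hypothetical names for the example.

```python
import threading


class BufferPool:
    """Look-aside list of fixed-size buffers shared between two threads."""

    def __init__(self, buf_size, max_free=32):
        self._buf_size = buf_size
        self._max_free = max_free  # upper bound on buffers kept for reuse
        self._free = []
        self._lock = threading.Lock()

    def acquire(self):
        """Reuse a free buffer if one is available, else allocate a new one."""
        with self._lock:
            if self._free:
                return self._free.pop()
        return bytearray(self._buf_size)

    def release(self, buf):
        """Return a buffer to the list; drop it if the list is already full."""
        with self._lock:
            if len(self._free) < self._max_free:
                self._free.append(buf)
```

In this arrangement the analysis thread would call `acquire` and the writer thread would call `release` once the frame is on disk, so in steady state most buffers cycle through the list instead of the general-purpose heap.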

We shall see. :D
dvarapala
Posts: 54
Joined: Sat Nov 06, 2010 2:30 pm

Re: ZMaudit Causes Buffer Overruns

Post by dvarapala »

Experimental results:

Adding a background thread to zma to write the frames to disk helped but did not completely solve the problem - there were still occasional buffer overruns. Apparently the analysis thread is blocking on something besides disk I/O (such as I/O to the database socket), or otherwise being starved for CPU time (which seems unlikely, since the overall CPU utilization hovers around 50%). Unfortunately I don't have much experience in diagnosing system performance issues on Linux, so figuring out what's really going on will definitely be a learning experience. I'm now deciding whether I should take the red pill and find out how deep the rabbit hole goes... :)
dvarapala
Posts: 54
Joined: Sat Nov 06, 2010 2:30 pm

Re: ZMaudit Causes Buffer Overruns

Post by dvarapala »

For anyone still following along, I managed to solve the problem through a combination of moving the ext4 filesystem journal to a separate device (removing the journal entirely also works, but is obviously less safe) and using a write queue and a background writer thread. The periodic starvation came about whenever the cached filesystem journal data was being flushed out to disk.