Page 1 of 1

Mulitple zmc processes revisited

Posted: Sun Jan 02, 2005 8:53 am
by brucehvn
I'm finding I have the same issue that another user brought up in a post. After some random amount of time running, one of my monitors gets multiple zmc processes started causing a flood of error messages to my syslog. Today the syslog got over 1 GB before I caught it. When I shutdown the zm service script, there were several zmc processes left. When I tried a killall on those processes, the machine locked up tight and I had to reset to get it back.

The next time I started zm, within less than an hour, there was one too many zmc processes causing my syslog to get flooded again. I've attached the various snippets from the logfiles for that case. This is on a monitor that's coming from a capture card in the computer using /dev/video0.

My ZM_WATCH_MAX_DELAY is set to 15 which appears to be the default.
My ZM_WATCH_CHECK_INTERVAL is set to 10
The monitor is set to a maximum of 15 fps.

Code: Select all

Syslog Error Output
-------------------
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zma-m1[2809]: INF [Surv: 30805 - Gone into prealarm state]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:06 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:07 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]
Jan  1 19:28:07 homeauto zmc_d0[5159]: ERR [Sync failure for frame 0: Device or resource busy]
Jan  1 19:28:07 homeauto zmc_d0[5159]: ERR [Capture failure for frame 0: Device or resource busy]

Code: Select all

zmdc.log
--------
Watchdog starting at 05/01/01 18:53:07
Watchdog pausing for 30 seconds
Restarting capture daemon ('/usr/local/bin/zmdc.pl restart zmc -d 0'), time since last capture 1 seconds (1104634678-1104634677)
'zmc -d 0' started at 05/01/01 18:57:58
'zmc -d 0' starting at 05/01/01 18:57:58, pid = 3163
Restarting capture daemon ('/usr/local/bin/zmdc.pl restart zmc -m 3'), time since last capture 20 seconds (1104635808-1104635788)
'zmc -m 3' started at 05/01/01 19:17:05
'zmc -m 3' starting at 05/01/01 19:17:05, pid = 4432
Restarting capture daemon ('/usr/local/bin/zmdc.pl restart zmc -d 0'), time since last capture 1 seconds (1104636486-1104636485)
'zmc -d 0' started at 05/01/01 19:28:06
'zmc -d 0' starting at 05/01/01 19:28:06, pid = 5159

Code: Select all

zmwatch.log
-----------
'zmfilter.pl' started at 05/01/01 18:53:06
'zmfilter.pl' starting at 05/01/01 18:53:06, pid = 2823
'zmaudit.pl -d 900 -y' started at 05/01/01 18:53:07
'zmaudit.pl -d 900 -y' starting at 05/01/01 18:53:07, pid = 2827
'zmwatch.pl' started at 05/01/01 18:53:07
'zmwatch.pl' starting at 05/01/01 18:53:07, pid = 2832
'zmupdate.pl -c' started at 05/01/01 18:53:08
'zmupdate.pl -c' starting at 05/01/01 18:53:08, pid = 2837
'zmc -d 0' stopping at 05/01/01 18:57:58
'zmc -d 0' died at 05/01/01 18:57:58
'zmc -d 0' started at 05/01/01 18:57:58
'zmc -d 0' starting at 05/01/01 18:57:58, pid = 3163
'zmc -m 3' stopping at 05/01/01 19:17:05
'zmc -m 3' died at 05/01/01 19:17:05
'zmc -m 3' started at 05/01/01 19:17:05
'zmc -m 3' starting at 05/01/01 19:17:05, pid = 4432
'zmc -d 0' started at 05/01/01 19:28:06
'zmc -d 0' starting at 05/01/01 19:28:06, pid = 5159
'zmfilter.pl ' stopping at 05/01/01 19:33:11
'zmfilter.pl ' died at 05/01/01 19:33:11, signal 14
'zma -m 1' stopping at 05/01/01 19:33:11
'zma -m 1' died at 05/01/01 19:33:11
'zmaudit.pl -d 900 -y' stopping at 05/01/01 19:33:11
'zmaudit.pl -d 900 -y' died at 05/01/01 19:33:11, signal 14
'zmc -m 3' stopping at 05/01/01 19:33:11
'zmc -m 3' died at 05/01/01 19:33:12
'zmwatch.pl ' stopping at 05/01/01 19:33:12
'zmwatch.pl ' died at 05/01/01 19:33:12, signal 14
'zma -m 3' stopping at 05/01/01 19:33:12
'zma -m 3' died at 05/01/01 19:33:12
'zmupdate.pl -c' stopping at 05/01/01 19:33:12
'zmupdate.pl -c' died at 05/01/01 19:33:12, signal 14
'zmc -d 0' stopping at 05/01/01 19:33:12
'zmc -d 0' died at 05/01/01 19:33:12
Use of freed value in iteration at /usr/local/bin/zmdc.pl line 419, <CLIENT> line 1.

Posted: Tue Jan 04, 2005 2:37 am
by brucehvn
I had this happen again to me last night after the system was running for about 12 hours or so. I'm in the process of trying to analyze this some more to provide more information.

One thing I notice about my system is that it seems like about every 20 minutes or so, the watch daemon seems to think my main zmc capture daemon has died and it does a restart. This is what generates the comments about it going 1 second since the last image. I looked to see how it is calculating this figure and I saw something I think might be a little off.

It appears when looking at zmwatch.pl that if you have a camera that operates at 10 FPS or more, the watch daemon is going to possibly restart the capture daemon when it doesn't need to from time to time.

For my example, my main capture device (/dev/video0) is set to a max FPS of 15, which it quite easily can achieve.

In zmwatch.pl, the following is the code that determines how long of a delay it will allow before restarting the capture daemon:

Code: Select all

my $max_image_delay = $monitor->{MaxFPS}>0?(10/$monitor->{MaxFPS}):ZM_WATCH_MAX_DELAY;
my $image_delay = $now-$image_time;
In my example, this means that the max delay that zmwatch allows is .666666667 seconds based on my 15 FPS. The problem, I believe, is that when dealing with the times on images, the program only deals with full seconds.

So, for instance, if the last image was captured at a time of 25.99 seconds (which would store a value of 25 in the shared memory), and then zmwatch checks the current time and sees 26, it will calculate a time of 1 second since the last image was captured. This is over the .6666667 max time and it restarts the capture daemon. Since we're dealing with something that is happening very fast, the condition seems to only show up about every 20 minutes or so in my case.

I think that what the formula above is trying to do is basically say that if there have been about 10 consecutive missed frames, it will consider the capture daemon locked up, however, this can really only work correctly all the time if the time comparisons are using a more accurate scheme than full seconds.

Bruce

Posted: Tue Jan 04, 2005 12:05 pm
by zoneminder
Hi Bruce,

There are two issues here I think. The first as you have observed is the granularity of the image deltas for higher frame rates. This has been changed in 1.20.0 so that this calculation is only used for very low frame rates and a default value of a few seconds is used otherwise. The second issue is why some zmc deaths are not noticed by zmdc.pl. This I still haven't figured out as it's hard to catch them.

Phil