As per the topic title, I encountered unexpected out-of-memory problems with CUDA enabled in ZM and ES.
I solved it by no longer using the linked-monitor function, but, as you will (hopefully!) read in the next "chapters", that is not necessarily the only source of the problem.
For this reason I don't want to waste anyone's time reading the whole post: when it comes to CUDA there are simply too many factors in play, and my GPU (GT730) is old. I can live with my current, perfectly working setup.
But if you have some time to spare, say for an easy read before sleep, then I'd appreciate any comment.
Please consider:
This is going to be a long post, sorry.
I'm not an expert (don't laugh).
I'm not by any means sure this is a real problem, nor that it's really a ZM/ES issue.
I have a stable installation of ZM 1.34.16, ES 5.15, CUDA 10.2.89, cuDNN 7.6.5, OpenCV 4.3.0, dlib 19.19.99, ffmpeg 4.1.6 (everything compiled for NVIDIA CUDA acceleration) and zmNinja, of course.
ZM hardware acceleration is set to cuvid (decoding), and ES with hooks is set to use CUDA.
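(For reference, when I say "quick check of CUDA and OpenCV with Python" further down, I mean nothing fancier than a throwaway snippet like the one below; the exact calls are just what I happen to use, nothing ZM/ES specific.)
Code: Select all
# Throwaway sanity check that the compiled stack actually sees the GPU.
import cv2
import dlib

print(cv2.__version__)                        # expect 4.3.0
print(cv2.cuda.getCudaEnabledDeviceCount())   # expect 1 for my single GT 730
print("cuDNN" in cv2.getBuildInformation())   # crude check that the build picked up cuDNN

print(dlib.DLIB_USE_CUDA)                     # expect True
print(dlib.cuda.get_num_devices())            # expect 1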
Problem:
One day I noticed I was starting to miss notifications in zmNinja. As usual I immediately blamed my old VPN provider (I had really forgotten about ZM for quite a while, apart from one "small" change to a monitor; by the way, really... great job guys!), but this time the problem was different.
This traceback was logged in zmdc.log:
Code: Select all
07/14/2020 05:58:10.090830 zmdc[13267].INF [ZMServer:410] ['zma -m 3' starting at 20/07/14 05:58:10, pid = 13342]
07/14/2020 05:58:10.091317 zmdc[13342].INF [ZMServer:410] ['zma -m 3' started at 20/07/14 05:58:10]
07/14/2020 05:58:10.233992 zmdc[13345].INF [ZMServer:410] ['zmfilter.pl --filter_id=2 --daemon' started at 20/07/14 05:58:10]
07/14/2020 05:58:10.234837 zmdc[13267].INF [ZMServer:410] ['zmfilter.pl --filter_id=2 --daemon' starting at 20/07/14 05:58:10, pid = 13345]
07/14/2020 05:58:10.542657 zmdc[13267].INF [ZMServer:410] ['zmaudit.pl -c' starting at 20/07/14 05:58:10, pid = 13352]
07/14/2020 05:58:10.544307 zmdc[13352].INF [ZMServer:410] ['zmaudit.pl -c' started at 20/07/14 05:58:10]
07/14/2020 05:58:10.863288 zmdc[13360].INF [ZMServer:410] ['zmwatch.pl' started at 20/07/14 05:58:10]
07/14/2020 05:58:10.863802 zmdc[13267].INF [ZMServer:410] ['zmwatch.pl' starting at 20/07/14 05:58:10, pid = 13360]
07/14/2020 05:58:11.087396 zmdc[13267].INF [ZMServer:410] ['zmupdate.pl -c' starting at 20/07/14 05:58:11, pid = 13366]
07/14/2020 05:58:11.087523 zmdc[13366].INF [ZMServer:410] ['zmupdate.pl -c' started at 20/07/14 05:58:11]
07/14/2020 05:58:11.328751 zmdc[13371].INF [ZMServer:410] ['zmeventnotification.pl' started at 20/07/14 05:58:11]
07/14/2020 05:58:11.329410 zmdc[13267].INF [ZMServer:410] ['zmeventnotification.pl' starting at 20/07/14 05:58:11, pid = 13371]
Update agent starting at 20/07/14 05:58:11
07/14/2020 05:58:11.575475 zmdc[13267].INF [ZMServer:410] ['zmstats.pl' starting at 20/07/14 05:58:11, pid = 13377]
07/14/2020 05:58:11.575485 zmdc[13377].INF [ZMServer:410] ['zmstats.pl' started at 20/07/14 05:58:11]
Can't ignore signal CHLD, forcing to default.
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/yolo.py", line 99, in detect
outs = self.net.forward(ln)
cv2.error: OpenCV(4.3.0) /home/fulvio/sources/opencv/modules/dnn/src/layers/../cuda4dnn/csl/memory.hpp:54: error: (-217:Gpu API call) out of memory in function 'ManagedPtr'
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/yolo.py", line 99, in detect
outs = self.net.forward(ln)
cv2.error: OpenCV(4.3.0) /home/fulvio/sources/opencv/modules/dnn/src/layers/../cuda4dnn/csl/cudnn/cudnn.hpp:65: error: (-217:Gpu API call) CUDNN_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/yolo.py", line 99, in detect
outs = self.net.forward(ln)
cv2.error: OpenCV(4.3.0) /home/fulvio/sources/opencv/modules/dnn/src/layers/../cuda4dnn/csl/cudnn/cudnn.hpp:65: error: (-217:Gpu API call) CUDNN_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/yolo.py", line 99, in detect
outs = self.net.forward(ln)
cv2.error: OpenCV(4.3.0) /home/fulvio/sources/opencv/modules/dnn/src/layers/../cuda4dnn/csl/cudnn/cudnn.hpp:65: error: (-217:Gpu API call) CUDNN_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/yolo.py", line 99, in detect
outs = self.net.forward(ln)
cv2.error: OpenCV(4.3.0) /home/fulvio/sources/opencv/modules/dnn/src/layers/../cuda4dnn/csl/cudnn/cudnn.hpp:65: error: (-217:Gpu API call) CUDNN_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/yolo.py", line 99, in detect
outs = self.net.forward(ln)
cv2.error: OpenCV(4.3.0) /home/fulvio/sources/opencv/modules/dnn/src/layers/../cuda4dnn/csl/cudnn/cudnn.hpp:65: error: (-217:Gpu API call) CUDNN_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
Traceback (most recent call last):
File "/var/lib/zmeventnotification/bin/zm_detect.py", line 402, in <module>
b, l, c = m.detect(image)
File "/usr/local/lib/python3.6/dist-packages/zmes_hook_helpers/face.py", line 102, in detect
number_of_times_to_upsample=self.upsample_times)
File "/usr/local/lib/python3.6/dist-packages/face_recognition/api.py", line 116, in face_locations
return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
File "/usr/local/lib/python3.6/dist-packages/face_recognition/api.py", line 100, in _raw_face_locations
return cnn_face_detector(img, number_of_times_to_upsample)
RuntimeError: Error while calling cudaMalloc(&data, new_size*sizeof(float)) in file /home/fulvio/dlib/dlib/cuda/gpu_data.cpp:218. code: 2, reason: out of memory
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
bad bcrypt settings at /usr/bin/zmeventnotification.pl line 1492.
Can't ignore signal CHLD, forcing to default.
I have 3 IP cams in Modect and 3 ZM monitors. No relevant errors.
As far as I can tell, the problem was related to the following change:
The first camera was modified to enable a second stream, so that 2 different streams from the same camera are mapped to 2 ZM monitors (4 ZM monitors in total now):
First monitor: 1920x1080x32 10fps NODECT
Second monitor: 640x480x32 10fps MODECT
The first monitor was linked to the second one to trigger recording.
The other two cameras kept running in Modect as usual.
No other links.
YOLO/face detection is active on all cameras.
Basic idea: run detection on the second stream (perhaps later dropping it to greyscale and 5 fps, who knows!) but record/archive/watch the high-res footage from the first stream. This way I might reduce overall CPU usage, or at least distribute the load better between processes; I don't know, I just thought it was worth a try.
But based on my tests this led to the above problem (sorry, for this you have to read the long story).
When an alarm was caught on the second stream, both monitors recorded the event correctly, but the ES processes could not complete cleanly:
zmdc logged the above exceptions;
esnotificationserver logged: Not sending event end alarm, as we did not send a start alarm for this, or start hook processing failed;
esdetect was stuck on: zm_detect.py:373 [Using model: yolo with /var/lib/zmeventnotification/images/11756-alarm.jpg]
Alarms were correctly caught, processed and notified from the other two cameras.
Removing the link solved the problem for me. After that, I forced a concurrent alarm on all monitors and received 4 notifications.
At this point I should probably suspect some bug... but honestly there's much more to say/confess, considering that little things can make big differences in such a complex system. And my change wasn't that little.
So my ZM server is able to manage 4 concurrent events and properly queue the CUDA detection operations; why can't it handle just 2 events from the linked monitors?
Honestly, I can't say it wouldn't have worked before the...
Long story:
Code: Select all
ffmpeg -hwaccels
Hardware acceleration methods:
cuvid
Only cuvid listed, no cuda. Great! A newer ffmpeg is surely better, and of course drivers are always drivers...
Stopped ZM.
Installed the new drivers (didn't reboot, I know, I know... I wasn't asked to) and verified they were up.
Quick check of CUDA and OpenCV with Python, and all was good.
Compiled ffmpeg with zero problems and ran the above command again... and cuda was there!
Started ZM, and my 3 monitors quickly came up again (still cuvid): a great first step.
Checked alarms, CUDA detection, face recognition and notifications.
Perfect!
I can't believe I got it right the first time!
And obviously, since this was one of my lucky days, I decided to try to optimize zmc CPU usage by splitting one camera (and then all cameras!) into two streams, as already described.
So I generated a real alarm on the low-res monitor, and the linked monitor quickly recorded the same video in high-res.
Perfect! Twice, in the same day...
Let's go with the other two cameras too!!!
Perfect! ...
I can't say whether the ZM console was showing "person" in the event info, nor whether I actually received a notification for that alarm.
I had checked that everything was working properly one second before splitting the streams, and that made me more confident that everything was in the right place.
But a couple of days later I noticed missing notifications and saw that error. Then a real mess began... and apparently it was all a waste of time!
Epilogue:
Really, no thought of the kind "remove the monitor link and check" ever crossed my mind; I just told myself: I didn't reboot, and now the drivers are showing their real power!
I'm going to skip over all the attempts I made at recompiling the various sources in different combinations, hoping to "quickly" sort this out... After lots of attempts and reading, and still with no clear evidence of a driver compatibility problem with any of the involved components, I decided to revert to my original drivers, even though the CUDA test runs (see later) were performing fine, I mean no errors.
Disabled ZM and uninstalled OpenCV, cuDNN and CUDA. Uninstalled the NVIDIA drivers (back to nouveau), rebooted.
Deleted any leftover directories or files related to these packages, rebooted again.
No relevant messages in kernel logs.
Installed the NVIDIA drivers and verified with nvidia-smi: driver 440.33.01 and no CUDA.
Installed CUDA and cuDNN and verified they were OK (see later).
Recompiled OpenCV, ffmpeg and dlib.
Tested OpenCV via Python.
Rebooted! And verified again.
I was sure I was done, so I started ZM and discovered the error was still there!
At that time I had all 3 cameras split into 2 monitors each. No notifications were actually being sent, so I was totally sure I still had a CUDA problem and that my clean-up hadn't been that good after all.
I tried configuring the event server to globally ignore the 3 high-res monitors, but nothing changed.
I think I was about to delete the 3 monitor clones just to "free up" some resources for ZM when I got a notification, thanks to my dog!!!
So, after figuring this out (the monitor link!), I made a final test creating a single link (not a circular one) between 2 monitors (short story), and verified the problem was still there, and only for that camera.
Tests:
Here are the tests I ran while ZoneMinder was showing this problem.
These tests never failed for me, even before the clean-up.
CUDA:
Code: Select all
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 730"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 2002 MBytes (2098921472 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 902 MHz (0.90 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Code: Select all
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GT 730
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 2.8
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 3.4
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 32.9
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Code: Select all
./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 2 Capabilities 3.5, SmClock 901.5 Mhz, MemSize (Mb) 2001, MemClock 2505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 2
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.032800 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.033344 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.081440 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.283552 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.332032 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 2
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.032128 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.032704 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.096000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.296512 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.328288 time requiring 207360 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Code: Select all
import numpy as np
import cv2 as cv
import time
confidence_threshold = 0.5
nms_threshold = 0.4
num_classes = 80
net = cv.dnn.readNet(model="/var/lib/zmeventnotification/models/yolov3/yolov3.weights", config="/var/lib/zmeventnotification/models/yolov3/yolov3.cfg")
net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
image = cv.imread("gm.jpg")
blob = cv.dnn.blobFromImage(image, 0.00392, (416, 416), [0, 0, 0], True, False)
# warmup
for i in range(3):
    net.setInput(blob)
    detections = net.forward(net.getUnconnectedOutLayersNames())

# benchmark
start = time.time()
for i in range(100):
    net.setInput(blob)
    detections = net.forward(net.getUnconnectedOutLayersNames())
end = time.time()

ms_per_image = (end - start) * 1000 / 100
print("Time per inference: %f ms" % (ms_per_image))
print("FPS: ", 1000.0 / ms_per_image)
Time per inference: 332.912259 ms
FPS: 3.0037944613328595
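While running these forward passes (and later while forcing alarms on the monitors) I watched GPU memory once per second with a small throwaway loop, roughly like the one below; it just shells out to nvidia-smi, so it assumes nvidia-smi is on the PATH.
Code: Select all
# Poll GPU memory usage once per second (Ctrl+C to stop).
import subprocess
import time

while True:
    used_total = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True).strip()
    print(time.strftime("%H:%M:%S"), used_total, "MiB")
    time.sleep(1)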
In roughly 10 positive tests I observed a gradual increase of memory usage (sampled at 1 s intervals) up to about 1.4 GB (out of 2 GB total), followed by a similarly gradual decrease back to 0.
With the linked monitor, over about the same number of tests, memory gradually increased to 1.4 GB and then went straight to 0 (the exception, I guess).
I'd expect to hit the 2 GB limit before that exception is raised, but I don't really know.
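If the cause really is just two hook processes each loading YOLO onto the GPU at the same time, it should be reproducible outside ZM/ES with something like the sketch below (I haven't turned this into a proper test; the model paths are the ones from my install and the test image is the same gm.jpg used above).
Code: Select all
# Launch two processes that each load YOLO with the CUDA backend and run one
# forward pass, roughly what two concurrent ES hooks would do on this GPU.
import multiprocessing as mp
import cv2 as cv

WEIGHTS = "/var/lib/zmeventnotification/models/yolov3/yolov3.weights"
CONFIG = "/var/lib/zmeventnotification/models/yolov3/yolov3.cfg"
IMAGE = "gm.jpg"

def one_detection(idx):
    net = cv.dnn.readNet(model=WEIGHTS, config=CONFIG)
    net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
    blob = cv.dnn.blobFromImage(cv.imread(IMAGE), 0.00392, (416, 416),
                                [0, 0, 0], True, False)
    net.setInput(blob)
    try:
        net.forward(net.getUnconnectedOutLayersNames())
        print("process %d: OK" % idx)
    except cv.error as err:
        print("process %d failed: %s" % (idx, err))

if __name__ == "__main__":
    procs = [mp.Process(target=one_detection, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()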
So my guess is that something works differently with linked monitors in ES, but I'd like to hear other opinions...
...perhaps one like: "did you enable 'that' option in the ZM config via the GUI?" Really, I'd love that!