High zmc CPU usage due to yuv2rgb24_X_c

Forum for questions and support relating to the 1.34.x releases only.
primiano
Posts: 4
Joined: Tue Sep 06, 2011 5:24 pm

High zmc CPU usage due to yuv2rgb24_X_c

Post by primiano »

Hi,

TL;DR
zmc uses an unexpectedly large amount of CPU for an h264 camera (80% CPU vs 10% CPU when using ffmpeg from the command line). 80% of CPU cycles are spent in libswscale.so's yuv2rgb24_X_c.

I am setting up zoneminder with some Reolink 5MP h264 cameras.
I am seeing very high CPU usage for the zmc process that belongs to the hi-res monitor (even if I disable everything else and don't do any modect).
I think I understand ZM's system architecture (I quite like it), and I am fully aware that a monitor involves decompressing the stream into a shmem ring buffer even if it's not in use.
What I cannot understand is why it takes that much CPU. It seems to be due to colorspace conversion (but it's still surprisingly high even for that).


What I see:
One zmc process taking 80% of one core (a Celeron N3150) just for the monitor (no modect).
The monitor is configured as "ffmpeg", using as source 'rtsp://xxx:xxx@192.168.1.45:554/h264Preview_01_main' in TCP mode, and enabling vaapi HWaccel on /dev/dri/renderD128.

What I expect:
If I decode the same stream with ffmpeg like this:

ffmpeg -rtsp_transport tcp -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i 'rtsp://xxx:xxx@192.168.1.45:554/h264Preview_01_main' -f null /dev/null

that ffmpeg process takes 10-15% CPU instead.

In essence I would expect the cost of zmc decoding into its ring buffer to be comparable with that ffmpeg instance.
You could say "but the N3150 is a crappy Celeron CPU, what do you expect?". I'd expect that CPU (with hw accel) to be able to handle way more streams. If you open up any commercial NVR, it handles 8/16 streams with a passively cooled iMX8.

I did some perf profiling on the zmc process. 70% of the CPU cycles seem to be spent in sws_scale > packed_vscale > yuv2rgb24_X_c (I guess for color space conversions). See https://pastebin.com/raw/u8pM1zeg

Not fully sure what's going on. I suspect what is really costing CPU is not the h264 decode but the YUV>RGB color space conversion.
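
To put a rough number on that suspicion, here is a back-of-the-envelope sketch of how much data the conversion has to touch. The 2560x1920 resolution and the 15 fps frame rate are assumptions for illustration only (I have not stated the exact stream settings above), so treat the result as an order of magnitude, not a measurement.

```python
# Rough per-frame data volume for a YUV420 -> RGB24 conversion.
# ASSUMPTIONS: 2560x1920 (~5MP) and 15 fps are illustrative values,
# not the camera's confirmed settings.
WIDTH, HEIGHT, FPS = 2560, 1920, 15


def yuv420_bytes(width: int, height: int) -> int:
    # One full-resolution Y plane plus two chroma planes subsampled 2x2.
    return width * height + 2 * ((width // 2) * (height // 2))


def rgb24_bytes(width: int, height: int) -> int:
    # Three bytes (R, G, B) per pixel.
    return width * height * 3


yuv = yuv420_bytes(WIDTH, HEIGHT)
rgb = rgb24_bytes(WIDTH, HEIGHT)
# Each converted frame is read once as YUV and written once as RGB.
per_second = (yuv + rgb) * FPS

print(f"YUV420 frame: {yuv / 1e6:.1f} MB")
print(f"RGB24 frame:  {rgb / 1e6:.1f} MB")
print(f"Memory traffic for conversion: {per_second / 1e6:.0f} MB/s")
```

Under those assumptions the CPU has to read and write a few hundred MB every second for the conversion alone, which would be consistent with a software scaler dominating the profile on a small CPU.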

Debug log: https://pastebin.com/raw/pACFKCGG
(Note, I believe the "iHD_drv_video.so init failed" is a red herring. vainfo and ffmpeg do the same. I think what's going on is that vaapi tries first iHD_drv_video.so and then switches to i965_drv_video.so. I see the same log line when using ffmpeg from cmdline)
iconnor
Posts: 3362
Joined: Fri Oct 29, 2010 1:43 am
Location: Toronto

Re: High zmc CPU usage due to yuv2rgb24_X_c

Post by iconnor »

Yeah, I think the swscale conversions are a big problem and are defeating the hwaccel support. The typical process is that decoding gives us a YUV420 image, which we then need to convert to RGB. With some refactoring we could get away with only doing this for motion detection.

Alternatively maybe we could do motion detection in yuv colorspace.
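
As a very rough illustration of that idea (this is not ZoneMinder code, and the function names are hypothetical), a frame difference can be computed on the luma plane alone, since in planar YUV420 the Y plane comes first in the buffer. The sketch assumes a tightly packed frame with no row padding and uses a simple per-pixel threshold, which is far cruder than real motion detection.

```python
# Hypothetical sketch: frame differencing on the Y (luma) plane only,
# so no YUV -> RGB conversion is needed.
# ASSUMPTION: frames are tightly packed YUV420 with the Y plane first
# and no row padding.

def luma_plane(frame: bytes, width: int, height: int) -> bytes:
    # The first width*height bytes of the frame are the Y plane.
    return frame[: width * height]


def changed_pixels(prev_y: bytes, cur_y: bytes, threshold: int = 20) -> int:
    # Count pixels whose brightness changed by more than the threshold.
    return sum(1 for a, b in zip(prev_y, cur_y) if abs(a - b) > threshold)


# Tiny 4x4 example: 16 luma bytes followed by 8 chroma bytes.
W, H = 4, 4
prev_frame = bytes([16] * (W * H) + [128] * (W * H // 2))
cur = bytearray(prev_frame)
cur[0] = 200      # luma change
cur[5] = 200      # luma change
cur[W * H] = 255  # chroma-only change, ignored by the luma comparison
cur_frame = bytes(cur)

count = changed_pixels(luma_plane(prev_frame, W, H),
                       luma_plane(cur_frame, W, H))
print(count)
```

The point is only that the comparison reads one byte per pixel from data the decoder already produced, instead of three bytes per pixel from a converted copy.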
primiano
Posts: 4
Joined: Tue Sep 06, 2011 5:24 pm

Re: High zmc CPU usage due to yuv2rgb24_X_c

Post by primiano »

Yeah the sad part is that in this case I don't care about motion detection at all.
The workflow I am using is the following. For each camera I use:
1. the sub stream (640x480 @ 4fps) for motion detection
2. the main stream (5MP) just for streaming and full-res recording (linked to the "sub" monitor).

Motion detection on the sub stream is cheap. For the big stream, it would be great if zmc did no conversion at all and simply passed the h264 stream through into the ring buffer as-is.
From there it could be passed either to the browser (which would take care of the h264 decoding) or written to a file. That would noticeably reduce the amount of CPU required.
I just tried Shinobi and it seems to do precisely that. 2 cameras take <20% of the same CPU with that trick.
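
To make the pass-through idea concrete, here is a minimal sketch of a ring buffer that holds compressed packets instead of decoded frames. It is not how zmc or Shinobi is actually implemented; the Packet and PacketRing names are made up for illustration. The one detail worth noting is that a recording or a viewer has to start at a keyframe, so the buffer can hand out everything from the most recent keyframe onward.

```python
# Hypothetical sketch of a pass-through ring buffer: compressed packets
# are stored untouched, nothing is decoded or converted.
from collections import deque
from typing import List, NamedTuple


class Packet(NamedTuple):
    pts: int        # presentation timestamp
    keyframe: bool  # True if a decoder can start at this packet
    data: bytes     # compressed h264 payload, stored as-is


class PacketRing:
    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently drops the oldest packet when full.
        self._packets = deque(maxlen=capacity)

    def push(self, packet: Packet) -> None:
        self._packets.append(packet)

    def from_last_keyframe(self) -> List[Packet]:
        # Return the newest keyframe and everything after it, or an
        # empty list if no keyframe is currently buffered.
        packets = list(self._packets)
        for i in range(len(packets) - 1, -1, -1):
            if packets[i].keyframe:
                return packets[i:]
        return []


ring = PacketRing(capacity=4)
for pts in range(6):
    ring.push(Packet(pts=pts, keyframe=(pts % 3 == 0), data=b"\x00"))

start = ring.from_last_keyframe()
print([p.pts for p in start])
```

With something like this, the per-packet cost is a memory copy of a few kilobytes rather than a decode plus a full-frame colorspace conversion.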