High zmc CPU usage due to yuv2rgb24_X_c
Posted: Fri Nov 06, 2020 9:39 pm
Hi,
TL;DR
zmc uses an unexpected amount of CPU for an h264 camera (80% CPU vs. 10% CPU when using ffmpeg from the command line). Most of the CPU cycles (70-80% according to perf) are spent in libswscale.so's yuv2rgb24_X_c.
I am setting up ZoneMinder with some Reolink 5MP h264 cameras.
I am seeing very high CPU usage for the zmc process that belongs to the hi-res monitor (even if I disable everything else and don't do any modect).
I think I understand ZM's system architecture (I quite like it) and I am fully aware that a monitor involves decompressing the stream into a shmem ring buffer even if it's not in use.
What I cannot understand is why it takes that much CPU. It seems to be due to colorspace conversion, but it's still surprisingly high even for that.
What I see:
One zmc process taking 80% of one core (a Celeron N3150) just for the monitor (no modect).
The monitor is configured as "ffmpeg", using 'rtsp://xxx:xxx@192.168.1.45:554/h264Preview_01_main' as the source in TCP mode, with VAAPI hwaccel enabled on /dev/dri/renderD128.
What I expect:
If I decode the same stream with ffmpeg like this:
ffmpeg -rtsp_transport tcp -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i 'rtsp://xxx:xxx@192.168.1.45:554/h264Preview_01_main' -f null /dev/null
that ffmpeg process takes 10-15% CPU instead.
In essence I would expect the cost of zmc decoding into its ring buffer to be comparable with that ffmpeg instance.
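As far as I can tell, the command above throws the decoded frames away without ever converting them to RGB, so it may not be a like-for-like comparison. A rough sketch of a closer test, assuming zmc ends up with packed RGB24 frames (which is what the name yuv2rgb24_X_c suggests), is to make ffmpeg do that conversion too. The function name decode_to_rgb24 is just made up for illustration:

```shell
# Placeholder URL copied from the post; substitute the real credentials.
URL='rtsp://xxx:xxx@192.168.1.45:554/h264Preview_01_main'

# Same VAAPI decode as before, but force each frame to be converted to
# packed RGB24 in software before it is discarded by the null muxer.
# Assumption: this approximates the conversion zmc performs; it is not
# taken from the ZoneMinder source.
decode_to_rgb24() {
  ffmpeg -rtsp_transport tcp \
    -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
    -i "$1" \
    -pix_fmt rgb24 \
    -f null /dev/null
}

# Usage: decode_to_rgb24 "$URL"
```

If CPU usage for this variant climbs towards what zmc shows, that would point at the conversion rather than the decode.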
You could say "but the N3150 is a crappy Celeron CPU, what do you expect?". I'd expect that CPU (with hw accel) to be able to handle way more streams. If you open up any commercial NVR, you'll find it handles 8 or 16 streams with a passively cooled i.MX8.
I did some perf profiling on the zmc process. About 70% of the CPU cycles seem to be spent in sws_scale > packed_vscale > yuv2rgb24_X_c (I guess for color space conversion). See https://pastebin.com/raw/u8pM1zeg
I'm not fully sure what's going on. I suspect that what is really costing CPU is not the h264 decode but the YUV-to-RGB color space conversion.
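For anyone who wants to reproduce a profile like the one linked above, something along these lines should work. This is only a sketch: the monitor id, the output file name and the helper name profile_zmc are assumptions for illustration, and it assumes the capture process shows up as "zmc -m <id>" in the process list.

```shell
# Hypothetical monitor id; adjust to the monitor being investigated.
MONITOR_ID=1

# Sample the running zmc process with call graphs for 30 seconds, then
# print a text summary grouped by shared library and symbol.
profile_zmc() {
  local pid
  # Note: the pattern "zmc -m 1" would also match monitors 10, 11, ...;
  # -o picks the oldest matching process.
  pid=$(pgrep -o -f "zmc -m $1") || {
    echo "no zmc process found for monitor $1" >&2
    return 1
  }
  perf record -g -p "$pid" -o zmc.perf.data -- sleep 30
  perf report -i zmc.perf.data --stdio --sort dso,symbol
}

# Usage (probably needs root): profile_zmc "$MONITOR_ID"
```

Sorting by dso and symbol makes it easy to see whether the time lands in libswscale or in the decoder.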
Debug log: https://pastebin.com/raw/pACFKCGG
(Note: I believe the "iHD_drv_video.so init failed" message is a red herring. vainfo and ffmpeg do the same. I think what's going on is that VAAPI first tries iHD_drv_video.so and then falls back to i965_drv_video.so. I see the same log line when using ffmpeg from the command line.)