Raspberry pi3 and h264_mmal

Posted: Sun Apr 30, 2017 10:32 pm
by cmisip
Is there any benefit to using h264_mmal (the hardware-accelerated codec for H264 decoding on the RPi)? The latest ffmpeg libavcodec supports this codec along with h264_omx (the H264 hardware encoder codec). There is only one pipeline, though, so only one of the zmc processes could use the codec, and of course it only applies to RTSP H264 streams through Source type ffmpeg.

You can modify the Options:
Under Options->Images->FFMPEG_OUTPUT_OPTIONS add -c:v h264_omx -r 25 -c:a copy -b:v 1200k

I think this option is the one used by the Generate Video button in the Video section of the Event Display.

With regard to retrieving images from the RTSP source, how about a change in the source file zm_ffmpeg_camera.cpp?
old code:

Code: Select all

if ( (mCodec = avcodec_find_decoder( mCodecContext->codec_id )) == NULL )
new code:

Code: Select all

if ( (mCodec = avcodec_find_decoder_by_name( "h264_mmal" )) == NULL )
That is a very simplistic way of doing this. But because there is only one pipeline, it may not be worth pursuing this avenue. Of course, the code will have to handle codec unavailability in case another process is currently using the pipeline. (Consider also that when you generate a video in the Event display window using h264_omx, the pipeline may not be available for decoding either.) I don't know how multiple zmc processes can handle sharing one resource. If a zmc requests decoding of only a few seconds of video at a time and then gives up the pipeline, the other zmc processes might be able to use it. If they can't, they can fall back on the software h264 decoder. Any thoughts?

Re: Raspberry pi3 and h264_mmal

Posted: Mon May 01, 2017 1:49 am
by iconnor
Thank you ever so much. Trying to follow ffmpeg and figure this stuff out is a full time job, and you have boiled it down very nicely.

I will be looking into what you have written tomorrow, and I hope you will stick around, as you clearly are following this stuff more closely than I have been able to.

These things will make the Rpi an amazing ZM platform.

Isaac

Re: Raspberry pi3 and h264_mmal

Posted: Mon May 01, 2017 12:25 pm
by knight-of-ni
On the slightly related topic of Raspberry Pi performance improvements, mastertheknife has an open PR to add ARM NEON support:
https://github.com/ZoneMinder/ZoneMinder/pull/1823

Should go into zm 1.31.0

We can always use this kind of help.

Re: Raspberry pi3 and h264_mmal

Posted: Tue May 02, 2017 2:45 pm
by cmisip
It seems to be working. It cut zmc's CPU utilization by 50%. My test system has a single H264 camera, so I don't have to worry about zmc processes competing for one pipeline. I am running Raspbian with the following specs:

Code: Select all

pi@raspberrypi:/mnt/DISK1/build/zm/mmal $ dpkg -l | grep ffmpeg
ii  ffmpeg                           7:3.2.4-1~bpo8+1                          armhf        Tools for transcoding, streaming and playing of multimedia files
iU  ffmpeg-doc                       7:3.2.4-1~bpo8+1                          all          Documentation of the FFmpeg multimedia framework
pi@raspberrypi:/mnt/DISK1/build/zm/mmal $ dpkg -l | grep zoneminder
ii  zoneminder                       1.29.0+dfsg-1~bpo8+1                      armhf        video camera security and surveillance solution
ii  zoneminder-dbg                   1.29.0+dfsg-1~bpo8+1                      armhf        Zoneminder -- debugging symbols
ii  zoneminder-doc                   1.29.0+dfsg-1~bpo8+1                      all          ZoneMinder documentation
The patch is:

Code: Select all

Description: <short summary of the patch>
 TODO: Put a short summary on the line above and replace this paragraph
 with a longer explanation of this change. Complete the meta-information
 with other relevant fields (see below for details). To make it easier, the
 information below has been extracted from the changelog. Adjust it or drop
 it.
 .
 zoneminder (1.29.0+dfsg-1~bpo8+1) jessie-backports; urgency=medium
 .
   * Rebuild for jessie-backports.
   * Use bundled copy of CakePHP; do not `dh_linktree` system one.
Author: Dmitry Smirnov <onlyjob@debian.org>

---
The information above should follow the Patch Tagging Guidelines, please
checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here
are templates for supplementary fields that you might want to add:

Origin: <vendor|upstream|other>, <url of original patch>
Bug: <url in upstream bugtracker>
Bug-Debian: https://bugs.debian.org/<bugnumber>
Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber>
Forwarded: <no|not-needed|url proving that it has been forwarded>
Reviewed-By: <name and email of someone who approved the patch>
Last-Update: <YYYY-MM-DD>

--- zoneminder-1.29.0+dfsg.orig/src/zm_ffmpeg_camera.cpp
+++ zoneminder-1.29.0+dfsg/src/zm_ffmpeg_camera.cpp
@@ -309,7 +309,7 @@ int FfmpegCamera::OpenFfmpeg() {
     mCodecContext = mFormatContext->streams[mVideoStreamId]->codec;
 
     // Try and get the codec from the codec context
-    if ( (mCodec = avcodec_find_decoder( mCodecContext->codec_id )) == NULL )
+    if ( (mCodec = avcodec_find_decoder_by_name("h264_mmal")) == NULL )
         Fatal( "Can't find codec for video stream from %s", mPath.c_str() );
 
     Debug ( 1, "Found decoder" );

You have to compile ffmpeg yourself, editing debian/rules to add:

Code: Select all

        --enable-omx \
        --enable-mmal \
        --enable-omx-rpi \

Afterwards, ffmpeg -codecs | grep h264 should show h264_mmal.
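
For anyone who wants to script that check, here is a minimal sketch. The helper name has_codec is made up for illustration, and the sample line is only an approximation of what a codec listing looks like, not output captured from a real system.

```shell
#!/bin/bash
# has_codec NAME: succeed if NAME appears as a whole word in the codec
# listing read from stdin. (Hypothetical helper, not part of ffmpeg or zm.)
has_codec() {
  grep -qw -- "$1"
}

# Typical use on the Pi, assuming ffmpeg is on the PATH:
#   ffmpeg -hide_banner -codecs 2>/dev/null | has_codec h264_mmal && echo "h264_mmal available"

# Self-contained demonstration with an illustrative listing line.
sample=' DEV.LS h264   H.264 / AVC (decoders: h264 h264_mmal ) (encoders: libx264 h264_omx )'
if printf '%s\n' "$sample" | has_codec h264_mmal; then
  echo "h264_mmal listed"
else
  echo "h264_mmal not listed"
fi
```

The -w flag makes grep match whole words only, so a plain h264 entry on its own does not count as h264_mmal.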

Maybe somebody else can confirm. The video preview of events shows a more pixelated video compared to software h264.

Chris

Re: Raspberry pi3 and h264_mmal

Posted: Tue May 02, 2017 7:51 pm
by iconnor
Ok... so first step is that I am going to have to build & provide packages of ffmpeg for raspbian.

I really wish raspbian and debian and ubuntu would just start building this stuff in...

Re: Raspberry pi3 and h264_mmal

Posted: Sat May 06, 2017 9:54 pm
by cmisip
I added the fallback to h264 software decoding if h264_mmal is busy. I got the following results with top:

With software decoding only for 2 cameras:

Code: Select all

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
32339 www-data  20   0  186412  47100  43688 S  25.2  5.3   0:56.73 zma
32291 www-data  20   0  189772  53600  47908 R  23.2  6.1   0:54.10 zmc
32329 www-data  20   0  189544  53328  47700 S  23.2  6.0   0:53.92 zmc
32298 www-data  20   0  186412  47216  43804 S  21.2  5.3   0:48.95 zma
With h264_mmal with 2 cameras:

Code: Select all

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28608 www-data  20   0  186412  47032  43620 S  24.9  5.3   2:18.56 zma
28303 www-data  20   0  187088  47192  43520 R  21.5  5.3   3:22.49 zma
28295 www-data  20   0  224788  55824  47660 S  11.3  6.3   1:40.21 zmc
28597 www-data  20   0  224856  56008  47964 S  10.3  6.3   0:59.46 zmc
It's probably working. Interestingly, I can't access h264_mmal from the command line while zm is running (probably because it's in use by zm). Also, I expected one of the zmc processes to use the software decoder, but I got half the CPU utilization for both zmc processes. I guess the zmc processes are taking turns using the hardware decoder?

Both cameras are detecting motion and registering the events.

Chris

Re: Raspberry pi3 and h264_mmal

Posted: Sun May 07, 2017 10:37 pm
by iconnor
Very interesting.

Re: Raspberry pi3 and h264_mmal

Posted: Tue May 09, 2017 11:43 am
by cmisip
Compiled the latest source with neon optimizations. CPU utilization dropped further for zmc, and for zma as well. By the way, this is with SV3C cameras using an RTSP feed at 640x360.

Code: Select all

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17083 www-data  20   0  187460  47400  43908 S  16.2  5.4   0:18.61 zma
17043 www-data  20   0  187444  47588  44104 S  13.9  5.4   0:16.37 zma
17074 www-data  20   0  226060  56340  48072 R   7.6  6.4   0:08.77 zmc
17034 www-data  20   0  226024  56808  48484 S   7.3  6.4   0:08.88 zmc
This is using the following compiler flags.

Code: Select all

CMAKE_CXX_FLAGS_RELEASE "-Wall -D__STDC_CONSTANT_MACROS -O2 -mcpu=cortex-a53  -mfpu=neon-fp-armv8 -mfloat-abi=hard"
CMAKE_C_FLAGS_DEBUG "-Wall -D__STDC_CONSTANT_MACROS -g -mcpu=cortex-a53  -mfpu=neon-fp-armv8 -mfloat-abi=hard"
I wonder how many h264 cameras could be run this way, sharing a single pipeline. I only have two cameras to test with. Any other compiler flags to test?

Chris

Re: Raspberry pi3 and h264_mmal

Posted: Tue May 09, 2017 6:32 pm
by cmisip
Well, after running for 6 hours, I got the following from top:

Code: Select all

 1886 www-data  20   0  187460  47604  44112 R  25.5  5.4   0:18.84 zma
 1857 www-data  20   0  187444  47516  44032 S  22.5  5.4   0:16.59 zma
 1879 www-data  20   0  225708  56512  48636 S  11.6  6.4   0:09.41 zmc
 1850 www-data  20   0  225824  56696  48620 S  11.3  6.4   0:09.47 zmc
 
It's back to pre-neon cpu utilization. I wonder if it was the time of day. The earlier post was at 7 AM. Perhaps the lighting had something to do with it.

Chris

Re: Raspberry pi3 and h264_mmal

Posted: Wed May 10, 2017 8:09 pm
by cmisip
I realized the latest source did not have the neon optimizations. I downloaded the modified files and was able to successfully build a package. However, it does not run. I am getting these errors in the log:

Code: Select all

 zmc_m4[16777]: PNC [Delta grayscale function failed self-test: Results differ from the expected results]
 zmwatch[16494]: ERR [Memory map file '/dev/shm/zm.mmap.4' does not exist.  zmc might not be running.]

Chris

Re: Raspberry pi3 and h264_mmal

Posted: Thu May 11, 2017 2:08 am
by iconnor
I think there is a merge problem. I'm sure mastertheknife will figure it out soon.

Re: Raspberry pi3 and h264_mmal

Posted: Thu May 11, 2017 6:47 am
by mastertheknife
Yes, my bad. It is fixed in pull request #1878:
https://github.com/ZoneMinder/ZoneMinder/pull/1878

Also, this is AArch32 (32-bit mode) only. I am still working on adding the optimizations for AArch64 (64-bit mode).
Raspbian is 32bit, so it should work :)

Re: Raspberry pi3 and h264_mmal

Posted: Fri May 12, 2017 1:44 am
by cmisip
I'm really new to this. What is the procedure to download the source from git with the neon optimizations? I have just been downloading the latest source and then downloading each modified file.

I was able to build a package again, just incorporating the latest change to zm_image.cpp. I will run it for a day and see what happens.

Chris

Re: Raspberry pi3 and h264_mmal

Posted: Fri May 12, 2017 10:24 am
by mastertheknife
You can fetch the master branch. The neon stuff (AArch32 only at the moment, AArch64 very soon) and its fix were merged to master.
You can get a tarball of the master branch here:
https://github.com/ZoneMinder/ZoneMinder/tarball/master

Keep in mind that the master branch contains untested code. It contains many of the changes for the upcoming v1.31 release (still very far away).

Please show the cpu usage for zma (as you did earlier). I suspect it'll be about 25-50% lower. The AlarmedPixels function is still not optimized (maybe sometime in the future).

Re: Raspberry pi3 and h264_mmal

Posted: Sat May 13, 2017 2:50 pm
by cmisip
I was able to compile the latest source from 5-12-17 with neon optimizations. I added the h264_mmal line as well. I redid the CPU testing for the original unmodified zoneminder and ffmpeg from the repo using pidstat, because the numbers were jumping around a bit for the neon-optimized version. I think time of day also matters due to differences in lighting.

OS: Raspbian "GNU/Linux 8 (jessie)"
Time : 10:15 AM
RPI : Raspberry PI 3 with active vc1 codec
Camera : SV3C with Resolution 640x360
Mode: modect, single zone covering full frame Default settings
Zoneminder Version : 1.29.0+dfsg-1~bpo8+1 ( from repo )
FFMpeg version : 7:3.2.4-1~bpo8+1 ( from repo, no h264_mmal codec)

Code: Select all

PIDSTAT zmc : Reporting average of 12 readings at 5 seconds intervals

ZMC Process ID : 507 ==> ( 25.40 + 26.60 + 26.80 + 25.80 + 25.80 + 25.80 + 26.00 + 27.00 + 25.80 + 26.00 + 25.20 + 25.00 ) / 12 
AVERAGE : 25.93

ZMC Process ID : 545 ==> ( 26.20 + 26.20 + 18.80 + 30.00 + 25.80 + 26.40 + 26.20 + 26.40 + 25.60 + 26.40 + 26.60 + 26.00 ) / 12 
AVERAGE : 25.88

PIDSTAT zma : Reporting average of 12 readings at 5 seconds intervals

ZMA Process ID : 554 ==> ( 27.00 + 26.40 + 26.20 + 26.00 + 25.80 + 25.80 + 25.60 + 25.60 + 26.40 + 26.00 + 26.80 + 26.00 ) / 12 
AVERAGE : 26.13

ZMA Process ID : 729 ==> ( 26.60 + 26.00 + 26.00 + 25.80 + 25.40 + 26.40 + 25.40 + 26.20 + 26.00 + 25.80 + 26.40 + 26.60 ) / 12 
AVERAGE : 26.05
OS: Raspbian "GNU/Linux 8 (jessie)"
Time : 10:40 AM
RPI : Raspberry PI 3 with active vc1 codec
Camera : SV3C with Resolution 640x360
Mode : modect, single zone covering full frame Default settings
Zoneminder Version: zoneminder tarball 5/12/17 with neon optimizations ( I added the h264_mmal line)
FFMpeg version : 7:3.2.4-1~bpo8+1 ( from repo, h264_mmal codec available )

Code: Select all

PIDSTAT zmc : Reporting average of 12 readings at 5 seconds intervals

ZMC Process ID : 6897 ==> ( 10.60 + 10.80 + 10.60 + 10.20 + 10.20 + 11.20 + 11.80 + 10.40 + 10.80 + 10.80 + 10.40 + 10.60 ) / 12 
AVERAGE : 10.70

ZMC Process ID : 6937 ==> ( 10.00 + 9.80 + 11.00 + 11.40 + 10.40 + 9.80 + 10.40 + 10.40 + 10.20 + 7.40 + 14.00 + 10.20 ) / 12 
AVERAGE : 10.42

PIDSTAT zma : Reporting average of 12 readings at 5 seconds intervals

ZMA Process ID : 6906 ==> ( 19.80 + 18.00 + 22.00 + 18.20 + 20.20 + 19.20 + 19.40 + 18.40 + 17.80 + 18.40 + 18.00 + 18.40 ) / 12 
AVERAGE : 18.98

ZMA Process ID : 6946 ==> ( 18.20 + 19.00 + 19.40 + 21.40 + 20.00 + 19.80 + 19.00 + 18.40 + 19.60 + 19.00 + 9.60 + 18.20 ) / 12 
AVERAGE : 18.47
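
To put the two runs side by side, the relative drop can be worked out from the averages above. This is only a rough sketch: pct_drop is a made-up helper name, and it compares one process from each run, so treat the result as indicative rather than exact.

```shell
#!/bin/bash
# pct_drop BEFORE AFTER: print the percentage reduction from BEFORE to AFTER,
# to one decimal place. (Hypothetical helper for illustration.)
pct_drop() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

pct_drop 25.93 10.70   # zmc: PID 507 (stock) vs PID 6897 (h264_mmal + neon)
pct_drop 26.13 18.98   # zma: PID 554 (stock) vs PID 6906 (h264_mmal + neon)
```

awk is used here because bash arithmetic is integer only.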

I wrote a bash script for this for anyone interested in testing:

Code: Select all

#!/bin/bash
# Report per-process CPU usage for the zmc and zma daemons using pidstat.

interval=5                   # seconds between samples
numreports=12                # 12 samples x 5 s = 1 minute
tail_n=$((numreports + 1))   # the samples plus pidstat's trailing "Average:" line

# report NAME PATTERN
# NAME is the process name to print. PATTERN is a grep pattern such as z[m]c,
# bracketed so that grep does not match its own command line.
report() {
  local name=$1 pattern=$2 upper=${1^^}
  local pid readings count sum

  echo "PIDSTAT $name : Reporting average of $numreports readings at $interval seconds intervals"
  for pid in $(ps ax | grep -w "$pattern" | awk '{print $1}')
  do
    # Column 7 is %CPU. The last value comes from pidstat's own "Average:" line.
    readings=($(pidstat -p "$pid" -u "$interval" "$numreports" | tail -n "$tail_n" | awk '{print $7}'))
    count=${#readings[@]}
    # Join every value except the trailing average with " + ".
    sum=$(printf '%s + ' "${readings[@]:0:count-1}")
    echo ""
    echo "$upper Process ID : $pid ==> ( ${sum% + } ) / ${numreports} "
    echo "AVERAGE : ${readings[count-1]}"
  done
}

report zmc 'z[m]c'
echo ""

report zma 'z[m]a'
echo ""
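
Note that the AVERAGE line in the script is taken from pidstat's own trailing "Average:" line rather than computed from the readings. If you want to cross-check it, here is a small sketch that averages a list of readings with awk. The function name avg is just for illustration.

```shell
#!/bin/bash
# avg: print the mean of its arguments to two decimal places.
# (Hypothetical helper for illustration.)
avg() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# The twelve zmc readings for PID 507 from the first run.
avg 25.40 26.60 26.80 25.80 25.80 25.80 26.00 27.00 25.80 26.00 25.20 25.00
```

For those twelve readings the result should agree with the 25.93 that pidstat reported.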

It's certainly faster. I wish somebody would test with more than 2 h264 cameras to see what happens. I am wondering how many can be supported by a single RPI3.

It seems switching to 32-bit color depth gives zmc a performance boost, although not zma.

32 bit color:

Code: Select all

PIDSTAT zmc : Reporting average of 12 readings at 5 seconds intervals

ZMC Process ID : 5517 ==> ( 8.80 + 6.60 + 6.60 + 6.80 + 8.00 + 7.20 + 8.00 + 7.60 + 7.40 + 5.80 + 6.39 + 7.40 ) / 12 
AVERAGE : 7.22

ZMC Process ID : 5527 ==> ( 8.60 + 8.80 + 8.60 + 8.60 + 9.00 + 6.60 + 6.20 + 8.00 + 7.20 + 7.40 + 7.40 + 6.80 ) / 12 
AVERAGE : 7.77

PIDSTAT zma : Reporting average of 12 readings at 5 seconds intervals

ZMA Process ID : 4954 ==> ( 18.20 + 21.80 + 23.00 + 19.20 + 19.00 + 15.60 + 22.60 + 24.20 + 24.00 + 24.80 + 22.40 + 16.60 ) / 12 
AVERAGE : 20.95

ZMA Process ID : 5011 ==> ( 21.60 + 23.60 + 23.60 + 23.20 + 23.40 + 16.20 + 0.00 + 20.00 + 22.20 + 23.80 + 20.20 + 21.60 ) / 12 
AVERAGE : 19.95
24 bit color:

Code: Select all

PIDSTAT zmc : Reporting average of 12 readings at 5 seconds intervals

ZMC Process ID : 6293 ==> ( 10.60 + 11.00 + 10.40 + 11.80 + 11.20 + 10.20 + 10.40 + 10.60 + 12.00 + 10.80 + 10.20 + 11.00 ) / 12 
AVERAGE : 10.85

ZMC Process ID : 6303 ==> ( 10.60 + 10.40 + 10.20 + 9.80 + 9.78 + 10.00 + 10.40 + 10.00 + 10.40 + 10.80 + 10.60 + 10.00 ) / 12 
AVERAGE : 10.25

PIDSTAT zma : Reporting average of 12 readings at 5 seconds intervals

ZMA Process ID : 5811 ==> ( 20.80 + 20.20 + 18.80 + 19.00 + 18.40 + 18.20 + 18.40 + 19.00 + 22.40 + 21.00 + 20.00 + 20.20 ) / 12 
AVERAGE : 19.70

ZMA Process ID : 5946 ==> ( 18.20 + 18.20 + 18.60 + 20.60 + 18.80 + 19.40 + 18.80 + 18.80 + 18.40 + 19.00 + 21.00 + 19.16 ) / 12 
AVERAGE : 19.08

Will libjpeg-turbo provide a performance boost? If so, how can that be used instead of libjpeg?

Thanks,
Chris