ZoneMinder 1.36.33 poor performance and no audio

d4v1d
Posts: 7
Joined: Wed Sep 06, 2023 11:14 am

Post by d4v1d »

Hello,

I'm trying to get audio recording to work.

I have a camera that streams video and audio on separate streams.
The video is an mjpeg rtp stream.
The audio is an mp3 rtsp stream.

I'm using the h264_vaapi hardware encoder with VideoWriter set to 1 (encode).
I use qp=23 as an additional encoder parameter, as the equivalent of crf in the x264 software encoder library.

Video recording works fine but as soon as I set the SecondPath to my audio rtsp stream, the performance drops from 20 FPS to 1 FPS, even in monitor mode only.

Enabling the record mode also outputs no audio and I get the following errors from the zm_send_packet_receive_frame function on audio packets:
09/06/23 12:46:19.996854 zmc_m1[1738].ERR-zm_ffmpeg.cpp/518 [Unable to send packet Resource temporarily unavailable, continuing]
The performance drop is caused by av_read_frame on the audio format context. But I still do not understand why there is no audio in the output file.

The interesting thing is that there are two streams encoded in the output file but I cannot hear any audio:
Stream 0
Codec: H264 - MPEG-4 AVC (part 10) (avc1)
Type: Video
Video resolution: 1280x960
Buffer dimensions: 1280x960
Decoded format:
Orientation: Top left
Chroma location: Left

Stream 1
Codec: MPEG AAC Audio (mp4a)
Type: Audio
Sample rate: 32000 Hz
Bits per sample: 16
Bitrate: 32 kB/s
I can record the audio stream with ffmpeg directly:

Code:

ffmpeg -i rtsp://192.168.1.32:5500/stream /tmp/test.mp3
And the audio test file is fine.

I'm using ffmpeg version 6.0.

Is there anything I am missing? Any help is really appreciated.
iconnor
Posts: 3266
Joined: Fri Oct 29, 2010 1:43 am
Location: Toronto

Re: ZoneMinder 1.36.33 poor performance and no audio

Post by iconnor »

We are going to need debug level 3 logs.

I don't remember exactly, but when developing this feature I'm pretty sure I did exactly what you are doing here... although I might have simply pointed the second url to an mp3 file instead of streaming over rtsp.

mp4's can only contain aac, so we are decoding the mp3 and re-encoding to aac. Which shouldn't take that much CPU, but does take some.
d4v1d
Posts: 7
Joined: Wed Sep 06, 2023 11:14 am

Re: ZoneMinder 1.36.33 poor performance and no audio

Post by d4v1d »

Hello Mr. Connor,

first of all, thank you for your response.
I attached the level 3 debug logs below.

I changed the audio stream format from mp3 to pcm. Now the audio gets correctly encoded into the video file but the performance issues remain.

With an mp3 audio stream, av_read_frame takes about 230 ms to return on the audio stream context.
With a pcm audio stream, av_read_frame drops to about 20 ms execution time, but the condition:

Code:

if ( mSecondFormatContext and
    (
      av_rescale_q(mLastAudioPTS, mAudioStream->time_base, AV_TIME_BASE_Q)
      <
      av_rescale_q(mLastVideoPTS, mVideoStream->time_base, AV_TIME_BASE_Q)
    ) ) {
  // if audio stream is behind video stream, then read from audio, otherwise video
  ...
} else {
  Debug(4, "Using video input because %" PRId64 " >= %" PRId64,
  ...
}
results in the audio packets being read about 10 times until one video packet is read.
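
To illustrate why the condition above reads audio so much more often than video, here is a minimal sketch that models only the "read from whichever stream is behind" rule. It does not use FFmpeg or ZoneMinder code; `ReadCounts`, `simulate_interleave` and the packet durations are hypothetical, and all timestamps are plain microsecond counters standing in for PTS values already rescaled to AV_TIME_BASE_Q.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: how often each input was read.
struct ReadCounts {
  int audio_reads;
  int video_reads;
};

// Models the selection rule only: read audio while its last timestamp is
// behind the last video timestamp, otherwise read video.
// Durations are in microseconds and are assumptions, not measured values.
ReadCounts simulate_interleave(int64_t audio_packet_us,
                               int64_t video_frame_us,
                               int video_frames) {
  assert(audio_packet_us > 0 && video_frame_us > 0);
  int64_t last_audio_pts = 0;
  int64_t last_video_pts = 0;
  ReadCounts counts{0, 0};
  while (counts.video_reads < video_frames) {
    if (last_audio_pts < last_video_pts) {
      // Audio is behind: this is where a blocking audio read would happen.
      last_audio_pts += audio_packet_us;
      ++counts.audio_reads;
    } else {
      last_video_pts += video_frame_us;
      ++counts.video_reads;
    }
  }
  return counts;
}
```

With assumed 5 ms audio packets and 50 ms video frames (20 FPS), `simulate_interleave(5000, 50000, 100)` performs 990 audio reads for 100 video reads. Any time spent blocking in each audio read is therefore paid roughly ten times per video frame when both reads share one thread.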

I suppose the execution times differ because of variable-size versus fixed-size frames in the stream format. The av_read_frame documentation says:
"For video, the packet contains exactly one frame. For audio, it contains an integer number of frames if each frame has a known fixed size (e.g. PCM or ADPCM data). If the audio frames have a variable size (e.g. MPEG audio), then it contains one frame."
I moved the audio capture into a separate function and implemented a dedicated audio read thread, similar to the analysis thread. Now the stream seems to work fine, with no FPS drops due to the audio read in the video capture path.
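
The general shape of such a reader thread can be sketched as follows. This is not the actual patch and not ZoneMinder's API: `AudioPacket`, `PacketQueue` and `read_audio_in_background` are hypothetical names, and the reader loop fabricates packets where a real one would block in av_read_frame on the audio format context.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an audio packet; not ZoneMinder's ZMPacket.
struct AudioPacket {
  long long pts_us;
};

// Minimal thread-safe queue between the reader thread and the consumer.
class PacketQueue {
 public:
  void push(AudioPacket p) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(p);
    }
    cond_.notify_one();
  }

  // Signals that no more packets will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    cond_.notify_all();
  }

  // Blocks until a packet is available; returns false once closed and drained.
  bool pop(AudioPacket &out) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return false;
    out = queue_.front();
    queue_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<AudioPacket> queue_;
  bool closed_ = false;
};

// Starts a reader thread producing `count` packets and drains them here.
std::vector<AudioPacket> read_audio_in_background(int count, long long packet_us) {
  assert(count >= 0);
  PacketQueue queue;
  std::thread reader([&queue, count, packet_us] {
    for (int i = 0; i < count; ++i) {
      // A real reader would block in av_read_frame() at this point.
      queue.push(AudioPacket{i * packet_us});
    }
    queue.close();
  });
  std::vector<AudioPacket> received;
  AudioPacket p{};
  while (queue.pop(p)) received.push_back(p);
  reader.join();
  return received;
}
```

The point of the design is that only the reader thread ever waits on the audio source; the consumer waits on the queue, so a slow audio read no longer stalls video capture.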

But I still have one issue. The audio stream and video stream time bases seem to have a small difference. This results in a slight deviation between the audio and the video over time in the resulting .mp4 file. For example, in the first minute of the video, audio and video seem to be synchronous. After 5 minutes, the audio lags behind the video noticeably.

I know the time bases get rescaled in the writeAudioFramePacket method in the video store class. But I currently have no idea how to synchronise the audio_out_stream time base with the video_out_stream time base. If I manage to fix that, I could submit my solution to the current development branch of ZoneMinder. Do you have an idea how I can synchronise the audio pts and dts to the video out stream?
Attachments
zmc_m1.log
(983.36 KiB) Downloaded 83 times
iconnor
Posts: 3266
Joined: Fri Oct 29, 2010 1:43 am
Location: Toronto

Re: ZoneMinder 1.36.33 poor performance and no audio

Post by iconnor »

Hmm... that's tough. The incoming timestamps shouldn't drift... and so neither should the re-encoded timestamps.

I certainly had a lot of trouble getting the timing right when writing this encoding code. But I never had to deal with drift, the timing was either off from the beginning or not.

In general there are more packets for audio than video. You generally want them interleaved fairly closely.
d4v1d
Posts: 7
Joined: Wed Sep 06, 2023 11:14 am

Re: ZoneMinder 1.36.33 poor performance and no audio

Post by d4v1d »

I managed to fix my audio drift by changing the audio_next_pts calculation:

Code:

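// Capture time of this packet in microseconds (the unit of AV_TIME_BASE_Q).
int64_t in_pts = static_cast<int64_t>(zm_packet->timestamp.tv_sec) * 1000000 + zm_packet->timestamp.tv_usec;
if (audio_first_dts == AV_NOPTS_VALUE) {
  // First audio packet: remember its capture time and start the output at 0.
  audio_first_dts = in_pts;
  audio_next_pts = 0;
  Debug(3, "audio first_dts to %" PRId64, audio_first_dts);
} else {
  // Derive the pts from elapsed capture time, rescaled to the encoder time base.
  audio_next_pts = av_rescale_q(in_pts - audio_first_dts, AV_TIME_BASE_Q, audio_out_ctx->time_base);
}
...
out_frame->pts = audio_next_pts;
// No longer advanced by the sample count:
// audio_next_pts += out_frame->nb_samples;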
I have used the video pts calculation as reference. Now even after 1 hour of recording, video and audio are still in sync.
Incrementing audio_next_pts by the number of samples caused my audio pts to grow too fast, so the audio was played prematurely, with an increasing gap over time.
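
The difference between the two ways of deriving the audio pts can be shown with a small dependency-free sketch. `Rational`, `rescale`, `pts_from_wallclock` and `pts_from_sample_count` are hypothetical helpers written for this illustration; `rescale` is a simplified stand-in for FFmpeg's av_rescale_q, and the 1/32000 time base and packet counts are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FFmpeg's AVRational, to avoid a libav dependency.
struct Rational {
  int64_t num;
  int64_t den;
};

// Simplified rescale from time base `from` to time base `to`, truncating.
// FFmpeg's av_rescale_q() rounds to nearest and guards against overflow;
// this sketch does neither.
int64_t rescale(int64_t value, Rational from, Rational to) {
  assert(from.den != 0 && to.num != 0);
  return value * from.num * to.den / (from.den * to.num);
}

// pts taken from elapsed capture time, expressed in the output time base.
int64_t pts_from_wallclock(int64_t now_us, int64_t first_us, Rational out_tb) {
  const Rational microseconds{1, 1000000};
  return rescale(now_us - first_us, microseconds, out_tb);
}

// pts taken from counting samples, as the disabled increment would do.
int64_t pts_from_sample_count(int64_t packets, int64_t samples_per_packet) {
  return packets * samples_per_packet;
}
```

With an assumed output time base of 1/32000, one hour of capture time gives `pts_from_wallclock(3600000000, 0, {1, 32000})` = 115200000. If, hypothetically, 112613 packets of 1024 samples had been counted in that hour, `pts_from_sample_count(112613, 1024)` would give 115315712, about 3.6 s ahead of the wall clock. A pts tied to capture time cannot accumulate that kind of offset, which is consistent with the fix described above.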