[SOLVED] misc errors, no clue

Post by marcopete87 »

Update on 18/05/2017: the issue is solved.
I disabled APM (Advanced Power Management), which was hurting performance.
I also bought another hard drive, so I now have one drive for every 2 cameras.

Hi all, here is my system:
Ubuntu Server 17.04, ZoneMinder v1.30.3
NUC5i7RYH
16GB RAM
240GB SSD (for the OS) + 256GB SSD for backups
1x 4TB USB3 drive
1x 1TB USB3 drive

Remote web access uses a non-standard port (not 80, 8080 or other common ports), forwarded via NAT to local port 80.

4x 1280x720 IP cameras (3 of them on Modect with 2-3 small zones each, all events stored on the 1TB drive)
2x 1920x1080 IP cameras (both on Modect with 4-5 small zones each, all events stored on the 4TB drive)

All cameras run at 15 fps with a 150-frame buffer, which is enough for 10 s of video.
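
As a rough cross-check of these buffer settings, the sketch below computes how many seconds each ring buffer holds and roughly how much shared memory the raw image buffers need. It assumes 32-bit colour (4 bytes per pixel), which the post does not state, and it ignores any per-monitor overhead; the camera counts and resolutions are the ones listed above.

```python
# Rough sizing of ZoneMinder image ring buffers.
# ASSUMPTION: 4 bytes per pixel (32-bit colour); the post does not say
# which colour depth the monitors use. Per-monitor overhead is ignored.

BYTES_PER_PIXEL = 4   # assumed, not from the post
BUFFER_FRAMES = 150
FPS = 15

# (count, width, height) as listed in the post
CAMERAS = [(4, 1280, 720), (2, 1920, 1080)]


def buffer_seconds(frames: int, fps: float) -> float:
    """Seconds of video held by a ring buffer of `frames` images."""
    return frames / fps


def buffer_bytes(width: int, height: int, frames: int,
                 bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Raw size of one monitor's image buffer in bytes."""
    return width * height * bytes_per_pixel * frames


def total_buffer_bytes(cameras, frames: int) -> int:
    """Sum of raw image buffer sizes over all cameras."""
    return sum(n * buffer_bytes(w, h, frames) for n, w, h in cameras)


if __name__ == "__main__":
    total = total_buffer_bytes(CAMERAS, BUFFER_FRAMES)
    print(f"buffer holds {buffer_seconds(BUFFER_FRAMES, FPS):.0f} s")
    print(f"raw image buffers: {total / 2**30:.2f} GiB")
```

Under that assumption the total is about 4.4 GiB, which is consistent with the 4,4G shown as used on /dev/shm in the df output in this post.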

Everything works fine with ffmpeg; I have mapped 80% of RAM to shm.
Average load is 2.5-3.5, CPU utilization is near 50% per core, and RAM usage is 5.31/15GB.

Code:

marco@ubuntu-NUC:~$ df -h
Filesystem      Size  Used   Avail Use% Mounted on
udev            7,8G     0    7,8G   0% /dev
tmpfs           1,6G   26M    1,6G   2% /run
/dev/sdb2       219G   67G    141G  33% /
tmpfs            13G  4,4G    8,1G  36% /dev/shm
tmpfs           5,0M     0    5,0M   0% /run/lock
tmpfs           7,8G     0    7,8G   0% /sys/fs/cgroup
/dev/sda1       235G  115G    109G  52% /mnt/BACKUP
/dev/sdc1       917G  788G     84G  91% /mnt/EXT_1T
/dev/sdb1       511M  3,4M    508M   1% /boot/efi
/dev/sdd1       3,6T 1005G    2,5T  29% /mnt/EXT_4T
tmpfs           1,6G     0    1,6G   0% /run/user/1000

However, I often get buffer overrun errors:

Code:

2017-05-09 18:35:10.011023	zmc_m4		1667	WAR	Buffer overrun at index 26, image 1022576, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 16:11:10.090935	zma_m1		14862	WAR	Approaching buffer overrun, consider slowing capture, simplifying analysis or increasing ring buffer size	zm_monitor.cpp	1324
2017-05-09 16:11:10.022441	zmc_m2		1643	WAR	Buffer overrun at index 1, image 896701, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 16:11:09.978518	zmc_m1		1638	WAR	Buffer overrun at index 149, image 896699, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:12:56.772680	zmc_m4		1667	WAR	Buffer overrun at index 93, image 624093, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:12:28.621271	zmc_m2		1643	WAR	Buffer overrun at index 127, image 627427, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:12:28.607541	zmc_m4		1667	WAR	Buffer overrun at index 127, image 623677, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:12:28.010578	zmc_m2		1643	WAR	Buffer overrun at index 118, image 627418, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:12:27.940199	zmc_m4		1667	WAR	Buffer overrun at index 117, image 623667, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:07:23.132997	zmc_m4		1667	WAR	Buffer overrun at index 111, image 619161, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:07:21.120984	zmc_m4		1667	WAR	Buffer overrun at index 81, image 619131, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:07:15.739386	zmc_m2		1643	WAR	Buffer overrun at index 76, image 622726, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:07:08.489546	zmc_m4		1667	WAR	Buffer overrun at index 44, image 618944, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:07:06.208327	zmc_m4		1667	WAR	Buffer overrun at index 11, image 618911, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 11:07:05.212603	zmc_m4		1667	WAR	Buffer overrun at index 146, image 618896, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 09:21:52.474660	zmaudit		1704	ERR	Unable to unlink '1/17/05/08/.123449': No such file or directory	zmaudit.pl	
2017-05-09 09:04:57.313946	zmc_m4		1667	WAR	Buffer overrun at index 87, image 510537, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 09:04:56.132963	zmc_m3		1654	WAR	Buffer overrun at index 71, image 512471, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 07:10:40.935284	zms		18934	ERR	Unable to validate swap image path, disabling buffered playback	zm_monitor.cpp	4025
2017-05-09 07:10:40.929785	zms		18934	ERR	Can't stat '/tmp/zm': No such file or directory	zm_monitor.cpp	3493
2017-05-09 02:51:44.917470	zmaudit		1704	ERR	Unable to unlink '3/17/05/08/.121077': No such file or directory	zmaudit.pl	
2017-05-09 02:36:17.611780	zmdc		1598	WAR	'zma -m 2' has not stopped at 17/05/09 02:36:17. Sending KILL to pid 1728	zmdc.pl
2017-05-09 02:36:09.980452	zmc_m2		1643	WAR	Buffer overrun at index 112, image 161962, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-09 02:36:09.921646	zmc_m3		1654	WAR	Buffer overrun at index 138, image 161988, slow down capture, speed up analysis or increase ring buffer size	zm_monitor.cpp	3094
2017-05-08 23:36:35.185278	zma_m4		1725	WAR	Waiting for capture daemon	zm_monitor.cpp	503
2017-05-08 23:36:34.181516	zma_m4		1725	WAR	Waiting for capture daemon	zm_monitor.cpp	503
2017-05-08 23:36:33.169403	zma_m4		1725	WAR	Waiting for capture daemon

I often see "zma has not stopped" together with "Unable to validate swap image path, disabling buffered playback", and I don't like them.

According to iotop, combined disk writes are 25-30MB/s, i.e. roughly 12-15MB/s per disk, so the disks shouldn't be the bottleneck.

Code:

root@ubuntu-NUC:/home/marco# iostat -xd 2 5
Linux 4.10.0-20-generic (ubuntu-NUC) 	09/05/2017 	_x86_64_	(4 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,08     0,00    54,81     0,00    0,27    0,27    0,00   0,14   0,00
sdb               0,48    17,71    0,74   56,57    24,22   591,55    21,49     0,07    1,28    0,43    1,30   1,16   6,65
sdc               0,45    64,62    5,06   29,00    25,90 10336,93   608,51     0,13    3,78   58,00 1997,81  18,16  61,84
sdd               0,13    21,29    3,48   34,93    14,92 12916,68   673,26    15,64  407,26   20,73  445,82   6,08  23,34

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               1,50   101,50    1,00  282,00    16,00  1672,00    11,93     0,33    1,17    0,00    1,17   1,16  32,80
sdc               0,00     0,00    0,50    0,50     2,00     0,00     4,00     0,73 1628,00 1004,00 2252,00 374,00  37,40
sdd               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               0,00    45,50    0,00  144,50     0,00   878,00    12,15     0,18    1,27    0,00    1,27   1,27  18,40
sdc               0,00    51,50    0,00    0,00     0,00     0,00     0,00     0,25    0,00    0,00    0,00   0,00  25,20
sdd               0,00    21,50    0,00    1,00     0,00    90,00   180,00     0,13  132,00    0,00  132,00 132,00  13,20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               0,00     6,50    0,00   61,00     0,00  2138,00    70,10     0,09    1,51    0,00    1,51   0,62   3,80
sdc               0,00     0,00    0,00    1,00     0,00   210,00   420,00     0,83 1080,00    0,00 1080,00 828,00  82,80
sdd               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               0,00     8,00    0,00   28,50     0,00   188,00    13,19     0,04    1,33    0,00    1,33   1,33   3,80
sdc               0,00    44,00    0,00   52,50     0,00 17216,00   655,85    22,28  424,42    0,00  424,42   7,16  37,60
sdd               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00

Code:

root@ubuntu-NUC:/home/marco# iostat 
Linux 4.10.0-20-generic (ubuntu-NUC) 	09/05/2017 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          47,56    0,00    2,03   15,82    0,00   34,59

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0,00         0,08         0,00       5641          4
sdb              57,34        24,28       591,77    1796681   43788629
sdc              34,06        25,87     10334,90    1914467  764737688
sdd              38,41        14,91     12916,28    1103147  955748728

RAM usage is fine and the CPU has some headroom.

All of this happens while the server is unattended.


crontab is configured to run the following every minute:
- check the UPS status (with NUT): this is a shell script which logs the status to a local MySQL database when the UPS is on battery and, if necessary, sends a shutdown command to a NAS.
- update my IP on my website (with curl)

Because I need small folders, I use 60s events, which are also faster and easier to delete.
Every frame is kept for 24 hours.

Does anyone have any idea about this?
Or can someone suggest some tools for finding the bottleneck?
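
One possible way to watch for a disk bottleneck without extra tools is to sample /proc/diskstats twice and compute the average time per completed write, which is roughly what iostat reports as w_await. The sketch below is Linux-only; the function names and the 5 s interval are illustrative choices, not anything from ZoneMinder.

```python
# Sketch: average write latency per device from two /proc/diskstats
# samples. Field positions follow the kernel's documented layout:
# major, minor, name, reads completed, reads merged, sectors read,
# ms reading, writes completed, writes merged, sectors written,
# ms writing, ...
import time


def parse_diskstats(text: str):
    """Map device name -> (writes_completed, ms_spent_writing)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue
        stats[f[2]] = (int(f[7]), int(f[10]))
    return stats


def write_await_ms(before: dict, after: dict):
    """Average ms per completed write between two samples."""
    result = {}
    for dev, (w1, ms1) in after.items():
        w0, ms0 = before.get(dev, (w1, ms1))
        if w1 > w0:                      # skip devices with no new writes
            result[dev] = (ms1 - ms0) / (w1 - w0)
    return result


def sample(interval_s: float = 5.0):
    """Read /proc/diskstats twice, `interval_s` apart (Linux only)."""
    with open("/proc/diskstats") as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as fh:
        after = parse_diskstats(fh.read())
    return write_await_ms(before, after)
```

Calling `sample()` on the server returns a dict such as device name to milliseconds per write; values that stay in the hundreds of milliseconds on an event drive would point at storage rather than CPU.
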
Last edited by marcopete87 on Thu May 18, 2017 3:52 pm, edited 1 time in total.

Post by bbunge »

If you are using the USB drives to store images, they may be your problem. Slow the frame rate down to 5 FPS or lower and reduce the resolution to see if the errors stop. It might be better to move the storage drives to a SATA connection.

Post by marcopete87 »

Yes, I'm using USB drives, but I also tried a SATA SSD, a NAS share, etc. without success...
However, 12MB/s shouldn't be a problem for a USB3 drive, I think...

Post by marcopete87 »

I have a clue now... :evil:

Code:

root@ubuntu-NUC:/home/marco# smartctl -a /dev/sdd -d sat
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-20-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000LM016-1N2170
Serial Number:    W80173WB
LU WWN Device Id: 5 000c50 09ba09b5c
Firmware Version: 0003
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue May  9 23:21:18 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 710) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       50434171
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       121
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 0
  7 Seek_Error_Rate         0x000f   076   057   030    Pre-fail  Always       -       60815293447
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6438 (172 54 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       84
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2527
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       8590262288
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   050   045    Old_age   Always       -       42 (Min/Max 41/42)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       51
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       540
194 Temperature_Celsius     0x0022   042   050   000    Old_age   Always       -       42 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   055   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   055   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5794 (111 241 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       455064380995
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       11377060406

One HDD is reported as failing...
I'll update this topic when I find a workaround.
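
To pick out the failing attribute from a long smartctl table like the one above, a small parser can compare each normalized VALUE against its THRESH; smartctl treats an attribute as failed when VALUE is less than or equal to THRESH. This is a minimal sketch that assumes the column layout shown in this post, with three rows copied from it as sample input.

```python
# Sketch: list SMART attributes whose normalized VALUE has dropped to or
# below THRESH in `smartctl -a` output. Column layout as in the post.

def failing_attributes(text: str):
    """Return (id, name, value, thresh) for attributes at/below threshold."""
    failing = []
    for line in text.splitlines():
        f = line.split()
        # Attribute rows start with a numeric ID and have >= 10 columns.
        if len(f) < 10 or not f[0].isdigit():
            continue
        try:
            value, thresh = int(f[3]), int(f[5])
        except ValueError:
            continue                     # not an attribute row
        if value <= thresh:
            failing.append((int(f[0]), f[1], value, thresh))
    return failing


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       50434171
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2527
"""

if __name__ == "__main__":
    for attr_id, name, value, thresh in failing_attributes(SAMPLE):
        print(f"{attr_id} {name}: value {value} <= threshold {thresh}")
```

On these rows only attribute 5 (Reallocated_Sector_Ct, 030 against a threshold of 036) is flagged, which matches the FAILING_NOW marker in the table.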

Workaround for tonight: I created a new partition table. The top 2TB of the disk had never been used, so I'm now using only that top 2TB section.

Post by bbunge »

Ah, Seagate drive. I avoid using them on Linux. For some reason they just do not work well or last.

Post by marcopete87 »

I replaced the failing HD, but the buffer overruns persist.
The 2x 1920x1080 cameras now run analysis at 2 fps.

Post by marcopete87 »

Update: despite the 4TB HDD's SMART status, I'm using it without issues.
Apparently, disabling APM as described here, https://askubuntu.com/questions/134279/ ... t-in-12-04
is working.
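
The link above is truncated, so the exact method it describes cannot be confirmed here. One common way to turn APM off is `hdparm -B 255`, and the sketch below wraps that call. It assumes hdparm is installed, that the script runs as root, and that the drive (and any USB bridge in front of it) accepts the command; the function names and the device path in the usage note are hypothetical.

```python
# Sketch: disable Advanced Power Management on a drive via hdparm.
# ASSUMPTIONS: hdparm is installed, the script runs as root, and the
# drive (and any USB bridge in front of it) accepts the command.
import subprocess

APM_DISABLED = 255  # hdparm -B value that turns APM off


def apm_disable_command(device: str) -> list:
    """Build the hdparm command line that disables APM on `device`."""
    return ["hdparm", "-B", str(APM_DISABLED), device]


def disable_apm(device: str) -> None:
    """Run hdparm; raises CalledProcessError if the command fails."""
    subprocess.run(apm_disable_command(device), check=True)
```

Usage would be something like `disable_apm("/dev/sdX")` with the real device node substituted. The setting may not survive a power cycle, so it may need to be reapplied at boot.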

I also took some load off the 1TB drive by saving one camera's events to an expendable old SSD.