Page 1 of 2

Slack Boxes Crashing

Posted: Wed Jul 06, 2005 9:06 pm
by Ruler
I've been having a problem with the servers I've installed ZoneMinder on. One has SlackWare 10.0 with ZoneMinder 1.19.5 continually recording 16 cameras to a 4-disk RAID-1 array. The other has Slackware 10.1 with ZoneMinder 1.21.0 recording 8 cameras to a 2-disk RAID-1 array (2 switch to motion detection at night while the other 6 record continually). Both crash with a kernel panic at a random time after running for a few months. The second machine has been on since about 5 days after 1.21.0 was released and crashed this past Saturday. I snapped off some digital pictures of the latest crash and recorded information about some of the previous ones, but have no idea how to interpret the information dumped to the screen nor do I know how to further investigate what caused the problem to begin with.

Does anybody have any ideas as to how I can track down the problem and remedy it?

Posted: Thu Jul 07, 2005 8:32 am
by zoneminder
If you have any pix you might as well post them here. It may at least give some idea of what component of your system is causing the problem.

Phil

Posted: Mon Jul 11, 2005 5:10 pm
by Ruler
The system that crashed the week before this past just crashed again late last week. (I actually think it has something to do with the days I leave and go out fishing... ;) ) Something extremely weird is that the crash on 7-2 occurred at 4:48.46 am, or at least that was the last event that was recorded in the log. On 7-8, the last entry in /var/log/messages was at 4:46:57 am. Being less than 2 minutes apart on the same day of the week is almost too strange to be a coincidence.

I do not have a web site available to upload the images to, but will e-mail them to whomever requests them.


Here is the content of /var/log/messages from shortly before the crash until a prompt comes back: (The unprintable characters showing as ^@ are all on one line, but I broke them up here to avoid horizontal scrolling.)

Code: Select all

Jul  8 04:46:54 VSS-Used1 zma_m1[9907]: INF [BCU1a: 29435 - Gone into prealarm state]
Jul  8 04:46:57 VSS-Used1 zma_m1[9907]: INF [BCU1a: 29438 - Gone into prealarm state]
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Jul 11 08:24:14 VSS-Used1 syslogd 1.4.1: restart.
Jul 11 08:24:15 VSS-Used1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jul 11 08:24:15 VSS-Used1 kernel: BIOS-provided physical RAM map:
Jul 11 08:24:15 VSS-Used1 kernel: 127MB HIGHMEM available.
Jul 11 08:24:15 VSS-Used1 kernel: 896MB LOWMEM available.
Jul 11 08:24:15 VSS-Used1 kernel: ide_setup: hdc=ide-scsi
Jul 11 08:24:15 VSS-Used1 kernel: Initializing CPU#0
Jul 11 08:24:15 VSS-Used1 kernel: Memory: 1033084k/1048512k available (1747k kernel code, 15040k reserved, 564k data, 104k init, 131008k highmem)
Jul 11 08:24:15 VSS-Used1 kernel: Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Jul 11 08:24:15 VSS-Used1 kernel: Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Jul 11 08:24:15 VSS-Used1 kernel: Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Jul 11 08:24:15 VSS-Used1 kernel: Buffer cache hash table entries: 65536 (order: 6, 262144 bytes)
Jul 11 08:24:15 VSS-Used1 kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 08:24:15 VSS-Used1 kernel: CPU: L2 Cache: 512K (64 bytes/line)
Jul 11 08:24:15 VSS-Used1 kernel: Enabling fast FPU save and restore... done.
Jul 11 08:24:15 VSS-Used1 kernel: Enabling unmasked SIMD FPU exception support... done.
Jul 11 08:24:15 VSS-Used1 kernel: Checking 'hlt' instruction... OK.
Jul 11 08:24:15 VSS-Used1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfa9e0, last bus=3
Jul 11 08:24:15 VSS-Used1 kernel: PCI: Using configuration type 1
Jul 11 08:24:15 VSS-Used1 kernel: PCI: Probing PCI hardware
Jul 11 08:24:15 VSS-Used1 kernel: PCI: Discovered primary peer bus ff [IRQ]
Jul 11 08:24:15 VSS-Used1 kernel: PCI: Using IRQ router default [10de/01e0] at 00:00.0
Jul 11 08:24:15 VSS-Used1 kernel: Linux NET4.0 for Linux 2.4
Jul 11 08:24:15 VSS-Used1 kernel: Based upon Swansea University Computer Society NET3.039
Jul 11 08:24:15 VSS-Used1 kernel: i2c-core.o: i2c core module version 2.6.1 (20010830)
Jul 11 08:24:15 VSS-Used1 kernel: i2c-dev.o: i2c /dev entries driver module version 2.6.1 (20010830)
Jul 11 08:24:15 VSS-Used1 kernel: i2c-algo-bit.o: i2c bit algorithm module
Jul 11 08:24:15 VSS-Used1 kernel: i2c-proc.o version 2.6.1 (20010830)
Jul 11 08:24:15 VSS-Used1 kernel: Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
Jul 11 08:24:15 VSS-Used1 kernel: ttyS00 at 0x03f8 (irq = 4) is a 16550A
Jul 11 08:24:15 VSS-Used1 kernel: ttyS01 at 0x02f8 (irq = 3) is a 16550A
Jul 11 08:24:15 VSS-Used1 kernel: Real Time Clock Driver v1.10f
Jul 11 08:24:15 VSS-Used1 kernel: Floppy drive(s): fd0 is 1.44M
Jul 11 08:24:15 VSS-Used1 kernel: FDC 0 is a post-1991 82077
Jul 11 08:24:15 VSS-Used1 kernel: loop: loaded (max 8 devices)
Jul 11 08:24:15 VSS-Used1 kernel: forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.30.
Jul 11 08:24:15 VSS-Used1 kernel: eth0: forcedeth.c: subsystem: 01458:e000 bound to 00:04.0
Jul 11 08:24:15 VSS-Used1 kernel: Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
Jul 11 08:24:15 VSS-Used1 kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Jul 11 08:24:15 VSS-Used1 kernel: NFORCE2: IDE controller at PCI slot 00:09.0
Jul 11 08:24:15 VSS-Used1 kernel: NFORCE2: chipset revision 162
Jul 11 08:24:15 VSS-Used1 kernel: NFORCE2: not 100%% native mode: will probe irqs later
Jul 11 08:24:15 VSS-Used1 kernel: NFORCE2: 00:09.0 (rev a2) UDMA133 controller
Jul 11 08:24:15 VSS-Used1 kernel:     ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
Jul 11 08:24:15 VSS-Used1 kernel:     ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
Jul 11 08:24:15 VSS-Used1 kernel: PDC20268: IDE controller at PCI slot 01:09.0
Jul 11 08:24:15 VSS-Used1 kernel: PDC20268: chipset revision 2
Jul 11 08:24:15 VSS-Used1 kernel: PDC20268: not 100%% native mode: will probe irqs later
Jul 11 08:24:15 VSS-Used1 kernel:     ide2: BM-DMA at 0xa000-0xa007, BIOS settings: hde:pio, hdf:pio
Jul 11 08:24:15 VSS-Used1 kernel:     ide3: BM-DMA at 0xa008-0xa00f, BIOS settings: hdg:pio, hdh:pio
Jul 11 08:24:15 VSS-Used1 kernel: hda: setmax LBA 156301488, native  156299375
Jul 11 08:24:15 VSS-Used1 kernel: hda: 156299375 sectors (80025 MB) w/8192KiB Cache, CHS=9729/255/63, UDMA(100)
Jul 11 08:24:15 VSS-Used1 kernel: hde: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63, UDMA(100)
Jul 11 08:24:15 VSS-Used1 kernel: hdg: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63, UDMA(100)
Jul 11 08:24:15 VSS-Used1 kernel: Partition check:
Jul 11 08:24:15 VSS-Used1 kernel:  hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 >
Jul 11 08:24:15 VSS-Used1 kernel:  hde: hde1
Jul 11 08:24:15 VSS-Used1 kernel:  hdg: hdg1
Jul 11 08:24:15 VSS-Used1 kernel: SCSI subsystem driver Revision: 1.00
Jul 11 08:24:15 VSS-Used1 kernel: Intel 810 + AC97 Audio, version 1.01, 01:28:15 Apr 11 2005
Jul 11 08:24:15 VSS-Used1 kernel: i810: NVIDIA nForce Audio found at IO 0xb800 and 0xb400, MEM 0x0000 and 0x0000, IRQ 10
Jul 11 08:24:15 VSS-Used1 kernel: i810_audio: Audio Controller supports 6 channels.
Jul 11 08:24:15 VSS-Used1 kernel: i810_audio: Defaulting to base 2 channel mode.
Jul 11 08:24:15 VSS-Used1 kernel: i810_audio: Resetting connection 0
Jul 11 08:24:15 VSS-Used1 kernel: ac97_codec: AC97  codec, id: ALG96 (Unknown)
Jul 11 08:24:15 VSS-Used1 kernel: Linux video capture interface: v1.00
Jul 11 08:24:15 VSS-Used1 kernel: md: raid0 personality registered as nr 2
Jul 11 08:24:15 VSS-Used1 kernel: md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
Jul 11 08:24:15 VSS-Used1 kernel: md: Autodetecting RAID arrays.
Jul 11 08:24:15 VSS-Used1 kernel:  [events: 00000052]
Jul 11 08:24:15 VSS-Used1 kernel:  [events: 00000052]
Jul 11 08:24:15 VSS-Used1 kernel: md: autorun ...
Jul 11 08:24:15 VSS-Used1 kernel: md: considering hdg1 ...
Jul 11 08:24:15 VSS-Used1 kernel: md:  adding hdg1 ...
Jul 11 08:24:15 VSS-Used1 kernel: md:  adding hde1 ...
Jul 11 08:24:15 VSS-Used1 kernel: md: created md0
Jul 11 08:24:15 VSS-Used1 kernel: md: bind<hde1,1>
Jul 11 08:24:15 VSS-Used1 kernel: md: bind<hdg1,2>
Jul 11 08:24:15 VSS-Used1 kernel: md: running: <hdg1><hde1>
Jul 11 08:24:15 VSS-Used1 kernel: md: hdg1's event counter: 00000052
Jul 11 08:24:15 VSS-Used1 kernel: md: hde1's event counter: 00000052
Jul 11 08:24:15 VSS-Used1 kernel: md0: max total readahead window set to 512k
Jul 11 08:24:15 VSS-Used1 kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Jul 11 08:24:15 VSS-Used1 kernel: md: updating md0 RAID superblock on device
Jul 11 08:24:15 VSS-Used1 kernel: md: hdg1 [events: 00000053]<6>(write) hdg1's sb offset: 244195904
Jul 11 08:24:15 VSS-Used1 kernel: md: hde1 [events: 00000053]<6>(write) hde1's sb offset: 244195904
Jul 11 08:24:15 VSS-Used1 kernel: md: ... autorun DONE.
Jul 11 08:24:15 VSS-Used1 kernel: LVM version 1.0.8(17/11/2003)
Jul 11 08:24:15 VSS-Used1 kernel: Initializing Cryptographic API
Jul 11 08:24:15 VSS-Used1 kernel: NET4: Linux TCP/IP 1.0 for NET4.0
Jul 11 08:24:15 VSS-Used1 kernel: IP Protocols: ICMP, UDP, TCP, IGMP
Jul 11 08:24:15 VSS-Used1 kernel: IP: routing cache hash table of 8192 buckets, 64Kbytes
Jul 11 08:24:15 VSS-Used1 kernel: TCP: Hash tables configured (established 262144 bind 65536)
Jul 11 08:24:15 VSS-Used1 kernel: Linux IP multicast router 0.06 plus PIM-SM
Jul 11 08:24:15 VSS-Used1 kernel: NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Jul 11 08:24:15 VSS-Used1 kernel: UMSDOS: msdos_read_super failed, mount aborted.
Jul 11 08:24:15 VSS-Used1 kernel: Freeing unused kernel memory: 104k freed
Jul 11 08:24:15 VSS-Used1 kernel: Adding Swap: 1959920k swap-space (priority -1)
Jul 11 08:24:15 VSS-Used1 kernel: apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Jul 11 08:24:15 VSS-Used1 kernel: Linux agpgart interface v0.99 (c) Jeff Hartmann
Jul 11 08:24:15 VSS-Used1 kernel: agpgart: Maximum main memory to use for agp memory: 941M
Jul 11 08:24:15 VSS-Used1 kernel: agpgart: Detected NVIDIA nForce2 chipset
Jul 11 08:24:15 VSS-Used1 kernel: agpgart: AGP aperture is 64M @ 0xd8000000
Jul 11 08:24:15 VSS-Used1 kernel: scsi0 : SCSI host adapter emulation for IDE ATAPI devices
Jul 11 08:24:15 VSS-Used1 kernel: Uniform CD-ROM driver Revision: 3.12
Jul 11 08:24:15 VSS-Used1 logger: /etc/rc.d/rc.inet1:  /sbin/ifconfig lo 127.0.0.1
Jul 11 08:24:15 VSS-Used1 logger: /etc/rc.d/rc.inet1:  /sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
Jul 11 08:24:15 VSS-Used1 logger: /etc/rc.d/rc.inet1:  /sbin/ifconfig eth0 207.185.212.81 broadcast 207.185.212.127 netmask 255.255.255.192
Jul 11 08:24:15 VSS-Used1 logger: /etc/rc.d/rc.inet1:  /sbin/route add default gw 207.185.212.126 metric 1
Jul 11 08:24:15 VSS-Used1 logger: /etc/rc.d/rc.hotplug start (entering script)
Jul 11 08:24:15 VSS-Used1 udevsend[136]: udevd daemon started
Jul 11 08:24:15 VSS-Used1 kernel: bttv: driver version 0.9.15 loaded
Jul 11 08:24:15 VSS-Used1 kernel: bttv: using 16 buffers with 2080k (520 pages) each for capture
Jul 11 08:24:15 VSS-Used1 kernel: bttv: Bt8xx card found (0).
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: Bt878 (rev 17) at 02:08.0, irq: 5, latency: 32, mmio: 0xdc000000
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: detected: Provideo PV150A-1 [card=98], PCI subsystem ID is aa00:1460
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: using: ProVideo PV951 [card=42,insmod option]
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: i2c: checking for TDA9875 @ 0xb0... not found
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: i2c: checking for TDA7432 @ 0x8a... not found
Jul 11 08:24:15 VSS-Used1 kernel: tvaudio: TV audio decoder + audio/video mux driver
Jul 11 08:24:15 VSS-Used1 kernel: tvaudio: known chips: tda9840,tda9873h,tda9874h/a,tda9850,tda9855,tea6300,tea6420,tda8425,pic16c54 (PV951),ta8874z
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: registered device video0
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: registered device vbi0
Jul 11 08:24:15 VSS-Used1 kernel: bttv0: PLL: 28636363 => 35468950 .. ok
Jul 11 08:24:15 VSS-Used1 kernel: bttv: Bt8xx card found (1).
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: Bt878 (rev 17) at 02:09.0, irq: 11, latency: 32, mmio: 0xdc002000
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: detected: Provideo PV150A-2 [card=98], PCI subsystem ID is aa01:1461
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: using: ProVideo PV951 [card=42,insmod option]
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: i2c: checking for TDA9875 @ 0xb0... not found
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: i2c: checking for TDA7432 @ 0x8a... not found
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: registered device video1
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: registered device vbi1
Jul 11 08:24:15 VSS-Used1 kernel: bttv1: PLL: 28636363 => 35468950 .. ok
Jul 11 08:24:15 VSS-Used1 kernel: bttv: Bt8xx card found (2).
Jul 11 08:24:15 VSS-Used1 kernel: bttv2: Bt878 (rev 17) at 02:0a.0, irq: 11, latency: 32, mmio: 0xdc004000
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: detected: Provideo PV150A-3 [card=98], PCI subsystem ID is aa02:1462
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: using: ProVideo PV951 [card=42,insmod option]
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: i2c: checking for TDA9875 @ 0xb0... not found
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: i2c: checking for TDA7432 @ 0x8a... not found
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: registered device video2
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: registered device vbi2
Jul 11 08:24:16 VSS-Used1 kernel: bttv2: PLL: 28636363 => 35468950 .. ok
Jul 11 08:24:16 VSS-Used1 kernel: bttv: Bt8xx card found (3).
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: Bt878 (rev 17) at 02:0b.0, irq: 9, latency: 32, mmio: 0xdc006000
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: detected: Provideo PV150A-4 [card=98], PCI subsystem ID is aa03:1463
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: using: ProVideo PV951 [card=42,insmod option]
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: i2c: checking for TDA9875 @ 0xb0... not found
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: i2c: checking for TDA7432 @ 0x8a... not found
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: registered device video3
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: registered device vbi3
Jul 11 08:24:16 VSS-Used1 kernel: bttv3: PLL: 28636363 => 35468950 .. ok
Jul 11 08:24:16 VSS-Used1 kernel: usb.c: registered new driver usbdevfs
Jul 11 08:24:16 VSS-Used1 kernel: usb.c: registered new driver hub
Jul 11 08:24:16 VSS-Used1 kernel: ehci_hcd 00:02.2: nVidia Corporation nForce2 USB Controller
Jul 11 08:24:16 VSS-Used1 kernel: ehci_hcd 00:02.2: irq 5, pci mem f8e87000
Jul 11 08:24:16 VSS-Used1 kernel: usb.c: new USB bus registered, assigned bus number 1
Jul 11 08:24:16 VSS-Used1 kernel: ehci_hcd 00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2003-Dec-29/2.4
Jul 11 08:24:16 VSS-Used1 kernel: hub.c: USB hub found
Jul 11 08:24:16 VSS-Used1 kernel: hub.c: 6 ports detected
Jul 11 08:24:16 VSS-Used1 kernel: usb-ohci.c: USB OHCI at membase 0xf8e8f000, IRQ 11
Jul 11 08:24:16 VSS-Used1 kernel: usb-ohci.c: usb-00:02.0, nVidia Corporation nForce2 USB Controller
Jul 11 08:24:16 VSS-Used1 kernel: usb.c: new USB bus registered, assigned bus number 2
Jul 11 08:24:16 VSS-Used1 kernel: hub.c: USB hub found
Jul 11 08:24:16 VSS-Used1 kernel: hub.c: 3 ports detected
Jul 11 08:24:16 VSS-Used1 kernel: usb-ohci.c: USB OHCI at membase 0xf8e91000, IRQ 9
Jul 11 08:24:16 VSS-Used1 kernel: usb-ohci.c: usb-00:02.1, nVidia Corporation nForce2 USB Controller (#2)
Jul 11 08:24:16 VSS-Used1 kernel: usb.c: new USB bus registered, assigned bus number 3
Jul 11 08:24:16 VSS-Used1 kernel: hub.c: USB hub found
Jul 11 08:24:16 VSS-Used1 kernel: hub.c: 3 ports detected
Jul 11 08:24:17 VSS-Used1 kernel: hub.c: new USB device 00:02.1-1, assigned address 2
Jul 11 08:24:17 VSS-Used1 kernel: uhci.c: USB Universal Host Controller Interface driver v1.1
Jul 11 08:24:17 VSS-Used1 kernel: usb-uhci.c: $Revision: 1.275 $ time 01:31:28 Apr 11 2005
Jul 11 08:24:17 VSS-Used1 kernel: usb-uhci.c: High bandwidth mode enabled
Jul 11 08:24:17 VSS-Used1 kernel: usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
Jul 11 08:24:20 VSS-Used1 kernel: usb.c: registered new driver hiddev
Jul 11 08:24:20 VSS-Used1 kernel: usb.c: registered new driver hid
Jul 11 08:24:22 VSS-Used1 kernel: hiddev0: USB HID v1.10 Device [American Power Conversion Back-UPS RS 1000 FW:7.g8 .D USB FW:g8 ] on usb3:2.0
Jul 11 08:24:22 VSS-Used1 kernel: hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
Jul 11 08:24:22 VSS-Used1 kernel: hid-core.c: USB HID support drivers
Jul 11 08:24:22 VSS-Used1 kernel: mice: PS/2 mouse device common for all mice
Jul 11 08:24:31 VSS-Used1 logger: /etc/rc.d/rc.hotplug start (exiting script)
Jul 11 08:24:32 VSS-Used1 sshd[1894]: Server listening on 0.0.0.0 port 22.
Jul 11 08:24:40 VSS-Used1 apmd[1910]: Version 3.0.2 (APM BIOS 1.2, Linux driver 1.16)
Jul 11 08:24:40 VSS-Used1 apmd[1910]: Charge: * * * (-1% unknown)
Jul 11 08:24:47 VSS-Used1 zmc_d0[2000]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:47 VSS-Used1 kernel: bttv0: PLL can sleep, using XTAL (28636363).
Jul 11 08:24:48 VSS-Used1 zmc_d0[2000]: INF [Starting Capture]
Jul 11 08:24:48 VSS-Used1 zma_m1[2002]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 zmc_d1[2005]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 kernel: bttv1: PLL can sleep, using XTAL (28636363).
Jul 11 08:24:48 VSS-Used1 zma_m2[2009]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 zmc_d1[2005]: INF [Starting Capture]
Jul 11 08:24:48 VSS-Used1 zmc_d2[2012]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 kernel: bttv2: PLL can sleep, using XTAL (28636363).
Jul 11 08:24:48 VSS-Used1 zma_m3[2015]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 zmc_d2[2012]: INF [Starting Capture]
Jul 11 08:24:48 VSS-Used1 zmc_d3[2018]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 kernel: bttv3: PLL can sleep, using XTAL (28636363).
Jul 11 08:24:48 VSS-Used1 zma_m4[2021]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:48 VSS-Used1 zmc_d3[2018]: INF [Starting Capture]
Jul 11 08:24:49 VSS-Used1 zma_m5[2025]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:49 VSS-Used1 zma_m5[2025]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m1[2002]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m6[2029]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:49 VSS-Used1 zma_m6[2029]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m2[2009]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m7[2033]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:49 VSS-Used1 zma_m7[2033]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m3[2015]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m8[2037]: INF [Debug Level = 0, Debug Log = <none>]
Jul 11 08:24:49 VSS-Used1 zma_m8[2037]: INF [Warming up]
Jul 11 08:24:49 VSS-Used1 zma_m4[2021]: INF [Warming up]
Jul 11 08:24:54 VSS-Used1 apcupsd[2055]: NIS server startup succeeded
Jul 11 08:24:54 VSS-Used1 zma_m5[2025]: INF [BCU1e: 006 - Starting new event 219790]
Jul 11 08:24:54 VSS-Used1 zma_m1[2002]: INF [BCU1a: 006 - Gone into prealarm state]
Jul 11 08:24:54 VSS-Used1 zma_m6[2029]: INF [BCU1f: 006 - Starting new event 219791]
Jul 11 08:24:54 VSS-Used1 zma_m7[2033]: INF [BCU1g: 006 - Starting new event 219792]
Jul 11 08:24:55 VSS-Used1 zma_m8[2037]: INF [BCU1h: 006 - Starting new event 219793]
Jul 11 08:24:55 VSS-Used1 zma_m3[2015]: INF [BCU1c: 006 - Starting new event 219794]
Jul 11 08:24:55 VSS-Used1 zma_m4[2021]: INF [BCU1d: 006 - Starting new event 219795]
Jul 11 08:25:00 VSS-Used1 zma_m5[2025]: INF [Ended event]
Jul 11 08:25:00 VSS-Used1 zma_m4[2021]: INF [Ended event]
Jul 11 08:25:00 VSS-Used1 zma_m5[2025]: INF [BCU1e: 012 - Starting new event 219796]
Jul 11 08:25:00 VSS-Used1 zma_m4[2021]: INF [BCU1d: 011 - Starting new event 219797]
Jul 11 08:25:00 VSS-Used1 zma_m6[2029]: INF [Ended event]
Jul 11 08:25:00 VSS-Used1 zma_m6[2029]: INF [BCU1f: 012 - Starting new event 219798]
Jul 11 08:25:00 VSS-Used1 zma_m1[2002]: INF [BCU1a: 012 - Gone into prealarm state]
Jul 11 08:25:00 VSS-Used1 zma_m7[2033]: INF [Ended event]
Jul 11 08:25:00 VSS-Used1 zma_m7[2033]: INF [BCU1g: 012 - Starting new event 219799]
Jul 11 08:25:00 VSS-Used1 zma_m8[2037]: INF [Ended event]
Jul 11 08:25:00 VSS-Used1 zma_m8[2037]: INF [BCU1h: 012 - Starting new event 219800]
Jul 11 08:25:00 VSS-Used1 zma_m3[2015]: INF [Ended event]
Jul 11 08:25:00 VSS-Used1 zma_m3[2015]: INF [BCU1c: 012 - Starting new event 219801]
Jul 11 08:25:25 VSS-Used1 zma_m1[2002]: INF [BCU1a: 038 - Gone into prealarm state]
Jul 11 08:25:29 VSS-Used1 zma_m1[2002]: INF [BCU1a: 043 - Gone into prealarm state]
Jul 11 08:25:39 VSS-Used1 zma_m1[2002]: INF [BCU1a: 053 - Gone into prealarm state]
Jul 11 08:25:42 VSS-Used1 zma_m1[2002]: INF [BCU1a: 056 - Gone into prealarm state]
Jul 11 08:25:46 VSS-Used1 zma_m1[2002]: INF [BCU1a: 060 - Gone into prealarm state]
Jul 11 08:25:49 VSS-Used1 zma_m1[2002]: INF [BCU1a: 063 - Gone into prealarm state]

Posted: Mon Jul 11, 2005 6:47 pm
by Ruler
I just found something very interesting...

The daily cron jobs are set up to run at 4:40 am every morning. While the only two jobs in /etc/cron.daily are the standard logrotate and slocate that are installed by Slackware during a typical installation, I'm wondering if perhaps ZoneMinder is generating events during part of one of these processes and screwing something up. (It's the only theory that I've managed to come up with thus far.) It does make sense that this particular box has the problem worse than my other ZoneMinder machine as I have this one set to have two cameras kick into motion detection mode at night rather than the continual recording in 5-minute chunks.

Posted: Tue Jul 12, 2005 1:59 pm
by zoneminder
Do you have anything in cron.weekly and if so do you know what day they run and when? I can't think of anything in ZM per se that would cause a crash, certainly not of a whole box but it's possibly the combination of heavy disk and/or DB use mixed with something in your crons might be pushing something over the edge.

Phil

Posted: Tue Jul 12, 2005 4:42 pm
by Ruler
Heh, I read your post Phil, then went to log into the server remotely (it's next door) to check for stuff in cron.weekly and couldn't connect. It crashed again at 4:46 this morning. It's almost GOT to be one of those cron jobs that's causing it, but why is a good question. The video is being written to a RAID array of 2 250 gig drives, which is separate from the boot drive. The boot drive does have the mysql database on, but I don't see anything in the cron job that would cause that much I/O, certainly not enough to overload the box. (Unless I'm missing something. It has to be something that takes a while if it is - isn't 6-8 minutes a long time for a process such as those to run?) I was thinking of disabling them as a check - will that have any negative impact on the server that you know of?

Posted: Tue Jul 12, 2005 5:04 pm
by Ruler
I'm not familiar enough with the kernel to know this, but have to assume it's been addressed previously. Is it possible that logrotate is trying to move/rename the log file while it's being written to by another process? I know with that bastardized OS called Sco, you had to do an fuser -u on any file you wanted to blow away and recreate because any process that was using it when you did this would continue to write to the disk address where the file was originally located. Given that it doesn't happen every time the cron job fires would support this theory, as events may not be generated at that exact time.

It's just a thought - way off base?

It's still puzzling to me why it was fine for months and now it's crashed 3 times in 2 weeks. Almost sounds like an intermittent hardware failure, However, if it were RAM or something, I'd think it'd happen all the time rather than just after these particular cron jobs fire. From what I've seen in the past, when a PCI card (I'm thinking capture card) has trouble, there's detailed information about it recorded in the system log, even if the end result of the problem is a kernel panic.

Posted: Tue Jul 12, 2005 5:45 pm
by cordel
I'de run a ram test on it anyway, it could be just one page of memeory affected and till you load it up it don't use that space. I thought I was the only SCO user around here. :shock: Were there any updates installed in the last couple weeks maybe?
Regards,
Cordel

Posted: Tue Jul 12, 2005 5:46 pm
by Ruler
This is also quite interesting...

The message log from the crash on 7-2 contains messages from June 26@4:40 am. The other two on 7-8 and 7-12 only have messages from 4:40am that same day on with the first message being syslogd restarting being the first message in both. The first one has no message about syslogd restarting anywhere in it except for the very first line on Jun 26 @ 04:40:02.

Hmmmmm......

Posted: Tue Jul 12, 2005 5:50 pm
by Ruler
I've never run a RAM test before outside of norton utilities. How does one do this in linux? (I'm relatively sure that it's not the case, but it's definitely worth a look. I compiled the kernel several times before getting the right config - stressing it out like that would reveal any problems at that time.)

I've done nothing to this box in the past couple of weeks with the exception of reboot it when it panics. That was the first thing I did actually - looked for anything that changed recently.


BTW, from my comments about System V, you can guess my opinion of Sco. ;) I no longer use it, but had to put up with it in my last job. I do have an old Sparc station in my garage that I've thought of booting up and playing with; I think that has sco on it.

Posted: Tue Jul 12, 2005 6:17 pm
by cordel
On most distro's cd's it an option when you boot from the first CD. Here is the web site for the software that is on the CD.
http://www.memtest86.com/
You can also make a bootable disk, I just usualy use the CD as it's a little faster booting up.
I take that nothing stands out in any of the logs? Does it do a kernel dump?
SCO maybe a pain in the butt, it is highly stable though. I feel the same way about it. I had to start working with it 3 months ago (good thing I learned linux recently, It sure helps).
Regards,
Cordel

Posted: Tue Jul 12, 2005 7:54 pm
by Ruler
Thanks for that link - I can foresee that CD saving me a ton of time and headache in the future. :)

I just ran memtest for 2 complete passes and no errors were found.

See above for the content of /var/log/messages - the unprintable ^@ characters are the only thing that is written when it panics. I don't know how to check to see if the kernel dumped or not. I do have digital pictures of the panic screen with all the hex codes that are displayed when it occurs; I can e-mail one or more of them to you if you'd like.

I used Sco at my last job and picked up all sorts of bad habits there that I still try in Linux at times; I didn't learn Linux until well after I'd left that evil, evil company. I don't know if it's all that much more stable than a mainstream linux distro - we had client's systems go down on a fairly regular basis. The development machine that we had in-house died a few times as well, but it could be the setup and configuration that was done to them as well. I do know that they cost a TON of cash to set up - far above what a standard linux box does.

Posted: Tue Jul 12, 2005 8:23 pm
by cordel
This might be a no no so if this would be bad practice someone else please pipe in.
I think your right in that cron is normaly set to start around 4:00am ( checked mine and it is set to 4:02 and I didn't set it there) Maybe try stoping cron and see if you get a crash or not. If you go more than a week with no problems then maybe 1) cron it loading the system and or 2) I see that in the above log that an event was being recorded (maybe someone leaves about that time in the morning and if cron is doing its thing and the event is being produced it's to much)
Do you have ingnore max fps on alarm checked maybe? In which case would have ZM that is going into an alarm sucking all it can for that camera and zma tring to keep up?
Try looking at the pictures available from the last event to determin if there is something happening to put the camera in alarm state (a niegbor maybe going to work).
Regards,
Cordel

Things that make you go hummmm :?

Posted: Tue Jul 12, 2005 8:42 pm
by Ruler
The daily cron is set for 4:40 am - 6 to 8 minutes before all the crashes have occurred. I'm thinking of just removing the logrotate job; other than the log files growing large, would this have any bad effects?

There are events recorded all night long, even when there is no motion visible on the screen. I'm guessing that the camera's night vision function causes the picture to become grainy and the shifting grains trigger the motion. I've played with the motion settings, but have been unable to find some that don't produce all the false alarms. Currently, it does save significant drive space. When I rebuild the RAID array with more/bigger drives, I'm most likely going to change these to just straight record all night long. (Just like all the external ones.)

I didn't have the ignore max fps option checked, but I did see that record statistical info was - I unchecked this in the hope that it'll lighten the load on the server and prevent these panics. (Assuming that the problem is too high of a load.)

I really appreciate the ideas cordel. At this point, I'm just trying to come up with viable theories that may be the cause and test them as I've about exhausted the troubleshooting that I know how to do.

Posted: Tue Jul 12, 2005 8:48 pm
by jameswilson
as you hink it might be load related what load are you currently averaging?