zmc Segfault

Posted: Wed Dec 11, 2024 9:27 pm
by RBlumel2
I'm having "zmc -m4" segfault and don't understand why. What information is needed to help sort this out? I'm not seeing much more in the logs.

Running 1.37.65~20241210.72-bookworm

zmc[235875]: segfault at b8 ip 000056532a5c1035 sp 00007ffe33083d20 error 6 in zmc[95035,56532a54d000+1a5000] likely on CPU 2 (core 2, socket 0)
[62747.763727] Code: ff ff ff 48 89 c7 48 89 45 c8 e8 c6 56 fa ff 85 c0 0f 8f ae 01 00 00 49 8b 04 24 48 8b 1b 48 8b 00 49 89 04 24 4c 39 fb 75 c1 <41> c6 85 b8 00 00 00 01 49 8d 7d 28 e8 3a d0 f8 ff 49 8b 45 68 48

==235666== Memcheck, a memory error detector
==235666== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==235666== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==235666== Command: zmc -m4
==235666== 
12/11/24 13:22:33.791725 zmc_m4[235666].WAR-zmc.cpp/244 [Couldn't connect to monitor 4]

==235666== Invalid write of size 8
==235666==    at 0x134448: UnknownInlinedFun (chrono.h:212)
==235666==    by 0x134448: UnknownInlinedFun (chrono.h:260)
==235666==    by 0x134448: UnknownInlinedFun (chrono.h:1131)
==235666==    by 0x134448: SetStartupTime (zm_monitor.h:853)
==235666==    by 0x134448: main (zmc.cpp:247)
==235666==  Address 0x78 is not stack'd, malloc'd or (recently) free'd
==235666== 
==235666== 
==235666== Process terminating with default action of signal 11 (SIGSEGV)
==235666==  Access not within mapped region at address 0x78
==235666==    at 0x134448: UnknownInlinedFun (chrono.h:212)
==235666==    by 0x134448: UnknownInlinedFun (chrono.h:260)
==235666==    by 0x134448: UnknownInlinedFun (chrono.h:1131)
==235666==    by 0x134448: SetStartupTime (zm_monitor.h:853)
==235666==    by 0x134448: main (zmc.cpp:247)
==235666==  If you believe this happened as a result of a stack
==235666==  overflow in your program's main thread (unlikely but
==235666==  possible), you can try to increase the size of the
==235666==  main thread stack using the --main-stacksize= flag.
==235666==  The main thread stack size used in this run was 8388608.
==235666== 
==235666== HEAP SUMMARY:
==235666==     in use at exit: 8,936,854 bytes in 4,117 blocks
==235666==   total heap usage: 7,099 allocs, 2,982 frees, 9,447,260 bytes allocated
==235666== 
==235666== LEAK SUMMARY:
==235666==    definitely lost: 0 bytes in 0 blocks
==235666==    indirectly lost: 0 bytes in 0 blocks
==235666==      possibly lost: 816 bytes in 3 blocks
==235666==    still reachable: 8,934,022 bytes in 4,093 blocks
==235666==         suppressed: 0 bytes in 0 blocks
==235666== Rerun with --leak-check=full to see details of leaked memory
==235666== 
==235666== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==235666== 
==235666== 1 errors in context 1 of 1:
==235666== Invalid write of size 8
==235666==    at 0x134448: UnknownInlinedFun (chrono.h:212)
==235666==    by 0x134448: UnknownInlinedFun (chrono.h:260)
==235666==    by 0x134448: UnknownInlinedFun (chrono.h:1131)
==235666==    by 0x134448: SetStartupTime (zm_monitor.h:853)
==235666==    by 0x134448: main (zmc.cpp:247)
==235666==  Address 0x78 is not stack'd, malloc'd or (recently) free'd
==235666== 
==235666== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault

Re: zmc Segfault

Posted: Thu Dec 12, 2024 4:49 pm
by RBlumel2
OK, I got a core dump.
I'm no programmer, but what I gather is that some time/date is getting messed up and overflowing?

Reading symbols from zmc...
Reading symbols from /usr/lib/debug/.build-id/c2/4610bbbd50ef4082a460a81f037b585ed92075.debug...
[New LWP 485121]
[New LWP 485122]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `zmc -m4'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005631eebbe448 in std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1l> >, std::ratio<1l, 1000000000l>, long, true, false>::__cast<long, std::ratio<1l, 1000000000l> > (__d=...) at /usr/include/c++/12/bits/chrono.h:212
212		      static_cast<_CR>(__d.count()) / static_cast<_CR>(_CF::den)));
[Current thread is 1 (Thread 0x7fa1019183c0 (LWP 485121))]
(gdb) bt
#0  0x00005631eebbe448 in std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1l> >, std::ratio<1l, 1000000000l>, long, true, false>::__cast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&)
    (__d=<optimized out>) at /usr/include/c++/12/bits/chrono.h:212
#1  std::chrono::duration_cast<std::chrono::duration<long, std::ratio<1l, 1l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) (__d=<optimized out>) at /usr/include/c++/12/bits/chrono.h:260
#2  std::chrono::_V2::system_clock::to_time_t(std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) (__t=<optimized out>) at /usr/include/c++/12/bits/chrono.h:1131
#3  Monitor::SetStartupTime(std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) (time=..., this=0x5631f1fe1bf0) at ./src/zm_monitor.h:853
#4  main(int, char**) (argc=<optimized out>, argv=<optimized out>) at ./src/zmc.cpp:247

Re: zmc Segfault

Posted: Thu Dec 12, 2024 5:06 pm
by iconnor
void SetStartupTime(SystemTimePoint time) { shared_data->startup_time = std::chrono::system_clock::to_time_t(time); }

The only thing I can think of is that shared_data is not valid, meaning we haven't set up the shared memory space. There must have been a warning immediately before the crash saying "Couldn't connect to monitor".

The code in zmc is clearly wrong: if connecting fails, we still go on to set a value in the shm. I will change the if to a while so that it retries every second.

Try the latest build, but you will still need to investigate why it couldn't get the shm.

Re: zmc Segfault

Posted: Thu Dec 12, 2024 5:21 pm
by RBlumel2
OK, I'll double-check why shm isn't working. I had an issue that I thought I had resolved by making systemd run it as user www-data, though I don't know whether that is best practice. I'll use strace on it and make sure it opens the shm.

Re: zmc Segfault

Posted: Thu Dec 12, 2024 9:25 pm
by RBlumel2
OK, I ran the updated version and the segfault is gone. Yay! And yes, I still had an issue with permissions on /dev/shm. The systemd file was updated, and the "User www-data" line is commented out. Is there a reason the package has that line commented out?

Re: zmc Segfault

Posted: Thu Dec 12, 2024 9:51 pm
by iconnor
I think it's because it isn't needed; how things get started up under systemd is a little convoluted.
It is commented out on mine and everything works fine.

Is there enough space in /dev/shm? Turn on debugging and look in the logs.