Closed Bug 1637377 Opened 4 years ago Closed 4 years ago

Process hang dumping gcda file

Categories

(Testing :: Code Coverage, defect, P2)

Version 3
defect

Tracking

(firefox79 fixed)

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: tsmith, Assigned: calixte)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

This log is from a process that held a gcda file open for > 15 seconds and appead to be hung (and idle). I sent SIGABRT the process to get the log.

rax = 0xfffffffffffffffc   rdx = 0x00007fff60eac230
rcx = 0x00007fedc09ff70b   rbx = 0x000000000000001e
rsi = 0x0000000000000007   rdi = 0x000000000000001e
rbp = 0x000000000000001e   rsp = 0x00007fff60eac1c0
r8 = 0x0000000000000000    r9 = 0x00007fed8bdc8800
r10 = 0x0000000000000414   r11 = 0x0000000000000293
r12 = 0x00007fff60eac230   r13 = 0x00007fedb2029ac0
r14 = 0x00007fedb20332e8   r15 = 0x00007fedb3f4c988
rip = 0x00007fedc09ff70b
OS|Linux|0.0.0 Linux 4.15.0-96-generic #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 x86_64
CPU|amd64|family 6 model 158 stepping 9|4
GPU|||
Crash|SIGABRT|0x3e800009b16|0
0|0|libpthread.so.0|__libc_fcntl|/build/glibc-LK5gWL/glibc-2.23/nptl/../sysdeps/unix/sysv/linux/fcntl.c|75|0xf
0|1|libxul.so|__gcov_open|/builds/worker/fetches/gcc-source/gcc/gcov-io.c|168|0x11
0|2|libxul.so|gcov_do_dump|/builds/worker/fetches/gcc-source/libgcc/libgcov-driver.c|850|0xdb
0|3|libxul.so|__gcov_dump_one|/builds/worker/fetches/gcc-source/libgcc/libgcov-driver.c|871|0xe
0|4|libxul.so|__gcov_dump_int|/builds/worker/fetches/gcc-source/libgcc/libgcov-interface.c|164|0x8
0|5|libxul.so|__gcov_flush|/builds/worker/fetches/gcc-source/libgcc/libgcov-interface.c|88|0x5
0|6|libxul.so|mozilla::CodeCoverageHandler::FlushCounters()|hg:hg.mozilla.org/mozilla-central:tools/code-coverage/CodeCoverageHandler.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|49|0x5
0|7|libxul.so|mozilla::CodeCoverageHandler::FlushCountersSignalHandler(int)|hg:hg.mozilla.org/mozilla-central:tools/code-coverage/CodeCoverageHandler.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|97|0x12
0|8|libpthread.so.0||||0x11390
0|9|libxul.so|mozilla::EventTargetChainItem::HandleEventTargetChain(nsTArray<mozilla::EventTargetChainItem>&, mozilla::EventChainPostVisitor&, mozilla::EventDispatchingCallback*, mozilla::ELMCreationDetector&)|hg:hg.mozilla.org/mozilla-central:dom/events/EventDispatcher.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|628|0x14
0|10|libxul.so|mozilla::EventDispatcher::Dispatch(nsISupports*, nsPresContext*, mozilla::WidgetEvent*, mozilla::dom::Event*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*)|hg:hg.mozilla.org/mozilla-central:dom/events/EventDispatcher.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1055|0x21
0|11|libxul.so|mozilla::PresShell::EventHandler::DispatchEventToDOM(mozilla::WidgetEvent*, nsEventStatus*, nsPresShellEventCB*)|hg:hg.mozilla.org/mozilla-central:layout/base/PresShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|8387|0x21
0|12|libxul.so|mozilla::PresShell::EventHandler::DispatchEvent(mozilla::EventStateManager*, mozilla::WidgetEvent*, bool, nsEventStatus*, nsIContent*)|hg:hg.mozilla.org/mozilla-central:layout/base/PresShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|7933|0x12
0|13|libxul.so|mozilla::PresShell::EventHandler::HandleEventWithCurrentEventInfo(mozilla::WidgetEvent*, nsEventStatus*, bool, nsIContent*)|hg:hg.mozilla.org/mozilla-central:layout/base/PresShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|7866|0x5
0|14|libxul.so|mozilla::PresShell::EventHandler::HandleEventUsingCoordinates(nsIFrame*, mozilla::WidgetGUIEvent*, nsEventStatus*, bool)|hg:hg.mozilla.org/mozilla-central:layout/base/PresShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|6804|0x1b
0|15|libxul.so|mozilla::PresShell::EventHandler::HandleEvent(nsIFrame*, mozilla::WidgetGUIEvent*, bool, nsEventStatus*)|hg:hg.mozilla.org/mozilla-central:layout/base/PresShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|6608|0x15
0|16|libxul.so|mozilla::PresShell::HandleEvent(nsIFrame*, mozilla::WidgetGUIEvent*, bool, nsEventStatus*)|hg:hg.mozilla.org/mozilla-central:layout/base/PresShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|6533|0x5
0|17|libxul.so|nsViewManager::DispatchEvent(mozilla::WidgetGUIEvent*, nsView*, nsEventStatus*)|hg:hg.mozilla.org/mozilla-central:view/nsViewManager.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|750|0x13
0|18|libxul.so|nsView::HandleEvent(mozilla::WidgetGUIEvent*, bool)|hg:hg.mozilla.org/mozilla-central:view/nsView.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1133|0x16
0|19|libxul.so|mozilla::widget::PuppetWidget::DispatchEvent(mozilla::WidgetGUIEvent*, nsEventStatus&)|hg:hg.mozilla.org/mozilla-central:widget/PuppetWidget.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|381|0x27
0|20|libxul.so|mozilla::layers::APZCCallbackHelper::DispatchWidgetEvent(mozilla::WidgetGUIEvent&)|hg:hg.mozilla.org/mozilla-central:gfx/layers/apz/util/APZCCallbackHelper.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|458|0x13
0|21|libxul.so|mozilla::dom::BrowserChild::HandleRealMouseButtonEvent(mozilla::WidgetMouseEvent const&, mozilla::layers::ScrollableLayerGuid const&, unsigned long const&)|hg:hg.mozilla.org/mozilla-central:dom/ipc/BrowserChild.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1654|0x29
0|22|libxul.so|mozilla::dom::BrowserChild::ProcessPendingCoalescedMouseDataAndDispatchEvents()|hg:hg.mozilla.org/mozilla-central:dom/ipc/BrowserChild.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1500|0x15
0|23|libxul.so|mozilla::dom::BrowserChild::RecvRealMouseButtonEvent(mozilla::WidgetMouseEvent const&, mozilla::layers::ScrollableLayerGuid const&, unsigned long const&)|hg:hg.mozilla.org/mozilla-central:dom/ipc/BrowserChild.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1618|0x8
0|24|libxul.so|mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message const&)|s3:gecko-generated-sources:baeb833f5dfd0f57cd3224c436f231ada6753d8132103cea7351cb1fd5035ee5cf264d443e840060ccaaea73712fd3c8c123b492deb13226e4ebbd30f630a5e0/ipc/ipdl/PBrowserChild.cpp:|5092|0x1b
0|25|libxul.so|mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&)|s3:gecko-generated-sources:ab35ef2e9e95379af447a58a521724375ec2d80b867885b9579a470dbd6238da9e84a1c3f8ee7182bf9d020a68d38fb6bce0901b15085cddf93fb006d87728ae/ipc/ipdl/PContentChild.cpp:|8476|0x9
0|26|libxul.so|mozilla::dom::ContentChild::OnMessageReceived(IPC::Message const&)|hg:hg.mozilla.org/mozilla-central:dom/ipc/ContentChild.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|3621|0xb
0|27|libxul.so|mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessageChannel.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|2186|0x6
0|28|libxul.so|mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessageChannel.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|2110|0xe
0|29|libxul.so|mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessageChannel.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1958|0xb
0|30|libxul.so|mozilla::ipc::MessageChannel::MessageTask::Run()|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessageChannel.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1989|0xc
0|31|libxul.so|nsThread::ProcessNextEvent(bool, bool*)|hg:hg.mozilla.org/mozilla-central:xpcom/threads/nsThread.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|1207|0x6
0|32|libxul.so|NS_ProcessNextEvent(nsIThread*, bool)|hg:hg.mozilla.org/mozilla-central:xpcom/threads/nsThreadUtils.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|489|0xe
0|33|libxul.so|mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessagePump.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|109|0xd
0|34|libxul.so|mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessagePump.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|270|0xb
0|35|libxul.so|MessageLoop::Run()|hg:hg.mozilla.org/mozilla-central:ipc/chromium/src/base/message_loop.cc:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|290|0x5
0|36|libxul.so|nsBaseAppShell::Run()|hg:hg.mozilla.org/mozilla-central:widget/nsBaseAppShell.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|137|0x16
0|37|libxul.so|XRE_RunAppShell()|hg:hg.mozilla.org/mozilla-central:toolkit/xre/nsEmbedFunctions.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|909|0x6
0|38|libxul.so|mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessagePump.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|237|0x5
0|39|libxul.so|MessageLoop::Run()|hg:hg.mozilla.org/mozilla-central:ipc/chromium/src/base/message_loop.cc:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|290|0x5
0|40|libxul.so|XRE_InitChildProcess(int, char**, XREChildData const*)|hg:hg.mozilla.org/mozilla-central:toolkit/xre/nsEmbedFunctions.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|740|0x8
0|41|libxul.so|mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*)|hg:hg.mozilla.org/mozilla-central:toolkit/xre/Bootstrap.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|67|0x5
0|42|firefox-bin|content_process_main(mozilla::Bootstrap*, int, char**)|hg:hg.mozilla.org/mozilla-central:ipc/contentproc/plugin-container.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|56|0x13
0|43|firefox-bin|main|hg:hg.mozilla.org/mozilla-central:browser/app/nsBrowserApp.cpp:61a83cc0b74b43117a9fa6d92c3d693ea03bbffc|303|0x13
0|44|libc.so.6|__libc_start_main|/build/glibc-LK5gWL/glibc-2.23/csu/../csu/libc-start.c|291|0x1a
0|45|firefox-bin|_start|||0x29
0|46|||||0x7fff60eb3f18

This does not seem to be test case dependent.

Here are the STR to reproduce what we are seeing while fuzzing:

  1. Install grizzly pip install grizzly-framework
  2. Set GCOV_PREFIX_STRIP= and GCOV_PREFIX= if needed
  3. Run Grizzly with coverage build python3 -m grizzly <coverage-build>/firefox no-op -c 1 --ignore timeout -p ~prefs.js --coverage

The browser process will hang after a few iterations. The no-op adapter opens and closes a window nothing special.

You will see output like this when the issue has been reproduced:
[2020-05-20 10:55:56] gcda file open by pid 7982 after 15.52s

Attached file prefs.js

Here is a pref.js file to use when testing.

Do you have the same issue in using clang ?
If not, is it an option for you to switch clang ?

Is there a coverage build on TaskCluster that isn't built with FORCE_GCC=1? If not why not? does it work? is it supported?

(In reply to Tyson Smith [:tsmith] from comment #4)

Is there a coverage build on TaskCluster that isn't built with FORCE_GCC=1? If not why not? does it work? is it supported?

There is linux64-asan-fuzzing-ccov/opt which is using Clang.
The normal coverage build is still using GCC, but we are planning to switch to Clang soon (there were a couple of concurrency bugs that we had to fix in LLVM first).
We could try switching linux64-fuzzing-ccov/opt to Clang now, maybe you won't hit those problems that we were hitting (and that should be fixed in Clang 10).

Truber created a try build switching the fuzzing ccov builds to Clang. I am seeing the same issue.

On the plus side it is very easy to reproduce. I launch the browser and then send SIGUSR1 to all the process that get opened. Doing this 2x triggered the hang for me: kill -10 <ppid> && kill -10 <ppid> ... kill -10 <pid>

The severity field is not set for this bug.
:ekyle, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(klahnakoski)

(In reply to Tyson Smith [:tsmith] from comment #6)

Truber created a try build switching the fuzzing ccov builds to Clang. I am seeing the same issue.

On the plus side it is very easy to reproduce. I launch the browser and then send SIGUSR1 to all the process that get opened. Doing this 2x triggered the hang for me: kill -10 <ppid> && kill -10 <ppid> ... kill -10 <pid>

What is the stack trace with Clang?

(NOTE: Calixte tried to reproduce but wasn't able to)

(In reply to Marco Castelluccio [:marco] from comment #8)

What is the stack trace with Clang?

Using the try build from comment #6 I get the following when I send a SIGABRT when the process is hung (SIGABRT is sent to the process with an open gcda file):

rax = 0xfffffffffffffffc   rdx = 0x0000000000000002
rcx = 0x00007f770649126d   rbx = 0x000055c12b1ddb40
rsi = 0x0000000000000000   rdi = 0x00007f77068a6000
rbp = 0x00007ffd6192dbe8   rsp = 0x00007ffd6192d4c8
r8 = 0x00007f77068a6000    r9 = 0x000000000000002a
r10 = 0x0000000000000000   r11 = 0x0000000000000246
r12 = 0x000055c12b1ddb40   r13 = 0x00000000ffffffff
r14 = 0x000055c12a877a50   r15 = 0x0000000000000004
rip = 0x00007f770649126d
OS|Linux|0.0.0 Linux 4.15.0-101-generic #102~16.04.1-Ubuntu SMP Mon May 11 11:38:16 UTC 2020 x86_64
CPU|amd64|family 6 model 158 stepping 9|4
GPU|||
Crash|SIGABRT|0x3e80000ea73|0
0|0|libpthread.so.0|__lll_lock_wait|||0x1d
0|1|libpthread.so.0|__pthread_mutex_lock|/build/glibc-LK5gWL/glibc-2.23/nptl/../nptl/pthread_mutex_lock.c|115|0x1e
0|2|libxul.so|FlushCounters|hg:hg.mozilla.org/try:tools/code-coverage/CodeCoverageHandler.cpp:cc16669f7b54134328a1e010050ebd49a9e48459|47|0x77
0|3|libpthread.so.0||||0x11390
0|4|libc.so.6|__poll_nocancel|||0x24
0|5|libc.so.6|__poll_nocancel|||0x24
0|6|||||0x2b000000000033
0|7|libxul.so|Dispatch|hg:hg.mozilla.org/try:dom/events/EventDispatcher.cpp:cc16669f7b54134328a1e010050ebd49a9e48459|1124|0x88

(In reply to Marco Castelluccio [:marco] from comment #9)

(NOTE: Calixte tried to reproduce but wasn't able to)

Did he/you try with (comment #1) or without (comment #6) Grizzly?

I finally managed to reproduce the issue with gcc-7 and grizzly.
I added a little more logging in grizzly and here's what I get:

D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54567 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54599 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54642 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54688 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54735 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54751 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54766 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54778 (child)
D grizzly [2020-05-28 11:27:01] Sending SIGUSR1 to 54542 (parent)
D grizzly [2020-05-28 11:27:01] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:02] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:02] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:02] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:03] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:04] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:04] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:05] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:06] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:07] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:08] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:09] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:10] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:11] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:12] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:14] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:15] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:16] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
D grizzly [2020-05-28 11:27:17] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=31, position=0, mode='r+', flags=32770): 54642, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770): 54778}
[popenfile(path='/tmp/ffplogs_kfw271gb/ffp_log_l9spp6c6.txt', fd=1, position=976, mode='w', flags=32769), popenfile(path='/tmp/ffplogs_kfw271gb/ffp_log_omb666ta.txt', fd=2, position=4745, mode='w', flags=32769), popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dom/html/Unified_cpp_dom_html4.gcda', fd=5, position=80188, mode='r+', flags=32770)]
W grizzly [2020-05-28 11:27:17] gcda file open by pid 54642 after 15.34s

So according to position, process 54642 is waiting for a file lock and for any reason process 54778 is stuck.
Normally , because of https://searchfox.org/mozilla-central/source/tools/code-coverage/CodeCoverageHandler.cpp#47, we shouldn't have two processes (or maybe I'm misunderstanding what CrossProcessMutexAutoLock is) trying to flush at the same time: so for now I don't understand how we can be in such a situation.

In adding more logging, I get:

D sphr_loadmgr [2020-05-28 16:35:01] shutting down and cleaning up workers
D sapphire [2020-05-28 16:35:01] status: 0, timeout: False
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92123 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92123, name='Socket Process', started='16:34:46')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92150 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92150, name='GPU Process', started='16:34:47')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92204 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92204, name='file:// Content', started='16:34:49')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92246 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92246, name='WebExtensions', started='16:34:52')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92301 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92301, name='GPU Process', started='16:34:54')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92323 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92323, name='GPU Process', started='16:34:56')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92339 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92339, name='GPU Process', started='16:34:58')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92354 (child)
D grizzly [2020-05-28 16:35:01] psutil.Process(pid=92354, name='IPC Launch', started='16:35:00')
D grizzly [2020-05-28 16:35:01] Sending SIGUSR1 to 92098 (parent)
D grizzly [2020-05-28 16:35:01] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/layout/base/Unified_cpp_layout_base1.gcda', fd=5, position=170300, mode='r+', flags=32770): 92354}
D grizzly [2020-05-28 16:35:01] {popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/layout/base/Unified_cpp_layout_base1.gcda', fd=32, position=0, mode='r+', flags=32770): 92204, popenfile(path='/home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/layout/base/Unified_cpp_layout_base1.gcda', fd=5, position=170300, mode='r+', flags=32770): 92354}

The firefox stderr:

[ffpuppet] Launch command: /home/calixte/dev/mozilla/mozilla-central.hg/obj-x86_64-pc-linux-gnu/dist/bin/firefox -no-remote -profile /home/calixte/tmp/gz/ffprof_0c1pzm1i http://127.0.0.1:21489

[CodeCoverage] Setting handlers for process 92098.
[CodeCoverage] Setting handlers for process 92204.
[CodeCoverage] Setting handlers for process 92246.
...

[CodeCoverage] Requested flush for 92204.
[CodeCoverage] Requested flush for 92246.
[CodeCoverage] Requested flush for 92098.
[CodeCoverage] flush completed.
[CodeCoverage] flush completed.
[CodeCoverage] flush completed.
[CodeCoverage] Requested flush for 92204.
[CodeCoverage] Requested flush for 92246.
[CodeCoverage] Requested flush for 92354.
[CodeCoverage] Requested flush for 92098.

So it's interesting to notice that process 92354 doesn't appear in Setting handlers... but a flush is requested for it and it's involved in the double writing in file "Unified_cpp_layout_base1.gcda" which should be impossible if the 2 processes are really using the same mutex.

Severity: -- → S3
Flags: needinfo?(klahnakoski)
Priority: -- → P2

My thoughts:

The bug is reproducable with clang.
I added some logging in clang and here's what I got (DUMP is called at exit or before exec** functions when FLUSH is a user request):

[CLANG] DUMP start (pid=176260)
[CodeCoverage] Requested flush for 176116.
[CLANG] FLUSH start (pid=176116)
[CodeCoverage] Requested flush for 176164.
[CodeCoverage] Requested flush for 176260.
[CodeCoverage] Requested flush for 176031.

and the file lock issue is between process 176260 (at position 212828 in foo.gcda) and process 176116 (at position 0 in foo.gcda too).
So I think the explanation in comment 14 is the correct one.

Assignee: nobody → cdenizet

Gcc and Clang dumps gcda files just before an exec** or fork functions.
With ccov enabled, we can dump using a SIGUSR1 but if we're in the middle of dump (because of exec** or fork)
then a gcda file can stay locked and then another process can try to get a lock on it for ever.
So to avoid such a situation, we remove the SIGUSR1 handler just before the fork an set it back just after.

Flags: needinfo?(twsmith)

This build runs much better however it does still hang eventually. Calixte mentioned that it is likely a separate issue. I have opened bug 1643055.

Flags: needinfo?(twsmith)
Pushed by cdenizet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9320c9fef153
Avoid dead lock on user-requested ccov dumps r=marco
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: