Closed
Bug 1341621
Opened 8 years ago
Closed 8 years ago
Running mochitests with --dmd can cause hang on startup
Categories
(Core :: DMD, defect)
Tracking
RESOLVED
FIXED
mozilla54
| | Tracking | Status |
|---|---|---|
| firefox54 | --- | fixed |
People
(Reporter: kats, Assigned: erahm)
Attachments
(4 files)
On a m-c build with the patch from bug 1341450 applied, run:
DMD="--mode=live" ./mach mochitest --subsuite webgl --dmd
(for completeness, I was running the mach command inside xvfb-run, but I don't think that should make a difference)
This should run the mochitests with DMD enabled (i.e. libdmd.so preloaded) with DMD running in "live mode". And it does. However, it runs into a problem: When running the webgl subsuite, it sometimes hangs on start. Running `ps ef` shows a process tree where firefox has just spawned another firefox process, and when I attach to that child process I get the attached backtrace.
It seems like maybe the firefox process is forking while some thread is holding on to a DMD mutex, and that results in a deadlock? I'm not really sure. It's intermittent but happens > 50% of the time for me.
Comment 1 • 8 years ago (Assignee)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0)
> It seems like maybe the firefox process is forking while some thread is
> holding on to a DMD mutex, and that results in a deadlock? I'm not really sure.
> It's intermittent but happens > 50% of the time for me.
I'd be really surprised if we're just straight up forking here, as NUWA showed that's hard to do right. Kats looked a bit further and it appears to be fork/execve [1], so that hopefully isn't the issue.
Looking at the code, we're not doing any error checking [2]; adding some assertions there might help. My guess is that it's more of a standard deadlock (maybe doing a measurement while allocating memory in another thread), and looking at the other threads might give a clue. I'll see if I can repro locally.
[1] http://searchfox.org/mozilla-central/rev/39e4b25a076c59d2e7820297d62319f167871449/ipc/chromium/src/base/process_util_linux.cc#177-199
[2] http://searchfox.org/mozilla-central/rev/39e4b25a076c59d2e7820297d62319f167871449/memory/replace/dmd/DMD.cpp#453-456
Comment 2 • 8 years ago (Assignee)
I can't repro this on my machine (Ubuntu 16.04, clang 3.9.0). kats noted he's using |--enable-webrender| so it may be specific to that. I tested with both live mode and dark matter mode.
kats, a couple of questions:
#1 - Does this repro w/o |--enable-webrender|?
#2 - Can you look at other threads when the deadlock happens? I'm interested in any other thread with a |bt| that runs through DMD.cpp
Flags: needinfo?(bugmail)
Comment 3 • 8 years ago (Reporter)
(In reply to Eric Rahm [:erahm] from comment #2)
> kats, a couple of questions:
> #1 - Does this repro w/o |--enable-webrender|?
I'll check this tomorrow (leaving needinfo on me for this).
> #2 - Can you look at other threads when the deadlock happens? I'm interested
> in any other thread with a |bt| that runs through DMD.cpp
With my current setup, when this happens, there are no other threads. At least none that gdb can find. See attached console session.
Comment 4 • 8 years ago (Reporter)
I guess this behaviour might be explained if we fork() while a different thread is holding the DMD lock. The child of the fork will basically inherit the locked object and it will never be unlocked, per [1]. But I would expect the child process stack to look different, not with JS and gtk stuff on the stack. I don't know what that's about.
[1] http://stackoverflow.com/a/1235578
Comment 5 • 8 years ago (Reporter)
I'm doing a DMD build without webrender now, but I did some more debugging on the webrender-enabled build. I ran the mochitest command with --debugger=gdb and set a breakpoint on fork. The first fork call was at [1]. The second one came from inside [2]. The third one, though, had the same stack as in the previous attachments. Therefore we can conclude that this third fork call is the one that spins up the deadlocking child process. I also did a `thread apply all bt` to see where the other threads were at this (third) fork call - see attachment. I didn't catch any of them inside the DMD code in that instance, but there were 26 threads, so it's not surprising that sometimes one of them might be in the middle of an allocation and holding on to the DMD lock when this fork runs.
[1] http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/toolkit/xre/glxtest.cpp#313
[2] http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/widget/gtk/WakeLockListener.cpp#315
Comment 6 • 8 years ago (Reporter)
And yes, I can repro without webrender. Mozconfig at [1], exact command I'm running is
DMD="--mode=live" xr ./mach mochitest --subsuite webgl --dmd
(where "xr" is [2])
[1] https://github.com/staktrace/moz-scripts/blob/master/mozconfig.Linux-host-ndebug-dmd
[2] https://github.com/staktrace/moz-scripts/blob/master/xr
Flags: needinfo?(bugmail)
Comment 7 • 8 years ago (Reporter)
So, in summary: when firefox starts up, gtk does a fork/exec to start ibus (which snorp says is some sort of input messaging thing). The code to do the fork/exec can be seen in [1]. After the fork, but before the exec, there are various codepaths, some of which can trigger a call to opendir(). That's what we see happening in the child process stack. The opendir() call calls malloc, which goes into DMD and tries to acquire the lock. This is all after the fork, so the DMD lock inherits whatever state it had at the time of the fork in the parent. But it's before the exec, so before the state is cleared.
[1] https://github.com/GNOME/glib/blob/master/glib/gspawn.c
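The mechanism summarized above can be reproduced in isolation. The following is a minimal sketch, not DMD code: `gStateLock`, `Holder`, and `ChildInheritsHeldLock` are hypothetical names, and the child uses `pthread_mutex_trylock` only so that the demonstration reports the inherited state instead of hanging the way a blocking lock inside malloc would.

```cpp
#include <atomic>
#include <cassert>  // only needed by callers that assert on the result
#include <cerrno>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for DMD's state lock.
static pthread_mutex_t gStateLock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> gLockHeld{false};
static std::atomic<bool> gRelease{false};

// Simulates a thread that is in the middle of an allocation and
// therefore holds the lock at the moment another thread forks.
static void* Holder(void*) {
  pthread_mutex_lock(&gStateLock);
  gLockHeld.store(true);
  while (!gRelease.load()) {
    usleep(1000);
  }
  pthread_mutex_unlock(&gStateLock);
  return nullptr;
}

// Returns true if the forked child finds the lock already held.
bool ChildInheritsHeldLock() {
  gLockHeld.store(false);
  gRelease.store(false);

  pthread_t holder;
  if (pthread_create(&holder, nullptr, Holder, nullptr) != 0) {
    return false;
  }
  while (!gLockHeld.load()) {
    usleep(1000);
  }

  pid_t pid = fork();
  if (pid == 0) {
    // Only the forking thread exists in the child. Holder was not
    // copied, so nothing here will ever unlock gStateLock; a blocking
    // pthread_mutex_lock at this point would wait forever.
    int rv = pthread_mutex_trylock(&gStateLock);
    _exit(rv == EBUSY ? 1 : 0);
  }

  int status = 0;
  bool sawHeldLock = false;
  if (pid > 0 && waitpid(pid, &status, 0) == pid) {
    sawHeldLock = WIFEXITED(status) && WEXITSTATUS(status) == 1;
  }

  // Let the holder thread finish in the parent.
  gRelease.store(true);
  pthread_join(holder, nullptr);
  return sawHeldLock;
}
```

Calling `ChildInheritsHeldLock()` from a test program should return true on a typical Linux/glibc setup, which matches the hang seen when opendir() calls malloc between the fork and the exec.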
Comment 8 • 8 years ago (Assignee)
Per glandium's excellent in-code comment, it looks like we work around the same issue in |LogAlloc| [1] by acquiring the state lock prior to forking and releasing it after forking. I'm just going to copy that and see if it works.
[1] http://searchfox.org/mozilla-central/rev/60ae6514e4c559c0c234f0e7aefccb101b8beb2e/memory/replace/logalloc/LogAlloc.cpp#82-111
Comment 9 • 8 years ago (Assignee)
To avoid the possibility of a deadlock when the DMD state lock is held at the
time of a fork, a |pthread_atfork| hook is added that waits for and acquires
the lock prior to forking, then releases it after forking.
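As an illustration of the approach described above, here is a minimal sketch of the |pthread_atfork| pattern. It is not the attached patch: `gStateLock`, `InstallForkHooks`, `Worker`, and `ForkedChildrenCanLock` are hypothetical names, and the worker thread merely simulates other threads taking the lock during allocations.

```cpp
#include <atomic>
#include <cassert>  // only needed by callers that assert on the results
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for DMD's state lock.
static pthread_mutex_t gStateLock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> gStop{false};

// Runs in the forking thread before fork(): wait for and take the lock
// so that no other thread can be holding it when the process is copied.
static void PrepareFork() { pthread_mutex_lock(&gStateLock); }

// Runs after fork() in both the parent and the child: release the lock.
static void PostFork() { pthread_mutex_unlock(&gStateLock); }

bool InstallForkHooks() {
  return pthread_atfork(PrepareFork, PostFork, PostFork) == 0;
}

// Simulates threads that take the lock briefly, as allocations would.
static void* Worker(void*) {
  while (!gStop.load()) {
    pthread_mutex_lock(&gStateLock);
    pthread_mutex_unlock(&gStateLock);
    usleep(50);
  }
  return nullptr;
}

// Forks aForks times while Worker contends for the lock. Each child
// takes the lock with a blocking call, as malloc would between fork
// and exec. Returns true if every child exits cleanly.
bool ForkedChildrenCanLock(int aForks) {
  gStop.store(false);
  pthread_t worker;
  if (pthread_create(&worker, nullptr, Worker, nullptr) != 0) {
    return false;
  }

  bool ok = true;
  for (int i = 0; i < aForks && ok; i++) {
    pid_t pid = fork();
    if (pid == 0) {
      pthread_mutex_lock(&gStateLock);
      pthread_mutex_unlock(&gStateLock);
      _exit(0);
    }
    int status = 0;
    ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
         WIFEXITED(status) && WEXITSTATUS(status) == 0;
  }

  gStop.store(true);
  pthread_join(worker, nullptr);
  return ok;
}
```

The hooks must be installed once, before the first fork, for example `InstallForkHooks()` followed by `ForkedChildrenCanLock(20)`. Without the hooks, the same loop can leave a child blocked on a lock that nothing in the child will release.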
Attachment #8840625 -
Flags: review?(n.nethercote)
Updated • 8 years ago (Assignee)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Comment 10 • 8 years ago (Assignee)
kats, would you mind testing attachment 8840625?
Flags: needinfo?(bugmail)
Comment 12 • 8 years ago
Comment on attachment 8840625
Avoid deadlocks in DMD when forking
Review of attachment 8840625:
Thank you for the fix.
Attachment #8840625 -
Flags: review?(n.nethercote) → review+
Comment 13 • 8 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d35b6c817c2e53e954755cdd1584c2911a33209
Bug 1341621 - Avoid deadlocks in DMD when forking. r=njn
Comment 14 • 8 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54