GeckoProfiler.Markers gtest crashes after changing tests to use socket process
Categories
(Core :: Networking, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox-esr78 | --- | wontfix |
firefox77 | --- | unaffected |
firefox78 | --- | wontfix |
firefox79 | --- | fixed |
firefox80 | --- | fixed |
People
(Reporter: brennan.brisad, Assigned: mozbugz)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [necko-triaged])
Attachments
(1 obsolete file)
The GeckoProfiler.Markers
gtest have started crashing on my Linux machine. I git-bisected the issue and found that https://hg.mozilla.org/mozilla-central/rev/e251c2af1161147109a3874f33e356b51bdb45d1 (of Bug 1602832) is the commit which broke it.
If I run the test,
$ ./mach gtest GeckoProfiler.Markers
<snip>
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from GeckoProfiler
[ RUN ] GeckoProfiler.Markers
it stays there with no continued progress.
This is what it looks if I do a process listing:
$ ps -ef f
UID PID PPID C STIME TTY STAT TIME CMD
<snip>
michael 18578 6027 2 20:47 pts/1 S+ 0:00 | | \_ python2.7 ./mach gtest GeckoProfiler.Markers
michael 19244 18578 13 20:47 pts/1 Sl+ 0:02 | | \_ /home/michael/gecko/objdir-frontend/dist/bin/firefox -uni
michael 19246 19244 0 20:47 pts/1 Z+ 0:00 | | \_ [firefox] <defunct>
<snip>
I don't know if it is related to this issue, but I thought it might be worth mentioning that a process listing on my computer also shows that my Nightly browser which I have been running for many days have gathered quite a few defunct processes:
michael 17643 17640 0 May31 ? Sl 201:12 \_ /home/michael/.local/nightly/firefox-bin
michael 17668 17643 0 May31 ? Z 1:40 \_ [GPU Process] <defunct>
michael 17708 17643 0 May31 ? Z 0:00 \_ [Socket Process] <defunct>
michael 17725 17643 0 May31 ? Z 0:28 \_ [Privileged Cont] <defunct>
michael 17778 17643 0 May31 ? Z 4:02 \_ [Web Content] <defunct>
michael 17783 17643 0 May31 ? Z 2:36 \_ [Web Content] <defunct>
michael 17787 17643 0 May31 ? Z 2:37 \_ [Web Content] <defunct>
michael 17856 17643 0 May31 ? Z 1:40 \_ [WebExtensions] <defunct>
michael 18083 17643 0 May31 ? Z 0:00 \_ [RDD Process] <defunct>
michael 24474 17643 0 May31 ? Z 0:22 \_ [Web Content] <defunct>
michael 9214 17643 0 Jun01 ? Z 0:01 \_ [Web Content] <defunct>
michael 9249 17643 0 Jun01 ? Z 0:00 \_ [Web Content] <defunct>
michael 9291 17643 0 Jun01 ? Z 0:00 \_ [pingsender] <defunct>
michael 9295 17643 0 Jun01 ? Z 0:00 \_ [firefox-bin] <defunct>
michael 9320 17643 0 Jun01 ? Z 8:07 \_ [GPU Process] <defunct>
michael 9359 17643 0 Jun01 ? Z 0:00 \_ [Socket Process] <defunct>
michael 9431 17643 0 Jun01 ? Z 5:12 \_ [Web Content] <defunct>
michael 9469 17643 0 Jun01 ? Z 8:10 \_ [Web Content] <defunct>
michael 9472 17643 0 Jun01 ? Z 4:43 \_ [Web Content] <defunct>
michael 9483 17643 0 Jun01 ? Z 6:17 \_ [Web Content] <defunct>
michael 9534 17643 0 Jun01 ? Z 1:22 \_ [Privileged Cont] <defunct>
michael 9602 17643 0 Jun01 ? Z 4:17 \_ [WebExtensions] <defunct>
michael 9737 17643 0 Jun01 ? Z 0:00 \_ [RDD Process] <defunct>
michael 9971 17643 0 Jun01 ? Z 1:27 \_ [Web Content] <defunct>
michael 12119 17643 0 Jun01 ? Z 1:54 \_ [Web Content] <defunct>
michael 15988 17643 0 Jun01 ? Z 1:21 \_ [Web Content] <defunct>
michael 25947 17643 0 Jun03 ? Z 0:10 \_ [Web Content] <defunct>
michael 27935 17643 0 Jun04 ? Z 0:00 \_ [pingsender] <defunct>
michael 27939 17643 0 Jun04 ? Z 0:00 \_ [firefox-bin] <defunct>
michael 27964 17643 0 Jun04 ? Z 0:17 \_ [GPU Process] <defunct>
michael 28001 17643 0 Jun04 ? Z 0:00 \_ [Socket Process] <defunct>
michael 28067 17643 0 Jun04 ? Z 0:06 \_ [Web Content] <defunct>
michael 28105 17643 0 Jun04 ? Z 0:51 \_ [Web Content] <defunct>
michael 28108 17643 0 Jun04 ? Z 0:36 \_ [Web Content] <defunct>
michael 28120 17643 0 Jun04 ? Z 0:54 \_ [Web Content] <defunct>
michael 28158 17643 0 Jun04 ? Z 0:03 \_ [Privileged Cont] <defunct>
michael 28239 17643 0 Jun04 ? Z 0:17 \_ [WebExtensions] <defunct>
michael 28285 17643 0 Jun04 ? Z 0:00 \_ [Web Content] <defunct>
michael 28406 17643 0 Jun04 ? Z 0:00 \_ [RDD Process] <defunct>
michael 29108 17643 0 Jun04 ? Z 0:00 \_ [pingsender] <defunct>
michael 29113 17643 0 Jun04 ? Z 0:00 \_ [firefox-bin] <defunct>
michael 29138 17643 0 Jun04 ? Sl 82:33 \_ /home/michael/.local/nightly/firefox-bin -contentproc -parentBuildID 20200603214922 -prefsLen 1 -prefMapSize 226332 -appdir /home/michael/.local/nightly
michael 29177 17643 0 Jun04 ? Sl 0:03 \_ /home/michael/.local/nightly/firefox-bin -contentproc -parentBuildID 20200603214922 -prefsLen 90 -prefMapSize 226332 -appdir /home/michael/.local/nightl
michael 29271 17643 0 Jun04 ? Sl 136:15 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 2 -isForBrowser -prefsLen 695 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir /
michael 29273 17643 0 Jun04 ? Sl 51:13 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 3 -isForBrowser -prefsLen 695 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir /
michael 29286 17643 0 Jun04 ? Sl 58:22 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 4 -isForBrowser -prefsLen 695 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir /
michael 29332 17643 0 Jun04 ? Sl 9:58 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 5 -isForBrowser -prefsLen 789 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir /
michael 29401 17643 0 Jun04 ? Sl 32:40 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 6 -isForBrowser -prefsLen 6915 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir
michael 29459 17643 0 Jun04 ? Sl 54:45 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 7 -isForBrowser -prefsLen 7805 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir
michael 29632 17643 0 Jun04 ? Sl 0:13 \_ /home/michael/.local/nightly/firefox-bin -contentproc -parentBuildID 20200603214922 -prefsLen 7939 -prefMapSize 226332 -appdir /home/michael/.local/nigh
michael 29787 17643 0 Jun04 ? Sl 17:43 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 8 -isForBrowser -prefsLen 8361 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir
michael 30188 17643 0 Jun04 ? Sl 35:27 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 9 -isForBrowser -prefsLen 8361 -prefMapSize 226332 -parentBuildID 20200603214922 -appdir
michael 1473 17643 0 Jun05 ? Sl 34:08 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 12 -isForBrowser -prefsLen 10360 -prefMapSize 226332 -parentBuildID 20200603214922 -appdi
michael 2171 17643 0 Jun05 ? Sl 15:27 \_ /home/michael/.local/nightly/firefox-bin -contentproc -childID 13 -isForBrowser -prefsLen 10561 -prefMapSize 226332 -parentBuildID 20200603214922 -appdi
Comment 1•4 years ago
|
||
Kershaw, please have a look at this bug.
Comment 2•4 years ago
|
||
Set release status flags based on info from the regressing bug 1602832
Comment 3•4 years ago
|
||
I think this is a dead lock problem.
See the log below, http handler is created in locked_profiler_stream_json_for_this_process
, where gPSMutex
is already locked and profiler_register_thread()
is called since we tried to launch socket process in http handler. In profiler_register_thread
, we tried to lock gPSMutex
again and cause this dead lock.
[(null) 38769: Main Thread]: I/prof [38769] profiler_start
[(null) 38769: Main Thread]: I/prof [38769] locked_profiler_start
[(null) 38769: Main Thread]: I/prof [38769] - capacity = 8388608
[(null) 38769: Main Thread]: I/prof [38769] - duration = -1.00
[(null) 38769: Main Thread]: I/prof [38769] - interval = 1.00
[(null) 38769: Main Thread]: I/prof [38769] - browsing context ID = 0
[(null) 38769: Main Thread]: I/prof [38769] - feature = stackwalk
[(null) 38769: Main Thread]: I/prof [38769] - threads = GeckoMain
[(null) 38769: Unnamed thread 0x110140bd0]: D/prof [38769] profiler_register_thread(AsyncLogger)
[(null) 38769: Main Thread]: D/prof [38769] profiler_add_sampled_counter(malloc)
[(null) 38769: Main Thread]: I/prof [38769] profiler_stream_json_for_this_process
[(null) 38769: Main Thread]: I/prof [38769] locked_profiler_stream_json_for_this_process
[(null) 38769: Main Thread]: D/nsHttp nsHttpAuthCache::nsHttpAuthCache 0x1101b4858
[(null) 38769: Main Thread]: D/nsHttp nsHttpAuthCache::nsHttpAuthCache 0x1101b4888
[(null) 38769: Main Thread]: D/nsHttp Creating nsHttpHandler [this=0x1101b4800].
[(null) 38769: Main Thread]: D/nsHttp nsHttpHandler::SetFastOpenOSSupport version 19.5.0
[(null) 38769: Main Thread]: D/nsHttp nsHttpHandler::SetFastOpenOSSupport supported.
[(null) 38769: Main Thread]: D/nsHttp nsHttpHandler::Init
[(null) 38769: IPC Launch #1]: D/prof [38769] profiler_register_thread(IPC Launch #1)
Updated•4 years ago
|
Updated•4 years ago
|
Comment 4•4 years ago
|
||
nsHttpHandler::Init launches socket process, which causes deadlock in trying profiler_register_thread if it's already locked.
Assignee | ||
Comment 5•4 years ago
|
||
I found this deadlock locally (on Windows, so it's not a Linux/intel-only bug), and I've just worked on a fix in bug 1649056 before I saw this bug here! 😅
My solution is pretty much the same (do things before locking the profiler mutex), but covers other potentially-dangerous things and a bit more, so I'd prefer we keep mine.
I'll make this bug here dependent on bug 1649056, and you can verify it's fixed afterwards...
Updated•4 years ago
|
Assignee | ||
Comment 6•4 years ago
|
||
Fixed by bug 1649056. Please reopen if you still have problems with this test.
Comment 7•4 years ago
|
||
Is bug 1649056 something we should consider uplifting?
Assignee | ||
Comment 8•4 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)
Is bug 1649056 something we should consider uplifting?
Thank you for the suggestion. Requested in bug 1649056 comment 8.
Comment 9•4 years ago
|
||
The patch landed in nightly and beta is affected.
:gerald, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 10•4 years ago
|
||
Landed in beta, see bug 1649056 comment 10.
Updated•4 years ago
|
Updated•4 years ago
|
Description
•