Nullptr crash due to reentrancy in ~AutoTaskDispatcher
Categories
(Core :: XPCOM, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox99 | --- | fixed |
People
(Reporter: bwc, Assigned: bwc)
References
Details
Crash Data
Attachments
(1 file)
See bug 1748333 (comment 13 and comment 33)
Assignee | ||
Comment 1•2 years ago
|
||
This looks like just a nullptr crash, and it is pretty hard to trigger. Rate on crash-stats is pretty low:
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 2•2 years ago
|
||
There's a few things we could blame here:
- This code results in ~AutoTaskDispatcher going reentrant and calling mTailDispatcher.DispatchTasksFor, because Maybe::reset invokes the d'tor first, and then marks itself as containing Nothing:
https://searchfox.org/mozilla-central/rev/9f61d854547cedbde0773b2893e4f925352be3b3/xpcom/threads/AbstractThread.cpp#196
https://searchfox.org/mozilla-central/rev/9f61d854547cedbde0773b2893e4f925352be3b3/mfbt/Maybe.h#639,642
This is pretty inconsistent with the way things like unique_ptr work, which is basically how we're using mTailDispatcher. I think that this is probably something we need to fix, one way or another, even if there are other things we should be fixing here.
- It is probably not ideal that we are setting up a situation where reentrancy could occur in the first place. It is harder to say what we could do to fix this, though.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 3•2 years ago
|
||
Assignee | ||
Comment 4•2 years ago
|
||
The core of the problem here is that Maybe::reset invokes the d'tor first, and
then clears the value, whereas unique_ptr::reset does the opposite.
Updated•2 years ago
|
Assignee | ||
Comment 5•2 years ago
|
||
webrtc jobs look ok, although this change could effect pretty much any feature.
Assignee | ||
Comment 6•2 years ago
|
||
Trying to determine whether anything has regressed in mochitest writ large:
Here's the baseline push:
So many oranges...
Assignee | ||
Comment 7•2 years ago
|
||
Here's a failure caused by this bug on try.
Updated•2 years ago
|
Assignee | ||
Comment 9•2 years ago
•
|
||
Seeing a lot more failures in test-windows7-32-qr/opt-mochitest-browser-chrome-fis-e10s-5 (including several novel-looking failures at the end of js/xpconnect/tests/browser):
Here's a try push focusing on js/xpconnect/tests/browser. Maybe I can get better logging of this problem on some other platform.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9572c441295de22e708579498b95715e7f228d7
Assignee | ||
Comment 10•2 years ago
|
||
Yeah, that only seems to happen on windows7 jobs, and it does not appear that we have any windows7 debug jobs even with --full...
Assignee | ||
Comment 11•2 years ago
|
||
Any idea why this patch might be causing a new hang-on-finish in the xpconnect bc tests?
Comment 12•2 years ago
|
||
I'm not sure why that would be happening. I do notice that the log has the line [task 2022-02-14T23:20:25.127Z] 23:20:25 INFO - GECKO(3384) | [Exiting due to channel error.
(https://treeherder.mozilla.org/logviewer?job_id=367873398&repo=try&lineNumber=1955) which suggests that the parent process crashed in some way & lost connection to a child process unexpectedly (child processes log that when the parent process disappears unexpectedly & they quick-exit), though that doesn't line up super well with the application hanging.
Given that this failure is occuring on opt builds, we wouldn't have deadlock detection running, so one of the most likely options here is that we're encountering a deadlock somewhere in the parent process. Unfortunately I don't really have any idea where that would be.
Assignee | ||
Comment 13•2 years ago
|
||
Hmm. The work in progress over in bug 1755318 makes this happen far more often on try:
https://treeherder.mozilla.org/jobs?repo=try&revision=46df533e014f3fda1f17c0d5f5cc0b447e658f52
I think this will have to block bug 1755318.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 14•2 years ago
|
||
Seems that the issue does not reproduce when I hack win7 opt to be more in line with a debug build:
https://treeherder.mozilla.org/jobs?repo=try&revision=a7a34bddcf240948fa697573bcc7a8c25da1bc59
It is possible that disabling assertions prevents the issue, but when assertions are left enabled, we hit assertions and crash before we even run any tests.
It may be that I need to selectively enable just the debugging features I want (eg; just deadlock detection).
Assignee | ||
Comment 15•2 years ago
•
|
||
Actually, there's another possibility; maybe the rebase I just did fixed the issue. Here's a try push for that:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=427fca1b66223e6904e0603f2884b4fdabeb6599
Edit: Well that's a lucky break. A rebase has either stopped the issue, or made it so uncommon that it doesn't show up in 20 runs.
Comment 16•2 years ago
|
||
Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/68192b01f87d Use unique_ptr instead of Maybe for mTailDispatcher to avoid reentrancy. r=nika
Comment 17•2 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Updated•2 years ago
|
Description
•