Closed Bug 1503725 Opened 2 years ago Closed 2 years ago

Crash in nsThread::MaybeRemoveFromThreadList

Categories

(Core :: XPCOM, defect)

64 Branch
All
Windows
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla65
Tracking Status
firefox-esr60 --- unaffected
firefox63 --- unaffected
firefox64 + verified
firefox65 + verified

People

(Reporter: philipp, Assigned: valentin)

References

Details

(4 keywords, Whiteboard: [post-critsmash-triage])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-07336baa-dd9d-4e87-b852-e40620181031.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll void nsThread::MaybeRemoveFromThreadList xpcom/threads/nsThread.cpp:447
1 xul.dll nsThreadShutdownAckEvent::Run xpcom/threads/nsThread.cpp:294
2 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1245
3 xul.dll NS_ProcessPendingEvents xpcom/threads/nsThreadUtils.cpp:472
4 xul.dll static nsresult mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:906
5 xul.dll void ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1429
6 xul.dll int XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:4950
7 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:5014
8 xul.dll mozilla::BootstrapImpl::XRE_main toolkit/xre/Bootstrap.cpp:49
9 firefox.exe static int do_main browser/app/nsBrowserApp.cpp:233

=============================================================

this signature is newly popping up since nightly build 65.0a1 20181027100344 and subsequently uplifted to 64.0b5. the timing for this would point to involvement of bug 1500861.
the crashing address is indicating a uaf situation in many occasions.
I wonder if we actually need a lock when we call `isInList`. Although the crash address looks UAF, so perhaps were calling `isInList` on an already destroyed thread. I think maybe the problem is that we need to somehow cancel the `nsThreadShutdownAckEvent` when we timeout, otherwise it's referencing a bad `nsThreadShutdownContext`.
Group: core-security → dom-core-security
(In reply to Eric Rahm [:erahm] from comment #1)
> I wonder if we actually need a lock when we call `isInList`. Although the
> crash address looks UAF, so perhaps were calling `isInList` on an already
> destroyed thread. I think maybe the problem is that we need to somehow
> cancel the `nsThreadShutdownAckEvent` when we timeout, otherwise it's
> referencing a bad `nsThreadShutdownContext`.

What's happening here is that we remove the contexts we intend to leak from mRequestedShutdownContexts - which means that they get freed. If one of the leaked threads happens to complete before we actually shutdown, it will crash on this.
We could make the shutdown context refcountable. Or to make sure that nsThreadShutdownAckEvent doesn't run.
Assignee: nobody → valentin.gosu
Blocks: 1500861
(In reply to Eric Rahm [:erahm] from comment #1)
> I wonder if we actually need a lock when we call `isInList`.

I think we do. We used to not, and there was a long comment explaining why, but after my last refactor some new races showed up.
Sometimes when we call ShutdownWithTimeout on a thread pool, the unresponsive
thread that we leak will actually complete before the main thread is done.
In that case, the thread will dereference the thread shutdown context, so
we must intentionally leak it too.
https://hg.mozilla.org/mozilla-central/rev/73a273670597
Group: dom-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Comment on attachment 9021989 [details]
Bug 1503725 - Do not deallocate nsThreadShutdownContext when leaking thread r?erahm!

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1500861

User impact if declined: Possible UAF and crash at shutdown when a thread we intentionally leak becomes unstuck and completes before the main thread.

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): This is a very small change - for the threads that we do leak we also leak the shutdown context.

String changes made/needed:
Attachment #9021989 - Flags: approval-mozilla-beta?
Comment on attachment 9021989 [details]
Bug 1503725 - Do not deallocate nsThreadShutdownContext when leaking thread r?erahm!

[Triage Comment]
Fixes a new crash introduced by bug 1500861. Approved for 64.0b7.
Attachment #9021989 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
We can verify it based on crash reports from Socorro.
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
Based on the summary from Socorro, I'm marking this issue Verified Fixed since it appears that the crash was not encountered on Firefox 64b7, or on newer builds.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.