Crash in [@ mozilla::ThreadBound<T>::AssertIsCorrectThread] with poison value in register
Categories
(Core :: DOM: Workers, defect, P2)
Tracking
()
People
(Reporter: release-mgmt-account-bot, Assigned: jstutte)
References
(Blocks 1 open bug)
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main123+r][adv-esr115.8+r])
Crash Data
Attachments
(3 files, 2 obsolete files)
|
48 bytes,
text/x-phabricator-request
|
tjr
:
sec-approval+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr115+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/7d6ca59a-ad61-42f8-955a-38c180231118
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(IsCorrectThread())
Top 10 frames of crashing thread:
0 xul.dll AnnotateMozCrashReason mfbt/Assertions.h:46
0 xul.dll mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::AssertIsCorrectThread const xpcom/threads/ThreadBound.h:135
0 xul.dll mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::Accessor<1>::Accessor xpcom/threads/ThreadBound.h:100
0 xul.dll mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::Access const xpcom/threads/ThreadBound.h:125
0 xul.dll mozilla::dom::WorkerPrivate::GetCurrentEventLoopGlobal const dom/workers/WorkerPrivate.h:478
0 xul.dll mozilla::dom::WorkerRunnable::Run dom/workers/WorkerRunnable.cpp:245
1 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1192
2 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:480
2 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
3 xul.dll MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:370
By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:
- First crash report: 2023-10-10
- Process type: Content
- Is startup crash: No
- Has user comments: No
- Is null crash: No
| Reporter | ||
Comment 1•2 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::DOM: Workers' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•2 years ago
|
||
Eden, does this ring any bells? This is something quite new.
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Comment 3•2 years ago
|
||
I cracked up a minidump and I see "rbx": "0xe5e5e5e5e5e5e5e5". Apparently we run a WorkerRunnable after our WorkerThreadPrimaryRunnable already finished and the WorkerPrivate object has been freed. This seems to be a clear UAF, hiding behind that Accessible assertion.
Updated•2 years ago
|
Updated•2 years ago
|
| Reporter | ||
Comment 4•2 years ago
|
||
The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:jstutte, could you consider increasing the severity of this security bug?
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 5•2 years ago
|
||
Yes, that needs more attention.
| Assignee | ||
Comment 6•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #3)
Apparently we run a
WorkerRunnableafter ourWorkerThreadPrimaryRunnablealready finished and theWorkerPrivateobject has been freed. .
I wonder if this is somehow related to bug 1836937. I'd assume any event able to crash like this would have triggered that assertion (though not any event triggering it might cause this crash, of course).
Technically we have all the information to see, if we have an "illegitimate" WorkerRunnable::Run - on a worker thread we could check nsThread::RecursionDepth to see if we are still running on top of the WorkerThreadPrimaryRunnable and we should know, if we are at all running on the worker thread, too.
| Assignee | ||
Comment 7•2 years ago
|
||
Depends on D199228
| Assignee | ||
Comment 8•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #6)
Technically we have all the information to see, if we have an "illegitimate"
WorkerRunnable::Run- on a worker thread we could checknsThread::RecursionDepthto see if we are still running on top of theWorkerThreadPrimaryRunnableand we should know, if we are at all running on the worker thread, too.
The WIP patch is basically proposing that, and providing the runnable name in the log based on bug 1875800.
Updated•2 years ago
|
| Assignee | ||
Comment 9•2 years ago
|
||
Comment on attachment 9375820 [details]
Bug 1867982 - Check if WorkerRunnable::Run runs on top of WorkerThreadPrimaryRunnable::Run in a worker thread. r?#dom-worker-reviewers
Security Approval Request
- How easily could an exploit be constructed based on the patch?: The patch is seemingly just adding diagnostics (except for the early return), although it does so in a suspiciously complicated way (as we need to avoid to touch
mWorkerPrivate). Neither do we know how to force aWorkerRunnableto be run such late. - Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: probably all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: The larger changeset from bug 1875800 we use here for diagnostics is probably not needed, we could make a variant of this patch that just introduces the early return and the generic diagnostic assertion. Other than that the code should be largely the same in ESR.
- How likely is this patch to cause regressions; how much testing does it need?: The patch is expected to move the intermittent assertion from bug 1836937 here, which could be seen as a regression but actually is not.
Other than that, normal baking times should be enough. - Is Android affected?: Yes
Comment 10•2 years ago
|
||
Comment on attachment 9375820 [details]
Bug 1867982 - Check if WorkerRunnable::Run runs on top of WorkerThreadPrimaryRunnable::Run in a worker thread. r?#dom-worker-reviewers
Approved to land and uplift if desired
Comment 11•2 years ago
|
||
Comment 12•2 years ago
|
||
Comment 13•2 years ago
|
||
Backed out for causing failures at WorkerRunnable.cpp:
https://hg.mozilla.org/integration/autoland/rev/638b234fcefab7a54ef2d2f7e5c872cba1bf4bb7
Push with failures
Failure log
task 2024-01-25T08:14:11.988Z] 08:14:11 INFO - TEST-START | dom/workers/test/test_WorkerDebugger.postMessage.xhtml
[task 2024-01-25T08:14:11.988Z] 08:14:11 INFO - GECKO(1941) | [Parent 1941, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/chrome/nsChromeRegistry.cpp:182
[task 2024-01-25T08:14:11.988Z] 08:14:11 INFO - GECKO(1941) | [Parent 1941, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/chrome/nsChromeProtocolHandler.cpp:73
[task 2024-01-25T08:14:12.209Z] 08:14:12 INFO - GECKO(1941) | MEMORY STAT | vsize 11530MB | residentFast 521MB | heapAllocated 227MB
[task 2024-01-25T08:14:12.222Z] 08:14:12 INFO - GECKO(1941) | [Parent 1941, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/chrome/nsChromeRegistry.cpp:182
[task 2024-01-25T08:14:12.223Z] 08:14:12 INFO - GECKO(1941) | [Parent 1941, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/chrome/nsChromeProtocolHandler.cpp:73
[task 2024-01-25T08:14:12.251Z] 08:14:12 INFO - GECKO(1941) | [Parent 1941, Main Thread] WARNING: Failed to retarget HTML data delivery to the parser thread.: file /builds/worker/checkouts/gecko/parser/html/nsHtml5StreamParser.cpp:1227
[task 2024-01-25T08:14:12.348Z] 08:14:12 INFO - GECKO(1941) | Assertion failure: false (On a worker, a worker runnable should only run on top of WorkerThreadPrimaryRunnable.), at /builds/worker/checkouts/gecko/dom/workers/WorkerRunnable.cpp:253
[task 2024-01-25T08:14:12.349Z] 08:14:12 INFO - GECKO(1941) | #01: mozilla::dom::WorkerRunnable::Run() [dom/workers/WorkerRunnable.cpp:251]
[task 2024-01-25T08:14:12.350Z] 08:14:12 INFO - GECKO(1941) | #02: mozilla::dom::WorkerControlRunnable::Cancel() [dom/workers/WorkerRunnable.cpp:524]
[task 2024-01-25T08:14:12.352Z] 08:14:12 INFO - GECKO(1941) | #03: mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) [dom/workers/WorkerPrivate.cpp:3319]
[task 2024-01-25T08:14:12.353Z] 08:14:12 INFO - GECKO(1941) | #04: mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() [dom/workers/RuntimeService.cpp:2111]
[task 2024-01-25T08:14:12.354Z] 08:14:12 INFO - GECKO(1941) | #05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1194]
[task 2024-01-25T08:14:12.358Z] 08:14:12 INFO - GECKO(1941) | #06: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:480]
[task 2024-01-25T08:14:12.360Z] 08:14:12 INFO - GECKO(1941) | #07: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:301]
[task 2024-01-25T08:14:12.360Z] 08:14:12 INFO - GECKO(1941) | #08: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:346]
[task 2024-01-25T08:14:12.361Z] 08:14:12 INFO - GECKO(1941) | #09: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:372]
[task 2024-01-25T08:14:12.385Z] 08:14:12 INFO - GECKO(1941) | #10: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2024-01-25T08:14:12.386Z] 08:14:12 INFO - GECKO(1941) | #11: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x6a95d]
[task 2024-01-25T08:14:12.387Z] 08:14:12 INFO - GECKO(1941) | #12: ??? [/lib/x86_64-linux-gnu/libpthread.so.0 + 0x76db]
[task 2024-01-25T08:14:12.390Z] 08:14:12 INFO - GECKO(1941) | #13: clone [/lib/x86_64-linux-gnu/libc.so.6 + 0x121a3f]
[task 2024-01-25T08:14:12.391Z] 08:14:12 INFO - GECKO(1941) | #14: ??? (???:???)
| Assignee | ||
Comment 14•2 years ago
•
|
||
Strange that this did not fall out with previous testing.
There are some runnables that call Run on Cancel.
Edit: This seems to happen mostly for runs without e10s / fission enabled, which is definitely a less tested configuration these days.
Updated•2 years ago
|
| Assignee | ||
Comment 15•2 years ago
|
||
| Assignee | ||
Updated•2 years ago
|
Comment 16•2 years ago
|
||
Before I pop into the data review portion: you might not know this, but data reviews must be publicly available to the populations having data collected. Will this bug be made public before the data starts being collected? If not, we'll need to take the data review to a separate, public, bug.
| Assignee | ||
Comment 17•2 years ago
|
||
Oh, yes, I did not know that, thanks. And no, we cannot make this bug public right now. I'll downgrade the thing to a diagnostic assertion without runnable name for now and move the data review part to a new bug (or bug 1836937).
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Updated•2 years ago
|
Comment 18•2 years ago
|
||
Comment 19•2 years ago
|
||
| Reporter | ||
Comment 20•2 years ago
|
||
Since nightly and release are affected, beta will likely be affected too.
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 21•2 years ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D199247
Updated•2 years ago
|
Updated•2 years ago
|
| Assignee | ||
Comment 22•2 years ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D199247
Updated•2 years ago
|
Comment 23•2 years ago
|
||
Uplift Approval Request
- Risk associated with taking this patch: Low
- String changes made/needed: n/a
- Steps to reproduce for manual QE testing: n/a
- Fix verified in Nightly: yes
- Is Android affected?: yes
- Code covered by automated testing: yes
- Explanation of risk level: Essentially this adds just a check if the worker runnable will find the expected conditions to run.
- User impact if declined: Potential UAF and/or crashes.
- Needs manual QE test: no
Updated•2 years ago
|
| Assignee | ||
Comment 24•2 years ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D199247
Updated•2 years ago
|
Comment 25•2 years ago
|
||
Uplift Approval Request
- Steps to reproduce for manual QE testing: n/a
- Is Android affected?: yes
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- User impact if declined: Potential UAF and/or crashes.
- Explanation of risk level: Essentially this adds just a check if the worker runnable will find the expected conditions to run.
- Needs manual QE test: no
- Risk associated with taking this patch: Low
- String changes made/needed: n/a
Updated•2 years ago
|
Updated•2 years ago
|
Comment 26•2 years ago
|
||
| uplift | ||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 28•2 years ago
|
||
| uplift | ||
Updated•2 years ago
|
| Assignee | ||
Comment 29•2 years ago
|
||
Thanks for solving the rebase!
Updated•2 years ago
|
Updated•2 years ago
|
Updated•1 year ago
|
Description
•