Crash in [@ mozilla::dom::Element::ClassList] on JS_SWEPT_TENURED_PATTERN poison values
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Tracking
()
People
(Reporter: mccr8, Assigned: jandem)
References
Details
(4 keywords, Whiteboard: [adv-main111+r][adv-esr102.9+r])
Crash Data
Attachments
(5 files)
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-beta+
dmeehan
:
approval-mozilla-esr102+
tjr
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
758 bytes,
text/html
|
Details |
Crash report: https://crash-stats.mozilla.org/report/index/15f92ed6-ad90-4d68-aea7-6425c0230103
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 4 frames of crashing thread:
0 xul.dll RefPtr<nsDOMTokenList>::operator! const mfbt/RefPtr.h:311
0 xul.dll mozilla::dom::Element::ClassList dom/base/Element.cpp:622
1 xul.dll mozilla::dom::Element_Binding::get_classList dom/bindings/ElementBinding.cpp:1361
2 ? @0x000001f7fdd21aef
There aren't a ton of these crashes, but almost all of them are on the poison pattern 0x4b, which looks to be JS_SWEPT_TENURED_PATTERN. The objects in question are all DOM things, so maybe the bindings layer unwrapped a poison JS object and we're dereferencing that?
I came across this while investigating bug 1808175, which is another later crash in Element_Binding::get_classList, on a value that is null except with one bit set.
Reporter | ||
Comment 1•1 year ago
|
||
This same crash also shows up (including poison values) under a slightly different signature, though many of the crashes with this signature are different: bp-04164448-0b5e-4b04-8a44-c7bc20230103
Reporter | ||
Comment 2•1 year ago
|
||
Peter, any ideas? I'm not sure if this is actionable or not. I'm not sure if there's an easy way to figure out if we're hitting poison values like this in on things besides Element_Binding::get_classList.
Reporter | ||
Comment 3•1 year ago
|
||
The two crashes I linked have very deep JIT stacks. Maybe we're hitting some weird boundary condition causing a stack limit to be hit, so some random JSAPI call is failing and causing issues. Of course, that's not very useful because it could be basically anything before this call.
Comment 4•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 5•1 year ago
|
||
Another cluster of crashes I found on the 0x4b poison values is [@ nsINode::GetBoolFlag ], such as: bp-e1b8ea4c-3e1d-4df3-b864-d5bca0230112
I looked at all 68 of the crashes with that signature that had 4b4b in the crash address in the last two weeks, and all of them were via Node_Binding::get_parentElement. All of those that had stacks that weren't truncated in the JIT code were running code via nsGlobalWindowInner::RunTimeoutHandler, similar to some of the other crashes in this bug. The stacks didn't look super deep, though.
Comment 7•1 year ago
|
||
Copying crash signatures from duplicate bugs.
Reporter | ||
Updated•1 year ago
|
Reporter | ||
Comment 8•1 year ago
|
||
It definitely feels like something is going on here, but I don't think we have enough information to take action here.
Reporter | ||
Comment 9•1 year ago
|
||
In the DOM security meeting on Tuesday, we looked at some of these crashes. Nika loaded one of the minidumps and to our surprise, void_self was not a poison value, but instead the value loaded from void_self was the poison value.
We thought that this means that the JS object that the bindings method is being called on is not poison, but rather it has a private (or whatever slot contains the native object) that points to a heap location that is poisoned by JS_SWEPT_TENURED_PATTERN.
That in turn means that the "DOM reflector" is pointing at a JS GC thing and not something allocator with regular C++ allocators (either jemalloc or the arena allocators), which suggests that the JS object is not actually a DOM reflector, but is instead some other kind of JS object. Peter said he'd talk to a JIT person about this, as this sort of type confusion could indicate a JIT issue.
Somebody suggested that at least for the arena allocated DOM things we could check in the bindings code if the "native" object is arena allocated.
It would probably be good for somebody to check a few more of these "DOM bindings crashes on GC poisoned values" crashes to make sure that is what is usually happening and not just a fluke for the crash we looked at.
Reporter | ||
Comment 10•1 year ago
|
||
I found a few similar-ish looking crashes on JS_MOVED_TENURED_PATTERN, 0x49.
bp-a48c89c4-338a-40fc-b04f-a12b60230214
bp-b8a322c0-46ef-47f5-bf73-6749c0230203
bp-2d780c9e-c75d-4410-8e31-0c4220230214
The volume is much lower than JS_SWEPT_TENURED_PATTERN.
Reporter | ||
Comment 11•1 year ago
|
||
Bug 1816975 looks extremely similar. 13% of the crashes with that signature are specifically on the value 0x0000530700005217, while another 6% are on 0x00000b0700000a17. Generally, the crashes look like they have way too many zeros in them to be legit addresses, so I think this further supports the theory that the DOM bindings are getting handed a pointer into a JS allocated object and not a jemalloc allocated C++ object. Bug 1816975 is also interesting because 90% of the crashes have Element_Binding::get_classList in the proto stack.
Jan, do you have any ideas what might be going on here? Peter started working on his idea to dynamically check the validity of the DOM object being passed in from JS, but maybe you have some ideas from the JIT side about what kind of type confusion could be going on. Thanks.
Assignee | ||
Comment 12•1 year ago
|
||
This is likely caused by object-swapping: our Ion codegen expects the DOM_OBJECT_SLOT
to be a fixed slot, but that can be false when swapping between multiple globals. In this case we can swap a native DOM object with a proxy object that uses an external value array, and the native object will use dynamic slots too.
Here's a shell test that asserts in debug builds:
let domObject = new FakeDOMObject();
let expectedValue = domObject.x;
let {object, transplant} = transplantableObject({object: domObject});
assertEq(object, domObject);
let global1 = newGlobal({newCompartment: true});
let global2 = newGlobal({newCompartment: true});
transplant(global1);
transplant(global2);
transplant(global1);
assertEq(object, domObject);
assertEq(domObject.x, expectedValue);
global1.domObj = domObject;
global1.expectedValue = expectedValue;
global1.evaluate(`
function f() {
for (var i = 0; i < 10000; i++) {
assertEq(domObj.x, expectedValue);
}
}
f();
`);
Updated•1 year ago
|
Assignee | ||
Comment 13•1 year ago
|
||
Assignee | ||
Comment 14•1 year ago
|
||
Depends on D170909
Updated•1 year ago
|
Assignee | ||
Comment 15•1 year ago
|
||
My shell test case asserts on ESR102 too. This might go back to when we added DOM getters/setters to CacheIR but I'm not completely sure.
Assignee | ||
Comment 16•1 year ago
|
||
Next week I'll see if I can create a browser test case for this too.
Reporter | ||
Comment 17•1 year ago
|
||
Amazing! Thanks, Jan. Does it seem like this could also be the cause of bug 1816975? They looked like similar issues to me, so I'll mark the dependency for now.
Assignee | ||
Comment 18•1 year ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #17)
Amazing! Thanks, Jan. Does it seem like this could also be the cause of bug 1816975? They looked like similar issues to me, so I'll mark the dependency for now.
The crash in comment 0 in that bug is probably different (as you noticed already), but the ones in comment 3 look similar to this one.
Assignee | ||
Comment 19•1 year ago
|
||
Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Not super easy.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?: Easy.
- How likely is this patch to cause regressions; how much testing does it need?: Unlikely, it's just disabling an optimization in a corner case.
- Is Android affected?: Yes
Comment 20•1 year ago
|
||
Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!
Approved to request uplift and land
Updated•1 year ago
|
Assignee | ||
Comment 21•1 year ago
|
||
Depends on D170910
Assignee | ||
Comment 22•1 year ago
|
||
Assignee | ||
Comment 23•1 year ago
|
||
Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!
Beta/Release Uplift Approval Request
- User impact if declined: Crashes, security bugs.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Very low risk: it disables an optimization for a specific corner case.
- String changes made/needed: N/A
- Is Android affected?: Yes
Comment 24•1 year ago
|
||
Ensure we have fixed slots in CanAttachDOMCall. r=iain
https://hg.mozilla.org/integration/autoland/rev/9d5edd36a16c8748e47c5a9ed6002d9b6e96506f
https://hg.mozilla.org/mozilla-central/rev/9d5edd36a16c
Comment 25•1 year ago
|
||
Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!
Approved for 111.0b7
Comment 26•1 year ago
|
||
uplift |
Updated•1 year ago
|
Comment 27•1 year ago
|
||
Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!
Approved for 102.9esr.
Comment 28•1 year ago
|
||
uplift |
Updated•1 year ago
|
Reporter | ||
Comment 29•1 year ago
|
||
There have been about a dozen crashes on beta and Nightly with the [@ nsINode::HasSlots ] signature since this patch landed, but they are all null derefs, and look like a single different issue. I filed bug 1820003 for that.
I looked at all crashes in Nightly and beta that have this patch applied for crashes on poison values, and the remaining crashes are mostly GC crashes. The classlist crashes seem to be gone entirely. I did see a couple of crashes inside DOM bindings code like this, but I only see two, and they are different from each other, so it might be a fluke. I'll file a new issue for them in case something can be done.
Reporter | ||
Comment 30•1 year ago
|
||
This is really more like a type confusion inside JIT code than a UAF per se. It just happens that often the value we're confused with is a pointer into freed JS memory.
Reporter | ||
Comment 31•1 year ago
|
||
Nika suggested that a similar situation could be set up via a popup window rather than document.domain. It might be nice to have that additionally, as Chrome is moving towards getting rid of document.domain so we might be able to get rid of it, too and then we'd have to go back and figure out what this test was doing. Though we do have the shell test.
Reporter | ||
Comment 32•1 year ago
|
||
Comment hidden (obsolete) |
Updated•1 year ago
|
Updated•1 year ago
|
Reporter | ||
Comment 34•1 year ago
•
|
||
Here's a minor variation of the popup version. I removed the Mochitest stuff, and it is a single file. It crashes in an unpatched build for me when run from a plain old file:// URI.
Comment 35•1 year ago
|
||
CCing Michael Smith from PLSysSec to investigate whether the verified CacheIR compiler that they're working on could have caught this bug.
Comment 36•1 year ago
|
||
a month ago, Tom Ritter [:tjr] placed a reminder on the bug using the whiteboard tag [reminder-test 2023-04-24]
.
jandem, please refer to the original comment to better understand the reason for the reminder.
Comment 37•1 year ago
|
||
Landed but backed out:
Add test and assertions. r=iain
https://hg.mozilla.org/integration/autoland/rev/ae9bde926915393057671225e3f1ca41410218a3
Add a browser test. r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2970eb267939f8a8a2a88e6c56e8ddeb9ca2f2ee
Browser test, using a popup window. r=jandem
https://hg.mozilla.org/integration/autoland/rev/631384644b876efe90c9a79f9d8940c135d18454
Backed out for causing failures of test_bug1808352.html:
https://hg.mozilla.org/integration/autoland/rev/a1f480a4dd095022a0928bd3172fb7f77a2c2237
Push with failures
Failure log
TEST-UNEXPECTED-FAIL | dom/bindings/test/test_bug1808352.html | Test timed out. -
Assignee | ||
Comment 38•1 year ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #37)
Landed but backed out:
This is with the HTTP3 mochitest server, it doesn't support test1.mochi.test
. That's bug 1827526. I'll add a skip-if similar to what was done for other tests.
Comment 39•1 year ago
|
||
Add test and assertions. r=iain
https://hg.mozilla.org/integration/autoland/rev/32e5affae4b2ad190a15459fc1dd307c0255e505
https://hg.mozilla.org/mozilla-central/rev/32e5affae4b2
Add a browser test. r=mccr8
https://hg.mozilla.org/integration/autoland/rev/1799623b1af9238fc2155e3c7d248eecc0597397
https://hg.mozilla.org/mozilla-central/rev/1799623b1af9
Browser test, using a popup window. r=jandem
https://hg.mozilla.org/integration/autoland/rev/601cb6ec3301fff41f5897d28381001aecac4c9b
https://hg.mozilla.org/mozilla-central/rev/601cb6ec3301
Updated•6 months ago
|
Description
•