Closed Bug 1808352 Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::dom::Element::ClassList] on JS_SWEPT_TENURED_PATTERN poison values

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 111+ fixed
firefox110 - wontfix
firefox111 + fixed
firefox112 + fixed

People

(Reporter: mccr8, Assigned: jandem)

References

Details

(4 keywords, Whiteboard: [adv-main111+r][adv-esr102.9+r])

Crash Data

Attachments

(5 files)

Crash report: https://crash-stats.mozilla.org/report/index/15f92ed6-ad90-4d68-aea7-6425c0230103

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 4 frames of crashing thread:

0  xul.dll  RefPtr<nsDOMTokenList>::operator! const  mfbt/RefPtr.h:311
0  xul.dll  mozilla::dom::Element::ClassList  dom/base/Element.cpp:622
1  xul.dll  mozilla::dom::Element_Binding::get_classList  dom/bindings/ElementBinding.cpp:1361
2  ?  @0x000001f7fdd21aef  

There aren't a ton of these crashes, but almost all of them are on the poison pattern 0x4b, which looks to be JS_SWEPT_TENURED_PATTERN. The objects in question are all DOM things, so maybe the bindings layer unwrapped a poison JS object and we're dereferencing that?

I came across this while investigating bug 1808175, which is another later crash in Element_Binding::get_classList, on a value that is null except with one bit set.

This same crash also shows up (including poison values) under a slightly different signature, though many of the crashes with this signature are different: bp-04164448-0b5e-4b04-8a44-c7bc20230103

Peter, any ideas? I'm not sure if this is actionable or not. I'm not sure if there's an easy way to figure out if we're hitting poison values like this in on things besides Element_Binding::get_classList.

Severity: -- → S2
Flags: needinfo?(peterv)
Keywords: sec-high

The two crashes I linked have very deep JIT stacks. Maybe we're hitting some weird boundary condition causing a stack limit to be hit, so some random JSAPI call is failing and causing issues. Of course, that's not very useful because it could be basically anything before this call.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash

Another cluster of crashes I found on the 0x4b poison values is [@ nsINode::GetBoolFlag ], such as: bp-e1b8ea4c-3e1d-4df3-b864-d5bca0230112

I looked at all 68 of the crashes with that signature that had 4b4b in the crash address in the last two weeks, and all of them were via Node_Binding::get_parentElement. All of those that had stacks that weren't truncated in the JIT code were running code via nsGlobalWindowInner::RunTimeoutHandler, similar to some of the other crashes in this bug. The stacks didn't look super deep, though.

Duplicate of this bug: 1811228

Copying crash signatures from duplicate bugs.

Crash Signature: [@ mozilla::dom::Element::ClassList] → [@ mozilla::dom::Element::ClassList] [@ mozilla::dom::Element_Binding::get_classList]
Crash Signature: [@ mozilla::dom::Element::ClassList] [@ mozilla::dom::Element_Binding::get_classList] → [@ mozilla::dom::Element::ClassList] [@ mozilla::dom::Element_Binding::get_classList] [@ nsINode::HasSlots ]

It definitely feels like something is going on here, but I don't think we have enough information to take action here.

Keywords: stalled

In the DOM security meeting on Tuesday, we looked at some of these crashes. Nika loaded one of the minidumps and to our surprise, void_self was not a poison value, but instead the value loaded from void_self was the poison value.

We thought that this means that the JS object that the bindings method is being called on is not poison, but rather it has a private (or whatever slot contains the native object) that points to a heap location that is poisoned by JS_SWEPT_TENURED_PATTERN.

That in turn means that the "DOM reflector" is pointing at a JS GC thing and not something allocator with regular C++ allocators (either jemalloc or the arena allocators), which suggests that the JS object is not actually a DOM reflector, but is instead some other kind of JS object. Peter said he'd talk to a JIT person about this, as this sort of type confusion could indicate a JIT issue.

Somebody suggested that at least for the arena allocated DOM things we could check in the bindings code if the "native" object is arena allocated.

It would probably be good for somebody to check a few more of these "DOM bindings crashes on GC poisoned values" crashes to make sure that is what is usually happening and not just a fluke for the crash we looked at.

Summary: Crash in [@ mozilla::dom::Element::ClassList] on poison values → Crash in [@ mozilla::dom::Element::ClassList] on JS_SWEPT_TENURED_PATTERN poison values

I found a few similar-ish looking crashes on JS_MOVED_TENURED_PATTERN, 0x49.

bp-a48c89c4-338a-40fc-b04f-a12b60230214
bp-b8a322c0-46ef-47f5-bf73-6749c0230203
bp-2d780c9e-c75d-4410-8e31-0c4220230214

The volume is much lower than JS_SWEPT_TENURED_PATTERN.

Bug 1816975 looks extremely similar. 13% of the crashes with that signature are specifically on the value 0x0000530700005217, while another 6% are on 0x00000b0700000a17. Generally, the crashes look like they have way too many zeros in them to be legit addresses, so I think this further supports the theory that the DOM bindings are getting handed a pointer into a JS allocated object and not a jemalloc allocated C++ object. Bug 1816975 is also interesting because 90% of the crashes have Element_Binding::get_classList in the proto stack.

Jan, do you have any ideas what might be going on here? Peter started working on his idea to dynamically check the validity of the DOM object being passed in from JS, but maybe you have some ideas from the JIT side about what kind of type confusion could be going on. Thanks.

Flags: needinfo?(jdemooij)

This is likely caused by object-swapping: our Ion codegen expects the DOM_OBJECT_SLOT to be a fixed slot, but that can be false when swapping between multiple globals. In this case we can swap a native DOM object with a proxy object that uses an external value array, and the native object will use dynamic slots too.

Here's a shell test that asserts in debug builds:

let domObject = new FakeDOMObject();
let expectedValue = domObject.x;

let {object, transplant} = transplantableObject({object: domObject});
assertEq(object, domObject);

let global1 = newGlobal({newCompartment: true});
let global2 = newGlobal({newCompartment: true});

transplant(global1);
transplant(global2);
transplant(global1);

assertEq(object, domObject);
assertEq(domObject.x, expectedValue);

global1.domObj = domObject;
global1.expectedValue = expectedValue;

global1.evaluate(`
function f() {
    for (var i = 0; i < 10000; i++) {
        assertEq(domObj.x, expectedValue);
    }
}
f();
`);
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Component: DOM: Bindings (WebIDL) → JavaScript Engine: JIT
Flags: needinfo?(peterv)
Group: dom-core-security → javascript-core-security

Depends on D170909

Attachment #9319707 - Attachment description: Bug 1808352 - Add tests and assertions. r?iain! → Bug 1808352 - Add test and assertions. r?iain!

My shell test case asserts on ESR102 too. This might go back to when we added DOM getters/setters to CacheIR but I'm not completely sure.

Next week I'll see if I can create a browser test case for this too.

See Also: 1808175
Blocks: 1816975

Amazing! Thanks, Jan. Does it seem like this could also be the cause of bug 1816975? They looked like similar issues to me, so I'll mark the dependency for now.

Flags: needinfo?(jdemooij)

(In reply to Andrew McCreight [:mccr8] from comment #17)

Amazing! Thanks, Jan. Does it seem like this could also be the cause of bug 1816975? They looked like similar issues to me, so I'll mark the dependency for now.

The crash in comment 0 in that bug is probably different (as you noticed already), but the ones in comment 3 look similar to this one.

Flags: needinfo?(jdemooij)

Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not super easy.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?: Easy.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely, it's just disabling an optimization in a corner case.
  • Is Android affected?: Yes
Attachment #9319706 - Flags: sec-approval?

Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!

Approved to request uplift and land

Attachment #9319706 - Flags: sec-approval? → sec-approval+
Whiteboard: [reminder-test 2023-04-24]

Depends on D170910

Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes, security bugs.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very low risk: it disables an optimization for a specific corner case.
  • String changes made/needed: N/A
  • Is Android affected?: Yes
Attachment #9319706 - Flags: approval-mozilla-esr102?
Attachment #9319706 - Flags: approval-mozilla-beta?
Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!

Approved for 111.0b7

Attachment #9319706 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Priority: -- → P1

Comment on attachment 9319706 [details]
Bug 1808352 - Ensure we have fixed slots in CanAttachDOMCall. r?iain!

Approved for 102.9esr.

Attachment #9319706 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
QA Whiteboard: [post-critsmash-triage]

There have been about a dozen crashes on beta and Nightly with the [@ nsINode::HasSlots ] signature since this patch landed, but they are all null derefs, and look like a single different issue. I filed bug 1820003 for that.

I looked at all crashes in Nightly and beta that have this patch applied for crashes on poison values, and the remaining crashes are mostly GC crashes. The classlist crashes seem to be gone entirely. I did see a couple of crashes inside DOM bindings code like this, but I only see two, and they are different from each other, so it might be a fluke. I'll file a new issue for them in case something can be done.

This is really more like a type confusion inside JIT code than a UAF per se. It just happens that often the value we're confused with is a pointer into freed JS memory.

See Also: → 1820602

Nika suggested that a similar situation could be set up via a popup window rather than document.domain. It might be nice to have that additionally, as Chrome is moving towards getting rid of document.domain so we might be able to get rid of it, too and then we'd have to go back and figure out what this test was doing. Though we do have the shell test.

Whiteboard: [reminder-test 2023-04-24] → [reminder-test 2023-04-24][adv-main111+r]
Whiteboard: [reminder-test 2023-04-24][adv-main111+r] → [reminder-test 2023-04-24][adv-main111+r][adv-esr102.9+r]

Here's a minor variation of the popup version. I removed the Mochitest stuff, and it is a single file. It crashes in an unpatched build for me when run from a plain old file:// URI.

CCing Michael Smith from PLSysSec to investigate whether the verified CacheIR compiler that they're working on could have caught this bug.

a month ago, Tom Ritter [:tjr] placed a reminder on the bug using the whiteboard tag [reminder-test 2023-04-24] .

jandem, please refer to the original comment to better understand the reason for the reminder.

Flags: needinfo?(jdemooij)
Whiteboard: [reminder-test 2023-04-24][adv-main111+r][adv-esr102.9+r] → [adv-main111+r][adv-esr102.9+r]

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #37)

Landed but backed out:

This is with the HTTP3 mochitest server, it doesn't support test1.mochi.test. That's bug 1827526. I'll add a skip-if similar to what was done for other tests.

Group: core-security-release
See Also: → 1860803
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: