Open Bug 1893836 Opened 1 year ago Updated 1 year ago

Crash in [@ js::Shape::objectFlags]

Categories

(Core :: JavaScript Engine, defect, P5)

Other
Windows 11
defect

Tracking Status
firefox127 --- affected

People

(Reporter: release-mgmt-account-bot, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/7069194d-b838-413b-b6a5-a7cc60240427

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  js::Shape::objectFlags const  js/src/vm/Shape.h:401
0  xul.dll  JSObject::hasAnyFlag const  js/src/vm/JSObject.h:166
0  xul.dll  js::Watchtower::watchesPropertyAdd  js/src/vm/Watchtower.h:54
0  xul.dll  js::Watchtower::watchPropertyAdd  js/src/vm/Watchtower.h:88
0  xul.dll  js::NativeObject::addProperty  js/src/vm/Shape.cpp:322
1  xul.dll  js::AddDataPropertyToPlainObject  js/src/vm/NativeObject-inl.h:909
1  xul.dll  NewPlainObjectWithProperties  js/src/vm/PlainObject.cpp:307
1  xul.dll  js::NewPlainObjectWithMaybeDuplicateKeys  js/src/vm/PlainObject.cpp:330
1  xul.dll  js::JSONFullParseHandlerAnyChar::finishObject  js/src/vm/JSONParser.cpp:697
1  xul.dll  js::JSONPerHandlerParser<unsigned char, js::JSONFullParseHandler<unsigned char> >::parseImpl<JS::Rooted<JS::Value>, `lambda at /builds/worker/checkouts/gecko/js/src/vm/JSONParser.cpp:1071:26'>  js/src/vm/JSONParser.cpp:876

By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:

  • First crash report: 2024-03-16
  • Process type: Content
  • Is startup crash: No
  • Has user comments: No
  • Is null crash: Yes - 2 out of 8 crashes happened on null or near null memory address

The Bugbug bot thinks this bug should belong to the 'Core::JavaScript Engine' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: General → JavaScript Engine

I think this is a hardware bug.

There are 63 crashes in js::Shape::objectFlags in the last 14 days (which is roughly when the crash rate for this signature spiked). Every single one has the same stack: in JSON.parse, we allocate a new object, try to add a data property to that object, check to see whether Watchtower cares about property adds for that object, and crash while trying to read the object's flags.

This is already deeply weird, because we just allocated the object (and checked that the allocation succeeded). Even more suspiciously, 60/63 crashes have cpu_info family 6 model 183 stepping 1, and the remaining three all have cpu_info family 6 model 186 stepping 2. These appear to be, respectively, the S and P variants of Intel's Raptor Lake.
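
A breakdown like this can be tallied in a few lines of Python. This is only a sketch: the records are synthetic stand-ins rebuilt from the counts quoted in this comment, not real crash-stats data, and the `tally_cpus` helper and record layout are invented for illustration.

```python
from collections import Counter

# Synthetic records rebuilt from the counts quoted above (60 + 3 = 63).
# Real reports would come from crash-stats, which this sketch does not query.
reports = (
    [{"cpu_info": "family 6 model 183 stepping 1"}] * 60
    + [{"cpu_info": "family 6 model 186 stepping 2"}] * 3
)


def tally_cpus(reports):
    """Count crash reports per cpu_info string (hypothetical helper)."""
    return Counter(r["cpu_info"] for r in reports)


counts = tally_cpus(reports)
total = sum(counts.values())

# Print each CPU signature with its share of the crashes, most common first.
for cpu, n in counts.most_common():
    print(f"{cpu}: {n}/{total} ({n / total:.0%})")
```

Seeing only two distinct cpu_info strings across all 63 reports is what makes the distribution suspicious.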

I have no idea what we did to make Raptor Lake mad.

Digging into a crash, it looks like we've inlined watchPropertyAdd->watchesPropertyAdd->hasAnyFlag->objectFlags into addProperty. Checking this flag is the first thing we do when we enter addProperty. Here are the relevant instructions:

mov    (%rdx),%rax
mov    (%rax),%r8
movzwl 0xc(%r8),%eax

Relevant registers:

rsp: 0x0000003a613faa10
rdx: 0x0000003a613faba0
rax: 0x0000003a613fade8
r8:  0x0000000000000000

rdx is the second argument on Windows, which is the Handle<NativeObject*>. It should point to a root, which should live on the stack slightly above the current stack pointer. This is what we see. When we dereference the root to get an object, we expect the result to point into the heap. However, the actual value of rax is somewhere nearby on the stack. When we load the first word to get the shape, we load a null pointer, and then segfault when we try to read the object flags from the shape.
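
To make the chain of loads concrete, here is a toy Python model of the three instructions, using the register values from the dump. It is a sketch under assumptions: memory is a plain dict, the dump is taken to reflect state at the faulting instruction (so rax and r8 already hold the results of the first two loads), and `load64` and `STACK_WINDOW` are names invented for illustration.

```python
# Register values copied from the crash dump above.
RSP = 0x0000003A613FAA10
RDX = 0x0000003A613FABA0  # Handle<NativeObject*>: address of the root
RAX = 0x0000003A613FADE8  # value loaded from the root (should be a heap object)
R8 = 0x0000000000000000   # value loaded as the object's shape

# Toy memory: only the two words the first two loads touched. Assumes the
# dump reflects state at the fault, so memory[RDX] == RAX and memory[RAX] == R8.
memory = {RDX: RAX, RAX: R8}

STACK_WINDOW = 0x1000  # arbitrary "close to rsp" threshold for this sketch


def load64(addr):
    """Read a word from the toy memory; unmapped addresses fault."""
    if addr not in memory:
        raise MemoryError(f"access violation reading {addr:#x}")
    return memory[addr]


obj = load64(RDX)         # mov (%rdx),%rax
shape = load64(obj)       # mov (%rax),%r8
fault_addr = shape + 0xC  # address used by movzwl 0xc(%r8),%eax

print(f"root is {RDX - RSP:#x} bytes above rsp")
print(f"'object' is {obj - RSP:#x} bytes above rsp")
print("object pointer lands on the stack:", 0 <= obj - RSP < STACK_WINDOW)

try:
    load64(fault_addr)
except MemoryError as err:
    print(err)
```

In this model both the root and the supposed object sit within a few hundred bytes of rsp, and the final read targets a near-null address because the shape pointer is zero.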

So my best guess is that a hardware bug affecting the stack frame above us has either messed up the rooting or passed in the wrong handle. The code inside addProperty all looks completely correct.

Unless the crash rate goes way up, or we see the same crash on other chips, I don't think there's much value digging deeper here.

Severity: -- → S3
Priority: -- → P5
See Also: → 1883761

I'm having this issue and it's doing my head in. All my drivers are updated, I've fresh-installed Windows, and my chipset firmware and BIOS are all upgraded. I'm still getting constant access violations. I had some stability issues before due to the motherboard manufacturers putting too much power into Raptor Lake, but that has been fixed with the recent BIOS update/new power config settings, so I don't think it's that issue. Weird thing is that Chrome is also getting access violations and crashing tabs (not sure about Edge). No other programs are unstable that I know of. I have the 13900K, btw. Could I just have a bad/damaged CPU/memory controller? (memtest passes and swapping out RAM sticks doesn't help either)

Flags: needinfo?(iireland)

Given that Chrome is also crashing, it seems extremely likely that it's some sort of hardware problem. We've seen a number of crash signatures that mostly/only show up on Raptor Lake, and from anecdotal evidence we're not the only ones.

Looking at the list of errata for Raptor Lake, there are a few bugs that look suspicious, but the ones that caught my eye all say "It may be possible for BIOS to contain a workaround for this erratum." So if you are still seeing problems with up-to-date microcode, then either Intel doesn't know about / hasn't tried to fix your problem, or the fix is incomplete, neither of which is a particularly great option.

Unfortunately I don't have any recommendations beyond waiting for Intel to figure it out. If it gets to the point that Intel identifies a problem that can't be fixed in microcode, then we could consider trying to generate slightly different code to avoid triggering the problem, but without a detailed explanation from Intel, it's basically impossible for us to guess what would need to be changed.

I wish I could give you a more helpful response.

Flags: needinfo?(iireland)