Crash in [@ mozilla::ScrollContainerFrame::InInitialReflow]
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: release-mgmt-account-bot, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: crash, stalled)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/62677645-01a4-461e-9a6e-5ef250240524
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll mozilla::ScrollContainerFrame::InInitialReflow const layout/generic/ScrollContainerFrame.cpp:1043
0 xul.dll mozilla::ScrollContainerFrame::Reflow layout/generic/ScrollContainerFrame.cpp:1628
1 xul.dll nsAbsoluteContainingBlock::ReflowAbsoluteFrame layout/generic/nsAbsoluteContainingBlock.cpp:811
1 xul.dll nsAbsoluteContainingBlock::Reflow layout/generic/nsAbsoluteContainingBlock.cpp:219
2 xul.dll nsBlockFrame::Reflow layout/generic/nsBlockFrame.cpp:1759
3 xul.dll nsContainerFrame::ReflowChild layout/generic/nsContainerFrame.cpp:885
3 xul.dll mozilla::ScrollContainerFrame::ReflowScrolledFrame layout/generic/ScrollContainerFrame.cpp:915
4 xul.dll mozilla::ScrollContainerFrame::ReflowContents layout/generic/ScrollContainerFrame.cpp:1050
4 xul.dll mozilla::ScrollContainerFrame::Reflow layout/generic/ScrollContainerFrame.cpp:1518
5 xul.dll nsAbsoluteContainingBlock::ReflowAbsoluteFrame layout/generic/nsAbsoluteContainingBlock.cpp:811
By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:
- First crash report: 2024-05-24
- Process type: Content
- Is startup crash: No
- Has user comments: No
- Is null crash: No
By analyzing the backtrace, the regression may have been introduced by a patch [1] to fix Bug 1897752.
[1] https://hg.mozilla.org/mozilla-central/rev?node=df44b0eea88f
:emilio, since you are the author of the potential regressor, could you please take a look?
Comment 1•8 months ago
That seems fairly unlikely. If this is something, it's probably a signature change from TYLin's rename to ScrollContainerFrame. But that said, I don't understand how we can crash there, because `this` is valid a few lines above.
Comment 2•8 months ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
> That seems fairly unlikely. If this is something, it's probably a signature change from TYLin's rename to ScrollContainerFrame.
Yeah, we have some small amount of crash volume for nsHTMLScrollFrame::InInitialReflow, and this is probably just that, under a new name:
https://crash-stats.mozilla.org/signature/?signature=nsHTMLScrollFrame%3A%3AInInitialReflow&date=%3E%3D2023-11-28T14%3A58%3A00.000Z&date=%3C2024-05-28T14%3A58%3A00.000Z&_sort=-date
Comment 3•8 months ago
Resetting affected/unaffected flags since they're not meaningful (this isn't known to be a regression).
The oldest crash at this point is bp-17ac3b7b-57ff-4105-8ba1-f19c00231201, which is in Firefox 120.0.1, from nearly 6 months ago (as far back as we track crashes).
Comment 4•8 months ago
This bug has been marked as a regression. Setting status flag for Nightly to `affected`.
Comment 5•8 months ago
Looking at the minidump from comment 0, it looks like some higher-order bits of a pointer address were somehow zeroed out in our `this` pointer.
Specifically:
- At stack level 1 in the backtrace, `nsAbsoluteContainingBlock::ReflowAbsoluteFrame`, we have `aKidFrame` being `0x00000143b7ae7b40`, which we call a method on: `aKidFrame->Reflow(...)`.
- Drilling down one level, the `this` pointer should be that same pointer value, but it's not quite: Visual Studio shows `this` as being `0x00000000b7ae7b40` there, which has the high-order bits (`0x143`) zeroed out for some reason.
This feels likely to be bad hardware (or our stack memory has been stomped on somehow).
Comment 6•8 months ago
Two other recent minidumps seem to show the same pattern (almost certainly from the same user as comment 0, too -- identical hardware and extension list):
bp-e1cd48ba-4879-4b87-b7f7-d663c0240525 (Nightly 128)
- Stack level 1 has `aKidFrame` being `0x00000281d0795960`
- Stack level 0 has `this` being `0x00000000d0795960`, with the high `0x281` bits having been zeroed out.
bp-320f9bbf-0a94-45e0-a74e-26cac0240525 (Nightly 128)
- Stack level 1 has `aKidFrame` being `0x0000019166b25100`
- Stack level 0 has `this` being `0x0000000066b25100`, with the high `0x191` bits having been zeroed out.
And this one from release (probably a different user, because different graphics card) shows the same zeroing pattern but to a larger extent:
bp-aa0aaf8f-fa4a-4864-975c-4ebe90240515 (Firefox 125.0.3)
- Stack level 1 has `aKidFrame` being `0x00000130af363f98`
- Stack level 0 has `this` being `0x0000000000000000` (`<NULL>`), with the whole value having been zeroed out.
The odd thing is that we're crashing towards the end of the `ScrollContainerFrame::Reflow()` implementation. That leads me to believe that the `this` pointer is probably fine towards the beginning of that method, and it's somehow getting clobbered partway through, shortly before we hit this call to `InInitialReflow`, where we crash with `this` partly or fully nulled out:
https://searchfox.org/mozilla-central/rev/f60bb10a5fe6936f9e9f9e8a90d52c18a0ffd818/layout/generic/ScrollContainerFrame.cpp#1628
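The zeroing pattern in the four minidumps above can be checked mechanically. The following is only a minimal sketch, not Gecko code: the names `PointerPair`, `Corruption`, `Classify`, and `kPairs` are hypothetical, and the pointer values are copied from comment 5 and this comment. It assumes "high bits zeroed" means exactly that the upper 32 bits of the 64-bit pointer were cleared while the lower 32 bits survived.

```cpp
#include <cstdint>

// One caller/callee pointer pair read out of a minidump (hypothetical type).
struct PointerPair {
  const char* report;   // crash report ID, as quoted in the comments
  uint64_t aKidFrame;   // value seen in ReflowAbsoluteFrame (stack level 1)
  uint64_t thisPtr;     // value seen as `this` in Reflow (stack level 0)
};

enum class Corruption { None, HighBitsZeroed, FullyZeroed, Other };

// Classify how `this` differs from the pointer the caller passed.
// Assumption: "high bits" means the upper 32 bits of the address.
constexpr Corruption Classify(const PointerPair& p) {
  if (p.thisPtr == p.aKidFrame) {
    return Corruption::None;
  }
  if (p.thisPtr == 0) {
    return Corruption::FullyZeroed;
  }
  if (p.thisPtr == (p.aKidFrame & 0xFFFFFFFFull)) {
    return Corruption::HighBitsZeroed;
  }
  return Corruption::Other;
}

// Values copied from the minidumps discussed in comments 5 and 6.
constexpr PointerPair kPairs[] = {
    {"62677645-01a4-461e-9a6e-5ef250240524", 0x00000143b7ae7b40ull,
     0x00000000b7ae7b40ull},
    {"e1cd48ba-4879-4b87-b7f7-d663c0240525", 0x00000281d0795960ull,
     0x00000000d0795960ull},
    {"320f9bbf-0a94-45e0-a74e-26cac0240525", 0x0000019166b25100ull,
     0x0000000066b25100ull},
    {"aa0aaf8f-fa4a-4864-975c-4ebe90240515", 0x00000130af363f98ull,
     0x0000000000000000ull},
};
```

Under that assumption, the three Nightly reports classify as `HighBitsZeroed` and the release report as `FullyZeroed`. The low 32 bits surviving intact in the first three cases is what makes this look like a truncated or partially clobbered register or stack slot rather than a stale or dangling pointer.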
Comment 7•8 months ago
Perhaps this is a CPU bug? All 7 of the crashes here (3 with the new `ScrollContainerFrame` signature, 4 with the old `nsHTMLScrollFrame` signature) have the same `CPU count` and `cpu info` values:
CPU Count: 32
CPU Info: family 6 model 183 stepping 1
Perhaps that's a sign that this is a CPU bug?
(They do have variable `CPUMicrocodeVersion` fields; not sure to what extent that would matter.)
Comment 9•8 months ago
Yes, bug 1897573 and bug 1871892 are all on the same CPU.
Comment 10•7 months ago
The severity field is not set for this bug.
:emilio, could you have a look please?
For more information, please visit BugBot documentation.
Comment 11•7 months ago
Given it seems like a CPU bug, S3 seems about right.