Crash in [@ mozilla::RestyleManager::ElementStateChanged], due to diagnostic assertion failing: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: gsvelto, Assigned: emilio)
References
(Depends on 1 open bug)
Details
(Keywords: crash, leave-open, topcrash)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/7e69a3c4-4632-4e86-8702-67fdf0221003
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)
Top 10 frames of crashing thread:
0 XUL mozilla::RestyleManager::ElementStateChanged layout/base/RestyleManager.cpp
0 XUL mozilla::PresShell::ElementStateChanged layout/base/PresShell.cpp:4489
1 XUL mozilla::dom::Document::ElementStateChanged dom/base/Document.cpp:8059
1 XUL mozilla::dom::Element::UpdateState dom/base/Element.cpp:386
2 XUL mozilla::dom::HTMLInputElement::OnValueChanged dom/html/HTMLInputElement.cpp:6695
2 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.cpp:2744
3 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.h:283
3 XUL mozilla::TextControlState::UnbindFromFrame dom/html/TextControlState.cpp:2507
4 XUL nsTextControlFrame::DestroyFrom layout/forms/nsTextControlFrame.cpp:148
5 XUL nsLineBox::DeleteLineList layout/generic/nsLineBox.cpp:387
I'm not sure what's going on but this appears to be a recent regression, the first affected build ID is 20220906213903.
Assignee | ||
Comment 1•8 months ago
|
||
So this is one of those times where the form control state gets out of sync between the DOM and layout (and layout saving the value changes the form control state). This seems to happen only on Android, and it seems it has IMEStateManager::DispatchCompositionEvent
on the stack. Masayuki / Makoto, do you know what might have triggered this?
Comment 2•8 months ago
|
||
This seems to happen only on Android
bp-9f9d06ea-6541-4483-b543-412350221009 is macOS crash and doesn't seem to be related to IMEStateManager
.
Comment 4•8 months ago
|
||
It seems that the web app handles compositionupdate
event in <input>
element and flush pending things in it. Then, the <input>
element is reframed and its state tries to store the new value while TextEditor
is not available. So, I agree that this is not an Android specific bug.
Comment 5•8 months ago
|
||
Since the crash volume is low (less than 5 per week), the severity is downgraded to S3
. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Comment 6•7 months ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
:dholbert, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 7•7 months ago
•
|
||
Yeah, let's call this S2 given the recent increase in crash volume; it would be nice to understand what's going on, given that this seems to be a recently-introduced thing (first crash in august, turning into a consistent daily low volume of crashes in mid-Sept).
This is a diagnostic assert, so release users won't be affected, but also they might be getting other forms of bustage when they sail past this assertion (not sure).
RE which platforms are affected -- this initially seemed to be Android-only, but as Masayuki noted, we got several on Mac and now we're getting crashes on Windows, -- e.g. bp-6c82a5e3-2691-4000-9d36-185f20221024.
The "Correlations" pane on crash reports show that 76% of the crashes (including that^ one) have useragent_locale = zh-CN
. Those ones seem to have IME-related stuff in the backtrace, e.g. that^ one has IMEContentObserver::HandleQueryContentEvent
. So there does seem to be an IME connection here (maybe IME code just makes this easier to hit).
For other locales, I'm not seeing IME in the backtrace; e.g. here's a recent en-US Windows crash:
bp-6ed7330a-8abb-4782-a703-53cdf0221025
But it does look like the backtraces consistently have RecreateFramesForContent
calling into nsTextControlFrame::DestroyFrom
/ mozilla::TextControlState::UnbindFromFrame
/ mozilla::TextControlState::SetValue
in the backtrace, though.
So: essentially it looks like, during a style flush, we have to RecreateFramesForContent
on a subtree that includes some text input field, and destruction of that text field's frames causes a value to be set, which triggers ElementStateChanged
which then fatally asserts.
Assignee | ||
Comment 8•7 months ago
|
||
I can try to expand the assert message to try to come up with a test-case.
Assignee | ||
Updated•7 months ago
|
Assignee | ||
Comment 9•7 months ago
|
||
Updated•7 months ago
|
Comment 10•7 months ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/61d384d41c43 Print changed bits in assert failure. r=jwatt
Assignee | ||
Updated•7 months ago
|
Comment 11•7 months ago
|
||
bugherder |
Comment 12•7 months ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 13•6 months ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Comment 14•6 months ago
•
|
||
Now that we've got the bits changed in the assertion message (per emilio's tweak), we have a tad more info here. I can see the bits by adding the "Moz crash reason raw" column in the crash reports listing.
The names for these bits are in this struct definition:
https://searchfox.org/mozilla-central/rev/670e2e0999f04dc7734c8c12b2c3d420a1e31f12/dom/base/rust/lib.rs#12
pub struct ElementState: u64 {
In the last week of crash data, it looks like the changed-bits values are all one of the following:
- 140737488355328 (aka
1u64<<47
which isVALUE_EMPTY
) -- this is the most common right now, FWIW. - 4 (aka
1<<2
which isHOVER
) - 3072 (aka
1<<10 | 1<<11
which isVALID
andINVALID
) - 15360 (aka bits 10, 11, 12, 13 which is
VALID
,INVALID
,USER_VALID
, andUSER_INVALID
)
Comment 15•6 months ago
|
||
In the past 7 days (since comment 14), I'm seeing more of those same codes (15360, 3072, and 140737488355328), plus one additional one:
- 140737488355456 (aka
1<<47 | 1<<7
which isVALUE_EMPTY
andPLACEHOLDER_SHOWN
).
Comment 16•6 months ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #14)
In the last week of crash data, it looks like the changed-bits values are all one of the following:
[...]
- 3072 (aka
1<<10 | 1<<11
which isVALID
andINVALID
)
Awesome, it looks like we've got a fuzzer testcase for this one in bug 1804806.
Comment 17•5 months ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 18•4 months ago
|
||
Downgrading due to reduced crash volume, and since we've identified two causes (and fixed one).
Updated•4 months ago
|
Comment 19•8 days ago
|
||
(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #17)
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
Update: apparently this is a topcrash again, though the bot posted it on the wrong bug (since we had two bugs with the same crash-signature field).
BugBot posted in bug 1651070 comment #9:
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
Comment 20•8 days ago
•
|
||
In the past 7 days, we have 62 crashes, with these MOZ_CRASH_REASONS:
128
(13 of these)140737488355328
(43 of these)140737488355456
(3 of these)15360
(2 of these)3072
(1 of these)
These are nearly all the same values from comment 14 - comment 15, except for 128
which is a new one -- that corresponds to the PLACEHOLDER_SHOWN
bit.
As was the case in comment 14, VALUE_EMPTY
(140737488355328
) remains the most common bit here.
Description
•