Crash in [@ mozilla::RestyleManager::ElementStateChanged], due to diagnostic assertion failing: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: gsvelto, Assigned: emilio)
References
(Depends on 2 open bugs)
Details
(Keywords: crash, leave-open)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/7e69a3c4-4632-4e86-8702-67fdf0221003
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)
Top 10 frames of crashing thread:
0 XUL mozilla::RestyleManager::ElementStateChanged layout/base/RestyleManager.cpp
0 XUL mozilla::PresShell::ElementStateChanged layout/base/PresShell.cpp:4489
1 XUL mozilla::dom::Document::ElementStateChanged dom/base/Document.cpp:8059
1 XUL mozilla::dom::Element::UpdateState dom/base/Element.cpp:386
2 XUL mozilla::dom::HTMLInputElement::OnValueChanged dom/html/HTMLInputElement.cpp:6695
2 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.cpp:2744
3 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.h:283
3 XUL mozilla::TextControlState::UnbindFromFrame dom/html/TextControlState.cpp:2507
4 XUL nsTextControlFrame::DestroyFrom layout/forms/nsTextControlFrame.cpp:148
5 XUL nsLineBox::DeleteLineList layout/generic/nsLineBox.cpp:387
I'm not sure what's going on but this appears to be a recent regression, the first affected build ID is 20220906213903.
Assignee | ||
Comment 1•3 years ago
|
||
So this is one of those times where the form control state gets out of sync between the DOM and layout (and layout saving the value changes the form control state). This seems to happen only on Android, and it seems it has IMEStateManager::DispatchCompositionEvent
on the stack. Masayuki / Makoto, do you know what might have triggered this?
Comment 2•3 years ago
|
||
This seems to happen only on Android
bp-9f9d06ea-6541-4483-b543-412350221009 is macOS crash and doesn't seem to be related to IMEStateManager
.
Comment 4•3 years ago
|
||
It seems that the web app handles compositionupdate
event in <input>
element and flush pending things in it. Then, the <input>
element is reframed and its state tries to store the new value while TextEditor
is not available. So, I agree that this is not an Android specific bug.
Comment 5•3 years ago
|
||
Since the crash volume is low (less than 5 per week), the severity is downgraded to S3
. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Comment 6•3 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
:dholbert, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 7•3 years ago
•
|
||
Yeah, let's call this S2 given the recent increase in crash volume; it would be nice to understand what's going on, given that this seems to be a recently-introduced thing (first crash in august, turning into a consistent daily low volume of crashes in mid-Sept).
This is a diagnostic assert, so release users won't be affected, but also they might be getting other forms of bustage when they sail past this assertion (not sure).
RE which platforms are affected -- this initially seemed to be Android-only, but as Masayuki noted, we got several on Mac and now we're getting crashes on Windows, -- e.g. bp-6c82a5e3-2691-4000-9d36-185f20221024.
The "Correlations" pane on crash reports show that 76% of the crashes (including that^ one) have useragent_locale = zh-CN
. Those ones seem to have IME-related stuff in the backtrace, e.g. that^ one has IMEContentObserver::HandleQueryContentEvent
. So there does seem to be an IME connection here (maybe IME code just makes this easier to hit).
For other locales, I'm not seeing IME in the backtrace; e.g. here's a recent en-US Windows crash:
bp-6ed7330a-8abb-4782-a703-53cdf0221025
But it does look like the backtraces consistently have RecreateFramesForContent
calling into nsTextControlFrame::DestroyFrom
/ mozilla::TextControlState::UnbindFromFrame
/ mozilla::TextControlState::SetValue
in the backtrace, though.
So: essentially it looks like, during a style flush, we have to RecreateFramesForContent
on a subtree that includes some text input field, and destruction of that text field's frames causes a value to be set, which triggers ElementStateChanged
which then fatally asserts.
Assignee | ||
Comment 8•3 years ago
|
||
I can try to expand the assert message to try to come up with a test-case.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 11•3 years ago
|
||
bugherder |
Comment 12•3 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 13•3 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Comment 14•3 years ago
•
|
||
Now that we've got the bits changed in the assertion message (per emilio's tweak), we have a tad more info here. I can see the bits by adding the "Moz crash reason raw" column in the crash reports listing.
The names for these bits are in this struct definition:
https://searchfox.org/mozilla-central/rev/670e2e0999f04dc7734c8c12b2c3d420a1e31f12/dom/base/rust/lib.rs#12
pub struct ElementState: u64 {
In the last week of crash data, it looks like the changed-bits values are all one of the following:
- 140737488355328 (aka
1u64<<47
which isVALUE_EMPTY
) -- this is the most common right now, FWIW. - 4 (aka
1<<2
which isHOVER
) - 3072 (aka
1<<10 | 1<<11
which isVALID
andINVALID
) - 15360 (aka bits 10, 11, 12, 13 which is
VALID
,INVALID
,USER_VALID
, andUSER_INVALID
)
Comment 15•3 years ago
|
||
In the past 7 days (since comment 14), I'm seeing more of those same codes (15360, 3072, and 140737488355328), plus one additional one:
- 140737488355456 (aka
1<<47 | 1<<7
which isVALUE_EMPTY
andPLACEHOLDER_SHOWN
).
Comment 16•3 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #14)
In the last week of crash data, it looks like the changed-bits values are all one of the following:
[...]
- 3072 (aka
1<<10 | 1<<11
which isVALID
andINVALID
)
Awesome, it looks like we've got a fuzzer testcase for this one in bug 1804806.
Comment 17•3 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 18•3 years ago
|
||
Downgrading due to reduced crash volume, and since we've identified two causes (and fixed one).
Updated•3 years ago
|
Comment 19•2 years ago
|
||
(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #17)
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
Update: apparently this is a topcrash again, though the bot posted it on the wrong bug (since we had two bugs with the same crash-signature field).
BugBot posted in bug 1651070 comment #9:
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
Comment 20•2 years ago
•
|
||
In the past 7 days, we have 62 crashes, with these MOZ_CRASH_REASONS:
128
(13 of these)140737488355328
(43 of these)140737488355456
(3 of these)15360
(2 of these)3072
(1 of these)
These are nearly all the same values from comment 14 - comment 15, except for 128
which is a new one -- that corresponds to the PLACEHOLDER_SHOWN
bit.
As was the case in comment 14, VALUE_EMPTY
(140737488355328
) remains the most common bit here.
Comment 21•2 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #20)
As was the case in comment 14,
VALUE_EMPTY
(140737488355328
) remains the most common bit here.
Good news, we've got a fuzzer testcase that triggers this now, over in bug 1836854 (potentially related to intermittent bug 1835587 which has the diagnostic assert failing for the same bit.)
Comment 22•2 years ago
•
|
||
(In reply to Daniel Holbert [:dholbert] from comment #21)
(In reply to Daniel Holbert [:dholbert] from comment #20)
As was the case in comment 14,
VALUE_EMPTY
(140737488355328
) remains the most common bit here.Good news, we've got a fuzzer testcase that triggers this now, over in bug 1836854
Update: that bug was fixed (via Bug 1472169), but it didn't fix this flavor of the crash entirely; we have a new fuzzer testcase in bug 1838985 that crashes with the same signature/enum-value.
Comment 24•2 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 25•2 years ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
For more information, please visit BugBot documentation.
Comment 26•2 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 27•2 years ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
For more information, please visit BugBot documentation.
Comment 28•2 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment hidden (Intermittent Failures Robot) |
Comment 30•1 year ago
|
||
Good news, have a new fuzzer testcase that triggers this crash in bug 1889803 (and bug 1889804, though I tentatively duped that to the other one).
Comment hidden (Intermittent Failures Robot) |
Comment 32•10 months ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:emilio, maybe it's time to close this bug?
For more information, please visit BugBot documentation.
Comment hidden (Intermittent Failures Robot) |
Comment 35•8 months ago
|
||
FWIW bug 1936213 is another case with 3072
(aka 1<<10 | 1<<11
which are ElementState
's VALID
and INVALID
bits combined).
Comment 36•8 months ago
•
|
||
But looking at the last 3 months, 822 / 864 of the crashes with this signature (95% of them) have this for moz crash reason raw
:
Element state change during style refresh (35184372088832)
That giant number is from just a single bit being set -- it's 1<<45 i.e. VALUE_EMPTY
, defined here:
https://searchfox.org/mozilla-central/rev/9783996dbd86f999cab50ea426079a7b10f28a2f/dom/base/rust/lib.rs#126-128
/// For :-moz-value-empty (to show widgets like the reveal password
/// button or the clear button).
const VALUE_EMPTY = 1u64 << 45;
Comment 37•8 months ago
|
||
I guess comment 36's 35184372088832
aka VALUE_EMPTY
version of this bug (comment 36) is covered by the fuzzer testcase in bug 1889803. (At least, that's one way to trigger this.)
Description
•