Crash in [@ mozilla::RestyleManager::ElementStateChanged], due to diagnostic assertion failing: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: gsvelto, Assigned: emilio)
References
(Depends on 1 open bug)
Details
(Keywords: crash, leave-open)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/7e69a3c4-4632-4e86-8702-67fdf0221003
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)
Top 10 frames of crashing thread:
0 XUL mozilla::RestyleManager::ElementStateChanged layout/base/RestyleManager.cpp
0 XUL mozilla::PresShell::ElementStateChanged layout/base/PresShell.cpp:4489
1 XUL mozilla::dom::Document::ElementStateChanged dom/base/Document.cpp:8059
1 XUL mozilla::dom::Element::UpdateState dom/base/Element.cpp:386
2 XUL mozilla::dom::HTMLInputElement::OnValueChanged dom/html/HTMLInputElement.cpp:6695
2 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.cpp:2744
3 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.h:283
3 XUL mozilla::TextControlState::UnbindFromFrame dom/html/TextControlState.cpp:2507
4 XUL nsTextControlFrame::DestroyFrom layout/forms/nsTextControlFrame.cpp:148
5 XUL nsLineBox::DeleteLineList layout/generic/nsLineBox.cpp:387
I'm not sure what's going on but this appears to be a recent regression, the first affected build ID is 20220906213903.
Assignee | ||
Comment 1•1 year ago
|
||
So this is one of those times where the form control state gets out of sync between the DOM and layout (and layout saving the value changes the form control state). This seems to happen only on Android, and it seems it has IMEStateManager::DispatchCompositionEvent
on the stack. Masayuki / Makoto, do you know what might have triggered this?
Comment 2•1 year ago
|
||
This seems to happen only on Android
bp-9f9d06ea-6541-4483-b543-412350221009 is macOS crash and doesn't seem to be related to IMEStateManager
.
It seems that the web app handles compositionupdate
event in <input>
element and flush pending things in it. Then, the <input>
element is reframed and its state tries to store the new value while TextEditor
is not available. So, I agree that this is not an Android specific bug.
Comment 5•1 year ago
|
||
Since the crash volume is low (less than 5 per week), the severity is downgraded to S3
. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Comment 6•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
:dholbert, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 7•1 year ago
•
|
||
Yeah, let's call this S2 given the recent increase in crash volume; it would be nice to understand what's going on, given that this seems to be a recently-introduced thing (first crash in august, turning into a consistent daily low volume of crashes in mid-Sept).
This is a diagnostic assert, so release users won't be affected, but also they might be getting other forms of bustage when they sail past this assertion (not sure).
RE which platforms are affected -- this initially seemed to be Android-only, but as Masayuki noted, we got several on Mac and now we're getting crashes on Windows, -- e.g. bp-6c82a5e3-2691-4000-9d36-185f20221024.
The "Correlations" pane on crash reports show that 76% of the crashes (including that^ one) have useragent_locale = zh-CN
. Those ones seem to have IME-related stuff in the backtrace, e.g. that^ one has IMEContentObserver::HandleQueryContentEvent
. So there does seem to be an IME connection here (maybe IME code just makes this easier to hit).
For other locales, I'm not seeing IME in the backtrace; e.g. here's a recent en-US Windows crash:
bp-6ed7330a-8abb-4782-a703-53cdf0221025
But it does look like the backtraces consistently have RecreateFramesForContent
calling into nsTextControlFrame::DestroyFrom
/ mozilla::TextControlState::UnbindFromFrame
/ mozilla::TextControlState::SetValue
in the backtrace, though.
So: essentially it looks like, during a style flush, we have to RecreateFramesForContent
on a subtree that includes some text input field, and destruction of that text field's frames causes a value to be set, which triggers ElementStateChanged
which then fatally asserts.
Assignee | ||
Comment 8•1 year ago
|
||
I can try to expand the assert message to try to come up with a test-case.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 9•1 year ago
|
||
Updated•1 year ago
|
Comment 10•1 year ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/61d384d41c43 Print changed bits in assert failure. r=jwatt
Assignee | ||
Updated•1 year ago
|
Comment 11•1 year ago
|
||
bugherder |
Comment 12•1 year ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 13•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Comment 14•1 year ago
•
|
||
Now that we've got the bits changed in the assertion message (per emilio's tweak), we have a tad more info here. I can see the bits by adding the "Moz crash reason raw" column in the crash reports listing.
The names for these bits are in this struct definition:
https://searchfox.org/mozilla-central/rev/670e2e0999f04dc7734c8c12b2c3d420a1e31f12/dom/base/rust/lib.rs#12
pub struct ElementState: u64 {
In the last week of crash data, it looks like the changed-bits values are all one of the following:
- 140737488355328 (aka
1u64<<47
which isVALUE_EMPTY
) -- this is the most common right now, FWIW. - 4 (aka
1<<2
which isHOVER
) - 3072 (aka
1<<10 | 1<<11
which isVALID
andINVALID
) - 15360 (aka bits 10, 11, 12, 13 which is
VALID
,INVALID
,USER_VALID
, andUSER_INVALID
)
Comment 15•1 year ago
|
||
In the past 7 days (since comment 14), I'm seeing more of those same codes (15360, 3072, and 140737488355328), plus one additional one:
- 140737488355456 (aka
1<<47 | 1<<7
which isVALUE_EMPTY
andPLACEHOLDER_SHOWN
).
Comment 16•1 year ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #14)
In the last week of crash data, it looks like the changed-bits values are all one of the following:
[...]
- 3072 (aka
1<<10 | 1<<11
which isVALID
andINVALID
)
Awesome, it looks like we've got a fuzzer testcase for this one in bug 1804806.
Comment 17•1 year ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 18•10 months ago
|
||
Downgrading due to reduced crash volume, and since we've identified two causes (and fixed one).
Updated•10 months ago
|
Comment 19•7 months ago
|
||
(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #17)
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
Update: apparently this is a topcrash again, though the bot posted it on the wrong bug (since we had two bugs with the same crash-signature field).
BugBot posted in bug 1651070 comment #9:
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
Comment 20•7 months ago
•
|
||
In the past 7 days, we have 62 crashes, with these MOZ_CRASH_REASONS:
128
(13 of these)140737488355328
(43 of these)140737488355456
(3 of these)15360
(2 of these)3072
(1 of these)
These are nearly all the same values from comment 14 - comment 15, except for 128
which is a new one -- that corresponds to the PLACEHOLDER_SHOWN
bit.
As was the case in comment 14, VALUE_EMPTY
(140737488355328
) remains the most common bit here.
Comment 21•6 months ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #20)
As was the case in comment 14,
VALUE_EMPTY
(140737488355328
) remains the most common bit here.
Good news, we've got a fuzzer testcase that triggers this now, over in bug 1836854 (potentially related to intermittent bug 1835587 which has the diagnostic assert failing for the same bit.)
Comment 22•5 months ago
•
|
||
(In reply to Daniel Holbert [:dholbert] from comment #21)
(In reply to Daniel Holbert [:dholbert] from comment #20)
As was the case in comment 14,
VALUE_EMPTY
(140737488355328
) remains the most common bit here.Good news, we've got a fuzzer testcase that triggers this now, over in bug 1836854
Update: that bug was fixed (via Bug 1472169), but it didn't fix this flavor of the crash entirely; we have a new fuzzer testcase in bug 1838985 that crashes with the same signature/enum-value.
Comment 24•5 months ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 25•4 months ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
For more information, please visit BugBot documentation.
Comment 26•3 months ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 27•3 months ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
For more information, please visit BugBot documentation.
Comment 28•3 months ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment hidden (Intermittent Failures Robot) |
Description
•