Open Bug 1793410 Opened 1 year ago Updated 2 months ago

Crash in [@ mozilla::RestyleManager::ElementStateChanged], due to diagnostic assertion failing: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)

Categories

(Core :: Layout, defect)

defect

Tracking

()

ASSIGNED

People

(Reporter: gsvelto, Assigned: emilio)

References

(Depends on 1 open bug)

Details

(Keywords: crash, leave-open)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/7e69a3c4-4632-4e86-8702-67fdf0221003

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)

Top 10 frames of crashing thread:

0 XUL mozilla::RestyleManager::ElementStateChanged layout/base/RestyleManager.cpp
0 XUL mozilla::PresShell::ElementStateChanged layout/base/PresShell.cpp:4489
1 XUL mozilla::dom::Document::ElementStateChanged dom/base/Document.cpp:8059
1 XUL mozilla::dom::Element::UpdateState dom/base/Element.cpp:386
2 XUL mozilla::dom::HTMLInputElement::OnValueChanged dom/html/HTMLInputElement.cpp:6695
2 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.cpp:2744
3 XUL mozilla::TextControlState::SetValue dom/html/TextControlState.h:283
3 XUL mozilla::TextControlState::UnbindFromFrame dom/html/TextControlState.cpp:2507
4 XUL nsTextControlFrame::DestroyFrom layout/forms/nsTextControlFrame.cpp:148
5 XUL nsLineBox::DeleteLineList layout/generic/nsLineBox.cpp:387

I'm not sure what's going on but this appears to be a recent regression, the first affected build ID is 20220906213903.

So this is one of those times where the form control state gets out of sync between the DOM and layout (and layout saving the value changes the form control state). This seems to happen only on Android, and it seems it has IMEStateManager::DispatchCompositionEvent on the stack. Masayuki / Makoto, do you know what might have triggered this?

Flags: needinfo?(masayuki)
Flags: needinfo?(m_kato)

This seems to happen only on Android

bp-9f9d06ea-6541-4483-b543-412350221009 is macOS crash and doesn't seem to be related to IMEStateManager.

So this is dup of bug 1778476?

Flags: needinfo?(m_kato)

It seems that the web app handles compositionupdate event in <input> element and flush pending things in it. Then, the <input> element is reframed and its state tries to store the new value while TextEditor is not available. So, I agree that this is not an Android specific bug.

Flags: needinfo?(masayuki)

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

:dholbert, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dholbert)
Keywords: topcrash

Yeah, let's call this S2 given the recent increase in crash volume; it would be nice to understand what's going on, given that this seems to be a recently-introduced thing (first crash in august, turning into a consistent daily low volume of crashes in mid-Sept).

This is a diagnostic assert, so release users won't be affected, but also they might be getting other forms of bustage when they sail past this assertion (not sure).

RE which platforms are affected -- this initially seemed to be Android-only, but as Masayuki noted, we got several on Mac and now we're getting crashes on Windows, -- e.g. bp-6c82a5e3-2691-4000-9d36-185f20221024.

The "Correlations" pane on crash reports show that 76% of the crashes (including that^ one) have useragent_locale = zh-CN. Those ones seem to have IME-related stuff in the backtrace, e.g. that^ one has IMEContentObserver::HandleQueryContentEvent. So there does seem to be an IME connection here (maybe IME code just makes this easier to hit).

For other locales, I'm not seeing IME in the backtrace; e.g. here's a recent en-US Windows crash:
bp-6ed7330a-8abb-4782-a703-53cdf0221025

But it does look like the backtraces consistently have RecreateFramesForContent calling into nsTextControlFrame::DestroyFrom / mozilla::TextControlState::UnbindFromFrame / mozilla::TextControlState::SetValue in the backtrace, though.

So: essentially it looks like, during a style flush, we have to RecreateFramesForContent on a subtree that includes some text input field, and destruction of that text field's frames causes a value to be set, which triggers ElementStateChanged which then fatally asserts.

Severity: S3 → S2
Flags: needinfo?(dholbert)
Hardware: Unspecified → All
Summary: Crash in [@ mozilla::RestyleManager::ElementStateChanged] → Crash in [@ mozilla::RestyleManager::ElementStateChanged], due to diagnostic assertion failing: MOZ_DIAGNOSTIC_ASSERT(!mInStyleRefresh)

I can try to expand the assert message to try to come up with a test-case.

Flags: needinfo?(emilio)
Flags: needinfo?(emilio)
Assignee: nobody → emilio
Status: NEW → ASSIGNED
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/61d384d41c43
Print changed bits in assert failure. r=jwatt
Keywords: leave-open
See Also: → 1800543

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash

Now that we've got the bits changed in the assertion message (per emilio's tweak), we have a tad more info here. I can see the bits by adding the "Moz crash reason raw" column in the crash reports listing.
The names for these bits are in this struct definition:
https://searchfox.org/mozilla-central/rev/670e2e0999f04dc7734c8c12b2c3d420a1e31f12/dom/base/rust/lib.rs#12

pub struct ElementState: u64 {

In the last week of crash data, it looks like the changed-bits values are all one of the following:

  • 140737488355328 (aka 1u64<<47 which is VALUE_EMPTY) -- this is the most common right now, FWIW.
  • 4 (aka 1<<2 which is HOVER)
  • 3072 (aka 1<<10 | 1<<11 which is VALID and INVALID)
  • 15360 (aka bits 10, 11, 12, 13 which is VALID, INVALID, USER_VALID, and USER_INVALID)

In the past 7 days (since comment 14), I'm seeing more of those same codes (15360, 3072, and 140737488355328), plus one additional one:

  • 140737488355456 (aka 1<<47 | 1<<7 which is VALUE_EMPTY and PLACEHOLDER_SHOWN).
See Also: → 1804806

(In reply to Daniel Holbert [:dholbert] from comment #14)

In the last week of crash data, it looks like the changed-bits values are all one of the following:
[...]

  • 3072 (aka 1<<10 | 1<<11 which is VALID and INVALID)

Awesome, it looks like we've got a fuzzer testcase for this one in bug 1804806.

Depends on: 1804806, 1800543
See Also: 1800543, 1804806

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Downgrading due to reduced crash volume, and since we've identified two causes (and fixed one).

Severity: S2 → S3
Depends on: 1651070
No longer depends on: 1804806

(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #17)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

Update: apparently this is a topcrash again, though the bot posted it on the wrong bug (since we had two bugs with the same crash-signature field).

BugBot posted in bug 1651070 comment #9:

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on nightly
Keywords: topcrash

In the past 7 days, we have 62 crashes, with these MOZ_CRASH_REASONS:

  • 128 (13 of these)
  • 140737488355328 (43 of these)
  • 140737488355456 (3 of these)
  • 15360 (2 of these)
  • 3072 (1 of these)

These are nearly all the same values from comment 14 - comment 15, except for 128 which is a new one -- that corresponds to the PLACEHOLDER_SHOWN bit.

As was the case in comment 14, VALUE_EMPTY (140737488355328) remains the most common bit here.

See Also: → 1835587

(In reply to Daniel Holbert [:dholbert] from comment #20)

As was the case in comment 14, VALUE_EMPTY (140737488355328) remains the most common bit here.

Good news, we've got a fuzzer testcase that triggers this now, over in bug 1836854 (potentially related to intermittent bug 1835587 which has the diagnostic assert failing for the same bit.)

Depends on: 1836854
Depends on: 1838985

(In reply to Daniel Holbert [:dholbert] from comment #21)

(In reply to Daniel Holbert [:dholbert] from comment #20)

As was the case in comment 14, VALUE_EMPTY (140737488355328) remains the most common bit here.

Good news, we've got a fuzzer testcase that triggers this now, over in bug 1836854

Update: that bug was fixed (via Bug 1472169), but it didn't fix this flavor of the crash entirely; we have a new fuzzer testcase in bug 1838985 that crashes with the same signature/enum-value.

Duplicate of this bug: 1843445

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on nightly

For more information, please visit BugBot documentation.

Keywords: topcrash

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on nightly

For more information, please visit BugBot documentation.

Keywords: topcrash

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
You need to log in before you can comment on or make changes to this bug.