Closed Bug 1727399 Opened 3 years ago Closed 3 years ago

tb windows debug perma - Hit MOZ_CRASH(Resolving style on <html id="messengerWindow" xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:xul="http://www.mozilla.org/keymaster/gate..." icon="messengerWindow"

Categories

(Core :: Layout, defect, P5)

defect

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- unaffected
firefox93 --- wontfix
firefox94 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: emilio)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Do you know how this started? It seems like something is invalidating the root element style at a bad time, and I suspect it's a TB issue, but should be worth fixing. Does nobody use TB debug on Windows?

Flags: needinfo?(mkmelin+mozilla)

Probably not many people on windows debug no...

We don't know how it started. Unfortunately were was some outage for Thunderbird builds around Aug 19. Aug 19 daily was busted through bug 1726433 and such so no builds until Aug 23... and then when we got the builds going again the error shows up from first build.

The approximate regression window is
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2021-08-19+10%3A00&enddate=2021-08-23+15%3A00
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2021-08-19+10%3A00&enddate=2021-08-23+15%3A00

Hmm, given that the assertion is happening Windows only, maybe bug 1707021 - https://hg.mozilla.org/comm-central/rev/ad55b29e32d4348bb156df9bcdeceea071127953 could have caused it... I'll send off a try run.

Flags: needinfo?(mkmelin+mozilla)

(In reply to Magnus Melin [:mkmelin] from comment #4)

Probably not many people on windows debug no...

Surprising comment. 95% of the user base is on Windows (https://stats.thunderbird.net/#platlang) and this bug makes development on Windows with a debug build (of course!) impossible since TB doesn't even start. Last successful treeherder build as you correctly pointed out on the 19th of August, more than two weeks ago.

I don't find it very surprising not many people use Windows debug. Debug is only useful if you're changing c++ backend parts, or debugging some very particular problem in it - most people are not doing that. For general development, debug is much too slow to be worth it. And windows is the slowest of the OSs to develop on. Atm, many use primarily artifact builds actually...

We at pEp believe that a product should be developed and tested in a debug build on all platforms that it ships, especially the most prominent one, Windows. Only a debug version allows the developer to see asserts and debug output. As you said yourself, only a debug build allows "general" debugging, in a mixed JS/C++ environment debugging into C++ code can become necessary quicker than on thinks. Without a debug build you're limiting your toolset.

This is NOT windows7-32-qr ONLY.

I see it under Windows 10 x64 WebRender debug, too, in my try-comm-server submission.
My submission:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=711d48dceb1bf1a48e87e809f95a9c86b57dc4a6&test_paths=toolkit%2Fcomponents%2Ftelemetry%2Ftests%2Funit&selectedTaskRun=GfNgB-evRvu2agnliJ6ibA.0

Please see Windows 10 x64 WebRender debug, bct1 log.

So the subject description may need updating (?).

Summary: tb windows7-32-qr debug perma - Hit MOZ_CRASH(Resolving style on <html id="messengerWindow" xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:xul="http://www.mozilla.org/keymaster/gate..." icon="messengerWindow" → tb windows debug perma - Hit MOZ_CRASH(Resolving style on <html id="messengerWindow" xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:xul="http://www.mozilla.org/keymaster/gate..." icon="messengerWindow"

Emilio, how can this be debugged? TB is in the precarious situation that development on Windows has practically been impossible for a month now. This can't be a good thing for any new contributor on Windows.

Flags: needinfo?(emilio)

So it is crashing because of this assert. has_current_styles returns false because the element is dirty, which is rare because we're doing the initial styling.

Debugging this on Linux using rr would be trivial, but I guess it doesn't repro there. Anyhow, you need to figure out why data.hint is not zero, and where did it come from. Chances are some privileged JS is changing some attribute at a place where it shouldn't or something like that.

Flags: needinfo?(emilio)

Bisecting this is probably easy with mozregression though?

Ah, mozregression doesn't support thunderbird debug, that's unfortunate.

More importantly, there were no builds for four days (see comment #4) so there are no builds to bisect. There were seven M-C merges with about 250 changesets, so this would be very painful to bisect in local builds.

Local bisection:
5235f051d95235e351d59f61d5ce19cf0cb9d3f7 Emilio Cobos Álvarez — Bug 1722487 - Improve preference change handling code to be faster. r=jfkthame
https://hg.mozilla.org/mozilla-central/rev/5235f051d95235e351d59f61d5ce19cf0cb9d3f7

Flags: needinfo?(emilio)
Keywords: regression
Regressed by: 1722487
Has Regression Range: --- → yes

I guess I deserve that... I hate debugging on windows so much!

Will look, thanks for narrowing this down Tom :)

Assignee: nobody → emilio

If it helps, we can apply any debug patch and report back the result. We already tried

diff --git a/servo/ports/geckolib/glue.rs b/servo/ports/geckolib/glue.rs
--- a/servo/ports/geckolib/glue.rs
+++ b/servo/ports/geckolib/glue.rs
@@ -5516,16 +5516,19 @@ pub extern "C" fn Servo_TakeChangeHint(e
 #[no_mangle]
 pub extern "C" fn Servo_ResolveStyle(element: &RawGeckoElement) -> Strong<ComputedValues> {
     let element = GeckoElement(element);
     debug!("Servo_ResolveStyle: {:?}", element);
     let data = element
         .borrow_data()
         .expect("Resolving style on unstyled element");
 
+    println!("=== rust debug");
+    println!("=== {:?}", element);
+    println!("=== {:?}", data);
     debug_assert!(
         element.has_current_styles(&*data),
         "Resolving style on {:?} without current styles: {:?}",
         element,
         data
     );
     data.styles.primary().clone().into()
 }

but that didn't appear to give any useful information.

Ok, so I think I see what's going on. The issue is that on the initial document styling, we end up initializing a font-family lazily for font metrics. That causes a post traversal task like this:

#01: gfxFontGroup::AddFamilyToFontList (c:\mozilla-central\mozilla-unified\gfx\thebes\gfxTextRun.cpp:1974)
#02: gfxFontGroup::BuildFontList (c:\mozilla-central\mozilla-unified\gfx\thebes\gfxTextRun.cpp:1903)
#03: gfxFontGroup::gfxFontGroup (c:\mozilla-central\mozilla-unified\gfx\thebes\gfxTextRun.cpp:1858)
#04: nsFontMetrics::nsFontMetrics (c:\mozilla-central\mozilla-unified\gfx\src\nsFontMetrics.cpp:137)
#05: nsFontCache::GetMetricsFor (c:\mozilla-central\mozilla-unified\gfx\src\nsFontCache.cpp:92)
#06: nsLayoutUtils::GetMetricsFor (c:\mozilla-central\mozilla-unified\layout\base\nsLayoutUtils.cpp:9591)
#07: Gecko_GetFontMetrics (c:\mozilla-central\mozilla-unified\layout\style\GeckoBindings.cpp:1427)
#08: style::gecko::wrapper::impl$14::query (c:\mozilla-central\mozilla-unified\servo\components\style\gecko\wrapper.rs:1005)
#09: enum$<style::values::specified::length::FontRelativeLength>::to_computed_value (c:\mozilla-central\mozilla-unified\servo\components\style\values\specified\length.rs:137)
#10: style::values::computed::length_percentage::impl$10::to_computed_value (c:\mozilla-central\mozilla-unified\servo\components\style\values\computed\length_percentage.rs:502)
#11: style::properties::longhands::width::cascade_property (c:\mozilla-central\mozilla-unified\obj-x86_64-pc-mingw32\x86_64-pc-windows-msvc\debug\build\style-e2aaefb0a658ceb9\out\longhands\position.rs:2714)
#12: style::properties::cascade::Cascade::apply_properties<style::properties::cascade::LateProperties,core::iter::adapters::cloned::Cloned<core::slice::iter::Iter<tuple$<ref$<enum$<style::properties::PropertyDeclaration> >,enum$<style::stylesheets::origin::Ori (C:\mozilla-central\mozilla-unified\servo\components\style\properties\cascade.rs:693)
#13: style::properties::cascade::cascade_rules<style::gecko::wrapper::GeckoElement> (C:\mozilla-central\mozilla-unified\servo\components\style\properties\cascade.rs:214)
#14: style::properties::cascade::cascade<style::gecko::wrapper::GeckoElement> (C:\mozilla-central\mozilla-unified\servo\components\style\properties\cascade.rs:109)
#15: style::stylist::Stylist::cascade_style_and_visited<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\stylist.rs:1066)
#16: style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::cascade_style_and_visited<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\style_resolver.rs:346)
#17: style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::cascade_primary_style<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\style_resolver.rs:243)
#18: style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::resolve_primary_style<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\style_resolver.rs:211)
#19: style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::resolve_style<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\style_resolver.rs:259)
#20: style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::resolve_style_with_default_parents<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\style_resolver.rs:293)
#21: style::traversal::compute_style<style::gecko::wrapper::GeckoElement> (c:\mozilla-central\mozilla-unified\servo\components\style\traversal.rs:610)
#22: style::gecko::traversal::impl$1::process_preorder<style::parallel::top_down_dom::closure$1> (c:\mozilla-central\mozilla-unified\servo\components\style\gecko\traversal.rs:37)
#23: std::panic::impl$30::call_once<tuple$<>,rayon_core::scope::impl$1::spawn_fifo::closure$0::closure$0> (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\panic.rs:347)
#24: std::panicking::try<tuple$<>,std::panic::AssertUnwindSafe<rayon_core::scope::impl$1::spawn_fifo::closure$0::closure$0> > (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\panicking.rs:430)
#25: std::panic::catch_unwind<std::panic::AssertUnwindSafe<rayon_core::scope::impl$1::spawn_fifo::closure$0::closure$0>,tuple$<> > (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\panic.rs:435)
#26: rayon_core::unwind::halt_unwinding<rayon_core::scope::impl$1::spawn_fifo::closure$0::closure$0,tuple$<> > (C:\mozilla-central\mozilla-unified\third_party\rust\rayon-core\src\unwind.rs:18)
#27: rayon_core::scope::ScopeBase::execute_job<rayon_core::scope::impl$1::spawn_fifo::closure$0::closure$0> (C:\mozilla-central\mozilla-unified\third_party\rust\rayon-core\src\scope\mod.rs:636)
#28: rayon_core::job::impl$6::execute<rayon_core::scope::impl$1::spawn_fifo::closure$0> (C:\mozilla-central\mozilla-unified\third_party\rust\rayon-core\src\job.rs:167)
#29: rayon_core::registry::WorkerThread::wait_until_cold (C:\mozilla-central\mozilla-unified\third_party\rust\rayon-core\src\registry.rs:727)
#30: rayon_core::registry::ThreadBuilder::run (C:\mozilla-central\mozilla-unified\third_party\rust\rayon-core\src\registry.rs:56)
#31: std::sys_common::backtrace::__rust_begin_short_backtrace<rayon_core::registry::impl$2::spawn::closure$0,tuple$<> > (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\sys_common\backtrace.rs:128)
#32: std::panic::impl$30::call_once<tuple$<>,std::thread::impl$0::spawn_unchecked::closure$0::closure$0> (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\panic.rs:348)
#33: std::panicking::try<tuple$<>,std::panic::AssertUnwindSafe<std::thread::impl$0::spawn_unchecked::closure$0::closure$0> > (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\panicking.rs:430)
#34: std::panic::catch_unwind<std::panic::AssertUnwindSafe<std::thread::impl$0::spawn_unchecked::closure$0::closure$0>,tuple$<> > (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\std\src\panic.rs:435)
#35: core::ops::function::FnOnce::call_once<std::thread::impl$0::spawn_unchecked::closure$0,tuple$<> > (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\library\core\src\ops\function.rs:227)
#36: std::sys::windows::thread::impl$0::new::thread_start (/rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b\/library\std\src\sys\windows\thread.rs:57)
#37: ??? (???:???)

Which since the regressing patch triggers a restyle from here which we don't correctly process before doing frame construction...

Some style system APIs (like StyleNewSubtree) don't deal with nested
restyles really well (it'd be weird if StyleNewSubtree would end up
styling not only that subtree but the whole document).

To prevent breaking style system invariants, make sure to invalidate the
style asynchronously using an early runner. This should ensure at most
one frame of wrong metrics, but usually zero.

Flags: needinfo?(emilio)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6a73dcf39edd
When style updates font information, make sure to invalidate style asynchronously. r=jfkthame

Please move this to a more appropriate component than "Thunderbird::General". I guess it's to late to try now, we'll see the result on the TB build following the next M-C merge.

Thanks for addressing this quickly!

Component: General → Layout
Product: Thunderbird → Core
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: