Closed
Bug 1426797
Opened 5 years ago
Closed 5 years ago
image-size.xul is going to permafail on Windows when Gecko 59 merges to Beta on 2018-01-11 (fails in presence of -moz-window-inactive selector w/ release milestone.txt)
Categories
(Core :: Layout, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla60
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox57 | --- | unaffected |
firefox58 | --- | unaffected |
firefox59 | + | wontfix |
firefox60 | --- | fixed |
People
(Reporter: RyanVM, Assigned: emilio)
References
Details
Attachments
(1 file)
Due to a lot of bustage issues, I can't easily bisect what caused this. I think it's going to need some good old fashioned debugging :(. It appears to be Windows-only. https://treeherder.mozilla.org/logviewer.html#?job_id=153065715&repo=try The patch below should suffice for reproducing it: https://hg.mozilla.org/try/rev/0154852baa88e427a1957e67ec1fd9d3e0acd80b
Reporter | ||
Comment 1•5 years ago
|
||
Though maybe the Windows-only crashtest asserts I'm also seeing are shedding some light on it? https://treeherder.mozilla.org/logviewer.html#?job_id=153082682&repo=try WARNING: stylo: requesting a Gecko declaration block?: file z:/build/build/src/layout/style/ServoBindings.cpp, line 453 WARNING: stylo: requesting a Gecko declaration block?: file z:/build/build/src/layout/style/ServoBindings.cpp, line 469 ASSERTION: Placeholder relationship should have been torn down already; this might mean we have a stray placeholder in the tree.: '!placeholder || nsLayoutUtils::IsProperAncestorFrame(aDestructRoot, placeholder)', file z:/build/build/src/layout/generic/nsFrame.cpp, line 767
Flags: needinfo?(xidorn+moz)
Comment 2•5 years ago
|
||
I checked locally with 366583-1.xul on m-c (for the failure in comment 1), and it raises the same stylo warning but no assertion. I had a look at why the warning is raised, and it seems that is because of <scrollbar> having gecko declaration likely due to xul prototype element model, and it is unrelated to the failure (the additional assertion). That warning should go away when we enable stylo-chrome. And for the failure of 366583-1.xul itself, I have no idea. Looking at the code logic, I cannot think of anything going wrong there... Maybe someone would need to do build locally and attach a debugger to it to see what's happening. And, I don't think the assertion failure of 366583-1.xul is anything related to the failure of image-size.xul. As for the failures in image-size.xul, it seems to be related to the sizing of <xul:image>. Somehow when width or height is not specified explicitly, the image completely collapses in that direction. This is likely an issue in nsImageBoxFrame, but can also be an issue in style system if it provides some weird value for corresponding properties. Does the issue happen in stylo-disabled task as well? If not, it could be an issue in style system, then... (cc dholbert, since I'm seeing the sizing code of nsImageBoxFrame is mainly written by him, so he may have some other idea about why that's happening.)
Flags: needinfo?(xidorn+moz)
Reporter | ||
Comment 3•5 years ago
|
||
stylo-disabled tests only run on Linux AFAICT
dholbert, is this something you can take on in the next week?
Flags: needinfo?(dholbert)
Comment 5•5 years ago
|
||
Yeah, I'll take a look.
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment 8•5 years ago
|
||
I needed one more tweak to get that Try run to get past "./mach build check" -- updated Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e24ed15199a35e179a454a2e84542c69fce86cb6&selectedJob=153935571 That try run does still show the issue. I can reproduce it locally, too, with... ./mach reftest layout/xul/reftest/image-size.xul If I load the XUL file directly in a local nightly-as-beta build (after setting dom.allow_XUL_XBL_for_file = true), I can see a *brief* flash of the broken rendering that's captured in the reftest snapshot, and then it very quickly snaps to the correct[1] rendering. So I think this is a case where we get the rendering correct[1] but we paint an incorrect partial rendering on the way, and the reftest harness is capturing that incorrect rendering for some reason. Tentatively, I want to call this a WebRender bug (or possibly a reftest-harness/webrender interaction bug)... [1] (The final "correct" rendering that we settle on actually has a slight brokenness at the top-right -- there's a tiny green rectangle with a blue rectangle inside of it, and the green rectangle is a bit wider in the reference case. But I see that difference even in unmodified Nightly builds on Windows -- I think it's a rounding issue specific to webrender, and it's what the test's "fuzzy-if(webrender,...)" annotation is meant to cover.)
Comment 9•5 years ago
|
||
CC'ing kats and aosmond who each touched the reftest.list line for this testcase during the scope of some webrender work (in bug 1389000 and bug 1183378 / bug 1420279, respectively).
Flags: needinfo?(dholbert)
Comment 10•5 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #8) > [1] (The final "correct" rendering that we settle on actually has a slight > brokenness at the top-right -- there's a tiny green rectangle with a blue > rectangle inside of it, and the green rectangle is a bit wider in the > reference case. But I see that difference even in unmodified Nightly builds > on Windows -- I think it's a rounding issue specific to webrender, and it's > what the test's "fuzzy-if(webrender,...)" annotation is meant to cover.) (Actually -- it looks like this tiny misrendering on my local machine was due to Win10's default 125% HiDPI scaling. If I reset that to 100%, then I don't see this issue. And it also doesn't make sense to blame this on webrender because it seems webrender is still disabled by default -- I'd misremembered it being enabled on Nightly on some platform.)
Comment 11•5 years ago
|
||
"reftest-wait" plus a small setTimeout does seem to fix the issue, FWIW: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cede1653008e2018b857b4f230025f9ba187cc5
Comment 12•5 years ago
|
||
OK, I was able to "hg bisect" this locally on Win10, and I determined that it broke starting with this commit for bug 1416493: https://hg.mozilla.org/mozilla-central/rev/5f57907f645b Its parent 87368e9f8bad (plus the same config/milestone.txt tweak) does not show the problem. That commit 5f57907f645b (plus a config/milestone.txt tweak to drop "a1") shows the problem. Additionally, there's a performance regression for this specific reftest associated with that commit. Before bug 1416493, this reftest takes 140ms-150ms. After that bug, it takes 190ms-210ms. Measurements taken on my laptop using this command: ./mach reftest layout/xul/reftest/image-size.xul | grep "REFTEST" (Occasionally -- less than 20% of the time -- a post-regression build will pass the reftest, but still take 190-210ms. So the perf regression seems to be reliable, even if the rendering regression isn't 100% reliable.) The connection to bug 1416493 is somewhat believable, because that bug does make some tweaks to platform-specific XUL files (adding an @import for a platform-specific "menu.css" file). My current theory is that that new "@import" makes XUL loading take a little bit longer (hence the perf regression), and we get a flash of unstyled/partially-styled content as a result (due to losing a race with some piece of the reftest harness, or something). I'm not yet sure why this only happens in release builds (with the config/milestone.txt tweak), though.
Blocks: 1416493
Comment 13•5 years ago
|
||
FYI I'm not sure if this would change anything from Comment 12, but I'm working on importing menu.css differently in Bug 1420229, such that the @import will happen inside a new UA sheet loaded via nsDocumentViewer.cpp / nsLayoutStylesheetCache.cpp instead of in global.css (https://hg.mozilla.org/try/rev/1337c7dcf641a1cfd35201e1b3a0cb09f1d99a7d)
Comment 14•5 years ago
|
||
Thanks! That might help with the test failure. But regardless, there's a legit bug here. Specifically, the reftest snapshot captures the wrong thing if this testcase has a "-moz-window-inactive" selector, *and* we have a release-build milestone. (Those specific two things. Bug 1416493 gave us the second thing on Windows, because the menu.css file for Windows happens to have a -moz-window-inactive selector.) Here's a Try run which makes those two tweaks, on last night's mozilla-inbound, which hits this reftest failure on all platforms: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a34d6ff18b91ab2271901574b80e4540a31af8d And for comparison, here's a Try run with *just* the "-moz-window-inactive" tweak, and without the milestone tweak: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5542191dd5dc09559ba3593c9363764ad6d4823 (No instances of this reftest failure there). Also -- I don't have a Try run to prove it, but local testing shows that this is only a problem when stylo is enabled. So this is something wrong with the relatively-new stylo support for "-moz-window-inactive", I think. (This is also *not* bug 1428164 (about the same selector), because my Try runs that I linked in this comment *do* have that bug's patches, and yet they still fail.) --> Kicking this back over to xidorn. (or maybe emilio, who's got this selector code fresh in his mind from bug 1428164) Worst case, if we can't get this fixed before the merge on Thursday [I think], I'm sure we could annotate the test as random for beta (since it doesn't fail 100% of the time) and fix this
Flags: needinfo?(xidorn+moz)
Summary: image-size.xul is going to permafail on Windows when Gecko 59 merges to Beta on 2018-01-11 → image-size.xul is going to permafail on Windows when Gecko 59 merges to Beta on 2018-01-11 (fails in presence of -moz-window-inactive selector w/ release milestone.txt)
Comment 15•5 years ago
|
||
For convenience, here's a single patch that should make this reproduce on all platforms (e.g. for local debugging), when you run this reftest command: ./mach reftest layout/xul/reftest/image-size.xul (Note that it doesn't repro 100% of the time, so you might need to run that command a few times to see the failure.)
Comment 16•5 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #2) > I checked locally with 366583-1.xul on m-c (for the failure in comment 1), > and it raises the same stylo warning but no assertion. [...] > > And for the failure of 366583-1.xul itself, I have no idea. Looking at the > code logic, I cannot think of anything going wrong there... Maybe someone > would need to do build locally and attach a debugger to it to see what's > happening. Following up on this: - RyanVM spun off bug 1428087 for this 366583-1.xul failure - ...but I think it's actually the same issue as this bug -- and the assertion can be made to reproduce on all platforms (not just Windows) if you drop "a1" from config/milestone.txt and also add a -moz-window-inactive selector to the testcase, as noted in bug 1428087 comment 2 (see Try run linked there). Noting this here in case it's easier to debug that testcase (now that we have a way of making it reproduce) rather than debugging image-size.xul.
Assignee | ||
Comment 17•5 years ago
|
||
Just in case it's helpful, it seems that :-moz-window-inactive makes us restyle the whole document, so it should be reliably reproducible with something like: let s = document.createElement('style'); s.innerHTML = '* { color: red; }'; document.body.appendChild(s); document.body.offsetTop; s.remove(); Or something of the sort.
Assignee | ||
Comment 18•5 years ago
|
||
Daniel, if you have a build handy (I'm doing a build right now, but it'll take a bit), mind trying whether the assertion is reproducible with Stylo disabled and code like the one in comment 17 instead of :-moz-window-inactive? If so, it may mean this is a longstanding issue that happens to be uncovered by the moz-window-inactive bit, and I'd just wallpaper it. If not and it's a stylo regression, it may be worth digging into a bit more.
Flags: needinfo?(dholbert)
Comment 19•5 years ago
|
||
(You might be onto something, but no, I wasn't able to get this to reproduce with variations on the comment 17 snippet.)
Flags: needinfo?(dholbert)
Comment 20•5 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #12) > Additionally, there's a performance regression for this specific reftest > associated with that commit. Before bug 1416493, this reftest takes > 140ms-150ms. After that bug, it takes 190ms-210ms. Measurements taken on > my laptop using this command: > ./mach reftest layout/xul/reftest/image-size.xul | grep "REFTEST" > > (Occasionally -- less than 20% of the time -- a post-regression build will > pass the reftest, but still take 190-210ms. So the perf regression seems to > be reliable, even if the rendering regression isn't 100% reliable.) I'm not too surprised that there's a regression (although that's bigger than I would have guessed), since we are loading the new CSS file. In practice this doesn't affect Firefox / Talos though since menu.css used to get loaded immediately by XBL at startup anyway due to the presence of <menu> tags in browser.xul.
Comment 21•5 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #12) > The connection to bug 1416493 is somewhat believable, because that bug does > make some tweaks to platform-specific XUL files (adding an @import for a > platform-specific "menu.css" file). My current theory is that that new > "@import" makes XUL loading take a little bit longer (hence the perf > regression), and we get a flash of unstyled/partially-styled content as a > result (due to losing a race with some piece of the reftest harness, or > something). I'm not yet sure why this only happens in release builds (with > the config/milestone.txt tweak), though. By the way, I'm planning to have many more CSS files to get loaded in this same manner as we continue to replace XBL. Basically moving XBL <resources> stylesheets into UA styles that get loaded in all XUL docs. So if this is indeed a race that's getting triggered due to slightly slower CSS loading in the browser chrome I would expect this to get much worse as we go from 1 file loaded this way to tens of files. But I would have expected that reftest results wouldn't be affected by changes in the browser chrome styles.
Comment 22•5 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #20) > (In reply to Daniel Holbert [:dholbert] from comment #12) > > Additionally, there's a performance regression for this specific reftest > > associated with that commit. [...] > > I'm not too surprised that there's a regression (although that's bigger than > I would have guessed), since we are loading the new CSS file. I'd bet that the observed slowdown (like the test failure itself) is caused specifically by the -moz-window-inactive selector itself (which happened to be in the stylesheet but would probably cause the same slowdown anywhere).
Reporter | ||
Comment 23•5 years ago
|
||
I'm going to land the wallpaper for now to avoid orange on tomorrow's uplift. Please remember to revert this change once the underlying issue is resolved.
Keywords: leave-open
Comment 24•5 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d8d5e1376664 Mark image-size.xul as random on Windows. a=test-only
Comment 25•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d8d5e1376664
Comment 26•5 years ago
|
||
(In reply to Emilio Cobos Álvarez [:emilio] from comment #17) > Just in case it's helpful, it seems that :-moz-window-inactive makes us > restyle the whole document, so it should be reliably reproducible with > something like: > > let s = document.createElement('style'); > s.innerHTML = '* { color: red; }'; > document.body.appendChild(s); > document.body.offsetTop; > s.remove(); > > Or something of the sort. There is one thing different from doing a whole document restyle and doing this, that in a whole document restyle, the restyle starts from the root frame (which is usually the viewport frame), while with this change, the restyle would at most from the primary frame of the root element (which is probably the root element frame). Restyling the viewport frame could be a key factor for this kind of issue. Anyway, I suspect this would also go away once bug 1409672 lands.
Flags: needinfo?(xidorn+moz)
Reporter | ||
Comment 27•5 years ago
|
||
NI myself to see if this still reproduces.
Flags: needinfo?(ryanvm)
Reporter | ||
Comment 28•5 years ago
|
||
Indeed, this appears to pass in current simulation runs going off the logs. Aryx, can you please verify that comment 25 can be reverted in your next simulation run?
Flags: needinfo?(ryanvm) → needinfo?(aryx.bugmail)
![]() |
||
Comment 29•5 years ago
|
||
No issues with the commit in comment 25 reverted: https://treeherder.mozilla.org/#/jobs?repo=try&revision=99a1d5c3a29ee1aedd591c0b79cb52db2c1c25d1&group_state=expanded&selectedJob=164173328&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable
Flags: needinfo?(aryx.bugmail)
Reporter | ||
Comment 30•5 years ago
|
||
Going to call this fixed by bug 1409672 and re-enable the test then. Thanks for testing it, Aryx!
Assignee: nobody → emilio
Keywords: leave-open
Comment 31•5 years ago
|
||
Backout by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/mozilla-inbound/rev/28d25664945a Backed out changeset d8d5e1376664 because it doesn't fail anymore on Windows
Updated•5 years ago
|
Status: NEW → RESOLVED
Closed: 5 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in
before you can comment on or make changes to this bug.
Description
•