Closed Bug 1507900 Opened 6 years ago Closed 1 month ago

Perma Beta browser/base/content/test/performance/browser_startup_flicker.js | unexpected changed rect: ({x1:1239, x2:1239, y1:36, y2:69, w:1, h:34}), window width: 1280 -

Categories

(Firefox :: General, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox-esr78 --- unaffected
firefox79 --- unaffected
firefox80 - wontfix
firefox81 + disabled

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 file)

Filed by: csabou [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=212266898&repo=mozilla-central https://queue.taskcluster.net/v1/task/UTP7uZv_QGim_0UOW8bGuw/runs/0/artifacts/public/logs/live_backing.log [task 2018-11-16T18:36:47.718Z] 18:36:47 INFO - TEST-PASS | browser/base/content/test/performance/browser_startup_flicker.js | Should have captured some frames. - [task 2018-11-16T18:36:47.720Z] 18:36:47 INFO - ignoring identical frame [task 2018-11-16T18:36:47.721Z] 18:36:47 INFO - Buffered messages finished [task 2018-11-16T18:36:47.724Z] 18:36:47 INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/performance/browser_startup_flicker.js | unexpected changed rect: ({x1:3, x2:36, y1:36, y2:69, w:34, h:34}), window width: 1280 - [task 2018-11-16T18:36:47.726Z] 18:36:47 INFO - Stack trace: [task 2018-11-16T18:36:47.727Z] 18:36:47 INFO - chrome://mochikit/content/browser-test.js:test_ok:1292 [task 2018-11-16T18:36:47.729Z] 18:36:47 INFO - chrome://mochitests/content/browser/browser/base/content/test/performance/browser_startup_flicker.js:rects<:56 [task 2018-11-16T18:36:47.730Z] 18:36:47 INFO - chrome://mochitests/content/browser/browser/base/content/test/performance/browser_startup_flicker.js:null:39 [task 2018-11-16T18:36:47.731Z] 18:36:47 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1093 [task 2018-11-16T18:36:47.733Z] 18:36:47 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1084 [task 2018-11-16T18:36:47.734Z] 18:36:47 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:982 [task 2018-11-16T18:36:47.735Z] 18:36:47 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:803 [task 2018-11-16T18:36:47.737Z] 18:36:47 INFO - Not taking screenshot here: see the one that was previously logged
The screenshots in the test logs show the back and forward buttons being first displayed disabled, then being hidden, then being displayed disabled again.
Component: Performance Monitoring → General
OS: Unspecified → Linux
Product: Toolkit → Firefox
Hardware: Unspecified → x86_64
Version: unspecified → Trunk
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → INCOMPLETE
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

This is permafailing on beta Linux x64 DevEdition. Please investigate.

Flags: needinfo?(florian)

We have several different failures in the same bug. I explained the initial failure in comment 5.

Most of the failures in comment 18 were for a different bug: it was on Windows, and the window was being resized after first paint during startup. The changed area has the size of the full windows: width 1280, height: 1031

The failures starting on April 25th (the last 2 in comment 18, then all of comment 19 and comment 20) are another different one: the vertical separator between the Firefox Account and the Hamburger menu toolbar buttons is first painted in dark grey and then repainted in light grey. This is likely an issue with the dark theme. In the error message you see w:1, h:34.

Was https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0a2e7a4f67451383f95e335913660ae6ba78f13&selectedTaskRun=ZG-Km4n7RH26ysSWa94OlA-0 already a perma fail on April 25th? If so, can this help us find a regression range?

Flags: needinfo?(florian) → needinfo?(aryx.bugmail)

It didn't fail April 26th to 30th - this coincides with the landing, backout and relanding of bug 1632098.

Flags: needinfo?(aryx.bugmail)

It looks like disabling this for a try run could be a one line change at https://searchfox.org/mozilla-central/rev/4166c15e2a99a23a9b38ad62c9fdfe8e5448b354/modules/libpref/init/StaticPrefList.yaml#1016

I also wonder if we could reproduce this test failure on a normal trunk build if we just enabled the dark theme.

Both the beta with browser.tabs.documentchannel.ppdc flipped to false and central with the dark theme didn't hit the failure (for the latter, "opt" instead of "Shippable opt" got used).

Flags: needinfo?(florian)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #26)

Both the beta with browser.tabs.documentchannel.ppdc flipped to false

So this points to a regression from bug 1632098 then? Needinfo on jya to see if he has an idea of how this could cause this change of behavior (the vertical separator between the Firefox Account and the Hamburger menu toolbar buttons is first painted in dark grey and then repainted in light grey).

and central with the dark theme didn't hit the failure (for the latter, "opt" instead of "Shippable opt" got used).

Is it possible to run the DevEdition jobs on central and reproduce there? Are there other differences between normal nightly builds and devedition builds besides using the dark theme?

Flags: needinfo?(florian) → needinfo?(jyavenard)

(In reply to Florian Quèze [:florian] from comment #27)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #26)

Both the beta with browser.tabs.documentchannel.ppdc flipped to false

So this points to a regression from bug 1632098 then? Needinfo on jya to see if he has an idea of how this could cause this change of behavior (the vertical separator between the Firefox Account and the Hamburger menu toolbar buttons is first painted in dark grey and then repainted in light grey).

bug 1632098 didn't introduce a regression, it exposes existing bugs :)

The distinction is of consequence.

What does the PPDC do: previously, when you were starting a new load; it could complete in the same event loop and so code that didn't wait for the "load" event to be fired could get away with it.

Something like:
let document = await openNewTab(URI, false /* don't wait for load */);
let element = document.querySelector(look_for_this_element);
element.do_something_with_it();

could have worked, though would intermittently fail because this is all async and there's no guarantee that the load will be observable like it was synchronous.

When we do the load via the ParentProcessDocumentChannel (which is what gets disabled when you set browser.tabs.documentchannel.ppdc to false); the load will never be observed as being synchronous; timings are changed. So element in the snippet above would always be null; you must wait for the load to complete.

I don't know how the recording service starts or when, nor do I know what events the UI is using to determine when to draw what. Gijs maybe?

It seems to be capturing the actual behaviour on how things are being drawn and that somehow the hamburger button is always being painted grey first.

That the theme code is waiting on a particular event, which isn't suitable for that type of work before loading the theme. And change of PPDC modified the timing of this event.

and central with the dark theme didn't hit the failure (for the latter, "opt" instead of "Shippable opt" got used).

There have been no changes between nightly and beta when it comes to the use of DocumentChannel that would impact what you are seeing.

So I believe if you enabled dark theme in nightly, you'll see the same.

There's probably a real bug here, is it expected for the grey thing to be painted first, even if for a very short time?

Flags: needinfo?(jyavenard) → needinfo?(gijskruitbosch+bugs)

(In reply to Jean-Yves Avenard [:jya] from comment #28)

bug 1632098 didn't introduce a regression, it exposes existing bugs :)

We can't really know that this isn't a "real" regression while we don't understand the root cause of this issue. This bug seems to have collected various painting issues, but the one we're seeing as a result of the ppdc change is completely new, so there's no evidence it's an "existing bug".

What does the PPDC do: previously, when you were starting a new load; it could complete in the same event loop and so code that didn't wait for the "load" event to be fired could get away with it.

Something like:
let document = await openNewTab(URI, false /* don't wait for load */);
let element = document.querySelector(look_for_this_element);
element.do_something_with_it();

What you're describing is to do with stuff loaded in the content area / in a browser. But the effects of this intermittent are all in the browser document, which is already loaded, so ppdc should not have changed anything, AIUI. What effects is PPDC supposed to have for the actual parent browser document?

I don't know how the recording service starts or when, nor do I know what events the UI is using to determine when to draw what. Gijs maybe?

It seems to be capturing the actual behaviour on how things are being drawn and that somehow the hamburger button is always being painted grey first.

That the theme code is waiting on a particular event, which isn't suitable for that type of work before loading the theme. And change of PPDC modified the timing of this event.

We set up the lightweight theme when beforeInitialXULLayout fires for the browser window, at https://searchfox.org/mozilla-central/rev/a40ef31fc9af34a99ceda6d65cdc4573d52b83d2/browser/base/content/browser.js#1734 . This should be before the first paint of the browser window. If PPDC is causing us to paint before that event, something is broken in the documentchannel code. If we're somehow ending up painting based on outdated CSS variables, something is broken in layout/gfx.

Can anyone actually reproduce this locally?

Flags: needinfo?(jyavenard)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(florian)

The only thing PPDC does, is potentially delaying when a load even will be fired (and of course when the page will be loaded).

I'm not aware of what beforeInitialXULLayout is ; but I'm also not aware of anything that would cause beforeInitialXULLayout to depend on what a load event on any particular element is fired.

So while I expect timing to change, there's nothing in the code that ensure that one painting event is done before ; those are all orthogonal processes.

Flags: needinfo?(jyavenard)

(In reply to Alexandru Michis [:malexandru] from comment #38)

This is currently permafailing on beta, like it was on the beta sim mentioned here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1507900#c30

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Cusercancel%2Crunning%2Cpending%2Crunnable&searchStr=linux%2C18.04%2Cx64%2Cdevedition%2Copt%2Cmochitests%2Ctest-linux1804-64-devedition%2Fopt-mochitest-browser-chrome-e10s-3%2Cm%28bc3%29&revision=ecf6eea8a56f522e84c0022feb209e896b2cd0bf&selectedTaskRun=Os51_d0-RuW9dMjGmSwreA.0

That failure is different from the one in the summary here - the one in the summary is about the top left of the toolbar, the failure you're highlighting is in the top right. This is kind of important if we want to figure out what is flickering and why...
Also, https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-06-01&endday=2020-07-01&tree=trunk&bug=1507900 shows nothing for me.

Can you clarify what failures you mean and if this is the right bug?

Flags: needinfo?(malexandru)

The one in Comment 30 is also in the top right, I did some retriggers on the early beta sim from the 29th and that's when I saw it was still permafailing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f81b2c31cc1f3336d0f2472b48d4d8ecbe969189&searchStr=linux%2C18.04%2Cx64%2Cdevedition%2Copt%2Cmochitests%2Ctest-linux1804-64-devedition%2Fopt-mochitest-browser-chrome-e10s-3%2Cm%28bc3%29

The intermittent failures link you posted doesn't show beta and try failures, I changed "tree=all" in the link to see those as well:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-06-01&endday=2020-07-01&tree=all&bug=1507900

I can make a separate bug to differentiate the two issues, or modify this one's summary, since taking a glance at the intermittent failures in the past 30 days, I don't see failures that point to the top left of the toolbar in the log.

Flags: needinfo?(malexandru) → needinfo?(gijskruitbosch+bugs)

(In reply to Alexandru Michis [:malexandru] from comment #40)

The one in Comment 30 is also in the top right, I did some retriggers on the early beta sim from the 29th and that's when I saw it was still permafailing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f81b2c31cc1f3336d0f2472b48d4d8ecbe969189&searchStr=linux%2C18.04%2Cx64%2Cdevedition%2Copt%2Cmochitests%2Ctest-linux1804-64-devedition%2Fopt-mochitest-browser-chrome-e10s-3%2Cm%28bc3%29

The intermittent failures link you posted doesn't show beta and try failures, I changed "tree=all" in the link to see those as well:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-06-01&endday=2020-07-01&tree=all&bug=1507900

I can make a separate bug to differentiate the two issues, or modify this one's summary, since taking a glance at the intermittent failures in the past 30 days, I don't see failures that point to the top left of the toolbar in the log.

Might as well morph this one then...

(In reply to Jean-Yves Avenard [:jya] from comment #32)

The only thing PPDC does, is potentially delaying when a load even will be fired (and of course when the page will be loaded).

But this should only affect the loads of content in browsers (webpages, about pages, other stuff that loads in tabs or other <browser> elements), right, not the load of OS-level window documents like browser.xhtml ?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jyavenard)
Summary: Intermittent browser/base/content/test/performance/browser_startup_flicker.js | unexpected changed rect: ({x1:3, x2:36, y1:36, y2:69, w:34, h:34}), window width: 1280 - → Intermittent browser/base/content/test/performance/browser_startup_flicker.js | unexpected changed rect: ({x1:1239, x2:1239, y1:36, y2:69, w:1, h:34}), window width: 1280 -

(In reply to :Gijs (he/him) from comment #41)

But this should only affect the loads of content in browsers (webpages, about pages, other stuff that loads in tabs or other <browser> elements), right, not the load of OS-level window documents like browser.xhtml ?

I can't explain this via the timing of operation anymore.
In order to alleviate the effect of timing differences, in bug 1641737 I made all MozPromise used by the ParentProcessDocumentChannel and DocumentLoadListener to use direct task dispatching; this is very similar to JS microtasks.

That is; a direct task runs prior any other pending tasks. The observable timing and number of event loop iterations should now be identical to what it was before was started introducing the concept of DocumentChannel.

Sorry if I can't be of further help.

Flags: needinfo?(jyavenard)
Summary: Intermittent browser/base/content/test/performance/browser_startup_flicker.js | unexpected changed rect: ({x1:1239, x2:1239, y1:36, y2:69, w:1, h:34}), window width: 1280 - → Perma Beta browser/base/content/test/performance/browser_startup_flicker.js | unexpected changed rect: ({x1:1239, x2:1239, y1:36, y2:69, w:1, h:34}), window width: 1280 -

This bug is 2 years old, so presumably not new in 80?

Flags: needinfo?(malexandru)

(In reply to Julien Cristau [:jcristau] from comment #50)

This bug is 2 years old, so presumably not new in 80?

It got reused because the failure messages from this test include a bunch of coordinates; the bug is relatively new (not 2 years old, at least). The original problem was to do with a different part of the UI flickering.

Tracking+, but admittedly a weak one. I wouldn't be heartbroken if we skipped the test on affected platforms if we can't get any traction on a proper fix here.

Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information

Priority: P5 → P2
Assignee: nobody → apavel
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dd7a8c94fd76 skip browser_startup_flicker.js on devedition on beta r#?intermittent-reviewers r=intermittent-reviewers,jmaher DONTBUILD
Whiteboard: [checkin-needed-beta]
Keywords: leave-open
Whiteboard: [stockwell disabled]

Clearing ni? as part of triage since it looks like the question was answered and we have a (partial) resolution.

Flags: needinfo?(florian)
Assignee: apavel → nobody
Severity: normal → S3
Status: REOPENED → RESOLVED
Closed: 5 years ago1 month ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: