Closed Bug 1315611 Opened 3 years ago Closed 3 years ago

Ignore GFX sanity check window during startup of Firefox

Categories

(Testing :: Marionette, defect, P2)

x86_64
Windows 7
defect

Tracking

(firefox-esr52 wontfix, firefox53 fixed, firefox54 fixed, firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- wontfix
firefox53 --- fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

As noticed over on bug 1256425 there is an extra window we sometimes see when restarting Firefox after an update, which disappears within a moment. We should try to ignore that:

(In reply to Matt Woodrow (:mattwoodrow) from bug 1256425 comment #22)
> (In reply to Henrik Skupin (:whimboo) from bug 1256425 comment #21)
> > So this extra window we get here is actually a graphics test page, which
> > gets closed kinda quickly again, but can cause a race: 
> > 
> > https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/
> > SanityTest.js#303
> > 
> > Mochitests have a specific method which gets called to ensure they do not
> > run into that race:
> > 
> > https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-
> > test.js#256
> > 
> > So the underlying code has been implemented via bug 1156135 and is there to
> > report some telemetry stats. What I wonder is why this test is run at all
> > when Telemetry is switched off - which is done by Marionette by default.
> > Matt, can you tell us a bit more why we do not have a condition for that? Is
> > there a simple way to turn this behavior off? Or would we also need such a
> > workaround like mochitests?
> 
> The graphics window also adjusts prefs based on the results, so just
> detecting telemetry isn't sufficient.
> 
> I think doing the same as mochitests does is the best course of action.

And when it happens:

(In reply to Matt Woodrow (:mattwoodrow) from comment #24)
> (In reply to Henrik Skupin (:whimboo) from comment #23)
> > (In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> > > The graphics window also adjusts prefs based on the results, so just
> > > detecting telemetry isn't sufficient.
> > > 
> > > I think doing the same as mochitests does is the best course of action.
> > 
> > I see. So how often is this actually loaded? I assume it doesn't happen for
> > each start of Firefox but only if a new version has been installed? I have
> > that feeling because we only see that with our update tests and not
> > elsewhere.
> 
> Right, every new firefox version, configuration change (GPU, drivers etc) or
> new profile (since the state is stored in prefs).

A workaround for this would be to explicitly check for a browser window to work with after a restart. With that in place this bug is not a high priority item.
So I was able to also bring up this window on OS X by simply modifying:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/moz.build

But sadly I'm not able to reproduce the problem. Checking more makes me even wonder why we use this window as browser window anyway. Reason is that Marionette waits for the "browser-delayed-startup-finished" observer notification, and then calls getMostRecentWindow("navigator:browser"). So this window should never be accounted for.

I added some more debug lines to marionette server code to see if the issue from bug 1322383 is really this problem.
Ok, so here a try build which perfectly shows the problem:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=32a06f9c7dc85755fc8409d4040f78feb413efd9&selectedJob=77908785

As the log tells me we construct two browserContext instances for a browser and the gfx test window:

> *** starting browser for win with URI: chrome://browser/content/browser.xul
> *** starting browser for win with URI: chrome://gfxsanity/content/sanityparent.html

So we indeed have to wait for the window to be closed before starting the tests.
A fix for this bug will be part of my patch series for bug 1322383.
Assignee: nobody → hskupin
No longer blocks: 1322383
Status: NEW → ASSIGNED
Depends on: 1322383
The patch as attached now got already review over on bug  1322383. So this is just for sanity.
Comment on attachment 8844560 [details]
Bug 1315611 - Ignore GFX sanity check window during startup of Firefox.

https://reviewboard.mozilla.org/r/117942/#review119682
Attachment #8844560 - Flags: review?(ato) → review+
I will wait with pushing the fix until later today when we have all results for the required landing of the patch on bug 1345105.
Depends on: 1345105
No longer depends on: 1322383
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/55ee9540dd12
Ignore GFX sanity check window during startup of Firefox. r=ato
https://hg.mozilla.org/mozilla-central/rev/55ee9540dd12
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Given that this fixes test failures please uplift this test-only patch to all other branches. Thanks.
Whiteboard: [checkin-needed-aurora][checkin-needed-beta][checkin-needed-esr52]
https://hg.mozilla.org/releases/mozilla-aurora/rev/b3b491a4bd8f
Flags: in-testsuite+
Whiteboard: [checkin-needed-aurora][checkin-needed-beta][checkin-needed-esr52] → [checkin-needed-beta][checkin-needed-esr52]
I missed to note that for each branch the patch on bug 1345105 needs to be uplifted first.
Whiteboard: [checkin-needed-beta][checkin-needed-esr52] → [checkin-needed-beta][checkin-needed-esr52][needs patch on bug 1345105 uplifted first]
https://hg.mozilla.org/releases/mozilla-beta/rev/674c744f9b16
Whiteboard: [checkin-needed-beta][checkin-needed-esr52][needs patch on bug 1345105 uplifted first] → [checkin-needed-esr52][needs patch on bug 1345105 uplifted first]
(In reply to Ryan VanderMeulen [:RyanVM] from comment #15)
> Backed out from Beta for making test_refresh_firefox.py permafail across all
> opt builds (including after the uplift of bug 1333484).
> https://hg.mozilla.org/releases/mozilla-beta/rev/
> 20dda1bb565d0b18c77e01468c568f68fa1b1396
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=82590685&repo=mozilla-
> beta

Gijs, do you have an idea why this test would be broken on mozilla-beta when we uplift the changes of this bug? I do not really see a difference in that test between central/aurora vs. beta beside the before-mentioned changes in bug 1333484.

What we are doing here is simply delaying the initilization of Marionette from `final-ui-startup` to `sessionstore-windows-restored`, and also ensure that the GFX sanity test window has been closed.

We would like to uplift this change to esr52 too to improve test stability.

Thanks
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Henrik Skupin (:whimboo) from comment #16)
> What we are doing here is simply delaying the initilization of Marionette
> from `final-ui-startup` to `sessionstore-windows-restored`, and also ensure
> that the GFX sanity test window has been closed.
> 
> We would like to uplift this change to esr52 too to improve test stability.
> 
> Thanks

At a guess, have you checked that that observer notification actually fires when we're displaying about:welcomeback (or, on a similar principle, about:sessionrestore), before clicking the 'restore' button? Because I bet it doesn't, and that would explain why the test is now broken. I don't think you can wait for sessionstore-windows-restored.

You should just ignore the actual graphics window when enumerating windows (or injecting JS in them or whatever marionette is trying to do) instead, and I don't really understand why you haven't done that.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(hskupin)
(In reply to :Gijs from comment #17)
> At a guess, have you checked that that observer notification actually fires
> when we're displaying about:welcomeback (or, on a similar principle,
> about:sessionrestore), before clicking the 'restore' button? Because I bet
> it doesn't, and that would explain why the test is now broken. I don't think
> you can wait for sessionstore-windows-restored.

How would I trigger such a page to be loaded during startup? With Marionette enabled we are just displaying about:blank. I assume I will have to tweak the profile and do a restart of the browser. Maybe I should simply check what's going wrong with my patch applied on mozilla-beta.

> You should just ignore the actual graphics window when enumerating windows
> (or injecting JS in them or whatever marionette is trying to do) instead,
> and I don't really understand why you haven't done that.

For Mochitests we also wait until the gfx window is closed before the tests get started. See the link in the first comments. So I didn't thought that it would cause conflicts when we do the same. Also as what I have seen is that in my cases the gfx window was closed after sessionstore-window-restored.
Flags: needinfo?(hskupin)
Whiteboard: [checkin-needed-esr52][needs patch on bug 1345105 uplifted first] → [needs patch on bug 1345105 uplifted first]
What I'm concerned about is that this window is enabling/disabling different preferences for testing purposes and keep them in the appropriate state. If we do not wait for the window to be closed and simply ignoring it when enumarating window handles, possibly upcoming graphics tests would not work on the recommended settings.
Also what is strange, why does it work on central and aurora and fails in beta?
(In reply to Henrik Skupin (:whimboo) from comment #18)
> (In reply to :Gijs from comment #17)
> > At a guess, have you checked that that observer notification actually fires
> > when we're displaying about:welcomeback (or, on a similar principle,
> > about:sessionrestore), before clicking the 'restore' button? Because I bet
> > it doesn't, and that would explain why the test is now broken. I don't think
> > you can wait for sessionstore-windows-restored.
> 
> How would I trigger such a page to be loaded during startup?

Do a reset or use one of the many 'crash firefox' programs to crash firefox, then see what happens on the next start?

> With Marionette
> enabled we are just displaying about:blank. I assume I will have to tweak
> the profile and do a restart of the browser. Maybe I should simply check
> what's going wrong with my patch applied on mozilla-beta.

That might be the best option.

> > You should just ignore the actual graphics window when enumerating windows
> > (or injecting JS in them or whatever marionette is trying to do) instead,
> > and I don't really understand why you haven't done that.
> 
> For Mochitests we also wait until the gfx window is closed before the tests
> get started. See the link in the first comments. So I didn't thought that it
> would cause conflicts when we do the same. Also as what I have seen is that
> in my cases the gfx window was closed after sessionstore-window-restored.

I'm confused - you seem to both claim (in comment #19) that we need to wait for the gfx window to close, and you're saying here that this code does not in fact do that...
(In reply to :Gijs from comment #21)
> > For Mochitests we also wait until the gfx window is closed before the tests
> > get started. See the link in the first comments. So I didn't thought that it
> > would cause conflicts when we do the same. Also as what I have seen is that
> > in my cases the gfx window was closed after sessionstore-window-restored.
> 
> I'm confused - you seem to both claim (in comment #19) that we need to wait
> for the gfx window to close, and you're saying here that this code does not
> in fact do that...

Because it was always closed after sessionstore-window-restored (that's what I wrote), I thought that we can move to this observer notification before starting the server of Marionette.

Shouldn't this also fix a race condition when Marionette start to navigate to a page right after startup while a session restore is in progress and overwrites the tabs and loaded web pages? I can remember that we had some trouble when mconley created the sessionstore test.

I think that I will have to talk to graphics people to figure out how important it would be to wait for the window to be closed. 

Regarding own beta builds I get failures for Rust. So it might take a bit until I can verify everything:

 6:01.17 error[E0463]: can't find crate for `cheddar`
 6:20.57  --> /Volumes/data/code/gecko/media/libstagefright/binding/mp4parse_capi/build.rs:1:1
 6:20.57   |
 6:20.57 1 | extern crate cheddar;
 6:20.57   | ^^^^^^^^^^^^^^^^^^^^^ can't find crate


Given that everything works for central and aurora we decided not to backout the patch immediately.
(In reply to Henrik Skupin (:whimboo) from comment #20)
> Also what is strange, why does it work on central and aurora and fails in
> beta?

Could be related to differences in first-run pages or something? I'm curious as to whether 54-as-Beta Try simulations are hitting this too.
I was able to fix the above Rust compiler failure and I'm building the opt build now.

Meanwhile I downloaded such a build via Treeherder, and I can replicate this script timeout failure of browser/components/migration/tests/marionette/test_refresh_firefox.py. The reason for it is that the check for default browser modal dialog is open and the test cannot continue. Maybe it's only one of the issues we have here.

I will have to wait for the build to finish to check that test in detail, and see what differs when my patch is not applied.
Ok, so the problem is the following. Due to the delayed start of Marionette the driver.js file is getting loaded later, which means that the handler for modal dialogs is getting registered too late, and misses the creation of the check for default browser dialog.

The result is that it cannot be closed here:
https://dxr.mozilla.org/mozilla-beta/rev/20dda1bb565d0b18c77e01468c568f68fa1b1396/browser/components/migration/tests/marionette/test_refresh_firefox.py#273-279

Because of that the call to `self.runAsyncCode` two lines later cannot be executed by Marionette because our framescript is blocked. As such we run into the script timeout.

I can make the test passing when I manually click away the modal dialog.

Personally I would say that there is clearly a bug in Marionette too. We should not only register the handler but also check for a possible open modal dialog at startup.

Can anyone please explain why this dialog only comes up for beta and not on central and aurora?
Hm, the self-made build isn't showing me this problem. I can only replicate it with the downloaded build as made by our automation. I tried to use the same mozconfig settings, but maybe there is something else I miss?
(In reply to Henrik Skupin (:whimboo) from comment #25)
> Because of that the call to `self.runAsyncCode` two lines later cannot be
> executed by Marionette because our framescript is blocked. As such we run
> into the script timeout.
> 
> I can make the test passing when I manually click away the modal dialog.
> 
> Personally I would say that there is clearly a bug in Marionette too. We
> should not only register the handler but also check for a possible open
> modal dialog at startup.
> 
> Can anyone please explain why this dialog only comes up for beta and not on
> central and aurora?

We delay the default dialog on aurora/nightly more than we do on beta/release. It's a long story.

(In reply to Henrik Skupin (:whimboo) from comment #26)
> Hm, the self-made build isn't showing me this problem. I can only replicate
> it with the downloaded build as made by our automation. I tried to use the
> same mozconfig settings, but maybe there is something else I miss?

RELEASE_OR_BETA define and/or branding, I guess.
(In reply to :Gijs from comment #27)
> We delay the default dialog on aurora/nightly more than we do on
> beta/release. It's a long story.

That would clearly make it understandable. Thanks.

> (In reply to Henrik Skupin (:whimboo) from comment #26)
> > Hm, the self-made build isn't showing me this problem. I can only replicate
> > it with the downloaded build as made by our automation. I tried to use the
> > same mozconfig settings, but maybe there is something else I miss?
> 
> RELEASE_OR_BETA define and/or branding, I guess.

That's what I did. So here my mozconfig which I collected from all the different references:

> ac_add_options --enable-official-branding
> ac_add_options --enable-verify-mar
> 
> ac_add_options --enable-crashreporter
> ac_add_options --enable-release
> 
> # Enable checking that add-ons are signed by the trusted root
> MOZ_ADDON_SIGNING=${MOZ_ADDON_SIGNING-1}
> # Enable enforcing that add-ons are signed by the trusted root
> MOZ_REQUIRE_SIGNING=${MOZ_REQUIRE_SIGNING-1}
> 
> ac_add_options --enable-js-shell
> 
> mk_add_options MOZ_MAKE_FLAGS="-j3"
> mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj/beta-opt/
> 
> # Needed to enable breakpad in application.ini
> export MOZILLA_OFFICIAL=1
> 
> export MOZ_TELEMETRY_REPORTING=1
> 
> # Package js shell.
> export MOZ_PACKAGE_JSSHELL=1

Gijs, do you have a link to the code which handles this delay? Maybe it will be easier for me to locally change that while working on mozilla-central than beta.
(In reply to Henrik Skupin (:whimboo) from comment #28)
> Gijs, do you have a link to the code which handles this delay? Maybe it will
> be easier for me to locally change that while working on mozilla-central
> than beta.

https://dxr.mozilla.org/mozilla-central/source/browser/components/nsBrowserGlue.js#1074-1147

You should be able to just change the default values for the relevant prefs in firefox.js
(In reply to :Gijs from comment #29)
> https://dxr.mozilla.org/mozilla-central/source/browser/components/nsBrowserGlue.js#1074-1147

That's indeed complex. But instead of changing the preferences in firefox.js I will do it in Marionette itself, so I might be able to construct a testcase. Thanks.

Regarding the question if `sessionstore-windows-restored` is the event we want to wait for or not... In case of this test it should not matter given that the default browser check is done in _onWindowsRestored which gets run when the `sessionstore-windows-restored` notification is received by Firefox. 

For all other possible cases I see a check in that same method (line 1087) if a session is about to be restored. So I assume that about:sessionrestore should also work perfectly fine if loaded on a restart. But I will definitely have a look if that works for that page and add a test for it.
Sorry for the delay here but I was distracted with other important work over the last days. I will continue to work on this bug by next Monday.
I have spoken with Gijs and based on the test there is no way to disallow this dialog during startup of Marionette. As such we have to find a solution directly in the harness. It's actually not that hard, and I already have it working locally. But I will cover any work via a separate bug which has to land first across all branches, before we can try another uplift for this patch for beta.
Depends on: 1348872
Whiteboard: [needs patch on bug 1345105 uplifted first] → [uplift patch on bug 1348872 first]
Flags: needinfo?(ato)
Whiteboard: [uplift patch on bug 1348872 first] → [uplift before patch on bug 1348872][checkin-needed-beta][checkin-needed-esr52]
Whiteboard: [uplift before patch on bug 1348872][checkin-needed-beta][checkin-needed-esr52] → [uplift before patch on bug 1348872 and bug 1322383][checkin-needed-beta][checkin-needed-esr52]
Given the complexity of the uplift, the right order and results are all layed out on bug 1322383.
https://hg.mozilla.org/releases/mozilla-beta/rev/0c8604d19f4a
Whiteboard: [uplift before patch on bug 1348872 and bug 1322383][checkin-needed-beta][checkin-needed-esr52]
Flags: needinfo?(ato)
Today we decided to no longer uplift feature additions for Marionette to esr52. Only fixes for regressions or other critical issues will be uplifted.
Depends on: 1420864
You need to log in before you can comment on or make changes to this bug.