Closed Bug 1256425 Opened 8 years ago Closed 8 years ago

Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds with message: Cannot get window type for chrome window handle "1"

Categories

(Testing :: Firefox UI Tests, defect, P2)

x86_64
Windows 7
defect

Tracking

(firefox47 wontfix, firefox48 wontfix, firefox49 wontfix, firefox-esr45 fixed, firefox50 fixed, firefox51 fixed, firefox52 fixed)

RESOLVED FIXED
mozilla52
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- fixed
firefox50 --- fixed
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

()

Details

(Keywords: intermittent-failure, Whiteboard: [fixed by bug 1314891])

We fail with not so helpful error message here while running the update tests:

 10:40:08     INFO - TEST-UNEXPECTED-ERROR | test_direct_update.py TestDirectUpdate.test_update | TimeoutException: TimeoutException: Timed out after 5.0 seconds
 10:40:08     INFO - Traceback (most recent call last):
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\marionette\marionette_test.py", line 351, in run
 10:40:08     INFO -     testMethod()
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\tests\firefox-ui\tests\update\direct\test_direct_update.py", line 20, in test_update
 10:40:08     INFO -     self.download_and_apply_available_update(force_fallback=False)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\testcases\update.py", line 312, in download_and_apply_available_update
 10:40:08     INFO -     self.restart()
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\testcases\base.py", line 63, in restart
 10:40:08     INFO -     self.browser = self.windows.switch_to(lambda win: type(win) is BrowserWindow)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\ui\windows.py", line 168, in switch_to
 10:40:08     INFO -     window = self.create_window_instance(handle)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\ui\windows.py", line 117, in create_window_instance
 10:40:08     INFO -     Wait(self.marionette).until(lambda mn: mn.get_window_type() is not None)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\marionette_driver\wait.py", line 143, in until
 10:40:08     INFO -     cause=last_exc)
10:40:08 INFO - TEST-INFO took 31261ms 

Given the above trace the problematic call is for until() with the get_window_type() check. Not sure why this fails but a more helpful error could be thrown here by adding a 2nd argument with the error message to the until method.

http://mxr.mozilla.org/mozilla-central/source/testing/puppeteer/firefox/firefox_puppeteer/ui/windows.py#117

The underlying failure might happen because a window is opened with no window type?
Same for fallback updates. So far only for localized builds on Windows and mozilla-aurora.
Summary: Intermittent failure | test_direct_update.py TestDirectUpdate.test_update | TimeoutException: Timed out after 5.0 seconds → Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds
Ok, it also happens for en-US builds:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&filter-searchStr=firefox%20ui%20aurora&filter-tier=2&filter-tier=3&selectedJob=2185352

We should make this failure a bit more understandable.

> Wait(self.marionette).until(lambda mn: mn.get_window_type() is not None)

Maybe another window gets opened with no window type but an ID or such. Or would have to check if I can reproduce this problem.
OS: Unspecified → Windows
Seems like that only Windows 7 64bit systems with both the 32bit and 64bit Firefox installers are affected.
OS: Windows → Windows 7
Hardware: Unspecified → x86_64
Depends on: 1258385
I can constantly reproduce this problem now on mm-win-7-64-4. Looks like we always fail after the restart when Firefox has been updated. A browser window is present so I'm curious what's causing it. Maybe I will apply the patch from bug 1258385 and try again.
With the patch applied I see:

> TEST-UNEXPECTED-ERROR | test_direct_update.py TestDirectUpdate.test_update | Tim
eoutException: TimeoutException: Timed out after 5.1 seconds with message: Canno
t get window type for chrome window handle "1"

It looks like that we are trying to work with a chrome window with the handle 1:

********* window type: Browser:About
*** chrome window handles: [u'3', u'14']
*** current handle: 14
*** handle: 14
********* window type: Browser:About
** restarted **
*** chrome window handles: [u'1', u'5']
*** current handle: 1
*** handle: 1

So this is clearly a bug in Marionette in chrome_window_handles after a restart. The first workable chrome window handle should be 3 but never 1. This failure doesn't exist that long yet but goes back to beta as it looks like. So it can be a regression in Marionette or a change in Firefox which causes us to behave incorrectly.
So I was curious and checked the code of Marionette, especially the ´getChromeWindowHandles()` method:
http://hg.mozilla.org/mozilla-central/annotate/efe7d026ac64/testing/marionette/driver.js#l1433

As you can see it is using a window enumerator retrieved via `getWinEnumerator()`:
http://hg.mozilla.org/mozilla-central/annotate/efe7d026ac64/testing/marionette/driver.js#l273

The code in there looks suspicious when you are using Firefox and are in chrome scope. Under such a condition we do not have a window type of "navigator:browser" but null. So I added some more logging to our test to retrieve the chrome window handles after the restart when we are still in content scope and once again when we switched to chrome scope:

** restarted **
** chrome windows in content scope: [u'5']
** chrome windows in chrome scope: [u'1', u'5']

I'm not sure why we differentiate here between content and chrome scope. This method should always return the same entries. Andreas, could you explain?

But beside that we also got the handle 1 included, where I do not know yet what this is. I will investigate more.
Flags: needinfo?(ato)
Interestingly the above mentioned behavior only seems to happen for non-browser windows e.g. the about window. I will have a minimized testcase ready in a bit, and will file an appropriate marionette bug.
Depends on: 1259029
Bug 1259029 has been filed for the inconsistency of chrome window handles. So I will continue investigating the suspicious handle 1 here.
Flags: needinfo?(ato)
I can no longer reproduce this problem on the Windows 7 machine. Looks like it only happens on specific times? Very strange. I will check again tomorrow.
Most occurring failure for update tests those days. Raising priority accordingly.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: -- → P2
Interesting. I tried it once more and now it passed:

*** chrome window handles: [u'1', u'5']
*** current handle: 1
*** handle: 1
********* window type: Update:Wizard

There seems to be a race in getting the window type via Marionette.
With bug 1258385 fixed we now have a much clearer failure message.
Summary: Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds → Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds with message: Cannot get window type for chrome window handle "1"
A variation of this failure is clearly the following one:

TestDirectUpdate.test_update | MarionetteException: MarionetteException: win.document.documentElement is null

We have a window but the documentElement is null. Maybe in most of the cases its not null but get_attribute() for the window type returns null?

https://treeherder.mozilla.org/logviewer.html#?job_id=3545559&repo=mozilla-central
Depends on: 1270820
No longer depends on: 1270820
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Ok, so there is indeed a race here. When Firefox starts after an successful update we have two chrome window handles 1, and 5. So we start to iterate through those here:

https://dxr.mozilla.org/mozilla-central/source/testing/puppeteer/firefox/firefox_puppeteer/ui/windows.py#182

But then some milliseconds later the window with the handle '1' disappears from the list and only handle 5 remains. Because of that the above failure happens.

Now the question is what kind of window is handle 1!?
So this extra window we get here is actually a graphics test page, which gets closed kinda quickly again, but can cause a race: 

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/SanityTest.js#303

Mochitests have a specific method which gets called to ensure they do not run into that race:

https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#256

So the underlying code has been implemented via bug 1156135 and is there to report some telemetry stats. What I wonder is why this test is run at all when Telemetry is switched off - which is done by Marionette by default. Matt, can you tell us a bit more why we do not have a condition for that? Is there a simple way to turn this behavior off? Or would we also need such a workaround like mochitests?
Flags: needinfo?(matt.woodrow)
Blocks: 1314627
Depends on: 1314891
(In reply to Henrik Skupin (:whimboo) from comment #21)
> So this extra window we get here is actually a graphics test page, which
> gets closed kinda quickly again, but can cause a race: 
> 
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/
> SanityTest.js#303
> 
> Mochitests have a specific method which gets called to ensure they do not
> run into that race:
> 
> https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-
> test.js#256
> 
> So the underlying code has been implemented via bug 1156135 and is there to
> report some telemetry stats. What I wonder is why this test is run at all
> when Telemetry is switched off - which is done by Marionette by default.
> Matt, can you tell us a bit more why we do not have a condition for that? Is
> there a simple way to turn this behavior off? Or would we also need such a
> workaround like mochitests?

The graphics window also adjusts prefs based on the results, so just detecting telemetry isn't sufficient.

I think doing the same as mochitests does is the best course of action.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> The graphics window also adjusts prefs based on the results, so just
> detecting telemetry isn't sufficient.
> 
> I think doing the same as mochitests does is the best course of action.

I see. So how often is this actually loaded? I assume it doesn't happen for each start of Firefox but only if a new version has been installed? I have that feeling because we only see that with our update tests and not elsewhere.
Flags: needinfo?(matt.woodrow)
(In reply to Henrik Skupin (:whimboo) from comment #23)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> > The graphics window also adjusts prefs based on the results, so just
> > detecting telemetry isn't sufficient.
> > 
> > I think doing the same as mochitests does is the best course of action.
> 
> I see. So how often is this actually loaded? I assume it doesn't happen for
> each start of Firefox but only if a new version has been installed? I have
> that feeling because we only see that with our update tests and not
> elsewhere.

Right, every new firefox version, configuration change (GPU, drivers etc) or new profile (since the state is stored in prefs).
Flags: needinfo?(matt.woodrow)
This should no longer be a  problem since the patch on bug 1314891 has been landed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by bug 1314891]
Target Milestone: --- → mozilla52
(In reply to Matt Woodrow (:mattwoodrow) from comment #24)
> Right, every new firefox version, configuration change (GPU, drivers etc) or
> new profile (since the state is stored in prefs).

Thank you. I will file a new bug for Marionette so we can circumvent this window.
Depends on: 1315611
You need to log in before you can comment on or make changes to this bug.