Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds with message: Cannot get window type for chrome window handle "1"

RESOLVED FIXED in Firefox -esr45

Status

P2
normal
RESOLVED FIXED
3 years ago
9 months ago

People

(Reporter: whimboo, Assigned: whimboo)

Tracking

({intermittent-failure})

unspecified
mozilla52
x86_64
Windows 7
intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 wontfix, firefox48 wontfix, firefox49 wontfix, firefox-esr45 fixed, firefox50 fixed, firefox51 fixed, firefox52 fixed)

Details

(Whiteboard: [fixed by bug 1314891], URL)

(Assignee)

Description

3 years ago
We fail with not so helpful error message here while running the update tests:

 10:40:08     INFO - TEST-UNEXPECTED-ERROR | test_direct_update.py TestDirectUpdate.test_update | TimeoutException: TimeoutException: Timed out after 5.0 seconds
 10:40:08     INFO - Traceback (most recent call last):
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\marionette\marionette_test.py", line 351, in run
 10:40:08     INFO -     testMethod()
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\tests\firefox-ui\tests\update\direct\test_direct_update.py", line 20, in test_update
 10:40:08     INFO -     self.download_and_apply_available_update(force_fallback=False)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\testcases\update.py", line 312, in download_and_apply_available_update
 10:40:08     INFO -     self.restart()
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\testcases\base.py", line 63, in restart
 10:40:08     INFO -     self.browser = self.windows.switch_to(lambda win: type(win) is BrowserWindow)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\ui\windows.py", line 168, in switch_to
 10:40:08     INFO -     window = self.create_window_instance(handle)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_puppeteer\ui\windows.py", line 117, in create_window_instance
 10:40:08     INFO -     Wait(self.marionette).until(lambda mn: mn.get_window_type() is not None)
 10:40:08     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\marionette_driver\wait.py", line 143, in until
 10:40:08     INFO -     cause=last_exc)
10:40:08 INFO - TEST-INFO took 31261ms 

Given the above trace the problematic call is for until() with the get_window_type() check. Not sure why this fails but a more helpful error could be thrown here by adding a 2nd argument with the error message to the until method.

http://mxr.mozilla.org/mozilla-central/source/testing/puppeteer/firefox/firefox_puppeteer/ui/windows.py#117

The underlying failure might happen because a window is opened with no window type?
(Assignee)

Comment 1

3 years ago
Same for fallback updates. So far only for localized builds on Windows and mozilla-aurora.
status-firefox47: --- → affected
Summary: Intermittent failure | test_direct_update.py TestDirectUpdate.test_update | TimeoutException: Timed out after 5.0 seconds → Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds
17 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 13
* mozilla-central: 4

Platform breakdown:
* windows7-64: 17

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2016-03-14&endday=2016-03-20&tree=all
(Assignee)

Comment 3

3 years ago
Ok, it also happens for en-US builds:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&filter-searchStr=firefox%20ui%20aurora&filter-tier=2&filter-tier=3&selectedJob=2185352

We should make this failure a bit more understandable.

> Wait(self.marionette).until(lambda mn: mn.get_window_type() is not None)

Maybe another window gets opened with no window type but an ID or such. Or would have to check if I can reproduce this problem.
OS: Unspecified → Windows
(Assignee)

Comment 4

3 years ago
Seems like that only Windows 7 64bit systems with both the 32bit and 64bit Firefox installers are affected.
status-firefox48: --- → affected
OS: Windows → Windows 7
Hardware: Unspecified → x86_64
(Assignee)

Updated

3 years ago
Depends on: 1258385
(Assignee)

Comment 5

3 years ago
I can constantly reproduce this problem now on mm-win-7-64-4. Looks like we always fail after the restart when Firefox has been updated. A browser window is present so I'm curious what's causing it. Maybe I will apply the patch from bug 1258385 and try again.
(Assignee)

Comment 6

3 years ago
With the patch applied I see:

> TEST-UNEXPECTED-ERROR | test_direct_update.py TestDirectUpdate.test_update | Tim
eoutException: TimeoutException: Timed out after 5.1 seconds with message: Canno
t get window type for chrome window handle "1"

It looks like that we are trying to work with a chrome window with the handle 1:

********* window type: Browser:About
*** chrome window handles: [u'3', u'14']
*** current handle: 14
*** handle: 14
********* window type: Browser:About
** restarted **
*** chrome window handles: [u'1', u'5']
*** current handle: 1
*** handle: 1

So this is clearly a bug in Marionette in chrome_window_handles after a restart. The first workable chrome window handle should be 3 but never 1. This failure doesn't exist that long yet but goes back to beta as it looks like. So it can be a regression in Marionette or a change in Firefox which causes us to behave incorrectly.
(Assignee)

Comment 7

3 years ago
So I was curious and checked the code of Marionette, especially the ´getChromeWindowHandles()` method:
http://hg.mozilla.org/mozilla-central/annotate/efe7d026ac64/testing/marionette/driver.js#l1433

As you can see it is using a window enumerator retrieved via `getWinEnumerator()`:
http://hg.mozilla.org/mozilla-central/annotate/efe7d026ac64/testing/marionette/driver.js#l273

The code in there looks suspicious when you are using Firefox and are in chrome scope. Under such a condition we do not have a window type of "navigator:browser" but null. So I added some more logging to our test to retrieve the chrome window handles after the restart when we are still in content scope and once again when we switched to chrome scope:

** restarted **
** chrome windows in content scope: [u'5']
** chrome windows in chrome scope: [u'1', u'5']

I'm not sure why we differentiate here between content and chrome scope. This method should always return the same entries. Andreas, could you explain?

But beside that we also got the handle 1 included, where I do not know yet what this is. I will investigate more.
Flags: needinfo?(ato)
(Assignee)

Comment 8

3 years ago
Interestingly the above mentioned behavior only seems to happen for non-browser windows e.g. the about window. I will have a minimized testcase ready in a bit, and will file an appropriate marionette bug.
(Assignee)

Updated

3 years ago
Depends on: 1259029
(Assignee)

Comment 9

3 years ago
Bug 1259029 has been filed for the inconsistency of chrome window handles. So I will continue investigating the suspicious handle 1 here.
Flags: needinfo?(ato)
(Assignee)

Comment 10

3 years ago
I can no longer reproduce this problem on the Windows 7 machine. Looks like it only happens on specific times? Very strange. I will check again tomorrow.
(Assignee)

Comment 11

3 years ago
Most occurring failure for update tests those days. Raising priority accordingly.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: -- → P2
(Assignee)

Comment 12

3 years ago
Interesting. I tried it once more and now it passed:

*** chrome window handles: [u'1', u'5']
*** current handle: 1
*** handle: 1
********* window type: Update:Wizard

There seems to be a race in getting the window type via Marionette.
26 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 24
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* windows7-64: 26

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2016-03-21&endday=2016-03-27&tree=all
(Assignee)

Comment 14

3 years ago
With bug 1258385 fixed we now have a much clearer failure message.
Summary: Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds → Intermittent failure | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 5.0 seconds with message: Cannot get window type for chrome window handle "1"
(Assignee)

Comment 15

3 years ago
A variation of this failure is clearly the following one:

TestDirectUpdate.test_update | MarionetteException: MarionetteException: win.document.documentElement is null

We have a window but the documentElement is null. Maybe in most of the cases its not null but get_attribute() for the window type returns null?

https://treeherder.mozilla.org/logviewer.html#?job_id=3545559&repo=mozilla-central
(Assignee)

Updated

3 years ago
Depends on: 1270820
(Assignee)

Updated

3 years ago
No longer depends on: 1270820
12 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 10
* mozilla-central: 2

Platform breakdown:
* windows7-64: 10
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2016-05-10&endday=2016-05-16&tree=all
15 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 10
* mozilla-central: 5

Platform breakdown:
* windows7-64: 11
* windows7-32: 3
* windows8-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2016-05-16&endday=2016-05-22&tree=all
13 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 9
* mozilla-beta: 4

Platform breakdown:
* windows7-64: 12
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2016-05-23&endday=2016-05-29&tree=all
8 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 8

Platform breakdown:
* windows7-64: 6
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2016-05-30&endday=2016-06-05&tree=all
(Assignee)

Updated

2 years ago
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
(Assignee)

Comment 20

2 years ago
Ok, so there is indeed a race here. When Firefox starts after an successful update we have two chrome window handles 1, and 5. So we start to iterate through those here:

https://dxr.mozilla.org/mozilla-central/source/testing/puppeteer/firefox/firefox_puppeteer/ui/windows.py#182

But then some milliseconds later the window with the handle '1' disappears from the list and only handle 5 remains. Because of that the above failure happens.

Now the question is what kind of window is handle 1!?
(Assignee)

Comment 21

2 years ago
So this extra window we get here is actually a graphics test page, which gets closed kinda quickly again, but can cause a race: 

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/SanityTest.js#303

Mochitests have a specific method which gets called to ensure they do not run into that race:

https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#256

So the underlying code has been implemented via bug 1156135 and is there to report some telemetry stats. What I wonder is why this test is run at all when Telemetry is switched off - which is done by Marionette by default. Matt, can you tell us a bit more why we do not have a condition for that? Is there a simple way to turn this behavior off? Or would we also need such a workaround like mochitests?
Flags: needinfo?(matt.woodrow)
(Assignee)

Updated

2 years ago
Blocks: 1314627
(Assignee)

Updated

2 years ago
Depends on: 1314891
(In reply to Henrik Skupin (:whimboo) from comment #21)
> So this extra window we get here is actually a graphics test page, which
> gets closed kinda quickly again, but can cause a race: 
> 
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/
> SanityTest.js#303
> 
> Mochitests have a specific method which gets called to ensure they do not
> run into that race:
> 
> https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-
> test.js#256
> 
> So the underlying code has been implemented via bug 1156135 and is there to
> report some telemetry stats. What I wonder is why this test is run at all
> when Telemetry is switched off - which is done by Marionette by default.
> Matt, can you tell us a bit more why we do not have a condition for that? Is
> there a simple way to turn this behavior off? Or would we also need such a
> workaround like mochitests?

The graphics window also adjusts prefs based on the results, so just detecting telemetry isn't sufficient.

I think doing the same as mochitests does is the best course of action.
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 23

2 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> The graphics window also adjusts prefs based on the results, so just
> detecting telemetry isn't sufficient.
> 
> I think doing the same as mochitests does is the best course of action.

I see. So how often is this actually loaded? I assume it doesn't happen for each start of Firefox but only if a new version has been installed? I have that feeling because we only see that with our update tests and not elsewhere.
Flags: needinfo?(matt.woodrow)
(In reply to Henrik Skupin (:whimboo) from comment #23)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> > The graphics window also adjusts prefs based on the results, so just
> > detecting telemetry isn't sufficient.
> > 
> > I think doing the same as mochitests does is the best course of action.
> 
> I see. So how often is this actually loaded? I assume it doesn't happen for
> each start of Firefox but only if a new version has been installed? I have
> that feeling because we only see that with our update tests and not
> elsewhere.

Right, every new firefox version, configuration change (GPU, drivers etc) or new profile (since the state is stored in prefs).
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 25

2 years ago
This should no longer be a  problem since the patch on bug 1314891 has been landed.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox52: --- → fixed
Resolution: --- → FIXED
Whiteboard: [fixed by bug 1314891]
Target Milestone: --- → mozilla52
(Assignee)

Comment 26

2 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #24)
> Right, every new firefox version, configuration change (GPU, drivers etc) or
> new profile (since the state is stored in prefs).

Thank you. I will file a new bug for Marionette so we can circumvent this window.
(Assignee)

Updated

2 years ago
Depends on: 1315611
Assignee: nobody → hskupin
status-firefox47: affected → wontfix
status-firefox48: affected → wontfix
status-firefox49: --- → wontfix
status-firefox50: --- → fixed
status-firefox51: --- → fixed
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1323174
(Assignee)

Updated

2 years ago
status-firefox-esr45: --- → fixed
7 failures in 657 pushes (0.011 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-esr52: 7

Platform breakdown:
* windows8-64: 3
* windows8-32: 3
* windows7-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2018-01-15&endday=2018-01-21&tree=all
3 failures in 701 pushes (0.004 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-esr52: 3

Platform breakdown:
* windows8-64: 1
* windows8-32: 1
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1256425&startday=2018-01-22&endday=2018-01-28&tree=all
You need to log in before you can comment on or make changes to this bug.