Intermittent failure for update tests and unit tests in test_about_window.py due to Marionette exception: "win.document.documentElement is null"

RESOLVED FIXED

Status

defect
P1
normal
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: whimboo, Assigned: chmanchester)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Strangely all of our tests in the test module test_about_window.py are failing if the software update tests are run before. This only happens on Windows Vista so far. For other platforms and versions it is working fine.

Here the traceback:

23:59:37  0:24.82 TEST_START: MainThread test_about_window.py TestAboutWindow.test_basic
23:59:37  0:25.11 TEST_END: MainThread ERROR, expected PASS
23:59:37 Traceback (most recent call last):
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_client-0.12-py2.7.egg\marionette\marionette_test.py", line 277, in run
23:59:37     self.setUp()
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\tests\test_about_window.py", line 15, in setUp
23:59:37     self.about_window = self.browser.open_about_window()
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 569, in open_about_window
23:59:37     return BaseWindow.open_window(self, callback, AboutWindow)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 342, in open_window
23:59:37     window = self._windows.create_window_instance(new_handle, expected_window_class)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 100, in create_window_instance
23:59:37     self.switch_to(handle)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 178, in switch_to
23:59:37     return self.create_window_instance(target_handle)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 102, in create_window_instance
23:59:37     Wait(self.marionette).until(lambda mn: mn.get_window_type() is not None)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_driver-0.5-py2.7.egg\marionette_driver\wait.py", line 122, in until
23:59:37     rv = condition(self.marionette)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 102, in <lambda>
23:59:37     Wait(self.marionette).until(lambda mn: mn.get_window_type() is not None)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_driver-0.5-py2.7.egg\marionette_driver\marionette.py", line 1186, in get_window_type
23:59:37     response = self._send_message('getWindowType', 'value')
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_driver-0.5-py2.7.egg\marionette_driver\decorators.py", line 36, in _
23:59:37     return func(*args, **kwargs)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_driver-0.5-py2.7.egg\marionette_driver\marionette.py", line 716, in _send_message
23:59:37     self._handle_error(response)
23:59:37   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_driver-0.5-py2.7.egg\marionette_driver\marionette.py", line 752, in _handle_error
23:59:37     raise errors.lookup(status)(message, stacktrace=stacktrace)
23:59:37 MarionetteException: MarionetteException: win.document.documentElement is null
23:59:37 stacktrace:
23:59:37 	GeckoDriver.prototype.getWindowType@chrome://marionette/content/driver.js:1243:3
23:59:37 	CommandProcessor.prototype.execute/req<@chrome://marionette/content/command.js:159:11
23:59:37 	TaskImpl_run@resource://gre/modules/Task.jsm:314:40
23:59:37 	TaskImpl@resource://gre/modules/Task.jsm:275:3
23:59:37 	createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:249:14
23:59:37 	Task_spawn@resource://gre/modules/Task.jsm:164:12
23:59:37 	CommandProcessor.prototype.execute@chrome://marionette/content/command.js:153:13
23:59:37 	Dispatcher.prototype.onPacket@chrome://marionette/content/dispatcher.js:80:5
23:59:37 	DebuggerTransport.prototype._onJSONObjectReady/<@resource://gre/modules/devtools/dbg-client.jsm -> resource://gre/modules/devtools/transport/transport.js:471:9
23:59:37 	makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
23:59:37 	makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
23:59:37 
23:59:37 
23:59:37 
23:59:37  0:25.13 TEST_START: MainThread test_about_window.py TestAboutWindow.test_elements

23:59:42  0:30.53 TEST_END: MainThread ERROR, expected PASS
23:59:42 Traceback (most recent call last):
23:59:42   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_client-0.12-py2.7.egg\marionette\marionette_test.py", line 277, in run
23:59:42     self.setUp()
23:59:42   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\tests\test_about_window.py", line 15, in setUp
23:59:42     self.about_window = self.browser.open_about_window()
23:59:42   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 569, in open_about_window
23:59:42     return BaseWindow.open_window(self, callback, AboutWindow)
23:59:42   File "c:\jenkins\workspace\mozilla-central_functional\firefox-ui-tests\firefox_puppeteer\ui\windows.py", line 335, in open_window
23:59:42     Wait(self.marionette).until(window_opened)
23:59:42   File "c:\jenkins\workspace\mozilla-central_functional\venv\lib\site-packages\marionette_driver-0.5-py2.7.egg\marionette_driver\wait.py", line 143, in until
23:59:42     cause=last_exc)
23:59:42 TimeoutException: TimeoutException: Timed out after 5.0 seconds

David, could this be a bug in Marionette given the line:

> win.document.documentElement is null
Flags: needinfo?(dburns)
The interesting fact here is that this only happens when we set the MAR channels here:
https://github.com/mozilla/firefox-ui-tests/blob/master/firefox_puppeteer/tests/test_software_update.py#L103

I do not see a relation between the allowed mar channels and opening the about window. I will put together a small testcase for reproduction.
Looks like this is a problem when we pass in callbacks for opening new windows. I have a smaller testcase which reproduces it. I will trim it down further later today. It's not clear yet, what could cause this problem.
This could be a bug where we making certain assumptions that just don't work on Chrome. It will be good to see that test case so we can track it down.
Flags: needinfo?(dburns)
Actually this smells like a dupe of bug 1169644, which is a regression since Firefox 39 as caused by bug 1169798. I will dupe for now and reopen if the landed patch didn't fix the problem.

Armen, please take a note of that and that update tests for the upcoming Firefox 39.0b1 can fail a lot due to the underlying problem. I'm not sure if Chris will be able to get the fix landed today and new versions of Marionette packages released.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1169798
This failure is still present as rare failure:

14:22:20 ERROR test_about_window.py TestAboutWindow.test_basic
14:22:20 ERROR test_about_window.py TestAboutWindow.test_elements
14:22:20 ERROR test_about_window.py TestAboutWindow.test_open_window
14:22:20 ERROR test_about_window.py TestAboutWindow.test_patch_info

http://mm-ci-production.qa.scl3.mozilla.com:8080/job/mozilla-central_functional/33143/console

I will have a look at it tomorrow.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
So this failure is not only present for unit tests but also for our update tests! In this case I was able to see it on mm-win-vista-32-4 while it was running updates for Nightly:

http://mm-ci-production.qa.scl3.mozilla.com:8080/job/mozilla-central_update/17042/console

I'm raising the priority of this bug to P1 because those failures would give us bad reputation when we run update tests via releng.

I will spend the next hours on getting this problem investigated. I hope to have a minimized testcase ready by then, which might Chris can pick up for fixing the Marionette issue.
Blocks: m21s
Status: REOPENED → NEW
Component: Firefox UI Tests → Marionette
Priority: -- → P1
Product: Mozilla QA → Testing
QA Contact: hskupin
Summary: All tests in test_about_window.py are failing if software update unit tests run before → Intermittent failure for update tests and unit tests in test_about_window.py due to Marionette exception: "win.document.documentElement is null"
During today I have seen a couple of those failures. Here the list of all of them:

test_about_window_py TestAboutWindow_test_* (mm-win-vista-32-1)
test_dv_certificate_py TestDVCertificate_test_dv_cert (mm-win-vista-32-3)
test_about_window_py TestAboutWindow_test_* (mm-win-vista-32-1)

Given that the failure happens more often on Vista for the about window unit test, I will pick vista-32-1 for testing.
So the underlying problem with this bug is exactly the same as bug 1141251, but just a layer beneath. The fix for bug 1141251 should have been done for Marionette instead.

GeckoDriver.prototype.getCurrentWindow() needs an appropriate update. Maybe it should wait internally until a valid window handle is returned.

http://mxr.mozilla.org/mozilla-central/source/testing/marionette/driver.js#247

This is most likely also a regression given that formerly we got null returned from getWindowType(). I will check for the range.
So this behavior started between March 28th and 29th:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58c9d079f318&tochange=385840329d91

I strongly believe this is also the cause of the changes on bug 1107706.
Blocks: 1107706
Keywords: regression
I found a way to reliably avoid this issue from firefox-ui-tests (waiting for a new chrome window to load before ever switching to it), I'm going to try a fix within marionette as well. From debugging this it seems possible there's similar underlying cause to bug 1169798. 

As far as I can tell, marionette has never been particularly helpful in determining when chrome windows are safe to interact with. If anything Ato's patch changed timing enough to make the latent race become a problem (I noticed adding a small amount of logging to the server would make it less reproducible, and it didn't show up at all on debug builds).
Assignee: nobody → cmanchester
Keywords: regression
Posted file PR with workaround
Here's the workaround for the windows library. I think this is worth taking for the short term to ensure it addresses the relevant failures.

I have a potential fix for marionette here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b43668646131

We should land this if it's effective but it's more invasive and complex than the last one so I'm not sure if it's worth uplifting if our workaround is effective.
Attachment #8615040 - Flags: review?(hskupin)
(In reply to Chris Manchester [:chmanchester] from comment #12)
> I have a potential fix for marionette here:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b43668646131
> 

I just tested on the test machine, and sadly this doesn't fix the issue. I guess we really have to wait for that window before switching to it.
Comment on attachment 8615040 [details] [review]
PR with workaround

That looks fine, Chris! If that is tested and works as expected then lets take it. I only would add a comment before the code block before pushing to master. Also I will file a separate bug for it, so we can keep the status flags correctly.
Attachment #8615040 - Flags: review?(hskupin) → review+
The patch I wrote for this bug is intended to fix it, but if this is re-purposed to add new safeguards to marionette that's not something I'm planning to pursue.
Assignee: cmanchester → nobody
Just an update for this bug... The PR from Chris actually got merged as https://github.com/mozilla/firefox-ui-tests/commit/b70fbbf8b12b18c745f264217ee13bdf13671867 a while ago. I think it's a good idea in general to wait for the chrome window to be completed loading. If we wouldn't do that, other interactions might fail.

Lets call this bug fixed.
Assignee: nobody → cmanchester
Status: NEW → RESOLVED
Last Resolved: 4 years ago3 years ago
Resolution: --- → FIXED
Component: Marionette → Firefox UI Tests
QA Contact: hskupin
You need to log in before you can comment on or make changes to this bug.