Closed Bug 1403428 Opened 7 years ago Closed 7 years ago

Intermittent /html/browsers/windows/browsing-context-names/choose-_parent-001.html | Unable to locate window: 4294967297

Categories

(Core :: DOM: Core & HTML, defect, P2)

Tracking

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 --- fixed
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mrbkap)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])

Attachments

(1 file, 1 obsolete file)

Summary: Intermittent /html/browsers/windows/browsing-context-names/choose-_parent-001.html | Unable to locate window: 4294967297 → Intermittent /html/browsers/windows/browsing-context-names/choose-_parent-001.html | Unable to locate window: 4294967297 | OR | Current window does not have a content browser
That already exists, that's bug 1403688
Summary: Intermittent /html/browsers/windows/browsing-context-names/choose-_parent-001.html | Unable to locate window: 4294967297 | OR | Current window does not have a content browser → Intermittent /html/browsers/windows/browsing-context-names/choose-_parent-001.html | Unable to locate window: 4294967297
This failure started on the 27th and has occurred 89 times in the last 7 days. It seems to fail on every configuration, so there isn't a clear pattern here.

Here is the most recent log (win7-debug):
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=134844500&lineNumber=24164

And the related text from the log:
09:07:53     INFO - TEST-START | /html/browsers/windows/browsing-context-names/choose-_parent-001.html
09:07:53     INFO - PID 5336 | [Parent 5336, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [Child 2624, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [Child 2624, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [GPU 4216, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [GPU 4216, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | 
09:07:53     INFO - PID 5336 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
09:07:53     INFO - PID 5336 | 
09:07:53     INFO - PID 5336 | [GPU 4216, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [GPU 4216, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [Parent 5336, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [Child 2624, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file z:/build/build/src/xpcom/threads/nsThread.cpp, line 797
09:07:53     INFO - PID 5336 | [Parent 5336, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - PID 5336 | [Parent 5336, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
09:07:53     INFO - TEST-UNEXPECTED-ERROR | /html/browsers/windows/browsing-context-names/choose-_parent-001.html | Unable to locate window: 2147483652
09:07:53     INFO - stacktrace:
09:07:53     INFO - 	WebDriverError@chrome://marionette/content/error.js:239:5
09:07:53     INFO - 	NoSuchWindowError@chrome://marionette/content/error.js:481:5
09:07:53     INFO - 	GeckoDriver.prototype.switchToWindow@chrome://marionette/content/driver.js:1582:11
09:07:53     INFO - 	Async*despatch@chrome://marionette/content/server.js:563:20
09:07:53     INFO - 	async*execute@chrome://marionette/content/server.js:537:11
09:07:53     INFO - 	async*onPacket/<@chrome://marionette/content/server.js:512:15
09:07:53     INFO - 	async*onPacket@chrome://marionette/content/server.js:511:8
09:07:53     INFO - 	_onJSONObjectReady/<@chrome://marionette/content/transport.js:501:9
09:07:53     INFO - TEST-INFO took 132ms


:overholt, I am ni?'ing you because this was filed in web-platform-tests and not Core::DOM; I suspect the wrong component was picked by accident. Do you believe this is related to the product or to the harness/toolchain?
Component: web-platform-tests → DOM
Flags: needinfo?(overholt)
Product: Testing → Core
Whiteboard: [stockwell needswork:owner]
Version: Version 3 → unspecified
See Also: → 1403688
I think mrbkap may know about this sort of thing. Do you, Blake?

(It's weird to me that this intermittent ...)
Flags: needinfo?(overholt) → needinfo?(mrbkap)
Priority: P5 → P2
I spent most of yesterday digging into this. I haven't yet found an explanation for this and my suspicion is that this is actually a bug in the wptrunner test harness (or its integration with Firefox) rather than a bug in the test itself. I'm going to start pushing some logging patches to try to see if I can get more insight into exactly what's happening.

So far, I've been trying to figure out what switchToWindow is failing. Looking at (what I believe to be) the proper testsuite, it seems like it must be [1]. It seems like we should always be asking for the same window, so it's very surprising if we suddenly stop being able to access it.

[1] http://searchfox.org/mozilla-central/rev/1033bfa26f6d42c1ef48621909f04e734a7ed8a3/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py#174
(In reply to Blake Kaplan (:mrbkap) from comment #10)
> my suspicion is that this is actually a bug in the
> wptrunner test harness (or its integration with Firefox) rather than a bug
> in the test itself

Given the presence of Marionette tests in https://bugzilla.mozilla.org/buglist.cgi?quicksearch=intermittent%20unable%20to%20locate%20window I'd bet that the wptrunner integration uses Marionette.
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca96afd3cb9b
Disable /html/browsers/windows/browsing-context-names/choose-_parent-001.html on !win for frequent failures. r=me, a=test-only
I disabled this because of the high failure rate; please remember to re-enable it while working on this.

Also, :jgraham, comment 11 suggests this might be a harness issue.
Flags: needinfo?(james)
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Blocks: 1408759
I think I know what's happening here, though I haven't reproduced it yet. As can be seen from bug 1408759, this failure is more due to choose-_blank-002.html than either choose-_parent-00N.html test.

choose-_blank-002.html opens a new window (in a new tab), waits for it to load and then ends the test. It uses the PrefixedLocalStorage utility to communicate with itself. In order to close the newly-opened tab, the test uses PrefixedLocalStorage's close_on_cleanup utility. When a test finishes, we fire off the "complete" notification from the harness, which (in some order) notifies the marionette harness as well as the PLS code. The PLS code then sets a property on itself, causing another event to be sent which, when it's fired, closes the window.

That races with the harness tearing down the current test and setting up the next test. That ends up calling close_old_windows [1]. Unfortunately, that function is not atomic wrt the browser, so I *think* that between getting the window_handles and the command to switch_to_window/close, PLS gets around to closing the window, causing switch_to_window to throw.

I think the proper fix here would be to add a function to the marionette harness to allow Python to atomically close all but the "test_runner" window, but in the meantime, there should be a pretty simple workaround.

[1] http://searchfox.org/mozilla-central/rev/1c4da216e00ac95b38a3f236e010b31cdfaae03b/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py#147
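
To make the suspected race concrete, here is a minimal sketch that models it with a stand-in for the browser rather than the real Marionette client. `FakeBrowser`, `NoSuchWindowError` and the `between_calls` hook are hypothetical names invented for this illustration; only the overall shape (list the handles, then switch to each one and close it) follows the description of close_old_windows above.

```python
class NoSuchWindowError(Exception):
    """Stand-in for the error raised when a window handle is unknown."""


class FakeBrowser:
    """Hypothetical browser model: just a list of open window handles."""

    def __init__(self, handles):
        self.handles = list(handles)
        self.current = self.handles[0]

    def window_handles(self):
        return list(self.handles)

    def switch_to_window(self, handle):
        if handle not in self.handles:
            raise NoSuchWindowError("Unable to locate window: %s" % handle)
        self.current = handle

    def close(self):
        self.handles.remove(self.current)


def close_old_windows(browser, runner_handle, between_calls=lambda: None):
    """Non-atomic cleanup: snapshot the handles, then act on each one."""
    handles = [h for h in browser.window_handles() if h != runner_handle]
    between_calls()  # the page may close its own window at this point
    for handle in handles:
        browser.switch_to_window(handle)
        browser.close()
    browser.switch_to_window(runner_handle)


browser = FakeBrowser(["test_runner", "popup"])


def page_closes_popup():
    # Models the test page's own cleanup closing the tab by itself.
    browser.handles.remove("popup")


try:
    close_old_windows(browser, "test_runner", between_calls=page_closes_popup)
    raced = False
except NoSuchWindowError:
    raced = True
```

In this model the handle snapshot is already stale by the time the loop runs, so the switch fails even though the harness asked for a window that existed a moment earlier.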
Flags: needinfo?(mrbkap)
Comment on attachment 8920428 [details]
Bug 1403428 - switch to window and close it atomically.

https://reviewboard.mozilla.org/r/191398/#review196956

:ato is best-suited for this review. I expect we don't want to be adding new WebDriver commands to solve this problem.
Attachment #8920428 - Flags: review?(mjzffr)
Comment on attachment 8920428 [details]
Bug 1403428 - switch to window and close it atomically.

https://reviewboard.mozilla.org/r/191398/#review197244

::: testing/marionette/driver.js:2762
(Diff revision 2)
> + * This is essentially an atomic way for Python to ask the browser to close a
> + * window with a given name (if the window could potentially be closed between
> + * Python calling switchToWindow and then close itself).

Like maja_zf indicated on the bug, we don’t want to introduce a
separate command for this because it is an implementation of the
inherently racy WebDriver API.

If the window has been closed between WebDriver:SwitchToWindow and
WebDriver:CloseWindow, the client needs to care for this by ignoring
the error on the latter command.
Attachment #8920428 - Flags: review?(ato) → review-
Comment on attachment 8920425 [details]
Bug 1403428 - Handle a rare error case more gracefully.

https://reviewboard.mozilla.org/r/191396/#review197246

::: testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:162
(Diff revision 1)
> +            try:
> -            self.marionette.switch_to_window(handle)
> +                self.marionette.switch_to_window(handle)
> +            except errors.NoSuchWindowException:
> +                # We might have raced with the previous test to close this
> +                # window, skip it.
> +                pass
>              self.marionette.close()

This is almost how I think we need to handle this, except
self.marionette.close() also needs to be skipped if we cannot switch
to the window handle, otherwise we risk closing the wrong window.
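
A sketch of the shape this review comment asks for, run against a fake client so it is self-contained. `FakeMarionette`, `close_handles` and the local `NoSuchWindowException` class are hypothetical stand-ins for illustration (the real patch catches `errors.NoSuchWindowException` on `self.marionette`); this is not the code that landed.

```python
class NoSuchWindowException(Exception):
    """Local stand-in for the exception caught in the patch."""


class FakeMarionette:
    """Hypothetical client that tracks open windows and the current one."""

    def __init__(self, open_windows, current):
        self.open_windows = set(open_windows)
        self.current = current

    def switch_to_window(self, handle):
        if handle not in self.open_windows:
            raise NoSuchWindowException(handle)
        self.current = handle

    def close(self):
        # Closes whichever window is current, like the real close command.
        self.open_windows.discard(self.current)


def close_handles(marionette, handles):
    for handle in handles:
        try:
            marionette.switch_to_window(handle)
        except NoSuchWindowException:
            # Raced with the page closing this window. Skip close() as
            # well, otherwise the still-current window would be closed.
            continue
        marionette.close()


marionette = FakeMarionette(open_windows={"runner", "popup"}, current="runner")
# "gone" was listed earlier but has since been closed by the page itself.
close_handles(marionette, ["gone", "popup"])
remaining = marionette.open_windows
```

If `close()` ran unconditionally after the failed switch, the first iteration would close "runner", which is the wrong-window risk the review points out.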
Attachment #8920425 - Flags: review?(ato) → review+
Assignee: nobody → mrbkap
Attachment #8920428 - Attachment is obsolete: true
I have no idea why I left the close call outside of the try/except. I'll push this now. Let's keep an eye on bug 1408759 and if that gets fixed by this patch then we can re-enable choose-_parent-001.html.
Flags: needinfo?(james)
Pushed by mrbkap@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c9d85d930e9c
Handle a rare error case more gracefully. r=ato
From bug 1408759, this looks fixed. I'll file a new bug to re-enable choose-_parent-001.html and mark this one fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Blocks: 1413022
Target Milestone: --- → mozilla58
Whiteboard: [stockwell disabled] → [stockwell fixed:other]
Component: DOM → DOM: Core & HTML