Closed
Bug 1294456
Opened 8 years ago
Closed 8 years ago
[e10s] After opening a tab/window Marionette hangs if any command is send which executes in the listener.js frame script
Categories
(Remote Protocol :: Marionette, defect)
Tracking
(e10s+, firefox50 fixed, firefox51 fixed)
RESOLVED
FIXED
mozilla51
People
(Reporter: whimboo, Assigned: mconley)
References
(Blocks 1 open bug)
Details
(Keywords: hang, regression)
Attachments
(3 files, 5 obsolete files)
I can get Marionette reliably to hang on the server side with the attached testcase. This started to happen not that long ago. I will check for a regression range.
Comment 1•8 years ago
|
||
Henrik, if this turns out to be non-test code breaking the test, please re-nom for tracking-e10s, thanks.
Blocks: e10s-tests
Reporter | ||
Comment 2•8 years ago
|
||
Ok, so this is not only happening in case of syntax failures but also if in this case the condition times out. Interestingly I only see this with my self-made opt and debug build (latest code from mozilla-central) but not with the Nightly build from August 10th. Even when I checkout 6cf0089510fa and rebuilt I'm not able to reproduce. Maybe it is related to artifact builds? My current build flags are: mk_add_options AUTOCLOBBER=1 mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj/opt/ mk_add_options MOZ_MAKE_FLAGS="-j2" ac_add_options --enable-artifact-builds ac_add_options --disable-debug # ac_add_options --enable-dmd # Set environment variables export BUILD_SYSTEM_TELEMETRY=1 export ENABLE_MARIONETTE=1 export MOZ_PACKAGE_JSSHELL=1
Summary: [e10s] Marionette hangs in case of a syntax failure within a test after opening a new tab → [e10s] Marionette hangs in case of a test failure after opening a new tab
(In reply to Henrik Skupin (:whimboo) from comment #0) > I can get Marionette reliably to hang on the server side with the attached > testcase. This started to happen not that long ago. I will check for a > regression range. I don't see an attached testcase -- did you forget to add it?
Flags: needinfo?(hskupin)
This may be related to Bug 1294540, which is also seen on artifact builds only, with e10s. When I use a Nightly build instead, or a full self-made build (instaed of artifact), all is fine.
Reporter | ||
Comment 5•8 years ago
|
||
Interesting. I'm sure that I attached it when I filed the bug. Anyway, here is the testcase. Maja, it would be great if you could try to repro with an artifact and Nightly build.
Flags: needinfo?(hskupin) → needinfo?(mjzffr)
I can reproduce the hang on latest m-c with artifact build. Attaching some logs.
Flags: needinfo?(mjzffr)
Same result as above for full build with e10s. With e10s off on full build, it fails faster and I get a different error: 0:33.16 TEST_END: MainThread ERROR, expected PASS Traceback (most recent call last): File "/Users/mozilla/dev/firefox/testing/marionette/harness/marionette/marionette_test.py", line 344, in run testMethod() File "/Users/mozilla/dev/firefox/test_temp_henrik.py", line 13, in test_hang lambda mn: len(mn.window_handles) == 3) File "/Users/mozilla/dev/firefox/testing/marionette/harness/marionette/marionette_test.py", line 685, in wait_for_condition raise TimeoutException("wait_for_condition timed out") TimeoutException: wait_for_condition timed out
(In reply to Maja Frydrychowicz (:maja_zf) from comment #9) > With e10s off on full build, it fails faster and I get a different error: Is that the intended outcome of the test, or can we actually expect 3 window handles at the point? Anyway, there you have it.
Reporter | ||
Comment 12•8 years ago
|
||
Thanks Maja for testing this. The failure you see is expected and actually needed to reproduce this hang. I think that we all are seeing the same and some e10s related changes broke our IPC communitcation. I will investigate this further and maybe make this bug more general.
Reporter | ||
Comment 13•8 years ago
|
||
Updating the summary to reflect the underlying issue. This hang can happen at any time.
Summary: [e10s] Marionette hangs in case of a test failure after opening a new tab → [e10s] Marionette randomly hangs after "Error: Channel error: cannot send/recv"
Reporter | ||
Updated•8 years ago
|
Attachment #8782078 -
Attachment mime type: text/x-log → text/plain
Reporter | ||
Updated•8 years ago
|
Attachment #8782031 -
Attachment mime type: text/x-log → text/plain
Assignee | ||
Comment 15•8 years ago
|
||
(Referring to the log of https://bugzilla.mozilla.org/show_bug.cgi?id=1294540#c0) Looks like the child has a hard time interpreting something from the parent, and shuts itself down. This error looks suspicious: [Child 31330] WARNING: Message needs unreceived descriptors channel:1174fa000 message-type:4849673 header()->num_fds:1 num_fds:0 fds_i:0: file /builds/slave/ash-m64-0000000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 482 That refers to this: https://dxr.mozilla.org/mozilla-central/rev/fe895421dfbe1f1f8f1fc6a39bb20774423a6d74/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#472 Hey billm, any idea why we might not have gotten the right number of fd's in a message?
Flags: needinfo?(wmccloskey)
Reporter | ||
Comment 16•8 years ago
|
||
Sorry, I think that I wrongly duped the other bug. I tried with my testcase again and I do not see the IPC error but just a hang without a reply. I will correct that and restore the former states.
Flags: needinfo?(wmccloskey)
Summary: [e10s] Marionette randomly hangs after "Error: Channel error: cannot send/recv" → [e10s] Marionette hangs in case of a test failure after opening a new tab
Reporter | ||
Comment 17•8 years ago
|
||
I'm currently working on a regression range via my own debug builds. This might take a while. :(
Reporter | ||
Comment 18•8 years ago
|
||
Ok, interestingly i can now reproduce the hang in a Nightly build. This will make the search faster.
Reporter | ||
Comment 19•8 years ago
|
||
Affected branches are clearly mozilla-central, and mozilla-aurora.
status-firefox50:
--- → affected
status-firefox51:
--- → affected
Reporter | ||
Comment 20•8 years ago
|
||
Sadly I cannot use mozregression because marionette exists with 0 when we have a socket.timeout during teardown. I will investigate that first because it means under those conditions we do not have the expected result.
Reporter | ||
Updated•8 years ago
|
Summary: [e10s] Marionette hangs in case of a test failure after opening a new tab → [e10s] Marionette hangs in teardown if any opened tab or window has not been closed at the end of the test
Reporter | ||
Comment 21•8 years ago
|
||
The hang in teardown is just the result of my testcase. But this could happen at any time. The regression test with mozregression has shown: 14:04.57 INFO: Last good revision: 84b5a1027550faf65534d67dc02372ba09508550 14:04.57 INFO: First bad revision: 80ab1089a326e4a35d32cd6f52ed0194eba89ecf 14:04.57 INFO: Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=84b5a1027550faf65534d67dc02372ba09508550&tochange=80ab1089a326e4a35d32cd6f52ed0194eba89ecf 14:05.63 INFO: Looks like the following bug has the changes which introduced the regression: https://bugzilla.mozilla.org/show_bug.cgi?id=1261842 Mike, can you please have a look at this bug given that it is a regression from one of your patches?
Blocks: 1261842
Flags: needinfo?(mconley)
Keywords: regressionwindow-wanted
Summary: [e10s] Marionette hangs in teardown if any opened tab or window has not been closed at the end of the test → [e10s] Marionette hangs if any opened tab or window has not been closed at the end of the test
Reporter | ||
Comment 22•8 years ago
|
||
I would assume the problem lies in the following changeset? https://hg.mozilla.org/integration/autoland/rev/88b8a2529323
Reporter | ||
Comment 23•8 years ago
|
||
Attachment #8781025 -
Attachment is obsolete: true
Attachment #8782029 -
Attachment is obsolete: true
Attachment #8782031 -
Attachment is obsolete: true
Attachment #8782078 -
Attachment is obsolete: true
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Reporter | ||
Comment 26•8 years ago
|
||
The above assertion is also visible in non-e10s mode. So it's not relevant. Looks like I cannot do more here, and I would ask Mike to continue the investigation.
Reporter | ||
Comment 27•8 years ago
|
||
Bumping severity to major given that this bug causes a lot of intermittent oranges for us in Marionette and Firefox ui tests.
Severity: normal → major
Reporter | ||
Comment 28•8 years ago
|
||
Andreas, maybe you can help too? Looks like this is really happening when we try work with the listener.js frame script for a newly opened tab. Maybe we don't install it correctly? I will update the minimized testcase shortly.
Flags: needinfo?(ato)
Summary: [e10s] Marionette hangs if any opened tab or window has not been closed at the end of the test → [e10s] After opening a tab/window Marionette hangs if any command is send which executes in the listener.js frame script
Reporter | ||
Comment 29•8 years ago
|
||
Attachment #8782493 -
Attachment is obsolete: true
Reporter | ||
Comment 30•8 years ago
|
||
Keep in mind that the testcase does not switch away from the original window handle! Maybe we somehow kill the listener frame script after opening a new tab?
Reporter | ||
Comment 31•8 years ago
|
||
I just noticed that whenever we call switch_to_window() after a tab has been opened, the test will pass even for the original tab. So I assume our references to listener.js are getting mixed-up somehow.
Reporter | ||
Comment 32•8 years ago
|
||
Here is some more debugging output: When using switch_to_window() before accessing the page_source property: 1471960691095 Marionette TRACE conn5 -> [0,16,"getPageSource",null] 1471960691096 Marionette INFO ******** Enter call to get page source ******* 1471960691097 Marionette INFO * executeWhenReady - remoteness change: false 1471960691102 Marionette INFO ******** Exit call to get page source ******* When the switch_to_window() call is left-out because we already have the correct window selected: 1471960823850 Marionette INFO ******** Enter call to get page source ******* 1471960823851 Marionette INFO * executeWhenReady - remoteness change: true The difference here is about the remoteness change. It means that executeWhenReady() will push the new command to the pendingCommands list, but those commands are never popped out of the list again. It feels to me that we miss a call to flushPendingCommands() somewhere. I will have to check if that is the right thing to do, and if yes where it has to be placed.
Reporter | ||
Comment 33•8 years ago
|
||
Interestingly it looks like we load the listener twice when opening a new tab: 1471961553666 Marionette DEBUG loaded listener.js 1471961553833 Marionette TRACE conn5 <- [1,8,null,{}] ++DOCSHELL 0x7f5f37079800 == 7 [pid = 18098] [id = 7] ++DOMWINDOW == 14 (0x7f5f370c7800) [pid = 18098] [serial = 14] [outer = (nil)] ++DOMWINDOW == 15 (0x7f5f370c9800) [pid = 18098] [serial = 15] [outer = 0x7f5f370c7800] 1471961553865 Marionette DEBUG loaded listener.js 1471961553866 Marionette INFO ***** listeners attached ++DOMWINDOW == 16 (0x7f5f37c0e800) [pid = 18098] [serial = 16] [outer = 0x7f5f3745d400] 1471961553956 Marionette INFO ***** listeners attached So this is somewhat similar to bug 1239552 which got a follow-up bug 1296628. I would speculate it might fix this problem. I will mark as a blocker for now and re-evaluate this bug once bug 1296628 got fixed.
Depends on: 1296628
Reporter | ||
Comment 34•8 years ago
|
||
Actually listeners are attached twice also in the switch_to_window() case. So it should not completely be related to bug 1296628.
Assignee | ||
Comment 36•8 years ago
|
||
Some interesting findings. 1) Inside Marionette's browser.js, the reference to this.tab does not refer to the currently selected browser tab, but the browser tab that Marionette last switched to. 2) Bug 1227252 landed about 9 months ago, and accidentally broke that assumption. Some state that browser.js holds on to is supposed to refer to this.tab, but it uses this.browser.selectedTab instead. With your test case, that causes Marionette to think that the browser is switching remoteness, and therefore starts queueing commands for a remoteness change that'll never happen. I think we need to correct the work that was done in bug 1227252 to fix this. If I, for example, back that patch out, the testcase in this bug passes just fine. [1]: Since the initial browser is non-remote (this.tab), but the opened one (this.browser.selectedTab) is not.
Comment hidden (mozreview-request) |
Reporter | ||
Comment 38•8 years ago
|
||
mozreview-review |
Comment on attachment 8784944 [details] Bug 1294456 - Stop Marionette from being confused about which tab it currently cares about in each window. https://reviewboard.mozilla.org/r/74266/#review72124 Just to add, it would be great if my testcase could be added to the unit tests, eg. in https://dxr.mozilla.org/mozilla-central/source/testing/marionette/harness/marionette/tests/unit/test_window_handles.py.
Assignee | ||
Updated•8 years ago
|
Attachment #8784944 -
Flags: review?(cmanchester) → review?(dburns)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Reporter | ||
Comment 41•8 years ago
|
||
mozreview-review |
Comment on attachment 8784984 [details] Bug 1294456 - Regression test. https://reviewboard.mozilla.org/r/74308/#review72164 ::: testing/marionette/harness/marionette/tests/unit/test_window_handles.py:50 (Diff revision 1) > + self.wait_for_condition(lambda mn: len(mn.window_handles) == 2) > + > + self.assertEqual(self.marionette.get_url(), "about:blank") > + > + self.marionette.switch_to_window(self.marionette.window_handles[1]) > + self.marionette.close() I think if you don't switch back to the original window handle it can cause trouble.
Comment hidden (mozreview-request) |
Comment 43•8 years ago
|
||
mozreview-review |
Comment on attachment 8784944 [details] Bug 1294456 - Stop Marionette from being confused about which tab it currently cares about in each window. https://reviewboard.mozilla.org/r/74266/#review72500
Attachment #8784944 -
Flags: review?(dburns) → review+
Comment 44•8 years ago
|
||
mozreview-review |
Comment on attachment 8784984 [details] Bug 1294456 - Regression test. https://reviewboard.mozilla.org/r/74308/#review72502 Please can you change the commit message to something more meaningful and related to what is currently happening.
Attachment #8784984 -
Flags: review?(dburns) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 47•8 years ago
|
||
Pushed by mconley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5d34ee345e18 Stop Marionette from being confused about which tab it currently cares about in each window. r=automatedtester https://hg.mozilla.org/integration/autoland/rev/e17911d5430d Regression test. r=automatedtester
Comment 48•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5d34ee345e18 https://hg.mozilla.org/mozilla-central/rev/e17911d5430d
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Reporter | ||
Comment 49•8 years ago
|
||
Finally all that works wonderful again. Thank you a lot! Lets get this also landed on aurora to get the regression fixed.
Assignee: nobody → mconley
Flags: needinfo?(ato)
Whiteboard: [checkin-needed-aurora]
Comment 50•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/52ce999aa594 https://hg.mozilla.org/releases/mozilla-aurora/rev/d6b420b13a6a
Updated•8 years ago
|
Whiteboard: [checkin-needed-aurora]
Updated•1 year ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•