Closed
Bug 1046078
Opened 10 years ago
Closed 10 years ago
JSbridge disconnect in testPreferences_masterPassword
Categories
(Mozilla QA Graveyard :: Mozmill Tests, defect, P1)
Tracking
(firefox31 fixed, firefox32 fixed, firefox33 fixed, firefox34 fixed, firefox-esr24 wontfix, firefox-esr31 fixed)
People
(Reporter: andrei, Assigned: cosmin-malutan)
References
Details
(Keywords: regression, reproducible, Whiteboard: [mozmill-test-failure])
Attachments
(8 files, 4 obsolete files)
1.18 KB,
patch
|
AndreeaMatei
:
review+
|
Details | Diff | Splinter Review |
38.25 KB,
text/plain
|
Details | |
35.31 KB,
text/plain
|
Details | |
132.82 KB,
image/jpeg
|
Details | |
131.25 KB,
image/jpeg
|
Details | |
595 bytes,
application/javascript
|
Details | |
1.20 KB,
patch
|
whimboo
:
review+
|
Details | Diff | Splinter Review |
1.26 KB,
patch
|
AndreeaMatei
:
review+
|
Details | Diff | Splinter Review |
Sample report: http://mozmill-daily.blargon7.com/#/functional/report/533154d2e18fa05bbdf2eaada8736bbb Aurora 33 is affected. I can reproduce it on CI (windows) machines. This does not reproduce locally. Not sure if it's a regression in Firefox. We did Windows OS updates a couple of days ago. Checking both hypothesis atm. Not all runs seem to be affected, see results from the last 3 days: http://mozmill-daily.blargon7.com/#/functional/reports?app=All&branch=All&platform=Win&from=2014-07-27&to=2014-07-30 Affected are the ones with 32 tests. While only running the masterPassword test I have a 50% reproduce rate. When it fails I have the following message in the log: > ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost Which might imply that we don't properly restart Firefox.
Reporter | ||
Comment 1•10 years ago
|
||
Here's a more relevant slice of reports: http://mozmill-daily.blargon7.com/#/functional/reports?app=All&branch=All&platform=All&from=2014-07-22 This failures seem to have started on 2014-07-22, which indicate that it has nothing to do with the Windows OS udpates, since we did these on the 28th and 29th. We did however had the merge on the 22nd, so this looks like a regression.
Assignee: nobody → andrei.eftimie
Status: NEW → ASSIGNED
Keywords: regression,
regressionwindow-wanted
Reporter | ||
Comment 2•10 years ago
|
||
Only certain locales seem to be affected. The dashboard shows failures for: de, ru and zh-CN I can reproduce the failure with de. It doesn't reproduce with en-US
Reporter | ||
Comment 3•10 years ago
|
||
I am able to reproduce this locally. Hopefully this will speed things up. Unfortunately the oldest localised Aurora builds I can find on the FTP server are from the 24th, so this doesn't help much. I'll try finding exactly what (or why) is failing, maybe this will narrow where to look for a regressor.
Reporter | ||
Comment 4•10 years ago
|
||
I get about 30-40% failure rate with this testcase (windows, de aurora build). I'll look more into it tomorrow. I have no idea what the problem here. I suspect some race condition.
Reporter | ||
Comment 5•10 years ago
|
||
This is failing often enough to warrant its disable. I can complete a testrun with this test skipped on the affected locales on Aurora.
Attachment #8465396 -
Flags: review?(andreea.matei)
Comment 6•10 years ago
|
||
Comment on attachment 8465396 [details] [diff] [review] skip.patch Review of attachment 8465396 [details] [diff] [review]: ----------------------------------------------------------------- http://hg.mozilla.org/qa/mozmill-tests/rev/40192e011707 (aurora)
Attachment #8465396 -
Flags: review?(andreea.matei) → review+
Updated•10 years ago
|
Whiteboard: [mozmill-test-failure] → [mozmill-test-failure][mozmill-test-skipped]
Reporter | ||
Comment 7•10 years ago
|
||
Running this with --debug I managed to get a pass and failure on the testcase.
This is the error:
> *************************
> A coding exception was thrown in a Promise resolution callback.
> See https://developer.mozilla.org/Mozilla/JavaScript_code_modules/Promise.jsm/Promise
>
> Full message: TypeError: aWindow is null
> Full stack: pbu_privacyContextFromWindow@resource://gre/modules/PrivateBrowsingUtils.jsm:24:5
> pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:20:12
> LoginManagerContent.loginsFound@resource://gre/modules/LoginManagerContent.jsm:251:47
> Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:866:23
> this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:745:7
> Assert_waitFor@resource://mozmill/modules/assertions.js:595:7
> Httpd.prototype.stop@resource://mozmill/modules/frame.js:650:1
> AppQuitObserver.prototype.observe@resource://mozmill/modules/frame.js:439:11
> shutdownApplication@resource://mozmill/modules/frame.js:65:3
> events.startShutdown@resource://mozmill/modules/frame.js:169:5
> sendMessage@resource://mozmill/driver/msgbroker.js:44:5
> MozMillController.prototype.restartApplication@resource://mozmill/driver/controller.js:444:1
> test1@resource://mozmill/modules/frame.js -> file:///c:/work/bugs/1046078/testcase.js:12:3
> Runner.prototype.execFunction@resource://mozmill/modules/frame.js:752:5
> Runner.prototype.runTestModule@resource://mozmill/modules/frame.js:706:9
> Runner.prototype.runTestFile@resource://mozmill/modules/frame.js:690:3
> runTestFile@resource://mozmill/modules/frame.js:775:3
> Bridge.prototype._execFunction@resource://jsbridge/modules/Bridge.jsm:140:10
> Bridge.prototype.execFunction@resource://jsbridge/modules/Bridge.jsm:147:16
> @resource://jsbridge/modules/Server.jsm:32:1
> Server.Session/<@resource://jsbridge/modules/Server.jsm:32:5
> Sockets.Client.prototype.onMessage/event.notify@resource://jsbridge/modules/Sockets.jsm:40:11
>
> *************************
Does look like a race condition. The window is closed before we are done with it?
I'm not sure why the top of the stack lies with PrivateBrowsingUtils.jsm:24:5 since we don't use PB in the testcase...
Reporter | ||
Comment 8•10 years ago
|
||
For comparison the passing test.
Reporter | ||
Comment 9•10 years ago
|
||
Uhg, I've just had this failing _without_ the exception mentioned in comment 9. That might be a different issue.
Reporter | ||
Comment 10•10 years ago
|
||
Updated testcase. We don't need to wait for anything else, just: 1. Open page 2. Wait for it to load 3. Open "about:newtab" 4. Call restartApplication()
Attachment #8464770 -
Attachment is obsolete: true
Reporter | ||
Comment 11•10 years ago
|
||
Nightly is also affected. I can reproduce it on the oldest l10n (de) Nightly build that is still available on the ftp server (2014-07-29). en-US does not fail at all. The message mentioned in comment 0 : > ###!!! [Child][MessageChannel::SendAndWait] Error: Channel closing: too late to send/recv, messages will be lost Also appears when this is not failing
status-firefox34:
--- → affected
Reporter | ||
Comment 12•10 years ago
|
||
Here's about:newtab in en_US
Reporter | ||
Comment 13•10 years ago
|
||
about:newtab in de Notice that the tiles are from the browsing history instead of the predefined tiles. Theory: we close the browser while it's doing some operation in loading the page in these tiles. I'm not _sure_ if the "thumb" is an image or it's loading the actual webpage (I would assume it's a cached image). I don't know why this behaviour is different on en-US vs other locales. I'll try finding some info on this.
Reporter | ||
Comment 14•10 years ago
|
||
Just a small update since the design of the "thumbs" was recently changed and in the previous screenshot I had an older Nightly build which had them looking differently.
Attachment #8467021 -
Attachment is obsolete: true
Reporter | ||
Comment 15•10 years ago
|
||
This is the tracking bug for the changes to the tiles: bug 973273 Relevant changes are probably one of the issues tracked in bug 977787 (doing a lot of reading against them) The "sponsored tiles" are only active for en-US. They are preffed as: `browser.newtabpage.directory.source: chrome://global/content/directoryLinks.json` Where the directoryLinks.json contains entries only for en-US, that's where the difference between locales comes from.
Comment 16•10 years ago
|
||
So what happens when you disable the sponsored tiles for en-US? Will those builds also show this failure?
Reporter | ||
Comment 17•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #16) > So what happens when you disable the sponsored tiles for en-US? Will those > builds also show this failure? That's what I have been hoping, this way I would be able to get the regressor. Unfortunately I haven't yet been able to have a en-US build fail. The directoryLinks.json file is stored in the profile, I'm now checking if there's anything else different between a `de` and a `en-US` profile.
Reporter | ||
Comment 18•10 years ago
|
||
I've rerun the testcase with mozmill/master, unfortunately the exit code is None, so we learnt nothing new here.
> ←[1;31mTEST-UNEXPECTED-FAIL←[0m | Connection to application lost (exit code: None)
Reporter | ||
Comment 19•10 years ago
|
||
I've actually also gotten the testcase to fail on Win with the latest Nightly en-US build:
> (restart) C:\work\mozmill>mozmill -t C:\work\bugs\1046078\testcase\testcase.js -
> b C:\Mozilla\Nightly\en-US\firefox-34.0a1.en-US.win32\firefox\firefox.exe
> ^/(.*)$
> ^/(.*)\.asis$
> ^/(.*)\.py$
> No handlers could be found for logger "wptserve"
> ^/(.*)$
> ←[1;34mTEST-START←[0m | C:\work\bugs\1046078\testcase\testcase.js | test1
> ←[1;32mTEST-PASS←[0m | C:\work\bugs\1046078\testcase\testcase.js | test1
> TEST-END | C:\work\bugs\1046078\testcase\testcase.js | finished in 1161ms
>
> ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, me
> ssages will be lost
>
> 2014-08-21 05:33:52: stackwalker.cc:125: INFO: Couldn't load symbols for: ntdll.
> pdb|86C5ABD953F644798B23D22818BE23A42
> 2014-08-21 05:33:52: stackwalker.cc:125: INFO: Couldn't load symbols for: kernel
> base.pdb|DD4522B344B04180A82AA5F4E0378A1A2
> 2014-08-21 05:33:52: stackwalker.cc:125: INFO: Couldn't load symbols for: nss3.p
> db|F04E131922E3412496921A8D8FBCEF451
> 2014-08-21 05:33:52: stackwalker.cc:125: INFO: Couldn't load symbols for: xul.pd
> b|713CFC6055EF4AEB8CF6347E4D5E0C0A2
> 2014-08-21 05:33:52: basic_code_modules.cc:88: INFO: No module at 0xd5f4a0
> 2014-08-21 05:33:52: basic_code_modules.cc:88: INFO: No module at 0xd5f5c8
> 2014-08-21 05:33:52: basic_code_modules.cc:88: INFO: No module at 0xd5f4a0
> 2014-08-21 05:33:52: basic_code_modules.cc:88: INFO: No module at 0x12314a0
> ←[1;31mTEST-UNEXPECTED-FAIL←[0m | Connection to application lost (exit code: Non
> e)
> ←[1;31mTEST-UNEXPECTED-FAIL←[0m | Disconnect Error: Application unexpectedly clo
> sed
> RESULTS | Passed: 1
> RESULTS | Failed: 1
> RESULTS | Skipped: 0
> Traceback (most recent call last):
> File "c:\work\mozmill\mozmill\mozmill\__init__.py", line 899, in run
> mozmill.run(tests, self.options.restart)
> File "c:\work\mozmill\mozmill\mozmill\__init__.py", line 444, in run
> self.handle_disconnect(e)
> File "c:\work\mozmill\mozmill\mozmill\__init__.py", line 516, in handle_discon
> nect
> self.stop_runner()
> File "c:\work\mozmill\mozmill\mozmill\__init__.py", line 585, in stop_runner
> raise errors.ShutdownError('client process shutdown unsuccessful')
> ShutdownError: client process shutdown unsuccessful
Not sure what's with the symbols since I used an opt build.
Reporter | ||
Comment 20•10 years ago
|
||
So what I've seen with this testcase on en-US is probably another issue, so I've filed bug 1057246.
Comment 21•10 years ago
|
||
(In reply to Andrei Eftimie from comment #18) > I've rerun the testcase with mozmill/master, unfortunately the exit code is > None, so we learnt nothing new here. We know that we lost the connection to Firefox somehow, and that the application is still running! This might be also a longer delay during shutdown. So I wouldn't call it 'nothing new'. So why no regression window test has been made here? It sounds like the problem is always reproducible.
Reporter | ||
Comment 22•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #21) > So why no regression window test has been made here? It sounds like the > problem is always reproducible. The 'always reproducible' part was actually another bug, see comment 20: bug 1057246 This issue is only reproducible with localised builds and we lack older localised build on FPT, so we would require building Firefox locally with a locale (its possible that we don't need the full locale, only the directory tiles changes). Unfortunately I can't use the testcase ATM until bug 1057246 gets fixed (or we can work around it somehow) because we're always run into that.
Comment 23•10 years ago
|
||
You could use an en-US build and installing the appropriate l10n XPI. Then you will have the version localized. I think that should perfectly be enough for testing.
Reporter | ||
Comment 24•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #23) > You could use an en-US build and installing the appropriate l10n XPI. Then > you will have the version localized. I think that should perfectly be enough > for testing. I didn't knew this was possible to localise FF only with an xpi. I'll look into it.
Comment 25•10 years ago
|
||
You can find specific language packs beside the aurora builds. They will work on all platforms even listed under win32: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/08/2014-08-25-00-40-04-mozilla-aurora-l10n/win32/xpi/
Assignee | ||
Comment 26•10 years ago
|
||
There is no need for a localizated build here, similar to bug 1046645 this reproduces if we restart the browser from an about:newtab window while the tiles are loading. For that we would only have to set the preference "browser.newtabpage.directory.source" to "data:application/json,{}". I did reproduced this manually with the 'Restartles Restart Add-on' but I couldn't caught it in a screencast.
Assignee: andrei.eftimie → cosmin.malutan
Attachment #8466966 -
Attachment is obsolete: true
Assignee | ||
Comment 27•10 years ago
|
||
Also be aware that this reproduces only on windows.
OS: All → Windows 8
Comment 28•10 years ago
|
||
So what exactly happens here? Especially when you do it manually? I miss that important information. Also you most likely want to create a Mozmill bug for that. For the tests we could workaround it. So any reason why we are using about:newtab and not about:blank in teardown? We only want to get rid of the loaded page, so the latter page should be perfectly fine. http://hg.mozilla.org/qa/mozmill-tests/file/488cc02b7a85/firefox/tests/functional/restartTests/testPreferences_masterPassword/test1.js#l28
Assignee | ||
Comment 29•10 years ago
|
||
Right, so in tests, when we open a certain location, then open "about:newtab" and call the restart in test while the tiles are loading, the firefox won't restart after shutting down and we will get: >TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed > .... > .... >Exception: Cannot connect to jsbridge extension, port 56612 When I've done this manually, I prepared a profile with the 'Restartless Restart Add-on' installed, I opened the "about:newtab" page, I hit the restart button, firefox closed but it didn't come back. We are using the about:newtab and not about:blank to be in sync with the closeAllTabs method, and we assume that a user will start a browsing session from about:newtab, but about:blank should be perfectly fine for a workaround. Regarding the mozmill bug I should file, I guess we have bug 974971 for tracking the root issue and we could provide the workaround in this bug.
Assignee | ||
Comment 30•10 years ago
|
||
Here is the patch for nightly, once backported on aurora we should backout the skip patch.
Attachment #8481130 -
Flags: review?(hskupin)
Attachment #8481130 -
Flags: review?(andrei.eftimie)
Comment 31•10 years ago
|
||
(In reply to Cosmin Malutan from comment #29) > Right, so in tests, when we open a certain location, then open > "about:newtab" and call the restart in test while the tiles are loading, the > firefox won't restart after shutting down and we will get: > >TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed > > .... > > .... > >Exception: Cannot connect to jsbridge extension, port 56612 Can you please try to run exactly this testcase with Mozmill 2.1 (latest master)? It should tell that Firefox hasn't been restarted even requested so. It would be a good proof. > When I've done this manually, I prepared a profile with the 'Restartless > Restart Add-on' installed, I opened the "about:newtab" page, I hit the > restart button, firefox closed but it didn't come back. Wonderful!! That's a perfect testcase for bug 974971, even I'm not sure if that is a different incarnation of the underlying issue. > We are using the about:newtab and not about:blank to be in sync with the > closeAllTabs method, and we assume that a user will start a browsing session > from about:newtab, but about:blank should be perfectly fine for a > workaround. If it turns out that about:newtab is really causing all this mess with restarts, we should indeed change it to about:blank too. But lets figure that out on the other bug first. Good catch Cosmin! > Regarding the mozmill bug I should file, I guess we have bug 974971 for > tracking the root issue and we could provide the workaround in this bug. There is NO mozmill bug in this case. It's simply Firefox, which is playing bad with us. Mozmill should only report this correctly.
Comment 32•10 years ago
|
||
Comment on attachment 8481130 [details] [diff] [review] patch v1.0 Review of attachment 8481130 [details] [diff] [review]: ----------------------------------------------------------------- Please combine this patch with the unskip. Otherwise please see my inline comment. ::: firefox/tests/functional/restartTests/testPreferences_masterPassword/test1.js @@ +34,5 @@ > var teardownModule = function(aModule) { > + // Bug 1046078 > + // We have to leave the test in about:blank instead of about:newtab because if > + // tiles are loading while we are shutting down, firefox won't restart > + controller.open("about:blank"); I would not call this a workaround. At this stage it is not necessary to use about:newtab at all. We just want to blank out the currently loaded web page. So just change the single line.
Attachment #8481130 -
Flags: review?(hskupin)
Attachment #8481130 -
Flags: review?(andrei.eftimie)
Attachment #8481130 -
Flags: review-
Assignee | ||
Comment 33•10 years ago
|
||
Thanks Henrik, as discussed on IRC, I didn't add the un-skip patch, because on nightly the test is not skipped but we need the fix there too, so we will backout the skip-patch on aurora.
Attachment #8481130 -
Attachment is obsolete: true
Attachment #8481148 -
Flags: review?(hskupin)
Comment 34•10 years ago
|
||
Comment on attachment 8481148 [details] [diff] [review] patch v1.1 Review of attachment 8481148 [details] [diff] [review]: ----------------------------------------------------------------- Who-ever checks this in, please make sure to also unskip the test. Thanks Cosmin!
Attachment #8481148 -
Flags: review?(hskupin) → review+
Updated•10 years ago
|
Keywords: regressionwindow-wanted → checkin-needed
Comment 35•10 years ago
|
||
Lets do this change for all the branches regarding our stability work.
status-firefox32:
--- → affected
status-firefox-esr24:
--- → wontfix
status-firefox-esr31:
--- → affected
Comment 36•10 years ago
|
||
http://hg.mozilla.org/qa/mozmill-tests/rev/8fe867b60274 (default) http://hg.mozilla.org/qa/mozmill-tests/rev/3aeebb0d63dd (aurora) And unskipped: http://hg.mozilla.org/qa/mozmill-tests/rev/ca1e70e09b86 (aurora) Please check the rest of the branches Cosmin. Thanks!
Updated•10 years ago
|
Whiteboard: [mozmill-test-failure][mozmill-test-skipped] → [mozmill-test-failure]
Assignee | ||
Comment 37•10 years ago
|
||
All branches are affected, only on aurora it was skippe.
Assignee | ||
Comment 38•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #31) > Can you please try to run exactly this testcase with Mozmill 2.1 (latest > master)? It should tell that Firefox hasn't been restarted even requested > so. It would be a good proof. Balow is the log with mozmill 2.1 > No handlers could be found for logger "wptserve" ^/(.*)$ TEST-START | c:\Users\svuser\Desktop\testCase.js | test1 TEST-PASS | c:\Users\svuser\Desktop\testCase.js | test1 TEST-END | c:\Users\svuser\Desktop\testCase.js | finished in 610ms ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, me ssages will be lost 2014-08-29 11:54:40: stackwalker.cc:125: INFO: Couldn't load symbols for: ntdll. pdb|CD4062A231154A17A18DAE7D1A0FBACC2 2014-08-29 11:54:40: stackwalker.cc:125: INFO: Couldn't load symbols for: kernel 32.pdb|17D8B8C647E14C1DAF9C6FC206AAAC512 2014-08-29 11:54:40: stackwalker.cc:125: INFO: Couldn't load symbols for: nss3.p db|F82CA356137646B49C03516C04FD1E6F1 2014-08-29 11:54:40: stackwalker.cc:125: INFO: Couldn't load symbols for: xul.pd b|FFA7CC61623349F98F1C2D417E6423442 2014-08-29 11:54:40: basic_code_modules.cc:88: INFO: No module at 0x37f7f4 2014-08-29 11:54:40: basic_code_modules.cc:88: INFO: No module at 0x37f91c 2014-08-29 11:54:40: basic_code_modules.cc:88: INFO: No module at 0x37f7f4 2014-08-29 11:54:40: basic_code_modules.cc:88: INFO: No module at 0x4ffb18 TEST-UNEXPECTED-FAIL | Connection to application lost (exit code: None) TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed RESULTS | Passed: 1 RESULTS | Failed: 1 RESULTS | Skipped: 0 Traceback (most recent call last): File "c:\Users\svuser\Desktop\mozmill-env\python\lib\site-packages\mozmill-2.1 _dev-py2.7.egg\mozmill\__init__.py", line 899, in run mozmill.run(tests, self.options.restart) File "c:\Users\svuser\Desktop\mozmill-env\python\lib\site-packages\mozmill-2.1 _dev-py2.7.egg\mozmill\__init__.py", line 444, in run self.handle_disconnect(e) File "c:\Users\svuser\Desktop\mozmill-env\python\lib\site-packages\mozmill-2.1 _dev-py2.7.egg\mozmill\__init__.py", line 516, in handle_disconnect self.stop_runner() File "c:\Users\svuser\Desktop\mozmill-env\python\lib\site-packages\mozmill-2.1 _dev-py2.7.egg\mozmill\__init__.py", line 585, in stop_runner raise errors.ShutdownError('client process shutdown unsuccessful') ShutdownError: client process shutdown unsuccessful
Comment 39•10 years ago
|
||
Transplanted: http://hg.mozilla.org/qa/mozmill-tests/rev/363de15faeef (beta) http://hg.mozilla.org/qa/mozmill-tests/rev/9d2e84deab67 (release) Esr31 needs separate patch.
status-firefox31:
--- → fixed
Comment 40•10 years ago
|
||
Please do not forget to remove the checkin-needed keyword once the landing has been completed. Nothing exists for landing right now.
Keywords: checkin-needed
Assignee | ||
Comment 41•10 years ago
|
||
Here is the patch for esr31.
Attachment #8483343 -
Flags: review?(andreea.matei)
Comment 42•10 years ago
|
||
Comment on attachment 8483343 [details] [diff] [review] patch v1.1[esr31] Review of attachment 8483343 [details] [diff] [review]: ----------------------------------------------------------------- http://hg.mozilla.org/qa/mozmill-tests/rev/2aa30fcdfe6f (esr31)
Attachment #8483343 -
Flags: review?(andreea.matei) → review+
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•5 years ago
|
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•