[e10s] Unusable nightly after updating with a staged update

RESOLVED FIXED in Firefox 55

Status

()

Toolkit
Application Update
--
critical
RESOLVED FIXED
5 months ago
3 months ago

People

(Reporter: bogdan_maris, Assigned: rstrong)

Tracking

55 Branch
mozilla56
All
Windows 10
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox53 unaffected, firefox54 unaffected, firefox55+ fixed, firefox56 fixed)

Details

Attachments

(3 attachments)

(Reporter)

Description

5 months ago
[Affected versions]:
- Latest Nightly 55.0a1

[Affected platforms]:
- Windows 10 only

[Steps to reproduce]:
1. Download and start an old Nightly build
2. Force an update from Panel menu/About Nightly or let it update on itself
3. Restart to apply updates.
- repeat steps using same build or another, this is quite intermittent.

[Expected result]:
- After restart, Nightly is usable again.

[Actual result]:
- After restart, Nightly is no longer usable: tabs don't work, pages will not load etc. 

[Regression range]:
- Not sure if this is a regression, I remember I've seen this for some time now, months, it would be impossible to find a regression because the issue reproduces intermittent.

[Additional notes]:
- Not entirely sure the component I've selected is right, if not please change it accordingly.
- We've only ran into this issue only on Windows 10 for some reason and only in Nightly.
- Restarting Firefox will not resolve the issue. What will resolve the issue is disabling e10s (about:config actually works even though everything else does not).
- In task manager there is a blank process from Nightly (can be seen in attached .gif).
- I've attached a gif that shows the issue.
- Also when user closes Nightly or restarts it, a very high probability is that Fx will crash, we've ran into this two times:
bp-c4feaaea-14b5-4881-8d50-0ebe50170606
bp-2644bf03-3bc2-4397-a099-471af0170606
(Reporter)

Comment 1

5 months ago
Here is the .gif showing the issue, was too large to upload it to bugzilla directly.
https://www.dropbox.com/s/6dxs4zmfv2027gk/AAAAAAAAA.gif?dl=0
When this happens can you start Firefox after you kill the firefox.exe processes in task manager on the details tab?
Flags: needinfo?(bogdan.maris)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #2)
> When this happens can you start Firefox after you kill the firefox.exe
> processes in task manager on the details tab?
Be sure to kill all firefox.exe processes in the details tab in task manager when you try this.

Comment 4

5 months ago
I can reproduce the problem when updating Nightly55 with new profile.
You must kill Nightly process from Task Manager.
And Nightly can not start normally again until you completely uninstall nightly from program folder and reinstall.


Browser console shows the following errors when the problem.

While creating services from category 'profile-after-change', service for entry 'Notification Telemetry Service', contract ID '@mozilla.org/notificationTelemetryService;1' does not implement nsIObserver.

NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIScriptSecurityManager.getLoadContextCodebasePrincipal]  remote-browser.xml:370

TypeError: frameLoader.tabParent is null[詳細]  remote-browser.xml:243:13

NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIScriptSecurityManager.getLoadContextCodebasePrincipal]  remote-browser.xml:370

A promise chain failed to handle a rejection. Did you forget to '.catch', or did you forget to 'return'?
See https://developer.mozilla.org/Mozilla/JavaScript_code_modules/Promise.jsm/Promise

Date: Wed Jun 07 2017 02:36:52 GMT+0900
Full Message: TypeError: frameLoader.tabParent is null
Full Stack: set_docShellIsActive@chrome://global/content/bindings/remote-browser.xml:243:13
_insertBrowser@chrome://browser/content/tabbrowser.xml:2272:13
addTab@chrome://browser/content/tabbrowser.xml:2504:15
ssi_restoreWindow@resource:///modules/sessionstore/SessionStore.jsm:3298:17
ssi_restoreWindows@resource:///modules/sessionstore/SessionStore.jsm:3469:5
initializeWindow@resource:///modules/sessionstore/SessionStore.jsm:1158:11
onBeforeBrowserWindowShown/<@resource:///modules/sessionstore/SessionStore.jsm:1307:9
process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:922:23
walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:806:7
Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:739:11
schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:770:7
Promise.prototype.then@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:455:5
push@resource://gre/modules/osfile/osfile_async_front.jsm:375:19
read@resource://gre/modules/osfile/osfile_async_front.jsm:1093:10
readJSON@resource://services-common/utils.js:385:12
get@resource://gre/modules/FxAccountsStorage.jsm:464:12
_readPlainStorage@resource://gre/modules/FxAccountsStorage.jsm:275:19
async*_initialize@resource://gre/modules/FxAccountsStorage.jsm:96:38
async*initialize@resource://gre/modules/FxAccountsStorage.jsm:64:32
newAccountState@resource://gre/modules/FxAccounts.jsm:412:5
initialize@resource://gre/modules/FxAccounts.jsm:386:32
this.FxAccounts@resource://gre/modules/FxAccounts.jsm:352:3
@resource://gre/modules/FxAccounts.jsm:1788:11
get@resource://gre/modules/XPCOMUtils.jsm:198:21
XPCU_moduleLambda@resource://gre/modules/XPCOMUtils.jsm:273:9
get@resource://gre/modules/XPCOMUtils.jsm:198:21
_getUserData@resource://services-sync/UIState.jsm:164:7
async*_refreshFxAState@resource://services-sync/UIState.jsm:128:26
async*refreshState@resource://services-sync/UIState.jsm:108:11
async*init@resource://services-sync/UIState.jsm:76:5
isReady@resource://services-sync/UIState.jsm:67:7
isReady@resource://services-sync/UIState.jsm:237:12
_maybeUpdateUIState@chrome://browser/content/browser-sync.js:67:9
init@chrome://browser/content/browser-sync.js:102:5
_delayedStartup@chrome://browser/content/browser.js:1640:5
EventListener.handleEvent*onLoad@chrome://browser/content/browser.js:1368:5
onload@chrome://browser/content/browser.xul:1:1
  remote-browser.xml:243
(In reply to Alice0775 White from comment #4)
> I can reproduce the problem when updating Nightly55 with new profile.
> You must kill Nightly process from Task Manager.
> And Nightly can not start normally again until you completely uninstall
> nightly from program folder and reinstall.
Please try restarting the computer after this happens and launching Firefox after restart.
Flags: needinfo?(alice0775)
(Reporter)

Comment 6

5 months ago
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #2)
> When this happens can you start Firefox after you kill the firefox.exe
> processes in task manager on the details tab?

I actually can start Nightly after killing the process but it behaves the same. I checked the Browser console and I get the same thing as Alice said in comment 4.

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #5)
> (In reply to Alice0775 White from comment #4)
> > I can reproduce the problem when updating Nightly55 with new profile.
> > You must kill Nightly process from Task Manager.
> > And Nightly can not start normally again until you completely uninstall
> > nightly from program folder and reinstall.
> Please try restarting the computer after this happens and launching Firefox
> after restart.

Restarting the PC after I closed Nightly seems to fixed the problem, Nightly successfully can start and can be used again as normal.
Flags: needinfo?(bogdan.maris)
Thanks! This is very similar to bug 1112937 and the patch I am working on already handles this case as well as the updater can handle it since the firefox.exe process is outside of its control.
Depends on: 1112937
Flags: needinfo?(alice0775)
Summary: [e10s] Unusable nightly after update to latest build → [e10s] firefox.exe process doesn't always exit when updating which breaks Firefox
Depends on: 1370634
Please try to reproduce with the latest nightly. Bug 1370634 makes it so if Firefox has a bug with exiting that the updater will wait until after Firefox's watchdog timer should force the firefox.exe process has exited.
Flags: needinfo?(bogdan.maris)
(Reporter)

Comment 9

5 months ago
Still reproducing this (intermittently just as before) after updating from Nightly from 2017-06-07 or 2017-05-20 (those were the old builds I had lying around) to latest 2017-06-12.
Flags: needinfo?(bogdan.maris)
Bug 1370634 landed on 2017-06-07 so the fix is in the 2017-06-08 build. Please keep an eye on if you can still reproduce with any builds from 2017-06-08 and newer. If it isn't reproducible in the next month I'll close this as fixed by bug 1370634. Thanks.
Whiteboard: [closeme 7/12/17]
The symptoms for this bug are similar to bug 1356637... there is likely something e10s related that is preventing the firefox.exe process from exiting in a timely manner on your system which affects app update since it expected the firefox.exe process to exit within 10 seconds. Now with bug 1370634 landed app update will wait as much as 70 seconds since the watchdog timer waits 63 seconds. I have never been able to reproduce this issue and I think that bug 1370634 will fix this issue as far as app update is concerned but it won't fix the root issue which is outside of app update.

If that is the case a toolkit -> startup and profile system bug should be filed for the root issue of the firefox.exe process not exiting in a timely manner or being killed by the watchdog timer. It is entirely possible it could also be 3rd party software preventing the firefox.exe process from exiting like bug 1356637.
(Reporter)

Comment 12

5 months ago
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #10)
> Bug 1370634 landed on 2017-06-07 so the fix is in the 2017-06-08 build.
> Please keep an eye on if you can still reproduce with any builds from
> 2017-06-08 and newer. If it isn't reproducible in the next month I'll close
> this as fixed by bug 1370634. Thanks.

Just tested with build from 2017-06-09 and I could still reproduce (I made sure I restarted Windows before testing). Browser console output (https://pastebin.com/PXGbtRkC). As stated in comment 0 this has something to do with e10s since if I start a non-e10s window (or disabling the e10s pref and restarting) from the same broken session, firefox will work as it should.
Please attach the files as requested below by performing the following steps:

1. Type about:config in the url bar.
2. Click the button that is shown.
3. Copy devtools.chrome.enabled into the search bar.
4. Toggle the value from false to true by either double clicking it or via the context menu.
5. Under Tools -> Web Developer select Browser Console.
6. Copy and paste the following into the text box at the bottom
var fileLocator = Cc["@mozilla.org/file/directory_service;1"].getService(Ci.nsIProperties); var dir = fileLocator.get("UpdRootD", Ci.nsIFile); dir.reveal();

7. Attach the active-update.xml and updates.xml files to this bug
8. In the updates subdirectory attach the last-update.log and the backup-update.log files if they exist.
9. In the 0 subdirectory attach the update.log and the update.status files if they exist.
BTW: thanks for the console output but that info is way too late in the process to provide any insight. :(
Whiteboard: [closeme 7/12/17]
In case it isn't clear, you don't need to reproduce again to attach those files as long as the last update for that installation reproduced.
(Reporter)

Comment 16

5 months ago
Created attachment 8877034 [details]
updates.zip

Here is the info you requested, folder 0 was empty though.
Bogdan, please try setting the app.update.staging.enabled preference to false and then try to reproduce again. Also, please attach those same files if it reproduces with staging disabled.

Comment 18

5 months ago
I can still also reproduce the problem Bug 1370634 when Nightly update from 2017-06-11 to 2017-06-12.
So  bug 1370634 did not fix the issue.
(Reporter)

Comment 19

5 months ago
Created attachment 8877041 [details]
v2-updates.zip

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #17)
> Bogdan, please try setting the app.update.staging.enabled preference to
> false and then try to reproduce again. Also, please attach those same files
> if it reproduces with staging disabled.

Done. It is strange that it pointed to the same folder inside updates but last-update is different.
(Reporter)

Updated

5 months ago
Attachment #8877041 - Attachment description: v2-D6F48FE46F98A65B.zip → v2-updates.zip
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #19)
> Created attachment 8877041 [details]
> v2-updates.zip
> 
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #17)
> > Bogdan, please try setting the app.update.staging.enabled preference to
> > false and then try to reproduce again. Also, please attach those same files
> > if it reproduces with staging disabled.
> 
> Done. It is strange that it pointed to the same folder inside updates but
> last-update is different.
That is expected.

Just to verify, you were able to reproduce?
That still had staging enabled. The difference in the logs is due to how long it took before the firefox.exe file was no longer exclusively locked.

Change the app.update.staging.enabled preference to false, exit, and then try to reproduce just to make sure there isn't a pending update already staged. Then try to reproduce and if you do attach the logs.
(Reporter)

Comment 22

5 months ago
Gah...you are right, I think I did not change the value of app.update.staging.enabled. Trying again.
Out of curiosity, does it seem like it might not be reproducible with staging disabled?
Flags: needinfo?(bogdan.maris)
(Reporter)

Comment 24

5 months ago
Sorry for late reply, I was away from office. 

I have trouble reproducing with staging disabled, I haven't been able to reproduce it yet and I tried like 20 times or so.
Steps:
1. Changed the update channel to something irrelevant just so update will not start.
2. Changed pref app.update.staging.enabled to false
3. Exit firefox and change the update channel back to "nightly"
4. Start Firefox
5. Restart after update.

With staging enabled I could have reproduced 1 out of 3-4 tries.
Flags: needinfo?(bogdan.maris) → needinfo?(robert.strong.bugs)
No problem and thanks!
Flags: needinfo?(robert.strong.bugs)
It appears that bug 1368600 and this bug are very closely related.
Created attachment 8877140 [details] [diff] [review]
patch - disable staging updates on Windows
Assignee: nobody → robert.strong.bugs
Status: NEW → ASSIGNED
Comment on attachment 8877140 [details] [diff] [review]
patch - disable staging updates on Windows

Pushed to try
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d918112db2069558eaa1838a4d03935facf4c6c

I've run all app update tests locally and since they all passed locally on Windows along with this being a Windows only change this should be safe without a try push.
Attachment #8877140 - Flags: review?(mhowell)
Depends on: 1368600
No longer depends on: 1112937, 1370634
Blocks: 1368600
No longer depends on: 1368600
[Tracking Requested - why for this release]: This bug makes Windows Firefox not launch properly after updating with staging enabled which is the default. It is likely that the simple patch to disable staging on Windows in this bug will also fix bug 1368600.
tracking-firefox55: --- → ?

Updated

5 months ago
Attachment #8877140 - Flags: review?(mhowell) → review+

Comment 30

5 months ago
Pushed by rstrong@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c173d396a46e
Change app.update.staging.enabled preference to false on Windows to disable update staging. r=mhowell
tracking-firefox55: ? → +
Summary: [e10s] firefox.exe process doesn't always exit when updating which breaks Firefox → [e10s] Unusable nightly after updating with a staged update

Comment 31

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c173d396a46e
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8877140 [details] [diff] [review]
patch - disable staging updates on Windows

Note: this should also fix bug 1368600.

Approval Request Comment
[Feature/Bug causing the regression]: Update staging bug 307181 and Windows 10 Creator
[User impact if declined]: Firefox won't start properly after updating on Windows 10 Creator without killing the Firefox process or rebooting the OS.
[Is this code covered by automated tests?]: Yes. The update without staging path has many tests. The failure on Windows 10 Creator does not have a test.
[Has the fix been verified in Nightly?]: Not yet but disabling of update staging fixing this bug was checked in comment #24.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: The update without staging code path is well tested and used as a fallback when staging fails.
[String changes made/needed]: None
Attachment #8877140 - Flags: approval-mozilla-beta?
If this is due to a windows update and not a change in firefox 55, is there a risk this would affect 53 to 54 updates?
It turns out that staging updates was broken on Windows with Firefox 53 and 54 due to bug 1362267 so they shouldn't be affected.
Comment on attachment 8877140 [details] [diff] [review]
patch - disable staging updates on Windows

fix an issue with updates on windows, beta55+

Thanks Robert, this should be in 55b2
Attachment #8877140 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 36

5 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/5b33d7b0729a
status-firefox55: affected → fixed
Depends on: 1373564
Needinfo'ing Wayne and Edmund to let them know that the app.update.staging.enabled pref should be set to false on Windows for Thunderbird and SeaMonkey to avoid this bug with the latest Windows 10 Update. This bug only affects Beta and Nightly at this time.
Flags: needinfo?(vseerror)
Flags: needinfo?(ewong)
Currently the default is false so applications had to opt-in to update staging. It would be a good thing to add the pref with a value of false on Windows systems though.

Comment 39

5 months ago
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #37)
> Needinfo'ing Wayne and Edmund to let them know that the
> app.update.staging.enabled pref should be set to false on Windows for
> Thunderbird and SeaMonkey to avoid this bug with the latest Windows 10
> Update. This bug only affects Beta and Nightly at this time.

Thanks for the heads up, Robert!  Appreciate it!
Flags: needinfo?(ewong)

Comment 40

5 months ago
Thanks Robert.
Flags: needinfo?(vseerror)
(In reply to Robert Strong (PTO until 8/14) [:rstrong] (use needinfo to contact me) from comment #32)
> [Is this code covered by automated tests?]: Yes. The update without staging
> path has many tests. The failure on Windows 10 Creator does not have a test.
> [Has the fix been verified in Nightly?]: Not yet but disabling of update
> staging fixing this bug was checked in comment #24.
> [Needs manual test from QE? If yes, steps to reproduce]: No.

Setting qe-verify- based on rstrong's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.