Since v147 Firefox stops responding [@ remote_settings::RemoteSettingsService]
Categories
(Application Services :: Remote Settings, defect, P3)
Tracking
(firefox148 fixed, firefox149 fixed)
People
(Reporter: richard.wiseman, Assigned: bdk)
References
Details
(Whiteboard: [fxsync-])
Crash Data
Attachments
(2 files, 3 obsolete files)
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0
Steps to reproduce:
I have three different profiles in Firefox, two to run specific webapps and one for general browsing. I've noticed in the past few days (I assume since I received the upgrade to Firefox v147) that sometimes Firefox just stops responding. I get a pop-up asking if I want to wait or force it to quit. I've tried waiting, sometimes re-clicking the button as it reappears following a further delay. In the end I just have to force quit.
This has happened both in my general browsing profile and my Evernote profile (in which that's all that is running).
I've noticed it either happens when bringing Firefox into focus or when changing tab, rather than when I'm using a particular tab.
Actual results:
As mentioned above, it just stops responding and I get a pop-up (from my window manager I assume) saying it's stopped responding and asking whether I want to wait or force it to quit.
Expected results:
It shouldn't freeze…
Comment 1•5 months ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•5 months ago
|
||
Can you please use these steps to get backtrace of the frozen browser?
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Getting_Mozilla_crash_report_from_running_application
Thanks.
Updated•5 months ago
|
| Reporter | ||
Comment 3•5 months ago
|
||
Thanks Martin, will do. Does the crash report I submit contain any personal information?
| Reporter | ||
Comment 4•5 months ago
|
||
It happened again, but I was ready!
The ID is 581a0213-24db-39b8-f4a4-a00b0c770c15 or possibly bp-a185e76d-eb4b-48ed-8d33-3017c0260129 — although that might be the same one. (The first was after I'd let the crash report be auto-submitted; the second was after I'd pressed the button in about:crashes to submit it.)
Comment 5•5 months ago
|
||
So it's somewhere at remote_settings::RemoteSettingsService. Not sure what's that used for:
https://firefox-source-docs.mozilla.org/rust-components/topic-guides/remote-settings.html
Updated•5 months ago
|
Updated•5 months ago
|
Comment 6•5 months ago
|
||
Thank you for this very detailed report.
The crash shows that it involves the Rust client of the application-services lib, I'll move it to the right component (hopefully one day they become the same).
Updated•5 months ago
|
| Assignee | ||
Comment 7•5 months ago
|
||
Yes thanks for that report, I think it shows what's going on here:
- Thread 113 is running Suggest code that's making a remote settings request. It's holding the lock for
RemoteSettingsClientInner. - The main thread (Thread 0) is responding to a locale change, which causes the remote settings config to be updated. It's waiting on the lock that Thread 113 is holding.
I believe the deadlock happens because some of the networking code runs in the main thread. This causes thread 113 to be blocked on the getting access to the main thread, while the main thread is blocked waiting for thread 113 to release the lock.
I think the simplest short-term fix would be to configure RemoteSettingsService::update_config to be a wrapped-async call. That way it won't block the main thread while it's running.
Long-term, we should probably make these locks more fine grained. Maybe the config gets it's own lock and we never keep it locked for any significant amount of time. For example the client would lock the config, copy the data, then unlock it right after.
| Reporter | ||
Comment 8•5 months ago
|
||
Many thanks to you all for investigating this.
Was I perhaps doing something unusual to trigger this? (I didn't explicitly or intentionally change locale, but MS Teams does sometimes say it has detected a locale change, although I always ignore that because I haven't changed anything!) And is it just a coincidence that this seems to have started happening on this Firefox release? I was wondering if MS Copilot had recently changed its behaviour, but then I realised I also experienced a similar freeze in my Evernote-only profile.
Thanks again!
| Assignee | ||
Updated•5 months ago
|
Updated•5 months ago
|
| Assignee | ||
Comment 9•5 months ago
|
||
I made https://bugzilla.mozilla.org/show_bug.cgi?id=2013527 for the long-term solution, let's use this one to just stop the deadlock.
Was I perhaps doing something unusual to trigger this? (I didn't explicitly or intentionally change locale, but MS Teams does sometimes say it has detected a locale change, although I always ignore that because I haven't changed anything!)
I don't think so, maybe the network requests are going slower than normal. From what I saw, we're getting about 20 reports a day for this, which means it's pretty rare. For some reason you're getting hit with it more than normal though.
And is it just a coincidence that this seems to have started happening on this Firefox release?
I don't think so. I believe the changes from https://bugzilla.mozilla.org/show_bug.cgi?id=1983135 made this much more likely to happen, although I do see a couple of crash reports from version 146 so it was possible before as well.
| Assignee | ||
Comment 10•5 months ago
|
||
I was able to reproduce the hang by running the following code in the browser console:
ChromeUtils.defineESModuleGetters(lazy, {
SharedRemoteSettingsService:
"resource://gre/modules/RustSharedRemoteSettingsService.sys.mjs",
SuggestIngestionConstraints:
"moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
SuggestStoreBuilder:
"moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
SuggestionProvider:
"moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
SuggestionProviderConstraints:
"moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
SharedRemoteSettingsService: "resource://gre/modules/RustSharedRemoteSettingsService.sys.mjs",
});
let storeDataPath = PathUtils.join(
Services.dirsvc.get("ProfD", Ci.nsIFile).path,
"test-suggest.sqlite"
);
let builder = lazy.SuggestStoreBuilder.init()
.dataPath(storeDataPath)
.remoteSettingsService(lazy.SharedRemoteSettingsService.rustService())
.loadExtension(
AppConstants.SQLITE_LIBRARY_FILENAME,
"sqlite3_fts5_init"
);
let store = builder.build();
await store.clear()
store.ingest(new lazy.SuggestIngestionConstraints({
providers: [lazy.SuggestionProvider.AMP, lazy.SuggestionProvider.FAKESPOT]
}))
lazy.SharedRemoteSettingsService.updateServer()
| Assignee | ||
Comment 11•5 months ago
|
||
Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.
Updated•5 months ago
|
Comment 12•5 months ago
|
||
Comment 13•5 months ago
|
||
| bugherder | ||
| Reporter | ||
Comment 14•4 months ago
|
||
Thanks for such a speedy fix!
| Assignee | ||
Comment 15•4 months ago
|
||
No problem. The fix is currently only on Nightly / 149, I'll try to get the change uplifted to earlier versions though.
| Assignee | ||
Comment 16•4 months ago
|
||
Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.
Original Revision: https://phabricator.services.mozilla.com/D281219
Updated•4 months ago
|
| Assignee | ||
Comment 17•4 months ago
|
||
Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.
Original Revision: https://phabricator.services.mozilla.com/D281219
Updated•4 months ago
|
Comment 18•4 months ago
|
||
firefox-release Uplift Approval Request
- User impact if declined: Firefox hangs under certain circumstances. It doesn't happen that often, but I'm seeing about 10-20 crashes a day from crash-stats.
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: The changes are very small and simple. The main risk is a bad merge with the generated code. I'll make sure to manually check that.
- String changes made/needed: No
- Is Android affected?: no
Comment 19•4 months ago
|
||
firefox-release Uplift Approval Request
- User impact if declined: Firefox hangs under certain circumstances. It doesn't happen that often, but I'm seeing about 10-20 crashes a day from crash-stats.
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: The changes are very small and simple. The main risk is a bad merge with the generated code. I'll make sure to manually check that.
- String changes made/needed: No
- Is Android affected?: no
| Assignee | ||
Comment 20•4 months ago
|
||
Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.
Original Revision: https://phabricator.services.mozilla.com/D281219
Updated•4 months ago
|
| Assignee | ||
Comment 21•4 months ago
|
||
Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.
Original Revision: https://phabricator.services.mozilla.com/D281219
Updated•4 months ago
|
Comment 22•4 months ago
|
||
firefox-beta Uplift Approval Request
- User impact if declined: Firefox hangs under certain circumstances. It doesn't happen that often, but I'm seeing about 10-20 crashes a day from crash-stats.
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: The changes are very small and simple. The main risk is a bad merge with the generated code. I'll make sure to manually check that.
- String changes made/needed: No
- Is Android affected?: no
Updated•4 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Comment 23•4 months ago
|
||
| uplift | ||
Updated•4 months ago
|
Description
•