Closed Bug 2012955 Opened 5 months ago Closed 5 months ago

Since v147 Firefox stops responding [@ remote_settings::RemoteSettingsService]

Categories

(Application Services :: Remote Settings, defect, P3)

Firefox 147

Tracking

(firefox148 fixed, firefox149 fixed)

RESOLVED FIXED
149 Branch
Tracking Status
firefox148 --- fixed
firefox149 --- fixed

People

(Reporter: richard.wiseman, Assigned: bdk)

References

Details

(Whiteboard: [fxsync-])

Crash Data

Attachments

(2 files, 3 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0

Steps to reproduce:

I have three different profiles in Firefox, two to run specific webapps and one for general browsing. I've noticed in the past few days (I assume since I received the upgrade to Firefox v147) that sometimes Firefox just stops responding. I get a pop-up asking if I want to wait or force it to quit. I've tried waiting, sometimes re-clicking the button as it reappears following a further delay. In the end I just have to force quit.

This has happened both in my general browsing profile and my Evernote profile (in which that's all that is running).

I've noticed it either happens when bringing Firefox into focus or when changing tab, rather than when I'm using a particular tab.

Actual results:

As mentioned above, it just stops responding and I get a pop-up (from my window manager I assume) saying it's stopped responding and asking whether I want to wait or force it to quit.

Expected results:

It shouldn't freeze…

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core
Flags: needinfo?(richard.wiseman)

Thanks Martin, will do. Does the crash report I submit contain any personal information?

It happened again, but I was ready!

The ID is 581a0213-24db-39b8-f4a4-a00b0c770c15 or possibly bp-a185e76d-eb4b-48ed-8d33-3017c0260129 — although that might be the same one. (The first was after I'd let the crash report be auto-submitted; the second was after I'd pressed the button in about:crashes to submit it.)

Flags: needinfo?(richard.wiseman)

So it's somewhere at remote_settings::RemoteSettingsService. Not sure what's that used for:
https://firefox-source-docs.mozilla.org/rust-components/topic-guides/remote-settings.html

Component: Widget: Gtk → Remote Settings Client
Product: Core → Firefox
Summary: Since v147 Firefox stops responding → Since v147 Firefox stops responding [@ remote_settings::RemoteSettingsService]

Thank you for this very detailed report.

The crash shows that it involves the Rust client of the application-services lib, I'll move it to the right component (hopefully one day they become the same).

Flags: needinfo?(bdeankawamura)
Component: Remote Settings Client → Remote Settings
Product: Firefox → Application Services
Target Milestone: --- → 147 Branch

Yes thanks for that report, I think it shows what's going on here:

  • Thread 113 is running Suggest code that's making a remote settings request. It's holding the lock for RemoteSettingsClientInner.
  • The main thread (Thread 0) is responding to a locale change, which causes the remote settings config to be updated. It's waiting on the lock that Thread 113 is holding.

I believe the deadlock happens because some of the networking code runs in the main thread. This causes thread 113 to be blocked on the getting access to the main thread, while the main thread is blocked waiting for thread 113 to release the lock.

I think the simplest short-term fix would be to configure RemoteSettingsService::update_config to be a wrapped-async call. That way it won't block the main thread while it's running.

Long-term, we should probably make these locks more fine grained. Maybe the config gets it's own lock and we never keep it locked for any significant amount of time. For example the client would lock the config, copy the data, then unlock it right after.

Flags: needinfo?(bdeankawamura)

Many thanks to you all for investigating this.

Was I perhaps doing something unusual to trigger this? (I didn't explicitly or intentionally change locale, but MS Teams does sometimes say it has detected a locale change, although I always ignore that because I haven't changed anything!) And is it just a coincidence that this seems to have started happening on this Firefox release? I was wondering if MS Copilot had recently changed its behaviour, but then I realised I also experienced a similar freeze in my Evernote-only profile.

Thanks again!

Severity: -- → S3
See Also: → 2013527
Whiteboard: [fxsync-]

I made https://bugzilla.mozilla.org/show_bug.cgi?id=2013527 for the long-term solution, let's use this one to just stop the deadlock.

Was I perhaps doing something unusual to trigger this? (I didn't explicitly or intentionally change locale, but MS Teams does sometimes say it has detected a locale change, although I always ignore that because I haven't changed anything!)

I don't think so, maybe the network requests are going slower than normal. From what I saw, we're getting about 20 reports a day for this, which means it's pretty rare. For some reason you're getting hit with it more than normal though.

And is it just a coincidence that this seems to have started happening on this Firefox release?

I don't think so. I believe the changes from https://bugzilla.mozilla.org/show_bug.cgi?id=1983135 made this much more likely to happen, although I do see a couple of crash reports from version 146 so it was possible before as well.

I was able to reproduce the hang by running the following code in the browser console:

ChromeUtils.defineESModuleGetters(lazy, {
  SharedRemoteSettingsService:
    "resource://gre/modules/RustSharedRemoteSettingsService.sys.mjs",
  SuggestIngestionConstraints:
    "moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
  SuggestStoreBuilder:
    "moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
  SuggestionProvider:
    "moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
  SuggestionProviderConstraints:
    "moz-src:///toolkit/components/uniffi-bindgen-gecko-js/components/generated/RustSuggest.sys.mjs",
    SharedRemoteSettingsService: "resource://gre/modules/RustSharedRemoteSettingsService.sys.mjs",
});
let storeDataPath = PathUtils.join(
  Services.dirsvc.get("ProfD", Ci.nsIFile).path,
  "test-suggest.sqlite"
);
let builder = lazy.SuggestStoreBuilder.init()
    .dataPath(storeDataPath)
    .remoteSettingsService(lazy.SharedRemoteSettingsService.rustService())
    .loadExtension(
      AppConstants.SQLITE_LIBRARY_FILENAME,
      "sqlite3_fts5_init"
    );
let store = builder.build();
await store.clear()
store.ingest(new lazy.SuggestIngestionConstraints({
    providers: [lazy.SuggestionProvider.AMP, lazy.SuggestionProvider.FAKESPOT]
}))
lazy.SharedRemoteSettingsService.updateServer()

Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.

Assignee: nobody → bdeankawamura
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED

Thanks for such a speedy fix!

No problem. The fix is currently only on Nightly / 149, I'll try to get the change uplifted to earlier versions though.

Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.

Original Revision: https://phabricator.services.mozilla.com/D281219

Attachment #9541804 - Flags: approval-mozilla-release?

Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.

Original Revision: https://phabricator.services.mozilla.com/D281219

Attachment #9541805 - Flags: approval-mozilla-release?

firefox-release Uplift Approval Request

  • User impact if declined: Firefox hangs under certain circumstances. It doesn't happen that often, but I'm seeing about 10-20 crashes a day from crash-stats.
  • Code covered by automated testing: no
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: The changes are very small and simple. The main risk is a bad merge with the generated code. I'll make sure to manually check that.
  • String changes made/needed: No
  • Is Android affected?: no

firefox-release Uplift Approval Request

  • User impact if declined: Firefox hangs under certain circumstances. It doesn't happen that often, but I'm seeing about 10-20 crashes a day from crash-stats.
  • Code covered by automated testing: no
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: The changes are very small and simple. The main risk is a bad merge with the generated code. I'll make sure to manually check that.
  • String changes made/needed: No
  • Is Android affected?: no
Attachment #9541815 - Flags: approval-mozilla-release?

Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.

Original Revision: https://phabricator.services.mozilla.com/D281219

Attachment #9541804 - Attachment is obsolete: true
Attachment #9541804 - Flags: approval-mozilla-release?

Made RemoteSettingsService.update_config async-wrapped to avoid the
deadlock described in that bug.

Original Revision: https://phabricator.services.mozilla.com/D281219

Attachment #9541878 - Flags: approval-mozilla-beta?

firefox-beta Uplift Approval Request

  • User impact if declined: Firefox hangs under certain circumstances. It doesn't happen that often, but I'm seeing about 10-20 crashes a day from crash-stats.
  • Code covered by automated testing: no
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: The changes are very small and simple. The main risk is a bad merge with the generated code. I'll make sure to manually check that.
  • String changes made/needed: No
  • Is Android affected?: no
Crash Signature: [@ syscall | parking_lot_core::thread_parker::imp::ThreadParker::futex_wait ]
Target Milestone: 147 Branch → 149 Branch
Attachment #9541805 - Attachment is obsolete: true
Attachment #9541805 - Flags: approval-mozilla-release?
Attachment #9541878 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9541815 - Attachment is obsolete: true
Attachment #9541815 - Flags: approval-mozilla-release?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: