Closed Bug 1706594 Opened 3 years ago Closed 3 years ago

100% CPU usage on WebExtensions process

Categories

(Core :: Networking, defect, P2)

Firefox 88
defect

Tracking


RESOLVED FIXED
95 Branch
Performance Impact ?
Tracking Status
firefox-esr91 99+ fixed
firefox94 --- wontfix
firefox95 --- fixed

People

(Reporter: alfred, Assigned: rpl)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0

Steps to reproduce:

Usual browsing, with these extensions: uBlock Origin, Multi-Account Containers, LastPass.

Actual results:

At some point, the CPU usage of the WebExtensions process always reaches 100%.
It always happens, and when it does it stays that way; it is not a burst.
The process is stuck in a polling infinite loop (strace shows recvmsg returning a "resource unavailable" error).

Here is the profile: https://share.firefox.dev/3dBWRDB
__GI___poll uses 100% CPU (focus on the main process).

I don't understand why the 100% usage shows on the main process in the profile, while in top it shows on WebExtensions.

The Bugbug bot thinks this bug should belong to the 'WebExtensions::Untriaged' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Product: Firefox → WebExtensions

Hello,

I cannot reproduce the issue on the latest Nightly (90.0a1/20210422213157), Beta (89.0b3/20210422190146) and Release (88.0/20210415204500) under Windows 10 x64 and Ubuntu 16.04 LTS, using the mentioned add-ons.

To simulate normal browsing and maybe stress the browser a bit, I've opened several tabs with running videos on YouTube (2 tabs outside containers and 1 tab inside a container), several Wikipedia articles and several Reddit articles (in and out of containers) and some miscellaneous tabs (about:addons, about:processes, etc.). After 20 minutes or so, there was no significant increase in CPU usage on the Extensions process. Only one short spike to ~20%, and the rest of the time CPU usage was around 2%-5%. Please note that tests were conducted on new profiles.

Attached file fireflox_strace.txt

Here is a strace log from the WebExtensions process while it is at 100% CPU.

I can intermittently but reliably reproduce this with Firefox 88 and 87 on current Debian/unstable. I can't pin down the exact browsing pattern that kicks it off, but once it starts, the WebExtensions process is stuck in a loop at 100% CPU until I restart Firefox (closing all tabs doesn't fix it).

I recorded a 12s profiling sample with https://profiler.firefox.com/ and uploaded it at https://share.firefox.dev/3346PaR. It shows that 97% of the time is spent in nsThread::ProcessNextEvent, of that 29% in a libpthread/__recvmsg from libX11/XPending, 15% in libc/__GI___poll, 9% in a libpthread/__read from libxul/nsAppShell::EventProcessorCallback, 29% in libxul/mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal.

The Debian bug describing the same problem (https://bugs.debian.org/986027) has some additional information:

According to perf (perf record -p PID_OF_WebExtensions sleep 5; hotspot perf.data), the CPU time is spent in

nft_pipapo_avx2_scratch_index [nf_tables], which uses 50% of the cycles.

We got another report that seems to share some details with this one; we are not sure if it is related yet, but I'm linking it as a See Also in the meantime.

We will come back to this for a deeper look at the attached profile.

See Also: → 1684299

The GitHub web-ext#2219 issue is relevant, thanks for linking it! It shows the same issue (web-ext using up an entire CPU core looping in nsThread::ProcessNextEvent), and it's significant that it reproduced the same problem on Windows: it means that the problem is not related to Linux libraries such as libX11 and libpthread, and narrows down the root cause to Firefox itself.

The LastPass bug is either a manifestation of the same problem or, more likely, a red herring: I don't have and never had the LastPass extension and yet I have this problem. Considering numerous reports that disabling or even removing all extensions doesn't make this problem go away, it's safe to say the root cause is in the Firefox core code, not in any specific extension.

hi Florian,
could you help us (or point us to someone who may have some time for it) double-check whether there is any other interesting detail in the profile attached to this issue's comment 0, and in the ones in https://github.com/mozilla/web-ext/issues/2219#issuecomment-821968369 and https://github.com/mozilla/web-ext/issues/2219#issuecomment-824083397, that we may have missed and that could help point us in the right direction?

(both this issue and the one in the web-ext GitHub repo report the same kind of user-noticeable issue, the extension process being quite busy, but the profile attached in this bug was collected on Linux, whereas the ones in the web-ext GitHub issue were collected on Windows).

Thanks in advance for your help.

Flags: needinfo?(florian)

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #8)

hi Florian,
could you help us (or point us to someone who may have some time for it) double-check whether there is any other interesting detail in the profile attached to this issue's comment 0, and in the ones in https://github.com/mozilla/web-ext/issues/2219#issuecomment-821968369 and https://github.com/mozilla/web-ext/issues/2219#issuecomment-824083397, that we may have missed and that could help point us in the right direction?

(both this issue and the one in the web-ext GitHub repo report the same kind of user-noticeable issue, the extension process being quite busy, but the profile attached in this bug was collected on Linux, whereas the ones in the web-ext GitHub issue were collected on Windows).

Profiles like this would be excellent to show in the Joy of Profiling sessions that we do every Monday. Feel free to join and ask for help reading them there. There's also a Joy of Profiling channel on Matrix where you can ask for help reading profiles.

A few observations:

I hope someone who has a good understanding of the RemoteLazyInputStream code might have a guess. smaug, any idea?

Flags: needinfo?(florian) → needinfo?(bugs)

Is this a regression? If so, finding the regression range would be really useful here.

The profile in comment 4 definitely doesn't look normal. Why would a non-parent process spend so much time in XPending?

Jed, does this ring any bells?

Flags: needinfo?(bugs) → needinfo?(jld)

I think the XPending calls are just part of the event loop: every time through it, glib calls gdk, which calls Xlib/xcb to check if there are any X events (and xcb also checks for async replies, maybe). It probably isn't that expensive, but it's probably being called a lot; notice how the different branches of the call tree are scattered more or less randomly through the timeline. So I don't think I have anything to add over comment #9: there's probably something in that nsInputStreamPump branch of the profile that's causing some kind of busy-wait via the event loop, like constantly re-dispatching a runnable.
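To illustrate the kind of busy-wait being described, here is a hypothetical, minimal C++ sketch (not code from the Firefox tree; the SpinningRunnable name is made up for this example) of a runnable that keeps the main-thread event loop spinning by unconditionally re-dispatching itself:

#include "nsThreadUtils.h"  // mozilla::Runnable, NS_DispatchToMainThread

// Hypothetical illustration only: a runnable whose "work" is never finished,
// so it re-queues itself on every run. The event loop never goes idle and the
// process sits at 100% CPU even though nothing is actually blocked.
class SpinningRunnable final : public mozilla::Runnable {
 public:
  SpinningRunnable() : mozilla::Runnable("SpinningRunnable") {}

  NS_IMETHOD Run() override {
    // Immediately dispatch the same task again: this is the
    // busy-wait-via-event-loop pattern described above.
    return NS_DispatchToMainThread(this);
  }
};

// Kicking it off once is enough to pin a core:
// NS_DispatchToMainThread(MakeAndAddRef<SpinningRunnable>());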

Flags: needinfo?(jld)

Thanks, Alfred, for all the info. Are you perhaps able to bisect which of your extensions might be causing this?

In any case, we don't see any extension code in the profile, so we're probably gonna need help from folks more familiar with IPC, so moving there.

Component: Untriaged → DOM: File
Flags: needinfo?(alfred)
Product: WebExtensions → Core

I tried disabling the LastPass extension (which I first suspected), but eventually the issue still occurred; then LastPass + uBlock Origin, with the same result.
I did not try disabling the Multi-Account Containers extension because I use it all the time, but its last update was before the issue appeared.

I can record another profile with a custom set of options, if needed?

Flags: needinfo?(alfred)

Same problem on Firefox 88.0.1 Windows. The profiler data - https://share.firefox.dev/3uaePlM . The problem seems to be in this stack tree:
(root)
RtlUserThreadStart
BaseThreadInitThunk
__scrt_common_main_seh()
wmain(int, wchar_t**)
XRE_InitChildProcess
XRE_InitChildProcess(int, char**, XREChildData const*)
MessageLoop::Run()
MessageLoop::RunHandler()
XRE_RunAppShell()
nsAppShell::Run()
nsBaseAppShell::Run()
MessageLoop::Run()
MessageLoop::RunHandler()
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
nsThread::ProcessNextEvent(bool, bool*)
XPCJSContext::AfterProcessTask(unsigned int)
mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint(bool)

We are getting a similar issue with unpacked extensions for ABP; see details in https://gitlab.com/eyeo/adblockplus/abpui/adblockplusui/-/issues/970

There might be some ping-pong going on on the same thread when we are in a closed state? Should we really still execute a callback if we are in a closed state?

Flags: needinfo?(amarchesini)
See Also: → 1637742
Severity: -- → S3

For everyone who has this bug:
Stay on Firefox 87. It doesn't have this bug.

This is a very annoying bug, and in my opinion security-relevant, as it breaks security-related extensions such as NoScript and ad blockers.

It still exists in Firefox 89.0.1.

Because this has not been fixed for over two months now, I am considering switching my default browser back to Chromium.

I just found a workaround for those of you affected during development with web-ext.
You can simply kill the specific process (the one with high CPU, for example using Task Manager on Windows), which will kill the WebExtensions process. Then reloading the extension will restart it and everything is fine again :)

(In reply to juraj.masiar from comment #19)
Thanks, it works. But there is a more radical way: stay on FF86. FF87 already has this bug.

I think I have the same issue in Firefox Developer Edition 91.0b9 (64-bit)

Here is a portion of troubleshooting information, in case it helps:

Application Basics

Name: Firefox
Version: 91.0b9
Build ID: 20210729185755
Distribution ID:
Update Channel: aurora
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
OS: Linux 5.13.7-arch1-1 #1 SMP PREEMPT Sat, 31 Jul 2021 13:18:52 +0000
Multiprocess Windows: 2/2
Fission Windows: 0/2 Disabled by default
Remote Processes: 5
Enterprise Policies: Inactive
Google Location Service Key: Found
Google Safebrowsing Key: Found
Mozilla Location Service Key: Found
Safe Mode: false

Graphics

Features
Compositing: WebRender
Window Protocol: wayland
Desktop Environment: gnome
Target Frame Rate: 60
GPU #1
Active: Yes
Description: Mesa DRI Intel(R) HD Graphics 3000 (SNB GT2)
Vendor ID: 0x8086
Device ID: 0x0126
Driver Vendor: mesa/i965
Driver Version: 21.1.6.0
RAM: 0

Diagnostics
AzureCanvasBackend: skia
AzureContentBackend: skia
AzureFallbackCanvasBackend: skia
CairoUseXRender: 0
CMSOutputProfile: Empty profile data
Display0: 1920x1200
DisplayCount: 1
Device Reset: Trigger Device Reset
Decision Log
HW_COMPOSITING:
available by default
OPENGL_COMPOSITING:
available by default
WEBRENDER:
available by default
WEBRENDER_QUALIFIED:
available by default
WEBRENDER_COMPOSITOR:
disabled by default: Disabled by default
blocklisted by env: Blocklisted by gfxInfo
WEBRENDER_PARTIAL:
available by default
WEBRENDER_SHADER_CACHE:
disabled by default: Disabled by default
WEBRENDER_OPTIMIZED_SHADERS:
available by default
WEBRENDER_ANGLE:
available by default
unavailable by env: OS not supported
WEBRENDER_DCOMP_PRESENT:
available by default
disabled by user: User disabled via pref
unavailable by env: Requires Windows 10 or later
unavailable by runtime: Requires ANGLE
WEBRENDER_SOFTWARE:
available by default
OMTP:
disabled by default: Disabled by default
WEBGPU:
disabled by default: Disabled by default
blocked by runtime: WebGPU can only be enabled in nightly
X11_EGL:
available by default
blocklisted by env: Blocklisted by gfxInfo
DMABUF:
available by default

Failure Log
(#0) Error: Unable to load glyph: -1

Still in 91.0.1, btw.

Still in 92.0.

Since in every release notes Mozilla claims how much faster Firefox now runs... and since all I see is 100% CPU usage... is there anything one can do to help debug this?
If so, please tell me what you'd like to have... the problem happens so often here that I shouldn't need to wait too long for it.

Also, I guess this should be handled as a security issue.

I mention this because I've seen :florian's setting of Blocks: power-usage; this is not just a problem of 100% CPU usage.

Every time this happens, all add-ons stop working (including security-relevant ones like NoScript). The only way to recover is to manually disable all of them and re-enable them afterwards (after which most loaded websites are broken and need to be reloaded).

Affects at least NoScript, uBlock Origin and FoxyProxy.

(In reply to Christoph Anton Mitterer from comment #23)

since all I see is 100% CPU usage... is there anything one can do to help debug this?
If so, please tell me what you'd like to have... the problem happens so often here that I shouldn't need to wait too long for it.

Finding reliable steps to reproduce would be excellent.

I think it would be very helpful if someone managed to capture a profile of when the problem begins. Capturing a profile of when the problem ends (you said in comment 24 you can make it end by disabling all the add-ons one by one) might also be interesting.

You can also look in about:processes (if you are not on Nightly, go to about:config and enable toolkit.aboutProcesses.showThreads first) and check if there are other threads that are active in addition to the main thread of the WebExtensions process.

As I noted in comment 9, profiling with the IPC feature of the profiler enabled might be interesting, and profiling on a Nightly build would give us the name of the runnables that keep going through the event loop.

Here is a profile captured using the STR from comment 15 (loading Adblock Plus unpacked and reloading its options page after loading it once in a Firefox tab):

I reloaded the options page tab around "28s - 30s" in the timeline (I opened a new tab before reloading the options page to make it more clearly visible, based on the screenshots collected along with the rest of the profile data, when the problem is expected to start).

Around the time when the CPU usage spikes, in the "Marker Chart" tab I see a pretty long (and "pretty crowded") sequence of Runnable markers; in particular, the markers for the two runnables nsPipeInputStream::AsyncWait and dom::InputStreamCallbackRunnable seem to keep being scheduled a lot, which suggests that something like what Jens pointed out in comment 16 may be happening and may be what is keeping the CPU usage of the WebExtensions process so high.

When the high CPU usage in the WebExtensions child process is triggered, it looks like
nsPipeInputStream::AsyncWait and dom::InputStreamCallbackRunnable keep being scheduled
in an infinite loop, which keeps the main thread of the WebExtensions child process busy forever.

The issue seems to be more easily triggered when the extension has been loaded temporarily from files
instead of being loaded from an xpi file.

Nevertheless, the issue also seems to be quite intermittent (likely due to some underlying race that
triggers it), so I can't exclude that there are other ways to trigger it (and the bug report also makes
me assume there are ways to trigger a similar issue with extensions that are not installed temporarily;
we may just be unaware of STR that trigger them consistently, as the STR provided by AdBlockPlus does).

The infinite loop that I've been able to dig into (using both rr and MOZ_LOG="URILoader:5,nsStringPump:5,sync")
seems to be triggered when the underlying component that triggered the request is gone by the time we are
ready to read the request content (e.g. because the window has been reloaded and its DOM elements and
docShell are gone or dying):

  • nsInputStreamPump::OnStateTransfer keeps calling nsDocumentOpenInfo::OnDataAvailable,
    but the nsDocumentOpenInfo instance does not have an m_targetStreamListener set, so
    it ignores the call and returns NS_OK unconditionally (which makes nsInputStreamPump
    schedule a new runnable on the main thread to try again, because the stream is
    technically still running and ready)

  • looking at the request status when the infinite loop is triggered, the request is
    actually explicitly in a failure status

  • with the STR that I'm currently aware of, the requests triggering the infinite loop
    have status NS_BINDING_ABORTED or NS_ERROR_DOCSHELL_DYING

This patch currently breaks the infinite loop by applying the following changes to
nsDocumentOpenInfo::OnDataAvailable (a sketch of the first change follows below):

  • if there is no retarget listener, check the request status and return a failure if
    the request has already failed (instead of just ignoring it, as currently happens)

  • log some information about the outcome of OnDataAvailable (because it may be useful
    in the future to diagnose issues that involve the result returned by this method)
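As a rough illustration of the first change, here is a hypothetical C++ sketch of what the check could look like. This is not the exact attached patch, and not the approach that eventually landed (which instead adds an nsICancelable out param to nsBaseChannel::BeginAsyncRead); the signature and member names are assumed from the description above.

// Hypothetical sketch only: when no target stream listener is set, propagate
// an already-failed request status instead of returning NS_OK, so that
// nsInputStreamPump cancels the request instead of rescheduling forever.
NS_IMETHODIMP nsDocumentOpenInfo::OnDataAvailable(nsIRequest* request,
                                                  nsIInputStream* aInStream,
                                                  uint64_t aOffset,
                                                  uint32_t aCount) {
  if (!m_targetStreamListener) {
    nsresult status = NS_OK;
    if (NS_SUCCEEDED(request->GetStatus(&status)) && NS_FAILED(status)) {
      // e.g. NS_BINDING_ABORTED or NS_ERROR_DOCSHELL_DYING: returning the
      // failure breaks the OnStateTransfer -> OnDataAvailable loop.
      return status;
    }
    return NS_OK;
  }
  return m_targetStreamListener->OnDataAvailable(request, aInStream, aOffset,
                                                 aCount);
}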

Based on a bisect run using mozregression and the AdBlockPlus-based STR, the issue started
with Bug 1681529, but it may be a side effect of those changes rather than an issue
specifically introduced by the changes landed as part of that bug.

This patch contains a reduced test case based on the STR from comment 15, in the form of a small xpcshell test.

This xpcshell test reproduces the issue on both my Linux machines: the test gets stuck while trying to exit at the end and the WebExtensions child process CPU usage spikes to 100%. However, it seems to be intermittently able to complete successfully (it triggers the issue consistently on my laptop, but from time to time it was passing just fine on my desktop machine and while running under rr).

Assignee: nobody → lgreco

I'm not 100% sure that the patch attached to comment 27 is the best way to fix the issue, but:

  • it does consistently break the infinite loop (and the resulting high CPU usage spike), both in the xpcshell test attached to comment 28 and in a full local Nightly build when I try to trigger the issue using the STR from comment 15

  • it provides some more insight into what is going on in the WebExtensions child process when the high CPU usage is triggered

and so I decided to attach it and use it to discuss it and the underlying issue with Nika.

Hi Nika,
I'd like to hear your perspective about this issue.

As also described in comment 27, bisecting the regressing change using mozregression and the STR from comment 15 pointed to Bug 1681529, so it looks like the underlying issue became triggerable (or just more likely) after Bug 1681529 landed.

  • In comment 27 there is a more detailed description of what I noticed about the underlying issue while digging into it with rr
  • In comment 28 I have attached an xpcshell test that I used locally to trigger the issue (and also to record it and dig into it with rr), in case it helps you get a better picture.
  • The patch attached to comment 27 contains a small change to nsDocumentOpenInfo::OnDataAvailable that technically breaks the infinite loop, but this was my first time digging into many of the underlying components involved, so I have a number of doubts (e.g. whether it is a reasonable way to fix the issue, whether it would just hide the actual issue by fixing the symptoms, and whether there are other, more reasonable ways to fix it).

Thanks in advance for your help!

Flags: needinfo?(nika)
Attachment #9241129 - Attachment description: WIP: Bug 1706594 - Return failure from nsDocumentOpenInfo::OnDataAvailable on failed requests when there is no retarget listener. r?nika! → Bug 1706594 - Add nsInputStreamPump created by ExtensionStreamGetter::OnStream to the mChannel loadGroup . r?nika!
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #9241129 - Attachment description: Bug 1706594 - Add nsInputStreamPump created by ExtensionStreamGetter::OnStream to the mChannel loadGroup . r?nika! → Bug 1706594 - Add nsICancelable out param to nsBaseChannel::BeginAsyncRead virtual method. r?nika!

I just looked at the Debian bug report linked from comment 5 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=986027) and then at the content of the Debian packages for one of the extensions mentioned in that bug, which made me realize that the extensions installed from these Debian packages are unpacked, and so they should trigger the exact same issue that I was able to trigger consistently using the STR from comment 15 (where ABP also has to be loaded as unpacked to trigger the issue consistently).

e.g. the content of the webext-private-badger deb package from packages.ubuntu.com (packages.debian.org doesn't seem to be able to show the list of files, but they should be packaged in exactly the same way from this perspective):

This confirms that the attached fix should also cover the issue experienced by Debian/Ubuntu users who install extensions from deb packages instead of installing them from addons.mozilla.org (or by installing the signed xpi non-temporarily).

(I'm also clearing the previously assigned needinfos; I discussed the proposed approach used in the current version of the attached patch with Nika over Matrix, and we will continue on Phabricator).

Flags: needinfo?(nika)
Flags: needinfo?(amarchesini)

Moving to "Core :: Networking", where this issue actually belongs.

Component: DOM: File → Networking
Priority: -- → P2
Whiteboard: [necko-triaged]
Pushed by luca.greco@alcacoop.it:
https://hg.mozilla.org/integration/autoland/rev/2c21101c4b3b
Add nsICancelable out param to nsBaseChannel::BeginAsyncRead virtual method. r=nika,necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch

Confirmed fixed, thanks so much Luca, Nika, et al.

As a side note, I noticed that CPU usage still increased more than expected while watching YouTube streams, but it didn't pin to 100% after navigating away.
I ran the profiler, which doesn't cause Firefox to crash anymore, and saw that the Video DownloadHelper extension was using up the CPU, and after disabling that, my CPU usage is what I expect it to be. I don't really need that extension, so I disabled it and didn't look any further.

(In reply to david from comment #36)

Confirmed fixed, thanks so much Luca, Nika, et al.

Thanks for confirming!

As a side note, I noticed that CPU usage still increased more than expected while watching YouTube streams, but it didn't pin to 100% after navigating away.
I ran the profiler, which doesn't cause Firefox to crash anymore, and saw that the Video DownloadHelper extension was using up the CPU, and after disabling that, my CPU usage is what I expect it to be. I don't really need that extension, so I disabled it and didn't look any further.

Very nice that you could use the profiler to identify which extension was using your CPU! Would you consider reporting the issue and sharing the profile with the add-on author?

Whiteboard: [necko-triaged] → [necko-triaged][qf]
Regressions: 1735899

For those of us waiting for this to be fixed in firefox-esr91, I found a faster workaround than clicking enable/disable on all addons:

First, in about:config, set devtools.chrome.enabled to true.

Then, when you want to apply the workaround, press Ctrl+Shift+J and run this:

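// Disable and immediately re-enable every installed add-on: this restarts the
// stuck WebExtensions process without having to restart Firefox.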
Components.utils.import("resource://gre/modules/AddonManager.jsm");
AddonManager.getAllAddons().then(addons => addons.forEach(addon => {addon.disable() ; addon.enable()}))

I noticed that this patch is missing from the latest Firefox ESR release; would it be possible for it to be added to the Firefox ESR 91 branch too?

Restarting Firefox ESR frequently to work around this issue is starting to get annoying; would it be possible for it to be added to the Firefox ESR 91 branch too?

Can this bug please be fixed in ESR?

[Tracking Requested - why for this release]: Poor user experience with popular extensions.

Regressed by: 1681529
Has Regression Range: --- → yes

This looks like a pretty big patch with another known regression bug fix we'd also have to take. It also hasn't had a ton of bake time in the wild yet since the fix only shipped in Fx95. Normally this wouldn't be a great candidate for ESR backport given those factors. Do we have any sense of how commonly this issue is encountered in the wild and are there any other workarounds we could land to avoid taking the bigger change?

Flags: needinfo?(lgreco)

The main people who are affected by the issue are folks running distro builds (like Debian) along with distro packaged extensions (like Debian webext-* packages). These people (Debian package users at least) are likely to have telemetry disabled by the distro (Debian does that by default).

It does sound like Windows users are also affected, but only when they use unpacked extensions, which is likely limited to extension developers and power users?

PS: I'm in the first group (running Debian firefox-esr with webext-* packages) and this is affecting me pretty much daily.

Perhaps Debian would be a good place to bake the patch? Have glandium add the backported patch to the Debian firefox-esr, closing the Debian bug reports about this, and then wait to see if Debian users reopen the bugs. Once the fix is confirmed working, it could be added to the Mozilla Firefox ESR releases.

I'm good with that if glandium is. I did a cursory check to confirm that the patch grafts cleanly (it does), but I didn't verify that it actually builds and works as expected. As noted previously, you'd also want to take the fix for bug 1735899 with it to avoid introducing that regression.

Flags: needinfo?(mh+mozilla)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #43)

This looks like a pretty big patch with another known regression bug fix we'd also have to take. It also hasn't had a ton of bake time in the wild yet since the fix only shipped in Fx95. Normally this wouldn't be a great candidate for ESR backport given those factors. Do we have any sense of how commonly this issue is encountered in the wild and are there any other workarounds we could land to avoid taking the bigger change?

I confirm that, as Paul mentioned in comment 44, the users most likely to trigger this issue are:

  • extension developers (on any platform)
  • Linux users, but only if they also use distro-packaged extensions (which are then installed in Firefox as unpacked), as Debian does (and likely other distros too, e.g. Fedora seems to do the same)
  • power users on any platform who decide to install extensions as unpacked (which should not be that common)

As Ryan already mentioned, the Bug 1735899 patch would also be needed; it isn't technically a regression introduced by this patch, but fixing this bug revealed the other one by making it possible to trigger, so if we decide to backport this one we definitely want both.

Given that the biggest class of users affected by this is definitely "Linux users installing distro-packaged extensions", backporting these patches through the set of patches that the Linux distributions apply to their builds seems a reasonable strategy to me too.

And so I'm also good with that strategy if glandium is.
(and thanks, Ryan, for checking upfront that the patches graft cleanly).

Flags: needinfo?(lgreco)

The patches are now applied to the package in Debian unstable. I'll let it bake there for a little while, and will update bullseye after that.

Flags: needinfo?(mh+mozilla)

I've been running the patched package from Debian unstable. I don't see the 100% CPU issue any more. I haven't noticed any crashes or other issues either. So I think this is probably good to go for Debian bullseye and also upstream Firefox ESR. Maybe wait a bit more, perhaps until it reaches Debian bookworm, before doing that though.

I still have this issue running Firefox 96.0 in Fedora 35 (firefox-96.0-1.fc35.x86_64 rpm from official Fedora repository).
I have quite a number of the following extensions installed:
Bookmark Highlighter
Bookmarks Organizer
Consent-O-Matic
Debian Wiki Search
floccus bookmark sync
KeePassXC-Browser
NoScript
Privacy Badger
Send to Kodi
Session Sync
Tab Stash
uBlock Origin

If I disable extensions, then at some point it stops happening, but it doesn't seem related to a particular extension.

If the issue is similar to the one fixed by this bug, then it is triggered only by add-ons installed as unpacked (e.g. installed from Linux distribution packages instead of from addons.mozilla.org, or from other installation flows where they are packed as an xpi).

If some of the extensions listed in comment 51 are installed from Linux distribution packages, those are likely the ones whose disabling prevents the issue from being triggered.
This issue, once triggered, did not stop until the entire extension process was killed; if that is not the behavior of the bug you are experiencing, then it is more likely a different kind of bug (which may still be perceived the same way, e.g. CPU spinning in the extension process, even if it is not the same bug).

If you are able to collect a profile using the Firefox Profiler (https://profiler.firefox.com/) while the bug is being triggered, that may help us confirm whether the bug looks similar to this one (but perhaps triggered by a similar issue in internals that were not changed by this bug; I can't 100% exclude that) or is a totally different one.

Thank you very much for your detailed comment. In my case I have installed all extensions manually, except uBlock Origin, which comes from the system rpm. When the issue is triggered, killing the extension process does solve it, but alternatively disabling individual extensions also seems to work, so indeed it might be a similar symptom with a different cause.
I will try to create the profile data and will post it here.
Thank you very much for helping!

Hi,
I have uploaded the profile: https://share.firefox.dev/3qZCnvf for the problem I commented in https://bugzilla.mozilla.org/show_bug.cgi?id=1706594#c51

Back to the original bug, I haven't seen any noticeable regressions from this patch when using the ESR packages from Debian sid/bookworm. I think it is time to get this patch into Debian bullseye at minimum, and probably also into the next upstream Mozilla Firefox ESR release.

Performance Impact: --- → ?
Whiteboard: [necko-triaged][qf] → [necko-triaged]

This has probably baked long enough that we can go ahead and take it in mainline now. Go ahead and nominate for approval if you agree.

Flags: needinfo?(lgreco)

Have been running Debian Bullseye Firefox ESR for over a month now with no sign of the 100% CPU issue.

Comment on attachment 9241129 [details]
Bug 1706594 - Add nsICancelable out param to nsBaseChannel::BeginAsyncRead virtual method. r?nika!

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: For the users affected by this bug, the extension child process may enter a deadlock and keep the CPU 100% busy, preventing any extension from running its code in the extension child process anymore and potentially also introducing lag in the browser UI and/or webpages no longer loading (e.g. because the installed extensions may have suspended some webpage requests and can then no longer handle webRequest messages and let those requests resume).

The user may need to manually kill the extension process to make sure the child process doesn't keep running (closing the browser may not be able to do that, because it is unable to communicate with the extension child process).

  • User impact if declined: Extensions may suddenly stop working, the browser may lag or stop loading webpages, and on lower-end machines even the rest of the system may feel less responsive due to the deadlocked child process keeping the CPU in use.
  • Fix Landed on Version: 95
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The fix wasn't trivial, but it has baked in release for quite some time now; it has also been applied to the Firefox ESR build that Debian packages, and no issues seem to have been reported by the Debian users running it so far.

NOTE: If accepted, this fix should be uplifted along with the fix for an issue that was uncovered after we fixed this bug in Firefox 95: Bug 1735899 (which was also applied to Debian's Firefox ESR package for the same reasons).

Flags: needinfo?(lgreco)
Attachment #9241129 - Flags: approval-mozilla-esr91?

Comment on attachment 9241129 [details]
Bug 1706594 - Add nsICancelable out param to nsBaseChannel::BeginAsyncRead virtual method. r?nika!

Thanks for all the testing and verification that went into this. Approved for 91.8esr.

Attachment #9241129 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+