Open Bug 1804823 Opened 1 year ago Updated 10 months ago

10-20s High CPU at startup

Categories

(Core :: Storage: Quota Manager, defect, P3)

Firefox 107
defect

Tracking

()

People

(Reporter: mozbz, Unassigned)

References

(Depends on 1 open bug)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0

Steps to reproduce:

Fresh start of browser
Happens during normal startup as well as using Safe Mode

Actual results:

CPU usage at 100% for 1 CPU core for 10+ seconds after application is first visible
Home page is set to completely empty/blank tab
No other tabs loaded or restored from prior session

Any web page attempted to be accessed during this time is delayed until exactly the moment the CPU would otherwise go idle.

Expected results:

Browser should have loaded and initialized within a few seconds at most.

Using MOZ_PROFILER_STARTUP=1 I captured the following 3 profiles:

Normal startup
https://share.firefox.dev/3W6nnbt

Safe Mode startup
https://share.firefox.dev/3PcxGIR

Another normal startup, in this case PID 32612 is the one using CPU for 10s+
https://share.firefox.dev/3UEaXGJ

This issue does not occur on our side, at least not with the fresh profiles we use, since this issue also occurs in safe mode we can rule out the Extensions.
Setting a component for this issue, maybe one of our devs can take a look at the Performance profiles and they might indentify the cause of it. Please move this issue to a more suitable component if this is not the correct one.

Component: Untriaged → JavaScript Engine
Product: Firefox → Core

This doesn't seem to have anything to do with JS. We're looping in TaskController::GetRunnableForMTTask (waiting for something runnable?).

It looks like TaskController was added in bug 1606706, so moving to XPCOM to match. Bas, it looks like you landed the TaskController code. Do you know who should be looking at this?

Component: JavaScript Engine → XPCOM
Flags: needinfo?(bas)

This is just blocking time waiting for runnables. That's just normal idling :-). Whatever thread is doing the actual CPU usage does not appear to be captured in this profile. At this point we have no real indication as to what the issue is here. Markus, perhaps you can advise the reporter as to what the best course of action is here? Possibly it's best if we just capture all threads?

Component: XPCOM → General
Flags: needinfo?(bas) → needinfo?(mstange.moz)

(In reply to Bas Schouten (:bas.schouten) from comment #4)

Markus, perhaps you can advise the reporter as to what the best course of action is here? Possibly it's best if we just capture all threads?

DC, could you capture another profile with the "Power" preset? This will record the CPU usage of all threads. This might let us find out which thread is keeping Firefox busy.

Flags: needinfo?(mstange.moz)

There's another thing that's strange about the Windows profiles: The profile does not contain native stacks in the parent process, even though the stackwalk feature was enabled. And the other processes do contain native stacks!

So something is preventing stack unwinding in the parent process. We have code which suspends native stackwalking while a DLL is being loaded or unloaded; maybe we're in a state where something is trying to load a DLL in a loop?

Thank you very much for looking into this. Here is a capture using the Power preset:
https://share.firefox.dev/3jgpZFR

So today (a bit after the above capture was done) I updated to the recently released v108 (20221215175817 specifically) and so far it doesn't seem that this issue is still happening. Hopefully this means whatever was causing it is no longer there in this version, but it may be too soon to call it resolved just yet. Unfortunately this will hinder my ability to do any more profiling captures or other troubleshooting.

If the symptom returns I will be sure to report in.

Turns out it is still happening in v108, here is the latest capture using Power preset:
https://share.firefox.dev/3PNcMk5

The severity field is not set for this bug.
:marco, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mcastelluccio)

It's the Quota Manager again. https://share.firefox.dev/3GepkMH

Questions for Quota Manager people:

  • Why is it taking so long?
  • Why is it taking so long on every startup?
  • Why do we need to delete things before we can have a usable browser?
  • Can we put a limit on how much time we spend deleting things? If time runs out, we'll just use more disk space, which is preferable over having a frozen browser, no?
Status: UNCONFIRMED → NEW
Component: General → Storage: Quota Manager
Ever confirmed: true

Shipping Origin Private File System is currently the top priority, but it looks we are very close to enabling it by default on Nightly. We should have more time after that to fix the storage initialization by making it asynchronous, bug 1671932.

Depends on: 1671932

(In reply to Jan Varga [:janv] from comment #11)

We should have more time after that to fix the storage initialization by making it asynchronous, bug 1671932.

Part of this bug is about the QuotaManager using too much CPU, not just about initialization being slow. Making things async would help with the "Home page is set to completely empty/blank tab. No other tabs loaded or restored" part of the report, but I'm not sure it would reduce the CPU use.

(In reply to Markus Stange [:mstange] from comment #5)

(In reply to Bas Schouten (:bas.schouten) from comment #4)

Markus, perhaps you can advise the reporter as to what the best course of action is here? Possibly it's best if we just capture all threads?

DC, could you capture another profile with the "Power" preset?

There's no way to use a preset for a startup profile. However you can set specific features using an extra environment variable. For example: MOZ_PROFILER_STARTUP_FEATURES="default,markersallthreads,power" would give you something close to a profile with the power preset.

(In reply to Florian Quèze [:florian] from comment #12)

(In reply to Jan Varga [:janv] from comment #11)

We should have more time after that to fix the storage initialization by making it asynchronous, bug 1671932.

Part of this bug is about the QuotaManager using too much CPU, not just about initialization being slow. Making things async would help with the "Home page is set to completely empty/blank tab. No other tabs loaded or restored" part of the report, but I'm not sure it would reduce the CPU use.

Once the initialization is asynchronous (which is the hardest part), we can do other optimizations like doing it on a low priority thread, etc.
And there are other things on our list which should lead to better performance (for example, it seems that the quota cache is sometimes not used even if build id didn't change).

Flags: needinfo?(mcastelluccio)
Severity: -- → S3
Priority: -- → P3
Performance Impact: --- → ?
Performance Impact: ? → ---
You need to log in before you can comment on or make changes to this bug.