Closed Bug 1567863 Opened 5 years ago Closed 2 years ago

Firefox startup slow - window loads quickly, but first network call delayed, works normal after that

Categories

(Core :: Networking, defect, P2)

68 Branch
x86_64
All
defect

Tracking

()

RESOLVED FIXED
Performance Impact high

People

(Reporter: asdofindia, Assigned: acreskey)

References

(Blocks 1 open bug)

Details

(Keywords: perf:pageload, Whiteboard: [necko-triaged])

Steps to reproduce

  1. Keep using firefox for months on ArchLinux. Current version 68.
    2. Get datareporting/archived folder to be very large (please see comment 3)
  2. Start Firefox (firefox window appears immediately)
  3. Try loading mozilla.org or wait for previous session's tab to open
  4. Keep waiting for it to load any page for minutes
  5. After a few minutes, pages start loading
  6. Then, you can browse normally.

What should have been
Steps 5 & 6 should not have been.

How do I know it is the datareporting/archived folder that is the culprit?

Well, I saw it mentioned here: https://support.mozilla.org/en-US/questions/1229621#answer-1159980

And then, I tried deleting it. And suddenly the slow startup is gone!

How does my datareporting/archived folder look like:

$tree datareporting/archived
datareporting/archived
├── 2019-06
│   ├── 1559355385659.8af44b18-326e-43fc-8aeb-08f14ace9e2d.main.jsonlz4
│   ├── 1559477593911.d832fa2a-7df6-4412-8a28-bff0ddcc1c2a.health.jsonlz4
| ...........
└── 2019-07
├── 1561919400021.539cd2ff-d45a-4944-852a-70e6adfdf1aa.main.jsonlz4
├── 1561950829885.203b2732-2b87-4dd3-8dbd-ed56cf81e864.main.jsonlz4
..................

2 directories, 396 files

Can I provide my firefox profile folder that reproduces this bug? I can. But are there some files I should hide? Could someone help me?

Hey,

thanks for the report. Weird to see that this is causing problems.
Can you be so kind to give me some more info:

  • How many files are in datareporting/archived?
  • What's the total size of the datareporting/archived directory?

One thing that would help tremendously identifying the problem here would be a profile of the startup.
This can be achieved using the Firefox Profiler, instructions on how to profile the startup can be found here: https://profiler.firefox.com/docs/#/./guide-startup-shutdown

Thanks!

Flags: needinfo?(asdofindia)

Thanks for that quick response.

Number of files I've already reported - 2 directories, 396 files

$ du -h datareporting/archived
3.0M datareporting/archived/2019-06
2.2M datareporting/archived/2019-07
5.2M datareporting/archived

Startup profile: https://perfht.ml/2XRTbY2 (Ah, thanks for the link on how to do this.)

Flags: needinfo?(asdofindia) → needinfo?(jrediger)

Turns out I was dead wrong.

After deleting the archived folder, and now restarting the computer, the startup bug is still there. That obviously disconnects this from the datareporting service. I'm sorry.

But could you gather any insight from the profile? (At least for the right triage category)

I've uploaded a fresh startup profile (with deleted datareporting/archived folder) at https://perfht.ml/2XW8vTN

Component: Telemetry → Untriaged
Flags: needinfo?(jrediger)
Product: Toolkit → Firefox
Summary: When archived folder of datareporting becomes large, firefox page load on startup becomes slow → Firefox startup slow - window loads quickly, but first network call delayed, works normal after that

Hi,

I've set a component for this bug. If you consider that there's another component that's more proper for this case you may change it.

Best regards, Flor.

Component: Untriaged → Networking
Product: Firefox → Core
Component: Networking → Performance

Hi Akshay -- sorry for the long delay. Is this still happening? The profile you link in https://bugzilla.mozilla.org/show_bug.cgi?id=1567863#c3 shows a large amount of time in both the HTTPS Everywhere and the Ghostery Web Extension. If you still witness this issue, can you try running Firefox in safe-mode and see if that fixes the problem? If that does fix it, trying to figure out which Web Extension causes the problem would help.

Flags: needinfo?(asdofindia)

Disabled all add-ons and profiled. Problem persists.

Profiler: https://perfht.ml/2oH50jh

Flags: needinfo?(asdofindia)

Hi Akshay,

Thanks so much - we're able to see that, in your first profile, Firefox is blocked waiting for a socket thread to become available. The second profile shows us waiting a long time for an "unknown reason". Perhaps this can be further illuminated by profiling the Socket Thread as well.

Doing this involves setting an environment variable - MOZ_PROFILER_STARTUP_FILTERS. Can you please set that environment variable to:

MOZ_PROFILER_STARTUP_FILTERS=GeckoMain,Socket,StreamTrans

and try gathering another startup profile - this time, loading https://bugzilla.mozilla.org/show_bug.cgi?id=1567863 as the first URL, rather than twitter.com?

/cc'ing valentin because this is starting to look like Necko.

ni'ing reporter for comment 7.

Flags: needinfo?(asdofindia)

Thanks a lot. I appreciate you going through that profile (it looks really complicated to me :D). I did with the filters set:

Profile after hot restarting firefox https://perfht.ml/2Ol7ONB (where there is no appreciable delay)

Profile after rebooting archlinux and starting firefox cold: https://perfht.ml/31UndZr (I am not really sure if this one had delay. The bugzilla page loaded immediately. But the profiler.firefox.com page didn't load at first attempt.)

I hope these give a clue although I couldn't really reproduce the bug when I was capturing these profile. But if they aren't enough, please do needinfo me again and I'll be happy to try a few more times.

Flags: needinfo?(asdofindia)

Yeah, these profiles do not show any delay. We need a profile that does.

Flags: needinfo?(asdofindia)

I'm fortunately (or not) able to reproduce the issue. For context I'll describe exactly what I did.

The profiles in my last comment were with all add-ons except profiler disabled.

I tried a few more times with the same (restarting computer multiple times) and could not reproduce the bug.

Then I enabled three extensions - ghostery, vimium, and HTTPS everywhere and tried again.

But I couldn't reproduce the bug this time either.

Then I enabled one more extension - "Firefox Multi-Account Containers" (by Mozilla)

This time I could reproduce the issue and here is the profile: https://perfht.ml/2Io9hip

(I can do a profiler with only Firefox Multi-Account Containers enabled if you needinfo me)

Thanks again.

Flags: needinfo?(asdofindia)

Rather than a profile, can I ask you for logs? Best setup as: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Start_logging_using_command_line_arguments

Please note URLs that were failing/delay to load along with sending the logs.

Thanks.

Flags: needinfo?(asdofindia)

Thanks :mayhemer

Did that.

Attaching the profile while I did that: https://perfht.ml/2LKKkQ9

The URL I loaded first is the url of this bug itself: https://bugzilla.mozilla.org/show_bug.cgi?id=1567863

The logs (will expire after 20 downloads and 7 days): https://send.firefox.com/download/3c798533a78eb92f/#aN9Gh4tE_QqijYjipKheGw

(Please let me know if there are any passwords or tokens in that so I can delete that)

Flags: needinfo?(asdofindia)

Downloaded. There can be cookies visible, yes.

OK, thank you for the logs, but the information is missing because of the size limit (the rotate:200 bit). Can you please rerun with either that removed from the MOZ_LOG list or with changing to say rotate:600? The number is a limit of MBs per process. You can also try to minimize any other interaction with the browser during the session.

Thank you.

Flags: needinfo?(asdofindia)

In the two latest profiles, the response comes from the cache. So maybe it's the cache startup that's taking too long here?

Okay. I am now consistently able to reproduce the issue with profiler add-on and multi-account containers being the only add-ons enabled.

Deleted the previous logs uploaded.

Removed the rotate bit and did absolutely no interaction with the browser till this url loaded: https://bugzilla.mozilla.org/show_bug.cgi?id=1567863 (url of this bug)

Logs Link: https://send.firefox.com/download/615fd274804982da/#f3IUg1FmUcpjEkNdqojT7g (expires in 7 days/20 downloads.)

Do feel free to ask me for a screengrab/more logs/whatever.

Flags: needinfo?(asdofindia)

In the startup hang phase, even if I open a new tab and enter a URL I've never visited, nothing happens till after a minute something magically clicks and everything (old from cache and new websites) suddenly starts opening.

Also, this happens only when starting Firefox first time after boot. Not when restarting firefox.

I can see the delay now, the channel is suspended twice and resumed (the second time) after a bit more than two minutes. Who calls the resume is not clear, it could be classification but I'm not entirely sure.

I'll return to this bug tomorrow. Akshay, are you willing to run an experimental build with some enhanced tracing features that may reveal what's going on here easily? I need to produce the build first, based on Firefox Nightly.

By the way, it would be good to make a copy of that profile - just in case it stops reproducing because we try it with a different build.

I've made a copy of the profile. I tried reproducing the issue in a fresh profile after installing multi-account container, but couldn't.

I am willing to run the experimental build.

Akshay, here target.tar.bz2 is the Firefox Nightly based experimental build that will produce special tracing logs.

Make sure you are running this on the copied profile, because then you can't use it with older Firefox versions. To run it:

  • unpack the bzip whenever it suits
  • run as: $ MOZ_BACKTRACK=3 ./firefox -profile /path/to/copied-profile -no-remote
  • reproduce the problem the same way as you did before to produce the logs
  • exit Firefox
  • in /tmp you will find backtrack-logs-NNN sub-dir with number of files in it, one directory is one browser session

Pack the whole /temp/backtrack-logs-NNN dir (xzip will do best) and send the same way as you did with the logs before.

Hopefully this will reveal the cause.

Thanks!

Flags: needinfo?(asdofindia)

Unfortunately (or fortunately), I'm unable to reproduce the bug with the build you provided.

Thereafter I tried reproducing it in the latest official nightly and there too it seems to be not reproducible.

Yet, I'm attaching the backtrack log here: https://send.firefox.com/download/a82db5c77cf2b377/#RjUcgQVmJ1VBsQJzBcnqxA

Flags: needinfo?(asdofindia)

Thanks a lot for trying! And testing with Nightly.

I don't see any delay, yes... The response is suspended because the multi-account-containers extension processes the hook of opening the channel asynchronously; this automatically suspends the channel. We resume on resolution of [1] (and then immediately [2]). There are other DOM promises (their identification is a TODO in Backtrack) in between as well, hard to say which one could cause the 2 minutes delay, but I'm pretty sure that delay happens on this path. In the log the opening phase of the channel is relatively fast, then the network is apparently suffering a large RTT (~300ms).

A side note: I can see we are triggering (from the parent process) the listener event chain via an idle dispatch. Here it is delayed only by 13 ms, but I've seen logs that took way longer. These notifications are critical to unblock network loads, so I wonder why we use idle dispatch when the next hop is an async IPC message to the web extension process. This is caused by the generic code in frameloader.

Sub log for examination: https://mayhemer.github.io/backtrack-analyzer/?../data/1567863-c23.zip

[1] https://github.com/mozilla/multi-account-containers/blob/d101f16ceee0139d0042facf80f6d19a04a62cc7/webextension/js/background/assignManager.js#L225
[2] https://hg.mozilla.org/mozilla-central/file/6b93a83735ed3ab3b57b46c1b768814b1a1af5d6/toolkit/components/extensions/ExtensionParent.jsm#l1217

Whiteboard: [qf:p1:pageload]
Priority: -- → P1

Tentatively moving to Networking. Honza, feel free to remove qf, if you think there isn't anything actionable here, but the bug does look rather bad when it happens.

Component: Performance → Networking

For reference, there are a lot of similar reports on reddit.

I may have good news. Thanks :evilpie for making me do it one more time (after trying at least 20 times in the last many days). I've been able to reproduce this bug by running the special build that :mayhemer provided in one of the many copies of the profile I made and have been running in the last couple of days.

Here is the link: https://send.firefox.com/download/714b0b308e06db21/#Hqu0_CU_JaCnpoQ1z_KGKA (expires after 20 downloads/7 days)

I really hope this helps you amazing devs pin point the issue. If not, please feel free to tell me what else I need to give.

Thanks, Akshay. I've downloaded a copy of the logs and have them, in case the Send link expires before mayhemer can get a chance to see them.

Flags: needinfo?(honzab.moz)

(In reply to Mike Conley (:mconley) (:⚙️) (Wayyyy behind on needinfos) from comment #28)

Thanks, Akshay. I've downloaded a copy of the logs and have them, in case the Send link expires before mayhemer can get a chance to see them.

Thanks a lot Mike, that was thoughtful! I was away last week, now I'm ready to look at it. Please feel free to distribute my way. Thanks!

Flags: needinfo?(honzab.moz) → needinfo?(mconley)
Flags: needinfo?(mconley) → needinfo?(honzab.moz)

Thanks so much!

Downloaded and I'm looking at the log. It's quite interesting.

I'll try to summary the main findings:

  • I found the load of https://bugzilla.mozilla.org/show_bug.cgi?id=1567863 that took 110 SECONDS, the path back leads to startup
  • something (not clear, because the code is too generic and we are missing more labeling there) calls CacheFileInputStream::NotifyListener on main process/main thread that leads to triggering of nsIStreamListener notifications on the channel; apparently it is a line of JS promise resolutions (micro tasks executions) and one of them is definitely an extension message response handled here
  • I found that few events (executed on the WebExtension process's main thread) on the path between the startup and finishing load of the URL are blocked (dispatched after) events processing nsHtml5ExecutorReflusher taking an extremely long time

the blockers (not events on the path itself!) are:

EXECUTE_BEGIN "nsHtml5ExecutorReflusher" @104291.03ms
  WebExtensions(1356)/Main Thread(1356)  
self-time: 15478.89ms
source events: 
• LABEL_BEGIN "nsBaseChannel::AsyncOpen|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/lib/utils.js]" @52675.43ms
• LABEL_BEGIN "nsBaseChannel::AsyncOpen|[data:text/html,%3C!DOCTYPE%20html%3E%0A%3Chtml%3E%0A%20%20%3Chead%3E%3Cmeta%20charset=%22utf-8%22%3E%3C/head%3E%0A%20%20%3Cbody%3E%0A%20%20%20%20%3Cscript%20type=%22text/javascript%22%20src=%22moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/lib/util" @52440.72ms
• LABEL_BEGIN "nsDocShell::InternalLoad|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/_generated_background_page.html]" @52438.73ms
• LABEL_BEGIN "nsDocShell::InternalLoad|[chrome://extensions/content/dummy.xul]" @48870.04ms
• LABEL_BEGIN "nsBaseChannel::AsyncOpen|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/content_scripts/vimium_frontend.js]" @47451.24ms
• STARTUP "" @17153.83ms
EXECUTE_BEGIN "nsHtml5ExecutorReflusher" @53896.75ms
  WebExtensions(1356)/Main Thread(1356)  
self-time: 50119.71ms
source events: 
• LABEL_BEGIN "nsBaseChannel::AsyncOpen|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/lib/utils.js]" @52675.43ms
• LABEL_BEGIN "nsBaseChannel::AsyncOpen|[data:text/html,%3C!DOCTYPE%20html%3E%0A%3Chtml%3E%0A%20%20%3Chead%3E%3Cmeta%20charset=%22utf-8%22%3E%3C/head%3E%0A%20%20%3Cbody%3E%0A%20%20%20%20%3Cscript%20type=%22text/javascript%22%20src=%22moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/lib/util" @52440.72ms
• LABEL_BEGIN "nsDocShell::InternalLoad|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/_generated_background_page.html]" @52438.73ms
• LABEL_BEGIN "nsDocShell::InternalLoad|[chrome://extensions/content/dummy.xul]" @48870.04ms
• LABEL_BEGIN "nsBaseChannel::AsyncOpen|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/content_scripts/vimium_frontend.js]" @47451.24ms
• STARTUP "" @17153.83ms
  • there are more blockers and long executions, possibly GC (few times taking around 2-3 seconds) and more; the above listed are most significant

It looks like the problem comes from this extension:
https://addons.mozilla.org/en-US/firefox/addon/vimium-ff/

Specifically, a long execution of script files generated from respective .coffee files (same path and file name, 6cef4ca):
https://github.com/philc/vimium/blob/master/background_scripts/bg_utils.coffee
https://github.com/philc/vimium/blob/master/lib/settings.coffee

Looking into the generated js files (in the xpi) doesn't reveal any obvious problem.

snippet from the long exection:

INFO "RunScript:leave|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/lib/settings.js:1:0]" @104015.92ms
INFO "RunScript:leave|[moz-extension://ebade033-4440-406d-9ce6-5e83186e92b8/lib/settings.js:2:9]" @104015.65ms
< here is a 50104.07ms delay recorded >
INFO "RunScript:leave|[resource://gre/modules/Schemas.jsm:2695:13]" @53911.58ms
INFO "RunScript:leave|[resource://gre/modules/XPCOMUtils.jsm:383:23]" @53911.56ms
INFO "RunScript:enter|[resource://gre/modules/XPCOMUtils.jsm:383:23]" @53911.55ms
....
Flags: needinfo?(honzab.moz)

Hmm, in that case I wonder why the Gecko profiles from earlier in the bug don't show much activity on the WebExtension process.

(In reply to Markus Stange [:mstange] from comment #32)

Hmm, in that case I wonder why the Gecko profiles from earlier in the bug don't show much activity on the WebExtension process.

Could we be just waiting there for something that is not covered by the backtrack instrumentation? As I understand, all JS promise resolutions go via the micro task queue, right? Hence, promises are covered. Not sure what else we could do there. Note that there is not enough info what piece of code (line/col) the wait happens on. Wonder if this could be OS thread scheduling? The machine seems to be slower in general, it's a 4 physical core system.

Other idea is that we either wait for some conditional var or lock (lock are not tracked by BT at the moment, but I plan to) or we do sync IPC through ThreadLink::SendMessage which I'm not sure is covered well by BT, as I'm looking at its code.

Markus, does the main thread of the webext process have the same "long delay" detection (aka responsiveness measurement) mechanism as parent and content processes do? Because this is exactly that - events get blocked in the main thread queue.

Akshay, would you be willing to try to reproduce again with the backtrack custom build (as in comment 24), but this time with MOZ_BACKTRACK=7 instead of 3? It will add BT markers for conditional variable wait (forgot I already cover that!), which is what we may face here. Expect the log files be a bit bigger.

Thank you!

Flags: needinfo?(asdofindia)

I have been able to reproduce with MOZ_BACKTRACK=7 and vimium + multi-account containers installed.

Vimium maybe involved as the extension appears crashed (it is greyed and dysfunctional even though "enabled") and starts working if I do a quick disable-enable from addons.

Logs: https://send.firefox.com/download/b6ec93b966c755f6/#2JhKtUVq_Q3gkJfZh1wOLg (expires after 20 downloads/7 days, as usual)

My computer is old and the load goes upto 4.5 with CPU usage above 80 at the startup. But, I can use the terminal, riot messenger, telegram messenger, etc at the same time while firefox is blocked. Also, the bug is sometimes not reproducible. It could be something related to archlinux in the end. I can try replicating this in another computer if you want me to, but I don't know if I can just copy my profile and it will magically work.

Let me know what you need - at this point, I'm as invested in this bug as you are.

Flags: needinfo?(asdofindia) → needinfo?(honzab.moz)

Akshay, you rock! Thanks again.

I can again see number of channels delayed for ~120 - 123 seconds. This time it seems different. The bug URL load is suspended until apparently some extension (can't see which one this time) finishes waiting for an IndexDB operations to finish (largest source of blocking I found), in details, walking backwards:

  • the channel (previously suspended) is resumed by an extension message
  • that message is preceded by line of promise resolutions, there is not much script references, tho
  • that promise chain first resolves after an indexdb operation is done (IPC actor deletion). There is one dom::indexedDB::DatabaseMaintenance on the path blocked by several other dom::indexedDB::DatabaseMaintenance events on the Main Process/IndexedDB Mnt #5 thread, blocked overall for ~10 seconds.
  • before then, there is an event on Main Process/QuotaManager IO blocked for ~17 seconds by dom::DatastoreOperationBase runnable which itself is taking 76314.19ms (!).
  • before that I can see nsIdleService::SetTimerExpiryIfBefore timer, scheduled by telemetry with a 7 seconds delay (it's likely correct, BT has already revealed crazy stuff like this in the past!)
  • before that the code seems to do a lot Places (aka bookmarks) work
  • one significant delay of 6 seconds comes from places::InsertVisitedURIs runnable on Main Process/mozStorage #3
  • there there is a long list of promise resolution in Places/Sqlite, I gave up loading more log lines there

To sum, this time it more looks like we are blocked on I/O (on background threads) to resolve promises blocking start and finishing of the load of the page.

I takes 47089ms since startup to start opening the page for the bug URL. I unfortunately can't read the path back to startup, since the path crosses a code we know is not instrumented for backtracking (compositing).

(In reply to Akshay S Dinesh from comment #37)

My computer is old and the load goes upto 4.5 with CPU usage above 80 at the startup. But, I can use the terminal, riot messenger, telegram messenger, etc at the same time while firefox is blocked. Also, the bug is sometimes not reproducible.

It depends on how things race, what wins, if something that just leads to finishing of something not visible or the active tab page load. It depends on many non-deterministic variables.

It could be something related to archlinux in the end. I can try replicating this in another computer if you want me to, but I don't know if I can just copy my profile and it will magically work.

Hard to say, but it's clear the machine is slow overall.

How many tabs are you restoring?

Let me know what you need - at this point, I'm as invested in this bug as you are.

I have to think first what more we could try, but it seems like a possible combination of factors happening all at the same time (I/O and memory exhaustion). Thank you so far for all the work and data!

Flags: needinfo?(honzab.moz)
Whiteboard: [qf:p1:pageload] → [qf:p1:pageload][necko-triaged]
Priority: P1 → P2

With recent versions of Firefox I don't have this issue at all. Shall try reproducing with the profile folder that was causing the bug and update the bug.

(In reply to Akshay S Dinesh from comment #39)

With recent versions of Firefox I don't have this issue at all. Shall try reproducing with the profile folder that was causing the bug and update the bug.

Any luck reproducing this, Akshay?

Flags: needinfo?(asdofindia)

I've not been able to reproduce this.

Maybe okay to close.

Flags: needinfo?(asdofindia)

Okay. I can now reproduce this on Ubuntu 18.04 with 16GB RAM Intel i7 8th gen and Firefox 74.0 with various add-ons installed (including ghostery, vimium, and mult-account container)

In bug 1625006, aswan was suggesting that we record more information in the profiler about what's blocking web requests; this may help to understand what's happening in this bug.

Something else you could try doing to record more information is to profile startup with the DOM Worker threads included (should be included by default if you don't use the MOZ_PROFILER_STARTUP_FILTERS value that was requested earlier), and with the ipcmessages feature included (MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_FEATURES="js,stackwalk,mainthreadio,ipcmessages"). This should show when we are blocked on IndexedDB.

See Also: → 1625006

(In reply to Florian Quèze [:florian] from comment #43)

In bug 1625006, aswan was suggesting that we record more information in the profiler about what's blocking web requests; this may help to understand what's happening in this bug.

Something else you could try doing to record more information is to profile startup with the DOM Worker threads included (should be included by default if you don't use the MOZ_PROFILER_STARTUP_FILTERS value that was requested earlier), and with the ipcmessages feature included (MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_FEATURES="js,stackwalk,mainthreadio,ipcmessages"). This should show when we are blocked on IndexedDB.

https://bit.ly/2JjKJav is with MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_FEATURES="js,stackwalk,mainthreadio,ipcmessages" firefox %u

Please let me know if there is any other command I can run.

I don't know if this is relevant. But vimium repository has a bug that has a comment like this:

Yes. This is a race condition between Vimium's front-end JS and the JS on the background page. The front-end JS runs before the background page is listening, so initialisation fails. We should fix this.

I think this is definitely relevant. multi-account-containers has this bug report

After a cold boot, tabs are unresponsive for a random amount of time. At some point, it felt some 5 minutes or more.

Someone has even found out the exact line at which things freeze.

I'm suffering from this bug from a long time and I thought that it was because my HD was old and slow. Good to know that I was wrong..
Is there anything that I can do to contribute to a fix? Maybe some other logs or profile...
English is not my primary language but I'm confident that communication won't be a problem.

Whiteboard: [qf:p1:pageload][necko-triaged] → [qf:p1:pageload][necko-triaged][fxperf]
Whiteboard: [qf:p1:pageload][necko-triaged][fxperf] → [qf:p1:pageload][necko-triaged][fxperf:p2]

Can we get a bump on this? I've been experiencing this issue for some months now (both release channel and nightly) and only recently was linked to this ticket as the cause.

(In reply to Sollace from comment #48)

Can we get a bump on this? I've been experiencing this issue for some months now (both release channel and nightly) and only recently was linked to this ticket as the cause.

Sollace, I have very recently added some in-depth logging to current Nightly builds. Can you please download it, install it and setup logging according this doc and add events:1 to the MOZ_LOG list? Then reproduce and provide all the logs - there will be many .child logs as well (xzip/7zip them, you can use send.firefox.com to publish them.)

Thanks.

Flags: needinfo?(sollacea)

Sollace, please note that Nightly will create a new profile. You may need to make a copy of your regular profile and point the Nightly installation to it. More about profiles in about:profiles internal page and here.

:mayhemer, https://send.firefox.com/download/a71761535a31b8d0/#fVPP1Wmyo-5YsByKj9CeDw is the log after a fresh boot with

export MOZ_LOG=timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,cookie:5,events:1
export MOZ_LOG_FILE=/tmp/log.txt
./firefox

on 79.0a1 (2020-06-28) compressed with tar cf startup-log.tar.xz --xz log.txt*

My computer is now a more powerful one than when I originally filed the bug. The delay was only a few seconds when logging this.

Let me know if some parameter wasn't correctly specified.

Flags: needinfo?(honzab.moz)

Thanks! The channel is delayed by some 22 seconds. There is not only one source of the delay:

2020-06-28 15:39:19.867437 UTC - [Parent 5497: Main Thread]: V/nsHttp HttpBaseChannel::AddAsNonTailRequest this=0x7f749bb55000, rc=0x7f74ab810b30, already added=0
2020-06-28 15:39:19.917 ⁃ Event ⁃ 7f749bb57070 ⁃ released
2020-06-28 15:39:19.917439 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749bb57070
2020-06-28 15:39:19.917456 UTC - [Parent 5497: URL Classifier]: E/events EXEC 0x7f749c78f800 0x7f749baba718 [nsUrlClassifierDBServiceWorker::OpenDb]
2020-06-28 15:39:24.864183 UTC - [Parent 5497: URL Classifier]: E/events DONE 0x7f749baba718
2020-06-28 15:39:24.864197 UTC - [Parent 5497: URL Classifier]: E/events EXEC 0x7f749bb57070 0x7f749baba718 [AsyncUrlChannelClassifier::CheckChannel]
2020-06-28 15:39:24.865 ⁃ Event ⁃ 7f7498d9a1c0 ⁃ released
2020-06-28 15:39:24.865031 UTC - [Parent 5497: URL Classifier]: E/events DISP 0x7f7498d9a1c0
2020-06-28 15:39:24.865046 UTC - [Parent 5497: URL Classifier]: E/events DONE 0x7f749baba718
2020-06-28 15:39:25.166218 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f7498d9a1c0 0x7ffef7b42b18 [AsyncUrlChannelClassifier::CheckChannel - return]
2020-06-28 15:39:25.166230 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::ResolveProxy [this=0x7f749bb55000]

nsUrlClassifierDBServiceWorker::OpenDb takes 8 seconds and blocks classification of the channel.

Then the channel is suspended by apparently an extension or something that runs in a different content process (not necessary a web content).

2020-06-28 15:39:25.173639 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=0x7f749bb55000]
...
2020-06-28 15:39:32.192693 UTC - [Child 5819: JS Helper]: E/events DISP 0x7f826fd95280
2020-06-28 15:39:32.195205 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f826fd95280 0x7ffd5505b9b8 [dom::NotifyOffThreadScriptLoadCompletedRunnable]
2020-06-28 15:39:32.195 ⁃ Event ⁃ 7f826fd82640 ⁃ released
2020-06-28 15:39:32.195215 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826fd82640
2020-06-28 15:39:32.755651 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b9b8
2020-06-28 15:39:32.758883 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f826fd82640 0x7ffd5505b9b8 [nsHtml5ExecutorReflusher]
2020-06-28 15:39:32.758 ⁃ Event ⁃ 7f827a13e680 ⁃ released
2020-06-28 15:39:32.758963 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f827a13e680
2020-06-28 15:39:32.758981 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b9b8
2020-06-28 15:39:33.251 ⁃ IPC Receive ⁃ 7f7498591100 ⁃ released
2020-06-28 15:39:33.251810 UTC - [Parent 5497: Gecko_IOThread]: E/events RECV 0x7f7498591100 0x7f74b448c9b8 -893 [PBrowser::Msg_AsyncMessage]
2020-06-28 15:39:33.251 ⁃ Event ⁃ 7f749bbc7640 ⁃ released
2020-06-28 15:39:33.251829 UTC - [Parent 5497: Gecko_IOThread]: E/events DISP 0x7f749bbc7640
2020-06-28 15:39:33.251845 UTC - [Parent 5497: Gecko_IOThread]: E/events DONE 0x7f74b448c9b8
2020-06-28 15:39:33.251205 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f827a13e680 0x7ffd5505b9b8 [Document::DispatchContentLoadedEvents]
2020-06-28 15:39:33.251 ⁃ Event ⁃ 7f826f5fc620 ⁃ released
2020-06-28 15:39:33.251501 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826f5fc620
2020-06-28 15:39:33.251522 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f826f5fc620 0x7ffd5505adc0
2020-06-28 15:39:33.251 ⁃ Event ⁃ 7f826d392500 ⁃ released
2020-06-28 15:39:33.251644 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826d392500
2020-06-28 15:39:33.251679 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505adc0
2020-06-28 15:39:33.251682 UTC - [Child 5819: Chrome_ChildThread]: E/events EXEC 0x7f826d392500 0x7f827a486c10 [IPC::Channel::Send]
2020-06-28 15:39:33.251 ⁃ IPC Send ⁃ 7f826f51a900 ⁃ released
2020-06-28 15:39:33.251698 UTC - [Child 5819: Chrome_ChildThread]: E/events SEND 0x7f826f51a900 -893 5497
IPC Send @7f826f51a900 --> IPC Receive @7f7498591100
2020-06-28 15:39:33.251728 UTC - [Child 5819: Chrome_ChildThread]: E/events DONE 0x7f827a486c10
2020-06-28 15:39:33.264762 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749bbc7640 0x7ffef7b42b18 [PBrowser::Msg_AsyncMessage]
2020-06-28 15:39:33.264 ⁃ Event ⁃ 7f749b282060 ⁃ released
2020-06-28 15:39:33.264966 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282060
2020-06-28 15:39:33.264988 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282060 0x7ffef7b40c10
2020-06-28 15:39:33.265 ⁃ Event ⁃ 7f749b282a80 ⁃ released
2020-06-28 15:39:33.265047 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282a80
2020-06-28 15:39:33.265125 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.265135 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282a80 0x7ffef7b40c10
2020-06-28 15:39:33.265 ⁃ Event ⁃ 7f749b282060 ⁃ released
2020-06-28 15:39:33.265158 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282060
2020-06-28 15:39:33.265172 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.265179 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282060 0x7ffef7b40c10
2020-06-28 15:39:33.270 ⁃ Event ⁃ 7f749b2828a0 ⁃ released
2020-06-28 15:39:33.270197 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b2828a0
2020-06-28 15:39:33.270294 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.270303 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b2828a0 0x7ffef7b40c10
2020-06-28 15:39:33.270 ⁃ Event ⁃ 7f749b282060 ⁃ released
2020-06-28 15:39:33.270540 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282060
2020-06-28 15:39:33.270578 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.270649 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282060 0x7ffef7b40c10
2020-06-28 15:39:33.270 ⁃ Event ⁃ 7f749b2828a0 ⁃ released
2020-06-28 15:39:33.270688 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b2828a0
2020-06-28 15:39:33.270720 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.270843 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b2828a0 0x7ffef7b40c10
2020-06-28 15:39:33.270 ⁃ Event ⁃ 7f749b282980 ⁃ released
2020-06-28 15:39:33.270868 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282980
2020-06-28 15:39:33.270882 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.270985 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282980 0x7ffef7b40c10
2020-06-28 15:39:33.271 ⁃ Event ⁃ 7f749b282060 ⁃ released
2020-06-28 15:39:33.271002 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282060
2020-06-28 15:39:33.271010 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.271015 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282060 0x7ffef7b40c10
2020-06-28 15:39:33.271 ⁃ Event ⁃ 7f749b282840 ⁃ released
2020-06-28 15:39:33.271336 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282840
2020-06-28 15:39:33.271363 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.271372 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282840 0x7ffef7b40c10
2020-06-28 15:39:33.271 ⁃ Event ⁃ 7f749b282060 ⁃ released
2020-06-28 15:39:33.271406 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282060
2020-06-28 15:39:33.271417 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.271421 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282060 0x7ffef7b40c10
2020-06-28 15:39:33.271 ⁃ Event ⁃ 7f749b282ba0 ⁃ released
2020-06-28 15:39:33.271439 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282ba0
2020-06-28 15:39:33.271450 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b40c10
2020-06-28 15:39:33.271457 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282ba0 0x7ffef7b40c10
2020-06-28 15:39:33.271549 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=0x7f749bb55000]

unfortunately, JS Helper still uses it's own event pool so it's currently untracked what script we compile and what all it's blocked on. anyway source of another delay of about 8 seconds.

then getting cache entry is delayed by 8 seconds:

2020-06-28 15:39:33.275275 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::OpenCacheEntry [this=0x7f749bb55000]
2020-06-28 15:39:33.275503 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::MaybeRaceCacheWithNetwork [this=0x7f749bb55000, delay=500]
2020-06-28 15:39:33.275517 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::TriggerNetworkWithDelay [this=0x7f749bb55000, delay=500]
nsHttpChannel @7f749bb55000 --> CacheEntry @7f7498cc8900
2020-06-28 15:39:33.275 ⁃ Event ⁃ 7f7499ba5f80 ⁃ released
2020-06-28 15:39:33.275994 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f7499ba5f80
2020-06-28 15:39:33.276030 UTC - [Parent 5497: Cache2 I/O]: E/events EXEC 0x7f7499ba5f80 0x7f74a633ada0 [net::OpenFileEvent]
2020-06-28 15:39:33.276078 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::Connect 0x7f749bb55000 AwaitingCacheCallbacks forces async
2020-06-28 15:39:33.276092 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42b18
2020-06-28 15:39:33.276 ⁃ Event ⁃ 7f74985e3a60 ⁃ released
2020-06-28 15:39:33.276891 UTC - [Parent 5497: Cache2 I/O]: E/events DISP 0x7f74985e3a60
2020-06-28 15:39:33.276948 UTC - [Parent 5497: Cache2 I/O]: E/events DONE 0x7f74a633ada0
2020-06-28 15:39:33.277001 UTC - [Parent 5497: Cache2 I/O]: E/events EXEC 0x7f74985e3a60 0x7f74a633ada0 [net::ReadEvent]
2020-06-28 15:39:33.321099 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b9b8
2020-06-28 15:39:33.346 ⁃ Event ⁃ 7f74985e38e0 ⁃ released
2020-06-28 15:39:33.346145 UTC - [Parent 5497: Cache2 I/O]: E/events DISP 0x7f74985e38e0
2020-06-28 15:39:33.346166 UTC - [Parent 5497: Cache2 I/O]: E/events DONE 0x7f74a633ada0
2020-06-28 15:39:33.346209 UTC - [Parent 5497: Cache2 I/O]: E/events EXEC 0x7f74985e38e0 0x7f74a633ada0 [net::ReadEvent]
2020-06-28 15:39:33.381060 UTC - [Parent 5497: Cache2 I/O]: D/nsHttp nsHttpChannel::OnCacheEntryCheck enter [channel=0x7f749bb55000 entry=0x7f749b2de540]
CacheEntry @7f7498cc8900
2020-06-28 15:39:33.381123 UTC - [Parent 5497: Cache2 I/O]: V/nsHttp nsHttpResponseHead::ParseCachedOriginalHeader [this=0x7f74a0109f70]
2020-06-28 15:39:33.381197 UTC - [Parent 5497: Cache2 I/O]: V/nsHttp nsHttpResponseHead::ParseCachedHead [this=0x7f74a0109f70]
2020-06-28 15:39:33.381226 UTC - [Parent 5497: Cache2 I/O]: V/nsHttp nsHttpResponseHead::ParseVersion [version=HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Security-Policy: default-src 'self'; worker-src 'none'; connect-src 'self' https://product-details.mozilla.org https://www.google-analytics.com https://treeherder.mozilla.org/api/failurecount/; font-src 'self' https://fonts.gstatic.com; img-src 'self' data: https://secure.gravatar.com https://bug1567863.bmoattachments.org/; media-src 'self' https://bug1567863.bmoattachments.org/; object-src 'none'; script-src 'self' 'nonce-u9COGjUR2v9MxOgyEKEupFvS7Tz9pNjcppTIMYmtDwRIpf4T' 'unsafe-inline' 'unsafe-eval' https://www.google-analytics.com; style-src 'self' 'unsafe-inline'; frame-src 'self' https://crash-stop-addon.herokuapp.com; frame-ancestors 'self'; form-action 'self' https://www.google.com/search https://github.com/login/oauth/authorize https://github.com/login https://phabricator.services.mozilla.com/
Content-Type: text/html; charset=UTF-8
Date: Sun, 28 Jun 2020 15:28:10 GMT
Link: </static/v20200624.1/skins/standard/fonts/FiraMono-Regular.woff2?v=3.202>; rel="preload"; as="font", </static/v20200624.1/skins/standard/fonts/FiraSans-Bold.woff2?v=4.203>; rel="preload"; as="font", </static/v20200624.1/skins/standard/fonts/FiraSans-Italic.woff2?v=4.203>; rel="preload"; as="font", </static/v20200624.1/skins/standard/fonts/FiraSans-Regular.woff2?v=4.203>; rel="preload"; as="font", </static/v20200624.1/skins/standard/fonts/FiraSans-SemiBold.woff2?v=4.203>; rel="preload"; as="font", </static/v20200624.1/skins/standard/fonts/MaterialIcons-Regular.woff2>; rel="preload"; as="font", <https://www.google-analytics.com>; rel="preconnect"; crossorigin
Referrer-policy: same-origin
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-content-type-options: nosniff
X-frame-options: SAMEORIGIN
X-xss-protection: 1; mode=block
]
2020-06-28 15:39:33.381266 UTC - [Parent 5497: Cache2 I/O]: E/nsHttp Have status line [version=11 status=200 statusText=OK]
2020-06-28 15:39:33.381279 UTC - [Parent 5497: Cache2 I/O]: V/nsHttp ParseContentType [type=text/html; charset=UTF-8]
2020-06-28 15:39:33.381353 UTC - [Parent 5497: Cache2 I/O]: D/nsHttp nsHttpChannel::ResponseWouldVary [channel=0x7f749bb55000] processing Accept-Encoding
2020-06-28 15:39:33.381365 UTC - [Parent 5497: Cache2 I/O]: D/nsHttp nsHttpChannel::ResponseWouldVary [channel=0x7f749bb55000] stored value = "gzip, deflate, br"
2020-06-28 15:39:33.381371 UTC - [Parent 5497: Cache2 I/O]: V/nsHttp VALIDATE_NEVER set
2020-06-28 15:39:33.381376 UTC - [Parent 5497: Cache2 I/O]: V/nsHttp NOT validating based on VALIDATE_NEVER load flag
2020-06-28 15:39:33.382897 UTC - [Parent 5497: Cache2 I/O]: D/nsHttp Opened cache input stream without buffering [channel=0x7f749bb55000, mCacheEntry=0x7f749b2de540, stream=0x7f749824b740]
2020-06-28 15:39:33.382907 UTC - [Parent 5497: Cache2 I/O]: D/nsHttp nsHTTPChannel::OnCacheEntryCheck exit [this=0x7f749bb55000 doValidation=0 result=0]
2020-06-28 15:39:33.382 ⁃ Event ⁃ 7f749bb7ce70 ⁃ released
2020-06-28 15:39:33.382950 UTC - [Parent 5497: Cache2 I/O]: E/events DISP 0x7f749bb7ce70
2020-06-28 15:39:33.382983 UTC - [Parent 5497: Cache2 I/O]: E/events DONE 0x7f74a633ada0

(and in the meantime on the main thread, a script is being executed for those ~8 seconds, again, don't know which script:)
2020-06-28 15:39:33.281 ⁃ Event ⁃ 7f749bb4f828 ⁃ released
2020-06-28 15:39:33.281464 UTC - [Parent 5497: JS Helper]: E/events DISP 0x7f749bb4f828
2020-06-28 15:39:33.285060 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749bb4f828 0x7ffef7b42b18 [AsyncScriptCompiler]
2020-06-28 15:39:33.285 ⁃ Event ⁃ 7f749b282bc0 ⁃ released
2020-06-28 15:39:33.285176 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282bc0
2020-06-28 15:39:33.285192 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42b18
2020-06-28 15:39:33.285201 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282bc0 0x7ffef7b42a10
2020-06-28 15:39:33.286 ⁃ Event ⁃ 7f749b282d80 ⁃ released
2020-06-28 15:39:33.286343 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282d80
2020-06-28 15:39:33.286354 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:33.286674 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282d80 0x7ffef7b42a10
2020-06-28 15:39:33.286 ⁃ Event ⁃ 7f749b282f40 ⁃ released
2020-06-28 15:39:33.286725 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282f40
2020-06-28 15:39:33.286843 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:33.287341 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282f40 0x7ffef7b42a10
2020-06-28 15:39:33.287 ⁃ Event ⁃ 7f749b282bc0 ⁃ released
2020-06-28 15:39:33.287360 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282bc0
2020-06-28 15:39:33.287369 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:33.287484 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282bc0 0x7ffef7b42a10
2020-06-28 15:39:33.287 ⁃ Event ⁃ 7f749b282ee0 ⁃ released
2020-06-28 15:39:33.287556 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749b282ee0
2020-06-28 15:39:33.287566 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:33.288101 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749b282ee0 0x7ffef7b42a10
2020-06-28 15:39:41.410779 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
...
2020-06-28 15:39:41.449719 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749bb7ce70 0x7ffef7b42b18 [CacheEntry::AvailableCallbackRunnable]
2020-06-28 15:39:41.449881 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::OnCacheEntryAvailable [this=0x7f749bb55000 entry=0x7f749d2f13a0 new=0 appcache=(nil) status=0 mAppCache=(nil) mAppCacheForWrite=(nil)]
nsHttpChannel @7f749bb55000 --> CacheEntry @7f7498cc8900
2020-06-28 15:39:41.449935 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::TriggerNetwork [this=0x7f749bb55000]
2020-06-28 15:39:41.449984 UTC - [Parent 5497: Main Thread]: D/nsHttp   triggering network
2020-06-28 15:39:41.450019 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::ReadFromCache [this=0x7f749bb55000] Using cached copy of: https://bugzilla.mozilla.org/show_bug.cgi?id=1567863
2020-06-28 15:39:41.450172 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42b18

again, a script executed on the parent process for ~8 seconds.

another delay of about 0.5 second

2020-06-28 15:39:41.551927 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=0x7f749bb55000]
2020-06-28 15:39:41.801 ⁃ Event ⁃ 7f74a4ac33e0 ⁃ released
2020-06-28 15:39:41.801674 UTC - [Parent 5497: Timer]: E/events DISP 0x7f74a4ac33e0
2020-06-28 15:39:41.815970 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f74a4ac33e0 0x7ffef7b42b18 [ChromeUtils::IdleDispatch]
2020-06-28 15:39:41.816 ⁃ Event ⁃ 7f749801d140 ⁃ released
2020-06-28 15:39:41.816772 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749801d140
2020-06-28 15:39:41.816826 UTC - [Parent 5497: Gecko_IOThread]: E/events EXEC 0x7f749801d140 0x7f74b448cc10 [IPC::Channel::Send]
2020-06-28 15:39:41.816 ⁃ IPC Send ⁃ 7f74981604a0 ⁃ released
2020-06-28 15:39:41.816865 UTC - [Parent 5497: Gecko_IOThread]: E/events SEND 0x7f74981604a0 504 5819
2020-06-28 15:39:41.816912 UTC - [Parent 5497: Gecko_IOThread]: E/events DONE 0x7f74b448cc10
2020-06-28 15:39:41.816946 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42b18
2020-06-28 15:39:42.001 ⁃ IPC Receive ⁃ 7f7498591100 ⁃ released
2020-06-28 15:39:42.001929 UTC - [Parent 5497: Gecko_IOThread]: E/events RECV 0x7f7498591100 0x7f74b448c9b8 -1039 [PWindowGlobal::Msg_RawMessage]
2020-06-28 15:39:41.817 ⁃ IPC Receive ⁃ 7f828d1d2100 ⁃ released
2020-06-28 15:39:41.817014 UTC - [Child 5819: Chrome_ChildThread]: E/events RECV 0x7f828d1d2100 0x7f827a4869b8 504 [PWindowGlobal::Msg_RawMessage]
IPC Send @7f74981604a0 --> IPC Receive @7f828d1d2100
2020-06-28 15:39:41.817 ⁃ Event ⁃ 7f827a180390 ⁃ released
2020-06-28 15:39:41.817109 UTC - [Child 5819: Chrome_ChildThread]: E/events DISP 0x7f827a180390
2020-06-28 15:39:41.817126 UTC - [Child 5819: Chrome_ChildThread]: E/events DONE 0x7f827a4869b8
2020-06-28 15:39:42.001 ⁃ Event ⁃ 7f74982b1640 ⁃ released
2020-06-28 15:39:42.001939 UTC - [Parent 5497: Gecko_IOThread]: E/events DISP 0x7f74982b1640
2020-06-28 15:39:42.001944 UTC - [Parent 5497: Gecko_IOThread]: E/events DONE 0x7f74b448c9b8
2020-06-28 15:39:41.996866 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f827a180390 0x7ffd5505b9b8 [PWindowGlobal::Msg_RawMessage]
2020-06-28 15:39:42.001 ⁃ Event ⁃ 7f826f531ee0 ⁃ released
2020-06-28 15:39:42.001801 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826f531ee0
2020-06-28 15:39:42.001817 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b9b8
2020-06-28 15:39:42.001821 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f826f531ee0 0x7ffd5505b8b0
2020-06-28 15:39:42.001 ⁃ Event ⁃ 7f826f536140 ⁃ released
2020-06-28 15:39:42.001827 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826f536140
2020-06-28 15:39:42.001830 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b8b0
2020-06-28 15:39:42.001832 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f826f536140 0x7ffd5505b8b0
2020-06-28 15:39:42.001 ⁃ Event ⁃ 7f826f531ee0 ⁃ released
2020-06-28 15:39:42.001836 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826f531ee0
2020-06-28 15:39:42.001838 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b8b0
2020-06-28 15:39:42.001840 UTC - [Child 5819: Main Thread]: E/events EXEC 0x7f826f531ee0 0x7ffd5505b8b0
2020-06-28 15:39:42.001 ⁃ Event ⁃ 7f826f5de500 ⁃ released
2020-06-28 15:39:42.001867 UTC - [Child 5819: Main Thread]: E/events DISP 0x7f826f5de500
2020-06-28 15:39:42.001881 UTC - [Child 5819: Main Thread]: E/events DONE 0x7ffd5505b8b0
2020-06-28 15:39:42.001890 UTC - [Child 5819: Chrome_ChildThread]: E/events EXEC 0x7f826f5de500 0x7f827a486c10 [IPC::Channel::Send]
2020-06-28 15:39:42.001 ⁃ IPC Send ⁃ 7f826da11430 ⁃ released
2020-06-28 15:39:42.001894 UTC - [Child 5819: Chrome_ChildThread]: E/events SEND 0x7f826da11430 -1039 5497
IPC Send @7f826da11430 --> IPC Receive @7f7498591100
2020-06-28 15:39:42.001908 UTC - [Child 5819: Chrome_ChildThread]: E/events DONE 0x7f827a486c10
2020-06-28 15:39:42.302165 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f74982b1640 0x7ffef7b42b18 [PWindowGlobal::Msg_RawMessage]
2020-06-28 15:39:42.302 ⁃ Event ⁃ 7f749827ade0 ⁃ released
2020-06-28 15:39:42.302209 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749827ade0
2020-06-28 15:39:42.302215 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42b18
2020-06-28 15:39:42.302218 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749827ade0 0x7ffef7b42a10
2020-06-28 15:39:42.302 ⁃ Event ⁃ 7f749827af60 ⁃ released
2020-06-28 15:39:42.302222 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749827af60
2020-06-28 15:39:42.302234 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:42.302236 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749827af60 0x7ffef7b42a10
2020-06-28 15:39:42.302 ⁃ Event ⁃ 7f749827ad40 ⁃ released
2020-06-28 15:39:42.302244 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749827ad40
2020-06-28 15:39:42.302248 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:42.302284 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749827ad40 0x7ffef7b42a10
2020-06-28 15:39:42.302 ⁃ Event ⁃ 7f749827afc0 ⁃ released
2020-06-28 15:39:42.302302 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749827afc0
2020-06-28 15:39:42.302305 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:42.302350 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749827afc0 0x7ffef7b42a10
2020-06-28 15:39:42.302 ⁃ Event ⁃ 7f749827adc0 ⁃ released
2020-06-28 15:39:42.302355 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749827adc0
2020-06-28 15:39:42.302357 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:42.302396 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749827adc0 0x7ffef7b42a10
2020-06-28 15:39:42.302 ⁃ Event ⁃ 7f749827ace0 ⁃ released
2020-06-28 15:39:42.302400 UTC - [Parent 5497: Main Thread]: E/events DISP 0x7f749827ace0
2020-06-28 15:39:42.302402 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10
2020-06-28 15:39:42.302431 UTC - [Parent 5497: Main Thread]: E/events EXEC 0x7f749827ace0 0x7ffef7b42a10
2020-06-28 15:39:42.302451 UTC - [Parent 5497: Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=0x7f749bb55000]
2020-06-28 15:39:42.302485 UTC - [Parent 5497: Main Thread]: E/events DONE 0x7ffef7b42a10

The 8 seconds pattern is visible quite often...

Public log https://mayhemer.github.io/logan/?../data/1560090.zip#{%22show%22%3A[{%22name%22%3A%22Event%22%2C%22on%22%3A10994%2C%22clr%22%3A%22%23b36704%22}%2C{%22name%22%3A%22nsHttpChannel%22%2C%22on%22%3A15%2C%22clr%22%3A%22%2387ba35%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A36871%2C%22clr%22%3A%22%23feffe3%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A36904%2C%22clr%22%3A%22%23d85c16%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A36909%2C%22clr%22%3A%22%23c6a145%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40212%2C%22clr%22%3A%22%23c7f23b%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40213%2C%22clr%22%3A%22%23d5fbaa%22}%2C{%22name%22%3A%22IPC%20Send%22%2C%22on%22%3A8396%2C%22clr%22%3A%22%239bd264%22}%2C{%22name%22%3A%22IPC%20Receive%22%2C%22on%22%3A8694%2C%22clr%22%3A%22%23fdf991%22}%2C{%22name%22%3A%22IPC%20Receive%22%2C%22on%22%3A8697%2C%22clr%22%3A%22%23f19f7f%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40256%2C%22clr%22%3A%22%23ec5294%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40268%2C%22clr%22%3A%22%23d738ae%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40283%2C%22clr%22%3A%22%23e61a08%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40284%2C%22clr%22%3A%22%2365feeb%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40285%2C%22clr%22%3A%22%230ab233%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40286%2C%22clr%22%3A%22%2385d3c4%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40287%2C%22clr%22%3A%22%2360bdf0%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40288%2C%22clr%22%3A%22%23eb5d32%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40305%2C%22clr%22%3A%22%239291fa%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40306%2C%22clr%22%3A%22%23b3d5e2%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40307%2C%22clr%22%3A%22%23e60be8%22}%2C{%22name%22%3A%22Event%22%2C%22on%22%3A40308%2C%22clr%22%3A%22%23cd939c%22}%2C{%22name%22%3A%22IPC%20Send%22%2C%22on%22%3A8423%2C%22clr%22%3A%22%2342fc09%22}]}

Flags: needinfo?(honzab.moz)

keeping ni? on me to try to investigate further.

Flags: needinfo?(honzab.moz)

I'm still having more than a minute hang and could contribute with some logs if necessary.

(In reply to vinieich from comment #55)

I'm still having more than a minute hang and could contribute with some logs if necessary.

Please do send logs using the nightly build suggested in comment 49. Although I sent logs, I didn't experience as much delay as I experience in the stable version. Therefore, it is a possibility that the logs I included doesn't include the info required for actually debugging the issue.

Flags: needinfo?(vinieich)

Alright, I've attached a zip file with some logs I've generated in a few different circumstances.

Through testing I did, at least, narrow down one application freeze to the DownThemAll! extension. (turning it back on after disabling it caused Firefox to lock up). Without that extension there is still a noticeable delay when restarting the browser, and a freeze+delay loading thumbs on the blank tab+delay loading youtube with no extensions after rebooting windows.

https://drive.google.com/drive/u/0/folders/1h9j-EzsBGya547VEMUGt13cZYCoFvvN2

Flags: needinfo?(sollacea)

I think that's a long existing issue that never been truly fixed. It's caused by poorly designed db schemes. The *.sqlite files become extremely inefficient after a while. The temporary solution is do a full vacuum to rebuild all the *.sqlite files, although that won't last long before everything becomes slow again.

(In reply to Sollace from comment #57)

Alright, I've attached a zip file with some logs I've generated in a few different circumstances.

Through testing I did, at least, narrow down one application freeze to the DownThemAll! extension. (turning it back on after disabling it caused Firefox to lock up). Without that extension there is still a noticeable delay when restarting the browser, and a freeze+delay loading thumbs on the blank tab+delay loading youtube with no extensions after rebooting windows.

https://drive.google.com/drive/u/0/folders/1h9j-EzsBGya547VEMUGt13cZYCoFvvN2

Thanks. I looked at the normal reboot logs only and I can see a huge delay in loading from the cache, I/O is extremely slow. There are channels delayed by 20+ seconds just because of waiting for the cache to response.

I wonder why RCWN doesn't kick in here, Michal, any idea?

Flags: needinfo?(honzab.moz) → needinfo?(michal.novotny)

(In reply to Honza Bambas (:mayhemer) from comment #59)

Thanks. I looked at the normal reboot logs only and I can see a huge delay in loading from the cache, I/O is extremely slow. There are channels delayed by 20+ seconds just because of waiting for the cache to response.

I wonder why RCWN doesn't kick in here, Michal, any idea?

Can you point me to a particular channel? After a quick look I found these 2:

0x1EB3E3A2000 - https://derpicdn.net/img/2014/10/16/744285/medium.png
It's greater than 256kB, so we don't race.

0x1EB28DA0000 - https://derpibooru.org/js/app-e51ec369208bbe0260651e5779e58659.js?vsn=d
This one has alt-data, so we also don't race.

Unfortunately, we don't log the reason why we decide not to race, so it's not that easy to find it in the log.

Flags: needinfo?(michal.novotny)

(In reply to Michal Novotny [:michal] from comment #60)

(In reply to Honza Bambas (:mayhemer) from comment #59)

Thanks. I looked at the normal reboot logs only and I can see a huge delay in loading from the cache, I/O is extremely slow. There are channels delayed by 20+ seconds just because of waiting for the cache to response.

I wonder why RCWN doesn't kick in here, Michal, any idea?

Can you point me to a particular channel? After a quick look I found these 2:

E.g. nsHttpChannel @1eb28da0000 created 2020-07-12 14:37:38.439 (log.txt.moz_log.1)

(In reply to Honza Bambas (:mayhemer) from comment #61)

E.g. nsHttpChannel @1eb28da0000 created 2020-07-12 14:37:38.439 (log.txt.moz_log.1)

That one I already mentioned in comment 60.

So I had to check this multiple times to be sure, but after running the nightly installer to install updates and disabling the Down Them All! adding, I haven't noticed any freezing or blocking at all when starting Firefox Nightly. Neither when just restarting the browser, nor following a full reboot.

I'll probably do an explicit restart a little later to check this again that it's not just a fluke.

I'm not really sure what would have been changed that would suddenly fix it.

Setting network.dns.disableIPv6 to true solved the issue for me.

Reporter, have you still seen this?

Flags: needinfo?(asdofindia)

(In reply to Olli Pettay [:smaug] (away May 1 - 8) from comment #65)

Reporter, have you still seen this?

I have an even newer laptop with SSD in it now (all my older ones had HDDs) and I can't really see it in this. There definitely is a tiny sub-second delay where the window has loaded and content isn't loading. But I can't call that an issue. But my mind says if I go back to an HDD that sub-second can expand to many seconds. I leave it to your call.

Flags: needinfo?(asdofindia)

I have this problem every time I start Nightly after cold boot (from months to now). If I restart the browser after close, the problem doesn't appear.
It takes 2-5 minutes (!) to start properly. In meantime, about:support doesn't show any data, and I can't disable any extension; they only work after pages are loaded.

(In reply to Mateus Meyer Jiacomelli from comment #67)

I have this problem every time I start Nightly after cold boot (from months to now). If I restart the browser after close, the problem doesn't appear.
It takes 2-5 minutes (!) to start properly. In meantime, about:support doesn't show any data, and I can't disable any extension; they only work after pages are loaded.

If you can reproduce on a recent Nightly build, capturing a startup profile would be interesting.
To capture a profile, first go to profiler.firefox.com and click "Enable Firefox Profiler Menu Button". Then (after a reboot), start Firefox with the MOZ_PROFILER_STARTUP=1 environment variable. After Firefox is done starting, click the Profiler toolbar button to capture the profile.

(In reply to Florian Quèze [:florian] from comment #68)

(In reply to Mateus Meyer Jiacomelli from comment #67)

I have this problem every time I start Nightly after cold boot (from months to now). If I restart the browser after close, the problem doesn't appear.
It takes 2-5 minutes (!) to start properly. In meantime, about:support doesn't show any data, and I can't disable any extension; they only work after pages are loaded.

If you can reproduce on a recent Nightly build, capturing a startup profile would be interesting.
To capture a profile, first go to profiler.firefox.com and click "Enable Firefox Profiler Menu Button". Then (after a reboot), start Firefox with the MOZ_PROFILER_STARTUP=1 environment variable. After Firefox is done starting, click the Profiler toolbar button to capture the profile.

https://share.firefox.dev/3e4rL7J

(In reply to Mateus Meyer Jiacomelli from comment #69)

(In reply to Florian Quèze [:florian] from comment #68)

(In reply to Mateus Meyer Jiacomelli from comment #67)

I have this problem every time I start Nightly after cold boot (from months to now). If I restart the browser after close, the problem doesn't appear.
It takes 2-5 minutes (!) to start properly. In meantime, about:support doesn't show any data, and I can't disable any extension; they only work after pages are loaded.

If you can reproduce on a recent Nightly build, capturing a startup profile would be interesting.
To capture a profile, first go to profiler.firefox.com and click "Enable Firefox Profiler Menu Button". Then (after a reboot), start Firefox with the MOZ_PROFILER_STARTUP=1 environment variable. After Firefox is done starting, click the Profiler toolbar button to capture the profile.

https://share.firefox.dev/3e4rL7J

Thanks!

Page loads are blocked until WebExtensions are done initializing (TelemetryStopwatch - WEBEXT_STORAGE_LOCAL_IDB_GET_MS markers in the WebExtensions process). The WebExtensions process seems idle.

In the parent process (which is also mostly idle), I see lots of "FileIO (non-profiled thread)" markers for reading files in the <profile folder>/storage/default/... folder.
The stacks for these reads are all in:

mozilla::dom::quota::QuotaManager::InitializeRepository(mozilla::dom::quota::PersistenceType)libxul.so
mozilla::dom::quota::QuotaManager::LoadQuota()libxul.so
mozilla::dom::quota::QuotaManager::EnsureTemporaryStorageIsInitialized()libxul.so
mozilla::dom::(anonymous namespace)::PrepareDatastoreOp::NestedRun()libxul.so
mozilla::dom::(anonymous namespace)::LSRequestBase::Run()libxul.so
nsThread::ProcessNextEvent(bool, bool*)libxul.so

Andrew, is this actionable?

Flags: needinfo?(bugmail)

Between comment 70 and the profile from comment 69 where I'm seeing WebExtension process is blocked on a LocalStorage read in a method getSavedStaticThemes from the extension DarkReader, it seems that QuotaManager initialization is taking a long time.

Potential mitigations / workarounds:

  • QuotaManager tries to cache the results of quota initialization at successful shutdown after successful initialization. It invalidates this cache whenever the build id changes, which causes a potentially very slow initialization scan. By setting the boolean pref dom.quotaManager.caching.checkBuildId to false in nightly in about:config, it can be made to not invalidate the cache due to upgrades. This pref really only exists for testing purposes and was only added to nightly a few days ago, but it's a way to side-step the problem.
  • A better workaround might be to clear a bunch of offline storage data for origins that aren't needed. This would reduce the number of origins/data that the initialization pass needs to scan, reducing the I/O and any anti-virus contention. This may also help if the problem is that there are filesystem problems or hardware problems that are causing I/O for specific directories/files to take an extremely long time.
    • The in-Firefox UI for this is the about:preferences UI for this documented at https://support.mozilla.org/en-US/kb/storage.
    • You can also go to about:profiles to find your current profile on disk and clean out the contents of PROFILE/storage/default.
  • The "refresh firefox" mechanism from about:support will NOT copy over the contents of PROFILE/storage/default and is a way to side-step the pre-existing storage but is an annoying process since it's very conservative about what it does propagate to the new profile.

Jan, maybe you can look in a little more detail at this profile to better understand what's slowing stuff down?

Flags: needinfo?(bugmail) → needinfo?(jvarga)

Some insights about profile folder (I'm using a HDD, and this profile is at least 2 years old).
It seems to start slowly after every system boot, even if using same Nightly build.

$ du -cksh .mozilla/firefox/1oezlspz.default-nightly/storage/default/
826M	.mozilla/firefox/1oezlspz.default-nightly/storage/default/
826M	total
$ ls .mozilla/firefox/1oezlspz.default-nightly/storage/default/ | wc -l
2109

Firefox crashed now (seems to be unrelated, it's due to Deezer memory leak bug), and when I opened it again, it was slow to open (the slowness doesn't happen if I close the browser and reopen it, but happens after boot/restart and crashes).

I've been watching this issue as I'm having a very similar behavior, under Windows where the initial startup is taking in the order of minutes before my home page is displayed. During that time, nothing is displayed in Firefox window, it's entirely blank. Like the OP, this only happens on a cold boot, if I restart Firefox after that, it starts fast.
What I found is that removing items from history helps a bit on the loading time, but it's coming back as I don't clean my history that often.
Doing a startup profiling, I see that it's spending most of its time in this call stack:

NtReadFile
ReadFile
winRead(sqlite3_file*, void*, int, long long)
`anonymous namespace'::xRead(sqlite3_file*, void*, int, long long)
readDbPage(PgHdr*)
getPageNormal(Pager*, unsigned int, PgHdr**, int)
getAndInitPage(BtShared*, unsigned int, MemPage**, BtCursor*, int)
moveToLeftmost(BtCursor*)
sqlite3VdbeExec(Vdbe*)
sqlite3_step(sqlite3_stmt*)
mozilla::storage::Connection::stepStatement(sqlite3*, sqlite3_stmt*)
Connection::stepStatement SELECT h.id, h.url, h.title AS page_title, h.rev_host, h.visit_count, h.last_visit_date, null, null, null, null, null, null AS tags , h.frecency, h.hidden, h.guid, null, null, null FROM moz_places h WHERE 1  AND hidden = 0 AND last_visit_date NOTNULL   AND ( h.id IN (SELECT place_id FROM moz_historyvisits WHERE visit_type =   :transition0_  ) )  ORDER BY 6 DESC LIMIT 42 
mozilla::storage::Statement::ExecuteStep(bool*)
Statement::ExecuteStep
nsNavHistory::ResultsAsList(mozIStorageStatement*, nsNavHistoryQueryOptions*, nsCOMArray<nsNavHistoryResultNode>*)
nsNavHistory::GetQueryResults(nsNavHistoryQueryResultNode*, RefPtr<nsNavHistoryQuery> const&, RefPtr<nsNavHistoryQueryOptions> const&, nsCOMArray<nsNavHistoryResultNode>*)
nsNavHistoryQueryResultNode::FillChildren()
nsNavHistoryQueryResultNode::OpenContainer()
nsNavHistoryContainerResultNode::SetContainerOpen(bool)
XPTC__InvokebyIndex
static XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
XPC_WN_GetterSetter(JSContext*, unsigned int, JS::Value*)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
js::CallSetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::Handle<JS::Value>)
SetExistingProperty(JSContext*, JS::Handle<JS::PropertyKey>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::Handle<js::NativeObject *>, JS::Handle<JS::PropertyResult>, JS::ObjectOpResult&)
js::NativeSetProperty<js::Qualified>(JSContext*, JS::Handle<js::NativeObject *>, JS::Handle<JS::PropertyKey>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::ObjectOpResult&)
Interpret(JSContext*, js::RunState&)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
set result
js::RunScript
js::CallSetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::Handle<JS::Value>)
SetExistingProperty(JSContext*, JS::Handle<JS::PropertyKey>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::Handle<js::NativeObject *>, JS::Handle<JS::PropertyResult>, JS::ObjectOpResult&)
js::NativeSetProperty<js::Qualified>(JSContext*, JS::Handle<js::NativeObject *>, JS::Handle<JS::PropertyKey>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::ObjectOpResult&)
JS_SetProperty(JSContext*, JS::Handle<JSObject *>, char const*, JS::Handle<JS::Value>)
nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*)
XPCWrappedJS method call
PrepareAndDispatch(nsXPTCStubBase*, unsigned int, unsigned long long*, unsigned long long*, double*)
SharedStub
nsNavHistoryResult::AddObserver(nsINavHistoryResultObserver*, bool)
XPTC__InvokebyIndex
static XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
Interpret(JSContext*, js::RunState&)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
DownloadHistoryList
getList/this._listPromises[key]<
js::RunScript
js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)
PromiseReactionJob(JSContext*, unsigned int, JS::Value*)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)
mozilla::dom::PromiseJobCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::ErrorResult&)
mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint(bool)
promise callback
mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int)
XPCJSContext::AfterProcessTask(unsigned int)
nsThread::ProcessNextEvent(bool, bool*)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()
XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)
XREMain::XRE_main
XRE_main(int, char**, mozilla::BootstrapConfig const&)
wmain(int, wchar_t**)
__scrt_common_main_seh()
BaseThreadInitThunk
RtlUserThreadStart
(root)

This stack was made with Firefox 83, I tried again with the current 88 version but it won't load the profile saying an item was expected but is undefined.

See Also: → 1741865

I am affected by this issue too or maybe a similar one. When I start Firefox, it can take minutes for the tabs to load. The private window is not affected, just the regular windows. I also use multi-account containers actively.

But I noticed, when I start my computer, updatedb.mlocate always runs and uses 100% IO. And when I kill updatedb.mlocate before starting Firefox, I get the tab content instantly. I guess, Firefox has to wait/is slowed down since mlocate hogs the harddisk. I have no idea if anything can be done about that, but maybe others could check their io usage (e.g. using iotop) too.

Dragana, is this actually P2?
P2 is "Fix in the next release cycle or the following (nightly + 1 or nightly + 2)" (https://firefox-source-docs.mozilla.org/bug-mgmt/guides/priority.html#priority-definitions).

Severity: normal → --
Flags: needinfo?(dd.mozilla)

Sorry, I spoke too soon. The issue is still there, removing updatedb.mlocate just helps but does not resolve the issue. Firefox still hangs with 100% IO in [QuotaManager IO] for quite some time after at least some reboots. Whatever it does there, it is IO-wise obviously quite expensive.

Christoph, curious, what kind of hard drive does your system have?

Also, do you think you could try to capture a performance profile (it was discussed earlier in this bug).
https://profiler.firefox.com/
https://profiler.firefox.com/docs/#/./guide-startup-shutdown

I have a Samsung 850 EVO for the OS but my home directory is located Seagate Barracuda with 3 TB, 2.5 Zoll, 5400 U/Min. It isn't the fastest kid on the block but should be fast enough for the usecase "browser startup".

As for the performance profile:
I am not sure I did it correctly. It seems to be too short, I mean, it spent 5 seconds in __poll, but startup took far longer.

$ export MOZ_PROFILER_STARTUP=1
$ firefox
https://share.firefox.dev/3pirUtO

If I did something wrong, please tell me and I can try again.

Hmm, I guess that profile is missing "QuotaManager IO" thread. Could you try to capture also that

I tried, but it doesn't appear in the profile. I just tried, startup took at least a minute, but the profile thread shows only 4 seconds. Are there any settings I could make?

Since somebody above mentioned that PROFILE/storage/default is read by QuotaManager, I just checked, the folder contains 55000 files using over 4 GB. Since i have no comparison, is that normal?

Interestingly, the top 3 domains need 80% of the space. And I don't remember visiting one of the at all (it seems, I visited it 3 months ago).

(In reply to Christoph Rabel from comment #81)

Interestingly, the top 3 domains need 80% of the space. And I don't remember visiting one of the at all (it seems, I visited it 3 months ago).

Can you share which are the top 3 domains (or rather origins) ?
It would be also good to know which sub directory under given origin directory contributes the most to that. It can be one of these "idb", "cache", "ls".

There are some settings for automatic clearing of unused origins. The current threshold is set to 13 months. Maybe you can tweak the setting. I can provide you more details how to do that if you are ok with experimenting with that.

Sure:

  1. app.slack.com I hardly use Slack. I am member of some channel and look into it every couple of weeks.
    2.1G cache
    244K idb
    28K ls
    In the cache folder is a morgue folder that contains the 2.1 GB and 14863 files

  2. sharepoint.com
    873 MB, same picture.

  3. rollingstone.com
    413 MB, same picture. I don't remember visiting the site.

I can certainly remove the files or experiment a bit. The question is: Does this just help only me or everybody? I guess, I could simply delete at least most of these folders. My guess is: If those folders get large, Firefox startup becomes slow. No idea what the code does when reading that stuff, but maybe it could e.g. just skip reading the cache folders at startup till they are used/needed.

The provided information definitely helps everybody. Thanks for that.
We already know about issues with app.slack.com, but when I was investigating this with some other user we were seeing "only" around 2000-3000 files.

Flags: needinfo?(jvarga)

Well, if you need further information, I can do some tests. If not, I will simply delete those pesky folders in a few days and hope that it will resolve the issue.

(In reply to Marco Castelluccio [:marco] from comment #76)

Dragana, is this actually P2?
P2 is "Fix in the next release cycle or the following (nightly + 1 or nightly + 2)" (https://firefox-source-docs.mozilla.org/bug-mgmt/guides/priority.html#priority-definitions).

Necko planned to look at this in Q1 2022.
(but this may not be a necko issue.)

Flags: needinfo?(dd.mozilla)

I finally took the time to do startup profiling with a recent enough version: https://share.firefox.dev/33Rk5mF
I took 43 seconds to show the homepage but I can't really figure out where this is coming from. Just that the hard drive activity LED was flashing the whole time.

Blocks: necko-perf
Severity: -- → S3

Some add-ons like Metamask can load very slowly and interfere with uBlock Origin causing 10+ second delays https://old.reddit.com/r/uBlockOrigin/comments/ryiqst/slow_load_after_firefox_startup/

This has been happening to me consistently for YEARS with whatever the latest version of Firefox is, across various operating systems including Windows and Linux.

Firefox window appears almost instantly at startup, and is fully interactive... You can type in an address, or pick from the bookmarks menu. But it takes 30+ seconds before it will actually connect to a website. After that, it runs normally for the the rest of the session.

Completely deleting the profile folder fixes it. So does deleting all history/cache data from the history menu.

Firefox is hitting the disk the whole time. I've used a perf monitor to see what files are being hit, and it seems to be going through the cache folder file by file for some reason.

For me, it has always just been part of "living with firefox".

I can't believe that only a handful of people are experiencing this.

I do run spinning magnetic hard drives....

I'm guessing that the devs and many users run SSDs all the time, so it masks the problem. Touching thousands of cache files at every startup is SLOW.

I think this might be the issue I've been experiencing - it's slightly faster in Safe Mode, but still rather slow to open. I'd also like to note that I'm using an SSD.

Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload][necko-triaged][fxperf:p2] → [necko-triaged][fxperf:p2]
Whiteboard: [necko-triaged][fxperf:p2] → [necko-triaged]

The severity field for this bug is relatively low, S3. However, the bug has 50 CCs.
:dragana, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dd.mozilla)
Assignee: nobody → acreskey

Redirect a needinfo that is pending on an inactive user to the triage owner.
:dragana, since the bug has high priority and recent activity, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(vinieich) → needinfo?(dd.mozilla)

I believe that I can reproduce the same issue on my MacBookPro (SSD drive)
Using a larger profile that is about 3 years old.

https://share.firefox.dev/38kvZIo

3.7 seconds delay on startup in mozilla::dom::LSObject::EnsureDatabase

With a fresh profile, I don't see this delay.

The delay will increase roughly proportionate to the size of the storage/default folder.

A profile with all threads captured,
https://share.firefox.dev/3adp8RB

As seen elsewhere, we see the busy QuotaManager IO thread in the parent process:

mozilla::dom::quota::QuotaManager::InitializeOrigin(mozilla::dom::quota::PersistenceType, mozilla::dom::quota::OriginMetadata const&, long long, bool, nsIFile*)
mozilla::dom::quota::QuotaManager::LoadQuota()
mozilla::dom::quota::QuotaManager::EnsureTemporaryStorageIsInitialized()
mozilla::dom::(anonymous namespace)::PrepareDatastoreOp::NestedRun()
mozilla::dom::(anonymous namespace)::LSRequestBase::Run()

As far as I can tell, I haven't seen the QuotaManager cache the results of quota initialization.
That is, after a clean shutdown, I can see the same delay on launch.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #71)
...

Potential mitigations / workarounds:

  • QuotaManager tries to cache the results of quota initialization at successful shutdown after successful initialization. It invalidates this cache whenever the build id changes, which causes a potentially very slow initialization scan. By setting the boolean pref dom.quotaManager.caching.checkBuildId to false in nightly in about:config, it can be made to not invalidate the cache due to upgrades. This pref really only exists for testing purposes and was only added to nightly a few days ago, but it's a way to side-step the problem.

Hi Andrew,
I'm seeing that loading any content on startup that makes use of localStorage will block on the QuotaManager initialization.
Removing calls to localStorage defers the initialization so that it doesn't block pageload.

Above you mentioned the QuotaManager trying to cache the results of quota initialization and also invalidating based on the build id changing.
I'm seeing a longer initialization on every launch, independent of a build id change.

Is it possible that the caching mechanism is not functioning, or else is being bypassed in my local tests?
I've tried running with dom.quotaManager.caching.checkBuildId set to false but I see the same initialization delay.

Flags: needinfo?(bugmail)

Yes, setting dom.quotaManager.caching.checkBuildId to false avoids the build id check, but it should be used only for testing. The code for that is located here.

The cache can only be used if it's valid, that is when the previous FF session was correctly closed (there was no crash), so we were able to store the state during shutdown.

If you suspect that the cache isn't used, try to debug the code in this block

Flags: needinfo?(bugmail)

Thank you Jan, this is interesting.
What I'm seeing is that mCacheUsable is true but the cache is not valid.
I'm shutting down Firefox as gracefully as I can, via Quit Nightly.

Hm, can you manually check the <profile>/storage.sqlite database ?
What does select valid from cache; give ?

(In reply to Jan Varga [:janv] from comment #99)

Hm, can you manually check the <profile>/storage.sqlite database ?
What does select valid from cache; give ?

So when using the large profile, the valid field of storage.sqlite is always 0 after a shutdown.

I've instrumented my build and in other scenarios (i.e. generally) the valid field is set to 1 on shutdown in QuotaManager::UnloadQuota().
In these cases we don't invalidate the cache on startup and everything proceeds smoothly.

I've just started debugging the shutdown with the large profile, but it looks like it may be crashing before we reach QuotaManager::UnloadQuote()

[2022-05-31T15:10:02Z ERROR xulstore::persist] removeDocument error: unavailable

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #100)

(In reply to Jan Varga [:janv] from comment #99)

Hm, can you manually check the <profile>/storage.sqlite database ?
What does select valid from cache; give ?

So when using the large profile, the valid field of storage.sqlite is always 0 after a shutdown.

Hm, interesting. We thought that people experience hangs only after an upgrade (which happens much more often on nightly channel of course).
It would be very helpful if you could identify exact place in the code which causes UnloadQuota to finish/return prematurely, so the valid flag is not set to true.

Unsure if this is helpful but I experience this issue on every startup typically except the startup following an upgrade.

Yesterday, I noticed that it actually only seems to occur when my monitor is set to its maximum refresh rate (300hz) and doesn't occur at 240hz or 60hz (the other supported modes). Literally I can open firefox, experience the hang, close firefox, change the refresh rate, and the hang will not occur on next open. Likewise, it returns if I set the refresh rate to 300. Hopefully that's helpful, and not just a description of a different bug with the same symptoms!

https://bugzilla.mozilla.org/show_bug.cgi?id=1734947#c22

(In reply to Jan Varga [:janv] from comment #101)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #100)

(In reply to Jan Varga [:janv] from comment #99)

Hm, can you manually check the <profile>/storage.sqlite database ?
What does select valid from cache; give ?

So when using the large profile, the valid field of storage.sqlite is always 0 after a shutdown.

Hm, interesting. We thought that people experience hangs only after an upgrade (which happens much more often on nightly channel of course).
It would be very helpful if you could identify exact place in the code which causes UnloadQuota to finish/return prematurely, so the valid flag is not set to true.

I'll have a bit of time to narrow this down further, but now I'm seeing that we enter QuotaManager::UnloadQuota() but do not progress past this block of code:
https://searchfox.org/mozilla-central/rev/3419858c997f422e3e70020a46baae7f0ec6dacc/dom/quota/ActorsParent.cpp#4500-4550
i.e. we exit the function before setting the DB as valid.

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #103)

I'll have a bit of time to narrow this down further, but now I'm seeing that we enter QuotaManager::UnloadQuota() but do not progress past this block of code:
https://searchfox.org/mozilla-central/rev/3419858c997f422e3e70020a46baae7f0ec6dacc/dom/quota/ActorsParent.cpp#4500-4550
i.e. we exit the function before setting the DB as valid.

Sure, it must be failing somewhere in that block. All calls in the block use QM_TRY, so you should see warnings in the terminal/system console in debug builds. The warnings contain exact source file name and source line number.

(In reply to Jan Varga [:janv] from comment #104)

Sure, it must be failing somewhere in that block. All calls in the block use QM_TRY, so you should see warnings in the terminal/system console in debug builds. The warnings contain exact source file name and source line number.

Ok, I found two of things:

1 - The presence of a folder with an invalid name (e.g. include a space) in storage/default will prevent the cache from ever becoming valid.
While this shouldn't normally happen, I had manually modified my storage/default and it resulted in these errors:

[Parent 65500, QuotaManager IO] WARNING: The name of the origin directory (http+++asciicker.com test) doesn't match the sanitized origin string (http+++asciicker.com) in the metadata file!: file /Users/acreskey/dev/new/mozilla-central/dom/quota/ActorsParent.cpp:4959

And so QuotaManager::UnloadQuota() would error out mid-way:

[Parent 65435, QuotaManager IO] WARNING: QM_TRY failure (ERROR): 'ToResult(insertStmt->Execute()) failed with resultCode 0x80630003, resultName NS_ERROR_STORAGE_CONSTRAINT', file dom/quota/ActorsParent.cpp:4546

Specifically here.

Is it possible to ignore such files and still allow the caching to proceed?

2 - Independently, in a scenario where the user has a large storage/default folder and the cache is valid, I am still seeing a significant stall on initialization.
We process each file, and the time-consuming work looks to be in mozilla::dom::cache::(anonymous namespace)::ReduceUsageInfo
https://share.firefox.dev/3x0hxhb
I don't know if this is needed on initialization.

Jan and I were working on in this via direct message.
Summary:

  1. Issue 1 from Comment 105 is a known issue (user creates a misnamed folder), but likely not a common occurrence

  2. We identified a case where loading a successfully cached quota could also fail.
    Jan pinpointed the cause, a bug in deserialization. Should be ToInteger64. See Bug 1702600.

This was happening on my profile and is certainly occurring for other users.

Flags: needinfo?(dd.mozilla)
Depends on: 1702600
Flags: needinfo?(dd.mozilla)

Patches for bug 1702600 are up for review.

I've pulled in the latest mozilla-central with the fix for bug 1702600 merged.

As expected, once the local storage cache is valid (i.e. subsequent runs after an upgrade) I can no longer reproduce the delayed initial network request.

I propose that we close this as fixed.

Jan fixed the general case of this issue with Bug 1702600.

On cache invalidation due to Firefox upgrade users will experience a delay based on the size of their local storage and the speed of their hard drive.
I propose we log that as a new issue.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
OS: Linux → All
See Also: → 1774924

Sorry, but is this supposed to be fixed in 103.0b3 (64-Bit)? In that case: It is not fixed at all for me.

I waited really hard for 103b, because this issue is really annoying. I got the upgrade for that version and restarted Firefox after the upgrade. It took a while. Ok, AFAIK the fix only helps when there was no Firefox upgrade, right?

To test the issue, I rebooted the computer. Starting Firefox took a very long, long time. Alas, I didn't measure it.

To make sure, I rebooted again now and measured it: 18 seconds from clicking the Firefox icon till a page was shown.
When I close Firefox again and start it from scratch: 7 seconds for the same page.
(Note: I just measured the page: It needs 3,6 seconds to load)

I am pretty sure it took much longer than 18 seconds at the first reboot, but alas, I didn't measure it.

Please note: While I am logged in, I can restart Firefox as often as I want and it always starts fast. BUT after a reboot, it's still slow.

I rebooted again:
<Click Firefox Icon>
$date
Sa 02 Jul 2022 11:15:35 CEST
<Page loaded>
$ date
Sa 02 Jul 2022 11:16:23 CEST

Ok, another reboot and this time it was RIDICULOUS:
<Click Firefox Icon>
$ date
So 03 Jul 2022 19:35:30 CEST
<Page loaded>
$ date
So 03 Jul 2022 19:39:55 CEST

Firefox barely needed CPU during that time, but I could see Quotamanager IO all the time in iotop.

Christoph, your issue might be a different one, maybe open a new bug?

I can do that, but I believe, it is still the same issue. At least the initial reporter, Akshay S Dinesh, seems to be affected from the same issue. Just read his comments, e.g.:

"After deleting the archived folder, and now restarting the computer, the startup bug is still there. That obviously disconnects this from the datareporting service. I'm sorry."

I mean, you apparently fixed one problem in Bug 1702600 which might or might not have been the root cause for this issue and closed it. You never confirmed, that it actually fixes this issue, you just assumed, it does.

Also, the fix in 1702600 did nothing about the performance issue at startup itself.

Please note: While I am logged in, I can restart Firefox as often as I want and it always starts fast. BUT after a reboot, it's still slow.

Thanks for testing this out Christoph.
Yes, please do file this is a new bug where after a reboot the first page load is slow.

There are multiple issues that can delay first load (such as the upgrade), so it's more effective for us if each is logged and fixed separately.

Just for reference, I have opened Bug 1778374.

See Also: → 1778374
See Also: → 1899015
You need to log in before you can comment on or make changes to this bug.