Thunderbird 78.x slow startup each time caches are invalidated - 40 seconds vs 5 seconds. Firefox has same problem. 86.0b is fine.
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird_esr78 affected, thunderbird86 unaffected)
| Tracking | Status | |
|---|---|---|
| thunderbird_esr78 | --- | affected |
| thunderbird86 | --- | unaffected |
People
(Reporter: cmgaudry33, Unassigned)
References
Details
(Keywords: perf, regression, regressionwindow-wanted)
Each time, the startup cache is invalidated (because of version update, compatibility.ini file or the startup cache is missing from the user profile), Thunderbird 78 (from 78.0 to 78.7.1) is starting too slowly (about 40 seconds instead of 5 seconds).
If you restart Thunderbird, everything seems to be fine (starts in 5 seconds).
Please see the shared profile : https://share.firefox.dev/3qEwtgr
You will see that writing to the cache is locked.
| Reporter | ||
Comment 1•5 years ago
|
||
I'm looking for why there three 10 seconds timeouts are occurring while the cache is locked (see Marker Chart).
| Reporter | ||
Comment 2•5 years ago
•
|
||
Same problem with Firefox 78.7.1, starting without any caches, (https://share.firefox.dev/3bAiAJO) where releasing cache write output stream is locked multiple times and takes too much time.
If caches are already there, everything is OK : https://share.firefox.dev/3dDA5eR
| Reporter | ||
Comment 3•5 years ago
|
||
Enabling profiling or tracing (NSPR_LOG_MODULES) speeds up starting : 12 seconds instead of 40 seconds.
| Reporter | ||
Comment 4•5 years ago
|
||
Using Thunderbird 86.0b3 with the same configuration and profile, without any profiling or tracing: starting duration is divided by 4 (10 seconds instead of 40 seconds).
| Reporter | ||
Comment 5•5 years ago
|
||
Is anyone working on this lock issue?
In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?
Comment 6•5 years ago
|
||
(In reply to Chris GAUDRY [:cmgaudry33] from comment #5)
In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?
I'm guessing no - they tend not to do that for ESR. Which bug# implements this for Firefox?
| Reporter | ||
Comment 7•5 years ago
•
|
||
(In reply to Wayne Mery (:wsmwk) from comment #6)
(In reply to Chris GAUDRY [:cmgaudry33] from comment #5)
In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?
I'm guessing no - they tend not to do that for ESR. Which bug# implements this for Firefox?
So nobody is trying to solve it (or bug 1649103).
This is really a bad news.
So Firefox 78 and Thunderbird 78 cannot be sent to production at all.
All my users will stay on 68.9.0 versions without any updates for one year. Or they need to change all their computers to more powerful ones to mitigate this issue. It is quite disappointing.
PS : But what happens when a user goes directly from 68 to 91? At least there will be a problem with migrating address books that have been deleted from newer versions.
Comment 8•5 years ago
|
||
Thanks for the detailed information you have provided thus far. Let's press on...
(In reply to Chris GAUDRY [:cmgaudry33] from comment #7)
(In reply to Wayne Mery (:wsmwk) from comment #6)
(In reply to Chris GAUDRY [:cmgaudry33] from comment #5)
In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?
I'm guessing no - they tend not to do that for ESR. Which bug# implements this for Firefox?
So nobody is trying to solve it
The cause (and even the full conditions) of "it" hasn't yet been clearly identified, so I don't see how you arrive at a conclusion of nobody is trying to solve it.
(or bug 1649103).
Richard has a wide range of issues which are still being worked on (for example bug 1660672). But as far as I can tell not, or mostly not, directly related to what you have reported here. But if you find direct correlation in a specific area, please point it out.
So Firefox 78 and Thunderbird 78 cannot be sent to production at all.
All my users will stay on 68.9.0 versions without any updates for one year. Or they need to change all their computers to more powerful ones to mitigate this issue. It is quite disappointing.
This is the first time you are stating that the problem wasn't seen in version 68. That is helpful. Note - 68.9.0 is not the most current version of 68.
I just realized we don't know some things about your environment - questions for you:
- Profile is on local spinning disk (not SSD), and also not a roaming profile?
- What type of cache are you referring to - locally stored imap message bodies, or the cache in \AppData\local\Thunderbird ?
- What is the size of the Inbox(s) of the accounts in the profile?
- To what extent does this reproduce with Windows started in safe mode?
- Have you personally been using version 78 for some months or weeks prior to reporting this bug?
- This problem isn't also blocking you from updating these same users to Firefox 78?
- (not a question for you) 8 months into version 78, why has no one else reported this issue? (I haven't found any bug reports similar to yours
FWIW, I noticed a significant chunk of jank in https://share.firefox.dev/3qEwtgr from comment 0 is nsObserverService::NotifyObservers addrbook-bulk-add-cards
PS : But what happens when a user goes directly from 68 to 91? At least there will be a problem with migrating address books that have been deleted from newer versions.
I don't understand your question.
Comment 9•5 years ago
|
||
Richard, is this something you have seen?
With more information, I will attempt to reproduce this.
Comment 10•5 years ago
|
||
Hi Chris, Wayne,
@Chris,
How/Where do you see the cache invalidated in the perf profile?
In addition of Wayne environment questions above in Comment 8, can you provide:
-
the list of feature in use/enabled in the affected profile?
Email, Network Addressbook, Network Calendar, etc... how many accounts of which type? Have you tried disabling features to identify which one may be the possible culprit? -
any plug-in enabled - does the same issue happens in Safe Mode / All plug-ins disabled?
@Wayne,
In Chris perf profile, in the Marker Table tab, I noticed a jump of logs between 5.842s and 15.976s without any lines recorded in between...
Could that be indicative of some sort of an issue within core?
(In reply to Wayne Mery (:wsmwk) from comment #9)
Richard, is this something you have seen?
Well I can still see slowness in starting TB, but would that be linked to issue reported here or not, I cannot tell... I am not sure what to look for in the perf profile.
I had previously publish a certain number of perf profiles here...
https://bugzilla.mozilla.org/show_bug.cgi?id=1649103#c52
Can Chris or you spot anything in them from there?
With more information, I will attempt to reproduce this.
If you can advise on the steps to reproduce and what to look for in the perf profile I can have a look... I currently use 87.0b2 (64-bit) but I also have the 78.x branch installed for testing purpose...
My understanding from cache is that it would only be valid for a certain period of time so could it be that it is invalidated after a certain time period?
I also noticed that if I restart TB my network CalDAV calendar would be re-loaded even though cache is enabled... I think a sync would still occur to compare cache entry and data on the server anyway... but still much faster with cache enabled than without cache... I am not familiar with the CardDAV feature of TB just yet... so I may not have the same environment as Chris...
Regards,
Comment 11•5 years ago
|
||
@Chris,
Does the same issue raise:
- if you clear cache in the existing affected profile?
- in a newly created profile?
Regards,
Comment 12•5 years ago
|
||
(In reply to Richard Leger from comment #10)
Hi Chris, Wayne,
@Chris,
How/Where do you see the cache invalidated in the perf profile?
In addition of Wayne environment questions above in Comment 8, can you provide:
the list of feature in use/enabled in the affected profile?
Email, Network Addressbook, Network Calendar, etc... how many accounts of which type? Have you tried disabling features to identify which one may be the possible culprit?any plug-in enabled - does the same issue happens in Safe Mode / All plug-ins disabled?
@Wayne,
In Chris perf profile, in the Marker Table tab, I noticed a jump of logs between 5.842s and 15.976s without any lines recorded in between...
Could that be indicative of some sort of an issue within core?
I assume you are referring to the profile in comment 0 https://profiler.firefox.com/public/z4sanbcf2sw0ysy1hx9m7862tv9bz8ym57hjdq0/marker-chart/?globalTrackOrder=0&hiddenLocalTracksByPid=1152-1-2&localTrackOrderByPid=1152-0-1-2~&thread=0&v=5
In general, activity without symbols would I think be related to things happening in the OS into which the profiler has no visibility. In this specific case I don't know - but I'm not yet good at reading these profiles. We need someone who has the skills to guide us.
In the firefox profile https://profiler.firefox.com/public/8aycw69nrke474f1cmgx0ck3e1g2598as9j3zt0/network-chart/?globalTrackOrder=0-1-2&hiddenLocalTracksByPid=7708-0-2-3-4&localTrackOrderByPid=7708-6-0-1-2-3-4-5~8212-1-0~1548-0~&thread=0&v=5 the middle third of the profile is activity related to https://portail.toto.fr/toto/
Comment 13•5 years ago
|
||
It looks like there are major slow downs caused EMET (Enhanced Mitigation Experience Toolkit). There's a big one around MOZ_XML_Parse and both other places where we see a big pause, we see it around thread creation which it also looks like EMET is interfering with.
Chris, can you try disabling EMET? This article might help with that:
https://support.solarwinds.com/SuccessCenter/s/article/Stop-the-Microsoft-EMET-service?language=en_US
Comment 14•5 years ago
|
||
Thanks Jeff for the analysis. The details from from matrix chat...
wsmwk
We're really curious what is happening with https://bugzilla.mozilla.org/show_bug.cgi?id=1693810#c0 ... and in light of the follow up at comment 12
jrmuizel
wsmwk: perhaps EMET is causing badness
wsmwk: the long pauses are pretty weird
wsmwk
that wouldn't surprise me, because the user has not yet attempted to reproduce in windows safe mode
where do you detect EMET?
jrmuizel
Being called from MOZ_XML_Parse around 16.8s into the profile from comment 0
wsmwk
jrmuizel: and that activity is being account for, at least in part, under WaitForMultipleObjectsExImplementation ? ~26%
... there is also whatever is happening between 23s and 34s
jrmuizel
yeah in all cases we're stuck waiting for something
the profile doesn't really give an indication of what we're waiting for
actually maybe it does
it looks like perhaps thread creation is being made really slow by EMET
both big waits are around the place that we create a new thread
and there's EMET.dll inbetween our calls to beginthreadex and the call to NtCreateThreadEx
so I'd say it's not worth further investigation until EMET is eliminated
Comment 15•5 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #12)
I assume you are referring to the profile in comment 0 https://profiler.firefox.com/public/z4sanbcf2sw0ysy1hx9m7862tv9bz8ym57hjdq0/marker-chart/?globalTrackOrder=0&hiddenLocalTracksByPid=1152-1-2&localTrackOrderByPid=1152-0-1-2~&thread=0&v=5
In addition to the long time spent in EMET.dll, the profile also reminds me of bug 1667581 (nsSegmentedBuffer::FreeOMT) and bug 1510226 (NS_NewNamedThread blocking the main thread).
| Reporter | ||
Comment 16•5 years ago
•
|
||
Our team looked for a way to speed up the startup of Thunderbird.
On versions prior to 78.7.1 included, you must set the preference "nglayout.debug.disable_xul_cache" to true, to change from 40s to 10s.
Since version 78.8.0, this trick has no effect, Thunderbird starts naturally in 10s.
Still, I guess the problem may come back if the racing conditions change.
Is it possible that one of your specialists takes a look at the interaction that the XUL layer can have on the startupCache ?
Anyway, I'll test with the "nglayout.debug.disable_xul_cache" preference set to false and EMET disabled, and let you know.
...
...
Disabling the EMET service has no impact: the start is always done in 36 or 40s.
Comment 17•5 years ago
|
||
(In reply to Chris GAUDRY [:cmgaudry33] from comment #16)
Our team looked for a way to speed up the startup of Thunderbird.
On versions prior to 78.7.1 included, you must set the preference "nglayout.debug.disable_xul_cache" to true, to change from 40s to 10s.
Since version 78.8.0, this trick has no effect, Thunderbird starts naturally in 10s.
So you are saying in version 78.8.0, startup time is back to normal/acceptable?
Comment 18•5 years ago
•
|
||
One change which seems to be in Thunderbird version 78.0 is Bug 1641866 - XUL prototype cache preference ( nglayout.debug.disable_xul_cache) is not updated correctly if deleted at runtime.
The net effect of that change to Thunderbird performance when updates occur is not clear to me.
| Reporter | ||
Comment 19•5 years ago
•
|
||
For your information, in our environment, EMET is configured with "thunderbird.exe -ASR -EAF+". It makes sense that disabling EMET doesn't change anything.
Yes, I'm saying that with Thunderbird 78.8.0 and 78.8.1, starting duration is about 10 seconds, the same as with 86.0b3 and 87.0b2.
With Thunderbird 68.9.0, it takes half time.
Comment 20•5 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #18)
Why would it be deleted at runtime, ever?
| Reporter | ||
Comment 21•5 years ago
|
||
I have also tested Thunderbird 74.0b1, 74.0b2, 75.0b1, 76.0b1, 77.0b1.
Slow start (40 seconds) occurs since 75.0b1 to 78.7.1.
Comment 22•5 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #20)
(In reply to Wayne Mery (:wsmwk) from comment #18)
Why would it be deleted at runtime, ever?
Dunno. But I didn't title the bug report. In any event, losing the cache should not be normal, unless one is using roaming profiles, right?
Comment 23•4 years ago
|
||
(In reply to Chris GAUDRY [:cmgaudry33] from comment #21)
I have also tested Thunderbird 74.0b1, 74.0b2, 75.0b1, 76.0b1, 77.0b1.
Slow start (40 seconds) occurs since 75.0b1 to 78.7.1.
Can you find the exact regression range using https://mozilla.github.io/mozregression/ ?
Comment 24•4 years ago
•
|
||
Chris, is a newer beta still "bad"?
(In reply to Wayne Mery (:wsmwk) from comment #23)
(In reply to Chris GAUDRY [:cmgaudry33] from comment #21)
I have also tested Thunderbird 74.0b1, 74.0b2, 75.0b1, 76.0b1, 77.0b1.
Slow start (40 seconds) occurs since 75.0b1 to 78.7.1.Can you find the exact regression range using https://mozilla.github.io/mozregression/ ?
Updated•3 years ago
|
| Reporter | ||
Comment 26•3 years ago
|
||
Hi,
You can close this bug.
Regards
Comment 28•3 years ago
|
||
presumably WFM
Description
•