Closed Bug 1693810 Opened 5 years ago Closed 3 years ago

Thunderbird 78.x slow startup each time caches are invalidated - 40 seconds vs 5 seconds. Firefox has same problem. 86.0b is fine.

Categories

(Thunderbird :: General, defect)

x86_64
Windows 7
defect

Tracking

(thunderbird_esr78 affected, thunderbird86 unaffected)

RESOLVED WORKSFORME
Tracking Status
thunderbird_esr78 --- affected
thunderbird86 --- unaffected

People

(Reporter: cmgaudry33, Unassigned)

References

Details

(Keywords: perf, regression, regressionwindow-wanted)

Each time, the startup cache is invalidated (because of version update, compatibility.ini file or the startup cache is missing from the user profile), Thunderbird 78 (from 78.0 to 78.7.1) is starting too slowly (about 40 seconds instead of 5 seconds).
If you restart Thunderbird, everything seems to be fine (starts in 5 seconds).

Please see the shared profile : https://share.firefox.dev/3qEwtgr

You will see that writing to the cache is locked.

Keywords: perf

I'm looking for why there three 10 seconds timeouts are occurring while the cache is locked (see Marker Chart).

Same problem with Firefox 78.7.1, starting without any caches, (https://share.firefox.dev/3bAiAJO) where releasing cache write output stream is locked multiple times and takes too much time.
If caches are already there, everything is OK : https://share.firefox.dev/3dDA5eR

Enabling profiling or tracing (NSPR_LOG_MODULES) speeds up starting : 12 seconds instead of 40 seconds.

Using Thunderbird 86.0b3 with the same configuration and profile, without any profiling or tracing: starting duration is divided by 4 (10 seconds instead of 40 seconds).

Is anyone working on this lock issue?
In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?

Flags: needinfo?(vseerror)

(In reply to Chris GAUDRY [:cmgaudry33] from comment #5)

In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?

I'm guessing no - they tend not to do that for ESR. Which bug# implements this for Firefox?

Flags: needinfo?(vseerror) → needinfo?(chris.m.gaudry)
Summary: Slow performance of Thunderbird 78.x at startup each time caches are invalidated → Slow performance of Thunderbird 78.x at startup each time caches are invalidated. Firefox has same problem.

(In reply to Wayne Mery (:wsmwk) from comment #6)

(In reply to Chris GAUDRY [:cmgaudry33] from comment #5)

In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?

I'm guessing no - they tend not to do that for ESR. Which bug# implements this for Firefox?

So nobody is trying to solve it (or bug 1649103).
This is really a bad news.
So Firefox 78 and Thunderbird 78 cannot be sent to production at all.
All my users will stay on 68.9.0 versions without any updates for one year. Or they need to change all their computers to more powerful ones to mitigate this issue. It is quite disappointing.

PS : But what happens when a user goes directly from 68 to 91? At least there will be a problem with migrating address books that have been deleted from newer versions.

Flags: needinfo?(chris.m.gaudry) → needinfo?(vseerror)

Thanks for the detailed information you have provided thus far. Let's press on...

(In reply to Chris GAUDRY [:cmgaudry33] from comment #7)

(In reply to Wayne Mery (:wsmwk) from comment #6)

(In reply to Chris GAUDRY [:cmgaudry33] from comment #5)

In version 78, will there be a backport of the modifications made to the StartupCache class or to the locks that the cache mechanism uses?

I'm guessing no - they tend not to do that for ESR. Which bug# implements this for Firefox?

So nobody is trying to solve it

The cause (and even the full conditions) of "it" hasn't yet been clearly identified, so I don't see how you arrive at a conclusion of nobody is trying to solve it.

(or bug 1649103).

Richard has a wide range of issues which are still being worked on (for example bug 1660672). But as far as I can tell not, or mostly not, directly related to what you have reported here. But if you find direct correlation in a specific area, please point it out.

So Firefox 78 and Thunderbird 78 cannot be sent to production at all.
All my users will stay on 68.9.0 versions without any updates for one year. Or they need to change all their computers to more powerful ones to mitigate this issue. It is quite disappointing.

This is the first time you are stating that the problem wasn't seen in version 68. That is helpful. Note - 68.9.0 is not the most current version of 68.

I just realized we don't know some things about your environment - questions for you:

  1. Profile is on local spinning disk (not SSD), and also not a roaming profile?
  2. What type of cache are you referring to - locally stored imap message bodies, or the cache in \AppData\local\Thunderbird ?
  3. What is the size of the Inbox(s) of the accounts in the profile?
  4. To what extent does this reproduce with Windows started in safe mode?
  5. Have you personally been using version 78 for some months or weeks prior to reporting this bug?
  6. This problem isn't also blocking you from updating these same users to Firefox 78?
  7. (not a question for you) 8 months into version 78, why has no one else reported this issue? (I haven't found any bug reports similar to yours

FWIW, I noticed a significant chunk of jank in https://share.firefox.dev/3qEwtgr from comment 0 is nsObserverService::NotifyObservers addrbook-bulk-add-cards

PS : But what happens when a user goes directly from 68 to 91? At least there will be a problem with migrating address books that have been deleted from newer versions.

I don't understand your question.

Flags: needinfo?(vseerror) → needinfo?(chris.m.gaudry)
See Also: → 1649103
Summary: Slow performance of Thunderbird 78.x at startup each time caches are invalidated. Firefox has same problem. → Thunderbird 78.x slow startup each time caches are invalidated - 40 seconds vs 5 seconds. Firefox has same problem. Beta 86 is fine.

Richard, is this something you have seen?

With more information, I will attempt to reproduce this.

Flags: needinfo?(richard.leger)

Hi Chris, Wayne,

@Chris,

How/Where do you see the cache invalidated in the perf profile?

In addition of Wayne environment questions above in Comment 8, can you provide:

  • the list of feature in use/enabled in the affected profile?
    Email, Network Addressbook, Network Calendar, etc... how many accounts of which type? Have you tried disabling features to identify which one may be the possible culprit?

  • any plug-in enabled - does the same issue happens in Safe Mode / All plug-ins disabled?

@Wayne,

In Chris perf profile, in the Marker Table tab, I noticed a jump of logs between 5.842s and 15.976s without any lines recorded in between...
Could that be indicative of some sort of an issue within core?

(In reply to Wayne Mery (:wsmwk) from comment #9)

Richard, is this something you have seen?

Well I can still see slowness in starting TB, but would that be linked to issue reported here or not, I cannot tell... I am not sure what to look for in the perf profile.

I had previously publish a certain number of perf profiles here...
https://bugzilla.mozilla.org/show_bug.cgi?id=1649103#c52

Can Chris or you spot anything in them from there?

With more information, I will attempt to reproduce this.

If you can advise on the steps to reproduce and what to look for in the perf profile I can have a look... I currently use 87.0b2 (64-bit) but I also have the 78.x branch installed for testing purpose...

My understanding from cache is that it would only be valid for a certain period of time so could it be that it is invalidated after a certain time period?

I also noticed that if I restart TB my network CalDAV calendar would be re-loaded even though cache is enabled... I think a sync would still occur to compare cache entry and data on the server anyway... but still much faster with cache enabled than without cache... I am not familiar with the CardDAV feature of TB just yet... so I may not have the same environment as Chris...

Regards,

Flags: needinfo?(richard.leger)

@Chris,

Does the same issue raise:

  • if you clear cache in the existing affected profile?
  • in a newly created profile?

Regards,

(In reply to Richard Leger from comment #10)

Hi Chris, Wayne,

@Chris,

How/Where do you see the cache invalidated in the perf profile?

In addition of Wayne environment questions above in Comment 8, can you provide:

  • the list of feature in use/enabled in the affected profile?
    Email, Network Addressbook, Network Calendar, etc... how many accounts of which type? Have you tried disabling features to identify which one may be the possible culprit?

  • any plug-in enabled - does the same issue happens in Safe Mode / All plug-ins disabled?

@Wayne,

In Chris perf profile, in the Marker Table tab, I noticed a jump of logs between 5.842s and 15.976s without any lines recorded in between...
Could that be indicative of some sort of an issue within core?

I assume you are referring to the profile in comment 0 https://profiler.firefox.com/public/z4sanbcf2sw0ysy1hx9m7862tv9bz8ym57hjdq0/marker-chart/?globalTrackOrder=0&hiddenLocalTracksByPid=1152-1-2&localTrackOrderByPid=1152-0-1-2~&thread=0&v=5

In general, activity without symbols would I think be related to things happening in the OS into which the profiler has no visibility. In this specific case I don't know - but I'm not yet good at reading these profiles. We need someone who has the skills to guide us.

In the firefox profile https://profiler.firefox.com/public/8aycw69nrke474f1cmgx0ck3e1g2598as9j3zt0/network-chart/?globalTrackOrder=0-1-2&hiddenLocalTracksByPid=7708-0-2-3-4&localTrackOrderByPid=7708-6-0-1-2-3-4-5~8212-1-0~1548-0~&thread=0&v=5 the middle third of the profile is activity related to https://portail.toto.fr/toto/

It looks like there are major slow downs caused EMET (Enhanced Mitigation Experience Toolkit). There's a big one around MOZ_XML_Parse and both other places where we see a big pause, we see it around thread creation which it also looks like EMET is interfering with.

Chris, can you try disabling EMET? This article might help with that:
https://support.solarwinds.com/SuccessCenter/s/article/Stop-the-Microsoft-EMET-service?language=en_US

Thanks Jeff for the analysis. The details from from matrix chat...

wsmwk
We're really curious what is happening with https://bugzilla.mozilla.org/show_bug.cgi?id=1693810#c0 ... and in light of the follow up at comment 12

jrmuizel
wsmwk: perhaps EMET is causing badness
wsmwk: the long pauses are pretty weird

wsmwk
that wouldn't surprise me, because the user has not yet attempted to reproduce in windows safe mode
where do you detect EMET?

jrmuizel
Being called from MOZ_XML_Parse around 16.8s into the profile from comment 0

wsmwk
jrmuizel: and that activity is being account for, at least in part, under WaitForMultipleObjectsExImplementation ? ~26%
... there is also whatever is happening between 23s and 34s

jrmuizel
yeah in all cases we're stuck waiting for something
the profile doesn't really give an indication of what we're waiting for
actually maybe it does
it looks like perhaps thread creation is being made really slow by EMET
both big waits are around the place that we create a new thread
and there's EMET.dll inbetween our calls to beginthreadex and the call to NtCreateThreadEx
so I'd say it's not worth further investigation until EMET is eliminated

(In reply to Wayne Mery (:wsmwk) from comment #12)

I assume you are referring to the profile in comment 0 https://profiler.firefox.com/public/z4sanbcf2sw0ysy1hx9m7862tv9bz8ym57hjdq0/marker-chart/?globalTrackOrder=0&hiddenLocalTracksByPid=1152-1-2&localTrackOrderByPid=1152-0-1-2~&thread=0&v=5

In addition to the long time spent in EMET.dll, the profile also reminds me of bug 1667581 (nsSegmentedBuffer::FreeOMT) and bug 1510226 (NS_NewNamedThread blocking the main thread).

Our team looked for a way to speed up the startup of Thunderbird.
On versions prior to 78.7.1 included, you must set the preference "nglayout.debug.disable_xul_cache" to true, to change from 40s to 10s.
Since version 78.8.0, this trick has no effect, Thunderbird starts naturally in 10s.
Still, I guess the problem may come back if the racing conditions change.
Is it possible that one of your specialists takes a look at the interaction that the XUL layer can have on the startupCache ?

Anyway, I'll test with the "nglayout.debug.disable_xul_cache" preference set to false and EMET disabled, and let you know.
...
...
Disabling the EMET service has no impact: the start is always done in 36 or 40s.

Flags: needinfo?(chris.m.gaudry) → needinfo?(vseerror)

(In reply to Chris GAUDRY [:cmgaudry33] from comment #16)

Our team looked for a way to speed up the startup of Thunderbird.
On versions prior to 78.7.1 included, you must set the preference "nglayout.debug.disable_xul_cache" to true, to change from 40s to 10s.
Since version 78.8.0, this trick has no effect, Thunderbird starts naturally in 10s.

So you are saying in version 78.8.0, startup time is back to normal/acceptable?

Flags: needinfo?(vseerror) → needinfo?(chris.m.gaudry)

One change which seems to be in Thunderbird version 78.0 is Bug 1641866 - XUL prototype cache preference ( nglayout.debug.disable_xul_cache) is not updated correctly if deleted at runtime.

The net effect of that change to Thunderbird performance when updates occur is not clear to me.

Flags: needinfo?(mkmelin+mozilla)

For your information, in our environment, EMET is configured with "thunderbird.exe -ASR -EAF+". It makes sense that disabling EMET doesn't change anything.

Yes, I'm saying that with Thunderbird 78.8.0 and 78.8.1, starting duration is about 10 seconds, the same as with 86.0b3 and 87.0b2.
With Thunderbird 68.9.0, it takes half time.

Flags: needinfo?(chris.m.gaudry) → needinfo?(vseerror)

(In reply to Wayne Mery (:wsmwk) from comment #18)
Why would it be deleted at runtime, ever?

Flags: needinfo?(mkmelin+mozilla)

I have also tested Thunderbird 74.0b1, 74.0b2, 75.0b1, 76.0b1, 77.0b1.
Slow start (40 seconds) occurs since 75.0b1 to 78.7.1.

(In reply to Magnus Melin [:mkmelin] from comment #20)

(In reply to Wayne Mery (:wsmwk) from comment #18)
Why would it be deleted at runtime, ever?

Dunno. But I didn't title the bug report. In any event, losing the cache should not be normal, unless one is using roaming profiles, right?

Flags: needinfo?(vseerror)

(In reply to Chris GAUDRY [:cmgaudry33] from comment #21)

I have also tested Thunderbird 74.0b1, 74.0b2, 75.0b1, 76.0b1, 77.0b1.
Slow start (40 seconds) occurs since 75.0b1 to 78.7.1.

Can you find the exact regression range using https://mozilla.github.io/mozregression/ ?

Flags: needinfo?(chris.m.gaudry)

Chris, is a newer beta still "bad"?

(In reply to Wayne Mery (:wsmwk) from comment #23)

(In reply to Chris GAUDRY [:cmgaudry33] from comment #21)

I have also tested Thunderbird 74.0b1, 74.0b2, 75.0b1, 76.0b1, 77.0b1.
Slow start (40 seconds) occurs since 75.0b1 to 78.7.1.

Can you find the exact regression range using https://mozilla.github.io/mozregression/ ?

Severity: -- → S4
Flags: needinfo?(chris.m.gaudry)

Ping. Can you give us in update? Thanks

Flags: needinfo?(chris.m.gaudry)

Hi,

You can close this bug.

Regards

Flags: needinfo?(chris.m.gaudry)

WONTFIX?

Flags: needinfo?(chris.m.gaudry)

presumably WFM

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(chris.m.gaudry)
Resolution: --- → WORKSFORME
Summary: Thunderbird 78.x slow startup each time caches are invalidated - 40 seconds vs 5 seconds. Firefox has same problem. Beta 86 is fine. → Thunderbird 78.x slow startup each time caches are invalidated - 40 seconds vs 5 seconds. Firefox has same problem. 86.0b is fine.
You need to log in before you can comment on or make changes to this bug.