Closed Bug 1668811 Opened 4 years ago Closed 3 years ago

typing lags and CPU ~10%. Constantly querying registry "software\MicrosoftWindows\currentVersion\theme". High addressbook query via Gloda. Windows Server 2019 RDSH

Categories

(Thunderbird :: Theme, defect)

Unspecified
Windows
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: duparchy, Unassigned)

References

Details

(Keywords: perf, regression, regressionwindow-wanted)

Attachments

(4 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0

Steps to reproduce:

Using TB 78.3.1 64bits the normal way...
Three IMAP accounts.
One shared account with quite a lot of emails/folders.
No local caching of IMAP emails.

Actual results:

After a while TB lags when typing and uses ~10% CPU time (we're talking about a Windows 2019 RDSH server with 14 CPUs...)

Tried disabling the global index.
Tried increasing the compacting threshold.
Neither helped.

My only clue comes from Sysinternals Process Monitor, which shows that under some circumstances TB is constantly querying the registry for "software\MicrosoftWindows\currentVersion\theme".

I've been able to reproduce this by starting TB and not typing the password.
The TB password prompt is waiting, and Process Monitor shows constant registry activity.
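
To put numbers on this kind of observation, a Process Monitor capture can be saved as CSV and the registry operations counted per path. The following is a minimal sketch, assuming the export uses the column names "Process Name", "Operation" and "Path"; the sample rows and registry paths are hypothetical placeholders, not data from the attached captures.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a Process Monitor CSV export. The registry paths are
# placeholders, not the keys seen in the attached captures.
SAMPLE_CSV = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:00.001","thunderbird.exe","4242","RegQueryValue","HKCU\Software\Example\Theme\ValueA","SUCCESS",""
"10:00:00.002","thunderbird.exe","4242","RegQueryValue","HKCU\Software\Example\Theme\ValueA","SUCCESS",""
"10:00:00.003","thunderbird.exe","4242","RegOpenKey","HKCU\Software\Example\Theme","SUCCESS",""
"10:00:00.004","thunderbird.exe","4242","ReadFile","C:\Example\abook.sqlite","SUCCESS",""
"10:00:00.005","explorer.exe","1000","RegQueryValue","HKCU\Software\Example\Theme\ValueA","SUCCESS",""
"10:00:00.006","thunderbird.exe","4242","RegQueryValue","HKCU\Software\Example\Theme\ValueA","SUCCESS",""
'''


def count_registry_paths(csv_text, process_name="thunderbird.exe"):
    """Count registry operations per path for a single process."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Process Name"].lower() != process_name:
            continue  # ignore other processes
        if not row["Operation"].startswith("Reg"):
            continue  # keep only registry operations (RegQueryValue, RegOpenKey, ...)
        counts[row["Path"]] += 1
    return counts


counts = count_registry_paths(SAMPLE_CSV)
# List the most frequently accessed registry paths first.
for path, n in counts.most_common():
    print(n, path)
```

On a real capture, replace `SAMPLE_CSV` with the contents of the exported file. A path that dominates the output while Thunderbird is idle is a candidate for the thrashing described above.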

Expected results:

Not eating my CPU

Attached image TB1.png

Process Monitor Capture

Attached image TB2.png

another capture. Other Registry keys

Martin, do you have an opinion on the registry activity generating CPU activity? It seems to be theme related?

duparchy, please capture a profile using https://github.com/thunderbird-conversations/thunderbird-conversations/wiki/Profiling-Conversation's-Performance and post the URL here in the bug report.

Component: Untriaged → Theme
Flags: needinfo?(richard.marti)
Flags: needinfo?(duparchy)
Keywords: perf
Summary: Thunderbird - high CPU - thrashing Windows registry → typing lags and CPU ~10%. Also thrashing Windows registry

I see no such massive theme requests with a long active Daily. What theme is used on TB and Windows?

Flags: needinfo?(richard.marti)

Thunderbird theme is the default one. Although there are older, not compatible, themes present.

Windows theme is "Windows 10" theme. I don't remember if this theme was already there in W2016 / W2019 RDSH or if it was imported.

I tried w/ a fresh TB profile several times, on different computers.
Results are not consistent... but it seems that after a while and several restarts Thunderbird stays quiet, at last.

Several times I got the never-ending registry activity, related to theme, while Thunderbird was waiting for the password.

I got another seemingly never-ending activity related to the abook and history databases, just after typing the password (see attachment), although I was doing nothing. There was for sure no history to keep nor any address to collect...?

Flags: needinfo?(duparchy)
Attached image TB3.png

I will try to refresh the TB setup of users who are complaining about Thunderbird lags. (I manage ~90 users.)
We have been using TB since the early days, and most prefs.js files have been upgraded many, many times.

A fresh setup will not harm...

I keep you posted.

Please find the uploaded profile at (related to the TB3.png problem (High CPU and never ending open/close History + Abook files))

https://profiler.firefox.com/public/3g84c9xjsf6xsn54x2bb0cqx2rqx350hgzwhfmg/calltree/?globalTrackOrder=0&thread=0&v=5

I have not been able to trace the problem related to windows Theme.
The only way to reproduce it for sure is at TB startup , before entering the password. I can't open the developer toolbox.
I'm waiting for the problem to show up again with one of my users.

Attached image TB4.png

Posting a capture (TB4.png) of yet another seemingly never-ending loop of curious activity (CPU intensive as well)

Please find below a profile capture related to the registry activity.

The registry thrashing occurs while writing an email.
I just opened a new email, did nothing, and here we are with this activity.

Note that I started the capture when a new email window was open. During the capture I closed it and opened a new one, typing only a few characters.

https://profiler.firefox.com/public/f64k10ptdrvknt6d28afjhebdfn7b0419ba78ag/calltree/?globalTrackOrder=0&thread=0&v=5

https://profiler.firefox.com/public/xqtbx2yyfmm9c43n9x8ttgn0t847xck7vxvmk4r/calltree/?globalTrackOrder=0&thread=0&v=5

This time I was just moving the mouse, hovering folders and emails.
Just doing that and Thunderbird takes 3 to 5% of my 14cpus. Flabbergasting.

I noticed that if the preview pane is not empty (I have clicked an email) then hovering does not trigger the registry calls.
(i.e., clicking a folder and not clicking on an email, leaving the preview pane empty and then hovering around triggers the problem)

(In reply to duparchy from comment #8)

Please find the uploaded profile at (related to the TB3.png problem (High CPU and never ending open/close History + Abook files))

https://profiler.firefox.com/public/3g84c9xjsf6xsn54x2bb0cqx2rqx350hgzwhfmg/calltree/?globalTrackOrder=0&thread=0&v=5

Approximately 50% is gloda related. 44% is unidentified under nsAppShell::ProcessNextNativeEvent::Wait
XREMain::XRE_main
(root)

Geoff, can you make anything of this?

Flags: needinfo?(geoff)

Gloda is hitting the address book really hard. I don't know why, but I guess that's just what it does when indexing. I'm trying to figure out how to cache the address book queries it's making in a way that isn't going to have invalidation problems.

This is a big performance improvement but I'm worried about stale data. I'm going to try to figure out a way to only have this cache active when an indexing operation is happening.
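
The idea of a cache that is only active while indexing runs can be sketched as follows. This is not the attached patch: the class `IndexingScopedCache` and the lookup function `fake_address_book_lookup` are hypothetical names used for illustration. The cache is created when indexing starts and discarded when it ends, so a stale entry cannot outlive a single indexing run.

```python
from contextlib import contextmanager


class IndexingScopedCache:
    """Caches lookups only while an indexing operation is in progress."""

    def __init__(self, lookup):
        self._lookup = lookup  # the expensive query, e.g. an address book lookup
        self._cache = None     # None means the cache is inactive

    @contextmanager
    def indexing(self):
        """Activate the cache for the duration of one indexing run."""
        self._cache = {}
        try:
            yield self
        finally:
            self._cache = None  # drop all entries so stale data cannot linger

    def get(self, key):
        if self._cache is None:
            return self._lookup(key)  # outside indexing: always query fresh
        if key not in self._cache:
            self._cache[key] = self._lookup(key)
        return self._cache[key]


# Hypothetical stand-in for the real address book query; it records each call.
calls = []

def fake_address_book_lookup(email):
    calls.append(email)
    return {"email": email}


cache = IndexingScopedCache(fake_address_book_lookup)
with cache.indexing():
    for _ in range(3):
        cache.get("a@example.com")  # only the first call reaches the lookup
calls_during_indexing = len(calls)

cache.get("a@example.com")  # cache inactive: goes to the lookup
cache.get("a@example.com")  # and again
calls_total = len(calls)
```

The trade-off is that changes made to the address book during an indexing run are not seen until the run ends, which is the stale-data risk mentioned above, limited to one run.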

Hi,
Thanks for your quick responses and corrective measures

On a test system I reverted back to TB 60.9.1. No performance problem, no Registry thrashing while writing an email or hovering the mouse over the folders.

I've no idea about what's going on with the registry, but the patch I've proposed should significantly reduce the number of accesses to the .sqlite files. I'm still confused about why every access needs to unlock and lock the file though.

From what I can tell, the registry thing is what creates lags while writing emails.

Please take into account that there are users on multi-session VMs, w/o GPU.
What goes unnoticed on a single-user / quad-core + GPU system, is revealed on those virtual and shared environment.

Geoff, Is it possible that the cost for AB lookups from gloda has greatly increased in 78?

This investigation is significant, because we've had many reports of inability to read/load messages while Thunderbird is downloading messages, and historically this has often correlated to gloda indexing activity. I have experienced this myself for many weeks now. An example report is bug 1631631

See Also: → 1631631
Assignee: nobody → geoff
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: typing lags and CPU ~10%. Also thrashing Windows registry → typing lags and CPU ~10%. Also thrashing Windows registry (due to gloda address book lookups)

(In reply to Geoff Lankow (:darktrojan) from comment #15)

Created attachment 9180058 [details]
Bug 1668811 - Try to avoid using the address book database when indexing messages

This is a big performance improvement but I'm worried about stale data. I'm going to try to figure out a way to only have this cache active when an indexing operation is happening.

Didn't we just talk about ab caching and change notifications to keep a cache current in Bug 1633998 ;)

I'm going to move my address book changes to another bug since they probably don't address most of the original problem here.

Assignee: geoff → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(geoff)
See Also: → 1669872

(In reply to Wayne Mery (:wsmwk) from comment #19)

Geoff, Is it possible that the cost for AB lookups from gloda has greatly increased in 78?

This investigation is significant, because we've had many reports of inability to read/load messages while Thunderbird is downloading messages, and historically this has often correlated to gloda indexing activity. I have experienced this myself for many weeks now. An example report is bug 1631631

Yes, it has increased, and I'm trying to stomp on that now, but it shouldn't cause a problem for small amounts of new mail. It shouldn't cause any problem if there's no new mail.

Comment on attachment 9180058 [details]
Bug 1668811 - Try to avoid using the address book database when indexing messages

Revision D92698 was moved to bug 1669872. Setting attachment 9180058 [details] to obsolete.

Attachment #9180058 - Attachment is obsolete: true

For the record: I had users complaining about lags in Thunderbird (68.1.2 / 32bits).
That's why I upgraded to 78 (64 bits)... which made things worse (hence this thread).

So I decided to downgrade way back, to 60.9.1.
No complaint from my users so far...

Now that I'm monitoring Thunderbird closely, I notice that the user whose Thunderbird was accounting for a regular 10% CPU is now accounting for a permanent 2% throughout the day. However, there's no more dubious activity related to files and registry.
This is better (he's not complaining anymore) but still questionable (global indexing is disabled BTW). There's a design flaw somewhere IMO.

My previous comment is about TB 60.9.1...

The user had the setting "open message in a new window" and had the habit of leaving message windows open to deal with them later when not urgent. So he regularly had many Thunderbird windows open. This seems to worsen the CPU bug.

Refocusing this bug on the theme / registry - presumably regression - activity.

Alex, Richard what do you make of the other half of comment 13 (i.e. excluding the addressbook part) - it seems to me this could be the Windows bits. Related to "improvements" in dark/light theme support?

https://profiler.firefox.com/public/3g84c9xjsf6xsn54x2bb0cqx2rqx350hgzwhfmg/calltree/?globalTrackOrder=0&thread=0&v=5
44% is unidentified under nsAppShell::ProcessNextNativeEvent::Wait

For the record: I had users complaining about lags in Thunderbird (68.1.2 / 32bits) .

For performance issues that began in version 68, consult https://mzl.la/2IeceoD But be aware, version 68 is end of life and not receiving patches.

Flags: needinfo?(richard.marti)
Flags: needinfo?(alessandro)
OS: Unspecified → Windows
Summary: typing lags and CPU ~10%. Also thrashing Windows registry (due to gloda address book lookups) → typing lags and CPU ~10%. Constantly querying the registry for the "software\MicrosoftWindows\currentVersion\theme"..

The high CPU + lag problems force me (count my 90 RDSH users) to stick to TB 60. So, I guess it's even worse in terms of security....
I probably shouldn't write this on internet. gasp.

I don't see how our themes would access the registry. We do no such checks as far as I know. Maybe this is an issue of the theme back-end in m-c?

Flags: needinfo?(richard.marti)

I doubt this is related to the improvements we did on detecting Windows dark/light system theme.
The only OS check we do is being sure we're not on a Linux platform: https://searchfox.org/comm-central/rev/94f0c44be292c2951fa85128e3e12369a16dc2cf/mail/base/modules/ThemeVariableMap.jsm#173

(In reply to duparchy from comment #12)

I noticed that if the preview pane is not empty (I have clicked an email) then hovering does not trigger the registry calls.
(i.e, clicking a folder and not clicking on an email, leaving the preview pane empty and then hovering around triggers the problem)

Maybe we should investigate this aspect and some calls are happening to populate the empty message pane with the "Welcome to Thunderbird" page. When was this implemented first? In 68?

Flags: needinfo?(alessandro)

I confirm that TB 68.12.1 does have the "high CPU" problem, but without registry or file thrashing.

Step to reproduce :

Start Thunderbird,
hover the mouse over folders and emails (no click) and watch insane CPU activity.

(done through a RDP session on a VM)

This is perhaps normal behavior after all...
I checked Firefox and MS Word and they exhibit the same CPU peaks when moving the mouse around.

(In reply to Alessandro Castellani (:aleca) from comment #32)

Maybe we should investigate this aspect and some calls are happening to populate the empty message pane with the "Welcome to Thunderbird" page. When was this implemented first? In 68?

The profile in bug 1661980 also shows significant activity related to the start page

Flags: needinfo?(alessandro)

When I wrote "normal behavior" that was about the latter one probably related to RDP.

The former and original ones, i.e. registry thrashing + high CPU (related or not) in various situations, are absolutely not normal...

Thanks for your investigation.

Can someone test this on 83.0b2?
We improved the theming detection and delegated all the operations to event listeners in order to not constantly trigger luminance calculation.
If the problem persists it means that it's totally unrelated to our theming code and it might be something coming from m-c. (uneducated assumptions)
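
The difference between recalculating luminance on every query and recalculating only on a theme-change event can be sketched as below. This is an illustration only, not Thunderbird's code: the class `ThemeState`, the handler `on_theme_changed`, the weighted sum and the threshold of 128 are all assumptions made for the example.

```python
class ThemeState:
    """Caches the dark/light decision and refreshes it only on a change event."""

    def __init__(self, background_rgb):
        self.luminance_calculations = 0  # counter, only to make the effect visible
        self._is_dark = self._compute_is_dark(background_rgb)

    def _compute_is_dark(self, rgb):
        self.luminance_calculations += 1
        r, g, b = rgb
        # Rough weighted sum with Rec. 709 coefficients; the threshold is a
        # hypothetical choice for this sketch.
        luminance = 0.2126 * r + 0.7152 * g + 0.0722 * b
        return luminance < 128

    def is_dark(self):
        # Cheap read of the cached value; no recalculation per call.
        return self._is_dark

    def on_theme_changed(self, background_rgb):
        # Event listener: the only place where luminance is recalculated.
        self._is_dark = self._compute_is_dark(background_rgb)


state = ThemeState((255, 255, 255))
for _ in range(1000):
    state.is_dark()                   # e.g. queried on every hover or repaint
state.on_theme_changed((20, 20, 20))  # the system theme switches to dark once
```

After the thousand queries and one theme change, `luminance_calculations` is 2 instead of roughly a thousand, which is the kind of saving an event-driven approach aims for.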

Flags: needinfo?(alessandro)

Hi, seems to be good...mostly.

I got one case where there are still tons of Registry theme queries.

Steps to reproduce

  • Start Thunderbird (setup w/ a preview pane)
  • Enter the password
  • click on the account root
  • click on the Inbox
  • Hover the mouse over the (IMAP) folders

But after a while of "normal" usage of Thunderbird (clicking to preview emails, writing emails, opening dozens of emails) .

Apart from that TB stays quiet on the registry side and a lot more quiet on the CPU side.

Flags: needinfo?(duparchy)

I'm writing to put some other symptoms on your radar, and perhaps give you some additional insight into the issue as you problem-solve. I believe I've encountered this same root cause with different symptoms.

We had (have?) significant issues with Thunderbird visual artifacts / screen updates after it updated to 78.4.x. I cannot speak to the CPU load issues, as I haven't paid much attention to that. Our screens simply don't refresh well in TB for some users. We could, for example, click on a message and the screen doesn't update to show it. The screen may also get jumbled a bit with artifacts. The issue appeared when TB updated itself to the version that first included the newer-themed interface. The issue only affects TB -- no other applications.

We have dozens of users using TB using Remote Desktop on Windows Server 2012. I believe that Remote Desktop's inherent limitations contributed to make our issue a lot more pronounced than most users probably experience. I tried disabling hardware acceleration (as frequently helps in RDP applications), but this did not help with this problem.

Based on this thread and the hint that theming could be part of the issue, I advised one of my users to switch from the Default theme to the Light theme. He reported that this appeared to resolve the visual / screen updating issues. I have now advised other users with the issue to do the same. Another user reports it to be a good bit better but still not quite right.

I hope that this insight might help save others time and frustration, not to mention contribute to the ultimate fix for the bug.

Alex, is there something we should uplift? If so, what?

Flags: needinfo?(alessandro)

Nothing left to uplift as far as I can see as everything already landed on 78.5.1

Those improvements and changes were done in bug 1668989, and bug 1638233.
The last fix and performance improvements were done in bug 1678029.

Flags: needinfo?(alessandro)

Hi,
There are still some circumstances where just hovering the mouse over folders or emails thrashes the registry and eats CPU.

(beside there seems to be a consensus about the new design not being pleasant )

Sticking to 68.12 for now.... ( Speaking for my hundred users using Remote Desktop on Windows 2019 RDSH )

Anyone can monitor TB with procmon and see that 68.12 is just quiet whereas 78.5 is not.

Why so much activity with abook ??

(In reply to duparchy from comment #45)

...
Why so much activity with abook ??

Because Gloda indexes messages on more than just body and subject.

Please post a new performance profile using 78.6.0 (or at least 78.5.1).
Also, what antivirus software do you run, and what is your backing disk storage?

Flags: needinfo?(duparchy)
Summary: typing lags and CPU ~10%. Constantly querying the registry for the "software\MicrosoftWindows\currentVersion\theme".. → typing lags and CPU ~10%. Constantly querying registry "software\MicrosoftWindows\currentVersion\theme". High addressbook query via Gloda. Windows Server 2019 RDSH

duparchy, also, how much memory is allocated to each user's machine, and how much memory is a running Thunderbird using for version 60 vs version 78?

Revisiting the v68 slowness, it is helpful to know it was already slower (you say 68.1.2 which is circa 2019-10-10) because it helps with regression ranges (especially if reproducible and we can identify specifically what is slow). But still I don't see any big smoking guns for version 68 in new bug reports 2019-01-01 to 2020-06-01 https://mzl.la/2LjTLs3 (which covers the beta development period of v60 up through roughly 60.10.x - a bigger range than comment 29).

Time wise, biggest bang for the buck is hard data
A. finding regression ranges [1] for specific behavior
B. obtaining performance profiles to benchmark specific behavior (note, profiler doesn't reliably work at various points for version 60 and 68)

Conversely, in general we've found it's not productive to try to audit code or randomly compare interim releases (without the aforementioned data) because in a long time frame of 1-3 years too many issues come and go, which greatly clouds the performance picture. For example Bug 1575214 - Gloda is getting in the way of typing messages, slow because of jank from synchronous GC and Bug 1580480 - IMAP fetch chunk size is always 65536 bytes

[1] regression ranges using nightly builds can be found relatively quickly with the https://mozilla.github.io/mozregression/ tool
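
Why a regression range can be found "relatively quickly" follows from bisection: each test halves the remaining range of builds. The sketch below shows the principle only and does not use mozregression's own API. The function `find_first_bad`, the build numbering and the `is_bad` predicate are hypothetical, and it assumes the first build is good, the last is bad, and the behavior stays broken once it breaks.

```python
def find_first_bad(builds, is_bad):
    """Return (first bad build, number of builds tested).

    Assumes builds[0] is good, builds[-1] is bad, and that every build
    after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # lo: known good index, hi: known bad index
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi], tested


# Hypothetical example: 365 nightly builds, regression introduced in build 200.
builds = list(range(365))
first_bad, tested = find_first_bad(builds, lambda build: build >= 200)
print(first_bad, tested)
```

For a year of nightlies this takes at most 9 manual tests, provided the slow behavior can be reproduced reliably in each tested build.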

Brett, can you file a bug report for your comment 41 please? Thanks.

Flags: needinfo?(bgpublic)

Hi,
We're on Windows 2019 RDSH. ~10 simultaneous users over RDP connections .
Resources are adapted to the max workload we've seen and can be increased. Currently 14 CPU / 40 GB RAM. iSCSI storage (10Gb/s) through virtualization layer. SSD class. A bit of latency due to the virtualization + iSCSI.
Network between the servers and the IMAP servers is also 10Gb/s.
Users are not complaining about performance issues. The last bunch of complaints was after the TB upgrade. Downgraded since then.

Thunderbird 68 will allocate between 100 and 350 MB per instance, depending on the user.
Apart from that, I don't quite understand your comment 47. Sorry, my mother tongue is not English.
I'll check performance again with the latest official release and let you know.

Flags: needinfo?(duparchy)

TB 78.6.0 seems fine, once the initial indexing is finished.
Indexing is still quite CPU intensive and involves tons of file lock/unlock operations and small writes. This is a kind of worst case in an iSCSI storage environment because of the latency...
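
A back-of-the-envelope calculation shows why many small synchronous operations hurt on higher-latency storage. All numbers below are hypothetical and chosen only for illustration; they were not measured on this system.

```python
def total_wait_seconds(n_ops, latency_us):
    """Time spent waiting if n_ops operations each pay latency_us microseconds."""
    return n_ops * latency_us / 1_000_000


# Hypothetical workload: 100,000 lock/unlock/small-write operations.
n_ops = 100_000
local_ssd = total_wait_seconds(n_ops, 100)     # assumed 0.1 ms per operation
iscsi = total_wait_seconds(n_ops, 1_000)       # assumed 1 ms per operation
print(local_ssd, iscsi)
```

With these assumed figures the same workload waits 10 seconds on local storage and 100 seconds over iSCSI, even though the bandwidth is never the limit.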

There is still one use case where tons of registry calls seem to be triggered for nothing:
(Preview pane enabled) Clicking on a folder never opened before (preview pane is blank) => tons of registry calls

tons of registry calls when hovering other folders

Severity: -- → S3
See Also: → 1728747

I started monitoring my users' Thunderbird again (because of bug 1728747), and yes, there are still cases where TB will constantly query the registry for the same thing like crazy (see TB2.png)

Unless quickly fixed, I will have to revert to TB 60 (again).

How is version 91?

Flags: needinfo?(duparchy)

Seems OK.

Flags: needinfo?(duparchy)

Nice to hear the problem is gone.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bgpublic)
Resolution: --- → WORKSFORME