Open Bug 753343 Opened 12 years ago Updated 1 year ago

Soft-hang for 5s after focusing a TB window, and massive memory leaks, getting gradually worse over days, with calendar and ICS enabled

Categories

(Calendar :: Calendar Frontend, defect)

x86_64
Linux
defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: BenB, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: hang, memory-leak, perf)

Attachments

(4 files)

Environment:
* Linux 64bit (amd64)
* Thunderbird, trunk, self-compiled, 64bit
* Lightning

Usage pattern:
* Thunderbird is running for many days (10-20 days)
* Lightning has 4 .ics calendars, (refresh every 30 minutes)
* Pref "open msg in new window" (i.e. new standalone msg window for every email)
* 2-3 TB main windows (3 pane) with different folder selections

Actual result:
Very often (every second or third time) when I open or focus a new TB window, no matter the type of window, TB freezes for 5 seconds. It takes no input at all during that time, then recovers.
The freeze time increases notably with the TB runtime. I.e. right after restart, it's maybe 0.5-1s (not pronounced, but still noticeable), and after 10 days, it's 5 seconds (completely unusable).

Expected result:
Response time to user input is <= 30ms in any situation

Importance:
This makes TB extremely annoying and unusable. Imagine you have to wait 5 seconds just to read an email.
Severity "major", because a hang/freeze is critical by definition, but it's not a permanent hang. This bug makes TB unusable -> major by definition.
A debugger is one way to get traction here, but 1) debuggers on Linux are horrible and 2) the bug isn't pronounced right after a restart.
Ben,
how long ago have you first seen this?

on IRC you posted this good info
TB currently is at 1.4 GB RAM (resident physical RAM). linux 64 bit, with lightning.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2297 ben       20   0 2062m 1.4g  39m S    0  8.7 751:50.25 thunderbird

IME, high memory usage alone, regardless of the source of the high usage and RAM available, seems to cause performance issues. It's not clear why, I've never got traction on it and don't anticipate anyone finding out soon - perhaps GC or CC related.  

So I would focus on tracking the memory usage over time, and confirming that reduction in performance is roughly linear to your (presumably) increase of memory over time.  You say gloda is enabled, but that would not be one of my prime suspects. My starting points would be:

- Lightning - 
** check the bugs linked to bug 441710
** consult someone in #calendar world. 
** if disabling Lightning is not your first choice for diagnosis, run two instances of Thunderbird side by side using -no-remote for a period of a couple of days - one without calendar, and one with - you should quickly learn whether the one running Lightning causes trouble

- folders - You have a history of reporting very large folders - is that still true?  a log of msgdb,1 might be useful per https://wiki.mozilla.org/MailNews:Logging

- sleep / wake cycles - there has been some speculation, at least on windows, that OS sleep / wake cycles somehow impact mozilla performance - I don't have any diagnosis steps for that, other than do a profile
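
A minimal sketch of how the memory tracking suggested above could be done on Linux, assuming the Thunderbird process ID is known and `/proc/<pid>/status` is readable. The helper names, the 30-minute interval and the comma-separated output are illustrative assumptions, not part of any Thunderbird tooling.

```python
import time
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:   1468006 kB"
            return int(line.split()[1])
    return None


def sample_rss(pid, interval_s=1800, samples=48):
    """Print 'unix_time,rss_kb' once per interval (hypothetical logger)."""
    for _ in range(samples):
        text = Path(f"/proc/{pid}/status").read_text()
        print(f"{int(time.time())},{parse_vm_rss_kb(text)}")
        time.sleep(interval_s)
```

Calling `sample_rss(2297)` (the PID from the `top` output above) would log one sample every 30 minutes for a day. Plotting the samples against observed freeze times is one way to check whether the slowdown tracks memory growth.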
Keywords: perf
Summary: Soft-hang for 5s after focusing a TB window → Soft-hang for 5s after focusing a TB window. starting small, getting worse over period of days
> how long ago have you first seen this?

Autumn last year.

I don't remember when I started using Lightning, but it was roughly around the same time.

> IME, high memory usage alone, regardless of the source of the high usage and
> RAM available, seems to cause performance issues. It's not clear why, I've
> never got traction on it and don't anticipate anyone finding out soon - perhaps
> GC or CC related.

This could well be the cause of this bug. I.e. a leak somewhere, which causes the JS engine / GC runs to be very slow. Compare bug 641025. If this theory is true, the root cause of the perf problem is a mem leak.

> confirming that reduction in performance is roughly linear to your (presumably)
> increase of memory over time.

It is.

> ** if disabling lightning not your first choice for diagnosis, run two
> instances of Thunderbird side by side using -no-remote for a period of a couple
> days - one without calendar, and one with...

Yeah, that's an idea.

> - folders - You have a history of reporting very large folders
> is that still true?

My spam folders aren't as large anymore, "only" 10-20 000 msg, but I rarely open them.
My bugzilla folder, which I use a lot, has about 30 000 msgs.
I have about 150 folders in total.
Summary: Soft-hang for 5s after focusing a TB window. starting small, getting worse over period of days → Soft-hang for 5s after focusing a TB window, getting gradually worse over days
I agree with Wayne that high memory usage is usually a symptom that corresponds with laggy perf. 

On the trunk, we should be closing inactive db's, so I wouldn't expect the memory bloat to be caused by a bunch of open db's.
Summary: Soft-hang for 5s after focusing a TB window, getting gradually worse over days → Soft-hang for 5s after focusing a TB window, and massive leaks, getting gradually worse over days
I've now disabled Lightning in my main profile and put it in a second profile.
Of course, I've restarted TB. Now, after a fresh start, the same profile as above needs only 150 MB. This is 1/10 of the RAM it used before. So, we have a leak of 10x (!) the normal RAM usage. Or about 5-10% leak (of the overall RAM usage) per day. This is massive.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    
14158 ben       20   0  840m 175m  35m S    0  1.1   1:22.61 thunderbird
> Or about 5-10% leak (of the overall RAM usage) per day.

Correction: about 50-100% of the total RAM usage leaks every day.

It's also fast after the restart, but I've already stated that above.
Bug 592876 sounds an *awful* lot like mine.
Bug 315959 and bug 412914 also sound relevant.

My Lightning usage:
I have 4 ICS calendars that I have configured as "Network calendar" with URL file:///..., because Lightning (stupidly) doesn't allow me to use local .ics calendars (esp. outside my TB profile). One of the calendars has 200-300 events from 2 years (which I consider light usage, not much) including 5 recurring events every week.
Summary: Soft-hang for 5s after focusing a TB window, and massive leaks, getting gradually worse over days → Soft-hang for 5s after focusing a TB window, and massive leaks, getting gradually worse over days, with Lightning and ics enabled
BTW, this is a high-end machine with 4-core, 16 GB RAM and 2 SSDs (no HDD), so if this involves blocks on I/O or sqlite, then it would be far worse on other machines.
Ben, is the cache setting (visible in the calendar properties dialog by double-clicking on the calendar name in the calendars list) enabled on your calendars, and does changing it make any difference?
Matthew, caching was disabled when I saw the problem. I enabled it yesterday, and I'm now running Lightning in a different process and profile.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3548 ben       20   0 1430m 565m  39m S    0  3.5  54:50.45 /usr/mine/mozilla/thunderbird/default/thunderbird
 3612 ben       20   0  934m 168m  36m S    2  1.1  26:02.87 /usr/mine/mozilla/thunderbird/default/thunderbird -no-remote -P lightning
$ uptime
up 2 days, 12:31

So, within 2 days, TB (without Lightning) went from 175 MB to 565 MB, leaking ~200 MB per day.
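
As a rough cross-check of that rate, here is a small sketch that converts the `top` RES fields quoted in this bug to MiB and divides the growth by the elapsed time. It assumes Thunderbird had been running for the whole machine uptime, which the comment does not state, so the result is only an order-of-magnitude estimate. The function names are made up for illustration.

```python
def parse_top_mem_mb(field):
    """Convert a top(1) memory field such as '565m' or '1.4g' to MiB."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    # No suffix: top reports KiB by default.
    return float(field) / 1024


def growth_per_day(start_mb, end_mb, days):
    """Average growth in MiB per day between two measurements."""
    return (end_mb - start_mb) / days


days = 2 + (12 + 31 / 60) / 24  # "up 2 days, 12:31"
rate = growth_per_day(parse_top_mem_mb("175m"), parse_top_mem_mb("565m"), days)
print(f"{rate:.0f} MB/day")
```

Under that assumption the average comes out somewhat below the rounded ~200 MB per day, but in the same range.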
(In reply to Ben Bucksch (:BenB) from comment #12)
> So, within 2 days, TB (without Lightning) went from 175 MB to 565 MB,
> leaking ~200 MB per day.

that's significant. :(

Is current trunk the same, worse, better?
Asking because some TB perf changes have landed.
Flags: needinfo?(ben.bucksch)
And do you still see the "soft hang" without lightning?  
Your comment 12 is specific only about memory increasing.


(In reply to Ben Bucksch (:BenB) from comment #9)
> BTW, this is a high-end machine with 4-core, 16 GB RAM and 2 SSDs (no HDD),
> so if this involves blocks on I/O or sqlite, then it would be far worse on
> other machines.
perhaps likely to be worse as you say. but not necessarily - we sometimes see SSD related strangeness
Whiteboard: [needs TB17 retest]
No, the problem is mostly gone since I run Lightning in a separate process & profile.
Flags: needinfo?(ben.bucksch)
Whiteboard: [needs TB17 retest]
(i.e. fairly clearly, Lightning was at fault)

VIRT   RES  SHR S %CPU %MEM    TIME+  COMMAND                
1893m 951m  39m S    0  5.9 135:46.81 firefox
1865m 921m  42m S    0  5.7 295:59.96 thunderbird -P default
1206m 268m  35m S    0  1.7 301:09.70 thunderbird -no-remote -P lightning
Compare bug 441710
Blocks: 441710
Summary: Soft-hang for 5s after focusing a TB window, and massive leaks, getting gradually worse over days, with Lightning and ics enabled → Soft-hang for 5s after focusing a TB window, and massive memory leaks, getting gradually worse over days, with Lightning and ics enabled
This should probably move to calendar.  I'll leave you and fallen to sort that out
Keywords: mlk
Depends on: 841995
Component: Mail Window Front End → Calendar Views
Product: Thunderbird → Calendar
need:
- profile per https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G
- about:memory via help | troubleshooting

If the profile is too big then the share/upload step will fail. You may need to use a shorter profile time of, say, 5-10 seconds

--------------
copied from Ben's comment in bug 592876

Reproduction:
[0. in a VM, or in a Thunderbird profile separate from your production profile via -no-remote]
1. Add 6 ICS calendars to Lightning, as local file (file:/// URLs to ICS files)
2. Add several meetings, including recurring meetings, to the calendars
3. Let it run for a week

Actual result: multi-second freezes all the time, dog slow, unusable
Expected result: No slowdown over time at all
Cause: My guess is that Lightning uses too many JS objects and there's a weakness in the JS GC in mozilla.
I see the same with some other extensions (e.g. Header Tools Lite) that process lots (MBs) of data in JS.
--------------
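
To make the reproduction steps above easier to set up, the following sketch writes six small ICS files, each holding one weekly recurring event, and returns their paths. The file names, event contents, start date and the `example.invalid` UIDs are all invented for illustration. The calendars still have to be added to Lightning by hand as network calendars with `file:///` URLs.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def build_calendar(name, first_start):
    """Return iCalendar (RFC 5545) text with one weekly recurring event."""
    fmt = "%Y%m%dT%H%M%SZ"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//leak-repro//EN",
        "BEGIN:VEVENT",
        f"UID:{name}-weekly@example.invalid",
        f"DTSTAMP:{first_start.strftime(fmt)}",
        f"DTSTART:{first_start.strftime(fmt)}",
        f"DTEND:{(first_start + timedelta(hours=1)).strftime(fmt)}",
        "RRULE:FREQ=WEEKLY",
        f"SUMMARY:Recurring meeting ({name})",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


def write_calendars(directory, count=6):
    """Write `count` test calendars into `directory` and return their paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    start = datetime(2015, 1, 5, 9, 0, tzinfo=timezone.utc)
    paths = []
    for i in range(count):
        path = directory / f"calendar{i + 1}.ics"
        text = build_calendar(f"cal{i + 1}", start + timedelta(days=i))
        path.write_bytes(text.encode("utf-8"))
        paths.append(path)
    return paths
```

`path.resolve().as_uri()` on each returned path gives the `file:///` URL to paste into the calendar setup.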

High memory (>1gb perhaps?) can indeed be a problem just by itself. gc performance can get horrible. We should see it in the profiler output.

We might need to add more than 6 calendars to get the effect in a shorter period of time, in a profile that isn't actively getting mail.

Ben still sees this issue, so bug 841995 must not have been a huge factor, if at all
Flags: needinfo?(Mozilla)
After running TB for some time with 6 calendars and quite a few events, I have memory usage up to 1100+ MB.
Starting at approx. 1000 MB, I began to see lags in TB execution. Before that, everything ran smoothly and quickly.

I'll now try to get the profiler run on this instance.
Please also attach about:memory via help | troubleshooting
Attached file about:memory report —
Here is the about:memory report
Flags: needinfo?(Mozilla)
:wsmwk I think this is exactly what I was talking about to you in e-mail. Poked around in about:memory. Lightning is currently hogging 2.4G non-shared memory and the attached memory report confirms it.

I'm hereby confirming the same laggy behavior of the original report.
(In reply to Leho Kraav (:macmaN @lkraav) from comment #23)
> 
> I'm hereby confirming the same laggy behavior of the original report.

Some additional details: 20 calendars, all network, mix of ICS and CalDAV, incl. Google Calendars.

Have been running the calendar in a separate profile forever, ever since I realized I want my mail UI to always be lightning.. err I mean just fast. ;)
Please also - profile per https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G
Flags: needinfo?(leho)
Flags: needinfo?(bv1578)
Flags: needinfo?(Mozilla)
See Also: → 592876
Ben, regarding Header Tools Lite, can you provide more details?  For example is it necessary to use the addon in a specific way to cause significant gc issues, or it is sufficient to just install it and let it run some number of days?  What other addons are causing similar problems?

(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #19)
> need:
> --------------
> copied from Ben's comment in bug 592876
> ..
> I see the same with some other extensions (e.g. Header Tools Lite) that
> process lots (MBs) of data in JS.
> --------------
Flags: needinfo?(ben.bucksch)
See Also: → 733039
Wayne, you need to install "Header Tools Lite" and change the headers of a message with an attachment of several MB size.
Flags: needinfo?(ben.bucksch)
FWIW, I can't help any further on this bug. I already provided all info, and we have a memory report attached. Somebody needs to get onto this with a debugger.

It's fairly easy to reproduce: Add 10 ICS calendars, refresh every minute, and wait for the memory to leak.
(Add a few recurring events to each, for kicks)
At the moment, my TB builds crash on Ubuntu, if I try to add a new profile.
The profile I worked with last time, also crashes.

I keep trying.
Flags: needinfo?(Mozilla)
(In reply to Markus Adrario [:Taraman] from comment #32)
> Here is the profiler report from a running thunderbird using ~1500MB of RAM
> 
> http://people.mozilla.org/~bgirard/cleopatra/
> ?1440231217377#report=0e109b9893f596cc29966e37069cba95e277e6a3&selection=0,1,
> 226,227,228,229,230,231

1. ~31% nsCycleCollector::collectSlice / nsJSContext::RunCycleCollectorSlice (with the jank checkbox enabled it's 39%)
2.  19% cAS_findAlarms/cAS_fA_onGetResult/<() @ calAlarmService.js:508 // cAS_removeAlarmsForItem() @ calAlarmService.js:390 // onRemoveAlarmsByItem() @ calendar-base-view.xml:189

#1 is to be expected, and sucks 
Most of the rest of the CPU appears to be in the binary. humbug, profile doesn't have the binary symbols!

How long had you had Thunderbird running til it reached 1.5GB, and how many calendars?
I had 6 local ics calendars, each with an endless daily recurring event.
I started at about 750 MB and after one night it reached 1.5 GB.
It looks like this might not be only a Linux problem.
I forgot to turn off my computer before I left on Monday and had TB running since then on Win7

now it uses 1.4 GB of Memory and also hangs every now and then.
I have the same problem. Some information here:
https://github.com/Ericsson/exchangecalendar/issues/301
After I disabled the iCal calendars, the problem disappeared.
I've got the same problem after upgrading to TB 38.3/Lightning 4.0.3.1 from TB 31.7/Lightning 3.3.3.

Env:
User Agent - Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 (Application Build ID 20150930122345)
10 CalDAV calendars (lots of events, including recurring events)

After ~8 hours I got multi-second freezes and ~1.5 GB RAM usage.
I've reproduced the issue on nightly-esr38.
I'm subscribed to 5 CalDAV calendars and after ~24h I got a laggy interface. For example, when you're writing an email it freezes for about a second every 5-10 seconds.

Here's my dump: http://people.mozilla.org/~bgirard/cleopatra/?1448516725559#report=69012c8283af0b60b8ef4a768280e3d31940a89a&selection=0,1

Memory info from ps:
PID    VSZ   RSS %MEM CMD
5522 1954416 779000  4.7 /home/gavrilov/thunderbird/thunderbird -P
The same profile after 3 days:
  PID    VSZ   RSS %MEM %CPU  STARTED CMD
 4781 6076236 5013124 30.7 22.0   Nov 27 /home/gavrilov/thunderbird/thunderbird -P

Completely unusable!
Same problem here, TB 38.3/Lightning 4.0.3.1, Windows 7 x64, 4 calendars, 1min update interval.

Excessive memory usage causing freezes after <1day of usage, only restarting TB helps. 
This bug has been there for years now and users find it unusable. 

As it seems that nothing can be done about it even though it is easily reproducible, are there any alternative applications for CalDAV calendaring on Windows until this finally gets fixed? Sunbird isn't updated anymore and therefore cannot be used either (and most likely would suffer from the same bug anyway).
@leecher, we're sticking to TB 31.7 + Lightning 3.3.3. It seems to be fine for 300+ users in our company.
I tried both versions, the most recent from the homepage and TB 31.7, with the same setup overnight and can confirm that the version you suggested doesn't suffer from this horrible memleak (mem usage stabilized @ approx. 175 MB), whereas the newest version was @ 900 MB RAM and still growing after just 7 hours. On users' machines, 2 GB usage (the 32-bit process limit) was easily reached after ~1.5 days and TB became unresponsive.
So I downgraded all machines to TB 31.7 and pinned the versions via app.update.enabled = false
Thanks for the advice!
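
For anyone applying the same workaround across many machines, here is a hedged sketch that appends the pref mentioned above to a profile's `user.js`. It assumes the profile directory path is known, and that the installed version still honors `app.update.enabled` (newer releases may not). The helper name is made up for illustration.

```python
from pathlib import Path

# Pref named in the comment above, in user.js syntax.
PREF_LINE = 'user_pref("app.update.enabled", false);\n'


def pin_updates(profile_dir):
    """Append the update-disabling pref to user.js unless already present."""
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if PREF_LINE.strip() not in existing:
        if existing and not existing.endswith("\n"):
            existing += "\n"
        user_js.write_text(existing + PREF_LINE)
    return user_js
```

The function is idempotent, so running it twice on the same profile leaves a single pref line.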
Hi all. I'm fairly confident I'm seeing massive improvement in TB 44 beta1 + Lightning 4.6 beta1. App uptime has been several days and I'm not seeing any slowdowns.
Flags: needinfo?(leho)
anecdotally

Leho reports "I've been comfortably using Lightning for several months now,
calendar count is double-digits."

Sergey reports "I'm on 45 branch now and it's ok. I'm not quite sure about exact version where it became good for me. I believe I skipped 44 and saw Leho's comment when there was 45beta already."
Flags: needinfo?(bv1578)
Wayne, by that measure, all bug reports would be anecdotal.

Given that no active developer seems to care about hunting these leaks, all we have is individual reports.
There have been reports from several people here.
I've personally spent a long time figuring out that it's Lightning that slowed down my Thunderbird to a level of being unusable. I have proven it beyond doubt - by separating out Lightning, which dramatically improved the situation.

This is a very serious bug that makes Thunderbird unusable (for those who see it).

Just because it works for some people does not mean there is no bug. Such a stance is unhelpful and destructive. Wayne, I've asked you repeatedly to stop commenting on my bugs and trying to minimize or close them. I'm asking you here again: Do not comment on my bugs, or make them in any other way useless. You may triage bug reports from new users, but not from senior developers.
I am using Thunderbird 45.5.1 and Lightning 4.7.5.1 right now.

Leho said "uptime has been several days" - that's not enough to see the bug. The bug appears for me only after 1-2 weeks of heavy use without restart.
(In reply to Ben Bucksch (:BenB) from comment #47)
> I am using Thunderbird 45.5.1 and Lightning 4.7.5.1 right now.
> 
> Leho said "uptime has been several days" - that's not enough to see the
> bug. The bug appears for me only after 1-2 weeks of heavy use without
> restart.

That comment was made in February last year. The subject buggy leaking slowdown behavior hasn't been an issue here for nearly a year now. You're right in saying "doesn't mean there's no bug left", but whatever the original issue was seems to have been to a large extent mitigated.
Anybody have some more recent "memory reports" to use for comparison? Maybe if someone can post steps to get these, it would provide a broader base of data to work with. 

Thanks.
(In reply to Worcester12345 from comment #49)
> Anybody have some more recent "memory reports" to use for comparison? Maybe
> if someone can post steps to get these, it would provide a broader base of
> data to work with. 
> 
> Thanks.
> The subject buggy leaking slowdown behavior hasn't been an issue here for nearly a year now.
> You're right in saying "doesn't mean there's no bug left", but whatever the original issue was seems to have been
> to a large extent mitigated.

That is not correct. I still see this bug, and it's a huge impediment for my use of Thunderbird.

I have worked around it by splitting TB mail and Lightning into 2 processes and profiles, which makes the bug less serious, but the basic bug is still there.

Just because I don't keep pestering people about it, it doesn't mean the bug is not here or is not serious or can just be forgotten about. It's just bad manners on bugzilla to pester. But if you prefer, I can do that.

Yes, I see this bug, and it's still terrible.
And, no, I will not provide any more data unless a developer capable of fixing it (and willing to) asks me.
As I am not allowed (I don't know why) to post any comment to Bug 733039, I'm leaving my comment here:

I always had problems with a VERY slow GUI when Lightning is enabled; Thunderbird hung a couple of times a day while I was writing mails, etc. I tried a lot of things, but nothing helped. I have about 10 CalDAV calendars.

I could kind of solve this problem by creating a completely new Thunderbird profile and reconfiguring everything. My Thunderbird profile was pretty old (I had used it for a couple of years!). Whatever changed inside, since I created the new profile these massive lags inside Thunderbird went away. Sometimes TB still hangs, but not as often or as hard as before. Most of the time, it helps to restart Thunderbird.

Good luck.
If anything, it seems worse than ever now for me, since I went to 60.0 (32-bit). Calendar/Lightning is screwed up also.
Something happened to my Thunderbird installation! As I told you, after creating a completely new Thunderbird profile the problem was fixed, but now it is back: Thunderbird sporadically lags as badly as before. It is almost unusable. I think it happened after one of the last updates, maybe the major upgrade to 60.0, so something "destroys the profile" and causes this problem when updating Thunderbird with an older profile. I don't want to recreate my profile from scratch again, because this is a lot of work and I think it will happen again, and it would help others to figure out what exactly is causing these problems. Does anybody have an idea how I could narrow down the part of the profile that is causing this issue?
Most likely causes are addons or Lightning.
Sure, Ben, that's why the title of this bug is "with Lightning enabled" - but that was not the answer to my question (if that was the purpose of your last comment). The question is how the PART of the profile configuration that causes this can be found, because recreating the profile did the trick, as I said in my earlier comment.
If you really want to find out, you can:
1. Create a new profile N
2. Make a copy of your broken profile B (which creates profile T1)
3. Replace one file of the broken test profile T1 with the same file from the new profile N
4. Try Thunderbird with that, whether you see the slowdown or not
5. Repeat steps 2-4 with various files, until you see the problem is gone.
6. If you think you found the offending file:
7. Copy the new profile N to profile T100
8. Copy the offending file from the broken profile B to the profile T100
9. Verify that you see the problem in that profile T100
10. Verify that you do not see the problem in the new profile N without the offending file
11. Copy broken profile B to profile T101
12. Copy the file corresponding to the offending file from new profile B to fixed profile T101
13. Verify that you do not see the problem in fixed profile T101 without the offending file
14. Post all your results here, all steps that you did
15. Attach the offending file here in this bug

If you want to help even further, you can repeat the same method with the file *contents* of the offending file, to narrow down further which exact content of the file is causing this problem.
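
The copy-and-replace part of this procedure (steps 2 and 3) can be scripted. The sketch below is one possible way to do it, with Thunderbird closed while the copy runs. The function name and the rule for files missing from the new profile are assumptions made for illustration. Checking for the slowdown (step 4) remains manual.

```python
import shutil
from pathlib import Path


def make_test_profile(broken, fresh, rel_file, dest):
    """Copy profile `broken` to `dest`, then take `rel_file` from `fresh`."""
    broken, fresh, dest = Path(broken), Path(fresh), Path(dest)
    shutil.copytree(broken, dest)  # dest must not exist yet
    target = dest / rel_file
    source = fresh / rel_file
    if source.is_file():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
    elif target.exists():
        # Assumption: a file absent from the fresh profile is simply removed.
        target.unlink()
    return dest
```

Each call produces one test profile (T1, T2, ...) that differs from the broken profile B in exactly one file. The original profiles are never modified, so the experiment can be repeated for as many files as needed.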
> 12. Copy the file corresponding to the offending file from new profile B to fixed profile T101

Sorry, typo. Correction:
12. Copy the file corresponding to the offending file from new profile N to fixed profile T101
Ben, thanks for those instructions. I kept the mails in my inbox to try that later when I have time, but, I don't know what happened, the problem disappeared by itself - so right now I don't have any problem - mystery!
> thunderbird lags so badly like before sporadically. it is almost unusable.

It might be the search indexer that runs over all your mails. That's a one-time action. That would explain why you see it after upgrade, and in a new profile, and then the problem disappears after a short while.
Ben, no, it must have been something else (at least the first time); you misunderstood something. In a new profile I NEVER had this problem, until some days ago. Also, in the past, the problem never disappeared; it was present all the time (until I recreated the profile), not only after an update!
Blocks: 592876
Duplicate of this bug: 592876
See Also: 592876
Summary: Soft-hang for 5s after focusing a TB window, and massive memory leaks, getting gradually worse over days, with Lightning and ics enabled → Soft-hang for 5s after focusing a TB window, and massive memory leaks, getting gradually worse over days, with calendar and ICS enabled

Is "P1" the top priority? Because it's not showing that here.

(In reply to Worcester12345 from comment #64)
> Is "P1" the top priority? Because it's not showing that here.

This is being addressed in other places as resources permit. If you have a reproducible issue which is not covered in an existing bug report, please create a bug report with a performance profile and other details.

No longer blocks: 592876