Closed Bug 791581 Opened 9 years ago Closed 1 year ago

startup crash: memory-allocation loop. corrupt/large 40MB panacea.dat after crash?

Categories

(MailNews Core :: Database, defect)

x86_64
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: thiemo.nagel, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, Whiteboard: [startupcrash])

Attachments

(2 files)

Hello,

after a crash, I'm experiencing a reproducible memory-allocation loop at startup which sends the computer thrashing within approx. 10 seconds due to a corrupted profile (a fresh profile works) with Thunderbird 15.0.1 on 64-bit Linux.  Removing localstore.rdf and chrome and extensions folders does not help.

This is not the first time this has happened to me, iirc I solved it the last time by ditching my profile.

Anybody interested in debugging this?  (debugging as in fixing Thunderbird, not as in fixing my profile folder)

Cheers,
Thiemo
can you get a stacktrace?  https://developer.mozilla.org/En/How_to_get_a_stacktrace_for_a_bug_report#Linux

two linux startup crash bugs - bug 752768 and bug 708222 - are in this list of startup crashes https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced;short_desc_type=allwordssubstr;resolution=---;type0-0-0=nowords;value0-0-0=count%20counts;product=MailNews%20Core;product=Thunderbird;short_desc=startup%20crash;list_id=4407513

you could also try starting offline using -offline

ludo may have other suggestions or someone to pass this on to.
Severity: normal → critical
Keywords: crash, stackwanted
I'll only have time to look into it later this week or at the weekend, but I'll get you a stack trace then.
The stack trace obtained with gdb doesn't seem very useful.  (Any suggestions on how to improve it?)

Program received signal SIGINT, Interrupt.
0xf7fdf425 in __kernel_vsyscall ()
(gdb) bt
#0  0xf7fdf425 in __kernel_vsyscall ()
#1  0xf7df234c in lseek64 () from /lib32/libc.so.6
#2  0xf7d8abc5 in _IO_file_seek () from /lib32/libc.so.6
#3  0xf7d8be1b in _IO_file_seekoff () from /lib32/libc.so.6
#4  0xf7d8250f in ?? () from /lib32/libc.so.6
#5  0xf7d83ede in fseek () from /lib32/libc.so.6
#6  0xf6e6fab9 in ?? () from /home/thiemo/bin/thunderbird/libxul.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Anyways, I've found that deleting panacea.dat resolves the issue.  Is 40 MB an unusual size?
-rw-r--r-- 1 thiemo thiemo 40256883 Sep 16 21:57 panacea.dat

strace shows what looks like an endless loop in the reading of panacea.dat:

_llseek(27, 40255488, [40255488], SEEK_SET) = 0
read(27, "k^81:c)(s=9)} \n  [-19A(^82^AC56A"..., 4096) = 1395
_llseek(27, 8667136, [8667136], SEEK_SET) = 0
read(27, ":^80(^82^829C4)(^83^109)(^84=7b3"..., 4096) = 4096
_llseek(27, 8658944, [8658944], SEEK_SET) = 0
read(27, "9F9{@\n@$$}389F9}@\n\n@$${389FA{@\n@"..., 4096) = 4096
read(27, "9B}@\n\n@$${38A9C{@\n@$$}38A9C}@\n\n@"..., 4096) = 4096
read(27, ":^80(^82^829C4)(^83^109)(^84=7b3"..., 4096) = 4096
fstat64(27, {st_mode=S_IFREG|0644, st_size=40256883, ...}) = 0
_llseek(27, 40255488, [40255488], SEEK_SET) = 0
read(27, "k^81:c)(s=9)} \n  [-19A(^82^AC56A"..., 4096) = 1395
_llseek(27, 8667136, [8667136], SEEK_SET) = 0
read(27, ":^80(^82^829C4)(^83^109)(^84=7b3"..., 4096) = 4096
_llseek(27, 8658944, [8658944], SEEK_SET) = 0
read(27, "9F9{@\n@$$}389F9}@\n\n@$${389FA{@\n@"..., 4096) = 4096
read(27, "9B}@\n\n@$${38A9C{@\n@$$}38A9C}@\n\n@"..., 4096) = 4096
read(27, ":^80(^82^829C4)(^83^109)(^84=7b3"..., 4096) = 4096
fstat64(27, {st_mode=S_IFREG|0644, st_size=40256883, ...}) = 0
_llseek(27, 40255488, [40255488], SEEK_SET) = 0
read(27, "k^81:c)(s=9)} \n  [-19A(^82^AC56A"..., 4096) = 1395
_llseek(27, 8667136, [8667136], SEEK_SET) = 0
read(27, ":^80(^82^829C4)(^83^109)(^84=7b3"..., 4096) = 4096
_llseek(27, 8658944, [8658944], SEEK_SET) = 0
read(27, "9F9{@\n@$$}389F9}@\n\n@$${389FA{@\n@"..., 4096) = 4096
read(27, "9B}@\n\n@$${38A9C{@\n@$$}38A9C}@\n\n@"..., 4096) = 4096
read(27, ":^80(^82^829C4)(^83^109)(^84=7b3"..., 4096) = 4096
fstat64(27, {st_mode=S_IFREG|0644, st_size=40256883, ...}) = 0
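The repetition in the strace excerpt above can be checked mechanically. The following is a minimal sketch, not part of the original report: it extracts the `_llseek` offsets from strace text and looks for the shortest repeating period. The helper names `seek_offsets` and `shortest_period` are hypothetical, and the sample is trimmed to the seek lines only.

```python
import re
from typing import List, Optional

# Matches the target offset in lines such as:
#   _llseek(27, 40255488, [40255488], SEEK_SET) = 0
SEEK_RE = re.compile(r"_llseek\(\d+,\s*(\d+),")


def seek_offsets(strace_text: str) -> List[int]:
    """Return the requested seek offsets in the order they appear."""
    return [int(m.group(1)) for m in SEEK_RE.finditer(strace_text)]


def shortest_period(values: List[int]) -> Optional[int]:
    """Return the smallest p with values[i] == values[i + p] for every
    valid i, requiring at least two full repetitions; None if there is none."""
    n = len(values)
    for p in range(1, n // 2 + 1):
        if all(values[i] == values[i + p] for i in range(n - p)):
            return p
    return None


# Seek lines copied from the strace output in the report.
sample = """\
_llseek(27, 40255488, [40255488], SEEK_SET) = 0
_llseek(27, 8667136, [8667136], SEEK_SET) = 0
_llseek(27, 8658944, [8658944], SEEK_SET) = 0
_llseek(27, 40255488, [40255488], SEEK_SET) = 0
_llseek(27, 8667136, [8667136], SEEK_SET) = 0
_llseek(27, 8658944, [8658944], SEEK_SET) = 0
"""

offsets = seek_offsets(sample)
period = shortest_period(offsets)
```

A repeating period in the seek offsets only suggests that the reader keeps revisiting the same file regions; it does not by itself show where in the parser the loop occurs.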
What happens if you start TB in safe mode, and then start in normal mode?
https://support.mozillamessaging.com/en-US/kb/safe-mode#os=linux&browser=tb15
The problem shows up in safe mode just the same.  Also, starting (and crashing) in safe-mode doesn't help.  After that, the crash in normal mode is just the same.

Do you need a better backtrace?  Can I download a debug build from somewhere?
So the crash reporter doesn't show up? (If it does, send us the crash reports: go to Help -> Troubleshooting, about:crashes.)

If it does not, you'll have to build a debug Thunderbird (see https://developer.mozilla.org/en-US/docs/Simple_Thunderbird_build) to get more symbols and get us the stacks of all threads (thread apply all bt in gdb, I believe), not only the crashing one.


panacea.dat contains the information we save when using pop3 to know which messages were downloaded or not. So maybe having a pop3 log when you crash would give us a hint on what's going on (see https://wiki.mozilla.org/MailNews:Logging).
Attached file bt full
Backtrace captured after hitting Ctrl-C within the memory allocation loop.

blockSize seems dangerously high (without looking at the code).

#7  0x00007ffff35f726b in morkZone::ZoneNewRun (this=0x7fffd4582190, ev=0x7fffd4711dc0, inSize=84400) at /scratch/thiemo/bulk-computing/tb-15.0.1/db/mork/src/morkZone.cpp:366
        blockSize = 1250959360
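To put the two numbers from frame #7 in proportion, here is a short arithmetic sketch. It uses only the values shown in the backtrace and makes no claim about how morkZone computes blockSize.

```python
# Values copied from frame #7 of the backtrace.
block_size = 1250959360  # blockSize, in bytes
in_size = 84400          # inSize, the size actually requested

# Express the block size in GiB and relative to the request.
block_size_gib = block_size / 2**30
ratio = block_size / in_size

print(f"blockSize is {block_size_gib:.2f} GiB, "
      f"about {ratio:.0f} times the requested inSize")
```

A block of more than 1 GiB for a request of roughly 82 KiB is consistent with the thrashing described in the report.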
(In reply to Ludovic Hirlimann [:Usul] from comment #6)
> So the crash reporter doesn't show up?

It doesn't.  The issue is a memory-consuming endless loop, not a crash per se.  I'd figure that such things are not picked up by the crash reporter.

> panacea.dat contains the information we save when using pop3 to know which
> messages were downloaded or not. So maybe having a pop3 log when you crash
> would give us a hint on what's going on.(see
> https://wiki.mozilla.org/MailNews:Logging)

That's all that shows up in pop3.log before I have to kill thunderbird to avoid thrashing due to tb's excessive memory consumption:

1399306016[7f0252463150]: WARNING: Unable to test style tree integrity -- no content node: file /scratch/thiemo/bulk-computing/tb-15.0.1/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 8013
1399306016[7f0252463150]: WARNING: Subdocument container has no frame: file /scratch/thiemo/bulk-computing/tb-15.0.1/mozilla/layout/base/nsDocumentViewer.cpp, line 2391
1399306016[7f0252463150]: WARNING: Subdocument container has no frame: file /scratch/thiemo/bulk-computing/tb-15.0.1/mozilla/layout/base/nsDocumentViewer.cpp, line 2391
Keywords: stackwanted
I just wanted to ping this bug again.  A stack trace is there, thus I've removed the "stackwanted" keyword and added "helpwanted" instead.
Keywords: helpwanted
bug 795295 also crashes in mork-land
Cc'ing jcranmer, who (I think) has some experience dealing with Mork's special-ness.
(In reply to Ludovic Hirlimann [:Usul] from comment #6)
> panacea.dat contains the information we save when using pop3 to know which
> messages were downloaded or not. So maybe having a pop3 log when you crash
> would give us a hint on what's going on.(see
> https://wiki.mozilla.org/MailNews:Logging)

No, you're mixing up panacea with popstate. panacea.dat is what is known as the "message folder cache"; it caches properties of folders so that we can display the folder pane without having to load every database. It should be safe to delete this file with no harmful consequences other than a potentially longer startup time for the next run.

(In reply to Thiemo Nagel from comment #3)
> Anyways, I've found that deleting panacea.dat resolves the issue.  Is 40 MB
> an unusual size?

Yes, that's rather large. I have over a hundred folders and a profile which has accumulated a lot of cruft, and my panacea.dat amounts to less than 10 MB.
As I wrote in the first sentence of the first message (albeit in parentheses), it doesn't crash on a fresh profile.  The crash is caused by a memory-consuming endless loop in the reading of a (probably corrupted) panacea.dat.  As this makes for a very unpleasant user experience (bringing the whole system to a grinding halt because of thrashing), I think that Thunderbird should catch it somehow.
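One way such a guard could look is sketched below. This is purely illustrative and is not how Thunderbird behaves: the size limit, the `.corrupt` suffix, and the function name `prepare_cache` are all assumptions. It relies only on the point made earlier in the thread that panacea.dat is a cache which can be discarded at the cost of a slower next startup.

```python
import os
import tempfile

# Hypothetical limit; the thread only mentions that a heavily used
# profile had a panacea.dat of under 10 MB.
MAX_CACHE_BYTES = 16 * 1024 * 1024


def prepare_cache(path: str, limit: int = MAX_CACHE_BYTES) -> str:
    """Return "use" if the cache file looks plausible, otherwise "rebuild".

    An implausibly large file is moved aside (not deleted) so that it
    stays available for debugging.
    """
    if not os.path.exists(path):
        return "rebuild"
    if os.path.getsize(path) <= limit:
        return "use"
    os.replace(path, path + ".corrupt")
    return "rebuild"


# Small demonstration on a throwaway file with an artificially low limit.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "panacea.dat")
    with open(demo, "wb") as f:
        f.write(b"x" * 10)

    oversized_result = prepare_cache(demo, limit=5)   # file exceeds limit
    moved_aside = os.path.exists(demo + ".corrupt")
    missing_result = prepare_cache(demo, limit=5)     # file is gone now

    with open(demo, "wb") as f:
        f.write(b"x" * 10)
    normal_result = prepare_cache(demo, limit=100)    # file within limit
```

A size check alone would not catch a small file that still sends the parser into a loop; bounding the work done while reading would be needed for that case.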
sorry for being lame earlier.  Do you still have the bad panacea.dat file?
Summary: startup crash: memory-allocation loop → startup crash: memory-allocation loop. corrupt panacea.dat after crash?
No problem.  Yes, I've still got the offending panacea.dat.
Keywords: helpwanted → qawanted
Thiemo, does that panacea.dat still make you crash?  

I'm wondering whether anyone other than you would crash, even if that anyone had your full profile.
Flags: needinfo?(thiemo.nagel)
To be honest, I'm not as enthusiastic about getting this fixed anymore as I was over a year ago, when I reported it.  In the meantime, I've switched to GMail and use Tb very rarely, if at all.  I'll have to dig to find that broken profile and I probably won't do another debug build of Tb in the foreseeable future, so the question remains how useful that kind of input would be to you.

I'd guess that the bug is directly related to panacea.dat parsing and would show up independent of profile, but I don't remember having tested that.
Thiemo, thanks for the feedback.  I think the way forward will ultimately be Bug 418551 - Convert panacea.dat from mork to another database format
Component: General → Database
Flags: needinfo?(thiemo.nagel)
Product: Thunderbird → MailNews Core
See Also: → 1084276
Summary: startup crash: memory-allocation loop. corrupt panacea.dat after crash? → startup crash: memory-allocation loop. corrupt/large 40MB panacea.dat after crash?
Whiteboard: [startupcrash]
Removing myself from all the bugs I'm CC'd on. Please NI me if you need something on MailNews Core bugs from me.
Depends on: 418551
Keywords: qawanted

With mork very much on its way out, I'm going to say we won't give this attention. So let's get it off the radar.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX