Closed
Bug 782899
Opened 13 years ago
Closed 13 years ago
Daily hangs and leaks memory like crazy when trying to delete messages in the results of a filter search
Categories
(Thunderbird :: Folder and Message Lists, defect)
Tracking
(thunderbird15+ fixed, thunderbird16+ fixed, thunderbird17+ fixed)
RESOLVED
FIXED
Thunderbird 18.0
People
(Reporter: justdave, Assigned: Irving)
References
(Blocks 1 open bug)
Details
(4 keywords, Whiteboard: [gs][regression:TB15][gssolved])
Attachments
(3 files)
171.85 KB, text/plain | Details
278.90 KB, message/rfc822 | Details
3.07 KB, patch | asuth: review+, standard8: approval-comm-aurora+, standard8: approval-comm-beta+, standard8: approval-comm-release+ | Details | Diff | Splinter Review
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/17.0 Thunderbird/17.0a1
BuildID: 20120814030529
For the last few weeks, typing a string in the filter box at the top of a folder message list pane, then selecting multiple items in the results and attempting to delete them, results in Daily immediately consuming multiple GB of RAM and hanging until it's force-quit.
Comment 1•13 years ago
Confirming; just having a multiple selection kills the ponies.
Comment 2•13 years ago
Dave, is mail.operate_on_msgs_in_collapsed_threads set to true?
Comment 3•13 years ago
Perhaps the same as bug 753502.
Reporter
Comment 4•13 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #2)
> Dave is mail.operate_on_msgs_in_collapsed_threads set to true ?
It is set to the default value, which is true, yes. I also have threading disabled in most of my folders (including the ones I've run into this in), in case it matters.
Severity: critical → normal
tracking-thunderbird17:
? → ---
Comment 5•13 years ago
Dave, do these emails have attachments?
Reporter
Comment 6•13 years ago
They are "page changed" notifications from Confluence, which are multipart/html with embedded pics and css, so yes.
I've also discovered that simply opening one of these can cause the RAM usage explosion.
Comment 7•13 years ago
(In reply to Dave Miller [:justdave] from comment #6)
> They are "page changed" notifications from Confluence, which are
> multipart/html with embedded pics and css, so yes.
>
> I've also discovered that simply opening one of these can cause the RAM
> usage explosion.
how much memory per message?
regression?
Reporter
Comment 8•13 years ago
(In reply to Wayne Mery (:wsmwk) from comment #7)
> (In reply to Dave Miller [:justdave] from comment #6)
> > They are "page changed" notifications from Confluence, which are
> > multipart/html with embedded pics and css, so yes.
> >
> > I've also discovered that simply opening one of these can cause the RAM
> > usage explosion.
>
> how much memory per message?
It usually immediately jumps to around 3.5 GB to 4.0 GB physical in use, and then starts swapping. It continually grabs more memory once it starts, until I force quit.
> regression?
Yes. I could read these without getting this problem a few weeks ago.
Comment 9•13 years ago
The oldest regression report I can find is bug 759092, though I'm not convinced it's the same issue.
justdave, could you hunt down the regression range?
Bug 759092 cites a failure using the 2012-05-25 build.
Severity: normal → major
tracking-thunderbird17:
--- → ?
Comment 10•13 years ago
Profiling this shows us spending an inordinate amount of time in jsmimeemitter.js, in the writeBody function:
http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/components/jsmimeemitter.js#455
Profile posted here:
http://people.mozilla.com/~bgirard/cleopatra/?report=ace22cfa16f76a49a787d5d705c5474c68e72633
Cc'ing asuth and protz because this is Gloda stuff.
Updated•13 years ago
OS: Mac OS X → All
Comment 11•13 years ago
This is the function in question:
455 writeBody: function mime_emitter_writeBody(aBuf, aSize, aOutAmountWritten) {
456 if (this._writeBody &&
457 (!this._saneBodySize ||
458 this._curPart.body.length < MAX_SANE_BODY_PART_SIZE))
459 this._curPart.body += aBuf;
460 },
If memory is leaking like crazy, then probably this accumulation is where it's going. Likely that means that either the sane body part size isn't being checked, or there are a lot of not-quite-max-size parts getting added.
It sounds like there's a specific message (or set of messages) that's causing this issue; can this be posted?
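A standalone sketch of how that guard behaves (plain Node-style JS, not the actual Thunderbird code; the cap value and chunk sizes here are made up for illustration, and the `this._writeBody` check is omitted). It shows the second possibility from comment 11: each part is individually capped, yet many near-cap parts still add up, and with the guard disabled the accumulation is unbounded.

```javascript
// Simulates the writeBody guard from jsmimeemitter.js with a tiny,
// made-up cap so the effect is easy to see.
const MAX_SANE_BODY_PART_SIZE = 1024;

function makeEmitter(saneBodySize) {
  return {
    _saneBodySize: saneBodySize,
    _parts: [],
    startPart() {
      this._curPart = { body: "" };
      this._parts.push(this._curPart);
    },
    writeBody(buf) {
      // Same shape as the guard in the snippet above: stop appending once
      // the *current part* reaches the cap; every new part starts fresh.
      if (!this._saneBodySize ||
          this._curPart.body.length < MAX_SANE_BODY_PART_SIZE)
        this._curPart.body += buf;
    },
    totalSize() {
      return this._parts.reduce((n, p) => n + p.body.length, 0);
    },
  };
}

const emitter = makeEmitter(true);
for (let part = 0; part < 50; part++) {   // e.g. a 52-part Confluence mail
  emitter.startPart();
  for (let line = 0; line < 100; line++)
    emitter.writeBody("x".repeat(64));    // libmime delivers line-sized chunks
}
// Each part is capped near 1 KB, but 50 capped parts still total ~50 KB;
// with saneBodySize=false every part would keep all 6400 bytes.
console.log(emitter.totalSize());
```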
Comment 12•13 years ago
Shot in the dark: it may be a message with an insane amount of parts, or many such messages.
Comment 13•13 years ago
(In reply to Jonathan Protzenko [:protz] from comment #12)
> Shot in the dark: it may be a message with an insane amount of parts, or
> many such messages.
No, not more than 3. Message coming up. Can't save it from TB, so you'll get the untouched message, i.e. without the mbox headers.
Comment 14•13 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #13)
> Created attachment 654764 [details]
> testcase
>
> (In reply to Jonathan Protzenko [:protz] from comment #12)
> > Shot in the dark: it may be a message with an insane amount of parts, or
> > many such messages.
>
> NO not more than 3. Message coming up. Can't save it from tb so you'll get
> the untouched message eg without the mbox headers.
52 parts. It looks like it's basically of the form:
multipart/mixed
  multipart/alternative
    text/msg
    multipart/related
      text/html
      ... several cid: images ...
  application/pdf
Each image averages about 1/3KB of de-base64'd size. Oh, and the images are all Content-Disposition: attachment too, fwiw.
Comment 15•13 years ago
I see MultiMessageSummary in the profile; not a lot, but a little. I assume this is a case where MultiMessageSummary is either overwhelmed by what it is streaming to provide a useful summary and/or it is making things worse by re-computing things. I think the summaries have some logic to wait for quiescence based on a timer; if the system has become overwhelmed, it is conceivable that things appear to quiesce, pouring some gasoline on the fire. It's been long enough that I don't recall the exact event sequence for the summaries, so I don't know how possible that is.
I doubt gloda's indexer is involved because all types of deletion should appear to gloda as either a fast-pathed move or an actual deletion. (gloda does show up in the profile a teeny bit, but that could just be background-ness.)
Comment 16•13 years ago
It's worth noting that because of the way XPConnect works, even if the emitter has stopped concatenating the strings because saturation has been reached, all those calls still happen and those tiny strings will still generate tons of garbage. And I think in the case of HTML, those calls may not be line-based, but tag-based, in which case the garbage generation could be a ridiculous multiplier, since HTML e-mails can include a ridiculous number of tags. I suspect Joshua may know more about that (and his MIME parser may not have this pathological problem).
Comment 17•13 years ago
FWIW, before I came back from vacation I could read these emails without any issues.
Assignee
Comment 18•13 years ago
I have this reproduced, so I'll dig in with debugger and tracing. If someone else can find a regression window by testing old Daily builds that would be great.
Assignee: nobody → irving
Status: NEW → ASSIGNED
Assignee
Comment 19•13 years ago
Oh now this is interesting (and annoying...) The Aug 1 nightly build (https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2012-08-01-04-20-04-comm-aurora/) beachballs and consumes a big chunk of CPU when I select several messages in a folder including the test message, but it *doesn't* consume gigabytes of memory - whereas trunk as of last Thursday eats both CPU and memory.
That means we get to bisect two problems - the CPU load before Aug 1 and the memory bloat after.
Comment 20•13 years ago
Could be massive garbage generation in both cases, but the GC is just running more frequently (and/or running full cycle collecting GC's more frequently), so we don't actually bloat as much. The profiler should label GC's, so if that's the case now, it should be pretty obvious pretty fast. (Based on the previous profile, I would expect the I/O should still be very obvious.)
It's probably also worth checking with gdb that the I/O hasn't stopped being buffered or something like that. Specifically, check the length of the strings to make sure they are not 1 character long...
Assignee
Comment 21•13 years ago
Parts of the problem seem to have come and gone. The basic regression window is:
OK: https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2012/05/2012-05-14-05-31-54-comm-aurora/
FAILED: https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2012/05/2012-05-15-03-05-24-comm-central/
so the problem seems to have been introduced May 14. The May 15 build has both CPU load and memory bloat; the May 30 build has CPU load but not memory bloat, and the memory bloat reappears some time after Aug 1.
Assignee
Comment 22•13 years ago
Bah, I messed up and mixed Aurora and Nightly builds in my bisection. The problem is earlier than May 14 in nightlies. Bisect bisect bisect.
Assignee
Comment 23•13 years ago
OK, this time for sure ;->
Good nightly: https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2012/05/2012-05-03-03-16-32-comm-central/thunderbird-15.0a1.en-US.mac.txt
Bad nightly: https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2012/05/2012-05-04-03-36-16-comm-central/thunderbird-15.0a1.en-US.mac.txt
Looking at the changes, there's nothing obvious in comm-central; unfortunately I can't get a successful build of either the good or bad revision to test/bisect further.
I haven't read all the mozilla-central changes during that time frame yet.
Comment 24•13 years ago
I've always suspected the compartment changes in moz-central that landed around that time.
Assignee
Comment 25•13 years ago
David, good catch - compartment-per-global for XPConnect landed that day. If I recall correctly, mconley's profiler runs showed a bunch of time spent in the cross-compartment JS call path.
Assignee
Comment 26•13 years ago
OK, so it's a combination of things:
1) the message summary generator uses Gloda's mime structure builder to parse content out of messages, in order to get the first couple of lines to summarize: https://mxr.mozilla.org/comm-central/source/mail/base/content/selectionsummaries.js#585
2) libmime splits messages into text lines and calls the mime "emitter" with the message content line-by-line
3) Gloda's builder reassembles that content into a single string for plaintext and html parts: https://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/components/jsmimeemitter.js#455
Unfortunately, the string building at step 3 happens in a way that interacts terribly with compartment-per-global: for some reason the string assignment happens across a compartment boundary, which among other things forces the JS string to be flattened. That means insane amounts of memory allocation inside a tight loop that doesn't have any GC pauses, so we both burn huge amounts of CPU copying strings and blow out the heap by keeping all the copies.
The quick fix is to make selectionsummaries.js pass the "saneBodySize:true" flag to MsgHdrToMimeMessage() - this stops the gloda builder from reassembling the entire string.
This still leaves behind a cruel trap for anyone else who uses the Gloda MIME engine; we need to fix the underlying problem as well.
Blocks: cpg
Keywords: regressionwindow-wanted
Comment 27•13 years ago
(In reply to Irving Reid (:irving) from comment #26)
Any number of things can cause a string to be flattened so, in general, code should not rely on strings not being flattened. However, part of comment 26 sounded worrying so I'd like to ask for a clarification: are all these huge strings live (i.e., they are in some array or other object structure that is reachable) or should they be garbage that, for some reason, isn't being collected? If it is the latter, this may be a GC scheduling bug, independent but exacerbated by bug 650353.
Comment 28•13 years ago
Ah, the flattening... good investigatory find! As a simple horribleness-reducer, we could change to just putting the strings in a list and calling join at the end.
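Comment 28's suggestion, accumulating chunks in an array and joining once at the end, can be sketched like this (standalone Node-style code, not the actual emitter; function names are illustrative):

```javascript
// Repeated `body += chunk` can force a copy of the whole accumulated
// string on each append (especially when each append crosses a
// compartment boundary and flattens the rope), which is O(n^2) overall.
function accumulateByConcat(chunks) {
  let body = "";
  for (const c of chunks) body += c; // may flatten/copy on every call
  return body;
}

// Pushing into an array is a cheap append; the single join() at the end
// builds the final flat string exactly once.
function accumulateByJoin(chunks) {
  const parts = [];
  for (const c of chunks) parts.push(c);
  return parts.join("");
}

const chunks = Array.from({ length: 1000 }, (_, i) => "line " + i + "\n");
console.log(accumulateByConcat(chunks) === accumulateByJoin(chunks)); // -> true
```

Both produce the identical string; only the number of intermediate copies differs, which is why this is a "simple horribleness-reducer" rather than a behavior change.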
Comment 29•13 years ago
(In reply to Irving Reid (:irving) from comment #26)
> Unfortunately the string building at step 3 happens in a way that interacts
> terribly with compartment-per-global; for some reason the string assignment
> is happening across a compartment boundary and among other things this
> forces the JS string to be flattened, so there is insane amounts of memory
> allocation going on inside a tight loop that doesn't have any GC pauses - so
> we both burn huge amounts of CPU copying strings, and we blow out the heap
> by keeping all the copies.
This may be caused by how the gloda mimemsg.js architecture works. mimemsg.js is a JS module that is imported by Cu.import, while the code that appends body parts is a JS component called back by libmime. The this._curPart is synthesized from a function in mimemsg.js, so it gets a cross-compartment wrapper, meaning most of our manipulation from the JS emitter goes through cross-compartment calls (and XPConnect!).
Assignee
Comment 30•13 years ago
Luke: I agree that we don't want to rely on flattener and GC behaviour. In fact, while the CPU burn on this bug has been constant since CPG landed, the memory bloat has come and gone a few times, so it could very well be related to GC scheduling.
Step one of my preferred approach would be to figure out a way to have all the objects in the mime message assembler be inside the same compartment so that we don't need to proxy any of the calls; then we can look at fixing the inefficient way we assemble the big string.
Assignee
Comment 31•13 years ago
Passing this option prevents the gloda MIME emitter from building extra-large strings containing MIME part contents, which reduces the impact of string copying and CPG enough to get by.
asuth: Should we also pass partsOnDemand:true to avoid further downloading, or does aAllowDownload=false already avoid problems with getting MIME parts from the server?
callek: mxr didn't find any other non-test calls to MsgHdrToMimeMessage(), but someone should check SM to make sure it doesn't have a similar problem if it uses the gloda MIME emitter.
Attachment #656201 - Flags: superreview?(bugspam.Callek)
Attachment #656201 - Flags: review?(bugmail)
Comment 32•13 years ago
Comment on attachment 656201 [details] [diff] [review]
Don't accumulate the entire message body in MIME emitter when creating message summaries
redirecting to Neil for SeaMonkey "OMG will this break us, or do we need to port" Q, since I know very little about this part of our code.
Attachment #656201 - Flags: superreview?(bugspam.Callek) → superreview?(neil)
Comment 33•13 years ago
FWIW, I just tried this patch on a collection of messages that TB generally chokes on when multi-selecting.
There is a significant improvement to performance - it's still a little sticky, but it certainly doesn't send TB into a tailspin like it did before.
Great detective work, Irving!
Comment 34•13 years ago
Comment on attachment 656201 [details] [diff] [review]
Don't accumulate the entire message body in MIME emitter when creating message summaries
I'm looking at newsblogOverlay.js because I think that's the only shared code (we don't have the code that summarises message selections). Is there no way to get just the headers, or better still, just the content-base header from the message, and no body at all?
Comment 35•13 years ago
> asuth: Should we also pass partsOnDemand:true to avoid further downloading, or
> does aAllowDownload=false already avoid problems with getting MIME parts from
> the server?
aAllowDownload=false means: "if the message is not available locally (offline sync or message cache), don't download it over the internet, and use the very little information we have from the headers in the .msf file"
partsOnDemand=true means: "if fetching the message remotely (because it isn't fully available locally), and the server is IMAP, and the server supports BODYSTRUCTURE, use it to not download parts too big, but still create a complete Mime structure for the message"
I think aAllowDownload=false takes precedence.
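Putting comments 26, 31 and 35 together, the patched call presumably ends up looking something like the sketch below. This is Thunderbird chrome code and not runnable standalone; the argument order follows the MsgHdrToMimeMessage usage discussed in this bug, and the callback parameter names are illustrative assumptions.

```
// Sketch only - assumes the mimemsg.js module and signature as discussed above.
Components.utils.import("resource:///modules/gloda/mimemsg.js"); // MsgHdrToMimeMessage

MsgHdrToMimeMessage(
  msgHdr,                    // nsIMsgDBHdr of the selected message
  null,                      // callback "this"
  function (aMsgHdr, aMimeMsg) {
    // build the selection summary from aMimeMsg ...
  },
  false,                     // aAllowDownload: never fetch over the network;
                             // fall back to the header info in the .msf file
  { saneBodySize: true }     // cap each body part so the emitter never
                             // reassembles the entire message body
);
```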
Comment 36•13 years ago
Comment on attachment 656201 [details] [diff] [review]
Don't accumulate the entire message body in MIME emitter when creating message summaries
As protz says, the default (which is now being explicitly set) for allow download (=false) should avoid any network traffic anyways. I think setting saneBodySize is definitely a good stop-gap.
I don't think there's any way to avoid the JS component being in its own compartment without changes to the way the emitter is instantiated, which leaves us moving all of the representation gunk from mimemsg.js to jsmimeemitter.js and leaving the calls that trigger the streaming in mimemsg.js.
Specifically, gloda.manifest lists the category manager entry "category mime-emitter @mozilla.org/messenger/mimeemitter;1?type=application/x-js-mime-message @mozilla.org/gloda/jsmimeemitter;1", which is how it gets instantiated. We could have the streaming process take a pre-constructed XPConnect-looking object and have all the MIME helper stuff operate by using the subscript loader inside the same context as whatever wants to consume it. We would probably want to use a RequireJS-type idiom (with 'use strict' to avoid global pollution). This would likely entail changing most of the gloda modules in a similar fashion.
Having said that, I think just having the representation classes loaded in the emitter compartment should be sufficient to stop the worst of the pain. That and maybe encouraging the C++ shim that defines the emitter layer to perform a little more internal batching so the XPConnect wrapping is not quite so over-the-top.
Attachment #656201 - Flags: review?(bugmail) → review+
Comment 39•13 years ago
Several potentially matching topics tagged: https://getsatisfaction.com/mozilla_messaging/tags/tb15perf
TB15.0.1 coming?? :)
Whiteboard: [gs]
Assignee
Comment 40•13 years ago
Checked in:
https://hg.mozilla.org/comm-central/rev/4ac6d862637f
but see Bug 784286 to track further work necessary...
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 41•13 years ago
Is this specific to TB17 or can it be in TB15 too?
Here are some bugs like this: bug 787348, bug 787321.
Comment 42•13 years ago
@aceman: yes, I believe the bug is in TB15 official release. Here's my experience after upgrading: bug 787121
@:wsmwk -- I agree that a TB 15.0.1 release containing this fix is needed -- I actually had to roll back to TB14, and am happily productive again.
Updated•13 years ago
Attachment #656201 - Flags: superreview?(neil)
Updated•13 years ago
Whiteboard: [gs] → [gs][regression:TB15]
Target Milestone: --- → Thunderbird 17.0
Comment 43•13 years ago
There are several incoming reports that may be dupes of this. Will this patch be backported to some branches?
Comment 44•13 years ago
Comment on attachment 656201 [details] [diff] [review]
Don't accumulate the entire message body in MIME emitter when creating message summaries
[Triage Comment]
Taking this onto beta so we can get some testing.
Attachment #656201 - Flags: approval-comm-beta+
Attachment #656201 - Flags: approval-comm-aurora+
Comment 45•13 years ago
Checked in:
https://hg.mozilla.org/releases/comm-aurora/rev/f0b6d2317205
https://hg.mozilla.org/releases/comm-beta/rev/0564977e1c9d
status-thunderbird16:
--- → fixed
status-thunderbird17:
--- → fixed
tracking-thunderbird15:
--- → +
tracking-thunderbird16:
--- → +
Target Milestone: Thunderbird 17.0 → Thunderbird 18.0
Comment 46•13 years ago
Is this going into 15.0.1 please? So we can see if the tons of similar bugs are dupes.
Comment 48•13 years ago
Comment on attachment 656201 [details] [diff] [review]
Don't accumulate the entire message body in MIME emitter when creating message summaries
[Triage Comment]
The general feedback so far has been that this and bug 784286 are fixing issues, so taking onto the release branch.
Attachment #656201 - Flags: approval-comm-release+
Comment 49•13 years ago
status-thunderbird15:
--- → fixed
See Also: → https://launchpad.net/bugs/1048618
Comment 51•13 years ago
Irving, we get reports in other bugs about this, that in TB15.0.1 the memory (freeze) problem is solved but now not all of the selected messages are summarized in the message pane. Is that a known side effect of the patch?
Comment 55•13 years ago
(In reply to :aceman from comment #51)
> Irving, we get reports in other bugs about this, that in TB15.0.1 the memory
> (freeze) problem is solved but now not all of the selected messages are
> summarized in the message pane. Is that a known side effect of the patch?
There should be no side-effect like that. But be aware that the summarization logic caps MAX_THREADS at 100 and MAX_MESSAGES at 100, so it could just be that people are being more observant and noticing this limitation.
Comment 56•13 years ago
Are they informed about that? Does the preview pane contain some "further messages were hidden" footnote?
Comment 57•13 years ago
(In reply to :aceman from comment #56)
> Are they informed about that? Does the preview pane contain some "further
> messages were hidden" footnote?
Yes, at the bottom of the summary view.
Comment 58•13 years ago
In my case, I don't see any notice that not everything was summarized. For 9 messages, total size is 14MB.
Comment 59•12 years ago
Whiteboard: [gs][regression:TB15] → [gs][regression:TB15][gssolved]