Closed Bug 781049 Opened 12 years ago Closed 8 years ago

crash in MimeContainer_finalize

Categories: MailNews Core :: MIME, defect
Platform: x86, Windows NT
Type: defect
Priority: Not set
Severity: critical

Tracking: Not tracked
Status: RESOLVED WORKSFORME

People: Reporter: wsmwk; Assigned: Irving

Details: 4 keywords; Whiteboard: [tbird ][regression:TB18.0a2]

This bug was filed from the Socorro interface and is report bp-5ed87de9-b777-4bca-bbbd-fc9002120806.
============================================================= 

Appears to be a regression, or at least shows a jump in crash rate, starting with 16.0a1 (build 20120614030535). Also crashes TB15.

0	mozglue.dll	arena_run_reg_dalloc	memory/jemalloc/jemalloc.c:3303
1	mozglue.dll	arena_dalloc_small	memory/jemalloc/jemalloc.c:4512
2	mozglue.dll	arena_dalloc	memory/jemalloc/jemalloc.c:4640
3	mozglue.dll	je_free	memory/jemalloc/jemalloc.c:6567
4	xul.dll	mime_free	mailnews/mime/src/mimei.cpp:297
5	xul.dll	MimeContainer_finalize	mailnews/mime/src/mimecont.cpp:77
6	xul.dll	MimeMultipart_finalize	mailnews/mime/src/mimemult.cpp:107
7	xul.dll	MimeMultipartRelated_finalize	mailnews/mime/src/mimemrel.cpp:232
8	xul.dll	mime_free	mailnews/mime/src/mimei.cpp:290
9	xul.dll	MimeContainer_finalize	mailnews/mime/src/mimecont.cpp:77
10	xul.dll	MimeMessage_finalize	mailnews/mime/src/mimemsg.cpp:92
Summary: crash in arena_run_reg_dalloc → crash in MimeContainer_finalize
The crash is in an assert statement, which changed in the 15 release timeline due to bug 764192.

The change in behaviour was only supposed to go out on non-release builds, but the TB configure may not have picked up the necessary compile flags change; I'll look into that.

Comments on bug 764192 imply that the crash is likely due to freeing the same object twice, or freeing an address that wasn't allocated using jemalloc.
Assignee: nobody → irving
Status: NEW → ASSIGNED
#5 crash (aggregate) for non-released versions
Whiteboard: [tbird devtopcrash]
irving, afaict these are all users of quicktext, making this a cousin of bug 785370 / bug 813899.  All crashes are version 18.
(In reply to Wayne Mery (:wsmwk) from comment #3)
> irving, afaict these are all users of quicktext, making this a cousin of bug
> 785370 / bug 813899.  All crashes are version 18.

OTOH, #2 crash for TB18 beta with slightly different signature ...

None of the crashes with signature moz_abort | arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize have quicktext installed; bp-50b4c365-1d96-40e8-bd1f-f28752121204 is one example.
Now the #1 crash for TB18.

TB16/TB17 signature arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize 

TB18/... signature moz_abort | arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize
Crash Signature: [@ arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize] → [@ arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize] [@ arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize ] [@ moz_abort | arena_run_reg_dalloc | arena_dalloc_small |…
perhaps bug 547621 could also be addressed?
(In reply to Irving Reid (:irving) from comment #1)
> The crash is in an assert statement, which changed in the 15 release
> timeline due to bug 764192.
> 
> The change in behaviour was only supposed to go out on non-release builds,
> but the TB configure may not have picked up the necessary compile flags
> change; I'll look into that.
> 
> Comments on bug 764192 imply that the crash is likely due to freeing the
> same object twice, or freeing an address that wasn't allocated using
> jemalloc.

irving, reversing my earlier comment ... for current beta and aurora, afaict quicktext seems NOT to be involved

#2 crash for TB19

TB17 crashes
bp-b1919f3b-46b3-4514-a111-111b62130111
bp-f65c4310-837f-40b1-a80a-cc8a02130105

TB19 crashes 
bp-a0f84d19-ef64-43c4-a055-10d302130128
bp-021286cf-0e83-4195-a3d8-dffd52130128
bp-31df92d9-d2e8-4f8a-ac58-2bedc2130125
Whiteboard: [tbird devtopcrash] → [tbird devtopcrash][regression:Tb18]
Version: unspecified → 18
(In reply to Irving Reid (:irving) from comment #1)
> Comments on bug 764192 imply that the crash is likely due to freeing the
> same object twice, or freeing an address that wasn't allocated using
> jemalloc.

irving, were you able to determine where this is occurring?
Flags: needinfo?(irving)
OK, so this is starting to look related to the fix in bug 478175 - if we never call MimeMultipartRelated_parse_eof(), we would double-free the "head" object of the multipart/related in MimeMultipartRelated_finalize() because there's a pointer to that part in both the children array and in ->headobj.

I haven't been able to recreate a situation where this happens, so this might not be the problem, but it's what I have for now.
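To make the suspected failure mode concrete, here is a minimal sketch under a simplified ownership model. Part, RelatedContainer, finalize() and live_parts_after_finalize() are hypothetical names, not libmime's actual MimeObject or MimeMultipartRelated code. It shows a head part that is reachable from both the children array and headobj, and a finalize that frees it only once instead of twice.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a MIME part; not libmime's MimeObject.
struct Part {
  static int live;  // number of Parts currently allocated
  Part() { ++live; }
  ~Part() { --live; }
};
int Part::live = 0;

// Hypothetical multipart/related container. Once the head part has been
// added as a child, it is reachable from both children and headobj.
struct RelatedContainer {
  std::vector<Part*> children;
  Part* headobj = nullptr;

  // Deletes every Part exactly once. A naive finalize that deleted headobj
  // and then every child would delete the aliased head twice (a double free).
  void finalize() {
    bool aliased = std::find(children.begin(), children.end(), headobj) !=
                   children.end();
    if (!aliased) delete headobj;  // free headobj only if no child owns it
    headobj = nullptr;
    for (Part* p : children) delete p;
    children.clear();
  }
};

// Builds the aliased state, finalizes it, and returns how many Parts are
// still alive (0 means each Part was destroyed exactly once).
int live_parts_after_finalize() {
  RelatedContainer c;
  Part* head = new Part();
  c.children.push_back(head);
  c.children.push_back(new Part());
  c.headobj = head;  // same object referenced twice
  c.finalize();
  assert(c.headobj == nullptr && c.children.empty());
  return Part::live;
}
```

Calling live_parts_after_finalize() should return 0. Skipping the aliased pointer is only one way to break the alias; clearing headobj at the point where the part is handed to the children array would work equally well in this model.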
Blocks: 478175
Flags: needinfo?(irving) → needinfo?(bugmail)
I tried to look into this before, but my brain had forgotten most of the surrounding context and I was unable to re-grok the code before my time window expired.  Cost/benefit says I'm unlikely to ever touch libmime again so I'm de-peering myself.  I think jcranmer was interested in a MIME fuzzer which seems like the best option; have it permute where the HTML part is placed in multipart/related plus variations on potentially omitting required part closing tags, etc.
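The fuzzing idea above can be sketched as a small generator, assuming a hand-rolled approach rather than any existing MIME fuzzer; related_variants() and the boundary string are hypothetical. It permutes where the text/html part sits inside a multipart/related body and, for each ordering, emits one well-formed message and one with the closing boundary omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generator for multipart/related test messages (not part of
// any existing fuzzer). Each part is (header lines, body).
std::vector<std::string> related_variants() {
  const std::string boundary = "REL-BOUNDARY";
  std::vector<std::pair<std::string, std::string>> parts = {
      {"Content-Type: text/html",
       "<html><body><img src=\"cid:img1\"></body></html>"},
      {"Content-Type: image/png\r\nContent-ID: <img1>",
       "iVBORw0KGgo="},  // placeholder bytes, not a valid image
      {"Content-Type: text/plain", "extra related part"}};
  const std::string header =
      "MIME-Version: 1.0\r\n"
      "Content-Type: multipart/related; type=\"text/html\"; boundary=\"" +
      boundary + "\"\r\n\r\n";

  std::sort(parts.begin(), parts.end());  // start from the first permutation
  std::vector<std::string> out;
  do {
    std::string body;
    for (const auto& part : parts) {
      body += "--" + boundary + "\r\n" + part.first + "\r\n\r\n" +
              part.second + "\r\n";
    }
    out.push_back(header + body + "--" + boundary + "--\r\n");  // well-formed
    out.push_back(header + body);  // closing boundary omitted
  } while (std::next_permutation(parts.begin(), parts.end()));
  assert(!out.empty());
  return out;
}
```

With three parts there are 3! = 6 orderings, so this yields 12 messages; each would then be opened in the message display code to look for the crash.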
Flags: needinfo?(bugmail)
jcranmer, ref comment 9 / 10
Flags: needinfo?(Pidgeot18)
Does this crash happen only on Windows?
I've tried to create a test case for this issue, but I can't reproduce it on my Linux box yet.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #12)
> Is this crash happened only on Window?
> I've tried to create test case for this issue, but I can't reproduce on my
> Linux box yet.

hiro, the crash also occurs on Mac, but it is more common for users of 3.1.x and 3.0.x (yup, they're out there in good numbers!). The crash rate is too low to find Windows or Mac examples for beta, Earlybird or Daily.

3.1    bp-1ef24915-27ac-41c0-b61f-0b1f02130613
3.1.20 bp-47b6dde4-5572-41a9-bcde-8502f2130324 (talbano)
12.0.1 bp-fac1c142-fce8-41a8-b8ee-535422121229 (info)
16.0.2 bp-efa35839-fc6f-4924-84e7-369232130530
17.0.4 bp-e6e598b7-7b9d-4794-8167-52d9f2130325
multipart/related is one of the more complicated things to have to wrap my head around, and I don't think I have the time investment to do it in the near future. If we had a testcase that caused this crash (should be as simple as "open this message up" with some possible custom message display settings), it would be easier to diagnose and fix.
Flags: needinfo?(Pidgeot18)
It's the #3 top crasher in 23.0b1, so it is a topcrash according to https://wiki.mozilla.org/CrashKill/Topcrash.
Keywords: topcrash
Whiteboard: [tbird devtopcrash][regression:Tb18] → [tbird betatopcrash][regression:Tb18]
It's the #301 crasher in TB 17.0.7 and #2 in TB 18.0b1. It first showed up in 18.0a2 (build 20121022), so the regression range is six weeks wide (comm-central):
http://hg.mozilla.org/comm-central/pushloghtml?fromchange=1dcadef385b1&tochange=97b223a0ed8e
#253, so not a topcrash in releases; the spike is only in unreleased versions. In fact it is still the #2 crash for TB 25.0b1, but ~1/3 of the crashes come from about 3 users. Discounting those duplicates, it is not in the top 5 crashes for beta.
Keywords: topcrash
Whiteboard: [tbird betatopcrash][regression:Tb18] → [tbird betatopcrash][regression:TB18.0a2]
Depends on: 547621
Is it conceivable that we chose the wrong mime type?

Regarding version 31: the bug 547621 crash signature appears, but this bug's (781049) crash signature does not.
Blocks: 1016524
See Also: → 543141
We're past the point where tracking-24 is useful, and the relevant signature is now bug 547621.

(In reply to Joshua Cranmer [:jcranmer] from comment #15)
> multipart/related is one of the more complicated things to have to wrap my
> head around, and I don't think I have the time investment to do it in the
> near future. If we had a testcase that caused this crash (should be as
> simple as "open this message up" with some possible custom message display
> settings), it would be easier to diagnose and fix.

I still don't have a testcase. But given your forays into mime the past year, any new thoughts?

N.B. comment 9 and patch https://bugzilla.mozilla.org/attachment.cgi?id=361951&action=diff
> OK, so this is starting to look related to the fix in bug 478175 - if we
> never call MimeMultipartRelated_parse_eof(), we would double-free the "head"
> object of the multipart/related in MimeMultipartRelated_finalize() because
> there's a pointer to that part in both the children array and in ->headobj.
Flags: needinfo?(Pidgeot18)
Whiteboard: [tbird betatopcrash][regression:TB18.0a2] → [tbird ][regression:TB18.0a2]
(In reply to Wayne Mery (:wsmwk) from comment #20)
> I still don't have a testcase. But given your forays into mime the past
> year, any new thoughts?

No. I've not descended into the existing libmime code in the past year, and all my existing notes on the matter were lost to the ether when my hard drive went belly up.
Flags: needinfo?(Pidgeot18)
Crash Signature: | arena_dalloc | je_free | mime_free | MimeContainer_finalize] → | arena_dalloc | je_free | mime_free | MimeContainer_finalize] [@ moz_abort | arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mime_free | MimeContainer_finalize]
Signature does not exist after 24.6.0
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME