Closed Bug 1679299 Opened 3 years ago Closed 3 years ago

Crash in [@ MimeMultipartAlternative_flush_children] opening message

Categories

(MailNews Core :: MIME, defect, P1)

Thunderbird 81

Tracking

(thunderbird_esr78 unaffected, thunderbird91+ fixed)

VERIFIED FIXED
92 Branch
Tracking Status
thunderbird_esr78 --- unaffected
thunderbird91 + fixed

People

(Reporter: wsmwk, Assigned: benc)

References

Details

(4 keywords)

Crash Data

Attachments

(8 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1663664 +++

Still a top 10 crash for 83 beta. Example...

bp-fd91392e-b37a-4d26-ab75-565800201125
0 xul.dll MimeMultipartAlternative_flush_children(MimeObject*, bool, priority_t) comm/mailnews/mime/src/mimemalt.cpp:233
1 xul.dll MimeMultipartAlternative_parse_eof(MimeObject*, bool) comm/mailnews/mime/src/mimemalt.cpp:252
2 xul.dll MimeMultipartRelated_parse_eof(MimeObject*, bool) comm/mailnews/mime/src/mimemrel.cpp:1067
3 xul.dll MimeMultipart_close_child(MimeObject*) comm/mailnews/mime/src/mimemult.cpp:505
4 xul.dll MimeMultipart_parse_line(char const*, int, MimeObject*) comm/mailnews/mime/src/mimemult.cpp:140
5 xul.dll mime_LineBuffer(char const*, int, char**, int*, unsigned int*, bool, int ()(char, unsigned int, void*), void*) comm/mailnews/mime/src/mimebuf.cpp:205
6 xul.dll MimeObject_parse_buffer(char const*, int, MimeObject*) comm/mailnews/mime/src/mimeobj.cpp:223
7 xul.dll MimeMessage_parse_line(char const*, int, MimeObject*) comm/mailnews/mime/src/mimemsg.cpp:179
8 xul.dll mime_LineBuffer(char const*, int, char**, int*, unsigned int*, bool, int ()(char, unsigned int, void*), void*) comm/mailnews/mime/src/mimebuf.cpp:205
9 xul.dll MimeObject_parse_buffer(char const*, int, MimeObject*) comm/mailnews/mime/src/mimeobj.cpp:223
10 xul.dll mime_display_stream_write(_nsMIMESession*, char const*, int) comm/mailnews/mime/src/mimemoz2.cpp:850
11 xul.dll nsStreamConverter::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long long, unsigned int) comm/mailnews/mime/src/nsStreamConverter.cpp:824
12 xul.dll mozilla::dom::MediaDocumentStreamListener::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long long, unsigned int) dom/html/MediaDocument.cpp:89
13 xul.dll mozilla::net::ParentChannelWrapper::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long long, unsigned int) netwerk/ipc/ParentChannelWrapper.h:27
14 xul.dll mozilla::net::DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*) netwerk/ipc/DocumentLoadListener.cpp:1179
15 xul.dll mozilla::net::DocumentLoadListener::FinishReplacementChannelSetup(nsresult) netwerk/ipc/DocumentLoadListener.cpp:1089
16 xul.dll mozilla::net::DocumentLoadListener::RedirectToRealChannelFinished(nsresult) netwerk/ipc/DocumentLoadListener.cpp:1030
17 xul.dll mozilla::net::DocumentLoadListener::TriggerRedirectToRealChannel::<unnamed-tag>::operator()(nsresult const&) netwerk/ipc/DocumentLoadListener.cpp:1970
18 xul.dll mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, 1>::ThenValue<lambda at /builds/worker/checkouts/gecko/netwerk/ipc/DocumentLoadListener.cpp:1961:11',lambda at /builds/worker/checkouts/gecko/netwerk/ipc/DocumentLoadListener.cpp:1972:11'>::DoResolveOrRejectInternal(mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, 1>::ResolveOrRejectValue&) xpcom/threads/MozPromise.h:769
19 xul.dll mozilla::MozPromise<CopyableTArray<bool>, bool, 0>::ThenValueBase::ResolveOrRejectRunnable::Run() xpcom/threads/MozPromise.h:410
20 xul.dll mozilla::RunnableTask::Run() xpcom/threads/TaskController.cpp:450
21 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) xpcom/threads/TaskController.cpp:720
22 xul.dll mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) xpcom/threads/TaskController.cpp:579

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(mkmelin+mozilla)
See Also: → 1680879

this signature arose with version 81, same as bug 1675784

Flags: needinfo?(benc)
See Also: → 1675784

I've had a bit of a blind look down through the call stack and can't see anything obvious... but mainly that's a factor of me not being at all familiar with the mime code. I'd need to sit down and really dig into it I think.

But. One thing I did notice: in most of the crash reports the access violations are on memory locations which look very suspiciously like ASCII characters. So I'd guess either a buffer overrun somewhere, or using uninitialised memory (which would probably fit in with pending_parts being wrong, as per Comment 1).

Flags: needinfo?(benc)

to reporduce: Specific email causes crash - just clicking to open kills TB, every time

Can you attach a test case as .eml?

Test case EML file.

(In reply to Magnus Melin [:mkmelin] from comment #6)

Can you attach a test case as .eml?

I'm having the same issue as Antony states above. Attaching a copy of the email EML file that's causing the issue.

(In reply to willdanneriv from comment #7)

Created attachment 9229544 [details]
FW ERN Log Phase 1 Training.eml

Test case EML file.

unfortunately doesn't crash for me

OTOH, I opened and forwarded it to myself as attachment, and I consistently crash when clicking on the message in Inbox.
bp-24113461-f891-4454-ac18-66ca20210701

Flags: needinfo?(mkmelin+mozilla)

Ben, please have a look.

Assignee: nobody → benc
Flags: needinfo?(mkmelin+mozilla)

(In reply to Wayne Mery (:wsmwk) from comment #10)

OTOH, I opened and forwarded it to myself as attachment, and I consistently crash when clicking on the message in Inbox.
bp-24113461-f891-4454-ac18-66ca20210701

I did the same test, and it happened to me too
I downloaded the eml
Opened in TB 90.b3
Forwarded to my self

When I try to open the email from my Inbox, and crash bp-c32bf094-d5e6-45e3-b02a-0f3f00210701

(In reply to Wayne Mery (:wsmwk) from comment #10)

OTOH, I opened and forwarded it to myself as attachment, and I consistently crash when clicking on the message in Inbox.
bp-24113461-f891-4454-ac18-66ca20210701

This is the behavior I'm seeing as well. I'm on TB 90.0b3 (64-bit).
The few emails it has happened on were forwarded emails with that as an attachment.

file as ref in my latest comment date 8/7

Further testing shows that a reply email, from this one correspondent, on the back of origin from me, works, no issue.
Sending an eml file as well of one that crashed... if i set the body message as Plain Text, I can open the msg, thogh it is is ful of Base64 chrs

QkVHSU46VkNBTEVOREFSDQpNRVRIT0Q6UkVRVUVTVA0KUFJPRElEOk1pY3Jvc29mdCBFeGNoYW5n

as the first line

Also, seems that the original email from this source works, since then fwd or resends with additional attachments causes crash.

Regressed to 90.b1 same issue

bizzare coincidence, I just crashed bp-0a947e96-d2b0-4204-acd7-11ca50210709 90.0b3 on my virtual PC.
It wasn't the testcase, but I had just received a message whose composition is similar to the testcase - be I can't reproduce the crash.
However, the testcase does reproduce on Windows and Mac.

Based on crash-stats, this is a top crash for beta, and I believe this will emerge as a top crash when we release version 91. AFICS it's not an issue on version 78

Severity: -- → S2
Status: NEW → ASSIGNED
OS: Windows 10 → All
Priority: -- → P1
Hardware: Unspecified → All
Summary: Crash in [@ MimeMultipartAlternative_flush_children] → Crash in [@ MimeMultipartAlternative_flush_children] opening message

(In reply to Wayne Mery (:wsmwk) from comment #17)

bizzare coincidence, I just crashed bp-0a947e96-d2b0-4204-acd7-11ca50210709 90.0b3 on my virtual PC.
It wasn't the testcase, but I had just received a message whose composition is similar to the testcase - be I can't reproduce the crash.
However, the testcase does reproduce on Windows and Mac.

Based on crash-stats, this is a top crash for beta, and I believe this will emerge as a top crash when we release version 91. AFICS it's not an issue on version 78

Wayne
received a message whose composition is similar to the testcase
Are you referring to my eml I attached - if yes my concern is tht this is as plain text and hence may not help reporduce (if not my eml - then no prob)
I will see if I can attach the original email that crashes (well I hope it is the original version) might not be any diff

had to zip as the original is just a bit too big, its full of attachments HTML

Use this dump the previous file

(In reply to Wayne Mery (:wsmwk) from comment #9)

(In reply to willdanneriv from comment #7)

Created attachment 9229544 [details]
FW ERN Log Phase 1 Training.eml

Test case EML file.

unfortunately doesn't crash for me

Nor me!!!

Is there a max size for incoming attachments?

I notice that the email (in fact a resending of the same email with attachments being added with each subsequent send), grows in size

One is at 18 mg size of attachments which hen equates to 25 mg email size

Oddly, if I fwd same email to another address of mine, it opens fine - no crash (will ALL attachments!) so I guess that rather answers my question above :-/

However, the original text part of the msg is missing

Same for the original email that crashes TB - fwd to my outlook, opens but no msg in body: attachments are there though

see my comment and look at screenshot referred to

Ok, had added a new attachment

When opened you will see Sophie's reply to my reply to her originl email. That original 30/06 at 16:36 crashes TB

Now what is weird is that I was able to open and reply 30/06 17:25 and she replied next morning. My reply does NOT appear in the top window in the thread - screenshot also provided.

The original email in this thread now crashes TB. I need to check when I updated TB - I have a feeling I also jumped from 90.0b1 to b3, or 90.b2 build 1 to 90.b3 build2, so may have skipped an interdiate build/beta ver

Is there a log that records the update paths and timestamps?

update to 91.0b1 - same email(s) crash TB

I can replicate this using the email in comment #20 (I've got it in an IMAP folder, and it'll crash whenever I go to view it).

Just before the crash I see some NS_ASSERTIONs triggering (which aren't fatal by default, but really should be :- ):

[Parent 374400, Main Thread] ###!!! ASSERTION: object shouldn't be closed: '!obj->closed_p', file /home/ben/tb/mozilla/comm/mailnews/mime/src/mimeobj.cpp:220
[Parent 374400, Main Thread] ###!!! ASSERTION: object shouldn't be closed: '!obj->closed_p', file /home/ben/tb/mozilla/comm/mailnews/mime/src/mimeobj.cpp:220
[Parent 374400, Main Thread] ###!!! ASSERTION: obj already parsed: '!obj->parsed_p', file /home/ben/tb/mozilla/comm/mailnews/mime/src/mimeobj.cpp:239

I think these are related to the crash, but not 100% certain yet. Digging into it now.

Attachment #9230545 - Attachment is obsolete: true

#14 crash for beta 91

OK, I think I understand what's happening.
It requires a large enough message (>800KB or so, I think) with a "text/calendar" part in it.

Short description:
The calendar part causes async badness which cause nested calls to nsStreamConverter::OnDataAvailable().

Long description:
The mime parsing (in C++) is driven by an async nsInputStreamPump which feeds data into the parser by calling nsStreamConverter::OnDataAvailable(). Under normal operation the chunk of data will be processed and then OnDataAvailable() will exit, ready to be called when the next chunk is ready to process.

However, with "text/calendar" parts (and likely other types), it calls nsISimpleMimeConverter.convertToHTML() to compose a nice blob of HTML to embed into the message display. For "text/calendar", the converter is implemented in javascript, in CalMimeConverter.jsm. That convertToHTML() implementation decodes the event details and uses them to fill out a template. It uses an XMLHttpRequest to fetch that template synchronously.
This is where it all goes pear-shaped.
The XMLHttpRequest blocks, and like all good javascript plumbing, allows the thread to go on processing other pending things while we wait for the request to complete. One of these other things is the input stream we were reading from in the first place, which promptly calls OnDataAvailable() again, even though we're still in the middle of processing the first OnDataAvailable() call.
Hilarity ensues.

Not quite sure yet what the proper fix is. A couple of ideas I'll investigate tomorrow:

  1. CalMimeConverter.jsm could preload the template file to avoid blocking and yielding to other events. But seems a bit crappy to impose such non-yielding conditions upon JS code, as there's no obvious indication which things yield behind the scenes and which don't (no await/async to make it obvious).
  2. it might be possible to suspend the inputpump at the beginning of the OnDataAvailable() handling, and resume it at exit, to prevent reentry. Seems pretty icky though.
    I'm currently leaning toward 1 as the lesser evil.

Geoff: are there any conventions/expectations/precedents about behind-the-scenes yielding in javascript? Is it utterly unreasonable to say some JS functions are banned from doing non-async XMLHttpRequests and the like?

Flags: needinfo?(geoff)

Oh, looks like the only other built-in nsISimpleMimeConverter implementation is the one in addressbook, for displaying "text/vcard" (VCardUtils.jsm). But that one uses a template hard-coded into the code itself, so I don't think it'll suffer from this issue.

I'd lean towards option 1 as well. It'd be easy to see whether it works given you can reproduce the bug, and the template itself is small enough to hard-wire in. It'd be nice to have some protection from the input side of things though.

Geoff: are there any conventions/expectations/precedents about behind-the-scenes yielding in javascript?

The blocking XMLHttpRequest really allows other things to continue while it does I/O? That's somewhat surprising. I think the answer is no, and most people would assume a sync request would be okay in these circumstances anyway.

Is it utterly unreasonable to say some JS functions are banned from doing non-async XMLHttpRequests and the like?

Seems okay to me, although all you can really do to enforce it is make a big scary warning comment in the appropriate places.

Flags: needinfo?(geoff)

(In reply to Geoff Lankow (:darktrojan) from comment #33)

The blocking XMLHttpRequest really allows other things to continue while it does I/O? That's somewhat surprising. I think the answer is no, and most people would assume a sync request would be okay in these circumstances anyway.

It's down in the deep depths of the runtime system - threads are event targets which can have a queue of events waiting to run when they get the chance. And I think a sync XMLHttpRequest yields to some other pending event.
I'm not worried about the Javascript runtime - I'm pretty confident that it's all sensible. I'm not totally sure what happens there - I'd guess that you would expect to see other JS running while the XMLHttpRequest blocked... eg a previous promise waiting on some other IO, or a timer callback? Or maybe those would have to wait until the request completed? Whatever the case is, I'm confident that it's been considered carefully and that it all Just Works (tm). JS runtime has had a lot of attention.
It's more the interaction with the C++ side of things. Calling into JS in this case, the XMLHttpRequest means we yield to some other queued event, even though we're still in the middle of processing one. From the C++ point of view, we're totally not expecting some other event to be run. Its like accidentally turning a simple function call into a coroutine.
I wonder what other JS operations can do the same thing? Hmm. Maybe this is why the [scriptable] XPIDL markers are so rigorous?

This removes the use of a blocking (synchronous) XMLHttpRequest, which was
causing recursive calls to nsStreamConverter::OnDataAvailable() and general
mayhem. The stream feeding the mime parsing takes advantage of the blocking
to send us more data even though we're still in the middle of processing the
previous chunk of data!

Looking into the implementation of sync XMLHttpRequest, it calls SpinEventLoopUntil() while blocking, which means other thread events will be serviced. Hence the inputstream being able to deliver the next chunk of data before we're ready for it...

https://searchfox.org/mozilla-central/search?q=If+we're+synchronous%2C+spin+an+event+loop+here+and+wait&path=&case=false&regexp=false

There are a bunch of other places which also call SpinEventLoopUntil(), but I don't see any which are toooooo immediately alarming.
(looks like flushing out prefs to disk can do it, for example, but seems unlikely to be an issue for us).

I think we just have to chalk this up to: "there are some obscure things you can do which will screw this up, so don't do those things!" :-)
In the case of XMLHttpRequest, the sync operation is explicitly discouraged anyway. Good advice.

Attachment #9232263 - Attachment description: WIP: Bug 1679299 - Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. → Bug 1679299 - Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. r?darktrojan
Attachment #9232263 - Attachment description: Bug 1679299 - Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. r?darktrojan → Bug 1679299 - Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. r=darktrojan
Target Milestone: --- → 92 Branch

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/5518f9fe5364
Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. r=darktrojan

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/7401d1f46d1b
follow-up - Parse template file as XML, not HTML. rs=bustage-fix

Doh! Sorry about that... Worked fine on my local tests, but stupidly forgot to run a try build.

Regressions: 1722257

Comment on attachment 9232263 [details]
Bug 1679299 - Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. r=darktrojan

[Approval Request Comment]
Top crash fix. Will also need to uplift bug 1722257.

Attachment #9232263 - Flags: approval-comm-beta?

Comment on attachment 9232263 [details]
Bug 1679299 - Inline invitation-template.xhtml to prevent C++ re-entrancy in CalMimeConverter. r=darktrojan

[Triage Comment]
Approved for beta

We will want to be on the lookout for regressions

Attachment #9232263 - Flags: approval-comm-beta? → approval-comm-beta+
Regressions: 1723116

Antony, is the crash gone for you on beta?

willdanneriv, is the crash gone for you?

Flags: needinfo?(willdanneriv)
Flags: needinfo?(acdp)

(In reply to Wayne Mery (:wsmwk) from comment #43)

Antony, is the crash gone for you on beta?

willdanneriv, is the crash gone for you?

Yes - looks fixed - I checked with the same email that I reported that crashed TB, this now displays properly

Cool Bananas

Flags: needinfo?(acdp)

Thanks for the update.
Also gone according to crash-stats

Status: RESOLVED → VERIFIED
Flags: needinfo?(willdanneriv)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: