Crash in [@ nsMapiHook::PopulateCompFieldsW] in 66.0b1

VERIFIED FIXED in Thunderbird 68.0

Status

defect
--
critical
VERIFIED FIXED
2 months ago
9 days ago

People

(Reporter: wsmwk, Unassigned)

Tracking

({crash})

Thunderbird 68.0
x86
Windows 7

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(2 attachments)

(Reporter)

Description

2 months ago

still some crash despite fix from bug 1523818

This bug is for crash report bp-418fe1af-3cd6-4731-8ac0-fc1b80190212.

Top 10 frames of crashing thread:

0 xul.dll nsMapiHook::PopulateCompFieldsW comm/mailnews/mapi/mapihook/src/msgMapiHook.cpp:699
1 xul.dll CMapiImp::SendMailW comm/mailnews/mapi/mapihook/src/msgMapiImp.cpp:253
2 rpcrt4.dll Invoke 
3 rpcrt4.dll _imp_load__FreeAddrInfoW 
4 ole32.dll ole32.dll@0x1407f5 
5 ole32.dll ChannelProcessInitialize 
6 ole32.dll ChannelWrapper_QueryInterface 
7 ole32.dll CCtxComChnl::ContextInvoke 
8 ole32.dll MTAInvoke d:\w7rtm\com\ole32\com\dcomrem\callctrl.cxx:2097
9 ole32.dll STAInvoke 

(Reporter)

Updated

2 months ago
See Also: → 1523818

Comment 1

2 months ago

In the beta this code is:

nsresult nsMapiHook::PopulateCompFieldsW(lpnsMapiMessageW aMessage,
                                         nsIMsgCompFields* aCompFields)
{
  if (aMessage->lpOriginator && aMessage->lpOriginator->lpszAddress)  <= 699
    aCompFields->SetFrom(nsDependentString(aMessage->lpOriginator->lpszAddress));

The function is not called any more iff aMessage==null. So now we'd have to assume what Kai already mentioned: Memory corruption? :-(

All I can see us doing is pulling the entire function out again :-(

(Reporter)

Comment 2

2 months ago

I forgot to mention the user wrote "send email from Quickbooks"

(Reporter)

Comment 3

2 months ago

Time for another beta?

Also crash report bp-f3341961-0de5-482b-a22b-3569a0190212 "sending a file from explorer"

Top 10 frames of crashing thread:

0 xul.dll mozilla::detail::nsTStringRepr<char16_t>::EqualsASCII xpcom/string/nsTSubstring.cpp:1052
1 xul.dll nsMapiHook::PopulateCompFieldsW comm/mailnews/mapi/mapihook/src/msgMapiHook.cpp:718
2 xul.dll CMapiImp::SendMailW comm/mailnews/mapi/mapihook/src/msgMapiImp.cpp:264
3 rpcrt4.dll Invoke 
4 mapiproxy_inuse.dll seh_filter_dll 
5 rpcrt4.dll NdrStubCall2 
6 ole32.dll IClassFactory_CreateInstance_Stub 
7 ole32.dll ChannelProcessUninitialize 
8 ole32.dll CRpcChannelBuffer::GetProtocolVersion 
9 ole32.dll CCtxComChnl::ContextInvoke 

Crash Signature: [@ nsMapiHook::PopulateCompFieldsW] → [@ nsMapiHook::PopulateCompFieldsW] [@ mozilla::detail::nsTStringRepr<T>::EqualsASCII]

(In reply to Wayne Mery (:wsmwk) from comment #3)

Time for another beta?

We don't have a new potential fix yet.

Comment 5

2 months ago

The crash in comment #3 is from TB 65 beta and is misleading.

I would like to suggest to cleanup the API use further.

We have a type mismatch. The Windows callback definition from
https://docs.microsoft.com/en-us/windows/desktop/api/mapi/nc-mapi-mapisendmailw
uses a different type for the lpMessage parameters.

It uses the MapiMessage / MapiMessageW types defined by the Windows API.

It seems wrong that our code had replaced that with our own type
nsMapiMessage / nsMapiMessageW

Apparently the intention of the code author was to use a single type that works for both the incoming call from Windows, and for the outgoing call to our own definition of the NSIMapi interface.

Although the definition of our own nsMapiMessage* types attempt to be a very close match to the Windows MapiMessage* types, I'm not convinced the way that type is used is safe.

During this function call, IIUC we have two separate process contexts.

Context A - TB's MapiDll

  • Incoming call from Windows, we are given a parameter of type MapiMessage*.
  • It's incoming to TB's separate MapiDll that is registered with the Windows registry.
  • I think we're supposed to treat this as read-only.

Process context A makes a cross-process call into context B.

Context B - Thunderbird process

  • Incoming interprocess call, that our MapiDll initiated
  • calls the implementation of our own definition of msgMapi.SendMail*

I see the following potential issues with the call from A to B:

  • we don't know if the call is synchronous or asynchronous.
    Unless anyone can tell us what exactly happens, it could be sometimes sync, sometimes async,
    or whether it's sync/async might even depend on the Windows version.

    If it's asynchronous, then context A might terminate as soon as the function
    call to context B has been issued, which means that the original parameter
    given to context A is deleted.
    This means that whenever context B gets called in an async way, context B
    operates on memory that might have been freed/destroyed already.

  • even if it's synchronous, I don't know about the destruction order of the parameter
    given from A to B.

I think we need the following question answered:

In context A, is MapiDll required to make deep copies of all infomation
that was received in the MapiMessage* structure?
When in doubt, I'd say yes, we should.

What are the memory ownership rules when calling a COM IDL interface, from A to B ?
Which of the following is true?

  • A is allowed to keep ownership, and is allowed to destroy the memory immediately
    after the call to nsIMapi.SendMail* returns
    or
  • A is required to allocate new memory for the pointers given to nsIMapi.SendMail*
    and A must not free them, but assume that the function call will transfer
    ownership of the memory?

If we get good answers to the above, then we can implement the correct copying the cleanup as required.

If we cannot find out, "copying and leaking" might be the safest approach.

Comment 7

2 months ago

@Wayne: Yes, in principle it's time for another beta. My suggestion would be to disable MAPISendMailW with a one-line change. Then applications will use MAPISendMail and we can see whether it has the same problem since both interfaces use equivalent code. Pity that we need to run a beta to find out.

That said, I should try whether "sending a file from explorer" crashes for me, I haven't tried for a while.

@Kai: If bug 393302 comment #119 doesn't lie, there's already a "deep copy" happening.

Comment 8

2 months ago

Actually, doesn't crash using my local 64bit build, maybe we should try a 32bit build.

Comment 9

2 months ago

(In reply to Kai Engert (:kaie:) from comment #6)

We have a type mismatch. The Windows callback definition from
https://docs.microsoft.com/en-us/windows/desktop/api/mapi/nc-mapi-mapisendmailw
uses a different type for the lpMessage parameters.

It uses the MapiMessage / MapiMessageW types defined by the Windows API.

It seems wrong that our code had replaced that with our own type
nsMapiMessage / nsMapiMessageW

Apparently the intention of the code author was to use a single type that works for both the incoming call from Windows, and for the outgoing call to our own definition of the NSIMapi interface.

Although the definition of our own nsMapiMessage* types attempt to be a very close match to the Windows MapiMessage* types, I'm not convinced the way that type is used is safe.

During this function call, IIUC we have two separate process contexts.

Context A - TB's MapiDll

  • Incoming call from Windows, we are given a parameter of type MapiMessage*.
  • It's incoming to TB's separate MapiDll that is registered with the Windows registry.
  • I think we're supposed to treat this as read-only.

Process context A makes a cross-process call into context B.

Context B - Thunderbird process

  • Incoming interprocess call, that our MapiDll initiated
  • calls the implementation of our own definition of msgMapi.SendMail*

I see the following potential issues with the call from A to B:

  • we don't know if the call is synchronous or asynchronous.
    Unless anyone can tell us what exactly happens, it could be sometimes sync, sometimes async,
    or whether it's sync/async might even depend on the Windows version.

    If it's asynchronous, then context A might terminate as soon as the function
    call to context B has been issued, which means that the original parameter
    given to context A is deleted.
    This means that whenever context B gets called in an async way, context B
    operates on memory that might have been freed/destroyed already.

  • even if it's synchronous, I don't know about the destruction order of the parameter
    given from A to B.

I think I already answered the concerns... but cannot find my answer.
The definition(s) in mailnews/mapi/mapihook/build/msgMapi.idl are supposed to follow the MS definition in <MAPI.h> to produce identical memory layout. This, of course, needs checking; but I don't see the apparent differences right away. The definitions of MapiMessage and MapiMessageW in the <MAPI.h> are similar - only differ by using W-types for strings and pointed structs; and both e.g. have no explicit alignment specifiers. The same is true for nsMapiMessage and nsMapiMessageW pair in our IDL. So - if the existing implementation of MAPISendMail in TB works reliably (which means that it has matching definition of its nsMapiMessage struct), then I cannot see how the nsMapiMessageW be different.

Additionally, it cannot happen that the definition be correct for some calling applications, but wrong for others, since the same binary MAPI DLL is used everywhere, and it expects to find same data members at same physical address offsets.

The COM calls are synchronous: they end when e.g. CMapiImp::SendMailW returns (the return value is passed back to the caller) - can you imagine another way to get a return value from that? The COM calls are managed by TB code from both sides: TB MAPI DLL on one side, TB main process on the other side. There's nothing that can make the calls to differ in different times. And additionally - the COM calls use marshaler, which doesn't simply pass the data - which would give invalid pointer addresses in the context of COM server (TB) application: you cannot take a struct and pass it to a different process intact, and expect that addresses of its pointers would still be valid and point to correct data. The marshaling process uses the data definitions we have in IDL (and we generate special DLL for that) to understand how to copy data from structs passed from one address space into similar structs in another address space. The data arriving to TB through COM is a copy of what has been passed from MAPI DLL. The copy is memory-managed by marshaler, and is freed after the COM call returns. Its pointers are different from what was passed from DLL - you may simply debug that, and point to copies.

I suppose that the only way to find out where the problem is would be enabling logging here in released code, and ask that output along with the crash logs.

Deep-copy to account for suspected differences between MapiMessageW and nsMapiMessageW (so that the latter is only used internally, but not as the type of data passed to MAPISendMailW) would be inferior to some static asserts that would check that sizes and alignment of all struct members are the same; something like

static_assert(sizeof(nsMapiMessageW) == sizeof(MapiMessageW));

and similar checks that offsets of struct members are same:

#include <cstddef>
static_assert(offsetof(nsMapiMessageW, ulReserved) == offsetof(MapiMessageW, ulReserved));
static_assert(offsetof(nsMapiMessageW, lpszSubject) == offsetof(MapiMessageW, lpszSubject));

... for each struct and member would safeguard the guarantee.

Comment 10

2 months ago

Actually, my suspicion would be using some components from different versions: like type definition library (and maybe MAPI DLL) from one version, and TB main process from another. If there were mismatch in type definition library, then of course it could pass data to TB with a different layout than the one expected by TB. But no amount of copying would help in that case.

The problem could arise if we had released a version with a definition of nsMapiMessageW, and then released another version with slightly different definition, and then upgrading TB would leave the type library until reboot (because it was in use). In that case, the problem would only happen before reboot.

Comment 11

2 months ago

This is not a fix of course, this is just a series of static asserts that would guarantee proper layout of the structs.

Comment 12

2 months ago
Comment on attachment 9044579 [details] [diff] [review]
static_asserts.diff

I suppose this compiles?

How about my suggestion to disable MAPISendMailW on the next beta?

(In reply to Mike Kaganski from comment #9)
> So - *if* the existing implementation of MAPISendMail in TB works reliably ...
The clean-up-fix for bug 393302was already shipped in TB 65 beta 3, and the W-API implementation followed in beta 4, see https://www-stage.thunderbird.net/en-US/thunderbird/65.0beta/releasenotes/.

So no alarm bells rang for TB 65 beta 3 which confirms/illustrates/proves that the clean-up fix was right, at least for the narrow interface.

Most applications, apart from M$ Word (see bug 1521007 comment #0), prefer the W-API, so dropping it would force us back to the known-working narrow API.

Comment 13

2 months ago

Actually, it doesn't compile at all :-(

Comment 14

2 months ago

(In reply to Jorg K (GMT+1) from comment #12)

I suppose this compiles?

(In reply to Jorg K (GMT+1) from comment #13)

Actually, it doesn't compile at all :-(

It does, if you follow the comment there, which says that you need to ensure that MAPI.h from Outlook 2000 MAPI header package is not used. The said package (that is required for our build, and is mentioned in Simple_Thunderbird_build) includes old version of MAPI.h, not having W-API. But Windows SDK has its own MAPI.h in C:\Program Files (x86)\Windows Kits\10\Include\10.0.nnnnn.0\um (as opposed to \shared where it's advised to put the Outlook 2000 headers), with the said API; and the \shared directory is included prior to \um; so just removing MAPI.h from \shared would allow to build that.

By the way, in the MAPI code, all instances of inclusions of <mapidefs.h> from that package are safe to replace with

#if !defined WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>

which would reduce (but not eliminate) the dependency on it.

Also I'd suggest to change the recommendations on Simple_Thunderbird_build, to copy the headers to \um instead of \shared, and not overwrite MAPI.h when prompted. That would make preparing the configuration more robust.

How about my suggestion to disable MAPISendMailW on the next beta?

(In reply to Mike Kaganski from comment #9)

So - if the existing implementation of MAPISendMail in TB works reliably ...
The clean-up-fix for bug 393302was already shipped in TB 65 beta 3, and the
W-API implementation followed in beta 4, see
https://www-stage.thunderbird.net/en-US/thunderbird/65.0beta/releasenotes/.

So no alarm bells rang for TB 65 beta 3 which confirms/illustrates/proves
that the clean-up fix was right, at least for the narrow interface.

Most applications, apart from M$ Word (see bug 1521007 comment #0), prefer
the W-API, so dropping it would force us back to the known-working narrow
API.

I suppose that that question is not to me; I cannot suggest anything regarding the releases and what should go where - but of course, I don't have any objections there: the main priority for any software is being reliable. Polishing may be continued in a later releases.

Comment 15

2 months ago

(In reply to Mike Kaganski from comment #10)

The problem could arise if we had released a version with a definition of nsMapiMessageW, ...

To my knowledge, nsMapiMessageW was shipped in bug 1048658 and not changed.

I still think our best bet is to disable the W-API again.

Comment 16

2 months ago

(In reply to Mike Kaganski from comment #14)

It does, if you follow the comment there, ...

OK, apart from 200 BMO comments daily I need to study all the patches carefully now ;-( - Yes, it does compile, so we got the structures right.

So is temporarily dropping the W-API a good idea?

Comment 17

2 months ago

(In reply to Jorg K (GMT+1) from comment #16)

So is temporarily dropping the W-API a good idea?

+1 from me

Comment 18

2 months ago

Like this?

Attachment #9044652 - Flags: feedback?(mikekaganski)

Comment 19

2 months ago
Comment on attachment 9044652 [details] [diff] [review]
1527450-disable-MAPISendMailW.patch

Review of attachment 9044652 [details] [diff] [review]:
-----------------------------------------------------------------

Exactly!
Attachment #9044652 - Flags: feedback?(mikekaganski) → feedback+

Comment 20

2 months ago

TB 66 beta 2:
https://hg.mozilla.org/releases/comm-beta/rev/ad9ab2307818aab0702701b46c89ffa7a042355d

Disabled MAPISendMailW on beta. That will stop the crash, but doesn't help fixing the issue.

Comment 21

2 months ago

I think we understand now what happens. The automatic updated does NOT replace the MAPI DLLs that interface with Windows or an external program. Result: And old MAPI DLL tries to interface with a changed API on the Thunderbird side.

Great. See bug 1531869.

Comment 23

a month ago

I don't see any crashes with TB 66 beta 3 any more, build ID of that beta is 20190306001450. I can only see crashes of installs before that date or after that date bug before that build, for example here bp-20ffe866-43d0-4cde-9fec-90cb40190323 someone installed 20190209112700 two days ago :-(

Mike you feel like landing your "static assert" patch. It wouldn't hurt.

Flags: needinfo?(mikekaganski)
Target Milestone: --- → Thunderbird 66.0

Updated

a month ago
Status: NEW → RESOLVED
Last Resolved: a month ago
Resolution: --- → FIXED

Comment 24

a month ago

(In reply to Jorg K (GMT+1) from comment #23)

Mike you feel like landing your "static assert" patch. It wouldn't hurt.

I agree it won't hurt (although the problem indeed was using different components of different versions, as I suspected); actually, given that we really must ensure that the layout is identical to that of MS structs, it could prevent some inadvertent breakage here.

Flags: needinfo?(mikekaganski)

Comment 25

a month ago

OK, I need to see what MAPI headers are used in automation and whether this compiles given my negative experience in comment #14.

Comment 27

a month ago

Didn't compile, we need to get that fixed in bug 1538567.

(Reporter)

Comment 28

21 days ago

(In reply to Jorg K (GMT+2) from comment #23)

I don't see any crashes with TB 66 beta 3 any more, build ID of that beta is 20190306001450. I can only see crashes of installs before that date or after that date bug before that build, for example here bp-20ffe866-43d0-4cde-9fec-90cb40190323 someone installed 20190209112700 two days ago :-(

confirmed, almost no crashes with builds newer than 20190306001450. One of the "almosts" is bp-4f3ce9fa-cfe2-4ccf-8a2f-faf4f0190403 in 67.0b1

Status: RESOLVED → VERIFIED

Comment 29

21 days ago

Well, msgMapiHook.cpp:716 is:

  if (aMessage->lpOriginator && aMessage->lpOriginator->lpszAddress)

I cannot crash there since the function isn't called if aMessage is null. So this looks to me like an install problem for the MAPI libraries which caused the crash in the first place.

Please let me know of more "almosts" show up.

Comment 30

9 days ago
Comment on attachment 9044579 [details] [diff] [review]
static_asserts.diff

Let's take this now that it compiles in automation.
Attachment #9044579 - Flags: review+

Comment 31

9 days ago
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/52fc34de850f
Add static asserts to ensure MAPI structures have the same size and alignment as their MS counterparts. r=jorgk

Updated

9 days ago
Target Milestone: Thunderbird 66.0 → Thunderbird 68.0
You need to log in before you can comment on or make changes to this bug.