(In reply to Kai Engert (:kaie:) from comment #6)
We have a type mismatch. The Windows callback definition from
uses a different type for the lpMessage parameters.
It uses the MapiMessage / MapiMessageW types defined by the Windows API.
It seems wrong that our code had replaced that with our own type
nsMapiMessage / nsMapiMessageW
Apparently the intention of the code author was to use a single type that works for both the incoming call from Windows, and for the outgoing call to our own definition of the NSIMapi interface.
Although the definition of our own nsMapiMessage* types attempt to be a very close match to the Windows MapiMessage* types, I'm not convinced the way that type is used is safe.
During this function call, IIUC we have two separate process contexts.
Context A - TB's MapiDll
- Incoming call from Windows, we are given a parameter of type MapiMessage*.
- It's incoming to TB's separate MapiDll that is registered with the Windows registry.
- I think we're supposed to treat this as read-only.
Process context A makes a cross-process call into context B.
Context B - Thunderbird process
- Incoming interprocess call, that our MapiDll initiated
- calls the implementation of our own definition of msgMapi.SendMail*
I see the following potential issues with the call from A to B:
we don't know if the call is synchronous or asynchronous.
Unless anyone can tell us what exactly happens, it could be sometimes sync, sometimes async,
or whether it's sync/async might even depend on the Windows version.
If it's asynchronous, then context A might terminate as soon as the function
call to context B has been issued, which means that the original parameter
given to context A is deleted.
This means that whenever context B gets called in an async way, context B
operates on memory that might have been freed/destroyed already.
even if it's synchronous, I don't know about the destruction order of the parameter
given from A to B.
I think I already answered the concerns... but cannot find my answer.
The definition(s) in mailnews/mapi/mapihook/build/msgMapi.idl are supposed to follow the MS definition in <MAPI.h> to produce identical memory layout. This, of course, needs checking; but I don't see the apparent differences right away. The definitions of MapiMessage and MapiMessageW in the <MAPI.h> are similar - only differ by using W-types for strings and pointed structs; and both e.g. have no explicit alignment specifiers. The same is true for nsMapiMessage and nsMapiMessageW pair in our IDL. So - if the existing implementation of MAPISendMail in TB works reliably (which means that it has matching definition of its nsMapiMessage struct), then I cannot see how the nsMapiMessageW be different.
Additionally, it cannot happen that the definition be correct for some calling applications, but wrong for others, since the same binary MAPI DLL is used everywhere, and it expects to find same data members at same physical address offsets.
The COM calls are synchronous: they end when e.g. CMapiImp::SendMailW returns (the return value is passed back to the caller) - can you imagine another way to get a return value from that? The COM calls are managed by TB code from both sides: TB MAPI DLL on one side, TB main process on the other side. There's nothing that can make the calls to differ in different times. And additionally - the COM calls use marshaler, which doesn't simply pass the data - which would give invalid pointer addresses in the context of COM server (TB) application: you cannot take a struct and pass it to a different process intact, and expect that addresses of its pointers would still be valid and point to correct data. The marshaling process uses the data definitions we have in IDL (and we generate special DLL for that) to understand how to copy data from structs passed from one address space into similar structs in another address space. The data arriving to TB through COM is a copy of what has been passed from MAPI DLL. The copy is memory-managed by marshaler, and is freed after the COM call returns. Its pointers are different from what was passed from DLL - you may simply debug that, and point to copies.
I suppose that the only way to find out where the problem is would be enabling logging here in released code, and ask that output along with the crash logs.
Deep-copy to account for suspected differences between MapiMessageW and nsMapiMessageW (so that the latter is only used internally, but not as the type of data passed to MAPISendMailW) would be inferior to some static asserts that would check that sizes and alignment of all struct members are the same; something like
static_assert(sizeof(nsMapiMessageW) == sizeof(MapiMessageW));
and similar checks that offsets of struct members are same:
static_assert(offsetof(nsMapiMessageW, ulReserved) == offsetof(MapiMessageW, ulReserved));
static_assert(offsetof(nsMapiMessageW, lpszSubject) == offsetof(MapiMessageW, lpszSubject));
... for each struct and member would safeguard the guarantee.