Permanent startup Win32 application crashed [@ None + None] and [@ nsXPCWrappedJS::QueryInterface] with EXCEPTION_ACCESS_VIOLATION_EXEC
Categories
(Thunderbird :: General, defect, P1)
Tracking
(thunderbird_esr115 unaffected, thunderbird125 unaffected, thunderbird126 affected, thunderbird127 fixed)
| | Tracking | Status |
|---|---|---|
| thunderbird_esr115 | --- | unaffected |
| thunderbird125 | --- | unaffected |
| thunderbird126 | --- | affected |
| thunderbird127 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Regression)
Details
(5 keywords, Whiteboard: [fixed by bug 1892022])
Crash Data
Attachments
(2 obsolete files)
Filed by: geoff [at] darktrojan.net
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=454113357&repo=comm-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CWfHh4c5TCK6HMInFKkBUQ/runs/0/artifacts/public/logs/live_backing.log
I can't see any reason for it, but Thunderbird tests on Windows 32-bit are failing after bug 1873386.
Comment 1•1 year ago
Zombie, can you think of any reason why your patch would cause us to fail spectacularly on Win32 but not on other platforms?
Comment 2•1 year ago
Comment 4•1 year ago
(In reply to Geoff Lankow (:darktrojan) from comment #1)
> Zombie, can you think of any reason why your patch would cause us to fail spectacularly on Win32 but not on other platforms?
Sorry, I have no ideas to contribute on this. I'm totally unfamiliar with all the interfaces changed in that patch; I was just fixing an inconsistency in XPIDL between which members are exposed and which XPCOM types can be exposed to JS.
Most of the changes involved adding [noscript] to interface members, which would just make them invisible to JS. That's not something I would expect to be able to cause a crash.
The two things I would suggest to look at first are the nsIThreadInternal interface which was made not [scriptable], and the removal of InputAvailable/nsIInputAvailableCallback from nsIStreamTransportService.
Comment 5•1 year ago
Would you be able to provide a WIP patch, or even better a complete one if you already know the areas that need to be targeted?
Comment 6•1 year ago
I discovered the problem: we implement nsIClassInfo in JS for calendar and chat classes, but now that getScriptableHelper is marked [noscript] we can't implement it (not that we were using it). I still have no idea why only Windows 32-bit cares about this.
The options I can see here are:
- Stop using nsIClassInfo in calendar and chat. This would mean adding a lot of QueryInterface calls and would be prone to many regressions where we use the presence or absence of an object's fields to determine what to do with it.
- Figure out why getScriptableHelper needs to be marked [noscript] and change that.
- Try to avoid the access violation somehow. I'm not hopeful this can happen, but only one platform has a problem, so there's a chance.
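To illustrate why the first option is regression-prone, here is a rough sketch in plain JavaScript. It is a simplified model, not real XPCOM or XPConnect code: `wrap`, `queryInterface`, `IExampleEvent`, `IExampleTodo` and every other name are hypothetical stand-ins. It assumes that an object advertising its interfaces up front (as class info does) has all their members visible immediately, while an object without that information only shows a member after an explicit query for the owning interface.

```javascript
// Simplified model only: this is NOT XPCOM. All names are hypothetical.
const INTERFACES = {
  // interface name -> member names that interface declares
  IExampleEvent: ["startDate", "endDate"],
  IExampleTodo: ["dueDate"],
};

// Build a "wrapper" that only exposes members of interfaces it knows about.
function wrap(impl, { classInfoInterfaces = [] } = {}) {
  const wrapper = {};
  const expose = name => {
    for (const member of INTERFACES[name]) {
      if (member in impl) {
        wrapper[member] = impl[member];
      }
    }
  };
  // With class info, every listed interface is visible immediately.
  classInfoInterfaces.forEach(expose);
  // Without it, callers have to ask for each interface explicitly.
  wrapper.queryInterface = name => {
    if (!impl.supports.includes(name)) {
      throw new Error(`no interface ${name}`);
    }
    expose(name);
    return wrapper;
  };
  return wrapper;
}

const todoImpl = { supports: ["IExampleTodo"], dueDate: "2024-05-01" };

const withClassInfo = wrap(todoImpl, { classInfoInterfaces: ["IExampleTodo"] });
const withoutClassInfo = wrap(todoImpl);

// Field-presence check of the kind described in the first option.
const isTodo = obj => "dueDate" in obj;
```

In this model, `isTodo(withClassInfo)` succeeds right away, whereas `isTodo(withoutClassInfo)` only succeeds after `withoutClassInfo.queryInterface("IExampleTodo")` has been called. Every existing field-presence check would need such a call added in front of it, which is where the regressions would creep in.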
Comment 7•1 year ago
Option 2 can fix the problem, but whether it is a good idea or not, I do not know.
Comment 8•1 year ago
(In reply to Geoff Lankow (:darktrojan) from comment #7)
> Option 2 can fix the problem, but whether it is a good idea or not, I do not know.
I don't know enough about xpcom to judge this, but you could put up a patch and tag xpcom-reviewers for feedback.
If they're not open to that, perhaps you could have a Thunderbird-only interface nsIXPCScriptableTB : nsIXPCScriptable, mark that one as [scriptable], and then have an #ifdef thunderbird method nsIClassInfo::getScriptableTBHelper.
Comment 9•1 year ago
Comment hidden (Intermittent Failures Robot)
Comment 11•1 year ago
As of today this also affects beta.
Comment 12•1 year ago
Very odd. Have you tried deleting the two empty getScriptableHelper implementations? Maybe that will help somehow.
Comment 13•1 year ago
Peter, any ideas about what could be going wrong here? Maybe somehow this is related to the specific quirks of ClassInfo? Thanks.
Comment 14•1 year ago
Comment 15•1 year ago
(In reply to Andrew McCreight [:mccr8] from comment #12)
> Very odd. Have you tried deleting the two empty getScriptableHelper implementations? Maybe that will help somehow.
Yes, and it didn't help.
Comment 16•1 year ago
I made a minimal test that fails: https://hg.mozilla.org/try/rev/973207fc4f1bb688db7a42f03783f558f0d05a25
Comment 17•1 year ago
Thanks! I see "can QI to nsIClassInfo" in the log, so I guess either the registerFactory() or createInstance() calls are failing.
Do any of you have a Windows development environment handy and can get a decent stack for the crash? The Thunderbird crash stack looked too inlined to figure out where in DelegatedQueryInterface it was actually crashing, and the two XPCShell crashes seem to have almost no stack at all.
Comment 18•1 year ago
Does it help if the nsIClassInfo has the SINGLETON flag set?
Comment 19•1 year ago
Nika and I looked at this again. Our theory is that the extra JSON trimming that bug 1873386 added trimmed out too much, because the getScriptableHelper entry in xptdata looks bogus. I'll put together a patch.
Comment 20•1 year ago
Wait, never mind. That comment on the patch is obsolete, and the patch as it is shouldn't trim anything extra. But maybe there is some existing latent bug in the xptdata generation.
Comment 21•1 year ago
Thanks for looking into this, we really appreciate it.
We were wondering: if the fix could take a while to figure out, would it be possible to back out the regression only on beta from the m-c side?
At least that will allow us to ship beta for Windows 32 and we can continue testing things on Daily.
We're okay with not shipping Win 32 daily for now.
Comment 22•1 year ago
(In reply to Peter Van der Beken [:peterv] from comment #18)
> Does it help if the nsIClassInfo has the SINGLETON flag set?
Nope. No difference.
Comment 23•1 year ago
Testing nightly, I've determined that after the first crash, 32-bit also won't start even in troubleshoot mode. In fact, I don't even get the troubleshoot mode dialog, so a user will be stuck in endless startup crashes.
So I think we need to completely avoid building 32-bit beta until this is fixed, not just avoid shipping it.
My second and subsequent crashes are nsXPCWrappedJS::QueryInterface (bp-381204b5-9b5e-4c55-8489-b77460240416), which is currently a top crash for 127.0a1 since we reenabled building nightlies yesterday.
Comment 24•1 year ago
I filed bug 1891989 for a more targeted backout. It is a bit bizarre so we'll see if Nika is okay with that or not. I also filed bug 1892022 for the general issue, as we probably want to get the Thunderbird problem fixed up without figuring out what is going on exactly.
Comment 25•1 year ago
Follow up to comment 28, Thunderbird is currently not building beta because of this issue, i.e. shipping beta 126 is currently blocked because we don't want users to get a fatally flawed 32bit build, manually or automatically.
If the better fix in bug 1892022 cannot be on mozilla-beta for Monday, can bug 1873386 be backed out on beta? Or some other relief?
Comment 26•1 year ago
Nika, is there some fix we can get on beta by Monday that would be okay with you? Thanks.
Comment 27•1 year ago
I've put in an uplift request for beta: https://bugzilla.mozilla.org/show_bug.cgi?id=1892022#c10
Comment 28•1 year ago
Thanks Nika!
To anyone who has seen this problem, the patch can be tested using the most recent nightly build from https://archive.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-127.0a1.en-US.win32.installer.exe
Comment 29•1 year ago
Hello,
I installed the latest Nightly build from the Archive on Windows 10 x86 (22H2, build 19045) and did not encounter any crashes; the build worked without any issues.
The scenarios used for this confirmation consist of:
- Writing and sending a new email (with and without attachments)
- Downloading different size attachments
- Adding multiple email accounts from different email providers (Google, Yahoo)
- Creating/Deleting Folders
- Moving/Deleting Emails
- Restarting TB
- Leaving it in idle while downloading emails from server (with/without OS locked screen)
It is worth mentioning that I did not manage to reproduce the crash on older (Nightly or beta) builds with the same OS. Are there any repro steps in order to trigger this issue, or maybe a test file?
Comment 30•1 year ago
> Are there any repro steps in order to trigger this issue, or maybe a test file?
It crashes on startup. So just starting it is a sufficient test. Thanks