Closed Bug 1890700 Opened 1 year ago Closed 1 year ago

Permanent startup Win32 application crashed [@ None + None] and [@ nsXPCWrappedJS::QueryInterface] with EXCEPTION_ACCESS_VIOLATION_EXEC

Categories

(Thunderbird :: General, defect, P1)

Thunderbird 126

Tracking

(thunderbird_esr115 unaffected, thunderbird125 unaffected, thunderbird126 affected, thunderbird127 fixed)

VERIFIED FIXED
127 Branch

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(5 keywords, Whiteboard: [fixed by bug 1892022])

Crash Data

Attachments

(2 obsolete files)

Filed by: geoff [at] darktrojan.net
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=454113357&repo=comm-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CWfHh4c5TCK6HMInFKkBUQ/runs/0/artifacts/public/logs/live_backing.log


I can't see any reason for it, but Thunderbird tests on Windows 32-bit are failing after bug 1873386.

Zombie, can you think of any reason why your patch would cause us to fail spectacularly on Win32 but not on other platforms?

Flags: needinfo?(tomica)
Assignee: nobody → john
Status: NEW → ASSIGNED
Keywords: leave-open
Status: ASSIGNED → NEW
Assignee: john → nobody
Pushed by john@thunderbird.net: https://hg.mozilla.org/comm-central/rev/253adcad1285 Disable Daily builds due to crashes. r=darktrojan

(In reply to Geoff Lankow (:darktrojan) from comment #1)

Zombie, can you think of any reason why your patch would cause us to fail spectacularly on Win32 but not on other platforms?

Sorry, I have no ideas to contribute on this. I'm totally unfamiliar with all the interfaces changed in that patch; I was just fixing an inconsistency in XPIDL between which members are exposed and which XPCOM types can be exposed to JS.

Most of the changes involved adding [noscript] to interface members, which would just make them invisible to JS; that's not something I would expect to be able to cause a crash.

The two things I would suggest to look at first are the nsIThreadInternal interface which was made not [scriptable], and the removal of InputAvailable/nsIInputAvailableCallback from nsIStreamTransportService.

Flags: needinfo?(tomica)

Would you be able to provide a WIP patch, or even better a complete one if you already know the areas that need to be targeted?

Flags: needinfo?(tomica)

I discovered the problem: we implement nsIClassInfo in JS for calendar and chat classes, but now that getScriptableHelper is marked [noscript], we can't implement it (not that we were using it). I still have no idea why only Windows 32-bit cares about this.

The options I can see here are:

  • Stop using nsIClassInfo in calendar and chat. This would mean adding a lot of QueryInterface calls and would be prone to many regressions where we use the presence or absence of an object's fields to determine what to do with it.
  • Figure out why getScriptableHelper needs to be marked [noscript] and change that.
  • Try to avoid the access violation somehow. I'm not hopeful this can happen, but only one platform has a problem, so there's a chance.

Option 2 can fix the problem, but whether it is a good idea or not, I do not know.
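
To illustrate why the first option is regression-prone, here is a toy model of the concern. It assumes that an object with class info has the fields of all its advertised interfaces visible to script straight away, while an object without it only gains an interface's fields after an explicit QueryInterface call. Everything in it (ToyWrapper, INTERFACE_FIELDS, IExampleItem, IExampleEvent) is hypothetical and stands in for the real calendar and chat classes; it is a sketch of the idea, not the actual XPConnect behaviour.

```typescript
// Toy model only: none of these names are real XPCOM or XPConnect APIs.
type IID = string;

// Hypothetical table of the fields each interface contributes.
const INTERFACE_FIELDS: Record<IID, string[]> = {
  IExampleItem: ["title"],
  IExampleEvent: ["startDate"],
};

class ToyWrapper {
  // Fields currently visible to script through this wrapper.
  private visible: Record<string, boolean> = {};

  constructor(private implemented: IID[], hasClassInfo: boolean) {
    // Assumption: with class info, every advertised interface is
    // exposed up front, so no QueryInterface call is needed.
    if (hasClassInfo) {
      implemented.forEach((iid) => this.expose(iid));
    }
  }

  private expose(iid: IID): void {
    (INTERFACE_FIELDS[iid] || []).forEach((field) => {
      this.visible[field] = true;
    });
  }

  // Throws for unknown interfaces, otherwise exposes that interface's fields.
  QueryInterface(iid: IID): this {
    if (this.implemented.indexOf(iid) === -1) {
      throw new Error("NS_NOINTERFACE: " + iid);
    }
    this.expose(iid);
    return this;
  }

  has(field: string): boolean {
    return this.visible[field] === true;
  }
}

const ids = ["IExampleItem", "IExampleEvent"];
const flattened = new ToyWrapper(ids, true); // models "with nsIClassInfo"
const bare = new ToyWrapper(ids, false); // models "without nsIClassInfo"

// A field-presence check gives different answers for the same class.
const before = [flattened.has("startDate"), bare.has("startDate")]; // [true, false]

// After an explicit QueryInterface the field shows up on the bare wrapper too.
bare.QueryInterface("IExampleEvent");
const after = bare.has("startDate"); // true
```

In this model, every place that branches on a field being present would need a QueryInterface call added in front of it, which is where the regressions would creep in.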

(In reply to Geoff Lankow (:darktrojan) from comment #7)

Option 2 can fix the problem, but whether it is a good idea or not, I do not know.

I don't know enough about xpcom to judge this, but you could put up a patch and tag xpcom-reviewers for feedback.

If they're not open to that, perhaps you could have a Thunderbird-only interface nsIXPCScriptableTB : nsIXPCScriptable, mark that one as [scriptable], and then have an #ifdef thunderbird method nsIClassInfo::getScriptableTBHelper.

Flags: needinfo?(tomica)
Version: unspecified → Thunderbird 126

As of today this also affects beta.

Severity: -- → S1
Priority: -- → P1

Very odd. Have you tried deleting the two empty getScriptableHelper implementations? Maybe that will help somehow.

Peter, any ideas about what could be going wrong here? Maybe somehow this is related to the specific quirks of ClassInfo? Thanks.

Flags: needinfo?(peterv)
Backout by thunderbird@calypsoblue.org: https://hg.mozilla.org/comm-central/rev/311b7ba5a87f Backed out changeset 253adcad1285 - enable daily builds. r=backout a=wsmwk
Attachment #9395963 - Attachment is obsolete: true

(In reply to Andrew McCreight [:mccr8] from comment #12)

Very odd. Have you tried deleting the two empty getScriptableHelper implementations? Maybe that will help somehow.

Yes, and it didn't help.

Thanks! I see "can QI to nsIClassInfo" in the log, so I guess either the registerFactory() or createInstance() calls are failing.

Do any of you have a Windows development environment handy and can get a decent stack for the crash? The Thunderbird crash stack looked too inlined to figure out where in DelegatedQueryInterface it was actually crashing, and the two XPCShell crashes seem to have almost no stack at all.

Does it help if the nsIClassInfo has the SINGLETON flag set?

Nika and I looked at this again. Our theory is that the extra JSON trimming that bug 1873386 added trimmed out too much, because the getScriptableHelper entry in xptdata looks bogus. I'll put together a patch.

Flags: needinfo?(peterv)

Wait, never mind. That comment on the patch is obsolete, and the patch as it is shouldn't trim anything extra. But maybe there is some existing latent bug in the xptdata generation.

Thanks for looking into this, we really appreciate it.
We were wondering: if the fix could take a while to figure out, would it be possible to back out the regression on beta only, from the m-c side?
At least that will allow us to ship beta for Windows 32 and we can continue testing things on Daily.
We're okay with not shipping Win 32 daily for now.

Flags: needinfo?(continuation)

(In reply to Peter Van der Beken [:peterv] from comment #18)

Does it help if the nsIClassInfo has the SINGLETON flag set?

Nope. No difference.

Testing nightly, I've determined that after the first crash the 32-bit build also won't start, even in troubleshoot mode. In fact, I don't even get the troubleshoot mode dialog. So a user will be stuck in endless startup crashes.

So I think we need to completely avoid building 32bit beta, not just not ship it until this is fixed.

My second and subsequent crashes are nsXPCWrappedJS::QueryInterface (bp-381204b5-9b5e-4c55-8489-b77460240416), which is currently a top crash for 127.0a1 since we re-enabled building nightlies yesterday.

Crash Signature: [@ None + None] → [@ None + None] [@ nsXPCWrappedJS::QueryInterface]
Flags: needinfo?(rob)
Summary: Permanent Win32 application crashed [@ None + None] with EXCEPTION_ACCESS_VIOLATION_EXEC → Permanent Win32 application crashed [@ None + None] and [@ nsXPCWrappedJS::QueryInterface] with EXCEPTION_ACCESS_VIOLATION_EXEC
Depends on: 1891989
Flags: needinfo?(rob)
See Also: → 1892022

I filed bug 1891989 for a more targeted backout. It is a bit bizarre so we'll see if Nika is okay with that or not. I also filed bug 1892022 for the general issue, as we probably want to get the Thunderbird problem fixed up without figuring out what is going on exactly.

Flags: needinfo?(continuation)

Follow up to comment 28, Thunderbird is currently not building beta because of this issue, i.e. shipping beta 126 is currently blocked because we don't want users to get a fatally flawed 32bit build, manually or automatically.

If the better fix in bug 1892022 cannot be on mozilla-beta for Monday, can bug 1873386 be backed out on beta? Or some other relief?

Flags: needinfo?(continuation)

Nika, is there some fix we can get on beta by Monday that would be okay with you? Thanks.

Flags: needinfo?(continuation) → needinfo?(nika)

I've put in an uplift request for beta: https://bugzilla.mozilla.org/show_bug.cgi?id=1892022#c10

Flags: needinfo?(nika)

Thanks Nika!

To anyone who has seen this problem, the patch can be tested using the most recent nightly build from https://archive.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-127.0a1.en-US.win32.installer.exe

Attachment #9396356 - Attachment is obsolete: true

Hello,

I installed the latest Nightly build from the Archive on Windows 10 x86 (22H2, build 19045) and did not encounter any crashes; the build worked without any issues.

The scenarios used for this confirmation consist of:

  • Writing and sending a new email (with and without attachments)
  • Downloading different size attachments
  • Adding multiple email accounts from different email providers (Google, Yahoo)
  • Creating/Deleting Folders
  • Moving/Deleting Emails
  • Restarting TB
  • Leaving it in idle while downloading emails from server (with/without OS locked screen)

It is worth mentioning that I did not manage to reproduce the crash on older (Nightly or beta) builds with the same OS. Are there any repro steps in order to trigger this issue, or maybe a test file?

Are there any repro steps in order to trigger this issue, or maybe a test file?

It crashes on startup. So just starting it is a sufficient test. Thanks

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Summary: Permanent Win32 application crashed [@ None + None] and [@ nsXPCWrappedJS::QueryInterface] with EXCEPTION_ACCESS_VIOLATION_EXEC → Permanent startup Win32 application crashed [@ None + None] and [@ nsXPCWrappedJS::QueryInterface] with EXCEPTION_ACCESS_VIOLATION_EXEC
Whiteboard: [fixed by bug 1892022]
Target Milestone: --- → 127 Branch

No version 128 crashes per crash stats

Status: RESOLVED → VERIFIED