Crash in [@ BaseThreadInitThunk]
Categories: Core :: mozglue, defect
Tracking:

Release | Tracking | Status
---|---|---
firefox-esr91 | --- | unaffected
firefox-esr102 | --- | unaffected
firefox104 | --- | unaffected
firefox105 | --- | unaffected
firefox106 | blocking | fixed
People: reporter aryx; unassigned
Keywords: crash
10 crashes from 8 Windows 10 installations, all with the latest Nightly (106.0a1 20220830210405). There was an isolated crash report with this signature for Nightly in July. We still see this crash signature occasionally for release and rarely for Nightly.
Michael, could you check whether this could be related to the WebRTC update (bug 1766646 etc.)? For the record, the other changes in this Nightly are https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=ecb328de1aafc36765b3bbf7f434ef84d93cad28
See bug 1740627 for a former instance of this signature which got fixed.
Crash report: https://crash-stats.mozilla.org/report/index/2ef9ff48-a7cc-4843-8781-26e180220831
Reason: EXCEPTION_STACK_BUFFER_OVERRUN / FAST_FAIL_GUARD_ICALL_CHECK_FAILURE
Top 8 frames of crashing thread:
0 ntdll.dll LdrpICallHandler
1 ntdll.dll RtlpExecuteHandlerForException
2 ntdll.dll RtlDispatchException
3 ntdll.dll KiUserExceptionDispatch
4 ntdll.dll LdrpDispatchUserCallTarget
5 kernel32.dll BaseThreadInitThunk
6 mozglue.dll patched_BaseThreadInitThunk toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp:581
7 ntdll.dll RtlUserThreadStart
Comment 1•2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::mozglue' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•2 years ago
The bug is marked as tracked for firefox106 (nightly). However, the bug still isn't assigned.
:Sylvestre, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 3•2 years ago
Redirecting to the managers, who know much more than I do about the init system
(not sure that bugbug is correct about the component)
Comment 4•2 years ago
Given the volume, I guess this is a new issue.
Comment 5•2 years ago
The previous instance was fixed by hardening against badly written injections. However, here I see reports (at least on Nightly) with no obvious third-party modules and with long uptimes (the latter is not necessarily inconsistent, as the crash happens on thread launch):
https://crash-stats.mozilla.org/report/index/60b82f2c-c17c-40ab-abc6-310230220831
https://crash-stats.mozilla.org/report/index/3e91f2ac-b3fd-49d2-9d9c-2182c0220831
https://crash-stats.mozilla.org/report/index/043c2de5-5718-46e0-be71-08b070220831
https://crash-stats.mozilla.org/report/index/655295c5-33b8-4d24-8362-5289e0220831
Unfortunately there are no correlations available on crash-stats.
There is no relation between this crash and the changes in the regression range, so either we crash due to earlier stack bustage elsewhere, or this is third-party code anyway.
Not sure this will be very actionable.
Comment 6•2 years ago
This is turning into a new top crasher on Nightly; another similar new signature spiked with the same buildID, also Windows-only.
Setting as P1/S1.
Comment 7•2 years ago
I don't have anything to add here. Can we tell what thread was being started from the stacks of other threads and see if there's any commonality?
Comment 8•2 years ago
Can we tell what thread was being started from the stacks of other threads and see if there's any commonality?
No. Looking at the active thread (if any) also doesn't show any obvious commonality to me.
I notice that these reports don't seem to have memory information attached. Is this because EXCEPTION_STACK_BUFFER_OVERRUN / FAST_FAIL_GUARD_ICALL_CHECK_FAILURE is a WER-caught error and we don't have that info there?
Comment 9•2 years ago
Jim, this is P1/S1 now and the WebRTC update is in the regression range. There's no other obvious (or even not so obvious?) change that can cause this, and we have very limited info to go on here, so making sure this is high on your radar.
Comment 10•2 years ago
Additionally, I was wondering if these could correlate with OOM reduction (bug 1716727), but that was first enabled in 105 and the majority of these crashes are in 106.
Comment 11•2 years ago
Several user comments mention that the crash happens when you close a tab which had Netflix playing:
"Crash when closing the tab while playing Netflix "
"Crashes have occurred a few times, when closing a Tab with Netflix open. "
"If I close the tab while playing any series or movie from the Netflix site, the application crashes."
Comment 12•2 years ago
Do we know if there's been any CDM updates in that timeframe?
Comment 13•2 years ago
This is a reminder regarding comment #2!
The bug is marked as blocking firefox106 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.
Comment 14•2 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)
Do we know if there's been any CDM updates in that timeframe?
We haven't updated the cdm in this time frame. We do have a major libwebrtc update in this time frame, as well as a security related fix in media code. Pretty tough diagnosing this though, there's nothing of much value in the stacks.
Comment 15•2 years ago
(In reply to Jim Mathies [:jimm] from comment #14)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)
Do we know if there's been any CDM updates in that timeframe?
We haven't updated the cdm in this time frame. We do have a major libwebrtc update in this time frame, as well as a security related fix in media code. Pretty tough diagnosing this though, there's nothing of much value in the stacks.
Have we tried reproducing the crash by closing a tab where Netflix is playing?
Comment 16•2 years ago
Exception thrown at 0x00007FF9FA3D3B6E (ntdll.dll) in firefox.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
This is easy to reproduce. Open any Netflix stream and close the tab after it starts playing. Will try to find a regression range.
Comment 17•2 years ago
(In reply to Jim Mathies [:jimm] from comment #16)
Exception thrown at 0x00007FF9FA3D3B6E (ntdll.dll) in firefox.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
This is easy to reproduce. Open any netflix stream and close the tab after it starts playing. Will try to find a regression range.
Hmm, and now I can't reproduce in the same nightly version.
Comment 18•2 years ago
Not having any luck reproducing this reliably, which is preventing me from generating a regression range.
Comment 19•2 years ago
This crash signature might be related - https://bugzilla.mozilla.org/show_bug.cgi?id=1788592
Comment 20•2 years ago
(In reply to Jim Mathies [:jimm] from comment #19)
This crash signature might be related - https://crash-stats.mozilla.org/signature/?product=Firefox&signature=__delayLoadHelper2%20%7C%20_tailMerge_oleaut32.dll%20%7C%20_tailMerge_d3dcompiler_47.dll%20%7C%20%3CT%3E%3A%3Aoperator%28%29%20%7C%20__crt_seh_guarded_call%3CT%3E%3A%3Aoperator%28%29%3CT%3E&date=%3E%3D2022-06-06T16%3A28%3A00.000Z&date=%3C2022-09-06T16%3A28%3A00.000Z#summary
This is bug 1788592 (See Also)
Comment 21•2 years ago
This is a reminder regarding comment #2!
The bug is marked as blocking firefox106 (nightly). We have limited time to fix this, the soft freeze is in 8 days. However, the bug still isn't assigned.
Comment 22•2 years ago
"closing the tab while playing Netflix" is pretty consistent with the GMP process shutting down. Bug 1788592 has _exit(0) in the GMP process on the stack, so I wouldn't be surprised if they are related.
Comment 23•2 years ago
The regression here appears to be fixed by https://bugzilla.mozilla.org/show_bug.cgi?id=1788592
I'm not going to dupe this since the signature has been around for a while. Will leave it to release drivers to decide what to do with this bug.
Comment 24•2 years ago
Hey gcp, any chance your Windows experts can comment on this? We addressed this by preloading oleaut in gmp, but we really don't understand what triggered the issue in the first place.
Comment 25•2 years ago
I'm going to assume relman will be able to drop the severity and we don't need to immediately figure out who can dive more deeply into this. But we'll get to this.
Comment 26•2 years ago
We stopped crashing on Nightly after the patch in bug 1788592 landed, so I am marking this fixed for the 106 release, lowering the severity and removing the P1 flag.
Comment 27•2 years ago
To summarize findings from bug 1788592:
- These crashes were a consequence of linking statically with the library comsupp.lib when building xul.dll. oleaut32.dll is an implicit dependency used at exit when using comsupp.lib, and we specify that oleaut32.dll should be delay-loaded when building xul.dll. Exiting the plugin-container.exe process results in calling the dynamic_atexit_destructor_for_'vtMissing' from comsupp.lib, which calls the delay-import for VariantClear from oleaut32.dll, resulting in an attempt to load oleaut32.dll. That load fails because the sandbox is active and we cannot read the library file on disk.
- There are two possible fixes: (1) we could keep pre-loading oleaut32.dll, which is the current fix, or (2) we could fall back to not using comsupp.lib in xul.dll and add continuous integration checks that we don't statically link with comsupp.lib when building xul.dll (see bug 1788592 for details).
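The failure sequence summarized above can be illustrated with a minimal Python model. This is a sketch, not Firefox or Windows code: SandboxedLoader, DelayImport and run_process are hypothetical names, the loader stands in for the Windows module loader, and an OSError stands in for the failed delay-load that ended in a crash in plugin-container.exe. It shows why a delay-import that is first resolved in an exit-time destructor fails once the sandbox is locked down, and why preloading the DLL beforehand avoids it.

```python
class SandboxedLoader:
    """Hypothetical stand-in for the process loader; not a real Windows API."""

    def __init__(self):
        self.sandbox_locked = False  # becomes True once the sandbox lockdown applies
        self.loaded = set()          # modules already mapped into the process

    def load(self, dll):
        # A module that is already mapped is simply reused...
        if dll in self.loaded:
            return
        # ...but after lockdown the DLL file can no longer be read from disk.
        if self.sandbox_locked:
            raise OSError(f"cannot read {dll} from disk")
        self.loaded.add(dll)


class DelayImport:
    """Models a delay-load thunk: the first call loads the DLL exporting `name`."""

    def __init__(self, loader, dll, name):
        self.loader, self.dll, self.name = loader, dll, name

    def __call__(self):
        self.loader.load(self.dll)  # resolve the import on first use
        return f"{self.dll}!{self.name}"


def run_process(preload_before_lockdown):
    """Returns True if the exit-time destructor completes, False if it fails."""
    loader = SandboxedLoader()
    variant_clear = DelayImport(loader, "oleaut32.dll", "VariantClear")
    # A static global (like comsupp.lib's vtMissing) registers its destructor at startup.
    atexit_handlers = [variant_clear]

    if preload_before_lockdown:
        loader.load("oleaut32.dll")  # models fix (1): preload before lockdown
    loader.sandbox_locked = True     # the sandbox is now active

    # Process exit: run the registered destructors in reverse order.
    try:
        for handler in reversed(atexit_handlers):
            handler()
    except OSError:
        return False  # in the real process, this failed load ended in a crash
    return True
```

Calling run_process(False) returns False and run_process(True) returns True. Fix (2) would correspond to never registering the handler at all, since xul.dll would no longer pull in comsupp.lib.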
Comment 28•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 29•2 years ago
I think we should close out this bug and open a new one for the crashes that remain. They seem unrelated to the WebRTC problem that was fixed here.