<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 1

•

3 years ago

The Bugbug bot thinks this bug should belong to the 'Core::mozglue' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → mozglue

Product: Firefox → Core

Updated

•

3 years ago

tracking-firefox106: --- → +

Sylvestre Ledru [:Sylvestre]

Comment 2

•

3 years ago

The bug is marked as tracked for firefox106 (nightly). However, the bug still isn't assigned.

:Sylvestre, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(sledru)

Comment 3

•

3 years ago

Redirecting to the managers who knows much more than I about the init system
(not sure that bugbug is correct on the component)

status-firefox104: --- → affected

status-firefox105: --- → affected

Flags: needinfo?(sledru)

Flags: needinfo?(haftandilian)

Flags: needinfo?(gpascutto)

Sylvestre Ledru [:Sylvestre]

Comment 4

•

3 years ago

given the volume, I guess it is a new issue

status-firefox104: affected → unaffected

status-firefox105: affected → unaffected

https://crash-stats.mozilla.org/report/index/60b82f2c-c17c-40ab-abc6-310230220831
https://crash-stats.mozilla.org/report/index/3e91f2ac-b3fd-49d2-9d9c-2182c0220831
https://crash-stats.mozilla.org/report/index/043c2de5-5718-46e0-be71-08b070220831
https://crash-stats.mozilla.org/report/index/655295c5-33b8-4d24-8362-5289e0220831

Comment 5

•

3 years ago

The previous instance was a hardening against badly written injections. However here I see reports (at least on Nightly) with no obvious 3rd party modules and with a large uptime (the last may not necessarily be inconsistent, as it's happening on thread launch):

Unfortunately there are no correlations available on crash-stats.

There is no relation between this crash and the changes in the regression range, so either we crash due to earlier stack bustage elsewhere, or this is third party stuff anyway.

Not sure this will be very actionable.

Flags: needinfo?(gpascutto)

https://crash-stats.mozilla.org/signature/?product=Firefox&version=106.0a1&signature=patched_BaseThreadInitThunk&date=%3E%3D2022-08-25T07%3A57%3A00.000Z&date=%3C2022-09-01T07%3A57%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1

Comment 6

•

3 years ago

This is turning into a new top crasher on Nightly, there is another similar new signature that spiked with the same buildID, still Windows only.

Setting as P1/S1

Severity: S2 → S1

Crash Signature: [@ BaseThreadInitThunk] → [@ BaseThreadInitThunk] [@ patched_BaseThreadInitThunk]

tracking-firefox105: --- → ?

tracking-firefox106: + → ?

Priority: -- → P1

Updated

•

3 years ago

Keywords: topcrash

Updated

•

3 years ago

Crash Signature: [@ BaseThreadInitThunk] [@ patched_BaseThreadInitThunk] → [@ BaseThreadInitThunk] [@ patched_BaseThreadInitThunk] [@ ntdll.dll | BaseThreadInitThunk ]

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

3 years ago

status-firefox-esr102: --- → unaffected

status-firefox-esr91: --- → unaffected

tracking-firefox105: ? → ---

tracking-firefox106: ? → blocking

Reporter

Updated

•

3 years ago

Updated

•

3 years ago

Flags: needinfo?(haftandilian)

Comment 7

•

3 years ago

I don't have anything to add here. Can we tell what thread was being started from the stacks of other threads and see if there's any commonality?

Comment 8

•

3 years ago

Can we tell what thread was being started from the stacks of other threads and see if there's any commonality?

No. Looking at the active thread (if any) also doesn't show any obvious commonality to me.

I notice that these reports don't seem to have memory information attached. Is this because EXCEPTION_STACK_BUFFER_OVERRUN / FAST_FAIL_GUARD_ICALL_CHECK_FAILURE is a WER-caught error and we don't have that info there?

Comment 9

•

3 years ago

Jim, this is P1/S1 now and the WebRTC update is in the regression range. There's no other obvious (or even not so obvious?) change that can cause this, and we have very limited info to go on here, so making sure this is high on your radar.

Flags: needinfo?(jmathies)

Haik Aftandilian [:haik]

Comment 10

•

3 years ago

Additionally, I was wondering if these could correlate with OOM reduction (bug 1716727), but that was first enabled in 105 and the majority of these crashes are in 106.

Comment 11

•

3 years ago

Several user comments mention that the crash happens when you close a tab which had Netflix playing:
"Crash when closing the tab while playing Netflix "
"Crashes have occurred a few times, when closing a Tab with Netflix open. "
"If I close the tab while playing any series or movie from the Netflix site, the application crashes."

Comment 12

•

3 years ago

Do we know if there's been any CDM updates in that timeframe?

Comment 13

•

3 years ago

This is a reminder regarding comment #2!

The bug is marked as blocking firefox106 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.

Comment 14

•

3 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)

Do we know if there's been any CDM updates in that timeframe?

We haven't updated the cdm in this time frame. We do have a major libwebrtc update in this time frame, as well as a security related fix in media code. Pretty tough diagnosing this though, there's nothing of much value in the stacks.

Flags: needinfo?(mfroman)

Flags: needinfo?(jmathies)

Comment 15

•

3 years ago

(In reply to Jim Mathies [:jimm] from comment #14)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)

Do we know if there's been any CDM updates in that timeframe?

We haven't updated the cdm in this time frame. We do have a major libwebrtc update in this time frame, as well as a security related fix in media code. Pretty tough diagnosing this though, there's nothing of much value in the stacks.

Have we tried reproducing the crash by closing a tab where Netflix is playing?

Flags: needinfo?(jmathies)

Comment 16

•

3 years ago

Exception thrown at 0x00007FF9FA3D3B6E (ntdll.dll) in firefox.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

This is easy to reproduce. Open any netflix stream and close the tab after it starts playing. Will try to find a regression range.

Flags: needinfo?(jmathies)

Comment 17

•

3 years ago

(In reply to Jim Mathies [:jimm] from comment #16)

Exception thrown at 0x00007FF9FA3D3B6E (ntdll.dll) in firefox.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

This is easy to reproduce. Open any netflix stream and close the tab after it starts playing. Will try to find a regression range.

Hmm, and now I can't reproduce in the same nightly version.

Comment 18

•

3 years ago

Not having any luck reproducing reliably which is preventing generation of a regression range.

Updated

•

3 years ago

Flags: qe-verify?

Comment 19

•

3 years ago

•

Edited

This crash signature might be related - https://bugzilla.mozilla.org/show_bug.cgi?id=1788592

Comment 20

•

3 years ago

(In reply to Jim Mathies [:jimm] from comment #19)

This crash signature might be related - https://crash-stats.mozilla.org/signature/?product=Firefox&signature=__delayLoadHelper2%20%7C%20_tailMerge_oleaut32.dll%20%7C%20_tailMerge_d3dcompiler_47.dll%20%7C%20%3CT%3E%3A%3Aoperator%28%29%20%7C%20__crt_seh_guarded_call%3CT%3E%3A%3Aoperator%28%29%3CT%3E&date=%3E%3D2022-06-06T16%3A28%3A00.000Z&date=%3C2022-09-06T16%3A28%3A00.000Z#summary

This is bug 1788592 (See Also)

Andreas Pehrson [:pehrsons]

Comment 21

•

3 years ago

This is a reminder regarding comment #2!

The bug is marked as blocking firefox106 (nightly). We have limited time to fix this, the soft freeze is in 8 days. However, the bug still isn't assigned.

Comment 22

•

3 years ago

"closing the tab while playing Netflix" is pretty consistent with the GMP process shutting down. Bug 1788592 has _exit(0) in the GMP process on the stack, so I wouldn't be surprised if they are related.

Comment 23

•

3 years ago

The regression here appears to be fixed by https://bugzilla.mozilla.org/show_bug.cgi?id=1788592

I'm not going to dupe this since the signature has been around for a while. Will leave it to release drivers to decide what to do with this bug.

Comment 24

•

3 years ago

•

Edited

Hey gcp, any chance your Windows experts can comment on this? We addressed this by preloading oleaut in gmp, but we really don't understand what triggered the issue in the first place.

Flags: needinfo?(gpascutto)

Comment 25

•

3 years ago

•

Edited

I'm going to assume relman will be able to drop the severity and we don't need to immediately figure out who can dive more deeply into this. But we'll get to this.

Flags: needinfo?(gpascutto)

Yannis Juglaret [:yannis]

Comment 26

•

3 years ago

We stopped crashing on Nightly after the patch in bug 1788592 landed, I am marking it fixed for the 106 release, lowering severity and removing the P1 flag.

Severity: S1 → S2

status-firefox106: affected → fixed

Priority: P1 → --

Comment 27

•

3 years ago

•

Edited

To summarize findings from bug 1788592:

These crashes were a consequence of linking statically with the library comsupp.lib when building xul.dll. oleaut32.dll is an implicit dependency used at exit when using comsupp.lib, and we specify that oleaut32.dll should be delay-loaded when building xul.dll. Exiting the plugin-container.exe process results in calling the dynamic_atexit_destructor_for_'vtMissing' from comsupp.lib, which calls the delay-import for VariantClear from oleaut32.dll, resulting in trying to load oleaut32.dll, and failing at that because the sandbox is active and we cannot read the library file on disk.
There are two possible fixes: (1) we could keep pre-loading oleaut32.dll which is the current fix, or (2) we could fall back to not using comsupp.lib in xul.dll and add continuous integration checks that we don't statically link with comsupp.lib when building xul.dll (see bug 1788592 for details).