Bugzilla

Comment 1

7 months ago

The bug is linked to a topcrash signature, which matches the following criteria:

Top 20 desktop browser crashes on release
Top 20 desktop browser crashes on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

andrew

Reporter

Updated

7 months ago

Component: Untriaged → XPCOM

Product: Firefox → Core

Comment 2

7 months ago

The severity field is not set for this bug.
:nika, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(nika)

Emilio Cobos Álvarez (:emilio)

Comment 3

7 months ago

Unfortunately, this crash signature is overly generic and covers a few different causes. All of these crashes are due to the main thread hanging while waiting for an nsThreadPool to shut down in the background, with different thread pools being waited on in different cases (though the signature does not walk up enough frames to capture which thread pool is hanging).
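A minimal Rust sketch of this failure mode, assuming a pool whose shutdown waits for its in-flight tasks to finish. The `TaskTracker` type is hypothetical and is not the nsThreadPool API. A task that never finishes (for example, one blocked in an OS call) keeps the count above zero, and the deadline stands in for a shutdown-hang watchdog that would otherwise let the wait run unbounded.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical tracker for in-flight tasks in a thread pool (not the real
/// nsThreadPool API). Shutdown waits until the active count drops to zero.
#[derive(Clone, Default)]
pub struct TaskTracker {
    state: Arc<(Mutex<usize>, Condvar)>,
}

impl TaskTracker {
    /// Mark a task as started; call before handing work to a worker thread.
    pub fn begin(&self) {
        let (lock, _) = &*self.state;
        *lock.lock().unwrap() += 1;
    }

    /// Mark a task as finished and wake any thread waiting in `shutdown`.
    pub fn end(&self) {
        let (lock, cvar) = &*self.state;
        let mut active = lock.lock().unwrap();
        *active -= 1;
        cvar.notify_all();
    }

    /// Wait up to `deadline` for all tasks to finish and return how many are
    /// still running. A non-zero result is the "shutdown hang" case: without
    /// the deadline, the caller (here, the main thread) would block forever.
    pub fn shutdown(&self, deadline: Duration) -> usize {
        let (lock, cvar) = &*self.state;
        let guard = lock.lock().unwrap();
        let (active, _timed_out) = cvar
            .wait_timeout_while(guard, deadline, |active| *active > 0)
            .unwrap();
        *active
    }
}
```

With a deadline, shutdown can report that work is still outstanding instead of blocking indefinitely; a real pool could also record which pool and task were stuck, which is the information this signature is missing.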

Scanning through a few of the specific reports, I've noticed that they tend to fall into one of two major categories, though there are obviously some outliers:

StreamTransport shutdown hangs (clientcerts)

These hangs occur during nsStreamTransportService thread pool shutdown, and are generally due to a single STS thread still being active.
These crashes tend to occur within a backgroundtask process, and the most common task appears to be defaultagent.
- Unlike normal Firefox processes, backgroundtask processes tend to live for a much shorter time (just long enough to perform a single task), so a long-running operation that would normally complete before a Firefox process enters shutdown may still be running when the process quits.
Exactly where the crash happens varies, but it often occurs within LoadLoadableCertsTask::Run(), specifically within the Rust osclientcerts module.
- In some cases, the code is hanging on an outstanding mpsc recv call, even though the osclientcerts thread appears to be present and active.
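The mpsc behaviour in that last point can be reproduced in isolation. In this sketch (the `wait_for_reply` and `demo` helpers are hypothetical, not osclientcerts code), `recv()` only returns once a message arrives or every `Sender` has been dropped, so a thread that stays alive but never replies leaves the caller blocked indefinitely; `Receiver::recv_timeout` bounds that wait.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

pub type Outcome = Result<u32, RecvTimeoutError>;

/// Hypothetical helper: wait for a reply from a worker thread, giving up
/// after `limit` instead of blocking forever like a bare `recv()` would.
pub fn wait_for_reply(rx: &Receiver<u32>, limit: Duration) -> Outcome {
    rx.recv_timeout(limit)
}

/// Shows the three outcomes a waiting caller can observe.
pub fn demo() -> (Outcome, Outcome, Outcome) {
    // 1. The worker replies: the wait returns the value.
    let (tx, rx) = mpsc::channel();
    tx.send(7).unwrap();
    let replied = wait_for_reply(&rx, Duration::from_millis(50));

    // 2. The worker is alive (its Sender still exists) but never replies:
    //    a bare `rx_alive.recv()` would block here forever.
    let (tx_alive, rx_alive) = mpsc::channel::<u32>();
    let silent = wait_for_reply(&rx_alive, Duration::from_millis(50));
    drop(tx_alive);

    // 3. The worker has exited and dropped its Sender: the wait fails at once.
    let (tx_gone, rx_gone) = mpsc::channel::<u32>();
    drop(tx_gone);
    let disconnected = wait_for_reply(&rx_gone, Duration::from_millis(50));

    (replied, silent, disconnected)
}
```

Case 2 matches the reports where the osclientcerts thread is still present: from the waiting side, a live but unresponsive peer is indistinguishable from a slow one unless the wait is bounded.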

Examples:

https://crash-stats.mozilla.org/report/index/5f21807c-f559-40d5-9ef5-dca270231218#allthreads
- The osclientcerts thread appears to be partway through executing the open_session call
https://crash-stats.mozilla.org/report/index/e4672058-e671-4f8a-91eb-b530a0231218#allthreads
https://crash-stats.mozilla.org/report/index/f919c4e9-4d0e-4974-ab05-647de0231218#allthreads
- Appears to be in C_Initialize instead
https://crash-stats.mozilla.org/report/index/2797240b-53f3-44fb-a4df-a2ab40231218#allthreads
- Unclear what the osclientcerts code is doing.
https://crash-stats.mozilla.org/report/index/e6cf7120-eea5-4541-a9a4-08e9f0231218#allthreads
- Also in C_Initialize
https://crash-stats.mozilla.org/report/index/0af04780-0fec-43db-85be-0589e0231218#allthreads
- In C_GetInfo
https://crash-stats.mozilla.org/report/index/c191f84a-3538-48b9-b513-ea2ab0231218#allthreads
https://crash-stats.mozilla.org/report/index/4ae99336-94b2-4549-b569-782b00231218#allthreads
- No Rust on the stack, but is within LoadLoadableCertsTask::Run
- Unlike the others I've noticed in this section, this one was during a backgroundupdate backgroundtask
https://crash-stats.mozilla.org/report/index/b070b369-3759-4c97-9bf1-5fc6b0231218#allthreads
- No Rust on the stack, but is within LoadLoadableCertsTask::Run
- Also in a backgroundupdate backgroundtask

Printer-related Background IO Thread Pool shutdown hangs

These hangs occur during the shared BackgroundThreadPool shutdown, and are usually due to one or more threads in the BgIOThreadPool being blocked.
Unlike the osclientcerts crashes, these appear to be happening in normal Firefox processes, not backgroundtask processes.
When the XUL caller is visible, it is most frequently nsPrinterListWin::Printers(), though in some cases the stack is full of opaque PrintConfig.dll frames, so we don't have a good backtrace and it could be some other caller.
The dispatches presumably originate from https://searchfox.org/mozilla-central/rev/91cc8848427fdbbeb324e6ca56a0d08d32d3c308/widget/nsPrinterListBase.cpp#61-67

nsPrinterListWin::Printers Examples:

PrintConfig.dll Examples:

Others

https://crash-stats.mozilla.org/report/index/38f23fc5-1a15-48f5-8e37-362f20231218#tab-details
- StreamTransport hang that is not in a backgroundtask and appears to have no connection to osclientcerts; it appears to be in OsReauthenticator.
https://crash-stats.mozilla.org/report/index/00debc63-ca71-4a81-96a2-438e20231218#allthreads
- BackgroundThreadPool hang when trying to pin the app to the taskbar. There are no thread names, but the background thread appears to be Thread 12.

Leaving a ni? for :emilio for the printer hangs and :dkeeler for the osclientcerts hangs.

Flags: needinfo?(nika)

Flags: needinfo?(emilio)

Flags: needinfo?(dkeeler)

Comment 4

7 months ago

The print hangs don't seem super-actionable here. It seems like a Windows print API call is taking longer than expected on a background thread, but afaict that's not under our control, and lots of these operations are not really cancellable or timeout-able; see this for example... :/

Flags: needinfo?(emilio)

Assignee

Comment 5

7 months ago

For osclientcerts, I wonder if this could be due to bug 1745925. NSS initialization causes the osclientcerts module (as well as other sources of certificates) to be loaded on a background thread, which is not something we want to do during shutdown. Telemetry indicates this operation can take longer than 1 minute for some users (https://sql.telemetry.mozilla.org/queries/96623#238541), which would be identified as a shutdown hang if that's what's happening.

Flags: needinfo?(dkeeler)

Comment 6

7 months ago

The severity field is not set for this bug.
:nika, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(nika)

Comment 7

6 months ago

Setting to S3, but I could be convinced that the osclientcerts crashes should be higher priority: I believe a crashing background process will show the Firefox crash reporter UI to the user while they are actively using the browser, which could be a poor user experience.

Severity: -- → S3

Flags: needinfo?(nika)

Comment 8

6 months ago

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #5)

> For osclientcerts, I wonder if this could be due to bug 1745925. NSS initialization causes the osclientcerts module (as well as other sources of certificates) to be loaded on a background thread, which is not something we want to do during shutdown. Telemetry indicates this operation can take longer than 1 minute for some users (https://sql.telemetry.mozilla.org/queries/96623#238541), which would be identified as a shutdown hang if that's what's happening.

Avoiding NSS initialization during shutdown might help in this situation. Unfortunately, for very short-lived processes such as the backgroundtask processes crashing here, starting a 1-minute operation even during startup could still lead to a shutdown crash, because the process does not live for a full minute. If possible, making these operations interruptible by shutdown, or avoiding starting osclientcerts in backgroundtask processes, might be a more reliable solution.
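One way to read "interruptible by shutdown" is to split the long operation into steps and check a shared flag between them, so shutdown can stop it at the next step boundary. This is a minimal sketch under that assumption; `load_sources`, the list of sources, and the flag are hypothetical and do not reflect how NSS or osclientcerts are actually structured. It cannot interrupt a single step that is itself stuck in an OS call such as C_Initialize.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical: load each certificate source in turn, stopping early once
/// `shutting_down` is set. Returns how many sources were loaded.
pub fn load_sources<F>(sources: &[&str], shutting_down: &AtomicBool, mut load_one: F) -> usize
where
    F: FnMut(&str),
{
    let mut loaded = 0;
    for &source in sources {
        // Check between steps only; a step already in progress still runs
        // to completion before the flag is observed.
        if shutting_down.load(Ordering::Acquire) {
            break;
        }
        load_one(source);
        loaded += 1;
    }
    loaded
}
```

The granularity of the checks bounds how quickly shutdown is honoured, so this only helps if no individual step can itself take close to the shutdown-hang timeout.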

Flags: needinfo?(dkeeler)

Assignee

Comment 9

6 months ago

Attached file Bug 1866944 - don't load osclientcerts in backgroundtask processes r?nika (obsolete)

Phabricator Automation

Updated

6 months ago

Assignee: nobody → dkeeler

Status: NEW → ASSIGNED

Assignee

Comment 10

6 months ago

Right now, there's not really a way for osclientcerts to stop loading when shutdown starts, but we can definitely avoid loading it in backgroundtask processes. My one concern with that is if the backgroundtask needs to do network i/o but the connection is via a proxy or something that requires client authentication. Do backgroundtasks tend to rely on the network?

Flags: needinfo?(dkeeler)

Comment 11

6 months ago

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #10)

> Right now, there's not really a way for osclientcerts to stop loading when shutdown starts, but we can definitely avoid loading it in backgroundtask processes. My one concern with that is if the backgroundtask needs to do network i/o but the connection is via a proxy or something that requires client authentication. Do backgroundtasks tend to rely on the network?

I believe backgroundtasks are sometimes used to interact with the network, yes. The main task encountering this issue (defaultagent) is a Windows scheduled background task that collects information about which browser the user has set as their OS default and submits it to telemetry (https://firefox-source-docs.mozilla.org/toolkit/mozapps/defaultagent/default-browser-agent/index.html).

If this is required for networking, such as for sending pings like this one, perhaps we need to find some other solution? It's unclear to me, though, how we are starting shutdown before osclientcerts has loaded if we need it to send the ping.

Flags: needinfo?(dkeeler)

Assignee

Comment 12

6 months ago

Yeah, looking at this some more, I don't think osclientcerts is directly the issue here. Loading that library should take almost no time (it doesn't do anything right away).

Bug 1745925 again seems like a better place to start. However, that led to bug 1745043, so maybe we could start by not dispatching the background task that loads loadable certs if we're already in shutdown.
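A minimal sketch of the "don't dispatch if we're already in shutdown" guard, assuming a process-wide flag that is set when shutdown begins. `SHUTDOWN_STARTED`, `begin_shutdown`, and `dispatch_load_loadable_certs` are hypothetical names, not existing Gecko functions; real code would consult Gecko's own shutdown state instead. A check-then-spawn like this only narrows the window, since shutdown can still begin right after the check, which is why the interruptibility raised in comment 8 would still matter.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::{self, JoinHandle};

/// Hypothetical process-wide flag, set once shutdown begins.
static SHUTDOWN_STARTED: AtomicBool = AtomicBool::new(false);

/// Hypothetical hook called when the process starts shutting down.
pub fn begin_shutdown() {
    SHUTDOWN_STARTED.store(true, Ordering::Release);
}

/// Hypothetical dispatch point: only start the background load when the
/// process is not already shutting down. Returns `None` if it was skipped.
pub fn dispatch_load_loadable_certs<F>(task: F) -> Option<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    if SHUTDOWN_STARTED.load(Ordering::Acquire) {
        return None;
    }
    Some(thread::spawn(task))
}
```

Skipping the load entirely is cheap and safe only if nothing later in shutdown needs the certificates; otherwise callers have to handle the `None` case.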

Flags: needinfo?(dkeeler)

Phabricator Automation

Updated

6 months ago

Attachment #9370777 - Attachment is obsolete: true

Comment 13

5 months ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 20 desktop browser crashes on release (startup)

For more information, please visit BugBot documentation.

Keywords: topcrash-startup

Liz Henry (:lizzard) (relman/hg->git project)

Assignee

Comment 14

4 months ago

I recently landed bug 1881117, which might improve things here.

Comment 15

4 months ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash-startup

Wayne Mery (:wsmwk)

Updated

3 months ago

Whiteboard: [tbird crash]

Comment 16

2 months ago

This is back in topcrash territory for Firefox 126, 127, and 128.

status-firefox126: --- → affected

status-firefox127: --- → affected

status-firefox128: --- → affected

Arthur K. (he/him)

Comment 17

2 months ago

I just hit this on a Windows Server 2016 Standard VM while I had Exchange Admin Center open and was doing some tasks: https://crash-stats.mozilla.org/report/index/bp-f3c40371-a9de-47dd-a1aa-210200240520

Looking closer at Thread 2, could this be a Trend Micro-related issue?

Chris Peterson [:cpeterson]

Updated

1 month ago

Comment 18

14 days ago

Something landed in Nightly 129 (build 20240621100955) that may have fixed the issue (or moved the crash signature).
Dana, if there is a fix and you can help figure out what it was, I wonder if it might be upliftable to 128 beta.

Flags: needinfo?(dkeeler)