Closed Bug 1692724 Opened 3 years ago Closed 2 years ago

[zh-CN] Crash in [@ mozilla::ScriptPreloader::CacheWriteComplete]

Categories

(Core :: XPConnect, defect, P3)

defect

Tracking

()

RESOLVED FIXED
101 Branch
Tracking Status
firefox-esr91 --- fixed
firefox97 --- wontfix
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- fixed

People

(Reporter: aryx, Assigned: kmag)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

See bug 1610246 for the history on this issue.

Crash report: https://crash-stats.mozilla.org/report/index/ea30788b-fa29-44a3-a83e-3620d0210213

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 xul.dll mozilla::ScriptPreloader::CacheWriteComplete js/xpconnect/loader/ScriptPreloader.cpp:763
1 xul.dll mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1201
2 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:739
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1200
4 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327
6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
7 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
8 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:602
9 xul.dll nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:271

needinfo'ing kmag because he said he would take a look at this crash.

Severity: normal → S3
Flags: needinfo?(kmaglione+bmo)
Priority: -- → P3

The crash volume spiked on February 13 and March 15 and then trailed off. But neither of those dates corresponds to a new Firefox release cycle: Firefox 86 was released on February 23 and Firefox 87 was released on March 23.

zh-CN hit issues in the past when they released 2 add-on updates at the same time (cannot find the bug at the moment).

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #3)

zh-CN hit issues in the past when they released 2 add-on updates at the same time (cannot find the bug at the moment).

I think you mean bug 1610246, linked in "see also" above?

(In reply to Chris Peterson [:cpeterson] from comment #2)

The crash volume spiked on February 13 and March 15 and then trailed off. ...

And I can confirm those dates were when we released Fx 86/87 compat fixes to two of our extensions.

Bug 1610246 handled this for the other caller of StartCacheWrite. The idea
behind not adding an explicit check was that mSaveComplete should usually
imply !mSaveThread. However, when there's nothing in the cache to save,
PrepareCacheWriteInternal sets mSaveComplete to true before the save
thread shuts down. And when we have two cache flushes in the same session due
to multiple extension upgrades, that can lead to us hitting the cache flush
codepath in the middle of that critical period.

Assignee: nobody → kmaglione+bmo
Status: NEW → ASSIGNED

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:kmag, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(continuation)
Flags: needinfo?(continuation)

This crash signature is again spiking this week.

It looks like there have been spikes about once a month.

For the last two spikes (late Nov. & late Dec.) we were shipping updates to only one of our extensions, not two or more. Which is inconsistent with earlier observations.

(In reply to Kris Maglione [:kmag] from comment #5)

Created attachment 9233588 [details]
Bug 1692724: More don't spawn a second save thread if one already exists. r=mccr8

Bug 1610246 handled this for the other caller of StartCacheWrite. The idea
behind not adding an explicit check was that mSaveComplete should usually
imply !mSaveThread. However, when there's nothing in the cache to save,
PrepareCacheWriteInternal sets mSaveComplete to true before the save
thread shuts down. And when we have two cache flushes in the same session due
to multiple extension upgrades, that can lead to us hitting the cache flush
codepath in the middle of that critical period.

Andrew, I wonder if we should try to land this patch.

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(continuation)

Hi! We got another spike here. The volume is much higher than previous ones. Is the new spike aligned with your extension update? Can and should we do anything with this? Thank you.

Flags: needinfo?(bzhao)

(In reply to Hsin-Yi Tsai [:hsinyi] from comment #11)

Hi! We got another spike here. The volume is much higher than previous ones.

Hi Hsin-Yi, indeed it’s much higher.

Is the new spike aligned with your extension update?

Yes, it is.

Can and should we do anything with this? Thank you.

In the past, we tried to avoid updating multiple extensions simultaneously, and it worked to some extent.

But corresponding to recent spikes, we’ve been updating only one of our extensions. It appears something changed in Fx itself.

I still believe a proper fix should happen upstream, not in our extensions.

Flags: needinfo?(bzhao)

I don't have time to look into this in the near future, unfortunately.

Flags: needinfo?(continuation)

I don't think Kris will be looking at this soon, either, so I'll set it back to new to better reflect the status.

Assignee: kmaglione+bmo → nobody
Status: ASSIGNED → NEW

We're shipping an update to one of the extensions today, so expect another spike of this signature.

Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/872728d7230d
More don't spawn a second save thread if one already exists. r=mccr8
Assignee: nobody → kmaglione+bmo
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch

Given the crash volume, I'm not sure S3 is the right severity for this.

The patch landed in nightly and beta is affected.
:kmag, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(kmaglione+bmo)

This is already in the beta branch. It landed way to late in the cycle to uplift to the last beta.

Flags: needinfo?(kmaglione+bmo)

Hi Kris, this would be nice to fix on ESR still. So far, all indications are that this patch is working as expected on Beta101. Please nominate this for approval if you're comfortable doing so.

Flags: needinfo?(kmaglione+bmo)

Comment on attachment 9233588 [details]
Bug 1692724: More don't spawn a second save thread if one already exists. r=mccr8

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: It causes startup crashes
  • User impact if declined: Startup crashes when certain bundled extensions or language packs are updated
  • Fix Landed on Version: 101
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It simply adds a sanity check to prevent multiple concurrent attempts to update the startup cache.
Flags: needinfo?(kmaglione+bmo)
Attachment #9233588 - Flags: approval-mozilla-esr91?

Comment on attachment 9233588 [details]
Bug 1692724: More don't spawn a second save thread if one already exists. r=mccr8

Fix appears to be working on Fx101. Approved for 91.11esr.

Attachment #9233588 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: