Closed Bug 1579858 Opened 5 years ago Closed 5 years ago

Crash in [@ XULContentSinkImpl::Release]

Categories

(Core :: DOM: Service Workers, defect, P1)

Desktop
Windows 10
defect

VERIFIED FIXED
mozilla71
Tracking Status
relnote-firefox --- 69+
firefox-esr60 --- unaffected
firefox-esr68 --- verified
firefox69 + verified
firefox70 + verified
firefox71 --- verified

People

(Reporter: marcia, Assigned: edgar)

Details

(Keywords: crash, regression)

Attachments

(1 file)

This bug is for crash report bp-b9bffbc9-27f0-4645-a38f-fad5e0190909.

Seen while looking at 69 release crash stats, but present in 70 as well: https://bit.ly/2kb4pEw. Currently #28 overall without a bug. A handful of crashes were present in 68 release, but this is a bit more visible in 69.

Comments are not particularly helpful. Correlations:

(100.0% in signature vs 50.28% overall) process_type = content [34.88% vs 266.91% if startup_crash = null]
(100.0% in signature vs 09.32% overall) address = 0x0
(100.0% in signature vs 30.28% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ
(38.76% in signature vs 99.17% overall) plugin_version = null
(100.0% in signature vs 59.13% overall) cpu_arch = amd64
(84.50% in signature vs 40.79% overall) platform_pretty_version = Windows 10
(53.49% in signature vs 07.65% overall) Module "igd10iumd64.dll" = true
(53.49% in signature vs 08.92% overall) Module "ntasn1.dll" = true
(73.64% in signature vs 33.36% overall) plugin_filename = null
(100.0% in signature vs 13.76% overall) startup_crash = null [34.88% vs 73.06% if process_type = content]
(65.12% in signature vs 27.63% overall) shutdown_progress = null [34.88% vs 73.05% if process_type = content]
(15.50% in signature vs 50.59% overall) Module "api-ms-win-crt-multibyte-l1-1-0.dll" = true
(26.36% in signature vs 59.94% overall) contains_memory_report = null

Top 10 frames of crashing thread:

0 xul.dll XULContentSinkImpl::Release xpcom/ds/nsArray.cpp
1 xul.dll mozilla::CycleCollectedJSContext::~CycleCollectedJSContext xpcom/base/CycleCollectedJSContext.cpp:123
2 xul.dll void mozilla::dom::WorkerJSContext::~WorkerJSContext dom/workers/RuntimeService.cpp:943
3 xul.dll nsresult mozilla::dom::workerinternals::`anonymous namespace'::WorkerThreadPrimaryRunnable::Run dom/workers/RuntimeService.cpp:2366
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
5 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
6 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:333
7 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
9 xul.dll nsThread::ThreadFunc xpcom/threads/nsThread.cpp:459

A majority of crash URLs and comments reference Office 365 (https://www.office.com/?auth=2) as the crashing tab.

The number of crashes has increased sharply since the first week of September. Andrew and Perry, maybe you have an idea what's going on?

Flags: needinfo?(perry)
Flags: needinfo?(bugmail)
Priority: -- → P1

I wonder if it has something to do with an update pushed out on 8-26 - see https://docs.microsoft.com/en-us/officeupdates/update-history-office365-proplus-by-date.

Judging by URL correlations, this is likely the 32-bit signature for the same crashing issue.

Crash Signature: [@ XULContentSinkImpl::Release] → [@ XULContentSinkImpl::Release] [@ mozilla::dom::Promise::Release]

Hi Jens, we've seen 1400 crashes with this signature over the last week. Can you help assign someone to investigate? The volume is high enough that I'd consider a fix for a dot release if one were available.

Flags: needinfo?(jstutte)

There might be some relation to the disabling of GC in memory pressure state? See Bug 1560948

Flags: needinfo?(jstutte)
Flags: needinfo?(jcoppeard)

(In reply to Jens Stutte [:jstutte] from comment #6)
This doesn't look like an OOM issue. (Also that bug doesn't disable GC when we're in a low memory state, rather it stops us repeatedly collecting after the first one when we enter the state.)

My suspicion would be bug 1362272 which was enabled in the 69 release in bug 1525554 since we're crashing in the destructor of mPendingUnhandledRejections which was added in that bug.

Flags: needinfo?(jcoppeard) → needinfo?(echen)

There are two signatures,

(In reply to Edgar Chen [:edgar] from comment #8)

There are two signatures,

In this regard, should we have separate bugs to track each one, Edgar?

Both signatures started to spike on September 6 and then again on September 8, and both are highly correlated with https://www.office.com as the crashing URL. [@ XULContentSinkImpl::Release] is from 64-bit installations, [@ mozilla::dom::Promise::Release] from 32-bit.
So even if they don't seem related at first glance, considering the circumstances I think they are.

(In reply to Edgar Chen [:edgar] from comment #8)
Both signatures go through the CycleCollectedJSContext destructor. XULContentSinkImpl is not associated with CycleCollectedJSContext in any way I can see, so I think this is just a glitch in the stack decoding and these are the same crash.

(In reply to Jon Coppeard (:jonco) from comment #11)

XULContentSinkImpl is not associated with CycleCollectedJSContext in any way I can see,

Yeah, that is why I am a bit confused by the stack.

I suspect that there are still Promises in mPendingUnhandledRejections at destruction time, and that the hash table destructor clears memory to 0x0 before releasing the RefPtr?

The NotifyUnhandledRejections runnable should clear the items in mPendingUnhandledRejections, so the only possibility is that the worker terminates when the NotifyUnhandledRejections runnable is scheduled but not yet executed.
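The suspected sequence can be illustrated with a small standalone model. This is only a sketch, not Gecko code: `FakeWorker`, `RejectPromise`, `RunQueuedRunnables`, `Terminate` and `PendingAtShutdown` are hypothetical names, and it assumes that termination drops queued runnables without running them.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <utility>

// Hypothetical model of a worker with a table of pending rejections that is
// normally emptied by a queued "notify" runnable.
class FakeWorker {
 public:
  void RejectPromise(int aId) {
    mPending.insert(aId);
    if (!mNotifyScheduled) {
      mNotifyScheduled = true;
      // Stand-in for the NotifyUnhandledRejections runnable.
      mQueue.push_back([this] {
        mPending.clear();
        mNotifyScheduled = false;
      });
    }
  }

  void RunQueuedRunnables() {
    while (!mQueue.empty()) {
      std::function<void()> runnable = std::move(mQueue.front());
      mQueue.pop_front();
      runnable();
    }
    assert(mQueue.empty());
  }

  // Assumption: termination discards queued runnables instead of running them.
  // Returns how many entries are still pending at that point.
  std::size_t Terminate() {
    mQueue.clear();
    return mPending.size();
  }

 private:
  std::set<int> mPending;
  std::deque<std::function<void()>> mQueue;
  bool mNotifyScheduled = false;
};

// Number of entries left in the table when the worker shuts down.
std::size_t PendingAtShutdown(bool aRunQueueFirst) {
  FakeWorker worker;
  worker.RejectPromise(1);
  worker.RejectPromise(2);
  if (aRunQueueFirst) {
    worker.RunQueuedRunnables();
  }
  return worker.Terminate();
}
```

In this model, `PendingAtShutdown(true)` leaves nothing behind, while `PendingAtShutdown(false)` leaves both entries in the table, which corresponds to the state in which the destructor would still hold Promises.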

Flags: needinfo?(echen)

Okay, I could get a similar crash stack by modifying the code so that it intentionally leaves a Promise in mPendingUnhandledRejections.

something like,

thread #39, name = 'DOM Worker', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)

frame #0: 0x000000011793e1ea XUL`::NS_CycleCollectorSuspect3(aPtr=0x000000014e10c130, aCp=0x0000000000000000, aRefCnt=0x000000014e10c140, aShouldDelete=0x0000000000000000) at nsCycleCollector.cpp:3761:3

frame #1: 0x000000011794931c XUL`unsigned long nsCycleCollectingAutoRefCnt::decr<&(NS_CycleCollectorSuspect3)>(this=0x000000014e10c140, aOwner=0x000000014e10c130, aCp=0x0000000000000000, aShouldDelete=0x0000000000000000) at nsISupportsImpl.h:234:7

frame #2: 0x00000001179a9f97 XUL`unsigned long nsCycleCollectingAutoRefCnt::decr<&(NS_CycleCollectorSuspect3)>(this=0x000000014e10c140, aOwner=0x000000014e10c130, aShouldDelete=0x0000000000000000) at nsISupportsImpl.h:221:12

frame #3: 0x000000011dad790a XUL`mozilla::dom::Promise::Release(this=0x000000014e10c130) at Promise.cpp:72:1

frame #4: 0x00000001179101f5 XUL`mozilla::RefPtrTraits<mozilla::dom::Promise>::Release(aPtr=0x000000014e10c130) at RefPtr.h:48:40

frame #5: 0x00000001179101d5 XUL`RefPtr<mozilla::dom::Promise>::ConstRemovingRefPtrTraits<mozilla::dom::Promise>::Release(aPtr=0x000000014e10c130) at RefPtr.h:373:36

frame #6: 0x00000001179101ba XUL`RefPtr<mozilla::dom::Promise>::~RefPtr(this=0x00000001148e3108) at RefPtr.h:79:7

frame #7: 0x00000001178ce515 XUL`RefPtr<mozilla::dom::Promise>::~RefPtr(this=0x00000001148e3108) at RefPtr.h:77:13

frame #8: 0x0000000117911193 XUL`nsBaseHashtableET<nsUint64HashKey, RefPtr<mozilla::dom::Promise>::~nsBaseHashtableET(this=0x00000001148e3100) at nsBaseHashtable.h:533:62

frame #9: 0x0000000117911165 XUL`nsBaseHashtableET<nsUint64HashKey, RefPtr<mozilla::dom::Promise>::~nsBaseHashtableET(this=0x00000001148e3100) at nsBaseHashtable.h:533:61

frame #10: 0x00000001179110ac XUL`nsTHashtable<nsBaseHashtableET<nsUint64HashKey, RefPtr<mozilla::dom::Promise>::s_ClearEntry(aTable=0x00000001aa151868, aEntry=0x00000001148e3100) at nsTHashtable.h:429:37

frame #11: 0x00000001179bc0f6 XUL`PLDHashTable::~PLDHashTable(this=0x000070000bf420e8, aSlot=0x000070000bf42048)::$_0::operator()(PLDHashTable::Slot const&) const at PLDHashTable.cpp:304:7

frame #12: 0x00000001179bc044 XUL`void PLDHashTable::EntryStore::ForEachSlot<PLDHashTable::~PLDHashTable()::$_0>(aStore="\x92*^\x03", aCapacity=64, aEntrySize=16, aFunc=0x000070000bf420e8)::$_0&&) at PLDHashTable.h:359:9

frame #13: 0x000000011799feda XUL`void PLDHashTable::EntryStore::ForEachSlot<PLDHashTable::~PLDHashTable()::$_0>(this=0x00000001aa151870, aCapacity=64, aEntrySize=16, aFunc=0x000070000bf420e8)::$_0&&) at PLDHashTable.h:349:7

frame #14: 0x000000011799fdda XUL`PLDHashTable::~PLDHashTable(this=0x00000001aa151868) at PLDHashTable.cpp:302:15

frame #15: 0x000000011799f9b5 XUL`PLDHashTable::~PLDHashTable(this=0x00000001aa151868) at PLDHashTable.cpp:291:31

frame #16: 0x00000001178f7285 XUL`nsTHashtable<nsBaseHashtableET<nsUint64HashKey, RefPtr<mozilla::dom::Promise> > >::~nsTHashtable(this=0x00000001aa151868) at nsTHashtable.h:384:43

frame #17: 0x00000001178f7265 XUL`nsBaseHashtable<nsUint64HashKey, RefPtr<mozilla::dom::Promise>, mozilla::dom::Promise*>::~nsBaseHashtable(this=0x00000001aa151868) at nsBaseHashtable.h:60:7

frame #18: 0x00000001178f7245 XUL`nsRefPtrHashtable<nsUint64HashKey, mozilla::dom::Promise>::~nsRefPtrHashtable(this=0x00000001aa151868) at nsRefPtrHashtable.h:23:7

frame #19: 0x00000001178cc915 XUL`nsRefPtrHashtable<nsUint64HashKey, mozilla::dom::Promise>::~nsRefPtrHashtable(this=0x00000001aa151868) at nsRefPtrHashtable.h:23:7

frame #20: 0x00000001178cc28e XUL`mozilla::CycleCollectedJSContext::~CycleCollectedJSContext(this=0x00000001aa14c000) at CycleCollectedJSContext.cpp:124:1

frame #21: 0x000000011da00bed XUL`mozilla::dom::WorkerJSContext::~WorkerJSContext(this=0x00000001aa14c000) at RuntimeService.cpp:947:3

......

Assignee: nobody → echen
Flags: needinfo?(perry)
Flags: needinfo?(bugmail)

(In reply to Edgar Chen [:edgar] from comment #13)

I suspect that there are still Promises in mPendingUnhandledRejections at destruction time, and that the hash table destructor clears memory to 0x0 before releasing the RefPtr?

It is because the CycleCollectorData is cleared in nsCycleCollector_forgetJSContext and we try to release the RefPtr after that.

mAboutToBeNotifiedRejectedPromises will be cleared in AfterProcessMicrotasks()
and mPendingUnhandledRejections will be cleared after the NotifyUnhandledRejections
runnable is handled.

However, a worker could terminate at any time, so we still need to clear those
structures manually before CollectData is cleared.
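The ordering problem described above can be sketched in a standalone model. This is not the landed patch: `CollectorData`, `gCollectorData`, `FakePromise`, `FakeContext` and `CountLateReleases` are hypothetical stand-ins, and the model counts a "late release" where the real code would dereference a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the per-thread cycle collector data.
struct CollectorData {
  std::size_t suspected = 0;  // releases reported while the data was alive
};

// Per-thread pointer; null once the context has been "forgotten".
thread_local CollectorData* gCollectorData = nullptr;

// Releases that happened after the collector data was cleared.
// In the real crash this would be a null dereference, not a counter.
thread_local int gLateReleases = 0;

struct FakePromise {
  ~FakePromise() {
    if (gCollectorData) {
      ++gCollectorData->suspected;
    } else {
      ++gLateReleases;
    }
  }
};

class FakeContext {
 public:
  explicit FakeContext(bool aClearEarly) : mClearEarly(aClearEarly) {
    gCollectorData = &mData;
  }

  ~FakeContext() {
    if (mClearEarly) {
      // The idea of the fix: drop the references while the data is valid.
      mPending.clear();
    }
    gCollectorData = nullptr;  // models forgetting the JS context
    // mPending is destroyed only after this body returns.
  }

  void AddPending(unsigned long long aId) {
    mPending[aId] = std::make_shared<FakePromise>();
  }

 private:
  bool mClearEarly;
  CollectorData mData;
  std::unordered_map<unsigned long long, std::shared_ptr<FakePromise>> mPending;
};

// Builds a context with two pending promises, destroys it, and reports how
// many promises were released after the collector data was cleared.
int CountLateReleases(bool aClearEarly) {
  gLateReleases = 0;
  {
    FakeContext cx(aClearEarly);
    cx.AddPending(1);
    cx.AddPending(2);
    assert(gCollectorData != nullptr);
  }
  return gLateReleases;
}
```

Without the early clear, both promises are released by the implicit member destructor after the pointer has been nulled; with it, every release happens while the collector data is still valid.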

Pushed by echen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9596d7f4a745
Should release RefPtr before CollectData is clear; r=smaug
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

adding the macos signature as well.

Crash Signature: [@ XULContentSinkImpl::Release] [@ mozilla::dom::Promise::Release] → [@ XULContentSinkImpl::Release] [@ mozilla::dom::Promise::Release] [@ NS_CycleCollectorSuspect3]

Hello,
I’ve encountered 3 crashes today with the same signature on Ubuntu 18.04 with Firefox 69.0.1 (20190917135527) while navigating office pages. I don’t have some STR on how to reproduce them, they were random like one on the sign-in page, one while clicking “All apps” from App launcher. Attaching here the link from the crash: https://crash-stats.mozilla.org/report/index/b2441c11-532e-4f50-b3ce-44d600190918. Is this related to this crash or it's different and needs a separate bug? Thank you!

Flags: needinfo?(echen)

Looking good on Nightly so far. Please nominate this for Beta approval when you get a chance.

(In reply to Alexandru Trif, QA [:atrif] from comment #20)

Hello,
I’ve encountered 3 crashes today with the same signature on Ubuntu 18.04 with Firefox 69.0.1 (20190917135527) while navigating office pages. I don’t have some STR on how to reproduce them, they were random like one on the sign-in page, one while clicking “All apps” from App launcher. Attaching here the link from the crash: https://crash-stats.mozilla.org/report/index/b2441c11-532e-4f50-b3ce-44d600190918. Is this related to this crash or it's different and needs a separate bug? Thank you!

I think it is the same crash, given that both the CycleCollectedJSContext destructor and Promise::Release are on the crash stack.

Flags: needinfo?(echen)

Comment on attachment 9093178 [details]
Bug 1579858 - Should release RefPtr before CollectData is clear;

Beta/Release Uplift Approval Request

  • User impact if declined: Tab crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: A majority of crash urls and comments reference Office 365 (https://www.office.com/?auth=2) as the crashing tab. But I don't have specific STR.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch just does some additional cleanup in the destructor.
  • String changes made/needed: None
Attachment #9093178 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9093178 [details]
Bug 1579858 - Should release RefPtr before CollectData is clear;

Fixes a topcrash. Approved for 70.0b8.

Attachment #9093178 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Hello,
I tried reproducing the crash today using Firefox 71.0a1 (20190918215055) x32/x64 builds with Windows 10x64 accessing office pages multiple times and doing random actions on office 365 apps. Random steps were conducted on macOS 10.14 and Ubuntu 18.04 with latest Nightly. No crashes were encountered while navigating office pages. I think it’s safe to assume that the issue is verified fixed.

Tested with Firefox 70.0b8 (20190919103121) on Ubuntu 18.04, macOS 10.14 and Windows 10x64 using x32 and x64 Fx builds by accessing office pages and the login page (https://www.office.com/?auth=2) multiple times. No crashes encountered on any of the platforms.

Removing the qe+ flag and marking this as verified. If any further actions are needed please let me know.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Please nominate this for ESR68 approval when you get a chance.

Flags: needinfo?(echen)
Flags: needinfo?(echen)

Comment on attachment 9093178 [details]
Bug 1579858 - Should release RefPtr before CollectData is clear;

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Tab crash on office 365 page.
  • User impact if declined: Tab crash on office 365 page.
  • Fix Landed on Version: 71
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch just does some additional cleanup in the destructor.
  • String or UUID changes made by this patch: None
Attachment #9093178 - Flags: approval-mozilla-esr68?

Comment on attachment 9093178 [details]
Bug 1579858 - Should release RefPtr before CollectData is clear;

Topcrash fix, approved for 68.2esr.

Attachment #9093178 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Verified on 68.2.0esr (20190925132602) from comment 32 on Windows 10x64, macOS 10.14 and Ubuntu 18.04. No crashes encountered while browsing Microsoft online office pages.

Please nominate this patch for mozilla-release approval so we can include it in the upcoming 69.0.2 dot release.

Flags: needinfo?(echen)

Comment on attachment 9093178 [details]
Bug 1579858 - Should release RefPtr before CollectData is clear;

Beta/Release Uplift Approval Request

  • User impact if declined: Tab crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: A majority of crash urls and comments reference Office 365 (https://www.office.com/?auth=2) as the crashing tab. But I don't have specific STR.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch just does some additional cleanup in the destructor.
  • String changes made/needed: None
Flags: needinfo?(echen)
Attachment #9093178 - Flags: approval-mozilla-release?
Flags: qe-verify+

Comment on attachment 9093178 [details]
Bug 1579858 - Should release RefPtr before CollectData is clear;

Topcrash fix verified on other channels. Approved for 69.0.2.

Attachment #9093178 - Flags: approval-mozilla-release? → approval-mozilla-release+

Verified the issue using Firefox 69.0.2 (20191001234643) on Windows 10x64 (Fx x64 and Fx x32 builds), Ubuntu 18.04 and macOS 10.14. No crashes encountered while navigating on office 365 pages and doing random steps.

Flags: qe-verify+

Added to the Firefox 69.0.2 release notes:

Fixed a crash when editing files on Office 365 websites
