Closed Bug 1763284 Opened 1 year ago Closed 11 months ago

Assertion failure: data, at src/xpcom/base/nsCycleCollector.cpp:3762

Categories

(Core :: DOM: Workers, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1763390
Tracking Status
firefox100 --- fixed
firefox101 --- affected

People

(Reporter: tsmith, Unassigned)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [bugmon:bisected,confirmed])

Crash Data

Attachments

(1 file)

4.30 KB, application/x-zip-compressed
Details
Attached file testcase.zip

Found while fuzzing m-c 20220405-8d8a4fb25517 (--enable-debug --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip

Assertion failure: data, at src/xpcom/base/nsCycleCollector.cpp:3762

#0 0x7fdd8ba6be76 in NS_CycleCollectorSuspect3 src/xpcom/base/nsCycleCollector.cpp:3762:3
#1 0x7fdd8efdd6a6 in incr<&NS_CycleCollectorSuspect3> /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:248:7
#2 0x7fdd8efdd6a6 in incr<&NS_CycleCollectorSuspect3> /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:234:12
#3 0x7fdd8efdd6a6 in mozilla::DOMEventTargetHelper::AddRef() src/dom/events/DOMEventTargetHelper.cpp:85:1
#4 0x7fdd90323ec4 in AddRef src/dom/workers/WorkerScope.cpp:224:1
#5 0x7fdd90323ec4 in AddRef src/dom/workers/WorkerScope.cpp:401:1
#6 0x7fdd90323ec4 in mozilla::dom::DedicatedWorkerGlobalScope::AddRef() src/dom/workers/WorkerScope.cpp:800:1
#7 0x7fdd8efdd603 in mozilla::DOMEventTargetHelper::QueryInterface(nsID const&, void**) src/dom/events/DOMEventTargetHelper.cpp:83:1
#8 0x7fdd90321560 in QueryInterface src/dom/workers/WorkerScope.cpp:230:1
#9 0x7fdd90321560 in mozilla::dom::WorkerGlobalScope::QueryInterface(nsID const&, void**) src/dom/workers/WorkerScope.cpp:401:1
#10 0x7fdd90323d94 in QueryInterface src/dom/workers/WorkerScope.cpp:800:1
#11 0x7fdd90323d94 in non-virtual thunk to mozilla::dom::DedicatedWorkerGlobalScope::QueryInterface(nsID const&, void**) src/dom/workers/WorkerScope.cpp
#12 0x7fdd8ba8d82a in QueryReferent src/xpcom/base/nsWeakReference.cpp:147:19
#13 0x7fdd8ba8d82a in nsQueryReferent::operator()(nsID const&, void**) const src/xpcom/base/nsWeakReference.cpp:52:9
#14 0x7fdd902eb3f5 in assign_from_query_referent /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:1254:7
#15 0x7fdd902eb3f5 in nsCOMPtr /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:643:5
#16 0x7fdd902eb3f5 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() src/dom/workers/RuntimeService.cpp:2220:9
#17 0x7fdd8bb62979 in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1181:16
#18 0x7fdd8bb692dd in NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:465:10
#19 0x7fdd8c70ea9b in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:300:20
#20 0x7fdd8c62b5d7 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:380:10
#21 0x7fdd8c62b4e2 in RunHandler src/ipc/chromium/src/base/message_loop.cc:373:3
#22 0x7fdd8c62b4e2 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:355:3
#23 0x7fdd8bb5df6e in nsThread::ThreadFunc(void*) src/xpcom/threads/nsThread.cpp:385:10
#24 0x7fdda166ca57 in _pt_root src/nsprpub/pr/src/pthreads/ptthread.c:201:5
#25 0x7fdda23e6608 in start_thread /build/glibc-sMfBJT/glibc-2.31/nptl/pthread_create.c:477:8
#26 0x7fdda1fad162 in __clone /build/glibc-sMfBJT/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Flags: in-testsuite?
Crash Signature: [@ NS_CycleCollectorSuspect3 | mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface]

A Pernosco session is available here: https://pernos.co/debug/HbCmuMxPiH7gI6AnbjpqLA/index.html

IIUC our sentinel weak reference to the global scope found a living instance and wants to increment the refcount via NS_IMPL_CYCLE_COLLECTING_ADDREF. In doing so it calls suspect which wants to access the static, thread local cycle collector instance - which has gone already.

Not sure how difficult it would be to make the refcounting resilient enough against a missing cycle collector instance to work also after its shutdown. I assume this might open a can of worms, but I have not much knowledge here. We could keep a MOZ_ASSERT_UNLESS_FUZZING inside suspect and just do nothing in case?

In any case this way the whole sentinel thing is not very meaningful and might cause null-crashes in release we did not have before, IIUC.

Flags: needinfo?(bugs)

(In reply to Jens Stutte [:jstutte] from comment #2)

In any case this way the whole sentinel thing is not very meaningful and might cause null-crashes in release we did not have before, IIUC.

Which is already confirmed by the crashes like https://crash-stats.mozilla.org/report/index/c9c58b40-196e-4a24-87ef-a2b670220405.

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220406043513-678264f22280.
The bug appears to have been introduced in the following build range:

Start: 5300e917c86d0b1e209a231bcbdbe389d0d0b1bc (20220219093323)
End: f6a5db4ee19659e21f1404d9dccb495fffb18bfb (20220219083300)
Pushlog: https://hg.mozilla.org/mozilla-unified/pushloghtml?fromchange=5300e917c86d0b1e209a231bcbdbe389d0d0b1bc&tochange=f6a5db4ee19659e21f1404d9dccb495fffb18bfb

Keywords: regression
Whiteboard: [bugmon:bisected,confirmed]
Crash Signature: [@ NS_CycleCollectorSuspect3 | mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface] → [@ NS_CycleCollectorSuspect3 | mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface] [@ mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface]

(In reply to Jens Stutte [:jstutte] from comment #2)

Not sure how difficult it would be to make the refcounting resilient enough against a missing cycle collector instance to work also after its shutdown. I assume this might open a can of worms, but I have not much knowledge here. We could keep a MOZ_ASSERT_UNLESS_FUZZING inside suspect and just do nothing in case?

One would need to consider performance (AddRef and Release are extremely hot methods) but also this kind of crash really hints about error elsewhere.

Crash Signature: [@ NS_CycleCollectorSuspect3 | mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface] [@ mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface] → [@ NS_CycleCollectorSuspect3 | mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface] [@ mozilla::DOMEventTargetHelper::AddRef | mozilla::DOMEventTargetHelper::QueryInterface]
Flags: needinfo?(bugs)

(In reply to Olli Pettay [:smaug] from comment #6)

(In reply to Jens Stutte [:jstutte] from comment #2)

Not sure how difficult it would be to make the refcounting resilient enough against a missing cycle collector instance to work also after its shutdown. I assume this might open a can of worms, but I have not much knowledge here. We could keep a MOZ_ASSERT_UNLESS_FUZZING inside suspect and just do nothing in case?

One would need to consider performance (AddRef and Release are extremely hot methods) but also this kind of crash really hints about error elsewhere.

Looking at NS_CycleCollectorSuspect3 I already see a special case handling for shutdown CC:

  if (MOZ_LIKELY(data->mCollector)) {
    data->mCollector->Suspect(aPtr, aCp, aRefCnt);
    return;
  }
  SuspectAfterShutdown(aPtr, aCp, aRefCnt, aShouldDelete);

which seems incomplete to me? Should this better read:

 if (MOZ_LIKELY(data && data->mCollector)) {

?

See Also: → 1763390

Recap from our matrix chat: We have two issues here.

  1. The usage of the weak reference after the JSContext has been destroyed (which is not the same as the cycle collector being shut down). See bug 1763390
  2. The leak of the global scope itself that we should investigate here with this test case.
Severity: S2 → S3
Priority: -- → P3

The crash as such is mitigated by bug 1763390. If we are concerned about the volume in beta we could uplift that one.

This bug and testcase should serve now to investigate the leak.

Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → DUPLICATE

Hi Jason, the testcase here might now trigger a different failure (assert). I assume bugmon will try to verify automatically if this is really fixed?

Flags: needinfo?(jkratzer)

Jens, the testcase here now triggers the assertion in bug 1764921. Also, just FYI but bugmon will no longer analyze bugs resolved as a duplicate. It will only verify fixes when a bug has been resolved as FIXED.

Flags: needinfo?(jkratzer)

(In reply to Jason Kratzer [:jkratzer] from comment #12)

Jens, the testcase here now triggers the assertion in bug 1764921. Also, just FYI but bugmon will no longer analyze bugs resolved as a duplicate. It will only verify fixes when a bug has been resolved as FIXED.

Thanks! Then let's move the discussion there.

No valid actions for resolution (DUPLICATE).
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon
You need to log in before you can comment on or make changes to this bug.