Open Bug 1700402 Opened 3 years ago Updated 2 months ago

Assertion failure: !mPendingRequestCount, at src/dom/indexedDB/IDBTransaction.cpp:134

Categories

(Core :: Storage: IndexedDB, defect, P2)

defect

Tracking

()

ASSIGNED
Tracking Status
firefox-esr102 --- affected
firefox88 --- wontfix
firefox89 --- wontfix
firefox103 --- wontfix
firefox104 --- affected
firefox105 --- affected

People

(Reporter: tsmith, Assigned: jstutte, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: assertion)

Attachments

(1 file, 1 obsolete file)

Found while fuzzing m-c 20210320-71b483d7ca07 (--enable-debug --enable-fuzzing)

At this time a reduced test case is not available.

Assertion failure: !mPendingRequestCount, at src/dom/indexedDB/IDBTransaction.cpp:134

#0 0x7fb791a6a1fe in mozilla::dom::IDBTransaction::~IDBTransaction() src/dom/indexedDB/IDBTransaction.cpp:134:3
#1 0x7fb791a6a450 in mozilla::dom::IDBTransaction::~IDBTransaction() src/dom/indexedDB/IDBTransaction.cpp:132:35
#2 0x7fb78d887cc1 in SnowWhiteKiller::MaybeKillObject(SnowWhiteKiller::SnowWhiteObject&) src/xpcom/base/nsCycleCollector.cpp:2434:29
#3 0x7fb78d87cc34 in SnowWhiteKiller::~SnowWhiteKiller() src/xpcom/base/nsCycleCollector.cpp:2421:7
#4 0x7fb78d87c335 in nsCycleCollector::FreeSnowWhite(bool) src/xpcom/base/nsCycleCollector.cpp:2611:3
#5 0x7fb78d8806d3 in nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) src/xpcom/base/nsCycleCollector.cpp:3592:3
#6 0x7fb78d88028c in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) src/xpcom/base/nsCycleCollector.cpp:3415:9
#7 0x7fb78d880037 in nsCycleCollector::ShutdownCollect() src/xpcom/base/nsCycleCollector.cpp:3358:20
#8 0x7fb78d881286 in nsCycleCollector::Shutdown(bool) src/xpcom/base/nsCycleCollector.cpp:3653:5
#9 0x7fb78d882cf2 in nsCycleCollector_shutdown(bool) src/xpcom/base/nsCycleCollector.cpp:3968:18
#10 0x7fb78d9a4824 in mozilla::ShutdownXPCOM(nsIServiceManager*) src/xpcom/build/XPCOMInit.cpp:706:3
#11 0x7fb793877c7c in XRE_TermEmbedding() src/toolkit/xre/nsEmbedFunctions.cpp:212:3
#12 0x7fb78e29d02e in mozilla::ipc::ScopedXREEmbed::Stop() src/ipc/glue/ScopedXREEmbed.cpp:90:5
#13 0x7fb793878372 in XRE_InitChildProcess(int, char**, XREChildData const*) src/toolkit/xre/nsEmbedFunctions.cpp:737:16
#14 0x55edade44fb6 in content_process_main src/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#15 0x55edade44fb6 in main src/browser/app/nsBrowserApp.cpp:309:18
#16 0x7fb7a321f0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
#17 0x55edade22d5c in _start (/home/worker/builds/m-c-20210321093903-fuzzing-debug/firefox-bin+0x14d5c)

A Pernosco session is available here: https://pernos.co/debug/STsVEF1XXd-aCd43eDPsXA/index.html

It seems we have 2 pending requests on the transaction when we get into ActorDestroy. But we call mTransaction->OnRequestFinished just once. We should call it until mPendingRequestCount is 0 (or provide a different OnRequestAbort function that does this) to force the commit?

Severity: -- → S3
Priority: -- → P2

(In reply to Jens Stutte [:jstutte] from comment #2)

It seems we have 2 pending requests on the transaction when we get into ActorDestroy. But we call mTransaction->OnRequestFinished just once. We should call it until mPendingRequestCount is 0 (or provide a different OnRequestAbort function that does this) to force the commit?

Taking another look at the code it seems to me, that we already have an explicit shutdown abort handling based on aRequestCompletedSuccessfully (which is false in the pernosco run due to aWhy == NormalShutdown, triggering thus the mSentCommitOrAbort.Flip() and marking the transaction as aborted). I assume, that we should do something with mPendingRequestCount in that case, too?

I also wonder, if we should check the abort condition before we decrement mPendingRequestCount? After all we are not calling OnRequestFinishedfrom a foreseen code path here but to handle the abort on shutdown?

Flags: needinfo?(bugmail)

Hi Tyson, assuming that this still happens - did you have any luck in further reducing the test case?

Flags: needinfo?(bugmail) → needinfo?(twsmith)

OK, I was looking again at the old pernosco trace, and it seems we never pass through the normal shutdown sequence here. QuotaManager::Shutdown is never called, as we exit the process through a "xul-window-destroyed" notification which ends up calling nsAppStartup::Quit which does not sweep through all our shutdown phases, apparently.

This could explain why we end up in XPCOMShutdown with IDB transactions that have not been canceled.

Assignee: nobody → jstutte
Status: NEW → ASSIGNED

(In reply to Jens Stutte [:jstutte] from comment #5)

OK, I was looking again at the old pernosco trace, and it seems we never pass through the normal shutdown sequence here. QuotaManager::Shutdown is never called, as we exit the process through a "xul-window-destroyed" notification which ends up calling nsAppStartup::Quit which does not sweep through all our shutdown phases, apparently.

Scratch that, it is just the order of things. The rest of the shutdown phases would have been kicked off on ~ScopedXPCOMStartup but we fail earlier inside a content process (thanks to Nika for pointing me there).

This could explain why we end up in XPCOMShutdown with IDB transactions that have not been canceled.

The general scenario seems to be that we have two open requests inside an IDB Transaction in a content process when the main app is starting shutdown and asks thus all content processes to shut down (which happens before QuotaManager::Shutdown runs in the parent). So I'll probably return to the hypothesis of comment 2 and comment 3 that we have some flaw in aborting with the aWhy == NormalShutdown case when there is more than one request active in the transaction.

Attachment #9297448 - Attachment is obsolete: true

An updated Pernosco session is available here: https://pernos.co/debug/g3U9cd7CwyvFbEQjmIrnJQ/index.html

Flags: needinfo?(twsmith)

Thanks! So we rely on request's ActorDestroy and/or Recv__delete__ in order to adjust our mPendingRequestCount and make it go back down to zero. That looks very fragile, in particular BackgroundCursorChild<CursorType>::SendContinueInternal increases mPendingRequestCount potentially more than once for each actor instance. I was not able to follow through entirely in the pernosco session to find the single ++ without --. But we definitely never check if messages are dispatched successfully, which during shutdown might become a problem. It seems to me that we should fake-abort our transaction at the first shutdown-killed request's ActorDestroy we encounter, regardless of being there remaining requests or not.

However, I cannot 100% exclude to be there some error in our book-keeping even in the non-shutdown case. It all looks safe under the assumption that IPC channels do not go away unexpectedly (if not in shutdown), and probably we need to rely on more fuzzing here.

Attachment #9297542 - Attachment description: WIP: Bug 1700402 - Immediately fake transaction abort on request actor shutdown kill. → Bug 1700402 - Immediately fake transaction abort on request actor shutdown kill. r?#dom-storage-reviewers

Hi Tyson, would it be possible to run the fuzzer testcase with the build from https://treeherder.mozilla.org/jobs?revision=baf2f6fc477895a9c7d8a3dbd0abaa032a161410&repo=try to see if it helps ? Thank you!

Flags: needinfo?(twsmith)

I was unable to reproduce the issue with the requested build.

Flags: needinfo?(twsmith)

Hi Tyson, sorry for bothering you again, but after bug 1777914 landed - did you see this anymore ?

Flags: needinfo?(twsmith)

(In reply to Jens Stutte [:jstutte] from comment #13)

Hi Tyson, sorry for bothering you again

I am always happy to help :)

after bug 1777914 landed - did you see this anymore ?

This is still being reported by the fuzzers (m-c 20221120-d2c628738330). I am able to repro (m-c 20221207-8e09abeeb445), so I will try to reduce the test case and get a new Pernosco session.

Not sure if we need pernosco, I just wanted to know if the patch here is still of value, and it seems so.

Ok, just let me know if anything changes.

Flags: needinfo?(twsmith)

I wonder if bug 1785080 is related and if this comment gives a hint how to fix the overly complicated state machinery here?

Flags: needinfo?(jvarga)
See Also: → 1785080

(In reply to Jens Stutte [:jstutte] from comment #17)

I wonder if bug 1785080 is related and if this comment gives a hint how to fix the overly complicated state machinery here?

I get the error/assertion locally from time to time when I'm running tests for dom/indexedDB. I could try to narrow it down and even try to create a test for it if you think it's a priority.

Flags: needinfo?(jvarga)
Flags: needinfo?(jstutte)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: