Crash in [@ IPCError-browser | ShutDownKill | mozilla::CycleCollectedJSContext::CleanupIDBTransactions ]
Categories
(Core :: Storage: IndexedDB, defect, P2)
Tracking
()
People
(Reporter: mccr8, Assigned: sg)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
This bug is for crash report bp-6e282395-1ff3-4650-8ce5-ceea00200312.
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForAlertByThreadId
1 ntdll.dll RtlSleepConditionVariableSRW
2 xul.dll trunc
3 xul.dll xul.dll@0x200bf
4 xul.dll mozilla::CycleCollectedJSContext::CleanupIDBTransactions xpcom/base/CycleCollectedJSContext.cpp:439
5 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
6 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue>::GetEvent xpcom/threads/ThreadEventQueue.cpp:208
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1140
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:481
9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109
I don't know if this is why we are seeing these hangs, but AFAICT mozilla::CycleCollectedJSContext::CleanupIDBTransactions takes time that is quadratic in the length of mPendingIDBTransactions.
The line number from these hangs is the last line of the function, so maybe it is due to running destructors?
It does look like it is doing more work than needed.
localQueue.AppendElements(mPendingIDBTransactions);
localQueue.SwapElements(mPendingIDBTransactions);
I think this copies all of the elements of mPendingIDBTransactions to the end of localQueue, which causes all of the mTransactions to get AddRefed. Then when we destroy localQueue, the mTransactions will get Released. I think that could be fixed by changing the first line above to localQueue.AppendElements(std::move(mPendingIDBTransactions));
Anyways, I don't know what this code is doing, so maybe those microoptimizations are irrelevant.
Comment 1•5 years ago
|
||
I'm a bit surprised if we have tons of pending IDB transactions.
Is that expected?
But sure, we do a move.
Comment 2•5 years ago
|
||
Half the reports have the IPCShutdownState
annotation set to SendFinishShutdown (sent)
so they actually finished executing between when the minidump was grabbed and when we killed the process - i.e. they were being slow. The other half didn't have the annotation at all, which means they never responded to the IPC shutdown notification. Most likely they received it but it was still sitting in a queue waiting for the event-loop to catch up.
Comment 3•5 years ago
|
||
Simon, can you take a look ?
Assignee | ||
Comment 4•5 years ago
|
||
I noticed that after a spike there have been no new reports so for after build 20200311163942. So maybe, this has resolved again? Given https://bugzilla.mozilla.org/show_bug.cgi?id=1622114#c2, it seems that this might mean the cause of the slow shutdown might be unrelated to the IDBTransactions itself, which makes it rather hard to identify what regressed and/or fixed it. I guess if this doesn't happen again, we might just close this again?
Comment 5•5 years ago
|
||
It is just six days since the last crashing build. So I would wait until the end of this week, at least. Simon, can you monitor it until then?
Reporter | ||
Comment 6•5 years ago
|
||
I looked for crash signatures containing CleanupIDBTransactions, and found some more shutdown hangs. I don't know why there's a varying amount of junk in these signatures.
Thunderbird has an interesting variation: bp-173c348d-1171-4001-93b5-efc350200314 with the signature [@ nsTArray_base<T>::SwapArrayElements<T> | mozilla::CycleCollectedJSContext::CleanupIDBTransactions ].
Assignee | ||
Comment 7•5 years ago
|
||
With the newly added signatures, we have a more recent occurrence. Changing from copying to moving the elements makes sense independently from this hang. I will provide a fix for that.
However, I still can't imagine that it is more than a coincidence the shutdown kill happens here. SwapElements
/SwapArrayElements
should not be a costly operation under any circumstances.
Reporter | ||
Comment 8•5 years ago
|
||
Yeah, the volume is so low (and it looks even lower on release!) that it is hard to make any real conclusions about the severity, so it probably just is some fluke.
Assignee | ||
Comment 9•5 years ago
|
||
Updated•5 years ago
|
Comment 10•5 years ago
|
||
The fact that there's a lot of pending transactions could indicate a problem with parent/child communication. For example:
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/dom/indexedDB/ActorsParent.cpp#14275
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/dom/indexedDB/ActorsParent.cpp#14363
If we hit the race, the parent side will just refuse to create a parent actor, but the child side currently expects that it's an infallible operation, it won't be notified that parent failed, so the transcation will just endlessly wait until shutdown on the child side.
This is just a theory, should be investigated more.
Assignee | ||
Comment 11•5 years ago
|
||
I think we don't know if there are really a lot of pending transactions. We could add a crash annotation to check that. Do crash annotations work with ShutdownKills as well?
Comment 12•5 years ago
|
||
I think crash annotations work more like static variables, you set it to something and then later if the process crashes, the information that has been set some time before the crash will be used for sending a crash report, for example:
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/dom/events/TouchEvent.cpp#248
Assignee | ||
Comment 13•5 years ago
|
||
After some discussion on Slack, we came to the conclusion that it's plausible that crash annotations are provided with ShutdownKill crash reports, but we are not completely sure. Andrew, can you answer that? It makes sense to be sure of that before going through data review etc.
Comment 14•5 years ago
|
||
I can. Crash annotations will be sent alongside ShutDownKill crash reports. To give you an example we set the IPCShutdownState
annotation in content processes (see here) and when set it's always present in the ShutDownKill crash reports.
Note that if an annotation has not been set in the content process we pick the value that has been set in the main process instead.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 15•5 years ago
|
||
![]() |
||
Comment 16•5 years ago
|
||
Is this low-risk enough that we could consider uplifting? Eliminating quadratic behavior seems like a positive thing outside of shutdown hangs.
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #16)
Is this low-risk enough that we could consider uplifting? Eliminating quadratic behavior seems like a positive thing outside of shutdown hangs.
Makes sense. I'll monitor this and then request uplift in a few days.
Comment 18•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 19•5 years ago
|
||
We're building the last beta tomorrow, so if you still want to uplift today's the last chance.
Assignee | ||
Comment 20•5 years ago
|
||
Comment on attachment 9133874 [details]
Bug 1622114 - Improve array manipulation in CleanupIDBTransactions. r=#dom-workers-and-storage
Beta/Release Uplift Approval Request
- User impact if declined: Higher exposure to shutdown kills.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Local improvements reducing algorithmic complexity.
- String changes made/needed:
Comment 21•5 years ago
|
||
Comment on attachment 9133874 [details]
Bug 1622114 - Improve array manipulation in CleanupIDBTransactions. r=#dom-workers-and-storage
approved for 75.0b10
Comment 22•5 years ago
|
||
bugherder uplift |
Description
•