Closed Bug 1622114 Opened 5 years ago Closed 5 years ago

Crash in [@ IPCError-browser | ShutDownKill | mozilla::CycleCollectedJSContext::CleanupIDBTransactions ]

Categories

(Core :: Storage: IndexedDB, defect, P2)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla76
Tracking Status
firefox-esr68 --- wontfix
firefox74 --- wontfix
firefox75 --- fixed
firefox76 --- fixed

People

(Reporter: mccr8, Assigned: sg)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-6e282395-1ff3-4650-8ce5-ceea00200312.

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 xul.dll trunc 
3 xul.dll xul.dll@0x200bf 
4 xul.dll mozilla::CycleCollectedJSContext::CleanupIDBTransactions xpcom/base/CycleCollectedJSContext.cpp:439
5 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
6 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue>::GetEvent xpcom/threads/ThreadEventQueue.cpp:208
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1140
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:481
9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109

I don't know if this is why we are seeing these hangs, but AFAICT mozilla::CycleCollectedJSContext::CleanupIDBTransactions takes time that is quadratic in the length of mPendingIDBTransactions.

The line number from these hangs is the last line of the function, so maybe it is due to running destructors?

It does look like it is doing more work than needed.

localQueue.AppendElements(mPendingIDBTransactions);
localQueue.SwapElements(mPendingIDBTransactions);

I think this copies all of the elements of mPendingIDBTransactions to the end of localQueue, which causes all of the mTransactions to get AddRefed. Then when we destroy localQueue, the mTransactions will get Released. I think that could be fixed by changing the first line above to localQueue.AppendElements(std::move(mPendingIDBTransactions));

Anyways, I don't know what this code is doing, so maybe those microoptimizations are irrelevant.

I'm a bit surprised if we have tons of pending IDB transactions.
Is that expected?

But sure, we do a move.

Flags: needinfo?(jvarga)

Half the reports have the IPCShutdownState annotation set to SendFinishShutdown (sent) so they actually finished executing between when the minidump was grabbed and when we killed the process - i.e. they were being slow. The other half didn't have the annotation at all, which means they never responded to the IPC shutdown notification. Most likely they received it but it was still sitting in a queue waiting for the event-loop to catch up.

Simon, can you take a look ?

Flags: needinfo?(jvarga) → needinfo?(sgiesecke)

I noticed that after a spike there have been no new reports so for after build 20200311163942. So maybe, this has resolved again? Given https://bugzilla.mozilla.org/show_bug.cgi?id=1622114#c2, it seems that this might mean the cause of the slow shutdown might be unrelated to the IDBTransactions itself, which makes it rather hard to identify what regressed and/or fixed it. I guess if this doesn't happen again, we might just close this again?

Flags: needinfo?(sgiesecke)

It is just six days since the last crashing build. So I would wait until the end of this week, at least. Simon, can you monitor it until then?

Flags: needinfo?(sgiesecke)

I looked for crash signatures containing CleanupIDBTransactions, and found some more shutdown hangs. I don't know why there's a varying amount of junk in these signatures.

Thunderbird has an interesting variation: bp-173c348d-1171-4001-93b5-efc350200314 with the signature [@ nsTArray_base<T>::SwapArrayElements<T> | mozilla::CycleCollectedJSContext::CleanupIDBTransactions ].

Crash Signature: [@ IPCError-browser | ShutDownKill | trunc | xul.dll | mozilla::CycleCollectedJSContext::CleanupIDBTransactions] → [@ IPCError-browser | ShutDownKill | mozilla::CycleCollectedJSContext::CleanupIDBTransactions ] [@ IPCError-browser | ShutDownKill | trunc | xul.dll | mozilla::CycleCollectedJSContext::CleanupIDBTransactions] [@ IPCError-browser | ShutDownKill | xul.dll |…
Summary: Crash in [@ IPCError-browser | ShutDownKill | trunc | xul.dll | mozilla::CycleCollectedJSContext::CleanupIDBTransactions] → Crash in [@ IPCError-browser | ShutDownKill | mozilla::CycleCollectedJSContext::CleanupIDBTransactions ]

With the newly added signatures, we have a more recent occurrence. Changing from copying to moving the elements makes sense independently from this hang. I will provide a fix for that.

However, I still can't imagine that it is more than a coincidence the shutdown kill happens here. SwapElements/SwapArrayElements should not be a costly operation under any circumstances.

Flags: needinfo?(sgiesecke)

Yeah, the volume is so low (and it looks even lower on release!) that it is hard to make any real conclusions about the severity, so it probably just is some fluke.

Assignee: nobody → sgiesecke
Status: NEW → ASSIGNED

The fact that there's a lot of pending transactions could indicate a problem with parent/child communication. For example:
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/dom/indexedDB/ActorsParent.cpp#14275
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/dom/indexedDB/ActorsParent.cpp#14363
If we hit the race, the parent side will just refuse to create a parent actor, but the child side currently expects that it's an infallible operation, it won't be notified that parent failed, so the transcation will just endlessly wait until shutdown on the child side.
This is just a theory, should be investigated more.

I think we don't know if there are really a lot of pending transactions. We could add a crash annotation to check that. Do crash annotations work with ShutdownKills as well?

I think crash annotations work more like static variables, you set it to something and then later if the process crashes, the information that has been set some time before the crash will be used for sending a crash report, for example:
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/dom/events/TouchEvent.cpp#248

After some discussion on Slack, we came to the conclusion that it's plausible that crash annotations are provided with ShutdownKill crash reports, but we are not completely sure. Andrew, can you answer that? It makes sense to be sure of that before going through data review etc.

Flags: needinfo?(continuation)

I can. Crash annotations will be sent alongside ShutDownKill crash reports. To give you an example we set the IPCShutdownState annotation in content processes (see here) and when set it's always present in the ShutDownKill crash reports.

Note that if an annotation has not been set in the content process we pick the value that has been set in the main process instead.

Flags: needinfo?(continuation)
Attachment #9133874 - Attachment description: Bug 1622114 - Move pending transactions array instead of copying. r=#dom-workers-and-storage → Bug 1622114 - Improve array manipulation in CleanupIDBTransactions. r=#dom-workers-and-storage
Priority: -- → P2
Pushed by sgiesecke@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d405a129956e Improve array manipulation in CleanupIDBTransactions. r=dom-workers-and-storage-reviewers,smaug,asuth

Is this low-risk enough that we could consider uplifting? Eliminating quadratic behavior seems like a positive thing outside of shutdown hangs.

(In reply to Nathan Froyd [:froydnj] from comment #16)

Is this low-risk enough that we could consider uplifting? Eliminating quadratic behavior seems like a positive thing outside of shutdown hangs.

Makes sense. I'll monitor this and then request uplift in a few days.

Flags: needinfo?(sgiesecke)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76

We're building the last beta tomorrow, so if you still want to uplift today's the last chance.

Comment on attachment 9133874 [details]
Bug 1622114 - Improve array manipulation in CleanupIDBTransactions. r=#dom-workers-and-storage

Beta/Release Uplift Approval Request

  • User impact if declined: Higher exposure to shutdown kills.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Local improvements reducing algorithmic complexity.
  • String changes made/needed:
Flags: needinfo?(sgiesecke)
Attachment #9133874 - Flags: approval-mozilla-beta?

Comment on attachment 9133874 [details]
Bug 1622114 - Improve array manipulation in CleanupIDBTransactions. r=#dom-workers-and-storage

approved for 75.0b10

Attachment #9133874 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: