Closed Bug 1604005 Opened 9 months ago Closed 9 months ago

Crash in [@ mozilla::ThreadEventTarget::IsOnCurrentThreadInfallible] called from storage::Connection::setClosedState()

Categories

(Core :: Storage: Quota Manager, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- unaffected
firefox71 --- unaffected
firefox72 --- unaffected
firefox73 + fixed
firefox74 --- fixed

People

(Reporter: jseward, Assigned: karlt)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-917e0fc5-8e62-4f12-b383-f21c30191214.

This crash was reported 8 times in 8 different installations in the Windows nightly 20191212095326.

Top 10 frames of crashing thread:

0 xul.dll bool mozilla::ThreadEventTarget::IsOnCurrentThreadInfallible xpcom/threads/ThreadEventTarget.cpp:189
1 xul.dll nsThread::IsOnCurrentThread xpcom/threads/nsThread.cpp:762
2 xul.dll nsresult mozilla::storage::Connection::setClosedState storage/mozStorageConnection.cpp:922
3 xul.dll nsresult mozilla::storage::Connection::synchronousClose storage/mozStorageConnection.cpp:1296
4 xul.dll mozilla::storage::Connection::Release storage/mozStorageConnection.cpp
5 xul.dll void mozilla::dom::quota::QuotaManager::~QuotaManager dom/quota/ActorsParent.cpp:3503
6 xul.dll void mozilla::dom::quota::QuotaManager::~QuotaManager dom/quota/ActorsParent.cpp:3500
7 xul.dll unsigned long mozilla::dom::quota::DirectoryLockImpl::Release dom/quota/ActorsParent.cpp:773
8 xul.dll void mozilla::dom::quota::`anonymous namespace'::NormalOriginOperationBase::UnblockOpen dom/quota/ActorsParent.cpp:8425
9 xul.dll nsresult mozilla::dom::quota::`anonymous namespace'::OriginOperationBase::Run dom/quota/ActorsParent.cpp:8239

Flags: needinfo?(dothayer)
Component: XPCOM → Storage: Quota Manager
Summary: Crash in [@ mozilla::ThreadEventTarget::IsOnCurrentThreadInfallible] → Crash in [@ mozilla::ThreadEventTarget::IsOnCurrentThreadInfallible] called from storage::Connection::setClosedState()

The crash reason is "IsOnCurrentThreadInfallible should never be called on nsIThread".

I'll look into this tomorrow.

Assignee: nobody → ttung

Changing platform to all since it is happening on all 3 platforms.

OS: Windows 10 → All
Hardware: Unspecified → All

Unassign myself because I'm not sure if I have time to work on this in this week.

Assignee: ttung → nobody
Priority: -- → P2

It's not clear to me why I was needinfo'd on this bug. If someone could clarify if this needinfo is still relevant then I can take a look though.

Flags: needinfo?(dothayer)

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Crashes started in 73 using 20191210095443 - possible regression range based on that build id: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8a285497c16992995f515e43166432b0c765ea72&tochange=1fb1ca98f23b29d1a8a287ad2db326c84df04c57

Doug - Julien added the ni on you. I cannot tell from the regression range which landing might have caused this. 47 crashes in the last 7 days across all platforms.

Bug 1593246 is in that regression range. Unfortunately, most of the people involved in that bug are already on PTO for the holidays :(

EDIT: Actually, that one got backed out. I also see bug 1600906 in there.

Jan, do you have cycles to look into this?

Flags: needinfo?(jvarga)

The crash happens here:

NS_IMETHODIMP_(bool)
ThreadEventTarget::IsOnCurrentThreadInfallible() {
  // Rely on mThread being correct.
  MOZ_CRASH("IsOnCurrentThreadInfallible should never be called on nsIThread");
}

From the calling function it seems, that the mThread of nsIEventTarget is nullptr and then we fall back to IsOnCurrentThreadInfallible(), which probably is wrong, becaus we were supposed to have a thread.

bool nsIEventTarget::IsOnCurrentThread() {
  if (mThread) {
    return mThread == PR_GetCurrentThread();
  }
  return IsOnCurrentThreadInfallible();
}

So probably there is a lifecycle problem here and the mThread has ben nulled earlier than expected on threadOpenedOn of Connection here. I do not see many explicit assignments to threadOpenedOn, so I would assume, that the containing Connection object did already die and destroyed the threadOpenedOn such that we are in some sparse event on that dead connection object.

We are in an OriginOperationBase::Run() call with State_UnblockingOpen. In UnblockOpen() we clear the refptr to the DirectoryLockImpl object, whose refcount goes down to 0. This triggers a deletion of the DirectoryLockImpl, which contains in turn a refptr to a QuotaManager object. Also its refcount goes to 0 and the destructor of the 'QuotaManager' object wants to release the Connection object refptr.

The Connectionobject provides its own Release() function, which contains several edge cases and describes some invariant in its comments. The most edgy case could cause a synchronousClose() to be called probably not on the opening thread. Still this would not explain (to me), why the mThread of nsIEventTarget should be nullptr.

:janv, does this make come something to your mind?

Looking further at the nsIEventTarget::IsOnCurrentThread() function, it might be a first approach to:

  1. add an override of IsOnCurrentThreadInfallible() to the Connection object, returning always false.

  2. maybe add some more sophisticated, graceful error handling, doing an additional check if 'mThread' has become nullptr and return silently without a crash then (but I am not sure about the side effects), assuming that this means we should not really care any more to do anything.

But this makes sense only if we are pretty sure that the state we are in is not harmful but just weird.

[Tracking Requested - why for this release]: Significant enough volume in the last 7 days to warrant tracking this for 73.

This signature has generated 72 crashes in the last 7 days across all platforms (primarily Windows). There are no comments and no clear correlations. Based on the volume I think we should track this for 73.

Also many of the URLS are about:newtab, about:home, about:blank. There are some other URLs but no clear pattern other than general browsing.

hi, could you take a look at this crash signature, which is showing up quite prominently in crash data we have from the first few hours of 73.0b1 usage? thank you

Flags: needinfo?(karlt)
Regressed by: 1602646
Assignee: nobody → karlt
Status: NEW → ASSIGNED
Flags: needinfo?(karlt)
Regressed by: 1599922
No longer regressed by: 1602646

ThreadEventTarget is final. I also audited nsIEventTarget::IsOnCurrentThreadInfallible() declarations. None are on classes deriving from nsThread.

(In reply to Jens Stutte [:jstutte] from comment #11)

The most edgy case could cause a synchronousClose() to be called probably not on the opening thread.

This code would have been affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1599922#c4, but maybe that was benign here. Anyway, this is already fixed in the changeset that caused this regression.

Thanks :karlt ! Then I assume, there is nothing QuotaManager specific left to do here, thanks to your change this signature will not occur any more for sure.

I wonder why the connection is closed in QuotaManager destructor and not in QuotaManager::ShutdownStorage. Closing the connection in the destructor is not desired because we are on the PBackground thread and the connection was opened and used on the QM I/O thread. I'm going to investigate it a bit more.

Flags: needinfo?(jvarga)

(In reply to Jan Varga [:janv] from comment #18)

I wonder why the connection is closed in QuotaManager destructor and not in QuotaManager::ShutdownStorage. Closing the connection in the destructor is not desired because we are on the PBackground thread and the connection was opened and used on the QM I/O thread. I'm going to investigate it a bit more.

Yes, probably there is some order/lifecycle weirdness here. Still, if we store refptrs on QuotaManager objects, we must be prepared that the destructors are called anytime and should try to do the most meaningful things then in the sense of 2. in comment #12. The most meaningful thing could even be a "if not in debug, do nothing and return silently", if we are in an unexpected state.

I think the root cause is that for some reason in some edge case, QuotaManager doesn't close the storage connection in QuotaManager::ShutdownStorage. That needs to be investigated and I filed bug 1607692 for that. I think the patch in this bug makes sense because it fixes the case when the consumer (QuotaManager in this case) doesn't close the connection correctly and the thread which was used to open the connection is dead.

Well, I think we shouldn't allow (or prevent) destruction of QuotaManager if QuotaManager::ShutdownStorage was not called. Maybe it was called but we failed to dispatch a runnable in that method. This is not only about closing storage connection, there's other important stuff in ShutdownStorage. Anyway, this will be done in bug 1607692.

OK, then we consider this bug to be resolved through :karlt's patch and move further investigation on the QuotaManager life cycle over there. Thanks!

Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/95a6f05099b7
implement ThreadEventTarget::IsOnCurrentThreadInfallible() to handle case of null mThread r=froydnj
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(karlt)

Comment on attachment 9119197 [details]
Bug 1604005 implement ThreadEventTarget::IsOnCurrentThreadInfallible() to handle case of null mThread r?froydnj

Beta/Release Uplift Approval Request

  • User impact if declined: Unpredictable crash.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Risk is low due to the simple nature of the patch and evidence that this path is rarely reached.
  • String changes made/needed: None.
Flags: needinfo?(karlt)
Attachment #9119197 - Flags: approval-mozilla-beta?

Comment on attachment 9119197 [details]
Bug 1604005 implement ThreadEventTarget::IsOnCurrentThreadInfallible() to handle case of null mThread r?froydnj

Fixes a topcrash showing up in the early Beta73 results. Approved for 73.0b3.

Attachment #9119197 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.