Closed Bug 1389561 Opened 2 years ago Closed 2 years ago

crash in PLDHashTable::Remove | mozilla::dom::quota::QuotaObject::Release

Categories

(Core :: DOM: IndexedDB, defect, P1, critical)

57 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: philipp, Assigned: shawnjohnjr)

References

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main58+][post-critsmash-triage])

Crash Data

Attachments

(10 files, 16 obsolete files)

3.20 KB, patch
janv
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
9.14 KB, patch
luke
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
3.16 KB, patch
luke
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
9.21 KB, patch
janv
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
3.02 KB, patch
janv
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
3.38 KB, patch
Details | Diff | Splinter Review
9.39 KB, patch
Details | Diff | Splinter Review
3.14 KB, patch
Details | Diff | Splinter Review
9.46 KB, patch
Details | Diff | Splinter Review
2.77 KB, patch
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #1242255 +++

these crash signatures stopped in 45, so the original bug was closed but those crashes have crept up again and are present up until the recent 57.0a1 builds. quite a high share of them still have an address indicating a UAF situation & user comments indicate it's happening on a somewhat random basis
Group: core-security → dom-core-security
Jan, please take a look here.
Flags: needinfo?(jvarga)
I'm working on bug 1399671 and it was reported in 56.0b10.
Should bug 1399671 also be a security bug?
Assignee: nobody → shuang
Flags: needinfo?(jvarga) → needinfo?(shuang)
Priority: -- → P1
(In reply to Andrew Overholt [:overholt] from comment #3)
> Should bug 1399671 also be a security bug?

I set bug 1399671 keyword to sec-high, but it doesn't change to security bug.
So I set duplicate flag. I will keep update here.
Flags: needinfo?(shuang)
Thanks, Shawn!
Maybe the crash in QuotaObject::Release() caused by closing a sqlite connection for a file which was already removed on disk. It's important to figure out why that a file was deleted. Looking into test_idle_maintenance.js, and see if I can make crash by removing files and doing IdleMaintenance at the same time. Maybe it's possible to continuously do indexed write data and trigger idle maintenance by sending 'idle-daily' notification to observer.
(In reply to Shawn Huang [:shawnjohnjr](PTO: 9/21~23) from comment #7)
> Maybe the crash in QuotaObject::Release() caused by closing a sqlite
> connection for a file which was already removed on disk. It's important to
> figure out why that a file was deleted. Looking into

Yes, a crash like this usually happens when an open file is deleted.

Do we have a new full stack trace for this ?
(In reply to Jan Varga [:janv] from comment #8)
> (In reply to Shawn Huang [:shawnjohnjr](PTO: 9/21~23) from comment #7)
> > Maybe the crash in QuotaObject::Release() caused by closing a sqlite
> > connection for a file which was already removed on disk. It's important to
> > figure out why that a file was deleted. Looking into
> 
> Yes, a crash like this usually happens when an open file is deleted.
> 
> Do we have a new full stack trace for this ?

Yes. Crash happened since 56.0b12.
https://crash-stats.mozilla.com/report/index/769d07ae-6df2-40da-b556-34de30170912#tab-details


More reports:
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=PLDHashTable%3A%3ASearch%20%7C%20mozilla%3A%3Adom%3A%3Aquota%3A%3AQuotaObject%3A%3ARelease&date=%3E%3D2017-09-18T06%3A20%3A00.000Z&date=%3C2017-09-25T06%3A20%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
Ok, so our idle maintenance correctly acquires a directory lock and releases it when it's done.
So either there's a bug somewhere in directory locking or something else deletes the file without using a directory lock.

This is usually quite hard to debug.
It only crashes on windows because other operating systems delete the file when it's closed.
(In reply to Jan Varga [:janv] from comment #11)
> It only crashes on windows because other operating systems delete the file
> when it's closed.
I guess this is the reason I don't have progress to reproduce this bug last week. I tried to write data into idb and send fake "idle-daily" notification using rr on linux. I will try it on windows.
(In reply to Jan Varga [:janv] from comment #10)
> So either there's a bug somewhere in directory locking or something else deletes the file without using a directory lock.
If there's a bug in directory locking, I think we might see a lot of problems, but we don't see that. It might be helpful to check all places that deletes the file without using a directory lock.

> It only crashes on windows because other operating systems delete the file
> when it's closed. 
So possible different type of files are under origin/idb:
.sqlite-wal
.sqlite-shm
.sqlite-journal
.sqlite

Both DeleteDatabaseOp::VersionChangeOp::DeleteFile and DeleteFilesRunnable::DeleteFile hold directory locks.
Note, that there's DatabaseMaintenance::Run() on stack. The maintenance also holds a directory lock, but there's also internal synchronization between FactoryOp/Database and Maintenance. These all get a shared directory lock which means that they can get it at the same time (FactoryOp gets a directory lock and also Maintenance gets a directory lock). The shared directory lock prevents a clear origin operation from getting an exclusive lock.
So, it's important that there's internal synchronization in IDB implementation. If there's a FactoryOp or a Database object, then a Maintenance won't run. Also when there's a Maintenance running, FactoryOp is delayed until the maintenance is complete.

It's hard to say where's the problem is if we don't have steps to reproduce or at least a brief description of situation before the browser crashed.

My gut feeling is that an IDB file is manually deleted without holding a directory lock. Maybe there's an extension or developer tool that inspects profile/storage and allows to do that ...
(In reply to Jan Varga [:janv] from comment #14)
> My gut feeling is that an IDB file is manually deleted without holding a
> directory lock. Maybe there's an extension or developer tool that inspects
> profile/storage and allows to do that ...

Actually, this crash is about a crash in QuotaObject::Release(), probably when trying to unregister itself from OriginInfo, if OriginInfo was already deleted, it crashes.
Note, that OriginInfo can't be destroyed when something is deleted manually.
(In reply to Jan Varga [:janv] from comment #15)
> Actually, this crash is about a crash in QuotaObject::Release(), probably
> when trying to unregister itself from OriginInfo, if OriginInfo was already
> deleted, it crashes.
> Note, that OriginInfo can't be destroyed when something is deleted manually.

It seems more likely that the mQuotaObjects of the mOriginInfo is null but mOriginInfo isn't according to the call stack in comment 9:
    if (mRefCnt > 0) {
      return;
    }

    if (mOriginInfo) {
      mOriginInfo->mQuotaObjects.Remove(mPath); <- Crash at PLDHashTable::Search() which tried to access mTable but this pointer was already nullptr.
    }
(In reply to Bevis Tseng[:bevis][:btseng] from comment #16)
> (In reply to Jan Varga [:janv] from comment #15)
> > Actually, this crash is about a crash in QuotaObject::Release(), probably
> > when trying to unregister itself from OriginInfo, if OriginInfo was already
> > deleted, it crashes.
> > Note, that OriginInfo can't be destroyed when something is deleted manually.
> 
> It seems more likely that the mQuotaObjects of the mOriginInfo is null but
> mOriginInfo isn't according to the call stack in comment 9:
>     if (mRefCnt > 0) {
>       return;
>     }
> 
>     if (mOriginInfo) {
>       mOriginInfo->mQuotaObjects.Remove(mPath); <- Crash at
> PLDHashTable::Search() which tried to access mTable but this pointer was
> already nullptr.
>     }

Yea, that also explain some signatures crash at Remove() some crash at Search() because mOriginInfo is null.
mOriginInfo is a dangling pointer at that time, because it's not cleared when OriginInfo is destroyed
(In reply to Jan Varga [:janv] from comment #18)
> mOriginInfo is a dangling pointer at that time, because it's not cleared
> when OriginInfo is destroyed
You are right, mQuotaObjects is not a pointer. The problem is that the data that mOriginInfo pointed was already invalid.
Is there any reason why we use a raw pointer instead of a RefPtr for QuotaObject::mQuotaObjects?
(The assumption is that the GroupInfo::mOriginInfos lives longer than QuotaObject.)
(In reply to Bevis Tseng[:bevis][:btseng] from comment #20)
> Is there any reason why we use a raw pointer instead of a RefPtr for
> QuotaObject::mQuotaObjects?
> (The assumption is that the GroupInfo::mOriginInfos lives longer than
> QuotaObject.)

QuotaManager::GetQuotaObject() checks if there's already a QuotaObject by looking up the hash table.
We don't want a strong ref for that, because QuotaObject wouldn't get destroyed when a client releases it.
I had discussion with Jan in the Storage meeting, he provided some suggestion since we don't have high volume of crash reports.
Here are action plans to tackle down this issue:
1. Add MOZ_DIAGNOSTIC_ASSERT in nightly channel only for checking the count of mQuotaObject, in order to prove our theory, that mQuotaObject.Count() is not 0.
a. http://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#518
b. http://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#3464
c. http://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#3121
Then we can wait for more data from crash board. We can verify our analysis is correct.

2. We can try to get the name of file. QuotaObject contains a member mName, it's full path, so we can get full path to understand the context. Since we can't get the log, so we probably need to do early assertion before crash.

3. Write one test containing scenario because user left comments in bug report, crash happened when they left computers and they saw crash when they return:
(a). Hit global limit on purpose and make vacuum for temporary storage happened 
(b). Send Idle notification to do maintenance

4. Write a C++ test for DirectoryLock, perhaps MustWaitFor needs to be verified.
Yes, exactly.
Sorry, I found it's hard to get mPath from QuotaObject using single one MOZ_DIAGNOSTIC_ASSERT. Maybe we can land MOZ_DIAGNOSTIC_ASSERT without mPath first?
Attachment #8913189 - Flags: review?(jvarga)
Comment on attachment 8913189 [details] [diff] [review]
Bug 1389561 - Part 1: Convernt MOZ_ASSERT to MOZ_DIAGNOSTIC_ASSERT for more debugging information

Review of attachment 8913189 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, that's enough for now.
Attachment #8913189 - Flags: review?(jvarga) → review+
https://hg.mozilla.org/mozilla-central/rev/89ffaadaf58b
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Good news is that we did crash in the diag assertion.
This time:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Adom%3A%3Aquota%3A%3AOriginInfo%3A%3ARelease&date=%3E%3D2017-09-28T06%3A32%3A00.000Z&date=%3C2017-10-05T06%3A32%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports



https://crash-stats.mozilla.com/report/index/f46dd342-f3c8-4588-aa2c-038ae0171005
0 	libxul.so 	mozilla::dom::quota::OriginInfo::Release 	dom/quota/ActorsParent.cpp:518
1 	libxul.so 	nsTArray_Impl<RefPtr<mozilla::dom::quota::OriginInfo>, nsTArrayInfallibleAllocator>::RemoveElementsAt 	mfbt/RefPtr.h:41
2 	libxul.so 	mozilla::dom::quota::GroupInfo::Release 	xpcom/ds/nsTArray.h:1738
3 	libxul.so 	mozilla::dom::quota::GroupInfoPair::~GroupInfoPair 	mfbt/RefPtr.h:41
4 	libxul.so 	nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsAutoPtr<mozilla::dom::quota::GroupInfoPair> > >::s_ClearEntry 	xpcom/base/nsAutoPtr.h:78

It seems this time all assertioni on Linux platform.
Crash Signature: mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | nsBaseHashtable<T>::Remove | mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release] → mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | nsBaseHashtable<T>::Remove | mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release] [@ mozilla::dom::quota::OriginInfo::R…
This probably uncovered a different problem, the stack indicates that quota manager is already gone (shutdown) but IDB is still finishing and once it releases a directory lock, quota manager is destroyed for real (because a directory lock holds a strong ref to quota manager). Quota manager destructor starts destroying GoupInfoPairs that destroys GroupInfos and finally OriginInfos. However there's still a live QuotaObject ...

I think this stack trace has nothing to do with the original stack trace where IDB database maintenance was involved.
Another assertion !mQuotaObjects.Count(), code path is different, not related to idle maintenance.

https://crash-stats.mozilla.com/report/index/67d1b142-aa9d-4f26-86ce-31e430171004


0 	xul.dll 	mozilla::dom::quota::OriginInfo::`scalar deleting destructor'(unsigned int) 	
1 	xul.dll 	RefPtr<mozilla::dom::quota::OriginInfo>::~RefPtr<mozilla::dom::quota::OriginInfo>() 	mfbt/RefPtr.h:79
2 	xul.dll 	nsTArray_Impl<RefPtr<mozilla::dom::quota::OriginInfo>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned __int64, unsigned __int64) 	xpcom/ds/nsTArray.h:2060
3 	xul.dll 	mozilla::dom::quota::GroupInfo::LockedRemoveOriginInfos() 	dom/quota/ActorsParent.cpp:5999
4 	xul.dll 	mozilla::dom::quota::QuotaManager::RemoveQuota() 	dom/quota/ActorsParent.cpp:3815
5 	xul.dll 	mozilla::dom::quota::QuotaManager::EnsureOriginIsInitializedInternal(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**, bool*) 	dom/quota/ActorsParent.cpp:5220
6 	xul.dll 	mozilla::dom::quota::QuotaManager::EnsureOriginIsInitialized(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**) 	dom/quota/ActorsParent.cpp:5166
7 	xul.dll 	mozilla::dom::cache::Context::QuotaInitRunnable::Run() 	dom/cache/Context.cpp:434
Crash Signature: mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | nsBaseHashtable<T>::Remove | mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release] [@ mozilla::dom::quota::OriginInfo::R… → mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | nsBaseHashtable<T>::Remove | mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release]
https://crash-stats.mozilla.com/report/index/6b4f365f-7e0b-4f12-b317-c53fe0171003

0 	xul.dll 	mozilla::dom::quota::OriginInfo::`scalar deleting destructor'(unsigned int) 	
1 	xul.dll 	RefPtr<mozilla::dom::quota::OriginInfo>::~RefPtr<mozilla::dom::quota::OriginInfo>() 	mfbt/RefPtr.h:79
2 	xul.dll 	nsTArray_Impl<RefPtr<mozilla::dom::quota::OriginInfo>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned __int64, unsigned __int64) 	xpcom/ds/nsTArray.h:2060
3 	xul.dll 	mozilla::dom::quota::GroupInfo::LockedRemoveOriginInfos() 	dom/quota/ActorsParent.cpp:5999
4 	xul.dll 	mozilla::dom::quota::QuotaManager::RemoveQuota() 	dom/quota/ActorsParent.cpp:3815
5 	xul.dll 	mozilla::dom::quota::QuotaManager::EnsureOriginIsInitializedInternal(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**, bool*) 	dom/quota/ActorsParent.cpp:5220
6 	xul.dll 	mozilla::dom::quota::QuotaManager::EnsureOriginIsInitialized(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**) 	dom/quota/ActorsParent.cpp:5166
7 	xul.dll 	mozilla::dom::indexedDB::`anonymous namespace'::OpenDatabaseOp::DoDatabaseWork 	dom/indexedDB/ActorsParent.cpp:21588
8 	xul.dll 	mozilla::dom::indexedDB::`anonymous namespace'::FactoryOp::Run

We cleanup partially initialized quota in EnsureOriginIsInitializedInternal but somehow some QuotaObjects are still alive.
(In reply to Shawn Huang [:shawnjohnjr] from comment #31)
> We cleanup partially initialized quota in EnsureOriginIsInitializedInternal
> but somehow some QuotaObjects are still alive.

Interesting, these assertions uncovered unrelated (but important to know) issues.
I think in this case, something called GetQuotaObject() while EnsureOriginIsInitialized was initializing storage.
We could uncover that by adding a diag assert to GetQuotaObject() to check mTemporaryStorageIsInitialized (note that when you access mTemporaryStorageIsInitialized you should be on the quota manager I/O thread or you have to use the quota mutex to protect the access).
Another signature for this release assertion on OSX: bp-d0d6814a-3923-42df-9c8d-f399d0171005
Crash Signature: mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | nsBaseHashtable<T>::Remove | mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release] → mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | nsBaseHashtable<T>::Remove | mozilla::dom::quota::QuotaObject::Release] [@ PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release] [@ nsTArray_Impl<T>::RemoveElementsAt | mo…
An another different assertion hit from AsmJsCache.
https://crash-stats.mozilla.com/report/index/a2b70019-c806-4055-a270-5d8c30171007

0 	xul.dll 	mozilla::dom::quota::OriginInfo::`scalar deleting destructor'(unsigned int) 	
1 	xul.dll 	RefPtr<mozilla::dom::quota::OriginInfo>::~RefPtr<mozilla::dom::quota::OriginInfo>() 	mfbt/RefPtr.h:79
2 	xul.dll 	nsTArray_Impl<RefPtr<mozilla::dom::quota::OriginInfo>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned __int64, unsigned __int64) 	xpcom/ds/nsTArray.h:2060
3 	xul.dll 	mozilla::dom::quota::GroupInfo::`scalar deleting destructor'(unsigned int) 	
4 	xul.dll 	RefPtr<mozilla::dom::quota::GroupInfo>::~RefPtr<mozilla::dom::quota::GroupInfo>() 	mfbt/RefPtr.h:79
5 	xul.dll 	mozilla::dom::quota::GroupInfoPair::`scalar deleting destructor'(unsigned int) 	
6 	xul.dll 	nsBaseHashtableET<nsCStringHashKey, nsAutoPtr<mozilla::dom::quota::GroupInfoPair> >::`scalar deleting destructor'(unsigned int) 	
7 	xul.dll 	PLDHashTable::~PLDHashTable() 	xpcom/ds/PLDHashTable.cpp:325
8 	xul.dll 	mozilla::dom::quota::QuotaManager::~QuotaManager() 	dom/quota/ActorsParent.cpp:3222
9 	xul.dll 	mozilla::dom::quota::QuotaManager::`scalar deleting destructor'(unsigned int) 	
10 	xul.dll 	mozilla::dom::quota::QuotaManager::Release() 	dom/quota/QuotaManager.h:109
11 	xul.dll 	RefPtr<mozilla::dom::quota::QuotaManager>::~RefPtr<mozilla::dom::quota::QuotaManager>() 	mfbt/RefPtr.h:79
12 	xul.dll 	mozilla::dom::quota::DirectoryLockImpl::~DirectoryLockImpl() 	dom/quota/ActorsParent.cpp:2564
13 	xul.dll 	mozilla::dom::quota::DirectoryLockImpl::Release() 	dom/quota/ActorsParent.cpp:367
14 	xul.dll 	mozilla::dom::asmjscache::`anonymous namespace'::ParentRunnable::FinishOnOwningThread
Attachment #8915866 - Flags: review?(jvarga) → review+
An another IDB related pattern:

https://crash-stats.mozilla.com/report/index/f46dd342-f3c8-4588-aa2c-038ae0171005


0 	libxul.so 	mozilla::dom::quota::OriginInfo::Release 	dom/quota/ActorsParent.cpp:518
1 	libxul.so 	nsTArray_Impl<RefPtr<mozilla::dom::quota::OriginInfo>, nsTArrayInfallibleAllocator>::RemoveElementsAt 	mfbt/RefPtr.h:41
2 	libxul.so 	mozilla::dom::quota::GroupInfo::Release 	xpcom/ds/nsTArray.h:1738
3 	libxul.so 	mozilla::dom::quota::GroupInfoPair::~GroupInfoPair 	mfbt/RefPtr.h:41
4 	libxul.so 	nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsAutoPtr<mozilla::dom::quota::GroupInfoPair> > >::s_ClearEntry 	xpcom/base/nsAutoPtr.h:78
5 	libxul.so 	PLDHashTable::~PLDHashTable 	xpcom/ds/PLDHashTable.cpp:325
6 	libxul.so 	mozilla::dom::quota::QuotaManager::~QuotaManager() 	
7 	libxul.so 	mozilla::dom::quota::QuotaManager::Release 	dom/quota/ActorsParent.cpp:3222
8 	libxul.so 	mozilla::dom::quota::DirectoryLockImpl::~DirectoryLockImpl() 	
9 	libxul.so 	mozilla::dom::quota::DirectoryLockImpl::Release 	dom/quota/ActorsParent.cpp:367
10 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::Database::ConnectionClosedCallback() 	
11 	libxul.so 	mozilla::detail::RunnableMethodImpl<mozilla::dom::indexedDB::(anonymous namespace)::Database*, void (mozilla::dom::indexedDB::(anonymous namespace)::Database::*)(), true, (mozilla::RunnableKind)0>::Run() 	
12 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::WaitForTransactionsHelper::MaybeWaitForFileHandles() 	
13 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::WaitForTransactionsHelper::Run() 	
14 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::Database::MaybeCloseConnection() 	
15 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::Database::CloseInternal() 	
16 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::Database::Invalidate() 	
17 	libxul.so 	mozilla::dom::indexedDB::PBackgroundIDBDatabaseParent::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) 	
18 	libxul.so 	mozilla::dom::indexedDB::PBackgroundIDBFactoryParent::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) 	
19 	libxul.so 	mozilla::ipc::PBackgroundParent::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) 	
20 	libxul.so 	mozilla::ipc::PBackgroundParent::OnChannelClose 	ipc/ipdl/PBackgroundParent.cpp:2381
21 	libxul.so 	mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() 	
22 	libxul.so 	mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel*, void (mozilla::ipc::MessageChannel::*)(), false, (mozilla::RunnableKind)1u>::Run
(In reply to Jan Varga [:janv] from comment #29)
> This probably uncovered a different problem, the stack indicates that quota
> manager is already gone (shutdown) but IDB is still finishing and once it
> releases a directory lock, quota manager is destroyed for real (because a
> directory lock holds a strong ref to quota manager). Quota manager
> destructor starts destroying GoupInfoPairs that destroys GroupInfos and
> finally OriginInfos. However there's still a live QuotaObject ...
For Shutdown cases, is it okay to set null to pointers to OriginInfo in QuotaObjects and set the abort flags when destroying OriginInfo?
Then all functions directly return in QuotaObjects which require OriginInfo.
Since it's in Shutdown phase, it doesn't make any sense to make QuotaObject collect information from OriginInfo and calculate Quota per origin. I guess we can't guarantee that OriginInfo's lifetime is longer than QuotaObject's when destroying QuotaManager during Shutdown phase.
The problem is that a QuotaObject actually exists during quota manager destruction, if it exists it means that something is still accessing files in profile/storage.

Client::ShutdownWorkThreads is called when we start the shutdown sequence, each quota client should close everything in that method.
Backed out for often asserting mTemporaryStorageInitialized at dom/quota/ActorsParent.cpp:3936 and crashing on Android:

https://hg.mozilla.org/integration/mozilla-inbound/rev/943d78f2c4c3198a1315cde858f61178aa6dd6f5

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0b1a22a4c03cf1c66c47e41a0113969b7289d1b3&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=exception&filter-resultStatus=retry

Please also check the Valgrind failure.

Log with assertion: https://treeherder.mozilla.org/logviewer.html#?job_id=135667471&repo=mozilla-inbound

[task 2017-10-09T09:29:27.475Z] 09:29:27     INFO - TEST-START | dom/tests/mochitest/localstorage/test_localStorageQuotaSessionOnly2.html
[task 2017-10-09T09:29:27.477Z] 09:29:27     INFO - TEST-SKIP | dom/tests/mochitest/localstorage/test_localStorageQuotaSessionOnly2.html | took 0ms
[task 2017-10-09T09:29:27.480Z] 09:29:27     INFO - Running manifest: dom/tests/mochitest/dom-level0/mochitest.ini
[task 2017-10-09T09:29:27.727Z] 09:29:27     INFO -  Setting pipeline to PAUSED ...
[task 2017-10-09T09:29:27.732Z] 09:29:27     INFO -  Pipeline is PREROLLING ...
[task 2017-10-09T09:29:27.736Z] 09:29:27     INFO -  Pipeline is PREROLLED ...
[task 2017-10-09T09:29:27.737Z] 09:29:27     INFO -  Setting pipeline to PLAYING ...
[task 2017-10-09T09:29:27.738Z] 09:29:27     INFO -  New clock: GstSystemClock
[task 2017-10-09T09:29:27.775Z] 09:29:27     INFO -  Got EOS from element "pipeline0".
[task 2017-10-09T09:29:27.777Z] 09:29:27     INFO -  Execution ended after 33280216 ns.
[task 2017-10-09T09:29:27.777Z] 09:29:27     INFO -  Setting pipeline to PAUSED ...
[task 2017-10-09T09:29:27.779Z] 09:29:27     INFO -  Setting pipeline to READY ...
[task 2017-10-09T09:29:27.780Z] 09:29:27     INFO -  Setting pipeline to NULL ...
[task 2017-10-09T09:29:27.780Z] 09:29:27     INFO -  Freeing pipeline ...
[task 2017-10-09T09:29:27.851Z] 09:29:27     INFO -  22
[task 2017-10-09T09:29:28.068Z] 09:29:28     INFO -  pk12util: PKCS12 IMPORT SUCCESSFUL
[task 2017-10-09T09:29:28.333Z] 09:29:28     INFO - MochitestServer : launching [u'/builds/worker/workspace/build/tests/bin/xpcshell', '-g', '/builds/worker/workspace/build/application/firefox', '-v', '170', '-f', '/builds/worker/workspace/build/tests/bin/components/httpd.js', '-e', "const _PROFILE_PATH = '/tmp/tmpxmy7gy.mozrunner'; const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', '/builds/worker/workspace/build/tests/mochitest/server.js']
[task 2017-10-09T09:29:28.335Z] 09:29:28     INFO - runtests.py | Server pid: 1054
[task 2017-10-09T09:29:28.393Z] 09:29:28     INFO - runtests.py | Websocket server pid: 1057
[task 2017-10-09T09:29:28.454Z] 09:29:28     INFO - runtests.py | SSL tunnel pid: 1060
[task 2017-10-09T09:29:29.160Z] 09:29:29     INFO - runtests.py | Running with e10s: True
[task 2017-10-09T09:29:29.162Z] 09:29:29     INFO - runtests.py | Running tests: start.
[task 2017-10-09T09:29:29.162Z] 09:29:29     INFO - 
[task 2017-10-09T09:29:29.177Z] 09:29:29     INFO -  Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2017-10-09T09:29:29.182Z] 09:29:29     INFO -  [1054, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/dom/media/CubebUtils.cpp, line 370
[task 2017-10-09T09:29:29.186Z] 09:29:29     INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox -marionette -foreground -profile /tmp/tmpxmy7gy.mozrunner
[task 2017-10-09T09:29:29.247Z] 09:29:29     INFO - runtests.py | Application pid: 1079
[task 2017-10-09T09:29:29.250Z] 09:29:29     INFO - TEST-INFO | started process GECKO(1079)
[task 2017-10-09T09:29:29.404Z] 09:29:29     INFO - GECKO(1079) | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /tmp/tmpxmy7gy.mozrunner/runtests_leaks.log
[task 2017-10-09T09:29:34.292Z] 09:29:34     INFO - GECKO(1079) | 1507541374281	Marionette	INFO	Enabled via --marionette
[task 2017-10-09T09:29:34.369Z] 09:29:34     INFO - GECKO(1079) | ++DOCSHELL 0x7ff764395800 == 1 [pid = 1079] [id = {ae96ae01-8955-4e74-bbce-ce29e8922a8f}]
[task 2017-10-09T09:29:34.374Z] 09:29:34     INFO - GECKO(1079) | ++DOMWINDOW == 1 (0x7ff764396000) [pid = 1079] [serial = 1] [outer = (nil)]
[task 2017-10-09T09:29:34.415Z] 09:29:34     INFO - GECKO(1079) | ++DOMWINDOW == 2 (0x7ff7643a5800) [pid = 1079] [serial = 2] [outer = 0x7ff764396000]
[task 2017-10-09T09:29:34.680Z] 09:29:34     INFO - GECKO(1079) | ++DOCSHELL 0x7ff75db83000 == 2 [pid = 1079] [id = {b2fb11d1-7d0a-42dc-af53-dd6bf09488d7}]
[task 2017-10-09T09:29:34.682Z] 09:29:34     INFO - GECKO(1079) | ++DOMWINDOW == 3 (0x7ff75db83800) [pid = 1079] [serial = 3] [outer = (nil)]
[task 2017-10-09T09:29:34.684Z] 09:29:34     INFO - GECKO(1079) | ++DOMWINDOW == 4 (0x7ff75db84800) [pid = 1079] [serial = 4] [outer = 0x7ff75db83800]
[task 2017-10-09T09:29:34.881Z] 09:29:34     INFO - GECKO(1079) | LoadPlugin() /tmp/tmpxmy7gy.mozrunner/plugins/libnptest.so returned 7ff75da90d00
[task 2017-10-09T09:29:34.917Z] 09:29:34     INFO - GECKO(1079) | LoadPlugin() /tmp/tmpxmy7gy.mozrunner/plugins/libnpswftest.so returned 7ff75da90f40
[task 2017-10-09T09:29:34.925Z] 09:29:34     INFO - GECKO(1079) | LoadPlugin() /tmp/tmpxmy7gy.mozrunner/plugins/libnpsecondtest.so returned 7ff75da90fa0
[task 2017-10-09T09:29:34.966Z] 09:29:34     INFO - GECKO(1079) | LoadPlugin() /tmp/tmpxmy7gy.mozrunner/plugins/libnpthirdtest.so returned 7ff7649c58e0
[task 2017-10-09T09:29:35.055Z] 09:29:35     INFO - GECKO(1079) | ++DOMWINDOW == 5 (0x7ff75c788800) [pid = 1079] [serial = 5] [outer = 0x7ff764396000]
[task 2017-10-09T09:29:35.397Z] 09:29:35     INFO - GECKO(1079) | ++DOCSHELL 0x7ff75b58f000 == 3 [pid = 1079] [id = {21d9b2bf-d5d4-4ca7-a27f-8394f200467f}]
[task 2017-10-09T09:29:35.402Z] 09:29:35     INFO - GECKO(1079) | ++DOMWINDOW == 6 (0x7ff75b58f800) [pid = 1079] [serial = 6] [outer = (nil)]
[task 2017-10-09T09:29:35.592Z] 09:29:35     INFO - GECKO(1079) | Assertion failure: mTemporaryStorageInitialized, at /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp:3936
[task 2017-10-09T09:30:15.126Z] 09:30:15     INFO - GECKO(1079) | #01: GetQuotaObjectFromNameAndParameters [xpcom/string/nsTSubstring.h:79]
[task 2017-10-09T09:30:15.127Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.127Z] 09:30:15     INFO - GECKO(1079) | #02: xOpen [mfbt/RefPtr.h:63]
[task 2017-10-09T09:30:15.130Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.412Z] 09:30:15     INFO - GECKO(1079) | #03: sqlite3OsOpen [db/sqlite3/src/sqlite3.c:20817]
[task 2017-10-09T09:30:15.414Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.416Z] 09:30:15     INFO - GECKO(1079) | #04: sqlite3BtreeOpen [db/sqlite3/src/sqlite3.c:52227]
[task 2017-10-09T09:30:15.416Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.419Z] 09:30:15     INFO - GECKO(1079) | #05: openDatabase [db/sqlite3/src/sqlite3.c:144554]
[task 2017-10-09T09:30:15.419Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.420Z] 09:30:15     INFO - GECKO(1079) | #06: mozilla::storage::Connection::initialize [storage/mozStorageConnection.cpp:700]
[task 2017-10-09T09:30:15.421Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.423Z] 09:30:15     INFO - GECKO(1079) | #07: mozilla::storage::Service::OpenDatabaseWithFileURL [storage/mozStorageService.cpp:742]
[task 2017-10-09T09:30:15.423Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.424Z] 09:30:15     INFO - GECKO(1079) | #08: OpenDatabaseAndHandleBusy<nsCOMPtr<nsIFileURL> > [dom/indexedDB/ActorsParent.cpp:4427]
[task 2017-10-09T09:30:15.432Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.434Z] 09:30:15     INFO - GECKO(1079) | #09: CreateStorageConnection [dom/indexedDB/ActorsParent.cpp:4520]
[task 2017-10-09T09:30:15.435Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.437Z] 09:30:15     INFO - GECKO(1079) | #10: OpenDatabaseOp::DoDatabaseWork [dom/indexedDB/ActorsParent.cpp:21669]
[task 2017-10-09T09:30:15.440Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.442Z] 09:30:15     INFO - GECKO(1079) | #11: FactoryOp::Run [dom/indexedDB/ActorsParent.cpp:21418]
[task 2017-10-09T09:30:15.445Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.447Z] 09:30:15     INFO - GECKO(1079) | #12: nsThread::ProcessNextEvent [mfbt/Maybe.h:445]
[task 2017-10-09T09:30:15.449Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.451Z] 09:30:15     INFO - GECKO(1079) | #13: NS_ProcessNextEvent [xpcom/threads/nsThreadUtils.cpp:524]
[task 2017-10-09T09:30:15.455Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.457Z] 09:30:15     INFO - GECKO(1079) | #14: mozilla::ipc::MessagePumpForNonMainThreads::Run [ipc/glue/MessagePump.cpp:368]
[task 2017-10-09T09:30:15.458Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.460Z] 09:30:15     INFO - GECKO(1079) | #15: MessageLoop::RunInternal [ipc/chromium/src/base/message_loop.cc:327]
[task 2017-10-09T09:30:15.462Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.464Z] 09:30:15     INFO - GECKO(1079) | #16: MessageLoop::Run [ipc/chromium/src/base/message_loop.cc:298]
[task 2017-10-09T09:30:15.467Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.468Z] 09:30:15     INFO - GECKO(1079) | #17: nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:427]
[task 2017-10-09T09:30:15.472Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.544Z] 09:30:15     INFO - GECKO(1079) | #18: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:219]
[task 2017-10-09T09:30:15.544Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.544Z] 09:30:15     INFO - GECKO(1079) | #19: libpthread.so.0 + 0x76ba
[task 2017-10-09T09:30:15.547Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.547Z] 09:30:15     INFO - GECKO(1079) | #20: libc.so.6 + 0x1073dd
[task 2017-10-09T09:30:15.547Z] 09:30:15     INFO - 
[task 2017-10-09T09:30:15.547Z] 09:30:15     INFO - GECKO(1079) | #21: ??? (???:???)
[task 2017-10-09T09:30:15.552Z] 09:30:15     INFO - GECKO(1079) | ExceptionHandler::GenerateDump cloned child 1123
[task 2017-10-09T09:30:15.554Z] 09:30:15     INFO - GECKO(1079) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2017-10-09T09:30:15.557Z] 09:30:15     INFO - GECKO(1079) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2017-10-09T09:32:29.294Z] 09:32:29     INFO - TEST-INFO | Main app process: exit 11
[task 2017-10-09T09:32:29.297Z] 09:32:29     INFO - Buffered messages finished
[task 2017-10-09T09:32:29.299Z] 09:32:29    ERROR - TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code 11
Flags: needinfo?(shuang)
Comment on attachment 8915866 [details] [diff] [review]
Bug 1389561 - Part 2: Add MOZ_DIAGNOSTIC_ASSERT for mTemporaryStorageInitialized

Review of attachment 8915866 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/quota/ActorsParent.cpp
@@ +3832,5 @@
>  
> +#if defined(NIGHTLY_BUILD)
> +  {
> +    MutexAutoLock autoLock(mQuotaMutex);
> +    MOZ_DIAGNOSTIC_ASSERT(mTemporaryStorageInitialized);

Shawn, I overlooked this, there are probably callers that calls this with PERSISTENCE_TYPE_PERSISTENT, in that case mTemporaryStorageInitialized doesn't have to be true.
Sorry for creating a regression. I will update the patch again.
Flags: needinfo?(shuang)
(In reply to Shawn Huang [:shawnjohnjr] from comment #22)
> I had discussion with Jan in the Storage meeting, he provided some
> suggestion since we don't have high volume of crash reports.
> Here are action plans to tackle down this issue:
> 3. Write one test containing scenario because user left comments in bug
> report, crash happened when they left computers and they saw crash when they
> return:
> (a). Hit global limit on purpose and make vacuum for temporary storage
> happened 
> (b). Send Idle notification to do maintenance
I tried to add a test case and ran on Windows, but no luck to hit crash. Then I will work on that Shutdown case for closing files from QuotaClient.
There are 89 crashes with signature "mozilla::dom::quota::QuotaManager::GetQuotaObject" in nightly 58 with buildid 20171011220113.
The moz crash reason is always "MOZ_RELEASE_ASSERT(mTemporaryStorageInitialized)".
Crash Signature: mozilla::dom::quota::GroupInfoPair::~GroupInfoPair ] → mozilla::dom::quota::GroupInfoPair::~GroupInfoPair ] [@ mozilla::dom::quota::QuotaManager::GetQuotaObject]
Shawn, can you please investigate those new crash reports, it seems everything is coming from DatabaseMaintenance::PerformMaintenanceOnDatabase
You can backout the last patch once you finish the investigation.
Ok, the last diagnostic patch helped a lot. I think the problem is this line:
https://dxr.mozilla.org/mozilla-central/rev/3d918ff5d63442d7b88e1b7e9cb03b832bc28fdf/dom/indexedDB/ActorsParent.cpp#18433

We have acquired a directory lock for database maintenance and we are now on quota manager IO thread and we call EnsureStorageIsInitialized() to make sure the storage directory is upgraded to latest version.
Obviously this is not enough we also need to initialize temporary storage.
Also, I think there could be a problem in IDB's QuotaClient::ShutdownWorkThreads().
We need to make sure that not only all QuotaObjects are released, all DirectoryLock objects belonging to IDB should be released too. I suspect that especially database maintenance could still hold a directory lock after ShutdownWorkThreads() finishes.
Hi Tom,
I also observed many crash reports hitting assertion, after doing LockedGetPaddingSizeFromDB. This seems to me abnormal since you requires QuotaObject when opening database.


0 	xul.dll 	mozilla::dom::quota::QuotaManager::GetQuotaObject(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char16_t> const&, __int64*) 	dom/quota/ActorsParent.cpp:3936
1 	xul.dll 	`anonymous namespace'::GetQuotaObjectFromNameAndParameters 	storage/TelemetryVFS.cpp:317
2 	xul.dll 	`anonymous namespace'::xOpen 	storage/TelemetryVFS.cpp:662
3 	nss3.dll 	sqlite3OsOpen 	db/sqlite3/src/sqlite3.c:20816
4 	nss3.dll 	sqlite3PagerOpen 	db/sqlite3/src/sqlite3.c:52226
5 	nss3.dll 	sqlite3BtreeOpen 	db/sqlite3/src/sqlite3.c:61893
6 	nss3.dll 	openDatabase 	db/sqlite3/src/sqlite3.c:144552
7 	nss3.dll 	sqlite3_open_v2 	db/sqlite3/src/sqlite3.c:144735
8 	xul.dll 	mozilla::storage::Connection::initialize(nsIFileURL*) 	storage/mozStorageConnection.cpp:699
9 	xul.dll 	mozilla::storage::Service::OpenDatabaseWithFileURL(nsIFileURL*, mozIStorageConnection**) 	storage/mozStorageService.cpp:741
10 	xul.dll 	mozilla::dom::cache::OpenDBConnection(mozilla::dom::cache::QuotaInfo const&, nsIFile*, mozIStorageConnection**) 	dom/cache/DBAction.cpp:221
11 	xul.dll 	`anonymous namespace'::LockedGetPaddingSizeFromDB 	dom/cache/QuotaClient.cpp:87
12 	xul.dll 	`anonymous namespace'::CacheQuotaClient::GetUsageForOrigin 	dom/cache/QuotaClient.cpp:180
Flags: needinfo?(ttung)
I opened the DB (cache.sqlite) to restore the padding size if the padding file is missing or there is a temporary padding file. I didn't notice that it will utilize the VFS to get a Quota Object while opening the DB which shouldn't be opened while QM initialization.
Flags: needinfo?(ttung)
If the padding file is fine to be read and the temporary padding file does not exist while getting usage, cache's quota client will just read the padding size from that file. But if the result isn't as expected, cache's quota client will try to open the DB to get the correct padding size.

Hi Jan,

This issue happens when the QM is initializing, and it's on the stage of getting usage of origins from quota clients. At this moment, if there is a temporary padding file or the padding file is missing in the cache directory, cache's quota client cannot get the padding size. The only way to get the correct size is to open the cache.sqlite to get the overall padding size. However, it also gets the quota object.

IIUC, GetQuotaObject() should fail if we won't have this assertion because the originInfo is not created at this moment. Thus, the QM will fail to initialize the origin. We should fix this.

Do you have some idea to avoid this? Thanks!
Flags: needinfo?(jvarga)
An another pattern, when doing InitializeRepository(), this indirectly calls GetQuotaObject().


https://crash-stats.mozilla.com/report/index/d89c3dd1-c4d4-4712-8b2a-b260e0171012
 0 	libxul.so 	mozilla::dom::quota::QuotaManager::GetQuotaObject(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char16_t> const&, long*) [clone .cold.254] 	
1 	libxul.so 	(anonymous namespace)::GetQuotaObjectFromNameAndParameters(char const*, char const*) 	
2 	libxul.so 	(anonymous namespace)::xOpen(sqlite3_vfs*, char const*, sqlite3_file*, int, int*) 	
3 	libmozsqlite3.so 	sqlite3BtreeOpen 	
4 	libmozsqlite3.so 	openDatabase 	
5 	libxul.so 	mozilla::storage::Connection::initialize(nsIFileURL*) 	
6 	libxul.so 	mozilla::storage::Service::OpenDatabaseWithFileURL(nsIFileURL*, mozIStorageConnection**) 	
7 	libxul.so 	nsresult mozilla::dom::indexedDB::(anonymous namespace)::OpenDatabaseAndHandleBusy<nsCOMPtr<nsIFileURL> >(mozIStorageService*, nsCOMPtr<nsIFileURL>, mozIStorageConnection**) 	
8 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::CreateStorageConnection(nsIFile*, nsIFile*, nsTSubstring<char16_t> const&, mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, unsigned int, mozIStorageConnection**) 	
9 	libxul.so 	mozilla::dom::indexedDB::FileManager::InitDirectory(nsIFile*, nsIFile*, mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, unsigned int) [clone .cold.622] 	
10 	libxul.so 	mozilla::dom::indexedDB::(anonymous namespace)::QuotaClient::InitOrigin(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, mozilla::Atomic<bool, (mozilla::MemoryOrdering)2, void> const&, mozilla::dom::quota::UsageInfo*) [clone .cold.618] 	
11 	libxul.so 	mozilla::dom::quota::QuotaManager::InitializeOrigin(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, long, bool, nsIFile*) 	
12 	libxul.so 	mozilla::dom::quota::QuotaManager::InitializeRepository 	dom/quota/ActorsParent.cpp:4287
13 	libxul.so 	mozilla::dom::quota::QuotaManager::EnsureOriginIsInitializedInternal(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**, bool*) [clone .cold.246] 	
14 	libxul.so 	mozilla::dom::quota::QuotaManager::EnsureOriginIsInitialized(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**) 	
15 	libxul.so 	mozilla::dom::cache::Context::QuotaInitRunnable::Run 	dom/cache/Context.cpp:438
Shawn, please backout the last patch ASAP.
Flags: needinfo?(jvarga)
Tom, in theory we can open sqlite databases in IDB and DOM cache code without getting a quota object (actually it makes no sense to check quota when we are initializing quota).
We currently open a sqlite database by creating a file URI that contains additional info like group, grigin, etc.
VFS layer can then extract this info and get a QuotaObject.
If we don't pass the additional info in InitOrigin(), a QuotaObject won't be created.

Anyway, while we are here, when we are in InitOrigin() and we have to open a sqlite database, we need to make sure that file size of the database and the wal file is processed after the database is closed. In theory, it can happen that even pure read operations can change size of those files.
(In reply to Jan Varga [:janv] from comment #57)
Oh, I see! Learn a lesson form it! Thanks for the explanation!

I still have one question, and I put it below.

> Anyway, while we are here, when we are in InitOrigin() and we have to open a
> sqlite database, we need to make sure that file size of the database and the
> wal file is processed after the database is closed. In theory, it can happen
> that even pure read operations can change size of those files.

How do we make sure the file size of the database and the wal file is processed? By closing the connection explicitly? Or the mozIStorageConnection object be deleted?
connection->Close();
connection = nullptr;
(In reply to Jan Varga [:janv] from comment #59)
> connection->Close();
> connection = nullptr;

Got it! Thanks!

I think forgot to close the connection, I just let this object be deleted when leaving the scope [1]. I'll file a bug for it.

[1] http://searchfox.org/mozilla-central/source/dom/cache/QuotaClient.cpp#73-105
See Also: → 1407939
Crash Signature: mozilla::dom::quota::GroupInfoPair::~GroupInfoPair ] [@ mozilla::dom::quota::QuotaManager::GetQuotaObject] → mozilla::dom::quota::GroupInfoPair::~GroupInfoPair ]
Comment on attachment 8918249 [details] [diff] [review]
Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork

Review of attachment 8918249 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +18433,5 @@
> +  nsCString dummySuffix;
> +  nsCOMPtr<nsIFile> directory;
> +  nsresult rv;
> +
> +  for (DirectoryInfo& directoryInfo : mDirectoryInfos) {

mDirectoryInfos is empty here, it's initialized below
I did not separate InitializeRepository() from EnsureOriginIsInitializedInternal().  Multiple times 'if' check requires to go through in EnsureOriginIsInitializedInternal(). Since CheckTemporaryStorageLimits() may trigger eviction and EnsureOriginDirectory() may add it back. So I decided not to separate InitializeRepository() from EnsureOriginIsInitializedInternal().
Attachment #8918786 - Flags: review?(jvarga)
Comment on attachment 8918786 [details] [diff] [review]
Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork

Review of attachment 8918786 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ -18429,5 @@
>  
>    QuotaManager* quotaManager = QuotaManager::Get();
>    MOZ_ASSERT(quotaManager);
>  
> -  nsresult rv = quotaManager->EnsureStorageIsInitialized();

Why you want to remove this ?

We iterate over profile/storage, for that we need to be sure that the directory structure is up to date.

@@ +18700,5 @@
> +  // Idle maintenance may occur before origin is initailized.
> +  // Ensure origin is initialized first. It will initialize all origins for
> +  // temporary storage including IDB origins.
> +  for (DirectoryInfo& directoryInfo : mDirectoryInfos) {
> +    rv = quotaManager->EnsureOriginIsInitialized(PERSISTENCE_TYPE_DEFAULT,

PERSISTENCE_TYPE_DEFAULT ? why ?
The structure contains persistenceType too, no ?

@@ +18701,5 @@
> +  // Ensure origin is initialized first. It will initialize all origins for
> +  // temporary storage including IDB origins.
> +  for (DirectoryInfo& directoryInfo : mDirectoryInfos) {
> +    rv = quotaManager->EnsureOriginIsInitialized(PERSISTENCE_TYPE_DEFAULT,
> +                                                 dummySuffix,

passing dummySuffix ? why ?
It's an input argument (not output)
Attachment #8918786 - Flags: review?(jvarga)
(In reply to Shawn Huang [:shawnjohnjr] from comment #65)
> Created attachment 8918786 [details] [diff] [review]
> Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork
> 
> I did not separate InitializeRepository() from
> EnsureOriginIsInitializedInternal().  Multiple times 'if' check requires to
> go through in EnsureOriginIsInitializedInternal(). Since
> CheckTemporaryStorageLimits() may trigger eviction and
> EnsureOriginDirectory() may add it back. So I decided not to separate
> InitializeRepository() from EnsureOriginIsInitializedInternal().

I don't get this.
(In reply to Jan Varga [:janv] from comment #66)
> Comment on attachment 8918786 [details] [diff] [review]
> Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork
> 
> Review of attachment 8918786 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/indexedDB/ActorsParent.cpp
> @@ -18429,5 @@
> >  
> >    QuotaManager* quotaManager = QuotaManager::Get();
> >    MOZ_ASSERT(quotaManager);
> >  
> > -  nsresult rv = quotaManager->EnsureStorageIsInitialized();
> 
> Why you want to remove this ?
> 
> We iterate over profile/storage, for that we need to be sure that the
> directory structure is up to date.

http://searchfox.org/mozilla-central/rev/40b456626e2d0409b7034768b4d9526fc7235ea4/dom/quota/ActorsParent.cpp#5193
I removed it because I thought QuotaManager::EnsureOriginIsInitializedInternal already call it.
Since we have to iterate over profile/storage first, I will add it back.

> @@ +18700,5 @@
> > +  // Idle maintenance may occur before origin is initailized.
> > +  // Ensure origin is initialized first. It will initialize all origins for
> > +  // temporary storage including IDB origins.
> > +  for (DirectoryInfo& directoryInfo : mDirectoryInfos) {
> > +    rv = quotaManager->EnsureOriginIsInitialized(PERSISTENCE_TYPE_DEFAULT,
> 
> PERSISTENCE_TYPE_DEFAULT ? why ?
> The structure contains persistenceType too, no ?
I will use persistenceType.

> @@ +18701,5 @@
> > +  // Ensure origin is initialized first. It will initialize all origins for
> > +  // temporary storage including IDB origins.
> > +  for (DirectoryInfo& directoryInfo : mDirectoryInfos) {
> > +    rv = quotaManager->EnsureOriginIsInitialized(PERSISTENCE_TYPE_DEFAULT,
> > +                                                 dummySuffix,
> 
> passing dummySuffix ? why ?
> It's an input argument (not output)
I'm wrong here. I will fix it.
(In reply to Jan Varga [:janv] from comment #67)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #65)
> > Created attachment 8918786 [details] [diff] [review]
> > Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork
> > 
> > I did not separate InitializeRepository() from
> > EnsureOriginIsInitializedInternal().  Multiple times 'if' check requires to
> > go through in EnsureOriginIsInitializedInternal(). Since
> > CheckTemporaryStorageLimits() may trigger eviction and
> > EnsureOriginDirectory() may add it back. So I decided not to separate
> > InitializeRepository() from EnsureOriginIsInitializedInternal().
> 
> I don't get this.

Shall I do the simliar things like Tom did in bug 1373183?
https://bug1373183.bmoattachments.org/attachment.cgi?id=8918109
have to think about that
(In reply to Shawn Huang [:shawnjohnjr] from comment #69)
> Shall I do the simliar things like Tom did in bug 1373183?
> https://bug1373183.bmoattachments.org/attachment.cgi?id=8918109

Note that if you use this function rather than EnsureOriginIsInitialized() in this case, you need to consider the eviction in CheckTemporaryStorageLimits(). The reason is that it may delete the origin (both directory and runtime objects such as originInfo/groupInfo/...) which will be used by Maintenance after a while.
Comment on attachment 8918826 [details] [diff] [review]
Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork

It should make sense to get suffix from metadata2, and pass suffix to EnsureOriginIsInitilized().
Comment on attachment 8918826 [details] [diff] [review]
Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork

Review of attachment 8918826 [details] [diff] [review]:
-----------------------------------------------------------------

Those patches part1 and part2 were just diagnostic. So I wouldn't name this patch part 3

::: dom/indexedDB/ActorsParent.cpp
@@ +9398,5 @@
>  struct Maintenance::DirectoryInfo final
>  {
>    const nsCString mGroup;
>    const nsCString mOrigin;
> +  const nsCString mSuffix;

Nit: mSuffix should go before mGroup

@@ +9405,5 @@
>  
>    DirectoryInfo(PersistenceType aPersistenceType,
>                  const nsACString& aGroup,
>                  const nsACString& aOrigin,
> +                const nsACString& aSuffix,

Nit: before aGroup

@@ +18615,5 @@
>        }
>  
>        nsCString group;
>        nsCString origin;
> +      nsCString suffix;

Nit: before group

@@ +18673,5 @@
>          // Found a database.
>          if (databasePaths.IsEmpty()) {
>            MOZ_ASSERT(group.IsEmpty());
>            MOZ_ASSERT(origin.IsEmpty());
> +          MOZ_ASSERT(suffix.IsEmpty());

Nit: before group

@@ +18718,5 @@
> +                                                 getter_AddRefs(directory));
> +
> +  }
> +
> +  if (NS_WARN_IF(NS_FAILED(rv))) {

Is there a reason this is not checked inside of the loop ?
See Also: → 1408681
(In reply to Jan Varga [:janv] from comment #74)
> Comment on attachment 8918826 [details] [diff] [review]
> Bug 1389561 - Part 3: Ensure origin initialized in Maintenance::DirectoryWork
> 
> Review of attachment 8918826 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Those patches part1 and part2 were just diagnostic. So I wouldn't name this
> patch part 3
Rename to part 1, okay?
> @@ +18718,5 @@
> > +                                                 getter_AddRefs(directory));
> > +
> > +  }
> > +
> > +  if (NS_WARN_IF(NS_FAILED(rv))) {
> 
> Is there a reason this is not checked inside of the loop ?
I will move into the loop.
Comment on attachment 8919163 [details] [diff] [review]
Bug 1389561 - Part 1: Ensure origin initialized in Maintenance::DirectoryWork

Review of attachment 8919163 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +18707,5 @@
> +
> +      // Idle maintenance may occur before origin is initailized.
> +      // Ensure origin is initialized first. It will initialize all origins for
> +      // temporary storage including IDB origins.
> +      rv = quotaManager->EnsureOriginIsInitialized(persistenceType,

Are you sure, this is not called when suffix, group, etc. is empty ?
(In reply to Jan Varga [:janv] from comment #50)
> Also, I think there could be a problem in IDB's
> QuotaClient::ShutdownWorkThreads().
> We need to make sure that not only all QuotaObjects are released, all
> DirectoryLock objects belonging to IDB should be released too. I suspect
> that especially database maintenance could still hold a directory lock after
> ShutdownWorkThreads() finishes.
I'm looking into how to safely release all DirectoryLock objects in QuotaClient::ShutdownWorkThreads() now.
Comment on attachment 8919249 [details] [diff] [review]
Bug 1389561 - Part 1: Ensure origin initialized in Maintenance::DirectoryWork

Review of attachment 8919249 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +18707,5 @@
> +        // Idle maintenance may occur before origin is initailized.
> +        // Ensure origin is initialized first. It will initialize all origins
> +        // for temporary storage including IDB origins.
> +        rv = quotaManager->EnsureOriginIsInitialized(persistenceType,
> +                                                     suffix,

ok, isn't mSuffix now useless in the DirectoryInfo struct ?
Attachment #8919249 - Flags: review?(jvarga)
Comment on attachment 8919607 [details] [diff] [review]
Bug 1389561 - Part 1: Ensure origin initialized in Maintenance::DirectoryWork

Review of attachment 8919607 [details] [diff] [review]:
-----------------------------------------------------------------

Remove mSuffix from DirectoryInfo.
Attachment #8919607 - Flags: review?(jvarga)
Attachment #8919607 - Flags: review?(jvarga) → review+
(In reply to Shawn Huang [:shawnjohnjr] from comment #82)
> (In reply to Jan Varga [:janv] from comment #50)
> > Also, I think there could be a problem in IDB's
> > QuotaClient::ShutdownWorkThreads().
> > We need to make sure that not only all QuotaObjects are released, all
> > DirectoryLock objects belonging to IDB should be released too. I suspect
> > that especially database maintenance could still hold a directory lock after
> > ShutdownWorkThreads() finishes.
> I'm looking into how to safely release all DirectoryLock objects in
> QuotaClient::ShutdownWorkThreads() now.

Something like this should be enough:

MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() { return !QuotaClient->GetCurrentMaintenance(); }));

and maybe Maintenance::IsAborted() should also check:
QuotaClient::IsShuttingDownOnBackgroundThread()
(In reply to Jan Varga [:janv] from comment #86)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #82)
> > (In reply to Jan Varga [:janv] from comment #50)
> > > Also, I think there could be a problem in IDB's
> > > QuotaClient::ShutdownWorkThreads().
> > > We need to make sure that not only all QuotaObjects are released, all
> > > DirectoryLock objects belonging to IDB should be released too. I suspect
> > > that especially database maintenance could still hold a directory lock after
> > > ShutdownWorkThreads() finishes.
> > I'm looking into how to safely release all DirectoryLock objects in
> > QuotaClient::ShutdownWorkThreads() now.
> 
> Something like this should be enough:
> 
> MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() { return
> !QuotaClient->GetCurrentMaintenance(); }));
Tom said he saw some shutdown hang issues after doing SpinEventLoopUntil().
https://hg.mozilla.org/releases/mozilla-release/annotate/8fbf05f4b921/dom/quota/ActorsParent.cpp#l2866

See: https://crash-stats.mozilla.com/report/index/7006bb77-e573-489a-834d-966070171018
Is there anything I should be very careful during shutdown when doing SpinEventLoopUntil()?

More reports:
https://crash-stats.mozilla.com/signature/?signature=shutdownhang%20%7C%20NtWaitForAlertByThreadId%20%7C%20mozilla%3A%3Adom%3A%3Aquota%3A%3AQuotaManager%3A%3AShutdownObserver%3A%3AObserve&date=%3E%3D2017-10-11T10%3A50%3A00.000Z&date=%3C2017-10-18T10%3A50%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1
(In reply to Shawn Huang [:shawnjohnjr] from comment #87)
> Is there anything I should be very careful during shutdown when doing
> SpinEventLoopUntil()?

Of course, you should know what you are doing :)
Test it as much as you can.
Comment on attachment 8920081 [details] [diff] [review]
(WIP)Bug 1389561 - Part 2: Wait for idle maintenance releasing DirectoryLock when finished

Review of attachment 8920081 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +9306,5 @@
>    bool
>    IsAborted() const
>    {
> +    MOZ_ASSERT(mQuotaClient);
> +    MOZ_ASSERT(mQuotaClient->IsShuttingDownOnBackgroundThread());

Why would this be true ?

We call IsAborted() in all the Maintenance:: methods to bail out early, but now we only check mAborted flag, we should also check if quota manager is shutting down. For that IsShuttingDownOnBackgroundThread() can be used if we are on the background thread or IsShuttingDownOnNonBackgroundThread() if we are on some other thread.

@@ +18014,5 @@
>  
>      gFileHandleThreadPool = nullptr;
>    }
> +
> +  MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {

I would place this right after:
if (mMaintenanceThreadPool) {
  mMaintenanceThreadPool->Shutdown();
  mMaintenanceThreadPool = nullptr;
}
(In reply to Jan Varga [:janv] from comment #90)
> Comment on attachment 8920081 [details] [diff] [review]
> (WIP)Bug 1389561 - Part 2: Wait for idle maintenance releasing DirectoryLock
> when finished
> 
> Review of attachment 8920081 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/indexedDB/ActorsParent.cpp
> @@ +9306,5 @@
> >    bool
> >    IsAborted() const
> >    {
> > +    MOZ_ASSERT(mQuotaClient);
> > +    MOZ_ASSERT(mQuotaClient->IsShuttingDownOnBackgroundThread());
> 
> Why would this be true ?
> 
> We call IsAborted() in all the Maintenance:: methods to bail out early, but
> now we only check mAborted flag, we should also check if quota manager is
> shutting down. For that IsShuttingDownOnBackgroundThread() can be used if we
> are on the background thread or IsShuttingDownOnNonBackgroundThread() if we
> are on some other thread.
Thanks for your last comments. I misunderstood Comment 86, I will change it again.
> 
> @@ +18014,5 @@
> >  
> >      gFileHandleThreadPool = nullptr;
> >    }
> > +
> > +  MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {
> 
> I would place this right after:
> if (mMaintenanceThreadPool) {
>   mMaintenanceThreadPool->Shutdown();
>   mMaintenanceThreadPool = nullptr;
> }
Sure.
Comment on attachment 8920988 [details] [diff] [review]
(WIP) Bug 1389561 - Part 3: Wait for deallocating all AsmJsCache Parent Actors in ShutdownWorkThread

Review of attachment 8920988 [details] [diff] [review]:
-----------------------------------------------------------------

I like the approach.

::: dom/asmjscache/AsmJSCache.cpp
@@ +74,5 @@
>  
>  // The number of characters to hash into the Metadata::Entry::mFastHash.
>  static const unsigned sNumFastHashChars = 4096;
>  
> +// Track all created parent actors

all *live* parent actors

@@ +75,5 @@
>  // The number of characters to hash into the Metadata::Entry::mFastHash.
>  static const unsigned sNumFastHashChars = 4096;
>  
> +// Track all created parent actors
> +static nsTArray<const PAsmJSCacheEntryParent*>* sLivedParentActors = nullptr;

sLive (not sLived)

and you can add a typedef for this, see:
https://dxr.mozilla.org/mozilla-central/rev/69e24c678a28dc46a4c1bda3ff04b2f6106ff71a/dom/indexedDB/ActorsParent.cpp#10189

@@ +1058,5 @@
>  
>  void
>  DeallocEntryParent(PAsmJSCacheEntryParent* aActor)
>  {
> +  if (sLivedParentActors) {

I think the "unregistration" can be done in:
ParentRunnable::FinishOnOwningThread()
right after the directory lock is released.

@@ +1059,5 @@
>  void
>  DeallocEntryParent(PAsmJSCacheEntryParent* aActor)
>  {
> +  if (sLivedParentActors) {
> +    sLivedParentActors->RemoveElement(aActor);

You can check IsEmpty() here and delete the array if it's empty.

@@ +1735,5 @@
> +    AssertIsOnBackgroundThread();
> +
> +    if (sLivedParentActors) {
> +      MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {
> +        return sLivedParentActors->IsEmpty();

This can be then just |return !sLiveParentActors|.
Attachment #8920988 - Flags: feedback+
User Comments: crashed while applying update

https://crash-stats.mozilla.com/signature/?product=Firefox&signature=nsTArray_Impl%3CT%3E%3A%3ARemoveElementsAt%20%7C%20mozilla%3A%3Adom%3A%3Aquota%3A%3AGroupInfoPair%3A%3A%7EGroupInfoPair#comments


0 	XUL 	nsTArray_Impl<RefPtr<mozilla::dom::quota::OriginInfo>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned long, unsigned long) 	dom/quota/ActorsParent.cpp:518
1 	XUL 	mozilla::dom::quota::GroupInfoPair::~GroupInfoPair() 	xpcom/ds/nsTArray.h:1738
2 	XUL 	nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsAutoPtr<mozilla::dom::quota::GroupInfoPair> > >::s_ClearEntry(PLDHashTable*, PLDHashEntryHdr*) 	dom/quota/ActorsParent.cpp:630
3 	XUL 	PLDHashTable::~PLDHashTable() 	xpcom/ds/PLDHashTable.cpp:325
4 	XUL 	mozilla::dom::quota::QuotaManager::~QuotaManager() 	xpcom/ds/nsTHashtable.h:391
5 	XUL 	mozilla::dom::quota::QuotaManager::~QuotaManager() 	dom/quota/ActorsParent.cpp:3219
6 	XUL 	mozilla::dom::quota::DirectoryLockImpl::Release() 	dom/quota/ActorsParent.cpp:2553
7 	XUL 	mozilla::dom::indexedDB::(anonymous namespace)::Database::ConnectionClosedCallback() 	mfbt/RefPtr.h:41
8 	XUL 	mozilla::detail::RunnableMethodImpl<mozilla::dom::indexedDB::(anonymous namespace)::Database*, void (mozilla::dom::indexedDB::(anonymous namespace)::Database::*)(), true, (mozilla::RunnableKind)0>::Run() 	xpcom/threads/nsThreadUtils.h:1142
9 	XUL 	mozilla::dom::indexedDB::(anonymous namespace)::WaitForTransactionsHelper::MaybeWaitForFileHandles() 	dom/indexedDB/ActorsParent.cpp:13908
10 	XUL 	mozilla::dom::indexedDB::(anonymous namespace)::WaitForTransactionsHelper::Run() 	dom/indexedDB/ActorsParent.cpp:13873
11 	XUL 	mozilla::dom::indexedDB::(anonymous namespace)::Database::MaybeCloseConnection() 	dom/indexedDB/ActorsParent.cpp:13853
12 	XUL 	mozilla::dom::indexedDB::(anonymous namespace)::Database::Invalidate() 	dom/indexedDB/ActorsParent.cpp:14302
13 	XUL 	mozilla::dom::indexedDB::PBackgroundIDBDatabaseParent::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) 	ipc/ipdl/PBackgroundIDBDatabaseParent.cpp:719
We have a number of crashes on linux nightly hitting the MOZ_DIAGNOSTIC_ASSERT(!mQuotaObjects.Count()) since September 30, as added in https://hg.mozilla.org/mozilla-central/rev/89ffaadaf58b
Crash Signature: mozilla::dom::quota::GroupInfoPair::~GroupInfoPair ] → mozilla::dom::quota::GroupInfoPair::~GroupInfoPair ] [@ mozilla::dom::quota::OriginInfo::Release ]
(In reply to Shawn Huang [:shawnjohnjr] from comment #26)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> 89ffaadaf58b8afabb671cfcc7382aa743f78a51

Shawn, you can backout this patch too. We need to fix all the issues we found before these assertions are added permanently.
(In reply to Jan Varga [:janv] from comment #99)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #26)
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/
> > 89ffaadaf58b8afabb671cfcc7382aa743f78a51
> 
> Shawn, you can backout this patch too. We need to fix all the issues we
> found before these assertions are added permanently.

Done.
https://hg.mozilla.org/integration/mozilla-inbound/rev/34b8dc4374dc3d707402d11ef3e2f9e82791f486
I probably need to write tests for these Shutdown cases, however, i found that it's hard to add auto tests. Then I started to think about adding gtest if possible.
(In reply to Shawn Huang [:shawnjohnjr] from comment #102)
> I found a few user comments mentioned that hit the crash (after executing
> idle maintenance) when they're watching youtube.
> Maybe this is a good lead.
> 
> https://crash-stats.mozilla.com/report/index/f9532c5e-17f3-459b-8d56-
> 8124f0170901
> https://crash-stats.mozilla.com/report/index/47047dd3-c51e-4ccf-8556-
> 1b0ce0170928
> https://crash-stats.mozilla.com/report/index/16cb54de-3b1c-4bd3-aacc-
> f61f60170811
> https://crash-stats.mozilla.com/report/index/cab0a2e3-36da-4851-b232-
> a80120170802

I think these should be fixed by patch Part 1. Can you land it ?
(In reply to Jan Varga [:janv] from comment #103)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #102)
> I think these should be fixed by patch Part 1. Can you land it ?
Yes, if you accept partial landing, I will land it first.
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4a578d5a37b2333db5f733fc4719db706716ce2

Somehow, I'm still trying to reproduce that bug to figure out how to write test case for it.
https://hg.mozilla.org/mozilla-central/rev/c4a578d5a37b
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Hm, this has leave-open keyword, shouldn't be resolved.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla58 → ---
Comment on attachment 8921373 [details] [diff] [review]
Bug 1389561 - Part 3: Wait for deallocating all AsmJsCache Parent Actors in ShutdownWorkThread

Review of attachment 8921373 [details] [diff] [review]:
-----------------------------------------------------------------

The commit info needs some adjustments...

I'll attach a new patch with suggested changes.

::: dom/asmjscache/AsmJSCache.cpp
@@ +74,5 @@
>  
>  // The number of characters to hash into the Metadata::Entry::mFastHash.
>  static const unsigned sNumFastHashChars = 4096;
>  
> +// Track all live parent actors

Nit: Add a dot.

@@ +75,5 @@
>  // The number of characters to hash into the Metadata::Entry::mFastHash.
>  static const unsigned sNumFastHashChars = 4096;
>  
> +// Track all live parent actors
> +typedef nsTArray<const PAsmJSCacheEntryParent*>* AsmJSCacheParentLists;

It's an array not a list so the name should reflect that and I think we can use the concrete class here (ParentRunnable).

@@ +76,5 @@
>  static const unsigned sNumFastHashChars = 4096;
>  
> +// Track all live parent actors
> +typedef nsTArray<const PAsmJSCacheEntryParent*>* AsmJSCacheParentLists;
> +static AsmJSCacheParentLists sLiveParentActors = nullptr;

This can be a StaticAutoPtr.

@@ +807,5 @@
>    FileDescriptorHolder::Finish();
>  
>    mDirectoryLock = nullptr;
> +
> +  if (sLiveParentActors) {

We should assert sLiveParentActors here. If we already registered the object we must be able to unregister it otherwise the nested event loop could never finish.

@@ +812,5 @@
> +    sLiveParentActors->RemoveElement(this);
> +
> +    if (sLiveParentActors->IsEmpty()) {
> +      delete sLiveParentActors;
> +      sLiveParentActors = nullptr;

This is simpler with StaticAutoPtr.

@@ +1053,5 @@
>    RefPtr<ParentRunnable> runnable =
>      new ParentRunnable(aPrincipalInfo, aOpenMode, aWriteParams);
>  
> +  if (!sLiveParentActors) {
> +    sLiveParentActors = new nsTArray<const PAsmJSCacheEntryParent*>();

The primary purpose of the added typedef is to be able shortened type name everywhere.

@@ +1739,5 @@
>    ShutdownWorkThreads() override
> +  {
> +    AssertIsOnBackgroundThread();
> +
> +    MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {

I think the extra |if (sLiveParentActors)| is worth it here. We don't have to touch event loop stuff if there are no actors.
Attachment #8921373 - Flags: review?(jvarga) → review+
Attached patch changed part 3 patch (obsolete) — Splinter Review
Shawn, I already addressed my review comments to speed things up a bit. Please test it locally and on try and if everything is green, you can land it.
(In reply to Jan Varga [:janv] from comment #108)
> Created attachment 8923719 [details] [diff] [review]
> changed part 3 patch
> 
> Shawn, I already addressed my review comments to speed things up a bit.
> Please test it locally and on try and if everything is green, you can land
> it.

I see. I will proceed the rest of things. Thank you.
Shawn, could you also investigate latest crash reports if part 1 fix helped or not. Thanks.
(In reply to Jan Varga [:janv] from comment #110)
> Shawn, could you also investigate latest crash reports if part 1 fix helped
> or not. Thanks.
I monitored for a few days.
According to the latest crash report (signature: PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release), the latest buildid on nightly is 20171026221945 (rev:aa958b29c149).
Part 1 fix is at rev:c4a578d5a37b. So that nightly crash report revision number is behind my changeset.

I don't see any further new crash report from nightly. I also searched for other related signaures, however, all are coming from beta or release channel. It's a good sign, but I guess I might need to wait for a bit longer for confirmation according to the population of nightly users.


Signature
PLDHashTable::Search | mozilla::dom::quota::QuotaObject::Release

https://crash-stats.mozilla.com/signature/?signature=PLDHashTable%3A%3ASearch%20%7C%20mozilla%3A%3Adom%3A%3Aquota%3A%3AQuotaObject%3A%3ARelease&date=%3E%3D2017-10-25T00%3A53%3A08.000Z&date=%3C2017-11-01T00%3A53%3A08.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-build_id&_sort=-date&page=1
Cool, sounds promising.
(In reply to Shawn Huang [:shawnjohnjr] from comment #113)
> Part 3 patch on try:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=f8538a245ce7ede2a7314696674bb98d36dd8a7d&selectedJob=1
> 40999143

Try looks good. But inbound tree is closed :(
Luke, this is needed for following Part 5 patch.
Attachment #8923783 - Flags: review?(luke)
Luke, this improves the fix in Part 3 patch, but it also adds infrastructure for bug 1331209. Especially IsShuttingDownOnBackgroundThread/IsShuttingDownOnNonBackgroundThread will be useful.
Attachment #8923785 - Flags: review?(luke)
Comment on attachment 8920100 [details] [diff] [review]
Bug 1389561 - Part 2: Wait for idle maintenance releasing DirectoryLock when finished

Review of attachment 8920100 [details] [diff] [review]:
-----------------------------------------------------------------

I'm improving this patch a bit, will attach a modified patch today.

::: dom/indexedDB/ActorsParent.cpp
@@ +9307,5 @@
>    IsAborted() const
>    {
> +    bool isShuttingDown = (IsOnBackgroundThread()) ?
> +                            mQuotaClient->IsShuttingDownOnBackgroundThread() :
> +                            mQuotaClient->IsShuttingDownOnNonBackgroundThread();

I changed my mind, I decided to let this method to just check mAborted. It will be better to add IsShuttingDownOnBackgroundThread checks to specific methods.

@@ +18000,5 @@
>      mMaintenanceThreadPool->Shutdown();
>      mMaintenanceThreadPool = nullptr;
>    }
>  
> +  MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {

I think |if (mCurrentMaintenance)| is worth it here.

@@ +18001,5 @@
>      mMaintenanceThreadPool = nullptr;
>    }
>  
> +  MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {
> +    RefPtr<Maintenance> currentMaintenance = GetCurrentMaintenance();

We can check mCurrentMaintenance directly here.
Attachment #8920100 - Flags: review?(jvarga)
Attachment #8923783 - Flags: review?(luke) → review+
Attachment #8923785 - Flags: review?(luke) → review+
(In reply to Jan Varga [:janv] from comment #117)
> Comment on attachment 8920100 [details] [diff] [review]
> Bug 1389561 - Part 2: Wait for idle maintenance releasing DirectoryLock when
> finished
> 
> Review of attachment 8920100 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm improving this patch a bit, will attach a modified patch today.
> 
> ::: dom/indexedDB/ActorsParent.cpp
> I changed my mind, I decided to let this method to just check mAborted. It
> will be better to add IsShuttingDownOnBackgroundThread checks to specific
> methods.
Okay, I will use |IsShuttingDownOnBackgroundThread| to check.

> > +  MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {
> 
> I think |if (mCurrentMaintenance)| is worth it here.
Sure.
> > +  MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() {
> > +    RefPtr<Maintenance> currentMaintenance = GetCurrentMaintenance();
> 
> We can check mCurrentMaintenance directly here.
Definitely.
Attached patch modified part 2 patch (obsolete) — Splinter Review
Shawn, please test "modified part 2 patch" along with part 4 and part 5 on try server.
(In reply to Shawn Huang [:shawnjohnjr] from comment #123)
> try:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=35d3be7a62b79b60d771494e0bf43616fe1cabbb

It looks good. Let's land it.
Ok.
Ok, I think we can close this bug now.

The only thing that we didn't fix yet (discovered by the temporary diagnostic asserts, not related to this bug) is stuff described in comment 57.

Shawn, can you file a new bug for that ?

Ideally, we add the asserts again and keep them if there are no other issues.
Attachment #8924444 - Flags: review+
Attachment #8924112 - Attachment is obsolete: true
Why did we land fixes for a sec-high without sec-approval?
(In reply to Julien Cristau [:jcristau] from comment #130)
> Why did we land fixes for a sec-high without sec-approval?

Sorry, I somehow forgot about that.

Shawn, can you please request it ?
Comment on attachment 8919607 [details] [diff] [review]
Bug 1389561 - Part 1: Ensure origin initialized in Maintenance::DirectoryWork

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
1. When executing indexeddb idle maintenance, if OriginInfo is destroyed, but QuotaObjects are not all released. It's possible that DatabaseMaintenance::Run before origins initialized, it causes a UAF issue.

Additionally, we discovered another potential issue when adding diagnostics assertion, not the original UAF issue.
2. When shutting down, if OriginInfo is destroyed, but QuotaObjects are not all released, it causes a UAF issue.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
- No. Comments only mentioned:
Part 1. Make sure origins had been initialized in Maintenance::DirectoryWork before getting QuotaObject.
Part 2. Wait for releasing any maintenance related directory locks.
Part 3. Release DirectoryLock and QuotaObject objects for AsmJSCache
Part 5. Prevent allocation of parent actors if quota manager is shutting down

Which older supported branches are affected by this flaw?
- It happens on 44 first, then no crash reports, see Bug 1242255. Issues are reported again on Firefox 52-ESR.

If not all supported branches, which bug introduced the flaw?
- It's not a regression.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
- No. The same difficulty to create and equal risky as on m-c.

How likely is this patch to cause regressions; how much testing does it need?
- 
Part 1: Very low risk, the patch just makes sure origin is initialized. If origin is initialized already, nothing will happen.
Test should cover Indexeddb maintenance when idle.

Part 2: Low risk. The patch waits for mCurrentMaintenance is deleted when shutting down. Potential risk is that mCurrentMaintenance is not deleted in the end during shutting down, but it should not happen. Part 2 depends on Part 4, 5.

Part 3: Low risk. The patch waits for sLiveParentActors is deleted when shutting down. The potential risk is that sLiveParentActors is not deleted in the end during shutting down, but it should not happen.

Part 4: No risk. Split AsmJSCache's Client implementation into declaration and definition.

Part 5: Low risk. It just prevents allocation of parent actors if quota manager is shutting down.
Attachment #8919607 - Flags: sec-approval?
Comment on attachment 8924444 [details] [diff] [review]
Bug 1389561 - Part 2: Wait for idle maintenance releasing DirectoryLock when finished. r=janv

See Comment 133.
Attachment #8924444 - Flags: sec-approval?
Comment on attachment 8924551 [details] [diff] [review]
Bug 1389561 - Part 3: Wait for releasing all AsmJSCache parent actors in Client: :ShutdownWorkThreads(). r=janv

See Comment 133.
Attachment #8924551 - Flags: sec-approval?
Comment on attachment 8923783 [details] [diff] [review]
Part 4: Split AsmJSCache's Client implementation into declaration and definition

See Comment 133.
Attachment #8923783 - Flags: sec-approval?
Comment on attachment 8923785 [details] [diff] [review]
Part 5: Prevent allocation of parent actors if quota manager is shutting down

See Comment 133.
Attachment #8923785 - Flags: sec-approval?
We should also request beta approval once we get sec-approval, I'll do that.
(In reply to Jan Varga [:janv] from comment #129)
> Shawn, can you file a new bug for that ?
If I understood the Comment 57 correctly, opened Bug 1414191.
No, I meant the first paragraph, especially "actually it makes no sense to check quota when we are initializing quota"
The thing is that when Client::InitOrigin() is called, quota is not yet initialized, we're in the process of creating objects for quota checks. So if InitOrigin() needs to open a database we probably shouldn't get a QuotaObject for it to check quota.

But, you can also keep the bug you just filed for now.
(In reply to Jan Varga [:janv] from comment #138)
> We should also request beta approval once we get sec-approval, I'll do that.

Err, release-approval, it's already on beta.
Comment on attachment 8919607 [details] [diff] [review]
Bug 1389561 - Part 1: Ensure origin initialized in Maintenance::DirectoryWork

sec-approval=dveditz
Attachment #8919607 - Flags: sec-approval? → sec-approval+
Attachment #8923783 - Flags: sec-approval? → sec-approval+
Attachment #8923785 - Flags: sec-approval? → sec-approval+
Attachment #8924444 - Flags: sec-approval? → sec-approval+
Attachment #8924551 - Flags: sec-approval? → sec-approval+
Unfortunately this crash is so rare on nightly we're not going to know whether the patch "worked" before having to decide whether or not to uplift to 57 beta. Whenever we release this fix we should also release it on ESR-52.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → mozilla58
Group: dom-core-security → core-security-release
Please request ESR52 approval on these patches when you're comfortable doing so.
Flags: needinfo?(jvarga)
Whiteboard: [adv-main58+]
Flags: qe-verify-
Whiteboard: [adv-main58+] → [adv-main58+][post-critsmash-triage]
Over in bug 1331209, we went with the asm.js cache disabling patch for ESR52, so I don't think we need land anything from this bug there.
Flags: needinfo?(jvarga)
Yes, the only reason for landing on ESR52 was that the other bug depended on it.
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.