crash in mozilla::dom::quota::QuotaObject::DisableQuotaCheck
Categories
(Toolkit :: Storage, defect, P2)
Tracking
()
People
(Reporter: shawnjohnjr, Assigned: sg)
References
(Blocks 2 open bugs)
Details
Crash Data
Attachments
(2 files)
Copied from: https://crash-stats.mozilla.com/report/index/83d18b0a-c1cf-49ea-a136-5b2920171106 0 xul.dll mozilla::dom::quota::QuotaObject::DisableQuotaCheck() dom/quota/ActorsParent.cpp:2714 1 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::DatabaseConnection::DisableQuotaChecks dom/indexedDB/ActorsParent.cpp:11355 2 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::Database::StartTransactionOp::DoDatabaseWork dom/indexedDB/ActorsParent.cpp:14927 3 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::TransactionDatabaseOperationBase::RunOnConnectionThread dom/indexedDB/ActorsParent.cpp:23735 4 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::Database::StartTransactionOp::RunOnConnectionThread dom/indexedDB/ActorsParent.cpp:14914 5 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::TransactionDatabaseOperationBase::Run dom/indexedDB/ActorsParent.cpp:23906 6 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1216 7 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:361
Reporter | ||
Comment 1•7 years ago
|
||
Hmm. It doesn't make any sense to see crash there. https://hg.mozilla.org/mozilla-central/annotate/e64846245b00/dom/quota/ActorsParent.cpp#l3002
Comment 2•7 years ago
|
||
Maybe the QuotaObject has been removed without being null'd? So that, firefox crash while accessing its member variables.
Reporter | ||
Comment 3•7 years ago
|
||
Ok. It's not a regression. Reports are showing the earliest reports since v48.
Reporter | ||
Comment 4•7 years ago
|
||
(In reply to Tom Tung [:tt] from comment #2) > Maybe the QuotaObject has been removed without being null'd? So that, > firefox crash while accessing its member variables. https://hg.mozilla.org/mozilla-central/annotate/e64846245b00/dom/indexedDB/ActorsParent.cpp#l11193 But we do have check mQuotaObject there. So it's possible that before mQuotaObject->DisableQuotaCheck() called, mQuotaObject is cleared on different threads.
Comment 5•7 years ago
|
||
(In reply to Shawn Huang [:shawnjohnjr] from comment #4) > (In reply to Tom Tung [:tt] from comment #2) > > Maybe the QuotaObject has been removed without being null'd? So that, > > firefox crash while accessing its member variables. > > https://hg.mozilla.org/mozilla-central/annotate/e64846245b00/dom/indexedDB/ > ActorsParent.cpp#l11193 > > But we do have check mQuotaObject there. So it's possible that before > mQuotaObject->DisableQuotaCheck() called, mQuotaObject is cleared on > different threads. Indeed. Besides, idb::DatabaseConnection holds it by a RefPtr [1]. It shouldn't become a dangling while DatabaseConnection is accessing it. [1] http://searchfox.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#5054
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Comment 6•7 years ago
|
||
Note: 1. "cleanup" transaction is only accessible if "dom.indexeddb.experimental" is set to true (which is "false" by default). 2. "dom.indexeddb.experimental" was not changed according to the "userPrefs" from crash report.
Reporter | ||
Comment 7•7 years ago
|
||
Reporter | ||
Comment 8•7 years ago
|
||
At least one thing we can be sure that both mQuotaObject, mJournalQuotaObject are null. https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/dom/indexedDB/ActorsParent.cpp#11196 The only thing I can say is that mStorageConnection->GetQuotaObjects() fail to get both mQuotaObject, mJournalQuotaObject. But return value is not NS_ERROR_FAILURE.
Reporter | ||
Comment 9•7 years ago
|
||
Maybe like Bug 1389561, when doing cleanup and origin is not initialized, then QuotaObject is null?
Reporter | ||
Comment 10•7 years ago
|
||
After reading through Bug 1182987, I have better understanding the problem. I'm working on test case now.
Reporter | ||
Comment 11•7 years ago
|
||
(In reply to Bevis Tseng[:bevis][:btseng] from comment #6) > Note: > 1. "cleanup" transaction is only accessible if "dom.indexeddb.experimental" > is set to true (which is "false" by default). > 2. "dom.indexeddb.experimental" was not changed according to the "userPrefs" > from crash report. Note: For (2). userPrefs only are listed in crash report whilte list (telemetry environment). https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/toolkit/components/telemetry/TelemetryEnvironment.jsm#185 So this doesn't prove anything.
Reporter | ||
Comment 12•7 years ago
|
||
Bevis, One thing I found that code might not call db.transaction with 'cleanup' option. In https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/dom/indexedDB/IDBDatabase.cpp#676. It's possible that originally in IDBDatabase::Transaction(), aMode is IDBTransactionMode::Readwrite, which is very common usage and when mQuotaExceeded is true (user might have low disk easily) and then we create idb transaction with 'cleanup' mode. Then we can explain why cleanup mode transaction is executed to disable quota check without setting idb experimental flag. And that also explains why we can see crash on FennecAndroid, if Mobile browser users have no intention to set preferences.
Reporter | ||
Comment 13•7 years ago
|
||
We marked the database as "QuotaExceeded" and very next "readwrite" transaction would actually run as "cleanup" transaction. So add/put would be prohibited for that transaction, quota checks would be disabled and vacuuming would be done after commit. But I still don't understand why we QuotaObject is not accessible. If somehow 'cleanup' transaction would disable quota check, and somehow we still hold DirectoryLock and it wouldn't be able to collect inactive origins, OriginInfo should not be cleared.
Reporter | ||
Comment 14•7 years ago
|
||
Current test_cleanup_transaction.js doesn't cover Comment 13 implementation, so I have to write tests for readwrite mode trasnfer to cleanup mode, I guess this is why I can't hit the problem using test_cleanup_transaction.js for a template.
Reporter | ||
Comment 15•7 years ago
|
||
Okay, test_quotaExceeded_recovery.js could test that execution path, but I still need to figure out how to make QuotaObject is not accessible to simulate the original crash.
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Comment 16•5 years ago
|
||
the crash signature curiously started spiking mid-january on release during the 64 cycle, is it worth further looking into that? so far it appears that it's not correlated to particular urls...
Comment 17•5 years ago
|
||
(In reply to [:philipp] from comment #16)
the crash signature curiously started spiking mid-january on release during the 64 cycle, is it worth further looking into that? so far it appears that it's not correlated to particular urls...
I'll redirect that question to Tom who's been working in this area recently.
Comment 18•5 years ago
|
||
Frankly, I don't find any suspicious patch/bug in 64 make the crash increase dramatically.
Keep the ni flag and will look into it deeper.
Comment 19•5 years ago
|
||
i wouldn't expect it to be related to something in 64 in particular, it got released on december 11, whereas this spike happened over a month later.
Reporter | ||
Comment 20•5 years ago
|
||
Here is the further information how we can observe the problem on our project (well, it's the b2g project based on gecko 48).
- We have very low storage space device, and the device consumes storage space constantly (such as writing logs into disk).
- We have a mechanism to aggressively clean up temporary origin space when storage is under 50MB (via b2g disk space watcher not via QuotaManager, since QuotaManager won't actively update device space information, but only when Firefox is initialized).
- The crash happens when we place device in the idle situation. And we suspected that we constantly hit low storage and at the same time we cleanup storage.
- Simply adding nullptr-check guards for both mQuotaObject, mJournalQuotaObject is fine. And we did not see any other crash here after guards been added. At least from our point of view, it's safe to do and we did not see side effects.
We did not spend too much time here to investigate why mQuotaObject, mJournalQuotaObject are nullptr for StorageConnection due to limited manpower.
If you want to investigate further, you must try to make space very low and constantly write data into disk and at the same time try to clean up origin.
Reporter | ||
Comment 21•5 years ago
|
||
(In reply to [:philipp] from comment #19)
i wouldn't expect it to be related to something in 64 in particular, it got released on december 11, whereas this spike happened over a month later.
I think it's not 64 related. It depends on the low-disk-space pre-condition, at least we have seen this bug on 48.
Reporter | ||
Comment 22•5 years ago
|
||
I'm not sure Jan or Andrew would agree to add a trivial nullptr check for mQuotaObject, mJournalQuotaObject. So I guess Tom might want to investigate the root cause here?
Reporter | ||
Comment 23•5 years ago
|
||
(In reply to Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor) from comment #20)
Here is the further information how we can observe the problem on our project (well, it's the b2g project based on gecko 48).
- We have very low storage space device, and the device consumes storage space constantly (such as writing logs into disk).
- We have a mechanism to aggressively clean up temporary origin space when storage is under 50MB (via b2g disk space watcher not via QuotaManager, since QuotaManager won't actively update device space information, but only when Firefox is initialized).
- The crash happens when we place device in the idle situation. And we suspected that we constantly hit low storage and at the same time we cleanup storage.
- Simply adding nullptr-check guards for both mQuotaObject, mJournalQuotaObject is fine. And we did not see any other crash here after guards been added. At least from our point of view, it's safe to do and we did not see side effects.
We did not spend too much time here to investigate why mQuotaObject, mJournalQuotaObject are nullptr for StorageConnection due to limited manpower.
If you want to investigate further, you must try to make space very low and constantly write data into disk and at the same time try to clean up origin.
In our project, here is what we found:
Here is the backtrace:
(gdb) bt
#0 0xb5850090 in mozilla::dom::quota::QuotaObject::DisableQuotaCheck (this=0x0) at /home/claire/KaiOS/gecko/dom/quota/ActorsParent.cpp:2627
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
#2 mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork (this=0xab974380, aConnection=0xa98a1cc0)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14162
#3 0xb58c4322 in mozilla::dom::indexedDB::(anonymous namespace)::TransactionDatabaseOperationBase::RunOnConnectionThread (this=this@entry=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:22545
#4 0xb58c455a in mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::RunOnConnectionThread (this=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14149
#5 0xb58c2d22 in mozilla::dom::indexedDB::(anonymous namespace)::TransactionDatabaseOperationBase::Run (this=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:22632
#6 0xb5d057d6 in mozilla::tasktracer::TracedRunnable::Run (this=0xaba9d730) at /home/claire/KaiOS/gecko/tools/profiler/tasktracer/TracedTaskCommon.cpp:114
#7 0xb4c522ea in nsThread::ProcessNextEvent (this=0xac9fe840, aMayWait=<optimized out>, aResult=0xa680dc57) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:994
#8 0xb4c63c90 in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0xac9fe840, aMayWait=aMayWait@entry=true)
at /home/claire/KaiOS/gecko/xpcom/glue/nsThreadUtils.cpp:290
#9 0xb58bccae in mozilla::dom::indexedDB::(anonymous namespace)::ConnectionPool::ThreadRunnable::Run (this=0xac587e20) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:12720
#10 0xb5d057d6 in mozilla::tasktracer::TracedRunnable::Run (this=0xaba9d700) at /home/claire/KaiOS/gecko/tools/profiler/tasktracer/TracedTaskCommon.cpp:114
#11 0xb4c522ea in nsThread::ProcessNextEvent (this=0xac9fe840, aMayWait=<optimized out>, aResult=0xa680dd0f) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:994
#12 0xb4c63c90 in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0xac9fe840, aMayWait=aMayWait@entry=false)
at /home/claire/KaiOS/gecko/xpcom/glue/nsThreadUtils.cpp:290
#13 0xb4e44648 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0xaba9d370, aDelegate=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/glue/MessagePump.cpp:340
#14 0xb4e34002 in MessageLoop::RunInternal (this=this@entry=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:230
#15 0xb4e340b4 in RunHandler (this=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:223
#16 MessageLoop::Run (this=this@entry=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:203
#17 0xb4c53398 in nsThread::ThreadFunc (aArg=0xac9fe840) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:396
#18 0xb478abbc in _pt_root (arg=0xaca52c80) at ../../../../../gecko/nsprpub/pr/src/pthreads/ptthread.c:216
#19 0xb6e982b0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
warning: Source file is more recent than executable.
10520 }
(gdb) p mQuotaObject
$1 = {mRawPtr = 0x0}
(gdb) p mJournalQuotaObject
$2 = {mRawPtr = 0x0}
mQuotaObject and mJournalQuotaObject are both null pointers.
Frame1:
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
(gdb) p ((mozilla::dom::indexedDB::FileManager*)mFileManager)->mOrigin
$21 = {<nsACString_internal> = {mData = 0xac14cc18 "1024+f+app://system.gaiamobile.org", mLength = 34, mFlags = 5}, <No data fields>}
(gdb) p ((mozilla::dom::indexedDB::FileManager*)mFileManager)->mDatabaseName
$22 = {<nsAString_internal> = {mData = 0xaccd4168 u"asyncStorage", mLength = 12, mFlags = 5}, <No data fields>}
Frame2:
#2 mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork (this=0xab974380, aConnection=0xa98a1cc0)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14162
(gdb) p ((TransactionBase*)mTransaction).mMode
$16 = mozilla::dom::IDBTransaction::CLEANUP
Origin: "1024+f+app://system.gaiamobile.org"
Name: "asyncStorage"
Transaction Mode: CLEANUP
Reporter | ||
Comment 24•5 years ago
|
||
Tom, feel free to take this bug if you want to fix it properly. Since I'm not sure it's acceptable to introduce a trivial fix.
Comment 25•5 years ago
|
||
Thanks for the information! I believe low-disk condition is an important clue to reproduce the issue. I will look into this after implementing the bug 1521541.
Comment 26•5 years ago
|
||
Looking into this
Updated•5 years ago
|
Comment 27•5 years ago
|
||
I wonder if this is somehow an issue on re-escaping the char in the group string [1]. For instance, if I change the spec in [2] to spec = "http://foo'.com";
. The test wouldn't stop once it's executed. The reason is that the quotaObject
mismatching (create a groupInfo
with "foo'.com" but trying to get the groupInfo
object with "foo%27.com"). I suspect if we somehow run into the CLEANUP logic in the indexedDB, we would hit the crash.
[1] https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/dom/quota/ActorsParent.cpp#3732-3735
[2] https://searchfox.org/mozilla-central/source/dom/indexedDB/test/unit/test_quotaExceeded_recovery.js
Comment 28•5 years ago
•
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #27)
I wonder if this is somehow an issue on re-escaping the char in the group string [1].
I meant the group mismatching issue (could be introduced because of the group re-escaping)
Comment 29•5 years ago
|
||
Note that Bug 1576593 is fixed and I suppose this should help to reduce the issues here.
Comment 30•5 years ago
|
||
I suspect this could be caused by bug 1592404 as well and should be fixed by QuotaManager v3
Comment 31•4 years ago
|
||
Unassign myself because I'm not actively work on this and I believe QMv3 should fix most of issues for this bug.
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 32•4 years ago
•
|
||
Based on crash report db9af6ba-607b-43b9-b7a8-c7f4f0200114, DatabaseConnection::mQuotaObject
is nullptr
here, meaning that GetQuotaObjectForFile
returned a nullptr
when called from Connection::GetQuotaObjects
at https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/storage/mozStorageConnection.cpp#2267 but Connection::GetQuotaObjects
was still returning NS_OK
.
(I am actually just repeating what https://bugzilla.mozilla.org/show_bug.cgi?id=1414737#c8 already said.)
Should this be checked in Connection::GetQuotaObjects
and an error code returned instead?
Assignee | ||
Updated•4 years ago
|
Comment 33•4 years ago
•
|
||
I wonder, how this is supposed to work:
already_AddRefed<QuotaObject> GetQuotaObjectForFile(sqlite3_file* pFile) {
MOZ_ASSERT(pFile);
telemetry_file* p = (telemetry_file*)pFile;
RefPtr<QuotaObject> result = p->quotaObject;
return result.forget();
}
assumes that pFile
is of type telemetry_file
. But in the calling function, we have:
sqlite3_file* file;
int srv = ::sqlite3_file_control(mDBConn, nullptr, SQLITE_FCNTL_FILE_POINTER,
&file);
if (srv != SQLITE_OK) {
return convertResultCode(srv);
}
RefPtr<QuotaObject> databaseQuotaObject = GetQuotaObjectForFile(file);
Is there any magic callback inside sqlite3_file_control
that allocates a telemetry_file
object?
Assignee | ||
Comment 34•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #33)
I wonder, how this is supposed to work:
telemetry_file
is a pseudo-subclass of sqlite3_file
: https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/storage/TelemetryVFS.cpp#149
I wonder if this should also be changed to proper inheritance.
Comment 35•4 years ago
|
||
Yes, good old C-"Inheritance". And the initialization of TelemetryVFS accounts for the telemetry_file
size, of course. So this explains why this works in the normal case (still it reads bad) :-).
Probably we see an operation on an already closed database (file) here (assuming that sqlite nulls its internal file pointer in that case) and should just return an error, rather than crash. Such that the caller can figure out, if he can still do something meaningful in this situation.
Comment 36•4 years ago
|
||
I don't think it's an already closed database. It's more like there's no quota object for the given database, but there should always be a quota object for an IDB database (except for persistent databases, but DisableQuotaChecks shouldn't be called for them).
Comment 37•4 years ago
|
||
int xClose(sqlite3_file* pFile) {
telemetry_file* p = (telemetry_file*)pFile;
int rc;
{ // Scope for IOThreadAutoTimer
IOThreadAutoTimer ioTimer(IOInterposeObserver::OpClose);
rc = p->pReal->pMethods->xClose(p->pReal);
}
if (rc == SQLITE_OK) {
delete p->base.pMethods;
p->base.pMethods = nullptr;
p->quotaObject = nullptr;
delete[] p->location;
#ifdef DEBUG
p->fileChunkSize = 0;
#endif
}
return rc;
}
where the quotaObject is explicitly set to null. Might not be the only case this happens, though.
Comment 38•4 years ago
•
|
||
But then Connection::GetQuotaObjects will return NS_ERROR_NOT_INITIALIZED because the connection is closed.
Comment 39•4 years ago
|
||
So the assumption/invariant is, that if we have a valid mDBConn
we also must have a valid QuotaObject inside? This seems to be broken in this specific case (for whatever reason).
Well, we would definitely benefit from a recording of that rather than guessing from the code. Still we might want to consider to follow Shawn's suggestion https://bugzilla.mozilla.org/show_bug.cgi?id=1414737#c20 to check for a nullptr and fail more gently in case.
Comment 40•4 years ago
|
||
I'm ok with adding a check for nullptr if we also add something that won't hide this problem. I need more time to figure out how to do that or someone else can try.
Assignee | ||
Comment 41•4 years ago
|
||
Adding an NS_WARN_IF and an appropriate comment in the code and opening a new Bug here that covers analyzing the root cause seems to be sufficiently visible to me.
Another possibility would be to add telemetry so we can track the number of such failures.
Comment 42•4 years ago
|
||
New telemetry sounds good to me, while we are here, the telemetry should check that there are quota objects after opening the database (not just during the cleanup transaction).
Comment 43•4 years ago
|
||
(still in guess mode and obviously ok to not hide the cause)
I see in two reports I randomly clicked on parallel operations on IndexedDB on another thread: 8b0e7f1f-8d58-4e0f-828d-1f7f10200115 and 8b0e7f1f-8d58-4e0f-828d-1f7f10200115. We might see the effect of a race here?
Assignee | ||
Comment 44•4 years ago
|
||
To resolve the crashes, as part of this bug, the crashes should be changed into regular failures of Connection::GetQuotaObjects
. Bugs 1609640 and 1609641 track adding telemetry and identifying/resolving the root cause of these failures.
Assignee | ||
Comment 45•4 years ago
|
||
Updated•4 years ago
|
Comment 46•4 years ago
|
||
Pushed by sgiesecke@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a90fb9414ef6 Ensure Connection::GetQuotaObjects only returns NS_OK if the quota objects are non-null. r=janv,dom-workers-and-storage-reviewers
Comment 47•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Description
•