crash in mozilla::dom::quota::QuotaObject::DisableQuotaCheck
Categories
(Core :: SQLite and Embedded Database Bindings, defect, P2)
Tracking
()
People
(Reporter: shawnjohnjr, Assigned: sg)
References
(Blocks 2 open bugs)
Details
Crash Data
Attachments
(2 files)
| Reporter | ||
Comment 1•8 years ago
|
||
Comment 2•8 years ago
|
||
| Reporter | ||
Comment 3•8 years ago
|
||
| Reporter | ||
Comment 4•8 years ago
|
||
Comment 5•8 years ago
|
||
Updated•8 years ago
|
| Reporter | ||
Updated•8 years ago
|
| Reporter | ||
Updated•8 years ago
|
Comment 6•8 years ago
|
||
| Reporter | ||
Comment 7•8 years ago
|
||
| Reporter | ||
Comment 8•8 years ago
|
||
| Reporter | ||
Comment 9•8 years ago
|
||
| Reporter | ||
Comment 10•8 years ago
|
||
| Reporter | ||
Comment 11•8 years ago
|
||
| Reporter | ||
Comment 12•8 years ago
|
||
| Reporter | ||
Comment 13•8 years ago
|
||
| Reporter | ||
Comment 14•8 years ago
|
||
| Reporter | ||
Comment 15•8 years ago
|
||
Updated•8 years ago
|
| Reporter | ||
Updated•7 years ago
|
Comment 16•7 years ago
|
||
the crash signature curiously started spiking mid-january on release during the 64 cycle, is it worth further looking into that? so far it appears that it's not correlated to particular urls...
Comment 17•7 years ago
|
||
(In reply to [:philipp] from comment #16)
the crash signature curiously started spiking mid-january on release during the 64 cycle, is it worth further looking into that? so far it appears that it's not correlated to particular urls...
I'll redirect that question to Tom who's been working in this area recently.
Comment 18•7 years ago
|
||
Frankly, I don't find any suspicious patch/bug in 64 make the crash increase dramatically.
Keep the ni flag and will look into it deeper.
Comment 19•7 years ago
|
||
i wouldn't expect it to be related to something in 64 in particular, it got released on december 11, whereas this spike happened over a month later.
| Reporter | ||
Comment 20•7 years ago
|
||
Here is the further information how we can observe the problem on our project (well, it's the b2g project based on gecko 48).
- We have very low storage space device, and the device consumes storage space constantly (such as writing logs into disk).
- We have a mechanism to aggressively clean up temporary origin space when storage is under 50MB (via b2g disk space watcher not via QuotaManager, since QuotaManager won't actively update device space information, but only when Firefox is initialized).
- The crash happens when we place device in the idle situation. And we suspected that we constantly hit low storage and at the same time we cleanup storage.
- Simply adding nullptr-check guards for both mQuotaObject, mJournalQuotaObject is fine. And we did not see any other crash here after guards been added. At least from our point of view, it's safe to do and we did not see side effects.
We did not spend too much time here to investigate why mQuotaObject, mJournalQuotaObject are nullptr for StorageConnection due to limited manpower.
If you want to investigate further, you must try to make space very low and constantly write data into disk and at the same time try to clean up origin.
| Reporter | ||
Comment 21•7 years ago
|
||
(In reply to [:philipp] from comment #19)
i wouldn't expect it to be related to something in 64 in particular, it got released on december 11, whereas this spike happened over a month later.
I think it's not 64 related. It depends on the low-disk-space pre-condition, at least we have seen this bug on 48.
| Reporter | ||
Comment 22•7 years ago
|
||
I'm not sure Jan or Andrew would agree to add a trivial nullptr check for mQuotaObject, mJournalQuotaObject. So I guess Tom might want to investigate the root cause here?
| Reporter | ||
Comment 23•7 years ago
|
||
(In reply to Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor) from comment #20)
Here is the further information how we can observe the problem on our project (well, it's the b2g project based on gecko 48).
- We have very low storage space device, and the device consumes storage space constantly (such as writing logs into disk).
- We have a mechanism to aggressively clean up temporary origin space when storage is under 50MB (via b2g disk space watcher not via QuotaManager, since QuotaManager won't actively update device space information, but only when Firefox is initialized).
- The crash happens when we place device in the idle situation. And we suspected that we constantly hit low storage and at the same time we cleanup storage.
- Simply adding nullptr-check guards for both mQuotaObject, mJournalQuotaObject is fine. And we did not see any other crash here after guards been added. At least from our point of view, it's safe to do and we did not see side effects.
We did not spend too much time here to investigate why mQuotaObject, mJournalQuotaObject are nullptr for StorageConnection due to limited manpower.
If you want to investigate further, you must try to make space very low and constantly write data into disk and at the same time try to clean up origin.
In our project, here is what we found:
Here is the backtrace:
(gdb) bt
#0 0xb5850090 in mozilla::dom::quota::QuotaObject::DisableQuotaCheck (this=0x0) at /home/claire/KaiOS/gecko/dom/quota/ActorsParent.cpp:2627
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
#2 mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork (this=0xab974380, aConnection=0xa98a1cc0)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14162
#3 0xb58c4322 in mozilla::dom::indexedDB::(anonymous namespace)::TransactionDatabaseOperationBase::RunOnConnectionThread (this=this@entry=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:22545
#4 0xb58c455a in mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::RunOnConnectionThread (this=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14149
#5 0xb58c2d22 in mozilla::dom::indexedDB::(anonymous namespace)::TransactionDatabaseOperationBase::Run (this=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:22632
#6 0xb5d057d6 in mozilla::tasktracer::TracedRunnable::Run (this=0xaba9d730) at /home/claire/KaiOS/gecko/tools/profiler/tasktracer/TracedTaskCommon.cpp:114
#7 0xb4c522ea in nsThread::ProcessNextEvent (this=0xac9fe840, aMayWait=<optimized out>, aResult=0xa680dc57) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:994
#8 0xb4c63c90 in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0xac9fe840, aMayWait=aMayWait@entry=true)
at /home/claire/KaiOS/gecko/xpcom/glue/nsThreadUtils.cpp:290
#9 0xb58bccae in mozilla::dom::indexedDB::(anonymous namespace)::ConnectionPool::ThreadRunnable::Run (this=0xac587e20) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:12720
#10 0xb5d057d6 in mozilla::tasktracer::TracedRunnable::Run (this=0xaba9d700) at /home/claire/KaiOS/gecko/tools/profiler/tasktracer/TracedTaskCommon.cpp:114
#11 0xb4c522ea in nsThread::ProcessNextEvent (this=0xac9fe840, aMayWait=<optimized out>, aResult=0xa680dd0f) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:994
#12 0xb4c63c90 in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0xac9fe840, aMayWait=aMayWait@entry=false)
at /home/claire/KaiOS/gecko/xpcom/glue/nsThreadUtils.cpp:290
#13 0xb4e44648 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0xaba9d370, aDelegate=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/glue/MessagePump.cpp:340
#14 0xb4e34002 in MessageLoop::RunInternal (this=this@entry=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:230
#15 0xb4e340b4 in RunHandler (this=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:223
#16 MessageLoop::Run (this=this@entry=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:203
#17 0xb4c53398 in nsThread::ThreadFunc (aArg=0xac9fe840) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:396
#18 0xb478abbc in _pt_root (arg=0xaca52c80) at ../../../../../gecko/nsprpub/pr/src/pthreads/ptthread.c:216
#19 0xb6e982b0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
warning: Source file is more recent than executable.
10520 }
(gdb) p mQuotaObject
$1 = {mRawPtr = 0x0}
(gdb) p mJournalQuotaObject
$2 = {mRawPtr = 0x0}
mQuotaObject and mJournalQuotaObject are both null pointers.
Frame1:
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
(gdb) p ((mozilla::dom::indexedDB::FileManager*)mFileManager)->mOrigin
$21 = {<nsACString_internal> = {mData = 0xac14cc18 "1024+f+app://system.gaiamobile.org", mLength = 34, mFlags = 5}, <No data fields>}
(gdb) p ((mozilla::dom::indexedDB::FileManager*)mFileManager)->mDatabaseName
$22 = {<nsAString_internal> = {mData = 0xaccd4168 u"asyncStorage", mLength = 12, mFlags = 5}, <No data fields>}
Frame2:
#2 mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork (this=0xab974380, aConnection=0xa98a1cc0)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14162
(gdb) p ((TransactionBase*)mTransaction).mMode
$16 = mozilla::dom::IDBTransaction::CLEANUP
Origin: "1024+f+app://system.gaiamobile.org"
Name: "asyncStorage"
Transaction Mode: CLEANUP
| Reporter | ||
Comment 24•7 years ago
|
||
Tom, feel free to take this bug if you want to fix it properly. Since I'm not sure it's acceptable to introduce a trivial fix.
Comment 25•7 years ago
|
||
Thanks for the information! I believe low-disk condition is an important clue to reproduce the issue. I will look into this after implementing the bug 1521541.
Comment 26•7 years ago
|
||
Looking into this
Updated•6 years ago
|
Comment 27•6 years ago
|
||
I wonder if this is somehow an issue on re-escaping the char in the group string [1]. For instance, if I change the spec in [2] to spec = "http://foo'.com";. The test wouldn't stop once it's executed. The reason is that the quotaObject mismatching (create a groupInfo with "foo'.com" but trying to get the groupInfo object with "foo%27.com"). I suspect if we somehow run into the CLEANUP logic in the indexedDB, we would hit the crash.
[1] https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/dom/quota/ActorsParent.cpp#3732-3735
[2] https://searchfox.org/mozilla-central/source/dom/indexedDB/test/unit/test_quotaExceeded_recovery.js
Comment 28•6 years ago
•
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #27)
I wonder if this is somehow an issue on re-escaping the char in the group string [1].
I meant the group mismatching issue (could be introduced because of the group re-escaping)
Comment 29•6 years ago
|
||
Note that Bug 1576593 is fixed and I suppose this should help to reduce the issues here.
Comment 30•6 years ago
|
||
I suspect this could be caused by bug 1592404 as well and should be fixed by QuotaManager v3
Comment 31•6 years ago
|
||
Unassign myself because I'm not actively work on this and I believe QMv3 should fix most of issues for this bug.
Updated•6 years ago
|
Updated•6 years ago
|
| Assignee | ||
Comment 32•6 years ago
•
|
||
Based on crash report db9af6ba-607b-43b9-b7a8-c7f4f0200114, DatabaseConnection::mQuotaObject is nullptr here, meaning that GetQuotaObjectForFile returned a nullptr when called from Connection::GetQuotaObjects at https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/storage/mozStorageConnection.cpp#2267 but Connection::GetQuotaObjects was still returning NS_OK.
(I am actually just repeating what https://bugzilla.mozilla.org/show_bug.cgi?id=1414737#c8 already said.)
Should this be checked in Connection::GetQuotaObjects and an error code returned instead?
| Assignee | ||
Updated•6 years ago
|
Comment 33•6 years ago
•
|
||
I wonder, how this is supposed to work:
already_AddRefed<QuotaObject> GetQuotaObjectForFile(sqlite3_file* pFile) {
MOZ_ASSERT(pFile);
telemetry_file* p = (telemetry_file*)pFile;
RefPtr<QuotaObject> result = p->quotaObject;
return result.forget();
}
assumes that pFile is of type telemetry_file. But in the calling function, we have:
sqlite3_file* file;
int srv = ::sqlite3_file_control(mDBConn, nullptr, SQLITE_FCNTL_FILE_POINTER,
&file);
if (srv != SQLITE_OK) {
return convertResultCode(srv);
}
RefPtr<QuotaObject> databaseQuotaObject = GetQuotaObjectForFile(file);
Is there any magic callback inside sqlite3_file_control that allocates a telemetry_file object?
| Assignee | ||
Comment 34•6 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #33)
I wonder, how this is supposed to work:
telemetry_file is a pseudo-subclass of sqlite3_file: https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/storage/TelemetryVFS.cpp#149
I wonder if this should also be changed to proper inheritance.
Comment 35•6 years ago
|
||
Yes, good old C-"Inheritance". And the initialization of TelemetryVFS accounts for the telemetry_file size, of course. So this explains why this works in the normal case (still it reads bad) :-).
Probably we see an operation on an already closed database (file) here (assuming that sqlite nulls its internal file pointer in that case) and should just return an error, rather than crash. Such that the caller can figure out, if he can still do something meaningful in this situation.
Comment 36•6 years ago
|
||
I don't think it's an already closed database. It's more like there's no quota object for the given database, but there should always be a quota object for an IDB database (except for persistent databases, but DisableQuotaChecks shouldn't be called for them).
Comment 37•6 years ago
|
||
int xClose(sqlite3_file* pFile) {
telemetry_file* p = (telemetry_file*)pFile;
int rc;
{ // Scope for IOThreadAutoTimer
IOThreadAutoTimer ioTimer(IOInterposeObserver::OpClose);
rc = p->pReal->pMethods->xClose(p->pReal);
}
if (rc == SQLITE_OK) {
delete p->base.pMethods;
p->base.pMethods = nullptr;
p->quotaObject = nullptr;
delete[] p->location;
#ifdef DEBUG
p->fileChunkSize = 0;
#endif
}
return rc;
}
where the quotaObject is explicitly set to null. Might not be the only case this happens, though.
Comment 38•6 years ago
•
|
||
But then Connection::GetQuotaObjects will return NS_ERROR_NOT_INITIALIZED because the connection is closed.
Comment 39•6 years ago
|
||
So the assumption/invariant is, that if we have a valid mDBConn we also must have a valid QuotaObject inside? This seems to be broken in this specific case (for whatever reason).
Well, we would definitely benefit from a recording of that rather than guessing from the code. Still we might want to consider to follow Shawn's suggestion https://bugzilla.mozilla.org/show_bug.cgi?id=1414737#c20 to check for a nullptr and fail more gently in case.
Comment 40•6 years ago
|
||
I'm ok with adding a check for nullptr if we also add something that won't hide this problem. I need more time to figure out how to do that or someone else can try.
| Assignee | ||
Comment 41•6 years ago
|
||
Adding an NS_WARN_IF and an appropriate comment in the code and opening a new Bug here that covers analyzing the root cause seems to be sufficiently visible to me.
Another possibility would be to add telemetry so we can track the number of such failures.
Comment 42•6 years ago
|
||
New telemetry sounds good to me, while we are here, the telemetry should check that there are quota objects after opening the database (not just during the cleanup transaction).
Comment 43•6 years ago
|
||
(still in guess mode and obviously ok to not hide the cause)
I see in two reports I randomly clicked on parallel operations on IndexedDB on another thread: 8b0e7f1f-8d58-4e0f-828d-1f7f10200115 and 8b0e7f1f-8d58-4e0f-828d-1f7f10200115. We might see the effect of a race here?
| Assignee | ||
Comment 44•6 years ago
|
||
To resolve the crashes, as part of this bug, the crashes should be changed into regular failures of Connection::GetQuotaObjects. Bugs 1609640 and 1609641 track adding telemetry and identifying/resolving the root cause of these failures.
| Assignee | ||
Comment 45•6 years ago
|
||
Updated•6 years ago
|
Comment 46•6 years ago
|
||
Comment 47•6 years ago
|
||
| bugherder | ||
Updated•6 years ago
|
Updated•1 year ago
|
Description
•