<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Comment 2

•

8 years ago

Maybe the QuotaObject has been removed without being null'd? So that, firefox crash while accessing its member variables.

Reporter

Comment 3

•

8 years ago

Ok. It's not a regression. Reports are showing the earliest reports since v48.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 4

•

8 years ago

(In reply to Tom Tung [:tt] from comment #2) > Maybe the QuotaObject has been removed without being null'd? So that, > firefox crash while accessing its member variables. https://hg.mozilla.org/mozilla-central/annotate/e64846245b00/dom/indexedDB/ActorsParent.cpp#l11193 But we do have check mQuotaObject there. So it's possible that before mQuotaObject->DisableQuotaCheck() called, mQuotaObject is cleared on different threads.

Andrew Overholt [:overholt]

Comment 5

•

8 years ago

(In reply to Shawn Huang [:shawnjohnjr] from comment #4) > (In reply to Tom Tung [:tt] from comment #2) > > Maybe the QuotaObject has been removed without being null'd? So that, > > firefox crash while accessing its member variables. > > https://hg.mozilla.org/mozilla-central/annotate/e64846245b00/dom/indexedDB/ > ActorsParent.cpp#l11193 > > But we do have check mQuotaObject there. So it's possible that before > mQuotaObject->DisableQuotaCheck() called, mQuotaObject is cleared on > different threads. Indeed. Besides, idb::DatabaseConnection holds it by a RefPtr [1]. It shouldn't become a dangling while DatabaseConnection is accessing it. [1] http://searchfox.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#5054

Updated

•

8 years ago

Priority: -- → P2

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Updated

•

8 years ago

Crash Signature: [@ mozilla::dom::quota::QuotaObject::DisableQuotaCheck]

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Updated

•

8 years ago

Assignee: nobody → shuang

Bevis Tseng[:bevis][:btseng](Exited)

Comment 6

•

8 years ago

Note: 1. "cleanup" transaction is only accessible if "dom.indexeddb.experimental" is set to true (which is "false" by default). 2. "dom.indexeddb.experimental" was not changed according to the "userPrefs" from crash report.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 7

•

8 years ago

Attached image stack trace (prove mQuotaObject, mJournalQuotaObject is null) — Details

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 8

•

8 years ago

At least one thing we can be sure that both mQuotaObject, mJournalQuotaObject are null. https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/dom/indexedDB/ActorsParent.cpp#11196 The only thing I can say is that mStorageConnection->GetQuotaObjects() fail to get both mQuotaObject, mJournalQuotaObject. But return value is not NS_ERROR_FAILURE.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 9

•

8 years ago

Maybe like Bug 1389561, when doing cleanup and origin is not initialized, then QuotaObject is null?

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 10

•

8 years ago

After reading through Bug 1182987, I have better understanding the problem. I'm working on test case now.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 11

•

8 years ago

(In reply to Bevis Tseng[:bevis][:btseng] from comment #6) > Note: > 1. "cleanup" transaction is only accessible if "dom.indexeddb.experimental" > is set to true (which is "false" by default). > 2. "dom.indexeddb.experimental" was not changed according to the "userPrefs" > from crash report. Note: For (2). userPrefs only are listed in crash report whilte list (telemetry environment). https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/toolkit/components/telemetry/TelemetryEnvironment.jsm#185 So this doesn't prove anything.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 12

•

8 years ago

Bevis, One thing I found that code might not call db.transaction with 'cleanup' option. In https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/dom/indexedDB/IDBDatabase.cpp#676. It's possible that originally in IDBDatabase::Transaction(), aMode is IDBTransactionMode::Readwrite, which is very common usage and when mQuotaExceeded is true (user might have low disk easily) and then we create idb transaction with 'cleanup' mode. Then we can explain why cleanup mode transaction is executed to disable quota check without setting idb experimental flag. And that also explains why we can see crash on FennecAndroid, if Mobile browser users have no intention to set preferences.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 13

•

8 years ago

We marked the database as "QuotaExceeded" and very next "readwrite" transaction would actually run as "cleanup" transaction. So add/put would be prohibited for that transaction, quota checks would be disabled and vacuuming would be done after commit. But I still don't understand why we QuotaObject is not accessible. If somehow 'cleanup' transaction would disable quota check, and somehow we still hold DirectoryLock and it wouldn't be able to collect inactive origins, OriginInfo should not be cleared.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 14

•

8 years ago

Current test_cleanup_transaction.js doesn't cover Comment 13 implementation, so I have to write tests for readwrite mode trasnfer to cleanup mode, I guess this is why I can't hit the problem using test_cleanup_transaction.js for a template.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 15

•

8 years ago

Okay, test_quotaExceeded_recovery.js could test that execution path, but I still need to figure out how to make QuotaObject is not accessible to simulate the original crash.

Marion Daly [:mdaly]

Updated

•

8 years ago

Assignee: shawnjohnjr → nobody

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Updated

•

7 years ago

Assignee: nobody → shawnjohnjr

[:philipp]

Comment 16

•

7 years ago

the crash signature curiously started spiking mid-january on release during the 64 cycle, is it worth further looking into that? so far it appears that it's not correlated to particular urls...

Flags: needinfo?(overholt)

Andrew Overholt [:overholt]

Comment 17

•

7 years ago

(In reply to [:philipp] from comment #16)

the crash signature curiously started spiking mid-january on release during the 64 cycle, is it worth further looking into that? so far it appears that it's not correlated to particular urls...

I'll redirect that question to Tom who's been working in this area recently.

Flags: needinfo?(overholt) → needinfo?(shes050117)

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Comment 18

•

7 years ago

Frankly, I don't find any suspicious patch/bug in 64 make the crash increase dramatically.

Keep the ni flag and will look into it deeper.

[:philipp]

Comment 19

•

7 years ago

i wouldn't expect it to be related to something in 64 in particular, it got released on december 11, whereas this spike happened over a month later.

Reporter

Comment 20

•

7 years ago

Here is the further information how we can observe the problem on our project (well, it's the b2g project based on gecko 48).

We have very low storage space device, and the device consumes storage space constantly (such as writing logs into disk).
We have a mechanism to aggressively clean up temporary origin space when storage is under 50MB (via b2g disk space watcher not via QuotaManager, since QuotaManager won't actively update device space information, but only when Firefox is initialized).
The crash happens when we place device in the idle situation. And we suspected that we constantly hit low storage and at the same time we cleanup storage.
Simply adding nullptr-check guards for both mQuotaObject, mJournalQuotaObject is fine. And we did not see any other crash here after guards been added. At least from our point of view, it's safe to do and we did not see side effects.

We did not spend too much time here to investigate why mQuotaObject, mJournalQuotaObject are nullptr for StorageConnection due to limited manpower.

If you want to investigate further, you must try to make space very low and constantly write data into disk and at the same time try to clean up origin.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 21

•

7 years ago

(In reply to [:philipp] from comment #19)

i wouldn't expect it to be related to something in 64 in particular, it got released on december 11, whereas this spike happened over a month later.

I think it's not 64 related. It depends on the low-disk-space pre-condition, at least we have seen this bug on 48.

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 22

•

7 years ago

I'm not sure Jan or Andrew would agree to add a trivial nullptr check for mQuotaObject, mJournalQuotaObject. So I guess Tom might want to investigate the root cause here?

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 23

•

7 years ago

(In reply to Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor) from comment #20)

Here is the further information how we can observe the problem on our project (well, it's the b2g project based on gecko 48).

We have very low storage space device, and the device consumes storage space constantly (such as writing logs into disk).

We have a mechanism to aggressively clean up temporary origin space when storage is under 50MB (via b2g disk space watcher not via QuotaManager, since QuotaManager won't actively update device space information, but only when Firefox is initialized).

The crash happens when we place device in the idle situation. And we suspected that we constantly hit low storage and at the same time we cleanup storage.

Simply adding nullptr-check guards for both mQuotaObject, mJournalQuotaObject is fine. And we did not see any other crash here after guards been added. At least from our point of view, it's safe to do and we did not see side effects.

We did not spend too much time here to investigate why mQuotaObject, mJournalQuotaObject are nullptr for StorageConnection due to limited manpower.

If you want to investigate further, you must try to make space very low and constantly write data into disk and at the same time try to clean up origin.

In our project, here is what we found:

Here is the backtrace:

(gdb) bt
#0 0xb5850090 in mozilla::dom::quota::QuotaObject::DisableQuotaCheck (this=0x0) at /home/claire/KaiOS/gecko/dom/quota/ActorsParent.cpp:2627
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
#2 mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork (this=0xab974380, aConnection=0xa98a1cc0)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14162
#3 0xb58c4322 in mozilla::dom::indexedDB::(anonymous namespace)::TransactionDatabaseOperationBase::RunOnConnectionThread (this=this@entry=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:22545
#4 0xb58c455a in mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::RunOnConnectionThread (this=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14149
#5 0xb58c2d22 in mozilla::dom::indexedDB::(anonymous namespace)::TransactionDatabaseOperationBase::Run (this=0xab974380)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:22632
#6 0xb5d057d6 in mozilla::tasktracer::TracedRunnable::Run (this=0xaba9d730) at /home/claire/KaiOS/gecko/tools/profiler/tasktracer/TracedTaskCommon.cpp:114
#7 0xb4c522ea in nsThread::ProcessNextEvent (this=0xac9fe840, aMayWait=<optimized out>, aResult=0xa680dc57) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:994
#8 0xb4c63c90 in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0xac9fe840, aMayWait=aMayWait@entry=true)
at /home/claire/KaiOS/gecko/xpcom/glue/nsThreadUtils.cpp:290
#9 0xb58bccae in mozilla::dom::indexedDB::(anonymous namespace)::ConnectionPool::ThreadRunnable::Run (this=0xac587e20) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:12720
#10 0xb5d057d6 in mozilla::tasktracer::TracedRunnable::Run (this=0xaba9d700) at /home/claire/KaiOS/gecko/tools/profiler/tasktracer/TracedTaskCommon.cpp:114
#11 0xb4c522ea in nsThread::ProcessNextEvent (this=0xac9fe840, aMayWait=<optimized out>, aResult=0xa680dd0f) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:994
#12 0xb4c63c90 in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0xac9fe840, aMayWait=aMayWait@entry=false)
at /home/claire/KaiOS/gecko/xpcom/glue/nsThreadUtils.cpp:290
#13 0xb4e44648 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0xaba9d370, aDelegate=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/glue/MessagePump.cpp:340
#14 0xb4e34002 in MessageLoop::RunInternal (this=this@entry=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:230
#15 0xb4e340b4 in RunHandler (this=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:223
#16 MessageLoop::Run (this=this@entry=0xb07ef900) at /home/claire/KaiOS/gecko/ipc/chromium/src/base/message_loop.cc:203
#17 0xb4c53398 in nsThread::ThreadFunc (aArg=0xac9fe840) at /home/claire/KaiOS/gecko/xpcom/threads/nsThread.cpp:396
#18 0xb478abbc in _pt_root (arg=0xaca52c80) at ../../../../../gecko/nsprpub/pr/src/pthreads/ptthread.c:216
#19 0xb6e982b0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520
warning: Source file is more recent than executable.
10520 }
(gdb) p mQuotaObject
$1 = {mRawPtr = 0x0}
(gdb) p mJournalQuotaObject
$2 = {mRawPtr = 0x0}

mQuotaObject and mJournalQuotaObject are both null pointers.

Frame1:
#1 0xb58c479a in DisableQuotaChecks (this=0xa98a1cc0) at ../../../gecko/dom/indexedDB/ActorsParent.cpp:10520

(gdb) p ((mozilla::dom::indexedDB::FileManager*)mFileManager)->mOrigin
$21 = {<nsACString_internal> = {mData = 0xac14cc18 "1024+f+app://system.gaiamobile.org", mLength = 34, mFlags = 5}, <No data fields>}
(gdb) p ((mozilla::dom::indexedDB::FileManager*)mFileManager)->mDatabaseName
$22 = {<nsAString_internal> = {mData = 0xaccd4168 u"asyncStorage", mLength = 12, mFlags = 5}, <No data fields>}

Frame2:
#2 mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork (this=0xab974380, aConnection=0xa98a1cc0)
at ../../../gecko/dom/indexedDB/ActorsParent.cpp:14162

(gdb) p ((TransactionBase*)mTransaction).mMode
$16 = mozilla::dom::IDBTransaction::CLEANUP

Origin: "1024+f+app://system.gaiamobile.org"
Name: "asyncStorage"
Transaction Mode: CLEANUP

Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor)

Reporter

Comment 24

•

7 years ago

Tom, feel free to take this bug if you want to fix it properly. Since I'm not sure it's acceptable to introduce a trivial fix.

Comment 25

•

7 years ago

Thanks for the information! I believe low-disk condition is an important clue to reproduce the issue. I will look into this after implementing the bug 1521541.

Comment 26

•

7 years ago

Looking into this

Assignee: shawnjohnjr → shes050117

Status: NEW → ASSIGNED

Flags: needinfo?(shes050117)

Updated

•

6 years ago

Crash Signature: [@ mozilla::dom::quota::QuotaObject::DisableQuotaCheck] → [@ mozilla::dom::quota::QuotaObject::DisableQuotaCheck][@ mozilla::dom::indexedDB::(anonymous namespace)::Database::StartTransactionOp::DoDatabaseWork(mozilla::dom::indexedDB::(anonymous namespace)::DatabaseConnection*)]

status-firefox66: --- → affected

status-firefox67: --- → affected

status-firefox68: --- → affected

status-firefox-esr60: --- → affected

[1] https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/dom/quota/ActorsParent.cpp#3732-3735
[2] https://searchfox.org/mozilla-central/source/dom/indexedDB/test/unit/test_quotaExceeded_recovery.js

Comment 27

•

6 years ago

I wonder if this is somehow an issue on re-escaping the char in the group string [1]. For instance, if I change the spec in [2] to spec = "http://foo'.com";. The test wouldn't stop once it's executed. The reason is that the quotaObject mismatching (create a groupInfo with "foo'.com" but trying to get the groupInfo object with "foo%27.com"). I suspect if we somehow run into the CLEANUP logic in the indexedDB, we would hit the crash.

Comment 28

•

6 years ago

•

Edited

(In reply to Tom Tung [:tt, :ttung] from comment #27)

I wonder if this is somehow an issue on re-escaping the char in the group string [1].

I meant the group mismatching issue (could be introduced because of the group re-escaping)

Comment 29

•

6 years ago

Note that Bug 1576593 is fixed and I suppose this should help to reduce the issues here.

Comment 30

•

6 years ago

I suspect this could be caused by bug 1592404 as well and should be fixed by QuotaManager v3

Comment 31

•

6 years ago

Unassign myself because I'm not actively work on this and I believe QMv3 should fix most of issues for this bug.

Assignee: ttung → nobody

Status: ASSIGNED → NEW

Updated

•

6 years ago

Component: Storage: Quota Manager → Storage: IndexedDB

Updated

•

6 years ago

Flags: needinfo?(sgiesecke)

Assignee

Comment 32

•

6 years ago

•

Edited

Based on crash report db9af6ba-607b-43b9-b7a8-c7f4f0200114, DatabaseConnection::mQuotaObject is nullptr here, meaning that GetQuotaObjectForFile returned a nullptr when called from Connection::GetQuotaObjects at https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/storage/mozStorageConnection.cpp#2267 but Connection::GetQuotaObjects was still returning NS_OK.

(I am actually just repeating what https://bugzilla.mozilla.org/show_bug.cgi?id=1414737#c8 already said.)

Should this be checked in Connection::GetQuotaObjects and an error code returned instead?

Flags: needinfo?(sgiesecke) → needinfo?(jvarga)

Assignee

Updated

•

6 years ago

Component: Storage: IndexedDB → Storage: Quota Manager

Comment 33

•

6 years ago

•

Edited

I wonder, how this is supposed to work:

already_AddRefed<QuotaObject> GetQuotaObjectForFile(sqlite3_file* pFile) {
  MOZ_ASSERT(pFile);

  telemetry_file* p = (telemetry_file*)pFile;
  RefPtr<QuotaObject> result = p->quotaObject;
  return result.forget();
}

assumes that pFile is of type telemetry_file. But in the calling function, we have:

  sqlite3_file* file;
  int srv = ::sqlite3_file_control(mDBConn, nullptr, SQLITE_FCNTL_FILE_POINTER,
                                   &file);
  if (srv != SQLITE_OK) {
    return convertResultCode(srv);
  }

  RefPtr<QuotaObject> databaseQuotaObject = GetQuotaObjectForFile(file);

Is there any magic callback inside sqlite3_file_control that allocates a telemetry_file object?

Assignee

Comment 34

•

6 years ago

(In reply to Jens Stutte [:jstutte] from comment #33)

I wonder, how this is supposed to work:

telemetry_file is a pseudo-subclass of sqlite3_file: https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/storage/TelemetryVFS.cpp#149

I wonder if this should also be changed to proper inheritance.

Comment 35

•

6 years ago

Yes, good old C-"Inheritance". And the initialization of TelemetryVFS accounts for the telemetry_file size, of course. So this explains why this works in the normal case (still it reads bad) :-).

Probably we see an operation on an already closed database (file) here (assuming that sqlite nulls its internal file pointer in that case) and should just return an error, rather than crash. Such that the caller can figure out, if he can still do something meaningful in this situation.

Comment 36

•

6 years ago

I don't think it's an already closed database. It's more like there's no quota object for the given database, but there should always be a quota object for an IDB database (except for persistent databases, but DisableQuotaChecks shouldn't be called for them).

Flags: needinfo?(jvarga)

Comment 37

•

6 years ago

Hmmm, I see:

int xClose(sqlite3_file* pFile) {
  telemetry_file* p = (telemetry_file*)pFile;
  int rc;
  {  // Scope for IOThreadAutoTimer
    IOThreadAutoTimer ioTimer(IOInterposeObserver::OpClose);
    rc = p->pReal->pMethods->xClose(p->pReal);
  }
  if (rc == SQLITE_OK) {
    delete p->base.pMethods;
    p->base.pMethods = nullptr;
    p->quotaObject = nullptr;
    delete[] p->location;
#ifdef DEBUG
    p->fileChunkSize = 0;
#endif
  }
  return rc;
}

where the quotaObject is explicitly set to null. Might not be the only case this happens, though.

Comment 38

•

6 years ago

•

Edited

But then Connection::GetQuotaObjects will return NS_ERROR_NOT_INITIALIZED because the connection is closed.

Comment 39

•

6 years ago

So the assumption/invariant is, that if we have a valid mDBConn we also must have a valid QuotaObject inside? This seems to be broken in this specific case (for whatever reason).

Well, we would definitely benefit from a recording of that rather than guessing from the code. Still we might want to consider to follow Shawn's suggestion https://bugzilla.mozilla.org/show_bug.cgi?id=1414737#c20 to check for a nullptr and fail more gently in case.

Comment 40

•

6 years ago

I'm ok with adding a check for nullptr if we also add something that won't hide this problem. I need more time to figure out how to do that or someone else can try.

Assignee

Comment 41

•

6 years ago

Adding an NS_WARN_IF and an appropriate comment in the code and opening a new Bug here that covers analyzing the root cause seems to be sufficiently visible to me.

Another possibility would be to add telemetry so we can track the number of such failures.

Comment 42

•

6 years ago

New telemetry sounds good to me, while we are here, the telemetry should check that there are quota objects after opening the database (not just during the cleanup transaction).

Comment 43

•

6 years ago

(still in guess mode and obviously ok to not hide the cause)

I see in two reports I randomly clicked on parallel operations on IndexedDB on another thread: 8b0e7f1f-8d58-4e0f-828d-1f7f10200115 and 8b0e7f1f-8d58-4e0f-828d-1f7f10200115. We might see the effect of a race here?

Assignee

Updated

•

6 years ago

Blocks: 1609640

Assignee

Updated

•

6 years ago

Blocks: 1609641

Assignee

Comment 44

•

6 years ago

To resolve the crashes, as part of this bug, the crashes should be changed into regular failures of Connection::GetQuotaObjects. Bugs 1609640 and 1609641 track adding telemetry and identifying/resolving the root cause of these failures.

Assignee: nobody → sgiesecke

Status: NEW → ASSIGNED

Assignee

Comment 45

•

6 years ago

Attached file Bug 1414737 - Ensure Connection::GetQuotaObjects only returns NS_OK if the quota objects are non-null. r=#dom-workers-and-storage,janv — Details