Consider removing the temporary fix for bug 1528690
Categories
(Core :: Storage: IndexedDB, enhancement, P2)
Tracking
()
People
(Reporter: tt, Assigned: tt)
References
(Blocks 1 open bug)
Details
Crash Data
Attachments
(2 files, 2 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
10.23 KB,
patch
|
Details | Diff | Splinter Review |
The patch Bug 1481164 might also fix the bug 1528690 because that corrects threading for closing connection in the case of missing a connection.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
We haven't collected any crash from the assertions in bug 1528690. So, that
means there is no a closing connection and its mInWriteTransaction flag is true
or its transaction count is non-zero.
Bug 1481164 corrects dispatching to the wrong thread for closing idle databases.
That might be related to the mUpdateRefcountFunction issue because that make it
possible to access dbInfo, transactionInfo on a Connection thread.
Therefore, this patch intends to remove the pointer check on Nightly and enable
an assertion to see if the problem is still exsit.
Pushed by shes050117@gmail.com: https://hg.mozilla.org/integration/autoland/rev/1a7c349c2c60 P1 - Removing the temporary fix for bug 1528690; r=asuth
Assignee | ||
Comment 3•5 years ago
|
||
We should track if there are crashes after the patch goes to m-c
Assignee | ||
Updated•5 years ago
|
Comment 4•5 years ago
|
||
bugherder |
Assignee | ||
Comment 5•5 years ago
|
||
Updated•5 years ago
|
Backout by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c61887431ba8 Backed out changeset 1a7c349c2c60 for causing Bug 1528690 on a CLOSED TREE
Updated•5 years ago
|
Assignee | ||
Comment 7•5 years ago
|
||
Thanks! So, bug 1481164 doesn't fix the crash because missing mUpdateRefcountFunction from a connection
of ObjectStoreAddOrPutRequestOp
. I need to figure out the root cause of that crash.
Updated•5 years ago
|
Assignee | ||
Comment 8•5 years ago
|
||
Not sure if it's related to the root cause yet, but in the minidump of a crash report. I saw the mResultCode
of the aTransaction
is NS_ERROR_STORAGE_IOERR(2153971714) [1].
Assignee | ||
Comment 9•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #8)
Not sure if it's related to the root cause yet, but in the minidump of a crash report. I saw the
mResultCode
of theaTransaction
is NS_ERROR_STORAGE_IOERR(2153971714) [1].
And the mInvalidate
of it is false
Assignee | ||
Comment 10•5 years ago
|
||
When the TransactionDatabaseOperationBase
is constructing, the const member variable mTransactionIsAbort
is initialized to be aTransaction->IsAbort()
. At the third of the stack(TransactionDatabaseOperationBase::RunOnConnectionThread()
), the mTransactionIsAbort
is false
. So, it means a failure for the transacion happen between the time when TransactionDatabaseOperationBase
is constructing and the time when it crashes.
Assignee | ||
Comment 11•5 years ago
|
||
mCommitOrAbortReceived
is false so it's not TransactionBase::RecvAbort
before the EnsureConnection
of something like that.
I wonder if it's this [1]
Assignee | ||
Comment 12•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #11)
mCommitOrAbortReceived
is false so it's notTransactionBase::RecvAbort
before theEnsureConnection
of something like that.
I wonder if it's this [1]
There is an element called __vfptr
and it shows that the mTransaction
is implemented by a NormalTransaction
.
Combine with all the information above, there is only one possible place:
Some other TransactionDatabaseOperationBase
set that result and the mTransaction
is aborted right after the executing ObjectStoreAddOrPutRequestOp
jumped from background thread to the connection thread.
It seems that's a timing issue. I'm going to try to figure out why the connection is closed by that "other" TransactionDatabaseOperationBase, but it appears that there is some storage error. I guess I could consider aborting the transaction earlier if it's possible.
Assignee | ||
Comment 13•5 years ago
|
||
I feel like I find out a possible way to run into this issue.
So there is a Database::StartTransactionOp
and a ObjectStoreAddOrPutRequestOp
. The StartTransactionOp
executes first and then the ObjectStoreAddOrPutRequestOp
executes right after it.
- The
StartTransactionOp
fails to transfer the connection into a write connection on the connection thread [1] whileObjectStoreAddOrPutRequestOp
is initializing on the background thread. - The
StartTransactionOp
jumps back to the background thread and then abort the transaction whileObjectStoreAddOrPutRequestOp
jumps from the background thread to the connection thread. - Although the transaction is aborted, the
ObjectStoreAddOrPutRequestOp
isn't able to know the connection was aborted and keep using that connection (doesn't have amUpdateRefcountFunction
). - Hit the issue.
I need to verify if it's possible to have the story above.
Assignee | ||
Comment 14•5 years ago
|
||
It turns out that the temporary fix for bug 1528690 might be the right way to
do. Based on the recent crash reports, the mResultCode for the Transaction
object wasn't an NS_OK and the crashing operation hasn't updated the failure to
that yet. So, the right way to do here is that we should abort the operation on
the connection thread.
Assignee | ||
Comment 15•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #14)
Created attachment 9066412 [details]
Bug 1546892 - Update the comment for the fix and add an assertion to ensure all cases have been caught;It turns out that the temporary fix for bug 1528690 might be the right way to
do. Based on the recent crash reports, the mResultCode for the Transaction
object wasn't an NS_OK and the crashing operation hasn't updated the failure to
that yet. So, the right way to do here is that we should abort the operation on
the connection thread.
The follow-up or another thing which needs to do is to answer the question "Why does the number of NS_ERROR_STORAGE_IOERR for doing BeginWriteTransaction()
increase dramatically on Windows in these a few months?"
Assignee | ||
Comment 16•5 years ago
|
||
I should also try to make a test to verify it. Something like removes the SQLite database while doing IDB stuff, or utilize functions in TelemetryVFS to create the error.
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #16)
I should also try to make a test to verify it. Something like removes the SQLite database while doing IDB stuff, or utilize functions in TelemetryVFS to create the error.
I'm trying that by other methods, the problem I have now is that the test often hit some MOZ_ASSERT in advance. I will figure out how to make it easier or more straightforward (crashing on the same point for the crash report).
Assignee | ||
Comment 18•5 years ago
|
||
I can reproduce the issue by manually making BeginWriteTransacation()
fails when StartTransactionOp
calls that function and forcing all existing connections being idle.
I will try to make the BeginWriteTransacation()
fails without changing any c++ code.
Assignee | ||
Comment 19•5 years ago
|
||
With this patch, I can constantly hit this issue on both my Mac and Ubuntu and that verifies my guess. Note, only on opt builds, because debug builds would run into many MOZ_ASSERT
s before reaching the crashing place.
The next step is to verify if removing sqlite file can make BeginWriteTransaction()
fail so that we can remove the forcing failure code on the c++.
Assignee | ||
Comment 20•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #19)
The next step is to verify if removing sqlite file can make
BeginWriteTransaction()
fail so that we can remove the forcing failure code on the c++.
I set a breakpoint at the beginning of that function and remove the sqlite files, but I didn't get an error in that function...
Assignee | ||
Comment 21•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #20)
(In reply to Tom Tung [:tt, :ttung] from comment #19)
The next step is to verify if removing sqlite file can make
BeginWriteTransaction()
fail so that we can remove the forcing failure code on the c++.I set a breakpoint at the beginning of that function and remove the sqlite files, but I didn't get an error in that function...
That result was on Mac. I should and will test it on Windows. I got issues while getting "ProfD" on the test. I guess that means I need to have a chrome privilege. Thus, I'm going to transfer the mochitest test to a browser mochitest test.
Comment 22•5 years ago
|
||
Pushed by shes050117@gmail.com: https://hg.mozilla.org/integration/autoland/rev/7a8bb795a4c9 Update the comment for the fix and add an assertion to ensure all cases have been caught; r=asuth
Comment 23•5 years ago
|
||
bugherder |
Assignee | ||
Updated•5 years ago
|
Updated•3 years ago
|
Description
•