Crash in [@ mozilla::dom::indexedDB::`anonymous namespace'::DatabaseConnection::AutoSavepoint::Start(const class mozilla::dom::indexedDB::`anonymous namespace'::TransactionBase*)]

RESOLVED FIXED in Firefox 67

Status

()

defect
P1
critical
RESOLVED FIXED
3 months ago
18 days ago

People

(Reporter: gsvelto, Assigned: tt)

Tracking

(Depends on 1 bug, {crash, topcrash})

unspecified
mozilla68
Unspecified
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 wontfix, firefox65 wontfix, firefox66 wontfix, firefox67blocking fixed, firefox68 fixed)

Details

(crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

Reporter

Description

3 months ago

This bug is for crash report bp-dde28edf-0b5b-4a68-8d14-4e40d0190217.

Top 10 frames of crashing thread:

0 xul.dll nsresult mozilla::dom::indexedDB::`anonymous namespace'::DatabaseConnection::AutoSavepoint::Start dom/indexedDB/ActorsParent.cpp:10021
1 xul.dll nsresult mozilla::dom::indexedDB::`anonymous namespace'::ObjectStoreAddOrPutRequestOp::DoDatabaseWork dom/indexedDB/ActorsParent.cpp:23588
2 xul.dll void mozilla::dom::indexedDB::`anonymous namespace'::TransactionDatabaseOperationBase::RunOnConnectionThread dom/indexedDB/ActorsParent.cpp:21249
3 xul.dll nsresult mozilla::dom::indexedDB::`anonymous namespace'::TransactionDatabaseOperationBase::Run dom/indexedDB/ActorsParent.cpp:21399
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1162
5 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:474
6 xul.dll nsresult mozilla::dom::indexedDB::`anonymous namespace'::ConnectionPool::ThreadRunnable::Run dom/indexedDB/ActorsParent.cpp:11738
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1162
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:474
9 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:333

Found this in nightly but it doesn't seem like a new crash since it's present in various older versions.

Reporter

Comment 1

3 months ago

Windows-specific but happens on all supported versions.

OS: Windows 10 → Windows

Comment 2

3 months ago

crashes are also occurring on non-windows platforms with a slightly different signature.

Crash Signature: [@ mozilla::dom::indexedDB::`anonymous namespace'::DatabaseConnection::AutoSavepoint::Start(const class mozilla::dom::indexedDB::`anonymous namespace'::TransactionBase*)] → [@ mozilla::dom::indexedDB::`anonymous namespace'::DatabaseConnection::AutoSavepoint::Start(const class mozilla::dom::indexedDB::`anonymous namespace'::TransactionBase*)] [@ mozilla::dom::indexedDB::(anonymous namespace)::DatabaseConnection::AutoSavepoin…
OS: Windows → All
Assignee

Comment 3

3 months ago

It's a crash so mark it as P1.

From the report [1], it crashes in [2]. It shows that the connection object cannot be got from the 'Database'. I guess that means somehow we haven't called EnsureConnection() before accessing the connection.

[1] https://crash-stats.mozilla.com/report/index/dde28edf-0b5b-4a68-8d14-4e40d0190217
[2] https://searchfox.org/mozilla-central/rev/93905b660fc99a5d52b683690dd26471daca08c8/dom/indexedDB/ActorsParent.cpp#10021

Priority: -- → P1

#11 top browser crash on nightly 67.

27 crashes for 67.0b1 (Devedition) released 2 days which is more crashes than we had for 66 betas while there is at least an order of magnitude difference in userbase. It seems that the situation is getting worse on 67 than on 66. Andrew, can we get somebody to investigate please? Thanks!

Flags: needinfo?(overholt)

Tom, are you able to take another look here?

Flags: needinfo?(overholt) → needinfo?(shes050117)
Assignee

Comment 7

2 months ago

Sure.

Assignee: nobody → shes050117
Status: NEW → ASSIGNED
Flags: needinfo?(shes050117)
Assignee

Comment 8

2 months ago

After looking into more, I reckon it's because the ObjectStoreAddOrRequestOp/TransactionDatabaseOperationBase is destroyed so that the connection is a dangling pointer. With the ActorDestory() stuff which Jan found, I think it's likely to happen.

The reason is:

  • connection is held by the database and the database is held by the transaction. The transaction is held by TransactionDatabaseOperationBase. [1]
  • The transaction only is null'd in Cleanup() and the function shouldn't be called before the RunOnConnectionThread()
  • EnsureConnection() is called slightly before the crash happened [2].

I'm working a patch to make this happen manually to verify my guess.

[1] https://hg.mozilla.org/mozilla-central/annotate/529782921812aaf0558fbb22384b3051174657d3/dom/indexedDB/ActorsParent.cpp#l10029
[2] https://hg.mozilla.org/mozilla-central/annotate/529782921812aaf0558fbb22384b3051174657d3/dom/indexedDB/ActorsParent.cpp#l21506

Assignee

Comment 9

2 months ago

So, after testing it with the minidump, it actually crashes with the reason:
connection->mUpdateRefcountFunction.mRawPtr was nullptr.

It seems it's [1].

[1] https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/dom/indexedDB/ActorsParent.cpp#9487

Assignee

Comment 10

2 months ago

What I found on the minidump:

  • Frame 0

    • connection isn't a nullptr
      • mInReadTransaction is true
      • mInWriteTransaction is false
      • mRefCnt is 2
    • mUpdateRefcountFunction is a nullptr
  • Frame 2 TransactionDatabaseOperationBase::RunOnConnectionThread

  • In the successful case of EnsureConnection()

The only place where the mUpdateRefcountFunction is set:

  • DatabaseConnection::BeginWriteTransaction()

https://searchfox.org/mozilla-central/rev/2738efcf98d746c8810819e7c0634b2c57212a8d/dom/indexedDB/ActorsParent.cpp#9356

It seems the problem is that:
If a new connection is set in EnsureConnection() is created, its (the connection object) mUpdateRefcountFunction is never set before the crash point.

I have no idea how do I create a test to close the connection while adding or putting an object store. So, I might right a patch to force it close and get a new one in EnsureConnection()

Assignee

Comment 11

2 months ago

My guess is that the connection is closed for some reasons before the EnsureConnection. Although EnsureConnection generate a new connection for the database/transaction, the attribute of the connection wasn't set properly.

With this patch, I can reproduce the crash by running the test_objectStore_inline_autoincrement_key_added_on_put.htm

The only strange point is that CloseDatabase() is called only when the database is recognized as an idle database. It seems that there might have some issues.

Setting as a blocker for 67, with 200 crashes per day on beta, this is very likely to explode in release if we don't have a fix or a mitigation in place.

Assignee

Comment 13

2 months ago

So, my understanding of this issue is that:
ConnectionPool::CloseDatabase() [1] is called before the PBackgroundIDBRequestParent* TransactionBase::AllocRequest() [2] is called. For a read transaction that should be okay, but it becomes a problem in the case of a write transaction. The reason is that in the Database::EnsureConnection() [3], it only creates a read transaction when it can't find out a connection for the given database. Consequently, for instance, it would crash when it starts the autosave point in ObjectStoreAddorPutRequestOp::DoDatabaseWork().

My first impression for this is that I should check whether we can prevent the connection from being closed in this case and also check if we can either abort the operation or update the read transaction to the write transaction.

For the "prevent the connection from being closed in this case":
I check all the caller of the CloseDatabase(). And, all of them are executed because the idle timer (30s) is expired. It seems they are reasonable.

For the "abort the operation or update the read transaction to the write transaction"

  • "Abort the operation"
    I hit the assertion for lacking mStoreageConnection [GetCachedStatement in 4] for the Transaction while the Transaction is commiting if I tried to abort the operation in TransactionDatabaseOperationBase::RunOnConnectionThread() (before ensuring the connection [5]).

  • "Update the read transaction to the write transaction"
    I hit the assertion for duplicating the key [6] while executing the statement if I tried to do that by calling BeginWriteTransaction() in ObjectStoreAddorPutRequestOp::DoDatabaseWork() [7]. It's a failure case for getting a constraint error of foreign key. It seems that part of the cached statement was rollbacked while closing the previous connection.

[1] https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/dom/indexedDB/ActorsParent.cpp#11510
[2] https://searchfox.org/mozilla-central/rev/6db0a6a56653355fcbb25c4fa79c6e7ffc6f88e9/dom/indexedDB/ActorsParent.cpp#13797
[3] https://searchfox.org/mozilla-central/rev/6db0a6a56653355fcbb25c4fa79c6e7ffc6f88e9/dom/indexedDB/ActorsParent.cpp#12503
[4] https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/dom/indexedDB/ActorsParent.cpp#21886
[5] https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/dom/indexedDB/ActorsParent.cpp#21564-21572
[6] https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/dom/indexedDB/ActorsParent.cpp#24214
[7] https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/dom/indexedDB/ActorsParent.cpp#23946

Comment 14

2 months ago

I'll take a look too.

Flags: needinfo?(jvarga)
Assignee

Comment 15

a month ago

(In reply to Tom Tung [:tt, :ttung] from comment #13)

  • "Update the read transaction to the write transaction"
    I hit the assertion for duplicating the key [6] while executing the statement if I tried to do that by calling BeginWriteTransaction() in ObjectStoreAddorPutRequestOp::DoDatabaseWork() [7]. It's a failure case for getting a constraint error of foreign key. It seems that part of the cached statement was rollbacked while closing the previous connection.

After adding more log,
Under a VersionChangeTransaction

  1. CreateObjectStoreOp::DoDatabaseWork() -> Insert (id, auto_increament, name, key_path) into table object_store
  2. ObjectStoreAddOrPutRequestOp::DoDatabaseWork() -> Insert (object_store_id, ...) into table object_data

However, if I manually closed the connection of the transaction to mimic idle timer expired between 1 and 2 (in fact, I closed it at the beginning of initializing ObjectStoreAddOrPutRequestOp), I found out that the inserted statment could be find out (by select the id from table object_store) before the connection is closing (at the SendingToConnectionPool of ObjectStoreAddOrPutRequestOp). The problem is that after the connection is closed, I couldn't find out the inserted statment if I wanted to select the id from the table object_store. Consequectly, it leads a foreign key contraint error when inserting a object_store_id to table object_data.

The problem for me on this is that, I assemed the statment would be written into a SQLite database after the Op completes. However, it seems I was wrong.

It seems that the only way I can do for this is to abort the transaction for the write connection if the connection is about to be closed. And, the remaining problem of this is that it seems there is no easy way to get a transaction from the ConnectionPool. I need to figure a way out of this.

(In reply to Jan Varga [:janv] from comment #14)

I'll take a look too.

I have to re-evaluate this, we have only 3 weeks for stabilizing LSNG, so I'd like to focus on that.
Not sure who else could help here. Everyone seems to be busy right now.

Flags: needinfo?(jvarga)

This is a very high crash volume on beta and would block the 67 release, so it should be a high priority.
Andrew, is there anyone else who can help out here?

Flags: needinfo?(overholt)

We're working on a plan here.

Flags: needinfo?(overholt)
Assignee

Comment 19

a month ago

(In reply to Tom Tung [:tt, :ttung] from comment #13)

  • "Abort the operation"
    I hit the assertion for lacking mStoreageConnection [GetCachedStatement in 4] for the Transaction while the Transaction is commiting if I tried to abort the operation in TransactionDatabaseOperationBase::RunOnConnectionThread() (before ensuring the connection [5]).

I'm thinking to rollback all the statement when the connection for a write transaction is eventually closed. Then, I will need to notify the Transaction and set some flag there. So that when CommitOp::Run is executing, we can bypass these GetCachedConnection()s and also the assertions.

I will check if it's really safe to do that (the code is really complicated, though) before the discussion tomorrow.

We discussed this again and Tom has a plan to add a null check. He'd also like to convert an assertion to a diagnostic assertion and asuth will coordinate with sheriffs (tomorrow) so we don't have an explosion of crashes due to that.

Assignee

Comment 21

a month ago

The real problem is because the connection was closed for some reason. This
patch won't resolve the issue by only let the operation and the transaction fail
to escape from the crash. Note that it also changing an assertion when the
transaction is either commiting or aborting. This is prevent the problem for
closing connection is worser than we expected.

Comment 23

a month ago
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/6520a5a9e6f3
P1 - Mitigate the issue by checking the pointer before accessing it; r=asuth

Comment 24

a month ago
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/7b127d669026
P2 - Change some assertions to diagnostic assertion to get more information on Nightly; r=asuth

Comment 25

a month ago
bugherder
Status: ASSIGNED → RESOLVED
Last Resolved: a month ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Assignee

Comment 27

a month ago

Comment on attachment 9059381 [details]
Bug 1528690 - P1 - Mitigate the issue by checking the pointer before accessing it;

Beta/Release Uplift Approval Request

  • User impact if declined: Crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It's a nullptr check before the place where users crashed. If a user runs into that check, it would return an error so that the operation would be aborted.
  • String changes made/needed:
Attachment #9059381 - Flags: approval-mozilla-beta?

Comment on attachment 9059381 [details]
Bug 1528690 - P1 - Mitigate the issue by checking the pointer before accessing it;

Fixes a top crash, approved for 67.0b13.

Attachment #9059381 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Volume on ESR60 is low enough that I don't think we need to worry about backporting there. Feel free to nominate if you feel strongly otherwise, however.

Comment 32

a month ago
Pushed by shes050117@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2942b016522b
P3 - Change more assertions to diagnostic assertion to get more information on Nightly; r=asuth
Assignee

Comment 33

a month ago

(In reply to Pulsebot from comment #32)

Pushed by shes050117@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2942b016522b
P3 - Change more assertions to diagnostic assertion to get more information
on Nightly; r=asuth

This is a patch to diagnosis the issue so that it is expected to make some users crash at DatabaseConnection::~DatabaseConnection() or DatabaseConnection::Close().

Assignee

Comment 34

a month ago

I'll wait two more days to see if P3 can catch the issues on Nightly. If it's not, I'll change or add more assertions and make the assertions in P3 into release assertion and uplift that to Beta to catch the issue as soon as possible.

Assignee

Updated

a month ago
Depends on: 1546892

Comment on attachment 9063568 [details]
Backed out changeset 1a7c349c2c60 for causing Bug 1528690;

Revision D30387 was moved to bug 1546892. Setting attachment 9063568 [details] to obsolete.

Attachment #9063568 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.