Closed Bug 1335054 Opened 7 years ago Closed 7 years ago

Intermittent dom/indexedDB/test/test_file_sharing.html | application crashed [@ RefPtr<mozilla::dom::BlobImpl>::operator->][@ ObjectStoreAddOrPutRequestOp::DoDatabaseWork]

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla54

Tracking Flags:

Tracking

Status

firefox52

---

fixed

firefox-esr52

---

fixed

firefox53

---

fixed

firefox54

---

fixed

People

(Reporter: intermittent-bug-filer, Assigned: asuth)

References

Details

(Keywords: assertion, crash, intermittent-failure)

Attachments

(2 files, 1 obsolete file)

Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v1 7 years ago Andrew Sutherland [:asuth] (he/him) 4.94 KB, patch	janv : review+	Details \| Diff \| Splinter Review
Interdiff: remove HasBlobImpl, mFileManager. 7 years ago Andrew Sutherland [:asuth] (he/him) 10.31 KB, patch		Details \| Diff \| Splinter Review
Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v2 7 years ago Andrew Sutherland [:asuth] (he/him) 12.94 KB, patch	asuth : review+ gchang : approval-mozilla-aurora+ jcristau : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Treeherder Bug Filer

Reporter

Description

•

7 years ago

treeherder

Filed by: cbook [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=72981055&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/LU3beekgSDCGUfNf6CQ7KA/runs/0/artifacts/public/logs/live_backing.log

Ryan VanderMeulen [:RyanVM]

Comment 1

•

7 years ago

The timing of this fits nicely with bug 1319531. Can you please take a look, asuth?

Blocks: 1319531

status-firefox52: --- → affected

status-firefox53: --- → affected

status-firefox54: --- → affected

Component: DOM → DOM: IndexedDB

Flags: needinfo?(bugmail)

Keywords: assertion, crash

Summary: Intermittent dom/indexedDB/test/test_file_sharing.html | application crashed [@ RefPtr<mozilla::dom::BlobImpl>::operator->] → Intermittent dom/indexedDB/test/test_file_sharing.html | application crashed [@ RefPtr<mozilla::dom::BlobImpl>::operator->][@ ObjectStoreAddOrPutRequestOp::DoDatabaseWork]

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

7 years ago

Assignee: nobody → bugmail

Status: NEW → ASSIGNED

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 2

•

7 years ago

DatabaseFile::GetBlockingInputStream() is in the midst of executing on the Connection Thread when ObjectStoreAddOrPutRequestOp::Cleanup() interrupts it on the Background Thread and nulls mBlobImpl right out from under it.

My fix in bug 1319531 introduced this race. Previously, mBlobImpl was only ever accessed on the background thread. The lifecycle looked like: mBlobImpl initialized, mBlobImpl used to produce an nsIInputStream 0+ times, mBlobImpl cleared by ActorDestroy and/or ClearInputStream(). The ownership rule was only the background thread touched mBlobImpl.

With the fix introducing BlobImpl::GetSize(), file I/O became possible and so mBlobImpl started being accessed on the connection thread as well. The bug/race is due to:
- The effective mBlobImpl life-cycle rule was initialization and clearing on the background thread, but read multiple times on the connection thread.
- ObjectStoreAddOrPutRequestOp::Init() errs on the side of scheduling a file to be written multiple times until it's sure the file has already been written to disk. This can potentially result in a file being written multiple times. I think this might better explain FileHelper::CreateFileFromStream's original undocumented confusing behavior where it checked the file size and skipped the write if it was already correct. (Which I raised in https://bugzilla.mozilla.org/show_bug.cgi?id=1312808#c16 review hunk 4.) I think the "exists" check CreateFileFromStream ended up with still can make sense, however.
- test_file_sharing.html schedules the same blob to be written in 2 consecutive add() operations within the upgradeneeded transaction. This produces the race we are witnessing here where the first add() operation has successfully written the blob to disk and set mCopiedSuccessfully and its Cleanup() is clearing it at the same time DoDatabaseWork() is trying to perform a redundant write.

I'm going to bypass options like "use a monitor" and "save a copy of mBlobImpl off" in favor of eliminating the StoredFileInfo::mCopiedSuccessfully indirection and instead directly clearing mBlobImpl on the connection thread after a successful CreateFileFromStream(). mCopiedSuccessfully was appropriate when mBlobImpl was only accessed on the background, thread. But with our new control flow, we can clear it on the connection thread and also thereby eliminate the potential duplicate write problem.

The new mBlobImpl lifecycle becomes:
- Initialized on background thread at DatabaseFile creation time. Safe because by definition no other thread has a reference during initialization.
- Cleared only on the connection thread. (Or by the DatabaseFile destructor via its threadsafe refcount hitting zero. But by definition that means no other thread has a reference to access.) Note that we already removed the ActorDestroy clearing in the fix.
- Accessed on both threads. But the background thread case is only to call bool-returning HasBlobImpl(). The connection-thread DoDatabaseWork can handle a previous op's DoDatabaseWork having cleared mBlobImpl out on the same thread, which is how we handle the attempted duplicate write case test_file_sharing.html and others can create. (So even if HasBlobImpl() on the background thread always lied and returned true, it would not matter.)

Patch and try run shortly.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 3

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f5d8dbc37eb

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 4

•

7 years ago

Attached patch Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v1 (obsolete) — Details — Splinter Review

Flags: needinfo?(bugmail)

Attachment #8833238 - Flags: review?(jvarga)

Jan Varga [:janv]

Comment 5

•

7 years ago

Comment on attachment 8833238 [details] [diff] [review]
Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v1

Review of attachment 8833238 [details] [diff] [review]:
-----------------------------------------------------------------

Great analysis as usual :)

I can only say that it seems we can get rid of HasBlobImpl() now by:
- removing checks for HasBlobImpl() in ObjectStoreAddOrPutRequestOp::Init()
- removing mFileManager
- adding mDatabase and initialize it in ObjectStoreAddOrPutRequestOp constructor
- calling mDatabase->GetFileManager() in ObjectStoreAddOrPutRequestOp::DoDatabaseWork() to get fileManager when it's needed

I think this makes sense given that HasBlobImpl() is not reliable as you outlined in your comment and the lifecycle will be even cleaner/clearer.

All the comments related to mBlobImpl and stuff will need to be updated too.

Anyway, great analysis, comments and fix.
Thanks!

Attachment #8833238 - Flags: review?(jvarga) → review+

Jan Varga [:janv]

Comment 6

•

7 years ago

(In reply to Andrew Sutherland [:asuth] from comment #2)
> multiple times.  I think this might better explain
> FileHelper::CreateFileFromStream's original undocumented confusing behavior
> where it checked the file size and skipped the write if it was already
> correct.  (Which I raised in
> https://bugzilla.mozilla.org/show_bug.cgi?id=1312808#c16 review hunk 4.)  I
> think the "exists" check CreateFileFromStream ended up with still can make
> sense, however.

Ah, so this was the reason for having the original check. I thought it was for the corner case described at https://dxr.mozilla.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#29386
Anyway, as you said, it still makes sense for the corner case.

Comment hidden (Intermittent Failures Robot)

24 failures in 733 pushes (0.033 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* autoland: 12
* mozilla-inbound: 9
* try: 1
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 12
* linux32: 11
* windows8-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1335054&startday=2017-01-30&endday=2017-02-05&tree=all

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 8

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=71888e081b5d

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 9

•

7 years ago

Attached patch Interdiff: remove HasBlobImpl, mFileManager. — Details — Splinter Review

These are the changes as requested.  The one deviation was that since ObjectStoreAddOrPutRequestOp subclasses TransactionDatabaseOperationBase (via NormalTransactionOp), ObjectStoreAddOrPutRequestOp has access to TransactionDatabaseOperationBase's mTransaction via public Transaction(), so we're able to pull the FileManager out via `Transaction()->GetDatabase()->GetFileManager();` without having to save off mDatabase.  (We know this to be safe because TransactionDatabaseOperationBase::RunOnConnectionThread which calls DoDatabaseWork() explicitly asserts on mTransaction and mTransaction->GetDatabase() and the Database class never clears its mFileManager.)

Alternately we could have exposed DatabaseConnection's mFileManager by adding a new public accessor, since the signature is DoDatabaseWork(DatabaseConnection* aConnection).

I'll land with the patches folded together after sufficiently green try run.

Jan Varga [:janv]

Comment 10

•

7 years ago

Comment on attachment 8834276 [details] [diff] [review]
Interdiff: remove HasBlobImpl, mFileManager.

Review of attachment 8834276 [details] [diff] [review]:
-----------------------------------------------------------------

Perfect!

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 11

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/8eb49b4026a776a1636a746216ea95dc0dbac783
Bug 1335054 - Fix IndexedDB blob writing ownership race. r=janv

Geoff Brown [:gbrown]

Updated

•

7 years ago

Comment 12

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8eb49b4026a7

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox54: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Ryan VanderMeulen [:RyanVM]

Comment 14

•

7 years ago

OrangeFactor looks good! Please request Aurora/Beta approval on this when you get a chance.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 15

•

7 years ago

Attached patch Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v2 — Details — Splinter Review

(This is the r=janv patch that landed.  I expect uplift to graft anyways, but for easy history, I'm updating the attachment.)  The landed https://hg.mozilla.org/mozilla-central/rev/8eb49b4026a7 grafts onto beta without problem unlike the original patch.

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1319531 that was uplifted to alpha and beta already.

[User impact if declined]:
See comment 2 for a detailed analysis, but in-brief: there is a crashing race in IndexedDB Blob/File writes.  It occurs when a Blob that was not previously stored in IDB is referenced in back-to-back add/put operations that don't wait for completion/success notifications.  (An add/put that references the Blob multiple times will not cause the race.)

[Is this code covered by automated tests?]:
Yes, and they have given us this bug ;).  Creating a test that would more reliably reproduce the race is not really feasible; this test is the most efficient reproduction possible (without massive new test infra).

[Has the fix been verified in Nightly?]:
Yes, and :RyanVM has helpfully confirmed that orange factor saw an improvement.

[Needs manual test from QE? If yes, steps to reproduce]:
No, this can't be manually tested.  The appropriate verification is making sure that the test crashes go away like :RyanVM did for nightly.

[List of other uplifts needed for the feature/fix]:
None.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
With wisdom and guidance from :janv, the usage pattern that resulted in a race was completely eliminated.

[String changes made/needed]:
None.

Attachment #8833238 - Attachment is obsolete: true

Flags: needinfo?(bugmail)

Attachment #8834990 - Flags: review+

Attachment #8834990 - Flags: approval-mozilla-beta?

Attachment #8834990 - Flags: approval-mozilla-aurora?

Gerry Chang [:gchang]

Updated

•

7 years ago

Attachment #8834990 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Gerry Chang [:gchang]

Comment 16

•

7 years ago

Comment on attachment 8834990 [details] [diff] [review]
Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v2

Fix a intermittent-failure. Aurora53+.

Julien Cristau [:jcristau]

Comment 17

•

7 years ago

Comment on attachment 8834990 [details] [diff] [review]
Stop defering mBlobImpl clear via mCopiedSuccessfully, clear immediately. v2

fix race condition in indexeddb, beta52+

Attachment #8834990 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Carsten Book [:Tomcat]

Comment 18

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/0ab74d80ea74

status-firefox53: affected → fixed

Carsten Book [:Tomcat]

Comment 19

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/ef757ca3fbfe

status-firefox52: affected → fixed

Comment hidden (Intermittent Failures Robot)

6 failures in 836 pushes (0.007 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* autoland: 3
* mozilla-beta: 2
* mozilla-inbound: 1

Platform breakdown:
* linux64: 5
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1335054&startday=2017-02-06&endday=2017-02-12&tree=all

Ryan VanderMeulen [:RyanVM]

Comment 21

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/ef757ca3fbfe

status-firefox-esr52: --- → fixed

You need to log in before you can comment on or make changes to this bug.