Closed Bug 1319531 Opened 8 years ago Closed 7 years ago

Sporadic NS_BASE_STREAM_WOULD_BLOCK error for IDBObjectStore.put() on WebAssembly.Module

Categories: Core :: Storage: IndexedDB (defect, P1)

Status: RESOLVED FIXED
Target Milestone: mozilla54
Tracking Status: firefox52: tracking+, fixed; firefox53: fixed; firefox54: fixed

People

(Reporter: luke, Assigned: asuth)

Attachments

(2 files, 6 obsolete files)

23.46 KB, patch (asuth: review+)
22.84 KB, patch (asuth: review+)
STR:
 1. set javascript.options.wasm = true
 2. load http://files.unity3d.com/alexsuvorov/test/indexedDB/put-wasm/
 3. wait until 'put' is enabled then repeatedly click 'put'
For me, step 3 fails frequently on opt builds and sporadically on debug builds; it seems to be timing-dependent.

The underlying failure seems to be an aInputStream->Read() in ObjectStoreAddOrPutRequestOpt::CopyFileData() in the parent process where the return value is NS_BASE_STREAM_WOULD_BLOCK.  Any pointers on how this is supposed to work?  Happy to help debug.
Jan probably knows what's up here.
Assignee: nobody → jvarga
Priority: -- → P1
Jan was saying earlier that baku might be able to help triage this as it relates to streaming blobs.
Flags: needinfo?(amarchesini)
just had a 1:1, overholt is also going to ask asuth
I'm going to investigate this with bug 1319737; clearing the needinfo on :baku for now.  (:baku please speak up if you're actively investigating! :)
Assignee: jvarga → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(amarchesini)
The problem is that once memory-backed blobs cross the 1MiB threshold, instead of being sent as a StringInputStreamParams where the entire contents go across the wire in a single message, a PSendStream is used.  The contents of the memory-backed blob are asynchronously sent to the parent, where IndexedDB gets an nsPipeInputStream that receives the data asynchronously and therefore may not yet have data available when read.  And AllocPSendStreamParent explicitly invokes NS_NewPipe2 with the input set to non-blocking, so that pipe is non-blocking.  Note that the pipe also has infinite buffering; at this time there's no back-pressure of any kind.

There's currently no way to retroactively change the nsPipeInputStream's blocking behavior[1] after it's created. The options that come to mind for dealing with streams that QI to nsIAsyncInputStream (nsStringStream does not) are:
- Simplest: wrap it in a second pipe that we explicitly create to have a blocking input stream.
- Defer the DoDatabaseWork call until asyncWait(callback, WAIT_CLOSURE_ONLY, ...) notifies us.  This is potentially more efficient since it avoids wasted work, but would break if sendstream gains any type of backpressure or stops using an infinite pipe.  Also, it causes us to wait for all the memory-backed-blobs to be fully sent across before we start any writing, which could increase peak memory usage.
- Non-trivial: Re-write DoDatabaseWork to use NS_AsyncCopy.
- Duplicate the nsPipeInputStream's blocking idiom by having CopyFileData use a monitor that gets asyncWait() notified about on another thread.
- Madness: Use nsSyncStreamListener, which creates another pipe anyways, and also involves spinning a nested event loop.
- Also madness: Augment PSendStream to support specifying whether the input stream should be blocking.  Besides having it be weird for the child to describe the implementation behavior of streams in the parent, all the PSendStream decisions are made at a low-level where we're exclusively interacting with blobs.  We'd need to effectively mark the blob implementation at creation time and propagate that info through multiple call-sites.

What do you think, Jan?  The simplest option is just wrapping the input stream in a blocking pipe, which is what I'd lean towards.

1: Although SetNonBlocking is public on nsPipeInputStream, the class is only defined in nsPipe3.cpp.
Flags: needinfo?(jvarga)
And to be clear, the read-race problem is not WASM specific.  Any blob being written to IndexedDB that wasn't A) already known to the specific IndexedDB database or B) entirely synchronously available in the parent process will potentially race and may fail to put/add().

Which I think also means that the wait-for-closed option is not safe/viable since there can be other blob sources where that just won't work.  For example, the fancy XHR support that stashes Blobs on disk in temporary files.
(In reply to Andrew Sutherland [:asuth] from comment #6)
> And to be clear, the read-race problem is not WASM specific.  Any blob being
> written to IndexedDB that wasn't A) already known to the specific IndexedDB
> database or B) entirely synchronously available in the parent process will
> potentially race and may fail to put/add().

Oh, this came to my mind immediately when I was reading comment 5. Yeah, this is a generic issue and not WASM-specific. I'm surprised we don't have noticeable issues elsewhere.

I'll investigate your proposed options. NS_AsyncCopy() would be nice, but that probably requires the task queues or something, so that would be quite a complex fix.
(In reply to Andrew Sutherland [:asuth] from comment #5)
> - Simplest: wrap it in a second pipe that we explicitly create to have a
> blocking input stream.

I prefer the first option; it seems easy, clean, and effective.
Flags: needinfo?(jvarga)
I was unsure of the best way to test/verify this.  I considered adding a DEBUG-only mechanism to allow unit tests to force SendStream streams to stall until explicitly released, but this seemed perhaps overly complicated for what amounts to a correctness issue.  (And in particular I was concerned about it being a brittle strategy since it could possibly break other things.  Also, it wouldn't be clear when to release the stream.  So, many issues.)

In the end I decided on calling this an obvious correctness fix and manually reproducing the failure mode to sanity check the fix.  The next patch I'll attach adds a debugging printf to explain the decision-making process of the code and induces nsITimer-based stalls in all SendStreams.  That doesn't need to be reviewed and absolutely shouldn't land, etc.
Attachment #8823502 - Flags: review?(jvarga)
Attached patch Testing Helper v1 (obsolete) — Splinter Review
Comment on attachment 8823502 [details] [diff] [review]
Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. v1

Review of attachment 8823502 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, now that it's implemented, it looks so easy :)

::: dom/indexedDB/ActorsParent.cpp
@@ +6675,5 @@
> +   *   to a non-blocking pipe that will return NS_BASE_STREAM_WOULD_BLOCK.
> +   * - Other Blob types, such as files, may also be non-blocking.
> +   *
> +   * Failure of this code-path is deemed sufficiently impossible that we assert
> +   * on failure and do not need to propagate errors.

I really appreciate the comment, thanks.
I also like the idea of keeping things simple, but I'm a bit worried about file input streams; for example, can't Available() ever fail?
There are only two callers of this method, and it seems to me that error propagation wouldn't be so hard to do.
Also, Available() and GetSize() could do I/O in theory; we generally try to avoid that on the background thread too.

@@ +6681,2 @@
>    already_AddRefed<nsIInputStream>
> +  GetBlockingInputStream() const

Nit: This method may deserve definition outside of the class declaration.
(In reply to Jan Varga [:janv] from comment #14)
> I also like the idea to keep things simple, but I'm a bit worried about file
> input streams, for example Available(), can't that ever fail ?

Yes, it can fail; my mental model of what it was doing was wrong.  I'll add error-handling that results in failure for the add/put request if anything goes wrong acquiring the stream.  If the Blob can't provide an error-free stream, the Blob's contract has been violated and the add/put must fail because there's no way to get the correct value out.  Prior drafts of the patch did have error handling, but more than the verbosity, I was worried about the cleanup if we'd already processed one-or-more Blobs.  (And since it seemed like maybe these things couldn't/shouldn't fail, the extra error handling/analysis would be wasted effort/complexity.)  I'll make sure to do the analysis to ensure we don't leak things.  I can probably fashion a test for the "someone deleted the file backing our Blob!" case if IDB doesn't already have coverage for that.

It might also make sense to optimize the file path slightly.  nsFileStreamBase returns aNonBlocking=false, so the call to Available() and the fseek()s that NSPR does under the hood are wasted unless the OS/filesystem takes that as a hint to start read-ahead.  I can make the available/size checks conditional on the stream being aNonBlocking=true.  This will eliminate it as a source of failures.  (But we'll still keep the error handling because I don't want to make assumptions about other stream implementations.)  I'll plan to do this.
Yeah, that all sounds good to me.
(In reply to Jan Varga [:janv] from comment #14)
> @@ +6681,2 @@
> >    already_AddRefed<nsIInputStream>
> > +  GetBlockingInputStream() const
> 
> Nit: This method may deserve definition outside of the class declaration.

I'm parsing this as meaning that you think it might be a useful utility function elsewhere.  There are two reasons I think it probably makes sense to leave it like this for now:
- The Available() check in the PSendStream case depends on knowing the true size of the Blob.  Since the method is a member function of DatabaseFile and has access to its mBlobImpl, that check is really easy for the helper as structured, but it would be less straightforward for a generic version and any would-be other users.
- This is, in many ways, a hack.  I think if there are other call-sites that are finding it necessary to do this, then it probably makes sense to address in Necko.  (And in the future as extensions are primarily limited to webextensions, it should be easier to make such a change without having to worry about massive extension breakage.)
Oh, sorry, I probably wasn't clear. I meant to just declare the method in the class and move the definition out, i.e. DatabaseFile::GetBlockingInputStream()
{
  ...
}

But it's not a big deal.
Ah, yes, that makes sense!  Especially with the error handling, the method becomes quite large.

I have a cleaned-up patch and a test that deletes the file out from under an IndexedDB file-backed-blob.  Unfortunately, the error-handling situation is proving somewhat complicated.  Currently, if op->Init() returns false in TransactionBase::StartRequest, IPC_FAIL_NO_REASON() is returned, which ends badly for the content process.  Changing this to send an error and return success is running into assumptions/assertions baked into NormalTransactionOp and TransactionDatabaseOperationBase that the state machine advanced to reporting, ran cleanup at that point, and ended up completed.  Things are made somewhat more complex by SendFailureResult's Send__delete__ call deleting the operation since it's not independently ref-counted.  (Namely, making SendFailureResult advance the state machine and invoking it before Cleanup() is not safe and asserts.)

I think I'm pretty close to making it work without being horribly ugly, but if you have any thoughts on how to handle this scenario, I'm interested.  The high level options seem to be to make the operations:
- understand an early/very-early failure as an explicit state where we report the error and delete the operation during the constructor processing
- mark the operation as failed because of Init but let it propagate more formally through its state machine.

I'm a bit torn about the scenario.  I'm pursuing this test because it's the easiest way to induce failures in getting the stream, but there's also a degree of realism in a world where anti-virus software may arbitrarily decide to quarantine/delete random files it deems suspicious.

It's worth noting that because the file stream actually does a deferred opening, my initial fix and test resulted in a failure where Init() succeeded because the input stream opened acceptably, but the sync copy failed, which ended up with weird state where I think the put of a deleted file succeeded but the get was messed up.  (Or something like that, I didn't investigate too much because that seemed more complicated to disentangle and not aligned with my testing goals.  Although perhaps relevant for the current WASM file format transition.)  So I made GetBlockingInputStream() run the Available() check even if we know the stream is already blocking (which is the case for file streams) so that the deferred open is instead fully processed and we can notice if the stream is broken during Init()-time.
As discussed, this adds error-checking to the stream-retrieval process.

Rather than treating failure to retrieve the stream as something fatal that should result in the termination of the child process, we opt to instead just fail the put().  We do this by saving off the returned error code in a new member, mInitResult, which we then return in DoDatabaseWork if there's a failure.  To reduce error-handling boilerplate in Init(), we use the ScopeExit helper that can be found at http://searchfox.org/mozilla-central/source/mfbt/ScopeExit.h.

I opted for this approach because, as hinted at by my previous comment, repurposing Init() to allow for non-fatal returns got complicated/ugly quickly.  Many of the existing "return false" calls did seem like appropriate cases for IPC_FAIL_NO_REASON returns.  Early result transmission was at odds with deeply asserted invariants.  Plus (and probably worst of all), having early returns without actually doing the thread dispatching could result in apparent reordering of operations, which would be very undesirable.

As also noted in comment 5, the test I added depends on us asking the input stream for its Available() bytes even in the case of the already-known-blocking file stream.  Unfortunately there's really no other good way to test this functionality.  (At least not without adding non-trivial fault injection to the blob/sendstream infrastructure.)
Attachment #8823502 - Attachment is obsolete: true
Attachment #8823502 - Flags: review?(jvarga)
Attachment #8827100 - Flags: review?(jvarga)
Attached patch Testing Helper v2 (obsolete) — Splinter Review
Attachment #8823503 - Attachment is obsolete: true
Oh, and the change in specialpowers to use appendRelativePath instead of appendPath is because appendPath only lets a single child path component be appended, whereas appendRelativePath allows multiple segments to be appended at once.  I looked at the original bug adding the e10s-friendly functionality and the use of appendPath seemed like it was just the simplest thing that worked.  There didn't seem to be any consideration of potentially using appendRelativePath instead.  And since specialpowers implies (intended) chrome-level privileges anyways, I don't see any security ramifications to the change.
Ok, I'll take a look ASAP.
Comment on attachment 8827100 [details] [diff] [review]
Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. v2

Review of attachment 8827100 [details] [diff] [review]:
-----------------------------------------------------------------

This looks really good!

There's just one thing. I'm not 100% sure, but I think that so far we've been avoiding any I/O on the PBackground thread.
The new GetBlockingInputStream() calls Available(), which may do deferred file opening and then probably gets the size of the file.
I'm not sure about blobImpl->GetSize(), but in theory it can do I/O too.
Could we do the Available() call and the following actions in DoDatabaseWork() instead?
You mentioned in some other bug that the stream transport service can be used on any thread. Not sure about NS_NewPipe().
That way we would avoid I/O on the background thread; on the other hand, an error would be caught later, possibly when other files were already copied.
However, those files should be removed by our reference-counting implementation for stored files.

If you find this idea reasonable and technically feasible, you can do it in this patch.
Anyway, sorry for being so detail-oriented, and thanks a lot for helping with this bug.

::: dom/indexedDB/ActorsParent.cpp
@@ +6738,5 @@
> +      return nullptr;
> +    }
> +
> +    // If it's non-blocking we may need a pipe.
> +    bool pipeNeeded = true;

Nit: Do we need to initialize it to true?

@@ +6748,5 @@
> +    // Detect file-backed blob problems earlier:
> +    // Retrieve the number of bytes available even if we know the stream to be
> +    // blocking.  This is because local file streams use a deferred opening
> +    // implementation and it's much simpler for our (current) error handling if
> +    // we detect it now rather than inside SyncCopy.

Hm, I don't understand why it's easier to do it here.

@@ +8292,5 @@
> +  // a failure is stored here, it will be returned by our DoDatabaseWork().
> +  // While nsresult is also appropriate for size reasons, ErrorResult cannot be
> +  // used currently because it asserts an owning-thread relationship which is
> +  // not compatible with our usage.
> +  nsresult mInitResult;

Nit: This should go before any bools or 1-byte enums, before mPersistenceType I think to achieve more efficient alignment.

@@ +8297,1 @@
>  

Nit 2: This is currently strictly scoped to |if (!fileAddInfos.IsEmpty()) {| in ObjectStoreAddOrPutRequestOp::Init() and I fully understand reasons for not making this generic, so I think that we could use a more specific name, like mStoredFileInitResult (feel free to name it other way).

@@ +26132,5 @@
> +    // We clean up mStoredFileInfos/mFileManager to maintain our invariants.
> +    // Cleanup() style logic is not appropriate here; that method handles
> +    // successful clearing of the underlying input stream.
> +    ErrorResult rv;
> +    auto assumeFileErrorOnExit = mozilla::MakeScopeExit([&] {

This is new to me, very nice. I like that we don't have to define own RAII class for this.

::: dom/indexedDB/test/test_file_put_deleted.html
@@ +92,5 @@
> +    // This is async without a callback because it's intended for cleanup.
> +    // Since IDB is PBackground, we can't depend on serial ordering, so we need
> +    // to use another async action.
> +    SpecialPowers.removeFiles();
> +    SpecialPowers.executeAfterFlushingMessageQueue(grabEventAndContinueHandler);

Wow, good to know that something like this exists!
Attachment #8827100 - Flags: review?(jvarga) → review+
(In reply to Jan Varga [:janv] from comment #24)
> @@ +6748,5 @@
> > +    // Detect file-backed blob problems earlier:
> > +    // Retrieve the number of bytes available even if we know the stream to be
> > +    // blocking.  This is because local file streams use a deferred opening
> > +    // implementation and it's much simpler for our (current) error handling if
> > +    // we detect it now rather than inside SyncCopy.
> 
> Hm, I don't understand why it's easier to do it here.

I think I got overly fixated on:
- Doing the stream conversion to blocking inside Init()
- Making sure the test was actually testing the logic I was adding (in Init) rather than the more generic error failure case.
This comment expresses that rationalization.  That's one upside to reviewing my patches; I'll usually incriminate myself in a comment when I realize I'm doing something dubious!

Related to this and your other comments, you're absolutely right that I/O on PBackground is a huge no-no.  File-backed Blobs can/will do I/O if they don't know the size.  Shouldn't happen, but since we're now involving DoDatabaseWork(), it's clear that the right thing to do is defer all of those calls to there.  I'll update the patch appropriately.  (I'm also going to probably ask for re-review; my apologies on the multiple cycles here and my deep appreciation for your guidance and feedback.)  I agree that given how exceedingly rare/unexpected these errors are that it's best to structure the control flow for simplicity, even though it means we'll recognize the errors later and there may be some wasted I/O performed/etc.

I should have the new patch in ~1day.
I've revised the patch and it passes the new test and existing IDB tests, but I'm experiencing an intermittent failure when manually testing with the WASM testcase from bug 1319737.  If I keep refreshing the page, inevitably what happens is:
- IPC logs show both WASM blobs being constructed and sent to the parent process and the sendstreams having an appropriate life-cycle.
- None of the blob writing code NS_WARNs even though every control flow path that results in the blob not being written to disk should do so.
- UpdateRefcountFunction::RemoveJournals warns that it failed to remove a journal but the transaction claims to complete successfully.
- The test issues the read-back get and FileManager::GetCheckedFileForId asserts on one of the blobs not existing.
- Consistent with that, at this point on disk we only have the 10 meg wasm bytecode blob, not the 50 meg compiled blob.

I plan to continue investigating this tomorrow, probably with some more printf's/logging.  (Unfortunately, WASM-related things have been painful to debug since WASM gets disabled when mach disables the slow-script mechanism when using a debugger.  And when I explicitly leave slow-script enabled with "--slowscript" my X session has locked up several times.  rr has worked for recording but would tend to lockup in replay.)  Fingers crossed that it's something obvious.
(In reply to Andrew Sutherland [:asuth] from comment #26)
Thanks for all the hard work so far!

> I plan to continue investigating this tomorrow, probably with some more
> printf's/logging.  (Unfortunately, WASM-related things have been painful to
> debug since WASM gets disabled when mach disables the slow-script mechanism
> when using a debugger.  And when I explicitly leave slow-script enabled with
> "--slowscript" my X session has locked up several times.  rr has worked for
> recording but would tend to lockup in replay.)  Fingers crossed that it's
> something obvious.

On a side note, that "--slowscript" option should become the default and just be removed.  (The original motivation was random watchdog-timer-induced safe-segfaults, but those are all gone now, leaving only segfaults for wasm/asm.js out-of-bounds.)  I'm surprised to hear about the frequent X session lockups though, I haven't ever experienced or heard of that in relation to anything wasm.  I have, on rare occasion, seen X lock up when debugging FF, though.
The problem I was running into is that although ActorsParent's DatabaseFile implementation is ref-counted, its ActorDestroy method nulls out mBlobImpl and mFileInfo.  As implemented (but not documented ;) on trunk, DatabaseFile is only guaranteed to have valid contents:
- During the ObjectStoreAddOrPutRequestOp::Init call.
- As long as there is a strong reference to the originating Blob in the child process.  IDBDatabase maintains a weak-map from Blob (not BlobImpl) to DatabaseFile for the purposes of ensuring a not-already-stored-in-the-database Blob is only stored once in the database if possible.  That's the only reason the actor has to stay alive at all.  It was also the source of my confusion because the continued existence of mBlobImpl means that a previous transaction failed to write to disk, presumably for quota reasons.

Because the Available() optimization that lets us avoid a wasteful pipe needs access to the Blob, pulling the Blob out adds a lot of plumbing, and I didn't immediately see the harm in continuing to access StoredFileInfo's mFileActor, I opted to use it to keep the patch and interdiff small. Unfortunately, I ran afoul of IDBDatabase's weak-map reaping expiration timing, which can look non-deterministic.  (Even slightly pipelined transactions can cause expiration to occur while the subsequent transaction is running, because transaction completion triggers an expiration check.)

I'm going to address this now and add some comments.  Unless I hear otherwise, I think I'm just going to stop ActorDestroy from nulling out DatabaseFile's mBlobImpl and mFileInfo.  DatabaseFile is the only data structure that has awareness of the underlying Blob, and I think it makes sense to only spread the Blob-touching if the non-actor references to DatabaseFile were potentially long-lived.  But StoredFileInfo only lives as long as its operation, which is a reasonable lifetime.  Additionally, in normal operation, the child's expiration logic will only fire after the StoredFileInfo instance is dead.


And :luke, thanks very much for the clarification about slowscript and filing and fixing bug 1332312.  The less things to worry about, especially if they're nothing to worry about, the better!
Sounds good and yeah this stuff is quite complicated :)
(I'm speculatively carrying forward the r+ here, I'm attaching an explicit interdiff separately in a sec that I'm putting an r? on too.  This patch is the previous patch you r+'d plus the "interdiff" qfolded into it.)
Attachment #8827100 - Attachment is obsolete: true
Attachment #8828699 - Flags: review+
Comment on attachment 8828699 [details] [diff] [review]
Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. v3

Review of attachment 8828699 [details] [diff] [review]:
-----------------------------------------------------------------

Nice! I've got only some nits.

::: dom/indexedDB/ActorsParent.cpp
@@ +6637,5 @@
>  };
>  
> +/**
> + * In coordination with IDBDatabase's mFileActors weak-map, a long-lived mapping
> + * from a child process's non-IDB-file-backed Blob to its FileInfo in our owning

I'm not sure if "non-IDB-file-backed" is true here. I think DatabaseFile is created for all blobs. See for example Database::AllocPBackgroundIDBDatabaseFileParent()

@@ +6670,5 @@
> +   * Is there a blob available to provide an input stream?  This may be called
> +   * on any thread under current lifecycle constraints.
> +   */
> +  bool
> +  HasInputStream() const

Nit: Maybe HasBlobImpl() ?

@@ +6758,5 @@
> +  // thread.  The background thread should call HasInputStream().
> +  MOZ_ASSERT(!IsOnBackgroundThread());
> +
> +  nsCOMPtr<nsIInputStream> inputStream;
> +  if (mBlobImpl) {

Nit: Almost entire method body is in this branch, maybe we could do |if (!mBlobImpl) return null;| here.

@@ +6796,5 @@
> +      if (!target) {
> +        rv.Throw(NS_ERROR_UNEXPECTED);
> +        return nullptr;
> +      }
> +      MOZ_ASSERT(target);

I think this assert is useless, you already checked |target|.
An alternative would be to somehow pass |rv| to do_GetService() and then check it instead of checking |target|.

@@ +6797,5 @@
> +        rv.Throw(NS_ERROR_UNEXPECTED);
> +        return nullptr;
> +      }
> +      MOZ_ASSERT(target);
> +      nsCOMPtr<nsIInputStream> pipeInputStream;

Nit: I would add an empty line before these two variables.

@@ +26444,1 @@
>        storedFileInfo.mInputStream.swap(inputStream);

Hm, maybe you could add assertions that either mInputStream or mFileActor is non-null
Attachment #8828700 - Flags: review?(jvarga) → review+
I've made the changes suggested and will land after this last try run.  I also fixed a small bug in SpecialPowersObserver.jsm where the created output stream was only explicitly closed if data was provided to be written.  That was leading to Windows test failures because DeleteFile() on Windows only marks the file for deletion if there are still open handles to it, deferring the actual deletion until the handles are closed.  The output stream would remain open until garbage collection triggered a close, which reliably raced in a way that resulted in test failure on Windows.  (We expected the deletion to have occurred synchronously, since that's what it's supposed to do.)

(In reply to Jan Varga [:janv] from comment #34)
> I'm not sure if "non-IDB-file-backed" is true here. I think DatabaseFile is
> created for all blobs. See for example
> Database::AllocPBackgroundIDBDatabaseFileParent()

Indeed.  That comment also needed some other improvement too; I've tried to flesh out the intent and Blob context a bit more.
 
> Hm, maybe you could add assertions that either mInputStream or mFileActor is
> non-null

This also needed to accept that both are null if it's a mutable file, I've done so.
Looks great.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f1285bd1cbdb2e3230468b4782973a5c471856d
Bug 1319531 - Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. r=janv
(what I pushed to inbound, r=janv)
Attachment #8828699 - Attachment is obsolete: true
Attachment #8828700 - Attachment is obsolete: true
Attachment #8830674 - Flags: review+
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/670965db76ae
Backed out changeset 3f1285bd1cbd for test failures in own test on a CLOSED TREE
The test failures were due to the test's path logic getting faked out by the android mount-point names.  I've grown the regexp to include a little of the quota manager path too (so, /storage/default/ instead of /storage/, and also supporting windows path separators, of course), and android tests pass now.  Pushing to inbound again.
Flags: needinfo?(bugmail)
https://hg.mozilla.org/integration/mozilla-inbound/rev/515256f6226adaddb7ced052259387a4d0c10925
Bug 1319531 - Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. r=janv
The mozilla-central fix uplifts cleanly to aurora, but beta 52 needed help.  My understanding is we'll want to uplift to 52 because of e10s && WASM.  The bug may potentially manifest when storing any memory-backed blob >= 1 MiB in size with e10s in play.  WASM makes things much more likely because it involves large blobs to begin with (ex: 11MiB in the bug 1319737 test-case) and the compiled bytecode blobs are even larger (ex: 47MiB).  NI'ing myself to request uplift after short baking on trunk.

No breakage was expected, but I locally ran mochitests and the manual test-case to be sure.
Attachment #8827101 - Attachment is obsolete: true
Flags: needinfo?(bugmail)
Attachment #8831646 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/515256f6226a
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Confirmed that the test case in bug 1319737 doesn't show any errors anymore, on 10 runs: good job!
(In reply to Benjamin Bouvier [:bbouvier] from comment #48)
> Confirmed that the test case in bug 1319737 doesn't show any errors anymore,
> on 10 runs: good job!

And thank you very much for the invaluable test-case!  Easily-run, minimal reproduction test-cases are the best.  You rock!
\o/
Comment on attachment 8830674 [details] [diff] [review]
Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. v4

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1294450 changed stream serialization under e10s such that IndexedDB's assumption that it could issue synchronous reads of Blobs sent to it was no longer safe.  (This was an incorrect assumption, but it previously held.)  Specifically, when sending memory-backed Blobs from a child process to a parent process, if the Blob is >= 1MiB, IndexedDB would receive a non-blocking nsPipe which violated its assumptions.  If the child process failed to send the data to the parent process and relay it to the pipe quickly enough, IDB reads could fail with NS_BASE_STREAM_WOULD_BLOCK, resulting in a failure to write the Blob.

[User impact if declined]:
IndexedDB can fail to write memory-backed Blobs to disk, resulting in unexpected errors when later attempting to read the blobs out of the database.  This particularly matters for the WebAssembly ("WASM") effort targeted at Firefox 52.  Bug 1311466 introduced the ability to store WebAssembly modules in IndexedDB.  These end up as potentially huge memory-backed Blobs (ex: 11MiB, 47MiB) which are at serious risk of losing the race.  The resulting read errors are potentially worse for WASM modules since they are directly consumed by IDB at a lower level; as evidenced in bug 1319737, assertions will fire in assertion-enabled builds, etc.

[Is this code covered by automated tests?]:
The patch introduces a new test which verifies correct behavior.

[Has the fix been verified in Nightly?]:
Yes, manually by the author and by the reporter of bug 1319737 in comment 48.

[Needs manual test from QE? If yes, steps to reproduce]:
No.  In the case of aurora, there has been little deviation from trunk so far, and there is automated coverage.

[List of other uplifts needed for the feature/fix]:
No dependencies.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
It's a small amount of code that conditionally wraps an existing async stream into a blocking stream only when necessary.  In the cases where there was no risk of a race to begin with, the code behaves identically to the existing code.

[String changes made/needed]:
None.
Flags: needinfo?(bugmail)
Attachment #8830674 - Flags: approval-mozilla-aurora?
Comment on attachment 8831646 [details] [diff] [review]
Beta 52 channel version of fix. (Minor bit-rot and test infra changes resolved.)

Note that this is basically the same scenario as for the aurora uplift request.  This patch has simply been adapted for the 52 branch.  I've also manually tested the build with the test case from bug 1319737.

[Feature/Bug causing the regression]:
Bug 1294450 changed stream serialization under e10s such that IndexedDB's assumption that it could issue synchronous reads of Blobs sent to it was no longer safe.  (This was an incorrect assumption, but it previously held.)  Specifically, when sending memory-backed Blobs from a child process to a parent process, if the Blob is >= 1MiB, IndexedDB would receive a non-blocking nsPipe which violated its assumptions.  If the child process failed to send the data to the parent process and relay it to the pipe quickly enough, IDB reads could fail with NS_BASE_STREAM_WOULD_BLOCK, resulting in a failure to write the Blob.

[User impact if declined]:
IndexedDB can fail to write memory-backed Blobs to disk, resulting in unexpected errors when later attempting to read the blobs out of the database.  This particularly matters for the WebAssembly ("WASM") effort targeted at Firefox 52.  Bug 1311466 introduced the ability to store WebAssembly modules in IndexedDB.  These end up as potentially huge memory-backed Blobs (ex: 11MiB, 47MiB) which are at serious risk of losing the race.  The resulting read errors are potentially worse for WASM modules since they are directly consumed by IDB at a lower level; as evidenced in bug 1319737, assertions will fire in assertion-enabled builds, etc.

[Is this code covered by automated tests?]:
The patch introduces a new test which verifies correct behavior.

[Has the fix been verified in Nightly?]:
Yes, manually by the author and by the reporter of bug 1319737 in comment 48.

[Needs manual test from QE? If yes, steps to reproduce]:
No.  In the case of beta, there hasn't been much codebase drift; I manually tested my application of the patch to 52 using the test case from bug 1319737, and the test added in the patch also passes for me.

[List of other uplifts needed for the feature/fix]:
No dependencies.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
It's a small amount of code that conditionally wraps an existing async stream into a blocking stream only when necessary.  In the cases where there was no risk of a race to begin with, the code behaves identically to the existing code.

[String changes made/needed]:
None.
Attachment #8831646 - Flags: approval-mozilla-beta?
Comment on attachment 8830674 [details] [diff] [review]
Ensure all streams provided by DatabaseFile can be read without generating NS_BASE_STREAM_WOULD_BLOCK errors. v4

Fix an issue related to IndexedDB. Aurora53+.
Attachment #8830674 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8831646 [details] [diff] [review]
Beta 52 channel version of fix. (Minor bit-rot and test infra changes resolved.)

indexeddb fix affecting wasm, should be in 52.0b3

Thanks Andrew for the nicely detailed uplift request. :)
Attachment #8831646 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1335054