Closed Bug 1393063 Opened 7 years ago Closed 7 years ago

XHR send of FormData with certain Blobs hangs indefinitely (Firefox 55.0.0 - 55.0.2)

Categories

(Core :: DOM: Core & HTML, defect)

55 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 + wontfix
firefox56 + verified
firefox57 --- verified

People

(Reporter: mail, Assigned: asuth)

References

Details

(Keywords: dev-doc-complete, regression, site-compat)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170814072924

Steps to reproduce:

I'm a developer at https://p3d.in/ and today we noticed that uploading certain models using latest Firefox versions hangs indefinitely. After couple of hours of debugging we managed to produce a simple test case that replicates the issue. As far as I can tell it is a bug in Firefox 55 and there is no workaround for the issue:

When sending FormData that has multiple Blobs appended to it the XHR request never sends anything. No events fire, no network request is made, there are no errors in the console.

It seems to only happen when combining constructed Blob() objects together with a manually selected file blob in the FormData.

Here (and attached in the report) is a test case that we have confirmed to replicate the bug on both Windows 10 and Ubuntu 16.04 using the Firefox 55.0.2 release:

Steps to reproduce:
1) Open the test case HTML page: https://miikah.fi/other/firefox_blob_xhr_bug.html
2) In the file dialog select any file or this file that has been confirmed to reproduce the issue: https://miikah.fi/other/test_texture.jpg
3) When the file is selected javascript will send XHR request to "/dummyurl" containing the selected file and two randomly generated Blob() objects



Actual results:

What happens on latest Firefox version:
- No "onload", "onerror" or "onreadystatechange" events ever fire (Event handlers have console.log in them)
- No errors are visible in the console
- No XHR requests appears in the developer tools "Network" tab
- The status box will indefinitely display "Pending..."


Expected results:

What is supposed to happen:
- Both "onload" and "onreadystatechange" events should fire at some point (Event handlers have console.log in them)
- Network tab should have the "/dummyurl" request visible
- The status box should switch to "Done"

This test case works fine in previous Firefox versions (<= 54.x) and all other browsers in cluding Google Chrome.
Thanks for your bug report! I guess it's a duplicate of Bug 1383518 which is going to be solved with Firefox 55.0.3.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Component: Untriaged → HTML: Form Submission
Product: Firefox → Core
Resolution: --- → DUPLICATE
Thank you for providing a reproduction and trying out the try build and providing feedback at https://bugzilla.mozilla.org/show_bug.cgi?id=1383518#c44.  Based on your description and testing, this certainly sounds like a different problem and I'm reopening to investigate.  I aim to have a progress update within 5 hours if not better news than that.
Assignee: nobody → bugmail
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
See Also: → 1383518
[Tracking Requested - why for this release]: A site compatibility regression in Firefox 55. Resulting in the same regression range as Bug 1383518 but the cause seems different. I'm updating my developer note [1] to mention this as well.

[1] https://www.fxsitecompat.com/en-CA/docs/2017/firefox-55-breaks-file-upload-on-certain-sites/
Status: REOPENED → ASSIGNED
Has Regression Range: --- → yes
Has STR: --- → yes
Ah sorry, "Resulting in the same regression range as Bug 1383518" was about Bug 1392580.
Has Regression Range: yes → no
I'm still investigating, but the overview is:
- In the content process, we end up with a complicated hierarchy of streams (gdb python powered pretty print attached).  Internal nodes are multiplex streams and buffered streams, leaf nodes are:
  - nsStringInputStreams for the multipart string pieces
  - nsStringInputStreams for the 2 memory-back blobs created in the content process
  - IPCBlobInputStream for the <input type=file> that came from the parent.
- SerializeInputStreamChild computes a good estimate of the ExpectedSerializedLength of 3000584 which is larger than kTooLargeStream so we take the SerializeInputStream branch that instantiates an IPCStreamSource.
- The resulting PChildToParentStream does a fair bit of streaming according to MOZ_IPC_LOG, but it seems to end up hanging according to both the IPC log and HTTP log and the fact that devtools never sees the request, etc.
  - We do see a dom::IPCBlobInputStreamParent::RecvStreamNeeded happening in the middle, and it does materialize and serialize a file stream (which we know because REOPEN_ON_REWIND results in a spurious NS_ENSURE_SUCCESS failure in nsFileStreamBase::Tell during the redundant destructor-based close that happens strictly after the "hey I stole your FD" Close() during fd serialization).

I'm going to continue to investigate what's up with the streaming.  We should definitely send the "rr" team another bag of money with a dollar sign on it.  I can't imagine debugging stuff like this without it.
Track 56+ as site-compat and regression.
In case this helps tracking down this bug, we just came up with a workaround that fixes p3d.in uploads:

Like stated in the report, having mixed constructed blobs and file uploads seems to trigger the issue.

As a workaround we first tried converting file uploads to constructed blobs like this: "new Blob([file]);" That didn't make a difference.

However, converting the file to an ArrayBuffer using FileReader.readAsArrayBuffer() and then converting that ArrayBuffer back to blob (new Blob([e.target.result])) seems to have "fixed" the issue. Its a bit nasty change and probably multiplies memory footprint of file upload, but at least Firefox users can upload again until you guys release a fix.

Anyway, this seems to indicate that the bug is in appending Files to FormData, not Blobs themselves.
(In reply to MiikaH from comment #7)
> As a workaround we first tried converting file uploads to constructed blobs
> like this: "new Blob([file]);" That didn't make a difference.

Yeah, this just adds another layer of blobs to the mix which is not a major change given the nesting already occurring.

> However, converting the file to an ArrayBuffer using
> FileReader.readAsArrayBuffer() and then converting that ArrayBuffer back to
> blob (new Blob([e.target.result])) seems to have "fixed" the issue. Its a
> bit nasty change and probably multiplies memory footprint of file upload,
> but at least Firefox users can upload again until you guys release a fix.

But this does!  And indeed, this does result in the memory being entrained by the JS engine until garbage-collected, but it's hard to argue with working versus broken ;)
 
> Anyway, this seems to indicate that the bug is in appending Files to
> FormData, not Blobs themselves.

We shall see!  Thanks for the update, it's very much appreciated to have any extra info.
Found it.  IPCStreamSource::DoRead is buggy and can only be safely used on streams that self-close at EOF, returning NS_BASE_STREAM_CLOSED.  DoRead uses Available() to size its read buffer for efficiency, but also will issue a Wait() whenever it sees 0 available.  This is problematic since the only way for its OnEnd method to be invoked is one of the following things to happen:
- For Available() to return an NS_FAILED(rv), such as NS_BASE_STREAM_CLOSED.  However, many streams do not self-close and return 0 at EOF.
- For Read() to return an empty buffer.  Zero byte reads are the canonical method for indicating EOFs for streams.  But since we Wait() if Available() returns 0, it's basically impossible to get a 0 byte read to happen unless Available() is lying to us.

The result is that it goes into an infinite loop of:
- Available() returning 0
- Calling Wait()
- Receiving an OnStreamReady() callback
- Invoking DoRead()
- (repeat)

This raises questions of how things ever work in the condition where the kTooLargeStream threshold is crossed.  Maybe the answer is that usually it's not because we're usually dealing with Blobs that are already known to the parent?
See Also: → 1371699
:bz had a high quality patch (part 5) for this on bug 1371699 that never landed because of backout complications experienced by the rest of the patch series.  I'm going to update the patch to reference this bug leaving authorship and review credit intact, adding the extra context in the commit message, and land it here.
This aligns the code more closely with how the input stream pump works: 0
available bytes when the stream itself told us it's ready means the stream is at
the end.

Note from asuth: This fix was originally part of bug 1371699, it was part 5.
However, its initial landing was backed out, but it is now very much needed for
bug 1393063, so I'm landing it.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ce120febd926283815b1d74a1fde5b7aeeb24219
Bug 1393063 - Part 1: Fix IPCStreamSource's handling of async streams returning 0 from Available. r=bkelly
I verified that both my minimal local fix (https://hg.mozilla.org/try/rev/64d8e7dd06376f4919c87fa3dda9af62487ab94b) and :bz's fix made the comment 0 test case pass.  I aborted my try push jobs and pushed :bz's fix to inbound.

In terms of the question of how things ever worked, the answer is that memory-backed Blobs are represented as streams as nsStringInputStream instances.  These are not currently nsIAsyncInputStreams (although bug 1371699 will make them do so), so SerializeInputStream is compelled to use NS_NewPipe2 and NS_AsyncCopy.  NS_AsyncCopy by default sets aCloseSource=true and aCloseSink=true, and SerializeInputStream leaves these defaults intact.  So IPCStreamSource benefits from Available() returning NS_BASE_STREAM_CLOSED. 

That's how dom/file/ipcs/tests/browser_ipcBlob.js passed, and why having a POST consisting only of memory-backed blobs works.  Because nsMultiplexStream only conditionally QI's to nsIAsyncInputStream when it contains streams which QI to nsIAsyncInputStream.  If all the leaf streams are nsStringInputStream instances, as is the case for both memory-backed blobs and the MIME multipart encoding stuff, then the root doesn't QI to nsIAsyncInputStream and the self-closing pipe gets used.

I'm not 100% clear on what to do test-wise here, since Bug 1371699 will immediately provide us with all the coverage we could ever want for this case (which is likely why :bz determined the fix was needed).  Obviously, with that flip, we would expect a decrease in coverage for non-nsIAsyncInputStream cases, but file streams that don't get wrapped in multipart encoding should provide the coverage for us.

:baku, what do you think... is it sufficient to just count on bug 1371699 addressing the test coverage issue?  Should we do something in browser_ipcBlob.js to wrap the memory-backed blobs into non-self-closing async streams?
Flags: needinfo?(amarchesini)
Comment on attachment 8901150 [details] [diff] [review]
Part 1: Fix IPCStreamSource's handling of async streams returning 0 from Available

Approval Request Comment
[Feature/Bug causing the regression]:
The blob/stream overhaul cleanup.

[User impact if declined]:
Form submission will hang under e10s in cases where:
- A type="file" form input is in use, resulting in the presence of an async stream that avoids the pipe/NS_AsyncCopy that protects the broken streaming logic from itself.
- There's also ~1 megabyte of non-file data that pushes us over our 1 megabyte serializations threshold, causing us to use the broken streaming logic.  In practice, this is most likely to occur for programmatic XHR/fetch using FormData() with memory-backed Blobs created by JS in the content process.  However, it could also occur if the form includes text fields where the user pasted in an immense amount of data.

[Is this code covered by automated tests?]:
It's complicated.  See comment 14 and whatever :baku's response is.  The short answer is not yet, but bug 1371699 will result in us immediately having a lot of coverage.

[Has the fix been verified in Nightly?]:
The nightly fix just got pushed, but I verified locally on a very fresh nightly build.

[Needs manual test from QE? If yes, steps to reproduce]: 
The comment 0 test is excellent, please follow MiikaH's excellent STR and test page there.

[List of other uplifts needed for the feature/fix]:
No other uplifts needed.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
The streaming logic was broken.  It just didn't get used a lot.

Also, :bz actually wrote the patch, so you know it's gotta be good.

[String changes made/needed]:
No string changes made.
Attachment #8901150 - Flags: approval-mozilla-release?
Attachment #8901150 - Flags: approval-mozilla-beta?
Comment on attachment 8901150 [details] [diff] [review]
Part 1: Fix IPCStreamSource's handling of async streams returning 0 from Available

(Setting the r+ flag to avoid questions during approval.  This is actually r=bkelly, but I also give it the r=asuth seal of approval.)
Attachment #8901150 - Flags: review+
We don't need extra tests because Blob inputStreams can be File, String, another stream if received by another process. Plus sliced streams or multiplex of the previous types. All of them are well covered by our tests currently.

Interesting point is that, if string inputStream are becoming nsIAsyncInputStream, we can force BlobImpl to work with nsIAsyncInputStream only. I'll file a bug for this.
Flags: needinfo?(amarchesini)
https://hg.mozilla.org/mozilla-central/rev/ce120febd926
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Hm, I ran into this problem too in bug 1351488 and fixed it a bit differently. I'll attach my patch in bug 1351488.
Hi Brindusa, could you help find someone to verify if this issue was fixed as expected on the latest Nightly build? Thanks!
Flags: needinfo?(brindusa.tot)
Flags: qe-verify+
Comment on attachment 8901150 [details] [diff] [review]
Part 1: Fix IPCStreamSource's handling of async streams returning 0 from Available

Verified by asuth in nightly; let's uplift this fix for 56 beta 7.
Attachment #8901150 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
How widespread do you think this issue is on release?
Flags: needinfo?(bugmail)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #22)
> How widespread do you think this issue is on release?

I don't think we have the telemetry to reliably answer this question.  Given that the criteria depends on having multipart for submissions combining >=1MiB of data locally sourced in the content process in conjunction with a Blob/File sourced from the parent process via input type=file, I think the following types of sites would break:
- Those dealing with WebGL and user generated content, such as the site from comment 0, p3d.in.  Specifically, we could expect such sites to use a combination of models and/or textures sourced via <input type=file> as well as derived data.
- Photo-editing sites that perform photo re-touching locally on the client.  If we imagine one or more pictures loaded from disk via <input type=file> plus one or more photos locally processed through JS/WebGL, this could happen.  (It might also not happen if FormData isn't in play.)

If the underlying questions are:
1) Should this go on the release branch?
2) Is it necessary to immediately perform a release and push it to everyone?

I would say:

1) Yes, this should go on the release branch.  I know you are familiar with "profile from the future" bug 1246615.  You may also have seen my schema change tracking in https://public.etherpad-mozilla.org/p/quota-manager-schema-change-log.  If not, the bad news is that downgrading from Firefox 55 to 54 breaks everything because 55 is the release we rev'ed the QuotaManager schema in, which is the worst possible schema rev as far as downgrades go. 
2) No, it is not necessary to push another re-spin out to everyone with chemspill immediacy.  However, I would say the sooner this gets in a release and the sooner it's in the download people get if they explicitly reinstall, the better.  If it's possible to spin a new "soft" release and have bouncer/whatever deprioritize pushing it so we're not compelling a download and a Firefox restart, but that new downloads/re-downloads/people-checking-their-version-to-see-if-an-upgrade-exists-that-fixes-their-problem will get it, that would be ideal.
Flags: needinfo?(bugmail)
OK, thanks very much Andrew!  Julien, fyi you likely want this on m-r.
Flags: needinfo?(jcristau)
Flags: needinfo?(brindusa.tot)
Comment on attachment 8901150 [details] [diff] [review]
Part 1: Fix IPCStreamSource's handling of async streams returning 0 from Available

ok let's take this on release as well as a ride along
Flags: needinfo?(jcristau)
Attachment #8901150 - Flags: approval-mozilla-release? → approval-mozilla-release+
Note that I don't know when, or even whether, bug 1371699 will land....  So I would not rely on that for test coverage.
Comment on attachment 8901150 [details] [diff] [review]
Part 1: Fix IPCStreamSource's handling of async streams returning 0 from Available

We're two weeks away from 56 release and there are no plans for 55.0.4 at this time.
Attachment #8901150 - Flags: approval-mozilla-release+ → approval-mozilla-release-
Reproduced the issue using Firefox 55.0.3 RC on macOS 10.12.6 and I verified that the issue is fixed using Firefox 56 beta 11 and latest Nightly 57.0a1 across platforms (Windows 10 64bit, macOS 10.12.6 and Ubuntu 16.04 32bit).
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Component: HTML: Form Submission → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.