Closed Bug 1397645 Opened 7 years ago Closed 7 years ago

Analyzing of large file upload on mediafire.com is too slow

Categories

(Core :: DOM: Core & HTML, defect)

57 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 + wontfix
firefox57 + verified

People

(Reporter: ananuti, Assigned: baku)

References

Details

(Keywords: regression)

Attachments

(2 files)

Uploading large media file (≈300 MiB mkv file) to mediafire.com is too slow.

STR:

1. open https://www.mediafire.com/ (need login)
2. try upload large mkv or mp4 file

Result: file analyzing is too slow.

Last known good revision is mozilla-inbound changeset c35f0ad34397
First bad revision is mozilla-inbound changeset cf6065f64f9a

pushlog:
o  changeset:   394940:cf6065f64f9a
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Tue Apr 25 14:07:31 2017 +0200
|  files:       dom/file/ipc/Blob.cpp
|  description:
|  Bug 1359172 - RemoteInputStream must create a correct inputStream when used and sliced in a separate process, r=smaug
|
|
o  changeset:   394939:2629873b5fdc
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Mon Apr 24 12:16:50 2017 +0200
|  files:       dom/base/nsContentUtils.cpp dom/ipc/ContentChild.cpp dom/ipc/DOMTypes.ipdlh dom/ipc/TabParent.cpp
|  description:
|  Bug 1358115 - Use IPCBlob in DataTransfer, r=smaug
|
|
o  changeset:   394938:049673ad3893
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Mon Apr 24 12:16:50 2017 +0200
|  files:       dom/ipc/ContentChild.cpp dom/ipc/ContentParent.cpp dom/ipc/PContent.ipdl
|  description:
|  Bug 1358113 - Use IPCBlob in File.createFromNsIFile/createFromFileName, r=smaug
|
|
o  changeset:   394937:2ef7ae7a605f
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Mon Apr 24 12:16:49 2017 +0200
|  files:       dom/file/nsHostObjectProtocolHandler.cpp dom/file/nsHostObjectProtocolHandler.h dom/ipc/ContentChild.cpp dom/ipc/ContentChild.h dom/ipc/ContentParent.cpp dom/ipc/ContentParent.h dom/ipc/PContent.ipdl dom/url/URL.cpp
|  description:
|  Bug 1358114 - Use IPCBlob in BlobURL, r=smaug
|
|
o  changeset:   394936:7d7576b29e2b
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Mon Apr 24 12:16:49 2017 +0200
|  files:       dom/filesystem/FileSystemTaskBase.cpp dom/filesystem/FileSystemTaskBase.h dom/filesystem/GetDirectoryListingTask.cpp dom/filesystem/GetFileOrDirectoryTask.cpp dom/filesystem/GetFilesTask.cpp dom/filesystem/PFileSystemParams.ipdlh dom/filesystem/PFileSystemRequest.ipdl
|  description:
|  Bug 1358111 - Use IPCBlob in Entries API - part 2 - Entries API, r=smaug
|
|
o  changeset:   394935:b3cc0e03d9b7
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Mon Apr 24 12:16:49 2017 +0200
|  files:       dom/filesystem/GetFilesHelper.cpp dom/ipc/ContentChild.cpp dom/ipc/PContent.ipdl
|  description:
|  Bug 1358111 - Use IPCBlob in Entries API - part 1 - GetFilesHelper, r=smaug
|
|
o  changeset:   394934:4de83aa3b817
|  user:        Andrea Marchesini <amarchesini@mozilla.com>
|  date:        Tue Apr 25 14:07:30 2017 +0200
|  files:       dom/file/ipc/IPCBlobUtils.h dom/ipc/FilePickerParent.cpp dom/ipc/PFilePicker.ipdl widget/nsFilePickerProxy.cpp
|  description:
|  Bug 1358109 - Use IPCBlob in PFilePicker, r=smaug
Flags: needinfo?(amarchesini)
screencast the problem https://www.dropbox.com/s/2ucbbhxc2d03ll0/bug1397645.mp4?dl=0
Has Regression Range: --- → yes
Has STR: --- → yes
Summary: Upload large file to mediafire.com is too slow → Analyzing of large file upload on mediafire.com is too slow
[Tracking Requested - why for this release]: regression
I can track this for 56, but we have very little time left to come up with a fix before the 56 release. 

baku, what do you think, can you take a look?
Here what is happening:

1. a blob is created from the FilePicker. Its stream is a IPCBlobInputSteam in a init state, no remote stream yet. On the parent side we have a nsFileInputStream.
The blob impl is a StreamBlobImpl.

2. The page creates X slices of the blob (around 20) each one of 200k bytes.
StreamBlobImpl uses SlicedInputStream in order to create the new blobs.

3. SlicedInputStream checks if the internal stream is seekable, but IPCBlobInputStream, is not. It starts doing consecutive read() in order to reach the starting point. This is very slow and inefficient.

Here the fix: I introduced a new interface called nsICloneableInputStreamWithRange for those streams (so far only IPCBlobInputStream) that are faster in doing clone with range instead of clone + SlicedInputStream.

IPCBlobInputStream is faster doing clone with range, because the slice will be done when the remote stream is received, before creating the async remote stream. Because the remote stream in this case is a nsFileInputStream, it is fully seekable, and SlicedInputStream will work more efficiently.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8906244 - Flags: review?(bugmail)
Comment on attachment 8906244 [details] [diff] [review]
nsICloneableInputStreamWithRange

I can confirm this patch fixes the regression for me. :)
Attachment #8906244 - Flags: feedback+
Comment on attachment 8906244 [details] [diff] [review]
nsICloneableInputStreamWithRange

Review of attachment 8906244 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the excellent characterization of the problem and the solution in comment 5.

It would really be great if we could have test logic assert on the stream classes in use.  Maybe we could add a debug-only stream "visitor" API that would be something like debugVisitStreams(callback), where callback(parentPtr, childPtr, childConcreteClassName, childDepth, childStart, childLength) is invoked in a pre-order tree traversal fashion.  Then we can build a JS helpers on top of that called assertStreamHierarchy() and assertBlobStreamHierarchy() that might look like this for FormData XHR payloads in a test that just cares that we're stating the expected hierarchy with nothing about slices so we can omit those fields:
  assertStreamHierarchy(
    postStream,
    {
     type: 'nsMultiplexInputStream',
     children: [
       'nsStringInputStream',
       {
         type: 'nsBufferedInputStream',
         children: [{ type: 'nsMultiplexInputStream', children: ['nsStringInputStream'] }]
       }
       'nsStringInputStream',
       {
         type: 'nsBufferedInputStream',
         children: ['IPCBlobInputStream']
       },
       'nsStringInputStream'
     ]
    });

And for this specific case, we'd go from:
  is(blob.size, 4096)
  assertBlobStreamHierarchy(
    blob.slice(1024, 1024+2048),
    {
      type: 'SlicedInputStream',
      start: 1024, length: 2048,
      children: [{
        type: 'IPCBlobInputStream',
        start: 0, length: 4096
      }]
    });
to
  is(blob.size, 4096)
  assertBlobStreamHierarchy(
    blob.slice(1024, 1024+2048),
    {
      type: 'IPCBlobInputStream',
      start: 1024, length: 2048
    });
Attachment #8906244 - Flags: review?(bugmail) → review+
What you are saying would be really nice for testing. I'll file a bug and I'll implement it as a follow up.
We also need to send start/length to the parent process when IPCBlobInputStream is serialized. I also extended the documentation. Green on try.
Attachment #8906567 - Flags: review?(bugmail)
Comment on attachment 8906567 [details] [diff] [review]
range sent to the parent process + documentation

Review of attachment 8906567 [details] [diff] [review]:
-----------------------------------------------------------------

As always, it is so awesome to see the IPCBlobUtils.h documentation comments and see them updated!

::: dom/file/ipc/IPCBlobInputStream.cpp
@@ +475,5 @@
>    MOZ_ASSERT(mActor->Size() >= aStart + aLength);
>    mStart = aStart;
>    mLength = aLength;
> +
> +  // Now it's the right time to apply a slice if needed.

I wonder if this method should be renamed to "Init" or "InitWithSlicedRange" (and normal Clone maybe would call InitWithExistingRange?) to make it more clear that this is a construction-time mutation, not something that can happen later on.  Alternately, I think some state-based assertion may be appropriate.

I'd suggest fleshing out this comment a little.  Things become more clear with a naming change above, but still, maybe:
In the child, we slice in StreamReady() when we set mState to eRunning.  But in the parent, we start out eRunning, so it's necessary to slice the stream as soon as we have the information during the initialization phase because the stream is immediately consumable.

::: dom/file/ipc/IPCBlobUtils.h
@@ +188,5 @@
> + *
> + * Slicing IPCBlob
> + * ~~~~~~~~~~~~~~~
> + *
> + * Normally, slicing a blob consists in the creation of a new Blob, with a

super-nitty phrasing nit I'm only nitting because of a typo below: s/consists in/consists of/.  (Although I learned a new thing from https://english.stackexchange.com/questions/61600/consist-in-vs-consist-of... I had never realized "consists in" was even a thing.)

Also space-after-period-required nit in the line below this one.

@@ +193,5 @@
> + * SlicedInputStream() wrapping a clone of the original inputStream.But this
> + * approach is extremely inefficient with IPCBlob, because it could be that we
> + * wrap the pipe stream and not the remote inputStream (See the previous section
> + * of this documentation). If we end up doing so, also if the remote
> + * inputStream is seekable, the pipe will not, and in order to reach the

rather-nitty phrasing nit: s/the pipe will not/the pipe will not be/.

@@ +200,5 @@
> + * This problem is fixed implmenting nsICloneableWithRange in IPCBlobInputStream
> + * and using cloneWithRange() when a StreamBlobImpl is sliced. When the remote
> + * stream is received, it will be sliced directly.
> + *
> + * If we want to rappresent the hierarchy of the InputStream classes, instead

typo nit: s/rappresent/represent/

@@ +202,5 @@
> + * stream is received, it will be sliced directly.
> + *
> + * If we want to rappresent the hierarchy of the InputStream classes, instead
> + * of having: |SlicedInputStream(IPCBlobInputStream(Async Pipe(RemoteStream)))|,
> + * we have: |IPCBlobInputStream(Async Pipe(SlicedInputStream(RemoteStream)))|.

I love the existence of this whole comment block, and this bit in particular is invaluable.  Expertise well-distilled!
Attachment #8906567 - Flags: review?(bugmail) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d285843321b5
Optimize IPCBlobInputStream slicing with the introduction of nsICloneableInputStreamWithRange, r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/47982714b7b6
Propagation of the IPCBlobInputStream slicing to the parent process, r=asuth
Verified on 20170912013600 Nightly.

baku, is this something we'd consider safe to uplift? If so, could you request approval on this when you get a chance, please. :)
Status: RESOLVED → VERIFIED
Flags: needinfo?(amarchesini)
Comment on attachment 8906244 [details] [diff] [review]
nsICloneableInputStreamWithRange

Approval Request Comment
[Feature/Bug causing the regression]: IPCBlob refactoring
[User impact if declined]: some operation related to slicing blobs can be extremely slow.
[Is this code covered by automated tests?]: n/a
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: yes. See the STR in the bug description
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: low
[Why is the change risky/not risky?]: These 2 patches introduce a new interface that is used to slice a inputStream at cloning time. This is used by StreamBlobImpl to optimize the slicing of IPCBlobInputStream. The code is covered by tests and we have a positive feedback on real usecases.
[String changes made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8906244 - Flags: approval-mozilla-beta?
Attachment #8906567 - Flags: approval-mozilla-beta?
Sorry to spam this bug, but I still cannot upload any video on Youtube (I reported duplicate bug #1398556) even using a build based on this commit : https://hg.mozilla.org/mozilla-central/rev/b0e945eed81db8bf076daf64e381c514f70144f0
Frederic, could you file a new bug, please. CC :baku and me.
Flags: needinfo?(fredbezies)
Done. See bug #1399215
Flags: needinfo?(fredbezies)
Comment on attachment 8906244 [details] [diff] [review]
nsICloneableInputStreamWithRange

Sounds a bit risky but we have a verification that this at least fixes the original problem. Let's uplift this for beta 12.
Attachment #8906244 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Needs rebasing for Beta.
Flags: needinfo?(amarchesini)
Attachment #8906567 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Is this wontfix for 56 now? Seems like a risky change to land in the RC even if there are rebased patches available.
Flags: needinfo?(lhenry)
Let's try this for the 56 RC.  I would like to fix this before we ship since it is a new regression.  

In discussion with asuth on irc he thinks this may affect other sites, and that the fix is low risk over all, even if we hit some edge cases that will need further patching. If it creates any major problems we will still have one more chance to back this out for an RC2.
Flags: needinfo?(lhenry)
Attachment #8906244 - Flags: approval-mozilla-release+
Attachment #8906567 - Flags: approval-mozilla-release+
It looks like this fix didn't land for the 56 RC.  While we are having an RC2 build, I think at this point it is too risky and the fix will need to wait for 57.
Flags: needinfo?(amarchesini)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.