Closed Bug 1301094 Opened 3 years ago Closed 3 years ago

e10s FX hangs when sending file data through XMLHttpRequest in Worker

Categories

(Core :: DOM: Workers, defect, P1)

48 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla51
Tracking Status
e10s ? ---
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- fixed
firefox51 --- verified

People

(Reporter: jake, Assigned: baku)

References

Details

Attachments

(4 files)

Attached file fx-e10s-worker-bug.7z
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160823121617

Steps to reproduce:

Send file data in FormData through XMLHttpRequest in a Worker with Electrolysis enabled.


Actual results:

Firefox freezes/hangs. Can open new tab but cannot navigate, switching between open tabs show a spinner.
tracking-e10s: --- → ?
This sounds pretty serious, overholt, can you find someone to look at this?
Component: Untriaged → DOM: Service Workers
Flags: needinfo?(overholt)
Priority: -- → P1
Product: Firefox → Core
Nothing specific to Service Workers here.
Component: DOM: Service Workers → DOM: Workers
Reproduced this locally in dev edition.  Andrea, you've been looking at XHR recently.  Do you have any idea what is happening here?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(overholt) → needinfo?(amarchesini)
I'll work on this tomorrow.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
I don't know who can review these 2 patches. Here the simple one:

on debug build we crash for 2 reasons: a non-sense assertion (see that we get the main-thread 2 lines above the assertion); and a non-cancellable runnable dispatched to workers.
Attachment #8789742 - Flags: review?(bugs)
Here the complex part. I try to explain what is happening:

1. We are in a sync event loop because of a XHR in workers doing send(). (yeah... XHR in workers)
2. we want to send a blob. This blob is created by the parent process, so it's a remoteBlob.
3. in order to do the FormData, we have to retrieve the inputStream from the blob. Doing so, we arrive here: https://dxr.mozilla.org/mozilla-central/source/dom/ipc/Blob.cpp#2370

At this point we try to dispatch a runnable to the worker in order to create a  RemoteInputStream:

https://dxr.mozilla.org/mozilla-central/source/dom/ipc/Blob.cpp#2430

but this runnable will never be dispatched because of the sync event loop and we will be blocked here forever:

https://dxr.mozilla.org/mozilla-central/source/dom/ipc/Blob.cpp#2404

My solution is:

1. When a RemoteBlobImpl is created with a WorkerThread as target, I keep also the pointer of the current WorkerPrivate. And I add a WorkerHolder in order to be notified when the worker goes away.

2. There is a new DispatchToTarget method in RemoteBlobImpl. This method is called by GetStream. Here we check if we have a WorkerPrivate and in this case, we wrap the runnable into a WorkerControlRunnable.

3. A WorkerControlRunnable is executed also when a sync event loop is active. So we create the RemoteInputStream correctly. Note: the inputStream is then used in a different thread by FormData; here we are talking just about the creation of it.

4. The WorkerHolder is used to reset mWorkerPrivate when the worker goes away. And everything is protected by a mutex because DispatchToTarget can be called by any thread.
Attachment #8789744 - Flags: review?(bugs)
Can we get some tests for this.
This is really bad issue.
Attachment #8789742 - Flags: review?(bugs) → review+
Attached patch part 3 - testSplinter Review
Here a test
Attachment #8789821 - Flags: review?(bugs)
Comment on attachment 8789744 [details] [diff] [review]
part 2 - the complex part

Review of attachment 8789744 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/Blob.cpp
@@ +2220,5 @@
>  
> +    {
> +      MutexAutoLock lock(mMutex);
> +      mWorkerPrivate = nullptr;
> +      mWorkerHolder = nullptr;

Pretty sure its not safe to destroy a WorkerHolder of the worker thread.  You need a control runnable to dispose of this, no?
Comment on attachment 8789744 [details] [diff] [review]
part 2 - the complex part

Review of attachment 8789744 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/Blob.cpp
@@ +2220,5 @@
>  
> +    {
> +      MutexAutoLock lock(mMutex);
> +      mWorkerPrivate = nullptr;
> +      mWorkerHolder = nullptr;

The mutex made me think this could happen on a different thread, but I did not read closely enough.  This is on the worker thread.
Comment on attachment 8789821 [details] [diff] [review]
part 3 - test

unclear to me why to use somewhat deprecated onreadystatechange and not just onload with XHR. But either way should be fine.
Attachment #8789821 - Flags: review?(bugs) → review+
Comment on attachment 8789744 [details] [diff] [review]
part 2 - the complex part

>@@ -2111,16 +2157,29 @@ RemoteBlobImpl::CommonInit(BlobChild* aA
> {
>   MOZ_ASSERT(aActor);
>   aActor->AssertIsOnOwningThread();
>   MOZ_ASSERT(!mIsSlice);
> 
>   mActor = aActor;
>   mActorTarget = aActor->EventTarget();
> 
>+  if (!NS_IsMainThread()) {
>+    mWorkerPrivate = GetCurrentThreadWorkerPrivate();
>+    if (mWorkerPrivate) {
>+      mWorkerHolder = new RemoteBlobImpl::WorkerHolder(this);
>+      if (NS_WARN_IF(!mWorkerHolder->HoldWorker(mWorkerPrivate, Closing))) {
>+        // We don't care too much if the worker is already going away because no
>+        // sync-event-loop can be created at this point.
>+        mWorkerPrivate = nullptr;
>+        mWorkerHolder = nullptr;
>+      }
>+    }
>+  }
I don't understand the setup by just reading this code. What guarantees that RemoteBlobImpl is actually created in the worker thread and not in the main thread and then passed to worker?
http://searchfox.org/mozilla-central/rev/950e13cca1fda6507dc53c243295044e8eda4493/dom/base/StructuredCloneHolder.cpp#552-554 and similar is needed to be read to understand that.
So, could you add some comment about RemoteBlob handling


>+    {
>+      MutexAutoLock lock(mMutex);
>+      mWorkerPrivate = nullptr;
>+      mWorkerHolder = nullptr;
>+    }
This needs some comment or assertion would be even better to hint that we're in the right thread.
Right now one needs to look at the place where mWorkerHolder is set to non-null to figure out that this is right.


+
+  bool
+  WorkerRun(JSContext* aCx, WorkerPrivate* aWorkerPrivate)
Missing override?


Do we have tests for the case when the same Blob is being uploaded simultaneously in main thread and a worker?


(tiny bit silly that WorkerHolder doesn't expose the WorkerPrivate it holds via any method. If it did do that, we wouldn't need mWorkerPrivate member variable here.)
(I'm pretty sure we have similar to RemoteBlobControlRunnable class somewhere, but couldn't find where)
Attachment #8789744 - Flags: review?(bugs) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/19a758a7a5ae
part 1 - RemoteBlobImpl::CreateStreamHelper must be a CancelableRunnable, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/03c0969b435f
part 2 - RemoteInputStream must be created using a ControlWorkerRunnable, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdc7be84b21a
part 3 - Test for Blob+XHR+FormData+Workers, r=smaug
Should this be uplifted to 50?
Flags: needinfo?(amarchesini)
Comment on attachment 8789742 [details] [diff] [review]
part 1 - simple fix first

Approval Request Comment
[Feature/regressing bug #]: XHR and RemoteImpl Blob in Workers.
[User impact if declined]: FF freezes when, in a worker, XHR.send() is called with a FormData containing a Blob.
[Describe test coverage new/current, TreeHerder]: test included.
[Risks and why]: These 3 patches are not small and easy.
[String/UUID change made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8789742 - Flags: approval-mozilla-aurora?
Hello Jake, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(jake)
(In reply to Ritu Kothari (:ritu) from comment #17)
> Hello Jake, could you please verify this issue is fixed as expected on a
> latest Nightly build? Thanks!

Hi Ritu,

Yup, looks like it's fixed
Flags: needinfo?(jake)
Comment on attachment 8789742 [details] [diff] [review]
part 1 - simple fix first

Fix was verified on Nightly, Aurora50+
Attachment #8789742 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to jake from comment #18)

> 
> Hi Ritu,
> 
> Yup, looks like it's fixed

Great! Really appreciate your prompt reply.
Depends on: 1317725
You need to log in before you can comment on or make changes to this bug.