Closed Bug 1224825 Opened 4 years ago Closed 4 years ago

Race condition in MessagePort::close

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: baku, Assigned: baku)

References

Details

Attachments

(2 files, 8 obsolete files)

Attached patch foo.patch (obsolete) — Splinter Review
var mc = new MessageChannel(); worker.postMessage('', [mc.port2]);
for(var i = 0; i < 100; ++i) mc.port1.postMessage(i);
mc.port1.close();

There are 2 issues here:

1. Our MessagePort code can produce a race condition with this code because when we call Close(), we send a message to the MessagePortParent in order to close the channel between the 2 ports. But, if the worker is too slow, it will not receive the 100 messages.

2. when we call Close(), if there is a pending queue of messages, we should wait until it's fully sent to PBackground.
Attachment #8687574 - Flags: review?(bugs)
Blocks: 1221852
Comment on attachment 8687574 [details] [diff] [review]
foo.patch

>@@ -262,16 +283,30 @@ MessagePortService::CloseAll(const nsID&
>     data->mParent->Close();
>   }
> 
>   for (const MessagePortServiceData::NextParent& parent : data->mNextParents) {
>     parent.mParent->CloseAndDelete();
>   }
> 
>   nsID destinationUUID = data->mDestinationUUID;
>+
>+  // If we have informations about the other port and that port has some
>+  // pending messages to deliver but the parent has not processed them yet,
>+  // because its entangling request didn't arrive yet), we cannot close this
>+  // channel.
>+  MessagePortServiceData* destinationData;
>+  if (mPorts.Get(destinationUUID, &destinationData) &&
>+      !destinationData->mMessages.IsEmpty() &&
>+      destinationData->mWaitingForNextParent) {
I don't understand this.
What is mWaitingForNextParent about here?

>+    MOZ_ASSERT(!destinationData->mNextStepCloseAll);
>+    destinationData->mNextStepCloseAll = true;
and why if we're waiting for next parent we should close something?
I think I'm missing to understand the meaning of the variables here.



>+  var MAX = 100;
>+
>+  function runTest() {
>+    var worker = new Worker('data:javascript,onmessage = function(e) { e.ports[0].onmessage = function(evt) { postMessage(evt.data);}}');
>+
>+    var count = 0;
>+    worker.onmessage = function(e) {
>+      is(e.data, count, "Correct value expected!");
>+      if (++count == MAX) {
>+       info("All the messages correctly received");
>+       SimpleTest.finish();
>+     }
>+    }
>+
>+    var mc = new MessageChannel();
>+    worker.postMessage(42, [mc.port2]);
>+
>+    for (var i = 0; i < MAX; ++i) {
>+      mc.port1.postMessage(i);
>+    }
>+
>+    mc.port1.close();

Could you add a test where you post a message after .close();
and then change onmessage handler a bit so that it calls finish() asynchronously (using some timeout, say 100ms) and checks that
e.data is never >= MAX.
Attachment #8687574 - Flags: review?(bugs) → review-
Hmm, I guess I start to understand the variables...
but don't then understand the removal of if (!data->mNextParents.IsEmpty()) {
(In reply to Olli Pettay [:smaug] from comment #2)
> Hmm, I guess I start to understand the variables...
> but don't then understand the removal of if (!data->mNextParents.IsEmpty()) {

The reason why I removed that is because it doesn't make sense :)
The nextParent can arrive in the future and at that point that check is just misleading.
> >+  MessagePortServiceData* destinationData;
> >+  if (mPorts.Get(destinationUUID, &destinationData) &&
> >+      !destinationData->mMessages.IsEmpty() &&
> >+      destinationData->mWaitingForNextParent) {
> I don't understand this.
> What is mWaitingForNextParent about here?

We set this variable to true when a disentangle request is received but we don't have a valid 'next' parent yet.
The meaning is: eventually a new parent will arrive and will dispatch the messages to it.
(In case something else happens, StructuredClone will call ForceClose).

So, in case a Close() has been called, we must check if have to deliver messages to a parent that doesn't exist yet but eventually will come.

> >+    MOZ_ASSERT(!destinationData->mNextStepCloseAll);
> >+    destinationData->mNextStepCloseAll = true;
> and why if we're waiting for next parent we should close something?
> I think I'm missing to understand the meaning of the variables here.

Because we must deliver the pending messages that are still in the queue.
Otherwise this code:

var mc = new MessageChannel();
somewhere.postMessage(mc.port2, [mc.port2]);
mc.port1.postMessage(42);
mc.port1.close();

will not be able to dispatch '42' to the port2 when it will be received on the other side.
mWaitingForNextParent is also true by default, and that is what is a bit misleading - at least in my mind.
"next" hints that there is some current or old parent.
Perhaps mWaitingForNewParent?
Attached patch foo.patch (obsolete) — Splinter Review
Attachment #8687574 - Attachment is obsolete: true
Attachment #8687663 - Flags: review?(bugs)
Attachment #8687663 - Flags: review?(bugs) → review+
Had to back this out together with bug 1221852 for M(3) failures:

https://hg.mozilla.org/integration/mozilla-inbound/rev/4922c78419fa

http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32/1447588173/mozilla-inbound_win7-ix_test-mochitest-3-bm126-tests1-windows-build74.txt.gz

05:02:40  WARNING -  TEST-UNEXPECTED-FAIL | dom/security/test/csp/test_upgrade_insecure_reporting.html | application terminated with exit code 1
05:02:40     INFO -  runtests.py | Application ran for: 0:01:26.973000
05:02:40     INFO -  zombiecheck | Reading PID log: c:\users\cltbld\appdata\local\temp\tmpczcznlpidlog
05:02:40     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/9IEpt2YhSKiHy2Y4angz2w/artifacts/public/build/firefox-45.0a1.en-US.win32.crashreporter-symbols.zip
05:02:51     INFO -  mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\4a5d6160-41b7-4840-9ee8-2492525fefb6.dmp
05:02:51     INFO -  mozcrash Saved app info as C:\slave\test\build\blobber_upload_dir\4a5d6160-41b7-4840-9ee8-2492525fefb6.extra
05:02:51  WARNING -  PROCESS-CRASH | dom/security/test/csp/test_upgrade_insecure_reporting.html | application crashed [@ mozilla::`anonymous namespace'::RunWatchdog(void *)]
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
And out again in https://hg.mozilla.org/integration/mozilla-inbound/rev/a63e2f7359f3, RunWatchdog crashes in desktop mochitest-3 running dom/security/test/csp/test_upgrade_insecure_reporting.html, and if my bet is right also Android mochitest hangs.
Attached patch patch 1 - DeliverAndClose (obsolete) — Splinter Review
It turned out that this approach is not what the spec wants.
I added 2 tests and I introduced a new method for SharedWorkers (or any other use of MessagePort): DeliverAndClose().
With this method we mark the port as closed but we finish the delivering of the pending messages. More comments on the patch.
Attachment #8687663 - Attachment is obsolete: true
Attachment #8687829 - Flags: review?(bugs)
With this second patch we force the shutting down of the port in case we need it in MessagePortService. The previous code is correct, but it needs some changes when ForceClose() is called.
Attachment #8687830 - Flags: review?(bugs)
Comment on attachment 8687829 [details] [diff] [review]
patch 1 - DeliverAndClose

>+function test_closeInBetween() {
>+  var mc = new MessageChannel();
>+
>+  for (var i = 0; i < MAX; ++i) {
>+    mc.port1.postMessage(i);
>+  }
>+
>+  for (var i = 0; i < MAX; ++i) {
>+    mc.port1.postMessage(i + MAX);
>+  }
>+
>+  mc.port2.onmessage = function(e) {
>+    ok (e.data < MAX, "Correct message received!");
>+    if (e.data == MAX - 1) {
>+      mc.port2.close();
What in the spec says this is the correct behavior?
.close() should just disentangle, and I don't see disentangling meaning that other messages shouldn't be processed.
Sure, step 13 in postMessage is totally buggy, but at least it doesn't talk about disentangled ports.
Attachment #8687829 - Flags: review?(bugs) → review-
Attachment #8687830 - Flags: review?(bugs) → review+
Attached patch patch 1 - interdiff (obsolete) — Splinter Review
Attachment #8688104 - Flags: review?(bugs)
Attached patch patch 1 - Close (obsolete) — Splinter Review
Full patch again, with the removed |if (mClosing)...| and a test for that scenario.
Attachment #8687829 - Attachment is obsolete: true
Attachment #8688122 - Flags: review?(bugs)
Attachment #8688104 - Flags: review?(bugs)
Comment on attachment 8688122 [details] [diff] [review]
patch 1 - Close

I start to dislike MessagePort API (which forces the implementation to be so complicated)
Attachment #8688122 - Flags: review?(bugs) → review+
Attached patch patch 1 - Close (obsolete) — Splinter Review
Sorry for the ping-pong of review requests. Here a better approach:

1. We unref the MessagePort only when we don't have pending messages.
2. When the window is close or the worker is down, we call CloseForced. CloseForced flushed the pending messages and it starts the close procedure.
3. Dispatch() has a lot of comments about all the possible scenarios.
Attachment #8688104 - Attachment is obsolete: true
Attachment #8688122 - Attachment is obsolete: true
Attachment #8688171 - Flags: review?(bugs)
Comment on attachment 8688171 [details] [diff] [review]
patch 1 - Close

case eStateDisentangling: vs case eStateDisentangled:
could use a comment. It is odd that latter state may process messages which earlier state can't. (this is because those aren't quite earlier/latter in all the cases)
Attachment #8688171 - Flags: review?(bugs) → review+
Attached patch patch 1 - interdiff (obsolete) — Splinter Review
In this patch I get rid of mNextStep and I introduce 3 new state.
Attachment #8688632 - Flags: review?(bugs)
Comment on attachment 8688632 [details] [diff] [review]
patch 1 - interdiff

I guess this makes the code a bit easier to read.

The change is scary. Do we need this on beta too?
Attachment #8688632 - Flags: review?(bugs) → review+
About beta: without these patches we deliver the messages in a non-consistent way after Close(). I guess it's ok if we skip beta for this code. But I would like to have in aurora.
But do we need to fix Bug 1221852 for beta?
We can easily write a patch for beta without the test included in patch for bug 1221852. In this way we will not have an intermittent failure.
Attached patch patch 2Splinter Review
Attachment #8687830 - Attachment is obsolete: true
Attachment #8688171 - Attachment is obsolete: true
Attachment #8688632 - Attachment is obsolete: true
Attached patch patch 1Splinter Review
https://hg.mozilla.org/mozilla-central/rev/f04ec936e226
https://hg.mozilla.org/mozilla-central/rev/67a3836250de
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.