Closed Bug 1509292 Opened 6 years ago Closed 4 years ago

Intermittent dom/broadcastchannel/tests/test_broadcastchannel_worker.html | Test timed out.

Categories

(Core :: DOM: postMessage, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: sg)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

Filed by: rmaries [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=213359697&repo=mozilla-central

https://queue.taskcluster.net/v1/task/EXlcyGOaQU68fUeIwvDYJg/runs/0/artifacts/public/logs/live_backing.log

[task 2018-11-22T11:48:26.033Z] 11:48:26     INFO - TEST-START | dom/broadcastchannel/tests/test_broadcastchannel_worker.html
[task 2018-11-22T11:53:53.363Z] 11:53:53     INFO - TEST-INFO | started process screentopng
[task 2018-11-22T11:53:53.647Z] 11:53:53     INFO - TEST-INFO | screentopng: exit 0
[task 2018-11-22T11:53:53.648Z] 11:53:53     INFO - Buffered messages logged at 11:48:26
[task 2018-11-22T11:53:53.649Z] 11:53:53     INFO - TEST-PASS | dom/broadcastchannel/tests/test_broadcastchannel_worker.html | Worker is ready! 
[task 2018-11-22T11:53:53.650Z] 11:53:53     INFO - Buffered messages finished
[task 2018-11-22T11:53:53.651Z] 11:53:53     INFO - TEST-UNEXPECTED-FAIL | dom/broadcastchannel/tests/test_broadcastchannel_worker.html | Test timed out. 
[task 2018-11-22T11:53:53.651Z] 11:53:53     INFO -     SimpleTest.ok@SimpleTest/SimpleTest.js:275:7
[task 2018-11-22T11:53:53.652Z] 11:53:53     INFO -     reportError@SimpleTest/TestRunner.js:121:7
[task 2018-11-22T11:53:53.653Z] 11:53:53     INFO -     TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
[task 2018-11-22T11:53:53.653Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.654Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.655Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.655Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.656Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.656Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.657Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.658Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.658Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.659Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.659Z] 11:53:53     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-11-22T11:53:53.660Z] 11:53:53     INFO -     TestRunner.runTests/<@SimpleTest/TestRunner.js:381:9
[task 2018-11-22T11:53:53.661Z] 11:53:53     INFO -     promise callback*TestRunner.runTests@SimpleTest/TestRunner.js:368:50
[task 2018-11-22T11:53:53.662Z] 11:53:53     INFO -     RunSet.runtests@SimpleTest/setup.js:201:3
[task 2018-11-22T11:53:53.662Z] 11:53:53     INFO -     RunSet.runall@SimpleTest/setup.js:180:5
[task 2018-11-22T11:53:53.663Z] 11:53:53     INFO -     hookupTests@SimpleTest/setup.js:273:5
[task 2018-11-22T11:53:53.663Z] 11:53:53     INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2018-11-22T11:53:53.664Z] 11:53:53     INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2018-11-22T11:53:53.665Z] 11:53:53     INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
[task 2018-11-22T11:53:53.665Z] 11:53:53     INFO -     hookup@SimpleTest/setup.js:253:5
[task 2018-11-22T11:53:53.666Z] 11:53:53     INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2018-11-22T11:53:54.358Z] 11:53:54     INFO - GECKO(6120) | MEMORY STAT | vsize 404MB | residentFast 117MB | heapAllocated 8MB
See Also: → 1416241
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE
Component: DOM → DOM: Core & HTML
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

This has failed 25 times in the last 7 days, affected platforms are macosx1014-64 debug, linux64-shippable-qr opt and android-em-7-0-x86_64 opt.
Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=283719838&repo=autoland&lineNumber=31745

Hsin-Yi,
Could you please redirect this to someone who can look into it?

Flags: needinfo?(htsai)

Hello Jens, could you help me to forward to someone on your team to investigate? Thank you.

Component: DOM: Core & HTML → DOM: postMessage
Flags: needinfo?(htsai) → needinfo?(jstutte)

:sgiesecke, can you please do some initial investigation here?

Flags: needinfo?(jstutte) → needinfo?(sgiesecke)

I have been able to reproduce this once, but unfortunately lost the rr trace. However, this makes me confident I can reproduce it again.

Flags: needinfo?(sgiesecke)
Assignee: nobody → sgiesecke

I was able to reproduce this now AND capture it with rr. I have two Pernosco traces:

The order in which the parent receives/processes the requests is different:

  • in the GOOD run, the BroadcastChannel ctor from the worker is received/processed first, then the postMessage from the window,
  • in the BAD run, it is the opposite way, leading to the postMessage not being delivered to the worker

I am not sure how to deal with this. To me, at first sight, this seems like an inevitable race condition, since the construction of the BackgroundChannel on the parent side is done asynchronously, without any confirmation sent back. While messages from the same child are processed in order, here we are dealing with messages from two different children (worker vs. window), which are not implicitly serialized.

One way to remedy this were to make the construction of the BackgroundChannel parent synchronous, i.e. add a confirmation that is sent back to the worker child, so that, in the test, it only sends its READY message after the BackgroundChannel parent has registered its actor.

I don't see a solution which only changes the test to fix this properly (adding some delay might reduce the chance of hitting this issue, but it doesn't resolve the race, and might only mask other problems).

:baku, could you give some advice how this should be handled? In general, I'd expect a registration to happen synchronously from the point of view of the one who registers. Should it be like this here as well? I can work on that if I know what to do.

Flags: needinfo?(amarchesini)

:baku, could you give some advice how this should be handled? In general, I'd expect a registration to happen synchronously from the point of view of the one who registers. Should it be like this here as well? I can work on that if I know what to do.

By spec, it seems that BC + any other postmessage can generate race condition. A possible fix can be to change the test in this way:

var bc = new BroadcastChannel("foobar"); 
bc.addEventListener("message", event => {
  if (event.data === "Internal check") {
    postMessage("READY");
    retrn;
  }

  bc.postMessage(                                                           
    event.data == "hello world from the window"
      ? "hello world from the worker"
      : "KO"       
  );                                   
  bc.close();
});  

new BroadcastChannel("foobar").postMessage("Internal check");
Flags: needinfo?(amarchesini)

Ok, thanks. Does the spec say that explicitly?

I see that posting a message from the worker itself will synchronize this.

I wonder if the test can be removed entirely, as it seems to be redundant with the wpt at https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/testing/web-platform/tests/webmessaging/broadcastchannel/workers.html#7

Curiously, the wpt test does not seem to suffer from this problem, though the code in https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/testing/web-platform/tests/webmessaging/broadcastchannel/resources/worker.js looks pretty similar.

Flags: needinfo?(amarchesini)
See Also: → 1370725

Update:
There have been 26 failures within the last 7 days:

  • 1 failure on Linux x64 QuantumRender Shippable opt
  • 1 failure on Linux 18.04 x64 asan opt
  • 1 failure on Linux 18.04 x64 debug
  • 23 failures on OS X 10.14 debug

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=285475067&repo=autoland&lineNumber=29349

Whiteboard: [stockwell needswork]

(In reply to Simon Giesecke [:sg] [he/him] from comment #41)

Bug 1370725 is similar to this, and the worker in https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/dom/broadcastchannel/tests/broadcastchannel_worker_any.js should be changed similarly, if the keep that test.

I'm OK with removing this test.
In the spec, I don't see where the 2 queues (BC event queue and worker post-messages one) should be synchronized. Am I right?

Flags: needinfo?(amarchesini)
Flags: needinfo?(sgiesecke)
Whiteboard: [stockwell needswork] → [stockwell needswork:owner]

The test suffers from a race condition, and shows frequest intermittent
failures. There already is is WPT test testing the same in
/testing/web-platform/tests/webmessaging/broadcastchannel/workers.html
test case 'BroadcastChannel works in workers', which does not seem
to suffer from this problem.

Pushed by sgiesecke@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fc73a1225106
Remove test test_broadcastchannel_worker.html r=baku
Pushed by sgiesecke@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f8aa2cf1196a
Remove test test_broadcastchannel_worker.html r=baku
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

Patch landed, the offending test was removed.

Flags: needinfo?(sgiesecke)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: