Closed Bug 1770172 Opened 3 years ago Closed 3 years ago

Intermittent /streams/transferable/writable-stream.html | second write should wait for first underlying write to complete - promise_test: Unhandled rejection with value: object "TypeError: resolveWrite is not a function"

Categories

(Core :: DOM: Streams, defect, P5)

defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox100 --- unaffected
firefox101 --- unaffected
firefox102 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: saschanaz)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=378615548&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/N2y6e3dvRq2TtKitQHJBGQ/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/N2y6e3dvRq2TtKitQHJBGQ/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2022-05-19T12:41:26.549Z] 12:41:26     INFO - TEST-PASS | /streams/transferable/writable-stream.html | effective queue size of a transferred writable should be 2 
[task 2022-05-19T12:41:26.549Z] 12:41:26     INFO - TEST-UNEXPECTED-FAIL | /streams/transferable/writable-stream.html | second write should wait for first underlying write to complete - promise_test: Unhandled rejection with value: object "TypeError: resolveWrite is not a function"
[task 2022-05-19T12:41:26.550Z] 12:41:26     INFO - ..
[task 2022-05-19T12:41:26.550Z] 12:41:26     INFO - TEST-OK | /streams/transferable/writable-stream.html | took 213ms
[task 2022-05-19T12:41:26.550Z] 12:41:26     INFO - No more tests
[task 2022-05-19T12:41:26.556Z] 12:41:26     INFO - PID 14268 | 1652964086552	Marionette	INFO	Stopped listening on port 50602
[task 2022-05-19T12:41:26.998Z] 12:41:26     INFO - Browser exited with return code 0
[task 2022-05-19T12:41:27.001Z] 12:41:27     INFO - Closing logging queue
[task 2022-05-19T12:41:27.001Z] 12:41:27     INFO - queue closed
[task 2022-05-19T12:41:27.017Z] 12:41:27     INFO - Got 1 unexpected results, with 0 unexpected passes
[task 2022-05-19T12:41:27.019Z] 12:41:27     INFO - SUITE-END | took 5s
[task 2022-05-19T12:41:27.019Z] 12:41:27     INFO - Repetition 5 / 5
[task 2022-05-19T12:41:27.020Z] 12:41:27     INFO - SUITE-START | Running 1 tests
[task 2022-05-19T12:41:27.021Z] 12:41:27     INFO - Running testharness tests
[task 2022-05-19T12:41:27.033Z] 12:41:27     INFO - Setting up ssl
[task 2022-05-19T12:41:27.066Z] 12:41:27     INFO - certutil | b''
[task 2022-05-19T12:41:27.100Z] 12:41:27     INFO - certutil | b''
[task 2022-05-19T12:41:27.112Z] 12:41:27     INFO - certutil | b'\nCertificate Nickname                                         Trust Attributes\n                                                             SSL,S/MIME,JAR/XPI\n\nweb-platform-tests                                           CT,, \n'
[task 2022-05-19T12:41:27.116Z] 12:41:27     INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpqykr7ett
[task 2022-05-19T12:41:27.126Z] 12:41:27     INFO - Starting runner
[task 2022-05-19T12:41:27.677Z] 12:41:27     INFO - PID 14535 | 1652964087676	Marionette	INFO	Marionette enabled
[task 2022-05-19T12:41:27.680Z] 12:41:27     INFO - PID 14535 | 1652964087679	Marionette	INFO	Listening on port 55993
[task 2022-05-19T12:41:31.098Z] 12:41:31     INFO - PID 14535 | console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at /tmp/tmpqykr7ett/search.json.mozlz4", (void 0)))
[task 2022-05-19T12:41:31.796Z] 12:41:31     INFO - TEST-START | /streams/transferable/writable-stream.html

:saschanaz, since you are the author of the regressor, bug 1659025, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(krosylight)
See Also: → 1769956

(Copying from https://bugzilla.mozilla.org/show_bug.cgi?id=1769956#c1)

Unhandled rejection with value: object "TypeError: resolveWrite is not a function"

This sounds like a JavaScript context issue. Matthew, do you have any idea? I couldn't repro this locally, not sure what's different on TVw?

Seems it doesn't fail on Windows, interesting.

Flags: needinfo?(krosylight) → needinfo?(mgaudet)

Tier 1 failure here.

Summary: Intermittent TV /streams/transferable/writable-stream.html | second write should wait for first underlying write to complete - promise_test: Unhandled rejection with value: object "TypeError: resolveWrite is not a function" → Intermittent /streams/transferable/writable-stream.html | second write should wait for first underlying write to complete - promise_test: Unhandled rejection with value: object "TypeError: resolveWrite is not a function"

Set release status flags based on info from the regressing bug 1659025

So Test Verification tests run with a whole batch of variations to try to trip up ordering constraints. So if we got anything wrong (or if the testcase+spec is actually wrong) I'd be entirely unsurprised to see it trip up intermittents here.

Reading the test case it seems like the problem is that we're in a situation where we've made it all the way to line 107, where we invoke the binding resolveWrite before we've ever actually executed the underlying sink write method that should have actually installed it.

Tweaking the test case I was able to reproduce this on my laptop (macOS arm64):

promise_test(async () => {
  let resolveWrite;
  assert_true(resolveWrite == undefined, "resolveWrite ") // new assert
  const orig = new WritableStream({
    write() {
      return new Promise(resolve => {
        resolveWrite = resolve;
      });
    }
  });
  const transferred = await transfer(orig);
  const writer = transferred.getWriter();
  await writer.write('a');
  assert_true(resolveWrite == undefined, "ResolveWrite need to wait until microtask queue is drained");  // new assert
  let writeDone = false;
  const writePromise = writer.write('b').then(() => {
    writeDone = true;
  });
  await flushAsyncEvents();
  assert_false(writeDone, 'second write should not have resolved yet');  // new assert
  assert_true(resolveWrite instanceof Function, "ResolveWrite needed to wait until microtask queue is drained");  // new assert
  resolveWrite();
  await writePromise; // (makes sure this doesn't cause timeout)
  assert_true(writeDone, "now we've done write two.")
}, 'second write should wait for first underlying write to complete');

The case that failed is Running tests in a loop 10 times with flags chaos_mode_flags=0xfb : FAIL

Given that this happening is reliant on events happening in the correct order, one does wonder if what is happening in these TV cases is something is causing the event loop or micro task queue to have the wrong number of events at the wrong time; I wonder how chaos achieves this.

We could potentially rewrite this test to be more robust to interference (considering flushAsyncEvents is defined as 'two event loop spins'), by having another promise that's only resolved once resolveWrite exists...

I'll keep looking at this after lunch.

Flags: needinfo?(mgaudet)

Oh thanks, that does look problematic. Generally anything with delay(0) seems harmful in streams tests, since there's no guarantee that it will actually wait until all the messaging finishes.

That said, I still can't repro this even with your modification 🧐

function makePromiseAndResolveFunc() {
  let resolve;
  const promise = new Promise(r => { resolve = r; });
  return [promise, resolve];
};

promise_test(async () => {
  const [writeCalled, resolveWriteCalled] = makePromiseAndResolveFunc();

  let resolveWrite;
  const orig = new WritableStream({
    write() {
      resolveWriteCalled();
      return new Promise(resolve => {
        resolveWrite = resolve;
      });
    }
  });
  const transferred = await transfer(orig);
  const writer = transferred.getWriter();
  await writer.write('a');
  let writeDone = false;
  const writePromise = writer.write('b').then(() => {
    writeDone = true;
  });
  await writeCalled;
  assert_false(writeDone, 'second write should not have resolved yet');
  resolveWrite();
  await writePromise; // (makes sure this doesn't cause timeout)
}, 'second write should wait for first underlying write to complete');

Does this fix the problem on your machine?

If that works, I'd like to refactor the tests with makePromiseAndResolveFunc which I just copied from web-locks/resources/helpers.js.

Edit: It seems it works https://treeherder.mozilla.org/jobs?repo=try&revision=c3ae797419a3c90b20d1484821fc03966202104f

Locally that definitely fixes this one for me.
(In reply to Kagami :saschanaz from comment #6)

Oh thanks, that does look problematic. Generally anything with delay(0) seems harmful in streams tests, since there's no guarantee that it will actually wait until all the messaging finishes.

So that's a question I have for someone who know more about this than I: Are there any guarantees about messaging that have to be enforced? Or is it totally legal for messages to sit in a queue for an arbitrary amount of time (spins of the event loop/microtask queues) before being delivered? So long as that's the case, strongly agree these tests need to be refactored.

That said, I still can't repro this even with your modification 🧐

To reproduce I commented out everything else in streams/transferable/writable-stream.html, and then ran ./mach wpt --headless streams/transferable/writable-stream.html --verify

Are there any guarantees about messaging that have to be enforced? Or is it totally legal for messages to sit in a queue for an arbitrary amount of time (spins of the event loop/microtask queues) before being delivered? So long as that's the case, strongly agree these tests need to be refactored.

Messages uses different event queue than the setTimeout does:

Those two are independent, so there's no guarantee that any given setTimeout will fire after a given postMessage task.

To reproduce I commented out everything else in streams/transferable/writable-stream.html, and then ran ./mach wpt --headless streams/transferable/writable-stream.html --verify

That doesn't work either, and the TVw CI job doesn't fail on Windows either, so I guess something is different on Windows. 🤷‍♀️

(In reply to Kagami :saschanaz from comment #10)

Messages uses different event queue than the setTimeout does:

Those two are independent, so there's no guarantee that any given setTimeout will fire after a given postMessage task.

That seems like pretty conclusive evidence these tests need to be refactored. Thank you for the explanation!

That doesn't work either, and the TVw CI job doesn't fail on Windows either, so I guess something is different on Windows. 🤷‍♀️

Ah; I was testing on macOS.

postMessage is independent from setTimeout and thus delay(0) cannot gauarantee the task order.

Assignee: nobody → krosylight
Status: NEW → ASSIGNED
Pushed by krosylight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/86eeee115597 Add makePromiseAndResolveFunc and replace delay(0) r=mgaudet
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/34138 for changes under testing/web-platform/tests
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch
Upstream PR merged by moz-wptsync-bot
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: