Closed Bug 1513466 Opened 7 years ago Closed 8 months ago

Intermittent /wasm/webapi/abort.any.js | single tracking bug

Categories

(Core :: JavaScript: WebAssembly, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, intermittent-testcase)

Attachments

(1 file, 2 obsolete files)

Filed by: rmaries [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=216543613&repo=mozilla-central https://queue.taskcluster.net/v1/task/OwysPB8KSQed0i9aNYGFkQ/runs/0/artifacts/public/logs/live_backing.log [task 2018-12-12T06:30:39.154Z] 06:30:39 INFO - TEST-START | /wasm/webapi/abort.any.worker.html [task 2018-12-12T06:30:39.501Z] 06:30:39 INFO - [task 2018-12-12T06:30:39.501Z] 06:30:39 INFO - TEST-PASS | /wasm/webapi/abort.any.worker.html | compileStreaming() on an already-aborted request should reject with AbortError [task 2018-12-12T06:30:39.501Z] 06:30:39 INFO - TEST-UNEXPECTED-FAIL | /wasm/webapi/abort.any.worker.html | compileStreaming() synchronously followed by abort should reject with AbortError - assert_unreached: Should have rejected: compileStreaming should reject Reached unreachable code [task 2018-12-12T06:30:39.501Z] 06:30:39 INFO - .... [task 2018-12-12T06:30:39.501Z] 06:30:39 INFO - TEST-OK | /wasm/webapi/abort.any.worker.html | took 347ms [task 2018-12-12T06:30:40.261Z] 06:30:40 INFO - Closing logging queue [task 2018-12-12T06:30:40.261Z] 06:30:40 INFO - queue closed [task 2018-12-12T06:30:40.269Z] 06:30:40 INFO - Setting up ssl [task 2018-12-12T06:30:40.289Z] 06:30:40 INFO - certutil | [task 2018-12-12T06:30:40.310Z] 06:30:40 INFO - certutil | [task 2018-12-12T06:30:40.325Z] 06:30:40 INFO - certutil | [task 2018-12-12T06:30:40.325Z] 06:30:40 INFO - Certificate Nickname Trust Attributes [task 2018-12-12T06:30:40.325Z] 06:30:40 INFO - SSL,S/MIME,JAR/XPI [task 2018-12-12T06:30:40.326Z] 06:30:40 INFO - [task 2018-12-12T06:30:40.326Z] 06:30:40 INFO - web-platform-tests CT,, [task 2018-12-12T06:30:40.326Z] 06:30:40 INFO - [task 2018-12-12T06:30:42.110Z] 06:30:42 INFO - adb Granting important runtime permissions to org.mozilla.fennec_aurora [task 2018-12-12T06:30:43.372Z] 06:30:43 INFO - adb launch_application: am start -W -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp -a android.intent.action.VIEW --es env9 MOZ_PROCESS_LOG=/tmp/tmpOHSGJLpidlog --es env8 MOZ_CRASHREPORTER_NO_REPORT=1 --es args "-no-remote -profile /sdcard/tests/profile --marionette about:blank" --es env3 STYLO_THREADS=4 --es env2 MOZ_HIDE_RESULTS_TABLE=1 --es env1 R_LOG_VERBOSE=1 --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env6 R_LOG_DESTINATION=stderr --es env5 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env4 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4,MediaPipelineFactory:4 --es env10 R_LOG_LEVEL=6
Ms2ger: am I interpreting this failure correctly as: WebAssembly.compileStreaming() is being fulfilled, instead of rejected with AbortError, which shouldn't happen because the Promise.resolve() inside WebAssembly.compileStreaming() should mean that compileStreaming() can't even *start* until we return from the function?
Flags: needinfo?(Ms2ger)
I think that's correct, yes. Might the failure be related to the fact that we pass an async function to promise_test()? Not sure what happens there if there's no await.
Flags: needinfo?(Ms2ger)
That's an interesting question: I don't see any yield points between the fetch() and the abort(), though. I expect not but, just to be sure: arai, can you confirm that JS::AddPromiseReactions() never synchronously runs a reaction instead of waiting until returning to the event loop? (Specifically when the promise is a PromisObject::unforgeableResolve().) Assuming not, then I think the error would be with the fetch()+abort(): maybe the stream started by fetch() racily "finishes" such that the abort() is ignored (even though noone has started consuming the stream). It'd be interesting to know whether this is allowed by the spec; it seems like it could be, in case the test would need to accept completion like the third test I added. Ms2ger, do you know the right spec person we could ping?
Flags: needinfo?(arai.unmht)
Flags: needinfo?(Ms2ger)
I'm going to say Anne.
Flags: needinfo?(Ms2ger) → needinfo?(annevk)
For reference, the question is: can this fetch() ignore this abort(): https://searchfox.org/mozilla-central/source/testing/web-platform/tests/wasm/webapi/abort.any.js#19,21 if the fetch()'s internal stream racily completes entirely before the abort(), noting the lack of any yield to the event loop between the two.
abort() invokes https://dom.spec.whatwg.org/#abortsignal-signal-abort. That runs https://fetch.spec.whatwg.org/#abort-fetch (invoking fetch() adds that algorithm to the abort algorithms). That invokes reject. Since any invocation of resolve will be from on a task queued from a different thread (in parallel), that reject will by definition always win, unless I'm missing something. (I also cannot reproduce the failure locally (Nightly, macOS).)
Flags: needinfo?(annevk)
Thanks Anne. So then the next question, I guess for baku, would be: is it possible we have a bug that allows abort()s (in the situation described in comment 5 and 6) to fail to generate an AbortError?
Flags: needinfo?(amarchesini)
just a note for debugging. if I put the following busy loop after fetch line, the test fails for me, on m-c 6f21533f643e artifact build (opt) on macOS. for (let i = 0; i < 10000; i++) { for (let j = 0; j < 1000; j++) {} } will check things around reactions later today.
(In reply to Tooru Fujisawa [:arai] from comment #8) Great idea and very useful result; thanks!! If it's not reactions, I'll see if I can confirm with printf()s that the stream is just closing succesfully, despite the abort().
Attached patch fetch_race.patch (obsolete) — — Splinter Review
It can happen that the main-thread is faster than the worker and we are resolving the promise before considering the abort(). The fix introduces an extra runnable to start the operation. To be consistent, the patch introduces a similar runnable for the main-thread case as well.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #9031066 - Flags: review?(bugs)
(In reply to Luke Wagner [:luke] from comment #3) > I expect not but, just to be sure: arai, can you confirm that > JS::AddPromiseReactions() never synchronously runs a reaction instead of > waiting until returning to the event loop? (Specifically when the promise > is a PromisObject::unforgeableResolve().) It doesn't. We do some optimization by not-returning to event loop only for `await` [1], and it happens only when the embedding allows it, by telling SpiderMonkey that the job queue is empty [2]. [1] https://searchfox.org/mozilla-central/rev/fd32b3a6fa3eff1468311f6fcf32b45c117136df/js/src/builtin/Promise.cpp#4832 [2] https://searchfox.org/mozilla-central/search?q=symbol:_ZN2JS15JobQueueIsEmptyEP9JSContext&redirect=false
Flags: needinfo?(arai.unmht)
Comment on attachment 9031066 [details] [diff] [review] fetch_race.patch I'm not happy about this patch. I'll submit a better one.
Attachment #9031066 - Attachment is obsolete: true
Attachment #9031066 - Flags: review?(bugs)
Wow, thanks for the quick action baku! Anne: it's perhaps worth pointing out that this semantic detail seems to be adding some modicum of overhead/complexity. Is it worth changing the spec instead to allow for the abort() to lose the race, even intra-turn?
Flags: needinfo?(annevk)
Generally, if we can require determinism we do because non-determinism is hard for developers to deal with. It can also turn into a compatibility issue if some implementations with lots of market share are nevertheless deterministic in some manner. There's a web platform principle of sorts that either you require an order, or you require randomness. (I hope this doesn't come across weirdly as I realize you also do a lot of web platform API design work, just trying to explain where I'm coming from.) Also, the main reason we seem to be non-deterministic here is because we go via the main thread for the worker implementation, if I understand things correctly. That doesn't necessarily seem like the desirable long term strategy for this code, at which point we'd have to change things presumably.
Flags: needinfo?(annevk)
Yes, thanks for explaining and agreed with those goals; just thought it'd be worth asking in case this was more of an accidental spec consequence rather than an intentional property. I also missed that the root issue was this worker/main-thread impl detail.
Attached patch fetch_race.patch — — Splinter Review
Attachment #9031795 - Flags: review?(bugs)
Comment on attachment 9031795 [details] [diff] [review] fetch_race.patch Please fix the commit message to not talk about main-thread case. (main thread should be behaving sanely already, as we discussed on IRC)
Attachment #9031795 - Flags: review?(bugs) → review+

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:baku, could you have a look please?

Flags: needinfo?(amarchesini)
Severity: normal → S3
Summary: Intermittent /wasm/webapi/abort.any.worker.html | compileStreaming() synchronously followed by abort should reject with AbortError - assert_unreached: Should have rejected: compileStreaming should reject Reached unreachable code → Intermittent /wasm/webapi/abort.any.js | single tracking bug
Attachment #9385351 - Attachment is obsolete: true
Flags: needinfo?(amarchesini)
Assignee: amarchesini → nobody
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: