Closed Bug 1858092 Opened 1 year ago Closed 1 year ago

Atomics.wait(sb, 0, 0, delay) not working on notify(sb, 0)

Categories

(Core :: JavaScript Engine, defect, P3)

Firefox 118
defect

Tracking

()

RESOLVED INVALID

People

(Reporter: andrea.giammarchi, Unassigned)

Details

Attachments

(1 file)

Attached file issue-26.zip

Steps to reproduce:

Similarly to Chromium but worse: https://bugs.chromium.org/p/chromium/issues/detail?id=1491342

If I notify(sb, 0) and a worker is doing wait(sb, 0) all it's good but if the worker is doing wait(sb, 0, 0, 42) or any other delay, it stale into timed-out state forever.

The only way to make it not stale is to assign a value different from 0 to the shared buffer so that notify(sb, 0) finally unlock the worker.

Actual results:

You can check the index.html in the attached zip and see that, unless you comment out line 15, the worker never shows a thing in console. You can also comment out line 17 and 18 in the worker.js, and remove the comment from line 15 to see that everything is fine when no delay is used with or without removing comments form the index.

Expected results:

I don't know ... every browser is doing something different but I'd expect that wait with a delay works just as well and never stale forever like it does now.

The workaround is also awkward as I need to explicitly set something at index 0 that is not the 0 number itself or nothing works regardless ... this feels a bit weird when Shared Array Buffers are passed along to wait for operation but don't necessarily need a value different from zero, actually making zero itself an impossible to trust value for a shared view of such buffer.

Component: Untriaged → JavaScript Engine
Product: Firefox → Core

Iain, give your work on waitAsync, would you be able to take a look at this? I'm not sure if this is a documentation/spec or implementation issue -- it could be this is a gap in the specified behaviour.

I was unfortunately unable to reproduce the provided test case; while I did managed to get the headers described in the documentation sent, for some reason my browser insisted I was still not cross-origin isolated and thus wouldn't load SharedArrayBuffer

Severity: -- → S3
Flags: needinfo?(iireland)
Priority: -- → P3

Thanks for the report!

However, I think there's a race condition in your testcase. Consider what happens in the following interleaving:

WORKER                  MAIN
postMessage             (doing other work)
wait                    ...
...                     ...
timeout                 process message from worker
call doSomething()      call notify
<loop>             

Atomics.wait does not look back in time to see if a notify happened in the past; if there's nothing waiting when a notify occurs, the notification just gets dropped on the floor. If the main thread happens to call notify in the gap between the end of one wait and the beginning of the next, then the worker gets stuck in an infinite loop. The main thread will never see another message, so it will never notify again.

The value argument is designed to give you a way to work around this. The idea is that when one thread notifies another, it also updates the value at that index. In that case, the waiting thread will see that the value has already changed, and can go ahead without waiting.

You may think: this is not a very user-friendly API. True! Looking back at the original design documents for Atomics, the goal was to provide low-level tools that could be wrapped up in simpler abstractions, while still having enough power for all the intended use cases.

I'm going to resolve this as invalid. If you think I've missed something, feel free to reopen it.

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(iireland)
Resolution: --- → INVALID

if there's nothing waiting when a notify occurs, the notification just gets dropped on the floor.

how a while (true) { ... } loop became a "nothing waiting" in your mind?

then the worker gets stuck in an infinite loop

how is this acceptable as undocumented footgun for developers?

The value argument is designed to give you a way to work around this. The idea is that when one thread notifies another, it also updates the value at that index. In that case

how is the inability to notify a 0 at index 0 as meant result acceptable?

That being said, you fully explained to me what happens in Firefox, which is not what happens in Chromium or WebKit, but at least I can reason about your behavior and move forward with my workaround that setting the index [0] as value -1 will indeed work as expected, assuming nobody has a use case to have 0 as valid result and -1 as indication no result was found, as it is for every single indexOf based operation in JS ... and here if I set wait(sb, 0, -1, delay) everything breaks instead even with 0 as desired result.

btw, this is for Matthew

I was unfortunately unable to reproduce the provided test case;

npx static-handler --coi .

that should do

(In reply to Andrea Giammarchi from comment #3)

if there's nothing waiting when a notify occurs, the notification just gets dropped on the floor.

how a while (true) { ... } loop became a "nothing waiting" in your mind?

This is your code:

 while (wait(i32a, 0, 0, 42) === 'timed-out')
    doSomething();

You call the wait function. The worker thread starts waiting, and continues waiting for the next 42 ms. When the timeout fires, the worker thread stops waiting and calls doSomething(). During the window of time between the end of one wait, and the beginning of the next, nobody is waiting. If the main thread calls notify during that window, there is nothing to notify. The call to notify will return 0, because 0 sleeping threads were woken up.

The expected use case for wait/notify is to replace code that looks like this:

Sender                          Receiver

Atomics.store(ia, 0, value);    while (Atomics.load(ia, 1) == 0) {}
Atomics.store(ia, 1, 1);        var value = Atomics.load(ia, 0);

with code that looks like this:

Sender                          Receiver
Atomics.store(ia, 0, value);    Atomics.wait(ia, 1, 0);
Atomics.store(ia, 1, 1);        var value = Atomics.load(ia, 0);
Atomics.notify(ia, 1);

In this code, ia[0] is the value you want to pass from one thread to another, and ia[1] is a flag that is used to communicate whether the value is available yet. Using wait/notify eliminates a busy wait loop, allowing that CPU core to be used for something else until the value is ready.

If this explanation does not make sense to you, you might want to spend some more time learning about multithreading.

To be very clear: this is the correct behaviour according to the specification. This is a low-level feature designed for people who know what they're doing. The sharp edges of this API are necessary to enable that. I gently suggest that you should either read the documentation very carefully, or find a library that wraps this up in a more user-friendly way.

During the window of time between the end of one wait, and the beginning of the next, nobody is waiting. If the main thread calls notify during that window, there is nothing to notify.

this makes no sense as the solution is to notify a value different from 0 and everything works so your explanation doesn't really explain the fix as presented.

This is a low-level feature designed for people who know what they're doing.

there are only two libraries out there based on atomics I believe both authors know what they are doing and both authors read the specs ... I one of those two authors and this comment was really unnecessary and pretty lame to me.

I gently suggest that you should either read the documentation very carefully, or find a library that wraps this up in a more user-friendly way.

I am the author of the library that wraps all this in the most friendly possible way, it's called coincident so I suggest you look up how many things we're building on top of that already and the usage of the delay was to have interrupts invoked for WASM foreign PLs such as Python via Pyodide.

I really think this whole comment was unnecessary but most importantly it's pretty smatterer without explaining why the fix works reliably.

For history sake, all the reasons the answer was really hostile but also wrong.

You call the wait function. The worker thread starts waiting, and continues waiting for the next 42 ms.

The bug exists with any timeout, it's not a timeout issue, the number 42 was just a placeholder, the wait time is unrelated.

When the timeout fires, the worker thread stops waiting and calls doSomething()

the worker is still a single thread and it has all the behaviors typical of a single thread ... in fact, the worker can get stuck just like any main thread could in a while (true) condition. The doSomething() is also synchronous, but the worker is still in the same "event loop" there, going back to the while so that it's busy, yet non blocking, only while it's waiting, otherwise it's just busy, hence not responding, to any input ... this is the same reason if you Atomics.wait(sb, 0) but you have an event listener for postMessage set before, that listener will never fire until the atomic ends, because the worker is blocked there.

If the main thread calls notify during that window, there is nothing to notify.

too bad beside this whole description of what happens being highly inaccurate, as soon as the notify is on a view where the index 0 is set to another value different from 0, everything indeed works as expected.

The suggestion here was to read the whole specs, but I am not even sure the dev in case read any of my comments in my 10 LOC example ... anyway ...

The expected use case for wait/notify is to replace code that looks like this: ...
with code that looks like this: ...

This is the only explanation to my fix via -1 which is to send a value different from 0 to the index waiting for it ... basically the nitty-gritty is that with Atomics.wait(view, 0, 0, anyDelay) if the notification happens to provide a 0 the result is never resolved and it will keep timing out forever because 0 matches the 0 third argument. This is not true at all in Chromium and WebKit but I believe Firefox might be the only one strictly implementing specs and I'd be happy for it to be case as I want other vendors to do and either fail or work the same.

In short, when third and/or fourth arguments of Atomics.wait(view, index[, control[, delay]]) is desired, the code should create a double amount of buffer size to have one index for the value and another one for the control of such value.

I can actually confirm this indeed seems to be reliable enough for the presented test, so that all major browsers now seem to work as expected even after multiple refresh.

console.log('worker.js');

const { BYTES_PER_ELEMENT: I32_BYTES } = Int32Array;
const { wait } = Atomics;

const doSomething = () => {};

for (let i = 0; i < 100000; i++) {
  const i32a = new Int32Array(new SharedArrayBuffer(I32_BYTES * 2));
  postMessage(i32a);
  while (wait(i32a, 1, 0, 42) === 'timed-out')
    doSomething();
}

console.log('DONE');
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width,initial-scale=1.0">
  <script type="module">
    console.log('main');
    let func_calls = 0;
    const worker = new Worker('worker.js');
    worker.onmessage = ({ data: i32a }) => {
      func_calls++;
      if (func_calls % 1000 == 0)
        console.log(`func: ${func_calls}`);
      i32a[1] = 1;
      Atomics.notify(i32a, 1);
    };
  </script>
</head>
</html>

As summary, it's now clear to me what was the issue, with the work around solved it, and what's in general the better approach.

In none of this exchange there was any need to patronize, tell me I don't know what I am doing, and showing off also nonsensical parts of the answer because "I don't understand multi-threading" ... the exchange was awful to state it in a very mild way, so I encourage you to revisit your answering style and ego because none of this is explained in MDN and none of this is showed as example in specs to clarify what's the control argument about ... it's not even called control which would've already simplified everyone life.

Take care.

I will note for posterity that the first sentence of the MDN page is as follows:

The Atomics.wait() static method verifies that a shared memory location still contains a given value and if so sleeps, awaiting a wake-up notification or times out.

Beyond that, I can only say that hostility is in the eye of the beholder.

to whom it might concern, there's now a video about the bug and it has nothing to do with anything explained in here in other browsers: https://www.youtube.com/watch?v=Wx5BKw9ol7E

I still think you should not let your own developers ruin your trust from developers, by indirectly insulting whoever files bug or by even trusting the developer understood it all.

If nothing, as long time contributor to both MDN and the OSS in general, this interaction made me hope I don't want ever to file a bug again in here.

Regards.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: