Closed Bug 2029633 Opened 6 days ago Closed 5 days ago

Atomics.wait "not-equal" return path missing memory fence — 63% stale reads with 3+ workers

Categories

(Core :: JavaScript Engine, defect, P3)

RESOLVED INVALID

People

(Reporter: lostit1278, Unassigned)

Details

When Atomics.wait returns "not-equal" (because the watched value has already changed before the call), SpiderMonkey does not appear to emit a full sequential-consistency memory fence. This causes workers that take the "not-equal" fast path to read stale values from SharedArrayBuffer — values written by other workers before the barrier are invisible.

The failure rate is ~63% of cross-worker reads, which matches the theoretical 2/3 expected when the fence is missing (3 workers, each reading 2 other workers' slots).

This is not SpiderMonkey-specific. All three major JS engines are affected: V8 (Chromium), SpiderMonkey (Firefox), and JavaScriptCore (Safari). V8 has progressively fixed the fence in recent versions. Three independent engines failing identically suggests a spec-level ambiguity in the ECMAScript memory model.

STEPS TO REPRODUCE

  1. Open https://lostbeard.github.io/v8-atomics-wait-bug/ in Firefox
  2. Click "Run All Tests"
  3. Observe Test 2 fails with stale reads

Source: https://github.com/LostBeard/v8-atomics-wait-bug

WHAT THE TEST DOES

Three workers synchronize using a standard generation-counting barrier with Atomics.wait/Atomics.notify. Each iteration: workers write a unique value to their slot, enter the barrier, then read all other workers' slots and verify values match.
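
One iteration of the check described above could be sketched roughly as follows. This is not the code from the linked repository: the function name, the slot layout, the value formula, and the barrier callback are all assumptions made for illustration. It covers a single iteration only.

```javascript
// Hypothetical layout: slots[i] holds the value last written by worker i.
// `barrier` is any function that blocks until all workers have arrived.
function runIteration(slots, id, nWorkers, iter, barrier) {
  // Unique per worker and per iteration, so an old value is detectable.
  const expectedFor = (w) => iter * nWorkers + w + 1;

  // 1. Publish this worker's value.
  Atomics.store(slots, id, expectedFor(id));

  // 2. Wait until every worker has published.
  barrier();

  // 3. Read every other worker's slot and count mismatches.
  let stale = 0;
  for (let w = 0; w < nWorkers; w++) {
    if (w !== id && Atomics.load(slots, w) !== expectedFor(w)) {
      stale++;
    }
  }
  return stale;
}
```

With three workers, each call performs two cross-worker reads, so a worker that leaves the barrier too early can report up to two stale reads per iteration.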

Three tests isolate the bug:

  • Test 1 (2 workers, wait/notify): PASS — 0 stale reads
  • Test 2 (3 workers, wait/notify): FAIL — 63.2% stale reads at 1K iterations
  • Test 3 (3 workers, spin/Atomics.load): PASS — 0 stale reads

EXPECTED BEHAVIOR

After Atomics.wait returns — regardless of return value ("ok", "not-equal", "timed-out") — all prior stores from all agents that happened-before the event that caused the return should be visible.
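
As a quick illustration of the return values named above, the sketch below triggers two of them from a single thread. It assumes an environment where the calling thread is allowed to block (a worker, or the Node.js main thread; a browser's main thread throws instead), and the variable names are illustrative only. The third value, "ok", requires another agent to call Atomics.notify and is not shown. The sketch demonstrates the return values only, not any memory-ordering behavior.

```javascript
// One shared 32-bit slot; a fresh SharedArrayBuffer is zero-filled.
const view = new Int32Array(new SharedArrayBuffer(4));

// The expected value (1) differs from the current value (0),
// so the call returns immediately without sleeping.
const r1 = Atomics.wait(view, 0, 1);

// The expected value matches and nobody calls Atomics.notify,
// so the 10 ms timeout elapses.
const r2 = Atomics.wait(view, 0, 0, 10);

console.log(r1, r2); // prints "not-equal timed-out"
```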

ACTUAL BEHAVIOR

When Atomics.wait returns "not-equal", stores from other workers that preceded the generation bump are not visible. Workers read stale values. Error rate is ~63%, consistent with the missing fence affecting 2 out of 3 cross-worker read pairs.

SPIDERMONKEY TEST RESULTS

Firefox 148 / Windows 11 (AMD Ryzen 5 7500F, 6c/12t):

  • Test 1 (2W wait/notify): PASS — 0 / 200,000 stale reads (0%)
  • Test 2 (3W wait/notify): FAIL — 1,897 / 3,000 stale reads (63.2%)
  • Test 3 (3W spin): PASS — 0 / 9,000 stale reads (0%)

Firefox 149 / macOS Tahoe (Apple Silicon, 10 cores, via BrowserStack):

  • Test 1 (2W wait/notify): PASS — 0 / 200,000 stale reads (0%)
  • Test 2 (3W wait/notify): FAIL — 4,004 / 39,000 stale reads (10.3%)
  • Test 3 (3W spin): PASS — 0 / 36,000 stale reads (0%)

SpiderMonkey fails on both x86 (Windows) and ARM (macOS Apple Silicon). On the same macOS Tahoe BrowserStack host, V8 (Chrome/Edge 146) passes with 0 stale reads across 10 runs — confirming V8 has fixed the fence while SpiderMonkey has not.

WORKAROUND

Replacing Atomics.wait with a pure spin on Atomics.load fixes the issue:

while (Atomics.load(view, genIdx) === myGen) {}

Every Atomics.load is seq_cst: when it observes the new generation, the total order guarantees all prior stores are visible.

CROSS-ENGINE RESULTS

All three major engines affected:

  • V8 12.4 (Node.js 22.14), x86-64 Windows: ~66%
  • V8 14.6 (Chrome 146), x86-64 Windows: 10.5%
  • V8 14.6 (Chrome 146), macOS Tahoe: 0% (fixed)
  • SpiderMonkey (Firefox 148), x86-64 Windows: 63.2%
  • SpiderMonkey (Firefox 149), macOS Tahoe: 10.3%
  • JSC (Safari 18), macOS Sequoia: 10.8%
  • JSC (Safari 17), macOS Sonoma: 50.9%
  • JSC (Safari 26), macOS Tahoe: 26.1%
  • Android Chrome (3 ARM SoCs): 14.5%-48.4% — fails 2-worker test on ARM

SPEC REFERENCES

RELATED BUGS

Cross-browser testing powered by BrowserStack (https://www.browserstack.com).

Severity: -- → S3
Priority: -- → P3

Closing this - the bug was in our barrier implementation, not in SpiderMonkey.

Our barrier used a single Atomics.wait without a loop, making it vulnerable to spurious cross-barrier wakeups. Atomics.notify wakes waiters by index, not by value - so a notify from barrier N can wake a waiter at barrier N+1. Without a loop to re-check, the worker exits the barrier prematurely.

The fix is wrapping Atomics.wait in a while loop that re-checks the condition:

while (Atomics.load(v, GEN) === gen) {
    Atomics.wait(v, GEN, gen);
}

Verified: 0 stale reads across Chrome, Firefox, Safari, and Android ARM with the corrected barrier.
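
For context, a complete generation-counting barrier built around that loop might look like the sketch below. This is not the code from the linked repository: the COUNT slot, the nWorkers parameter, and the function name are assumptions made for illustration. Only v, GEN, gen, and the while loop come from the fix above.

```javascript
// Hypothetical slot layout inside the shared Int32Array `v`.
const COUNT = 0; // workers that have arrived in the current generation
const GEN = 1;   // generation counter, bumped by the last arriver

function barrier(v, nWorkers) {
  // Remember which generation we entered in.
  const gen = Atomics.load(v, GEN);

  // Atomics.add returns the previous value, so +1 is our arrival rank.
  if (Atomics.add(v, COUNT, 1) + 1 === nWorkers) {
    // Last arriver: reset the count, publish the new generation, wake all.
    Atomics.store(v, COUNT, 0);
    Atomics.add(v, GEN, 1);
    Atomics.notify(v, GEN);
  } else {
    // A wakeup alone proves nothing, because the notify may belong to the
    // previous barrier. Leave only once the generation has really changed.
    while (Atomics.load(v, GEN) === gen) {
      Atomics.wait(v, GEN, gen);
    }
  }
}
```

The loop matters because the last arriver bumps the generation before it calls Atomics.notify. A fast worker can leave in that window, enter the next barrier, and start waiting on the same index, where the late notify then wakes it.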

Credit to Shu-yu Guo for identifying the issue (https://github.com/tc39/ecma262/issues/3800). Apologies for the false report.

Closing per comment #1.

Status: UNCONFIRMED → RESOLVED
Closed: 5 days ago
Resolution: --- → INVALID