WebAssembly.compileStreaming hangs a wasm helper thread forever when the code section is truncated mid-stream
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
Tracking
()
People
(Reporter: jandem, Assigned: jpages)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: ai-involved, regression)
Attachments
(2 files)
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
Summary
WebAssembly.compileStreaming / WebAssembly.instantiateStreaming permanently deadlocks a wasm compilation helper thread when the streamed module's code-section header declares a size larger than the number of bytes actually delivered before the stream ends cleanly (i.e. a malformed but complete HTTP response). The returned promise never settles, and the helper thread cannot be joined at shutdown.
Root cause
CompileStreamTask in js/src/wasm/WasmJS.cpp drives a state machine Env -> Code -> Tail -> Closed. The helper thread runs CompileStreaming() (js/src/wasm/WasmCompile.cpp:1207) whose StreamingDecoder::waitForBytes() (WasmCompile.cpp:1181-1192) blocks on the exclusiveCodeBytesEnd_ condvar and only exits when either more bytes arrive or cancelled_ is set:
bool waitForBytes(size_t numBytes) {
numBytes = std::min(numBytes, d_.bytesRemain());
const uint8_t* requiredEnd = d_.currentPosition() + numBytes;
auto codeBytesEnd = codeBytesEnd_.lock();
while (codeBytesEnd < requiredEnd) {
if (cancelled_) { return false; }
codeBytesEnd.wait();
}
return true;
}
CompileStreamTask::streamEnd() (WasmJS.cpp:5459-5486) handles Code and Tail identically — it only marks exclusiveStreamEnd_.reached = true and notifies exclusiveStreamEnd_. When the stream ends while still in Code, the helper thread is blocked on exclusiveCodeBytesEnd_ (not exclusiveStreamEnd_); neither codeBytesEnd_ advancing nor cancelled_ (i.e. streamFailed_) being set ever happens, so it waits indefinitely.
Contrast with the error path rejectAndDestroyAfterHelperThreadStarted() (WasmJS.cpp:5369-5377), which correctly sets streamFailed_ = true and notifies both condvars. A clean end-of-stream while still in Code is effectively a truncation and should be treated the same way.
The DOM consumer behaves the same as the shell harness: a normal stream close (NS_BASE_STREAM_CLOSED) calls streamEnd() (dom/fetch/FetchUtil.cpp:569), not streamError() — so a complete HTTP response carrying a malformed wasm module reaches this path from web content.
Test case
// timeout 15 obj-shell-dbgopt/dist/bin/js --fuzzing-safe repro.js
//
// Expected (after fix): "bad: rejected CompileError: ..." printed, exit 0.
// Observed (buggy): "script end reached" prints, then the process hangs
// because the wasm helper thread is stuck in StreamingDecoder::waitForBytes
// and cannot be joined at shutdown.
function streamCompile(bytes, tag) {
WebAssembly.compileStreaming(bytes).then(
m => print(tag + ": compiled OK"),
e => print(tag + ": rejected " + e));
}
// Code-section header declares size=0x0a (10) but only 3 content bytes
// follow before the stream ends (a complete but malformed module).
var bad = new Uint8Array([
0x00,0x61,0x73,0x6d, 0x01,0x00,0x00,0x00, // preamble
0x01,0x04,0x01,0x60,0x00,0x00, // type section: () -> ()
0x03,0x02,0x01,0x00, // function section: func0 : type0
0x0a,0x0a, // code section, declared size = 10
0x01,0x08,0x00 // count=1, body0 size=8, locals=0 ... then EOF
]);
streamCompile(bad, "bad");
print("script end reached");
Locally reproduced on current main:
$ timeout 15 obj-shell-dbgopt/dist/bin/js --fuzzing-safe test.js
script end reached
EXIT=124 # killed by timeout: hung
$ # control: same module with code section size 4 matching its 4 content bytes
$ timeout 15 obj-shell-dbgopt/dist/bin/js --fuzzing-safe test.js
script end reached
ok: compiled OK
EXIT=0
The hang reproduces even though drainJobQueue() is not called — "script end reached" is printed before the process hangs at shutdown, demonstrating the leaked/blocked helper thread is independent of any job-queue draining.
Suggested fix
When streamEnd() arrives in the Code state, the declared code section is necessarily shorter than what was delivered: treat it as a truncation and reject, mirroring rejectAndDestroyAfterHelperThreadStarted(). This also notifies exclusiveCodeBytesEnd_, which unblocks waitForBytes() via the cancelled_ check.
Sketch (js/src/wasm/WasmJS.cpp, in CompileStreamTask::streamEnd):
void streamEnd(
JS::OptimizedEncodingListener* completeTier2Listener) override {
switch (streamState_.lock().get()) {
case Env: {
BytecodeBuffer bytecode(envBytes_, nullptr, nullptr);
module_ = CompileModule(*compileArgs_,
BytecodeBufferOrSource(std::move(bytecode)),
&compileError_, &warnings_, nullptr);
setClosedAndDestroyBeforeHelperThreadStarted();
return;
}
- case Code:
+ case Code:
+ // The stream ended mid-code-section: the declared code-section
+ // size was larger than the bytes actually delivered. The helper
+ // thread is blocked in StreamingDecoder::waitForBytes() on
+ // exclusiveCodeBytesEnd_, and only the streamFailed_/cancelled_
+ // path will unblock it. Treat this as a truncation error.
+ rejectAndDestroyAfterHelperThreadStarted(
+ JSMSG_WASM_COMPILE_ERROR /* or a dedicated "truncated" code */);
+ return;
case Tail:
// Unlock exclusiveStreamEnd_ before locking streamState_.
{
auto streamEnd = exclusiveStreamEnd_.lock();
MOZ_ASSERT(!streamEnd->reached);
streamEnd->reached = true;
streamEnd->tailBytes = tailBytes_;
streamEnd->completeTier2Listener = completeTier2Listener;
streamEnd.notify_one();
}
setClosedAndDestroyAfterHelperThreadStarted();
return;
case Closed:
MOZ_CRASH("streamEnd() in Closed state");
}
}
Equivalent alternatives:
- Always set
streamFailed_and notifyexclusiveCodeBytesEnd_from theCodebranch (so the streaming decoder fails the same way it does on stream error), and rely on the existingTail/Codeend-handling for the rest. - Give
StreamingDecoder::waitForBytes()visibility intoexclusiveStreamEnd_.reachedand fail with a "code section truncated"CompileErrorwhenreachedis set butcodeBytesEnd_ < requiredEnd. This produces a clearer error message than reusing the stream-error path, at the cost of a wider API change.
| Reporter | ||
Comment 1•10 days ago
|
||
Claude thinks this goes back all the way to bug 1406421 (2017), when streaming compilation was implemented.
| Reporter | ||
Updated•10 days ago
|
Comment 2•10 days ago
|
||
Set release status flags based on info from the regressing bug 1406421
Updated•10 days ago
|
| Assignee | ||
Updated•10 days ago
|
Comment 3•10 days ago
|
||
This could lead to some DoS, but this does not sounds like a security issue.
Updated•10 days ago
|
| Assignee | ||
Updated•10 days ago
|
Comment 4•10 days ago
|
||
I agree, this sounds like DoS but not a security issue. Jan is there any reason we can't open this? Just triple-checking before doing it.
| Reporter | ||
Comment 5•9 days ago
|
||
(In reply to Ryan Hunt [:rhunt] from comment #4)
I agree, this sounds like DoS but not a security issue. Jan is there any reason we can't open this? Just triple-checking before doing it.
I'll open it up. I sometimes file these as security bugs just to be safe.
| Assignee | ||
Comment 6•9 days ago
|
||
Updated•8 days ago
|
Comment 8•6 days ago
|
||
| bugherder | ||
Comment 9•4 days ago
|
||
The patch landed in nightly and beta is affected.
:jpages, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- See https://wiki.mozilla.org/Release_Management/Requesting_an_Uplift for documentation on how to request an uplift.
- If no, please set
status-firefox152towontfix.
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 10•1 day ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D302892
Updated•1 day ago
|
Comment 11•1 day ago
|
||
firefox-beta Uplift Approval Request
- User impact if declined/Reason for urgency: This bug can cause a deadlock if an incomplete wasm module is being streamed. In this case the helper thread can become stuck waiting for the rest of the module.
- Code covered by automated testing?: yes
- Fix verified in Nightly?: yes
- Needs manual QE testing?: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: Small patch with changes in the wasm streaming code to wake up the stuck thread as well as propagating the error.
- String changes made/needed?: N/A
- Is Android affected?: yes
| Assignee | ||
Updated•1 day ago
|
Updated•1 day ago
|
Updated•1 day ago
|
Updated•1 day ago
|
Comment 12•1 day ago
|
||
| uplift | ||
Description
•