Crash in [@ js::wasm::DebugState::adjustEnterAndLeaveFrameTrapsState]
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
thunderbird_esr91 | --- | unaffected |
firefox-esr91 | --- | unaffected |
firefox98 | --- | unaffected |
firefox99 | --- | unaffected |
firefox100 | --- | affected |
People
(Reporter: calixte, Assigned: lth)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/f3524ce4-2f64-4568-89d4-f78330220321
MOZ_CRASH Reason: MOZ_CRASH(No code segment at this tier)
Top 10 frames of crashing thread:
0 XUL js::wasm::DebugState::adjustEnterAndLeaveFrameTrapsState js/src/wasm/WasmDebug.cpp:357
1 XUL js::wasm::DebugState::ensureEnterFrameTrapsState js/src/wasm/WasmDebug.cpp:365
2 XUL UpdateExecutionObservabilityOfScriptsInZone js/src/debugger/Debugger.cpp:3255
3 XUL js::Debugger::updateExecutionObservabilityOfScripts js/src/debugger/Debugger.cpp:3273
4 XUL js::Debugger::updateExecutionObservability js/src/debugger/Debugger.cpp:3341
5 XUL js::Debugger::updateObservesAllExecutionOnDebuggees js/src/debugger/Debugger.cpp:3450
6 XUL js::Debugger::setHookImpl js/src/debugger/Debugger.cpp:4157
7 XUL bool js::Debugger::CallData::ToNative<&js::Debugger::CallData::setOnEnterFrame js/src/debugger/Debugger.cpp:4126
8 XUL js::CallSetter js/src/vm/Interpreter.cpp:730
9 XUL SetExistingProperty js/src/vm/NativeObject.cpp:2490
There is 1 crash in nightly 100 with buildid 20220321065848. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1756951.
[1] https://hg.mozilla.org/mozilla-central/rev?node=5ab04623c826
Assignee | ||
Comment 1•3 years ago
|
||
Yeah, this is almost certainly mine, unless my change uncovered an existing problem.
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
This is a very curious failure. The MOZ_CRASH indicates no debugging code is available. Yet UpdateExecutionObservabilityOfScriptsInZone guards on instance->debugEnabled(), and when instance->debugEnabled() is true there should definitely be debugging code available in the instance.
We could strengthen the assertion in the DebugState constructor in an attempt to catch the problem earlier:
diff --git a/js/src/wasm/WasmDebug.cpp b/js/src/wasm/WasmDebug.cpp
--- a/js/src/wasm/WasmDebug.cpp
+++ b/js/src/wasm/WasmDebug.cpp
@@ -41,7 +41,8 @@ DebugState::DebugState(const Code& code,
module_(&module),
enterFrameTrapsEnabled_(false),
enterAndLeaveFrameTrapsCounter_(0) {
- MOZ_ASSERT(code.metadata().debugEnabled);
+ MOZ_RELEASE_ASSERT(code.metadata().debugEnabled);
+ MOZ_RELEASE_ASSERT(code.hasTier(Tier::Debug));
}
There's a possibility that this is another problem with the interaction of code caching and debugging, but without STR it's hard to say for sure.
Assignee | ||
Comment 3•3 years ago
|
||
Comment 4•3 years ago
|
||
You might be on something: looking at CompilerEnvironment::computeParameters
or CompileArgs::build
, I cannot find that we are rejecting debugEnabled
if baseline is not available.
Assignee | ||
Comment 5•3 years ago
|
||
(In reply to Yury Delendik (:yury) from comment #4)
You might be on something: looking at
CompilerEnvironment::computeParameters
orCompileArgs::build
, I cannot find that we are rejectingdebugEnabled
if baseline is not available.
(After some IM conversation) I think that was based on a misreading of the code on Yury's part.
But a more interesting matter is, what are the invariants of code and how do we check them? Consider that compiled code in the form of a Module can enter a run-time context in at least these ways:
- it can be the result of compiling bytecode under a set of flags
- it can be the result of deserializing cached code, and whether deserialization can be done depends on current flags and cached flags; more simply, if debugging is enabled, we should not be deserializing code at all, since serialized code is never debuggable
- it can be received by postMessage from another context, where it was compiled under some flags, and in principle the receiving context can have different flags (eg, the present failure would be explained by code being posted from a context with no debugging to a context where debugging is enabled)
Yury makes the point that "we need better management/assertion of debugEnabled metadata flag at the end of compilation" and I think this is exactly right.
Assignee | ||
Comment 6•3 years ago
|
||
Looking at the crash report, we see "RemoteType webCOOP+COEP" which suggests the presence of workers. It's not hard to get non-debug wasm code into a context that has an active debugger (bullet 3 above) but I've yet to make that crash or assert in any way, so something more needs to happen.
Assignee | ||
Comment 7•3 years ago
|
||
It's not completely obvious whether the crash comes when trying to enable or disable the frame traps state. The line number corresponds to the end of the function; in the machine code, a number of the calls to abort() are shunted to the end, so it could be either. The enable case also accesses the Tier::Debug because it may need to install the trap handler and to do that it needs to compute the handler's address. Need STR to do better, probably.
The Module has a flag debugEnabled
on its metadata. That flag is initially false, and is set to true only in ModuleGenerator::finishMetadata()
, if compilerEnv->debugEnabled()
is true. In ModuleGenerator::finishModule()
we additionally check that if compilerEnv->debugEnabled()
is true then the module has code at Tier::Debug
and is not tiered, and we create other debugging metadata. The module is immutable after creation. Ergo the flag is true iff there is debug code/data. Ergo the module is self-consistent. It's the module that's shared between threads, including its code and metadata. Code compiled in a non-debugging context would not have debug code/data and its debugEnabled flag would not be set. This includes code deserialized from cache. Thus a self-consistent module with or without debugging code would arrive in some context and be the argument of instantiation.
The instance has a flag, debugEnabled(), which is true iff the instance has a DebugState. This is the flag that guards the code referenced in an earlier comment. The DebugState is created in Module::instantiate iff the module's metadata has the debugEnabled flag set. Thus it would seem that instance->debugEnabled()
is true iff the module is debuggable, and assuming that the module is self-consistent the instance should be too.
All code I've seen so far assumes that the instance is the unit of debugging, ie, there's no assumption that I've found that all instances are debuggable if some are.
It could look like the main possibility here is that the DebugState that we're using comes from another instance than the instance that's also being passed as an argument to the debugger functions and used for the guard, or that a DebugState is being constructed with code/metadata not belonging to the module in the instance. The former does not seem to happen (cf SearchFox); the latter does not seem possible: we're using the module's code_ field directly to construct the DebugState and then passing the DebugState to the instance's constructor.
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
One thing that has changed is that code being debugged is now actually being shared. Before, when a DebugState was deleted, it would own its code jointly with its instance, and decrementing the refcount on the code would have local effects (indeed none, since DebugState's finalizer is called from the Instance's finalizer and it ends up being the latter that brings the refcount on the owned code to 0). With code being shared process-wide if the module has been shared, any error in maintaining the code's refcount is more likely to be felt elsewhere. The crash could have been a result of observing garbage left behind from an erroneously deleted code object. There's no evidence of this, but with other possibilities being eliminated, it should be considered.
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Not obvious that bug 1761606 would be the cause of this, but it could be.
Assignee | ||
Comment 10•3 years ago
|
||
Actually, bug 1761606 is almost certainly the culprit.
Comment 11•3 years ago
|
||
Updated•3 years ago
|
Comment 12•3 years ago
|
||
bugherder |
Description
•