Closed Bug 1760627 Opened 3 years ago Closed 3 years ago

Crash in [@ js::wasm::DebugState::adjustEnterAndLeaveFrameTrapsState]

Categories

(Core :: JavaScript: WebAssembly, defect, P2)

Firefox 100
defect

Tracking

()

RESOLVED DUPLICATE of bug 1761606
Tracking Status
thunderbird_esr91 --- unaffected
firefox-esr91 --- unaffected
firefox98 --- unaffected
firefox99 --- unaffected
firefox100 --- affected

People

(Reporter: calixte, Assigned: lth)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/f3524ce4-2f64-4568-89d4-f78330220321

MOZ_CRASH Reason: MOZ_CRASH(No code segment at this tier)

Top 10 frames of crashing thread:

0 XUL js::wasm::DebugState::adjustEnterAndLeaveFrameTrapsState js/src/wasm/WasmDebug.cpp:357
1 XUL js::wasm::DebugState::ensureEnterFrameTrapsState js/src/wasm/WasmDebug.cpp:365
2 XUL UpdateExecutionObservabilityOfScriptsInZone js/src/debugger/Debugger.cpp:3255
3 XUL js::Debugger::updateExecutionObservabilityOfScripts js/src/debugger/Debugger.cpp:3273
4 XUL js::Debugger::updateExecutionObservability js/src/debugger/Debugger.cpp:3341
5 XUL js::Debugger::updateObservesAllExecutionOnDebuggees js/src/debugger/Debugger.cpp:3450
6 XUL js::Debugger::setHookImpl js/src/debugger/Debugger.cpp:4157
7 XUL bool js::Debugger::CallData::ToNative<&js::Debugger::CallData::setOnEnterFrame js/src/debugger/Debugger.cpp:4126
8 XUL js::CallSetter js/src/vm/Interpreter.cpp:730
9 XUL SetExistingProperty js/src/vm/NativeObject.cpp:2490

There is 1 crash in nightly 100 with buildid 20220321065848. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1756951.

[1] https://hg.mozilla.org/mozilla-central/rev?node=5ab04623c826

Flags: needinfo?(lhansen)

Yeah, this is almost certainly mine, unless my change uncovered an existing problem.

Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Flags: needinfo?(lhansen)
Priority: -- → P2
Has Regression Range: --- → yes

This is a very curious failure. The MOZ_CRASH indicates no debugging code is available. Yet UpdateExecutionObservabilityOfScriptsInZone guards on instance->debugEnabled(), and when instance->debugEnabled() is true there should definitely be debugging code available in the instance.

We could strengthen the assertion in the DebugState constructor in an attempt to catch the problem earlier:

diff --git a/js/src/wasm/WasmDebug.cpp b/js/src/wasm/WasmDebug.cpp
--- a/js/src/wasm/WasmDebug.cpp
+++ b/js/src/wasm/WasmDebug.cpp
@@ -41,7 +41,8 @@ DebugState::DebugState(const Code& code,
       module_(&module),
       enterFrameTrapsEnabled_(false),
       enterAndLeaveFrameTrapsCounter_(0) {
-  MOZ_ASSERT(code.metadata().debugEnabled);
+  MOZ_RELEASE_ASSERT(code.metadata().debugEnabled);
+  MOZ_RELEASE_ASSERT(code.hasTier(Tier::Debug));
 }

There's a possibility that this is another problem with the interaction of code caching and debugging, but without STR it's hard to say for sure.

You might be on something: looking at CompilerEnvironment::computeParameters or CompileArgs::build, I cannot find that we are rejecting debugEnabled if baseline is not available.

(In reply to Yury Delendik (:yury) from comment #4)

You might be on something: looking at CompilerEnvironment::computeParameters or CompileArgs::build, I cannot find that we are rejecting debugEnabled if baseline is not available.

(After some IM conversation) I think that was based on a misreading of the code on Yury's part.

But a more interesting matter is, what are the invariants of code and how do we check them? Consider that compiled code in the form of a Module can enter a run-time context in at least these ways:

  • it can be the result of compiling bytecode under a set of flags
  • it can be the result of deserializing cached code, and whether deserialization can be done depends on current flags and cached flags; more simply, if debugging is enabled, we should not be deserializing code at all, since serialized code is never debuggable
  • it can be received by postMessage from another context, where it was compiled under some flags, and in principle the receiving context can have different flags (eg, the present failure would be explained by code being posted from a context with no debugging to a context where debugging is enabled)

Yury makes the point that "we need better management/assertion of debugEnabled metadata flag at the end of compilation" and I think this is exactly right.

Looking at the crash report, we see "RemoteType webCOOP+COEP" which suggests the presence of workers. It's not hard to get non-debug wasm code into a context that has an active debugger (bullet 3 above) but I've yet to make that crash or assert in any way, so something more needs to happen.

It's not completely obvious whether the crash comes when trying to enable or disable the frame traps state. The line number corresponds to the end of the function; in the machine code, a number of the calls to abort() are shunted to the end, so it could be either. The enable case also accesses the Tier::Debug because it may need to install the trap handler and to do that it needs to compute the handler's address. Need STR to do better, probably.


The Module has a flag debugEnabled on its metadata. That flag is initially false, and is set to true only in ModuleGenerator::finishMetadata(), if compilerEnv->debugEnabled() is true. In ModuleGenerator::finishModule() we additionally check that if compilerEnv->debugEnabled() is true then the module has code at Tier::Debug and is not tiered, and we create other debugging metadata. The module is immutable after creation. Ergo the flag is true iff there is debug code/data. Ergo the module is self-consistent. It's the module that's shared between threads, including its code and metadata. Code compiled in a non-debugging context would not have debug code/data and its debugEnabled flag would not be set. This includes code deserialized from cache. Thus a self-consistent module with or without debugging code would arrive in some context and be the argument of instantiation.

The instance has a flag, debugEnabled(), which is true iff the instance has a DebugState. This is the flag that guards the code referenced in an earlier comment. The DebugState is created in Module::instantiate iff the module's metadata has the debugEnabled flag set. Thus it would seem that instance->debugEnabled() is true iff the module is debuggable, and assuming that the module is self-consistent the instance should be too.

All code I've seen so far assumes that the instance is the unit of debugging, ie, there's no assumption that I've found that all instances are debuggable if some are.


It could look like the main possibility here is that the DebugState that we're using comes from another instance than the instance that's also being passed as an argument to the debugger functions and used for the guard, or that a DebugState is being constructed with code/metadata not belonging to the module in the instance. The former does not seem to happen (cf SearchFox); the latter does not seem possible: we're using the module's code_ field directly to construct the DebugState and then passing the DebugState to the instance's constructor.

Attachment #9268881 - Attachment description: WIP: Bug 1760627 - Strengthen assertions → WIP: Bug 1760627 - Strengthen assertions, remove dead code, fix comments

One thing that has changed is that code being debugged is now actually being shared. Before, when a DebugState was deleted, it would own its code jointly with its instance, and decrementing the refcount on the code would have local effects (indeed none, since DebugState's finalizer is called from the Instance's finalizer and it ends up being the latter that brings the refcount on the owned code to 0). With code being shared process-wide if the module has been shared, any error in maintaining the code's refcount is more likely to be felt elsewhere. The crash could have been a result of observing garbage left behind from an erroneously deleted code object. There's no evidence of this, but with other possibilities being eliminated, it should be considered.

Attachment #9268881 - Attachment description: WIP: Bug 1760627 - Strengthen assertions, remove dead code, fix comments → Bug 1760627 - Strengthen assertions, remove dead code, fix comments. r?yury
Keywords: leave-open

Not obvious that bug 1761606 would be the cause of this, but it could be.

See Also: → 1761606

Actually, bug 1761606 is almost certainly the culprit.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/df4812dbeedc Strengthen assertions, remove dead code, fix comments. r=yury
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: