Closed Bug 1539654 Opened 3 years ago Closed 2 years ago

[jsdbg2] Debugger misses async function returns

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: jimb, Assigned: jimb)

References

Details

(Whiteboard: [debugger-mvp])

Attachments

(3 files, 3 obsolete files)

The following script should print out two completions (one for the await, and one for the return), but only prints one:

let g = newGlobal({newCompartment: true});
g.eval(`
    async function f() {
        debugger;
        await Promise.resolve(3);
        return "ok";
    }
`);

let dbg = Debugger(g);
dbg.onDebuggerStatement = frame => {
    frame.onPop = completion => {
        print(`completion: ${uneval(completion)}`);
    };
};

g.dis(g.f);
g.f().then(value => { print(`callback`); });
Priority: -- → P1

The test frame.isDebuggee() test in js::Debugger::onLeaveFrame is returning false, so we never make it to slowPathOnLeaveFrame.

If the debugger has an onEnterFrameHook set, then the bug doesn't occur: Realm::debuggerObservesAllExecution returns true, so JSScript::isDebuggee returns true on all scripts in the realm, so frames executing such scripts are marked as debuggees as soon as they are pushed, and js::Debugger::onLeaveFrame correctly takes the slow path.

A frame is supposed to be isDebuggee whenever there is a Debugger.Frame referring to it, even if no hooks or breakpoints are set. So the fact that we pushed a new frame without marking it as a debuggee at all is a problem.

Type: enhancement → defect

The patch above is just a work in progress. It passes under the interpreter, but fails with --baseline-eager.

After talking with jandem on IRC, I think we need to take a different tack.

There are several straightforward fixes for the interpreter, so leave that aside
and consider only Baseline.

Background:

  • Stack frames may be marked as debuggees individually: even if two frames are
    executing the same script, one may be a debuggee while the other is not. Among
    other things, if a frame has a Debugger.Frame referring to it, it is a
    debuggee.

  • Scripts may be marked as debuggees. Among other things, setting breakpoints in
    a script or single-stepping in it makes it a debuggee. Any frame executing a
    debuggee script is itself marked as a debuggee. But the converse is not true:
    debuggee frames can run non-debuggee scripts.

  • Each realm has a DebuggerObservesAllExecution flag, set if it is a debuggee
    of a Debugger with an onEnterFrame hook. All scripts in such a realm are
    marked as debuggees. But the converse is not true: debuggee scripts can exist
    in non-DebuggerObservesAllExecution realms.

  • The realm debuggee flag is the broadest category: debuggee frames and scripts
    can only exist in debuggee realms. And only debuggee realms can have their
    DebuggerObservesAllExecution flag set.

  • If a debuggee frame is executing Baseline code, that code must always be
    instrumented as necessary to keep Debugger informed of the frame's behavior.
    Since debuggee frames can run non-debuggee scripts, this means that even
    non-debuggee scripts may have instrumentation.

  • To make sure the debugger doesn't affect the performance of the ~100% of cases
    in which the page is not being debugged, Baseline code in non-debuggee realms
    must never include debugging instrumentation.

  • The sole inputs to baseline code generation are frame and script debuggee
    flags. The Realm's debuggee flag is not considered.

  • A Debugger.Frame referring to a generator or async frame must continue to
    refer to any and all resumptions of that call after an await or yield. Thus,
    newly pushed frames for resumptions may need to be marked as debuggees.

The bug is that newly pushed frames for resumed generator/async calls are not,
in fact, marked as debuggees.

At present, when we resume a generator or async call, neither the calling frame
nor the script being resumed are necessarily debuggees, so we have no
opportunity to insert instrumentation to check whether the resuming frame should
be a debuggee.

We could avoid affecting non-debuggee realms by instrumenting resumption only in
debuggee realms, but the realm's debuggee flag, per se, is not currently an
influence on code generation, and it would be nice to avoid new forms of
influence.

We could avoid introducing a branch by having generator objects hold a copy of
the right initial value for the BaselineFrame flags field. Resumption must
initialize this field in any case, and initializing it from a field loaded from
the generator object is not that much slower than initializing it with a
constant. Unfortunately, resumption of a debuggee frame requires recompilation
if the extant Baseline script was not compiled with instrumentation, so in the
general case a VM call is required. A branch will certainly be needed to avoid
that.

The solution jandem and I settled on was to mark generator/async scripts as
debuggees if any Debugger.Frame refers to them, much as we do new for scripts
and onStep handlers. Marking the script as a debuggee makes Baseline compile
JSOP_AFTERYIELD as a call to js::jit::DebugAfterYield, which sets the frame's
debuggee flag and calls Debugger::onResumeFrame. That function takes care of
recompiling the script with the needed instrumentation.

We have some extant problems with managing step counts in bug 1501666. I think I have a patch for that.

See Also: → 1501666

When resuming a generator call that has a Debugger.Frame referring to it, the
frame must be marked as a debuggee, so that the frame's activity will be
reported to the debugger. But in such a resumption, the resuming frame is not
necessarily a debuggee, so there isn't a good opportunity to check whether the
new frame needs to be a debuggee, re-JIT for debugging instrumentation, and so
on.

With this patch, the presence of a Debugger.Frame referring to a generator frame
contributes one count to the frame's script's step mode count. This marks the
script as a debuggee, which includes instrumentation to mark resumed frames as
debuggees.

No longer blocks: dbg-68
Depends on: 1546727
Assignee: nobody → jimb
Status: NEW → ASSIGNED
Whiteboard: [debugger-mvp]
Depends on: 1551176
Attached file Bug 1539654: wip (obsolete) —

When a Debugger.Frame refers to a generator/async call, the generator's script
must be marked as a debuggee, so that if the call is resumed, the
JSOP_AFTERYIELD bytecode is compiled to the instrumentation that re-associates
the extant Debugger.Frame with the new concrete frame.

This extends DebugScript with a new field generatorObserverCount, akin to
stepperCount, which tracks the number of Debugger.Frames referring to the
script if it is a generator script. This count is adjusted when the
GeneratorInfo structure is attached to a DebuggerFrame, cleared, and
finalized.

hg: Enter commit message. Lines beginning with 'HG:' are removed.

Depends on D32393

Attachment #9067188 - Attachment description: Bug 1539654: Improve error messages from SpiderMonkey test Match library. → Bug 1539654: Improve error messages from SpiderMonkey test Match library. r?jorendorff
Attachment #9057086 - Attachment is obsolete: true
Attachment #9059135 - Attachment is obsolete: true
Attachment #9065115 - Attachment is obsolete: true

Revised patches a bit: new try push.

Found more tests using Match.js library. New try push.

Okay, I've learned my lesson. These try failures have nothing to do with the actual point of the patch series; I just changed the behavior of a function in a testing library (for the better, I feel) and thought the failures would be rare enough that I could just fix them up. It turns out it's used everywhere. Not worth it. Better to leave the existing function's behavior unchanged, and then use new functions in my new tests.

This try run is looking better: try run

Unfortunately, there is a failure in the 'compacting' test job that does seem to be related:

JS_GC_ZEAL="IncrementalMultipleSlices,10" ~/moz/dbg/js/src/obj~/js/src/js --baseline-eager -f /home/jimb/moz/dbg/js/src/jit-test/lib/prologue.js -e 'const platform='"'"'linux2'"'"'' -e 'const libdir='"'"'/home/jimb/moz/dbg/js/src/jit-test/lib/'"'"'' -e 'const scriptdir='"'"'/home/jimb/moz/dbg/js/src/jit-test/tests/generators/'"'"'' --module-load-path /home/jimb/moz/dbg/js/src/jit-test/modules/ -f /home/jimb/moz/dbg/js/src/jit-test/tests/generators/bug1491331.js
Assertion failure: observing, at /home/jimb/moz/dbg/js/src/vm/Debugger.cpp:3019
Segmentation fault (core dumped)

I am able to reproduce this.

Jason, I have some questions about this if condition in Debugger::removeDebuggeeGlobal (searchfox):

  // Clear this global's generators from generatorFrames as well.
  //
  // This method can be called either from script (dbg.removeDebuggee) or
  // from an awkward time during GC sweeping. In the latter case, skip this
  // loop to avoid touching dead objects. It's correct because, when we're
  // called from GC, all `global`'s generators are guaranteed to be dying:
  // live generators would keep the global alive and we wouldn't be here. GC
  // will sweep dead keys from the weakmap.
  if (!global->zone()->isGCSweeping()) {
    ...
  }

It seems like this assumes that if the global's zone is sweeping, then the global is going away so there's no need to clean up generatorFrames entries. But in the test case, I'm observing that the zone is in the Sweep state even though removeDebuggeeGlobal is being called from dbg.removeDebuggee, in JS. I should verify this with the GC folks, but it seems like this condition is not sufficient to ensure that removeDebuggeeGlobal is being called because global is dying; it could just be that the GC phase is overlapping with some JS execution.

Would it make sense to instead add an argument to removeDebuggeeGlobal to distinguish the sweep-driven from the JS-driven cases?

Also: I wish I had a solid way to reproduce this; it's actually intermittent on my machine right now, although I did catch it under rr.

Flags: needinfo?(jorendorff)

(In reply to Jim Blandy :jimb from comment #14)

Would it make sense to instead add an argument to removeDebuggeeGlobal to distinguish the sweep-driven from the JS-driven cases?

Doing so does prevent the crashes. I'm not convinced, though, even when removeDebuggeeGlobal is being called from removeDebuggee by JavaScript, that it's safe for it to touch the keys of generatorFrames to extract their globals if there is a sweep going on. Potentially, both key and value could be being finalized whenever there is a sweep going on, which is what the global->zone()->isGCSweeping() test is trying to prevent.

This analysis makes sense to me. I agree this is something to verify with the GC team.

Flags: needinfo?(jorendorff)

I've implemented this and added a test, based on jonco's advice, that crashes reliably.

The new export Pattern.OBJECT_WITH_EXACTLY is like the ordinary Pattern
constructor, but also fails to match if the actual value has any extra
properties.

The default behavior of object matching (to ignore additional properties) is
left unchanged, since there are too many extant tests that rely on this behavior
to be worth fixing.

Depends on D32393

Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f0b7dfa9bdbd
Improve error messages from SpiderMonkey test Match library. r=jorendorff
https://hg.mozilla.org/integration/autoland/rev/5aecd9ecbab2
Add all-and-only object property patterns to SpiderMonkey's Match.js test library. r=jorendorff
https://hg.mozilla.org/integration/autoland/rev/8a7c8f8ecd56
Ensure generator scripts observed by Debugger.Frames are marked as debuggees. r=jorendorff
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Regressions: 1557343
Regressions: 1584195
You need to log in before you can comment on or make changes to this bug.