Closed Bug 1230345 (dbg-breakpoint-die-one-thousand-fires-et) Opened 5 years ago Closed 5 years ago

Breakpoint sliding should die in a thousand fires. A THOUSAND, I SAY.

Categories

(DevTools :: Debugger, defect)

defect
Not set
normal

Tracking

(firefox46 fixed)

RESOLVED FIXED
Firefox 46
Tracking Status
firefox46 --- fixed

People

(Reporter: fitzgen, Assigned: jlong)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 6 obsolete files)

* Tons of maintenance cost here. This is not straightforward code.

* Users are _frequently_ confused by this, especially when a function is reclaimed by the GC.

* The one case it actually helps out on (woops I slipped and misclicked one line above) is super rare in comparison.

Instead, we should just set pending breakpoints. Its ok if they pend forever (eg breakpoint on a comment) and it isn't the end of the world. Pending breakpoints have a MUCH better interaction with functions that have been reclaimed by the GC.
Yeah.
I'm crying tears of joy.
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
Assignee: nfitzgerald → ejpbruel
Attached patch 1230345.patch (obsolete) — Splinter Review
Here's an initial patch that removes sliding on the client and the server. It also removes some of the xpcshell tests. `test_breakpoint-19.js` is failing and I'm not sure what the correct behavior for that one is. I haven't touched the frontend tests either; hopefully we just need to remove the ones that has `actualLocation` in the name.
Oh Eddy, test_breakpoint-19.js is skipped in xpcshell.ini because of bug 1104838. We should probably just remove it. In the future we may support it if we go back to sending breakpoint requests to the thread actor.
What about the case when you click on a function declaration? Covers the "I want to break on this function so I'm going to click on it" use case.  Gaving the breakpoint move to the first line of the function seems useful and is chrome-parity ux.  OTOH I don't know if keeping any part of it around negates the maintenance benefit we'd get from removing it entirely.
I do sort of like the idea of doing sliding when it's easy, and only in a few simple scenarios.

Sliding is not that complicated on the frontend. The backend is really complicated, especially with sourcemaps. But if we only do it in certain places it still removes a ton of complexity.

Sliding isn't that hard when it's just "go to the next bytecode", it just gets hard when we try to detect which bytecodes map to which lines and be smart about where it slides to. We can keep "dumb" sliding around and *only* do it when it won't destroy the UX:

1. Don't do it for sourcemaps, ever
2. *Only* do it in top-level functions (or something like that, see next paragraph)

I was writing out #2 but then realized that might be harder than I thought. Chrome seems to disable sliding on more complex scenarios, but I can't tell what they are. For example, it slides here:

function foo() {
  function baz() {
    // This is a comment
    // and here
    return 5;
  }
}

But it does not slide here:

(function() {
  function baz() {
    // This is a comment
    // and here
    return 5;
  }
})()

Even if I wait for a while, when it should be GCed, it still slides. It seems like they are able to slide even though the function *should* be GCed?


(SIDE QUESTION) This made me think of something. Take the following code:

function foo() {
  function baz() {
    // This is a comment
    // and here
    var x = 5;
    console.log('x is ', x);
  }

  return baz;
}

When we set a breakpoint inside `baz`, do we make sure to set breakpoints on every *new* instance of `baz` after we set the breakpoint?
OMG I accidentally removed some text which made the above horribly confusing. Sorry, let me continue on from this point:

But it does not slide here:

(function() {
  function baz() {
    // This is a comment
    // and here
    return 5;
  }
})()

But theoretically, example #1 (both functions are named) should still not slide right? The inner function could be GCed. Or are named functions never GCed? Even if I wait for a while, when it should be GCed, it still slides. It seems like they are able to slide even though the function *should* be GCed?
(In reply to James Long (:jlongster) from comment #7)
> (SIDE QUESTION) This made me think of something. Take the following code:
> 
> function foo() {
>   function baz() {
>     // This is a comment
>     // and here
>     var x = 5;
>     console.log('x is ', x);
>   }
> 
>   return baz;
> }
> 
> When we set a breakpoint inside `baz`, do we make sure to set breakpoints on
> every *new* instance of `baz` after we set the breakpoint?

There is not a D.Script (or JSScript, internally) for every function. Each function object has a reference to the D.Script/JSScript.
Ok, Chrome's sliding is definitely GC-sensitive. Take the following code:

```
(function() {
  function baz() {
    // This is a comment
    // and here
    var x = 5;
    console.log('x is ', x);
  }

  setTimeout(baz, 1000);
})();

console.log('hello')
```

I can slide in `baz` for a while, but then it just stops sliding.

Note that when it can't slide in `baz` anymore, it does *not* slide all the way down to the `console.log` on the last time, which is what we do and is why it feels so bad I think.
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #9)

> There is not a D.Script (or JSScript, internally) for every function. Each
> function object has a reference to the D.Script/JSScript.

Ah right, thanks for that. Too much to keep in my head.
So I vote that we remove sliding for sourcemapped sources and look into only sliding within the current function. I feel like we talked about this before; is there a reason we can't just slide only within the current function, and if we reach the end of the function set a pending breakpoint?
It's also worth noting that if we generally remove breakpoint sliding (i.e. only do it when we know it's a top-level function that can't be GCed), we could probably change how we use the `ScriptStore` and not hold strong references to scripts anymore. This might reduce the memory usage on pages with devtools open.
Eddy if you don't mind I'm going to take this over. Just so happens we need to fix this to support huge files in the debugger anyway and I'd like to land these fixes before Christmas.
Assignee: ejpbruel → jlong
Attached patch 1230345.patch (obsolete) — Splinter Review
This is a better patch. I've disabled sliding all except for specific scenarios:

1. Not a column breakpoint
2. Not sourcemapped
3. This is global scope OR
4. This is inside a top-level named function. This
   means that this code can never be GCed

I think this will work pretty well, but I'm going to play with it some more.

This patch won't run for you, it depends on bug 1232816 and a few other Debugger API additions that were very simple but help analyze the scopes of functions.
Depends on: 1232816
Changing the title because I think I found a really solid solution that doesn't *completely* remove sliding.
Summary: BREAKPOINT SLIDING SHOULD DIE IN ONE THOUSAND FIRES OF LAVA AND DEATH AND HATE AND OTHER DARK THINGS → Be FAR more conservative about breakpoint sliding
Attached patch 1230345.patch (obsolete) — Splinter Review
This patch is confusing to read in splinter so you may just want to apply it and check it out.

I think I found a really good compromise. Sliding is useful for things like setting a breakpoint on the end curly-brace of a function, or setting it on the `function foo() {` line, neither of which have bytecodes. The sliding tells the user that it's an invalid position and it moves it. The latter is particularly helpful because it'll move to break on the first line of the function.

It's a good user experience, but BY FAR not worth it to even ATTEMPT for anything complicated like sourcemapped sources or column breakpoints. I say in those cases, we just bail and don't do it.

It's not *that* hard to do with normal line breakpoints. But now the problem is that it's GC-sensitive; some functions may be entirely removed from the engine so we have nothing to set a breakpoint on and the breakpoints runs far away from where the user is trying to set it. For example this code:

(function() {
  function bar() {
    var y = 20;
  }
  // Hello
  var x = 10;
})();

function baz() {
  var x = 5;
}

The first two scripts (the anon function and the inner one) can be GCed because there's absolutely no way for them to run again. If you ran this code and then opened the devtools, it'll show this code but if you try to set a breakpoint anywhere in the anon function it'll slide all the way down to inside `baz`!!

So can we ever reliably do sliding? I see only place that we can reliably do sliding: inside top-level named functions. This functions are set on the `window` scope and thus can never be GCed. I need to expose a few more properties on the Debugger API to get at this information, but it's actually pretty easy. This patch implements this. Everywhere else (not inside a top-level named function, sourcemapped scripts, column breakpoints, etc) never does sliding.

I need to work though what a "top-level named function" means a little more, like what does it mean in an eval script and in module scope. I'll do that more tomorrow, but this patch will remain roughly the same.
Attachment #8697392 - Attachment is obsolete: true
Attachment #8698710 - Attachment is obsolete: true
Attachment #8698816 - Flags: review?(nfitzgerald)
Attached patch 1230345.patch (obsolete) — Splinter Review
Minor formatting. Also, this is just requesting a general r? of the algorithm, I still need to fix up a bunch of tests so I'm not going to land this with a r+.
Attachment #8698816 - Attachment is obsolete: true
Attachment #8698816 - Flags: review?(nfitzgerald)
Attachment #8698817 - Flags: review?(nfitzgerald)
(In reply to James Long (:jlongster) from comment #17)

> I think I found a really good compromise. Sliding is useful for things like
> setting a breakpoint on the end curly-brace of a function, or setting it on
> the `function foo() {` line, neither of which have bytecodes.

A couple of notes here --

When I fix up the regressions from bug 1013219, the closing brace of a function
will have bytecode.  In particular the final "return" or "retrval" will have
this location.

Second, while I did make it so that any function prologue now has the line number
of the first actual line of the function, I think the line number of the 
"function foo()" can be had by looking at Debugger.Script.sourceStart.  So, maybe
this case could be detected without a generic sliding heuristic.  (Aside from
the GC issue.)
(In reply to Tom Tromey :tromey from comment #19)
> (In reply to James Long (:jlongster) from comment #17)
> 
> > I think I found a really good compromise. Sliding is useful for things like
> > setting a breakpoint on the end curly-brace of a function, or setting it on
> > the `function foo() {` line, neither of which have bytecodes.
> 
> A couple of notes here --
> 
> When I fix up the regressions from bug 1013219, the closing brace of a
> function
> will have bytecode.  In particular the final "return" or "retrval" will have
> this location.

Oh awesome, I noticed that oddity in my testing and I'm glad that will be fixed. That means that the `lineCount` will be more correct, yes? I noticed when I had a function like this:

function foo() {
  return 5;
  // just some useless
  // comments
}

That the `lineCount` would only be 2.

> 
> Second, while I did make it so that any function prologue now has the line
> number
> of the first actual line of the function, I think the line number of the 
> "function foo()" can be had by looking at Debugger.Script.sourceStart.  So,
> maybe
> this case could be detected without a generic sliding heuristic.  (Aside from
> the GC issue.)

What do you mean by "function prologue"? Does this effect D.Script.prototype.startLine?

We can't really use anything from the raw source (like line numbers) to figure out the case of "top-level named function" because there are all sorts of weird formatting that might mess that up.
(In reply to James Long (:jlongster) from comment #20)

> Oh awesome, I noticed that oddity in my testing and I'm glad that will be
> fixed. That means that the `lineCount` will be more correct, yes? 

I think so, because lineCount is computed by walking the source note table.
The ending line of the function will definitely be correct now;
that's the BytecodeEmitter.cpp change in the first patch.


> What do you mean by "function prologue"? Does this effect
> D.Script.prototype.startLine?

In some cases there is bytecode emitted that takes the location of some
token from "function xxx() {" (I don't recall which token exactly).
I also don't remember exactly what case this is, sorry.  Something to do
with a nested function.

I think the startLine property should still be correct, because that is
taken from the bytecode emitter, not the source notes.
Comment on attachment 8698817 [details] [diff] [review]
1230345.patch

Review of attachment 8698817 [details] [diff] [review]:
-----------------------------------------------------------------

Looks alright and seems much more maintainable compared to the old code, but we should get together with ejpbruel and decide whether we want to to kill sliding 100% or not. If we want to push forward with this, we should also go back through all the BP sliding bugs and re-test on those test cases and ensure that this has sane behavior in all of them.

::: devtools/server/actors/script.js
@@ +2738,5 @@
> +        // 2. Not a column breakpoint
> +        // 3. There are scripts (none might exist because of GC or
> +        //    edge cases at the bottom of scripts where no code exists)
> +        // 4. This is inside a top-level named function. This
> +        //    means that this code can never be GCed

It isn't strictly true that 4 means that the script can never be GC'd: any JS could overwrite the global binding and it would go away. But I think it is irrelevant either way...

To back up a bit though: this explains some requirements for when we will slide BPs, but not *why* we require those things, why they are important, or anything like that.

I suggest something like: "BP sliding goes horribly wrong in the presence of scripts that have already been GC'd before the debugger was enabled. We don't know that they ever existed and an easily slide through whole functions in the source text to the bewilderment of our users and our own chagrin. With source maps, breakpoint sliding is very difficult and has proven to be even more error prone; the maintenance burden greatly outweighed the value gained. Therefore, we only want to slide breakpoints when we are Very Sure we can do the Right Thing Very Simply. First, we never do breakpoint sliding on source mapped sources or column breakpoints (where would column BPs even slide to?) Now, only consider scripts that are within or are themselves a named function at the top level. This means that no scripts could have been collected before the debugger started up because parents keep their children alive (but the opposite is not true), so if we still have a live parent with a global binding, then we know all the children are still alive and we won't embarrassingly slide past some GC'd child script."

@@ +2748,3 @@
>  
> +        let actualLine = originalLine;
> +        while (true) {

Let's bound this loop by the top level named function's line count rather than repeatedly querying the script store (which results in a linear search of all scripts on the page for each iteration of the BP sliding loop).
Attachment #8698817 - Flags: review?(nfitzgerald) → feedback+
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #22)
> Comment on attachment 8698817 [details] [diff] [review]
> 1230345.patch
> 
> Review of attachment 8698817 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks alright and seems much more maintainable compared to the old code, but
> we should get together with ejpbruel and decide whether we want to to kill
> sliding 100% or not. If we want to push forward with this, we should also go
> back through all the BP sliding bugs and re-test on those test cases and
> ensure that this has sane behavior in all of them.

Yes I will update all the tests. Still working through how to actually do that because all the tests use eval sources, but I have an idea that might work.

Agreed that we need Eddy's consensus as well, if he feels strongly we should kill it I'll talk to him some more. I think it's a nice-to-have feature when we can do it simply, and keeps up with feature-parity of other debuggers.

Thanks for the other comments, I will update my patch.
So I'm considering filing a new bug for my only-slide-on-named-function idea above, and landing a patch that just removes sliding entirely. I don't want this to block on bug 1233287 and bug 1232816 as we figure out how to expose some more information in the Debugger API.

I can probably land this bug next week if I just remove it altogether.
Summary: Be FAR more conservative about breakpoint sliding → Breakpoint sliding should die in a thousand fires. A THOUSAND, I SAY.
(In reply to James Long (:jlongster) from comment #24)

SGTM
Blocks: 1233474
Attached patch 1230345.patch (obsolete) — Splinter Review
A christmas present for you: a new sliding algorithm! I thought about this a lot more and was able to reduce it to something much simpler and straight-forward. Forget about all the top-level named function stuff.

(I was going to remove sliding entirely first, but it turned out to be a pain to update all the tests *twice*)

I've been thinking about something you said: "a parent script keeps all of its children alive". Something about this seemed important. And I figured it out: that means we can easily know if we can reliably slide or not. A simple metric tells us that: are they are any scripts available on this line?

If there are no scripts available, we should not slide at all (unlike our previous algorithm). Take the following code:

1. (function() {
2.   console.log('hi');
3. })();
4.
5. function foo() {
6.   return 5;
7. }

Lines 1-3 will be GCed. So what scripts exist on line 2? None! There used to be a parent script that encapsulated the whole source, but *it got GCed as well*. I used to think the top-level script would be returned, but that's not true. *Only* lines 5-7 have a script that exists.

So if we get any scripts on the line we are trying to set a breakpoint on, we can figure out which is the "largest" script and slide breakpoints across it. We will never, ever do breakpoint sliding on a line of code that has been GCed.

And of course, still no sliding for column or sourcemapped sources.
Attachment #8698817 - Attachment is obsolete: true
Attachment #8700114 - Flags: review?(nfitzgerald)
Attachment #8700114 - Flags: review?(nfitzgerald) → feedback?(nfitzgerald)
(In reply to James Long (:jlongster) from comment #26)

I haven't looked at the approach yet, but this sounds very sane to me. Nice job!
Attached patch 1230345.patch (obsolete) — Splinter Review
Good! It seems useful to keep this around for simple cases.

I had an off-by-1 error in the last patch, fixed here.
Attachment #8700114 - Attachment is obsolete: true
Attachment #8700114 - Flags: feedback?(nfitzgerald)
Attachment #8700137 - Flags: feedback?(nfitzgerald)
Comment on attachment 8700114 [details] [diff] [review]
1230345.patch

Review of attachment 8700114 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM! \o/

Add the damn space between "if"/"for"/etc and the opening paren! We have eslint for this stuff! :P

::: devtools/server/actors/script.js
@@ +2744,5 @@
> +        // valid script, we can slide through it and know we won't
> +        // slide through any of its child scripts. Additionally, if a
> +        // script gets GCed, that means that all parents scripts are
> +        // GCed as well, and no scripts will exist on those lines
> +        // anymore. We will never slide through a GCed script.

Thanks for this comment, its great :)

@@ +2752,2 @@
>  
> +        const largestScript = scripts.reduce((largestScript, script) => {

Might be worth a comment like "Find the script that encompasses this line with the highest line count and use it to bound the sliding". Not obvious why we care about the largest script until we scroll down later. Maybe that is only a problem when reading the diff and the loop is pushed kind of far down from this largetScript :P

@@ +2771,5 @@
> +          // because we found scripts on the line we started from,
> +          // which means there must be valid entry points somewhere
> +          // within those scripts.
> +          assert(
> +            false,

Nit: assert(actualLine <= maxLine, ...) no need for the if statement.
Attachment #8700114 - Attachment is obsolete: false
Attachment #8700114 - Attachment is obsolete: true
Comment on attachment 8700137 [details] [diff] [review]
1230345.patch

Review of attachment 8700137 [details] [diff] [review]:
-----------------------------------------------------------------

See comments on the last patch revision. Also, don't forget to remove the debugging dump calls before official review and landing and all that.
Attachment #8700137 - Flags: feedback?(nfitzgerald) → feedback+
Attached patch 1230345.patchSplinter Review
Final patch! Implements the above algorithm: only slide when we have a script, and only slide within that script. Never slide if no script exist. (and never slide column or sourcemapped sources)

In the future, I think doing a similar thing for column breakpoints wouldn't be too hard. But I don't see a whole lot of payoff for that, so I think it's fine to not slide them. We only use column breakpoints for sourcemaps anyway.

I removed 3 tests regarding column breakpoints, as I didn't think they were relevant anymore, and the other tests covered enough edge cases. Most of the line breakpoints stayed the same, actually. And I had to tweak some of the sourcemap tests.

I also converted most of the breakpoint xpcshell tests to use `evalInSandbox` for consistency. Previously the latter half seemed to use it, and the earlier tests just used `eval`. The different is how the engine sees the script: if it's a top-level <script> source or a normal `eval` source. Doesn't really affect us, but we should stick to one thing, and treating it like a normal <script> tag seems nice.
Attachment #8700137 - Attachment is obsolete: true
Attachment #8704363 - Flags: review?(nfitzgerald)
Comment on attachment 8704363 [details] [diff] [review]
1230345.patch

Review of attachment 8704363 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! :)

::: devtools/server/actors/script.js
@@ +2760,3 @@
>  
> +        let actualLine = originalLine;
> +        for (actualLine = originalLine; actualLine <= maxLine; actualLine++) {

Nit: don't repeat `actualLine = originalLine;` in both the loop initialization and the variable declaration. Choose one or the other so that readers don't have to double take and figure out what they are missing.
Attachment #8704363 - Flags: review?(nfitzgerald) → review+
Huh. I don't know why those tests are permanently failing. It must be on top of a bad commit, I can't imagine why this would change those. But I've been wrong before about that.
I see all of those errors, even the perma-failures, on fx-team.
https://hg.mozilla.org/mozilla-central/rev/923889d990bd
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 46
Duplicate of this bug: 1232292
[bugday-20160323]

Status: RESOLVED,FIXED -> UNVERIFIED

Comments:
STR: Not clear.
Developer specific testing

Component: 
Name			Firefox
Version			46.0b9
Build ID		20160322075646
Update Channel	        beta
User Agent		Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
OS			Windows 7 SP1 x86_64

Expected Results: 
Developer specific testing

Actual Results: 
As expected
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.