Closed Bug 1567210 Opened 1 year ago Closed 1 year ago

Closing devtools should resume the thread

Categories

(DevTools :: Debugger, defect, P1, blocker)

69 Branch
defect

Tracking

(firefox-esr60 unaffected, firefox-esr68 unaffected, firefox68 unaffected, firefox69 verified, firefox70 verified)

VERIFIED FIXED
Firefox 70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- verified
firefox70 --- verified

People

(Reporter: jlast, Assigned: ochameau)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(4 files, 2 obsolete files)

Has STR: --- → yes
Regressed by: 1494796

:jlast, if you think that's a regression, then could you try to find a regression range in using for example mozregression?

Yep, i added a regressed by

Priority: -- → P1

Alright, after investigating this a bit more, it looks like the issue originates from the destroy handler, which is a relief because that is not unsafeSyncrhronize territory.

Also, just to be safe, I double checked and the STR does not repro on June 5th, which was the date of the last change prior to the thread-client work.

Yulia, can you help unpack some of the onResume/doResume changes? It looks like there is now a subtle timing issue. Perhaps related to promises/async-await?

Flags: needinfo?(ystartsev)

The regression range is: https://hg.mozilla.org/mozilla-central/rev/2494fafdeae721029ce499c9367f522c76aaa9c1

The cause is that, before the front and actor would be destroyed together. With the change to getFront, the destruction of the actor happened well after the front.

I added a fix but there is a problem -- we might actually test this somewhere, but the test passes. I am not sure what is different in our test environment and when testing manually, but the test included in the first patch here is exactly the steps that i used to reproduce this. I have included a test to demonstrate this. It might take some investigation into why the tests behave differently. We might have some cleanup for convenience.

Flags: needinfo?(ystartsev)
Assignee: nobody → ystartsev

Exceptions can be something else than Error.
An example could be Cu.Exception(), but native code can probably throw
unexpected things. It is also possible to throw any kind of object in JS...

pre and post nest functions are rather something specific to the thread actor
or the event loop classes. The only reason it is today on the target actors
is that the list of targetted windows can be different based on the target.
But we can figure out, from the thread actor's debuggees, the list of these
windows that are inspected by the thread actor.
I think that it is better that was as it better reflect what actual final
globals, the thread actor is interacting with. The previous code could possibly
introduce differences between globals for which we pauses the timeouts and events
versus the globals being inspected by the thread actor.

Attachment #9079684 - Attachment is obsolete: true
Attachment #9079685 - Attachment is obsolete: true
Duplicate of this bug: 1566910

Use thread actor reference instead of target's actor URL and connection
to distinguish the currently paused event loop from the other ones which
may be on-pause from another thread actor on another connection/tab.
This happens when connecting more than one client against the same tab.

Assignee: ystartsev → poirot.alex
Severity: normal → blocker
Status: NEW → ASSIGNED
Has Regression Range: --- → yes
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: needs-review
Version: Trunk → 69 Branch

This appears ready to land. Are you going to request uplift?

Flags: needinfo?(poirot.alex)

Should we wait for the patch to bake in nightly for a few days before uplifting it right away?

Flags: needinfo?(poirot.alex)

BTW, I can't land the patches via lando because of this error message:

Landing Failed
There are landing blockers present which prevent landing.

Harald, it looks like I can't land because of the severity flag being set to blocker.
I don't know how to proceed, is this flag really relevant?

Flags: needinfo?(hkirschner)
Pushed by apoirot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b123b923bf61
Better handle non-javascript exception in thread actor. r=loganfsmyth
https://hg.mozilla.org/integration/autoland/rev/6d6ee45769d2
Ensure resuming page execution when the toolbox closes. r=loganfsmyth
https://hg.mozilla.org/integration/autoland/rev/759b3ded0024
Use Thread actor reference to distinguish the event loops. r=loganfsmyth

Weird. So the "landing blockers" weren't this bug, but something else.
First time I hit that. Does that mean, this bug is going to block the other pushes to autoland?
I didn't knew about this specialty of lando/autoland.

Flags: needinfo?(hkirschner)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 70

Hey Alexandre, since 69 is marked as affected, could you please nominate an uplift of this fix to beta69? Thanks!

Flags: needinfo?(poirot.alex)

Comment on attachment 9080318 [details]
Bug 1567210 - Ensure resuming page execution when the toolbox closes.

Beta/Release Uplift Approval Request

  • User impact if declined: If the web developer hit a breakpoint and close the DevTools without manually resuming the page execution,
    the web page will still be blocked on the breakpoint, even if the Devtools are now closed.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: * open a page where you can set a breakpoint (for example: data:text/html,<script>window.onclick=()=>{debugger;};setInterval(()=>{document.body.innerHTML+="x<br/>"}, 1000)</script>
  • open the debugger, set a breakpoint and trigger it (with the example, there is no need to set it, but just click in the page)
  • close the devtools

Expected result: the page should be resume and work correctly:

  • links should work
  • in the example, the setInterval loop should be resume and "x" be printed
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This modifies some critical breakpoint code, but at the same time this overall modification is quite limited.
  • String changes made/needed: no
Flags: needinfo?(poirot.alex)
Attachment #9080318 - Flags: approval-mozilla-beta?
Flags: qe-verify+
Blocks: 1571637
QA Whiteboard: [qa-triaged]

Verified as fixed on Firefox Nightly 70.0a1 on Windows 10 x 64, Mac OS X 10.14 and on Ubuntu 18.04 x64.

Comment on attachment 9080318 [details]
Bug 1567210 - Ensure resuming page execution when the toolbox closes.

Fix verified on nightly by QA, has tests and fixes a major regression in devtools in 69, uplift approved for 69 beta 12, thanks!

Attachment #9080318 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9080318 [details]
Bug 1567210 - Ensure resuming page execution when the toolbox closes.

Alex, your uplift request is only for one of the 3 patches, can you confirm that this is intended? Thanks

Flags: needinfo?(poirot.alex)
Attachment #9080318 - Flags: approval-mozilla-beta+
Attachment #9080318 - Flags: approval-mozilla-beta?

(In reply to Pascal Chevrel:pascalc (pto Jul 12-30) from comment #22)

Comment on attachment 9080318 [details]
Bug 1567210 - Ensure resuming page execution when the toolbox closes.

Alex, your uplift request is only for one of the 3 patches, can you confirm that this is intended? Thanks

Yes. The two other patches are independant cleanup. So I'm only expecting to uplift this one patch.

Flags: needinfo?(poirot.alex)
Flags: in-testsuite+

Comment on attachment 9080318 [details]
Bug 1567210 - Ensure resuming page execution when the toolbox closes.

Fixes a bug which can cause a site to be blocked on a debugger breakpoint even if devtools were closed. Approved for 69.0b14.

Attachment #9080318 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

pre and post nest functions are rather something specific to the thread actor
or the event loop classes. The only reason it is today on the target actors
is that the list of targetted windows can be different based on the target.
But we can figure out, from the thread actor's debuggees, the list of these
windows that are inspected by the thread actor.
I think that it is better that was as it better reflect what actual final
globals, the thread actor is interacting with. The previous code could possibly
introduce differences between globals for which we pauses the timeouts and events
versus the globals being inspected by the thread actor.

Comment on attachment 9084995 [details]
Bug 1567210 - Ensure resuming page execution when the toolbox closes.

Here is a new beta specific patch that should be green on beta:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56f3659c186c5dddc39950857a169a8f3b3f4ff

Flags: needinfo?(poirot.alex)

Verified as fixed using Firefox Beta 69.0b14 (20190814024055) on Windows 10, Mac OS 10.14.6 and Ubuntu 18.04.
Build from: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.