Closed Bug 1552017 Opened 5 years ago Closed 5 years ago

Expand the kinds of URLs that can cancel content JS when navigating from the location bar

Categories

(Core :: DOM: Navigation, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox68 --- wontfix
firefox69 --- fixed

People

(Reporter: squib, Assigned: squib)

References

Details

(Whiteboard: [qa-69b-p2])

Attachments

(1 file, 1 obsolete file)

Attached patch Proof of concept (obsolete) — Splinter Review

There are a couple of minor issues with how RemoteWebNavigation handles the URL in loadURI when deciding whether to cancel content JS during navigation (see bug 1493225). In particular, it only applies to URLs beginning with http(s)://.

We should be able to expand this to include other URLs. There are two main cases I can think of here:

  1. about: URLs
  2. "URLs" with no protocol (you can end up with this if you type something into the location bar and then close the dropdown so it doesn't autocomplete the protocol)

Attached is a quick patch that handles (2), but (1) doesn't work yet. I'll investigate that a bit more to see if it's easy to fix, and then put up a finalized version for review.

This patch makes two changes to the kinds of URLs where we can cancel content
JS when navigating between them:

  1. When navigating directly to a URL (e.g. by typing something into the
    location bar and hitting Enter), we allow canceling content JS if the URLs
    differ in any way except their ref ("#").
  2. When navigating through history, we allow canceling content JS if the
    prePath part of the URLs differ. Most notably, this allows canceling
    content JS when one of the URLs is an about: page (e.g. when hitting the
    Home button).
Attachment #9065197 - Attachment is obsolete: true

Just a quick update on this: it turns out this patch breaks a lot of tests. I'm still figuring out why (and what we can do about it) though. I also noticed a couple small issues with my DocShell changes, which I've pushed to Phabricator.

Attachment #9066204 - Attachment description: Bug 1552017 - Expand the kinds of URLs that can cancel content JS when navigating; r?smaug → Bug 1552017 - Expand the kinds of URLs that can cancel content JS when navigating

Ok, I think I know what's going on here: many mochitests use window.open. This results in two tabs/windows sharing one JSContext, which causes problems for our "cancel content JS" code. If we navigate to a new URL in either the opener or the openee (and it seems this includes the initial "navigation" from window.open), we hit HangMonitorChild::InterruptCallback to cancel the content JS. However, our ability to cancel content JS when navigating is based on the idea that we're leaving the page anyway, so it doesn't matter if we trash stuff. With two tabs sharing one JSContext, we've lost that guarantee and can no longer safely cancel content JS.

Now I guess the question is: how can we determine that our JSContext is only used by one tab? Any thoughts here? Maybe we could set a flag when window.open is used so that we don't try to cancel content JS? Or maybe there's already something tracking the tabs (windows? documents?) using a JSContext?

Here's a quick summary of how the "cancel content JS" code currently works: when we navigate to a new page (e.g. by clicking back/forward or typing something into the location bar), we send a message to the ProcessHangMonitor to suggest canceling the current page's JS. This helps prevent issues where the page is doing something like while(true);, since that holds everything up and prevents the user from actually navigating away from the page (see bug 1493225).

In the ProcessHangMonitor, we request a JS interrupt where we check if we can actually cancel the JS[1], and if so, we return false from our InterruptCallback to cancel the JS. As described in comment 3, this poses problems when we have one JSContext for multiple tabs/documents (I think stopping JS from the InterruptCallback applies to the entire JSContext, right?).

[1] The check is essentially "will the page we're navigating to be a full document load?", though the details aren't super important right now.

:tcampbell, any thoughts on what to do here (see comment 3 and 4)? Feel free to forward this onto someone else if there's a better person to ask, of course. :)

Flags: needinfo?(tcampbell)

I haven't read the code or thought too hard about any of this, but maybe I can reply to one portion:

(In reply to Jim Porter (:squib) from comment #4)

In the ProcessHangMonitor, we request a JS interrupt where we check if we can actually cancel the JS[1], and if so, we return false from our InterruptCallback to cancel the JS. As described in comment 3, this poses problems when we have one JSContext for multiple tabs/documents (I think stopping JS from the InterruptCallback applies to the entire JSContext, right?).

That does not sound right.

JS is run-to-completion nearly everywhere. Multiple tabs normally share one JSContext -- one page will run some code, it will finish, then another page will run something (these are triggered by events of various sorts.) If you cancel the JS running for one page, then we'll return false from the interrupt callback which will unwind the stack (it's considered an "uncatchable exception"). Once the stack is empty, the toplevel caller will clear the exception state (presumably after doing something violent to that page so it doesn't keep trying to run more JS code) and we'll return to the event loop.

https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/script/ScriptSettings.cpp#535

So sharing a JSContext is expected, not a problem.

Said otherwise, stopping JS from the InterruptCallback applies to the "entire" JSContext, yes, but that phrasing implies a JSContext is some stateful thing. A JSContext is mostly an execution stack and an exception state. Stopping JS unwinds the entire stack, but that stack is normally only running code from one tab at a time.

Hm, then there must be something else causing the problem. This is what I'm seeing when running ./mach test browser/components/extensions/test/mochitest/test_ext_all_apis.html (presumably it's the same issue for other failing tests):

  1. The mochitest runner opens a page containing an iframe for test_ext_all_apis.html.
  2. test_ext_all_apis.html calls window.open('file_sample.html'); and then closes it shortly thereafter.
  3. That calls RemoteWebNavigation.loadURI('about:blank', ...), which calls maybeCancelContentJSExecution on the remote tab for the window.open window.
  4. Eventually, the message gets to HangMonitorChild::InterruptCallback, where we get the BrowserChild corresponding to the window.open window.
  5. We check if we could cancel JS for this window. We can, since we're leaving the page entirely (there's no actual JS on this page though)/
  6. We return false from our InterruptCallback.
  7. This results in the JS for test_ext_all_apis.html being canceled(!).

The last step is the issue, since we never left that page; we left the page from the window.open call. I'm not sure why this is happening, although the way you describe JSContexts, I think perhaps what I need is to be sure that, for my JSContext, the stack I'm in corresponds to the tab that called maybeCancelContentJSExecution in step (3). Does that make sense? If so, could I do something like this?

  • Get the current global for my JSContext (does the global change to match the tab currently running code in that context?)
  • Get the window from that global
  • Get the tab ID from that window and check that it matches the tab ID that triggered step (3) above; if so, we're running JS in the page that we're navigating away from, and we can terminate it
Flags: needinfo?(sphink)

(In reply to Jim Porter (:squib) from comment #7)

The last step is the issue, since we never left that page; we left the page from the window.open call. I'm not sure why this is happening, although the way you describe JSContexts, I think perhaps what I need is to be sure that, for my JSContext, the stack I'm in corresponds to the tab that called maybeCancelContentJSExecution in step (3). Does that make sense? If so, could I do something like this?

  • Get the current global for my JSContext (does the global change to match the tab currently running code in that context?)

Yes, the JSContext remembers which global it's running. I guess that's another important piece of the state: <stack,exception,realm>.

  • Get the window from that global
  • Get the tab ID from that window and check that it matches the tab ID that triggered step (3) above; if so, we're running JS in the page that we're navigating away from, and we can terminate it

Sounds reasonable. If I'm following correctly, you're cancelling something (by navigating away), and the JS cancellation gets queued up and may not be processed before the thing it refers to goes away. So it's a race between the page/global getting torn down and the queued cancellation? In which case, it seems like the two natural options are (1) for the teardown to erase the cancellation from the "queue", or (2) for the cancellation message to remember what it's for and do nothing if it no longer applies. As I understand it, you're doing (2). Whether the tab ID is the right piece of state to use, I'm not sure (I don't understand much of this setup.) But it does look like eg HangMonitorParent::CancelContentJSExecutionIfRunning is doing things based on tab ID, so perhaps it's fine. (My concern is if this is an iframe, could you still see the same tab ID even if the page being canceled is already gone? Would the iframe and page share a tab ID? Should this be based on global object instead? CancelContentJSOptions::Init() already remembers some bits of information, and could remember JS::CurrentGlobalOrNull(cx) as well. But maybe the tab ID is good enough. And I'm not sure what it means to pass a global through an IPC message.)

Flags: needinfo?(sphink)

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

Sounds reasonable. If I'm following correctly, you're cancelling something (by navigating away), and the JS cancellation gets queued up and may not be processed before the thing it refers to goes away. So it's a race between the page/global getting torn down and the queued cancellation? In which case, it seems like the two natural options are (1) for the teardown to erase the cancellation from the "queue", or (2) for the cancellation message to remember what it's for and do nothing if it no longer applies. As I understand it, you're doing (2).

Yeah, that's about right. I think (1) would be safer in some ways, but (2) fit in more readily with (my understanding of) the ProcessHangMonitor code that was already there.

(My concern is if this is an iframe, could you still see the same tab ID even if the page being canceled is already gone? Would the iframe and page share a tab ID? Should this be based on global object instead? CancelContentJSOptions::Init() already remembers some bits of information, and could remember JS::CurrentGlobalOrNull(cx) as well. But maybe the tab ID is good enough. And I'm not sure what it means to pass a global through an IPC message.)

Iframes should already be handled by BrowserChild::CanCancelContentJS. We only allow cancelling the JS if the root document is going away, so the tab ID should be sufficient. In the future, it'd be cool if we could handle iframes in a smarter way though.

Attachment #9066204 - Attachment description: Bug 1552017 - Expand the kinds of URLs that can cancel content JS when navigating → Bug 1552017 - Expand the kinds of URLs that can cancel content JS when navigating; r?smaug

(In reply to Jim Porter (:squib) from comment #10)

Down to just one set of suspicious-looking failures here: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249529447&repo=try&lineNumber=9314-9380

Fixed this in the latest push to Phabricator. I think everything is good now. Here's a try run for the latest version: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cde91faf84ae67643b1d3be459c4fd9d2834f660

Flags: needinfo?(tcampbell)
Pushed by jporter@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/04d9dba72f22
Expand the kinds of URLs that can cancel content JS when navigating; r=smaug
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Regressions: 1557243
Whiteboard: [qa-69b-p2]

https://bugzilla.mozilla.org/show_bug.cgi?id=1592470

Is this why when I use a bookmarklet to affect the page without navigating away from it(example url in bug link is, "avascript:void(document.title='reset');"). The canvas on the page stops working properly?

(In reply to Simon from comment #14)

https://bugzilla.mozilla.org/show_bug.cgi?id=1592470

Is this why when I use a bookmarklet to affect the page without navigating away from it(example url in bug link is, "javascript:void(document.title='reset');"). The canvas on the page stops working properly?

Could be... if you set dom.ipc.cancel_content_js_when_navigating to false in about:config, does it work as expected?

Flags: needinfo?(shoust)

It works as expected when that setting is false yes.

Flags: needinfo?(shoust)
Depends on: 1592470
No longer depends on: 1592470
Regressions: 1592470
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: