Closed Bug 1493225 Opened 6 years ago Closed 6 years ago

GeckoSession.goBack() can be unresponsive during a heavy page load.

Categories

(GeckoView :: General, enhancement, P1)

enhancement

Tracking

(Performance Impact:?, geckoview62 wontfix, firefox-esr60 wontfix, firefox63 wontfix, firefox64 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 fixed)

RESOLVED FIXED
mozilla68
Performance Impact ?
Tracking Status
geckoview62 --- wontfix
firefox-esr60 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: rbarker, Assigned: squib)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [geckoview:fxr:p1])

Attachments

(4 files, 16 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
This issue seems to be exacerbated on devices running the Snapdragon 821 chip set such as the Oculus Go and Google Pixel 1. This issue was specifically called out by Oculus during the Oculus App Store review of Firefox Reality. The issue gives the appearance that GeckoView is unable to go back until a page load has completed. Many pages at https://sketchfab.com/ download a great deal of data and often display this behavior. New devices that use the Snapdragon 835 chip set such as the Pixel 2 do not seem to display this issue to such a degree and are much more responsive. However, the 835 is a much higher performance chip set.
Whiteboard: [geckoview:fxr]
Randall, what is the priority of this bug?
Flags: needinfo?(rbarker)
It really only affects the Oculus Go (which happens to be are largest user base). There are also a lot of phones out there running the 82X (including the insanely popular Galaxy S7 I believe). This seems like it might be a good candidate for a mobile flow analysis?
Flags: needinfo?(rbarker)
Reminder to try with e10s on again and report findings. We may need to raise priority.
Flags: needinfo?(rbarker)
Priority: -- → P2
Whiteboard: [geckoview:fxr] → [geckoview:fxr:p1]
I captured profiles from Firefox Reality 1.0.1 (FxR) release running on an Oculus Go which uses the 821 Snapdragon chipset. All were captured while loading sketchfab.com and pressing the back button while the page was loading. These first two were captured with out e10s: https://perfht.ml/2zSR4Wp https://perfht.ml/2zS2vxE And these two were captured with e10s: https://perfht.ml/2zSkLH6 https://perfht.ml/2zTi6gj The back button press on the Oculus Browser loading the same page on the same hardware is almost instant while in FxR, the lag between pressing the back button and the browser actually going back can be noticeably long.
Flags: needinfo?(rbarker)
@ QF team: do you see any obvious problems in the profiles Randall captured in comment 4? On the Oculus Go, the main thread is so busy that it can't respond to the back button to exit the browser.
Whiteboard: [geckoview:fxr:p1] → [geckoview:fxr:p1][qf]
hmm, so, JS takes lots of time. But we could also improve the situation by forcing back/forward from UI to use higher priority tasks, at least when e10s is used (I don't know what kind of setup we have without e10s). And, we could check pending reloads/back/forward/etc in JS's interrupt handler, and if there is something pending, cancel the current JS. Or wait for Fission, which could just use another process.
Whiteboard: [geckoview:fxr:p1][qf] → [geckoview:fxr:p1][qf:js:investigate]
Steve, can you have the JS team do a quick triage on this JS investigate bug. Thanks!
Flags: needinfo?(sdetar)
(In reply to Olli Pettay [:smaug] from comment #6) > hmm, so, JS takes lots of time. But we could also improve the situation by > forcing back/forward from UI to use higher priority tasks, at least when > e10s is used (I don't know what kind of setup we have without e10s). > And, we could check pending reloads/back/forward/etc in JS's interrupt > handler, and if there is something pending, cancel the current JS. Randall says using a higher priority task for the back/forward events would require core Gecko changes. Would those changes be helpful (or at least harmless) on desktop?
Summary: GeckoSesion.goBack() can be unresponsive during a heavy page load. → GeckoSession.goBack() can be unresponsive during a heavy page load.
Hello Kannan, Can you take a look at this bug based on the js:investigate WB tag?
Flags: needinfo?(kvijayan)
Flags: needinfo?(sdetar)
(In reply to Olli Pettay [:smaug] (high review load) from comment #6) > hmm, so, JS takes lots of time. But we could also improve the situation by > forcing back/forward from UI to use higher priority tasks, at least when > e10s is used (I don't know what kind of setup we have without e10s). > And, we could check pending reloads/back/forward/etc in JS's interrupt > handler, and if there is something pending, cancel the current JS. > > Or wait for Fission, which could just use another process. Is it valid semantics to cancel the current running JS at all times? I'm not familiar enough with expected behaviour on the web. I took a look at the profile and the main issue just seems to be that a lot of JS is being run at startup. There are no easy fixes to this given the runtime of the JS - there are a handful of execution spans >12s here. Even getting that down by a significant percentage would leave a lot of lag in responding to the back button. This seems like a situation where a higher level fix is appropriate: either terminate the JS via interrupt when the back-button is clicked. I was wondering if it would be possible on 'back' to "abandon" the current page to a ghost (hidden) window that terminates gracefully, while the previous page is loaded into a new container instantly. I don't have a great feel for which is more feasible / doable. It seems like the interrupt-JS is the simplest fix. > Randall says using a higher priority task for the back/forward events would require core Gecko changes. Would those changes be helpful (or at least harmless) on desktop? This shouldn't be harmful on desktop, and if anything should improve responsiveness there as well (just to a smaller degree). Randall: if I understand correctly, changing the priority of the back/forward task should improve things, but for those long-running JS slices it will still delay the back until they finish execution, right?
Flags: needinfo?(rjesup)
Flags: needinfo?(kvijayan)
Flags: needinfo?(bugs)
(In reply to Kannan Vijayan [:djvj] from comment #10) > Is it valid semantics to cancel the current running JS at all times? I'm > not familiar enough with expected behaviour on the web. well, we can "kill" a web page at any time, the same way as interrupt-js handler already does. > I was wondering if it would be possible on 'back' to "abandon" the current > page to a ghost (hidden) window that terminates gracefully, while the > previous page is loaded into a new container instantly. how is this different to interrupting JS approach? I would expect interrupted page to not enter bfcache, so it would be destroyed asap a new page is loaded. > Randall: if I understand correctly, changing the priority of the > back/forward task should improve things, but for those long-running JS > slices it will still delay the back until they finish execution, right? No. If we'd do IPC layer high priority task for back/forward, we could have a flag telling to js engine that when its interrupt handler runs, it would just stop running.
Flags: needinfo?(bugs)
Randell != Randall I'm afraid... I think kannan meant to NI rbarker (Randall)
Flags: needinfo?(rjesup) → needinfo?(rbarker)
I believe if you do this, you'll have to dump the page from the bfcache, so on Forward it would be a reload (and potentially lost state, though we're not required to hold state there). It would also bypass onunload handlers, etc. We'd also need to cancel running JIT compiles/etc, of course. Since we would have interrupted things in mid-execution, the state of the page/DOM/etc and resources touched by it could be in inconsistent/"Impossible" states.
64=wontfix because FxR 1.1 is using GV 65 and this issue doesn't block Focus 8.0 from using GV 64.
Andreas, this bug will be a big problem for responsiveness. Is this problem reproducible on the Moto G5? rbarker says e10s doesn't help on FxR.
Flags: needinfo?(abovens)
Priority: P2 → P1
Pasting my email about this here. "It is not really high prio IPC task (in vsync sense), but checking whether we have pending "goBack". Similar to what billm did in https://bugzilla.mozilla.org/show_bug.cgi?id=1279086, but simpler and safer, I'd say, since we'd just cancel the current JS execution so that we could handle pending tasks. https://bug1279086.bmoattachments.org/attachment.cgi?id=8773987 modified PProcessHangMonitor.ipdl. Not sure who has time, but this shouldn't be too hard. Some new message similar to PaintWhileInterruptingJS. Perhaps "CancelContentJSExecutionIfRunning()" and then using high prio messages for goBack/Forward in PBrowser. Something like that. Would need to test a bit different setups. Like, do we need to ignore beforeunload and unload and pagehide event listeners. And bfcache should be disabled. "
Assignee: nobody → jporter+bmo
Status: NEW → ASSIGNED
This applies to desktop too, btw.
http://mozilla.pettay.fi/moztests/goback.html is a simple testcase. When trying to implement this, the tricky part is to figure out in which cases we can stop a page to run. We probably can't if session history is between iframes in the page, but if it is about top level pages, then it should be ok, I think. In other words, if session history has entries for iframes in the page and the page is busy executing JS, perhaps we should skip those session history entries and go back to the previous top level page.
:smaug, do you have any further details on where to start with this (or who else I should ask)? I think I have a pretty good idea of how this patch would need to be structured, but I'm not familiar with the relevant parts of the source that I'd be modifying.
Flags: needinfo?(bugs)
As I said in comment 17, I'd start looking bug 1279086. Then add the new message to PProcessHangMonitor.ipdl and handle the message on child process side in such way that long running scripts can be interrupted. You may want to take a look https://searchfox.org/mozilla-central/rev/e22c0d152060f4f8d4ca8904094f15f65a1b6f93/js/xpconnect/src/XPCJSContext.cpp#726-734 which deals with the slow script warning. I could think nsITabParent.idl to have a method for sending the interrupt message from parent side JS. Then once that all is working, make session history code in parent side to use it. Feel free to ping on IRC :)
Flags: needinfo?(bugs)
Flags: needinfo?(rbarker)
Attached patch WIP patch (obsolete) — Splinter Review
:smaug, does this look (roughly) like what we should be doing here? I haven't gotten the patch to actually stop the content JS yet, but I think I have the messages passed through all the components like in bug 1279086. I'm not sure if initiating this from <xul:browser> is quite right, but it's good enough for initial testing at least... Feel free to forward this on to someone else if you're too busy to take a look!
Attachment #9029888 - Flags: feedback?(bugs)
Comment on attachment 9029888 [details] [diff] [review] WIP patch >diff --git a/dom/interfaces/base/nsITabParent.idl b/dom/interfaces/base/nsITabParent.idl >--- a/dom/interfaces/base/nsITabParent.idl >+++ b/dom/interfaces/base/nsITabParent.idl >@@ -117,4 +117,6 @@ interface nsITabParent : nsISupports > * see nsISecureBrowserUI.contentBlockingLogJSON. > */ > Promise getContentBlockingLog(); >+ >+ void interruptScript(); I guess we'll need to pass TabId here so that we know which tab's script execution to interrupt possibly. >+++ b/dom/ipc/PProcessHangMonitor.ipdl >@@ -43,6 +43,8 @@ child: > async EndStartingDebugger(); > > async PaintWhileInterruptingJS(TabId tabId, bool forceRepaint, LayersObserverEpoch aEpoch); >+ >+ async CancelContentJSExecutionIfRunning(); so this would need to have TabId param too >+++ b/toolkit/content/widgets/browser.xml >@@ -55,8 +55,10 @@ > <body> > <![CDATA[ > var webNavigation = this.webNavigation; >- if (webNavigation.canGoBack) >+ if (webNavigation.canGoBack) { >+ this.frameLoader.tabParent.interruptScript(); > this._wrapURIChangeCall(() => webNavigation.goBack()); Initially I think we should interrupt script only when we're moving back to a different page, so not when doing fragment navigation or so
Attachment #9029888 - Flags: feedback?(bugs) → feedback+
Attached patch WIP patch v2 (obsolete) — Splinter Review
Attachment #9029888 - Attachment is obsolete: true
Attachment #9031281 - Flags: feedback?(bugs)
Comment on attachment 9031281 [details] [diff] [review] WIP patch v2 and then change canGo* to check if it is cross domain load and pass tabid to somewhere so that main thread can use it and check it in around https://searchfox.org/mozilla-central/rev/e22c0d152060f4f8d4ca8904094f15f65a1b6f93/js/xpconnect/src/XPCJSContext.cpp#718 Looks like TabChild can be get easily by adding a getter for https://searchfox.org/mozilla-central/source/dom/base/nsGlobalWindowInner.h#1362 and then compare to https://searchfox.org/mozilla-central/rev/49e78df13e7a505827a3a86daae9efdf827133c6/dom/ipc/TabChild.h#242 Something like that...
Attachment #9031281 - Flags: feedback?(bugs) → feedback+
Comment on attachment 9031281 [details] [diff] [review] WIP patch v2 :tcampbell, I'm not sure if you're the best person to ask about this, but do you know the best way to terminate (or pause) a tab's content JS context from within HangMonitorChild (at least, I think that's where I want to do this)? Here's the relevant part of the diff: >+mozilla::ipc::IPCResult HangMonitorChild::RecvCancelContentJSExecutionIfRunning( >+ const TabId &aTabId) { >+ MOZ_RELEASE_ASSERT(IsOnThread()); >+ >+ printf("HangMonitorChild::RecvCancelContentJSExecutionIfRunning(%lld)\n", >+ (long long)(int64_t(aTabId))); >+ // FIXME: Do something here! >+ return IPC_OK(); >+} If it helps, all I really need is to throttle or stop any content JS that's blocking page navigation from occurring. For the simple case (no iframes), I think killing the content JS entirely is fine, but I haven't been able to get it to work yet.
Attachment #9031281 - Flags: feedback?(tcampbell)
Comment on attachment 9031281 [details] [diff] [review] WIP patch v2 Review of attachment 9031281 [details] [diff] [review]: ----------------------------------------------------------------- Overall seems like a good plan. The slow-script dialog already has a mechanism to safely stop JS execution at next reasonable opportunity (my wild guess is < 1us). Start looking at JS_RequestInterruptCallback. ::: dom/ipc/ProcessHangMonitor.cpp @@ +455,5 @@ > + MOZ_RELEASE_ASSERT(IsOnThread()); > + > + printf("HangMonitorChild::RecvCancelContentJSExecutionIfRunning(%lld)\n", > + (long long)(int64_t(aTabId))); > + // FIXME: Do something here! Start by looking at |JS_RequestInterruptCallback|
Attachment #9031281 - Flags: feedback?(tcampbell) → feedback+
Hmm, I've been thinking that maybe the plumbing in my patch isn't actually needed. I think I have something that at least roughly works now (though it's not ready to post), and based on :smaug's suggestion in comment 25, it looks like all I really need is to set the tab ID somewhere that XPCJSContext::InterruptCallback can read. I can get the tab ID in TabParent (it looks like it's always the same as the ID in TabChild), so maybe I don't even need to get the hang monitor involved at all? Would going through the hang monitor make it easier to pass the tab ID to XPCJSContext::InterruptCallback?
Flags: needinfo?(bugs)
Answered on IRC. Thanks again, :smaug!
Flags: needinfo?(bugs)
Product: Firefox for Android → GeckoView
Attached patch Mostly-working patch v3 (obsolete) — Splinter Review
Ok, this patch seems to work for the simple case, though I had to do some gross stuff to tell xpconnect to stop the content JS. I'll work on cleaning that up and then try to be smarter about how/when to run this (e.g. be careful with iframes).
Attachment #9031281 - Attachment is obsolete: true
Attached patch Cleaned up patch v4 (obsolete) — Splinter Review
Attachment #9032778 - Attachment is obsolete: true
Attached patch Patch v5 (obsolete) — Splinter Review
:smaug, how does this patch look overall? This patch works for the simple case, though as I alluded to above, I haven't done anything to handle more complex cases like iframes. I'm sure this isn't in a landable state yet, so I'd be interested to see what you think I need to work on next to get this landed. Maybe I can talk to session history to handle things like iframes? It looks like I need to call `interruptScript()` in my patch pretty early on though, since the normal IPC messages stall when going from the TabParent to the TabChild. I'm not sure how I'd talk to session history at this point. I'll also probably need to make a better API for talking to XPCJSContext than just setting the static `gTabIdToCancel` value.
Attachment #9034300 - Attachment is obsolete: true
Attachment #9034501 - Flags: feedback?(bugs)
I think initially we should handle only top level page session history loads, since stopping scripts when dealing with iframes could easily break pages. So go*() functions should check the type of the load. One thing this needs is to disable bfcache on the document which is cancelled. XPCJSContext::InterruptCallback could just set a flag on top level Document (win->GetExtantDoc()->GetTopLevelContentDocument()) and then
Comment on attachment 9034501 [details] [diff] [review] Patch v5 Might be worth to also add a pref for this to StaticPrefList.h
Attachment #9034501 - Flags: feedback?(bugs) → feedback+

(In reply to Olli Pettay [:smaug] from comment #33)

I think initially we should handle only top level page session history
loads, since stopping scripts when dealing with iframes could easily break
pages. So go*() functions should check the type of the load.

Is there a good way to check the type of load? I was thinking of checking session history, but I can't do that from the chrome process due to bug 1233803 (or at least, I don't know how to do it). Otherwise, I'd add the check in RemoteWebNavigation.js

Maybe I could check in HangMonitorChild::RecvCancelContentJSExecutionIfRunning? I'd have to tell that function how the user navigated though (back, forward, going to a particular index in history)...

Flags: needinfo?(bugs)

Hmm, parent process should know the history, since back-forward button, when pressed long time, shows it.
Firefox UI folks would know that better. Perhaps jaws?

If parent side doesn't have enough information, then we'd need to basically send back/forward commands as the interrupt message using HangMonitor channel, but interrupt JS run only after checking that session history navigation is cross-page. And then dispatch a runnable to trigger back/forward asynchronously.

Flags: needinfo?(bugs)

:mconley, you worked on bug 1233803, which moved sessionHistory stuff from the chrome process. Do you know what the right way to get an item from session history would be here? I'm starting in RemoteWebNavigation.js, so something in the chrome process would be the easiest for me. All I really need to know is: "When navigating through history, is the next page going to be a top-level load?" Eventually, the messages in this patch end up going through the HangMonitor, so I could check in a child process via HangMonitorChild. However, that's a lot more plumbing than just doing it in the chrome process. :)

Flags: needinfo?(mconley)
Attachment #9034501 - Attachment is obsolete: true

Fission should make this easier by moving session history state back into the parent process here in bug 1438272.

The way the back and forward longpress/context menu currently does this is by asking SessionStore:

https://searchfox.org/mozilla-central/rev/5053031ba7621fa8f63f42de4c204ab3561e4e59/browser/base/content/browser.js#4278

This relies on the parent having an up-to-date snapshot of the tab's session history state - and those updates get batched and set up every 2 seconds or so when they change. So that's a thing to keep in mind.

So here's what I suggest:

  1. See how far along the Fission team is to moving this stuff into the parent process. I suspect it's still a ways out - but we can ask. If the timeline is reasonable, use that.

  2. Assuming you need to do this check in native code, we might have to expose an XPCOM hack to let C++ talk to SessionStore to get that information out, and do that until the Fission stuff is ready (and then switch over and remove the hack).

Hey nika, what's story with bug 1438272 - can we assume that's still a ways out?

Flags: needinfo?(mconley) → needinfo?(nika)

mconley, we're almost getting started with SessionHistory now, so no progress to report just yet.

Flags: needinfo?(nika)

:mconley, thanks for the info! I'll have to see whether it'd be better to do this in C++ or JS once I get this working on GeckoView as well (I'm guessing it'll be easier to do in JS though...)

:snorp, smaug suggested I ping you about how to hook up the patch in attachment 9035176 [details] [diff] [review] to GeckoView. I had thought I could call TabParent::interruptScript() here:

https://searchfox.org/mozilla-central/rev/c21d6620d384dfb13ede6054015da05a6353b899/mobile/android/modules/geckoview/GeckoViewNavigation.jsm#52

similar to what I do in toolkit/components/remotebrowserutils/RemoteWebNavigation.js. However, it doesn't seem to work. I'm still debugging stuff on my end, but is there anything in particular I should know about differences in GeckoView's architecture compared to desktop here?

Just as a quick summary of the patch, on desktop, I start by calling TabParent::interruptScript(), which calls ContentParent::CancelContentJSExecutionIfRunning(), which calls the ProcessHangMonitor, eventually setting a flag on XPCJSContext that terminates the content JS so the chrome JS for our back button can run.

Flags: needinfo?(snorp)

(In reply to Jim Porter (:squib) from comment #43)

:snorp, smaug suggested I ping you about how to hook up the patch in attachment 9035176 [details] [diff] [review] to GeckoView. I had thought I could call TabParent::interruptScript() here:

https://searchfox.org/mozilla-central/rev/c21d6620d384dfb13ede6054015da05a6353b899/mobile/android/modules/geckoview/GeckoViewNavigation.jsm#52

similar to what I do in toolkit/components/remotebrowserutils/RemoteWebNavigation.js. However, it doesn't seem to work. I'm still debugging stuff on my end, but is there anything in particular I should know about differences in GeckoView's architecture compared to desktop here?

GeckoView is just using a <browser>, which should already be using RemoteWebNavigation.js, so you should only have to add your interruptScript() stuff in the one place. But no, I don't think we're really doing anything special there.

Just as a quick summary of the patch, on desktop, I start by calling TabParent::interruptScript(), which calls ContentParent::CancelContentJSExecutionIfRunning(), which calls the ProcessHangMonitor, eventually setting a flag on XPCJSContext that terminates the content JS so the chrome JS for our back button can run.

That seems like it should work, I guess? Are things getting dropped along the way somewhere?

Flags: needinfo?(snorp)

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #44)

That seems like it should work, I guess? Are things getting dropped along the way somewhere?

I dug into this a bit more and it turns out there's no hang monitor in ContentParent with GeckoView (at least in GeckoView-example), so we can't actually pass the message along. I don't know if this is just a part of GeckoView that hasn't been implemented yet or if there's some other way we should do things in this case. Any ideas?

Flags: needinfo?(snorp)

I've been thinking over how to determine whether something will be a top-level load, and there are a few subtleties that I think we'd need to handle. The biggest one is that we need to be able to cancel content JS not only when you go back/forward, but when you navigate to an arbitrary point in the session history (e.g. go back 3 pages). It's unlikely, but certainly possible that you'd have a session history like this:

Page A, iframe A1
Page B
Page A, iframe A2
Page A, iframe A3 (current)

I'm not 100% sure how the actual load would be handled, but I'd expect "go back 3 pages" from the current to trigger a top-level load, even though the top-level pages are the same. That means we'd have to check every session history item between the start and the end to ensure that the top-level page never changes. This seems like something complex enough that we'd want to have a special SessionHistory function that asks "would navigating to the following item in history trigger a top-level load"? Then we could just expose that function to chrome somehow, rather than doing a workaround while we wait for bug 1438272. It's even possible such a function already exists; I'm not super-familiar with the session history code.

Does the above sound reasonable, :mconley? Is there anything else I should factor in that I've missed?

Flags: needinfo?(mconley)

(In reply to Jim Porter (:squib) from comment #45)

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #44)

That seems like it should work, I guess? Are things getting dropped along the way somewhere?

I dug into this a bit more and it turns out there's no hang monitor in ContentParent with GeckoView (at least in GeckoView-example), so we can't actually pass the message along. I don't know if this is just a part of GeckoView that hasn't been implemented yet or if there's some other way we should do things in this case. Any ideas?

That looks like an oversight. I filed bug 1521087 and put up a patch. We still don't get the slow script warning (bug 1521094), but the right events should be emitted now.

Flags: needinfo?(snorp)

(In reply to Jim Porter (:squib) from comment #46)

Does the above sound reasonable, :mconley? Is there anything else I should factor in that I've missed?

Tentatively reasonable... so, one thing I'll note is that nsISHEntry, which is what eventually gets serialized via SessionHistory.jsm in the content process and sent up to the parent, has a isSubFrame property this is sent with it.

So I guess the "is top-level" check could look at the starting index, and work its way back to the target index, and if it ever sees an entry where isSubFrame is false, then bam, I guess we know we need a top-level load.

Hey qDot, I hate to rope you into this, but I know you've been down in the guts of DocShell lately... do you have any additional thoughts on this, or things we might be missing?

Flags: needinfo?(mconley) → needinfo?(kyle)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #48)

Tentatively reasonable... so, one thing I'll note is that nsISHEntry, which is what eventually gets serialized via SessionHistory.jsm in the content process and sent up to the parent, has a isSubFrame property this is sent with it.

So I guess the "is top-level" check could look at the starting index, and work its way back to the target index, and if it ever sees an entry where isSubFrame is false, then bam, I guess we know we need a top-level load.

:qDot has roped me in to help out with this ^_^

Hmm, so the decision as to whether or not to do a "short-circuited load" is done here: https://searchfox.org/mozilla-central/rev/39265dd58298c40bed029617ad408bf806cce761/docshell/base/nsDocShell.cpp#9089-9106

This depends on knowing which nsISEntryShared is being used by each of these entries. To my knowledge this isn't stored in the SessionStore's understanding of session history. We'd also need to know whether the entry is due to a frame load, but that's already available in the parent.

Flags: needinfo?(kyle)

If we send down the InnerWindowID for the nsISHEntryShared for each entry and store it in SessionStore, we may be able to determine whether a short-circuited load is occurring? We'd need to add that information to each entry when serializing it here: https://searchfox.org/mozilla-central/rev/39265dd58298c40bed029617ad408bf806cce761/toolkit/modules/sessionstore/SessionHistory.jsm#142

Here are my thoughts about how to do this:

First, should we ask the DocShell directly whether a load is short-circuited? Or would it be ok/better to ask session history? (Maybe the above check for short-circuited loads should just be moved into the session history code so the logic is shared?)

Second, I'm worried about using the SessionStore's representation of history (i.e. in the parent process), since per comment 40, this is synced every 2 seconds or so. In the case where a user clicks on a link and then immediately tries to go back, we'd be working with an outdated version of the session history, which could lead to us doing the wrong thing here.

Finally, the big question I still have is where to do the check to determine if the load is short-circuited. With bug 1438272, I think it would make sense to put this in the parent process (e.g. in RemoteWebNavigation.js). However, in the short term, I think we'd need to do the check in the content process (see my second point above). One way to do this is to make a new interface for the parent process that just asks the content process whether a load would be short-circuited; this would be async for now, but perhaps could become synchronous when bug 1438272 lands. Another way would be to pass the navigation we're doing (e.g. "go back 3 pages") down to the content process and do the "is short-circuited" check down there. The latter means that we'd have to make bigger changes to this code when bug 1438272 lands, but it might also be easier to do now. Since we'd like to eventually enable this patch for all loads, including short-circuited ones, maybe we don't need to think too far into the future with this; the check might go away entirely in the medium term.

What do people think about these options?

(I'm sort of leaning towards making a special session history function to check if a load should be short-circuited, and then exposing an API for that in both the content process (for the DocShell) and the parent process (for this patch), and then doing my check in RemoteWebNavigation.js. However, I'm not entirely sure this is the right way to do things.)

Flags: needinfo?(rjesup)
Flags: needinfo?(nika)

A quick note on how this happens at the moment in the content process on a new load.

I'm currently refactoring short circuited loads out of nsDocShell::InternalLoad as part of Bug 1471711, specifically https://phabricator.services.mozilla.com/D17018. To get to that part of the check in the new short circuiting function, we currently need:

  • Current URI
  • URI we're going to
  • Load Type
  • A SHEntry to check against
  • Whether or not this load has a post data stream attached

And that's about it. All of these are currently attached to a DocShellLoadState object. We could probably shuffle the "are we even doing a short circuited load" logic somewhere else if we need to access it outside of actually running the history load.

(In reply to Jim Porter (:squib) from comment #51)

Second, I'm worried about using the SessionStore's representation of history (i.e. in the parent process), since per comment 40, this is synced every 2 seconds or so. In the case where a user clicks on a link and then immediately tries to go back, we'd be working with an outdated version of the session history, which could lead to us doing the wrong thing here.

  1. I agree with that worry.
  2. Right now, GeckoView doesn't even have something comparable to the Desktop session store.

(In reply to Jan Henning [:JanH] from comment #53)

(In reply to Jim Porter (:squib) from comment #51)

Second, I'm worried about using the SessionStore's representation of history (i.e. in the parent process), since per comment 40, this is synced every 2 seconds or so. In the case where a user clicks on a link and then immediately tries to go back, we'd be working with an outdated version of the session history, which could lead to us doing the wrong thing here.

  1. I agree with that worry.
  2. Right now, GeckoView doesn't even have something comparable to the Desktop session store.

In that case, we're in a bit of an awkward situation. The parent process currently doesn't have any reflection of session history which is super reliable (although we're working on it in bug 1467221). Without that information, it isn't really possible for us to correctly invoke the check to see if we have a same-document load from OOP.

We'd have to bounce into content, interrupt script execution, and do a check there to see what the behaviour would be.

Flags: needinfo?(nika)

(In reply to :Nika Layzell from comment #54)

We'd have to bounce into content, interrupt script execution, and do a check there to see what the behaviour would be.

Hmm, that does seem bad, given that we'd essentially be interrupting script execution to see if we should, well, interrupt script execution.

Now, maybe there's a way to simply pause the content script so that our chrome JS can run and then resume the content script where it left off, but then I don't think we'd need to check for short-circuited loads anyway: we'd just pause the content script to let the chrome JS do its navigation stuff and then, if the content JS is still around (as in a short-circuited load), resume it. I'm not sure if there's a way to do this though...

Kannan: I just wanted to check... is there a good way to pause the content JS and resume it later (so that our chrome JS can do its thing)? That would probably solve our issue here a lot more neatly than cancelling the content JS entirely...

Flags: needinfo?(kvijayan)

Why can't we send "back" (or forward) message to child process via HangMonitor channel. If we detect that we can stop processing JS, stop. Any in any case, dispatch a runnable to the main thread to actually do 'back' action.

Interrupting JS is not difficult, but interrupting JS to run other JS in the same runtime is not really feasible with the current setup - as far as I know.

Can setting an interrupt and waiting until the next event loop fire (at which point executing more JS should be fine) not work?

Flags: needinfo?(kvijayan)
Flags: needinfo?(jporter+bmo)

(In reply to Kannan Vijayan [:djvj] from comment #58)

Interrupting JS is not difficult, but interrupting JS to run other JS in the same runtime is not really feasible with the current setup - as far as I know.

Can setting an interrupt and waiting until the next event loop fire (at which point executing more JS should be fine) not work?

I'm not totally clear on the differences between your first paragraph and your second. What you've said in the first paragraph is how I understood things to be; is there a different way to set an interrupt that would help here?

Currently, I'm looking into doing something more like what :smaug suggested in comment 57, though I'm not sure how much I can rely on the HangMonitor being around. From bug bug 1521087, I can see that the HangMonitor can be disabled with a pref, and so at minimum, we'd want to use a non-HangMonitor way of passing the goBack message when it's preffed off. I'm not sure we want even more code paths for handling these messages though...

Flags: needinfo?(jporter+bmo)
Flags: needinfo?(abovens)

Quick update: I have a proof of concept that checks session history via the HangMonitor and only cancels content JS on non-short-circuited loads. It still needs some cleanup (in particular, I'm only handling the back button now, not arbitrary history navigation).

What remains after this:

  • Figure out how aggressive this patch should be; we don't want to trigger this except when necessary, since we're removing the current page from the BF cache when we cancel its JS.
  • Tests
  • Reviews!
Attachment #9035176 - Attachment is obsolete: true
Attachment #9045538 - Flags: feedback?(bugs)

In this part, I pass along the navigation type (and index for when using the dropdown on the back/forward buttons). This allows us to check if there's a top-level load "between" the start and end pages. The patch might look a bit strange, since we're passing the navigation operation to two places from RemoteWebNavigation.js (the normal message passing that existed before this patch and the HangMonitor channel in this patch). This is primarily to make it easier to stop passing the navigation info along the HangMonitor channel once session history is uplifted into the parent process. At that point, the check for whether there's a top-level load could happen in TabParent (I think).

Attachment #9045539 - Flags: feedback?(bugs)
Comment on attachment 9045539 [details] [diff] [review] Part 2: Pass navigation operation along HangMonitor channel to selectively cancel content JS This is the same as the part 1
Attachment #9045539 - Flags: feedback?(bugs)

Uploaded the correct patch this time!

Attachment #9045539 - Attachment is obsolete: true
Attachment #9046550 - Flags: feedback?(bugs)
Comment on attachment 9045538 [details] [diff] [review] Part 1: Unconditionally cancel content JS Unconditionally interrupting JS is fine, but conceptually making that also disable bfcaching is weird. So, if bfcaching needs to be bound to js interruption, then the ipdl message and methods should be called something a bit different.
Attachment #9045538 - Flags: feedback?(bugs)
Comment on attachment 9046550 [details] [diff] [review] Part 2: Pass navigation operation along HangMonitor channel to selectively cancel content JS (fixed) ah, this way. Yes, and if the first param for maybeCancelContentJSExecution would be some good enum value, then this would be reasonable easy to understand. And could be used later for other things. But, HangMonitorChild::RecvCancelContentJSExecutionIfRunning doesn't run on main thread, so it can't touch session history. Need to store the navigation type and index somewhere, and do CanCancelContentJS check in the interrupt callback.
Attachment #9046550 - Flags: feedback?(bugs)

I'd assume the part 2 asserted on debug builds about wrong thread.

Comment on attachment 9046550 [details] [diff] [review] Part 2: Pass navigation operation along HangMonitor channel to selectively cancel content JS (fixed) >+ nsCOMPtr<nsISHistory> history = do_GetInterface(WebNavigation()); >+ if (history) { >+ int32_t current; >+ history->GetIndex(&current); >+ if (aNavigationType == 0) { >+ aNavigationIndex = current - 1; >+ } else if (aNavigationType == 1) { >+ aNavigationIndex = current + 1; >+ } >+ >+ int32_t delta = aNavigationIndex > current ? 1 : -1; >+ int32_t start = delta == 1 ? current + 1 : current; >+ int32_t end = delta == 1 ? aNavigationIndex + 1 : aNavigationIndex; >+ for (int32_t i = start; i != end; i += delta) { >+ nsCOMPtr<nsISHEntry> entry; >+ history->GetEntryAtIndex(i, getter_AddRefs(entry)); >+ if (!entry) { >+ // Something went wrong... XXX do something here? >+ return false; >+ } >+ >+ nsCOMPtr<nsIURI> uri = entry->GetURI(); >+ printf(" checking %s\n", uri->GetSpecOrDefault().get()); >+ if (!entry->GetIsSubFrame()) { >+ return true; >+ } >+ } >+ } >+ return false; >+} Another thing, how does this handle fragment navigations, pushState and such. Should this just check if the very top level shentry is about to change the domain.

This is my attempt to convert the navigationType to an enum. It doesn't work though, since I don't know how to get the enum definition to be imported into PProcessHangMonitor.ipdl. Is there something obvious I'm missing? I looked through the m-c source and didn't find any examples I could copy from...

Attachment #9048350 - Flags: feedback?(bugs)

This is the same as the previous version of part 1, but without debug printfs

Attachment #9045538 - Attachment is obsolete: true

As above, but without the failed attempt to merge. :(

Attachment #9048392 - Attachment is obsolete: true

This moves the call to CanCancelContentJS to the interrupt callback. It seems a little strange to request another interrupt inside the interrupt callback though; should I move all the change from XPCJSContext::InterruptCallback to HangMonitorChild::InterruptCallback? (I think we'd end up moving it back when session history is in the main process though.)

Attachment #9046550 - Attachment is obsolete: true
Attachment #9048396 - Flags: feedback?(bugs)
Attachment #9048396 - Attachment description: go-back-2.patch → Part 2: Pass navigation operation along HangMonitor channel to selectively cancel content JS (v2)
Comment on attachment 9048350 [details] [diff] [review] Part 2.5: Non-working patch to use enums looks fine, but I don't know why that ipdl part doesn't work.
Attachment #9048350 - Flags: feedback?(bugs) → feedback+
Comment on attachment 9048396 [details] [diff] [review] Part 2: Pass navigation operation along HangMonitor channel to selectively cancel content JS (v2) >+ >+ if (cancelContentJS) { >+ RefPtr<TabChild> tabChild = TabChild::FindTabChild(cancelContentJSTab); >+ if (tabChild) { >+ js::AutoAssertNoContentJS nojs(mContext); >+ if (tabChild->CanCancelContentJS(cancelContentJSNavigationType, >+ cancelContentJSNavigationIndex)) { >+ // Tell xpconnect that we want to cancel the content JS in this tab >+ // during the next interrupt callback. >+ // XXX: There's probably a better API for this... >+ XPCJSContext::gTabIdToCancel = cancelContentJSTab; Perhaps we could store the TabIdToCancel in TabChild as a static variable and calling CanCancelContentJS could set it >+bool TabChild::CanCancelContentJS(int32_t aNavigationType, >+ int32_t aNavigationIndex) { >+ if (aNavigationType == 3) { >+ return true; >+ } >+ >+ nsCOMPtr<nsISHistory> history = do_GetInterface(WebNavigation()); >+ if (history) { >+ int32_t current; >+ history->GetIndex(&current); >+ if (aNavigationType == 0) { >+ aNavigationIndex = current - 1; >+ } else if (aNavigationType == 1) { >+ aNavigationIndex = current + 1; >+ } >+ >+ int32_t delta = aNavigationIndex > current ? 1 : -1; >+ int32_t start = delta == 1 ? current + 1 : current; >+ int32_t end = delta == 1 ? aNavigationIndex + 1 : aNavigationIndex; >+ for (int32_t i = start; i != end; i += delta) { >+ nsCOMPtr<nsISHEntry> entry; >+ history->GetEntryAtIndex(i, getter_AddRefs(entry)); >+ if (!entry) { >+ // Something went wrong... XXX do something here? >+ return false; >+ } >+ >+ nsCOMPtr<nsIURI> uri = entry->GetURI(); >+ if (!entry->GetIsSubFrame()) { >+ return true; And need to still check the URI here, right? Comparing it to the current one and check that it is not same origin. (or we could tweak it (later?) tad more to allow same origin, but certainly separate page)
Attachment #9048396 - Flags: feedback?(bugs) → feedback+

https://searchfox.org/mozilla-central/rev/3e0f1d95fcf8832413457e3bec802113bdd1f8e8/docshell/base/nsDocShell.cpp#8829
can be useful to check the shentry navigation between documents.
(in-documents navigations can happen with fragment navigation or push/replaceState)

Comment on attachment 9048350 [details] [diff] [review] Part 2.5: Non-working patch to use enums froydnj: smaug suggested I ask you about this. Could you take a look at this patch and see if I'm doing something wrong? I tried to define an enum in an IDL file and then import it to use in an IPDL file. However, I get the following error when building: https://dpaste.de/q85U
Attachment #9048350 - Flags: feedback?(nfroyd)

smaug: One more (hopefully quick!) question: do you have any ideas on what I should do here for automated tests? I was hoping to base tests here on ones from bug 1279086, but that bug doesn't appear to have any tests! Are automated tests not needed for something like this? (It sure would make my job easier if they aren't, although I tend to prefer having tests so we don't regress things accidentally.)

Flags: needinfo?(bugs)
Comment on attachment 9048350 [details] [diff] [review] Part 2.5: Non-working patch to use enums Review of attachment 9048350 [details] [diff] [review]: ----------------------------------------------------------------- You need to define how to serialize nsITabParent::NavigationType via a ParamTraits specialization. See examples from: https://searchfox.org/mozilla-central/search?q=ContiguousEnumSerializer&case=false&regexp=false&path= and the documentation for ContiguousEnumSerializer and ContiguousEnumSerializerInclusive in IPCMessageUtils.h. The only wrinkle with that approach is that the IPDL code assumes that the header the enum comes from defines the ParamTraits specialization, so you'll need a bunch of extra %{C++ blocks or similar in nsITabParent.idl.
Attachment #9048350 - Flags: feedback?(nfroyd)

Automated test could be a browser test perhaps. Since this behavior should be behind a pref,
perhaps test first pref off, going back, and then forward and check that when going forward, pageshow.persisted should be true (to tell that bfcache wasn't disabled) and then enable the pref and do the same again and pageshow.persisted should be false.

Flags: needinfo?(bugs)

This fixes things so we actually use an enum. Thanks for the pointer, :froydnj! It required a lot of changes to the moz.build files, which worries me a bit, but maybe it's not really an issue in practice. I'm going to take a look at history pushState stuff and on logging how often we block the bfcache with this patch (if it's too often, we'll probably have to be less aggressive with cancelling content JS). At that point, this should be pretty much ready for review.

Attachment #9048350 - Attachment is obsolete: true
Attachment #9048396 - Attachment is obsolete: true

This is the same patch as before, but I exported it with the commit message. I'll try to figure out how to submit this stuff through Phabricator (I think that's what it's called...) tomorrow. :)

Attachment #9048393 - Attachment is obsolete: true
Attachment #9049855 - Flags: review?(bugs)

This patch only cancels content JS when the host+port changes between SH entries. If you think it makes sense, I could merge this and part 1 before landing.

Attachment #9049358 - Attachment is obsolete: true
Attachment #9049856 - Flags: review?(bugs)

This patch passes a message through the HangMonitor channel when navigating
through history to cancel content JS that could hang the chrome JS in the
content process responsible for history navigation. If the content JS is
actually canceled, this also disables the BF cache for the current page, since
it could end up in an inconsistent state due to the JS cancellation.

In this part, we pass along the navigation type (and index for when using the
dropdown on the back/forward buttons). This allows us to check if there's a
top-level load "between" the start and end pages.

The patch might look a bit strange, since we're passing the navigation
operation to two places from RemoteWebNavigation.js (the normal message passing
that existed before this patch and the HangMonitor channel in this patch). This
is primarily to make it easier to stop passing the navigation info along the
HangMonitor channel once session history is uplifted into the parent process.
At that point, the check for whether there's a top-level load could happen in
TabParent (I think).

Depends on D23089

Attachment #9049855 - Attachment is obsolete: true
Attachment #9049855 - Flags: review?(bugs)
Attachment #9049856 - Attachment is obsolete: true
Attachment #9049856 - Flags: review?(bugs)

Patches are rebased onto latest m-c (52b03bc34899) and submitted through Phabricator.

I updated both patches, though I'm not sure what the right way to fix your comments for part 1 is. I also haven't fixed the issue with the user manually typing in a URL fragment in the location bar for part 2 yet; that's coming up next.

Flags: needinfo?(bugs)

This patch adds an ID to ensure that we avoid canceling content JS if the next
page already started loading by the time we're ready to try canceling the JS.

Depends on D23090

Some test here would be good.

Flags: needinfo?(bugs)

Add tests to verify the behavior of the previous patches.

Depends on D25164

Attachment #9050206 - Attachment description: Bug 1493225, part 1 - Cancel content JS when navigating through history to prevent hangs; r?smaug → Bug 1493225, part 1 - Cancel content JS when navigating through history to prevent hangs
Attachment #9050208 - Attachment description: Bug 1493225, part 2 - Cancel content JS when navigating through history to prevent hangs; r?smaug → Bug 1493225, part 2 - Cancel content JS when navigating through history to prevent hangs
Attachment #9054083 - Attachment description: Bug 1493225, part 3 - Cancel content JS when navigating through history to prevent hangs; r?smaug → Bug 1493225, part 3 - Cancel content JS when navigating through history to prevent hangs
Attachment #9057989 - Attachment description: Bug 1493225, part 4 - Cancel content JS when navigating through history to prevent hangs; r?smaug → Bug 1493225, part 4 - Cancel content JS when navigating through history to prevent hangs
Pushed by jporter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e9c116dba541 part 1 - Cancel content JS when navigating through history to prevent hangs r=smaug https://hg.mozilla.org/integration/autoland/rev/1de539e8f568 part 2 - Cancel content JS when navigating through history to prevent hangs r=smaug https://hg.mozilla.org/integration/autoland/rev/3537b8f8d38a part 3 - Cancel content JS when navigating through history to prevent hangs r=smaug https://hg.mozilla.org/integration/autoland/rev/62ee0aed9e59 part 4 - Cancel content JS when navigating through history to prevent hangs r=smaug

Backed out for browser-chrome failures

Backout: https://hg.mozilla.org/integration/autoland/rev/06cc9cecc881b931bf5275f1b7b25adb1af9c56d

Failure push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=62ee0aed9e59a3b157b04a19177c4039d1d494e6

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=241490893&repo=autoland&lineNumber=6855

20:37:04 INFO - TEST-OK | toolkit/components/normandy/test/browser/browser_actions_ShowHeartbeatAction.js | took 167ms
20:37:04 INFO - checking window state
20:37:06 INFO - GECKO(1880) | Completed ShutdownLeaks collections in process 10264
20:37:06 INFO - GECKO(1880) | Completed ShutdownLeaks collections in process 3644
20:37:06 INFO - GECKO(1880) | Completed ShutdownLeaks collections in process 1328
20:37:06 INFO - GECKO(1880) | Completed ShutdownLeaks collections in process 7552
20:37:06 INFO - GECKO(1880) | Completed ShutdownLeaks collections in process 7412
20:37:07 INFO - GECKO(1880) | Completed ShutdownLeaks collections in process 11228
20:37:07 INFO - TEST-START | Shutdown
20:37:07 INFO - Browser Chrome Test Summary
20:37:07 INFO - Passed: 1200
20:37:07 INFO - Failed: 0
20:37:07 INFO - Todo: 1
20:37:07 INFO - Mode: e10s
20:37:07 INFO - *** End BrowserChrome Test Results ***
20:40:13 INFO - GECKO(1880) | Exiting due to channel error.
20:40:13 INFO - GECKO(1880) | Exiting due to channel error.
20:40:14 INFO - TEST-INFO | Main app process: exit 80000003
20:40:14 INFO - Buffered messages finished
20:40:14 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 2147483651
20:40:14 INFO - runtests.py | Application ran for: 0:03:37.510000
20:40:14 INFO - zombiecheck | Reading PID log: c:\users\task_1555700576\appdata\local\temp\tmpuvfkpmpidlog
20:40:14 INFO - ==> process 11228 launched child process 4936 ("Z:\task_1555700576\build\application\firefox\firefox.exe" -contentproc --channel="11228.0.1276749770\1876179686" -parentBuildID 20190419163916 -greomni "Z:\task_1555700576\build\application\firefox\omni.ja" -appomni "Z:\task_1555700576\build\application\firefox\browser\omni.ja" -appdir "Z:\task_1555700576\build\application\firefox\browser" - 11228 gpu)
20:40:14 INFO - ==> process 11228 launched child process 3644 ("Z:\task_1555700576\build\application\firefox\firefox.exe" -contentproc --channel="11228.6.1984891992\433242137" -childID 1 -isForBrowser -prefsHandle 2808 -prefMapHandle 2804 -prefsLen 1 -prefMapSize 197180 -parentBuildID 20190419163916 -greomni "Z:\task_1555700576\build\application\firefox\omni.ja" -appomni "Z:\task_1555700576\build\application\firefox\browser\omni.ja" -appdir "Z:\task_1555700576\build\application\firefox\browser" - 11228 tab)
20:40:14 INFO - ==> process 11228 launched child process 7552 ("Z:\task_1555700576\build\application\firefox\firefox.exe" -contentproc --channel="11228.13.1722184701\1549202633" -childID 2 -isForBrowser -prefsHandle 2980 -prefMapHandle 2976 -prefsLen 98 -prefMapSize 197180 -parentBuildID 20190419163916 -greomni "Z:\task_1555700576\build\application\firefox\omni.ja" -appomni "Z:\task_1555700576\build\application\firefox\browser\omni.ja" -appdir "Z:\task_1555700576\build\application\firefox\browser" - 11228 tab)
20:40:14 INFO - ==> process 11228 launched child process 7412 ("Z:\task_1555700576\build\application\firefox\firefox.exe" -contentproc --channel="11228.20.1795948229\1932126039" -childID 3 -isForBrowser -prefsHandle 3224 -prefMapHandle 3232 -prefsLen 159 -prefMapSize 197180 -parentBuildID 20190419163916 -greomni "Z:\task_1555700576\build\application\firefox\omni.ja" -appomni "Z:\task_1555700576\build\application\firefox\browser\omni.ja" -appdir "Z:\task_1555700576\build\application\firefox\browser" - 11228 tab)
20:40:14 INFO - ==> process 11228 launched child process 1328 ("Z:\task_1555700576\build\application\firefox\firefox.exe" -contentproc --channel="11228.27.1629263259\218627492" -childID 4 -isForBrowser -prefsHandle 4320 -prefMapHandle 4240 -prefsLen 9327 -prefMapSize 197180 -parentBuildID 20190419163916 -greomni "Z:\task_1555700576\build\application\firefox\omni.ja" -appomni "Z:\task_1555700576\build\application\firefox\browser\omni.ja" -appdir "Z:\task_1555700576\build\application\firefox\browser" - 11228 tab)
20:40:14 INFO - ==> process 11228 launched child process 10264 ("Z:\task_1555700576\build\application\firefox\firefox.exe" -contentproc --channel="11228.34.749569842\354085081" -childID 5 -isForBrowser -prefsHandle 4184 -prefMapHandle 4312 -prefsLen 9407 -prefMapSize 197180 -parentBuildID 20190419163916 -greomni "Z:\task_1555700576\build\application\firefox\omni.ja" -appomni "Z:\task_1555700576\build\application\firefox\browser\omni.ja" -appdir "Z:\task_1555700576\build\application\firefox\browser" - 11228 tab)
20:40:14 INFO - zombiecheck | Checking for orphan process with PID: 7552
20:40:14 INFO - zombiecheck | Checking for orphan process with PID: 4936
20:40:14 INFO - zombiecheck | Checking for orphan process with PID: 1328
20:40:14 INFO - zombiecheck | Checking for orphan process with PID: 7412
20:40:14 INFO - zombiecheck | Checking for orphan process with PID: 10264
20:40:14 INFO - zombiecheck | Checking for orphan process with PID: 3644
20:40:14 INFO - Stopping web server
20:40:14 INFO - Stopping web socket server
20:40:14 INFO - Stopping ssltunnel
20:40:14 WARNING - leakcheck | refcount logging is off, so leaks can't be detected!
20:40:14 INFO - runtests.py | Running tests: end.
20:40:14 INFO - Buffered messages finished
20:40:14 INFO - Running manifest: toolkit\components\pictureinpicture\tests\browser.ini
20:40:14 INFO - The following extra prefs will be set:
20:40:14 INFO - media.videocontrols.picture-in-picture.enabled=true
20:40:14 INFO - INFO | runtests.py | ASan using symbolizer at Z:\task_1555700576\build\application\firefox\llvm-symbolizer.exe
20:40:14 INFO - INFO | runtests.py | ASan running in default memory configuration
20:40:15 INFO - Z:\task_1555700576\build\tests\bin\pk12util.exe: PKCS12 IMPORT SUCCESSFUL
20:40:15 INFO - Increasing default timeout to 90 seconds
20:40:15 INFO - INFO | runtests.py | ASan using symbolizer at Z:\task_1555700576\build\application\firefox\llvm-symbolizer.exe
20:40:15 INFO - INFO | runtests.py | ASan running in default memory configuration
20:40:15 INFO - INFO | runtests.py | ASan using symbolizer at Z:\task_1555700576\build\application\firefox\llvm-symbolizer.exe
20:40:15 INFO - INFO | runtests.py | ASan running in default memory configuration
20:40:15 INFO - MochitestServer : launching [u'Z:\task_1555700576\build\tests\bin\xpcshell.exe', '-g', 'Z:\task_1555700576\build\application\firefox', '-f', 'Z:\task_1555700576\build\tests\bin\components\httpd.js', '-e', "const _PROFILE_PATH = 'c:\\users\\task_1555700576\\appdata\\local\\temp\\tmp2hpbgl.mozrunner'; const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', 'Z:\task_1555700576\build\tests\mochitest\server.js']
20:40:15 INFO - runtests.py | Server pid: 1968
20:40:15 INFO - runtests.py | Websocket server pid: 1476
20:40:15 INFO - INFO | runtests.py | ASan using symbolizer at Z:\task_1555700576\build\application\firefox\llvm-symbolizer.exe
20:40:15 INFO - INFO | runtests.py | ASan running in default memory configuration
20:40:15 INFO - runtests.py | SSL tunnel pid: 10196
20:40:15 INFO - runtests.py | Running with scheme: http
20:40:15 INFO - runtests.py | Running with e10s: True
20:40:15 INFO - runtests.py | Running with serviceworker_e10s: False
20:40:15 INFO - runtests.py | Running with socketprocess_e10s: False
20:40:15 INFO - runtests.py | Running tests: start.
20:40:15 INFO -
20:40:15 INFO - Application command: Z:\task_1555700576\build\application\firefox\firefox.exe -marionette --wait-for-browser -foreground -profile c:\users\task_1555700576\appdata\local\temp\tmp2hpbgl.mozrunner
20:40:15 INFO - runtests.py | Application pid: 10820
20:40:15 INFO - TEST-INFO | started process GECKO(10820)

Flags: needinfo?(jporter+bmo)

Bas: Do you have any idea what's going on here, or know of anyone who could help? It's not really clear to me what the issue is, aside from some sort of IPC error.

Flags: needinfo?(jporter+bmo) → needinfo?(bas)
Regressions: 1535459

Strange, I re-pushed this to try and everything's green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4108d557cac4c2b42797305b275414b58dcec2ed

The previous logs looked like it was failing 100% of the time, so I'm surprised that it passes here. Am I missing something? Did some other patch resolve this?

(In reply to Cristina Coroiu [:ccoroiu] from comment #95)

Here are the failures, green on the backout:
https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=windows%2C10%2Cx64%2Casan%2Cmochitests%2Ctest-windows10-64-asan%2Fopt-mochitest-browser-chrome-e10s-3%2Cm%28bc3%29&tochange=d023d7f7dd80b4c65ed85b75c3a2327391d4b535&fromchange=f0c2b4af4d6dd3d69455e184bbbd635a3de48e42&selectedJob=241499967

I've been having a hard time reproducing this, so I took a closer look at these logs. It seems that the backout commit has the same failures as in comment 93; however, they ended up in bc5 instead of bc3: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=windows%2C10%2Cx64%2Casan%2Cmochitests%2Ctest-windows10-64-asan%2Fopt-mochitest-browser-chrome-e10s&tochange=d023d7f7dd80b4c65ed85b75c3a2327391d4b535&fromchange=f0c2b4af4d6dd3d69455e184bbbd635a3de48e42&selectedJob=241517009

I'm not totally sure why these tests moved to a different batch (maybe just from the new tests in part 4 of my patch?), but if I'm reading the logs right, that test fails after my patch was backed out, as well as before (see commit 30b70a449280 in the link above, a few rows below my initial landing); the only difference is that my patch moved the failing tests to bc3 from bc5. Does that look right, or am I missing something?

Flags: needinfo?(ccoroiu)
Pushed by jporter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5c3bff8f45d6 part 1 - Cancel content JS when navigating through history to prevent hangs r=smaug https://hg.mozilla.org/integration/autoland/rev/a7fa3041bff0 part 2 - Cancel content JS when navigating through history to prevent hangs r=smaug https://hg.mozilla.org/integration/autoland/rev/100acc204e5e part 3 - Cancel content JS when navigating through history to prevent hangs r=smaug https://hg.mozilla.org/integration/autoland/rev/9a0ce3016f03 part 4 - Cancel content JS when navigating through history to prevent hangs r=smaug

67=wontfix because we don't need to uplift this to 67 Beta.

Depends on: 1548565
Depends on: 1548623
See Also: → 1550387
See Also: → 1532529
Depends on: 1562782
Regressions: 1574654
See Also: → 1592093
Flags: needinfo?(rjesup)
Blocks: 1164286
Flags: needinfo?(bas)
Performance Impact: --- → ?
Whiteboard: [geckoview:fxr:p1][qf:js:investigate] → [geckoview:fxr:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: