Open Bug 1891706 Opened 11 months ago Updated 6 days ago

[Android] TypeError: can't access property "remoteTab", this._browser.frameLoader is null

Categories

(GeckoView :: General, defect, P3)

Unspecified
Android
defect

Tracking

(firefox130 unaffected, firefox131 unaffected, firefox132 affected, firefox133 affected)

Tracking Status
firefox130 --- unaffected
firefox131 --- unaffected
firefox132 --- affected
firefox133 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, intermittent-testcase, regression, Whiteboard: [fxdroid][group1][geckoview][platform])

Attachments

(1 obsolete file)

Filed by: hskupin [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=454682980&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/eAY1LG17RVy_QazOvA4KSg/runs/0/artifacts/public/logs/live_backing.log


This failure starts to happen with the patches on bug 1522790 landed. It's the only wpt test that seems to show a higher occurrence of it.

I'll investigate once bug 1522790 is merged to mozilla-central.

:whimboo, since you are the author of the regressor, bug 1522790, could you take a look?

For more information, please visit BugBot documentation.

Flags: needinfo?(hskupin)

Tests that failed because of this error are:

  • /html/cross-origin-embedder-policy/credentialless/fetch.https.window.html
  • /html/cross-origin-opener-policy/iframe-popup-same-origin-allow-popups-to-same-origin.https.html

Here the actual stacktrace of the JS code that is failing:
https://treeherder.mozilla.org/logviewer?job_id=454682980&repo=try&lineNumber=3510-3518

[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - marionette_driver.errors.UnknownException: TypeError: can't access property "remoteTab", this._browser.frameLoader is null
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - stacktrace:
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - 	maybeCancelContentJSExecution@resource://gre/modules/RemoteWebNavigation.sys.mjs:28:5
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - 	fixupAndLoadURIString@resource://gre/modules/RemoteWebNavigation.sys.mjs:149:37
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - 	fixupAndLoadURIString/<@chrome://global/content/elements/browser-custom-element.js:827:28
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - 	_wrapURIChangeCall@chrome://global/content/elements/browser-custom-element.js:755:9
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - 	fixupAndLoadURIString@chrome://global/content/elements/browser-custom-element.js:826:12
[task 2024-04-16T10:31:07.476Z] 10:31:07  WARNING - 	createNewTab@resource://gre/modules/GeckoViewTestUtils.sys.mjs:58:20

There is something wrong when the initial URL get loaded by opening a new tab/window in GeckoView. But not sure yet if the problem is in createNewTab or is related to a bug in fixupAndLoadURIString.

OS: Unspecified → Android
Summary: Intermittent /html/cross-origin-embedder-policy/credentialless/fetch.https.window.html | TypeError: can't access property "remoteTab", this._browser.frameLoader is null → [Android] TypeError: can't access property "remoteTab", this._browser.frameLoader is null

The causing change from bug 1522790 got backed out so it's no longer a regression. I'll take a look once other dependencies for bug 1522790 are done and I can reproduce this issue.

Blocks: 1522790
Flags: needinfo?(hskupin)
Keywords: regression
No longer regressed by: 1522790
No longer blocks: 1522790

This actually looks more like a bug in GeckoView. As such moving the bug into the hopefully correct component.

To get more details I've triggered a new try build for specifically those wpt jobs that failed so far which are the chunks 17, 18 and 20:
https://treeherder.mozilla.org/jobs?repo=try&revision=2082675af7344c7380b0d3b0896ef030fd7259f7

Component: web-platform-tests → General
Product: Testing → GeckoView

Some random ideas:

  • When the browser for the new tab is created, is there an initial load of about:blank?
  • If yes, we wait for browser-delayed-startup-finished but that doesn't actually mean that the initial page load is done? Maybe we need an extra listener for the load event so we do not run into a race when the frame loader is replaced?
  • Should we trigger a load to about:blank again when the tab was already created with about:blank?

Here is a test failure:
https://treeherder.mozilla.org/logviewer?job_id=475106192&repo=try&lineNumber=4915

And an excerpt from the Android log which looks kinda strange due to all the duplicated observer notifications. Especially I wonder why we get those domwindowopened notifications which usually do not appear for all the other tests:

09-20 07:38:24.958 25127 25142 I Gecko   : 1726817904958	Marionette	DEBUG	2 -> [0,351,"WebDriver:NewWindow",{"type":"tab","focus":true,"private":false}]
09-20 07:38:24.960 25127 25142 I Gecko   : 1726817904960	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.960 25198 25241 D EGL_emulation: eglMakeCurrent: 0x783f07a7c1c0: ver 3 0 (tinfo 0x783efd19cda0)
09-20 07:38:24.961 25198 25241 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
09-20 07:38:24.962 25127 25142 I Gecko   : 1726817904962	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.964 25127 25142 I Gecko   : 1726817904964	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.965 25127 25142 I Gecko   : 1726817904965	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.965 25127 25142 I Gecko   : 1726817904965	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.966 25127 25142 I Gecko   : 1726817904966	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.967 25127 25142 I Gecko   : 1726817904967	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.967 25127 25142 I Gecko   : 1726817904967	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.968 25127 25142 I Gecko   : 1726817904968	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.969 25127 25142 I Gecko   : 1726817904969	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.970 25127 25142 I Gecko   : 1726817904970	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.971 25127 25142 I Gecko   : 1726817904970	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.972 25127 25142 I Gecko   : 1726817904972	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-20 07:38:24.978 25127 25142 I Gecko   : 1726817904978	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.978 25127 25142 I Gecko   : 1726817904978	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.978 25127 25142 I Gecko   : 1726817904978	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.979 25127 25142 I Gecko   : 1726817904979	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.979 25127 25142 I Gecko   : 1726817904979	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.979 25127 25142 I Gecko   : 1726817904979	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.980 25127 25142 I Gecko   : 1726817904980	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.980 25127 25142 I Gecko   : 1726817904980	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.980 25127 25142 I Gecko   : 1726817904980	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.981 25127 25142 I Gecko   : 1726817904981	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.981 25127 25142 I Gecko   : 1726817904981	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.981 25127 25142 I Gecko   : 1726817904981	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.981 25127 25142 I Gecko   : 1726817904981	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.982 25127 25142 I Gecko   : 1726817904982	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.982 25127 25142 I Gecko   : 1726817904982	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.982 25127 25142 I Gecko   : 1726817904982	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.982 25127 25142 I Gecko   : 1726817904982	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.983 25127 25142 I Gecko   : 1726817904983	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.983 25127 25142 I Gecko   : 1726817904983	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.983 25127 25142 I Gecko   : 1726817904983	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.984 25127 25142 I Gecko   : 1726817904983	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.984 25127 25142 I Gecko   : 1726817904984	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.984 25127 25142 I Gecko   : 1726817904984	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.984 25127 25142 I Gecko   : 1726817904984	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.984 25127 25142 I Gecko   : 1726817904984	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.985 25127 25142 I Gecko   : 1726817904984	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.985 25127 25142 I Gecko   : 1726817904985	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.985 25127 25142 I Gecko   : 1726817904985	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.985 25127 25142 I Gecko   : 1726817904985	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.985 25127 25142 I Gecko   : 1726817904985	RemoteAgent	TRACE	Received observer notification domwindowopened
09-20 07:38:24.989 25127 25142 I Gecko   : console.log: "\n*** create new tab\n"
09-20 07:38:24.990 25127 25142 I GeckoConsole: 
09-20 07:38:24.990 25127 25142 I GeckoConsole: *** create new tab
09-20 07:38:24.990 25127 25142 I GeckoDump: 
09-20 07:38:24.990 25127 25142 I GeckoDump: **** Browser: destroyed=undefined docshell=null isRemote=true frameLoader=null

For a job that is passing I can see the number of domwindowopened notifications growing by 1 for each time we call WebDriver:NewWindow as well, which is installed on purpose for the prompt listener.

Bug 1920053 should take care of the above reported observer leak.

Maybe we need a similar fix here as done on bug 1766322?

Flags: needinfo?(ohall)

Resetting priority / severity so that the bug can triaged correctly by the GeckoView team.

Severity: S4 → --
Priority: P5 → --

If yes, we wait for browser-delayed-startup-finished but that doesn't actually mean that the initial page load is done? Maybe we need an extra listener for the load event so we do not run into a race when the frame loader is replaced?

This thought sticks out to me in particular. I saw something interesting in AppUiTestDelegate to try here. I'll send a try up with that added into GeckoViewTestUtils to see if it does anything interesting.

See Also: → 1884684

Olivia, when I checked the code of GeckoViewTestUtils.sys.mjs I can see the following call which already forwards the url to some other code:

      sessionId = await lazy.EventDispatcher.instance.sendRequestForResult({
        type: "GeckoView:Test:NewTab",
        url,
      });

Does that call already load the requested URL? If yes, why do we have to call window.browser.fixupAndLoadURIString() for the URL again?

What makes me wonder is this code in GeckoRuntime.java which seems to always load about:blank. Why do we pass url for the GeckoView:Test:NewTab type if we do not immediately load the final URL but still go through about:blank? Or do I miss something?

We probably have to delay resolving windowPromise until the document.readyState of the new window is complete (and that not for about:blank) or a registered event listener fires the load event.

Thanks for taking a look! I'll try to figure out more today and also add some links here to help gather info on the problem.

Does that call already load the requested URL? If yes, why do we have to call window.browser.fixupAndLoadURIString() for the URL again?

I'm not sure. I don't think so the more that I look at it. I'm looking deeper into the service worker code. It ends up calling onOpenWindow. For test runner onOpenWindow is implemented here. That then calls onNewSession, which the test runner version implements here. Comment "The URI is provided only for informational purposes. Do not call GeckoSession.load here." seems possibly relevant.

What is interesting is that in no implementation is url ever used meaningfully that was passed here. It really does just seem for informational purposes.

I added some logging and it does seem we never go through GeckoView:LoadUri. So, fixupAndLoadURIString really is what is loading the url.

What makes me wonder is this code in GeckoRuntime.java which seems to always load about:blank. Why do we pass url for the GeckoView:Test:NewTab type if we do not immediately load the final URL but still go through about:blank? Or do I miss something?

final String url = message.getString("url", "about:blank"); should pick url, if url is in the message, and if it is not fallback to the default of about:blank. What is interesting is url is essentially unused and just gets passed around. From what I can tell, we open a session, which I presume opens with about:blank, then fixupAndLoadURIString does the next navigation. I'll try to understand more what opening a session is exactly doing.

We probably have to delay resolving windowPromise until the document.readyState of the new window is complete (and that not for about:blank) or a registered event listener fires the load event.

I'll add some logs and look into this!

I also keep wondering why Agi said this in the past in the bug you linked. I think I did see a set active call in the path, I'll see if that leads anywhere too.

I also keep wondering why Agi said this in the past in the bug you linked. I think I did see a set active call in the path, I'll see if that leads anywhere too.

onNewSession does pass through the same function as in bug 1766322, which also had a missing frameLoader when we expected one. Bug 1766322's code path changed though in bug 1764998.

I don't know if that thought adds more value, but I think we can assume we don't necessarily have a frameloader after windowPromise. Still not sure why an about: page could be potentially different though.

I'll test waiting for document.readyState and see if I can find out anything new.

I pushed a try build to wait with loading the URL until the initial load is done. Lets see if that works:
https://treeherder.mozilla.org/jobs?repo=try&revision=e9edb77300726af492bca892d583bc1dc0b898b0

I see two test failures for the exact same test (/html/cross-origin-embedder-policy/credentialless/fetch.https.window.html) but only for the Android lite build. Olivia what is special for that build? Is it enough to add ac_add_options --enable-geckoview-lite to the mozconfig to build an artifact build? I tried but I still cannot get this test to fail locally in the emulator.

The added logging shows the following:

09-27 14:35:05.176 20729 20744 I Gecko   : ****** readyState: complete, initial: false
09-27 14:35:05.176 20729 20744 I Gecko   : 
09-27 14:35:05.185 20729 20744 I Gecko   : Marionette threw an error: TypeError: can't access property "remoteTab", this._browser.frameLoader is null

It means when we get the "browser-delayed-startup-finished" notification the document is already loaded and we do not show the initial about:blank. Maybe we could run with the MOZ_LOG="BCWebProgress:5" env set to see if there is some other navigation on going?

Olivia what is special for that build?

Should just be the same but without any extra dependencies. IIRC, it was made for people external to Mozilla who might want to use the GeckoView project, so they don't need all the extras. Bug 1709640 seems to collaborate that thought.

Is it enough to add ac_add_options --enable-geckoview-lite to the mozconfig to build an artifact build?

Ohh, not sure if I have ever tried that specifically. I have this in my mozconfig for a standard android artifact build, not sure if that needs it too:

ac_add_options --enable-project=mobile/android
ac_add_options --enable-artifact-builds
ac_add_options --target=aarch64
# # Write build artifacts to:
mk_add_options MOZ_OBJDIR=./objdir-frontend

I usually only do aritifact builds for Fenix specific work, so I haven't spent much time with it.

Maybe we could run with the MOZ_LOG="BCWebProgress:5" env set to see if there is some other navigation on going?

That sounds interesting!

I have a couple of random things I've sent tries up with, but nothing particularly interesting so far. I was looking at AppUiTestDelegate.sys.mjs at one point for ideas.

I still cannot get this test to fail locally in the emulator.

I think I saw it fail once, but might be misremembering or thinking of something else. I have this in my notes as what I was probably running, if it did fail locally:

./mach wpt testing/web-platform/tests/html/semantics/links/links-created-by-a-and-area-elements/target_blank_implicit_noopener_base.html --webdriver-arg=-vv --setpref="fission.disableSessionHistoryInParent=true"  --pause-on-unexpected --repeat-until-unexpected --disable-fission

The generated adb logcat is too huge so it stops around 5min before the failure actually happened. Given that we only see the failure for /html/cross-origin-embedder-policy/credentialless/fetch.https.window.html I'm going to push a new try for just /html/cross-origin-embedder-policy/credentialless/. Lets hope that works and reproduces the failure:

https://treeherder.mozilla.org/jobs?repo=try&revision=d6964fa238731cfc6bfe64e560fe5d424396246d

So here a failing case:

09-27 20:32:44.952  6895  6910 I Gecko   : 1727469164951	Marionette	DEBUG	2 -> [0,197,"WebDriver:NewWindow",{"type":"tab","focus":true,"private":false}]
09-27 20:32:44.955  6895  6910 I Gecko   : 
09-27 20:32:44.955  6895  6910 I Gecko   : 
09-27 20:32:44.955  6895  6910 I Gecko   : ****** href= readyState: complete, initial: false
09-27 20:32:44.955  6895  6910 I Gecko   : 
09-27 20:32:44.956  6895  6910 I Gecko   : 1727469164956	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-27 20:32:44.958  6895  6910 I Gecko   : 1727469164958	Marionette	TRACE	Received observer notification browser-delayed-startup-finished
09-27 20:32:44.962  6895  6910 I Gecko   : 1727469164962	RemoteAgent	TRACE	Received observer notification domwindowopened

The problem seems to be that domwindowopened is not send out first but delayed. That means due to some reason we see two very early browser-delayed-startup-finished notifications and want to load the URL but the actual window that we want to operate on hasn't been opened.

Usually it should look like:

09-27 20:32:59.939  7846  7861 I Gecko   : 1727469179939	Marionette	DEBUG	2 -> [0,52,"WebDriver:NewWindow",{"type":"tab","focus":true,"private":false}]
09-27 20:32:59.946  7846  7861 I Gecko   : 1727469179946	RemoteAgent	TRACE	Received observer notification domwindowopened
[..]
09-27 20:33:00.084  7846  7861 I Gecko   : ****** href=chrome://geckoview/content/geckoview.xhtml readyState: complete, initial: false
09-27 20:33:00.084  7846  7861 I Gecko   : 
09-27 20:33:00.085  7846  7861 I Gecko   : 1727469180084	Marionette	TRACE	Received observer notification browser-delayed-startup-finished

As the try build shows this is basically a problem for non-Fission and SHIP disabled. If SHIP or Fission is enabled it seems to work all fine at least for this specific test. So maybe this issue is somewhat related to BFCache?

Peter do you have an idea?

Maybe we could just mark this specific test, that fails a lot more compared to others, as [OK, Error] for non-fission, and Olivia can correct the expectation when updating the meta data over on bug 1919837 for SHIP? I pushed a try build for now and will create a patch if that is fine.

Flags: needinfo?(peterv)
Assignee: nobody → hskupin
Status: NEW → ASSIGNED

Actually the last web-platform downstream sync added the multiple status already so that this no longer appears as failure on autoland and mozilla-central.

Olivia, can you please adjust your meta data update patch on bug 1919837 to remove that multiple status for SHIP?

Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Depends on: 1921204
Flags: needinfo?(peterv)
Attachment #9428173 - Attachment is obsolete: true

Maybe Irene can give her feedback as well.

Flags: needinfo?(bugzeeeeee)

Set release status flags based on info from the regressing bug 1522790

Whiteboard: [fxdroid][group3]

Found multiple android wpt failures that have this failure line: TypeError: this._browser.frameLoader is null. Example: https://treeherder.mozilla.org/logviewer?job_id=477904800&repo=mozilla-beta&lineNumber=1994

Hi Henrik! This seems to be a variation of the failure line for which this bug was filed for. Can you please confirm that?
If this is a separate issue, I can file a separate bug for it.
Thank you!

Flags: needinfo?(hskupin)

Iulian, this is the exact same issue and all the failures should still be classified against this bug.

Olivia or owlish were you able to take a look into that?

Flags: needinfo?(hskupin)
Severity: -- → S3
Flags: needinfo?(bugzeeeeee)
Priority: -- → P3

Nika gave a suggestion on Matrix:

diff --git a/dom/base/nsFrameLoader.cpp b/dom/base/nsFrameLoader.cpp
index d86d469134d73..66fa226699f15 100644
--- a/dom/base/nsFrameLoader.cpp
+++ b/dom/base/nsFrameLoader.cpp
@@ -3403,13 +3403,18 @@ void nsFrameLoader::ExitPrintPreview() {
 }
 
 already_AddRefed<nsIRemoteTab> nsFrameLoader::GetRemoteTab() {
-  if (!mRemoteBrowser) {
+  // remoteTab is only available in the parent process for remote tabs.
+  if (!XRE_IsParentProcess() || !IsRemoteFrame()) {
     return nullptr;
   }
-  if (auto* browserHost = mRemoteBrowser->AsBrowserHost()) {
-    return do_AddRef(browserHost);
+
+  // If we don't have a remote browser yet, ensure it's started.
+  if (!EnsureRemoteBrowser()) {
+    NS_WARNING("Couldn't create child process for iframe.");
+    return nullptr;
   }
-  return nullptr;
+
+  return do_AddRef(mRemoteBrowser->AsBrowserHost());
 }
 
 already_AddRefed<nsILoadContext> nsFrameLoader::GetLoadContext() {

As discussed with Olivia, she will test it out tomorrow by re-enabling the for now disabled test that nearly perma failed.

Testing:

  • Try to enable fetch.https.window.js for a consistent failure by removing ERROR - expected to fail to establish a baseline
  • Try with Nika's suggestion and fetch.https.window.js enabled to see if the change improves / solves the failure

Still running right now. Will revisit with results once testing completes.

Flags: needinfo?(ohall)

Just to add a note the test as mentioned runs in wpt17 for opt / shippable builds. I've retriggered these jobs for both try builds so that we can see the stability.

So it's still happening with the patch applied:
https://treeherder.mozilla.org/logviewer?job_id=481567413&repo=try&lineNumber=5043

Maybe you have another idea Nika?

Flags: needinfo?(nika)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #42)

So it's still happening with the patch applied:
https://treeherder.mozilla.org/logviewer?job_id=481567413&repo=try&lineNumber=5043

Maybe you have another idea Nika?

This crash looks a bit different than the one I remember seeing before (or maybe I wasn't looking at the correct stack originally - very believable). This new error is TypeError: can't access property "remoteTab", this._browser.frameLoader is null, which I don't believe would be improved by the changes I made in the suggestion.

I think the only way something like that could be happening is if the browser wasn't initialized properly yet, which seems surprising to me. Currently I'm not sure how this could happen without the tab being destroyed, but perhaps there's a way? Would be much easier to see what's going on with a debugger attached or similar, but I take it this is quite intermittent.

Flags: needinfo?(nika)

The JS stack is still exactly the same as what I reported in comment 2 on this bug. So it's not a new error - but matches as well the error of the bug's summary.

What do you mean with browser wasn't initialized properly yet? I assume it's about the remote browser and not GeckoView itself? In the past we had some issues with a new tab where the browser wasn't ready yet and we thought to have it fixed by using the browser-delayed-startup-finished observer (bug 1908844). Maybe that was not enough and something else is missing?

If you want we could push a try build with the Marionette trace logs enabled to at least get more insights of what's happening.

I've actually taken a look at the adb logcat and saw the following:

11-07 08:46:10.776 31316 31316 I AndroidRuntime: VM exiting with result code 0, cleanup skipped.
11-07 08:46:10.784  1658  1669 I ActivityManager: Process org.mozilla.geckoview.test_runner:tab8 (pid 31316) has died
11-07 08:46:10.784  1658  1669 D ActivityManager: cleanUpApplicationRecord -- 31316
11-07 08:46:10.798 30769 30813 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
11-07 08:46:10.799 30700 30721 I Gecko   : [Parent 30700, IPC I/O Parent] WARNING: waitid failed pid:31316 errno:10: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_posix.cc:314
11-07 08:46:10.816 30700 30715 I Gecko   : Marionette threw an error: TypeError: can't access property "remoteTab", this._browser.frameLoader is null

Could it be that the newly opened tab actually crashes and as result we get this failure? Olivia it might make sense to push a new try with remote.log.level=Trace enabled this time.

Flags: needinfo?(ohall)

Sure, added some more logging using variants.yml. (LMK if there is an easier or better way!) I looked through that section of the log. Nothing new stood out to me either. Hopefully the logging can shed some more light on the issue.

Try w/ logging and fetch.https.window.js to OK

Flags: needinfo?(ohall)

(In reply to Olivia Hall [:olivia] from comment #47)

Try w/ logging and fetch.https.window.js to OK

This try build doesn't show any failures of this type yet. I've retriggered wpt17 on Android opt to hopefully catch it.

Here is a new test failure with the logging enabled:
https://treeherder.mozilla.org/logviewer?job_id=483117936&repo=try&lineNumber=3982

When I search for remoteTab in the adb logcat I can always see a Process org.mozilla.geckoview.test_runner:tab0 (pid %id) has died some lines before this error appears. As such I assume the newly created tab most likely crashes? If that is the case why do we continue trying to use it and not fail? Do we maybe miss some await in the stack?

Olivia, we actually set the environment variable MOZ_CRASHREPORTER_SHUTDOWN=1 so that Firefox is shutting down itself in case of a tab is crashing. Does GeckoView not support this or do we miss to shutdown due to some other reason?

Flags: needinfo?(ohall)

I added some more logging in this try run to try and help answer some of these questions.

I almost think something is going wrong somewhere around here in GeckoViewTestUtils, like not waiting for some event or something.

I'm not sure about MOZ_CRASHREPORTER_SHUTDOWN=1. I put some logging around there to see at what point that occurs to try and find out something.

Those changes seem to have changed the way it fails. Now I can see a TypeError: can't access property "tab", this.window is null error:
https://treeherder.mozilla.org/logviewer?job_id=483686569&repo=try&lineNumber=5086

In the logcat there is even another failure that happens but only causes the error logged to the console:

11-21 19:20:23.032 25519 25534 I Gecko   : GV - GeckoViewSessionStore.sys.mjs updateSessionStoreFromTabListener
11-21 19:20:23.032 25519 25534 I Gecko   : GV - GeckoViewSessionStore.sys.mjs onTabStateUpdate 
11-21 19:20:23.032 25519 25534 E GeckoConsole: [JavaScript Error: "TypeError: can't access property "WindowEventDispatcher", win is null" {file: "resource://gre/modules/GeckoViewSessionStore.sys.mjs" line: 143}]
11-21 19:20:23.032 25519 25534 E GeckoConsole: onTabStateUpdate@resource://gre/modules/GeckoViewSessionStore.sys.mjs:143:5
11-21 19:20:23.032 25519 25534 E GeckoConsole: updateSessionStoreFromTabListener@resource://gre/modules/GeckoViewSessionStore.sys.mjs:240:10
11-21 19:20:23.032 25519 25534 E GeckoConsole: SSF_updateSessionStore@resource://gre/modules/SessionStoreFunctions.sys.mjs:46:27
11-21 19:20:23.032 25519 25534 E GeckoConsole: UpdateSessionStore@resource://gre/modules/SessionStoreFunctions.sys.mjs:16:46
11-21 19:20:23.062 25519 25534 I Gecko   : GV - GeckoViewSessionStore.sys.mjs updateSessionStoreFromTabListener
11-21 19:20:23.062 25519 25534 I Gecko   : GV - GeckoViewSessionStore.sys.mjs onTabStateUpdate 
11-21 19:20:23.063 25519 25534 E GeckoConsole: [JavaScript Error: "TypeError: can't access property "WindowEventDispatcher", win is null" {file: "resource://gre/modules/GeckoViewSessionStore.sys.mjs" line: 143}]
11-21 19:20:23.063 25519 25534 E GeckoConsole: onTabStateUpdate@resource://gre/modules/GeckoViewSessionStore.sys.mjs:143:5
11-21 19:20:23.063 25519 25534 E GeckoConsole: updateSessionStoreFromTabListener@resource://gre/modules/GeckoViewSessionStore.sys.mjs:240:10
11-21 19:20:23.063 25519 25534 E GeckoConsole: SSF_updateSessionStore@resource://gre/modules/SessionStoreFunctions.sys.mjs:46:27
11-21 19:20:23.063 25519 25534 E GeckoConsole: UpdateSessionStore@resource://gre/modules/SessionStoreFunctions.sys.mjs:16:46
11-21 19:20:23.146 25589 25631 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
11-21 19:20:23.322 25589 25631 D EGL_emulation: eglMakeCurrent: 0x784d4b87e560: ver 3 0 (tinfo 0x784d4b9e8fa0)
11-21 19:20:23.364 25589 25631 D EGL_emulation: eglMakeCurrent: 0x784d4b87e560: ver 3 0 (tinfo 0x784d4b9e8fa0)
11-21 19:20:23.364 25589 25631 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
11-21 19:20:23.373 25589 25631 D EGL_emulation: eglMakeCurrent: 0x784d4b87e560: ver 3 0 (tinfo 0x784d4b9e8fa0)
11-21 19:20:23.374 25589 25631 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
11-21 19:20:23.407 25519 25534 I Gecko   : GV - GeckoViewSessionStore.sys.mjs updateSessionStoreFromTabListener
11-21 19:20:23.407 25519 25534 I Gecko   : GV - GeckoViewSessionStore.sys.mjs onTabStateUpdate 
11-21 19:20:23.408 25519 25534 E GeckoConsole: [JavaScript Error: "TypeError: can't access property "WindowEventDispatcher", win is null" {file: "resource://gre/modules/GeckoViewSessionStore.sys.mjs" line: 143}]
11-21 19:20:23.408 25519 25534 E GeckoConsole: onTabStateUpdate@resource://gre/modules/GeckoViewSessionStore.sys.mjs:143:5
11-21 19:20:23.408 25519 25534 E GeckoConsole: updateSessionStoreFromTabListener@resource://gre/modules/GeckoViewSessionStore.sys.mjs:240:10
11-21 19:20:23.408 25519 25534 E GeckoConsole: SSF_updateSessionStore@resource://gre/modules/SessionStoreFunctions.sys.mjs:46:27
11-21 19:20:23.408 25519 25534 E GeckoConsole: UpdateSessionStore@resource://gre/modules/SessionStoreFunctions.sys.mjs:16:46
11-21 19:20:23.446 25589 25631 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
11-21 19:20:23.447 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - createNewTab
11-21 19:20:23.447 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - openingObserver - add
11-21 19:20:23.447 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - GeckoView:Test:NewTab
11-21 19:20:23.449 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - openingObserver - remove
11-21 19:20:23.457 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - sessionId: d00cfb09e2724ede8c1a408108219eb6
11-21 19:20:23.457 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - waiting for window
11-21 19:20:23.457 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - window waited for
11-21 19:20:23.458 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - begin fixupAndLoadURIString
11-21 19:20:23.458 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - !window.browser.frameLoader
11-21 19:20:23.458 25519 25534 I GeckoDump: GV - browser-custom-element.js - fixupAndLoadURIString
11-21 19:20:23.458 25519 25534 I GeckoDump: GV - browser-custom-element.js - _fixLoadParamsToLoadURIOptions
11-21 19:20:23.458 25519 25534 I GeckoDump: GV - browser-custom-element.js - _wrapURIChangeCall
11-21 19:20:23.458 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs fixupAndLoadURIString didn't work.
11-21 19:20:23.459 25519 25534 I Gecko   : GV - GeckoViewTestUtils.sys.mjs - return window.tab

Turned on some more logging and trying a couple of different things.

  • Try using "browser-idle-startup-tasks-finished" instead of "browser-delayed-startup-finished". (Not sure if this a good idea or not, but was curious about it.)
  • Try waiting for "XULFrameLoaderCreated" when it isn't available. (Not sure if that was the right item to wait for or not.)

I compared the logs some to a working run v. a non-working run. The nofis run also removes the WindowEventDispatcher error, so that just seems like log noise on SHIP.

It looks like there is no difference. What do you think about the tab died line in the adb logcat logs? Is the tab really crashing right after it gets opened? Or is that a red herring? If not I wonder about getting a crash report which doesn't seem to work for generic wpt tests then.

Olivia, I wonder if the added API over on bug 1825910 might be a better choice these days?

What do you think about the tab died line in the adb logcat logs? Is the tab really crashing right after it gets opened? Or is that a red herring?

The tab died error mostly feels like a red herring to me, but I'm not confident enough to rule it out.

I added logging here in comment 52's run as TestRunnerActivity.java statements and it never got that far. (Maybe it crashed another way or took another path?)

The tab pid has died log does seem to appear in good runs pretty frequently, so I almost just think it is a thing that happens when the test ends or changes. Although, I do find 678 instances of has died on a bad run and 671 instances on a good run. Not sure if that is significant or not.

The new API in bug 1825910 could be very helpful. I'll take a closer look and make a try substituting it in.

The new API in bug 1825910 could be very helpful. I'll take a closer look and make a try substituting it in.

Update: I'm still testing this idea when I have time. It has been a bit tricky getting the new API to substitute and run.

Hi Makoto,

I've been looking at substituting this openWindow API from bug 1825910 as a replacement for createNewTab. Is there a way to create aOpenWindowInfo in the JS side? Does it make sense for me to try and use this API that way? I haven't had much luck so far, but I might just be going about it the wrong way too. I'm only trying different things out to see if it solves the bug, so just looking for advice if this is a workable path to pursue.

I've been trying some random things on this bug, but none have worked out. Substituting to this new API was an idea to try that it might fix whatever the underlying issue is. The bug doesn't really reproduce locally or at least not well enough to be helpful.

Please let me know if you have any suggestions to try or if the new API could be used this way.

Flags: needinfo?(ohall) → needinfo?(m_kato)

(In reply to Olivia Hall [:olivia] from comment #64)

Hi Makoto,

I've been looking at substituting this openWindow API from bug 1825910 as a replacement for createNewTab. Is there a way to create aOpenWindowInfo in the JS side? Does it make sense for me to try and use this API that way? I haven't had much luck so far, but I might just be going about it the wrong way too. I'm only trying different things out to see if it solves the bug, so just looking for advice if this is a workable path to pursue.

nsIOpenWindowInfo is scriptable, so we can use it on JS side. But unfortunately, it cannot use browsingContextReadyCallback member since it isn't scriptable/xpcom. When I worked on bug 1777509 (See https://phabricator.services.mozilla.com/D197131), I wrote a patch to become that browsingContextReadyCallback is scriptable at first. But Nika didn't want it then rejected it.

I've been trying some random things on this bug, but none have worked out. Substituting to this new API was an idea to try that it might fix whatever the underlying issue is. The bug doesn't really reproduce locally or at least not well enough to be helpful.

Please let me know if you have any suggestions to try or if the new API could be used this way.

If this issue is race condition whether frame loader is ready or not since browsing context isn't completely ready,
I guess that you should discuss with Nika whether we can change that browsingContextReadyCallback is scriptable, so get something new idea.

Flags: needinfo?(m_kato)

Or we create small XPCOM interface by C++ that for waiting browsing context, like bug 1825910.

Whiteboard: [fxdroid][group3] → [fxdroid][group1][geckoview][platform]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: