Intermittent Wd Android TEST-UNEXPECTED-FAIL | <random> - webdriver.error.TimeoutException: timeout (500): Navigation timed out after xxxx ms
Categories
(GeckoView :: General, defect, P5)
Tracking
(firefox107 unaffected, firefox108 affected, firefox109 affected)
Tracking | Status | |
---|---|---|
firefox107 | --- | unaffected |
firefox108 | --- | affected |
firefox109 | --- | affected |
People
(Reporter: whimboo, Unassigned)
References
(Blocks 3 open bugs)
Details
(Keywords: intermittent-failure, Whiteboard: [geckoview:m111][geckoview:m114?])
As it looks like in some cases the navigation to a URL will hang in GeckoView:
https://treeherder.mozilla.org/logviewer?job_id=398541373&repo=try&lineNumber=19235
[task 2022-12-03T11:44:21.721Z] 11:44:21 INFO - PID 2786 | 1670067861720 webdriver::server DEBUG -> POST /session/ba706cf4-3899-485d-9834-25c82eff465b/url {"url": "https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin"}
[task 2022-12-03T11:45:20.756Z] 11:45:20 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/forward/forward.py | expected OK
Here the data from the Android logcat:
12-03 11:44:21.528 17677 17692 I Gecko : 1670067861528 Marionette DEBUG 0 -> [0,5,"WebDriver:Navigate",{"url":"https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin"}]
12-03 11:44:21.538 17677 17692 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
12-03 11:44:21.557 17677 17692 I Gecko : 1670067861557 Marionette TRACE [4] Received event beforeunload for about:blank
12-03 11:44:21.727 17806 17820 D GeckoViewContent[C]: receiveMessage: CollectSessionState
12-03 11:44:21.769 17677 17692 D GeckoViewXUL: WillChangeBrowserRemoteness
12-03 11:44:21.775 17806 17820 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
12-03 11:44:21.777 17806 17820 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
12-03 11:44:21.778 17806 17820 D GeckoViewAutoFill[C]: handleEvent: pagehide
12-03 11:44:21.778 17677 17692 D GeckoViewXUL: DidChangeBrowserRemoteness
12-03 11:44:21.778 17677 17692 D GeckoViewNavigation: sessionContextId=null
12-03 11:44:21.783 17677 17692 I Gecko : [Parent 17677, Main Thread] WARNING: '!window', file /builds/worker/checkouts/gecko/mobile/android/components/geckoview/GeckoViewHistory.cpp:266
12-03 11:44:21.793 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-03 11:44:21.800 17806 17820 D GeckoViewContent[C]: receiveMessage: RestoreHistoryAndNavigate
12-03 11:44:21.820 17677 17692 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
12-03 11:44:21.821 17677 17692 D GeckoViewActorParent: receiveMessage window destroyed DispatcherMessage GeckoView:DismissClipboardPermissionRequest
12-03 11:44:21.822 17677 17692 I Gecko : 1670067861822 Marionette TRACE [4] Received event pagehide for about:blank
12-03 11:44:21.824 17677 17692 D GeckoViewAutoFill: receiveMessage Clear
12-03 11:44:21.841 17677 17692 D GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"SessionStateAggregator"}
12-03 11:44:21.848 1658 1731 E TaskPersister: File error accessing recents directory (directory doesn't exist?).
12-03 11:44:21.856 17677 17692 I Gecko : [Parent 17677, Main Thread] WARNING: 'aOwner->IsDiscarded()', file /builds/worker/workspace/obj-build/dist/include/mozilla/dom/SyncedContextInlines.h:95
12-03 11:44:21.861 17806 17820 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
12-03 11:44:21.866 17806 17820 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
12-03 11:44:21.868 17806 17820 D GeckoViewAutoFill[C]: handleEvent: pagehide
12-03 11:44:21.874 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-03 11:44:21.891 17806 17820 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin" line: 0}]
12-03 11:44:21.895 17677 17692 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
12-03 11:44:21.895 17677 17692 W GeckoEventDispatcher: No listener for GeckoView:DismissClipboardPermissionRequest
12-03 11:44:21.899 17677 17692 D GeckoViewAutoFill: receiveMessage Clear
12-03 11:44:21.916 17806 17820 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
12-03 11:44:21.917 17677 17692 D GeckoViewNavigation: onLocationChange
12-03 11:44:21.918 17677 17677 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin
12-03 11:44:21.934 17677 17692 D GeckoViewContent: handleEvent: pagetitlechanged
12-03 11:44:21.958 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-03 11:44:22.052 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-03 11:44:22.095 17806 17820 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
12-03 11:44:22.099 17677 17692 D GeckoViewNavigation: onLocationChange
12-03 11:44:22.100 17806 17820 W Web Content: [JavaScript Warning: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
12-03 11:44:22.100 17677 17677 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html
12-03 11:44:22.114 17806 17820 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
12-03 11:44:22.120 17806 17820 D GeckoViewContent[C]: handleEvent: pageshow
12-03 11:44:22.121 17806 17820 D GeckoViewAutoFill[C]: handleEvent: pageshow
12-03 11:44:22.127 17806 17820 D GeckoViewContent[C]: handleEvent: pageshow
12-03 11:44:22.129 17806 17820 D GeckoViewAutoFill[C]: handleEvent: pageshow
12-03 11:44:22.144 17677 17692 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
12-03 11:44:22.145 17677 17692 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
12-03 11:44:22.184 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-03 11:44:22.190 17677 17692 I Gecko : 1670067862190 Marionette TRACE Received observer notification browser-delayed-startup-finished
12-03 11:44:22.355 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-03 11:44:22.362 17806 17820 D GeckoViewContentDelegateChild[C]: handleEvent: MozFirstContentfulPaint
12-03 11:44:22.363 17677 17692 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
12-03 11:44:22.386 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-03 11:44:22.449 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-03 11:44:22.617 17739 17763 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-03 11:44:22.872 17677 17692 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
12-03 11:44:25.948 17677 17692 D GeckoIdleService: Get idle time: time since reset 5010 msec
12-03 11:44:25.948 17677 17692 D GeckoIdleService: Idle timer callback: current idle time 5010 msec
12-03 11:44:25.948 17677 17692 D GeckoIdleService: next timeout 4294967289989 msec from now
12-03 11:44:25.948 17677 17692 D GeckoIdleService: SetTimerExpiryIfBefore: next timeout 4294967289989 msec from now
12-03 11:44:25.948 17677 17692 D GeckoIdleService: reset timer expiry to 4294967289999 msec from now
12-03 11:44:25.949 17677 17692 D GeckoIdleService: Idle timer callback: tell observer 0x7e0e5770ec68 user is idle
12-03 11:45:01.010 17854 17854 W art : Unexpected CPU variant for X86 using defaults: x86_64
12-03 11:45:01.015 1658 1672 I ActivityManager: Start proc 17854:com.android.deskclock/u0a32 for broadcast com.android.deskclock/com.android.alarmclock.DigitalAppWidgetProvider
12-03 11:45:01.036 17854 17854 W System : ClassLoader referenced unknown path: /system/app/DeskClock/lib/x86_64
12-03 11:45:01.055 1658 1670 I ActivityManager: Killing 2161:android.process.media/u0a10 (adj 906): empty #17
12-03 11:45:01.059 1658 1916 D ActivityManager: cleanUpApplicationRecord -- 2161
12-03 11:45:19.802 17677 17733 W rkv::backend::impl_safe::environment: `load_ratio()` is irrelevant for this storage backend.
12-03 11:45:20.564 17677 17692 I Gecko : 1670067920564 Marionette DEBUG Closed connection 0
12-03 11:45:21.346 17886 17886 D AndroidRuntime: >>>>>> START com.android.internal.os.RuntimeInit uid 0 <<<<<<
12-03 11:45:21.348 17886 17886 D AndroidRuntime: CheckJNI is OFF
12-03 11:45:21.370 17886 17886 W art : Unexpected CPU variant for X86 using defaults: x86_64
12-03 11:45:21.374 17886 17886 D ICU : No timezone override file found: /data/misc/zoneinfo/current/icu/icu_tzdata.dat
12-03 11:45:21.390 17886 17886 E memtrack: Couldn't load memtrack module (No such file or directory)
12-03 11:45:21.390 17886 17886 E android.os.Debug: failed to load memtrack module: -2
12-03 11:45:21.391 17886 17886 I Radio-JNI: register_android_hardware_Radio DONE
12-03 11:45:21.397 17886 17886 D AndroidRuntime: Calling main entry com.android.commands.pm.Pm
12-03 11:45:21.408 17886 17886 I art : System.exit called, status: 0
12-03 11:45:21.408 17886 17886 I AndroidRuntime: VM exiting with result code 0.
12-03 11:45:21.424 17886 17894 W MessageQueue: Handler (android.os.Handler) {87428ef} sending message to a Handler on a dead thread
12-03 11:45:21.424 17886 17894 W MessageQueue: java.lang.IllegalStateException: Handler (android.os.Handler) {87428ef} sending message to a Handler on a dead thread
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.MessageQueue.enqueueMessage(MessageQueue.java:543)
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.Handler.enqueueMessage(Handler.java:643)
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.Handler.sendMessageAtTime(Handler.java:612)
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.Handler.sendMessageDelayed(Handler.java:582)
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.Handler.post(Handler.java:338)
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.ResultReceiver$MyResultReceiver.send(ResultReceiver.java:57)
12-03 11:45:21.424 17886 17894 W MessageQueue: at com.android.internal.os.IResultReceiver$Stub.onTransact(IResultReceiver.java:58)
12-03 11:45:21.424 17886 17894 W MessageQueue: at android.os.Binder.execTransact(Binder.java:565)
12-03 11:45:21.551 17902 17902 D AndroidRuntime: >>>>>> START com.android.internal.os.RuntimeInit uid 0 <<<<<<
12-03 11:45:21.552 17902 17902 D AndroidRuntime: CheckJNI is OFF
12-03 11:45:21.573 17902 17902 W art : Unexpected CPU variant for X86 using defaults: x86_64
12-03 11:45:21.576 17902 17902 D ICU : No timezone override file found: /data/misc/zoneinfo/current/icu/icu_tzdata.dat
12-03 11:45:21.592 17902 17902 E memtrack: Couldn't load memtrack module (No such file or directory)
12-03 11:45:21.592 17902 17902 E android.os.Debug: failed to load memtrack module: -2
12-03 11:45:21.593 17902 17902 I Radio-JNI: register_android_hardware_Radio DONE
12-03 11:45:21.598 17902 17902 D AndroidRuntime: Calling main entry com.android.commands.pm.Pm
12-03 11:45:21.603 1658 2085 I ActivityManager: Force stopping org.mozilla.geckoview.test_runner appid=10062 user=0: clear data
12-03 11:45:21.603 1658 2085 I ActivityManager: Killing 17739:org.mozilla.geckoview.test_runner:gpu/u0a62 (adj 0): stop org.mozilla.geckoview.test_runner
12-03 11:45:21.604 1658 2085 D ActivityManager: cleanUpApplicationRecord -- 17739
12-03 11:45:21.604 1658 2085 W ActivityManager: Scheduling restart of crashed service org.mozilla.geckoview.test_runner/org.mozilla.gecko.process.GeckoChildProcessServices$gpu in 1000ms
12-03 11:45:21.605 1658 2085 I ActivityManager: Killing 17677:org.mozilla.geckoview.test_runner/u0a62 (adj 0): stop org.mozilla.geckoview.test_runner
12-03 11:45:21.605 1658 2085 D ActivityManager: cleanUpApplicationRecord -- 17677
12-03 11:45:21.607 17806 17806 I ServiceChildProcess: Destroying GeckoServiceChildProcess
12-03 11:45:21.607 1658 2085 W ActivityManager: Force removing ActivityRecord{39f240a u0 org.mozilla.geckoview.test_runner/.App t2}: app died, no saved state
As it looks like this only affects debug-isolated-process
(Fission) jobs in CI and something is killing the GeckoView process during navigation.
Chris could someone from the Android team could have a look at this failure?
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 1•2 years ago
|
||
Actually it is always the same URL where the hang/kill of the process seems to happens and it's not only related to debug-isolated-process but also opt. I'll asked the sheriffs to classify upcoming failures against this bug.
Reporter | ||
Comment 2•2 years ago
|
||
And it's the forward.py test that is affected here only.
Comment 3•2 years ago
|
||
This is failing only on mozilla-beta and on try on beta-sims: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-11-28&endday=2022-12-05&tree=all&failurehash=all&bug=1804027
Possibly started here.
Reporter | ||
Comment 4•2 years ago
|
||
I've triggered backfills and there is this potential changeset which might have regressed this on November 4th.
Reporter | ||
Comment 5•2 years ago
|
||
As it looks like with this changeset the frequency of the failure has been drastically increased.
Nick, could have bug 1799002 an impact here? If yes, why does it only happen on beta and beta-sim? Maybe there are other settings present for Nightly?
Reporter | ||
Comment 6•2 years ago
•
|
||
Actually when checking the backfills more closely with the following list of changes the mozilla-central to mozilla-beta merge seems to have caused the issue:
I'm not sure how to figure out which of the changes could have potentially regressed that, but I assume maybe a feature that is only available on Nightly builds overshadows this crash?
Reporter | ||
Comment 7•2 years ago
|
||
Given that this only affects the back and forward tests I wonder if this might be related to the Session history in parent feature. Not sure what the status is here for Android.
Reporter | ||
Comment 8•2 years ago
|
||
CC'ing Olli and Peter who both work more or less on session history in parent.
Comment 9•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #5)
As it looks like with this changeset the frequency of the failure has been drastically increased.
Nick, could have bug 1799002 an impact here? If yes, why does it only happen on beta and beta-sim? Maybe there are other settings present for Nightly?
Yes, it's possible that Bug 17990002 could have impact, but not in a satisfying manner: that ticket is about updating the Kotlin compiler and some related libraries. So it looks like they are impacting this test in some unknown way, but who can say what that really means.
The status is that Bug 1799002 backed out the upgrade. I'm committed to getting that upgrade landed in the current train (so, into Nightly 109), which hopefully will return the intermittent frequency to its previous lower rate.
Reporter | ||
Comment 10•2 years ago
|
||
Hi Nick. By the time when I wrote this comment I was wrong. Later comments state that it's most likely not the regressor. As best have a look at the following changesets with the backfill jobs:
The merge commit as referenced in comment 6 seems to have increased the failure rate. But it's basically impossible to reduce that to a single bug.
One option could be to have a look at early central to beta simulations on try and check when this got started. Aryx, who is generally running the beta simulations so that we could get a list of results for each of the 108 simulations?
Reporter | ||
Updated•2 years ago
|
Comment 11•2 years ago
|
||
This document contains the central-as-beta simulations for Gecko version 108. Because they are Try pushes, their artifacts - including the one required for retriggers - expired if they are older than 4 weeks. The oldest push which reran the tasks had the test fail 1/9 runs.
Comment 12•2 years ago
|
||
Irene, do you have any insight into this bug? Apparently, some WPTs are failing only in debug-isolated-process (Fission) jobs and only in mozilla-beta and central-as-beta simulations, not debug-isolated-process (Fission) jobs in regular mozilla-central builds?
Reporter | ||
Comment 13•2 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #11)
This document contains the central-as-beta simulations for Gecko version 108. Because they are Try pushes, their artifacts - including the one required for retriggers - expired if they are older than 4 weeks. The oldest push which reran the tasks had the test fail 1/9 runs.
Thanks a lot! Hm, this oldest push
has the revision 1ca248f03a7cae5f49e0c1fc66a97eac9c01c1a4
but I'm not able to find that in the before mentioned document. Nevertheless it is from November 9th, and checking jobs on the previous simulation from November 8th everything is fine. As such I would assume the regression got it between those two days.
That would include the following changes on mozilla-central:
(In reply to Chris Peterson [:cpeterson] from comment #12)
Irene, do you have any insight into this bug? Apparently, some WPTs are failing only in debug-isolated-process (Fission) jobs and only in mozilla-beta and central-as-beta simulations, not debug-isolated-process (Fission) jobs in regular mozilla-central builds?
Note that it's not limited to debug-isolated-process
for beta builds, it's affecting regular builds as well. Now after the recent beta to release merge the failure also started to happen on the mozilla-release branch.
Reporter | ||
Comment 14•2 years ago
|
||
I'm going to move this bug to the GeckoView product for now given that this is an unexpected close (crash) of the application and not related to the specific webdriver test.
Comment 15•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #13)
Thanks a lot! Hm, this
oldest push
has the revision1ca248f03a7cae5f49e0c1fc66a97eac9c01c1a4
but I'm not able to find that in the before mentioned document.
The document mentions the mozilla-central revisions which have been used as base for the simulations. At least the config change for the simulation is applied as a commit on top.
Reporter | ||
Comment 16•2 years ago
|
||
There was the first beta-sim for 109b1 and it still shows the failure.
A successful load for this page (from forward.py of the same CI job, which was actually passing that time) looks like the following:
12-07 14:23:39.775 17932 17947 I Gecko : 1670423019775 Marionette TRACE [5] Received event beforeunload for about:blank
12-07 14:23:39.784 18063 18078 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
12-07 14:23:39.793 18063 18078 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
12-07 14:23:39.794 18063 18078 D GeckoViewAutoFill[C]: handleEvent: pagehide
12-07 14:23:39.815 18063 18078 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin" line: 0}]
12-07 14:23:39.819 17932 17947 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
12-07 14:23:39.819 17932 17947 W GeckoEventDispatcher: No listener for GeckoView:DismissClipboardPermissionRequest
12-07 14:23:39.826 17932 17947 I Gecko : 1670423019826 Marionette TRACE [5] Received event pagehide for about:blank
12-07 14:23:39.827 17932 17947 D GeckoViewAutoFill: receiveMessage Clear
12-07 14:23:39.829 18063 18078 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
12-07 14:23:39.838 17932 17947 D GeckoViewNavigation: onLocationChange
12-07 14:23:39.839 17932 17932 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin
12-07 14:23:39.867 17993 18017 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-07 14:23:39.875 17932 17947 D GeckoViewContent: handleEvent: pagetitlechanged
12-07 14:23:39.882 17932 17947 I Gecko : 1670423019882 Marionette TRACE [5] Received event DOMContentLoaded for https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin
12-07 14:23:39.939 17993 18017 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-07 14:23:39.940 17932 17947 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
12-07 14:23:40.029 18063 18078 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
12-07 14:23:40.033 18063 18078 W Web Content: [JavaScript Warning: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
12-07 14:23:40.045 18063 18078 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
12-07 14:23:40.046 17932 17947 D GeckoViewNavigation: onLocationChange
12-07 14:23:40.047 17932 17932 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html
12-07 14:23:40.052 18063 18078 D GeckoViewContent[C]: handleEvent: pageshow
12-07 14:23:40.053 18063 18078 D GeckoViewAutoFill[C]: handleEvent: pageshow
12-07 14:23:40.057 18063 18078 D GeckoViewContent[C]: handleEvent: pageshow
12-07 14:23:40.059 18063 18078 D GeckoViewAutoFill[C]: handleEvent: pageshow
12-07 14:23:40.060 17993 18017 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-07 14:23:40.067 18063 18078 D GeckoViewContentDelegateChild[C]: handleEvent: MozFirstContentfulPaint
12-07 14:23:40.071 17932 17947 I Gecko : 1670423020071 Marionette TRACE [5] Received event pageshow for https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin
12-07 14:23:40.072 17932 17947 I Gecko : 1670423020072 Marionette DEBUG 0 <- [1,5,null,{"value":null}]
And this is how it looks like for a failing case:
12-07 14:17:31.055 4392 4407 I Gecko : 1670422651055 Marionette TRACE [4] Received event beforeunload for about:blank
12-07 14:17:31.156 4525 4540 D GeckoViewContent[C]: receiveMessage: CollectSessionState
12-07 14:17:31.169 4392 4407 D GeckoViewXUL: WillChangeBrowserRemoteness
12-07 14:17:31.173 4525 4540 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
12-07 14:17:31.174 4392 4407 D GeckoViewXUL: DidChangeBrowserRemoteness
12-07 14:17:31.174 4392 4407 D GeckoViewNavigation: sessionContextId=null
12-07 14:17:31.174 4525 4540 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
12-07 14:17:31.175 4525 4540 D GeckoViewAutoFill[C]: handleEvent: pagehide
12-07 14:17:31.182 4454 4477 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-07 14:17:31.189 4525 4540 D GeckoViewContent[C]: receiveMessage: RestoreHistoryAndNavigate
12-07 14:17:31.193 4392 4407 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
12-07 14:17:31.194 4454 4477 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-07 14:17:31.194 4392 4407 D GeckoViewActorParent: receiveMessage window destroyed DispatcherMessage GeckoView:DismissClipboardPermissionRequest
12-07 14:17:31.195 4392 4407 I Gecko : 1670422651195 Marionette TRACE [4] Received event pagehide for about:blank
12-07 14:17:31.196 4392 4407 D GeckoViewAutoFill: receiveMessage Clear
12-07 14:17:31.216 4454 4477 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-07 14:17:31.216 4392 4407 D GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"SessionStateAggregator"}
12-07 14:17:31.246 4525 4540 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
12-07 14:17:31.250 4525 4540 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
12-07 14:17:31.251 4525 4540 D GeckoViewAutoFill[C]: handleEvent: pagehide
12-07 14:17:31.265 4525 4540 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin" line: 0}]
12-07 14:17:31.270 4392 4407 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
12-07 14:17:31.270 4392 4407 W GeckoEventDispatcher: No listener for GeckoView:DismissClipboardPermissionRequest
12-07 14:17:31.277 4392 4407 D GeckoViewAutoFill: receiveMessage Clear
12-07 14:17:31.279 4525 4540 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
12-07 14:17:31.283 4392 4407 D GeckoViewNavigation: onLocationChange
12-07 14:17:31.285 4392 4392 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin
12-07 14:17:31.286 4392 4407 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
12-07 14:17:31.306 4392 4407 D GeckoViewContent: handleEvent: pagetitlechanged
12-07 14:17:31.317 4454 4477 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
12-07 14:17:31.339 4392 4407 I Gecko : 1670422651339 Marionette TRACE Received observer notification browser-delayed-startup-finished
12-07 14:17:31.362 4525 4540 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
12-07 14:17:31.366 4392 4407 D GeckoViewNavigation: onLocationChange
12-07 14:17:31.367 4525 4540 W Web Content: [JavaScript Warning: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
12-07 14:17:31.368 4392 4392 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html
12-07 14:17:31.375 4525 4540 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
12-07 14:17:31.381 4525 4540 D GeckoViewContent[C]: handleEvent: pageshow
12-07 14:17:31.383 4525 4540 D GeckoViewAutoFill[C]: handleEvent: pageshow
12-07 14:17:31.384 4454 4477 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
12-07 14:17:31.386 4525 4540 D GeckoViewContent[C]: handleEvent: pageshow
12-07 14:17:31.387 4525 4540 D GeckoViewAutoFill[C]: handleEvent: pageshow
12-07 14:17:31.394 4525 4540 D GeckoViewContentDelegateChild[C]: handleEvent: MozFirstContentfulPaint
[...]
12-07 14:17:35.581 4392 4407 D GeckoIdleService: Idle timer callback: tell observer 0x7db276fb2188 user is idle
12-07 14:19:00.630 4392 4407 I Gecko : 1670422740629 Marionette DEBUG Closed connection 0
This is causing the test to timeout and as such the wpt harness kills Firefox. The question is why it actually hangs. As it looks like none of the DOMContentLoaded
nor pageshow
events seem to make it to the parent process where Marionette is listening for them via a webProgress listener. The one difference that I can see is the heavily delayed browser-delayed-startup-finished
observer notification, which in case of a failure is emitted now in the middle of the active page load. In case of a working test this notification is send out after the tab has been created and before the initial navigation for that tab gets started, and in a failing case it happens later.
Reporter | ||
Comment 17•2 years ago
|
||
When opening a tab in Marionette we make use of the createNewTab
helper in:
https://searchfox.org/mozilla-central/source/mobile/android/modules/geckoview/GeckoViewTestUtils.jsm#23
Not sure if this helper maybe has to wait for browser-delayed-startup-finished
itself? But nevertheless it would be good to know why this notification is delayed that much.
Comment 18•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #0)
As it looks like this only affects
debug-isolated-process
(Fission) jobs in CI and something is killing the GeckoView process during navigation.
The test that is intermittently failing is W-nofis(Wd3) (so not Fission) and currently affects Android debug-isolated-process and regular shippable builds on Beta:
https://treeherder.mozilla.org/jobs?repo=mozilla-beta&tier=1%2C2%2C3&searchStr=Android%2Cwd3
This test passes consistently on mozilla-central, but fails more often than it passes on mozilla-beta. Why might this particular Wd3 test time out on Beta but not Nightly?
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 20•2 years ago
|
||
Chris, who from the GeckoView team could help me with the question in comment 16? Thanks.
Updated•2 years ago
|
Comment 21•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #20)
Chris, who from the GeckoView team could help me with the question in comment 16? Thanks.
Olivia is going to take a look in January.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Reporter | ||
Comment 25•2 years ago
|
||
This also happens for normal navigations and not only for forward and back. As such making it a generic Wdspec bug.
Comment 26•2 years ago
|
||
I plan on looking at this deeper soon. I'm not quite sure yet, but I have a vague sense that some of the discussion over on bug 1797558 could be relevant to this bug too. The GeckoViewTestUtils messaging to open tabs was modeled after the web extensions way of opening tabs, so they follow similar code paths in GeckoViewNavigation. That bug also has a similar beta/release difference occurring.
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 30•2 years ago
|
||
I had trouble reproducing this bug locally, but was consistently successful reproducing on try with a beta change set like this (didn't require language changes). I think there may be a race condition that has a chance to happen on CI because the bug never reproduced locally with the same changeset or even after downloading a failing CI Android APK artifact and testing locally against that.
Added some more detailed logging in try with changeset:
02-09 17:19:26.157 17287 17302 I Gecko : 1675963166157 Marionette DEBUG 0 -> [0,5,"WebDriver:Navigate",{"url":"https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin"}]
02-09 17:19:26.162 17287 17302 I Gecko : GV - 3 driver.sys.mjs this.currentSession.contentBrowsingContext: 4
02-09 17:19:26.164 17287 17302 I Gecko : GV - navigate.sys.mjs isLoadEventExpected
02-09 17:19:26.164 17287 17302 I Gecko : GV - navigate.sys.mjs isLoadEventExpected true
02-09 17:19:26.165 17287 17302 I Gecko : GV - navigate.sys.mjs - add event listener XULFrameLoaderCreated
02-09 17:19:26.175 1659 1733 E TaskPersister: File error accessing recents directory (directory doesn't exist?).
02-09 17:19:26.188 17287 17302 I Gecko : 1675963166188 Marionette TRACE [4] Received event beforeunload for about:blank
02-09 17:19:26.188 17287 17302 I Gecko : GV - navigate.sys.mjs onNavigation data.type: beforeunload
02-09 17:19:26.323 17287 17302 I GeckoDump: GV - gv.js - prepareToChangeRemoteness
02-09 17:19:26.328 17422 17437 D GeckoViewContent[C]: receiveMessage: CollectSessionState
02-09 17:19:26.433 17287 17302 I GeckoDump: GV - gv.js - willChangeBrowserRemoteness
02-09 17:19:26.433 17287 17302 D GeckoViewXUL: WillChangeBrowserRemoteness
02-09 17:19:26.433 17287 17302 I GeckoDump: GV - b-custom-elements.js - destroy
02-09 17:19:26.439 17422 17437 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
02-09 17:19:26.440 17422 17437 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
02-09 17:19:26.441 17422 17437 D GeckoViewAutoFill[C]: handleEvent: pagehide
02-09 17:19:26.446 17287 17302 I GV - : nsFrameLoaderOwner.cpp XULFrameLoaderCreated aOwner IsXULELEMENT
02-09 17:19:26.446 17287 17302 I Gecko : GV - navigate.sys.mjsonBrowsingContextChanged
02-09 17:19:26.446 17287 17302 I Gecko : GV - navigate.sys.mjsonBrowsingContextChanged - true
02-09 17:19:26.447 17287 17302 I GeckoDump: GV - b-custom-elements.js - construct
02-09 17:19:26.448 17287 17302 I GeckoDump: GV - b-custom-elements.js - this.isRemoteBrowser = true
02-09 17:19:26.448 17287 17302 I Gecko : GV - RWN.jsm - constructor
02-09 17:19:26.449 17287 17302 I GeckoDump: GV - b-custom-elements.js - construct !this.hasAttribute('disablehistory;) = true
02-09 17:19:26.449 17287 17302 I GeckoDump: GV - gv.js - didChangeBrowserRemoteness
02-09 17:19:26.449 17287 17302 D GeckoViewXUL: DidChangeBrowserRemoteness
02-09 17:19:26.450 17287 17302 D GeckoViewNavigation: sessionContextId=null
02-09 17:19:26.450 17287 17302 I Gecko : GV - GVN.jsm onInitBrowser
02-09 17:19:26.450 17287 17302 I GeckoDump: GV - gv.js - afterBrowserRemotenessChange
02-09 17:19:26.452 17287 17302 I Gecko : [Parent 17287, Main Thread] WARNING: '!window', file /builds/worker/checkouts/gecko/mobile/android/components/geckoview/GeckoViewHistory.cpp:266
02-09 17:14:15.296 4422 4437 I Gecko : 1675962855296 Marionette DEBUG 0 -> [0,5,"WebDriver:Navigate",{"url":"https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin"}]
02-09 17:14:15.297 4422 4437 I Gecko : GV - 3 driver.sys.mjs this.currentSession.contentBrowsingContext: 4
02-09 17:14:15.302 4422 4437 I Gecko : GV - navigate.sys.mjs isLoadEventExpected
02-09 17:14:15.302 4422 4437 I Gecko : GV - navigate.sys.mjs isLoadEventExpected true
02-09 17:14:15.303 4422 4437 I Gecko : GV - navigate.sys.mjs - add event listener XULFrameLoaderCreated
02-09 17:14:15.309 4484 4509 W webrender::device::gl: Cropping texture upload Box2D((0, 0), (0, 1)) to None
02-09 17:14:15.311 4484 4509 W webrender::device::gl: Cropping texture upload Box2D((0, 0), (0, 1)) to None
02-09 17:14:15.313 4422 4437 I Gecko : 1675962855312 Marionette TRACE [4] Received event beforeunload for about:blank
02-09 17:14:15.313 4422 4437 I Gecko : GV - navigate.sys.mjs onNavigation data.type: beforeunload
02-09 17:14:15.315 4484 4509 W webrender::device::gl: Cropping texture upload Box2D((0, 0), (0, 1)) to None
02-09 17:14:15.332 4484 4509 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
02-09 17:14:15.342 4484 4509 W webrender::device::gl: Cropping texture upload Box2D((0, 0), (0, 1)) to None
02-09 17:14:15.344 4484 4509 W webrender::device::gl: Cropping texture upload Box2D((0, 0), (0, 1)) to None
02-09 17:14:15.349 4484 4509 W webrender::device::gl: Cropping texture upload Box2D((0, 0), (0, 1)) to None
02-09 17:14:15.398 4422 4437 I GeckoDump: GV - gv.js - prepareToChangeRemoteness
02-09 17:14:15.401 4556 4571 D GeckoViewContent[C]: receiveMessage: CollectSessionState
02-09 17:14:15.411 4422 4437 I GeckoDump: GV - gv.js - willChangeBrowserRemoteness
02-09 17:14:15.412 4422 4437 D GeckoViewXUL: WillChangeBrowserRemoteness
02-09 17:14:15.412 4422 4437 I GeckoDump: GV - b-custom-elements.js - destroy
02-09 17:14:15.416 4422 4437 I GV - : nsFrameLoaderOwner.cpp XULFrameLoaderCreated aOwner IsXULELEMENT
02-09 17:14:15.416 4556 4571 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
02-09 17:14:15.416 4422 4437 I Gecko : GV - driver.sys.mjs XULFrameLoaderCreated event
02-09 17:14:15.416 4422 4437 I Gecko : GV - driver.sys.mjs XULFrameLoaderCreated event target === this.curBrowser.contentBrowser
02-09 17:14:15.417 4422 4437 I Gecko : 1675962855417 Marionette TRACE Remoteness change detected. Set new top-level browsing context to 5
02-09 17:14:15.417 4422 4437 I Gecko : GV - 2 driver.sys.mjs this.currentSession.contentBrowsingContext: 5
02-09 17:14:15.417 4422 4437 I Gecko : GV - navigate.sys.mjsonBrowsingContextChanged
02-09 17:14:15.417 4422 4437 I Gecko : GV - navigate.sys.mjsonBrowsingContextChanged - true
02-09 17:14:15.417 4556 4571 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
02-09 17:14:15.417 4422 4437 I GeckoDump: GV - b-custom-elements.js - construct
02-09 17:14:15.418 4422 4437 I GeckoDump: GV - b-custom-elements.js - this.isRemoteBrowser = true
02-09 17:14:15.418 4422 4437 I Gecko : GV - RWN.jsm - constructor
02-09 17:14:15.418 4556 4571 D GeckoViewAutoFill[C]: handleEvent: pagehide
02-09 17:14:15.418 4422 4437 I GeckoDump: GV - b-custom-elements.js - construct !this.hasAttribute('disablehistory;) = true
02-09 17:14:15.418 4422 4437 I GeckoDump: GV - gv.js - didChangeBrowserRemoteness
02-09 17:14:15.418 4422 4437 D GeckoViewXUL: DidChangeBrowserRemoteness
02-09 17:14:15.419 4422 4437 D GeckoViewNavigation: sessionContextId=null
02-09 17:14:15.419 4422 4437 I Gecko : GV - GVN.jsm onInitBrowser
02-09 17:14:15.420 4422 4437 I GeckoDump: GV - gv.js - afterBrowserRemotenessChange
02-09 17:14:15.421 4422 4437 I Gecko : [Parent 4422, Main Thread] WARNING: '!window', file /builds/worker/checkouts/gecko/mobile/android/components/geckoview/GeckoViewHistory.cpp:266
Root issue seems most related to around XULFrameLoaderCreated
events. In the failing case, after the XULFrameLoaderCreated
event, it continues through navigate.sys.mjs waitForNavigationCompleted. On a passing case, it reacts to the XULFrameLoaderCreated
event . The event listener for XULFrameLoaderCreated
was set for both cases.
May have some relation to bug 1786640, as both have conversations around XULFrameLoaderCreated
.
(TaskPersister: File error accessing recents directory (directory doesn't exist?).
does show up consistently around the same time for failing case, but it is in passing case too - source.)
Comment hidden (Intermittent Failures Robot) |
Comment 32•2 years ago
|
||
Will check the logs to see if I see a connection between this and bug 1786640. Note that the latest findings over at bug 1786640 & its followup indicate that the difference in XULFrameLoaderCreated events seem connected to the way clicks are simulated on Desktop compared to Android.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 36•2 years ago
|
||
(In reply to Olivia Hall [:olivia] from comment #30)
Root issue seems most related to around
XULFrameLoaderCreated
events. In the failing case, after theXULFrameLoaderCreated
event, it continues through navigate.sys.mjs waitForNavigationCompleted. On a passing case, it reacts to theXULFrameLoaderCreated
event . The event listener forXULFrameLoaderCreated
was set for both cases.
Interesting. Could you maybe describe what exactly happens in GeckoView for a remoteness change? In Marionette we are adding the event listener on the window
object. Maybe the remoteness change is causing the window to be replaced? It could explain why the registered handler is no longer called and we miss that event.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment 44•2 years ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #36)
(In reply to Olivia Hall [:olivia] from comment #30)
Root issue seems most related to around
XULFrameLoaderCreated
events. In the failing case, after theXULFrameLoaderCreated
event, it continues through navigate.sys.mjs waitForNavigationCompleted. On a passing case, it reacts to theXULFrameLoaderCreated
event . The event listener forXULFrameLoaderCreated
was set for both cases.Interesting. Could you maybe describe what exactly happens in GeckoView for a remoteness change? In Marionette we are adding the event listener on the
window
object. Maybe the remoteness change is causing the window to be replaced? It could explain why the registered handler is no longer called and we miss that event.
I'm going to outline my understanding of the flow. I'm having trouble forming a good answer to this question or finding solid evidence, but what you described sounds reasonable to me too.
- New Tab Request -> serviceWorkerOpenWindow-> createSession
- Window Created
- ...
- geckoview.js prepareToChangeRemoteness - sets up session state
- geckoview - willChangeBrowserRemoteness - destroy's modules
- browser-custom-element.js destroy - noticed in the failure case that
visibilitychange
andpagehide
events are logged beforeXULFrameLoaderCreated
. ThatonPageHide
keeps catching my attention. - XULFrameLoaderCreated - dispatched
- ...
- Failure case path
- Pass case path
I'll also ask if any other GeckoView engineers have any insight into this area too.
Comment 45•1 years ago
|
||
I have been doing beta simulations for 115 today, and I no longer get the failure: https://treeherder.mozilla.org/jobs?repo=try&revision=a04fdfb2c9566610e32fe8a946d872d11d65732d
This push had other tests in the file commented out so I will do other try pushes, but I wonder if this got fixed in the last month?
Comment 46•1 years ago
|
||
I vaguely wonder if this issue was related or a symptom of bug 1815015, which was solved last month in 114 and impacts browser activeness.
Comment 47•1 years ago
|
||
(In reply to Olivia Hall [:olivia] from comment #46)
I vaguely wonder if this issue was related or a symptom of bug 1815015, which was solved last month in 114 and impacts browser activeness.
Thanks for the hint! I tried to check the bug you mentioned but unfortunately I'm unable to get failures on try either before or after the landing of Bug 1815015. After that I went back to a previous try push which was able to trigger the intermittent, pushed that again, and was unable to get failures there as well.
So I'm not sure if this is for some reason not happening on CI anymore, or if I'm doing something wrong with my pushes.
Olivia, do you have a (recent-ish) try push which reproduces the issue?
Comment 48•1 years ago
|
||
I sent some pushes up trying to reproduce. A few of them after the 30 day mark and beyond look like possible reproductions, but I'm not entirely sure, what do you think?
At the 30 day mark and later there are some forward.py
and backwards.py
signatures, but it looks a little different from the earlier signature I was using as a reference. However, I see other linked reference bugs with a signature more like that.
May 22 - run 1 115.0.1, run 2 115.0, run 3 (saw a bug that increased the timeout in mochitests, thought it might be somehow related to this harness, but no difference)
Apr 23 - 30 days - run 1 114.0.1, run 2 114.0
Apr 13 - 40 days - run 1 114.0.1, run 2 114.0
Mar 29 - linked reference run
Mar 24 - 60 days - run 1 113.0.1, run 2 113.0
Mar 9 - linked reference run
Feb 3 - tried to replay push of what I think is c30 or similar - run 1 111.0 has toolchain errors
Comment 49•1 years ago
|
||
Thanks a lot for checking!
All my recent pushes are using a central which is slightly more recent than the Apr 23 where you start seeing the first failures.
I think the signature you have on those older jobs matches this intermittent (or at least this was the one I was investigating). Given that recent jobs don't have this error anymore, I think we can close this as WORKSFORME for now and reopen if the issue comes back?
Reporter | ||
Comment 50•1 years ago
|
||
That sounds fine to me. Thank you both for these deep investigations! Lets hope that this issue doesn't return.
Reporter | ||
Comment 51•1 years ago
|
||
Actually my patch on bug 1760554 changed the behavior here. With it we do no longer run into a test module wide timeout as this bug was about, but only see a test failure:
https://treeherder.mozilla.org/logviewer?job_id=416750530&repo=mozilla-beta&lineNumber=21066-21068
[task 2023-05-23T18:16:26.767Z] 18:16:26 INFO - PID 2779 | 1684865786766 webdriver::server DEBUG -> POST /session/14a2b694-6e0b-408c-a559-3cf6f3a76abf/url {"url": "https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin"}
[task 2023-05-23T18:16:35.791Z] 18:16:35 INFO - PID 2779 | 1684865795790 webdriver::server DEBUG <- 500 Internal Server Error {"value":{"error":"timeout","message":"Navigation timed out after 9000 ms","stacktrace":"RemoteE ... rome://remote/content/shared/webdriver/Errors.sys.mjs:503:5\nbail@chrome://remote/content/marionette/sync.sys.mjs:213:19\n"}}
[task 2023-05-23T18:16:35.848Z] 18:16:35 INFO - STDOUT: FAILED
The Android log cat shows quite a lot of deadlocks for gfxFontEntry
which may be related to this issue?
05-23 18:16:26.345 17864 17879 I Gecko : 1684865786345 Marionette DEBUG 0 -> [0,8,"WebDriver:Navigate",{"url":"https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin"}]
05-23 18:16:26.369 17864 17879 I Gecko : 1684865786369 Marionette TRACE [4] Received event beforeunload for about:blank
05-23 18:16:26.513 17996 18010 D GeckoViewContent[C]: receiveMessage: CollectSessionState
05-23 18:16:26.545 17864 17879 D GeckoViewXUL: WillChangeBrowserRemoteness
05-23 18:16:26.551 17996 18010 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
05-23 18:16:26.552 17864 17879 D GeckoViewXUL: DidChangeBrowserRemoteness
05-23 18:16:26.552 17996 18010 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
05-23 18:16:26.552 17864 17879 D GeckoViewNavigation: sessionContextId=null
05-23 18:16:26.552 17996 18010 D GeckoViewAutoFill[C]: handleEvent: pagehide
05-23 18:16:26.554 17864 17879 I Gecko : [Parent 17864, Main Thread] WARNING: '!window', file /builds/worker/checkouts/gecko/mobile/android/components/geckoview/GeckoViewHistory.cpp:266
05-23 18:16:26.560 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
05-23 18:16:26.580 17864 17879 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
05-23 18:16:26.581 17864 17879 D GeckoViewActorParent: receiveMessage window destroyed DispatcherMessage GeckoView:DismissClipboardPermissionRequest
05-23 18:16:26.581 17996 18010 D GeckoViewContent[C]: receiveMessage: RestoreHistoryAndNavigate
05-23 18:16:26.582 17864 17879 D GeckoViewAutoFill: receiveMessage Clear
05-23 18:16:26.583 17864 17879 I Gecko : 1684865786583 Marionette TRACE [4] Received event pagehide for about:blank
05-23 18:16:26.614 17864 17879 D GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"SessionStateAggregator"}
05-23 18:16:26.651 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
05-23 18:16:26.653 17864 17879 I Gecko : [Parent 17864, Main Thread] WARNING: 'aOwner->IsDiscarded()', file /builds/worker/workspace/obj-build/dist/include/mozilla/dom/SyncedContextInlines.h:95
05-23 18:16:26.664 17996 18010 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
05-23 18:16:26.665 17996 18010 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
05-23 18:16:26.666 17996 18010 D GeckoViewAutoFill[C]: handleEvent: pagehide
05-23 18:16:26.682 17996 18010 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin" line: 0}]
05-23 18:16:26.693 17864 17879 D GeckoViewClipboardPermissionParent: receiveMessage: DispatcherMessage
05-23 18:16:26.694 17864 17879 W GeckoEventDispatcher: No listener for GeckoView:DismissClipboardPermissionRequest
05-23 18:16:26.695 17864 17879 D GeckoViewAutoFill: receiveMessage Clear
05-23 18:16:26.699 17996 18010 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
05-23 18:16:26.701 1659 1733 E TaskPersister: File error accessing recents directory (directory doesn't exist?).
05-23 18:16:26.711 17864 17879 D GeckoViewNavigation: onLocationChange
05-23 18:16:26.712 17864 17864 D GeckoSession: handleMessage GeckoView:LocationChange uri=<https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin>
05-23 18:16:26.735 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
05-23 18:16:26.785 17864 17879 D GeckoViewContent: handleEvent: pagetitlechanged
05-23 18:16:26.854 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
05-23 18:16:26.893 17996 18010 W Web Content: [JavaScript Warning: "This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>â€." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
05-23 18:16:26.897 17864 17879 D GeckoViewNavigation: onLocationChange
05-23 18:16:26.898 17996 18010 W Web Content: [JavaScript Warning: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it." {file: "https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html" line: 0}]
05-23 18:16:26.898 17864 17864 D GeckoSession: handleMessage GeckoView:LocationChange uri=<https://web-platform.test:8443/webdriver/tests/support/html/deleteframe.html>
05-23 18:16:26.907 17996 18010 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
05-23 18:16:26.911 17996 18010 I Gecko : [Child 17996, Main Thread] WARNING: Potential deadlock detected:
05-23 18:16:26.911 17996 18010 I Gecko : Cyclical dependency starts at
05-23 18:16:26.911 17996 18010 I Gecko : Mutex : gfxFontEntry lock
05-23 18:16:26.911 17996 18010 I Gecko : Next dependency:
05-23 18:16:26.911 17996 18010 I Gecko : Mutex : SharedFTFace::mLock (currently acquired)
05-23 18:16:26.911 17996 18010 I Gecko : Cycle completed at
05-23 18:16:26.911 17996 18010 I Gecko : Mutex : gfxFontEntry lock
05-23 18:16:26.911 17996 18010 I Gecko : Deadlock may happen for some other execution
05-23 18:16:26.911 17996 18010 I Gecko :
05-23 18:16:26.911 17996 18010 I Gecko : : file /builds/worker/checkouts/gecko/xpcom/threads/BlockingResourceBase.cpp:247
[..]
05-23 18:16:26.918 17996 18010 D GeckoViewContent[C]: handleEvent: pageshow
05-23 18:16:26.918 17996 18010 D GeckoViewAutoFill[C]: handleEvent: pageshow
05-23 18:16:26.922 17996 18010 D GeckoViewContent[C]: handleEvent: pageshow
05-23 18:16:26.924 17996 18010 D GeckoViewAutoFill[C]: handleEvent: pageshow
05-23 18:16:26.933 17996 18010 D GeckoViewContentDelegateChild[C]: handleEvent: MozFirstContentfulPaint
05-23 18:16:26.936 17864 17879 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
05-23 18:16:26.948 17864 17879 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
05-23 18:16:26.954 17864 17879 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
05-23 18:16:26.965 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
05-23 18:16:27.006 17864 17879 I Gecko : 1684865787006 Marionette TRACE Received observer notification browser-delayed-startup-finished
05-23 18:16:27.063 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
05-23 18:16:27.217 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
05-23 18:16:27.285 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
05-23 18:16:27.465 17926 17952 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008741
05-23 18:16:27.589 17864 17879 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
05-23 18:16:30.656 17864 17879 D GeckoIdleService: Get idle time: time since reset 5010 msec
05-23 18:16:30.657 17864 17879 D GeckoIdleService: Idle timer callback: current idle time 5010 msec
05-23 18:16:30.657 17864 17879 D GeckoIdleService: next timeout 4294967289989 msec from now
05-23 18:16:30.657 17864 17879 D GeckoIdleService: SetTimerExpiryIfBefore: next timeout 4294967289989 msec from now
05-23 18:16:30.657 17864 17879 D GeckoIdleService: reset timer expiry to 4294967289999 msec from now
05-23 18:16:30.657 17864 17879 D GeckoIdleService: Idle timer callback: tell observer 0x771839a3bc68 user is idle
05-23 18:16:35.355 17864 17879 I Gecko : 1684865795355 Marionette DEBUG 0 <- [1,8,{"error":"timeout","message":"Navigation timed out after 9000 ms","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8\nWebDriverError@chrome://remote/content/shared/webdriver/Er ... or@chrome://remote/content/shared/webdriver/Errors.sys.mjs:503:5\nbail@chrome://remote/content/marionette/sync.sys.mjs:213:19\n"},null]
Failures were present in the last two days again and have been classified against bug 1769430:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2023-04-24&endday=2023-05-24&tree=all&failurehash=all&bug=1769430
Reporter | ||
Comment 52•1 years ago
|
||
It's interesting as well that the following line is shown but we do not get this event in Marionette:
05-23 18:16:26.918 17996 18010 D GeckoViewContent[C]: handleEvent: pageshow
I assume it is for the same browsing context our navigation runs in?
Comment 53•1 years ago
|
||
Just noting it here, but on the 21st we switched from early to late beta. This could have an impact if the issue is related to a preference checking EARLY_BETA_OR_EARLIER, but I think all the beta simulations in comment 48 are late beta, so maybe this has nothing to do with it...
Comment 54•1 years ago
|
||
The Android log cat shows quite a lot of deadlocks for
gfxFontEntry
which may be related to this issue?
The font deadlock sounds like it could be involved to me too - it is tracked over in bug 1827756 for causing fission test timeouts.
The beta criteria is really strange to me too. I don't quite understand what part causes the issue to occur.
At one point, I downloaded a failing Treeherder Geckoview Test Runner Android APK and tried testing against it locally. (IIRC, I had to do something special with the naming to prevent it from rebuilding and replacing the downloaded APK.) However, running the tests against the APK on my machine didn't reproduce the failure.
Maybe something is slowing things down just enough on the CI emulator to cause unexpected race conditions when in beta, like the font deadlock?
Comment 55•1 years ago
|
||
I did 4 pushes this morning, 2 early beta, 2 late beta. I didn't manage to get the intermittent once.
https://treeherder.mozilla.org/jobs?repo=try&author=jdescottes%40mozilla.com&tochange=6c803a10f66d7412f9230e195152f72d86e220c0&fromchange=6cccfc7dd7ea6e0385830ce602c40a4fd527b94b
Am I doing anything wrong with those pushes? Do you update anything apart from doing a beta simulation?
Comment 56•1 years ago
|
||
Those pushes looks like the ones that were reproducing the bug to me!
I didn't add anything additional. I skipped setting the browser/locales/l10n-changesets.json
and shipped-locales
, but on the real tree examples those were set. (I manually applied a beta patch and set the milestones. I didn't use the try release
command or anything, if that could somehow make a difference.)
Comment 57•1 years ago
•
|
||
Thanks for taking a look!
Given that there are no other reports since the 24th, it might be less frequent than it used to so I started retriggers.
Let's keep monitoring for now.
Edit: no failure over 200 retriggers of wd3 in total between the 4 pushes
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 60•1 year ago
•
|
||
Might be similar to bug 1825501 which affects all the other platforms as well. There have been 7 failures last week.
Reporter | ||
Updated•1 year ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 78•1 year ago
|
||
The following adb log is interesting:
https://firefoxci.taskcluster-artifacts.net/bNdLiApHRNudx2nDfekKXw/0/public/test_info/logcat-emulator-5554.log
At 10-12 13:02:24.614
we are opening a new tab, and then at 10-12 13:02:24.713
the initial page load starts and finishes at 10-12 13:02:24.834
. Right away we switch to that newly opened tab and start another navigation (10-12 13:02:24.853
) to a page as served by wptserve. Here how it looks like:
10-12 13:02:24.935 30562 30577 I Gecko : 1697115744935 Marionette TRACE Received observer notification browser-delayed-startup-finished
10-12 13:02:24.939 30949 30964 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
10-12 13:02:24.939 30562 30577 D GeckoViewContent: handleEvent: pagetitlechanged
10-12 13:02:24.954 30949 30964 D GeckoViewContentDelegateChild[C]: handleEvent: MozFirstContentfulPaint
10-12 13:02:24.956 30562 30577 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
10-12 13:02:24.960 30949 30964 W linker : /data/app/org.mozilla.geckoview.test_runner-1/lib/x86_64/libsoftokn3.so: unused DT entry: type 0x8000024 arg 0x24b8
10-12 13:02:24.960 30949 30964 W linker : /data/app/org.mozilla.geckoview.test_runner-1/lib/x86_64/libsoftokn3.so: unused DT entry: type 0x8000023 arg 0x68
10-12 13:02:24.960 30949 30964 W linker : /data/app/org.mozilla.geckoview.test_runner-1/lib/x86_64/libsoftokn3.so: unused DT entry: type 0x8000025 arg 0x8
10-12 13:02:24.960 30949 30964 W linker : /data/app/org.mozilla.geckoview.test_runner-1/lib/x86_64/libfreebl3.so: unused DT entry: type 0x8000024 arg 0xe98
10-12 13:02:24.961 30949 30964 W linker : /data/app/org.mozilla.geckoview.test_runner-1/lib/x86_64/libfreebl3.so: unused DT entry: type 0x8000023 arg 0x68
10-12 13:02:24.961 30949 30964 W linker : /data/app/org.mozilla.geckoview.test_runner-1/lib/x86_64/libfreebl3.so: unused DT entry: type 0x8000025 arg 0x8
10-12 13:02:24.965 30625 30649 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
10-12 13:02:24.977 30949 30964 D GeckoViewContentDelegateChild[C]: handleEvent: DOMContentLoaded
10-12 13:02:24.978 30562 30577 D GeckoViewNavigation: onLocationChange
10-12 13:02:24.979 30562 30562 D GeckoSession: handleMessage GeckoView:LocationChange uri=https://web-platform.test:8443/webdriver/tests/support/inline.py?doc=%3C%21doctype+html%3E%0A%3Cmeta+charset%3DUTF-8%3E%0A%3Cdiv+id%3D%22in-frame%22%3E%3Cinput+type%3D%22checkbox%22%2F%3E%3C%2Fdiv%3E&mime=text%2Fhtml&charset=UTF-8&pipe=header%28Cross-Origin-Opener-Policy%2Csame-origin%29
10-12 13:02:24.982 30949 30964 D GeckoViewContent[C]: handleEvent: pageshow
10-12 13:02:24.983 30949 30964 D GeckoViewAutoFill[C]: handleEvent: pageshow
10-12 13:02:24.985 30949 30964 D GeckoViewContent[C]: handleEvent: pageshow
10-12 13:02:24.986 30949 30964 D GeckoViewAutoFill[C]: handleEvent: pageshow
10-12 13:02:24.996 30625 30649 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008caa
10-12 13:02:24.998 30562 30577 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
10-12 13:02:25.000 30562 30577 D GeckoViewContentDelegateParent: receiveMessage: DispatcherMessage
10-12 13:02:25.904 30562 30577 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
10-12 13:02:37.815 30562 30577 I Gecko : 1697115757815 Marionette DEBUG 0 <- [1,179,{"error":"timeout","message":"Navigation timed out after 12000 ms","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8\nWebDriverError@chrome://remote/content/shared/webdriver/Er ... or@chrome://remote/content/shared/webdriver/Errors.sys.mjs:691:5\nbail@chrome://remote/content/marionette/sync.sys.mjs:211:19\n"},null]
Notice that there are 4 entries from GeckoViewContent about a pageshow
event being handled, but we never actually receive this event in the Marionette's wait for navigation completed method. So why is that the case? Maybe due to the process switch we are too late in registering the load
event listener? In the MarionetteEventsActor's definition we you DOMContentLoaded
and pageshow
, but DOMContentLoaded
should have already created the actor so that we do not miss to handle pageshow
. We do not explicitly handle load
but that's most likely fine given that this usually works.
I wonder if moving away from event handling to the progress listener might help here so that we get also proper logs created and might not miss possible progress changes.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 93•10 months ago
|
||
I actually don't think that we should track this separately from bug 1825501. Both have the same underlying issue. As such lets dupe.
Reporter | ||
Updated•10 months ago
|
Description
•