Intermittent raptor-main Critical: Tests failed to finish! Application timed out.
Categories
(Testing :: Raptor, defect, P1)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [comment 75][stockwell disable-recommended])
Attachments
(2 files)
Filed by: rmaries [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=257742678&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/D0lQPKNfRAaQXjScbx2DwQ/runs/0/artifacts/public/logs/live_backing.log
21:04:45 CRITICAL - raptor-main Critical: Tests failed to finish! Application timed out.
21:04:45 ERROR - raptor-main Error: Test failed to finish. Application timed out after 170 seconds
21:04:45 INFO - raptor-results-handler Info: summarizing raptor test results
21:04:45 ERROR - raptor-output Error: no raptor test results found for raptor-tp6m-wikipedia-geckoview-cold
21:04:45 INFO - raptor-output Info: error: no raptor test results found, so no need to combine browser cycles
21:04:45 ERROR - raptor-output Error: no summarized raptor results found for raptor-tp6m-wikipedia-geckoview-cold
21:04:45 CRITICAL - raptor-results-handler Critical: PERFHERDER_DATA was seen 0 times, expected 1.
21:04:45 INFO - raptor-main Info: removing test folder for raptor: /sdcard/raptor
21:04:46 INFO - adb shell_output: adb -s ZY3222CFLM wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, timeout: None, root: False, timedout: None, exitcode: 0, output:
21:04:46 INFO - raptor-control-server Info: shutting down control server
21:04:46 INFO - raptor-main Info: finished
21:04:46 ERROR - raptor-main Critical: TEST-UNEXPECTED-FAIL: no raptor test results were found for raptor-tp6m-wikipedia-geckoview-cold
21:04:46 ERROR - Return code: 1
Comment hidden (Intermittent Failures Robot) |
Comment 2•5 years ago
|
||
:bebe, could you (or someone on the SV team) please have a look and see if this is valid / if there is something to investigate here? Thanks!
Comment 3•5 years ago
|
||
There are 2 errors here:
- The app "goes unresponsive/crashes" during startup
* https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258538146&repo=mozilla-central&lineNumber=1880 - The app "goes unresponsive" during the test run
* https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258713871&repo=mozilla-central&lineNumber=1467
We would need to investigate both cases better.
Comment 4•5 years ago
•
|
||
I have investigated two logs so far, one for a passing test and one for a failing test.
It seems that the failure reason is related to raptor's runner.js file which is not loaded on the failing test.
The flow is blocked for ~ 3 minutes and then it fails with the timeout.
Passing test + logs:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=257742662&revision=6d98669f6869e4f050ba95d943c1577800e00462
https://taskcluster-artifacts.net/AssWdt-kSuyE74mgbGTV5A/0/public/logs/live_backing.log
Failing test + logs:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=257742678&revision=6d98669f6869e4f050ba95d943c1577800e00462
https://taskcluster-artifacts.net/D0lQPKNfRAaQXjScbx2DwQ/0/public/logs/live_backing.log
Side by side comparison of these 2 logs:
https://drive.google.com/file/d/1Y83cyZglXCtkkslg65RmwquAvLLSEkAg/view?usp=sharing
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 6•5 years ago
|
||
Over the last 7 days there are 50 failures present on this bug. These happen on: windows7-32-shippable, android-hw-p2-8-0-arm7-api-16, android-hw-g5-7-0-arm7-api-16.
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=260469132&repo=autoland&lineNumber=1380
Comment hidden (Intermittent Failures Robot) |
Comment 8•5 years ago
|
||
Tier 1 fail: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=260668239&repo=mozilla-central&lineNumber=1383
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 11•5 years ago
|
||
Joel, any idea what to do here? Most failures are on android-hw machines and it has 212 failures in the last 30 days.
Marian, could you write a patch for the failures? Is this something you actively work on?
Comment 12•5 years ago
|
||
these are only on the cold tests, 212 failures in the last 7 days is serious, I see good discussion in this bug and the right people on it, just not sure if this is a priority to fix (it is a P5, but one of the most frequent intermittent failures we have).
An alternative is to run cold load as tier-2 (m-c/try only) for the g5
:davehunt, can you ensure this is getting focus this week?
Comment hidden (Intermittent Failures Robot) |
Comment 14•5 years ago
|
||
From the earlier comments it sounds like we're timing out waiting for the application to open, or that the application is crashing. Can we investigate to see if there are any patterns in the devices that are failing? Perhaps we have one or two defective devices in the lab. I also wonder if there's additional debug we could gather when we're expecting the application to be running. As this is happening only for the cold page load tests, perhaps we're trying to launch the application before the previous process has completely terminated? :bc do you have any suggestions for how we might proceed here?
:rwood if there's no progress or improvement by the end of this week, could you arrange for temporarily dropping g5 cold tests to tier 2 and disable them running on autoland/inbound.
Comment 15•5 years ago
|
||
There is clear that we have a problem with runner.js file :
I analyzed the logcat from android device and we have some issues:
- NetworkError when fetching resources
- Javascript loading failures
- JavaScript SyntaxErrors - malformed URI for runner.js
- we receive "starting raptorRunner", so raptorRunner() function is reached, but then
we end up with -> JavaScript Error: "ReferenceError: getTestConfig is not defined"
and from that point the script is not moving forward to
raptorLog(test name is: ${config.test_name}
);
raptorLog(test settings url is: ${config.test_settings_url}
);
More details here:
https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor/runner.js#669-673
FULL LOGCAT:
https://taskcluster-artifacts.net/bKPPZN_vTB2ErmLk-7pCbg/0/public/test_info/logcat-ZY322FZT7B.log
LOGCAT EXCERPT:
08-07 16:10:00.841 7316 7331 D GeckoThread: State changed to RUNNING
08-07 16:10:00.853 7295 7310 D GeckoViewRemoteDebugger: USB remote debugger - listening on org.mozilla.geckoview_example/firefox-debugger-socket
08-07 16:10:00.854 7295 7310 D GeckoViewConsole: enabled = true
08-07 16:10:00.906 7295 7310 D GeckoViewModule: queue GeckoView:LoadUri, data={"uri":"about:blank","flags":0}
08-07 16:10:00.907 7295 7310 D GeckoViewModule: queue GeckoView:SetFocused, data={"focused":true}
08-07 16:10:00.908 7295 7310 D GeckoViewModule: queue GeckoView:SetActive, data={"active":true}
08-07 16:10:00.908 7295 7310 D GeckoViewModule: queue GeckoView:SetFocused, data={"focused":true}
--->>>> 08-07 16:10:01.024 7295 7310 E GeckoConsole: [JavaScript Error: "NetworkError when attempting to fetch resource."]
08-07 16:10:01.024 7295 7310 E GeckoConsole: get@resource://services-settings/RemoteSettingsClient.jsm:306:12
--->>>> 08-07 16:10:01.025 7295 7310 W GeckoConsole: [JavaScript Warning: "Loading failed for the <script> with source “moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/auto_gen_test_config.jsâ€." {file: "moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/_generated_background_page.html" line: 5}]
08-07 16:10:01.034 7295 7310 I Gecko : console.log: "[raptor-runnerjs] Attaching event listener successful!"
08-07 16:10:01.037 7295 7310 I GeckoConsole: [raptor-runnerjs] Attaching event listener successful!
--->>>> 08-07 16:10:01.038 7295 7310 E GeckoConsole: [JavaScript Error: "SyntaxError: The URI is malformed." {file: "moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/runner.js" line: 625}]
08-07 16:10:01.038 7295 7310 E GeckoConsole: postToControlServer@moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/runner.js:625:10
08-07 16:10:01.038 7295 7310 E GeckoConsole: @moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/runner.js:737:22
--->>>> 08-07 16:10:01.053 7295 7310 I Gecko : console.log: "[raptor-runnerjs] starting raptorRunner"
08-07 16:10:01.054 7295 7310 I GeckoConsole: [raptor-runnerjs] starting raptorRunner
--->>>> 08-07 16:10:01.054 7295 7310 E GeckoConsole: [JavaScript Error: "ReferenceError: getTestConfig is not defined" {file: "moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/runner.js" line: 671}]
--->>>> 08-07 16:10:01.054 7295 7310 E GeckoConsole: raptorRunner@moz-extension://e7ef8f4c-4c1c-4b46-883a-b13de60c5407/runner.js:671:16
08-07 16:10:01.082 7295 7310 D GeckoViewNavigation: onLocationChange
08-07 16:10:01.083 7295 7310 D GeckoViewProgress: onLocationChange: location=about:blank, flags=0
08-07 16:10:01.084 7295 7310 D GeckoViewProgress: onSecurityChange
08-07 16:10:01.087 7316 7331 D GeckoViewContent[C]: onInit
Comment 16•5 years ago
•
|
||
On another job:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=261228363&revision=1ca06cd3e226f80adf2b1b8837ed9a475fa90baa
We have this logcat :
https://taskcluster-artifacts.net/bQUQ7nTzQ2qIf_rxHMesZQ/0/public/test_info/logcat-ZY322MQ97M.log
After analysing the GeckoConsole lines i found this error:
GeckoConsole: [JavaScript Error: "Error while loading 'file:///storage/emulated/0/raptor/profile/extensions/raptor@mozilla.org/manifest.json' (NS_ERROR_FILE_ACCESS_DENIED)" {file: "resource://gre/modules/Extension.jsm" line: 522}]
If this manifest.json file is the same with:
https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor/manifest.json#12
We might lose the auto_gen_test_config.js which has the getTestConfig() definition, used by runner.js
Here is a copy of auto_gen_test_config.js from my local environment:
// this file is auto-generated by raptor, do not edit directly
function getTestConfig() {
return {"browser": "geckoview",
"cs_port": "52047",
"test_name": "raptor-tp6m-google-geckoview",
"test_settings_url": "http://127.0.0.1:52047/json/raptor-tp6m-google-geckoview.json",
"post_startup_delay": "30000",
"benchmark_port": "0",
"host": "127.0.0.1",
"debug_mode": "0",
"browser_cycle": "1"};
}
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 21•5 years ago
|
||
(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC; 🏖 PTO August 15-26 from comment #14)
:rwood if there's no progress or improvement by the end of this week, could you arrange for temporarily dropping g5 cold tests to tier 2 and disable them running on autoland/inbound.
I'll make a patch for ^
Updated•5 years ago
|
Comment 22•5 years ago
|
||
I don't have a g5 to test locally so I'm kind of flying in the dark here.
My first thought was about memory. The g5 devices appear to be from two cohorts: 1867MB and 2884MB. It seems the 1867MB devices have a higher failure rate.
The p2s all appear to have 3651MB. aerickson: Can you check with sakari or devarsh about the memory sizes on the devices?
Second thought has to do with the performance tweaks particularly the setting virtual memory parameters. Perhaps this is causing the low mem devices to fail due to swap issues?
Comment 23•5 years ago
|
||
Motorola G5's do come in two varieties: 16GB disk with 2GB RAM and 32GB disk with 3GB RAM.
All Pixel2's come with 4GB or RAM. Varieties: 64GB disk with 4GB RAM, 128GB disk with 4GB RAM
I've asked Bitbar for a report on the RAM for each of our G5's.
There are absolutely nodes that have a lower success ratio than others.
gecko-t-bitbar-gw-perf-g5.motog5-01 {sr: [ ] 0.0%, suc: 0, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-bitbar-gw-perf-g5.motog5-02 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-03 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-04 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-05 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-06 {sr: [ ] 5.3%, suc: 1, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-bitbar-gw-perf-g5.motog5-07 {sr: [ ] 0.0%, suc: 0, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-bitbar-gw-perf-g5.motog5-09 {sr: [==========] 100.0%, suc: 19, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-10 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-11 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-12 {sr: [==========] 100.0%, suc: 19, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-13 {sr: [======= ] 78.9%, suc: 15, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-bitbar-gw-perf-g5.motog5-14 {sr: [ ] 5.3%, suc: 1, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-bitbar-gw-perf-g5.motog5-16 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-17 {sr: [==========] 100.0%, suc: 19, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-18 {sr: [==========] 100.0%, suc: 19, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-19 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-20 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-21 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-22 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-23 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-24 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-25 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-26 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-27 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-28 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-29 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-30 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-31 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-32 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-33 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-34 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-35 {sr: [======== ] 84.2%, suc: 16, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-bitbar-gw-perf-g5.motog5-36 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-37 {sr: [========= ] 94.7%, suc: 18, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-38 {sr: [======== ] 89.5%, suc: 17, cmp: 19, exc: 0, rng: 1}
gecko-t-bitbar-gw-perf-g5.motog5-39 {sr: [======= ] 78.9%, suc: 15, cmp: 19, exc: 0, rng: 1, alerts: ['Low health (less than 0.85)!']}
Elapsed Time: 33.85214686393738
Comment 24•5 years ago
|
||
Here are the memory sizes for the g5s:
DEVICE_DISPLAY_NAME:motog5-01 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-02 MemTotal:1912812
DEVICE_DISPLAY_NAME:motog5-03 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-04 MemTotal:1912808
DEVICE_DISPLAY_NAME:motog5-05 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-06 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-07 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-08 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-09 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-10 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-11 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-12 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-13 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-14 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-15 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-16 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-17 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-18 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-19 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-20 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-21 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-22 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-23 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-24 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-25 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-26 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-27 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-28 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-29 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-30 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-31 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-32 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-33 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-34 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-35 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-36 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-37 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-38 MemTotal:2953344
DEVICE_DISPLAY_NAME:motog5-39 MemTotal:1912804
DEVICE_DISPLAY_NAME:motog5-40 MemTotal:1912804
Comment 25•5 years ago
|
||
we have 30 2G motog5 and 10 3G motog5. When I have some time to dig through the results, I'll try to get a picture of how the memory affects the success/failure rate for these two groups of devices. Perhaps we can segregate the tests so that tests which require more memory run on the high memory devices only.
davehunt, rwood: thoughts?
Comment 26•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #25)
we have 30 2G motog5 and 10 3G motog5. When I have some time to dig through the results, I'll try to get a picture of how the memory affects the success/failure rate for these two groups of devices. Perhaps we can segregate the tests so that tests which require more memory run on the high memory devices only.
davehunt, rwood: thoughts?
Interesting, thanks :bc. IMO instead of segregating the tests to run on the mogog5's with higher memory, we should just only run the tests that require more memory on the P2s and not on the motog5's at all.
Comment hidden (Intermittent Failures Robot) |
Comment 28•5 years ago
|
||
I looked at the last week of failures for g5 and see the following
Count, test, memory
33 tp6m-8, 1867
16 tp6m-27, 1867
11 tp6m-5, 1867
7 tp6m-4, 1867
5 tp6m-7, 1867
5 tp6m-2, 1867
4 unity-webgl, 1867
4 tp6m-3, 1867
3 tp6m-9, 1867
3 tp6m-8, 2884
3 tp6m-12, 1867
2 tp6m-6, 1867
2 tp6m-24, 1867
2 tp6m-21, 1867
2 tp6m-10, 2884
1 youtube-playback, 2884
1 unity-webgl, 2884
1 tp6m-24, 2884
1 tp6m-23, 1867
1 tp6m-20, 1867
1 tp6m-17, 2884
1 tp6m-16, 1867
1 tp6m-14, 1867
1 tp6m-11, 1867
If someone else could decide which tests to disable on g5 and follow through with a patch that would be awesome.
Comment 29•5 years ago
|
||
Are these all cold load tests? At least some of them must be, as the warm tests only go up to tp6m-9 and I see tp6m-8 in there twice. The cold tests only run a single site per test, whereas the warm tests run several. We could see if redistributing the chunks helps for any warm tests affected here. For cold tests, we should consider if we have a problem if we're running out of memory opening a single site 25 times. I don't think we want to disable these tests against the moto g5, as this is our reference device for low end. Perhaps a better question is should we review our reference devices, and what should the minimum memory be?
Updated•5 years ago
|
Comment 30•5 years ago
|
||
I didn't break it down sufficiently to answer that question. I'll redo it later today and get back to you.
Comment 31•5 years ago
|
||
They were not all cold tests. The worst offenders were
29 tp6m-8-geckoview-e10s,tp6m-8,System memory: 1867MB.
16 tp6m-27-geckoview-cold-e10s,tp6m-c-27,System memory: 1867MB.
8 tp6m-5-geckoview-cold-e10s,tp6m-c-5,System memory: 1867MB.
7 tp6m-4-geckoview-cold-e10s,tp6m-c-4,System memory: 1867MB.
5 tp6m-7-geckoview-cold-e10s,tp6m-c-7,System memory: 1867MB.
5 tp6m-2-geckoview-e10s,tp6m-2,System memory: 1867MB.
Comment 32•5 years ago
|
||
Based on OF there was a spike here between August 11th and 18th, however failure rate seems to have decreased. Removing disable recommended tag.
There are 17 total failures in the last 7 days on android-hw-g5-7-0-arm7-api-16 opt and pgo.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=263895372&repo=mozilla-central&lineNumber=1520
[task 2019-08-28T17:06:56.008Z] 17:06:48 INFO - raptor-main Info: starting geckoview
[task 2019-08-28T17:06:56.008Z] 17:06:49 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell am force-stop org.mozilla.geckoview_example, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:06:56.008Z] 17:06:49 INFO - adb Granting important runtime permissions to org.mozilla.geckoview_example
[task 2019-08-28T17:06:56.008Z] 17:06:50 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pm grant org.mozilla.geckoview_example android.permission.WRITE_EXTERNAL_STORAGE, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:06:56.008Z] 17:06:51 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pm grant org.mozilla.geckoview_example android.permission.READ_EXTERNAL_STORAGE, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:06:56.008Z] 17:06:52 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pm grant org.mozilla.geckoview_example android.permission.ACCESS_COARSE_LOCATION, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:06:56.008Z] 17:06:53 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pm grant org.mozilla.geckoview_example android.permission.ACCESS_FINE_LOCATION, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:06:56.008Z] 17:06:54 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pm grant org.mozilla.geckoview_example android.permission.CAMERA, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:06:56.008Z] 17:06:56 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pm grant org.mozilla.geckoview_example android.permission.RECORD_AUDIO, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.519Z] 17:06:56 INFO - adb launch_application: am start -W -n org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity -a android.intent.action.MAIN -d about:blank
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell am start -W -n org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity -a android.intent.action.MAIN -d about:blank, timeout: None, root: False, timedout: None, exitcode: 0, output: Starting: Intent { act=android.intent.action.MAIN dat=about:blank cmp=org.mozilla.geckoview_example/.GeckoViewActivity }
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - Status: ok
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - Activity: org.mozilla.geckoview_example/.GeckoViewActivity
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - ThisTime: 653
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - TotalTime: 653
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - WaitTime: 666
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - Complete
[task 2019-08-28T17:09:46.520Z] 17:06:57 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell pidof org.mozilla.geckoview_example, timeout: None, root: False, timedout: None, exitcode: 0, output: 7966
[task 2019-08-28T17:09:46.520Z] 17:09:44 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell am force-stop org.mozilla.geckoview_example, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:44 CRITICAL - raptor-main Critical: Tests failed to finish! Application timed out.
[task 2019-08-28T17:09:46.520Z] 17:09:44 ERROR - raptor-main Error: Test failed to finish. Application timed out after 170 seconds
[task 2019-08-28T17:09:46.520Z] 17:09:44 INFO - raptor-main Info: removing reverse socket connections
[task 2019-08-28T17:09:46.520Z] 17:09:44 INFO - adb command_output: adb -s ZY3223F52J wait-for-device reverse --remove-all, timeout: None, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb shell_bool: adb -s ZY3223F52J wait-for-device shell test -d /sdcard/raptor/profile/minidumps, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell sync, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb shell_bool: adb -s ZY3223F52J wait-for-device shell test -d /sdcard/raptor/profile/minidumps, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb command_output: adb -s ZY3223F52J wait-for-device pull /sdcard/raptor/profile/minidumps /tmp/tmpbI5XS4/minidumps, timeout: None, timedout: None, exitcode: 0, output: /sdcard/raptor/profile/minidumps/: 0 files pulled.
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - mozproxy Stopping mitmproxy playback, killing process 787
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - mozproxy Successfully killed the mitmproxy playback process
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - raptor-main Info: removing webext /builds/task_1567011825/workspace/build/tests/raptor/raptor/../webext/raptor
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - raptor-results-handler Info: summarizing raptor test results
[task 2019-08-28T17:09:46.520Z] 17:09:45 ERROR - raptor-output Error: no raptor test results found for raptor-tp6m-bing-geckoview-cold
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - raptor-output Info: error: no raptor test results found, so no need to combine browser cycles
[task 2019-08-28T17:09:46.520Z] 17:09:45 ERROR - raptor-output Error: no summarized raptor results found for raptor-tp6m-bing-geckoview-cold
[task 2019-08-28T17:09:46.520Z] 17:09:45 CRITICAL - raptor-results-handler Critical: PERFHERDER_DATA was seen 0 times, expected 1.
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - raptor-main Info: removing test folder for raptor: /sdcard/raptor
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell sync, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell rm -r /sdcard/raptor, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:45 INFO - adb shell_output: adb -s ZY3223F52J wait-for-device shell sync, timeout: None, root: False, timedout: None, exitcode: 0, output:
[task 2019-08-28T17:09:46.520Z] 17:09:46 INFO - adb shell_bool: adb -s ZY3223F52J wait-for-device shell test -e /sdcard/raptor, timeout: None, root: False, timedout: None, exitcode: 1, output:
[task 2019-08-28T17:09:46.520Z] 17:09:46 INFO - raptor-control-server Info: shutting down control server
[task 2019-08-28T17:09:46.520Z] 17:09:46 INFO - raptor-main Info: finished
[task 2019-08-28T17:10:05.918Z] 17:09:46 ERROR - raptor-main Critical: TEST-UNEXPECTED-FAIL: no raptor test results were found for raptor-tp6m-bing-geckoview-cold
[task 2019-08-28T17:10:05.918Z] 17:09:46 ERROR - Return code: 1
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 36•5 years ago
•
|
||
I bisected the most recent regression regarding this bug is
changeset: 493035:e7364a10a663
user: Rob Wu <rob@robwu.nl>
date: Thu Sep 12 21:39:51 2019 +0000
summary: Bug 1570715 - Treat (deprecation) warnings as errors r=rpl
Edit: Actually this was already filed as bug 1581223
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 42•5 years ago
|
||
There are 22 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-10-01&endday=2019-10-08&tree=trunk&bug=1568050
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=270233268&repo=mozilla-central&lineNumber=1513
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 45•5 years ago
|
||
There are 36 failures present on this bug over the last 7 days, these happen on windows10-64-ref-hw-2017, linux64-shippable, android-hw-p2-8-0-arm7-api-16, android-hw-g5-7-0-arm7-api-16
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271930000&repo=mozilla-central&lineNumber=1516
Comment 46•5 years ago
|
||
(In reply to Stefan Hindli [:stefan_hindli] from comment #45)
There are 36 failures present on this bug over the last 7 days, these happen on windows10-64-ref-hw-2017, linux64-shippable, android-hw-p2-8-0-arm7-api-16, android-hw-g5-7-0-arm7-api-16
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271930000&repo=mozilla-central&lineNumber=1516
I'm working on it.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment 51•5 years ago
|
||
Over the last 7 days there are 51 failures present on this bug. These happen on windows10-aarch64, windows10-64-ref-hw-2017, macosx1014-64-shippable, android-hw-p2-8-0-arm7-api-16, android-hw-g5-7-0-arm7-api-16,
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=273422281&repo=autoland&lineNumber=1458
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 57•5 years ago
|
||
I believe recent occurences here (15th of Nov) are due to https://hg.mozilla.org/mozilla-central/rev/738801392270
Agi can you provide some insight?
Comment hidden (Intermittent Failures Robot) |
Comment 59•5 years ago
|
||
Looks like the recent failures here got fixed by backing out Bug 1530402, Bug 1533156: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&searchStr=raptor%2Candroid&fromchange=46348d491bc2337bb5c46a5dc4e5c674feb49604&tochange=2007edb47f8ffdda9ef310c833e131ac7efc230f&selectedJob=276389022
https://hg.mozilla.org/integration/autoland/rev/17db3abeba1a0ee39f3887279ff770c5f04f6313
Comment 60•5 years ago
|
||
Thanks, I think I know what's going on there, I'll fix it before pushing again.
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 65•5 years ago
|
||
This is spiking again. Agi are there any updates?
Comment 66•5 years ago
|
||
I don't see the spike? https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-10-28&endday=2019-11-27&tree=trunk&bug=1568050 Also my changes have been in m-c since the 18th so I doubt it's anything related to that?
Comment 67•5 years ago
|
||
(In reply to :Agi | ⏰ PST | he/him from comment #66)
I don't see the spike? https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-10-28&endday=2019-11-27&tree=trunk&bug=1568050 Also my changes have been in m-c since the 18th so I doubt it's anything related to that?
You cannot see it because that is filtered for the last month.
Since your patch landed it looks like this: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-11-16&endday=2019-11-27&tree=trunk&bug=1568050
Comment 68•5 years ago
|
||
I still don't understand, the spike on the 24th was fixed by this https://hg.mozilla.org/integration/autoland/rev/b4755981c1382cb88fed4e4fcff3ba73779b2080. Excluding that the level of intermittents is consistent with what looks to be the normal for this test? Why do you think it's my change? I looked into some of the failures and I don't see anything pointing to Bug 1530402
Comment 69•5 years ago
|
||
(In reply to :Agi | ⏰ PST | he/him from comment #68)
I still don't understand, the spike on the 24th was fixed by this https://hg.mozilla.org/integration/autoland/rev/b4755981c1382cb88fed4e4fcff3ba73779b2080. Excluding that the level of intermittents is consistent with what looks to be the normal for this test? Why do you think it's my change? I looked into some of the failures and I don't see anything pointing to Bug 1530402
I don't think it's your change, I'm just asking if it is. However, you answered what I actually wanted to know, meaning: "the spike on the 24th was fixed by this https://hg.mozilla.org/integration/autoland/rev/b4755981c1382cb88fed4e4fcff3ba73779b2080"
Thank you.
Comment 70•5 years ago
|
||
Ah sorry I misread your comments, cool. Thanks for looking into this!
Comment 71•5 years ago
|
||
(In reply to :Agi | ⏰ PST | he/him from comment #70)
Ah sorry I misread your comments, cool. Thanks for looking into this!
Hehe, no problem :) thank you for helping me figure this out.
Comment 72•5 years ago
|
||
This failure message is frequently shown in logs for tasks run which are set as 'Pass'. If this non-fatal error message gets shown on a run in which an actual failure occurred this bug might be considered the original one which caused the task to fail. Currently all occurrences I am aware of are for test-android-hw-g5-7-0-arm7-api-16/pgo-raptor-tp6m-<randomchunknumber>-geckoview-cold-e10s. Examples:
Comment hidden (Intermittent Failures Robot) |
Comment 74•5 years ago
|
||
Hi Alexandru. Is there any way to disable this until you have a fix?
There are 35 total failures in the last 7 days and 570 total failures in the last 30 on android-hw-g5-7-0-arm7-api-16 pgo.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278929257&repo=mozilla-central&lineNumber=2054
Comment 75•5 years ago
|
||
Don't think so. The intermittent is caused by the raptor tp6 tests. I have a fix that might relate to this one, or at least make the occurrence lower. Just pushed to try and submitted to phab review.
Bebe, other ideas?
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 78•5 years ago
|
||
The intermittents here are mixed, but most often occurs CRITICAL - raptor-main Critical: Tests failed to finish! Application timed out.
Comment 79•5 years ago
|
||
looking at windows (!aarch64), the most recent 5 failures match that, most failures are on tp6-c-XX for firefox. here is a successful log snippet:
[task 2019-12-09T10:59:15.913Z] 10:59:15 INFO - raptor-mitmproxy Info: Verified mitmproxy CA certificate is installed in Firefox
[task 2019-12-09T10:59:15.913Z] 10:59:15 INFO - raptor-main Info: Playback recording date: 2019-06-21
[task 2019-12-09T10:59:15.913Z] 10:59:15 INFO - raptor-main Info: starting firefox
[task 2019-12-09T10:59:15.913Z] 10:59:15 INFO - Application command: C:\Users\task_1575887212\build\application\firefox\firefox.exe --wait-for-browser -profile c:\users\task_1575887212\appdata\local\temp\tmpte23fe.mozrunner
[task 2019-12-09T10:59:17.604Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] Attaching event listener successful!"
[task 2019-12-09T10:59:17.609Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] starting raptorRunner"
[task 2019-12-09T10:59:17.609Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] test name is: raptor-tp6-sheets-firefox-cold"
[task 2019-12-09T10:59:17.609Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] test settings url is: http://127.0.0.1:49783/json/raptor-tp6-sheets-firefox-cold.json"
[task 2019-12-09T10:59:17.609Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] raptor runner.js is loaded!"
[task 2019-12-09T10:59:17.609Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] posting to control server"
[task 2019-12-09T10:59:17.609Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] raptor runner.js is loaded!"
[task 2019-12-09T10:59:17.614Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] testing on Firefox 73.0a1 20191209095039"
[task 2019-12-09T10:59:17.614Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] getting test settings from control server"
[task 2019-12-09T10:59:17.619Z] 10:59:17 INFO - raptor-control-server Info: received webext_status: raptor runner.js is loaded!
[task 2019-12-09T10:59:17.619Z] 10:59:17 INFO - raptor-control-server Info: reading test settings from json/raptor-tp6-sheets-firefox-cold.json
[task 2019-12-09T10:59:17.619Z] 10:59:17 INFO - raptor-control-server Info: sent test settings to webext runner
[task 2019-12-09T10:59:17.624Z] 10:59:17 INFO - PID 676 | console.log: "[raptor-runnerjs] post success"
and the failing ones:
task 2019-12-09T00:00:32.148Z] 00:00:32 INFO - Application command: C:\Users\task_1575841074\build\application\firefox\firefox.exe --wait-for-browser -profile c:\users\task_1575841074\appdata\local\temp\tmpu31emg.mozrunner
[task 2019-12-09T00:03:34.255Z] 00:03:34 CRITICAL - raptor-main Critical: Tests failed to finish! Application timed out.
[task 2019-12-09T00:03:34.255Z] 00:03:34 ERROR - raptor-main Error: Test failed to finish. Application timed out after 170 seconds
[task 2019-12-09T00:03:34.255Z] 00:03:34 INFO - raptor-mitmproxy Info: Stopping mitmproxy playback, killing process 10168
[task 2019-12-09T00:03:34.274Z] 00:03:34 ERROR - raptor-mitmproxy Error: Mitmproxy exited with error code 572
[task 2019-12-09T00:03:34.274Z] 00:03:34 INFO - raptor-mitmproxy Info: Turning off the browser proxy
[task 2019-12-09T00:03:34.274Z] 00:03:34 INFO - raptor-mitmproxy Info: writing: C:\Users\task_1575841074\build\application\firefox\distribution\policies.json
the problem is that "[raptor-runnerjs] Attaching event listener successful!" is never output in the failing case and the task times out. this seems predominately on windows, if there are concerns with windows startup, maybe asking :dmajor for help debugging would be a good idea.
Comment 80•5 years ago
|
||
Hi :dmajor, can you please debug into the above?
Comment 81•5 years ago
|
||
Sorry, I don't know how to debug this kind of thing. Whatever is happening here is at a very different level of the application than my expertise.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment 84•5 years ago
|
||
Seems to have started to perma fail with the changes from bug 1604917.
snorp, could you, please, take a look?
Comment hidden (Intermittent Failures Robot) |
This is the same as Bug 1605991. Duping there.
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Description
•