tp6m-c-1-vismet failing on Android 7.0 Moto G5 (Try)
Categories
(Testing :: Raptor, defect, P2)
Tracking
(Not tracked)
People
(Reporter: acreskey, Unassigned)
References
Details
On pushes to Try with or without additional commits, I'm consistently seeing the tp6m-c-1-vismet
test fail on Android 7.0 Moto G5.
The error is
Error extracting the video frames from /builds/worker/workspace/jobs/4/pages/www.amazon.com/data/video/1.mp4
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=285499634&repo=try&lineNumber=358
Reporter | ||
Comment 1•5 years ago
|
||
Also failing on Android 7.0 MotoG5 opt
Reporter | ||
Updated•5 years ago
|
Comment 3•5 years ago
|
||
(In reply to Ionuț Goldan [:igoldan] from comment #2)
Dave, could you set a priority for this?
Anything blocking visual metrics results should be considered P1.
Comment 4•5 years ago
|
||
I thought I saw this even with the newer browsertime from upstream, but I can't find the link. No matter, I will do an initial investigation.
Comment 5•5 years ago
|
||
(In reply to Nick Alexander :nalexander [he/him] from comment #4)
I thought I saw this even with the newer browsertime from upstream, but I can't find the link. No matter, I will do an initial investigation.
OK, first thoughts: first, it's not clear that we should be running the (red) vismet task at all, 'cuz its dependent (orange) browsertime task did, in fact, fail. Let's put a pin in that.
Inspecting the geckodriver logs suggests that we're killing ourselves in some way:
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - [2020-01-20 00:11:06] INFO: [browsertime.video] Stop firefox window recorder.
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - [2020-01-20 00:11:06] VERBOSE: [browsertime] Executing privileged async script const cb = arguments[arguments.length-1];
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - Promise.resolve(
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - windowUtils.setCompositionRecording(false)
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - )
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - .then(() => cb());
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - 1579479066047 webdriver::server DEBUG -> GET /session/32de162e-2cca-4d10-a7f7-b9f5062f1ae7/moz/context
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - 1579479066054 webdriver::server DEBUG <- 200 OK {"value":"content"}
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - 1579479066055 webdriver::server DEBUG -> POST /session/32de162e-2cca-4d10-a7f7-b9f5062f1ae7/moz/context {"context":"chrome"}
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - 1579479066063 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2020-01-20T00:12:15.111Z] 00:11:06 INFO - 1579479066065 webdriver::server DEBUG -> POST /session/32de162e-2cca-4d10-a7f7-b9f5062f1ae7/execute/async {"script":" const cb = arguments[arguments.length-1];\n Promise.resolve(\n windowUtils.setCompositionRecording(false)\n )\n .then(() => cb());\n ","args":[null]}
[task 2020-01-20T00:12:15.111Z] 00:11:51 INFO - 1579479111164 webdriver::server DEBUG Deleting session
[task 2020-01-20T00:12:15.111Z] 00:11:51 INFO - 1579479111164 geckodriver::android DEBUG Force stopping the Android package: org.mozilla.geckoview_example
[task 2020-01-20T00:12:15.111Z] 00:11:51 INFO - 1579479111164 mozdevice DEBUG Force stopping Android package: org.mozilla.geckoview_example
[task 2020-01-20T00:12:15.111Z] 00:11:51 INFO - 1579479111164 mozdevice DEBUG execute_host_command: >> "host:transport:ZY322LDKCW"
[task 2020-01-20T00:12:15.111Z] 00:11:51 INFO - 1579479111164 mozdevice DEBUG execute_host_command: << []
[task 2020-01-20T00:12:15.111Z] 00:11:51 INFO - 1579479111164 mozdevice DEBUG execute_host_command: >> "shell:am force-stop org.mozilla.geckoview_example"
[task 2020-01-20T00:12:15.111Z] 00:11:53 INFO - 1579479113783 mozdevice DEBUG execute_host_command: << ""
[task 2020-01-20T00:12:15.111Z] 00:11:53 INFO - 1579479113783 geckodriver::marionette DEBUG Android package force-stopped
[task 2020-01-20T00:12:15.111Z] 00:11:53 INFO - 1579479113783 mozdevice DEBUG execute_host_command: >> "host:transport:ZY322LDKCW"
[task 2020-01-20T00:12:15.111Z] 00:11:53 INFO - 1579479113783 mozdevice DEBUG execute_host_command: << []
[task 2020-01-20T00:12:15.111Z] 00:11:53 INFO - 1579479113783 mozdevice DEBUG execute_host_command: >> "shell:am clear-debug-app org.mozilla.geckoview_example"
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - 1579479114833 mozdevice DEBUG execute_host_command: << ""
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - 1579479114833 geckodriver::android DEBUG Disabled reading from configuration file
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - 1579479114834 geckodriver::android DEBUG Android port forward (34803 -> 2829) stopped
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - 1579479114834 webdriver::server DEBUG <- 500 Internal Server Error {"value":{"error":"unknown error","message":"Failed to decode response from marionette","stacktrace":""}}
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - /builds/task_1579478599/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/promise.js:2626
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - throw error;
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - ^
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - WebDriverError: Failed to decode response from marionette
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at Object.throwDecodedError (/builds/task_1579478599/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/error.js:514:15)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at parseHttpResponse (/builds/task_1579478599/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/http.js:519:13)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at doSend.then.response (/builds/task_1579478599/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/http.js:441:30)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at <anonymous>
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at process._tickCallback (internal/process/next_tick.js:189:7)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - From: Task: WebDriver.executeScript()
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at Driver.schedule (/builds/task_1579478599/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/webdriver.js:807:17)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at Driver.executeAsyncScript (/builds/task_1579478599/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/webdriver.js:891:17)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at SeleniumRunner.runPrivilegedAsyncScript (/builds/task_1579478599/fetches/browsertime/node_modules/browsertime/lib/core/seleniumRunner.js:390:30)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at <anonymous>
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - at process._tickCallback (internal/process/next_tick.js:189:7)
[task 2020-01-20T00:12:15.111Z] 00:11:54 INFO - raptor-main Info: removing reverse socket connections
Now, the geckodriver logging doesn't make it easy to determine the actual sequence, because there's no request ID threaded through response/request pairs. But it sure looks like there was an outstanding response/request pair. Or that the /execute/async
response/request died in the middle (usually, "Failed to decode response from marionette" means a crash (often OOM, with window recorder)).
I can think of a few ways to proceed:
- Improve the geckodriver logging so that we get more information about in-flight response/request pairs. If we've forgotten an
await
somewhere in browsertime, this will help us find it. I will try to see if there is a mechanical concurrency issue of this type shortly. - Actually observe crashes in browsertime jobs: Bug 1585036. This has independent value but will require some additional labour.
Comment 6•5 years ago
|
||
I realized that I didn't look at some of the logs to investigate crashes. I do think that we're seeing an OOM. From adb logcat:
01-20 00:11:51.092 535 535 E lowmemorykiller: Error writing /proc/8820/oom_score_adj; errno=22
01-20 00:11:51.094 1513 2623 W ActivityManager: Failed setting process group of 8820 to 1
...
01-20 00:11:51.105 1513 2567 I ActivityManager: Process org.mozilla.geckoview_example (pid 8820) has died
01-20 00:11:51.105 1513 2567 D ActivityManager: cleanUpApplicationRecord -- 8820
01-20 00:11:51.108 1513 2567 W ActivityManager: Force removing ActivityRecord{410e1e8 u0 org.mozilla.geckoview_example/.GeckoViewActivity t25798}: app died, no saved state
Which I think means that the GVE proc (in this case, it's 8820) was in fact killed by the OOM mechanism. However -- I don't see any evidence of crash reporting/breakpad. That means that we're unlikely to be able to do 2. in the comment above, i.e., to observe crashes.
Comment 7•5 years ago
|
||
Bas: could you suggest why we see OOM when we try to turn off (and write to disk, I guess) the window recorder rather than during regular use? That suggests that we can keep lots of frames but not process them. Any thoughts on why?
Updated•5 years ago
|
Updated•5 years ago
|
Comment 9•4 years ago
|
||
This is fixed now for mobile.
Description
•