Intermittent INFRA-ERROR: Failed to install geckoview_example.apk
Categories
(Testing :: Mozbase, defect, P5)
Tracking
(firefox69 fixed)
Tracking | Status | |
---|---|---|
firefox69 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: bc)
References
Details
(Keywords: intermittent-failure, Whiteboard: [cia:logs:2019][stockwell disable-recommended])
Attachments
(1 file)
Filed by: btara [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=219394957&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/R96TByOfQdS8hFDjPYNVdg/runs/0/artifacts/public/logs/live_backing.log 14:10:54 INFO - Installing collected packages: wptserve 14:10:55 INFO - Successfully installed wptserve-1.4.0 14:10:55 INFO - Return code: 0 14:10:55 INFO - Running post-action listener: _resource_record_post_action 14:10:55 INFO - Running post-action listener: _start_resource_monitoring 14:10:55 INFO - Starting resource monitoring. 14:10:55 INFO - [mozharness: 2018-12-31 14:10:55.037002Z] Finished create-virtualenv step (success) 14:10:55 INFO - [mozharness: 2018-12-31 14:10:55.037439Z] Running install step. 14:10:55 INFO - Running pre-action listener: _resource_record_pre_action 14:10:55 INFO - Running main action method: install 14:10:55 INFO - Getting output from command: ['/builds/worker/workspace/build/venv/bin/pip', 'freeze'] 14:10:55 INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/pip freeze 14:10:55 INFO - Reading from file tmpfile_stdout 14:10:55 INFO - Reading from file tmpfile_stderr 14:10:55 INFO - Using adb 1.0.40 14:10:55 INFO - adbd not restarted as root 14:10:56 INFO - su -c supported 14:10:56 INFO - /system/bin/ls -1A supported 14:10:56 INFO - Native cp support: True 14:10:56 INFO - Native chmod -R support: True 14:10:56 INFO - Native chown -R support: True 14:10:56 INFO - Setting SELinux Permissive 14:10:57 INFO - Native pidof support: True 14:10:57 INFO - New mozdevice with adb=adb, device=HT85S1A02743 14:15:57 FATAL - INFRA-ERROR: Failed to install /builds/worker/workspace/build/geckoview_example.apk on pixel2-27 14:15:57 FATAL - Running post_fatal callback... 14:15:57 FATAL - Exiting 4 14:15:57 INFO - Running post-action listener: _resource_record_post_action 14:15:57 INFO - [mozharness: 2018-12-31 14:15:57.938964Z] Finished install step (failed) 14:15:57 INFO - Running post-run listener: _resource_record_post_run 14:15:57 INFO - Total resource usage - Wall time: 303s; CPU: 2.0%; Read bytes: -36140032; Write bytes: 1052622848; Read time: -6804; Write time: -7318024 14:15:57 INFO - TinderboxPrint: CPU usage<br/>1.7% 14:15:57 INFO - TinderboxPrint: I/O read bytes / time<br/>-36,140,032 / -6,804 14:15:57 INFO - TinderboxPrint: I/O write bytes / time<br/>1,052,622,848 / -7,318,024 14:15:57 INFO - TinderboxPrint: CPU idle<br/>1,148.0 (95.2%)
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 6•5 years ago
|
||
This message comes from:
https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/android.py#326
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 17•5 years ago
|
||
bc: this isn't a build issue, it's a "can't push to device issue". Can you put it somewhere so that I don't get spammed about it each week? Thanks!
Assignee | ||
Comment 18•5 years ago
|
||
Sure thing -> Testing::Mozbase for mozdevice love.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 21•5 years ago
|
||
Bob, any idea why any of those failing jobs don't have an Android log cat as artifact? Maybe we cannot connect to the device at all? Or do we simply miss to upload the file?
Assignee | ||
Comment 22•5 years ago
|
||
My first thought would be in the mozharness Running install step. It may just be that we attempt to do the install before starting logcat collection and terminate the job when failing to install and therefore never get the logcat. I would look to see if we can't change the logcat collection to begin as soon as possible after we begin access the device. Close to the first ADBDevice initialization call would be awesome.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 25•5 years ago
|
||
We've gotten several of these failures after Bug 1550417 landed. They all show ADBTimeoutError during the install but nothing interesting that I can see in the logcat.
For April 1 - May 4 the breakdown by device is
Count | Device | Cluster |
---|---|---|
1 | pixel2-05 | and06 |
1 | pixel2-07 | and06 |
1 | pixel2-19 | and10 |
2 | pixel2-20 | and10 |
3 | pixel2-21 | and10 |
1 | pixel2-22 | and10 |
2 | pixel2-23 | and08 |
4 | pixel2-24 | and08 |
2 | pixel2-25 | and08 |
3 | pixel2-26 | and02 |
1 | pixel2-27 | and02 |
5 | pixel2-28 | and02 |
1 | pixel2-29 | and03 |
4 | pixel2-30 | and03 |
2 | pixel2-31 | and03 |
3 | pixel2-33 | and03 |
1 | pixel2-35 | and09 |
aerickson: Perhaps you could leverage the intermittent failure reports to collect the devices which exhibit failures for specific bugs combined with the cluster where the device is connected to identify potential issues with the device/host such as intermittent failures due to usb cables or hubs like we saw this week on and18.
We might also want to consider updating adb. adb 1.0.41 / Version 28.0.3-5475833 was released recently and it might help with intermittent issues as well.
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment 27•5 years ago
|
||
There are 24 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-05-12&endday=2019-05-19&tree=trunk&bug=1516985
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247192905&repo=autoland&lineNumber=802
13:25:22 INFO - Failed to install /builds/task_1558185568/workspace/build/geckoview_example.apk on pixel2-39: ADBTimeoutError args: adb wait-for-device install /builds/task_1558185568/workspace/build/geckoview_example.apk, exitcode: None, stdout:
13:25:22 FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
13:25:22 FATAL - Running post_fatal callback...
13:25:22 FATAL - Exiting 4
13:25:22 INFO - Killing logcat pid 404.
13:25:22 INFO - Running post-action listener: _resource_record_post_action
13:25:22 INFO - [mozharness: 2019-05-18 13:25:22.977346Z] Finished install step (failed)
13:25:22 INFO - Running post-run listener: _resource_record_post_run
13:25:23 INFO - Total resource usage - Wall time: 303s; CPU: 11.0%; Read bytes: 0; Write bytes: 590082048; Read time: 0; Write time: 431896
13:25:23 INFO - TinderboxPrint: CPU usage<br/>10.7%
13:25:23 INFO - TinderboxPrint: I/O read bytes / time<br/>0 / 0
13:25:23 INFO - TinderboxPrint: I/O write bytes / time<br/>590,082,048 / 431,896
13:25:23 INFO - TinderboxPrint: CPU idle<br/>1,065.4 (88.8%)
13:25:23 INFO - TinderboxPrint: CPU system<br/>33.7 (2.8%)
13:25:23 INFO - TinderboxPrint: CPU user<br/>91.3 (7.6%)
13:25:23 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
13:25:23 INFO - install - Wall time: 304s; CPU: 11.0%; Read bytes: 0; Write bytes: 590082048; Read time: 0; Write time: 431896
cleanup
- cleanup
- local rv=4
- [[ -s /builds/worker/.xsession-errors ]]
- false
- exit 4
netstat -aop
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
tcp 0 0 127.0.0.11:34519 : LISTEN - off (0.00/0/0)
tcp 0 0 localhost:5037 localhost:38377 TIME_WAIT - timewait (59.90/0/0)
tcp 0 0 localhost:49108 localhost:60022 ESTABLISHED 23/generic-worker keepalive (1.86/0/0)
tcp 0 0 localhost:58329 localhost:5037 TIME_WAIT - timewait (59.73/0/0)
tcp6 0 0 [::]:60022 [::]:* LISTEN 35/livelog off (0.00/0/0)
tcp6 0 0 [::]:60023 [::]:* LISTEN 35/livelog off (0.00/0/0)
tcp6 0 0 localhost:60022 localhost:49108 ESTABLISHED 35/livelog keepalive (57.65/0/0)
udp 0 0 127.0.0.11:43887 : - off (0.00/0/0)
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
script.py exitcode 4
[taskcluster 2019-05-18T13:25:23.256Z] Exit Code: 1
[taskcluster 2019-05-18T13:25:23.256Z] User Time: 28.76107s
[taskcluster 2019-05-18T13:25:23.256Z] Kernel Time: 6.550215s
[taskcluster 2019-05-18T13:25:23.256Z] Wall Time: 5m51.131256876s
[taskcluster 2019-05-18T13:25:23.256Z] Result: FAILED
[taskcluster 2019-05-18T13:25:23.256Z] === Task Finished ===
[taskcluster 2019-05-18T13:25:23.256Z] Task Duration: 5m51.131599033s
[taskcluster 2019-05-18T13:25:23.775Z] Uploading artifact public/logs/localconfig.json from file workspace/logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2020-05-17T12:13:52.777Z
[taskcluster 2019-05-18T13:25:24.226Z] Uploading artifact public/test_info/logcat-HT86B1A01456.log from file workspace/build/blobber_upload_dir/logcat-HT86B1A01456.log with content encoding "gzip", mime type "text/plain" and expiry 2020-05-17T12:13:52.777Z
[taskcluster 2019-05-18T13:25:24.678Z] Uploading artifact public/test_info/resource-usage.json from file workspace/build/blobber_upload_dir/resource-usage.json with content encoding "gzip", mime type "application/json" and expiry 2020-05-17T12:13:52.777Z
[taskcluster 2019-05-18T13:25:25.254Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/La42msjCTwOobTLPTx0yfg/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-05-17T12:13:52.777Z
[taskcluster:error] exit status 1
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 29•5 years ago
|
||
No discernible pattern with regard to devices though all but one were pixel2.
19 ADBTimeoutError
1 ADBError : ls could not be found
1 ADBProcessError: unknown
2 ADBProcessError: [INSTALL_FAILED_UPDATE_INCOMPATIBLE: Package org.mozilla.geckoview_example signatures do not match the previously installed version; ignoring!
Comment 30•5 years ago
|
||
Now I'm seeing this on 7.0 MotoG5 as well: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247413868&repo=autoland&lineNumber=803
20:16:35 INFO - Performing Streamed Install
20:16:35 FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
20:16:35 FATAL - Running post_fatal callback...
20:16:35 FATAL - Exiting 4
20:16:35 INFO - Killing logcat pid 408.
20:16:35 INFO - Running post-action listener: _resource_record_post_action
20:16:35 INFO - [mozharness: 2019-05-20 20:16:35.322699Z] Finished install step (failed)
20:16:35 INFO - Running post-run listener: _resource_record_post_run
20:16:35 INFO - Total resource usage - Wall time: 9s; CPU: 4.0%; Read bytes: 13156352; Write bytes: 318992384; Read time: 0; Write time: 255412
20:16:35 INFO - TinderboxPrint: CPU usage<br/>4.0%
20:16:35 INFO - TinderboxPrint: I/O read bytes / time<br/>13,156,352 / 0
20:16:35 INFO - TinderboxPrint: I/O write bytes / time<br/>318,992,384 / 255,412
20:16:35 INFO - TinderboxPrint: CPU idle<br/>30.1 (83.9%)
20:16:35 INFO - TinderboxPrint: CPU iowait<br/>4.3 (12.0%)
20:16:35 INFO - TinderboxPrint: CPU system<br/>0.7 (2.0%)
20:16:35 INFO - TinderboxPrint: CPU user<br/>0.6 (1.8%)
20:16:35 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
20:16:35 INFO - install - Wall time: 10s; CPU: 4.0%; Read bytes: 13156352; Write bytes: 318992384; Read time: 0; Write time: 255412
cleanup
Assignee | ||
Comment 31•5 years ago
|
||
That is another example of Failure [INSTALL_FAILED_UPDATE_INCOMPATIBLE: Package org.mozilla.geckoview_example signatures do not match the previously installed version; ignoring!]
I guess it has a stale version still installed from a previous run?
Comment 32•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #31)
That is another example of Failure [INSTALL_FAILED_UPDATE_INCOMPATIBLE: Package org.mozilla.geckoview_example signatures do not match the previously installed version; ignoring!]
I guess it has a stale version still installed from a previous run?
RyanVM, jlorenzo: is it possible that the changes to support FENNEC_NIGHTLY are resulting in GVE APKs with different signatures?
Comment hidden (Intermittent Failures Robot) |
Comment 34•5 years ago
|
||
Excellent question! I just checked the APK used by the task in comment 30 and the one use by this passing task[1]. Signatures are identical. Therefore, I don't think FENNEC_NIGHTLY is to blame.
I just noticed org.mozilla.geckoview_example
is signed by a key that exists on the build worker. Moreover, the autophone workers are available to try. As a consequence, it's possible a try build didn't fetch the existing key, and thus let gradle generate a new one on the fly. A clean way to avoid this kind of failure would be to let the signing workers sign org.mozilla.geckoview_example
. This way, the key is ensured to be the same across builds.
This change can be done in tree. It's a matter of adding new signing tasks and rewire the raptor dependencies to use the signing tasks instead of the build ones.
What do you all think?
[1] https://tools.taskcluster.net/groups/ScjYLdyTRFqWII-CAQr5og/tasks/JZOYrNCpRr-1NOKaaldXHA/details
Assignee | ||
Comment 35•5 years ago
|
||
I think it is definitely worth a try. We would still have issues with people triggering older builds and potentially causing failures. Would automatically attempting to uninstall prior to the install fix this even with builds signed with different keys?
Comment 36•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #35)
I think it is definitely worth a try. We would still have issues with people triggering older builds and potentially causing failures. Would automatically attempting to uninstall prior to the install fix this even with builds signed with different keys?
Yes, this should address the issue. I'd still like to understand how we're seeing this, just for depth of knowledge, but uninstalling should address the issue. When we do this, let's make it conditional on this specific failure, rather than always uninstalling, 'cuz uninstall-reinstall is very slow locally and shouldn't generally be necessary.
Comment hidden (Intermittent Failures Robot) |
Comment 38•5 years ago
|
||
re: https://bugzilla.mozilla.org/show_bug.cgi?id=1516985#c25
I've created a TODO to create an "android-hw interminttent failure scanner tool". It does sound useful.
I've verified that the docker image is already using adb 1.0.41.
Comment hidden (Intermittent Failures Robot) |
Comment 40•5 years ago
|
||
In the last 7 days there have been 39 occurrences on android-hw, build types opt and pgo.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 43•5 years ago
|
||
This bug failed 42 times in the last 7 days. Occurs on android-hw-g5-7-0-arm7-api-16, android-hw-p2-8-0-android-aarch64 and android-hw-p2-8-0-arm7-api-16 on opt build types.
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252082314&repo=autoland&lineNumber=787
Comment hidden (Intermittent Failures Robot) |
Comment 45•5 years ago
|
||
There are 55 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-06-18&endday=2019-06-25&tree=trunk&bug=1516985
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=253289228&repo=mozilla-central&lineNumber=796
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 48•5 years ago
|
||
hi Bob, this has reached the disable recommended queue, is there something we can do here?
Assignee | ||
Comment 49•5 years ago
|
||
I have a patch to fix the issue on raptor but it has been sitting on try for two days. Hopefully I can get it to run today and I'll let you know.
The root fix would be to fix this signing issue I think.
Assignee | ||
Comment 50•5 years ago
|
||
Depends on D36013
Assignee | ||
Comment 51•5 years ago
|
||
Comment 52•5 years ago
|
||
Apart from the two exceptions, try looks good.
Thanks for working on this Bob.
Comment 53•5 years ago
|
||
Pushed by bclary@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4d37f42cecaf Uninstall Firefox Android app prior to installation to prevent INSTALL_FAILED_UPDATE_INCOMPATIBLE errors, r=davehunt
Comment 54•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 55•5 years ago
|
||
This is still happening on central: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=255091752&repo=mozilla-central&lineNumber=796
11:57:11 INFO - Setting SELinux Permissive
11:57:11 INFO - Native pidof support: True
11:57:12 INFO - New mozdevice with adb=adb, device=FA84B1A00179
11:57:12 INFO - adb -s FA84B1A00179 logcat -v threadtime Trace:S StrictMode:S ExchangeService:S
12:02:12 INFO - Failed to install /builds/task_1562414179/workspace/build/geckoview_example.apk on pixel2-21: ADBTimeoutError args: adb wait-for-device install /builds/task_1562414179/workspace/build/geckoview_example.apk, exitcode: None, stdout:
12:02:12 FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
12:02:12 FATAL - Running post_fatal callback...
12:02:12 FATAL - Exiting 4
12:02:12 INFO - Killing logcat pid 494.
12:02:12 INFO - Running post-action listener: _resource_record_post_action
12:02:12 INFO - [mozharness: 2019-07-06 12:02:12.840568Z] Finished install step (failed)
12:02:12 INFO - Running post-run listener: _resource_record_post_run
12:02:12 INFO - Total resource usage - Wall time: 303s; CPU: 10.0%; Read bytes: 0; Write bytes: 5529354240; Read time: 0; Write time: 49664424
12:02:12 INFO - TinderboxPrint: CPU usage<br/>10.4%
12:02:12 INFO - TinderboxPrint: I/O read bytes / time<br/>0 / 0
12:02:12 INFO - TinderboxPrint: I/O write bytes / time<br/>5,529,354,240 / 49,664,424
12:02:12 INFO - TinderboxPrint: CPU idle<br/>1,058.2 (87.7%)
12:02:12 INFO - TinderboxPrint: CPU iowait<br/>22.9 (1.9%)
12:02:12 INFO - TinderboxPrint: CPU system<br/>41.9 (3.5%)
12:02:12 INFO - TinderboxPrint: CPU user<br/>79.3 (6.6%)
12:02:12 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
12:02:12 INFO - install - Wall time: 304s; CPU: 10.0%; Read bytes: 0; Write bytes: 5529354240; Read time: 0; Write time: 49664424
cleanup
+ cleanup
+ local rv=4
+ [[ -s /builds/worker/.xsession-errors ]]
+ false
+ exit 4
Assignee | ||
Comment 56•5 years ago
|
||
That is an ADBTimeoutError error which is a generic problem with accessing the device over adb/usb and is not related to the specific installation error due to INSTALL_FAILED_UPDATE_INCOMPATIBLE errors.
Assignee | ||
Comment 57•5 years ago
|
||
This error is continuing to be misclassified due to the error not being properly reported and not being available in the bug suggestions in treeherder:
08:10:45 INFO - Failed to install /builds/task_1562313891/workspace/build/geckoview_example.apk on pixel2-38: ADBTimeoutError args: adb wait-for-device install /builds/task_1562313891/workspace/build/geckoview_example.apk, exitcode: None, stdout:
08:10:45 FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
except (mozdevice.ADBError, mozdevice.ADBTimeoutError), e:
self.info('Failed to install %s on %s: %s %s' %
(apk, self.device_name,
type(e).__name__, e))
self.fatal('INFRA-ERROR: Failed to install %s' %
os.path.basename(apk),
EXIT_STATUS_DICT[TBPL_RETRY])
I'll file a bug to fix that.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 59•5 years ago
|
||
This has dramatically fallen since the patch to uninstall then install on raptor. The patch in bug 1563957 landed today and should help distinguish real errors from device issues.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Description
•