Closed Bug 1516985 Opened 5 years ago Closed 5 years ago

Intermittent INFRA-ERROR: Failed to install geckoview_example.apk

Categories

(Testing :: Mozbase, defect, P5)

defect

Tracking

(firefox69 fixed)

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: bc)

References

Details

(Keywords: intermittent-failure, Whiteboard: [cia:logs:2019][stockwell disable-recommended])

Attachments

(1 file)

Filed by: btara [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=219394957&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/R96TByOfQdS8hFDjPYNVdg/runs/0/artifacts/public/logs/live_backing.log

14:10:54     INFO -  Installing collected packages: wptserve
14:10:55     INFO -  Successfully installed wptserve-1.4.0
14:10:55     INFO - Return code: 0
14:10:55     INFO - Running post-action listener: _resource_record_post_action
14:10:55     INFO - Running post-action listener: _start_resource_monitoring
14:10:55     INFO - Starting resource monitoring.
14:10:55     INFO - [mozharness: 2018-12-31 14:10:55.037002Z] Finished create-virtualenv step (success)
14:10:55     INFO - [mozharness: 2018-12-31 14:10:55.037439Z] Running install step.
14:10:55     INFO - Running pre-action listener: _resource_record_pre_action
14:10:55     INFO - Running main action method: install
14:10:55     INFO - Getting output from command: ['/builds/worker/workspace/build/venv/bin/pip', 'freeze']
14:10:55     INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/pip freeze
14:10:55     INFO - Reading from file tmpfile_stdout
14:10:55     INFO - Reading from file tmpfile_stderr
14:10:55     INFO - Using adb 1.0.40
14:10:55     INFO - adbd not restarted as root
14:10:56     INFO - su -c supported
14:10:56     INFO - /system/bin/ls -1A supported
14:10:56     INFO - Native cp support: True
14:10:56     INFO - Native chmod -R support: True
14:10:56     INFO - Native chown -R support: True
14:10:56     INFO - Setting SELinux Permissive
14:10:57     INFO - Native pidof support: True
14:10:57     INFO - New mozdevice with adb=adb, device=HT85S1A02743
14:15:57    FATAL - INFRA-ERROR: Failed to install /builds/worker/workspace/build/geckoview_example.apk on pixel2-27
14:15:57    FATAL - Running post_fatal callback...
14:15:57    FATAL - Exiting 4
14:15:57     INFO - Running post-action listener: _resource_record_post_action
14:15:57     INFO - [mozharness: 2018-12-31 14:15:57.938964Z] Finished install step (failed)
14:15:57     INFO - Running post-run listener: _resource_record_post_run
14:15:57     INFO - Total resource usage - Wall time: 303s; CPU: 2.0%; Read bytes: -36140032; Write bytes: 1052622848; Read time: -6804; Write time: -7318024
14:15:57     INFO - TinderboxPrint: CPU usage<br/>1.7%
14:15:57     INFO - TinderboxPrint: I/O read bytes / time<br/>-36,140,032 / -6,804
14:15:57     INFO - TinderboxPrint: I/O write bytes / time<br/>1,052,622,848 / -7,318,024
14:15:57     INFO - TinderboxPrint: CPU idle<br/>1,148.0 (95.2%)
Summary: Intermittent INFRA-ERROR: Failed to install /builds/worker/workspace/build/geckoview_example.apk on pixel2-27 → Intermittent INFRA-ERROR: Failed to install /builds/worker/workspace/build/geckoview_example.apk on pixel2-XX
See Also: → 1524271

bc: this isn't a build issue, it's a "can't push to device issue". Can you put it somewhere so that I don't get spammed about it each week? Thanks!

Flags: needinfo?(bob)

Sure thing -> Testing::Mozbase for mozdevice love.

Component: Build Config & IDE Support → Mozbase
Flags: needinfo?(bob)
Product: Firefox for Android → Testing

Bob, any idea why any of those failing jobs don't have an Android log cat as artifact? Maybe we cannot connect to the device at all? Or do we simply miss to upload the file?

Flags: needinfo?(bob)

My first thought would be in the mozharness Running install step. It may just be that we attempt to do the install before starting logcat collection and terminate the job when failing to install and therefore never get the logcat. I would look to see if we can't change the logcat collection to begin as soon as possible after we begin access the device. Close to the first ADBDevice initialization call would be awesome.

Flags: needinfo?(bob)

We've gotten several of these failures after Bug 1550417 landed. They all show ADBTimeoutError during the install but nothing interesting that I can see in the logcat.

For April 1 - May 4 the breakdown by device is

Count Device Cluster
1 pixel2-05 and06
1 pixel2-07 and06
1 pixel2-19 and10
2 pixel2-20 and10
3 pixel2-21 and10
1 pixel2-22 and10
2 pixel2-23 and08
4 pixel2-24 and08
2 pixel2-25 and08
3 pixel2-26 and02
1 pixel2-27 and02
5 pixel2-28 and02
1 pixel2-29 and03
4 pixel2-30 and03
2 pixel2-31 and03
3 pixel2-33 and03
1 pixel2-35 and09

aerickson: Perhaps you could leverage the intermittent failure reports to collect the devices which exhibit failures for specific bugs combined with the cluster where the device is connected to identify potential issues with the device/host such as intermittent failures due to usb cables or hubs like we saw this week on and18.

We might also want to consider updating adb. adb 1.0.41 / Version 28.0.3-5475833 was released recently and it might help with intermittent issues as well.

Flags: needinfo?(aerickson)
Summary: Intermittent INFRA-ERROR: Failed to install /builds/worker/workspace/build/geckoview_example.apk on pixel2-XX → Intermittent INFRA-ERROR: Failed to install geckoview_example.apk on pixel2-XX

There are 24 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-05-12&endday=2019-05-19&tree=trunk&bug=1516985

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247192905&repo=autoland&lineNumber=802

13:25:22 INFO - Failed to install /builds/task_1558185568/workspace/build/geckoview_example.apk on pixel2-39: ADBTimeoutError args: adb wait-for-device install /builds/task_1558185568/workspace/build/geckoview_example.apk, exitcode: None, stdout:
13:25:22 FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
13:25:22 FATAL - Running post_fatal callback...
13:25:22 FATAL - Exiting 4
13:25:22 INFO - Killing logcat pid 404.
13:25:22 INFO - Running post-action listener: _resource_record_post_action
13:25:22 INFO - [mozharness: 2019-05-18 13:25:22.977346Z] Finished install step (failed)
13:25:22 INFO - Running post-run listener: _resource_record_post_run
13:25:23 INFO - Total resource usage - Wall time: 303s; CPU: 11.0%; Read bytes: 0; Write bytes: 590082048; Read time: 0; Write time: 431896
13:25:23 INFO - TinderboxPrint: CPU usage<br/>10.7%
13:25:23 INFO - TinderboxPrint: I/O read bytes / time<br/>0 / 0
13:25:23 INFO - TinderboxPrint: I/O write bytes / time<br/>590,082,048 / 431,896
13:25:23 INFO - TinderboxPrint: CPU idle<br/>1,065.4 (88.8%)
13:25:23 INFO - TinderboxPrint: CPU system<br/>33.7 (2.8%)
13:25:23 INFO - TinderboxPrint: CPU user<br/>91.3 (7.6%)
13:25:23 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
13:25:23 INFO - install - Wall time: 304s; CPU: 11.0%; Read bytes: 0; Write bytes: 590082048; Read time: 0; Write time: 431896
cleanup

  • cleanup
  • local rv=4
  • [[ -s /builds/worker/.xsession-errors ]]
  • false
  • exit 4

netstat -aop
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
tcp 0 0 127.0.0.11:34519 : LISTEN - off (0.00/0/0)
tcp 0 0 localhost:5037 localhost:38377 TIME_WAIT - timewait (59.90/0/0)
tcp 0 0 localhost:49108 localhost:60022 ESTABLISHED 23/generic-worker keepalive (1.86/0/0)
tcp 0 0 localhost:58329 localhost:5037 TIME_WAIT - timewait (59.73/0/0)
tcp6 0 0 [::]:60022 [::]:* LISTEN 35/livelog off (0.00/0/0)
tcp6 0 0 [::]:60023 [::]:* LISTEN 35/livelog off (0.00/0/0)
tcp6 0 0 localhost:60022 localhost:49108 ESTABLISHED 35/livelog keepalive (57.65/0/0)
udp 0 0 127.0.0.11:43887 : - off (0.00/0/0)
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path

script.py exitcode 4
[taskcluster 2019-05-18T13:25:23.256Z] Exit Code: 1
[taskcluster 2019-05-18T13:25:23.256Z] User Time: 28.76107s
[taskcluster 2019-05-18T13:25:23.256Z] Kernel Time: 6.550215s
[taskcluster 2019-05-18T13:25:23.256Z] Wall Time: 5m51.131256876s
[taskcluster 2019-05-18T13:25:23.256Z] Result: FAILED
[taskcluster 2019-05-18T13:25:23.256Z] === Task Finished ===
[taskcluster 2019-05-18T13:25:23.256Z] Task Duration: 5m51.131599033s
[taskcluster 2019-05-18T13:25:23.775Z] Uploading artifact public/logs/localconfig.json from file workspace/logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2020-05-17T12:13:52.777Z
[taskcluster 2019-05-18T13:25:24.226Z] Uploading artifact public/test_info/logcat-HT86B1A01456.log from file workspace/build/blobber_upload_dir/logcat-HT86B1A01456.log with content encoding "gzip", mime type "text/plain" and expiry 2020-05-17T12:13:52.777Z
[taskcluster 2019-05-18T13:25:24.678Z] Uploading artifact public/test_info/resource-usage.json from file workspace/build/blobber_upload_dir/resource-usage.json with content encoding "gzip", mime type "application/json" and expiry 2020-05-17T12:13:52.777Z
[taskcluster 2019-05-18T13:25:25.254Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/La42msjCTwOobTLPTx0yfg/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-05-17T12:13:52.777Z
[taskcluster:error] exit status 1

Whiteboard: [cia:logs:2019] → [cia:logs:2019][stockwell needswork:owner]

No discernible pattern with regard to devices though all but one were pixel2.

19 ADBTimeoutError
1 ADBError : ls could not be found
1 ADBProcessError: unknown
2 ADBProcessError: [INSTALL_FAILED_UPDATE_INCOMPATIBLE: Package org.mozilla.geckoview_example signatures do not match the previously installed version; ignoring!

Now I'm seeing this on 7.0 MotoG5 as well: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247413868&repo=autoland&lineNumber=803

20:16:35     INFO - Performing Streamed Install
20:16:35    FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
20:16:35    FATAL - Running post_fatal callback...
20:16:35    FATAL - Exiting 4
20:16:35     INFO - Killing logcat pid 408.
20:16:35     INFO - Running post-action listener: _resource_record_post_action
20:16:35     INFO - [mozharness: 2019-05-20 20:16:35.322699Z] Finished install step (failed)
20:16:35     INFO - Running post-run listener: _resource_record_post_run
20:16:35     INFO - Total resource usage - Wall time: 9s; CPU: 4.0%; Read bytes: 13156352; Write bytes: 318992384; Read time: 0; Write time: 255412
20:16:35     INFO - TinderboxPrint: CPU usage<br/>4.0%
20:16:35     INFO - TinderboxPrint: I/O read bytes / time<br/>13,156,352 / 0
20:16:35     INFO - TinderboxPrint: I/O write bytes / time<br/>318,992,384 / 255,412
20:16:35     INFO - TinderboxPrint: CPU idle<br/>30.1 (83.9%)
20:16:35     INFO - TinderboxPrint: CPU iowait<br/>4.3 (12.0%)
20:16:35     INFO - TinderboxPrint: CPU system<br/>0.7 (2.0%)
20:16:35     INFO - TinderboxPrint: CPU user<br/>0.6 (1.8%)
20:16:35     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
20:16:35     INFO - install - Wall time: 10s; CPU: 4.0%; Read bytes: 13156352; Write bytes: 318992384; Read time: 0; Write time: 255412
cleanup
Summary: Intermittent INFRA-ERROR: Failed to install geckoview_example.apk on pixel2-XX → Intermittent INFRA-ERROR: Failed to install geckoview_example.apk

That is another example of Failure [INSTALL_FAILED_UPDATE_INCOMPATIBLE: Package org.mozilla.geckoview_example signatures do not match the previously installed version; ignoring!]

I guess it has a stale version still installed from a previous run?

(In reply to Bob Clary [:bc:] from comment #31)

That is another example of Failure [INSTALL_FAILED_UPDATE_INCOMPATIBLE: Package org.mozilla.geckoview_example signatures do not match the previously installed version; ignoring!]

I guess it has a stale version still installed from a previous run?

RyanVM, jlorenzo: is it possible that the changes to support FENNEC_NIGHTLY are resulting in GVE APKs with different signatures?

Flags: needinfo?(ryanvm)
Flags: needinfo?(jlorenzo)

Excellent question! I just checked the APK used by the task in comment 30 and the one use by this passing task[1]. Signatures are identical. Therefore, I don't think FENNEC_NIGHTLY is to blame.

I just noticed org.mozilla.geckoview_example is signed by a key that exists on the build worker. Moreover, the autophone workers are available to try. As a consequence, it's possible a try build didn't fetch the existing key, and thus let gradle generate a new one on the fly. A clean way to avoid this kind of failure would be to let the signing workers sign org.mozilla.geckoview_example. This way, the key is ensured to be the same across builds.
This change can be done in tree. It's a matter of adding new signing tasks and rewire the raptor dependencies to use the signing tasks instead of the build ones.

What do you all think?

[1] https://tools.taskcluster.net/groups/ScjYLdyTRFqWII-CAQr5og/tasks/JZOYrNCpRr-1NOKaaldXHA/details

Flags: needinfo?(ryanvm)
Flags: needinfo?(jlorenzo)

I think it is definitely worth a try. We would still have issues with people triggering older builds and potentially causing failures. Would automatically attempting to uninstall prior to the install fix this even with builds signed with different keys?

(In reply to Bob Clary [:bc:] from comment #35)

I think it is definitely worth a try. We would still have issues with people triggering older builds and potentially causing failures. Would automatically attempting to uninstall prior to the install fix this even with builds signed with different keys?

Yes, this should address the issue. I'd still like to understand how we're seeing this, just for depth of knowledge, but uninstalling should address the issue. When we do this, let's make it conditional on this specific failure, rather than always uninstalling, 'cuz uninstall-reinstall is very slow locally and shouldn't generally be necessary.

re: https://bugzilla.mozilla.org/show_bug.cgi?id=1516985#c25

I've created a TODO to create an "android-hw interminttent failure scanner tool". It does sound useful.

I've verified that the docker image is already using adb 1.0.41.

Flags: needinfo?(aerickson)

In the last 7 days there have been 39 occurrences on android-hw, build types opt and pgo.

This bug failed 42 times in the last 7 days. Occurs on android-hw-g5-7-0-arm7-api-16, android-hw-p2-8-0-android-aarch64 and android-hw-p2-8-0-arm7-api-16 on opt build types.

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252082314&repo=autoland&lineNumber=787

hi Bob, this has reached the disable recommended queue, is there something we can do here?

Flags: needinfo?(bob)

I have a patch to fix the issue on raptor but it has been sitting on try for two days. Hopefully I can get it to run today and I'll let you know.

The root fix would be to fix this signing issue I think.

Flags: needinfo?(bob)

Apart from the two exceptions, try looks good.

Thanks for working on this Bob.

Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d37f42cecaf
Uninstall Firefox Android app prior to installation to prevent INSTALL_FAILED_UPDATE_INCOMPATIBLE errors, r=davehunt
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Assignee: nobody → bob

This is still happening on central: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=255091752&repo=mozilla-central&lineNumber=796

11:57:11     INFO - Setting SELinux Permissive
11:57:11     INFO - Native pidof support: True
11:57:12     INFO - New mozdevice with adb=adb, device=FA84B1A00179
11:57:12     INFO - adb -s FA84B1A00179 logcat -v threadtime Trace:S StrictMode:S ExchangeService:S
12:02:12     INFO - Failed to install /builds/task_1562414179/workspace/build/geckoview_example.apk on pixel2-21: ADBTimeoutError args: adb wait-for-device install /builds/task_1562414179/workspace/build/geckoview_example.apk, exitcode: None, stdout: 
12:02:12    FATAL - INFRA-ERROR: Failed to install geckoview_example.apk
12:02:12    FATAL - Running post_fatal callback...
12:02:12    FATAL - Exiting 4
12:02:12     INFO - Killing logcat pid 494.
12:02:12     INFO - Running post-action listener: _resource_record_post_action
12:02:12     INFO - [mozharness: 2019-07-06 12:02:12.840568Z] Finished install step (failed)
12:02:12     INFO - Running post-run listener: _resource_record_post_run
12:02:12     INFO - Total resource usage - Wall time: 303s; CPU: 10.0%; Read bytes: 0; Write bytes: 5529354240; Read time: 0; Write time: 49664424
12:02:12     INFO - TinderboxPrint: CPU usage<br/>10.4%
12:02:12     INFO - TinderboxPrint: I/O read bytes / time<br/>0 / 0
12:02:12     INFO - TinderboxPrint: I/O write bytes / time<br/>5,529,354,240 / 49,664,424
12:02:12     INFO - TinderboxPrint: CPU idle<br/>1,058.2 (87.7%)
12:02:12     INFO - TinderboxPrint: CPU iowait<br/>22.9 (1.9%)
12:02:12     INFO - TinderboxPrint: CPU system<br/>41.9 (3.5%)
12:02:12     INFO - TinderboxPrint: CPU user<br/>79.3 (6.6%)
12:02:12     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
12:02:12     INFO - install - Wall time: 304s; CPU: 10.0%; Read bytes: 0; Write bytes: 5529354240; Read time: 0; Write time: 49664424
cleanup
+ cleanup
+ local rv=4
+ [[ -s /builds/worker/.xsession-errors ]]
+ false
+ exit 4
Flags: needinfo?(bob)

That is an ADBTimeoutError error which is a generic problem with accessing the device over adb/usb and is not related to the specific installation error due to INSTALL_FAILED_UPDATE_INCOMPATIBLE errors.

Flags: needinfo?(bob)

This error is continuing to be misclassified due to the error not being properly reported and not being available in the bug suggestions in treeherder:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=254904531&repo=mozilla-central&lineNumber=795-796

08:10:45     INFO - Failed to install /builds/task_1562313891/workspace/build/geckoview_example.apk on pixel2-38: ADBTimeoutError args: adb wait-for-device install /builds/task_1562313891/workspace/build/geckoview_example.apk, exitcode: None, stdout: 
08:10:45    FATAL - INFRA-ERROR: Failed to install geckoview_example.apk

https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/android.py#342-348

         except (mozdevice.ADBError, mozdevice.ADBTimeoutError), e:
            self.info('Failed to install %s on %s: %s %s' %
                      (apk, self.device_name,
                       type(e).__name__, e))
            self.fatal('INFRA-ERROR: Failed to install %s' %
                       os.path.basename(apk),
                       EXIT_STATUS_DICT[TBPL_RETRY])

I'll file a bug to fix that.

See Also: → 1563957

This has dramatically fallen since the patch to uninstall then install on raptor. The patch in bug 1563957 landed today and should help distinguish real errors from device issues.

See Also: → 1875766
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: