Intermittent Android x86 We have not been able to establish a telnet connection with the emulator

RESOLVED FIXED

Status

Release Engineering
Platform Support
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: gbrown, Assigned: gbrown)

Tracking

({intermittent-failure})

other
x86
Mac OS X
intermittent-failure

Firefox Tracking Flags

(firefox27 unaffected, firefox28 unaffected, firefox29 fixed, firefox-esr24 unaffected)

Details

Attachments

(3 attachments, 1 obsolete attachment)

Consider: 

https://tbpl.mozilla.org/?tree=Cedar&showall=1&rev=5dc615915eb6

There are 3 instances of this error in 22 runs of S4.

We can probably live with that, but I wonder if we could do better.
15:15:30     INFO - #####
15:15:30     INFO - ##### Running start-emulators step.
15:15:30     INFO - #####
15:15:30     INFO - Running pre-action listener: _resource_record_pre_action
15:15:30     INFO - Running main action method: start_emulators
15:15:30     INFO - Let's kill every process called compiz
15:15:30     INFO - Killing pid 2393.
15:15:30     INFO - Created temp file /tmp/tmpzo7j_0.
15:15:30     INFO - Trying to start the emulator with this command: emulator -avd test-1 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5554 -qemu -m 1024 -enable-kvm
15:15:30     INFO - Sleeping 10 seconds
15:15:40     INFO -   Attempt #1 to redirect ports: (5554, 20701, 20700)
15:15:40     INFO - Trying again after exception: [Errno 111] Connection refused
15:15:40     INFO - Sleeping 30 seconds
15:16:10     INFO -   Attempt #2 to redirect ports: (5554, 20701, 20700)
15:16:10     INFO - Trying again after exception: [Errno 111] Connection refused
15:16:10     INFO - Sleeping 30 seconds
15:16:40     INFO -   Attempt #3 to redirect ports: (5554, 20701, 20700)
15:16:40     INFO - Trying again after exception: [Errno 111] Connection refused
15:16:40     INFO - Sleeping 30 seconds
15:17:10     INFO -   Attempt #4 to redirect ports: (5554, 20701, 20700)
15:17:10     INFO - Trying again after exception: [Errno 111] Connection refused
15:17:10     INFO - Sleeping 30 seconds
15:17:40     INFO -   Attempt #5 to redirect ports: (5554, 20701, 20700)
15:17:40     INFO - Trying again after exception: [Errno 111] Connection refused
15:17:40    FATAL - We have not been able to establish a telnet connection with the emulator
15:17:40    FATAL - Running post_fatal callback...
15:17:40     INFO - Let's kill every process called emulator64-x86
15:17:40     INFO - Killing pid 2718.
15:17:40    FATAL - Exiting -1
In comparison, this seems typical on success:

09:23:19     INFO - #####
09:23:19     INFO - ##### Running start-emulators step.
09:23:19     INFO - #####
09:23:19     INFO - Running pre-action listener: _resource_record_pre_action
09:23:19     INFO - Running main action method: start_emulators
09:23:19     INFO - Let's kill every process called compiz
09:23:19     INFO - Killing pid 2415.
09:23:19     INFO - Created temp file /tmp/tmpgzGPNu.
09:23:19     INFO - Trying to start the emulator with this command: emulator -avd test-1 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5554 -qemu -m 1024 -enable-kvm
09:23:19     INFO - Sleeping 10 seconds
09:23:29     INFO -   Attempt #1 to redirect ports: (5554, 20701, 20700)
09:23:29     INFO - test-1: 5554; sut port: 20701/20700
09:23:29     INFO - Created temp file /tmp/tmpXjYJmL.
09:23:29     INFO - Trying to start the emulator with this command: emulator -avd test-2 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5556 -qemu -m 1024 -enable-kvm
09:23:29     INFO - Sleeping 10 seconds
09:23:39     INFO -   Attempt #1 to redirect ports: (5556, 20703, 20702)
09:23:39     INFO - test-2: 5556; sut port: 20703/20702
09:23:39     INFO - Created temp file /tmp/tmpqkRu5X.
09:23:39     INFO - Trying to start the emulator with this command: emulator -avd test-3 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5558 -qemu -m 1024 -enable-kvm
09:23:39     INFO - Sleeping 10 seconds
09:23:49     INFO -   Attempt #1 to redirect ports: (5558, 20705, 20704)
09:23:49     INFO - test-3: 5558; sut port: 20705/20704
09:23:49     INFO - Created temp file /tmp/tmp49YwFe.
09:23:49     INFO - Trying to start the emulator with this command: emulator -avd test-4 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5560 -qemu -m 1024 -enable-kvm
09:23:49     INFO - Sleeping 10 seconds
09:23:59     INFO -   Attempt #1 to redirect ports: (5560, 20707, 20706)
09:23:59     INFO - test-4: 5560; sut port: 20707/20706
Blocks: 891959
Created attachment 8347499 [details] [diff] [review]
(1) dump emulator log if _redirectSUT fails

I wonder if the emulator log would show anything interesting in this case: let's dump it.
Attachment #8347499 - Flags: review?(armenzg)

Updated

4 years ago
Attachment #8347499 - Flags: review?(armenzg) → review+
https://hg.mozilla.org/build/mozharness/rev/563ce5d8cef5
Created attachment 8348234 [details] [diff] [review]
(2) fix emulator logging if redirect fails

That first patch didn't quite work: https://tbpl.mozilla.org/php/getParsedLog.php?id=32032518&tree=Cedar&full=1#error1

That's because emulator_procs has not been populated yet. A little re-arranging of the logic should get this working.
Attachment #8348234 - Flags: review?(armenzg)
Created attachment 8348239 [details] [diff] [review]
(2) fix emulator logging if redirect fails

oops - attached the wrong patch
Attachment #8348234 - Attachment is obsolete: true
Attachment #8348234 - Flags: review?(armenzg)
Attachment #8348239 - Flags: review?(armenzg)
Attachment #8347499 - Attachment description: dump emulator log if _redirectSUT fails → (1) dump emulator log if _redirectSUT fails

Updated

4 years ago
Attachment #8348239 - Flags: review?(armenzg) → review+
(2) https://hg.mozilla.org/build/mozharness/rev/b74d08c2bc86
That worked:

https://tbpl.mozilla.org/php/getParsedLog.php?id=32040209&tree=Cedar&full=1#error0
The failed emulator log (comment 8) shows:

...                 ioctl(KVM_CREATE_VM) failed: Interrupted system call
12:32:10     INFO - ko:failed to initialize KVM
https://tbpl.mozilla.org/php/getParsedLog.php?id=32051383&tree=Cedar

(looks the same as Comment 8).
in production
Keywords: intermittent-failure
Blocks: 936226
No longer blocks: 891959
I checked the permissions on /dev/kvm; for both the passing and failing cases, permissions are the same:

crw-rw----+ 1 root kvm 10, 232 Dec 18 14:05 /dev/kvm
Delaying the launch of the emulators can virtually eliminate this error. Normally I am seeing the emulator launch fail about 1 time in 10. If we sleep for 3 minutes before trying to launch the emulators, this still happens, but only ~1 time in 100. If the sleep is reduced to 1 minute, I see about 1 failure in 50. Below 1 minute, failures increase significantly.
In light of comment 13, I thought I would try retrying: If an emulator is launched but cannot be contacted for port redirection, kill any running emulators, wait for a while, and then try launching the emulators again. A pause of 30 seconds between retries and up to 3 retries seems to effectively resolve this bug.

Here is an example where we fail (with the ioctl(KVM_CREATE_VM) message), retry, and succeed:

https://tbpl.mozilla.org/php/getParsedLog.php?id=32254143&tree=Ash&full=1
Created attachment 8350759 [details] [diff] [review]
(3) retry as described in Comment 14
Comment hidden (Treeherder Robot)
Attachment #8350759 - Flags: review?(armenzg)

Comment 17

4 years ago
Comment on attachment 8350759 [details] [diff] [review]
(3) retry as described in Comment 14

Review of attachment 8350759 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/android_emulator_unittest.py
@@ +471,5 @@
>          self._kill_processes("compiz")
> +
> +        attempts = 0
> +        redirect_failed = True
> +        while attempts < 3 and redirect_failed:

Mind adding a comment on this while block to help future readers know what we're doing.
Attachment #8350759 - Flags: review?(armenzg) → review+
With block comment added:

https://hg.mozilla.org/build/mozharness/rev/7de07ccc4e04
Patch (3) is an effective solution: https://tbpl.mozilla.org/?tree=Cedar&showall=1&rev=36c8eeaf1589

Comment 20

4 years ago
Merged mozharness.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
status-firefox27: --- → unaffected
status-firefox28: --- → unaffected
status-firefox29: --- → fixed
status-firefox-esr24: --- → unaffected
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
See Also: → bug 1109932
Comment hidden (Treeherder Robot)
You need to log in before you can comment on or make changes to this bug.