Intermittent Android x86 We have not been able to establish a telnet connection with the emulator

RESOLVED FIXED

Status

Release Engineering
Platform Support
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: gbrown, Assigned: gbrown)

Tracking

({intermittent-failure})

other
x86
Mac OS X
intermittent-failure

Firefox Tracking Flags

(firefox27 unaffected, firefox28 unaffected, firefox29 fixed, firefox-esr24 unaffected)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Assignee)

Description

4 years ago
Consider: 

https://tbpl.mozilla.org/?tree=Cedar&showall=1&rev=5dc615915eb6

There are 3 instances of this error in 22 runs of S4.

We can probably live with that, but I wonder if we could do better.
(Assignee)

Comment 1

4 years ago
15:15:30     INFO - #####
15:15:30     INFO - ##### Running start-emulators step.
15:15:30     INFO - #####
15:15:30     INFO - Running pre-action listener: _resource_record_pre_action
15:15:30     INFO - Running main action method: start_emulators
15:15:30     INFO - Let's kill every process called compiz
15:15:30     INFO - Killing pid 2393.
15:15:30     INFO - Created temp file /tmp/tmpzo7j_0.
15:15:30     INFO - Trying to start the emulator with this command: emulator -avd test-1 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5554 -qemu -m 1024 -enable-kvm
15:15:30     INFO - Sleeping 10 seconds
15:15:40     INFO -   Attempt #1 to redirect ports: (5554, 20701, 20700)
15:15:40     INFO - Trying again after exception: [Errno 111] Connection refused
15:15:40     INFO - Sleeping 30 seconds
15:16:10     INFO -   Attempt #2 to redirect ports: (5554, 20701, 20700)
15:16:10     INFO - Trying again after exception: [Errno 111] Connection refused
15:16:10     INFO - Sleeping 30 seconds
15:16:40     INFO -   Attempt #3 to redirect ports: (5554, 20701, 20700)
15:16:40     INFO - Trying again after exception: [Errno 111] Connection refused
15:16:40     INFO - Sleeping 30 seconds
15:17:10     INFO -   Attempt #4 to redirect ports: (5554, 20701, 20700)
15:17:10     INFO - Trying again after exception: [Errno 111] Connection refused
15:17:10     INFO - Sleeping 30 seconds
15:17:40     INFO -   Attempt #5 to redirect ports: (5554, 20701, 20700)
15:17:40     INFO - Trying again after exception: [Errno 111] Connection refused
15:17:40    FATAL - We have not been able to establish a telnet connection with the emulator
15:17:40    FATAL - Running post_fatal callback...
15:17:40     INFO - Let's kill every process called emulator64-x86
15:17:40     INFO - Killing pid 2718.
15:17:40    FATAL - Exiting -1
(Assignee)

Comment 2

4 years ago
In comparison, this seems typical on success:

09:23:19     INFO - #####
09:23:19     INFO - ##### Running start-emulators step.
09:23:19     INFO - #####
09:23:19     INFO - Running pre-action listener: _resource_record_pre_action
09:23:19     INFO - Running main action method: start_emulators
09:23:19     INFO - Let's kill every process called compiz
09:23:19     INFO - Killing pid 2415.
09:23:19     INFO - Created temp file /tmp/tmpgzGPNu.
09:23:19     INFO - Trying to start the emulator with this command: emulator -avd test-1 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5554 -qemu -m 1024 -enable-kvm
09:23:19     INFO - Sleeping 10 seconds
09:23:29     INFO -   Attempt #1 to redirect ports: (5554, 20701, 20700)
09:23:29     INFO - test-1: 5554; sut port: 20701/20700
09:23:29     INFO - Created temp file /tmp/tmpXjYJmL.
09:23:29     INFO - Trying to start the emulator with this command: emulator -avd test-2 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5556 -qemu -m 1024 -enable-kvm
09:23:29     INFO - Sleeping 10 seconds
09:23:39     INFO -   Attempt #1 to redirect ports: (5556, 20703, 20702)
09:23:39     INFO - test-2: 5556; sut port: 20703/20702
09:23:39     INFO - Created temp file /tmp/tmpqkRu5X.
09:23:39     INFO - Trying to start the emulator with this command: emulator -avd test-3 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5558 -qemu -m 1024 -enable-kvm
09:23:39     INFO - Sleeping 10 seconds
09:23:49     INFO -   Attempt #1 to redirect ports: (5558, 20705, 20704)
09:23:49     INFO - test-3: 5558; sut port: 20705/20704
09:23:49     INFO - Created temp file /tmp/tmp49YwFe.
09:23:49     INFO - Trying to start the emulator with this command: emulator -avd test-4 -debug init,console,gles,memcheck,adbserver,adbclient,adb,avd_config,socket -port 5560 -qemu -m 1024 -enable-kvm
09:23:49     INFO - Sleeping 10 seconds
09:23:59     INFO -   Attempt #1 to redirect ports: (5560, 20707, 20706)
09:23:59     INFO - test-4: 5560; sut port: 20707/20706
(Assignee)

Updated

4 years ago
Blocks: 891959
(Assignee)

Comment 3

4 years ago
Created attachment 8347499 [details] [diff] [review]
(1) dump emulator log if _redirectSUT fails

I wonder if the emulator log would show anything interesting in this case: let's dump it.
Attachment #8347499 - Flags: review?(armenzg)

Updated

4 years ago
Attachment #8347499 - Flags: review?(armenzg) → review+
(Assignee)

Comment 4

4 years ago
https://hg.mozilla.org/build/mozharness/rev/563ce5d8cef5
(Assignee)

Comment 5

4 years ago
Created attachment 8348234 [details] [diff] [review]
(2) fix emulator logging if redirect fails

That first patch didn't quite work: https://tbpl.mozilla.org/php/getParsedLog.php?id=32032518&tree=Cedar&full=1#error1

That's because emulator_procs has not been populated yet. A little re-arranging of the logic should get this working.
Attachment #8348234 - Flags: review?(armenzg)
(Assignee)

Comment 6

4 years ago
Created attachment 8348239 [details] [diff] [review]
(2) fix emulator logging if redirect fails

oops - attached the wrong patch
Attachment #8348234 - Attachment is obsolete: true
Attachment #8348234 - Flags: review?(armenzg)
Attachment #8348239 - Flags: review?(armenzg)
(Assignee)

Updated

4 years ago
Attachment #8347499 - Attachment description: dump emulator log if _redirectSUT fails → (1) dump emulator log if _redirectSUT fails

Updated

4 years ago
Attachment #8348239 - Flags: review?(armenzg) → review+
(Assignee)

Comment 7

4 years ago
(2) https://hg.mozilla.org/build/mozharness/rev/b74d08c2bc86
(Assignee)

Comment 8

4 years ago
That worked:

https://tbpl.mozilla.org/php/getParsedLog.php?id=32040209&tree=Cedar&full=1#error0
(Assignee)

Comment 9

4 years ago
The failed emulator log (comment 8) shows:

...                 ioctl(KVM_CREATE_VM) failed: Interrupted system call
12:32:10     INFO - ko:failed to initialize KVM
(Assignee)

Comment 10

4 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=32051383&tree=Cedar

(looks the same as Comment 8).
in production
(Assignee)

Updated

4 years ago
Keywords: intermittent-failure
(Assignee)

Updated

4 years ago
Blocks: 936226
No longer blocks: 891959
(Assignee)

Comment 12

4 years ago
I checked the permissions on /dev/kvm; for both the passing and failing cases, permissions are the same:

crw-rw----+ 1 root kvm 10, 232 Dec 18 14:05 /dev/kvm
(Assignee)

Comment 13

4 years ago
Delaying the launch of the emulators can virtually eliminate this error. Normally I am seeing the emulator launch fail about 1 time in 10. If we sleep for 3 minutes before trying to launch the emulators, this still happens, but only ~1 time in 100. If the sleep is reduced to 1 minute, I see about 1 failure in 50. Below 1 minute, failures increase significantly.
(Assignee)

Comment 14

4 years ago
In light of comment 13, I thought I would try retrying: If an emulator is launched but cannot be contacted for port redirection, kill any running emulators, wait for a while, and then try launching the emulators again. A pause of 30 seconds between retries and up to 3 retries seems to effectively resolve this bug.

Here is an example where we fail (with the ioctl(KVM_CREATE_VM) message), retry, and succeed:

https://tbpl.mozilla.org/php/getParsedLog.php?id=32254143&tree=Ash&full=1
(Assignee)

Comment 15

4 years ago
Created attachment 8350759 [details] [diff] [review]
(3) retry as described in Comment 14
Comment hidden (Treeherder Robot)
(Assignee)

Updated

4 years ago
Attachment #8350759 - Flags: review?(armenzg)

Comment 17

4 years ago
Comment on attachment 8350759 [details] [diff] [review]
(3) retry as described in Comment 14

Review of attachment 8350759 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/android_emulator_unittest.py
@@ +471,5 @@
>          self._kill_processes("compiz")
> +
> +        attempts = 0
> +        redirect_failed = True
> +        while attempts < 3 and redirect_failed:

Mind adding a comment on this while block to help future readers know what we're doing.
Attachment #8350759 - Flags: review?(armenzg) → review+
(Assignee)

Comment 18

4 years ago
With block comment added:

https://hg.mozilla.org/build/mozharness/rev/7de07ccc4e04
(Assignee)

Comment 19

4 years ago
Patch (3) is an effective solution: https://tbpl.mozilla.org/?tree=Cedar&showall=1&rev=36c8eeaf1589

Comment 20

4 years ago
Merged mozharness.
(Assignee)

Updated

4 years ago
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
status-firefox27: --- → unaffected
status-firefox28: --- → unaffected
status-firefox29: --- → fixed
status-firefox-esr24: --- → unaffected
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Updated

3 years ago
See Also: → bug 1109932
Comment hidden (Treeherder Robot)
You need to log in before you can comment on or make changes to this bug.