895186 - Run Android x86 emulator unit tests from buildbot

Reporter

Description

•

11 years ago

Over in bug 891959, I am sorting out how we can run our unit tests in Android x86 emulators. :dminor handed off some excellent mozharness-based scripts to me; they run the tests, but I am not sure how buildbot will integrate with them, or what component will assign test jobs to emulators.

This is a little preliminary, but we should probably start thinking about how all the pieces fit together. At this point I'm mostly looking for someone to work with me to plan a strategy for running these tests.

Justin Wood (:Callek)

Comment 1

•

11 years ago

Armen, does it look likely that you'd be able to coordinate with gbrown on this endeavor, or do we want to have a mini group meeting to figure out who has time?

Flags: needinfo?(armenzg)

Armen [:armenzg]

Assignee

Comment 2

•

11 years ago

I can make time to follow up with gbrown.

gbrown: my ical is up-to-date, could you please pick a date and time to chat?

Flags: needinfo?(armenzg)

Armen [:armenzg]

Assignee

Comment 3

•

11 years ago

It seems that gbrown has some mozharness scripts and configs that he's going to send me, and I can try to integrate them into my staging master with an iX box.

We can't do this testing on EC2 because of OpenGL-related crashes.

Run times are 20-30% slower than on Pandas and Tegras.

Assignee: nobody → armenzg

Blocks: 891959

Armen [:armenzg]

Assignee

Comment 4

•

11 years ago

Adding needinfo to keep track that I need the scripts.

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Updated

•

11 years ago

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Comment 5

•

11 years ago

Attached file androidx86_emulator_unittest.py (obsolete) — Details

This is the python script I have been using to run tests in an emulator. I have been executing: 

androidx86_emulator_unittest.py --config <file>

and specifying all options in the config file.

Geoff Brown [:gbrown]

Reporter

Comment 6

•

11 years ago

Attached file config-gb-m1.py (obsolete) — Details

Sample config file: This runs mochitest-1 on emulator-5554.

Geoff Brown [:gbrown]

Reporter

Comment 7

•

11 years ago

See also bug 894507 for the procedure and supporting scripts for setting up the emulator environment and launching the emulators.

Geoff Brown [:gbrown]

Reporter

Comment 8

•

11 years ago

Attached file config-gb-m1.py (obsolete) — Details

Sample config file: This runs mochitest-1 on emulator-5554.

This version also sets base_work_dir. If more than one script is running at one time, it is essential that each has a distinct base_work_dir.
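
A minimal sketch of how per-emulator config values could be generated so that no two concurrent runs share a base_work_dir. Only the base_work_dir key comes from this bug; the emulator_name key, the work_root path, and the helper name are hypothetical and used for illustration only.

```python
import os


def emulator_config(index, work_root="/home/cltbld/emulator-work"):
    """Return illustrative config values for the emulator at `index` (0-based).

    Emulator console ports go up in steps of two (5554, 5556, ...), so the
    emulator name is derived from the index. `work_root` and the
    "emulator_name" key are assumptions, not part of the real config.
    """
    name = "emulator-%d" % (5554 + 2 * index)
    return {
        "emulator_name": name,  # hypothetical key, for illustration
        # Each concurrently running script gets its own working directory.
        "base_work_dir": os.path.join(work_root, name),
    }


configs = [emulator_config(i) for i in range(4)]
for cfg in configs:
    print(cfg["emulator_name"], cfg["base_work_dir"])
```

Deriving the directory from the emulator name keeps the mapping predictable when looking at a machine by hand.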

Attachment #780533 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Priority: -- → P3

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Priority: P3 → P2

Armen [:armenzg]

Assignee

Comment 9

•

11 years ago

Attached patch x86.diff (obsolete) — Details — Splinter Review

I'm currently trying to run this on a machine by applying the attached patch.

python scripts/androidx86_emulator_unittest.py --config-file configs/android/androidx86.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/robocop.apk --download-symbols ondemand

Attachment #780531 - Attachment is obsolete: true

Attachment #780541 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 10

•

11 years ago

gbrown, dminor, which machines did you run these scripts on?
What are their hostnames?

I'm using a test machine called talos-linux64-ix-003 and adb is not installed.
I'm not 100% sure that the lack of it is what is making the run fail, but I assume so.

13:56:02     INFO - #####
13:56:02     INFO - ##### Running install step.
13:56:02     INFO - #####
13:56:02     INFO - Running pre-action listener: _resource_record_pre_action
13:56:02     INFO - Running main action method: install
Getting output from command: ['adb', '-s', 'emulator-5554', 'shell', 'date']
Copy/paste: adb -s emulator-5554 shell date
13:56:02     INFO - Running post-action listener: _resource_record_post_action
13:56:02    FATAL - Uncaught exception: Traceback (most recent call last):
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 1048, in run
13:56:02    FATAL -     self.run_action(action)
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 990, in run_action
13:56:02    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 931, in _possibly_run_method
13:56:02    FATAL -     return getattr(self, method_name)()
13:56:02    FATAL -   File "scripts/androidx86_emulator_unittest.py", line 192, in install
13:56:02    FATAL -     dh.install_app(self.installer_path)
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/mozilla/testing/device.py", line 349, in install_app
13:56:02    FATAL -     self.set_device_time()
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/mozilla/testing/device.py", line 309, in set_device_time
13:56:02    FATAL -     self.info(self.query_device_time())
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/mozilla/testing/device.py", line 300, in query_device_time
13:56:02    FATAL -     "shell", "date"])
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 719, in get_output_from_command
13:56:02    FATAL -     cwd=cwd, stderr=tmp_stderr, env=env)
13:56:02    FATAL -   File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
13:56:02    FATAL -     errread, errwrite)
13:56:02    FATAL -   File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
13:56:02    FATAL -     raise child_exception
13:56:02    FATAL - OSError: [Errno 2] No such file or directory
13:56:02    FATAL - Exiting -1
13:56:02     INFO - Running post-run listener: _resource_record_post_run
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ source build/venv/bin/activate
(venv)[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ adb -s emulator-5554 shell date
No command 'adb' found, did you mean:
 Command 'cdb' from package 'tinycdb' (main)
 Command 'gdb' from package 'gdb' (main)
 Command 'dab' from package 'bsdgames' (universe)
 Command 'zdb' from package 'zfs-fuse' (universe)
 Command 'kdb' from package 'elektra-bin' (universe)
 Command 'tdb' from package 'tads2-dev' (multiverse)
 Command 'pdb' from package 'python' (main)
 Command 'jdb' from package 'openjdk-6-jdk' (main)
 Command 'jdb' from package 'openjdk-7-jdk' (universe)
 Command 'ab' from package 'apache2-utils' (main)
 Command 'ad' from package 'netatalk' (universe)
adb: command not found
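
The OSError in the traceback above is what subprocess raises when the executable cannot be found on PATH. A minimal sketch of a pre-flight check that would report this more clearly, assuming a modern Python 3 (the log above is from Python 2.7, where shutil.which is not available); the helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("adb", "emulator"), path=None):
    """Return the names in `tools` that cannot be found on PATH.

    `path` defaults to the PATH of the current environment.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: %s" % ", ".join(missing))
else:
    print("All required tools found")
```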

Geoff Brown [:gbrown]

Reporter

Comment 11

•

11 years ago

I have been working on talos-linux64-ix-001.test.releng.scl3.mozilla.com. We installed the Android SDK there and added $SDK/tools and $SDK/platform-tools to PATH.

Armen [:armenzg]

Assignee

Comment 12

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #11)
> I have been working on talos-linux64-ix-001.test.releng.scl3.mozilla.com. We
> installed the Android SDK there and added $SDK/tools and $SDK/platform-tools
> to PATH.

I believe the right approach is to create the emulator snapshots inside of the current Android x86 builds and upload them to ftp.

Then the test machines will download the avd files and start them up.

Would this approach work for you?

gbrown, could you please upload a few avd files somewhere for me?
I would like to verify steps 5 & 6 from bug 894507 and add mock support.

On another note, could we meet on Tuesday? (I'm off on Monday.) I finally have time to look at this and I can now ask more intelligent questions.

I see two sides to this project:
1) generate the avd files on the build machines and upload them
2) download the avd files on the talos-linux64-ix machines and trigger the emulator jobs

Note to self, I need to enable the Android x86 builds on Cedar and Ash.

Armen [:armenzg]

Assignee

Comment 13

•

11 years ago

It seems that the SDK packaging might have changed since your original setup.

I'm trying this:
wget http://dl.google.com/android/adt/adt-bundle-linux-x86_64-20130729.zip
unzip adt-bundle-linux-x86_64-20130729.zip
mv adt-bundle-linux-x86_64-20130729/sdk/ ~/android-sdk-linux
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
cd ~/mozharness
python scripts/androidx86_emulator_unittest.py --config-file configs/android/androidx86.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/robocop.apk --download-symbols ondemand


Which of these avd files in cltbld@talos-linux64-ix-001:~/gbrown should I try?
junk2-avd.tgz  test-avd-8.tgz  test-avds-4.tgz  test-avds-5.tgz  test-avds-6.tgz  test-avds.tgz

Geoff Brown [:gbrown]

Reporter

Comment 14

•

11 years ago

Use this one:

http://people.mozilla.org/~gbrown/test-avds.tgz

Let's chat on Tuesday about the rest.

Geoff Brown [:gbrown]

Reporter

Comment 15

•

11 years ago

Attached patch misc x86 emulator changes (obsolete) — Details — Splinter Review

Similar to your patch, here are the changes I have been running with lately:
 - enable xpcshell tests
 - use hostutils.zip instead of xre.zip
 - simplify

Attachment #786149 - Flags: review?(armenzg)

Armen [:armenzg]

Assignee

Comment 16

•

11 years ago

Comment on attachment 786149 [details] [diff] [review]
misc x86 emulator changes

Review of attachment 786149 [details] [diff] [review]:
-----------------------------------------------------------------

Changing to feedback+ to prevent landing until we're ready.

Would avoiding the check-in get in the way for you? I can ask aki for a review once I'm comfortable that things are working all the way.

Attachment #786149 - Flags: review?(armenzg) → feedback+

Geoff Brown [:gbrown]

Reporter

Comment 17

•

11 years ago

(In reply to Armen Zambrano G. [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #16)
> Would avoiding the check-in get in the way for you? 

That's fine. I thought it would be easier to "share" with you if I checked in -- whatever works for you is fine.

Armen [:armenzg]

Assignee

Comment 18

•

11 years ago

Attached patch [wip] androix86 (obsolete) — Details — Splinter Review

This patch brings our patches a little closer to each other, but they do not quite match yet.
After I get things working, I will attach final patches.

Attachment #783351 - Attachment is obsolete: true

Attachment #786149 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 19

•

11 years ago

Should this have worked?

mkdir ~/.android/avd
cd ~/.android/avd
wget http://people.mozilla.org/~gbrown/test-avds.tgz
tar zxvf test-avds.tgz
wget -O launch.py https://bugzilla.mozilla.org/attachment.cgi?id=782839
$ python launch.py 
emulator: ERROR: This AVD's configuration is missing a kernel file!!

Geoff Brown [:gbrown]

Reporter

Comment 20

•

11 years ago

That's the right idea, but something is going wrong.

I think the issue is that the avd definitions contain pointers to image files in the Android SDK. Probably the culprit is:

/home/cltbld/android-sdk-linux/system-images/android-17/x86//kernel-qemu

Armen [:armenzg]

Assignee

Comment 21

•

11 years ago

FYI, I might be using a newer SDK.
This is the version that I downloaded:
wget http://dl.google.com/android/adt/adt-bundle-linux-x86_64-20130729.zip

I can try to find an older SDK to match what you used.

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ ls -l /home/cltbld/android-sdk-linux/system-images/android-18/
total 4
drwxrwx--- 2 cltbld cltbld 4096 Jul 10 19:05 armeabi-v7a
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ ls -l /home/cltbld/android-sdk-linux/system-images
total 4
drwxr-x--- 3 cltbld cltbld 4096 Jul 29 15:23 android-18

Armen [:armenzg]

Assignee

Comment 22

•

11 years ago

I got a little further. Suggestions?

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ python launch.py 
SDL init failure, reason is: No available video device

Armen [:armenzg]

Assignee

Comment 23

•

11 years ago

Setting the DISPLAY value helps.

> After a few minutes, launch.py will start 4 emulator instances and print the
> names and ports associated with each:

Is there a way to make it take less time, or to know that it has not hung?

Armen [:armenzg]

Assignee

Comment 24

•

11 years ago

export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
export DISPLAY=:0.0
cd ~/mozharness
python launch.py (modified some printing)

Should I worry about those WARNINGs?
Do we need the sleeptime?


[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ python launch.py 
Launching emulator #0
Attemp #1 of SUT redirection
test-x86-1: 5554; sut port:20701/20700
Sleeping 60
WARNING: Data partition already in use. Changes will not persist!
WARNING: SD Card image already in use: /home/cltbld/.android/avd/test-x86-1.avd/sdcard.img
WARNING: Cache partition already in use. Changes will not persist!
Launching emulator #1
Attemp #1 of SUT redirection
Attemp #2 of SUT redirection
Attemp #3 of SUT redirection
^@Attemp #4 of SUT redirection
^@^@Attemp #5 of SUT redirection
^@^@^@^@^@^@^@^@^@^@Traceback (most recent call last):
  File "launch.py", line 45, in <module>
    proc = launchEmulatorByIndex(i)
  File "launch.py", line 37, in launchEmulatorByIndex
    redirectSUT(emuport, sutport1, sutport2)
  File "launch.py", line 23, in redirectSUT
    tn.read_until('OK')
UnboundLocalError: local variable 'tn' referenced before assignment
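
An UnboundLocalError like the one above typically appears when a variable is assigned only inside a try block that failed on every attempt and is then used after the retry loop. The source of launch.py is not shown here, so the following is only a sketch of a retry helper that avoids that pattern, not the actual fix; connect_with_retry and its parameters are hypothetical names.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=10.0, timeout=10.0):
    """Return a connected socket, or raise RuntimeError if every attempt fails.

    The connection is returned from inside the loop, so there is no code
    path that touches an unassigned variable after the last failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            print("Attempt #%d failed: %s" % (attempt, exc))
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(
        "could not reach %s:%d after %d attempts: %s"
        % (host, port, attempts, last_error)
    )
```

Raising a clear error after the last attempt also makes it obvious in the log that the emulator console never came up, rather than failing later with an unrelated exception.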

Armen [:armenzg]

Assignee

Comment 25

•

11 years ago

How can I remove devices?

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ adb devices
List of devices attached 
emulator-5554   device

Geoff Brown [:gbrown]

Reporter

Comment 26

•

11 years ago

> Should I worry about those WARNINGs?

Yes - they indicate that more than one emulator is running against the same image file...that should not be the case.

> Do we need the sleeptime?

Yes, I think so. I found that without those sleeps, there were intermittent failures to launch an emulator.

> How can I remove devices?

Kill the emulator:

ps -ef | grep emu
kill ...
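
The two manual steps above can be sketched in Python. This assumes the `ps -ef` column layout shown elsewhere in this bug (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and that emulator binaries have names starting with "emulator", such as emulator64-x86; the function names are hypothetical.

```python
import os
import signal


def find_emulator_pids(ps_output):
    """Return the PIDs of emulator processes found in `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        # Match on the binary name only, so "grep emu" is not picked up.
        if os.path.basename(executable).startswith("emulator"):
            pids.append(int(fields[1]))
    return pids


def kill_emulators(ps_output):
    """Send SIGTERM to every emulator process listed in `ps_output`."""
    for pid in find_emulator_pids(ps_output):
        os.kill(pid, signal.SIGTERM)
```

Usage would be along the lines of `kill_emulators(subprocess.check_output(["ps", "-ef"]).decode())`, followed by `adb devices` to confirm that nothing is left running.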

Armen [:armenzg]

Assignee

Comment 27

•

11 years ago

I've removed all devices yet I get this:

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ ps -ef | grep emu
cltbld    3234  2920  0 14:48 pts/3    00:00:00 grep emu
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ python launch.py 
Launching emulator #0
Attemp #1 of SUT redirection
Socket Error [Errno 111] Connection refused
Attemp #2 of SUT redirection
Socket Error [Errno 111] Connection refused
Attemp #3 of SUT redirection
Socket Error [Errno 111] Connection refused
^Z
[1]+  Stopped                 python launch.py
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ ps -ef | grep emu
cltbld    3236  3235  1 14:48 pts/3    00:00:03 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-1 -port 5554
cltbld    3250  2920  0 14:53 pts/3    00:00:00 grep emu

Geoff Brown [:gbrown]

Reporter

Comment 28

•

11 years ago

You can debug the telnet connection problem manually:

Kill all emulators and verify adb devices shows nothing running. Then:

$ emulator -avd test-x86-1 &
$ adb devices
List of devices attached 
emulator-5554	device

$ telnet localhost 5554
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Android Console: type 'help' for a list of commands
OK
redir add tcp:20701:20701
OK
redir add tcp:20700:20700
OK
quit
Connection closed by foreign host.
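
The telnet session above can be scripted with a plain socket. This is a sketch that mirrors the exchange shown (banner ending in OK, then one OK per redir command); it does not handle error replies, and newer emulator builds may require an authentication step before accepting commands, which is not covered here. The function names are hypothetical.

```python
import socket


def read_until_ok(reader):
    """Consume console lines up to and including the next 'OK' line."""
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("console closed before replying OK")
        line = raw.decode("ascii", "replace").strip()
        lines.append(line)
        if line == "OK":
            return lines


def add_sut_redirects(conn, sut_ports=(20701, 20700)):
    """Send 'redir add' for each SUT port over an open console connection."""
    reader = conn.makefile("rb")
    read_until_ok(reader)  # skip the banner, which ends with OK
    for port in sut_ports:
        command = "redir add tcp:%d:%d\n" % (port, port)
        conn.sendall(command.encode("ascii"))
        read_until_ok(reader)
    conn.sendall(b"quit\n")
```

A caller would open the connection with something like `socket.create_connection(("localhost", 5554), timeout=30)` so that a console that never replies OK results in a timeout instead of a hang.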

Geoff Brown [:gbrown]

Reporter

Comment 29

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #20)
> I think the issue is that the avd definitions contain pointers to image
> files in the Android SDK. Probably the culprit is:
> 
> /home/cltbld/android-sdk-linux/system-images/android-17/x86//kernel-qemu

I created a new tar of avd images that includes kernel-qemu, along with the system.img and ramdisk.img: http://people.mozilla.org/~gbrown/test-avds-aug6.tgz

With these images, you do not need the system images in the SDK. But, you need to launch the emulator slightly differently (specify paths to the kernel, etc on the emulator command line). I will update the launch.py on bug 894507.

Armen [:armenzg]

Assignee

Comment 30

•

11 years ago

Hi gbrown,
I believe something is intermittently not working properly on our machines.
For some reason, I sometimes have trouble starting emulator #1.
I have noticed that when I run into it, VNC becomes unresponsive.
I've also noticed that compiz starts running at 100% CPU and things only get back to normal after I kill it.
After I killed it, I managed to start all emulators.

What can I do if I run into it again? What can I use to debug it? Are there any log messages that I can check?

These are the steps that I followed:
cd ~/.android/avd
rm -rf *
wget http://people.mozilla.org/~gbrown/test-avds-aug6.tgz
tar zxvf test-avds-aug6.tgz [1]
rm test-avds-aug6.tgz
cd
export DISPLAY=:0.0
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
emulator -avd test-x86-1 &
emulator -avd test-x86-2 &
emulator -avd test-x86-3 &
emulator -avd test-x86-4 &

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ ps -ef | grep emu
cltbld    3323  2917 20 09:16 pts/3    00:01:42 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-2
cltbld    3366  2917 24 09:18 pts/3    00:01:37 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-3
cltbld    3388  2917 23 09:18 pts/3    00:01:35 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-4
cltbld    3442  2917 34 09:21 pts/3    00:01:15 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-1

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ adb devices
List of devices attached 
emulator-5554   device
emulator-5556   device
emulator-5558   device
emulator-5560   device

[1]
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com temp]$ ls -l ~/.android/avd/
total 283028
-rw-r--r-- 1 cltbld cltbld   2825664 Aug  6 15:19 kernel-qemu
-rw-r--r-- 1 cltbld cltbld    270168 Aug  6 15:19 ramdisk.img
-rw-r--r-- 1 cltbld cltbld 286691328 Aug  6 15:00 system.img
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:24 test-x86-1.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:24 test-x86-1.ini
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:27 test-x86-2.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:27 test-x86-2.ini
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:27 test-x86-3.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:27 test-x86-3.ini
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:27 test-x86-4.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:27 test-x86-4.ini

Geoff Brown [:gbrown]

Reporter

Comment 31

•

11 years ago

I saw those exact symptoms when I started using multiple emulators. The only way I could find to avoid it was to stagger the launches: sleep after launching each emulator. That's why there are those long sleeps in launch.py.
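
A minimal sketch of the staggered launch described here, not the actual launch.py. The -avd and -port options, the test-x86-N names, and the 60 second sleep are taken from output elsewhere in this bug; the function names and the injectable spawn/sleep parameters are assumptions made so the logic can be exercised without real emulators.

```python
import subprocess
import time


def emulator_command(index, binary="emulator", first_port=5554):
    """Build the command line for the emulator at `index` (0-based)."""
    return [
        binary,
        "-avd", "test-x86-%d" % (index + 1),
        "-port", str(first_port + 2 * index),  # console ports step by two
    ]


def launch_staggered(count, sleeptime=60, spawn=subprocess.Popen, sleep=time.sleep):
    """Start `count` emulators one at a time, sleeping after each launch."""
    procs = []
    for index in range(count):
        procs.append(spawn(emulator_command(index)))
        sleep(sleeptime)
    return procs
```

Calling `launch_staggered(4)` would start four emulators roughly four minutes apart in total; the sleep only spaces out the launches and does not confirm that each emulator actually booted.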

Geoff Brown [:gbrown]

Reporter

Comment 32

•

11 years ago

I don't know of a good way to debug/diagnose this problem.

Armen [:armenzg]

Assignee

Comment 33

•

11 years ago

What are the http and ssl ports supposed to be?

Geoff Brown [:gbrown]

Reporter

Comment 34

•

11 years ago

I have been using:

    "http_port": "8888",
    "ssl_port": "4445",

for the first emulator, and incrementing for each subsequent: 8889/4446 for the second, etc. dminor -- is that right?

Flags: needinfo?(dminor)

Dan Minor [:dminor]

Comment 35

•

11 years ago

I've been doing the same thing as Geoff. These are the ports set up by the test webserver, so any values that don't collide with something else running on the test system are fine.
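
The numbering scheme described above, together with a simple collision check, can be sketched as follows. The starting values 8888 and 4445 come from this bug; the helper names are hypothetical, and the bind probe only shows whether a port is free at the moment of the check.

```python
import socket


def ports_for_emulator(index, first_http=8888, first_ssl=4445):
    """Return the web server ports for the emulator at `index` (0-based)."""
    return {
        "http_port": str(first_http + index),
        "ssl_port": str(first_ssl + index),
    }


def port_is_free(port, host="127.0.0.1"):
    """Return True if `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


for i in range(4):
    print(i, ports_for_emulator(i))
```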

Flags: needinfo?(dminor)

Armen [:armenzg]

Assignee

Comment 36

•

11 years ago

Attached patch [wip] integrated launch.py into mozharness (obsolete) — Details — Splinter Review

I've added an action called start-emulators.
I hope tomorrow to run the mochitest suite to completion.

gbrown, dminor: what should we do once the tests pass? Do we shut the emulators off with the "kill" command from within the telnet connection? Or should we reboot directly?

Should I check if there are any emulators running at the beginning so I can shut them off before creating the new ones?

This was run like this:
 python scripts/androidx86_emulator_unittest.py --config-file configs/android/androidx86.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/robocop.apk --download-symbols ondemand --test-suite mochitest

Attachment #786437 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 37

•

11 years ago

I'm trying to run mochitests manually but it is failing to run /system/bin/logcat -c.

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ telnet localhost 20701
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
$>/system/bin/logcat -c
##AGENT-WARNING## [/system/bin/logcat] command with arg(s) = [-c] is currently not implemented.
$>ver
SUTAgentAndroid Version 1.18
$>


export DISPLAY=:0.0
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
emulator -avd test-x86-1 &
telnet localhost 5554
redir add tcp:20701:20701
redir add tcp:20700:20700
quit
/home/cltbld/mozharness/build/venv/bin/python /home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py --autorun --close-when-done --dm_trans=sut --console-level INFO --app org.mozilla.fennec --remote-webserver 10.0.2.2 --run-only-tests androidx86.json --xre-path /home/cltbld/mozharness/build/hostutils/xre --utility-path /home/cltbld/mozharness/build/hostutils/bin --deviceIP 127.0.0.1 --devicePort 20701 --http-port 8888 --ssl-port 4445 --httpd-path /home/cltbld/mozharness/build/tests/mochitest --total-chunks 8 --this-chunk 1 --symbols-path crashreporter-symbols.zip 
Device info: {'uptime': ['0 days 0 hours 6 minutes 41 seconds 181 ms'], 'sutuserinfo': ['User Serial:0'], 'power': ['Power status:', ' AC power ONLINE', ' Battery charge CHARGING', ' Remaining charge: 50%', ' Battery Temperature: 0.0 (c)'], 'process': [['10037', '1591', 'com.android.exchange'], ['10047', '1689', 'com.mozilla.SUTAgentAndroid'], ['10038', '1605', 'com.android.providers.calendar'], ['10027', '1646', 'com.android.calendar'], ['10033', '1265', 'com.android.systemui'], ['10018', '1362', 'com.android.inputmethod.latin'], ['10005', '1433', 'com.android.location.fused'], ['1000', '1171', 'system'], ['10022', '1570', 'com.android.deskclock'], ['10002', '1411', 'com.android.launcher'], ['10024', '1290', 'android.process.media'], ['1000', '1458', 'com.android.settings'], ['1001', '1390', 'com.android.phone'], ['10046', '1627', 'com.mozilla.watcher'], ['10015', '1535', 'com.android.mms'], ['10010', '1328', 'android.process.acore'], ['10010', '1501', 'com.android.contacts'], ['10025', '1485', 'com.android.music']], 'screen': ['X:1024 Y:720'], 'memory': ['PA:799289344, FREE: 665034752'], 'systime': ['2013/08/09 02:18:31:247'], 'rotation': ['ROTATION:0'], 'disk': ['/data: 610140160 total, 562380800 available', '/system: 277610496 total, 0 available', '/mnt/sdcard: 522225664 total, 518141952 available'], 'os': ['sdk_x86-eng 4.2 JOP40C eng.android-build.20121231.103448 test-keys'], 'id': ['52:54:00:12:34:56'], 'uptimemillis': ['401207'], 'temperature': ['Temperature: unknown']}
Test root: /mnt/sdcard/tests
Automation Error: Exception caught while running tests
Traceback (most recent call last):
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 688, in main
    dm.recordLogcat()
  File "/home/cltbld/mozharness/build/tests/mochitest/devicemanager.py", line 125, in recordLogcat
    self.shellCheckOutput(['/system/bin/logcat', '-c'], root=self._logcatNeedsRoot)
  File "/home/cltbld/mozharness/build/tests/mochitest/devicemanager.py", line 375, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/system/bin/logcat', '-c'] (output: 'su: uid 10047 not allowed to su', retval: '1')
Traceback (most recent call last):
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 707, in <module>
    main()
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 693, in main
    mochitest.stopWebServer(options)
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 332, in stopWebServer
    self.server.stop()
AttributeError: 'MochiRemote' object has no attribute 'server'

Geoff Brown [:gbrown]

Reporter

Comment 38

•

11 years ago

The important bit there is:

su: uid 10047 not allowed to su

It looks like su is not installed. Are you using my avd definitions, or your own?

Armen [:armenzg]

Assignee

Comment 39

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #38)
> The important bit there is:
> 
> su: uid 10047 not allowed to su
> 
> It looks like su is not installed. Are you using my avd definitions, or your
> own?

I'm using this: http://people.mozilla.org/~gbrown/test-avds-aug6.tgz
su is included in that tar, no?

Geoff Brown [:gbrown]

Reporter

Comment 40

•

11 years ago

Yes, su should be in system.img, in that tar.

It works for me. Compare:

mozdev@ubuntu:~/.android/avd$ emulator64-x86 -avd test-x86-2 -kernel kernel-qemu -system system.img -ramdisk ramdisk.img &
[1] 14552
mozdev@ubuntu:~/.android/avd$ adb shell ls -l /system/xbin/su
-rwsr-sr-x root     root        17748 2013-08-05 03:14 su
mozdev@ubuntu:~/.android/avd$ ls -l system.img
-rw-r--r-- 1 mozdev mozdev 286691328 Aug  6 15:00 system.img
mozdev@ubuntu:~/.android/avd$ adb shell su -c id
uid=0(root) gid=0(root)
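
The last check in that session can be automated before a test run. A sketch, assuming adb is on PATH; it inspects the command output for a root uid instead of relying on the exit status. The su_works name and the injectable run parameter are hypothetical.

```python
import subprocess


def su_works(serial, run=subprocess.check_output):
    """Return True if `su -c id` on the given device reports uid 0."""
    try:
        output = run(["adb", "-s", serial, "shell", "su", "-c", "id"])
    except (OSError, subprocess.CalledProcessError):
        # adb is missing or the command failed outright.
        return False
    return output.decode("utf-8", "replace").strip().startswith("uid=0(")
```

If `su_works("emulator-5554")` returns False, the emulator is likely not running the system.img that contains su, which matches the "not allowed to su" error discussed in this bug.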

Justin Wood (:Callek)

Comment 41

•

11 years ago

(In reply to Armen Zambrano G. [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #37)
> I'm trying to run mochitests manually but it is failing to run
> /system/bin/logcat -c.
...

> $>ver
> SUTAgentAndroid Version 1.18
> $>

As an FYI, SUTAgent 1.18 had issues in production and was backed out pending ATeam having cycles to look into it. We're currently using 1.17. I have no idea whether that is affecting things here.

Geoff Brown [:gbrown]

Reporter

Comment 42

•

11 years ago

(I am pretty comfortable with sut 1.18, at least on the emulator, but we should keep in mind that it is not the current version used elsewhere.)

I removed the system-images from my Android SDK installation (android-sdk-linux). If your machine has a system image in the SDK, your emulator might be picking that up.

Armen [:armenzg]

Assignee

Comment 43

•

11 years ago

Attached patch x86_tools.diff (obsolete) — Details — Splinter Review

Attachment #788270 - Flags: review?(bugspam.Callek)

Armen [:armenzg]

Assignee

Comment 44

•

11 years ago

Attached patch [wip] x86_bc.diff (obsolete) — Details — Splinter Review

I'm not happy with the for loop that I use, but I did not feel like duplicating all of the dictionaries.

This adds the x86 builds and x86 emulator tests to Ash.
That branch is special because it allows us to push to http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness to test mozharness patches.

aki, what do you think about the for loop? Would you take it? (No need to review the rest of the patch)

Attachment #788293 - Flags: feedback?(aki)

Aki Sasaki (not active)

Comment 45

•

11 years ago

Comment on attachment 788293 [details] [diff] [review]
[wip] x86_bc.diff

It's fine as long as we don't change the structure of the dicts... less ugly than some of the for loops we have :P
I think I was used to this being a sql table, so the sql can be programmatic but you get a readable exploded version.
I think we were investigating configconfig which would have a similar type of thing: a .py file to generate a verbose dict in some format (json? yaml?)  We keep hitting this write-optimize vs read-optimize problem, so that might be a longer term solution.

Attachment #788293 - Flags: feedback?(aki) → feedback+

Armen [:armenzg]

Assignee

Comment 46

•

11 years ago

gbrown, dminor: could we use this mozharness repo for now? That way we won't be working from different places.
http://hg.mozilla.org/users/armenzg_mozilla.com/mozharness

I will be pushing my changes there.

I have this running in our staging environment.
As soon as we get to a half-decent state I will ask for reviews and enable it on Ash.

Which test suites have so far been successful?

Armen [:armenzg]

Assignee

Comment 47

•

11 years ago

Which test suites are we expected to run?
The same ones as on Panda Android?

Geoff Brown [:gbrown]

Reporter

Comment 48

•

11 years ago

The full set of (Android) unit tests, like Panda, but including reftests and xpcshell tests:

M1 M2 M3 M4 M5 M6 M7 M8 M-gl rc1 rc2 C J1 J2 J3 R1 R2 R3 R4 X

AFAIK, we are not (yet) attempting Talos.

Justin Wood (:Callek)

Comment 49

•

11 years ago

Comment on attachment 788270 [details] [diff] [review]
x86_tools.diff

Review of attachment 788270 [details] [diff] [review]:
-----------------------------------------------------------------

reluctant r+ here.

So you're not tying these builds to panda masters. The bm19/20/22/10 set of masters are all going away (they are on kvm), and I would prefer not to load them up with a new job type until after we get them onto the new masters and balanced via slavealloc.

That said, the aws masters you're marking are not yet officially in use, so I don't know if we need to do anything there first. I would also be interested in which masters you actually plan to attach these jobs to, and how.

Finally, I trust you to figure this all out and the changes are technically sound, so r+.

Attachment #788270 - Flags: review?(bugspam.Callek) → review+

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Geoff Brown [:gbrown]

Reporter

Comment 51

•

11 years ago

Comment on attachment 787765 [details] [diff] [review]
[wip] integrated launch.py into mozharness

Review of attachment 787765 [details] [diff] [review]:
-----------------------------------------------------------------

::: configs/android/androidx86.py
@@ +12,5 @@
> +    "min_http_port": "8888",  # starting http port to use for the mochitest server
> +    "min_ssl_port": "4445",   # starting ssl port to use for the server
> +    "min_emu_port": 5554,
> +    "min_sut_port1": 20701,
> +    "min_sut_port2" : 20700, # XXX: why do we have two ports?

sutagent supports 2 ports: 20700 and 20701 by default. 20700 does not prompt for commands; 20701 does. As far as I know, our test automation only uses 20701, but I think it best to redirect both ports in case there is some usage I am not aware of, and to allow for future use.
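
For reference, a sketch of talking to the prompting port (20701) from Python. It assumes only what the telnet session earlier in this bug shows: the agent prints a "$>" prompt, accepts a command line, and prints the reply followed by another prompt. The function names are hypothetical, and reading byte by byte is chosen for simplicity, not speed.

```python
import socket


def read_until_prompt(conn, prompt=b"$>"):
    """Read from `conn` until the prompt appears; return the bytes before it."""
    buf = b""
    while not buf.endswith(prompt):
        byte = conn.recv(1)
        if not byte:
            raise ConnectionError("agent closed the connection")
        buf += byte
    return buf[: -len(prompt)]


def sut_command(conn, command):
    """Send one command on the prompting port and return the reply text."""
    read_until_prompt(conn)  # wait for the initial prompt
    conn.sendall(command.encode("ascii") + b"\n")
    return read_until_prompt(conn).decode("ascii", "replace").strip()
```

With the redirects in place, `sut_command(socket.create_connection(("127.0.0.1", 20701), timeout=30), "ver")` would be expected to return the agent version string.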

::: scripts/androidx86_emulator_unittest.py
@@ +38,4 @@
>           "default": "browser",
>           "help": "The type of tests to run",
>          }],
> +        [["--robocop-url"],

We can do without a robocop-specific option: See _download_robocop_apk() in android_panda.py.

@@ +226,5 @@
> +            self.info("Sleeping %d" % sleeptime)
> +            time.sleep(sleeptime)
> +            # XXX: what is this for?
> +            #for proc in procs:
> +            #    proc.wait()

launch.py had this just so that the launch script would wait for all of the emulators. This allowed you to Ctrl-C out of launch.py and kill all of the emulators at the same time. It's safe to remove from your mozharness patch.

Armen [:armenzg]

Assignee

Comment 52

•

11 years ago

Attached patch androidx86.tools.diff (obsolete) — Details — Splinter Review

Thanks for pointing that out.
I believe this should meet your expectations.

Attachment #788270 - Attachment is obsolete: true

Attachment #790394 - Flags: review?(bugspam.Callek)

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Summary: Determine how to run Android x86 emulator unit tests from buildbot → Run Android x86 emulator unit tests from buildbot

Hal Wine [:hwine] use NI!

Updated

•

11 years ago

Whiteboard: [reit-x86]

Armen [:armenzg]

Assignee

Comment 53

•

11 years ago

Attached patch [wip] androidx86.mozharness.diff (obsolete) — Details — Splinter Review

Attachment #787765 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 54

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #31)
> I saw those exact symptoms when I started using multiple emulators. The only
> way I could find to avoid it was to stagger the launches: sleep after
> launching each emulator. That's why there are those long sleep's in
> launch.py.

I don't know what to do. We have staggered start ups yet we get into this bad state.
We have to find a way to determine why they would not start.
This is a blocker for me.

Armen [:armenzg]

Assignee

Comment 55

•

11 years ago

Attached file androidx86.log.txt (obsolete) — Details

I'm attaching the log (note this is not ready yet).

I will be adding a way to stop the whole thing if all 4 emulators are not ready.
I will also have to figure out how to deal with determining the robocop from the read-buildbot-configs step.

Geoff Brown [:gbrown]

Reporter

Comment 56

•

11 years ago

That log is puzzling to me. It looks like you are launching the emulators the same way that I do, but I never have a problem like that when the launches are staggered.

Do you know if the 2nd, 3rd, and 4th emulators processes are still running when those telnet connections fail?

The emulator does not write a log, as far as I can tell. However, it usually writes some messages to stdout and/or stderr. You could try collecting and reporting those. Also, you can get more of those messages by specifying "-debug all" on the emulator command line.

Geoff Brown [:gbrown]

Reporter

Comment 57

•

11 years ago

Comment on attachment 791003 [details] [diff] [review]
[wip] androidx86.mozharness.diff

Review of attachment 791003 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/androidx86_emulator_unittest.py
@@ +239,5 @@
> +        Let's make sure that every emulator has been stopped
> +        '''
> +        for p in self.procs:
> +            if p.poll() is None:
> +                p.kill()

I think this will work fine, but if you want another option:

The emulator accepts a 'kill' command over telnet -- just like the 'redir' command used in _redirectSUT.
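
As a rough illustration of the telnet option, here is a minimal sketch that sends one command to an emulator console port over a plain socket. The helper name send_console_command is hypothetical, and the sketch assumes the console accepts newline-terminated commands without any authentication step; it is not the code from the patch under review.

```python
import socket


def send_console_command(port, command, host="127.0.0.1", timeout=10):
    """Send a single command (e.g. 'kill') to an emulator console port.

    Hypothetical helper: assumes the console takes newline-terminated
    commands and needs no authentication.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(command.encode("ascii") + b"\n")
    finally:
        sock.close()
```

With the ports used in this bug, send_console_command(5554, "kill") would target the first emulator. Keeping p.kill() as a fallback still makes sense in case the console is not reachable.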

Armen [:armenzg]

Assignee

Comment 58

•

11 years ago

Attached patch [wip] androidx86.mozharness.diff (obsolete) — Details — Splinter Review

* adding -debug all
* moved emulators parameters into the configs (the http port and ssl port need adjustment)
* removed launch processed by index
* removed "minimum" port concepts
* using self.fatal() when an emulator fails to be connected to
* added _post_fatal() function to kill the emulators (TODO)
* start as many emulators as specified in config["emulators"] instead of xrange(0, 4)

Armen [:armenzg]

Assignee

Comment 59

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #56)
> That log is puzzling to me. It looks like you are launching the emulators
> the same way that I do, but I never have a problem like that when the
> launches are staggered.
> 
> Do you know if the 2nd, 3rd, and 4th emulators processes are still running
> when those telnet connections fail?
> 
> The emulator does not write a log, as far as I can tell. However, it usually
> writes some messages to stdout and/or stderr. You could try collecting and
> reporting those. Also, you can get more of those messages by specifying "-debug
> all" on the emulator command line.

I think I was trying the scripts locally first, did not reboot and triggered a job with buildbot. This probably means that I had an emulator instance running.

_post_fatal() works very well.

I will post a new patch by the end of Monday.

Armen [:armenzg]

Assignee

Comment 60

•

11 years ago

FTR, I'm on duty this week and I might find it impossible to keep doing any development until Monday.
It does not yet trigger unit test jobs on all four emulators.
My latest work is in the attachment.

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Attachment #791003 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 61

•

11 years ago

gbrown, I'm having trouble with staggered start up.
The first one starts well. The second one does not.
I've rebooted the host and will try again.

Are 20701 and 20700 the default sut ports?
Should I be redirecting the ports of the first emulator somewhere else? It seems that we don't redirect the default sut ports for the first emulator.

13:27:54     INFO - Trying to start the emulator with this command: emulator -avd test-x86-1 -debug all -port 5554 -kernel /home/cltbld/.android/avd/kernel-qemu -system /home/cltbld/.android/avd/system.img -ramdisk /home/cltbld/.android/avd/ramdisk.img
13:27:54     INFO - Sleeping 10 seconds
13:28:04     INFO -   Attempt #1 to redirect ports: (5554, 20701, 20700)
13:28:04     INFO - test-x86-1: 5554; sut port: 20701/20700
13:28:04     INFO - Emulators staggered start up. Sleeping 60 secs.
13:29:04     INFO - Trying to start the emulator with this command: emulator -avd test-x86-2 -debug all -port 5556 -kernel /home/cltbld/.android/avd/kernel-qemu -system /home/cltbld/.android/avd/system.img -ramdisk /home/cltbld/.android/avd/ramdisk.img
13:29:04     INFO - Sleeping 10 seconds
13:29:14     INFO -   Attempt #1 to redirect ports: (5556, 20703, 20702)
13:29:14     INFO - Trying again after exception: [Errno 111] Connection refused
13:29:14     INFO - Sleeping 30 seconds
13:29:44     INFO -   Attempt #2 to redirect ports: (5556, 20703, 20702)
13:29:44     INFO - Trying again after exception: [Errno 111] Connection refused
13:29:44     INFO - Sleeping 30 seconds
13:30:14     INFO -   Attempt #3 to redirect ports: (5556, 20703, 20702)
13:30:14     INFO - Trying again after exception: [Errno 111] Connection refused
13:30:14     INFO - Sleeping 30 seconds
13:30:44     INFO -   Attempt #4 to redirect ports: (5556, 20703, 20702)
13:30:44     INFO - Trying again after exception: [Errno 111] Connection refused
13:30:44     INFO - Sleeping 30 seconds
13:31:14     INFO -   Attempt #5 to redirect ports: (5556, 20703, 20702)
13:31:14     INFO - Trying again after exception: [Errno 111] Connection refused
13:31:14    FATAL - We have not been able to establish a telnetconnection with the emulator
13:31:14    FATAL - Running post_fatal callback...
13:31:14     INFO - Let's kill every process called emulator-x86
13:31:14     INFO - Killing pid 2917.
13:31:14     INFO - Killing pid 2951.
13:31:14     INFO - Copying logs to upload dir...
13:31:14     INFO - mkdir: /builds/slave/talos-slave/test/build/upload/logs
13:31:14    FATAL - Exiting -1
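
The retry behaviour visible in this log (up to five attempts, 30 seconds apart, then a fatal error) could be sketched roughly as follows. The retry helper and its parameters are assumptions made for illustration, not the code in the mozharness patch.

```python
import time


def retry(action, attempts=5, delay=30, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    Returns the action's result and re-raises the last exception on
    failure. Hypothetical helper mirroring the "Attempt #N ... Sleeping
    30 seconds" pattern in the log.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as exc:  # e.g. [Errno 111] Connection refused
            if attempt == attempts:
                raise  # no sleep after the final attempt
            print("Trying again after exception: %s" % exc)
            sleep(delay)
```

Passing sleep as a parameter keeps the helper testable without real waiting; the caller can turn the final exception into a fatal error.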

Armen [:armenzg]

Assignee

Comment 62

•

11 years ago

Killing compiz helped again.

Armen [:armenzg]

Assignee

Comment 63

•

11 years ago

Attached file launch2.py — Details

I've created a new version of launch2.py to help me speed up the mozharness development.
It pretty much matches what I already have in mozharness.

This way I can focus only on running the tests once launch2.py runs well.

Armen [:armenzg]

Assignee

Comment 64

•

11 years ago

Attached file buildprops.json (obsolete) — Details

Armen [:armenzg]

Assignee

Comment 65

•

11 years ago

Attached patch androidx86.mozharness.diff (obsolete) — Details — Splinter Review

To make use of this script and get to where I am, you would have to do this:
- in one session run launch2.py by setting PATH and DISPLAY
-- two emulators should start

- on another session set PATH and DISPLAY as well
-- this will not be needed once I pass env values to ADBDeviceHandler
- clone mozharness
- apply mozharness patch
- download the buildprops.json [1]
- export PROPERTIES_FILE=`pwd`/buildprops.json
- /tools/buildbot/bin/python scripts/scripts/androidx86_emulator_unittest.py --cfg android/androidx86.py --test-suite mochitest --download-symbols ondemand

It currently fails on the install step.

[1] https://bugzilla.mozilla.org/attachment.cgi?id=794874

Attachment #791458 - Attachment is obsolete: true

Geoff Brown [:gbrown]

Reporter

Comment 66

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #61)
> gbrown, I'm having trouble with staggered start up.
> The first one starts well. The second one does not.

Can you collect the emulator -debug output from a failed run and post it here? I wonder if it has anything useful.
 
> Are 20701 and 20700 the default sut ports?

Yes.

> Should I be redirecting the ports of the first emulator somewhere else? It
> seems that we don't redirect the default sut ports for the first emulator.

Keep in mind that 

redir add tcp:20701:20701

is *not* a no-op. It "redirects" port 20701 on the emulator to port 20701 on the host, so that "telnet 127.0.0.1 20701" on the host connects to the sutagent on the emulator.

We want to issue a redir for every emulator, including the first one, and we just need the host ports to be unique. Currently, emulator 1 = {20700, 20701}, emulator 2 = {20702, 20703}, etc.
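
For illustration, the port scheme described here can be sketched as a small helper. The function names and default base values are assumptions, taken from the ports shown in the log in comment 61 (console ports 5554, 5556, ... and host SUT ports 20700/20701, 20702/20703, ...); this is not the code from the attached patch.

```python
def ports_for_emulator(index, base_emu_port=5554, base_sut_port=20700):
    """Return (console port, sut port 1, sut port 2) for the emulator at
    zero-based position `index`.

    Hypothetical helper: each emulator takes the next pair of ports.
    """
    emu_port = base_emu_port + 2 * index
    sut_port2 = base_sut_port + 2 * index  # maps to 20700 (no prompt)
    sut_port1 = sut_port2 + 1              # maps to 20701 (prompts)
    return emu_port, sut_port1, sut_port2


def redir_commands(index):
    """Build the console commands that map unique host ports to the
    sutagent's default ports (20701 and 20700) inside the emulator."""
    _, sut_port1, sut_port2 = ports_for_emulator(index)
    return [
        "redir add tcp:%d:20701" % sut_port1,
        "redir add tcp:%d:20700" % sut_port2,
    ]
```

For the first emulator (index 0) this yields the identity mapping, which still has to be issued because it is what makes the sutagent reachable from the host.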

> I've rebooted the host and will try again.

> Killing compiz helped again.

Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?

Armen [:armenzg]

Assignee

Comment 67

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #66)
> > I've rebooted the host and will try again.
> 
> > Killing compiz helped again.
> 
> Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?

The reboot was probably unnecessary. Killing compiz is what helped.

How is compiz involved in this project? Is it fine if after a failure to connect I kill compiz?

Armen [:armenzg]

Assignee

Comment 68

•

11 years ago

rail: we're working on running 4 android x86 emulators on the talos-linux64-ix machines. We have noticed that at times, compiz starts using 100% CPU and we have to kill it otherwise we can't connect to the emulator.
Callek pointed out to me that there's some puppet comments that mention that compiz can take 100% CPU.
Is there something we can do on our side to prevent hitting this bug?
Or do you have more context as to why it happens?
Thanks!

http://mxr.mozilla.org/build/source/puppet/modules/gui/manifests/init.pp#67
http://mxr.mozilla.org/build/source/puppet/modules/gui/templates/Xsession.conf.erb#11

Flags: needinfo?(rail)

Rail Aliiev [:rail]

Comment 69

•

11 years ago

> Is there something we can do on our side to prevent hitting this bug?

In bug 859867 I landed attachment 747431 [details] [diff] [review] to prevent this, but it doesn't look like it helps...

> Or do you have more context as to why it happens?

Last I poked this, I thought that the problem is https://bugzilla.mozilla.org/show_bug.cgi?id=859867#c23 (nvidia drivers)

I hope it helps.

Flags: needinfo?(rail)

Armen [:armenzg]

Assignee

Comment 70

•

11 years ago

:(

12:11 armenzg: hrmm talos-linux64-ix-003 rebooted on me
12:14 Callek: armenzg: rail-lunch: well this is interesting, syslog on -003 right now:
12:14 Callek: Aug 26 09:13:29 talos-linux64-ix-003 x-session-manager[2368]: WARNING: Application 'compiz.desktop' killed by signal
12:14 Callek: Aug 26 09:13:29 talos-linux64-ix-003 x-session-manager[2368]: WARNING: App 'compiz.desktop' respawning too quickly
12:14 Callek: Aug 26 09:13:29 talos-linux64-ix-003 x-session-manager[2368]: CRITICAL: We failed, but the fail whale is dead. Sorry....
12:14 Callek expects that was armen killing it ;-)
12:15 armenzg: Callek: I just killed the process
12:15 armenzg: like 10 seconds ago
12:15 armenzg: because I'm starting the emulators again
12:15 armenzg: and I knew that it would prevent them from starting
12:16 armenzg: I don't think I killed it fast enough
12:16 armenzg now kills emulator's pid 3007
12:16 Callek: armenzg: sooo looks like the system went down due to a kernel crash

Geoff Brown [:gbrown]

Reporter

Comment 71

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #67)
> (In reply to Geoff Brown [:gbrown] from comment #66)
> > > I've rebooted the host and will try again.
> > 
> > > Killing compiz helped again.
> > 
> > Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?
> 
> The reboot was probably unnecessary. Killing compiz is what helped.
> 
> How is compiz involved in this project? Is it fine if after a failure to
> connect I kill compiz?

We don't interact directly with compiz: none of the scripts start/stop compiz or anything like that. I have noticed that compiz is usually the "top" cpu user while running the emulators.

I expect that it is fine to kill compiz when all of the emulators are stopped. I would be hesitant to kill compiz with an active emulator running tests.

I don't have much insight into compiz. I wonder if :dminor has more info?

Flags: needinfo?(dminor)

Armen [:armenzg]

Assignee

Comment 72

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #71)
> (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> Enginerring) (EDT/UTC-4) from comment #67)
> > (In reply to Geoff Brown [:gbrown] from comment #66)
> > > > I've rebooted the host and will try again.
> > > 
> > > > Killing compiz helped again.
> > > 
> > > Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?
> > 
> > The reboot was probably unnecessary. Killing compiz is what helped.
> > 
> > How is compiz involved in this project? Is it fine if after a failure to
> > connect I kill compiz?
> 
> We don't interact directly with compiz: none of the scripts start/stop
> compiz or anything like that. I have noticed that compiz is usually the
> "top" cpu user while running the emulators.
> 
> I expect that it is fine to kill compiz when all of the emulators are
> stopped. I would be hesitant to kill compiz with an active emulator running
> tests.
> 
> I don't have much insight into compiz. I wonder if :dminor has more info?

I have only been killing when the emulators are starting up.
Once they have started up I have not had any trouble since I have not yet run any tests.

Dan Minor [:dminor]

Comment 73

•

11 years ago

It might be worthwhile spending some time to get Ubuntu running without compiz. It isn't necessary and is causing problems. It should be possible to do something like this: http://askubuntu.com/questions/32447/how-do-i-disable-compiz-in-the-ubuntu-classic-session

Flags: needinfo?(dminor)

Rail Aliiev [:rail]

Comment 74

•

11 years ago

(In reply to Dan Minor [:dminor] from comment #73)
> It might be worthwhile spending some time to get Ubuntu running without
> compiz. It isn't necessary and is causing problems. It should be possible to
> do something like this:
> http://askubuntu.com/questions/32447/how-do-i-disable-compiz-in-the-ubuntu-
> classic-session

We thought about using XFCE in the beginning, but didn't go with this option because it doesn't represent an "average" Ubuntu user. Switching to something non-compiz may affect unit tests and talos results as well.

Dan Minor [:dminor]

Comment 75

•

11 years ago

I thought there was going to be dedicated hardware for the Android x86 unit tests. If that is the case, then it should be safe to disable compiz for the machines running the emulators. If not, then I guess we are stuck killing it when it acts up.

Armen [:armenzg]

Assignee

Comment 76

•

11 years ago

I have been able to run two test suites, one on each emulator.
Now, I have to run them concurrently rather than sequentially.

aki, how does this format work for you?
    "test_suite_definitions": {
        "mochitest-1": {
            "args": [("--total-chunks", "8"), ("--this-chunk", "1")],
            "manifest": "androidx86.json",
        },  
        "mochitest-2": {
            "args": [("--total-chunks", "8"), ("--this-chunk", "2")],
            "manifest": "androidx86.json",
        },  
    },

I make use of it like this (--test-suite is an appendable list):
...
        self.test_suite_definitions = c['test_suite_definitions']
        self.test_suites = c.get('test_suites')
        for suite in self.test_suites:
            assert suite in self.test_suite_definitions
...         
'--run-only-tests', self.test_suite_definitions[suite_name]["manifest"],
...        
for arg_pair in self.test_suite_definitions[suite_name]["args"]:
   cmd.extend(self._build_arg(arg_pair[0], arg_pair[1]))

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 77

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #76)
> I have been able to run two test suites, one on each emulator.
> Now, I have to run them concurrently rather than sequentially.
> 
> aki, how does this format work for you?
>     "test_suite_definitions": {
>         "mochitest-1": {
>             "args": [("--total-chunks", "8"), ("--this-chunk", "1")],
>             "manifest": "androidx86.json",
>         },  
>         "mochitest-2": {
>             "args": [("--total-chunks", "8"), ("--this-chunk", "2")],
>             "manifest": "androidx86.json",
>         },  
>     },
> 
> I make use of it like this (--test-suite is an appendable list):
> ...
>         self.test_suite_definitions = c['test_suite_definitions']
>         self.test_suites = c.get('test_suites')
>         for suite in self.test_suites:
>             assert suite in self.test_suite_definitions
> ...         
> '--run-only-tests', self.test_suite_definitions[suite_name]["manifest"],
> ...        
> for arg_pair in self.test_suite_definitions[suite_name]["args"]:
>    cmd.extend(self._build_arg(arg_pair[0], arg_pair[1]))

Hm, not every argument is a pair.  I think I leaned towards extra_args as a flat list because of this.

Also, I'm not sure that --this-chunk will take multiple settings; it may just take the final one, so --test-suite mochitest-1 --test-suite mochitest-2 may only run chunk 2/8... though they're running in separate emulators?  (Not entirely clear here.)

Anyway, I would lean towards extra_args as a flat list, unless you have some specific reason for needing them in tuple pairs.
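
A minimal sketch of what the flat-list form might look like, using the suite names and manifest from comment 76. The "extra_args" key follows the suggestion above; the build_suite_args helper is hypothetical and only shows how the list can be appended without unpacking pairs.

```python
# Hypothetical config fragment: "extra_args" is a flat list, so flags
# that take no value can sit alongside flag/value pairs.
test_suite_definitions = {
    "mochitest-1": {
        "extra_args": ["--total-chunks", "8", "--this-chunk", "1"],
        "manifest": "androidx86.json",
    },
    "mochitest-2": {
        "extra_args": ["--total-chunks", "8", "--this-chunk", "2"],
        "manifest": "androidx86.json",
    },
}


def build_suite_args(definitions, suite_name):
    """Return the command-line arguments for a single suite."""
    suite = definitions[suite_name]
    cmd = ["--run-only-tests", suite["manifest"]]
    cmd.extend(suite["extra_args"])  # no pair handling needed
    return cmd
```

Because each suite gets its own argument list, every emulator's test command carries exactly one --this-chunk value.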

Flags: needinfo?(aki)

Armen [:armenzg]

Assignee

Comment 78

•

11 years ago

Attached patch androidx86.mozharness.diff (obsolete) — Details — Splinter Review

I'm running two different mochitest test suites on two different emulators :)

Here are the steps to follow in case you want to reproduce locally.
This is more accurate than comment 65.

I had to update the host with Callek's latest deployment.
* go in as root, apt-get update; apt-get install android-sdk18;

- In one session run launch2.py and set PATH and DISPLAY
-- two emulators should be started
export PATH=$PATH:/tools/android-sdk18/tools:/tools/android-sdk18/platform-tools
export DISPLAY=:0.0
wget -Olaunch2.py https://bugzilla.mozilla.org/attachment.cgi?id=794869
python launch2.py

- on another session set PATH and DISPLAY as well and run the script
-- this will not be needed once I pass env values to ADBDeviceHandler
- clone mozharness
- run the script

http://hg.mozilla.org/users/armenzg_mozilla.com/mozharness scripts
export PATH=$PATH:/tools/android-sdk18/tools:/tools/android-sdk18/platform-tools
export DISPLAY=:0.0
/tools/buildbot/bin/python scripts/scripts/androidx86_emulator_unittest.py --cfg android/androidx86.py --download-symbols ondemand --installer-path /builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.apk --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1377528472/en-US/fennec-26.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1377528472/en-US/fennec-26.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1377528472/en-US/robocop.apk --test-suite mochitest-1 --test-suite mochitest-2

Attachment #794875 - Attachment is obsolete: true

Justin Wood (:Callek)

Updated

•

11 years ago

Attachment #794869 - Attachment mime type: text/x-python → text/plain

Armen [:armenzg]

Assignee

Comment 79

•

11 years ago

Attached patch concurrent.diff (obsolete) — Details — Splinter Review

This tries to follow what aki recommended with respect to process manipulation (use poll() instead of wait()).

I still need to adjust it so we dump the log for each process that finishes.

I will continue this on Friday.
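
A rough sketch of the poll()-based approach, including dumping each process's output as soon as it finishes. The run_concurrently helper and its use of temporary files are assumptions for illustration; the attached concurrent.diff may be structured differently.

```python
import subprocess
import sys
import tempfile
import time


def run_concurrently(commands, poll_interval=0.1):
    """Start every command, then poll until all of them have exited.

    `commands` maps a name to an argument list. Each process writes to
    its own temporary file, which is read back as soon as that process
    finishes. Returns {name: (returncode, output)}.
    """
    running = {}
    for name, cmd in commands.items():
        out = tempfile.TemporaryFile()
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        running[name] = (proc, out)

    results = {}
    while running:
        for name in list(running):
            proc, out = running[name]
            if proc.poll() is None:
                continue  # still running; check the next process
            out.seek(0)
            output = out.read().decode(errors="replace")
            out.close()
            results[name] = (proc.returncode, output)
            del running[name]
        time.sleep(poll_interval)
    return results
```

Unlike calling wait() on each process in turn, this loop never blocks on a slow suite while a faster one has already finished.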

Justin Wood (:Callek)

Comment 80

•

11 years ago

Attached patch [puppet] add android-tools (obsolete) — Details — Splinter Review

This adds the android sdk to the ubuntu hosts.

Attachment #796385 - Flags: review?(rail)

Rail Aliiev [:rail]

Comment 81

•

11 years ago

Comment on attachment 796385 [details] [diff] [review]
[puppet] add android-tools

Review of attachment 796385 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/androidemulator/manifests/init.pp
@@ +7,5 @@
> +	    # We want it on Ubuntu
> +	    include packages::mozilla::android_sdk18
> +	}
> +    }
> +}
\ No newline at end of file

\ No newline at end of file

Can you add one?

If you want to use a separate module for the SDK, I'd suggest adding CentOS here and fixing http://hg.mozilla.org/build/puppet/file/618c178fd73e/modules/signingserver/manifests/base.pp#l31

::: modules/toplevel/manifests/slave/test/gpu.pp
@@ +6,4 @@
>          gui:
>              on_gpu => true;
>      }
> +    

extra trailing space ^

@@ +7,5 @@
>              on_gpu => true;
>      }
> +    
> +    # Android Emulators only work on gpu slaves
> +    include android-emulator

The name doesn't match the class name.

Attachment #796385 - Flags: review?(rail) → review-

Justin Wood (:Callek)

Comment 82

•

11 years ago

Attached patch [puppet] add android-tools v2 — Details — Splinter Review

Attachment #796385 - Attachment is obsolete: true

Attachment #796728 - Flags: review?(rail)

Rail Aliiev [:rail]

Updated

•

11 years ago

Attachment #796728 - Flags: review?(rail) → review+

Justin Wood (:Callek)

Comment 83

•

11 years ago

Comment on attachment 796728 [details] [diff] [review]
[puppet] add android-tools v2

 https://hg.mozilla.org/build/puppet/rev/5d7c4c7f5d1a

Attachment #796728 - Flags: checked-in+

Justin Wood (:Callek)

Comment 84

•

11 years ago

Attached patch [puppet] deploy avd's — Details — Splinter Review

So I'm not sold on this approach, and ideally we'll move this to a tooltool mechanic, even though nothing (I know of) uses tooltool on test machines yet.

It will not change often, and deploying with puppet is meant as a "get it out". This patch assumes we'd blow-away and recreate each job run, however we could also decompress with puppet and run with the same avd's each time.

Attachment #796938 - Flags: review?(rail)

Rail Aliiev [:rail]

Comment 85

•

11 years ago

Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

Can you also document it at https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/config?

Attachment #796938 - Flags: review?(rail) → review+

Justin Wood (:Callek)

Comment 86

•

11 years ago

Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

https://hg.mozilla.org/build/puppet/rev/9569852b09bd

...and documented
https://wiki.mozilla.org/index.php?title=ReleaseEngineering%2FPuppetAgain%2FModules%2Fconfig&action=historysubmit&diff=700887&oldid=675263

Attachment #796938 - Flags: checked-in+

Justin Wood (:Callek)

Comment 87

•

11 years ago

Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

backed out in https://hg.mozilla.org/build/puppet/rev/ac14ecf45840 due to:

Wed Aug 28 20:11:27 -0700 2013 Puppet (err): Failed to apply catalog: Parameter source failed on File[/home/cltbld/avds/test-x86.tar.gz]: Cannot use URLs of type 'http' as source for fileserving at /etc/puppet/production/modules/androidemulator/manifests/x86.pp:21

I suspect I can work around it by using one of the many other ways to specify source => for this file, but I'll test it tomorrow after I've rested.

Attachment #796938 - Flags: checked-in+ → checked-in-

Justin Wood (:Callek)

Comment 88

•

11 years ago

Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

relanded using puppet:/// rather than http://

https://hg.mozilla.org/build/puppet/rev/54046d654252

Attachment #796938 - Flags: checked-in- → checked-in+

Dustin J. Mitchell [:dustin] (he/him)

Comment 89

•

11 years ago

Comment on attachment 796728 [details] [diff] [review]
[puppet] add android-tools v2

Review of attachment 796728 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/androidemulator/manifests/init.pp
@@ +5,5 @@
> +    case $::operatingsystem {
> +        Ubuntu: {
> +	    # We want it on Ubuntu
> +	    include packages::mozilla::android_sdk18
> +	}

This should have a default line, or just remove the $::operatingsystem conditional and let the package class handle failing on other platforms.

Dustin J. Mitchell [:dustin] (he/him)

Comment 90

•

11 years ago

Comment on attachment 796728 [details] [diff] [review]
[puppet] add android-tools v2

Review of attachment 796728 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/packages/manifests/mozilla/android_sdk18.pp
@@ +5,5 @@
> +class packages::mozilla::android_sdk18 {
> +    case $::operatingsystem {
> +        Ubuntu: {
> +            package {
> +                # Built from https://github.com/rail/android-sdk

Sorry, one more thing on this patch - like screenresolution, this should probably be moved to a Mozilla repo.  It gets back to the still-unsolved problem of how to version Debian packaging scripts.

Dustin J. Mitchell [:dustin] (he/him)

Comment 91

•

11 years ago

Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

Review of attachment 796938 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/androidemulator/manifests/x86.pp
@@ +18,5 @@
> +		    source => "http://${config::data_server}/repos/private/avds/test-x86-aug6.tar.gz",
> +		    owner  => $users::builder::username,
> +		    group  => $users::builder::group;
> +	    }
> +	}

Will this HTTP source cause the entire tarball, which is quite large, to be downloaded on every puppet run?  That could be very painful.  Could this be downloaded with tooltool instead, or wrapped in a .deb?  At any rate, it isn't really a repository right now, so probably doesn't belong under repos/.

Do these have to be installed in $HOME?  We're not doing anything else in that directory anymore - could this be in /build or /tools instead?

Finally, where does this tarball come from?  How could an external user upgrade it, or a developer figure out what's in it, or a future relenger upgrade it to a newer version than aug6?

Dustin J. Mitchell [:dustin] (he/him)

Comment 92

•

11 years ago

Oh, and why is $install_avds "yes"/"no" instead of boolean?

Armen [:armenzg]

Assignee

Comment 93

•

11 years ago

Attached patch [wip] androidx86.configs.diff (obsolete) — Details — Splinter Review

I have to do some more mozharness work with buildbot, hence, I had to create this patch to run Android x86 on my dev-master.

What do you guys think of the builder naming? and the structure to define it?
androidx86-set-# --> --test-suite jsreftest-1 --test-suite jsreftest-2 --test-suite jsreftest-3


I had to do that weird ANDROID_X86_MOZHARNESS_UNITTEST_DICT and ANDROID_X86_MOZHARNESS_UNITTEST_DICT dictionaries. It is ugly. Do you have any suggestions?

Attachment #798640 - Flags: feedback?(bugspam.Callek)

Attachment #798640 - Flags: feedback?(aki)

Armen [:armenzg]

Assignee

Comment 94

•

11 years ago

Attached patch [wip] androidx86.mozharness.3.diff (obsolete) — Details — Splinter Review

This is my latest work.

I would only want to highlight the addition of the suite definitions for all of the test suites.

This week I will be tackling:
* put compiz under control
* manage the avds appropriately
* report status correctly

Attachment #796213 - Attachment is obsolete: true

Attachment #796282 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 95

•

11 years ago

Attached file androidx86.log.txt (obsolete) — Details

I'm getting such weird errors.
I wonder what I've changed to get this.

09:26:10     INFO - One of the test suites have finished and we're going to dump its output
09:26:10     INFO - Reading from file /tmp/tmp1MnEMs
09:26:10     INFO - Traceback (most recent call last):
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 707, in <module>
09:26:10     INFO -     main()
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 532, in main
09:26:10     INFO -     dm = droid.DroidSUT(options.deviceIP, options.devicePort, deviceRoot=options.remoteTestRoot)
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 47, in __init__
09:26:10     INFO -     self.getDeviceRoot()
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 694, in getDeviceRoot
09:26:10     INFO -     data = self._runCmds([{ 'cmd': 'testroot' }])
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 152, in _runCmds
09:26:10     INFO -     self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit)
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 128, in _sendCmds
09:26:10     INFO -     self._doCmds(cmdlist, outputfile, timeout)
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 175, in _doCmds
09:26:10     INFO -     self._sock.connect((self.host, int(self.port)))
09:26:10     INFO -   File "/usr/lib/python2.7/socket.py", line 224, in meth
09:26:10     INFO -     return getattr(self._sock,name)(*args)
09:26:10     INFO - TypeError: coercing to Unicode: need string or buffer, NoneType found

Attachment #791412 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Attachment #788293 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 96

•

11 years ago

Attached patch [wip] androidx86.mozharness.3.diff (obsolete) — Details — Splinter Review

This has my latest mozharness code.

- It kills compiz in advance
- It does some basic avds manipulation (still needs to be tested)

NOTE: I've found out that I need to kill compiz before trying to start any emulator (rather than try to kill it after an SUT timeout occurs).

Attachment #798649 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 97

•

11 years ago

Comment on attachment 798913 [details]
androidx86.log.txt

Ignore comment 95. I was doing something wrong.

Attachment #798913 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 98

•

11 years ago

Hi aki,
I see that ADBDeviceHandler prints to stdout rather than using logging.
http://hg.mozilla.org/build/mozharness/file/061c3d6c7b52/mozharness/mozilla/testing/device.py#l99

What should I do? Thanks!

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 99

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #98)
> Hi aki,
> I see that ADBDeviceHandler prints to stdout rather than using logging.
> http://hg.mozilla.org/build/mozharness/file/061c3d6c7b52/mozharness/mozilla/
> testing/device.py#l99
> 
> What should I do? Thanks!

I don't see any print()s in that file.  I'm going to guess this is the output spew from adb/devicemanager itself.  I don't know what we can do about that other than

* turn off output from devicemanager, which may hide issues
* get devicemanager to accept a log object
* try redirecting stdout/stderr itself, which may be an ugly solution but might work.
* do what I did, which is let devicemanager spew to stdout, and generally ignore it.
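
For reference, the "redirect stdout" option from the list above could look roughly like the sketch below. It is not what mozharness or devicemanager do; `StreamToLogger` and `call_with_stdout_logged` are hypothetical names invented for this illustration, and the approach only captures Python-level writes to `sys.stdout`, not output written directly to the file descriptor by a child process such as adb.

```python
import logging
import sys


class StreamToLogger(object):
    """File-like object that forwards complete lines to a logger (hypothetical helper)."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        # Buffer partial writes and log one record per complete line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self.logger.log(self.level, line)

    def flush(self):
        # Emit whatever is left without a trailing newline.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""


def call_with_stdout_logged(func, logger):
    """Run func() while sys.stdout is routed through the logger."""
    original = sys.stdout
    stream = StreamToLogger(logger)
    sys.stdout = stream
    try:
        return func()
    finally:
        stream.flush()
        sys.stdout = original
```

The trade-off is the one already noted: it is not pretty, and anything that bypasses `sys.stdout` still ends up in the raw output.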

Flags: needinfo?(aki)

Armen [:armenzg]

Assignee

Comment 100

•

11 years ago

Comment on attachment 798914 [details] [diff] [review]
[wip] androidx86.mozharness.3.diff

gbrown, dminor: could I please get your feedback on this patch?

I'm getting close to completion and I would like to give you a day or so to give me your feedback.

I would like to ask aki for a review with all of your concerns addressed first.

Attachment #798914 - Flags: feedback?(gbrown)

Attachment #798914 - Flags: feedback?(dminor)

Armen [:armenzg]

Assignee

Comment 101

•

11 years ago

Attached patch [wip] status mozharness experiments (obsolete) — Details — Splinter Review

This patch shows the changes since my last mozharness patch, where I experiment with how to report the final status.
This will probably take a couple of days.

Geoff Brown [:gbrown]

Reporter

Comment 102

•

11 years ago

Comment on attachment 798914 [details] [diff] [review]
[wip] androidx86.mozharness.3.diff

Review of attachment 798914 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good! 

I am really looking forward to seeing this running via tbpl (on some tree), even if not everything works yet.

::: configs/android/androidx86.py
@@ +20,5 @@
> +    },
> +    "default_actions": [
> +        'clobber',
> +        'read-buildbot-config',
> +        #'setup-avds',

Why is this commented out?

@@ +42,5 @@
> +        {
> +            "name": "test-x86-2",
> +            "device_id": "emulator-5556",
> +            "http_port": "8888", # starting http port to use for the mochitest server
> +            "ssl_port": "4445", # starting ssl port to use for the server

Are you sure it is okay to have the same http_port and ssl_port for all of the instances? I have always used distinct ports...I don't know if this will be a problem or not. If they are all the same, maybe these parameters can be moved out of the per-emulator config?
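
A minimal sketch of how distinct ports could be generated per emulator instead of being repeated by hand. The helper names and the "add one per emulator" port scheme are assumptions made for illustration; only the `test-x86-N` / `emulator-NNNN` naming and the starting values 8888 and 4445 come from the patch above.

```python
def build_emulator_configs(count, base_http_port=8888, base_ssl_port=4445):
    """Build per-emulator entries with distinct http/ssl ports (hypothetical helper)."""
    emulators = []
    for index in range(count):
        emulators.append({
            "name": "test-x86-%d" % (index + 1),
            # Emulator console ports are even numbers starting at 5554.
            "device_id": "emulator-%d" % (5554 + 2 * index),
            # Assumption: one port per emulator, counting up from the base.
            "http_port": str(base_http_port + index),
            "ssl_port": str(base_ssl_port + index),
        })
    return emulators


def assert_ports_unique(emulators):
    """Raise ValueError if two emulators share an http_port or ssl_port."""
    for key in ("http_port", "ssl_port"):
        ports = [emu[key] for emu in emulators]
        if len(set(ports)) != len(ports):
            raise ValueError("duplicate %s values: %r" % (key, ports))
```

Running `assert_ports_unique(build_emulator_configs(4))` at config load time would catch a copy-and-paste slip like the one questioned here.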

@@ +130,5 @@
> +            "extra_args": [os.path.join('tests', 'testing', 'crashtest', 'crashtests.list')]
> +        },
> +        "xpcshell": {
> +            "category": "xpcshell",
> +            "extra_args": ["--manifest", os.path.join('..','jsreftest', 'tests', 'jstests.list')]

That's not right! The xpcshell manifest should be xpcshell/xpcshell_android.ini.

@@ +134,5 @@
> +            "extra_args": ["--manifest", os.path.join('..','jsreftest', 'tests', 'jstests.list')]
> +        },
> +        "robocop-1": {
> +            "category": "mochitest",
> +            "extra_args": ["--robocop-path=.", "--robocop-ids=fennec_ids.txt", "--robocop=robocop.ini"],

Missing total-chunks, this-chunk args.

@@ +138,5 @@
> +            "extra_args": ["--robocop-path=.", "--robocop-ids=fennec_ids.txt", "--robocop=robocop.ini"],
> +        },
> +        "robocop-2": {
> +            "category": "mochitest",
> +            "extra_args": ["--robocop-path=.", "--robocop-ids=fennec_ids.txt", "--robocop=robocop.ini"],

Ditto.

::: scripts/androidx86_emulator_unittest.py
@@ +58,5 @@
>  
>      error_list = [
>          {'substr': 'FAILED (errors=', 'level': ERROR},
>          {'substr': r'''Could not successfully complete transport of message to Gecko, socket closed''', 'level': ERROR},
>          {'substr': 'Timeout waiting for marionette on port', 'level': ERROR},

There are a couple of errors left over from b2g here -- I'm sure they can be removed.

@@ +329,5 @@
> +        '''
> +        This action starts the emulators and redirects the two SUT ports for each one of them
> +        '''
> +        # XXX: This line is needed since I'm not rebootig the machine in between jobs
> +        self._kill_processes("emulator-x86")

I thought "emulator" spawned "emulator64-x86" on 64 bit machines -- might be worth double checking the process names.

@@ +395,5 @@
> +            if procs == []:
> +                break
> +            else:
> +                self.info("#")
> +                time.sleep(30)

I never like to see polling. Is this needed to avoid an output-driven timeout?

Attachment #798914 - Flags: feedback?(gbrown) → feedback+

Geoff Brown [:gbrown]

Reporter

Comment 103

•

11 years ago

Also...I know earlier versions were not setting minidump_stackwalk_path correctly and I did not notice any changes in your patch. We should check on that.

Armen [:armenzg]

Assignee

Comment 104

•

11 years ago

Thanks, gbrown, for your catches (especially the ports; I lost those changes somewhere).
I've fixed them and am testing again.

Armen [:armenzg]

Assignee

Comment 105

•

11 years ago

> @@ +395,5 @@
> > +            if procs == []:
> > +                break
> > +            else:
> > +                self.info("#")
> > +                time.sleep(30)
> 
> I never like to see polling. Is this needed to avoid an output-driven
> timeout?

The printing of "#" is to avoid an output-driven timeout.

I needed to know when the process that triggers the tests finishes.
I wish I could use a callback or a similar mechanism, but I didn't research it further.

Aki Sasaki (not active)

Comment 106

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #105)
> The printing of "#" is to avoid an output-driven timeout.

You may want to sys.stdout.write('#') to avoid filling up the log with a timestamp + INFO + # every 30 seconds.

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Attachment #794874 - Attachment is obsolete: true

Aki Sasaki (not active)

Comment 107

•

11 years ago

Comment on attachment 798640 [details] [diff] [review]
[wip] androidx86.configs.diff

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #93)
> What do you guys think of the builder naming? and the structure to define it?
> androidx86-set-# --> --test-suite jsreftest-1 --test-suite jsreftest-2
> --test-suite jsreftest-3

I think that makes sense.
I called groups of builds 'suites' elsewhere, but we already use 'suite' here so 'set' is fine.

> I had to do that weird ANDROID_X86_MOZHARNESS_UNITTEST_DICT and
> ANDROID_X86_MOZHARNESS_UNITTEST_DICT dictionaries. It is ugly. Do you have
> any suggestions?

I don't know about the names, but the current dict assumes we're running the test chunks individually, not parallelized on a single machine, so we do require a separate config for this.

>diff --git a/mozilla-tests/BuildSlaves.py.template b/mozilla-tests/BuildSlaves.py.template
>--- a/mozilla-tests/BuildSlaves.py.template
>+++ b/mozilla-tests/BuildSlaves.py.template
>@@ -1,43 +1,44 @@
>+    "ubuntu32": "pass",
>+    "ubuntu64": "pass",
>+    "ubuntu64-b2g": "pass",
>-    "tiger": "pass",
>-    "w764": "pass",
>-    "vista": "pass",
>+    "linux64_android-x86": "pass",

Did you intend to make these other changes?
I think linux64_android-x86 is the only specifically applicable line here, right?
I didn't see 'ubuntu32' in use anywhere; I didn't check on the others.

Also, nit: lots of trailing whitespace in your mobile_config.py patch :)

Attachment #798640 - Flags: feedback?(aki) → feedback+

Armen [:armenzg]

Assignee

Comment 108

•

11 years ago

Attached patch androidx86.configs.2.diff — Details — Splinter Review

I've dealt with all of your feedback.
I've removed some of the noise of sorting platforms so staging_config.py and production_config.py match (I think that was my original reason).
I'm also reusing the ubuntu64_hw instead.
I've removed a bunch of trailing white spaces.

Attachment #798640 - Attachment is obsolete: true

Attachment #798640 - Flags: feedback?(bugspam.Callek)

Attachment #799702 - Flags: review?(aki)

Armen [:armenzg]

Assignee

Comment 109

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #103)
> Also...I know earlier versions were not setting minidump_stackwalk_path
> correctly and I did not notice any changes in your patch. We should check on
> that.

What should it be?
I see this output for the panda jobs:
./configs/android/android_panda_releng.py:MINIDUMP_STACKWALK_PATH = "/builds/minidump_stackwalk"
./configs/android/android_panda_releng.py:    "minidump_stackwalk_path": MINIDUMP_STACKWALK_PATH,
./configs/android/android_panda_releng.py:    "minidump_save_path": "%(abs_work_dir)s/../minidumps",

Flags: needinfo?(gbrown)

Armen [:armenzg]

Assignee

Comment 110

•

11 years ago

gbrown: should I delete ~/.android/avds and unpack clean templates before each run?

Armen [:armenzg]

Assignee

Comment 111

•

11 years ago

Attached patch androidx86.mozharness.4.diff (obsolete) — Details — Splinter Review

I've completed the coding as best as possible.
Let me know if you prefer other ways to solve some of what I do.

Would you mind if we landed this after you review it and iterate after that?
I assume that we will need to see it running on tbpl and adjust as we see failures.

Attachment #798914 - Attachment is obsolete: true

Attachment #799090 - Attachment is obsolete: true

Attachment #798914 - Flags: feedback?(dminor)

Attachment #799741 - Flags: review?(aki)

Armen [:armenzg]

Assignee

Comment 112

•

11 years ago

Attached patch androidx86.mozharness.4.diff — Details — Splinter Review

Fixed some last minute typos.

Attachment #799741 - Attachment is obsolete: true

Attachment #799741 - Flags: review?(aki)

Attachment #799757 - Flags: review?(aki)

Geoff Brown [:gbrown]

Reporter

Comment 113

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #109)
> (In reply to Geoff Brown [:gbrown] from comment #103)
> > Also...I know earlier versions were not setting minidump_stackwalk_path
> > correctly and I did not notice any changes in your patch. We should check on
> > that.
> 
> What should it be?
> I see this output for the panda jobs:
> ./configs/android/android_panda_releng.py:MINIDUMP_STACKWALK_PATH =
> "/builds/minidump_stackwalk"
> ./configs/android/android_panda_releng.py:    "minidump_stackwalk_path":
> MINIDUMP_STACKWALK_PATH,
> ./configs/android/android_panda_releng.py:    "minidump_save_path":
> "%(abs_work_dir)s/../minidumps",

I do not know where minidump_stackwalk lives officially, but there are some at http://mxr.mozilla.org/build/source/tools/breakpad/. 

We need minidump_stackwalk_path to point to the minidump_stackwalk binary (the file itself). So if we can get http://mxr.mozilla.org/build/source/tools/breakpad/linux64/minidump_stackwalk copied to /build/minidump_stackwalk and set minidump_stackwalk_path == "/build/minidump_stackwalk", that should work.

minidump_save_path just needs to point to a directory (it can be empty) that .dmp files can be copied to - a temporary directory to hold dumps.
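
The two requirements described here (a path to the binary itself, and a directory for the dumps) can be checked up front. The sketch below is only an illustration; `check_minidump_config` is a hypothetical helper, not part of mozharness, and it takes a plain dict with the two config keys discussed above.

```python
import os


def check_minidump_config(config):
    """Return a list of problems with the minidump settings (empty if fine)."""
    problems = []

    # minidump_stackwalk_path must name the binary itself, not its directory.
    stackwalk = config.get("minidump_stackwalk_path")
    if not stackwalk:
        problems.append("minidump_stackwalk_path is not set")
    elif not os.path.isfile(stackwalk):
        problems.append("minidump_stackwalk_path is not a file: %s" % stackwalk)
    elif not os.access(stackwalk, os.X_OK):
        problems.append("minidump_stackwalk_path is not executable: %s" % stackwalk)

    # minidump_save_path only needs to be an existing directory (it may be empty).
    save_path = config.get("minidump_save_path")
    if not save_path:
        problems.append("minidump_save_path is not set")
    elif not os.path.isdir(save_path):
        problems.append("minidump_save_path is not a directory: %s" % save_path)

    return problems
```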

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Comment 114

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #110)
> gbrown: should I delete ~/.android/avds and unpack clean templates before
> each run?

I think that would be best.
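
A rough sketch of that cleanup step, assuming the clean templates are shipped as a tarball. `reset_avds` and both of its arguments are hypothetical; the real script may fetch and unpack the templates differently.

```python
import os
import shutil
import tarfile


def reset_avds(avd_dir, template_tarball):
    """Delete avd_dir and recreate it from a tarball of clean AVD templates."""
    # Remove anything a previous job may have left behind (locks, caches, snapshots).
    if os.path.isdir(avd_dir):
        shutil.rmtree(avd_dir)
    os.makedirs(avd_dir)
    # Unpack pristine copies so every run starts from the same emulator state.
    with tarfile.open(template_tarball) as archive:
        archive.extractall(avd_dir)
```

Starting every job from the same unpacked state should make failures easier to reproduce, at the cost of the unpack time on each run.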

Justin Wood (:Callek)

Comment 115

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #113)
> (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> > What should it be?
> > I see this output for the panda jobs:
> > ./configs/android/android_panda_releng.py:MINIDUMP_STACKWALK_PATH =
> > "/builds/minidump_stackwalk"
> > ./configs/android/android_panda_releng.py:    "minidump_stackwalk_path":
> > MINIDUMP_STACKWALK_PATH,
> > ./configs/android/android_panda_releng.py:    "minidump_save_path":
> > "%(abs_work_dir)s/../minidumps",
> 
> I do not know where minidump_stackwalk lives officially, but there are some
> at http://mxr.mozilla.org/build/source/tools/breakpad/. 
> 
> We need minidump_stackwalk_path to point to the minidump_stackwalk binary
> (the file itself). So if we can get
> http://mxr.mozilla.org/build/source/tools/breakpad/linux64/
> minidump_stackwalk copied to /build/minidump_stackwalk and set
> minidump_stackwalk_path == "/build/minidump_stackwalk", that should work.
> 
> minidump_save_path just needs to point to a directory (it can be empty) that
> .dmp files can be copied to - a temporary directory to hold dumps.

We don't need to stuff it in /builds like we do on foopies; we do, however, want to be sure we use it. The binary depends on the host OS, not the target OS, so using the same one that the Ubuntu test slaves use is just fine. As I understand it, we check out tools as part of these tests, so we can just point at the location in our local tools repo.

Aki Sasaki (not active)

Updated

•

11 years ago

Attachment #799702 - Flags: review?(aki) → review+

Aki Sasaki (not active)

Comment 116

•

11 years ago

Comment on attachment 799757 [details] [diff] [review]
androidx86.mozharness.4.diff

Awesome work, Armen!
I'm pretty impressed you got parallel processes working... I'd love to see that as a generic helper object and logger, but that's definitely out of scope here.

There are a lot of comments below.
You can land after fixing, or I'm happy to re-review after changes.

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #111)
> Would you mind if we landed this after you review it and iterate after that?
> I assume that we will need to see it running on tbpl and adjust as we see
> failures.

That's ok.  You might have to make some of these fixes to actually have this runnable, though.


Pyflakes says:

scripts/androidx86_emulator_unittest.py:10: 're' imported but unused
scripts/androidx86_emulator_unittest.py:24: 'BaseErrorList' imported but unused
scripts/androidx86_emulator_unittest.py:25: 'ERROR' imported but unused
scripts/androidx86_emulator_unittest.py:375: undefined name 'c'
scripts/androidx86_emulator_unittest.py:397: local variable 'joint_return_code' is assigned to but never used

The undefined name 'c' is probably going to break things.


>diff --git a/configs/android/androidx86.py b/configs/android/androidx86.py
>new file mode 100644
>--- /dev/null
>+++ b/configs/android/androidx86.py
>@@ -0,0 +1,191 @@
>+import os
>+
>+config = {
>+    "buildbot_json_path": "buildprops.json",
>+    "host_utils_url": "http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip",

We may want this url in-tree at some point.  Not a blocker.

>+    "fennec_package_name": "org.mozilla.fennec",

This will work for m-c level branches, but not as they ride the trains.
I think there should be a text file inside the apk (package-name.txt?) that says what the app name is.  (there is for android, not sure if that carried over to android-x86)  Reading that file will make this work across trains, or for developers' builds, or on try, no matter which train the developer pushed from.

We'll have to fix this before we can enable this on Aurora, or really have it be useful on Try.
This doesn't block rolling out to Cedar, but we should probably fix before rolling out further.
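
If such a file does exist in the android-x86 APK, reading it could look like the sketch below. The member name `package-name.txt` is the guess made above and has not been confirmed for android-x86; `read_package_name` is a hypothetical helper that falls back to the currently hard-coded name when the file is missing.

```python
import zipfile


def read_package_name(apk_path, member="package-name.txt",
                      default="org.mozilla.fennec"):
    """Return the package name stored in the APK, or default if the file is absent."""
    # An APK is a zip archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(apk_path) as apk:
        try:
            data = apk.read(member)
        except KeyError:
            # ZipFile.read raises KeyError when the member does not exist.
            return default
    return data.decode("utf-8").strip()
```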

>+    "test_suite_definitions": {
>+        "mochitest-1": {
>+            "category": "mochitest",
>+            "extra_args": ["--total-chunks", "8", "--this-chunk", "1", "--run-only-tests", "androidx86.json"],
>+        },
<snip>
>+    "suite_definitions": {
>+        "mochitest": {
>+            "run_filename": "runtestsremote.py",
>+            "options": ["--autorun", "--close-when-done", "--dm_trans=sut",
>+                "--console-level=INFO", "--app=%(app)s", "--remote-webserver=%(remote_webserver)s",
>+                "--xre-path=%(xre_path)s", "--utility-path=%(utility_path)s",
>+                "--deviceIP=%(device_ip)s", "--devicePort=%(device_port)s",
>+                "--http-port=%(http_port)s", "--ssl-port=%(ssl_port)s",
>+                "--certificate-path=%(certs_path)s", "--symbols-path=%(symbols_path)s"
>+            ],
>+        },

At some point we may want these in-tree, like talos.json.  Again, not a blocker.

>+sleeptime = 60

This might be a good thing to be able to configure, with a default of 60 (or whatever).

>+    def _redirectSUT(self, emuport, sutport1, sutport2):
>+        '''
>+        This redirects the default SUT ports for a given emulator.
>+        This is needed if more than one emulator is started.
>+        '''

This is interesting... temporary workaround or permanent solution?
You might want a self.info() at the beginning, maybe not.

>+    def _post_fatal(self, message=None, exit_code=None):
>+        """ After we call fatal(), run this method before exiting.
>+        """
>+        self._kill_processes("emulator64-x86")
>+
>+        # XXX aki, I' not sure exactly what this block is for
>+        if 'notify' in self.actions:
>+            self.notify(message=message, fatal=True)
>+        self.copy_logs_to_upload_dir()

This is to send me email and save logs for the hg-git process.
You don't need this block.

>+        joint_return_code = 0
>+        while True:
>+            for p in procs:
>+                return_code = p["process"].poll()
>+                if return_code!=None:
>+                    self.info("##### %s log begins" % p["suite_name"])

You may want to sys.stdout.write('\n') before this self.info(), so it doesn't show up on screen to the right of a million #'s.

>+                    if return_code !=0:
>+                        joint_return_code=1

I think you need to do something with this.
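
One way to "do something" with the combined code, shown as a minimal sketch: return it from the wait loop and hand it to the caller. `wait_for_all` is a hypothetical stand-in, it blocks on each process in turn rather than polling as the patch does, and it assumes the child output is not captured through a pipe that could fill up.

```python
import subprocess
import sys


def wait_for_all(procs):
    """Wait for every suite process; return 0 only if all of them exited with 0."""
    joint_return_code = 0
    for p in procs:
        return_code = p["process"].wait()
        if return_code != 0:
            # Remember the failure but keep waiting for the remaining suites.
            joint_return_code = 1
    return joint_return_code
```

The caller can then pass the result to `sys.exit()` or map it to whatever status the harness is expected to report.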

> if __name__ == '__main__':
>     emulatorTest = Androidx86EmulatorTest()
>-    emulatorTest.run_and_exit()
>+    emulatorTest.run()

This should be run_and_exit().

Attachment #799757 - Flags: review?(aki) → review+

Armen [:armenzg]

Assignee

Comment 117

•

11 years ago

Attached patch androidx86.tools.diff - We add android-x86 on the linux masters instead of the mobile ones — Details — Splinter Review

Attachment #790394 - Attachment is obsolete: true

Attachment #790394 - Flags: review?(bugspam.Callek)

Attachment #800164 - Flags: review?(bugspam.Callek)

Armen [:armenzg]

Assignee

Comment 118

•

11 years ago

So sad :(

"Output exceeded 52428800 bytes, remaining output has been truncated"

This happens when all of the emulators do not actually fail right away.

Armen [:armenzg]

Assignee

Comment 119

•

11 years ago

Increasing maxLogSize is undesirable since it affects the performance of the masters.

Plan to mitigate comment 118:
- pull verbose test jobs into their own separate Androidx86 test set (run 1 emulator job instead of 4)

Plan to fix this issue (might not be implemented this time around):
- do not output test results
-- output only the test summary
- upload test log with blobber
- TinderboxPrint a link to the full log
- Teach tbpl how to follow those URLs and parse the logs

Armen [:armenzg]

Assignee

Comment 120

•

11 years ago

Landed: https://hg.mozilla.org/build/mozharness/rev/3b926f407e76

I will follow up with all feedback from comment 114, comment 116 and the minidump related comments.

Justin Wood (:Callek)

Updated

•

11 years ago

Attachment #800164 - Flags: review?(bugspam.Callek) → review+

Armen [:armenzg]

Assignee

Comment 121

•

11 years ago

Comment on attachment 800164 [details] [diff] [review]
androidx86.tools.diff - We add android-x86 on the linux masters instead of the mobile ones

http://hg.mozilla.org/build/tools/rev/f2f79ce56851

Attachment #800164 - Flags: checked-in+

Armen [:armenzg]

Assignee

Comment 122

•

11 years ago

Comment on attachment 799702 [details] [diff] [review]
androidx86.configs.2.diff

http://hg.mozilla.org/build/buildbot-configs/rev/2da9aa1732c5

Attachment #799702 - Flags: checked-in+

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Attachment #799757 - Flags: checked-in+

Armen [:armenzg]

Assignee

Comment 123

•

11 years ago

Merged to production branch. Live in production.

Aki Sasaki (not active)

Comment 124

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #118)
> So sad :(
> 
> "Output exceeded 52428800 bytes, remaining output has been truncated"
> 
> This happens when all of the emulators do not actually fail right away.

You can also do the sleep(5), but only sys.stdout.write('#') if more than X amount of time has passed since the last one (60 seconds? 5min?)
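
A sketch of that throttled heartbeat: poll often, but print `#` only when enough time has passed since the last one. The function name, its parameters, and the injectable `clock`/`sleep`/`out` arguments are all choices made for this illustration so that the loop can be exercised without real waiting.

```python
import sys
import time


def wait_with_heartbeat(is_done, poll_interval=5, heartbeat_interval=60,
                        clock=time.time, sleep=time.sleep, out=sys.stdout):
    """Poll is_done() and emit '#' at most once per heartbeat_interval seconds."""
    last_heartbeat = clock()
    while not is_done():
        sleep(poll_interval)
        now = clock()
        if now - last_heartbeat >= heartbeat_interval:
            # Just enough output to reset an output-driven timeout.
            out.write("#")
            out.flush()
            last_heartbeat = now
    # End the row of '#' so the next log line starts in column 0.
    out.write("\n")
```

With a 5 second poll and a 60 second heartbeat, a one-hour run prints at most 60 characters instead of a timestamped log line every 30 seconds.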

Armen [:armenzg]

Assignee

Comment 125

•

11 years ago

> >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> >+        '''
> >+        This redirects the default SUT ports for a given emulator.
> >+        This is needed if more than one emulator is started.
> >+        '''
>
> This is interesting... temporary workaround or permanent solution?
> You might want a self.info() at the beginning, maybe not.
>

aki, what do you mean with self.info()? It is permanent. Each test job will talk to the pair of sut ports redirected for each emulator.

Flags: needinfo?(aki)

Armen [:armenzg]

Assignee

Comment 126

•

11 years ago

I can't see the jobs running on tbpl or buildapi even though in the "hidden builders" section I can see the androidx86 set jobs.
https://tbpl.mozilla.org/?tree=Ash&rev=0d4ae6057ef5&jobname=Android.*&showall=1
https://secure.pub.build.mozilla.org/buildapi/self-serve/ash/rev/0d4ae6057ef5

I will wait a bit before filing a bug.

Aki Sasaki (not active)

Comment 127

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #125)
> > >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> > >+        '''
> > >+        This redirects the default SUT ports for a given emulator.
> > >+        This is needed if more than one emulator is started.
> > >+        '''
> >
> > This is interesting... temporary workaround or permanent solution?
> > You might want a self.info() at the beginning, maybe not.
> >
> 
> aki, what do you mean with self.info()? It is permanent. Each test job will
> talk to the pair of sut ports redirected for each emulator.

self.info("Attempting to redirect ports for X to ...")

Flags: needinfo?(aki)

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Depends on: 913174

Armen [:armenzg]

Assignee

Comment 128

•

11 years ago

We can see the jobs in here:
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%20x86%20Emulator%20ash%20opt%20test%20androidx86-set

The status reporting for TBPL is not yet working properly.

Armen [:armenzg]

Assignee

Comment 129

•

11 years ago

(In reply to Aki Sasaki [:aki] from comment #127)
> (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> Enginerring) (EDT/UTC-4) from comment #125)
> > > >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> > > >+        '''
> > > >+        This redirects the default SUT ports for a given emulator.
> > > >+        This is needed if more than one emulator is started.
> > > >+        '''
> > >
> > > This is interesting... temporary workaround or permanent solution?
> > > You might want a self.info() at the beginning, maybe not.
> > >
> > 
> > aki, what do you mean with self.info()? It is permanent. Each test job will
> > talk to the pair of sut ports redirected for each emulator.
> 
> self.info("Attempting to redirect ports for X to ...")

Does this do?
http://hg.mozilla.org/build/mozharness/file/default/scripts/androidx86_emulator_unittest.py#l159
    self.info("  Attempt #%d to redirect ports: (%d, %d, %d)" % \
            (attempts, emuport, sutport1, sutport2))

Armen [:armenzg]

Assignee

Comment 130

•

11 years ago

I'm now going to be pushing changes to ash-mozharness instead of my own user repo.
I've triggered a new set of jobs based on:
http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness/rev/460926e7ed43

The results will show up on the second run of these:
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%20x86%20Emulator%20ash%20opt%20test%20androidx86-set

Aki Sasaki (not active)

Comment 131

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #129)
> (In reply to Aki Sasaki [:aki] from comment #127)
> > (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> > Enginerring) (EDT/UTC-4) from comment #125)
> > > > >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> > > > >+        '''
> > > > >+        This redirects the default SUT ports for a given emulator.
> > > > >+        This is needed if more than one emulator is started.
> > > > >+        '''
> > > >
> > > > This is interesting... temporary workaround or permanent solution?
> > > > You might want a self.info() at the beginning, maybe not.
> > > >
> > > 
> > > aki, what do you mean with self.info()? It is permanent. Each test job will
> > > talk to the pair of sut ports redirected for each emulator.
> > 
> > self.info("Attempting to redirect ports for X to ...")
> 
> Does this do?
> http://hg.mozilla.org/build/mozharness/file/default/scripts/
> androidx86_emulator_unittest.py#l159
>     self.info("  Attempt #%d to redirect ports: (%d, %d, %d)" % \
>             (attempts, emuport, sutport1, sutport2))

Ah. Yes.

Armen [:armenzg]

Assignee

Comment 132

•

11 years ago

Results on tbpl:
- m-2 crashes: https://tbpl.mozilla.org/php/getParsedLog.php?id=27485148&tree=Ash&full=1#error99
06:04:43  WARNING -  PROCESS-CRASH | /tests/content/canvas/test/test_canvas.html | application crashed [Unknown top frame]

gbrown, I would not look at other logs until I fix a few more things. Feel free to look into the m-2 crash.

It seems that I still need to set the MINIDUMP_STACKWALK correctly:
> 06:04:43     INFO -  MINIDUMP_STACKWALK not set, can't process dump.

Armen [:armenzg]

Assignee

Comment 133

•

11 years ago

- xpcshell is failing to 'mkdr /mnt/sdcard/tests'; err='Could not create the directory /mnt/sdcard/tests'
https://tbpl.mozilla.org/php/getParsedLog.php?id=27487615&tree=Ash&full=1#error0
- The command used is: /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/xpcshell/remotexpcshelltests.py --deviceIP=127.0.0.1 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --testing-modules-dir=/builds/slave/talos-slave/test/build/tests/modules --apk=/builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.apk --no-logfiles --manifest xpcshell/xpcshell_android.ini

Armen [:armenzg]

Assignee

Comment 134

•

11 years ago

* robocop is failing with this:
07:36:49     INFO - Running on test-x86-3 the command /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py --autorun --close-when-done --dm_trans=sut --console-level=INFO --app=org.mozilla.fennec --remote-webserver=10.0.2.2 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --utility-path=/builds/slave/talos-slave/test/build/hostutils/bin --deviceIP=127.0.0.1 --devicePort=20705 --http-port=8858 --ssl-port=4458 --certificate-path=/builds/slave/talos-slave/test/build/tests/certs --symbols-path=crashreporter-symbols.zip --total-chunks 2 --this-chunk 1 --robocop-path=. --robocop-ids=fennec_ids.txt --robocop=robocop.ini
07:37:19     INFO - ERROR: Unable to find robocop APK './robocop.apk'
07:37:19     INFO - ERROR: Invalid options specified, use --help for a list of valid options
07:37:19     INFO -  ERROR: Unable to find robocop APK './robocop.apk'
07:37:19     INFO -  ERROR: Invalid options specified, use --help for a list of valid options
07:37:19     INFO - TinderboxPrint: robocop-2<br/><em class="testfail">T-FAIL</em>

I will adjust the robocop path, but I wonder what the invalid option is.

Armen [:armenzg]

Assignee

Comment 135

•

11 years ago

Summary of test results:
- m-1 to m-4 are running [1]:
TinderboxPrint: mochitest-2: T-FAIL CRASH
TinderboxPrint: mochitest-1: 32373/1/63
TinderboxPrint: mochitest-4: 37567/3/200
TinderboxPrint: mochitest-3: 19809/4/55

- m-5 to m-8 are running [2]:
TinderboxPrint: mochitest-7: 13070/10/1923
TinderboxPrint: mochitest-8: 73338/0/61
TinderboxPrint: mochitest-5: 39233/4/611
TinderboxPrint: mochitest-6: 12771/0/27
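
For anyone post-processing these summary lines, a small parsing sketch follows. It assumes the three numbers are pass/fail/todo counts, which is how such lines are usually read but is not stated in this bug; `parse_summary` and both regular expressions are hypothetical.

```python
import re

# "TinderboxPrint: <suite>: <result>", where result is counts or a status string.
SUMMARY_RE = re.compile(r"^TinderboxPrint: (?P<suite>[\w-]+): (?P<result>.+)$")
COUNTS_RE = re.compile(r"^(\d+)/(\d+)/(\d+)$")


def parse_summary(lines):
    """Map suite name to a (pass, fail, todo) tuple, or to the raw status text."""
    results = {}
    for line in lines:
        match = SUMMARY_RE.match(line.strip())
        if not match:
            continue
        result = match.group("result")
        counts = COUNTS_RE.match(result)
        if counts:
            results[match.group("suite")] = tuple(int(n) for n in counts.groups())
        else:
            # Keep non-numeric results such as "T-FAIL CRASH" as they are.
            results[match.group("suite")] = result
    return results
```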

My own notes:
- The TBPL status seems to work for set-1 and set-2
- We have some jobs retrying and I want to understand why
- It seems that minidump is not quite there, but I have made some progress
08:32:58     INFO -  Crash dump filename: /tmp/tmpY_ydDv/242a7c9a-2af1-0a96-7cb91f82-77bfb3dc.dmp
08:32:58     INFO -  MINIDUMP_STACKWALK binary not found: /talos-slave/test/build/venv/lib/python2.7/site-packages/talos/breakpad/linux64/minidump_stackwalk

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27491043&tree=Ash&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27490746&tree=Ash&full=1

bhearsum@mozilla.com (:bhearsum)

Comment 136

•

11 years ago

something here is in production

Geoff Brown [:gbrown]

Reporter

Comment 137

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #133)
> - xpcshell is failing to 'mkdr /mnt/sdcard/tests'; err='Could not create the
> directory /mnt/sdcard/tests'
> https://tbpl.mozilla.org/php/getParsedLog.
> php?id=27487615&tree=Ash&full=1#error0
> - The command used is: /builds/slave/talos-slave/test/build/venv/bin/python
> /builds/slave/talos-slave/test/build/tests/xpcshell/remotexpcshelltests.py
> --deviceIP=127.0.0.1
> --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre
> --testing-modules-dir=/builds/slave/talos-slave/test/build/tests/modules
> --apk=/builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.
> apk --no-logfiles --manifest xpcshell/xpcshell_android.ini

Oops - there is no --devicePort in that command, so it is probably running on 20701, which is already running a mochitest job!!
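
A missing flag like this can be caught before the suite is launched. The sketch below expands `%(name)s` option templates in the style of the `suite_definitions` shown earlier in this bug and checks for a required flag; `expand_options` and `has_flag` are hypothetical helpers, not mozharness functions.

```python
def expand_options(option_templates, values):
    """Fill %(name)s placeholders in each option from the values dict."""
    return [template % values for template in option_templates]


def has_flag(command, flag):
    """True if the command contains flag, either as 'flag' or as 'flag=value'."""
    return any(arg == flag or arg.startswith(flag + "=") for arg in command)
```

For example, the runner could refuse to start a suite when `has_flag(cmd, "--devicePort")` is false, rather than letting it fall back to a default port that another emulator's job is already using.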

Armen [:armenzg]

Assignee

Comment 138

•

11 years ago

Attached patch androidx86.mozharness.diff (obsolete) — Details — Splinter Review

What do you think?

Attachment #800865 - Flags: feedback?(aki)

Armen [:armenzg]

Assignee

Comment 139

•

11 years ago

It seems that we get "command timed out: 2400 seconds without output, attempting to kill" either on the install step or on the run-tests step for these two sets, regardless of whether we use staggered_startup.
https://tbpl.mozilla.org/php/getParsedLog.php?id=27499603&tree=Ash&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=27499150&tree=Ash&full=1

FTR, sets 1, 2 & 4 have been succeeding with 0 seconds of staggered start ups:
http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness/file/4509017b5815/configs/android/androidx86.py#l17

https://tbpl.mozilla.org/?tree=Ash&jobname=Android%20x86%20Emulator%20ash%20opt%20test%20androidx86-set-3
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%20x86%20Emulator%20ash%20opt%20test%20androidx86-set-5

Armen [:armenzg]

Assignee

Comment 140

•

11 years ago

Attached patch [wip] androidx86.mozharness.diff (obsolete) — Details — Splinter Review

I will leave the other patch as the one for feedback since it addressed aki's concerns. This one will be the wip which I will ask review for.

Armen [:armenzg]

Assignee

Comment 141

•

11 years ago

Summary:
* m-1 to m-8 are running well[1][2]
** gbrown will update androidx86.json to clear some test failures
* m-2 is crashing consistently
** I think that if I fix the minidump_stackwalk properly it will help fix it [3]
* The TBPL status seems to work for set-1 and set-2
* set-4 was the job that would normally RETRY
** It seems that it does not do it anymore after mochitest-gl stopped failing
** probably something in the output was triggering it
* I sometimes see command timeout of 2400 secs during the run_tests step
** I don't know if it is because we use "sys.stdout.write('#')" instead of "self.info('#')"

aki, if the timeout issue is not related to the use of sys.stdout.write('#'), do you know if there is a safe way to see the last lines of each stdout before buildbot kills the job?



[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27491043&tree=Ash&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27490746&tree=Ash&full=1
[3]
08:32:58     INFO -  Crash dump filename: /tmp/tmpY_ydDv/242a7c9a-2af1-0a96-7cb91f82-77bfb3dc.dmp
08:32:58     INFO -  MINIDUMP_STACKWALK binary not found: /talos-slave/test/build/venv/lib/python2.7/site-packages/talos/breakpad/linux64/minidump_stackwalk

Aki Sasaki (not active)

Comment 142

•

11 years ago

Comment on attachment 800865 [details] [diff] [review]
androidx86.mozharness.diff

Thanks for the pyflakes cleanup; that was annoying me.
Getting this:

scripts/androidx86_emulator_unittest.py:373: local variable 'dirs' is assigned to but never used

>     def _post_fatal(self, message=None, exit_code=None):
>         """ After we call fatal(), run this method before exiting.
>         """
>         self._kill_processes("emulator64-x86")
>-
>-        # XXX aki, I' not sure exactly what this block is for
>-        if 'notify' in self.actions:
>-            self.notify(message=message, fatal=True)
>         self.copy_logs_to_upload_dir()

I don't think you have to copy the logs either.

>     def run_tests(self):
>         """
>         Run the tests
>         """
>         procs = []
>
>         emulator_index = 0
>         for suite_name in self.test_suites:
>             procs.append(self._trigger_test(suite_name, emulator_index))
>             emulator_index+=1
>
>-        joint_return_code = 0
>+        joint_tbpl_status = None
>+        joint_log_level = None
>         while True:
>             for p in procs:
>                 return_code = p["process"].poll()
>                 if return_code!=None:

If you're having problems timing out, I would put in debug output around both the procs.append above, and the 'for p in procs' here.
I'm guessing something around here is hanging.
If it helps to switch to self.info() instead of sys.stdout.write(), that's fine; I'm just aware that it will be a) way more verbose and b) create a ton more lines of log.  Still, you'll have timestamps and it's temporary.
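
One way to add that temporary debug output is sketched below. It is not the code from the patch: `wait_for_suites`, the `heartbeat` interval and the dictionary layout are assumptions modeled on the quoted loop. While suites are still running it logs a timestamped line naming them, so the last suite still pending is visible before buildbot's no-output kill.

```python
import subprocess
import sys
import time


def wait_for_suites(procs, heartbeat=60, poll_interval=1, log=None):
    """Poll each suite process; log a timestamped heartbeat while waiting."""
    log = log or (lambda msg: sys.stdout.write(msg + "\n"))
    pending = list(procs)
    results = {}
    last_beat = time.time()
    while pending:
        for p in list(pending):
            return_code = p["process"].poll()
            if return_code is not None:
                results[p["suite_name"]] = return_code
                pending.remove(p)
                log("%s finished with %s" % (p["suite_name"], return_code))
        if pending and time.time() - last_beat >= heartbeat:
            names = ", ".join(p["suite_name"] for p in pending)
            log("%s still waiting on: %s" % (time.strftime("%H:%M:%S"), names))
            last_beat = time.time()
        if pending:
            time.sleep(poll_interval)
        sys.stdout.flush()
    return results


if __name__ == "__main__":
    demo = [{"suite_name": "demo",
             "process": subprocess.Popen([sys.executable, "-c", "pass"])}]
    print(wait_for_suites(demo, heartbeat=0))
```

Flushing stdout on every pass matters here: output that sits in a buffer does not count as output for the buildbot timeout.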

>+                    # aki: do I need this?
>                     # I'm not using the concept of "plain-#" like other jobs; do I need this logging?
>                     # e.g. The mochitest suite: plain4 ran with return status: SUCCESS
>                     #self.log("The %s suite: %s ran with return status: %s" %
>                     #         (suite_category, p["suite_name"], tbpl_status), level=log_level)
>                     self.info("##### %s log ends" % p["suite_name"])

I'm not sure?  Does your log have a good summary at the end?

Attachment #800865 - Flags: feedback?(aki) → feedback+

Armen [:armenzg]

Assignee

Comment 143

•

11 years ago

gbrown, is this mochitest-gl cmd built correctly?

10:45:00     INFO - Running on test-x86-1 the command /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py --autorun --close-when-done --dm_trans=sut --console-level=INFO --app=org.mozilla.fennec --remote-webserver=10.0.2.2 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --utility-path=/builds/slave/talos-slave/test/build/hostutils/bin --deviceIP=127.0.0.1 --devicePort=20701 --http-port=8854 --ssl-port=4454 --certificate-path=/builds/slave/talos-slave/test/build/tests/certs --symbols-path=crashreporter-symbols.zip --test-path content/canvas/test/webgl --run-only-tests androidx86.json

What about xpcshell?

10:45:00     INFO - Running on test-x86-2 the command /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/xpcshell/remotexpcshelltests.py --deviceIP=127.0.0.1 --devicePort=20703 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --testing-modules-dir=/builds/slave/talos-slave/test/build/tests/modules --apk=/builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.apk --no-logfiles --manifest xpcshell/tests/xpcshell_android.ini

Flags: needinfo?(gbrown)

Armen [:armenzg]

Assignee

Comment 144

•

11 years ago

> What about xpcshell?

Ignore this last one, I have to look into an incorrect path:
10:51:00     INFO - IOError: Missing files: xpcshell/tests/xpcshell_android.ini

Geoff Brown [:gbrown]

Reporter

Comment 145

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #143)
> gbrown, is this mochitest-gl cmd built correctly?

That looks correct, except you should remove "--run-only-tests androidx86.json"

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Comment 146

•

11 years ago

Relative paths always make me nervous. I would prefer to see an absolute path for content/canvas/test/webgl and xpcshell/tests/xpcshell_android.ini.

Armen [:armenzg]

Assignee

Comment 147

•

11 years ago

Attached patch androidx86.mozharness.diff (obsolete) — Details — Splinter Review

Added feedback from aki.
Added full paths.
I hope to have fixed the minidumps situation.
Other fixes.

I'm aiming to get xpcshell, jsreftest and reftests running by the end of Tuesday.

Attachment #800865 - Attachment is obsolete: true

Attachment #800936 - Attachment is obsolete: true

Justin Wood (:Callek)

Comment 148

•

11 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #89)

...to comment 92 are all answered in Bug 913011

Armen [:armenzg]

Assignee

Comment 149

•

11 years ago

gbrown, can you please look at this?
https://tbpl.mozilla.org/php/getParsedLog.php?id=27648536&tree=Ash&full=1#error0
09:41:43     INFO - REFTEST TEST-UNEXPECTED-FAIL | | HTTP ERROR : 404
09:41:43     INFO - REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: Error 6 in manifest file http://10.0.2.2:8854//builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list line 1

I also see that mochitest-gl has gone a bit further:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27598902&tree=Ash&full=1#error16
but not a clean run:
15:08:58     INFO - Traceback (most recent call last):
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 689, in main
15:08:58     INFO -     retVal = mochitest.runTests(options)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtests.py", line 624, in runTests
15:08:58     INFO -     self.cleanup(manifest, options)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 243, in cleanup
15:08:58     INFO -     if self._dm.fileExists(self.remoteLog):
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 404, in fileExists
15:08:58     INFO -     return filename in self.listFiles(containingpath)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 408, in listFiles
15:08:58     INFO -     if not self.dirExists(rootdir):
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 389, in dirExists
15:08:58     INFO -     ret = self._runCmds([{ 'cmd': 'isdir ' + remotePath }]).strip()
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 152, in _runCmds
15:08:58     INFO -     self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 134, in _sendCmds
15:08:58     INFO -     raise err
15:08:58     INFO - DMError: Automation Error: Timeout in command isdir /mnt/sdcard/tests/logs
15:08:58     INFO - Automation Error: Exception caught while running tests

I will trigger a new fresh set of builds based on:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Android&rev=074ec56640f6


Thanks!

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Comment 150

•

11 years ago

"My" reftests run much better -- but currently with lots of errors! -- with:

/home/cltbld/tests/scripts/scripts/build/venv/bin/python remotereftest.py 
--app=org.mozilla.fennec --ignore-window-size --remote-webserver 10.0.2.2 
--xre-path /home/cltbld/tests/scripts/scripts/build/hostutils/xre 
--utility-path /home/cltbld/tests/scripts/scripts/build/hostutils/bin 
--deviceIP 127.0.0.1 --devicePort 20701 --http-port 8888 --ssl-port 4445 
--httpd-path reftest/components 
--total-chunks 10 --this-chunk 1 
--symbols-path crashreporter-symbols.zip 
tests/layout/reftests/reftest.list

Compared to "your" reftests:

/builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/reftest/remotereftest.py 
--app=org.mozilla.fennec --ignore-window-size --remote-webserver=10.0.2.2 
--xre-path=/builds/slave/talos-slave/test/build/hostutils/xre 
--utility-path=/builds/slave/talos-slave/test/build/hostutils/bin 
--deviceIP=127.0.0.1 --devicePort=20701 --http-port=8854 --ssl-port=4454 
--httpd-path reftest/components 
--total-chunks 4 --this-chunk 1 
/builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list

The only significant difference I see is your full path for reftest.list -- in this case, I think we need a relative path.

Also, I found that reftests run much slower on emulator -- I expect you need at least 10 chunks to avoid 60-minute timeouts.
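
For a rough sense of how --total-chunks and --this-chunk divide the work, here is a sketch that splits a list into contiguous, near-equal chunks. The real reftest harness may partition differently; `select_chunk` is a hypothetical helper and the test names are placeholders.

```python
def select_chunk(tests, total_chunks, this_chunk):
    """Return the 1-based chunk `this_chunk` out of `total_chunks`."""
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")
    size, extra = divmod(len(tests), total_chunks)
    # The first `extra` chunks get one additional test each.
    start = (this_chunk - 1) * size + min(this_chunk - 1, extra)
    end = start + size + (1 if this_chunk <= extra else 0)
    return tests[start:end]


tests = ["test-%02d" % i for i in range(25)]
for chunk in (1, 10):
    print(chunk, len(select_chunk(tests, 10, chunk)))
```

More chunks means fewer tests per job, which is the lever for staying under a per-job timeout when each test runs slowly on the emulator.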

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Comment 151

•

11 years ago

More evidence for that relative path...

My log has lines like:

INFO -  REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/reftests/reftest-sanity/test-async.xul

Compared to:

INFO - REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: Error 6 in manifest file http://10.0.2.2:8854//builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list line 1

This isn't going to work: http://10.0.2.2:8854//builds/...
                                              ^^
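
The double slash follows directly if the harness builds the manifest URL by appending the manifest argument to the web server root. That concatenation is an assumption about the harness, and `manifest_url` is a hypothetical helper, but it reproduces both URL shapes from the logs:

```python
def manifest_url(host, port, manifest):
    """Join the remote web server root and the manifest path (assumed logic)."""
    return "http://%s:%s/%s" % (host, port, manifest)


relative = "tests/layout/reftests/reftest.list"
absolute = "/builds/slave/talos-slave/test/build/tests/reftest/" + relative

print(manifest_url("10.0.2.2", 8888, relative))
print(manifest_url("10.0.2.2", 8854, absolute))
```

With the relative path the URL maps onto what the web server serves; with the absolute host path it points at a location the server does not expose, which would explain the 404.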

Armen [:armenzg]

Assignee

Comment 152

•

11 years ago

Attached patch androidx86.mozharness.diff (obsolete) — Details — Splinter Review

Switching back to relative paths.
We're testing the new inbound builds.
We have the tbpl patches live.

It's not been a very productive day. I hope to complete tomorrow what I had hoped to complete today.

Attachment #801806 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 153

•

11 years ago

I also split the reftest chunks into 10.
At least the 4 that were mentioned on the buildbot-configs.

https://tbpl.mozilla.org/?tree=Ash&jobname=Android 4.2 x86&rev=1c67140bc6a3 - The second set of jobs will use the build from m-i (as per comment 149)

Armen [:armenzg]

Assignee

Comment 154

•

11 years ago

Status summary:
* m-[1-8] are running green [1][2]
* xpcshell is running green [3]
* mochitest-gl is running but it fails [3]
* robocop-{1,2} are running but 2 tests fail [3]
* reftests should be running before the end of the day


[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700133&tree=Ash&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700349&tree=Ash&full=1
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27705576&tree=Ash&full=1

https://tbpl.mozilla.org/?tree=Ash&jobname=Android%204.2%20x86&rev=1c67140bc6a3

Armen [:armenzg]

Assignee

Comment 155

•

11 years ago

Attached patch androidx86.mozharness.diff — Details — Splinter Review

This is very close.

I need to see set 3 & 5 not timing out and that should be mainly it.

Attachment #802585 - Attachment is obsolete: true

Attachment #803294 - Flags: review?(aki)

Attachment #803294 - Flags: feedback?(gbrown)

Armen [:armenzg]

Assignee

Comment 156

•

11 years ago

Attached patch androidx86.configs.diff (obsolete) — Details — Splinter Review

These are the branches where it would be enabled:
Android 4.2 x86 Emulator alder opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator ash opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator b2g-inbound opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator birch opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator build-system opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator cedar opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator cypress opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator elm opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator fig opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator fx-team opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator graphics opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator gum opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator holly opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator ionmonkey opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator jamun opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator larch opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator maple opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator mozilla-central opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator mozilla-inbound opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator oak opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator pine opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator profiling opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator services-central opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator try opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator ux opt test androidx86-set-1 ScriptFactory

Attachment #803295 - Flags: review?(aki)

Aki Sasaki (not active)

Comment 157

•

11 years ago

Comment on attachment 803295 [details] [diff] [review]
androidx86.configs.diff

Why TEMP ?

Attachment #803295 - Flags: review?(aki) → review+

Armen [:armenzg]

Assignee

Comment 158

•

11 years ago

gbrown, could you please have a look at these reftests, crashtest and jsreftests results?
https://tbpl.mozilla.org/php/getParsedLog.php?id=27726116&tree=Ash&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=27726080&tree=Ash&full=1

Flags: needinfo?(gbrown)

Aki Sasaki (not active)

Comment 159

•

11 years ago

Comment on attachment 803294 [details] [diff] [review]
androidx86.mozharness.diff

Good job!  And I'm really happy to see the pyflakes warnings go away too.

>     def setup_avds(self):
>         '''
>         We have deployed through Puppet tar ball with the pristine templates.
>-        If they have not been untarred before we go ahead and do so.
>+        Let's unpack them every time.
>         '''
>-        if not os.path.exists(os.path.join(self.config[".avds_dir"], "test-x86-1.avd")):
>-            avds_path = self.config["avds_path"]
>-            self.mkdir_p(self.config[".avds_dir"])
>-            self.unpack(avds_path, self.config[".avds_dir"])
>+        if os.path.exists(os.path.join(self.config[".avds_dir"], "test-x86-1.avd")):
>+           shutil.rmtree(self.config[".avds_dir"])

self.rmtree?
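
For reference, a standalone sketch of the "unpack every time" behaviour using only the standard library. Inside mozharness the script's own helpers (the rmtree, mkdir_p and unpack calls seen in the quoted patch) would take the place of shutil and tarfile; the function signature here is an assumption for illustration.

```python
import os
import shutil
import tarfile


def setup_avds(avds_tarball, avds_dir):
    """Remove any previous AVD directory, then unpack the pristine templates."""
    if os.path.exists(avds_dir):
        shutil.rmtree(avds_dir)
    os.makedirs(avds_dir)
    with tarfile.open(avds_tarball) as archive:
        archive.extractall(avds_dir)
```

Starting from a clean directory on every run means state left by an earlier job cannot leak into the next one.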

Attachment #803294 - Flags: review?(aki) → review+

Armen [:armenzg]

Assignee

Comment 160

•

11 years ago

Comment on attachment 803294 [details] [diff] [review]
androidx86.mozharness.diff

Landed so we can see nicer results on Cedar.
Any follow up feedback or fixes will come in new patch to address them separately.

Attachment #803294 - Flags: checked-in+

Armen [:armenzg]

Assignee

Comment 161

•

11 years ago

(In reply to Aki Sasaki [:aki] from comment #157)
> Comment on attachment 803295 [details] [diff] [review]
> androidx86.configs.diff
> 
> Why TEMP ?
I can leave it as it was to match the naming of other dictionaries in the file.
I have a pet peeve about naming things very similarly to other variables.

Summary:
> * m-[1-8] are running green [1][2]
> * xpcshell is running green [3]
> * mochitest-gl is running but it fails [3]
** gbrown to look into it
> * robocop-{1,2} are running but 2 tests fail [3]
* reftests, crashtest and jsreftests can run but FAIL [4][5]
** gbrown to look into it
* once mozharness is merged to production, we will be able to see the x86 jobs run as I mention in this summary
* I would like to wait until Cedar is green before we enable them across the board
** blassey, does this work for you? (Or add a voluntold to go to every tree to hide/show jobs as they green out)

> [1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700133&tree=Ash&full=1
> [2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700349&tree=Ash&full=1
> [3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27705576&tree=Ash&full=1
> 
[4] https://tbpl.mozilla.org/php/getParsedLog.php?id=27726116&tree=Ash&full=1
[5] https://tbpl.mozilla.org/php/getParsedLog.php?id=27726080&tree=Ash&full=1

Flags: needinfo?(blassey.bugs)

Whiteboard: [reit-x86] → [reit-x86] summary in comment 161

Brad Lassey [:blassey] (use needinfo?)

Comment 162

•

11 years ago

voluntold??

do you have any gut feeling for how long it'll take to green cedar up?

Flags: needinfo?(blassey.bugs)

Armen [:armenzg]

Assignee

Comment 163

•

11 years ago

(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #162)
> voluntold??
>
Volunteer + told :P 
nvm. ignore me :)

> do you have any gut feeling for how long it'll take to green cedar up?

I can only get things to the state I described in comment 161.
I will work *today/tomorrow* towards getting Cedar into that state (merge mozharness to production + trigger a new set of builds).
mochitest-gl, reftests, crashtest and jsreftests are, as of now, out of my hands.

Armen [:armenzg]

Assignee

Comment 164

•

11 years ago

Merged to production and triggered new set of builds on Cedar.

Results to be found in here:
https://tbpl.mozilla.org/?tree=Cedar&jobname=Android 4.2 x86&rev=e6f8b77a8824

Geoff Brown [:gbrown]

Reporter

Comment 165

•

11 years ago

Comment on attachment 803294 [details] [diff] [review]
androidx86.mozharness.diff

Review of attachment 803294 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine.

I am investigating the remaining test failures. We may need to add some chunks or introduce another x86-specific manifest, but other than that, I don't expect more harness changes.

Attachment #803294 - Flags: feedback?(gbrown) → feedback+

Armen [:armenzg]

Assignee

Comment 166

•

11 years ago

For some odd reason, Cedar is not reporting the results that I expected. I will look into it while gbrown looks into the issues that I reported earlier.

I hope to have some fixes landed before EOD tomorrow.

Geoff Brown [:gbrown]

Reporter

Comment 167

•

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=27772974&tree=Ash&full=1#error49 has a crash dump (good) but it lacks symbols (no file names, line numbers -- bad). I am not sure where this is going wrong. I think the harness should be invoking "minidump_stackwalk <dmp file> <symbols dir>". I have a bad feeling that we are passing crashreporter-symbols.zip as <symbols dir>, instead of unpacking crashreporter-symbols.zip to a directory and passing that directory name to minidump_stackwalk.
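
A sketch of the unpack-then-walk flow described above, using only the standard library. `stackwalk_command` is a hypothetical helper; the only thing taken from this bug is the `minidump_stackwalk <dmp file> <symbols dir>` invocation shape.

```python
import os
import tempfile
import zipfile


def stackwalk_command(stackwalk_binary, dump_path, symbols_path):
    """Build the minidump_stackwalk argv, unpacking a symbols zip if needed."""
    if os.path.isfile(symbols_path) and zipfile.is_zipfile(symbols_path):
        symbols_dir = tempfile.mkdtemp(prefix="symbols-")
        with zipfile.ZipFile(symbols_path) as archive:
            archive.extractall(symbols_dir)
    else:
        symbols_dir = symbols_path  # already a directory
    return [stackwalk_binary, dump_path, symbols_dir]
```

The caller would be responsible for removing the temporary symbols directory once the stack has been printed.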

Armen [:armenzg]

Assignee

Comment 168

•

11 years ago

Attached patch add def worst_tbpl_status() (obsolete) — Details — Splinter Review

Somewhere down the line I failed to bring this part of my code into the review.

Attachment #804437 - Flags: review?(aki)

Aki Sasaki (not active)

Comment 169

•

11 years ago

Comment on attachment 804437 [details] [diff] [review]
add def worst_tbpl_status()

Hm, I use

    self.tbpl_status = self.worst_level(TBPL_WARNING, self.tbpl_status,
                                        levels=TBPL_WORST_LEVEL_TUPLE)

http://hg.mozilla.org/build/mozharness/file/a660ae1a633f/mozharness/mozilla/testing/unittest.py#l135

This seems to be dup code?  I'm fine with having this method, but maybe it should call self.worst_level().
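
A minimal sketch of that idea: keep the statuses in one worst-first tuple and let a thin wrapper delegate to a generic helper. This is not the mozharness code; the tuple contents and ordering below are assumptions for illustration, and only the names `worst_level` and `TBPL_WORST_LEVEL_TUPLE` come from the snippet above.

```python
# Assumed ordering, worst first; check the real tuple in mozharness.
TBPL_WORST_LEVEL_TUPLE = ("RETRY", "EXCEPTION", "FAILURE", "WARNING", "SUCCESS")


def worst_level(target_level, existing_level, levels=TBPL_WORST_LEVEL_TUPLE):
    """Return whichever of the two levels appears earlier (is worse) in levels."""
    if existing_level is None:
        return target_level
    for level in levels:
        if level in (target_level, existing_level):
            return level
    raise ValueError("unknown levels: %r, %r" % (target_level, existing_level))


def worst_tbpl_status(statuses):
    """Fold a sequence of per-suite statuses into the single worst one."""
    worst = None
    for status in statuses:
        worst = worst_level(status, worst)
    return worst


print(worst_tbpl_status(["SUCCESS", "WARNING", "SUCCESS"]))
```

Keeping the ordering in a single tuple means the wrapper for combining several suites and the one-off comparison cannot drift apart.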

Aki Sasaki (not active)

Comment 170

•

11 years ago

Comment on attachment 804437 [details] [diff] [review]
add def worst_tbpl_status()

Minusing for now, due to comment 169.

Attachment #804437 - Flags: review?(aki) → review-

Armen [:armenzg]

Assignee

Comment 171

•

11 years ago

Attached patch tbpl's worst status - v2 — Details — Splinter Review

Running on Ash. I will ask for review when I see them complete.

Attachment #804437 - Attachment is obsolete: true

Armen [:armenzg]

Assignee

Comment 172

•

11 years ago

I asked on IRC, but I'm also posting here in case your day is over and you can't get back to me there.

I'm in the process of having Cedar match comment 161. Probably before EOD.

gbrown, what would you prefer me to help with?
* bug 915870 - make sure that our funky builder naming works with trychooser
* help with the minidump issue - comment 167

On another note, do we need to split reftests even more? (currently running 10 chunks) or bump a timeout inside of the *test* harness? (not mozharness)

Geoff Brown [:gbrown]

Reporter

Comment 173

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #172)
> gbrown, what would you prefer me to help with?
> * bug 915870 - make sure that our funky builder naming works with trychooser
> * help with the minidump issue - comment 167

The minidump issue please.

> On another note, do we need to split reftests even more? (currently running
> 10 chunks) or bump a timeout inside of the *test* harness? (not mozharness)

I think we will need to split reftests more, but I'm not sure how much. We are also considering running with skia-gl disabled -- some discussion in bug 907351 -- so I am running some special tests to see how much of a difference that makes. I'll try to get back to you with a recommendation for # chunks by Monday morning.

Flags: needinfo?(gbrown)

Armen [:armenzg]

Assignee

Comment 174

•

11 years ago

Comment on attachment 804623 [details] [diff] [review]
tbpl's worst status - v2

It works!

Attachment #804623 - Flags: review?(aki)

Armen [:armenzg]

Assignee

Comment 175

•

11 years ago

Attached patch androidx86.minidump.diff — Details — Splinter Review

This is wip.

The theory is that you can pass to --symbols-path either a path or a URL and the test harnesses take care of it.

I see that we have "download-symbols" set to be "ondemand"
> 'download_symbols': 'ondemand'
which causes us to set self.symbols_path to be self.symbols_url

http://hg.mozilla.org/build/mozharness/file/production/mozharness/mozilla/testing/testbase.py#l250
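
In other words, the selection appears to work roughly like the sketch below. This is a paraphrase of the behaviour described here, not the testbase.py source; `pick_symbols_path` and its parameters are hypothetical, and the example URL is a placeholder.

```python
def pick_symbols_path(config, symbols_url, local_symbols_dir):
    """Choose what to hand to --symbols-path based on 'download_symbols'."""
    if config.get("download_symbols") == "ondemand":
        # Pass the URL through; the test harness is expected to fetch it.
        return symbols_url
    # Otherwise assume the symbols were already fetched and unpacked locally.
    return local_symbols_dir


print(pick_symbols_path({"download_symbols": "ondemand"},
                        "http://example.invalid/crashreporter-symbols.zip",
                        "/tmp/symbols"))
```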

Let's see if we get a crash in https://tbpl.mozilla.org/?tree=Ash&jobname=Android%204.2%20x86&rev=82db508f2304

Aki Sasaki (not active)

Updated

•

11 years ago

Attachment #804623 - Flags: review?(aki) → review+

Armen [:armenzg]

Assignee

Comment 176

•

11 years ago

Comment on attachment 804623 [details] [diff] [review]
tbpl's worst status - v2

https://hg.mozilla.org/build/mozharness/rev/9ef0a3e99b55

I will re-trigger the jobs in here and we should see what I mentioned on comment 161 (in an hour from now):
https://tbpl.mozilla.org/?tree=Cedar&jobname=Android%204.2%20x86&rev=e6f8b77a8824

Attachment #804623 - Flags: checked-in+

Armen [:armenzg]

Assignee

Comment 177

•

11 years ago

I see the symbols-path set to a URL but I don't see that the output is any different.

gbrown, is there anything else left to be set before this should work?
https://tbpl.mozilla.org/php/getParsedLog.php?id=27848103&tree=Ash&full=1#error0

I do see this though:
14:21:49     INFO - mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/ash-android-x86/1379011131/fennec-26.0a1.en-US.android-i386.crashreporter-symbols.zip
http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#73

I will look at mozcrash's code on Monday.

Flags: needinfo?(gbrown)

Armen [:armenzg]

Assignee

Comment 178

•

11 years ago

Cedar is looking good :)

Comment 161 *might* need refreshing on Monday since I can already see that mochitest-3 is crashing when it didn't use to.

Geoff Brown [:gbrown]

Reporter

Comment 179

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #177)
> I see the symbols-path set to a URL but I don't see that the output is any
> different.

That looks like it should work.

:ted -- Can you see what is going wrong here? As shown in Comment 177, mozcrash reports that it downloads symbols, but I do not see symbols in the resulting crash dump.

Flags: needinfo?(gbrown) → needinfo?(ted)

Geoff Brown [:gbrown]

Reporter

Updated

•

11 years ago

Depends on: 916657

Geoff Brown [:gbrown]

Reporter

Comment 180

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #173)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> > On another note, do we need to split reftests even more? (currently running
> > 10 chunks) or bump a timeout inside of the *test* harness? (not mozharness)
> 
> I think we will need to split reftests more, but I'm not sure how much. We
> are also considering running with skia-gl disabled -- some discussion in bug
> 907351 -- so I am running some special tests to see how much of a difference
> that makes. I'll try to get back to you with a recommendation for # chunks
> by Monday morning.

Try:

crashtests -- 3 chunks
js-reftests -- 6 chunks

The plain-reftest situation is pretty bad. I think we need 20+ chunks for green runs -- we may want to wait for progress on bug 916657.

Armen [:armenzg]

Assignee

Comment 181

•

11 years ago

Attached patch chunk jsreftest into 6 chunks and crashtest into 3 — Details — Splinter Review

Attachment #805281 - Flags: review?(gbrown)

Armen [:armenzg]

Assignee

Comment 182

•

11 years ago

Attached patch [configs] - chunk jsreftest into 6 and crashtest into 3 (obsolete) — Details — Splinter Review

Attachment #805282 - Flags: review?(aki)

Armen [:armenzg]

Assignee

Comment 183

•

11 years ago

Attached patch [configs] add x86 testing across branches (obsolete) — Details — Splinter Review

Carrying forward the part of the patch which adds the tests across the board.
For now, this patch is on hold until we get everything green on Cedar.

Attachment #803295 - Attachment is obsolete: true

Attachment #805288 - Flags: review+

Armen [:armenzg]

Assignee

Comment 184

•

11 years ago

Summary (these results are from Cedar):

Running well:
* m-1, m-2 and m-4 are green; m-3 crashes [1]
* m-[4-8] are green [2]
* xpcshell is running green [4]

Suites needing attention:
* mochitest-gl is crashing [4]
** gbrown to look into it
* robocop-{1,2} are running but 2 tests fail [4]
* reftests, crashtest and jsreftests can run but FAIL/timeout [3][5]
** more crashtest and jsreftest chunking will happen today/tomorrow
** reftests will need more investigation - bug 916657

Others:
* minidumps are fixed
* crash symbols are not giving source code lines
** waiting on "needinfo" for ted
* adjust trychooser to handle x86 "sets" of suites approach
** bug 915870

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27851558&tree=Cedar&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27848601&tree=Cedar&full=1
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850566&tree=Cedar&full=1
[4] https://tbpl.mozilla.org/php/getParsedLog.php?id=27849762&tree=Cedar&full=1
[5] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850902&tree=Cedar&full=1

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Whiteboard: [reit-x86] summary in comment 161 → [reit-x86] summary in comment 184

Armen [:armenzg]

Assignee

Comment 185

•

11 years ago

I know that I've said that we should wait until everything is green on Cedar, however, after seeing the bug filed for reftests I wonder if we should go out the door with whatever is green and leave reftests only running on Cedar and Ash.

What do you think?

Aki Sasaki (not active)

Updated

•

11 years ago

Attachment #805282 - Flags: review?(aki) → review+

(not currently active) Ted Mielczarek

Comment 186

•

11 years ago

This isn't a symbols issue, this dump looks completely broken. If you can grab one of these minidumps off of a slave and attach it here or somewhere else we can poke at it.

Flags: needinfo?(ted)

Geoff Brown [:gbrown]

Reporter

Updated

•

11 years ago

Attachment #805281 - Flags: review?(gbrown) → review+

Geoff Brown [:gbrown]

Reporter

Comment 187

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #185)
> I know that I've said that we should wait until everything is green on
> Cedar, however, after seeing the bug filed for reftests I wonder if we
> should go out the door with whatever is green and leave reftests only
> running on Cedar and Ash.

I would prefer that. If the reftest issue takes a while to sort out, we risk losing today's greens over time.

Geoff Brown [:gbrown]

Reporter

Updated

•

11 years ago

Depends on: 916923

Armen [:armenzg]

Assignee

Comment 188

•

11 years ago

Attached patch enable sets 1 & 2 - move failing suites to sets 3 to 8 on Ash and Cedar — Details — Splinter Review

Attachment #805282 - Attachment is obsolete: true

Attachment #805288 - Attachment is obsolete: true

Attachment #805514 - Flags: review?(aki)

Armen [:armenzg]

Assignee

Comment 189

•

11 years ago

Comment on attachment 805281 [details] [diff] [review]
chunk jsreftest into 6 chunks and crashtest into 3

https://hg.mozilla.org/build/mozharness/rev/5eca80d07e33

Attachment #805281 - Flags: checked-in+

Armen [:armenzg]

Assignee

Updated

•

11 years ago

Attachment #804696 - Flags: review?(aki)

Aki Sasaki (not active)

Updated

•

11 years ago

Attachment #804696 - Flags: review?(aki) → review+

Aki Sasaki (not active)

Comment 190

•

11 years ago

Comment on attachment 805514 [details] [diff] [review]
enable sets 1 & 2 - move failing suites to sets 3 to 8 on Ash and Cedar

Thanks for fixing the dict spacing!  That was bugging me.

Attachment #805514 - Flags: review?(aki) → review+

Armen [:armenzg]

Assignee

Comment 191

•

11 years ago

Comment on attachment 805514 [details] [diff] [review]
enable sets 1 & 2 - move failing suites to sets 3 to 8 on Ash and Cedar

https://hg.mozilla.org/build/buildbot-configs/rev/e5177c27ce46

Attachment #805514 - Flags: checked-in+

Armen [:armenzg]

Assignee

Comment 192

•

11 years ago

Comment on attachment 804696 [details] [diff] [review]
androidx86.minidump.diff

https://hg.mozilla.org/build/mozharness/rev/6ca289a39407

Attachment #804696 - Flags: checked-in+

Armen [:armenzg]

Assignee

Comment 193

•

11 years ago

Summary (these results are from Cedar):

Coming up:
* we're enabling sets 1 and 2 across the board (whenever we have a reconfig)
** m-{1,2,4,5,6,7,8} and xpcshell (not m-3)
* the remaining suites will run on Cedar and Ash
** as suites get fixed on Cedar we will move them to all other branches
* landed fix to download symbols correctly

Running well:
* m-1, m-2 and m-4 are green; m-3 crashes [1]
* m-[4-8] are green [2]
* xpcshell is running green [4]

Suites needing attention:
* mochitest-gl is crashing [4]
** gbrown to look into it
* robocop-{1,2} are running but 2 tests fail [4]
* reftests, crashtest and jsreftests can run but FAIL/timeout [3][5]
** more crashtest and jsreftest chunking will happen today/tomorrow
** reftests will need more investigation - bug 916657

Others:
* crash symbols are not giving source code lines
** ted and gbrown investigating in bug 916923
* bug 915870 - adjust trychooser to handle x86 "sets" of suites approach

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27851558&tree=Cedar&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27848601&tree=Cedar&full=1
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850566&tree=Cedar&full=1
[4] https://tbpl.mozilla.org/php/getParsedLog.php?id=27849762&tree=Cedar&full=1
[5] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850902&tree=Cedar&full=1

Whiteboard: [reit-x86] summary in comment 184 → [reit-x86] summary in comment 193

Geoff Brown [:gbrown]

Reporter

Updated

•

11 years ago

Depends on: 917053

Aki Sasaki (not active)

Comment 194

•

11 years ago

This should be in production.

Armen [:armenzg]

Assignee

Comment 195

•

11 years ago

We now have sets 1 & 2 running everywhere:
https://tbpl.mozilla.org/?jobname=Android%204.2%20x86&rev=1d27c4c9871f

It seems like splitting crashtest into 3 chunks made them run green.
Should we get those out to other branches?

The same with jsreftests, splitting them into 6 made them run green (except jsreftest-5).
Would you want to fix #5 before moving them to other branches?

robocop-2 is not failing any tests. robocop-1 is only failing 2 tests.
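
For reference, the chunking discussed above (crashtest in 3, jsreftest in 6) amounts to splitting a suite's test list into contiguous slices so that each job runs a shorter part of the suite. A minimal sketch, assuming a plain list of test names; `select_chunk`, its parameters, and the test file names are hypothetical illustrations, not the harness's actual code:

```python
def select_chunk(tests, total_chunks, this_chunk):
    """Return the slice of `tests` that belongs to chunk `this_chunk`.

    Chunks are 1-based and contiguous. When the list does not divide
    evenly, the first chunks each receive one extra test.
    """
    if total_chunks < 1:
        raise ValueError("total_chunks must be at least 1")
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")

    base, extra = divmod(len(tests), total_chunks)
    # Chunks before this one contribute `base` tests each, plus one extra
    # test for every earlier chunk that received a remainder item.
    start = (this_chunk - 1) * base + min(this_chunk - 1, extra)
    size = base + (1 if this_chunk <= extra else 0)
    return tests[start:start + size]


# Hypothetical example: 20 test files split into 6 chunks.
all_tests = ["test_%02d.js" % i for i in range(20)]
chunks = [select_chunk(all_tests, 6, n) for n in range(1, 7)]
chunk_sizes = [len(c) for c in chunks]
```

Every test lands in exactly one chunk, so a failure confined to one chunk (as with jsreftest-5) can be investigated without rerunning the others.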

Ed Morley [:emorley]

Comment 196

•

11 years ago

We don't have proper crash stacks (bug ), eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27973359&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=27973163&tree=Mozilla-Inbound

So Android x86 tests are hidden everywhere for now.

Ed Morley [:emorley]

Comment 197

•

11 years ago

(In reply to Ed Morley [:edmorley UTC+1] from comment #196)
> We don't have proper crash stacks (bug )

Bug 916923

Geoff Brown [:gbrown]

Reporter

Comment 198

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #195)
> We now have sets 1 & 2 running everywhere:

Awesome!
 
> It seems like splitting crashtest into 3 chunks made them run green.
> Should we get those out to other branches?

I think so.
 
> The same with jsreftests, splitting them into 6 made them run green (except
> jsreftest-5).
> Would you want to fix #5 before moving them to other branches?
 
I think so -- I'll be looking at jsreftest-5 more closely today.

> robocop-2 is not failing any tests. robocop-1 is only failing 2 tests.

There are some non-x86 robocop patches landing today that may help.

There is a new patch landing in bug 913627 which I hope will green up M3.

I hope to have a patch up for review in bug 917053 today to fix M-gl.

Armen [:armenzg]

Assignee

Updated

•

11 years ago

No longer depends on: 916923

Armen [:armenzg]

Assignee

Updated

•

11 years ago

No longer depends on: 917053

Armen [:armenzg]

Assignee

Updated

•

11 years ago

No longer depends on: 916657

Armen [:armenzg]

Assignee

Comment 199

•

11 years ago

It's rather cumbersome to add green test suites to tbpl if they have to be hidden right away (due to tbpl's per-branch nature as well as waiting for them to be scheduled first).
Could we try fixing them on Cedar for this week and see what is ready for next week?
Could we use bug 891959 for further status updates as well as adding more test suites to tbpl?

FYI, my biggest focus will be bug 915870.
I will not be able to look at bug 891959 before next week. Is there anyone besides gbrown that would be interested in giving a hand with it?

FTR, I need a day or two to meet some Summit Preparation deadlines that are happening this week.

Flags: needinfo?(gbrown)

Flags: needinfo?(blassey.bugs)

Armen [:armenzg]

Assignee

Comment 200

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #199) 
> FYI, my biggest focus will be bug 915870.
> I will not be able to look at bug 891959 before next week. Is there anyone
> besides gbrown that would be interested in giving a hand with it?
> 
I meant bug 917361 (make it easy for a dev to run the Android x86 test jobs).

Geoff Brown [:gbrown]

Reporter

Comment 201

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #199)
> Could we try fixing them on Cedar for this week and see what is ready for
> next week?

That seems reasonable. With patches on the go, I expect we can turn everything green on Cedar this week, except for plain reftests.

> Could we use bug 891959 for further status updates as well as adding more
> test suites to tbpl?

OK.

Flags: needinfo?(gbrown)

Brad Lassey [:blassey] (use needinfo?)

Comment 202

•

11 years ago

sounds good

Flags: needinfo?(blassey.bugs)

Armen [:armenzg]

Assignee

Comment 203

•

11 years ago

I have triggered a new set of builds on Cedar where some changes that gbrown landed on m-i will be integrated.

A new summary will be given in bug 891959 once those builds complete.

This week the focus will be on bug 915870 for the try support.
Next week we will add whatever we green out this week.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: General Automation → General

androidx86_emulator_unittest.py 11 years ago Geoff Brown [:gbrown] 14.87 KB, text/plain		Details
config-gb-m1.py 11 years ago Geoff Brown [:gbrown] 1.00 KB, text/x-python		Details
config-gb-m1.py 11 years ago Geoff Brown [:gbrown] 1.00 KB, text/plain		Details
x86.diff 11 years ago Armen [:armenzg] 6.31 KB, patch		Details \| Diff \| Splinter Review
misc x86 emulator changes 11 years ago Geoff Brown [:gbrown] 7.05 KB, patch	armenzg : feedback+	Details \| Diff \| Splinter Review
[wip] androix86 11 years ago Armen [:armenzg] 11.88 KB, patch		Details \| Diff \| Splinter Review
[wip] integrated launch.py into mozharness 11 years ago Armen [:armenzg] 11.87 KB, patch		Details \| Diff \| Splinter Review
x86_tools.diff 11 years ago Armen [:armenzg] 4.31 KB, patch	Callek : review+	Details \| Diff \| Splinter Review
[wip] x86_bc.diff 11 years ago Armen [:armenzg] 8.12 KB, patch	mozilla : feedback+	Details \| Diff \| Splinter Review
androidx86.tools.diff 11 years ago Armen [:armenzg] 6.19 KB, patch		Details \| Diff \| Splinter Review
[wip] androidx86.mozharness.diff 11 years ago Armen [:armenzg] 17.59 KB, patch		Details \| Diff \| Splinter Review
androidx86.log.txt 11 years ago Armen [:armenzg] 21.89 KB, text/plain		Details
[wip] androidx86.mozharness.diff 11 years ago Armen [:armenzg] 20.98 KB, patch		Details \| Diff \| Splinter Review
launch2.py 11 years ago Armen [:armenzg] 4.24 KB, text/plain		Details
buildprops.json 11 years ago Armen [:armenzg] 1.41 KB, application/json		Details
androidx86.mozharness.diff 11 years ago Armen [:armenzg] 21.73 KB, patch		Details \| Diff \| Splinter Review
androidx86.mozharness.diff 11 years ago Armen [:armenzg] 26.22 KB, patch		Details \| Diff \| Splinter Review
concurrent.diff 11 years ago Armen [:armenzg] 1.45 KB, patch		Details \| Diff \| Splinter Review
[puppet] add android-tools 11 years ago Justin Wood (:Callek) 1.84 KB, patch	rail : review-	Details \| Diff \| Splinter Review
[puppet] add android-tools v2 11 years ago Justin Wood (:Callek) 1.81 KB, patch	rail : review+ Callek : checked-in+	Details \| Diff \| Splinter Review
[puppet] deploy avd's 11 years ago Justin Wood (:Callek) 2.26 KB, patch	rail : review+ Callek : checked-in+	Details \| Diff \| Splinter Review
[wip] androidx86.configs.diff 11 years ago Armen [:armenzg] 11.57 KB, patch	mozilla : feedback+	Details \| Diff \| Splinter Review
[wip] androidx86.mozharness.3.diff 11 years ago Armen [:armenzg] 32.39 KB, patch		Details \| Diff \| Splinter Review
androidx86.log.txt 11 years ago Armen [:armenzg] 66.33 KB, text/plain		Details
[wip] androidx86.mozharness.3.diff 11 years ago Armen [:armenzg] 35.72 KB, patch	gbrown : feedback+	Details \| Diff \| Splinter Review
[wip] status mozharness experiments 11 years ago Armen [:armenzg] 3.75 KB, patch		Details \| Diff \| Splinter Review
androidx86.configs.2.diff 11 years ago Armen [:armenzg] 15.43 KB, patch	mozilla : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review
androidx86.mozharness.4.diff 11 years ago Armen [:armenzg] 37.77 KB, patch		Details \| Diff \| Splinter Review
androidx86.mozharness.4.diff 11 years ago Armen [:armenzg] 37.78 KB, patch	mozilla : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review
androidx86.tools.diff - We add android-x86 on the linux masters instead of the mobile ones 11 years ago Armen [:armenzg] 3.15 KB, patch	Callek : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review
androidx86.mozharness.diff 11 years ago Armen [:armenzg] 19.82 KB, patch	mozilla : feedback+	Details \| Diff \| Splinter Review
[wip] androidx86.mozharness.diff 11 years ago Armen [:armenzg] 21.00 KB, patch		Details \| Diff \| Splinter Review
androidx86.mozharness.diff 11 years ago Armen [:armenzg] 25.06 KB, patch		Details \| Diff \| Splinter Review
androidx86.mozharness.diff 11 years ago Armen [:armenzg] 24.63 KB, patch		Details \| Diff \| Splinter Review
androidx86.mozharness.diff 11 years ago Armen [:armenzg] 25.07 KB, patch	mozilla : review+ gbrown : feedback+ armenzg : checked-in+	Details \| Diff \| Splinter Review
androidx86.configs.diff 11 years ago Armen [:armenzg] 7.72 KB, patch	mozilla : review+	Details \| Diff \| Splinter Review
add def worst_tbpl_status() 11 years ago Armen [:armenzg] 1.44 KB, patch	mozilla : review-	Details \| Diff \| Splinter Review
tbpl's worst status - v2 11 years ago Armen [:armenzg] 2.03 KB, patch	mozilla : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review
androidx86.minidump.diff 11 years ago Armen [:armenzg] 2.14 KB, patch	mozilla : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review
chunk jsreftest into 6 chunks and crashtest into 3 11 years ago Armen [:armenzg] 3.75 KB, patch	gbrown : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review
[configs] - chunk jsreftest into 6 and crashtest into 3 11 years ago Armen [:armenzg] 3.39 KB, patch	mozilla : review+	Details \| Diff \| Splinter Review
[configs] add x86 testing across branches 11 years ago Armen [:armenzg] 2.62 KB, patch	armenzg : review+	Details \| Diff \| Splinter Review
enable sets 1 & 2 - move failing suites to sets 3 to 8 on Ash and Cedar 11 years ago Armen [:armenzg] 10.92 KB, patch	mozilla : review+ armenzg : checked-in+	Details \| Diff \| Splinter Review