Closed Bug 895186 Opened 11 years ago Closed 11 years ago

Run Android x86 emulator unit tests from buildbot

Categories

(Release Engineering :: General, defect, P2)

x86
Android
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: armenzg)

References

Details

(Whiteboard: [reit-x86] summary in comment 193)

Attachments

(11 files, 32 obsolete files)

4.24 KB, text/plain
1.81 KB, patch (rail: review+, Callek: checked-in+)
2.26 KB, patch (rail: review+, Callek: checked-in+)
15.43 KB, patch (mozilla: review+, armenzg: checked-in+)
37.78 KB, patch (mozilla: review+, armenzg: checked-in+)
3.15 KB, patch (Callek: review+, armenzg: checked-in+)
25.07 KB, patch (mozilla: review+, gbrown: feedback+, armenzg: checked-in+)
2.03 KB, patch (mozilla: review+, armenzg: checked-in+)
2.14 KB, patch (mozilla: review+, armenzg: checked-in+)
3.75 KB, patch (gbrown: review+, armenzg: checked-in+)
10.92 KB, patch (mozilla: review+, armenzg: checked-in+)
Over in bug 891959, I am sorting out how we can run our unit tests in Android x86 emulators. :dminor handed off some excellent mozharness-based scripts to me; they run the tests, but I am not sure how buildbot will integrate with them, or what component will assign test jobs to emulators.

This is a little preliminary, but we should probably start thinking about how all the pieces fit together. At this point I'm mostly looking for someone to work with me to plan a strategy for running these tests.
Armen, does it look likely that you'd be able to coordinate with gbrown on this endeavor, or do we want to have a mini group meeting to figure out who has time?
Flags: needinfo?(armenzg)
I can make time to follow up with gbrown.

gbrown: my ical is up-to-date, could you please pick a date and time to chat?
Flags: needinfo?(armenzg)
It seems that gbrown has some mozharness scripts and configs that he's going to send me; I can then try to integrate them into my staging master with an iX box.

We can't do this testing on EC2 because of OpenGL-related crashes there.

Run times are a bit slower (20-30% slower) than on Pandas and Tegras.
Assignee: nobody → armenzg
Blocks: 891959
Adding needinfo to keep track that I need the scripts.
Flags: needinfo?(gbrown)
Flags: needinfo?(gbrown)
Attached file androidx86_emulator_unittest.py (obsolete) —
This is the python script I have been using to run tests in an emulator. I have been executing: 

androidx86_emulator_unittest.py --config <file>

and specifying all options in the config file.
Attached file config-gb-m1.py (obsolete) —
Sample config file: This runs mochitest-1 on emulator-5554.
See also bug 894507 for procedure and supporting scripts for setting up the emulator environment and launching the emulators.
Attached file config-gb-m1.py (obsolete) —
Sample config file: This runs mochitest-1 on emulator-5554.

This version also sets base_work_dir. If more than one script is running at one time, it is essential that each has a distinct base_work_dir.
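
A minimal sketch of how each concurrently running script could be given its own work directory. The `make_config` helper, the `/builds/androidx86` root, and the `emulator_name` key are hypothetical; only `base_work_dir` comes from the config described above.

```python
import os


def make_config(emulator_name, root="/builds/androidx86"):
    """Return a config dict whose base_work_dir is unique to one emulator.

    `root` and the `emulator_name` key are illustrative assumptions; only
    base_work_dir is named in the original config.
    """
    return {
        "emulator_name": emulator_name,
        "base_work_dir": os.path.join(root, emulator_name),
    }


configs = [make_config("emulator-%d" % port) for port in (5554, 5556)]
work_dirs = [c["base_work_dir"] for c in configs]
# Refuse to continue if two scripts would share a work directory.
assert len(set(work_dirs)) == len(work_dirs)
```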
Attachment #780533 - Attachment is obsolete: true
Priority: -- → P3
Priority: P3 → P2
Attached patch x86.diff (obsolete) — Splinter Review
I'm currently trying to run this on a machine by applying the attached patch.

python scripts/androidx86_emulator_unittest.py --config-file configs/android/androidx86.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/robocop.apk --download-symbols ondemand
Attachment #780531 - Attachment is obsolete: true
Attachment #780541 - Attachment is obsolete: true
gbrown, dminor, on which machines did you run these scripts?
What are their hostnames?

I'm using a test machine called talos-linux64-ix-003 and adb is not installed.
I'm not 100% sure that the lack of it is what is making the run fail, but I assume so.

13:56:02     INFO - #####
13:56:02     INFO - ##### Running install step.
13:56:02     INFO - #####
13:56:02     INFO - Running pre-action listener: _resource_record_pre_action
13:56:02     INFO - Running main action method: install
Getting output from command: ['adb', '-s', 'emulator-5554', 'shell', 'date']
Copy/paste: adb -s emulator-5554 shell date
13:56:02     INFO - Running post-action listener: _resource_record_post_action
13:56:02    FATAL - Uncaught exception: Traceback (most recent call last):
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 1048, in run
13:56:02    FATAL -     self.run_action(action)
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 990, in run_action
13:56:02    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 931, in _possibly_run_method
13:56:02    FATAL -     return getattr(self, method_name)()
13:56:02    FATAL -   File "scripts/androidx86_emulator_unittest.py", line 192, in install
13:56:02    FATAL -     dh.install_app(self.installer_path)
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/mozilla/testing/device.py", line 349, in install_app
13:56:02    FATAL -     self.set_device_time()
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/mozilla/testing/device.py", line 309, in set_device_time
13:56:02    FATAL -     self.info(self.query_device_time())
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/mozilla/testing/device.py", line 300, in query_device_time
13:56:02    FATAL -     "shell", "date"])
13:56:02    FATAL -   File "/home/cltbld/mozharness/mozharness/base/script.py", line 719, in get_output_from_command
13:56:02    FATAL -     cwd=cwd, stderr=tmp_stderr, env=env)
13:56:02    FATAL -   File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
13:56:02    FATAL -     errread, errwrite)
13:56:02    FATAL -   File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
13:56:02    FATAL -     raise child_exception
13:56:02    FATAL - OSError: [Errno 2] No such file or directory
13:56:02    FATAL - Exiting -1
13:56:02     INFO - Running post-run listener: _resource_record_post_run
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ source build/venv/bin/activate
(venv)[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ adb -s emulator-5554 shell date
No command 'adb' found, did you mean:
 Command 'cdb' from package 'tinycdb' (main)
 Command 'gdb' from package 'gdb' (main)
 Command 'dab' from package 'bsdgames' (universe)
 Command 'zdb' from package 'zfs-fuse' (universe)
 Command 'kdb' from package 'elektra-bin' (universe)
 Command 'tdb' from package 'tads2-dev' (multiverse)
 Command 'pdb' from package 'python' (main)
 Command 'jdb' from package 'openjdk-6-jdk' (main)
 Command 'jdb' from package 'openjdk-7-jdk' (universe)
 Command 'ab' from package 'apache2-utils' (main)
 Command 'ad' from package 'netatalk' (universe)
adb: command not found
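
The `OSError: [Errno 2]` in the traceback above is what `subprocess` raises when the executable cannot be found. A minimal sketch of a pre-flight check that fails with a clearer message is below; `find_tool` is a hypothetical helper, not part of mozharness, and it uses Python 3's `shutil.which`.

```python
import shutil


def find_tool(name, path=None):
    """Return the full path of `name`, or raise a readable error.

    shutil.which searches PATH (or the `path` string given) much like a shell.
    """
    location = shutil.which(name, path=path)
    if location is None:
        raise RuntimeError(
            "%s not found; add the SDK's platform-tools directory to PATH" % name
        )
    return location
```

Calling `find_tool("adb")` before the install step would stop the job early instead of surfacing a bare "No such file or directory".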
I have been working on talos-linux64-ix-001.test.releng.scl3.mozilla.com. We installed the Android SDK there and added $SDK/tools and $SDK/platform-tools to PATH.
(In reply to Geoff Brown [:gbrown] from comment #11)
> I have been working on talos-linux64-ix-001.test.releng.scl3.mozilla.com. We
> installed the Android SDK there and added $SDK/tools and $SDK/platform-tools
> to PATH.

I believe the right approach is to create the emulator snapshots inside of the current Android x86 builds and upload them to ftp.

Then the test machines will download the avd files and start them up.

Would this approach work for you?

gbrown, could you please upload a few avd files somewhere for me?
I would like to verify steps 5 & 6 from bug 894507 and add mock support.

On another note, could we meet on Tuesday? (I'm off on Monday) I'm finally having time to look at this and I can now ask more intelligent questions.

I see two sides to this project:
1) generate the avd files on the build machines and upload them
2) download the avd files on the talos-linux64-ix machines and trigger the emulator jobs

Note to self, I need to enable the Android x86 builds on Cedar and Ash.
It seems that the SDK packaging might have changed since your original setup.

I'm trying this:
wget http://dl.google.com/android/adt/adt-bundle-linux-x86_64-20130729.zip
unzip adt-bundle-linux-x86_64-20130729.zip
mv adt-bundle-linux-x86_64-20130729/sdk/ ~/android-sdk-linux
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
cd ~/mozharness
python scripts/androidx86_emulator_unittest.py --config-file configs/android/androidx86.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/robocop.apk --download-symbols ondemand


Which of these avd files in cltbld@talos-linux64-ix-001:~/gbrown is good for me to try?
junk2-avd.tgz  test-avd-8.tgz  test-avds-4.tgz  test-avds-5.tgz  test-avds-6.tgz  test-avds.tgz
Use this one:

http://people.mozilla.org/~gbrown/test-avds.tgz

Let's chat on Tuesday about the rest.
Attached patch misc x86 emulator changes (obsolete) — Splinter Review
Similar to your patch, here are the changes I have been running with lately:
 - enable xpcshell tests
 - use hostutils.zip instead of xre.zip
 - simplify
Attachment #786149 - Flags: review?(armenzg)
Comment on attachment 786149 [details] [diff] [review]
misc x86 emulator changes

Review of attachment 786149 [details] [diff] [review]:
-----------------------------------------------------------------

Changing to feedback+ to prevent landing until we're ready.

Would avoiding the check-in get in the way for you? I can ask aki for a review once I'm comfortable that things are working all the way.
Attachment #786149 - Flags: review?(armenzg) → feedback+
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) (EDT/UTC-4) from comment #16)
> Would avoid the check-in get on the way for you? 

That's fine. I thought it would be easier to "share" with you if I checked in -- whatever works for you is fine.
Attached patch [wip] androix86 (obsolete) — Splinter Review
This patch brings our patches a little closer to each other but not quite.
After I get things working, I will paste final patches.
Attachment #783351 - Attachment is obsolete: true
Attachment #786149 - Attachment is obsolete: true
Should this have worked?

mkdir ~/.android/avd
cd ~/.android/avd
wget http://people.mozilla.org/~gbrown/test-avds.tgz
tar zxvf test-avds.tgz
wget -O launch.py https://bugzilla.mozilla.org/attachment.cgi?id=782839
$ python launch.py 
emulator: ERROR: This AVD's configuration is missing a kernel file!!
That's the right idea, but something is going wrong.

I think the issue is that the avd definitions contain pointers to image files in the Android SDK. Probably the culprit is:

/home/cltbld/android-sdk-linux/system-images/android-17/x86//kernel-qemu
FYI, I might be using a newer SDK.
This is the version that I downloaded:
wget http://dl.google.com/android/adt/adt-bundle-linux-x86_64-20130729.zip

I can try to find an older SDK to match what you used.

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ ls -l /home/cltbld/android-sdk-linux/system-images/android-18/
total 4
drwxrwx--- 2 cltbld cltbld 4096 Jul 10 19:05 armeabi-v7a
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ ls -l /home/cltbld/android-sdk-linux/system-images
total 4
drwxr-x--- 3 cltbld cltbld 4096 Jul 29 15:23 android-18
I got a little further. Suggestions?

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ python launch.py 
SDL init failure, reason is: No available video device
Setting the DISPLAY value helps.

> After a few minutes, launch.py will start 4 emulator instances and print the
> names and ports associated with each:

Is there a way to make it take less time, or to know that it has not hung?
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
export DISPLAY=:0.0
cd ~/mozharness
python launch.py (modified some printing)

Should I worry about those WARNINGs?
Do we need the sleeptime?


[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ python launch.py 
Launching emulator #0
Attemp #1 of SUT redirection
test-x86-1: 5554; sut port:20701/20700
Sleeping 60
WARNING: Data partition already in use. Changes will not persist!
WARNING: SD Card image already in use: /home/cltbld/.android/avd/test-x86-1.avd/sdcard.img
WARNING: Cache partition already in use. Changes will not persist!
Launching emulator #1
Attemp #1 of SUT redirection
Attemp #2 of SUT redirection
Attemp #3 of SUT redirection
Attemp #4 of SUT redirection
Attemp #5 of SUT redirection
Traceback (most recent call last):
  File "launch.py", line 45, in <module>
    proc = launchEmulatorByIndex(i)
  File "launch.py", line 37, in launchEmulatorByIndex
    redirectSUT(emuport, sutport1, sutport2)
  File "launch.py", line 23, in redirectSUT
    tn.read_until('OK')
UnboundLocalError: local variable 'tn' referenced before assignment
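
The `UnboundLocalError` above is consistent with every connection attempt failing, so that `tn` was never assigned before `tn.read_until('OK')` ran. A minimal sketch of a retry helper that avoids this pattern follows; `connect_with_retries` and its parameters are hypothetical, not the code in launch.py.

```python
import time


def connect_with_retries(connect, attempts=5, delay=1.0):
    """Call connect() until it succeeds; raise if every attempt fails.

    Returning the connection (or raising) means the caller can never reach
    code that uses a connection variable that was never assigned.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as error:  # e.g. "Connection refused"
            last_error = error
            print("Attempt #%d failed: %s" % (attempt, error))
            time.sleep(delay)
    raise RuntimeError(
        "no connection after %d attempts: %s" % (attempts, last_error)
    )
```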
How can I remove devices?

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ adb devices
List of devices attached 
emulator-5554   device
> Should I worry about those WARNINGs?

Yes - they indicate that more than one emulator is running against the same image file...that should not be the case.

> Do we need the sleeptime?

Yes, I think so. I found that without those sleeps, there were intermittent failures to launch an emulator.

> How can I remove devices?

Kill the emulator:

ps -ef | grep emu
kill ...
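
The `ps -ef | grep emu` step could be scripted roughly as follows. This is a sketch that assumes the standard eight-column `ps -ef` layout; `emulator_pids` is a hypothetical helper.

```python
def emulator_pids(ps_output):
    """Return PIDs of emulator processes found in `ps -ef` output.

    Lines for the grep command itself are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        # UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        if "emulator" in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids
```

Each returned PID could then be passed to `os.kill(pid, signal.SIGTERM)`.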
I've removed all devices yet I get this:

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ ps -ef | grep emu
cltbld    3234  2920  0 14:48 pts/3    00:00:00 grep emu
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ python launch.py 
Launching emulator #0
Attemp #1 of SUT redirection
Socket Error [Errno 111] Connection refused
Attemp #2 of SUT redirection
Socket Error [Errno 111] Connection refused
Attemp #3 of SUT redirection
Socket Error [Errno 111] Connection refused
^Z
[1]+  Stopped                 python launch.py
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ ps -ef | grep emu
cltbld    3236  3235  1 14:48 pts/3    00:00:03 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-1 -port 5554
cltbld    3250  2920  0 14:53 pts/3    00:00:00 grep emu
You can debug the telnet connection problem manually:

Kill all emulators and verify adb devices shows nothing running. Then:

$ emulator -avd test-x86-1 &
$ adb devices
List of devices attached 
emulator-5554	device

$ telnet localhost 5554
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Android Console: type 'help' for a list of commands
OK
redir add tcp:20701:20701
OK
redir add tcp:20700:20700
OK
quit
Connection closed by foreign host.
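
The manual telnet session above could be automated along these lines. This is a sketch under assumptions: it uses a plain socket rather than telnetlib, it treats a reply starting with `KO` as a console error (not shown in this bug), and it does not handle any console authentication that newer emulators may require. `read_until_ok` and `redirect_sut` are hypothetical names.

```python
import socket


def read_until_ok(sock_file):
    """Read console lines until one is exactly 'OK'; fail on 'KO' or EOF."""
    for raw in sock_file:
        line = raw.decode("ascii", "replace").strip()
        if line == "OK":
            return
        if line.startswith("KO"):
            raise RuntimeError("console rejected command: %s" % line)
    raise RuntimeError("console closed the connection before answering OK")


def redirect_sut(emulator_port, sut_ports=(20701, 20700), host="localhost"):
    """Add host-to-guest TCP redirections through the emulator console."""
    with socket.create_connection((host, emulator_port), timeout=30) as sock:
        sock_file = sock.makefile("rwb")
        read_until_ok(sock_file)  # the banner ends with OK
        for port in sut_ports:
            sock_file.write(b"redir add tcp:%d:%d\n" % (port, port))
            sock_file.flush()
            read_until_ok(sock_file)
        sock_file.write(b"quit\n")
        sock_file.flush()
```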
(In reply to Geoff Brown [:gbrown] from comment #20)
> I think the issue is that the avd definitions contain pointers to image
> files in the Android SDK. Probably the culprit is:
> 
> /home/cltbld/android-sdk-linux/system-images/android-17/x86//kernel-qemu

I created a new tar of avd images that includes kernel-qemu, along with the system.img and ramdisk.img: http://people.mozilla.org/~gbrown/test-avds-aug6.tgz

With these images, you do not need the system images in the SDK. But, you need to launch the emulator slightly differently (specify paths to the kernel, etc on the emulator command line). I will update the launch.py on bug 894507.
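
A sketch of building such a command line. The `-avd`, `-kernel`, `-system`, `-ramdisk`, and `-port` flags and the `emulator64-x86` binary name are the ones that appear in commands elsewhere in this bug; `emulator_command` and its parameters are hypothetical.

```python
import os


def emulator_command(avd_name, image_dir, port=None, binary="emulator64-x86"):
    """Build an emulator command line that uses images outside the SDK."""
    command = [
        binary,
        "-avd", avd_name,
        "-kernel", os.path.join(image_dir, "kernel-qemu"),
        "-system", os.path.join(image_dir, "system.img"),
        "-ramdisk", os.path.join(image_dir, "ramdisk.img"),
    ]
    if port is not None:
        command += ["-port", str(port)]
    return command
```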
Hi gbrown,
I believe something is intermittently not working properly on our machines.
For some reason, I sometimes have trouble starting emulator #1.
I have noticed that when I run into it, VNC becomes unresponsive.
I've also noticed that compiz starts running at 100% CPU and things only get back to normal after I kill it.
After I killed it, I managed to start all emulators.

What can I do if I run into it again? What can I use to debug it? Are there any log messages that I can check?

These are the steps that I followed:
cd ~/.android/avd
rm -rf *
wget http://people.mozilla.org/~gbrown/test-avds-aug6.tgz
tar zxvf test-avds-aug6.tgz [1]
rm test-avds-aug6.tgz
cd
export DISPLAY=:0.0
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
emulator -avd test-x86-1 &
emulator -avd test-x86-2 &
emulator -avd test-x86-3 &
emulator -avd test-x86-4 &

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ ps -ef | grep emu
cltbld    3323  2917 20 09:16 pts/3    00:01:42 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-2
cltbld    3366  2917 24 09:18 pts/3    00:01:37 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-3
cltbld    3388  2917 23 09:18 pts/3    00:01:35 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-4
cltbld    3442  2917 34 09:21 pts/3    00:01:15 /home/cltbld/android-sdk-linux/tools/emulator64-x86 -avd test-x86-1

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com ~]$ adb devices
List of devices attached 
emulator-5554   device
emulator-5556   device
emulator-5558   device
emulator-5560   device
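
Checking that all four emulators came up could be done by parsing the `adb devices` output shown above. A minimal sketch; both helper names are hypothetical and the expected ports are the ones in this listing.

```python
def parse_adb_devices(output):
    """Map each serial in `adb devices` output to its reported state."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def missing_emulators(output, expected_ports=(5554, 5556, 5558, 5560)):
    """Return expected serials that are absent or not in the 'device' state."""
    devices = parse_adb_devices(output)
    expected = ["emulator-%d" % port for port in expected_ports]
    return [serial for serial in expected if devices.get(serial) != "device"]
```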






[1]
[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com temp]$ ls -l ~/.android/avd/
total 283028
-rw-r--r-- 1 cltbld cltbld   2825664 Aug  6 15:19 kernel-qemu
-rw-r--r-- 1 cltbld cltbld    270168 Aug  6 15:19 ramdisk.img
-rw-r--r-- 1 cltbld cltbld 286691328 Aug  6 15:00 system.img
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:24 test-x86-1.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:24 test-x86-1.ini
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:27 test-x86-2.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:27 test-x86-2.ini
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:27 test-x86-3.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:27 test-x86-3.ini
drwxr-xr-x 2 cltbld cltbld      4096 Aug  6 15:27 test-x86-4.avd
-rw-r--r-- 1 cltbld cltbld       120 Aug  6 15:27 test-x86-4.ini
I saw those exact symptoms when I started using multiple emulators. The only way I could find to avoid it was to stagger the launches: sleep after launching each emulator. That's why there are those long sleeps in launch.py.
I don't know of a good way to debug/diagnose this problem.
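
A minimal sketch of the staggered launch described above. The 60-second default mirrors the "Sleeping 60" line in the earlier log; `launch_staggered` and its injectable `popen` and `sleep` parameters are hypothetical and exist only so the loop can be exercised without real emulators.

```python
import subprocess
import time


def launch_staggered(commands, delay=60, popen=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing `delay` seconds after each launch."""
    procs = []
    for index, command in enumerate(commands):
        print("Launching emulator #%d" % index)
        procs.append(popen(command))
        sleep(delay)
    return procs
```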
What are the http and ssl ports supposed to be?
I have been using:

    "http_port": "8888",
    "ssl_port": "4445",

for the first emulator, and incrementing for each subsequent: 8889/4446 for the second, etc. dminor -- is that right?
Flags: needinfo?(dminor)
I've been doing the same thing as Geoff. These are the ports set up by the test webserver, so any values that don't collide with something else running on the test system are fine.
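
The port scheme discussed here could be expressed as a small helper. This is a sketch: the base values 8888 and 4445 come from the comments above, the console port step of two is inferred from the adb serials (5554, 5556, ...) seen in this bug, and `emulator_ports` and the `emulator_port` key are hypothetical.

```python
def emulator_ports(index, http_base=8888, ssl_base=4445, emu_base=5554):
    """Return the port set for the emulator at zero-based `index`."""
    return {
        "http_port": str(http_base + index),
        "ssl_port": str(ssl_base + index),
        # Console ports go up by two, matching the adb serials seen in this bug.
        "emulator_port": emu_base + 2 * index,
    }
```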
Flags: needinfo?(dminor)
I've added an action called start-emulators.
I hope tomorrow to run the mochitest suite to completion.

gbrown, dminor: what should we do once the tests pass? Do we shut the emulators off with the "kill" command from within the telnet connection? Or should we reboot directly?

Should I check whether any emulators are running at the beginning so I can shut them off before creating the new ones?

This was run like this:
 python scripts/androidx86_emulator_unittest.py --config-file configs/android/androidx86.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/fennec-25.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1375204094/robocop.apk --download-symbols ondemand --test-suite mochitest
Attachment #786437 - Attachment is obsolete: true
I'm trying to run mochitests manually but it is failing to run /system/bin/logcat -c.

[cltbld@talos-linux64-ix-003.test.releng.scl3.mozilla.com mozharness]$ telnet localhost 20701
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
$>/system/bin/logcat -c
##AGENT-WARNING## [/system/bin/logcat] command with arg(s) = [-c] is currently not implemented.
$>ver
SUTAgentAndroid Version 1.18
$>


export DISPLAY=:0.0
export PATH=$PATH:/home/cltbld/android-sdk-linux/tools:/home/cltbld/android-sdk-linux/platform-tools
emulator -avd test-x86-1 &
telnet localhost 5554
redir add tcp:20701:20701
redir add tcp:20700:20700
quit
/home/cltbld/mozharness/build/venv/bin/python /home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py --autorun --close-when-done --dm_trans=sut --console-level INFO --app org.mozilla.fennec --remote-webserver 10.0.2.2 --run-only-tests androidx86.json --xre-path /home/cltbld/mozharness/build/hostutils/xre --utility-path /home/cltbld/mozharness/build/hostutils/bin --deviceIP 127.0.0.1 --devicePort 20701 --http-port 8888 --ssl-port 4445 --httpd-path /home/cltbld/mozharness/build/tests/mochitest --total-chunks 8 --this-chunk 1 --symbols-path crashreporter-symbols.zip 
Device info: {'uptime': ['0 days 0 hours 6 minutes 41 seconds 181 ms'], 'sutuserinfo': ['User Serial:0'], 'power': ['Power status:', ' AC power ONLINE', ' Battery charge CHARGING', ' Remaining charge: 50%', ' Battery Temperature: 0.0 (c)'], 'process': [['10037', '1591', 'com.android.exchange'], ['10047', '1689', 'com.mozilla.SUTAgentAndroid'], ['10038', '1605', 'com.android.providers.calendar'], ['10027', '1646', 'com.android.calendar'], ['10033', '1265', 'com.android.systemui'], ['10018', '1362', 'com.android.inputmethod.latin'], ['10005', '1433', 'com.android.location.fused'], ['1000', '1171', 'system'], ['10022', '1570', 'com.android.deskclock'], ['10002', '1411', 'com.android.launcher'], ['10024', '1290', 'android.process.media'], ['1000', '1458', 'com.android.settings'], ['1001', '1390', 'com.android.phone'], ['10046', '1627', 'com.mozilla.watcher'], ['10015', '1535', 'com.android.mms'], ['10010', '1328', 'android.process.acore'], ['10010', '1501', 'com.android.contacts'], ['10025', '1485', 'com.android.music']], 'screen': ['X:1024 Y:720'], 'memory': ['PA:799289344, FREE: 665034752'], 'systime': ['2013/08/09 02:18:31:247'], 'rotation': ['ROTATION:0'], 'disk': ['/data: 610140160 total, 562380800 available', '/system: 277610496 total, 0 available', '/mnt/sdcard: 522225664 total, 518141952 available'], 'os': ['sdk_x86-eng 4.2 JOP40C eng.android-build.20121231.103448 test-keys'], 'id': ['52:54:00:12:34:56'], 'uptimemillis': ['401207'], 'temperature': ['Temperature: unknown']}
Test root: /mnt/sdcard/tests
Automation Error: Exception caught while running tests
Traceback (most recent call last):
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 688, in main
    dm.recordLogcat()
  File "/home/cltbld/mozharness/build/tests/mochitest/devicemanager.py", line 125, in recordLogcat
    self.shellCheckOutput(['/system/bin/logcat', '-c'], root=self._logcatNeedsRoot)
  File "/home/cltbld/mozharness/build/tests/mochitest/devicemanager.py", line 375, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/system/bin/logcat', '-c'] (output: 'su: uid 10047 not allowed to su', retval: '1')
Traceback (most recent call last):
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 707, in <module>
    main()
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 693, in main
    mochitest.stopWebServer(options)
  File "/home/cltbld/mozharness/build/tests/mochitest/runtestsremote.py", line 332, in stopWebServer
    self.server.stop()
AttributeError: 'MochiRemote' object has no attribute 'server'
The important bit there is:

su: uid 10047 not allowed to su

It looks like su is not installed. Are you using my avd definitions, or your own?
(In reply to Geoff Brown [:gbrown] from comment #38)
> The important bit there is:
> 
> su: uid 10047 not allowed to su
> 
> It looks like su is not installed. Are you using my avd definitions, or your
> own?

I'm using this: http://people.mozilla.org/~gbrown/test-avds-aug6.tgz
It comes inside of it, no?
Yes, su should be in system.img, in that tar.

It works for me. Compare:

mozdev@ubuntu:~/.android/avd$ emulator64-x86 -avd test-x86-2 -kernel kernel-qemu -system system.img -ramdisk ramdisk.img &
[1] 14552
mozdev@ubuntu:~/.android/avd$ adb shell ls -l /system/xbin/su
-rwsr-sr-x root     root        17748 2013-08-05 03:14 su
mozdev@ubuntu:~/.android/avd$ ls -l system.img
-rw-r--r-- 1 mozdev mozdev 286691328 Aug  6 15:00 system.img
mozdev@ubuntu:~/.android/avd$ adb shell su -c id
uid=0(root) gid=0(root)
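
The `adb shell su -c id` check above could be wrapped in a small pre-flight test. A sketch only: `su_works` is a hypothetical helper, and the `run` parameter is injectable so the logic can be tried without a device.

```python
import subprocess


def su_works(run=subprocess.check_output, serial=None):
    """Return True if `su -c id` on the device reports uid 0."""
    command = ["adb"]
    if serial:
        command += ["-s", serial]
    command += ["shell", "su", "-c", "id"]
    try:
        output = run(command)
    except (OSError, subprocess.CalledProcessError):
        return False
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    return "uid=0(" in output
```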
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) (EDT/UTC-4) from comment #37)
> I'm trying to run mochitests manually but it is failing to run
> /system/bin/logcat -c.
...

> $>ver
> SUTAgentAndroid Version 1.18
> $>

As an FYI, SUTAgent 1.18 had issues in production and was backed out pending ATeam having cycles to look into it. We're currently using 1.17. No idea if that is affecting things here.
(I am pretty comfortable with sut 1.18, at least on the emulator, but we should keep in mind that it is not the version currently used elsewhere.)

I removed the system-images from my Android SDK installation (android-sdk-linux). If your machine has a system image in the SDK, your emulator might be picking that up.
Attached patch x86_tools.diff (obsolete) — Splinter Review
Attachment #788270 - Flags: review?(bugspam.Callek)
Attached patch [wip] x86_bc.diff (obsolete) — Splinter Review
I'm not happy with the for loop that I wrote, but I did not feel like duplicating all of the dictionaries.

This adds the x86 builds and x86 emulator tests to Ash.
That branch is special because it allows us to push to http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness to test mozharness patches.

aki, what do you think about the for loop? Would you take it? (No need to review the rest of the patch)
Attachment #788293 - Flags: feedback?(aki)
Comment on attachment 788293 [details] [diff] [review]
[wip] x86_bc.diff

It's fine as long as we don't change the structure of the dicts... less ugly than some of the for loops we have :P
I think I was used to this being a sql table, so the sql can be programmatic but you get a readable exploded version.
I think we were investigating configconfig which would have a similar type of thing: a .py file to generate a verbose dict in some format (json? yaml?)  We keep hitting this write-optimize vs read-optimize problem, so that might be a longer term solution.
Attachment #788293 - Flags: feedback?(aki) → feedback+
gbrown, dminor: could we use this mozharness repo for now? That way we won't be working from different bases.
http://hg.mozilla.org/users/armenzg_mozilla.com/mozharness

I will be pushing my changes there.

I have this running in our staging environment.
As soon as we get to a half-decent state I will ask for reviews and enable it on Ash.

Which test suites have so far been successful?
Which test suites are we expected to run?
The same ones as on Panda Android?
The full set of (Android) unit tests, like Panda, but including reftests and xpcshell tests:

M1 M2 M3 M4 M5 M6 M7 M8 M-gl rc1 rc2 C J1 J2 J3 R1 R2 R3 R4 X

AFAIK, we are not (yet) attempting Talos.
Comment on attachment 788270 [details] [diff] [review]
x86_tools.diff

Review of attachment 788270 [details] [diff] [review]:
-----------------------------------------------------------------

reluctant r+ here.

So you're not tying these builds to panda masters. The bm19/20/22/10 set of masters are all going away (they are on kvm), and I would prefer not to load them up with a new job type until after we get them to the new masters and balanced via slavealloc.

That said, the aws masters you're marking are not yet officially in use, so I don't know if we need to do anything there first. I would also be interested in which masters you actually plan to attach these jobs to, and how.

Finally, I trust you to figure this all out, and the changes are technically sound, so r+.
Attachment #788270 - Flags: review?(bugspam.Callek) → review+
Product: mozilla.org → Release Engineering
Comment on attachment 787765 [details] [diff] [review]
[wip] integrated launch.py into mozharness

Review of attachment 787765 [details] [diff] [review]:
-----------------------------------------------------------------

::: configs/android/androidx86.py
@@ +12,5 @@
> +    "min_http_port": "8888",  # starting http port to use for the mochitest server
> +    "min_ssl_port": "4445",   # starting ssl port to use for the server
> +    "min_emu_port": 5554,
> +    "min_sut_port1": 20701,
> +    "min_sut_port2" : 20700, # XXX: why do we have two ports?

sutagent supports 2 ports: 20700 and 20701 by default. 20700 does not prompt for commands; 20701 does. As far as I know, our test automation only uses 20701, but I think it best to redirect both ports in case there is some usage I am not aware of, and to allow for future use.

::: scripts/androidx86_emulator_unittest.py
@@ +38,4 @@
>           "default": "browser",
>           "help": "The type of tests to run",
>          }],
> +        [["--robocop-url"],

We can do without a robocop-specific option: See _download_robocop_apk() in android_panda.py.

@@ +226,5 @@
> +            self.info("Sleeping %d" % sleeptime)
> +            time.sleep(sleeptime)
> +            # XXX: what is this for?
> +            #for proc in procs:
> +            #    proc.wait()

launch.py had this just so that the launch script would wait for all of the emulators. This allowed you to Ctrl-C out of launch.py and kill all of the emulators at the same time. It's safe to remove from your mozharness patch.
Attached patch androidx86.tools.diff (obsolete) — Splinter Review
Thanks for pointing that out.
I believe this should meet your expectations.
Attachment #788270 - Attachment is obsolete: true
Attachment #790394 - Flags: review?(bugspam.Callek)
Summary: Determine how to run Android x86 emulator unit tests from buildbot → Run Android x86 emulator unit tests from buildbot
Whiteboard: [reit-x86]
Attached patch [wip] androidx86.mozharness.diff (obsolete) — Splinter Review
Attachment #787765 - Attachment is obsolete: true
(In reply to Geoff Brown [:gbrown] from comment #31)
> I saw those exact symptoms when I started using multiple emulators. The only
> way I could find to avoid it was to stagger the launches: sleep after
> launching each emulator. That's why there are those long sleep's in
> launch.py.

I don't know what to do. We have staggered start ups yet we get into this bad state.
We have to find a way to determine why they would not start.
This is a blocker for me.
Attached file androidx86.log.txt (obsolete) —
I'm attaching the log (note this is not ready yet).

I will be adding a way to stop the whole thing if all 4 emulators are not ready.
I will also have to figure out how to deal with determining the robocop from the read-buildbot-configs step.
That log is puzzling to me. It looks like you are launching the emulators the same way that I do, but I never have a problem like that when the launches are staggered.

Do you know if the 2nd, 3rd, and 4th emulators processes are still running when those telnet connections fail?

The emulator does not write a log, as far as I can tell. However, it usually writes some messages to stdout and/or stderr. You could try collecting and reporting those. Also, you can get more of those messages by specifying "-debug all" on the emulator command line.
Comment on attachment 791003 [details] [diff] [review]
[wip] androidx86.mozharness.diff

Review of attachment 791003 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/androidx86_emulator_unittest.py
@@ +239,5 @@
> +        Let's make sure that every emulator has been stopped
> +        '''
> +        for p in self.procs:
> +            if p.poll() is None:
> +                p.kill()

I think this will work fine, but if you want another option:

The emulator accepts a 'kill' command over telnet -- just like the 'redir' command used in _redirectSUT.
Attached patch [wip] androidx86.mozharness.diff (obsolete) — Splinter Review
* adding -debug all
* moved emulators parameters into the configs (the http port and ssl port need adjustment)
* removed launch processed by index
* removed "minimum" port concepts
* using self.fatal() when an emulator fails to be connected to
* added _post_fatal() function to kill the emulators (TODO)
* start as many emulators as specified in config["emulators"] instead of xrange(0, 4)
(In reply to Geoff Brown [:gbrown] from comment #56)
> That log is puzzling to me. It looks like you are launching the emulators
> the same way that I do, but I never have a problem like that when the
> launches are staggered.
> 
> Do you know if the 2nd, 3rd, and 4th emulators processes are still running
> when those telnet connections fail?
> 
> The emulator does not write a log, as far as I can tell. However, it usually
> writes some messages to stdout and/or stderr. You could try collecting and
> reporting those. Also, you can more of those messages by specifying "-debug
> all" on the emulator command line.

I think I was trying the scripts locally first, did not reboot and triggered a job with buildbot. This probably means that I had an emulator instance running.

_post_fatal() works very well.

I will post a new patch by the end of Monday.
FTR, I'm on duty this week and I might find it impossible to keep doing any development until Monday.
It does not yet trigger unit test jobs on all four emulators.
My latest work is in the attachment.
Attachment #791003 - Attachment is obsolete: true
gbrown, I'm having trouble with staggered start up.
The first one starts well. The second one does not.
I've rebooted the host and will try again.

Are 20701 and 20700 the default sut ports?
Should I be redirecting the ports of the first emulator somewhere else? It seems that we don't redirect the default sut ports for the first emulator.

13:27:54     INFO - Trying to start the emulator with this command: emulator -avd test-x86-1 -debug all -port 5554 -kernel /home/cltbld/.android/avd/kernel-qemu -system /home/cltbld/.android/avd/system.img -ramdisk /home/cltbld/.android/avd/ramdisk.img
13:27:54     INFO - Sleeping 10 seconds
13:28:04     INFO -   Attempt #1 to redirect ports: (5554, 20701, 20700)
13:28:04     INFO - test-x86-1: 5554; sut port: 20701/20700
13:28:04     INFO - Emulators staggered start up. Sleeping 60 secs.
13:29:04     INFO - Trying to start the emulator with this command: emulator -avd test-x86-2 -debug all -port 5556 -kernel /home/cltbld/.android/avd/kernel-qemu -system /home/cltbld/.android/avd/system.img -ramdisk /home/cltbld/.android/avd/ramdisk.img
13:29:04     INFO - Sleeping 10 seconds
13:29:14     INFO -   Attempt #1 to redirect ports: (5556, 20703, 20702)
13:29:14     INFO - Trying again after exception: [Errno 111] Connection refused
13:29:14     INFO - Sleeping 30 seconds
13:29:44     INFO -   Attempt #2 to redirect ports: (5556, 20703, 20702)
13:29:44     INFO - Trying again after exception: [Errno 111] Connection refused
13:29:44     INFO - Sleeping 30 seconds
13:30:14     INFO -   Attempt #3 to redirect ports: (5556, 20703, 20702)
13:30:14     INFO - Trying again after exception: [Errno 111] Connection refused
13:30:14     INFO - Sleeping 30 seconds
13:30:44     INFO -   Attempt #4 to redirect ports: (5556, 20703, 20702)
13:30:44     INFO - Trying again after exception: [Errno 111] Connection refused
13:30:44     INFO - Sleeping 30 seconds
13:31:14     INFO -   Attempt #5 to redirect ports: (5556, 20703, 20702)
13:31:14     INFO - Trying again after exception: [Errno 111] Connection refused
13:31:14    FATAL - We have not been able to establish a telnetconnection with the emulator
13:31:14    FATAL - Running post_fatal callback...
13:31:14     INFO - Let's kill every process called emulator-x86
13:31:14     INFO - Killing pid 2917.
13:31:14     INFO - Killing pid 2951.
13:31:14     INFO - Copying logs to upload dir...
13:31:14     INFO - mkdir: /builds/slave/talos-slave/test/build/upload/logs
13:31:14    FATAL - Exiting -1
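
For reference, the retry pattern visible in the log above (five attempts, 30 seconds between them, give up after the last failure) could be sketched roughly as follows. This is a minimal sketch in Python 3, not the code from the mozharness patch; the `retry` name and its parameters are hypothetical, and `action` stands in for whatever opens the telnet connection and issues the redirects.

```python
import time


def retry(action, attempts=5, sleeptime=30, sleep=time.sleep, log=print):
    """Call action() until it succeeds.

    Returns True on success, False once every attempt has failed.
    `sleep` and `log` are injectable so the helper can be exercised
    without really waiting or printing.
    """
    for attempt in range(1, attempts + 1):
        log("Attempt #%d" % attempt)
        try:
            action()
            return True
        except OSError as e:  # covers [Errno 111] Connection refused
            log("Trying again after exception: %s" % e)
            if attempt < attempts:
                # No point sleeping after the final failure.
                sleep(sleeptime)
    return False
```

The caller decides what a False result means; in the log above that is the point where the script goes fatal and kills the emulators.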
Killing compiz helped again.
Attached file launch2.py
I've created a new version of launch2.py to help me speed up the mozharness development.
It pretty much matches what I already have in mozharness.

This way I can focus only on running the tests once launch2.py runs well.
Attached file buildprops.json (obsolete) —
Attached patch androidx86.mozharness.diff (obsolete) — Splinter Review
To make use of this script and get to where I am, you would have to do this:
- in one session run launch2.py by setting PATH and DISPLAY
-- two emulators should start

- on another session set PATH and DISPLAY as well
-- this will not be needed once I pass env values to ADBDeviceHandler
- clone mozharness
- apply mozharness patch
- download the buildprops.json [1]
- export PROPERTIES_FILE=`pwd`/buildprops.json
- /tools/buildbot/bin/python scripts/scripts/androidx86_emulator_unittest.py --cfg android/androidx86.py --test-suite mochitest --download-symbols ondemand

It currently fails on the install step.

[1] https://bugzilla.mozilla.org/attachment.cgi?id=794874
Attachment #791458 - Attachment is obsolete: true
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #61)
> gbrown, I'm having trouble with staggered start up.
> The first one starts well. The second one does not.

Can you collect the emulator -debug output from a failed run and post it here? I wonder if it has anything useful.
 
> Are 20701 and 20700 the default sut ports?

Yes.

> Should I be redirecting the ports of the first emulator somewhere else? It
> seems that we don't redirect the default sut ports for the first emulator.

Keep in mind that 

redir add tcp:20701:20701

is *not* a no-op. It "redirects" port 20701 on the emulator to port 20701 on the host, so that "telnet 127.0.0.1 20701" on the host connects to the sutagent on the emulator.

We want to issue a redir for every emulator, including the first one, and we just need the host ports to be unique. Currently, emulator 1 = {20700, 20701}, emulator 2 = {20702, 20703}, etc.
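
To make the numbering concrete, here is a small sketch of how the console port and the redir commands could be derived per emulator. The helper name and constants are hypothetical (they are not taken from launch.py or the mozharness patch); the port values follow the scheme described above and the tuples printed in the log, e.g. (5556, 20703, 20702) for the second emulator.

```python
# Hypothetical helper: names and layout are illustrative only.
SUT_GUEST_PORTS = (20701, 20700)  # default SUT agent ports inside every emulator
FIRST_CONSOLE_PORT = 5554         # console ports go 5554, 5556, ...
FIRST_HOST_PORT = 20700           # host ports are handed out in pairs from here


def redir_plan(index):
    """Return (console_port, [redir commands]) for the emulator at 0-based index."""
    console_port = FIRST_CONSOLE_PORT + 2 * index
    base = FIRST_HOST_PORT + 2 * index
    # The host side must be unique per emulator; the guest side is always
    # the SUT defaults, because each emulator runs its own agent.
    host_ports = (base + 1, base)
    commands = [
        "redir add tcp:%d:%d" % (host, guest)
        for host, guest in zip(host_ports, SUT_GUEST_PORTS)
    ]
    return console_port, commands


for i in range(2):
    print(redir_plan(i))
```

Note that the first emulator still gets its two redir commands even though host and guest ports happen to be equal.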

> I've rebooted the host and will try again.

> Killing compiz helped again.

Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?
(In reply to Geoff Brown [:gbrown] from comment #66)
> > I've rebooted the host and will try again.
> 
> > Killing compiz helped again.
> 
> Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?

The reboot was probably unnecessary. Killing compiz is what helped.

How is compiz involved in this project? Is it fine if after a failure to connect I kill compiz?
rail: we're working on running 4 android x86 emulators on the talos-linux64-ix machines. We have noticed that at times, compiz starts using 100% CPU and we have to kill it otherwise we can't connect to the emulator.
Callek pointed out to me that there are some puppet comments that mention that compiz can take 100% CPU.
Is there something we can do on our side to prevent hitting this bug?
Or do you have more context as to why it happens?
Thanks!

http://mxr.mozilla.org/build/source/puppet/modules/gui/manifests/init.pp#67
http://mxr.mozilla.org/build/source/puppet/modules/gui/templates/Xsession.conf.erb#11
Flags: needinfo?(rail)
> Is there something we can do on our side to prevent hitting this bug?

In bug 859867 I landed attachment 747431 [details] [diff] [review] to prevent this, but it doesn't look like it helps...

> Or do you have more context as to why it happens?

Last I poked this, I thought that the problem is https://bugzilla.mozilla.org/show_bug.cgi?id=859867#c23 (nvidia drivers)

I hope it helps.
Flags: needinfo?(rail)
:(

12:11 armenzg: hrmm talos-linux64-ix-003 rebooted on me
12:14 Callek: armenzg: rail-lunch: well this is interesting, syslog on -003 right now:
12:14 Callek: Aug 26 09:13:29 talos-linux64-ix-003 x-session-manager[2368]: WARNING: Application 'compiz.desktop' killed by sig
12:14 Callek: nal
12:14 Callek: Aug 26 09:13:29 talos-linux64-ix-003 x-session-manager[2368]: WARNING: App 'compiz.desktop' respawning too quickl
12:14 Callek: y
12:14 Callek: Aug 26 09:13:29 talos-linux64-ix-003 x-session-manager[2368]: CRITICAL: We failed, but the fail whale is dead. So
12:14 Callek: rry....
12:14 Callek expects that was armen killing it ;-)
12:15 armenzg: Callek: I just killed the process
12:15 armenzg: like 10 seconds ago
12:15 armenzg: because I'm starting the emulators again
12:15 armenzg: and I knew that it would prevent them from starting
12:16 armenzg: I don't think I killed it fast enough
12:16 armenzg now kills emulator's pid 3007
12:16 Callek: armenzg: sooo looks like the system went down due to a kernel crash
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #67)
> (In reply to Geoff Brown [:gbrown] from comment #66)
> > > I've rebooted the host and will try again.
> > 
> > > Killing compiz helped again.
> > 
> > Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?
> 
> The reboot was probably unnecessary. Killing compiz is what helped.
> 
> How is compiz involved in this project? Is it fine if after a failure to
> connect I kill compiz?

We don't interact directly with compiz: none of the scripts start/stop compiz or anything like that. I have noticed that compiz is usually the "top" cpu user while running the emulators.

I expect that it is fine to kill compiz when all of the emulators are stopped. I would be hesitant to kill compiz with an active emulator running tests.

I don't have much insight into compiz. I wonder if :dminor has more info?
Flags: needinfo?(dminor)
(In reply to Geoff Brown [:gbrown] from comment #71)
> (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> Enginerring) (EDT/UTC-4) from comment #67)
> > (In reply to Geoff Brown [:gbrown] from comment #66)
> > > > I've rebooted the host and will try again.
> > > 
> > > > Killing compiz helped again.
> > > 
> > > Just to be clear: Did you reboot, just kill compiz, or reboot + kill compiz?
> > 
> > The reboot was probably unnecessary. Killing compiz is what helped.
> > 
> > How is compiz involved in this project? Is it fine if after a failure to
> > connect I kill compiz?
> 
> We don't interact directly with compiz: none of the scripts start/stop
> compiz or anything like that. I have noticed that compiz is usually the
> "top" cpu user while running the emulators.
> 
> I expect that it is fine to kill compiz when all of the emulators are
> stopped. I would be hesitant to kill compiz with an active emulator running
> tests.
> 
> I don't have much insight into compiz. I wonder if :dminor has more info?

I have only been killing when the emulators are starting up.
Once they have started up I have not had any trouble since I have not yet run any tests.
It might be worthwhile spending some time to get Ubuntu running without compiz. It isn't necessary and is causing problems. It should be possible to do something like this: http://askubuntu.com/questions/32447/how-do-i-disable-compiz-in-the-ubuntu-classic-session
Flags: needinfo?(dminor)
(In reply to Dan Minor [:dminor] from comment #73)
> It might be worthwhile spending some time to get Ubuntu running without
> compiz. It isn't necessary and is causing problems. It should be possible to
> do something like this:
> http://askubuntu.com/questions/32447/how-do-i-disable-compiz-in-the-ubuntu-
> classic-session

We thought about using XFCE in the beginning, but didn't go with this option because it doesn't represent an "average" Ubuntu user. Switching to something non-compiz may affect unit tests and talos results as well.
I thought there was going to be dedicated hardware for the Android x86 unit tests. If that is the case, then it should be safe to disable compiz for the machines running the emulators. If not, then I guess we are stuck killing it when it acts up.
I have been able to run two test suites. One on each emulator.
Now, I have to run them concurrently rather than sequentially.

aki, how does this format work for you?
    "test_suite_definitions": {
        "mochitest-1": {
            "args": [("--total-chunks", "8"), ("--this-chunk", "1")],
            "manifest": "androidx86.json",
        },  
        "mochitest-2": {
            "args": [("--total-chunks", "8"), ("--this-chunk", "2")],
            "manifest": "androidx86.json",
        },  
    },

I make use of it like this (--test-suite is an appendable list):
...
        self.test_suite_definitions = c['test_suite_definitions']
        self.test_suites = c.get('test_suites')
        for suite in self.test_suites:
            assert suite in self.test_suite_definitions
...         
'--run-only-tests', self.test_suite_definitions[suite_name]["manifest"],
...        
for arg_pair in self.test_suite_definitions[suite_name]["args"]:
   cmd.extend(self._build_arg(arg_pair[0], arg_pair[1]))
Flags: needinfo?(aki)
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #76)
> I have been able two test suites. One on each emulator.
> Now, I have to run them concurrently rather than sequentially.
> 
> aki, how does this format work for you?
>     "test_suite_definitions": {
>         "mochitest-1": {
>             "args": [("--total-chunks", "8"), ("--this-chunk", "1")],
>             "manifest": "androidx86.json",
>         },  
>         "mochitest-2": {
>             "args": [("--total-chunks", "8"), ("--this-chunk", "2")],
>             "manifest": "androidx86.json",
>         },  
>     },
> 
> I make use of it like this (--test-suite is an appendable list):
> ...
>         self.test_suite_definitions = c['test_suite_definitions']
>         self.test_suites = c.get('test_suites')
>         for suite in self.test_suites:
>             assert suite in self.test_suite_definitions
> ...         
> '--run-only-tests', self.test_suite_definitions[suite_name]["manifest"],
> ...        
> for arg_pair in self.test_suite_definitions[suite_name]["args"]:
>    cmd.extend(self._build_arg(arg_pair[0], arg_pair[1]))

Hm, not every argument is a pair.  I think I leaned towards extra_args as a flat list because of this.

Also, I'm not sure that --this-chunk will take multiple settings; it may just take the final one, so --test-suite mochitest-1 --test-suite mochitest-2 may only run chunk 2/8... though they're running in separate emulators?  (Not entirely clear here.)

Anyway, I would lean towards extra_args as a flat list, unless you have some specific reason for needing them in tuple pairs.
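
As an illustration of the flat-list shape, something like the following might work. Only the "extra_args" key comes from this discussion; the `build_cmd` helper and the exact argument values are assumptions made for the sketch.

```python
# Hypothetical config: a flat list per suite, so single-token arguments
# and flag/value pairs can live side by side.
test_suite_definitions = {
    "mochitest-1": {
        "extra_args": ["--total-chunks", "8", "--this-chunk", "1",
                       "--run-only-tests", "androidx86.json"],
    },
    "mochitest-2": {
        "extra_args": ["--total-chunks", "8", "--this-chunk", "2",
                       "--run-only-tests", "androidx86.json"],
    },
}


def build_cmd(base_cmd, suite_name, definitions=test_suite_definitions):
    """Return a new command list: base_cmd followed by the suite's extra_args."""
    if suite_name not in definitions:
        raise KeyError("Unknown test suite: %s" % suite_name)
    # Copy both lists so neither the caller's command nor the config is mutated.
    return list(base_cmd) + list(definitions[suite_name]["extra_args"])


print(build_cmd(["python", "runtestsremote.py"], "mochitest-2"))
```

Each suite gets its own command line this way, so --this-chunk is never passed twice to the same process.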
Flags: needinfo?(aki)
Attached patch androidx86.mozharness.diff (obsolete) — Splinter Review
I'm running two different mochitest test suites on two different emulators :)

Here are the steps to follow in case you want to reproduce locally.
This is more accurate than comment 65.

I had to update the host with Callek's latest deployment.
* go in as root, apt-get update; apt-get install android-sdk18;

- In one session run launch2.py and set PATH and DISPLAY
-- two emulators should be started
export PATH=$PATH:/tools/android-sdk18/tools:/tools/android-sdk18/platform-tools
export DISPLAY=:0.0
wget -Olaunch2.py https://bugzilla.mozilla.org/attachment.cgi?id=794869
python launch2.py

- on another session set PATH and DISPLAY as well and run the script
-- this will not be needed once I pass env values to ADBDeviceHandler
- clone mozharness
- run the script

http://hg.mozilla.org/users/armenzg_mozilla.com/mozharness scripts
export PATH=$PATH:/tools/android-sdk18/tools:/tools/android-sdk18/platform-tools
export DISPLAY=:0.0
/tools/buildbot/bin/python scripts/scripts/androidx86_emulator_unittest.py --cfg android/androidx86.py --download-symbols ondemand --installer-path /builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.apk --installer-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1377528472/en-US/fennec-26.0a1.en-US.android-i386.apk --test-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1377528472/en-US/fennec-26.0a1.en-US.android-i386.tests.zip --robocop-url http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1377528472/en-US/robocop.apk --test-suite mochitest-1 --test-suite mochitest-2
Attachment #794875 - Attachment is obsolete: true
Attachment #794869 - Attachment mime type: text/x-python → text/plain
Attached patch concurrent.diff (obsolete) — Splinter Review
This tries to follow what aki recommended with respect to process manipulation (use poll() instead of wait()).

I still need to adjust it so we dump the log for each process that finishes.

I will continue this on Friday.
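
For anyone following along, the poll()-based approach could look roughly like this. It is a minimal Python 3 sketch, not the contents of concurrent.diff; `run_concurrently` and its arguments are hypothetical names. Each process writes to its own temporary file, and the output is collected as soon as that process is seen to have finished.

```python
import subprocess
import sys
import tempfile
import time


def run_concurrently(commands, poll_interval=1.0):
    """Start every command, then poll until all have finished.

    `commands` maps a name to an argv list.
    Returns {name: (returncode, combined stdout/stderr text)}.
    """
    running = {}
    for name, cmd in commands.items():
        out = tempfile.TemporaryFile(mode="w+")
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        running[name] = (proc, out)

    results = {}
    while running:
        for name in list(running):
            proc, out = running[name]
            if proc.poll() is None:
                continue  # still running, check again on the next pass
            out.seek(0)
            results[name] = (proc.returncode, out.read())
            out.close()
            del running[name]
        if running:
            time.sleep(poll_interval)
    return results


if __name__ == "__main__":
    demo = {
        "suite-a": [sys.executable, "-c", "print('a done')"],
        "suite-b": [sys.executable, "-c", "print('b done')"],
    }
    for name, (code, text) in run_concurrently(demo, poll_interval=0.1).items():
        print(name, code, text.strip())
```

Unlike wait(), no single slow process blocks the others from being reported.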
Attached patch [puppet] add android-tools (obsolete) — Splinter Review
This adds the android sdk to the ubuntu hosts.
Attachment #796385 - Flags: review?(rail)
Comment on attachment 796385 [details] [diff] [review]
[puppet] add android-tools

Review of attachment 796385 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/androidemulator/manifests/init.pp
@@ +7,5 @@
> +	    # We want it on Ubuntu
> +	    include packages::mozilla::android_sdk18
> +	}
> +    }
> +}
\ No newline at end of file

\ No newline at end of file

Can you add one?

If you want to use a separate module for the SDK, I'd suggest adding CentOS here and fixing http://hg.mozilla.org/build/puppet/file/618c178fd73e/modules/signingserver/manifests/base.pp#l31

::: modules/toplevel/manifests/slave/test/gpu.pp
@@ +6,4 @@
>          gui:
>              on_gpu => true;
>      }
> +    

extra trailing space ^

@@ +7,5 @@
>              on_gpu => true;
>      }
> +    
> +    # Android Emulators only work on gpu slaves
> +    include android-emulator

The name doesn't match the class name.
Attachment #796385 - Flags: review?(rail) → review-
Attachment #796385 - Attachment is obsolete: true
Attachment #796728 - Flags: review?(rail)
Attachment #796728 - Flags: review?(rail) → review+
So I'm not sold on this approach, and ideally we'll move this to a tooltool mechanic, even though nothing (I know of) uses tooltool on test machines yet.

It will not change often, and deploying with puppet is meant as a "get it out". This patch assumes we'd blow-away and recreate each job run, however we could also decompress with puppet and run with the same avd's each time.
Attachment #796938 - Flags: review?(rail)
Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

Can you also document it at https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/config?
Attachment #796938 - Flags: review?(rail) → review+
Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

backed out in https://hg.mozilla.org/build/puppet/rev/ac14ecf45840 due to:

Wed Aug 28 20:11:27 -0700 2013 Puppet (err): Failed to apply catalog: Parameter source failed on File[/home/cltbld/avds/test-x86.tar.gz]: Cannot use URLs of type 'http' as source for fileserving at /etc/puppet/production/modules/androidemulator/manifests/x86.pp:21

I suspect I can work around it by using one of the many other ways to specify source => for this file, but I'll test it tomorrow after I've rested.
Attachment #796938 - Flags: checked-in+ → checked-in-
Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

relanded using puppet:/// rather than http://

https://hg.mozilla.org/build/puppet/rev/54046d654252
Attachment #796938 - Flags: checked-in- → checked-in+
Comment on attachment 796728 [details] [diff] [review]
[puppet] add android-tools v2

Review of attachment 796728 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/androidemulator/manifests/init.pp
@@ +5,5 @@
> +    case $::operatingsystem {
> +        Ubuntu: {
> +	    # We want it on Ubuntu
> +	    include packages::mozilla::android_sdk18
> +	}

This should have a default line, or just remove the $::operatingsystem conditional and let the package class handle failing on other platforms.
Comment on attachment 796728 [details] [diff] [review]
[puppet] add android-tools v2

Review of attachment 796728 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/packages/manifests/mozilla/android_sdk18.pp
@@ +5,5 @@
> +class packages::mozilla::android_sdk18 {
> +    case $::operatingsystem {
> +        Ubuntu: {
> +            package {
> +                # Built from https://github.com/rail/android-sdk

Sorry, one more thing on this patch - like screenresolution, this should probably be moved to a Mozilla repo.  It gets back to the still-unsolved problem of how to version Debian packaging scripts.
Comment on attachment 796938 [details] [diff] [review]
[puppet] deploy avd's

Review of attachment 796938 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/androidemulator/manifests/x86.pp
@@ +18,5 @@
> +		    source => "http://${config::data_server}/repos/private/avds/test-x86-aug6.tar.gz",
> +		    owner  => $users::builder::username,
> +		    group  => $users::builder::group;
> +	    }
> +	}

Will this HTTP source cause the entire tarball, which is quite large, to be downloaded on every puppet run?  That could be very painful.  Could this be downloaded with tooltool instead, or wrapped in a .deb?  At any rate, it isn't really a repository right now, so probably doesn't belong under repos/.

Do these have to be installed in $HOME?  We're not doing anything else in that directory anymore - could this be in /build or /tools instead?

Finally, where does this tarball come from?  How could an external user upgrade it, or a developer figure out what's in it, or a future relenger upgrade it to a newer version than aug6?
Oh, and why is $install_avds "yes"/"no" instead of boolean?
Attached patch [wip] androidx86.configs.diff (obsolete) — Splinter Review
I have to do some more mozharness work with buildbot, hence, I had to create this patch to run Android x86 on my dev-master.

What do you guys think of the builder naming? and the structure to define it?
androidx86-set-# --> --test-suite jsreftest-1 --test-suite jsreftest-2 --test-suite jsreftest-3


I had to do that weird ANDROID_X86_MOZHARNESS_UNITTEST_DICT and ANDROID_X86_MOZHARNESS_UNITTEST_DICT dictionaries. It is ugly. Do you have any suggestions?
Attachment #798640 - Flags: feedback?(bugspam.Callek)
Attachment #798640 - Flags: feedback?(aki)
This is my latest work.

I would only want to highlight the addition of the suite definitions for all of the test suites.

This week I will be tackling:
* put compiz under control
* manage the avds appropriately
* report status correctly
Attachment #796213 - Attachment is obsolete: true
Attachment #796282 - Attachment is obsolete: true
Attached file androidx86.log.txt (obsolete) —
I'm getting such weird errors.
I wonder what I've changed to get this.

09:26:10     INFO - One of the test suites have finished and we're going to dump its output
09:26:10     INFO - Reading from file /tmp/tmp1MnEMs
09:26:10     INFO - Traceback (most recent call last):
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 707, in <module>
09:26:10     INFO -     main()
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 532, in main
09:26:10     INFO -     dm = droid.DroidSUT(options.deviceIP, options.devicePort, deviceRoot=options.remoteTestRoot)
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 47, in __init__
09:26:10     INFO -     self.getDeviceRoot()
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 694, in getDeviceRoot
09:26:10     INFO -     data = self._runCmds([{ 'cmd': 'testroot' }])
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 152, in _runCmds
09:26:10     INFO -     self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit)
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 128, in _sendCmds
09:26:10     INFO -     self._doCmds(cmdlist, outputfile, timeout)
09:26:10     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 175, in _doCmds
09:26:10     INFO -     self._sock.connect((self.host, int(self.port)))
09:26:10     INFO -   File "/usr/lib/python2.7/socket.py", line 224, in meth
09:26:10     INFO -     return getattr(self._sock,name)(*args)
09:26:10     INFO - TypeError: coercing to Unicode: need string or buffer, NoneType found
Attachment #791412 - Attachment is obsolete: true
Attachment #788293 - Attachment is obsolete: true
This has my latest mozharness code.

- It kills compiz in advance
- It does some basic avds manipulation (still needs to be tested)

NOTE: I've found out that I need to kill compiz before trying to start any emulator (rather than try to kill it after an SUT timeout occurs).
Attachment #798649 - Attachment is obsolete: true
Comment on attachment 798913 [details]
androidx86.log.txt

Ignore comment 95. I was doing something wrong.
Attachment #798913 - Attachment is obsolete: true
Hi aki,
I see that ADBDeviceHandler prints to stdout rather than using logging.
http://hg.mozilla.org/build/mozharness/file/061c3d6c7b52/mozharness/mozilla/testing/device.py#l99

What should I do? Thanks!
Flags: needinfo?(aki)
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #98)
> Hi aki,
> I see that ADBDeviceHandler prints to stdout rather than using logging.
> http://hg.mozilla.org/build/mozharness/file/061c3d6c7b52/mozharness/mozilla/
> testing/device.py#l99
> 
> What should I do? Thanks!

I don't see any print()s in that file.  I'm going to guess this is the output spew from adb/devicemanager itself.  I don't know what we can do about that other than

* turn off output from devicemanager, which may hide issues
* get devicemanager to accept a log object
* try redirecting stdout/stderr itself, which may be an ugly solution but might work.
* do what I did, which is let devicemanager spew to stdout, and generally ignore it.
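
If the redirect option is ever tried, one possible shape is sketched below. This assumes Python 3 (contextlib.redirect_stdout does not exist in the Python 2.7 that mozharness runs on here), and `LogWriter` is a hypothetical class, not something in mozharness. It also only catches text written to sys.stdout by Python code in the same process; output written by a child process such as adb would not pass through it.

```python
import contextlib
import io
import logging


class LogWriter(io.TextIOBase):
    """File-like object that forwards each complete line to a logger."""

    def __init__(self, logger, level=logging.INFO):
        super().__init__()
        self.logger = logger
        self.level = level
        self._pending = ""  # text received so far without a trailing newline

    def write(self, text):
        self._pending += text
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever partial line is left over.
        if self._pending:
            self.logger.log(self.level, self._pending)
            self._pending = ""


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
    writer = LogWriter(logging.getLogger("devicemanager"))
    with contextlib.redirect_stdout(writer):
        print("this line goes through logging instead of raw stdout")
    writer.flush()
```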
Flags: needinfo?(aki)
Comment on attachment 798914 [details] [diff] [review]
[wip] androidx86.mozharness.3.diff

gbrown, dminor: could I please get your feedback on this patch?

I'm getting close to completion and I would like to give you a day or so to give me your feedback.

I would like to ask aki for a review with all of your concerns addressed first.
Attachment #798914 - Flags: feedback?(gbrown)
Attachment #798914 - Flags: feedback?(dminor)
This patch shows the changes from my last mozharness patch, where I experiment with how to report the final status.
This will probably take a couple of days.
Comment on attachment 798914 [details] [diff] [review]
[wip] androidx86.mozharness.3.diff

Review of attachment 798914 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good! 

I am really looking forward to seeing this running via tbpl (on some tree), even if not everything works yet.

::: configs/android/androidx86.py
@@ +20,5 @@
> +    },
> +    "default_actions": [
> +        'clobber',
> +        'read-buildbot-config',
> +        #'setup-avds',

Why is this commented out?

@@ +42,5 @@
> +        {
> +            "name": "test-x86-2",
> +            "device_id": "emulator-5556",
> +            "http_port": "8888", # starting http port to use for the mochitest server
> +            "ssl_port": "4445", # starting ssl port to use for the server

Are you sure it is okay to have the same http_port and ssl_port for all of the instances? I have always used distinct ports...I don't know if this will be a problem or not. If they are all the same, maybe these parameters can be moved out of the per-emulator config?
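
A quick sanity check along these lines could flag the overlap before anything is launched. This is only a sketch: the `shared_ports` helper is hypothetical, and the emulator entries in the example merely mimic the shape of the config in this patch.

```python
from collections import defaultdict


def shared_ports(emulators, keys=("http_port", "ssl_port")):
    """Return {(key, port): [emulator names]} for ports used by more than one emulator."""
    users = defaultdict(list)
    for emu in emulators:
        for key in keys:
            users[(key, emu[key])].append(emu["name"])
    return {k: names for k, names in users.items() if len(names) > 1}


# Illustrative values only.
emulators = [
    {"name": "test-x86-1", "http_port": "8888", "ssl_port": "4445"},
    {"name": "test-x86-2", "http_port": "8888", "ssl_port": "4445"},
]
for (key, port), names in shared_ports(emulators).items():
    print("%s %s is shared by %s" % (key, port, ", ".join(names)))
```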

@@ +130,5 @@
> +            "extra_args": [os.path.join('tests', 'testing', 'crashtest', 'crashtests.list')]
> +        },
> +        "xpcshell": {
> +            "category": "xpcshell",
> +            "extra_args": ["--manifest", os.path.join('..','jsreftest', 'tests', 'jstests.list')]

That's not right! The xpcshell manifest should be xpcshell/xpcshell_android.ini.

@@ +134,5 @@
> +            "extra_args": ["--manifest", os.path.join('..','jsreftest', 'tests', 'jstests.list')]
> +        },
> +        "robocop-1": {
> +            "category": "mochitest",
> +            "extra_args": ["--robocop-path=.", "--robocop-ids=fennec_ids.txt", "--robocop=robocop.ini"],

Missing total-chunks, this-chunk args.

@@ +138,5 @@
> +            "extra_args": ["--robocop-path=.", "--robocop-ids=fennec_ids.txt", "--robocop=robocop.ini"],
> +        },
> +        "robocop-2": {
> +            "category": "mochitest",
> +            "extra_args": ["--robocop-path=.", "--robocop-ids=fennec_ids.txt", "--robocop=robocop.ini"],

Ditto.

::: scripts/androidx86_emulator_unittest.py
@@ +58,5 @@
>  
>      error_list = [
>          {'substr': 'FAILED (errors=', 'level': ERROR},
>          {'substr': r'''Could not successfully complete transport of message to Gecko, socket closed''', 'level': ERROR},
>          {'substr': 'Timeout waiting for marionette on port', 'level': ERROR},

There are a couple of errors left over from b2g here -- I'm sure they can be removed.

@@ +329,5 @@
> +        '''
> +        This action starts the emulators and redirects the two SUT ports for each one of them
> +        '''
> +        # XXX: This line is needed since I'm not rebootig the machine in between jobs
> +        self._kill_processes("emulator-x86")

I thought "emulator" spawned "emulator64-x86" on 64 bit machines -- might be worth double checking the process names.

@@ +395,5 @@
> +            if procs == []:
> +                break
> +            else:
> +                self.info("#")
> +                time.sleep(30)

I never like to see polling. Is this needed to avoid an output-driven timeout?
Attachment #798914 - Flags: feedback?(gbrown) → feedback+
Also...I know earlier versions were not setting minidump_stackwalk_path correctly and I did not notice any changes in your patch. We should check on that.
Thanks gbrown for your catches (especially the ports - I lost the changes somewhere).
I've fixed them and am testing again.
> @@ +395,5 @@
> > +            if procs == []:
> > +                break
> > +            else:
> > +                self.info("#")
> > +                time.sleep(30)
> 
> I never like to see polling. Is this needed to avoid an output-driven
> timeout?

The printing of "#" is to avoid an output-driven timeout.

I needed to know when the process that triggers the tests finishes.
I wish I could have used a callback or a similar mechanism, but I didn't research it further.
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #105)
> The printing of "#" is to avoid an output-driven timeout.

You may want to sys.stdout.write('#') to avoid filling up the log with a timestamp + INFO + # every 30 seconds.
Attachment #794874 - Attachment is obsolete: true
Comment on attachment 798640 [details] [diff] [review]
[wip] androidx86.configs.diff

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #93)
> What do you guys think of the builder naming? and the structure to define it?
> androidx86-set-# --> --test-suite jsreftest-1 --test-suite jsreftest-2
> --test-suite jsreftest-3

I think that makes sense.
I called groups of builds 'suites' elsewhere, but we already use 'suite' here so 'set' is fine.

> I had to do that weird ANDROID_X86_MOZHARNESS_UNITTEST_DICT and
> ANDROID_X86_MOZHARNESS_UNITTEST_DICT dictionaries. It is ugly. Do you have
> any suggestions?

I don't know about the names, but the current dict assumes we're running the test chunks individually, not parallelized on a single machine, so we do require a separate config for this.
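A minimal sketch of the "set" idea, assuming each builder set is just a named list of suites that gets expanded into repeated --test-suite arguments. The dictionary name, the helper name and the contents of set-2 are hypothetical; only the "androidx86-set-#" naming and the jsreftest example come from the comment above.

```python
# Hypothetical mapping of builder sets to the suites run in parallel on one machine.
# Only the naming scheme and the jsreftest example come from this bug.
ANDROIDX86_SETS = {
    "androidx86-set-1": ["jsreftest-1", "jsreftest-2", "jsreftest-3"],
    "androidx86-set-2": ["mochitest-1", "mochitest-2"],  # illustrative only
}


def suite_args_for_set(set_name, sets=ANDROIDX86_SETS):
    """Expand one builder set into repeated --test-suite arguments."""
    args = []
    for suite in sets[set_name]:
        args.extend(["--test-suite", suite])
    return args
```

With this shape, the per-chunk dict and the per-set dict can stay separate, which matches the point above that the existing dict assumes one chunk per job.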

>diff --git a/mozilla-tests/BuildSlaves.py.template b/mozilla-tests/BuildSlaves.py.template
>--- a/mozilla-tests/BuildSlaves.py.template
>+++ b/mozilla-tests/BuildSlaves.py.template
>@@ -1,43 +1,44 @@
>+    "ubuntu32": "pass",
>+    "ubuntu64": "pass",
>+    "ubuntu64-b2g": "pass",
>-    "tiger": "pass",
>-    "w764": "pass",
>-    "vista": "pass",
>+    "linux64_android-x86": "pass",

Did you intend to make these other changes?
I think linux64_android-x86 is the only specifically applicable line here, right?
I didn't see 'ubuntu32' in use anywhere; I didn't check on the others.

Also, nit: lots of trailing whitespace in your mobile_config.py patch :)
Attachment #798640 - Flags: feedback?(aki) → feedback+
I've dealt with all of your feedback.
I've removed some of the noise of sorting platforms so staging_config.py and production_config.py match (I think that was my original reason).
I'm also reusing the ubuntu64_hw instead.
I've removed a bunch of trailing white spaces.
Attachment #798640 - Attachment is obsolete: true
Attachment #798640 - Flags: feedback?(bugspam.Callek)
Attachment #799702 - Flags: review?(aki)
(In reply to Geoff Brown [:gbrown] from comment #103)
> Also...I know earlier versions were not setting minidump_stackwalk_path
> correctly and I did not notice any changes in your patch. We should check on
> that.

What should it be?
I see this output for the panda jobs:
./configs/android/android_panda_releng.py:MINIDUMP_STACKWALK_PATH = "/builds/minidump_stackwalk"
./configs/android/android_panda_releng.py:    "minidump_stackwalk_path": MINIDUMP_STACKWALK_PATH,
./configs/android/android_panda_releng.py:    "minidump_save_path": "%(abs_work_dir)s/../minidumps",
Flags: needinfo?(gbrown)
gbrown: should I delete ~/.android/avds and unpack clean templates before each run?
Attached patch androidx86.mozharness.4.diff (obsolete) — Splinter Review
I've completed the coding as best as possible.
Let me know if you prefer other ways to solve some of what I do.

Would you mind if we landed this after you review it and iterate after that?
I assume that we will need to see it running on tbpl and adjust as we see failures.
Attachment #798914 - Attachment is obsolete: true
Attachment #799090 - Attachment is obsolete: true
Attachment #798914 - Flags: feedback?(dminor)
Attachment #799741 - Flags: review?(aki)
Fixed some last minute typos.
Attachment #799741 - Attachment is obsolete: true
Attachment #799741 - Flags: review?(aki)
Attachment #799757 - Flags: review?(aki)
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #109)
> (In reply to Geoff Brown [:gbrown] from comment #103)
> > Also...I know earlier versions were not setting minidump_stackwalk_path
> > correctly and I did not notice any changes in your patch. We should check on
> > that.
> 
> What should it be?
> I see this output for the panda jobs:
> ./configs/android/android_panda_releng.py:MINIDUMP_STACKWALK_PATH =
> "/builds/minidump_stackwalk"
> ./configs/android/android_panda_releng.py:    "minidump_stackwalk_path":
> MINIDUMP_STACKWALK_PATH,
> ./configs/android/android_panda_releng.py:    "minidump_save_path":
> "%(abs_work_dir)s/../minidumps",

I do not know where minidump_stackwalk lives officially, but there are some at http://mxr.mozilla.org/build/source/tools/breakpad/. 

We need minidump_stackwalk_path to point to the minidump_stackwalk binary (the file itself). So if we can get http://mxr.mozilla.org/build/source/tools/breakpad/linux64/minidump_stackwalk copied to /build/minidump_stackwalk and set minidump_stackwalk_path == "/build/minidump_stackwalk", that should work.

minidump_save_path just needs to point to a directory (it can be empty) that .dmp files can be copied to - a temporary directory to hold dumps.
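To make the two requirements concrete, here is a small sanity check one could run before starting the tests. It is only a sketch: the function name is hypothetical and it assumes the config is a plain dict using the two keys discussed above.

```python
import os


def check_minidump_config(config):
    """Return a list of problems found in the minidump-related settings.

    Assumes `config` is a plain dict; the key names are the ones discussed
    in this bug, the function itself is hypothetical.
    """
    problems = []

    # minidump_stackwalk_path must be the binary itself, not its directory.
    stackwalk = config.get("minidump_stackwalk_path")
    if not stackwalk or not os.path.isfile(stackwalk):
        problems.append("minidump_stackwalk_path is not a file: %r" % (stackwalk,))
    elif not os.access(stackwalk, os.X_OK):
        problems.append("minidump_stackwalk_path is not executable: %r" % (stackwalk,))

    # minidump_save_path only needs to be an existing (possibly empty) directory.
    save_dir = config.get("minidump_save_path")
    if not save_dir or not os.path.isdir(save_dir):
        problems.append("minidump_save_path is not a directory: %r" % (save_dir,))

    return problems
```

An empty list means both settings look usable; anything else could be logged as a warning before the suites start.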
Flags: needinfo?(gbrown)
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #110)
> gbrown: should I delete ~/.android/avds and unpack clean templates before
> each run?

I think that would be best.
(In reply to Geoff Brown [:gbrown] from comment #113)
> (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> > What should it be?
> > I see this output for the panda jobs:
> > ./configs/android/android_panda_releng.py:MINIDUMP_STACKWALK_PATH =
> > "/builds/minidump_stackwalk"
> > ./configs/android/android_panda_releng.py:    "minidump_stackwalk_path":
> > MINIDUMP_STACKWALK_PATH,
> > ./configs/android/android_panda_releng.py:    "minidump_save_path":
> > "%(abs_work_dir)s/../minidumps",
> 
> I do not know where minidump_stackwalk lives officially, but there are some
> at http://mxr.mozilla.org/build/source/tools/breakpad/. 
> 
> We need minidump_stackwalk_path to point to the minidump_stackwalk binary
> (the file itself). So if we can get
> http://mxr.mozilla.org/build/source/tools/breakpad/linux64/
> minidump_stackwalk copied to /build/minidump_stackwalk and set
> minidump_stackwalk_path == "/build/minidump_stackwalk", that should work.
> 
> minidump_save_path just needs to point to a directory (it can be empty) that
> .dmp files can be copied to - a temporary directory to hold dumps.

We don't need to stuff it in /builds as we do on the foopies; we do, however, want to be sure we use it. The binary depends on the host OS, not the target OS, so using the same one that the Ubuntu test slaves use is just fine. We check out tools as part of these tests (aiui), so we can just point at the location in our local tools repo.
Attachment #799702 - Flags: review?(aki) → review+
Comment on attachment 799757 [details] [diff] [review]
androidx86.mozharness.4.diff

Awesome work, Armen!
I'm pretty impressed you got parallel processes working... I'd love to see that as a generic helper object and logger, but that's definitely out of scope here.

There are a lot of comments below.
You can land after fixing, or I'm happy to re-review after changes.

(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #111)
> Would you mind if we landed this after you review it and iterate after that?
> I assume that we will need to see it running on tbpl and adjust as we see
> failures.

That's ok.  You might have to make some of these fixes to actually have this runnable, though.


Pyflakes says:

scripts/androidx86_emulator_unittest.py:10: 're' imported but unused
scripts/androidx86_emulator_unittest.py:24: 'BaseErrorList' imported but unused
scripts/androidx86_emulator_unittest.py:25: 'ERROR' imported but unused
scripts/androidx86_emulator_unittest.py:375: undefined name 'c'
scripts/androidx86_emulator_unittest.py:397: local variable 'joint_return_code' is assigned to but never used

The undefined name 'c' is probably going to break things.


>diff --git a/configs/android/androidx86.py b/configs/android/androidx86.py
>new file mode 100644
>--- /dev/null
>+++ b/configs/android/androidx86.py
>@@ -0,0 +1,191 @@
>+import os
>+
>+config = {
>+    "buildbot_json_path": "buildprops.json",
>+    "host_utils_url": "http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip",

We may want this url in-tree at some point.  Not a blocker.

>+    "fennec_package_name": "org.mozilla.fennec",

This will work for m-c level branches, but not as they ride the trains.
I think there should be a text file inside the apk (package-name.txt?) that says what the app name is.  (there is for android, not sure if that carried over to android-x86)  Reading that file will make this work across trains, or for developers' builds, or on try, no matter which train the developer pushed from.

We'll have to fix this before we can enable this on Aurora, or really have it be useful on Try.
This doesn't block rolling out to Cedar, but we should probably fix before rolling out further.
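A rough sketch of reading the package name out of the apk instead of hardcoding it. The member name "package-name.txt" is the guess from this comment and has not been confirmed for android-x86 builds, so it is a parameter here; the function name and the fallback behaviour are assumptions too.

```python
import zipfile


def read_package_name(apk_path, member="package-name.txt",
                      default="org.mozilla.fennec"):
    """Return the app package name stored inside the apk.

    `member` is an assumed file name; if it is missing or empty,
    fall back to `default` (the m-c level name from the config).
    """
    with zipfile.ZipFile(apk_path) as apk:
        try:
            data = apk.read(member)
        except KeyError:
            # ZipFile.read raises KeyError when the member does not exist.
            return default
    name = data.decode("utf-8").strip()
    return name or default
```

The result could then replace the hardcoded "fennec_package_name" value when building the --app argument.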

>+    "test_suite_definitions": {
>+        "mochitest-1": {
>+            "category": "mochitest",
>+            "extra_args": ["--total-chunks", "8", "--this-chunk", "1", "--run-only-tests", "androidx86.json"],
>+        },
<snip>
>+    "suite_definitions": {
>+        "mochitest": {
>+            "run_filename": "runtestsremote.py",
>+            "options": ["--autorun", "--close-when-done", "--dm_trans=sut",
>+                "--console-level=INFO", "--app=%(app)s", "--remote-webserver=%(remote_webserver)s",
>+                "--xre-path=%(xre_path)s", "--utility-path=%(utility_path)s",
>+                "--deviceIP=%(device_ip)s", "--devicePort=%(device_port)s",
>+                "--http-port=%(http_port)s", "--ssl-port=%(ssl_port)s",
>+                "--certificate-path=%(certs_path)s", "--symbols-path=%(symbols_path)s"
>+            ],
>+        },

At some point we may want these in-tree, like talos.json.  Again, not a blocker.

>+sleeptime = 60

This might be a good thing to be able to configure, with a default of 60 (or whatever).

>+    def _redirectSUT(self, emuport, sutport1, sutport2):
>+        '''
>+        This redirects the default SUT ports for a given emulator.
>+        This is needed if more than one emulator is started.
>+        '''

This is interesting... temporary workaround or permanent solution?
You might want a self.info() at the beginning, maybe not.

>+    def _post_fatal(self, message=None, exit_code=None):
>+        """ After we call fatal(), run this method before exiting.
>+        """
>+        self._kill_processes("emulator64-x86")
>+
>+        # XXX aki, I' not sure exactly what this block is for
>+        if 'notify' in self.actions:
>+            self.notify(message=message, fatal=True)
>+        self.copy_logs_to_upload_dir()

This is to send me email and save logs for the hg-git process.
You don't need this block.

>+        joint_return_code = 0
>+        while True:
>+            for p in procs:
>+                return_code = p["process"].poll()
>+                if return_code!=None:
>+                    self.info("##### %s log begins" % p["suite_name"])

You may want to sys.stdout.write('\n') before this self.info(), so it doesn't show up on screen to the right of a million #'s.

>+                    if return_code !=0:
>+                        joint_return_code=1

I think you need to do something with this.
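For example, something along these lines: collect the per-suite return codes and derive one overall code plus the list of failing suites. This is only a sketch with a hypothetical helper name; how the result feeds into the script's exit status or the TBPL status is left open.

```python
def summarize_return_codes(results):
    """Combine per-suite return codes into one overall result.

    `results` maps suite name -> process return code.
    Returns (joint_return_code, failed_suite_names).
    """
    failed = sorted(name for name, code in results.items() if code != 0)
    joint_return_code = 1 if failed else 0
    return joint_return_code, failed
```

The joint code would then be what the script exits with, so a single failing suite turns the whole job non-green.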

> if __name__ == '__main__':
>     emulatorTest = Androidx86EmulatorTest()
>-    emulatorTest.run_and_exit()
>+    emulatorTest.run()

This should be run_and_exit().
Attachment #799757 - Flags: review?(aki) → review+
Attachment #790394 - Attachment is obsolete: true
Attachment #790394 - Flags: review?(bugspam.Callek)
Attachment #800164 - Flags: review?(bugspam.Callek)
So sad :(

"Output exceeded 52428800 bytes, remaining output has been truncated"

This happens when all of the emulators do not actually fail right away.
Updating the maxLogSize is very unwanted since it affects the performance of the masters.

Plan to mitigate comment 118:
- pull verbose test jobs into their own separate Androidx86 test set (run 1 emulator job instead of 4)

Plan to fix this issue (might not be implemented this time around):
- do not output test results
-- output only the test summary
- upload test log with blobber
- Tinderboxprint link to full log
- Teach tbpl how to follow those URLs and parse those
Landed: https://hg.mozilla.org/build/mozharness/rev/3b926f407e76

I will follow up with all feedback from comment 114, comment 116 and the minidump related comments.
Attachment #800164 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 800164 [details] [diff] [review]
androidx86.tools.diff - We add android-x86 on the linux masters instead of the mobile ones

http://hg.mozilla.org/build/tools/rev/f2f79ce56851
Attachment #800164 - Flags: checked-in+
Attachment #799757 - Flags: checked-in+
Merged to production branch. Live in production.
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #118)
> So sad :(
> 
> "Output exceeded 52428800 bytes, remaining output has been truncated"
> 
> This happens when all of the emulators do not actually fail right away.

You can also do the sleep(5), but only sys.stdout.write('#') if more than X amount of time has passed since the last one (60 seconds? 5min?)
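A sketch of that idea, assuming `procs` is a list of dicts with "suite_name" and "process" keys as in the patch, and that "process" has a Popen-style poll(). The function name, the default intervals and the injectable clock/sleep/out parameters are my own additions so the loop can be exercised without real processes.

```python
import sys
import time


def wait_for_processes(procs, poll_interval=5, heartbeat_interval=60,
                       out=sys.stdout, clock=time.monotonic, sleep=time.sleep):
    """Poll until every process has exited; return {suite_name: return_code}.

    Polls every `poll_interval` seconds, but writes a single '#' only when
    at least `heartbeat_interval` seconds have passed since the last one.
    """
    pending = list(procs)
    results = {}
    last_heartbeat = clock()
    while pending:
        still_running = []
        for p in pending:
            return_code = p["process"].poll()
            if return_code is None:
                still_running.append(p)
            else:
                results[p["suite_name"]] = return_code
        pending = still_running
        if not pending:
            break
        now = clock()
        if now - last_heartbeat >= heartbeat_interval:
            # Bare '#' without timestamp/INFO prefix, only to reset the
            # output-driven timeout.
            out.write("#")
            out.flush()
            last_heartbeat = now
        sleep(poll_interval)
    return results
```

Short polls notice a finished suite quickly, while the sparse heartbeat keeps the log growth small.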
> >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> >+        '''
> >+        This redirects the default SUT ports for a given emulator.
> >+        This is needed if more than one emulator is started.
> >+        '''
>
> This is interesting... temporary workaround or permanent solution?
> You might want a self.info() at the beginning, maybe not.
>

aki, what do you mean with self.info()? It is permanent. Each test job will talk to the pair of sut ports redirected for each emulator.
Flags: needinfo?(aki)
I can't see the jobs running on tbpl or buildapi even though in the "hidden builders" section I can see the androidx86 set jobs.
https://tbpl.mozilla.org/?tree=Ash&rev=0d4ae6057ef5&jobname=Android.*&showall=1
https://secure.pub.build.mozilla.org/buildapi/self-serve/ash/rev/0d4ae6057ef5

I will wait a bit before filing a bug.
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #125)
> > >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> > >+        '''
> > >+        This redirects the default SUT ports for a given emulator.
> > >+        This is needed if more than one emulator is started.
> > >+        '''
> >
> > This is interesting... temporary workaround or permanent solution?
> > You might want a self.info() at the beginning, maybe not.
> >
> 
> aki, what do you mean with self.info()? It is permanent. Each test job will
> talk to the pair of sut ports redirected for each emulator.

self.info("Attempting to redirect ports for X to ...")
Flags: needinfo?(aki)
Depends on: 913174
We can see the jobs in here:
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%20x86%20Emulator%20ash%20opt%20test%20androidx86-set

The status reporting for TBPL is not yet working properly.
(In reply to Aki Sasaki [:aki] from comment #127)
> (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> Enginerring) (EDT/UTC-4) from comment #125)
> > > >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> > > >+        '''
> > > >+        This redirects the default SUT ports for a given emulator.
> > > >+        This is needed if more than one emulator is started.
> > > >+        '''
> > >
> > > This is interesting... temporary workaround or permanent solution?
> > > You might want a self.info() at the beginning, maybe not.
> > >
> > 
> > aki, what do you mean with self.info()? It is permanent. Each test job will
> > talk to the pair of sut ports redirected for each emulator.
> 
> self.info("Attempting to redirect ports for X to ...")

Does this do?
http://hg.mozilla.org/build/mozharness/file/default/scripts/androidx86_emulator_unittest.py#l159
    self.info("  Attempt #%d to redirect ports: (%d, %d, %d)" % \
            (attempts, emuport, sutport1, sutport2))
I'm now going to be pushing changes to ash-mozharness instead of my own user repo.
I've triggered a new set of jobs based on:
http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness/rev/460926e7ed43

The results will show up on the second run of these:
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%20x86%20Emulator%20ash%20opt%20test%20androidx86-set
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #129)
> (In reply to Aki Sasaki [:aki] from comment #127)
> > (In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release
> > Enginerring) (EDT/UTC-4) from comment #125)
> > > > >+    def _redirectSUT(self, emuport, sutport1, sutport2):
> > > > >+        '''
> > > > >+        This redirects the default SUT ports for a given emulator.
> > > > >+        This is needed if more than one emulator is started.
> > > > >+        '''
> > > >
> > > > This is interesting... temporary workaround or permanent solution?
> > > > You might want a self.info() at the beginning, maybe not.
> > > >
> > > 
> > > aki, what do you mean with self.info()? It is permanent. Each test job will
> > > talk to the pair of sut ports redirected for each emulator.
> > 
> > self.info("Attempting to redirect ports for X to ...")
> 
> Does this do?
> http://hg.mozilla.org/build/mozharness/file/default/scripts/
> androidx86_emulator_unittest.py#l159
>     self.info("  Attempt #%d to redirect ports: (%d, %d, %d)" % \
>             (attempts, emuport, sutport1, sutport2))

Ah. Yes.
Results on tbpl:
- m-2 crashes: https://tbpl.mozilla.org/php/getParsedLog.php?id=27485148&tree=Ash&full=1#error99
06:04:43  WARNING -  PROCESS-CRASH | /tests/content/canvas/test/test_canvas.html | application crashed [Unknown top frame]

gbrown, I would not look at other logs until I fix a few more things. Feel free to look into the m-2 crash.

It seems that I still need to set the MINIDUMP_STACKWALK correctly:
> 06:04:43     INFO -  MINIDUMP_STACKWALK not set, can't process dump.
- xpcshell is failing to 'mkdr /mnt/sdcard/tests'; err='Could not create the directory /mnt/sdcard/tests'
https://tbpl.mozilla.org/php/getParsedLog.php?id=27487615&tree=Ash&full=1#error0
- The command used is: /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/xpcshell/remotexpcshelltests.py --deviceIP=127.0.0.1 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --testing-modules-dir=/builds/slave/talos-slave/test/build/tests/modules --apk=/builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.apk --no-logfiles --manifest xpcshell/xpcshell_android.ini
* robocop is failing with this:
07:36:49     INFO - Running on test-x86-3 the command /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py --autorun --close-when-done --dm_trans=sut --console-level=INFO --app=org.mozilla.fennec --remote-webserver=10.0.2.2 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --utility-path=/builds/slave/talos-slave/test/build/hostutils/bin --deviceIP=127.0.0.1 --devicePort=20705 --http-port=8858 --ssl-port=4458 --certificate-path=/builds/slave/talos-slave/test/build/tests/certs --symbols-path=crashreporter-symbols.zip --total-chunks 2 --this-chunk 1 --robocop-path=. --robocop-ids=fennec_ids.txt --robocop=robocop.ini
07:37:19     INFO - ERROR: Unable to find robocop APK './robocop.apk'
07:37:19     INFO - ERROR: Invalid options specified, use --help for a list of valid options
07:37:19     INFO -  ERROR: Unable to find robocop APK './robocop.apk'
07:37:19     INFO -  ERROR: Invalid options specified, use --help for a list of valid options
07:37:19     INFO - TinderboxPrint: robocop-2<br/><em class="testfail">T-FAIL</em>

I will adjust the robocop path, but I wonder what the invalid option is.
Summary of test results:
- m-1 to m-4 are running [1]:
TinderboxPrint: mochitest-2: T-FAIL CRASH
TinderboxPrint: mochitest-1: 32373/1/63
TinderboxPrint: mochitest-4: 37567/3/200
TinderboxPrint: mochitest-3: 19809/4/55

- m-5 to m-8 are running [2]:
TinderboxPrint: mochitest-7: 13070/10/1923
TinderboxPrint: mochitest-8: 73338/0/61
TinderboxPrint: mochitest-5: 39233/4/611
TinderboxPrint: mochitest-6: 12771/0/27

My own notes:
- The TBPL status seems to work for set-1 and set-2
- We have some jobs retrying and I want to understand why
- It seems that minidump is not quite there but I have made some progress
08:32:58     INFO -  Crash dump filename: /tmp/tmpY_ydDv/242a7c9a-2af1-0a96-7cb91f82-77bfb3dc.dmp
08:32:58     INFO -  MINIDUMP_STACKWALK binary not found: /talos-slave/test/build/venv/lib/python2.7/site-packages/talos/breakpad/linux64/minidump_stackwalk

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27491043&tree=Ash&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27490746&tree=Ash&full=1
something here is in production
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #133)
> - xpcshell is failing to 'mkdr /mnt/sdcard/tests'; err='Could not create the
> directory /mnt/sdcard/tests'
> https://tbpl.mozilla.org/php/getParsedLog.
> php?id=27487615&tree=Ash&full=1#error0
> - The command used is: /builds/slave/talos-slave/test/build/venv/bin/python
> /builds/slave/talos-slave/test/build/tests/xpcshell/remotexpcshelltests.py
> --deviceIP=127.0.0.1
> --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre
> --testing-modules-dir=/builds/slave/talos-slave/test/build/tests/modules
> --apk=/builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.
> apk --no-logfiles --manifest xpcshell/xpcshell_android.ini

Oops - there is no --devicePort in that command, so it is probably running on 20701, which is already running a mochitest job!!
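One way to avoid this kind of collision is to derive every port from the emulator index in a single place and always pass --devicePort. A minimal sketch, assuming the base values and the stride of 2 that can be read off the logs in this bug (emulator 1 uses 20701/8854/4454, emulator 3 uses 20705/8858/4458); the constant and function names are hypothetical.

```python
# Base values inferred from the logs in this bug; treat them as assumptions.
SUT_BASE_PORT = 20701
HTTP_BASE_PORT = 8854
SSL_BASE_PORT = 4454


def ports_for_emulator(index):
    """Return the ports for the emulator with the given 0-based index.

    Each emulator gets two redirected SUT ports, hence the stride of 2.
    """
    offset = 2 * index
    return {
        "device_port": SUT_BASE_PORT + offset,
        "http_port": HTTP_BASE_PORT + offset,
        "ssl_port": SSL_BASE_PORT + offset,
    }


def device_args(index, device_ip="127.0.0.1"):
    """Build the --deviceIP/--devicePort pair every remote harness needs."""
    ports = ports_for_emulator(index)
    return [
        "--deviceIP=%s" % device_ip,
        "--devicePort=%d" % ports["device_port"],
    ]
```

If the xpcshell command line were built from device_args() as well, it could not silently fall back to the default port.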
Attached patch androidx86.mozharness.diff (obsolete) — Splinter Review
What do you think?
Attachment #800865 - Flags: feedback?(aki)
Attached patch [wip] androidx86.mozharness.diff (obsolete) — Splinter Review
I will leave the other patch as the one for feedback since it addressed aki's concerns. This one will be the wip which I will ask review for.
Summary:
* m-1 to m-8 are running well[1][2]
** gbrown will update androidx86.json to clear some test failures
* m-2 is crashing consistently
** I think that if I fix the minidump_stackwalk properly it will help fix it [3]
* The TBPL status seems to work for set-1 and set-2
* set-4 was the job that would normally RETRY
** It seems that it does not do it anymore after mochitest-gl stopped failing
** probably something in the output was triggering it
* I sometimes see command timeout of 2400 secs during the run_tests step
** I don't know if it is because we use "sys.stdout.write('#')" instead of "self.info('#')"

aki, if the timeout issue is not related to the usage of sys.stdout.write('#'), do you know if there is a safe way to see the last lines of each stdout before buildbot kills the job?



[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27491043&tree=Ash&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27490746&tree=Ash&full=1
[3]
08:32:58     INFO -  Crash dump filename: /tmp/tmpY_ydDv/242a7c9a-2af1-0a96-7cb91f82-77bfb3dc.dmp
08:32:58     INFO -  MINIDUMP_STACKWALK binary not found: /talos-slave/test/build/venv/lib/python2.7/site-packages/talos/breakpad/linux64/minidump_stackwalk
Comment on attachment 800865 [details] [diff] [review]
androidx86.mozharness.diff

Thanks for the pyflakes cleanup; that was annoying me.
Getting this:

scripts/androidx86_emulator_unittest.py:373: local variable 'dirs' is assigned to but never used

>     def _post_fatal(self, message=None, exit_code=None):
>         """ After we call fatal(), run this method before exiting.
>         """
>         self._kill_processes("emulator64-x86")
>-
>-        # XXX aki, I' not sure exactly what this block is for
>-        if 'notify' in self.actions:
>-            self.notify(message=message, fatal=True)
>         self.copy_logs_to_upload_dir()

I don't think you have to copy the logs either.

>     def run_tests(self):
>         """
>         Run the tests
>         """
>         procs = []
>
>         emulator_index = 0
>         for suite_name in self.test_suites:
>             procs.append(self._trigger_test(suite_name, emulator_index))
>             emulator_index+=1
>
>-        joint_return_code = 0
>+        joint_tbpl_status = None
>+        joint_log_level = None
>         while True:
>             for p in procs:
>                 return_code = p["process"].poll()
>                 if return_code!=None:

If you're having problems timing out, I would put in debug output around both the procs.append above, and the 'for p in procs' here.
I'm guessing something around here is hanging.
If it helps to switch to self.info() instead of sys.stdout.write() that's fine, i'm just aware that that will be a) way more verbose and b) create a ton more lines of log.  Still, you'll have timestamps and it's temporary.

>+                    # aki: do I need this?
>                     # I'm not using the concept of "plain-#" like other jobs; do I need this logging?
>                     # e.g. The mochitest suite: plain4 ran with return status: SUCCESS
>                     #self.log("The %s suite: %s ran with return status: %s" %
>                     #         (suite_category, p["suite_name"], tbpl_status), level=log_level)
>                     self.info("##### %s log ends" % p["suite_name"])

I'm not sure?  Does your log have a good summary at the end?
Attachment #800865 - Flags: feedback?(aki) → feedback+
gbrown, is this mochitest-gl cmd built correctly?

10:45:00     INFO - Running on test-x86-1 the command /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py --autorun --close-when-done --dm_trans=sut --console-level=INFO --app=org.mozilla.fennec --remote-webserver=10.0.2.2 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --utility-path=/builds/slave/talos-slave/test/build/hostutils/bin --deviceIP=127.0.0.1 --devicePort=20701 --http-port=8854 --ssl-port=4454 --certificate-path=/builds/slave/talos-slave/test/build/tests/certs --symbols-path=crashreporter-symbols.zip --test-path content/canvas/test/webgl --run-only-tests androidx86.json

What about xpcshell?

10:45:00     INFO - Running on test-x86-2 the command /builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/xpcshell/remotexpcshelltests.py --deviceIP=127.0.0.1 --devicePort=20703 --xre-path=/builds/slave/talos-slave/test/build/hostutils/xre --testing-modules-dir=/builds/slave/talos-slave/test/build/tests/modules --apk=/builds/slave/talos-slave/test/build/fennec-26.0a1.en-US.android-i386.apk --no-logfiles --manifest xpcshell/tests/xpcshell_android.ini
Flags: needinfo?(gbrown)
> What about xpcshell?

Ignore this last one, I have to look into an incorrect path:
10:51:00     INFO - IOError: Missing files: xpcshell/tests/xpcshell_android.ini
(In reply to Armen Zambrano [:armenzg] (Use needinfo flag) (Release Enginerring) (EDT/UTC-4) from comment #143)
> gbrown, is this mochitest-gl cmd built correctly?

That looks correct, except you should remove "--run-only-tests androidx86.json"
Flags: needinfo?(gbrown)
Relative paths always make me nervous. I would prefer to see an absolute path for content/canvas/test/webgl and xpcshell/tests/xpcshell_android.ini.
Attached patch androidx86.mozharness.diff (obsolete) — Splinter Review
Added feedback from aki.
Added full paths.
I hope to have fixed the minidumps situation.
Other fixes.

I'm aiming to get xpcshell, jsreftest and reftests running by the end of Tuesday.
Attachment #800865 - Attachment is obsolete: true
Attachment #800936 - Attachment is obsolete: true
(In reply to Dustin J. Mitchell [:dustin] from comment #89)

...to comment 92 are all answered in Bug 913011
gbrown, can you please look at this?
https://tbpl.mozilla.org/php/getParsedLog.php?id=27648536&tree=Ash&full=1#error0
09:41:43     INFO - REFTEST TEST-UNEXPECTED-FAIL | | HTTP ERROR : 404
09:41:43     INFO - REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: Error 6 in manifest file http://10.0.2.2:8854//builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list line 1

I also see that mochitest-gl has gone a bit further:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27598902&tree=Ash&full=1#error16
but not a clean run:
15:08:58     INFO - Traceback (most recent call last):
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 689, in main
15:08:58     INFO -     retVal = mochitest.runTests(options)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtests.py", line 624, in runTests
15:08:58     INFO -     self.cleanup(manifest, options)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/runtestsremote.py", line 243, in cleanup
15:08:58     INFO -     if self._dm.fileExists(self.remoteLog):
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 404, in fileExists
15:08:58     INFO -     return filename in self.listFiles(containingpath)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 408, in listFiles
15:08:58     INFO -     if not self.dirExists(rootdir):
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 389, in dirExists
15:08:58     INFO -     ret = self._runCmds([{ 'cmd': 'isdir ' + remotePath }]).strip()
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 152, in _runCmds
15:08:58     INFO -     self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit)
15:08:58     INFO -   File "/builds/slave/talos-slave/test/build/tests/mochitest/devicemanagerSUT.py", line 134, in _sendCmds
15:08:58     INFO -     raise err
15:08:58     INFO - DMError: Automation Error: Timeout in command isdir /mnt/sdcard/tests/logs
15:08:58     INFO - Automation Error: Exception caught while running tests

I will trigger a new fresh set of builds based on:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Android&rev=074ec56640f6


Thanks!
Flags: needinfo?(gbrown)
"My" reftests run much better -- but currently with lots of errors! -- with:

/home/cltbld/tests/scripts/scripts/build/venv/bin/python remotereftest.py 
--app=org.mozilla.fennec --ignore-window-size --remote-webserver 10.0.2.2 
--xre-path /home/cltbld/tests/scripts/scripts/build/hostutils/xre 
--utility-path /home/cltbld/tests/scripts/scripts/build/hostutils/bin 
--deviceIP 127.0.0.1 --devicePort 20701 --http-port 8888 --ssl-port 4445 
--httpd-path reftest/components 
--total-chunks 10 --this-chunk 1 
--symbols-path crashreporter-symbols.zip 
tests/layout/reftests/reftest.list

Compared to "your" reftests:

/builds/slave/talos-slave/test/build/venv/bin/python /builds/slave/talos-slave/test/build/tests/reftest/remotereftest.py 
--app=org.mozilla.fennec --ignore-window-size --remote-webserver=10.0.2.2 
--xre-path=/builds/slave/talos-slave/test/build/hostutils/xre 
--utility-path=/builds/slave/talos-slave/test/build/hostutils/bin 
--deviceIP=127.0.0.1 --devicePort=20701 --http-port=8854 --ssl-port=4454 
--httpd-path reftest/components 
--total-chunks 4 --this-chunk 1 
/builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list

The only significant difference I see is your full path for reftest.list -- in this case, I think we need a relative path.

Also, I found that reftests run much slower on emulator -- I expect you need at least 10 chunks to avoid 60-minute timeouts.
Flags: needinfo?(gbrown)
More evidence for that relative path...

My log has lines like:

INFO -  REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/reftests/reftest-sanity/test-async.xul

Compared to:

INFO - REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: Error 6 in manifest file http://10.0.2.2:8854//builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list line 1

This isn't going to work: http://10.0.2.2:8854//builds/...
                                              ^^
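
To illustrate why the absolute path breaks, here is a minimal sketch. It assumes the harness simply appends the manifest path to the web server root; that is inferred from the log line above, not taken from the harness source, and manifest_url is a hypothetical helper.

```python
def manifest_url(server, port, manifest_path):
    """Naive join of the web server root and the manifest path.

    Assumption: this mirrors how the URL in the log above was built.
    """
    return "http://%s:%s/%s" % (server, port, manifest_path)


# An absolute path already starts with "/", so the join yields "//builds/...".
absolute = manifest_url(
    "10.0.2.2", 8854,
    "/builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/reftest.list")

# A relative path resolves under the web server root, as in the working run.
relative = manifest_url("10.0.2.2", 8888, "tests/layout/reftests/reftest.list")

print(absolute)
print(relative)
```

With a relative path the manifest is served from the web server root, which matches the TEST-START URL in the working log.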
Attached patch androidx86.mozharness.diff (obsolete) — Splinter Review
Switching back to relative paths.
We're testing the new inbound builds.
We have the tbpl patches live.

It's not been a very productive day. I hope to complete tomorrow what I had hoped to complete today.
Attachment #801806 - Attachment is obsolete: true
I also split the reftest chunks into 10.
At least the 4 that were mentioned on the buildbot-configs.

https://tbpl.mozilla.org/?tree=Ash&jobname=Android 4.2 x86&rev=1c67140bc6a3 - The second set of jobs will be using the build from m-i (as per comment 149)
Status summary:
* m-[1-8] are running green [1][2]
* xpcshell is running green [3]
* mochitest-gl is running but it fails [3]
* robocop-{1,2} are running but 2 tests fail [3]
* reftests should be running before the end of the day


[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700133&tree=Ash&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700349&tree=Ash&full=1
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27705576&tree=Ash&full=1

https://tbpl.mozilla.org/?tree=Ash&jobname=Android%204.2%20x86&rev=1c67140bc6a3
This is very close.

I need to see sets 3 & 5 not timing out, and that should be mainly it.
Attachment #802585 - Attachment is obsolete: true
Attachment #803294 - Flags: review?(aki)
Attachment #803294 - Flags: feedback?(gbrown)
Attached patch androidx86.configs.diff (obsolete) — Splinter Review
These are the branches where it would be enabled:
Android 4.2 x86 Emulator alder opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator ash opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator b2g-inbound opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator birch opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator build-system opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator cedar opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator cypress opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator elm opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator fig opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator fx-team opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator graphics opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator gum opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator holly opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator ionmonkey opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator jamun opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator larch opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator maple opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator mozilla-central opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator mozilla-inbound opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator oak opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator pine opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator profiling opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator services-central opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator try opt test androidx86-set-1 ScriptFactory
Android 4.2 x86 Emulator ux opt test androidx86-set-1 ScriptFactory
Attachment #803295 - Flags: review?(aki)
Comment on attachment 803295 [details] [diff] [review]
androidx86.configs.diff

Why TEMP ?
Attachment #803295 - Flags: review?(aki) → review+
gbrown, could you please have a look at these reftests, crashtest and jsreftests results?
https://tbpl.mozilla.org/php/getParsedLog.php?id=27726116&tree=Ash&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=27726080&tree=Ash&full=1
Flags: needinfo?(gbrown)
Comment on attachment 803294 [details] [diff] [review]
androidx86.mozharness.diff

Good job!  And I'm really happy to see the pyflakes warnings go away too.

>     def setup_avds(self):
>         '''
>         We have deployed through Puppet tar ball with the pristine templates.
>-        If they have not been untarred before we go ahead and do so.
>+        Let's unpack them every time.
>         '''
>-        if not os.path.exists(os.path.join(self.config[".avds_dir"], "test-x86-1.avd")):
>-            avds_path = self.config["avds_path"]
>-            self.mkdir_p(self.config[".avds_dir"])
>-            self.unpack(avds_path, self.config[".avds_dir"])
>+        if os.path.exists(os.path.join(self.config[".avds_dir"], "test-x86-1.avd")):
>+           shutil.rmtree(self.config[".avds_dir"])

self.rmtree?
Attachment #803294 - Flags: review?(aki) → review+
Comment on attachment 803294 [details] [diff] [review]
androidx86.mozharness.diff

Landed so we can see nicer results on Cedar.
Any follow-up feedback or fixes will come in a new patch to address them separately.
Attachment #803294 - Flags: checked-in+
(In reply to Aki Sasaki [:aki] from comment #157)
> Comment on attachment 803295 [details] [diff] [review]
> androidx86.configs.diff
> 
> Why TEMP ?
I can leave it as it was to match the naming of other dictionaries in the file.
I have a pet peeve about naming things very similarly to other variables.

Summary:
> * m-[1-8] are running green [1][2]
> * xpcshell is running green [3]
> * mochitest-gl is running but it fails [3]
** gbrown to look into it
> * robocop-{1,2} are running but 2 tests fail [3]
* reftests, crashtest and jsreftests can run but FAIL [4][5]
** gbrown to look into it
* once mozharness is merged to production, we will be able to see the x86 jobs run as I mention in this summary
* I would like to wait until Cedar is green before we enable them across the board
** blassey works for you? (Or add voluntold to go to every tree to hide/show jobs as they green out)

> [1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700133&tree=Ash&full=1
> [2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27700349&tree=Ash&full=1
> [3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27705576&tree=Ash&full=1
> 
[4] https://tbpl.mozilla.org/php/getParsedLog.php?id=27726116&tree=Ash&full=1
[5] https://tbpl.mozilla.org/php/getParsedLog.php?id=27726080&tree=Ash&full=1
Flags: needinfo?(blassey.bugs)
Whiteboard: [reit-x86] → [reit-x86] summary in comment 161
voluntold??

do you have any gut feeling for how long it'll take to green cedar up?
Flags: needinfo?(blassey.bugs)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #162)
> voluntold??
>
Volunteer + told :P 
nvm. ignore me :)

> do you have any gut feeling for how long it'll take to green cedar up?

I can only get things to the state I described in comment 161.
I will work *today/tomorrow* towards getting Cedar into that state (merge mozharness to production + trigger new set of builds)
mochitest-gl, reftests, crashtest and jsreftests are as of now out of my hands.
Merged to production and triggered new set of builds on Cedar.

Results to be found in here:
https://tbpl.mozilla.org/?tree=Cedar&jobname=Android 4.2 x86&rev=e6f8b77a8824
Comment on attachment 803294 [details] [diff] [review]
androidx86.mozharness.diff

Review of attachment 803294 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine.

I am investigating the remaining test failures. We may need to add some chunks or introduce another x86-specific manifest, but other than that, I don't expect more harness changes.
Attachment #803294 - Flags: feedback?(gbrown) → feedback+
For some odd reason, Cedar is not reporting the results that I expected. I will look into it while gbrown looks into the issues that I reported earlier.

I hope to have some fixes landed before EOD tomorrow.
https://tbpl.mozilla.org/php/getParsedLog.php?id=27772974&tree=Ash&full=1#error49 has a crash dump (good) but it lacks symbols (no file names, line numbers -- bad). I am not sure where this is going wrong. I think the harness should be invoking "minidump_stackwalk <dmp file> <symbols dir>". I have a bad feeling that we are passing crashreporter-symbols.zip as <symbols dir>, instead of unpacking crashreporter-symbols.zip to a directory and passing that directory name to minidump_stackwalk.
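
A minimal sketch of the fix I have in mind, assuming the suspicion above is right: unpack the zip to a directory first, then pass that directory as <symbols dir>. The helper names (prepare_symbols_dir, stackwalk_command) are hypothetical and not part of any harness; only the "minidump_stackwalk <dmp file> <symbols dir>" invocation comes from the comment above.

```python
import os
import tempfile
import zipfile


def prepare_symbols_dir(symbols_path):
    """Return a directory usable as <symbols dir>, unpacking a zip if needed."""
    if os.path.isdir(symbols_path):
        return symbols_path
    if zipfile.is_zipfile(symbols_path):
        # Unpack into a fresh temporary directory; the caller is
        # responsible for removing it afterwards.
        target = tempfile.mkdtemp(prefix="symbols-")
        with zipfile.ZipFile(symbols_path) as archive:
            archive.extractall(target)
        return target
    raise ValueError("not a directory or zip archive: %s" % symbols_path)


def stackwalk_command(stackwalk_binary, dump_path, symbols_path):
    """Build the argument list: minidump_stackwalk <dmp file> <symbols dir>."""
    return [stackwalk_binary, dump_path, prepare_symbols_dir(symbols_path)]
```

The returned list could be handed to subprocess; the point is only that the third argument is always a directory, never the zip itself.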
Attached patch add def worst_tbpl_status() (obsolete) — Splinter Review
Somewhere down the line I failed to bring this part of my code into the review.
Attachment #804437 - Flags: review?(aki)
Comment on attachment 804437 [details] [diff] [review]
add def worst_tbpl_status()

Hm, I use

    self.tbpl_status = self.worst_level(TBPL_WARNING, self.tbpl_status,
                                        levels=TBPL_WORST_LEVEL_TUPLE)

http://hg.mozilla.org/build/mozharness/file/a660ae1a633f/mozharness/mozilla/testing/unittest.py#l135

This seems to be dup code?  I'm fine with having this method, but maybe it should call self.worst_level().
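
Roughly what I mean, as a standalone sketch rather than the actual mozharness code: the status names, their worst-to-best ordering, and this stand-in worst_level() are assumptions made for illustration.

```python
# Assumed status values for this sketch.
TBPL_SUCCESS = "SUCCESS"
TBPL_WARNING = "WARNING"
TBPL_FAILURE = "FAILURE"
TBPL_EXCEPTION = "EXCEPTION"
TBPL_RETRY = "RETRY"

# Assumed ordering, listed from worst to best.
TBPL_WORST_LEVEL_TUPLE = (TBPL_RETRY, TBPL_EXCEPTION, TBPL_FAILURE,
                          TBPL_WARNING, TBPL_SUCCESS)


def worst_level(target_level, existing_level, levels):
    """Stand-in: return whichever of the two levels comes first in levels."""
    if not existing_level:
        return target_level
    for level in levels:
        if level in (target_level, existing_level):
            return level
    raise ValueError("unknown levels: %r, %r" % (target_level, existing_level))


def worst_tbpl_status(*statuses):
    """Fold any number of statuses through worst_level(), not a second comparison."""
    worst = None
    for status in statuses:
        worst = worst_level(status, worst, levels=TBPL_WORST_LEVEL_TUPLE)
    return worst


print(worst_tbpl_status(TBPL_SUCCESS, TBPL_WARNING, TBPL_SUCCESS))
```

That way the ordering lives in one place and the new method is only a convenience wrapper.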
Comment on attachment 804437 [details] [diff] [review]
add def worst_tbpl_status()

Minusing for now, due to comment 169.
Attachment #804437 - Flags: review?(aki) → review-
Running on Ash. I will ask for review when I see them complete.
Attachment #804437 - Attachment is obsolete: true
I asked on IRC, but I am also asking here in case your day is over and you can't get back to me through IRC.

I'm in the process of having Cedar match comment 161. Probably before EOD.

gbrown, what would you prefer me to help with?
* bug 915870 - make sure that our funky builder naming works with trychooser
* help with the minidump issue - comment 167

On another note, do we need to split reftests even more? (currently running 10 chunks) or bump a timeout inside of the *test* harness? (not mozharness)
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #172)
> gbrown, what would you prefer me to help with?
> * bug 915870 - make sure that our funky builder naming works with trychooser
> * help with the minidump issue - comment 167

The minidump issue please.

> On another note, do we need to split reftests even more? (currently running
> 10 chunks) or bump a timeout inside of the *test* harness? (not mozharness)

I think we will need to split reftests more, but I'm not sure how much. We are also considering running with skia-gl disabled -- some discussion in bug 907351 -- so I am running some special tests to see how much of a difference that makes. I'll try to get back to you with a recommendation for # chunks by Monday morning.
Flags: needinfo?(gbrown)
Comment on attachment 804623 [details] [diff] [review]
tbpl's worst status - v2

It works!
Attachment #804623 - Flags: review?(aki)
This is wip.

The theory is that you can pass to --symbols-path either a path or a URL and the test harnesses take care of it.

I see that we have "download_symbols" set to "ondemand":
> 'download_symbols': 'ondemand'
which causes us to set self.symbols_path to be self.symbols_url

http://hg.mozilla.org/build/mozharness/file/production/mozharness/mozilla/testing/testbase.py#l250
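
As a rough sketch of that behaviour (not the actual testbase.py code): only the 'download_symbols': 'ondemand' setting and the URL-instead-of-path outcome come from what I see; the function names and the local-directory fallback are assumptions.

```python
from urllib.parse import urlparse


def resolve_symbols_path(config, symbols_url, local_symbols_dir):
    """Pick the value that would be passed to --symbols-path.

    Assumption: with 'ondemand' the URL is handed over as-is and the
    test harness downloads the symbols itself when a crash occurs.
    """
    if config.get("download_symbols") == "ondemand":
        return symbols_url
    # Assumed fallback for this sketch: symbols already unpacked locally.
    return local_symbols_dir


def looks_like_url(symbols_path):
    """Tell a URL apart from a filesystem path."""
    return urlparse(symbols_path).scheme in ("http", "https")


# Placeholder URL and directory, for illustration only.
path = resolve_symbols_path({"download_symbols": "ondemand"},
                            "http://example.invalid/crashreporter-symbols.zip",
                            "build/symbols")
print(path, looks_like_url(path))
```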

Let's see if we get a crash in https://tbpl.mozilla.org/?tree=Ash&jobname=Android%204.2%20x86&rev=82db508f2304
Attachment #804623 - Flags: review?(aki) → review+
Comment on attachment 804623 [details] [diff] [review]
tbpl's worst status - v2

https://hg.mozilla.org/build/mozharness/rev/9ef0a3e99b55

I will re-trigger the jobs in here and we should see what I mentioned on comment 161 (in an hour from now):
https://tbpl.mozilla.org/?tree=Cedar&jobname=Android%204.2%20x86&rev=e6f8b77a8824
Attachment #804623 - Flags: checked-in+
I see the symbols-path set to a URL but I don't see that the output is any different.

gbrown, is there anything else left to be set before this should work?
https://tbpl.mozilla.org/php/getParsedLog.php?id=27848103&tree=Ash&full=1#error0

I do see this though:
14:21:49     INFO - mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/ash-android-x86/1379011131/fennec-26.0a1.en-US.android-i386.crashreporter-symbols.zip
http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#73

I will look at mozcrash's code on Monday.
Flags: needinfo?(gbrown)
Cedar is looking good :)

Comment 161 *might* need refreshing on Monday since I can already see that mochitest-3 is crashing when it didn't use to.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #177)
> I see the symbols-path set to a URL but I don't see that the output is any
> different.

That looks like it should work.

:ted -- Can you see what is going wrong here? As shown in Comment 177, mozcrash reports that it downloads symbols, but I do not see symbols in the resulting crash dump.
Flags: needinfo?(gbrown) → needinfo?(ted)
Depends on: 916657
(In reply to Geoff Brown [:gbrown] from comment #173)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> > On another note, do we need to split reftests even more? (currently running
> > 10 chunks) or bump a timeout inside of the *test* harness? (not mozharness)
> 
> I think we will need to split reftests more, but I'm not sure how much. We
> are also considering running with skia-gl disabled -- some discussion in bug
> 907351 -- so I am running some special tests to see how much of a difference
> that makes. I'll try to get back to you with a recommendation for # chunks
> by Monday morning.

Try:

crashtests -- 3 chunks
js-reftests -- 6 chunks

The plain-reftest situation is pretty bad. I think we need 20+ chunks for green runs -- we may want to wait for progress on bug 916657.
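
For reference, a minimal sketch of how --total-chunks / --this-chunk could partition a test list. It assumes an even split by test count; the real harness may partition differently (for example by directory), and chunk_bounds is a hypothetical helper.

```python
def chunk_bounds(total_tests, total_chunks, this_chunk):
    """Return (start, end) slice indices for a 1-based chunk number.

    Tests are split as evenly as possible by count; the first
    `total_tests % total_chunks` chunks get one extra test.
    """
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")
    base, extra = divmod(total_tests, total_chunks)
    start = (this_chunk - 1) * base + min(this_chunk - 1, extra)
    end = start + base + (1 if this_chunk <= extra else 0)
    return start, end


tests = ["test-%d" % i for i in range(10)]
for chunk in (1, 2, 3):
    start, end = chunk_bounds(len(tests), 3, chunk)
    print(chunk, tests[start:end])
```

An even split by count does not equalize run time, which is one reason the right number of chunks has to be found by experiment.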
Attachment #805282 - Flags: review?(aki)
Carrying forward the part of the patch which adds the tests across the board.
For now, this patch is on hold until we get everything green on Cedar.
Attachment #803295 - Attachment is obsolete: true
Attachment #805288 - Flags: review+
Summary (these results are from Cedar):

Running well:
* m-1, m-2 and m-4 are green; m-3 crashes [1]
* m-[4-8] are green [2]
* xpcshell is running green [4]

Suites needing attention:
* mochitest-gl is crashing [4]
** gbrown to look into it
* robocop-{1,2} are running but 2 tests fail [4]
* reftests, crashtest and jsreftests can run but FAIL/timeout [3][5]
** more crashtest and jsreftest chunking will happen today/tomorrow
** reftests will need more investigation - bug 916657

Others:
* minidumps are fixed
* crash symbols are not giving source code lines
** waiting on "needinfo" for ted
* adjust trychooser to handle x86 "sets" of suites approach
** bug 915870

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27851558&tree=Cedar&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27848601&tree=Cedar&full=1
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850566&tree=Cedar&full=1
[4] https://tbpl.mozilla.org/php/getParsedLog.php?id=27849762&tree=Cedar&full=1
[5] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850902&tree=Cedar&full=1
Whiteboard: [reit-x86] summary in comment 161 → [reit-x86] summary in comment 184
I know that I've said that we should wait until everything is green on Cedar. However, after seeing the bug filed for reftests, I wonder if we should go out the door with whatever is green and leave reftests running only on Cedar and Ash.

What do you think?
Attachment #805282 - Flags: review?(aki) → review+
This isn't a symbols issue, this dump looks completely broken. If you can grab one of these minidumps off of a slave and attach it here or somewhere else we can poke at it.
Flags: needinfo?(ted)
Attachment #805281 - Flags: review?(gbrown) → review+
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #185)
> I know that I've said that we should wait until everything is green on
> Cedar, however, after seeing the bug filed for reftests I wonder if we
> should go out the door with whatever is green and leave reftests only
> running on Cedar and Ash.

I would prefer that. If the reftest issue takes a while to sort out, we risk losing today's greens over time.
Depends on: 916923
Attachment #805282 - Attachment is obsolete: true
Attachment #805288 - Attachment is obsolete: true
Attachment #805514 - Flags: review?(aki)
Comment on attachment 805281 [details] [diff] [review]
chunk jsreftest into 6 chunks and crashtest into 3

https://hg.mozilla.org/build/mozharness/rev/5eca80d07e33
Attachment #805281 - Flags: checked-in+
Attachment #804696 - Flags: review?(aki)
Attachment #804696 - Flags: review?(aki) → review+
Comment on attachment 805514 [details] [diff] [review]
enable sets 1 & 2 - move failing suites to sets 3 to 8 on Ash and Cedar

Thanks for fixing the dict spacing!  That was bugging me.
Attachment #805514 - Flags: review?(aki) → review+
Comment on attachment 805514 [details] [diff] [review]
enable sets 1 & 2 - move failing suites to sets 3 to 8 on Ash and Cedar

https://hg.mozilla.org/build/buildbot-configs/rev/e5177c27ce46
Attachment #805514 - Flags: checked-in+
Summary (these results are from Cedar):

Coming up:
* we're enabling sets 1 and 2 across the board (whenever we have a reconfig)
** m-{1,2,4,5,6,7,8} and xpcshell (not m-3)
* the remaining suites will run on Cedar and Ash
** as suites get fixed on Cedar we will move them to all the other branches
* landed a fix to download symbols correctly

Running well:
* m-1, m-2 and m-4 are green; m-3 crashes [1]
* m-[4-8] are green [2]
* xpcshell is running green [4]

Suites needing attention:
* mochitest-gl is crashing [4]
** gbrown to look into it
* robocop-{1,2} are running but 2 tests fail [4]
* reftests, crashtest and jsreftests can run but FAIL/timeout [3][5]
** more crashtest and jsreftest chunking will happen today/tomorrow
** reftests will need more investigation - bug 916657

Others:
* crash symbols are not giving source code lines
** ted and gbrown investigating in bug 916923
* bug 915870 - adjust trychooser to handle x86 "sets" of suites approach

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27851558&tree=Cedar&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=27848601&tree=Cedar&full=1
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850566&tree=Cedar&full=1
[4] https://tbpl.mozilla.org/php/getParsedLog.php?id=27849762&tree=Cedar&full=1
[5] https://tbpl.mozilla.org/php/getParsedLog.php?id=27850902&tree=Cedar&full=1
Whiteboard: [reit-x86] summary in comment 184 → [reit-x86] summary in comment 193
Depends on: 917053
This should be in production.
We now have sets 1 & 2 running everywhere:
https://tbpl.mozilla.org/?jobname=Android%204.2%20x86&rev=1d27c4c9871f

It seems like splitting crashtest into 3 chunks made them run green.
Should we get those out to other branches?

The same with jsreftests, splitting them into 6 made them run green (except jsreftest-5).
Would you want to fix #5 before moving them to other branches?

robocop-2 is not failing any tests. robocop-1 is only failing 2 tests.
(In reply to Ed Morley [:edmorley UTC+1] from comment #196)
> We don't have proper crash stacks (bug )

Bug 916923
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #195)
> We now have sets 1 & 2 running everywhere:

Awesome!
 
> It seems like splitting crashtest into 3 chunks made them run green.
> Should we get those out to other branches?

I think so.
 
> The same with jsreftests, splitting them into 6 made them run green (except
> jsreftest-5).
> Would you want to fix #5 before moving them to other branches?
 
I think so -- I'll be looking at jsreftest-5 more closely today.

> robocop-2 is not failing any tests. robocop-1 is only failing 2 tests.

There are some non-x86 robocop patches landing today that may help.

There is a new patch landing in bug 913627 which I hope will green up M3.

I hope to have a patch up for review in bug 917053 today to fix M-gl.
No longer depends on: 916923
No longer depends on: 917053
No longer depends on: 916657
It's rather cumbersome to add green test suites to tbpl if they have to be hidden right away (due to tbpl's per-branch nature as well as waiting for them to be scheduled first).
Could we try fixing them on Cedar for this week and see what is ready for next week?
Could we use bug 891959 for further status updates as well as adding more tests suites to tbpl?

FYI, my biggest focus will be bug 915870.
I will not be able to look at bug 891959 before next week. Is there anyone besides gbrown that would be interested to give a hand with it?

FTR, I need a day or two to meet some Summit Preparation deadlines that are happening this week.
Flags: needinfo?(gbrown)
Flags: needinfo?(blassey.bugs)
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #199) 
> FYI, my biggest focus will be bug 915870.
> I will not be able to look at bug 891959 before next week. Is there anyone
> besides gbrown that would be interested to give a hand with it?
> 
I meant bug 917361 (make it easy for a dev to run the Android x86 test jobs).
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #199)
> Could we try fixing them on Cedar for this week and see what is ready for
> next week?

That seems reasonable. With patches on the go, I expect we can turn everything green on Cedar this week, except for plain reftests.

> Could we use bug 891959 for further status updates as well as adding more
> tests suites to tbpl?

OK.
Flags: needinfo?(gbrown)
sounds good
Flags: needinfo?(blassey.bugs)
I have triggered a new set of builds on Cedar where some changes that gbrown landed on m-i will be integrated.

A new summary will be given in bug 891959 once those builds complete.

This week the focus will be on bug 915870 for the try support.
Next week we will add whatever we green out this week.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: General Automation → General