Closed
Bug 777440
Opened 12 years ago
Closed 12 years ago
Estimate releng work for supporting 2 pass linking
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: hwine)
References
Details
(Whiteboard: [mobile] [releases])
Attachments
(1 file, 3 obsolete files)
The blog entry referenced in bug 748488 comment #0 outlines a process to build fennec, run that build on a panda, then relink based on perf data for the "official" binary. Hal had to deal with this at a prior company, where "2 pass compile" became a dirty word, so he will help us avoid that.
Note that some supporting work will need to be done by ateam and likely requires panda boards.
Initial desire is to have this integrated into all nightly & release builds.
Comment 1•12 years ago
There is some prior art for this already:
- Windows and Linux PGO builds do a two-pass compile/link, with a performance run done between the 1st and 2nd pass
- OSX builds do a two-pass compile/link, one for i686 and one for x86_64
Comment 2•12 years ago (Assignee)
Yes, Armen had pointed to the prior art. The difference (which was the sticking point at my prior job) is that the connection to/from the tegra is a lot less reliable than that to a desktop OS machine.
Based on prior experience, the existing timeout and retry logic may need to be augmented for this case. Or, maybe panda boards will be as reliable as a desktop os machine. I'm a dreamer!
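For illustration only, here is a minimal sketch of the kind of retry wrapper that could augment the existing logic. The `retry` helper, the attempt counts, and the example `adb push` line are all assumptions and not part of any existing releng tooling.

```shell
#!/bin/bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        # Run the command exactly as given; stop at the first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Assumed usage: bound each attempt with coreutils `timeout` so that a hung
# device connection cannot stall the build indefinitely.
# retry 5 10 timeout 120 adb push libxul.so /sdcard/
```

Combining a per-attempt timeout with a bounded number of attempts keeps the worst-case wall-clock time predictable, which matters if this ends up in nightly and release builds.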
Comment 3•12 years ago (Assignee)
Next step is to work with :gbrown to see if the preferred "phone-like device" (QEMU) will work for this process.
Comment 4•12 years ago
Here's a first attempt at demonstrating the 2-phase build process using the Android emulator. I think it could be easily adapted for use on a real device.
On the emulator, the set-up appears to work, but I am having trouble running fennec-via-valgrind. I see this in logcat:
08-28 22:50:50.570 I/ActivityManager( 77): START {flg=0x10000000 cmp=org.mozilla.fennec_mozdev/.App (has extras)} from pid 563
08-28 22:50:50.610 D/PermissionCache( 36): checking android.permission.READ_FRAME_BUFFER for uid=1000 => granted (1051 us)
08-28 22:50:50.621 W/WindowManager( 77): Failure taking screenshot for (120x180) to layer 21000
08-28 22:50:50.661 D/AndroidRuntime( 563): Shutting down VM
08-28 22:50:50.671 D/dalvikvm( 563): GC_CONCURRENT freed 98K, 77% free 482K/2048K, paused 1ms+1ms
08-28 22:50:50.700 I/AndroidRuntime( 563): NOTE: attach of thread 'Binder Thread #3' failed
08-28 22:50:50.756 D/dalvikvm( 574): Not late-enabling CheckJNI (already on)
08-28 22:50:50.802 I/dalvikvm( 574): Exec: /system/bin/sh -c /data/local/start_valgrind_fennec /system/bin/app_process /system/bin --application '--nice-name=org.mozilla.fennec_mozdev' com.android.internal.os.WrapperInit 29 14 'android.app.ActivityThread'
08-28 22:50:50.831 D/Zygote ( 37): Process 574 exited cleanly (2)
08-28 22:50:50.831 W/Zygote ( 37): Error reading pid from wrapped process, child may have died
08-28 22:50:50.831 W/Zygote ( 37): java.io.EOFException
08-28 22:50:50.831 W/Zygote ( 37): at libcore.io.Streams.readFully(Streams.java:83)
08-28 22:50:50.831 W/Zygote ( 37): at java.io.DataInputStream.readInt(DataInputStream.java:124)
08-28 22:50:50.831 W/Zygote ( 37): at com.android.internal.os.ZygoteConnection.handleParentProc(ZygoteConnection.java:908)
08-28 22:50:50.831 W/Zygote ( 37): at com.android.internal.os.ZygoteConnection.runOnce(ZygoteConnection.java:258)
08-28 22:50:50.831 W/Zygote ( 37): at com.android.internal.os.ZygoteInit.runSelectLoopMode(ZygoteInit.java:649)
08-28 22:50:50.831 W/Zygote ( 37): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:546)
08-28 22:50:50.831 W/Zygote ( 37): at dalvik.system.NativeStart.main(Native Method)
08-28 22:50:50.831 I/ActivityManager( 77): Start proc org.mozilla.fennec_mozdev for activity org.mozilla.fennec_mozdev/.App: pid=574 uid=10040 gids={3003, 1015, 1006}
Comment 5•12 years ago
Also, if I disable valgrind (setprop wrap.mozilla.fennec_mozdev '') and start Fennec directly in the emulator, it briefly displays about:home, then dies with:
I/Gecko ( 649): An error occurred earlier while querying gfx info: eglChooseConfig returned zero OpenGL ES2 configs. Maybe this device does not support OpenGL ES2?.
I/Gecko ( 649): ###!!! ABORT: OpenGL-accelerated layers are a hard requirement on this platform. Cannot continue without support for them.: file /home/mozdev/src/widget/xpwidgets/nsBaseWidget.cpp, line 808
F/libc ( 649): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
There are notes on enabling OpenGL ES2 emulator support here: http://developer.android.com/tools/devices/emulator.html and http://android-developers.blogspot.ca/2012/04/faster-emulator-with-better-hardware.html
Comment 6•12 years ago
...but when I create and use an AVD with hw.gpu.enabled=yes and pass '-gpu on' on the command line, it still fails, with only a slightly different error:
I/Gecko ( 536): An error occurred earlier while querying gfx info: eglCreatePbufferSurface failed (EGL error 3009).
I/Gecko ( 536): ###!!! ABORT: OpenGL-accelerated layers are a hard requirement on this platform. Cannot continue without support for them.: file /home/mozdev/src/widget/xpwidgets/nsBaseWidget.cpp, line 808
F/libc ( 536): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 552 (Gecko)
Comment 7•12 years ago
I also tried *only* enabling "GPU Emulation" in the AVD and, in another run, only passing '-gpu on' on the command line -- no change. Other people, notably :kats, report having no such problem with OpenGL in the emulator.
Apparently GPU emulation involves passing calls to the host, so there may be some incompatibility with my Ubuntu environment.
To move forward, I have adapted my script to work with a device and am now trying to run on a Galaxy Nexus and Pandaboard. This resolves the OpenGL problems, of course, but I am still having trouble executing valgrind. I have found some problems with the start_valgrind_fennec script -- trying to work around those now....
Comment 8•12 years ago (Assignee)
Geoff - what do you need to unblock QEMU work? Using an emulator is still the best option for this. We can provide loaner hardware on the VPN running CentOS (which we use in production) if you have a rough spec on graphics card needs.
Comment 9•12 years ago
Note for releng - not all of our build hosts will have OpenGL available. Not sure if that's going to be a blocker or not.
Comment 10•12 years ago
(In reply to Chris AtLee [:catlee] from comment #9)
> Note for releng - not all of our build hosts will have OpenGL available. Not
> sure if that's going to be a blocker or not.
I think that's a significant cause for concern. As I understand it, without OpenGL on the host, we cannot run Fennec in the emulator...therefore cannot collect the usage logs we need for this procedure. My experience suggests that even when OpenGL is available on the host, the emulator may not recognize it. If we have a variety of build host configurations / capabilities, the emulator approach may not be viable.
Comment 11•12 years ago (Assignee)
Geoff - we can standardize some dev boxes much, MUCH, easier than we can attach real hardware :)
As you untangle the requirements, just mention them here - we'll scream if it's an issue.
Comment 12•12 years ago
Following the experience related in comment 7, I tried running on the Galaxy Nexus -- no sign of GPU emulation problems there, but I ran out of memory running valgrind (I did not configure swap).
I had been running the emulator in Ubuntu, in a VM on a Macbook Pro. I tried installing all sorts of OpenGL/EGL packages in Ubuntu, but the error shown in comment 6 persisted. On a hunch that perhaps the VM was confusing the emulator's GPU emulation, I tried installing the Android SDK on the Macbook Pro directly -- MacOSX 10.6.8 -- and running the emulator there, with a Fennec APK built on Ubuntu. Yahoo - it works: no more OpenGL/EGL errors! (And simply passing "-gpu on" on the emulator command line was sufficient.)
I had minor problems with memory management. valgrind needs lots of memory and seems to run into problems if the emulator -memory argument is less than about 1GB. On the other hand, if the emulator -memory argument is too large (say 2GB), the emulator fails to start, complaining about failed allocations. I settled on -memory 1200 and haven't seen any memory problems since.
I had additional problems with process management: We need to be careful that we give Fennec and valgrind enough time to start up so that we can collect meaningful logs. I observed start up times of about 90 seconds typically; for good measure, I am now allowing 180 seconds after each launch. To shut down Fennec after this period, I settled on using the "am force-stop" command -- it seems very effective; I also use "am kill"...just to be sure.
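The launch, wait, and stop cycle described above might be sketched as follows. This is not the attached script: the function names and the `PACKAGE` and `STARTUP_WAIT` variables are hypothetical, the package and activity names are taken from the logcat output in comment 4, and the sketch assumes the valgrind wrapper property has already been set on the emulator.

```shell
#!/bin/bash
# Assumed values; the package name is the one seen in the logcat output.
PACKAGE=${PACKAGE:-org.mozilla.fennec_mozdev}
STARTUP_WAIT=${STARTUP_WAIT:-180}   # seconds to let Fennec + valgrind run

# run_profiling_pass: launch Fennec, wait for start-up, then stop it.
run_profiling_pass() {
    adb -e shell am start -n "$PACKAGE/.App" || return 1
    sleep "$STARTUP_WAIT"
    adb -e shell am force-stop "$PACKAGE"
    # "am kill" only kills processes that are safe to kill; kept as a
    # second, belt-and-braces step after force-stop.
    adb -e shell am kill "$PACKAGE"
}

# run_profiling_passes N: repeat the pass N times.
run_profiling_passes() {
    local i
    for i in $(seq 1 "$1"); do
        echo "profiling run $i of $1"
        run_profiling_pass || return 1
    done
}
```

Calling `run_profiling_passes 3` would correspond to the three runs mentioned later in this comment. Keeping the wait in a variable makes it easy to tune once real start-up times on the build hosts are known.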
** There is one remaining problem: the Android screen-lock/keyguard. When the emulator starts, the keyguard is enabled. If a script launches Fennec via an intent, with the keyguard enabled, Fennec starts but does not complete its normal start-up sequence. In this case, we get valgrind logs, but they are short and typically do not contain libxul references. On the other hand, if the keyguard is disabled, Fennec starts normally and valgrind logs contain libxul references. I thought I would automatically disable the keyguard with the watcher - as we do on the tegras - but I have not been able to get this to work yet.
Logs from a successful run - Android emulator running on MacOSX with GPU Emulation, -memory 1200, keyguard disabled manually - are available at http://people.mozilla.org/~gbrown/bug777440/logs.tar.gz.
Running through the whole process -- start emulator, re-package Fennec, install Fennec, install valgrind, run 3 times with waits, process logs, re-build libxul, re-package Fennec -- takes about 30 minutes.
Comment 13•12 years ago
Attachment #656457 -
Attachment is obsolete: true
Comment 14•12 years ago (Assignee)
(In reply to Chris AtLee [:catlee] from comment #9)
> Note for releng - not all of our build hosts will have OpenGL available. Not
> sure if that's going to be a blocker or not.
Just leaving a bread crumb for the future - atm, I don't think it's worth investigating. It is possible to install some versions of OpenGL without graphics cards - see http://bergbom.blogspot.com/2011/04/off-screen-rendering-without-graphics.html for example.
Comment 15•12 years ago
(In reply to Geoff Brown [:gbrown] from comment #12)
> ** There is one remaining problem: the Android screen-lock/keyguard. When
> the emulator starts, the keyguard is enabled. If a script launches Fennec
> via an intent, with the keyguard enabled, Fennec starts but does not
> complete its normal start-up sequence. In this case, we get valgrind logs,
> but they are short and typically do not contain libxul references. On the
> other hand, if the keyguard is disabled, Fennec starts normally and valgrind
> logs contain libxul references. I thought I would automatically disable the
> keyguard with the watcher - as we do on the tegras - but I have not been
> able to get this to work yet.
The watcher does not auto-start because the BOOT_COMPLETED intent is not automatically broadcast on the emulator. BUT we can manually send that intent and it successfully launches the watcher, clearing the keyguard (and starting the sut agent):
adb -e shell am broadcast -a android.intent.action.BOOT_COMPLETED -c android.intent.category.HOME -n com.mozilla.watcher/.WatcherReceiver
See also: http://stackoverflow.com/questions/9241667/how-to-reboot-emulator-to-test-action-boot-completed
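A script would probably want to send that broadcast only once the emulator has finished booting. A rough sketch, assuming the `sys.boot_completed` system property is set by the emulator image in use (not verified here) and using a hypothetical `start_watcher` helper:

```shell
#!/bin/bash
# start_watcher TIMEOUT_SECONDS
# Hypothetical helper: waits for the emulator to report that boot has
# finished, then sends the BOOT_COMPLETED broadcast quoted above.
start_watcher() {
    local timeout=$1 waited=0 booted=""
    adb -e wait-for-device
    while [ "$waited" -lt "$timeout" ]; do
        # adb shell output may carry CR line endings; strip them.
        booted=$(adb -e shell getprop sys.boot_completed | tr -d '\r')
        [ "$booted" = "1" ] && break
        sleep 5
        waited=$((waited + 5))
    done
    if [ "$booted" != "1" ]; then
        echo "emulator did not finish booting within ${timeout}s" >&2
        return 1
    fi
    adb -e shell am broadcast -a android.intent.action.BOOT_COMPLETED \
        -c android.intent.category.HOME -n com.mozilla.watcher/.WatcherReceiver
}
```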
Comment 16•12 years ago
Updated to install and start the Watcher, to disable the keyguard.
This would be a great time for others to have a look at and/or try the script and provide feedback.
Hal -- can you take over now? Anything else I can do?
Attachment #659386 -
Attachment is obsolete: true
Attachment #659836 -
Flags: feedback?(hwine)
Comment 17•12 years ago (Assignee)
Geoff - thanks for your work on showing it works with the emulator!
I'll coordinate from here on. Thanks!
Comment 18•12 years ago
Comment on attachment 659836 [details]
prototype shell script demonstrating http://glandium.org/blog/?p=2467 on android emulator
># start emulator
>echo starting emulator...
># create an SD card image large enough to hold libxul.so
>mksdcard 512M $ANDROID_IMAGES/sdcard.img
>sh -c "emulator -kernel $ANDROID_IMAGES/kernel-qemu -sysdir $ANDROID_IMAGES -data $ANDROID_IMAGES/userdata.img -sdcard $ANDROID_IMAGES/sdcard.img -memory $EMULATOR_MEMORY -gpu on -qemu -cpu 'cortex-a8' &"
...
># wait for the sdcard to be mounted
>sleep 30
Ok, I have to impart some "working with tegra" learned knowledge on you. Explicit sleeps are *sometimes* necessary, but every time you add one you are going to get *some* failures because you didn't sleep long enough, and every time you increase the sleep you're also going to waste a lot of time overall, because the task[s] will often complete well before your sleep time runs out.
Mutually Exclusive Suggestions:
* Register for and Wait for the Android MOUNTED intent to get sent
* sleep 5 seconds at a time, and loop manually checking if the sdcard is indeed mounted (and not read-only)
* Find a way to start the emulator in an "sdcard is already attached" mode, and use adb to install/write to the sdcard manually for the profiling.
* Use SUTAgent [which we're trying to properly set up to need a mounted/working sdcard]
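As one illustration of the second suggestion (poll in short steps instead of a fixed sleep), something like the following could replace the `sleep 30`. The `wait_for_sdcard` helper, the `/mnt/sdcard` mount point, and the `SDCARD_OK` marker are assumptions made for this sketch.

```shell
#!/bin/bash
# Assumed mount point; adjust for the emulator image in use.
SDCARD=${SDCARD:-/mnt/sdcard}

# wait_for_sdcard TIMEOUT_SECONDS [POLL_SECONDS]
# Hypothetical helper: polls until a file can be written to the SD card,
# i.e. it is mounted and not read-only.
wait_for_sdcard() {
    local timeout=$1 poll=${2:-5} waited=0 out
    while [ "$waited" -lt "$timeout" ]; do
        # Older adb versions do not propagate the remote exit status,
        # so echo a marker on success and look for it in the output.
        out=$(adb -e shell "echo x > $SDCARD/.probe && rm $SDCARD/.probe && echo SDCARD_OK" | tr -d '\r')
        case "$out" in
            *SDCARD_OK*) return 0 ;;
        esac
        sleep "$poll"
        waited=$((waited + poll))
    done
    echo "sdcard not writable after ${timeout}s" >&2
    return 1
}
```

With `wait_for_sdcard 120 5` the script proceeds as soon as the card is usable and fails loudly after two minutes, rather than always paying the full sleep and occasionally still being too early.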
Comment 19•12 years ago (Assignee)
Comment on attachment 659836 [details]
prototype shell script demonstrating http://glandium.org/blog/?p=2467 on android emulator
Already communicated, but I'll repeat myself to avoid the bz nagbot
This is great work, and gives us a working example to move forward with.
Attachment #659836 -
Flags: feedback?(hwine) → feedback+
Comment 20•12 years ago (Assignee)
Estimate of effort to implement is 2 weeks.
Comment 21•12 years ago (Assignee)
The bulk of the 2 week estimate is to set up the emulator for a machine with no graphics chip, and to get that into the builder setup. The remainder is for the normal staging runs, config modifications, etc.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
See Also: → android-pgo-ARMv7
Comment 22•12 years ago
Note these important updates:
- MOZ_ENABLE_SZIP=1 added to final package command line
- SD card size increased to 1GB
- adding the Android SDK tools/lib to LD_LIBRARY_PATH may help the emulator to start correctly
Attachment #659836 -
Attachment is obsolete: true
Updated•11 years ago
Product: mozilla.org → Release Engineering