Closed Bug 1127952 Opened 9 years ago Closed 9 years ago

[raptor] emulator_launch_test intermittent hang after killing the launched app

Categories

(Firefox OS Graveyard :: Gaia::PerformanceTest, defect)

Platform: ARM, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rwood, Assigned: rwood)

Details

[Test] Starting run 3
  mozdevice:command [Executing] ANDROID_SERIAL=emulator-5554 adb shell 'echo "tap 200 142 1 1" > /data/local/tmp/orng-cmd' +6s
  mozdevice:command [Executing] ANDROID_SERIAL=emulator-5554 adb shell '/data/local/orng /dev/input/event0 /data/local/tmp/orng-cmd' +406ms
  raptor:coldlaunch Received performance entry `navigationLoaded` +13s
  raptor:coldlaunch Received performance entry `navigationInteractive` +43ms
  raptor:coldlaunch Received performance entry `visuallyLoaded` +895ms
  raptor:coldlaunch Received performance entry `contentInteractive` +8ms
  raptor:coldlaunch Received performance entry `fullyLoaded` +66ms
  mozdevice:logging Writing to log +7s
  mozdevice:command [Executing] ANDROID_SERIAL=emulator-5554 adb shell 'log -p i -t GeckoConsole "Memory Entry: Clock|$(b2g-info | grep "Clock")"' +0ms
[Test] Run 3 complete
  mozdevice:util Killing process 1922 +1s
  mozdevice:command [Executing] ANDROID_SERIAL=emulator-5554 adb shell 'kill 1922' +0ms

The Raptor framework hangs at this point. This happens quite often, after a varying number of iterations of the emulator_launch_test. It is probably a timing issue: from a first inspection, the framework appears to start the next test iteration before the home screen has reappeared after the app is killed, but this needs further verification.

I am not able to reproduce the issue on a device (using the corresponding launch_test).
Note from IRC:

Eli> rwood: hey, have you tried messing with this settimeout for that new bug? https://github.com/mozilla-b2g/raptor/blob/master/lib/suite/cold-launch.js#L236
(In reply to Robert Wood [:rwood] from comment #1)
> Note from IRC:
> 
> Eli> rwood: hey, have you tried messing with this settimeout for that new
> bug?
> https://github.com/mozilla-b2g/raptor/blob/master/lib/suite/cold-launch.js#L236

Adjusting that setTimeout doesn't make a difference. This bug is breaking the automation and must be fixed ASAP.
Investigating further, I notice in logcat that when the stall happens, the appLaunch mark/event is received only after the app has already been killed. The appLaunch mark does not appear at the time of launch, even though the app launches successfully; instead it arrives after the fullyLoaded mark, and the framework then stalls completely.

I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|navigationLoaded|4182.950792|0|1423770836622 
I/Clock   ( 3507):     at logEntry/< (app://clock.gaiamobile.org/js/startup.js:72:8)
I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|navigationInteractive|4607.441052|0|1423770837047 
I/Clock   ( 3507):     at logEntry/< (app://clock.gaiamobile.org/js/startup.js:72:8)

I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|visuallyLoaded|5461.177425|0|1423770837901 
I/Clock   ( 3507):     at logEntry/< (app://clock.gaiamobile.org/js/startup.js:72:8)
I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|contentInteractive|5462.957367|0|1423770837902 
I/Clock   ( 3507):     at logEntry/< (app://clock.gaiamobile.org/js/startup.js:72:8)
I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|fullyLoaded|5464.352303|0|1423770837904 
I/Clock   ( 3507):     at logEntry/< (app://clock.gaiamobile.org/js/startup.js:72:8)
F/libc    ( 3575): Fatal signal 13 (SIGPIPE) at 0x00000df7 (code=0)
I/DEBUG   ( 3533): debuggerd committing suicide to free the zombie!
I/GeckoConsole( 3579): Memory Entry: Clock| 
I/DEBUG   ( 3578): debuggerd: Jan 30 2015 04:43:45
I/Gecko   ( 2783): 
I/Gecko   ( 2783): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
I/Gecko   ( 2783): 
I/Gecko   ( 2783): [Parent 2783] WARNING: pipe error (121): Connection reset by peer: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 456
I/Homescreen( 2887): Content JS LOG: Performance Entry: mark|appLaunch@Clock|145715.70824|0|1423770831702 
I/Homescreen( 2887):     at logEntry/< (app://verticalhome.gaiamobile.org/shared/js/usertiming.js:38:8)
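To make the ordering problem in the logcat excerpt concrete, here is a minimal JavaScript sketch that parses the `Performance Entry` lines and re-sorts them by their last field. It assumes the pipe-separated fields are entry type, name, start time, duration, and an epoch timestamp in milliseconds; those field names and the helpers `parseEntries` and `sortByEpoch` are illustrative only, not part of raptor. Sorted this way, the `appLaunch@Clock` mark (epoch 1423770831702) lands about 4.9 s before `navigationLoaded` (1423770836622), which suggests it was recorded at launch but reached the log late.

```javascript
// Hypothetical parser for logcat "Performance Entry" lines, assuming the
// layout entryType|name|startTime|duration|epoch (epoch in milliseconds).
const ENTRY_RE = /Performance Entry: (\w+)\|([^|]+)\|([\d.]+)\|([\d.]+)\|(\d+)/;

function parseEntries(logcat) {
  const entries = [];
  for (const line of logcat.split("\n")) {
    const m = ENTRY_RE.exec(line);
    if (!m) continue; // skip stack-trace lines and unrelated output
    entries.push({
      entryType: m[1],
      name: m[2],
      startTime: Number(m[3]),
      duration: Number(m[4]),
      epoch: Number(m[5]),
    });
  }
  return entries;
}

// Order entries by when they were recorded (epoch), not by when they were logged.
function sortByEpoch(entries) {
  return [...entries].sort((a, b) => a.epoch - b.epoch);
}

// Three lines taken from the logcat excerpt above, in their arrival order.
const sample = [
  "I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|navigationLoaded|4182.950792|0|1423770836622 ",
  "I/Clock   ( 3507): Content JS LOG: Performance Entry: mark|fullyLoaded|5464.352303|0|1423770837904 ",
  "I/Homescreen( 2887): Content JS LOG: Performance Entry: mark|appLaunch@Clock|145715.70824|0|1423770831702 ",
].join("\n");

const arrival = parseEntries(sample).map((e) => e.name);
const recorded = sortByEpoch(parseEntries(sample)).map((e) => e.name);
console.log(arrival);  // [ 'navigationLoaded', 'fullyLoaded', 'appLaunch@Clock' ]
console.log(recorded); // [ 'appLaunch@Clock', 'navigationLoaded', 'fullyLoaded' ]
```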
Eli, it looks like the perf marks sometimes arrive out of order when running on the emulator (see comment 3). Any ideas on how to solve this?
Flags: needinfo?(eperelman)
(In reply to Robert Wood [:rwood] from comment #4)
> Eli, looks like the perf marks get out of order sometimes when running on
> the emulator (see comment 3). Any ideas on how to solve this?

Good news: I increased my local Ubuntu VM from dual-core to quad-core and that fixed it. I was able to run the emulator_launch_test with RUNS=100 and it worked great!

Eli: Do you think it is fine to close this as 'wontfix', given that the emulator requires a minimum number of CPU cores to run at a reliable speed? This would also require upgrading my AWS instance. Or do you think this is an issue that needs to be fixed in raptor somehow?
Flags: needinfo?(eperelman)
I've been thinking about this, and even though it may appear to be fixed for now, I'm wondering if this is just an issue with using console.log. The user timing shim logs entries asynchronously, which could cause them to fire out of order. When bug 1129041 lands, it would be interesting to see whether this issue still manifests on lower-spec configurations.
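If entries really can arrive out of order because the shim logs asynchronously, one option is for the harness to stop depending on arrival order. The sketch below is a minimal illustration under stated assumptions, not raptor's actual code: it assumes entries are delivered as `{ name, epoch }` objects on a hypothetical `entry` event of a Node `EventEmitter`, and the `collectMarks` helper and its timeout are invented for illustration. It resolves once every expected mark has been seen in any order, and rejects after a timeout so a missing mark fails the run instead of stalling it.

```javascript
const { EventEmitter } = require("events");

// Hypothetical collector: resolves once every expected mark has been seen,
// regardless of arrival order, or rejects after timeoutMs instead of hanging.
function collectMarks(emitter, expected, timeoutMs) {
  return new Promise((resolve, reject) => {
    const seen = new Map();

    const onEntry = (entry) => {
      if (!expected.includes(entry.name)) return; // ignore unrelated marks
      seen.set(entry.name, entry);
      if (seen.size === expected.length) {
        cleanup();
        // Report entries in recorded (epoch) order, not arrival order.
        resolve([...seen.values()].sort((a, b) => a.epoch - b.epoch));
      }
    };

    const timer = setTimeout(() => {
      cleanup();
      const missing = expected.filter((name) => !seen.has(name));
      reject(new Error(`Timed out waiting for: ${missing.join(", ")}`));
    }, timeoutMs);

    function cleanup() {
      clearTimeout(timer);
      emitter.removeListener("entry", onEntry);
    }

    emitter.on("entry", onEntry);
  });
}

// Usage: appLaunch arrives last, as in comment 3, but collection still completes.
const log = new EventEmitter();
collectMarks(log, ["appLaunch@Clock", "fullyLoaded"], 1000).then((entries) =>
  console.log(entries.map((e) => e.name)) // [ 'appLaunch@Clock', 'fullyLoaded' ]
);
log.emit("entry", { name: "fullyLoaded", epoch: 1423770837904 });
log.emit("entry", { name: "appLaunch@Clock", epoch: 1423770831702 });
```

The timeout is the important design choice here: even if the ordering assumption is wrong for some other reason, the harness reports which marks never arrived rather than hanging the automation.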
Thanks Eli.

If bug 1129041 doesn't make any difference, that means we need a quad-core CPU to run the emulator well enough for raptor. I've been using an m3.medium instance on AWS; however, it is only single-core, so I cannot run tests reliably. This means that on AWS we would need at least an m3.xlarge instance (quad-core), which also quadruples the price (at least for on-demand instances; spot instances might be cheaper, but still).
No longer blocks: 1112116
The tests still time out when run on this same instance (m3.medium, single-core). I'm attributing this to the emulator requiring a faster machine (i.e., quad-core).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX