Closed Bug 1175230 Opened 9 years ago Closed 9 years ago

[raptor] launch test stalls after run 144

Categories

(Firefox OS Graveyard :: Gaia::PerformanceTest, defect)

Platform: ARM, Gonk (Firefox OS)
Severity: normal
Tracking

(Not tracked)

RESOLVED FIXED
FxOS-S4 (07Aug)

People

(Reporter: rwood, Assigned: Eli)

Details

Attachments

(1 file)

Thought I'd try 1000 RUNS of the launch test on the clock app, and found this... the launch test hangs after run 144 every time.

[Cold Launch: Clock] Starting run 144
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell 'echo "tap 300 213 1 1" > /data/local/tmp/orng-cmd' +10s
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell '/data/local/orng /dev/input/event0 /data/local/tmp/orng-cmd' +305ms
  raptor:coldlaunch Received performance entry `navigationLoaded` in clock.gaiamobile.org +895ms
  raptor:coldlaunch Capturing application PID: 3114 +0ms
  raptor:coldlaunch Received performance entry `navigationInteractive` in clock.gaiamobile.org +114ms
  raptor:coldlaunch Received performance entry `visuallyLoaded` in clock.gaiamobile.org +183ms
  raptor:coldlaunch Received performance entry `contentInteractive` in clock.gaiamobile.org +1ms
  raptor:coldlaunch Received performance entry `fullyLoaded` in clock.gaiamobile.org +4ms
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell 'b2g-info' | grep "3114" +1ms
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell 'log -p i -t PerformanceMemory "clock.gaiamobile.org|uss|13.3"' +241ms
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell 'log -p i -t PerformanceMemory "clock.gaiamobile.org|pss|18.1"' +3ms
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell 'log -p i -t PerformanceMemory "clock.gaiamobile.org|rss|30.1"' +3ms
  raptor:coldlaunch Received USS memory entry: 13.3MB +64ms
  raptor:coldlaunch Received PSS memory entry: 18.1MB +5ms
  raptor:coldlaunch Received RSS memory entry: 30.1MB +3ms
[Cold Launch: Clock] Run 144 complete
  mozdevice:util Killing process 3114 +0ms
  mozdevice:command [Executing] ANDROID_SERIAL=3739cec3 adb shell 'kill 3114' +0ms

Full logs here: https://gist.github.com/rwood-moz/1c6174d8ca8254dd66e6
Summary: [raptor] launch test (clock app) stalls after run 144 → [raptor] launch test stalls after run 144
Ok, this is strange. The problem is that on run 144 the appLaunch event/mark is never received, and it happens every time: no appLaunch mark at all. navigationLoaded is the first mark received, followed by all the remaining ones... what...
Flags: needinfo?(eperelman)
Watching logcat, after 144 test launches the appLaunch mark is never posted to the log. Even if you manually launch another app after that, the app starts but goes straight to the navigationLoaded mark, skipping the appLaunch mark.
So this comes down to the buffer that User Timing is using.

User Timing currently shares the buffer used by Resource Timing, which holds 150 entries. [1]
Once 149 entries are stored, no more will be added until the buffer has been cleared, so entry number 150 is silently dropped [2].

Why does it fail at run 144? Because before the homescreen launches any application it creates 5 markers of its own: navigationLoaded, navigationInteractive, visuallyLoaded, contentInteractive, and fullyLoaded. The homescreen also primes the application before any official runs, so there is one extra appLaunch. 5 + 1 + 143 = 149, which means run 144's appLaunch would be entry number 150 and is never stored.
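The arithmetic above can be sketched as a quick sanity check. The counts come straight from this comment; the helper name is hypothetical and nothing here is Gecko code:

```javascript
// Counts taken from the comment above.
const HOMESCREEN_STARTUP_MARKS = 5; // navigationLoaded .. fullyLoaded
const PRIMING_LAUNCHES = 1;         // one appLaunch before the official runs

// Entries sitting in the homescreen's shared buffer after N runs.
function entriesAfterRuns(n) {
  return HOMESCREEN_STARTUP_MARKS + PRIMING_LAUNCHES + n;
}

// After run 143 the buffer holds 149 entries, so run 144's appLaunch
// would be entry 150 and is dropped.
console.log(entriesAfterRuns(143)); // 149
```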

[1] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.h#358
[2] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.cpp#877

There isn't an immediate need to fix this, but there are a few solutions:

1. Fix bug 1159003 so User Timing isn't beholden to the Resource Timing buffer. This will happen eventually regardless, so we should probably just wait for this. This is a Gecko change.
2. Do `performance.clearMarks()` from homescreen when we receive the 'resourcetimingbufferfull' event and re-trigger the appLaunch marker. This is a Gaia change.
3. Restart the homescreen or B2G before we get to 144 launches. This is a Raptor change.
4. Don't do more than 144 performance test runs; do smaller batches and aggregate the results. This is a user change. :)
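Option 2 could look roughly like the sketch below. The mock `performance` object and the `installBufferFullRecovery` helper are hypothetical stand-ins for the real DOM API and Gaia code, just to show the shape of the fix; `resourcetimingbufferfull` is the real event name.

```javascript
// Mock of the DOM `performance` object so the handler can run outside
// a browser. Mirrors the observed Gecko behavior: once bufferSize - 1
// entries are stored, new marks are dropped and the buffer-full event fires.
function makeMockPerformance(bufferSize) {
  const listeners = [];
  return {
    entries: [],
    mark(name) {
      if (this.entries.length >= bufferSize - 1) {
        listeners.forEach((fn) => fn());
        return; // entry dropped at the limit
      }
      this.entries.push(name);
    },
    clearMarks() { this.entries.length = 0; },
    addEventListener(type, fn) {
      if (type === 'resourcetimingbufferfull') listeners.push(fn);
    },
  };
}

// Hypothetical helper: on buffer-full, clear the marks and re-emit the
// marker that was about to be lost (appLaunch in the Homescreen case).
function installBufferFullRecovery(perf, remark) {
  perf.addEventListener('resourcetimingbufferfull', () => {
    perf.clearMarks();
    perf.mark(remark());
  });
}
```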
Flags: needinfo?(eperelman)
The usage of the User Timing API in Gaia seems somewhat odd to me, since there are no calls to performance.get(Entries|EntriesByType|EntriesByName) at all. Raptor seems to rely on the debug output of User Timing [1], right?

[1] https://hg.mozilla.org/mozilla-central/file/657edac53a66/dom/base/nsPerformance.cpp#l743

I'd basically propose using the API in the proper way, i.e. mark, getEntries, and clearMarks.
So we should take option 2 first, then output the User Timing info to the log somewhere and call clearMarks there.
Even if the User Timing buffer size had no limit, performance would degrade without clearMarks and clearMeasures, because a sort operation is performed when inserting each User Timing entry.
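The mark / getEntries / clearMarks pattern proposed here could look something like this sketch. `flushUserTiming` and the `log` sink are hypothetical names; in the browser, `perf` would be the real `performance` object:

```javascript
// Drain User Timing entries into a log sink, then free the buffer so
// the per-insert sort never has to handle a large backlog.
function flushUserTiming(perf, log) {
  for (const entry of perf.getEntriesByType('mark')) {
    log(`${entry.name}|${entry.startTime}`);
  }
  perf.clearMarks();

  for (const entry of perf.getEntriesByType('measure')) {
    log(`${entry.name}|${entry.duration}`);
  }
  perf.clearMeasures();
}
```

Calling this periodically keeps the shared buffer small, which also avoids the insertion-sort cost mentioned above.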
@hiro, I can definitely see how our usage of User Timing within Gaia is strange, but there are reasons for it: we don't yet have the APIs necessary to replace the functionality we currently rely on:

1. Generating performance metrics across processes/window scopes. For example, we use a custom mark convention in Homescreen [1] to denote that a mark has a context different than the one it was generated in.

2. Related to 1, there is no mechanism to compare performance entries generated across processes, so we need the epoch of an entry in addition to its HR time [2][3]. To overcome this we need to provide something to make entries comparable across time, which I do believe there is work towards (the conversation mostly relates to workers) [4].

3. In order to performance test Gaia from within Gaia, we would need access to an API that could capture the performance marks and measures (probably via getEntries and friends) from different applications and aggregate their data. This would most likely live within System or Gecko, since we wouldn't want to duplicate it across every application. Unfortunately this is clunky, since it's not really their concern to performance test themselves, and we would need a way to invoke these APIs at our whim. In addition, it would make generic testing very difficult, as we wouldn't be able to easily define when to start and end testing without additional APIs.

4. Log parsing, while it has its own set of evils, enables us to performance test in a decoupled manner, minimally affecting the behavior we are measuring.
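To illustrate point 2: two high-resolution timestamps from different processes are not directly comparable, because each process has its own clock origin; anchoring each to a shared epoch makes them comparable. A minimal sketch, with `navigationStart` standing in for each process's time origin and the timestamps invented for illustration:

```javascript
// Convert a process-local high-resolution startTime to epoch milliseconds.
function toEpochMs(navigationStart, hrStartTime) {
  return navigationStart + hrStartTime;
}

// Hypothetical example: identical hr times, different process origins,
// so the entries are actually 5 seconds apart in real time.
const a = toEpochMs(1438000000000, 12.5);
const b = toEpochMs(1438000005000, 12.5);
console.log(b - a); // 5000
```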

Believe me, if I had the APIs in place, I would adopt some changes. I proposed these in the past [5], and some are still in progress at web-perf [6], but unfortunately we still aren't ready for that. Hope some of my babbling makes sense as to why we are doing what we are. :D

(In reply to Hiroyuki Ikezoe (:hiro) from comment #5)
> Even if the User Timing buffer size has no limitation, performance will get
> worse without clearMarks and clearMeasures because sort operation is done
> when inserting each User Timing entry.

You are totally right, we should probably do some sort of clearMarks in the Homescreen on occasion.

[1] https://github.com/mozilla-b2g/gaia/blob/fc9cb65bfe557063792e0cf10f7939f03f0532e5/shared/elements/gaia_grid/js/items/mozapp.js#L345
[2] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.cpp?from=PERFLOG#749
[3] https://github.com/mozilla-b2g/gaia/blob/be75625dda4a6d8e5191174267ea88cab7356d16/tools/raptor/lib/suite/phase.js#L362
[4] https://github.com/w3c/hr-time/pull/6
[5] https://groups.google.com/forum/#!searchin/mozilla.dev.b2g/performance$20api/mozilla.dev.b2g/NSnBDfmRG9o/j5qUsDoCfVIJ
[6] https://github.com/w3c/user-timing/issues/3
This patch will cause Raptor to clear the performance marks and measures buffer for the Homescreen every 50 app launches to keep the buffer clean and speedy. I tested this by running coldlaunch against the Clock app 200 times, and it worked just fine. :)
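The shape of the periodic clearing is roughly the following. This is a sketch, not the actual patch (which is in the attached pull request); `CLEAR_INTERVAL`, `shouldClearBuffer`, and `flushesForRuns` are hypothetical names:

```javascript
// Every CLEAR_INTERVAL launches, the Homescreen's marks and measures
// are cleared so the shared 150-entry buffer never fills.
const CLEAR_INTERVAL = 50;

function shouldClearBuffer(runNumber) {
  return runNumber > 0 && runNumber % CLEAR_INTERVAL === 0;
}

// Count how many flushes a coldlaunch session of totalRuns would trigger.
function flushesForRuns(totalRuns) {
  let flushes = 0;
  for (let run = 1; run <= totalRuns; run++) {
    if (shouldClearBuffer(run)) flushes++;
  }
  return flushes;
}

console.log(flushesForRuns(200)); // 4
```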
Attachment #8640488 - Flags: review?(rwood)
Assignee: nobody → eperelman
Target Milestone: --- → FxOS-S4 (07Aug)
Comment on attachment 8640488 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/raptor/pull/11

R+ (just a nit). Was able to successfully run the coldlaunch test with RUNS=150
Attachment #8640488 - Flags: review?(rwood) → review+
Status: NEW → ASSIGNED
In master: https://github.com/mozilla-b2g/raptor/commit/0c76357f24487365bb5d85e84526effdbc907dcb
In npm: @mozilla/raptor v1.4.2
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED