Closed Bug 1563576 Opened 5 years ago Closed 4 years ago

Raptor applink tests for Android Gecko vehicles

Categories

(Testing :: Raptor, enhancement, P3)

Version 3
enhancement

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1632225

People

(Reporter: nalexander, Assigned: nalexander)

References

Details

Attachments

(7 obsolete files)

We'd like to run "applink" tests for Android Gecko vehicles. Such tests are described in this report.

Technically, we need to:

  1. launch a vehicle with a URL (i.e., am start ... -d URL)
  2. capture measurements from that load, including navigationStart
  3. determine the vehicle main process start time
  4. calculate the "applink X time" as navigationStart - process start + X

This is a sibling to Bug 1522972, which added coldload tests to Raptor.

Priority: -- → P3

verify_android_device requires a real _mach_context in order to
install.

Depends on D37025

This is just bad copy-pasta.

Depends on D37026

This is an unusual method. We find, ideally at the same instant: the
system uptime and a dummy process uptime. From these we determine the
jiffies per second of the system, which is almost always 100 but not
guaranteed to be that value.

From the current time, the system uptime, and the target process
uptime we determine the target process start time.

See the linked posts.

This is inherently racy, so we do it a few times and average. Under
the x86 emulator, it appears that the shell_output() invocations are
also fragile, so we retry a few extra times to accommodate.

Depends on D37027

At its heart, this commit reports navigationStart from the Web
Performance API in on-device milliseconds since the epoch and then
substracts the process start time of the main process.

The way it does this is opaque due to the existing Raptor
architecture. First, it's work to add a new test type, so I overload
coldload, since we're basically doing a single, special cold load.
Second, it's work to add an additional measurement checking function,
so I overload the existing "loadtime" flag and return two values. I
can't tell the harness the process start time because we don't have it
early enough for any of the existing communication mechanisms (we
write configuration JSON before we launch the process, and in any case
that races; and there's no convenient control server endpoint that
actually reveals that data dynamically). We push the process start
time calculation into the control server because it handles killing
the application process, and this needs to happen before that process
is killed. So we store up process start times and then subtract them
when processing the results. Again, it's work to put that filtering
logic anywhere else but in the existing results handler.

There are cleaner ways to achieve this goal, but all of them involve
tidying the division of responsibilities in the harness to an extent
that isn't worth the effort (at this time).

Depends on D37028

This is just a first approach. It works locally with ./mach raptor-test --app=geckoview --binary=org.mozilla.geckoview_example --test=raptor-tp6m-applink-1 --browser-cycles 3, say. I don't really know how to add a new applink "suite", or how to make sure that the firstNavigationStart measurement is filtered, alerted, whatever, but others can pick that up.

rwood: if you could counter with any additional patches for verifying the perf herder output, grouping into a test set, etc, that would be helpful.

As per our discussion earlier today, we're going to put these patches on hold for now, and wait to see how the browsertime MVP work goes. Ideally, if the browsertime MVP goes well and we go ahead with the move to browsertime, it makes better sense to add the applink work on top of browsertime instead of raptor proper. If need be though we can revisit these patches and add this functionality to raptor instead.

Attachment #9076117 - Attachment is obsolete: true
Attachment #9076113 - Attachment is obsolete: true
Attachment #9076116 - Attachment is obsolete: true
Attachment #9076115 - Attachment is obsolete: true
Assignee: nobody → rwood
Attachment #9076115 - Attachment is obsolete: false
Status: NEW → ASSIGNED
Attachment #9076114 - Attachment is obsolete: true

Barret: just throwing out my WIP. This depends on https://github.com/sitespeedio/browsertime/pull/1118, now landed upstream, but not yet in place for m-c. That's blocked on Bug 1607851, which is next in my queue.

Assignee: rwood → nalexander
Depends on: 1607851

Rebased on top of WIP for Node 10 + browsertime 8, some small tweaks, throwing things at the wall: https://treeherder.mozilla.org/#/jobs?repo=try&revision=342c4e83fcf3341bea60f3b1aeeef07e2df6b2e6.

OK, I basically have this working as I expect, but I'm running into the Window Recorder OOM issues that we see elsewhere. I'll update Bug 1585036 with more notes.

This adds a --applink flag sibling to --cold. I would much
prefer to generalize to --pageload-type, but the precedent was set
and "cold" is difficult to update. History also required passing
through the --intent flag, since the per-App intent defaulting
system is not particularly flexible.

To achieve the goal, --applink tests:

  1. configure browsertime/geckodriver to launch using adb am start -a android.intent.action.VIEW -d URL

  2. run a browsertime measurement driver script that knows not to
    navigate to a URL

  3. patches the navigationTiming struct to include a "process shift" so
    that they measure from process start time rather than the usual
    navigation start time.

This doesn't quite work because the browsertime result processing
doesn't use navigationTiming directly. It should be easy to
complete the patching using the template given here -- or to expose
different, applink-specific, metrics to perfherder.

This doesn't yet handle Chrome/chromedriver (that seems to require
patching chromedriver to launch with a URL).

For convenience this requires a very recent sitespeedio/browsertime.
There's nothing fundamental here, but many fixes (including to
--processStartTime) have landed and make this all much easier.

Depends on D59235

Attachment #9076115 - Attachment is obsolete: true
Attachment #9121656 - Attachment is obsolete: true

https://phabricator.services.mozilla.com/D61383 is basically ready for review and for somebody else to push across the line. Tarek, can you work out who that might be?

Try push is at https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=feabc1feeb61e5c9aaa2468f3c303b57178d625c.

Flags: needinfo?(tarek)
Assignee: nalexander → nobody
Status: ASSIGNED → NEW

I'll defer to Dave

Flags: needinfo?(tarek) → needinfo?(dave.hunt)
Assignee: nobody → nalexander
Status: NEW → ASSIGNED

I've added a comment in Phabricator and added #perftest-reviewers for review. If possible, I'd like to avoid commandeering this patch if it's so close to landing so we can avoid additional delays in getting these tests running in CI.

Flags: needinfo?(dave.hunt)

While using this patch I noticed something that I didn't expect in browsertime:
a 2-second (android-only) delay after launch
https://github.com/sitespeedio/browsertime/blob/b16e8fbaa1ba8c1ee6a1c24604a546046db60290/lib/core/engine/iteration.js#L167

This is in the code-path of the applink startup time measurements so I'm expecting that it will affect the measurements.

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #16)

While using this patch I noticed something that I didn't expect in browsertime:
a 2-second (android-only) delay after launch
https://github.com/sitespeedio/browsertime/blob/b16e8fbaa1ba8c1ee6a1c24604a546046db60290/lib/core/engine/iteration.js#L167

This is in the code-path of the applink startup time measurements so I'm expecting that it will affect the measurements.

Good catch. We should guard that with processStartTime, like https://github.com/sitespeedio/browsertime/blob/b16e8fbaa1ba8c1ee6a1c24604a546046db60290/lib/core/engine/iteration.js#L158.

Adding some notes since I tried this cold start applink test.

I haven't been able to cleanly capture visual metrics through this route because of the racing of browsertime vs the browser starting with the given intent.
Even with the delays mentioned in Comment 17 removed from browsertime, I found that the web sites were mostly or entirely loaded by the time browsertime was ready to start measuring.

What I ended up doing was to modify browsertime to create a "warm browser, cold page load applink scenario".

That is, the browser is started and conditioned as in traditional browsertime tests.
And then, instead of making the window.location write, we send the view action intent via adb (piped through browsertime as an option).
https://paste.rs/sny

This works quite well, although it's capturing the scenario where the browser hadn't been free'ed by the OS.

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #18)

Adding some notes since I tried this cold start applink test.

I haven't been able to cleanly capture visual metrics through this route because of the racing of browsertime vs the browser starting with the given intent.
Even with the delays mentioned in Comment 17 removed from browsertime, I found that the web sites were mostly or entirely loaded by the time browsertime was ready to start measuring.

My model for how browsertime works means that this is fine. browsertime really doesn't care how the page load gets initiated, it just needs to be able to read stuff from the DOM after the load has completed. (That's why it's okay to poll/sleep/whatever for pageload to fire, and then wait N seconds.) So "ready to start measuring" is fine: we're just going to be querying the DOM performance timers anyway, which are constant after the various events have completed.

Is my model wrong? Is there something else at play here?

(In reply to Nick Alexander :nalexander [he/him] from comment #19)
...

My model for how browsertime works means that this is fine. browsertime really doesn't care how the page load gets initiated, it just needs to be able to read stuff from the DOM after the load has completed. (That's why it's okay to poll/sleep/whatever for pageload to fire, and then wait N seconds.) So "ready to start measuring" is fine: we're just going to be querying the DOM performance timers anyway, which are constant after the various events have completed.

Is my model wrong? Is there something else at play here?

I should have been more specific: the problem is visual metrics. It was the orange frame insertion that you suspected could be a problem.
Browsertime doesn't have a chance to insert it before the pageload has begun.
Theoretically, I think I could capture visual metrics of the entire cold start: from home screen to app launch to pageload, an see what value could be extracted.

I will be trying out this patch again to collect navigation metrics on cold start app links.

FYI,
I noticed that in mozilla-browsertime the default insertion of the Browsertime webextension adds ~1000ms to Fenix startup results (but curiously doesn't impact fennec).

AFAICT, this isn't a problem in raptor-browsertime because the extension is disabled and there is also a pref to disable extension blocklists (extensions.blocklist.enabled)
https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/testing/raptor/raptor/browsertime/base.py#194

:acreskey, :sparky is this bug still relevant, or have we resolved this via mozperftest?

Flags: needinfo?(gmierz2)
Flags: needinfo?(acreskey)

:davehunt, we can definitely close this -- the mozperftest view tests are exactly applink tests.
They cover geckoview_example and Fenix. (I removed Reference Browser as it was not as useful as hoped).
e.g.
https://searchfox.org/mozilla-central/rev/35245411b9e8a911fe3f5adb0632c3394f8b4ccb/taskcluster/ci/perftest/android.yml#35

Flags: needinfo?(acreskey)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(gmierz2)
Attachment #9123879 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: