Closed Bug 846206 Opened 7 years ago Closed 3 years ago

Add automated infrastructure for memory-QoS measurements


(Core :: General, defect)






(Reporter: cjones, Unassigned)



It's pretty well-known how to test for memory /bugs/, like failing to launch apps in random workloads, or using too much memory.  We have most of that infrastructure in place already.

The next step is testing memory-management /quality/.  That is, do we keep enough apps alive in the background?  Or, perhaps better formulated: do we spend the minimum time launching apps when users request them?

One problem to solve is determining the metrics to measure.  Minimizing launch time over a given workload may be the easiest.  Sum up the latencies for launching all apps, and that's the QoS score.  That measurement is slightly confounded by orthogonal changes in app-launch latency, but we should be able to deal with those independently.
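The launch-time metric above can be sketched in a few lines. This is a hypothetical illustration, not existing test infrastructure: the event format (app name, observed launch latency in milliseconds) is an assumption.

```python
# Hypothetical sketch: the QoS score described above, computed as the
# sum of app-launch latencies observed while replaying a workload.
# Lower is better.

def qos_score(launch_events):
    """Sum the launch latencies (ms) over a workload."""
    return sum(latency_ms for _app, latency_ms in launch_events)

# Example replay log: (app, observed launch latency in ms).
events = [("dialer", 180), ("camera", 950), ("dialer", 40), ("browser", 1200)]
print(qos_score(events))  # 2370
```

A single scalar like this is easy to track over time, at the cost of the confound noted above: any unrelated change to app-launch latency also moves the score.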

Another problem is finding appropriate workloads.  I think we can learn some lessons from telemetry here: record (in some appropriate form) user actions over a few hours or days.  Distill as much of that detail as we can into replayable test scripts for automation.  Then play those scripts back, and compute a score.
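The record-and-replay flow might look like the following sketch. Everything here is an assumption for illustration: the `(timestamp, action, app)` script format and the `launch_app` hook stand in for whatever the real recording format and device automation would be.

```python
# Hypothetical sketch of the replay half of the flow: a distilled script
# of recorded user actions is replayed against a launcher hook, and the
# per-launch latencies are collected for scoring.
import time

def replay(script, launch_app):
    """Replay recorded actions; return observed launch latencies in seconds."""
    latencies = []
    for _timestamp, action, app in script:
        if action == "launch":
            start = time.monotonic()
            launch_app(app)  # placeholder for the real automation hook
            latencies.append(time.monotonic() - start)
    return latencies

# Stub launcher standing in for device automation.
recorded = [(0.0, "launch", "sms"), (5.2, "launch", "email")]
latencies = replay(recorded, launch_app=lambda app: None)
print(len(latencies))  # 2
```

Summing the returned latencies with the scoring function above would yield the workload's QoS score.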

As usual for QoS, we want to "optimize" the results, which if you're being precise means "never regress the QoS scores".
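The "never regress" policy reduces to a simple gate in automation. This is a minimal sketch under assumed names; the baseline source and tolerance value are illustrative, not anything the bug specifies.

```python
# Hypothetical regression gate: compare the current QoS score (lower is
# better) against a stored baseline, allowing a small noise tolerance.

def check_no_regression(baseline, current, tolerance=0.02):
    """Return True if current is within tolerance of baseline (no regression)."""
    return current <= baseline * (1 + tolerance)

print(check_no_regression(2370, 2400))  # True: within 2% of baseline
print(check_no_regression(2370, 3000))  # False: a clear regression
```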
In my experience with MemShrink benchmarks, it's much better for a benchmark to measure the precise thing you care about, rather than something merely correlated with it.

In this case, if we want a memory QoS benchmark, we should measure something like how many apps we can keep alive in the background, not how long it takes to load apps.  Filtering confounds out of benchmark data is very hard.
The thing we care about is user QoS, which is impossible to measure directly.  (Well, not economically feasible, anyway.)  We certainly can and should find the best proximate measurement(s) though.
Closed: 3 years ago
Resolution: --- → INCOMPLETE