Closed Bug 833917 Opened 11 years ago Closed 11 years ago

Try to see if not doing PGO on one Windows Nightly moves any telemetry needles by a noticeable amount

Categories

(Core :: General, defect)

x86
macOS
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Assigned: vladan)

As we're trying to determine whether or not turning off PGO could affect our users, we can run a small A/B test through Telemetry by turning off PGO for a day.  I asked Vladan to help here and he kindly agreed.
Turning off PGO, comparing telemetry, and then noticing that things don't change means either:

- PGO doesn't matter;
- we're not measuring things that are affected by PGO with Telemetry.

I think the latter is the more likely explanation.
(In reply to Nathan Froyd (:froydnj) from comment #1)
> Turning off PGO, comparing telemetry, and then noticing that things don't
> change means either:
> 
> - PGO doesn't matter;
> - we're not measuring things that are affected by PGO with Telemetry.
> 
> I think the latter is the more likely explanation.

Yes, agreed.  I'm mostly looking for reasons why we absolutely should not turn off PGO.

BTW, Vladan, I just landed bug 833915, so tonight's Nightly will not have PGO/LTCG.
I started looking at Telemetry Evolution data for various histograms around Jan 24th, expecting noticeable jumps, but there are just not enough people on Nightly and there is too much noise to draw a confident conclusion. On top of that, each day's build contains different patches, and workloads differ from day to day.
Overall, the non-PGO build's averages are all well within the normal band of values, but I don't think that fact alone is enough to support any conclusion.

You can see some of the histograms I looked at below, before I gave up on the exercise. I chose timing measures that don't involve I/O operations. All values are from Windows machines.

The "->" notation shows the mean timing values in order: 

Jan 23rd (PGO) Nightly -> Jan 24th (Not PGO) -> Jan 25th (PGO)

1) Measurements showing regression on non-PGO Nightly but well within noise band:

SIMPLE_MEASURES_MAIN: 2444.85 -> 2607.2 -> 2090.95 (similar for medians)
SIMPLE_MEASURES_CREATETOPLEVELWINDOW: 4457.15 -> 4631.6 -> 3668.65
SIMPLE_MEASURES_FIRSTPAINT: 6104.2 -> 6167.45 -> 5175.95
SIMPLE_MEASURES_FIRSTLOADURI: 6236.7 -> 6383.05 -> 5435.35
SIMPLE_MEASURES_SESSIONRESTORED: 6392.95 -> 6510.35 -> 5520.5
SIMPLE_MEASURES_SHUTDOWNDURATION: 1607.7 -> 1813.3 -> 1405.95
GC_MS: 359.1 -> 363 -> 336.65
CYCLE_COLLECTOR: 15.7 -> 18.7 -> 16.6
GRADIENT_DURATION: 51.15 -> 58.65 -> 43.4
IMAGE_DECODE_LATENCY_US: 1146.4 -> 1283.4 -> 992.5
IMAGE_DECODE_TIME: 1583.45 -> 1988.15 -> 1531.1
XUL_FOREGROUND_REFLOW_MS: 0.15 -> 0.25 -> 0.2

2) Measurements showing significant changes between days, but not suggesting a regression on the 24th:

SIMPLE_MEASURES_START: 925.15 -> 778 -> 770.85 
GC_MAX_PAUSE_MS: 46.5 -> 47.2 -> 55.2
FX_TAB_SWITCH_UPDATE_MS: 19 -> 19.35 -> 20.5
FX_TAB_SWITCH_TOTAL_MS: 73.1 -> 72.55 -> 75.55
FX_THUMBNAILS_CAPTURE_TIME_MS: 25.75 -> 26.8 -> 28.5
HTML_FOREGROUND_REFLOW_MS: 0.35 -> 0.8 -> 0.9
EVENTLOOP_UI_LAG_EXP_MS: 148.8 -> 133.1 -> 138.65
  * These averages are scary
IMAGE_DECODE_SPEED_JPEG: 11686.85 -> 11317.6 -> 14064.25
  * Similar "improvement" for IMAGE_DECODE_SPEED_GIF
  * IMAGE_DECODE_SPEED_PNG shows "regression" on 24th
  * Probably very workload sensitive
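
For anyone who wants to redo this comparison, here is a minimal sketch of the arithmetic. It takes a few of the means listed above and, for each histogram, compares the non-PGO day against the average of the two PGO days. The function names `compare` and `summarize` are made up for illustration, and using the Jan 23 vs. Jan 25 difference as a stand-in for noise is an assumption of this sketch, not the noise band from Telemetry Evolution.

```python
# Mean timings copied from the lists above, in order:
# (Jan 23 PGO, Jan 24 non-PGO, Jan 25 PGO). Only a subset is included.
MEANS = {
    "SIMPLE_MEASURES_MAIN": (2444.85, 2607.2, 2090.95),
    "SIMPLE_MEASURES_FIRSTPAINT": (6104.2, 6167.45, 5175.95),
    "SIMPLE_MEASURES_SHUTDOWNDURATION": (1607.7, 1813.3, 1405.95),
    "GC_MS": (359.1, 363.0, 336.65),
    "IMAGE_DECODE_TIME": (1583.45, 1988.15, 1531.1),
}


def compare(pgo_before, non_pgo, pgo_after):
    """Return (non-PGO change, PGO-to-PGO spread) as percentages.

    Both numbers are relative to the average of the two PGO days.
    The spread is only a crude noise proxy (an assumption of this sketch).
    """
    baseline = (pgo_before + pgo_after) / 2.0
    change = 100.0 * (non_pgo - baseline) / baseline
    spread = 100.0 * abs(pgo_before - pgo_after) / baseline
    return change, spread


def summarize(means):
    """Apply compare() to every histogram in the mapping."""
    return {name: compare(*values) for name, values in means.items()}


if __name__ == "__main__":
    for name, (change, spread) in summarize(MEANS).items():
        print(f"{name:34s} non-PGO {change:+6.1f}%   PGO-vs-PGO spread {spread:5.1f}%")
```

Two PGO days are a very small sample, so the spread column should be read as a rough sanity check rather than a real estimate of day-to-day variance.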
Thanks a lot, Vladan, this is great analysis.  The conclusion here is that we can't draw any conclusions, which is a valuable datapoint.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to :Ehsan Akhgari (Away 2/7-2/15) from comment #5)
> Thanks a lot, Vladan, this is great analysis.  The conclusion here is that
> we can't draw any conclusions, which is a valuable datapoint.

Even more than that, it's evidence against the hypothesis "PGO makes a large difference". 

I also wondered whether the fact that the differences are roughly the same magnitude as the noise means that users would find them lost in the noise too. But I don't believe that inference is valid, because day-to-day telemetry noise could have sources entirely different from the noise any individual user sees, e.g. a changing set of users submitting telemetry.