Closed Bug 1650713 Opened 4 years ago Closed 4 years ago

14090.68 - 16274.9% a11yr (linux64-shippable, linux64-shippable-qr, macosx1014-64-shippable, windows10-64-shippable, windows10-64-shippable-qr, windows7-32-shippable) regression on push d2cfd29e2bb556bdcb383d8ed9e2c1a31a4cf7bf (Thu July 2 2020)

Categories

(Testing :: Performance, defect, P3)

defect

Tracking

(firefox-esr68 unaffected, firefox-esr78 unaffected, firefox78 unaffected, firefox79 unaffected, firefox80 fixed)

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox78 --- unaffected
firefox79 --- unaffected
firefox80 --- fixed

People

(Reporter: alexandrui, Assigned: kanishk509)

References

(Regression)

Details

(4 keywords)

Perfherder has detected a talos performance regression from push d2cfd29e2bb556bdcb383d8ed9e2c1a31a4cf7bf. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

16275% a11yr windows10-64-shippable opt e10s stylo 146.56 -> 23,999.35
15965% a11yr windows7-32-shippable opt e10s stylo 147.21 -> 23,650.34
15828% a11yr windows7-32-shippable opt e10s stylo 148.38 -> 23,633.05
15799% a11yr macosx1014-64-shippable opt e10s stylo 189.38 -> 30,110.55
15792% a11yr windows10-64-shippable-qr opt e10s stylo 153.16 -> 24,339.56
14120% a11yr linux64-shippable-qr opt e10s stylo 179.07 -> 25,464.53
14091% a11yr linux64-shippable opt e10s stylo 177.64 -> 25,208.64

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(kanishk509)
Flags: needinfo?(gmierz2)

Sparky, Kanishk, this is a huge regression. Please look into fixing this and make sure you avoid those magnitudes into future patches. Thanks!

There seems to be some problem when tpRecordTime is called from a11y/dhtml.html and a11y/tablemutation.html after we shifted to using performance.now() values.

function postProcessingRecord(s) {
  // alert(window.performance.now() - s);
  let delta = window.performance.now() - s;
  TalosContentProfiler.pause("tablemutation finish", true).then(() => {
    tpRecordTime(delta, s);
  });
}

I looked at the delta values specifically and they are around 300 and 80 (for dhtml and tablemutation respectively) on my local machine, which looks correct. So the problem seems to be with passing s which is a performance.now() value in tpRecordTime(delta, s).

I searched for tpRecordTime on searchfox and I'm a bit confused as to which definition of tpRecordTime is being called from dhtml and tablemutation.

I'll try to figure it out and hopefully fix this soon.

Flags: needinfo?(kanishk509)

I think I know what the problem is.
In pageloader.js :

function _loadHandlerCapturing() {
  failTimeout.clear();

  if (!(plPageFlags() & TEST_DOES_OWN_TIMING)) {
    dumpLine(
      "tp: Capturing onload handler used with page that doesn't do its own timing?"
    );
    plStop(true);
  }

  if (useMozAfterPaint) {
    if (gStartTime != null && gStartTime >= 0) {
      gTime = window.performance.now() - gStartTime;
      gStartTime = -1;
    }
  }

  if (gTime !== -1) {
    plRecordTime(gTime);
    TalosParentProfiler.pause("capturing load handler fired");
    gTime = -1;
    recordedName = null;
    setTimeout(plNextPage, delay);
  }
}

useMozAfterPaint is true for these 2 tests. So gTime is calculated as gTime = window.performance.now() - gStartTime and the delta that we pass in tpRecordTime(delta, s) is ignored.

Problem is window.performance.timing.navigationStart of pageloader.js and dhtml.html/ tablemutation.html differs by a lot (I'm not sure why, are they different windows?) and hence gTime = window.performance.now() - gStartTime is incorrect (where gStartTime is the start time passed from dhtml/tablemutation using performance.now()).

Using delta (which we calculate in dhtml/tablemutation) directly seems to give correct results:

11:35:03     INFO -  PID 20501 | ------- Summary: start -------
11:35:03     INFO -  PID 20501 | Number of tests: 2
11:35:03     INFO -  PID 20501 |
11:35:03     INFO -  PID 20501 | [#0] dhtml.html  Cycles:25  Average:285.76  Median:285.80  stddev:13.77 (4.8%)  stddev-sans-first:7.53
11:35:03     INFO -  PID 20501 | Values: 229.9  278.9  276.6  287.9  281.1  283.6  287.6  283.3  285.8  285.8  297.5  289.7  298.3  281.1  284.1  278.6  283.5  297.6  299.8  294.7  286.1  294.5  288.0  285.2  305.0
11:35:03     INFO -  PID 20501 |
11:35:03     INFO -  PID 20501 | [#1] tablemutation.html  Cycles:25  Average:88.28  Median:85.54  stddev:6.43 (7.5%)  stddev-sans-first:6.45
11:35:03     INFO -  PID 20501 | Values: 82.4  81.6  83.7  107.6  87.7  94.3  85.5  84.3  84.5  85.2  88.8  80.2  86.7  81.8  92.2  94.2  99.7  98.5  84.6  87.6  89.4  87.5  84.8  89.1  84.9
11:35:03     INFO -  PID 20501 | -------- Summary: end --------

One fix can be to pass window.performance.timing.navigationStart + window.performance.now() as start time from dhtml/tablemutation and do gTime = window.performance.timing.navigationStart + window.performance.now() - gStartTime in pageloader.js. We may have to make these changes in other places also.

:sparky can you advise me on the next steps?

Problem is window.performance.timing.navigationStart of pageloader.js and dhtml.html/ tablemutation.html differs by a lot (I'm not sure why, are they different windows?)

Yes, pageloader and dhtml/tablemutation are different windows and hence their performance.now() values are not correctly relative to each other.
Is there a way to access pageloader's window.performance object in dhtml/tablemutation?

Set release status flags based on info from the regressing bug 1380676

:alexandrui, can you get this patch backed out?

(In reply to Kanishk from comment #3)

One fix can be to pass window.performance.timing.navigationStart + window.performance.now() as start time from dhtml/tablemutation and do gTime = window.performance.timing.navigationStart + window.performance.now() - gStartTime in pageloader.js. We may have to make these changes in other places also.

:sparky can you advise me on the next steps?

:kanishk, let's try the fix that you're suggesting here and see how it looks.

Flags: needinfo?(gmierz2) → needinfo?(aionescu)

Should I update the existing revision?

Flags: needinfo?(gmierz2)

:kanishk, yup, it'll be backed out and reopened shortly. After that you can update it.

Flags: needinfo?(gmierz2)
Flags: needinfo?(aionescu)
Priority: -- → P3

(In reply to Greg Mierzwinski [:sparky] from comment #6)

:alexandrui, can you get this patch backed out?

:aryx can you help me backing this out, please?

Flags: needinfo?(aryx.bugmail)

Nevermind, I just found out it was already backed out.

Flags: needinfo?(aryx.bugmail)
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Assignee: nobody → kanishk509
Target Milestone: --- → mozilla80
You need to log in before you can comment on or make changes to this bug.