Closed Bug 1807228 Opened 1 year ago Closed 1 year ago

50.72% google-slides ContentfulSpeedIndex (Linux, OSX, Windows) regression on Fri December 23 2022

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
110 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox108 --- unaffected
firefox109 --- unaffected
firefox110 + fixed

People

(Reporter: afinder, Assigned: arai)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(3 files, 4 obsolete files)

Perfherder has detected a browsertime performance regression from push 7da2203b8c4f8a6c590ce3d85e1b72eca109716c. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
116% google-slides fcp linux1804-64-shippable-qr bytecode-cached fission warm webrender 284.15 -> 613.25 Before/After
109% google-slides FirstVisualChange windows10-64-shippable-qr bytecode-cached fission warm webrender 316.50 -> 661.00 Before/After
104% google-slides FirstVisualChange linux1804-64-shippable-qr bytecode-cached fission warm webrender 316.67 -> 646.67 Before/After
100% google-slides fcp windows10-64-shippable-qr bytecode-cached fission warm webrender 309.71 -> 618.83 Before/After
53% google-slides SpeedIndex linux1804-64-shippable-qr bytecode-cached fission warm webrender 502.04 -> 770.33 Before/After
52% google-slides PerceptualSpeedIndex windows10-64-shippable-qr bytecode-cached fission warm webrender 526.58 -> 799.00 Before/After
51% google-slides SpeedIndex windows10-64-shippable-qr bytecode-cached fission warm webrender 530.33 -> 799.33 Before/After
47% google-slides PerceptualSpeedIndex linux1804-64-shippable-qr bytecode-cached fission warm webrender 526.50 -> 776.17 Before/After
34% google-slides ContentfulSpeedIndex linux1804-64-shippable-qr bytecode-cached fission warm webrender 596.08 -> 800.50 Before/After
29% google-slides ContentfulSpeedIndex windows10-64-shippable-qr bytecode-cached fission warm webrender 671.29 -> 864.67 Before/After
26% google-slides ContentfulSpeedIndex windows10-64-shippable-qr bytecode-cached fission warm webrender 696.54 -> 879.58 Before/After
4% yahoo-mail loadtime macosx1015-64-shippable-qr bytecode-cached fission warm webrender 239.92 -> 249.54
4% facebook fcp macosx1015-64-shippable-qr fission warm webrender 247.29 -> 257.17
3% facebook loadtime macosx1015-64-shippable-qr fission warm webrender 207.12 -> 212.75

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=36550

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(arai.unmht)

Hi arai! Could you please also let us know which of the defects mentioned above (1804613, 1804837, 1804842, 1805149) caused the regression ? Thanks!

Flags: needinfo?(arai.unmht)

I'm not sure if I can solve the issue within 3 days.
if I couldn't, please backout the patches for the following bugs, which all has dependencies:

bug 1804842, bug 1804613, bug 1805149, bug 1804849, bug 1804837, bug 1805142, bug 1805645, bug 1806366

Thanks for pointing out this regression! With holidays, it may take us a about a week to determine a clear answer about what should be backed out. Fortunately, the next merge is Jan 17 so we should be able to resolve this with time to see improvements in results still.

the regression comes from bug 1805149 patches, that moves the bytecode de-duplication from the compilation step to the instantiation step,
which can move some task from off-thread to the main thread and can affect the performance:

"test-linux1804-64-shippable-qr/opt-browsertime-tp6-bytecode-firefox-google-slides" fcp warm score

https://hg.mozilla.org/mozilla-central/rev/343425669f57
https://hg.mozilla.org/mozilla-central/rev/ac0c7c9bbd8e

I'm still not sure why the score gets doubled.
I'll check what's happening here next week.

No longer regressed by: 1804613, 1804837, 1804842

Set release status flags based on info from the regressing bug 1805149

I don't see that huge regression in the profile marker or the screenshots.

In the comment #0's first 2 links

  • Before : FirstContentfulPaint: 704ms
  • After : FirstContentfulPaint : 335ms

In the comment #4, Bug 1804613 vs Bug 1805149

am I missing something?

Attached image frame comparison

so far, what I observed is the following:

  • the profile is for the different run than the runs used for the final score
  • with profile, there's not much regression observed so far
  • without profile, the patch makes the score bimodal, with:
    • no regression (fcp ~ 250ms)
    • much regression (fcp ~ 650ms)
  • even with regression on fcp, FirstVisualChange, etc, the total load time isn't affected
  • the visual of the first frame seems to be same between regressed and not-regressed case
  • the regressed case doesn't update on some frame where not-regressed case does (see attachment)

I'll see if I can reproduce the regressed case in profile run.

The bug is marked as tracked for firefox110 (nightly). We have limited time to fix this, the soft freeze is in 14 days. However, the bug still isn't assigned.

:sdetar, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(sdetar)
Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED
Flags: needinfo?(sdetar)
Severity: -- → S2
Priority: -- → P1

I don't see any notable difference in the JS-related behavior, and the code that's moved from off-thread to main-thread (SharedImmutableScriptData::shareScriptData called by JSScript::fullyInitFromStencil) doesn't appear in the profile,
between Navigation::Start and "fcp" range.

So this should be a scheduling issue.

See Also: → 1808824

I filed Bug 1808824 to seek out more information since this issue seems to be preexisting. The 400 ms change is due to slight timing changes leading to us still being paint-suppressed during the vsync, and then the DOM code deciding to execute 400 ms of scripts back-to-back without allowing any vsync to process until they are done.

This seems to be a very specific timing issue that happens to depend on how the test case is setup and the hardware it runs on. Slight changes to configuration give the same very different results. I don't think we should back out any patches.

Tooru, are you planning a fix before the merge or should we target a uplift to beta? Thanks

Flags: needinfo?(arai.unmht)

The regression on yahoo-mail appears to be real. The stack samples seem pretty unreliable, but looking at markers I can see a clear 10ms of additional stencil instantiate time with the changes. This screenshot is looking at the gaps between script-execution in a single nsHtml5ExecutorReflusher. This 10ms is the matches the pageload regression.

https://s.yimg.com/nq/nr/js/mail_app_es6_vendor_DJd61cYxTk1IkcyOrb6RX_nl4UD9dabo4MtjaSYMW4k_v1.js : 3.7 ms -> 7.3 ms
https://s.yimg.com/nq/nr/js/mail_app_es6_093W1xWOMPPEBgMia6YydArbgLeqNX3lfXmR7pgDliA_v1.js : 19 ms -> 25 ms

(In reply to Pascal Chevrel:pascalc from comment #15)

Tooru, are you planning a fix before the merge or should we target a uplift to beta? Thanks

fix for google-slides is landed in bug 1808824, and those 20-100% regressions will go away.
I'm going to address the remaining 5% regression here

forgot to mention about the fix.
I'm planning a fix before the merge

tested a patch to apply compilation-local bytecode de-deduplication, and it slightly improves the yahoo-mail loadtime, but given:

  • this is slightly risky, given it can regress other case, and it may need more tuning before landing
  • fcp regresses even after bug 1808824, due to bimodal-ness and bad case increases with the change

so, I'll revert the bug 1805149 patch here and move the compilation-local de-duplication patch to a followup bug.

Flags: needinfo?(arai.unmht)
See Also: → 1810160
See Also: → 1810161
Pushed by arai_a@mac.com:
https://hg.mozilla.org/integration/autoland/rev/f2457a464d03
Revert bug 1805149 and temporarily revive BytecodeEmitter.cx. r=tcampbell
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 110 Branch

== Change summary for alert #36740 (as of Mon, 16 Jan 2023 20:37:03 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
3% yahoo-mail loadtime macosx1015-64-shippable-qr bytecode-cached fission warm webrender 249.50 -> 241.62 Before/After

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=36740

Now ScriptStencil to SharedImmutableScriptData is not 1:1, and ScriptStencil
holds SharedImmutableScriptDataIndex to point the SharedImmutableScriptData
inside SharedDataContainer.

SharedDataContainer is simplified to use either a single pointer or a vector.

Depends on D167507

Depends on D167509

Comment on attachment 9313491 [details]
Bug 1807228 - Part 1: Perform SharedImmutableScriptData::shareScriptData during instantiation. r?tcampbell!

Revision D167507 was moved to bug 1810160. Setting attachment 9313491 [details] to obsolete.

Attachment #9313491 - Attachment is obsolete: true

Comment on attachment 9313492 [details]
Bug 1807228 - Part 2: Perform local bytecode de-duplication during compilation. r?tcampbell!

Revision D167508 was moved to bug 1810160. Setting attachment 9313492 [details] to obsolete.

Attachment #9313492 - Attachment is obsolete: true

Comment on attachment 9313493 [details]
Bug 1807228 - Part 3: Perform global bytecode de-duplication at once. r?tcampbell!

Revision D167509 was moved to bug 1810160. Setting attachment 9313493 [details] to obsolete.

Attachment #9313493 - Attachment is obsolete: true

Comment on attachment 9313494 [details]
Bug 1807228 - Part 4: Remove AutoLockScriptData. r?tcampbell!

Revision D167510 was moved to bug 1810160. Setting attachment 9313494 [details] to obsolete.

Attachment #9313494 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: