Closed Bug 1602808 Opened 11 months ago Closed 7 months ago

3.06 - 7.93% startup_about_home_paint / tabswitch (linux64-shippable-qr, windows10-64-shippable, windows10-64-shippable-qr) regression on push bff9eef8191f12cee9af275966a563517dcdf429 (Fri November 22 2019)

Categories

(Firefox :: Tabbed Browser, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 76
Tracking Status
firefox-esr68 --- unaffected
firefox71 --- unaffected
firefox72 --- fixed
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- fixed

People

(Reporter: marauder, Assigned: zbraniecki)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords)

Attachments

(2 files, 3 obsolete files)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=bff9eef8191f12cee9af275966a563517dcdf429

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

8% tabswitch windows10-64-shippable opt e10s stylo 9.85 -> 10.63
7% tabswitch windows10-64-shippable-qr opt e10s stylo 7.49 -> 8.00
7% tabswitch windows10-64-shippable-qr opt e10s stylo 7.49 -> 7.98
6% tabswitch linux64-shippable-qr opt e10s stylo 7.23 -> 7.68
3% startup_about_home_paint linux64-shippable-qr opt e10s stylo 1,226.62 -> 1,264.17

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=24403

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/TestEngineering/Performance/Talos

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Component: Performance → General
Flags: needinfo?(gandalf)
Product: Testing → Firefox
Version: Version 3 → unspecified

I suspect this will get fixed with bug 1560038.

Depends on: 1560038
Flags: needinfo?(gandalf)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #1)

I suspect this will get fixed with bug 1560038.

Why? The actual time spent in fluent should be really minimal, the strings are really simple. This is now on beta - can we back the original bug out of beta still?

Flags: needinfo?(gandalf)

Why? The actual time spent in fluent should be really minimal, the strings are really simple.

It was my first guess. But you're right that it shouldn't be the bulk of the reason.

Not only that, the calls are async now, so I'm confused why it would impact talos at all, unless it measures title switch.

This is now on beta - can we back the original bug out of beta still?

Who is the right person to make the call?

I looked at the tabswitch test, and it looks like we shouldn't impact it, because we don't wait for the title translation to complete before firing https://searchfox.org/mozilla-central/source/browser/base/content/tabbrowser.js#1248, which is what is measured.

I can investigate further if we want to see if we can keep the patch in, or request backout if not.

Flags: needinfo?(gandalf) → needinfo?(gijskruitbosch+bugs)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #3)

Not only that, the calls are async now, so I'm confused why it would impact talos at all, unless it measures title switch.

Perhaps the calls are queueing up localization work if/when a lot of tabswitches happen in a short space of time, and that ends up stealing CPU cycles from other work? A gecko profile job could help with figuring this out.

This is now on beta - can we back the original bug out of beta still?

Who is the right person to make the call?

Well, there's no user benefit to the patch, we're coming up on the holiday season when loads of people will be away, so unless there's a user upside I'm missing, or serious l10n issues if we back it out, I suggest we back it out of beta, at least (also considering the functional regression with long content titles), to give ourselves time to investigate and fix the issues on nightly.

I can investigate further if we want to see if we can keep the patch in, or request backout if not.

I think we should at least back out of beta. Up to you what you want to do about nightly, but we should either fix or have a very clear explanation about what's going on here (and why it's OK to ship with it) before nightly 73 goes to beta, I think.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gandalf)

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=autoland,1922475,1,1&timerange=5184000&zoom=1574330076850,1574620623713,6.805521427183259,8.883766680215567 looks visible to me.

That's autoland. I was looking for it on central, but I guess the frequency is lower so it's harder to pinpoint in those graphs maybe.

I requested the backout from beta.

Flags: needinfo?(gandalf)

backed out from beta. Will investigate in nightly tomorrow.

Assignee: nobody → gandalf
Status: NEW → ASSIGNED

Given that this seems tabswitching-related, as that updates the window title from tabbrowser.js, moving to tabbed browser.

Component: General → Tabbed Browser
Priority: -- → P1

Hi Zibi, is this still on track to land in time for Fx73? We've got about 2 weeks left this cycle.

Flags: needinfo?(gandalf)

Unlikely. I'm still profiling this against the updated fluent-rs patchset. Should have a decision on 73 by tomorrow.

I measured the impact of the fluent-rs patchset on this change, and unfortunately I no longer think that it will fix the issue.

Here's the comparison between m-c and reverted title patch - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2dce7876974e1659603cc0b791f9dab14bb90d05&newProject=try&newRevision=bf0fac4b823eef06f63aadde8ec4812ed764c55b&framework=1

Here's the comparison between m-c and the fluent-rs patchset - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2dce7876974e1659603cc0b791f9dab14bb90d05&newProject=try&newRevision=0a87399cdc66b7f5fa674061b7c37d5ddc33570c&framework=1

I'd like to get some help in profiling this patch https://hg.mozilla.org/try/rev/0465fc4df36bf2ae6673c39486406c5ddc516679 to understand why it affects tabswitch so much (it doesn't seem to be raw Fluent perf like parsing or I/O). I suspect something about missing the frame, but I'm not sure why.
Mike - can you help or NI someone who might be able to?

No longer depends on: 1560038
Flags: needinfo?(gandalf) → needinfo?(mconley)

Sorry for the slow response on this - been prepping for All Hands.

I've spun up a number of try builds to try to box the problem in a bit. I've narrowed the issue down to this section of the backout patch which, when applied, causes the improvement:

https://hg.mozilla.org/try/rev/d3c5fae7981f343595de28eecc4d492f3257801b

Now that I've boxed it in this far, I'm going to see if I can add some instrumentation to see where the time is going without the patch applied.

It looks like we're spending time decoding the JSON string that was attached to the element with setAttributes when calling the translate function. I think this also causes us to create garbage that gets GC'd, which also takes a little bit of time.
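The JSON round-trip described above can be illustrated with a simplified model. This is not the actual Fluent implementation, just a sketch of the mechanism: fluent-dom carries localization arguments as a JSON string in the `data-l10n-args` attribute, so every update pays a stringify on write and a parse when translating, and each parse allocates fresh objects that later become GC work.

```javascript
// Simplified model of how Fluent's DOM bindings pass arguments through an
// attribute: stringify on setAttributes, parse again when translating.
// Each parse allocates brand-new objects, i.e. garbage for the GC.
function setL10nArgs(attrs, args) {
  attrs["data-l10n-args"] = JSON.stringify(args);
}

function getL10nArgs(attrs) {
  const raw = attrs["data-l10n-args"];
  return raw ? JSON.parse(raw) : null;
}

const attrs = {};
setL10nArgs(attrs, { contentTitle: "Example Page" });
const args = getL10nArgs(attrs); // a fresh object on every call
```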

I believe I was able to eliminate the regression on try by applying the title change inside of a requestIdleCallback function (which cancelled any previous callbacks if called in quick succession). So we might try that.

Here's my try push comparison:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3e1d1bca59a1a9d2150ef351904bba066f173a0f&newProject=try&newRevision=39a947582359afe13f7ea19b67d8c1ed637e01e8&framework=1

Sorry again for how long this took. All Hands prep and All Hands itself kept me distracted from my needinfos.

Flags: needinfo?(mconley)

Thank you! That sounds like a viable fix!

Attachment #9117503 - Attachment is obsolete: true

Hmm, ok, I tried to build something on top of what :mconley found, but that doesn't seem to be complete.

In particular, while I did successfully delay the title change, consecutive runs of updateTitlebar will put the previous promise in limbo until it times out.

The issue is that win.convert() has to have the title set here: https://searchfox.org/mozilla-central/source/browser/components/extensions/parent/ext-windows.js#420-427

so I need to resolve that async by applying the title properly before we return from that function, which requires any idle callback to resolve all promises requested by updateTitlebar.

I haven't worked with front-end code in quite a while and I'm not sure how to resolve it. I know I could hack my way around it by exposing the resolve function of a promise outside of it, and then maybe store a list of pendingTitleupdates = [promise, resolve] and just resolve all of them when the idle callback updates the title, but it seems very dirty.

Gijs - do you have any suggestion on how to resolve it more cleanly?

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #12)

Unlikely. I'm still profiling this against the updated fluent-rs patchset. Should have a decision on 73 by tomorrow.

We should have backed this out of 73 again at the time, the fluent-rs changes would never have been upliftable to 73 anyway. :-\

Anyway, that ship has sailed now, I guess.

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #18)

Hmm, ok, I tried to build something on top of what :mconley found, but that doesn't seem to be complete.

In particular, while I did successfully delay the title change, consecutive runs of updateTitlebar will put the previous promise in limbo until it times out.

I don't understand this sentence.

The issue is that win.convert() has to have the title set here: https://searchfox.org/mozilla-central/source/browser/components/extensions/parent/ext-windows.js#420-427

so I need to resolve that async by applying the title properly before we return from that function, which requires any idle callback to resolve all promises requested by updateTitlebar.

I haven't worked with front-end code in quite a while and I'm not sure how to resolve it. I know I could hack my way around it by exposing the resolve function of a promise outside of it, and then maybe store a list of pendingTitleupdates = [promise, resolve] and just resolve all of them when the idle callback updates the title, but it seems very dirty.

Gijs - do you have any suggestion on how to resolve it more cleanly?

Well, updateTitlebar doesn't actually take any arguments. So it follows that if there are multiple calls while waiting for the idle moment, it doesn't matter "which" we run. We want to essentially only run it when an idle callback has time to complete, so we can just check if a call is pending, schedule one if not, and do nothing else:

let gBrowser = {
  _titleUpdatePromise: null,
  updateTitlebar() {
    if (!this._titleUpdatePromise) {
      this._titleUpdatePromise = new Promise(resolve => {
        window.requestIdleCallback(async () => {
          this._titleUpdatePromise = null;
          // Note: it's possible for `updateTitlebar` to get called again while we're waiting
          // for l10n to finish in `actualUpdateTitlebar`, and in theory its idle callback
          // could then run while we're still waiting here. Unsure if that's a problem.
          // However, we definitely can't null out the promise afterwards, or the same
          // race could lead to dropped titlebar updates.
          await this.actualUpdateTitlebar();
          resolve();
        });
      });
    }
    return this._titleUpdatePromise;
  }
}

Does that work?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gandalf)

Oh yeah! This looks great! Damn, I need to get back to async programming - it got rusty over the last year! Thank you!

The tests pass, will test raciness and talos and try to get the patch for review today!

Flags: needinfo?(gandalf)

Seems like a solid Talos improvement.

Zibi, do you have a status update on this bug? Thanks

Flags: needinfo?(gandalf)

Zibi, do you have a status update on this bug? Thanks

It's in the review.

Flags: needinfo?(gandalf)

Per request from :dao, I:

  1. Tested it against the prototype of changes in bug 1613705

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3c4f604ec7c96d5f98c8c8c580dcca1b2a372fea&newProject=try&newRevision=3a2421a124796f002c6d4cccbe5dcb673221bc68&framework=1

The results show wins on tabswitch validating my hypothesis from comment 12 that rust migration solves it - it's just that bug 1560038 is not enough.
The results are not fully green, some builds show significance, others don't, but all are down and the win is similar to the perf loss reported here.
If we decided to wait for bug 1613705, I can remeasure with more detail closer to when its ready.

The issue is that bug 1613705 is quite far off from being done. I don't even know how long it'll take me to finish it unfortunately, because there's a number of unknown unknowns around async I/O and generator patterns from JS.
I don't know how comfortable :gijs and :dao are with keeping this regression in through cycles while waiting for it.

  2. Tested Services.tm.dispatchToMainThread instead of requestIdleCallback

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3c4f604ec7c96d5f98c8c8c580dcca1b2a372fea&newProject=try&newRevision=f62c9f4abcff77ed79e201a1b1b83db394f884a0&framework=1

I see some tabswitch wins, but they seem slightly smaller than from requestIdleCallback, and more builds don't show significance.

  3. Moving the optimization to the callsite

I'd like :gijs to weigh in on this approach. I can write a patch if he thinks it's worth it.

===

Generally speaking, the requestIdleCallback seems to give the best perf win and can be landed now and even backported to 74. dispatchToMainThread also seems like a small patch, and the win is only slightly smaller.
The wait for fluent-rs refactor may take several cycles if we want to wait for it.

Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dao+bmo)

Since, as I understand it, requestIdleCallback could add a user-perceivable delay, I think as an interim solution I'd prefer dispatchToMainThread at the call site with a reference to bug 1613705.

Flags: needinfo?(dao+bmo)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #26)

Per request from :dao, I:

  1. Tested it against the prototype of changes in bug 1613705

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3c4f604ec7c96d5f98c8c8c580dcca1b2a372fea&newProject=try&newRevision=3a2421a124796f002c6d4cccbe5dcb673221bc68&framework=1

The results show wins on tabswitch validating my hypothesis from comment 12 that rust migration solves it - it's just that bug 1560038 is not enough.

comment #13 said that rust migration didn't solve it? I don't understand what's changed.

The results are not fully green, some builds show significance, others don't, but all are down and the win is similar to the perf loss reported here.
If we decided to wait for bug 1613705, I can remeasure with more detail closer to when its ready.

The issue is that bug 1613705 is quite far off from being done. I don't even know how long it'll take me to finish it unfortunately, because there's a number of unknown unknowns around async I/O and generator patterns from JS.
I don't know how comfortable :gijs and :dao are with keeping this regression in through cycles while waiting for it.

I think we should fix this ASAP given the uncertainty about both timeline and efficacy of the rust migration in terms of addressing this issue.

  2. Tested Services.tm.dispatchToMainThread instead of requestIdleCallback

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3c4f604ec7c96d5f98c8c8c580dcca1b2a372fea&newProject=try&newRevision=f62c9f4abcff77ed79e201a1b1b83db394f884a0&framework=1

I see some tabswitch wins, but they seem slightly smaller than from requestIdleCallback, and more builds don't show significance.

Can you link to what you're comparing with, ie where's the requestIdleCallback version?

  3. Moving the optimization to the callsite

I'd like :gijs to weigh in on this approach. I can write a patch if he thinks it's worth it.

I mean, we'll still need to de-dupe the calls at that callsite, and if we don't update the other callsite then we could still end up with unnecessary duplicate calls to updateTitlebar, which seems unfortunate. It's not clear to me what moving this to the callsite achieves. Dão?

Really, at this point I wonder if we should just not use fluent for this. Cache the values for the brand name and (Private Browsing) and do string concatenation instead. Then this whole problem goes away.

Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(gandalf)
Flags: needinfo?(dao+bmo)

(In reply to :Gijs (he/him) from comment #28)

  3. Moving the optimization to the callsite

I'd like :gijs to weigh in on this approach. I can write a patch if he thinks it's worth it.

I mean, we'll still need to de-dupe the calls at that callsite,

It's not really clear to me why that's needed.

and if we don't update the other callsite then we could still end up with unnecessary duplicate calls to updateTitlebar, which seems unfortunate.

It seems like a mostly hypothetical problem to me. In reality the common calls are from tab switching and content updates. I would expect neither to normally happen in quick enough succession that this would be something we should optimize for.

It's not clear to me what moving this to the callsite achieves. Dão?

It avoids adding complexity to updateTitlebar for what seems to be an interim solution. The idea is to wrap the call only in perf-critical paths.

Flags: needinfo?(dao+bmo)

(In reply to Dão Gottwald [::dao] from comment #29)

(In reply to :Gijs (he/him) from comment #28)

  3. Moving the optimization to the callsite

I'd like :gijs to weigh in on this approach. I can write a patch if he thinks it's worth it.

I mean, we'll still need to de-dupe the calls at that callsite,

It's not really clear to me why that's needed.

Well, that's what's getting us perf wins, surely? Avoiding running N titlebar updates by delaying when we update the titlebar and only running one when we stop switching tabs. I'd be very surprised if running the same number of updates, just a bit later, gets us any perf win - unless the delay just pushes the CPU hit to happen after the test stops measuring, which isn't really a fix at all, just cheating the perf test (and introducing a slowdown for users in terms of when updates happen).

So writing that, I went to look at the test. https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/tabswitch/api.js#280-284

    TalosParentProfiler.resume("start: " + tab.linkedBrowser.currentURI.spec);
    let time = await switchToTab(tab);
    TalosParentProfiler.pause("finish: " + tab.linkedBrowser.currentURI.spec);
    dump(`${tab.linkedBrowser.currentURI.spec}: ${time}ms\n`);
    times.push(time);

switchToTab resolves to the time it took (measured using Cu.now()) between the start of the tabswitch and when content is presented for the new tab.

Delaying and de-duplicating using idle callbacks helps both the test and reality, because now we actually run less stuff.

Using Services.tm.dispatchToMainThread and not de-duping will maybe help the test but will certainly not help CPU load in reality. So that's not actually a fix.

If using requestIdleCallback is not acceptable for UX reasons, I think we need a different solution.

(In reply to Marian Raiciof [:marauder] from comment #0)

3% startup_about_home_paint linux64-shippable-qr opt e10s stylo 1,226.62 -> 1,264.17

FWIW, do we know what caused this? Because my first guess here would be that this is the tab title update (ie through setTabLabel) rather than tabswitching, so any fix to only the latter callsite of updateTitlebar would also not address this regression.

My understanding was that updateTitlebar itself blocked part of tab switching, hence making tab switching slower; delaying that work wouldn't be cheating but actually help tab switching performance as encountered by real users. If the problem is repeated calls due to how the tabswitch test works then I think we should close this as wontfix and just wait for bug 1613705.

(In reply to Dão Gottwald [::dao] from comment #32)

If the problem is repeated calls due to how the tabswitch test works then I think we should close this as wontfix and just wait for bug 1613705.

Or stop using fluent I guess. I haven't looked at all the pros and cons so I don't have an opinion on this at this stage.

(In reply to Dão Gottwald [::dao] from comment #32)

My understanding was that updateTitlebar itself blocked part of tab switching

No, none of the calls to updateTitlebar await the promise except the webext one (which I don't think gets hit from the test), so the promise resolution in the test is already (ie pre-patch) not dependent on updateTitlebar completing. It's just that we spend cycles, while waiting for other work, and as a result that other work takes longer to complete.

(In reply to :Gijs (he/him) from comment #34)

(In reply to Dão Gottwald [::dao] from comment #32)

My understanding was that updateTitlebar itself blocked part of tab switching

No, none of the calls to updateTitlebar await the promise except the webext one (which I don't think gets hit from the test), so the promise resolution in the test is already (ie pre-patch) not dependent on updateTitlebar completing. It's just that we spend cycles, while waiting for other work, and as a result that other work takes longer to complete.

Okay, so the call doesn't immediately block the call site, but that still doesn't seem to mean that repeated calls are the problem. Based on what you're saying now it seems that we could just delay the work without tracking repeated calls. It still doesn't seem to me that the repeated call scenario is real enough to worry about.

(In reply to Dão Gottwald [::dao] from comment #35)

(In reply to :Gijs (he/him) from comment #34)

(In reply to Dão Gottwald [::dao] from comment #32)

My understanding was that updateTitlebar itself blocked part of tab switching

No, none of the calls to updateTitlebar await the promise except the webext one (which I don't think gets hit from the test), so the promise resolution in the test is already (ie pre-patch) not dependent on updateTitlebar completing. It's just that we spend cycles, while waiting for other work, and as a result that other work takes longer to complete.

Okay, so the call doesn't immediately block the call site, but that still doesn't seem to mean that repeated calls are the problem. Based on what you're saying now it seems that we could just delay the work without tracking repeated calls. It still doesn't seem to me that the repeated call scenario is real enough to worry about.

Right, the repeated calls probably aren't the issue - that's just what I thought before I looked at the test. I guess from comment #15 it just seemed like another thing we should avoid if the title update is expensive - if the user switches tabs 3 times quickly (e.g. from hitting ctrl-tab repeatedly), we don't need 3 title updates, one will do...

But idle callbacks make some attempt to avoid doing the work while we're still waiting for the rest of the tab switch to complete, which dispatchToMainThread doesn't, which means I'm unconvinced that doing that really helps here. It might help the numbers reported by this test a bit, depending on the phase of the moon etc. in terms of when that task gets scheduled vs. when the other bits of the tab switch get scheduled. But it won't make a difference for users - things will still be slower, on the whole, because fluent has meant we now just do more work here than the "dumb" string concatenation that was there before bug 1591328. In fact, similar things may be true for requestIdleCallback, though the tabswitch itself (in terms of showing the new content and updating the tabstrip) is probably more responsive in that scenario vs. using dispatchToMainThread - but the trade-off is that if the titlebar is visible, there'll be visible slowness there, and even without the titlebar visible, on Windows the same slowness may be visible in the taskbar etc.

(In reply to Dão Gottwald [::dao] from comment #33)

Or stop using fluent I guess. I haven't looked at all the pros and cons so I don't have an opinion on this at this stage.

This seems like the best option to me, unless we're convinced we actually need fluent's features for this string, in which case we probably want to come up with some way of making the parameter passing for the string less slow. I guess we could use formatValue and "manually" assign the result, which should reduce the amount of JSON marshalling that happens? Unsure if that'd be enough to fix the slowness here.

comment #13 said that rust migration didn't solve it? I don't understand what's changed.

Bug 1560038 is what I've been working on for the last couple of months; it migrates the FluentBundle/FluentResource classes (which perform parsing and resolving of messages) to Rust. That one, despite showing good performance wins in microbenchmarks, didn't result in talos wins.

Bug 1613705 builds on top of that, migrating the remaining Fluent JS bits to Rust. This makes us not use JS at all when localizing the UI, and it does show significant performance wins.

Can you link to what you're comparing with, ie where's the requestIdleCallback version?

Yeah,
m-c vs. requestIdleCallback: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a7805428e6a865e8bc90f51d0474c87203b61396&newProject=try&newRevision=e8be624b3111e3bf72104d03874c4fcaae41f69f&framework=1
m-c vs. dispatch: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3c4f604ec7c96d5f98c8c8c580dcca1b2a372fea&newProject=try&newRevision=f62c9f4abcff77ed79e201a1b1b83db394f884a0&framework=1

Really, at this point I wonder if we should just not use fluent for this. Cache the values for the brand name and (Private Browsing) and do string concatenation instead. Then this whole problem goes away.

This seems like the best option to me, unless we're convinced we actually need fluent's features for this string, in which case we probably want to come up with some way of making the parameter passing for the string less slow. I guess we could use formatValue and "manually" assign the result, which should reduce the amount of JSON marshalling that happens? Unsure if that'd be enough to fix the slowness here.

Looping in :flod for that option.
I can explore the formatValue approach as well.

Flags: needinfo?(gandalf) → needinfo?(francesco.lodolo)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #37)

Really, at this point I wonder if we should just not use fluent for this. Cache the values for the brand name and (Private Browsing) and do string concatenation instead. Then this whole problem goes away.

This seems like the best option to me, unless we're convinced we actually need fluent's features for this string, in which case we probably want to come up with some way of making the parameter passing for the string less slow. I guess we could use formatValue and "manually" assign the result, which should reduce the amount of JSON marshalling that happens? Unsure if that'd be enough to fix the slowness here.

Looping in :flod for that option.

I'm not sure what the question for me is, and there's a ton to parse through in this bug. On technical matters, Axel might also be a better fit.

If it's about dropping the use of Fluent for window titles, I'd like to understand what the alternative is. Going back to DTDs? For how long?

Flags: needinfo?(francesco.lodolo)

(In reply to Francesco Lodolo [:flod] from comment #38)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #37)

Really, at this point I wonder if we should just not use fluent for this. Cache the values for the brand name and (Private Browsing) and do string concatenation instead. Then this whole problem goes away.

This seems like the best option to me, unless we're convinced we actually need fluent's features for this string, in which case we probably want to come up with some way of making the parameter passing for the string less slow. I guess we could use formatValue and "manually" assign the result, which should reduce the amount of JSON marshalling that happens? Unsure if that'd be enough to fix the slowness here.

Looping in :flod for that option.

I'm not sure what the question for me is, and there's a ton to parse through in this bug. On technical matters, Axel might also be a better fit.

If it's about dropping the use of Fluent for window titles, I'd like to understand what the alternative is. Going back to DTDs? For how long?

In my head, what we'd do is use fluent to get the brand name and the "private browsing" suffix as custom data-* attributes on the window element, and then construct the title from those in JS by string concatenation.
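A minimal sketch of that approach, with hard-coded stand-ins for the values that would be fetched from Fluent once and cached (the names `cachedL10n` and `buildWindowTitle` are hypothetical, not actual Firefox code):

```javascript
// Hypothetical sketch: resolve the localized pieces once (hard-coded here),
// cache them, then build every subsequent window title with plain string
// concatenation, avoiding a Fluent round-trip per title change.
const cachedL10n = {
  brandShortName: "Firefox",           // would be resolved via Fluent at startup
  privateSuffix: "(Private Browsing)", // likewise
};

function buildWindowTitle(contentTitle, isPrivate) {
  let title = contentTitle
    ? `${contentTitle} - ${cachedL10n.brandShortName}`
    : cachedL10n.brandShortName;
  if (isPrivate) {
    title += ` ${cachedL10n.privateSuffix}`;
  }
  return title;
}
```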

FWIW, we store the brand name in a Term and you can't get Terms through the localization API w/out a Message that does a term reference. Depending on what we're paying for here, that may or may not be a problem.

And yes, there are good reasons for Terms being internal to the localization.

Here's the result with formatValue used instead of DOM - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=40b548c2c444fc586015bb176d8b3d30871003d2&newProject=try&newRevision=524e5584ec96a25fa3fca0d88fb924e10bee4d6a&framework=1

The perf wins are there, but since the distribution of results is bimodal, the Talos significance test is not accurate (we really should check whether the distribution is normal before deciding which statistical test to use...).

I suspect that bug 1613705 will help further, so maybe this is good enough to land for now?

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #41)

Here's the result with formatValue used instead of DOM - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=40b548c2c444fc586015bb176d8b3d30871003d2&newProject=try&newRevision=524e5584ec96a25fa3fca0d88fb924e10bee4d6a&framework=1

This still does a fluent trip every time there's a new title, which my suggestion in comment #39 was trying to avoid. Is there a reason not to do that? Did I miss some discussion?

Axel stated that we don't want to expose "just brand-name".

What I could do is have a string like "_TITLE_ { -brand-short-name } (Private Browsing)", get it via formatValue, store it as an attribute on the element, and .replace("_TITLE_", title) on each change.

I'd hate to do it, but well, I can.
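A sketch of that placeholder approach; the formatted pattern is hard-coded here, whereas in practice it would come from a single formatValue call and be stored on the element:

```javascript
// Format the localized pattern once, then substitute the page title on each
// change with a cheap string replace instead of another Fluent call.
const titlePattern = "_TITLE_ - Firefox (Private Browsing)"; // from one formatValue call

function applyTitle(pageTitle) {
  return titlePattern.replace("_TITLE_", pageTitle);
}
```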

The way I see it is that I think we all can see that there's no easy solution without tradeoffs, and there are three stakeholders - Pike, Gijs and Dao, each saying what we shouldn't do with me caught in the middle trying to find a way where the tradeoff is acceptable to all three :)

Is there a chance you three can sync and decide on the approach? I'm happy to write the code, but I feel like it's suboptimal for me to continue writing different versions of a patch just to learn that one of the stakeholders finds it a no-go, irrelevant of the numbers.

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #43)

Axel stated that we don't want to expose "just brand-name".

That's not how I read comment #40 ; I read that terms cannot be fetched by the API, which just seemed like a statement on a technical limitation that was there because of the intents around the use of terms in general. It'd be trivial to use (only) the brand name in a non-term string and fetch that, right? And I don't think of brand names as just any terms (in fact, given we can reference non-term identifiers, I wonder what the point is of brand names being terms at all...), and there are definitely already places in the codebase where the brand name gets used in isolation.

Either way, I didn't read that as objecting to the approach. Perhaps Axel can clarify if it was indeed an objection to my suggestion, and if so, why?

Flags: needinfo?(l10n)

Yeah, just wanted to note the technical limitations. No objection to any particular approach.

Question, though: AFAICT, the current code translates twice, once for the mutation observer, and once for translateElements? No idea if pauseObserving/resumeObserving around the setAttributes call would have any impact, though.
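The batching pattern being asked about would look roughly like this (a self-contained sketch: pauseObserving, resumeObserving, setAttributes, and translateElements are the real DOMLocalization method names, but the class here is a toy stand-in that just counts translations):

```javascript
// Toy stand-in for DOMLocalization: counts how many times the mutation
// observer path would retranslate an element.
class L10nSketch {
  constructor() {
    this.observing = true;
    this.translationCount = 0;
  }
  pauseObserving() { this.observing = false; }
  resumeObserving() { this.observing = true; }
  setAttributes(element, id, args) {
    element.l10nId = id;
    element.l10nArgs = args;
    // In the real code, a MutationObserver would fire here and queue a
    // translation; while paused, nothing is queued.
    if (this.observing) this.translationCount++;
  }
  translateElements(elements) {
    this.translationCount += elements.length;
  }
}

const titleElement = {};

// Without pausing: setAttributes triggers the observer AND we translate
// explicitly, so the element is translated twice.
const l10n = new L10nSketch();
l10n.setAttributes(titleElement, "browser-main-window-title", { title: "Example" });
l10n.translateElements([titleElement]);

// With pausing: only the explicit translateElements call does work.
const l10n2 = new L10nSketch();
l10n2.pauseObserving();
l10n2.setAttributes(titleElement, "browser-main-window-title", { title: "Example" });
l10n2.resumeObserving();
l10n2.translateElements([titleElement]);
```

In the first scenario the element is translated twice; in the second, only once, which is the double-translation Axel is pointing at.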

Flags: needinfo?(l10n)

Thank you Axel!

I gave it a spin to pause observing to see if it impacts the numbers: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1b9baf6045560d3f2a9e6e0c2d3fbb83cd12c568&newProject=try&newRevision=51704787bf0b2b6f7d9ecf2de191b91cbc205337&framework=1

My interpretation is that it does (consistent wins across all builds), but not enough to reach statistical significance. It's a safe change to land, so I may want to test it next against a reversion to see if it gets us to the point where reverting no longer shows up as significant.

If we want a bigger win, we can get it from switching to formatValue; if we want an even bigger win, we can switch to some version of what Gijs suggested in comment #44, or continue fiddling with idle callback coalescing or dispatching to threads.

I think there's a good number of options here. My preference would be to do the pause observing, combine it with the upcoming wins from bug 1560038, and be done. But if Gijs and Dao prefer something more substantial, I'll write the patch.

I think requestIdleCallback is essentially not the right tool to use here as we want to update the UI more or less immediately.

Attached image compare 6 scenarios

Right, removing it from contention, Dao! :)

Ok, I ran more tests to help us make a decision here. Here are the builds I ran:

  1. mozilla-central
  2. m-c + pauseObserving
  3. m-c + fluent-rs
  4. m-c + fluent-rs + pauseObserving
  5. m-c + fluent-rs + loc-rs
  6. reverting title change

I'm using Pike's talos compare to show all of them at once: https://pike.github.io/talos-compare/?revision=2987199163bc&revision=a551488594e1&revision=17d8a3714152&revision=d4c66bc198c6&revision=452cf9ddfa57&revision=74fc8e5852ff

Unfortunately, the UI doesn't allow preserving the filtering of the tests, so you'll need to remove (x) all tests but tabswitch to see the detailed view; I'm attaching a screenshot of the output.

Conclusions:
a) While the rustification of Fluent helps with tabswitch, it does not fix the whole regression.
b) pauseObserving matters on some platforms, but not all

Next steps:
a) I'm going to retest with formatValue straight
b) I'm going to retest with formatValue combined with caching some ##TITLE## - ... placeable as an attribute on the element and reusing it for updates.

Attachment #9124128 - Attachment is obsolete: true

Hi Zibi, do we have a clear picture now on how to proceed with this patch? Thank you!

Flags: needinfo?(gandalf)

Hi Zibi, do we have a clear picture now on how to proceed with this patch?

Nope, still investigating options.

Flags: needinfo?(gandalf)
Flags: needinfo?(l10n)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dao+bmo)

I'm only fully back tomorrow, but perhaps in the meantime you can answer a few questions I have off-hand:

  • is the twinopen regression not real?
  • does the tabpaint improvement get us the same win as backing out the regressing bug(s)? This should be easy to test on try, but it's hard to tell from just one compare link.
  • why do we need to compute hashes at all? Is it just a memory-saving technique? Why not just compare id and args?
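For illustration, comparing id and args directly instead of hashing could look like this (hypothetical helper names; the actual patch's cache layout may differ):

```javascript
// Hypothetical cache-invalidation check: instead of hashing id + args,
// compare them structurally against the previously formatted request.
function sameRequest(prev, next) {
  if (!prev || prev.id !== next.id) {
    return false;
  }
  const prevKeys = Object.keys(prev.args || {});
  const nextKeys = Object.keys(next.args || {});
  if (prevKeys.length !== nextKeys.length) {
    return false;
  }
  return prevKeys.every(key => prev.args[key] === next.args[key]);
}

let lastRequest = null;

// Returns true only when the (id, args) pair actually changed, so we skip
// redundant formatting work without computing any hash.
function shouldRetranslate(request) {
  const changed = !sameRequest(lastRequest, request);
  if (changed) lastRequest = request;
  return changed;
}
```

For the small argument objects used in window titles, a shallow comparison like this trades a little memory for skipping hash computation entirely.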
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gandalf)

Since I finally landed fluent-rs which already gives a small win here, I retested my formatValue + cache approach. Here are the results:

  1. m-c against full revert to deprecated DTD - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=e2528dfcc347928ac3ceecb6bfd7b4d18a755e10&newProject=try&newRevision=6061ae700a634b8a562ae8897cc1549484bd2ad8&framework=1

As you can see, shippable builds still have a 3-4% regression.

  2. reversion against formatValue+cache - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6061ae700a634b8a562ae8897cc1549484bd2ad8&newProject=try&newRevision=56b75bc4ea52efcd7ab638173d54a95407614af5&framework=1

Now the shippable builds show a regression of 1% or less.

You can think of (2) as if the patch with those changes had landed originally. I think the talos numbers look okayish with that patch. There's still a lot of noise, and it's not completely in line with DTD, but it's pretty close and seems to me like a decent compromise.

Gijs, Dao?

Flags: needinfo?(gandalf) → needinfo?(gijskruitbosch+bugs)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #54)

  2. reversion against formatValue+cache - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6061ae700a634b8a562ae8897cc1549484bd2ad8&newProject=try&newRevision=56b75bc4ea52efcd7ab638173d54a95407614af5&framework=1

Now the shippable builds show a regression of 1% or less.

Do you know why the non-shippable linux builds show a 4% tabswitch regression?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gandalf)

Do you know why the non-shippable linux builds show a 4% tabswitch regression?

No, I don't :(

The stdev is ~10%, so my guess is that if I were to run two talos runs with exactly the same changeset, I might get up to a 3-5% difference that could show up as significant.
The other thing is that, if I understand the test correctly, paint frames come into play, so it may be that a small variance is amplified into significance by results being clustered into one paint frame or the next.
Maybe it's worth asking the perf team again to investigate?

Flags: needinfo?(gandalf)

I guess we won't fix this for 75 either at this point.

I just looked at this again. I'm looking at this talos-compare: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6061ae700a634b8a562ae8897cc1549484bd2ad8&newProject=try&newRevision=56b75bc4ea52efcd7ab638173d54a95407614af5&framework=1 with uncertain results removed.

It shows improvements (compared to a backout) in tabpaint.

It still shows regressions in:

  • pdfpaint (1.15%)
  • sessionrestore (1.98% / 1.65%)
  • tabswitch (~4%, but on non-shippable builds only)

And in twinopen it shows mixed results - a potential regression on non-shippable, potential improvement on shippable - but both are uncertain. It also introduces significantly more noise on Windows (though it's unclear to me how actionable that is).

As a result, I don't think this caching patch fully addresses the issue.

It's now been 3.5 months since the regression originally happened. I think at this point, we should just back out, and re-do the conversion from scratch, keeping an eye on talos the whole time. This way, we can ensure that whatever we come up with doesn't regress things, and then Dão/Axel/me/you can review whatever conversion we end up doing once there's talos results indicating there's no regression.

It should be possible, once a revert has happened, to push a naive conversion to try and profile it aggressively to find out where we're actually paying additional costs, and then address that.

Flags: needinfo?(l10n)
Flags: needinfo?(dao+bmo)

As promised here's the backout patch. I put it next to the formatValue-cache patch.

Here's the comparison using Pike's talos compare - https://pike.github.io/talos-compare/?revision=41cf2e99d8b0&revision=46577a1c65c8&revision=9254c4b7e01b (you have to remove the tests you don't need to see tabswitch for example).

Here's the perfherder view:

m-c vs backout: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=41cf2e99d8b098a638278403d986820f386923d9&newProject=try&newRevision=46577a1c65c867a5ce8e5e0397c1cf5d5e8c2180&framework=1

backout vs. formatValue-cache: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=46577a1c65c867a5ce8e5e0397c1cf5d5e8c2180&newProject=try&newRevision=9254c4b7e01b5d76fd75281dbc7b08a28fee5cf7&framework=1

I understand the motivation for the backout, but, I'd like to take a moment here to talk about the culture of making decisions based on imperfect talos data:

Talos has been notoriously noisy and imperfect. It's a useful tool for us, but it's very far from a tool that reliably produces data we can blindly translate into "this impacts user experience". Many factors contribute to that, but the main ones are the level of noise, hidden factors (like differences between the cloud machines a test happens to run on), and non-normal distributions being measured as if they were normal.

The reason I brought it up here is that we spent 4 months, with a not-insignificant amount of time invested at least by me, and likely by others in this thread as well, chasing a statistically significant regression of ~4-6% (linux shippable) / ~2-3% (windows shippable) in a test with ~2.5% stdev. This test seems to be bi- or multimodal, which puts the significance results in question.


To be clear - I don't doubt that there is a regression. I am not sure how much we really are regressing and if it's worth reverting a migration from a deprecated technology to Fluent for that reason.



With all that in mind, I still think it was worth chasing this bug. The test may not be perfect, the validity of the results may be in question, and the value of fixing this regression may be less than the reaction suggests, but it's good to chase it and try to fix it.


What I struggle with is how absolutely the notion of "the regression has to be completely removed" is being treated in light of all those imperfections of the test.
The caching patch I provided reduces the regression almost completely to 0%, with a few exceptions on non-shippable builds spread across different tests and platforms.

We are left in a position of reverting the current patch and working on an "improved" patch to chase the remaining 0.2ms on linux and 0.1ms on windows on tabswitch, which fluctuate between runs.

With the level of stdev and hidden factors, I'm fairly confident that I could re-run all the try pushes a couple of times and find a combination where formatValue+cache doesn't produce any significant result.
In fact, the latest backout vs. formatValue+cache comparison almost looks like it.

I believe that our talos harness is not misleading us with false positives, but it has a tendency to let us exaggerate minor regressions (say, 0.5-1.5%) into significant ones (2.0-3.5%) and chase red herrings - not in the sense of non-existent regressions, but of minor ones treated as if they were major.
In other words, if we're ok paying 1.0% for migrating this to Fluent and moving on to work on things that will improve the Fluent integration with Gecko (see bug 1613705 comment 2 for potential wins), then I think this patch is there.

====

For that reason, while I'm offering the backout patch, I'm advocating for landing the formatValue+cache patch and considering this regression fixed enough. It's a zero-sum game and in the end, reducing technical debt is also a value that frees me to work on Fluent bindings performance.

NI's:

  • Gijs/Dao - can you look at the last talos results and either confirm that you want me to back out the patch and file a follow up to invest more time to improve the migration or that you agree with me on the formatValue+cache patch?
  • Pike - in case Gijs+Dao decide on the backout, can you confirm that we're ok with the backout?
  • mconley - can you look at the talos results and my take on the results and provide your take on how seriously you'd recommend us treating regressions of that magnitude (both the original regression and the reduced regression with the formatValue+cache patch)?

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #60)

Note to needinfo-ees: I'm snipping for brevity; please see Zibi's full comment #60.

As promised here's the backout patch. I put it next to the formatValue-cache patch.

Here's the comparison using Pike's talos compare - https://pike.github.io/talos-compare/?revision=41cf2e99d8b0&revision=46577a1c65c8&revision=9254c4b7e01b (you have to remove the tests you don't need to see tabswitch for example).

FWIW, I don't really understand how to read this. There is no x-axis labeling, and the numbers on either end have no units and also seem to make no sense (e.g. 0 on the left of most of the cpstartup runs - but not all of them, and exactly the same number (557) at the end?). Also, the scale on the x-axis is such that you cannot really see any detail for tests with small values (like tabswitch).

The reason I brought it up here is that we spent 4 months, with a not-insignificant amount of time invested at least by me, and likely by others in this thread as well, chasing a statistically significant regression of ~4-6% (linux shippable) / ~2-3% (windows shippable) in a test with ~2.5% stdev. This test seems to be bi- or multimodal, which puts the significance results in question.

I don't see this bi- or multi-modality in any of the links; either way, I suggest you file a separate bug on this issue in Testing :: Talos, with details on what you perceive to be the problem with the test, and suggestions as to how to fix it. If there's a perf cliff (e.g. based on whether we finish some operation before or after a refresh driver tick, or we end up doing a jank-y thing either before or after the test's completion point) resulting in multi-modality, and we could either transform the test results to account for that, or investigate and fix that perf cliff, that seems valuable, orthogonal to this bug.

To be clear - I don't doubt that there is a regression. I am not sure how much we really are regressing and if it's worth reverting a migration from a deprecated technology to Fluent for that reason.

We're not permanently reverting. We're reverting so we can re-migrate in a way that doesn't impact performance, or at least so we can understand what exactly the impact on performance is. You allude to talos being an imperfect mirror of user experience - which is true! - but at this point we do not seem to understand what the cause of the talos regression is, so we do not understand the perf impact, so it is also impossible to say what the actual impact on UX is. We've now spent nearly 4 months on this and still don't understand, so the safest course of action is to back out and figure that part out (or come up with a way of landing this switch that has no perf impact) without potentially negatively impacting performance for users. Really, we should probably have done this sooner - but we kept thinking "if only we do this other thing, maybe that'll fix the performance issues", and have been unsuccessful. I still have some ideas on how to improve things here, but we've spent so much time doing this that I don't want to pursue them without reverting first: Time to cut our losses.

What I struggle with is how absolutely the notion of "the regression has to be completely removed" is being treated in light of all those imperfections of the test.
The caching patch I provided reduces the regression almost completely to 0%, with a few exceptions on non-shippable builds spread across different tests and platforms.

perfherder still shows 1.7% and 2.89% regressions on tabswitch on shippable linux and windows builds, and importantly, we still don't understand what the reason for this is.

I believe that our talos harness is not misleading us with false positives, but it has a tendency to let us exaggerate minor regressions (say, 0.5-1.5%) into significant ones (2.0-3.5%) and chase red herrings - not in the sense of non-existent regressions, but of minor ones treated as if they were major.
In other words, if we're ok paying 1.0% for migrating this to Fluent and moving on to work on things that will improve the Fluent integration with Gecko (see bug 1613705 comment 2 for potential wins), then I think this patch is there.

I don't think we're OK with paying 2% (I don't really buy the "exaggerate minor regressions" thing - talos is a machine, it has no intentions or "tendencies". The regression is around 2% in this push, it was around 4% in the previous push, so if anything I think this push is undercounting it), when we should be able to do the migration with 0 performance cost.

For that reason, while I'm offering the backout patch, I'm advocating for landing the formatValue+cache patch and considering this regression fixed enough. It's a zero-sum game and in the end, reducing technical debt is also a value that frees me to work on Fluent bindings performance.

Well, the DTD implementation was very straightforward - there were a bunch of attributes, and whenever the title changed there were some straightforward operations based on those attributes and the content title, including some string concatenation, to produce the final title. We could do the same in fluent, but we haven't tried that.
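That DTD-era scheme can be sketched as follows (a simplified, self-contained sketch; the attribute names and literal strings are illustrative stand-ins for the real browser.xhtml attributes and DTD entities, and the real code handles more cases):

```javascript
// Simplified sketch of the DTD-style title computation: a handful of
// pre-localized attribute strings are concatenated around the content title.
// The literal strings stand in for what DTD entities would provide.
const titleAttributes = {
  titlemodifier: "Mozilla Firefox",      // stand-in for a titlemodifier entity
  separator: " - ",                      // separator between title parts
  privateSuffix: " (Private Browsing)",  // private-browsing marker
};

function computeWindowTitle(contentTitle, isPrivate) {
  let title = contentTitle
    ? contentTitle + titleAttributes.separator + titleAttributes.titlemodifier
    : titleAttributes.titlemodifier;
  if (isPrivate) {
    title += titleAttributes.privateSuffix;
  }
  return title;
}
```

Because the localized pieces are plain strings fetched once at parse time, each title change is just concatenation, which is the baseline the Fluent version is being measured against.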

NI's:

You didn't leave any needinfos; I'll set them.

  • Gijs/Dao - can you look at the last talos results and either confirm that you want me to back out the patch and file a follow up to invest more time to improve the migration or that you agree with me on the formatValue+cache patch?

I still think we should back out the patch and someone should file a follow-up to investigate the migration. You wrote elsewhere in this comment that you have other fluent-related work, and suggested in comment #56 that maybe we should ask the perf team. The actual fluent work here is very minimal (only a handful of strings), and the issue we're fixing is perf-related, so that seems like a reasonable way to proceed. I expect Mike or I could work on the follow-up.

  • Pike - in case Gijs+Dao decide on the backout, can you confirm that we're ok with the backout?
  • mconley - can you look at the talos results and my take on the results and provide your take on how seriously you'd recommend us treating regressions of that magnitude (both the original regression and the reduced regression with the formatValue+cache patch)?
Flags: needinfo?(mconley)
Flags: needinfo?(l10n)
Flags: needinfo?(dao+bmo)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #60)

  • Pike - in case Gijs+Dao decide on the backout, can you confirm that we're ok with the backout?

I know that there are tons of people that I don't represent. So I'm not speaking for "us".

The impact of a back-out on the localizations of Firefox is minimal. Thanks to Fennec keeping ESR 68 in cross-channel, we still have the original DTD strings around. So for anything in Pontoon and downstream from there, this should be a non-event.

Flags: needinfo?(l10n)

Also, the scale on the x-axis is such that you cannot really see any detail for tests with small values (like tabswitch).

Yeah, that's a UX issue with that tool. You need to remove all but tabswitch to see the values for it.

I don't see this bi- or multi-modality in any of the links;

Here's an example dataset from tabswitch linux64-qr: [8.61, 8.42, 8.62, 8.35, 8.42, 8.51, 8.65, 8.36, 8.45, 8.7, 8.44, 8.72, 8.48, 8.72, 8.68, 8.39, 8.71, 8.63, 8.49, 8.41]

Feeding it to https://www.gigacalculator.com/calculators/normality-test-calculator.php gives warnings of a potentially non-normal distribution on the Shapiro-Wilk, D'Agostino-Pearson and Anderson-Darling p-values.
You can also feed it here for visual analysis - http://www.statskingdom.com/320ShapiroWilk.html
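To illustrate, the D'Agostino-Pearson test those calculators run is built on sample skewness and excess kurtosis, which are easy to compute directly (this shows the descriptive moments only, not the full p-value; for this sample the excess kurtosis comes out strongly negative, around -1.5, which is typical of flat or bimodal data):

```javascript
// The tabswitch linux64-qr dataset quoted above.
const samples = [8.61, 8.42, 8.62, 8.35, 8.42, 8.51, 8.65, 8.36, 8.45, 8.7,
                 8.44, 8.72, 8.48, 8.72, 8.68, 8.39, 8.71, 8.63, 8.49, 8.41];

// Compute mean, stdev, skewness and excess kurtosis from the central moments.
// Strong deviations from 0 in skewness/excess kurtosis hint at non-normality.
function moments(data) {
  const n = data.length;
  const mean = data.reduce((a, b) => a + b, 0) / n;
  const central = p => data.reduce((a, b) => a + (b - mean) ** p, 0) / n;
  const variance = central(2);
  return {
    mean,
    stdev: Math.sqrt(variance),
    skewness: central(3) / variance ** 1.5,
    excessKurtosis: central(4) / variance ** 2 - 3,
  };
}

const m = moments(samples);
console.log(m);
```

A clearly negative excess kurtosis like this means the distribution is flatter than a normal curve, which is consistent with the two-cluster shape suspected above, even before any formal p-value is computed.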

I suggest you file a separate bug on this issue in Testing :: Talos, with details on what you perceive to be the problem with the test, and suggestions as to how to fix it.

It's been filed as bug 1425493, bug 1623168 and :ekyle is working on fixing it in bug 1601952.

I still have some ideas on how to improve things here, but we've spent so much time doing this that I don't want to pursue them without reverting first: Time to cut our losses.

Gotcha. Fair enough. I guess my position is subtly different (cut losses == land it and improve iteratively on the remaining bits), but it's your call :)

perfherder still shows 1.7% and 2.89% regressions on tabswitch on shippable linux and windows builds, and importantly, we still don't understand what the reason for this is.

Perfherder also claims that it's within the noise. It is possible for a test to show a persistent minor regression without it being an actual regression. That's what I was referring to as a red herring.

when we should be able to do the migration with 0 performance cost.

I don't know if we should. We are interpolating a string, providing proper localization rather than hardcoded concatenation, and we provide reactive l10n which can update the translation on locale switch via declarative bindings.
At the scale we're talking about, that probably has a cost (even just assigning the attributes on the elements).

Maybe what I'm claiming is that I wish the cost was acceptable for the benefits and I'm disappointed that it is considered not to be.

Flags: needinfo?(mconley)
Flags: needinfo?(dao+bmo)

Requested review on the backout.

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #63)

when we should be able to do the migration with 0 performance cost.

I don't know if we should. We are interpolating a string, providing proper localization rather than hardcoded concatenation, and we provide reactive l10n which can update the translation on locale switch via declarative bindings.
At the scale we're talking about, that probably has a cost (even just assigning the attributes on the elements).

Maybe what I'm claiming is that I wish the cost was acceptable for the benefits and I'm disappointed that it is considered not to be.

We can provide the reactive l10n fairly easily by responding to an event / observer message (once we switch away from DTD in any way/shape/form). I'm not convinced fluent is able to do anything else for us here apart from a convenient way of specifying how the concatenation happens (ie prefix + suffix around content title) - after all, there is no semantic information attached to the content title (gender, number, etc.) that the locale can actually use here to adapt the rest of the string, so I don't think there's an actual gain in expressiveness.

There's also a risk in terms of allowing the localization to e.g. omit the brand name, if we think that it needs to be there on win/linux, so a potential for locale-specific bugs...

Anyway, if I'm missing something there and there are other compelling reasons, we can discuss more in bug 1591328 (or a fresh bug; I don't mind which) once this backout lands. If there's some benefit to having fluent handle the concatenation that we can't get any other way, and there's no way to make this performant and we actually understand where the perf loss comes from, I could be convinced -- but right now, I don't believe any of those are the case, and I think we should continue the discussion without continuing to take the current perf consequences on builds we ship to users.

Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/16d82f57e0fb
Revert Window Title to use DTD. r=fluent-reviewers,Gijs
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5db76e42440f
Revert Window Title to use DTD. r=fluent-reviewers,Gijs
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 76
Flags: needinfo?(gandalf)
Regressions: 1611645
Attachment #9132335 - Attachment is obsolete: true