Closed Bug 1810739 Opened 2 years ago Closed 2 years ago

Telemetry hurts performance of creating lots of tabs

Categories

(Firefox :: Tabbed Browser, defect)

defect

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 --- wontfix
firefox114 --- fixed

People

(Reporter: Oriol, Assigned: beth)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf:frontend, regression)

Attachments

(1 file)

Open lots of tabs, e.g. 2500.

var t0 = performance.now();
var tabs = gBrowser.addMultipleTabs(true, 0, new Array(2500).fill({}));
var t1 = performance.now();
gBrowser.removeTabs(tabs);
var t2 = performance.now();
console.log("Total:", t2 - t0, ". Add:", t1 - t0, ". Remove:", t2 - t1);

Note gBrowser.addMultipleTabs is the API used to restore the previous session, so it's important to optimize it so that Firefox can launch fast.

But it's slow, and one of the reasons is that, when opening each one of these tabs, telemetry runs this: https://searchfox.org/mozilla-central/rev/5bcbe6ae54ee5b152f6cef29dc5627ec7c0e1f1e/browser/modules/BrowserUsageTelemetry.jsm#523

function getOpenTabsAndWinsCounts() {
  let loadedTabCount = 0;
  let tabCount = 0;
  let winCount = 0;

  for (let win of Services.wm.getEnumerator("navigator:browser")) {
    winCount++;
    tabCount += win.gBrowser.tabs.length;
    for (const tab of win.gBrowser.tabs) {
      if (tab.getAttribute("pending") !== "true") {
        loadedTabCount += 1;
      }
    }
  }

  return { loadedTabCount, tabCount, winCount };
}

Note that the tab cache is invalidated when a tab opens, so win.gBrowser.tabs needs to recompute it for every tab, and then for (const tab of win.gBrowser.tabs) makes it quadratic. Is loadedTabCount really necessary? Or maybe the number of pending tabs can be cached?

And it seems to me that this doesn't need to happen synchronously for each new tab, the code could be delayed one tick so that it only runs once at the end.

:barret, since you are the author of the regressor, bug 1634508, could you take a look? Also, could you set the severity field?

For more information, please visit auto_nag documentation.

Flags: needinfo?(brennie)

Set release status flags based on info from the regressing bug 1634508

Actually I was wrong, the telemetry observer is only added after restoring the session, so session restore is not affected.
Still, other things that create multiple tabs synchronously (e.g. moving a tab multiselection into another window) are affected.

Summary: Telemetry hurts performance of restoring a big session → Telemetry hurts performance of creating lots of tabs
Flags: needinfo?(brennie)
Assignee: nobody → brennie
Status: NEW → ASSIGNED
Pushed by brennie@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/91a18abcd0e2 Debounce tab count metric collection r=Gijs

Have you checked the failure? I guess await BrowserTestUtils.openNewForegroundTab(gBrowser, "about:blank") may resolve too fast if about:blank is cached...

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:barret, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(brennie)

I cannot reproduce the failures locally. Try run for a potential fix: https://treeherder.mozilla.org/jobs?repo=try&revision=438b722ce4740847968429881aa827bbfa36a4ff

Flags: needinfo?(brennie)
Flags: needinfo?(gijskruitbosch+bugs)

Latest try run: https://treeherder.mozilla.org/jobs?repo=try&revision=ff0e3ff100f3551f31a2837afddb22b4f63475c9
I can't seem to get the tests to pass consistently.

:Gijs do you have any ideas?

Flags: needinfo?(gijskruitbosch+bugs)

Have you tried opening a tab different than about:blank?

(In reply to Barret Rennie [:barret] (they/them) from comment #10)

Latest try run: https://treeherder.mozilla.org/jobs?repo=try&revision=ff0e3ff100f3551f31a2837afddb22b4f63475c9
I can't seem to get the tests to pass consistently.

:Gijs do you have any ideas?

At least on my Windows machine with --verify, I saw issues relating to two problems:

  • opened tabs where we load a second URL and I suspect the browserLoaded promise resolves with about:blank instead of the initial pageload
  • opened windows where we don't wait for the content page to load as well. BTU has a flag for this but it takes a URL.

I wrote a patch that seems to have addressed these locally, and am trying to push that to try:

https://treeherder.mozilla.org/jobs?repo=try&revision=1e0602936547779ddfee822b4566db5d91012e9b

Fingers crossed.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to :Gijs (he/him) from comment #12)

(In reply to Barret Rennie [:barret] (they/them) from comment #10)

Latest try run: https://treeherder.mozilla.org/jobs?repo=try&revision=ff0e3ff100f3551f31a2837afddb22b4f63475c9
I can't seem to get the tests to pass consistently.

:Gijs do you have any ideas?

At least on my Windows machine with --verify, I saw issues relating to two problems:

  • opened tabs where we load a second URL and I suspect the browserLoaded promise resolves with about:blank instead of the initial pageload
  • opened windows where we don't wait for the content page to load as well. BTU has a flag for this but it takes a URL.

I wrote a patch that seems to have addressed these locally, and am trying to push that to try:

https://treeherder.mozilla.org/jobs?repo=try&revision=1e0602936547779ddfee822b4566db5d91012e9b

Fingers crossed.

This has helped (green on Windows) but is still orange on Linux debug/tsan/asan (but not on TV for those jobs!? Oh look, they didn't run any tests - I filed bug 1829314.)

It looks like the next issue is that with the deferred task, the job of doing the counting is scheduled after a timer of 0s and an idle dispatch with no limit. It would appear that on these platforms, the idle task can't find a slot to run inbetween this code:

  let win = await BrowserTestUtils.openNewBrowserWindow();
  await BrowserTestUtils.firstBrowserLoaded(win);
  openedTabs.push(
    await BrowserTestUtils.openNewForegroundTab(win.gBrowser, "about:blank")
  );
  openedTabs.push(
    await BrowserTestUtils.openNewForegroundTab(win.gBrowser, "about:blank")
  );
  openedTabs.push(
    await BrowserTestUtils.openNewForegroundTab(gBrowser, "about:blank")
  );
  // The new window started with a new tab, so account for it.
  expectedTabOpenCount += 4;
  expectedWinOpenCount += 1;
  expectedMaxWins = 2;
  expectedMaxTabs += 4; /* This sets it to 8 */

  // Remove a tab from the first window, the max shouldn't change.
  BrowserTestUtils.removeTab(openedTabs.pop());
  checkScalars({
    maxTabs: expectedMaxTabs,  /* this fails beceause we get 7 instead of 8 */
    tabOpenCount: expectedTabOpenCount,
    maxWindows: expectedMaxWins,
    windowsOpenCount: expectedWinOpenCount,
    totalURIs: expectedTotalURIs,
    domainCount: 0,
    totalUnfilteredURIs: 0,
    maxTabsPinned: expectedMaxTabsPinned,
    tabPinnedCount: expectedTabPinned,
    totalURIsNormalAndPrivateMode: expectedTotalURIs,
  });

and I now see that my attempt to fix this was sort of a hackaround (even though it looks like it isn't). The BrowserUsageTelemetry code doesn't care at all about whether the tab loads, so we don't need to wait for that. When this bit of the test fails, the removeTab runs before we find an idle slot to record anything to telemetry, and so we record 7 tabs instead of 8.

Short of overriding the entire _onTabsOpenedTask with one that does set a timeout limit for the idle task (and then waiting a bit longer in the test), I don't see an "easy" way of fixing this. The "correct" fix is finding some way of ensuring that the idle task has run, and awaiting that in the test. DeferredTask exposes finalize but that prevents the deferred task from being used again afterwards (it's intended for shutdown use). We'd probably need to add a testOnlyWaitForTaskToRun method on the deferred task that returns a promise, or something. You could also sprinkle waitForCondition everywhere but that's ugly and might well cause intermittent failures.

It's perhaps worth noting that the fact this test works at all after your patch is probably because openNewForegroundTab will wait 300ms each time, because it waits for the tab to be switched to, and that waits 300ms: bug 1702637. Changing it to not do that causes a lot of orange - it's on my backburner list to fix but it exposes other issues in our code so I haven't gotten around to it. Anyway, the 300ms wait probably allows for the code to find an "idle" slot to do the telemetry work on fast/opt builds in the general case. But Linux debug/tsan/asan are all slow enough that this isn't the case.

The same issue is happening in the histogram part of the test, further down, ie here - it looks like your patch added one waitForCondition already but all the other "do thing, then check histogram" bits are also racy, for the same reason as outlined above, AFAICT.

Hopefully that is still helpful, even if it's not a cut and dry solution?

Flags: needinfo?(brennie)
Flags: needinfo?(brennie)
Pushed by brennie@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d3fbbe0fbae8 Debounce tab count metric collection r=Gijs
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch
Regressions: 1829478
Regressions: 1829464
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: