Closed Bug 1496790 Opened 6 years ago Closed 6 years ago

46.39 - 46.96% tp5n main_normal_fileio (windows7-32, windows7-32-msvc) regression on push cd560f5cff2dd1cebcaa462a207a340011e91f1c (Wed Sep 26 2018)

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(firefox63+ wontfix)

RESOLVED WONTFIX
Tracking Status
firefox63 + wontfix

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=cd560f5cff2dd1cebcaa462a207a340011e91f1c

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 47%  tp5n main_normal_fileio windows7-32-msvc opt e10s stylo     1,850,205.67 -> 2,719,120.67
 46%  tp5n main_normal_fileio windows7-32 opt e10s stylo          1,931,351.67 -> 2,827,393.00


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=16235

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
we are unable to bisect/retrigger on tree, so there are a handful of pushes:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=b94d1d0eaf9e5b062ee94b62e0456d06aedbd842&tochange=cd560f5cff2dd1cebcaa462a207a340011e91f1c

here are the changes that are not backed out or anyway related:
Julien Cristau — No bug - post beta 9, disable EARLY_BETA_OR_EARLIER. a=pascalc
Kris Maglione — Bug 1464938 - Add extension lifecycle state information to shutdown blocker for better diagnostics. r=aswan, a=pascalc
Tom Prince — Bug 1486224: [fetch-content] Retry downloads when fetching content; r=gps, a=release


I don't know what [fetch-content] is, and the no bug from Julien just changed EARLY_BETA_OR_EARLIER=.  of course, what does EARLY_BETA_OR_EARLIER do to the code base or build process (63 instances in tree: https://searchfox.org/mozilla-central/search?q=EARLY_BETA_OR_EARLIER&case=false&regexp=false&path=)

I will expect each author here to help determine if their changes are causing file IO access on windows 7 (we don't measure on windows 10 for 63) to increase by ~1M calls.
Flags: needinfo?(mozilla)
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(jcristau)
I am not sure how to push beta to try successfully, in the past I haven't been successful.  :ryanvm do you have tips on how to push m-b to try and get the same results?
Flags: needinfo?(ryanvm)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #1)
> Kris Maglione — Bug 1464938 - Add extension lifecycle state information to
> shutdown blocker for better diagnostics. r=aswan, a=pascalc

I can't imagine how this could possibly affect file IO.
Flags: needinfo?(kmaglione+bmo)
fetch-content is infra-related, so shouldn't affect test results.
Flags: needinfo?(mozilla)
Flags: needinfo?(ryanvm) → needinfo?(aryx.bugmail)
[Tracking Requested - why for this release]: this is a regression that showed up last week on the beta branch only- we should figure this out and keeping eyes on this is important.
To simulate beta pushes:
Import at least the "base config" and "trunk as early beta" patches from https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed,busted,exception,retry,usercancel,runnable&revision=bb1b4427f68438462dfea42276905e79f5198787
Then push to Try.

The other 2 patches reduce error messages for wpt and T-e10s(x) tests.
Flags: needinfo?(aryx.bugmail)
bisecting on try this is related to:
https://hg.mozilla.org/releases/mozilla-beta/rev/cd560f5cff2d

there is no bug, just a change to this:
--- a/build/defines.sh
+++ b/build/defines.sh
@@ -1,3 +1,3 @@
 # Define indicating that this build is prior to one of the early betas. To be
 # unset mid-way through the beta cycle.
-EARLY_BETA_OR_EARLIER=1
+EARLY_BETA_OR_EARLIER=

I assume this is a wontfix, but I will leave that decision up to relman :)
Flags: needinfo?(jcristau)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #7)
> bisecting on try this is related to:
> https://hg.mozilla.org/releases/mozilla-beta/rev/cd560f5cff2d
> 
> there is no bug, just a change to this:
> --- a/build/defines.sh
> +++ b/build/defines.sh
> @@ -1,3 +1,3 @@
>  # Define indicating that this build is prior to one of the early betas. To
> be
>  # unset mid-way through the beta cycle.
> -EARLY_BETA_OR_EARLIER=1
> +EARLY_BETA_OR_EARLIER=
> 
> I assume this is a wontfix, but I will leave that decision up to relman :)

In general I'd expect removing this flag to improve performance. We should try to figure out what it changes to cause a regression.
Is this new from 63?  The graph only seems to go back to the end of August.

Looking through all uses of that macro...

> browser/components/newtab/lib/ActivityStream.jsm:    value: AppConstants.EARLY_BETA_OR_EARLIER,

some pre-release telemetry

> browser/app/profile/firefox.js:#ifdef EARLY_BETA_OR_EARLIER

browser.tabs.multiselect gets disabled

> browser/app/profile/firefox.js:#ifdef EARLY_BETA_OR_EARLIER

browser.newtabpage.activity-stream.improvesearch.topSiteSearchShortcuts (bug 1495517)

> browser/app/profile/firefox.js:#ifndef EARLY_BETA_OR_EARLIER

geo.wifi.uri (MLS vs GLS)

> browser/app/profile/firefox.js:#ifdef EARLY_BETA_OR_EARLIER

browser.contentblocking.reportBreakage.enabled

> dom/html/HTMLInputElement.cpp:#ifdef EARLY_BETA_OR_EARLIER

telemetry

> ipc/glue/ProtocolUtils.h:#ifdef EARLY_BETA_OR_EARLIER
> ipc/glue/MessageChannel.cpp:#ifdef EARLY_BETA_OR_EARLIER
> ipc/glue/BackgroundChildImpl.h:#ifdef EARLY_BETA_OR_EARLIER
> ipc/glue/BackgroundChildImpl.cpp:#ifdef EARLY_BETA_OR_EARLIER

support for some telemetry collection

> media/mtransport/nr_socket_prsock.cpp:#if !EARLY_BETA_OR_EARLIER
> media/mtransport/nr_socket_prsock.cpp:#if !EARLY_BETA_OR_EARLIER

related to bug 1013007

> modules/libpref/init/StaticPrefList.h:#ifdef EARLY_BETA_OR_EARLIER

layout.css.moz-document.url-prefix-hack.enabled; not changed in a while

> modules/libpref/init/StaticPrefList.h:#ifdef EARLY_BETA_OR_EARLIER

layout.css.xul-box-display-values.content.enabled; bug 1477553

> modules/libpref/init/StaticPrefList.h:#ifdef EARLY_BETA_OR_EARLIER

layout.css.xul-tree-pseudos.content.enabled; bug 1480054

> modules/libpref/init/all.js:#ifdef EARLY_BETA_OR_EARLIER

dom.keyboardevent.dispatch_during_composition; bug 1446401

> modules/libpref/init/all.js:#ifdef EARLY_BETA_OR_EARLIER

editor.resizing.enabled_by_default
editor.inline_table_editing.enabled_by_default
editor.positioning.enabled_by_default
(bug 1449564)

> modules/libpref/init/all.js:#ifdef EARLY_BETA_OR_EARLIER
> modules/libpref/init/all.js:#else // #ifdef EARLY_BETA_OR_EARLIER
> modules/libpref/init/all.js:#endif // #ifdef EARLY_BETA_OR_EARLIER #else

intl.ime.hack.on_any_apps.fire_key_events_for_composition, android only

> modules/libpref/init/all.js:#ifdef EARLY_BETA_OR_EARLIER

font.name-list.sans-serif.zh-TW; bug 1394709

> modules/libpref/init/all.js:#ifdef EARLY_BETA_OR_EARLIER

intl.tsf.hack.allow_to_stop_hacking_on_build_17643_or_later; bug 1481153

> modules/libpref/init/all.js:#if !defined(EARLY_BETA_OR_EARLIER)

network.tcp.tcp_fastopen_enable; bug 1431738

> security/sandbox/win/SandboxInitialization.cpp:#if defined(_X86_) && (defined(EARLY_BETA_OR_EARLIER) || defined(DEBUG))

disable handle verifier; bug 1453929 / bug 1452090

> services/sync/services-sync.js:#ifdef EARLY_BETA_OR_EARLIER

services.sync.engine.bookmarks.validation.enabled (bug 1412332)
services.sync.engine.passwords.validation.enabled (bug 1385127)

> widget/android/nsAppShell.h:#ifdef EARLY_BETA_OR_EARLIER
> widget/android/nsAppShell.h:#ifdef EARLY_BETA_OR_EARLIER

android code, irrelevant

> xpcom/threads/SharedThreadPool.cpp:#ifdef EARLY_BETA_OR_EARLIER

some extra printf in SharedThreadPoolShutdownObserver::Observe (bug 1351655)
here is a link to the files accessed before/after (all file IO, not just main thread and not just after startup - also look at the two tabs):
https://docs.google.com/spreadsheets/d/144dBviaxy8HDc5mXYmdTQ_SRyPdso5BKpPhOFazSA64/edit?usp=sharing

I assume one of the 63 instances of EARLY_BETA_OR_EARLIER helps contribute to this:
https://searchfox.org/mozilla-central/search?q=EARLY_BETA_OR_EARLIER&path=

the files don't give me a good idea of what is changing.
I ran a bunch of things through try to try and figure out what causes the slowdown.  TL;DR: the biggest change seems to come from font.name-list.sans-serif.zh-TW.

First, to make sure I could reproduce Joel's results, current beta vs early-beta:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=271a31dfcefdcdac782d65e111cef2840b15103a&framework=1

With browser.newtabpage.activity-stream.improvesearch.topSiteSearchShortcuts=true:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=446b23aded17&framework=1&filter=main_normal_fileio

With dom.keyboardevent.dispatch_during_composition=true:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=326403e69de9&framework=1&filter=main_normal_fileio

With layout.css.moz-document.url-prefix-hack.enabled=false:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=bc23b0d5696c&framework=1&filter=main_normal_fileio

With layout.css.xul-box-display-values.content.enabled=false:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=b489c507690e&framework=1&filter=main_normal_fileio

With layout.css.xul-tree-pseudos.content.enabled=false:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=a9c902285c6a&framework=1&filter=main_normal_fileio

With browser.tabs.multiselect=true:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=3841ca40f350&framework=1&filter=main_normal_fileio

With the media/mtransport/nr_socket_prsock.cpp early beta code:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=029922f5b68e&framework=1&filter=main_normal_fileio

With the editor prefs set to early beta:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=4d3037c9bae3&framework=1&filter=main_normal_fileio

With the added font in font.name-list.sans-serif.zh-TW:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=07975ec470b4&framework=1&filter=main_normal_fileio

With intl.tsf.hack.allow_to_stop_hacking_on_build_17643_or_later=true:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=02c635ae5bea&framework=1&filter=main_normal_fileio

With network.tcp.tcp_fastopen_enable=true:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newRevision=a69114b4c638&framework=1&filter=main_normal_fileio
(In reply to Julien Cristau [:jcristau] from comment #12)
> I ran a bunch of things through try to try and figure out what causes the
> slowdown.  TL;DR: the biggest change seems to come from
> font.name-list.sans-serif.zh-TW.
> 
[...]
> With the added font in font.name-list.sans-serif.zh-TW:
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=bd81bb1d6a46&newProject=try&newR
> evision=07975ec470b4&framework=1&filter=main_normal_fileio

Kato-san, are we tracking removing the EARLY_BETA_OR_EARLIER ifdef for that change?
Flags: needinfo?(m_kato)
See Also: → 1394709
Wontfix for 63 as this is not exactly a regression since the performance change is due to a change in loaded fonts for zh-TW between early and late betas.
(In reply to Julien Cristau [:jcristau] from comment #12)
> I ran a bunch of things through try to try and figure out what causes the
> slowdown.  TL;DR: the biggest change seems to come from
> font.name-list.sans-serif.zh-TW.

Does this affect non-zh-TW builds?
this is occuring on en-US builds that we run in automation.
(In reply to Andrew Overholt [:overholt] from comment #15)
> (In reply to Julien Cristau [:jcristau] from comment #12)
> > I ran a bunch of things through try to try and figure out what causes the
> > slowdown.  TL;DR: the biggest change seems to come from
> > font.name-list.sans-serif.zh-TW.
> 
> Does this affect non-zh-TW builds?

Yes, example, when lang attribute of element is zh-TW, font selection uses zh-TW preference for Kanji.

(In reply to Julien Cristau [:jcristau] from comment #13)
> Kato-san, are we tracking removing the EARLY_BETA_OR_EARLIER ifdef for that
> change?

OK, I will change default font for zh-TW on all channel since no regression report after this change.
Flags: needinfo?(m_kato)
Depends on: 1498438
given this, I think this bug should be marked as worksforme or wontfix
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
See Also: → 1573126
You need to log in before you can comment on or make changes to this bug.