2.72% Linux* tp5o regression on Mozilla-Inbound-Non-PGO (v.41) on May 12, 2015 from push d6d25651b082

RESOLVED FIXED in Firefox 41

Status

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: vaibhav1994, Assigned: jfkthame)

Tracking

unspecified
mozilla41
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41 fixed)

Details

Attachments

(2 attachments)

Talos has detected a Firefox performance regression from your commit d6d25651b082 in bug 1056479.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=d6d25651b082&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux -u none -t tp5o  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tp5o

Making a decision:
As the patch author we need your feedback to help us handle this regression.

Our wiki page oulines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
John, the revisions causing the regression is one of: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=d6d25651b082

Here are the re-triggered pushes on mozilla-inbound-non-pgo: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=2d8698ff73b6&tochange=c48d5ce78376&filter-searchStr=Ubuntu%20HW%2012.04%20x64%20mozilla-inbound%20talos%20tp5o

Note that, it is not necessary to backout the offending patches. Since it is 2.72% only, it will be good if we can fix it else we can mark it as wontfix if it is too time consuming.
Depends on: 1056479
Flags: needinfo?(jdaggett)
Assignee: nobody → jdaggett
Flags: needinfo?(jdaggett)
(In reply to Vaibhav (:vaibhav1994) from comment #1)
> John, the revisions causing the regression is one of:
> http://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?changeset=d6d25651b082

Yes, this is the change in the font backend for Linux. We're now using a larger amount of shared code across platforms with less Linux-specific code.

I'll take and look and see how we can tune this a bit.
That sounds great, thanks John!
Blocks: 1056479
No longer depends on: 1056479
John, any luck with this?  It has been a while.
Flags: needinfo?(jdaggett)
(In reply to Joel Maher (:jmaher) from comment #4)
> John, any luck with this?  It has been a while.

I was hoping some of the other fixes would bring down the tp5o numbers but it looks like it didn't. I'll do some more tuning.
Flags: needinfo?(jdaggett)
Hey :jtd, I am checking in to see if this is still an active bug for you or not.  If there is a timeline to work on this great, if not- I prefer to close this as wontfix.
Flags: needinfo?(jdaggett)
To get a 2.7% regression on tp5o, it seems like the underlying regression in the font backend must be considerably greater, as font management/selection is only a small part of tp5o's overall work. So I really hope we can identify what's behind this (maybe profiling a Linux tp5o run would identify hotspots in the font code?) and find a way to improve it.
I pushed a try run with profiling, and from looking at a few of the resulting profiles, it appears that the FcConfigSubstitute() calls in gfxFcPlatformFontList::FindFamily can be pretty expensive. E.g. in 

http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/6768d1d0db0589a8b56f292293f99b3ab65fb5d115b15cd6671b24a1755f4b28f63abab161c91e0cc956678d36ee8da0af6904cc1a8be2dd39b411ed8d9eef44&pathInZip=profile_tp5o/huffingtonpost.com/cycle_0.sps

it looks like 3% of the total running time is spent in those calls alone.

So if we can find some alternative that avoids calling FcConfigSubstitute (twice, even!), this should help significantly. Or at least cache the sentinelFirstFamily instead of recomputing it each time, and only invalidate the cached value if the font list changes. That should virtually halve the cost of FindFamily().
oh awesome stuff!
This should provide a small but (probably) measurable improvement, afaict; though if we could avoid the other FcConfigSubstitute call too, that would be more better.
Attachment #8616179 - Flags: review?(jdaggett)
Tryserver tp5o run with the patch above gives an average of 304.213:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=954c9ddefc70

Tryserver tp5o run on the parent of the above (i.e. without the patch), average 314.47:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=de0e7394c67d

Which suggests an improvement of slightly over 3%. :)
well done!
Comment on attachment 8616179 [details] [diff] [review]
Cache the result of calling FcConfigSubstitute for our sentinel font name, to make gfxFcPlatformFontList::FindFamily less expensive

Review of attachment 8616179 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8616179 - Flags: review?(jdaggett) → review+
We should be able to get an additional boost of a similar amount by caching the actual familyname lookups in FindFamily, so that we can bypass the expensive FcConfigSubstitute call on any subsequent lookups for the same family name. Tryserver job https://treeherder.mozilla.org/#/jobs?repo=try&revision=23c1f410f8ed shows a tp5o average of 292.14 with this added patch.
Attachment #8616771 - Flags: review?(jdaggett)
Assignee: jdaggett → jfkthame
Status: NEW → ASSIGNED
(In reply to Jonathan Kew (:jfkthame) from comment #15)

> tp5o average of 292.14 with this added patch.

Err... I mean an average of 293.574, but that's still worthwhile. :)
(In reply to Jonathan Kew (:jfkthame) from comment #16)
> (In reply to Jonathan Kew (:jfkthame) from comment #15)
> 
> > tp5o average of 292.14 with this added patch.
> 
> Err... I mean an average of 293.574, but that's still worthwhile. :)

The tp5o numbers tend to vary widely, it's actually better to use the PGO build numbers because those tend to vary a whole lot less and regressions/fixes are easier to see.
Flags: needinfo?(jdaggett)
Comment on attachment 8616771 [details] [diff] [review]
patch 2 - Cache family-name lookups in gfxFcPlatformFontList::FindFamily, to avoid repeating expensive calls to FcConfigSubstitute

Review of attachment 8616771 [details] [diff] [review]:
-----------------------------------------------------------------

>      // Example:
>      //
>      //   serif ==> DejaVu Serif, ...
>      //   Helvetica, serif ==> Helvetica, TeX Gyre Heros, Nimbus Sans L, DejaVu Serif
>      //
>      // In this case fontconfig is including Tex Gyre Heros and
>      // Nimbus Sans L as alternatives for Helvetica.
>  
> +    NS_ConvertUTF16toUTF8 familyToFind(familyName);
> +    gfxFontFamily* cached = mFcSubstituteCache.GetWeak(familyToFind);
> +    if (cached) {
> +        return cached;
> +    }
> +

We should probably rewrite the comment here to explain what's going on
below, since at this point that comment is outdated.
Attachment #8616771 - Flags: review?(jdaggett) → review+
https://hg.mozilla.org/mozilla-central/rev/3324527b41cc
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
verified on graph server!
FTR, the first landing here resulted in:

>>>>>
Improvement: Mozilla-Inbound - Tp5 Optimized - Ubuntu HW 12.04 x64 - 2.91% decrease
-----------------------------------------------------------------------------------
    Previous: avg 244.216 stddev 1.925 of 12 runs up to revision edaec128d6ec
    New     : avg 237.105 stddev 1.470 of 12 runs since revision e03144903268
    Change  : -7.112 (2.91% / z=3.694)
    Graph   : http://mzl.la/1I0GTZm

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=edaec128d6ec&tochange=e03144903268
<<<<<

and then the second patch gave us in addition:

>>>>>
Improvement: Mozilla-Inbound-Non-PGO - Tp5 Optimized - Ubuntu HW 12.04 - 2.73% decrease
---------------------------------------------------------------------------------------
    Previous: avg 348.147 stddev 1.815 of 12 runs up to revision 5389f1e4087c
    New     : avg 338.638 stddev 2.570 of 12 runs since revision 0c35e99b2167
    Change  : -9.509 (2.73% / z=5.240)
    Graph   : http://mzl.la/1GyBNsw

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5389f1e4087c&tochange=0c35e99b2167
<<<<<
Depends on: 1173826
Depends on: 1174946
You need to log in before you can comment on or make changes to this bug.