Closed Bug 1298608 Opened 8 years ago Closed 8 years ago

2.63 - 20.65% sessionrestore_no_auto_restore / tart / tp5n main_normal_fileio / tp5o Main_RSS / tp5o Private Bytes / tp5o_scroll / tsvgr_opacity (windows7-32, windowsxp) regression on push 209072396aa5ab5c5a0a28109a980dbbcd884922 (Thu Aug 25 2016)

Categories

(Core :: Graphics, defect)

51 Branch
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox51 --- affected

People

(Reporter: jmaher, Assigned: mchang)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push 209072396aa5ab5c5a0a28109a980dbbcd884922. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 21%  tp5n main_normal_fileio windows7-32 pgo          52101054.17 -> 62861425.83
 18%  tp5n main_normal_fileio windows7-32 opt          52715503.92 -> 62263896.33
 17%  tart summary windowsxp opt                       6.03 -> 7.07
 15%  tsvgr_opacity summary windows7-32 opt            695 -> 797.92
 15%  tsvgr_opacity summary windowsxp opt              689.07 -> 789.59
 14%  tp5o_scroll summary windowsxp pgo                3.17 -> 3.62
 14%  tart summary windowsxp pgo                       4.72 -> 5.39
 14%  tsvgr_opacity summary windowsxp opt              686.62 -> 782.7
 10%  tp5o_scroll summary windowsxp opt                4.37 -> 4.79
  5%  tp5o Main_RSS windows7-32 pgo e10s               103321503.91 -> 108475935.56
  5%  tp5o Main_RSS windows7-32 opt                    170334630.52 -> 178443877.92
  4%  tp5o Private Bytes windows7-32 pgo               200766365.94 -> 207989087.29
  4%  tp5o Main_RSS windowsxp pgo                      218783527.86 -> 226472317.73
  3%  tp5o Private Bytes windowsxp opt                 204362350.89 -> 211405027.36
  3%  tp5o Private Bytes windowsxp pgo                 202393707.92 -> 209308076.8
  3%  tp5o Main_RSS windowsxp opt                      222326566.25 -> 229492442.28
  3%  tp5o Main_RSS windows7-32 pgo                    169171122.74 -> 174158572.74
  3%  sessionrestore_no_auto_restore windowsxp pgo     694.92 -> 713.17

Improvements:

 58%  tsvgr_opacity summary windows7-32 pgo e10s     934.8 -> 388.74
 57%  tsvgr_opacity summary windowsxp pgo e10s       901.89 -> 388.68
 39%  tsvgr_opacity summary windows7-32 opt e10s     1086.24 -> 665.17
 38%  tsvgr_opacity summary windowsxp opt e10s       1039.3 -> 641.29
 32%  tsvgx summary windowsxp opt e10s               272.79 -> 186.83
 28%  tsvgx summary windows7-32 pgo                  400.21 -> 287.02
 28%  tsvgx summary windows7-32 opt e10s             274.5 -> 198.56
 27%  tsvgx summary windowsxp pgo e10s               232.65 -> 168.79
 27%  tsvgx summary windowsxp pgo                    405.4 -> 296.45
 26%  tps summary windows7-32 pgo e10s               46.95 -> 34.55
 26%  tsvgx summary windows7-32 opt                  464.12 -> 344.55
 26%  tsvgx summary windows7-32 pgo e10s             221.85 -> 164.86
 25%  tsvgx summary windowsxp opt                    465.3 -> 348.81
 23%  cart summary windowsxp pgo                     40.64 -> 31.23
 22%  tps summary windows7-32 opt e10s               53.58 -> 41.56
 22%  tps summary windowsxp pgo e10s                 40.91 -> 31.87
 21%  tsvgr_opacity summary windows7-32 pgo          563.79 -> 445.88
 18%  tsvgr_opacity summary windowsxp pgo            550.86 -> 449.83
 18%  tps summary windowsxp opt e10s                 48.54 -> 39.89
 14%  cart summary windowsxp opt                     44.82 -> 38.71
 13%  tps summary windows7-32 pgo                    64.11 -> 55.65
 10%  tps summary windows7-32 opt                    79.38 -> 71.52
  8%  tcanvasmark summary windowsxp pgo e10s         8656.02 -> 9317.25
  7%  tps summary windowsxp pgo                      59.92 -> 55.63
  7%  tcanvasmark summary windows7-32 pgo e10s       8892.38 -> 9503.71
  7%  tcanvasmark summary windows7-32 pgo            8853.67 -> 9432.17
  6%  tps summary windowsxp opt                      77.03 -> 72.31
  2%  tp5o summary windows7-32 opt e10s              363.52 -> 354.56


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=2749

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
I did a series of retriggers and generated a compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=024e7dc7a219&newProject=mozilla-inbound&newRevision=209072396aa5&framework=1

these match up well with the alerts.  I see winxp having tart, tp5o_scroll, and svg_opacity regressions, otherwise memory and fileio for win7 is increased.  Oddly most seem to be non-e10s, maybe that makes sense given the nature of the patch.

Mason, can you look into this?
Component: Untriaged → Graphics
Flags: needinfo?(mchang)
Product: Firefox → Core
Assignee: nobody → mchang
Flags: needinfo?(mchang)
From irc, if you could please have a patch or something to get some memory profiling data from this push, that'd be great!

As for the logs, the amount of bytes read on main thread from before (https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32/1472152998/mozilla-inbound_win7_ix_test-xperf-bm112-tests1-windows-build637.txt.gz) and after (https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32/1472153839/mozilla-inbound_win7_ix_test-xperf-bm126-tests1-windows-build567.txt.gz), the amount of mainthread_readbytes is roughly the same, and actually the amount of readbytes from xul.dll is exactly the same.
Flags: needinfo?(jmaher)
I figured out how to get the data, here are some try pushes:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=cae7eabc8434a4e4380cc5dd7761fb1dffa8ee6e&tochange=0beee12cd65a565d551e0d57742116faf3a1e1b2

I suspect tomorrow I will be able to pinpoint the issue with about 10 minutes looking into the log files.
ok, this is really odd, but it looks like this patch is showing a difference in fonts?!?

here is the differences in fileIO access for the main thread during normal running (non startup/shutdown):
filename	bad	good	diff
 C:\windows\Fonts\simsun.ttc	8773632	3686400	5087232
 C:\windows\Fonts\msmincho.ttc	3731456	1785856	1945600
 C:\windows\Fonts\msgothic.ttc	5828608	4231168	1597440
 C:\windows\Fonts\gulim.ttc	3190784	2408448	782336
 C:\windows\Fonts\timesbd.ttf	356352	163840	192512
 C:\windows\Fonts\lucon.ttf	98304		98304
 C:\windows\Fonts\times.ttf	401408	327680	73728
 C:\windows\Fonts\segoeuil.ttf	65536		65536
 C:\windows\Fonts\tahoma.ttf	380928	319488	61440
 C:\windows\Fonts\verdanab.ttf	139264	90112	49152
 C:\windows\Fonts\ariali.ttf	241664	196608	45056
 C:\windows\Fonts\timesi.ttf	229376	184320	45056
 C:\windows\Fonts\cour.ttf	98304	65536	32768
 C:\windows\Fonts\l_10646.ttf	131072	98304	32768
 C:\windows\Fonts\tahomabd.ttf	356352	331776	24576
 C:\Users\cltbld\AppData\Local\Temp\TMPDIR\profile\cache2\index.log	83096	64984	18112
 C:\windows\Fonts\arialbi.ttf	217088	204800	12288
 C:\windows\Fonts\shruti.ttf	270336	262144	8192
 C:\windows\Fonts\msyhbd.ttf	3448832	3444736	4096
 C:\windows\Fonts\pala.ttf	315392	311296	4096
 C:\windows\Fonts\seguisym.ttf	299008	294912	4096
 C:\windows\Fonts\trebucbd.ttf	102400	98304	4096
 C:\$Extend\$UsnJrnl:$J	4096	8192	-4096
 C:\windows\Fonts\malgun.ttf	32768	49152	-16384
 C:\windows\Fonts\segoeuib.ttf	32768	65536	-32768
 C:\Users\cltbld\AppData\Local\Temp\TMPDIR\profile\cookies.sqlite	819400	950472	-131072
 C:\windows\Fonts\meiryo.ttc	2183168	2334720	-151552
 C:\windows\Fonts\msyh.ttf	7987200	8183808	-196608
 C:\windows\Fonts\msjh.ttf	32768	327680	-294912
 C:\windows\Fonts\mingliu.ttc	303104	618496	-315392
 C:\$LogFile	32768	385024	-352256


it is possible that we are forcing fonts to default to different families?  Maybe we are displaying errors/css/info that uses certain fonts?  Maybe to render we prefer a different font?
Flags: needinfo?(jmaher)
:mchang, you mentioned over IRC some ideas that you had, did you have any luck?
Flags: needinfo?(mchang)
(In reply to Joel Maher ( :jmaher) from comment #5)
> :mchang, you mentioned over IRC some ideas that you had, did you have any
> luck?

No not yet, hoping to get to it this week, but skia windows is nightly only after bug 1301111 and I might back out, not sure yet.
Flags: needinfo?(mchang)
Bug 1298484 fixes the Windows XP Main_RSS and Private Bytes regression.
Depends on: 1298484
The xperf on my local windows machine that came with the SDK ( Microsoft (R) Windows (R) Performance Analyzer Version 6.3.9600) doesn't seem to work with the way it's being run in talos. Do you know what version of xperf is on try?
Flags: needinfo?(jmaher)
I am not sure of the version on the talos machines, but here are the instructions for setting up a new machine:
https://wiki.mozilla.org/index.php?title=ReferencePlatforms/Test/Win7#Xperf_and_mozprofilerprobe.mof
Flags: needinfo?(jmaher)
Skia windows is being backed out so the numbers should reset soon.
Joel, do you know if this update is installed on the Windows 7 try machines? - https://support.microsoft.com/en-us/kb/2505438
Flags: needinfo?(jmaher)
/me redirects to Q.

Q, do you know if this update is installed on the Windows 7 try machines? - https://support.microsoft.com/en-us/kb/2505438
Flags: needinfo?(jmaher) → needinfo?(q)
See Also: → 1294337
I think we're going to accept the regressions.

The tp5n main_normal_fileio windows7-32 regressions don't make a lot of sense and aren't related to using dwrite fonts. From my investigation, we can't control when windows decides to read from a font file versus hitting the windows cache. Even with GDI fonts, both cairo and skia use ExtTextOut and native GDI system calls to get the font, and we're not requesting any extra fonts over cairo.

The svgr_opacity tests only get worse in non-e10s. Running it through a profiler, all the time is being spent in the blitter, but we're not using any different blitter in e10s and nothing else stands out in the profile.

Most of the RSS regressions were fixed in bug 1298484.

Finally for the tart regressions, we accepted these regressions in bug 1294337. Since this is layers acceleration disabled, we use cairo to composite with basic layers and all the hot path is in the compositor. I didn't really see anything in skia that stood out.

At least we get a lot of nice wins.
:mchang, are we going to reland skia for windows?
(In reply to Joel Maher ( :jmaher) from comment #15)
> :mchang, are we going to reland skia for windows?

Not in 51, but in 52. We're going to do a telemetry experiment first in bug 1299605, then reland sometime in the middle of October.
Flags: needinfo?(q)
Blocks: 1302124
No longer blocks: 1291351
While looking at a different Skia regression with Lee, I saw that tp5n nonmain_startup_fileio op are vastly different on inbound versus try. See:

https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D&series=%5Btry,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D&series=%5Btry,e5f5eaa174ef22fdd6b6e150e8c450aa827c2ff6,1,1%5D&series=%5Bmozilla-inbound,e5f5eaa174ef22fdd6b6e150e8c450aa827c2ff6,1,1%5D

Do you have any idea why this is Joel? Or am I looking at the data right? Thanks!
Flags: needinfo?(jmaher)
yes, that is tracked in bug 1274018.  I am not sure what to do here, but that is what is happening.
Flags: needinfo?(jmaher)
Looks like the tsvgr_opacity summary windows7-32 opt  is fixed by bug 1299435. Also resolving this as wontfix.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.