Closed Bug 1257304 Opened 4 years ago Closed 4 years ago

Font inflation significantly slows down page loading since FF45

Categories

(Firefox for Android :: General, defect)

45 Branch
Unspecified
Android
defect
Not set

Tracking

()

VERIFIED FIXED
Firefox 48
Tracking Status
firefox45 --- affected
firefox46 + fixed
firefox47 + fixed
firefox48 + verified
fennec 46+ ---

People

(Reporter: JanH, Assigned: esawin)

References

Details

(Keywords: perf, regression)

Attachments

(1 file, 2 obsolete files)

With Firefox 45, I've noticed that some pages take a significantly longer time to load than before.

Not all pages are affected (a mobile Flickr photo page, e.g. https://m.flickr.com/#/photos/dgeezer/7263564070/ seems to work fine), but with some (e.g. http://diamondgeezer.blogspot.com/2012/05/abbey-creek.html or https://bugzilla.mozilla.org/query.cgi?format=advanced) it is quite noticeable.
Testing with the Bugzilla search page, on unaffected builds loading takes ~ 30 s on a fresh load and 10 - 15 s for subsequent (i.e. close the page and then re-open it in a new tab) attempts.
On affected builds, the first page load takes over two minutes and subsequent page loads still take almost one minute to complete, which makes browsing on affected pages a major pain.
In both cases, the keyboard appears at approximately the same point after loading the page (15 s for the first load and 6 s for subsequent loads), so the big delays only happen somewhere after that point.

It's not reproducible with a fresh profile, however copying over the prefs.js from my main profile is sufficient to trigger slow page loading.
After some experimenting, it seems that activating font inflation, i.e. in my case
(user_pref("font.size.inflation.lineThreshold", 300);
user_pref("font.size.inflation.minTwips", 120);)
is enough to trigger this behaviour, which explains why I didn't experience any problems on the mobile Flickr page.

A mozregression run leads to https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=5f9371f97e1e21f10fa20b39f33774116cfcf5f0&tochange=da972c7c5cf485bff4c3bb36d0c5f7b885b42435, which points to bug 1219016 as the culprit.

I've also tested with today's Nightly and while some of the times might vary from the measurements above, the basic problems remains and loading times for pages with font inflation are somewhere in the region of > 1 min.

My hardware: Samsung Galaxy S3 Mini, Android 4.1.2
Eugen, can you see if there's anything obviously busted going on here by profiling.
Assignee: nobody → esawin
tracking-fennec: ? → 46+
I've done a bit of further testing and it looks like the lineThreshold setting also has a big influence. I've got it set to 300, which compared to the default value of 400 means that font inflation is somewhat more likely to kick in on a page element.
With the default value of 400, a warm load of the advanced Bugzilla search page only takes around 30 seconds on my phone, which is not quite as bad, but still almost double the time it takes without font inflation/before this regression.
I've captured two profiler runs of loading http://freefall.purrsia.com/ffdex.htm, which consists of all very simple HTML, only quite a bit of it.
For this test run, I've only set the font size to medium (i.e. font.size.inflation.minTwips = 120) and left the lineThreshold at its default value of 400.

Trace #1 was captured on the 2015-11-11 nightly, i.e. the last good build:
https://cleopatra.io/#filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A26097,%22end%22%3A34877}]&report=9e116bc89337b802d491e8ec2a1e5d7e585c9c87

Trace #2 was captured on a current local Nightly build:
https://cleopatra.io/#filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A93105,%22end%22%3A191324}]&report=7aa01ec68bf284fea734d8dd67c3b2bf4c1e2e11
Looks like in #2 we're doing a bunch of Java garbage (string manipulation?). Strange. I don't know why that would be the case.
Okay, part of this seems to be exacerbated by me having the Xposed framework active [1].
Nevertheless, even with it completely deactivated, page loading with font inflation is still noticeably slower, even if it's not quite as extreme as with some of my measurements above (and of course, page load times on unaffected builds have become slightly faster, too).

I've also further bisected the push from bug 1219016 with local builds and the regression was introduced in d42f4c84b136, i.e. with the switch from getContext() to getApplicationContext().

With Xposed deactivated, on the last good (local) build (https://hg.mozilla.org/mozilla-central/rev/9dbbf81894ca) the advanced Bugzilla search page (https://bugzilla.mozilla.org/query.cgi?format=advanced) takes about 12 - 14 seconds between me hitting Paste & Go and the loading indicator disappearing on a warm load. Similarly, the Freefall index page (http://freefall.purrsia.com/ffdex.htm) takes about 6 - 7 seconds on a warm load. On the 44 release build from the play store I get very similar measurements.
On the first affected build (https://hg.mozilla.org/mozilla-central/rev/d42f4c84b136) on the other hand, loading the Bugzilla search takes about 17 - 19 (+38 %) seconds and the Freefall index approximately 14 - 16 seconds (+130 %), although somehow those times don't seem to be 100 % reproducible - within a session they're usually consistent, but if I kill Firefox and restart it, sometimes I'm getting slightly slower or faster loading times (but still definitively slower than on 44).
Moving forward to today's Nightly, things seem to have deteriorated somewhat, because the Freefall page now is now taking around 26 - 28 seconds to load.

Clean profile without Xposed on today's official Nightly build:
https://cleopatra.io/#report=8de08cbfc5093737b0b8c1cec778cd97ee16aeb3&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A666709,%22end%22%3A694140}]&selection=0,1,1212,1214,3318,889

Local build first affected revision (d42f4c84b136):
https://cleopatra.io/#report=fbbbfdc061cbcf9c3a6e7c1b0675c5a082895673&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A111405,%22end%22%3A134173}]&selection=0,1,1057,1650,1652,370

Local build last unaffected revision (9dbbf81894ca):
https://cleopatra.io/#report=ef72e94bc0615eeeed510f86d1f895d711588ae1&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A127262,%22end%22%3A134122}]&selection=0,1,1496,1967,1969

So all that strange spurious Java stuff disappears, and what remains is an increased bunch of calls to __ioctl in the affected builds.

[1] Which hooks a load of Java functions, although so far I haven't tested whether due to the Xposed framework itself or one specific module I've got installed.
Some slight confusion, I think in comment 5 I actually tested with the 2015-03-07 Nightly.
The Xposed framework itself and most modules don't seem to have a very large performance impact and strangely enough, the Freefall index page now consistently loaded in around 20 s.
The only modules with a noticeable impact were "App Settings" (+7 s) and XPrivacy (+ 45 s!), even though both modules were set up not to do anything in particular for Firefox.

Unless somebody has a better idea, I'll try and see if further bisecting of the changes from d42f4c84b136 turns up any line(s) where the change to getApplicationContext() had a particularly detrimental effect on reflow performance (with and without Xposed).
I can confirm a significant performance regression in load/rendering times with font inflation enabled.

Some results on Fennec 48.0a1 2016-03-29 loading http://freefall.purrsia.com/ffdex.html:
Pixel C, default:      min=1657, max=2642, mean=1929, median=1906, std-dev=147 (8%)
Pixel C, minTwips=120: min=2080, max=2902, mean=2390, median=2347, std-dev=181 (8%)
S3 Mini, default:      min=6437, max=18888, mean=11021, median=8795, std-dev=4201 (38%)
S3 Mini, minTwips=120: min=15939, max=30195, mean=20933, median=19699, std-dev=3188 (15%)
(In reply to Jan Henning [:JanH] from comment #6)
> 
> Unless somebody has a better idea, I'll try and see if further bisecting of
> the changes from d42f4c84b136 turns up any line(s) where the change to
> getApplicationContext() had a particularly detrimental effect on reflow
> performance (with and without Xposed).

I think that's the best option. Prime suspects would be display-related calls such as GeckoAppShell.getDensity() or GeckoAppShell.getScreenSize().
MinimumFontSizeFor in nsLayoutUtils.cpp is called approx. 40k times when rendering http://freefall.purrsia.com/ffdex.html which cascades down to GeckoAppShell::getScreenSize().

With [1] we've introduced some overhead by using the application context instead of the activity, which has a dramatic effect in this case.

Reverting GeckoAppShell::getScreenSize() to use the activity interface fixes the regression. This could have performance implications in other places, maybe less noticeable, too.

However, there remains the issue that we have so many redundant calls to getScreenSize() without the screen size changing, this should be handled more efficiently.

[1] https://hg.mozilla.org/mozilla-central/rev/d42f4c84b136
Yeah we should cache the result at least.
With this patch we cache the screen size in GeckoAppShell.

This reduces rendering times of the test page from 20s to 7s.

Results loading http://freefall.purrsia.com/ffdex.html
S3 Mini, minTwips=120: min=6387, max=9017, mean=7135, median=6928, std-dev=538 (8%)
Attachment #8736957 - Flags: review?(nchen)
Comment on attachment 8736957 [details] [diff] [review]
0001-Bug-1257304-1.1-Cache-screen-size-and-update-only-on.patch

Review of attachment 8736957 [details] [diff] [review]:
-----------------------------------------------------------------

Instead of adding getScreenSize(boolean), I'd prefer keeping getScreenSize() as before and add resetScreenSize(), which is called by GeckoScreenOrientation. In resetScreenSize you simply set sScreenSize to null.

It's not a big deal so your call.

::: mobile/android/base/java/org/mozilla/gecko/GeckoAppShell.java
@@ +2856,4 @@
>          return 0;
>      }
>  
> +    public static synchronized Rect getScreenSize(final boolean update) { 

nit: trailing space

@@ +2869,2 @@
>      @WrapForJNI
> +    public static Rect getScreenSize() { 

nit: trailing space

@@ +2869,3 @@
>      @WrapForJNI
> +    public static Rect getScreenSize() { 
> +        return getScreenSize(false);

nit: /* update */ false

::: mobile/android/base/java/org/mozilla/gecko/GeckoScreenOrientation.java
@@ +156,5 @@
>                  GeckoEvent.createScreenOrientationEvent(aScreenOrientation.value,
>                                                          getAngle()));
>          }
> +        // Update screen size.
> +        GeckoAppShell.getScreenSize(true);

nit: /* update */ true
Attachment #8736957 - Flags: review?(nchen) → review+
(In reply to Jim Chen [:jchen] [:darchons] from comment #12)
> Instead of adding getScreenSize(boolean), I'd prefer keeping getScreenSize()
> as before and add resetScreenSize(), which is called by
> GeckoScreenOrientation. In resetScreenSize you simply set sScreenSize to
> null.

For that we would need an additional synchronization object, which I tried to avoid, although I agree it would be more expressive.
Addressed comments.
Attachment #8736957 - Attachment is obsolete: true
Attachment #8737303 - Flags: review+
I backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/6166a6070ec2e78b2994e1670c2fb62db27cb6e2 because it was making the merge from mozilla-central back to inbound difficult. Feel free to reland this as you see fit.
Flags: needinfo?(esawin)
Added GeckoAppShell.resetScreenSize based on review comment.
Attachment #8737303 - Attachment is obsolete: true
Flags: needinfo?(esawin)
Attachment #8737749 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/a52cfd253d3b
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
This sounds like a pretty bad performance regression for Firefox for Android. 
Should we back this out from aurora and beta as well?
Flags: needinfo?(esawin)
Keywords: perf
Wouldn't uplifting this patch be better?

In any case I can confirm that the issue is fixed in today's Nightly (build 20160405030214). I've tested with the Bugzilla advanced search page and the Freefall index page and loading times are back to Firefox 44 levels, even with Xposed enabled.
Status: RESOLVED → VERIFIED
We shouldn't back out bug 1219016, it's part of a long-term refactoring initiative. We can uplift this fix, instead.
Flags: needinfo?(esawin)
(In reply to Eugen Sawin [:esawin] from comment #22)
> We shouldn't back out bug 1219016, it's part of a long-term refactoring
> initiative. We can uplift this fix, instead.

Yes, please uplift.
Flags: needinfo?(esawin)
Comment on attachment 8737749 [details] [diff] [review]
0001-Bug-1257304-1.3-Cache-screen-size-and-update-only-on.patch

Approval Request Comment
[Feature/regressing bug #]: 1219016
[User impact if declined]: Rendering performance regression with font inflation enabled
[Describe test coverage new/current, TreeHerder]: One week on Nightly
[Risks and why]: Low, simple caching mechanics
[String/UUID change made/needed]: None
Flags: needinfo?(esawin)
Attachment #8737749 - Flags: approval-mozilla-beta?
Attachment #8737749 - Flags: approval-mozilla-aurora?
Comment on attachment 8737749 [details] [diff] [review]
0001-Bug-1257304-1.3-Cache-screen-size-and-update-only-on.patch

Please uplift to aurora and beta. 
This can make beta 12 build next Monday.
Attachment #8737749 - Flags: approval-mozilla-beta?
Attachment #8737749 - Flags: approval-mozilla-beta+
Attachment #8737749 - Flags: approval-mozilla-aurora?
Attachment #8737749 - Flags: approval-mozilla-aurora+
Verifying this fix in beta would be good once it lands.
Flags: qe-verify+
Flags: needinfo?(ioana.bugs)
Flags: qe-verify+
Flags: needinfo?(ioana.bugs)
You need to log in before you can comment on or make changes to this bug.