899424 - FloatUtils.floatFuzzyEquals is consuming more CPU time on page load than it should

Reporter

Description

•

12 years ago

Profiling has revealed that 0.3% of the CPU time involved in loading a page is spent in FloatUtils.fuzzyEquals - an entirely uninteresting function to spend so long in. Yes, it's 0.3% - whoopdediddleydoo - I present a one-line patch reducing this to 0.1% by eliminating the extra function call to Math.abs (Which ProGuard would be unable to optimise out anyway) Our pages will load.. Uh.. 30 milliseconds faster. Every little helps!

Chris Kitching [:ckitching]

Reporter

Comment 1

•

12 years ago

Attached patch fuzzyEqualsWithoutMathAbs.patch — Details — Splinter Review

Assignee: nobody → ckitching

Attachment #782944 - Flags: review?

Chris Kitching [:ckitching]

Reporter

Updated

•

12 years ago

Severity: normal → minor

OS: Linux → Android

Hardware: x86_64 → ARM

Mark Finkle (:mfinkle) (use needinfo?)

Updated

•

12 years ago

Attachment #782944 - Flags: review? → review?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 2

•

12 years ago

Comment on attachment 782944 [details] [diff] [review] fuzzyEqualsWithoutMathAbs.patch Review of attachment 782944 [details] [diff] [review]: ----------------------------------------------------------------- While I'm not entirely opposed to doing this, I need more justification for it before I'll agree to it. Specifically: 1) Please post the profile data that shows your results. How precise is the profile data? I'd like to be reasonably certain that the measurement difference you noticed isn't just noise, so measuring a few samples and generating a standard deviation would be nice. 2) How many times is this function called, and from where? Micro-optimizations like this are acceptable if there's no higher-level point in the call graph that can be optimized. If there is, though, we should look at that first. Reducing the number of calls to e.g. ImmutableViewportMetrics.fuzzyEquals, or even just rearranging the order of calls in that function to better take advantage of short-circuiting is preferable to me. Getting an idea of where this function is being called from and the relative counts of those call sites would be a good start. 3) Even if it turns out that this change is justified I would like to do the minimum change needed in the code itself to allow Proguard to do the rest. In this case, you're changing a readable Math.abs call into something much harder to read. I would prefer pulling out a FloatUtils.abs private function instead, which Proguard can optimize away. Failing that this code change is at least worth a comment.

Attachment #782944 - Flags: review?(bugmail.mozilla) → review-

Chris Kitching [:ckitching]

Reporter

Comment 3

•

12 years ago

I'm very sorry, I did an extremely shoddy job of explaining my thinking about this one. The breakdown of calls to FloatUtils.fuzzyEquals is, roughly: 40% from RectUtils.fuzzyEquals 34% from GeckoLayerClient.progressiveUpdateCallback 9% from FloatUtils.fuzzyEquals(Point, Point) 8% from ScrollbarLayer.fade 5% from Layer$RenderContext.fuzzyEquals 1.5% from ImmutableViewportMetrics.setMargins And a few other tiny fragments of little interest. When we apply ProGuard to this, the situation hardly changes - the Math.abs call in FloatUtils.fuzzyEquals can't be inlined, but also seems to cause ProGuard not to inline the body of fuzzyEquals into the callers (Which would at least eliminate all the call overhead they're incurring calling fuzzyEquals) Looking into RectUtils.fuzzyEquals, we come across an unrelated thing of interest: >public static boolean fuzzyEquals(RectF a, RectF b) { > if (a == null && b == null) > return true; > else if ((a == null && b != null) || (a != null && b == null)) > return false; > else > return FloatUtils.fuzzyEquals(a.top, b.top) > && FloatUtils.fuzzyEquals(a.left, b.left) > && FloatUtils.fuzzyEquals(a.right, b.right) > && FloatUtils.fuzzyEquals(a.bottom, b.bottom); >} The central if statement can be rewritten equivalently as as if (a == null || b == null). An optimisation ProGuard plausibly can make by itself, but perhaps worth doing just for clarity. It doesn't seem to be obvious that there's a way we can optimise this method to make less use of fuzzyEquals, sadly. In fact, exploring the places that use FloatUtils.fuzzyEquals in detail does not seem to provide obvious routes for reducing the calls to fuzzyEquals, alas. Some of these parent fuzzyEquals functions do seem to be being hit pretty hard, though, and most of them are doing little except calling FloatUtils.fuzzyEquals on a bunch of different things. Applying the optimisation, Proguard, disturbingly, seems still not to inline the FloatUtils.fuzzyEquals method itself (Something that I'd have thought would be something that it would do...). Maybe this is just some profiler effect, but I do not believe so - ProGuard modifies the bytecode itself, and I have successfully measured ProGuard's improvements using the profiler. Anyway, on to the actual numbers observed for this change. Both were also ProGuard'd (That hasn't landed yet, but it's only a matter of time, we hope...) With the patch (All times in ms): Total Time in FloatUtils.fuzzyEquals Calls Time Per call 23.118 5253 .0044 16.847 4258 .0040 16.652 4303 .0039 5.432 1408 .0039 4.158 1269 .0033 4.98 1205 .0041 3.175 772 .0041 2.844 617 .0046 2.875 617 .0047 10.525 2780 .0038 2.84 630 .0045 2.438 634 .0038 Avg time per call: 0.0041ms Standard deviation: 0.0004 Without the patch (But still ProGuardified) Total Time in FloatUtils.fuzzyEquals Calls Time Per call 58.989 4901 .0120 46.52 4086 .0114 50.944 4249 .0120 19.447 1691 .0115 15.923 1281 .0124 14.699 1193 .0123 9.061 783 .0116 7.365 621 .0119 7.019 656 .0107 14.44 1336 .0108 7.654 664 .0115 7.146 643 .0111 Avg time per call: 0.0116ms STDev: 0.0006 Each group of three profiles are three loads of the same page (Respectively: a new york times article, a Reddit thread, a Google search, and an xkcd comic.). These results make it fairly clear that this change has a significant impact on the speed of this function (Call overhead of the .abs call dominates). Adding our own version of library functions/classes which ProGuard is unable to optimise is, in general, a good plan. For example, it will likelly prove useful to make our own version of Rect, so calls to boring functions like getWidth() can be optimised away. In this case, adding our own abs function seems like an excellent idea - additional neatness and no slower. (Although doing so eliminates all non-Proguarded performance benefit of this patch, but that's okay) Profiles with Proguardification and our own abs function: https://www.dropbox.com/sh/y9h9cddoinonipm/EmiPuSLEVY Profiles with Proguardification and my patch as presented (Not really any difference): https://www.dropbox.com/sh/0w1m0gv0zmufhio/0tLcvT22pD Profiles with Proguardification but no patch for this problem: https://www.dropbox.com/sh/dghb4jkvcipf3rk/rBoCc-QEsj

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 4

•

12 years ago

So I looked at the nytimes1.trace file from your dropbox on the assumption that since it was the largest trace file it would have the most data. I found the FloatUtils.fuzzyEquals, which is reported as taking up 0.3% of the total time (as you also stated in comment 0) and the top caller of it being RectUtils.fuzzyEquals, at 42.3%, which is roughly what you had above. If I work my way up the stack another level, to the callers of RectUtils.fuzzyEquals, it is called entirely by LayerRenderer$Frame.beginDrawing. I looked at the code for this function and the only call to fuzzyEquals in there is at [1], but it's to RenderContext.fuzzyEquals [2]. I assume the RenderContext.fuzzyEquals function is getting inlined by Proguard and so ends up calling Rect.fuzzyEquals directly on the handful of rects inside the RenderContext. So now what I would like to do is optimize at this level, and try to reduce the number of calls to RenderContext.fuzzyEquals. Note that mLastPageContext takes on the value of mPageContext a few lines below; this condition is basically checking to see if the mPageContext changes from one frame of composition to the next. I know this part of the code pretty well (considering I wrote that line) and I think that during page load, in most cases, the mPageContext and mLastPageContext will have the same data inside. So if we can detect that fact earlier in the process we can short-circuit this fuzzyEquals check entirely. If you go up to [3] you'll see that the mPageContext is created entirely from data in the ImmutableViewportMetrics object and some fields of LayerRenderer. (Actually it also takes an offset but with my patch for bug 892267 that can also be pulled out of the ImmutableViewportMetrics). I'm pretty sure that a lot of the time, the ImmutableViewportMetrics object passed in there will actually be the exact same object. At least during page load. So the first high-level optimization that comes to mind is to modify the createPageContext function to be more like this: createPageContext(ImmutableViewportMetrics metrics) { if (mCachedMetrics == metrics) { return mCachedPageContext; } mCachedMetrics = metrics; return mCachedPageContext = createContext(...); } and then stick in an extra condition in the mPageContext.fuzzyEquals(mLastPageContext) that short-circuits if mPageContext == mLastPageContext. This will save both needless creation of RenderContext instances and also the fuzzyEquals calls. That being said, the first step is always to verify the assumptions :) In this case the assumption I'm making is that the ImmutableViewportMetrics object passed in to the Frame constructor is going to be the same object a nontrivial amount of time - this can easily be verified by sticking in a few lines of code there. Thoughts? [1] http://hg.mozilla.org/mozilla-central/file/c2b375f3a909/mobile/android/base/gfx/LayerRenderer.java#l472 [2] http://hg.mozilla.org/mozilla-central/file/c2b375f3a909/mobile/android/base/gfx/Layer.java#l196 [3] http://hg.mozilla.org/mozilla-central/file/c2b375f3a909/mobile/android/base/gfx/LayerRenderer.java#l428

Chris Kitching [:ckitching]

Reporter

Comment 5

•

12 years ago

Ah! Great spot! Having such a good understanding of the code makes finding high level optimisations much easier, it seems. This is still quite unfamiliar (And sadly undocumented :( ) code to me, so I find myself doing peephole-esque optimisations that annoy reviewers. What you propose seems at least worth investigating - but why not both have our cake and eat it, so to speak? It sounds like you've found a further optimisation - why not add it to the one I propose? (Refactoring mine to instead use our own abs function which can be Proguard-optimised instead of using the Math.abs has no effect on performance gain, but does prevent it from harming the clarity of fuzzyEquals so badly) Or is the trick of duplicating or repacking library functions used by critial code with a view to being able to optimise them with ProGuard discouraged? If so I feel we'll be blocking quite a few less trivial optimisations, too. I'll give your latest suggestion a proper investigation tomorrow. The amount of call overhead we're doing to, essentially, do a very simple sum, is quite depressing. It also occurs to me that another valid approach to finding fuzzy equals would be along the lines of (boolean) ((b-a)*(10^12)). Probably more cryptic than the own-abs approach.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 6

•

12 years ago

(In reply to Chris Kitching [:ckitching] from comment #5) > Or is the trick of duplicating or repacking library functions used by > critial code with a view to being able to optimise them with ProGuard > discouraged? If so I feel we'll be blocking quite a few less trivial > optimisations, too. I'm ok with doing this, but only where "it makes sense". Saving 30ms on a time scale of 20 seconds for something that may not be on the critical path and does impact readability of the code, to me, doesn't really make much sense. And my thinking is that if the optimization I described above works out, then the 0.3% time spent in this function will drop to 0.2% and the benefit from the inlining optimization will be even less worth it. I guess this is pretty hand-wavy as to what's worth it and what's not but in my experience peephole optimizations like this never make a noticeable difference and therefore are a waste of developer time, unless you can show they're in hot code that gets called millions of times. Just my two cents.

Chris Kitching [:ckitching]

Reporter

Updated

•

11 years ago

Assignee: chriskitching → nobody

Mark Finkle (:mfinkle) (use needinfo?)

Updated

•

11 years ago

Blocks: 947390

BMO Automation

Comment 7

•

4 years ago

We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → INCOMPLETE

BMO Automation

Updated

•

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Bugzilla

FloatUtils.floatFuzzyEquals is consuming more CPU time on page load than it should

Categories

(Firefox for Android Graveyard :: Toolbar, defect)

Tracking

(Not tracked)

People

(Reporter: ckitching, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Updated

Comment 7

Updated

Attachment

General

Description

File Name

Content Type