GeckoAppShell.pumpMessageLoop is wasting lots of CPU time on page load

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: ckitching, Assigned: ckitching)

Tracking

Trunk
ARM
Android
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

Profiling page loads, it seems that GeckoAppShell.pumpMessageLoop is consuming quite a bit of CPU time - a total of about 1 second of inclusive CPU time on my Nexus 4 for a single load of an article on NYTimes. (Although this may have been stretched somewhat by the profiler)
I present a patch reducing the time consumed by this function during the same page load to 72ms, for the same number of calls.

Profiles without patch:
https://www.dropbox.com/s/h6hw8et2ptxcefz/withoutPatch.tar.xz

Profiles with patch:
https://www.dropbox.com/s/rss6n8nttprmh9s/withPatch.tar.xz
Attachment #779870 - Attachment description: needlessJNIGAS.patch → Make GeckoAppShell.pumpMessageLoop waste less CPU time
Attachment #779870 - Attachment is obsolete: true
Attachment #779870 - Flags: review?(nchen)
Attachment #779876 - Flags: review?(nchen)
Assignee: nobody → ckitching
Comment on attachment 779876 [details] [diff] [review]
Make GeckoAppShell.pumpMessageLoop waste less CPU time V2 - redundant check removal

Review of attachment 779876 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoAppShell.java
@@ +89,5 @@
>  import java.io.OutputStream;
>  import java.io.PrintWriter;
>  import java.io.StringWriter;
> +import java.lang.reflect.InvocationTargetException;
> +import java.lang.reflect.Method;

You can remove this.

@@ +142,5 @@
>  
> +    // Static reference to Gecko handler to avoid call overheads.
> +    private static Handler sGeckoHandler;
> +    // Static reference to this thread's looper call to avoid call overheads.
> +    private static MessageQueue sMyQueue;

How much difference do these make? I would be surprised if getter methods are that much slower, and we generally want to avoid having multiple references around (they also clutter up GeckoAppShell which we are trying to avoid as well).

@@ +2491,5 @@
>  
> +    public static boolean pumpMessageLoop() throws InvocationTargetException, IllegalAccessException {
> +        Handler geckoHandler = sGeckoHandler;
> +        // It is not faster to replace this with reflection - the underlying MessageQueue.next() is itself native,
> +        // so the reflection just adds extra indirection and we still have to pay for the JNI.

Don't need the comments.

::: widget/android/AndroidJNI.cpp
@@ +816,5 @@
> +    // if queue.mMessages is null, queue.next() will block, which we don't want
> +    // It turns out to be an order of magnitude more performant to do this extra check here and block less vs. one fewer
> +    // checks here and more blocking.
> +    if (!msg)
> +      return msg;

You should wrap GetObjectField and !msg inside an |if (jMessagesField)| block. Also |return NULL;| instead of |return msg;|. Looks good otherwise.
Attachment #779876 - Flags: review?(nchen) → feedback+
(In reply to Jim Chen [:jchen :nchen] from comment #3)
> Comment on attachment 779876 [details] [diff] [review]
> Make GeckoAppShell.pumpMessageLoop waste less CPU time V2 - redundant check
> removal
> 
> Review of attachment 779876 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +142,5 @@
> >  
> > +    // Static reference to Gecko handler to avoid call overheads.
> > +    private static Handler sGeckoHandler;
> > +    // Static reference to this thread's looper call to avoid call overheads.
> > +    private static MessageQueue sMyQueue;
> 
> How much difference do these make? I would be surprised if getter methods
> are that much slower, and we generally want to avoid having multiple
> references around (they also clutter up GeckoAppShell which we are trying to
> avoid as well).

It is significant. Calls in Java are surprisingly expensive. The measured gain from this change is about 0.20ms per call (Averaged over 5000 calls). The total gain from this patch is about 0.34ms per call - so these changes account for most of the gains in the patch. Profiles with the patch, without these changes to Java:
https://www.dropbox.com/s/w7f8pg77hgta82i/withPatchSansJava.tar.xz

We find ourselves in the slightly funky land of Java micro-optimisations - these tiny changes are, most of the time, not interesting - but this method is being absolutely hammered (About 5000 calls for a load of nytimes), so it seems worth slightly compromising neatness in favour of performance (and 0.34ms * 5000 is not half bad - although just how much of this is actually blocking the page load is unclear - this patch may well improve Flash performance somewhat, at least - maybe it'll become (slightly) less of a battery hog)
A sane compiler would realise "These methods are just returning some field, so I can inline them and not actually do a call". 

Alas, javac is not a sane compiler.

> 
> @@ +2491,5 @@
> >  
> > +    public static boolean pumpMessageLoop() throws InvocationTargetException, IllegalAccessException {
> > +        Handler geckoHandler = sGeckoHandler;
> > +        // It is not faster to replace this with reflection - the underlying MessageQueue.next() is itself native,
> > +        // so the reflection just adds extra indirection and we still have to pay for the JNI.
> 
> Don't need the comments.

I disagree. Replacing the native call with a reflective call is usually an optimisation (You can call private methods of library classes using reflection without needing to pay the (huge) cost of calling the JNI), so this very much looks like it's something that could be improved like this. It can't, because the private method we want is itself native (So we would end up paying for reflection and then paying for the JNI anyway - which is slower - so we might as well cut out the middleman). Removing this comment would run the risk of some other helpful person coming along next year and making this worse because they didn't realise the somewhat non-obvious reason why this is written in an apparently suboptimal way.
Attachment #779876 - Attachment is obsolete: true
Attachment #779964 - Flags: review?(nchen)
Comment on attachment 779964 [details] [diff] [review]
Make GeckoAppShell.pumpMessageLoop waste less CPU time V3 - Nit picking

Review of attachment 779964 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the good work! :)

::: mobile/android/base/GeckoAppShell.java
@@ +141,5 @@
> +    // Static reference to Gecko handler to avoid call overheads.
> +    private static Handler sGeckoHandler;
> +    // Static reference to this thread's looper call to avoid call overheads.
> +    private static MessageQueue sMyQueue;
> +

Alright, I now think it's fine to cache MessageQueue.

As for the Gecko Handler, can you make the fields public in ThreadUtils? 1) It avoids duplication. 2) It doesn't clutter up GeckoAppShell.

I'd recommend that sMyQueue be moved to ThreadUtils, too, and be called sGeckoQueue. Comments would be great on why these need to be fields instead of getters.

@@ +2489,5 @@
>  
>      public static boolean pumpMessageLoop() {
> +        Handler geckoHandler = sGeckoHandler;
> +        // It is not faster to replace this with reflection - the underlying MessageQueue.next() is itself native,
> +        // so the reflection just adds extra indirection and we still have to pay for the JNI.

> > 
> > Don't need the comments.
> 
> I disagree. Replacing the native call with a reflective call is usually an
> optimisation (You can call private methods of library classes using
> reflection without needing to pay the (huge) cost of calling the JNI), so
> this very much looks like it's something that could be improved like this.
> It can't, because the private method we want is itself native (So we would
> end up paying for reflection and then paying for the JNI anyway - which is
> slower - so we might as well cut out the middleman). Removing this comment
> would run the risk of some other helpful person coming along next year and
> making this worse because they didn't realise the somewhat non-obvious
> reason why this is written in an apparently suboptimal way.

Note that MessageQueue.next() is written in Java [1]. Also reflection is implemented through JNI calls [2], so it's debatable whether reflection is an optimization over JNI. But feel free to keep a comment here as you feel appropriate.

[1] http://androidxref.com/4.2_r1/xref/frameworks/base/core/java/android/os/MessageQueue.java#117
[2] http://androidxref.com/4.2_r1/xref/libcore/luni/src/main/java/java/lang/reflect/Method.java#506

::: widget/android/AndroidJNI.cpp
@@ +812,2 @@
>      }
> +    

Nit: extra spaces in empty line.
Attachment #779964 - Flags: review?(nchen) → feedback+
(In reply to Jim Chen [:jchen :nchen] from comment #5)
> Comment on attachment 779964 [details] [diff] [review]
> Make GeckoAppShell.pumpMessageLoop waste less CPU time V3 - Nit picking
> 
> Review of attachment 779964 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for the good work! :)
> 
> ::: mobile/android/base/GeckoAppShell.java
> @@ +141,5 @@
> > +    // Static reference to Gecko handler to avoid call overheads.
> > +    private static Handler sGeckoHandler;
> > +    // Static reference to this thread's looper call to avoid call overheads.
> > +    private static MessageQueue sMyQueue;
> > +
> 
> Alright, I now think it's fine to cache MessageQueue.
> 
> As for the Gecko Handler, can you make the fields public in ThreadUtils? 1)
> It avoids duplication. 2) It doesn't clutter up GeckoAppShell.

I'd love to - but assumed that'd cause more resistance than what I did.


> I'd recommend that sMyQueue be moved to ThreadUtils, too, and be called
> sGeckoQueue. Comments would be great on why these need to be fields instead
> of getters.

Sounds like a plan!

> Note that MessageQueue.next() is written in Java [1]. Also reflection is
> implemented through JNI calls [2], so it's debatable whether reflection is
> an optimization over JNI. But feel free to keep a comment here as you feel
> appropriate.
> 
> [1]
> http://androidxref.com/4.2_r1/xref/frameworks/base/core/java/android/os/
> MessageQueue.java#117
> [2]
> http://androidxref.com/4.2_r1/xref/libcore/luni/src/main/java/java/lang/
> reflect/Method.java#506

That's really interesting - I had assumed otherwise - I was seeing calls to nativeInvoke in the profiles and thought this was some special thing to invoke native methods, not the actual natively-implemented way of invoking methods.
Drat.
Should probably actually look things up in future.
Okay, no more factually inaccurate comments, no more needless spaces, and no more duplicated object references (Just public ones instead. Eep.)

It seems wise to revisit this change if we ever resolve Bug 709230 (My new side project) since it can make these kinds of stupid optimisations (And a bunch of cleverer ones, too) happen for us at compile time, so we can both have our neat code and have it not run slowly for stupid reasons like getter call overhead. (Profiling does suggest that there are a bunch of places, particularly in gfx code, where inlining getters like this could make performance much better - although I'm not filing to change those by hand because of the quite dreadful effect on neatness that would ensue).
Attachment #779964 - Attachment is obsolete: true
Attachment #780727 - Flags: review?(nchen)
Comment on attachment 780727 [details] [diff] [review]
Make GeckoAppShell.pumpMessageLoop waste less CPU time V4 - Remove the stupid.

Review of attachment 780727 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! Looks good. See nits below.

::: mobile/android/base/GeckoAppShell.java
@@ +308,5 @@
>          };
>          Looper.myQueue().addIdleHandler(idleHandler);
>  
> +        ThreadUtils.sGeckoQueue = Looper.myQueue();
> +

You should do this inside GeckoThread.run(), right below setGeckoThread. http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/GeckoThread.java#118

Also, I recommend getting rid of ThreadUtils.setGeckoThread and ThreadUtils.getGeckoThread entirely.

::: mobile/android/base/util/ThreadUtils.java
@@ +21,5 @@
>      private static Handler sUiHandler;
> +
> +    // Referenced directly from GeckoAppShell in highly performance-sensitive code (The extra function call of the getter
> +    // was harming performance. (Bug 897123))
> +    // Once Bug 709230 is resolved we should reconsider this as ProGuard should be able to optimise this out at compile time.

Nit: limit lines to 100 chars.

::: widget/android/AndroidJNI.cpp
@@ +816,5 @@
> +
> +    if (jMessagesField) {
> +        jobject msg = jenv->GetObjectField(queue, jMessagesField);
> +        // if queue.mMessages is null, queue.next() will block, which we don't want
> +        // It turns out to be an order of magnitude more performant to do this extra check here and block less vs. one fewer

Nit: 100 chars
Attachment #780727 - Flags: review?(nchen) → review+
Done as you suggested. Ready for checkin?
Attachment #780727 - Attachment is obsolete: true
Attachment #781856 - Flags: review?(nchen)
Status: NEW → ASSIGNED
https://tbpl.mozilla.org/?tree=Try&rev=8cdf8c2c2389
Let's see if I broke anything nonobvious...
Attachment #781856 - Flags: review?(nchen) → review+
Looks like a win!
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/fac2d791f66b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.