Panning around canvas on crash-stats.mozilla.org is laggy without GL compositing

RESOLVED FIXED

Status

()

Core
Canvas: 2D
RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: jdm, Assigned: azakai)

Tracking

({perf})

Trunk
Points:
---

Firefox Tracking Flags

(fennec2.0+)

Details

Attachments

(1 attachment, 6 obsolete attachments)

(Reporter)

Description

7 years ago
Go to a product summary page on crash-stats.  Notice that when panning around while the canvas element is in view, the panning is laggy.  Notice that when it is out of view, the panning is smooth as butter.  I've seen this on both my n900 and Galaxy S.

Updated

7 years ago
tracking-fennec: --- → ?
Keywords: perf
OS: Linux → All
Hardware: x86 → All

Comment 1

7 years ago
jdm, any idea what is going on?
(Assignee)

Updated

7 years ago
Assignee: nobody → azakai
(Assignee)

Comment 2

7 years ago
Some data, comparing 3 seconds of panning over the canvas to not panning, on a fast laptop:

Profiling IPC messages shows

PBrowserChild|AM:Content:SetCacheViewpor               total latency: 0.0000    total processing: 0.1079    total: 0:137:137
                 PLayersChild|Msg_Update [SYNC/RPC]    total latency: 0.2097    total processing: 0.0276    total: 70:75:75

Neither message is noticeable in profiles from without panning.

oprofile shows:

2335      1.7176  libxul.so                pixman_blt_sse2
2189      1.6102  libxul.so                js::Interpret(JSContext*, JSStackFrame*, unsigned int, JSInterpMode)

The former is not noticeable without panning; the latter is half as big without panning.

Taking the IPC profiling and oprofile stats together, and considering that this is on a powerful laptop and not a device, I'd say they might explain the issue. We apparently

1. Spend a significant amount of time in synchronous PLayersChild|Msg_Update messages that cause latency
2. Send dozens of Content:SetCacheViewport messages per second, that take a significant amount of time to process
It looks like the canvas there has an alpha channel, which is expensive to composite on the CPU.  GL should make this go away, but another possibility is fixing the TODO here

  virtual LayerState GetLayerState(nsDisplayListBuilder* aBuilder,
                                   LayerManager* aManager)
  {
    // XXX we should have some kind of activity timeout here so that
    // inactive canvases can be composited into the background
    return mozilla::LAYER_ACTIVE;
  }
Assignee: azakai → nobody
Component: General → Canvas: 2D
Product: Fennec → Core
QA Contact: general → canvas.2d
Assignee: nobody → azakai
tracking-fennec: ? → 2.0+
(Assignee)

Comment 4

7 years ago
So, what should we do here - fix that TODO, or wait for GL?
If you've got the time, it'd be great to apply the GL patches and check if they indeed speed up compositing here.
(In reply to comment #4)
> So, what should we do here - fix that TODO, or wait for GL?

Alon - Front-end could explore reducing the Content:SetCacheViewport messages too
(Assignee)

Comment 7

7 years ago
(In reply to comment #5)
> If you've got the time, it'd be great to apply the GL patches and check if they
> indeed speed up compositing here.

I tried to test with hardware acceleration, however it crashes (filed bug 617498), so I cannot tell if that will help here or not.
(Assignee)

Comment 8

7 years ago
Crashes and lockups are frequent with GL turned on on that page, but sometimes it works long enough to see that it is much, much smoother with GL (on a Galaxy S).
\o/
has this gotten better?  WFM?
GL compositing makes it go away.  Probably worth leaving open to explore alternatives if we end up disabling GL on important HW.
Summary: Panning around canvas on crash-stats.mozilla.org is laggy → Panning around canvas on crash-stats.mozilla.org is laggy without GL compositing
(Assignee)

Comment 12

7 years ago
Removing assignee as there is currently no work to be done in this bug.
Assignee: azakai → nobody
Its looking like open gl compositing will not be enabled on at least a subset of devices. Alon, do you want to pick this back up?
(Assignee)

Comment 14

7 years ago
Hmm, I am not clear on what should be done here. I assume comment #3 is intended? cjones, can you please elaborate?
Yes, when we use CPU compositing in chrome our easiest bet is probably to do comment 3 in the content process.  In fennec, that will slow down content re-rendering a bit at the expense of faster panning in the common case, which seems like a win.
(Assignee)

Comment 16

7 years ago
Are we sure that that would improve things here? Just switching mozilla::LAYER_ACTIVE to mozilla::LAYER_INACTIVE as a test doesn't seem to have any effect.
You can check the layer tree output by setting gDumpPaintList = 1 in a debug build. With layers set to inactive, the CanvasLayer entry should disappear.

Note that this will break reftests (functionally, rather than a visible failure) that rely on canvas being an always active layer.
(Assignee)

Comment 18

7 years ago
Thanks for the info!

Ok, I verified that indeed all CanvasLayers vanish when I return INACTIVE (but do appear when the original ACTIVE is returned). As mentioned before, no noticeable performance difference (or any other visual difference), but I'll do some more tests.
(In reply to comment #17)
> Note that this will break reftests (functionally, rather than a visible
> failure) that rely on canvas being an always active layer.

Hmm, how so?  Example?

(In reply to comment #18)
> Ok, I verified that indeed all CanvasLayers vanish when I return INACTIVE (but
> do appear when the original ACTIVE is returned). As mentioned before, no
> noticeable performance difference (or any other visual difference), but I'll do
> some more tests.

Your patch (probably not quite what we want; prolly want to make INACTIVE off a timeout) would move the <canvas>->page compositing into the content process, where it would happen exactly once per PLayers:Update.  Currently compositing happens on each chrome-process repaint.  If the number of repaints are >> PLayers:Update, e.g. if you're panning slowly up/down around the canvas, your patch ought to be a clear qualitative win.  If the ratio is closer to 1:1, we won't win anything.
(And in fact could lose if we would have been able to apply a clip in the chrome process, to skip some compositing.)
(Assignee)

Comment 21

7 years ago
Ok, did some more testing. Using INACTIVE definitely has a positive effect in many cases, but it might not be very noticeable depending on zoom etc. (which is why I didn't notice it before).
Assignee: nobody → azakai
(Assignee)

Comment 22

7 years ago
cjones, can you please clarify some things for me?

Other classes that decide between ACTIVE/INACTIVE - nsDisplayVideo and nsDisplayOpacity - appear to have natural ways to find out if their content is active or not. Not clear to me what makes sense with canvas. My initial guess is something to do with when a reflow/invalidation occurs, but I'm not sure if that makes sense - won't the canvas be already inactive when the reflow completes (so no need to do a timeout for later)? Or is the idea to wait in anticipation of possible additional reflows happening soon after? Or are my guesses completely off the mark? ;)
Vlad would have the best feel for heuristics, so he should have the final say.

For GL we don't need to worry about this, it would be a perf hit.  (Might be a win for battery life, but we don't really have the tools to test that yet.)  If

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayers.h#318

says anything other than LAYERS_BASIC, we don't need to bother with this ACTIVE/INACTIVE code.  The same is true with single-process rendering.

There's not really a clean way to determine ACTIVE/INACTIVE for <canvas>, AFAIK.  A heuristic that might be good enough for 4.0 could be
 - ACTIVE unless chrome-process layer manager is LAYERS_BASIC, in which case
 - ACTIVE if the <canvas> is opaque (and so compositing is cheap)
 - INACTIVE otherwise

I'm starting to reconsider comment 19; with the above, we'd trade off possibly unnecessary compositing ops in the content process (say if the <canvas> layer ended up off-screen in chrome) for *not* doing an unbounded number of unnecessary compositing ops in the chrome process if the <canvas> layer were on-screen.  This keeps the possibly-wasted work in the content process.

We could get slightly fancier by amending "INACTIVE otherwise" above to
 - ACTIVE unless the canvas has been invalidated within the last k milliseconds*
 - INACTIVE otherwise

* probably determined near here http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/nsCanvasRenderingContext2D.cpp#4061

That's similar to what we do for scrollable subframes, and could be a win on desktop too.  Not sure what the timeout would be ... 100ms?  200ms?  The important thing is not to waste composite-into-background ops for active animations.
If we were doing to make canvases INACTIVE, I would use a heuristic of the form "make the canvas active if it's been updated within the last NNN seconds".

If someone's updating the canvas a lot (and not changing other content on the page), then making it an active layer should be a performance improvement because it avoids the cost of redrawing the content in the ThebesLayer(s) around it.
(Assignee)

Comment 25

7 years ago
>  - ACTIVE unless chrome-process layer manager is ...
                   ~~~~~~~~~~~~~~

A question about that: Even when running a remote page in Fennec, the call to GetLayerState (where we need to decide active/inactive) happens in the parent process during panning (it happens in the child during initial load). So I do not understand what kind of check you are suggesting to do, regarding which process is being run (as the check during panning always happens in the parent).
Which GetLayerState call?  For <canvas> frames?
(Assignee)

Comment 27

7 years ago
Yes, in nsDisplayCanvas.
Those are probably the UI's <browser> thumbnails, which we don't really care about since they're small and opaque.  But comment 23 and comment 24 apply equally well to those.

"chrome-process layer manager" should really have been, "*compositing* layer manager".  For the content process, the eventual compositor will be the chrome-process's manager.
(Assignee)

Comment 29

7 years ago
Created attachment 505272 [details] [diff] [review]
wip patch

This is what I have so far. Works nicely on device.

cjones, I don't understand your last comment, and still not clear to me what kind of process-related check you are suggesting here - what exactly should be checked for, and where?
Content processes always use basic layer managers.  "Shadow layers" of the layers created by the content-process managers are hooked into the chrome-process layer manager.  The chrome process layer manager can by any type.  The chrome-process layer manager is the one that eventually composites content-process <canvas> shadow layers onto fennec's window.

In your patch, the check for basic managers is always going to be true in the content process, even if we're using GL layers in chrome.  In that case, composite-into-background doesn't necessarily help perf, and can hurt it.

Instead, we'd want to check a layer-manager interface like MaybeCompositeCanvasLayersIntoBackground().  In content processes, that interface would use

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayers.h#318 

, i.e. the compositing manager's type, instead of the content-process manager's type, which is always LAYERS_BASIC.

(You didn't request feedback on this patch, but I think you want the inactivity check to be near

http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/nsCanvasRenderingContext2D.cpp#4061

rather the checking the frame's reflow time.  To my knowledge, <canvas> reflow and web content drawing to the canvas are entirely separate beasts, and we only care about the latter for composite-into-background.  Vlad would know more.)
(Assignee)

Comment 31

7 years ago
Created attachment 505481 [details] [diff] [review]
wip patch

Thanks for the explanation!

Vlad, I marked as asking for feedback, specifically for the issue cjones mentioned you would know about, in the previous comment.

Unrelated question (for anyone): I have

  layers.acceleration.disabled = true
  layers-acceleration-force-enable = false

but I do get some layers as being LAYERS_OPENGL for their parent backend type. Is this expected?
Attachment #505272 - Attachment is obsolete: true
Attachment #505481 - Flags: feedback?(vladimir)
Indeed, canvas reflow is totally unrelated to the canvas layer or drawing into the canvas.  Only drawing into the canvas should trigger the inactive/active stuff; using the last reflow time for this isn't useful.  (Think of video -- video is marked as "active" while it's playing, that is while the content is changing, and inactive when paused.)  Instead, you want to use something like the last update time on the canvas layer, if possible.

You can't quite use last invalidation time, because if we ever invalidate the entire canvas, we'll stop calling invalidate again until we actually paint.  So you could get into the scenario of someone doing some really expensive canvas drawing that takes a long time; at some point we'd invalidate the entire canvas and stop calling invalidate again, so the "time since last invalidate" would be very large by the time we got to layers, even though we're actively painting.  But maybe that's a pathological case anyhow, since you can get into the same situation if you use last updated time...

Note that as far as I know, the inactivity bits are an optimization primarily for basic layers.  For other layer types, the relevant composite should be quite fast, and the only optimization we'd gain is somewhat reduced memory usage.
Examples of reftests relying on canvases being active is layout/bugs/602200-**.html tests.
I think those tests will be OK as long as canvases become inactive a few seconds after the last drawing activity.
(Assignee)

Comment 35

7 years ago
Is AreLayersMarkedActive relevant here?

http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp#1567
Yes, that's the right thing. You can call nsIFrame::MarkLayersActive to mark the canvas frame as active after it's been painted. It will automatically be marked inactive soon after the last MarkActive (currently 300-400ms, contrary to comments, although we might reduce that to 75-100ms).
(Assignee)

Comment 37

7 years ago
Created attachment 507657 [details] [diff] [review]
wip patch

Hmm, I still don't understand the big picture here. I can't seem, for example, to find a way to access the relevant Layer in GetLayerState() (to find out if it was recently updated), nor to access the relevant Frame from ShadowLayers (to call MarkLayersActive).

So, this patch is the best I can think of so far. It adds a lastUpdatedTime to ShadowLayer, and caches that when calling BuildLayer, then reads it in GetLayerState.

Is there a better way?
Attachment #505481 - Attachment is obsolete: true
Attachment #507657 - Flags: feedback?
Attachment #505481 - Flags: feedback?(vladimir)
See nsDisplayOpacity::GetLayerState, which just does
  if (mFrame->AreLayersMarkedActive())
    return LAYER_ACTIVE;

Instead of accessing the frame from ShadowLayers, mark the frame active from the canvas code (nsCanvasRenderingContext2D::GetCanvasLayer and WebGLContext::GetCanvasLayer).
(Assignee)

Comment 39

7 years ago
Created attachment 507998 [details] [diff] [review]
patch

Thanks, here is a much nicer and shorter patch.
Attachment #507657 - Attachment is obsolete: true
Attachment #507998 - Flags: review?(roc)
Attachment #507657 - Flags: feedback?
-    // XXX we should have some kind of activity timeout here so that
-    // inactive canvases can be composited into the background
-    return mozilla::LAYER_ACTIVE;
+    if (XRE_GetProcessType() == GeckoProcessType_Default ||
+        static_cast<BasicShadowLayerManager*>(aManager)
+          ->GetParentBackendType() != LayerManager::LAYERS_BASIC)
+      return mozilla::LAYER_ACTIVE;
+
+    return mFrame->AreLayersMarkedActive() ? LAYER_ACTIVE : LAYER_INACTIVE;

I would like to be consistent here across products and processes. I don't think this should depend on the process type.

Also, I don't think we should just check for LAYERS_BASIC here. It could be a ShadowLayerManager where the shadows get accelerated rendering, in which case we probably want to always be active. How about adding a method LayerManager::IsCompositingCheap() which we call here and which is true for everything except BasicLayerManager? Then we can easily change it to return the right thing when we have accelerated shadows.
See comment 30 for a way to implement that.
(Assignee)

Comment 42

7 years ago
Not sure I understand. So, LayerManager::IsCompositingCheap would return true for everything but BasicLayerManager and BasicShadowLayerManager? BasicLayerManager will return false, and BasicShadowLayerManager will need to do an IPC call to call IsCompositingCheap in the parent?
LayerManager::IsCompositingCheap() {
  return LAYERS_BASIC != GetBackendType();
}

overridden for basic layers by

BasicShadowLayerManager::IsCompositingCheap() {
  return !mShadowManager ? LayerManager::IsCompositingCheap() :
                           LAYERS_BASIC != mShadowManager->GetParentBackendType();
}
(Assignee)

Comment 44

7 years ago
Created attachment 508541 [details] [diff] [review]
patch

Updated patch with |IsCompositingCheap|.
Attachment #507998 - Attachment is obsolete: true
Attachment #508541 - Flags: review?(roc)
Attachment #507998 - Flags: review?(roc)
+  virtual PRBool IsCompositingCheap()
+  {
+    return IsCompositingCheap(GetBackendType());
+  }

Make this just return true, and override in BasicLayerManager to return false. No need for the virtual call of GetBackendType.

+  // Whether compositing is cheap depends on the parent backend.
+  return LayerManager::IsCompositingCheap(mShadowManager ?
+    GetParentBackendType() : GetBackendType());

So this would just be return mShadowManager && IsCompositingCheap(GetParentBackendType());
(Assignee)

Comment 46

7 years ago
Created attachment 508642 [details] [diff] [review]
patch

Patch with those fixes.
Attachment #508541 - Attachment is obsolete: true
Attachment #508642 - Flags: review?(roc)
Attachment #508541 - Flags: review?(roc)
Comment on attachment 508642 [details] [diff] [review]
patch

We'd better land this for the beta, it may affect desktop Firefox performance.
Attachment #508642 - Flags: review?(roc) → review+
(Assignee)

Comment 48

7 years ago
Sending to try, will push to m-c if all is well.
(Assignee)

Comment 49

7 years ago
(In reply to comment #33)
> Examples of reftests relying on canvases being active is
> layout/bugs/602200-**.html tests.

Early results from try indeed show a failure on 602200-4.html.

Do we want to keep the old behavior (that is, limit the new behavior only to child processes), or do we want to investigate the failure and possibly change the test?
I think what's happening is that we fill in the canvas before the canvas frame is created, so we never get around to marking the layer active.

The conservative way around this would be to mark the canvas frame active when it's created.
(Assignee)

Comment 51

7 years ago
Created attachment 509137 [details] [diff] [review]
patch

Great, looks like that fixes the problems on try.

Did I put the call to MarkLayersActive() in the right place?
Attachment #508642 - Attachment is obsolete: true
Attachment #509137 - Flags: review?(roc)
Comment on attachment 509137 [details] [diff] [review]
patch

+  // We can fill in the canvas before the canvas frame is created, in
+  // which case we never get around to marking the layer active. Therefore,
+  // we mark it active here when we create the frame.
+  MarkLayersActive();

That's probably OK in the constructor but doing it in nsHTMLCanvasFrame::Init would be better. r+ with that.
Attachment #509137 - Flags: review?(roc) → review+
(Assignee)

Comment 53

7 years ago
Pushed: http://hg.mozilla.org/mozilla-central/rev/dc85ecc5f1dc
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Depends on: 640170
Depends on: 852803
No longer depends on: 852803
You need to log in before you can comment on or make changes to this bug.