Closed Bug 794337 Opened 7 years ago Closed 7 years ago

Canvas does not update properly without hardware acceleration

Categories

(Core :: Canvas: 2D, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox15 --- unaffected
firefox16 --- unaffected
firefox17 --- affected
firefox18 --- verified
firefox19 --- fixed
firefox-esr10 --- unaffected

People

(Reporter: spectre, Assigned: BenWa)

References

()

Details

Attachments

(1 file, 2 obsolete files)

First demonstrated on TenFourFox, which lacks hardware acceleration altogether, but also reproducible on Firefox on 10.6.

STR:
0. Disable hardware acceleration from about:config and restart the browser.
1. Start JSNES with any game.
2. The display does not update until you open a new tab, switch to it and switch back. Even then it does not update consistently.

Other canvas-based animation is also affected, such as GameBoy.

Works with hardware acceleration on. Does not change if Azure is turned off. 16 is not affected.

Last good revision: http://hg.mozilla.org/mozilla-central/rev/8b96a33ecbd2
First bad revision: http://hg.mozilla.org/mozilla-central/rev/2abd21593e57
Bug 764125 is in this regression range, but I'm not sure what is exactly at fault from that.
I neglected to add that this still affects trunk.
While it's not repainting properly, are we doing empty transactions? I.e. is PresShell::Paint getting called regularly, but we return after calling EndEmptyTransaction()?
I see lots of Paints, but no empty transactions. I threw a fprintf in there just to make sure, and all I see is Paint, on both systems:

+fprintf(stderr, "nsPresShell::Paint\n");
   if (frame && isRetainingManager) {
     // Try to do an empty transaction, if the frame tree does not
     // need to be updated. Do not try to do an empty transaction on
     // a non-retained layer manager (like the BasicLayerManager that
     // draws the window title bar on Mac), because a) it won't work
     // and b) below we don't want to clear NS_FRAME_UPDATE_LAYER_TREE,
     // that will cause us to forget to update the real layer manager!
     if (aType == PaintType_Composite) {
       if (layerManager->HasShadowManager()) {
         return;
       }
       layerManager->BeginTransaction();
       if (layerManager->EndEmptyTransaction()) {
+fprintf(stderr, "Empty transaction\n");
         return;
       }
Is CanvasRenderingContext2DUserDataAzure::DidTransactionCallback getting called for every frame?
Yes, I see a DidTransactionCallback on every nsPresShell::Paint.

I tried to replicate this on a Windows 7 machine at work and was unable to, btw, so this seems Mac-specific.
In every PresShell::Paint, is BasicCanvasLayer::UpdateSurface getting called and are we getting past the "if (!IsDirty())" check?
With a couple extra fprintf's worth of logging, I see the following on each frame:

nsPresShell::Paint
BasicCanvasLayer::UpdateSurface
mDirty = true
::DidTransactionCallback

Looking at UpdateSurface, since we have no hardware acceleration, we should have no mGLContext. Is it ever possible for aDestSurface to be null(ptr) in there? Should that not happen? It looks like if there is no mGLContext and no aDestSurface, nothing happens.
Looking even further, the only place I see that calls UpdateSurface with a null aDestSurface is

229 void
230 BasicCanvasLayer::Paint(gfxContext* aContext, Layer* aMaskLayer)
231 {
232   if (IsHidden())
233     return;
234   UpdateSurface();  /**** HERE ****/
235   FireDidTransactionCallback();
236   PaintWithOpacity(aContext, GetEffectiveOpacity(), aMaskLayer);
237 }

Why would this call it without a surface?
(In reply to Cameron Kaiser from comment #7)
> Looking at UpdateSurface, since we have no hardware acceleration, we should
> have no mGLContext.

That just means it's not a WebGL canvas.

Is "(void)drawRect:(NSRect)aRect inContext:(CGContextRef)aContext" in nsChildView.mm getting called regularly? What rects are being returned by getRectsBeingDrawn?
Sorry, I was out of town. I made a couple tweaks to nsChildView.mm but I'm getting garbage numbers out of it. Did I patch this right? During the JSNES run I see, repeatedly,

nsPresShell::Paint
BasicCanvasLayer::UpdateSurface
mDirty = true
::DidTransactionCallback
drawRect inContext
---- Update[0x2110bbc0][0x2110bab0] [678.000000 437.000000 257.000000 240.000000] cgc: 0x1c0fd360
  gecko bounds: [0 0 1288 999]
  xform in: [1.000000 0.000000 0.000000 -1.000000 0.000000 999.000000]
rect 0 from getRectsBeingDrawn 1081085952 x 0

This doesn't make a lot of sense to me and I think I'm doing something wrong with the NSRect but I can't figure out what.

-#ifdef DEBUG_UPDATE
+//#ifdef DEBUG_UPDATE
   nsIntRect geckoBounds;
   mGeckoChild->GetBounds(geckoBounds);
 
   fprintf (stderr, "---- Update[%p][%p] [%f %f %f %f] cgc: %p\n  gecko bounds: [%d %d %d %d]\n",
            self, mGeckoChild,
            aRect.origin.x, aRect.origin.y, aRect.size.width, aRect.size.height, aContext,
            geckoBounds.x, geckoBounds.y, geckoBounds.width, geckoBounds.height);
 
   CGAffineTransform xform = CGContextGetCTM(aContext);
   fprintf (stderr, "  xform in: [%f %f %f %f %f %f]\n", xform.a, xform.b, xform.c, xform.d, xform.tx, xform.ty);
-#endif
+//#endif
   nsIntRegion region;
 
   nsIntRect boundingRect =
     nsIntRect(aRect.origin.x, aRect.origin.y, aRect.size.width, aRect.size.height);
   const NSRect *rects;
   NSInteger count, i;
   [[NSView focusView] getRectsBeingDrawn:&rects count:&count];
   if (count < MAX_RECTS_IN_REGION) {
     for (i = 0; i < count; ++i) {
       // Add the rect to the region.
       const NSRect& r = [self convertRect:rects[i] fromView:[NSView focusView]];
+fprintf(stderr, "rect %i from getRectsBeingDrawn %i x %i\n",
+       i, rects[i].size.width, rects[i].size.height);
       region.Or(region, nsIntRect(r.origin.x, r.origin.y, r.size.width, r.size.height));
     }
     region.And(region, boundingRect);
   } else {
     region = boundingRect;
   }
You know what works really well for NSRect coordinates? Floats.

Anyway, when I am on the JSNES tab, I see, over and over:

nsPresShell::Paint
BasicCanvasLayer::UpdateSurface
mDirty = false
::DidTransactionCallback
drawRect inContext
---- Update[0x1db0f730][0x1db0f620] [678.000000 437.000000 257.000000 240.000000] cgc: 0x1ce5a260
  gecko bounds: [0 0 1288 999]
  xform in: [1.000000 0.000000 0.000000 -1.000000 0.000000 999.000000]
rect 0 from getRectsBeingDrawn 257.000000 x 240.000000

This does correspond to the size of the canvas. When I switch tabs back and forth, which repaints the entire view (I presume), I get

nsPresShell::Paint
BasicCanvasLayer::UpdateSurface
mDirty = true
::DidTransactionCallback
drawRect inContext
---- Update[0x1db0f730][0x1db0f620] [0.000000 0.000000 1288.000000 999.000000] cgc: 0x1ce5a260
  gecko bounds: [0 0 1288 999]
  xform in: [1.000000 0.000000 0.000000 -1.000000 0.000000 999.000000]
rect 0 from getRectsBeingDrawn 1288.000000 x 999.000000

and then back to

nsPresShell::Paint
BasicCanvasLayer::UpdateSurface
mDirty = false
::DidTransactionCallback
drawRect inContext
---- Update[0x1db0f730][0x1db0f620] [678.000000 437.000000 257.000000 240.000000] cgc: 0x1ce5a260
  gecko bounds: [0 0 1288 999]
  xform in: [1.000000 0.000000 0.000000 -1.000000 0.000000 999.000000]
rect 0 from getRectsBeingDrawn 257.000000 x 240.000000

So it looks like it is trying to paint, but it's not painting the right thing?
(btw, changed to

+fprintf(stderr, "rect %i from getRectsBeingDrawn %f x %f\n",
+       i, r.size.width, r.size.height);

so it actually reflects what is being fed to region.Or().
Before you said mDirty was true, now you say it's false. Is it really false?
I apologize, I cut and pasted that from the transaction immediately following the repaint of the entire window, when it was indeed not dirty. It really is showing mDirty = true while the script is running. This is the correct output:

nsPresShell::Paint
BasicCanvasLayer::UpdateSurface
mDirty = true
::DidTransactionCallback
drawRect inContext
---- Update[0x1db13a50][0x1db13940] [678.000000 437.000000 257.000000 240.000000] cgc: 0x1ce5a990
  gecko bounds: [0 0 1288 999]
  xform in: [1.000000 0.000000 0.000000 -1.000000 0.000000 999.000000]
rect 0 from getRectsBeingDrawn 257.000000 x 240.000000
So I don't know. That all looks good.

Maybe you should bisect down to a particular changeset.
I'm still confused about comment 8, though. Since WebGL won't work, and the destination surface would be null in that instance, it seems like that Paint call won't do anything (or at least it won't update anything).

In the meantime, I will try to bisect it down.
Two more questions while I'm trying to make heads or tails of this:

- What is the pipeline by which a canvas surface gets updated?
- Are you able to see this too with hardware acceleration disabled, or am I just nuts (or this is a 10.6-and-earlier-only bug)?
Hmm. CanvasLayerOGL does
  mCanvasSurface = gfxPlatform::GetPlatform()->GetThebesSurfaceForDrawTarget(mDrawTarget);
but that could make a snapshot of mDrawTarget ... maybe after that we don't update mCanvasSurface anymore? I don't really see how this is supposed to work. Jeff, BenWa, how are Azure 2D canvases supposed to work on Mac?
FWIW, that line is also in BasicCanvasLayer::UpdateSurface, but commented out ("TODO Fix me before turning accelerated quartz canvas by default"). Does that need to be uncommented there now that it is, in fact, default?
Accelerated Quartz canvas is not on by default yet.
I touched this code recently. I got it.
Assignee: nobody → bgirard
For what it's worth, uncommenting that line in comment 19 does repair this bug, but I don't know if that's a good fix or the right fix (the below is against 17).

 void
 BasicCanvasLayer::UpdateSurface(gfxASurface* aDestSurface, Layer* aMaskLayer)
 {
+  if (!mDirty)
+    return;
+  mDirty = false;
+
   if (mDrawTarget) {
     mDrawTarget->Flush();
     // TODO Fix me before turning accelerated quartz canvas by default
-    //mSurface = gfxPlatform::GetPlatform()->GetThebesSurfaceForDrawTarget(mDrawTarget);
+    mSurface = gfxPlatform::GetPlatform()->GetThebesSurfaceForDrawTarget(mDrawTarget);
   }
Ok here's what is going on:
Before with GetThebesSurfaceForDrawTarget we used to return the same surface on subsequent draw but this is no longer the case after:
http://hg.mozilla.org/mozilla-central/diff/c5125dde4bbf/gfx/thebes/gfxPlatformMac.cpp

Nick is this intentional? I couldn't find any discussion about this change in the review comments. I'm just wondering if the previous code was causing a bug that lead you to change this in incur extra work?

So we can either fix this by re-using the same surface or by un-commenting that line and updating the surface every frame.
Attached patch Update mSurface every frame (obsolete) — Splinter Review
Waiting on feedback from nrc for why GetThebesSurfaceForDrawTarget was changed to require this.
Attachment #670393 - Flags: feedback?(ncameron)
(In reply to Benoit Girard (:BenWa) from comment #23)
> Ok here's what is going on:
> Before with GetThebesSurfaceForDrawTarget we used to return the same surface
> on subsequent draw but this is no longer the case after:
> http://hg.mozilla.org/mozilla-central/diff/c5125dde4bbf/gfx/thebes/
> gfxPlatformMac.cpp
> 
> Nick is this intentional? I couldn't find any discussion about this change
> in the review comments. I'm just wondering if the previous code was causing
> a bug that lead you to change this in incur extra work?
> 
> So we can either fix this by re-using the same surface or by un-commenting
> that line and updating the surface every frame.

Yes it was definitely intentional, see this comment from the same changeset: http://hg.mozilla.org/mozilla-central/rev/c5125dde4bbf#l4.35 I think the reason is to avoid reference cycles because the surface holds a strong reference to the draw target, so the draw target should not hold a strong ref to the surface (but my memory is a little hazy about that).

So, the right way to fix this is to store the returned surface and reuse it, if that is possible.
Oh, maybe that is what is happening right now anyway. I don't think you should create a new surface every time we update the canvas, that sounds horribly wasteful. Will look at this a bit more....
(In reply to Nick Cameron [:nrc] from comment #26)
> Oh, maybe that is what is happening right now anyway. I don't think you
> should create a new surface every time we update the canvas, that sounds
> horribly wasteful. Will look at this a bit more....

OK, I'm not sure about that commented line, but that shouldn't happen (I think). I hope we never get a snapshot for GetThebesSurfaceForDrawTarget, it didn't touch the snaphotting code and just creates a new surface. I'm afraid I don't have any more insight for this, but I can push it on to my work stack if needed.
For 17, I need to fix this in TenFourFox; it's a bad regression for me to ship. Is there anything *wrong* with uncommenting that line, other than performance? No Power Mac supports BACKEND_COREGRAPHICS_ACCELERATED, just BACKEND_COREGRAPHICS. Do you have a better idea I could cobble a fix together over?
I don't think there is anything _wrong_ with un-commenting that line, but it will be really, really horrible for perf. There must be a better fix.
I looked quickly, I though it was just creating a surface wrapper which would be reasonable performance. I saw the dom canvas code actually calling this method once per frame. I'll grab a profile shortly to see how bad it is.
> So, the right way to fix this is to store the returned surface

Assuming so, where would be an appropriate place to store it, if not the UserData? Any reference cycle issues if the BasicCanvasLayer itself were made to hold the surface?
(In reply to Cameron Kaiser from comment #31)
> > So, the right way to fix this is to store the returned surface
> 
> Assuming so, where would be an appropriate place to store it, if not the
> UserData? Any reference cycle issues if the BasicCanvasLayer itself were
> made to hold the surface?

I think BasicCanvasLayer is the place to store it, like on the accelerated layers. I don't think you should store it in user data.
I'm looking at CanvasLayerOGL to see if I understand what you mean. Maybe I'm not getting how the data flows from surface to surface, but how is mCanvasSurface in CanvasLayerOGL.cpp handled differently than mSurface in CanvasLayerBasic.cpp? The difference I see is not that the layer doesn't have the surface stored, but rather it looks like CanvasLayerOGL.cpp::UpdateSurface actually is updating its own surface internally, which CanvasLayerBasic doesn't appear to do. Or do I not understand what mSurface and mCanvasSurface actually contain?

My apologies if these questions are ignorant.
Attachment #670393 - Flags: feedback?(ncameron) → review?(jmuizelaar)
I didn't notice any performance impact when profiling. I got 0 samples in that function.
Comment on attachment 670393 [details] [diff] [review]
Update mSurface every frame

Review of attachment 670393 [details] [diff] [review]:
-----------------------------------------------------------------

Why is mSurface changing?
Attachment #670393 - Flags: review?(jmuizelaar) → review-
Attached patch patch (obsolete) — Splinter Review
Attachment #670393 - Attachment is obsolete: true
Attachment #677866 - Flags: review?(jmuizelaar)
Comment on attachment 677866 [details] [diff] [review]
patch

Review of attachment 677866 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPlatformMac.cpp
@@ +386,5 @@
> +    unsigned char* data = (unsigned char*)CGBitmapContextGetData(cg);
> +    size_t bpp = CGBitmapContextGetBitsPerPixel(cg);
> +    size_t stride = CGBitmapContextGetBytesPerRow(cg);
> +    gfxIntSize size(aTarget->GetSize().width, aTarget->GetSize().height);
> +    nsRefPtr<gfxImageSurface> imageSurface = new gfxImageSurface(data, size, stride, bpp == 2

This needs to be 16 instead of 2.
Comment on attachment 677866 [details] [diff] [review]
patch

Review of attachment 677866 [details] [diff] [review]:
-----------------------------------------------------------------

Please add lots more comments

::: gfx/thebes/gfxPlatformMac.cpp
@@ +378,5 @@
>      return threshold;
>  }
>  
>  already_AddRefed<gfxASurface>
> +gfxPlatformMac::CreateThebesSurfaceAliasForDrawTarget_hack(mozilla::gfx::DrawTarget *aTarget)

Please use a more descriptive name than hack.

@@ +387,5 @@
> +    size_t bpp = CGBitmapContextGetBitsPerPixel(cg);
> +    size_t stride = CGBitmapContextGetBytesPerRow(cg);
> +    gfxIntSize size(aTarget->GetSize().width, aTarget->GetSize().height);
> +    nsRefPtr<gfxImageSurface> imageSurface = new gfxImageSurface(data, size, stride, bpp == 2
> +                                                                                     ? gfxImageFormat::ImageFormatRGB16_565

Add a comment about why this is gfxImageSurface and not a gfxQuartzImageSurface.
Attachment #677866 - Flags: review?(jmuizelaar) → review+
(In reply to Jeff Muizelaar [:jrmuizel] from comment #38)
> Please use a more descriptive name than hack.
> 

I'm not sure what else to add, the name mentions that it's a surface alias for the draw target. Maybe hack->CairoWorkaround?
Attachment #677866 - Attachment is obsolete: true
Attachment #679673 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/1ef1038baaf1
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Comment on attachment 679673 [details] [diff] [review]
patch - fix try failure caused by displacing the flush

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 764125
User impact if declined: 

Some canvas will fail to update on mac. This will be all un-accelerated users (10% of the mac population last I checked but I'd have to confirm this) and will affect some instances of un-accelerated canvas for accelerated users (the remaining 90%). Un-accelerated canvas compositing can be used for non accelerated windows and views such as add-ons windows/panels.

Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): Small but not entirely trivial. I could get other GFX peers to double check the patch. Backing out the cause is not a viable option.
String or UUID changes made by this patch: None
Attachment #679673 - Flags: approval-mozilla-beta?
Attachment #679673 - Flags: approval-mozilla-aurora?
Comment on attachment 679673 [details] [diff] [review]
patch - fix try failure caused by displacing the flush

(In reply to Benoit Girard (:BenWa) from comment #45)
> Comment on attachment 679673 [details] [diff] [review]
> Risk to taking this patch (and alternatives if risky): Small but not
> entirely trivial. I could get other GFX peers to double check the patch.
> Backing out the cause is not a viable option.


Given how late we are in the cycle, I'm declining the beta approval but if more GFX peers weigh in and confirm the low risk of this patch I would reconsider if this is something you think a lot of users would hit in the coming 6 weeks until 18 ships.
Attachment #679673 - Flags: approval-mozilla-beta?
Attachment #679673 - Flags: approval-mozilla-beta-
Attachment #679673 - Flags: approval-mozilla-aurora?
Attachment #679673 - Flags: approval-mozilla-aurora+
Alright I discussed this with Joe and we concluded that we wont push for the beta-uplift.
We're using the original workaround for TenFourFox 17 stable anyway, and we'll pick this up separately for 18+.
Verified that canvas updates properly when hardware acceleration is disabled on Mac OS X 10.7.5 using Firefox 18 beta 3. 
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:18.0) Gecko/20100101 Firefox/18.0
Build ID:  20121205060959
QA Contact: simona.marcu
You need to log in before you can comment on or make changes to this bug.