Last Comment Bug 731868 - cycles through CanvasRenderingContext2DUserData can leak everything
: cycles through CanvasRenderingContext2DUserData can leak everything
Status: RESOLVED FIXED
[MemShrink:P2]
: regression
Product: Core
Classification: Components
Component: Canvas: 2D (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla15
Assigned To: Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
:
Mentors:
Depends on:
Blocks: 622072 757749
  Show dependency treegraph
 
Reported: 2012-02-29 16:48 PST by Luke Wagner [:luke]
Modified: 2012-07-10 12:48 PDT (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
wontfix
+
fixed


Attachments
maybe fix (1.34 KB, patch)
2012-03-01 00:19 PST, Luke Wagner [:luke]
dao+bmo: review-
Details | Diff | Splinter Review
the nsHTMLCanvasElement missing link, backtrace (5.04 KB, text/plain)
2012-03-13 00:56 PDT, Luke Wagner [:luke]
no flags Details
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure) (6.61 KB, patch)
2012-05-29 21:55 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
jmuizelaar: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
Part 2: Azure fix (6.83 KB, patch)
2012-05-29 22:09 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
jmuizelaar: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
Part 3: remove JS workaround for leak (822 bytes, patch)
2012-05-29 22:11 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
dietrich: review+
Details | Diff | Splinter Review

Description Luke Wagner [:luke] 2012-02-29 16:48:46 PST
A side-effect of bug 730497 is to make UI_init (in tabview.js) a "heavyweight" function which seems to cause the BackstagePass and 1000 other things to get leaked at shutdown.  However, this is nothing special about bug 730497, to see this on trunk, just put an "eval('')" at the head of UI_init (browser/components/tabview/tabview.js:8800) and run the browser/components/tabview/test/browser_tabview_firstrun_pref.js browser-chrome test.

I spent a little bit of time looking at the code and the cycle collector dump.  The cause seems to be several event handlers that never get un-registered:
  - line 8829: iQ("#exit-button").click(function() { ... })
  - line 8837: iQ(gTabViewFrame.contentDocument).mousedown(function(e) { ... })
  - line 2016: $container.mousedown(function(e) { ... })
  - line 2627: handleKeyUp

dump() calls show UI_uninit() getting called so perhaps the cleanup is missing some bits?

I'd really appreciate any help anyone can give me understanding this leak.  Again, this is technically a bug on trunk, saved only by an optimization that we want to remove soon.  The optimization is fickle so that means any future modifications to this code may turn off the optimization and trigger the same leak.
Comment 1 Luke Wagner [:luke] 2012-03-01 00:19:28 PST
Created attachment 601892 [details] [diff] [review]
maybe fix

After looking at the code a bit more, I think the fix is simple.
Comment 2 Tim Taubert [:ttaubert] 2012-03-01 02:13:50 PST
Thanks for providing a patch! While there's nothing wrong with adding some cleanup code I don't understand why we're actually leaking here. Sure, these event handlers get never unregistered but I thought that's not a problem because the whole window gets destroyed on shutdown anyway? Is that technically even a leak?
Comment 3 Luke Wagner [:luke] 2012-03-01 02:21:41 PST
Yeah, I'm not really sure, I was just looking at the cycle collector dumps.  I'm not really sure what the lifetimes of all these things are here, but I'm thinking there is a long-lived object (that will live until shutdown) and some content windows (which should all be dead before shutdown) and the lexical closures of the lambdas that my path unregisters are creating edges from the long-lived object to the content windows.  Does that make sense?  It is quite possible this isn't the right fix.

Perhaps peterv has something to add?
Comment 4 Tim Taubert [:ttaubert] 2012-03-01 02:37:19 PST
Panorama's content window is contained in the property gWindow.Tabview._window which is first destroyed on shutdown. Does adding |this._window = null| to TabView.uninit() fix the problem for you?

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser-tabview.js#163

As TabView.uninit() is supposed to clean up, this is clearly missing here.
Comment 5 Luke Wagner [:luke] 2012-03-01 12:40:57 PST
I tried that and I still get the leak.  If you want to play around with it more, it is pretty easy to stick an eval('') in the iQ("#exit-button").click lambda and then run:

  python _tests/testing/mochitest/runtests.py --browser-chrome --autorun --close-when-done --test-path=browser/components/tabview/test/browser_tabview_firstrun_pref.js
Comment 6 Tim Taubert [:ttaubert] 2012-03-05 23:56:16 PST
Thanks for the precise STR, easy to reproduce. This line is causing the leak:

http://mxr.mozilla.org/mozilla-central/source/browser/components/tabview/ui.js#222

Removing that line fixes the leaks though I'm not exactly sure why. Maybe we're creating a circular reference here?
Comment 7 Tim Taubert [:ttaubert] 2012-03-06 00:05:29 PST
Argh, false alarm. I accidentally removed the eval() somehow.
Comment 8 Tim Taubert [:ttaubert] 2012-03-06 02:18:03 PST
Ok, think I found it now. I checked with about:cc(dump) to see what's holding the document alive and it was in fact a xhtml:canvas element of which we have one for every tab in Panorama. I'd say it's probably some kind of CC bug wrt to try/catch blocks?

Removing the try/catch block at this line makes the leak disappear:

http://mxr.mozilla.org/mozilla-central/source/browser/components/tabview/tabitems.js#1492
Comment 9 Tim Taubert [:ttaubert] 2012-03-06 02:20:20 PST
(In reply to Tim Taubert [:ttaubert] from comment #8)
> Removing the try/catch block at this line makes the leak disappear:

I mean of course removing only the try/catch stmt, not the code wrapped in it.
Comment 10 Tim Taubert [:ttaubert] 2012-03-06 04:29:41 PST
Comment on attachment 601892 [details] [diff] [review]
maybe fix

Marking the patch as obsolete, seems like we need a fix in the core.
Comment 11 Luke Wagner [:luke] 2012-03-06 09:01:10 PST
(In reply to Tim Taubert [:ttaubert] from comment #8)
> I'd say it's probably some kind of CC bug wrt to try/catch blocks?

The CC shouldn't have any direct interaction with try/catch except through scope objects created.  Given that there are no nested closures/eval in this function, I don't think there would be any scope objects.  Removing the try/catch may also alter control flow (if an exception is thrown), can you confirm that the 'catch' is never executed?
Comment 12 Tim Taubert [:ttaubert] 2012-03-06 10:22:27 PST
Ok, sorry for the hassle. I could swear this fixed it but I can't reproduce it anymore. Latest discovery:

If I remove everything from TabCanvas.paint() except this line http://mxr.mozilla.org/mozilla-central/source/browser/components/tabview/tabitems.js#1480 then the leak still exists. Removing this line makes it disappear. Any chance that the 2d context can somehow hold a canvas alive?
Comment 13 Olli Pettay [:smaug] 2012-03-06 11:59:52 PST
2d context does keep its canvas element alive.
Comment 14 Luke Wagner [:luke] 2012-03-07 14:56:04 PST
The the original patch I posted, do you see the leaks via about:cc?
Comment 15 Luke Wagner [:luke] 2012-03-08 17:51:09 PST
Comment on attachment 601892 [details] [diff] [review]
maybe fix

Tim, you mentioned earlier that these changes look good; can we land this and investigate this issue involving canvas in a followup bug?
Comment 16 Dão Gottwald [:dao] 2012-03-09 03:08:43 PST
It seems like we need someone closer to the JS engine to investigate this, since it doesn't appear to be a front-end bug?
Comment 17 Luke Wagner [:luke] 2012-03-09 08:56:19 PST
Dao: Given that ttaubert said that my patch looks like something that belongs in uninit regardless, that sounds like something to investigate in the follow-up bug.  (Also, it seems like the issue isn't with the JS engine, but, iiuc, with canvas.)
Comment 18 Dão Gottwald [:dao] 2012-03-09 09:05:56 PST
(In reply to Luke Wagner [:luke] from comment #17)
> Dao: Given that ttaubert said that my patch looks like something that
> belongs in uninit regardless

I don't think he said that. The patch doesn't hurt but it shouldn't be needed.
Comment 19 Luke Wagner [:luke] 2012-03-09 09:07:29 PST
On irc.
Comment 20 Luke Wagner [:luke] 2012-03-12 10:07:02 PDT
This patch is holding back the landing of dependent patches.
Comment 21 Dão Gottwald [:dao] 2012-03-12 10:08:10 PDT
Can you explain why this belongs in uninit regardless?
Comment 22 Luke Wagner [:luke] 2012-03-12 10:32:21 PDT
Because uninit's job is to remove registered event handlers.  E.g., see all the removeEventListener calls registered with _cleanupFunctions.
Comment 23 Dão Gottwald [:dao] 2012-03-12 10:52:52 PDT
Why is it uninit's job is to remove registered event handlers? (There's actually only one instance of _cleanupFunctions being used to explicitly call removeEventListener. AllTabs.unregister is probably a leftover from when AllTabs was a JS module.)
Comment 24 Dão Gottwald [:dao] 2012-03-12 10:58:40 PDT
Comment on attachment 601892 [details] [diff] [review]
maybe fix

We need better understanding of what's going on here. Once we have that, we can possibly handle it in a followup bug. We shouldn't wallpaper over it just like this.
Comment 25 Luke Wagner [:luke] 2012-03-12 11:02:28 PDT
Alright, so are you going to help?
Comment 26 Jeff Walden [:Waldo] (remove +bmo to email) 2012-03-12 11:29:09 PDT
(In reply to Luke Wagner [:luke] from comment #17)
> (Also, it seems like the issue isn't with the JS engine, but, iiuc, with canvas.)

Note that the behavior of a canvas context holding the corresponding canvas element alive is by design.  If this is a problem in any particular case, then code should go to extra effort to not hold onto the context when it's no longer needed.
Comment 27 Dão Gottwald [:dao] 2012-03-12 11:41:22 PDT
Judging by <http://mxr.mozilla.org/mozilla-central/source/browser/components/tabview/tabitems.js#1480> which I only glanced at, it doesn't look like something should be holding onto the context.
Comment 28 Dão Gottwald [:dao] 2012-03-12 11:43:37 PDT
Moving to Core since based on what we know so far this doesn't seem to be a front-end bug, although we may take the front-end workaround if deemed appropriate.
Comment 29 Luke Wagner [:luke] 2012-03-12 11:53:33 PDT
So, how about a little help.  Where is the canvas created?  What is it associated with?  When should it become unreachable?
Comment 30 Luke Wagner [:luke] 2012-03-12 17:45:49 PDT
Using Andrew's find_roots.py, I get the following path to the HTMLDocument, starting at a canvas which has 1 unknown edge:

Parsing cc-edges-5.2788.log. Done loading graph. Reversing graph. Done.

0x7f43b25b1b60 [nsGenericElement (xhtml) canvas chrome://browser/content/tabview.html]
    --[GetParent()]-> 0x7f43b5b31780 [nsGenericElement (xhtml) div class='thumb' chrome://browser/content/tabview.html]
    --[GetParent()]-> 0x7f43b5b31600 [nsGenericElement (xhtml) div class='tab focus' chrome://browser/content/tabview.html]
    --[GetParent()]-> 0x7f43b0ae5e90 [nsGenericElement (xhtml) body chrome://browser/content/tabview.html]
    --[mAttrsAndChildren[i]]-> 0x7f43b076fe00 [nsGenericElement (xhtml) div class='groupItem iq-droppable iq-resizable activeGroupItem' chrome://browser/content/tabview.html]
    --[mAttrsAndChildren[i]]-> 0x7f43b0770000 [nsGenericElement (xhtml) div class='titlebar' chrome://browser/content/tabview.html]
    --[mAttrsAndChildren[i]]-> 0x7f43b0770100 [nsGenericElement (xhtml) div class='title-container' chrome://browser/content/tabview.html]
    --[mAttrsAndChildren[i]]-> 0x7f43b5adcd20 [nsGenericElement (xhtml) input class='name' chrome://browser/content/tabview.html]
    --[[via hash] mListenerManager]-> 0x7f43a73330d0 [nsEventListenerManager]
    --[mListeners[i] mListener]-> 0x7f43b0770590 [nsXPCWrappedJS (nsIDOMEventListener)]
    --[mJSObj]-> 0x7f43a566f160 [JS Object (Function)]
    --[upvars[0]]-> 0x7f43a451c180 [JS Object (Function)]
    --[fun_callscope]-> 0x7f43b09f09c0 [JS Object (Call)]
    --[**UNKNOWN SLOT 1**]-> 0x7f43b096f240 [JS Object (Function - GroupItem)]
    --[prototype]-> 0x7f43a50c1a40 [JS Object (Object)]
    --[showExpandControl]-> 0x7f43a501f6a0 [JS Object (Function - GroupItem_showExpandControl)]
    --[nativeReserved[1]]-> 0x7f43a50c1ac0 [JS Object (Object)]
    --[removeAll]-> 0x7f43a501f4c0 [JS Object (Function - GroupItem_removeAll)]
    --[script]-> 0x7f43b1766180 [JS Script]
    --[object, parent, script_global]-> 0x7f43a5053c40 [JS Object (ChromeWindow)]
    --[GroupItems]-> 0x7f43a50c1b00 [JS Object (Object)]
    --[_lastActiveList]-> 0x7f43a5006840 [JS Object (Object)]
    --[_list]-> 0x7f43a50470a0 [JS Object (Array)]
    --[element[0]]-> 0x7f43a451a240 [JS Object (Object)]
    --[$closeButton]-> 0x7f43a451a7c0 [JS Object (Object)]
    --[0]-> 0x7f43a56e7f70 [JS Object (HTMLDivElement)]
    --[parent]-> 0x7f43a5088a60 [JS Object (HTMLDocument)]

    Root 0x7f43b25b1b60 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x7f43b5b31780 [nsGenericElement (xhtml) div class='thumb' chrome://browser/content/tabview.html] --[mAttrsAndChildren[i]]-> 0x7f43b25b1b60
       0x7f43a7dfb630 [nsCanvasRenderingContext2D] --[mCanvasElement]-> 0x7f43b25b1b60
       0x7f43af5dd7c0 [nsDOMCSSAttributeDeclaration] --[mElement]-> 0x7f43b25b1b60

Anyone have any clues about what could keep a <canvas> alive?
Comment 31 Luke Wagner [:luke] 2012-03-12 17:54:53 PDT
Oh, right, the *2d context*, from earlier discussions.  Yes, I can see the same behavior as ttaubert mentioned earlier: commenting out everything but the this.context.getCanvas("2d") leaks, commenting out that line does not leak.
Comment 32 Luke Wagner [:luke] 2012-03-12 17:58:22 PDT
The context (which seems to be in a cycle with the canvas also has 1 unknown edge:

0x7fba31eb1b60 [nsGenericElement (xhtml) canvas chrome://browser/content/tabview.html]
    --[mCurrentContext]-> 0x7fba26c6a6a0 [nsCanvasRenderingContext2D]

    Root 0x7fba31eb1b60 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x7fba2ecf7a00 [nsDOMCSSAttributeDeclaration] --[mElement]-> 0x7fba31eb1b60
       0x7fba26c6a6a0 [nsCanvasRenderingContext2D] --[mCanvasElement]-> 0x7fba31eb1b60
       0x7fba35431580 [nsGenericElement (xhtml) div class='thumb' chrome://browser/content/tabview.html] --[mAttrsAndChildren[i]]-> 0x7fba31eb1b60
Comment 33 Boris Zbarsky [:bz] 2012-03-12 20:52:21 PDT
The canvas and canvas context hold references to each other.  But both those edges should be known to CC, in general...
Comment 34 Luke Wagner [:luke] 2012-03-12 21:27:28 PDT
I'm still digging.  It seems that the leaking version has a canvas elem with rc=4 and only 3 edges known to the CC.  In the non-leaking version, rc=3 so it gets cleaned up early.  I'm thinking getContext('2d') must be causing this outstanding ref-count (the context2d itself has no unaccounted incoming refs).
Comment 35 Luke Wagner [:luke] 2012-03-13 00:56:41 PDT
Created attachment 605311 [details]
the nsHTMLCanvasElement missing link, backtrace

Alright, I dug in more and I think I have the offender and indeed I think it may be platform (canvas) code, but I'll feedback from someone who knows canvas.

So, a nsHTMLCanvasElement is being leaked. The CC dump shows 3 incoming edges but the ref-count is 4.  Doing a refcnt-trace diff (w/ and w/o the eval('') that causes the leak) shows that the non-leaking version has one more 'Release' than the leaking version.  I attached the callstack of this 'Release'; basically, there is a path from ~nsDOMEvent through ~CanvasLayer and finally the nsRefPtr memvar of CanvasRenderingContext2DUserData.  I don't see any cycle collection annotations for this edge, so I think that explains the outstanding refcount from C++.  (The path through layers also explains why getContext('2d') was necessary: I suspect a canvas doesn't get a CanvasLayer until it has a 2d context.)

So why isn't the nsDOMEvent getting destroyed?  Because it is held alive by the 'event' local variable of UI_init which is being closed over by event listeners on the window which is being kept alive by the canvas which is being kept alive by the event...

Setting 'event' to null (or narrowing the scope) fixes the leak, but it seems that the real fix is to annotate CanvasRenderingContext2DUserData for CC.  Can any CC-knowing folk comment on this?
Comment 36 Olli Pettay [:smaug] 2012-03-13 02:46:32 PDT
Sounds right.
Comment 37 Peter Van der Beken [:peterv] - away till Aug 1st 2012-03-13 04:15:09 PDT
Making the layers not hold that strong reference to an element seems like a good change, can we also destroy the whole layer system when we destroy the layout frames?
Comment 38 Boris Zbarsky [:bz] 2012-03-13 06:42:28 PDT
ccing some folks who might know the answer to Peter's question in comment 37.
Comment 39 Timothy Nikkel (:tnikkel) 2012-03-14 02:17:28 PDT
I'm wondering what's keeping the canvas layer alive, it should die shortly after we no longer need to paint the canvas.
Comment 40 Luke Wagner [:luke] 2012-03-14 09:45:53 PDT
(In reply to Timothy Nikkel (:tn) from comment #39)
The nsDOMEvent object (see destructor callstack in attachment to comment 35).

Any volunteers to take this bug?
Comment 41 Olli Pettay [:smaug] 2012-03-14 09:54:20 PDT
Could we release nsPrescontext::mDeviceContext earlier? Not in the dtor, but when the
presentation actually goes away. That should fix this, I think.
Comment 42 Timothy Nikkel (:tnikkel) 2012-03-14 12:19:08 PDT
There is no reason the nsDOMEvent should keep alive a Layer. Layers should get deleted when they are no longer needed to paint to the screen. Is the canvas still visible on screen when you get that callstack (or the instant before it anyway)?
Comment 43 Luke Wagner [:luke] 2012-03-14 14:24:21 PDT
(In reply to Timothy Nikkel (:tn) from comment #42)
You can follow the chain of ref-counting in the destructor callstack in comment 25...
Comment 44 Luke Wagner [:luke] 2012-03-14 14:24:41 PDT
er, comment 35.
Comment 45 Timothy Nikkel (:tnikkel) 2012-03-14 14:36:04 PDT
What I am suggesting is that either that layer needs to be alive that long because it needs to be painted to the screen until that point (in which case the canvas element node can't go away either because it is on screen), or there is a bug in the layer system that is keeping it alive longer than it should be.
Comment 46 Olli Pettay [:smaug] 2012-03-14 14:58:02 PDT
Based on the stack layer system is keeping it alive very long, up until prescontext dtor.
Comment 47 Luke Wagner [:luke] 2012-03-14 15:20:47 PDT
If anyone would like to help, it is easy to reproduce this leak: just add "eval('');" to dist/bin/chrome/browser/content/browser/tabview.js:8830 and run:

python _tests/testing/mochitest/runtests.py --browser-chrome --autorun --close-when-done --test-path=browser/components/tabview/test/browser_tabview_firstrun_pref.js

Adding "event = null;" after tabview.js:8940 makes the leak go away.
Comment 48 Timothy Nikkel (:tnikkel) 2012-03-16 01:12:19 PDT
Ok, what I could see being the case is that the last paint of the window has the canvas element visible. Then the window is closed. After this is doesn't get any more paints so the layers don't get updated. Then they get destroyed when the widget gets destroyed. Maybe we should flush the layers sometime before the widget destructor.
Comment 49 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 22:01:56 PDT
Try clearing out the layer manager in nsIWidget::Show(false)?
Comment 50 Timothy Nikkel (:tnikkel) 2012-03-19 14:56:43 PDT
Was there a reason we only did that on widget destroy?
Comment 51 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-19 15:25:24 PDT
I don't recall a reason.
Comment 52 Olli Pettay [:smaug] 2012-03-31 13:52:36 PDT
tn, could you perhaps look at this. Leaks are bad, they increase cycle collection times
significantly.
Comment 53 Timothy Nikkel (:tnikkel) 2012-04-03 22:05:43 PDT
I'll take it, but I'm not going to get to this right away.
Comment 54 Andrew McCreight [:mccr8] 2012-05-25 10:29:20 PDT
Bug 757749 has a test case that creates a similar leak.
Comment 55 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-28 20:03:10 PDT
(In reply to Luke Wagner [:luke] from comment #47)
> If anyone would like to help, it is easy to reproduce this leak: just add
> "eval('');" to dist/bin/chrome/browser/content/browser/tabview.js:8830 and
> run:
> 
> python _tests/testing/mochitest/runtests.py --browser-chrome --autorun
> --close-when-done
> --test-path=browser/components/tabview/test/browser_tabview_firstrun_pref.js
> 
> Adding "event = null;" after tabview.js:8940 makes the leak go away.

Can you give instructions that don't depend on line numbers that have probably changed now? :-)
Comment 56 Luke Wagner [:luke] 2012-05-29 09:57:04 PDT
Since bug 730497 landed, just comment out "event = null" in:
  http://mxr.mozilla.org/mozilla-central/source/browser/components/tabview/ui.js#255
(or the analogous line in dist/bin/chrome/browser/content/browser/tabview.js).  I just tested again and this still repros.
Comment 57 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-29 17:21:18 PDT
I couldn't reproduce this on Windows. Is it Mac-only like bug 757749 seems to be?
Comment 58 Luke Wagner [:luke] 2012-05-29 17:52:01 PDT
I repro'd on Linux.
Comment 59 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-29 20:57:48 PDT
OK, I reproduce now. Thanks!
Comment 60 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-29 21:55:10 PDT
Created attachment 628216 [details] [diff] [review]
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure)
Comment 61 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-29 22:09:54 PDT
Created attachment 628224 [details] [diff] [review]
Part 2: Azure fix
Comment 62 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-29 22:11:26 PDT
Created attachment 628225 [details] [diff] [review]
Part 3: remove JS workaround for leak
Comment 63 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-29 22:12:50 PDT
I'm almost certain this will fix bug 757749 as well.
Comment 64 Jeff Muizelaar [:jrmuizel] 2012-05-31 07:21:09 PDT
Comment on attachment 628216 [details] [diff] [review]
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure)

Review of attachment 628216 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/nsCanvasRenderingContext2D.cpp
@@ +440,5 @@
>      bool mIPC;
>  
>      // the canvas element we're a context of
>      nsCOMPtr<nsIDOMHTMLCanvasElement> mCanvasElement;
> +    nsTArray<CanvasRenderingContext2DUserData*> mUserDatas;

A LinkedList will avoid the O(n) RemoveElement in CanvasRenderingContext2DUserData. Maybe we don't have enough of these for this to matter?

@@ +758,5 @@
>  
>      friend struct nsCanvasBidiProcessor;
>  };
>  
> +class CanvasRenderingContext2DUserData : public LayerUserData {

It might be worth explicitly calling this out as a weak reference.
Comment 65 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-31 16:14:34 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #64)
> A LinkedList will avoid the O(n) RemoveElement in
> CanvasRenderingContext2DUserData. Maybe we don't have enough of these for
> this to matter?

Correct. 0 or 1 are the overwhelmingly common cases. We could have more than one in unusual situations like temporary layer managers constructed for drawWindow calls.

> @@ +758,5 @@
> >  
> >      friend struct nsCanvasBidiProcessor;
> >  };
> >  
> > +class CanvasRenderingContext2DUserData : public LayerUserData {
> 
> It might be worth explicitly calling this out as a weak reference.

OK.
Comment 66 Jeff Muizelaar [:jrmuizel] 2012-05-31 16:21:36 PDT
Comment on attachment 628224 [details] [diff] [review]
Part 2: Azure fix

Assuming this is basically the same, it looks good to me.
Comment 68 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 00:24:49 PDT
I don't think we should uplift the patches here to an earlier release. I believe this leak can only happen during shutdown.
Comment 69 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 00:26:32 PDT
Actually I take that back. It might be possible to trigger this leak just by closing a window.
Comment 70 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 00:31:38 PDT
Comment on attachment 628216 [details] [diff] [review]
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure)

Review of attachment 628216 [details] [diff] [review]:
-----------------------------------------------------------------

So I think we should uplift this to Aurora. Leaks are bad.
Comment 71 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 01:28:39 PDT
Backed out for test failures:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebb3dd6d811a
Comment 72 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 01:29:23 PDT
Comment on attachment 628216 [details] [diff] [review]
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure)

Review of attachment 628216 [details] [diff] [review]:
-----------------------------------------------------------------

So I think we should uplift this to Aurora. Leaks are bad.
Comment 73 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 03:50:45 PDT
Additional backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/d9f90e9e1de9
Comment 74 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 03:53:24 PDT
Seemed to cause high-frequency failure in test_videocontrols.html on Macos 10.7 debug only:
https://tbpl.mozilla.org/php/getParsedLog.php?id=12263937&tree=Mozilla-Inbound&full=1

Reftest failures across all Mac platforms:
https://tbpl.mozilla.org/php/getParsedLog.php?id=12264983&tree=Mozilla-Inbound&full=1
Comment 75 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-01 06:34:47 PDT
It looks like under some conditions drawing into a canvas isn't triggering invalidation. That suggests I somehow failed to call MarkContextClean when I need to.
Comment 77 :Ehsan Akhgari (away Aug 1-5) 2012-06-02 11:47:16 PDT
Backed out: https://hg.mozilla.org/mozilla-central/rev/ebb3dd6d811a
Comment 78 :Ehsan Akhgari (away Aug 1-5) 2012-06-02 11:50:43 PDT
Backed out for real! https://hg.mozilla.org/mozilla-central/rev/d9f90e9e1de9
Comment 79 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-04 16:11:34 PDT
10.7 has extra failures:
https://tbpl.mozilla.org/php/getParsedLog.php?id=12266059&tree=Mozilla-Inbound&full=1#error30
Comment 80 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-04 17:07:30 PDT
jrmuizelaar: you have a Mac setup, can you debug this? It shouldn't be hard, just run the reftest layout/reftests/bugs/579323-1.html (e.g. by putting it in layout/reftests/bugs/tmp.list and running -reftest file://.../tmp.list) and see how it is we're getting into an infinite loop.
Comment 81 Timothy Nikkel (:tnikkel) 2012-06-17 21:06:32 PDT
I applied the patches to an old rev around when the patches were posted (they didn't apply to current trunk) and I can't reproduce any infinite loop when running reftests on my mac 10.6 machine.
Comment 82 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-21 04:32:20 PDT
https://tbpl.mozilla.org/?tree=Try&rev=562b0a797195
Comment 83 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-22 02:59:08 PDT
https://tbpl.mozilla.org/?tree=Try&rev=e3db41113451

Seems to me that using the same class name defined differently in two different files was a bad idea. I've fixed that and tests on Mac are passing now. I'll try relanding.
Comment 85 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-22 16:07:48 PDT
Comment on attachment 628216 [details] [diff] [review]
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure)

Review of attachment 628216 [details] [diff] [review]:
-----------------------------------------------------------------

It stuck this time. Apart from my boneheaded mistake earlier, the patch is straightforward, and memory leaks are really bad, so I think taking this on Aurora would be wise.
Comment 86 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-22 16:07:55 PDT
Comment on attachment 628224 [details] [diff] [review]
Part 2: Azure fix

Review of attachment 628224 [details] [diff] [review]:
-----------------------------------------------------------------

It stuck this time. Apart from my boneheaded mistake earlier, the patch is straightforward, and memory leaks are really bad, so I think taking this on Aurora would be wise.
Comment 88 Alex Keybl [:akeybl] 2012-06-26 09:58:57 PDT
Comment on attachment 628216 [details] [diff] [review]
Part 1: make CanvasRenderingContext2DUserData's reference to the context weak (non-Azure)

[Triage Comment]
New memory leak in 14, let's skip up the trains and consider taking on beta if deemed low risk.
Comment 89 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-26 19:12:36 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/7b2eb38d3045
https://hg.mozilla.org/releases/mozilla-aurora/rev/d807814e2927
Comment 90 Alex Keybl [:akeybl] 2012-07-08 18:31:53 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #85)
> Comment on attachment 628216 [details] [diff] [review]
> Part 1: make CanvasRenderingContext2DUserData's reference to the context
> weak (non-Azure)
> 
> Review of attachment 628216 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It stuck this time. Apart from my boneheaded mistake earlier, the patch is
> straightforward, and memory leaks are really bad, so I think taking this on
> Aurora would be wise.

How do you feel about getting this FF14 regression fix into our final beta?
Comment 91 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-07-08 23:17:51 PDT
While I'm confident in the patch, I'm not sure this bug is a big deal. No leaks on real sites have been traced to this. The only report we have of a bug in FF14 would be bug 757749. Bug 757749 appears to be Mac-only too.

So I'm not sure. I think on balance we should probably take it, but not taking it would also be reasonable.
Comment 92 Alex Keybl [:akeybl] 2012-07-10 12:48:30 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #91)
> While I'm confident in the patch, I'm not sure this bug is a big deal. No
> leaks on real sites have been traced to this. The only report we have of a
> bug in FF14 would be bug 757749. Bug 757749 appears to be Mac-only too.
> 
> So I'm not sure. I think on balance we should probably take it, but not
> taking it would also be reasonable.

Given that, we'll wontfix for FF14.

Note You need to log in before you can comment on or make changes to this bug.