Closed Bug 795186 Opened 12 years ago Closed 12 years ago

Declare refptrs between WebGL objects to the CC even if there is no cycle at C++ level, because JS could create cycles

Categories

(Core :: Graphics: CanvasWebGL, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox18 + fixed
firefox19 + fixed

People

(Reporter: bjacob, Assigned: bjacob)

References

Details

(Whiteboard: webgl-next [MemShrink:P1])

Attachments

(3 files)

For example, WebGLUniformLocation has a nsRefPtr to WebGLProgram. While there is no cycle at the c++ level, js code could do

  program.foo = uniformloc;

thus creating a cycle.

Thus we need to declare all such refptrs to the cc.

Cycles that are entirely created by JS code do not need to be declared to the CC as the JS GC takes care of them, but if part of the cycle is a reference created by c++ code, then the CC needs to know about it.
Smaug tells me that we have a leak in http://maps.nokia.com/webgl/ that looks like it might be caused by this bug. Prioritizing.
Whiteboard: webgl-next → webgl-next [MemShrink]
Blocks: 802834
To be clear, the only nontrivial thing in this bug is that WebGL inter-object strong references use a custom refptr class, WebGLRefPtr, so it's not clear beforehand that it's just a matter of using the usual CC macros -- have to look into these macros to see what they are doing.

WebGLRefPtr is here:
http://hg.mozilla.org/mozilla-central/file/39d70a2e0dea/content/canvas/src/WebGLContext.h#l300
The macros are fairly simple: grab out the underlying pointer, do a callback.  Assuming your underlying classes are the usual nsISupports CCed classes, and the WebGLRefPtr supports .get(), then it looks like the usual NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR should just work.
(Hit enter before I could write the explanatory comment)

Unfortunately, while this does all I can see needs doing here, the issue with Nokia Maps seems to persists: if I view that page and then go to about:memory in the same tab, then no matter how long I wait and click GC/CC buttons, the webgl metrics stay. Note: in about:memory, do NOT click 'minimize memory usage' as this would cause the WebGL context to destroy all its resources, artificially making the problem disappear.

Explanation on this patch:

The header file where all those classes are declared is WebGLContext.h.

Aside from usual refptr classes, the main custom refptr class here is WebGLRefPtr. There is a comment there explaining why it exists. Also, WebGLVertexAttribData stores a WebGLRefPtr member, so we had to add a .get() method to it to make CC macros work with it.

A nontrivial change in this patch is that it adds CC to WebGLUniformLocation, which wasn't CC'd so far. The reason why it wasn't CC'd was that contrary to most other classes there, WebGLUniformLocation objects can only ever be gotten by scripts by a [Creator] method, webgl.getUniformLocation, so there is no need for wrappercaching. That's why WebGLUniformLocation still doesn't inherit nsWrapperCache.
(In reply to Benoit Jacob [:bjacob] from comment #5)
> Note: in about:memory, do NOT click
> 'minimize memory usage' as this would cause the WebGL context to destroy all
> its resources, artificially making the problem disappear.
Ah, this is interesting. Add a breakpoint to the context release and check what is releasing it when
minimizing memory.
Blocks: 800488
Comment on attachment 672984 [details] [diff] [review]
fix cycle collection in the WebGL implementation

The LinkedList member variables don't need traversing/unlinking, right?
They are just list of objects which have raw refs to each other?



Can there be lots of WebGL related objects alive?
If so, we should probably make them skippable. Something like
http://mxr.mozilla.org/mozilla-central/source/content/events/src/nsDOMEventTargetHelper.h?rev=d8798b6331be&mark=31-31#24
and
http://mxr.mozilla.org/mozilla-central/source/content/events/src/nsDOMEventTargetHelper.cpp?rev=5a6a36d109d7&mark=47-62#42

There are 3 different skip-phases, documented here
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsCycleCollectionParticipant.h?rev=1e5bf398fc88&mark=256-260,265-268,273-276#244

So, file a followup to add support for skippability.
Attachment #672984 - Flags: review?(bugs) → review+
Can you write a test for this?
Flags: in-testsuite?
Assignee: nobody → bjacob
At a point, I thought that this fixed all the bugs (nokia maps and ro.me); but now I can't reproduce anymore. There are poltergeists around here. Anyway, this timer _is_ causing a call to WebGLContext::AddRef() so this seems like another cycle that we failed to declare.
Attachment #673510 - Flags: review?(bugs)
My reference analysis tool (bug 704240) finds only one persistent (not broken by the CC) cycle involving WebGL, and that appears to be the cycle between the WebGLContext and the HTMLCanvasElement:


Reference cycle 39:

Contains 2 blocks, listed below:

block 592393
has references to these blocks: 415899, 592399
allocation call stack:
at 0x402B1FE: malloc (vg_replace_malloc.c:271)
by 0x403BEE4: moz_xmalloc (mozalloc.cpp:57)
by 0x709865F: NS_NewHTMLCanvasElement(already_AddRefed<nsINodeInfo>, mozilla::dom::FromParser) (mozalloc.h:200)
by 0x70F2476: CreateHTMLElement(unsigned int, already_AddRefed<nsINodeInfo>, mozilla::dom::FromParser) (nsHTMLContentSink.cpp:497)
by 0x70F3DCA: NS_NewHTMLElement(nsIContent**, already_AddRefed<nsINodeInfo>, mozilla::dom::FromParser) (nsHTMLContentSink.cpp:480)
by 0x6F94058: NS_NewElement(nsIContent**, already_AddRefed<nsINodeInfo>, mozilla::dom::FromParser) (nsNameSpaceManager.cpp:201)
by 0x6F54C75: nsDocument::CreateElem(nsAString_internal const&, nsIAtom*, int, nsIContent**) (nsDocument.cpp:6943)
by 0x6F478EC: nsDocument::CreateElement(nsAString_internal const&, nsIContent**) (nsDocument.cpp:4561)
by 0x762AEF0: nsIDOMDocument_CreateElement(JSContext*, unsigned int, JS::Value*) (dom_quickstubs.cpp:1273)
by 0x821A89B: js::CallJSNative(JSContext*, int (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:364)
by 0x822638E: js::InvokeKernel(JSContext*, JS::CallArgs, js::MaybeConstruct) (jsinterp.cpp:367)
by 0x82213D7: js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) (jsinterp.cpp:2370)

block 592399
has references to these blocks: 592393, 592400, 592871, 593172
allocation call stack:
at 0x402B1FE: malloc (vg_replace_malloc.c:271)
by 0x403BEE4: moz_xmalloc (mozalloc.cpp:57)
by 0x7011F1E: NS_NewCanvasRenderingContextWebGL(nsIDOMWebGLRenderingContext**) (mozalloc.h:200)
by 0x6C7BFFC: CreateCanvasRenderingContextWebGL(nsISupports*, nsID const&, void**) (in /hack/mozilla-central/obj-firefox-valgrind/toolkit/library/libxul.so)
by 0x7BFC628: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (nsComponentManager.cpp:1006)
by 0x7BB5E8C: nsCreateInstanceByContractID::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:178)
by 0x7098830: nsCOMPtr<nsICanvasRenderingContextInternal>::assign_from_helper(nsCOMPtr_helper const&, nsID const&) (nsCOMPtr.h:1286)
by 0x7098D46: nsCOMPtr<nsICanvasRenderingContextInternal>::nsCOMPtr(nsCOMPtr_helper const&) (nsCOMPtr.h:685)
by 0x7098EBB: nsHTMLCanvasElement::GetContextHelper(nsAString_internal const&, nsICanvasRenderingContextInternal**) (nsHTMLCanvasElement.cpp:628)
by 0x70991A3: nsHTMLCanvasElement::GetContext(nsAString_internal const&, JS::Value const&, nsISupports**) (nsHTMLCanvasElement.cpp:652)
by 0x76210D0: nsIDOMHTMLCanvasElement_GetContext(JSContext*, unsigned int, JS::Value*) (dom_quickstubs.cpp:16548)
by 0x821A89B: js::CallJSNative(JSContext*, int (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:364)
Comment on attachment 673510 [details] [diff] [review]
cycle collect the mContextRestorer pointer

Hmm, this shouldn't help.
nsTimerImpl isn't cycle collected.
Need to change timer handling, so that it doesn't keep context alive.
Use InitWithFuncCallback and make sure to cancel the timer when
context is deleted.
Attachment #673510 - Flags: review?(bugs) → review-
(In reply to Andrew McCreight [:mccr8] from comment #8)
> Can you write a test for this?

How would you write a test, checking that things are correctly cycle collected?

A compiled-code test maybe?
(In reply to Benoit Jacob [:bjacob] from comment #13)
> How would you write a test, checking that things are correctly cycle
> collected?

Well, these leaks are apparently bad enough to cause shutdown leaks, so you can just set up one of these leaky cycles, and the shutdown leak detection will catch it if it breaks.
http://hg.mozilla.org/integration/mozilla-inbound/rev/48502b61a63e
Whiteboard: webgl-next [MemShrink] → webgl-next [MemShrink] [leave open]
Target Milestone: --- → mozilla19
(In reply to Andrew McCreight [:mccr8] from comment #14)
> (In reply to Benoit Jacob [:bjacob] from comment #13)
> > How would you write a test, checking that things are correctly cycle
> > collected?
> 
> Well, these leaks are apparently bad enough to cause shutdown leaks, so you
> can just set up one of these leaky cycles, and the shutdown leak detection
> will catch it if it breaks.

OK.

It seems like the two patches there together fix the leaks, and I was just using the wrong way to check if the leak was fixed -- I was navigating to about:memory in the SAME tab as the testcase and so the testcase was purposefully kept in memory.

I'll try to figured _what_ exactly caused the leak and then maybe I'll be able to make a test.
Comment on attachment 674011 [details] [diff] [review]
fix the robustness timer, don't inherit the timer callback base class

>+    void RobustnessTimerCallback(nsITimer* timer);
>+
>+    static void RobustnessTimerCallbackStatic(nsITimer* timer, void *thisPointer) {
>+        static_cast<WebGLContext*>(thisPointer)->RobustnessTimerCallback(timer);
>+    }
Looks like the file doesn't use Gecko coding style :(
But ok, let's not change that now.
(Parameters should be aParameterName, and indentation should be 2 spaces)
Attachment #674011 - Flags: review?(bugs) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/15b7770dcfcc

Regarding the testcases, after some testing, it seems that Nokia Maps is fixed by the first patch (48502b61a63e) and RO.ME is fixed by the second patch (15b7770dcfcc).

I will try to think of a test for this, but as it's not clear yet to me why these particular pages trigger these bugs while others don't, it would take some more time to isolate a test case, and I'm sorry, I have too many things to do this week.

Can close the bug now.
Whiteboard: webgl-next [MemShrink] [leave open] → webgl-next [MemShrink]
What's the effect on RO.ME?  Will bug 657115 be affected by this?
The effect on RO.ME was that the WebGL context and its resources were staying around forever after closing the RO.ME tab ; now they are properly going away several seconds after the tab is closed. I haven't tested if it has any effect on bug 657115 which seems to be more about memory usage _during_ playing RO.ME.
(In reply to Benoit Jacob [:bjacob] from comment #19)
> I will try to think of a test for this, but as it's not clear yet to me why
> these particular pages trigger these bugs while others don't, it would take
> some more time to isolate a test case, and I'm sorry, I have too many things
> to do this week.

No problem, I just was wondering if it might be easy to throw together a little test, but I guess not!

Nick: To reiterate what Benoit said, for RO.ME, there was a recent regression (bug 800488) from the new DOM bindings for WebGL stuff that caused a shutdown leak, which is what this fixes. It is possible that the old bindings for WebGL had some kind of temporary leak issue that will be fixed by the new DOM bindings, but I'd be surprised if that was the case.
This needs to land on Aurora, too, right?  But not really earlier?
Whiteboard: webgl-next [MemShrink] → webgl-next [MemShrink:P1]
Yes, let's backport to Aurora and not earlier. Backporting to Beta would be more intrusive as these classes were not CC'd before the WebIDL port (Aurora).
Comment on attachment 672984 [details] [diff] [review]
fix cycle collection in the WebGL implementation

[Approval Request Comment]
Bug caused by (feature/regressing bug #): don't know. The RO.ME leak is claimed to be from the WebIDL port (bug 779611) but I don't see how this can have not leaked before. The cycles existed before.
User impact if declined: shutdown memory leaks on certain heavy WebGL applications
Testing completed (on m-c, etc.): m-c since yesterday
Risk to taking this patch (and alternatives if risky): seems low risk: rather standard CC stuff
String or UUID changes made by this patch: none
Attachment #672984 - Flags: approval-mozilla-aurora?
Comment on attachment 674011 [details] [diff] [review]
fix the robustness timer, don't inherit the timer callback base class

[Approval Request Comment]
See previous request comment.
Attachment #674011 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/15b7770dcfcc
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Attachment #672984 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #674011 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: in-testsuite? → in-testsuite-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: