Closed Bug 912255 Opened 11 years ago Closed 11 years ago

Categories

(Core :: Graphics: CanvasWebGL, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 --- unaffected
firefox25 + verified
firefox26 + verified

People

(Reporter: guijoselito, Assigned: bjacob)

References

Details

(Keywords: memory-leak, regression, Whiteboard: [MemShrink:P2] webgl-test-needed)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 (Beta/Release)
Build ID: 20130903030201

Steps to reproduce:

Open http://n-e-r-v-o-u-s.com/cellCycle/ and then close it.


Actual results:

Opened about:memory and there was a 
top(none)/ghost/window(http://n-e-r-v-o-u-s.com/cellCycle/)
Then opened about:compartments and there it was under "Ghost windows"


Expected results:

No Ghost Windows.
Whiteboard: [MemShrink]
I can reproduce this.
Whiteboard: [MemShrink] → [MemShrink:P2]
This leaks through shutdown.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Interesting, if I minimize memory a couple times on ESR 17 it goes away eventually.  It seems to stick around forever on nightly.

Alice, do you have time to bisect this?
Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/8e3a124c9c1a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130627 Firefox/25.0 ID:20130627193322
Bad:
http://hg.mozilla.org/mozilla-central/rev/ec8ceaf4b03a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130628 Firefox/25.0 ID:20130628063224
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8e3a124c9c1a&tochange=ec8ceaf4b03a

Regression window(m-c)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/c9bbec788caf
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130627 Firefox/25.0 ID:20130627132145
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/88b65229c78c
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130627 Firefox/25.0 ID:20130627140921
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c9bbec788caf&tochange=88b65229c78c

Regressed by: bug 738869
Blocks: 738869
Indeed, that bug changed cycle collection of WebGLContext.
mDefaultVertexArray for instance isn't traversed/unlinked.
Component: General → Canvas: WebGL
I should have a patch... once my build is ready.
Indeed, this patch added mBoundVertexArray and mDefaultVertexArray but only traversed/unlinked the former.
Attachment #799470 - Flags: review?(bugs)
Comment on attachment 799470 [details] [diff] [review]
traverse/unlink mDefaultVertexArray

Assuming you actually tested this helps :)
Attachment #799470 - Flags: review?(bugs) → review+
Sorry, didn't realize that you already had a patch.

Tested locally, the patch does fix the leak for me.
https://hg.mozilla.org/integration/mozilla-inbound/rev/36aacf27a24e
Assignee: bugs → bjacob
Target Milestone: --- → mozilla26
We should add a mochitest for this.
Flags: needinfo?(bjacob)
Flags: in-testsuite?
Yes, but in fact, I don't understand why this wasn't caught already. This kind of leak is automatically detected when it happens on a mochitest, right?

Looking at the code, the cycle between mDefaultVertexArray and its WebGLContext is always, unconditionally created right on the creation of any WebGL context. Furthermore, even if for whatever reason the leak only occured when a non-default vertex array was bound, that too is tested by an existing, already-running mochitest, namely this WebGL conformance test page (run by the WebGL mochitest): conformance/extensions/oes-vertex-array-object.html

So I'm a bit clueless as to why the leak wasn't automatically detected; understanding that blocks doing anything about that; but I don't have time to do anything about it short-term myself (b2g 1.2 rush...).
Flags: needinfo?(bjacob)
(In reply to Benoit Jacob [:bjacob] from comment #14)
> Yes, but in fact, I don't understand why this wasn't caught already. This
> kind of leak is automatically detected when it happens on a mochitest, right?

Yes.  Since we leak through shutdown here, triggering it is enough to turn the tree orange.

> Looking at the code, the cycle between mDefaultVertexArray and its
> WebGLContext is always, unconditionally created right on the creation of any
> WebGL context.

Why do you believe that?  Where is the edge from the vertex array to the context?  I don't see it anywhere.  WebGLBoundObject holds a weak reference to the context afaict.
Flags: needinfo?(bjacob)
Oh, you are completely right. Been a long time since I actually did much WebGL coding :-) we do intentionally not have a strong reference from the WebGLContextBoundObject back to the WebGLContext, to avoid creating unnecessary cycles. So in fact, a mochitest would be as simple as creating a vertex array object, and in javascript manually creating a reference from it to the webgl context.
Flags: needinfo?(bjacob)
This would actually be perfectly reasonable material to add to the existing, upstream WebGL conformance test,

  https://www.khronos.org/registry/webgl/sdk/tests/conformance/extensions/oes-vertex-array-object.html

so we should just add it there (and then, we'd pick it up the next time that we update our copy of the conformance tests into our mochitest, which should happen more frequently than transits of Halley's comet).

Adding webgl-test-needed, which is how we remember to make needed contribution to the conformance tests...
Whiteboard: [MemShrink:P2] → [MemShrink:P2] webgl-test-needed
> Indeed, this patch added mBoundVertexArray and mDefaultVertexArray but only
> traversed/unlinked the former.

Is there any way to automate the detection of these kinds of problems?
(In reply to Benoit Jacob [:bjacob] from comment #16)
> So in fact, a mochitest would be as simple as creating a
> vertex array object, and in javascript manually creating a reference from it
> to the webgl context.

On second thought this is probably more subtle than that? The cycle that we forgot to break was between the WebGL context and its default vertex array, which AFAICT is not exposed to JS... so I don't understand how the present testcase managed to create that cycle. Need to minimize it?
(In reply to Nicholas Nethercote [:njn] from comment #18)
> > Indeed, this patch added mBoundVertexArray and mDefaultVertexArray but only
> > traversed/unlinked the former.
> 
> Is there any way to automate the detection of these kinds of problems?

The 'refgraph' tool that I was working on in my spare time ( https://github.com/bjacob/mozilla-central/wiki/Refgraph ) does exactly that, but with all the b2g work lately, I haven't had time.

I've been thinking about how to move refgraph to "bjacob's side project on github" to something that could land in mozilla-central. I think there's a reasonable path to that that involves abandoning refgraph's custom malloc-replace library and instead porting it to use DMD's existing one, which is already in mozilla-central (maybe extending it a little bit along the way).

refgraph would still remain a non-default build option, like DMD is, but it could be useful to a lot more people in this way.

I'd like to convince people who have the ability to spend mozilla time on memory issues that this is a reasonable project :-) Justin and Andrew had expressed interest a while ago but Justin left, most unfortunately.
BTW, this was a great bug all round.  Guilherme provided a nice, reproducible test case, which took advantage of some good built-in tools.  I confirmed it reproduced, Kyle identified it was a regression, Alice bisected and found the regression, and Olli and Benoit fixed it.  All in just over 24 hours! :)
Well, the refgraph tool is great for figuring out why a cycle is occurring, but it doesn't help with the issue here, which is that we are lacking a test case to create a cycle.  Practically speaking, for these sorts of issues in DOM code we mostly depend on Jesse fuzzing up test cases that leak on shutdown, and because they are often weird looking leaks, they don't always get fixed quickly.

I spent some time on a static analysis to detect things missing from Traverse and Unlink, but it got bogged down in the ugly reality of Gecko code, and didn't really end up as a usable thing.  Plus the underlying static analysis framework is now defunct.  But the basic idea is that you take a list of fields in the object, filter out things that the CC should be told about, and then compare it to the list of things you do tell the CC about.
(In reply to Andrew McCreight [:mccr8] from comment #22)
> Well, the refgraph tool is great for figuring out why a cycle is occurring,
> but it doesn't help with the issue here, which is that we are lacking a test
> case to create a cycle.

If fact, Refgraph could automatically detect the underlying problem --- that a non-cycle-collected C++ strong reference exists between objects that are exposed to JS --- without even the need for actually creating the cycle.

Refgraph knows the full graph of strong references, and know what part of it is known to the CC. Refgraph can also know which vertices (i.e., objects) are exposed to JS. So it could automatically find out when there is a path of strong references from one JS-exposed object to another, that is not known to the CC. That was my next task when working on refgraph, but I had to stop working on it for lack of time. But if anyone is interested, it could be implemented very quickly at this point.
Oh, I see.  That's pretty cool.  Yeah, that would certainly catch the most common problems with undeclared stuff.
(Of course you have to create the one-way strong reference between C++ objects. But we already do that in the above-mentioned mochitest).
Comment on attachment 799470 [details] [diff] [review]
traverse/unlink mDefaultVertexArray

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 738869
User impact if declined: large memory leak on certain WebGL-using pages
Testing completed (on m-c, etc.): m-c for a few days now
Risk to taking this patch (and alternatives if risky): no risk at all
String or IDL/UUID changes made by this patch: none
Attachment #799470 - Flags: approval-mozilla-aurora?
Attachment #799470 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Verified as fixed with Firefox 25 beta 7 (build ID: 20131010180222) and latest Aurora (build ID: 20131011004001).
QA Contact: manuela.muntean
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: