Structured clones of typed arrays cause leaks

RESOLVED FIXED in Firefox 23
(NeedInfo fromanyone)

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: gwagner, Assigned: sfink, NeedInfo)

Tracking

({regression})

unspecified
mozilla24
x86
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox22 unaffected, firefox23+ fixed, firefox24+ fixed)

Details

(Whiteboard: [MemShrink:P1], )

Attachments

(2 attachments)

Posted file about:memory
We have a regression on current nightly. It woks fine on FF 21.
STR:
Just go to the URL and start the worker videos. Firefox will grow and grow and grow and grow....
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #1)
> Alice, can you do a bisection here?

Unable to reproduce...

>STR:
>Just go to the URL and start the worker videos.

How do I start the worker videos?
Clicking on the Canvas should do it, just wait a few moments for the video to download entirely. Once it's cached, it should be a lot faster.
Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/ab0974ead030
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130411 Firefox/23.0 ID:20130411161820
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/058c640a3799
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130411 Firefox/23.0 ID:20130411164419
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ab0974ead030&tochange=058c640a3799

Suspected: 
 058c640a3799	Steve Fink — Bug 789593 - Clone typed arrays by cloning their buffers and only saving construction parameters. r=jorendorff, bent Also enters a compartment where needed, and bumps both the structured clone and IndexedDB schema versions
Blocks: 789593
Assignee: nobody → sphink
lsblakk, looks like your tracking flags got wiped out here.
This belongs in the JS engine.
Component: DOM: Workers → JavaScript Engine
Summary: Leak in workers doing postmessage → Structured clones of typed arrays cause leaks
The STR works fine for me, so I see it. Bug 861925 will probably fix this, but it's a pretty big change and may be too big to backport. I will look into a more targeted fix.
Bug 861925 will *not* fix this.

I don't think this is a leak. I believe what is happening is that the main thread is sending over chunks of compressed data to the workers by grabbing views of portions of one large buffer and postMessage'ing them.

Previously (before bug 789593), this would result in the data within that view getting copied into a structured clone buffer and a new ArrayBuffer+TypedArray pair constructed with that data. Now, the *entire* underlying ArrayBuffer is copied into the clone buffer, then the worker constructs a new ArrayBuffer out of that and makes a view of the appropriate portion.

This is what the spec says to do, and results in 3 copies of the full ArrayBuffer in memory at once temporarily. Worse, if the main thread sends multiple views of the same buffer over to the worker, it will make one full additional copy for each one. Unfortunately there doesn't seem to be any spec-compatible way to get the previous behavior without a manual copy of the portion of interest.

With Transferable ArrayBuffers, you could send the whole ArrayBuffer over with zero copying at all -- but only one worker could have it at a time, so you couldn't use multiple workers to process separate parts of the encoded data buffer.
For the life of me, I can't figure out the clone buffer management. Even in the nonleaking version, I see many more structured clone buffers written to than are ever read by the worker. It seems like the main thread sends over more messages than the worker can handle, and the worker never accesses the data within those messages, so it is never read out. Which is fine; perhaps it's intentionally dropping overdue frames or something, but not even MessageEventRunnable::WorkerRun seems to be getting called (in a quick sample run, I had 3178 MessageEventRunnable instances created and destroyed, but only 878 of them were run.) I am baffled as to why that *doesn't* leak, since with the current code it looks like that would leak a pointer to the data in MessageEventRunnable.mData. Or maybe it does, but it's not enough data to notice? (needinfo? bent)

At any rate, I locally patched it to use RAII guards for both the MessageEventRunnable and the MessageEvent, so that this would definitely get cleaned up, but the overall behavior is the same (namely: massive memory usage with the new structured clone typed array code, nothing noticeable with the old.) In fact, since the structured clone code is backwards-compatible, I can toggle which way it writes out typed arrays with an environment variable and see the difference. So the primary leak this bug is about definitely seems to be due to the whole-buffer cloning.
Flags: needinfo?(bent.mozilla)
The worker message loop isn't very complicated... If the worker can't keep up with the number of messages sent by the main thread then it won't get to run all of them. The queue will keep building up.

When the worker is shut down (either when closing the browser or calling 'terminate') then those MessageEventRunnables will be deleted. I think it's definitely a bug that we don't clear the structured clone data in the destructor there.

If the worker isn't being shut down then those MessageEventRunnables will eventually run given enough time and memory.
Flags: needinfo?(bent.mozilla)
With this patch, plus updating Broadway to copy Uint8Array views before cloning, the leak is gone.
Attachment #761162 - Flags: review?(bent.mozilla)
That is to say, this patch fixes a leak, but the Broadway demo memory usage is still out of control even after this patch is applied, because it's constantly cloning 50MB buffers and only using 350KB of them. That's enough to run my machine out of memory, so there may be a GC scheduling issue here.
I submitted a pull request of https://github.com/hotsphink/Broadway/commit/1c500d6700c89207c57d347941cc389710b0aa36 to improve its memory handling.
Comment on attachment 761162 [details] [diff] [review]
Manage structured clone memory with JSAutoStructuredCloneBuffer to prevent leaking unused buffers

Review of attachment 761162 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with this addressed:

::: dom/workers/Events.cpp
@@ +444,5 @@
>  
>    virtual ~MessageEvent()
>    {
>      MOZ_COUNT_DTOR(mozilla::dom::workers::MessageEvent);
> +    JS_ASSERT(!mBuffer.data());

You're no longer forcing the free in the finalize hook so I don't think you can assert this, right?
Attachment #761162 - Flags: review?(bent.mozilla) → review+
Filed bug 881914 on the (potential) GC scheduling problem.
(In reply to ben turner [:bent] from comment #15)
> Comment on attachment 761162 [details] [diff] [review]
> Manage structured clone memory with JSAutoStructuredCloneBuffer to prevent
> leaking unused buffers
> 
> Review of attachment 761162 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me with this addressed:
> 
> ::: dom/workers/Events.cpp
> @@ +444,5 @@
> >  
> >    virtual ~MessageEvent()
> >    {
> >      MOZ_COUNT_DTOR(mozilla::dom::workers::MessageEvent);
> > +    JS_ASSERT(!mBuffer.data());
> 
> You're no longer forcing the free in the finalize hook so I don't think you
> can assert this, right?

I do believe you are correct. Thanks.
Blocks: 881922
Attachment #761162 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/2f1bb6823625
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Uplift nomination?
Flags: needinfo?(sphink)
Comment on attachment 761162 [details] [diff] [review]
Manage structured clone memory with JSAutoStructuredCloneBuffer to prevent leaking unused buffers

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 789593
User impact if declined: potentially massive memory leaks in pages using a combination of features (Workers + postMessaged typed arrays + high message frequency)
Testing completed (on m-c, etc.): on m-c and uplifted to aurora
Risk to taking this patch (and alternatives if risky): discussed via email
String or IDL/UUID changes made by this patch: none
Attachment #761162 - Flags: approval-mozilla-beta?
Attachment #761162 - Flags: approval-mozilla-aurora?
Comment on attachment 761162 [details] [diff] [review]
Manage structured clone memory with JSAutoStructuredCloneBuffer to prevent leaking unused buffers

Approving for beta uplift since, as discussed in email, bug 887420 has also been uplifted.
Attachment #761162 - Flags: approval-mozilla-beta?
Attachment #761162 - Flags: approval-mozilla-beta+
Attachment #761162 - Flags: approval-mozilla-aurora?
Attachment #761162 - Flags: approval-mozilla-aurora+
Fwiw, this is already marked fixed on FF24 so ignore the aurora approval.
Flags: needinfo?(sphink)
The link provided in comment 0 has expired (404 error). Is there another TC or steps in order to reproduce the issue.
Ups, the ending of my above comment was a question.
Flags: needinfo?
You need to log in before you can comment on or make changes to this bug.