Closed Bug 1153907 Opened 9 years ago Closed 9 years ago

Memory leak probably involving websocket, binary blobs used as images

Categories

(Core :: DOM: Core & HTML, defect)

37 Branch
x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: M.Mann, Unassigned)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(8 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150320202338

Steps to reproduce:

The issue is hard to reproduce as the website is the HTML5 UI of our hardware device and requires the dedicated HW to run.

But I have collected dumps from about:memory

It might be possible to construct an application which simulates the HW - but this will take time to build.


Actual results:

After about 2 hours runtime Firefox's memory usage reached 2 GB.


Expected results:

Memory usage should not grow - the website works fine in Chromium.
Whiteboard: [MemShrink]
Component: Untriaged → DOM
Product: Firefox → Core
Matthias, any chance for some testcase here?
It looks like we're holding on to many many FileImpl things:

├──329.56 MB (37.51%) -- dom
│  ├──328.81 MB (37.42%) -- memory-file-data
│  │  ├──326.57 MB (37.17%) -- large
│  │  │  ├────0.14 MB (00.02%) ── file(length=15159, sha1=6382a123f255f309266b452abbb75c561b604d86) [9]
│  │  │  ├────0.14 MB (00.02%) ── file(length=18187, sha1=6ae2e92180a0a1dd15ff50eb474fa1062d1b9414) [7]
│  │  │  ├────0.12 MB (00.01%) ── file(length=17315, sha1=1be21d7c30276212eb7668936c2581fa3ba6416d) [6]
...

and so on for many hundreds of files.

Matthias, can you save concise and verbose GC/CC logs from about:memory and attach those to the bug as well?
Flags: needinfo?(M.Mann)
I'll create these logs - how long should I let Firefox run before generating these?

As for the test case: not easily - unless you know of an open source library / package which can send binary data via websocket. Then we could generate random JPEG images and stream these.

We can include a small website with JS to receive these - but the server part is closed source.
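(For illustration only: one such open-source option would be the Node.js "ws" package; a minimal server that streams binary frames over a websocket might look like the sketch below. Node.js, the "ws" package, and the random payload are assumptions here, not part of this bug's attachments - random bytes won't decode as JPEG, but they exercise the same blob/object-URL path on the client.)

// Minimal sketch, not part of this bug's attachments: a Node.js server
// using the third-party "ws" package that pushes binary frames to every
// connected client. Random bytes stand in for JPEG data here.
var WebSocketServer = require('ws').Server;
var crypto = require('crypto');

var wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function (socket) {
  // Send a ~20 KB payload every 50 ms, roughly the frame sizes seen in
  // the memory report above.
  var timer = setInterval(function () {
    socket.send(crypto.randomBytes(20 * 1024));
  }, 50);
  socket.on('close', function () { clearInterval(timer); });
});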
Flags: needinfo?(M.Mann)
Note, CC/GC logs may contain URLs, so you may want to check that you don't expose anything you don't want to.
Also, the logs are possibly rather large, so sharing them somehow rather than attaching them to the bug might work better.
And just seeing the client-side part of the testcase could be useful.
(In reply to Matthias Mann from comment #4)
> I'll create these logs - how long should I let Firefox run before generating
> these?

Ideally once about:memory is showing several tens of things in dom/memory-file-data/large.  If it doesn't take too long to get to the point where your initial 1.8GB about:memory report came from, taking the logs at that point would be even better.

I assume triggering GC/CC/Minimize memory usage from about:memory doesn't reduce the memory usage?
Manually triggering GC / CC did not reduce the amount of memory in dom/memory-file-data.

The compressed GC logs are available via Dropbox: https://www.dropbox.com/l/tcGhGJ7yV8kOoMm2bjfSPp
These dumps were taken a few minutes after starting Firefox, with ~5 minutes in between (first concise, then verbose).

We will prepare the client side part of the testcase tomorrow.
Attached file ClientTestcase.7z
I uploaded the reduced website, which contains only the websocket handling and rendering of the received images.
The 2 memory dumps were captured about 1 hour apart while running only that testcase.
Haven't looked at the testcase, but assuming this is a problem with the platform, having shiny new web APIs that cause memory problems is bad.  Marking P1 to reflect that.
Whiteboard: [MemShrink] → [MemShrink:P1]
Attached file jpgGraber.js
I attached a very simple web server (written in Java) which listens for websocket connections on port 8080, and an updated jpgGraber.js which replaces that file in the ClientTestcase.7z to connect to 127.0.0.1:8080.

This should allow debugging of this issue without the need for our embedded system. In a short test we also saw growing memory usage using this server.
Looking at the memory report in attachment 8592251 [details] we can see a ton of images referring to blobs, and then a ton of memory-file-data (which is probably the in-memory backing for the blobs). A naive guess is that the images are holding the blobs alive.

> ├───43.11 MB (15.77%) -- images
> │   ├──39.89 MB (14.59%) -- content/raster
> │   │  ├──33.04 MB (12.08%) -- unused
> │   │  │  ├──29.25 MB (10.70%) -- (1545 tiny)
> │   │  │  │  ├───0.03 MB (00.01%) ++ image(blob:null/0aeee507-d3b4-46a1-855a-030cad9ddeb3)
> │   │  │  │  ├───0.03 MB (00.01%) ++ image(blob:null/0d03a4bc-606e-45c8-a8c4-4ce14415b9fd)
> │   │  │  │  ├───0.03 MB (00.01%) ++ image(blob:null/17877496-5467-42f0-9d8d-46ba27d00bb6)

> ├───44.95 MB (16.44%) -- dom
> │   ├──43.45 MB (15.89%) -- memory-file-data
> │   │  ├──43.27 MB (15.82%) -- large
> │   │  │  ├───0.07 MB (00.03%) ── file(length=21519, sha1=a7b479aa7bf53dbfb21e5d908484afba8ae8ca38) [3]
> │   │  │  ├───0.07 MB (00.03%) ── file(length=21855, sha1=8080d7e73b9caf00c5451cce7c26e0cb8fb3087f) [3]

Seth, can you help us understand why there would be so many "unused" images?
Flags: needinfo?(seth)
We discussed this a bit on IRC. "unused" images are images which aren't referenced by any document.

I'm currently not sure where the bug is here. I'd really like to establish if image cache entries are keeping blobs alive (which would be a serious bug, but I don't see how we'd be doing it) or if the blobs are leaking via some other mechanism, which seems more likely.

A more minimal testcase would probably help.
Flags: needinfo?(seth)
One more point we noticed:

>       1,848 ── host-object-urls

So there are a *ton* of object URLs still floating around. They're not reported in detail b/c they're not blob URLs [1]. According to comments in the nsHostObjectProtocolHandler these are either DOMMediaStream or MediaSource instances [2].

[1] http://hg.mozilla.org/mozilla-central/annotate/ec1351f9bc58/dom/base/nsHostObjectProtocolHandler.cpp#l174
[2] http://hg.mozilla.org/mozilla-central/annotate/ec1351f9bc58/dom/base/nsHostObjectProtocolHandler.cpp#l28
Depends on: 1155371
An additional oddity here is that we are holding on to more unused images than expected. imgLoader::CheckCacheLimits() is supposed to limit us to "image.cache.size" bytes of cache, which is 5 MB by default.

There are some special cases, specifically for evicted entries, which make this harder to reason about.

I'd be interested in figuring out exactly what's happening, but again, I think we need a more minimal testcase. It's unclear whether websockets are even involved in this bug or not.
With instrumentation I've found all of the host-object-urls are DOMMediaStream instances that are created in jpgGraber.js:

> var blob = new Blob([msg.data], {
>                        type: 'image/jpeg'
> });
> imageArray[imageInput].src= window.URL.createObjectURL(blob);

So why this isn't reported as a blob, I'm not sure.

|imageArray| is a fixed-size array of 16 |Image| objects. Further instrumentation indicates that for the leaked entries, the value of |imageArray[imageInput].src| is being overwritten *before* |window.URL.revokeObjectURL| is called on the object.

At this point it's hard for me to say whether we're doing something wrong or if this is user error; presumably what's happening is that socket messages are coming in faster than |Image.onload| is being called. It's also possible |onerror| is being hit and not properly handled.

I particularly saw this behavior when switching tabs and loading other URLs.
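For reference, the pattern described above reduces to roughly the sketch below. It is reconstructed from the quoted snippet and these comments; the socket handler and any names beyond |imageArray|, |imageInput|, and the quoted lines are assumptions, and the actual jpgGraber.js may differ.

// Reconstructed sketch of the leaky pattern; names beyond those quoted
// above are assumptions. The revoke presumably lives in onload, so when
// src is overwritten before the load event is delivered, the old object
// URL is never revoked and its backing blob stays alive.
var imageArray = [/* 16 pre-created Image objects */];
var imageInput = 0;

socket.onmessage = function (msg) {
  var blob = new Blob([msg.data], { type: 'image/jpeg' });
  var url = window.URL.createObjectURL(blob);
  var img = imageArray[imageInput];
  img.onload = function () {
    window.URL.revokeObjectURL(url);  // never runs if src is replaced first
  };
  img.src = url;                      // may clobber a still-unrevoked URL
  imageInput = (imageInput + 1) % imageArray.length;
};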
(In reply to Eric Rahm [:erahm] from comment #21)
> At this point it's hard for me to say whether we're doing something wrong or
> if this is user error,

Failing to call window.URL.revokeObjectURL is user error, but shouldn't we be eventually GC'ing the blobs anyway?
(In reply to Seth Fowler [:seth] from comment #22)
> (In reply to Eric Rahm [:erahm] from comment #21)
> > At this point it's hard for me to say whether we're doing something wrong or
> > if this is user error,
> 
> Failing to call window.URL.revokeObjectURL is user error, but shouldn't we
> be eventually GC'ing the blobs anyway?

When the document is unloaded, yes.  We can't GC the blobs before then because somebody might use the URL.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> When the document is unloaded, yes.  We can't GC the blobs before then
> because somebody might use the URL.

Hmm, that makes sense. We have no way of tracking whether they've "dropped a reference" to the URIs, after all, since they're just text.

So if we fix this bug and make sure window.URL.revokeObjectURL gets called, do we still leak?
(In reply to Seth Fowler [:seth] from comment #24)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> > When the document is unloaded, yes.  We can't GC the blobs before then
> > because somebody might use the URL.
> 
> Hmm, that makes sense.. We have no way of tracking whether they've "dropped
> a reference" to the URIs, after all, since they're just text.
> 
> So if we fix this bug and make sure window.URL.revokeObjectURL gets called,
> do we still leak?

A patched version that calls |window.URL.revokeObjectURL| prior to setting |Image.src| does seem to have cleared things up. I haven't seen the odd cache behavior in my testing; it would be interesting to see if that persists on Matthias' side.
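The change amounts to roughly the following sketch (an approximation of the approach, not the actual patch; names beyond those already mentioned in this bug are assumptions):

// Sketch of the workaround (assumed names, not the exact patch): revoke
// whatever object URL the <img> currently holds before pointing it at the
// new blob, so the revoke no longer depends on onload being delivered.
socket.onmessage = function (msg) {
  var blob = new Blob([msg.data], { type: 'image/jpeg' });
  var img = imageArray[imageInput];
  if (img.src) {
    window.URL.revokeObjectURL(img.src);  // release the previous blob
  }
  img.src = window.URL.createObjectURL(blob);
  imageInput = (imageInput + 1) % imageArray.length;
};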
Flags: needinfo?(M.Mann)
Do you call revokeObjectURL on the new URL prior to setting Image.src - or on the previously assigned URL?
Or would it be better (possible) to call revokeObjectURL on the new URL directly after setting Image.src?

Can you upload the patched test case?

Also if this is a bug in the JS code - why doesn't it affect Chromium?
An instrumented version of jpgGraber.js that highlights the issue at hand.
(In reply to Matthias Mann from comment #26)
> Do you call revokeObjectURL on the new URL prior to setting Image.src - or
> on the previously assigned URL?

On the previously assigned URL.

> Or would it be better (possible) to call revokeObjectURL on the new URL
> directly after setting Image.src?

This would most likely break image loading.

> Also if this is a bug in the JS code - why doesn't it affect Chromium?

That is a good question. I've confirmed that this issue is not present in Chrome. From what I can tell, Chrome *always* has a corresponding |onload| call for each time |src| is set; OTOH Fx does not, which is what leads to the leak. There also is no onerror/onabort call, which to me seems like a bug on our side.
(In reply to Eric Rahm [:erahm] from comment #28)
> That is a good question. I've confirmed that this issue is not present in
> Chrome. From what I can tell, Chrome *always* has a corresponding |onload|
> call for each time |src| is set; OTOH Fx does not, which is what leads to
> the leak. There also is no onerror/onabort call, which to me seems like a
> bug on our side.

I'm not sure I understand the provided code 100%, but this sounds like it could just be a timing issue involving how the delivery of chunks of data from the websocket is interleaved with image header decoding. If you change the |src| before we have time to deliver the load event, it makes sense (and IIRC the spec requires) that we don't deliver the load event for the old image.

If we just keep appending <img> elements to the array, instead of changing the src on existing <img> elements, do you still observe any of those <img> elements not receiving the load event? I'd think they would all receive the load event.
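The experiment would look roughly like the hypothetical sketch below (not taken from the attached testcase):

// Hypothetical sketch of the suggested experiment: create a fresh Image
// per websocket message instead of reusing a fixed array, and count
// load/error events to see whether every element eventually gets one.
var outstanding = 0;

socket.onmessage = function (msg) {
  var blob = new Blob([msg.data], { type: 'image/jpeg' });
  var url = window.URL.createObjectURL(blob);
  var img = new Image();
  outstanding++;
  img.onload = img.onerror = function () {
    outstanding--;
    window.URL.revokeObjectURL(url);
  };
  img.src = url;
  console.log('images still waiting for load/error: ' + outstanding);
};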
The workaround seems to work - the memory usage is no longer growing.

But still - the fact that the other browsers (Chromium, Opera and IE10) did not show this issue is interesting.
Flags: needinfo?(M.Mann)
We think this is working as designed.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML