Closed Bug 1153907 Opened 9 years ago Closed 9 years ago

Memory leak probably involving websocket, binary blobs used as images

Categories

(Core :: DOM: Core & HTML, defect)

37 Branch
x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: M.Mann, Unassigned)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(8 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150320202338

Steps to reproduce:

The issue is hard to reproduce as the website is the HTML5 UI of our hardware device and requires the dedicated HW to run.

But I have collected dumps from about:memory

It might be possible to construct an application which simulates the HW - but this will take time to build.


Actual results:

After about 2 hours runtime Firefox's memory usage reached 2 GB.


Expected results:

Memory usage should not grow - the website works fine in Chromium.
Whiteboard: [MemShrink]
Component: Untriaged → DOM
Product: Firefox → Core
Matthias, any chance for some testcase here?
It looks like we're holding on to many many FileImpl things:

├──329.56 MB (37.51%) -- dom
│  ├──328.81 MB (37.42%) -- memory-file-data
│  │  ├──326.57 MB (37.17%) -- large
│  │  │  ├────0.14 MB (00.02%) ── file(length=15159, sha1=6382a123f255f309266b452abbb75c561b604d86) [9]
│  │  │  ├────0.14 MB (00.02%) ── file(length=18187, sha1=6ae2e92180a0a1dd15ff50eb474fa1062d1b9414) [7]
│  │  │  ├────0.12 MB (00.01%) ── file(length=17315, sha1=1be21d7c30276212eb7668936c2581fa3ba6416d) [6]
...

and so on for many hundreds of files.

Matthias, can you save concise and verbose GC/CC logs from about:memory and attach those to the bug as well?
Flags: needinfo?(M.Mann)
I'll create these logs - how long should I let Firefox run before generating these?

As for the test case: not easily - unless you know of an open source library / package which can send binary data via websocket. Then we could generate random JPEG images and stream these.

We can include a small website with JS to receive these - but the server part is closed source.
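(For illustration only: one such open-source option would be the Node.js "ws" package; a minimal server that streams binary frames over a websocket might look like the sketch below. Node.js, the "ws" package, and the random payload are assumptions here, not part of this bug's attachments - random bytes won't decode as JPEG, but they exercise the same blob/object-URL path on the client.)

// Minimal sketch, not part of this bug's attachments: a Node.js server
// using the third-party "ws" package that pushes binary frames to every
// connected client. Random bytes stand in for JPEG data here.
var WebSocketServer = require('ws').Server;
var crypto = require('crypto');

var wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function (socket) {
  // Send a ~20 KB payload every 50 ms, roughly the frame sizes seen in
  // the memory report above.
  var timer = setInterval(function () {
    socket.send(crypto.randomBytes(20 * 1024));
  }, 50);
  socket.on('close', function () { clearInterval(timer); });
});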
Flags: needinfo?(M.Mann)
Note, CC/GC logs may contain URLs, so you may want to check that you don't expose anything you don't want to.
Also, the logs are possibly rather large, so sharing them somehow rather than attaching them to the bug might work better.
And just seeing the client-side part of the testcase could be useful.
(In reply to Matthias Mann from comment #4)
> I'll create these logs - how long should I let Firefox run before generating
> these?

Ideally once about:memory is showing several tens of things in dom/memory-file-data/large.  If it doesn't take too long to get to the point where your initial 1.8GB about:memory report came from, taking the logs at that point would be even better.

I assume triggering GC/CC/Minimize memory usage from about:memory doesn't reduce the memory usage?
Manually triggering GC / CC did not reduce the amount of memory in dom/memory-file-data.

The compressed GC logs are available via Dropbox: https://www.dropbox.com/l/tcGhGJ7yV8kOoMm2bjfSPp
These dumps were taken a few minutes after starting Firefox, with ~5 minutes in between (first concise, then verbose).

We will prepare the client side part of the testcase tomorrow.
Attached file ClientTestcase.7z
I uploaded the reduced website, which contains only the websocket handling and rendering of the received images.
The 2 memory dumps were captured about 1 hour apart while running only that testcase.
Haven't looked at the testcase, but assuming this is a problem with the platform, having shiny new web APIs that cause memory problems is bad.  Marking P1 to reflect that.
Whiteboard: [MemShrink] → [MemShrink:P1]
Attached file jpgGraber.js
I attached a very simple web server (written in Java) which listens for websocket connections on port 8080, and an updated jpgGraber.js which replaces that file in the ClientTestcase.7z to connect to 127.0.0.1:8080.

This should allow debugging of this issue without the need for our embedded system. In a short test we also saw growing memory usage using this server.
Looking at the memory report in attachment 8592251 [details] we can see a ton of images referring to blobs, and then a ton of memory-file-data (which is probably the in-memory backing for the blobs). A naive guess is that the images are holding the blobs alive.

> ├───43.11 MB (15.77%) -- images
> │   ├──39.89 MB (14.59%) -- content/raster
> │   │  ├──33.04 MB (12.08%) -- unused
> │   │  │  ├──29.25 MB (10.70%) -- (1545 tiny)
> │   │  │  │  ├───0.03 MB (00.01%) ++ image(blob:null/0aeee507-d3b4-46a1-855a-030cad9ddeb3)
> │   │  │  │  ├───0.03 MB (00.01%) ++ image(blob:null/0d03a4bc-606e-45c8-a8c4-4ce14415b9fd)
> │   │  │  │  ├───0.03 MB (00.01%) ++ image(blob:null/17877496-5467-42f0-9d8d-46ba27d00bb6)

> ├───44.95 MB (16.44%) -- dom
> │   ├──43.45 MB (15.89%) -- memory-file-data
> │   │  ├──43.27 MB (15.82%) -- large
> │   │  │  ├───0.07 MB (00.03%) ── file(length=21519, sha1=a7b479aa7bf53dbfb21e5d908484afba8ae8ca38) [3]
> │   │  │  ├───0.07 MB (00.03%) ── file(length=21855, sha1=8080d7e73b9caf00c5451cce7c26e0cb8fb3087f) [3]

Seth, can you help us understand why there would be so many "unused" images?
Flags: needinfo?(seth)
We discussed this a bit on IRC. "unused" images are images which aren't referenced by any document.

I'm currently not sure where the bug is here. I'd really like to establish if image cache entries are keeping blobs alive (which would be a serious bug, but I don't see how we'd be doing it) or if the blobs are leaking via some other mechanism, which seems more likely.

A more minimal testcase would probably help.
Flags: needinfo?(seth)
One more point we noticed:

>       1,848 ── host-object-urls

So there are a *ton* of object URLs still floating around. They're not reported in detail b/c they're not blob URLs [1]. According to comments in the nsHostObjectProtocolHandler these are either DOMMediaStream or MediaSource instances [2].

[1] http://hg.mozilla.org/mozilla-central/annotate/ec1351f9bc58/dom/base/nsHostObjectProtocolHandler.cpp#l174
[2] http://hg.mozilla.org/mozilla-central/annotate/ec1351f9bc58/dom/base/nsHostObjectProtocolHandler.cpp#l28
Depends on: 1155371
An additional oddity here is that we are holding on to more unused images than expected. imgLoader::CheckCacheLimits() is supposed to limit us to "image.cache.size" bytes of cache, which is 5 MB by default.

There are some special cases, specifically for evicted entries, which make this harder to reason about.

I'd be interested in figuring out exactly what's happening, but again, I think we need a more minimal testcase. It's unclear whether websockets are even involved in this bug or not.
With instrumentation I've found all of the host-object-urls are DOMMediaStream instances that are created in jpgGraber.js:

> var blob = new Blob([msg.data], {
>                        type: 'image/jpeg'
> });
> imageArray[imageInput].src= window.URL.createObjectURL(blob);

So why this isn't reported as a blob, I'm not sure.

|imageArray| is a fixed-size array of 16 |Image| objects. Further instrumentation indicates that for the leaked entries, the value of |imageArray[imageInput].src| is being overwritten *before* |window.URL.revokeObjectURL| is called on the object.

At this point it's hard for me to say whether we're doing something wrong or if this is user error; presumably what's happening is that socket messages are coming in faster than |Image.onload| is being called. It's also possible |onerror| is being hit and not properly handled.

I particularly saw this behavior when switching tabs and loading other URLs.
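For reference, the pattern described above reduces to roughly the sketch below. It is reconstructed from the quoted snippet and these comments; the socket handler and any names beyond |imageArray|, |imageInput|, and the quoted lines are assumptions, and the actual jpgGraber.js may differ.

// Reconstructed sketch of the leaky pattern; names beyond those quoted
// above are assumptions. The revoke presumably lives in onload, so when
// src is overwritten before the load event is delivered, the old object
// URL is never revoked and its backing blob stays alive.
var imageArray = [/* 16 pre-created Image objects */];
var imageInput = 0;

socket.onmessage = function (msg) {
  var blob = new Blob([msg.data], { type: 'image/jpeg' });
  var url = window.URL.createObjectURL(blob);
  var img = imageArray[imageInput];
  img.onload = function () {
    window.URL.revokeObjectURL(url);  // never runs if src is replaced first
  };
  img.src = url;                      // may clobber a still-unrevoked URL
  imageInput = (imageInput + 1) % imageArray.length;
};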
(In reply to Eric Rahm [:erahm] from comment #21)
> At this point it's hard for me to say whether we're doing something wrong or
> if this is user error,

Failing to call window.URL.revokeObjectURL is user error, but shouldn't we be eventually GC'ing the blobs anyway?
(In reply to Seth Fowler [:seth] from comment #22)
> (In reply to Eric Rahm [:erahm] from comment #21)
> > At this point it's hard for me to say whether we're doing something wrong or
> > if this is user error,
> 
> Failing to call window.URL.revokeObjectURL is user error, but shouldn't we
> be eventually GC'ing the blobs anyway?

When the document is unloaded, yes.  We can't GC the blobs before then because somebody might use the URL.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> When the document is unloaded, yes.  We can't GC the blobs before then
> because somebody might use the URL.

Hmm, that makes sense. We have no way of tracking whether they've "dropped a reference" to the URIs, after all, since they're just text.

So if we fix this bug and make sure window.URL.revokeObjectURL gets called, do we still leak?
(In reply to Seth Fowler [:seth] from comment #24)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> > When the document is unloaded, yes.  We can't GC the blobs before then
> > because somebody might use the URL.
> 
> Hmm, that makes sense.. We have no way of tracking whether they've "dropped
> a reference" to the URIs, after all, since they're just text.
> 
> So if we fix this bug and make sure window.URL.revokeObjectURL gets called,
> do we still leak?

A patched version that calls |window.URL.revokeObjectURL| prior to setting |Image.src| does seem to have cleared things up. I haven't seen the odd cache behavior in my testing; it would be interesting to see if that persists on Matthias' side.
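The change amounts to roughly the following sketch (an approximation of the approach, not the actual patch; names beyond those already mentioned in this bug are assumptions):

// Sketch of the workaround (assumed names, not the exact patch): revoke
// whatever object URL the <img> currently holds before pointing it at the
// new blob, so the revoke no longer depends on onload being delivered.
socket.onmessage = function (msg) {
  var blob = new Blob([msg.data], { type: 'image/jpeg' });
  var img = imageArray[imageInput];
  if (img.src) {
    window.URL.revokeObjectURL(img.src);  // release the previous blob
  }
  img.src = window.URL.createObjectURL(blob);
  imageInput = (imageInput + 1) % imageArray.length;
};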
Flags: needinfo?(M.Mann)
Do you call revokeObjectURL on the new URL prior to setting Image.src - or on the previously assigned URL?
Or would it be better (possible) to call revokeObjectURL on the new URL directly after setting Image.src?

Can you upload the patched test case?

Also if this is a bug in the JS code - why doesn't it affect Chromium?
An instrumented version of jpgGraber.js that highlights the issue at hand.
(In reply to Matthias Mann from comment #26)
> Do you call revokeObjectURL on the new URL prior to setting Image.src - or
> on the previously assigned URL?

On the previously assigned URL.

> Or would it be better (possible) to call revokeObjectURL on the new URL
> directly after setting Image.src?

This would most likely break image loading.

> Also if this is a bug in the JS code - why doesn't it affect Chromium?

That is a good question. I've confirmed that this issue is not present in Chrome. From what I can tell, Chrome *always* has a corresponding |onload| call for each time |src| is set; OTOH Fx does not, which is what leads to the leak. There also is no onerror/onabort call, which to me seems like a bug on our side.
(In reply to Eric Rahm [:erahm] from comment #28)
> That is a good question. I've confirmed that this issue is not present in
> Chrome. From what I can tell, Chrome *always* has a corresponding |onload|
> call for each time |src| is set; OTOH Fx does not, which is what leads to
> the leak. There also is no onerror/onabort call, which to me seems like a
> bug on our side.

I'm not sure I understand the provided code 100%, but this sounds like it could just be a timing issue involving how the delivery of chunks of data from the websocket is interleaved with image header decoding. If you change the |src| before we have time to deliver the load event, it makes sense (and IIRC the spec requires) that we don't deliver the load event for the old image.

If we just keep appending <img> elements to the array, instead of changing the src on existing <img> elements, do you still observe any of those <img> elements not receiving the load event? I'd think they would all receive the load event.
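The experiment would look roughly like the hypothetical sketch below (not taken from the attached testcase):

// Hypothetical sketch of the suggested experiment: create a fresh Image
// per websocket message instead of reusing a fixed array, and count
// load/error events to see whether every element eventually gets one.
var outstanding = 0;

socket.onmessage = function (msg) {
  var blob = new Blob([msg.data], { type: 'image/jpeg' });
  var url = window.URL.createObjectURL(blob);
  var img = new Image();
  outstanding++;
  img.onload = img.onerror = function () {
    outstanding--;
    window.URL.revokeObjectURL(url);
  };
  img.src = url;
  console.log('images still waiting for load/error: ' + outstanding);
};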
The workaround seems to work - the memory usage is no longer growing.

But still - the fact that the other browsers (Chromium, Opera and IE10) did not show this issue is interesting.
Flags: needinfo?(M.Mann)
We think this is working as designed.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML