Closed Bug 1301863 Opened 8 years ago Closed 7 years ago

Crash caused by memory leak when using WebSocket in Worker

Categories

(Core :: DOM: Core & HTML, defect, P3)

Version: 48 Branch
Type: defect

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla57
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed

People

(Reporter: browser_defects, Assigned: jonco, NeedInfo)

References

Details

(Keywords: crash, Whiteboard: [MemShrink:P2])

Attachments

(3 files, 4 obsolete files)

Attached file ff_worker.zip
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160823121617

Steps to reproduce:

1. Download and install nodejs (https://nodejs.org/en/)
2. Download and unzip the attached prototype
3. Open shell and navigate to the folder extracted in point 2
4. Start the server by typing: node server.js
5. Open Firefox and navigate to http://localhost:8888
6. Observe the Firefox memory consumption in the task manager.
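
For context, a rough sketch of the kind of server/worker pair the steps above describe is shown below. This is a hypothetical reconstruction, not the contents of ff_worker.zip: the 'ws' npm dependency, the file layout, the payload size, and the worker code are assumptions; the 20 ms send interval matches what is described in the comments below.

// server.js -- hypothetical reconstruction, not the attached prototype.
// Assumes the 'ws' npm package (npm install ws); serves a trivial page on
// http://localhost:8888 that starts a Worker which opens a WebSocket.
const http = require('http');
const WebSocket = require('ws');

const page = "<!doctype html><script>new Worker('worker.js');</script>";
const workerJs =
  "const ws = new WebSocket('ws://localhost:8888');" +
  "ws.onmessage = function (e) { /* each e.data arrives as a Blob */ };";

const server = http.createServer(function (req, res) {
  if (req.url === '/worker.js') {
    res.setHeader('Content-Type', 'application/javascript');
    res.end(workerJs);
  } else {
    res.setHeader('Content-Type', 'text/html');
    res.end(page);
  }
});

const wss = new WebSocket.Server({ server: server });
wss.on('connection', function (socket) {
  // Push a payload every 20 ms with no flow control.
  const payload = Buffer.alloc(64 * 1024);  // 64 KB of zeroes; size is arbitrary
  const timer = setInterval(function () { socket.send(payload); }, 20);
  socket.on('close', function () { clearInterval(timer); });
});

server.listen(8888);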

Link to crash report: https://crash-stats.mozilla.com/report/index/c94acc22-8f80-4a7d-9356-8fad12160909


Actual results:

After running for 5-10 minutes, the browser either crashes because it reaches the maximum memory allowed for the process, or it terminates the worker, which shows up in the server console log as a closed connection.


Expected results:

Memory consumption should be stable. This is only seen in Firefox.
Attached file memory-report.json.gz
about:memory report
Severity: normal → critical
Component: Untriaged → DOM: Service Workers
Keywords: crash
Product: Firefox → Core
Whiteboard: [MemShrink]
I think the issue here is that the server sends a message every 20ms without regard to whether the client can keep up or not.  We probably don't provide any meaningful back-pressure here and just let it fill up the event queue.

We should perhaps track the number of outstanding web socket messages remaining to fire and if we exceed some threshold stop reading from the incoming socket.  Once the messages are consumed we can then begin reading the socket again.
Status: UNCONFIRMED → NEW
Component: DOM: Service Workers → Networking: WebSockets
Ever confirmed: true
Of course, if we stop reading from the socket, the TCP socket will then apply back-pressure to the server once the TCP window is filled. Your server.js setInterval() will then be filling some server-side buffer and will eventually OOM your server instead.

You should probably build some kind of throttling or flow-control mechanism into your client/server app.
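
For illustration, one application-level flow-control scheme would be to have the worker acknowledge each message only after it has actually processed it, and to have the server send the next payload only once that acknowledgement arrives, so unprocessed data can never pile up in the client's event queue. This is a sketch of the idea only; the 'ack' message, the 'ws' package, the port, and the payload size are all assumptions, not part of the attached prototype.

// Hypothetical ack-based flow control (sketch only; not what ff_worker.zip does).
const WebSocket = require('ws');                   // assumes the 'ws' npm package
const wss = new WebSocket.Server({ port: 8889 });  // port is arbitrary

wss.on('connection', function (socket) {
  const payload = Buffer.alloc(64 * 1024);         // arbitrary payload size
  const sendNext = function () { socket.send(payload); };
  // Only send another payload once the client reports it has processed the last one.
  socket.on('message', function (msg) {
    if (msg.toString() === 'ack') {
      sendNext();
    }
  });
  sendNext();  // kick off the exchange
});

// Worker side (browser), for reference:
//   const ws = new WebSocket('ws://localhost:8889');
//   ws.onmessage = function (e) {
//     // consume the Blob synchronously (FileReaderSync is worker-only) ...
//     new FileReaderSync().readAsArrayBuffer(e.data);
//     ws.send('ack');  // ... then ask for the next message
//   };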
It does sound like we want to do this in order to avoid OOM.
Unsure whether this should be done in WebSocketChannel or in dom/base/WebSocket.cpp.
Flags: needinfo?(michal.novotny)
Flags: needinfo?(amarchesini)
It doesn't matter if we increase the interval at which the server sends data to the Worker; it still leaks. It is slower, but the browser still does not release memory. When I set the interval to 1000 ms it still leaks about 100 MB per 10 minutes. The sample was kept as simple as possible. In the real application we consume the event and then set it to null, and it still leaks memory. If you can think of any workaround until the fix is made, that would be very helpful.
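
To make that concrete, the worker-side pattern being described, reading the payload and then dropping the reference, would look roughly like the following (hypothetical sketch; the real application's code is not attached):

// worker.js -- hypothetical sketch of "consume the event and then set it to null".
const ws = new WebSocket('ws://localhost:8888');

ws.onmessage = function (e) {
  // Read the Blob payload synchronously (FileReaderSync is available in workers).
  let data = new FileReaderSync().readAsArrayBuffer(e.data);
  // ... use data ...
  data = null;  // drop the only reference; the memory should now be collectable
};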
It seems a dup of bug 1280916. I'll investigate it today.
Flags: needinfo?(amarchesini)
(In reply to Valentin Gosu [:valentin] from comment #4)
> It does sound like we want to do this in order to avoid OOM.
> Unsure if this should be done in WebSocketChannel or in
> dom/base/WebSocket.cpp

In WebSocketChannel we can't know whether nsIWebSocketListener receives or queues the incoming frames. All we could do is implement suspend/resume methods that the listener would call if it were not able to consume the frames quickly enough. Anyway, as per comment 6, it seems the problem is somewhere else.
Flags: needinfo?(michal.novotny)
(In reply to Andrea Marchesini [:baku] from comment #6)
> It seems a dup of bug 1280916. I'll investigate it today.
Thanks!
Assignee: nobody → amarchesini
Whiteboard: [MemShrink] → [MemShrink][necko-active]
The issue here is exactly what I have seen in bug 1280916: WebSocket creates some JS objects (blobs here, but a similar issue happens with strings) but not all of them are GCed. I'm sure it is not the CC, because I interrupted the dispatching of WebSocket events here:

https://dxr.mozilla.org/mozilla-central/source/dom/base/WebSocket.cpp#2000

Everything is correctly GCed when I run a GC from about:memory. The |JS::GCForReason(aCx, GC_SHRINK, JS::gcreason::DOM_WORKER)| call is correctly executed by the worker thread.

Jon, any feedback about this?
Flags: needinfo?(jcoppeard)
Don't we just GC less often on worker threads?  Is this just a consequence of that design decision?  Can we trigger a more aggressive GC/CC at some point?
The mPeriodicGCTimerTarget runs often; this executes JS_MaybeGC(aCx).
Less often, we execute JS::GCForReason(aCx, GC_SHRINK, JS::gcreason::DOM_WORKER), but it still runs.
Yet we still leak. I don't think the issue is when we trigger GC/CC.
I want to test comment 5.
I wonder if something on the main thread is keeping it alive until the about:memory triggers a cycle collection there, etc.
Jon is on PTO; moving NI to sfink.
Flags: needinfo?(sphink)
Flags: needinfo?(jcoppeard)
I confirm that we correctly execute mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal.
Quite often we end up calling JS_MaybeGC(aCx), and every few seconds we call JS::GCForReason(aCx, GC_SHRINK, JS::gcreason::DOM_WORKER).
Whiteboard: [MemShrink][necko-active] → [MemShrink:P2][necko-active]
NI me if there is something else I can help with.
Assignee: amarchesini → nobody
Component: Networking: WebSockets → DOM
Whiteboard: [MemShrink:P2][necko-active] → [MemShrink:P2]
Priority: -- → P3
Jon, can you give feedback on comment 9?
Flags: needinfo?(sphink) → needinfo?(jcoppeard)
I've tried reproducing this and cannot... it's sitting here running and I'm seeing lots of received packets but no build up of memory.

If I had to guess, I'd say the data is being kept alive by something outside JS, but I wouldn't be surprised if there were also a problem with the way we schedule GC for workers.

baku, is there anything else I need to do to repro this other than the original instructions?  Do you know if it's still a problem?
Flags: needinfo?(jcoppeard) → needinfo?(amarchesini)
Never mind, I can reproduce this with e10s disabled.

Memory usage grows to > 7 GB as reported by activity monitor.  Multiple worker runtimes are created but not all have GCs triggered on them.

about:memory shows most usage is memory-file-data:

11,455.77 MB (100.0%) -- explicit
├──11,027.59 MB (96.26%) -- dom
│  ├──11,027.52 MB (96.26%) -- memory-file-data
│  │  ├──11,027.50 MB (96.26%) ── string [176440]
│  │  └───────0.02 MB (00.00%) ── large/file(length=5430, sha1=49263695f6b0cdd72f45cf1b775e660fdc36c606) [2]
│  └───────0.08 MB (00.00%) ++ (2 tiny)
├─────258.06 MB (02.25%) ++ (27 tiny)
└─────170.11 MB (01.48%) ── heap-unclassified

Clicking "Minimize memory usage" in about:memory frees the memory down to a normal size again.
Flags: needinfo?(amarchesini)
Attached patch account-blob-reflectors (obsolete) — Splinter Review
The problem here is that a JS reflector object can keep a large DOM object alive but the JS GC by default doesn't know about this extra memory and so doesn't trigger a GC.

The patch adds a function which can be overloaded to provide this information and implements it for blobs.  This causes GCs to be triggered for this test case and fixes the memory leak.

We should probably do something like this for other large DOM objects as well.
Assignee: nobody → jcoppeard
Attachment #8884867 - Flags: review?(peterv)
Note that we are calling JS_updateMallocCounter in various places already. How does this interact with some of the existing callers related to blobs (for example http://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/dom/canvas/CanvasRenderingContextHelper.cpp#50)? Will we double-count?
Flags: needinfo?(jcoppeard)
Comment on attachment 8884867 [details] [diff] [review]
account-blob-reflectors

Marking r- while waiting for a response to the question in comment 20.
Attachment #8884867 - Flags: review?(peterv) → review-
Sorry for the long delay. I checked the other uses of JS_updateMallocCounter; the previous patch would double-count when a CanvasRenderingContextHelper was converted to a blob, so I removed that use.
Attachment #8884867 - Attachment is obsolete: true
Flags: needinfo?(jcoppeard)
Attachment #8906033 - Flags: review?(peterv)
Attached patch bug1301863-blob-reflectors-2 (obsolete) — Splinter Review
And further to the last patch, we should remove the update call from OffscreenCanvas when converting to a blob (since blobs are already handled) and when converting HTMLCanvasElement to a File (since File is a subclass of Blob and is also handled).
Attachment #8906642 - Flags: review?(peterv)
Comment on attachment 8906033 [details] [diff] [review]
bug1301863-account-blob-reflectors v2

Review of attachment 8906033 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/file/Blob.cpp
@@ +286,5 @@
>    mImpl->GetInternalStream(aStream, aRv);
>  }
>  
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)

blob -> aBlob

@@ +287,5 @@
>  }
>  
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)
> +{

MOZ_ASSERT(aBlob)

@@ +288,5 @@
>  
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)
> +{
> +  ErrorResult error;

Use IgnoredErrorResult and remove error.SuppressException().

@@ +289,5 @@
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)
> +{
> +  ErrorResult error;
> +  uint64_t size = blob->GetSize(error);

This is wrong. Blob size is different from the memory allocation. For instance, if this Blob is a real file on disk, GetSize() returns the file size, but we don't allocate that amount of memory.

Some BlobImpl objects have a custom memory report. Maybe we could expand that or implement an AllocationSize() method.

::: dom/file/Blob.h
@@ +146,5 @@
>  private:
>    nsCOMPtr<nsISupports> mParent;
>  };
>  
> +uint64_t BindingJSObjectMallocBytes(Blob* blob);

aBlob here as well.
Add a comment about why we need this function.
Attachment #8906033 - Flags: review?(peterv) → review-
See Also: → 1370836
See Also: → 1280916
Attachment #8906642 - Flags: review?(peterv)
Thanks for the review.  How does this look?  I added a GetAllocationSize() method and implemented it for MemoryBlobImpl, StringBlobImpl and MultipartBlobImpl.
Attachment #8906033 - Attachment is obsolete: true
Attachment #8906642 - Attachment is obsolete: true
Attachment #8906663 - Flags: review?(amarchesini)
Comment on attachment 8906663 [details] [diff] [review]
bug1301863-account-blob-reflectors v3

Review of attachment 8906663 [details] [diff] [review]:
-----------------------------------------------------------------

You should also do something for StreamBlobImpl. I wonder if you can use stringInputStream->SizeOfIncludingThis(MallocSizeOf) for that.

::: dom/file/BaseBlobImpl.h
@@ +96,5 @@
>    }
>  
>    virtual void GetType(nsAString& aType) override;
>  
> +  size_t GetAllocationSize() override

size_t GetAllocationSize() const override

@@ +193,5 @@
>  } // namespace dom
>  } // namespace mozilla
>  
>  #endif // mozilla_dom_BaseBlobImpl_h
> +

remove this extra line

::: dom/file/Blob.cpp
@@ +214,5 @@
>    return blob.forget();
>  }
>  
> +size_t
> +Blob::GetAllocationSize()

const?

::: dom/file/Blob.h
@@ +129,5 @@
>                                 const Optional<int64_t>& aEnd,
>                                 const nsAString& aContentType,
>                                 ErrorResult& aRv);
>  
> +  size_t GetAllocationSize();

size_t GetAllocationSize() const;

::: dom/file/BlobImpl.h
@@ +50,5 @@
>    virtual uint64_t GetSize(ErrorResult& aRv) = 0;
>  
>    virtual void GetType(nsAString& aType) = 0;
>  
> +  virtual size_t GetAllocationSize() = 0;

virtual size_t GetAllocationSize() const = 0;
Attachment #8906663 - Flags: review?(amarchesini) → review+
Updated patch with comments applied.
Attachment #8906698 - Flags: review+
Attachment #8906663 - Attachment is obsolete: true
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6187a252c146
Tell the JS engine how much memory blob reflectors hold alive r=baku
https://hg.mozilla.org/mozilla-central/rev/6187a252c146
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Hello,

I just retested our sample app. Now instead of crashing after 10 minutes, it reaches 90% memory in 20 minutes, stays there for 2 hours and then crashes.

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64;rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180920131237

Same attachment: ff_worker.zip
Same steps to reproduce.

You can see the video recording here: https://youtu.be/cU-1HemJm6M


Do you think it's the same bug as bug 1280916, or is it something else?
Flags: needinfo?(jcoppeard)

I've reproduced this and I think the continuing problems are the same issue as bug 1501238, where we allow a lot of memory to build up before triggering a collection, but exacerbated because it's happening for several workers at the same time.

Component: DOM → DOM: Core & HTML