Closed
Bug 1301863
Opened 8 years ago
Closed 7 years ago
Crash caused by memory leak when using WebSocket in Worker
Categories
(Core :: DOM: Core & HTML, defect, P3)
Tracking
RESOLVED
FIXED
mozilla57
People
(Reporter: browser_defects, Assigned: jonco, NeedInfo)
References
Details
(Keywords: crash, Whiteboard: [MemShrink:P2])
Attachments
(3 files, 4 obsolete files)
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160823121617

Steps to reproduce:
1. Download and install Node.js (https://nodejs.org/en/)
2. Download and unzip the attached prototype
3. Open a shell and navigate to the folder extracted in step 2
4. Start the server by typing: node server.js
5. Open Firefox and navigate to http://localhost:8888
6. Observe the Firefox memory consumption in the task manager.

Link to crash report: https://crash-stats.mozilla.com/report/index/c94acc22-8f80-4a7d-9356-8fad12160909

Actual results: After running for 5-10 minutes the browser either crashes because it reaches the maximum memory allowed for the process, or terminates the worker, which can be seen in the server console log as a closed connection.

Expected results: Memory consumption should be stable. This is only seen in Firefox.
Reporter
Comment 1•8 years ago
about:memory report
Severity: normal → critical
Component: Untriaged → DOM: Service Workers
Keywords: crash
Product: Firefox → Core
Comment 2•8 years ago
I think the issue here is that the server sends a message every 20ms without regard to whether the client can keep up or not. We probably don't provide any meaningful back-pressure here and just let it fill up the event queue. We should perhaps track the number of outstanding web socket messages remaining to fire and if we exceed some threshold stop reading from the incoming socket. Once the messages are consumed we can then begin reading the socket again.
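The threshold scheme proposed above could be sketched roughly as follows. This is a hypothetical illustration, not Gecko code; all names (`MessageBackPressure`, `OnMessageQueued`, etc.) are invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the proposed back-pressure scheme: count WebSocket
// messages that have been read off the socket but not yet consumed by the
// page, and stop reading once a threshold is exceeded.
class MessageBackPressure {
public:
  explicit MessageBackPressure(size_t aThreshold) : mThreshold(aThreshold) {}

  // Called when a frame is read from the socket and queued for dispatch.
  void OnMessageQueued() { ++mOutstanding; }

  // Called when the page's onmessage handler has consumed a message.
  void OnMessageConsumed() {
    if (mOutstanding > 0) {
      --mOutstanding;
    }
  }

  // The network layer polls this before reading more data; returning false
  // leaves bytes in the TCP buffer so the kernel applies back-pressure.
  bool ShouldReadFromSocket() const { return mOutstanding < mThreshold; }

private:
  size_t mOutstanding = 0;
  size_t mThreshold;
};
```

Once `ShouldReadFromSocket()` returns true again, the channel would resume reading the socket, as described above.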
Status: UNCONFIRMED → NEW
Component: DOM: Service Workers → Networking: WebSockets
Ever confirmed: true
Comment 3•8 years ago
Of course, if we stop reading from the socket the tcp socket will then apply back-pressure to the server once the tcp window is filled. Your server.js setInterval() will then be filling some server-side buffer and eventually OOM your server instead. You should probably build some kind of throttling or flow-control mechanism into your client/server app.
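One common shape for the application-level flow control suggested above is a credit window: the client grants "credits" as it consumes messages, and the server only sends while it holds credits, so a slow client throttles the sender instead of OOMing it. This is a minimal illustrative sketch, not part of either the sample app or Gecko.

```cpp
#include <cassert>

// Hypothetical credit-based flow control between client and server.
class CreditWindow {
public:
  // The client acknowledges consumed messages by granting more credits.
  void Grant(unsigned aCredits) { mCredits += aCredits; }

  // The server calls this before each send; false means it must pause
  // until the client grants more credits.
  bool TryConsumeOne() {
    if (mCredits == 0) {
      return false;
    }
    --mCredits;
    return true;
  }

  unsigned Available() const { return mCredits; }

private:
  unsigned mCredits = 0;
};
```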
Comment 4•8 years ago
It does sound like we want to do this in order to avoid OOM. Unsure if this should be done in WebSocketChannel or in dom/base/WebSocket.cpp.
Flags: needinfo?(michal.novotny)
Flags: needinfo?(amarchesini)
Reporter
Comment 5•8 years ago
Increasing the interval at which the server sends data to the Worker doesn't matter; it still leaks. It is slower, but the browser still does not release memory. When I set the interval to 1000 ms it still leaks about 100 MB per 10 minutes. The sample was kept as simple as possible. In the real application we consume the event and then set it to null, and it still leaks memory. If you can think of any workaround until a fix is made, that would be very helpful.
Comment 6•8 years ago
It seems a dup of bug 1280916. I'll investigate it today.
Flags: needinfo?(amarchesini)
Comment 7•8 years ago
(In reply to Valentin Gosu [:valentin] from comment #4)
> It does sound like we want to do this in order to avoid OOM. Unsure if this should be done in WebSocketChannel or in dom/base/WebSocket.cpp

In WebSocketChannel we can't know whether nsIWebSocketListener receives or queues the incoming frames. All we could do is implement suspend/resume methods that would be called by the listener in case it's not able to consume the frames quickly enough. Anyway, as per comment 6 it seems the problem is somewhere else.
Flags: needinfo?(michal.novotny)
Comment 8•8 years ago
(In reply to Andrea Marchesini [:baku] from comment #6)
> It seems a dup of bug 1280916. I'll investigate it today.

Thanks!
Assignee: nobody → amarchesini
Whiteboard: [MemShrink] → [MemShrink][necko-active]
Comment 9•8 years ago
The issue here is exactly what I have seen in bug 1280916: WebSocket creates some JS objects (blobs here, but a similar issue happens with strings) but then not all of them are GCed. I'm sure it is not CC, because I interrupted the dispatching of WebSocket events here: https://dxr.mozilla.org/mozilla-central/source/dom/base/WebSocket.cpp#2000

Everything is correctly GCed when I run GC from about:memory. The |JS::GCForReason(aCx, GC_SHRINK, JS::gcreason::DOM_WORKER)| is correctly executed by the worker thread. Jon, any feedback about this?
Flags: needinfo?(jcoppeard)
Comment 10•8 years ago
Don't we just GC less often on worker threads? Is this just a consequence of that design decision? Can we trigger a more aggressive GC/CC at some point?
Comment 11•8 years ago
The mPeriodicGCTimerTarget runs often; this executes JS_MaybeGC(aCx). Less often we execute JS::GCForReason(aCx, GC_SHRINK, JS::gcreason::DOM_WORKER), but both still run, and we still leak. I don't think the issue is about when we trigger GC/CC. I want to test comment 5.
Comment 12•8 years ago
I wonder if something on the main thread is keeping it alive until the about:memory triggers a cycle collection there, etc.
Updated•8 years ago
Flags: needinfo?(jcoppeard)
Comment 14•8 years ago
I confirm that we correctly execute mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal. Quite often we end up calling JS_MaybeGC(aCx), and every few seconds we call JS::GCForReason(aCx, GC_SHRINK, JS::gcreason::DOM_WORKER).
Updated•8 years ago
Whiteboard: [MemShrink][necko-active] → [MemShrink:P2][necko-active]
Comment 15•8 years ago
NI me if there is something else I can help with.
Assignee: amarchesini → nobody
Updated•8 years ago
Component: Networking: WebSockets → DOM
Whiteboard: [MemShrink:P2][necko-active] → [MemShrink:P2]
Updated•7 years ago
Priority: -- → P3
Comment 16•7 years ago
Jon, can you give feedback on comment 9?
Flags: needinfo?(sphink) → needinfo?(jcoppeard)
Assignee
Comment 17•7 years ago
I've tried reproducing this and cannot: it's sitting here running and I'm seeing lots of received packets but no build-up of memory. If I had to guess I'd say that the data is being kept alive by something outside JS, but I wouldn't be surprised if there was some problem with the way we schedule GC for workers either. baku, is there anything else I need to do to reproduce this other than the original instructions? Do you know if it's still a problem?
Flags: needinfo?(jcoppeard) → needinfo?(amarchesini)
Assignee
Comment 18•7 years ago
Never mind, I can reproduce this with e10s disabled. Memory usage grows to > 7 GB as reported by Activity Monitor. Multiple worker runtimes are created but not all have GCs triggered on them. about:memory shows most usage is memory-file-data:

11,455.77 MB (100.0%) -- explicit
├──11,027.59 MB (96.26%) -- dom
│  ├──11,027.52 MB (96.26%) -- memory-file-data
│  │  ├──11,027.50 MB (96.26%) ── string [176440]
│  │  └───────0.02 MB (00.00%) ── large/file(length=5430, sha1=49263695f6b0cdd72f45cf1b775e660fdc36c606) [2]
│  └───────0.08 MB (00.00%) ++ (2 tiny)
├─────258.06 MB (02.25%) ++ (27 tiny)
└─────170.11 MB (01.48%) ── heap-unclassified

Minimise memory usage frees memory down to a normal size again.
Flags: needinfo?(amarchesini)
Assignee
Comment 19•7 years ago
The problem here is that a JS reflector object can keep a large DOM object alive but the JS GC by default doesn't know about this extra memory and so doesn't trigger a GC. The patch adds a function which can be overloaded to provide this information and implements it for blobs. This causes GCs to be triggered for this test case and fixes the memory leak. We should probably do something like this for other large DOM objects as well.
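A much-simplified sketch of the mechanism described above: a per-type overload reports the external bytes a reflector keeps alive, and the binding layer feeds that number into the GC's allocation counter so that large blobs can trigger a collection. The GC side here is a mock for illustration; only the overload name is modeled on the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mock stand-ins for the JS engine and a DOM blob, for illustration only.
struct MockGC {
  size_t mallocBytes = 0;
  bool gcTriggered = false;
  static constexpr size_t kTriggerThreshold = 1 << 20;  // 1 MiB, arbitrary

  void UpdateMallocCounter(size_t aBytes) {
    mallocBytes += aBytes;
    if (mallocBytes > kTriggerThreshold) {
      gcTriggered = true;  // the real engine would schedule a GC here
    }
  }
};

struct Blob {
  size_t allocationSize;
};

// Default: most DOM objects hold no extra malloc memory worth reporting.
template <typename T>
uint64_t BindingJSObjectMallocBytes(T*) { return 0; }

// Overload for blobs: report the memory backing the blob's data.
uint64_t BindingJSObjectMallocBytes(Blob* aBlob) {
  return aBlob->allocationSize;
}

// Called when a reflector (JS wrapper) is created for a DOM object.
template <typename T>
void CreateReflector(MockGC& aGC, T* aObject) {
  if (uint64_t bytes = BindingJSObjectMallocBytes(aObject)) {
    aGC.UpdateMallocCounter(bytes);
  }
}
```

Because the default returns 0, only types that provide an overload pay any cost, which matches the idea of extending this to other large DOM objects later.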
Assignee: nobody → jcoppeard
Attachment #8884867 -
Flags: review?(peterv)
Comment 20•7 years ago
Note that we are calling JS_updateMallocCounter in various places already. How does this interact with some of the existing callers related to blobs (for example http://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/dom/canvas/CanvasRenderingContextHelper.cpp#50)? Will we double-count?
Flags: needinfo?(jcoppeard)
Comment 21•7 years ago
Comment on attachment 8884867 [details] [diff] [review]
account-blob-reflectors

r-ing while waiting for a response to the question in comment 20.
Attachment #8884867 -
Flags: review?(peterv) → review-
Assignee
Comment 22•7 years ago
Sorry for the long delay. I checked the other uses of JS_updateMallocCounter and the previous patch would double-count when CanvasRenderingContextHelper was converted to a blob so I removed that use.
Attachment #8884867 -
Attachment is obsolete: true
Flags: needinfo?(jcoppeard)
Attachment #8906033 -
Flags: review?(peterv)
Assignee
Comment 23•7 years ago
And further to the last patch, we should remove the update call from OffscreenCanvas when converting to a blob (since blobs are already handled) and when converting HTMLCanvasElement to a File (since File is a subclass of Blob and is also handled).
Attachment #8906642 -
Flags: review?(peterv)
Comment 24•7 years ago
Comment on attachment 8906033 [details] [diff] [review]
bug1301863-account-blob-reflectors v2

Review of attachment 8906033 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/file/Blob.cpp
@@ +286,5 @@
> mImpl->GetInternalStream(aStream, aRv);
> }
>
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)

blob -> aBlob

@@ +287,5 @@
> }
>
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)
> +{

MOZ_ASSERT(aBlob)

@@ +288,5 @@
>
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)
> +{
> + ErrorResult error;

Use IgnoredErrorResult and remove error.SuppressException().

@@ +289,5 @@
> +uint64_t
> +BindingJSObjectMallocBytes(Blob* blob)
> +{
> + ErrorResult error;
> + uint64_t size = blob->GetSize(error);

This is wrong. Blob size is different than the memory allocation. For instance, if this Blob is a real file on disk, GetSize() returns the file size, but we don't allocate that amount of memory. Some BlobImpl objects have a custom memory report; maybe we could expand that or implement an AllocationSize() method.

::: dom/file/Blob.h
@@ +146,5 @@
> private:
> nsCOMPtr<nsISupports> mParent;
> };
>
> +uint64_t BindingJSObjectMallocBytes(Blob* blob);

aBlob here as well. Add a comment about why we need this function.
Attachment #8906033 -
Flags: review?(peterv) → review-
Assignee
Updated•7 years ago
Attachment #8906642 -
Flags: review?(peterv)
Assignee
Comment 25•7 years ago
Thanks for the review. How does this look? I added a GetAllocationSize() method and implemented it for MemoryBlobImpl, StringBlobImpl and MultipartBlobImpl.
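The shape of that approach can be sketched as follows. This is an illustrative simplification with invented class names, not the actual patch: each impl reports the memory it really allocates (a file-backed blob would report 0 since its bytes live on disk), and a multipart blob sums its parts.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified sketch of the GetAllocationSize() idea: report allocated
// memory, not logical blob size.
struct BlobImpl {
  virtual ~BlobImpl() = default;
  virtual size_t GetAllocationSize() const = 0;
};

// In-memory blob: the data buffer is a real allocation.
struct StringBlobImpl : BlobImpl {
  explicit StringBlobImpl(std::string aData) : mData(std::move(aData)) {}
  size_t GetAllocationSize() const override { return mData.size(); }
  std::string mData;
};

// Backed by a file on disk: no in-memory allocation to report.
struct FileBlobImpl : BlobImpl {
  size_t GetAllocationSize() const override { return 0; }
};

// Multipart blob: sum the allocation sizes of all parts.
struct MultipartBlobImpl : BlobImpl {
  size_t GetAllocationSize() const override {
    size_t total = 0;
    for (const auto& part : mParts) {
      total += part->GetAllocationSize();
    }
    return total;
  }
  std::vector<std::unique_ptr<BlobImpl>> mParts;
};
```

This is also why GetSize() could not be reused directly: for a file-backed blob the logical size and the allocated memory diverge, as noted in the review above.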
Attachment #8906033 -
Attachment is obsolete: true
Attachment #8906642 -
Attachment is obsolete: true
Attachment #8906663 -
Flags: review?(amarchesini)
Comment 26•7 years ago
Comment on attachment 8906663 [details] [diff] [review]
bug1301863-account-blob-reflectors v3

Review of attachment 8906663 [details] [diff] [review]:
-----------------------------------------------------------------

You should do something also for StreamBlobImpl. Wondering if you can use stringInputStream->SizeOfIncludingThis(MallocSizeOf) for that.

::: dom/file/BaseBlobImpl.h
@@ +96,5 @@
> }
>
> virtual void GetType(nsAString& aType) override;
>
> + size_t GetAllocationSize() override

size_t GetAllocationSize() const override

@@ +193,5 @@
> } // namespace dom
> } // namespace mozilla
>
> #endif // mozilla_dom_BaseBlobImpl_h
> +

Remove this extra line.

::: dom/file/Blob.cpp
@@ +214,5 @@
> return blob.forget();
> }
>
> +size_t
> +Blob::GetAllocationSize()

const?

::: dom/file/Blob.h
@@ +129,5 @@
> const Optional<int64_t>& aEnd,
> const nsAString& aContentType,
> ErrorResult& aRv);
>
> + size_t GetAllocationSize();

size_t GetAllocationSize() const;

::: dom/file/BlobImpl.h
@@ +50,5 @@
> virtual uint64_t GetSize(ErrorResult& aRv) = 0;
>
> virtual void GetType(nsAString& aType) = 0;
>
> + virtual size_t GetAllocationSize() = 0;

virtual size_t GetAllocationSize() const = 0;
Attachment #8906663 -
Flags: review?(amarchesini) → review+
Assignee
Comment 27•7 years ago
Updated patch with comments applied.
Attachment #8906698 -
Flags: review+
Assignee
Updated•7 years ago
Attachment #8906663 -
Attachment is obsolete: true
Comment 28•7 years ago
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6187a252c146 Tell the JS engine how much memory blob reflectors hold alive r=baku
Comment 29•7 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6187a252c146
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•7 years ago
Reporter
Comment 30•6 years ago
Hello, I just retested our sample app. Now instead of crashing after 10 minutes, it reaches 90% memory in 20 minutes, stays there for 2 hours, and then crashes.

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180920131237

Same attachment: ff_worker.zip
Same steps to reproduce. You can see the video recording here: https://youtu.be/cU-1HemJm6M

Do you think it's the same bug as bug 1280916, or is it something else?
Flags: needinfo?(jcoppeard)
Assignee
Comment 31•5 years ago
I've reproduced this and I think the continuing problems are the same issue as bug 1501238, where we allow a lot of memory to build up before triggering a collection, but exacerbated because it's happening for several workers at the same time.
Updated•5 years ago
Component: DOM → DOM: Core & HTML