Memory leak on nsPipe when building ClonedMessageData of Blob.
Categories
(Core :: DOM: File, defect, P3)
Tracking
()
People
(Reporter: vul3vm06, Assigned: shelly)
Details
(Keywords: memory-leak)
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0
Steps to reproduce:
Post a message with local file (> 100KB) to service worker, and do not reference the file on service worker side at all.
test page: https://vul3vm06.github.io/post-file-to-sw/
Actual results:
The following unreported blocks accumulates steadily every time the file is posted to service worker.
Unreported {
18 blocks in heap block record 60 of 15,643
73,728 bytes (73,728 requested / 0 slop)
Individual block sizes: 4,096 x 18
0.06% of the heap (28.91% cumulative)
0.15% of unreported (76.41% cumulative)
Allocated at {
#01: mozilla::dmd::StackTrace::Get(mozilla::dmd::Thread*) (/mnt/md0/gecko-dev/memory/replace/dmd/DMD.cpp:663)
#02: mozilla::dmd::AllocCallback(void*, unsigned long, mozilla::dmd::Thread*) (/mnt/md0/gecko-dev/memory/replace/dmd/DMD.cpp:1024)
#03: replace_malloc(unsigned long) (/mnt/md0/gecko-dev/memory/replace/dmd/DMD.cpp:1092)
#04: nsSegmentedBuffer::AppendNewSegment() (/mnt/md0/gecko-dev/xpcom/io/nsSegmentedBuffer.cpp:60)
#05: nsPipe::GetWriteSegment(char*&, unsigned int&) (/mnt/md0/gecko-dev/xpcom/io/nsPipe3.cpp:822)
#06: nsPipeOutputStream::WriteSegments(nsresult ()(nsIOutputStream, void*, char*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) (/mnt/md0/gecko-dev/xpcom/io/nsPipe3.cpp:1665)
#07: nsStreamCopierOB::DoCopy(nsresult*, nsresult*) (/mnt/md0/gecko-dev/xpcom/io/nsStreamUtils.cpp:560)
#08: nsAStreamCopier::Process() (/mnt/md0/gecko-dev/xpcom/io/nsStreamUtils.cpp:302)
}
}
Actually, there is a nsPipe object whose refcnt remains 3, and therefore is never deleted.
Expected results:
The nsPipe should be deleted, and no accumulated blocks on dmd dump.
See the attachment for details: https://drive.google.com/file/d/1JuHHQt8_-wVSANXS_iu7vajAMmRl0iKt/view?usp=sharing
Steps:
- Open firefox, browse the test page: https://vul3vm06.github.io/post-file-to-sw/.
(Mimimize memory usage and colloect gc/cc/dmd, save to about-memory-0 in the attachment) - Upload a file, and click the button to post message to service worker.
(Mimimize memory usage and colloect gc/cc/dmd, save to about-memory-1 in the attachment) - Repeat step 2, upload a file and click send.
(Mimimize memory usage and colloect gc/cc/dmd, save to about-memory-2 in the attachment) - Close the test page.
(Mimimize memory usage and colloect gc/cc/dmd, save to about-memory-3 in the attachment)
In step 2, there is a nsPipe object whose refcnt remains 3, and therefore is never deleted.
The backtrace of the nsPipe::AddRef and nsPipe::Release are captured and in the the attachment too, see nsPipe-addref-release folder.
The gecko-dev git HEAD on this experiment is
commit 702199bca53babc925e47fd8f86ed56487d42492
Date: Sun Nov 21 13:50:24 2021 +0000
This was originally found by b2g device MTBF test.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
•
|
||
Patch seems simple but the root cause is complicated, I try to describe it as brief as possible.
When building ClonedMessageData of a Blob, gecko will clone the inputStream [1] and wrap it as a RemoteLazyStream in serialize phase.
If the inputStream is not cloneable (i.e. FileStream), NS_CloneInputStream [2] creates a new pipe and return two cloned stream (reader and writer). Then first, clone reader stream, wrap this cloned inputStream as RemoteLazyStream, serialize it with IPC. Second, copy the source data to writer stream.
To start the copy phase, copier holds references of writer stream (this adds two extra refcnt of pipe). And process data copy iff there is read request from child side. Then close the writer stream and release them when done with copy. When child has done with RemoteLazyStream, RemoteLazyInputStreamParent will try to release the cloned inputStream, closing itself explicitly [3].
Logic in nsPipe::OnInputStreamException [3] shows that gecko only shutdown the entire pipe iff there is one stream left and it is the given stream. So here we will not shutdown the pipe because there are two inputStreams left, one is the original inputStream, the other is the cloned inputStream. The cloned inputStream will be closed by ActorDestroy of RemoteLazyInputStreamParent and decrease one refcnt of this pipe.
In nsPipe::Release [4], shows that gecko only release the original inputStream iff the refcnt of pipe decrease to one. In the scenario where no read request from child, copier never finishes, two extra refcnt of pipe hold by copier, original inputStream will not be release. Thus, this is why I try to close the inputStream (a.k.a. the original inputStream) in advance when destructing StreamBlobImpl.
[1] https://searchfox.org/mozilla-central/source/dom/file/StreamBlobImpl.cpp#73
[2] https://searchfox.org/mozilla-central/source/xpcom/io/nsStreamUtils.cpp#859
[3] https://searchfox.org/mozilla-central/source/xpcom/io/nsPipe3.cpp#905-910
[4] https://searchfox.org/mozilla-central/source/xpcom/io/nsPipe3.cpp#516-519
Updated•3 years ago
|
Comment 4•3 years ago
|
||
So in normal case when someone does read the data, what calls close on the original input stream?
[3] doesn't seem to link to any closing step.
The patch is probably fine, but why aren't we closing this all in ActorDestroy of RemoteLazyInputStream?
Updated•3 years ago
|
Assignee | ||
Comment 5•3 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #4)
So in normal case when someone does read the data, what calls close on the original input stream?
[3] doesn't seem to link to any closing step.
Normally if child side request reading data, we will process and finish with the copier, close the sink and null out holding references.
https://searchfox.org/mozilla-central/source/xpcom/io/nsStreamUtils.cpp#366-367
The patch is probably fine, but why aren't we closing this all in ActorDestroy of RemoteLazyInputStream?
Currently the inputStream(the cloned one) is closed by ActorDestroy of RemoteLazyInputStreamParent already (via RemoteLazyInputStreamStorage::ForgetStream). I'm not sure why the "original" inputStream(owned by StreamBlobImpl) is not closed when destructing StreamBlobImpl (the destructing happens way earlier then ActorDestroy), I can expose that original inputStream to RemoteLazyInputStreamParent, just that it will involve some amount of interface changes to serialization related functions [1] though.
[1] https://searchfox.org/mozilla-central/source/dom/file/ipc/IPCBlobUtils.cpp#133-134
I also tried using NS_CancelAsyncCopy in ActorDestroy of RemoteLazyInputStreamParent (plus lots of interface changing just to expose copier context), end up no help since nsAStreamCopier::Cancel won't lead to process copier or pipe shutdown....
Assignee | ||
Comment 6•3 years ago
|
||
(In reply to Shelly Lin [:shelly] from comment #5)
(In reply to Olli Pettay [:smaug] from comment #4)
Currently the inputStream(the cloned one) is closed by ActorDestroy of RemoteLazyInputStreamParent already (via RemoteLazyInputStreamStorage::ForgetStream). I'm not sure why the "original" inputStream(owned by StreamBlobImpl) is not closed when destructing StreamBlobImpl (the destructing happens way earlier then ActorDestroy),
Answering myself, reference of original inputStream is still hold by the pipe.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
Depends on D133717
Assignee | ||
Updated•3 years ago
|
Comment 9•3 years ago
|
||
Backed out for causing mochitest failures on test_bug1742540.html.
[task 2022-01-11T11:08:28.582Z] 11:08:28 INFO - TEST-START | dom/file/tests/test_bug1742540.html
[task 2022-01-11T11:13:51.919Z] 11:13:51 INFO - Buffered messages logged at 11:08:24
[task 2022-01-11T11:13:51.919Z] 11:13:51 INFO - add_task | Entering test setupPrefs
[task 2022-01-11T11:13:51.919Z] 11:13:51 INFO - add_task | Leaving test setupPrefs
[task 2022-01-11T11:13:51.920Z] 11:13:51 INFO - add_task | Entering test send_file_to_serviceworker
[task 2022-01-11T11:13:51.920Z] 11:13:51 INFO - TEST-PASS | dom/file/tests/test_bug1742540.html | ServiceWorker is activated
[task 2022-01-11T11:13:51.922Z] 11:13:51 INFO - Buffered messages finished
[task 2022-01-11T11:13:51.922Z] 11:13:51 WARNING - TEST-UNEXPECTED-FAIL | dom/file/tests/test_bug1742540.html | Test timed out. -
[task 2022-01-11T11:13:51.922Z] 11:13:51 WARNING - TEST-UNEXPECTED-FAIL | dom/file/tests/test_bug1742540.html | This test left a service worker registered without cleaning it up
[task 2022-01-11T11:13:51.922Z] 11:13:51 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:417:16
[task 2022-01-11T11:13:51.922Z] 11:13:51 INFO - afterCleanup@SimpleTest/SimpleTest.js:1436:18
[task 2022-01-11T11:13:51.922Z] 11:13:51 WARNING - TEST-UNEXPECTED-FAIL | dom/file/tests/test_bug1742540.html | Left over worker: http://mochi.test:8888/tests/dom/file/tests/worker_bug1742540.js (scope: http://mochi.test:8888/tests/dom/file/tests/)
[task 2022-01-11T11:13:51.922Z] 11:13:51 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:417:16
[task 2022-01-11T11:13:51.922Z] 11:13:51 INFO - afterCleanup@SimpleTest/SimpleTest.js:1441:20
[task 2022-01-11T11:13:51.922Z] 11:13:51 INFO - TEST-OK | dom/file/tests/test_bug1742540.html | took 323501ms
[task 2022-01-11T11:13:51.922Z] 11:13:51 WARNING - TEST-UNEXPECTED-FAIL | /tests/dom/file/tests/test_bug1742540.html logged result after SimpleTest.finish(): This test left a service worker registered without cleaning it up
[task 2022-01-11T11:13:51.923Z] 11:13:51 WARNING - TEST-UNEXPECTED-FAIL | /tests/dom/file/tests/test_bug1742540.html logged result after SimpleTest.finish(): Left over worker: http://mochi.test:8888/tests/dom/file/tests/worker_bug1742540.js (scope: http://mochi.test:8888/tests/dom/file/tests/)
[task 2022-01-11T11:13:51.923Z] 11:13:51 INFO - TEST-START | dom/file/tests/test_file_from_blob.html
Comment 10•3 years ago
|
||
Comment 11•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0d9949fa97e2
https://hg.mozilla.org/mozilla-central/rev/84cae67e60f6
Comment 12•3 years ago
|
||
Seems like we can let this fix ride the trains, but feel free to nominate for uplift if you feel otherwise.
Updated•3 years ago
|
Description
•