Array of ChannelEvent and ChannelEventQueue aren't always cleared, causing cycles leading to leaks.
Categories
(Core :: Networking, defect, P2)
Tracking
()
People
(Reporter: jya, Unassigned)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [necko-triaged])
Seen in bug 1588241.
We added the ability to queue events via a lambda to the ChannelEventQueue. Unfortunately, our static analyser will choke if we attempt to capture a ref counted object by pointer; so we changed all the events that were capturing the HttpChannelChild* by value into a RefPtr.
This caused leaks to appear like in:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275958554&repo=autoland&lineNumber=23410
The reason is that mUnknownDecoderEventQ array of events isn't always cleared. It is only cleared if HttpChannelChild::UnknownDecoderInvolvedOnStartRequestCalled() is called. This was introduced with bug 1197679 .
Adding an event in mUnknownDecoderEventQ could create a cycle with HttpChannelChild (the same issue exists in FTPChannelChild).
This bug is to track the proper handling of mUnknownDecoderEventQ and what we should do with those events queued. Can we just drop them, or run them when Cancel/Resume is called?
Updated•6 years ago
|
| Reporter | ||
Comment 1•6 years ago
|
||
This try run https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a8ff9c0296b81388fcc0906ec0f79d96aafae38 which eliminated the reference cycle in HttpChannelChild::mUnknownDecoderEventQ shows that the ChannelEventQueue itself doesn't run all its events.
We see:
leak at mozilla::net::HttpChannelChild::OnStopRequest, mozilla::net::ChannelEventQueue::FlushQueue, CompleteResume, mozilla::net::ChannelEventQueue::ResumeInternal
The task queued in HttpChannelChild::OnStopRequest isn't always run, and stays in the queue; causing the cycle between the task and HttpChannelChild to never be broken, hence the leek
That the OnStopRequest tasks isn't always run could possibly be a source of intermittent failures.
| Reporter | ||
Comment 2•6 years ago
|
||
In this try run where there was still a self reference kept in the lambda running in HttpChannelChild::ProcessOnTransportAndData, we see that we have a leak: that too didn't get run.
https://treeherder.mozilla.org/logviewer.html#?job_id=276164522&repo=try
(Asan WPT5 run)
TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at _M_init_functor, _M_init_functor, function, mozilla::net::HttpChannelChild::ProcessOnTransportAndData
The leak can only occur if the event queued in https://searchfox.org/mozilla-central/rev/cac340f10816707e91c46db6d285f80259420f07/netwerk/protocol/http/HttpChannelChild.cpp#760 wasn't run.
Comment 3•6 years ago
|
||
Nhi, can you find someone to take a look? Maybe Junior?
Updated•6 years ago
|
Comment 4•6 years ago
|
||
After discuss with :jya, let's start with assertion of empty event queue to find the culprit.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b4cb637210da5a1f1c554b171bb8f362912faa0
Note: If we can have eventQ clear and assert it, we might remove UnsafePtr.
Comment 5•6 years ago
|
||
| Reporter | ||
Comment 6•6 years ago
•
|
||
Here is a try run when I added an assertion that mEventQ would be empty when the HttpChannelChild destructor is called:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=280768660&repo=try&lineNumber=19606
And unsurprisingly: Assertion failure: mEventQ->IsEmpty() (task list must be empty), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelChild.cpp:208
Comment 7•6 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #6)
Here is a try run when I added an assertion that mEventQ would be empty when the HttpChannelChild destructor is called:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=280768660&repo=try&lineNumber=19606
And unsurprisingly: Assertion failure: mEventQ->IsEmpty() (task list must be empty), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelChild.cpp:208
Should be this xhr/open-url-base-inserted-after-open.htm
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4a54b7770bd0e42c353cc7bafa067a06af597d96&selectedJob=280777404
Thanks, :jya!
Comment 8•6 years ago
|
||
Not sure why I can't reproduce locally either debug or asan, but it looks perma fail in treeherder.
Comment 9•6 years ago
|
||
Just notice that Pernosco only works with Linux x64 debug tests :(
Updated•5 years ago
|
Comment 10•4 years ago
|
||
This code is not used any more.
Updated•4 years ago
|
Description
•