Closed Bug 1596295 Opened 6 years ago Closed 4 years ago

Array of ChannelEvent and ChannelEventQueue aren't always cleared, causing cycles leading to leaks.

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: jya, Unassigned)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Seen in bug 1588241.

We added the ability to queue events via a lambda to the ChannelEventQueue. Unfortunately, our static analyser will choke if we attempt to capture a ref counted object by pointer; so we changed all the events that were capturing the HttpChannelChild* by value into a RefPtr.

This caused leaks to appear like in:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275958554&repo=autoland&lineNumber=23410

The reason is that mUnknownDecoderEventQ array of events isn't always cleared. It is only cleared if HttpChannelChild::UnknownDecoderInvolvedOnStartRequestCalled() is called. This was introduced with bug 1197679 .

Adding an event in mUnknownDecoderEventQ could create a cycle with HttpChannelChild (the same issue exists in FTPChannelChild).

This bug is to track the proper handling of mUnknownDecoderEventQ and what we should do with those events queued. Can we just drop them, or run them when Cancel/Resume is called?

This try run https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a8ff9c0296b81388fcc0906ec0f79d96aafae38 which eliminated the reference cycle in HttpChannelChild::mUnknownDecoderEventQ shows that the ChannelEventQueue itself doesn't run all its events.

We see:
leak at mozilla::net::HttpChannelChild::OnStopRequest, mozilla::net::ChannelEventQueue::FlushQueue, CompleteResume, mozilla::net::ChannelEventQueue::ResumeInternal

The task queued in HttpChannelChild::OnStopRequest isn't always run, and stays in the queue; causing the cycle between the task and HttpChannelChild to never be broken, hence the leek

That the OnStopRequest tasks isn't always run could possibly be a source of intermittent failures.

Summary: Array of ChannelEvent isn't always cleared, causing cycles leading to leaks. → Array of ChannelEvent and ChannelEventQueue aren't always cleared, causing cycles leading to leaks.

In this try run where there was still a self reference kept in the lambda running in HttpChannelChild::ProcessOnTransportAndData, we see that we have a leak: that too didn't get run.

https://treeherder.mozilla.org/logviewer.html#?job_id=276164522&repo=try

(Asan WPT5 run)
TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at _M_init_functor, _M_init_functor, function, mozilla::net::HttpChannelChild::ProcessOnTransportAndData

The leak can only occur if the event queued in https://searchfox.org/mozilla-central/rev/cac340f10816707e91c46db6d285f80259420f07/netwerk/protocol/http/HttpChannelChild.cpp#760 wasn't run.

Nhi, can you find someone to take a look? Maybe Junior?

Flags: needinfo?(nhnguyen)
Priority: -- → P2
Whiteboard: [necko-triaged]
Assignee: nobody → juhsu
Flags: needinfo?(nhnguyen)

After discuss with :jya, let's start with assertion of empty event queue to find the culprit.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b4cb637210da5a1f1c554b171bb8f362912faa0

Note: If we can have eventQ clear and assert it, we might remove UnsafePtr.

Here is a try run when I added an assertion that mEventQ would be empty when the HttpChannelChild destructor is called:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=280768660&repo=try&lineNumber=19606
And unsurprisingly: Assertion failure: mEventQ->IsEmpty() (task list must be empty), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelChild.cpp:208

(In reply to Jean-Yves Avenard [:jya] from comment #6)

Here is a try run when I added an assertion that mEventQ would be empty when the HttpChannelChild destructor is called:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=280768660&repo=try&lineNumber=19606
And unsurprisingly: Assertion failure: mEventQ->IsEmpty() (task list must be empty), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelChild.cpp:208

Should be this xhr/open-url-base-inserted-after-open.htm
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4a54b7770bd0e42c353cc7bafa067a06af597d96&selectedJob=280777404

Thanks, :jya!

Not sure why I can't reproduce locally either debug or asan, but it looks perma fail in treeherder.

Just notice that Pernosco only works with Linux x64 debug tests :(

Assignee: CuveeHsu → nobody

This code is not used any more.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.