Closed Bug 1517601 Opened 5 years ago Closed 5 years ago

Leaks in /websockets/ WPTs

Categories

(Core :: Networking: WebSockets, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P2][necko-triaged])

Attachments

(3 files)

LSan detects some leaks in the /websockets/ web platform tests. This includes some event queue thing, a worker private, some web socket stuff, and maybe a doc shell:

leak at NewPage, Push, mozilla::EventQueue::PutEvent, Dispatch, mozilla::ThrottledEventQueue::Dispatch, Dispatch, mozilla::ThrottledEventQueue::Inner::EnsureExecutor
leak at Alloc, nsTSubstring, nsTSubstring, mozilla::net::nsStandardURL::BuildNormalizedSpec, mozilla::net::nsStandardURL::SetSpecWithEncoding, mozilla::net::nsStandardURL::Init, Init
leak at mozilla::net::nsStandardURL::TemplatedMutator, Init, mozilla::net::nsStandardURL::TemplatedMutator, operator, std::_Function_handler, operator, Apply
leak at mozilla::WeakPtr, WeakPtr, mozilla::Maybe, mozilla::ContentPrincipal::AddonPolicy, nsIDocument::InitCSP, nsIDocument::StartDocumentLoad, nsHTMLDocument::StartDocumentLoad
leak at mozilla::ThrottledEventQueue::Create, mozilla::dom::WorkerPrivate::WorkerPrivate, mozilla::dom::WorkerPrivate::Constructor, mozilla::dom::Worker::Constructor, mozilla::dom::Worker_Binding::_constructor, CallJSNative, CallJSNativeConstructor
leak at nsDocShell::Create, nsWebBrowser::Create, mozilla::dom::TabChild::Init, mozilla::dom::ContentChild::ProvideWindowCommon, mozilla::dom::TabChild::ProvideWindow, nsWindowWatcher::OpenWindowInternal, OpenWindow2
leak at PLDHashTable::Add, mozilla::net::nsLoadGroup::AddRequest, nsBaseChannel::AsyncOpen, AsyncOpen2, nsBaseChannel::AsyncOpen2, nsURILoader::OpenURI, nsDocShell::DoChannelLoad
leak at mozilla::dom::WebSocket::WebSocket, mozilla::dom::WebSocket::ConstructorCommon, mozilla::dom::WebSocket::Constructor, mozilla::dom::WebSocket_Binding::_constructor, CallJSNative, CallJSNativeConstructor, InternalConstruct
leak at mozilla::BasePrincipal::CreateCodebasePrincipal, mozilla::BasePrincipal::CreateCodebasePrincipal, nsScriptSecurityManager::GetChannelURIPrincipal, nsScriptSecurityManager::GetChannelResultPrincipal, nsIDocument::Reset, nsHTMLDocument::Reset, nsIDocument::StartDocumentLoad
leak at Create, nsAtomTable::Atomize, mozilla::BasePrincipal::FinishInit, mozilla::ContentPrincipal::Init, mozilla::BasePrincipal::CreateCodebasePrincipal, mozilla::BasePrincipal::CreateCodebasePrincipal, nsScriptSecurityManager::GetChannelURIPrincipal
leak at NewPage, Push, mozilla::EventQueue::PutEvent, Dispatch, mozilla::ThrottledEventQueue::Dispatch, DispatchToMainThread, mozilla::dom::WorkerControlRunnable::DispatchInternal
leak at nsSupportsWeakReference::GetWeakReference, NS_GetWeakReference, do_GetWeakReference, mozilla::net::nsLoadGroup::SetGroupObserver, NS_NewLoadGroup, nsDocLoader::Init, nsDocShell::Create
leak at mozilla::SchedulerGroup::CreateEventTargetFor, mozilla::SchedulerGroup::CreateEventTargets, mozilla::dom::TabGroup::TabGroup, mozilla::dom::nsIContentChild::GetConstructedEventTarget, mozilla::ipc::IToplevelProtocol::ToplevelState::GetMessageEventTarget, GetMessageEventTarget, mozilla::ipc::MessageChannel::MessageTask::Post
leak at Create, mozilla::ThrottledEventQueue::Create, mozilla::dom::WorkerPrivate::WorkerPrivate, mozilla::dom::WorkerPrivate::Constructor, mozilla::dom::Worker::Constructor, mozilla::dom::Worker_Binding::_constructor, CallJSNative
leak at Alloc, nsTSubstring, nsTSubstring, nsTSubstring, nsTString, InitRunnable, mozilla::dom::WebSocket::ConstructorCommon[

My reply there applies to this, too.

Flags: needinfo?(continuation)
Blocks: 1517309
Assignee: nobody → continuation
Priority: -- → P2
Whiteboard: [MemShrink] → [MemShrink][necko-triaged]
Whiteboard: [MemShrink][necko-triaged] → [MemShrink:P2][necko-triaged]
See Also: → 1516300

I added logging for the ctor and dtor of WebSocketImpl and then looked at what test the leaking web sockets were created during.

There are two tests:
/websockets/Create-on-worker-shutdown.any.html
/websockets/Create-on-worker-shutdown.any.worker.html

Web socket's RecvOnStop sticks a runnable into the channel queue via RunOrEnqueue. That calls into ResumeInternal() which tries to fire off a CompleteResumeRunnable on the worker to call CompleteResume(). However, the worker is at some step of being shut down, so that dispatch fails. That seems to mean that the ChannelEventQueue just gets stuck with the StopEvent in the queue.

I used DMD heap scan mode to see that there is a cycle of strong references from WebSocketChannelChild -> ChannelEventQueue -> StopEvent -> WebSocketChannelChild, so they all leak. The ChannelChild holds alive the WebSocketImpl.

It kind of seems like the ChannelEvent should be thrown out rather than leaked if the dispatch fails, but maybe in general that is a bad idea, because the ChannelEvent might not expect to be destroyed on another thread. Another solution might be to add a Cancel() method to ChannelEvent and call that if the dispatch fails. StopEvent could override that to clear mChild.

I hacked up a more local fix, which is just to make the mChild reference weak, thus preventing the cycle. That at least fixes the leak.

Component: DOM: Networking → Networking: WebSockets

I moved this from DOM: Networking to Networking: WebSockets because the issue is either in WebSocketChannelChild or ChannelEventQueue, in my opinion.

Honza, do you think I should fix this in web socket code, or would a more general fix in ChannelEventQueue make sense? RunOrEnqueue has a lot of callers, so I'm inclined to just fix the WebSocket code directly. Thanks.

Flags: needinfo?(honzab.moz)

There's a number of different ChannelEvents in this file that also hold strong references to the ChannelChild, so presumably those should also be fixed.

I looked at this some more, and it feels like a fix local to WebSocket is appropriate. For the non-WebSocket channel events, they keep only weak references to their channel (to prevent a cycle), and the ChannelEventQueue itself ensures that the channel stays alive as long as it needs to. That was added in bug 1371203.

For WebSocket, the strong references to the channel from the inner ChannelEvents were added in bug 1123021 (another security bug), but the use there was via WrappedChannelEvent::Run(), which is a use of the inner ChannelEvents via a wrapper that isn't associated with the ChannelEventQueue.

I think then it makes sense to change the inner ChannelEvents of WebSocket to a separate class. The Run() method of that will take a pointer to the channel, so the individual "ChannelEvents" won't have to manage the lifetime of these objects. WrappedChannelEvent can have a weak reference to the channel and WrappedChannelEvent can have a strong reference to the channel.

Flags: needinfo?(honzab.moz)
See Also: → 1524100
See Also: → 1523489

Later patches change the WrappedChannelEvent ctor, so I'm removing
this unused method to avoid having to fix it up.

In the next patch, I want to change the signature of Run(), so I need
to create a new base class for these inner WebSocket events.

For now, this class is the same as ChannelEvent, except that it does
not have the GetEventTarget method, which is never called.

Depends on D19585

This patch moves the channel pointer from the WebSocketEvents into the
classes that wrap them (EventTargetDispatcher and
WrappedWebSocketEvent) to fix leaks.

EventTargetDispatcher uses a weak pointer. This is needed to prevent a
leak if the ChannelEvent dispatch fails, because it would create a
cycle of strong references between the WebSocketEvent, the channel,
the channel event queue, and EventTargetDispatcher. It is safe because
the ChannelEventQueue ensures that the channel remains alive.

WrappedWebSocketEvent uses a strong pointer. This won't create a leak
because the runnable is not owned by the channel. It is needed for
safety because it can't rely on the ChannelEventQueue mechanism for
keeping the channel alive.

The WPT whitelisting moves them into two subdirectories that still
seem to leak sometimes, but the top level websockets/ directory seems
okay now.

Depends on D19586

Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f6fc796dca2
part 1 - Remove the unused WebSocketChannelChild::DispatchToTargetThread() method. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/57206fca017d
part 2 - Create and use a new WebSocketEvent base class instead of ChannelEvent. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/56f7fee6d139
part 3 - WebSocketEvent subclasses should not hold strong references to the channel. r=mayhemer

We should probably just let this ride the trains given where we are in the cycle.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: