Open Bug 1667316 Opened 1 year ago Updated 5 months ago

RedirectSink cyclic reference doesn't get dropped

Categories

(Core :: Networking: HTTP, defect, P1)

defect

Tracking

()

People

(Reporter: valentin, Assigned: valentin, NeedInfo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][qf:p1:responsiveness])

In reply to Emilio Cobos Álvarez (:emilio) from bug 1658571 comment #98)

So I dug a bit more in order to try to fix this for good. The extra reference comes from here:

#0  mozilla::PreloaderBase::RedirectSink::AddRef() (this=0x7fe2300672e0) at /home/emilio/src/moz/gecko/uriloader/preload/PreloaderBase.cpp:56
#1  0x00007fe2589fb6d5 in nsCOMPtr<nsIInterfaceRequestor>::assign_with_AddRef(nsISupports*) (this=0x7fe230014c08, aRawPtr=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:1178
#2  nsCOMPtr<nsIInterfaceRequestor>::operator=(nsIInterfaceRequestor*) (this=0x7fe230014c08, aRhs=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:691
#3  0x00007fe2589f7060 in nsBaseChannel::SetNotificationCallbacks(nsIInterfaceRequestor*) (this=0x7fe230014ae0, aCallbacks=0x7fe2300672e0) at /home/emilio/src/moz/gecko/netwerk/base/nsBaseChannel.cpp:535
#4  0x00007fe258d998ea in mozilla::net::HttpBaseChannel::SetupReplacementChannel(nsIURI*, nsIChannel*, bool, unsigned int)
    (this=0x7fe231741830, newURI=0x7fe23004d8b0, newChannel=<optimized out>, preserveMethod=<optimized out>, redirectFlags=2) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpBaseChannel.cpp:3819
#5  0x00007fe258da349d in mozilla::net::HttpChannelChild::SetupRedirect(nsIURI*, mozilla::net::nsHttpResponseHead const*, unsigned int const&, nsIChannel**)
    (this=0x7fe231741800, uri=0x7fe23004d8b0, responseHead=0x7fe233c0d1e8, redirectFlags=@0x7fe233c0d09c: 2, outChannel=0x7ffc79011160) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1309
#6  0x00007fe258da3b10 in mozilla::net::HttpChannelChild::Redirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTSubstring<char> const&, unsigned long const&, mozilla::net::ResourceTimingStructArgs const&)
    (this=0x7fe231741800, registrarId=@0x7fe233c0d008: 52, newOriginalURI=..., newLoadFlags=@0x7fe233c0d098: 262144, redirectFlags=@0x7fe233c0d09c: 2, loadInfoForwarder=..., responseHead=..., securityInfoSerialization=..., channelId=@0x7fe233c0d2b0: 0, timing=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1358
#7  0x00007fe258db53f2 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30::operator()() const (this=<optimized out>)
    at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1271
#8  std::__invoke_impl<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(std::__invoke_other, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__f=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60
#9  std::__invoke_r<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__fn=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:110
#10 std::_Function_handler<void (), mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30>::_M_invoke(std::_Any_data const&) (__functor=...)
    at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291
#11 0x00007fe258d3ae5e in mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) (this=0x7fe235aac280, aCallback=<optimized out>, aAssertionWhenNotQueued=false)
    at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/mozilla/net/ChannelEventQueue.h:240
#12 0x00007fe258da31b6 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)
    (this=<optimized out>, aRegistrarId=<optimized out>, aNewUri=..., aNewLoadFlags=@0x7ffc79011540: 2030108672, aRedirectFlags=@0x7ffc79011450: 2030114864, aLoadInfoForwarder=..., aResponseHead=..., aSecurityInfoSerialization=<gNullChar> "", aChannelId=@0x7ffc79011808: 0, aOldPeerAddr=..., aTiming=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1267
#13 0x00007fe2591ae9af in mozilla::net::PHttpChannelChild::OnMessageReceived(IPC::Message const&) (this=<optimized out>, msg__=...) at PHttpChannelChild.cpp:644
#14 0x00007fe2590c91f0 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) (this=0x7fe2634ca020, msg__=...) at PContentChild.cpp:8656
#15 0x00007fe258fe7763 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) (this=0x7fe2634ca118, aProxy=0x7fe2449e5440, aMsg=...)
    at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2150
#16 0x00007fe258fe6596 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) (this=0x7fe2634ca118, aMsg=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2074
#17 0x00007fe258fe6d29 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) (this=0x7fe2634ca118, aTask=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1922
#18 0x00007fe258fe7070 in mozilla::ipc::MessageChannel::MessageTask::Run() (this=0x7fe23e376890) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1953
#19 0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run() (this=0x7fe231779e40) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146
#20 0x00007fe2588f6180 in mozilla::RunnableTask::Run() (this=0x7fe23003dd00) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244
#21 0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...)
    at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514
#22 0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...)
    at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373
#23 0x00007fe2588db893 in mozilla::TaskController::ProcessPendingMTTask(bool) (this=0x7fe2438b9380, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:170
#24 0x00007fe2588eed0d in mozilla::TaskController::InitializeInternal()::$_0::operator()() const (this=<optimized out>) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:84
#25 mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() (this=<optimized out>) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsThreadUtils.h:577
#26 0x00007fe2588e57cf in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fe2438805f0, aMayWait=<optimized out>, aResult=0x7ffc79013cc7) at /home/emilio/src/moz/gecko/xpcom/threads/nsThread.cpp:1234
#27 0x00007fe2588e8b39 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fe2300672e0, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/nsThreadUtils.cpp:513
#28 0x00007fe258fe975a in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7fe2634ab8d0, aDelegate=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/glue/MessagePump.cpp:87
#29 0x00007fe258f704e3 in MessageLoop::RunInternal() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:334
#30 0x00007fe258f7043b in MessageLoop::RunHandler() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:327

We're setting up some kind of redirect via HttpChannelChild::RecvRedirect1Begin, so we end up in HttpBaseChannel::SetupReplacementChannel with a newChannel that is a mozilla::net::nsInputStreamChannel. The redirect process works out alright afaict (we properly go through nsAsyncRedirectVerifyHelper, call OnStopRequest, etc).

But then RedirectSink::AsyncOnChannelRedirect keeps the new channel alive (mRedirectChannel = aNewChannel), and the new nsInputStreamChannel never releases its callbacks, so we remain with a cyclic reference there. That cyclic reference is supposed to be cleaned up in OnRedirectResult but that call never arrives. I think that's the underlying bug. The last point in the content process before OnStopRequest where we do something interesting with the channel is here, with this stack:

1  0x00007fe258da57be in non-virtual thunk to mozilla::net::HttpChannelChild::OnRedirectVerifyCallback(nsresult) () at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsIChildChannel.h:29
#2  0x00007fe2589fcc49 in mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run (this=0x7fe2300f91c0) at /home/emilio/src/moz/gecko/netwerk/base/nsAsyncRedirectVerifyHelper.cpp:40
#3  0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run (this=0x7fe2300f9200) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146
#4  0x00007fe2588f6180 in mozilla::RunnableTask::Run (this=0x7fe2317a0900) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244
#5  0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514
#6  0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373

At this point I'm not familiar enough with necko to debug what is the right fix for that, I'll try to upload a trace to pernosco so that someone with more Necko experience can diagnose it, because as things are, we still leak the RedirectSink and the nsInputStreamChannel, as I understand it (we just don't leak the whole window).

For reference (if all goes well and I can send the trace just fine, this is a ryzen machine so rr support is recent), the leaked Preloaders (without my patch) are:

PreloaderBase::NotifyOpen(0x7fe23002bc00, aChannel = 0x7fe231741080, aIsPreload = 0, sink = 0x7fe230067150, uri = https://t.channeladvisor.com/v2/12021789.js)
PreloaderBase::NotifyOpen(0x7fe23002ec00, aChannel = 0x7fe231743080, aIsPreload = 0, sink = 0x7fe2300678d0, uri = https://www.googletagmanager.com/gtag/js?id=DC-4362844)
PreloaderBase::NotifyOpen(0x7fe23002c400, aChannel = 0x7fe231741880, aIsPreload = 0, sink = 0x7fe2300672e0, uri = https://resources.xg4ken.com/js/v2/ktag.js?tid=KT-N4270-3EB)
PreloaderBase::NotifyOpen(0x7fe23002d400, aChannel = 0x7fe231742880, aIsPreload = 0, sink = 0x7fe230067420, uri = https://connect.facebook.net/en_US/fbevents.js)
PreloaderBase::NotifyOpen(0x7fe2359c0800, aChannel = 0x7fe235ddd080, aIsPreload = 0, sink = 0x7fe235dfbba0, uri = https://s2.go-mpulse.net/boomerang/UDCTN-B4SVJ-MYFJ5-V5Q9A-U84KB)
PreloaderBase::NotifyOpen(0x7fe23593c000, aChannel = 0x7fe234561880, aIsPreload = 0, sink = 0x7fe23598c240, uri = https://recs.richrelevance.com/rrserver/p13n_generated.js?a=30280c406d639577&ts=1600969520980&v=1.2.6.20180926&ssl=t&pt=%7Chome_page.core_ol_rec_1%7Chome_page.core_ol_rec_2&u=55146031833627079542704514391466723154&s=551460318336270795427045143914667231542020824&cts=https%3A%2F%2Fwww.rei.com&l=1)

FWIW, the rr trace can't be processed by pernosco because of https://github.com/mozilla/rr/issues/2698.

If that's fixed I can try to upload it to pernosco again, but given this is 100% reproducible following the steps from bug 1658571 comment 74 debugging this locally is doable.

This is the tree I used to debug this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8acee1b1530e9d4df8e710d9e7728f90223c7912 (the dump patch should apply cleanly I think, so that may be all you need).

And this is the script I used to analyze the output to know which objects leaked: https://gist.github.com/emilio/13242b4276b68240f19b1640eb5282be

Still let me know if you need to and I can try to record it on my intel laptop and upload it to pernosco or what not.

Flags: needinfo?(valentin.gosu)

FWIW, the impact from bug 1658571 can be clearly observed on the telemetry graphs for the 95th percentile of the CYCLE_COLLECTOR_MAX_PAUSE probe (see bug 1677569): https://mzl.la/3lEiOn3
The numbers went up from ~40 to ~53 when bug 1618292 landed (May 11), and down to ~49 when bug 1658571 landed (Sept 24). So maybe the remaining ~9 come from this bug?

I've seen many performance bugs filed recently where the profiles were showing a lot of time in cycle collection in the content process. If there's a possibility that this bug is the cause for that, we should treat it with a very high priority.

Whiteboard: [necko-triaged] → [necko-triaged][qf]
Whiteboard: [necko-triaged][qf] → [necko-triaged][qf:p1:responsiveness]
Blocks: 1677569
You need to log in before you can comment on or make changes to this bug.