RedirectSink cyclic reference doesn't get dropped
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
People
(Reporter: valentin, Assigned: valentin)
References
Details
(4 keywords, Whiteboard: [necko-triaged][necko-priority-queue])
Attachments
(6 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
In reply to Emilio Cobos Álvarez (:emilio) from bug 1658571 comment #98)
So I dug a bit more in order to try to fix this for good. The extra reference comes from here:
#0 mozilla::PreloaderBase::RedirectSink::AddRef() (this=0x7fe2300672e0) at /home/emilio/src/moz/gecko/uriloader/preload/PreloaderBase.cpp:56 #1 0x00007fe2589fb6d5 in nsCOMPtr<nsIInterfaceRequestor>::assign_with_AddRef(nsISupports*) (this=0x7fe230014c08, aRawPtr=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:1178 #2 nsCOMPtr<nsIInterfaceRequestor>::operator=(nsIInterfaceRequestor*) (this=0x7fe230014c08, aRhs=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:691 #3 0x00007fe2589f7060 in nsBaseChannel::SetNotificationCallbacks(nsIInterfaceRequestor*) (this=0x7fe230014ae0, aCallbacks=0x7fe2300672e0) at /home/emilio/src/moz/gecko/netwerk/base/nsBaseChannel.cpp:535 #4 0x00007fe258d998ea in mozilla::net::HttpBaseChannel::SetupReplacementChannel(nsIURI*, nsIChannel*, bool, unsigned int) (this=0x7fe231741830, newURI=0x7fe23004d8b0, newChannel=<optimized out>, preserveMethod=<optimized out>, redirectFlags=2) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpBaseChannel.cpp:3819 #5 0x00007fe258da349d in mozilla::net::HttpChannelChild::SetupRedirect(nsIURI*, mozilla::net::nsHttpResponseHead const*, unsigned int const&, nsIChannel**) (this=0x7fe231741800, uri=0x7fe23004d8b0, responseHead=0x7fe233c0d1e8, redirectFlags=@0x7fe233c0d09c: 2, outChannel=0x7ffc79011160) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1309 #6 0x00007fe258da3b10 in mozilla::net::HttpChannelChild::Redirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTSubstring<char> const&, unsigned long const&, mozilla::net::ResourceTimingStructArgs const&) (this=0x7fe231741800, registrarId=@0x7fe233c0d008: 52, newOriginalURI=..., newLoadFlags=@0x7fe233c0d098: 262144, redirectFlags=@0x7fe233c0d09c: 2, loadInfoForwarder=..., responseHead=..., securityInfoSerialization=..., channelId=@0x7fe233c0d2b0: 0, timing=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1358 #7 0x00007fe258db53f2 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30::operator()() const (this=<optimized out>) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1271 #8 std::__invoke_impl<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(std::__invoke_other, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__f=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60 #9 std::__invoke_r<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__fn=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:110 #10 std::_Function_handler<void (), mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291 #11 0x00007fe258d3ae5e in mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) (this=0x7fe235aac280, aCallback=<optimized out>, aAssertionWhenNotQueued=false) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/mozilla/net/ChannelEventQueue.h:240 #12 0x00007fe258da31b6 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&) (this=<optimized out>, aRegistrarId=<optimized out>, aNewUri=..., aNewLoadFlags=@0x7ffc79011540: 2030108672, aRedirectFlags=@0x7ffc79011450: 2030114864, aLoadInfoForwarder=..., aResponseHead=..., aSecurityInfoSerialization=<gNullChar> "", aChannelId=@0x7ffc79011808: 0, aOldPeerAddr=..., aTiming=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1267 #13 0x00007fe2591ae9af in mozilla::net::PHttpChannelChild::OnMessageReceived(IPC::Message const&) (this=<optimized out>, msg__=...) at PHttpChannelChild.cpp:644 #14 0x00007fe2590c91f0 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) (this=0x7fe2634ca020, msg__=...) at PContentChild.cpp:8656 #15 0x00007fe258fe7763 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) (this=0x7fe2634ca118, aProxy=0x7fe2449e5440, aMsg=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2150 #16 0x00007fe258fe6596 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) (this=0x7fe2634ca118, aMsg=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2074 #17 0x00007fe258fe6d29 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) (this=0x7fe2634ca118, aTask=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1922 #18 0x00007fe258fe7070 in mozilla::ipc::MessageChannel::MessageTask::Run() (this=0x7fe23e376890) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1953 #19 0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run() (this=0x7fe231779e40) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146 #20 0x00007fe2588f6180 in mozilla::RunnableTask::Run() (this=0x7fe23003dd00) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244 #21 0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514 #22 0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373 #23 0x00007fe2588db893 in mozilla::TaskController::ProcessPendingMTTask(bool) (this=0x7fe2438b9380, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:170 #24 0x00007fe2588eed0d in mozilla::TaskController::InitializeInternal()::$_0::operator()() const (this=<optimized out>) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:84 #25 mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() (this=<optimized out>) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsThreadUtils.h:577 #26 0x00007fe2588e57cf in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fe2438805f0, aMayWait=<optimized out>, aResult=0x7ffc79013cc7) at /home/emilio/src/moz/gecko/xpcom/threads/nsThread.cpp:1234 #27 0x00007fe2588e8b39 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fe2300672e0, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/nsThreadUtils.cpp:513 #28 0x00007fe258fe975a in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7fe2634ab8d0, aDelegate=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/glue/MessagePump.cpp:87 #29 0x00007fe258f704e3 in MessageLoop::RunInternal() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:334 #30 0x00007fe258f7043b in MessageLoop::RunHandler() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:327
We're setting up some kind of redirect via
HttpChannelChild::RecvRedirect1Begin
, so we end up inHttpBaseChannel::SetupReplacementChannel
with anewChannel
that is amozilla::net::nsInputStreamChannel
. The redirect process works out alright afaict (we properly go throughnsAsyncRedirectVerifyHelper
, callOnStopRequest
, etc).But then
RedirectSink::AsyncOnChannelRedirect
keeps the new channel alive (mRedirectChannel = aNewChannel
), and the newnsInputStreamChannel
never releases its callbacks, so we remain with a cyclic reference there. That cyclic reference is supposed to be cleaned up inOnRedirectResult
but that call never arrives. I think that's the underlying bug. The last point in the content process before OnStopRequest where we do something interesting with the channel is here, with this stack:1 0x00007fe258da57be in non-virtual thunk to mozilla::net::HttpChannelChild::OnRedirectVerifyCallback(nsresult) () at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsIChildChannel.h:29 #2 0x00007fe2589fcc49 in mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run (this=0x7fe2300f91c0) at /home/emilio/src/moz/gecko/netwerk/base/nsAsyncRedirectVerifyHelper.cpp:40 #3 0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run (this=0x7fe2300f9200) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146 #4 0x00007fe2588f6180 in mozilla::RunnableTask::Run (this=0x7fe2317a0900) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244 #5 0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514 #6 0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373
At this point I'm not familiar enough with necko to debug what is the right fix for that, I'll try to upload a trace to pernosco so that someone with more Necko experience can diagnose it, because as things are, we still leak the
RedirectSink
and thensInputStreamChannel
, as I understand it (we just don't leak the whole window).For reference (if all goes well and I can send the trace just fine, this is a ryzen machine so rr support is recent), the leaked Preloaders (without my patch) are:
PreloaderBase::NotifyOpen(0x7fe23002bc00, aChannel = 0x7fe231741080, aIsPreload = 0, sink = 0x7fe230067150, uri = https://t.channeladvisor.com/v2/12021789.js) PreloaderBase::NotifyOpen(0x7fe23002ec00, aChannel = 0x7fe231743080, aIsPreload = 0, sink = 0x7fe2300678d0, uri = https://www.googletagmanager.com/gtag/js?id=DC-4362844) PreloaderBase::NotifyOpen(0x7fe23002c400, aChannel = 0x7fe231741880, aIsPreload = 0, sink = 0x7fe2300672e0, uri = https://resources.xg4ken.com/js/v2/ktag.js?tid=KT-N4270-3EB) PreloaderBase::NotifyOpen(0x7fe23002d400, aChannel = 0x7fe231742880, aIsPreload = 0, sink = 0x7fe230067420, uri = https://connect.facebook.net/en_US/fbevents.js) PreloaderBase::NotifyOpen(0x7fe2359c0800, aChannel = 0x7fe235ddd080, aIsPreload = 0, sink = 0x7fe235dfbba0, uri = https://s2.go-mpulse.net/boomerang/UDCTN-B4SVJ-MYFJ5-V5Q9A-U84KB) PreloaderBase::NotifyOpen(0x7fe23593c000, aChannel = 0x7fe234561880, aIsPreload = 0, sink = 0x7fe23598c240, uri = https://recs.richrelevance.com/rrserver/p13n_generated.js?a=30280c406d639577&ts=1600969520980&v=1.2.6.20180926&ssl=t&pt=%7Chome_page.core_ol_rec_1%7Chome_page.core_ol_rec_2&u=55146031833627079542704514391466723154&s=551460318336270795427045143914667231542020824&cts=https%3A%2F%2Fwww.rei.com&l=1)
Comment 1•4 years ago
|
||
FWIW, the rr trace can't be processed by pernosco because of https://github.com/mozilla/rr/issues/2698.
If that's fixed I can try to upload it to pernosco again, but given this is 100% reproducible following the steps from bug 1658571 comment 74 debugging this locally is doable.
This is the tree I used to debug this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8acee1b1530e9d4df8e710d9e7728f90223c7912 (the dump
patch should apply cleanly I think, so that may be all you need).
And this is the script I used to analyze the output to know which objects leaked: https://gist.github.com/emilio/13242b4276b68240f19b1640eb5282be
Still let me know if you need to and I can try to record it on my intel laptop and upload it to pernosco or what not.
Comment 2•4 years ago
•
|
||
FWIW, the impact from bug 1658571 can be clearly observed on the telemetry graphs for the 95th percentile of the CYCLE_COLLECTOR_MAX_PAUSE probe (see bug 1677569): https://mzl.la/3lEiOn3
The numbers went up from ~40 to ~53 when bug 1618292 landed (May 11), and down to ~49 when bug 1658571 landed (Sept 24). So maybe the remaining ~9 come from this bug?
Comment 3•4 years ago
|
||
I've seen many performance bugs filed recently where the profiles were showing a lot of time in cycle collection in the content process. If there's a possibility that this bug is the cause for that, we should treat it with a very high priority.
Updated•4 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Assignee | ||
Comment 5•3 years ago
|
||
The test case where a global eventsink cancels the redirect will always reproduce the leak on ASAN builds.
As Emilio has pointed out, the problem seems to come from the fact that we never call OnRedirectResult when the redirection fails.
That seems to be because we just delete the new channel
I think we need some more tests for failure cases with IPC redirection.
The one I added just tests async failure from asyncOnChannelRedirect, but it would be nice to also check:
- synchronously throwing from asyncOnChannelRedirect in parent process
- synchronously throwing from asyncOnChannelRedirect in content process
- asyncly failing from asyncOnChannelRedirect in content process
Comment 6•3 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:valentin, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
This is just the test that triggers the issue.
I'm still working on a fix. My latest patch was causing a bunch of oranges on try.
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
Depends on D133646
Assignee | ||
Comment 9•3 years ago
|
||
Depends on D140567
Assignee | ||
Comment 10•3 years ago
|
||
Depends on D140568
Assignee | ||
Comment 11•3 years ago
|
||
test_redirect-caching_failure_wrap.js is still failing, because it hits the same codepath as the redirect veto.
We should have different IPDL messages for veto fail and asyncOpen fail.
Depends on D140569
Comment 12•2 years ago
|
||
The Performance Priority Calculator has determined this bug's performance priority to be P1.
Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on browser UI: Causes noticeable jank
Impact on site: Causes noticeable jank
[x] Causes severe resource usage
[x] Able to reproduce locally
[x] Bug affects multiple sites
[x] Multiple reporters
Assignee | ||
Comment 13•2 years ago
|
||
This is my main priority right now.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 14•2 years ago
|
||
Depends on D140569
Assignee | ||
Comment 15•2 years ago
|
||
Depends on D161432
Updated•2 years ago
|
Comment 16•2 years ago
|
||
Comment 17•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2f7be271681d
https://hg.mozilla.org/mozilla-central/rev/080ddc06d8b6
https://hg.mozilla.org/mozilla-central/rev/d98634b73083
https://hg.mozilla.org/mozilla-central/rev/d7c4c70b21f8
https://hg.mozilla.org/mozilla-central/rev/53960d81dacd
https://hg.mozilla.org/mozilla-central/rev/fdddcff65d09
Updated•2 years ago
|
Description
•