Closed Bug 1667316 Opened 4 years ago Closed 2 years ago

RedirectSink cyclic reference doesn't get dropped

Categories

(Core :: Networking: HTTP, defect, P1)

defect

Tracking

()

RESOLVED FIXED
109 Branch
Performance Impact high
Tracking Status
firefox-esr102 --- wontfix
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 --- fixed

People

(Reporter: valentin, Assigned: valentin)

References

Details

(4 keywords, Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(6 files, 1 obsolete file)

In reply to Emilio Cobos Álvarez (:emilio) from bug 1658571 comment #98)

So I dug a bit more in order to try to fix this for good. The extra reference comes from here:

#0  mozilla::PreloaderBase::RedirectSink::AddRef() (this=0x7fe2300672e0) at /home/emilio/src/moz/gecko/uriloader/preload/PreloaderBase.cpp:56
#1  0x00007fe2589fb6d5 in nsCOMPtr<nsIInterfaceRequestor>::assign_with_AddRef(nsISupports*) (this=0x7fe230014c08, aRawPtr=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:1178
#2  nsCOMPtr<nsIInterfaceRequestor>::operator=(nsIInterfaceRequestor*) (this=0x7fe230014c08, aRhs=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:691
#3  0x00007fe2589f7060 in nsBaseChannel::SetNotificationCallbacks(nsIInterfaceRequestor*) (this=0x7fe230014ae0, aCallbacks=0x7fe2300672e0) at /home/emilio/src/moz/gecko/netwerk/base/nsBaseChannel.cpp:535
#4  0x00007fe258d998ea in mozilla::net::HttpBaseChannel::SetupReplacementChannel(nsIURI*, nsIChannel*, bool, unsigned int)
    (this=0x7fe231741830, newURI=0x7fe23004d8b0, newChannel=<optimized out>, preserveMethod=<optimized out>, redirectFlags=2) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpBaseChannel.cpp:3819
#5  0x00007fe258da349d in mozilla::net::HttpChannelChild::SetupRedirect(nsIURI*, mozilla::net::nsHttpResponseHead const*, unsigned int const&, nsIChannel**)
    (this=0x7fe231741800, uri=0x7fe23004d8b0, responseHead=0x7fe233c0d1e8, redirectFlags=@0x7fe233c0d09c: 2, outChannel=0x7ffc79011160) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1309
#6  0x00007fe258da3b10 in mozilla::net::HttpChannelChild::Redirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTSubstring<char> const&, unsigned long const&, mozilla::net::ResourceTimingStructArgs const&)
    (this=0x7fe231741800, registrarId=@0x7fe233c0d008: 52, newOriginalURI=..., newLoadFlags=@0x7fe233c0d098: 262144, redirectFlags=@0x7fe233c0d09c: 2, loadInfoForwarder=..., responseHead=..., securityInfoSerialization=..., channelId=@0x7fe233c0d2b0: 0, timing=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1358
#7  0x00007fe258db53f2 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30::operator()() const (this=<optimized out>)
    at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1271
#8  std::__invoke_impl<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(std::__invoke_other, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__f=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60
#9  std::__invoke_r<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__fn=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:110
#10 std::_Function_handler<void (), mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30>::_M_invoke(std::_Any_data const&) (__functor=...)
    at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291
#11 0x00007fe258d3ae5e in mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) (this=0x7fe235aac280, aCallback=<optimized out>, aAssertionWhenNotQueued=false)
    at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/mozilla/net/ChannelEventQueue.h:240
#12 0x00007fe258da31b6 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)
    (this=<optimized out>, aRegistrarId=<optimized out>, aNewUri=..., aNewLoadFlags=@0x7ffc79011540: 2030108672, aRedirectFlags=@0x7ffc79011450: 2030114864, aLoadInfoForwarder=..., aResponseHead=..., aSecurityInfoSerialization=<gNullChar> "", aChannelId=@0x7ffc79011808: 0, aOldPeerAddr=..., aTiming=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1267
#13 0x00007fe2591ae9af in mozilla::net::PHttpChannelChild::OnMessageReceived(IPC::Message const&) (this=<optimized out>, msg__=...) at PHttpChannelChild.cpp:644
#14 0x00007fe2590c91f0 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) (this=0x7fe2634ca020, msg__=...) at PContentChild.cpp:8656
#15 0x00007fe258fe7763 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) (this=0x7fe2634ca118, aProxy=0x7fe2449e5440, aMsg=...)
    at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2150
#16 0x00007fe258fe6596 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) (this=0x7fe2634ca118, aMsg=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2074
#17 0x00007fe258fe6d29 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) (this=0x7fe2634ca118, aTask=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1922
#18 0x00007fe258fe7070 in mozilla::ipc::MessageChannel::MessageTask::Run() (this=0x7fe23e376890) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1953
#19 0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run() (this=0x7fe231779e40) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146
#20 0x00007fe2588f6180 in mozilla::RunnableTask::Run() (this=0x7fe23003dd00) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244
#21 0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...)
    at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514
#22 0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...)
    at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373
#23 0x00007fe2588db893 in mozilla::TaskController::ProcessPendingMTTask(bool) (this=0x7fe2438b9380, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:170
#24 0x00007fe2588eed0d in mozilla::TaskController::InitializeInternal()::$_0::operator()() const (this=<optimized out>) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:84
#25 mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() (this=<optimized out>) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsThreadUtils.h:577
#26 0x00007fe2588e57cf in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fe2438805f0, aMayWait=<optimized out>, aResult=0x7ffc79013cc7) at /home/emilio/src/moz/gecko/xpcom/threads/nsThread.cpp:1234
#27 0x00007fe2588e8b39 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fe2300672e0, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/nsThreadUtils.cpp:513
#28 0x00007fe258fe975a in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7fe2634ab8d0, aDelegate=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/glue/MessagePump.cpp:87
#29 0x00007fe258f704e3 in MessageLoop::RunInternal() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:334
#30 0x00007fe258f7043b in MessageLoop::RunHandler() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:327

We're setting up some kind of redirect via HttpChannelChild::RecvRedirect1Begin, so we end up in HttpBaseChannel::SetupReplacementChannel with a newChannel that is a mozilla::net::nsInputStreamChannel. The redirect process works out alright afaict (we properly go through nsAsyncRedirectVerifyHelper, call OnStopRequest, etc).

But then RedirectSink::AsyncOnChannelRedirect keeps the new channel alive (mRedirectChannel = aNewChannel), and the new nsInputStreamChannel never releases its callbacks, so we remain with a cyclic reference there. That cyclic reference is supposed to be cleaned up in OnRedirectResult but that call never arrives. I think that's the underlying bug. The last point in the content process before OnStopRequest where we do something interesting with the channel is here, with this stack:

1  0x00007fe258da57be in non-virtual thunk to mozilla::net::HttpChannelChild::OnRedirectVerifyCallback(nsresult) () at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsIChildChannel.h:29
#2  0x00007fe2589fcc49 in mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run (this=0x7fe2300f91c0) at /home/emilio/src/moz/gecko/netwerk/base/nsAsyncRedirectVerifyHelper.cpp:40
#3  0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run (this=0x7fe2300f9200) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146
#4  0x00007fe2588f6180 in mozilla::RunnableTask::Run (this=0x7fe2317a0900) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244
#5  0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514
#6  0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373

At this point I'm not familiar enough with necko to debug what is the right fix for that, I'll try to upload a trace to pernosco so that someone with more Necko experience can diagnose it, because as things are, we still leak the RedirectSink and the nsInputStreamChannel, as I understand it (we just don't leak the whole window).

For reference (if all goes well and I can send the trace just fine, this is a ryzen machine so rr support is recent), the leaked Preloaders (without my patch) are:

PreloaderBase::NotifyOpen(0x7fe23002bc00, aChannel = 0x7fe231741080, aIsPreload = 0, sink = 0x7fe230067150, uri = https://t.channeladvisor.com/v2/12021789.js)
PreloaderBase::NotifyOpen(0x7fe23002ec00, aChannel = 0x7fe231743080, aIsPreload = 0, sink = 0x7fe2300678d0, uri = https://www.googletagmanager.com/gtag/js?id=DC-4362844)
PreloaderBase::NotifyOpen(0x7fe23002c400, aChannel = 0x7fe231741880, aIsPreload = 0, sink = 0x7fe2300672e0, uri = https://resources.xg4ken.com/js/v2/ktag.js?tid=KT-N4270-3EB)
PreloaderBase::NotifyOpen(0x7fe23002d400, aChannel = 0x7fe231742880, aIsPreload = 0, sink = 0x7fe230067420, uri = https://connect.facebook.net/en_US/fbevents.js)
PreloaderBase::NotifyOpen(0x7fe2359c0800, aChannel = 0x7fe235ddd080, aIsPreload = 0, sink = 0x7fe235dfbba0, uri = https://s2.go-mpulse.net/boomerang/UDCTN-B4SVJ-MYFJ5-V5Q9A-U84KB)
PreloaderBase::NotifyOpen(0x7fe23593c000, aChannel = 0x7fe234561880, aIsPreload = 0, sink = 0x7fe23598c240, uri = https://recs.richrelevance.com/rrserver/p13n_generated.js?a=30280c406d639577&ts=1600969520980&v=1.2.6.20180926&ssl=t&pt=%7Chome_page.core_ol_rec_1%7Chome_page.core_ol_rec_2&u=55146031833627079542704514391466723154&s=551460318336270795427045143914667231542020824&cts=https%3A%2F%2Fwww.rei.com&l=1)

FWIW, the rr trace can't be processed by pernosco because of https://github.com/mozilla/rr/issues/2698.

If that's fixed I can try to upload it to pernosco again, but given this is 100% reproducible following the steps from bug 1658571 comment 74 debugging this locally is doable.

This is the tree I used to debug this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8acee1b1530e9d4df8e710d9e7728f90223c7912 (the dump patch should apply cleanly I think, so that may be all you need).

And this is the script I used to analyze the output to know which objects leaked: https://gist.github.com/emilio/13242b4276b68240f19b1640eb5282be

Still let me know if you need to and I can try to record it on my intel laptop and upload it to pernosco or what not.

Flags: needinfo?(valentin.gosu)

FWIW, the impact from bug 1658571 can be clearly observed on the telemetry graphs for the 95th percentile of the CYCLE_COLLECTOR_MAX_PAUSE probe (see bug 1677569): https://mzl.la/3lEiOn3
The numbers went up from ~40 to ~53 when bug 1618292 landed (May 11), and down to ~49 when bug 1658571 landed (Sept 24). So maybe the remaining ~9 come from this bug?

I've seen many performance bugs filed recently where the profiles were showing a lot of time in cycle collection in the content process. If there's a possibility that this bug is the cause for that, we should treat it with a very high priority.

Whiteboard: [necko-triaged] → [necko-triaged][qf]
Whiteboard: [necko-triaged][qf] → [necko-triaged][qf:p1:responsiveness]
Blocks: 1677569

The test case where a global eventsink cancels the redirect will always reproduce the leak on ASAN builds.
As Emilio has pointed out, the problem seems to come from the fact that we never call OnRedirectResult when the redirection fails.
That seems to be because we just delete the new channel

I think we need some more tests for failure cases with IPC redirection.
The one I added just tests async failure from asyncOnChannelRedirect, but it would be nice to also check:

  • synchronously throwing from asyncOnChannelRedirect in parent process
  • synchronously throwing from asyncOnChannelRedirect in content process
  • asyncly failing from asyncOnChannelRedirect in content process
Flags: needinfo?(valentin.gosu)

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:valentin, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(kershaw)
Flags: needinfo?(kershaw)

This is just the test that triggers the issue.
I'm still working on a fix. My latest patch was causing a bunch of oranges on try.

Flags: needinfo?(valentin.gosu)
Performance Impact: --- → P1
Whiteboard: [necko-triaged][qf:p1:responsiveness] → [necko-triaged]
Attachment #9255049 - Attachment description: Bug 1667316 - Test case r=#necko → WIP: Bug 1667316 - Test case r=#necko

test_redirect-caching_failure_wrap.js is still failing, because it hits the same codepath as the redirect veto.
We should have different IPDL messages for veto fail and asyncOpen fail.

Depends on D140569

The Performance Priority Calculator has determined this bug's performance priority to be P1.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on browser UI: Causes noticeable jank
Impact on site: Causes noticeable jank
[x] Causes severe resource usage
[x] Able to reproduce locally
[x] Bug affects multiple sites
[x] Multiple reporters

This is my main priority right now.

Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]
Attachment #9255049 - Attachment description: WIP: Bug 1667316 - Test case r=#necko → Bug 1667316 - Test case r=#necko
Attachment #9266911 - Attachment description: WIP: Bug 1667316 - Unit test for veto-ing redirects r=#necko → Bug 1667316 - Unit test for veto-ing redirects r=#necko
Attachment #9266912 - Attachment description: WIP: Bug 1667316 - Make vetonotifier report rv r=#necko → Bug 1667316 - Make vetonotifier report rv r=#necko
Attachment #9266913 - Attachment description: WIP: Bug 1667316 - Make sure to also call OnRedirectResult for failed redirects r=#necko → Bug 1667316 - Pass nsresult status to OnRedirectResult r=#necko
Attachment #9266914 - Attachment is obsolete: true
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2f7be271681d
Test case r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/080ddc06d8b6
Unit test for veto-ing redirects r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/d98634b73083
Make vetonotifier report rv r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/d7c4c70b21f8
Pass nsresult status to OnRedirectResult r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/53960d81dacd
Also call child redirect vetoHooks for failed redirects r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/fdddcff65d09
Cleanup redirect handling debug logs r=necko-reviewers,jesup
Regressions: 1800884
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: