Fix tests that regressed when browser.tabs.documentchannel.parent-controlled is enabled.
Categories
(Core :: DOM: Navigation, task, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox97 | --- | fixed |
People
(Reporter: annyG, Assigned: annyG)
References
(Blocks 1 open bug)
Details
Attachments
(13 files, 2 obsolete files)
50.02 KB,
image/png
|
Details | |
130.54 KB,
image/png
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Assignee | ||
Comment 1•3 years ago
•
|
||
I have a patch for bug 1720958 and with the pref enabled, there are a bunch of tests failing. I need to audit the test failures and see which failures simply require test changes or small fixes in existing parent-initiated-load codepath (I can fix them in this bug), and which ones will be fixed once we implement parent-initiated CanonicalBrowsingContext::Reload (bug 1672589) , as well as parent initiated Forward/Back/Go methods (more here).
From my understanding, when Matt was trying to land the parent controlled loads there was only one (geckoview) test that was failing because it started a load and then right away issued a reload. As per convo with Matt "If the load request is handled in the parent, but the reload is forward onto whatever previous content process existed, then the reload finds about:blank and doesn't do anything. This can't really happen on desktop (since the combined stop/reload button will be showing stop, not reload during that time), but it's possible extensions or something might have this issue"
Assignee | ||
Comment 2•3 years ago
|
||
I'm seeing a pattern in a lot of tests that are failing where they call BrowserTestUtils.addTab
with no url, don't wait for anything to finish loading, and then load a specific uri in the same tab using BrowserTestUtils.loadURI
. Because the loadURI call now can be initiated directly from the parent process, without doing a round-trip to the content process, the load now starts earlier. Then, it gets cancelled by the about:blank load that was initiated due to addTab
call. To resolve this issue, we could fix all tests that are failing by passing {skipLoad: true}
as a parameter, so that there is no initial about:blank load and add a comment to BrowserTestUtils.addTab
warning users of the API about this, something along the lines of "If you are planning to load a different uri in this tab shortly after calling this function, you should wait for the initial load in this tab to finish or pass skipLoad: true
as a parameter".
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
As per convo in the Monday Fission meeting (July 19), changing tests to wait for navigation to complete after calling addTab is not ideal, and we should aim to keep the ordering of navigations consistent. The only reason the about:blank load ended up cancelling the loadURI load is because the blank load was content initiated, and the other load was parent initiated.
As per convo with nika today, we could implement a (temporary) epoch system to help order navigations consistently.
Assignee | ||
Comment 4•3 years ago
|
||
I'm currently debugging some test failures here.
Specifically, browser/components/preferences/tests/browser_open_download_preferences.js
test.
Looks like it's timing out here because we never get a load event. With the pref enabled, the load event is dispatched from nsDocumentViewer::LoadComplete
, with a stack trace that looks like this
#1 0x00007fb264fea63c in nsDocShell::EndPageLoad (this=0x7fb24a458c00, aProgress=0x7fb24a458c28, aChannel=0x7fb26ea9adf0, aStatus=nsresult::NS_OK)
at /home/annyg/central/docshell/base/nsDocShell.cpp:6284
#2 0x00007fb264fea116 in nsDocShell::OnStateChange (this=0x7fb24a458c00, aProgress=0x7fb24a458c28, aRequest=0x7fb26ea9adf0, aStateFlags=131088, aStatus=nsresult::NS_OK)
at /home/annyg/central/docshell/base/nsDocShell.cpp:5674
#3 0x00007fb25ca6053c in nsDocLoader::DoFireOnStateChange (this=0x7fb24a458c00, aProgress=0x7fb24a458c28, aRequest=0x7fb26ea9adf0, aStateFlags=@0x7ffd9ed8cb6c: 131088,
aStatus=nsresult::NS_OK) at /home/annyg/central/uriloader/base/nsDocLoader.cpp:1371
#4 0x00007fb25ca5fda5 in nsDocLoader::doStopDocumentLoad (this=0x7fb24a458c00, request=0x7fb26ea9adf0, aStatus=nsresult::NS_OK)
at /home/annyg/central/uriloader/base/nsDocLoader.cpp:969
#5 0x00007fb25ca5de66 in nsDocLoader::DocLoaderIsEmpty (this=0x7fb24a458c00, aFlushLayout=true, aOverrideStatus=...) at /home/annyg/central/uriloader/base/nsDocLoader.cpp:787
#6 0x00007fb25ca5f679 in nsDocLoader::OnStopRequest (this=0x7fb24a458c00, aRequest=0x7fb26ea9adf0, aStatus=-2142568444) at /home/annyg/central/uriloader/base/nsDocLoader.cpp:670
#7 0x00007fb26500c67c in nsDocShell::OnStopRequest (this=0x7fb24a458c00, aRequest=0x7fb26ea9adf0, aStatusCode=-2142568444)
at /home/annyg/central/docshell/base/nsDocShell.cpp:13440
#8 0x00007fb25b19f871 in mozilla::net::nsLoadGroup::NotifyRemovalObservers (this=0x7fb26ea3c1f0, request=0x7fb26ea9adf0, aStatus=-2142568444)
at /home/annyg/central/netwerk/base/nsLoadGroup.cpp:614
#9 0x00007fb25b1a0e2f in mozilla::net::nsLoadGroup::RemoveRequest
I don't think we have a load group in the parent process (I have to confirm this), which I think is the reason why we don't emit the load event. To emit a load event, we would have to synthesize an onStateChange(STATE_STOP) event and dispatch it to the docshell in the content process. Matt previously mentioned that this work would have to be done, in bug 1647554.
Comment 5•3 years ago
|
||
I guess this load (a pdf file) is just being handled entirely in the parent-process with the pref enabled?
The current behaviour (if I understand correctly) is that the .pdf URI is sent to the content process and creates a DocumentChannel. This handles the download in the parent, and cancels the DocumentChannel with NS_BINDING_REDIRECTED/-2142568444. This was the only request in the LoadGroup in the content process, so we then fire the load event (to the initial about:blank Document?), and BrowserTestUtils serializes it up to the parent for the test to see.
That seems like really weird behaviour that we fire a load event even though we didn't really load anything, and we wouldn't have fired a load event for the initial about:blank otherwise.
Is it possible for this test to wait for STATE_STOP instead of the load event?
Assignee | ||
Comment 6•3 years ago
|
||
(I would have messaged you privately to ask about this but it's nice to have this conversation stashed away under the relevant bugs so I can easily refer to it in future).
That's true, currently we do fire a load event for the initial about:blank document in this test. Could you elaborate on how it is a weird behaviour? If so, I'm confused why that's the current behaviour, unless it's a bug?
Comment 7•3 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #5)
That seems like really weird behaviour that we fire a load event even though we didn't really load anything, and we wouldn't have fired a load event for the initial about:blank otherwise.
It is weird behavior. Unfortunately, it's down to the way that we handle load groups when navigating away from initial about:blank documents, which I also ran into in bug 1696473. The whole DocLoaderIsEmpty setup basically needs to be completely rewritten if we want to do anything remotely sane there...
Comment 8•3 years ago
|
||
Right, so with the new behaviour we don't expose the download URI to the content process at all, and we don't get that load event.
I can think of at least 3 options:
- Send a message to the content process notifying it that it needs to emit the load event for about:blank
- Try to fix the underlying issue and never fire these extra load events
- Change the test (probably a requisite part of 2 as well).
openNewForegroundTab takes an waitForStateStop options which lets it wait on the nsIWebProgressListener STATE_STOP event instead, which might be sufficient.
Assignee | ||
Comment 9•3 years ago
|
||
I decided to go with changing the test, we can look into fixing the issue in follow up work, or re-evaluate it a bit later depending on what else I have to do in this bug.
Assignee | ||
Comment 10•3 years ago
•
|
||
Currently, I'm looking at failures for browser/components/urlbar/tests/browser/browser_stop_pending.js.
In the test, we first load example.com in a tab. Then, once the page has fully loaded, in the same tab we set the value of gURLBar to SLOW_PAGE here, the request for which takes a really long time to complete (and never completes, explained later). Then after a 200ms timeout, we set the value of gURLBar to SLOW_PAGE2 here. The test tries to assert that cancelling of the old load due to the current load does not reset the value of the url bar to the very first page we loaded - example.com.
With the pref enabled and my other changes applied, the behaviour changes and the value of the url bar changes to example.com (despite the fact that we have another load happening), which I can also confirm visually when I run the test is headed mode.
This is happening because in TabProgressListener::onStateChange, upon receiving a STATE_STOP event with a status code that indicates an error and if we see that the browser's isNavigating
value is false, we set the value of the url bar to browser's current URI (in our case example.com). So how does this happen?
When we start the load for SLOW_PAGE2, we set the value of isNavigating to true, then we cancel the other load (SLOW_PAGE), and set isNavigating to false. Cancelling SLOW_PAGE load results in a STATE_STOP event being fired, and that's where we decide to reset the value of the url bar.
Without the pref enabled, the timing of when the STATE_STOP gets fired and when the isNavigating
is different, and thus we don't set the url bar to the previous value. (This part of the explanation will be a bit blurry) The value of isNavigating
is set due to an IPC call, which arrives later, and when STATE_STOP event is received, we don't change the value of the url bar because isNavigating is set to true.
I came up with two solutions
-
we could look into setting isNavigating to false a bit later, not right after we are done executing the first method of loading a uri (as per description) but when the navigation itself completes, and do so while making sure that only current loads can set this value to false. This would require making sure that we set this value to false around the same time both with fission and without, and with this pref, and without this pref.
-
given that we have been considering getting rid of
isNavigating
field, we could go about this differently. In the cases when a load is cancelled due to another one starting, e.g. reload, going forward/back, going to index, etc, we could cancel the load using a distinct error code. Then, in TabProgressListener::onStateChange event, we could look at the status code and before resetting the url bar, check that the status code does not match that distinct error code (instead of checking the browser'sisNavigating
value)
Solution 2 seems like a more reasonable solution (I mentioned it in a meeting to kmag, who agreed), because hopefully it will also allow us to remove isNavigating
field, so this is what i'm going with. I will need to add more tests to make sure that the url bar does not reset in the aforementioned cases, e.g. reload, going forward/back, going to index. And most importantly, I will need to make sure that I audit other uses of isNavigating and look for a work around (perhaps this could be done in a follow up, and not as part of this work?) For now we could just get rid of the usage of isNavigating
in TabProgressListener::onStateChange
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 11•3 years ago
|
||
Since I'm currently debugging tests from https://treeherder.mozilla.org/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Cusercancel%2Crunnable&revision=bbed78bd145d946a580ac0ac5a427d70a36d9a1b&selectedTaskRun=S2Gl4oaGTTmi4V6plHiFSQ.0 and looking towards remaining tasks that Matt filed for bug 1647550 as a guide for might need to get done (since things have changed a lot in the dom code as fission progressed), I figured I'll update my progress and add what tests remain to be investigated:
(fission, linux) toolkit/components/passwordmgr/test/browser/browser_context_menu.js
(fission, linux) browser/base/content/test/tabPrompts/browser_contentOrigins.js
(android) org.mozilla.geckoview.test.NavigationDelegateTest#redirectIntentLoad
(possibly) xpcshell.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentScripts_register.js
Assignee | ||
Comment 12•3 years ago
|
||
Assignee | ||
Comment 13•3 years ago
|
||
Assignee | ||
Comment 14•3 years ago
|
||
Hi Gijs and Micah, :mccr8 told me you have been looking into dialogs and navigation and I was wondering if the following test failure is related to what you have been looking into:
I am working in the DocumentChannel code to allow navigations that are triggered in the parent process to proceed in the parent process without having to bounce to the child process which would then send it up to the parent process (this is what is happening currently, and it's not the most effective way since we have this extra bounce). This new way of controlling navigations from the parent process can be enabled by setting pref browser.tabs.documentchannel.parent-controlled
to true. As I am trying to enable this pref by default, I have been fixing various test failures, and one of the test failures I have run into is in browser/base/content/test/tabPrompts/browser_contentOrigins.js (happens with or without fission, on linux + mac, though I was always debugging it with fission on, and when writing this comment I was debugging it on mac).
(words prompt/dialog are used interchangeably below)
The test times out here https://searchfox.org/mozilla-central/rev/e98b9b6a20123cdb1d58ca683de123ce61f1e1a6/browser/base/content/test/tabPrompts/browser_contentOrigins.js#205-212 because we never get a prompt shown for the xorigin-subframe because we get a prompt that we didn't expect that does not get dismissed (more on it below). When I comment out part of the test that does the simple load, the prompt for the xorigin-subframe does show up. So since the issue is most likely occurring with the "simple load" case I decided to comment out the xorigin-subframe dialog part (as well as test case test_check_prompt_origin_display
, and add a longer timeout at the end of the test_check_auth
test case to better understand what's going on.
It seems that after we dismiss the initial dialog (see "initial dialog.png") that pops up when we navigate to the forbidden page and after we have closed the forbidden page, we get another dialog that looks like this (see "unexpected dialog.png").
Debugging showed that the prompt was created as a result of the following callstack:
3:12.14 GECKO(200023) #01: nsWindowWatcher::GetPrompt(mozIDOMWindowProxy*, nsID const&, void**) [/home/annyg/central/toolkit/components/windowwatcher/nsWindowWatcher.cpp:1452]
3:12.15 GECKO(200023) #02: mozilla::net::ParentChannelListener::GetAuthPrompt(unsigned int, nsID const&, void**) [/home/annyg/central/netwerk/protocol/http/ParentChannelListener.cpp:250]
3:12.15 GECKO(200023) #03: mozilla::net::ParentChannelListener::GetInterface(nsID const&, void**) [/home/annyg/central/netwerk/protocol/http/ParentChannelListener.cpp:175]
3:12.16 GECKO(200023) #04: mozilla::dom::(anonymous namespace)::ClientChannelHelper::GetInterface(nsID const&, void**) [/home/annyg/central/dom/clients/manager/ClientChannelHelper.cpp:53]
3:12.16 GECKO(200023) #05: nsresult CallGetInterface<nsIInterfaceRequestor, nsIAuthPrompt2>(nsIInterfaceRequestor*, nsIAuthPrompt2**) [/home/annyg/central/obj-debug/dist/include/nsIInterfaceRequestorUtils.h:19]
3:12.17 GECKO(200023) #06: NS_QueryAuthPrompt2(nsIInterfaceRequestor*, nsIAuthPrompt2**) [/home/annyg/central/netwerk/base/nsNetUtil.cpp:2184]
3:12.17 GECKO(200023) #07: mozilla::net::GetAuthPrompt(nsIInterfaceRequestor*, bool, nsIAuthPrompt2**) [/home/annyg/central/netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:332]
3:12.18 GECKO(200023) #08: mozilla::net::nsHttpChannelAuthProvider::PromptForIdentity(unsigned int, bool, nsTSubstring<char> const&, nsTSubstring<char> const&, unsigned int, mozilla::net::nsHttpAuthIdentity&) [/home/annyg/central/netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:1360]
3:12.18 GECKO(200023) [Child 200384, Main Thread] WARNING: IPC message discarded: actor cannot send: file /home/annyg/central/ipc/glue/ProtocolUtils.cpp:527
3:12.18 GECKO(200023) #09: mozilla::net::nsHttpChannelAuthProvider::GetCredentialsForChallenge(nsTSubstring<char> const&, nsTSubstring<char> const&, bool, nsIHttpAuthenticator*, nsTString<char>&) [/home/annyg/central/netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:940]
3:12.18 GECKO(200023) #10: mozilla::net::nsHttpChannelAuthProvider::GetCredentials(nsTSubstring<char> const&, bool, nsTString<char>&) [/home/annyg/central/netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:641]
3:12.19 GECKO(200023) #11: mozilla::net::nsHttpChannelAuthProvider::ProcessAuthentication(unsigned int, bool) [/home/annyg/central/netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:183]
3:12.19 GECKO(200023) #12: mozilla::net::nsHttpChannel::ContinueProcessResponse3(nsresult) [/home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:2326]
3:12.20 GECKO(200023) #13: mozilla::net::nsHttpChannel::ContinueProcessResponse2(nsresult) [/home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:2203]
3:12.20 GECKO(200023) #14: mozilla::net::nsHttpChannel::ContinueProcessResponse1() [/home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:2176]
3:12.20 GECKO(200023) #15: mozilla::net::nsHttpChannel::ProcessResponse() [/home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:2062]
3:12.21 GECKO(200023) #16: mozilla::net::nsHttpChannel::OnStartRequest(nsIRequest*) [/home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:6851]
3:12.21 GECKO(200023) #17: nsInputStreamPump::OnStateStart() [/home/annyg/central/netwerk/base/nsInputStreamPump.cpp:465]
3:12.21 GECKO(200023) [Child 200384: Main Thread]: I/DocShellAndDOMWindowLeak --DOCSHELL 7fd2c688c800 == 0 [pid = 200384] [id = 0] [url = https://example.com/]
3:12.22 GECKO(200023) #18: nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) [/home/annyg/central/netwerk/base/nsInputStreamPump.cpp:374]
3:12.22 GECKO(200023) #19: nsInputStreamReadyEvent::Run() [/home/annyg/central/xpcom/io/nsStreamUtils.cpp:0]
3:12.22 GECKO(200023) #20: mozilla::RunnableTask::Run() [/home/annyg/central/xpcom/threads/TaskController.cpp:478]
3:12.23 GECKO(200023) #21: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [/home/annyg/central/xpcom/threads/TaskController.cpp:781]
3:12.23 GECKO(200023) #22: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [/home/annyg/central/xpcom/threads/TaskController.cpp:617]
3:12.24 GECKO(200023) #23: mozilla::TaskController::ProcessPendingMTTask(bool) [/home/annyg/central/xpcom/threads/TaskController.cpp:401]
3:12.24 GECKO(200023) #24: mozilla::TaskController::InitializeInternal()::$_0::operator()() const [/home/annyg/central/xpcom/threads/TaskController.cpp:126]
and that the callback nsInputStreamReadyEvent
(which shows up in stack frame #19 above) was created as a result of the following callstack:
#0 0x00007f40aaaab9c6 in nsInputStreamReadyEvent::nsInputStreamReadyEvent(char const*, nsIInputStreamCallback*, nsIEventTarget*, unsigned int)
(this=0x7f407c6ac0b0, aName=0x7f40a4cca097 "nsPipeInputStream::AsyncWait", aCallback=0x7f408caaee48, aTarget=0x7f40be71d200, aPriority=3) at /home/annyg/central/xpcom/io/nsStreamUtils.cpp:47
#1 0x00007f40aaa98aa5 in NS_NewInputStreamReadyEvent(char const*, nsIInputStreamCallback*, nsIEventTarget*, unsigned int)
(aName=0x7f40a4cca097 "nsPipeInputStream::AsyncWait", aCallback=0x7f408caaee48, aTarget=0x7f40be71d200, aPriority=3) at /home/annyg/central/xpcom/io/nsStreamUtils.cpp:211
#2 0x00007f40aaa988ab in nsPipeInputStream::AsyncWait(nsIInputStreamCallback*, unsigned int, unsigned int, nsIEventTarget*) (this=0x7f40705a9160, aCallback=0x7f408caaee48, aFlags=0, aRequestedCount=0, aTarget=0x7f40be71d200)
at /home/annyg/central/xpcom/io/nsPipe3.cpp:1409
#3 0x00007f40aae5c7a3 in nsInputStreamPump::EnsureWaiting() (this=0x7f408caaee40)
at /home/annyg/central/netwerk/base/nsInputStreamPump.cpp:110
#4 0x00007f40aae5dc0d in nsInputStreamPump::AsyncRead(nsIStreamListener*)
(this=0x7f408caaee40, listener=0x7f408b3e55e8) at /home/annyg/central/netwerk/base/nsInputStreamPump.cpp:327
#5 0x00007f40ab6c8cb3 in mozilla::net::nsHttpTransaction::AsyncRead(nsIStreamListener*, nsIRequest**)
(this=0x7f4069e14800, listener=0x7f408b3e55e8, pump=0x7f408b3e56a8)
at /home/annyg/central/netwerk/protocol/http/nsHttpTransaction.cpp:432
#6 0x00007f40ab6195e7 in mozilla::net::nsHttpChannel::DoConnectActual(mozilla::net::HttpTransactionShell*)
(this=0x7f408b3e5000, aTransWithStickyConn=0x0)
at /home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:870
#7 0x00007f40ab6191a6 in mozilla::net::nsHttpChannel::DoConnect(mozilla::net::HttpTransactionShell*)
(this=0x7f408b3e5000, aTransWithStickyConn=0x0)
at /home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:846
#8 0x00007f40ab666dd8 in mozilla::net::nsHttpChannel::ContinueDoAuthRetry(mozilla::net::HttpTransactionShell*, std::function<nsresult (mozilla::net::nsHttpChannel*, nsresult)> const&)::$_27::operator()<mozilla::net::nsHttpChannel>(mozilla::net::nsHttpChannel*) const (this=0x7f40873fe9d0, self=0x7f408b3e5000)
at /home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:8212
#9 0x00007f40ab666c02 in std::_Function_handler<nsresult (mozilla::net::nsHttpChannel*), mozilla::net::nsHttpChannel::ContinueDoAuthRetry(mozilla::net::HttpTransactionShell*, std::function<nsresult (mozilla::net::nsHttpChannel*, nsresult)> const&)::$_27>::_M_invoke(std::_Any_data const&, mozilla::net::nsHttpChannel*&&)
(__functor=..., __args=@0x7ffccc160010: 0x7f408b3e5000)
at /home/annyg/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:301
#10 0x00007f40ab683c7e in std::function<nsresult (mozilla::net::nsHttpChannel*)>::operator()(mozilla::net::nsHttpChannel*) const (this=0x7ffccc160138, __args=0x7f408b3e5000)
at /home/annyg/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
#11 0x00007f40ab613a6e in mozilla::net::nsHttpChannel::CallOrWaitForResume(std::function<nsresult (mozilla::net::nsHttpChannel*)> const&) (this=0x7f408b3e5000, aFunc=...)
at /home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:8647
#12 0x00007f40ab6455c4 in mozilla::net::nsHttpChannel::ContinueDoAuthRetry(mozilla::net::HttpTransactionShell*, std::function<nsresult (mozilla::net::nsHttpChannel*, nsresult)> const&)
(this=0x7f408b3e5000, aTransWithStickyConn=0x0, aContinueOnStopRequestFunc=...)
at /home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:8210
#13 0x00007f40ab666a59 in mozilla::net::nsHttpChannel::DoAuthRetry(mozilla::net::HttpTransactionShell*, std::function<nsresult (mozilla::net::nsHttpChannel*, nsresult)> const&)::$_26::operator()<mozilla::net::nsHttpChannel>(mozilla::net::nsHttpChannel*) const (this=0x7f40873fe970, self=0x7f408b3e5000)
at /home/annyg/central/netwerk/protocol/http/nsHttpChannel.cpp:8165
#14 0x00007f40ab666882 in std::_Function_handler<nsresult (mozilla::net::nsHttpChannel*), mozilla::net::nsHttpChannel::DoAuthRetry(mozilla::net::HttpTransactionShell*, std::function<nsresult (mozilla::net::nsHttpChannel*, nsresult)> const&)::$_26>::_M_invoke(std::_Any_data const&, mozilla::net::nsHttpChannel*&&)
(__functor=..., __args=@0x7ffccc160210: 0x7f408b3e5000)
at /home/annyg/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:301
#15 0x00007f40ab683c7e in std::function<nsresult (mozilla::net::nsHttpChannel*)>::operator()(mozilla::net::nsHttpChannel*) const (this=0x7ffccc1602e0, __args=0x7f408b3e5000)
at /home/annyg/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
#16 0x00007f40ab613a6e in mozilla::net::nsHttpChannel::CallOrWaitForResume(std::function<nsresult (mozilla::net::nsHttpChannel*)> const&) (this=0x7f408b3e5000, aFunc=...)
I am not sure what would cause this test failure, maybe something changed timing wise when the pref is enabled? Any insights would be appreciated, because I'm not sure if you are already looking into something like this or if it's an unrelated issue and I should dive even further into prompts related to http and authorizations.
You can replicate the test failures I'm mentioning by simply having the latest mozilla central and running test browser/base/content/test/tabPrompts/browser_contentOrigins.js
with pref browser.tabs.documentchannel.parent-controlled=true and fission enabled (no need to add any other patches because some code related to parent controlled navigation has been in the tree for sometime hidden behind the pref, and now I'm just trying to enable it and fix any test failures).
Thanks a lot in advance!
Comment 15•3 years ago
|
||
I am not sure what would cause this test failure, maybe something changed timing wise when the pref is enabled? Any insights would be appreciated, because I'm not sure if you are already looking into something like this or if it's an unrelated issue and I should dive even further into prompts related to http and authorizations.
I'm afraid I don't think we're looking into something like this necessarily, no.
I suspect that the relevant question is why the stack for DoAuthRetry
that causes a renewed http request with a second prompt is not triggered with the browser.tabs.documentchannel.parent-controlled
pref set to false (or perhaps that part is triggered, but we abort attempting to show a prompt later?). It would seem to imply that after closing the initial prompt, we re-try the connection and reprompt. The fact that it's a different-looking dialog suggests that the prompt code either did not get a browsing context associated with the prompt or that it decided it couldn't display an in-content/in-tab prompt for that browsing context. My suspicion would therefore be that there is a "stray" extra connection attempt in the parent-controlled
case that doesn't occur in the pre-existing case, but I'm not familiar enough with the http / document channel code to know why that would be...
Assignee | ||
Comment 16•3 years ago
|
||
Thanks for your response! I've talked to Nika about this and she speculated it might be because we are not cancelling loads when we are discarding a canonical BC, which turned out to be the reason! I'm going to try to write some more tests for this behaviour, so that we have a bit more coverage.
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
•
|
||
Currently I'm debugging a timeout in test org.mozilla.geckoview.test.NavigationDelegateTest#redirectIntentLoad
on android.
The timeout occurs in a call to waitForPageStop()
.
In the logs, I see an assertion failure shortly before a timeout exception gets thrown, and the logs look like this:
09-27 12:09:26.438 14813 14829 I Gecko : [Child 14813, Main Thread] ###!!! ASSERTION: We should have connected the pending channel!: 'pending', file /Users/agakhokidze/src/gecko/docshell/base/nsDocShell.cpp:13042
09-27 12:09:26.447 14786 14805 I Gecko : [Parent 14786, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, mozilla::ipc::IPCResult::Ok()) failed with result 0x80004002 (NS_NOINTERFACE): file /Users/agakhokidze/src/gecko/dom/ipc/ContentParent.cpp:4480
09-27 12:09:26.448 14786 14805 I Gecko : [Parent 14786, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /Users/agakhokidze/src/gecko/uriloader/exthandler/nsExternalHelperAppService.cpp:1101
09-27 12:09:26.453 14786 14805 I Gecko : [Parent 14786: Main Thread]: V/DocumentChannel DocumentLoadListener RedirectToRealChannelFinished [this=9f6cf8b0, aRv=804b0002 ]
09-27 12:09:26.454 14786 14805 I Gecko : [Parent 14786: Main Thread]: V/DocumentChannel DocumentLoadListener FinishReplacementChannelSetup [this=9f6cf8b0, aResult=804b0002]
09-27 12:09:26.454 14786 14805 I Gecko : [Parent 14786, Main Thread] WARNING: NS_ENSURE_TRUE(mSuspendCount > 0) failed: file /Users/agakhokidze/src/gecko/netwerk/protocol/http/nsHttpChannel.cpp:5562
09-27 12:09:26.454 14786 14805 I Gecko : [Parent 14786: Main Thread]: V/DocumentChannel DocumentLoadListener dtor [this=9f6cf8b0]
09-27 12:09:49.875 9285 9339 D ConnectivityService: NetworkAgentInfo [MOBILE (LTE) - 102] validation failed
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: Error
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:50)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at android.os.Handler.handleCallback(Handler.java:751)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at android.os.Handler.dispatchMessage(Handler.java:95)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:153)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitUntilCalled(GeckoSessionTestRule.java:1714)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitForPageStops(GeckoSessionTestRule.java:1502)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitForPageStop(GeckoSessionTestRule.java:1477)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitForPageStop(GeckoSessionTestRule.java:1467)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.NavigationDelegateTest.redirectIntentLoad(NavigationDelegateTest.kt:664)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at java.lang.reflect.Method.invoke(Native Method)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
09-27 12:09:55.805 14786 14786 E GeckoSessionTestRule: at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.lambda$evaluate$0$GeckoSessionTestRule$2(GeckoSessionTestRule.java:1429)
This assertion failure might be a hint as to what is going wrong, so I looked into it further. The assertion that fails is this one in nsDocShell::ResumeRedirectedLoad
. The channel in question, is the one created earlier in ContentChild::RecvCrossProcessRedirect
. If the channel is not pending, we return NS_BINDING_ABORTED
error in the callback nsDocShell::ResumeRedirectedLoad
.
The order of relevant events so far is the following: In DocumentLoadListener::OnStartRequest
, we take this code path. There is a call to browserParent->ResumeLoad(mLoadIdentifier);
(1) and a call to TriggerRedirectToRealChannel(Some(mContentParent));
(2).
(1) This call eventually causes nsWebBrowser::ResumeRedirectedLoad
to be called, which then calls nsDocShell::ResumeRedirectedLoad
(I matched up the load identifiers). There, we schedule a call (**) to nsDocShell::InternalLoad
(and do some other things) once the pending channel is received.
(2) DocumentLoadListener::TriggerRedirectToRealChannel
calls DocumentLoadListener::RedirectToRealChannel
which eventually calls ContentParent::SendCrossProcessRedirect
. On the receiving side, ContentChild::RecvCrossProcessRedirect
creates a new channel. In the end, it ends up calling ** callback with the new LoadState (which contains the channel)
The rough idea I have of channels is that once we create real network channel in DocumentLoadListener, we later connect it to the final destination after we have processed all redirects. In DocumentLoadListener::Open, I do see that we create a channel, and call AsyncOpen on it. However, I am not sure how the connecting of the channels work, so perhaps I should trace this to understand why the channel in docshell is not in the pending state.
Assignee | ||
Comment 18•3 years ago
|
||
Assignee | ||
Comment 19•3 years ago
|
||
Add a synced ParentInitiatedNavigationEpoch field to browsing context, which
only gets incremented when we start navigations in the parent process. When a
child process initiates a navigation, it sends the current value of the field
that it sees via DocumentChannelCreationArgs. In the parent process, we can
compare the value of that field with the latest one for the same browsing
context. If the latest value is higher than the one provided by the content
process, it means that in the meantime parent process has started a navigation
so the earlier navigation originating in the content process will be cancelled.
Depends on D126841
Assignee | ||
Comment 20•3 years ago
|
||
Previously, the tests were waiting for the load event, which was being emitted
for the initial about:blank. With the pref enabled, this event is no longer
propagated/fired to the content process. Instead of notifying the content
process that it needs to emit a load event for about:blank, we can instead make
the tests wait for a STATE_STOP event.
Depends on D126842
Assignee | ||
Comment 21•3 years ago
|
||
Depends on D126843
Assignee | ||
Comment 22•3 years ago
|
||
This allows us to move away from using IsNavigating field. Use a new distinct
error code in cases when we cancel loads in Canonical BC due to another load
starting. This way, we know to not reset the urlbar if we are doing another
load.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1721217#c10 for longer
explanation of what is going on here.
Depends on D126844
Assignee | ||
Comment 23•3 years ago
|
||
Depends on D126845
Assignee | ||
Comment 24•3 years ago
|
||
This fixes failures in browser/base/content/test/tabPrompts/browser_contentOrigins.js
Depends on D126846
Assignee | ||
Comment 25•3 years ago
|
||
Make the promise wait correctly for two "script-run-complete" events.
Depends on D126847
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 26•3 years ago
|
||
Some notes on debugging the android test so far
-
With the pref enabled
DocumentLoadListener::AsyncOnChannelRedirect
window->onLoadRequest
OnRedirectVerifyCallback
DocumentLoadListener::OnStartRequest
MaybeTriggerProcessSwitch
- returns false
- we take this path
- disconnect listeners in the old process
browserParent->ResumeLoad
🥥DocumentLoadListener::TriggerRedirectToRealChannel
DocumentLoadListener::RedirectToRealChannel
- call
SendCrossProcessRedirect
🍄
- call
- 🥥
nsWebBrowser::ResumeRedirectedLoad
nsDocShell::ResumeRedirectedLoad
- schedule a callback 🐸 upon receiving a pending channel
- callback 🐸
nsDocShell::InternalLoad
nsDocShell::DoURILoad
nsDocShell::OpenRedirectedChannel
- // Our channel was redirected from another process, so doesn't need to // be opened again. However, it does need its listener hooked up // correctly.
childChannel->CompleteRedirectSetup
- after the call to
::InternalLoad
- // If the channel isn't pending, then it means that InternalLoad // never connected it, and we shouldn't try to continue. This // can happen even if InternalLoad returned NS_OK.
NS_ASSERTION(pending, "We should have connected the pending channel!");
- [ ] confused:
nsDocShell::ResumeRedirectedLoad
says// If the channel isn't pending, then it means that InternalLoad // never connected it, and we shouldn't try to continue. This // can happen even if InternalLoad returned NS_OK.
butnsDocShell::OpenRedirectedChannel
which gets called before we check if the channel is pending, by::InternalLoad
, says// Our channel was redirected from another process, so doesn't need to // be opened again
- [ ] issue is that we are not only redirecting, but its a redirect to an external app....
- [ ] i wonder if maybe we should not call
browserParent->ResumeLoad
because that is what schedules a callback which eventually callsnsDocShell::internalLoad
- 🍄
ContentChild::RecvCrossProcessRedirect
- create a channel
- connect channel to parent ****
nsExtProtocolChannel::ConnectParent
ContentParent::RecvExtProtocolChannelConnectParent
- create a load state
- call
processListener->OnChannelReady
which will call 🐸// The listener will call completeRedirectSetup or asyncOpen on the channel.
-
Without the pref enabled - Success
DocumentChannel ctor
DocumentChannelChild ctor
DocumentChannelParent::Init
- promise =
DocumentLoadListener::Open
- 🐋 once the promise resolves do:
DocumentChannelParent::RedirectToRealChannel
SendRedirectToRealChannel
→DocumentChannelChild::RecvRedirectToRealChannel
nsExtProtocolChannel::ConnectParent
- [ ] why does this not get called in CrossProcessRedirect? Do we need to call it?
gHttpHandler->AsyncOnChannelRedirect
(this does not happen in the other path)nsHttpHandler::AsyncOnChannelRedirect
nsAsyncRedirectVerifyHelper::Init
nsAsyncRedirectVerifyHelper::Run()
- promise =
DocumentLoadListener OnStartRequest
DocumentLoadListener MaybeTriggerProcessSwitch
DocumentLoadListener::TriggerRedirectToRealChannel
DocumentLoadListener::RedirectToRealChannel
- resolve promise 🐋
DocumentChannelChild::OnRedirectVerifyCallback
childChannel->CompleteRedirectSetup
Comment 27•3 years ago
|
||
nsExtProtocolChannel unconditionally returns false from IsPending, implementing that might help?
At a higher level, do we actually want to forward to nsExtProtocolChannel down in the content process at all?
As far as I can tell, all it does it send the message back to the parent asking it to handle things (nsExternalHelperAppService::LoadURI calling SendLoadURIExternal), and then fires an empty OnStartRequest(NS_ERROR_NO_CONTENT)/OnStopRequest to the docshell.
It seems that we could bypass the content process entirely in this case (like for loads that will be downloaded), and just ask the nsExternalHelperAppService to do the external load directly in the parent, and never tell the content process/docshell about it at all.
Assignee | ||
Comment 28•3 years ago
|
||
Thanks for jumping in, Matt! So glad that you still read the comments here :)
I forgot to follow up here, but I had a meeting with nika and she pointed out how it could be related to an existing problem we know about where if a redirection fails to take place, we don't revert back the mChannelCreationURI
in DocumentLoadListener (which Matt you mentioned in another bug). I briefly tried to add a channelCreationURI
getter to nsIChannel
where it returns the value of OriginalURI
at the time AsyncOpen
method was called, and even though this specific android test was fixed, there were a lot of other test issues that I need to look at.
Considering how we want to enable parent-controlled pref only if session history in parent is enabled, this test will not be an issue anymore since we don't have SHIP enabled on android. Therefore, I filed a follow up bug 1735613 to fix this test where I will address the problem of mChannelCreationURI not being reverted back.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 29•3 years ago
|
||
Depends on D126848
Assignee | ||
Comment 30•3 years ago
|
||
Depends on D130197
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 31•3 years ago
|
||
Wait for the previous load to complete as sometimes the first load takes longer
to start and races with the latter about:mozilla one.
Depends on D130198
Updated•3 years ago
|
Comment 32•3 years ago
|
||
Comment 33•3 years ago
|
||
Backed out for bc failures on browser_scrollPositions.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/82dbcd15f1cfd807f2ae91cbe9efcdeeee76d31d
Log link: https://treeherder.mozilla.org/logviewer?job_id=359300023&repo=autoland&lineNumber=5680
Assignee | ||
Comment 34•3 years ago
|
||
Sorry, I'm a little confused - isn't this an existing failure already - https://bugzilla.mozilla.org/show_bug.cgi?id=1716445 ?
Assignee | ||
Updated•3 years ago
|
Comment 35•3 years ago
|
||
Yes, there is an existing failure but starting with that push, the failure turned into a perma failure.
Retriggers: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Casan%2Copt%2Cmochitests%2Cwith%2Csoftware%2Cwebrender%2Cand%2Cfission%2Cenabled%2Ctest-linux1804-64-asan-qr%2Fopt-mochitest-browser-chrome-swr-fis-e10s%2Cbc15&fromchange=e9c03550bd568652b76718061548cbf3553d0a7f&selectedTaskRun=K5xbuTYfS7OBvD36IVzfWA.0&tochange=5b205316ef5c8c327a4f7a89da15e6a551474921
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 36•3 years ago
|
||
Updated•3 years ago
|
Comment 37•3 years ago
|
||
Comment 38•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2f5a0164679a
https://hg.mozilla.org/mozilla-central/rev/7d3f99ca3f44
https://hg.mozilla.org/mozilla-central/rev/885ced5f4bb4
https://hg.mozilla.org/mozilla-central/rev/7aa2726a0982
https://hg.mozilla.org/mozilla-central/rev/ee9bd012fda8
https://hg.mozilla.org/mozilla-central/rev/b38bc569d227
https://hg.mozilla.org/mozilla-central/rev/46512deaf0e5
https://hg.mozilla.org/mozilla-central/rev/c8075b91d660
https://hg.mozilla.org/mozilla-central/rev/74362b0c39c0
https://hg.mozilla.org/mozilla-central/rev/a59210a5b400
https://hg.mozilla.org/mozilla-central/rev/c131e4a6d9d5
https://hg.mozilla.org/mozilla-central/rev/5fe3c617d521
Comment 39•3 years ago
|
||
Backed out for bc failures on browser_TopLevelNavigationDelegate.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/a29dc5865ee1cfc2d4b8fce47232e4daf8f65121
Log link: https://treeherder.mozilla.org/logviewer?job_id=359770551&repo=autoland&lineNumber=27697
Comment 40•3 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/a29dc5865ee1
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 41•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 42•3 years ago
|
||
Comment 43•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/416a8398fcff
https://hg.mozilla.org/mozilla-central/rev/55d3a1bcab8f
https://hg.mozilla.org/mozilla-central/rev/384e7199e846
https://hg.mozilla.org/mozilla-central/rev/9b2516b114a7
https://hg.mozilla.org/mozilla-central/rev/5eab08029b51
https://hg.mozilla.org/mozilla-central/rev/c0c4625f45b4
https://hg.mozilla.org/mozilla-central/rev/92b951e86d87
https://hg.mozilla.org/mozilla-central/rev/6819f16a52d5
https://hg.mozilla.org/mozilla-central/rev/6bd53be6d22f
https://hg.mozilla.org/mozilla-central/rev/1b7a0d545131
https://hg.mozilla.org/mozilla-central/rev/71d18b1a1fc1
Assignee | ||
Updated•3 years ago
|
Description
•