Defer updating mChannel until HttpChannelParent::OnRedirectResult for ServiceWorker interception per comment 13. Was: Assertion failure mChannel == chan in HttpChannelParent.cpp
Categories
(Core :: DOM: Service Workers, defect, P2)
Tracking
()
People
(Reporter: bryce, Unassigned, NeedInfo)
References
(Regression)
Details
(Keywords: leave-open, regression, Whiteboard: dom-lws-bugdash-triage)
Attachments
(3 files, 1 obsolete file)
I'm hitting this assert with moderate regularity while testing EME video on Windows. I'm not familiar with this area, but am happy to provide information to help debug.
Just got a crash right now opening and closing tabs with the shaka player page repeatedly.
My current crashing stack:
xul.dll!mozilla::net::HttpChannelParent::OnStartRequest(nsIRequest * aRequest) Line 1402 C++
xul.dll!mozilla::net::HttpBaseChannel::DoNotifyListener() Line 3016 C++
xul.dll!mozilla::net::nsHttpChannel::ContinueAsyncRedirectChannelToURI(nsresult rv) Line 3225 C++
xul.dll!mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult result) Line 9198 C++
xul.dll!mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() Line 41 C++
xul.dll!nsThread::ProcessNextEvent(bool aResult, bool *) Line 1226 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool) Line 486 C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 88 C++
xul.dll!MessageLoop::RunHandler() Line 309 C++
xul.dll!MessageLoop::Run() Line 291 C++
xul.dll!nsBaseAppShell::Run() Line 139 C++
xul.dll!nsAppShell::Run() Line 406 C++
xul.dll!nsAppStartup::Run() Line 277 C++
xul.dll!XREMain::XRE_mainRun() Line 4600 C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4735 C++
xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4816 C++
[Inline Frame] firefox.exe!do_main(int argc, char * * argv, char * * envp) Line 218 C++
firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 300 C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 131 C++
[External Code]
Reporter | ||
Comment 1•5 years ago
|
||
I'm hitting this on central commit 25c4c2685699 (current at the time of this comment) with no further modifications.
Comment 3•5 years ago
|
||
Bryce, thanks for reporting this. But I will either need exact STR listing here or ask you to provide logs: HTTP logging - Mozilla | MDN
Thanks!
Reporter | ||
Comment 4•5 years ago
|
||
Logs of fault reproed with following command:
MOZ_LOG=timestamp,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5 MOZ_LOG_FILE=$TEMP/BryceLog.txt ./mach run
Comment 5•5 years ago
|
||
Oh... I missed that this is an assertion failure. In that case the sync
module must be added to the MOZ_LOG list to flush each line.. I'm sorry I forgot about that bit. The log is unfortunately incomplete and I can't make any conclusion out of it.
Bryce, can you please reproduce again with the sync
bit added to MOZ_LOG?
Sorry again, and thanks!
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 6•5 years ago
|
||
Updated•5 years ago
|
Comment 7•5 years ago
|
||
Thanks, I can see what happens now! This is a regression from bug 1391693.
Content makes a request to https://shaka-player-demo.appspot.com/lib/debug/log.js. There is a service worker registered that intercepts the channel on the parent process and internally redirects to InterceptedHttpChannel
.
At HttpChannelParent.cpp - mozsearch we replace mChannel of the pre-redirect HttpChannelParent with the post-redirect channel. It's too soon to do so.
During the async processing of the redirect the child channel gets cancelled (looks like the whole page load is cancelled), it happens before the redirection is asynchronously confirmed.
Then, according the contract, we notify OnStart and OnStop of the listener of the (pre-redirect) nsHttpChannel with the error code; the listener is the pre-redirect HttpChannelParent. Then the assertion fails, because mChannel is the InterceptedHttpChannel object while aRequest is the original nsHttpChannel.
2019-10-25 20:15:18.687000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::RecvCancel [this=00000174AED4EAC0]
2019-10-25 20:15:18.687000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpAsyncAborter::AsyncAbort [this=00000174AAE32000 status=804b0002]
2019-10-25 20:15:18.697000 UTC - [Parent 39232: Main Thread]: D/nsHttp nsHttpChannel::OnRedirectVerifyCallback [this=00000174AC1CE000] result=0 stack=1 mWaitingForRedirectCallback=1
2019-10-25 20:15:18.697000 UTC - [Parent 39232: Main Thread]: D/nsHttp nsHttpChannel::ContinueAsyncRedirectChannelToURI [this=00000174AC1CE000]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::OnRedirectResult [this=00000174AED4EAC0, suc=0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::CompleteRedirect [this=00000174AED4EAC0 succeeded=0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::CleanupBackgroundChannel [this=00000174AED4EAC0 bgParent=00000174A5B42DE0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpBackgroundChannelParent::OnChannelClosed [this=00000174A5B42DE0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp nsHttpChannel::Cancel [this=00000174AC1CE000 status=804b0002]
2019-10-25 20:15:18.719000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpBaseChannel::DoNotifyListener this=00000174AC1CE000
2019-10-25 20:15:18.719000 UTC - [Parent 39232: Main Thread]: D/nsHttp ParentChannelListener::OnStartRequest [this=00000174AABE8A10]
2019-10-25 20:15:18.719000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::OnStartRequest [this=00000174AED4EAC0, aRequest=00000174AC1CE050]
--> boom!
Moving to service workers.
Josh, can you please take a look? Thanks.
Comment 8•5 years ago
|
||
I haven't touched Service Worker code in several years at this point. Redirecting to the active developers :)
Updated•5 years ago
|
Updated•5 years ago
|
Comment 9•5 years ago
|
||
Reproducible with https://www.wayfair.ca/keyword.php?keyword=
Assertion failure: mChannel == chan (HttpChannelParent getting OnStartRequest from a different HttpBaseChannel instance), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelParent.cpp:1375
#01: mozilla::net::HttpBaseChannel::DoNotifyListener() [netwerk/protocol/http/HttpBaseChannel.cpp:3033]
#02: mozilla::net::nsHttpChannel::ContinueAsyncRedirectChannelToURI(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:3249]
#03: mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:9252]
#04: mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() [netwerk/base/nsAsyncRedirectVerifyHelper.cpp:41]
#05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1225]
Comment 11•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #9)
Reproducible with https://www.wayfair.ca/keyword.php?keyword=
Assertion failure: mChannel == chan (HttpChannelParent getting OnStartRequest from a different HttpBaseChannel instance), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelParent.cpp:1375 #01: mozilla::net::HttpBaseChannel::DoNotifyListener() [netwerk/protocol/http/HttpBaseChannel.cpp:3033] #02: mozilla::net::nsHttpChannel::ContinueAsyncRedirectChannelToURI(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:3249] #03: mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:9252] #04: mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() [netwerk/base/nsAsyncRedirectVerifyHelper.cpp:41] #05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1225]
Bob, anything special in your setup? Can't repro with today debug Nightly and a clean profile.
If you can, please provide a log: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
PLEASE ADD sync
module to the MOZ_LOG list (important).
Thanks!
Comment 12•5 years ago
|
||
Nothing special. I had to reload a few times this morning though it asserted quickly when I had the MOZ_LOG set. This was locally with a 20191106233932 build. I'll dl this morning's and attempt again with that.
Comment 13•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #12)
Created attachment 9108118 [details]
log.zipNothing special. I had to reload a few times this morning though it asserted quickly when I had the MOZ_LOG set. This was locally with a 20191106233932 build. I'll dl this morning's and attempt again with that.
Thanks, this is actually exactly the same as what I've already found in comment 7...
There are two ways to fix this:
- do this properly: not set mChannel to the intercepted channel that soon, but only after the internal redirect is actually confirmed, in
HttpChannelParent::OnRedirectResult
(it means to save the intercepted channel instance in a new member or extend the OnRedirectResult signature, the latter preferred) - save the original nsHttpChannel to a debug-only member and extend the assertion to accept notification from both (it kinda ruins the purpose of that assertion, though)
waiting for Andrew's vote on this.
Comment 14•5 years ago
|
||
fwiw I was able to reproduce with this morning's Nightly on Fedora 30.
Comment 15•5 years ago
|
||
"Do this properly" sounds good, but does this impede the explicit goal of "None of the redirect IPC messaging takes place." ?
Comment 16•5 years ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #15)
"Do this properly" sounds good, but does this impede the explicit goal of "None of the redirect IPC messaging takes place." ?
Yes. The OnRedirectResult callback is called w/o any IPC involvement. It is simply called (synchronously) after we return from the new (post-redirect) channel's AsyncOpen successfully, which is the spot you should update mChannel in the parent.
Comment 17•5 years ago
|
||
Great, thank you for the clarification. It sounded like in comment 13 you might be willing to fix this, which would be very much appreciated!
Comment 18•5 years ago
|
||
The fuzzers are hitting this frequently but it seems to be very difficult to reproduce consistently with the generated test cases and therefore reduce. I am able to repro easily using the STR above and I would be happy to create a Pernsoco session. It seems that the cause is understood but ni? me if a Pernosco session would still be helpful.
Comment 19•5 years ago
|
||
Are there user-facing consequences for this that make us want to fix this in 73 with SW-e10s?
Comment 20•5 years ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #17)
It sounded like in comment 13 you might be willing to fix this, which would be very much appreciated!
Hi Honza, it would be really appreciated if you find some time for this! If not, please let me know such that we put it into our (long) queue. Thank you!
Comment 21•5 years ago
|
||
No, I wasn't willing to fix this, only stated what should be fixed. Please set a priority of this bug if you want someone from the Necko team to work on this sooner or later. Thanks.
Comment 22•5 years ago
|
||
Bug 1610888 disabled sw-e10s for 73.
Comment 23•5 years ago
|
||
Hi Tyson, did you further try to reduce this? Thanks in advance!
Comment 24•5 years ago
•
|
||
Before this assert there is no check/assert, if mChannel
is nullptr (like happens in bug 1572553 without consequences).
There are quite some places we set mChannel
to nullptr. I found:
HttpChannelParent::NotifyDiversionFailed
HttpChannelParent::AsyncOpenFailed
HttpChannelParent::RecvDocumentChannelCleanup
HttpChannelParent::NotifyDiversionFailed
And there are even more places we use it, some without checking. One is here.
As a first measure, I would suggest to add explicit MOZ_ASSERT(mChannel);
to all uses of mChannel in order to be sure, if we are seeing a nullptr or really a wrong channel.
:asuth, are we able to say something about the consequences for the user if this happens?
Comment 25•5 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #24)
:asuth, are we able to say something about the consequences for the user if this happens?
We're not currently expecting any meaningful user impact on non-debug builds. It's an invariant check that we do want to make pass, but this is about channel cancellation, which means the user has already aborted the underlying navigation and we expect the entirety of the channel object graph to be appropriately cleaned up as a result.
Updated•5 years ago
|
Reporter | ||
Comment 26•5 years ago
|
||
I continue to hit this regularly. I suggest we comment out the assert -- it appears that the issue is understood and that we have some idea how to fix, and I'm not sure if hitting the assertion is helping turn up any new information. I will submit a patch.
Reporter | ||
Comment 27•5 years ago
|
||
It's possible to hit the commented assertion in debug builds as the invariant it
is checking can be violated. Disable it to prevent crashes in debug builds until
the invariant holds.
Updated•5 years ago
|
Comment 28•5 years ago
|
||
Reporter | ||
Comment 29•5 years ago
|
||
Unassigning since I was just disabling the assert.
Updated•5 years ago
|
Comment 30•5 years ago
|
||
bugherder |
Comment 31•5 years ago
|
||
uplift |
Comment 32•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
Comment 33•4 years ago
|
||
Are we planning any follow up here?
Comment 34•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
Comment 35•4 years ago
|
||
(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #34)
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
Andrew?
Comment 36•4 years ago
|
||
We still want to take the comment 13 action; we've only landed the assertion disabling thus far and I think there's still useful context in this bug, but if someone wants to consolidate in a spin-off, that's fine.
Comment 37•3 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
Updated•3 years ago
|
Updated•3 years ago
|
Comment 38•3 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.
Comment 39•3 years ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #36)
We still want to take the comment 13 action; we've only landed the assertion disabling thus far and I think there's still useful context in this bug, but if someone wants to consolidate in a spin-off, that's fine.
Still valid, I assume.
Updated•2 years ago
|
Updated•2 months ago
|
Description
•