Open Bug 1589749 Opened 8 months ago Updated 3 months ago

Assertion failure mChannel == chan in HttpChannelParent.cpp

Categories

(Core :: DOM: Service Workers, defect, P2)

defect

Tracking

()

Tracking Status
firefox-esr68 --- disabled
firefox72 --- disabled
firefox73 --- disabled
firefox74 --- wontfix
firefox75 --- disabled
firefox76 --- disabled

People

(Reporter: bryce, Unassigned, NeedInfo)

References

(Regression)

Details

(Keywords: leave-open, regression)

Attachments

(3 files, 1 obsolete file)

I'm hitting this assert with moderate regularity while testing EME video on Windows. I'm not familiar with this area, but am happy to provide information to help debug.

Just got a crash right now opening and closing tabs with the shaka player page repeatedly.

My current crashing stack:

	xul.dll!mozilla::net::HttpChannelParent::OnStartRequest(nsIRequest * aRequest) Line 1402	C++
 	xul.dll!mozilla::net::HttpBaseChannel::DoNotifyListener() Line 3016	C++
 	xul.dll!mozilla::net::nsHttpChannel::ContinueAsyncRedirectChannelToURI(nsresult rv) Line 3225	C++
 	xul.dll!mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult result) Line 9198	C++
 	xul.dll!mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() Line 41	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aResult, bool *) Line 1226	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool) Line 486	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 88	C++
 	xul.dll!MessageLoop::RunHandler() Line 309	C++
 	xul.dll!MessageLoop::Run() Line 291	C++
 	xul.dll!nsBaseAppShell::Run() Line 139	C++
 	xul.dll!nsAppShell::Run() Line 406	C++
 	xul.dll!nsAppStartup::Run() Line 277	C++
 	xul.dll!XREMain::XRE_mainRun() Line 4600	C++
 	xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4735	C++
 	xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4816	C++
 	[Inline Frame] firefox.exe!do_main(int argc, char * * argv, char * * envp) Line 218	C++
 	firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 300	C++
 	firefox.exe!wmain(int argc, wchar_t * * argv) Line 131	C++
 	[External Code]	

I'm hitting this on central commit 25c4c2685699 (current at the time of this comment) with no further modifications.

See Also: → 1588409

Honza, could you take a look?
Thanks.

Flags: needinfo?(honzab.moz)

Bryce, thanks for reporting this. But I will either need exact STR listing here or ask you to provide logs: HTTP logging - Mozilla | MDN

Thanks!

Flags: needinfo?(honzab.moz) → needinfo?(bvandyk)
Attached file HttpLogs.zip (obsolete) —

Logs of fault reproed with following command:

MOZ_LOG=timestamp,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5 MOZ_LOG_FILE=$TEMP/BryceLog.txt ./mach run
Flags: needinfo?(bvandyk)

Oh... I missed that this is an assertion failure. In that case the sync module must be added to the MOZ_LOG list to flush each line.. I'm sorry I forgot about that bit. The log is unfortunately incomplete and I can't make any conclusion out of it.

Bryce, can you please reproduce again with the sync bit added to MOZ_LOG?

Sorry again, and thanks!

Flags: needinfo?(bvandyk)
Attachment #9104037 - Attachment is obsolete: true
Flags: needinfo?(bvandyk)
Attached file HttpLogs2.zip
Priority: -- → P2
Whiteboard: [necko-triaged]

Thanks, I can see what happens now! This is a regression from bug 1391693.

Content makes a request to https://shaka-player-demo.appspot.com/lib/debug/log.js. There is a service worker registered that intercepts the channel on the parent process and internally redirects to InterceptedHttpChannel.

At HttpChannelParent.cpp - mozsearch we replace mChannel of the pre-redirect HttpChannelParent with the post-redirect channel. It's too soon to do so.

During the async processing of the redirect the child channel gets cancelled (looks like the whole page load is cancelled), it happens before the redirection is asynchronously confirmed.

Then, according the contract, we notify OnStart and OnStop of the listener of the (pre-redirect) nsHttpChannel with the error code; the listener is the pre-redirect HttpChannelParent. Then the assertion fails, because mChannel is the InterceptedHttpChannel object while aRequest is the original nsHttpChannel.

2019-10-25 20:15:18.687000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::RecvCancel [this=00000174AED4EAC0]
2019-10-25 20:15:18.687000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpAsyncAborter::AsyncAbort [this=00000174AAE32000 status=804b0002]
2019-10-25 20:15:18.697000 UTC - [Parent 39232: Main Thread]: D/nsHttp nsHttpChannel::OnRedirectVerifyCallback [this=00000174AC1CE000] result=0 stack=1 mWaitingForRedirectCallback=1
2019-10-25 20:15:18.697000 UTC - [Parent 39232: Main Thread]: D/nsHttp nsHttpChannel::ContinueAsyncRedirectChannelToURI [this=00000174AC1CE000]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::OnRedirectResult [this=00000174AED4EAC0, suc=0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::CompleteRedirect [this=00000174AED4EAC0 succeeded=0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::CleanupBackgroundChannel [this=00000174AED4EAC0 bgParent=00000174A5B42DE0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpBackgroundChannelParent::OnChannelClosed [this=00000174A5B42DE0]
2019-10-25 20:15:18.718000 UTC - [Parent 39232: Main Thread]: D/nsHttp nsHttpChannel::Cancel [this=00000174AC1CE000 status=804b0002]
2019-10-25 20:15:18.719000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpBaseChannel::DoNotifyListener this=00000174AC1CE000
2019-10-25 20:15:18.719000 UTC - [Parent 39232: Main Thread]: D/nsHttp ParentChannelListener::OnStartRequest [this=00000174AABE8A10]
2019-10-25 20:15:18.719000 UTC - [Parent 39232: Main Thread]: D/nsHttp HttpChannelParent::OnStartRequest [this=00000174AED4EAC0, aRequest=00000174AC1CE050]
--> boom!

Moving to service workers.

Josh, can you please take a look? Thanks.

Component: Networking: HTTP → DOM: Service Workers
Flags: needinfo?(josh)
Priority: P2 → --
Regressed by: 1391693
Whiteboard: [necko-triaged]

I haven't touched Service Worker code in several years at this point. Redirecting to the active developers :)

Flags: needinfo?(josh) → needinfo?(bugmail)
Priority: -- → P2

Reproducible with https://www.wayfair.ca/keyword.php?keyword=

Assertion failure: mChannel == chan (HttpChannelParent getting OnStartRequest from a different HttpBaseChannel instance), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelParent.cpp:1375
#01: mozilla::net::HttpBaseChannel::DoNotifyListener() [netwerk/protocol/http/HttpBaseChannel.cpp:3033]
#02: mozilla::net::nsHttpChannel::ContinueAsyncRedirectChannelToURI(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:3249]
#03: mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:9252]
#04: mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() [netwerk/base/nsAsyncRedirectVerifyHelper.cpp:41]
#05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1225]

^

Flags: needinfo?(honzab.moz)

(In reply to Bob Clary [:bc:] from comment #9)

Reproducible with https://www.wayfair.ca/keyword.php?keyword=

Assertion failure: mChannel == chan (HttpChannelParent getting OnStartRequest from a different HttpBaseChannel instance), at /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelParent.cpp:1375
#01: mozilla::net::HttpBaseChannel::DoNotifyListener() [netwerk/protocol/http/HttpBaseChannel.cpp:3033]
#02: mozilla::net::nsHttpChannel::ContinueAsyncRedirectChannelToURI(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:3249]
#03: mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:9252]
#04: mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() [netwerk/base/nsAsyncRedirectVerifyHelper.cpp:41]
#05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1225]

Bob, anything special in your setup? Can't repro with today debug Nightly and a clean profile.

If you can, please provide a log: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
PLEASE ADD sync module to the MOZ_LOG list (important).

Thanks!

Flags: needinfo?(honzab.moz)
Attached file log.zip

Nothing special. I had to reload a few times this morning though it asserted quickly when I had the MOZ_LOG set. This was locally with a 20191106233932 build. I'll dl this morning's and attempt again with that.

(In reply to Bob Clary [:bc:] from comment #12)

Created attachment 9108118 [details]
log.zip

Nothing special. I had to reload a few times this morning though it asserted quickly when I had the MOZ_LOG set. This was locally with a 20191106233932 build. I'll dl this morning's and attempt again with that.

Thanks, this is actually exactly the same as what I've already found in comment 7...

There are two ways to fix this:

  1. do this properly: not set mChannel to the intercepted channel that soon, but only after the internal redirect is actually confirmed, in HttpChannelParent::OnRedirectResult (it means to save the intercepted channel instance in a new member or extend the OnRedirectResult signature, the latter preferred)
  2. save the original nsHttpChannel to a debug-only member and extend the assertion to accept notification from both (it kinda ruins the purpose of that assertion, though)

waiting for Andrew's vote on this.

fwiw I was able to reproduce with this morning's Nightly on Fedora 30.

"Do this properly" sounds good, but does this impede the explicit goal of "None of the redirect IPC messaging takes place." ?

Flags: needinfo?(bugmail)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #15)

"Do this properly" sounds good, but does this impede the explicit goal of "None of the redirect IPC messaging takes place." ?

Yes. The OnRedirectResult callback is called w/o any IPC involvement. It is simply called (synchronously) after we return from the new (post-redirect) channel's AsyncOpen successfully, which is the spot you should update mChannel in the parent.

Great, thank you for the clarification. It sounded like in comment 13 you might be willing to fix this, which would be very much appreciated!

The fuzzers are hitting this frequently but it seems to be very difficult to reproduce consistently with the generated test cases and therefore reduce. I am able to repro easily using the STR above and I would be happy to create a Pernsoco session. It seems that the cause is understood but ni? me if a Pernosco session would still be helpful.

Are there user-facing consequences for this that make us want to fix this in 73 with SW-e10s?

Flags: needinfo?(bugmail)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #17)

It sounded like in comment 13 you might be willing to fix this, which would be very much appreciated!
Hi Honza, it would be really appreciated if you find some time for this! If not, please let me know such that we put it into our (long) queue. Thank you!

Flags: needinfo?(honzab.moz)

No, I wasn't willing to fix this, only stated what should be fixed. Please set a priority of this bug if you want someone from the Necko team to work on this sooner or later. Thanks.

Flags: needinfo?(honzab.moz)

Hi Tyson, did you further try to reduce this? Thanks in advance!

Flags: needinfo?(tysonsmith)

Before this assert there is no check/assert, if mChannel is nullptr (like happens in bug 1572553 without consequences).

There are quite some places we set mChannel to nullptr. I found:
HttpChannelParent::NotifyDiversionFailed
HttpChannelParent::AsyncOpenFailed
HttpChannelParent::RecvDocumentChannelCleanup
HttpChannelParent::NotifyDiversionFailed

And there are even more places we use it, some without checking. One is here.

As a first measure, I would suggest to add explicit MOZ_ASSERT(mChannel); to all uses of mChannel in order to be sure, if we are seeing a nullptr or really a wrong channel.

:asuth, are we able to say something about the consequences for the user if this happens?

(In reply to Jens Stutte [:jstutte] from comment #24)

:asuth, are we able to say something about the consequences for the user if this happens?

We're not currently expecting any meaningful user impact on non-debug builds. It's an invariant check that we do want to make pass, but this is about channel cancellation, which means the user has already aborted the underlying navigation and we expect the entirety of the channel object graph to be appropriately cleaned up as a result.

I continue to hit this regularly. I suggest we comment out the assert -- it appears that the issue is understood and that we have some idea how to fix, and I'm not sure if hitting the assertion is helping turn up any new information. I will submit a patch.

Keywords: leave-open

It's possible to hit the commented assertion in debug builds as the invariant it
is checking can be violated. Disable it to prevent crashes in debug builds until
the invariant holds.

Assignee: nobody → bvandyk
Status: NEW → ASSIGNED
Pushed by bvandyk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/69fa1204fcfd
Comment out assertion in HttpChannelParent as invariant does not hold. r=asuth

Unassigning since I was just disabling the assert.

Assignee: bvandyk → nobody
Status: ASSIGNED → NEW
Assignee: nobody → bvandyk
Status: NEW → ASSIGNED
Assignee: bvandyk → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(bugmail)
You need to log in before you can comment on or make changes to this bug.