Crash in [@ mozilla::net::nsHttpTransaction::Finish0RTT]
Categories
(Core :: Networking, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox85 | --- | wontfix |
firefox86 | --- | wontfix |
firefox87 | --- | fixed |
People
(Reporter: furkan, Assigned: kershaw)
References
Details
(Keywords: crash, regression, Whiteboard: [necko-triaged])
Crash Data
Attachments
(1 file)
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/2c3db9f9-54e7-498d-b678-d61050210110
Reason: SIGSEGV /SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so mozilla::net::nsHttpTransaction::Finish0RTT netwerk/protocol/http/nsHttpTransaction.cpp:2902
1 libxul.so mozilla::net::Http2Session::Finish0RTT netwerk/protocol/http/Http2Session.cpp:3459
2 libxul.so mozilla::net::nsHttpConnection::EnsureNPNComplete netwerk/protocol/http/nsHttpConnection.cpp:574
3 libxul.so mozilla::net::nsHttpConnection::OnSocketWritable netwerk/protocol/http/nsHttpConnection.cpp:1953
4 libxul.so {virtual override thunk}
5 libxul.so mozilla::net::nsSocketOutputStream::OnSocketReady netwerk/base/nsSocketTransport2.cpp:520
6 libxul.so mozilla::net::nsSocketTransport::OnSocketReady netwerk/base/nsSocketTransport2.cpp:2296
7 libxul.so mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1156
8 libxul.so {virtual override thunk}
9 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1200
Updated•3 years ago
|
Updated•3 years ago
|
Comment 1•3 years ago
|
||
Looks like a dupe of bug 1685923.
I've been getting these crashes lately (1-2x per day). This is on a Windows laptop running Nightly with Fission disabled. My work computer (where Nightly sees much more use) is MacOS running Nightly+Fission, and I haven't seen the crash there.
If there are any diagnostics, logging, etc. that would be helpful for me to run, let me know. I have a couple of sites that seem to trigger it more frequently and I can probably catch it in the act.
Comment 2•3 years ago
|
||
(In reply to Kevin Jacobs [:kjacobs] from comment #1)
Looks like a dupe of bug 1685923.
They are dups.
I've been getting these crashes lately (1-2x per day). This is on a Windows laptop running Nightly with Fission disabled. My work computer (where Nightly sees much more use) is MacOS running Nightly+Fission, and I haven't seen the crash there.
If there are any diagnostics, logging, etc. that would be helpful for me to run, let me know. I have a couple of sites that seem to trigger it more frequently and I can probably catch it in the act.
A http log would be useful, but you would need to run it all the time, because I need to figure out what is exactly happening. it will make browser slow.
If you could do that for a day or until it crashes that would be great.
Logging instruction:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
you can use rotate to limit log file.
Thanks!
Comment 3•3 years ago
|
||
Okay, I've emailed you HTTP logs of the crash with Fission both enabled and disabled.
Let me know if you need me to try anything else.
Comment 4•3 years ago
|
||
This is related to HTTPS RR records. We restart a transaction 2 times and we have the transaction 2 time added to nsHttpConnectionMgr.
Kershaw, ca you take a look? I am no sure what is a right solution here. Restarting transaction on error is what cause the problem.
We are also to aggressive when we restart. We are restarting a connection even if it is cancled by nsHttpChannel which is not right. The second restart in the log that I will send you is due to nsHttpChannel canceling transaction because it got a cache response first (race cache with network feature).
Assignee | ||
Comment 5•3 years ago
|
||
I think we should not restart the transaction when the close reason is not a network error.
Assignee | ||
Comment 6•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
I've tried to create a short list of error codes that are allowed to restart transactions with HTTPS RR.
static inline bool AllowedErrorForHTTPSRRFallback(nsresult aError) {
return psm::IsNSSErrorCode(-1 * NS_ERROR_GET_CODE(aError)) ||
aError == NS_ERROR_NET_RESET ||
aError == NS_ERROR_CONNECTION_REFUSED ||
aError == NS_ERROR_UNKNOWN_HOST;
}
I assume we want to be conservative about restarting transactions and I am also wondering if we want to add NS_ERROR_NET_TIMEOUT
into this list, but there are couple of places (here and here) that channels could be canceled by this error code. Currently, we don't know if the transaction is closed by nsHttpChannel::Cancel
or not, so it'd be a bit tricky if we want to add NS_ERROR_NET_TIMEOUT
into this list.
I also tried to figure out if nsHttpChannel::Cancel
could be called by the error codes listed above. There are two cases where this could happen.
nsHttpChannel::Cancel
is called bynsHttpChannel::ProcessFailedProxyConnect
at here. This only happens whenCONNECT
is used, so we can easily filter this case out (I should file another bug for this as described here).nsHttpChannel::Cancel
is called with it'smStatus
error code. For example, inHTMLMediaElement::MediaLoadListener::OnStartRequest
it returns the request's status when it's failed and thenCancel
is called in HttpChannelChild::DoOnStartRequest. This should be fine, since the transaction should be already closed.
Except two cases above, I think it should be safe to restart transactions with these error codes.
Dragana, could you take a look and let me know your thoughts? Thanks.
Comment 8•3 years ago
|
||
Thank you for investigating this!!!!!!
(In reply to Kershaw Chang [:kershaw] from comment #7)
I've tried to create a short list of error codes that are allowed to restart transactions with HTTPS RR.
static inline bool AllowedErrorForHTTPSRRFallback(nsresult aError) { return psm::IsNSSErrorCode(-1 * NS_ERROR_GET_CODE(aError)) || aError == NS_ERROR_NET_RESET || aError == NS_ERROR_CONNECTION_REFUSED || aError == NS_ERROR_UNKNOWN_HOST; }
I assume we want to be conservative about restarting transactions and I am also wondering if we want to add
NS_ERROR_NET_TIMEOUT
into this list, but there are couple of places (here and here) that channels could be canceled by this error code. Currently, we don't know if the transaction is closed bynsHttpChannel::Cancel
or not, so it'd be a bit tricky if we want to addNS_ERROR_NET_TIMEOUT
into this list.
This is worrying. I think we should include NS_ERROR_NET_TIMEOUT. At lease the firs link uses that error code so that a proper error page is shown. I do not really know what the second link is doing :)
What about introducing NS_ERROR_NET_TIMEOUT_EXTERNAL ? that behaves the same as NS_ERROR_NET_TIMEOUT, but we can distinguish them.
I also tried to figure out if
nsHttpChannel::Cancel
could be called by the error codes listed above. There are two cases where this could happen.
This will probably never happen. As you pointed out we should not use ECH for the connection to the proxy (only for the inner TLS to the end server), and we cannot use it for altSvc also because HTTP3 does not support CONNECT at moment.
nsHttpChannel::Cancel
is called with it'smStatus
error code. For example, inHTMLMediaElement::MediaLoadListener::OnStartRequest
it returns the request's status when it's failed and thenCancel
is called in HttpChannelChild::DoOnStartRequest. This should be fine, since the transaction should be already closed.Except two cases above, I think it should be safe to restart transactions with these error codes.
Dragana, could you take a look and let me know your thoughts? Thanks.
This is a good set of errors. Let me know what you think about NS_ERROR_NET_TIMEOUT_EXTERNAL
Assignee | ||
Comment 9•3 years ago
|
||
This is a good set of errors. Let me know what you think about NS_ERROR_NET_TIMEOUT_EXTERNAL
Thanks for this suggestion.
I think the way we want to proceed is to avoid channels being cancelled with hard error codes
, so I'll create this new error code.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 10•3 years ago
|
||
¡Hola y'all!
Nightly crashed like https://crash-stats.mozilla.org/report/index/ba9b040d-20a5-456e-88c1-0b5d30210204
Is this the same bug?
Also, should this be set NEW now?
¡Gracias!
Alex
Comment 11•3 years ago
|
||
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/8a2fb0aee75c Only fallback to original conn info when network error happens r=necko-reviewers,valentin,dragana
Comment 12•3 years ago
|
||
bugherder |
Comment 13•3 years ago
|
||
The patch landed in nightly and beta is affected.
:kershaw, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Comment 14•3 years ago
|
||
Low level of crashes in beta and we are entering RC week, wontfix 86.
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Description
•