Crash in [@ nsCORSListenerProxy::OnStopRequest]
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
People
(Reporter: gsvelto, Assigned: mayhemer)
Details
(Keywords: crash, Whiteboard: [necko-triaged])
Crash Data
Attachments
(2 files)
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
This bug is for crash report bp-294c86aa-5506-4176-bd98-6dd940190421.
Top 10 frames of crashing thread:
0 libxul.so nsCORSListenerProxy::OnStopRequest netwerk/protocol/http/nsCORSListenerProxy.cpp:619
1 libxul.so mozilla::extensions::ChannelWrapper::RequestListener::OnStopRequest toolkit/components/extensions/webrequest/ChannelWrapper.cpp:945
2 libxul.so mozilla::net::HttpBaseChannel::DoNotifyListener netwerk/protocol/http/HttpBaseChannel.cpp:3278
3 libxul.so mozilla::net::HttpAsyncAborter<mozilla::net::nsHttpChannel>::HandleAsyncAbort netwerk/protocol/http/HttpBaseChannel.h:875
4 libxul.so mozilla::detail::RunnableMethodImpl<mozilla::net::nsHttpChannel*, void xpcom/threads/nsThreadUtils.h:1174
5 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1180
6 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
7 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:88
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
9 libxul.so nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
Comment 1•7 years ago
|
||
Looks like the crash is caused because |listener| [1] is null. Maybe we should add a null check.
Updated•7 years ago
|
Comment 2•6 years ago
|
||
Over 2600 crashes in Fennec 68 and was also high volume in Fennec 67. Perhaps we can do what Kershaw suggested in Comment 1? I also noticed a high correlation to ublock on release:
(62.06% in signature vs 06.63% overall) Module "uBlock0@raymondhill.net.xpi" = true
(62.06% in signature vs 07.15% overall) Addon "uBlock0@raymondhill.net" = true
Having said that, this isn't in the top 20 crash signatures overall at the moment on release.
| Assignee | ||
Comment 4•6 years ago
|
||
All crashing calls are coming from ChannelWrapper::RequestListener::OnStopRequest. Apparently, listener, loaded from mOuterListener is null at [1]. The only possibility here is that OnStopRequest on the CORS proxy is called more than once. This looks to me as a bug in the webext backend.
Moving.
Comment 5•6 years ago
|
||
All we're doing is firing off an event then passing on to the upstream listener[1] we got when calling setNewListener. So IIUC, we would have to be called twice as well. Or am I missing something here?
| Assignee | ||
Comment 6•6 years ago
|
||
Got it, thanks. Then this is a necko issue, but needs some deeper look because we have added duplication checks long ago. Apparently, we are still leaking somewhere, just don't know if this has been fixed in 69 or not, or at all.
Although, according this report it seems impossible:
HttpBaseChannel.cpp@a092972b53f0e566a36770e7b03363036ff820ec (annotated)
nsHttpChannel.cpp@a092972b53f0e566a36770e7b03363036ff820ec (annotated)
HttpBaseChannel::DoNotifyListener() checks and sets mOnStopRequestCalled (on stack for the crash). nsHttpChannel::ContinueOnStopRequest also sets (doesn't check, tho) mOnStopRequestCalled. These two are only places we call OnStopRequest on the listener.
mOnStopRequestCalled is not a bit field, so possible threading issue is off the table too. It's never dropped back to false (on current mozilla-central)
It's also not a reentrance issue, something we fixed in bug 1529911.
So, only other option is that the proxy is somehow shared by more than one channels. Back to Necko.
| Assignee | ||
Comment 8•6 years ago
|
||
One aspect is weird: the crash spikes are time-correlated, not build-id correlated. The sudden spike on Jul 12 and sudden drop on Jul 26 (but not back to the previous numbers!) look like an external influence. All crashes I examined had addons (uBlock, AdBlock, ...) installed. Some cancellation logic is in place here, that triggers a duplicate call path.
I was looking more deeply in the code and we can't share nsCORSListenerProxy among channels. Http channel is inserted one in the listener chain as the first thing in AsyncOpen (nsContentSecurityManager::doContentSecurityCheck call). This either synchronously fails, bubbling to asyncOpen result, or is done only once as we set a flag on the load info to not do this again (loadInfo->Set/GetInitialSecurityCheckDone(), in case of e.g. a redirect). The listener is only given to redirected-to channels, before that we don't call streamlistener methods on it, only the last channel does.
We have fixed duplicate calls (only possible cause of this bug then) in 69, bug 1529911. Fennec (all channels!) is ESR68 based. There are no desktop 69+ crashes.
Updated•6 years ago
|
Comment 9•6 years ago
|
||
Though looking at crash-stats, maybe this isn't a dupe after all? Looks like we're still seeing a lot of reports from Fennec in the wild long after bug 1529911 was shipped to it.
| Assignee | ||
Comment 10•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
Though looking at crash-stats, maybe this isn't a dupe after all? Looks like we're still seeing a lot of reports from Fennec in the wild long after bug 1529911 was shipped to it.
Thanks for dropping a note Ryan. I wanted to check myself.
I'll look at this later, because I'm running out of ideas what could be the cause. Wish we could switch to Fenix right now :/
| Assignee | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 11•6 years ago
|
||
Crash stats correlations shows a pretty high correlation to Ublock:
(67.07% in signature vs 07.01% overall) Module "uBlock0@raymondhill.net.xpi" = true
(67.07% in signature vs 07.78% overall) Addon "uBlock0@raymondhill.net" = true
(100.0% in signature vs 49.79% overall) address = 0x0
(100.0% in signature vs 52.64% overall) moz_crash_reason = null
(50.81% in signature vs 04.25% overall) Addon "uBlock0@raymondhill.net" Version = 1.23.0
| Assignee | ||
Comment 12•6 years ago
|
||
Yep, and I've spent an hour or so trying to reproduce it on my Android tablet with Fennec Nightly and that addon.
| Assignee | ||
Comment 13•6 years ago
|
||
OK, I don't remember exactly if my code investigation was done on m-c (69 or 70) or esr68. There might be a different mistake in the esr68 code causing this, but probably well hidden as the code around the CORS listener proxy has not changed that much in between. Then I'm not sure how much effort we should give this as this is a Tier2 product now (am I right?). The crash rate is not really low, so we could simply add a non-null check and be done, as a crash workaround.
| Assignee | ||
Comment 14•6 years ago
|
||
Plan:
- add a non-null check and simply do nothing when the listener has already been released
- make it initially "nightly only" or "beta only" to catch possible uncovered issues before we affect the release update channel
- make it permanently "fennec only" to not effect any other ESR based product
| Assignee | ||
Comment 15•6 years ago
|
||
| Assignee | ||
Comment 16•6 years ago
|
||
Comment on attachment 9120122 [details]
Bug 1546191 - Add a null-check for listener call in nsCORSListenerProxy::OnStopRequest to prevent a null-dereference crash, r=michal
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is an uplift effecting only Fennec Nightly and Beta and is to prevent a null-dereference crash that is hard to diagnose but has a higher crash rate. Adding only a non-null check is likely to work around the problem w/o any other symptoms. When proved, we will allow for Fennec Release as well.
Note that other ESR-based products are not affected by this patch.
- User impact if declined:
- Fix Landed on Version: none
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): We believe there will no side effect, but as no one understands the cause, it's hard to say - hence MEDIUM.
- String or UUID changes made by this patch:
Comment 17•6 years ago
|
||
Comment on attachment 9120122 [details]
Bug 1546191 - Add a null-check for listener call in nsCORSListenerProxy::OnStopRequest to prevent a null-dereference crash, r=michal
Nightly/Beta-only patch to hopefully help a Fennec crash. Approved for 68.5b4.
Comment 18•6 years ago
|
||
| bugherder uplift | ||
Comment 19•6 years ago
|
||
Setting this back to affected and removing the approval flag so we can keep an eye on the Nightly/Beta results and eventually do something for Release.
Comment 20•6 years ago
|
||
Comment on attachment 9120122 [details]
Bug 1546191 - Add a null-check for listener call in nsCORSListenerProxy::OnStopRequest to prevent a null-dereference crash, r=michal
(Just to get this off the needs-uplift radar)
| Assignee | ||
Comment 21•6 years ago
|
||
| Assignee | ||
Comment 22•6 years ago
•
|
||
Comment on attachment 9126164 [details]
Bug 1546191 - Add a null-check for listener call in nsCORSListenerProxy::OnStopRequest to prevent a null-dereference crash on all Fennec channels, r=michal
This is an all-release-channels patch for Fennec to fix the crash also on late beta and release. We have not seen any major and obvious regressions so far with the nightly/early beta patch.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Please see comment 16.
- User impact if declined:
- Fix Landed on Version:
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Comment 23•6 years ago
|
||
Comment on attachment 9126164 [details]
Bug 1546191 - Add a null-check for listener call in nsCORSListenerProxy::OnStopRequest to prevent a null-dereference crash on all Fennec channels, r=michal
Apply the crash fix to all Fennec channels. Approved for Fennec 68.6.
Comment 24•6 years ago
|
||
| bugherder uplift | ||
| Assignee | ||
Comment 25•6 years ago
|
||
Marking fixed now.
Comment 26•6 years ago
|
||
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Description
•