Open Bug 1542037 Opened 7 months ago Updated 19 days ago

Crash in [@ mozilla::net::nsIOService::NewChannelFromURIWithProxyFlagsInternal]

Categories

(Core :: DOM: Security, defect, P2, critical)


Tracking Status
firefox-esr60 --- unaffected
firefox66 --- unaffected
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- fix-optional
firefox70 - fix-optional
firefox71 --- affected

People

(Reporter: calixte, Assigned: kershaw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, leave-open, regression, Whiteboard: [domsecurity-backlog3][necko-triaged][trr])

Crash Data

Attachments

(2 files)

This bug is for crash report bp-d536f568-32ef-4921-89aa-c2b440190404.

Top 10 frames of crashing thread:

0 libxul.so mozilla::net::nsIOService::NewChannelFromURIWithProxyFlagsInternal netwerk/base/nsIOService.cpp:924
1 libxul.so <name omitted> netwerk/base/nsIOService.cpp:1010
2 libxul.so NS_InvokeByIndex 
3 libxul.so XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1630
4 libxul.so XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:941
5 libxul.so js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:442
6 libxul.so Interpret js/src/vm/Interpreter.cpp:589
7 libxul.so js::RunScript js/src/vm/Interpreter.cpp:422
8 libxul.so js::ExecuteKernel js/src/vm/Interpreter.cpp:781
9 libxul.so EvaluateInEnv js/src/vm/Debugger.cpp:9289

There is 1 crash in nightly 68 with buildid 20190404063228. Based on the backtrace, the regression may have been introduced by patch [1] for bug 1541161.

[1] https://hg.mozilla.org/mozilla-central/rev?node=6dfafa75ea1a

Flags: needinfo?(ckerschb)

0 <TOP LEVEL> ["debugger eval code":2:26]
1 getEvalResult() ["resource://devtools/server/actors/webconsole/eval-with-debugger.js":134:27]
2 exports.evalWithDebugger() ["resource://devtools/server/actors/webconsole/eval-with-debugger.js":105:17]
3 evaluateJS() ["resource://devtools/server/actors/webconsole.js":1005:21]
4 evaluateJSAsync() ["resource://devtools/server/actors/webconsole.js":910:26]
5 evaluateJSAsync() ["self-hosted":1005:16]
6 onPacket() ["resource://devtools/server/main.js":1291:57]
7 send/<() ["resource://devtools/shared/transport/local-transport.js":64:22]
8 exports.makeInfallible/<() ["resource://devtools/shared/ThreadSafeDevToolsUtils.js":109:21]
9 exports.makeInfallible/<() ["resource://devtools/shared/ThreadSafeDevToolsUtils.js":109:21]

It seems someone created a channel in the console.

(In reply to Valentin Gosu [:valentin] from comment #1)

It seems someone created a channel in the console.

Right - at least we know our crash annotation with the JS stack is working. I suppose we leave this bug open for now to see whether more crashes come in that we can actually fix. This particular one doesn't seem like a problem to me, because calling Firefox internals from the console with the wrong arguments can produce undefined behavior.

Flags: needinfo?(ckerschb)

Set to P3 since the crash rate is quite low.

Priority: -- → P3
Whiteboard: [domsecurity-backlog3]

The crash volume on this bug continues to be fairly low on both 68 and 67.

This signature spiked up yesterday in nightly, a bunch of new crashes with "MOZ_RELEASE_ASSERT(false) (Please pass security info when creating a channel)".
Caller seems to be mozilla::net::TRR::SendHTTPRequest()

[Tracking Requested - why for this release]: This is quite prominent on 70 nightly at the moment. A spike started with build 20190716112534 and we are getting over 50 crashes per daily nightly.

OS: Linux → All
Hardware: Unspecified → All

Hello Christoph - Can you retriage per Comment 6? Thanks.

Flags: needinfo?(ckerschb)

Adding 70 as affected and adding the tracking flag again. For some reason it didn't stick in Comment 6.

Since the statuses are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

That's very high volume for nightly so I'm marking it as a blocker for 70.

Changing the priority to P2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information

Priority: P3 → P2

I chatted with Marcia and William and it seems the only stacktrace we have is the one from comment 1. Please note that in that case someone tried to create a channel from the console with incorrect arguments (see [0]), in which case I think it's actually fine to crash/fail.

If that crash came from our production code, however, that would be bad and we would need to fix it. I hope that William can make the annotation we added for the crash [1] searchable; otherwise it's really hard to debug the problem here without a stacktrace or any STRs.

:lizzard, or :marco is it possible that the high volume of crashes only comes from incorrect usage of that API from the console?

[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1541161#c0
[1] https://searchfox.org/mozilla-central/rev/5e660d3dfcba897c8501e3fda1d415565a096e7e/netwerk/base/nsIOService.cpp#896

Flags: needinfo?(ckerschb) → needinfo?(mcastelluccio)

The volume seems high enough to me that it would be quite strange if there were so many people every day running Firefox internal code in the console.

https://crash-stats.mozilla.org/report/index/014db719-ae98-4e14-86ee-161000190729 seems to have a different stack trace than the one in comment 0.

Flags: needinfo?(mcastelluccio)
See Also: → 1400485

Nhi, can you please get me some help from Necko folks for that problem here? I would have asked Valentin, but I think he is on leave, right?

It seems the problem with the stacktrace [1] was introduced by [2]. I manually looked through the codepath in [1], but it all seems correct to me. I don't know where we would lose the 'loadingPrincipal' information such that this code triggers the assertion described here.

Ultimately we would need a testcase for the Trusted Recursive Resolver (TRR) codepath that triggers the assertion.

[1] https://crash-stats.mozilla.org/report/index/014db719-ae98-4e14-86ee-161000190729
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1434852

Flags: needinfo?(nhnguyen)

Kershaw, could you have a look at this? Thanks!

Flags: needinfo?(nhnguyen) → needinfo?(kershaw)

This crash is really strange to me, since the TRR object should only be created and used in the parent process, but the crashes [1] all seem to have happened in a child process. According to Searchfox, nsDNSService is the only object that creates TRRService and nsHostResolver, which in turn are the only two objects that create TRR.
We use IsNeckoChild to prevent creating nsDNSService in a child process, so the only way IsNeckoChild can return false in a child process is if this is a middleman process. In summary, I think we should use XRE_IsChildProcess instead of IsNeckoChild in nsDNSService::GetXPCOMSingleton().

[1] https://crash-stats.mozilla.org/search/?build_id=20190727213541&release_channel=nightly&signature=%3Dmozilla%3A%3Anet%3A%3AnsIOService%3A%3ANewChannelFromURIWithProxyFlagsInternal&product=Firefox&version=70.0a1&process_type=content&date=%3E%3D2019-07-22T00%3A00%3A00.000Z&date=%3C2019-07-29T15%3A47%3A00.000Z&_facets=install_time&_facets=version&_facets=address&_facets=moz_crash_reason&_facets=reason&_facets=build_id&_facets=platform_pretty_version&_facets=signature&_facets=useragent_locale&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
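The process-type reasoning above can be modeled with a small sketch (all names below are illustrative stand-ins, not the real Gecko APIs): a check keyed on whether the Necko child actor exists reports "not a child" in a middleman process, where gNeckoChild is never created, while a check on the actual process type does not.

```cpp
#include <cassert>

// Stand-in for Gecko's process types (names assumed for illustration).
enum class ProcessType { Parent, Content, Middleman };

// Sketch of an IsNeckoChild-style check: true only when the Necko child
// actor (gNeckoChild) was actually created, which never happens in a
// middleman process.
bool IsNeckoChildLike(ProcessType type, bool neckoChildCreated) {
  return type != ProcessType::Parent && neckoChildCreated;
}

// The guard as described in the comment: create the parent-side
// nsDNSService whenever the Necko-child check says we are not a child.
bool WouldCreateParentDNSService_Old(ProcessType type, bool neckoChildCreated) {
  return !IsNeckoChildLike(type, neckoChildCreated);
}

// The proposed fix: key the guard on the real process type instead.
bool WouldCreateParentDNSService_New(ProcessType type) {
  return type == ProcessType::Parent;
}
```

In this model, a middleman process slips past the old guard (its Necko-child check is false, so it looks like a parent) but is correctly excluded by the new one.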

Flags: needinfo?(kershaw)
Whiteboard: [domsecurity-backlog3] → [domsecurity-backlog3][necko-triaged][trr]
Assignee: nobody → kershaw
  • This patch makes sure that we don't create nsDNSService in either the child process or the middleman process.
  • gNeckoChild won't be created in a middleman process, so it's fine to create ChildDNSService there.
  • Add some MOZ_DIAGNOSTIC_ASSERT calls in TRR, so we know where TRR is used in a child process.
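As a rough sketch of the third bullet (the assert placement and names here are assumptions, not taken from the actual patch), a diagnostic assertion at TRR construction makes any child-process use fail loudly; the stand-in below throws instead of aborting so the behavior can be shown directly.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for MOZ_DIAGNOSTIC_ASSERT: throws instead of crashing so the
// behavior can be demonstrated here (the real macro aborts the process
// in builds where diagnostic asserts are enabled).
void DiagnosticAssert(bool cond, const char* reason) {
  if (!cond) throw std::runtime_error(reason);
}

// Hypothetical TRR-like object that asserts its process type on creation.
struct TrrRequestSketch {
  explicit TrrRequestSketch(bool isParentProcess) {
    DiagnosticAssert(isParentProcess, "TRR must be in parent");
  }
};

// Returns whether construction succeeded for the given process type.
bool ConstructTrr(bool isParentProcess) {
  try {
    TrrRequestSketch t(isParentProcess);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```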

I'm getting these crashes more often lately. Any ETA for a patch on Nightly?

Keywords: leave-open
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a63bc3774244
Avoid accessing nsDNSService on middleman process r=dragana

Looks as if we already have a set of crashes that have the Moz Release Assert MOZ_RELEASE_ASSERT(XRE_IsParentProcess()) (TRR must be in parent): https://bit.ly/31rOiT5

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #21)

Looks as if we already have a set of crashes that have the Moz Release Assert MOZ_RELEASE_ASSERT(XRE_IsParentProcess()) (TRR must be in parent): https://bit.ly/31rOiT5

It looks like all the crashes are triggered from webrtc code. The good news is that XRE_IsParentProcess works as expected, so the assertion works.
I am not sure why XRE_IsContentProcess returns false here. Maybe webrtc also runs in processes other than the content process?
I think we should change "if (XRE_IsContentProcess())" to "if (!XRE_IsParentProcess())". This is the only way to make sure nsDNSService is only used in the parent process.
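The suggested guard change can be illustrated with a sketch (the process-type names are assumptions for illustration): checking specifically for "content process" misses other non-parent process types, while negating the parent check covers all of them.

```cpp
#include <cassert>

// Assumed process types for illustration; the real set in Gecko differs.
enum class ProcessType { Parent, Content, Gpu, Middleman };

bool IsContentProcess(ProcessType t) { return t == ProcessType::Content; }
bool IsParentProcess(ProcessType t)  { return t == ProcessType::Parent; }

// Old guard: only content processes are routed away from the parent-side
// DNS service, so other non-parent process types fall through.
bool TakesChildPath_Old(ProcessType t) { return IsContentProcess(t); }

// Proposed guard: every process that is not the parent takes the child
// path, so the parent-side nsDNSService can only run in the parent.
bool TakesChildPath_New(ProcessType t) { return !IsParentProcess(t); }
```

Under this model, a hypothetical GPU or middleman process falls through the old guard but is caught by the new one.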

Attachment #9084018 - Attachment is obsolete: true

Adding [@ mozilla::net::TRR::TRR ] so it gets picked up in crash stats.

Crash Signature: [@ mozilla::net::nsIOService::NewChannelFromURIWithProxyFlagsInternal] → [@ mozilla::net::nsIOService::NewChannelFromURIWithProxyFlagsInternal] [@ mozilla::net::TRR::TRR ]
Attachment #9084018 - Attachment is obsolete: false

This is marked blocking, maybe from the initial high volume when it was filed. I don't think it still needs to block considering that the crash is now barely showing up.
Kershaw, just let me know if you disagree. Are you still working on this issue?

Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e91774d533a9
Only create nsDNSService on parent process r=dragana

I think this can ride the trains with 71 at this point as we head into beta 13 of 14 total.
