Closed Bug 1872551 Opened 10 months ago Closed 9 months ago

Socket process sandbox prevents resolution of TURN hostnames on Win10 1607

Categories

(Core :: Networking: DNS, defect, P2)

Firefox 115
defect

Tracking

RESOLVED FIXED
123 Branch
Tracking Status
firefox123 --- fixed

People

(Reporter: proger.xp, Assigned: kershaw)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(1 file)

Steps to reproduce:

Firefox 115 ESR (Release), Windows 10 1607 LTSB. No add-ons or custom preferences; a fully vanilla build.

When a WebRTC session is initiated with TURN server(s) specified as hostnames rather than pre-resolved IPs, Firefox versions 100 through 115 and later fail to resolve them.

Actual results:

With MOZ_LOG=nsHostResolver:5,... enabled, I can see records like these:


35.270 - [Socket Thread]: D/nsHostResolver Resolving host [turn.redacted.org]<> type 0. [this=4b71180]
35.270 - [Socket Thread]: D/nsHostResolver No usable record in cache for host [turn.redacted.org] type 0.
35.270 - [Socket Thread]: D/nsHostResolver NameLookup host:turn.redacted.org af:2
35.270 - [Socket Thread]: D/nsHostResolver NameLookup: turn.redacted.org effectiveTRRmode: 1 flags: 20
35.270 - [Socket Thread]: D/nsHostResolver TRR service not enabled - off or disabled
35.270 - [Socket Thread]: D/nsHostResolver NativeLookup host:turn.redacted.org af:2
35.270 - [Socket Thread]: D/nsHostResolver DNS thread counters: total=1 any-live=0 idle=0 pending=1
35.270 - [Socket Thread]: D/nsHostResolver DNS lookup for host [turn.redacted.org] blocking pending 'getaddrinfo' or trr query: callback
35.270 - [Socket Thread]: D/nsHostResolver Resolving host [turn.redacted.org]<> type 0. [this=4b71180]
35.271 - [Socket Thread]: D/nsHostResolver No usable record in cache for host [turn.redacted.org] type 0.
35.271 - [Socket Thread]: D/nsHostResolver NameLookup host:turn.redacted.org af:23
35.271 - [Socket Thread]: D/nsHostResolver NameLookup: turn.redacted.org effectiveTRRmode: 1 flags: 80
35.271 - [Socket Thread]: D/nsHostResolver TRR service not enabled - off or disabled
35.271 - [Socket Thread]: D/nsHostResolver NativeLookup host:turn.redacted.org af:23
35.271 - [Socket Thread]: D/nsHostResolver DNS thread counters: total=2 any-live=0 idle=0 pending=2
35.271 - [Socket Thread]: D/nsHostResolver DNS lookup for host [turn.redacted.org] blocking pending 'getaddrinfo' or trr query: callback
35.271 - [Socket Thread]: D/nsHostResolver Resolving host [turn.redacted.org]<> type 0. [this=4b71180]
35.271 - [Socket Thread]: D/nsHostResolver Host [turn.redacted.org] is being resolved. Appending callback.
35.271 - [Socket Thread]: D/nsHostResolver Resolving host [turn.redacted.org]<> type 0. [this=4b71180]
35.271 - [Socket Thread]: D/nsHostResolver Host [turn.redacted.org] is being resolved. Appending callback.
35.271 - [Socket Thread]: V/mtransport NrIceCtx(PC:{}: gathering state 0->1
35.286 - [DNS Resolver #1]: D/nsHostResolver DNS lookup thread - starting execution.
35.286 - [DNS Resolver #1]: E/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [turn.redacted.org].
35.287 - [DNS Resolver #2]: D/nsHostResolver DNS lookup thread - starting execution.
35.287 - [DNS Resolver #2]: E/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [turn.redacted.org].
35.287 - [DNS Resolver #1]: E/nsHostResolver DNS lookup thread - lookup completed for host [turn.redacted.org]: failure: unknown host.
35.287 - [DNS Resolver #1]: D/nsHostResolver nsHostResolver::CompleteLookup turn.redacted.org 0 804B001E resolver=0 stillResolving=0
35.287 - [DNS Resolver #1]: D/nsHostResolver nsHostResolver record 4b4aac0 new gencnt
35.287 - [DNS Resolver #1]: D/nsHostResolver Caching host [turn.redacted.org] negative record for 60 seconds.
35.287 - [DNS Resolver #1]: D/nsHostResolver CompleteLookup: turn.redacted.org has NO address
35.287 - [DNS Resolver #1]: D/nsHostResolver nsHostResolver record 4b4aac0 calling back dns users status:804B001E
35.288 - [DNS Resolver #2]: E/nsHostResolver DNS lookup thread - lookup completed for host [turn.redacted.org]: failure: unknown host.


In particular, observe these lines:


35.287 - [DNS Resolver #1]: E/nsHostResolver DNS lookup thread - lookup completed for host [turn.redacted.org]: failure: unknown host.

35.287 - [DNS Resolver #1]: D/nsHostResolver CompleteLookup: turn.redacted.org has NO address
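
For longer logs it can help to extract the failures mechanically. The following is a minimal Python sketch that pulls failed lookups out of an nsHostResolver log and splits a status value such as 804B001E into its parts. The helper names are hypothetical, and the bit layout (failure flag in bit 31, module number stored with an offset of 0x45, error code in the low 16 bits) is my understanding of Mozilla's nsresult encoding, not something stated in this bug.

```python
import re

# Assumption: nsresult layout as I understand it from Mozilla's headers.
NS_ERROR_MODULE_BASE_OFFSET = 0x45

# Matches lines such as:
#   ... lookup completed for host [turn.redacted.org]: failure: unknown host.
FAILURE_RE = re.compile(
    r"lookup completed for host \[([^\]]+)\]: failure: (.+?)\.?$"
)


def failed_lookups(log_text):
    """Return (host, reason) pairs for every failed lookup, in log order."""
    results = []
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line.rstrip())
        if match:
            results.append((match.group(1), match.group(2)))
    return results


def decode_nsresult(value):
    """Split a 32-bit nsresult into failure flag, module number and code."""
    return {
        "failed": bool(value & 0x80000000),
        "module": ((value >> 16) - NS_ERROR_MODULE_BASE_OFFSET) & 0x1FFF,
        "code": value & 0xFFFF,
    }
```

For the status seen in the log, `decode_nsresult(0x804B001E)` gives module 6 and code 30, which I believe corresponds to the networking module's unknown-host error and so agrees with the "failure: unknown host" text.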


As a result, TURN candidates don't make it into the locally offered SDP, which prevents clients from reaching each other unless they have direct connectivity (that is, unless TURN is not needed). Setting media.peerconnection.ice.relay_only to true results in no candidates being offered at all, although at least the TURN candidate(s) should be present.
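
A quick way to confirm this symptom is to count the candidate types in the local SDP. The sketch below is a hypothetical helper, not part of Firefox; it assumes the usual `a=candidate:` attribute layout in which the candidate type follows the literal token `typ`, and it treats `relay` as the type produced by a TURN server.

```python
from collections import Counter


def candidate_types(sdp):
    """Count ICE candidates in an SDP blob by type (host, srflx, prflx, relay)."""
    counts = Counter()
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=candidate:"):
            continue
        fields = line.split()
        # The candidate type is the field that follows the token "typ".
        if "typ" in fields:
            idx = fields.index("typ")
            if idx + 1 < len(fields):
                counts[fields[idx + 1]] += 1
    return counts


def has_relay_candidate(sdp):
    """True if at least one TURN (relay) candidate is present."""
    return candidate_types(sdp)["relay"] > 0
```

With the bug present, `has_relay_candidate` would be expected to return False for the local description even though a TURN server was configured.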

The same Firefox version running on a more recent Windows (20H2) resolves the hostnames correctly.

Checking the host system's network activity at the time of the call shows that Firefox doesn't attempt to contact a DNS server when resolving turn.redacted.org.

I have tried adding turn.redacted.org to drivers/etc/hosts, enabling DoH (confirmed via about:networking and MOZ_LOG), and even visiting turn.redacted.org before making the call so that it lands in the cache (again confirmed via about:networking). Nothing worked; TURN hostnames still did not resolve.
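
To rule out the operating system's resolver itself, the same name can be looked up from an unsandboxed process. The following is a minimal Python sketch under that assumption; `resolve` is a hypothetical helper that wraps the standard `socket.getaddrinfo` call and accepts a replacement resolver so it can be exercised without network access.

```python
import socket


def resolve(host, family=socket.AF_UNSPEC, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses for host, or [] on failure."""
    try:
        infos = resolver(host, None, family, socket.SOCK_DGRAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

If `resolve("turn.redacted.org")` returns addresses on the affected machine while Firefox logs "unknown host", the failure is more likely to be in how the sandboxed process reaches the resolver than in the system DNS configuration.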

Expected results:

I then changed the WebRTC configuration, replacing the hostname-based TURN reference with its IP address, and it worked (TURN appeared in the local SDP and clients connected successfully).
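
This workaround can be automated on whatever side generates the ICE server list. The sketch below is a hypothetical Python helper, assuming URLs of the form `scheme:host[:port][?query]` with bracketed IPv6 literals; it rewrites a STUN/TURN URL so that the host part is an IP address resolved outside the browser. The function names and the injected `resolve` callable are illustrative only.

```python
import ipaddress
import re
import socket

# scheme ":" host [":" port] ["?" query]; IPv6 literals are bracketed.
ICE_URL_RE = re.compile(
    r"^(?P<scheme>stuns?|turns?):"
    r"(?P<host>\[[^\]]+\]|[^:?\[\]]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<query>\?.*)?$"
)


def _first_address(host):
    """Resolve host with the system resolver and return one address."""
    infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_DGRAM)
    return infos[0][4][0]


def _is_ip_literal(host):
    try:
        ipaddress.ip_address(host.strip("[]"))
        return True
    except ValueError:
        return False


def pre_resolve_ice_url(url, resolve=_first_address):
    """Return url with its hostname replaced by a resolved IP address."""
    match = ICE_URL_RE.match(url)
    if match is None:
        raise ValueError(f"not a STUN/TURN URL: {url!r}")
    host = match.group("host")
    if _is_ip_literal(host):
        return url  # already pre-resolved, leave untouched
    address = resolve(host)
    if ":" in address:
        address = f"[{address}]"  # IPv6 literals need brackets in the URL
    port = f":{match.group('port')}" if match.group("port") else ""
    return f"{match.group('scheme')}:{address}{port}{match.group('query') or ''}"
```

Note that this is only a stopgap: a fixed IP loses DNS-based failover, and for `turns:` URLs certificate validation against an IP address may fail unless the certificate covers that address.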

Finally, I discovered that disabling the socket process sandbox "fixes" TURN hostname resolution:

set MOZ_DISABLE_SOCKET_PROCESS_SANDBOX=1
firefox.exe
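
The same diagnostic launch can be scripted, for example when comparing many Firefox versions. This is a minimal Python sketch; the helper names and the default executable path are assumptions, and only the MOZ_DISABLE_SOCKET_PROCESS_SANDBOX variable comes from the commands above.

```python
import os
import subprocess


def sandbox_disabled_env(base_env=None):
    """Return a copy of the environment with the socket sandbox switch set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_DISABLE_SOCKET_PROCESS_SANDBOX"] = "1"
    return env


def launch_firefox(executable="firefox.exe", extra_args=()):
    """Start Firefox with the socket process sandbox disabled (diagnosis only)."""
    return subprocess.Popen([executable, *extra_args], env=sandbox_disabled_env())
```

Because the variable is set only in the child's environment, other Firefox instances started normally keep their sandbox.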

I have tested the last 20 Release versions and found that this problem first appeared in Firefox 100. Firefox 99 works just fine, as do all versions from several years back.

Also, though it's probably irrelevant, the latest Chrome on the same system performs the same WebRTC call as expected.

The Bugbug bot thinks this bug should belong to the 'Core::Networking: DNS' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Networking: DNS
Product: Firefox → Core

Kershaw, can you have a look at this?

Flags: needinfo?(kershaw)
Blocks: socket-proc
Severity: -- → S3
Flags: needinfo?(kershaw)
Priority: -- → P2
Whiteboard: [necko-triaged]

(In reply to Ed Guloien [:edgul] from comment #2)

Kershaw, can you have a look at this?

We will revisit this one when we decide to ship socket process.

(In reply to Kershaw Chang [:kershaw] from comment #3)

We will revisit this one when we decide to ship socket process.

What does it mean? I thought we already shipped socket process (bug 1763207).

(In reply to Masatoshi Kimura [:emk] from comment #4)

What does it mean? I thought we already shipped socket process (bug 1763207).

Sorry, I meant networking over socket process.
My previous comment was wrong, since this has nothing to do with it.

Bob, could you take a look at this? Can we adjust the sandbox policy to allow socket process to call GetAddrInfo?

Flags: needinfo?(bobowencode)

(In reply to Kershaw Chang [:kershaw] from comment #5)
...

Bob, could you take a look at this? Can we adjust the sandbox policy to allow socket process to call GetAddrInfo?

Happy to look at what might be getting blocked and why. Could you give me some steps to reproduce the problem? I'm not too familiar with WebRTC testing.
We don't have very fine-grained controls on the sandbox, so we can't just allow specific functions.

Flags: needinfo?(bobowencode) → needinfo?(kershaw)

(In reply to Bob Owen (:bobowen) from comment #6)

(In reply to Kershaw Chang [:kershaw] from comment #5)
...

Bob, could you take a look at this? Can we adjust the sandbox policy to allow socket process to call GetAddrInfo?

Happy to look at what might be getting blocked and why. Could you give me some steps to reproduce the problem? I'm not too familiar with WebRTC testing.
We don't have very fine-grained controls on the sandbox, so we can't just allow specific functions.

You could go to https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/ and just click the Gather candidates button.
If you enable MOZ_LOG=nsHostResolver:5 logging, you should be able to see the logs below. They mean that the socket process is trying to resolve stun.l.google.com.

[Socket 42206: Socket Thread]: D/nsHostResolver Resolving host [stun.l.google.com]<> type 0. [this=105835760]
[Socket 42206: Socket Thread]: D/nsHostResolver   No usable record in cache for host [stun.l.google.com] type 0.
[Socket 42206: Socket Thread]: D/nsHostResolver NameLookup host:stun.l.google.com af:30
[Socket 42206: Socket Thread]: D/nsHostResolver NameLookup: stun.l.google.com effectiveTRRmode: 1 flags: 80
[Socket 42206: Socket Thread]: D/nsHostResolver TRR service not enabled - off or disabled
[Socket 42206: Socket Thread]: D/nsHostResolver NativeLookup host:stun.l.google.com af:30
Flags: needinfo?(kershaw)

It looks like this is failing before Windows 10 version 1903.
If I change the integrity level from untrusted to low then it works.

kershaw - I notice that we normally do other DNS lookups in the parent, is there a reason why we don't do these lookups in the parent as well?

Flags: needinfo?(kershaw)
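
The version boundary mentioned in comment 8 can be expressed as a build-number check. The sketch below is hypothetical and rests on one assumption not stated in this bug: that Windows 10 version 1903 corresponds to build 18362 (so 1607 at build 14393 is affected and 20H2 at build 19042 is not). It also treats every earlier build as affected, which is only a reading of "before Windows 10 version 1903".

```python
import sys

# Assumption: Windows 10 version 1903 corresponds to build 18362.
WIN10_1903_BUILD = 18362


def build_is_affected(build_number):
    """True for builds older than Windows 10 version 1903."""
    return build_number < WIN10_1903_BUILD


def current_build_is_affected():
    """Check the running system; returns None when not on Windows."""
    if sys.platform != "win32":
        return None
    return build_is_affected(sys.getwindowsversion().build)
```

This matches the reporter's observations: the failing 1607 LTSB system falls below the threshold and the working 20H2 system falls above it.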

(In reply to Bob Owen (:bobowen) from comment #8)

It looks like this is failing before Windows 10 version 1903.
If I change the integrity level from untrusted to low then it works.

kershaw - I notice that we normally do other DNS lookups in the parent, is there a reason why we don't do these lookups in the parent as well?

That's because the DNS lookups here are from the webrtc code running in the socket process. Doing these lookups in the parent will add some extra IPC delay.

If it's not easy to modify the sandbox rules, I can create a patch to run DNS lookups in the parent process.
Bob, what do you think?

Flags: needinfo?(kershaw) → needinfo?(bobowencode)

(In reply to Kershaw Chang [:kershaw] from comment #9)

That's because the DNS lookups here are from the webrtc code running in the socket process. Doing these lookups in the parent will add some extra IPC delay.

If it's not easy to modify the sandbox rules, I can create a patch to run DNS lookups in the parent process.
Bob, what do you think?

It's fairly easy to modify, but not to allow just this; it means weakening the sandbox in quite a broad way.
Unless there is a plan to move the other DNS lookups into the Socket process, I think doing these ones in the parent process, where we do the other lookups, makes sense.

Flags: needinfo?(bobowencode)
Assignee: nobody → kershaw
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]

(In reply to Bob Owen (:bobowen) from comment #10)

It's fairly easy to modify, but not to allow just this; it means weakening the sandbox in quite a broad way.
Unless there is a plan to move the other DNS lookups into the Socket process, I think doing these ones in the parent process, where we do the other lookups, makes sense.

Note that we still want to move DNS lookups to the socket process in the future, but the timeline for this plan is not clear yet. I'll file a follow-up bug for modifying the sandbox rules.
Until that happens, we should fix this bug by moving DNS lookups back to the parent process.

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e376cec975f3 Use ChildDNSService in socket process when network.http.network_access_on_socket_process.enabled is false, r=necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch
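
As a rough conceptual model of the landed change (not the actual patch), the decision described in the commit message can be sketched in Python as follows. All class and function names here are hypothetical stand-ins; the only details taken from this bug are that lookups from the socket process go through a ChildDNSService-style path to the parent when network.http.network_access_on_socket_process.enabled is false, and that native lookups inside the sandboxed socket process fail on the affected Windows builds.

```python
class ParentProcessResolver:
    """Stands in for the parent process, which can reach the OS resolver."""

    def __init__(self, records):
        self._records = dict(records)

    def resolve(self, host):
        try:
            return self._records[host]
        except KeyError:
            raise LookupError(f"unknown host: {host}") from None


class LocalSandboxedResolver:
    """Stands in for a native lookup inside the sandboxed socket process,
    modelled here as it behaves on the affected Windows builds."""

    def resolve(self, host):
        raise LookupError(f"unknown host: {host}")


class ForwardingResolver:
    """Stands in for a ChildDNSService-style proxy: every lookup is handed
    to the parent (in Firefox this hop would be an IPC round trip)."""

    def __init__(self, parent):
        self._parent = parent

    def resolve(self, host):
        return self._parent.resolve(host)


def resolver_for_socket_process(network_access_on_socket_process, parent):
    """Pick the lookup path based on the (modelled) pref value."""
    if network_access_on_socket_process:
        return LocalSandboxedResolver()
    return ForwardingResolver(parent)
```

The trade-off discussed in the comments shows up directly: the forwarding path adds one hop per lookup but leaves the sandbox policy untouched.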