Closed Bug 1686611 Opened 4 years ago Closed 4 years ago

Website load issues with network.dns.upgrade_with_https_rr

Categories

(Core :: Networking, defect, P1)

x86_64
Windows 10
defect

Tracking

RESOLVED FIXED
86 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox84 --- unaffected
firefox85 --- unaffected
firefox86 --- fixed

People

(Reporter: loic.yhuel, Assigned: kershaw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(3 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0

Steps to reproduce:

Try to load ipchicken.com or forums.mozillazine.org on Nightly, with network.dns.upgrade_with_https_rr=true (now enabled by default) and TRR enabled.

Actual results:

ipchicken.com first displays an error page saying the site cannot be found; after a refresh, the site loads correctly.

forums.mozillazine.org doesn't load, or loads only partially (it keeps loading forever).
Note that uBlock Origin seems to be needed to reproduce the issue.
Sometimes the load finishes but the banner image is missing: disabling and then re-enabling it in the CSS (devtools) triggers the request.

Looking at network requests in devtools doesn't show anything specific.
The IP addresses are resolved in about:networking#dns (sometimes there are up to 5 lines for the same hostname).
On about:networking#dnslookuptool, both hostnames show NS_ERROR_UNKNOWN_HOST for HTTP RR, but many other sites have this and don't fail.

Blocks: httpssvc
Component: Untriaged → Networking
OS: Unspecified → Windows 10
Product: Firefox → Core
Regressed by: 1680613
Hardware: Unspecified → x86_64
Has Regression Range: --- → yes

ipchicken.com seems to have the HTTPS RR "1 ipchicken.com (alpn="h2" ipv4hint="104.26.8.109, 104.26.9.109, 172.67.73.20" )". I'm not sure whether it changed, or whether I made a mistake earlier (checking while TRR and/or HTTPS RR was off).
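For readers unfamiliar with the record quoted above, here is a minimal Python sketch that splits it into priority, target name, and parameters. It assumes the display format exactly as quoted in this comment (not the RFC zone-file syntax), and the function name `parse_https_rr` is made up for illustration.

```python
import re

def parse_https_rr(text):
    """Parse a displayed record such as: 1 example.com (alpn="h2" ipv4hint="a, b" )."""
    match = re.match(r'\s*(\d+)\s+(\S+)\s*\((.*)\)\s*$', text)
    if match is None:
        raise ValueError(f"unrecognised record: {text!r}")
    priority, target, raw_params = match.groups()
    # Parameters are shown as key="value" pairs separated by spaces.
    params = dict(re.findall(r'(\w+)="([^"]*)"', raw_params))
    hints = [ip.strip() for ip in params.get("ipv4hint", "").split(",") if ip.strip()]
    return {
        "priority": int(priority),
        "target": target,
        "alpn": params.get("alpn"),
        "ipv4hint": hints,
    }

record = parse_https_rr(
    '1 ipchicken.com (alpn="h2" ipv4hint="104.26.8.109, 104.26.9.109, 172.67.73.20" )'
)
print(record["ipv4hint"])
```

The `ipv4hint` list is the set of addresses the browser may try before the regular A lookup completes, which is the part of the feature discussed in the following comments.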

http://ipchicken.com/ gets redirected to https://ipchicken.com in the address bar, but the connection error page is displayed.
A refresh then loads https://ipchicken.com correctly.

http://forums.mozillazine.org is different, since it has no HTTPS RR (and no HTTPS at all).
The page would make HTTP requests that could be upgraded to HTTPS (to www.google.com), but they are blocked by uBlock Origin.
So a full load with network.dns.upgrade_with_https_rr=false only shows requests to http://forums.mozillazine.org in devtools.

Flags: needinfo?(kershaw)

Thanks for the report.

The problem with http://ipchicken.com/ is that our implementation for using the IP hint address is kind of broken. The socket somehow failed to get the IP hint address from the DNS cache.
Just FYI, this patch should be able to fix this problem.

The problem with http://forums.mozillazine.org seems different, since I can't reproduce it on my side. Could you create an HTTP log for loading http://forums.mozillazine.org only?

Thanks.
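As a rough illustration of how such an HTTP log is usually captured, the Python sketch below builds the environment for a Firefox launch using the `MOZ_LOG` and `MOZ_LOG_FILE` variables. The module list is an assumption chosen to match the modules that appear in the log excerpts in this bug (`nsHttp`, `nsSocketTransport`), plus `nsHostResolver`. The log file name and the `firefox` binary name are placeholders.

```python
import os

def http_logging_env(log_path, base_env=None):
    """Return a copy of the environment with Firefox HTTP logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Module list is an assumption; adjust it to what the developer asks for.
    env["MOZ_LOG"] = "timestamp,nsHttp:5,nsSocketTransport:5,nsHostResolver:5"
    env["MOZ_LOG_FILE"] = log_path
    return env

env = http_logging_env("firefox-http.log")
print(env["MOZ_LOG"])

# To actually start the browser with logging (binary name is a placeholder):
# import subprocess
# subprocess.run(["firefox", "http://forums.mozillazine.org/"], env=env)
```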

Assignee: nobody → kershaw
Flags: needinfo?(kershaw) → needinfo?(hwti)
Severity: -- → S3
Priority: -- → P1
Whiteboard: [necko-triaged]

Set release status flags based on info from the regressing bug 1680613

Attached file http log (obsolete) —

Sadly, I'm not able to reproduce the issue on a blank profile today (I even tried restarting Firefox after enabling TRR and installing uBlock Origin).

So I created the log in my current session, with many windows open, so other requests are probably mixed in.
At the end, the tab loading http://forums.mozillazine.org/ is still blank, with the loading animation.

I see two TRR requests for forums.mozillazine.org 0.5s apart, which is strange.

There are a few odd traces for what should be cleartext connections:

2021-01-14 12:09:26.980000 UTC - [Parent 9700: Socket Thread]: V/nsHttp HalfOpenSocket::SetupStreams [this=000001D0335B6700 ent=........[tlsflags0x00000000]forums.mozillazine.org:80^partitionKey=%28http%2Cmozillazine.org%29] setup routed transport to origin forums.mozillazine.org:80 via :443
2021-01-14 12:09:27.228000 UTC - [Parent 9700: Socket Thread]: E/nsHttp nsHttpConnection::SetupSSL 000001D02CE71000 caps=0x401 ........[tlsflags0x00000000]forums.mozillazine.org:80^partitionKey=%28http%2Cmozillazine.org%29
Flags: needinfo?(hwti)

The two DNS requests can be seen in about:networking#dns:

forums.mozillazine.org	ipv4	true	140.211.166.86	56	^partitionKey=%28http%2Cmozillazine.org%29
forums.mozillazine.org	ipv4	true	140.211.166.86	55	^partitionKey=%28http%2Cmozillazine.org%29

But they aren't the only duplicated ones, for example:

  • detectportal.firefox.com (not using TRR) has two "ipv4" lines, one with an IPv4 address, and the other with the same IPv4 address plus an IPv6 address
  • after loading http://ipchicken.com, ipchicken.com is listed 5 times (3 with partitionKey, 2 without)
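The partitionKey column in the rows above is percent-encoded. A small Python sketch (the helper name `decode_partition_key` is hypothetical) decodes it, which makes it easier to compare rows that differ only by key:

```python
from urllib.parse import unquote

def decode_partition_key(suffix):
    """Decode a suffix like '^partitionKey=%28http%2Cmozillazine.org%29'."""
    prefix = "^partitionKey="
    if not suffix.startswith(prefix):
        return None  # row without a partition key
    inner = unquote(suffix[len(prefix):]).strip("()")
    scheme, _, site = inner.partition(",")
    return scheme, site

print(decode_partition_key("^partitionKey=%28http%2Cmozillazine.org%29"))
# → ('http', 'mozillazine.org')
```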

(In reply to Loïc Yhuel from comment #4)

It seems that the log file you uploaded is corrupted. I can't unpack it. Could you check again?

Flags: needinfo?(hwti)
Attached file http log

(In reply to Kershaw Chang [:kershaw] from comment #6)

It seems that the log file you uploaded is corrupted. I can't unpack it. Could you check again?

It's valid for me; maybe we are using different 7-Zip versions.
I have now uploaded it in zip format.

Attachment #9197070 - Attachment is obsolete: true
Flags: needinfo?(hwti)

(In reply to Loïc Yhuel from comment #8)

Thanks for the log.

However, I can't find anything wrong in the log. The log shows that all HTTP requests to http://forums.mozillazine.org/ succeeded. Maybe I missed something, or the log didn't include the failed HTTP request. Anyway, the problem with http://forums.mozillazine.org/ certainly has nothing to do with network.dns.upgrade_with_https_rr, so I'd like to file a separate bug for it.

Bug 1686828 filed for the problem about http://forums.mozillazine.org/.

(In reply to Kershaw Chang [:kershaw] from comment #9)

However, I can't find anything wrong in the log. The log shows that all HTTP requests to http://forums.mozillazine.org/ succeeded. Maybe I missed something, or the log didn't include the failed HTTP request. Anyway, the problem with http://forums.mozillazine.org/ certainly has nothing to do with network.dns.upgrade_with_https_rr, so I'd like to file a separate bug for it.

So the mentions of :443 and nsHttpConnection::SetupSSL are just misleading?
Are the multiple TRR resolve operations expected?

I cannot reproduce the issue with network.dns.upgrade_with_https_rr=false. Also, another person who reproduced it avoided the issue by either disabling TRR or disabling it for that specific site (http://forums.mozillazine.org/viewtopic.php?p=14882778#p14882778).
Could this be a timing issue, causing the load to get stuck?

Attached image browser console

Here is one case where the load seems to block very early.
In the browser console, the "GET http://forums.mozillazine.org/" was OK, but "GET http://forums.mozillazine.org/static/common/images/navHeader.gif" does not have any status (and all delays are at 0).

In about:networking#sockets, after the load is blocked for a few seconds, there are no sockets for 140.211.166.86 (forums.mozillazine.org).
But if I then cancel the load, I get one or several sockets, with 0 bytes sent and received.

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d799681d6a40 If DNS resolvation failed with RESOLVE_IP_HINT flag, try without the flag again r=necko-reviewers,dragana
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
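
The commit message above describes a retry strategy: if resolution fails while the IP hint flag is set, resolve again without it. The Python sketch below illustrates that control flow only; it is not the Gecko implementation. The flag value, the `UnknownHostError` exception, and the `resolve` callable are all hypothetical stand-ins.

```python
RESOLVE_IP_HINT = 0x1  # hypothetical value, for illustration only

class UnknownHostError(Exception):
    """Stand-in for a failed lookup (e.g. NS_ERROR_UNKNOWN_HOST)."""

def resolve_with_hint_fallback(resolve, host, flags):
    """Try resolve(host, flags); on failure with the hint flag set, retry without it."""
    try:
        return resolve(host, flags)
    except UnknownHostError:
        if not flags & RESOLVE_IP_HINT:
            raise  # nothing left to fall back to
        return resolve(host, flags & ~RESOLVE_IP_HINT)

def fake_resolve(host, flags):
    # Simulates the reported failure: the hint lookup finds nothing in the cache.
    if flags & RESOLVE_IP_HINT:
        raise UnknownHostError(host)
    return ["104.26.8.109"]

print(resolve_with_hint_fallback(fake_resolve, "ipchicken.com", RESOLVE_IP_HINT))
```

With this shape, a missing hint entry costs one extra lookup instead of producing the "site cannot be found" page on first load.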

(In reply to Loïc Yhuel from comment #11)

So the mentions of :443 and nsHttpConnection::SetupSSL are just misleading?

Right. It's just misleading.
Please see the log below. The second line suggests that the connection to forums.mozillazine.org uses port 443, but the port the socket actually used is still 80.

2021-01-14 12:09:26.980000 UTC - [Parent 9700: Socket Thread]: V/nsHttp Creating HalfOpenSocket [this=000001D0335B6700 trans=000001D03B73AF80 ent=forums.mozillazine.org key=........[tlsflags0x00000000]forums.mozillazine.org:80^partitionKey=%28http%2Cmozillazine.org%29]
2021-01-14 12:09:26.980000 UTC - [Parent 9700: Socket Thread]: V/nsHttp HalfOpenSocket::SetupStreams [this=000001D0335B6700 ent=........[tlsflags0x00000000]forums.mozillazine.org:80^partitionKey=%28http%2Cmozillazine.org%29] setup routed transport to origin forums.mozillazine.org:80 via :443
2021-01-14 12:09:26.980000 UTC - [Parent 9700: Socket Thread]: D/nsSocketTransport creating nsSocketTransport @000001D0308C5C00
2021-01-14 12:09:26.980000 UTC - [Parent 9700: Socket Thread]: E/nsSocketTransport nsSocketTransport::Init [this=000001D0308C5C00 host=forums.mozillazine.org:80 origin=forums.mozillazine.org:80 proxy=:0]
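
To check which port a socket really used, it is the `nsSocketTransport::Init` line that matters, not the "via :443" text. The Python sketch below pulls the host, origin, and proxy endpoints out of such lines. The regular expression is written against the line format shown in this excerpt and may not match other Firefox versions.

```python
import re

INIT_RE = re.compile(
    r"nsSocketTransport::Init \[this=(\w+) "
    r"host=(\S+):(\d+) origin=(\S+):(\d+) proxy=(\S*):(\d+)\]"
)

def socket_endpoints(log_lines):
    """Yield the endpoints recorded by each nsSocketTransport::Init line."""
    for line in log_lines:
        m = INIT_RE.search(line)
        if m:
            yield {
                "transport": m.group(1),
                "host": m.group(2),
                "port": int(m.group(3)),
                "origin_port": int(m.group(5)),
            }

sample = [
    "2021-01-14 12:09:26.980000 UTC - [Parent 9700: Socket Thread]: "
    "E/nsSocketTransport nsSocketTransport::Init [this=000001D0308C5C00 "
    "host=forums.mozillazine.org:80 origin=forums.mozillazine.org:80 proxy=:0]",
]
endpoints = list(socket_endpoints(sample))
print(endpoints[0]["host"], endpoints[0]["port"])
```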

Are the multiple TRR resolve operations expected?

I didn't dig into why we have multiple DNS records, but I think it's fine. DNS records should not block the page load.

I cannot reproduce the issue with network.dns.upgrade_with_https_rr=false. Also, another person who reproduced it avoided the issue by either disabling TRR or disabling it for that specific site (http://forums.mozillazine.org/viewtopic.php?p=14882778#p14882778).
Could this be a timing issue, causing the load to get stuck?

Could be, but I can't fix this without the log.

(In reply to Loïc Yhuel from comment #12)

Created attachment 9197225 [details]
browser console

Here is one case where the load seems to block very early.
In the browser console, the "GET http://forums.mozillazine.org/" was OK, but "GET http://forums.mozillazine.org/static/common/images/navHeader.gif" does not have any status (and all delays are at 0).

Unfortunately, the screenshot here does not match the log you attached. The log shows that the request for http://forums.mozillazine.org/static/common/images/navHeader.gif was served from cache. Could you try to reproduce this problem again with logging turned on?
It'd be great if you could upload the new log to Bug 1686828.

Thanks in advance for your help!

Flags: needinfo?(hwti)

Here's another site that behaves like mozillazine... https://forum.gigabyte.us/

I just installed this fix with the latest Nightly update. Mozillazine still doesn't load the first time around. IPChicken and the Gigabyte forum load OK now.

Flags: qe-verify+

I can't seem to reproduce the issue.
Can you please verify the fix on latest beta and Nightly?
Thank you.

(In reply to Oana Botisan, Desktop Release QA from comment #20)

network.dns.upgrade_with_https_rr has been turned off, so the default config wouldn't trigger the bug, even without the fix.

The commit here fixed http://ipchicken.com/, and http://forums.mozillazine.org/ has been fixed in https://bugzilla.mozilla.org/show_bug.cgi?id=1686828, so everything is fixed on Nightly.

On 86.0b5, even after enabling network.dns.upgrade_with_https_rr, I cannot reproduce the issue.

Flags: needinfo?(loic.yhuel)