Closed Bug 1609203 Opened 4 years ago Closed 4 years ago

Some tabs fail to load on start with DNS "host not found" errors due to DoH, and work correctly without DoH

Categories

(Core :: Networking: DNS, defect, P1)

defect

Tracking


RESOLVED DUPLICATE of bug 1610836
Tracking Status
firefox73 --- unaffected
firefox74 + fixed

People

(Reporter: soeren.hentzschel, Assigned: valentin)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(3 files)

[Tracking Requested - why for this release]:
Firefox no longer restores all tabs after Firefox start.

For a few days now, Firefox has been unable to load all tabs after startup. The affected tabs are still there but show only a "page not found" error page. After a reload, the affected pages load without any issues. Sometimes only one tab is affected, sometimes two or three, but it's always only a small portion of all tabs. I am not sure whether non-pinned tabs are sometimes affected, but at least one pinned tab is always affected.

I have had this issue for a few days on two different computers: an old iMac with macOS 10.11 and a late-2018 MacBook with macOS 10.15.

Andrei, can you help find someone to reproduce/get a regression window? Thanks!

Flags: needinfo?(andrei.vaida)

Hello, while trying to find a regression window for this issue, I didn't manage to reproduce it on macOS 10.13 or on Linux 18.04 LTS. I tried two different Fx versions on both OSes, 74.0a1 (2020-01-13 and 2020-01-14 builds) and 73.0b4, as well as a few different scenarios with various numbers of pinned and normal tabs. Could you please give more details on how you ran into the issue, for example: whether it is a new profile, how many normal and pinned tabs are open, and whether any add-ons are installed?

Flags: needinfo?(andrei.vaida) → needinfo?(soeren.hentzschel)
Attached video screencast

I attached a screencast. It shows a restart of Firefox after a Firefox update (that's why you don't see Firefox for a few seconds). In this case only one pinned tab was affected. At the end of the video, I reload the tab to show that the page works.

Could you please give more details on how you ran into the issue, for example: whether it is a new profile, how many normal and pinned tabs are open, and whether any add-ons are installed?

  • It happens after every Firefox update; only the number of affected tabs varies.
  • Not a new profile, but more than one system is affected.
  • I always have at least three pinned tabs, sometimes more.
  • The number of non-pinned tabs varies a lot; I guess it was always between 0 and 50 over the last few days.
  • Yes, add-ons are installed, but a completely different set of add-ons on each system.
  • I noticed that Firefox does not even seem to try to load the affected tabs: the network error page appears immediately on affected tabs, with no waiting time at all.

Maybe related to bug 1242912? I don't know the exact date when the problem first occurred, but it definitely started in the same time frame.

Flags: needinfo?(soeren.hentzschel)

Hello, even with the details provided in comment 3, I am still unable to reproduce the issue on macOS 10.13. I tried two different Fx 74.0a1 builds (2020-01-01 and 2020-01-12), both with a new profile, at least 3 pinned tabs and about 2-5 normal tabs, and started the update manually from the About Firefox window. After the update everything loaded and the pinned tabs didn't show any errors.

I am not sure whether non-pinned tabs are sometimes affected, but at least one pinned tab is always affected.

Now I know that non-pinned tabs are also affected. I saw this on one computer yesterday and on the other a few minutes ago. In both cases, two pinned tabs and one unpinned tab didn't get restored.

Emma Malysz, as the author of the patch in bug 1242912, do you think there could be a connection? I don't know whether your patch caused this regression, but it affects how tabs are created for session restore, and the regression started around the time your patch landed.

Flags: needinfo?(emalysz)
QA Whiteboard: [qa-regression-triage]

So far, nobody else is able to reproduce. Some questions to try to narrow this down:

  • Does the same issue happen when using safe mode / turning off add-ons?
  • Can you attach your about:support data from at least one affected machine?
  • How often does this happen; 100% of the time, or most of the time, or ...?
  • on an affected machine, can you set browser.sessionstore.debug to true in about:config and reproduce, and see if anything shows up (in the browser console) about the tabs in question?
  • what kind of network error pages are you getting? The phrase "page not found" does not occur in localization files in our tree ( https://searchfox.org/mozilla-central/search?q=page%20not%20found ) as far as I can see, so I assume you're paraphrasing or translating from German or whatever. The original string is going to be more help - I can find the German and work back to what it actually is.
  • what version are you using? The summary and comment #0 all say "Firefox" - are you on release? bug 1242912 is only on nightly so far, so I'm assuming you're on nightly; if so, if you turn off network.process.enabled in about:config, does the problem go away? (please restore to default after testing)
  • any pattern in which sites this happens with?
  • when this happened with non-pinned tabs (comment #5), I assume this was a selected tab? Or do you have browser.sessionstore.restore_on_demand set to false, perhaps?
Flags: needinfo?(emalysz) → needinfo?(soeren.hentzschel)
Attached file about-support.txt

Does the same issue happen when using safe mode / turning off add-ons?

I'll try this soon, but first I'll answer the other questions. Leaving ni? so that I won't forget it.

Can you attach your about:support data from at least one affected machine?

I attached the about:support data from the 2018 MacBook Pro.

How often does this happen; 100% of the time, or most of the time, or ...?

Almost all starts of my Firefox are restarts after a Firefox update. In this scenario it happens 100% of the time. I'll try some regular starts later.

on an affected machine, can you set browser.sessionstore.debug to true in about:config and reproduce, and see if anything shows up (in the browser console) about the tabs in question?

I set it to true and will report back once I've reproduced the issue again.

what kind of network error pages are you getting? The phrase "page not found" does not occur in localization files in our tree ( https://searchfox.org/mozilla-central/search?q=page%20not%20found ) as far as I can see, so I assume you're paraphrasing or translating from German or whatever. The original string is going to be more help - I can find the German and work back to what it actually is.

Yes, I was paraphrasing from German, sorry. I see the following network error page:

Hmm. We’re having trouble finding that site.
We can’t connect to the server at www.facebook.com.

what version are you using? The summary and comment #0 all say "Firefox" - are you on release? bug 1242912 is only on nightly so far, so I'm assuming you're on nightly;

Yes, Firefox Nightly.

if so, if you turn off network.process.enabled in about:config, does the problem go away? (please restore to default after testing)

I will test this soon.

any pattern in which sites this happens with?

Not really, no. But it seems that when I have several pinned tabs, only the first one or two are affected, not the ones at the end. Maybe that's a coincidence, though.

when this happened with non-pinned tabs (comment #5), I assume this was a selected tab? Or do you have browser.sessionstore.restore_on_demand set to false, perhaps?

browser.sessionstore.restore_on_demand is set to true.

You're right, when it happened with a non-pinned tab, it was always the selected tab.

(In reply to Sören Hentzschel from comment #8)

Yes, I was paraphrasing from German, sorry. I see the following network error page:

Hmm. We’re having trouble finding that site.
We can’t connect to the server at www.facebook.com.

This is a DNS error, and you opted into a trusted resolver (DNS-over-HTTPS). Does (temporarily) flipping that pref help?

Also curious if there are any errors in the browser console even without the session store logging. It almost sounds like some kind of race condition to do with DNS-over-HTTPS...

Does (temporarily) flipping that pref help?

It seems so!

I did a small test session and tested different scenarios a few times. I was almost always able to reproduce the issue with DoH enabled but was never able to reproduce with DoH disabled.

Disabling all add-ons didn't make any difference, and setting network.process.enabled to false didn't help either.

With browser.sessionstore.debug set to false I don't see any relevant log messages in the browser console. With browser.sessionstore.debug set to true I see the following session restore related messages in the browser console:

SessionStore: [process-switch]: currentRemoteType (web) preferredRemoteType: web
SessionStore: [process-switch]: web, web, false
SessionStore: [process-switch]: type (web) is compatible - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring
SessionStore: [process-switch]: remote subframes disabled - ignoring

Flags: needinfo?(soeren.hentzschel)

Alright, let's move this over to DNS land, as that's what's breaking here.

A regression window would still be super helpful. You can make mozregression clone an existing profile, so that it uses the same prefs and session restore file, by passing -p path/to/profile and --profile-persistence clone. Would it be possible for you to look for a regression window?
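As a rough illustration of the mozregression suggestion, the following Python sketch only assembles the command line; it does not bisect anything itself. It assumes mozregression's long option names `--good`, `--bad`, `--profile` and `--profile-persistence` (worth confirming with `mozregression --help`); the profile path and both dates are hypothetical placeholders, not values from this bug.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_mozregression_argv(profile: Path, good: str, bad: str) -> list[str]:
    """Assemble a mozregression command that bisects between two builds,
    cloning an existing profile so each tested build sees the same prefs
    and session restore file (assumed option names, see lead-in)."""
    return [
        "mozregression",
        "--good", good,  # a build date believed to be unaffected
        "--bad", bad,    # a build date known to show the bug
        "--profile", str(profile),
        # Run each build against a copy of the profile instead of the original.
        "--profile-persistence", "clone",
    ]


if __name__ == "__main__":
    # Hypothetical placeholders: substitute a real profile path and dates.
    argv = build_mozregression_argv(
        Path("path/to/profile"), good="2020-01-01", bad="2020-01-14"
    )
    print(shlex.join(argv))
```

Cloning keeps the real profile and its session restore file intact across the many builds a bisection launches.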

Component: Session Restore → Networking: DNS
Flags: needinfo?(soeren.hentzschel)
Product: Firefox → Core
Summary: Firefox no longer loads all tabs on start → Some tabs fail to load on start with DNS "host not found" errors due to DoH, and work correctly without DoH

To be clear, we're 100% certain that Beta is unaffected here? DoH is due to ship enabled by default for Fx73.

Thank you, Gijs. That was very useful information on how to use mozregression with my default profile. I was able to find the regressing bug. :)

18:13.79 INFO: Narrowed inbound regression window from [01d21786, e52b10b6] (3 builds) to [e1f5d7f5, e52b10b6] (2 builds) (~1 steps left)
18:13.79 INFO: No more inbound revisions, bisection finished.
18:13.79 INFO: Last good revision: e1f5d7f54d16f13836a3bb751bc1b32f28362c15
18:13.79 INFO: First bad revision: e52b10b6f0f771f5e9a39712c25da0fb59b2cf93
18:13.79 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e1f5d7f54d16f13836a3bb751bc1b32f28362c15&tochange=e52b10b6f0f771f5e9a39712c25da0fb59b2cf93
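The pushlog link in the mozregression output is the autoland pushlog page with the last good and first bad revisions as query parameters. A small Python sketch that rebuilds it from the two revisions, assuming only the URL pattern visible in the log (the helper name `pushlog_url` is hypothetical):

```python
from urllib.parse import urlencode

# Base URL copied from the mozregression output above.
AUTOLAND_PUSHLOG = "https://hg.mozilla.org/integration/autoland/pushloghtml"


def pushlog_url(last_good: str, first_bad: str, base: str = AUTOLAND_PUSHLOG) -> str:
    """Return the pushlog page for the range between the last good and the
    first bad revision reported by a bisection."""
    query = urlencode({"fromchange": last_good, "tochange": first_bad})
    return f"{base}?{query}"


print(pushlog_url(
    "e1f5d7f54d16f13836a3bb751bc1b32f28362c15",
    "e52b10b6f0f771f5e9a39712c25da0fb59b2cf93",
))
```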

RyanVM: With bug 1552176 as the regressing bug, Firefox 73 should be unaffected.

Flags: needinfo?(soeren.hentzschel)
Regressed by: 1552176
Has Regression Range: --- → yes

Thanks for the report, Sören! Could you also help us with some logs when reproducing the bug?
If you think the logs might include personal information, please send them to me via email instead of attaching them to the bug (https://send.firefox.com/ works well for this job).
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Logging_HTTP_activity_by_manually_setting_environment_variables
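The linked HTTP logging page works by setting the MOZ_LOG and MOZ_LOG_FILE environment variables before starting Firefox. The Python sketch below does the same from a script. The module list is the one commonly given on that page (an assumption worth rechecking against the current version); nsHostResolver is the DNS resolver module, the part of interest for a DoH failure. The helper names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
from typing import Mapping, Optional

# Module list as given in the HTTP logging guide (assumed; recheck the page).
MOZ_LOG_MODULES = (
    "timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5"
)


def logging_env(log_file: str, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the given (or current) environment and add the logging variables."""
    env = dict(os.environ if base is None else base)
    env["MOZ_LOG"] = MOZ_LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch_with_logging(firefox_binary: str, log_file: str) -> subprocess.Popen:
    """Start Firefox with HTTP/DNS logging enabled; the caller reproduces the
    bug and quits Firefox before collecting the log files."""
    return subprocess.Popen([firefox_binary], env=logging_env(log_file))
```

For example, `launch_with_logging("/Applications/Firefox Nightly.app/Contents/MacOS/firefox", "/tmp/log.txt")` on macOS (a hypothetical install path), then reproduce the failed tab restore, quit Firefox, and collect the log files it wrote under /tmp.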

Assignee: nobody → valentin.gosu
QA Whiteboard: [qa-regression-triage] → [qa-regression-triage][necko-triage][trr]
Flags: needinfo?(soeren.hentzschel)
Priority: -- → P1
Attached file logs.zip

Sure! I attached the logs.

Flags: needinfo?(soeren.hentzschel)

tagging Valentin to check out the logs.

Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-triaged]

Already assigned to me 🙂 Based on the logs, this seems to have the same cause as bug 1610836.

Flags: needinfo?(valentin.gosu)

Hi Sören, can you check with a build from https://nightly.mozilla.org/? Bug 1610836 just landed, so this should be fixed now.

Flags: needinfo?(soeren.hentzschel)

Yes, it seems to be fixed. Thanks!

Flags: needinfo?(soeren.hentzschel)
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE