Closed Bug 1243237 Opened 8 years ago Closed 8 years ago

DNS failure w/ upgrade to 44.0

Categories

(Core :: Networking: DNS, defect)

44 Branch
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 1243098
Tracking Status
firefox44 - affected

People

(Reporter: tgmct, Assigned: mcmanus)

References

Details

(Keywords: regression, regressionwindow-wanted, Whiteboard: [necko-active][44dll])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
Build ID: 20160123151951

Steps to reproduce:

Performed upgrade to version 44.0 via the "about Firefox" method.


Actual results:

Upgrade performed normally and Firefox restarted.  Firefox was unable to perform any DNS lookup after upgrade.  Shutting down and restarting Firefox did NOT correct the situation.  Other browsers on same machine operated normally.  A full machine reboot corrected the situation.


Expected results:

If a reboot is required, the upgrade process should have requested it.  There should also have been a warning indicating that a reboot would be required.
Component: Untriaged → Networking: DNS
Product: Firefox → Core
Tim, thanks for the report. A reboot is not normally required. You obviously hit a bug of some sort but I have no idea what it could be that a restart was not able to clear it.
I have a similar problem to Tim: connecting to sites works only via IP address; connecting via domain names fails.  Other browsers (Chrome 48.0.2564.82 m, IE 11.0.9600.18163) worked normally.  The only difference between myself and Tim is that a Windows reboot did not fix the problem.

I also uninstalled and re-installed 44.0 but the problem persisted.

I re-installed 43.0.3 from this link: https://ftp.mozilla.org/pub/firefox/releases/43.0.3/win64/en-US/
and the problem was resolved.
(In reply to Gordon Cook from comment #2)
> The only difference
> between myself and Tim is that a Windows reboot did not fix the problem.
> 

Me too. But after tests (safe mode, new profile, complete re-install), I found the pinned link in the bottom bar was corrupted. I made a new one and it works fine.
(In reply to Fred from comment #3)
>
> Me too. But after tests (safe mode, new profile, complete re-install), I
> found the pinned link in the bottom bar was corrupted. I made a new one and
> it works fine.

Can you elaborate further on that? I don't know what the pinned link in the bottom bar is.
Severity: normal → major
A duplicate thread (1243765) suggested a potential relationship to Nvidia hardware/software.  For whatever it's worth, this machine has an Nvidia Quadro 600 that was OEM installed.  The last driver/software update was performed in August of 2015 to version 353.82.  Firefox has been functioning normally for the most part since the upgrade.  The only issue that is new is that the tabs bar has shifted above the address bar.  My original version of FF is very old and I use the older menu style user interface.
(In reply to Patrick McManus [:mcmanus] from comment #4)

> 
> Can you elaborate further on that? I don't know what the pinned link in the
> bottom bar is.

Well, I don't know the correct name in English, but I mean the persistent taskbar buttons used to launch apps in W7.
This is now preventing DNS lookups on this machine again!  This is very odd, since nothing I know of has changed anything related to Firefox.  Intermittent bugs are very painful.  I will be downgrading to the previous version (43.0.4).  BTW Firefox Developer Edition (46.0a2) is working normally on the same machine.  Go figure.
I can reproduce this on release.
Status: UNCONFIRMED → NEW
Ever confirmed: true
By the way, something to keep in mind for once we have a fix: update doesn't work at this point.
Flags: needinfo?(ryanvm)
Patrick, Milan pinged me about this bug. As he commented above, he has a repro that we can use to investigate this further. This seems like a critical issue, and if we are able to root-cause it, we can consider this as a ride-along in a dot release or, if it is severe enough, as a dot release by itself. Thanks!
Flags: needinfo?(mcmanus)
Not sure regression range is possible, since the nightlies work fine...
milan, where is the repro?
Flags: needinfo?(mcmanus) → needinfo?(milan)
44 beta 9 is busted as well.  Yes, it reproduces on my desktop.  I can't get to any site, and "update" also just spins (better description for this is in bug 1243765)
Flags: needinfo?(milan)
Would it make sense for us to either hard code a set of IP addresses for our update servers as a fallback, or try a well known DNS server if OS level DNS resolution fails, or something like that to future proof against an issue like this going forward?
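A minimal sketch of the first fallback idea above, assuming a static address table that is consulted only when the OS resolver reports an error. The hostname, the address (taken from the 192.0.2.0/24 documentation range), and the function name are all hypothetical and do not describe anything Firefox ships.

```python
import socket

# Hypothetical static table: the hostname and address are placeholders,
# not real update servers.
FALLBACK_ADDRESSES = {
    "updates.example.org": ["192.0.2.10"],
}


def resolve_with_fallback(host, port=443, resolver=socket.getaddrinfo):
    """Return a list of IP strings, preferring the OS resolver.

    The static table is used only if the resolver raises an error.
    """
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        return [info[4][0] for info in infos]
    except OSError:
        # socket.gaierror is a subclass of OSError.
        return list(FALLBACK_ADDRESSES.get(host, []))
```

A fallback like this only helps when resolution fails with an error. If the lookup never returns at all, the caller would also need a timeout around the resolver call.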
OK, seems I have somehow "polluted" my machine with 44 Beta 9 install.  The 46 dev edition started exhibiting the same problem; resetting the profile helped, but now I have a startup crash (without the crash report, just nothing ever shows up) on all versions.
Please note: OS level DNS resolution is fully operational and working with other browsers on same machine at the same time.
(In reply to Milan Sreckovic [:milan] from comment #17)
> OK, seems I have somehow "polluted" my machine with 44 Beta 9 install.  The
> 46 dev edition started exhibiting the same problem; resetting the profile
> helped, but now I have a startup crash (without the crash report, just
> nothing ever shows up) on all versions.

In this state, when I run my nightly build from a few days ago, I get a "Couldn't load XPCOM" dialog.
let's get one of these
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

and a wireshark trace too
(In reply to Milan Sreckovic [:milan] from comment #19)
> (In reply to Milan Sreckovic [:milan] from comment #17)
> > OK, seems I have somehow "polluted" my machine with 44 Beta 9 install.  The
> > 46 dev edition started exhibiting the same problem; resetting the profile
> > helped, but now I have a startup crash (without the crash report, just
> > nothing ever shows up) on all versions.
> 
> In this state, when I run my nightly build from a few days ago, I get a
> "Couldn't load XPCOM" dialog.

that's not sounding a lot like dns anymore :(

when you just download and run a new nightly, perhaps without installing it, what happens?
OK, uninstalling 44 Beta 9 fixed the "won't start" issues.  I'll try logging from comment 20.
The log for (mostly) trying to get to yahoo.com, and failing.
please add nsHostResolver:5,GetAddrInfo:5 to the NSPR_LOG_MODULES from comment 20. Sorry I forgot to mention that - forgot the documentation didn't include DNS by default
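For anyone repeating this, a small sketch of how the extra modules could be appended before launching Firefox. The base module list here is a stand-in for whatever the HTTP logging page in comment 20 recommends, `NSPR_LOG_FILE` is assumed to be the log-file variable described on that page, and the helper name is hypothetical.

```python
import os


def build_log_env(base_modules, extra_modules, log_file, environ=None):
    """Return a copy of the environment with NSPR logging variables set."""
    env = dict(os.environ if environ is None else environ)
    modules = []
    for module in list(base_modules) + list(extra_modules):
        # Preserve order and skip duplicates.
        if module not in modules:
            modules.append(module)
    env["NSPR_LOG_MODULES"] = ",".join(modules)
    env["NSPR_LOG_FILE"] = log_file
    return env


# Example: add the two DNS modules requested above to an assumed base list.
env = build_log_env(
    ["timestamp", "nsHttp:5", "nsSocketTransport:5"],  # assumed base list
    ["nsHostResolver:5", "GetAddrInfo:5"],             # from this comment
    "firefox-dns.log",                                 # hypothetical path
    environ={},
)
```

The resulting mapping can be passed as the `env` argument of `subprocess.Popen` together with the path to the Firefox executable.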
the log from 23 is kinda interesting - but the additional GetAddrInfo stuff requested in comment 24 will narrow it down a bit more
Flags: needinfo?(milan)
does a new profile still fix the problem (i.e. if you send me the old profile is there a chance I can repro?)

where is this desktop physically?

the log makes it likely we're stuck in a windows system call.
This is getting fun.  I again got into the "won't start at all".  Then I actually got it to start, and it managed to go to yahoo.com.  Then it froze, I got a glimpse of a shockwave message, then my whole computer froze.  Will restart and see where I am.

This is in Toronto office, the Windows desktop machine at my desk.
Flags: needinfo?(milan)
New profile didn't seem to make a difference.
what version of windows (pretty much just for the record - I don't have a theory.)
fwiw - there is nothing in the dns code that looks suspicious in that window.. but hey, foresight isn't as good as hindsight.

is this a win64 build? does a win32 build work differently?

lsps installed?

nvidia?
This is Windows 7 SP1 64-bit, but Firefox is a 32-bit version.  Nvidia Quadro 600.  Don't know what lsps is.
Flags: needinfo?(ryanvm)
Maybe it's co-incidence, mine is also Windows 7 Professional SP1 64-bit.  Firefox was 32-bit version.  I am now running the 64-bit version without any issues. I have a GeForce 9500 GT on a Dell Optiplex 780.
(In reply to Milan Sreckovic [:milan] from comment #31)
>   Don't know what lsps is.

https://msdn.microsoft.com/en-us/library/windows/desktop/bb513664%28v=vs.85%29.aspx

and maybe http://www.cexx.org/lspfix.htm to list them (I can't vouch for that one.. I normally see them as dll's in crash stats)
(In reply to Gordon Cook from comment #32)
> Maybe it's co-incidence, mine is also Windows 7 Professional SP1 64-bit. 
> Firefox was 32-bit version.  I am now running the 64-bit version without any
> issues. I have a GeForce 9500 GT on a Dell Optiplex 780.

Bug 1243765 comment 1 suggests some correlation with Nvidia and 32-bit.  We had a reverse problem with Nvidia and 64-bit in bug 1241921.  Not related, I imagine, but there it is.
My configuration is very similar to Milan's.   Mine is Windows 7 SP1 64-bit, but Firefox is a 32-bit version.  Nvidia Quadro 600 on a Dell Precision 1650.

I'm not an expert with lsps but I sort of doubt it applies where a direct IPv4 address connects without any error.
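The symptom described in this bug (IP addresses connect, domain names do not) can be checked against OS-level resolution outside the browser. The following is a rough sketch using only the Python standard library; the function names are hypothetical, and a successful result here says nothing about what happens inside the Firefox process.

```python
import ipaddress
import socket


def needs_dns(host):
    """Return False for IPv4/IPv6 literals, True for anything else."""
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        return True


def check_os_resolution(host, port=80):
    """Ask the OS resolver for host; return (ok, addresses_or_error)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:
        return False, str(exc)
    # sockaddr[0] is the address string; de-duplicate and sort for stable output.
    return True, sorted({info[4][0] for info in infos})
```

Running `check_os_resolution` on a name for which `needs_dns` returns True, while the browser is failing, shows whether the operating system resolver itself is healthy at that moment.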
so this doesn't seem much like a necko DNS bug to me.. I'm happy to keep looking, but I wouldn't focus too much on the DNS symptom hoping for a resolution.

I say this because it comes with all kinds of OS level instability that's not really a failure mode of the dns code.. comment 0 (reboot required!), explorer corruption (comment 3), OS freeze (comment 27)

bugs https://bugzilla.mozilla.org/show_bug.cgi?id=1243910
https://bugzilla.mozilla.org/show_bug.cgi?id=1243914
https://bugzilla.mozilla.org/show_bug.cgi?id=1243765

all seem related.. they all have startup crashes that seem cured by going to 64 bit builds. Comment 32 says it is the same for a reporter of this bug.

https://bugzilla.mozilla.org/show_bug.cgi?id=1243914#c13 is interesting
"Sounds like there might be a 32 bit binary loading (plugin, lsp, or malware, etc.) with Firefox that is preventing startup on 32 bit and not loading on 64 bit so it doesn't prevent startup."

:milan or gordon, do you have a crash report from about:crashes you can link to? It doesn't need to be for this issue, I would just like to see the recorded system information (dll's etc..) for your setup.
Depends on: 1243765
I don't have a crash report from this issue since it does not actually crash; tabs that reference sites with domain names never load and sites with IP addresses continue normally.

That said, as requested:
https://crash-stats.mozilla.com/report/index/7334c6fa-4cf4-48eb-affb-b37eb2160203
(In reply to Gordon Cook from comment #37)
> 
> That said, as requested:
> https://crash-stats.mozilla.com/report/index/7334c6fa-4cf4-48eb-affb-
> b37eb2160203

thanks.. you have a vmware lsp installed. we'll see if that shows up anywhere else too (no reason to think it's a problem)
the mention of the backout around the changes to dll interceptor in https://bugzilla.mozilla.org/show_bug.cgi?id=1240977 is certainly interesting.

DNS does load a DLL manually - though this isn't new behavior in firefox 44 (firefox 35 iirc).
:milan - if you can take a try build as a test I can easily make one that has the DNS code not load the dll (it isn't needed for basic functionality). does that make sense?
Flags: needinfo?(milan)
Sure, but as of right now, I lost the ability to reproduce this problem.  I went through a few reboots because that's what my computer does, but I can now get to the sites.  FlashPlayerPlugin_20_0_0_286.exe perhaps plays a part...
Flags: needinfo?(milan)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=79912835fa97 - perhaps tim or gordon could try it if :milan is back in working order..
I've already downgraded to 43.0.4 to regain stability.  This is my production box.  I don't have another machine available running W7-64 native.  Testing in a virtual instance is probably useless.  I would be willing to upgrade to the 64 bit version of 44, but for some reason the FEBE extension currently installed is now complaining that it's incompatible w/43.0.4. It may have been updated during the original upgrade to 44.  I'm trying to avoid a complete rebuild of extensions and the look and feel that I've become familiar with for years.
(In reply to Milan Sreckovic [:milan] from comment #43)
> That build failed.

try this one
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f21f37523872
Milan, if you can give the try build a go that would be great. I am trying to see if we can get a fix ready in the next hour or so, as I gtb 44.0.1. Thanks!
Flags: needinfo?(milan)
(In reply to Ritu Kothari (:ritu) from comment #46)
> Milan, if you can give the try build a go that would be great. I am trying
> to see if we can get a fix ready in the next hour or so, as I gtb 44.0.1.
> Thanks!

fyi - that try build is not a shippable fix. it just helps diagnose the problem.
(In reply to Patrick McManus [:mcmanus] from comment #47)
> (In reply to Ritu Kothari (:ritu) from comment #46)
> > Milan, if you can give the try build a go that would be great. I am trying
> > to see if we can get a fix ready in the next hour or so, as I gtb 44.0.1.
> > Thanks!
> 
> fyi - that try build is not a shippable fix. it just helps diagnose the
> problem.

Oh ok, got it! Thanks for letting me know.
(In reply to Ritu Kothari (:ritu) from comment #49)

> Oh ok, got it! Thanks for letting me know.

I really don't think this is going to be resolved from the dns code (but hey, it's not over yet) - it smells like dll handling https://bugzilla.mozilla.org/show_bug.cgi?id=1243765#c24 .. if that's true a number of things other than dns will be having troubles (dns is just sort of universally used..)
Depends on: 1243914
in a different bug another reporter used the above try build and
 1] continued seeing some crashes
 2] had DNS working fine when we didn't crash (before that he had the same symptom of name resolution failing)

The try build removes the DNS TTL feature (pretty much unchanged since gecko 33) which loads a system DLL. To me this strongly implies something is up with library loading and, as per 1, DNS isn't the only thing to do that.
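To illustrate the shape of what the try build changes, here is a rough sketch of a lookup whose TTL step depends on an optionally loaded library and falls back to a default when that library is absent. This is not the Gecko code: the function names, the default TTL of 60 seconds, and the callables passed in are all assumptions made for the example.

```python
import ctypes


def load_optional_library(name):
    """Try to load a shared library; return None instead of raising."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def resolve(host, basic_lookup, ttl_lookup=None, default_ttl=60):
    """Resolve host; use ttl_lookup for the TTL only if it is available.

    basic_lookup and ttl_lookup are caller-supplied callables
    (hypothetical stand-ins for the real resolver and the DLL-backed query).
    """
    addresses = basic_lookup(host)
    ttl = default_ttl
    if ttl_lookup is not None:
        try:
            ttl = ttl_lookup(host)
        except Exception:
            # The optional feature must never break basic resolution.
            ttl = default_ttl
    return addresses, ttl
```

With `ttl_lookup=None` the function behaves like the try build: addresses are still returned and only the TTL information is lost.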
Assignee: nobody → mcmanus
Whiteboard: [necko-active]
Whiteboard: [necko-active] → [necko-active][44dll]
I'm going to mark this a dup of bug 1243098 - where the work of resolving it seems to be going on.

That bug draws a line to 1218473 through bisecting regression data - the patch for 1218473 was uplifted to both aurora (jan 11) and 44b9. It seems backing that patch out is not a great idea.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
FF is now offering an update to 44.0.1... Is this update the fix to this issue?
Flags: needinfo?(milan)
For affected users in this bug, please see https://bugzilla.mozilla.org/show_bug.cgi?id=1243098#c31. If you're willing, can you please try out that build and let us know if it improves your situation? Thanks!
Flags: needinfo?(tgm)
FYI, the fix for this should be in Firefox 44.0.2 (due for release today).
Shutdown problem seems to have gone away for me.  The "can't connect" is also not there, but this morning was when 44 (again) decided that it will work, so I could get the update right from Firefox.  I'll report if I get more of the "can't connect".
This was resolved as a dup of a bug that is already fixed and tracked.
Flags: needinfo?(tgm)