Closed Bug 991923 Opened 10 years ago Closed 9 years ago

Server not found when switching networks on Android 3-5

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox34 wontfix, firefox35 wontfix, firefox36 verified, firefox37 verified, relnote-firefox 36+, fennec34+)

VERIFIED FIXED
Firefox 37
Tracking Status
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 --- verified
firefox37 --- verified
relnote-firefox --- 36+
fennec 34+ ---

People

(Reporter: microrffr, Assigned: esawin)

References

Details

Attachments

(3 files, 2 obsolete files)

After I leave a WiFi access point and my phone switches to LTE, trying to open a page shows the "server not found" error. It does this until I restart Nightly (I swipe it from the recent apps and launch it again). Meanwhile, other apps can communicate fine. I'm able to use a terminal emulator to run nslookup commands, and I get results back.
I've seen this also when the cell signal is switching between 3G and 4G.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I've seen similar behavior before, though I've never switched apps to verify the phone is able to still access the network. However, the network indicators are still active and seem to indicate the phone has data access.

This also occurs when switching between wifi and LTE for me.
tracking-fennec: --- → ?
I've seen this, but usually just refreshing the page makes stuff work again. I don't think I've had to restart ever.
Patrick, any thoughts here? Apparently we're not handling switching networks as well as other apps on the system.
tracking-fennec: ? → +
Flags: needinfo?(mcmanus)
this is in scope for 939318 which :bagder is working on RIGHT NOW! :)
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(mcmanus)
Resolution: --- → DUPLICATE
This still happens with bug 939318 fixed.

Right now, all I have to do is:

1. launch Nightly with wifi on
2. load a page
3. turn off wifi
4. try to refresh the page

It immediately shows the server not found error page.
Meanwhile, search suggestions work.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
The previous comment's device is a Galaxy S4, I'm reproducing this on Nexus 5 (Android 5) on Fennec 34 and up.
Bug 1024614 introduced DNS cache flushing on link status change, which seems to cause some kind of stale state in the resolver in this case.
Attached patch dns-shutdown-link-changed (obsolete) — Splinter Review
Restarting the resolver after flushing fixes it, so there is an issue with the resolver?
Flags: needinfo?(snorp)
Great find. Just a quick note that shutting down and re-initing the resolver causes other problems (basically there's a short time period where resolves don't work) so it isn't the correct fix.

The question is then - I guess - what in the FlushCache() method is missing?
It is probably the call to res_ninit() from within nsHostResolver::Init() that actually fixes the problem ? Although the logic in ThreadFunc should already call res_ninit() on name resolve failures so I'm not sure.
It looks like the configure check for res_ninit is failing, so we don't call it on failure (or ever). Output in config.log is (on my mac, at least):

configure:13193: checking for res_ninit()
configure:13216: /tools/android-ndk-r10b/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc -o conftest -mandroid -fno-short-enums -fno-exceptions -fdiagnostics-color -march=armv7-a -mthumb -mfpu=vfp -mfloat-abi=softfp -mno-unaligned-access -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno -idirafter /tools/android-ndk-r10b/platforms/android-9/arch-arm/usr/include -fdiagnostics-color  -mandroid -L/tools/android-ndk-r10b/platforms/android-9/arch-arm/usr/lib -Wl,-rpath-link=/tools/android-ndk-r10b/platforms/android-9/arch-arm/usr/lib --sysroot=/tools/android-ndk-r10b/platforms/android-9/arch-arm -llog -Wl,--allow-shlib-undefined -fdiagnostics-color -mthumb -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id conftest.c  1>&5
^[[01m^[[Kconfigure:^[[m^[[K In function '^[[01m^[[Kmain^[[m^[[K':
^[[01m^[[Kconfigure:13212:1:^[[m^[[K ^[[01;35m^[[Kwarning: ^[[m^[[Kimplicit declaration of function '^[[01m^[[Kres_ninit^[[m^[[K' [-Wimplicit-function-declaration]
In file included from ^[[01m^[[Kconfigure:13209:0^[[m^[[K:
^[[01m^[[Kconfigure:13212:22:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kdereferencing pointer to incomplete type
configure: failed program was:
#line 13201 "configure"
#include "confdefs.h"

            #ifdef linux
            #define _BSD_SOURCE 1
            #endif
            #include <sys/types.h>
            #include <netinet/in.h>
            #include <arpa/nameser.h>
            #include <resolv.h>

int main() {
int foo = res_ninit(&_res);
; return 0; }


I can't explain why restarting nsHostResolver would do anything in light of this.
Flags: needinfo?(snorp)
(In reply to Eugen Sawin [:esawin] from comment #9)
> Bug 1024614 introduced DNS cache flushing on link status change, which seems
> to cause some kind of stale state in the resolver in this case.

It seems likely that resolving has been busted the whole time, but prior to the flushing we just used the cached values.
That configure check looks wrong to me. '_res' is supposedly an available global variable used by some res_* functions but res_ninit() itself was made partly to get away from that global struct and instead we pass in our own struct pointer as argument! So, assuming _res together with res_ninit() seems wrong.

I don't have an android build setup here, but I would imagine a simple configure adjustment like this:

--- a/configure.in
+++ b/configure.in
@@ -3079,11 +3079,12 @@ AC_CACHE_CHECK(
             #include <sys/types.h>
             #include <netinet/in.h>
             #include <arpa/nameser.h>
             #include <resolv.h>
             ],
-            [int foo = res_ninit(&_res);],
+            [ res_state rs;
+              int foo = res_ninit(&rs);],
             [ac_cv_func_res_ninit=yes],
             [ac_cv_func_res_ninit=no])
      fi
     ])
Here's the same patch I just mentioned in a comment as a separate clean patch
(In reply to Daniel Stenberg [:bagder] from comment #15)
But we invoke it that same way in nsHostResolver. So if the test is wrong, so is the actual code where we use it.

AFAICT, there is no res_ninit() definition in the NDK for android-9 (which is what we build against), so I don't see how the test can ever work. I just noticed, however, we have a bunch of resolver stuff in other-licenses/android. Presumably that's the getaddrinfo() impl we end up using. There is no public res_ninit() there either (it's in the resolv_private.h), but maybe some changes are necessary? :sworkman added it so maybe he has an idea?
Flags: needinfo?(sworkman)
Yes, if the test works after my patch then the code in nsHostResolver.cpp will still fail on Android but that would then be easy to fix too in the same way.

I found res_ninit() for Android on the link below which made me assume it is present:

https://code.google.com/p/android-source-browsing/source/browse/libc/netbsd/resolv/res_init.c?repo=platform--bionic&r=7f84da69f86ed9daf610c8d1129392ba3f7c4405#166
The bionic res_compat.c file has this comment about _res:

"Binary Compatibility; this symbol does not appear in a header file"
Attached patch dns-restart-link-change (obsolete) — Splinter Review
Without res_ninit support, we end up with resolver threads unable to successfully process getaddrinfo requests. A possible fallback could be to simply unwind and restart the resolver threads in this case.
I'm not overly comfortable with having the procedure within FlushCache, but that can be refactored out before landing.

This is a complementary patch to Daniel's, but even with the res_init fallback on Android, we end up in the same situation (I can look into that, too).
Is this something we want?
Attachment #8532592 - Attachment is obsolete: true
Attachment #8534359 - Flags: feedback?(daniel)
Comment on attachment 8534359 [details] [diff] [review]
dns-restart-link-change

I'm looking into an alternative way to fix this without thread restarting.
Attachment #8534359 - Attachment is obsolete: true
Attachment #8534359 - Flags: feedback?(daniel)
It looks like the Android resolver actually already tries to detect when a DNS configuration change occurred, and reinit appropriately[0]. My guess is that this isn't working on newer versions of Android for whatever reason. Eugen's looking into making that machinery work again, I believe.

[0] http://dxr.mozilla.org/mozilla-central/source/other-licenses/android/res_data.c?from=res_data.c&case=true#1
Depends on: 1109940
Assignee: nobody → esawin
Depends on: 1110529
Blocks: 947801
Sorry it's taken me so long to comment here.

My guess is that Comment #22 in combination with bug 1110529 may be the problem here. So, we weren't calling out to the system DNS daemon, which re-init's itself on DNS changes (e.g. see ConnectivityService.java, bumpDns()), and I don't think our forked resolver for pre-honeycomb was re-init'ing at all - I'm not 100% certain of this one, so it's worth verifying.

Seems like you're working off of that basis anyway, so good job!
Flags: needinfo?(sworkman)
With bug 694325 introduced, we are supposed to handle DNS requests on pre-Honeycomb Android versions using the in-tree DNS resolver and fallback to the system resolver for Android 3 and up.
However, the fallback has been broken, so have been using the in-tree resolver on all versions.

Bug 1109940 and bug 1110529 fix the symbol wrapping for the in-tree resolver. To fix DNS resolution for Android 3 and up, we only need to fix the Android SDK version detection (will be handled in this bug).

To fix DNS resolution for Android 2.x, we need to make sure that res_(n)init is called on link status changes and that res_get_dns_changed is appropriately set in this case (this will also be handled in this bug).
Set Android SDK version, required for Android 3.x+ fallback to system DNS resolver.
Attachment #8536682 - Flags: review?(sworkman)
You think you can release an apk once you feel confident for testing. I am really eager to test.
(In reply to lostinignorance from comment #28)
> You think you can release an apk once you feel confident for testing. I am
> really eager to test.

It will be in Nightly once it's checked in, which should happen shortly after reviews have concluded.
Try for dns-resolver-sdk-version: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=016e033d542d

(In reply to lostinignorance from comment #28)
> You think you can release an apk once you feel confident for testing. I am
> really eager to test.

This should work for Android 3-5: 
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/esawin@mozilla.com-016e033d542d/try-android-api-11/fennec-37.0a1.en-US.android-arm.apk
With the patch posted, I am still unable to get it to resolve any DNS names.  I am able to pull pages over the VPN using IP, but not using any DNS.  Is there anything I can post to help?
The build posted works for me on my Nexus 4 (Android 5.0.1, simple OpenVPN client/server setup).
(In reply to ryan from comment #32)
> The build posted works for me on my Nexus 4 (Android 5.0.1, simple OpenVPN
> client/server setup).

\o/
Using Private Internet Access on Android 5.0.1 with a Nexus 5 I can report that the Nightly APK provided is working with the PIA VPN app. Immediately upon activating my VPN the latest stable Firefox stoppped connecting and the Nightly continued working. 

I will continue playing with the Nightly and my VPN. If the issue appears again I will report back, but SK far so good for the Nightly.
Chiming in to confirm that the posted build works on Nexus 4, Android 5.0, using OpenVPN.
The DNS issue works fine now. There is an issue with NTLM challenge, but the dns issue works with Cisco anyconnect VPN.
forget to mention my device, works using Cisco AnyConnect with my 4.4.4 samsung Tab S
I have reloaded my phone (as 5.0.1 came out for the Nexus 5) and I can confirm it is now working on the nightly build referenced above.
Comment on attachment 8536682 [details] [diff] [review]
dns-resolver-sdk-version

Review of attachment 8536682 [details] [diff] [review]:
-----------------------------------------------------------------

Cool - just take a look at strtol. Otherwise, r=me.

::: other-licenses/android/getaddrinfo.c
@@ +428,5 @@
> +#endif
> +    return len;
> +  }
> +
> +  return (int)strtol(version_str, NULL, 10);

strtol(version_str, version_str+len, 10);

no?
Attachment #8536682 - Flags: review?(sworkman) → review+
(In reply to Steve Workman [:sworkman] (please use needinfo) from comment #39)

> strtol(version_str, version_str+len, 10);
> 
> no?

Nope:

       long int strtol(const char *nptr, char **endptr, int base);

and 'endptr' is allowed to be NULL, I'd say the patch's use if it looks fine.
I've tested on Android 2.3.6 (Nexus S) and switching between networks didn't cause any issues with our in-tree resolver. I also don't see any reports on issues with older Android version, so I'm adjusting the bug title.
Summary: server not found when switching networks → Server not found when switching networks on Android 3-5
Status: REOPENED → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/1f486b3b6b29
Status: ASSIGNED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 37
Comment on attachment 8536682 [details] [diff] [review]
dns-resolver-sdk-version

If approved, it should land after bug 1110529.

Approval Request Comment
[Feature/regressing bug #]: bug 694325
[User impact if declined]: major connectivity issues when switching networks or using VPNs
[Describe test coverage new/current, TBPL]: has been on nightly for some days
[Risks and why]: moderate, fallback to the Android DNS resolver hasn't worked for the past 3 years so this could have side effects, but the benefit is high in terms of increased stability
[String/UUID change made/needed]: none
Attachment #8536682 - Flags: approval-mozilla-aurora?
Since we have this bug for a while, I guess it is a wontfix for 35.
Attachment #8536682 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I have been using Aurora for several days now as a successful work around for this bug.  

I'm not a developer, but I'm happy to provide any info that will help expedite a solution for Firefox. I am using using PIA VPN on a Samsung Galaxy S5 with T-mobile and wifi, with APN protocol set to IPv4. Firefox could not load pages through a VPN, but Aurora is on fire.
This fix will be rolled out with Firefox 36 in Februrary.
Thanks for fixing it! Does that mean users with Play Store Firefox will be exposed to dns leaks (for certain people a big privacy issue) for another two months?
Verified as fixed in builds:
- 36 Beta 1;
- 37.0a1 (2015-01-11);
Device: Nexus 5 (Android 5.0.1).
Status: RESOLVED → VERIFIED
Release Note Request (optional, but appreciated)
[Why is this notable]: Broken VPN support and general connectivity issues have been an issue for a long time, a lot of users require VPN to connect to/from company networks or for security reasons
[Suggested wording]: Fixed DNS resolution when using VPNs or switching networks
[Links (documentation, blog post, etc)]:
relnote-firefox: --- → ?
relnoted in 36 as "Fixed: DNS resolution when using VPNs or switching networks"
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.