Closed
Bug 995417
Opened 11 years ago
Closed 10 years ago
Prevent our test suites from resolving non-loopback names
Categories
(Core :: Networking: DNS, defect)
Core
Networking: DNS
Tracking
()
People
(Reporter: ehsan.akhgari, Assigned: froydnj)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 4 obsolete files)
2.25 KB,
patch
|
mcmanus
:
review+
|
Details | Diff | Splinter Review |
4.08 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
Bug 995343 attempts to protect against a subset of the cases where our test suite contacts the external network.
A more complete fix would be to modify our DNS resolver code somewhere around <http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/nsHostResolver.cpp#1161> to check to see if the name that it has resolved is local or not, and if not, MOZ_CRASH or something like that. We can use either a pref or an environment variable which is set by our test suites to determine whether to skip this check.
This will once and for all put an end to bug 617414.
Reporter | ||
Comment 1•11 years ago
|
||
And by local I mean loopback!
Reporter | ||
Updated•11 years ago
|
Summary: Prevent out test suites from resolving non-local names → Prevent out test suites from resolving non-loopback names
Assignee | ||
Updated•11 years ago
|
Summary: Prevent out test suites from resolving non-loopback names → Prevent our test suites from resolving non-loopback names
Comment 2•11 years ago
|
||
I did a quick and dirty try run with this idea to see how bad things would be... the version I did crashed on non-local connect instead of on dns.
https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
other than marionette (which had a hard time), just 9 or 10 fails. a couple of which I know off the top of my head are false positives (necko special testing rfc 1918 rules) and can be worked around in the test. Seems triageable.
Assignee | ||
Comment 3•11 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #2)
> I did a quick and dirty try run with this idea to see how bad things would
> be... the version I did crashed on non-local connect instead of on dns.
>
> https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
Is it reasonable to crash on non-local connect and to warn people appropriately in dns (NS_WARNING?) that something resolved to a non-local name? Just crashing seems relatively unfriendly; the warning might help people figure out what's going on faster.
Flags: needinfo?(mcmanus)
Comment 4•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #3)
> (In reply to Patrick McManus [:mcmanus] from comment #2)
> > I did a quick and dirty try run with this idea to see how bad things would
> > be... the version I did crashed on non-local connect instead of on dns.
> >
> > https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
>
> Is it reasonable to crash on non-local connect and to warn people
> appropriately in dns (NS_WARNING?) that something resolved to a non-local
> name? Just crashing seems relatively unfriendly; the warning might help
> people figure out what's going on faster.
maybe! I don't like putting any crash path in a opt build that doesn't need to crash. Though I'm actually not inclined to use DNS at all because the parser looks up a lot of domain names for cache warming purposes and disabling that takes us farther from testing what we ship.
right now I'm just triaging the fails and trying to help diagnose them.. We can figure out what the enforcement patch ought to look like.
Flags: needinfo?(mcmanus)
Assignee | ||
Comment 5•11 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #4)
> (In reply to Nathan Froyd (:froydnj) from comment #3)
> > (In reply to Patrick McManus [:mcmanus] from comment #2)
> > > I did a quick and dirty try run with this idea to see how bad things would
> > > be... the version I did crashed on non-local connect instead of on dns.
> > >
> > > https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
> >
> > Is it reasonable to crash on non-local connect and to warn people
> > appropriately in dns (NS_WARNING?) that something resolved to a non-local
> > name? Just crashing seems relatively unfriendly; the warning might help
> > people figure out what's going on faster.
>
> maybe! I don't like putting any crash path in a opt build that doesn't need
> to crash.
I should have been clearer: that crash path would only be if a MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set. Or would you be opposed to that too?
Reporter | ||
Comment 6•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #5)
> (In reply to Patrick McManus [:mcmanus] from comment #4)
> > (In reply to Nathan Froyd (:froydnj) from comment #3)
> > > (In reply to Patrick McManus [:mcmanus] from comment #2)
> > > > I did a quick and dirty try run with this idea to see how bad things would
> > > > be... the version I did crashed on non-local connect instead of on dns.
> > > >
> > > > https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
> > >
> > > Is it reasonable to crash on non-local connect and to warn people
> > > appropriately in dns (NS_WARNING?) that something resolved to a non-local
> > > name? Just crashing seems relatively unfriendly; the warning might help
> > > people figure out what's going on faster.
> >
> > maybe! I don't like putting any crash path in a opt build that doesn't need
> > to crash.
>
> I should have been clearer: that crash path would only be if a
> MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set. Or
> would you be opposed to that too?
I think this is ultimately what we would want to do, once we've fixed all of the existing cases where we attempt to connect to a non-local address in our tests. Note that the NS_WARNING will only work in debug builds though, but hopefully all future failures will hit both debug and non-debug builds alike.
Assignee | ||
Comment 7•11 years ago
|
||
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #6)
> > I should have been clearer: that crash path would only be if a
> > MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set. Or
> > would you be opposed to that too?
>
> I think this is ultimately what we would want to do, once we've fixed all of
> the existing cases where we attempt to connect to a non-local address in our
> tests. Note that the NS_WARNING will only work in debug builds though,
Indeed. Ideally people are testing debug builds on try or locally and would see this warning pointing out the problem, rather than trying to figure out what exact thing the socket transport service is grousing about.
Reporter | ||
Comment 8•11 years ago
|
||
(In reply to comment #7)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #6)
> > > I should have been clearer: that crash path would only be if a
> > > MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set. Or
> > > would you be opposed to that too?
> >
> > I think this is ultimately what we would want to do, once we've fixed all of
> > the existing cases where we attempt to connect to a non-local address in our
> > tests. Note that the NS_WARNING will only work in debug builds though,
>
> Indeed. Ideally people are testing debug builds on try or locally and would
> see this warning pointing out the problem, rather than trying to figure out
> what exact thing the socket transport service is grousing about.
We could also make the MOZ_CRASH string argument very descriptive, get it to point to an MDN article, etc. as needed.
Assignee | ||
Comment 9•11 years ago
|
||
This patch is the patch I've been using locally to try to triage non-local
connection bugs. In conjunction with patch 2 (coming soon), this seems to DTRT
for mochitest runs.
Assignee | ||
Comment 10•11 years ago
|
||
...and here's the test bits. I believe there's actually a third place we setup
environment variables, but maybe that's been changed nowadays...
Assignee | ||
Comment 11•11 years ago
|
||
This patch series induces all sorts of interesting oranges, mostly to do with IPC:
https://tbpl.mozilla.org/?tree=Try&rev=49e9c7a82979
Running most of the crashing tests standalone locally reveals no problems, although they do crash when run in directory batches. Some subtle race in the networking code?
Assignee | ||
Updated•11 years ago
|
OS: Mac OS X → All
Hardware: x86 → All
Assignee | ||
Comment 12•11 years ago
|
||
I am seeing test_speculative_connect.js xpcshell failures locally:
7:48.00 TEST-PASS | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | [test_hostnames_resolving_to_addresses : 204] 1 == 1
7:48.00 BAD CONNECT: connecting to 10.0.0.1 0
7:48.00 TEST-PASS | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | [test_hostnames_resolving_to_addresses : 207] "object" != undefined
7:48.00 TEST-PASS | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | [test_hostnames_resolving_to_addresses : 212] "object" != undefined
7:48.00 <<<<<<<
PROCESS-CRASH | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | application crashed [Unknown top frame]
Happens on try, too:
https://tbpl.mozilla.org/php/getParsedLog.php?id=39672912&tree=Try
Should we be checking for IsIPAddrLocal in addition to IsLoopBackAddress:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/DNS.cpp#181
?
I'm also seeing strange failures on try on OSX:
https://tbpl.mozilla.org/php/getParsedLog.php?id=39672348&tree=Try
I think this is because our Macs don't properly resolve localhost (!), which seems to be a surprisingly common problem (the first link is actually the most interesting):
https://discussions.apple.com/thread/2613868
http://apple.stackexchange.com/questions/99996/which-setting-in-osx-could-block-ping-localhost
Not really sure what to do about that one. Patrick, can you shed some networking expertise on these two problems?
Flags: needinfo?(mcmanus)
Comment 13•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #12)
> I am seeing test_speculative_connect.js xpcshell failures locally:
>
I think that test does a bunch of off host things well beyond the rfc 1918 stuff (that we should keep around). can you file a separate bug about it, assign to :sworkman and cc me?
>
> https://tbpl.mozilla.org/php/getParsedLog.php?id=39672348&tree=Try
>
> I think this is because our Macs don't properly resolve localhost (!),
I'm not familiar with that. where are they trying to connect to? (I would assume it would just fail and that shouldn't trigger the new assert code, right?)
Flags: needinfo?(mcmanus)
Assignee | ||
Comment 15•11 years ago
|
||
Also set the appropriate environment variable for xpcshell.
Attachment #8407077 -
Attachment is obsolete: true
Assignee | ||
Comment 16•10 years ago
|
||
All right, we are close enough to having all the appropriate dependent bugs
fixed that we can think about reviewing these patches and landing them in short
order. First up, the networking changes.
Bikeshedding on the message welcome.
Attachment #8423866 -
Attachment is obsolete: true
Attachment #8437909 -
Flags: review?(mcmanus)
Assignee | ||
Comment 17•10 years ago
|
||
Now for the umpteen different places to set the appropriate environment
variable so things will crash when we connect to external hosts.
Attachment #8423867 -
Attachment is obsolete: true
Attachment #8437910 -
Flags: review?(jmaher)
Comment 18•10 years ago
|
||
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS
Review of attachment 8437910 [details] [diff] [review]:
-----------------------------------------------------------------
remoteautomation.py/automation.py.i - for android tests which use this we need to reference the webserver on the host machine, this is by default non localhost. We also have b2g tests as well to consider. Should we be doing this for talos and reftest?
Attachment #8437910 -
Flags: review?(jmaher) → review-
Assignee | ||
Comment 19•10 years ago
|
||
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS
(In reply to Joel Maher (:jmaher) from comment #18)
> remoteautomation.py/automation.py.i - for android tests which use this we
> need to reference the webserver on the host machine, this is by default non
> localhost. We also have b2g tests as well to consider. Should we be doing
> this for talos and reftest?
reftests pick up MOZ_DISABLE_NONLOCAL_CONNECTIONS from one of these files (I forget which one offhand). Talos should do this eventually, too.
We don't need to worry about remote tests needing to access the host machines because the networking changes actually check whether we're talking to localhost or something on the local network:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/DNS.cpp#181
and the host machines have appropriate IP addresses in this case. (I ran into this very problem initially and almost opted not to set MOZ_DISABLE_NONLOCAL_CONNECTIONS for remote tests, but finally realized that I needed to allow local network connections, not just loopback ones.)
Example try run with these patches and a few miscellaneous ones from not-yet-fixed bugs:
https://tbpl.mozilla.org/?tree=Try&rev=82ef10561051
You can see the green for Android and B2G tests.
So, with that explanation at hand, re-r?'ing
Attachment #8437910 -
Flags: review- → review?(jmaher)
Comment 20•10 years ago
|
||
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS
Review of attachment 8437910 [details] [diff] [review]:
-----------------------------------------------------------------
thanks for the context here. The try run shows xpcshell failures on B2G, not sure if that is related to this patch or not.
Attachment #8437910 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 21•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #20)
> thanks for the context here. The try run shows xpcshell failures on B2G,
> not sure if that is related to this patch or not.
Thanks for the review, my fault for not explaining things fully the first time around. The B2G xpcshell failures are from bug 1023638, which just landed on inbound this morning.
Comment 22•10 years ago
|
||
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS
Thank you for your work on this bug!
After this initial patch lands (I think it's best to land this first before anything regresses), it would be great if we could add the define to more of the locations listed in bug 1023483 comment 3 - since aiui cppunit tests, jetpack etc won't have it enabled, unless they happen to inherit from automation.py/automationutils.py.
But work for another bug :-)
Comment 23•10 years ago
|
||
Comment on attachment 8437909 [details] [diff] [review]
part 1 - netwerk/ changes for crashing on non-local connections
Review of attachment 8437909 [details] [diff] [review]:
-----------------------------------------------------------------
::: netwerk/base/src/nsSocketTransport2.cpp
@@ +1190,5 @@
> + "Non-local network connections are disabled and a connection "
> + "attempt to %s was made. You should only access hostnames "
> + "available via the test networking proxy (if running mochitests) "
> + "or from a test-specific httpd.js server (if running xpcshell tests).",
> + mHost.get());
thanks.. lets print out the address too - we've already been bitten by different resolvers giving different answers (i.e. does-not-exist vs catchall). mNetAddr.GetAddress() iirc.
r+ with that
Attachment #8437909 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 24•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a3bfa5ab131e
https://hg.mozilla.org/integration/mozilla-inbound/rev/51342b493983
Landed with small changes to the error message wording and the addition of the IP address thus resolved. Try runs:
https://tbpl.mozilla.org/?tree=Try&rev=c4a2cc9381ef (almost there except for Android M5)
https://tbpl.mozilla.org/?tree=Try&rev=911517506d3a (with current-this-morning m-c, looking green enough that I'm going to cancel the rest of the runs)
I'll make a post to dev-platform.
Flags: in-testsuite+
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → nfroyd
Comment 25•10 years ago
|
||
I've mentioned the need for MOZ_DISABLE_NONLOCAL_CONNECTIONS to be defined in new harnesses, here:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#8.29_Must_avoid_patterns_known_to_cause_non_deterministic_failures
I'll file bugs for defining it in the other places mentioned in bug 1023483 comment 3 that don't overlap with those already set, so we get coverage in things like Jetpack too.
Status: NEW → ASSIGNED
Comment 26•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a3bfa5ab131e
https://hg.mozilla.org/mozilla-central/rev/51342b493983
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Comment 27•10 years ago
|
||
Try runs for branch uplifts:
Aurora - https://tbpl.mozilla.org/?tree=Try&rev=63c92347e1ae
Beta - https://tbpl.mozilla.org/?tree=Try&rev=11e59c657ccc
status-b2g-v1.3:
--- → wontfix
status-b2g-v1.3T:
--- → wontfix
status-b2g-v1.4:
--- → wontfix
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → fixed
status-firefox31:
--- → affected
status-firefox32:
--- → affected
status-firefox33:
--- → fixed
status-firefox-esr24:
--- → wontfix
Comment 28•10 years ago
|
||
Actually, b2g30 is going to outlive esr24, so we should probably aim to get it on there as well.
https://tbpl.mozilla.org/?tree=Try&rev=90c9016d084e
Comment 29•10 years ago
|
||
> Aurora - https://tbpl.mozilla.org/?tree=Try&rev=63c92347e1ae
Looks green to me (amongst the jobs I'd expect to actually be so on Aurora). I'll aim to uplift this there tomorrow morning.
> Beta - https://tbpl.mozilla.org/?tree=Try&rev=11e59c657ccc
Only one relevant failure in xpcshell:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42378982&tree=Try
This appears to be bug 995806, which I had mistakenly marked Gecko 31 as unaffected at the time. However, the patch in bug 995806 is going to need rebasing for that branch, so I've pinged maxim in the bug.
> Actually, b2g30 is going to outlive esr24, so we should probably aim to get
> it on there as well.
> https://tbpl.mozilla.org/?tree=Try&rev=90c9016d084e
This is failing in test_accounts.js due to trying to hit api.accounts.firefox.com:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42378299&tree=Try
It appears that's bug 995599, but that was uplifted to b2g30. I've pinged rnewman in the bug for help since markh is away until August.
It's also hitting Webapp update checks on B2G desktop mochitests:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42383005&tree=Try
I'll try backporting bug 1025959 to fix that. Not sure why I marked the B2G branches as unaffected at the time.
Comment 30•10 years ago
|
||
Comment 31•10 years ago
|
||
Beta looks green with bug 995806 landed. I'll push these patches there soon.
b2g30 is still hitting B2G desktop mochitest failures, however:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42573795&tree=Try
17:07:05 INFO - *** UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1
17:07:05 INFO - Non-local network connections are disabled and a connection attempt to inapp-pay-test.paas.allizom.org (63.245.215.73) was made. You should only access hostnames available via the test networking proxy (if running mochitests) or from a test-specific httpd.js server (if running xpcshell tests). Browser services should be disabled or redirected to a local server.
Looks like the webapp updates need to be disabled harder?
Comment 32•10 years ago
|
||
Comment 33•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #31)
> 17:07:05 INFO - *** UTM:SVC TimerManager:notify - notified
> @mozilla.org/b2g/webapps-update-timer;1
> 17:07:05 INFO - Non-local network connections are disabled and a
> connection attempt to inapp-pay-test.paas.allizom.org (63.245.215.73) was
> made. You should only access hostnames available via the test networking
> proxy (if running mochitests) or from a test-specific httpd.js server (if
> running xpcshell tests). Browser services should be disabled or redirected
> to a local server.
Noting that other logs linked in this bug also show marketplace.allizom.org connection attempts, the most relevant hit I can find on this appears to be in build/test/integration/helper.js:
https://github.com/mozilla-b2g/gaia/blob/v1.4/build/test/integration/helper.js#L46
Conveniently, bug 983573 massively refactored that code and landed on 2.0+, which would explain why this is only hitting v1.4. Unfortunately, that patch doesn't have a chance of being uplifted to v1.4. So I think we're looking for a way to pref-off or disable our way to victory on this one.
George, does my theory make any sense? Any ideas for what I might be able to do to avoid hitting the network from this code during mochitest runs?
Flags: needinfo?(gduan)
Comment 34•10 years ago
|
||
Note that webapp update checks are supposed to be disabled via:
http://mxr.mozilla.org/mozilla-b2g30_v1_4/source/testing/profiles/prefs_general.js#176
Comment 35•10 years ago
|
||
Bug 984274 disables webapp updates harder and apparently does the trick! Will land this on b2g30 once that patch is ready for landing & uplift.
https://tbpl.mozilla.org/?tree=Try&rev=cce952525b28
Flags: needinfo?(gduan)
Comment 36•10 years ago
|
||
Comment 37•10 years ago
|
||
Argh, bug 984274 caused a couple B2G mochitests to perma-fail, so I had to back it and this out. :(
Comment 38•10 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•