Closed Bug 1026970 Opened 10 years ago Closed 10 years ago

Talos needs to define MOZ_DISABLE_NONLOCAL_CONNECTIONS

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(2 files, 3 obsolete files)

Talos doesn't currently define MOZ_DISABLE_NONLOCAL_CONNECTIONS, so we aren't protecting ourselves against non-loopback connections (eg external resource outside of the build network).

We need to:
a) Make sure it is defined.
b) Fix any failures caused by pre-existing bad behaviour (much of this will just be ensuring the automation prefs for disabling things like safebrowsing etc are set; see bug 1023483 comment 0 for what other suites use).
We'll need to wait until any test failures are fixed before this can land - but here's the patch for now.

I wasn't sure if the ttest.py change makes the PerfConfigurator.py change unnecessary - the "remote-specific defaults" comment to me implies they only override - but there is already duplication between the two locations?

This try run will tell us what needs fixing before this can land:
https://tbpl.mozilla.org/?tree=Try&rev=a06dd6870851&jobname=talos
Attachment #8441977 - Flags: review?(jmaher)
> This try run will tell us what needs fixing before this can land:
> https://tbpl.mozilla.org/?tree=Try&rev=a06dd6870851&jobname=talos

Another one with a TBPL parsable error prefix, and the MOZ_CRASH removed, in the hope we can find more than just the first failure in each run:
https://tbpl.mozilla.org/?tree=Try&rev=c088826545b9&jobname=talos
Depends on: 1026869
Comment on attachment 8441977 [details] [diff] [review]
Define MOZ_DISABLE_NONLOCAL_CONNECTIONS

Review of attachment 8441977 [details] [diff] [review]:
-----------------------------------------------------------------

just remove the changes in ttest.py and we can go forward with the changes in PerfConfigurator.py.

::: talos/ttest.py
@@ +278,5 @@
>          else:
>              utils.setEnvironmentVars({'MOZ_CRASHREPORTER_DISABLE': '1'})
>  
> +        # Crash on non-local network connections.
> +        utils.setEnvironmentVars({'MOZ_DISABLE_NONLOCAL_CONNECTIONS': '1'})

we don't need this set here- all global environment variables are set in Perfconfigurator.py and conditional ones are set in ttest.py
Attachment #8441977 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #3)
> we don't need this set here- all global environment variables are set in
> Perfconfigurator.py and conditional ones are set in ttest.py

Ah - have qreffed locally and filed bug 1027074 for removing another dupe, which is what made me think it needed specifying in both locations :-)

Will hold off landing until the external-access failures are fixed.
Depends on: 1027113
Depends on: 994302, 996871
Updated, but needs deps before landing.
Attachment #8441977 - Attachment is obsolete: true
Now that bug 1028999 has updated the talos zip with the latest fixes in the dependant bugs, I've pushed to try again to see where we're at:
remote:   https://tbpl.mozilla.org/?tree=Try&rev=55bfde8a3bd2
Unfortunately there are still quite a few failures (search for "Non-local" in the full logs to find the actual failure line):

# dromaeojs, tp5o, svgr:
telemetry-experiment.cdn.mozilla.net
https://tbpl.mozilla.org/php/getParsedLog.php?id=42428257&tree=Try

# remote-tp4m_nochrome:
www.gstatic.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=42439912&full=1&branch=try

# xperf:
ynuf.alipay.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=42429996&tree=Try

# remote-trobopan:
upload.wikimedia.org
https://tbpl.mozilla.org/php/getParsedLog.php?id=42429059&full=1&branch=try

# remote-troboprovider:
safebrowsing.google.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=42429023&full=1&branch=try
> # dromaeojs, tp5o, svgr:
> telemetry-experiment.cdn.mozilla.net
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42428257&tree=Try

http://mxr.mozilla.org/mozilla-central/source/testing/profiles/prefs_general.js#61
61 user_pref("experiments.manifest.uri", "http://%(server)s/experiments-dummy/manifest");

> # remote-tp4m_nochrome:
> www.gstatic.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42439912&full=1&branch=try

Seems to be from:
page_load_test/mobile_tp4/news.google.com/news.google.com/index.html

> # xperf:
> ynuf.alipay.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429996&tree=Try

From:
page_load_test/tp5n/alipay.com/www.alipay.com/index.html

> # remote-trobopan:
> upload.wikimedia.org
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429059&full=1&branch=try

From:
https://hg.mozilla.org/build/talos/file/3a6a28b96228/talos/startup_test/fennecmark/wikipedia.html#l1095
  1095 		<a style="background-image: url(http://upload.wikimedia.org/wikipedia/en/b/bc/Wiki.png);" href="http://localhost/wiki/Main_Page" title="Visit the main page"></a>

> # remote-troboprovider:
> safebrowsing.google.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429023&full=1&branch=try

I thought we'd disabled safebrowsing?

Seems to happen straight after startup:

02:32:14     INFO -  06-25 02:32:01.640 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: PAGE_SHOW
02:32:14     INFO -  06-25 02:32:01.640 D/GeckoToolbar( 2187): onTabChanged: VIEWPORT_CHANGE
02:32:14     INFO -  06-25 02:32:01.648 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: VIEWPORT_CHANGE
02:32:14     INFO -  06-25 02:32:01.648 D/GeckoToolbar( 2187): onTabChanged: STOP
02:32:14     INFO -  06-25 02:32:01.648 D/GeckoHealthRec( 2187): Got all add-ons and prefs.
02:32:14     INFO -  06-25 02:32:01.648 I/GeckoHealthRec( 2187): Persisting 1 add-ons.
02:32:14     INFO -  06-25 02:32:01.648 I/GeckoHealthRec( 2187): Persisting prefs.
02:32:14     INFO -  06-25 02:32:01.648 D/GeckoHealthRec( 2187): Incorporating environment: extensions.blocklist.enabled
02:32:14     INFO -  06-25 02:32:01.648 D/GeckoHealthRec( 2187): Incorporating environment: intl.accept_languages
02:32:14     INFO -  06-25 02:32:01.648 D/GeckoHealthRec( 2187): Incorporating environment: toolkit.telemetry.enabled
02:32:14     INFO -  06-25 02:32:01.671 D/GeckoHealthRec( 2187): Done initializing profile cache. Beginning storage init.
02:32:14     INFO -  06-25 02:32:01.679 E/GeckoConsole( 2187): Sending snapshot message.
02:32:14     INFO -  06-25 02:32:01.679 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: STOP
02:32:14     INFO -  06-25 02:32:01.726 I/GeckoDisplayPort( 2187): Set strategy VelocityBiasStrategy mult=2.0, threshold=5.1200004, reverse=0.2, dangerBaseX=1.0, dangerBaseY=1.0, dangerIncrX=0.0, dangerIncrY=0.0
02:32:14     INFO -  06-25 02:32:01.750 I/IdleService( 2187): Registering Idle observer callback
02:32:14     INFO -  06-25 02:32:01.750 I/IdleService( 2187): Register idle observer 0x6c7d4dc0 for 180 seconds
02:32:14     INFO -  06-25 02:32:01.750 I/IdleService( 2187): Register: adjusting next switch from -1 to 180 seconds
02:32:14     INFO -  06-25 02:32:01.750 I/IdleService( 2187): next timeout 179999 msec from now
02:32:14     INFO -  06-25 02:32:01.750 I/IdleService( 2187): SetTimerExpiryIfBefore: next timeout 179999 msec from now
02:32:14     INFO -  06-25 02:32:01.750 I/IdleService( 2187): reset timer expiry to 180009 msec from now
02:32:14     INFO -  06-25 02:32:01.820 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Initializing measurement org.mozilla.appSessions to 4 (current 0)
02:32:14     INFO -  06-25 02:32:01.820 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Measurement org.mozilla.appSessions now at 4
02:32:14     INFO -  06-25 02:32:01.820 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Initializing measurement org.mozilla.searches.counts to 5 (current 0)
02:32:14     INFO -  06-25 02:32:01.859 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Measurement org.mozilla.searches.counts now at 5
02:32:14     INFO -  06-25 02:32:01.859 D/GeckoHealthRec( 2187): Ensuring environment.
02:32:14     INFO -  06-25 02:32:01.914 D/GeckoHealthRec( 2187): Finishing init.
02:32:14     INFO -  06-25 02:32:01.937 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:01.953 D/GeckoHealthRec( 2187): Checking for orphan session.
02:32:14     INFO -  06-25 02:32:02.078 V/GeckoHealthRec( 2187): Recorded session entry for env 1, current is 1
02:32:14     INFO -  06-25 02:32:02.171 D/GeckoToolbar( 2187): onTabChanged: THUMBNAIL
02:32:14     INFO -  06-25 02:32:02.171 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: THUMBNAIL
02:32:14     INFO -  06-25 02:32:02.484 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:03.507 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:04.359 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:05.453 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:07.000 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:08.281 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14     INFO -  06-25 02:32:09.187 I/Gecko   ( 2187): TEST-UNEXPECTED-FAIL | Non-local network connections are disabled and a connection attempt to safebrowsing.google.com (74.125.239.132) was made.
No longer depends on: 994302
so safebrowsing prefs are set to *disable* it:
02:12:42     INFO -    browser.link.open_newwindow: 2
02:12:42     INFO -    browser.safebrowsing.enabled: false
02:12:42     INFO -    browser.safebrowsing.gethashURL: http://127.0.0.1/safebrowsing-dummy/gethash
02:12:42     INFO -    browser.safebrowsing.keyURL: http://127.0.0.1/safebrowsing-dummy/newkey
02:12:42     INFO -    browser.safebrowsing.malware.enabled: false
02:12:42     INFO -    browser.safebrowsing.updateURL: http://127.0.0.1/safebrowsing-dummy/update

maybe we have more safe browsing prefs to adjust.

dcamp, do you know who would have more information about safe browsing preferences so we can disable external access?
Flags: needinfo?(dcamp)
Depends on: 1030093
Depends on: 1030111
fixed a couple issues, still have:
> # remote-tp4m_nochrome:
> www.gstatic.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42439912&full=1&branch=try

Seems to be from:
page_load_test/mobile_tp4/news.google.com/news.google.com/index.html

> # xperf:
> ynuf.alipay.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429996&tree=Try

From:
page_load_test/tp5n/alipay.com/www.alipay.com/index.html

and safebrowsing.
I've filed bug 1030161 for following up on whether the Talos telemetry submissions made it through and if so, by how much they skewed the data.
Depends on: 1030166
Depends on: 1030169
Depends on: 1030177
Depends on: 1030179
filed bug 1030179 to track safe browsing and asked :gcp for information there.
Flags: needinfo?(dcamp)
Latest try push:
remote:   https://tbpl.mozilla.org/?tree=Try&rev=ee8087524eb1

Note I didn't update talos.zip, only the rev used by talos_from_code.py, but this will at least show the fixes where we use the repo directly and also for the remote talos tp hosted files.
The problem with this new version is that android tests fail on the first failure (process crash) and in the case of tpn, we fixed one problem and now we have a new one.
(In reply to Joel Maher (:jmaher) from comment #16)
> The problem with this new version is that android tests fail on the first
> failure (process crash) and in the case of tpn, we fixed one problem and now
> we have a new one.

Yeah I will switch to make the failure non-fatal once I see where we're at - I tried this before but we were missing failures on some platforms, since they weren't all showing up in the logs (need to dig into it more).

Another try push since I bungled the talos rev for the comment 15 one:
remote:   https://tbpl.mozilla.org/?tree=Try&rev=aaa025fc2ca0
xperf is still getting ynuf.alipay.com failures, but I can't see why, since it's supposed to be using the same tp5n zip as everything else:

06:09:40     INFO -              u'tp5o': {u'pagesets_manifest_path': u'talos/page_load_test/tp5n/tp5o.manifest',
06:09:40     INFO -                        u'pagesets_parent_dir_path': u'talos/page_load_test/',
06:09:40     INFO -                        u'pagesets_url': u'http://talos-bundles.pvt.build.mozilla.org/zips/tp5n.zip',
06:09:40     INFO -                        u'plugins': {u'32': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash32_10_3_183_5.zip',
06:09:40     INFO -                                     u'64': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash64_11_0_d1_98.zip'},
06:09:40     INFO -                        u'tests': [u'tp5o']},
...
06:09:40     INFO -              u'xperf': {u'pagesets_manifest_path': u'talos/page_load_test/tp5n/tp5n.manifest',
06:09:40     INFO -                         u'pagesets_parent_dir_path': u'talos/page_load_test/',
06:09:40     INFO -                         u'pagesets_url': u'http://talos-bundles.pvt.build.mozilla.org/zips/tp5n.zip',
06:09:40     INFO -                         u'plugins': {u'32': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash32_10_3_183_5.zip',
06:09:40     INFO -                                      u'64': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash64_11_0_d1_98.zip'},
06:09:40     INFO -                         u'talos_options': [u'--xperf_path',
06:09:40     INFO -                                            u'"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"',
06:09:40     INFO -                                            u'C:/slave/talos-data/talos/xperf.config'],
06:09:40     INFO -                         u'tests': [u'tp5n']}},

Maybe it's not using the manifests correctly?
Also what's strange on the tpn failures, is that "non-local" doesn't appear anywhere in the log? Are we not flushing stderr in time? We're also not finding the crash dump.
I pinged about uploading tp5n.zip again, that will probably be next week.  As for tpn on android, we will need to figure out why this crash happens.
Joel, I'd like to get this work backported to 31 so we can benefit from this work throughout the next ESR cycle. How practical is that?
We can backport it but we are not green yet.  I believe we are green on desktop and the failures on android can be done outside of talos.  The one exception is safe browsing in bug 1030179.  If that was in, I would say we are ready to backport to beta/aurora.
New try run (linking to tbpl-dev since at the time it had the new parser tweak that tbpl prod didn't):
https://tbpl-dev.allizom.org/?tree=Try&rev=349e5eb93687&jobname=talos

We've had a regression (from bug 1009816), will file a dep bug now.
Depends on: 1048374
New try run with the fix from bug 1048374 (for platforms that use talos from the repo) + TBPL now supporting the new line prefix:
https://tbpl.mozilla.org/?tree=Try&rev=29cd10e84f99
Depends on: 1048448
The latest try run (comment 24) which includes the fix from bug 1048374 is all green apart from:

Android 4.0 Panda try talos remote-troboprovider:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45190486&tree=Try
12:05:02     INFO -  08-04 12:04:57.617 I/Gecko   ( 2191): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to safebrowsing.google.com (173.194.33.46) was made.

Android 4.0 Panda try talos remote-tp4m_nochrome:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45192357&tree=Try
-> sadly we don't get the error in the log, only:
command timed out: 3600 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_panda_talos.py', '--talos-suite', 'remote-tp4m_nochrome', '--cfg', 'android/android_panda_talos_releng.py', '--branch-name', 'Try', '--blob-upload-branch', 'Try'], attempting to kill

...however for tp4m I'm guessing bug 1048448 will sort this.

Joel, do you know if there is a bug on file for ensuring crashes during tp4m get dumped to the log correctly? 
I'll also file a bug for ensuring the "FATAL ERROR" doesn't get sanitised to "FATAL ERR_R" for the remote-troboprovider example above.
Meant to say - I think even if we can't turn this on for all platforms/talos suites, it would be good if we could enable at least for desktop in the short term, to prevent further regressions like bug 1048374.
Shall we just land this for now (once bug 1048448 lands) & follow up tprovider after?
Attachment #8467680 - Flags: review?(jmaher)
Attachment #8442865 - Attachment is obsolete: true
(In reply to Ed Morley [:edmorley] from comment #25)
> I'll also file a bug for ensuring the "FATAL ERROR" doesn't get sanitised to
> "FATAL ERR_R" for the remote-troboprovider example above.

Bug 1048836.
Comment on attachment 8467680 [details] [diff] [review]
Define MOZ_DISABLE_NONLOCAL_CONNECTIONS for all suites apart from tprovider on mobile

Review of attachment 8467680 [details] [diff] [review]:
-----------------------------------------------------------------

once the new tp4m.zip file is deployed we can retrigger the tpn job and verify it is green.

::: talos/ttest.py
@@ +281,5 @@
> +        # Crash on non-local network connections.
> +        # Not enabled for Android tprovider due to yet to be diagnosed connections
> +        # to safebrowsing.google.com.
> +        if not browser_config['remote'] or test_config['name'] != 'tprovider':
> +            utils.setEnvironmentVars({'MOZ_DISABLE_NONLOCAL_CONNECTIONS': '1'})

just check if test_config['name'] != 'tprovider', this way we require this for android.
Attachment #8467680 - Flags: review?(jmaher) → review-
Same but without check for mobile (given that presumably tprovider test name is unique to the Android job).
Attachment #8467710 - Flags: review?(jmaher)
Attachment #8467680 - Attachment is obsolete: true
Comment on attachment 8467710 [details] [diff] [review]
Define MOZ_DISABLE_NONLOCAL_CONNECTIONS for all suites apart from tprovider

Review of attachment 8467710 [details] [diff] [review]:
-----------------------------------------------------------------

this is great- I assume that we will have this set by default in the browser for the desktop and other tests, we don't need to set an environment variable?
Attachment #8467710 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #31)
> this is great- I assume that we will have this set by default in the browser
> for the desktop and other tests, we don't need to set an environment
> variable?

I'm sorry I don't understand? This patch is setting an environment variable - both for desktop and mobile. (This doesn't require any changes to gecko, since the check for the environment variable is done in the networking code already - bug 995417).
forget my confusion- I was typing from the left brain and understanding this from the right brain :)
Ah cool :-)

Landing this now, but we'll obviously need to do another try run once bug 1048448 is deployed.

remote:   https://hg.mozilla.org/build/talos/rev/2185428a0061
so even with updating mobile_tp4 pageset, tpn is still crashing:
https://tbpl.mozilla.org/?tree=Try&rev=29cd10e84f99

I am not sure what to do here, someone would need to debug this locally, and then maybe on a live system.
I see (not obvious, due to bug 1048836):
https://tbpl.mozilla.org/php/getParsedLog.php?id=45358480&full=1&branch=try
12:46:21     INFO -  08-06 12:45:10.671 I/Gecko   ( 2198): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to g-ecx.images-amazon.com (54.230.116.138) was made.
A grep of the tp4 zip linked from the other bug for "//g-ecx.images-amazon.com" returns hits in:
mobile_tp4/amazon.com/www.amazon.com/index.html
mobile_tp4/amazon.com/z-ecx.images-amazon.com/images/G/01/nav2/gamma/cmuAnnotations/cmuAnnotations-cmuAnnotations-44699.js._V174015304_.js
mobile_tp4/m.amazon.com/www.amazon.com/gp/anywhere/scripts/touchsmart.css/8448790c6fd3d4fd7f45b4089c520045.css
Depends on: 1050161
Broken the mobile tpn pageset issue out to bug 1050161.
Attachment #8470002 - Flags: review?(emorley) → review+
Depends on: 1050769
Depends on: 1050824
Depends on: 1051993
Depends on: 1051998
Calling this fixed apart from bug 1051998 (tpr - tprovider), which is waiting on bug 1030179.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: