Closed
Bug 1026970
Opened 10 years ago
Closed 10 years ago
Talos needs to define MOZ_DISABLE_NONLOCAL_CONNECTIONS
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(2 files, 3 obsolete files)
1.40 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
1.06 KB,
patch
|
emorley
:
review+
|
Details | Diff | Splinter Review |
Talos doesn't currently define MOZ_DISABLE_NONLOCAL_CONNECTIONS, so we aren't protecting ourselves against non-loopback connections (eg external resource outside of the build network).
We need to:
a) Make sure it is defined.
b) Fix any failures caused by pre-existing bad behaviour (much of this will just be ensuring the automation prefs for disabling things like safebrowsing etc are set; see bug 1023483 comment 0 for what other suites use).
Assignee | ||
Comment 1•10 years ago
|
||
We'll need to wait until any test failures are fixed before this can land - but here's the patch for now.
I wasn't sure if the ttest.py change makes the PerfConfigurator.py change unnecessary - the "remote-specific defaults" comment to me implies they only override - but there is already duplication between the two locations?
This try run will tell us what needs fixing before this can land:
https://tbpl.mozilla.org/?tree=Try&rev=a06dd6870851&jobname=talos
Attachment #8441977 -
Flags: review?(jmaher)
Assignee | ||
Comment 2•10 years ago
|
||
> This try run will tell us what needs fixing before this can land:
> https://tbpl.mozilla.org/?tree=Try&rev=a06dd6870851&jobname=talos
Another one with a TBPL parsable error prefix, and the MOZ_CRASH removed, in the hope we can find more than just the first failure in each run:
https://tbpl.mozilla.org/?tree=Try&rev=c088826545b9&jobname=talos
Comment 3•10 years ago
|
||
Comment on attachment 8441977 [details] [diff] [review]
Define MOZ_DISABLE_NONLOCAL_CONNECTIONS
Review of attachment 8441977 [details] [diff] [review]:
-----------------------------------------------------------------
just remove the changes in ttest.py and we can go forward with the changes in PerfConfigurator.py.
::: talos/ttest.py
@@ +278,5 @@
> else:
> utils.setEnvironmentVars({'MOZ_CRASHREPORTER_DISABLE': '1'})
>
> + # Crash on non-local network connections.
> + utils.setEnvironmentVars({'MOZ_DISABLE_NONLOCAL_CONNECTIONS': '1'})
we don't need this set here- all global environment variables are set in Perfconfigurator.py and conditional ones are set in ttest.py
Attachment #8441977 -
Flags: review?(jmaher) → review+
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Assignee | ||
Comment 6•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #3)
> we don't need this set here- all global environment variables are set in
> Perfconfigurator.py and conditional ones are set in ttest.py
Ah - have qreffed locally and filed bug 1027074 for removing another dupe, which is what made me think it needed specifying in both locations :-)
Will hold off landing until the external-access failures are fixed.
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 7•10 years ago
|
||
Updated, but needs deps before landing.
Assignee | ||
Updated•10 years ago
|
Attachment #8441977 -
Attachment is obsolete: true
Assignee | ||
Comment 8•10 years ago
|
||
Now that bug 1028999 has updated the talos zip with the latest fixes in the dependant bugs, I've pushed to try again to see where we're at:
remote: https://tbpl.mozilla.org/?tree=Try&rev=55bfde8a3bd2
Assignee | ||
Comment 9•10 years ago
|
||
Unfortunately there are still quite a few failures (search for "Non-local" in the full logs to find the actual failure line):
# dromaeojs, tp5o, svgr:
telemetry-experiment.cdn.mozilla.net
https://tbpl.mozilla.org/php/getParsedLog.php?id=42428257&tree=Try
# remote-tp4m_nochrome:
www.gstatic.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=42439912&full=1&branch=try
# xperf:
ynuf.alipay.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=42429996&tree=Try
# remote-trobopan:
upload.wikimedia.org
https://tbpl.mozilla.org/php/getParsedLog.php?id=42429059&full=1&branch=try
# remote-troboprovider:
safebrowsing.google.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=42429023&full=1&branch=try
Assignee | ||
Comment 10•10 years ago
|
||
> # dromaeojs, tp5o, svgr:
> telemetry-experiment.cdn.mozilla.net
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42428257&tree=Try
http://mxr.mozilla.org/mozilla-central/source/testing/profiles/prefs_general.js#61
61 user_pref("experiments.manifest.uri", "http://%(server)s/experiments-dummy/manifest");
> # remote-tp4m_nochrome:
> www.gstatic.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42439912&full=1&branch=try
Seems to be from:
page_load_test/mobile_tp4/news.google.com/news.google.com/index.html
> # xperf:
> ynuf.alipay.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429996&tree=Try
From:
page_load_test/tp5n/alipay.com/www.alipay.com/index.html
> # remote-trobopan:
> upload.wikimedia.org
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429059&full=1&branch=try
From:
https://hg.mozilla.org/build/talos/file/3a6a28b96228/talos/startup_test/fennecmark/wikipedia.html#l1095
1095 <a style="background-image: url(http://upload.wikimedia.org/wikipedia/en/b/bc/Wiki.png);" href="http://localhost/wiki/Main_Page" title="Visit the main page"></a>
> # remote-troboprovider:
> safebrowsing.google.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429023&full=1&branch=try
I thought we'd disabled safebrowsing?
Seems to happen straight after startup:
02:32:14 INFO - 06-25 02:32:01.640 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: PAGE_SHOW
02:32:14 INFO - 06-25 02:32:01.640 D/GeckoToolbar( 2187): onTabChanged: VIEWPORT_CHANGE
02:32:14 INFO - 06-25 02:32:01.648 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: VIEWPORT_CHANGE
02:32:14 INFO - 06-25 02:32:01.648 D/GeckoToolbar( 2187): onTabChanged: STOP
02:32:14 INFO - 06-25 02:32:01.648 D/GeckoHealthRec( 2187): Got all add-ons and prefs.
02:32:14 INFO - 06-25 02:32:01.648 I/GeckoHealthRec( 2187): Persisting 1 add-ons.
02:32:14 INFO - 06-25 02:32:01.648 I/GeckoHealthRec( 2187): Persisting prefs.
02:32:14 INFO - 06-25 02:32:01.648 D/GeckoHealthRec( 2187): Incorporating environment: extensions.blocklist.enabled
02:32:14 INFO - 06-25 02:32:01.648 D/GeckoHealthRec( 2187): Incorporating environment: intl.accept_languages
02:32:14 INFO - 06-25 02:32:01.648 D/GeckoHealthRec( 2187): Incorporating environment: toolkit.telemetry.enabled
02:32:14 INFO - 06-25 02:32:01.671 D/GeckoHealthRec( 2187): Done initializing profile cache. Beginning storage init.
02:32:14 INFO - 06-25 02:32:01.679 E/GeckoConsole( 2187): Sending snapshot message.
02:32:14 INFO - 06-25 02:32:01.679 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: STOP
02:32:14 INFO - 06-25 02:32:01.726 I/GeckoDisplayPort( 2187): Set strategy VelocityBiasStrategy mult=2.0, threshold=5.1200004, reverse=0.2, dangerBaseX=1.0, dangerBaseY=1.0, dangerIncrX=0.0, dangerIncrY=0.0
02:32:14 INFO - 06-25 02:32:01.750 I/IdleService( 2187): Registering Idle observer callback
02:32:14 INFO - 06-25 02:32:01.750 I/IdleService( 2187): Register idle observer 0x6c7d4dc0 for 180 seconds
02:32:14 INFO - 06-25 02:32:01.750 I/IdleService( 2187): Register: adjusting next switch from -1 to 180 seconds
02:32:14 INFO - 06-25 02:32:01.750 I/IdleService( 2187): next timeout 179999 msec from now
02:32:14 INFO - 06-25 02:32:01.750 I/IdleService( 2187): SetTimerExpiryIfBefore: next timeout 179999 msec from now
02:32:14 INFO - 06-25 02:32:01.750 I/IdleService( 2187): reset timer expiry to 180009 msec from now
02:32:14 INFO - 06-25 02:32:01.820 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Initializing measurement org.mozilla.appSessions to 4 (current 0)
02:32:14 INFO - 06-25 02:32:01.820 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Measurement org.mozilla.appSessions now at 4
02:32:14 INFO - 06-25 02:32:01.820 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Initializing measurement org.mozilla.searches.counts to 5 (current 0)
02:32:14 INFO - 06-25 02:32:01.859 I/GeckoLogger( 2187): fennec :: HealthReportStorage :: Measurement org.mozilla.searches.counts now at 5
02:32:14 INFO - 06-25 02:32:01.859 D/GeckoHealthRec( 2187): Ensuring environment.
02:32:14 INFO - 06-25 02:32:01.914 D/GeckoHealthRec( 2187): Finishing init.
02:32:14 INFO - 06-25 02:32:01.937 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:01.953 D/GeckoHealthRec( 2187): Checking for orphan session.
02:32:14 INFO - 06-25 02:32:02.078 V/GeckoHealthRec( 2187): Recorded session entry for env 1, current is 1
02:32:14 INFO - 06-25 02:32:02.171 D/GeckoToolbar( 2187): onTabChanged: THUMBNAIL
02:32:14 INFO - 06-25 02:32:02.171 D/GeckoBrowserApp( 2187): BrowserApp.onTabChanged: 0: THUMBNAIL
02:32:14 INFO - 06-25 02:32:02.484 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:03.507 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:04.359 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:05.453 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:07.000 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:08.281 D/GeckoSuggestedSites( 2187): Number of suggested sites: 4
02:32:14 INFO - 06-25 02:32:09.187 I/Gecko ( 2187): TEST-UNEXPECTED-FAIL | Non-local network connections are disabled and a connection attempt to safebrowsing.google.com (74.125.239.132) was made.
Comment 11•10 years ago
|
||
so safebrowsing prefs are set to *disable* it:
02:12:42 INFO - browser.link.open_newwindow: 2
02:12:42 INFO - browser.safebrowsing.enabled: false
02:12:42 INFO - browser.safebrowsing.gethashURL: http://127.0.0.1/safebrowsing-dummy/gethash
02:12:42 INFO - browser.safebrowsing.keyURL: http://127.0.0.1/safebrowsing-dummy/newkey
02:12:42 INFO - browser.safebrowsing.malware.enabled: false
02:12:42 INFO - browser.safebrowsing.updateURL: http://127.0.0.1/safebrowsing-dummy/update
maybe we have more safe browsing prefs to adjust.
dcamp, do you know who would have more information about safe browsing preferences so we can disable external access?
Flags: needinfo?(dcamp)
Comment 12•10 years ago
|
||
fixed a couple issues, still have:
> # remote-tp4m_nochrome:
> www.gstatic.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42439912&full=1&branch=try
Seems to be from:
page_load_test/mobile_tp4/news.google.com/news.google.com/index.html
> # xperf:
> ynuf.alipay.com
> https://tbpl.mozilla.org/php/getParsedLog.php?id=42429996&tree=Try
From:
page_load_test/tp5n/alipay.com/www.alipay.com/index.html
and safebrowsing.
Assignee | ||
Comment 13•10 years ago
|
||
I've filed bug 1030161 for following up on whether the Talos telemetry submissions made it through and if so, by how much they skewed the data.
Comment 14•10 years ago
|
||
filed bug 1030179 to track safe browsing and asked :gcp for information there.
Flags: needinfo?(dcamp)
Assignee | ||
Comment 15•10 years ago
|
||
Latest try push:
remote: https://tbpl.mozilla.org/?tree=Try&rev=ee8087524eb1
Note I didn't update talos.zip, only the rev used by talos_from_code.py, but this will at least show the fixes where we use the repo directly and also for the remote talos tp hosted files.
Comment 16•10 years ago
|
||
The problem with this new version is that android tests fail on the first failure (process crash) and in the case of tpn, we fixed one problem and now we have a new one.
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #16)
> The problem with this new version is that android tests fail on the first
> failure (process crash) and in the case of tpn, we fixed one problem and now
> we have a new one.
Yeah I will switch to make the failure non-fatal once I see where we're at - I tried this before but we were missing failures on some platforms, since they weren't all showing up in the logs (need to dig into it more).
Another try push since I bungled the talos rev for the comment 15 one:
remote: https://tbpl.mozilla.org/?tree=Try&rev=aaa025fc2ca0
Assignee | ||
Comment 18•10 years ago
|
||
xperf is still getting ynuf.alipay.com failures, but I can't see why, since it's supposed to be using the same tp5n zip as everything else:
06:09:40 INFO - u'tp5o': {u'pagesets_manifest_path': u'talos/page_load_test/tp5n/tp5o.manifest',
06:09:40 INFO - u'pagesets_parent_dir_path': u'talos/page_load_test/',
06:09:40 INFO - u'pagesets_url': u'http://talos-bundles.pvt.build.mozilla.org/zips/tp5n.zip',
06:09:40 INFO - u'plugins': {u'32': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash32_10_3_183_5.zip',
06:09:40 INFO - u'64': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash64_11_0_d1_98.zip'},
06:09:40 INFO - u'tests': [u'tp5o']},
...
06:09:40 INFO - u'xperf': {u'pagesets_manifest_path': u'talos/page_load_test/tp5n/tp5n.manifest',
06:09:40 INFO - u'pagesets_parent_dir_path': u'talos/page_load_test/',
06:09:40 INFO - u'pagesets_url': u'http://talos-bundles.pvt.build.mozilla.org/zips/tp5n.zip',
06:09:40 INFO - u'plugins': {u'32': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash32_10_3_183_5.zip',
06:09:40 INFO - u'64': u'http://talos-bundles.pvt.build.mozilla.org/zips/flash64_11_0_d1_98.zip'},
06:09:40 INFO - u'talos_options': [u'--xperf_path',
06:09:40 INFO - u'"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"',
06:09:40 INFO - u'C:/slave/talos-data/talos/xperf.config'],
06:09:40 INFO - u'tests': [u'tp5n']}},
Maybe it's not using the manifests correctly?
Assignee | ||
Comment 19•10 years ago
|
||
Also what's strange on the tpn failures, is that "non-local" doesn't appear anywhere in the log? Are we not flushing stderr in time? We're also not finding the crash dump.
Comment 20•10 years ago
|
||
I pinged about uploading tp5n.zip again, that will probably be next week. As for tpn on android, we will need to figure out why this crash happens.
Comment 21•10 years ago
|
||
Joel, I'd like to get this work backported to 31 so we can benefit from this work throughout the next ESR cycle. How practical is that?
Comment 22•10 years ago
|
||
We can backport it but we are not green yet. I believe we are green on desktop and the failures on android can be done outside of talos. The one exception is safe browsing in bug 1030179. If that was in, I would say we are ready to backport to beta/aurora.
Assignee | ||
Comment 23•10 years ago
|
||
New try run (linking to tbpl-dev since at the time it had the new parser tweak that tbpl prod didn't):
https://tbpl-dev.allizom.org/?tree=Try&rev=349e5eb93687&jobname=talos
We've had a regression (from bug 1009816), will file a dep bug now.
Assignee | ||
Comment 24•10 years ago
|
||
New try run with the fix from bug 1048374 (for platforms that use talos from the repo) + TBPL now supporting the new line prefix:
https://tbpl.mozilla.org/?tree=Try&rev=29cd10e84f99
Assignee | ||
Comment 25•10 years ago
|
||
The latest try run (comment 24) which includes the fix from bug 1048374 is all green apart from:
Android 4.0 Panda try talos remote-troboprovider:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45190486&tree=Try
12:05:02 INFO - 08-04 12:04:57.617 I/Gecko ( 2191): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to safebrowsing.google.com (173.194.33.46) was made.
Android 4.0 Panda try talos remote-tp4m_nochrome:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45192357&tree=Try
-> sadly we don't get the error in the log, only:
command timed out: 3600 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_panda_talos.py', '--talos-suite', 'remote-tp4m_nochrome', '--cfg', 'android/android_panda_talos_releng.py', '--branch-name', 'Try', '--blob-upload-branch', 'Try'], attempting to kill
...however for tp4m I'm guessing bug 1048448 will sort this.
Joel, do you know if there is a bug on file for ensuring crashes during tp4m get dumped to the log correctly?
I'll also file a bug for ensuring the "FATAL ERROR" doesn't get sanitised to "FATAL ERR_R" for the remote-troboprovider example above.
Assignee | ||
Comment 26•10 years ago
|
||
Meant to say - I think even if we can't turn this on for all platforms/talos suites, it would be good if we could enable at least for desktop in the short term, to prevent further regressions like bug 1048374.
Assignee | ||
Comment 27•10 years ago
|
||
Shall we just land this for now (once bug 1048448 lands) & follow up tprovider after?
Attachment #8467680 -
Flags: review?(jmaher)
Assignee | ||
Updated•10 years ago
|
Attachment #8442865 -
Attachment is obsolete: true
Assignee | ||
Comment 28•10 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #25)
> I'll also file a bug for ensuring the "FATAL ERROR" doesn't get sanitised to
> "FATAL ERR_R" for the remote-troboprovider example above.
Bug 1048836.
Comment 29•10 years ago
|
||
Comment on attachment 8467680 [details] [diff] [review]
Define MOZ_DISABLE_NONLOCAL_CONNECTIONS for all suites apart from tprovider on mobile
Review of attachment 8467680 [details] [diff] [review]:
-----------------------------------------------------------------
once the new tp4m.zip file is deployed we can retrigger the tpn job and verify it is green.
::: talos/ttest.py
@@ +281,5 @@
> + # Crash on non-local network connections.
> + # Not enabled for Android tprovider due to yet to be diagnosed connections
> + # to safebrowsing.google.com.
> + if not browser_config['remote'] or test_config['name'] != 'tprovider':
> + utils.setEnvironmentVars({'MOZ_DISABLE_NONLOCAL_CONNECTIONS': '1'})
just check if test_config['name'] != 'tprovider', this way we require this for android.
Attachment #8467680 -
Flags: review?(jmaher) → review-
Assignee | ||
Comment 30•10 years ago
|
||
Same but without check for mobile (given that presumably tprovider test name is unique to the Android job).
Attachment #8467710 -
Flags: review?(jmaher)
Assignee | ||
Updated•10 years ago
|
Attachment #8467680 -
Attachment is obsolete: true
Comment 31•10 years ago
|
||
Comment on attachment 8467710 [details] [diff] [review]
Define MOZ_DISABLE_NONLOCAL_CONNECTIONS for all suites apart from tprovider
Review of attachment 8467710 [details] [diff] [review]:
-----------------------------------------------------------------
this is great- I assume that we will have this set by default in the browser for the desktop and other tests, we don't need to set an environment variable?
Attachment #8467710 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 32•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #31)
> this is great- I assume that we will have this set by default in the browser
> for the desktop and other tests, we don't need to set an environment
> variable?
I'm sorry I don't understand? This patch is setting an environment variable - both for desktop and mobile. (This doesn't require any changes to gecko, since the check for the environment variable is done in the networking code already - bug 995417).
Comment 33•10 years ago
|
||
forget my confusion- I was typing from the left brain and understanding this from the right brain :)
Assignee | ||
Comment 34•10 years ago
|
||
Ah cool :-)
Landing this now, but we'll obviously need to do another try run once bug 1048448 is deployed.
remote: https://hg.mozilla.org/build/talos/rev/2185428a0061
Comment 35•10 years ago
|
||
so even with updating mobile_tp4 pageset, tpn is still crashing:
https://tbpl.mozilla.org/?tree=Try&rev=29cd10e84f99
I am not sure what to do here, someone would need to debug this locally, and then maybe on a live system.
Assignee | ||
Comment 36•10 years ago
|
||
I see (not obvious, due to bug 1048836):
https://tbpl.mozilla.org/php/getParsedLog.php?id=45358480&full=1&branch=try
12:46:21 INFO - 08-06 12:45:10.671 I/Gecko ( 2198): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to g-ecx.images-amazon.com (54.230.116.138) was made.
Assignee | ||
Comment 37•10 years ago
|
||
A grep of the tp4 zip linked from the other bug for "//g-ecx.images-amazon.com" returns hits in:
mobile_tp4/amazon.com/www.amazon.com/index.html
mobile_tp4/amazon.com/z-ecx.images-amazon.com/images/G/01/nav2/gamma/cmuAnnotations/cmuAnnotations-cmuAnnotations-44699.js._V174015304_.js
mobile_tp4/m.amazon.com/www.amazon.com/gp/anywhere/scripts/touchsmart.css/8448790c6fd3d4fd7f45b4089c520045.css
Assignee | ||
Comment 38•10 years ago
|
||
Broken the mobile tpn pageset issue out to bug 1050161.
Comment 39•10 years ago
|
||
Attachment #8470002 -
Flags: review?(emorley)
Assignee | ||
Updated•10 years ago
|
Attachment #8470002 -
Flags: review?(emorley) → review+
Comment 40•10 years ago
|
||
Assignee | ||
Comment 41•10 years ago
|
||
Calling this fixed apart from bug 1051998 (tpr - tprovider), which is waiting on bug 1030179.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•