Closed Bug 1753003 Opened 3 years ago Closed 3 years ago

Investigate network connections to incoming.telemetry.mozilla.org during telemetry client tests

Categories

(Remote Protocol :: Marionette, defect, P1)

Tracking

(firefox98 fixed)

RESOLVED FIXED
98 Branch

People

(Reporter: jdescottes, Assigned: jdescottes)

Attachments

(1 file, 1 obsolete file)

Filing as a follow-up to the last few comments in Bug 1371576, which focused on network connections in telemetry tests.

See comment 30 through comment 35.

Despite setting telemetry.fog.test.localhost_port to -1 in the test runner, we still seem to perform a ping to incoming.telemetry.mozilla.org as seen on the following try push which forced MOZ_DISABLE_NONLOCAL_CONNECTIONS: https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=cOjOySeaT_WMzmheMk9D4Q.0&revision=d991ee6c5a39a57b508e3fbba93dc344004b3397

We should investigate where this ping comes from and whether we can prevent it.

Over on the other bug, the default search engine was mentioned; it also causes network access when just typing text in the URL bar and hitting Enter. AFAIR this is what the telemetry tests do? If so, the default engine has to be switched to a mocked version that is hosted locally or served by the HTTP server that the Marionette harness uses.

Here is an example of how to override the default search engine:
https://searchfox.org/mozilla-central/rev/b0c5c3b9821c2f22193fd6e1e9f66032639da1a1/toolkit/components/search/tests/xpcshell/test_override_allowlist.js#369-370

This bug is about the remaining pings to incoming.telemetry.mozilla.org; let's keep the discussion about the search feature separate.

See Also: → 1753034

(( For my own later reference, see this rev for how to enable non-local connection crashing. ))

:jdescottes, is there any way to get the path and file of the remote request, or the (likely JS) stack of the code that's making it? I know of a few other pieces of code that might be responsible, but without either of those pieces of information I'd be running at them blindly.

Systems I know of that use incoming.tmo:

  • Firefox Desktop Telemetry: the big one. It uses a server defined at toolkit.telemetry.server and pays attention to the datareporting.healthreport.uploadEnabled pref (docs).
  • Glean (via Firefox on Glean): the new kid. It uses either incoming.tmo or localhost based on the value of the telemetry.fog.test.localhost_port pref ( > 0 ? use localhost:<that value>, < 0 ? use no networking at all and pretend to succeed, = 0 ? use incoming.tmo) and also pays attention to the datareporting.healthreport.uploadEnabled pref (docs).
  • The launcher process seems to hardcode the server
    • But it's Windows-only and you see this on Linux. Likely not this one.
  • Downgrade Telemetry grabs the server pref but operates early enough that it may depend on how the profile is applied using the runner.
    • But it grabs it early enough that it probably shouldn't get as far as TelemetrySession::observe - user-interaction-inactive notified (it should happen before the browser has even properly started).
  • The Default Browser Agent also hardcodes the server
    • Like the launcher process, this is Windows-only. Likely not this one so long as we continue to reproduce on Linux.
  • PingCenter [sic] is another system that uses the Data Pipeline at incoming.tmo
    • But it uses TelemetrySend these days, so it shares the same fate (and logging-ness) as Firefox Desktop Telemetry.
  • Activity Stream (a part of new tab) also uses incoming.telemetry.mozilla.org (TIL). Its pref is telemetry.structuredIngestion.endpoint (code) which no one seems to change?
    • Could be this, I suppose. Not sure we ever load about:newtab, but maybe it does some preliminary work even without being loaded? Could test this by setting the structured ingestion endpoint to some other host.
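
As an illustration, the Glean behaviour described in the list above can be sketched as a small decision function. This is not the Firefox on Glean implementation: the function name glean_upload_target, the "no-network" marker and the http:// scheme for localhost are assumptions made for this example. Only the three cases for telemetry.fog.test.localhost_port come from the comment.

```python
def glean_upload_target(localhost_port):
    """Map a telemetry.fog.test.localhost_port value to an upload target.

    Hypothetical helper: it mirrors the rules described in the comment,
    not the actual Firefox on Glean code.
    """
    if localhost_port > 0:
        # Positive value: pings go to a local server on that port.
        return "http://localhost:{}".format(localhost_port)
    if localhost_port < 0:
        # Negative value: no networking at all, uploads pretend to succeed.
        return "no-network"
    # Zero: the production endpoint is used.
    return "https://incoming.telemetry.mozilla.org"
```

With the test runner setting the pref to -1, this sketch returns "no-network", which is why Glean itself looked like an unlikely source of the remaining ping.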

In terms of action items, I'll turn on Glean logging and change the structured ingestion endpoint and see if that gives us any more information.

Assignee: nobody → chutten
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P1

My initial idea was to run the job and record a profile on the side, to capture network requests and eventually get stacktraces, but that doesn't seem to work by default. While I try to find another way of getting the info, I'll log the value of some preferences on Marionette startup.
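
A rough sketch of what logging those preferences could look like follows. It assumes a get_pref callable is available that returns the current value of a pref by name; that callable, the logger name and the log_prefs helper are all hypothetical, and only the pref names are taken from this bug.

```python
import logging

logger = logging.getLogger("telemetry-harness-debug")  # hypothetical logger name

# Pref names mentioned in this bug as possible sources of the ping.
PREFS_TO_LOG = (
    "toolkit.telemetry.server",
    "datareporting.healthreport.uploadEnabled",
    "telemetry.fog.test.localhost_port",
    "telemetry.structuredIngestion.endpoint",
)


def log_prefs(get_pref, names=PREFS_TO_LOG):
    """Log and return the value of each pref, using the supplied getter."""
    values = {}
    for name in names:
        values[name] = get_pref(name)
        logger.info("pref %s = %r", name, values[name])
    return values
```

For a quick check without a browser, a plain dict can stand in for the getter, for example log_prefs({"telemetry.fog.test.localhost_port": -1}.get).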

(In reply to Chris H-C :chutten from comment #3)

  • Activity Stream (a part of new tab) also uses incoming.telemetry.mozilla.org (TIL). Its pref is telemetry.structuredIngestion.endpoint (code) which no one seems to change?
    • Could be this, I suppose. Not sure we ever load about:newtab, but maybe it does some preliminary work even without being loaded? Could test this by setting the structured ingestion endpoint to some other host.

Looks like this is the culprit! I tried to mock both telemetry.structuredIngestion.endpoint and browser.newtabpage.activity-stream.telemetry.structuredIngestion.endpoint, and the test now crashes on a google.com request from one of the tests exercising search.

(edit: although I don't know why I don't reproduce the issue locally)
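
For reference, the mock described above amounts to pointing both prefs at a local address. The sketch below only builds the override mapping; the dummy URL and the helper name are assumptions, and how the mapping is handed to the harness is left out because it depends on the runner.

```python
# The dummy URL is an assumption; any local address should serve the same purpose.
DUMMY_ENDPOINT = "http://127.0.0.1/dummy-structured-ingestion"

# Pref names as given in the comment above.
ACTIVITY_STREAM_ENDPOINT_PREFS = (
    "telemetry.structuredIngestion.endpoint",
    "browser.newtabpage.activity-stream.telemetry.structuredIngestion.endpoint",
)


def local_endpoint_overrides(endpoint=DUMMY_ENDPOINT):
    """Return pref overrides that redirect Activity Stream pings to a local URL."""
    return {pref: endpoint for pref in ACTIVITY_STREAM_ENDPOINT_PREFS}
```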

We disable about:newtab by default in Marionette. But when I let Marionette open an instance of Firefox, I still see "New tab" as the tab title. Also, for opening a new tab, Marionette currently uses the BrowserOpenTab method, which calls openTrustedLinkIn(BROWSER_NEW_TAB_URL, ...).

But all that works for Marionette proper. Maybe there is a specific preference the Telemetry harness resets and which is causing Firefox to send a ping?

(In reply to Julian Descottes [:jdescottes] from comment #5)

(In reply to Chris H-C :chutten from comment #3)

  • Activity Stream (a part of new tab) also uses incoming.telemetry.mozilla.org (TIL). Its pref is telemetry.structuredIngestion.endpoint (code) which no one seems to change?
    • Could be this, I suppose. Not sure we ever load about:newtab, but maybe it does some preliminary work even without being loaded? Could test this by setting the structured ingestion endpoint to some other host.

Looks like this is the culprit! I tried to mock both telemetry.structuredIngestion.endpoint and browser.newtabpage.activity-stream.telemetry.structuredIngestion.endpoint, and the test now crashes on a google.com request from one of the tests exercising search.

(edit: although I don't know why I don't reproduce the issue locally)

I'd r+ a change that kicked those out and skipped the search tests pending bug 1753034

Redirect preferences used for pings from Activity Stream to dummy local URLs, and skip tests relying on search engine pages.
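
One generic way to skip such a test in Python is the standard unittest.skip decorator, sketched below. This is not necessarily how the landed patch skips the tests (a manifest annotation is another option); the class and test names are hypothetical.

```python
import unittest


class TestSearchCounts(unittest.TestCase):  # hypothetical test class
    @unittest.skip("Relies on a remote search engine page; see bug 1753034")
    def test_search_counts(self):
        # Never runs while skipped; the real test would perform a search.
        self.fail("would need network access")


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSearchCounts)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running run_suite() reports the test as skipped rather than failed, so the job stays green until the search engine is mocked.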

Assignee: chutten → jdescottes
Pushed by jdescottes@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/45815e758336 Remove non-local connections from telemetry-tests-client r=chutten,whimboo,webdriver-reviewers
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

Comment on attachment 9262183 [details]
Bug 1753003 - Remove no-longer-needed global Glean eslint annotations r?janerik!

Revision D137793 was moved to bug 1752586. Setting attachment 9262183 [details] to obsolete.

Attachment #9262183 - Attachment is obsolete: true
Product: Testing → Remote Protocol