Investigate whether Talos automation telemetry submissions were accepted

RESOLVED FIXED

Status

defect
--
major
RESOLVED FIXED
5 years ago
6 months ago

People

(Reporter: emorley, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments)

(Reporter)

Description

5 years ago
As part of bug 1026970 (switching on a new feature to help us catch stray connections to external sites during our automated test runs), it was discovered that Talos performance tests attempt to make connections to incoming.telemetry.mozilla.org, e.g.:

https://tbpl.mozilla.org/php/getParsedLog.php?id=41951942&tree=Try
04:03:56     INFO -  Non-local network connections are disabled and a connection attempt to incoming.telemetry.mozilla.org (54.245.232.191) was made. 

This is because the telemetry URI wasn't set to the dummy value in the Talos prefs, since Talos doesn't use prefs_general.js (see bug 1023483).
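
For anyone wanting to scan other logs for the same message, here is a rough sketch. The regular expression is derived only from the single log line quoted above, so the exact wording in other harness versions is an assumption; find_external_connections is a made-up helper name, not part of any harness.

```python
import re

# Pattern based on the one log line quoted in this bug; other harness
# versions may word the message differently (assumption).
PATTERN = re.compile(
    r"Non-local network connections are disabled and a connection attempt to "
    r"(?P<host>\S+) \((?P<ip>[0-9a-fA-F.:]+)\) was made"
)


def find_external_connections(log_lines):
    """Return (host, ip) pairs for every blocked non-local connection attempt."""
    hits = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("host"), match.group("ip")))
    return hits


sample = [
    "04:03:56     INFO -  Non-local network connections are disabled and a "
    "connection attempt to incoming.telemetry.mozilla.org (54.245.232.191) was made.",
    "04:03:57     INFO -  some unrelated line",
]
external = find_external_connections(sample)
print(external)
```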

Bug 996871 (landed in production 2014-06-24) has since ensured that Talos runs use this dummy URI. However, the connection attempts had been occurring for months/years before that, so:

1) Were these submissions making it through the VPN to incoming.telemetry.mozilla.org?
2) Were we accepting those submissions (ie not filtered out server side)?
3) If the answer both to #1 and #2 is yes:
  a) How much has this skewed the data (I don't know what % of submissions we're talking about here)?
  b) Do we want to try and remove those submissions retrospectively (if that's possible)?

Mark - I'm not sure who is best to needinfo here. If it's not you, would you mind redirecting?
Flags: needinfo?(mreid)
(In reply to Ed Morley [:edmorley UTC+0] from comment #0) 
> 1) Were these submissions making it through the VPN to
> incoming.telemetry.mozilla.org?
How would I identify these submissions?  I can take a look and see whether they made it to the server.

> 2) Were we accepting those submissions (ie not filtered out server side)?
If they were valid-looking submissions, they would have been accepted on the server side if they did make it.

> 3) If the answer both to #1 and #2 is yes:
>   a) How much has this skewed the data (I don't know what % of submissions
> we're talking about here)?
I will check this once I know how to identify the submissions.

>   b) Do we want to try and remove those submissions retrospectively (if
> that's possible)?
I think we probably do want to get rid of them, but this has potentially far-reaching implications for any aggregated data we have (mainly data used by the telemetry dashboards).

I'm going to do some preliminary investigation to see if the talos pageloader addon is appearing in telemetry submissions, but if you have any suggestions on how to more easily identify these payloads, please let me know.
Flags: needinfo?(mreid)
Vladan suggested I check the proportion of submissions with "default" as the appUpdateChannel.

For 2014-05-01, there were 4072 submissions on that channel.  For comparison, the number of "nightly" submissions for that day is approximately 140k (based on checking the telemetry.m.o dashboard).

None of the above submissions had the talos pageloader addon listed (according to http://hg.mozilla.org/build/talos/file/f9136c4bc616/talos/pageloader/chrome.manifest  the addon id is {8AF052F5-8EFE-4359-8266-E16498A82E8B}).
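
To make the two checks above concrete, here is a minimal sketch. It assumes each submission has already been reduced to a simple dictionary with an "appUpdateChannel" string and an "addons" list of IDs; that shape is hypothetical and not the real ping layout. The sample records are made up; only the addon ID and the two counts come from this comment.

```python
from collections import Counter

# Addon ID from the pageloader chrome.manifest linked above.
TALOS_PAGELOADER_ID = "{8AF052F5-8EFE-4359-8266-E16498A82E8B}"


def summarize(submissions):
    """Count submissions per channel and how many list the pageloader addon."""
    per_channel = Counter(s.get("appUpdateChannel", "unknown") for s in submissions)
    with_pageloader = sum(
        1
        for s in submissions
        if any(a.lower() == TALOS_PAGELOADER_ID.lower() for a in s.get("addons", []))
    )
    return per_channel, with_pageloader


# Made-up records, only to show the shape of the check (hypothetical layout).
sample = [
    {"appUpdateChannel": "default", "addons": []},
    {"appUpdateChannel": "nightly", "addons": ["some-addon@example.com"]},
    {"appUpdateChannel": "default", "addons": [TALOS_PAGELOADER_ID]},
]
per_channel, with_pageloader = summarize(sample)
print(per_channel["default"], with_pageloader)

# Figures quoted above for 2014-05-01: 4072 "default" vs. roughly 140k "nightly".
ratio = 4072 / 140_000
print(f"default is about {ratio:.1%} of the nightly volume")
```

With the quoted figures, "default" comes out at roughly 2.9% of the nightly volume for that day. The 140k is itself an approximation read off the dashboard.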
(Reporter)

Comment 3

5 years ago
(In reply to Mark Reid [:mreid] from comment #2)
> Vladan suggested I check the proportion of submissions with "default" as the
> appUpdateChannel.

Do we definitely use 'default' in automation? I guess for Nightlies running Talos we definitely don't (though they'll be 0.1% of the total). Do you record the IP of the machine making the submission, or is that discarded for privacy reasons?
(In reply to Ed Morley [:edmorley UTC+0] from comment #3)
> Do we definitely use 'default' in automation? I guess for Nightlies running
> Talos we definitely don't (though they'll be 0.1% of the total). Do you record
> the IP of the machine making the submission, or is that discarded for
> privacy reasons?
The IP is discarded before it hits long-term storage. I can check on new incoming data briefly, but if the problem has been fixed on Talos, that doesn't help...
Do you have any other ideas about how I could identify submissions from Talos?
Flags: needinfo?(emorley)
(Reporter)

Comment 6

5 years ago
(In reply to Mark Reid [:mreid] from comment #5)
> Do you have any other ideas about how I could identify submissions from
> Talos?

Joel, do you have any ideas here? Would checking for the presence of talos pageloader addon be sufficient to find all potential accidental talos submissions to telemetry? We can't use IP, since it is discarded.
Flags: needinfo?(emorley) → needinfo?(jmaher)
pageloader doesn't account for all of our Talos tests (only most of them). Could we detect certain preferences instead? Or maybe hardware specifics?
Flags: needinfo?(jmaher)
We can try fingerprinting Talos machines based on their hardware. Joel, can you attach the contents of the about:telemetry page from a Talos machine? The data from the "System Information" subsection ought to be enough.
Flags: needinfo?(jmaher)
Vladan,  here is a list of the hardware we use for talos:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

In order to get to about:telemetry, I will need to get loaners (I already need one for another purpose).  If you need me to get one of each type, please let me know.
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #9)
> In order to get to about:telemetry, I will need to get loaners (which I need
> one for another purpose).  If you need me to get one of each type, please
> let me know.

yes, please let us know the about:telemetry sysinfo section for each type of Talos hardware
ok, this will take a week or two, but I will get it.
Depends on: 1035193
Depends on: 1035309
Depends on: 1035310
Depends on: 1035311
Depends on: 1035312
Depends on: 1035313
Depends on: 1035314
Roberto, can you check for the proportion of Telemetry reports with these sysinfo characteristics?

You can use a combination of locale + cpucount + memsize + OS + the various hasflags + profileHDDModel + adapterVendorID + adapterDeviceID + adapterRam. Unfortunately for Mac, this is going to catch a lot of Telemetry submissions from real users, since there's less variation in Mac h/w configurations.
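
A rough sketch of what such a combined fingerprint could look like. It assumes each report has been flattened into one dictionary keyed by the field names listed above, and that the "has" flags can be picked up by prefix; both are assumptions about the data layout, and fingerprint / count_by_fingerprint are made-up helper names.

```python
from collections import Counter

# Field names as listed above; assumed to sit in one flat dict per report.
FIELDS = (
    "locale", "cpucount", "memsize", "OS", "profileHDDModel",
    "adapterVendorID", "adapterDeviceID", "adapterRam",
)


def fingerprint(sysinfo):
    """Build a hashable key from the hardware-related fields of one report."""
    # Collect every has* flag, sorted by name so key order does not matter.
    flags = tuple(sorted((k, v) for k, v in sysinfo.items() if k.startswith("has")))
    return tuple(sysinfo.get(f) for f in FIELDS) + (flags,)


def count_by_fingerprint(reports):
    """Count how many reports share each fingerprint."""
    return Counter(fingerprint(r) for r in reports)


# Made-up reports: two identical machines and one with more memory.
base = {"locale": "en-US", "cpucount": 8, "memsize": 8192, "OS": "Darwin",
        "hasSSE2": True, "hasMMX": True}
reports = [dict(base), dict(base), dict(base, memsize=16384)]
counts = count_by_fingerprint(reports)
print(counts.most_common(1)[0][1])
```

A fingerprint shared by many reports is exactly the Mac problem mentioned above: the key cannot tell a Talos machine apart from a real user with the same hardware.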
Flags: needinfo?(rvitillo)
OK, I will check it.
Flags: needinfo?(rvitillo)
Assignee: nobody → rvitillo
Looking at the nightly submissions between 14/06 and 18/06, I see only 15 out of 1049600 that match the following criteria (Windows 7 box):

"OS": "WINNT",
"memsize": 3063,
"arch": "x86",
"version": "6.1",
"binHDDModel": "WDC WD5003ABYX-01WER",
"adapterVendorID": "0x10de",
"adapterDeviceID": "0x104a",
"adapterRAM": "1023"

Doesn’t seem like there is anything interesting here.
Assignee: rvitillo → nobody
Using only the prefix of binHDDModel (i.e. "WDC WD5003ABYX") to match pings doesn't change the result.
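
For reference, the matching described in the last two comments can be sketched roughly as below. The field values are the ones listed above; the flat dictionary layout and the helper name matches_win7_talos are assumptions, and this is not the actual job that was run.

```python
# Exact-match criteria for the Windows 7 Talos box, as listed above.
WIN7_PROFILE = {
    "OS": "WINNT",
    "memsize": 3063,
    "arch": "x86",
    "version": "6.1",
    "adapterVendorID": "0x10de",
    "adapterDeviceID": "0x104a",
    "adapterRAM": "1023",
}
# binHDDModel is compared by prefix only, as in the follow-up check.
HDD_PREFIX = "WDC WD5003ABYX"


def matches_win7_talos(sysinfo):
    """True if a report's sysinfo matches the profile (hypothetical flat dict)."""
    if not str(sysinfo.get("binHDDModel", "")).startswith(HDD_PREFIX):
        return False
    return all(sysinfo.get(key) == value for key, value in WIN7_PROFILE.items())


candidate = dict(WIN7_PROFILE, binHDDModel="WDC WD5003ABYX-01WER")
other = dict(candidate, memsize=4096)
print(matches_win7_talos(candidate), matches_win7_talos(other))

# Share of matching submissions, using the counts reported above.
share = 15 / 1049600
print(f"{share:.4%}")
```

With the reported counts, the matching share works out to roughly 0.0014% of nightly submissions in that window.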
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Updated

6 months ago
Product: Webtools → Webtools Graveyard