Closed Bug 1291340 Opened 4 years ago Closed 4 years ago
Port Sync server metrics to production
49 bytes, text/x-github-pull-request
52 bytes, text/x-github-pull-request
Sync metrics are currently running in the dev environment (cloudservices-aws-dev) using my one-off method. This bug tracks the work required to transition these metrics to production.

https://sql.telemetry.mozilla.org/dashboard/sync
https://github.com/dannycoates/smt
Per a couple of discussions this week, we need to either run the conversion + export code in the prod IAM, or store a PII-scrubbed version of the data that we can access from the dev IAM (specifically from analysis.telemetry.mozilla.org-launched and airflow-launched instances). I would prefer to be able to run the import / rollup code via airflow, but if it's significantly easier to run it in prod, let's do that. Wesley, what do you think?
As discussed in IRC/vidyo, if the analysis doesn't need access to the PII fields I can refactor https://github.com/mozilla-services/puppet-config/pull/2031/files to facilitate this.
Danny, can you confirm that we don't need access to any PII fields?
Here are the fields we need:

  uid         CHAR(32) NOT NULL encode lzo,    -- a sha256-hashed Firefox Account (FxA) user id
  s_uid       CHAR(32) encode lzo,             -- a surrogate user id (generated by token server)
  dev         CHAR(32) NOT NULL encode lzo,    -- a sha256-hashed FxA device id
  s_dev       CHAR(32) encode lzo,             -- a surrogate device id (generated by token server)
  ts          TIMESTAMP NOT NULL encode lzo,   -- timestamp of the request
  method      VARCHAR(32) encode lzo,          -- request method (GET, POST, etc)
  code        SMALLINT encode lzo,             -- http status code of the response
  bucket      VARCHAR(255) encode bytedict,    -- sync bucket name (bookmarks, history, etc)
  t           INTEGER encode bytedict,         -- request time in milliseconds
  ua_browser  VARCHAR(255) encode lzo,         -- request User Agent browser
  ua_version  INTEGER encode lzo,              -- request User Agent browser version
  ua_os       VARCHAR(255) encode lzo,         -- request User Agent OS
  host        VARCHAR(255) encode lzo          -- server hostname that handled the request

None of those are immediately PII, though with enough access (to our internal production systems) one could probably decipher what the actual FxA uid and dev are.
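The hashed uid and dev columns are what keep the field list above from being directly PII. A minimal Python sketch of such a scrubbing step follows. It assumes (this is not confirmed anywhere in this bug) that each raw id is hashed with plain, unsalted SHA-256 and the hex digest is truncated to 32 characters to fit the CHAR(32) columns; hash_id and scrub_record are hypothetical names, not code from the smt repo or puppet-config.

```python
import hashlib
from typing import Any, Dict

# Assumption: digest truncated to fit the CHAR(32) uid/dev columns.
HASHED_LEN = 32


def hash_id(raw_id: str) -> str:
    """Hypothetical helper: first 32 hex chars of an unsalted sha256(raw_id)."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()[:HASHED_LEN]


def scrub_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy with uid and dev replaced by hashes.

    All other fields (ts, method, code, bucket, ...) are passed through unchanged.
    """
    scrubbed = dict(record)
    for field in ("uid", "dev"):
        if scrubbed.get(field) is not None:
            scrubbed[field] = hash_id(scrubbed[field])
    return scrubbed


if __name__ == "__main__":
    row = {"uid": "abc", "dev": "device-1", "code": 200, "bucket": "bookmarks"}
    print(scrub_record(row))
```

Because a hash like this is unkeyed, anyone holding the raw FxA ids could recompute it and match rows, which is the caveat raised in the comment above.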
@whd, does this list look ok?
(In reply to Mark Reid [:mreid] from comment #5)
> @whd, does this list look ok?

To me, yes.
Taking this to make the data available to ATMO, whereupon :mreid will take over.
Assignee: mreid → whd
Priority: P2 → P1
https://github.com/mozilla-services/puppet-config/pull/2263

Data should be available in s3://net-mozaws-prod-us-west-2-pipeline-analysis/sync-metrics/data, which should be accessible from ATMO etc. Back to :mreid.
Assignee: whd → mreid
Just checking in to see what's left to get this into production. A lot of work has gone into this so far, and I feel like we're really close to the finish line. Until it lands in prod, all of Danny's beautiful dashboards are reporting inaccurate numbers because the data is not refreshing reliably in the dev environment. https://sql.telemetry.mozilla.org/dashboard/sync Thanks in advance for an update on the status.
I'm back on this as my #1 item this week. I should have something preliminary in the next few days.
Priority: P2 → P1
(In reply to Mark Reid [:mreid] from comment #10)
> I'm back on this as my #1 item this week. I should have something
> preliminary in the next few days.

Great news! Thanks for the update!
r? @Danny for adherence to the previous conversion and rollup logic.
r? @Roberto for partitioning, Dataset mangling, and general Spark stuff (feel free to redirect to others).
Attachment #8810833 - Flags: review?(rvitillo)
Attachment #8810833 - Flags: review?(rvitillo) → review-
Attachment #8824450 - Flags: review?(rvitillo) → review+
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED