Closed Bug 1353110 Opened 7 years ago Closed 7 years ago

Land pings with telemetry experiment annotations into new source

Categories

(Data Platform and Tools :: General, enhancement, P2)

Points:
2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: whd)

Details

(Whiteboard: [SvcOps])

Now that Bug 1348748 has landed, we should start receiving pings annotated with the new unified experiment annotation object. Pings with a non-empty object should be landed into a new source that includes dimensions for submissionDate, docType, experimentId, and experimentBranch. If a ping is tagged with multiple experiments, it should be written once per experiment, under each experiment's experimentId and experimentBranch dimensions.
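
As a rough illustration of the fan-out requested above, the following Python sketch expands one ping into one dimension tuple per annotated experiment. The ping layout (`meta.submissionDate`, `meta.docType`, and an `environment.experiments` object mapping each experiment id to a `branch`) and the helper name `experiment_dimensions` are assumptions made for this example; it is not the pipeline's actual implementation.

```python
from typing import Any, Dict, List, Tuple


def experiment_dimensions(ping: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return one (submissionDate, docType, experimentId, experimentBranch)
    tuple per experiment annotated on the ping."""
    # Assumed layout: {"<experimentId>": {"branch": "<experimentBranch>"}}
    experiments = ping.get("environment", {}).get("experiments") or {}
    meta = ping.get("meta", {})
    rows = []
    # Sort by experiment id so the output order is deterministic.
    for experiment_id, annotation in sorted(experiments.items()):
        rows.append((
            meta.get("submissionDate", "UNKNOWN"),
            meta.get("docType", "UNKNOWN"),
            experiment_id,
            annotation.get("branch", "UNKNOWN"),
        ))
    return rows


# Hypothetical ping tagged with two experiments.
ping = {
    "meta": {"submissionDate": "20170529", "docType": "main"},
    "environment": {
        "experiments": {
            "exp-a": {"branch": "control"},
            "exp-b": {"branch": "treatment"},
        }
    },
}

for row in experiment_dimensions(ping):
    print("/".join(row))
```

A ping with an empty or missing experiments object yields no tuples, which matches the requirement that only pings with a non-empty annotation land in the new source.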
Points: --- → 1
Priority: -- → P2
Component: Metrics: Pipeline → Pipeline Ingestion
Product: Cloud Services → Data Platform and Tools
Hi Trink, do you have a timeline for when you'll be able to get to this?
Flags: needinfo?(mtrinkala)
I haven't been in the loop on this. This is for a new direct-to-parquet output that can write one message to multiple files (all with the same schema), correct? So basically you want what the generic parquet output does, with an additional loop over the experiments hash. Is there a 'new source' schema, or is it just the existing schemas (main, crash, etc.) with the S3 dimensions specified above?

This can be scheduled for the next sprint, which would put it in production at the very end of May.
Flags: needinfo?(mtrinkala) → needinfo?(ssuh)
Points: 1 → 3
Priority: P2 → P1
Sunah spec'ed out the requirements for a Heka protobuf stream in IRC.
https://github.com/mozilla-services/lua_sandbox_extensions/pull/137

Assigning to whd to update the production configurations and deploy (name it as desired):

# /pipeline/modules/pipeline/templates/hindsight/output/telemetry_s3.cfg.erb (add this to the existing cfg)
experiment_dimension_file  = "schema.telemetry.per_experiment.json"


# /pipeline/modules/pipeline/files/schema/schema.telemetry.per_experiment.json (create a new dimension specification)
{
  "version": 1,
  "dimensions": [
    {
      "field_name": "submissionDate",
      "allowed_values": "*"
    },
    {
      "field_name": "docType",
      "allowed_values": "*"
    },
    {
      "field_name": "experimentId",
      "allowed_values": "*",
      "is_variable": true
    },
    {
      "field_name": "experimentBranch",
      "allowed_values": "*",
      "is_variable": true
    }
  ]
}
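
To make the intent of the dimension specification above more concrete, here is a minimal Python sketch of how such a spec could drive the output path. It assumes that `is_variable` marks dimensions whose values change per experiment within a single message, while the other dimensions are fixed per message; that reading, and the function name `output_paths`, are assumptions for illustration and may not match what the Lua output plugin actually does.

```python
import json
from typing import Any, Dict, Iterator

# Same content as schema.telemetry.per_experiment.json above.
SPEC = json.loads("""
{
  "version": 1,
  "dimensions": [
    {"field_name": "submissionDate", "allowed_values": "*"},
    {"field_name": "docType", "allowed_values": "*"},
    {"field_name": "experimentId", "allowed_values": "*", "is_variable": true},
    {"field_name": "experimentBranch", "allowed_values": "*", "is_variable": true}
  ]
}
""")


def output_paths(spec: Dict[str, Any],
                 fixed: Dict[str, str],
                 experiments: Dict[str, str]) -> Iterator[str]:
    """Yield one path per experiment, following the spec's dimension order.

    `fixed` holds the per-message values; `experiments` maps an experiment
    id to its branch. allowed_values is "*" for every dimension here, so
    no value filtering is sketched.
    """
    for experiment_id, branch in sorted(experiments.items()):
        variable = {"experimentId": experiment_id, "experimentBranch": branch}
        parts = []
        for dim in spec["dimensions"]:
            # Assumption: is_variable dimensions are filled per experiment.
            source = variable if dim.get("is_variable") else fixed
            parts.append(source[dim["field_name"]])
        yield "/".join(parts)


fixed = {"submissionDate": "20170529", "docType": "main"}
for path in output_paths(SPEC, fixed, {"exp-a": "control", "exp-b": "treatment"}):
    print(path)
```

Keeping the path layout in the spec rather than in code means the dimension order can change without touching the loop, which is presumably why the configuration only points at a dimension file.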
Assignee: mtrinkala → whd
Points: 3 → 2
Flags: needinfo?(ssuh)
P2 until next sprint per our deploy cadence.
Priority: P1 → P2
Whiteboard: [SvcOps]
whd: What's the date we can expect this in prod, then?
Flags: needinfo?(whd)
2017-05-29. If this is an urgent request, :trink can probably publish the packages sooner and we can do an out-of-band update.
Flags: needinfo?(whd)
While prepping this deploy I noticed we already have a separate heka output for experiments per bug #1255543. It looks like that data source "telemetry-experiments" was never used, and as a result we should close that bug, remove the existing configuration, and replace it with the work here. :mreid does that sound correct?

:sunahsuh mentioned telemetry-cohorts as her placeholder for the new data source name, which I will use unless anyone has a strong opinion about it.
Flags: needinfo?(mreid)
(In reply to Wesley Dawson [:whd] from comment #7)
> While prepping this deploy I noticed we already have a separate heka output
> for experiments per bug #1255543. It looks like that data source
> "telemetry-experiments" was never used, and as a result we should close that
> bug, remove the existing configuration, and replace it with the work here.
> :mreid does that sound correct?

Yes, that sounds right to me. As far as I know, nobody ever actively used the "telemetry-experiments" data source. We should remove its configuration to avoid confusion. I believe there may be plans to expose the Telemetry Experiments annotation in a compatible way with the output in this bug, so if/when that happens, this bug will fully supersede bug 1255543.

> :sunahsuh mentioned telemetry-cohorts as her placeholder for the new data
> source name, which I will use unless anyone has a strong opinion about it.

Sounds good to me.
Flags: needinfo?(mreid)
> :sunahsuh mentioned telemetry-cohorts as her placeholder for the new data
> source name, which I will use unless anyone has a strong opinion about it.

Ha, this design simply alters the dimensions for the experiment output (same prefix). However, we can run it as two separate plugins and have it work (albeit less efficiently). In the future, the design intentions need to be better communicated and written down, and ideally the implementation would be created by the team requesting it and reviewed by ops or me.
We ended up pushing a patched version per https://github.com/mozilla-services/lua_sandbox_extensions/pull/141 to allow us to run only one output writing to two output prefixes.

I've deployed https://github.com/mozilla-services/puppet-config/pull/2589 which removes the old experiments output and replaces it with telemetry-cohorts. I've updated the metadata bucket with this new source and verified it works from ATMO.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Pipeline Ingestion → General