Bug 1246137: Add crash pings and data from nightly and aurora to "telemetry-sample".
Opened 9 years ago • Closed 9 years ago
Category: Cloud Services Graveyard :: Metrics: Pipeline (defect, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: rvitillo; Assigned: mreid
Whiteboard: loasis
Description:
Mark, this depends on your work to add nightly and aurora to telemetry-release.
Updated (Reporter) • 9 years ago
Flags: needinfo?(mreid)
Comment 1 (Assignee) • 9 years ago
I propose that we do the following:
- Rename 'telemetry-release' to 'telemetry-sample'
- Extend it to include data for all channels (not just release+beta)
- Limit it to 10% of the clientId space using the 'sampleId' field
- Consider dropping any dimensions we don't need.
If / when we move to a longitudinal dataset containing 100% of the data, we can simply use the "raw" telemetry-2 dataset instead, since we won't be selecting by sampleId.
Question - is 10% enough? Are we likely to want more than 10% but less than 100%?
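As a minimal sketch of the proposed cut, assuming sampleId is a stable bucket in [0, 100) derived from a hash of clientId (the specific hash and bucket count here are assumptions, not quoted from the pipeline code):

```python
import zlib

SAMPLE_BUCKETS = 100   # assumed bucket count
SAMPLE_CUTOFF = 10     # keep 10% of the clientId space

def sample_id(client_id: str) -> int:
    """Map a clientId to a stable bucket in [0, SAMPLE_BUCKETS)."""
    return zlib.crc32(client_id.encode("utf-8")) % SAMPLE_BUCKETS

def keep_for_telemetry_sample(ping: dict) -> bool:
    """Keep pings from any channel whose client falls in the 10% sample."""
    return sample_id(ping["clientId"]) < SAMPLE_CUTOFF
```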
Flags: needinfo?(mreid) → needinfo?(rvitillo)
Comment 2 (Reporter) • 9 years ago
(In reply to Mark Reid [:mreid] from comment #1)
> I propose that we do the following:
> - Rename 'telemetry-release' to 'telemetry-sample'
> - Extend it to include data for all channels (not just release+beta)
> - Limit it to 10% of the clientId space using the 'sampleId' field
> - Consider dropping any dimensions we don't need.
I am OK with this, but it depends on what fraction of profiles we ultimately want to end up with in the longitudinal dataset.
> If / when we move to a longitudinal dataset containing 100% of the data, we
> can simply use the "raw" telemetry-2 dataset instead, since we won't be
> selecting by sampleId.
>
> Question - is 10% enough? Are we likely to want more than 10% but less than
> 100%?
We are not going to use more than 10% of the data in the short term, but I can't make that decision for other teams.
Flags: needinfo?(rvitillo)
Updated • 9 years ago
Assignee: nobody → mreid
Points: --- → 2
Priority: -- → P1
Comment 3 (Reporter) • 9 years ago
Crash pings should also be stored on S3 in the same structure as main pings.
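A minimal sketch of what "the same structure" could mean, assuming docType is simply another partition dimension in the S3 key; the dimension names and ordering below are illustrative, not the documented telemetry-sample layout:

```python
# Hypothetical key builder; dimension names and ordering are assumptions.
def s3_key(submission_date: str, doc_type: str, channel: str, sample_id: int) -> str:
    return "/".join([submission_date, doc_type, channel, str(sample_id)])

print(s3_key("20160215", "main", "nightly", 42))   # main ping partition
print(s3_key("20160215", "crash", "nightly", 42))  # crash pings land beside it
```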
Updated (Reporter) • 9 years ago
Summary: Add nightly and aurora channels to longitudinal datasets. → Add crash pings and data from nightly and aurora to "telemetry-sample".
Updated (Reporter) • 9 years ago
Whiteboard: loasis
Comment 4 (Assignee) • 9 years ago
Sent a PR to implement the above changes:
https://github.com/mozilla-services/puppet-config/pull/1779
Comment 5 (Reporter) • 9 years ago
We can't generate an updated longitudinal dataset, as "telemetry-release" stopped updating on the 12th and there are only a handful of dates in "telemetry-sample".
Mark, could you please backfill telemetry-sample back to the 15th of November?
Flags: needinfo?(mreid)
Comment 7 • 9 years ago
As jobs complete, the data is being uploaded by a separate process to the prod bucket, starting at 20160212 and going backwards.
Comment 8 (Assignee) • 9 years ago
I just had a look and it appears the data is backfilled to Nov 15th.
I noticed that all the records have "UNKNOWN" for sampleId though... any idea why? It was working previously in telemetry-release, and I didn't change that field in the PR.
Flags: needinfo?(whd)
Comment 9 (Assignee) • 9 years ago
Sorry, all the *files* on S3, not the records themselves. The actual data contains correct sampleIds.
Comment 10 • 9 years ago
I have taken a look at this and think I have determined the cause. I copied the code I used to set up the backfill jobs from one of :mreid's old backfill scripts:
s3://telemetry-analysis-code-2/jobs/whd-sample-backfill-1/telemetry-sample-backfill-0.1.tar.gz
This pulls in a heka build from
http://people.mozilla.org/~mreid/heka-20150918-0_11_0-linux-amd64.tar.gz
which most unfortunately was built three days before the commit that added support for non-string fields for S3 dimensions:
https://github.com/mozilla-services/data-pipeline/commit/f02b014e674fbe62f47844da02a369a94053ee08
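A sketch of the assumed pre-fix behaviour described above: with the older heka build, only string field values could be used as S3 path dimensions, so a numeric sampleId fell back to a placeholder (the helper below is illustrative, not the actual heka code):

```python
def dimension_component(value) -> str:
    # Assumed pre-fix behaviour: only string field values are usable as
    # S3 path dimensions; anything else falls back to a placeholder.
    if isinstance(value, str):
        return value
    return "UNKNOWN"

print(dimension_component("nightly"))  # "nightly"
print(dimension_component(42))         # "UNKNOWN" (e.g. an integer sampleId)
```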
Additionally, I noticed we botched the sampleId filtering logic (|| instead of &&), so we're sending the full data stream instead of a 10% sample:
https://github.com/mozilla-services/puppet-config/pull/1779/files
This might explain why it is requiring so many machines to process. I'm going to file a PR to fix the sampleId filtering, but we are probably going to need to do the whole backfill process again, with a newer heka and a fixed message matcher.
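Expressed as a sketch (the real matcher lives in the puppet-config PR linked above, and which condition sat on the other side of the operator is an assumption here), the error amounts to an OR where an AND was intended:

```python
def matches_buggy(ping: dict) -> bool:
    # || where && was intended: every ping with an accepted docType passes,
    # regardless of sampleId, so the full stream gets written out.
    return ping["docType"] in ("main", "crash") or ping["sampleId"] < 10

def matches_fixed(ping: dict) -> bool:
    # Both conditions must hold: accepted docType AND client in the 10% sample.
    return ping["docType"] in ("main", "crash") and ping["sampleId"] < 10

ping = {"docType": "main", "sampleId": 73}
print(matches_buggy(ping))  # True  (leaks through)
print(matches_fixed(ping))  # False (correctly excluded)
```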
Flags: needinfo?(whd)
Comment hidden (off-topic)
Comment 12 • 9 years ago
The backfill completed this weekend.
Comment 13 (Reporter) • 9 years ago
The data looks OK so far :)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard