Closed
Bug 1353784
Opened 7 years ago
Closed 7 years ago
Add campaign field and build metadata to core ping
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: frank, Assigned: frank)
Attachments
(1 file, 2 obsolete files)
2.48 KB,
text/plain
No description provided.
Assignee | ||
Comment 1•7 years ago
Assignee | ||
Comment 2•7 years ago
I've tested this and the output looks good. Here are the changes:
1. Partition by normalizedChannel
2. Added new "campaign" field (binary UTF8)
3. Added buildId and appName to metadata
Flags: needinfo?(whd)
Comment 3•7 years ago
In the last version, from https://bug1347609.bmoattachments.org/attachment.cgi?id=8850024, the searches group had an int32 value. This looks like a potentially incompatible change; is that expected?
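One way to catch this kind of change before deploying is to diff the column types of the two schema versions. This is a minimal sketch, assuming each schema has been flattened to a name-to-type mapping. The field names and the int64 type below are hypothetical and are not taken from the actual attachment.

```python
def incompatible_fields(old_schema, new_schema):
    """Return fields present in both schemas whose declared type differs.

    Both arguments are plain dicts mapping a flattened field name to a
    type string. Fields that only exist in one schema are ignored, since
    adding a new column is usually a compatible change.
    """
    shared = old_schema.keys() & new_schema.keys()
    return {
        name: (old_schema[name], new_schema[name])
        for name in shared
        if old_schema[name] != new_schema[name]
    }


# Hypothetical schemas for illustration only.
old = {"searches.value": "int32", "clientId": "binary"}
new = {"searches.value": "int64", "clientId": "binary", "campaign": "binary"}

changed = incompatible_fields(old, new)
```

Any non-empty result is a prompt to check whether readers of the existing files can cope with the new type.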
Assignee | ||
Comment 4•7 years ago
A few changes:
1. appName is no longer a metadata field, but a partition. We need to take all the existing files and put them in "app_name=Fennec", since they are all Fennec pings.
2. "submission" partition name changed to "submission_date". Is it feasible to change this for the historical files? If not, feel free to keep it as "submission".
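For reference, a partition such as "app_name=Fennec" is a key=value directory in the object path. Here is a rough sketch of how such a path could be assembled. The "core" prefix, the date value, and the partition order are assumptions for illustration, not the deployed layout.

```python
def partition_path(prefix, partitions):
    """Join a dataset prefix with key=value partition directories.

    `partitions` is an ordered list of (name, value) pairs; the order
    determines the directory nesting.
    """
    segments = [f"{name}={value}" for name, value in partitions]
    return "/".join([prefix, *segments])


# Hypothetical prefix, date, and ordering.
path = partition_path(
    "core",
    [("submission_date", "20170405"), ("app_name", "Fennec")],
)
```

Under this layout, moving the existing files into "app_name=Fennec" amounts to copying them under that extra directory level.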
Assignee | ||
Updated•7 years ago
Attachment #8855852 -
Attachment is obsolete: true
Assignee | ||
Comment 5•7 years ago
Updated search field type.
Attachment #8855896 -
Attachment is obsolete: true
Comment 6•7 years ago
https://github.com/mozilla-services/puppet-config/pull/2554

(In reply to Frank Bertsch [:frank] from comment #4)
> 2. "submission" partition name changed to "submission_date". Is it feasible
> to change this for the historical files? If not feel free to keep it as
> "submission".

I've changed this to "submission_date_s3" to be more similar to our batch jobs (which append _s3 to avoid having the problem where a parquet file contains a field that also exists in an s3 partition). This required a copy of all existing data, which is complete.

However, it appears there are already different app_names sending data, so we may need to do an actual backfill to properly categorize historical data.

After the next p2h run the data should be available using the new partitioning scheme from re:dash.
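The naming problem described above can be sketched as a small check: if a partition name also appears as a column inside the parquet files, give the partition a suffix. This is only an illustration of the idea. The helper name and the example column names are hypothetical, and the batch jobs may apply the suffix unconditionally.

```python
def disambiguate_partitions(partition_names, column_names, suffix="_s3"):
    """Append `suffix` to any partition name that collides with a column.

    Returns the partition names in their original order, with colliding
    names renamed and the others left untouched.
    """
    columns = set(column_names)
    return [
        name + suffix if name in columns else name
        for name in partition_names
    ]


# Hypothetical example: the files already carry a submission_date column.
renamed = disambiguate_partitions(
    ["submission_date", "app_name"],
    ["submission_date", "clientId"],
)
```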
Flags: needinfo?(whd)
Assignee | ||
Comment 7•7 years ago
Great, thanks whd. We got a few pings today with Focus data, but I'm not overly concerned and there's no need to run backfill as long as the future data is properly partitioned by appName.
Assignee | ||
Comment 8•7 years ago
:whd, unfortunately my change wasn't backwards compatible. Bug 1352521 will fix this for our Presto instance, but not for Athena, so we're going to have to version bump the new data.
Flags: needinfo?(whd)
Comment 9•7 years ago
I've bumped the version to 2. I'm running backfill for 2017 as I assume that's desired.
Flags: needinfo?(whd)
Comment 10•7 years ago
2017 is now fully backfilled for v2.
Assignee | ||
Comment 11•7 years ago
Thanks whd. Closing this out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 12•7 years ago
:whd, the name should have been `campaignId` not `campaign` :/ What's the cost of a backfill? I need to decide whether or not it's worth it to backfill again.
Status: RESOLVED → REOPENED
Flags: needinfo?(whd)
Resolution: FIXED → ---
Comment 13•7 years ago
In terms of compute, it's $0.42/hr for about 5 hours on a c3.2xlarge per backfilled day, which for 110 days comes out to about $250. There are other factors such as network and S3 API costs, but the cost will be dominated by compute. I estimate the AWS cost of backfill to be < $500. In terms of my time, an hour or so, as this requires a production deploy, backfill setup, and some context switching.
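The compute figures quoted above can be multiplied out directly. A quick sketch using only the numbers from this comment; network and S3 API costs are left out because no figures are given for them.

```python
HOURLY_RATE_USD = 0.42      # c3.2xlarge rate quoted above
HOURS_PER_DAY = 5           # approximate runtime per backfilled day
DAYS_TO_BACKFILL = 110


def backfill_compute_cost(hourly_rate, hours_per_day, days):
    """Compute-only cost estimate: rate x hours per day x number of days."""
    return hourly_rate * hours_per_day * days


compute_cost = round(
    backfill_compute_cost(HOURLY_RATE_USD, HOURS_PER_DAY, DAYS_TO_BACKFILL), 2
)
```

The product is $231, in line with the rough figure quoted and comfortably under the < $500 overall estimate.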
Flags: needinfo?(whd)
Assignee | ||
Comment 14•7 years ago
Let's not backfill for now. Can you change the name to campaignId rather than campaign, and we'll have this data moving forward? I'll take care of the schema in mozilla-pipeline-schemas.
Flags: needinfo?(whd)
Comment 15•7 years ago
Deployed: https://github.com/mozilla-services/puppet-config/pull/2572
Flags: needinfo?(whd)
Updated•7 years ago
Component: Metrics: Pipeline → Datasets: Mobile
Product: Cloud Services → Data Platform and Tools
Assignee | ||
Comment 16•7 years ago
Changes available in STMO.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Updated•2 years ago
Component: Datasets: Mobile → General