Closed Bug 1353784 Opened 7 years ago Closed 7 years ago

Add campaign field and build metadata to core ping

Categories

(Data Platform and Tools :: General, enhancement, P1)

enhancement
Points:
1

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: frank, Assigned: frank)

Attachments

(1 file, 2 obsolete files)

      No description provided.
Attached file new_core_config (obsolete) —
I've tested this and the output looks good. Here are the changes:

1. Partition by normalizedChannel
2. Added new "campaign" field (binary UTF8)
3. Added buildId and appName to metadata
Flags: needinfo?(whd)
In the last version, from https://bug1347609.bmoattachments.org/attachment.cgi?id=8850024, the searches group had an int32 value. This looks like a possibly incompatible change; is that expected?
Attached file updated_core_config (obsolete) —
A few changes:

1. appName is no longer a metadata field, but a partition. We need to take all the existing files and put them in "app_name=Fennec", since they are all Fennec pings.

2. "submission" partition name changed to "submission_date". Is it feasible to change this for the historical files? If not feel free to keep it as "submission".
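
For illustration, the partition layout proposed in this comment can be sketched as a Hive-style key=value path. This is a minimal sketch, not the actual pipeline code: the function name, the prefix, and the date value and its format are assumptions; only the partition names and the "Fennec" value come from the comment.

```python
def partition_path(prefix, partitions):
    """Build a Hive-style path: prefix/key=value/key=value/..."""
    parts = [f"{key}={value}" for key, value in partitions]
    return "/".join([prefix.rstrip("/")] + parts)


# Hypothetical prefix and date; existing files would all land under
# app_name=Fennec as described in item 1.
path = partition_path(
    "telemetry/core",
    [("app_name", "Fennec"), ("submission_date", "20170405")],
)
print(path)
```

Keeping the partitions as an ordered list of pairs makes the directory nesting order explicit, which matters because query engines read the partition columns from the path.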
Attachment #8855852 - Attachment is obsolete: true
Updated search field type.
Attachment #8855896 - Attachment is obsolete: true
https://github.com/mozilla-services/puppet-config/pull/2554

(In reply to Frank Bertsch [:frank] from comment #4)

> 2. "submission" partition name changed to "submission_date". Is it feasible
> to change this for the historical files? If not feel free to keep it as
> "submission".

I've changed this to "submission_date_s3" to match our batch jobs, which append _s3 so that an S3 partition name never collides with a field of the same name inside a Parquet file.
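
The naming collision described here can be sketched as a simple check. This is only an illustration under stated assumptions: the helper names and the example field list are hypothetical, and the real batch jobs may enforce the convention differently.

```python
def colliding_names(schema_fields, partition_names):
    """Return partition names that also exist as fields in the file schema."""
    fields = set(schema_fields)
    return [name for name in partition_names if name in fields]


def with_s3_suffix(name):
    """Append _s3 to a partition name unless it already has the suffix."""
    return name if name.endswith("_s3") else name + "_s3"


# Hypothetical file schema that happens to contain a submission_date field.
fields = ["submission_date", "campaign", "searches"]

print(colliding_names(fields, ["app_name", "submission_date"]))
print(colliding_names(fields, ["app_name", with_s3_suffix("submission_date")]))
```

The first call reports the clash on submission_date; the second returns an empty list once the partition is renamed with the suffix.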

This required a copy of all existing data, which is complete. However, different app_names already appear to be sending data, so we may need an actual backfill to categorize historical data properly.

After the next p2h run the data should be available using the new partitioning scheme from re:dash.
Flags: needinfo?(whd)
Great, thanks whd. We got a few pings today with Focus data, but I'm not overly concerned and there's no need to run backfill as long as the future data is properly partitioned by appName.
:whd, unfortunately my change wasn't backwards compatible. Bug 1352521 will fix this for our Presto instance, but not for Athena, so we will have to bump the version of the new data.
Flags: needinfo?(whd)
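
The decision to bump the version can be illustrated with a small schema comparison. The thread does not say which change broke compatibility, so this sketch assumes a simplified rule: a removed field or a changed type counts as incompatible, and added fields do not. The function names, field names, and types below are hypothetical.

```python
def incompatible_changes(old, new):
    """List (field, old_type, new_type) for removed fields or changed types.

    Assumption: newly added fields are treated as compatible.
    """
    problems = []
    for name, old_type in old.items():
        if name not in new:
            problems.append((name, old_type, None))
        elif new[name] != old_type:
            problems.append((name, old_type, new[name]))
    return problems


def next_version(current, old, new):
    """Bump the dataset version only when an incompatible change exists."""
    return current + 1 if incompatible_changes(old, new) else current


# Hypothetical schemas: one field changes type, one field is added.
old_schema = {"clientId": "binary", "searches": "int32"}
new_schema = {"clientId": "binary", "searches": "int64", "campaign": "binary"}

print(incompatible_changes(old_schema, new_schema))
print(next_version(1, old_schema, new_schema))
```

Writing incompatible data under a new version keeps readers of the old version working, at the cost of a backfill if history is needed in the new layout.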
I've bumped the version to 2. I'm running backfill for 2017 as I assume that's desired.
Flags: needinfo?(whd)
2017 is now fully backfilled for v2.
Thanks whd. Closing this out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
:whd, the name should have been `campaignId` not `campaign` :/

What's the cost of a backfill? I need to decide whether or not it's worth it to backfill again.
Status: RESOLVED → REOPENED
Flags: needinfo?(whd)
Resolution: FIXED → ---
In terms of compute, it's $0.42/hr for about 5 hours on a c3.2xlarge per backfilled day, which for 110 days comes out to about $230. There are other factors, such as network and S3 API costs, but the cost will be dominated by compute. I estimate the AWS cost of the backfill to be < $500.
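
The compute estimate can be reproduced with a short calculation using the rate, duration, and day count quoted above. The function name is only for illustration, and network and S3 API costs are left out, as in the estimate itself.

```python
def backfill_compute_cost(hourly_rate, hours_per_day, days):
    """Compute-only cost: hourly rate x hours per backfilled day x days."""
    return hourly_rate * hours_per_day * days


# Figures from the comment: $0.42/hr, about 5 hours per day, 110 days.
cost = backfill_compute_cost(0.42, 5, 110)
print(f"${cost:.2f}")  # $231.00
```

The product stays comfortably under the $500 ceiling even with some allowance for the non-compute costs.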

In terms of my time, an hour or so, as this requires a production deploy, backfill setup, and some context switching.
Flags: needinfo?(whd)
Let's not backfill for now. Can you change the name to campaignId rather than campaign, and we'll have this data moving forward? I'll take care of the schema in mozilla-pipeline-schemas.
Flags: needinfo?(whd)
Component: Metrics: Pipeline → Datasets: Mobile
Product: Cloud Services → Data Platform and Tools
Changes available in STMO.
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Datasets: Mobile → General