Closed Bug 1181640 Opened 5 years ago Closed 5 years ago

Write Data migration script to move 'campaign' data from Redshift to RDS (postgres)

Categories

(Content Services Graveyard :: Tiles: Data Processing, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Iteration:
43.2 - Sep 7

People

(Reporter: tspurway, Assigned: tspurway)

References

(Blocks 3 open bugs)

Details

(Whiteboard: .?)

Currently all campaign data is in Redshift (tables: tiles, adgroups, adgroup_sites, channels, distributions, countries).  After we have created the new RDS database for this data, we will need to migrate our campaign data to the new database.

The new database will have additional tables, namely tables: clients, campaigns.  Because existing data will have to reference these tables, we will need to create appropriate data for these tables as well (by hand).

This migration will be a 'one-time' script that will be run on the production database.  Note that we are assuming the Redshift database is the one defined in Splice 1.1.24.
Blocks: 1176649
Whiteboard: .?
Assignee: nobody → tspurway
There is campaign and client data captured in zenko:

Some client data.

e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/redshift.py#L740

Campaign data for non-Mozilla tiles.

e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/templates/index.html#L198

There may be some inaccuracies:

e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/redshift.py#L278

There are actually more tiles in the android channel than what is listed in this ruleset. And there is at least one mistake in Redshift. See bug 1186451



Campaign data for Mozilla Tiles.

e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/redshift.py#L108
I have written a script that populates campaign, country and account tables in our campaign database.  I will copy the docstring here, as it is a pretty concise explanation of how it works:

    This script is going to populate Account and Campaign database structures.  It does this
    by reading the currently deployed tile distributions (s3), where it determines the currently active
    tile set, as well as the geo-targetting data (currently only country level) for each tile/adgroup.

    The script also popultates/migrates data from the 'countries' table in the stats database.

    The script will discriminate between 'active' and 'inactive' adgroups based on whether or not
    the adgroup exists in the current distribution.  Inactive adgroups are given start/end dates
    in campaigns that are in the *past*.  Active adgroups are placed in campaigns that start on their
    adgroup creation date and end at some far distant future date.

    Campaign objects are considered unique by grouping together the following keys in the adgroup:
    * the TLD+1 of the target_url
    * the locale of the adgroup
    * the channel of the adgroup
    * the 'active' flag (determined as explained above) of the adgroup

    One campaign row will be assigned for each unique campaign detected.

    The script will populate the adgroup.campaign_id with the campaign that the adgroup fits into.

    Account objects are considered unique by considering the TLD+1 of the campaign.  All campaigns that
    share the same TLD+1 will be grouped into the same Account.

    All writes to the database are transactional.

    This script is *not* idempotent, and will therefore check that accounts and campaigns tables are empty
    before running.
Iteration: --- → 43.2 - Sep 7
Blocks: Sprint_CS_S2
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Blocks: 1202820
No longer blocks: 1202820
Blocks: 1202820
You need to log in before you can comment on or make changes to this bug.