Closed Bug 1181640 Opened 5 years ago Closed 5 years ago
Write Data migration script to move 'campaign' data from Redshift to RDS (postgres)
43.2 - Sep 7
Currently all campaign data is in Redshift (tables: tiles, adgroups, adgroup_sites, channels, distributions, countries). After we have created the new RDS database for this data, we will need to migrate our campaign data to the new database. The new database will have additional tables, namely tables: clients, campaigns. Because existing data will have to reference these tables, we will need to create appropriate data for these tables as well (by hand). This migration will be a 'one-time' script that will be run on the production database. Note that we are assuming the Redshift database is the one defined in Splice 1.1.24.
There is campaign and client data captured in zenko: Some client data. e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/redshift.py#L740 Campaign data for non-Mozilla tiles. e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/templates/index.html#L198 There may be some inaccuracies: e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/redshift.py#L278 There are actually more tiles in the android channel than what is listed in this ruleset. And there is at least one mistake in Redshift. See bug 1186451 Campaign data for Mozilla Tiles. e.g. https://github.com/matthewruttley/zenko/blob/8461a90b2c0a4ec01f79e651b68c4157d04f28f4/redshift.py#L108
Commit pushed to master at https://github.com/mozilla/splice https://github.com/mozilla/splice/commit/b9742e82194090d938d3fb0b55a4bf5227bff436 1.1.28 merging bug 1181640
I have written a script that populates campaign, country and account tables in our campaign database. I will copy the docstring here, as it is a pretty concise explanation of how it works: This script is going to populate Account and Campaign database structures. It does this by reading the currently deployed tile distributions (s3), where it determines the currently active tile set, as well as the geo-targetting data (currently only country level) for each tile/adgroup. The script also popultates/migrates data from the 'countries' table in the stats database. The script will discriminate between 'active' and 'inactive' adgroups based on whether or not the adgroup exists in the current distribution. Inactive adgroups are given start/end dates in campaigns that are in the *past*. Active adgroups are placed in campaigns that start on their adgroup creation date and end at some far distant future date. Campaign objects are considered unique by grouping together the following keys in the adgroup: * the TLD+1 of the target_url * the locale of the adgroup * the channel of the adgroup * the 'active' flag (determined as explained above) of the adgroup One campaign row will be assigned for each unique campaign detected. The script will populate the adgroup.campaign_id with the campaign that the adgroup fits into. Account objects are considered unique by considering the TLD+1 of the campaign. All campaigns that share the same TLD+1 will be grouped into the same Account. All writes to the database are transactional. This script is *not* idempotent, and will therefore check that accounts and campaigns tables are empty before running.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.