Closed Bug 1353161 Opened 8 years ago Closed 7 years ago

Send FxA data from basket to SFMC

Categories

(Websites :: Basket, enhancement)

Production
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kirby, Assigned: pmac)

Details

Attachments

(1 file)

40 bytes, text/x-github-pull-request
Cloud Services is sending a data feed to AWS with data about who is logging into their Firefox Account. For background, see https://bugzilla.mozilla.org/show_bug.cgi?id=1338939. We need a daily feed from basket into Salesforce Marketing Cloud (ExactTarget). pmac has been provided credentials for AWS. The External Key for the target data extension was provided previously.
pmac, here's the bug. Thanks!
Flags: needinfo?(pmac)
bniolet and I discussed storage and we decided that we only need to store the FxA_ID and the most recent login date. In this way we can keep the data extension much smaller and make updates easier to perform.
Flags: needinfo?(pmac)
Indeed we did. And I already made FXA_ID the primary key in the data extension :pmac.
I've gotten into this, and wow, are these files large:

* Each file is a day's worth of login activity
* Each contains around 20 Million records
* After processing, that's around 7 Million unique FxA_IDs

I've also processed a full week's worth of files (all that are in the S3 bucket), and that resulted in around 10.5 Million records. That was from around 6.5 GB worth of text files. That's a whole lot of data transfer, processing, and API calls, and what we'll be storing is not actually what we want. We want those users who aren't active now, but were recently.

Would it be better if, instead of recording active users, we asked FxA to send us a weekly file containing users that last logged in more than a week ago and less than 2 months ago? (Those numbers are just an example and should obviously be tweaked.) This would be exactly what we want, should be WAY less data to process and store, and would be much quicker. If we needed an initial dump of inactives, perhaps we could provide them a list of the FxA IDs we have, and they could tell us which of them are "inactive".

I'm just trying to think of alternatives because I'm not sure basket can process nearly 50 Million rows in a Data Extension per week (that's assuming around 7 Million per day for 7 days). I'm willing to try, but it feels like we'll run into limitations of their API.
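To make the proposed "recently inactive" window concrete, here is a minimal Python sketch. The function names and the 7-day and 60-day bounds are illustrative assumptions taken from the example numbers above, not anything FxA or basket actually implements.

```python
from datetime import datetime, timedelta


def is_recently_inactive(last_login, now,
                         min_idle=timedelta(days=7),
                         max_idle=timedelta(days=60)):
    """Return True if the last login is older than min_idle but newer than max_idle.

    The default bounds are the example values from the comment (about a week
    and about two months) and are assumptions to be tuned.
    """
    idle = now - last_login
    return min_idle < idle < max_idle


def recently_inactive_ids(last_logins, now):
    """Given {fxa_id: last_login_datetime}, return the sorted inactive IDs."""
    return sorted(
        fxa_id for fxa_id, ts in last_logins.items()
        if is_recently_inactive(ts, now)
    )
```

A weekly file built this way would contain only the users inside the window, rather than every login of every active user.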
Attached file github PR
This is my initial attempt. As soon as code review is complete we can test and see if my assumptions hold true.
Assignee: nobody → pmac
Status: NEW → ASSIGNED
Commits pushed to master at https://github.com/mozmar/basket

https://github.com/mozmar/basket/commit/0f3ae7cbc9cc41e2dad856cfd67a2a22d72a4b54
Fix bug 1353161: Import FxA Activity data into SFMC

* Download csv files from s3
* Parse csv files and get the most recent timestamps per fxa_id
* Update said timestamps in a Data Extension in SFMC
* Cache the timestamps in Redis to avoid so many SFMC API calls
* Cache which files we've successfully processed

These files are FxA login timestamps per day. Each one contains around 20M rows. After processing all 8 (max in the bucket at a time) there are around 10M records to update. This will take quite a while per run.

https://github.com/mozmar/basket/commit/cb425cf56c1f10151923831cd3c62d4005270413
Merge pull request #17 from pmac/fxa-s3-info-to-sfmc-1353161

Fix bug 1353161: Import FxA Activity data into SFMC
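The parse and cache steps in that commit message can be sketched roughly as follows. This is not the code from the commit: the two-column CSV layout (fxa_id, integer Unix timestamp) is an assumption, the function names are hypothetical, and a plain dict stands in for the Redis cache.

```python
import csv
import io


def latest_timestamps(csv_file, latest=None):
    """Reduce login rows to the most recent timestamp per fxa_id.

    Assumes each row is `fxa_id,timestamp` with an integer timestamp;
    header lines and malformed rows are skipped.
    """
    latest = {} if latest is None else latest
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue
        fxa_id, raw_ts = row[0], row[1]
        try:
            ts = int(raw_ts)
        except ValueError:
            continue  # header line or malformed timestamp
        prev = latest.get(fxa_id)
        if prev is None or ts > prev:
            latest[fxa_id] = ts
    return latest


def changed_since_cache(latest, cache):
    """Return only the entries that are newer than what the cache holds.

    `cache` is a dict standing in for Redis; anything already cached with
    the same or a newer timestamp needs no API call.
    """
    return {
        fxa_id: ts for fxa_id, ts in latest.items()
        if cache.get(fxa_id, -1) < ts
    }
```

Passing the same `latest` dict across several daily files accumulates one row per user, which is how roughly 20M rows per file can shrink to around 10M records overall. The caller would write the changed entries into the cache only after the SFMC update succeeds.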
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
I've just greatly improved this in https://github.com/mozmar/basket/commit/8944603aa36c5d49fb25ca812b2af97184583858. It switches this from using the SOAP API we use for smaller updates to the REST API which supports updating multiple Data Extension rows in a single request. This is allowing basket to update 1000 records per call and has reduced the time to update the data from 10 days to under 4 hours. This should be far more reliably up to date now.
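For illustration, the batching idea looks roughly like this in Python. This is a sketch, not the code from the commit: `send_batch` is a hypothetical callable standing in for whatever performs the multi-row REST request, and only the batch size of 1000 comes from the comment above.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_in_batches(rows, send_batch, size=1000):
    """Send `rows` in batches via `send_batch` and return the number of calls made.

    `send_batch` is a placeholder for the function that issues one
    multi-row update request.
    """
    calls = 0
    for batch in chunked(rows, size):
        send_batch(batch)
        calls += 1
    return calls
```

With around 10M records to update, batches of 1000 come to about 10,000 requests per run.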
OMG. You're awesome! Thanks, pmac!!!