Closed Bug 1194810 Opened 9 years ago Closed 9 years ago

[BSD Migration to ExactTarget] Match & Assign tokens to email records

Categories

(Websites :: Basket, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jdavis, Unassigned)

Details

Foundation is moving their email sending production from Blue State Digital into ExactTarget this quarter.

There will be a file of ~1.5 million records that will need tokens assigned to them (if they exist in ET, we'll need to know what it is, if they do not exist in ET, a new token will need to be assigned).

These records will then need to be assigned the correct subscription flags and imported into ExactTarget's Master_Subscriber data extension.


Questions:
1) Pmac, what's the best way for you to get and assign the 1.5 million records of emails?

2) Adam, Pmac, DEG, what's the best way to assign Subscription Flags to these 1.5 million records to prep for import?

3) DEG, what's the best way to import these 1.5 million records? (How long will it take? Is there a way to do this import without locking Master_Subscribers so that Basket can still ping & update the Master_Subscribers table independent of the import?)

4) What other questions are out there to explore the best way to do this?
(In reply to Jessilyn Davis from comment #0)
> 1) Pmac, what's the best way for you to get and assign the 1.5 million
> records of emails?

There isn't a good way for "me" to do this unfortunately. Each API call to ET I can make takes around a second, maybe two, so doing that 1.5M times is a whole lot of seconds (nearly 20 days worth). If there's a way for DEG to get us a list of the ones from the BSD list that are in ET with their tokens, I can easily and quickly assign the rest a new token and provide you the result. Otherwise we'll just have to start the process and wait and hope it doesn't stop.
(In reply to Paul [:pmac] McLanahan from comment #1)
> (In reply to Jessilyn Davis from comment #0)
> > 1) Pmac, what's the best way for you to get and assign the 1.5 million
> > records of emails?
> 
> There isn't a good way for "me" to do this unfortunately. Each API call to
> ET I can make takes around a second, maybe two, so doing that 1.5M times is
> a whole lot of seconds (nearly 20 days worth). If there's a way for DEG to
> get us a list of the ones from the BSD list that are in ET with their
> tokens, I can easily and quickly assign the rest a new token and provide you
> the result. Otherwise we'll just have to start the process and wait and hope
> it doesn't stop.

AH! Yes. We can do this a la how we handled the EOY exclusion process in Bug 1112179.


Adam, we'll want to change over the signup form endpoints before we start this process. That way it's a one-and-done data import.
Proposed steps *after* Foundation Email Opt-in signup forms connected to Basket + ET:

1. We'll need a secure place for DEG to get the database of email addresses.

2. DEG will then compare the list to our current list of subscribers in ExactTarget and weed these individuals out.

3. DEG will provide Pmac access to a list of email addresses of individuals who do not currently exist in ExactTarget.

4. Pmac will assign tokens to these individuals.

5. An import file containing the columns of: email address, token, and subscription flgs with corresponding Y values, will need to be created. (Or is there a better way to do this for 1.5 million records?)

6. DEG will assist with importing. 


Does this make sense? Is there a better way to go about this?
Makes sense to me. Let's do it.
(In reply to Jessilyn Davis from comment #3)
> Proposed steps *after* Foundation Email Opt-in signup forms connected to
> Basket + ET:
> 
> 1. We'll need a secure place for DEG to get the database of email addresses.

For past projects, we've been set up with a username/password for Brick FTP access. If you guys can do that again, we can send over IP addresses to keep the connection secure.

 
> 5. An import file containing the columns of: email address, token, and
> subscription flgs with corresponding Y values, will need to be created. (Or
> is there a better way to do this for 1.5 million records?)

As far as I know, this is the best way to go about a mass import/add/update of records. I don't believe it impacts the data table, locking it or otherwise, but we will have a brief internal kickoff to discuss ramifications of this import and impacts to basket and advise.
Updating the bug (other convos have been happening in GitHub (Foundation), Basecamp (DEG), and in meetings):

The plan:

Adam will post the .csv file of ~1.6 million records to Brick FTP tomorrow(?) to be ready for DEG.

DEG will then match the tokens that exist in ET, and post back to Brick FTP 2 files:

1 file with all matches (subscription flags as well)
1 file with no matches for pmac to assign a token.

Pmac - do you have access to Brick FTP?


Then later this week, or Monday, Sept 28 - Adam will post the final csv file on Brick FTP, complete with TOKEN and new subscription flag information for Foundation email programs, to be imported by DEG into Master_Subscribers in ExactTarget.

Let me know if anyone has any questions or problems with this timeline.

Thanks!

Jess
Update in this thread: 

Adam, Pmac and DEG all have access to Brick FTP
Adam has posted file of 1.6 million records there for DEG

*Ideal timeframe:*
Friday, 9/25 - DEG to provide 2 files
1 file with all matches (and subset of field information)
1 file with no matches for pmac to assign a token.

Monday, 9/28 -  Pmac to create tokens for new people in ET from Foundation. Adam review the data to see if we need to be concerned about overwriting any current information (CREATED_DATE_, LANGUAGE_ISO2, COUNTRY).

Tuesday, 9/29 - Final tweaks done to file(s) to ready them for DEG to import.

Wed, 9/30 - Import!


Sound doable?
Flags: needinfo?(pmac)
Flags: needinfo?(adam)
good for me.
Flags: needinfo?(pmac)
Pmac - you're up! There is a file on Brick FTP ready for you to look through and assign *new* tokens: mofo_bsd_export_et_fmt_deg_matches.csv


Please take a look and let me know:

1) Can you get to this today?

2) When you assign tokens to these email addresses, can you keep the other columns in tact? That way once we have the tokens, the file is ready for import.

3) I'm trying to figure out how to deal with the dupes so that we don't assign new & different tokens to the dupe email rows. Do you have a recommendation on how to deal with this? (Adam, do you?)



From DEG:

I have the output files back out on the Brick FTP site ready for review. 

I assigned my own unique identifier to the rows since the email was not unique. That column is DEG_ID. It can be ignored on your end.

Here are the details:

    Total input rows - 1,654,480

    Number of Email Addresses Repeated on Input - 2,874 (3,079 duplications)

    MoFo_BSD_Contact_Matches - 484,274

    MoFo_BSD_Contact_Non-Matches - 1,170,587


Files:

    mofo_bsd_export_et_fmt_deg_non-matches.csv
    mofo_bsd_export_et_fmt_deg_matches.csv
Flags: needinfo?(pmac)
(In reply to Jessilyn Davis from comment #9)
> Pmac - you're up! There is a file on Brick FTP ready for you to look through
> and assign *new* tokens: mofo_bsd_export_et_fmt_deg_matches.csv

I think the file that needs new tokens is the other one: ...non-matches.csv

> 3) I'm trying to figure out how to deal with the dupes so that we don't
> assign new & different tokens to the dupe email rows. Do you have a
> recommendation on how to deal with this? (Adam, do you?)

For simplicity we can skip the dupes if it unblocks the bulk import. There's only a couple of thousand of them, so I can tidy the dupe data up and subscribe them via the API after the bulk import is done.
Flags: needinfo?(adam)
Ah! Yes - Pmac, the file that needs tokens matched is: ...non-matches.csv 

BUT!

I've asked DEG to pull a file of the dupes, so that we can skip those. (Would require Pmac (or Adam) to find and filter out the dupes before pmac generates Tokens.)

I've also asked DEG if it's possible to recreate the non-matches and matches without the dupes so that we can resume this token matching and bulk import more easily. (Would NOT require any further work before Pmac can create new tokens.)

I'll keep you posted.

J
Sorry. Today was shot by bedrock deployment testing in AWS. My reading of the last couple comments is that I should _not_ yet generate the tokens, and should instead wait for a deduped file? If I should go ahead, let me know, and I can maybe get to this tonight or first thing tomorrow.
Flags: needinfo?(pmac)
Does the above sound right to you Ben?
Flags: needinfo?(bniolet)
Hey pmac,

You were right to wait for the deduped file.

This has now been deduped and is on the FTP site:


From DEG:
> I have the new data files out on the FTP Site. I changed the naming convention 
> from the previous batch. 
> These are the matches less the duplicate email addresses from the input.

> Mozilla_MoFoBSDContacts_NotinET_10012015.txt - 1,167,613
> Mozilla_MoFoBSDContacts_InET_10012015.txt - 483,788
Flags: needinfo?(bniolet)
Checking in. I'm about to get on a plane. Is there anything I can do / need to do to assist?
Done. The file is in brickftp and is called:

Mozilla_MoFoBSDContacts_NotinET_10012015_New_Tokens.txt.gz

That's a gzipped file. I can upload a decompressed one as well if someone needs it; it would just take longer.
Sweet! Thank you Pmac, Adam, Ben, and DEG! 

Closing this out as all Tokens are now assigned to existing newsletter subscribers & incoming signups will be added via the Basket API (so tokens will be automatically matched for existing, or created for new).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.