Closed Bug 1631787 Opened 4 years ago Closed 4 years ago

Please assist SoftVision with migration testing against production on Tuesday April 28th at 10am EST

Categories

(Cloud Services Graveyard :: Operations: Sync, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rachel, Assigned: eolson)

References

Details

Attachments

(5 files)

Vasilica's team will be testing the migration as part of this PI request.

I'm marking this as a P1 since we're planning to start the migration on May 4th, assuming all goes OK.

Erik and JR: can you be on hand to help troubleshoot? JR: could you also detail specifically what flags to use for the script?

Flags: needinfo?(jrconlin)
Flags: needinfo?(eolson)
Blocks: 1631788
No longer blocks: 1631788
Summary: Please assist SoftVision with migration testing on Tuesday April 28th at 10am EST → Please assist SoftVision with migration testing against production on Tuesday April 28th at 10am EST
Blocks: 1631788

Yes, sounds great!

Flags: needinfo?(eolson)
Component: Operations: Miscellaneous → Operations: Sync
QA Contact: habib

There are now several scripts that will need to be run. The [README.md](https://github.com/mozilla-services/syncstorage-rs/tree/master/tools/user_migration) file includes instructions on how to run them. All scripts support --help, which lists the available options, their functions, and their default values (where applicable). Most of the option defaults were suggested by operations, so no changes should be required. The scripts are presumed to have read and write access to the current directory.

The gen_bso_users.py --hoard_limit option has no default value. @eolson may have a recommendation for a good value to use.

Note: these scripts generate success_{date}.log and failure_{date}.log files (as appropriate) listing the successfully and unsuccessfully migrated userids.
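
As a rough illustration (not part of the migration tooling), a minimal Python sketch of tallying those log files. It assumes one userid per line in each file, which is an assumption of this sketch rather than something the README guarantees:

```python
# Minimal sketch: summarize the success_{date}.log / failure_{date}.log files
# the migration scripts produce. Assumption: one userid per non-empty line;
# adjust the parsing if the real format differs.
import glob


def collect_userids(pattern):
    """Return the set of non-empty lines across files matching `pattern`."""
    userids = set()
    for path in glob.glob(pattern):
        with open(path) as fh:
            userids.update(line.strip() for line in fh if line.strip())
    return userids


if __name__ == "__main__":
    migrated = collect_userids("success_*.log")
    failed = collect_userids("failure_*.log")
    print(f"migrated: {len(migrated)}  failed: {len(failed)}")
    # A userid in both files would indicate a retry or a logging problem.
    for uid in sorted(migrated & failed):
        print(f"userid {uid} appears in both logs")
```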

Flags: needinfo?(jrconlin) → needinfo?(eolson)

Hi! Here are the accounts selected for this second part of the sync data migration testing:

produser1@mailinator.com 128709160
produser2@mailinator.com 128754071
produser3@mailinator.com 128803402
produser4@mailinator.com 128803495
produser5@mailinator.com 128804907
produser6@mailinator.com 128806959
produser8@mailinator.com 128808758
produser9@mailinator.com 128809197
produser10@mailinator.com 128808946
produser13@mailinator.com 128809645
produser14@mailinator.com 128812632
produser15@mailinator.com 128812699

Hello, they have been assigned to the node we will use for migration. Their new uids are as follows:

produser1@mailinator.com 128709160 ebd5428fdef74e7ea2315bb4c0ed3897@api.accounts.firefox.com 143683269
produser2@mailinator.com 128754071 0f13d22f1c3e48edba8f49bd83f295af@api.accounts.firefox.com 143683279
produser3@mailinator.com 128803402 9909a79230b04ce7ab19b75206b6d8a7@api.accounts.firefox.com 143683290
produser4@mailinator.com 128803495 9b46d6447ed14421849d155c821ac3f1@api.accounts.firefox.com 143683297
produser5@mailinator.com 128804907 31a63278ac2d48afb72808ea7145b7c2@api.accounts.firefox.com 143683302
produser6@mailinator.com 128806959 d4db06b1118a46cdb73c6091ce176f63@api.accounts.firefox.com 143683311
produser8@mailinator.com 128808758 16cde22f1532456ea91e7744dce4c605@api.accounts.firefox.com 143683318
produser9@mailinator.com 128809197 632c2fba929a4194a2f98685edb00163@api.accounts.firefox.com 143683321
produser10@mailinator.com 128808946 f20b94b939ca47cba726760a13e2d4d2@api.accounts.firefox.com 143683328
produser13@mailinator.com 128809645 96be267c104a4b02be3c7895e7af3bb0@api.accounts.firefox.com 143683334
produser14@mailinator.com 128812632 387cc0c04a8f489fa94af0136b0b4cfe@api.accounts.firefox.com 143683341
produser15@mailinator.com 128812699 36b464f5ef1f4bd38920a311bd2ea654@api.accounts.firefox.com 143683351

The node is up; please allow them to sync prior to the testing.

Flags: needinfo?(eolson)

I used 300k as a hoard limit in a recent test. That seemed reasonable: it would take about 4.5 minutes to transfer a user with that many rows, and fewer than 0.5% of users on the node had more than that.
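
As a back-of-the-envelope illustration of that estimate (the 300k rows in ~4.5 minutes figure comes from this comment and implies roughly 1,100 rows per second; it is an observed average from one test, not a guaranteed throughput):

```python
# Rough estimate derived from the figures above: ~300,000 rows transferred in
# about 4.5 minutes implies ~1,100 rows/second on this node.
HOARD_LIMIT_ROWS = 300_000
OBSERVED_SECONDS = 4.5 * 60
ROWS_PER_SECOND = HOARD_LIMIT_ROWS / OBSERVED_SECONDS  # ~1,111


def estimated_minutes(rows):
    """Estimate the transfer time for a user with `rows` BSO rows."""
    return rows / ROWS_PER_SECOND / 60


if __name__ == "__main__":
    for rows in (10_000, 100_000, 300_000):
        print(f"{rows:>7} rows -> ~{estimated_minutes(rows):.1f} min")
```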

Migration tests were completed yesterday; 12 clients were tested. The start of the test was delayed because the clients hit sync errors while connecting to the migration sync-py node and deleted all their data. We reproduced this issue today and it may need to be addressed.

The clients were re-populated and the migration to spanner testing was completed.

Attaching two client logs showing the failures that led to data loss.

New uids, for retrieving server logs after the data migration to spanner:

produser1@mailinator.com 128709160 ebd5428fdef74e7ea2315bb4c0ed3897@api.accounts.firefox.com 144074002
produser2@mailinator.com 128754071 0f13d22f1c3e48edba8f49bd83f295af@api.accounts.firefox.com 144074484
produser3@mailinator.com 128803402 9909a79230b04ce7ab19b75206b6d8a7@api.accounts.firefox.com 144073987
produser4@mailinator.com 128803495 9b46d6447ed14421849d155c821ac3f1@api.accounts.firefox.com 144075243
produser5@mailinator.com 128804907 31a63278ac2d48afb72808ea7145b7c2@api.accounts.firefox.com 144076549
produser6@mailinator.com 128806959 d4db06b1118a46cdb73c6091ce176f63@api.accounts.firefox.com 144076072
produser8@mailinator.com 128808758 16cde22f1532456ea91e7744dce4c605@api.accounts.firefox.com 144074443
produser9@mailinator.com 128809197 632c2fba929a4194a2f98685edb00163@api.accounts.firefox.com 144075229
produser10@mailinator.com 128808946 f20b94b939ca47cba726760a13e2d4d2@api.accounts.firefox.com 144076550
produser13@mailinator.com 128809645 96be267c104a4b02be3c7895e7af3bb0@api.accounts.firefox.com 144124338
produser14@mailinator.com 128812632 387cc0c04a8f489fa94af0136b0b4cfe@api.accounts.firefox.com 144076546
produser15@mailinator.com 128812699 36b464f5ef1f4bd38920a311bd2ea654@api.accounts.firefox.com 144125169
Attached file 4-28-clients.csv.tgz

Server-side logs are attached. 144074443, 144075243, and 144073987 re-uploaded their bookmarks. 144124338 and 144125169 have no activity.

(In reply to Erik Olson from comment #7)

> The start of the test was delayed because the clients hit sync errors while connecting to the migration sync-py node and deleted all their data. We reproduced this issue today and it may need to be addressed.

The first log shows a hostname lookup failure (oddly enough, this happens shortly after a successful GET to the same sync node).

The second log shows no /meta/global record. This is expected, since this is their first connection to a sync-py node (https://sync-815-us-west-2.sync.services.mozilla.com) where they have no existing data. So the client performs a reset, which includes deleting everything on the server:

1588075958472	Sync.Service	INFO	No metadata record, server wipe needed
1588075958472	Sync.Service	INFO	Wiping server data
1588075958472	Sync.Service	INFO	Fresh start. Resetting client.
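
For illustration only, a small Python sketch of that fresh-start flow; this is not the actual Firefox Sync client code, and every name in it is hypothetical:

```python
# Illustrative sketch (not the real Firefox Sync client) of the fresh-start
# behaviour in the log above: a client that finds no meta/global record on a
# node treats it as empty, wipes the server side, resets itself, and
# re-uploads its local data. All names here are hypothetical.
class FakeSyncNode:
    """In-memory stand-in for a sync storage node."""

    def __init__(self, records=None):
        self.records = dict(records or {})

    def fetch(self, path):
        return self.records.get(path)

    def wipe(self):
        self.records.clear()  # "Wiping server data"


def first_sync(node, local_data):
    if node.fetch("meta/global") is None:
        # "No metadata record, server wipe needed"
        node.wipe()
        # "Fresh start. Resetting client." -- then re-upload local data.
        node.records.update(local_data)
        node.records["meta/global"] = {"engines": sorted(local_data)}
        return "fresh start"
    return "normal sync"


if __name__ == "__main__":
    new_node = FakeSyncNode()  # e.g. a freshly provisioned sync-py node
    print(first_sync(new_node, {"bookmarks": ["..."], "history": ["..."]}))
```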

> The clients were re-populated and the migration to spanner testing was completed.

> Attaching two client logs showing the failures that led to data loss.

All I see in the client logs is the delete against the sync-py node (which isn't data loss if there was no data there in the first place).

Do we have the client logs against the spanner node?

Flags: needinfo?(eolson)

Vasilica, can you make these client logs available?

Flags: needinfo?(eolson) → needinfo?(vasilica.mihasca)
Attached file sync-error3.txt

I found the following sync errors; are these the right ones?

Flags: needinfo?(vasilica.mihasca)

I think we have determined that the sync problems/data loss on node sync-815-us-west-2 were due to the workflow: neither the client nor the server had the data, so there was nothing to sync.

Do you have client logs for 144074443, 144075243, and 144073987? They re-uploaded their bookmarks to the spanner node, which they should not have done.

Verified that sync users were migrated with correct history and bookmarks. A final round of manual testing is planned.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
See Also: → 1636457
Product: Cloud Services → Cloud Services Graveyard