Closed
Bug 1193011
Opened 9 years ago
Closed 8 years ago
modern mapper out of date for gecko-dev
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: hwine)
Attachments
(9 files)
The "modern" vcs-sync appears not to be handling updates to gecko-dev. At this point, production is still using the "legacy" vcs-sync, so this is not production-impacting.
Reported by :nhirata in #releng.
Assignee
Comment 1•9 years ago
hg hash 22476236b3e1173e03078dc076258f36e67f7040 has not been inserted in the modern mapper.
The entry is present in the mapfile on the conversion box, but not in the mapfile retrieved from the API.
Ahh, the sha of the first commit of a changeset is not being added; subsequent ones are.
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Assignee
Comment 2•9 years ago
Not seeing a pattern yet, but there were 376 missing mappings for gecko-dev. I've updated those (attached), and have a script to monitor future deviations.
Assignee
Comment 3•9 years ago
Similar to bug 1175684, but that was the new system on new hosts. This instance is happening on the new system on old hosts, so the log file hassles are different.
See Also: → 1175684
Assignee
Comment 4•9 years ago
Found & fixed 13 more missing entries
Comment 5•9 years ago
Until the root cause is found, a possible workaround is a daily push of all mappings. From memory, only diffs are pushed, and they are only marked as pushed (and excluded from future pushes) if we get an HTTP 200 back. Maybe there is some condition whereby commits are not successfully committed, but a 200 response is given.
Another option might be to calculate a hash of the full set of hashes on both sides before pushing, and to push the delta only if they match, pushing the full map file otherwise. Yet another option might be to calculate the hash of the full mapfile on both sides /after/ pushing the delta.
This is all speculation - but I'm guessing this is where the issue lies.
Comment 6•9 years ago
To be clear: pushing deltas only means that only new mappings get added. Therefore, if something goes wrong on any run and it is not detected, mappings can go missing. The advantage of pushing deltas over full map files is obviously efficiency. The suggestion in my previous comment is about a mechanism to ensure that the mappings were pushed successfully, falling back to a complete push of all mappings when a problem is detected.
I prefer to calculate a hash of the mapfile rather than just, e.g., counting the number of rows, in case there are invalid mappings. The advantage of calculating a hash over just pushing the whole map file is the reduced network traffic - only the hash has to be transmitted.
You could also calculate both hashes on the client side by retrieving the full map file after pushing. The advantage here is simplicity of implementation; the disadvantage, of course, is that you have to transfer all mappings every time you add any. So this should be the very last option, in my opinion - and if done, perhaps not for every update, just in a once-per-day or e.g. once-every-ten-times kind of setup.
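A minimal sketch of the hash-comparison idea from the two comments above, offered as an illustration only. The function names (`mapfile_digest`, `choose_push`), the choice of SHA-256, and the representation of mappings as `(git_sha, hg_sha)` pairs are all assumptions; nothing here is taken from the actual vcs-sync or mapper code, and the mapper API is not known to expose such a digest.

```python
import hashlib


def mapfile_digest(mappings):
    """Digest over sorted 'git_sha hg_sha' lines, so input order does not matter.

    `mappings` is an iterable of (git_sha, hg_sha) pairs (assumed format).
    """
    h = hashlib.sha256()
    for git_sha, hg_sha in sorted(mappings):
        h.update(f"{git_sha} {hg_sha}\n".encode("ascii"))
    return h.hexdigest()


def choose_push(local_before, delta, remote_digest):
    """Decide what to push.

    If the remote digest matches our view of what was already pushed,
    the delta alone is enough. Otherwise something was lost earlier,
    so fall back to pushing every mapping we know about.
    """
    if mapfile_digest(local_before) == remote_digest:
        return "delta", list(delta)
    return "full", sorted(set(local_before) | set(delta))
```

Only the digest crosses the network in the common case, which is the traffic saving described above; the full push happens only when the two sides disagree.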
Assignee
Comment 7•9 years ago
23 entries missing
Assignee
Comment 8•9 years ago
Update 122 entries
Assignee
Comment 9•9 years ago
update: added additional logging to possibly detect when this happens
see: https://hg.mozilla.org/build/mozharness/rev/7c16786b489d

Strategy is to immediately probe after successful upload for the first & last entries in the mapfile. If those are not found, log an error, and treat as failed upload (so will be retried next run).

If no errors are reported, and entries still show up missing every few days, I'll need a new strategy.
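The probing strategy described above could look roughly like the following sketch. It is not the code from the linked mozharness revision: `upload_verified` and the `lookup` callable are hypothetical names, and the real check may differ in what it queries and how it logs.

```python
import logging

log = logging.getLogger("mapper-probe")


def upload_verified(delta, lookup):
    """Probe the first and last uploaded entries after a 'successful' upload.

    `delta` is the list of (git_sha, hg_sha) pairs just uploaded.
    `lookup(git_sha)` is a caller-supplied function that asks the mapper
    for the hg sha and returns None when the entry is unknown.
    Returns False if either probe fails, so the caller can treat the
    upload as failed and retry it on the next run.
    """
    if not delta:
        return True  # nothing was uploaded, nothing to verify
    for git_sha, hg_sha in (delta[0], delta[-1]):
        if lookup(git_sha) != hg_sha:
            log.error("mapping %s -> %s not found after upload", git_sha, hg_sha)
            return False
    return True
```

Probing only the two end entries keeps the check cheap, at the cost of missing losses in the middle of a delta.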
Assignee
Comment 10•9 years ago
Update of 31 entries after fixing issues with new logging (it was generating false positives, and possibly caused bug 1204318)
Assignee
Comment 11•9 years ago
Fixing 88 entries
Assignee
Comment 12•9 years ago
https://hg.mozilla.org/build/mozharness/rev/7e04b4a5b116248d02f8909d536e579abcbd5936 bug 1193011 - more debug code, still not catching "lost" mappings
Assignee
Comment 13•9 years ago
added checks for all insertions and check for sensible calculation of "delta" and deployed https://hg.mozilla.org/build/mozharness/rev/7e04b4a5b116
Assignee
Comment 14•9 years ago
https://hg.mozilla.org/build/mozharness/rev/e4f73a03ba9e378b19cd3679744f8525b58d0547 bug 1193011 - more debug code, still not catching "lost" mappings
Assignee
Comment 15•9 years ago
Off by 6 overnight, no error logs. Time to re-examine assumptions.
Comment 16•9 years ago
I took a list of missing mappings from one of the attachments and looked up the first occurrence of each changeset and sorted by times. Hopefully that allows you to pinpoint things.
Comment 17•9 years ago
Those times might be in PDT.
Assignee
Comment 18•9 years ago
72 entries
Assignee
Comment 19•9 years ago
https://hg.mozilla.org/build/mozharness/rev/83d79ea2a48521643d02c7e18b62b71d060211af bug 1193011 - bustage: debug flag not marked global; r=me
Comment 20•9 years ago
(In reply to Hal Wine [:hwine] (use NI) from comment #15)
> Created attachment 8667986 [details]
> missing-mapfile 2015-09-30
>
> Off by 6 overnight, no error logs. Time to re-examine assumptions.

So just looking at these six revs,

dustin@hopper ~/tmp $ cat foo.sh
SHAs='69c6f1bd2f7f189da346b84daad0a70aa527142e
7a4aaa046640c50b7f0c55bf922d8b3f758183c1
7aa2a6b13e4c72f2c45730843a302a93adfdcdf3
8e7c30dca84ce097fecb014885d551bc61f41c1f
bfea98b5492154967387d31ad7df794a7c03d721
df21b43278fde2cd74e9010c288f32ea1cc2389c'
for sha in $SHAs; do
    echo $sha from relengapi:
    curl https://api.pub.build.mozilla.org/mapper/gecko-dev/rev/git/$sha
    echo
done
dustin@hopper ~/tmp $ sh foo.sh
69c6f1bd2f7f189da346b84daad0a70aa527142e from relengapi:
69c6f1bd2f7f189da346b84daad0a70aa527142e 70d0481d618f875e24546f2589abcf56a7d5d2cf
7a4aaa046640c50b7f0c55bf922d8b3f758183c1 from relengapi:
7a4aaa046640c50b7f0c55bf922d8b3f758183c1 97e537f85183ef31481602ab9e5587a6e7d16b4d
7aa2a6b13e4c72f2c45730843a302a93adfdcdf3 from relengapi:
7aa2a6b13e4c72f2c45730843a302a93adfdcdf3 ccd3484ebb4c742a43b5d938911eac2d7e670d43
8e7c30dca84ce097fecb014885d551bc61f41c1f from relengapi:
8e7c30dca84ce097fecb014885d551bc61f41c1f 891ee0d0ba3ec42b6484cf0205b3c95e21c58f74
bfea98b5492154967387d31ad7df794a7c03d721 from relengapi:
bfea98b5492154967387d31ad7df794a7c03d721 f02c4236fbdb21537f5200bd0d582dd6c264a8e2
df21b43278fde2cd74e9010c288f32ea1cc2389c from relengapi:
df21b43278fde2cd74e9010c288f32ea1cc2389c ccee6614fd9d18a31f263fbcfe9676b224d851aa

and similarly for comment 18. Were these manually re-inserted?

If I understand your experiment correctly, you verified that the sha1's were inserted by querying for them after inserting, but checks later showed those records to no longer exist. I'll allow that MySQL might fail to store records properly, but it's pretty incredible that records which landed in the database might later disappear -- much less later *re*-appear.
Assignee
Comment 21•8 years ago
https://hg.mozilla.org/build/mozharness/rev/cc3512adc11279dcd8d07cf8efeb3a70927b3e0a bug 1193011 - catch occaional exception; r=me
Assignee
Comment 22•8 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> (In reply to Hal Wine [:hwine] (use NI) from comment #15)
> > Created attachment 8667986 [details]
> > missing-mapfile 2015-09-30
> >
> > Off by 6 overnight, no error logs. Time to re-examine assumptions.
>
> So just looking at these six revs,
> ...
> and similarly for comment 18. Were these manually re-inserted?

Yes. I have a script on my laptop I run occasionally to both check if there are gaps, and to fill in those gaps.

I am behind on posting data to this bug -- I'll also check if any revision, once "manually" fixed, ever disappears again.
Assignee
Comment 23•8 years ago
update: now monitoring and posting "missing count" from relengwebadm to graphite on an hourly basis.
datum name is test.hwine.mapper.delta.gecko-dev
graphite report name is user graphs -> hwine -> mapper gecko-dev
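For illustration, posting a single "missing count" datum could be sketched as below, assuming the Graphite server accepts the carbon plaintext protocol (one `path value timestamp` line per datum, conventionally on port 2003). The helper names are hypothetical, the host must be supplied by the caller, and this is not the script actually used in this bug.

```python
import socket
import time


def graphite_line(metric, value, timestamp=None):
    """Format one datum in carbon's plaintext protocol: 'path value timestamp\\n'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{metric} {value} {ts}\n"


def send_to_graphite(line, host, port=2003, timeout=10):
    """Send a preformatted line to a carbon listener (host is caller-supplied)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

An hourly job would call `graphite_line("test.hwine.mapper.delta.gecko-dev", missing_count)` and pass the result to `send_to_graphite`.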
Assignee
Comment 24•8 years ago
https://hg.mozilla.org/build/mozharness/rev/00cf4033248b2c9ce1d53341e2c96f240dda2922 bug 1193011 - more debug - confirm selection; r=me
Assignee
Comment 25•8 years ago
https://hg.mozilla.org/build/mozharness/rev/5ddd884214d6889485e653650237ac9c447e753f bug 1193011 - more debug - save all mapfiles to logs; r=me
Assignee
Comment 26•8 years ago
https://hg.mozilla.org/build/mozharness/rev/0f47574fdd9576b913d1d8519ed2a1975e6e1561 bug 1193011 - more debug - save all mapfiles to logs; r=me
Assignee
Comment 27•8 years ago
https://hg.mozilla.org/build/mozharness/rev/6484726fcf35c805262a0924f6fc977d6f635074 bug 1193011 - more debug - save all mapfiles to logs; r=me
Assignee
Comment 28•8 years ago
https://hg.mozilla.org/build/mozharness/rev/84b270b4a94d8cf8768d06e07c793496fdb0e1cd bug 1193011 - bustage in save all mapfiles to logs; r=me
Assignee
Comment 29•8 years ago
After careful examination of collected log files, nothing is obvious. Will hit with hammer, and re-add missing on a regular basis.
Assignee
Comment 30•8 years ago
configured hourly cronjob on relengwebadm running as hwine to:
- get stats on situation and log to graphite
- re-add any missing entries (using API credentials issued to hwine)
daily, it will prune logs to roughly the last 4 days (96 entries)

Will monitor for a day or two, then close.
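A rough sketch of what one run of such a job might do, under stated assumptions: mappings are held as `git_sha -> hg_sha` dictionaries, `insert` and `report` are caller-supplied callables standing in for the mapper API and the graphite post, and log files have names that sort chronologically. None of these names come from the actual cron script.

```python
def run_once(local, remote, insert, report):
    """One hourly pass: count the gaps, report the count, refill the gaps.

    `local` and `remote` map git_sha -> hg_sha (assumed representation).
    `report(count)` records the missing count; `insert(git_sha, hg_sha)`
    re-adds one entry. Returns the list of entries that were re-added.
    """
    missing = sorted(
        (git_sha, hg_sha)
        for git_sha, hg_sha in local.items()
        if remote.get(git_sha) != hg_sha
    )
    report(len(missing))
    for git_sha, hg_sha in missing:
        insert(git_sha, hg_sha)
    return missing


def logs_to_prune(log_names, keep=96):
    """Return the oldest log names beyond the newest `keep` (96 = 4 days hourly)."""
    ordered = sorted(log_names)
    return ordered[: max(len(ordered) - keep, 0)]
```

Keeping the side effects behind `insert` and `report` makes the gap calculation easy to check without touching the real service.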
Assignee
Comment 31•8 years ago
so far, so good. Will wait one more day to confirm pruning of log files.
Comment 32•8 years ago
Hal, does this mean you found the root cause? Might also be worth pulling in Aki now he's back...
Assignee
Comment 33•8 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #32)
> Hal, does this mean you found the root cause?

Nope - just decided the workaround was sufficient. All log pruning is operating fine.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee
Comment 34•8 years ago
New hint - see bug 1265263 attachment 8747304 [details] -- that shows the reduced occurrence of this issue after the timeout on the gexport call was extended.

So, likely this is/was a corner case in the handling of mapper deltas in the presence of abandoned gexport runs.
See Also: → 1265263