Closed Bug 1193011 Opened 9 years ago Closed 8 years ago

modern mapper out of date for gecko-dev

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: hwine)

Attachments

(9 files)

"modern" vcs-sync appears to not be handling updates to gecko-dev. At this point, production is still using "legacy" vcs-sync, so this is not production impacting

reported by :nhirata in #releng
hg hash 22476236b3e1173e03078dc076258f36e67f7040 has not been inserted in modern mapper

entry is present in mapfile on conversion box, but not in mapfile retrieved from api

Ahh, the sha of the first commit of a changeset is not being added; subsequent ones are.
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Attached file missing-mapfile
Not seeing a pattern yet, but there were 376 missing mappings for gecko-dev.

I've updated those (attached), and have a script to monitor future deviations.
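A minimal sketch of how such a gap check could work, not the actual monitoring script. It assumes each mapfile line holds two whitespace-separated hashes, and that both the conversion-box mapfile and the one retrieved from the API are already available as text. The function names are hypothetical.

```python
def parse_mapfile(text):
    """Parse lines of two whitespace-separated hashes into a set of pairs.

    The two-column layout is an assumption; malformed lines are skipped.
    """
    pairs = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pairs.add((parts[0], parts[1]))
    return pairs


def missing_mappings(local_text, remote_text):
    """Return mappings present locally but absent from the remote mapfile."""
    return sorted(parse_mapfile(local_text) - parse_mapfile(remote_text))


# Placeholder hashes, for illustration only.
local = "a1 b1\na2 b2\na3 b3\n"
remote = "a2 b2\na3 b3\n"
print(missing_mappings(local, remote))
```

The length of the returned list is the "missing count"; the list itself is what would need to be re-inserted.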
Similar to bug 1175684, but that was the new system on new hosts. This instance is happening on the new system on old hosts, so the log file hassles are different.
See Also: → 1175684
Found & fixed 13 more missing entries
Until the root cause is found, a daily push of all mappings could serve as a workaround.

From memory, only diffs are pushed, and they are only marked as pushed (and excluded from future pushes) if we get an HTTP 200 back. Maybe there is some condition whereby mappings are not successfully committed, but a 200 response is given anyway.
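To illustrate the speculated failure mode, here is a sketch under stated assumptions. It is not the mozharness code: `push_delta`, `post`, and the faulty server are all hypothetical. It only shows why a 200 response that does not reflect a complete insert would leave a permanent gap.

```python
def push_delta(all_mappings, pushed, post):
    """Push only mappings not yet marked as pushed.

    `post` is a hypothetical callable that sends a list of mappings and
    returns an HTTP status code. Mappings are marked as pushed only on 200.
    """
    delta = [m for m in all_mappings if m not in pushed]
    if delta and post(delta) == 200:
        return pushed | set(delta)
    return pushed


# Hypothetical faulty server: answers 200 but silently drops the first entry.
server_store = set()


def lossy_post(delta):
    server_store.update(delta[1:])
    return 200


mappings = [("g1", "h1"), ("g2", "h2"), ("g3", "h3")]
pushed = push_delta(mappings, set(), lossy_post)

# The client believes everything was pushed, so the gap is never retried.
never_retried = sorted(pushed - server_store)
print(never_retried)
```

Running `push_delta` again with the same inputs computes an empty delta, so nothing is re-sent and the dropped mapping stays missing until something outside the delta logic notices.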

Another option might be to calculate a hash of the full set of mappings on both sides before pushing, and to push the delta only if the hashes match; otherwise push the full map file.

Another option might be to calculate the hash of the full mapfile on both sides /after/ pushing the delta.

This is all speculation - but I'm guessing this is where the issue lies.
To be clear: pushing deltas only means only new mappings get added. Therefore, if something goes wrong on any run and it is not detected, mappings can go missing. The advantage of pushing deltas over full map files is obviously efficiency.

The suggestion in my previous comment is about a mechanism to ensure that the mappings were pushed successfully, falling back to a complete push of all mappings if a problem is detected. I prefer to calculate a hash of the mapfile rather than just, e.g., count the number of rows, in case there are invalid mappings. The advantage of comparing hashes over just pushing the whole map file is the reduced network traffic: only the hash has to be transmitted.

You could also calculate both hashes on the client side by retrieving the full map file after pushing. The advantage here is simplicity of implementation. The disadvantage, of course, is that you have to transfer all mappings every time you add any, so in my opinion it should be the very last option. If it is done, perhaps not for every update, but in a once-per-day or, e.g., once-every-ten-times kind of setup.
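The hash comparison with fallback could look roughly like the sketch below. It is an illustration only: the choice of SHA-256, the sorted two-column serialization, and the in-memory `remote` set standing in for the server are all assumptions, and in a real setup the remote digest would be computed server-side and only the digest transmitted.

```python
import hashlib


def mapfile_digest(mappings):
    """Digest of the full mapping set, independent of insertion order."""
    h = hashlib.sha256()
    for git_sha, hg_sha in sorted(mappings):
        h.update(f"{git_sha} {hg_sha}\n".encode("ascii"))
    return h.hexdigest()


def push_with_check(local, remote, delta):
    """Push the delta, then fall back to a full push if digests differ.

    `remote` is a plain set standing in for the server-side store.
    """
    remote.update(delta)
    if mapfile_digest(local) == mapfile_digest(remote):
        return "delta"
    remote.clear()
    remote.update(local)
    return "full"


local = {("g1", "h1"), ("g2", "h2"), ("g3", "h3")}
new = {("g3", "h3")}

healthy_remote = {("g1", "h1"), ("g2", "h2")}
drifted_remote = {("g2", "h2")}  # an earlier mapping went missing

healthy_result = push_with_check(local, healthy_remote, new)
drifted_result = push_with_check(local, drifted_remote, new)
print(healthy_result, drifted_result)
```

Unlike a row count, the digest also differs when the remote holds the right number of rows but a wrong or invalid mapping.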
23 entries missing
Update 122 entries
update: added additional logging to possibly detect when this happens
see: https://hg.mozilla.org/build/mozharness/rev/7c16786b489d

Strategy is to immediately probe after successful upload for the first & last entries in the mapfile. If those are not found, log an error, and treat as failed upload (so will be retried next run).

If no errors are reported, and entries still show up missing every few days, I'll need a new strategy.
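The probe-after-upload strategy can be sketched as follows. This is not the code from the linked mozharness revision; `upload` and `lookup` are hypothetical callables standing in for the API insert and the per-revision query.

```python
def upload_and_probe(entries, upload, lookup):
    """Upload a batch, then probe for its first and last entries.

    Returns False (treat as a failed upload, retry next run) if the upload
    is rejected or either probed entry cannot be found afterwards.
    """
    if not entries:
        return True
    if upload(entries) != 200:
        return False
    for git_sha, hg_sha in (entries[0], entries[-1]):
        if lookup(git_sha) != hg_sha:
            return False
    return True


store = {}  # in-memory stand-in for the remote database


def good_upload(entries):
    store.update(entries)
    return 200


def drops_first(entries):
    # Hypothetical fault: reports success but loses the first entry.
    store.update(entries[1:])
    return 200


batch = [("g1", "h1"), ("g2", "h2"), ("g3", "h3")]
ok = upload_and_probe(batch, good_upload, store.get)
store.clear()
failed = upload_and_probe(batch, drops_first, store.get)
print(ok, failed)
```

Probing only the two ends keeps the check cheap, but a loss in the middle of a batch would pass unnoticed, and so would an entry that disappears some time after the probe.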
Update of 31 entries after fixing issues with new logging (it was generating false positives, and possibly caused bug 1204318)
Fixing 88 entries
added checks for all insertions and check for sensible calculation of "delta" and deployed

https://hg.mozilla.org/build/mozharness/rev/7e04b4a5b116
Off by 6 overnight, no error logs. Time to re-examine assumptions.
I took a list of missing mappings from one of the attachments and looked up the first occurrence of each changeset and sorted by times. Hopefully that allows you to pinpoint things.
Those times might be in PDT.
72 entries
(In reply to Hal Wine [:hwine] (use NI) from comment #15)
> Created attachment 8667986 [details]
> missing-mapfile 2015-09-30
> 
> Off by 6 overnight, no error logs. Time to re-examine assumptions.

So just looking at these six revs,

dustin@hopper ~/tmp $ cat foo.sh
SHAs='69c6f1bd2f7f189da346b84daad0a70aa527142e 7a4aaa046640c50b7f0c55bf922d8b3f758183c1 7aa2a6b13e4c72f2c45730843a302a93adfdcdf3 8e7c30dca84ce097fecb014885d551bc61f41c1f bfea98b5492154967387d31ad7df794a7c03d721 df21b43278fde2cd74e9010c288f32ea1cc2389c'

for sha in $SHAs; do
    echo $sha from relengapi:
    curl https://api.pub.build.mozilla.org/mapper/gecko-dev/rev/git/$sha
    echo
done
dustin@hopper ~/tmp $ sh foo.sh 
69c6f1bd2f7f189da346b84daad0a70aa527142e from relengapi:
69c6f1bd2f7f189da346b84daad0a70aa527142e 70d0481d618f875e24546f2589abcf56a7d5d2cf
7a4aaa046640c50b7f0c55bf922d8b3f758183c1 from relengapi:
7a4aaa046640c50b7f0c55bf922d8b3f758183c1 97e537f85183ef31481602ab9e5587a6e7d16b4d
7aa2a6b13e4c72f2c45730843a302a93adfdcdf3 from relengapi:
7aa2a6b13e4c72f2c45730843a302a93adfdcdf3 ccd3484ebb4c742a43b5d938911eac2d7e670d43
8e7c30dca84ce097fecb014885d551bc61f41c1f from relengapi:
8e7c30dca84ce097fecb014885d551bc61f41c1f 891ee0d0ba3ec42b6484cf0205b3c95e21c58f74
bfea98b5492154967387d31ad7df794a7c03d721 from relengapi:
bfea98b5492154967387d31ad7df794a7c03d721 f02c4236fbdb21537f5200bd0d582dd6c264a8e2
df21b43278fde2cd74e9010c288f32ea1cc2389c from relengapi:
df21b43278fde2cd74e9010c288f32ea1cc2389c ccee6614fd9d18a31f263fbcfe9676b224d851aa

and similarly for comment 18.  Were these manually re-inserted?

If I understand your experiment correctly, you verified that the sha1's were inserted by querying for them after inserting, but checks later showed those records to no longer exist.  I'll allow that MySQL might fail to store records properly, but it's pretty incredible that records which landed in the database might later disappear -- much less later *re*-appear.
(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> (In reply to Hal Wine [:hwine] (use NI) from comment #15)
> > Created attachment 8667986 [details]
> > missing-mapfile 2015-09-30
> > 
> > Off by 6 overnight, no error logs. Time to re-examine assumptions.
> 
> So just looking at these six revs,
> ...
> and similarly for comment 18.  Were these manually re-inserted?

Yes. I have a script on my laptop I run occasionally to both check if there are gaps, and to fill in those gaps.

I am behind on posting data to this bug -- I'll also check if any revision, once "manually" fixed, ever disappears again.
update: now monitoring and posting "missing count" from relengwebadm to graphite on an hourly basis.

datum name is test.hwine.mapper.delta.gecko-dev
graphite report name is user graphs -> hwine -> mapper gecko-dev
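For reference, a data point in Graphite's plaintext protocol is a single line of the form "metric value timestamp". The sketch below only formats such a line for the datum named above; the value and timestamp are made-up examples, and how the line actually gets delivered to the Graphite host is not shown.

```python
def graphite_line(metric, value, timestamp):
    """Format one data point for Graphite's plaintext protocol."""
    return f"{metric} {value} {int(timestamp)}\n"


# Example value and timestamp are illustrative, not real measurements.
line = graphite_line("test.hwine.mapper.delta.gecko-dev", 6, 1443600000)
print(line, end="")
```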
After careful examination of collected log files, nothing is obvious. 

Will hit with hammer, and re-add missing on a regular basis.
configured hourly cronjob on relengwebadm running as hwine to:
 - get stats on situation and log to graphite
 - re-add any missing entries (using API credentials issued to hwine)

daily, it will prune logs to roughly the last 4 days (96 entries)
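The "roughly 4 days (96 entries)" figure follows from one log entry per hourly run: 24 x 4 = 96. A minimal sketch of that pruning, assuming the log is a list of lines with one line per run (the real log format is not described here):

```python
from collections import deque

ENTRIES_PER_DAY = 24  # one entry per hourly cron run
DAYS_TO_KEEP = 4


def prune(lines, keep=ENTRIES_PER_DAY * DAYS_TO_KEEP):
    """Keep only the most recent `keep` log entries."""
    return list(deque(lines, maxlen=keep))


log = [f"run {i}" for i in range(100)]
pruned = prune(log)
print(len(pruned), pruned[0], pruned[-1])
```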

Will monitor for a day or two, then close.
so far, so good. Will wait one more day to confirm pruning of log files.
Hal, does this mean you found the root cause?

Might also be worth pulling in Aki now he's back...
(In reply to Pete Moore [:pmoore][:pete] from comment #32)
> Hal, does this mean you found the root cause?

Nope - just decided the workaround was sufficient.

All log pruning operating fine.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
New hint - see bug 1265263 attachment 8747304 -- it shows the reduced occurrence of this issue after the timeout on the gexport call was extended.

So, likely this is/was a corner case on handling of mapper deltas in the presence of abandoned gexport runs.
See Also: → 1265263
See Also: → 1376048