Closed Bug 1363635 Opened 7 years ago Closed 7 years ago

tests/wpt leaked into servo mercurial repo and autoland repo

Categories

(Developer Services :: Servo VCS Sync, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

Attachments

(1 file)

The tests/wpt directory from the servo Git repo was somehow converted to Mercurial despite a configuration excluding it. This added >100k files to the repo. That commit was subsequently autolanded.

First bad commit in the converted repo (https://hg.mozilla.org/projects/converted-servo-linear) is 5f0bb852f0fc.

First bad commit in the autoland repo is d9acbaa99119.

All bad changesets and their descendants will need to be stripped.

The converted-servo-linear repo will need to be hard stripped. We should be able to recover changesets on autoland that didn't touch servo/, although the pushlog may be wrong.

We should be able to recover autoland quickly-ish. Servo vcs sync could be offline until we figure out the root cause.
From servo vcs sync logs:

May 09 20:15:13 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: From https://github.com/servo/servo
May 09 20:15:13 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: 121662aa57..66cfea6728  master     -> master
May 09 20:15:18 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: To github.com:mozilla/converted-servo.git
May 09 20:15:18 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: 121662aa57..66cfea6728  master -> master
May 09 20:15:18 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: linearizing 1 commits from heads/master (66cfea6728135d18be253c6f97f4a65ef561ba55 to 66cfea6728135d18be253c6f97f4a65ef561b
May 09 20:15:18 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: 1/1 66cfea6728135d18be253c6f97f4a65ef561ba55 Auto merge of #16784 - mbrubeck:has_author, r=bholley
May 09 20:15:19 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: 1 commits from heads/master converted; original: 66cfea6728135d18be253c6f97f4a65ef561ba55; rewritten: 6d7731e38c8f1d33fa42
May 09 20:15:36 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: To github.com:mozilla/converted-servo.git
May 09 20:15:36 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: 011f3a35d1..6d7731e38c  refs/convert/dest/heads/master -> refs/convert/dest/heads/master
May 09 20:15:36 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: 121662aa57..66cfea6728  refs/convert/source/heads/master -> refs/convert/source/heads/master
May 09 20:15:36 servo-vcs-sync.mozops.net linearize-git-to-hg[15488]: converting 1 Git commits

Confirmed that tests/wpt leaked in as part of the Git linearization phase because the diff from 011f3a35d1..6d7731e38c is massive.
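A leak like this can also be checked mechanically by comparing the paths a converted commit touches against the excluded directories. A minimal sketch, assuming a hypothetical `leaked_paths` helper (not part of vcssync), repo-relative paths with forward slashes, and made-up example file names:

```python
def leaked_paths(paths, exclude_dirs):
    """Return the paths that live under any excluded directory."""
    # Append "/" so "tests/wpt" does not match a sibling like "tests/wptfoo".
    prefixes = tuple(d.rstrip("/") + "/" for d in exclude_dirs)
    return [p for p in paths if p.startswith(prefixes)]


# Illustrative paths only; they are not taken from the real commit.
changed = ["components/style/lib.rs", "tests/wpt/metadata/MANIFEST.json"]
print(leaked_paths(changed, ["tests/wpt"]))
```

Any non-empty result means an excluded directory made it into the converted commit.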
So, I did a servo vcs sync deploy this morning. This is likely the first conversion post-deploy. If I had to take a stab in the dark, I'd say we regressed passing arguments to the git linearization function as part of recent refactoring I did.
Regression from https://hg.mozilla.org/hgcustom/version-control-tools/rev/86c6e05ed52c (bug 1357236). This apparently didn't get deployed until today due to yet another bug in the deployment mechanism.
Comment on attachment 8866193 [details]
vcssync: pass exclude_dirs to linearize_git_repo (bug 1363635);

https://reviewboard.mozilla.org/r/137836/#review140952

Per IRC discussion, please land this with a test.

Also, I can't help but notice that if this were a regular variable, the unused variable linter would have caught it. But since it is a command argument hidden in `args.exclude_dirs`, it wasn't.
Attachment #8866193 - Flags: review?(gps) → review+
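The failure mode described in this review can be sketched as follows. The function names and signatures are hypothetical stand-ins, not the real vcssync API; only the `exclude_dirs` name comes from the patch title. The point is that a parsed-but-unforwarded attribute on `args` is invisible to an unused-variable linter, so only a test catches it.

```python
import argparse


def linearize(commits, exclude_dirs=None):
    """Hypothetical stand-in: drop paths under excluded directories."""
    exclude = tuple(d.rstrip("/") + "/" for d in (exclude_dirs or []))
    return [[p for p in paths if not p.startswith(exclude)] for paths in commits]


def run_buggy(args, commits):
    # args.exclude_dirs was parsed but is never forwarded. Nothing flags
    # this: an unused attribute is not an unused local variable.
    return linearize(commits)


def run_fixed(args, commits):
    # The fix is a one-liner: pass the option through.
    return linearize(commits, exclude_dirs=args.exclude_dirs)


parser = argparse.ArgumentParser()
parser.add_argument("--exclude-dir", action="append", dest="exclude_dirs")
args = parser.parse_args(["--exclude-dir", "tests/wpt"])

# Illustrative commit content, not real data.
commits = [["servo/lib.rs", "tests/wpt/a.html"]]
print(run_buggy(args, commits))
print(run_fixed(args, commits))
```

A regression test that asserts the excluded path is absent from the output would have failed against the buggy wrapper.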
integration/autoland from d9acbaa99119 has been stripped. I'll be reapplying appropriate changesets momentarily. I plan to push things as myself in the same way they were pushed before. I may go back tomorrow and rewrite the pushlog to update author metadata to match the original pushlog.
So, the list of potential alerts that should have fired for this but didn't is staggering. This caused a major oddity in operational characteristics of multiple systems and AFAIK the only automated alert we saw was an email log from servo vcs sync with a weird "out of memory" failure. Notable things not sending actionable alerts with sufficiently low latency include:

* hg.mozilla.org slow push (10 minutes)
* hg.mozilla.org large push (adding thousands of files)
* hg.mozilla.org slow replication (I presume this took forever to replicate)
* autoland slow operations
* Firefox CI slower builds due to VCS overhead
* (there's probably a few more)
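As a rough illustration of the first two items, a push-level check could compare duration and file count against thresholds. This is only a sketch: the function, its parameters, and the threshold values are all assumptions for illustration and do not describe any existing hg.mozilla.org monitoring.

```python
def push_alerts(duration_seconds, files_added, max_duration=60, max_files=1000):
    """Return the alert names a single push would trigger.

    The default thresholds are made up for illustration.
    """
    alerts = []
    if duration_seconds > max_duration:
        alerts.append("slow push")
    if files_added > max_files:
        alerts.append("large push")
    return alerts


# A 10 minute push adding >100k files, as in this incident.
print(push_alerts(duration_seconds=600, files_added=100_000))
```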
OK. I've stripped integration/autoland and have grafted most stripped changesets back on to autoland. I also replayed one of KWierso's merges.

Changesets that didn't make it back are:

d9acbaa99119	servo: Merge #16784 - Bug 1349651 - stylo: Implement HasAuthorSpecifiedRules (from mbrubeck:has_author); r=bholley
35ec5509fce4	Bug 1349651 - stylo: Implement HasAuthorSpecifiedRules. r=bholley
b8dc0262c14b	servo: Merge #16782 - Fix the serialization of image-orientation property (from chenpighead:stylo-serialization-of-image-orientation); r=nox
802fb3480bff	Bug 1349651 - Update stylo test expectations
6fb937063a6b	Bug 1363295 - stylo: update test expectations for image-orientation property. r=heycam
a93e7dd251fe	servo: Merge #16786 - register stylo threads with the gecko profiler (from froydnj:gecko-profiler-bindings); r=upsuper
58beae2c80d7	Bug 1349651 - Further test expectation adjustments. r=me

Changesets coming from vcs sync *have* to be skipped.

Everything in bug 1349651 was related to the initial buggy conversion that broke everything, so it got caught up in the mess.

6fb937063a6b / bug 1363295 seemed like it could lead to badness, so I played it safe and didn't take it.

I'll leave needinfo's in bugs 1349651 and 1363295 to indicate a reland is necessary.
For posterity, the last good changeset before the strip of autoland is https://hg.mozilla.org/integration/autoland/rev/dba495fc3c06. You can see from the HTML how the pushlog id goes from 42321 to 42338 on its new child (219c5bfc40a2). Those pushes will forever be empty.
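The number of permanently empty pushes follows from the two ids, assuming pushlog ids are allocated consecutively:

```python
last_good_push_id = 42321   # push containing dba495fc3c06
first_new_push_id = 42338   # push containing its new child 219c5bfc40a2

# Every id strictly between the two belongs to a stripped push.
empty_push_ids = range(last_good_push_id + 1, first_new_push_id)
print(len(empty_push_ids))  # 16
```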
Comment on attachment 8866193 [details]
vcssync: pass exclude_dirs to linearize_git_repo (bug 1363635);

https://reviewboard.mozilla.org/r/137836/#review140958

::: vcssync/tests/test-linearize-git-to-hg-exclude-dirs.t:27
(Diff revision 2)
> +  1/1 ad3f6b56f7320d386c2ce2574b0573d1ad88773b ignore
> +  dropping ad3f6b56f7320d386c2ce2574b0573d1ad88773b because no tree changes
> +  0 commits from heads/master converted; original: ad3f6b56f7320d386c2ce2574b0573d1ad88773b; rewritten: aea30981234cf6848489e0ccf541fbf902b27aca
> +  all Git commits have already been converted; not doing anything
> +  $ cd grepo-dest
> +  $ hg co tip > /dev/null

Nit: hg -q checkout tip

Also, `hg files -r tip` is easier :)
If you pulled a "bad" changeset from autoland, you can permanently remove the bad changesets by running the following command:

  $ hg --config extensions.strip= strip -r 'd9acbaa99119::'

To the uninitiated, "--config extensions.strip=" enables the strip extension (just in case) and "-r d9acbaa99119::" tells it to strip changeset d9acbaa99119 and all of its descendants (should be 16 in total). That will likely take a few minutes because packaging up 190,000+ files is never fast.
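For anyone unfamiliar with the `X::` revset, it selects X plus everything descended from it. A toy sketch of that selection over a hypothetical changeset graph (the single-letter node names are invented; this is not how Mercurial implements it):

```python
def descendants(children, root):
    """Return root plus every changeset reachable from it, like 'root::'."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


# Toy DAG: A -> B -> C, B -> D, and an unrelated head E off A.
children = {"A": ["B", "E"], "B": ["C", "D"]}
print(sorted(descendants(children, "B")))
```

Stripping `B::` in this toy graph would remove B, C, and D while leaving A and E alone.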
Pushed by bjones@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/4c0dafbad7cf
vcssync: pass exclude_dirs to linearize_git_repo ; r=gps
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The autoland repo is reopened and autoland service is fully caught up landing to try and autoland.

The only remaining part now is fixing servo vcs sync.
> That will likely take a few minutes because packaging up 190,000+ files is never fast.

Add --no-backup and it should be faster.
(In reply to Mike Hommey [:glandium] from comment #15)
> > That will likely take a few minutes because packaging up 190,000+ files is never fast.
> 
> Add --no-backup and it should be faster.

I purposefully did not advise running a command that results in data loss. But, yes, that statement is factual.
Pushed by bjones@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/38b1229808e1
vcssync: fix nits in exclude_dirs test ; r=gps
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
3:26 PM <gps> we still need to strip the converted-servo-linear repo on hg.mo
3:27 PM <gps> and we may need to remove some changesets from the shamap of the hg repo on the vcs sync server
3:28 PM <gps> and we'll need to "rewind" refs/convert/dest/heads/master of the git repo from d9acbaa99119 to 011f3a35d1
3:28 PM <gps> and refs/convert/source/heads/master from <whatever it is at> to 121662aa57
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The changesets that need to be stripped from the converted-servo-linear repo are (newest to oldest):

15b4fc2dd9faadcc96748197af013a97ec6f8752 servo: Merge #16790 - Sync binding files with autoland (from upsuper:sync-bindings); r=heycam
dd9cab013bd5421cfb1f0bbce56f30b96155a181 servo: Merge #16786 - register stylo threads with the gecko profiler (from froydnj:gecko-profiler-bindings); r=upsuper
2aba3272503ee281d3fe866ece875bd4191cb92e servo: Merge #16782 - Fix the serialization of image-orientation property (from chenpighead:stylo-serialization-of-image-orientation); r=nox
5f0bb852f0fc9425e321a69f1a4fcacab1bc0ee3 servo: Merge #16784 - Bug 1349651 - stylo: Implement HasAuthorSpecifiedRules (from mbrubeck:has_author); r=bholley
Steps taken to undo this mess:

From v-c-t checkout:

  $ ./deploy hgmo-strip projects/converted-servo-linear 5f0bb852f0fc9425e321a69f1a4fcacab1bc0ee3 

From servo vcs sync server (as servo-sync user):

  $ hg -R ~/servo-linear --config extensions.strip= strip -r 5f0bb852f0fc9425e321a69f1a4fcacab1bc0ee3 

  <open editor for ~/servo-linear/.hg/shamap>
  <remove lines referencing SHA-1 from comment #19>
  <save file>

  $ hg -R ~/firefox-overlay update -C null
  $ hg -R ~/firefox-overlay --config extensions.purge= purge --all
  $ hg -R ~/firefox-overlay --config extensions.strip= strip -r 'not public()'

  $ cd ~/servo.git
  $ git update-ref -m 'rewind converted pointer to redo conversion' refs/convert/source/heads/master 121662aa57 e029a42653323
  $ git update-ref ... refs/convert/dest/heads/master 011f3a35d1 a9fd63d99f

The servo linearize service is back running. However, the autoland repo is closed and I want to be looking at logs when the overlay runs, so the timers and pulse-based triggering are not yet enabled.
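The manual shamap edit in the steps above could also be scripted. A minimal sketch, assuming the shamap is a text file with one `<source id> <destination id>` pair per line and that a line should be dropped if either id is in the bad set; the helper name and the placeholder SHA-1s are hypothetical:

```python
def drop_shamap_entries(lines, bad_shas):
    """Return the shamap lines that reference none of the bad SHA-1s."""
    bad = set(bad_shas)
    return [line for line in lines if not bad.intersection(line.split())]


# Placeholder ids only; substitute the full 40-character SHA-1s to remove.
good_src, good_dst = "1" * 40, "2" * 40
bad_src, bad_dst = "3" * 40, "4" * 40
shamap = [f"{good_src} {good_dst}\n", f"{bad_src} {bad_dst}\n"]

kept = drop_shamap_entries(shamap, [bad_dst])
print(len(kept))
```

Writing the result to a new file and keeping the original as a backup avoids losing the mapping if the wrong ids are passed in.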
I also needed to strip the bad changesets from ~/firefox-overlay/.hg/hg.mozilla.org__projects__converted-servo-linear (I forgot we had a clone of the repo inside the overlay repo). Alternatively, I could have just nuked the repo because it is a glorified cache.
Autoland repo has been reopened and I started the overlay service. It "just works." So I also enabled all the other systemd units. So we're back to fully functional.

There's still a bit of cleanup I need to do. Keeping the bug open to track that.
I manually ran some SQL to restore the old pushlog users for changesets that I grafted and re-pushed after stripping.

At this point, everything is back to the way it was.

We'll likely need some follow-up work to improve the monitoring/alerting situation. But that's for another bug.
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED