Closed Bug 1322769 Opened 4 years ago Closed 4 years ago
Vendor servo into mozilla-central
272.13 KB, text/plain
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
59 bytes, text/x-review-board-request
Stylo needs large parts of https://github.com/servo/servo vendored into mozilla-central. This bug tracks doing that.
Please let us get the license checker in before we decide to vendor a billion packages.
Depends on: 1316990
I'm also open to vendoring servo without the rust dependencies as a first milestone.
Also, we're not going to vendor the WPT metadata files during the initial landing. This is because there are thousands of them and we don't want to bloat repo size, at least initially.
(In reply to Gregory Szorc [:gps] from comment #2) > I'm also open to vendoring servo without the rust dependencies as a first > milestone. That would work for me, but where do servo's dependencies come from when we do builds, then?
(In reply to Nathan Froyd [:froydnj] from comment #4) > (In reply to Gregory Szorc [:gps] from comment #2) > > I'm also open to vendoring servo without the rust dependencies as a first > > milestone. > > That would work for me, but where do servo's dependencies come from when we > do builds, then? We have the incubator/stylo repo serving as a stop-gap until those dependencies are vendored. Also, we'll only vendor the dependencies required by style (which is a manageable list) rather than all of servo's dependencies (which has something like 300 dependencies).
Rather than attempt to submit this behemoth to MozReview or Splinter, please just have a look at the commit at https://hg.mozilla.org/users/gszorc_mozilla.com/firefox/raw-rev/db0004d9a749 (warning: large file, may cause browser to hang). Review should consist of reading the commit message than granting r+ if you agree. I'm flagging a bunch of people for review to ensure there is general consensus on this approach.
Does that commit contain anything other than servo/ being populated?
I assume that we decided not to import/replay the entire history of servo (ignoring wpt) due to size constraints. Is there any other disadvantage? Is it worth considering importing servo history just up until when we started working on stylo, so that we have history in-tree for all of the stylo work under servo/ up until now?
The commit message has one "z" in lizard, whereas the CVS/mercurial messages quoted have two "z"s. Intentional?
(In reply to Manish Goregaokar [:manishearth] from comment #7) > Does that commit contain anything other than servo/ being populated? No. Will follow-up with e.g. .hgignore changes once this commit is approved. (In reply to Cameron McCormack (:heycam) from comment #8) > I assume that we decided not to import/replay the entire history of servo > (ignoring wpt) due to size constraints. Is there any other disadvantage? Good question. I realized I forgot to comment about this as I was going to bed but didn't want to reopen the laptop to write it :) Somewhere - I can't remember where - I'm pretty sure we decided the full Servo history wasn't that useful to us because: * Size (although not a huge issue w/o WPT - unless there are other surprises in the history) * Creates a massive bisect "trap" for people bisecting Gecko * Would introduce tons of merges and contaminate history of m-c * Wouldn't be that useful to Firefox/Gecko developers anyway > Is it worth considering importing servo history just up until when we > started working on stylo, so that we have history in-tree for all of the > stylo work under servo/ up until now? This, however, is a more compelling argument since that history in mozilla-central could help Firefox/Gecko developers. It's definitely something to think about. Overall, I'm sure someone, sometime will regret not vendoring the full history. If we play the "long game," it becomes really tempting to ignore the immediate disadvantages and just do it. But the body of evidence today says not to.
(In reply to Cameron McCormack (:heycam) from comment #8) > I assume that we decided not to import/replay the entire history of servo > (ignoring wpt) due to size constraints. Is there any other disadvantage? > Is it worth considering importing servo history just up until when we > started working on stylo, so that we have history in-tree for all of the > stylo work under servo/ up until now? Just to clarify, the history of m-c would look something like the following under this plan: m-c tip before this bug lands | land servo/servo in its checkout state before stylo commits start getting added (bulk import) | individual commits to servo/servo | [m-c after this bug lands] | m-c and servo/servo commits interspersed as necessary Is that accurate? In thinking about it, the reason I could see for importing servo/servo stylo-era history is so you could do bisection...but importing it in a bisectable way seems somewhat difficult, i.e. rewriting history, not to mention figuring out how to get a relatively buildable tree for bisection. Are there other reasons that you had in mind?
(In reply to Gregory Szorc [:gps] from comment #10) > * Size (although not a huge issue w/o WPT - unless there are other surprises > in the history) > * Creates a massive bisect "trap" for people bisecting Gecko This does seem like an issue. Could we solve it by adding all the servo stuff in a separate head and then joining it in a single merge commit? > * Would introduce tons of merges and contaminate history of m-c Can't we just linearize these on per-push granularity like we were planning to do with the autolander anyway? > * Wouldn't be that useful to Firefox/Gecko developers anyway Hard to say what's useful until someone needs it. I do run into this reasonably often. Can certainly live without it, but would be good to quantify the codesize impact and see if there's anything we can do about the bisect trap.
Also, if figuring out how to vendor the whole history would significantly slip the date at which we can replace incubator with m-c, I would be against it.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #12) > (In reply to Gregory Szorc [:gps] from comment #10) > > * Size (although not a huge issue w/o WPT - unless there are other surprises > > in the history) > > * Creates a massive bisect "trap" for people bisecting Gecko > > This does seem like an issue. Could we solve it by adding all the servo > stuff in a separate head and then joining it in a single merge commit? We could check in this code into a new "root" and then merge it with mozilla-central, yes. I think that makes the bisect issues go away. > > > * Would introduce tons of merges and contaminate history of m-c > > Can't we just linearize these on per-push granularity like we were planning > to do with the autolander anyway? Yes. > > * Wouldn't be that useful to Firefox/Gecko developers anyway > > Hard to say what's useful until someone needs it. > > I do run into this reasonably often. Can certainly live without it, but > would be good to quantify the codesize impact and see if there's anything we > can do about the bisect trap. I can measure the full history easily enough. Please tell me the earliest Servo commit related to Stylo so I can also measure just the commits after. I'll try to run a mock conversion tomorrow (Thursday).
The earliest commit that I made to the style system is https://github.com/servo/servo/commit/75ec093334ff8f8f7ef41b90007588b924c40731 That's sort of arbitrary though, since there was plenty interesting style system work before that. Another interesting measurement point might be https://github.com/servo/servo/commit/c6ab60dbfc6da7b4f800c9e40893c8b58413960c , which is when cargo landed and everything in the tree shuffled around, and is therefore the oldest commit in the style system that a non-rename-tracking blame would find. Again, plenty of interesting work before that though.
Mercurial blame follows renames automatically if they are recorded as such. Conversion of repos from Git to Mercurial can record file move annotations. hg-git can find copies (as opposed to renames), including fuzzy matching where a minimum percentage match of a file can be used to detect copies. This operation is very expensive to run on large repos (like mozilla-central), so I may not include it in initial estimates.
A conversion of Servo with full Git history minus tests/wpt weighs in ~46MB in bundle form, ~71MB in on-disk file size (not including inode overhead). That will get smaller if all the merge commits are filtered out. This isn't as bad as I thought. I'm definitely open to importing the full history.
That is, IMO, worth it. We should be careful to design the system to be robust against somebody renaming wpt directory on the servo side such that it accidentally gets imported. Maybe m-c should require a magic string to accept landings that add more than X bytes or inodes to the repo?
https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear contains the servo repo history without merges. The conversion is essentially the first parent of every commit from master. This repo is ~38MB in bundle form, which is very reasonable. This repo is by no means the final product: I'm throwing it out there to see if people like the approach of taking this set of commits. An alternate commit selection method would be to attempt to rebase the p2 branch of every merge commit. This wouldn't work in all circumstances because there could be merge conflicts on rebase. In many cases, they could be resolved automatically by looking at the post-merge code state. But if the same file lines are being touched multiple times in a branch, this approach won't always work. Writing the code to perform this conversion and attempt intelligent conflict resolution would be a bit consuming. Taking the p1 ancestry, by contrast, essentially "just works." The servo-linear repo is also lacking copy/rename metadata. This is somewhat time consuming to compute, so I'll hold off doing it until later. If we take the p1 ancestry, we may also want to rewrite commit messages to include details on the "dropped" commits. Anything is possible here: I just need to know what to do. We could even encode the "dropped" commits as machine-readable so e.g. hgweb could hyperlink to the original GitHub pull requests / commits.
> without merges I like this approach, and hope we use it for the autoland sync stuff. While we try to require that individual commits in a PR pass tests, that's not always the case, so really treating the auto merge commit as a single unit makes the most sense especially in the context of stuff like bisection.
Comment on attachment 8818484 [details] vendor servo I think putting the code in a top-level servo dir is totally sensible. One thing I noticed skimming the raw-rev that will be slightly interesting is that servo has a Cargo.lock file committed but we will use our own since we're using the code as a library crate. I don't think this will actually be a problem, but it is likely that we will not wind up using the exact same versions of crates in Servo and Gecko builds. (They should match by the semver rules.)
Attachment #8818484 - Flags: review?(ted) → review+
Comment on attachment 8818484 [details] vendor servo I have nothing to complain about with the approach. I approve of the commit message, but you may want to correct the spelling of "lizard" in your commit message explanation paragraph--at least hg@1 has "lizard". Include an exclamation mark, though, and show some emotion! :) I have no objection to including servo history; I have a preference for doing it without merge commits if possible.
Attachment #8818484 - Flags: review?(nfroyd) → review+
Comment on attachment 8818484 [details] vendor servo The approach we've discussed sgtm.
Attachment #8818484 - Flags: review?(bobbyholley) → review+
The current tentative conversion approach is: * Convert the p1 only history of servo from git to hg (no merge commits) * Filter out all files related to WPT * Import the history (~8000 commits) into mozilla-central To be decided: * Add a new root to mozilla-central (will likely require some minor hook work on the server to allow it) * Commit message reformatting details Converting only the p1 history will require some upstream changes to Mercurial to make `hg convert` happy. We can run an unreleased/patched version or a custom extension in the interim. Working on these improvements is currently my #1 priority.
I think I mentioned elsewhere, but we probably want a size / num-file hook to prevent the autolander from accidentally landing all the WPT tests if somebody renames the directory.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #25) > I think I mentioned elsewhere, but we probably want a size / num-file hook > to prevent the autolander from accidentally landing all the WPT tests if > somebody renames the directory. Good idea. Also, today's update is I'm still in the weeds coercing the git -> hg conversion to squash merges while retaining copy/rename metadata. The existing conversion tools make a number of assumptions that merges stay as merges. So when parents are dropped, it is failing to find copy sources and failing to record copy metadata. I may just give up and do a 2 phase conversion where I rewrite history of the Git repo to squash merges then convert that to Mercurial.
I abandoned the approach of patching `hg convert` and have moved on to a 2 phase conversion because I got tired of doing down a rabbit hole. I didn't want to do this because it means writing what is essentially Mozilla-custom code as opposed to a feature in Mercurial (that could be maintained by the Mercurial project) and it is inherently a more complex approach. But here we are. I've got code for the 2 phase approach mostly working, including tests. It lives in my version-control-tools user repo. I'm traveling Thursday and may be PTO Friday until Tuesday or Wednesday. So this may be the final update I give for a few days. I'm pretty encouraged about where I am. If I squint hard enough, I can also see how I can apply this approach to have mozilla-central track various upstream Git[Hub] repositories in a way that is compatible with our existing VCS practices.
Awesome - thanks for pushing on this!
I spent today working out a number of minor blocking issues with the conversion, such as being able to properly record metadata about the source commit in Mercurial. This required some minor changes to `hg convert`, which I'm in the process of upstreaming. The biggest remaining issue is accuracy of copy detection. Mercurial has explicit copy/rename metadata. Git relies on run-time detection. So, when converting from Git to Mercurial, you need to write the copy/rename metadata at conversion time. Both rename and copy detection are prone to false positives and false negatives. There are flaws with both explicit metadata tracking and purely algorithm/heuristic based detection, so neither Git nor Mercurial are perfect when it comes to tracking renames and copies. Anyway, when we do the conversion, we'll need to figure out a good threshold for copy detection. By default, Git says that if 50% of the lines in a file are similar, the file is copied. Preliminary testing shows this can be way too aggressive. For example, a number of files with only boilerplate are detected as copies when they obviously have no relationship. On the flip side, the copy detection does appear to find some valid copy scenarios. I'll likely post lists of copies/renames at various thresholds and see if we can hone in on a good copy detection threshold. I'll try to get sample repos with lists of file copies posted next week. I'm PTO Friday, Monday and Tuesday are holidays, and I may be PTO on Wednesday.
In general I think we care much more about tracking renames than copies, and renames are easier to detect because the original file went away. If we limit the scope to renames, do things get easier?
Yes, rename detection is much simpler. It effectively "just works." We can also avoid false positives with copy detection by setting the copy similarity threshold high (default is 50%) and e.g. filtering out files with certain traits (paths, file size, etc). I also anticipate Mercurial will one day gain run-time copy detection. So the need for storing the correct metadata up front isn't too important. It's mainly an issue for people doing archeology. Even though mozilla-central isn't the canonical repo for Servo, I've learned to never under-estimate the wants of code archeologists, so I prefer to do things right.
https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear2/ is up. It has proper rename and copy detection at a 75% similarity threshold. This file contains a list of all recorded copies, grouped by changeset in DAG order, oldest to newest.
Bikeshedding time! 1) Do we want to rewrite the commit message summary line? Messages like "Auto merge of #14668 - glennw:update-wr-dwrote, r=jdm" aren't exactly useful in the context of mozilla-central. I propose prefixing "servo: " on the summary line to indicate this is a vendoring of a Servo commit. 2) Do we want to include a list of "squashed" Git commit SHA-1s in the commit message? If so, what's the format? Since we're currently only taking the p1 ancestry, that means the commits in the p2 parents are "lost." In some situations, I could see archeologists wanting to quickly see the list of commits (and possibly their message summary lines) in the "squashed merge" commit message. Note that in bug 1325782, I plan to have hg.mozilla.org link to github.com for these commits. If we link to the PR, that would expose the commits. Is that good enough? 3) Do we want to do anything about the Reviewable blob in commit messages? Currently, recent commit messages have a multi-line blob like: <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14701) <!-- Reviewable:end --> I'm not keen on GitHub flavored markdown appearing in the commit message. We could just as easily rewrite that to just the https://reviewable.io/... URL and hgweb will linkify it automatically. 4) What about the PR template? e.g. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #14688 (github issue number if applicable). <!-- Either: --> - [ ] There are tests for these changes OR - [x] These changes do not require tests because it only affects documentation. <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> I'm also not too keen on this, especially the comments. But some of it could be valuable. 5) For bug 1325782, we'll likely include the source repository and commit SHA-1 in the commit message (so hgweb can linkify appropriately - read that bug). I'm tentatively going to put "Source-Repo: https://..." and "Source-Revision: abcd..." (or similar) lines at the bottom of commit messages. Any thoughts? 6) Are there any other commit or commit message rewrites that others can think of?
I implemented rewriting of the Reviewable markdown boilerplate with e.g. "Reviewable-URL: https://reviewable.io/reviews/servo/servo/14701". It only took a few minutes of work and looks much, much nicer. I also have the source repository and revision recorded in the commit message. That enables a lot of useful hyperlinking from looking at the commit message alone.
I was getting tired of ~2 hour conversion times to test out changes. So, I rewrote the old code relying on `git filter-branch` to Python using the Dulwich package for performing low-level Git object manipulation. As a result, the "linearizing" part of history rewriting now takes ~10s instead of ~2 hours and the overall conversion now takes ~4 minutes (most of that time is in `hg convert` calling `git` commands to obtain data for the conversion - it would likely be a magnitude faster if we used hg-git or git-cinnabar for the conversion. For the curious, the performance of the Git index when there are 100k+ files is *really* bad. `git filter-branch`'s repeated population of the index in order to make an --index-filter usable was consuming tons of CPU time. Bypassing the index and manipulating tree objects via Dulwich is so much faster.
Do we need the tests/ref directory in the imported Servo history in mozilla-central?
Latest conversion is up at https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear3.
> Do we need the tests/ref directory in the imported Servo history in mozilla-central? Not really. I don't think any of these tests are currently even run by Servo. They used to be, but I don't think that's particularly important for this history import. Couple of issues with the history: - #1234 should link to the Servo pull request, not a bugzilla bug. Embedding the entire link is fine. - As we discussed in person, making the commit message use the PR message (first line of extended commit message) with the r=foo appended would be nicer. This may get a bit tricky to deal with because we also have some manual github merges on servo/servo, from times when bors/homu stopped working. An example would be at https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear3/shortlog/9082bff8cbfe . In this case, the merge author is the reviewer. This isn't really important though so it's fine as-is as well. - The extended commit message could also use a list of commit messages of the squashed P2 commits as well, though that's not really necessary.
Bug 1325782 is for hgweb to linkify the PR properly. I'll drop tests/ref from the next conversion.
Dropping tests/ref (and its previous home at src/test/ref) reduced the bundle size (on wire size) from 36,390,265 to 30,759,315 and tracked file from 5,694 to 4,492. Of these remaining tracked files, 2,347 have "components" in the path (either /src/components or /components), and 1,195 have "test." Here is the kilobyte count for various directories: 12590 .hg/store/data/components 9908 .hg/store/data/src 4680 .hg/store/data/tests 3204 .hg/store/data/ports 3054 .hg/store/data/resources 4807 .hg/store/data/components/script 3110 .hg/store/data/components/style 1423 .hg/store/data/components/layout 525 .hg/store/data/components/gfx 496 .hg/store/data/components/net 408 .hg/store/data/components/compositing 377 .hg/store/data/components/servo 297 .hg/store/data/components/util 118 .hg/store/data/components/devtools 118 .hg/store/data/components/canvas 6532 .hg/store/data/src/components 1816 .hg/store/data/src/servo 976 .hg/store/data/src/test 296 .hg/store/data/src/servo-gfx 116 .hg/store/data/src/servo-gfx-2 28 .hg/store/data/src/etc 27 .hg/store/data/src/servo-net 15 .hg/store/data/src/platform 14 .hg/store/data/src/support 12 .hg/store/data/src/servo-util 2132 .hg/store/data/tests/unit 1773 .hg/store/data/tests/content 692 .hg/store/data/tests/html 21 .hg/store/data/tests/compiletest 17 .hg/store/data/tests/reftest.rs.i 15 .hg/store/data/tests/heartbeats 9 .hg/store/data/tests/power 9 .hg/store/data/tests/jquery 6 .hg/store/data/tests/dromaeo 6 .hg/store/data/tests/contenttest.rs.i 1129 .hg/store/data/ports/android 951 .hg/store/data/ports/geckolib 734 .hg/store/data/ports/cef 240 .hg/store/data/ports/gonk 72 .hg/store/data/ports/glutin 48 .hg/store/data/ports/servo 21 .hg/store/data/ports/glfw 9 .hg/store/data/ports/stable-rust We could chase the long tail and remove a few megabytes here or there. Honestly, I'm pretty happy with the current storage requirements. I fear we would regret dropping too much history down the road. IMO we should only drop stuff if it will cause scaling problems (WPT) or has little to no value in mozilla-central (IMO files deleted from the head of the Servo repo qualify).
(In reply to Gregory Szorc [:gps] from comment #33) > 1) Do we want to rewrite the commit message summary line? Messages like > "Auto merge of #14668 - glennw:update-wr-dwrote, r=jdm" aren't exactly > useful in the context of mozilla-central. I propose prefixing "servo: " on > the summary line to indicate this is a vendoring of a Servo commit. GitHub linkifies PR references in issue comments and commit messages using notation like `servo/servo#14828` to identify GitHub account, repo, and issue #. You could change the imported commit messages from "servo: Auto merge of #14828" to something using GitHub-like syntax like "servo: Auto merge of servo/servo#14828" and linkify `servo/servo#14828` to https://github.com/servo/servo/pull/14828.
(In reply to Gregory Szorc [:gps] from comment #34) > I implemented rewriting of the Reviewable markdown boilerplate with e.g. > "Reviewable-URL: https://reviewable.io/reviews/servo/servo/14701". It only > took a few minutes of work and looks much, much nicer. I think this is fine, but I suspect blocking markdown in general will be on the wrong side of history. Not that I expect those reviewable.io urls to work in 10 years... (In reply to :gps email https://groups.google.com/forum/#!topic/mozilla.dev.platform/ygd2EoILDD8) > * Remove everything related to submodules I worry about this, since it's the only information about what version of some 3rd-party software should be compatible with a particular revision. Maybe we could keep in in-tree `.gitmodules` files to save having to look in the original git repo?
(In reply to Chris Peterson (out until January 2) [:cpeterson] from comment #41) > GitHub linkifies PR references in issue comments and commit messages using > notation like `servo/servo#14828` to identify GitHub account, repo, and > issue #. Yes, we should expand the auto hyperlinking on hg.mozilla.org to cover this pattern for cases where a commit message references other projects. > You could change the imported commit messages from "servo: Auto > merge of #14828" to something using GitHub-like syntax like "servo: Auto > merge of servo/servo#14828" and linkify `servo/servo#14828` to > https://github.com/servo/servo/pull/14828. That's a good idea, but this won't be necessary. The commit message metadata contains a "Source-Repository" annotation. And the commit message hyperlinking code I just wrote is GitHub aware: if it sees "Source-Repository: https://github.com/..." it will automatically link "#\d+" to https://github.com/account/repo/issues/\d+ (which will HTTP redirect to a "pull" URL if it is a pull request). FWIW, I worry about rewriting the summary line too much because it is common for people to copy and paste that to search for a commit. (Today's GitHub news is even reevant: https://github.com/blog/2299-search-commit-messages.) Speaking of hyperlinking commit messages, I could use extra eyes on the code I posted in bug 1325782 to make sure the GitHub linking is sane.
Forgot to post here earlier when it was deployed, but hg.mozilla.org is now hyperlinking to GitHub when appropriate. See it in action at https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear4 and https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear4/rev/7d5ae7de6a89 The "bug" line on the /rev/ URLs isn't linking properly. Bug 1328725 tracks fixing that (it is low priority).
https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-unified-servo is a clone of mozilla-unified with the history of servo merged into mozilla-central. The Servo history is on a new root changeset. There is a special server-side config on this repo to allow pushing new roots (normally that would be rejected by a hook). So attempting to pull and push this repo elsewhere on hg.mozilla.org will currently fail.
This is really exciting - thanks for working on this Greg!
Attachment #8818484 - Attachment is obsolete: true
I'm intending for glob to review this code when he gets back from holiday on Monday. I don't think code review is strictly required to perform the initial import of Servo into mozilla-central, however. glandium: given the subject matter, you may be interested in a drive-by. I'd certainly appreciate any review you can give. But I'm not expecting it of you, hence the needinfo and not a stronger flag.
I only have two comments: - You could have created a git fast-import stream instead of using dulwich instead of filter-branch. I bet it's even possible to create such a stream directly with git log --format. - I find it sad that the conversion uses hg-convert, when we already use hg-git for gecko-dev (although I'd prefer it switched to cinnabar).
(In reply to Mike Hommey [:glandium] from comment #51) > I only have two comments: > - You could have created a git fast-import stream instead of using dulwich > instead of filter-branch. I bet it's even possible to create such a stream > directly with git log --format. Yes, `git fast-import` is the best way to create thousands of objects. However, I don't think it's worth more than an inline comment noting the future optimization at this time given the good-enough performance of the existing code. FWIW the existing code does a conversion of the master branch of gecko-dev in ~4 minutes. Although >50% of that is spent in `git gc` to pack the hundreds of thousands loose objects. `git fast-import` would remove the need for `git gc`, of course. > - I find it sad that the conversion uses hg-convert, when we already use > hg-git for gecko-dev (although I'd prefer it switched to cinnabar). I /could/ have used cinnabar (or hg-git - but I would have chosen cinnabar given those 2 choices). However, I decided to go with `hg convert` because we're only doing unidirectional conversion and cinnabar - being designed for bi-direction conversion - felt "excessive" for the task at hand. Given a choice between a tool built into Mercurial and that does the job adequately versus a separate (albeit more powerful) tool, I chose simplicity. I will happily use cinnabar when the requirements call for its feature set.
I was doing my due diligence and noticed that the Servo repo used to have an "aux.rs" file. This is problematic because you can't create files with that name on Windows. I tracked this down to being fixed in https://github.com/servo/servo/issues/1226. This shouldn't pose a problem unless a) you are using Mercurial <1.1 b) your repo was cloned with Mercurial <1.1. Since Mercurial 1.1 was released in December 2008, I'm optimistic this won't be a problem. If it does hit someone, then we've done them a favor by forcing them to upgrade to a modern Mercurial storage format.
Manish and I reviewed the latest conversion today and came up with a few minor changes: * The various Bors authors will all be normalized to the same author for consistency * Reviewable-URL annotation will be removed completely (it provides little value and can be derived automatically later) * The summary line of Bors/GitHub PR merge commits will be spliced together from the first 2 lines. Bors/GitHub currently adds an autogenerated summary line e.g. "Auto merge of #14984 - servo:needsRooting, r=jdm" and pushes the original summary line to line 2. We want to move the original summary line back into the summary line because that is more valuable than "Merge #...".
Comment on attachment 8824307 [details] vcssync: scaffolding for version control syncing project (bug 1322769); https://reviewboard.mozilla.org/r/102826/#review106924
Attachment #8824307 - Flags: review?(glob) → review+
For my own sanity so I don't have to keep finding this in my shell history across multiple machines, the current command being used to convert the repo is: linearize-git-to-hg --exclude-dir src/test/wpt --exclude-dir src/test/ref --exclude-dir tests/ref --exclude-dir tests/wpt --summary-prefix servo: --remove-reviewable --source-repo-key Source-Repo --source-revision-key Source-Revision --normalize-github-merge-message --committer-action use-author --hg ~/bin/hg --copy-similarity 75 --find-copies-harder --skip-submodules https://github.com/servo/servo master ~/tmp/hgtmp/servo-git-source ~/tmp/hgtmp/servo-hg-dest
I locally have commit message summary line rewriting working like is wanted. However, a large number of commits (those with the "auto merge of #" syntax) do not appear to include the PR title in the commit message! In fact, many of these commits only have a single line in the commit message. e.g. "auto merge of #5388 : Ms2ger/servo/net-tests, r=jdm". The modern style of "Auto merge of #" does include the PR title on a subsequent line in the commit message as does the GitHub default message format (which is used in very old commits before Bors was introduced). I suppose we could query the GitHub API to extract the PR title. That's unfortunate. But considering the PR title is missing from ~2200 commits spanning multiple years, it is probably worth doing. Yay scope bloat.
Comment on attachment 8824308 [details] vcssync: support for linearizing a Git repo (bug 1322769); https://reviewboard.mozilla.org/r/102828/#review107762 this looks sane to me. ::: vcssync/mozvcssync/gitrewrite/__init__.py:169 (Diff revisions 4 - 5) > > + if remove_reviewable: > + message = RE_REVIEWABLE.sub(b'', message) > + > + if reviewable_key: > + message = RE_REVIEWABLE.sub(b'%s: \\g<url>' % reviewable_key, message) reviewable_key should be re.escape'd
Attachment #8824308 - Flags: review?(glob) → review+
https://hg.mozilla.org/users/gszorc_mozilla.com/servo-linear6 is my latest conversion. GitHub PR titles are now fetched from the GitHub API and added to the summary line. Also, the p2 author is used on merge commits. I manually looked stepped through ~25% of merge commits and verified that the p2 author almost always aligns with the GitHub user that created the PR. So I think it is safe to rewrite the author on these commits.
Comment on attachment 8824308 [details] vcssync: support for linearizing a Git repo (bug 1322769); https://reviewboard.mozilla.org/r/102828/#review107762 > reviewable_key should be re.escape'd Unfortunately, it isn't that easy since a key like `Reviewable-URL` will be escaped to `Reviewable\-URL`. I'll come up with a craftier solution.
Comment on attachment 8824308 [details] vcssync: support for linearizing a Git repo (bug 1322769); https://reviewboard.mozilla.org/r/102828/#review108146 ::: vcssync/mozvcssync/cli.py:48 (Diff revision 6) > + help='Use the author if the 2nd parent of ' > + 'merge commits')), s/if the 2nd parent/of the 2nd parent/
something weird is going on here .. https://reviewboard.mozilla.org/r/102830/diff/5-6/ shows you adding --use-p2-author, however this change is missing from https://reviewboard.mozilla.org/r/102830/diff/6 and the corresponding raw diff at https://reviewboard-hg.mozilla.org/version-control-tools/raw-rev/3b7f0b2bd26c gps - can you push an update to r/102830 please (there's the same typo to fix).
(In reply to Byron Jones ‹:glob› from comment #75) > something weird is going on here .. > > https://reviewboard.mozilla.org/r/102830/diff/5-6/ shows you adding > --use-p2-author, however this change is missing from > https://reviewboard.mozilla.org/r/102830/diff/6 and the corresponding raw > diff at > https://reviewboard-hg.mozilla.org/version-control-tools/raw-rev/3b7f0b2bd26c > > gps - can you push an update to r/102830 please (there's the same typo to > fix). Interdiffs show all other changes to the file. So this interdiff for commit 3 was picking up changes made in later versions of commit 2. Interdiffing is hard :/
Comment on attachment 8824309 [details] vcssync: support for converting a linearized Git repo to Mercurial (bug 1322769); https://reviewboard.mozilla.org/r/102830/#review108818
Attachment #8824309 - Flags: review?(glob) → review+
Will land the initial series momentarily. I have a few follow-ups. Will submit those in this bug as well.
Pushed by email@example.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/530195e755f4 vcssync: scaffolding for version control syncing project ; r=glob https://hg.mozilla.org/hgcustom/version-control-tools/rev/de7ba86773e9 vcssync: support for linearizing a Git repo ; r=glob https://hg.mozilla.org/hgcustom/version-control-tools/rev/92cb8e06a435 vcssync: support for converting a linearized Git repo to Mercurial ; r=glob
Comment on attachment 8831326 [details] vcssync: log number of commits converted; https://reviewboard.mozilla.org/r/107880/#review109546
Attachment #8831326 - Flags: review?(glob) → review+
Comment on attachment 8831327 [details] vcssync: drop commits without tree changes when linearizing (bug 1322769); https://reviewboard.mozilla.org/r/107882/#review109602
Attachment #8831327 - Flags: review?(glob) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/hgcustom/version-control-tools/rev/2c89c07900d6 vcssync: drop commits without tree changes when linearizing ; r=glob
(In reply to Ralph Giles (:rillian) | needinfo me from comment #42) > (In reply to Gregory Szorc [:gps] from comment #34) > > * Remove everything related to submodules > > I worry about this, since it's the only information about what version of > some 3rd-party software should be compatible with a particular revision. > Maybe we could keep in in-tree `.gitmodules` files to save having to look in > the original git repo? I'm going to call scope creep on this. The existing git->hg conversion tool either handles submodules or it doesn't. We'd need to implement new functionality to preserve some bits. That's a bit of effort. Since the Git submodules only exist on old commits (from 2015) and the original history is in Git, I think excluding submodule history from the Mercurial conversion is tolerable and isn't worth the time to preserve. If only there were infinite time...
Pushed by email@example.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/e0c5a75dfaa6 ansible/hg-ssh: allow Servo's root changeset to be pushed to Firefox repos
Comment on attachment 8833109 [details] Bug 1322769 - Add a "servo" feature; https://reviewboard.mozilla.org/r/109330/#review110440
Attachment #8833109 - Flags: review?(manishearth) → review+
Comment on attachment 8833113 [details] Bug 1322769 - Have eslint ignore servo/; https://reviewboard.mozilla.org/r/109332/#review110452
Attachment #8833113 - Flags: review?(manishearth) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/268fa5f3bc25 Add a "servo" feature; r=manishearth https://hg.mozilla.org/integration/autoland/rev/216081fe8791 Have eslint ignore servo/; r=manishearth
Comment on attachment 8833116 [details] Bug 1322769 - Declare servo rust feature properly; https://reviewboard.mozilla.org/r/109340/#review110456
Attachment #8833116 - Flags: review?(manishearth) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/0a3070e1d23c Declare servo rust feature properly; r=manishearth
Comment on attachment 8833416 [details] Bug 1322769 - Move dummy geckolib to toolkit/library, change taskgraph detection; https://reviewboard.mozilla.org/r/109638/#review110714 You'll want to make sure to ni? Cameron (or whoever's doing the stylo merges) for a heads-up on this.
Attachment #8833416 - Flags: review?(nfroyd) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/f49a72bc7aed Move dummy geckolib to toolkit/library, change taskgraph detection; r=froydnj
Just vendored servo into autoland. https://hg.mozilla.org/integration/autoland/rev/be030db91f007f59dcbd4b51033caee8b2251a8f This changeset likely won't merge around well. I may have to disable hooks on the server when that happens. Ping me on IRC.
You need to log in before you can comment on or make changes to this bug.