need to explicitly add vendored "*.egg-info" to simplify `.hgignore`
Categories
(Developer Infrastructure :: Mach Vendor & Updatebot, enhancement)
Tracking
(firefox127 fixed)
People
(Reporter: pierre-yves.david, Assigned: pierre-yves.david)
References
(Regressed 1 open bug)
Details
Attachments
(3 files)
Expected results:
The ".hgignore" of the firefox repository uses negative lookahead regular expressions for two of its patterns.
Using such patterns prevents the use of modern regular expression engines, slowing down all operations that use status (status, addremove, merge, update, commit, histedit, …).
At that point you may wonder: "doesn't fsmonitor speed up status anyway? Why do we care?" In short, the performance gains still apply with fsmonitor, for a couple of reasons.
First, fsmonitor still needs to process the hgignore when the working copy is subject to a lot of modifications. In a case we tested, the working copy state limits the fsmonitor gain (2.031 seconds without fsmonitor vs 1.501 seconds with fsmonitor), and using a better regex engine provides a significant speedup with fsmonitor (from 1.501 to 1.051 seconds).
Second, fsmonitor only speeds things up when it works, and there are many cases where it does not. For example, if too many files changed, it might not have picked up fs events fast enough and has to restart from scratch, or the maximum number of inotify watches may have been exceeded. In these cases status can be significantly slower than without fsmonitor; we observed status times of up to 4 seconds in these cases in our benchmarks. Talking about benchmarks, we had issues with fsmonitor out of the box in most cases and had to adjust things to make it work.
That second point is quite important: without these lookahead patterns, it is possible to use a better implementation of status written in Rust. That implementation provides performance similar to the one offered by fsmonitor in its best cases, without the associated unreliability.
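To make the constraint concrete: engines such as re2 do not implement lookaround assertions, so a single such pattern is enough to rule them out. Below is a minimal sketch of how one might spot the offending patterns, assuming the ignore patterns are available as plain strings. The helper name `find_lookaround` and the sample patterns are hypothetical and are not taken from the real `.hgignore`.

```python
import re

# Matches the opening of the four lookaround constructs:
#   (?=...)  (?!...)  (?<=...)  (?<!...)
# Simplification: an escaped parenthesis such as r"\(?=" would be
# reported as a false positive.
LOOKAROUND = re.compile(r"\(\?<?[=!]")


def find_lookaround(patterns):
    """Return the patterns that contain a lookahead or a lookbehind."""
    return [p for p in patterns if LOOKAROUND.search(p)]


# Hypothetical sample patterns, for illustration only.
sample = [r"\.pyc$", r"^(?!keep/).*\.tmp$", r"^build/"]
print(find_lookaround(sample))
```

A check like this only flags candidates; each flagged pattern still has to be rewritten by hand.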
So what kind of performance improvement are we talking about, and how do we get it? I put long benchmark results in the commit descriptions of the patches addressing this, but here is a quick look at the key results of a series of benchmarks I ran:
Status for a clean working copy of Mozilla Unified from March 2024 on an 8-core machine.
1.674 seconds Python-code + stdlib "re" module
1.430 seconds Python-code + "re2" module from "google-re2"
0.272 seconds fsmonitor + python-code (best case, worst case is ~4 seconds)
0.278 seconds fsmonitor + python-code + re2
0.359 seconds Rust-code with "dirstate-v1" format
0.259 seconds Rust-code with "dirstate-v2" format
0.235 seconds pure-Rust-executable with "dirstate-v1" format
0.052 seconds pure-Rust-executable with "dirstate-v2" format
Status for a very dirty working copy of Mozilla Unified from Feb 2022 on an 8-core machine.
1.501 seconds fsmonitor + python-code (best case, worst case is ~4 seconds)
2.031 seconds Python-code + stdlib "re" module
1.560 seconds Python-code + "re2" module from "google-re2"
1.051 seconds fsmonitor + python-code + re2
0.519 seconds Rust-code with "dirstate-v1" format
0.483 seconds Rust-code with "dirstate-v2" format
0.234 seconds pure-Rust-executable with "dirstate-v1" format
0.124 seconds pure-Rust-executable with "dirstate-v2" format
Here we see that:
- Being able to use the more modern re2 engine provides a significant performance improvement for code paths that cannot rely on a trivial fsmonitor response.
- The Rust code path in Mercurial (available on Linux and macOS) provides a performance benefit similar to fsmonitor in the simple cases and significantly outperforms it in the more complex cases. (In addition, these performance gains will be more stable and reliable than fsmonitor.)
- The performance of the pure Rust executable exceeds the one provided by fsmonitor, through a mix of eliminating the Python setup overhead and using efficient algorithms (and format). The comparison of "Rust-code" vs "pure-Rust-executable" gives us an idea of that Python overhead. It is also useful for understanding the performance benefit to expect from operations that make many status calls, like histedit.
So what do we need to do?
To gain access to these speedups, we "just" need to remove the two negative lookahead patterns.
The first one aims at ignoring vscode directories except for the one at the root of the repository, and is simple to rewrite in another way.
The second is trickier, as it ignores Python packaging ".egg-info" directories except in third-party directories. We could use an approach similar to the first pattern, explicitly listing the directories where they should be ignored. However, an alternative is to have ".egg-info" directories ignored globally but have the vendoring tool override such ignores by explicitly adding files within such directories while vendoring. I implemented that approach.
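As an illustration of why the first pattern is simple to rewrite, here is a small sketch comparing a lookahead-based pattern with a lookahead-free one. Both patterns are hypothetical stand-ins, not the actual `.hgignore` entries, and the sketch assumes patterns are applied as an unanchored search over repository-relative paths. The file names are made up for the example.

```python
import re

# Hypothetical pattern using a negative lookahead: ignore any ".vscode/"
# directory unless the path starts with the root-level one.
WITH_LOOKAHEAD = re.compile(r"^(?!\.vscode/).*\.vscode/")

# Hypothetical lookahead-free rewrite: requiring a "/" before ".vscode/"
# means a root-level ".vscode/" can never match.
WITHOUT_LOOKAHEAD = re.compile(r"/\.vscode/")


def is_ignored(pattern, path):
    """Unanchored search over a repository-relative path."""
    return pattern.search(path) is not None


paths = [
    ".vscode/settings.json",                      # root level: kept
    "remote/test/puppeteer/.vscode/launch.json",  # nested: ignored
    "browser/app.js",                             # unrelated: kept
]
for path in paths:
    print(path,
          is_ignored(WITH_LOOKAHEAD, path),
          is_ignored(WITHOUT_LOOKAHEAD, path))
```

The two patterns agree on these paths but are not strictly equivalent: a directory literally named `x.vscode/` would match only the lookahead variant. Whether such edge cases matter depends on the repository.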
Comment 1 • 7 months ago (Assignee)
This lookahead expression prevents the use of more modern and efficient regexp
engines. This slows down "hg status" and other operations.
Since the exceptions are only about vendored content whose addition is managed by
a script (mach vendor), that script can deal with these exceptions by itself,
and it does since the last changeset.
So we drop the exception to unlock various performance improvements for status.
Why does this improve things?
The improvements can come from different sources:
- Using the "re2" regexp engine to match ignored files and directories provides
  a performance boost for vanilla Mercurial installations and fsmonitor ones in
  various cases. To benefit from it, just install the "google-re2" package and
  Mercurial will automatically use it.
- Installing a Mercurial compiled with the Rust extensions unlocks the use of a
  more efficient code path for status that performs the necessary actions in a
  smarter and parallel way, providing a significant boost. These extensions
  are available on Linux and macOS, and some distributions have started to enable
  them by default.
- Moving to a more modern "dirstate" format. The dirstate tracks the state of
  the working copy. For a couple of years, Mercurial has had a new format for this
  information that is more efficient to read and update and tracks finer-grained
  information. This allows substantial improvements in the way we run
  status. The Rust extensions are required to use this format efficiently.
- Using a pure Rust executable. Mercurial has a pure Rust version (called
  "rhg") that can handle a limited set of commands. It runs without the
  overhead of starting and initializing Python, providing another very
  significant boost to performance, but obviously requires the Rust code path
  to be usable.
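For the first item, a quick way to see whether the "re2" module is visible to a given Python interpreter is sketched below. This only checks importability; it assumes the "google-re2" package exposes a top-level module named `re2` (as the benchmark labels suggest) and says nothing about whether Mercurial actually ends up using it. It should be run with the same interpreter that runs Mercurial.

```python
import importlib.util


def re2_available():
    """Report whether a top-level module named "re2" can be imported."""
    return importlib.util.find_spec("re2") is not None


print("re2 importable:", re2_available())
```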
Quick Conclusion of the Benchmarks
(Putting that first for people who just want a quick read.)
- fsmonitor struggles on working copies with many modifications.
- Using the "re2" binding from "google-re2" helps, especially for these cases.
- On a typical Mozilla developer machine, the Rust variants match the fsmonitor
  performance at worst and exceed it in multiple cases. In particular, they do not
  struggle with the "many modifications" case.
- On smaller machines, the Rust variants still provide a solid and reliable
  performance win across all operations. That makes them preferable to fsmonitor.
- The Rust variants match "git status" performance on equivalent workloads.
  The pure Rust version significantly outperforms it.
Benchmarks descriptions
Machines
We ran benchmarks on two different machines:
- An i7-7700K, 4 physical / 4 logical cores, released in Jan 2017,
  to see performance in a "low" parallelism case.
- An i9-9900K, 8 physical / 16 logical cores, released in October 2018,
  to see performance in a "high" parallelism case.
In both cases the repositories lived in a btrfs file system backed by solid
state disks (ssd or nvme) and the machines had enough ram to keep caches in
memory.
I also ran benchmarks on a more modern i7-1370P, released in Jan 2023, and the
results were consistent with the i9-9900K ones.
Variants
Benchmarks were run with multiple variants of Mercurial:
- python-re:
  - no Rust extensions used,
  - regex engine is the std-lib "re" module,
  - fsmonitor is disabled,
  - using the dirstate-v1 format.
- python-re2:
  - no Rust extensions used,
  - regex engine is the "re2" module from "google-re2",
  - fsmonitor is disabled,
  - using the dirstate-v1 format.
- fsmonitor-re:
  - no Rust extensions used,
  - regex engine is the std-lib "re" module,
  - fsmonitor is enabled and working at its best,
  - using the dirstate-v1 format.
- fsmonitor-re2:
  - no Rust extensions used,
  - regex engine is the "re2" module from "google-re2",
  - fsmonitor is enabled and working at its best,
  - using the dirstate-v1 format.
- rust-ds1:
  - Rust extensions are used,
  - regex engine from the Rust "regex" crate,
  - fsmonitor is disabled,
  - using the dirstate-v1 format.
- rust-ds2:
  - Rust extensions are used,
  - regex engine from the Rust "regex" crate,
  - fsmonitor is disabled,
  - using the dirstate-v2 format.
- rhg-ds1:
  - pure Rust executable is used,
  - regex engine from the Rust "regex" crate,
  - fsmonitor is disabled,
  - using the dirstate-v1 format.
- rhg-ds2:
  - pure Rust executable is used,
  - regex engine from the Rust "regex" crate,
  - fsmonitor is disabled,
  - using the dirstate-v2 format.
Commands
We ran two kinds of operations:
- `hg status` with the default output.
  This command needs to search for ignored and unknown files.
  In this case, improving the regex engine usually provides a significant performance gain.
- `hg status --modified --added --removed --deleted`.
  This command only needs to check the state of tracked files.
  In this case, improving the regex engine does not have much effect, but it
  is interesting to compare the performance of the various implementations.
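For readers who want to reproduce this kind of measurement, here is a minimal timing sketch. It is not the harness used for the numbers in this report. The helper name `best_time` and the choice of keeping the fastest of several runs are assumptions made for illustration.

```python
import subprocess
import sys
import time


def best_time(argv, runs=3, cwd=None):
    """Run argv several times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# The two invocations described above.
STATUS_DEFAULT = ["hg", "status"]
STATUS_TRACKED_ONLY = ["hg", "status",
                       "--modified", "--added", "--removed", "--deleted"]

if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; swap in one of the
    # argument lists above and set cwd to a working copy to time status.
    print(f"{best_time([sys.executable, '-c', 'pass']):.3f} s")
```

Taking the minimum filters out noise from other processes; averaging or reporting the spread would be equally reasonable choices.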
Working copies
Case 1: pristine-928b0540e421
Working copy parent is 928b0540e421
* 341 759 tracked files
* 21 253 directories
* no untracked files
Case 2: pristine-8f96f8c756ae
Working copy parent is 8f96f8c756ae
(an older changeset I had a dirty working copy for)
* 246 855 tracked files
* 15 047 directories
* no untracked files
Case 3: clean-8f96f8c756ae
Working copy parent is 8f96f8c756ae
* 246 855 tracked files
* 23 540 directories
* 79 901 ignored files
Case 4: dirty-8f96f8c756ae
Working copy parent is 8f96f8c756ae
* 246 855 tracked files
* 33 720 directories
* 244 386 clean files
* 1 065 modified files
* 247 added files
* 1 040 removed files
* 364 missing files
* 63 455 unknown files
* 79 915 ignored files
Results Analysis
(full, raw numbers after this section)
About fsmonitor
Before diving into the improvements related to the regex engine, we can note that
the benchmarks show that fsmonitor provides a good boost in the pristine/clean cases, and
a noticeable but disappointing improvement in the very dirty case.
python-re fsmonitor-re
pristine-928b0540e421: 1.884 → 0.293 (-85%)
dirty-8f96f8c756ae: 2.157 → 1.440 (-33%)
Surprisingly, when only listing tracked files (during commit for example), fsmonitor actually
becomes counterproductive in the very dirty case:
pristine-928b0540e421: 1.313 → 0.297 (-77%)
dirty-8f96f8c756ae: 0.993 → 1.272 (+28%)
In addition to being disappointing in the very dirty case, the performance
with fsmonitor collapses when fsmonitor cannot use its cache. I observed 4
seconds execution time while setting up the benchmark.
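The percentages shown in parentheses throughout this analysis are relative changes with respect to the baseline timing. A small sketch of that arithmetic, using the dirty-8f96f8c756ae timings quoted above (the function name is only illustrative):

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


# Timings in seconds for the dirty-8f96f8c756ae case (python-re vs fsmonitor-re).
print(f"{relative_change(2.157, 1.440):+.0f}%")  # default status
print(f"{relative_change(0.993, 1.272):+.0f}%")  # tracked files only
```

A negative value means the variant is faster than the baseline; a positive one means it is slower.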
Improvement without involving Rust:
Using the re2 binding from the google-re2 package provides a small improvement
to plain Python execution (about 12% to 20%). This case is relevant because this is
the one that will be used when fsmonitor cannot help or start.
python-re python-re2
pristine-928b0540e421: 1.884 → 1.650 (-12%)
dirty-8f96f8c756ae: 2.157 → 1.718 (-20%)
It does not make a difference when only listing tracked files as the hgignore is not involved.
python-re python-re2
pristine-928b0540e421: 1.313 → 1.332
dirty-8f96f8c756ae: 0.993 → 0.998
However, surprisingly, it helps fsmonitor quite a lot in the dirty case
(dirty-8f96f8c756ae), bringing fsmonitor performance in line with the plain
Python one.
fsmonitor-re fsmonitor-re2
list-unknown 1.440 → 1.012 (-30%)
tracked only 1.272 → 0.840 (-34%)
So, to conclude, being able to use the "re2" regex engine saves up to ⅓ of the
runtime of some operations and never slows things down. So that's a good win.
Improvement involving Rust variants:
For the pristine-928b0540e421 case (all tracked files clean, no ignored files),
Rust provides a speed boost "equivalent" (or better) to the one from fsmonitor.
The precise comparison depends on the parallelism level.
With the 4 physical / 4 logical core machine, the Python+Rust version is slower
than fsmonitor, with dirstate-v2 helping to close some of the gap with
fsmonitor. Using dirstate-v2 also allows the "rhg" version to become twice
as fast as the fsmonitor version. Also keep in mind that even when a bit
slower, the performance of the Rust version will be much more stable than
fsmonitor.
python-re2: 1.650
fsmonitor-re2: 0.296 (-82%)
rust-ds1: 0.542 (-67%)
rust-ds2: 0.368 (-77%)
rhg-ds1: 0.401 (-75%)
rhg-ds2: 0.132 (-92%)
With the 8 physical / 16 logical core machine, the Rust variants catch up with
fsmonitor performance much quicker. The dirstate-v1 version is a little slower, but the
dirstate-v2 version is already faster. The pure Rust executable is always faster.
python-re2: 1.430
fsmonitor-re2: 0.278 (-80%)
rust-ds1: 0.359 (-74%)
rust-ds2: 0.259 (-81%)
rhg-ds1: 0.235 (-83%)
rhg-ds2: 0.052 (-96%)
Talking about parallelism: we see that the code scales well. Doubling the
number of cores brings about twice the performance, which is great.
pristine-928b0540e421 4/4 8/16
rhg-ds1: 0.401 → 0.235 (× 1.70)
rhg-ds2: 0.132 → 0.052 (× 2.54)
clean-8f96f8c756ae
rhg-ds1: 0.286 → 0.169 (× 1.70)
rhg-ds2: 0.101 → 0.040 (× 2.52)
dirty-8f96f8c756ae
rhg-ds1: 0.380 → 0.234 (x 1.62)
rhg-ds2: 0.232 → 0.124 (x 1.87)
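The "×" factors in the table above are simply the ratio of the two timings. A short sketch of that computation for the pristine-928b0540e421 rows (the function name is illustrative, and results are left unrounded):

```python
def speedup(slow_seconds, fast_seconds):
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


# pristine-928b0540e421, 4/4 cores vs 8/16 cores (timings in seconds).
for variant, t_4_cores, t_8_cores in [("rhg-ds1", 0.401, 0.235),
                                      ("rhg-ds2", 0.132, 0.052)]:
    print(f"{variant}: x {speedup(t_4_cores, t_8_cores):.2f}")
```

A factor close to 2 when going from 4 to 8 physical cores indicates near-linear scaling; note that the two machines also differ in CPU generation, so the factor is not purely a core-count effect.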
Comparing with git performance on the pristine-928b0540e421 case also yields
great results. Surprisingly, the variants with a Python overhead still beat (or
match) git performance in this case. The pure Rust executable is always
significantly faster. Below is a comparison grouped by comparable formats.
git status -s: 0.554 (without untracked cache)
rust-ds1: 0.359 (- 35%)
rhg-ds1: 0.235 (- 57%)
git status -s: 0.232 (with untracked cache)
rust-ds2: 0.259 (+ 11%)
rhg-ds2: 0.052 (- 77%)
The clean-8f96f8c756ae case (all tracked files clean, many ignored files) shows
results similar to pristine-928b0540e421. "Low" parallelism gives good gains
without fully matching the fsmonitor performance. "High" parallelism provides
similar performance. In both cases we gain the benefit of more stable
performance.
(cores) 4/4 8/16
python-re2: 1.282 | 1.119
fsmonitor-re2: 0.243 (-81%) | 0.225 (-80%)
rust-ds1: 0.416 (-68%) | 0.282 (-75%)
rust-ds2: 0.303 (-76%) | 0.222 (-80%)
rhg-ds1: 0.286 (-78%) | 0.169 (-85%)
rhg-ds2: 0.101 (-92%) | 0.040 (-96%)
Things change quite a lot in the dirty-8f96f8c756ae case, where fsmonitor
struggled. The Rust variants still provide great speedups, significantly
beating the fsmonitor variants on both machines (comparing to fsmonitor-re
this time).
(cores) 4/4 8/16
fsmonitor-re: 1.440 | 1.501
fsmonitor-re2: 1.012 (-30%) | 1.051 (-30%)
rust-ds1: 0.624 (-56%) | 0.519 (-65%)
rust-ds2: 0.553 (-62%) | 0.483 (-68%)
rhg-ds1: 0.380 (-73%) | 0.234 (-84%)
rhg-ds2: 0.232 (-83%) | 0.124 (-91%)
This is confirmed in the "listing tracked only" version of the dirty-8f96f8c756ae
case, where fsmonitor was not really improving the situation compared to Python.
(cores) 4/4 8/16
python-re: 0.993 | 0.843
python-re2: 0.998 | 0.843
fsmonitor-re: 1.272 (+28%) | 1.291 (+53%)
fsmonitor-re2: 0.840 (-15%) | 0.844
rust-ds1: 0.364 (-63%) | 0.273 (-68%)
rust-ds2: 0.301 (-70%) | 0.233 (-72%)
rhg-ds1: 0.231 (-77%) | 0.153 (-82%)
rhg-ds2: 0.099 (-90%) | 0.040 (-95%)
Full benchmark numbers for hg status
Here are the exhaustive numbers, all times in seconds.
Case 1: pristine-928b0540e421
(4/4 cores i7-7700K Jan 2017)
python-re: 1.884
python-re2: 1.650
fsmonitor-re: 0.293 (about 4 seconds when confused)
fsmonitor-re2: 0.296
rust-ds1: 0.542
rust-ds2: 0.368
rhg-ds1: 0.401
rhg-ds2: 0.132
(8/16 cores i9-9900K CPU October 2018)
python-re: 1.674
python-re2: 1.430
fsmonitor-re: 0.272
fsmonitor-re2: 0.278
rust-ds1: 0.359
rust-ds2: 0.259
rhg-ds1: 0.235
rhg-ds2: 0.052
For reference, I also gathered timing for `git status` on this machine and repo
git status -s: 0.554 (without untracked cache)
git status -s: 0.232 (with untracked cache)
Case 2: pristine-8f96f8c756ae
(4/4 cores i7-7700K)
python-re: 1.306
python-re2: 1.227
fsmonitor-re: 0.243
fsmonitor-re2: 0.242
rust-ds1: 0.416
rust-ds2: 0.308
rhg-ds1: 0.287
rhg-ds2: 0.102
(8/16 cores i9-9900K CPU)
python-re: 1.131
python-re2: 1.076
fsmonitor-re: 0.222
fsmonitor-re2: 0.222
rust-ds1: 0.279
rust-ds2: 0.222
rhg-ds1: 0.168
rhg-ds2: 0.038
Case 3: clean-8f96f8c756ae
(4/4 cores i7-7700K)
python-re: 1.294
python-re2: 1.282
fsmonitor-re: 0.241
fsmonitor-re2: 0.243
rust-ds1: 0.416
rust-ds2: 0.303
rhg-ds1: 0.286
rhg-ds2: 0.101
(8/16 cores i9-9900K CPU)
python-re: 1.170
python-re2: 1.119
fsmonitor-re: 0.224
fsmonitor-re2: 0.225
rust-ds1: 0.282
rust-ds2: 0.222
rhg-ds1: 0.169
rhg-ds2: 0.040
Case 4: dirty-8f96f8c756ae
(4/4 cores i7-7700K)
python-re: 2.157
python-re2: 1.718
fsmonitor-re: 1.440
fsmonitor-re2: 1.012
rust-ds1: 0.624
rust-ds2: 0.553
rhg-ds1: 0.380
rhg-ds2: 0.232
(8/16 cores i9-9900K CPU)
python-re: 2.031
python-re2: 1.560
fsmonitor-re: 1.501
fsmonitor-re2: 1.051
rust-ds1: 0.519
rust-ds2: 0.483
rhg-ds1: 0.234
rhg-ds2: 0.124
Benchmark numbers for hg status --modified --added --removed --deleted
With this invocation, status no longer needs to list directory content (or use
a cache to skip that step). Status just needs to check the known list of tracked
files.
Case 1: pristine-928b0540e421
(4/4 cores i7-7700K CPU)
python-re: 1.313
python-re2: 1.332
fsmonitor-re: 0.297
fsmonitor-re2: 0.296
rust-ds1: 0.455
rust-ds2: 0.369
rhg-ds1: 0.316
rhg-ds2: 0.130
(8/16 cores i9-9900K CPU)
python-re: 1.129
python-re2: 1.133
fsmonitor-re: 0.273
fsmonitor-re2: 0.271
rust-ds1: 0.330
rust-ds2: 0.244
rhg-ds1: 0.207
rhg-ds2: 0.050
For reference, I also gathered timing for `git status` on this machine and repo
git status -s --untracked-files=no: 0.110
Case 2: pristine-8f96f8c756ae
(4/4 cores i7-7700K)
python-re: 0.993
python-re2: 0.987
fsmonitor-re: 0.241
fsmonitor-re2: 0.243
rust-ds1: 0.358
rust-ds2: 0.307
rhg-ds1: 0.228
rhg-ds2: 0.100
(8/16 cores i9-9900K CPU)
python-re: 0.856
python-re2: 0.839
fsmonitor-re: 0.221
fsmonitor-re2: 0.222
rust-ds1: 0.262
rust-ds2: 0.221
rhg-ds1: 0.152
rhg-ds2: 0.038
Case 3: clean-8f96f8c756ae
(4/4 cores i7-7700K)
python-re: 0.973
python-re2: 0.979
fsmonitor-re: 0.242
fsmonitor-re2: 0.242
rust-ds1: 0.357
rust-ds2: 0.304
rhg-ds1: 0.224
rhg-ds2: 0.098
(8/16 cores i9-9900K CPU)
python-re: 0.838
python-re2: 0.837
fsmonitor-re: 0.222
fsmonitor-re2: 0.221
rust-ds1: 0.263
rust-ds2: 0.219
rhg-ds1: 0.152
rhg-ds2: 0.037
Case 4: dirty-8f96f8c756ae
(4/4 cores i7-7700K)
python-re: 0.993
python-re2: 0.998
fsmonitor-re: 1.272
fsmonitor-re2: 0.840
rust-ds1: 0.364
rust-ds2: 0.301
rhg-ds1: 0.231
rhg-ds2: 0.099
(8/16 cores i9-9900K CPU)
python-re: 0.843
python-re2: 0.843
fsmonitor-re: 1.291
fsmonitor-re2: 0.844
rust-ds1: 0.273
rust-ds2: 0.233
rhg-ds1: 0.153
rhg-ds2: 0.040
Comment 2 • 7 months ago (Assignee)
This lookahead prevents the use of more modern and efficient regexp engines,
slowing down status.
In practice we only have two vscode directories tracked in Mercurial:
- the root one, that sees active development,
- the one in "remote/test/puppeteer/" that was never touched since its addition.
It is easy to not match the root in the hgignore, but harder for the other one.
Please note that once a file is tracked by Mercurial, the fact it is ignored or
not no longer matters, so in practice this will only affect "future" additions.
However, the history shows that such additions are extremely rare (one in over 15
years) and that the only occurrence is some vendoring, where the vscode file
seems less important.
So dropping this exception seems fine: the small inconvenience of having to
manually add the file in a hypothetical future is negligible compared to the
concrete performance improvement of common operations for everyone.
See the other changeset dropping the second lookahead pattern for performance
numbers.
Comment 3 • 7 months ago (Assignee)
We are about to change the hgignore to remove the lookahead expression. This
means the "*.egg-info/" directories will be ignored everywhere again. To
prevent this from causing issues with the vendoring logic, we add a new call to
add-remove explicitly listing files in ".egg-info" directories to override the
ignore pattern.
Once tracked, the ignore pattern will no longer affect these files and all will
be good.
Check the next commit for more information on the motivation.
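The idea can be sketched as follows. This is not the actual vendoring code: the function name `egg_info_files` and the sample paths are hypothetical, and the sketch only shows how the files to list explicitly could be selected from a set of vendored paths.

```python
from pathlib import PurePosixPath


def egg_info_files(paths):
    """Return the paths that live inside a '*.egg-info' directory."""
    selected = []
    for raw in paths:
        # Only look at the parent directories, not the file name itself.
        parents = PurePosixPath(raw).parts[:-1]
        if any(part.endswith(".egg-info") for part in parents):
            selected.append(raw)
    return selected


# Hypothetical vendored files, for illustration only.
vendored = [
    "third_party/python/foo/foo.egg-info/PKG-INFO",
    "third_party/python/foo/foo/__init__.py",
    "third_party/python/foo/foo.egg-info/top_level.txt",
]
print(egg_info_files(vendored))
```

The resulting list is what would be passed explicitly to the version control command, so that the global ignore rule does not hide these files.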
Comment 4 • 7 months ago (Assignee)
Related changes:
- https://phabricator.services.mozilla.com/D208967
- https://phabricator.services.mozilla.com/D208968
- https://phabricator.services.mozilla.com/D208966
I was not able to test D208968, as the code does not seem to be covered by ./mach test python/mozbuild/mozbuild/test
Comment 6 • 7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/4952395ba0ec
https://hg.mozilla.org/mozilla-central/rev/7f0f90bea3c3
https://hg.mozilla.org/mozilla-central/rev/51489f639f64