Closed Bug 1894160 Opened 7 months ago Closed 7 months ago

need to explicitly add vendored "*.egg-info" to simplify `.hgignore`

Categories

(Developer Infrastructure :: Mach Vendor & Updatebot, enhancement)

Tracking

(firefox127 fixed)

RESOLVED FIXED
127 Branch
Tracking Status
firefox127 --- fixed

People

(Reporter: pierre-yves.david, Assigned: pierre-yves.david)

References

(Regressed 1 open bug)

Details

Attachments

(3 files)

Expected results:

The ".hgignore" of the firefox repository uses negative lookahead regular expressions for two of its patterns.

Using such patterns prevents the use of a modern regular expression engine, slowing down all operations that use status (status, addremove, merge, update, commit, histedit, …).
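As an illustration of the constraint, here is a minimal Python sketch that flags regular expressions using lookaround constructs, which RE2-style engines do not support. The helper name `uses_lookaround` and the sample patterns are made up for this example; they are not the actual `.hgignore` entries.

```python
import re

# Lookaround group openers: (?=  (?!  (?<=  (?<!
# RE2-style engines reject these constructs; Python's stdlib "re" accepts them.
_LOOKAROUND = re.compile(r"\(\?(?:=|!|<=|<!)")


def uses_lookaround(pattern: str) -> bool:
    """Return True if the pattern text contains a lookahead/lookbehind opener.

    This is a textual check only: it ignores escaping and character classes,
    so treat it as a rough heuristic, not a regex parser.
    """
    return _LOOKAROUND.search(pattern) is not None


# Hypothetical patterns, for illustration only.
samples = [
    r"^(?!third_party/).*\.egg-info/",  # negative lookahead
    r"\.pyc$",                          # plain pattern
]
flagged = [p for p in samples if uses_lookaround(p)]
print(flagged)
```

A check like this could run over every `regexp` line of an ignore file to spot the entries that force a fallback to the slower engine.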

At that point you may wonder: "Doesn't fsmonitor speed up status anyway? Why do we care?" In short, the performance gains still apply with fsmonitor, for a couple of reasons.

First, fsmonitor still needs to process the hgignore when the working copy is subject to a lot of modifications. In a case we tested, the working copy state limits the fsmonitor gain (2.031 seconds without fsmonitor vs 1.501 seconds with fsmonitor), and using a better regex engine provides a significant speedup with fsmonitor (from 1.501 to 1.051 seconds).

Second, fsmonitor only speeds things up when it works, and there are many cases where it does not. For example, if too many files changed, it might not have picked up fs events fast enough and has to restart from scratch, or the maximum number of inotify watches may have been exceeded. In these cases, status can be significantly slower than without fsmonitor: we observed up to 4 seconds of status time in these cases in our benchmarks. Talking about benchmarks, we had issues with fsmonitor out of the box in most cases and had to adjust things to make it work.

That second point is quite important: without these lookahead patterns, it is possible to use a better implementation of status written in Rust. That implementation provides performance similar to fsmonitor's best cases, without the associated unreliability.

So what kind of performance improvement are we talking about, and how do we get it? I put long benchmark results in the commit descriptions of the patches addressing this, but here is a quick look at the key results of a series of benchmarks I ran:

Status for a clean working copy of Mozilla Unified from March 2024 on an 8-core machine.

    1.674 seconds Python-code + stdlib "re" module
    1.430 seconds Python-code + "re2" module from "google-re2"
    0.272 seconds fsmonitor + python-code (best case, worst case is ~4 seconds)
    0.278 seconds fsmonitor + python-code + re2
    0.359 seconds Rust-code with "dirstate-v1" format
    0.259 seconds Rust-code with "dirstate-v2" format
    0.235 seconds pure-Rust-executable with "dirstate-v1" format
    0.052 seconds pure-Rust-executable with "dirstate-v2" format

Status for a very dirty working copy of Mozilla Unified from Feb 2022 on an 8-core machine.

    1.501 seconds fsmonitor + python-code (best case, worst case is ~4 seconds)
    2.031 seconds Python-code + stdlib "re" module
    1.560 seconds Python-code + "re2" module from "google-re2"
    1.051 seconds fsmonitor + python-code + re2
    0.519 seconds Rust-code with "dirstate-v1" format
    0.483 seconds Rust-code with "dirstate-v2" format
    0.234 seconds pure-Rust-executable with "dirstate-v1" format
    0.124 seconds pure-Rust-executable with "dirstate-v2" format

Here we see that:

  • Being able to use the more modern re2 engine provides a significant performance improvement for code paths that cannot rely on a trivial fsmonitor answer.

  • The Rust code path in Mercurial (available on Linux and macOS) provides a performance benefit similar to fsmonitor in the simple cases and significantly outperforms it in the more complex cases. (In addition, these performance gains will be more stable and reliable than fsmonitor.)

  • The performance of the pure Rust executable exceeds the one provided by fsmonitor, through a mix of eliminating the Python setup overhead and using efficient algorithms (and format). The comparison of "Rust-code" vs "pure-Rust-executable" gives us an idea of that Python overhead. It is also useful for understanding the performance benefit to expect from operations that make many status calls, like histedit.

So what do we need to do?

To gain access to these speedups, we "just" need to remove the two negative lookahead patterns.

The first one aims at ignoring vscode directories except for the one at the root of the repository, and is simple to rewrite in another way.
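To make the idea concrete, here is a small Python sketch comparing a lookahead-based pattern with a lookahead-free one on a few sample paths. Both patterns and all paths are hypothetical stand-ins, not the actual `.hgignore` entries, and the sketch assumes patterns are applied unanchored, as with `re.search()`.

```python
import re

# Both patterns are hypothetical stand-ins, not the real .hgignore entries.
# Assumption: ignore patterns are applied unanchored, like re.search().
WITH_LOOKAHEAD = re.compile(r"^(?!\.vscode/).*\.vscode/")
# Requiring a "/" before ".vscode/" excludes the root directory without
# any lookahead, because the root path has no parent component.
WITHOUT_LOOKAHEAD = re.compile(r"/\.vscode/")

paths = [
    ".vscode/settings.json",              # root: must stay visible
    "remote/test/.vscode/settings.json",  # nested: ignored
    "browser/app/.vscode/launch.json",    # nested: ignored
    "docs/vscode/readme.md",              # unrelated name: not ignored
]

for path in paths:
    old = WITH_LOOKAHEAD.search(path) is not None
    new = WITHOUT_LOOKAHEAD.search(path) is not None
    print(f"{path}: lookahead={old} plain={new}")
```

The two patterns agree on these samples only; they would differ for a `.vscode` directory nested inside the root `.vscode` one, so a real rewrite needs to be checked against the repository's actual content.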

The second is trickier, as it ignores Python packaging ".egg-info" directories except in third-party directories. We could use an approach similar to the first pattern, explicitly listing the directories where they should be ignored. However, an alternative is to have ".egg-info" directories ignored globally and have the vendoring tool override such ignores by explicitly adding the files within such directories while vendoring. I implemented that approach.

This lookahead expression prevents the use of a more modern and efficient regexp
engine. This slows down "hg status" and other operations.

Since the exceptions only concern vendored content whose addition is managed by
a script (mach vendor), that script can deal with this exception by itself,
and it does since the last changeset.

So we drop the exception to unlock various performance improvements for status.

Why does this improve things?

The improvements can come from different sources:

  • Using the "re2" regexp engine to match ignored files and directories provides
    a performance boost for vanilla Mercurial installations and fsmonitor ones in
    various cases. To benefit from it, just install the "google-re2" package and
    Mercurial will automatically use it.

  • Installing a Mercurial compiled with the Rust extensions unlocks the use of a
    more efficient code path for status that performs the necessary actions in a
    smarter and parallel way, providing a significant boost. These extensions
    are available on Linux and macOS, and some distributions have started to enable
    them by default.

  • Moving to a more modern "dirstate" format. The dirstate tracks the state of
    the working copy. For a couple of years, Mercurial has had a new format for this
    information that is more efficient to read and update and tracks finer-grained
    information. This allows substantial improvements in the way we run
    status. The Rust extensions are required to use this format efficiently.

  • Using a pure-Rust executable. Mercurial has a pure Rust version (called
    "rhg") that can handle a limited set of commands. It runs without the
    overhead of starting and initializing Python, providing another very
    significant boost to performance… but obviously requires the Rust code path
    to be usable.
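For the first point, a quick way to see whether the "re2" module is importable is sketched below. This only checks the Python interpreter running the snippet, so it assumes you run it with the same interpreter Mercurial uses; the function name `re2_available` is made up for this example.

```python
import importlib.util


def re2_available() -> bool:
    """Return True if a module named "re2" can be found by this interpreter."""
    return importlib.util.find_spec("re2") is not None


engine = "re2" if re2_available() else "stdlib re"
print(f"regex engine that could be used: {engine}")
```

Finding the module does not prove that a given pattern compiles with it: patterns using lookahead would still be rejected by re2.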

Quick Conclusion of the Benchmarks

(Putting that first for people who just want a quick read.)

  • fsmonitor struggles on working copies with many modifications.

  • Using the "re2" binding from "google-re2" helps, especially in these cases.

  • On a typical Mozilla developer machine, the Rust variants match the fsmonitor
    performance at worst and exceed it in multiple cases. In particular, they do not
    struggle with the "many modifications" case.

  • On smaller machines, the Rust variants still provide a solid and reliable
    performance win across all operations. That makes them preferable to fsmonitor.

  • The Rust variants match "git status" performance on equivalent workloads.
    The pure Rust version significantly outperforms it.

Benchmarks descriptions

Machines

We ran benchmarks on two different machines:

  • An i7-7700K, 4 physical / 4 logical cores, released in Jan 2017

    To see performance in a "low" parallelism case.

  • An i9-9900K, 8 physical / 16 logical cores, released in October 2018

    To see performance in a "high" parallelism case.

In both cases the repositories lived in a btrfs file system backed by solid
state disks (ssd or nvme) and the machines had enough ram to keep caches in
memory.

I also ran benchmarks on a more modern i7-1370P released in Jan 2023, and the
results were consistent with the i9-9900K ones.

Variants

Benchmarks were run with multiple variants of Mercurial:

  • python-re:
    • no Rust extensions used,
    • regex engine is the std-lib "re" module.
    • fsmonitor is disabled
    • using the dirstate-v1 format
  • python-re2:
    • no Rust extensions used,
    • regex engine is the "re2" module from "google-re2".
    • fsmonitor is disabled
    • using the dirstate-v1 format
  • fsmonitor-re:
    • no Rust extensions used,
    • regex engine is the std-lib "re" module.
    • fsmonitor is enabled and working at its best
    • using the dirstate-v1 format
  • fsmonitor-re2:
    • no Rust extensions used,
    • regex engine is the "re2" module from "google-re2".
    • fsmonitor is enabled and working at its best
    • using the dirstate-v1 format
  • rust-ds1:
    • Rust extensions are used,
    • regex engine from the Rust "regex" crate.
    • fsmonitor is disabled
    • using the dirstate-v1 format
  • rust-ds2:
    • Rust extensions are used,
    • regex engine from the Rust "regex" crate.
    • fsmonitor is disabled
    • using the dirstate-v2 format
  • rhg-ds1:
    • Pure Rust executable is used,
    • regex engine from the Rust "regex" crate.
    • fsmonitor is disabled
    • using the dirstate-v1 format
  • rhg-ds2:
    • Pure Rust executable is used,
    • regex engine from the Rust "regex" crate.
    • fsmonitor is disabled
    • using the dirstate-v2 format

Commands

We ran two kinds of operations:

  • hg status with the default output.
    This command needs to search for ignored and unknown files.
    In this case, improving the regex engine usually provides a significant performance gain.

  • hg status --modified --added --removed --deleted.
    This command only needs to check the state of tracked files.
    In this case, improving the regex engine does not have much effect, but it
    is interesting to compare the performance of the various implementations.
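For readers who want to reproduce this kind of measurement, here is a minimal timing sketch in Python. It is not the harness used for the numbers in this report (that harness is not shown here); the function name `time_command` and the choice of reporting best and median times are assumptions made for the example.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5, cwd=None):
    """Run argv `runs` times and return (best, median) wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


# The two invocations described above; pass cwd= to point at a working copy.
FULL_STATUS = ["hg", "status"]
TRACKED_ONLY = ["hg", "status", "--modified", "--added", "--removed", "--deleted"]

# Stand-in command so the sketch runs without a repository.
best, median = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"best={best:.3f}s median={median:.3f}s")
```

To time a real working copy, call `time_command(FULL_STATUS, cwd="/path/to/repo")` after a warm-up run, so that file system caches are populated as in the setup described under "Machines".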

Working copies

Case 1: pristine-928b0540e421

Working copy parent is 928b0540e421
  * 341 759 tracked files
  *  21 253 directories
  * no untracked files

Case 2: pristine-8f96f8c756ae

Working copy parent is 8f96f8c756ae
    (an older changeset I had dirty working copy for)
  * 246 855 tracked files
  *  15 047 directories
  * no untracked files

Case 3: clean-8f96f8c756ae

Working copy parent is 8f96f8c756ae
  * 246 855 tracked files
  *  23 540 directories
  *  79 901 ignored files

Case 4: dirty-8f96f8c756ae

Working copy parent is 8f96f8c756ae
  * 246 855 tracked files
  *  33 720 directories
  * 244 386   clean files
  *   1 065 modified files
  *     247   added files
  *   1 040 removed files
  *     364 missing files
  *  63 455 unknown files
  *  79 915 ignored files

Results Analysis

(full, raw numbers after this section)

About fsmonitor

Before diving into the improvements related to the regex engine, we can note that
the benchmarks show that fsmonitor provides a good boost in the pristine/clean cases, and
a noticeable but disappointing improvement in the very dirty case.

                       python-re fsmonitor-re
pristine-928b0540e421:     1.884 →      0.293 (-84%)
dirty-8f96f8c756ae:        2.157 →      1.440 (-33%)

Surprisingly, when only listing tracked files (during commit for example), fsmonitor actually
becomes counterproductive in the very dirty case:

pristine-928b0540e421:     1.313 →      0.297 (-77%)
dirty-8f96f8c756ae:        0.993 →      1.272 (+28%)

In addition to being disappointing in the very dirty case, the performance
with fsmonitor collapses when fsmonitor cannot use its cache. I observed
4-second execution times while setting up the benchmark.

Improvement without involving Rust:

Using the re2 binding from the google-re2 package provides a small improvement
to plain python execution (about 15%). This case is relevant because this is
the one that will be used when fsmonitor cannot help or start.

                       python-re  python-re2
pristine-928b0540e421:      1.884 →   1.650 (-12%)
dirty-8f96f8c756ae:         2.157 →   1.718 (-20%)

It does not make a difference when only listing tracked files as the hgignore is not involved.

                       python-re  python-re2
pristine-928b0540e421:      1.313 →    1.332
dirty-8f96f8c756ae:         0.993 →    0.998

However, surprisingly, it helps fsmonitor quite a lot in the dirty case
(dirty-8f96f8c756ae), bringing fsmonitor performance in line with the plain
Python one.

               fsmonitor-re fsmonitor-re2
list-unknown          1.440 →       1.012 (-30%)
tracked only          1.272 →       0.840 (-34%)

So, to conclude, being able to use the "re2" regex engine saves up to ⅓ of the
runtime of some operations and never slows things down. So that's a good win.
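The percentages quoted in these tables are relative changes in wall-clock time. As a worked example, the following Python sketch recomputes the two fsmonitor figures for the dirty case; the helper name `relative_change` is made up for this example.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# Timings in seconds for dirty-8f96f8c756ae, fsmonitor-re → fsmonitor-re2,
# copied from the table above.
list_unknown = relative_change(1.440, 1.012)
tracked_only = relative_change(1.272, 0.840)

print(f"list-unknown: {list_unknown:+.0f}%")  # prints "list-unknown: -30%"
print(f"tracked only: {tracked_only:+.0f}%")  # prints "tracked only: -34%"
```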

Improvement involving Rust variants:

For the pristine-928b0540e421 case (all tracked files clean, no ignored files),
Rust provides a speed boost "equivalent" (or better) to the one from fsmonitor.
The precise comparison depends on the parallelism level.

With the 4 physical / 4 logical core machine, the Python+Rust version is slower
than fsmonitor, with dirstate-v2 helping to close some of the gap.
Using dirstate-v2 also allows the "rhg" version to become twice
as fast as the fsmonitor version. Also keep in mind that even when a bit
slower, the performance of the Rust version will be much more stable than
fsmonitor.

python-re2:    1.650
fsmonitor-re2: 0.296 (-82%)
rust-ds1:      0.542 (-67%)
rust-ds2:      0.368 (-77%)
rhg-ds1:       0.401 (-75%)
rhg-ds2:       0.132 (-92%)

With the 8 physical / 16 logical core machine, the Rust variants catch up with
fsmonitor performance much quicker. The dirstate-v1 version is a little slower, but the
dirstate-v2 version is already faster. The pure Rust executable is always faster.

python-re2:    1.430
fsmonitor-re2: 0.278 (-80%)
rust-ds1:      0.359 (-74%)
rust-ds2:      0.259 (-81%)
rhg-ds1:       0.235 (-83%)
rhg-ds2:       0.052 (-96%)

Talking about parallelism, we see that the code scales well: doubling the
number of cores brings about twice the performance, which is great.

pristine-928b0540e421     4/4    8/16
    rhg-ds1:            0.401 → 0.235 (× 1.70)
    rhg-ds2:            0.132 → 0.052 (× 2.54)
clean-8f96f8c756ae
    rhg-ds1:            0.286 → 0.169 (× 1.70)
    rhg-ds2:            0.101 → 0.040 (× 2.52)
dirty-8f96f8c756ae
    rhg-ds1:            0.380 → 0.234 (× 1.62)
    rhg-ds2:            0.232 → 0.124 (× 1.87)

Comparing with git performance on the pristine-928b0540e421 case also yields
great results. Surprisingly, the variants with Python overhead still beat (or
match) git performance in this case. The pure Rust executable is always
significantly faster. Below is a comparison grouped by comparable formats.

git status -s: 0.554 (without untracked cache)
rust-ds1:      0.359 (- 35%)
rhg-ds1:       0.235 (- 57%)

git status -s: 0.232 (with untracked cache)
rust-ds2:      0.259 (+ 11%)
rhg-ds2:       0.052 (- 77%)

The clean-8f96f8c756ae case (all tracked files clean, many ignored files) shows
results similar to pristine-928b0540e421. "Low" parallelism gives good gains
without fully matching the fsmonitor performance. "High" parallelism provides
similar performance. In both cases we gain the benefit of more stable
performance.

    (cores)          4/4           8/16
    python-re2:    1.282        | 1.119
    fsmonitor-re2: 0.243 (-81%) | 0.225 (-80%)
    rust-ds1:      0.416 (-68%) | 0.282 (-75%)
    rust-ds2:      0.303 (-76%) | 0.222 (-80%)
    rhg-ds1:       0.286 (-78%) | 0.169 (-85%)
    rhg-ds2:       0.101 (-92%) | 0.040 (-96%)

Things change quite a lot in the dirty-8f96f8c756ae case, where fsmonitor
struggled. The Rust variants still provide a great speedup, significantly
beating the fsmonitor variants on both machines (comparing to fsmonitor-re
this time).

    (cores)          4/4           8/16
    fsmonitor-re:  1.440        | 1.501
    fsmonitor-re2: 1.012 (-30%) | 1.051 (-30%)
    rust-ds1:      0.624 (-56%) | 0.519 (-65%)
    rust-ds2:      0.553 (-62%) | 0.483 (-68%)
    rhg-ds1:       0.380 (-73%) | 0.234 (-84%)
    rhg-ds2:       0.232 (-83%) | 0.124 (-91%)

This is confirmed in the "listing tracked only" version of the dirty-8f96f8c756ae
case, where fsmonitor was not really improving the situation compared to Python.

    (cores)          4/4           8/16
    python-re:     0.993        | 0.843
    python-re2:    0.998        | 0.843
    fsmonitor-re:  1.272 (+28%) | 1.291 (+53%)
    fsmonitor-re2: 0.840 (-15%) | 0.844
    rust-ds1:      0.364 (-63%) | 0.273 (-68%)
    rust-ds2:      0.301 (-70%) | 0.233 (-72%)
    rhg-ds1:       0.231 (-77%) | 0.153 (-82%)
    rhg-ds2:       0.099 (-90%) | 0.040 (-95%)

Full benchmark numbers for hg status

Here are the exhaustive numbers, all times in seconds.

Case 1: pristine-928b0540e421

(4/4 cores i7-7700K Jan 2017)

    python-re:     1.884
    python-re2:    1.650
    fsmonitor-re:  0.293 (about 4 seconds when confused)
    fsmonitor-re2: 0.296
    rust-ds1:      0.542
    rust-ds2:      0.368
    rhg-ds1:       0.401
    rhg-ds2:       0.132

(8/16 cores i9-9900K CPU October 2018)

    python-re:     1.674
    python-re2:    1.430
    fsmonitor-re:  0.272
    fsmonitor-re2: 0.278
    rust-ds1:      0.359
    rust-ds2:      0.259
    rhg-ds1:       0.235
    rhg-ds2:       0.052

    For reference, I also gathered timing for `git status` on this machine and repo

    git status -s: 0.554 (without untracked cache)
    git status -s: 0.232 (with untracked cache)

Case 2: pristine-8f96f8c756ae

(4/4 cores i7-7700K)

    python-re:     1.306
    python-re2:    1.227
    fsmonitor-re:  0.243
    fsmonitor-re2: 0.242
    rust-ds1:      0.416
    rust-ds2:      0.308
    rhg-ds1:       0.287
    rhg-ds2:       0.102

(8/16 cores i9-9900K CPU)

    python-re:     1.131
    python-re2:    1.076
    fsmonitor-re:  0.222
    fsmonitor-re2: 0.222
    rust-ds1:      0.279
    rust-ds2:      0.222
    rhg-ds1:       0.168
    rhg-ds2:       0.038

Case 3: clean-8f96f8c756ae

(4/4 cores i7-7700K)

    python-re:     1.294
    python-re2:    1.282
    fsmonitor-re:  0.241
    fsmonitor-re2: 0.243
    rust-ds1:      0.416
    rust-ds2:      0.303
    rhg-ds1:       0.286
    rhg-ds2:       0.101

(8/16 cores i9-9900K CPU)

    python-re:     1.170
    python-re2:    1.119
    fsmonitor-re:  0.224
    fsmonitor-re2: 0.225
    rust-ds1:      0.282
    rust-ds2:      0.222
    rhg-ds1:       0.169
    rhg-ds2:       0.040

Case 4: dirty-8f96f8c756ae

(4/4 cores i7-7700K)

    python-re:     2.157
    python-re2:    1.718
    fsmonitor-re:  1.440
    fsmonitor-re2: 1.012
    rust-ds1:      0.624
    rust-ds2:      0.553
    rhg-ds1:       0.380
    rhg-ds2:       0.232

(8/16 cores i9-9900K CPU)

    python-re:     2.031
    python-re2:    1.560
    fsmonitor-re:  1.501
    fsmonitor-re2: 1.051
    rust-ds1:      0.519
    rust-ds2:      0.483
    rhg-ds1:       0.234
    rhg-ds2:       0.124

Benchmark numbers for hg status --modified --added --removed --deleted

With this invocation, status no longer needs to list directory content (or use
a cache to skip that step). Status just needs to check the known list of tracked
files.

Case 1: pristine-928b0540e421

(4/4 cores i7-7700K CPU)

    python-re:     1.313
    python-re2:    1.332
    fsmonitor-re:  0.297
    fsmonitor-re2: 0.296
    rust-ds1:      0.455
    rust-ds2:      0.369
    rhg-ds1:       0.316
    rhg-ds2:       0.130

(8/16 cores i9-9900K CPU)

    python-re:     1.129
    python-re2:    1.133
    fsmonitor-re:  0.273
    fsmonitor-re2: 0.271
    rust-ds1:      0.330
    rust-ds2:      0.244
    rhg-ds1:       0.207
    rhg-ds2:       0.050

    For reference, I also gathered timing for `git status` on this machine and repo

    git status -s --untracked-files=no: 0.110

Case 2: pristine-8f96f8c756ae

(4/4 cores i7-7700K)

    python-re:     0.993
    python-re2:    0.987
    fsmonitor-re:  0.241
    fsmonitor-re2: 0.243
    rust-ds1:      0.358
    rust-ds2:      0.307
    rhg-ds1:       0.228
    rhg-ds2:       0.100

(8/16 cores i9-9900K CPU)

    python-re:     0.856
    python-re2:    0.839
    fsmonitor-re:  0.221
    fsmonitor-re2: 0.222
    rust-ds1:      0.262
    rust-ds2:      0.221
    rhg-ds1:       0.152
    rhg-ds2:       0.038

Case 3: clean-8f96f8c756ae

(4/4 cores i7-7700K)

    python-re:     0.973
    python-re2:    0.979
    fsmonitor-re:  0.242
    fsmonitor-re2: 0.242
    rust-ds1:      0.357
    rust-ds2:      0.304
    rhg-ds1:       0.224
    rhg-ds2:       0.098

(8/16 cores i9-9900K CPU)

    python-re:     0.838
    python-re2:    0.837
    fsmonitor-re:  0.222
    fsmonitor-re2: 0.221
    rust-ds1:      0.263
    rust-ds2:      0.219
    rhg-ds1:       0.152
    rhg-ds2:       0.037

Case 4: dirty-8f96f8c756ae

(4/4 cores i7-7700K)

    python-re:     0.993
    python-re2:    0.998
    fsmonitor-re:  1.272
    fsmonitor-re2: 0.840
    rust-ds1:      0.364
    rust-ds2:      0.301
    rhg-ds1:       0.231
    rhg-ds2:       0.099

(8/16 cores i9-9900K CPU)

    python-re:     0.843
    python-re2:    0.843
    fsmonitor-re:  1.291
    fsmonitor-re2: 0.844
    rust-ds1:      0.273
    rust-ds2:      0.233
    rhg-ds1:       0.153
    rhg-ds2:       0.040

This lookahead prevents the use of a more modern and efficient regexp engine,
slowing down status.

In practice we only have two vscode directories tracked in Mercurial:

  • the root one, which sees active development,
  • the one in "remote/test/puppeteer/", which has never been touched since its addition.

It is easy to not match the root in the hgignore, but harder for the other one.

Please note that once a file is tracked by Mercurial, whether it is ignored or
not no longer matters, so in practice this will only affect "future" additions.

However, the history shows that such additions are extremely rare (one in over 15
years) and that the only occurrence is some vendoring, where the vscode file
seems less important.

So dropping this exception seems fine: the small inconvenience of having to
manually add the file in a hypothetical future is negligible compared to the
concrete performance improvement of common operations for everyone.

See the other changeset dropping the second lookahead pattern for performance
numbers.

We are about to change the hgignore to remove the lookahead expression. This
means the "*.egg-info/" directories will be ignored everywhere again. To
prevent this from causing issues with the vendoring logic, we add a new call to
addremove explicitly listing files in ".egg-info" directories to override the
ignore pattern.

Once tracked, the ignore pattern will no longer affect these files and all will
be good.
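The approach can be sketched as follows. This is not the code from the actual patch: the function names `egg_info_files` and `addremove_command`, the directory layout, and the exact command line are assumptions made for illustration. The sketch only builds the command and does not run Mercurial.

```python
import os
import tempfile


def egg_info_files(root):
    """Return sorted paths (relative to root) of files inside *.egg-info dirs."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # True when any component of the directory path ends in ".egg-info".
        if any(part.endswith(".egg-info") for part in rel_dir.split(os.sep)):
            found.extend(os.path.join(rel_dir, name) for name in filenames)
    return sorted(found)


def addremove_command(files):
    """Build an `hg addremove` invocation that names each file explicitly."""
    return ["hg", "addremove", *files]


# Demo on a throwaway directory tree (hypothetical package name).
with tempfile.TemporaryDirectory() as root:
    meta_dir = os.path.join(root, "foo", "foo.egg-info")
    os.makedirs(meta_dir)
    for name in ("PKG-INFO", "top_level.txt"):
        open(os.path.join(meta_dir, name), "w").close()
    open(os.path.join(root, "foo", "__init__.py"), "w").close()

    demo_files = egg_info_files(root)
    print(addremove_command(demo_files))
```

Naming the files one by one is what lets the vendoring step bypass the global ignore rule, as described above; files outside ".egg-info" directories are left to the regular addremove call.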

Check the next commit for more information on the motivation.

Related changes :

I was not able to test D208968, as the code does not seem to be covered by ./mach test python/mozbuild/mozbuild/test.

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4952395ba0ec
hgignore: drop the negative lookahead assertion around vscode; r=sheehan
https://hg.mozilla.org/integration/autoland/rev/7f0f90bea3c3
vendor-python: explicitly add the content of the .egg-info directory; r=ahochheiden
https://hg.mozilla.org/integration/autoland/rev/51489f639f64
hgignore: simplify the egginfo pattern; r=sheehan
Status: UNCONFIRMED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 127 Branch
Regressions: 1894617
Regressions: 1908016
Assignee: tom → pierre-yves.david