Bug 2008619 Comment 10 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Justin Link

on 2026-01-30 14:34:38 PST

I have tried four different ways of disabling the word cache (comparisons [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=ce24dd5f236a34f38b309c3ec2a9188b087b131b&newRepo=try&framework=13&test_version=student-t) [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=7eba0bad8f8750b90cbcec270bc3dc26d46f96c0&newRepo=try&framework=13&test_version=student-t) [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=50fee7145e9fed8983c96817eca6b14795f776c3&newRepo=try&framework=13&test_version=student-t) and [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=d6fbbaa36b44e48936d51141cf6b50e4d0b1ed40&newRepo=try&framework=13&test_version=student-t)).

_Except on Windows non-ref hardware where anything that causes us to use the word cache less is a win_, simply disabling the word cache results in a clear regression. In all of the cases, there are huge regressions in Editor-TipTap. On Windows, those huge Editor-TipTap regressions are sometimes offset by wins in other tests, although those seem less significant (even though the effect is that the overall SP3 score gets pulled upward).

I was surprised to see such a strong regression after [Denis' earlier results](https://perf.compare/compare-results?baseRev=dcd939228adf143c357879ddfc9c01fc020ecb42&baseRepo=try&newRev=5f0f93719eecdd284ff894d5d0fc4c9d3e75c94d&newRepo=try&framework=13&replicates=) so I re-implemented his change and tried a few different thresholds [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=4b3ef563694b2eaa70c65ceaf004642bde7d84cf&newRepo=try&newRev=dd084e6ad49c1ea23d946fa2765d13db830dfdaf&newRepo=try&newRev=33e894d877d63aed976231f257189e42c0d0c16e&newRepo=try&framework=13&test_version=student-t) and still see the improvements that he saw. Using a lower threshold than what he picked (1024) might also be better.

There are a few take-aways that I see from this:
1) The word cache _is_ useful and valuable but we are definitely over-using it.
2) Non-ref Windows hardware seems to have a distinctly different experience with the word cache. Perhaps the shaping itself is less expensive there for some reason so there is less benefit in caching the results? Maybe some aspect of the word cache executes less efficiently on that hardware, which affects when it should be used?

I'm taking a few action items out of this:
1) Study how the tests that benefit from the word cache are shaping words vs the tests that don't and use that to determine the appropriate conditions for when we should try to skip it.
2) Look more closely into what's happening on non-ref Windows hardware. I triggered some performance profiles on these machines in CI and have started looking at the profiles. [These](https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FS-73UbNmTFeE8oeyS3qnFA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_speedometer3.zip) are the profiles from the slowest of these runs. So far it's surprising because we aren't actually spending much time in the main code related to shaping and caching.

Revision 1 by

Justin Link

on 2026-01-30 14:37:13 PST

I have tried four different ways of disabling the word cache (comparisons [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=ce24dd5f236a34f38b309c3ec2a9188b087b131b&newRepo=try&framework=13&test_version=student-t) [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=7eba0bad8f8750b90cbcec270bc3dc26d46f96c0&newRepo=try&framework=13&test_version=student-t) [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=50fee7145e9fed8983c96817eca6b14795f776c3&newRepo=try&framework=13&test_version=student-t) and [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=d6fbbaa36b44e48936d51141cf6b50e4d0b1ed40&newRepo=try&framework=13&test_version=student-t)).

_Except on Windows non-ref hardware where anything that causes us to use the word cache less is a win_, simply disabling the word cache results in a clear regression. In all of the cases, there are huge regressions in Editor-TipTap. On Windows, those huge Editor-TipTap regressions are sometimes offset by wins in other tests, although those seem less significant (even though the effect is that the overall SP3 score gets pulled upward).

I was surprised to see such a strong regression after [Denis' earlier results](https://perf.compare/compare-results?baseRev=dcd939228adf143c357879ddfc9c01fc020ecb42&baseRepo=try&newRev=5f0f93719eecdd284ff894d5d0fc4c9d3e75c94d&newRepo=try&framework=13&replicates=) so I re-implemented his change and tried a few different thresholds [here](https://perf.compare/compare-results?baseRev=62af655cc11db427bd044b8d298ca40438f9082a&baseRepo=try&newRev=4b3ef563694b2eaa70c65ceaf004642bde7d84cf&newRepo=try&newRev=dd084e6ad49c1ea23d946fa2765d13db830dfdaf&newRepo=try&newRev=33e894d877d63aed976231f257189e42c0d0c16e&newRepo=try&framework=13&test_version=student-t) and still see the improvements that he saw. Using a lower threshold than what he picked (1024) might also be better.

There are a few take-aways that I see from this:
1) The word cache _is_ useful and valuable but we are definitely over-using it.
2) Non-ref Windows hardware seems to have a distinctly different experience with the word cache. Perhaps the shaping itself is less expensive there for some reason so there is less benefit in caching the results? Maybe some aspect of the word cache executes less efficiently on that hardware, which affects when it should be used?

I'm taking a few action items out of this:
1) Study how the tests that benefit from the word cache are shaping words vs the tests that don't and use that to determine the appropriate conditions for when we should try to skip it.
2) Look more closely into what's happening on non-ref Windows hardware. I triggered some performance profiles on these machines in CI and have started looking at the profiles. [These](https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FS-73UbNmTFeE8oeyS3qnFA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_speedometer3.zip) are the profiles from the slowest of these runs. So far it's surprising because we aren't actually spending much time in the main code related to shaping and caching.
3) Try using a simpler data structure (without need for allocations and _possibly_ without need for locking?) like the cache that I previously implemented for JS atoms.
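
To make the idea in item 3 concrete, here is a rough sketch of what a "simpler data structure" could look like: a fixed-size, direct-mapped cache that stores keys inline, never allocates after construction, and simply overwrites a slot on collision. This is not the existing word cache or the JS atoms cache; the class name `DirectMappedWordCache`, the `Slots` and `MaxKeyLen` parameters, and the `uint32_t` value (a stand-in for whatever a shaped word would really hold) are all hypothetical. It has no locking because it assumes each instance is used from one thread only; whether that holds for the real word cache is an open question.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical sketch: a direct-mapped cache with inline storage.
// Each key hashes to exactly one slot; a colliding insert evicts the
// previous occupant. No heap allocation happens in Lookup or Insert.
// Assumes single-threaded use per instance, so there is no lock.
template <std::size_t Slots, std::size_t MaxKeyLen = 32>
class DirectMappedWordCache {
  static_assert(Slots > 0 && (Slots & (Slots - 1)) == 0,
                "Slots must be a power of two so masking can pick a slot");

 public:
  // Returns the cached value for `word`, or nullopt on a miss.
  std::optional<std::uint32_t> Lookup(std::string_view word) const {
    if (word.size() > MaxKeyLen) {
      return std::nullopt;  // Words too long to store inline are never cached.
    }
    const Entry& e = mEntries[IndexFor(word)];
    if (e.used && std::string_view(e.key.data(), e.len) == word) {
      return e.value;
    }
    return std::nullopt;
  }

  // Stores `value` for `word`, overwriting whatever occupied the slot.
  void Insert(std::string_view word, std::uint32_t value) {
    if (word.size() > MaxKeyLen) {
      return;  // Skip the cache entirely for long words.
    }
    Entry& e = mEntries[IndexFor(word)];
    std::copy(word.begin(), word.end(), e.key.begin());
    e.len = word.size();
    e.value = value;
    e.used = true;
  }

 private:
  struct Entry {
    std::array<char, MaxKeyLen> key{};  // Key bytes stored inline.
    std::size_t len = 0;
    std::uint32_t value = 0;  // Placeholder for a real shaped-word result.
    bool used = false;
  };

  static std::size_t IndexFor(std::string_view word) {
    return std::hash<std::string_view>{}(word) & (Slots - 1);
  }

  std::array<Entry, Slots> mEntries{};
};
```

The trade-off in a design like this is a lower hit rate (one collision evicts an entry) in exchange for a lookup that is a hash, a mask, and one comparison. Whether that trade is a win for the tests discussed above would need to be measured.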
