Support storing into available capacity in StoreDenseElementHole
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox-esr115 | --- | wontfix |
firefox-esr128 | --- | wontfix |
firefox107 | --- | unaffected |
firefox108 | --- | unaffected |
firefox109 | --- | wontfix |
firefox110 | --- | wontfix |
firefox136 | --- | wontfix |
firefox137 | --- | wontfix |
firefox138 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: iain, Mentored)
References
(Blocks 2 open bugs, Regressed 1 open bug, Regression)
Details
(Keywords: perf-alert, regression)
Attachments
(2 files)
Reporter | ||
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Iain, bug 1797486 is in the range in comment 0. Mind taking a quick look to see if this is reproducible and easily fixable?
Assignee | ||
Comment 2•2 years ago
|
||
It's reproducible. Not immediately sure about easily fixable.
The relevant code looks something like this:
for (var i = 1; i <= 3; i++) {
var m = (1 << i) * 10000;
var flags = Array(m+1);
for (var j = 2; j <= m; j++) {
flags[j] = true;
}
// Do something with flags.
}
We allocate a large array, then we initialize all but the first two elements.
Before my patch, this code had bimodal behaviour:
- The first few iterations of the inner loop happen in the C++ interpreter. We don't attach any ICs.
- Eventually, we tier up to blinterp and attach a StoreDenseElementHole IC.
- We kick off an OSR compilation on a background thread.
4A: In the fast case, we finish the compilation before the second iteration of the outer loop. Because the pre-patch code supported indices > initializedLength, we don't bail out when writing toflags[2]
: the initialized length is 2, but the capacity is 2046. We continue running in Ion with StoreDenseElementHole.
4B: In the slow case, we start the second iteration of the loop before tiering up to Ion. When we try writing toflags[2]
, the guard fails, and we instead attach (and eventually transpile)CallAddOrUpdateSparseElement
.
After my patch, both baseline ICs and Ion only support writing one past the initialized length, even if we have the capacity.
One option for fixing this would be to add support (in both CacheIR and Ion) for the OOB case where we have the capacity. We'd have to add a loop to fill the gaps with the magic hole value.
Comment 3•2 years ago
|
||
This bug has been marked as a regression. Setting status flag for Nightly to affected
.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 4•2 years ago
|
||
Set release status flags based on info from the regressing bug 1797486
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Reporter | ||
Comment 5•4 months ago
|
||
Shold we accept the regression and close this bug?
Assignee | ||
Comment 6•4 months ago
|
||
We can accept the regression (in the sense that there's no urgency to fix this for the sake of sunspider), but it would still be nice to fix this at some point. This is a reasonable good-second-bug candidate. It's not a high priority, but it's also fairly small and self-contained.
I'll bump down the priority, switch it to a task, and mark myself as a mentor.
Quick summary: StoreDenseElementHole is used in cases where we are writing to an object with dense elements, but we might be writing past the end of the current elements. In both the CacheIRCompiler and Ion implementations of StoreDenseElementHole, we only support writing to the next empty element, even if we have allocated capacity for more. The primary reason for this is that we have to initialize the gap between the existing elements and the new one we're adding with the magic hole value, which requires us to generate a loop. We also have to make sure that the NON_PACKED flag is set on the elements, now that we've introduced a hole.
Assignee | ||
Updated•4 months ago
|
Assignee | ||
Updated•4 months ago
|
Assignee | ||
Updated•4 months ago
|
Assignee | ||
Comment 7•13 days ago
|
||
The two code paths were already almost identical. The only differences were that the Ion code used setupAlignedABICall and had a small optimization to reuse the condition flags an earlier test (on architectures with condition flags).
Updated•13 days ago
|
Assignee | ||
Comment 8•13 days ago
|
||
With this patch, we have jitcode support for storing anywhere within the allocated elements capacity, or one item past.
The assertion in addDenseElementPure is no longer correct because we reallocate the capacity before we initialize the holes and update initLength.
We could add support for writing more than one element past the capacity, but we'd have to call a different VM function and think more carefully about when should go sparse.
Comment 10•11 days ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/75fc4a9ec050
https://hg.mozilla.org/mozilla-central/rev/dbeb07f4be69
Reporter | ||
Comment 11•11 days ago
|
||
Cross-posting from Bug 1951338:
Either this bug or Bug 1951338 lead to some improvements:
SP3:
Windows: 2.8% on SP3-perf-Dashboard-Total , which was driven by 7.2% improvement on SP3-perf-Dashboard-SelectingPoints-total
Android: 4.8% on Android SP3-perf-Dashboard-SelectingPoints-total
Sunspider
54% on Sunspider-access-nsieve (Probably due to bug 1804104)
Updated•10 days ago
|
Comment 12•9 days ago
•
|
||
It was this bug.
Perf-Dashboard profiles before/after:
Before: https://share.firefox.dev/4c1XCCr (SetElem IC in findOptimalSegmentationInternal)
After: https://share.firefox.dev/3RjJo6g (no SetElem IC)
Comment 13•6 days ago
|
||
(In reply to Pulsebot from comment #9)
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75fc4a9ec050
Share more StoreDenseElementHole code r=jandem
https://hg.mozilla.org/integration/autoland/rev/dbeb07f4be69
Initialize holes in prepareOOBStoreElement r=jandem
Perfherder has detected the following browsertime performance change.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
17% | speedometer3 Perf-Dashboard/SelectingPoints/Sync | linux1804-64-nightlyasrelease-qr | fission webrender | 45.08 -> 37.61 | Before/After |
16% | speedometer3 Perf-Dashboard/SelectingPoints/total | linux1804-64-nightlyasrelease-qr | fission webrender | 48.24 -> 40.67 | Before/After |
8% | speedometer3 Perf-Dashboard/total | linux1804-64-nightlyasrelease-qr | fission webrender | 101.38 -> 93.47 | Before/After |
6% | speedometer3 Perf-Dashboard/SelectingPoints/Sync | windows11-64-24h2-nightlyasrelease | fission webrender | 21.50 -> 20.16 | Before/After |
6% | speedometer3 Perf-Dashboard/SelectingPoints/total | windows11-64-24h2-nightlyasrelease | fission webrender | 23.53 -> 22.12 | Before/After |
... | ... | ... | ... | ... | ... |
3% | speedometer3 Perf-Dashboard/total | windows11-64-24h2-nightlyasrelease | fission webrender | 52.54 -> 51.04 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 44345
For more information on performance sheriffing please see our FAQ.
Description
•