Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE ) spends first 4 minutes around MinorGC with large images as input (80% time spent in js::gc::TenuringTracer::traverse)
Categories
(Core :: JavaScript Engine, task, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox142 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: jandem, NeedInfo)
References
(Blocks 1 open bug, )
Details
Attachments
(4 files)
- Open the attached testcase (Or go to https://codepen.io/sweaver2112/pen/JjLmLpE )
- Click on "REsolution and color accuracy"
- Set "Window size px" = 1
- Set "Output size multiplier" =0.25
- From the top left, click to browse and select the attached sample input image.
AR: https://share.firefox.dev/4dX8OzA , https://share.firefox.dev/4f6jUmS (First 2m13s doing stuff around MinorGC)
ER: Not so?
Reporter | ||
Updated•10 months ago
|
Reporter | ||
Comment 1•10 months ago
|
||
Reporter | ||
Updated•10 months ago
|
Reporter | ||
Comment 2•10 months ago
|
||
Reporter | ||
Updated•10 months ago
|
Reporter | ||
Comment 3•10 months ago
|
||
Profile with a 2MB image: https://share.firefox.dev/40acmvi
Reporter | ||
Comment 4•10 months ago
|
||
Reporter | ||
Updated•10 months ago
|
Updated•10 months ago
|
Reporter | ||
Comment 5•3 months ago
|
||
Profile with latest Nightly (Partial): https://share.firefox.dev/4mclxUy
Samply: https://share.firefox.dev/43qUopq (7 minutes total, 4 minutes in minorGC)
Reporter | ||
Updated•3 months ago
|
Comment 7•1 month ago
|
||
Most of the time is spent tracing the whole cell store buffer, in particular tracing object elements (js::gc::TenuringTracer::traceObjectElements). This suggests that JIT code is adding whole cells store buffer entries for arrays with a large number of elements (my guess would be the image data array in the testcase). This happens when setting a property on a tenured object to a nursery object. Using the whole cell store buffer for JIT post barriers works well most of the time but this is one case where it doesn't do so well.
Jan, can you think of a way of avoiding using the whole cell buffer in cases like these?
Assignee | ||
Comment 8•1 month ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #7)
Jan, can you think of a way of avoiding using the whole cell buffer in cases like these?
In jit::PostWriteElementBarrier
we can use the per-element barrier for large arrays, but the threshold for that is 4096 elements (MAX_WHOLE_CELL_BUFFER_SIZE
) and this page has a lot of arrays with 500-1500 elements. Lowering that threshold to 256 seems to help. I've pushed that change to try to see what it does on Speedometer/JetStream.
Mayank, can you verify that patch helps when the try build is available?
Reporter | ||
Comment 9•1 month ago
•
|
||
(In reply to Jan de Mooij [:jandem] from comment #8)
Mayank, can you verify that patch helps when the try build is available?
I used the Nightly0as-release build from here: https://treeherder.mozilla.org/jobs?repo=try&author=jdemooij%40mozilla.com&selectedTaskRun=Yo87lrH8QI61q-H8q24RJg.0
Profile: https://share.firefox.dev/44SL7Gf (40 seconds)
So a huge improvement to the GC time. From 4 minutes ---> to 40 seconds.
Assignee | ||
Comment 10•1 month ago
|
||
The website in the bug uses a lot of tenured arrays with length 500-1500 and this
eliminates a lot of element tracing time during minor GC.
Updated•1 month ago
|
Assignee | ||
Updated•1 month ago
|
Comment 11•1 month ago
|
||
Assignee | ||
Comment 12•1 month ago
|
||
I'm hoping this improves JetStream's date-format-tofte and Sp3 a bit but my Try results might be noisy too.
Jon was wondering about lowering the value even more. NI myself to test that on Try after we have some AWFY numbers from this version.
Comment 13•1 month ago
|
||
bugherder |
Reporter | ||
Comment 14•1 month ago
|
||
Profile from latest Nightly containing this fix: https://share.firefox.dev/4f5hGFK
Updated•28 days ago
|
Comment 15•22 days ago
•
|
||
I took a look at the history of MAX_WHOLE_CELL_BUFFER_SIZE. There are a couple of interesting comments in bug 1233857. In particular these two:
The typical solution to this problem is known as "card marking". Because we do not control our element memory, however, this approach will not work for us. We've discussed writing an elements allocator to make this possible, but there is actually another way that is much simpler: ...
Also, ObjectElements has a number of bits available. We could add a flag that means "object is already in the whole-cell buffer", or we could add a counter: if we barriered many elements, give up and just put it in the whole cell buffer instead.
The first one seems especially interesting. Now that we have our own slots/elements allocator, should we consider card-marking for dynamic slots/elements?
Description
•