Closed Bug 1925764 Opened 10 months ago Closed 1 month ago

Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 4 minutes around MinorGC with large images as input (80% time spent in js::gc::TenuringTracer::traverse)

Categories

(Core :: JavaScript Engine, task, P3)

task

Tracking

()

RESOLVED FIXED
142 Branch
Tracking Status
firefox142 --- fixed

People

(Reporter: mayankleoboy1, Assigned: jandem, NeedInfo)

References

(Blocks 1 open bug, )

Details

Attachments

(4 files)

  1. Open the attached testcase (Or go to https://codepen.io/sweaver2112/pen/JjLmLpE )
  2. Click on "REsolution and color accuracy"
  • Set "Window size px" = 1
  • Set "Output size multiplier" =0.25
  1. From the top left, click to browse and select the attached sample input image.

AR: https://share.firefox.dev/4dX8OzA , https://share.firefox.dev/4f6jUmS (First 2m13s doing stuff around MinorGC)

ER: Not so?

Summary: Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends 42s around MinorGC with large images as input → Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 42s around MinorGC with large images as input
Summary: Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 42s around MinorGC with large images as input → Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 2minutes around MinorGC with large images as input
Attached image Sample input image
Summary: Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 2minutes around MinorGC with large images as input → Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 2minutes around MinorGC with large images as input (72% time spent in js::gc::TenuringTracer::traverse)

Profile with a 2MB image: https://share.firefox.dev/40acmvi

Attached image 2MB sample input file
Severity: -- → N/A
Priority: -- → P3

Profile with latest Nightly (Partial): https://share.firefox.dev/4mclxUy
Samply: https://share.firefox.dev/43qUopq (7 minutes total, 4 minutes in minorGC)

Summary: Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 2minutes around MinorGC with large images as input (72% time spent in js::gc::TenuringTracer::traverse) → Codepen demo (https://codepen.io/sweaver2112/pen/JjLmLpE) spends first 4 minutes around MinorGC with large images as input (80% time spent in js::gc::TenuringTracer::traverse)

Jonco, anything interesting here?

Flags: needinfo?(jcoppeard)

Most of the time is spent tracing the whole cell store buffer, in particular tracing object elements (js::gc::TenuringTracer::traceObjectElements). This suggests that JIT code is adding whole cells store buffer entries for arrays with a large number of elements (my guess would be the image data array in the testcase). This happens when setting a property on a tenured object to a nursery object. Using the whole cell store buffer for JIT post barriers works well most of the time but this is one case where it doesn't do so well.

Jan, can you think of a way of avoiding using the whole cell buffer in cases like these?

Flags: needinfo?(jcoppeard) → needinfo?(jdemooij)

(In reply to Jon Coppeard (:jonco) from comment #7)

Jan, can you think of a way of avoiding using the whole cell buffer in cases like these?

In jit::PostWriteElementBarrier we can use the per-element barrier for large arrays, but the threshold for that is 4096 elements (MAX_WHOLE_CELL_BUFFER_SIZE) and this page has a lot of arrays with 500-1500 elements. Lowering that threshold to 256 seems to help. I've pushed that change to try to see what it does on Speedometer/JetStream.

Mayank, can you verify that patch helps when the try build is available?

Flags: needinfo?(mayankleoboy1)

(In reply to Jan de Mooij [:jandem] from comment #8)

Mayank, can you verify that patch helps when the try build is available?

I used the Nightly0as-release build from here: https://treeherder.mozilla.org/jobs?repo=try&author=jdemooij%40mozilla.com&selectedTaskRun=Yo87lrH8QI61q-H8q24RJg.0

Profile: https://share.firefox.dev/44SL7Gf (40 seconds)

So a huge improvement to the GC time. From 4 minutes ---> to 40 seconds.

Flags: needinfo?(mayankleoboy1)

The website in the bug uses a lot of tenured arrays with length 500-1500 and this
eliminates a lot of element tracing time during minor GC.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Pushed by jdemooij@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/a38e00bca2c8 https://hg.mozilla.org/integration/autoland/rev/6da43dcb36a9 Lower MAX_WHOLE_CELL_BUFFER_SIZE to use single-element post barrier in more cases. r=jonco

I'm hoping this improves JetStream's date-format-tofte and Sp3 a bit but my Try results might be noisy too.

Jon was wondering about lowering the value even more. NI myself to test that on Try after we have some AWFY numbers from this version.

Flags: needinfo?(jdemooij)
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 142 Branch

Profile from latest Nightly containing this fix: https://share.firefox.dev/4f5hGFK

QA Whiteboard: [qa-triage-done-c143/b142]

I took a look at the history of MAX_WHOLE_CELL_BUFFER_SIZE. There are a couple of interesting comments in bug 1233857. In particular these two:

The typical solution to this problem is known as "card marking". Because we do not control our element memory, however, this approach will not work for us. We've discussed writing an elements allocator to make this possible, but there is actually another way that is much simpler: ...

Also, ObjectElements has a number of bits available. We could add a flag that means "object is already in the whole-cell buffer", or we could add a counter: if we barriered many elements, give up and just put it in the whole cell buffer instead.

The first one seems especially interesting. Now that we have our own slots/elements allocator, should we consider card-marking for dynamic slots/elements?

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: