Closed Bug 1774733 Opened 2 years ago Closed 2 years ago

Slow JS_Stringify performance (Lots of hangs in IOUtils::WriteJSON)

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

VERIFIED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: florian, Assigned: sfink)

References

(Blocks 2 open bugs)

Details

(Keywords: perf:responsiveness, Whiteboard: [bhr:IOUtils::WriteJSON][sp3])

Attachments

(2 files, 1 obsolete file)

BHR data shows that 11% of the total hang time is in IOUtils::WriteJSON, making it the third most common hang.

A large part of it is in free and memcpy calls from the realloc call caused by mozilla::Vector<char16_t,32,js::StringBufferAllocPolicy>::growStorageBy. Do we really need to have the entire piece of data in a single contiguous string to write it to disk?

It's also possible (and likely) that the data we write is too big, and written too often. Session store data is likely involved here.

IOUtils.writeJSON is part of OS.File stuff, not the web platform.

Barret, any ideas? It looks like you added this. Thanks.

Component: DOM: Core & HTML → OS.File
Flags: needinfo?(brennie)
Product: Core → Toolkit

Yeah, it is possible to split this up into multiple buffers (a la BufferList), but is it a bit more complicated because we also support writing LZ4 compressed files, and the implementaiton we're using requires the whole buffer. We might be able to use frame based compression but I'll have to do some experimenting.

Assignee: nobody → brennie
Flags: needinfo?(brennie)

Upon further inspection, that signature comes from inside JS itself. The call to JS_Stringify is what is creating the vector in question.

I'll confer with some Spidermonkey folks on how to approach this.

While session-store data is likely a part of the problem here, that is probably a much bigger part to fix. The immediate hang here is poor performance in JS_Stringify. Moving this to JavaScript Engine component do to a bit of optimization on string-ify of large array objects. This also looks like it affects JS Array-join behaviour so some perf work probably is worth doing.

Fixing the JS StringBuffer behaviour to grow vector less often seems like it would help a lot here. If there are problems still after that, we can go further up the stack avoid the need to a final linear buffer, but that breaks the JS_Stringify API and a new one would be needed.

Assignee: brennie → nobody
Component: OS.File → JavaScript Engine
Product: Toolkit → Core
Summary: Lots of hangs in IOUtils::WriteJSON → Slow JS_Stringify performance (Lots of hangs in IOUtils::WriteJSON)
Assignee: nobody → sfink

Here's an example profile of js -e 'obj = JSON.parse(os.file.readFile("unsanitized.json")); for (i = 0; i < 200; i++) JSON.stringify(obj)' where unsanitized.json is a 5.9MB sessionstore file. It shows lots of realloc activity. Timings:

./obj-opt-js/dist/bin/js -e   1.27s user 1.91s system 100% cpu 3.165 total
./obj-opt-js/dist/bin/js -e   1.27s user 1.91s system 100% cpu 3.178 total
./obj-opt-js/dist/bin/js -e   1.29s user 1.90s system 100% cpu 3.186 total
./obj-opt-js/dist/bin/js -e   1.27s user 1.90s system 100% cpu 3.161 total
./obj-opt-js/dist/bin/js -e   1.25s user 1.92s system 100% cpu 3.164 total

If I just change the behavior in this case to growing a StringBuffer's allocation by a factor if 8 instead of 2, it makes a huge difference:

./obj-opt-js/dist/bin/js -e   1.12s user 1.25s system 99% cpu 2.368 total
./obj-opt-js/dist/bin/js -e   1.06s user 1.17s system 99% cpu 2.236 total
./obj-opt-js/dist/bin/js -e   1.09s user 1.30s system 99% cpu 2.390 total
./obj-opt-js/dist/bin/js -e   1.07s user 1.36s system 100% cpu 2.431 total
./obj-opt-js/dist/bin/js -e   1.09s user 1.35s system 100% cpu 2.431 total

The original pattern of reallocs is:

realloc((size_t) 512)
realloc((size_t) 1024)
realloc((size_t) 4096)
realloc((size_t) 65536)
realloc((size_t) 131072)
realloc((size_t) 262144)
realloc((size_t) 524288)
realloc((size_t) 1048576)
realloc((size_t) 2097152)
realloc((size_t) 4194304)
realloc((size_t) 8388608)
realloc((size_t) 16777216)
realloc((size_t) 11148102)

Note the last one—that's from extractWellSized, where just before handing the malloc pointer off to a freshly created JSString, it shrinks it down to its actual size.

For large strings in a 64-bit virtual memory space, I question whether that has real-world value. It seems like it would hopefully just be a bunch of unused and uncommitted memory. Changing it might throw off our memory calculations, though. It might show up as a serious regression.

Marking as a P1 since sfink is working on this now.

Severity: -- → S3
Priority: -- → P1

(In reply to Florian Quèze [:florian] from comment #0)

A large part of it is in free and memcpy calls from the realloc call caused by mozilla::Vector<char16_t,32,js::StringBufferAllocPolicy>::growStorageBy. Do we really need to have the entire piece of data in a single contiguous string to write it to disk?

This sounds like a generic issue which just fall on the JS engine shoulder due to the js::StringBufferAllocPolicy usage.
My blind guess would be that one should provide a reserve function call to initialize the storage to the best known guess.

In the case of the session, I presume we could default to reusing the same or slightly increased length compared to the previous load/storage, and this would be a good enough approximation to avoid most of the reallocation.

Doing that as part of the allocator sounds harder, but maybe we could use the stack traces, or the caller to map to the mean value associated with each call-site.

Blocks: sm-runtime

It strikes me that realloc 2MB -> 4MB is fairly useless overhead. It's grabbing onto 2MB of physical memory so it can move the bits from an existing 2MB of physical memory, which are never used again. What I would want to happen is that the allocator (jemalloc or whatever) reallocs into a larger chunk of virtual memory space, but reuses the existing physical pages rather than memcpy'ing any data over. It would require the mapping to be page-aligned, but that seems reasonable for anything that might grow to be this large.

On Linux, for example, this would call mremap with the MREMAP_MAYMOVE flag.

Assignee: sfink → sphink
Status: NEW → ASSIGNED

I filed bug 1778423 for the idea in comment 9, but it looks like it may not be straightforward.

Condition computeGrowth behavior on existence of AP::computeGrowth<S> method

Bleh, this turned out to be messier than expected. It turns out that rounding up to a power of two isn't that straightforward! And mozalloc.h is a tricky thing to touch. I ended up with a solution that avoided touching it.

Attachment #9284421 - Attachment is obsolete: true
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b014f84dacd0
Allow AllocPolicy to determine Vector growth policy, and be aggressive about StringBuilder allocation strategy to reduce memcpy'ing. r=jandem
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
No longer regressions: 1782562

Verified this is down significantly (down to about 4-5%).

Status: RESOLVED → VERIFIED
Blocks: 1803803
Whiteboard: [bhr:IOUtils::WriteJSON] → [bhr:IOUtils::WriteJSON][sp3]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: