Slow JS_Stringify performance (Lots of hangs in IOUtils::WriteJSON)
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox105 | --- | fixed |
People
(Reporter: florian, Assigned: sfink)
References
(Blocks 2 open bugs)
Details
(Keywords: perf:responsiveness, Whiteboard: [bhr:IOUtils::WriteJSON][sp3])
Attachments
(2 files, 1 obsolete file)
BHR data shows that 11% of the total hang time is in IOUtils::WriteJSON
, making it the third most common hang.
A large part of it is in free and memcpy calls from the realloc call caused by mozilla::Vector<char16_t,32,js::StringBufferAllocPolicy>::growStorageBy
. Do we really need to have the entire piece of data in a single contiguous string to write it to disk?
It's also possible (and likely) that the data we write is too big, and written too often. Session store data is likely involved here.
Comment 1•2 years ago
|
||
IOUtils.writeJSON is part of OS.File stuff, not the web platform.
Barret, any ideas? It looks like you added this. Thanks.
Comment 2•2 years ago
•
|
||
Yeah, it is possible to split this up into multiple buffers (a la BufferList), but is it a bit more complicated because we also support writing LZ4 compressed files, and the implementaiton we're using requires the whole buffer. We might be able to use frame based compression but I'll have to do some experimenting.
Comment 3•2 years ago
|
||
Upon further inspection, that signature comes from inside JS itself. The call to JS_Stringify is what is creating the vector in question.
I'll confer with some Spidermonkey folks on how to approach this.
Comment 4•2 years ago
|
||
While session-store data is likely a part of the problem here, that is probably a much bigger part to fix. The immediate hang here is poor performance in JS_Stringify. Moving this to JavaScript Engine component do to a bit of optimization on string-ify of large array objects. This also looks like it affects JS Array-join behaviour so some perf work probably is worth doing.
Fixing the JS StringBuffer behaviour to grow vector less often seems like it would help a lot here. If there are problems still after that, we can go further up the stack avoid the need to a final linear buffer, but that breaks the JS_Stringify API and a new one would be needed.
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 5•2 years ago
|
||
Here's an example profile of js -e 'obj = JSON.parse(os.file.readFile("unsanitized.json")); for (i = 0; i < 200; i++) JSON.stringify(obj)'
where unsanitized.json
is a 5.9MB sessionstore file. It shows lots of realloc activity. Timings:
./obj-opt-js/dist/bin/js -e 1.27s user 1.91s system 100% cpu 3.165 total
./obj-opt-js/dist/bin/js -e 1.27s user 1.91s system 100% cpu 3.178 total
./obj-opt-js/dist/bin/js -e 1.29s user 1.90s system 100% cpu 3.186 total
./obj-opt-js/dist/bin/js -e 1.27s user 1.90s system 100% cpu 3.161 total
./obj-opt-js/dist/bin/js -e 1.25s user 1.92s system 100% cpu 3.164 total
If I just change the behavior in this case to growing a StringBuffer's allocation by a factor if 8 instead of 2, it makes a huge difference:
./obj-opt-js/dist/bin/js -e 1.12s user 1.25s system 99% cpu 2.368 total
./obj-opt-js/dist/bin/js -e 1.06s user 1.17s system 99% cpu 2.236 total
./obj-opt-js/dist/bin/js -e 1.09s user 1.30s system 99% cpu 2.390 total
./obj-opt-js/dist/bin/js -e 1.07s user 1.36s system 100% cpu 2.431 total
./obj-opt-js/dist/bin/js -e 1.09s user 1.35s system 100% cpu 2.431 total
The original pattern of reallocs is:
realloc((size_t) 512)
realloc((size_t) 1024)
realloc((size_t) 4096)
realloc((size_t) 65536)
realloc((size_t) 131072)
realloc((size_t) 262144)
realloc((size_t) 524288)
realloc((size_t) 1048576)
realloc((size_t) 2097152)
realloc((size_t) 4194304)
realloc((size_t) 8388608)
realloc((size_t) 16777216)
realloc((size_t) 11148102)
Assignee | ||
Comment 6•2 years ago
|
||
Note the last one—that's from extractWellSized
, where just before handing the malloc pointer off to a freshly created JSString
, it shrinks it down to its actual size.
For large strings in a 64-bit virtual memory space, I question whether that has real-world value. It seems like it would hopefully just be a bunch of unused and uncommitted memory. Changing it might throw off our memory calculations, though. It might show up as a serious regression.
Comment 7•2 years ago
|
||
Marking as a P1 since sfink is working on this now.
Comment 8•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #0)
A large part of it is in free and memcpy calls from the realloc call caused by
mozilla::Vector<char16_t,32,js::StringBufferAllocPolicy>::growStorageBy
. Do we really need to have the entire piece of data in a single contiguous string to write it to disk?
This sounds like a generic issue which just fall on the JS engine shoulder due to the js::StringBufferAllocPolicy
usage.
My blind guess would be that one should provide a reserve
function call to initialize the storage to the best known guess.
In the case of the session, I presume we could default to reusing the same or slightly increased length compared to the previous load/storage, and this would be a good enough approximation to avoid most of the reallocation.
Doing that as part of the allocator sounds harder, but maybe we could use the stack traces, or the caller to map to the mean value associated with each call-site.
Assignee | ||
Comment 9•2 years ago
|
||
It strikes me that realloc 2MB -> 4MB is fairly useless overhead. It's grabbing onto 2MB of physical memory so it can move the bits from an existing 2MB of physical memory, which are never used again. What I would want to happen is that the allocator (jemalloc or whatever) reallocs into a larger chunk of virtual memory space, but reuses the existing physical pages rather than memcpy'ing any data over. It would require the mapping to be page-aligned, but that seems reasonable for anything that might grow to be this large.
On Linux, for example, this would call mremap
with the MREMAP_MAYMOVE
flag.
Assignee | ||
Comment 10•2 years ago
|
||
Assignee | ||
Comment 11•2 years ago
|
||
I filed bug 1778423 for the idea in comment 9, but it looks like it may not be straightforward.
Assignee | ||
Comment 12•2 years ago
|
||
Condition computeGrowth behavior on existence of AP::computeGrowth<S> method
Assignee | ||
Comment 13•2 years ago
|
||
Bleh, this turned out to be messier than expected. It turns out that rounding up to a power of two isn't that straightforward! And mozalloc.h is a tricky thing to touch. I ended up with a solution that avoided touching it.
Assignee | ||
Comment 14•2 years ago
|
||
Updated•2 years ago
|
Comment 15•2 years ago
|
||
Comment 16•2 years ago
|
||
bugherder |
Comment 17•2 years ago
|
||
Verified this is down significantly (down to about 4-5%).
Updated•2 years ago
|
Updated•2 years ago
|
Description
•