Closed
Bug 1447951
Opened 6 years ago
Closed 6 years ago
Don't use nsStringBuffer for dynamic atoms
Categories
(Core :: XPCOM, enhancement)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla62
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
(Blocks 1 open bug)
Details
(Whiteboard: [overhead:500k])
Attachments
(2 files, 1 obsolete file)
10.64 KB,
patch
|
Details | Diff | Splinter Review | |
20.66 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Dynamic atoms point to the chars in an nsStringBuffer. This means that, in addition to the chars themselves, we use memory for: - the pointer within the dynamic atom; - mRefCount and mStorageSize within the nsStringBuffer. On 64-bit that is 16 bytes of overhead. On 32-bit it is 12 bytes. This overhead might be worth it if the nsStringBuffers were often shared. But they're not. I instrumented the atom table memory reporter and got these numbers with a bunch of tabs open: - 1007 dynamic atoms with a shared buffer, taking up 70656 bytes - 35040 dynamic atoms with an unshared buffer, taking up 2299392 bytes (The "taking up" number here is the combination of the dynamic atoms and the nsStringBuffers.) In other words, about 3% of the dynamic atoms have a shared nsStringBuffer. (This is at least partly because lots of atoms are created from UTF8 strings, in which case the nsStringBuffer is freshly allocated and certain to be unshared.) That's a small enough number that we should consider not using nsStringBuffer. An alternative would be to make nsDynamicAtom a variable-sized struct with the chars tacked onto the end. That would halve the number of allocations, avoid the 16 (or 12) bytes of overhead, and also minimize slop.
Comment 1•6 years ago
|
||
I believe the only place where we would share the stringbuffer right now is when the dynamic atom is being returned to JS via DOMString::SetKnownLiveAtom. In practice, the place where this will matter most is .id gets on elements. And there it's a _performance_ optimization (not having to copy the string), not just a memory one. At least in theory. It might be worth changing DOMString::SetKnownLiveAtom to not share the stringbuffer for the dynamic atom case (which is what we would have to do anyway if we made this change) and do some performance measurement to see whether there might be possible impact there.
Updated•6 years ago
|
Whiteboard: [overhead:500k]
Assignee | ||
Comment 2•6 years ago
|
||
First I tried just making nsDynamicAtom own its chars outright, instead of using nsStringBuffer. It seemed like it should help a bit, because we're not storing the mRefCount and mStorageSize, but in practice it was no better and probably a little bit worse. Never fear, a better approach will be forthcoming.
Assignee | ||
Comment 3•6 years ago
|
||
This reduces memory usage because we only need one allocation instead of two for the dynamic atom and its chars, and because we don't need to store a refcount and a size. It precludes sharing of chars between dynamic atoms, but we weren't benefiting much from that anyway. This reduces per-process memory usage by up to several hundred KiB on my Linux64 box. One consequence of this change is that we need to allocate + copy in DOMString::SetKnownLiveAtom(), which could make some things slower.
Assignee | ||
Comment 4•6 years ago
|
||
Comment on attachment 8986421 [details] [diff] [review] Store nsDynamicAtom's chars after the end of the object I did some measurements with cnn.com and Gmail: BEFORE > parent process > 2,275,104 B (00.94%) -- atoms > ├──2,002,720 B (00.83%) -- dynamic e │ ├──1,808,672 B (00.75%) ── unshared-buffers > │ └────194,048 B (00.08%) ── atom-objects > └────272,384 B (00.11%) ── table > cnn.com process > 671,968 B (00.26%) -- atoms > ├──431,328 B (00.17%) -- dynamic > │ ├──291,104 B (00.11%) ── unshared-buffers > │ └──140,224 B (00.05%) ── atom-objects > └──240,640 B (00.09%) ── table > gmail.com process > 1,329,856 B (00.59%) -- atoms > ├────813,760 B (00.36%) -- dynamic > │ ├──444,736 B (00.20%) ── unshared-buffers > │ └──369,024 B (00.16%) ── atom-objects > └────516,096 B (00.23%) ── table AFTER > parent process > 2,284,736 B (00.93%) -- atoms > ├──2,010,304 B (00.82%) ── dynamic-atoms > └────274,432 B (00.11%) ── table > cnn.com process > 511,888 B (00.19%) -- atoms > ├──275,344 B (00.10%) ── dynamic-atoms > └──236,544 B (00.09%) ── table > gmail.com process > 958,624 B (00.41%) -- atoms > ├──514,048 B (00.22%) ── table > └──444,576 B (00.19%) ── dynamic-atoms Not much difference on the parent process, strangely, but a clear win on cnn.com (~180 KB) and Gmail (~370KB). The majority of this would be the slop avoidance by having a single allocation per dynamic atom instead of two allocations.
Attachment #8986421 -
Flags: review?(nfroyd)
Assignee | ||
Comment 5•6 years ago
|
||
This reduces memory usage because we only need one allocation instead of two for the dynamic atom and its chars, and because we don't need to store a refcount and a size. It precludes sharing of chars between dynamic atoms, but we weren't benefiting much from that anyway. This reduces per-process memory usage by up to several hundred KiB on my Linux64 box. One consequence of this change is that we need to allocate + copy in DOMString::SetKnownLiveAtom(), which could make some things slower.
Assignee | ||
Updated•6 years ago
|
Attachment #8986421 -
Attachment is obsolete: true
Attachment #8986421 -
Flags: review?(nfroyd)
Assignee | ||
Updated•6 years ago
|
Attachment #8986653 -
Flags: review?(nfroyd)
Comment 6•6 years ago
|
||
Comment on attachment 8986653 [details] [diff] [review] Store nsDynamicAtom's chars after the end of the object Review of attachment 8986653 [details] [diff] [review]: ----------------------------------------------------------------- This is a nice change; r=me. I thought that you asked bz about a potential benchmark to see if this change had any effect; am I imagining things? If there was a benchmark, did this change produce any interesting benchmark numbers? ::: dom/bindings/DOMString.h @@ +224,5 @@ > + // Dynamic atoms own their own chars, and never have 0 length because > + // nsGkAtoms::_empty is a static atom. > + MOZ_ASSERT(aAtom->GetLength() > 0); > + AsAString() = nsString(aAtom->AsDynamic()->String(), > + aAtom->GetLength()); I think it'd be slightly better to: AsAString().Assign(aAtom->AsDynamic()->String(), ...); here. In a perfect world, I think your version and the Assign version compile to the same code, but I don't think the compiler is quite that intelligent...
Attachment #8986653 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 7•6 years ago
|
||
> This is a nice change; r=me. I thought that you asked bz about a potential
> benchmark to see if this change had any effect; am I imagining things? If
> there was a benchmark, did this change produce any interesting benchmark
> numbers?
I wrote something like that in my original bzexport text, but then it got rejected because bz is on vacation and not accepting feedback requests. Perhaps you saw that somehow? Though I can't see any sign of it in the bug...
Anyway, I measured on Speedometer and there was no difference. I think it'll be ok.
Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b67973aeb2af Store nsDynamicAtom's chars after the end of the object. r=froydnj
Comment 9•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b67973aeb2af
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox62:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment 10•6 years ago
|
||
Awesome work Nick!
Updated•6 years ago
|
Comment 11•6 years ago
|
||
Nick, did you run the performance test I suggested in comment 1? Specifically, a "read the id" microbenchmark, with the [Pure] annotation removed from the "id" attribute in Element.webidl. Apart from that, the DOMString changes look ok to me. I don't have an obvious better idea to prevent the copy; it would need to happen one way or another once our DOMString got converted to a JSString. I so wish we had a unified string representation. :(
Updated•6 years ago
|
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 12•6 years ago
|
||
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #11) > Nick, did you run the performance test I suggested in comment 1? > Specifically, a "read the id" microbenchmark, with the [Pure] annotation > removed from the "id" attribute in Element.webidl. No. The only perf test I ran was Speedometer, where there was no difference.
Flags: needinfo?(n.nethercote)
Comment 13•6 years ago
|
||
Right, I saw the bit about Speedometer. Speedometer is not the only performance thing we care about, though. Do you plan to run such a test? If not, please let me know and I will try to do it.
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 14•6 years ago
|
||
> If not, please let me know and I will try to do it.
That would be very helpful. You have a much clearer idea of what to measure than I do.
Flags: needinfo?(n.nethercote)
Comment 15•6 years ago
|
||
So I have done some measurements using this testcase: <script> var count = 20000000; var start = new Date(); var el = document.createElement("span"); el.id = "This is a test"; for (var i = 0; i < count; ++i) { el.id; } var stop = new Date(); document.write((stop - start)/count * 1e6); </script> which reports ns/call for the id get. For all tests I used m-c revision 6c29e4fc89c7 (right before this bug landed) with the [Pure] annotation on Element.id removed. I used a "control" build which had no other changes and a "test" build also had this change in DOMString.h: - SetKnownLiveStringBuffer( - aAtom->AsDynamic()->GetStringBuffer(), aAtom->GetLength()); + AsAString().Assign(aAtom->AsDynamic()->String(), aAtom->GetLength()); I tested several different values for the id: "foo", "This is a test", "aaa...aaa" repeated 15 times, "bbb...bbb" repeated 16 times, "ccc...ccc" repeated 63 times, "ddd...ddd" repeated 64 times, "eee...eee" repeated 100 times, "fff...fff" repeated 101 times. On my (Mac) laptop, I get values like so: control, "foo": ~42 control, "This is a test": ~50 control, "aaa...aaa" (15x): ~52 control, "bbb...bbb" (16x): ~23 control, "ccc...ccc" (63x): ~23 control, "ddd...ddd" (64x): ~23 control, "eee...eee" (100x): ~23 control, "fff...fff" (101x): ~23 test, "foo": ~50 test, "This is a test": ~67 test, "aaa...aaa" (15x): ~68 test, "bbb...bbb" (16x): ~107 test, "ccc...ccc" (63x): ~250 test, "ddd...ddd" (64x): ~136 test, "eee...eee" (100x): ~160 test, "fff...fff" (101x): ~380 The "control" results compared to each other surprised me a bit. That's happening because js::NewMaybeExternalString does the inline-latin1-string optimization before it does the external string thing (see bug 1038099). We allow up to 15 chars in inline latin1 strings, so all three of "foo", "This is a test", and "aaaaaaaaaaaaaaa" end up doing an inline string plus copy here. The difference in times between these cases is the extra work the copy and CanStoreCharsAsLatin1 have to do for the longer string, I expect. For the strings longer than 15 chars, we do end up creating an external string in the "control" build and hence avoid the extra copying and scan bits. Since we also hit the external string cache, and without even having to do the char scan in ExternalStringCache::lookup we avoid the GC pressure and allocation work too. Hence the faster time for those case. In the "test" build, the inline-string cases are a little slower than "control" because there's more work done in copying into the DOMString's internal buffer. That's an autostring, so we don't even try the external string thing and just end up creating allocated inline strings. So it's basically the same work as the "control" build except for the extra copy into the on-stack buffer, which is not too bad. Once we're over 15 chars, the behaior is quite different, though. In the "test" build we keep doing the "copy into stack buffer" thing up to 63 chars. Those cases don't do an inline string, and don't do an external string (because we have no stringbuffer), so end up doing one copy to the stack in the bindings and then another copy to the heap in the JS engine when allocating the string. And there's the string allocation, GCs, etc. Once we hit 64 chars, we stop using the on-stack buffer in DOMString and just copy directly to a heap nsStringBuffer. For the 64- and 100-char cases we end up hitting the external string cache (after doing the scan of the string, because we're not pointer-identical to what's in the cache) and hence times drop somewhat compared to the 63-char case: we're back to not doing a second copy, not doing a GCthing allocation, and not doing more GC. At 101 chars we stop doing the linear scan in ExternalStringCache::lookup and hence never hit the external string cache. Now we're always doing a copy to the heap in the DOM code and then always allocating a new external string. There's still only one copy, but more GC pressure than there was in the cache-hit cases. And the copy itself is more expensive, of course. So the conclusion is that for short values this is a somewhat minor regression, but for values in the range [16,63] we're adding two copies and a bunch of GC pressure, for a significant slowdown. It feels to me like the right solution here might be to have an external string type that hold on to an nsDynamicAtom so we can avoid all that and basically recover the performance we used to have... Thoughts?
Flags: needinfo?(n.nethercote)
Comment 16•6 years ago
|
||
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #15) > It feels to me like the right solution here might be to have an external > string type that hold on to an nsDynamicAtom so we can avoid all that and > basically recover the performance we used to have... Thoughts? My initial, mostly unconsidered thought is that I am unexcited about adding more cases to the string code.
Comment 17•6 years ago
|
||
I sympathize, but I'm not sure I have any better ideas offhand, short of knowing what the length distribution of things coming through DOMString::SetKnownLiveAtom looks like.... I really wish we had a more-shared JS/C++ string representation. But even that would not help here, because the whole point of this bug was to stop using the shared representation we _did_ have, for dynamic atoms.
Comment 18•6 years ago
|
||
So to be clear, I am not proposing we add more cases to nsAString, just to DOMString.
Comment 19•6 years ago
|
||
I filed bug 1473149 on the dynamic atom external string idea.
Updated•6 years ago
|
Flags: needinfo?(n.nethercote)
You need to log in
before you can comment on or make changes to this bug.
Description
•