1447951 - Don't use nsStringBuffer for dynamic atoms

Assignee

Description

•

8 years ago

Dynamic atoms point to the chars in an nsStringBuffer. This means that, in addition to the chars themselves, we use memory for: - the pointer within the dynamic atom; - mRefCount and mStorageSize within the nsStringBuffer. On 64-bit that is 16 bytes of overhead. On 32-bit it is 12 bytes. This overhead might be worth it if the nsStringBuffers were often shared. But they're not. I instrumented the atom table memory reporter and got these numbers with a bunch of tabs open: - 1007 dynamic atoms with a shared buffer, taking up 70656 bytes - 35040 dynamic atoms with an unshared buffer, taking up 2299392 bytes (The "taking up" number here is the combination of the dynamic atoms and the nsStringBuffers.) In other words, about 3% of the dynamic atoms have a shared nsStringBuffer. (This is at least partly because lots of atoms are created from UTF8 strings, in which case the nsStringBuffer is freshly allocated and certain to be unshared.) That's a small enough number that we should consider not using nsStringBuffer. An alternative would be to make nsDynamicAtom a variable-sized struct with the chars tacked onto the end. That would halve the number of allocations, avoid the 16 (or 12) bytes of overhead, and also minimize slop.

Boris Zbarsky [:bzbarsky]

Comment 1

•

8 years ago

I believe the only place where we would share the stringbuffer right now is when the dynamic atom is being returned to JS via DOMString::SetKnownLiveAtom. In practice, the place where this will matter most is .id gets on elements. And there it's a _performance_ optimization (not having to copy the string), not just a memory one. At least in theory. It might be worth changing DOMString::SetKnownLiveAtom to not share the stringbuffer for the dynamic atom case (which is what we would have to do anyway if we made this change) and do some performance measurement to see whether there might be possible impact there.

Eric Rahm [:erahm]

Updated

•

8 years ago

Whiteboard: [overhead:500k]

Nicholas Nethercote [inactive]

Assignee

Comment 2

•

8 years ago

Attached patch Bug 1447951 - (DO NOT LAND) Make nsDynamicAtom own its chars. — Details — Splinter Review

First I tried just making nsDynamicAtom own its chars outright, instead of using nsStringBuffer. It seemed like it should help a bit, because we're not storing the mRefCount and mStorageSize, but in practice it was no better and probably a little bit worse. Never fear, a better approach will be forthcoming.

Nicholas Nethercote [inactive]

Assignee

Comment 3

•

8 years ago

Attached patch Store nsDynamicAtom's chars after the end of the object (obsolete) — Details — Splinter Review

This reduces memory usage because we only need one allocation instead of two for the dynamic atom and its chars, and because we don't need to store a refcount and a size. It precludes sharing of chars between dynamic atoms, but we weren't benefiting much from that anyway. This reduces per-process memory usage by up to several hundred KiB on my Linux64 box. One consequence of this change is that we need to allocate + copy in DOMString::SetKnownLiveAtom(), which could make some things slower.

Nicholas Nethercote [inactive]

Assignee

Comment 4

•

8 years ago

Comment on attachment 8986421 [details] [diff] [review] Store nsDynamicAtom's chars after the end of the object I did some measurements with cnn.com and Gmail: BEFORE > parent process > 2,275,104 B (00.94%) -- atoms > ├──2,002,720 B (00.83%) -- dynamic e │ ├──1,808,672 B (00.75%) ── unshared-buffers > │ └────194,048 B (00.08%) ── atom-objects > └────272,384 B (00.11%) ── table > cnn.com process > 671,968 B (00.26%) -- atoms > ├──431,328 B (00.17%) -- dynamic > │ ├──291,104 B (00.11%) ── unshared-buffers > │ └──140,224 B (00.05%) ── atom-objects > └──240,640 B (00.09%) ── table > gmail.com process > 1,329,856 B (00.59%) -- atoms > ├────813,760 B (00.36%) -- dynamic > │ ├──444,736 B (00.20%) ── unshared-buffers > │ └──369,024 B (00.16%) ── atom-objects > └────516,096 B (00.23%) ── table AFTER > parent process > 2,284,736 B (00.93%) -- atoms > ├──2,010,304 B (00.82%) ── dynamic-atoms > └────274,432 B (00.11%) ── table > cnn.com process > 511,888 B (00.19%) -- atoms > ├──275,344 B (00.10%) ── dynamic-atoms > └──236,544 B (00.09%) ── table > gmail.com process > 958,624 B (00.41%) -- atoms > ├──514,048 B (00.22%) ── table > └──444,576 B (00.19%) ── dynamic-atoms Not much difference on the parent process, strangely, but a clear win on cnn.com (~180 KB) and Gmail (~370KB). The majority of this would be the slop avoidance by having a single allocation per dynamic atom instead of two allocations.

Attachment #8986421 - Flags: review?(nfroyd)

Nicholas Nethercote [inactive]

Assignee

Comment 5

•

8 years ago

Attached patch Store nsDynamicAtom's chars after the end of the object — Details — Splinter Review

This reduces memory usage because we only need one allocation instead of two for the dynamic atom and its chars, and because we don't need to store a refcount and a size. It precludes sharing of chars between dynamic atoms, but we weren't benefiting much from that anyway. This reduces per-process memory usage by up to several hundred KiB on my Linux64 box. One consequence of this change is that we need to allocate + copy in DOMString::SetKnownLiveAtom(), which could make some things slower.

Nicholas Nethercote [inactive]

Assignee

Updated

•

8 years ago

Attachment #8986421 - Attachment is obsolete: true

Attachment #8986421 - Flags: review?(nfroyd)

Nicholas Nethercote [inactive]

Assignee

Updated

•

8 years ago

Attachment #8986653 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 6

•

8 years ago

Comment on attachment 8986653 [details] [diff] [review] Store nsDynamicAtom's chars after the end of the object Review of attachment 8986653 [details] [diff] [review]: ----------------------------------------------------------------- This is a nice change; r=me. I thought that you asked bz about a potential benchmark to see if this change had any effect; am I imagining things? If there was a benchmark, did this change produce any interesting benchmark numbers? ::: dom/bindings/DOMString.h @@ +224,5 @@ > + // Dynamic atoms own their own chars, and never have 0 length because > + // nsGkAtoms::_empty is a static atom. > + MOZ_ASSERT(aAtom->GetLength() > 0); > + AsAString() = nsString(aAtom->AsDynamic()->String(), > + aAtom->GetLength()); I think it'd be slightly better to: AsAString().Assign(aAtom->AsDynamic()->String(), ...); here. In a perfect world, I think your version and the Assign version compile to the same code, but I don't think the compiler is quite that intelligent...

Attachment #8986653 - Flags: review?(nfroyd) → review+

Nicholas Nethercote [inactive]

Assignee

Comment 7

•

8 years ago

> This is a nice change; r=me. I thought that you asked bz about a potential > benchmark to see if this change had any effect; am I imagining things? If > there was a benchmark, did this change produce any interesting benchmark > numbers? I wrote something like that in my original bzexport text, but then it got rejected because bz is on vacation and not accepting feedback requests. Perhaps you saw that somehow? Though I can't see any sign of it in the bug... Anyway, I measured on Speedometer and there was no difference. I think it'll be ok.

Pulsebot

Comment 8

•

8 years ago

Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b67973aeb2af Store nsDynamicAtom's chars after the end of the object. r=froydnj

Cristina Coroiu [:ccoroiu]

Comment 9

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b67973aeb2af

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

Bobby Holley (:bholley)

Comment 10

•

8 years ago

Awesome work Nick!

Ryan VanderMeulen (PTO, back July 27)

Updated

•

8 years ago

status-firefox61: affected → wontfix

Boris Zbarsky [:bzbarsky]

Comment 11

•

8 years ago

Nick, did you run the performance test I suggested in comment 1? Specifically, a "read the id" microbenchmark, with the [Pure] annotation removed from the "id" attribute in Element.webidl. Apart from that, the DOMString changes look ok to me. I don't have an obvious better idea to prevent the copy; it would need to happen one way or another once our DOMString got converted to a JSString. I so wish we had a unified string representation. :(

Boris Zbarsky [:bzbarsky]

Updated

•

8 years ago

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Assignee

Comment 12

•

8 years ago

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #11) > Nick, did you run the performance test I suggested in comment 1? > Specifically, a "read the id" microbenchmark, with the [Pure] annotation > removed from the "id" attribute in Element.webidl. No. The only perf test I ran was Speedometer, where there was no difference.

Flags: needinfo?(n.nethercote)

Boris Zbarsky [:bzbarsky]

Comment 13

•

8 years ago

Right, I saw the bit about Speedometer. Speedometer is not the only performance thing we care about, though. Do you plan to run such a test? If not, please let me know and I will try to do it.

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Assignee

Comment 14

•

8 years ago

> If not, please let me know and I will try to do it. That would be very helpful. You have a much clearer idea of what to measure than I do.

Flags: needinfo?(n.nethercote)

Boris Zbarsky [:bzbarsky]

Comment 15

•

8 years ago

So I have done some measurements using this testcase: <script> var count = 20000000; var start = new Date(); var el = document.createElement("span"); el.id = "This is a test"; for (var i = 0; i < count; ++i) { el.id; } var stop = new Date(); document.write((stop - start)/count * 1e6); </script> which reports ns/call for the id get. For all tests I used m-c revision 6c29e4fc89c7 (right before this bug landed) with the [Pure] annotation on Element.id removed. I used a "control" build which had no other changes and a "test" build also had this change in DOMString.h: - SetKnownLiveStringBuffer( - aAtom->AsDynamic()->GetStringBuffer(), aAtom->GetLength()); + AsAString().Assign(aAtom->AsDynamic()->String(), aAtom->GetLength()); I tested several different values for the id: "foo", "This is a test", "aaa...aaa" repeated 15 times, "bbb...bbb" repeated 16 times, "ccc...ccc" repeated 63 times, "ddd...ddd" repeated 64 times, "eee...eee" repeated 100 times, "fff...fff" repeated 101 times. On my (Mac) laptop, I get values like so: control, "foo": ~42 control, "This is a test": ~50 control, "aaa...aaa" (15x): ~52 control, "bbb...bbb" (16x): ~23 control, "ccc...ccc" (63x): ~23 control, "ddd...ddd" (64x): ~23 control, "eee...eee" (100x): ~23 control, "fff...fff" (101x): ~23 test, "foo": ~50 test, "This is a test": ~67 test, "aaa...aaa" (15x): ~68 test, "bbb...bbb" (16x): ~107 test, "ccc...ccc" (63x): ~250 test, "ddd...ddd" (64x): ~136 test, "eee...eee" (100x): ~160 test, "fff...fff" (101x): ~380 The "control" results compared to each other surprised me a bit. That's happening because js::NewMaybeExternalString does the inline-latin1-string optimization before it does the external string thing (see bug 1038099). We allow up to 15 chars in inline latin1 strings, so all three of "foo", "This is a test", and "aaaaaaaaaaaaaaa" end up doing an inline string plus copy here. The difference in times between these cases is the extra work the copy and CanStoreCharsAsLatin1 have to do for the longer string, I expect. For the strings longer than 15 chars, we do end up creating an external string in the "control" build and hence avoid the extra copying and scan bits. Since we also hit the external string cache, and without even having to do the char scan in ExternalStringCache::lookup we avoid the GC pressure and allocation work too. Hence the faster time for those case. In the "test" build, the inline-string cases are a little slower than "control" because there's more work done in copying into the DOMString's internal buffer. That's an autostring, so we don't even try the external string thing and just end up creating allocated inline strings. So it's basically the same work as the "control" build except for the extra copy into the on-stack buffer, which is not too bad. Once we're over 15 chars, the behaior is quite different, though. In the "test" build we keep doing the "copy into stack buffer" thing up to 63 chars. Those cases don't do an inline string, and don't do an external string (because we have no stringbuffer), so end up doing one copy to the stack in the bindings and then another copy to the heap in the JS engine when allocating the string. And there's the string allocation, GCs, etc. Once we hit 64 chars, we stop using the on-stack buffer in DOMString and just copy directly to a heap nsStringBuffer. For the 64- and 100-char cases we end up hitting the external string cache (after doing the scan of the string, because we're not pointer-identical to what's in the cache) and hence times drop somewhat compared to the 63-char case: we're back to not doing a second copy, not doing a GCthing allocation, and not doing more GC. At 101 chars we stop doing the linear scan in ExternalStringCache::lookup and hence never hit the external string cache. Now we're always doing a copy to the heap in the DOM code and then always allocating a new external string. There's still only one copy, but more GC pressure than there was in the cache-hit cases. And the copy itself is more expensive, of course. So the conclusion is that for short values this is a somewhat minor regression, but for values in the range [16,63] we're adding two copies and a bunch of GC pressure, for a significant slowdown. It feels to me like the right solution here might be to have an external string type that hold on to an nsDynamicAtom so we can avoid all that and basically recover the performance we used to have... Thoughts?

Flags: needinfo?(n.nethercote)

Nathan Froyd [:froydnj]

Comment 16

•

8 years ago

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #15) > It feels to me like the right solution here might be to have an external > string type that hold on to an nsDynamicAtom so we can avoid all that and > basically recover the performance we used to have... Thoughts? My initial, mostly unconsidered thought is that I am unexcited about adding more cases to the string code.

Boris Zbarsky [:bzbarsky]

Comment 17

•

8 years ago

I sympathize, but I'm not sure I have any better ideas offhand, short of knowing what the length distribution of things coming through DOMString::SetKnownLiveAtom looks like.... I really wish we had a more-shared JS/C++ string representation. But even that would not help here, because the whole point of this bug was to stop using the shared representation we _did_ have, for dynamic atoms.

Boris Zbarsky [:bzbarsky]

Comment 18

•

8 years ago

So to be clear, I am not proposing we add more cases to nsAString, just to DOMString.

Boris Zbarsky [:bzbarsky]

Updated

•

8 years ago

Depends on: 1473149

Boris Zbarsky [:bzbarsky]

Comment 19

•

8 years ago

I filed bug 1473149 on the dynamic atom external string idea.

Boris Zbarsky [:bzbarsky]

Updated

•

8 years ago

Flags: needinfo?(n.nethercote)

Nathan Froyd [:froydnj]

Updated

•

7 years ago

Depends on: 1481555

Emilio Cobos Álvarez [:emilio]

Updated

•

2 years ago

Updated

•

2 years ago

Bug 1447951 - (DO NOT LAND) Make nsDynamicAtom own its chars. 8 years ago Nicholas Nethercote [inactive] 10.64 KB, patch		Details \| Diff \| Splinter Review
Store nsDynamicAtom's chars after the end of the object 8 years ago Nicholas Nethercote [inactive] 17.48 KB, patch		Details \| Diff \| Splinter Review
Store nsDynamicAtom's chars after the end of the object 8 years ago Nicholas Nethercote [inactive] 20.66 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review