Closed Bug 1407117 Opened 3 years ago Closed 3 years ago

Simplify static atom representation

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file, 3 obsolete files)

Currently nsAtom::mString points to the interior of an nsStringBuffer. For
static atoms this requires the use of nsFakeStringBuffer, which is pretty
gross.

This patch changes things so that nsAtom::mString points to a static char
buffer for static atoms. This simplifies a number of things:

- nsFakeStringBuffer and CheckStaticAtomSizes are no longer needed.

- FakeBufferRefCountHelper is no longer needed.

- nsAtom's constructor for static atoms is simpler.

- RegisterStaticAtoms() is simpler.

On the flip-side, a couple of things get more complicated.

- nsAtom::ToString() treats static and dynamic atoms differently.

- nsAtom::GetStringBuffer() is now only valid for dynamic atoms. This
  function is only used in two places, both involving DOMString, so those
  locations are updated appropriately.

On Linux64 this change reduces the size of the binary by 8752 B, and moves
81968 B from the .data to the .rodata section, where it can be shared between
processes.
bz: please review the dom/ changes.

froydnj: please review everything else.
Attachment #8916879 - Flags: review?(nfroyd)
Attachment #8916879 - Flags: review?(bzbarsky)
Comment on attachment 8916879 [details] [diff] [review]
Simplify static atom representation

Review of attachment 8916879 [details] [diff] [review]:
-----------------------------------------------------------------

Moving all that stuff to .rodata is nice...that just comes about because we're able to apply `const` to the string buffer, whereas we weren't/couldn't before, correct?

::: dom/base/nsAttrValue.h
@@ +544,5 @@
>      case eAtom:
>      {
>        nsAtom *atom = static_cast<nsAtom*>(GetPtr());
> +      if (atom->IsStaticAtom()) {
> +        aResult.AsAString() = nsDependentString(atom->GetUTF16String());

Use aAtom->GetLength() here, so we don't need to compute the strlen?

::: dom/bindings/DOMString.h
@@ +179,5 @@
>      MOZ_ASSERT(!mStringBuffer, "Setting stringbuffer twice?");
>      MOZ_ASSERT(aAtom || aNullHandling != eNullNotExpected);
>      if (aNullHandling == eNullNotExpected || aAtom) {
> +      if (aAtom->IsStaticAtom()) {
> +        AsAString() = nsDependentString(aAtom->GetUTF16String());

Use aAtom->GetLength() here, so we don't need to compute the strlen?

::: xpcom/ds/nsAtom.h
@@ +56,4 @@
>    {
>      // See the comment on |mString|'s declaration.
> +    if (IsStaticAtom()) {
> +      aString = nsDependentString(mString);

We can use nsDependentString(mLength, mLength) here to avoid taking strlen(mString) every time, correct?

Should we out-of-line this function now?  It's even bigger than it was before...

::: xpcom/ds/nsStaticAtom.h
@@ +19,2 @@
>  #define NS_STATIC_ATOM_BUFFER(buffer_name, str_data) \
> +  static const char16_t buffer_name[sizeof(str_data)] = u"" str_data;

WDYT about making this more robust and using the old sizeof(str_data)/sizeof(str_data[0]) trick here, since we don't have any checking that people are really, truly passing in 8-bit strings here?  Alternatively, we could have static_asserts on str_data[0], but that feels like a bit much.

OTOH, we already had sizeof(str_data) before, and that seemed to work out OK, so maybe the above paranoia is not justifiable.
Attachment #8916879 - Flags: review?(nfroyd) → review+
Comment on attachment 8916879 [details] [diff] [review]
Simplify static atom representation

So the tradeoff here is that now we make a copy of the string, instead of sharing, when we convert an atom to an nsString or DOMString, right?

Not only that, but we end up doing an strlen-equivalent, though we could fix that by actually passing our mLength to nsDependentString... Or just using nsDependentAtomString(this), to avoid duplicating its logic.

That said, another approach might be to store an array of const nsLiteralString, not an array of u"".  It would have a bit data, and I'm not sure whether it could still be rodata...  But it would allow us to continue the non-copying behavior, though it might need some changes to DOMString to be able to represent an nsLiteralString that's guaranteed to outlive the DOMString or something. 

Thoughts?
Flags: needinfo?(n.nethercote)
Attachment #8916879 - Flags: review?(bzbarsky)
> Moving all that stuff to .rodata is nice...that just comes about because
> we're able to apply `const` to the string buffer, whereas we
> weren't/couldn't before, correct?

Yes. I originally didn't add the `const` and didn't see this benefit, but then I added it and it happened.


> ::: xpcom/ds/nsStaticAtom.h
> @@ +19,2 @@
> >  #define NS_STATIC_ATOM_BUFFER(buffer_name, str_data) \
> > +  static const char16_t buffer_name[sizeof(str_data)] = u"" str_data;
> 
> WDYT about making this more robust and using the old
> sizeof(str_data)/sizeof(str_data[0]) trick here, since we don't have any
> checking that people are really, truly passing in 8-bit strings here? 
> Alternatively, we could have static_asserts on str_data[0], but that feels
> like a bit much.

Are you worried that someone might use a u"" literal as an argument? If so, I'd prefer a static assert on the size of str_data[0] and forcing 8-bit strings rather than permitting 16-bit strings.
Flags: needinfo?(n.nethercote)
> So the tradeoff here is that now we make a copy of the string, instead of
> sharing, when we convert an atom to an nsString or DOMString, right?

No. For dynamic atoms the behaviour is unchanged (modulo the extra IsStatic check). For static atoms we create a dependent string, so it's still sharing rather than copying.
 
> Not only that, but we end up doing an strlen-equivalent, though we could fix
> that by actually passing our mLength to nsDependentString... Or just using
> nsDependentAtomString(this), to avoid duplicating its logic.

That's a good idea and I will do it.

> That said, another approach might be to store an array of const
> nsLiteralString, not an array of u"".  It would have a bit data, and I'm not
> sure whether it could still be rodata...  But it would allow us to continue
> the non-copying behavior, though it might need some changes to DOMString to
> be able to represent an nsLiteralString that's guaranteed to outlive the
> DOMString or something. 

That sounds complicated and I don't think it will help!
New version has the following changes:

- De-inlines ToString().
  
- Adds a static assert that all static atom string literals are 8-bit.

- Uses nsDependentAtomString and mLength to avoid strlen() calls when creating
  dependent strings from atoms.
Attachment #8917201 - Flags: review?(bzbarsky)
Attachment #8916879 - Attachment is obsolete: true
Blocks: 1257126
> For static atoms we create a dependent string, so it's still sharing rather than copying.

It's copying.  The dependent string constructor doesn't copy, but assignment of a dependent string into an nsAString _does_, because the place you're assigning to has no idea what the lifetime of the dependent thing is, so has to assume the worst and copy the data.

The old code just bumped the refcount on the array buffer (which was a no-op for the static atom case).

> and I don't think it will help

In what sense?  Assignment of a _literal_ string into an nsAString does not copy, last I checked, because we guarantee that literal strings have the same lifetime as the process.

Maybe if we know the data is static we could just create an nsLiteralString to use for the assignment right at the assignment point.  Or even just call the two-arg form of AssignLiteral?  Worth checking with Nathan whether that's reasonable.  We don't want usage of that to spread too much, but this case seems like a decent fit for it.
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #7)
> 
> It's copying.  The dependent string constructor doesn't copy, but assignment
> of a dependent string into an nsAString _does_, because the place you're
> assigning to has no idea what the lifetime of the dependent thing is, so has
> to assume the worst and copy the data.

I see. Thank you for the correction.

> Maybe if we know the data is static we could just create an nsLiteralString
> to use for the assignment right at the assignment point.

I think you can't create an nsLiteralString from something other than a string literal...

> Or even just call the two-arg form of AssignLiteral?

... but that should work. Good suggestion.
I updated the patch to use AssignLiteral() in the three relevant places.
Attachment #8917256 - Flags: review?(nfroyd)
Attachment #8917256 - Flags: review?(bzbarsky)
Attachment #8917201 - Attachment is obsolete: true
Attachment #8917201 - Flags: review?(bzbarsky)
Comment on attachment 8917256 [details] [diff] [review]
Simplify static atom representation

This is probably OK, but can you please file a followup on me to add first-class literal support to DOMString?  We're still ending up constructing, and destroying, a wholly unnecessary nsString in there.
Attachment #8917256 - Flags: review?(bzbarsky) → review+
Comment on attachment 8917256 [details] [diff] [review]
Simplify static atom representation

Review of attachment 8917256 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know how the suggestion below would play into bz's request for first-class literal support for DOMString...

::: dom/base/nsAttrValue.h
@@ +546,5 @@
>        nsAtom *atom = static_cast<nsAtom*>(GetPtr());
> +      if (atom->IsStaticAtom()) {
> +        // AssignLiteral() lets us assign without copying. This isn't a string
> +        // literal, but it's a static atom and thus has an unbounded lifetime,
> +        // which is what's important.

WDYT about adding:

NS_AssignAtomToString(nsAtom*, nsAString&);

declared in nsAtom.h, that does what this code does, so we can put the implementation, and more importantly, the explanation of what we're doing, in a single place (versus the several places we have it in this patch)?  Alternatively, we could have nsString::Assign(nsAtom*) or something like that, but then the string code depends on bits of the atom implementation, which doesn't seem like a good dependency setup.  Either way, the user has to remember to use this special case for atoms so we do the efficient thing...
Attachment #8917256 - Flags: review?(nfroyd)
Attachment #8917256 - Flags: review?(bzbarsky)
Attachment #8917256 - Flags: review+
Comment on attachment 8917256 [details] [diff] [review]
Simplify static atom representation

Sigh, bugzilla.
Attachment #8917256 - Flags: review?(bzbarsky) → review+
It would play OK.  I'd just end up not using AssignAtomToString at all, and instead changing DOMString to hold on to the literal buffer directly instead of sticking it into an nsAString.
Blocks: 1407858
I filed bug 1407858 for the first-class literal support in DOMString, and I will add XXX comments that refer to that bug in the two relevant places in the code.

I decided against adding NS_AssignAtomToString() because once bug 1407858 is done there would only be one use of it (in nsAtom::ToString()).
When I put AssignLiteral() in the three places discussed above I get assertion
failures like this before the browser even starts:

> Assertion failure: aStringBuffer (Why are we getting null?), at /builds/worker
/workspace/build/src/obj-firefox/dist/include/mozilla/dom/DOMString.h:139
> #01: mozilla::dom::ElementBinding::get_localName [dom/bindings/ElementBinding.
cpp::539]
> #02: mozilla::dom::GenericBindingGetter [dom/bindings/BindingUtils.cpp:2910]

E.g. see the following try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=809c5c1a48fdc27737d9c10f
ce97e7fb37337638

The problem is that nsINode::GetLocalName() does this:

> aNodeName.SetStringBuffer(nsStringBuffer::FromString(nodeName),
>                           nodeName.Length());

It assumes that nodeName is an nsString with a shared buffer, but because of
the AssignLiteral() calls elsewhere in the patch that's no longer true.

Some other places that call SetStringBuffer() first check for a shared buffer,
though there is inconsistency all around. I'm not sure if it's by luck or 
design that this problem hasn't come up prior to now. 

I've modified the patch so that all the places that call SetStringBuffer() now
do something more careful that allows for other possibilities, mostly by
calling SetOwnedString() or SetOwnedAtom() instead. (This also have the benefit
of reducing the two XXX comments in the previous patch to one.)

bz, how does this look? It's green on try. I've put "njn:" comments throughout 
to explain some of my thinking and uncertainties.
Attachment #8918108 - Flags: review?(bzbarsky)
Attachment #8917256 - Attachment is obsolete: true
Comment on attachment 8918108 [details] [diff] [review]
Simplify static atom representation

>+        // njn: in the eString case are we guaranteed to always have an
>+        // nsStringBuffer

Yes, we are.  The only places we set our type to eString/eStringBase are the two nsAttrValue::SetTo taking an nsAttrValue and nsAString, and both have a stringbuffer.  We should probably document this invariant clearly in nsAttrValue.h

>+      // njn: is aResult guaranteed to be empty before the assignment?

It should be, in theory.  The API contract for DOMString is that you can only assign to it once.

>+    // njn: fixed assumption that nodeName is a shared string.

Right.  So this used to always be true due to how the string was created, but isn't true anymore.

This change is a little unfortunate in the sense that it will cause a copy in the "came from a static atom" common case.  But bug 1407858 will help with that.

This comment can presumably go away, right?

>+    // njn: (But is aNodeName guaranteed to be empty before the assignment?)

Should be if the caller is following the API contract, yes.

>+  // njn: well, it could be a non-shared string, though it's not likely in this

Sort of.  This test is explicitly here to test the stringbuffer codepath.  If it ever stops testing that, we want to know.

But you're right, that the caller could invoke this test in such a way that it would not test this codepath.  In that case we'd want to fix the caller.

>+    // njn: this one appears to be correct already, though it might be nicer to
>+    //      have this code SetEphemeralStringBuffer-else-Assign sequence
>+    //      encapsulated in a DOMString method

That's fair.  File a followup, please?  We only had one consumer so far, so...

r=me
Attachment #8918108 - Flags: review?(bzbarsky) → review+
https://hg.mozilla.org/mozilla-central/rev/4c3ce57ff29c
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
You need to log in before you can comment on or make changes to this bug.