Closed Bug 1032726 Opened 6 years ago Closed 6 years ago

Make DOM bindings work with Latin1 strings and nursery-allocated strings

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 2 obsolete files)

I prototyped this last week and discussed it with bz, and I got it to the point where it was a pretty small slowdown (in the worst case).
ConvertJSValueToByteString scans the string for chars > 255, we can skip this for Latin1 strings. The patch also adds the AutoCheckCannotGC API, so that the GC analysis checks that we don't use the chars across a GC call. I had to move the ThrowErrorMessage outside the loop to make that work.

r? bz for the DOM changes, Terrence for the rest.
Attachment #8448606 - Flags: review?(terrence)
Attachment #8448606 - Flags: review?(bzbarsky)
Note that Codegen.py still generates code that calls GetTwoByteAtomChars unconditionally. I'll fix this in a later patch.
Attachment #8448621 - Flags: review?(terrence)
Attachment #8448621 - Flags: review?(bzbarsky)
The inner loop can easily be templatized to work on either Latin1Char or jschar with no overhead.
Attachment #8448662 - Flags: review?(terrence)
Attachment #8448662 - Flags: review?(bzbarsky)
Comment on attachment 8448606 [details] [diff] [review]
Part 1 - ConvertJSValueToByteString

r=me
Attachment #8448606 - Flags: review?(bzbarsky) → review+
Comment on attachment 8448621 [details] [diff] [review]
Part 2 - GetArrayIndexFromId

>+                  ${argName}.SetData(js::GetTwoByteAtomChars(nogc, atom), js::GetAtomLength(atom));

How do we know the atom is 2-byte here?

r=me with that addressed.
Attachment #8448621 - Flags: review?(bzbarsky) → review+
Oh, I see, that's what you plan to fix in a later patch.  OK.
Comment on attachment 8448662 [details] [diff] [review]
Part 3 - FindEnumStringIndex

>+    if (JS_StringHasLatin1Chars(str)) {

We'll want an inline API for that eventually, then use it here?  Followup is ok.

r=me
Attachment #8448662 - Flags: review?(bzbarsky) → review+
The main goal is to add an inlineable js::CopyStringChars friend API. This requires some extra machinery, but this stuff is perf-critical and we need this to keep the regression as small as possible.
Attachment #8448767 - Flags: review?(terrence)
Attached patch Part 5 - Fake(Dependent)String (obsolete) — Splinter Review
This is the main patch. It renames FakeDependentString to FakeString. Inline storage (64 chars atm) is used for short strings. nsStringBuffer is used for longer strings, this potentially avoids copies later on because nsStringBuffer can be shared.

In a follow-up patch I want to move this class + AssignJSString to its own header, so that we can use it in other code instead of nsDependentJSString.

As I mentioned last week, with this patch a (worst case) getAttribute micro-benchmark is about 9% slower with a 32-bit opt build on OS X.
Attachment #8448839 - Flags: review?(bzbarsky)
Attached patch Part 5 - Fake(Dependent)String (obsolete) — Splinter Review
Sorry for the bugspam; I always notice something after requesting review :(
Attachment #8448839 - Attachment is obsolete: true
Attachment #8448839 - Flags: review?(bzbarsky)
Attachment #8448843 - Flags: review?(bzbarsky)
Blocks: 1032926
Comment on attachment 8448843 [details] [diff] [review]
Part 5 - Fake(Dependent)String

>+    if (len + 1 <= sInlineCapacity) {

This is subject to integer overflow, right?  Better to check |len <= sInlineCapacity - 1|.

>+    nsRefPtr<nsStringBuffer> buf = nsStringBuffer::Alloc((len + 1) * sizeof(nsString::char_type));

This too seems subject to integer overflow...  Of course nsStringBuffer::Alloc will add more stuff to it.  :(

We should actually verify that (len + 1) * sizeof(nsString::char_type) + sizeof(nsStringBuffer) will fit in a uint32_t.

>+    SetData(static_cast<nsString::char_type*>(buf->Data()), len);
>+    mozilla::unused << buf.forget(); // Take ownership.

How about:

  SetData(static_cast<nsString::char_type*>(buf.forget().take()->Data()), len);

and then no need for the second line.

>+    mFlags = nsString::F_SHARED | nsString::F_TERMINATED;

Please document that this method leaves the string in an inconsistent state where it has F_TERMINATED set but no actual null terminator and that the caller is expected to put the null terminator in appropriately.

>+static bool
>+AssignJSString(JSContext *cx, FakeString &dest, JSString *s)

inline bool?

>+  if (!dest.SetCapacityAndLength(len) ||
>+      !js::CopyStringChars(cx, dest.BeginWriting(), s, len))
>+  {
>+    return false;

Doesn't failure to SetCapacityAndLength need to throw an exception?

>-            if isMember == "Variadic":

No, I think you want to keep that branching.  In particular, for variadics of short strings this will cause an extra copy (complete with malloc) here that's not really desirable.

The comment probably should be updated with s/have to make a copy/may have to make a copy/

r=me with those fixed
Attachment #8448843 - Flags: review?(bzbarsky) → review+
Comment on attachment 8448606 [details] [diff] [review]
Part 1 - ConvertJSValueToByteString

Review of attachment 8448606 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8448606 - Flags: review?(terrence) → review+
Comment on attachment 8448621 [details] [diff] [review]
Part 2 - GetArrayIndexFromId

Review of attachment 8448621 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8448621 - Flags: review?(terrence) → review+
Comment on attachment 8448662 [details] [diff] [review]
Part 3 - FindEnumStringIndex

Review of attachment 8448662 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8448662 - Flags: review?(terrence) → review+
Comment on attachment 8448767 [details] [diff] [review]
Part 4 - Beef up friend APIs

Review of attachment 8448767 [details] [diff] [review]:
-----------------------------------------------------------------

I would not be opposed to moving all of this new API to js/public/String.h and just having inlinable string methods as part of the official API, although we can always do that later.

::: js/src/jsfriendapi.h
@@ +574,5 @@
>      const JSJitInfo *jitinfo;
>      void *_1;
>  };
>  
> +struct String {

{ on newline.
Attachment #8448767 - Flags: review?(terrence) → review+
Thanks for the fast reviews!

(In reply to Boris Zbarsky [:bz] from comment #11)
> r=me with those fixed

I'll post a follow-up patch to address your comments + rebase the patch on top of Peter's patch.

(In reply to Terrence Cole [:terrence] from comment #15)
> I would not be opposed to moving all of this new API to js/public/String.h
> and just having inlinable string methods as part of the official API,
> although we can always do that later.

Yeah, Waldo suggested that too... It's probably a good idea and will make some callers nicer; I'll file a follow-up bug.
Depends on: 1032748
Comment on attachment 8448843 [details] [diff] [review]
Part 5 - Fake(Dependent)String

>+// A struct that has the same layout as an nsString but much faster
>+// constructor and destructor behavior. FakeString uses inline storage
>+// for small strings and a nsStringBuffer for longer strings.
>+struct FakeString {
>+  FakeString() :
>+    mFlags(nsString::F_TERMINATED)
>   {
>   }
> 
>+  ~FakeString() {
>+    if (mFlags & nsString::F_SHARED) {
>+      nsStringBuffer::FromData(mData)->Release();
>+    }
>+  }
[Is this still much faster than nsString?]
(In reply to neil@parkwaycc.co.uk from comment #18)
> [Is this still much faster than nsString?]

Yes, definitely. Last week I prototyped this and at first I used a class that inherited from nsAutoString and it was much slower (even for small strings that should fit in the inline storage).
I wanted to post an interdiff, but the rebase touched a bunch of code so I think it's best to post the new patch.
Attachment #8448843 - Attachment is obsolete: true
Attachment #8449405 - Flags: review?(bzbarsky)
Parts 2 + 3:

https://hg.mozilla.org/integration/mozilla-inbound/rev/60e70f9d98cb
https://hg.mozilla.org/integration/mozilla-inbound/rev/219b6b2a4a29

(In reply to Boris Zbarsky [:bz] from comment #7)
> We'll want an inline API for that eventually, then use it here?  Followup is
> ok.

Yup, will post a follow-up patch for that.
> [Is this still much faster than nsString?]

Yes.  nsString has out-of-line (effectively, in the case of the dtor) ctor/dtor, the ctor does a bit more work than FakeString's, and the dtor also does a bit more work.
Comment on attachment 8449405 [details] [diff] [review]
Part 5 - Fake(Dependent)String, v2

>-  nsDependentString string;
>+  binding_detail::FakeString string;

No, we really do want to keep the nsDependentString here.

>@@ -4440,44 +4440,35 @@ def getJSToNativeConversionInfo(type, de

>             # We have to make a copy, except in the variadic case, because our

This comment is no longer relevant.  Perhaps replace it with:

  # Convert directly into the nsString member we have.

?

>             return JSToNativeConversionInfo(
>                 fill(

I don't think this complexity is needed.  There is no more $*{assign} or "str" temporary, so all this is doing is returning $*{convert} inside a now-unnecessary block.  We should just do:

             return JSToNativeConversionInfo(
                 getConversionCode("${declName}"),
                 declType=declType,
                 dealWithOptional=isOptional)

r=me with those fixed.
Attachment #8449405 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #23)
> No, we really do want to keep the nsDependentString here.

We can no longer use a nsDependentString for JS strings right, because we might GC or have Latin1 strings? Am I missing something?
Comment on attachment 8449405 [details] [diff] [review]
Part 5 - Fake(Dependent)String, v2

>     pval.set(JS::StringValue(s));  // Root the new string.
>   }
> 
>-  size_t len;
>-  const jschar *chars = JS_GetStringCharsZAndLength(cx, s, &len);
>-  if (!chars) {
>-    return false;
>-  }
>-
>-  result.Rebind(chars, len);
>-  return true;
>+  return binding_detail::AssignJSString(cx, result, s);
If you're copying, rather than rebinding, the string, do you still need to root it? (Should you try to avoid copying in the two byte char cases?)
> We can no longer use a nsDependentString for JS strings right

Oh, right.  Just use nsString or nsAutoString there there.

> If you're copying, rather than rebinding, the string, do you still need to root it? 

Only if some part of AssignJSString can GC.  If not, we can nix this pval argument altogether, I think.

> (Should you try to avoid copying in the two byte char cases?)

That would fail for the nursery-allocated string case, right?  Because if we don't copy and pass a pointer directly into the inline nursery string into Gecko and then cause a GC, the string can move and suddenly we're screwed.
(In reply to Boris Zbarsky from comment #26)
> > If you're copying, rather than rebinding, the string, do you still need to root it? 
> 
> Only if some part of AssignJSString can GC.  If not, we can nix this pval
> argument altogether, I think.
> 
> > (Should you try to avoid copying in the two byte char cases?)
> 
> That would fail for the nursery-allocated string case, right?  Because if we
> don't copy and pass a pointer directly into the inline nursery string into
> Gecko and then cause a GC, the string can move and suddenly we're screwed.

I don't know which case(s) apply here, I was just concerned that we were rooting a JS string and then copying it, and I wasn't sure which, if either, was wrong.

(Unfortunately DXR isn't very good with generated files so I don't even know whether it's a good idea to keep trying to share code between FakeString and the old FakeDependentString.)
Basically, the JS folks do not want us holding char pointers into JS strings across GC.

nsAString provides a char pointer.

That means either copying when we produce an nsAString or no longer using nsAString for Web IDL API string arguments and instead using a class that keeps alive a JSString* and asks it for the chars as needed, with no-gc static asserts or something.

The former is way more doable on a sane timeframe.  :(
(In reply to neil@parkwaycc.co.uk from comment #25)
> If you're copying, rather than rebinding, the string, do you still need to
> root it?

(In reply to Boris Zbarsky [:bz] from comment #26)
> Only if some part of AssignJSString can GC.  If not, we can nix this pval
> argument altogether, I think.

Oh, good point. AssignJSString can't GC AFAIK (rope flattening is fallible but won't GC), I'll remove pval in the followup patch I'm about to post.
After removing pval from ConvertJSValueToString I could remove mutableVal from a bunch of places in Codegen.py (I think..)
Attachment #8449616 - Flags: review?(bzbarsky)
Comment on attachment 8449616 [details] [diff] [review]
Part 6 - Followup changes

> ConvertJSValueToByteString(JSContext* cx, JS::Handle<JS::Value> v,
>   if (!JS_StringHasLatin1Chars(s)) {

Worth using the inline thing here too?  Might not be....

r=me
Attachment #8449616 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky from comment #28)
> Basically, the JS folks do not want us holding char pointers into JS strings
> across GC.
Fair enough. Copying seems sane in that case, except for external strings.

(In reply to Jan de Mooij from comment #29)
> Oh, good point. AssignJSString can't GC AFAIK (rope flattening is fallible
> but won't GC), I'll remove pval in the followup patch I'm about to post.
(Seems a waste to flatten the string and then copy it again.)

(From update of attachment 8449616 [details] [diff] [review])
> // pval must not be null and must point to a rooted JS::Value
> template<typename T>
> static inline bool
> ConvertJSValueToString(JSContext* cx, JS::Handle<JS::Value> v,
>-                       JS::MutableHandle<JS::Value> pval,
Nit: comment is out of date.
Part 6:

https://hg.mozilla.org/integration/mozilla-inbound/rev/a133199d09e2

(In reply to Boris Zbarsky [:bz] from comment #31)
> > ConvertJSValueToByteString(JSContext* cx, JS::Handle<JS::Value> v,
> >   if (!JS_StringHasLatin1Chars(s)) {
> 
> Worth using the inline thing here too?  Might not be....

Done.

(In reply to neil@parkwaycc.co.uk from comment #34)
> Nit: comment is out of date.

Fixed.
Keywords: leave-open
Comment on attachment 8449405 [details] [diff] [review]
Part 5 - Fake(Dependent)String, v2

>+  if (MOZ_UNLIKELY(!dest.SetCapacity(len + 1, mozilla::fallible_t()))) {
>+    JS_ReportOutOfMemory(cx);
>+    return false;
>+  }
>+  if (MOZ_UNLIKELY(!js::CopyStringChars(cx, dest.BeginWriting(), s, len))) {
>+    return false;
>+  }
>+  dest.BeginWriting()[len] = '\0';
>+  dest.SetLength(len);
>+  return true;
I overlooked that this is probably overkill; SetLength calls SetCapacity anyway, and SetCapacity allows for and writes the null terminator, so you can just write something like this:
if (MOZ_UNLIKELY(!dest.SetLength(len, mozilla::fallible_t()))) {
  JS_ReportOutOfMemory(cx);
  return false;
}
return js::CopyStringChars(cx, dest.BeginWriting(), s, len);
(In reply to neil@parkwaycc.co.uk from comment #37)
> I overlooked that this is probably overkill; SetLength calls SetCapacity
> anyway, and SetCapacity allows for and writes the null terminator, so you
> can just write something like this:

Ah interesting. This won't work atm because FakeString's SetLength implementation doesn't behave like this, but we could change that to match ns*String more...
Flags: needinfo?(jdemooij)
Depends on: 1041140
(In reply to Jan de Mooij [:jandem] from comment #38)
> Ah interesting. This won't work atm because FakeString's SetLength
> implementation doesn't behave like this, but we could change that to match
> ns*String more...

Neil fixed this in bug 1041140; clearing needinfo.
Flags: needinfo?(jdemooij)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.