Bug 1449849 Comment 8 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #5)

> > Is the problem that the garbage collector runs on a different thread and can free the JS string during the call into C++?
> 
> No, the problem is that the GC might run due to something that the C++ or something later in the binding implementation does and move the JS string in memory during the call into C++.

OK, so it is not a threading problem, but a problem of potentially any call from C++ into the JS API causing the string to move?

> We could try to optimize out the copy in cases when we know GC is not possible after we start referencing the string data.  In practice this would mean:
> 
> 1) We could only do this for the last argument to the method (or the argument to a setter), because processing of a later argument can nearly always run GC.  There are some exceptions, but they're a little complicated.

My primary use cases for this in the JS-to-DOM direction are:

1) TextEncoder.encodeInto()
2) The setter and the methods that manipulate the 'data' member of a DOM text node
3) TextEncoder.encode()
4) Implementation internals of bug 1449861 (WebIDL string type that shows UTF-8 to the C++ code).

Of these, TextEncoder.encodeInto() also has a later argument after the string: the Uint8Array to write into. The C++ code has to call ComputeLengthAndData() on the Uint8Array. Is that enough to potentially cause the JS engine to move the string data around?
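For context on why the string and the Uint8Array have to be live at the same time, here is a rough, self-contained sketch of the kind of work encodeInto() does: read UTF-16 code units from one buffer and write UTF-8 bytes into another, stopping before a code point that would not fit. `ToyEncodeInto` and `EncodeIntoResult` are hypothetical names for illustration only, not the Gecko implementation, and the plain pointer-plus-length parameters stand in for the Span types mentioned in this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: how much input was consumed and output produced.
struct EncodeIntoResult {
  std::size_t read;     // UTF-16 code units consumed from the source
  std::size_t written;  // UTF-8 bytes stored in the destination
};

// Hypothetical helper (not Gecko code): converts UTF-16 to UTF-8 without ever
// splitting a code point. Unpaired surrogates become U+FFFD.
inline EncodeIntoResult ToyEncodeInto(const char16_t* src, std::size_t srcLen,
                                      std::uint8_t* dst, std::size_t dstLen) {
  assert(src != nullptr || srcLen == 0);
  assert(dst != nullptr || dstLen == 0);
  std::size_t read = 0;
  std::size_t written = 0;
  while (read < srcLen) {
    std::uint32_t cp = src[read];
    std::size_t units = 1;
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: combine with a following low surrogate if present.
      if (read + 1 < srcLen && src[read + 1] >= 0xDC00 &&
          src[read + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (src[read + 1] - 0xDC00);
        units = 2;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // Lone low surrogate.
    }
    const std::size_t need =
        cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (need > dstLen - written) {
      break;  // Not enough room for the whole code point.
    }
    std::uint8_t* out = dst + written;
    if (need == 1) {
      out[0] = static_cast<std::uint8_t>(cp);
    } else if (need == 2) {
      out[0] = static_cast<std::uint8_t>(0xC0 | (cp >> 6));
      out[1] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    } else if (need == 3) {
      out[0] = static_cast<std::uint8_t>(0xE0 | (cp >> 12));
      out[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      out[2] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    } else {
      out[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
      out[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
      out[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      out[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    }
    written += need;
    read += units;
  }
  return {read, written};
}
```

The loop itself never calls back into anything that could run GC; the open question above is only about what happens between obtaining the two buffers.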

As noted in comment 0, I expect the JS string data to be exposed as Span<const char> or Span<const char16_t>. That is, I don't expect this feature to provide general nsCString semantics. However, the rules for how long the JS string data pointed to by Span<const char> or Span<const char16_t> stays valid should be clear enough that people don't introduce use-after-frees.
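To make the lifetime concern concrete, here is a toy model of the hazard, assuming a string whose storage can be relocated the way a moving GC might relocate it. `ToyGcString`, `SimulateMovingGc`, `StorageAddress` and `CopyOut` are all hypothetical names invented for this sketch; none of them is SpiderMonkey or Gecko API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a JS string whose character storage a moving GC
// may relocate at any point where GC is allowed to run.
class ToyGcString {
 public:
  explicit ToyGcString(const std::string& s)
      : len_(s.size()), data_(new char[s.size()]) {
    std::memcpy(data_.get(), s.data(), len_);
  }

  // Borrowed view of the characters: valid only until the next relocation.
  const char* Data() const { return data_.get(); }
  std::size_t Length() const { return len_; }

  // Simulates a moving GC: copy the characters to a fresh allocation, then
  // release the old one. Pointers obtained from Data() earlier now dangle.
  void SimulateMovingGc() {
    std::unique_ptr<char[]> fresh(new char[len_]);
    assert(fresh.get() != data_.get());  // Both allocations are live here.
    std::memcpy(fresh.get(), data_.get(), len_);
    data_ = std::move(fresh);
  }

 private:
  std::size_t len_;
  std::unique_ptr<char[]> data_;
};

// Records where the storage currently lives, as an integer, so the value can
// be compared later without touching a dangling pointer.
inline std::uintptr_t StorageAddress(const ToyGcString& s) {
  return reinterpret_cast<std::uintptr_t>(s.Data());
}

// The copy that exists today: safe across relocation, but costs an allocation
// and a memcpy.
inline std::string CopyOut(const ToyGcString& s) {
  return std::string(s.Data(), s.Length());
}
```

In this model, a pointer (or Span) taken from `Data()` is only safe to read until the next `SimulateMovingGc()` call, while the result of `CopyOut()` stays valid regardless. The rule that would need to be documented is the real-world equivalent of "no relocation can happen while the borrowed view is in use".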

. . .

In the DOM-to-JS data flow direction, I care mainly about the getter for the 'data' member of DOM text nodes. Other than that, there might be some opportunities in things like getters for always-ASCII (Punycode or percent-encoded) URL components. (There are a bunch of other APIs that always return ASCII, but those probably aren't perf-sensitive.)
