(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #5)
> > Is the problem that the garbage collector runs on a different thread and can free the JS string during the call into C++?
>
> No, the problem is that the GC might run due to something that the C++ or something later in the binding implementation does and move the JS string in memory during the call into C++.

OK, so not a threading problem but a problem of potentially any call from C++ to JS API causing the string to move?

> We could try to optimize out the copy in cases when we know GC is not possible after we start referencing the string data. In practice this would mean:
>
> 1) We could only do this for the last argument to the method (or the argument to a setter), because processing of a later argument can nearly always run GC. There are some exceptions, but they're a little complicated.

My primary use cases for this in the JS-to-DOM direction are:

1) TextDecoder.encodeInto()
2) The setter and the methods that manipulate the 'data' member of a DOM text node
3) TextDecoder.encode()
4) Implementation internals of bug 1449861 (WebIDL string type that shows UTF-8 to the C++ code).

Of these, TextDecoder.encodeInto() also has a later argument after the string: The Uint8Array to write into. The C++ code has to call ComputeLengthAndData() on the Uint8Array. Is that enough to potentially cause the JS engine to move the string data around?

As noted in comment 0, I expect the JS string data to be exposed as Span<const char> or Span<const char16_t>. That is, I don't expect this feature to provide general nsCString semantics. However, the rules for the JS string data pointed to by Span<const char> or Span<const char16_t> not going away should be clear enough for people to understand so as not to program use-after-frees.

. . .

In the DOM-to-JS data flow direction, I care mainly about the getter for the 'data' member of DOM text nodes.
Other than that, there might be some opportunities on things like getters for always-ASCII (Punycode or percent-encoded) URL components. (There are a bunch of other APIs that always return ASCII, but those probably aren't perf-sensitive.)
Bug 1449849 Comment 8 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #5)
> > Is the problem that the garbage collector runs on a different thread and can free the JS string during the call into C++?
>
> No, the problem is that the GC might run due to something that the C++ or something later in the binding implementation does and move the JS string in memory during the call into C++.

OK, so not a threading problem but a problem of potentially any call from C++ to JS API causing the string to move?

> We could try to optimize out the copy in cases when we know GC is not possible after we start referencing the string data. In practice this would mean:
>
> 1) We could only do this for the last argument to the method (or the argument to a setter), because processing of a later argument can nearly always run GC. There are some exceptions, but they're a little complicated.

My primary use cases for this in the JS-to-DOM direction are:

1) TextEncoder.encodeInto()
2) The setter and the methods that manipulate the 'data' member of a DOM text node
3) TextEncoder.encode()
4) Implementation internals of bug 1449861 (WebIDL string type that shows UTF-8 to the C++ code).

Of these, TextEncoder.encodeInto() also has a later argument after the string: The Uint8Array to write into. The C++ code has to call ComputeLengthAndData() on the Uint8Array. Is that enough to potentially cause the JS engine to move the string data around?

As noted in comment 0, I expect the JS string data to be exposed as Span<const char> or Span<const char16_t>. That is, I don't expect this feature to provide general nsCString semantics. However, the rules for the JS string data pointed to by Span<const char> or Span<const char16_t> not going away should be clear enough for people to understand so as not to program use-after-frees.

. . .

In the DOM-to-JS data flow direction, I care mainly about the getter for the 'data' member of DOM text nodes.
Other than that, there might be some opportunities on things like getters for always-ASCII (Punycode or percent-encoded) URL components. (There are a bunch of other APIs that always return ASCII, but those probably aren't perf-sensitive.)
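
To make the use-after-free concern concrete, here is a minimal toy sketch of why a borrowed view into movable string storage is only valid until the next point where GC can run. Nothing in it is SpiderMonkey or Gecko API: ToyJSString, BorrowedChars (a stand-in for something like Span<const char16_t>), SimulateMovingGC() and the two helper functions are all hypothetical names invented for illustration, and the "GC" is just a reallocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a borrowed view such as Span<const char16_t>.
// It owns nothing; it is only valid while the storage stays where it is.
struct BorrowedChars {
  const char16_t* data;
  std::size_t length;
};

// Toy model (an assumption, not a real engine type) of a JS string whose
// character storage a moving collector is allowed to relocate.
class ToyJSString {
 public:
  explicit ToyJSString(const std::u16string& s)
      : mLength(s.size()), mChars(new char16_t[s.size()]) {
    std::copy(s.begin(), s.end(), mChars.get());
  }

  // Zero-copy access: cheap, but dangles after SimulateMovingGC().
  BorrowedChars Borrow() const { return BorrowedChars{mChars.get(), mLength}; }

  // Copying access: costs an allocation, but survives any later GC.
  std::u16string Copy() const { return std::u16string(mChars.get(), mLength); }

  // Address of the current storage, as an integer so that it can be
  // compared safely even after the old storage has been freed.
  std::uintptr_t StorageAddress() const {
    return reinterpret_cast<std::uintptr_t>(mChars.get());
  }

  // Simulates a moving GC: the characters are copied to fresh storage and
  // the old storage is freed. Any BorrowedChars taken earlier now dangles.
  void SimulateMovingGC() {
    std::unique_ptr<char16_t[]> moved(new char16_t[mLength]);
    std::copy(mChars.get(), mChars.get() + mLength, moved.get());
    mChars = std::move(moved);
  }

 private:
  std::size_t mLength;
  std::unique_ptr<char16_t[]> mChars;
};

// Returns true if the storage moved, i.e. a view borrowed before this call
// would point at freed memory afterwards.
bool StorageMovedAcrossGC(ToyJSString& str) {
  const std::uintptr_t before = str.StorageAddress();
  str.SimulateMovingGC();
  return str.StorageAddress() != before;
}

// The copying pattern: take an owned copy before anything that may run GC.
std::u16string CopyThenRunGC(ToyJSString& str) {
  std::u16string owned = str.Copy();
  str.SimulateMovingGC();
  return owned;  // still valid, because it never pointed into str
}
```

In this model, the "last argument only" rule amounts to guaranteeing that nothing like SimulateMovingGC() can happen between Borrow() and the last read through the borrowed pointer. Whether a particular real call, such as ComputeLengthAndData(), can trigger GC is exactly the open question above; the sketch does not answer it.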