Open Bug 1578396 Opened 5 years ago Updated 2 years ago

Use SIMD and avoid intermediate allocations in impl FromJSValConvertible for String

Categories

(Core :: JavaScript Engine, enhancement, P2)

People

(Reporter: hsivonen, Unassigned)

Details

Conversion from JSString to Rust String should use JS_EncodeStringToUTF8BufferPartial from bug 1561567.
https://searchfox.org/mozilla-central/rev/e04021f29e6d8a37753ba2b510432315ce05a8d7/js/rust/src/conversions.rs#561

TBD: Should the outgoing string be allocated for the worst case up front, should there be a worst-case intermediate buffer anyway, or should the string potentially be reallocated during the conversion?
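
For the third option, here is a minimal sketch of growing the output during conversion. It assumes a resumable partial converter. `convert_utf16_to_utf8_partial_sketch` is a hypothetical std-only stand-in, shaped roughly like encoding_rs's `mem::convert_utf16_to_utf8_partial` (it returns code units read and bytes written). It works on a flat `&[u16]` rather than a JSString. Nothing here assumes that JS_EncodeStringToUTF8BufferPartial itself can resume mid-string.

```rust
/// Hypothetical stand-in for a partial UTF-16 -> UTF-8 converter.
/// Encodes as many whole scalar values as fit in `dst`, replacing unpaired
/// surrogates with U+FFFD, and returns (code units read, bytes written).
fn convert_utf16_to_utf8_partial_sketch(src: &[u16], dst: &mut [u8]) -> (usize, usize) {
    let mut read = 0;
    let mut written = 0;
    for result in char::decode_utf16(src.iter().copied()) {
        let (c, units) = match result {
            Ok(c) => (c, c.len_utf16()),
            Err(_) => (char::REPLACEMENT_CHARACTER, 1),
        };
        let needed = c.len_utf8();
        if dst.len() - written < needed {
            break; // no room for this whole character; caller must grow `dst`
        }
        c.encode_utf8(&mut dst[written..]);
        written += needed;
        read += units;
    }
    (read, written)
}

/// "Reallocate during conversion": start with a guess, and when the
/// converter runs out of room, grow the buffer and resume where it stopped.
fn utf16_to_string_growing(src: &[u16]) -> String {
    // Initial guess: one byte per code unit (all-ASCII) plus a little slack.
    let mut buf: Vec<u8> = vec![0; src.len() + 8];
    let mut read = 0;
    let mut written = 0;
    loop {
        let (r, w) = convert_utf16_to_utf8_partial_sketch(&src[read..], &mut buf[written..]);
        read += r;
        written += w;
        if read == src.len() {
            break;
        }
        // Out of room: each remaining code unit needs at most 3 bytes,
        // so after this growth the next pass is guaranteed to finish.
        let remaining_worst_case = (src.len() - read) * 3;
        buf.resize(written + remaining_worst_case, 0);
    }
    buf.truncate(written);
    String::from_utf8(buf).expect("converter emits valid UTF-8")
}
```

In the all-ASCII case this never reallocates. In the worst case it reallocates once, and the realloc copies the bytes already written.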

(In reply to Henri Sivonen (:hsivonen) from comment #0)

TBD: Should the outgoing string be allocated for the worst case up front, should there be a worst-case intermediate buffer anyway, or should the string potentially be reallocated during the conversion?

(Computing the exact required length up front is probably the least efficient solution.)

Priority: -- → P2

Is this a 100%-general code path, or are the incoming strings' contents typically limited in some way?

If the incoming string is small enough, allocating for the worst case up front doesn't seem crazy. But past some threshold, at least for Latin-1 strings, maybe start by assuming all-ASCII and, if that turns out to be wrong, reallocate to the exact length? Possibly do the same for two-byte strings as well.
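
A minimal sketch of that threshold idea for Latin-1 input. `SMALL_THRESHOLD` is a hypothetical tuning knob, and the input is a flat `&[u8]` rather than a JSString. The exact length is cheap to compute for Latin-1: bytes below 0x80 stay one byte, and bytes 0x80..0xFF become two.

```rust
/// Hypothetical cutoff (in bytes) below which worst-case allocation is fine.
const SMALL_THRESHOLD: usize = 1024;

/// Exact UTF-8 length of Latin-1 input: one byte per ASCII byte,
/// two bytes per byte in 0x80..=0xFF.
fn latin1_utf8_len(src: &[u8]) -> usize {
    src.len() + src.iter().filter(|&&b| b >= 0x80).count()
}

fn latin1_to_string(src: &[u8]) -> String {
    let worst_case = src.len() * 2;
    // Small: allocate the worst case. Large: optimistically assume all-ASCII.
    let initial = if worst_case <= SMALL_THRESHOLD { worst_case } else { src.len() };
    let mut buf: Vec<u8> = Vec::with_capacity(initial);

    // The ASCII prefix copies over unchanged.
    let ascii_len = src.iter().take_while(|b| b.is_ascii()).count();
    buf.extend_from_slice(&src[..ascii_len]);

    let rest = &src[ascii_len..];
    if !rest.is_empty() {
        // The all-ASCII guess was wrong: grow to exactly the needed length.
        // This is a no-op when the worst case was allocated up front.
        buf.reserve_exact(latin1_utf8_len(rest));
        for &b in rest {
            if b < 0x80 {
                buf.push(b);
            } else {
                // U+0080..U+00FF encode as a two-byte sequence (lead 0xC2 or 0xC3).
                buf.push(0xC0 | (b >> 6));
                buf.push(0x80 | (b & 0x3F));
            }
        }
    }
    String::from_utf8(buf).expect("Latin-1 to UTF-8 conversion yields valid UTF-8")
}
```

The extra counting pass only runs after a non-ASCII byte is seen, and only over the remainder of the string.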

I suggest doing this:

  1. Check the string's flag bits to see whether it is Latin1-only.
  2. If Latin1-only, allocate malloc_good_size(jsstring->length() * 2), otherwise malloc_good_size(jsstring->length() * 3).
  3. Convert with JS_EncodeStringToUTF8BufferPartial.
  4. Pass the number of bytes actually written to malloc_good_size.
  5. Subtract what it returns from the allocation size.
  6. If the delta exceeds a magic threshold, allocate a buffer for the smaller size and memcpy the UTF-8 over.
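
The six steps above can be sketched as follows. This is a sketch under assumptions, not the actual JSAPI path. `JsChars` is a hypothetical stand-in for a linearized JSString's characters. `good_size` is a crude hypothetical stand-in for malloc_good_size, since real allocator size classes are finer than powers of two. `SLACK_THRESHOLD` plays the "magic threshold". The conversion itself uses plain std code in place of JS_EncodeStringToUTF8BufferPartial.

```rust
/// Hypothetical stand-in for a linearized JSString's characters.
#[derive(Clone, Copy)]
enum JsChars<'a> {
    Latin1(&'a [u8]),
    TwoByte(&'a [u16]),
}

/// Crude hypothetical stand-in for malloc_good_size(): rounds up to a
/// power of two, with a minimum of 16.
fn good_size(n: usize) -> usize {
    if n <= 16 { 16 } else { n.next_power_of_two() }
}

/// Hypothetical "magic threshold" for wasted bytes.
const SLACK_THRESHOLD: usize = 64;

fn js_chars_to_string(chars: JsChars<'_>) -> String {
    // Steps 1-2: the Latin1 flag picks the worst case per code unit
    // (2 UTF-8 bytes per Latin1 unit, 3 per UTF-16 unit).
    let worst_case = match chars {
        JsChars::Latin1(s) => s.len() * 2,
        JsChars::TwoByte(s) => s.len() * 3,
    };
    let alloc_size = good_size(worst_case);
    let mut buf: Vec<u8> = Vec::with_capacity(alloc_size);

    // Step 3: convert. Capacity already covers the worst case,
    // so these appends never reallocate.
    let mut tmp = [0u8; 4];
    match chars {
        JsChars::Latin1(s) => {
            for &b in s {
                buf.extend_from_slice(char::from(b).encode_utf8(&mut tmp).as_bytes());
            }
        }
        JsChars::TwoByte(s) => {
            for r in char::decode_utf16(s.iter().copied()) {
                // Unpaired surrogates become U+FFFD.
                let c = r.unwrap_or(char::REPLACEMENT_CHARACTER);
                buf.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
            }
        }
    }

    // Steps 4-5: slack between the allocation and the size class the
    // written bytes would actually need. Cannot underflow: good_size is
    // monotonic and written <= worst_case.
    let written = buf.len();
    let slack = alloc_size - good_size(written);

    // Step 6: if too much would be wasted, copy into a right-sized buffer.
    if slack > SLACK_THRESHOLD {
        let mut shrunk = Vec::with_capacity(written);
        shrunk.extend_from_slice(&buf); // the memcpy
        buf = shrunk;
    }
    String::from_utf8(buf).expect("converter emits valid UTF-8")
}
```

With these stand-ins, 1000 ASCII code units from a two-byte string allocate good_size(3000) = 4096 bytes and write 1000. That leaves 4096 - good_size(1000) = 3072 bytes of slack, so the bytes get copied into a 1000-byte buffer. `Vec::shrink_to_fit` would be the realloc-based alternative to the explicit copy.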

Not sure if we already expose malloc_good_size to SpiderMonkey-visible Rust.

The reallocation here looks bad for the ASCII case, but avoiding it would mean either a) walking the rope twice and doing the UTF-8 math on both passes (which, if previous experience with XPCOM strings is any indication, would be more expensive than malloc + memcpy) or b) hiding the allocation in a rope linearization step.

Severity: normal → S3