Closed Bug 1501155 Opened 6 years ago Closed 5 years ago

[BinAST] Use WTF-8 in string encoding

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: arai, Assigned: arai)

References

Details

Attachments

(2 files, 1 obsolete file)

The string literal in binjs file is raw UTF-8 string which doesn't contain any escape sequence like \uXXXX, which means a lone surrogate should be representable there.
We can use WTF-8 as the string encoding in the binjs file.

related issue in binjs-ref https://github.com/binast/binjs-ref/issues/190
corresponds to https://github.com/binast/binjs-ref/pull/193
will ask review once binjs-ref is patched
Blocks: 1507368
This adds API for WTF-8 [1] that is used by BinAST parser which is fixed in Part 2.
  * add WTF8Chars
  * add AtomizeWTF8Chars
  * templatize methods/classes in CharacterEncoding.cpp to accept both UTF8Chars and WTF8Chars

given AtomizeWTF8Chars needs the exact same optimization added by bug 1494942,
I've cloned them by template.

[1] https://simonsapin.github.io/wtf-8/
Attachment #9019270 - Attachment is obsolete: true
Attachment #9025227 - Flags: review?(jwalden+bmo)
And just replaced the consumer in BinAST side.
Attachment #9025228 - Flags: review?(dteller)
Comment on attachment 9025228 [details] [diff] [review]
Part 2: Use WTF-8 as BinAST string encoding.

Review of attachment 9025228 [details] [diff] [review]:
-----------------------------------------------------------------

Most complicated patch ever :)
Attachment #9025228 - Flags: review?(dteller) → review+
Comment on attachment 9025227 [details] [diff] [review]
Part 1: Add AtomizeWTF8Chars.

Review of attachment 9025227 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/CharacterEncoding.cpp
@@ +210,2 @@
>   */
> +template <typename InputCharsT>

Use class for this, because the parameter *is* always a class.  Same for the other InputCharsT below.

@@ +214,2 @@
>  {
>      MOZ_ASSERT(1 <= utf8Length && utf8Length <= 4);

Add

  static_assert(std::is_same<InputCharsT, UTF8Chars>::value ||
                std::is_same<InputCharsT, WTF8Chars>::value,
                "must be either UTF-8 or WTF-8");

at top here.

@@ +404,2 @@
>  static void
> +CopyAndInflateUTF8IntoBuffer(JSContext* cx, const InputCharsT src, CharT *dst, size_t outlen, bool allASCII)

Put the * by CharT while you're touching.
Attachment #9025227 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/mozilla-central/rev/4ba2f019ce6d
https://hg.mozilla.org/mozilla-central/rev/b1a6e2052ea1
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: