Closed Bug 1472066 Opened 6 years ago Closed 6 years ago

Instantiate TokenStreamCharsBase for UTF-8 source text

Categories: Core :: JavaScript Engine, enhancement
Type: enhancement; Priority: Not set; Severity: normal

Tracking: RESOLVED FIXED, mozilla63
Tracking Status: firefox63 --- fixed

People: Reporter: Waldo, Assigned: Waldo

Attachments: 4 files

Start at the lowest levels, gradually work upward.

There are a few functions I will probably still want to move around the hierarchy (for example, consumeRestOfSingleLineComment could live in TokenStreamCharsBase if it didn't update line/column information), but for the most part, what is in TokenStreamCharsBase (TSCB) after this bug should be defined properly for both encodings.
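To make "defined properly for both encodings" concrete: in C++ this means one class template that compiles for either code unit type. The following is a heavily simplified sketch, not the real TokenStreamCharsBase; `CharsBase` and the `Utf8Unit` struct are hypothetical stand-ins, and the explicit instantiations at the end force both encodings to compile.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a non-integral UTF-8 code unit type.
struct Utf8Unit {
  unsigned char value;
};

// Hypothetical, heavily simplified encoding-generic base class: it holds the
// source units and a position, and knows nothing about the encoding itself.
template <typename Unit>
class CharsBase {
  const Unit* units_;
  std::size_t length_;
  std::size_t index_ = 0;

 public:
  CharsBase(const Unit* units, std::size_t length) : units_(units), length_(length) {}

  bool atEnd() const { return index_ == length_; }

  // Return the next code unit and advance past it.
  Unit getCodeUnit() {
    assert(!atEnd());
    return units_[index_++];
  }
};

// Explicit instantiation definitions: compile the template for both source
// encodings, so anything not valid for one of them fails to build.
template class CharsBase<char16_t>;
template class CharsBase<Utf8Unit>;
```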
FWIW, some of these patches *will* add UTF-8 parsing beyond the trivial ASCII case. Such code will stick out like a sore thumb, and it will be easy to clean up at some future time, once we know which primitives we want, what those primitives should return, and have similar real-world understanding.
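A minimal sketch of what "beyond the trivial case for ASCII" means, assuming well-formed input: a UTF-8 unit below 0x80 is a complete code point, exactly as in UTF-16, while any other lead unit starts a two- to four-unit sequence that must be decoded. `decodeOneCodePoint` is a hypothetical name, and the sketch only asserts on malformed input rather than reporting errors.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decode one code point from well-formed UTF-8 starting
// at units[*index], advancing *index past it. No real validation is done.
inline char32_t decodeOneCodePoint(const unsigned char* units, std::size_t length,
                                   std::size_t* index) {
  assert(*index < length);
  unsigned char lead = units[*index];

  // Trivial case: ASCII is a single code unit, just as in UTF-16.
  if (lead < 0x80) {
    ++*index;
    return lead;
  }

  // Non-ASCII: the lead unit's high bits give the sequence length, and its
  // remaining low bits are the top bits of the code point.
  std::size_t count;
  char32_t cp;
  if ((lead & 0xE0) == 0xC0) {
    count = 2;
    cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    count = 3;
    cp = lead & 0x0F;
  } else {
    assert((lead & 0xF8) == 0xF0);
    count = 4;
    cp = lead & 0x07;
  }
  assert(*index + count <= length);

  // Each trailing unit (10xxxxxx) contributes its low six bits.
  for (std::size_t i = 1; i < count; ++i) {
    unsigned char trail = units[*index + i];
    assert((trail & 0xC0) == 0x80);
    cp = (cp << 6) | (trail & 0x3F);
  }
  *index += count;
  return cp;
}
```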
Attachment #8988647 - Flags: review?(arai.unmht)
It's possible int32_t should become OptionalCodeUnit at some point, with a function you can call on it to get Utf8Unit or char16_t as the actual unit type, but this is forward progress for now.
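One possible shape for that idea, as a hedged sketch: the "EOF or a code unit value" meaning that `int32_t` carries implicitly becomes an explicit type. `OptionalCodeUnit` follows the name suggested in the comment, but its members and the `Utf8Unit` stand-in are assumptions, not the eventual SpiderMonkey API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a non-integral UTF-8 code unit type.
struct Utf8Unit {
  unsigned char value;
};

// Hypothetical wrapper making explicit what an int32_t return value encodes
// implicitly today: either EOF, or a non-negative code unit value.
template <typename Unit>
class OptionalCodeUnit {
  static constexpr std::int32_t EOFValue = -1;
  std::int32_t value_;

  explicit OptionalCodeUnit(std::int32_t value) : value_(value) {}

 public:
  static OptionalCodeUnit eof() { return OptionalCodeUnit(EOFValue); }

  static OptionalCodeUnit of(std::int32_t unitValue) {
    assert(unitValue >= 0);
    return OptionalCodeUnit(unitValue);
  }

  bool isEOF() const { return value_ == EOFValue; }

  // Return the code unit as the actual unit type; must not be called on EOF.
  Unit unit() const;
};

template <>
inline char16_t OptionalCodeUnit<char16_t>::unit() const {
  assert(!isEOF() && value_ <= 0xFFFF);
  return static_cast<char16_t>(value_);
}

template <>
inline Utf8Unit OptionalCodeUnit<Utf8Unit>::unit() const {
  assert(!isEOF() && value_ <= 0xFF);
  return Utf8Unit{static_cast<unsigned char>(value_)};
}
```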
Attachment #8988648 - Flags: review?(arai.unmht)
Attachment #8988647 - Flags: review?(arai.unmht) → review+
Comment on attachment 8988648 [details] [diff] [review]
Add some helper functions to assist in dealing with the non-integral Utf8Unit code unit type, for UTF-8 source code

Review of attachment 8988648 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.h
@@ +1162,5 @@
> +    /**
> +     * Convert a non-EOF code unit returned by |getCodeUnit()| or
> +     * |peekCodeUnit()| to a CharT code unit.
> +     */
> +    inline CharT toCharT(int32_t value);

Can we use `codeUnit`, `codeUnitValue`, or something similar as the parameter name, so that it's clear what it is in the definition below?
Attachment #8988648 - Flags: review?(arai.unmht) → review+
Attachment #8988649 - Flags: review?(arai.unmht) → review+
Attachment #8988650 - Flags: review?(arai.unmht) → review+
Depends on: 1426909
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/234fc6b955c9
Specialize TokenStreamCharsBase::fillCharBufferWithTemplateStringContents for char16_t now that its alternative UTF-8 implementation will have to function a bit differently to write data into a char16_t charBuffer.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/29dc2c520713
Add some helper functions to enable (once the non-integral Utf8Unit code unit type lands, soon) dealing with Utf8Unit as the type of UTF-8 source code in addition to char16_t for UTF-16 source code.  r=arai
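The first commit above says a UTF-8 implementation has to work differently because it writes into a char16_t charBuffer: UTF-8 units must be transcoded to UTF-16 rather than copied. A minimal sketch, assuming well-formed UTF-8 and a `std::vector<char16_t>` standing in for charBuffer; `fillCharBufferFromUtf8` is a hypothetical name. It also normalizes CR LF and lone CR to LF, as ECMAScript requires for template literal contents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: append UTF-8 template string contents to a char16_t
// buffer, normalizing CR LF and lone CR to LF. Input is assumed well-formed.
inline void fillCharBufferFromUtf8(const unsigned char* units, std::size_t length,
                                   std::vector<char16_t>& charBuffer) {
  std::size_t i = 0;
  while (i < length) {
    unsigned char lead = units[i];

    // ASCII: one unit in, one unit out (after line terminator normalization).
    if (lead < 0x80) {
      ++i;
      if (lead == '\r') {
        // CR LF and lone CR both become a single LF.
        if (i < length && units[i] == '\n') {
          ++i;
        }
        charBuffer.push_back(u'\n');
      } else {
        charBuffer.push_back(static_cast<char16_t>(lead));
      }
      continue;
    }

    // Decode a two- to four-unit sequence into a code point.
    std::size_t count = (lead & 0xE0) == 0xC0 ? 2 : (lead & 0xF0) == 0xE0 ? 3 : 4;
    char32_t cp = lead & (0xFF >> (count + 1));
    assert(i + count <= length);
    for (std::size_t k = 1; k < count; ++k) {
      cp = (cp << 6) | (units[i + k] & 0x3F);
    }
    i += count;

    // Code points above U+FFFF need a surrogate pair in a char16_t buffer.
    if (cp > 0xFFFF) {
      cp -= 0x10000;
      charBuffer.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      charBuffer.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      charBuffer.push_back(static_cast<char16_t>(cp));
    }
  }
}
```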
Landed the parts that don't need bug 1426909 yet; leaving this bug open for that development.
Keywords: leave-open
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1031a09274e0
Define UTF-8-specific versions of certain tokenizing helper functions, complementing the existing UTF-16 versions.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe4e166eea33
Instantiate TokenStreamCharsBase for UTF-8 source text.  r=arai
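One example of why some tokenizing helpers need UTF-8-specific versions: ECMAScript's line terminators (LF, CR, U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR) are each a single UTF-16 code unit, but U+2028 and U+2029 take three units in UTF-8. The overload pair below is a hypothetical illustration; its name and signature are assumptions, not SpiderMonkey's helpers.

```cpp
#include <cassert>
#include <cstddef>

// UTF-16 flavor: every ECMAScript LineTerminator is a single code unit.
inline bool isLineTerminatorAt(const char16_t* units, std::size_t length,
                               std::size_t index, std::size_t* unitCount) {
  assert(index < length);
  char16_t u = units[index];
  if (u == u'\n' || u == u'\r' || u == 0x2028 || u == 0x2029) {
    *unitCount = 1;
    return true;
  }
  return false;
}

// UTF-8 flavor: LF and CR are one unit, but U+2028 and U+2029 are the
// three-unit sequences E2 80 A8 and E2 80 A9.
inline bool isLineTerminatorAt(const unsigned char* units, std::size_t length,
                               std::size_t index, std::size_t* unitCount) {
  assert(index < length);
  unsigned char u = units[index];
  if (u == '\n' || u == '\r') {
    *unitCount = 1;
    return true;
  }
  if (u == 0xE2 && index + 2 < length && units[index + 1] == 0x80 &&
      (units[index + 2] == 0xA8 || units[index + 2] == 0xA9)) {
    *unitCount = 3;
    return true;
  }
  return false;
}
```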
Keywords: leave-open (removed)
https://hg.mozilla.org/mozilla-central/rev/1031a09274e0
https://hg.mozilla.org/mozilla-central/rev/fe4e166eea33
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63