Closed Bug 1491137 Opened Last year Closed Last year

Split up the myriad JS::Compile functions into better-named functions that clearly indicate encoding in the name

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: Waldo, Assigned: Waldo)

Details

Attachments

(16 files)

12.73 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.68 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.47 KB, patch
jandem
: review+
Details | Diff | Splinter Review
2.43 KB, patch
jandem
: review+
Details | Diff | Splinter Review
6.65 KB, patch
jandem
: review+
Details | Diff | Splinter Review
2.36 KB, patch
jandem
: review+
Details | Diff | Splinter Review
4.87 KB, patch
jandem
: review+
Details | Diff | Splinter Review
6.43 KB, patch
jandem
: review+
Details | Diff | Splinter Review
17.53 KB, patch
jandem
: review+
Details | Diff | Splinter Review
4.55 KB, patch
jandem
: review+
Details | Diff | Splinter Review
9.34 KB, patch
jandem
: review+
Details | Diff | Splinter Review
10.89 KB, patch
jandem
: review+
Details | Diff | Splinter Review
14.68 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.79 KB, patch
jandem
: review+
Details | Diff | Splinter Review
6.18 KB, patch
jandem
: review+
Details | Diff | Splinter Review
19.52 KB, patch
jandem
: review+
Details | Diff | Splinter Review
We have a zillion functions named JS::Compile now, which is super-confusing when it comes to quickly recognizing which particular overload a call is calling.

And the overloads that take const char*/size_t to indicate the data, *separately* indicate that data's encoding via |options.utf8|.  Which means when you look directly at a call, you can't know the encoding it uses without tracing backward in code to the corresponding passed options.

These issues are both silly.  The various Compile functions should be named distinctly, for greater clarity.  The encoding should be part of the function name, not squirrelled into a separate argument passed to the function.  For now, until the utf8 flag is *completely* redundant with which function is called, these functions will all simply assert that the flag has the expected value.  When there are no more calls with a single consistent encoding, then it'll be time to remove the utf8 flag entirely.)

(The end goal *with respect to UTF-8 parsing* is to have Compile*Latin1 and Compile*Utf8 or similar for the various functions, that behave identically to what exists now.  Then the next step is to introduce something like Compile*Utf8Fast that will invoke all the UTF-8 parsing infrastructure that currently compiles but is not used.  Then, incrementally switch over existing users of the Compile*Utf8 functions to that.)

Patches for a bunch of renaming to follow.  I haven't made these changes to *every* single compilation/evaluation function yet, but I've touched a lot of them, and not reason not to get out there what's done so far, because it makes searching for uses of overloads much easier *right now*.
There's a certain ambiguity about whether Utf8 applies to the provided *path*, or to the contents of the file at that path.  I tried to be very clear about the distinction in the doc-comment.  If you have a better naming suggestion, please do tell; I'm not wedded to the current tack.
Attachment #9008886 - Flags: review?(jdemooij)
I haven't locally renamed the JS::Compile that takes SourceBufferHolder& to JS::CompileSourceBuffer or similar.  I think probably we want to have JS::Compile{Utf8,Utf16}Buffer once SourceBufferHolder has been split into Utf8SourceBufferHolder and Utf16SourceBufferHolder, because we definitely want differently-typed source buffers to use differently-typed overloads, but I'm not sure.  For now, just the minimal rename.
Attachment #9008887 - Flags: review?(jdemooij)
There are a number of these overloads that are only called with one specific encoding being used.  When I see them, I try to rename to have *only* the one used encoding available.

Ultimately that one used encoding ought to be UTF-8 everywhere, but for this bug's purposes, I am not putting in any serious effort to change callers to make them massage their data into a particular encoding they weren't already using.
Attachment #9008891 - Flags: review?(jdemooij)
If I had to guess, these were in place because someone was a completist and not because we actually use them.  Doesn't seem good enough reason to keep them, to me, because basically they're pure convenience functions.  We ought to be pushing people to APIs that take source text exactly.  (Or at some future time, that take source-text generator objects of some sort, to enable incremental parsing, but we're not there now.)
Attachment #9008893 - Flags: review?(jdemooij)
Still not super-happy about this name because of "is the path UTF-8 or the contents of the file at that path UTF-8", still open to suggestions on it.
Attachment #9008894 - Flags: review?(jdemooij)
Comment on attachment 9008883 [details] [diff] [review]
Split the JS::Compile that takes const char*/size_t into JS::CompileUtf8 and JS::CompileLatin1 that assert |options.utf8| has consistent value, and a deprecated JS::Compile overload that delegates to one of those as appropriate

Review of attachment 9008883 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/CompilationAndEvaluation.cpp
@@ +193,5 @@
>  JS_CompileScript(JSContext* cx, const char* ascii, size_t length,
>                   const JS::CompileOptions& options, MutableHandleScript script)
>  {
> +    MOZ_ASSERT(std::all_of(ascii, ascii + length, mozilla::IsAscii<char>),
> +               "JS_CompileScript only accepts ASCII characters");

So actually this is wrong.  Despite the clear name of the parameter, *at least* ipc/testshell/XPCShellEnvironment.cpp calls this function passing bytes that -- by dint of having been passed to JS_BufferIsCompilableUnit previously -- have been previously treated as UTF-8.

So it seems like trying to assert/enforce ASCII validity on this parameter is wrong, and this function should be reduced down to just the return-statement in it.  So it goes.

I guess maybe I should try to break this function up into Latin1/Utf8 variants, too, atop the patches posted here.
Comment on attachment 9008883 [details] [diff] [review]
Split the JS::Compile that takes const char*/size_t into JS::CompileUtf8 and JS::CompileLatin1 that assert |options.utf8| has consistent value, and a deprecated JS::Compile overload that delegates to one of those as appropriate

Review of attachment 9008883 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/CompilationAndEvaluation.h
@@ +165,2 @@
>  extern JS_PUBLIC_API(bool)
> +CompileUtf8(JSContext* cx, const ReadOnlyCompileOptions& options,

Maybe UTF8 (all caps) for consistency with JS::UTF8Chars, JS_ReportErrorUTF8, JS_NewStringCopyUTF8*? Although OTOH we use Latin1 instead of LATIN1 so maybe we should standardize on Utf8.

@@ +169,5 @@
> +/**
> + * Compile the provided Latin-1 data (i.e. each byte directly corresponds to
> + * the same Unicode code point) into a script.
> + *
> + * The |options.utf8| flag is asserted to be true.  At a future time this flag

s/true/false/
Attachment #9008883 - Flags: review?(jdemooij) → review+
Attachment #9008885 - Flags: review?(jdemooij) → review+
Attachment #9008886 - Flags: review?(jdemooij) → review+
Attachment #9008887 - Flags: review?(jdemooij) → review+
Attachment #9008891 - Flags: review?(jdemooij) → review+
Attachment #9008893 - Flags: review?(jdemooij) → review+
Comment on attachment 9008894 [details] [diff] [review]
Rename the JS::Compile function that accepts const char* path to JS::CompileUtf8Path, because every caller of it indicates the file contents should be interpreted as UTF-8

Review of attachment 9008894 [details] [diff] [review]:
-----------------------------------------------------------------

Nice cleanup.

::: js/public/CompilationAndEvaluation.h
@@ +236,2 @@
>  extern JS_PUBLIC_API(bool)
> +CompileUtf8Path(JSContext* cx, const ReadOnlyCompileOptions& options,

Agreed it's a bit ambiguous but I can't think of a better name. It's probably fine; it's not the kind of API that's used all over the place.

(Maybe CompileUtf8{With,From}Path? Not sure if that's any clearer. I guess CompilePathAsUtf8 breaks symmetry with similar function names. Maybe at some point Utf8 becomes the default and this will be just CompilePath?)
Attachment #9008894 - Flags: review?(jdemooij) → review+
Frankly I'm surprised at this UTF-8 assumption already existing (and having somehow been believed to be permissible).  But it does.  And apparently it was.  At least this is one part of the Augean stables requiring no cleaning...

Ideally some sort of UTF-8 data type would be used, either in addition to or in replacement of the name, but we don't have such yet, so what can you do.  This rename is usefully clarifying now, and that seems plenty of justification.
Attachment #9009511 - Flags: review?(jdemooij)
Things you notice when your APIs are organized even a little.

Maybe one or two of those CompileLatin1's in tests could be CompileUtf8, but I tried not to make the change unless it should be basically clearly correct from context, without downstream effect from using those options elsewhere.
Attachment #9009512 - Flags: review?(jdemooij)
It was cool to forward-declare JS::Value right up til that inline function got added -- but then once you add it and *don't* have the MutableHandle<T>-related specializations for T=JS::Value, everything explodes.  So, time to drag it in.

Converting users to the correct encoding-centric version is left for a future patch.
Attachment #9009516 - Flags: review?(jdemooij)
Keywords: leave-open
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9161d07ce4e6
Split the JS::Compile that takes const char*/size_t into JS::CompileUtf8 and JS::CompileLatin1 that assert |options.utf8| has consistent value, and a JS::Compile that delegates to one of those as appropriate -- a step toward having entirely separate compilation functions that do direct parsing of UTF-8 without inflating first.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/4798f1e77e2c
Rename the FILE*-accepting static Compile function in compilation code to CompileFile.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/4df0e94c014b
Rename the filename-accepting static Compile function in compilation code to CompilePath.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0fe6c6c780b
Rename the SourceBufferHolder-accepting static Compile function in compilation code to CompileSourceBuffer.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/e53e72d72ccd
Rename the JS::Compile function that accepts FILE* to JS::CompileUtf8File, because every caller passes a file with UTF-8 content.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/ce6aa70e7631
Remove the JS::CompileForNonSyntacticScope overloads that accept FILE* or a filename because they're unused.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a5f2cfea6fc
Rename the JS::Compile function that accepts const char* path to JS::CompileUtf8Path, because every caller of it indicates the file contents should be interpreted as UTF-8.  r=jandem
Attachment #9009511 - Flags: review?(jdemooij) → review+
Attachment #9009512 - Flags: review?(jdemooij) → review+
Comment on attachment 9009513 [details] [diff] [review]
Rename the JS::Evaluate that accepts a filename and is always used to process UTF-8 source text to JS::EvaluateUtf8Path

Review of attachment 9009513 [details] [diff] [review]:
-----------------------------------------------------------------

Nice
Attachment #9009513 - Flags: review?(jdemooij) → review+
Attachment #9009516 - Flags: review?(jdemooij) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f2ccbeb047d
Rename JS_BufferIsCompilableUnit to JS_Utf8BufferIsCompilableUnit, consistent with how it already interprets that data, and propagate that presumption of UTF-8-ness a little bit further along into callers.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/34cc4a8c46c0
Remove JS_Compile{,UC}Script, because except for argument ordering they're exactly identical to existing JS::Compile* functions.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/48dd64a6036c
Rename the JS::Evaluate that accepts a filename and is always used to process UTF-8 source text to JS::EvaluateUtf8Path.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/06c3e3d28ece
Split the JS::Evaluate that takes const char*/size_t source data into JS::EvaluateUtf8 and JS::EvaluateLatin1 that assert |options.utf8| has consistent value, plus a JS::Evaluate that delegates to one of those two as appropriate.  r=jandem
It probably isn't great for the function name parameter to be (I think) implicitly Latin-1, but writing docs for those two functions is a little intricate and afield of these changes, so I didn't do anything about it.
Attachment #9009790 - Flags: review?(jdemooij)
Attachment #9009791 - Flags: review?(jdemooij)
Setting a useless field can have no effect on semantics, and testing a field *only* in asserts can have no effect on semantics *generally*, so as those are the only two ways we interact with the |utf8| field now, we can remove the field and make the two |setUTF8| functions vacuous.  \o/
Attachment #9009792 - Flags: review?(jdemooij)
Final patch here!  \o/  Then on to beginning to fill in sibling CompileUtf8"Fast"-styled functions that will invoke UTF-8 parsing rather than first inflating to UTF-16, and we can start to see how well everything already done works.
Attachment #9009793 - Flags: review?(jdemooij)
Attachment #9009789 - Flags: review?(jdemooij) → review+
Attachment #9009790 - Flags: review?(jdemooij) → review+
Attachment #9009791 - Flags: review?(jdemooij) → review+
Attachment #9009792 - Flags: review?(jdemooij) → review+
Comment on attachment 9009793 [details] [diff] [review]
Remove the two setUTF8(bool) CompileOptions functions now that the underlying field they alter is gone

Review of attachment 9009793 [details] [diff] [review]:
-----------------------------------------------------------------

Cool 👏
Attachment #9009793 - Flags: review?(jdemooij) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/378a8a44cfe2
Remove JS::Evaluate that takes const char*/size_t, rewriting all callers to use JS::Evaluate{Latin1,Utf8}, including opting some callers in to UTF-8 where they had not used it before.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/7925d1419afa
Rename JS::CompileFunction that takes const char*/size_t for function text to JS::CompileFunctionUtf8 that asserts consistent |options.utf8|.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/5e58082870d6
Inline ::CompileFile into its sole caller.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4a8480ff7d0
Remove TransitiveCompileOptions::utf8, because its value is now only ever used to assert consistency with the correct one of a UTF-8/Latin-1 API being used.  r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/004799a1bc5c
Remove the two setUTF8(bool) CompileOptions functions now that the underlying field they alter is gone.  r=jandem
Status: ASSIGNED → RESOLVED
Closed: Last year
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.