Closed Bug 1416781 Opened 7 years ago Closed 7 years ago

Add explanation about encoding variants of error reporting API in jsapi.h

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: arai, Assigned: arai)

References

Details

Attachments

(1 file, 1 obsolete file)

now error reporting API has 4 encoding variants (UTF8, ASCII, Latin, UC),
and it should be confusing.

I'm about to add explanation comment in jsapi.h.
Added explanation for encoding variants of JS_ReportError* and JS_ReportErrorNumber*.
(it took some time to remember the plan etc, so I'd like to note things there)

with my understanding, we're going to have only 2 variants, UTF-8 and ASCII, eventually.
and Latin1 is only temporary thing.

and also, UC variant can also be removed by just removing the conversion to caller-side (different encoding for format string and arguments won't be nice)
Attachment #8927838 - Flags: review?(jwalden+bmo)
One thing somewhat worth noting: moving conversions caller-side means callers pay a conversion cost *even when the report is a warning that doesn't result in an error*.

So, when we included a decompiled-this in a warning message about an object not being extensible, when adding a new property (adds aren't errors if this happens in non-strict code) -- decompiling |this| could be inordinately expensive for no reason at all because we didn't actually warn.  (There was a bug on this recently

To be sure, the conversion cost is probably smallish O(N).  And when it's big, it may well be smaller than the cost to generate the param-string at all.  But it seems unideal to have API that enforces such costs.

I don't have good ideas for a better tack to take on this.  I haven't had any good ideas in the past, but thinking now I have a crazy idea that *might* work.  (Brace yourself.)  We could have the API be variadic-template functions, that for parameter-strings you pass lambdas to, that are only called to generate error-strings if an error/warning actually is reported.  You could throw templates even harder at it so the parameter-to-string conversion was hidden away for simple types -- like literal strings, numeric types, maybe even Rooted<T>&.  And lambdas would only be used for dynamic-generation stuff.

This is *such* a C++ solution to the problem.  :-D

Anyway, even if we're not dealing with it here (and shouldn't), I thought I'd flag the issue, given it's salient and someone else should be thinking about this beyond just me.  Even if for now we'll just keep building on the current API with this kind of flaw.
interesting idea :)

indeed we're converting Value/JSString etc into string before calling error reporting function.
(for UC case, it's only about error, not warning, so this case is not so critical, but I agree that in more general case it's better not moving the conversion to caller-side)

there's one thing that I worry about, that is, the code duplication because of having multiple template instances for each callsite (I'm not sure how much compiler can handle such case nicely tho).
If this can be implemented somehow without so much code duplication, I think we can go with that idea.
Comment on attachment 8927838 [details] [diff] [review]
Add explanation about encoding variants of error reporting APIs.

Review of attachment 8927838 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing too horribly wrong in some senses, just English nitpicks, and some amount of maybe-stylistic things, and certain fastidiousness about exactly what API documentation should and shouldn't promise.

::: js/src/jsapi.h
@@ +5608,5 @@
>  
>  /*
>   * Error reporting.
> + *
> + * There are 4 encoding variants for error reporting API:

I would spell out "four" and s/for error/for the error/.

@@ +5611,5 @@
> + *
> + * There are 4 encoding variants for error reporting API:
> + *   UTF-8
> + *     JSAPI's default encoding for error handling.  Use this when the encoding
> + *     of error message, format string, and arguments, is UTF-8, except for

s/of error/of the error/ and s/arguments,/arguments/.

@@ +5612,5 @@
> + * There are 4 encoding variants for error reporting API:
> + *   UTF-8
> + *     JSAPI's default encoding for error handling.  Use this when the encoding
> + *     of error message, format string, and arguments, is UTF-8, except for
> + *     the case all of them are fully in ASCII range.

Get rid of the ", except...range" bit.  There is no problem passing in all-ASCII stuff here, and indeed in many cases that's always going to be what happens -- just, that isn't known statically and all.  Leave the all-ASCII bit to the ASCII variant description, I think.

@@ +5614,5 @@
> + *     JSAPI's default encoding for error handling.  Use this when the encoding
> + *     of error message, format string, and arguments, is UTF-8, except for
> + *     the case all of them are fully in ASCII range.
> + *   ASCII
> + *     Equivalent to UTF-8, with extra assertion about encoding being ASCII.

s/with.*ASCII/but also asserts that the error message, format string, and arguments are all ASCII/

@@ +5617,5 @@
> + *   ASCII
> + *     Equivalent to UTF-8, with extra assertion about encoding being ASCII.
> + *     Use this when the encoding of error message, format string, and
> + *     arguments, is fully in ASCII range.
> + *     (currently there's no perf difference between UTF-8 and ASCII)

The "Use this" sentence can be removed, with the above change.

As to perf differences, I don't think we need to mention anything about this, and indeed probably shouldn't because it ostensibly boxes us in as to our actual perf.  If we say they're the same, someone's going to come along and hold us to it.

So maybe, then, after the modified first sentence, we should just say

"""
Because ASCII is a subset of UTF-8, any use of this encoding variant *could* be replaced with use of the UTF-8 variant.  This variant exists solely to double-check the developer's assumption that all these strings truly are ASCII, given that UTF-8 and ASCII strings regrettably have the same C++ type.
"""

@@ +5620,5 @@
> + *     arguments, is fully in ASCII range.
> + *     (currently there's no perf difference between UTF-8 and ASCII)
> + *   UC = UTF-16
> + *     Use this when arguments are UTF-16.  Format string should be UTF-8.
> + *     (there is extra cost in conversion to UTF-8)

s/Format/The format/ and s/should/must/

I'm not quite sure where the "extra cost" bit is going.  If this is implying the difference in format-string encoding is motivated by performance, it doesn't really seem necessary to explain our implementation choice -- it just *is*, and people get what they're going to get.  And if so, I would leave this parenthetical off.

@@ +5623,5 @@
> + *     Use this when arguments are UTF-16.  Format string should be UTF-8.
> + *     (there is extra cost in conversion to UTF-8)
> + *   Latin1 (planned to be removed)
> + *     This can be used when the encoding is unknown, but please use other
> + *     variant as long as possible.

Counterpropose this string instead?

"""
In this variant, all strings are interpreted byte-for-byte as the corresponding Unicode codepoint.  This encoding may *safely* be used on any null-terminated string, regardless of its encoding.  (You shouldn't *actually* be uncertain, but in the real world, a string's encoding -- if promised at all -- may be more...aspirational...than reality.)  This encoding variant will eventually be removed -- work to convert your uses to UTF-8 as you're able.
"""
Attachment #8927838 - Flags: review?(jwalden+bmo) → review-
Thank you for reviewing!
here's updated patch.
Attachment #8927838 - Attachment is obsolete: true
Attachment #8929399 - Flags: review?(jwalden+bmo)
Priority: -- → P2
Attachment #8929399 - Flags: review?(jwalden+bmo) → review+
(In reply to Tooru Fujisawa [:arai] from comment #3)
> there's one thing that I worry about, that is, the code duplication because
> of having multiple template instances for each callsite (I'm not sure how
> much compiler can handle such case nicely tho).

Worth noting: we do all the conversion now at every callsite, differently, *already*.  Moving it behind templates doesn't change any of that work that's done.  (And of course template-processing a constant-string argument is going to be a no-op even if you pass it through a bunch of template code, once the optimizer boils away all the trivial functions.)

What would change, would be that some small amount of the if-a-warning-and-not-warning-do-nothing would be present in callers.  That likely could be out-of-lined, so the effective hit would be something like one more function call at each callsite.

That cost seems not horrible to me, I think.  And it might be once we start doing it, there's room to trim harder somehow.  I have concepts in my head, but actually implementing them may reveal extra simplifications possible, or possible after the first pass.

I would note, there *may* be one further problem to doing this.  Namely, the entirety of the error-reporting stack must deal with varargs and va_list.  And it may well not be possible to forge a va_list from variadic template arguments.  There are things you simply can't do with varargs, that may have to be fixed to attack this.
Pushed by arai_a@mac.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/70071e79c967
Add explanation about encoding variants of error reporting APIs. r=jwalden
https://hg.mozilla.org/integration/mozilla-inbound/rev/70071e79c967d3d2481efd3159f007b1bad1a71f
Bug 1416781 - Add explanation about encoding variants of error reporting APIs. r=jwalden
https://hg.mozilla.org/mozilla-central/rev/70071e79c967
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: