Closed Bug 1289050 Opened 8 years ago Closed 8 years ago

Clarify input/output character encoding of error reporting API.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox50 --- affected
firefox52 --- fixed

People

(Reporter: arai, Assigned: arai)

References

Details

Attachments

(24 files, 19 obsolete files)

185.67 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
16.75 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
16.52 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
14.70 KB, patch
arai
: review+
Details | Diff | Splinter Review
4.36 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
167.53 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
94.41 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
71.08 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
50.93 KB, patch
arai
: review+
Details | Diff | Splinter Review
4.71 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
3.07 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
23.85 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.28 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
7.03 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.74 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.48 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
7.61 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
9.29 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.84 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
2.35 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
3.61 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
16.52 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
3.27 KB, patch
jgilbert
: review+
Details | Diff | Splinter Review
11.37 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
Error reporting API like JS_ReportError* have 2 variants, ASCII and Unicode.
but ASCII variants actually accept Latin1 in several cases,
and also in some case strings with more random encoding are passed.

we're going to change internal encoding to UTF-8, and also provide UTF8 variant,
but before that, we need to clarify current API and fix some callsites that passes even non-Latin1 encoding.
Depends on: 1289051
Depends on: 1295017
Assignee: nobody → arai.unmht
After replacing JS_ReportError things, we also need to fix functions used to generate strings for error reporting, like converting type, variable name, or JSString to char*.
most functions use encodeLatin1 at the last step, that results in replacing JS_ReportError to Latin1 variant in, for example, frontend.
I think it would be better fixing them in other bugs, after this.
maybe at the same time as filename encoding (bug 987069)
Prepared 14 patches, to replace ASCII variants callsites to UTF8 or Latin1.

UTF8 variants will be added by bug 1295017, but it's now not efficient, as it converts string to UTF-16 internally.  that will be fixed by bug 1283710.  So I'll land them at once to avoid performance impact.


In part 1, replaced:
  * JS_ReportError with ASCII string literal
  * JS_ReportError with ASCII string literal, formatting non-string parameters (%u and %d)
to JS_ReportErrorUTF8 with same arguments

will ask review for other parts after this.
Attachment #8789969 - Flags: review?(jwalden+bmo)
Comment on attachment 8789969 [details] [diff] [review]
Part 1: Use UTF8 variant of JS_ReportError in simple case.

Review of attachment 8789969 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/TestingFunctions.cpp
@@ +1807,5 @@
>  testingFunc_bailAfter(JSContext* cx, unsigned argc, Value* vp)
>  {
>      CallArgs args = CallArgsFromVp(argc, vp);
>      if (args.length() != 1 || !args[0].isInt32() || args[0].toInt32() < 0) {
> +        JS_ReportErrorUTF8(cx, "Argument must be a positive number that fits an int32");

s/fits an/fits in an/

perhaps in a followup mini-patch

@@ +2078,5 @@
> +            JS_ReportErrorUTF8(cx,
> +                               "the first argument argument must be maxBytes, "
> +                               "maxMallocBytes, gcStackpoolLifespan, gcBytes or "
> +                               "gcNumber");
> +            JS_ReportErrorUTF8(cx, "clonebuffer setter requires a single string argument");

Uh, what.  Let's fix these into a single report in a followup?  I'd suggest how to fix here, in a mini-followup, except Splinter doesn't give enough context to do it.
Attachment #8789969 - Flags: review?(jwalden+bmo) → review+
Replaced:
  * JS_ReportError with ASCII string literal, formatting ASCII string
  * JS_ReportError with const char* with ASCII string
  * JS_ReportError with ASCII string literal + macro
to JS_ReportErrorUTF8 with same arguments
Attachment #8790515 - Flags: review?(jdemooij)
Replaced:
  * JS_ReportError with ASCII string literal, formatting Latin1 string
    encoded by JSAutoByteString
to JS_ReportErrorUTF8 with same ASCII string literal, formatting UTF8 string encoded by JSAutoByteString.encodeUtf8
Attachment #8790516 - Flags: review?(evilpies)
FileAsTypedArray and FileAsString now receives `const char* pathname` parameter with Latin1 encoding (actually, raw binary encoding), but we need it to be original JSString, and encode it to Latin1 for fopen, and encode it to UTF8 for error reporting.

so, changed them to receive JS::HandleString, and encode it inside them.
I changed most JS_ReportError* to JS_ReportError*UTF8,
except the case it formats the result of strerror, as its encoding can be non-UTF8.
Attachment #8790517 - Flags: review?(sphink)
Replaced JS_ReportWarning with ASCII string literal to JS_ReportWarningUTF8.

Also, in dom/canvas/WebGLContextUtils.cpp, replaced JS_ReportWarning with ASCII string literal, formatting unknown encoding, to use JS_ReportWarningLatin1, this at least keeps current behavior.

I'll file a separated bug to check and replace if needed.
Attachment #8790539 - Flags: review?(bugs)
Replaced:
  * JS_ReportErrorNumber without any formatting
to JS_ReportErrorNumberUTF8
Attachment #8790540 - Flags: review?(terrence)
Replaced:
  * JS_ReportErrorNumber, formatting ASCII string literal
to JS_ReportErrorNumberUTF8
Attachment #8790541 - Flags: review?(shu)
Replaced:
  * JS_ReportErrorNumber, formatting const char* with ASCII string
to JS_ReportErrorUTF8 with same arguments
Attachment #8790542 - Flags: review?(sphink)
Replaced:
  * JS_ReportErrorNumber, formatting Latin1 encoded string
to JS_ReportErrorNumberLatin1
Attachment #8790544 - Flags: review?(jdemooij)
Replaced:
  * JS_ReportErrorNumber, formatting strerror result
to JS_ReportErrorNumberLatin1

actually we'd better converting it to some known encoding, but for now this keeps current behavior.
Attachment #8790545 - Flags: review?(jwalden+bmo)
Replaced:
  * JS_ReportErrorNumberVA formatting Latin1 string
    encoded by JSAutoByteString
to JS_ReportErrorNumberUTF8VA, formatting UTF8 string encoded by JSAutoByteString.encodeUtf8
Attachment #8790546 - Flags: review?(bugs)
Replaced:
  * JS_ReportErrorFlagsAndNumber without formatting
  * JS_ReportErrorFlagsAndNumber formatting ASCII string
to JS_ReportErrorFlagsAndNumberUTF8

Also replaced:
  * JS_ReportErrorFlagsAndNumber formatting Latin1 string
    encoded by JSAutoByteString
to JS_ReportErrorFlagsAndNumberUTF8 formatting UTF8 string encoded by JSAutoByteString.encodeUtf8

Also replaced:
  * JS_ReportErrorFlagsAndNumber formatting Latin1 string
to JS_ReportErrorFlagsAndNumberLatin1
Attachment #8790547 - Flags: review?(evilpies)
error reporting in Parser/TokenStream uses Latin1 encoding now, so changed ArgumentsAreASCII to ArgumentsAreLatin1 for ExpandErrorArgumentsVA function call.
Attachment #8790548 - Flags: review?(till)
finally, this removes ASCII variants.
Attachment #8790549 - Flags: review?(jwalden+bmo)
Attachment #8790515 - Flags: review?(jdemooij) → review+
Comment on attachment 8790544 [details] [diff] [review]
Part 9: Use Latin1 variant of JS_ReportErrorNumber after encoding in Latin1.

Review of attachment 8790544 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Interpreter.cpp
@@ +4975,5 @@
>      MOZ_ASSERT(errorNumber == JSMSG_UNINITIALIZED_LEXICAL ||
>                 errorNumber == JSMSG_BAD_CONST_ASSIGN);
>      JSAutoByteString printable;
>      if (ValueToPrintable(cx, IdToValue(id), &printable))
> +        JS_ReportErrorNumberLatin1(cx, GetErrorMessage, nullptr, errorNumber, printable.ptr());

Not for this bug, but many of these *should* be non-Latin1 right, for non-Latin1 identifiers?
Attachment #8790544 - Flags: review?(jdemooij) → review+
See Also: → 1302358
Thank you for reviewing :)

(In reply to Jan de Mooij [:jandem] from comment #18)
> Comment on attachment 8790544 [details] [diff] [review]
> Part 9: Use Latin1 variant of JS_ReportErrorNumber after encoding in Latin1.
> 
> Review of attachment 8790544 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/Interpreter.cpp
> @@ +4975,5 @@
> >      MOZ_ASSERT(errorNumber == JSMSG_UNINITIALIZED_LEXICAL ||
> >                 errorNumber == JSMSG_BAD_CONST_ASSIGN);
> >      JSAutoByteString printable;
> >      if (ValueToPrintable(cx, IdToValue(id), &printable))
> > +        JS_ReportErrorNumberLatin1(cx, GetErrorMessage, nullptr, errorNumber, printable.ptr());
> 
> Not for this bug, but many of these *should* be non-Latin1 right, for
> non-Latin1 identifiers?

yes, filed bug 1302358.
Comment on attachment 8790542 [details] [diff] [review]
Part 8: Use UTF8 variant of JS_ReportErrorNumber in non-simple cases.

Review of attachment 8790542 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsnum.cpp
@@ +886,5 @@
>      }
>  
>      ToCStringBuf cbuf;
>      if (char* numStr = NumberToCString(cx, &cbuf, prec, 10))
> +        JS_ReportErrorNumberUTF8(cx, GetErrorMessage, nullptr, JSMSG_PRECISION_RANGE, numStr);

I guess calling this utf8 is safest? I don't see how it could be non-latin1, but it's pretty nonlocal; maybe there's a utf8 thousands separator or something.
Attachment #8790542 - Flags: review?(sphink) → review+
Attachment #8790517 - Flags: review?(sphink) → review+
from IRC discussion,
I'll add explicit ASCII variants and use them if input is always ASCII.
Attachment #8789969 - Attachment is obsolete: true
Attachment #8790515 - Attachment is obsolete: true
Attachment #8790516 - Attachment is obsolete: true
Attachment #8790517 - Attachment is obsolete: true
Attachment #8790539 - Attachment is obsolete: true
Attachment #8790540 - Attachment is obsolete: true
Attachment #8790541 - Attachment is obsolete: true
Attachment #8790542 - Attachment is obsolete: true
Attachment #8790544 - Attachment is obsolete: true
Attachment #8790545 - Attachment is obsolete: true
Attachment #8790546 - Attachment is obsolete: true
Attachment #8790547 - Attachment is obsolete: true
Attachment #8790548 - Attachment is obsolete: true
Attachment #8790549 - Attachment is obsolete: true
Attachment #8790516 - Flags: review?(evilpies)
Attachment #8790539 - Flags: review?(bugs)
Attachment #8790540 - Flags: review?(terrence)
Attachment #8790541 - Flags: review?(shu)
Attachment #8790545 - Flags: review?(jwalden+bmo)
Attachment #8790546 - Flags: review?(bugs)
Attachment #8790547 - Flags: review?(evilpies)
Attachment #8790548 - Flags: review?(till)
Attachment #8790549 - Flags: review?(jwalden+bmo)
Depends on: 1302582
change from previous patch:
  replaced all cases to ASCII

--

Replaced:
  * JS_ReportError with ASCII string literal
  * JS_ReportError with ASCII string literal, formatting non-string parameters (%u and %d)
to JS_ReportErrorASCII with same arguments
Attachment #8790790 - Attachment is obsolete: true
Attachment #8793479 - Flags: review?(jwalden+bmo)
Comment on attachment 8793479 [details] [diff] [review]
Part 1: Use ASCII variant of JS_ReportError in simple case.

Review of attachment 8793479 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/TestingFunctions.cpp
@@ +2084,5 @@
> +            JS_ReportErrorASCII(cx,
> +                                "the first argument argument must be maxBytes, "
> +                                "maxMallocBytes, gcStackpoolLifespan, gcBytes or "
> +                                "gcNumber");
> +            JS_ReportErrorASCII(cx, "clonebuffer setter requires a single string argument");

Hmm, two reports in a row -- not kosher.  Should combine these into one report, somehow, in a followup.
Attachment #8793479 - Flags: review?(jwalden+bmo) → review+
change from previous patch:
  replaced most cases to ASCII.

I missed one case that formatting UTF-16 with "%hs", in caps/nsScriptSecurityManager.cpp.

> inline void SetPendingException(JSContext *cx, const char16_t *aMsg)
> {
>     JS_ReportError(cx, "%hs", aMsg);
> }

I think currently it's not working at all for non-ASCII case, as %hs doesn't do any encoding conversion, but just copies the data, so generated string is just corrupted.
Using Latin1 variant will at least keep current behavior.

We need to fix it separately, maybe by adding UC variant of JS_ReportError, or converting `aMsg` to UTF-8 at caller, or something.

--

for other cases, replaced:
  * JS_ReportError with ASCII string literal, formatting ASCII string
  * JS_ReportError with const char* with ASCII string
  * JS_ReportError with ASCII string literal + macro
to JS_ReportErrorASCII with same arguments
Attachment #8793522 - Flags: review?(jdemooij)
Comment on attachment 8793522 [details] [diff] [review]
Part 2: Use ASCII or Latin1 variants of JS_ReportError in not-simple cases.

Review of attachment 8793522 [details] [diff] [review]:
-----------------------------------------------------------------

Yoink.  r+ with the additional name-changes, but remove the JSClass::name and JS component loader changes and put them in separate patches, 'cause they're trickier.

::: caps/nsScriptSecurityManager.cpp
@@ +146,5 @@
>  
>    return GetOriginFromURI(uri, aOrigin);
>  }
>  
>  inline void SetPendingException(JSContext *cx, const char *aMsg)

I'd probably put ASCII into the function name, for utmost clarity at the call sites.

::: dom/plugins/base/nsJSNPRuntime.cpp
@@ +637,5 @@
>      }
>  
>      PopException();
>    } else {
> +    ::JS_ReportErrorASCII(cx, message);

And add ASCII to the ThrowJSException name.

::: js/src/builtin/Profilers.cpp
@@ +200,2 @@
>          } else if (!args[argi].isString()) {
> +            JS_ReportErrorASCII(cx, "%s: invalid arguments (string expected)", caller);

Hmm.  Not a good place to shove ASCII into a function name, at least not with the current signature.  :-\  Not sure what to do about this.

::: js/src/irregexp/RegExpEngine.cpp
@@ +1871,5 @@
>  
>      Analysis analysis(cx, ignore_case, is_ascii, unicode);
>      analysis.EnsureAnalyzed(node);
>      if (analysis.has_failed()) {
> +        JS_ReportErrorASCII(cx, analysis.errorMessage());

Would be nice if Analysis::fail were failASCII to clarify this.  (Is it really the case that there's only one caller of the function?  Weird.)

::: js/src/jscntxt.cpp
@@ +393,5 @@
>      if (!JS_GetProperty(cx, callee, "usage", &usage))
>          return;
>  
>      if (!usage.isString()) {
> +        JS_ReportErrorASCII(cx, "%s", msg);

ReportUsageErrorASCII

::: js/src/shell/js.cpp
@@ +3733,5 @@
>          return false;
>      }
>      if (!args[0].isString()) {
>          const char* typeName = InformalValueTypeName(args[0]);
> +        JS_ReportErrorASCII(cx, "expected string to compile, got %s", typeName);

Do we ever assert anywhere that JSClass::name is ASCII?  Looks like we have to if we're going to do this.  Split this bit of change into a separate patch that adds such assert, if there isn't one now?

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +404,5 @@
>      }
>  
>      if (JS_TypeOfValue(cx, NSGetFactory_val) != JSTYPE_FUNCTION) {
> +        JS_ReportErrorASCII(cx, "%s has NSGetFactory property that is not a function",
> +                            spec.get());

FileLocation::GetURIString in http://searchfox.org/mozilla-central/source/xpcom/build/FileLocation.cpp#99 appears to produce UTF-8 output in some/all cases, so I don't think this is right.  Split this bit of change into a separate patch.

::: js/xpconnect/src/Sandbox.cpp
@@ +1407,5 @@
>      if (!found)
>          return true;
>  
>      if (!value.isBoolean()) {
> +        JS_ReportErrorASCII(mCx, "Expected a boolean value for property %s", name);

Hum.  Not sure how to push "ASCII" into the option names here.  What about if you put a comment in Sandbox::Parse noting that all option names must be ASCII-only?

::: js/xpconnect/wrappers/AccessCheck.cpp
@@ +273,5 @@
>  static void
>  EnterAndThrow(JSContext* cx, JSObject* wrapper, const char* msg)
>  {
>      JSAutoCompartment ac(cx, wrapper);
> +    JS_ReportErrorASCII(cx, msg);

EnterAndThrowASCII?
Attachment #8793522 - Flags: review?(jdemooij) → review+
change from previous patch:
  one missed case where we should re-encode to UTF-8, in mozJSComponentLoader.cpp

--

Replaced:
  * JS_ReportError with ASCII string literal, formatting Latin1 string
    encoded by JSAutoByteString
to JS_ReportErrorUTF8 with same ASCII string literal, formatting UTF8 string encoded by JSAutoByteString.encodeUtf8
Attachment #8793546 - Flags: review?(jwalden+bmo)
no changes.  there was no ASCII case.
Attachment #8793547 - Flags: review+
change from previous patch:
  replaced UTF8 variants to ASCII variants

--

Replaced JS_ReportWarning with ASCII string literal to JS_ReportWarningASCII.

Also, in dom/canvas/WebGLContextUtils.cpp, replaced JS_ReportWarning with ASCII string literal, formatting unknown encoding, to use JS_ReportWarningLatin1.
I'll file a separated bug to check and replace if needed.
Attachment #8793548 - Flags: review?(jwalden+bmo)
[Part 6]: Use ASCII variant of JS_ReportErrorNumber in simple case.

change from previous patch:
  replaced all cases to ASCII

--

Replaced:
  * JS_ReportErrorNumber without any formatting
to JS_ReportErrorNumberASCII
Attachment #8793549 - Flags: review?(jwalden+bmo)
[Part 7]: Use ASCII variant of JS_ReportErrorNumber when parameters are all static string.

change from previous patch:
  replaced all cases to ASCII

--

Replaced:
  * JS_ReportErrorNumber, formatting ASCII string literal
to JS_ReportErrorNumberASCII
Attachment #8793550 - Flags: review?(jwalden+bmo)
[Part 8]: (again) Use ASCII or UTF8 variant of JS_ReportErrorNumber in non-simple cases.

change from previous patch:
  replaced most cases to ASCII except it's formatting UTF-8 encoded string

--

Replaced:
  * JS_ReportErrorNumber, formatting const char* with ASCII string
to JS_ReportErrorASCII with same arguments

and replaced:
  * JS_ReportErrorNumber, formatting const char* with UTF8 string
to JS_ReportErrorUTF8 with same arguments
Attachment #8793551 - Flags: review?(jwalden+bmo)
no changes.

--

Replaced:
  * JS_ReportErrorNumber, formatting strerror result
to JS_ReportErrorNumberLatin1
Attachment #8793553 - Flags: review?(jwalden+bmo)
no changes.

--

Replaced:
  * JS_ReportErrorNumberVA formatting Latin1 string
    encoded by JSAutoByteString
to JS_ReportErrorNumberUTF8VA, formatting UTF8 string encoded by JSAutoByteString.encodeUtf8
Attachment #8793554 - Flags: review?(jwalden+bmo)
change from previous patch:
  replaced UTF8 to ASCII where it's formatting ASCII

--

Replaced:
  * JS_ReportErrorFlagsAndNumber without formatting
  * JS_ReportErrorFlagsAndNumber formatting ASCII string
to JS_ReportErrorFlagsAndNumberASCII

Also replaced:
  * JS_ReportErrorFlagsAndNumber formatting Latin1 string
    encoded by JSAutoByteString
to JS_ReportErrorFlagsAndNumberUTF8 formatring UTF8 string encoded by JSAutoByteString.encodeUtf8

Also replaced:
  * JS_ReportErrorFlagsAndNumber formatting Latin1 string
to JS_ReportErrorFlagsAndNumberLatin1
Attachment #8793555 - Flags: review?(jwalden+bmo)
no changes.

--

error reporting in Parser/TokenStream uses Latin1 encoding now, so changed ArgumentsAreASCII to ArgumentsAreLatin1 for ExpandErrorArgumentsVA function call.
Attachment #8793556 - Flags: review?(jwalden+bmo)
Comment on attachment 8793548 [details] [diff] [review]
Part 5: Use ASCII or Latin1 variants of JS_ReportWarning.

Review of attachment 8793548 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/WebGLContextUtils.cpp
@@ +90,5 @@
>      }
>  
>      JSContext* cx = api.cx();
> +    // FIXME: Need to check encoding of fmt and ap (bug XXX).
> +    JS_ReportWarningLatin1(cx, "WebGL: %s", buf);

Can we split these two lines into a separate patch, that also renames WebGLContext::GenerateWarning to include ASCII in the name, and also appropriately audits/adjusts all callers?  I'm not comfortable landing the new thing with an XXX by it -- seems like we could lose track of it, generally.  (I guess I was okay with SetPendingException because we don't *have* a workable thing for it to call yet, at all -- whereas here we know most or maybe all things do actually work with it.)
Attachment #8793548 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793549 [details] [diff] [review]
Part 6: Use ASCII variant of JS_ReportErrorNumber in simple case.

Review of attachment 8793549 [details] [diff] [review]:
-----------------------------------------------------------------

This all reminds me that we need to get rid of the GetErrorMessage/number setup.  :-(  Also a few places should have some helper functions, like in scripted proxies to get the handler or report was-revoked, and such.
Attachment #8793549 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793553 [details] [diff] [review]
Part 10: Use Latin1 variant of JS_ReportErrorNumber for strerror.

Review of attachment 8793553 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/shell/js.cpp
@@ +1034,5 @@
> +             * Use Latin1 variant here because the encoding of the return value
> +             * of strerror function can be non-UTF-8.
> +             */
> +            JS_ReportErrorNumberLatin1(cx, my_GetErrorMessage, nullptr, JSSMSG_CANT_OPEN,
> +                                       filename, strerror(errno));

These filenames aren't necessarily Latin-1.  And if we go back to bug 987069, the assumption at some point eventually will be that they're *not* Latin-1 and are specifically UTF-8.  So maybe

"""
Filenames are in some random system encoding.  *Probably* it's UTF-8, but no guarantees.

strerror(errno)'s encoding, in contrast, depends on the user's C locale.

Latin-1 is possibly wrong for both of these -- but at least if it's wrong it'll produce mojibake *safely*.  Run with Latin-1 til someone complains.
"""

to clarify what's up with both parts, and why Latin-1 is used despite it possibly matching neither encoding.

Also, let's have a single static function in this file that we can use multiple places, to centralize this comment.
Attachment #8793553 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793550 [details] [diff] [review]
Part 7: Use ASCII variant of JS_ReportErrorNumber when parameters are all static string.

Review of attachment 8793550 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/js.msg
@@ +33,5 @@
>   *         "{0} is not a member of the {1} family")
>   *
>   * can be used:
>   *
> + * JS_ReportErrorNumberASCII(JSMSG_NOT_A_SUBSPECIES, "Rhino", "Monkey");

This might want to wait for the last patch that removes JS_ReportErrorNumber completely.
Attachment #8793550 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793546 [details] [diff] [review]
Part 3: Use UTF8 variant of JS_ReportError after re-encoding string.

Review of attachment 8793546 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ctypes/Library.cpp
@@ +165,2 @@
>  #ifdef XP_WIN
> +      JS_ReportErrorUTF8(cx, "couldn't open library %hs: %s", pathCharsUTF8.ptr(), error);

Shouldn't that first one be %s and not %hs, now?

::: js/src/jsfriendapi.cpp
@@ +1336,4 @@
>  }
>  
>  JS_FRIEND_API(void)
>  js::ReportErrorWithId(JSContext* cx, const char* msg, HandleId id)

ReportASCIIErrorWithId

This whole function is pretty dumb, seeing as it's called by only one file, but we can leave removing this bit of friendapi to another patch.

::: js/src/shell/js.cpp
@@ +1220,5 @@
>              return false;
>          args[i].setString(str);
>  
>          JSAutoByteString opt(cx, str);
>          if (!opt)

Can't you just make this

  JSAutoByteString opt;
  if (!opt.encodeUtf8(cx, str))
    return false;

and then avoid the clearing/reencoding?  Looks like everyone's interpreting the string in a way that's UTF-8-compatible.

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +167,5 @@
>      if (!buf) {
>          return NS_ERROR_OUT_OF_MEMORY;
>      }
>  
> +    JS_ReportErrorUTF8(callerContext, buf);

Uh, this is pre-existingly wrong.  It'd be a format-specifier vulnerability, even, if |ap| had any content-provided values in it -- but fortunately it only has things like file locations that are trusted.  This should be JS_ReportErrorUTF8(callerContext, "%s", buf);.

@@ +1270,3 @@
>                      return NS_ERROR_FAILURE;
>                  return ReportOnCaller(cxhelper, ERROR_GETTING_SYMBOL,
>                                        PromiseFlatCString(aLocation).get(),

This location looks temptingly UTF-8, but I would feel much, much safer if we weren't using the original user-provided value, but instead used a spec gotten from ComponentLoaderInfo::mURI.  Could you post a patch that converts all these |PromiseFlatCString(aLocation).get()| into calls to a |nsresult ComponentLoaderInfo::Location(PromiseFlatCString&s aLocation)| that does mURI->GetSpec() for a location string?  Separate this file's changes out into a separate patch, then ask me again -- and a module peer, since we're starting to go beyond surface-level changes here.

@@ +1380,5 @@
>  
>  JSCLContextHelper::~JSCLContextHelper()
>  {
>      if (mBuf) {
> +        JS_ReportErrorUTF8(mContext, mBuf);

Another not-vulnerability.

  JS_ReportErrorUTF8(mContext, "%s", mBuf);

::: js/xpconnect/src/Sandbox.cpp
@@ +931,5 @@
>          } else {
> +            name.clear();
> +            RootedString nameStr(cx, nameValue.toString());
> +            if (!name.encodeUtf8(cx, nameStr))
> +                return false;

No need for clearing/reencoding, just do

  JSAutoByteString name;
  if (!name.encodeUtf8(cx, nameValue.toString()))
    return false;

a bit above.  And given this throws an error on failure, it seems like NS_ENSURE_TRUE isn't necessary for this.

::: js/xpconnect/src/XPCShellImpl.cpp
@@ +490,5 @@
>      JS::CallArgs args = CallArgsFromVp(argc, vp);
>      ContextOptions oldContextOptions = ContextOptionsRef(cx);
>  
>      for (unsigned i = 0; i < args.length(); ++i) {
> +        RootedString str(cx, ToString(cx, args[i]));

Put the RootedString declaration outside the loop, then just assign here.

@@ +495,5 @@
>          if (!str)
>              return false;
>  
>          JSAutoByteString opt(cx, str);
>          if (!opt)

Move the declaration outside the loop, then have

  opt.clear();
  if (!opt.encodeUtf8(cx, str))
    return false;

and don't bother reencoding below.
Attachment #8793546 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793554 [details] [diff] [review]
Part 11: Use UTF8 variant of JS_ReportErrorNumberVA.

Review of attachment 8793554 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bindings/BindingUtils.cpp
@@ +91,5 @@
>  binding_detail::ThrowErrorMessage(JSContext* aCx, const unsigned aErrorNumber, ...)
>  {
>    va_list ap;
>    va_start(ap, aErrorNumber);
> +  JS_ReportErrorNumberUTF8VA(aCx, GetErrorMessage, nullptr, aErrorNumber, ap);

We should (similar to js.msg/jsshell.msg) add a build step checking dom/bindings/Errors.msg is UTF-8 -- not as important as for the ASCII thing SpiderMonkey wants, and it's doubtful someone would ever change its encoding from ASCII/UTF-8 deliberately or by accident, but worth doing.

::: dom/bindings/BindingUtils.h
@@ +1232,1 @@
>                    const char* sourceDescription)

Mind removing the body of this function while you're here?  The specializations below mean this is entirely dead code, and it seems best to remove the distraction.

@@ +1285,5 @@
>  
>  template<bool InvalidValueFatal>
>  inline int
>  FindEnumStringIndex(JSContext* cx, JS::Handle<JS::Value> v, const EnumEntry* values,
>                      const char* type, const char* sourceDescription, bool* ok)

It would be good to file a followup bug to convert this to the JSAPI signature convention of returning false on failure, filling the index in an outparam.  This is in a bunch of JSAPI-using code, and the oddball convention makes the code harder to read -- plus you have oddities like being forced to set an outparam *and* return a value that should never be examined (like |return 0| when ToString fails is a perfectly reasonable return value, but you have to know the caller will never look at it because of the |*ok = false;| to know it's fine).
Attachment #8793554 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793556 [details] [diff] [review]
Part 13: Use ArgumentsAreLatin1 in frontend error reporting.

Review of attachment 8793556 [details] [diff] [review]:
-----------------------------------------------------------------

I skimmed all of {Parser,TokenStream}.cpp, and it does look like everything passed in is ASCII -- or rarely the result of AtomToPrintableString which does Latin-1 but not by name, or encodeLatin1 which does so by name.  It's conceivable I may have missed something, but I did at least cursorily check everything.

::: js/src/frontend/TokenStream.cpp
@@ +640,5 @@
>          }
>      }
>  
>      if (!ExpandErrorArgumentsVA(cx, GetErrorMessage, nullptr, errorNumber, &err.message,
> +                                nullptr, ArgumentsAreLatin1, &err.report, args))

This appears to be correct, so this patch can land.  (I think you should send mail to js-internals, and announce on IRC, that this happened, so everyone's aware.  It wouldn't be hard for someone to accidentally encodeUtf8 an atom of some sort in a frontend error message, producing mojibake output.)

But I'd really prefer to see Latin1 in all the callers' names so that some distant caller in Parser.cpp can't accidentally pass in something non-Latin-1, for example, without having to type out some sort of warning to themselves.

However, touching up ~200 instances in Parser.cpp and ~90 instances in TokenStream.cpp (just hits for "report" in each file, with many false positives) is a bit burdensome.  And it's possible (I'm not sure how likely) I may need to touch all this soon, in pursuit of being able to parse/tokenize 8-bit character data without having to inflate to 16-bit character size.  So let's leave the naming as-is for now.  And maybe I'll clean this up after this lands -- or at least get that out of the way for subsequent fixing, if that's how it works out.
Attachment #8793556 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793557 [details] [diff] [review]
Part 14: Remove pseudo-ASCII variant of JS_ReportError*.

Review of attachment 8793557 [details] [diff] [review]:
-----------------------------------------------------------------

I assume after all the other patches this compiles.  \o/
Attachment #8793557 - Flags: review?(jwalden+bmo) → review+
Depends on: 1304970
Fixed 2 JS_ReportErrorASCII calls.

the first JS_ReportErrorASCII call is totally unrelated, so removed.

also, separated the second call into 2 cases, because, setCloneBuffer_impl is a setter, so calling it with `args.length() != 1` cannot happen unless the setter function is directly called.
calling it with non-string value can happen.
Attachment #8794503 - Flags: review?(jwalden+bmo)
spec can contain random encoding if it's "jar:" URI, as the part after "!/" is raw string from chrome.manifest.
So changed to call JS_ReportErrorLatin1.
Attachment #8794504 - Flags: review?(jwalden+bmo)
Added assertion for class name into js::InformalValueTypeName.
Attachment #8794505 - Flags: review?(jwalden+bmo)
Added ComponentLoaderInfo::Location and changed |PromiseFlatCString(aLocation).get()| to the result of |info.Location(...)|.
Attachment #8794506 - Flags: review?(jwalden+bmo)
Confirmed almost all callers pass ASCII strings, except those 2 cases:

https://dxr.mozilla.org/mozilla-central/rev/60cc643978c7020926fe4145761e26945fcd5c37/dom/canvas/WebGLProgram.cpp#1099
>             mContext->GenerateWarning("linkProgram: Failed to link, leaving the following"
>                                       " log:\n%s\n",
>                                       mLinkLog.BeginReading());

https://dxr.mozilla.org/mozilla-central/rev/60cc643978c7020926fe4145761e26945fcd5c37/dom/canvas/WebGLContextExtensions.cpp#292
>             GenerateWarning("getExtension('%s'): MOZ_ prefixed WebGL extension"
>                             " strings are deprecated. Support for them will be"
>                             " removed in the future. Use unprefixed extension"
>                             " strings. To get draft extensions, set the"
>                             " webgl.enable-draft-extensions preference.",
>                             name.get());

those `mLinkLog` and `name` can contain user provided value from JavaScript,
the value is converted from UTF16 to lower 8bit, it's not guaranteed to be ASCII.
I'll handle them in later patch.

For other cases, added ASCII to error reporting functions in dom/canvas.
Attachment #8794507 - Flags: review?(jwalden+bmo)
Changed WebGLContext::GenerateWarningASCII to use JS_ReportWarningASCII.
Attachment #8794509 - Flags: review?(jwalden+bmo)
Added WebGLContext::GenerateWarningLatin1 and changed WebGLContext::GenerateWarningASCII with va_list to WebGLContext::GenerateWarningImpl, with extra parameter for encoding.
now it supports ASCII and Latin1.

Then, changed above 2 GenerateWarning callsites to GenerateWarningLatin1.
Attachment #8794510 - Flags: review?(jwalden+bmo)
Comment on attachment 8794507 [details] [diff] [review]
Part 5.1: Add ASCII to error reporting function name in dom/canvas.

Review of attachment 8794507 [details] [diff] [review]:
-----------------------------------------------------------------

Soooooo, jgilbert is a poopyhead and sounds like he isn't okay with putting ASCII into these names to clarify to callers what encoding their strings should be.  ;-)  So probably this'll be r- by him, but I'll let him do that.

If we're not going to put ASCII into the function names, then I think I'd prefer we use the Latin1 warning function for all this.  That's safe against someone accidentally putting non-ASCII into these strings (and there are a few in here where I could imagine it happening, like the ones that cite a GLES section and currently use '$' instead of a proper section symbol).  It's not perfect, but it assuages my concern about having all these APIs not indicate their character encoding in the function names -- worst that happens is WebGL errors would just contain mojibake, not be actually unsafe.

So, r+ from me, but if jgilbert really wants to be a party pooper here, make the warning functions take Latin-1 input to be safe.  No matter how much he says he'll r- anything that adds non-ASCII to any of this, either eventually he'll leave, or he'll miss something in a trivial review.  Be safe, just make it Latin-1, and if an error message goes wrong, it's on the WebGL people's heads -- but safely.

Also, probably just group all the mini-followups in one bug, they're all pretty trivial.

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +2437,5 @@
>          JSContext* context = nsContentUtils::GetCurrentJSContext();
>          if (context) {
> +          JS_ReportWarningASCII(context,
> +                                "CanvasRenderingContext2D.createPattern()"
> +                                " failed to snapshot source canvas.");

If any of these lines get changed, I'd prefer whitespace be trailing so text lines up vertically, so move the space a line earlier.  And anywhere else in this patch, please.

::: dom/canvas/WebGL2ContextFramebuffers.cpp
@@ +88,5 @@
>          MOZ_CRASH("GFX: Bad target.");
>      }
>  
>      if (!fb)
> +        return ErrorInvalidOperationASCII("%a: Xannot modify framebuffer 0.");

jgilbert tells me -- shocker! -- this is bizarro typos.  So we need another bug on file for this.

::: dom/canvas/WebGLContextBuffers.cpp
@@ +319,5 @@
>          return;
>  
>      ////
>  
>      UniquePtr<uint8_t> zeroBuffer((uint8_t*)calloc(size, 1));

Erm, this is an allocator mismatch.  There's no guarantee that deleting a single uint8_t -- which is what this looks like it's doing -- will be equivalent to the |free| that's what's *actually* necessary to free this memory.  In principle malloc/calloc/realloc/free could use a separate heap from the global operator new/delete, which would make this code Bad.

Better than calling calloc like this, instead do

  auto zeroBuffer = MakeUnique<uint8_t[]>(size);

(Yes, this will allocate a zeroed array.  See mfbt/UniquePtr.h's comment by MakeUnique, "If Type is an array T[]".)  I guess defer this to the followup bug, too.

::: dom/canvas/WebGLContextDraw.cpp
@@ +868,5 @@
>  
>          if (!checked_byteLength.isValid() ||
>              !checked_sizeOfLastElement.isValid())
>          {
> +            ErrorInvalidOperationASCII("%s: Integer overflow occured while"

occurred

::: dom/canvas/WebGLContextValidate.cpp
@@ +137,5 @@
>      bool separate = (readOffset + size < writeOffset || writeOffset + size < readOffset);
>      if (!separate) {
> +        ErrorInvalidValueASCII("%s: ranges [readOffset, readOffset + size) and"
> +                               " [writeOffset, writeOffset + size) overlap",
> +                               info);

The definition of |bool separate| above looks like it should be using <=, not < -- at least not if it's going to be consistent with this error message.  More followuping.

@@ +161,5 @@
>      default:
>          break;
>      }
>  
> +    ErrorInvalidEnumInfoASCII(info, target);

This enclosing function is dead in my tree -- followup to remove.

::: dom/canvas/WebGLContextVertices.cpp
@@ +442,5 @@
>      /* XXX make work with bufferSubData & heterogeneous types
>       if (type != mBoundArrayBuffer->GLType())
> +     return ErrorInvalidOperationASCII("vertexAttribPointer: type must match"
> +                                       " bound VBO type: %d != %d", type,"
> +                                       " mBoundArrayBuffer->GLType());

Quoting got messed up here.

::: dom/canvas/WebGLProgram.cpp
@@ +523,5 @@
>      }
>  
>      if (StringBeginsWith(name, NS_LITERAL_STRING("gl_"))) {
> +        mContext->ErrorInvalidOperationASCII("bindAttribLocation: Can't set"
> +                                             " the  location of a name that"

Double-space in this.

::: dom/canvas/WebGLTexture.cpp
@@ +451,5 @@
>          if (incompleteReason) {
> +            mContext->GenerateWarningASCII("%s: Active texture %u for target"
> +                                           " 0x%04x is 'incomplete', and will"
> +                                           " be rendered as RGBA(0,0,0,1), as"
> +                                           " per the GLES 2.0.24 $3.8.2: %s",

Huh, this could be a proper § if we had Latin-1.  But it can stay as-is regardless what we do, in any case.
Attachment #8794507 - Flags: review?(jwalden+bmo)
Attachment #8794507 - Flags: review?(jgilbert)
Attachment #8794507 - Flags: review+
Attachment #8794503 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8794505 [details] [diff] [review]
Part 2.2: Use ASCII variant of JS_ReportError when formatting JSClass::name, and assert it is ASCII.

Review of attachment 8794505 [details] [diff] [review]:
-----------------------------------------------------------------

r+ in spirit, but if actually doing what I suggest spirals too hard, feel free to go back to the review-well on this.  :-)

::: js/src/jscntxt.cpp
@@ +317,5 @@
>      return false;
>  }
>  
> +void
> +js::AssertIsASCII(const char* s)

Rather than have a "Assertion failure: (*s & 0x80) == 0" as an error message, what about if we make this function bool js::StringIsASCII(const char* s)?  Then we could have

  MOZ_ASSERT(StringIsASCII(obj->clasp()->name));

in places like here, with far more readable message.  (And if the argument isn't informative like here, the caller can add a reason-string to the assertion to better explain -- which they can't do when the assertion is buried within an AssertIsASCII function.)

::: js/src/jsobj.cpp
@@ +91,5 @@
>  {
> +    if (v.isObject()) {
> +#ifdef DEBUG
> +        AssertIsASCII(v.toObject().getClass()->name);
> +#endif

So this is perfectly fine, but it's entirely possible someone might have a class that doesn't have an ASCII name, and it's a ticking time bomb until this gets called.  I'd like an assertion in a more central place.

Given we're looking at classes associated with *objects*, perhaps putting it in object creation code makes sense.  I guess these days we store class in ObjectGroup, so it seems like that's a fair place to put it.  How about adding this assertion to ObjectGroup::ObjectGroup, and then because ObjectGroup::clasp_ isn't const, also to ObjectGroup::setClasp?  I think that should cover the bases, or at least go much further along those lines than this does.
Attachment #8794505 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8794507 [details] [diff] [review]
Part 5.1: Add ASCII to error reporting function name in dom/canvas.

Review of attachment 8794507 [details] [diff] [review]:
-----------------------------------------------------------------

Don't use the ASCII suffix in the webgl function names. Our error reporting functions should assume ascii, and use the non-ascii versions where applicable.

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +2437,5 @@
>          JSContext* context = nsContentUtils::GetCurrentJSContext();
>          if (context) {
> +          JS_ReportWarningASCII(context,
> +                                "CanvasRenderingContext2D.createPattern()"
> +                                " failed to snapshot source canvas.");

You can do as you wish with this file, but for all other files in this patch, the existing formatting is deliberate module style. Do not alter this formatting in this patch.

::: dom/canvas/WebGL2ContextFramebuffers.cpp
@@ +88,5 @@
>          MOZ_CRASH("GFX: Bad target.");
>      }
>  
>      if (!fb)
> +        return ErrorInvalidOperationASCII("%a: Xannot modify framebuffer 0.");

Fixed with no bug.

::: dom/canvas/WebGLContextBuffers.cpp
@@ +319,5 @@
>          return;
>  
>      ////
>  
>      UniquePtr<uint8_t> zeroBuffer((uint8_t*)calloc(size, 1));

This would work, but we also have UniqueBuffer in WebGL for this purpose. (like UniquePtr, but with *alloc/free) This is bug 1306112 now.

::: dom/canvas/WebGLContextValidate.cpp
@@ +161,5 @@
>      default:
>          break;
>      }
>  
> +    ErrorInvalidEnumInfoASCII(info, target);

Will remove.
Attachment #8794507 - Flags: review?(jgilbert) → review-
Comment on attachment 8794510 [details] [diff] [review]
Part 5.3: Add WebGLContext::GenerateWarningLatin1 and call JS_ReportWarningLatin1 in dom/canvas.

Review of attachment 8794510 [details] [diff] [review]:
-----------------------------------------------------------------

These entrypoints are both plain ASCII.
Attachment #8794510 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 8794504 [details] [diff] [review]
Part 2.1: Use JS_ReportErrorLatin1 in mozJSComponentLoader::LoadModule.

Review of attachment 8794504 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +406,5 @@
>      if (JS_TypeOfValue(cx, NSGetFactory_val) != JSTYPE_FUNCTION) {
> +        /*
> +         * spec's encoding is ASCII unless it's zip file, otherwise it's
> +         * random encoding.  Latin1 variant is safe for random encoding.
> +         */

This is a bit in the weeds as a comment here.  It *should* be something documented on FileLocation::GetURIString itself.  Please add a comment by that function declaration indicating that the returned string isn't necessarily ASCII, that it may include URL-encoded versions of non-ASCII codepoints, and that non-ASCII codepoints aren't necessarily *guaranteed* to be URL-encoded -- so the overall string is encoded as Latin-1, one byte per codepoint in the returned URI string.  This should get review from whoever owns FileLocation, maybe froydnj because he has the most log-blame for the file recently.  (Ohai!)

Let me also briefly register complaints about the utter un-documentation of how our URL escaping code works, the encodings of its inputs, the encoding it produces, &c.  So grody.  It would be so nice if any of that were cleaner and better-documented.  And maybe if it didn't exist in XPCOM, which is a weird place to find URL handling code.
Attachment #8794504 - Flags: review?(nfroyd)
Attachment #8794504 - Flags: review?(jwalden+bmo)
Attachment #8794504 - Flags: review+
Comment on attachment 8794506 [details] [diff] [review]
Part 3.1: Add mozJSComponentLoader::Location and use UTF8 variant of JS_ReportError in mozJSComponentLoader.

Review of attachment 8794506 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +256,5 @@
>          mKey.emplace();
>          return mResolvedURI->GetSpec(*mKey);
>      }
>  
> +    nsresult ComponentLoaderInfo::Location(PromiseFlatCString& aLocation) {

Don't qualify the name?  Also, on second thought I'd name it GetLocation, with "get" suggesting potential fallibility.  Also add MOZ_MUST_USE before the type so nobody forgets to error-check.

@@ +258,5 @@
>      }
>  
> +    nsresult ComponentLoaderInfo::Location(PromiseFlatCString& aLocation) {
> +        nsresult rv;
> +        rv = info.EnsureURI();

Don't have |info.| in this?  And combine the assignment with the decl.

@@ +1045,5 @@
>          } else if (!targetVal.isNull()) {
>              // If targetVal isNull(), we actually want to leave targetObject null.
>              // Not doing so breaks |make package|.
> +            return ReportOnCallerUTF8(cx, ERROR_SCOPE_OBJ,
> +                                      PromiseFlatCString(registryLocation).get());

I dearly wish it were possible to know |registryLocation| were UTF-8 other than by looking from here to the header, then from the header to the .idl that says this is UTF-8.  :-(
Attachment #8794506 - Flags: review?(jwalden+bmo) → review+
To land these patches incrementally, we need to use ArgumentsAreLatin1 in existing implicit ASCII variants.
Attachment #8795950 - Flags: review?(jwalden+bmo)
Attachment #8795950 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/08eb49570c322315b53b11c51784e8938d47418b
Bug 1289050 - Part 0: Use ArgumentsAreLatin1 in implicit ASCII variants of JS_ReportErrorNumber, JS_ReportErrorNumberVA, and JS_ReportErrorFlagsAndNumber. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/78565156728208e4cb9d20f1a288ba300ceb475f
Bug 1289050 - Part 1: Use ASCII variant of JS_ReportError in simple case. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/778f50f8d2bc7876d16e4a27b13d108f1fa35684
Bug 1289050 - Part 1.1: Make error reporting cleaner in CloneBufferObject::setCloneBuffer_impl. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/e78af6564ec3b4f497f35387669aff74f963009f
Bug 1289050 - Part 2: Use ASCII or Latin1 variants of JS_ReportError in not-simple cases. r=jwalden
Keywords: leave-open
Comment on attachment 8793551 [details] [diff] [review]
Part 8: Use ASCII or UTF8 variant of JS_ReportErrorNumber in non-simple cases.

Review of attachment 8793551 [details] [diff] [review]:
-----------------------------------------------------------------

The wasm bits are a bit scrambled (maybe from 0xc?), but I assume you'll pick up the moves when rebasing.

::: js/src/asmjs/WasmBinaryToAST.cpp
@@ +121,5 @@
>      uint32_t offset = c.d.currentOffset();
>      char offsetStr[sizeof "4294967295"];
>      SprintfLiteral(offsetStr, "%" PRIu32, offset);
> +    JS_ReportErrorNumberASCII(c.cx, GetErrorMessage, nullptr, JSMSG_WASM_DECODE_FAIL,
> +                              offsetStr, str);

Nice seeing this is just dead entirely on trunk, so I don't have to check it.

::: js/src/asmjs/WasmJS.cpp
@@ +229,5 @@
>      UniqueChars error;
>      SharedModule module = Compile(*bytecode, compileArgs, &error);
>      if (!module) {
>          if (error)
> +            JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr, JSMSG_WASM_FAIL, error.get());

In tracking down what |error| might be here, I see that Decoder::fail(const char* msg, ...) is called by users that pass in printf-style formats, for example PRIu32 and PRIx32, but itself interprets arguments via JS_vsmprintf.  I also see that Decoder::fail(UniqueChars msg) is used *only* by the former Decoder::fail overload.  More followup-land: replace the PRI* format specifiers with jsprf.cpp-compatible ones literally spelled out, and inline the second overload into the first overload.

::: js/src/builtin/TypedObject.cpp
@@ +1590,5 @@
>                          HandleTypedObject obj)
>  {
>      // Serialize type string of obj
> +    RootedAtom typeReprAtom(cx, &obj->typeDescr().stringRepr());
> +    UniqueChars typeReprStr(JS_EncodeStringToUTF8(cx, typeReprAtom));

Woo!  This should fix an existing bug, if I read correctly!

js> var t = new TypedObject.StructType({ a\u1234c: TypedObject.int8 });
js> Object.defineProperty((new t), "foo", { value: 12 })
typein:2:1 TypeError: new StructType({a4c: int8}): Object is not extensible
Stack:
  @typein:2:1

@@ +1696,5 @@
>      RootedString str(cx, ValueToSource(cx, idVal));
>      if (!str)
>          return false;
>  
> +    char* propName = JS_EncodeStringToUTF8(cx, str);

Use a smart pointer to hold/free this.

Technically, ValueToSource called on strings (which this happens to be, if you track callers) only gives us back strings with odd codepoints escaped, so this won't actually have weird stuff in it.  But that's getting a bit tricksy, so this is good (and arguably better).

::: js/src/ctypes/CTypes.cpp
@@ +1349,5 @@
>  ArgumentRangeMismatch(JSContext* cx, const char* arg, const char* func,
>                       const char* range)
>  {
> +  JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr,
> +                            CTYPESMSG_ARG_RANGE_MISMATCH, arg, func, range);

|arg| appears to be the empty string in every caller, so please remove it and replace the argument in the call in a followup.

::: js/src/json.cpp
@@ +309,5 @@
>      bool foundCycle(JSContext* cx) {
>          auto addPtr = stack.lookupForAdd(obj_);
>          if (addPtr) {
> +            JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr, JSMSG_JSON_CYCLIC_VALUE,
> +                                      js_object_str);

rs=me in a separate patch, followup, whatever, to make JSMSG_JSON_CYCLIC_VALUE not have any parameters and just inline "object" directly.  This R Dumb.

::: js/src/vm/Debugger.cpp
@@ +3180,4 @@
>  
>  /* static */ Debugger*
>  Debugger::fromThisValue(JSContext* cx, const CallArgs& args, const char* fnname)
>  {

This function too is only used in this file, should be file-static.

@@ +7643,3 @@
>  
>  /* static */ DebuggerFrame*
>  DebuggerFrame::checkThis(JSContext* cx, const CallArgs& args, const char* fnname, bool checkLive)

This function should also be file-static.

Additionally, the THIS_FRAME_THISOBJ macro should be folded into its sole caller, and THIS_FRAME_ITER and THIS_FRAME_OWNER and THIS_FRAME_OWNER_ITER are unused (at least per searchfox) and should be removed.

@@ +10164,3 @@
>  /* static */ DebuggerEnvironment*
>  DebuggerEnvironment::checkThis(JSContext* cx, const CallArgs& args, const char* fnname,
>                                 bool requireDebuggee)

Another function that should be file-static.

I guess there are enough of these file-static functions and macro changes that it's worth a review.  Here, followup bug, doesn't matter.

::: js/src/vm/DebuggerMemory.cpp
@@ +78,4 @@
>  };
>  
>  /* static */ DebuggerMemory*
>  DebuggerMemory::checkThis(JSContext* cx, CallArgs& args, const char* fnName)

This too can/should be a file-local static.

::: js/src/vm/ErrorObject.cpp
@@ +160,4 @@
>  
>  /* static */ bool
>  js::ErrorObject::checkAndUnwrapThis(JSContext* cx, CallArgs& args, const char* fnName,
>                                      MutableHandle<ErrorObject*> error)

This function is only used in this file, so there's no reason for it to be a static function on ErrorObject, and therefore necessarily declared in ErrorObject.h, and potentially called in other files.  rs=me in a followup patch to make this file-static and not a member function in ErrorObject -- would have saved me some searching trouble (to verify encoding of |fnName|) had it been this way already.

::: js/src/vm/SavedStacks.cpp
@@ +603,4 @@
>  
>  /* static */ bool
>  SavedFrame::checkThis(JSContext* cx, CallArgs& args, const char* fnName,
>                        MutableHandleObject frame)

This too would be better as a file-local static function than as a member function in SavedFrame, rs=me to move it as well.
Attachment #8793551 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8793555 [details] [diff] [review]
Part 12: Use ASCII or Latin1 or UTF8 variant of JS_ReportErrorFlagsAndNumber.

Review of attachment 8793555 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jscntxt.cpp
@@ +856,5 @@
>          strcmp(bytes.get(), js_null_str) == 0) {
> +        ok = JS_ReportErrorFlagsAndNumberLatin1(cx, JSREPORT_ERROR,
> +                                                GetErrorMessage, nullptr,
> +                                                JSMSG_NO_PROPERTIES,
> +                                                bytes.get(), nullptr, nullptr);

Get rid of the two trailing nullptrs -- those look like remnants from when these three reports were all a single call, I bet.

@@ +861,5 @@
>      } else if (v.isUndefined()) {
> +        ok = JS_ReportErrorFlagsAndNumberLatin1(cx, JSREPORT_ERROR,
> +                                                GetErrorMessage, nullptr,
> +                                                JSMSG_UNEXPECTED_TYPE,
> +                                                bytes.get(), js_undefined_str, nullptr);

Also the trailing nullptr here, and in the one below.

Treating DVG results as Latin-1 is safe, at least, but not really right.  At the same time, DVG is just wrong, because it observably generates its string if the value is an object with a toSource property on it.  Given that wrongness that we need to expunge, trying to make this right somehow is a bit dumb -- so Latin-1 it is.

@@ +906,5 @@
>      if (!bytes)
>          return false;
>  
> +    ok = JS_ReportErrorFlagsAndNumberLatin1(cx, flags, GetErrorMessage, nullptr, errorNumber,
> +                                            bytes.get(), arg1, arg2);

Ditto for DVG/Latin-1 here.

::: js/src/vm/RegExpObject.cpp
@@ +1055,5 @@
>      }
>  
>      if (!ok) {
> +        TwoByteChars range(&lastParsed, 1);
> +        UniqueChars utf8(JS::CharsToNewUTF8CharsZ(nullptr, range).c_str());

This needs a null-check and fail if null.
Attachment #8793555 - Flags: review?(jwalden+bmo) → review+
changed AssertIsASCII to StringIsASCII, and added assertion to ObjectGroup::ObjectGroup and ObjectGroup::setClasp.
ObjectGroup::setClasp is moved from .h to .cpp, as declaration in jscntxt.h is not available there, because of dependency.
Attachment #8796145 - Flags: review?(jwalden+bmo)
changed:
 PRIu32 => %lu
 PRIx32 => %lx
Attachment #8796146 - Flags: review?(jwalden+bmo)
Changed:
  * Debugger::fromThisValue => Debugger_fromThisValue
  * DebuggerFrame::requireLive => DebuggerFrame_requireLive
  * DebuggerFrame::checkThis => DebuggerFrame_checkThis
  * DebuggerObject::checkThis => DebuggerObject_checkThis
  * DebuggerEnvironment::checkThis => DebuggerEnvironment_checkThis

Also, made Debugger::class_ public to access it from Debugger_fromThisValue.
other class_ members are already public.
Attachment #8796148 - Flags: review?(jwalden+bmo)
reverted function name change.
Attachment #8794507 - Attachment is obsolete: true
Attachment #8794509 - Attachment is obsolete: true
Attachment #8794510 - Attachment is obsolete: true
Attachment #8794509 - Flags: review?(jwalden+bmo)
Attachment #8796149 - Flags: review?(jgilbert)
Comment on attachment 8794504 [details] [diff] [review]
Part 2.1: Use JS_ReportErrorLatin1 in mozJSComponentLoader::LoadModule.

Review of attachment 8794504 [details] [diff] [review]:
-----------------------------------------------------------------

I think Waldo was suggesting you ask me for review on the changes you make to FileLocation::GetURIString and associated things, not this patch.
Attachment #8794504 - Flags: review?(nfroyd)
Comment on attachment 8796149 [details] [diff] [review]
Part 5.1: Use ASCII and Latin1 variants of JS_ReportWarning in dom/canvas. r=jwalden

Review of attachment 8796149 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/WebGLContextUtils.cpp
@@ +89,5 @@
>          return;
>      }
>  
>      JSContext* cx = api.cx();
> +    JS_ReportWarningLatin1(cx, "WebGL: %s", buf);

ASCII would be fine here.
Attachment #8796149 - Flags: review?(jgilbert) → review+
(In reply to Nathan Froyd [:froydnj] from comment #69)
> I think Waldo was suggesting you ask me for review on the changes you make
> to FileLocation::GetURIString and associated things, not this patch.

Er -- yes, I was.  Then I proceeded to set addl.review+ and transfer the existing request on the existing patch over, which doesn't make any sense.  :-)  Don't blame arai for this.

(Maybe this counts as my TIGS moment for the day.  ;-) )
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #71)
> (In reply to Nathan Froyd [:froydnj] from comment #69)
> > I think Waldo was suggesting you ask me for review on the changes you make
> > to FileLocation::GetURIString and associated things, not this patch.
> 
> Er -- yes, I was.  Then I proceeded to set addl.review+ and transfer the
> existing request on the existing patch over, which doesn't make any sense. 
> :-)  Don't blame arai for this.

My mistake!  I didn't look too closely at who was requesting what review.
Attachment #8796147 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8796148 [details] [diff] [review]
Part 8.4: Convert some debugger related static methods to file static functions.

Review of attachment 8796148 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Debugger.cpp
@@ +7702,1 @@
>          return nullptr;

Switch this to

    if (checkLive) {
        if (!DebuggerFrame_requireLive(cx, frame))
            return false;
    }

since you're changing things.
Attachment #8796148 - Flags: review?(jwalden+bmo) → review+
Attachment #8796146 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8796145 [details] [diff] [review]
Part 2.2: Use ASCII variant of JS_ReportError when formatting JSClass::name, and assert it is ASCII.

Review of attachment 8796145 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/ObjectGroup.cpp
@@ +48,5 @@
>      setGeneration(zone()->types.generation);
>  }
>  
>  void
> +ObjectGroup::setClasp(const Class* clasp)

Hmm, out-of-lining this seems somewhat undesirable/unnecessary.  It seems to me that StringIsASCII is a sufficiently justifiable function, that we could make it JS::StringIsASCII and put it in js/public/CharacterEncoding.h and js/src/vm/CharacterEncoding.cpp.  Then we could #include "js/CharacterEncoding.h" and not have to out-of-line this function.  That seems preferable to me for this.
Attachment #8796145 - Flags: review?(jwalden+bmo) → review-
Thanks :)
moved it to CharacterEncoding with JS namespace
Attachment #8796145 - Attachment is obsolete: true
Attachment #8796324 - Flags: review?(jwalden+bmo)
Comment on attachment 8796324 [details] [diff] [review]
Part 2.2: Use ASCII variant of JS_ReportError when formatting JSClass::name, and assert it is ASCII.

Review of attachment 8796324 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/CharacterEncoding.h
@@ +323,5 @@
>  extern Latin1CharsZ
>  LossyUTF8CharsToNewLatin1CharsZ(JSContext* cx, const UTF8Chars utf8, size_t* outlen);
>  
> +extern bool
> +StringIsASCII(const char* s);

Needs a doc-comment of some sort, like

"""
Returns true if all characters in the given null-terminated string are ASCII, i.e. < 0x80, false otherwise.
"""

::: js/src/jscntxt.cpp
@@ +555,5 @@
>              } else if (typeArg == ArgumentsAreASCII || typeArg == ArgumentsAreLatin1) {
>                  const char* charArg = va_arg(ap, char*);
>                  size_t charArgLength = strlen(charArg);
>  
> +                MOZ_ASSERT_IF(typeArg == ArgumentsAreASCII, JS::StringIsASCII(charArg));

Add a #include "js/CharacterEncoding.h" to this file if it lacks one.

::: js/src/vm/ObjectGroup.cpp
@@ +36,5 @@
>      PodZero(this);
>  
>      /* Windows may not appear on prototype chains. */
>      MOZ_ASSERT_IF(proto.isObject(), !IsWindow(proto.toObject()));
> +    MOZ_ASSERT(JS::StringIsASCII(clasp->name));

Add a #include "js/CharacterEncoding.h" to this file if it lacks one.

::: js/src/vm/ObjectGroup.h
@@ +9,5 @@
>  
>  #include "jsbytecode.h"
>  #include "jsfriendapi.h"
>  
> +#include "js/CharacterEncoding.h"

I think this wants to go above the GCHashTable.h #include to pass the include-style-checker.
Attachment #8796324 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/968d21d7f16d1790dd70365f81aa5ba18a4cf95b
Bug 1289050 - Part 2.1: Use JS_ReportErrorLatin1 in mozJSComponentLoader::LoadModule. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/b353e488f0eb8a6295db6275273157016963e31c
Bug 1289050 - Part 2.2: Use ASCII variant of JS_ReportError when formatting JSClass::name, and assert it is ASCII. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/fd72da5a490517502ec2eca4bd5eedb9400c997c
Bug 1289050 - Part 3: Use UTF8 variant of JS_ReportError after re-encoding string. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/125555035882e55c1abc8103be89e6ecf26c3ce1
Bug 1289050 - Part 3.1: Add mozJSComponentLoader::Location and use UTF8 variant of JS_ReportError in mozJSComponentLoader. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/2abf8aadd55680cec8955d41a7aa69caef0b67b0
Bug 1289050 - Part 4: Use Latin1 or UTF8 variants of JS_ReportError in js shell. r=sfink

https://hg.mozilla.org/integration/mozilla-inbound/rev/865b287375d7ef16ceb8092f18aa91ba625d1095
Bug 1289050 - Part 5: Use ASCII variant of JS_ReportWarning. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/99298ff4a2d2f7438f226b5b2d2f1cce07410743
Bug 1289050 - Part 5.1: Use ASCII variants of JS_ReportWarning in dom/canvas. r=jwalden,jgilbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/ccacc9678443f2a45da35f3486e0b0bb814e4fd8
Bug 1289050 - Part 6: Use ASCII variant of JS_ReportErrorNumber in simple case. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e98bedb1102a74111da7d7bcdba99f7f9adae37
Bug 1289050 - Part 7: Use ASCII variant of JS_ReportErrorNumber when parameters are all static string. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/f970fe55ff6cbf39fca00df19b0894ad0830fbd9
Bug 1289050 - Part 8: Use ASCII or UTF8 variant of JS_ReportErrorNumber in non-simple cases. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/245a2abe284edd7049699224ef26df239cef99fa
Bug 1289050 - Part 8.1: Use %lu and %lx instead of PRIu32 and PRIx32 in WasmBinary.cpp. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/bc6b3f284a830f3d8ca5d4e409441c361e3b83fd
Bug 1289050 - Part 8.2: Remove unused parameter from ArgumentRangeMismatch and CTYPESMSG_ARG_RANGE_MISMATCH. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/96d5a9486cea49d12ae5066a3e6171b7c48b43bf
Bug 1289050 - Part 8.3: Remove unused parameter from JSMSG_JSON_CYCLIC_VALUE. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/7a24ab0cb4f0d6173edb259c2623e079ccdd2adb
Bug 1289050 - Part 8.4: Convert some debugger related static methods to file static functions. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/44ddce7c9b96316464b66b90a2f4d98fb763f52f
Bug 1289050 - Part 8.5: Convert js::ErrorObject::checkAndUnwrapThis to file static function. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/14ef4a8f55f36318d169db10c638726928fd59e0
Bug 1289050 - Part 8.6: Convert SavedFrame::checkThis to file static function. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/b1428a6cabc5c470e505841d2306e5ca4c3576f7
Bug 1289050 - Part 9: Use Latin1 variant of JS_ReportErrorNumber after encoding in Latin1. r=jandem

https://hg.mozilla.org/integration/mozilla-inbound/rev/ab961ef0ffd0b57e8a3bf7caafa7e9a1fa6302b2
Bug 1289050 - Part 10: Use Latin1 variant of JS_ReportErrorNumber for strerror. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/e35af31d7fd5ea2049d65da77bdd323c5fe7a759
Bug 1289050 - Part 11: Use UTF8 variant of JS_ReportErrorNumberVA. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/f803ec5c0469b1e9b949ac686e99bf9611b44d04
Bug 1289050 - Part 12: Use ASCII or Latin1 or UTF8 variant of JS_ReportErrorFlagsAndNumber. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/2422b2dcfcb20ae38cc76fd5ce31a22baf37e7da
Bug 1289050 - Part 13: Use ArgumentsAreLatin1 in frontend error reporting. r=jwalden

https://hg.mozilla.org/integration/mozilla-inbound/rev/3b4b49952871c7fb905aaed928e183a8a26191df
Bug 1289050 - Part 14: Remove pseudo-ASCII variant of JS_ReportError*. r=jwalden
Keywords: leave-open
See Also: → 1306537
Depends on: 1306538
No longer depends on: 1306538
See Also: → 1306538
See Also: → 1306540
See Also: → 1416781
You need to log in before you can comment on or make changes to this bug.