Closed Bug 1504802 Opened 6 years ago Closed 6 years ago

Line-of-context computation for syntax errors in UTF-8 tokenizing is not wholly correct

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(2 files)

Usually, f you have a syntax error in UTF-8-tokenized source, the extra line of context associated with it -- linebuf, I think JSAPI calls it -- is computed correctly: it'll all be valid UTF-8/Unicode, its length will be reported correctly, the offset of the token in it will be correct.

However, because the line of context is stored/exposed as UTF-16, eyeballing things, it looks like there are places where the UTF-8-to-16 conversion is not correctly applied to the offsets/lengths computed by the line-of-context computation function.  I'm not sure of the full extent of it, but there is definitely *some* badness here.

So if I add this jsapi-test somewhere (relying on an increasingly long series of patches smeared across half a dozen bugs):

BEGIN_TEST(testBadUtf8InWindow)
{
    static const char firstInWindowIsMultiUnit[] =
      "\xCF\x80 = 3.141592654; @ bad starts HERE:\x80\xFF\xFF";
    CHECK(testContext(firstInWindowIsMultiUnit,
                      u"π = 3.141592654; @ bad starts HERE:"));
    return true;
}

bool
compileUtf8(const char* chars, size_t len, JS::MutableHandleScript script)
{
    // JS::CompileUtf8DontInflate the source text in a fresh realm...
}

template<size_t N, size_t ContextLenWithNull>
bool
testContext(const char (&chars)[N], const char16_t (&expectedContext)[ContextLenWithNull])
{
    JS::Rooted<JSScript*> script(cx);
    CHECK(!compileUtf8(chars, N - 1, &script));

    JS::RootedValue exn(cx);
    CHECK(JS_GetPendingException(cx, &exn));
    JS_ClearPendingException(cx);

    js::ErrorReport report(cx);
    CHECK(report.init(cx, exn, js::ErrorReport::WithSideEffects));

    // ...more testing elided, that's never reached 'cause we
    // assert in the above already...

    return true;
}
END_TEST(testBadUtf8InWindow)

then I hit

Assertion failure: linebufArg[linebufLengthArg] == '\0', at /home/jwalden/moz/after/js/src/jsapi.cpp:6302

I think because π occupies only a single UTF-16 code unit but we computed something in terms of UTF-8 code units somewhere.
Right now this fails because UTF-16 encoding compresses -- so offsets based on UTF-8 will overflow past the end.  We'll hit an assertion when initing the linebuf that there's a null terminator at the linebufLength.
Attachment #9022804 - Flags: review?(arai.unmht)
With this patch in place, the previous test passes.

This was the last definite and obvious and known error in UTF-8 compilation, but I seriously doubt there aren't more.  I'll probably start looking into where column numbers are stored next, then once I run down the inevitable bugs in there, it might be time for comprehensive audit-style searching for problems.
Attachment #9022805 - Flags: review?(arai.unmht)
Comment on attachment 9022805 [details] [diff] [review]
Translate Unit-relevant offsets into sourceUnits, into char16_t-relevant offsets with respect to the just-computed lineOfContext, when reporting a syntax error and adding a line of context to it

Review of attachment 9022805 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Utf8.h
@@ +237,2 @@
>  inline bool
> +IsAscii<Utf8Unit>(Utf8Unit aUnit)

FYI, this is needed for the IsAscii<Unit> in TS.cpp to work.  Merely using IsAscii with a template function argument presents ambiguities, at least with clang.  A lambda can't be used because at least til recently lambdas can't appear in unevaluated context (like in the decltype used inside MOZ_ASSERT verifying the passed condition isn't vacuously true by dint of its type).  And IsAscii<Utf8Unit> without this change won't invoke this definition but instead will start trying to mozilla::MakeUnsigned<Utf8Unit> which blows up.

So, this IsAscii overload needs to be a template specialization so we can use IsAscii<Unit> to pick it up.
Blocks: 1504947
Priority: -- → P2
Flags: needinfo?(arai.unmht)
Attachment #9022804 - Flags: review?(arai.unmht) → review+
Comment on attachment 9022805 [details] [diff] [review]
Translate Unit-relevant offsets into sourceUnits, into char16_t-relevant offsets with respect to the just-computed lineOfContext, when reporting a syntax error and adding a line of context to it

Review of attachment 9022805 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.h
@@ +1441,5 @@
> +     *                                encodedWindowLength, &utf16Length);
> +     *   MOZ_ASSERT(utf16Offset == 7);
> +     *   MOZ_ASSERT(utf16Length = 13);
> +     *
> +     * This function asserts if called for UTF-16: the sole caller can avoid

out of curiosity, does "asserts" here means "crashes"?
Attachment #9022805 - Flags: review?(arai.unmht) → review+
Flags: needinfo?(arai.unmht)
(In reply to Tooru Fujisawa [:arai] from comment #4)
> > +     * This function asserts if called for UTF-16: the sole caller can avoid
> 
> out of curiosity, does "asserts" here means "crashes"?

It'll hit the MOZ_ASSERT_UNREACHABLE, so assert.  The single caller is in code guarded by a |std::is_same<Unit, char16_t>::value| test, so this should be literally unreachable.
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/971f97ac3013
Translate Unit-relevant offsets into sourceUnits, into char16_t-relevant offsets with respect to the just-computed lineOfContext, when reporting a syntax error and adding a line of context to it.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/0fbfcab03843
Add a jsapi-test for a UTF-8 multi-unit code point within the line of context for a syntax error.  r=arai
https://hg.mozilla.org/mozilla-central/rev/971f97ac3013
https://hg.mozilla.org/mozilla-central/rev/0fbfcab03843
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: