Closed
Bug 1504802
Opened 6 years ago
Closed 6 years ago
Line-of-context computation for syntax errors in UTF-8 tokenizing is not wholly correct
Categories
(Core :: JavaScript Engine, defect, P2)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla65
Tracking | Status | |
---|---|---|
firefox65 | --- | fixed |
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(2 files)
2.98 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
9.33 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
Usually, f you have a syntax error in UTF-8-tokenized source, the extra line of context associated with it -- linebuf, I think JSAPI calls it -- is computed correctly: it'll all be valid UTF-8/Unicode, its length will be reported correctly, the offset of the token in it will be correct. However, because the line of context is stored/exposed as UTF-16, eyeballing things, it looks like there are places where the UTF-8-to-16 conversion is not correctly applied to the offsets/lengths computed by the line-of-context computation function. I'm not sure of the full extent of it, but there is definitely *some* badness here. So if I add this jsapi-test somewhere (relying on an increasingly long series of patches smeared across half a dozen bugs): BEGIN_TEST(testBadUtf8InWindow) { static const char firstInWindowIsMultiUnit[] = "\xCF\x80 = 3.141592654; @ bad starts HERE:\x80\xFF\xFF"; CHECK(testContext(firstInWindowIsMultiUnit, u"π = 3.141592654; @ bad starts HERE:")); return true; } bool compileUtf8(const char* chars, size_t len, JS::MutableHandleScript script) { // JS::CompileUtf8DontInflate the source text in a fresh realm... } template<size_t N, size_t ContextLenWithNull> bool testContext(const char (&chars)[N], const char16_t (&expectedContext)[ContextLenWithNull]) { JS::Rooted<JSScript*> script(cx); CHECK(!compileUtf8(chars, N - 1, &script)); JS::RootedValue exn(cx); CHECK(JS_GetPendingException(cx, &exn)); JS_ClearPendingException(cx); js::ErrorReport report(cx); CHECK(report.init(cx, exn, js::ErrorReport::WithSideEffects)); // ...more testing elided, that's never reached 'cause we // assert in the above already... return true; } END_TEST(testBadUtf8InWindow) then I hit Assertion failure: linebufArg[linebufLengthArg] == '\0', at /home/jwalden/moz/after/js/src/jsapi.cpp:6302 I think because π occupies only a single UTF-16 code unit but we computed something in terms of UTF-8 code units somewhere.
Assignee | ||
Comment 1•6 years ago
|
||
Right now this fails because UTF-16 encoding compresses -- so offsets based on UTF-8 will overflow past the end. We'll hit an assertion when initing the linebuf that there's a null terminator at the linebufLength.
Attachment #9022804 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 2•6 years ago
|
||
With this patch in place, the previous test passes. This was the last definite and obvious and known error in UTF-8 compilation, but I seriously doubt there aren't more. I'll probably start looking into where column numbers are stored next, then once I run down the inevitable bugs in there, it might be time for comprehensive audit-style searching for problems.
Attachment #9022805 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 3•6 years ago
|
||
Comment on attachment 9022805 [details] [diff] [review] Translate Unit-relevant offsets into sourceUnits, into char16_t-relevant offsets with respect to the just-computed lineOfContext, when reporting a syntax error and adding a line of context to it Review of attachment 9022805 [details] [diff] [review]: ----------------------------------------------------------------- ::: mfbt/Utf8.h @@ +237,2 @@ > inline bool > +IsAscii<Utf8Unit>(Utf8Unit aUnit) FYI, this is needed for the IsAscii<Unit> in TS.cpp to work. Merely using IsAscii with a template function argument presents ambiguities, at least with clang. A lambda can't be used because at least til recently lambdas can't appear in unevaluated context (like in the decltype used inside MOZ_ASSERT verifying the passed condition isn't vacuously true by dint of its type). And IsAscii<Utf8Unit> without this change won't invoke this definition but instead will start trying to mozilla::MakeUnsigned<Utf8Unit> which blows up. So, this IsAscii overload needs to be a template specialization so we can use IsAscii<Unit> to pick it up.
Updated•6 years ago
|
Priority: -- → P2
Updated•6 years ago
|
Flags: needinfo?(arai.unmht)
Updated•6 years ago
|
Attachment #9022804 -
Flags: review?(arai.unmht) → review+
Comment 4•6 years ago
|
||
Comment on attachment 9022805 [details] [diff] [review] Translate Unit-relevant offsets into sourceUnits, into char16_t-relevant offsets with respect to the just-computed lineOfContext, when reporting a syntax error and adding a line of context to it Review of attachment 9022805 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.h @@ +1441,5 @@ > + * encodedWindowLength, &utf16Length); > + * MOZ_ASSERT(utf16Offset == 7); > + * MOZ_ASSERT(utf16Length = 13); > + * > + * This function asserts if called for UTF-16: the sole caller can avoid out of curiosity, does "asserts" here means "crashes"?
Attachment #9022805 -
Flags: review?(arai.unmht) → review+
Updated•6 years ago
|
Flags: needinfo?(arai.unmht)
Assignee | ||
Comment 5•6 years ago
|
||
(In reply to Tooru Fujisawa [:arai] from comment #4) > > + * This function asserts if called for UTF-16: the sole caller can avoid > > out of curiosity, does "asserts" here means "crashes"? It'll hit the MOZ_ASSERT_UNREACHABLE, so assert. The single caller is in code guarded by a |std::is_same<Unit, char16_t>::value| test, so this should be literally unreachable.
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/971f97ac3013 Translate Unit-relevant offsets into sourceUnits, into char16_t-relevant offsets with respect to the just-computed lineOfContext, when reporting a syntax error and adding a line of context to it. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/0fbfcab03843 Add a jsapi-test for a UTF-8 multi-unit code point within the line of context for a syntax error. r=arai
Comment 7•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/971f97ac3013 https://hg.mozilla.org/mozilla-central/rev/0fbfcab03843
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in
before you can comment on or make changes to this bug.
Description
•