UTF-8 and UTF-16 atomization functions don't produce the same atom (was: The page goes blank after loading on ca.warbyparker.com)
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | wontfix |
firefox68 | --- | wontfix |
firefox69 | --- | wontfix |
firefox70 | --- | fixed |
People
(Reporter: ksenia, Assigned: Waldo)
References
(Regression, )
Details
(Keywords: regression)
Attachments
(2 files)
175 bytes,
application/x-javascript
|
Details | |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta-
|
Details | Review |
Steps to reproduce:
In Firefox Nightly navigate to https://ca.warbyparker.com/ and observe the behavior.
Expected:
Site loads
Actual:
Site loads, but then goes blank due to an error: TypeError: "_ is not a function"
here:
_ = k.µ,
E = k.ƒ,
S = n(4),
w = S.isEmpty,
L = S.kebabCase,
N = S.isFunction,
x = S.omit;
n(169);
var M = function (e) {
var t = e.props,
n = e.state;
return _(f(), 'c-formgroup', t.cssUtility, t.cssModifier, E(n.isFocused) ('-focus'), E(!w(t.txtError)) ('-error'))
},
When I set dom.script_loader.external_scripts.utf8_parsing.enabled
to be false, the error goes away
From mozregression:
6:24.69 INFO: Narrowed inbound regression window from [5f48ef70, 613d59f3] (3 builds) to [a3e42056, 613d59f3] (2 builds) (~1 steps left)
6:24.69 INFO: No more inbound revisions, bisection finished.
6:24.69 INFO: Last good revision: a3e4205698b8711791e4bc97ceaceabaa137c553
6:24.69 INFO: First bad revision: 613d59f386c7a757825b4ad7055db072075f5670
6:24.69 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=a3e4205698b8711791e4bc97ceaceabaa137c553&tochange=613d59f386c7a757825b4ad7055db072075f5670
Assignee | ||
Comment 1•5 years ago
|
||
Ooh, this looks juicy! (And unlike the other ostensible UTF-8 bug, much more likely to be directly UTF-8-relevant.) I am on this for sure. Debugging nowish...
Assignee | ||
Comment 2•5 years ago
|
||
Still grunging through this, but right now I don't think this is a UTF-8 tokenizing bug.
...it's an atomization bug. 😲 YES I AM SERIOUS HERE.
And somehow I just happened to stumble upon it.
Mega-yikes.
Assignee | ||
Comment 3•5 years ago
|
||
[jwalden@find-waldo-now src]$ cat /tmp/fail.js
var obj1 = { "µ": "PASS" };
print(obj1.µ || "FAIL", "U+00B5 MICRO SIGN");
var obj2 = { "𧊶": "PASS" };
print(obj2.𧊶 || "FAIL", "U+272B6 CJK UNIFIED IDEOGRAPH-272B6");
[jwalden@find-waldo-now src]$ dbg/js/src/js --file /tmp/fail.js
PASS U+00B5 MICRO SIGN
PASS U+272B6 CJK UNIFIED IDEOGRAPH-272B6
[jwalden@find-waldo-now src]$ dbg/js/src/js --utf8-file /tmp/fail.js
(compiling '/tmp/fail.js' as UTF-8 without inflating)
FAIL U+00B5 MICRO SIGN
PASS U+272B6 CJK UNIFIED IDEOGRAPH-272B6
Stepping through in a debugger, the obj1
property lookup uses a jsid
containing a JSString*
having one bit pattern...but the property in obj1
itself of identical name uses a jsid
containing a JSString*
having a different bit pattern. And if you check both strings, they both say they're atoms.
In other words, something is going very wrong about atomizing UTF-8-valued identifier names.
In the string-literal case, we construct the atom by appending full code points to a UTF-16 buffer, then we drain the buffer into a JSAtom*
.
But in the identifier-name tokenizing case, we instead call this function (in this testcase, directly upon the source units themselves):
template <>
/* static */ MOZ_ALWAYS_INLINE JSAtom*
TokenStreamCharsBase<mozilla::Utf8Unit>::atomizeSourceChars(
JSContext* cx, mozilla::Span<const mozilla::Utf8Unit> units) {
auto chars = ToCharSpan(units);
return AtomizeUTF8Chars(cx, chars.data(), chars.size());
}
And apparently the two methods produce disagreeing "atoms".
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Minimal jsapi-test fodder:
BEGIN_TEST(testAtomizeUTF8AndUTF16Agree) {
// U+00B5 MICRO SIGN ("µ")
static const char16_t MicroSign16[] = u"\u00B5";
static_assert(mozilla::ArrayLength(MicroSign16) - 1 == 1,
"single-code-point UTF-16 has length 1");
static const char MicroSign8[] = u8"\u00B5";
static_assert(mozilla::ArrayLength(MicroSign8) - 1 == 2,
"single-code-point UTF-8 has length 2");
JS::Rooted<JSAtom*> atom16(cx, js::AtomizeChars(cx, MicroSign16, 1));
CHECK(atom16);
CHECK(atom16->latin1OrTwoByteChar(0) == u'µ');
CHECK(atom16->length() == 1);
JS::Rooted<JSAtom*> atom8(cx, js::AtomizeUTF8Chars(cx, MicroSign8, 2));
CHECK(atom8);
CHECK(atom8->latin1OrTwoByteChar(0) == u'µ');
CHECK(atom8->length() == 1);
CHECK(atom16 == atom8);
return true;
}
END_TEST(testAtomizeUTF8AndUTF16Agree)
Assignee | ||
Comment 5•5 years ago
|
||
...r=jonco
Updated•5 years ago
|
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/autoland/rev/0a25146a7c1a AtomizeUTF8OrWTF8Chars should look up non-ASCII, Latin-1 single-code-point static strings itself. (Static strings are not recorded as permanent atoms or in the normal atom table, so it's always necessary to specially look them up.)... r=jonco
Assignee | ||
Comment 7•5 years ago
|
||
Comment on attachment 9087672 [details]
Bug 1575947 - AtomizeUTF8OrWTF8Chars should look up non-ASCII, Latin-1 single-code-point static strings itself. (Static strings are not recorded as permanent atoms or in the normal atom table, so it's always necessary to specially look them up.)...
Beta/Release Uplift Approval Request
- User impact if declined: Property accesses in source code using single-code-point property names in the range [U+0080, U+00FF] will evaluate to
undefined
, not the actual value of the property. And it'll be very difficult for web developers to figure out, because the string created for the property will function in every respect like it should -- except for pointer identity and therefore property-lookup equivalence. Basically impossible to debug without going in with a C++ debugger.
Possibly other bad symptoms could manifest, beyond just property accesses in source: in principle every caller of js::AtomizeUTF8Chars
https://searchfox.org/mozilla-central/search?q=AtomizeUTF8Chars represents a risk. It appears, for example, global imports in asm.js
may be susceptible to errors due to this bug. But only callers that care about JSAtom*
pointer equality (e.g. they use the atom to access a property) will actually be buggy. I don't immediately know how many do that. At a quick skim I'm guessing the TokenStream
caller that stumbled over this is the most likely caller to do this, but I can't say none of the others do it.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The static string store is simple, and other paths to create a string already have code similar to this. The other cases have it simpler -- they don't have to compute the code point to check for a static string of. But the decoding and probing we do here is for exactly one UTF-8 bit encoding and is extremely simple -- and tested by the included test -- so it's very hard to see any room for error.
- String changes made/needed:
Assignee | ||
Comment 8•5 years ago
|
||
While the URL testcase started misbehaving with bug 1554362, the root problem was introduced by bug 1494942.
The comment // Since the static strings are all ascii, we can check them before trying anything else.
introduced by that patch is probably most of what led people astray. Had that been true, the fact of disjoint storage of atoms in three distinct places (all of which must be probed when atomizing character sequences to be sure a duplicate atom isn't created) wouldn't have mattered, because "\u00B5" and so on would not have been in static strings and would have had to be created/record in the normal atoms set.
Comment 9•5 years ago
|
||
bugherder |
Comment 10•5 years ago
|
||
This isn't a new issue and we're out of betas this cycle. Let's let it ride with Fx70.
Updated•5 years ago
|
Updated•2 years ago
|
Description
•