Closed Bug 1575947 Opened 5 years ago Closed 5 years ago

UTF-8 and UTF-16 atomization functions don't produce the same atom (was: The page goes blank after loading on ca.warbyparker.com)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla70

Project Flags:

Webcompat Priority

Tracking Flags:

Tracking

Status

firefox-esr60

---

unaffected

firefox-esr68

---

wontfix

firefox68

---

wontfix

firefox69

---

wontfix

firefox70

---

fixed

People

(Reporter: ksenia, Assigned: Waldo)

References

(Regression,
URL
)

Details

(Keywords: regression)

Attachments

(2 files)

Reduced testcase 5 years ago Jeff Walden [:Waldo] 175 bytes, application/x-javascript		Details
Bug 1575947 - AtomizeUTF8OrWTF8Chars should look up non-ASCII, Latin-1 single-code-point static strings itself. (Static strings are not recorded as permanent atoms or in the normal atom table, so it's always necessary to specially look them up.)... 5 years ago Jeff Walden [:Waldo] 47 bytes, text/x-phabricator-request	RyanVM : approval-mozilla-beta-	Details \| Review

Ksenia Berezina [:ksenia]

Reporter

Description

•

5 years ago

Steps to reproduce:
In Firefox Nightly navigate to https://ca.warbyparker.com/ and observe the behavior.

Expected:
Site loads

Actual:
Site loads, but then goes blank due to an error: TypeError: "_ is not a function" here:

       _ = k.µ,
      E = k.ƒ,
      S = n(4),
      w = S.isEmpty,
      L = S.kebabCase,
      N = S.isFunction,
      x = S.omit;
      n(169);
      var M = function (e) {
        var t = e.props,
        n = e.state;
        return _(f(), 'c-formgroup', t.cssUtility, t.cssModifier, E(n.isFocused) ('-focus'), E(!w(t.txtError)) ('-error'))
      },

When I set dom.script_loader.external_scripts.utf8_parsing.enabled to be false, the error goes away

From mozregression:

 6:24.69 INFO: Narrowed inbound regression window from [5f48ef70, 613d59f3] (3 builds) to [a3e42056, 613d59f3] (2 builds) (~1 steps left)
 6:24.69 INFO: No more inbound revisions, bisection finished.
 6:24.69 INFO: Last good revision: a3e4205698b8711791e4bc97ceaceabaa137c553
 6:24.69 INFO: First bad revision: 613d59f386c7a757825b4ad7055db072075f5670
 6:24.69 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=a3e4205698b8711791e4bc97ceaceabaa137c553&tochange=613d59f386c7a757825b4ad7055db072075f5670

Flags: needinfo?(jwalden)

Jeff Walden [:Waldo]

Assignee

Comment 1

•

5 years ago

Ooh, this looks juicy! (And unlike the other ostensible UTF-8 bug, much more likely to be directly UTF-8-relevant.) I am on this for sure. Debugging nowish...

Flags: needinfo?(jwalden)

Jeff Walden [:Waldo]

Assignee

Comment 2

•

5 years ago

Still grunging through this, but right now I don't think this is a UTF-8 tokenizing bug.

...it's an atomization bug. 😲 YES I AM SERIOUS HERE.

And somehow I just happened to stumble upon it.

Mega-yikes.

Jeff Walden [:Waldo]

Assignee

Comment 3

•

5 years ago

Attached file Reduced testcase — Details

[jwalden@find-waldo-now src]$ cat /tmp/fail.js 
var obj1 = { "µ": "PASS" };
print(obj1.µ || "FAIL", "U+00B5 MICRO SIGN");

var obj2 = { "𧊶": "PASS" };
print(obj2.𧊶 || "FAIL", "U+272B6 CJK UNIFIED IDEOGRAPH-272B6");
[jwalden@find-waldo-now src]$ dbg/js/src/js --file /tmp/fail.js 
PASS U+00B5 MICRO SIGN
PASS U+272B6 CJK UNIFIED IDEOGRAPH-272B6
[jwalden@find-waldo-now src]$ dbg/js/src/js --utf8-file /tmp/fail.js 
(compiling '/tmp/fail.js' as UTF-8 without inflating)
FAIL U+00B5 MICRO SIGN
PASS U+272B6 CJK UNIFIED IDEOGRAPH-272B6

Stepping through in a debugger, the obj1 property lookup uses a jsid containing a JSString* having one bit pattern...but the property in obj1 itself of identical name uses a jsid containing a JSString* having a different bit pattern. And if you check both strings, they both say they're atoms.

In other words, something is going very wrong about atomizing UTF-8-valued identifier names.

In the string-literal case, we construct the atom by appending full code points to a UTF-16 buffer, then we drain the buffer into a JSAtom*.

But in the identifier-name tokenizing case, we instead call this function (in this testcase, directly upon the source units themselves):

template <>
/* static */ MOZ_ALWAYS_INLINE JSAtom*
TokenStreamCharsBase<mozilla::Utf8Unit>::atomizeSourceChars(
    JSContext* cx, mozilla::Span<const mozilla::Utf8Unit> units) {
  auto chars = ToCharSpan(units);
  return AtomizeUTF8Chars(cx, chars.data(), chars.size());
}

And apparently the two methods produce disagreeing "atoms".

Jeff Walden [:Waldo]

Assignee

Updated

•

5 years ago

Assignee: nobody → jwalden

Status: NEW → ASSIGNED

Jeff Walden [:Waldo]

Assignee

Updated

•

5 years ago

Component: DOM: Core & HTML → JavaScript Engine

Summary: The page goes blank after loading on ca.warbyparker.com → UTF-8 and UTF-16 atomization functions don't produce the same atom (was: The page goes blank after loading on ca.warbyparker.com)

Jeff Walden [:Waldo]

Assignee

Comment 4

•

5 years ago

Minimal jsapi-test fodder:

BEGIN_TEST(testAtomizeUTF8AndUTF16Agree) {
  // U+00B5 MICRO SIGN ("µ")
  static const char16_t MicroSign16[] = u"\u00B5";
  static_assert(mozilla::ArrayLength(MicroSign16) - 1 == 1,
                "single-code-point UTF-16 has length 1");
  static const char MicroSign8[] = u8"\u00B5";
  static_assert(mozilla::ArrayLength(MicroSign8) - 1 == 2,
                "single-code-point UTF-8 has length 2");

  JS::Rooted<JSAtom*> atom16(cx, js::AtomizeChars(cx, MicroSign16, 1));
  CHECK(atom16);
  CHECK(atom16->latin1OrTwoByteChar(0) == u'µ');
  CHECK(atom16->length() == 1);

  JS::Rooted<JSAtom*> atom8(cx, js::AtomizeUTF8Chars(cx, MicroSign8, 2));
  CHECK(atom8);
  CHECK(atom8->latin1OrTwoByteChar(0) == u'µ');
  CHECK(atom8->length() == 1);

  CHECK(atom16 == atom8);
  return true;
}
END_TEST(testAtomizeUTF8AndUTF16Agree)

Jeff Walden [:Waldo]

Assignee

Comment 5

•

5 years ago

Attached file Bug 1575947 - AtomizeUTF8OrWTF8Chars should look up non-ASCII, Latin-1 single-code-point static strings itself. (Static strings are not recorded as permanent atoms or in the normal atom table, so it's always necessary to specially look them up.)... — Details

...r=jonco

Steven DeTar [:sdetar]

Updated

•

5 years ago

Priority: -- → P1

Pulsebot

Comment 6

•

5 years ago

Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/autoland/rev/0a25146a7c1a
AtomizeUTF8OrWTF8Chars should look up non-ASCII, Latin-1 single-code-point static strings itself.  (Static strings are not recorded as permanent atoms or in the normal atom table, so it's always necessary to specially look them up.)... r=jonco

Jeff Walden [:Waldo]

Assignee

Comment 7

•

5 years ago

Comment on attachment 9087672 [details]
Bug 1575947 - AtomizeUTF8OrWTF8Chars should look up non-ASCII, Latin-1 single-code-point static strings itself. (Static strings are not recorded as permanent atoms or in the normal atom table, so it's always necessary to specially look them up.)...

Beta/Release Uplift Approval Request

User impact if declined: Property accesses in source code using single-code-point property names in the range [U+0080, U+00FF] will evaluate to undefined, not the actual value of the property. And it'll be very difficult for web developers to figure out, because the string created for the property will function in every respect like it should -- except for pointer identity and therefore property-lookup equivalence. Basically impossible to debug without going in with a C++ debugger.

Possibly other bad symptoms could manifest, beyond just property accesses in source: in principle every caller of js::AtomizeUTF8Chars https://searchfox.org/mozilla-central/search?q=AtomizeUTF8Chars represents a risk. It appears, for example, global imports in asm.js may be susceptible to errors due to this bug. But only callers that care about JSAtom* pointer equality (e.g. they use the atom to access a property) will actually be buggy. I don't immediately know how many do that. At a quick skim I'm guessing the TokenStream caller that stumbled over this is the most likely caller to do this, but I can't say none of the others do it.

Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): The static string store is simple, and other paths to create a string already have code similar to this. The other cases have it simpler -- they don't have to compute the code point to check for a static string of. But the decoding and probing we do here is for exactly one UTF-8 bit encoding and is extremely simple -- and tested by the included test -- so it's very hard to see any room for error.
String changes made/needed:

Attachment #9087672 - Flags: approval-mozilla-beta?

Jeff Walden [:Waldo]

Assignee

Comment 8

•

5 years ago

While the URL testcase started misbehaving with bug 1554362, the root problem was introduced by bug 1494942.

The comment // Since the static strings are all ascii, we can check them before trying anything else. introduced by that patch is probably most of what led people astray. Had that been true, the fact of disjoint storage of atoms in three distinct places (all of which must be probed when atomizing character sequences to be sure a duplicate atom isn't created) wouldn't have mattered, because "\u00B5" and so on would not have been in static strings and would have had to be created/record in the normal atoms set.

Regressed by: 1494942

Andrei Ciure[:aciure]

Comment 9

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0a25146a7c1a

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox70: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla70

Ryan VanderMeulen [:RyanVM]

Comment 10

•

5 years ago

This isn't a new issue and we're out of betas this cycle. Let's let it ride with Fx70.

status-firefox68: --- → wontfix

status-firefox69: --- → wontfix

status-firefox-esr60: --- → unaffected

status-firefox-esr68: --- → wontfix

Flags: in-testsuite+

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

Attachment #9087672 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

Regressions: 1620290

BugBot [:suhaib / :marco/ :calixte]

Updated

•

2 years ago

Has Regression Range: --- → yes

You need to log in before you can comment on or make changes to this bug.