933885 - xpcshell's dump() mangles non-ASCII output

Assignee

Description

•

11 years ago

The dump() implementation in XPCShell.cpp [ https://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCShellImpl.cpp#278 ] blindly writes the raw bytes of a JSString object to a C FILE.  It is only by dumb luck that this works as well as it does, and it breaks down the moment you try to print a non-ASCII string.  In particular, if you have an xpcshell test that emits non-ASCII characters in (some of) its subtest reports, the string will be mangled to the point where Python's (notoriously picky) JSON decoder will refuse to process it *at all*, and that means runxpcshelltests.py won't be able to figure out whether the test passed!

A complete fix (in the sense that './mach xpcshell-test <test which prints non-ASCII>' does the Right Thing) requires some small adjustments to runxpcshelltests.py as well, but the important part is to make dump() convert to UTF-8 (as other implementations of dump(), e.g. the one right next door in mozJSComponentLoader.cpp, do).

Patch shortly.

Zack Weinberg (:zwol)

Assignee

Comment 1

•

11 years ago

Attached patch 933885-xpcshell-dump-utf8.diff (obsolete) — Details — Splinter Review

Assignee: nobody → zackw

Status: NEW → ASSIGNED

Attachment #826029 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 2

•

11 years ago

Comment on attachment 826029 [details] [diff] [review]
933885-xpcshell-dump-utf8.diff

The C++ bits look fine, especially if you pass "length" as the second arg to the NS_ConvertUTF16toUTF8 ctor.

But I don't know what those JS bits are and whether those changes are correct.  Please get someone who does to review those?

Attachment #826029 - Flags: review?(bzbarsky) → review+

Zack Weinberg (:zwol)

Assignee

Updated

•

11 years ago

Blocks: 933494

Zack Weinberg (:zwol)

Assignee

Comment 3

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #2)
> Comment on attachment 826029 [details] [diff] [review]
> 933885-xpcshell-dump-utf8.diff
> 
> The C++ bits look fine, especially if you pass "length" as the second arg to
> the NS_ConvertUTF16toUTF8 ctor.

OK.  I copied that code from mozJSComponentLoader, so I'll make the same change there.
 
> But I don't know what those JS bits are and whether those changes are
> correct.  Please get someone who does to review those?

Understood.

Zack Weinberg (:zwol)

Assignee

Comment 4

•

11 years ago

Attached patch 933885-xpcshell-dump-utf8.diff (obsolete) — Details — Splinter Review

Revised patch.

ted: Your review is being sought specifically for the runxpcshelltests.py change.  The removal of the fake-o JSON loader is just cleanup. The important part for this bug is the addition of the message.encode() call to report_message; this is because mach's hooks expect UTF-8 byte strings and barf if they get Unicode.

See also the dependent bug 933494, which adds a test (more precisely, adds subtests to an existing test) that need this to work.

Attachment #826029 - Attachment is obsolete: true

Attachment #826042 - Flags: review?(ted)

Attachment #826042 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 5

•

11 years ago

Comment on attachment 826042 [details] [diff] [review]
933885-xpcshell-dump-utf8.diff

r=me

Attachment #826042 - Flags: review?(bzbarsky) → review+

(not currently active) Ted Mielczarek

Comment 6

•

11 years ago

Comment on attachment 826042 [details] [diff] [review]
933885-xpcshell-dump-utf8.diff

Review of attachment 826042 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, sorry for the delay.

Attachment #826042 - Flags: review?(ted) → review+

Zack Weinberg (:zwol)

Assignee

Comment 7

•

11 years ago

http://hg.mozilla.org/integration/mozilla-inbound/rev/ec8ac6112088

Note: if this needs to be backed out for some reason, bug 933494 will also need to be backed out.

Ryan VanderMeulen [:RyanVM]

Comment 8

•

11 years ago

Backed out for frequent xpcshell timeouts in test_singlebytes.js.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9cd9aae255b5

https://tbpl.mozilla.org/php/getParsedLog.php?id=30289839&tree=Mozilla-Inbound

Zack Weinberg (:zwol)

Assignee

Comment 9

•

11 years ago

Attached patch patch v2 (with fixes for problem test_singlebytes.js) (obsolete) — Details — Splinter Review

(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #8)
> Backed out for frequent xpcshell timeouts in test_singlebytes.js.

This was fundamentally a latent problem in test_singlebytes.js, which is ... not well written.  What it wants to do is test round-trip encoding and decoding of *every possible Unicode code point* in UTF-8 and both variants of UTF-16, plus a cross-check between UTF-8 TextEncoder and encodeURIComponent.  To do so, it creates enormous strings (0x1000 characters each) and passes them to do_check_eq(), i.e. each of those strings is written *twice* to the logfile.  And since it's using a shim to adapt xpcshell/head.js conventions to W3C testharness.js conventions, and that shim prints everything out *again*, we're getting these huge strings four times each.  Times five for the five separate tests done for each code point.  A logfile for *just this test*, with the dump() fix, is roughly 67 megabytes long!  (Without the fix, it's 378 *kilo*bytes of mangledness.)  So, with dump() fixed, runxpcshelltests.py has to do more work, and since the test is already rather slow, it was falling over the timeout threshold. To top it all off, there were identical typos in each copy of the (repeated for each test) string generation code, causing test_singlebytes.js not to be testing what it was supposed to test in the first place!

The fix has several pieces:

1) xpcshell test output may contain arbitrary C0 and C1 control characters, which should not be allowed to make it to the logfile.  This is appropriately dealt with in runxpcshelltests.py.

2) Remove the debugging printouts from the testharness.js shim (dom/encoding/test/unit/head.js) so that everything isn't printed twice.

3) Split test_singlebytes.js into two pieces, one for the round-trip tests as described above (now test_utf_roundtrip.js) and one for all the other tests it does (now test_encoding_misc.js).  If there is a problem with one, the other will not be affected.

4) Rewrite test_utf_roundtrip.js to correct the internal bugs, make it be substantially more efficient (runtime of the test itself goes from ~45 to ~15 seconds) and, perhaps most important, to *not* print out these gigantic strings at all.  Instead, if there is a difference, only the differing code units are printed.

The js/ part of the patch is unchanged from the previous revision.  Ted, I'm asking you to look again at the runxpcshelltests changes; Jonas, you reviewed the last changes to test_singlebytes.js so I'm giving you that part.

I am aware that these tests are taken from some third party repository or other; unfortunately, making test_utf_roundtrip.js sane required bypassing the testharness.js shim, so it cannot go back upstream as is (testharness.js would have to grow support for "only print per-code-unit differences if these two huge strings do not compare equal" first).  Regardless, I do not have time to do the upstreaming.

Pushed to try this time: https://tbpl.mozilla.org/?tree=Try&rev=470ef558eee4

Attachment #826042 - Attachment is obsolete: true

Attachment #829534 - Flags: review?(ted)

Attachment #829534 - Flags: review?(jonas)

Zack Weinberg (:zwol)

Assignee

Comment 10

•

11 years ago

Attached patch patch v2a (obsolete) — Details — Splinter Review

Small revision: it turns out existing logic (such as the JS stack dumper for failed tests) relies on being able to stick newlines in these "messages", so exclude TAB, CR, and LF from escaping.

Attachment #829534 - Attachment is obsolete: true

Attachment #829534 - Flags: review?(ted)

Attachment #829534 - Flags: review?(jonas)

Attachment #829791 - Flags: review?(ted)

Attachment #829791 - Flags: review?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 11

•

11 years ago

Comment on attachment 829791 [details] [diff] [review]
patch v2a

Review of attachment 829791 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I don't think I'm a good reviewer for any of this

Attachment #829791 - Flags: review?(jonas) → review?

(not currently active) Ted Mielczarek

Comment 12

•

11 years ago

Comment on attachment 829791 [details] [diff] [review]
patch v2a

Review of attachment 829791 [details] [diff] [review]:
-----------------------------------------------------------------

This patch makes my head hurt. I'm going to punt to gps since I think he has a better understanding of Python and Unicode.

Sorry this sat in my review queue so long, I apologize for that.

Attachment #829791 - Flags: review?(ted)

Attachment #829791 - Flags: review?(gps)

Attachment #829791 - Flags: review?

Gregory Szorc [:gps]

Comment 13

•

11 years ago

Comment on attachment 829791 [details] [diff] [review]
patch v2a

Review of attachment 829791 [details] [diff] [review]:
-----------------------------------------------------------------

Can you split this patch into its independent components? That would really help with review and help things get landed sooner.

I'm not super crazy about cleaning up output in Python: I'd much prefer for the xpcshell process to emit utf-8, using some combination of the designated Unicode replacement character (�) with intelligent escaping (e.g. hex) for byte arrays. Although, with JavaScript using String for both textual and non-textual data, that means we're either using some heuristic or changing our tests to call one function for byte compare and one for string or doing something else equally hacky. Not what I call a robust solution. I can see why cleaning up "invalid" output from Python seems like the reasonable decision.

That being said, I'm very concerned about the loss of debugging data with this patch. I rely on the low-level output to help debug test failures all the time. It's a time saver, especially when you are debugging failures from buildbot logs.

Review gets f+ for effort and the Python code looking mostly good, but r- on the loss of debug info. We need to preserve that data (at least in the case of test failure) with minimal data loss. Failure to do so makes writing new tests and debugging failures harder. We cannot make testing harder than it already is.

::: dom/encoding/test/unit/head.js
@@ -15,5 @@
>    do_check_neq(a, b, Components.stack.caller);
>  }
>  
>  function assert_array_equals(a, b, msg) {
> -  dump("assert_array_equals(\"" + msg + "\")");

A problem with removing these dump() calls is that it removes valuable debug logging when things fail in automation or even when executed locally. I fear the removal of this output will result in people resorting to adding dump()'s into their tests, possibly as part of temporary try pushes to help diagnose failures.

It's really nice for the error messages to just provide actionable error messages without requiring people to adjust their tests to capture this information first. IMO it really cuts down on development test of tests, which is already a pain point, IMO.

::: dom/encoding/test/unit/test_encoding_misc.js
@@ +1,2 @@
> +// NOTE: Requires testharness.js
> +// http://www.w3.org/2008/webapps/wiki/Harness

Need moar license header.

Perhaps you should `hg cp` this file and remove the roundtrip parts so its logical history is preserved. It will also help with review. I assume content in this file is untouched from test_utf_roundtrip.js, but it's hard to tell. If you upload this patch to ReviewBoard or Phabricator, it should tell us, however. I blame Splinter.

Splitting the extraction of this file into its own patch would be very helpful.

::: dom/encoding/test/unit/test_utf_roundtrip.js
@@ +53,5 @@
> +  }
> +
> +  // It should be impossible to get here.
> +  do_report_result(false, description + " | failed to detect string difference",
> +                   Components.stack.caller, false);

I think I grok what you are doing here. But I'm not a Unicode expert.

I think you should get review from a DOM peer (or whoever owns this test file) for this refactoring.

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +121,5 @@
>      if (!chars)
>          return false;
>  
> +    NS_ConvertUTF16toUTF8 utf8str(reinterpret_cast<const PRUnichar*>(chars),
> +                                  length);

While this change seems good (my guess is it helps prevent buffer overruns), I'm not sure what the implication is and don't feel comfortable reviewing it.

::: js/xpconnect/src/XPCShellImpl.cpp
@@ +301,5 @@
> +    if (IsDebuggerPresent()) {
> +      OutputDebugStringW(reinterpret_cast<const PRUnichar*>(chars));
> +    }
> +#endif
> +    fputs(utf8str.get(), gOutFile);

Ditto for this C++.

::: testing/xpcshell/runxpcshelltests.py
@@ +74,5 @@
>  
> +# TODO: perhaps this should be in a more generally shared location?
> +# This gets all C0 and C1 controls except TAB, CR, and LF.
> +# (??? CR may not be necessary.)
> +_cleanup_encoding_re = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f\\\\]")

Could you please document which characters these catch? Perhaps use a multiline regex so you can comment each line/entry.

@@ +77,5 @@
> +# (??? CR may not be necessary.)
> +_cleanup_encoding_re = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f\\\\]")
> +def _cleanup_encoding_repl(m):
> +    c = m.group(0)
> +    return "\\\\" if c == "\\" else "\\x{:02X}".format(ord(m.group(0)))

Fun fact: Python has a 'hex' encoding that allows you to easily obtain 2 digit hex representation of bytes. e.g.

'foo'.encode('hex')

This /might/ be faster than using str.format() with ord(). Might want to microbench it.

@@ +86,5 @@
> +       UTF-8, but it may not be *correct* UTF-8.  Produce a byte
> +       string that can safely be dumped into a (generally UTF-8-coded)
> +       logfile."""
> +    if not isinstance(s, unicode):
> +        s = s.decode("utf-8", "replace")

What concerns me here is that the output from any dump() or print() in JS that emits invalid UTF-8 is going to result in �. There are currently places where we compare non-textual byte sequences and this output can tell us which characters vary. � cannot tell us this critical info required to fix a test.

Attachment #829791 - Flags: review?(gps) → feedback+

Gregory Szorc [:gps]

Comment 14

•

11 years ago

Zack: I want to applaud you for tackling this issue. dump()'s ignorant behavior has bothered me for a very long time and I'm very happy to see you taking a stab at it. When reading my review, let's not forget that perfect is the enemy of good. I'm sure we can find a middle ground.

Zack Weinberg (:zwol)

Assignee

Comment 15

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #13)
> Can you split this patch into its independent components? That would really
> help with review and help things get landed sooner.

Certainly.  I should have done that already, in fact. If nothing else,
some of the things you say you're not qualified to review have already
been reviewed by other people.

It won't make sense to land the patches individually, though.  Also,
for the record, I will be on vacation from Dec 23 through Jan 5 and
won't be able to make further revisions to the patches.

> I'm not super crazy about cleaning up output in Python: I'd much prefer for
> the xpcshell process to emit utf-8, using some combination of the designated
> Unicode replacement character (�) with intelligent escaping (e.g. hex) for
> byte arrays.

In fact, that is what my changes to dump() do.  I just checked it, and
NS_ConvertUTF16toUTF8 guarantees to produce valid UTF-8 even if handed
invalid input.  This is in tension with your desire to accurately
record binary output, since it throws away information by replacing
invalid surrogate pairs with U+FFFD, but I don't feel comfortable
changing what NS_ConvertUTF16toUTF8 does without a lot more input from
other people.  I think an easier path forward would be to identify
tests that need to report binary output - I expect and provide a
different set of do_check_* functions for them.)

The output cleanup in Python has a different purpose: its primary
function is to prevent validly-encoded *control characters* from
reaching the logfile verbatim, which can also cause downstream parser
failure.  This may not have been clear from the code, which is also
checking for invalid UTF-8 sequences, but that's just extra
defensiveness on my part.

> That being said, I'm very concerned about the loss of debugging data with
> this patch. I rely on the low-level output to help debug test failures all
> the time. It's a time saver, especially when you are debugging failures from
> buildbot logs.

You may not have realized that the "low-level debug logs" I removed
(I assume you are talking about the changes to dom/encoding/test/unit/head.js)
are almost entirely redundant with the logs printed by do_check_eq and friends.
To demonstrate this, if you create a file dom/encoding/test/unit/test_demo.js
with these contents:

    test(function() {
      assert_equals("a", "a", "one");
      assert_not_equals("a", "b", "two");
      assert_array_equals(["a"], ["a"], "three");
      assert_true(1+1 === 2, "four");
      assert_throws({name:"TypeError"}, function () { throw new TypeError(); });
    }, "demo");

add it to xpcshell.ini in that directory, and run it
(./mach xpcshell-test dom/encoding/test/unit/test_demo.js) you will
get this output (chatter above and below snipped):

 0:01.63 test("demo")assert_equals(a, a, "one")
 0:01.63 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 1] "a" == "a"
 0:01.63 assert_not_equals(a, b, "two")
 0:01.63 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 2] "a" != "b"
 0:01.63 assert_array_equals("three")
 0:01.63 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 3] 1 == 1
 0:01.64 assert_true(true, "four")
 0:01.64 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 4] true == true
 0:01.64 assert_throws("[object Object]", function () {throw new TypeError()})
 0:01.64 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 5] "TypeError" == "TypeError"

With the latest iteration of my changes, instead we get

 0:01.65
 0:01.65 TEST-INFO | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | test group: demo
 0:01.65
 0:01.65 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 2] one: "a" == "a"
 0:01.65
 0:01.65 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 3] two: "a" != "b"
 0:01.65
 0:01.65 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 4] three: (length) 1 == 1
 0:01.65
 0:01.65 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 4] three: all array elements equal
 0:01.65
 0:01.65 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 5] four: true
 0:01.65
 0:01.65 TEST-PASS | /home/zack/projects/mozilla/B-mc/_tests/xpcshell/dom/encoding/test/unit/test_demo.js | [null : 6] expected to catch an exception named TypeError, got TypeError

(I'm not sure why the blank lines.)  You can see that *everything* that
was captured by the "assert_*" messages is now included on the TEST-INFO
and TEST-PASS lines.

It is important to get rid of the extra messages, because when the
assert_* functions are called with long strings, it *doubles* the size
of the log, and excessively huge logs can *in and of themselves* cause
tests to time out.

So I'm afraid I must insist on that change. If you still disagree,
please ask another testing peer for a second opinion.

...
> ::: dom/encoding/test/unit/test_encoding_misc.js
> @@ +1,2 @@
> > +// NOTE: Requires testharness.js
> > +// http://www.w3.org/2008/webapps/wiki/Harness
>
> Need moar license header.

This was originally a file taken verbatim from a third-party test suite.
I originally said I didn't have time to upstream the changes, but in
January I probably will.  Regardless, I think slapping MPL boilerplate
on it is inappropriate.  (I'm not aware of any official policy for this
situation, are you?)

> Perhaps you should `hg cp` this file and remove the roundtrip parts so its
> logical history is preserved. It will also help with review. I assume
> content in this file is untouched from test_utf_roundtrip.js, but it's hard
> to tell.

Yes, the contents of test_encoding_misc.js are moved unchanged from
test_utf_roundtrip.js.  I will see what I can do about making the
diffs more readable.

> ::: dom/encoding/test/unit/test_utf_roundtrip.js
>
> I think you should get review from a DOM peer (or whoever owns this test
> file) for this refactoring.

Yah.  Unfortunately, sicking has already declined, and I'm not sure
who else might be qualified.

> ::: js/xpconnect/loader/mozJSComponentLoader.cpp
> >
> > +    NS_ConvertUTF16toUTF8 utf8str(reinterpret_cast<const PRUnichar*>(chars),
> > +                                  length);
>
> While this change seems good (my guess is it helps prevent buffer overruns),
> I'm not sure what the implication is and don't feel comfortable reviewing it.

bz already r+ed this part (but I'm going to have him re-review it,
because I found another place where we have the same problem).  I
don't fully understand this stuff myself, but I believe it is needed
in order to handle JS strings containing U+0000 correctly.

> > +# TODO: perhaps this should be in a more generally shared location?
> > +# This gets all C0 and C1 controls except TAB, CR, and LF.
> > +# (??? CR may not be necessary.)
> > +_cleanup_encoding_re = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f\\\\]")
>
> Could you please document which characters these catch?

"All C0 and C1 controls except TAB, CR, and LF" is already
unambiguous, but perhaps this is better?

# This regex matches all of the C0 and C1 controls (U+0000 through
# U+001F and U+007F through U+009F) except TAB (U+0009), CR (U+000D),
# and LF (U+000A), plus backslash (U+005C).

Unfortunately, Python does not appear to support \p{Cc}, and
[[:cntrl:]] would only match the C0 controls (i.e. the ones in ASCII).

> Perhaps use a multiline regex so you can comment each line/entry.

I don't think that would make anything clearer.

> Fun fact: Python has a 'hex' encoding that allows you to easily obtain 2
> digit hex representation of bytes. e.g.
>
> 'foo'.encode('hex')
>
> This /might/ be faster than using str.format() with ord(). Might want to
> microbench it.

I'll look into it.

> @@ +86,5 @@
> > +       UTF-8, but it may not be *correct* UTF-8.  Produce a byte
> > +       string that can safely be dumped into a (generally UTF-8-coded)
> > +       logfile."""
> > +    if not isinstance(s, unicode):
> > +        s = s.decode("utf-8", "replace")
>
> What concerns me here is that the output from any dump() or print() in JS
> that emits invalid UTF-8 is going to result in �. There are currently places
> where we compare non-textual byte sequences and this output can tell us
> which characters vary. � cannot tell us this critical info required to fix a
> test.

As mentioned above, this would need to be addressed in
NS_ConvertUTF16toUTF8, which already substitutes U+FFFD for malformed
UTF-16; the additional logic here is just for extra defensiveness.
I'm not totally against that but I am reluctant to do it without
buy-in from XPCOM and other consumers.

Flags: needinfo?(gps)

Zack Weinberg (:zwol)

Assignee

Comment 16

•

11 years ago

> > 'foo'.encode('hex')
> > 
> > This /might/ be faster than using str.format() with ord(). Might want to
> > microbench it.
> 
> I'll look into it.

This turns out to be significantly _slower_ than str.format() + ord().

Zack Weinberg (:zwol)

Assignee

Comment 17

•

11 years ago

Comment on attachment 829791 [details] [diff] [review]
patch v2a

Here comes a new six-part patch series.

https://tbpl.mozilla.org/?tree=Try&rev=c3d7b9c6e99f

Attachment #829791 - Attachment is obsolete: true

Zack Weinberg (:zwol)

Assignee

Comment 18

•

11 years ago

Attached patch 933885-1-xpcshell-dump-utf8.diff — Details — Splinter Review

Here are the C++ changes that started all of this, again. Boris, you've seen this before.  There _is_ a substantive change: I noticed that xpcshell's print() really ought to get the same treatment.  The new review request is largely for that.  Also I'd like to draw your attention to Gregory's concerns about not being able to dump arbitrary binary data... it seems to me that the cure for that is _not_ to make ConvertUTF16toUTF8 preserve malformed UTF-16 somehow, but I don't exactly have a better idea...

Attachment #8349566 - Flags: review?(bzbarsky)

Zack Weinberg (:zwol)

Assignee

Comment 19

•

11 years ago

Attached patch 933885-2a-runxpcshelltests-json.diff — Details — Splinter Review

runxpcshelltests.py changes, A: remove the no-longer-necessary fake JSON parser.  (This has nothing to do with the rest of the bug.)

Attachment #8349569 - Flags: review?(gps)

Zack Weinberg (:zwol)

Assignee

Comment 20

•

11 years ago

Attached patch 933885-2b-runxpcshelltests-encoding.diff — Details — Splinter Review

runxpcshelltests.py changes, B: escape control characters (and malformed UTF-8, which _should_ be impossible, but paranoia is a virtue) when decoding JSON logs.

Attachment #8349572 - Flags: review?(gps)

Zack Weinberg (:zwol)

Assignee

Comment 21

•

11 years ago

Attached patch 933885-3-testharness-js-shim-remove-redundant-output.diff — Details — Splinter Review

Make the testharness.js shim substantially cleverer about what it logs; the _important_ thing is removing the redundant output.

Attachment #8349574 - Flags: review?(gps)

Zack Weinberg (:zwol)

Assignee

Comment 22

•

11 years ago

Attached patch 933885-4a-split-test-singlebytes.diff (obsolete) — Details — Splinter Review

Split test_singlebytes.js into two files, one for the "round trip every code point below U+10FFFF" loops, one for the miscellaneous tests afterward.

Attachment #8349576 - Flags: review?(hsivonen)

Zack Weinberg (:zwol)

Assignee

Comment 23

•

11 years ago

Attached patch 933885-4b-overhaul-test-utf-roundtrip.diff (obsolete) — Details — Splinter Review

And, finally, overhaul the round-trip-every-codepoint tests so that they (a) test what they're supposed to test, (b) don't take ages doing it.  (This test still takes 24 seconds to run! but hopefully it will no longer be timing out the test suite and thus causing part 1, which is the part of this that actually _matters_, to get backed out. :-P)

Henri: I tagged you for review on this and the previous patch largely because you are Mr. Character Encodings lately.  Please feel free to decline or reassign.

Attachment #8349580 - Flags: review?(hsivonen)

Boris Zbarsky [:bzbarsky]

Comment 24

•

11 years ago

Comment on attachment 8349566 [details] [diff] [review]
933885-1-xpcshell-dump-utf8.diff

r=me

We should consider just adding a dedicated dumping function for non-text data...

Attachment #8349566 - Flags: review?(bzbarsky) → review+

Gregory Szorc [:gps]

Comment 25

•

11 years ago

(In reply to Zack Weinberg (:zwol) (vacation Dec 26 - Jan 5) from comment #15)
> > That being said, I'm very concerned about the loss of debugging data with
> > this patch. I rely on the low-level output to help debug test failures all
> > the time. It's a time saver, especially when you are debugging failures from
> > buildbot logs.
> 
> You may not have realized that the "low-level debug logs" I removed
> (I assume you are talking about the changes to
> dom/encoding/test/unit/head.js)
> are almost entirely redundant with the logs printed by do_check_eq and
> friends.

Glad to hear it! It sounds like my concern doesn't hold water.

> ...
> > ::: dom/encoding/test/unit/test_encoding_misc.js
> > @@ +1,2 @@
> > > +// NOTE: Requires testharness.js
> > > +// http://www.w3.org/2008/webapps/wiki/Harness
> >
> > Need moar license header.
> 
> This was originally a file taken verbatim from a third-party test suite.
> I originally said I didn't have time to upstream the changes, but in
> January I probably will.  Regardless, I think slapping MPL boilerplate
> on it is inappropriate.  (I'm not aware of any official policy for this
> situation, are you?)

My default response to "no license header" is "put something there." If the licensing is unclear, IMO there should at least be a comment indicating where things came from lest an ignorant reviewer (like me) and coder (not like you) slap an incorrect one on there because they weren't warned!
 
> > ::: dom/encoding/test/unit/test_utf_roundtrip.js
> >
> > I think you should get review from a DOM peer (or whoever owns this test
> > file) for this refactoring.
> 
> Yah.  Unfortunately, sicking has already declined, and I'm not sure
> who else might be qualified.

Yeah, I have no clue. You can always raise this at the Tuesday Engineering meeting and lmandel will wait for someone to stand up and own it.

> > > +# TODO: perhaps this should be in a more generally shared location?
> > > +# This gets all C0 and C1 controls except TAB, CR, and LF.
> > > +# (??? CR may not be necessary.)
> > > +_cleanup_encoding_re = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f\\\\]")
> >
> > Could you please document which characters these catch?
> 
> "All C0 and C1 controls except TAB, CR, and LF" is already
> unambiguous, but perhaps this is better?
> 
> # This regex matches all of the C0 and C1 controls (U+0000 through
> # U+001F and U+007F through U+009F) except TAB (U+0009), CR (U+000D),
> # and LF (U+000A), plus backslash (U+005C).
> 
> Unfortunately, Python does not appear to support \p{Cc}, and
> [[:cntrl:]] would only match the C0 controls (i.e. the ones in ASCII).
> 
> > Perhaps use a multiline regex so you can comment each line/entry.
> 
> I don't think that would make anything clearer.

I wrote this comment because I don't have the ascii table memorized :) I didn't realize there was even a regexp character class for control characters (at least in some implementations)! It just seems like a bit too much magic for me. But if this is really something that is well-understood by people who know text encodings, I suppose I'm just lacking knowledge and the comment is probably good enough.


> > @@ +86,5 @@
> > > +       UTF-8, but it may not be *correct* UTF-8.  Produce a byte
> > > +       string that can safely be dumped into a (generally UTF-8-coded)
> > > +       logfile."""
> > > +    if not isinstance(s, unicode):
> > > +        s = s.decode("utf-8", "replace")
> >
> > What concerns me here is that the output from any dump() or print() in JS
> > that emits invalid UTF-8 is going to result in �. There are currently places
> > where we compare non-textual byte sequences and this output can tell us
> > which characters vary. � cannot tell us this critical info required to fix a
> > test.
> 
> As mentioned above, this would need to be addressed in
> NS_ConvertUTF16toUTF8, which already substitutes U+FFFD for malformed
> UTF-16; the additional logic here is just for extra defensiveness.
> I'm not totally against that but I am reluctant to do it without
> buy-in from XPCOM and other consumers.

Agreed. Let's not bloat scope. If there's no significant data loss, I'm fine with maintaining the current methodology.

Flags: needinfo?(gps)

Gregory Szorc [:gps]

Comment 26

•

11 years ago

Comment on attachment 8349569 [details] [diff] [review]
933885-2a-runxpcshelltests-json.diff

Review of attachment 8349569 [details] [diff] [review]:
-----------------------------------------------------------------

I'm digging the split up patches - thanks!

Attachment #8349569 - Flags: review?(gps) → review+

Gregory Szorc [:gps]

Comment 27

•

11 years ago

Comment on attachment 8349572 [details] [diff] [review]
933885-2b-runxpcshelltests-encoding.diff

Review of attachment 8349572 [details] [diff] [review]:
-----------------------------------------------------------------

This seems good. A unit test would be nice, but I won't hold it to you.

::: testing/xpcshell/runxpcshelltests.py
@@ +90,5 @@
> +       string that can safely be dumped into a (generally UTF-8-coded)
> +       logfile."""
> +    if not isinstance(s, unicode):
> +        s = s.decode("utf-8", "replace")
> +    if s.endswith('\n'):

Nit: Mixing '' and "". I personally use '' unless a string contains ', but it really doesn't matter too much. Unlike other languages where '' is reserved for single character, there is absolutely no difference between '' and "" in Python except for what characters need escaped.

Attachment #8349572 - Flags: review?(gps) → review+

Boris Zbarsky [:bzbarsky]

Comment 28

•

11 years ago

You should probably ask Masatoshi Kimura to review the DOM test changes.

Flags: needinfo?(VYV03354)

Henri Sivonen (:hsivonen)

Comment 29

•

11 years ago

Comment on attachment 8349576 [details] [diff] [review]
933885-4a-split-test-singlebytes.diff

emk would be a more appropriate reviewer for these.

Attachment #8349576 - Flags: review?(hsivonen) → review?(VYV03354)

Henri Sivonen (:hsivonen)

Updated

•

11 years ago

Attachment #8349580 - Flags: review?(hsivonen) → review?(VYV03354)

Masatoshi Kimura [:emk]

Comment 30

•

11 years ago

Comment on attachment 8349576 [details] [diff] [review]
933885-4a-split-test-singlebytes.diff

Review of attachment 8349576 [details] [diff] [review]:
-----------------------------------------------------------------

Please upstream the change when you have a time. The file was originally taken from here:
https://code.google.com/p/stringencoding/source/browse/tests.js
The license file is attached here:
https://mxr.mozilla.org/mozilla-central/source/dom/encoding/test/stringencoding-license.txt
(Unlike MPL, Apache License doesn't require including the license text for all licensed files.)

Attachment #8349576 - Flags: review?(VYV03354) → review+

Masatoshi Kimura [:emk]

Comment 31

•

11 years ago

Comment on attachment 8349580 [details] [diff] [review]
933885-4b-overhaul-test-utf-roundtrip.diff

Review of attachment 8349580 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/encoding/test/unit/head.js
@@ +39,5 @@
>  }
>  
> +// This is used to avoid dumping enormous strings to the
> +// test log in the normal (success) case.
> +function assert_string_equals(actual, expected, description) {

The real assert_* functions in testharness will generate the message only if the expectation fails. So you should improve the shim directly rather than modifying the testharness API.

::: dom/encoding/test/unit/test_utf_roundtrip.js
@@ +43,5 @@
>  }
>  
> +const MIN_CODEPOINT = 0;
> +const MAX_CODEPOINT = 0x10FFFF;
> +const BATCH_SIZE = 0x1000;

Don't use "const" for the meant-to-portable test file, especially when it will be upstreamed eventually.

@@ +53,5 @@
> +    for (i = MIN_CODEPOINT; i < MAX_CODEPOINT; i += BATCH_SIZE) {
> +      label = cpname(i) + " - " + cpname(i + BATCH_SIZE - 1);
> +      string = genblock(i, BATCH_SIZE);
> +
> +      for (encoding of ["UTF-8", "UTF-16LE", "UTF-16BE"]) {

Same above. Don't rely on the ES6 syntax.

@@ +56,5 @@
> +
> +      for (encoding of ["UTF-8", "UTF-16LE", "UTF-16BE"]) {
> +        encoded = TextEncoder(encoding).encode(string);
> +        decoded = TextDecoder(encoding).decode(encoded);
> +        assert_string_equals(string, decoded, encoding + " round trip " + label);

"label" is a defined term in the Encoding Standard, so it is not a good variable name in this context.

Attachment #8349580 - Flags: review?(VYV03354) → review-

Masatoshi Kimura [:emk]

Updated

•

11 years ago

Flags: needinfo?(VYV03354)

Gregory Szorc [:gps]

Comment 32

•

11 years ago

Comment on attachment 8349574 [details] [diff] [review]
933885-3-testharness-js-shim-remove-redundant-output.diff

Review of attachment 8349574 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. Please add a comment why we can't use xpcshell's head.js functions and are using do_report_result() instead.

Not sure if this needs review from a DOM peer.

::: dom/encoding/test/unit/head.js
@@ +1,5 @@
>  /**
> + * This is a shim for the W3C testharness.js, mapping those of its
> + * functions that we need to the testing/xpcshell/head.js API.
> + * See http://www.w3.org/2008/webapps/wiki/Harness
> + *

I /think/ the copyright notice should come first.

@@ +10,2 @@
>  function assert_equals(a, b, msg) {
> +  var text = msg + ": " + _wrap_with_quotes_if_necessary(a) +

We can use let here, right? I generally forbid new instances of var in non-content code. But the W3C testharness.js relationship has me confused.

Attachment #8349574 - Flags: review?(gps) → review+

Masatoshi Kimura [:emk]

Comment 33

•

11 years ago

(In reply to Masatoshi Kimura [:emk] from comment #31)
> ::: dom/encoding/test/unit/head.js
> @@ +39,5 @@
> >  }
> >  
> > +// This is used to avoid dumping enormous strings to the
> > +// test log in the normal (success) case.
> > +function assert_string_equals(actual, expected, description) {
> 
> The real assert_* functions in testharness will generate the message only if
> the expectation fails. So you should improve the shim directly rather than
> modifying the testharness API.

Also note that these test files are shared with the mochitest which will use the real testharness.

Zack Weinberg (:zwol)

Assignee

Comment 34

•

10 years ago

Finally getting back to this - apologies for the long delay.

(In reply to Masatoshi Kimura [:emk] from comment #31)
> > +// This is used to avoid dumping enormous strings to the
> > +// test log in the normal (success) case.
> > +function assert_string_equals(actual, expected, description) {
> 
> The real assert_* functions in testharness will generate the message only if
> the expectation fails. So you should improve the shim directly rather than
> modifying the testharness API.

Unfortunately, in this context it is unacceptable to dump the complete (4096 characters each, so 8-16KB of data depending on where we are in the test) pair of strings *whether or not* the test fails.  It will cause testsuite timeouts.  I am prepared to attempt to upstream the API change all the way to the W3C test harness, but I do not know when I will have time for it and I don't want that to block landing this set of fixes locally.

Alternatively, I could move this function to the test itself and have it depend only on the public testharness API.  The diagnostics on failure will not be as good but we can probably live with that; it's not used in so many places.

> Also note that these test files are shared with the mochitest which will use
> the real testharness.

Really? We are running this test twice? Why are we doing that? It's very expensive even after my changes.

... However, I think this makes a conclusive argument in favor of putting the special string comparison logic in the test itself.  I'll rework it that way and see what its upstream thinks.

> Don't use "const" for the meant-to-portable test file, especially when it
> will be upstreamed eventually.

Ok.

> > +      for (encoding of ["UTF-8", "UTF-16LE", "UTF-16BE"]) {
> 
> Same above. Don't rely on the ES6 syntax.

Ok.

> "label" is a defined term in the Encoding Standard, so it is not a good
> variable name in this context.

Ok.

(In reply to Gregory Szorc [:gps] from comment #32)
> 
> This looks good to me. Please add a comment why we can't use xpcshell's
> head.js functions and are using do_report_result() instead.

Ok.

> Not sure if this needs review from a DOM peer.

I don't know either.

> ::: dom/encoding/test/unit/head.js
> @@ +1,5 @@
> >  /**
> > + * This is a shim for the W3C testharness.js, mapping those of its
> > + * functions that we need to the testing/xpcshell/head.js API.
> > + * See http://www.w3.org/2008/webapps/wiki/Harness
> > + *
> 
> I /think/ the copyright notice should come first.

Yah, probably.

> @@ +10,2 @@
> >  function assert_equals(a, b, msg) {
> > +  var text = msg + ": " + _wrap_with_quotes_if_necessary(a) +
> 
> We can use let here, right? I generally forbid new instances of var in
> non-content code. But the W3C testharness.js relationship has me confused.

Consistency with code not changed in this patch says use 'var' here.  This file isn't taken from anywhere else, though, so I wouldn't object to going through and making it consistently use 'let'.  Please choose.

Flags: needinfo?(gps)

Flags: needinfo?(VYV03354)

Zack Weinberg (:zwol)

Assignee

Comment 35

•

10 years ago

Missed this on an earlier pass...

(In reply to Gregory Szorc [:gps] from comment #27)
> 933885-2b-runxpcshelltests-encoding.diff
> 
> This seems good. A unit test would be nice, but I won't hold it to you.

If you point me at documentation on how to write unit tests for this, I'll see what I can do.

> > +        s = s.decode("utf-8", "replace")
> > +    if s.endswith('\n'):
> 
> Nit: Mixing '' and "". I personally use '' unless a string contains ', but
> it really doesn't matter too much. Unlike other languages where '' is
> reserved for single character, there is absolutely no difference between ''
> and "" in Python except for what characters need escaped.

Switched to '' strings throughout this new code.

Masatoshi Kimura [:emk]

Comment 36

•

10 years ago

(In reply to Zack Weinberg (:zwol) from comment #34)
> Unfortunately, in this context it is unacceptable to dump the complete (4096
> characters each, so 8-16KB of data depending on where we are in the test)
> pair of strings *whether or not* the test fails.  It will cause testsuite
> timeouts.

It is not what I'm saying (see below).

> I am prepared to attempt to upstream the API change all the way
> to the W3C test harness, but I do not know when I will have time for it and
> I don't want that to block landing this set of fixes locally.

The W3C testharness will *already* dump the string only if the test fails.

> Alternatively, I could move this function to the test itself and have it
> depend only on the public testharness API.  The diagnostics on failure will
> not be as good but we can probably live with that; it's not used in so many
> places.

You can just move the logic into your assert_equals shim, because the real testharness function is already doing that.

> > Also note that these test files are shared with the mochitest which will use
> > the real testharness.
> 
> Really? We are running this test twice? Why are we doing that? It's very
> expensive even after my changes.

To make sure TextDecoder and TextEncoder are available and work correctly on both xpcshell/JSM environment and the browser environment. Both environments are quite different. (But maybe I'm a bit paranoia.)

Flags: needinfo?(VYV03354)

Zack Weinberg (:zwol)

Assignee

Comment 37

•

10 years ago

(In reply to Masatoshi Kimura [:emk] from comment #36)
> (In reply to Zack Weinberg (:zwol) from comment #34)
> > I am prepared to attempt to upstream the API change all the way
> > to the W3C test harness, but I do not know when I will have time for it and
> > I don't want that to block landing this set of fixes locally.
> 
> The W3C testharness will *already* dump the string only if the test fails.

This test must not dump the string *even if* the test fails.

I have prepared a revision that moves the function to the test itself.  It is still unacceptably slow in debug builds, but that seems to be an issue with the JS engine which may need its own bug.

> > Really? We are running this test twice? Why are we doing that? It's very
> > expensive even after my changes.
> 
> To make sure TextDecoder and TextEncoder are available and work correctly on
> both xpcshell/JSM environment and the browser environment. Both environments
> are quite different. (But maybe I'm a bit paranoia.)

Are the mochitests using this same file, or a separate copy?

Flags: needinfo?(VYV03354)

Masatoshi Kimura [:emk]

Comment 38

•

10 years ago

(In reply to Zack Weinberg (:zwol) from comment #37)
> (In reply to Masatoshi Kimura [:emk] from comment #36)
> > (In reply to Zack Weinberg (:zwol) from comment #34)
> > > I am prepared to attempt to upstream the API change all the way
> > > to the W3C test harness, but I do not know when I will have time for it and
> > > I don't want that to block landing this set of fixes locally.
> > 
> > The W3C testharness will *already* dump the string only if the test fails.
> 
> This test must not dump the string *even if* the test fails.

I don't think the failure case does matter. Failed reftests will dump the screen capture using data URL.

> I have prepared a revision that moves the function to the test itself.  It
> is still unacceptably slow in debug builds, but that seems to be an issue
> with the JS engine which may need its own bug.
> 
> > > Really? We are running this test twice? Why are we doing that? It's very
> > > expensive even after my changes.
> > 
> > To make sure TextDecoder and TextEncoder are available and work correctly on
> > both xpcshell/JSM environment and the browser environment. Both environments
> > are quite different. (But maybe I'm a bit paranoia.)
> 
> Are the mochitests using this same file, or a separate copy?

Th same file in the source tree. Although the test infrastructure may create a separate copy in objdir, I don't familiar with it enough.

Flags: needinfo?(VYV03354)

Zack Weinberg (:zwol)

Assignee

Updated

•

10 years ago

Depends on: 962605

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 39

•

10 years ago

Comment on attachment 8349574 [details] [diff] [review]
933885-3-testharness-js-shim-remove-redundant-output.diff

Review of attachment 8349574 [details] [diff] [review]:
-----------------------------------------------------------------

1.11 + * This shim does some tests a little differently than the W3C test
    1.12 + * harness; equality comparisons, especially, are less precise.
    1.13 + * The difference does not presently affect any test results.

Why?

Zack Weinberg (:zwol)

Assignee

Comment 40

•

10 years ago

Attached patch 933885-4a-merge-from-upstream.diff — Details — Splinter Review

OK, here we go with a new set of "part 4" patches.  The new 4b and 4c have been submitted upstream: https://code.google.com/p/stringencoding/issues/detail?id=11

and the complete series + the patches from bug 962605 have been pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=4ce7cf4e0794

In order to submit these changes upstream, I needed to merge our copy of the tests *from* upstream.  This patch just does that.  It is NOT a complete merge: we have a correction to test-gbk.js, which AFAICT has always been in error upstream (filed https://code.google.com/p/stringencoding/issues/detail?id=12 ) and we allow you to call JS constructors without 'new', which is apparently now forbidden, so I had to exclude that test (filed bug 963948).  And our files are named differently for no apparent reason, which I did not correct in case there is a valid reason.

The unfortunate size of the patch is due to extremely large static arrays/strings all on one line in the context.  None of those lines actually changed.

Attachment #8349576 - Attachment is obsolete: true

Attachment #8349580 - Attachment is obsolete: true

Attachment #8365554 - Flags: review?(VYV03354)

Zack Weinberg (:zwol)

Assignee

Comment 41

•

10 years ago

Attached patch 933885-4b-split-test-singlebytes.diff — Details — Splinter Review

Slight change from previous: the "sampled" UTF round-trip tests now stay with the "exhaustive" ones, and the file names are different.

Attachment #8365556 - Flags: review?(VYV03354)

Zack Weinberg (:zwol)

Assignee

Comment 42

•

10 years ago

Attached patch 933885-4c-overhaul-test-utf-roundtrip.diff — Details — Splinter Review

Notable changes from previous iteration include: assert_string_equals is now part of the test, Uint8Array/Uint16Array instead of Int16Array, removal of for-of, removal of attempt to shim typed arrays (other tests in this set unconditionally use typed arrays), better way to size the return array in genblock().  Still takes 33 seconds to complete on my computer, but hopefully bug 962605 will fix that.

Attachment #8365557 - Flags: review?(VYV03354)

Zack Weinberg (:zwol)

Assignee

Comment 43

•

10 years ago

(In reply to :Ms2ger from comment #39)
> Comment on attachment 8349574 [details] [diff] [review]
> 933885-3-testharness-js-shim-remove-redundant-output.diff
> 
> Review of attachment 8349574 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> 1.11 + * This shim does some tests a little differently than the W3C test
>     1.12 + * harness; equality comparisons, especially, are less precise.
>     1.13 + * The difference does not presently affect any test results.
> 
> Why?

Concrete example: this is the W3C's assert_equals, which is at great pains to be even more precise than === (in particular, it papers over the weird-by-design behavior of NaN):

    function same_value(x, y) {
        if (y !== y)
        {
            //NaN case
            return x !== x;
        }
        else if (x === 0 && y === 0) {
            //Distinguish +0 and -0
            return 1/x === 1/y;
        }
        else
        {
            //typical case
            return x === y;
        }
    }
    function assert_equals(actual, expected, description)
    {
         /*
          * Test if two primitives are equal or two objects
          * are the same object
          */
        if (typeof actual != typeof expected)
        {
            assert(false, "assert_equals", description,
                          "expected (" + typeof expected + ") ${expected} but got (" + typeof actual + ") ${actual}",
                          {expected:expected, actual:actual});
            return;
        }
        assert(same_value(actual, expected), "assert_equals", description,
                                             "expected ${expected} but got ${actual}",
                                             {expected:expected, actual:actual});
    };

But, before my patches, this is what the shim does:

    function assert_equals(a, b, msg) {
      dump("assert_equals(" + a + ", " + b + ", \"" + msg + "\")");
      do_check_eq(a, b, Components.stack.caller);
    }

and do_check_eq (in testing/xpcshell/head.js) just does an == (not even ===) comparison.  I thought it would be safer not to change that.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 44

•

10 years ago

(In reply to Zack Weinberg (:zwol) from comment #43)
> (In reply to :Ms2ger from comment #39)
> > Comment on attachment 8349574 [details] [diff] [review]
> > 933885-3-testharness-js-shim-remove-redundant-output.diff
> > 
> > Review of attachment 8349574 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > 1.11 + * This shim does some tests a little differently than the W3C test
> >     1.12 + * harness; equality comparisons, especially, are less precise.
> >     1.13 + * The difference does not presently affect any test results.
> > 
> > Why?
> 
> Concrete example: this is the W3C's assert_equals, which is at great pains
> to be even more precise than === (in particular, it papers over the
> weird-by-design behavior of NaN):
> 
>     function same_value(x, y) {
>         if (y !== y)
>         {
>             //NaN case
>             return x !== x;
>         }
>         else if (x === 0 && y === 0) {
>             //Distinguish +0 and -0
>             return 1/x === 1/y;
>         }
>         else
>         {
>             //typical case
>             return x === y;
>         }
>     }
>     function assert_equals(actual, expected, description)
>     {
>          /*
>           * Test if two primitives are equal or two objects
>           * are the same object
>           */
>         if (typeof actual != typeof expected)
>         {
>             assert(false, "assert_equals", description,
>                           "expected (" + typeof expected + ") ${expected}
> but got (" + typeof actual + ") ${actual}",
>                           {expected:expected, actual:actual});
>             return;
>         }
>         assert(same_value(actual, expected), "assert_equals", description,
>                                              "expected ${expected} but got
> ${actual}",
>                                              {expected:expected,
> actual:actual});
>     };
> 
> But, before my patches, this is what the shim does:
> 
>     function assert_equals(a, b, msg) {
>       dump("assert_equals(" + a + ", " + b + ", \"" + msg + "\")");
>       do_check_eq(a, b, Components.stack.caller);
>     }
> 
> and do_check_eq (in testing/xpcshell/head.js) just does an == (not even ===)
> comparison.  I thought it would be safer not to change that.

Oh right, I forgot about the NaN and +-0 cases. Might be good to use ===, though, if that does cause failures.

Masatoshi Kimura [:emk]

Comment 45

•

10 years ago

Comment on attachment 8365554 [details] [diff] [review]
933885-4a-merge-from-upstream.diff

Review of attachment 8365554 [details] [diff] [review]:
-----------------------------------------------------------------

> And our files are named differently for no apparent reason, which I did not correct in case there is a valid reason.

Because our test infrastructure requires that test files must start with "test_".

::: dom/encoding/test/unit/test_singlebytes.js
@@ -296,5 @@
>  );
>  
>  test(
>    function () {
> -    var encodings = ["utf-8", "ibm866", "iso-8859-2", "iso-8859-3", "iso-8859-4", "iso-8859-5", "iso-8859-6", "iso-8859-7", "iso-8859-8", "iso-8859-8-i", "iso-8859-10", "iso-8859-13", "iso-8859-14", "iso-8859-15", "iso-8859-16", "koi8-r", "koi8-u", "macintosh", "windows-874", "windows-1250", "windows-1251", "windows-1252", "windows-1253", "windows-1254", "windows-1255", "windows-1256", "windows-1257", "windows-1258", "x-mac-cyrillic", "gbk", "gb18030", "hz-gb-2312", "big5", "euc-jp", "iso-2022-jp", "shift_jis", "euc-kr", "x-user-defined"];

Why the upstream removed x-user-defined from the test?

Attachment #8365554 - Flags: review?(VYV03354) → review+

Masatoshi Kimura [:emk]

Updated

•

10 years ago

Attachment #8365556 - Flags: review?(VYV03354) → review+

Masatoshi Kimura [:emk]

Comment 46

•

10 years ago

Comment on attachment 8365557 [details] [diff] [review]
933885-4c-overhaul-test-utf-roundtrip.diff

> removal of attempt to shim typed arrays (other tests in this set
> unconditionally use typed arrays)

Yup, typed arrays shim doesn't make sense because the API itself uses Uint8Array.

Attachment #8365557 - Flags: review?(VYV03354) → review+

Zack Weinberg (:zwol)

Assignee

Comment 47

•

10 years ago

(In reply to Masatoshi Kimura [:emk] from comment #45)
> Comment on attachment 8365554 [details] [diff] [review]
> 933885-4a-merge-from-upstream.diff
>
> Why the upstream removed x-user-defined from the test?

They didn't; it has never been in upstream's version.  I assume we added it for a good reason, so I have restored it in my copy of this patch.  (in both places. also iso-8859-8-i.)

https://hg.mozilla.org/integration/mozilla-inbound/rev/43082b814d99
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9a50a9bc491
https://hg.mozilla.org/integration/mozilla-inbound/rev/73733b06cbf7
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc1e7d605099
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b7da405c472
https://hg.mozilla.org/integration/mozilla-inbound/rev/e698efbf50dd
https://hg.mozilla.org/integration/mozilla-inbound/rev/a02ade046ba9

Carsten Book [:Tomcat]

Comment 48

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/43082b814d99
https://hg.mozilla.org/mozilla-central/rev/c9a50a9bc491
https://hg.mozilla.org/mozilla-central/rev/73733b06cbf7
https://hg.mozilla.org/mozilla-central/rev/bc1e7d605099
https://hg.mozilla.org/mozilla-central/rev/9b7da405c472
https://hg.mozilla.org/mozilla-central/rev/e698efbf50dd
https://hg.mozilla.org/mozilla-central/rev/a02ade046ba9

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla29

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Depends on: 964379

Zack Weinberg (:zwol)

Assignee

Updated

•

10 years ago

Flags: needinfo?(gps)

Mark Banner (:standard8)

Updated

•

10 years ago

Depends on: 975550

933885-xpcshell-dump-utf8.diff 11 years ago Zack Weinberg (:zwol) 3.95 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
933885-xpcshell-dump-utf8.diff 11 years ago Zack Weinberg (:zwol) 4.86 KB, patch	bzbarsky : review+ ted : review+	Details \| Diff \| Splinter Review
patch v2 (with fixes for problem test_singlebytes.js) 11 years ago Zack Weinberg (:zwol) 36.45 KB, patch		Details \| Diff \| Splinter Review
patch v2a 11 years ago Zack Weinberg (:zwol) 36.56 KB, patch	gps : feedback+	Details \| Diff \| Splinter Review
933885-1-xpcshell-dump-utf8.diff 11 years ago Zack Weinberg (:zwol) 3.37 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
933885-2a-runxpcshelltests-json.diff 11 years ago Zack Weinberg (:zwol) 1.94 KB, patch	gps : review+	Details \| Diff \| Splinter Review
933885-2b-runxpcshelltests-encoding.diff 11 years ago Zack Weinberg (:zwol) 3.21 KB, patch	gps : review+	Details \| Diff \| Splinter Review
933885-3-testharness-js-shim-remove-redundant-output.diff 11 years ago Zack Weinberg (:zwol) 3.93 KB, patch	gps : review+	Details \| Diff \| Splinter Review
933885-4a-split-test-singlebytes.diff 11 years ago Zack Weinberg (:zwol) 15.66 KB, patch	emk : review+	Details \| Diff \| Splinter Review
933885-4b-overhaul-test-utf-roundtrip.diff 11 years ago Zack Weinberg (:zwol) 7.68 KB, patch	emk : review-	Details \| Diff \| Splinter Review
933885-4a-merge-from-upstream.diff 10 years ago Zack Weinberg (:zwol) 969.05 KB, patch	emk : review+	Details \| Diff \| Splinter Review
933885-4b-split-test-singlebytes.diff 10 years ago Zack Weinberg (:zwol) 16.85 KB, patch	emk : review+	Details \| Diff \| Splinter Review
933885-4c-overhaul-test-utf-roundtrip.diff 10 years ago Zack Weinberg (:zwol) 10.37 KB, patch	emk : review+	Details \| Diff \| Splinter Review