Differential Testing: Different output message involving RegExp, exec, --latin1-strings

RESOLVED FIXED in Firefox 34

Status

()

Core
JavaScript Engine: JIT
--
major
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: gkw, Assigned: bhackett)

Tracking

(Blocks: 2 bugs, {regression, testcase})

Trunk
mozilla34
x86_64
All
regression, testcase
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox31 unaffected, firefox32 wontfix, firefox33 wontfix, firefox34 fixed, firefox35 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

3 years ago
print((/(?!(?!(?!6)[\Wc]))/i).exec())

$ ./js-dbgDisabled-opt-64-prof-dm-ts-darwin-5d9af625f42e --ion-offthread-compile=off --ion-eager 370.js

$ ./js-dbgDisabled-opt-64-prof-dm-ts-darwin-5d9af625f42e --ion-offthread-compile=off --ion-eager --latin1-strings 370.js
null

(Tested this on 64-bit Mac js opt threadsafe deterministic shell off m-c rev 5d9af625f42e)

My configure flags are:

CC="clang -Qunused-arguments" CXX="clang++ -Qunused-arguments" AR=ar sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=x86_64-apple-darwin12.5.0 --disable-debug --enable-optimize --enable-profiling --enable-gczeal --enable-debug-symbols --disable-tests --enable-more-deterministic --with-ccache --enable-threadsafe <other NSPR options>

Guessing this is related to bug 998392 as well, so setting needinfo? from Jan.
Flags: needinfo?(jdemooij)
Hah, this seems to be a bug in our irregexp port somewhere that only affects TwoByte strings:

I get the following output for:

    print((/(?!(?!(?!6)[\Wc]))/i).test());

d8:                : false
SM before irregexp : false
SM --latin1-strings: false
SM                 : true
Not sure but I think this is a bug in CharacterRange::AddCaseEquivalents.
Flags: needinfo?(jdemooij) → needinfo?(bhackett1024)
status-firefox31: --- → unaffected
status-firefox32: --- → affected
status-firefox33: --- → affected
(Assignee)

Comment 3

3 years ago
Created attachment 8456580 [details] [diff] [review]
patch

This is a problem with the same \u0130 'capital I with a dot over it' character that also caused problems in one of the existing tests in bug 976446.  That bug just changed the test, though I guess this fix is better (which also restores the original test).
Assignee: nobody → bhackett1024
Attachment #8456580 - Flags: review?(jdemooij)
Flags: needinfo?(bhackett1024)
Comment on attachment 8456580 [details] [diff] [review]
patch

Review of attachment 8456580 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is correct, with the patch the following test fails:

assertEq("foobar\xff5baz\u1200".search(/bar\u0178\d/i), 3);
Attachment #8456580 - Flags: review?(jdemooij)
(Assignee)

Comment 5

3 years ago
(In reply to Jan de Mooij [:jandem] from comment #4)
> Comment on attachment 8456580 [details] [diff] [review]
> patch
> 
> Review of attachment 8456580 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't think this is correct, with the patch the following test fails:
> 
> assertEq("foobar\xff5baz\u1200".search(/bar\u0178\d/i), 3);

OK, I have no idea what the correct fix is here then.

We have this comment:

    // The standard requires that non-ASCII characters cannot have ASCII
    // character codes in their equivalence class.

But this is contradicted by our unicode tables (which we presumably pull from some standard source), in which the lowercase of \u0130 is a lowercase 'i'.

We could just watch for cases where the unicode tables have a pattern like this, and filter it out, like this patch does, but doing that on Latin1 characters is apparently not correct.  We could do it on Ascii characters (<= 127) but the irregexp source already seems to conflate the concepts of Ascii strings and Latin1 strings all over the place, including when invoking this mysterious standard.

Should I just do a spotfix for this stupid \u0130 character?
Assignee: bhackett1024 → nobody
(In reply to Brian Hackett (:bhackett) from comment #5)
> OK, I have no idea what the correct fix is here then.

I think the problem is in CharacterRange::AddCaseEquivalents. IIRC we call GetCaseIndependentLetters, but V8 doesn't. We should look at their code and see what they do exactly, and in which cases it's different from what we do.
(Assignee)

Comment 7

3 years ago
(In reply to Jan de Mooij [:jandem] from comment #6)
> (In reply to Brian Hackett (:bhackett) from comment #5)
> > OK, I have no idea what the correct fix is here then.
> 
> I think the problem is in CharacterRange::AddCaseEquivalents. IIRC we call
> GetCaseIndependentLetters, but V8 doesn't. We should look at their code and
> see what they do exactly, and in which cases it's different from what we do.

v8's unicode library's interface is totally different from ours, so to avoid needing to port all of that stuff (and any additional dependencies) I implemented it less efficiently using GetCaseIndependentLetters and our library's ToUpperCase and ToLowerCase.  Barring spec insanity (to wit: this bug) these should be equivalent.
I know it sucks but we should still fix it somehow :(
(Assignee)

Comment 9

3 years ago
Created attachment 8458125 [details] [diff] [review]
special case \u0130
Attachment #8456580 - Attachment is obsolete: true
Attachment #8458125 - Flags: review?(jdemooij)
Hm there's also a problem with "k" if you disable Latin1 strings.. (EnableLatin1Strings in String.cpp

assertEq((/(?!(?!(?!6)[\Wc]))/i).test("k"), false);
The spec says this:

The abstract operation Canonicalize takes a character parameter ch and performs the following steps:

1. If IgnoreCase is false, return ch.
2. Let u be ch converted to upper case as if by calling the standard built-in method String.prototype.toUpperCase on the one-character String ch.
3. If u does not consist of a single character, return ch.
4. Let cu be u's character.
5. If ch's code unit value is greater than or equal to decimal 128 and cu's code unit value is less than decimal 128, then return ch.
6. Return cu.

Does that help?

Updated

3 years ago
Attachment #8458125 - Flags: review?(jdemooij)
(Reporter)

Updated

3 years ago
Flags: needinfo?(bhackett1024)
(Assignee)

Comment 12

3 years ago
Created attachment 8466291 [details] [diff] [review]
patch
Attachment #8458125 - Attachment is obsolete: true
Attachment #8466291 - Flags: review?(jdemooij)
Flags: needinfo?(bhackett1024)
Comment on attachment 8466291 [details] [diff] [review]
patch

Review of attachment 8466291 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/irregexp/RegExpEngine.cpp
@@ +215,5 @@
> +        // Watch for duplicates.
> +        bool found = false;
> +        for (size_t j = 0; j < count; j++) {
> +            if (letters[j] == c)
> +                found = true;

Nit: can |break;| here.
Attachment #8466291 - Flags: review?(jdemooij) → review+
(Reporter)

Comment 14

3 years ago
Thanks for fixing! Perhaps this is ready for landing soon?
Flags: needinfo?(bhackett1024)
(Assignee)

Comment 15

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/293f3a5eabdb
Flags: needinfo?(bhackett1024)
https://hg.mozilla.org/mozilla-central/rev/293f3a5eabdb
Assignee: nobody → bhackett1024
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Flags: qe-verify-

Updated

3 years ago
Duplicate of this bug: 1076728

Updated

3 years ago
Blocks: 976446
No longer blocks: 998392, 1028867
Comment on attachment 8466291 [details] [diff] [review]
patch

The patch was not backported but apparently people are running into this (see bug 1076728). I know it's late in the cycle but requesting approval just in case.

Approval Request Comment
[Feature/regressing bug #]: Bug 976446.
[User impact if declined]: Broken websites.
[Describe test coverage new/current, TBPL]: Tested on TBPL, patch is in 34/35.
[Risks and why]: Low risk.
[String/UUID change made/needed]: None.
Attachment #8466291 - Flags: approval-mozilla-beta?
Jan, that is unfortunate. However, having it on m-c for a month demonstrates that the change should be safe. You confirm that it didn't cause any side effect?
Flags: needinfo?(jdemooij)
(In reply to Sylvestre Ledru [:sylvestre] from comment #19)
> You confirm that it didn't cause any side effect?

I'm not aware of any side effects, no.
Flags: needinfo?(jdemooij)
status-firefox32: affected → wontfix
status-firefox33: affected → wontfix
status-firefox34: --- → affected
status-firefox35: --- → fixed
status-firefox34: affected → fixed
Comment on attachment 8466291 [details] [diff] [review]
patch

We are too late in the cycle to take it. Sorry, this will be fixed with the release of 34...
Attachment #8466291 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Updated

3 years ago
Duplicate of this bug: 1093957
You need to log in before you can comment on or make changes to this bug.