Closed Bug 1338779 Opened 7 years ago Closed 7 years ago

Incorrect RegExp matching with word boundary and Kleene plus operator in IgnoreCase+Unicode mode

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla54

Tracking Flags:

Tracking

Status

firefox54

---

fixed

People

(Reporter: anba, Assigned: arai)

References

Details

Attachments

(1 file)

Handle characters map across Latin1 and non-Latin1 properly in TextNode. 7 years ago Tooru Fujisawa [:arai] 15.17 KB, patch	till : review+	Details \| Diff \| Splinter Review

André Bargull [:anba]

Reporter

Description

•

7 years ago

Test case:
---
print(/(i\B\u017f)/ui.test("is"));
print(/(i\B\u017f)+/ui.test("is"));
---

Expected: Prints "true" two times
Actual: The second print call outputs "false"


Reproducible with and without the patch from bug 1338373 applied.

Till Schneidereit [:till]

Updated

•

7 years ago

Flags: needinfo?(arai.unmht)

Tooru Fujisawa [:arai]

Assignee

Comment 1

•

7 years ago

looks like we should avoid some optimization for ASCII vs non-ASCII match,
or add exception for those characters.

will try fixing.

Tooru Fujisawa [:arai]

Assignee

Updated

•

7 years ago

Assignee: nobody → arai.unmht

Status: NEW → ASSIGNED

Tooru Fujisawa [:arai]

Assignee

Comment 2

•

7 years ago

actually, Latin1 vs non-Latin1 match with ignoreCase, regardless of unicode or not.

/(\u039C)/i.test("\xB5") // true
/(\u039C)+/i.test("\xB5") // false <= should be true

Tooru Fujisawa [:arai]

Assignee

Comment 3

•

7 years ago

Attached patch Handle characters map across Latin1 and non-Latin1 properly in TextNode. — Details — Splinter Review

This is the issue that TextNode doesn't match when it contains non-Latin1 character that matches to Latin1 character with ignore_case, and input string is Latin1, in certain situation depending on enclosing context.

here's the details.

In TextNode::TextEmitPass, NON_ASCII_MATCH pass tries to skip matching when the input string is Latin1 and the current character is non-Latin1.
It should not skip if the current character maps to Latin1 by ignore_case matching.
So changed it to use ConvertNonLatin1ToLatin1 result for ignore_case.

Then, if above code avoid skipping, that character should be handled in later pass.
Before this patch, ignore_case match was using another 2 passes, NON_LETTER_CHARACTER_MATCH (EmitAtomNonLetter) and CASE_CHARACTER_MATCH (EmitAtomLetter).
Those categories are not describing well when a letter can map across non-Latin1 to Latin1, and non-Latin1 case is filtered out in ConvertNonLatin1ToLatin1 because of Latin1 match, that results in a single character has only one `chars` for GetCaseIndependentLetters().

So, changed them to CASE_SINGLE_CHARACTER_MATCH and CASE_MUTLI_CHARACTER_MATCH:
  * CASE_SINGLE_CHARACTER_MATCH : GetCaseIndependentLetters() == 1
    calls EmitAtomSingle
  * CASE_MULTI_CHARACTER_MATCH : GetCaseIndependentLetters() > 1
    calls EmitAtomMulti

Flags: needinfo?(arai.unmht)

Attachment #8836431 - Flags: review?(till)

Tooru Fujisawa [:arai]

Assignee

Comment 4

•

7 years ago

also, filed bug 1338841 to rename some "ascii" to "latin1" in irregexp.

Comment 5

•

7 years ago

Comment on attachment 8836431 [details] [diff] [review]
Handle characters map across Latin1 and non-Latin1 properly in TextNode.

Review of attachment 8836431 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thank you! Also thank you for the excellent explanation of what's going on, very helpful.

Attachment #8836431 - Flags: review?(till) → review+

Tooru Fujisawa [:arai]

Assignee

Comment 6

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/639dea0e1a9cd6a91ed98767bb80325b4c9774ff
Bug 1338779 - Handle characters map across Latin1 and non-Latin1 properly in TextNode. r=till

Carsten Book [:Tomcat]

Comment 7

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/639dea0e1a9c

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox54: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Incorrect RegExp matching with word boundary and Kleene plus operator in IgnoreCase+Unicode mode

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: anba, Assigned: arai)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Attachment

General

Description

File Name

Content Type