Closed
Bug 1338779
Opened 7 years ago
Closed 7 years ago
Incorrect RegExp matching with word boundary and Kleene plus operator in IgnoreCase+Unicode mode
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: anba, Assigned: arai)
References
Details
Attachments
(1 file)
15.17 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
Test case: --- print(/(i\B\u017f)/ui.test("is")); print(/(i\B\u017f)+/ui.test("is")); --- Expected: Prints "true" two times Actual: The second print call outputs "false" Reproducible with and without the patch from bug 1338373 applied.
Updated•7 years ago
|
Flags: needinfo?(arai.unmht)
Assignee | ||
Comment 1•7 years ago
|
||
looks like we should avoid some optimization for ASCII vs non-ASCII match, or add exception for those characters. will try fixing.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•7 years ago
|
||
actually, Latin1 vs non-Latin1 match with ignoreCase, regardless of unicode or not. /(\u039C)/i.test("\xB5") // true /(\u039C)+/i.test("\xB5") // false <= should be true
Assignee | ||
Comment 3•7 years ago
|
||
This is the issue that TextNode doesn't match when it contains non-Latin1 character that matches to Latin1 character with ignore_case, and input string is Latin1, in certain situation depending on enclosing context. here's the details. In TextNode::TextEmitPass, NON_ASCII_MATCH pass tries to skip matching when the input string is Latin1 and the current character is non-Latin1. It should not skip if the current character maps to Latin1 by ignore_case matching. So changed it to use ConvertNonLatin1ToLatin1 result for ignore_case. Then, if above code avoid skipping, that character should be handled in later pass. Before this patch, ignore_case match was using another 2 passes, NON_LETTER_CHARACTER_MATCH (EmitAtomNonLetter) and CASE_CHARACTER_MATCH (EmitAtomLetter). Those categories are not describing well when a letter can map across non-Latin1 to Latin1, and non-Latin1 case is filtered out in ConvertNonLatin1ToLatin1 because of Latin1 match, that results in a single character has only one `chars` for GetCaseIndependentLetters(). So, changed them to CASE_SINGLE_CHARACTER_MATCH and CASE_MUTLI_CHARACTER_MATCH: * CASE_SINGLE_CHARACTER_MATCH : GetCaseIndependentLetters() == 1 calls EmitAtomSingle * CASE_MULTI_CHARACTER_MATCH : GetCaseIndependentLetters() > 1 calls EmitAtomMulti
Flags: needinfo?(arai.unmht)
Attachment #8836431 -
Flags: review?(till)
Assignee | ||
Comment 4•7 years ago
|
||
also, filed bug 1338841 to rename some "ascii" to "latin1" in irregexp.
See Also: → 1338841
Comment 5•7 years ago
|
||
Comment on attachment 8836431 [details] [diff] [review] Handle characters map across Latin1 and non-Latin1 properly in TextNode. Review of attachment 8836431 [details] [diff] [review]: ----------------------------------------------------------------- Looks great, thank you! Also thank you for the excellent explanation of what's going on, very helpful.
Attachment #8836431 -
Flags: review?(till) → review+
Assignee | ||
Comment 6•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/639dea0e1a9cd6a91ed98767bb80325b4c9774ff Bug 1338779 - Handle characters map across Latin1 and non-Latin1 properly in TextNode. r=till
Comment 7•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/639dea0e1a9c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in
before you can comment on or make changes to this bug.
Description
•