Closed
Bug 359651
Opened 18 years ago
Closed 10 years ago
Non-greedy regular expressions can capture an extra character under certain circumstances
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
INVALID
People
(Reporter: kliu, Unassigned)
Details
(Keywords: regression)
Attachments
(1 file)
684 bytes,
text/html
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0

Under certain unusual circumstances, a non-greedy regexp is not entirely non-greedy. I have not tested all of the cases where this happens, but from the cases that I have tested, it seems that all of the following must hold:

1) there is a pattern capture (parentheses),
2) the pattern capture should capture 0 bytes when non-greedy and >0 bytes when greedy, and
3) the capture parentheses are themselves quantified by a '?'.

In that situation the pattern capture will capture 1 byte when non-greedy (instead of 0 bytes).

Reproducible: Always

Steps to Reproduce:
1. Run the following JavaScript:

   var x = "123";
   var regexp = /^(.*?)?(\d+)$/;
   alert(x.replace(regexp, "$1-$2"));

Actual Results:
1-23

Expected Results:
-123

Perl: -123
MSIE/6: -123
Gecko/1.5 (FB/0.7): -123
Gecko/1.6 (FF/0.8): 1-23

So the problem cropped up somewhere in the transition from Gecko/1.5 to Gecko/1.6. Everything I tried after Gecko/1.6 (incl. trunk) returns the incorrect "1-23" string.

Other cases:
x = "123"; regexp = /^(.*?)(\d+)$/; -> WORKS (no '?' after capture)
x = "x123"; regexp = /^(.*?)?(\d+)$/; -> WORKS (>0 bytes in capture)
x = "x123"; regexp = /^(x.*?)?(\d+)$/; -> WORKS (>0 bytes in capture)
x = "x123"; regexp = /^x(.*?)?(\d+)$/; -> BROKEN

I stumbled upon this bug by accident. A '?' after a (.*?), while perfectly legal, is redundant. I was changing some regexps around, and I had left in a '?' after changing one of my captures from a pattern that matched at least 1 byte to one that matched at least 0 bytes, specifically (.*?). I then spent some amount of time wondering why my code had suddenly stopped working correctly.

Because you shouldn't need a '?' after a capture that can already capture 0 bytes, I think that this is a minor problem. Nevertheless, the behavior exhibited by Gecko/1.6 and above is incorrect and inconsistent with that of Perl, and it should be corrected. There could be other people who end up doing what I did: neglecting to remove the '?' when a change to a capture made it redundant.
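The cases listed in the description can be collected into one small script for side-by-side comparison. This is only a sketch: the helper name applyPattern and the cases array are made up for illustration and are not taken from the attached test case, and console.log stands in for alert so the script can also run outside a browser.

```javascript
// Hypothetical helper (not from the attachment): applies the same
// replacement string the report uses, "$1-$2".
function applyPattern(input, pattern) {
  return input.replace(pattern, "$1-$2");
}

// Inputs and patterns are copied from the report. "verdict" is the
// reporter's own label; the first entry is the main case from the
// steps to reproduce.
const cases = [
  { input: "123",  pattern: /^(.*?)?(\d+)$/,  verdict: "BROKEN (main case)" },
  { input: "123",  pattern: /^(.*?)(\d+)$/,   verdict: "WORKS (no '?' after capture)" },
  { input: "x123", pattern: /^(.*?)?(\d+)$/,  verdict: "WORKS (>0 bytes in capture)" },
  { input: "x123", pattern: /^(x.*?)?(\d+)$/, verdict: "WORKS (>0 bytes in capture)" },
  { input: "x123", pattern: /^x(.*?)?(\d+)$/, verdict: "BROKEN" },
];

// Run every case and keep the output next to the reporter's verdict.
const results = cases.map((c) => ({
  input: c.input,
  pattern: String(c.pattern),
  verdict: c.verdict,
  output: applyPattern(c.input, c.pattern),
}));

for (const r of results) {
  console.log(r.input, r.pattern, "->", JSON.stringify(r.output), "|", r.verdict);
}
```

In an engine that behaves as described, the two entries marked BROKEN are the ones where $1 comes back as "1" rather than the empty string.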
Comment 2•18 years ago
Brian, another one crying out for help from you. /be
Severity: minor → trivial
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
OS: Windows XP → All
Hardware: PC → All
Comment 3•10 years ago
Given that all JS engines agree on the result here, this isn't a bug. If anything, it's a specification bug, but even then, it most certainly can't be changed anymore, as this behavior will be relied upon by client code, nowadays.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
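One plausible reading of "specification bug" in comment 3 is the ECMAScript rule that, once a quantifier's minimum count is satisfied, an iteration of the quantified atom that matches the empty string is rejected. That reading is an assumption; comment 3 does not name the rule. Under it, the outer '?' refuses the empty match of (.*?), so the engine backtracks into the lazy '.*?' and lets it take one character. The sketch below shows the rule on the example given in the specification's own note, then on the pattern from this bug, and finally with the redundant '?' removed. The variable names are illustrative only.

```javascript
// Example from the ECMAScript specification's note on quantifiers:
// the empty iteration of (a*) is discarded, so the group stays unset.
const specExample = /(a*)*/.exec("b");
console.log(specExample[0] === "", specExample[1] === undefined);

// Pattern from this bug: the outer '?' rejects an empty (.*?),
// so the lazy group is pushed to consume one character.
const withOuterQuantifier = /^(.*?)?(\d+)$/.exec("123");
console.log(withOuterQuantifier[1], withOuterQuantifier[2]);

// Same pattern with the redundant '?' dropped: no outer quantifier,
// so the empty-iteration rule never applies and the group stays empty.
const withoutOuterQuantifier = /^(.*?)(\d+)$/.exec("123");
console.log(JSON.stringify(withoutOuterQuantifier[1]), withoutOuterQuantifier[2]);
```

Dropping the redundant '?' is therefore a straightforward way for client code to get the "-123" result the reporter expected.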