Closed
Bug 881958
Opened 11 years ago
Closed 11 years ago
Parser ignores escaped characters
Categories
(L20n :: JS Library, defect, P2)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 918655
1.0
People
(Reporter: zbraniecki, Assigned: zbraniecki)
Details
Attachments
(1 file, 1 obsolete file)
8.68 KB,
patch
|
stas
:
review-
|
Details | Diff | Splinter Review |
Python parser does a fine job at unescaping escaped characters in string: https://github.com/l20n/python-l20n/blob/master/lib/format/lol/parser.py#L200-L204 but our JS parser doesn't do that in getString. It only does it in getComplexString which makes certain strings like: <id "foo\\"foo"> return 'foo\"foo' instead of 'foo"foo'.
Comment 1•11 years ago
|
||
I also noticed that \\ \" ends up correctly as \ ", but \\\" throws.
Comment 2•11 years ago
|
||
Not sure if this is the same bug, but \\{{ ends up escaping the braces somehow. In tinker, <welcome "Welcome to \\{{ l20n }}!"> is shown as Welcome to {{ l20n }}! Instead of Welcome to L20n!
Updated•11 years ago
|
Assignee: nobody → gandalf
Priority: -- → P2
Target Milestone: --- → 1.0
Assignee | ||
Comment 3•11 years ago
|
||
this patch changes the behavior of our escaping. With the patch we only escape: - \" -> " in case string is enclosed in " - \' -> ' in case string is enclosed in ' - \{{ -> {{ and no compilation The cost of the patch is estimated to be ~18% according to make perf, but if I reduce the number of iterations to 1 and launch it multiple times, the numbers are basically the same, so I assume that it slows down JITted code mostly. We will have to update the grammar because according to it we should also escape: - \\ -> \ - \" -> " in case of " enclosed string - \' -> ' in case of ' enclosed string One more caveats of that is that simple string keeps \{{ as \{{, only parsing it into complex string does the unescaping. I believe it to be the right choice - from simple string perspective \{{ is just a string. And it will be parsed into complex string because it contains {{.
Attachment #771009 -
Flags: review?(stas)
Assignee | ||
Comment 4•11 years ago
|
||
Updated patch. As per conv. with Stas we decided that we'll only unescape string quotes.
Attachment #771009 -
Attachment is obsolete: true
Attachment #771009 -
Flags: review?(stas)
Attachment #782013 -
Flags: review?(stas)
Comment 5•11 years ago
|
||
Comment on attachment 782013 [details] [diff] [review] patch Review of attachment 782013 [details] [diff] [review]: ----------------------------------------------------------------- I've been thinking about this and I had a change of heart. Maybe we should try to be consistent and make the backslash an escape character. Always. Suppose we fix bug 898594 using the \uXXXX syntax. I'm not sure how we'd then allow to write a literal "\u" which is not to be escaped. I tried the folliwng, but they break the parser: <foo1 "{{ '\' }}u"> <foo2 """ {{ "\" }}u"""> Maybe we should always assume that \ is an escape char, and if what follows isn't any special opchar (like "u" in the example above), we'd just print it? So, \a gives a. Just like in JavaScript. As far as our double-pass parsing goes, I think we should first detect if the string is maybeComplex (let's change the name from isNotComplex). If it's maybeComplex, don't unescape anything and pass the raw string on to parseString (second pass). If it's not maybeComplex, than go back and unescape everything from left to right. Gandalf, what do you think? ::: lib/l20n/parser.js @@ +140,5 @@ > } > > + var escopchars = { > + '"': new RegExp('\\\\"', 'g'), > + "'": new RegExp("\\\\'", 'g'), Can you use regexp literals here? According to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Creating_a_Regular_Expression: Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance. Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input. Plus, I think you don't need to escape backslashes to many times in literals. ::: tests/lib/parser.js @@ +63,2 @@ > > //var ast = parser.parse("<id ''''string\\''''>"); Why is this commented out? This seems to break badly. Any idea why? Should we file another bug?
Comment 6•11 years ago
|
||
Comment on attachment 782013 [details] [diff] [review] patch Review of attachment 782013 [details] [diff] [review]: ----------------------------------------------------------------- r- based on comment 5; waiting for response from gandalf
Attachment #782013 -
Flags: review?(stas) → review-
Assignee | ||
Comment 7•11 years ago
|
||
Ok, I'm taking another approach. Trying to figure out how much would it cost to remove separation between complex and simple string overall. So far got to the point where I have: - backslash unescaping - unicode unescaping - placeables I don't have: - multiquote closings - max_placeables counter And I got performance *gain* on a long settings.lol You can see the current code here: https://github.com/zbraniecki/l20n.js/blob/parser3/lib/l20n/parser.js#L152
Assignee | ||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•