Closed Bug 881958 Opened 11 years ago Closed 11 years ago

Parser ignores escaped characters

Tracking

(Not tracked)

Status:

RESOLVED DUPLICATE of bug 918655

Milestone:

1.0

People

(Reporter: zbraniecki, Assigned: zbraniecki)

Details

Attachments

(1 file, 1 obsolete file)

patch 11 years ago Zibi Braniecki [:zbraniecki][:gandalf] 8.06 KB, patch		Details \| Diff \| Splinter Review
patch 11 years ago Zibi Braniecki [:zbraniecki][:gandalf] 8.68 KB, patch	stas : review-	Details \| Diff \| Splinter Review

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Description

•

11 years ago

Python parser does a fine job at unescaping escaped characters in string: https://github.com/l20n/python-l20n/blob/master/lib/format/lol/parser.py#L200-L204

but our JS parser doesn't do that in getString. It only does it in getComplexString which makes certain strings like:

<id "foo\\"foo"> return 'foo\"foo' instead of 'foo"foo'.

Staś Małolepszy :stas

Comment 1

•

11 years ago

I also noticed that \\ \" ends up correctly as \ ", but \\\" throws.

Staś Małolepszy :stas

Comment 2

•

11 years ago

Not sure if this is the same bug, but \\{{ ends up escaping the braces somehow.  In tinker,

    <welcome "Welcome to \\{{ l20n }}!">

is shown as 

    Welcome to {{ l20n }}!

Instead of

    Welcome to L20n!

Staś Małolepszy :stas

Updated

•

11 years ago

Assignee: nobody → gandalf

Priority: -- → P2

Target Milestone: --- → 1.0

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 3

•

11 years ago

Attached patch patch (obsolete) — Details — Splinter Review

this patch changes the behavior of our escaping.

With the patch we only escape:

 - \" -> " in case string is enclosed in "
 - \' -> ' in case string is enclosed in '
 - \{{ -> {{ and no compilation

The cost of the patch is estimated to be ~18% according to make perf, but if I reduce the number of iterations to 1 and launch it multiple times, the numbers are basically the same, so I assume that it slows down JITted code mostly.

We will have to update the grammar because according to it we should also escape:

 - \\ -> \
 - \" -> " in case of " enclosed string
 - \' -> ' in case of ' enclosed string

One more caveats of that is that simple string keeps \{{ as \{{, only parsing it into complex string does the unescaping.
I believe it to be the right choice - from simple string perspective \{{ is just a string. And it will be parsed into complex string because it contains {{.

Attachment #771009 - Flags: review?(stas)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 4

•

11 years ago

Attached patch patch — Details — Splinter Review

Updated patch.

As per conv. with Stas we decided that we'll only unescape string quotes.

Attachment #771009 - Attachment is obsolete: true

Attachment #771009 - Flags: review?(stas)

Attachment #782013 - Flags: review?(stas)

Staś Małolepszy :stas

Comment 5

•

11 years ago

Comment on attachment 782013 [details] [diff] [review]
patch

Review of attachment 782013 [details] [diff] [review]:
-----------------------------------------------------------------

I've been thinking about this and I had a change of heart.  Maybe we should try to be consistent and make the backslash an escape character.  Always.

Suppose we fix bug 898594 using the \uXXXX syntax.  I'm not sure how we'd then allow to write a literal "\u" which is not to be escaped.

I tried the folliwng, but they break the parser:

  <foo1 "{{ '\' }}u">
  <foo2 """ {{ "\" }}u"""> 

Maybe we should always assume that \ is an escape char, and if what follows isn't any special opchar (like "u" in the example above), we'd just print it?  So, \a gives a.  Just like in JavaScript.

As far as our double-pass parsing goes, I think we should first detect if the string is maybeComplex (let's change the name from isNotComplex).  If it's maybeComplex, don't unescape anything and pass the raw string on to parseString (second pass).  If it's not maybeComplex, than go back and unescape everything from left to right.

Gandalf, what do you think?

::: lib/l20n/parser.js
@@ +140,5 @@
>      }
>  
> +    var escopchars = {
> +      '"': new RegExp('\\\\"', 'g'),
> +      "'": new RegExp("\\\\'", 'g'),

Can you use regexp literals here?

According to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Creating_a_Regular_Expression:

    Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.

    Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.

Plus, I think you don't need to escape backslashes to many times in literals.

::: tests/lib/parser.js
@@ +63,2 @@
>  
>      //var ast = parser.parse("<id ''''string\\''''>");

Why is this commented out?  This seems to break badly.  Any idea why?  Should we file another bug?

Staś Małolepszy :stas

Comment 6

•

11 years ago

Comment on attachment 782013 [details] [diff] [review]
patch

Review of attachment 782013 [details] [diff] [review]:
-----------------------------------------------------------------

r- based on comment 5; waiting for response from gandalf

Attachment #782013 - Flags: review?(stas) → review-

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 7

•

11 years ago

Ok, I'm taking another approach.

Trying to figure out how much would it cost to remove separation between complex and simple string overall.

So far got to the point where I have:

 - backslash unescaping
 - unicode unescaping
 - placeables

I don't have:
 - multiquote closings
 - max_placeables counter

And I got performance *gain* on a long settings.lol

You can see the current code here: https://github.com/zbraniecki/l20n.js/blob/parser3/lib/l20n/parser.js#L152

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Parser ignores escaped characters

Categories

(L20n :: JS Library, defect, P2)

Tracking

(Not tracked)

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Attachment

General

Description

File Name

Content Type