Closed Bug 898594 Opened 11 years ago Closed 11 years ago

support unicode escapes like \u0020

Categories

(L20n :: JS Library, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 918655

People

(Reporter: Pike, Assigned: zbraniecki)

Details

Attachments

(1 file, 2 obsolete files)

L20n strings should support unicode escapes of sorts in case someone has problems to enter a string through the keyboard, or wants an explicit funny char.

I propose we use "standard" unicode escapes, \u0020.
Gandalf, does this affect your work in bug 881958?
Flags: needinfo?(gandalf)
I don't know. We currently load files via XHR and it does not unescape unicode chars. At the same time writing strings into JS does.

If we can find a way to unescape strings while loading from XHR that'll be perfect.

Otherwise we'll have to do this manually which will be very expensive.
Flags: needinfo?(gandalf)
String.fromCharCode looks promising:

[15:43:24.636] String.fromCharCode(0x00A9)
[15:43:24.639] "©"

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCharCode

I guess we could detect \u and take the four characters that follow and replace the whole thing with the return value of String.fromCharCode?
Tentatively setting target milestone to 1.0.
Priority: -- → P3
Target Milestone: --- → 1.0
Component: General → JS Library
Attached patch patch (obsolete) — Splinter Review
Here's a proposal patch.

it seems that it has to be done manually.

We can either done it on the whole source, or within parseString.

The major difference is that in the former case we allow for unicode chars to be used in building syntax, not only string values.

Wrt. performance, I didn't test it and I'm not sure how to reliably test which method will be faster (one cycle through a bigger string vs. multiple cycles on small chunks).
Attachment #784117 - Flags: feedback?(stas)
(In reply to Zbigniew Braniecki [:gandalf] from comment #5)

> The major difference is that in the former case we allow for unicode chars
> to be used in building syntax, not only string values.

I think that replacing in the whole source is wrong.  It could lead to weird obfuscation attacks, and would make it harder to write external tools.  Moreover, without proper parsing, it would be hard to know if a given \u isn't escaped itself.  E.g. \\u00ae.

I suggest we only unescape \u in Strings, but only after unescaping actual backslashes and unly if the string contains "\u".  See bug 881958.
Comment on attachment 784117 [details] [diff] [review]
patch

Review of attachment 784117 [details] [diff] [review]:
-----------------------------------------------------------------

feedback- based on my comment 6
Attachment #784117 - Flags: feedback?(stas) → feedback-
I agree that we should restrict unicode escapes to the literal string parts.
Attached patch patch v2 (obsolete) — Splinter Review
Updated patch.

Not review ready, cause I want to play with bug 881958 and figure out how to layout those two together, but it's feature ready so asking for feedback.

The impact of this bug on performance is as follows:

1) On our jsshell benchmark it's a 14% hit on parser :(
2) On Keon v1-train, settings app it's not noticeable

For me 1) is a pretty significant cost. 14% for an edge feature is a sad tradeoff to make :(

2) is reassuring. As I keep believing that the whole parsing has little impact on Gaia overall (because it happens in parallel with other startup activities) it seems that we can land it without getting hit on Gaia.

Still, would love to avoid the cost. Also the cost of doing this "per literal" is 4x comparing to doing it once for the whole string according to jsshell benchmark
Assignee: nobody → gandalf
Attachment #784117 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #784689 - Flags: feedback?(stas)
Attachment #784689 - Flags: feedback?(l10n)
Comment on attachment 784689 [details] [diff] [review]
patch v2

Review of attachment 784689 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/l20n/parser.js
@@ +168,4 @@
>        _index = close + len;
>        return {
>          type: 'String',
> +        content: decodeUnicode(str),

This runs replace on every string in the source file.  As you mention, there's a performance hit.  On my machine, it's 26%:

  master:   727 µs
  patch:   912 µs (+26%)

However, if you guard that with a simple indexOf like so:

  content: str.indexOf('\\u') === -1 ? str : decodeUnicode(str), 

the perf hit depends solely on the actual use of \u escapes in the string.  Our benchmark file does't use any, so the hit is small (yet still statistically significant)

  indexOf:   758 µs (+4%)

Depending on the unescaping strategy from bug 881958, this may or my not produce false positives, but that's okay, because they should be very rare and they won't break decodeUnicode anyways.
Attachment #784689 - Flags: feedback?(stas) → feedback-
Comment on attachment 784689 [details] [diff] [review]
patch v2

Review of attachment 784689 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/l20n/parser.js
@@ +168,4 @@
>        _index = close + len;
>        return {
>          type: 'String',
> +        content: decodeUnicode(str),

Also, this does escape "foo {{ \u00EF }}", right?
Attachment #784689 - Flags: feedback?(l10n) → feedback-
Attached patch patch v3Splinter Review
patch v3. 

The cost on jsshell is marginal (<100us), the cost on node is less marginal (300us).
Attachment #784689 - Attachment is obsolete: true
Attachment #807440 - Flags: review?(stas)
if this patch gets r+, I'll add tests before landing it
Comment on attachment 807440 [details] [diff] [review]
patch v3

Review of attachment 807440 [details] [diff] [review]:
-----------------------------------------------------------------

The patch in bug 918655 also seems to address this.  Do you still want me to review this patch?
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Comment on attachment 807440 [details] [diff] [review]
patch v3

Review of attachment 807440 [details] [diff] [review]:
-----------------------------------------------------------------

The patch in bug 918655 also seems to address this.  Do you still want me to review this patch?
Attachment #807440 - Flags: review?(stas)
Oops, sorry, I just meant to cancel the review request, not sure why my comment 14 got duplicated.  Ignore.
Gecko will soon support unicode chars as part of ES6 (bug 952985).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: