Closed Bug 898594 Opened 11 years ago Closed 11 years ago

support unicode escapes like \u0020

Tracking

(Not tracked)

Status:

RESOLVED DUPLICATE of bug 918655

Milestone:

1.0

People

(Reporter: Pike, Assigned: zbraniecki)

Details

Attachments

(1 file, 2 obsolete files)

patch 11 years ago Zibi Braniecki [:zbraniecki][:gandalf] 1.38 KB, patch	stas : feedback-	Details \| Diff \| Splinter Review
patch v2 11 years ago Zibi Braniecki [:zbraniecki][:gandalf] 947 bytes, patch	stas : feedback- Pike : feedback-	Details \| Diff \| Splinter Review
patch v3 11 years ago Zibi Braniecki [:zbraniecki][:gandalf] 2.05 KB, patch		Details \| Diff \| Splinter Review

Axel Hecht [:Pike]

Reporter

Description

•

11 years ago

L20n strings should support unicode escapes of sorts in case someone has problems to enter a string through the keyboard, or wants an explicit funny char.

I propose we use "standard" unicode escapes, \u0020.

Staś Małolepszy :stas

Comment 1

•

11 years ago

Gandalf, does this affect your work in bug 881958?

Flags: needinfo?(gandalf)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 2

•

11 years ago

I don't know. We currently load files via XHR and it does not unescape unicode chars. At the same time writing strings into JS does.

If we can find a way to unescape strings while loading from XHR that'll be perfect.

Otherwise we'll have to do this manually which will be very expensive.

Flags: needinfo?(gandalf)

Staś Małolepszy :stas

Comment 3

•

11 years ago

String.fromCharCode looks promising:

[15:43:24.636] String.fromCharCode(0x00A9)
[15:43:24.639] "©"

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCharCode

I guess we could detect \u and take the four characters that follow and replace the whole thing with the return value of String.fromCharCode?

Staś Małolepszy :stas

Comment 4

•

11 years ago

Tentatively setting target milestone to 1.0.

Priority: -- → P3

Target Milestone: --- → 1.0

Staś Małolepszy :stas

Updated

•

11 years ago

Component: General → JS Library

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 5

•

11 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Here's a proposal patch.

it seems that it has to be done manually.

We can either done it on the whole source, or within parseString.

The major difference is that in the former case we allow for unicode chars to be used in building syntax, not only string values.

Wrt. performance, I didn't test it and I'm not sure how to reliably test which method will be faster (one cycle through a bigger string vs. multiple cycles on small chunks).

Attachment #784117 - Flags: feedback?(stas)

Staś Małolepszy :stas

Comment 6

•

11 years ago

(In reply to Zbigniew Braniecki [:gandalf] from comment #5)

> The major difference is that in the former case we allow for unicode chars
> to be used in building syntax, not only string values.

I think that replacing in the whole source is wrong.  It could lead to weird obfuscation attacks, and would make it harder to write external tools.  Moreover, without proper parsing, it would be hard to know if a given \u isn't escaped itself.  E.g. \\u00ae.

I suggest we only unescape \u in Strings, but only after unescaping actual backslashes and unly if the string contains "\u".  See bug 881958.

Staś Małolepszy :stas

Comment 7

•

11 years ago

Comment on attachment 784117 [details] [diff] [review]
patch

Review of attachment 784117 [details] [diff] [review]:
-----------------------------------------------------------------

feedback- based on my comment 6

Attachment #784117 - Flags: feedback?(stas) → feedback-

Axel Hecht [:Pike]

Reporter

Comment 8

•

11 years ago

I agree that we should restrict unicode escapes to the literal string parts.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 9

•

11 years ago

Attached patch patch v2 (obsolete) — Details — Splinter Review

Updated patch.

Not review ready, cause I want to play with bug 881958 and figure out how to layout those two together, but it's feature ready so asking for feedback.

The impact of this bug on performance is as follows:

1) On our jsshell benchmark it's a 14% hit on parser :(
2) On Keon v1-train, settings app it's not noticeable

For me 1) is a pretty significant cost. 14% for an edge feature is a sad tradeoff to make :(

2) is reassuring. As I keep believing that the whole parsing has little impact on Gaia overall (because it happens in parallel with other startup activities) it seems that we can land it without getting hit on Gaia.

Still, would love to avoid the cost. Also the cost of doing this "per literal" is 4x comparing to doing it once for the whole string according to jsshell benchmark

Assignee: nobody → gandalf

Attachment #784117 - Attachment is obsolete: true

Status: NEW → ASSIGNED

Attachment #784689 - Flags: feedback?(stas)

Attachment #784689 - Flags: feedback?(l10n)

Staś Małolepszy :stas

Comment 10

•

11 years ago

Comment on attachment 784689 [details] [diff] [review]
patch v2

Review of attachment 784689 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/l20n/parser.js
@@ +168,4 @@
>        _index = close + len;
>        return {
>          type: 'String',
> +        content: decodeUnicode(str),

This runs replace on every string in the source file.  As you mention, there's a performance hit.  On my machine, it's 26%:

  master:   727 µs
  patch:   912 µs (+26%)

However, if you guard that with a simple indexOf like so:

  content: str.indexOf('\\u') === -1 ? str : decodeUnicode(str), 

the perf hit depends solely on the actual use of \u escapes in the string.  Our benchmark file does't use any, so the hit is small (yet still statistically significant)

  indexOf:   758 µs (+4%)

Depending on the unescaping strategy from bug 881958, this may or my not produce false positives, but that's okay, because they should be very rare and they won't break decodeUnicode anyways.

Attachment #784689 - Flags: feedback?(stas) → feedback-

Axel Hecht [:Pike]

Reporter

Comment 11

•

11 years ago

Comment on attachment 784689 [details] [diff] [review]
patch v2

Review of attachment 784689 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/l20n/parser.js
@@ +168,4 @@
>        _index = close + len;
>        return {
>          type: 'String',
> +        content: decodeUnicode(str),

Also, this does escape "foo {{ \u00EF }}", right?

Attachment #784689 - Flags: feedback?(l10n) → feedback-

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 12

•

11 years ago

Attached patch patch v3 — Details — Splinter Review

patch v3. 

The cost on jsshell is marginal (<100us), the cost on node is less marginal (300us).

Attachment #784689 - Attachment is obsolete: true

Attachment #807440 - Flags: review?(stas)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 13

•

11 years ago

if this patch gets r+, I'll add tests before landing it

Staś Małolepszy :stas

Comment 14

•

11 years ago

Comment on attachment 807440 [details] [diff] [review]
patch v3

Review of attachment 807440 [details] [diff] [review]:
-----------------------------------------------------------------

The patch in bug 918655 also seems to address this.  Do you still want me to review this patch?

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

11 years ago

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Staś Małolepszy :stas

Comment 16

•

11 years ago

Comment on attachment 807440 [details] [diff] [review]
patch v3

Review of attachment 807440 [details] [diff] [review]:
-----------------------------------------------------------------

The patch in bug 918655 also seems to address this.  Do you still want me to review this patch?

Attachment #807440 - Flags: review?(stas)

Staś Małolepszy :stas

Comment 17

•

11 years ago

Oops, sorry, I just meant to cancel the review request, not sure why my comment 14 got duplicated.  Ignore.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 18

•

10 years ago

Gecko will soon support unicode chars as part of ES6 (bug 952985).

You need to log in before you can comment on or make changes to this bug.