Closed
Bug 898594
Opened 11 years ago
Closed 11 years ago
support unicode escapes like \u0020
Categories
(L20n :: JS Library, defect, P3)
L20n
JS Library
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 918655
1.0
People
(Reporter: Pike, Assigned: zbraniecki)
Details
Attachments
(1 file, 2 obsolete files)
2.05 KB,
patch
|
Details | Diff | Splinter Review |
L20n strings should support unicode escapes of sorts in case someone has problems to enter a string through the keyboard, or wants an explicit funny char. I propose we use "standard" unicode escapes, \u0020.
Comment 1•11 years ago
|
||
Gandalf, does this affect your work in bug 881958?
Flags: needinfo?(gandalf)
Assignee | ||
Comment 2•11 years ago
|
||
I don't know. We currently load files via XHR and it does not unescape unicode chars. At the same time writing strings into JS does. If we can find a way to unescape strings while loading from XHR that'll be perfect. Otherwise we'll have to do this manually which will be very expensive.
Flags: needinfo?(gandalf)
Comment 3•11 years ago
|
||
String.fromCharCode looks promising: [15:43:24.636] String.fromCharCode(0x00A9) [15:43:24.639] "©" https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCharCode I guess we could detect \u and take the four characters that follow and replace the whole thing with the return value of String.fromCharCode?
Comment 4•11 years ago
|
||
Tentatively setting target milestone to 1.0.
Priority: -- → P3
Target Milestone: --- → 1.0
Updated•11 years ago
|
Component: General → JS Library
Assignee | ||
Comment 5•11 years ago
|
||
Here's a proposal patch. it seems that it has to be done manually. We can either done it on the whole source, or within parseString. The major difference is that in the former case we allow for unicode chars to be used in building syntax, not only string values. Wrt. performance, I didn't test it and I'm not sure how to reliably test which method will be faster (one cycle through a bigger string vs. multiple cycles on small chunks).
Attachment #784117 -
Flags: feedback?(stas)
Comment 6•11 years ago
|
||
(In reply to Zbigniew Braniecki [:gandalf] from comment #5) > The major difference is that in the former case we allow for unicode chars > to be used in building syntax, not only string values. I think that replacing in the whole source is wrong. It could lead to weird obfuscation attacks, and would make it harder to write external tools. Moreover, without proper parsing, it would be hard to know if a given \u isn't escaped itself. E.g. \\u00ae. I suggest we only unescape \u in Strings, but only after unescaping actual backslashes and unly if the string contains "\u". See bug 881958.
Comment 7•11 years ago
|
||
Comment on attachment 784117 [details] [diff] [review] patch Review of attachment 784117 [details] [diff] [review]: ----------------------------------------------------------------- feedback- based on my comment 6
Attachment #784117 -
Flags: feedback?(stas) → feedback-
Reporter | ||
Comment 8•11 years ago
|
||
I agree that we should restrict unicode escapes to the literal string parts.
Assignee | ||
Comment 9•11 years ago
|
||
Updated patch. Not review ready, cause I want to play with bug 881958 and figure out how to layout those two together, but it's feature ready so asking for feedback. The impact of this bug on performance is as follows: 1) On our jsshell benchmark it's a 14% hit on parser :( 2) On Keon v1-train, settings app it's not noticeable For me 1) is a pretty significant cost. 14% for an edge feature is a sad tradeoff to make :( 2) is reassuring. As I keep believing that the whole parsing has little impact on Gaia overall (because it happens in parallel with other startup activities) it seems that we can land it without getting hit on Gaia. Still, would love to avoid the cost. Also the cost of doing this "per literal" is 4x comparing to doing it once for the whole string according to jsshell benchmark
Assignee: nobody → gandalf
Attachment #784117 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #784689 -
Flags: feedback?(stas)
Attachment #784689 -
Flags: feedback?(l10n)
Comment 10•11 years ago
|
||
Comment on attachment 784689 [details] [diff] [review] patch v2 Review of attachment 784689 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/l20n/parser.js @@ +168,4 @@ > _index = close + len; > return { > type: 'String', > + content: decodeUnicode(str), This runs replace on every string in the source file. As you mention, there's a performance hit. On my machine, it's 26%: master: 727 µs patch: 912 µs (+26%) However, if you guard that with a simple indexOf like so: content: str.indexOf('\\u') === -1 ? str : decodeUnicode(str), the perf hit depends solely on the actual use of \u escapes in the string. Our benchmark file does't use any, so the hit is small (yet still statistically significant) indexOf: 758 µs (+4%) Depending on the unescaping strategy from bug 881958, this may or my not produce false positives, but that's okay, because they should be very rare and they won't break decodeUnicode anyways.
Attachment #784689 -
Flags: feedback?(stas) → feedback-
Reporter | ||
Comment 11•11 years ago
|
||
Comment on attachment 784689 [details] [diff] [review] patch v2 Review of attachment 784689 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/l20n/parser.js @@ +168,4 @@ > _index = close + len; > return { > type: 'String', > + content: decodeUnicode(str), Also, this does escape "foo {{ \u00EF }}", right?
Attachment #784689 -
Flags: feedback?(l10n) → feedback-
Assignee | ||
Comment 12•11 years ago
|
||
patch v3. The cost on jsshell is marginal (<100us), the cost on node is less marginal (300us).
Attachment #784689 -
Attachment is obsolete: true
Attachment #807440 -
Flags: review?(stas)
Assignee | ||
Comment 13•11 years ago
|
||
if this patch gets r+, I'll add tests before landing it
Comment 14•11 years ago
|
||
Comment on attachment 807440 [details] [diff] [review] patch v3 Review of attachment 807440 [details] [diff] [review]: ----------------------------------------------------------------- The patch in bug 918655 also seems to address this. Do you still want me to review this patch?
Assignee | ||
Updated•11 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Comment 16•11 years ago
|
||
Comment on attachment 807440 [details] [diff] [review] patch v3 Review of attachment 807440 [details] [diff] [review]: ----------------------------------------------------------------- The patch in bug 918655 also seems to address this. Do you still want me to review this patch?
Attachment #807440 -
Flags: review?(stas)
Comment 17•11 years ago
|
||
Oops, sorry, I just meant to cancel the review request, not sure why my comment 14 got duplicated. Ignore.
Assignee | ||
Comment 18•10 years ago
|
||
Gecko will soon support unicode chars as part of ES6 (bug 952985).
You need to log in
before you can comment on or make changes to this bug.
Description
•