Closed Bug 988387 Opened 10 years ago Closed 9 years ago

Getting whole characters example only includes code for a surrogate pair

Categories

(Developer Documentation Graveyard :: JavaScript, defect, P5)

All
Other
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wowmotty, Assigned: bruant.d)

Details

:: Developer Documentation Request

      Request Type: Correction
     Gecko Version: unspecified
 Technical Contact: 

:: Details

I am by no means an expert when it comes to dealing with Unicode and the multilingual planes. I maintain a keyboard plugin to which users contribute keyboard layouts. One such layout is for the Tamil language, and it contains groupings of up to four Unicode characters (source: https://github.com/Mottie/Keyboard/blob/master/layouts/tamil.js#L37)

"\u0bb6\u0bcd\u0bb0\u0bc0" and "\u0b95\u0bcd\u0bb7"

I was trying to use the "getting whole characters" code, but it is designed only to handle a surrogate pair. Would it be wrong to just create a loop looking for the next space? Or are the above character groupings not typical?
I am by no means an expert with Unicode either. I wonder whether this is rather a question for a platform like Stack Overflow. We can always improve the examples in the documentation, but I am not sure that many Unicode experts will read this bug report.

First of all, moving over to JavaScript documentation. Let's see if someone has an idea.
Assignee: eshepherd → bruant.d
Component: General → JavaScript
Whiteboard: c=General u=webdev p=0
I think Tom worked in this area some time ago. Maybe he knows.
Flags: needinfo?(evilpies)
See <http://mathiasbynens.be/notes/javascript-unicode#other-grapheme-clusters> for the answer to your question.

TL;DR You’d need to implement UAX#29’s algorithm for determining grapheme cluster boundaries (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) in JavaScript to do this.
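As a rough illustration of what grapheme cluster segmentation looks like in practice, here is a minimal sketch that leans on the built-in Intl.Segmenter instead of a hand-written UAX#29 implementation. This is an assumption on my part: it only works in runtimes that ship Intl.Segmenter, and `graphemeClusters` is a hypothetical helper name, not something from the MDN page. Note also that the default cluster rules depend on the Unicode version in the engine and may not keep every Tamil conjunct from the report together as one unit, so check the output for your own layouts.

```javascript
// Hypothetical helper: split a string into extended grapheme clusters.
// Assumption: the runtime provides Intl.Segmenter.
function graphemeClusters(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // Each item yielded by segment() has a `segment` property holding the substring.
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one cluster.
const accented = graphemeClusters("e\u0301");
console.log(accented.length); // 1

// U+1F600 written as a surrogate pair: two code units, one cluster.
const emoji = graphemeClusters("\uD83D\uDE00");
console.log(emoji.length); // 1
```

Joining the returned array always gives back the original string, so the helper can be used to step through text one cluster at a time.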
Flags: needinfo?(evilpies)
Thanks Mathias!

So, as you stated, the example I shared is not a surrogate pair, but a grapheme cluster. Thanks for clarifying.
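For anyone else who runs into the same confusion, a small sketch of the difference, using the first sequence from my report. `hasSurrogates` is just a hypothetical helper name for illustration. Every code point in the Tamil sequence is in the Basic Multilingual Plane, so the string contains no surrogate pairs and surrogate-pair handling has nothing to do there.

```javascript
// Hypothetical helper: true if the string contains any UTF-16 surrogate code unit.
function hasSurrogates(text) {
  return /[\uD800-\uDFFF]/.test(text);
}

const tamil = "\u0bb6\u0bcd\u0bb0\u0bc0"; // first sequence from the report
console.log(tamil.length);             // 4 UTF-16 code units
console.log(Array.from(tamil).length); // 4 code points
console.log(hasSurrogates(tamil));     // false

const astral = "\uD83D\uDE00"; // U+1F600, outside the BMP
console.log(astral.length);             // 2 UTF-16 code units
console.log(Array.from(astral).length); // 1 code point
console.log(hasSurrogates(astral));     // true
```

So the Tamil groupings are several complete BMP characters displayed as one unit, while a surrogate pair is two code units that together encode a single character.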

Anyway, I guess that the MDN String.prototype.charAt() page should include some quotes from your talk (very interesting!) and a link to your page instead of the code that is there now.

(In reply to Mathias Bynens from comment #4)
Added a link to Mathias' blog post for now. Feel free to edit the wiki to add more advanced information.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED