When we have surrogate pair or unicode character from plan 1 to 16. We should process that two unicode code point as a single entity in 1. cursor movement (move forward and backward) 2. selection (select forward and backward) 3. delection (backspace and delete)
how to reproduce 1. visit http://bugzilla.mozilla.org/attachment.cgi?id=67079&action=view the source of this file is "𠀀𠀁𠀂𠀃" it show 8 ? mark because you don't have a font for that AND bug 122582. 2. selection forward problem: 2.a click before the first "?" 2.b select toward the right, expect result: select the first two "?" together before bug 122582 got fixed. actual result: it could select only the first "?" this is wrong because it only represent half of the character. 3. selection backward problem, try same as 2 but select from the last forward 4. cursor movement problem 4.1 select "File:Edit Page" to bring that page into composer 4.2 move your cursor by using <- or -> key expect result- move across two "?" every time you hit untill bug 122582 got fix actual result- move across only one "?", which mean only half of the unicode plane 1-16 character 5. deletion problem click "Backspace" or "delete" it should remove the whole character (two PRUnichar for plan 1-16) here are more information about plan 1-16 characters or ncr >= 𐀀 one plan 1-16 character is encoded by two PRUnichar (two 16 bits) this is call surrogate pair in unicode. The first PRUnichar is called high surrogate and is range from 0xD800-DBFF and the second PRUnichar is called low surroaget and is range from 0xDC00-0xDFFF in the layout or editor, the surrogate pair should be treat as one single character
I know the cursor movement and selection code is in nsTextFrame.cpp PeekOffset probably the case for case eSelectCharacter: Not sure about the delection code. Maybe we should file a seperate bug for delection and jfrancis can tell us where is that code.
This problem exists even where bug 122582 doesn't apply. Steps to reproduce: 1) open http://www.geocities.com/i18nguy/unicode-example-plane1.html 2) click next to one of the Gothic or Deseret characters in the table. 3) drag across the character. Expected result: the character will be selected Actual result: as the highlight moves across the character it first becomes ?? with one selected and one not, and then returns to the correct glyph. Similar behaviour with deletion.
should we fix this for GB18030 support?
Keywords: intl, nsbeta1
nsbeta1+ low priority
Assignee: smontagu → ftang
Keywords: nsbeta1 → nsbeta1+
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Priority: -- → P4
give to shanjian
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
reassign to ftang
Assignee: shanjian → ftang
it looks like the cursor movement code and selection code are in different place. but looking at the SUN_CTL may help us find all of them
Status: NEW → ASSIGNED
Created attachment 76119 [details] [diff] [review] patch fix editor cursor movement this fix the cursor movement issue with surrogate. We still need to fix 1. mouse selection issue 2. backsapce and detele key should remove the whole surrogate pair
Created attachment 76133 [details] [diff] [review] v2 patch patch handle editor cursor movement AND mouse selection.
Attachment #76119 - Attachment is obsolete: true
smontagu, can you review the current patch. I still need to talk to jfrancis about delete and backspace. But I may spin that off to a seperate bug.
Now why would they go and do a thing like that to unicode? Unicode was supposed to make things simpler, not harder. Yuck. Editor will have to change in a few places. And that's assuming we fix selection to guarantee that selection boundaries are never "between" one of these funky characters. You can file a bug against me for that. Are the 0xD800-DBFF and 0xDC00-0xDFFF unicode ranges used only for the pairs? In other words, could you encounter one such unicode character (say, 0xDC00) without the other half of the pair being around? Would that mean something different? Or is that just illegal? I want to avoid having to walk around the string to figure out what is going on.
>Now why would they go and do a thing like that to unicode? Unicode was supposed >to make things simpler, not harder. Yuck. Because 16 bits cannot encode all the characters people NEEDs and 32 bits is too BIG for most of the people. >Editor will have to change in a few places. And that's assuming we fix selection >to guarantee that selection boundaries are never "between" one of these funky >characters. >You can file a bug against me for that. I don't want to increase your work load. I rather fix that by ourself and ask for your review. >Are the 0xD800-DBFF and 0xDC00-0xDFFF unicode ranges used only for the pairs? Yes. They are presereved by Unicode in the very early day. DB800-DBff is called high surrogate and DC00-DFFF is called low surrogate. They have no other usage of these values except surrogate. >In other words, could you encounter one such unicode character (say, 0xDC00) without >the other half of the pair being around? Not in Unicode standard. If such thing happen, the result is undefined. >Would that mean something different? No >Or is that just illegal? YES >I want to avoid having to walk around the string to figure out what is going on.
Comment on attachment 76133 [details] [diff] [review] v2 patch I think you should use |mContentOffset| instead of |ip| throughout.
Created attachment 76276 [details] [diff] [review] v3 of patch
Attachment #76133 - Attachment is obsolete: true
Comment on attachment 76276 [details] [diff] [review] v3 of patch The patch as is doesn't build, because IS_LOW_SURROGATE is not defined. Assuming that you are aware of that and it is defined in another patch that you will be checking in before this one, r=smontagu
Attachment #76276 - Flags: review+
I think I will have to do the editor work involved. Deletion is the very hardest thing in the editor.
Comment on attachment 76276 [details] [diff] [review] v3 of patch email@example.com Assuming the file that defines IS_LOW_SURROGATE() macro (part of patch in bug 130441) is included in nsTextFrame.cpp.
Attachment #76276 - Flags: superreview+
Comment on attachment 76276 [details] [diff] [review] v3 of patch a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76276 - Flags: approval+
fixed and check in
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
*** Bug 131066 has been marked as a duplicate of this bug. ***
Yuying, please verify...thanks..changing qa_contact to ylong
QA Contact: sujay → ylong
On 03-27 trunk build / WinXP-SimpChinese: Fixed with cursor movement and selection. However, when press backspace or delete key to do deleting, only a half of one character is deleted, and the rest of part is showed "?". Re-open it for now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 76276 [details] [diff] [review] v3 of patch patch landed. bug still open. obsoleting patch so it doesn't look like an approved change waiting to land.
Attachment #76276 - Attachment is obsolete: true
I'm assuming this bug is only open for remaining editor work, so I'm taking ownership.
Assignee: ftang → jfrancis
Status: REOPENED → NEW
Clearing nsbeta1+ and moving to Mozilla1.01 for remaining work
Keywords: nsbeta1+ → nsbeta1
Target Milestone: mozilla1.0 → mozilla1.0.1
I agree the remaining issue are nsbeta1- bug.
The days of having a half dozen milestones out in front of us to divide bugs between seem to be gone, though I dont know why. Lumping everything together as far out as I can. I'll pull back things that I am working on as I go.
Target Milestone: mozilla1.0.1 → mozilla1.2beta
[ushing these out as far as bugzilla will let me. I'll pull them back as I work on them.
Target Milestone: mozilla1.2beta → mozilla1.4beta
batch: adding topembed per Gecko2 document http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
nsbeta1- for the remaining work.
Keywords: nsbeta1 → nsbeta1-
can we split the remaining work out into another bug and close this one?
jfrancis: pls resolve this one as fixed. ylong: can we split the remaining work out into another bug and close this one? thanks!
Whiteboard: adt3 → [adt3]
Sorry I forgot the add comment after I filed a bug 173541 a few days ago for the remain problem. Mark as fixed for this one.
Status: NEW → RESOLVED
Last Resolved: 16 years ago → 16 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.