Closed
Bug 122584
Opened 23 years ago
Closed 22 years ago
editor problem with surrogate pair
Categories
(Core :: DOM: Editor, defect, P4)
Tracking
()
VERIFIED
FIXED
mozilla1.4beta
People
(Reporter: ftang, Assigned: mozeditor)
References
Details
(Keywords: intl, topembed+, Whiteboard: [adt3])
Attachments
(3 obsolete files)
When we have surrogate pair or unicode character from plan 1 to 16. We should process that two unicode code point as a single entity in 1. cursor movement (move forward and backward) 2. selection (select forward and backward) 3. delection (backspace and delete)
Reporter | ||
Comment 1•23 years ago
|
||
how to reproduce 1. visit http://bugzilla.mozilla.org/attachment.cgi?id=67079&action=view the source of this file is "𠀀𠀁𠀂𠀃" it show 8 ? mark because you don't have a font for that AND bug 122582. 2. selection forward problem: 2.a click before the first "?" 2.b select toward the right, expect result: select the first two "?" together before bug 122582 got fixed. actual result: it could select only the first "?" this is wrong because it only represent half of the character. 3. selection backward problem, try same as 2 but select from the last forward 4. cursor movement problem 4.1 select "File:Edit Page" to bring that page into composer 4.2 move your cursor by using <- or -> key expect result- move across two "?" every time you hit untill bug 122582 got fix actual result- move across only one "?", which mean only half of the unicode plane 1-16 character 5. deletion problem click "Backspace" or "delete" it should remove the whole character (two PRUnichar for plan 1-16) here are more information about plan 1-16 characters or ncr >= 𐀀 one plan 1-16 character is encoded by two PRUnichar (two 16 bits) this is call surrogate pair in unicode. The first PRUnichar is called high surrogate and is range from 0xD800-DBFF and the second PRUnichar is called low surroaget and is range from 0xDC00-0xDFFF in the layout or editor, the surrogate pair should be treat as one single character
Reporter | ||
Comment 2•23 years ago
|
||
I know the cursor movement and selection code is in nsTextFrame.cpp PeekOffset probably the case for case eSelectCharacter: Not sure about the delection code. Maybe we should file a seperate bug for delection and jfrancis can tell us where is that code.
Comment 3•23 years ago
|
||
This problem exists even where bug 122582 doesn't apply. Steps to reproduce: 1) open http://www.geocities.com/i18nguy/unicode-example-plane1.html 2) click next to one of the Gothic or Deseret characters in the table. 3) drag across the character. Expected result: the character will be selected Actual result: as the highlight moves across the character it first becomes ?? with one selected and one not, and then returns to the correct glyph. Similar behaviour with deletion.
Reporter | ||
Comment 4•23 years ago
|
||
should we fix this for GB18030 support?
Reporter | ||
Comment 5•23 years ago
|
||
nsbeta1+ low priority
Reporter | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Reporter | ||
Updated•22 years ago
|
Whiteboard: adt3
Reporter | ||
Comment 9•22 years ago
|
||
it looks like the cursor movement code and selection code are in different place. but looking at the SUN_CTL may help us find all of them
Status: NEW → ASSIGNED
Reporter | ||
Comment 10•22 years ago
|
||
this fix the cursor movement issue with surrogate. We still need to fix 1. mouse selection issue 2. backsapce and detele key should remove the whole surrogate pair
Reporter | ||
Comment 11•22 years ago
|
||
patch handle editor cursor movement AND mouse selection.
Attachment #76119 -
Attachment is obsolete: true
Reporter | ||
Comment 12•22 years ago
|
||
smontagu, can you review the current patch. I still need to talk to jfrancis about delete and backspace. But I may spin that off to a seperate bug.
Assignee | ||
Comment 13•22 years ago
|
||
Now why would they go and do a thing like that to unicode? Unicode was supposed to make things simpler, not harder. Yuck. Editor will have to change in a few places. And that's assuming we fix selection to guarantee that selection boundaries are never "between" one of these funky characters. You can file a bug against me for that. Are the 0xD800-DBFF and 0xDC00-0xDFFF unicode ranges used only for the pairs? In other words, could you encounter one such unicode character (say, 0xDC00) without the other half of the pair being around? Would that mean something different? Or is that just illegal? I want to avoid having to walk around the string to figure out what is going on.
Reporter | ||
Comment 14•22 years ago
|
||
>Now why would they go and do a thing like that to unicode? Unicode was supposed >to make things simpler, not harder. Yuck. Because 16 bits cannot encode all the characters people NEEDs and 32 bits is too BIG for most of the people. >Editor will have to change in a few places. And that's assuming we fix selection >to guarantee that selection boundaries are never "between" one of these funky >characters. >You can file a bug against me for that. I don't want to increase your work load. I rather fix that by ourself and ask for your review. >Are the 0xD800-DBFF and 0xDC00-0xDFFF unicode ranges used only for the pairs? Yes. They are presereved by Unicode in the very early day. DB800-DBff is called high surrogate and DC00-DFFF is called low surrogate. They have no other usage of these values except surrogate. >In other words, could you encounter one such unicode character (say, 0xDC00) without >the other half of the pair being around? Not in Unicode standard. If such thing happen, the result is undefined. >Would that mean something different? No >Or is that just illegal? YES >I want to avoid having to walk around the string to figure out what is going on.
Comment 15•22 years ago
|
||
Comment on attachment 76133 [details] [diff] [review] v2 patch I think you should use |mContentOffset| instead of |ip[0]| throughout.
Reporter | ||
Comment 16•22 years ago
|
||
Attachment #76133 -
Attachment is obsolete: true
Comment 17•22 years ago
|
||
Comment on attachment 76276 [details] [diff] [review] v3 of patch The patch as is doesn't build, because IS_LOW_SURROGATE is not defined. Assuming that you are aware of that and it is defined in another patch that you will be checking in before this one, r=smontagu
Attachment #76276 -
Flags: review+
Assignee | ||
Comment 18•22 years ago
|
||
I think I will have to do the editor work involved. Deletion is the very hardest thing in the editor.
Comment 19•22 years ago
|
||
Comment on attachment 76276 [details] [diff] [review] v3 of patch sr=kin@netscape.com Assuming the file that defines IS_LOW_SURROGATE() macro (part of patch in bug 130441) is included in nsTextFrame.cpp.
Attachment #76276 -
Flags: superreview+
Comment 20•22 years ago
|
||
Comment on attachment 76276 [details] [diff] [review] v3 of patch a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76276 -
Flags: approval+
Reporter | ||
Comment 21•22 years ago
|
||
fixed and check in
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 22•22 years ago
|
||
*** Bug 131066 has been marked as a duplicate of this bug. ***
Comment 23•22 years ago
|
||
Yuying, please verify...thanks..changing qa_contact to ylong
QA Contact: sujay → ylong
Comment 24•22 years ago
|
||
On 03-27 trunk build / WinXP-SimpChinese: Fixed with cursor movement and selection. However, when press backspace or delete key to do deleting, only a half of one character is deleted, and the rest of part is showed "?". Re-open it for now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 25•22 years ago
|
||
Comment on attachment 76276 [details] [diff] [review] v3 of patch patch landed. bug still open. obsoleting patch so it doesn't look like an approved change waiting to land.
Attachment #76276 -
Attachment is obsolete: true
Assignee | ||
Comment 26•22 years ago
|
||
I'm assuming this bug is only open for remaining editor work, so I'm taking ownership.
Assignee: ftang → jfrancis
Status: REOPENED → NEW
Comment 27•22 years ago
|
||
Clearing nsbeta1+ and moving to Mozilla1.01 for remaining work
Updated•22 years ago
|
Reporter | ||
Comment 28•22 years ago
|
||
I agree the remaining issue are nsbeta1- bug.
Assignee | ||
Comment 29•22 years ago
|
||
The days of having a half dozen milestones out in front of us to divide bugs between seem to be gone, though I dont know why. Lumping everything together as far out as I can. I'll pull back things that I am working on as I go.
Target Milestone: mozilla1.0.1 → mozilla1.2beta
Updated•22 years ago
|
Assignee | ||
Comment 30•22 years ago
|
||
[ushing these out as far as bugzilla will let me. I'll pull them back as I work on them.
Target Milestone: mozilla1.2beta → mozilla1.4beta
Comment 31•22 years ago
|
||
batch: adding topembed per Gecko2 document http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed
Comment 32•22 years ago
|
||
nsbeta1- for the remaining work.
Comment 33•22 years ago
|
||
can we split the remaining work out into another bug and close this one?
Updated•22 years ago
|
Comment 34•22 years ago
|
||
jfrancis: pls resolve this one as fixed. ylong: can we split the remaining work out into another bug and close this one? thanks!
Whiteboard: adt3 → [adt3]
Comment 35•22 years ago
|
||
Sorry I forgot the add comment after I filed a bug 173541 a few days ago for the remain problem. Mark as fixed for this one.
Status: NEW → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•