editor problem with surrogate pair

VERIFIED FIXED in mozilla1.4beta

Status

()

Core
Editor
P4
normal
VERIFIED FIXED
17 years ago
16 years ago

People

(Reporter: Frank Tang, Assigned: Joe Francis)

Tracking

({intl, topembed+})

Trunk
mozilla1.4beta
x86
Windows 2000
intl, topembed+
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [adt3])

Attachments

(3 obsolete attachments)

(Reporter)

Description

17 years ago
When we have surrogate pair or unicode character from plan 1 to 16.
We should process that two unicode code point as a single entity in
1. cursor movement (move forward and backward)
2. selection (select forward and backward)
3. delection (backspace and delete)
(Reporter)

Comment 1

17 years ago
how to reproduce
1. visit http://bugzilla.mozilla.org/attachment.cgi?id=67079&action=view
the source of this file is
"𠀀𠀁𠀂𠀃"
it show 8 ? mark because you don't have a font for that AND bug 122582.
2. selection forward problem:
2.a click before the first "?"
2.b select toward the right, 
expect result:
select the first two "?" together before bug 122582 got fixed. 
actual result:
it could select only the first "?"
this is wrong because it only represent half of the character.
3. selection backward problem, try same as 2 but select from the last forward

4. cursor movement problem
4.1 select "File:Edit Page" to bring that page into composer
4.2 move your cursor  by using <- or -> key
expect result- move across two "?" every time you hit untill bug 122582 got fix
actual result- move across only one "?", which mean only half of the unicode
plane 1-16  character

5. deletion problem
click "Backspace" or "delete" it should remove the whole character (two
PRUnichar for plan 1-16)

here are more information about plan 1-16 characters or ncr >= &#x10000;
one plan 1-16 character is encoded by two PRUnichar (two 16 bits)
this is call surrogate pair in unicode.
The first PRUnichar is called high surrogate and is range from 0xD800-DBFF
and the second PRUnichar is called low surroaget and is range from 
0xDC00-0xDFFF

in the layout or editor, the surrogate pair should be treat as one single
character 
(Reporter)

Comment 2

17 years ago
I know the cursor movement and selection code is in
nsTextFrame.cpp PeekOffset
probably the case for
case eSelectCharacter:

Not sure about the delection code. Maybe we should file a seperate bug for
delection and jfrancis can tell us where is that code. 
This problem exists even where bug 122582 doesn't apply. Steps to reproduce:

1) open http://www.geocities.com/i18nguy/unicode-example-plane1.html
2) click next to one of the Gothic or Deseret characters in the table.
3) drag across the character.
Expected result: the character will be selected
Actual result: as the highlight moves across the character it first becomes ??
with one selected and one not, and then returns to the correct glyph.

Similar behaviour with deletion.
(Reporter)

Comment 4

16 years ago
should we fix this for GB18030 support?
Keywords: intl, nsbeta1
(Reporter)

Comment 5

16 years ago
nsbeta1+ low priority
Assignee: smontagu → ftang
Keywords: nsbeta1 → nsbeta1+
(Reporter)

Updated

16 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
(Reporter)

Comment 6

16 years ago
P4
Priority: -- → P4
(Reporter)

Comment 7

16 years ago
give to shanjian
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
(Reporter)

Updated

16 years ago
Whiteboard: adt3
(Reporter)

Comment 8

16 years ago
reassign to ftang
Assignee: shanjian → ftang
(Reporter)

Comment 9

16 years ago
it looks like the cursor movement code and selection code are in different
place. but looking at the SUN_CTL may help us find all of them
Status: NEW → ASSIGNED
(Reporter)

Comment 10

16 years ago
Created attachment 76119 [details] [diff] [review]
patch fix editor cursor movement

this fix the cursor movement issue with surrogate. We still need to fix
1. mouse selection issue
2. backsapce and detele key should remove the whole surrogate pair
(Reporter)

Comment 11

16 years ago
Created attachment 76133 [details] [diff] [review]
v2 patch

patch handle editor cursor movement AND mouse selection.
Attachment #76119 - Attachment is obsolete: true
(Reporter)

Comment 12

16 years ago
smontagu, can you review the current patch. I still need to talk to jfrancis
about delete and backspace. But I may spin that off to a seperate bug. 
(Assignee)

Comment 13

16 years ago
Now why would they go and do a thing like that to unicode?  Unicode was supposed 
to make things simpler, not harder.  Yuck.

Editor will have to change in a few places.  And that's assuming we fix selection 
to guarantee that selection boundaries are never "between" one of these funky 
characters.  

You can file a bug against me for that.

Are the 0xD800-DBFF and 0xDC00-0xDFFF unicode ranges used only for the pairs?  In 
other words, could you encounter one such unicode character (say, 0xDC00) without 
the other half of the pair being around?  Would that mean something different?  
Or is that just illegal?  I want to avoid having to walk around the string to 
figure out what is going on.
(Reporter)

Comment 14

16 years ago
>Now why would they go and do a thing like that to unicode?  Unicode was supposed 
>to make things simpler, not harder.  Yuck.
Because 16 bits cannot encode all the characters people NEEDs and 32 bits is too
BIG for most of the people. 

>Editor will have to change in a few places.  And that's assuming we fix selection 
>to guarantee that selection boundaries are never "between" one of these funky 
>characters.  
>You can file a bug against me for that.
I don't want to increase your work load. I rather fix that by ourself and ask
for your review.

>Are the 0xD800-DBFF and 0xDC00-0xDFFF unicode ranges used only for the pairs?  
Yes. They are presereved by Unicode in the very early day. DB800-DBff is called
high surrogate and DC00-DFFF is called low surrogate. They have no other usage
of these values except surrogate.

>In other words, could you encounter one such unicode character (say, 0xDC00)
without 
>the other half of the pair being around? 
Not in Unicode standard. If such thing happen, the result is undefined. 

>Would that mean something different?  
No

>Or is that just illegal?  
YES
>I want to avoid having to walk around the string to figure out what is going on.

Comment on attachment 76133 [details] [diff] [review]
v2 patch

I think you should use |mContentOffset| instead of |ip[0]| throughout.
(Reporter)

Comment 16

16 years ago
Created attachment 76276 [details] [diff] [review]
v3 of patch
Attachment #76133 - Attachment is obsolete: true
Comment on attachment 76276 [details] [diff] [review]
v3 of patch

The patch as is doesn't build, because IS_LOW_SURROGATE is not defined.
Assuming that you are aware of that and it is defined in another patch that you
will be checking in before this one, r=smontagu
Attachment #76276 - Flags: review+
(Assignee)

Comment 18

16 years ago
I think I will have to do the editor work involved.  Deletion is the very hardest 
thing in the editor.  

Comment 19

16 years ago
Comment on attachment 76276 [details] [diff] [review]
v3 of patch

sr=kin@netscape.com

Assuming the file that defines IS_LOW_SURROGATE() macro (part of patch in bug
130441) is included in nsTextFrame.cpp.
Attachment #76276 - Flags: superreview+
(Reporter)

Updated

16 years ago
Blocks: 104060

Comment 20

16 years ago
Comment on attachment 76276 [details] [diff] [review]
v3 of patch

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76276 - Flags: approval+
(Reporter)

Comment 21

16 years ago
fixed and check in 
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
(Reporter)

Comment 22

16 years ago
*** Bug 131066 has been marked as a duplicate of this bug. ***

Comment 23

16 years ago
Yuying, please verify...thanks..changing qa_contact to ylong
QA Contact: sujay → ylong

Comment 24

16 years ago
On 03-27 trunk build / WinXP-SimpChinese:
Fixed with cursor movement and selection.
However, when press backspace or delete key to do deleting, only a half of one
character is deleted, and the rest of part is showed "?". 

Re-open it for now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 25

16 years ago
Comment on attachment 76276 [details] [diff] [review]
v3 of patch

patch landed. bug still open. obsoleting patch so it doesn't look like an
approved change waiting to land.
Attachment #76276 - Attachment is obsolete: true
(Assignee)

Comment 26

16 years ago
I'm assuming this bug is only open for remaining editor work, so I'm taking 
ownership.  
Assignee: ftang → jfrancis
Status: REOPENED → NEW
Clearing nsbeta1+ and moving to Mozilla1.01 for remaining work
Keywords: nsbeta1+ → nsbeta1
Target Milestone: mozilla1.0 → mozilla1.0.1

Updated

16 years ago
Keywords: nsbeta1 → nsbeta1-
(Reporter)

Comment 28

16 years ago
I agree the remaining issue are nsbeta1- bug. 
(Reporter)

Updated

16 years ago
Blocks: 157673
(Assignee)

Comment 29

16 years ago
The days of having a half dozen milestones out in front of us to divide bugs 
between seem to be gone, though I dont know why.  Lumping everything together as 
far out as I can.  I'll pull back things that I am working on as I go.
Target Milestone: mozilla1.0.1 → mozilla1.2beta

Updated

16 years ago
Keywords: nsbeta1- → nsbeta1
(Assignee)

Comment 30

16 years ago
[ushing these out as far as bugzilla will let me.  I'll pull them back as I work
on them.
Target Milestone: mozilla1.2beta → mozilla1.4beta

Comment 31

16 years ago
batch: adding topembed per Gecko2 document
http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed
nsbeta1- for the remaining work.
Keywords: nsbeta1 → nsbeta1-

Comment 33

16 years ago
can we split the remaining work out into another bug and close this one?

Updated

16 years ago
Keywords: topembed → topembed+

Comment 34

16 years ago
jfrancis: pls resolve this one as fixed.

ylong: can we split the remaining work out into another bug and close this one?
thanks!
Whiteboard: adt3 → [adt3]

Comment 35

16 years ago
Sorry I forgot the add comment after I filed a bug 173541 a few days ago for the
remain problem.

Mark as fixed for this one.
Status: NEW → RESOLVED
Last Resolved: 16 years ago16 years ago
Resolution: --- → FIXED

Comment 36

16 years ago
Verified.
Status: RESOLVED → VERIFIED

Updated

16 years ago
Blocks: 176349
(Reporter)

Updated

16 years ago
Depends on: 180372
(Reporter)

Updated

16 years ago
No longer blocks: 157673
You need to log in before you can comment on or make changes to this bug.