Closed Bug 78039 Opened 23 years ago Closed 23 years ago

Copy-pasting Japanese (Unicode) characters to a Unicode-aware app doesn't work

Categories

(Core :: XUL, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

VERIFIED DUPLICATE of bug 10816
mozilla0.9.2

People

(Reporter: hsivonen, Assigned: nhottanscp)

Details

(Keywords: intl, Whiteboard: OSX+)

Build ID: 2001-04-18 FizzillaCFM

Steps to reproduce:
1) Load a UTF-8 page with some Japanese text, e.g.
http://www.w3.org/Graphics/SVG/Overview.htm8
2) Locate a piece of Japanese text on the page.
3) Copy Japanese text and some English text around it.
4) Paste to TextEdit

Actual results:
Only the Latin characters are pasted. The Japanese characters are lost.

Expected results:
Expected all characters to be pasted.
what is the os data flavor for unicode? i'm sure this won't work on macos9 
either.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
You need to put Mac-encoded Japanese on the clipboard, using the Unicode 
Converter to convert from Unicode to styled text. See the text drawing code for 
similar stuff, but note that you'll have to build the style scrap data too.
a-hah!

kScrapFlavorTypeUnicode = 'utxt'

we just need to add a mapping to this in the mimeMapper and it should just work.
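
A minimal sketch of what such a mapping could look like, written in Python rather than in the actual widget code. The MIME names ("text/plain", "text/unicode"), the table, and both helper functions are assumptions for illustration; only the 'utxt' and 'TEXT' tags come from this thread.

```python
# Hypothetical MIME-type -> Mac scrap flavor table (illustration only).

def four_char_code(tag: str) -> int:
    """Pack a four-character tag such as 'utxt' into a 32-bit OSType value."""
    raw = tag.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError("OSType tags must be exactly four bytes")
    return int.from_bytes(raw, "big")

# Assumed mapping: the MIME strings are placeholders, not confirmed names.
MIME_TO_FLAVOR = {
    "text/plain": four_char_code("TEXT"),
    "text/unicode": four_char_code("utxt"),  # kScrapFlavorTypeUnicode
}

def flavor_for_mime(mime: str):
    """Return the scrap flavor for a MIME type, or None if it is unmapped."""
    return MIME_TO_FLAVOR.get(mime)
```

An unmapped type simply yields None here; how the real mapper treats unknown types is not stated in this bug.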
Target Milestone: mozilla1.0 → mozilla0.9.2
This bug is not related only to japanese or only to unicode. Simply go to any
page in cyrillic, or open cyrillic in the editor or in mozilla mail and try to
copy it and paste in whichever editor you will, and the only thing that gets
pasted in is spaces, commas, colons and so on. As well as other ascii characters...
I'm sorry. I forgot to note that this bug report relates to Mac OS X. My comment
was about the regular Mac version (not Fizzilla) on MacOS 9.1, and it is the same for
MacOS 9.04.
this is a dupe of a bug nhotta has. the fix is a one-liner.
Assignee: pinkerton → nhotta
Status: ASSIGNED → NEW
one liner?
Status: NEW → ASSIGNED
Component: DOM to Text Conversion → XP Toolkit/Widgets
Keywords: intl
'utxt' support was checked in by bug 10816.
But there seem to be some remaining problems. I plan to look into them in 0.9.2.
Current status is that one can paste Unicode text (UTXT) *into* Mozilla (e.g.
from WorldText), but only ISO-8859-1 text is copied *from* Mozilla to
external apps. This could be because Mozilla copies both STYL and UTXT
but supports multilingual text on the UTXT layer only, not yet on the
STYL layer. External apps take STYL before UTXT from the clipboard,
hence the trouble.
Whiteboard: OSX+
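
The symptom described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the clipboard is a plain dict, 'styl' is left out entirely, the 'TEXT' flavor is assumed to be MacRoman, and characters that do not fit are assumed to be silently dropped. The function names are hypothetical.

```python
def copy_to_clipboard(text):
    """Model the copy side: 'TEXT' keeps only what MacRoman can hold."""
    return {
        # Assumption: unconvertible characters are dropped, not replaced.
        "TEXT": text.encode("mac_roman", errors="ignore"),
        "utxt": text.encode("utf-16-be"),
    }

def paste(clipboard, preference):
    """Model the paste side: take the first flavor in the app's preference list."""
    for flavor in preference:
        data = clipboard.get(flavor)
        if data is not None:
            codec = "mac_roman" if flavor == "TEXT" else "utf-16-be"
            return data.decode(codec)
    return ""

clip = copy_to_clipboard("Netscape ダウンロード")
legacy = paste(clip, ["TEXT", "utxt"])         # app that prefers 'TEXT'
unicode_first = paste(clip, ["utxt", "TEXT"])  # app that prefers 'utxt'
```

In this model the app that takes 'TEXT' first gets only the Latin part, while the app that takes 'utxt' first gets the full string.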
I tried today's Macintosh build on MacOS X (US).
* I went to http://home.netscape.com/ja/download/index.html, the page contains 
Japanese text and English (e.g. Netscape, Communicator).
* I copied characters from Mozilla and pasted to "TextEdit", the text pasted 
correctly (both Japanese and English).
* In "TextEdit", I selected menu "Format -> Make Plain Text". Then selected some 
characters (English and Japanese), pasted to Mozilla HTML composer. The text was 
pasted correctly (both English and Japanese).
* I repeated the same test for Korean, http://home.netscape.com/ko/download. 
Korean also worked, I could also copy/paste Korean and Japanese between the 
apps.

So plain text copy/paste is working by 'utxt' support.

Leif, do you have the text you tried for "WorldText" available somewhere on the 
web? I may try the same text on my machine to see if "TextEdit" also has the 
problem.
The problem is, or might be, *not* that Mozilla doesn't support UTXT (I have already confirmed some instances where UTXT is clearly working!!!). The real problem is that Mozilla incorrectly copies text containing ISO-8859-1 as both *wrongly formatted* styl TEXT and as Unicode text. It does this *only* when the copy contains text from the 8859-1 range. (And it creates problems because most apps prefer STYL TEXT over Unicode text.) Otherwise, if you select Japanese or Cyrillic character(s), they are copied and pasted as unformatted Unicode text.

The two flavours on the clipboard should be identical in content: there should be a multilingual styled TEXT layer and a multilingual UTXT layer. Since the Mozilla clipboard supports multilingualism *only* via 'UTXT' and not (yet) via 'STYL', it can only go wrong when it copies anything as *formatted TEXT*. That means that non-8859-1 text *must* be deleted. At least on the TEXT layers. In addition, Mac applications generally prefer STYL text over UTXT. Even WorldText does that. So, let us say that Mozilla puts incorrect info on the TEXT layer but correct Unicode text on the UTXT layer: the text will still be pasted into WorldText in the malformed version.

I cannot say for sure whether Mozilla puts correct info on the UTXT layer in the case I am speaking about, but if things work in TextEdit on OSX but not in WorldText on OS9.1, the only explanation can be that TextEdit on OSX prefers UTXT over the TEXT layer. But I can say for sure that disabling copying of text as formatted STYL TEXT should help us get better results.

You can find the WorldText program on the full-version CD of OS9.1. I can also send it to you if you wish.
I have 9.1 CD so I will try WorldText later. Not sure why it has a problem
because mozilla converts Latin1 to MacRoman for 'TEXT' and most of the
characters are supposed to be preserved.
Did not understand your comment... The problem is not that Latin1 is mistreated. Latin1 is preserved wonderfully. But in a text containing Cyrillic, it is not enough to preserve Latin1... The problem is in Mozilla: why does it copy as unformatted UTXT if you copy *just* Cyrillic text, but as both TEXT and UTXT if it copies something with 8859-1 text?

OK -- you are not wrong: I pasted content of www.syndod.com into an editor that prefers UTXT
before TEXT (PEPPER from <www.hekkelman.com>) with success.

So what have we then: are you going to stop copying to the TEXT layer since it is not being used anyway?
Or are you going to fix TEXT so it can also work with multilingual text and not confuse WorldText
and similar applications?
'TEXT' cannot hold more than one script, so Latin1 and Cyrillic cannot exist
together; this is not a limitation of Mozilla.
'TEXT' cannot be dropped since not everybody understands 'utxt'. Applications
which support 'utxt' should prefer 'utxt' over 'TEXT', those applications have
to fix it.
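
The one-script limit of an unstyled 'TEXT' flavor can be illustrated with Python's built-in Mac codecs. This is a sketch, not Mozilla code: the script names, the table, and the helper are hypothetical, and only two legacy encodings are checked.

```python
# Two single-script legacy encodings, used here as stand-ins for 'TEXT' data.
CANDIDATE_ENCODINGS = {"Roman": "mac_roman", "Cyrillic": "mac_cyrillic"}

def scripts_that_fit(text):
    """Return the scripts whose encoding can represent every character of text."""
    fits = []
    for script, codec in CANDIDATE_ENCODINGS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is outside this script
        fits.append(script)
    return fits
```

Accented Latin text fits only the Roman encoding and Cyrillic text fits only the Cyrillic one, so a string mixing the two fits neither. Plain ASCII fits both, which matches the reports above that only ASCII characters survive.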
Perhaps I have confused you with something, so let me quote
from Tomasz comment on bug 10816:
"but I guess it [WASTE-based apps] tries to read 'TEXT' with 'styl' first,
and  then if it is not present, it reads 'utxt' (which is without style 
information). If I may guess, MLTE does the same. Actually I think it 
should be done the other way: 'utxt' should be "preferred" since we do 
not know if 8-bit text carries correct, but this is not acceptable for 
WASTE an MLTE because you lose style information."

So, you are wrong: 'TEXT' with 'styl' is the hitherto standard method
for multilingual text on the Mac. And the text copied from Mozilla is,
according to WorldScript "formatted text", which one must assume
is the same as 'TEXT' with 'styl'. 

We both know that Mozilla does not support multilingualism via 'TEXT'
with 'styl' currently. So why does it copy text like that then?
And of course Mozilla should not put two different contents on the
clipboard: all the ASCII etc. on the TEXT layer and something else on the
UTXT layer. This is indeed Mozilla's problem.
You say that "'TEXT' cannot be dropped since not everybody 
understands 'utxt'."  Yes, and No. If we speak about copying of 
a 8859-1 text, current behaviour is OK. 


But for other texts, TEXT must not be copied until multilingualism 
via  "'Text' with 'styl'" is supported. The current
behaviour isn't of help to anyone.
I can understand that if both 'TEXT' and 'styl' exist then prefer 'TEXT' over
'utxt'. But only 'TEXT' and no 'styl' then 'utxt' may be preferred.

I do not agree with your last comment about not use 'TEXT'. It works fine if the
text is not multi scripts and the script of the text matches with the systems
script (I think this is a situation of many users).
 
You said: "But only 'TEXT' and no 'styl' then 'utxt' may be preferred." 
I say: Mozilla gives us 'TEXT' *with* 'styl' -- according to WorldText.
I do not know what could theoretically happen if you made sure that it
copied 'TEXT' only. But I don't think that you can expect Mozilla to
change the way Mac apps are handling text...

You said.: "I do not agree with your last comment about not use 'TEXT'. 
It works fine if the text is not multi scripts"
I say: This is wrong. Plain wrong. It only works correctly if the text belongs
to ISO-8859-1.  

You said: "...and the script of the text matches with the systems script"
I say: I thought we had agreed that the concept of "system script" does
not fit the Mac. This reminds me of a previous discussion... when Marina
claimed that it mattered whether I pasted into Russian or Latin Simple Text...

You said: "(I think this is a situation of many users)."
I say: Please tell me about one such experience then. Give me an example.
>Mozilla give use 'TEXT' *with* 'styl'  -- according to WorldText.
I installed WorldText, could you tell me how to check that?

Current mozilla support of 'TEXT' is based on a system script. I understand that
this does not work for everybody but we cannot drop its support for that reason.
The 'styl' support issue is filed separately (bug 79864).
Select "Show Clipboard" from the 'Edit' menu so the clipboard window
shows. Then you can read at the top of that window information
about the content of the clipboard.
 -
>Current mozilla support of 'TEXT' is based on a system script.
Once again, can't you give an example  of what you mean? 
Give an example for another OS if you don't know it from Mac...
 -
'TEXT' should not be dropped but expanded with fix for bug 79864.
I can understand if you keep the current behaviour until then, to
support "core user base" which need only 8859-1. That's an 
understandable argument/compromise, even if I don't like it.
But a better solution would be to make Mozilla skip
'TEXT' as soon as it detects non-8859-1 (or non-system-script,
should things really work that way...). Because there simply is
no use for 'TEXT' in those cases, just trouble... But that might not
be a "one liner"?
I tried WorldText "Show Clipboard". It says "styled text", not sure if this
means 'styl'. I found that if I copy a plain text from "TextEdit" then "Show
Clipboard" also says "styled text". And the plain text from "TextEdit" to
WorldText does not work for Japanese text (on US MacOS X). So this could be a
generic WorldText problem. Please try this combination (WorldText and TextEdit).


Example of system script. Japanese localized MacOS system script is smJapanese.
So mozilla can convert unicode to Shift_JIS without losing Japanese characters.
I was able to copy/paste Japanese text from mozilla to simpletext.
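
A rough sketch of the system-script idea in Python. The table and helper names are hypothetical, Python's shift_jis codec only approximates the Mac Japanese encoding, and dropping unconvertible characters is an assumption about the conversion.

```python
# Assumed system script -> 'TEXT' encoding table (illustration only).
SCRIPT_TO_CODEC = {
    "smRoman": "mac_roman",
    "smJapanese": "shift_jis",  # approximation of the Mac Japanese encoding
    "smCyrillic": "mac_cyrillic",
}

def to_text_flavor(text, system_script):
    """Convert Unicode text to 'TEXT' bytes in the system script's encoding."""
    codec = SCRIPT_TO_CODEC[system_script]
    return text.encode(codec, errors="ignore")  # unconvertible characters are dropped

def round_trips(text, system_script):
    """True if the text survives conversion to 'TEXT' and back unchanged."""
    codec = SCRIPT_TO_CODEC[system_script]
    return to_text_flavor(text, system_script).decode(codec) == text
```

With a Japanese system script the Japanese sample converts without loss; with a Roman system script the same text does not survive.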


>And the plain text from "TextEdit" to WorldText does not work for Japanese text
(on US MacOS X). 
Did you try Select All and then "Font substitution" from the 'Layout' menu?

>...Japanese localized MacOS system script is smJapanese.
>...I was able to copy/paste Japanese text from mozilla to simpletext.
Were you unable to copy/paste ISO-8859-1?
bugzilla-daemon@mozilla.org wrote:
> Example of system script. Japanese localized MacOS system script is smJapanese.
> So mozilla can convert unicode to Shift_JIS without losing Japanese characters.
> I was able to copy/paste Japanese text from mozilla to simpletext.

Simple question. I can copy and paste Arabic text (Windows 1256) from Win Mozilla 
to Word 2000. This is on Windows 2000 Pro Japanese. System script (locale) is 
Japanese, of course. Why can't I do the same thing with Mac Mozilla? Where is
Mozilla's cross-platform compatibility? 
Unicode clipboard is used for Windows2000 but not for Windows98. There are
limitations based on availability of unicode support by OS.
I mark this as a dup of bug 10816 ('utxt' support). The problem described in the 
original comment is fixed.

To Leif's questions,
* No, I did not try "Font substitution" but I think it doesn't matter for plain 
text.
* No, Japanese only, no Latin1. As I mentioned before, plain text cannot hold
more than one script.

*** This bug has been marked as a duplicate of 10816 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
* I agree that this is a duplicate of the older bug 10816
** If you did not try font substitution, please try before ruling it out. Though
I agree that WorldText might have bugs, like for instance reporting that
unstyled text is styled. And I also agree that WorldText should prefer
unformatted 'utxt' over unformatted 'text'. Not doing so is illogical and
perhaps even a bug.
*** As for Japanese vs Latin1: I did of course not think about a situation where
you copied *both* Japanese and Latin1 at the same time. This last note only to
save my already damaged reputation ;-)
I tried again with "Font substitution" checked but got the same result: could not
copy Japanese text correctly.

Ok. Perhaps it is WorldText's misinterpretation of the text as styled that is the
cause.
Btw, I have learned that if you copy e.g. [*only*] the CYRILLIC CAPITAL LETTER
PE, Mozilla puts a two-byte 'utxt' flavor on the clipboard (0x041F, as expected),
but also an *empty* 'TEXT' flavor. WASTE apps, for example, then
see the 'TEXT' flavor and display/paste that empty text instead. There are
*many* WASTE apps around. WASTE would however have no problem with this if it
preferred 'utxt' over 'text'. And fixing this (removing that empty(!) TEXT) would
probably only be an interim measure until 'text' with 'styl' is supported?

The empty text is a result of a conversion error.
Filed bug 83218 for fallback to question marks instead of skipping unconverted
characters.
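
The difference between skipping unconverted characters and falling back to question marks can be shown with Python's codec error handlers. This is an analogy, not the Mozilla converter: MacRoman is assumed as the 'TEXT' encoding, and the helper name is hypothetical.

```python
def to_text_flavor(text, on_error):
    """Convert to MacRoman 'TEXT' bytes using the given codec error handler."""
    return text.encode("mac_roman", errors=on_error)

pe = "\u041f"  # CYRILLIC CAPITAL LETTER PE

utxt = pe.encode("utf-16-be")             # two bytes: 0x04 0x1F
skipped = to_text_flavor(pe, "ignore")    # character dropped, flavor is empty
fallback = to_text_flavor(pe, "replace")  # character becomes a question mark
```

With the skipping behaviour the 'TEXT' flavor ends up empty, which matches the empty flavor seen by WASTE apps; with the fallback it would carry one "?" per unconverted character.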
qa_contact to John Morrison
QA Contact: sujay → jrgm
verified.
Status: RESOLVED → VERIFIED