Closed
Bug 78039
Opened 24 years ago
Closed 24 years ago
Copy-pasting Japanese (Unicode) characters to a Unicode-aware app doesn't work
Categories
(Core :: XUL, defect)
Tracking
()
mozilla0.9.2
People
(Reporter: hsivonen, Assigned: nhottanscp)
References
()
Details
(Keywords: intl, Whiteboard: OSX+)
Build ID: 2001-04-18 FizzillaCFM
Steps to reproduce:
1) Load a UTF-8 page with some Japanese text, e.g.
http://www.w3.org/Graphics/SVG/Overview.htm8
2) Locate a piece of Japanese text on the page.
3) Copy Japanese text and some English text around it.
4) Paste to TextEdit
Actual results:
Only the Latin characters are pasted. The Japanese characters are lost.
Expected results:
Expected all characters to be pasted.
Comment 1•24 years ago
what is the os data flavor for unicode? i'm sure this won't work on macos9
either.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Comment 2•24 years ago
You need to put Mac-encoded Japanese on the clipboard, using the Unicode
Converter to convert from Unicode to styled text. See the text drawing code for
similar stuff, but note that you'll have to build the style scrap data too.
Comment 3•24 years ago
a-hah!
kScrapFlavorTypeUnicode = 'utxt'
we just need to add a mapping to this in the mimeMapper and it should just work.
Target Milestone: mozilla1.0 → mozilla0.9.2
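For illustration, the mapping proposed in comment 3 can be sketched in Python. This is not Mozilla's actual mimeMapper code: the table, the function names and the MIME strings are assumptions made for this example. Only the 'utxt' and 'TEXT' flavor codes come from the discussion.

```python
# Hypothetical sketch of a MIME-type -> Mac scrap flavor table.
# Only the four-character codes 'utxt' and 'TEXT' come from the bug thread;
# the MIME strings and all names below are assumptions for illustration.

def four_char_code(tag: str) -> int:
    """Pack a four-character tag into a 32-bit big-endian integer."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a flavor tag must be exactly four ASCII characters")
    return int.from_bytes(raw, "big")


MIME_TO_FLAVOR = {
    "text/plain": four_char_code("TEXT"),    # single-script 8-bit text
    "text/unicode": four_char_code("utxt"),  # kScrapFlavorTypeUnicode
}


def flavor_for_mime(mime: str):
    """Return the flavor code for a MIME type, or None if it is unmapped."""
    return MIME_TO_FLAVOR.get(mime)


print(hex(flavor_for_mime("text/unicode")))
```

The point of the sketch is only that adding one table entry is enough to expose a Unicode flavor. Whether the real fix is that small is discussed in the following comments.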
Comment 4•24 years ago
This bug is not limited to Japanese or to Unicode. Simply go to any
page in Cyrillic, or open Cyrillic text in the editor or in Mozilla mail, and try to
copy it and paste it into whichever editor you like: the only things that get
pasted are spaces, commas, colons and so on, as well as other ASCII characters...
Comment 5•24 years ago
I'm sorry. I forgot to note that this bug report relates to Mac OS X. My comment
was about the regular Mac version (not Fizzilla) on MacOS 9.1, and it is the same for
MacOS 9.04.
Comment 6•24 years ago
this is a dupe of a bug nhotta has. the fix is a one-liner.
Assignee: pinkerton → nhotta
Status: ASSIGNED → NEW
Comment 7•24 years ago
one liner?
| Assignee | ||
Updated•24 years ago
| Assignee | ||
Comment 8•24 years ago
'utxt' support was checked in by bug 10816.
But there seem to be some remaining problems. I plan to look into them in 0.9.2.
Comment 9•24 years ago
Current status is that one can paste Unicode text (UTXT) *into* Mozilla (e.g.
from WorldText), but that only ISO-8859-1 text is copied *from* Mozilla to
external apps. This could be caused by the fact that Mozilla copies both STYL
and UTXT, but supports multilingualism on the UTXT layer only, not
yet on the STYL layer. External apps take STYL before UTXT from the clipboard,
hence the trouble.
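The effect described in this comment can be reproduced with a small Python model. It is a sketch under stated assumptions, not Mozilla or Mac OS code: the clipboard is a plain dictionary, 'TEXT' is assumed to be MacRoman with unconvertible characters dropped, 'utxt' is assumed to be big-endian UTF-16, and the 'styl' data is left out entirely.

```python
# Toy model of a clipboard carrying two flavors of the same selection.
# Assumptions: 'TEXT' is MacRoman with unconvertible characters dropped,
# 'utxt' is big-endian UTF-16. Style data ('styl') is not modelled.

def make_clipboard(text: str) -> dict:
    """Build the two flavors the way the thread suspects Mozilla does."""
    return {
        "TEXT": text.encode("mac_roman", errors="ignore"),
        "utxt": text.encode("utf-16-be"),
    }


def paste_text_first(clipboard: dict) -> str:
    """A consumer that reads the 8-bit flavor before the Unicode one."""
    if "TEXT" in clipboard:
        return clipboard["TEXT"].decode("mac_roman")
    return clipboard["utxt"].decode("utf-16-be")


def paste_unicode_first(clipboard: dict) -> str:
    """A consumer that reads the Unicode flavor before the 8-bit one."""
    if "utxt" in clipboard:
        return clipboard["utxt"].decode("utf-16-be")
    return clipboard["TEXT"].decode("mac_roman")


copied = "SVG 日本語 1.0"
clip = make_clipboard(copied)
print(repr(paste_text_first(clip)))     # Japanese characters are missing
print(repr(paste_unicode_first(clip)))  # full selection survives
```

Both consumers see the same clipboard. Only the order in which they look at the flavors decides whether the Japanese text survives.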
Updated•24 years ago
Whiteboard: OSX+
| Assignee | ||
Comment 10•24 years ago
I tried today's Macintosh build on MacOS X (US).
* I went to http://home.netscape.com/ja/download/index.html, the page contains
Japanese text and English (e.g. Netscape, Communicator).
* I copied characters from Mozilla and pasted to "TextEdit", the text pasted
correctly (both Japanese and English).
* In "TextEdit", I selected menu "Format -> Make Plain Text". Then selected some
characters (English and Japanese), pasted to Mozilla HTML composer. The text was
pasted correctly (both English and Japanese).
* I repeated the same test for Korean, http://home.netscape.com/ko/download.
Korean also worked, I could also copy/paste Korean and Japanese between the
apps.
So plain text copy/paste is working by 'utxt' support.
Leif, do you have the text you tried for "WorldText" available somewhere on the
web? I may try the same text on my machine to see if "TextEdit" also has the
problem.
Comment 11•24 years ago
The problem is, or might be, *not* that Mozilla doesn't support UTXT (I have already confirmed some instances where UTXT is clearly working!!!). The real problem is that Mozilla incorrectly copies text containing ISO-8859-1 as both *wrongly formatted* styled TEXT and as Unicode text. It does this *only* when the copy contains text from the 8859-1 range. (And it creates problems because most apps prefer STYL TEXT over Unicode text.) Otherwise, if you select Japanese or Cyrillic character(s), they are copied and pasted as unformatted Unicode text.
The two flavours on the clipboard should be identical in content: there should be a multilingual styled TEXT layer and a multilingual UTXT layer. Since the Mozilla clipboard supports multilingualism *only* via 'UTXT' and not (yet) via 'STYL', it can only go wrong when it copies anything as *formatted TEXT*. That means that non-8859-1 text *must* be deleted, at least on the TEXT layers. In addition, Mac applications generally prefer STYL text over UTXT. Even WorldText does that. So, let us say that Mozilla puts incorrect info on the TEXT layer but correct Unicode text on the UTXT layer: the text will still be pasted into WorldText in the malformed version.
I cannot say for sure whether Mozilla puts correct info on the UTXT layer in the case I am speaking about, but if things work in TextEdit on OSX but not in WorldText on OS9.1, the only explanation can be that TextEdit on OSX prefers UTXT over the TEXT layer. But I can say for sure that disabling copying of text as formatted STYL TEXT should help us get better results.
You find the WorldText program inside the full version CD of OS9.1. I can also send it to you if you wish.
| Assignee | ||
Comment 12•24 years ago
I have 9.1 CD so I will try WorldText later. Not sure why it has a problem
because mozilla converts Latin1 to MacRoman for 'TEXT' and most of the
characters are supposed to be preserved.
Comment 13•24 years ago
I did not understand your comment... The problem is not that Latin1 is mistreated. Latin1 is preserved wonderfully. But in a text containing Cyrillic, it is not enough to preserve Latin1... The problem is in Mozilla: why does it copy as unformatted UTXT if you copy *just* Cyrillic text, but as both TEXT and UTXT if it copies something with 8859-1 text?
Comment 14•24 years ago
OK -- you are not wrong: I pasted content of www.syndod.com into an editor that prefers UTXT
before TEXT (PEPPER from <www.hekkelman.com>) with success.
So what have we then: are you going to stop copying to the TEXT layer since it is not being used anyway?
Or are you going to fix TEXT so it can also work with multilingual text and not confuse WorldText
and similar applications?
| Assignee | ||
Comment 15•24 years ago
'TEXT' cannot hold more than one script, so Latin1 and Cyrillic cannot exist
together; this is not a limitation of Mozilla.
'TEXT' cannot be dropped since not everybody understands 'utxt'. Applications
which support 'utxt' should prefer 'utxt' over 'TEXT'; those applications have
to fix it.
Comment 16•24 years ago
Perhaps I have confused you with something, so let me quote
from Tomasz's comment on bug 10816:
"but I guess it [WASTE-based apps] tries to read 'TEXT' with 'styl' first,
and then if it is not present, it reads 'utxt' (which is without style
information). If I may guess, MLTE does the same. Actually I think it
should be done the other way: 'utxt' should be "preferred" since we do
not know if 8-bit text carries correct, but this is not acceptable for
WASTE and MLTE because you lose style information."
So, you are wrong: 'TEXT' with 'styl' is the hitherto standard method
for multilingual text on the Mac. And the text copied from Mozilla is,
according to WorldScript "formatted text", which one must assume
is the same as 'TEXT' with 'styl'.
We both know that Mozilla does not currently support multilingualism via 'TEXT'
with 'styl'. So why does it copy text like that then?
And of course Mozilla should not put two different contents on the
clipboard: all the ASCII etc. on the TEXT layer and something else on the
UTXT layer. This is indeed Mozilla's problem.
Comment 17•24 years ago
You say that "'TEXT' cannot be dropped since not everybody
understands 'utxt'." Yes, and No. If we speak about copying of
a 8859-1 text, current behaviour is OK.
But for other texts, TEXT must not be copied until multilingualism
via "'Text' with 'styl'" is supported. The current
behaviour isn't of help to anyone.
| Assignee | ||
Comment 18•24 years ago
I can understand that if both 'TEXT' and 'styl' exist then prefer 'TEXT' over
'utxt'. But only 'TEXT' and no 'styl' then 'utxt' may be preferred.
I do not agree with your last comment about not use 'TEXT'. It works fine if the
text is not multi scripts and the script of the text matches with the systems
script (I think this is a situation of many users).
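The preference order suggested in this comment can be written down as a short Python function. It is only a sketch of the rule as stated here (styled 'TEXT' wins, otherwise 'utxt', otherwise plain 'TEXT'). The function name and the use of plain strings for flavor names are assumptions, and no real application is claimed to behave this way.

```python
# Sketch of the flavor preference rule from the comment above.
# Flavor names are plain strings; the function name is hypothetical.

def choose_flavor(available):
    """Pick the flavor to paste from the set of flavors on the clipboard."""
    flavors = set(available)
    if {"TEXT", "styl"} <= flavors:
        return "TEXT"   # styled text: keep the style information
    if "utxt" in flavors:
        return "utxt"   # no style to lose, so take the Unicode text
    if "TEXT" in flavors:
        return "TEXT"   # last resort: unstyled 8-bit text
    return None         # nothing pasteable


print(choose_flavor(["TEXT", "styl", "utxt"]))
print(choose_flavor(["TEXT", "utxt"]))
```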
Comment 19•24 years ago
You said: "But only 'TEXT' and no 'styl' then 'utxt' may be preferred."
I say: Mozilla gives us 'TEXT' *with* 'styl' -- according to WorldText.
I do not know what could theoretically happen if you made sure that it
copied 'TEXT' only. But I don't think that you can expect Mozilla to
change the way Mac apps handle text...
You said: "I do not agree with your last comment about not use 'TEXT'.
It works fine if the text is not multi scripts"
I say: This is wrong. Plain wrong. It only works correctly if the text belongs
to ISO-8859-1.
You said: "...and the script of the text matches with the systems script"
I say: I thought we had agreed that the concept of "system script" does
not fit the Mac. This reminds me of a previous discussion... when Marina
claimed that it mattered whether I pasted into Russian or Latin SimpleText...
You said: "(I think this is a situation of many users)."
I say: Please tell me about one such experience then. Give me an example.
| Assignee | ||
Comment 20•24 years ago
>Mozilla give use 'TEXT' *with* 'styl' -- according to WorldText.
I installed WorldText, could you tell me how to check that?
Current mozilla support of 'TEXT' is based on a system script. I understand that
this does not work for everybody but we cannot drop its support for that reason.
'style' support issue is filed separately (bug 79864).
Comment 21•24 years ago
Select "Show Clipboard" from the 'Edit' menu so the clipboard window
shows. Then you can read at the top of that window information
about the content of the clipboard.
-
>Current mozilla support of 'TEXT' is based on a system script.
Once again, can't you give an example of what you mean?
Give an example for another OS if you don't know it from Mac...
-
'TEXT' should not be dropped but expanded with fix for bug 79864.
I can understand if you keep the current behaviour until then, to
support "core user base" which need only 8859-1. That's an
understandable argument/compromise, even if I don't like it.
But a better solution would be to make it so that Mozilla skips
'TEXT' as soon as it detects non-8859-1 (or non-system-script,
should things really work that way...). Because there simply is
no use for 'TEXT' in those cases, just trouble... But that might not
be a "one liner"?
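The alternative proposed in this comment (leave 'TEXT' off the clipboard when the selection cannot be represented in it) could look roughly like the following Python sketch. The function name, the dictionary clipboard and the choice of Python codecs as stand-ins for Mac script encodings are all assumptions for illustration.

```python
# Sketch: only publish an 8-bit 'TEXT' flavor when the whole selection
# converts cleanly. Names and codec choices are assumptions.

def build_flavors(text: str, script_encoding: str = "mac_roman") -> dict:
    """Return the flavors to put on the clipboard for a selection."""
    flavors = {"utxt": text.encode("utf-16-be")}  # always offered
    try:
        # Strict conversion: any unconvertible character raises.
        flavors["TEXT"] = text.encode(script_encoding)
    except UnicodeEncodeError:
        pass  # skip 'TEXT' rather than publish a lossy or empty version
    return flavors


print(sorted(build_flavors("Netscape")))
print(sorted(build_flavors("Привет")))
print(sorted(build_flavors("Привет", "mac_cyrillic")))
```

With this approach a consumer that prefers 'TEXT' falls back to 'utxt' when 'TEXT' is absent, at the cost of giving nothing to applications that understand only 'TEXT'.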
| Assignee | ||
Comment 22•24 years ago
I tried WorldText "Show Clipboard". It says "styled text", not sure if this
means 'styl'. I found that if I copy a plain text from "TextEdit" then "Show
Clipboard" also says "styled text". And the plain text from "TextEdit" to
WorldText does not work for Japanese text (on US MacOS X). So this could be a
generic WorldText problem. Please try this combination (WorldText and TextEdit).
Example of system script. Japanese localized MacOS system script is smJapanese.
So mozilla can convert unicode to Shift_JIS without losing Japanese characters.
I was able to copy/paste Japanese text from mozilla to simpletext.
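The system-script example can be illustrated with Python's standard codecs. This is a sketch, not the Mac OS conversion path: the table below is hypothetical, and Python's "shift_jis" and "mac_roman" codecs are used only as approximations of the encodings tied to smJapanese and smRoman.

```python
# Sketch: the 8-bit 'TEXT' conversion depends on the system script.
# The table and the codec choices are assumptions for illustration.

SCRIPT_ENCODINGS = {
    "smRoman": "mac_roman",
    "smJapanese": "shift_jis",
}


def to_text_flavor(text: str, system_script: str) -> bytes:
    """Convert to the system script's encoding, dropping what does not fit."""
    return text.encode(SCRIPT_ENCODINGS[system_script], errors="ignore")


japanese = "日本語"
sjis = to_text_flavor(japanese, "smJapanese")
roman = to_text_flavor(japanese, "smRoman")

print(sjis.decode("shift_jis") == japanese)  # lossless on a Japanese system
print(len(roman))                            # nothing survives on a Roman one
```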
Comment 23•24 years ago
>And the plain text from "TextEdit" to WorldText does not work for Japanese text
(on US MacOS X).
Did you try select all and then "Font substitution" from 'Layout' menu?
>...Japanese localized MacOS system script is smJapanese.
>...I was able to copy/paste Japanese text from mozilla to simpletext.
Were you unable to copy/paste ISO-8859-1?
Comment 24•24 years ago
bugzilla-daemon@mozilla.org wrote:
> Example of system script. Japanese localized MacOS system script is smJapanese.
> So mozilla can convert unicode to Shift_JIS without losing Japanese characters.
> I was able to copy/paste Japanese text from mozilla to simpletext.
Simple question. I can copy and paste Arabic text (Windows 1256) from Win Mozilla
to Word 2000. This is on Windows 2000 Pro Japanese. System script (locale) is
Japanese, of course. Why I cannot do the same thing with Mac Mozilla? Where is
Mozilla's cross-platform compatibility?
| Assignee | ||
Comment 25•24 years ago
Unicode clipboard is used for Windows2000 but not for Windows98. There are
limitations based on availability of unicode support by OS.
| Assignee | ||
Comment 26•24 years ago
I mark this as a dup of bug 10816 ('utxt' support). The problem described in the
original comment is fixed.
To Leif's questions,
* No, I did not try "Font substitution" but I think it doesn't matter for plain
text.
* No, Japanese only, no Latin1. As I mentioned before, plain text cannot hold
more than one script.
*** This bug has been marked as a duplicate of 10816 ***
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Comment 27•24 years ago
* I agree that this is a duplicate of the older bug 10816
** If you did not try font substitution, please try before ruling it out. Though
I agree that WorldText might have bugs, like for instance reporting that
unstyled text is styled. And I also agree that WorldText should prefer
unformatted 'utxt' over unformatted 'text'. Not doing so is illogical and
perhaps even a bug.
*** As for Japanese vs Latin1: I did of course not think about a situation where
you copied *both* Japanese and Latin1 at the same time. This last note is only to
save my already damaged reputation ;-)
| Assignee | ||
Comment 28•24 years ago
I tried again with "Font substitution" checked but the same result, could not
copy Japanese text correctly.
Comment 29•24 years ago
Ok. Perhaps it is WorldText's misinterpretation of the text as styled that is the
cause.
Btw, I have learned that if you copy e.g. [*only*] the CYRILLIC CAPITAL LETTER
PE, Mozilla puts a two-byte 'utxt' flavor on the clipboard (0x041F, as expected), but
Mozilla also puts an *empty* 'TEXT' flavor on the clipboard. WASTE apps, for example, then
see the 'TEXT' flavor and display/paste that empty text instead. There are
*many* WASTE apps around... WASTE would however have no problem with this if it
preferred 'utxt' over 'text'. And fixing this (removing that empty(!) TEXT) would
probably only be an interim measure until 'text' with 'styl' is supported?
| Assignee | ||
Comment 30•24 years ago
The empty text is a result of a conversion error.
Filed bug 83218 for fallback to question marks instead of skipping unconverted
characters.
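The difference between skipping unconverted characters and falling back to question marks can be shown with Python's codec error handlers. This is a sketch only: the function names are hypothetical, MacRoman is assumed as the target encoding, and big-endian UTF-16 is assumed for 'utxt'. It says nothing about how the fix for bug 83218 is actually implemented.

```python
# Sketch: skipping vs. question-mark fallback for unconvertible characters.
# Function names and the MacRoman target are assumptions for illustration.

def convert_skipping(text: str, encoding: str = "mac_roman") -> bytes:
    """Drop characters the target encoding cannot represent."""
    return text.encode(encoding, errors="ignore")


def convert_with_fallback(text: str, encoding: str = "mac_roman") -> bytes:
    """Replace characters the target encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace")


pe = "\u041f"  # CYRILLIC CAPITAL LETTER PE

print(pe.encode("utf-16-be"))             # the two-byte 'utxt' payload
print(convert_skipping(pe))               # empty 'TEXT', as reported
print(convert_with_fallback(pe))          # one question mark instead
print(convert_with_fallback("A" + pe + "B"))
```

With the fallback, a consumer that prefers 'TEXT' at least sees that something was there, instead of pasting nothing.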