Closed
Bug 78039
Opened 23 years ago
Closed 23 years ago
Copy-pasting Japanese (Unicode) characters to a Unicode-aware app doesn't work
Categories
(Core :: XUL, defect)
Tracking
()
mozilla0.9.2
People
(Reporter: hsivonen, Assigned: nhottanscp)
References
()
Details
(Keywords: intl, Whiteboard: OSX+)
Build ID: 2001-04-18 FizzillaCFM

Steps to reproduce:
1) Load a UTF-8 page with some Japanese text, e.g. http://www.w3.org/Graphics/SVG/Overview.htm8
2) Locate a piece of Japanese text on the page.
3) Copy Japanese text and some English text around it.
4) Paste to TextEdit

Actual results: Only the Latin characters are pasted. The Japanese characters are lost.

Expected results: Expected all characters to be pasted.
Comment 1•23 years ago
what is the os data flavor for unicode? i'm sure this won't work on macos9 either.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Comment 2•23 years ago
You need to put Mac-encoded Japanese on the clipboard, using the Unicode Converter to convert from Unicode to styled text. See the text drawing code for similar stuff, but note that you'll have to build the style scrap data too.
Comment 3•23 years ago
a-hah! kScrapFlavorTypeUnicode = 'utxt' we just need to add a mapping to this in the mimeMapper and it should just work.
Target Milestone: mozilla1.0 → mozilla0.9.2
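A minimal sketch of the kind of mapping Comment 3 describes, assuming the mapper is essentially a lookup from MIME type to Mac OS scrap flavor. The names `MIME_TO_FLAVOR` and `flavor_for_mime` and the MIME strings are illustrative assumptions, not the real mimeMapper code; only the four-character codes 'utxt' and 'TEXT' come from this bug.

```python
# Hypothetical lookup table; the real mimeMapper is not shown in this bug.
MIME_TO_FLAVOR = {
    "text/plain": "TEXT",    # plain text in a platform (script-specific) encoding
    "text/unicode": "utxt",  # kScrapFlavorTypeUnicode
}


def flavor_for_mime(mime_type):
    """Return the four-character scrap flavor for a MIME type, or None if unmapped."""
    return MIME_TO_FLAVOR.get(mime_type)
```

With a table like this, adding Unicode clipboard support on the mapping side really would be a one-entry change, which may be what "one-liner" refers to later in the thread.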
Comment 4•23 years ago
This bug is not related only to Japanese or only to Unicode. Simply go to any page in Cyrillic, or open Cyrillic text in the editor or in Mozilla mail, and try to copy it and paste it into whichever editor you like: the only things that get pasted are spaces, commas, colons and so on, as well as other ASCII characters...
Comment 5•23 years ago
I'm sorry. I forgot to note that this bug report relates to Mac OS X. My comment was about the regular Mac version (not Fizzilla) on MacOS 9.1, and it is the same for MacOS 9.04.
Comment 6•23 years ago
this is a dupe of a bug nhotta has. the fix is a one-liner.
Assignee: pinkerton → nhotta
Status: ASSIGNED → NEW
Comment 7•23 years ago
one liner?
Assignee
Comment 8•23 years ago
'utxt' support was checked in for bug 10816, but there seem to be some remaining problems. I plan to look into them in 0.9.2.
Comment 9•23 years ago
Current status is that one can paste unicode text (UTXT) *into* Mozilla (e.g. from WorldText), but only ISO-8859-1 text is copied *from* Mozilla to external apps. This could be because Mozilla copies both STYL and UTXT but supports multilingualism on the UTXT layer only, not yet on the STYL layer. External apps take STYL before UTXT from the clipboard, hence the trouble.
Updated•23 years ago
Whiteboard: OSX+
Assignee
Comment 10•23 years ago
I tried today's Macintosh build on MacOS X (US).
* I went to http://home.netscape.com/ja/download/index.html; the page contains Japanese text and English (e.g. Netscape, Communicator).
* I copied characters from Mozilla and pasted them into "TextEdit"; the text pasted correctly (both Japanese and English).
* In "TextEdit", I selected the menu item "Format -> Make Plain Text", then selected some characters (English and Japanese) and pasted them into Mozilla HTML composer. The text was pasted correctly (both English and Japanese).
* I repeated the same test for Korean, http://home.netscape.com/ko/download. Korean also worked; I could also copy/paste Korean and Japanese between the apps.
So plain text copy/paste is working through 'utxt' support.
Leif, do you have the text you tried for "WorldText" available somewhere on the web? I may try the same text on my machine to see if "TextEdit" also has the problem.
Comment 11•23 years ago
The problem is or might be *not* that Mozilla doesn't support UTXT (I have already confirmed some instances when UTXT is clearly working!!!). The real problem is that Mozilla incorrectly copies text containing ISO-8859-1 as both *wrongly formatted* styl TEXT and as unicode text. It does this *only* when the copy contains text from the 8859-1 range. (And it creates problems because most apps prefer STYL TEXT over unicode text.) Otherwise, if you select Japanese or Cyrillic character(s), they are copied and pasted as unformatted unicode text.

The two flavours on the clipboard should be identical in content: there should be a multilingual styled TEXT layer and a multilingual UTXT layer. Since the Mozilla clipboard supports multilingualism *only* via 'UTXT' and not (yet) via 'STYL', it can only go wrong when it copies anything as *formatted TEXT*. That means that non-8859-1 text *must* be deleted. At least on the TEXT layers.

In addition, Mac applications generally prefer STYL text over UTXT. Even WorldText does that. So, let us say that Mozilla puts incorrect info on the TEXT layer but correct unicode text on the UTXT layer: the text will still be pasted into WorldText in the malformed version.

I cannot say for sure whether Mozilla puts correct info on the UTXT layer or not in the case I am speaking about, but if things work in TextEdit on OSX but not in WorldText on OS9.1, the only answer to that can be that TextEdit on OSX prefers UTXT over the TEXT layer. But I can say for sure that disabling copying of text as formatted STYL TEXT should help us get better results.

You will find the WorldText program on the full-version CD of OS9.1. I can also send it to you if you wish.
Assignee
Comment 12•23 years ago
I have 9.1 CD so I will try WorldText later. Not sure why it has a problem because mozilla converts Latin1 to MacRoman for 'TEXT' and most of the characters are supposed to be preserved.
Comment 13•23 years ago
Did not understand your comment... The problem is not that Latin1 is mistreated. Latin1 is preserved wonderfully. But in a text containing Cyrillic, it is not enough to preserve Latin1... The problem is in Mozilla: why does it copy as unformatted UTXT if you copy *just* Cyrillic text, but as both TEXT and UTXT if it copies something with 8859-1 text?
Comment 14•23 years ago
OK -- you are not wrong: I pasted content of www.syndod.com into an editor that prefers UTXT before TEXT (PEPPER from <www.hekkelman.com>) with success. So what have we then: are you going to stop copying to the TEXT layer since it is not being used anyway? Or are you going to fix TEXT so it can also work with multilingual text and not confuse WorldText and similar applications?
Assignee
Comment 15•23 years ago
'TEXT' cannot hold more than one script, so Latin1 and Cyrillic cannot exist together; this is not a limitation of Mozilla. 'TEXT' cannot be dropped since not everybody understands 'utxt'. Applications which support 'utxt' should prefer 'utxt' over 'TEXT'; those applications have to fix it.
Comment 16•23 years ago
Perhaps I have confused you with something, so let me quote from Tomasz's comment on bug 10816:

"but I guess it [WASTE-based apps] tries to read 'TEXT' with 'styl' first, and then if it is not present, it reads 'utxt' (which is without style information). If I may guess, MLTE does the same. Actually I think it should be done the other way: 'utxt' should be "preferred" since we do not know if 8-bit text carries correct, but this is not acceptable for WASTE an MLTE because you lose style information."

So, you are wrong: 'TEXT' with 'styl' is the hitherto standard method for multilingual text on the Mac. And the text copied from Mozilla is, according to WorldScript, "formatted text", which one must assume is the same as 'TEXT' with 'styl'. We both know that Mozilla does not currently support multilingualism via 'TEXT' with 'styl'. So why does it copy text like that then? And of course Mozilla should not put two different contents on the clipboard: all the ASCII etc. on the TEXT layer and something else on the UTXT layer. This is indeed Mozilla's problem.
Comment 17•23 years ago
You say that "'TEXT' cannot be dropped since not everybody understands 'utxt'." Yes, and no. If we speak about copying an 8859-1 text, the current behaviour is OK. But for other texts, TEXT must not be copied until multilingualism via "'TEXT' with 'styl'" is supported. The current behaviour isn't of help to anyone.
Assignee
Comment 18•23 years ago
I can understand that if both 'TEXT' and 'styl' exist then prefer 'TEXT' over 'utxt'. But only 'TEXT' and no 'styl' then 'utxt' may be preferred. I do not agree with your last comment about not use 'TEXT'. It works fine if the text is not multi scripts and the script of the text matches with the systems script (I think this is a situation of many users).
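A minimal sketch of the paste-side preference order proposed in Comment 18, assuming the receiving application can see which flavors are on the clipboard as a set of four-character codes. The function name `choose_flavor` is hypothetical; no application discussed in this bug is known to implement exactly this.

```python
def choose_flavor(available):
    """Pick the clipboard flavor to paste, given the set of flavors present."""
    if "TEXT" in available and "styl" in available:
        return "TEXT"  # styled text keeps its formatting
    if "utxt" in available:
        return "utxt"  # Unicode beats unstyled, single-script TEXT
    if "TEXT" in available:
        return "TEXT"  # last resort: plain text in some script encoding
    return None
```

Under this order, an application offered {'TEXT', 'utxt'} with no 'styl' would take 'utxt', so multi-script text would survive the paste.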
Comment 19•23 years ago
You said: "But only 'TEXT' and no 'styl' then 'utxt' may be preferred."
I say: Mozilla give use 'TEXT' *with* 'styl' -- according to WorldText. I do not know what could theoretically happen if you made sure that it copied 'TEXT' only. But I don't think that you can expect Mozilla to change the way Mac apps are handling text...

You said: "I do not agree with your last comment about not use 'TEXT'. It works fine if the text is not multi scripts"
I say: This is wrong. Plain wrong. It only works correctly if the text belongs to ISO-8859-1.

You said: "...and the script of the text matches with the systems script"
I say: I thought we had agreed that the concept of "system script" does not fit the Mac. This reminds me of a previous discussion... when Marina claimed that it mattered whether I pasted into Russian or Latin SimpleText...

You said: "(I think this is a situation of many users)."
I say: Please tell me about one such experience then. Give me an example.
Assignee
Comment 20•23 years ago
>Mozilla give use 'TEXT' *with* 'styl' -- according to WorldText.
I installed WorldText; could you tell me how to check that?
Current mozilla support of 'TEXT' is based on a system script. I understand that this does not work for everybody but we cannot drop its support for that reason. The 'styl' support issue is filed separately (bug 79864).
Comment 21•23 years ago
Select "Show Clipboard" from the 'Edit' menu so the clipboard window shows. Then you can read information about the content of the clipboard at the top of that window.
-
>Current mozilla support of 'TEXT' is based on a system script.
Once again, can't you give an example of what you mean? Give an example for another OS if you don't know it from Mac...
-
'TEXT' should not be dropped but expanded with the fix for bug 79864. I can understand if you keep the current behaviour until then, to support the "core user base" which needs only 8859-1. That's an understandable argument/compromise, even if I don't like it. But a better solution would be to make Mozilla skip 'TEXT' as soon as it detects non-8859-1 (or non-system-script, should things really work that way...). Because there simply is no use for 'TEXT' in those cases, just trouble... But that might not be a "one liner"?
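A minimal sketch of the "skip 'TEXT'" idea from Comment 21, assuming the copy side builds a dictionary of flavors and that 'utxt' holds big-endian UTF-16. The function name `flavors_to_copy` and the `system_encoding` parameter are hypothetical, and Python's `mac_roman` codec only stands in for the real Mac OS conversion.

```python
def flavors_to_copy(text, system_encoding="mac_roman"):
    """Build clipboard flavors, omitting 'TEXT' when it cannot hold the text."""
    flavors = {"utxt": text.encode("utf-16-be")}  # always offer Unicode
    try:
        flavors["TEXT"] = text.encode(system_encoding)
    except UnicodeEncodeError:
        # Some character is outside the system encoding: offer no 'TEXT'
        # at all rather than a lossy one that other apps would prefer.
        pass
    return flavors
```

Applications that only understand 'TEXT' would then get nothing to paste in the multi-script case, which is the trade-off the assignee objects to.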
Assignee
Comment 22•23 years ago
I tried WorldText "Show Clipboard". It says "styled text", not sure if this means 'styl'. I found that if I copy a plain text from "TextEdit" then "Show Clipboard" also says "styled text". And the plain text from "TextEdit" to WorldText does not work for Japanese text (on US MacOS X). So this could be a generic WorldText problem. Please try this combination (WorldText and TextEdit). Example of system script. Japanese localized MacOS system script is smJapanese. So mozilla can convert unicode to Shift_JIS without losing Japanese characters. I was able to copy/paste Japanese text from mozilla to simpletext.
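A rough sketch of the system-script behaviour described in Comment 22, assuming each system script implies one target encoding for 'TEXT'. The table `SCRIPT_TO_CODEC` and the function `survives_text_flavor` are hypothetical, and Python's stock codecs are used as approximations of the Mac OS encodings (the real Mac Japanese encoding is a Shift_JIS variant).

```python
# Hypothetical mapping from Script Manager script names to Python codecs.
SCRIPT_TO_CODEC = {
    "smRoman": "mac_roman",
    "smJapanese": "shift_jis",
    "smCyrillic": "mac_cyrillic",
}


def survives_text_flavor(text, system_script):
    """Return True if text round-trips through the system script's encoding."""
    codec = SCRIPT_TO_CODEC[system_script]
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False
```

On this model Japanese text survives on a Japanese system but not on a Roman one, which matches the assignee's SimpleText result and the reporter's loss of Japanese on a US system.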
Comment 23•23 years ago
>And the plain text from "TextEdit" to WorldText does not work for Japanese text (on US MacOS X).
Did you try Select All and then "Font substitution" from the 'Layout' menu?
>...Japanese localized MacOS system script is smJapanese.
>...I was able to copy/paste Japanese text from mozilla to simpletext.
Were you unable to copy/paste ISO-8859-1?
Comment 24•23 years ago
bugzilla-daemon@mozilla.org wrote:
> Example of system script. Japanese localized MacOS system script is smJapanese.
> So mozilla can convert unicode to Shift_JIS without losing Japanese characters.
> I was able to copy/paste Japanese text from mozilla to simpletext.
Simple question. I can copy and paste Arabic text (Windows 1256) from Win Mozilla to Word 2000. This is on Windows 2000 Pro Japanese. System script (locale) is Japanese, of course. Why can't I do the same thing with Mac Mozilla? Where is Mozilla's cross-platform compatibility?
Assignee
Comment 25•23 years ago
Unicode clipboard is used for Windows2000 but not for Windows98. There are limitations based on availability of unicode support by OS.
Assignee
Comment 26•23 years ago
I mark this as a dup of bug 10816 ('utxt' support). The problem described in the original comment is fixed.
To Leif's questions,
* No, I did not try "Font substitution" but I think it doesn't matter for plain text.
* No, Japanese only, no Latin1. As I mentioned before, plain text cannot hold more than one script.
*** This bug has been marked as a duplicate of 10816 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Comment 27•23 years ago
* I agree that this is a duplicate of the older bug 10816.
** If you did not try font substitution, please try it before ruling it out. Though I agree that WorldText might have bugs, for instance reporting that unstyled text is styled. And I also agree that WorldText should prefer unformatted 'utxt' over unformatted 'text'. Not doing so is illogical and perhaps even a bug.
*** As for Japanese vs Latin1: I did of course not think about a situation where you copied *both* Japanese and Latin1 at the same time. This last note is only to save my already damaged reputation ;-)
Assignee
Comment 28•23 years ago
I tried again with "Font substitution" checked but the same result, could not copy Japanese text correctly.
Comment 29•23 years ago
Ok. Perhaps it is WorldText's misinterpretation of the text as styled that is the cause. Btw, I have learned that if you copy e.g. [*only*] the CYRILLIC CAPITAL LETTER PE, Mozilla puts a two-byte 'utxt' flavor on the clipboard (0x041F, as expected), but it also puts an *empty* 'TEXT' flavor on the clipboard. WASTE apps, for example, then see the 'TEXT' flavor and display/paste that empty text instead. There are *many* WASTE apps around. WASTE would however have no problem with this if it preferred 'utxt' over 'text'. And a fix for this (removing that empty(!) TEXT) would probably only be an interim measure until 'text' with 'styl' is supported?
Assignee
Comment 30•23 years ago
The empty text is a result of a conversion error. Filed bug 83218 for fallback to question marks instead of skipping unconverted characters.
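A small sketch of the difference between skipping unconverted characters and falling back to question marks, as discussed in Comments 29 and 30. It uses Python's `mac_roman` codec as a stand-in for the Mac OS converter; the function name `to_text_flavor` and its `fallback` parameter are hypothetical and are not taken from the fix in bug 83218.

```python
def to_text_flavor(text, codec="mac_roman", fallback=True):
    """Encode text for 'TEXT', replacing or dropping unconvertible characters."""
    # "replace" substitutes '?' for each unconvertible character;
    # "ignore" silently drops it, which can leave an empty result.
    return text.encode(codec, errors="replace" if fallback else "ignore")
```

Copying only CYRILLIC CAPITAL LETTER PE with `fallback=False` yields an empty byte string, like the empty 'TEXT' flavor reported above; with the fallback the receiving application at least sees a single question mark.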