Closed Bug 78039 Opened 24 years ago Closed 24 years ago

Copy-pasting Japanese (Unicode) characters to a Unicode-aware app doesn't work

Categories

(Core :: XUL, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

VERIFIED DUPLICATE of bug 10816
mozilla0.9.2

People

(Reporter: hsivonen, Assigned: nhottanscp)

Details

(Keywords: intl, Whiteboard: OSX+)

Build ID: 2001-04-18 FizzillaCFM
Steps to reproduce:
1) Load a UTF-8 page with some Japanese text, e.g. http://www.w3.org/Graphics/SVG/Overview.htm8
2) Locate a piece of Japanese text on the page.
3) Copy Japanese text and some English text around it.
4) Paste to TextEdit.
Actual results: Only the Latin characters are pasted. The Japanese characters are lost.
Expected results: Expected all characters to be pasted.
what is the os data flavor for unicode? i'm sure this won't work on macos9 either.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
You need to put Mac-encoded Japanese on the clipboard, using the Unicode Converter to convert from Unicode to styled text. See the text drawing code for similar stuff, but note that you'll have to build the style scrap data too.
a-hah! kScrapFlavorTypeUnicode = 'utxt' we just need to add a mapping to this in the mimeMapper and it should just work.
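As a rough illustration of the mapping idea in the comment above, here is a minimal Python sketch of a MIME-type-to-flavor table. It is not Mozilla's actual mimeMapper code: the helper names (four_char_code, MIME_TO_FLAVOR, flavor_for_mime) and the MIME type strings are assumptions made for the example. Only the 'utxt' and 'TEXT' four-character codes come from this thread.

```python
# Hypothetical sketch of a MIME-type-to-scrap-flavor table.
# This is NOT Mozilla's mimeMapper; all names here are invented.


def four_char_code(tag: str) -> int:
    """Pack a four-character ASCII tag into a 32-bit big-endian value."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("flavor tags are exactly four ASCII characters")
    return int.from_bytes(raw, "big")


# Assumed MIME type names; the flavor tags are the ones named in the thread.
MIME_TO_FLAVOR = {
    "text/plain": four_char_code("TEXT"),
    "text/unicode": four_char_code("utxt"),
}


def flavor_for_mime(mime_type: str):
    """Return the flavor code for a MIME type, or None if it is unmapped."""
    return MIME_TO_FLAVOR.get(mime_type)
```

With a table like this, supporting a new flavor amounts to adding one entry, which is presumably why the fix was expected to be small.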
Target Milestone: mozilla1.0 → mozilla0.9.2
This bug is not limited to Japanese or to Unicode. Simply go to any page in Cyrillic, or open Cyrillic text in the editor or in Mozilla mail, and try to copy it and paste it into any editor you like: the only things that get pasted are spaces, commas, colons and so on, as well as other ASCII characters...
I'm sorry. I forgot to note that this bug report relates to Mac OS X. My comment was about the regular Mac version (not Fizzilla) on MacOS 9.1, and it is the same for MacOS 9.04.
this is a dupe of a bug nhotta has. the fix is a one-liner.
Assignee: pinkerton → nhotta
Status: ASSIGNED → NEW
one liner?
Status: NEW → ASSIGNED
Component: DOM to Text Conversion → XP Toolkit/Widgets
Keywords: intl
'utxt' support was checked in by bug 10816. But there seem to be some remaining problems. I plan to look into them in 0.9.2.
Current status is that one can paste Unicode text (UTXT) *into* Mozilla (e.g. from WorldText), but that only ISO-8859-1 text is copied *from* Mozilla to external apps. This could be because Mozilla copies both STYL and UTXT but supports multilingualism on the UTXT layer only, not yet on the STYL layer. External apps take STYL before UTXT from the clipboard. Hence the trouble.
Whiteboard: OSX+
I tried today's Macintosh build on MacOS X (US).
* I went to http://home.netscape.com/ja/download/index.html; the page contains Japanese text and English (e.g. Netscape, Communicator).
* I copied characters from Mozilla and pasted them into "TextEdit"; the text pasted correctly (both Japanese and English).
* In "TextEdit", I selected the menu "Format -> Make Plain Text", then selected some characters (English and Japanese) and pasted them into Mozilla HTML composer. The text was pasted correctly (both English and Japanese).
* I repeated the same test for Korean, http://home.netscape.com/ko/download. Korean also worked; I could also copy/paste Korean and Japanese between the apps.
So plain text copy/paste is working thanks to 'utxt' support. Leif, do you have the text you tried for "WorldText" available somewhere on the web? I may try the same text on my machine to see if "TextEdit" also has the problem.
The problem is, or might be, *not* that Mozilla doesn't support UTXT (I have already confirmed some instances where UTXT is clearly working!!!). The real problem is that Mozilla incorrectly copies text containing ISO-8859-1 as both *wrongly formatted* styl TEXT and as Unicode text. It does this *only* when the copy contains text from the 8859-1 range. (And it creates problems because most apps prefer STYL TEXT over Unicode text.) Otherwise, if you select Japanese or Cyrillic character(s), they are copied and pasted as unformatted Unicode text.
The two flavours on the clipboard should be identical in content: there should be a multilingual styled TEXT layer and a multilingual UTXT layer. Since the Mozilla clipboard supports multilingualism *only* via 'UTXT' and not (yet) via 'STYL', it can only go wrong when it copies anything as *formatted TEXT*. That means that non-8859-1 text *must* get deleted, at least on the TEXT layers.
In addition, Mac applications generally prefer STYL text over UTXT. Even WorldText does that. So, let us say that Mozilla puts incorrect info on the TEXT layer but correct Unicode text on the UTXT layer: the text will still be pasted into WorldText in the malformed version. I cannot say for sure whether Mozilla puts correct info on the UTXT layer in the case I am speaking about, but if things work in TextEdit on OSX but not in WorldText on OS9.1, the only explanation can be that TextEdit on OSX prefers UTXT over the TEXT layer. But I can say for sure that disabling copying of text as formatted STYL TEXT should help us get better results.
You can find the WorldText program on the full version CD of OS9.1. I can also send it to you if you wish.
I have 9.1 CD so I will try WorldText later. Not sure why it has a problem because mozilla converts Latin1 to MacRoman for 'TEXT' and most of the characters are supposed to be preserved.
I did not understand your comment... The problem is not that Latin1 is mistreated. Latin1 is preserved wonderfully. But in a text containing Cyrillic, it is not enough to preserve Latin1... The problem is in Mozilla: why does it copy as unformatted UTXT if you copy *just* Cyrillic text, but as both TEXT and UTXT if it copies something with 8859-1 text?
OK -- you are not wrong: I pasted content of www.syndod.com into an editor that prefers UTXT before TEXT (PEPPER from <www.hekkelman.com>) with success. So what have we then: are you going to stop copying to the TEXT layer since it is not being used anyway? Or are you going to fix TEXT so it can also work with multilingual text and not confuse WorldText and similar applications?
'TEXT' cannot hold more than one script, so Latin1 and Cyrillic cannot exist together; this is not a limitation of Mozilla. 'TEXT' cannot be dropped since not everybody understands 'utxt'. Applications which support 'utxt' should prefer 'utxt' over 'TEXT'; those applications have to fix it.
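The single-script limit described above can be illustrated with a small Python sketch. Python's codec names (mac_roman, mac_cyrillic, shift_jis) are used here only as stand-ins for the classic Mac OS script encodings, and the function name encodings_that_fit is made up for the example; none of this is Mozilla code.

```python
# Illustration only: Python codecs stand in for Mac OS script encodings.
CANDIDATE_ENCODINGS = ["mac_roman", "mac_cyrillic", "shift_jis"]


def encodings_that_fit(text: str) -> list:
    """Return the candidate single-script encodings that can hold `text`
    without losing any character."""
    fits = []
    for name in CANDIDATE_ENCODINGS:
        try:
            text.encode(name)  # strict mode: raises on any unmappable character
        except UnicodeEncodeError:
            continue
        fits.append(name)
    return fits
```

For instance, a string mixing "café" and "Привет" fits none of the three candidates, which is the Latin1-plus-Cyrillic situation discussed in this thread. Only a Unicode flavor such as 'utxt' can carry it intact.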
Perhaps I have confused you with something, so let me quote from Tomasz's comment on bug 10816: "but I guess it [WASTE-based apps] tries to read 'TEXT' with 'styl' first, and then if it is not present, it reads 'utxt' (which is without style information). If I may guess, MLTE does the same. Actually I think it should be done the other way: 'utxt' should be "preferred" since we do not know if 8-bit text carries correct, but this is not acceptable for WASTE an MLTE because you lose style information." So, you are wrong: 'TEXT' with 'styl' is the hitherto standard method for multilingual text on the Mac. And the text copied from Mozilla is, according to WorldText, "formatted text", which one must assume is the same as 'TEXT' with 'styl'. We both know that Mozilla does not currently support multilingualism via 'TEXT' with 'styl'. So why does it copy text like that, then? And of course Mozilla should not put two different contents on the clipboard: all the ASCII etc. on the TEXT layer and something else on the UTXT layer. This is indeed Mozilla's problem.
You say that "'TEXT' cannot be dropped since not everybody understands 'utxt'." Yes, and No. If we speak about copying of an 8859-1 text, the current behaviour is OK. But for other texts, TEXT must not be copied until multilingualism via "'TEXT' with 'styl'" is supported. The current behaviour isn't of help to anyone.
I can understand preferring 'TEXT' over 'utxt' if both 'TEXT' and 'styl' exist. But only 'TEXT' and no 'styl' then 'utxt' may be preferred. I do not agree with your last comment about not use 'TEXT'. It works fine if the text is not multi scripts and the script of the text matches with the systems script (I think this is a situation of many users).
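The preference rule proposed in the comment above could be sketched as follows, from the receiving application's side. This is a hypothetical illustration only: the clipboard is modeled as a plain dict keyed by flavor name, and choose_flavor is an invented helper, not an API of WASTE, MLTE, WorldText or Mozilla.

```python
from typing import Dict, Optional


def choose_flavor(clipboard: Dict[str, bytes]) -> Optional[str]:
    """Pick which text flavor a pasting application would read.

    Rule sketched from the comment above:
      1. 'TEXT' accompanied by 'styl' wins (style information is kept).
      2. Otherwise prefer 'utxt' if it is present.
      3. Otherwise fall back to bare 'TEXT'.
    """
    has_text = "TEXT" in clipboard
    if has_text and "styl" in clipboard:
        return "TEXT"
    if "utxt" in clipboard:
        return "utxt"
    if has_text:
        return "TEXT"
    return None  # no text flavor on the clipboard
```

Under this rule, a clipboard holding an unstyled 'TEXT' next to 'utxt' would be pasted from 'utxt', so a lossy 'TEXT' conversion would only matter to applications that do not understand 'utxt' at all.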
You said: "But only 'TEXT' and no 'styl' then 'utxt' may be preferred."
I say: Mozilla gives us 'TEXT' *with* 'styl' -- according to WorldText. I do not know what could theoretically happen if you made sure that it copied 'TEXT' only. But I don't think that you can expect Mozilla to change the way Mac apps handle text...
You said: "I do not agree with your last comment about not use 'TEXT'. It works fine if the text is not multi scripts"
I say: This is wrong. Plain wrong. It only works correctly if the text belongs to ISO-8859-1.
You said: "...and the script of the text matches with the systems script"
I say: I thought we had agreed that the concept of "system script" does not fit the Mac. This reminds me of a previous discussion... when Marina claimed that it mattered whether I pasted into Russian or Latin SimpleText...
You said: "(I think this is a situation of many users)."
I say: Please tell me about one such experience then. Give me an example.
>Mozilla gives us 'TEXT' *with* 'styl' -- according to WorldText.
I installed WorldText, could you tell me how to check that? Current mozilla support of 'TEXT' is based on a system script. I understand that this does not work for everybody but we cannot drop its support for that reason. 'style' support issue is filed separately (bug 79864).
Select "Show Clipboard" from the 'Edit' menu so the clipboard window shows. Then you can read information about the content of the clipboard at the top of that window.
>Current mozilla support of 'TEXT' is based on a system script.
Once again, can't you give an example of what you mean? Give an example for another OS if you don't know it from Mac...
'TEXT' should not be dropped but expanded with the fix for bug 79864. I can understand if you keep the current behaviour until then, to support a "core user base" which needs only 8859-1. That's an understandable argument/compromise, even if I don't like it. But a better solution would be to make Mozilla skip 'TEXT' as soon as it detects non-8859-1 (or non-system-script, should things really work that way...). Because there simply is no use for 'TEXT' in those cases, just trouble... But that might not be a "one liner"?
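The suggestion to skip 'TEXT' when it cannot represent the copied text could look roughly like this on the copying side. It is a sketch under stated assumptions, not Mozilla's clipboard code: build_flavors is an invented name, the clipboard is a plain dict, Python's mac_roman codec stands in for the system-script conversion, and 'utxt' is assumed to be big-endian UTF-16 as on PowerPC.

```python
def build_flavors(text: str, system_encoding: str = "mac_roman") -> dict:
    """Build the set of clipboard flavors for `text`.

    'utxt' is always offered. 'TEXT' is offered only when every character
    survives conversion to the (assumed) system-script encoding, so a
    receiving app never sees a truncated or empty 'TEXT'.
    """
    flavors = {"utxt": text.encode("utf-16-be")}  # assumed byte order
    try:
        flavors["TEXT"] = text.encode(system_encoding)
    except UnicodeEncodeError:
        pass  # lossy conversion: leave 'TEXT' off the clipboard entirely
    return flavors
```

The trade-off raised elsewhere in this thread still applies: an application that understands only 'TEXT' would then receive nothing at all instead of a partial paste.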
I tried WorldText "Show Clipboard". It says "styled text"; I am not sure if this means 'styl'. I found that if I copy a plain text from "TextEdit" then "Show Clipboard" also says "styled text". And the plain text from "TextEdit" to WorldText does not work for Japanese text (on US MacOS X). So this could be a generic WorldText problem. Please try this combination (WorldText and TextEdit).
Example of system script. Japanese localized MacOS system script is smJapanese. So mozilla can convert unicode to Shift_JIS without losing Japanese characters. I was able to copy/paste Japanese text from mozilla to simpletext.
>And the plain text from "TextEdit" to WorldText does not work for Japanese text (on US MacOS X).
Did you try Select All and then "Font substitution" from the 'Layout' menu?
>...Japanese localized MacOS system script is smJapanese.
>...I was able to copy/paste Japanese text from mozilla to simpletext.
Were you unable to copy/paste ISO-8859-1?
bugzilla-daemon@mozilla.org wrote:
> Example of system script. Japanese localized MacOS system script is smJapanese.
> So mozilla can convert unicode to Shift_JIS without losing Japanese characters.
> I was able to copy/paste Japanese text from mozilla to simpletext.
Simple question. I can copy and paste Arabic text (Windows 1256) from Win Mozilla to Word 2000. This is on Windows 2000 Pro Japanese. System script (locale) is Japanese, of course. Why can't I do the same thing with Mac Mozilla? Where is Mozilla's cross-platform compatibility?
Unicode clipboard is used for Windows2000 but not for Windows98. There are limitations based on availability of unicode support by OS.
I mark this as a dup of bug 10816 ('utxt' support). The problem described in the original comment is fixed. To Leif's questions,
* No, I did not try "Font substitution" but I think it doesn't matter for plain text.
* No, Japanese only, no Latin1. As I mentioned before, plain text cannot hold more than two scripts.
*** This bug has been marked as a duplicate of 10816 ***
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
* I agree that this is a duplicate of the older bug 10816.
** If you did not try font substitution, please try it before ruling it out. Though I agree that WorldText might have bugs, like for instance reporting unstyled text as styled. And I also agree that WorldText should prefer unformatted 'utxt' over unformatted 'text'. Not doing so is illogical and perhaps even a bug.
*** As for Japanese vs Latin1: I did of course not think about a situation where you copied *both* Japanese and Latin1 at the same time. This last note is only to save my already damaged reputation ;-)
I tried again with "Font substitution" checked but got the same result: I could not copy Japanese text correctly.
Ok. Perhaps it is WorldText's misinterpretation of the text as styled that is the cause. Btw, I have learned that if you copy e.g. *only* the CYRILLIC CAPITAL LETTER PE, Mozilla puts a two-byte 'utxt' flavor on the clipboard (0x041F, as expected), but Mozilla also puts an *empty* 'TEXT' flavor on the clipboard. WASTE apps, for example, then see the 'TEXT' flavor and display/paste that empty text instead. There are *many* WASTE apps around... WASTE would however have no problem with this if it preferred 'utxt' over 'text'. And fixing this (removing that empty(!) TEXT) would probably only be an interim measure until 'text' with 'styl' is supported?
The empty text is the result of a conversion error. Filed bug 83218 for fallback to question marks instead of skipping unconverted characters.
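The difference between skipping unconverted characters and falling back to question marks can be shown with Python's standard codec error handlers. This is only an analogy for the behavior discussed here, not the Mac Unicode Converter or Mozilla's code; to_text_flavor is an invented name and mac_roman is assumed as the target encoding.

```python
def to_text_flavor(text: str, encoding: str = "mac_roman",
                   on_error: str = "replace") -> bytes:
    """Convert Unicode text to a single-script 8-bit 'TEXT' payload.

    on_error="ignore"  drops characters the encoding cannot represent.
    on_error="replace" substitutes a question mark for each of them.
    """
    if on_error not in ("ignore", "replace"):
        raise ValueError("on_error must be 'ignore' or 'replace'")
    return text.encode(encoding, errors=on_error)


# CYRILLIC CAPITAL LETTER PE, the character from the comment above.
pe = "\u041f"
utxt_payload = pe.encode("utf-16-be")           # two bytes: 0x04 0x1F
skipped = to_text_flavor(pe, on_error="ignore")    # empty payload
fallback = to_text_flavor(pe, on_error="replace")  # a single "?"
```

With the "ignore" behavior a lone U+041F yields an empty payload, matching the empty 'TEXT' flavor reported above. With "replace" the receiving application at least gets a visible placeholder per lost character.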
qa_contact to John Morrison
QA Contact: sujay → jrgm
verified.
Status: RESOLVED → VERIFIED