Closed
Bug 333064
Opened 19 years ago
Closed 9 years ago
Copying CJK text inserts unwanted spaces
Categories
(Core :: DOM: Serializers, defect)
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: musiphil, Assigned: jorgk-bmo)
References
Details
Attachments
(5 files, 9 obsolete files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
When I select and copy some text, some gratuitous spaces are also copied.
Reproducible: Always
Steps to Reproduce:
1. Visit the above URL
2. Copy the entire text (by Ctrl+A Ctrl+C, for example)
3. Open any text editor and paste the copied text
Actual Results:
[The first words of each line]
수 숫단
소 녀는
그 러나
Expected Results:
수숫단
소녀는
그러나
Reporter | ||
Comment 1•19 years ago
|
||
The Korean characters got converted to numeric character references (&#nnnnn;) while posting, but still you get the idea.
Comment 2•19 years ago
|
||
Not quite sure what component this belongs in, but I'll guess DOM to Text for starters. Apparently it's a result of using align="justify", since if I remove that from the first paragraph, that one copies correctly while the remaining justified ones get an extra space.
Assignee: nobody → dom-to-text
Component: General → DOM to Text Conversion
Product: Firefox → Core
QA Contact: general
Version: unspecified → Trunk
Updated•15 years ago
|
Assignee: dom-to-text → nobody
QA Contact: dom-to-text
Reporter | ||
Comment 3•14 years ago
|
||
substituting the broken link for the example
Reporter | ||
Updated•14 years ago
|
Reporter | ||
Comment 4•14 years ago
|
||
Can anybody confirm that this is still a problem in Firefox 4, or that this has been fixed? It still persists in Firefox 3.6.16. I don't know why it has stayed UNCONFIRMED so far. Did I have to do something more?
Is it the same bug as bug 673667 ?
Reporter | ||
Comment 6•13 years ago
|
||
(In reply to Fanolian from comment #5)
> Is it the same bug as bug 673667 ?
I believe so. Thank you for reporting this.
Comment 8•12 years ago
|
||
Confirmed on latest Nightly 21 on Win 7. Seems that CJK characters are effected but pasting to pure-text editors have no problem.
(Pasting to MS Word 2013 with "Keep Text Only" option gives correct result as well)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 9•12 years ago
|
||
I grabbed the "text/html" clipboard target using this tool https://github.com/intuited/clipcli
The result is here http://people.mozilla.com/~kchen/333064.html
You can find that we added extra newlines in the paragraph. For Latin scripts this has visually no difference because HTML treat newlines as whitespaces. But for CJK scripts this will cause extra whitespace in between characters.
Comment 10•12 years ago
|
||
Can assign this bug to me? I want to give it a try!
Updated•12 years ago
|
Assignee: nobody → arvin0731
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment 13•10 years ago
|
||
Pase the copied rich text from Firefox to an HTML editor that doesn't change the source and then view the source (eg. any HTML element with contenteditable=true), we can see that there is a line break inserted between every 73 Chinese characters, which seems like a feature of Gecko. Hope the arbitary line break can be removed, or at least add an option to allow disabling it, since the space inserted contributes a significant semantic change for Chinese text.
Comment 14•10 years ago
|
||
Comment 15•10 years ago
|
||
I think the text get wrapped here:
http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLContentSerializer.cpp#175-204
And 72 is the default wrap column:
http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLContentSerializer.cpp#116-121
Comment 16•10 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #15)
> I think the text get wrapped here:
>
>
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#175-204
>
> And 72 is the default wrap column:
>
>
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#116-121
Thank you very much... Could we add a config setting to overwrite the behavior?
Comment 17•9 years ago
|
||
I can reproduce the test case in bug 673667 comment 0. There are vairous comments in this bug talking about line breaks in HTML source code being rendered as white space between CJK characters. That is NOT a bug and certainly not this bug.
Comment 18•9 years ago
|
||
Comment 19•9 years ago
|
||
mMaxColumn = 72 was there before we switch from CVS to Hg!
https://hg.mozilla.org/mozilla-central/annotate/dfff608ebad0/content/base/src/nsHTMLContentSerializer.cpp#l117
Comment 20•9 years ago
|
||
XMLSerializer and node.innerHTML/outerHTML are not affected.
Comment 21•9 years ago
|
||
Bug 524975 seems to be describing the same issue. I am not sure if we should dup.
See Also: → 524975
Assignee | ||
Comment 22•9 years ago
|
||
Looking at bug 524975 and looking at what that bug landed:
http://hg.mozilla.org/mozilla-central/rev/4ae32b827a7f#l1.24
we could possibly use the new serialiser flag introduced in bug 1225864:
OutputDisallowLineBreaking.
See Also: → 606747
I am chiming in with my two cent worth.
With FF 45.x, I see unwanted spaces inserted into text when copy&paste operation is performed for Japanese text.
(I suspect this problem manifests for other Asian languages such as Chinese, Korean and Thai. I say "Thai" because I noticed the Bug 606747 - add extra spaces when copy/paste thai language.)
Here I could create a very clear-cut test case where you can try the
copy&paste of the Japanese text and see where the unintended/unwanted spaces are inserted. (See attached). This is more or less from a real text and not as simple as Jorg's sample.
The unintended/unwanted spaces are very embarrassing and problematic
in business/formal document and not good for business processing where web interface is often used these days!
The problem was first noticed in Thunderbird HTML message composition, but found to be DOM-related (I think).
See https://groups.google.com/forum/#!topic/mozilla.dev.apps.thunderbird/6juKnWvrm-Y
BTW, this problem happens only with PASTE *WITH* FORMATTING, and not with paste *without* formatting.
TIA
Assignee | ||
Comment 24•9 years ago
|
||
(In reply to ISHIKAWA, Chiaki from comment #23)
> In the pasted text, you will note some extra spaces inserted. Look carefully.
Well, my Korean text is simpler, you don't have to look carefully, it tells you where the spaces are inserted ;-)
(In reply to Jorg K (GMT+2) from comment #24)
> (In reply to ISHIKAWA, Chiaki from comment #23)
> > In the pasted text, you will note some extra spaces inserted. Look carefully.
> Well, my Korean text is simpler, you don't have to look carefully, it tells
> you where the spaces are inserted ;-)
Yes, I noticed that :-) I just wanted to show that the space(s) do creep in and may not be so visible in real world document when one is in a rush, but on the receiving side, when one reads a document carefully, it *IS* noticeable and the reader may think the writer/sender is a careless writer (oh well)!
[I am embarrassed that there is a typo: "embarassing". ]
At least, I made sure that my HTML page passed W3C's validator with flying colours!
Thank you again for looking into this issue.
Assignee | ||
Comment 26•9 years ago
|
||
Hmm, I played around with the encoder flags here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.cpp#102
removing OutputPreformatted (since it makes little sense together with OutputRaw, see:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsIDocumentEncoder.idl#76
(XXXbz How does this interact with OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
and also added OutputDisallowLineBreaking as suggested in comment #22. Nothing helped.
Needs more investigation, starting here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#199
(mWrapColumn = 72;)
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#1126
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXHTMLContentSerializer.cpp#76
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#119
Looks like the wrap after 72 characters is deeply engrained ;-(
Most likely we'll need to tweak one of these:
https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
mozilla/dom/base/nsXMLContentSerializer.cpp
119 mMaxColumn = 72;
122 mMaxColumn = aWrapColumn;
1345 mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
1476 if (mDoWrap && mColPos + 1 >= mMaxColumn) {
1552 } while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
1739 if (mColPos >= mMaxColumn) {
Note that since we have OutputRaw, mDoWrap is false:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLContentSerializer.cpp#114
Comment 27•9 years ago
|
||
My other question would be why do we have a "wrap at 72 char" logic at first place? I don't see any spec on HTML/XML serialization mentioning the magic number 72 and/or like "the serialized DOM *may* be wrapped at 72 char" or something.
I don't have access to CVS history to see why it was added either.
Comment 28•9 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #26)
> Hmm, I played around with the encoder flags here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.
> cpp#102
> removing OutputPreformatted (since it makes little sense together with
> OutputRaw, see:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsIDocumentEncoder.idl#76
> (XXXbz How does this interact with
> OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
> and also added OutputDisallowLineBreaking as suggested in comment #22.
> Nothing helped.
>
> Needs more investigation, starting here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#199
> (mWrapColumn = 72;)
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#1126
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXHTMLContentSerializer.cpp#76
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXMLContentSerializer.cpp#119
>
> Looks like the wrap after 72 characters is deeply engrained ;-(
>
> Most likely we'll need to tweak one of these:
> https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
> mozilla/dom/base/nsXMLContentSerializer.cpp
> 119 mMaxColumn = 72;
> 122 mMaxColumn = aWrapColumn;
> 1345 mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
> 1476 if (mDoWrap && mColPos + 1 >= mMaxColumn) {
> 1552 } while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
> 1739 if (mColPos >= mMaxColumn) {
> Note that since we have OutputRaw, mDoWrap is false:
> https://dxr.mozilla.org/mozilla-central/source/dom/base/
> nsXMLContentSerializer.cpp#114
Thank you for document the code path in question. It's very valuable.
As opposed to keep trying these nsIDocumentEncoder flags, after reading the comment and the part of the code you referenced (and the IDL), I have come to the conclusion that we should simply dive into nsXMLContentSerializer, and identify the instruction that adds the new line, and somehow ensure we don't get into that condition when the OutputRaw flag is set. Since, IMHO, the behavior simply violates the description of "OutputRaw" as described in the IDL.
What do you think?
Assignee | ||
Comment 29•9 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27)
> My other question would be why do we have a "wrap at 72 char" logic at first
> place?
Remember Netscape? The browser/mail client/HTML editor? It's a hang over from the quirky old past. The bug is from 2005, that's about the year Netscape was split into Firefox and Thunderbird. Much of the serialisation code was written to cater for e-mail formatting.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
> What do you think?
Sure, I'll debug it in the next few days. Note that OutputDisallowLineBreaking may be important, since ASCII strings are not broken in the middle, whereas CJK (Chinese, Japanese, Korean) strings can always be broken. I found out the hard way when fixing the "Asian crisis" in Thunderbird (bug 1225864 and others).
(Can I request that you don't quote/reply to entire comments and only reply to a specific line/section. We don't need duplicated content.)
Assignee | ||
Comment 30•9 years ago
|
||
I was right: Disallowing line breaking for CJK text here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#116
fixes the problem as predicted. I just have to see how to pass it in since there seems to be more than one call to nsXMLContentSerializer::Init(). Can't be long now.
Comment 31•9 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #30)
> Can't be long now.
Nice! I was thinking about working on this in my spare hours but I will leave you to it.
(In reply to Jorg K (GMT+2) from comment #29)
> (Can I request that you don't quote/reply to entire comments and only reply
> to a specific line/section. We don't need duplicated content.)
Sure.
Assignee | ||
Comment 32•9 years ago
|
||
This fixes the problem. Tested on attachment 8748800 [details].
Now comes the usual: Try run and writing a test for it. Here the try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c766e41514f2
Assignee: arvin0731 → mozilla
Status: NEW → ASSIGNED
Assignee | ||
Comment 33•9 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
"Raw" already worked, but the path that needed fixing wasn't the path that used "raw".
Chiaki-san: I tried the patch in TB and it works great there. No more unwanted spaces pasted.
Assignee | ||
Updated•9 years ago
|
Summary: Copying text copies gratuitous spaces → Copying CJK text inserts unwanted spaces
(In reply to Jorg K (GMT+2) from comment #33)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment
> #28)
> > Since, IMHO, the behavior simply violates the description of "OutputRaw"
> > as described in the IDL.
> "Raw" already worked, but the path that needed fixing wasn't the path that
> used "raw".
>
> Chiaki-san: I tried the patch in TB and it works great there. No more
> unwanted spaces pasted.
Dear Jorg,
Thank you, isn't this great? So once your patch lands in M-C, and C-C picks it up, the TB will see the original issue I saw in TB resolved. How nice.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27)
> I don't have access to CVS history to see why it was added either.
BTW, is bonsai no longer available? There used to be a web server that allows us to look int pre-Mercurial CVS history.
Assignee | ||
Comment 35•9 years ago
|
||
Neil, would you be able to review this?
Ehsan is busy and doesn't want to do reviews and Masayuki will tell me that he doesn't have review powers in dom/base/ (*).
This is making use of the serialiser flag nsIDocumentEncoder::OutputDisallowLineBreaking created for exactly this purpose in bug 1225864 to avoid shredding through CJK strings since they can be broken anywhere for screen display. But they can't be broken if I want to turn DOM into text.
Also, would you like to see another try run with the new test? I have a try run without it in comment #32 and that came out green. The test works locally and it fails locally with the C++ part removed.
(*) On second thought, Masayuki reviewed bug 1225864 which was also in the serialisers in dom/base/ but in bug 233705 comment #53 he said:
I don't have rights to review dom/base/nsPlainTextSerializer.*
So I don't know.
Attachment #8749166 -
Attachment is obsolete: true
Attachment #8749425 -
Flags: review?(enndeakin)
Assignee | ||
Comment 36•9 years ago
|
||
Changed the test, the <div> containing the text doesn't need to be editable.
Attachment #8749425 -
Attachment is obsolete: true
Attachment #8749425 -
Flags: review?(enndeakin)
Attachment #8749535 -
Flags: review?(enndeakin)
Comment 37•9 years ago
|
||
Comment on attachment 8749535 [details] [diff] [review]
Possible solution (v1) with test (v1a).
>+ div.focus();
The div isn't focusable so this doesn't do anything.
>+ // Copy to clipboard.
>+ var webnav = SpecialPowers.wrap(window).QueryInterface(SpecialPowers.Ci.nsIInterfaceRequestor)
>+ .getInterface(SpecialPowers.Ci.nsIWebNavigation)
>+ var docShell = webnav.QueryInterface(SpecialPowers.Ci.nsIDocShell);
>+ var documentViewer = docShell.contentViewer
>+ .QueryInterface(SpecialPowers.Ci.nsIContentViewerEdit);
>+ documentViewer.copySelection();
>+ var clipboard = SpecialPowers.Services.clipboard;
>+
>+ var clipboardData = SpecialPowers.wrap(getClipboardData("text/html", clipboard));
>+ var value = clipboardData.value.QueryInterface(SpecialPowers.Ci.nsISupportsString).data;
>+
It would be much better to just synthesize a shortcut key event with synthesizeKey() and listen for the copy or paste event than doing this low-level stuff.
The above code might not work on e10s or linux either since it doesn't wait for the native clipboard. You can just use SimpleTest.waitForClipboard instead. I think you want a try run for this, yes.
Attachment #8749535 -
Flags: review?(enndeakin) → review-
Assignee | ||
Comment 38•9 years ago
|
||
Neil, I'm having a hard time changing the test to using waitForClipboard.
This seems to be undocumented and I can't find a good example to copy from.
I copied from
https://dxr.mozilla.org/mozilla-aurora/source/editor/libeditor/tests/test_bug603556.html#27
Note that I don't need the paste, since the problem happens at copy time.
Like this, the test always succeeds, even without the actual code fix (see patch).
I have no idea what clipboard flavour it tests to decide on whether the copy succeeded or failed. Console says:
TEST-PASS | dom/base/test/test_bug333064.html | Clipboard has the given value
So perhaps it was comparing the text/plain flavour.
I've also seen that waitForClipboard can have more parameters(?):
https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/test_bug1170911.html#42
But I wouldn't know how to use them and appending "text/html" as fifth parameter gives:
Timed out while polling clipboard for pasted data.
Also, that must use a function different to the one here:
https://dxr.mozilla.org/mozilla-aurora/source/devtools/client/inspector/test/head.js#723
The first parameter can be a comparison function. But once again, a simple string gets passed in here and that seems to be the text/plain flavour.
<offtopic>
Knowing what the solution to this bug is took me 30 seconds, poking around to see where to put this solution, that is finding the one line in 7 million code lines that make up Mozilla, took me 1-2 hours. And trying to put a test together has taken a multiple of this time and is still not finished.
So you're help to finish off this bug would be much appreciated.
</off-topic>
Flags: needinfo?(enndeakin)
Comment 39•9 years ago
|
||
(In reply to Jorg K (PTO during summer, NI me) from comment #38)
> I've also seen that waitForClipboard can have more parameters(?):
> https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/
> test_bug1170911.html#42
> But I wouldn't know how to use them and appending "text/html" as fifth
> parameter gives:
> Timed out while polling clipboard for pasted data.
>
I think it only handles text/unicode, but that should be available for this test, right?
> <offtopic>
> Knowing what the solution to this bug is took me 30 seconds, poking around
> to see where to put this solution, that is finding the one line in 7 million
> code lines that make up Mozilla, took me 1-2 hours. And trying to put a test
> together has taken a multiple of this time and is still not finished.
>
> So you're help to finish off this bug would be much appreciated.
> </off-topic>
If you're having trouble just check if the test works on try as is and that's fine too. I'd rather avoid direct nsITransferable use if possible but I'd also rather developers not waste time just for tests.
Flags: needinfo?(enndeakin)
Assignee | ||
Comment 40•9 years ago
|
||
The waitForClipboard seems much simpler, but the clipboard data that is tested or passed to the test is correct even without the code change.
The problem is the "test/html" flavour. That runs trough a faulty serialiser and the CJK string gets chopped.
So is there any way to easily get to the "test/html" flavour in this setup?
Where is the definition of waitForClipboard that takes more than four parameters?
Flags: needinfo?(enndeakin)
Assignee | ||
Comment 41•9 years ago
|
||
Here we have the best of both worlds ;-)
Using waitForClipboard and checking the "text/html" manually.
Fails without the code fix, works with the code fix. Added bonus: "text/plain" is also tested. What do you think? Try run coming up.
Attachment #8749535 -
Attachment is obsolete: true
Attachment #8749976 -
Attachment is obsolete: true
Attachment #8750106 -
Flags: review?(enndeakin)
Assignee | ||
Comment 42•9 years ago
|
||
Assignee | ||
Comment 43•9 years ago
|
||
BTW, at least on Linux this new test runs as part of M(1), and it's green ;-)
Assignee | ||
Comment 44•9 years ago
|
||
OK, M(1) and M-e10s(1) are green everywhere.
Comment 45•9 years ago
|
||
It looks like there is a bug on the last line of waitForClipboard. It passes 'text/unicode' instead of 'requestedFlavor'.
It would be ideal if we could get this to work. Then, you can just check the data in the first argument to waitForClipboard:
waitForClipboard(function (data) { return data == expectedData }, ...
Flags: needinfo?(enndeakin)
Assignee | ||
Comment 46•9 years ago
|
||
This test works now. I've I take the C++ hunk out, I get this failure:
Timed out while polling clipboard for pasted data.
I don't get something along the lines of "unexpected value".
You pointed out that
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/SimpleTest.js#973
is questionable. Well, I've changed the
}, aFailureFn, "text/unicode");
to
}, aFailureFn, requestedFlavor);
and it didn't make a difference.
In any case, I dumped out the value passed in to the comparison function and it contains <html><body> ... (160 Korean characters) ... etc.
So the test works and the code works although, as mentioned above, if I remove the code change I get a strange test failure that will be hard to understand should it ever arise.
So what do you suggest?
Another try run here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f28f021bd21
Attachment #8750295 -
Flags: review?(enndeakin)
Assignee | ||
Comment 47•9 years ago
|
||
Further to what I said:
I don't even need to pull out the C++ change to create a test failure.
If I change
return value.includes("html") && value.includes("안".repeat(160));
to
return value.includes("htmlhuhu") && value.includes("안".repeat(160));
I get a failure with:
Timed out while polling clipboard for pasted data.
Is that expected? I find it mildly confusing.
Assignee | ||
Comment 48•9 years ago
|
||
Hmm, fails on Linux on M(1) and M-e10s(1): Timed out while polling clipboard for pasted data.
Maybe the clipboard is different and doesn't have the "html"??
(And I was silly to run all Mochitests instead of just (1).)
Comment 49•9 years ago
|
||
The way that waitForClipboard works is that it puts some dummy data on the clipboard, calls the supplied initializer function to put what you want on the clipboard, then keeps polling the clipboard until validatorFn returns true.
Without the C++ change, it will fail and time out.
With the change, it should succeed but not necessarily right away.
Without the change I suggested in comment 45, the validatorFn should be passed the text/unicode data, which should succeed even without your changes, as the plain text doesn't have the line-breaking issue. Is this not the case?
Assignee | ||
Comment 50•9 years ago
|
||
(In reply to Neil Deakin from comment #49)
> The way that waitForClipboard works is that it puts some dummy data on the
> clipboard, calls the supplied initializer function to put what you want on
> the clipboard, then keeps polling the clipboard until validatorFn returns
> true.
> Without the C++ change, it will fail and time out.
OK, that explains the time out.
> With the [C++] change, it should succeed but not necessarily right away.
Yes, with the C++ hunk in place, it works on Windows, but apparently not on other platforms, see comment #49 above.
> Without the change I suggested in comment 45, the validatorFn should be
> passed the text/unicode data, which should succeed even without your
> changes, as the plain text doesn't have the line-breaking issue. Is this not
> the case?
Without the change you suggested in comment 45, I see <html><body> ... in the validator function.
The test as published in attachment 8750295 [details] [diff] [review] works locally on Windows (but apparently not on other platforms, see comment #49 above). So I get passed the HTML. Let me dump it out for you:
<html><body>
<!--StartFragment--><div id="korean-text"> ... 160 expected characters ... </div><!--EndFragment-->
</body>
</html>
Assignee | ||
Comment 51•9 years ago
|
||
Yes, the text/unicode flavour doesn't have the line breaking problem. That's what I got when I didn't ask for "text/html" as fifth parameter.
Assignee | ||
Comment 52•9 years ago
|
||
Looking at waitForClipboard again, I can't see an error:
// First we wait for a known value different from the expected one.
var preExpectedVal = initialVal;
SpecialPowers.clipboardCopyString(preExpectedVal);
wait(function(aData) { return aData == preExpectedVal; },
function() {
// Call the original setup fn
aSetupFn();
wait(inputValidatorFn, aSuccessFn, aFailureFn, requestedFlavor); <<===
}, aFailureFn, "text/unicode");
}
The first wait call passes in some initial value, a simplified comparison function and selects the "text/unicode" flavour. The subsequent inner call does request the correct flavour.
In any case, my test isn't working on Linux, maybe also not on Mac, most likely because the only Windows has the <html><body> stuff, as we know from other bugs and as can be seen here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/test/test_bug166235.html#74
In conclusion: I will cancel the try run and rerun it with a different test.
Comment 53•9 years ago
|
||
OK, I looked in more detail at this code. The waitForClipboard function is actually correct as is; it is just very confusing code. The 'text/unicode' reference is only for the initial dummy data check. For other checks it does indeed use the passed in flavor. So ignore that change I suggested.
Note that on Mac, I just get as the passed data:
<div id="korean-text">안안...</div>
You probably want to check specially for the div element with 'korean-text' as an id.
Assignee | ||
Comment 54•9 years ago
|
||
Test that should work on all platforms.
Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=458299fbc89e
Attachment #8750106 -
Attachment is obsolete: true
Attachment #8750295 -
Attachment is obsolete: true
Attachment #8750106 -
Flags: review?(enndeakin)
Attachment #8750295 -
Flags: review?(enndeakin)
Attachment #8750328 -
Flags: review?(enndeakin)
Assignee | ||
Comment 55•9 years ago
|
||
Wires crossed here. I'll do another one.
Assignee | ||
Comment 56•9 years ago
|
||
OK, here we have a test that checks for "korean-text" (that proves that we got the HTML data) and the string itself.
Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0fc0798bbf4
Attachment #8750328 -
Attachment is obsolete: true
Attachment #8750328 -
Flags: review?(enndeakin)
Attachment #8750337 -
Flags: review?(enndeakin)
Assignee | ||
Comment 57•9 years ago
|
||
(this time without the dump().)
Attachment #8750337 -
Attachment is obsolete: true
Attachment #8750337 -
Flags: review?(enndeakin)
Attachment #8750340 -
Flags: review?(enndeakin)
Updated•9 years ago
|
Attachment #8750340 -
Flags: review?(enndeakin) → review+
Assignee | ||
Comment 58•9 years ago
|
||
Thanks Neil!
Dear Sheriff, try run in comment #56.
Keywords: checkin-needed
Assignee | ||
Comment 59•9 years ago
|
||
Kent, this is a patch to fix unwanted spaces in copy/paste of CJK text.
It uses the same technique employed to fix these problems in e-mail. In fact, it builds on our solution from bug 1225864 (see comment #22). So our tweak has finally proven useful for FF as well ;-)
Now that we can add tracking flags to M-C bugs, do you still want to duplicate the bug like we did before?
I recommend inclusion in the next TB 45.x release, the patch has no risk since - as I said - it uses a well understood technique. Chiaki alerted me to the problem on dev-apps-thunderbird@lists.mozilla.org.
Comment 60•9 years ago
|
||
We don't need to create separate TB bugs if we can set the needed flags in Core bugs. This should have a reasonable chance of uplifting into TB branches - but it should first make the proposed "release candidate" beta proposed for next week. Unfortunately there is no way for us to run his through TB aurora without uplifting to m-aurora.
Flags: needinfo?(rkent)
Comment 61•9 years ago
|
||
Keywords: checkin-needed
Comment 62•9 years ago
|
||
What will the copied source be after the patch? I mean, will the copied source code still be line-breaked on every 72 chars? And will it happen on all languages?
Assignee | ||
Comment 63•9 years ago
|
||
This patch affects the "text/html" flavour of the clipboard after copying. The only change that was made is that long CJK strings are now no longer broken since that lead to the insertion of unwanted spaces.
Long non-ASCII string with "European" characters were already not broken, so copying a long string of á was treated differently to a long string of 안.
Otherwise the formatting hasn't changed so there is no need for any concern.
Assignee | ||
Comment 64•9 years ago
|
||
Oh, I see you raised bug 673667. This is exactly the issue we fixed here.
Comment 65•9 years ago
|
||
So there will still be a line-break at, say, white spaces if the string exceeds 72 chars?
An additional question is what is the actually rule to determine "CJK string"? Will full-shape symbols, such as full-shape spaces or CJK punctuations (e.g. ,。、;:○『』 ...) be counted as "CJK string" and be non-broken?
Comment 66•9 years ago
|
||
backed out due to test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=27556616&repo=mozilla-inbound
Flags: needinfo?(mozilla)
Assignee | ||
Comment 67•9 years ago
|
||
Yes, CJK punctuation counts as CJK string.
Tomcat, sorry, will disable the text on Android.
Flags: needinfo?(mozilla)
Comment 68•9 years ago
|
||
I wonder whether it would be better to add an user preference to determine whether to break the copied HTML source, since that in some circumstances the user may not want the auto line break of HTML source at all.
Assignee | ||
Comment 69•9 years ago
|
||
Neil, which one should it be?
skip-if = toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'mulet' || buildapp == 'b2g' || toolkit == 'android'
skip-if = os == 'android'
Or
skip-if = (os != 'linux') && (os != 'mac') && (os != 'windows')
Could you pre-approve a patch with the line of your choice added ;-) ?
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(enndeakin)
Comment 70•9 years ago
|
||
... Or a preference to customize the length to break the string instead of using a magic number 72.
Comment 71•9 years ago
|
||
Assignee | ||
Comment 72•9 years ago
|
||
Danny: We're fixing the serialisation of CJK strings in this bug, that is, turning the browsers document object model (DOM) into text.
There is no option required. Besides: What would you like this option to be called?
erroneously-insert-spaces-into-copied-cjk-strings with default set to 'false'?
If you have concerns, please wait until this bug has landed, then try a Nightly compile, and if you have a problem, raise another bug.
This bug here is done, only landing the patch caused some internal test failures which we will address.
Comment 73•9 years ago
|
||
I agree that fixing this issue does not require adding options.
I was somehow talking another (related) issue, that the feature of automatically line break might be a non-necessary issue, and if not, a potential solution is to remove the automatically line break functionality.
If this functionality is still considered necessary for some reason, consider the fact that one may not want this functionality in some situation, we may add a preference such as "chars-to-break-html-source-code-on-copying" with 72 (or maybe -1) as default and -1 meaning never break.
Assignee | ||
Comment 74•9 years ago
|
||
That's an enhancement request for another bug.
Assignee | ||
Comment 76•9 years ago
|
||
Attachment #8750340 -
Attachment is obsolete: true
Attachment #8750754 -
Flags: review+
Assignee | ||
Comment 77•9 years ago
|
||
OK, Android test disabled as Neil suggested in comment #75.
Keywords: checkin-needed
Comment 78•9 years ago
|
||
Keywords: checkin-needed
Comment 79•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment 80•9 years ago
|
||
Checked in http://hg.mozilla.org/releases/mozilla-esr45/rev/25a069f8002f THUNDERBIRD_45_VERBRANCH
Assignee | ||
Comment 81•9 years ago
|
||
Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.
Approval Request Comment
[Feature/regressing bug #]: Not a regression, was always wrong ;-)
[User impact if declined]: Unwanted extra spaces appear when CJK text is copied.
[Describe test coverage new/current, TreeHerder]: Automated test.
[Risks and why]: I don't consider it risky.
All we do is disable the line-breaker, a technique introduced in bug 1225864 and well tested in Thunderbird since TB 45.
Now talking about risk: Disabling the line breaker did uncover a crash in another code path, so bug 1274837 would need uplift, too.
[String/UUID change made/needed]: None.
Attachment #8750754 -
Flags: approval-mozilla-aurora?
Comment 82•9 years ago
|
||
Jorg, we have this bug for 10 years, I am not sure we need to take an uplift for Firefox 48.
For Thunderbird, could you use only the relbranch?
Assignee | ||
Comment 83•9 years ago
|
||
We've already advanced that onto a branch for TB 45 ESR. Since so far we don't have release branches for our betas, it would be good to have it uplifted. Yes, broken for 10 years, so even better if it arrives six weeks earlier now ;-)
Comment 84•9 years ago
|
||
What we are going to do is to use release branches on mozilla-beta to land bugs such as this that we would like in our betas, for TB 47.0b1 and 48.0b1
Comment 85•9 years ago
|
||
Pushed to a release branch on mozilla-beta for Thunderbird 47.0b1
Assignee | ||
Comment 86•9 years ago
|
||
Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.
Too late now.
Attachment #8750754 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 87•8 years ago
|
||
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1
https://hg.mozilla.org/releases/mozilla-beta/rev/502af8ab3f06
Assignee | ||
Comment 88•8 years ago
|
||
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1 - take 2:
https://hg.mozilla.org/releases/mozilla-beta/rev/9a5abbdc1fb4
You need to log in
before you can comment on or make changes to this bug.
Description
•