Closed Bug 333064 Opened 18 years ago Closed 8 years ago

Copying CJK text inserts unwanted spaces

Categories

(Core :: DOM: Serializers, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed
thunderbird_esr38 --- wontfix
thunderbird_esr45 46+ fixed

People

(Reporter: musiphil, Assigned: jorgk-bmo)

References

Details

Attachments

(5 files, 9 obsolete files)

1.44 KB, text/html
Details
1.20 KB, text/html
Details
758 bytes, text/html
Details
6.92 KB, text/html
Details
4.14 KB, patch
jorgk-bmo
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

When I select and copy some text, some gratuitous spaces are also copied.

Reproducible: Always

Steps to Reproduce:
1. Visit the above URL
2. Copy the entire text (by Ctrl+A Ctrl+C, for example)
3. Open any text editor and paste the copied text

Actual Results:  
[The first words of each line]

수 숫단

소 녀는

그 러나

Expected Results:  
수숫단

소녀는

그러나
The Korean characters got converted to numeric character references (&#nnnnn;) while posting, but still you get the idea.
Not quite sure what component this belongs in, but I'll guess DOM to Text for starters. Apparently it's a result of using align="justify", since if I remove that from the first paragraph, that one copies correctly while the remaining justified ones get an extra space.
Assignee: nobody → dom-to-text
Component: General → DOM to Text Conversion
Product: Firefox → Core
QA Contact: general
Version: unspecified → Trunk
Assignee: dom-to-text → nobody
QA Contact: dom-to-text
substituting the broken link for the example
Can anybody confirm that this is still a problem in Firefox 4, or that this has been fixed? It still persists in Firefox 3.6.16. I don't know why it has stayed UNCONFIRMED so far. Did I have to do something more?
Is it the same bug as bug 673667 ?
(In reply to Fanolian from comment #5)
> Is it the same bug as bug 673667 ?

I believe so. Thank you for reporting this.
Confirmed on latest Nightly 21 on Win 7. Seems that CJK characters are effected but pasting to pure-text editors have no problem.

(Pasting to MS Word 2013 with "Keep Text Only" option gives correct result as well)
Status: UNCONFIRMED → NEW
Ever confirmed: true
I grabbed the "text/html" clipboard target using this tool https://github.com/intuited/clipcli

The result is here http://people.mozilla.com/~kchen/333064.html

You can find that we added extra newlines in the paragraph. For Latin scripts this has visually no difference because HTML treat newlines as whitespaces. But for CJK scripts this will cause extra whitespace in between characters.
Can assign this bug to me?  I want to give it a try!
Assignee: nobody → arvin0731
Pase the copied rich text from Firefox to an HTML editor that doesn't change the source and then view the source (eg. any HTML element with contenteditable=true), we can see that there is a line break inserted between every 73 Chinese characters, which seems like a feature of Gecko. Hope the arbitary line break can be removed, or at least add an option to allow disabling it, since the space inserted contributes a significant semantic change for Chinese text.
(In reply to Kan-Ru Chen [:kanru] from comment #15)
> I think the text get wrapped here:
> 
>  
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#175-204
> 
> And 72 is the default wrap column:
> 
>  
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#116-121

Thank you very much... Could we add a config setting to overwrite the behavior?
I can reproduce the test case in bug 673667 comment 0. There are vairous comments in this bug talking about line breaks in HTML source code being rendered as white space between CJK characters. That is NOT a bug and certainly not this bug.
XMLSerializer and node.innerHTML/outerHTML are not affected.
Bug 524975 seems to be describing the same issue. I am not sure if we should dup.
See Also: → 524975
Looking at bug 524975 and looking at what that bug landed:
http://hg.mozilla.org/mozilla-central/rev/4ae32b827a7f#l1.24
we could possibly use the new serialiser flag introduced in bug 1225864:
OutputDisallowLineBreaking.
See Also: → 606747
I am chiming in with my two cent worth.
With FF 45.x, I see unwanted spaces inserted into text when copy&paste operation is performed for Japanese text.
(I suspect this problem manifests for other Asian languages such as Chinese, Korean and Thai. I say "Thai" because I noticed the  Bug 606747 - add extra spaces when copy/paste thai language.)

Here I could create a very clear-cut test case where you can try the
copy&paste of the Japanese text and see where the unintended/unwanted spaces are inserted. (See attached). This is more or less from a real text and not as simple as Jorg's sample. 

The unintended/unwanted spaces are very embarrassing and problematic
in business/formal document and not good for business processing where web interface is often used these days!

The problem was first noticed in Thunderbird HTML message composition, but found to be DOM-related (I think).

See https://groups.google.com/forum/#!topic/mozilla.dev.apps.thunderbird/6juKnWvrm-Y

BTW, this problem happens only with PASTE *WITH* FORMATTING, and not with paste *without* formatting.

TIA
(In reply to ISHIKAWA, Chiaki from comment #23)
> In the pasted text, you will note some extra spaces inserted. Look carefully.
Well, my Korean text is simpler, you don't have to look carefully, it tells you where the spaces are inserted ;-)
(In reply to Jorg K (GMT+2) from comment #24)
> (In reply to ISHIKAWA, Chiaki from comment #23)
> > In the pasted text, you will note some extra spaces inserted. Look carefully.
> Well, my Korean text is simpler, you don't have to look carefully, it tells
> you where the spaces are inserted ;-)

Yes, I noticed that :-) I just wanted to show that the space(s) do creep in and may not be so visible in real world document when one is in a rush, but on the receiving side, when one reads a document carefully, it *IS* noticeable and the reader may think the writer/sender is a careless writer (oh well)! 
[I am embarrassed that there is a typo: "embarassing". ]

At least, I made sure that my HTML page passed W3C's validator with flying colours!

Thank you again for looking into this issue.
Hmm, I played around with the encoder flags here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.cpp#102
removing OutputPreformatted (since it makes little sense together with OutputRaw, see:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsIDocumentEncoder.idl#76
(XXXbz How does this interact with OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
and also added OutputDisallowLineBreaking as suggested in comment #22. Nothing helped.

Needs more investigation, starting here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#199
(mWrapColumn = 72;)
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#1126
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXHTMLContentSerializer.cpp#76
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#119

Looks like the wrap after 72 characters is deeply engrained ;-(

Most likely we'll need to tweak one of these:
https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
	mozilla/dom/base/nsXMLContentSerializer.cpp
119 	mMaxColumn = 72;
122 	mMaxColumn = aWrapColumn;
1345 	mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
1476 	if (mDoWrap && mColPos + 1 >= mMaxColumn) {
1552 	} while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
1739 	if (mColPos >= mMaxColumn) { 
Note that since we have OutputRaw, mDoWrap is false:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLContentSerializer.cpp#114
My other question would be why do we have a "wrap at 72 char" logic at first place? I don't see any spec on HTML/XML serialization mentioning the magic number 72 and/or like "the serialized DOM *may* be wrapped at 72 char" or something.

I don't have access to CVS history to see why it was added either.
(In reply to Jorg K (GMT+2) from comment #26)
> Hmm, I played around with the encoder flags here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.
> cpp#102
> removing OutputPreformatted (since it makes little sense together with
> OutputRaw, see:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsIDocumentEncoder.idl#76
> (XXXbz How does this interact with
> OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
> and also added OutputDisallowLineBreaking as suggested in comment #22.
> Nothing helped.
> 
> Needs more investigation, starting here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#199
> (mWrapColumn = 72;)
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#1126
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXHTMLContentSerializer.cpp#76
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXMLContentSerializer.cpp#119
> 
> Looks like the wrap after 72 characters is deeply engrained ;-(
> 
> Most likely we'll need to tweak one of these:
> https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
> 	mozilla/dom/base/nsXMLContentSerializer.cpp
> 119 	mMaxColumn = 72;
> 122 	mMaxColumn = aWrapColumn;
> 1345 	mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
> 1476 	if (mDoWrap && mColPos + 1 >= mMaxColumn) {
> 1552 	} while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
> 1739 	if (mColPos >= mMaxColumn) { 
> Note that since we have OutputRaw, mDoWrap is false:
> https://dxr.mozilla.org/mozilla-central/source/dom/base/
> nsXMLContentSerializer.cpp#114

Thank you for document the code path in question. It's very valuable.

As opposed to keep trying these nsIDocumentEncoder flags, after reading the comment and the part of the code you referenced (and the IDL), I have come to the conclusion that we should simply dive into nsXMLContentSerializer, and identify the instruction that adds the new line, and somehow ensure we don't get into that condition when the OutputRaw flag is set. Since, IMHO, the behavior simply violates the description of "OutputRaw" as described in the IDL.

What do you think?
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27)
> My other question would be why do we have a "wrap at 72 char" logic at first
> place?
Remember Netscape? The browser/mail client/HTML editor? It's a hang over from the quirky old past. The bug is from 2005, that's about the year Netscape was split into Firefox and Thunderbird. Much of the serialisation code was written to cater for e-mail formatting.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
> What do you think?
Sure, I'll debug it in the next few days. Note that OutputDisallowLineBreaking may be important, since ASCII strings are not broken in the middle, whereas CJK (Chinese, Japanese, Korean) strings can always be broken. I found out the hard way when fixing the "Asian crisis" in Thunderbird (bug 1225864 and others).

(Can I request that you don't quote/reply to entire comments and only reply to a specific line/section. We don't need duplicated content.)
I was right: Disallowing line breaking for CJK text here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#116
fixes the problem as predicted. I just have to see how to pass it in since there seems to be more than one call to nsXMLContentSerializer::Init(). Can't be long now.
(In reply to Jorg K (GMT+2) from comment #30)
> Can't be long now.

Nice! I was thinking about working on this in my spare hours but I will leave you to it.

(In reply to Jorg K (GMT+2) from comment #29)
> (Can I request that you don't quote/reply to entire comments and only reply
> to a specific line/section. We don't need duplicated content.)

Sure.
Attached patch Possible solution (v1). (obsolete) — Splinter Review
This fixes the problem. Tested on attachment 8748800 [details].

Now comes the usual: Try run and writing a test for it. Here the try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c766e41514f2
Assignee: arvin0731 → mozilla
Status: NEW → ASSIGNED
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
"Raw" already worked, but the path that needed fixing wasn't the path that used "raw".

Chiaki-san: I tried the patch in TB and it works great there. No more unwanted spaces pasted.
Summary: Copying text copies gratuitous spaces → Copying CJK text inserts unwanted spaces
(In reply to Jorg K (GMT+2) from comment #33)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment
> #28)
> > Since, IMHO, the behavior simply violates the description of "OutputRaw"
> > as described in the IDL.
> "Raw" already worked, but the path that needed fixing wasn't the path that
> used "raw".
> 
> Chiaki-san: I tried the patch in TB and it works great there. No more
> unwanted spaces pasted.

Dear Jorg, 
Thank you, isn't this great? So once your patch lands in M-C, and C-C picks it up, the TB will see the original issue I saw in TB resolved. How nice.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27) 
> I don't have access to CVS history to see why it was added either.

BTW, is bonsai no longer available? There used to be a web server that allows us to look int pre-Mercurial CVS history.
Neil, would you be able to review this?
Ehsan is busy and doesn't want to do reviews and Masayuki will tell me that he doesn't have review powers in dom/base/ (*).

This is making use of the serialiser flag nsIDocumentEncoder::OutputDisallowLineBreaking created for exactly this purpose in bug 1225864 to avoid shredding through CJK strings since they can be broken anywhere for screen display. But they can't be broken if I want to turn DOM into text.

Also, would you like to see another try run with the new test? I have a try run without it in comment #32 and that came out green. The test works locally and it fails locally with the C++ part removed.

(*) On second thought, Masayuki reviewed bug 1225864 which was also in the serialisers in dom/base/ but in bug 233705 comment #53 he said:
  I don't have rights to review dom/base/nsPlainTextSerializer.*
So I don't know.
Attachment #8749166 - Attachment is obsolete: true
Attachment #8749425 - Flags: review?(enndeakin)
Changed the test, the <div> containing the text doesn't need to be editable.
Attachment #8749425 - Attachment is obsolete: true
Attachment #8749425 - Flags: review?(enndeakin)
Attachment #8749535 - Flags: review?(enndeakin)
Comment on attachment 8749535 [details] [diff] [review]
Possible solution (v1) with test (v1a).

>+  div.focus();

The div isn't focusable so this doesn't do anything.


>+  // Copy to clipboard.
>+  var webnav = SpecialPowers.wrap(window).QueryInterface(SpecialPowers.Ci.nsIInterfaceRequestor)
>+                     .getInterface(SpecialPowers.Ci.nsIWebNavigation)
>+  var docShell = webnav.QueryInterface(SpecialPowers.Ci.nsIDocShell);
>+  var documentViewer = docShell.contentViewer
>+                               .QueryInterface(SpecialPowers.Ci.nsIContentViewerEdit);
>+  documentViewer.copySelection();
>+  var clipboard = SpecialPowers.Services.clipboard;
>+
>+  var clipboardData = SpecialPowers.wrap(getClipboardData("text/html", clipboard));
>+  var value = clipboardData.value.QueryInterface(SpecialPowers.Ci.nsISupportsString).data;
>+

It would be much better to just synthesize a shortcut key event with synthesizeKey() and listen for the copy or paste event than doing this low-level stuff.

The above code might not work on e10s or linux either since it doesn't wait for the native clipboard. You can just use SimpleTest.waitForClipboard instead. I think you want a try run for this, yes.
Attachment #8749535 - Flags: review?(enndeakin) → review-
Attached patch WIP - not working (obsolete) — Splinter Review
Neil, I'm having a hard time changing the test to using waitForClipboard.

This seems to be undocumented and I can't find a good example to copy from.

I copied from
https://dxr.mozilla.org/mozilla-aurora/source/editor/libeditor/tests/test_bug603556.html#27
Note that I don't need the paste, since the problem happens at copy time.

Like this, the test always succeeds, even without the actual code fix (see patch).

I have no idea what clipboard flavour it tests to decide on whether the copy succeeded or failed. Console says:
TEST-PASS | dom/base/test/test_bug333064.html | Clipboard has the given value

So perhaps it was comparing the text/plain flavour.

I've also seen that waitForClipboard can have more parameters(?):
https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/test_bug1170911.html#42
But I wouldn't know how to use them and appending "text/html" as fifth parameter gives:
Timed out while polling clipboard for pasted data.

Also, that must use a function different to the one here:
https://dxr.mozilla.org/mozilla-aurora/source/devtools/client/inspector/test/head.js#723
The first parameter can be a comparison function. But once again, a simple string gets passed in here and that seems to be the text/plain flavour.

<offtopic>
Knowing what the solution to this bug is took me 30 seconds, poking around to see where to put this solution, that is finding the one line in 7 million code lines that make up Mozilla, took me 1-2 hours. And trying to put a test together has taken a multiple of this time and is still not finished.

So you're help to finish off this bug would be much appreciated.
</off-topic>
Flags: needinfo?(enndeakin)
(In reply to Jorg K (PTO during summer, NI me) from comment #38)
> I've also seen that waitForClipboard can have more parameters(?):
> https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/
> test_bug1170911.html#42
> But I wouldn't know how to use them and appending "text/html" as fifth
> parameter gives:
> Timed out while polling clipboard for pasted data.
> 

I think it only handles text/unicode, but that should be available for this test, right?

> <offtopic>
> Knowing what the solution to this bug is took me 30 seconds, poking around
> to see where to put this solution, that is finding the one line in 7 million
> code lines that make up Mozilla, took me 1-2 hours. And trying to put a test
> together has taken a multiple of this time and is still not finished.
> 
> So you're help to finish off this bug would be much appreciated.
> </off-topic>

If you're having trouble just check if the test works on try as is and that's fine too. I'd rather avoid direct nsITransferable use if possible but I'd also rather developers not waste time just for tests.
Flags: needinfo?(enndeakin)
The waitForClipboard seems much simpler, but the clipboard data that is tested or passed to the test is correct even without the code change.

The problem is the "test/html" flavour. That runs trough a faulty serialiser and the CJK string gets chopped.

So is there any way to easily get to the "test/html" flavour in this setup?

Where is the definition of waitForClipboard that takes more than four parameters?
Flags: needinfo?(enndeakin)
Here we have the best of both worlds ;-)

Using waitForClipboard and checking the "text/html" manually.

Fails without the code fix, works with the code fix. Added bonus: "text/plain" is also tested. What do you think? Try run coming up.
Attachment #8749535 - Attachment is obsolete: true
Attachment #8749976 - Attachment is obsolete: true
Attachment #8750106 - Flags: review?(enndeakin)
BTW, at least on Linux this new test runs as part of M(1), and it's green ;-)
OK, M(1) and M-e10s(1) are green everywhere.
It looks like there is a bug on the last line of waitForClipboard. It passes 'text/unicode' instead of 'requestedFlavor'.

It would be ideal if we could get this to work. Then, you can just check the data in the first argument to waitForClipboard:

waitForClipboard(function (data) { return data == expectedData }, ...
Flags: needinfo?(enndeakin)
This test works now. I've I take the C++ hunk out, I get this failure:
Timed out while polling clipboard for pasted data.

I don't get something along the lines of "unexpected value".

You pointed out that
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/SimpleTest.js#973
is questionable. Well, I've changed the
  }, aFailureFn, "text/unicode");
to
  }, aFailureFn, requestedFlavor);
and it didn't make a difference.

In any case, I dumped out the value passed in to the comparison function and it contains <html><body> ... (160 Korean characters) ... etc.

So the test works and the code works although, as mentioned above, if I remove the code change I get a strange test failure that will be hard to understand should it ever arise.

So what do you suggest?

Another try run here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f28f021bd21
Attachment #8750295 - Flags: review?(enndeakin)
Further to what I said:
I don't even need to pull out the C++ change to create a test failure.
If I change
  return value.includes("html") && value.includes("안".repeat(160));
to
  return value.includes("htmlhuhu") && value.includes("안".repeat(160));
I get a failure with:
Timed out while polling clipboard for pasted data.

Is that expected? I find it mildly confusing.
Hmm, fails on Linux on M(1) and M-e10s(1): Timed out while polling clipboard for pasted data.
Maybe the clipboard is different and doesn't have the "html"??
(And I was silly to run all Mochitests instead of just (1).)
The way that waitForClipboard works is that it puts some dummy data on the clipboard, calls the supplied initializer function to put what you want on the clipboard, then keeps polling the clipboard until validatorFn returns true.

Without the C++ change, it will fail and time out.

With the change, it should succeed but not necessarily right away.

Without the change I suggested in comment 45, the validatorFn should be passed the text/unicode data, which should succeed even without your changes, as the plain text doesn't have the line-breaking issue. Is this not the case?
(In reply to Neil Deakin from comment #49)
> The way that waitForClipboard works is that it puts some dummy data on the
> clipboard, calls the supplied initializer function to put what you want on
> the clipboard, then keeps polling the clipboard until validatorFn returns
> true.
> Without the C++ change, it will fail and time out.
OK, that explains the time out.

> With the [C++] change, it should succeed but not necessarily right away.
Yes, with the C++ hunk in place, it works on Windows, but apparently not on other platforms, see comment #49 above.

> Without the change I suggested in comment 45, the validatorFn should be
> passed the text/unicode data, which should succeed even without your
> changes, as the plain text doesn't have the line-breaking issue. Is this not
> the case?
Without the change you suggested in comment 45, I see <html><body> ... in the validator function.
The test as published in attachment 8750295 [details] [diff] [review] works locally on Windows (but apparently not on other platforms, see comment #49 above). So I get passed the HTML. Let me dump it out for you:
<html><body>
<!--StartFragment--><div id="korean-text"> ... 160 expected characters ... </div><!--EndFragment-->
</body>
</html>
Yes, the text/unicode flavour doesn't have the line breaking problem. That's what I got when I didn't ask for "text/html" as fifth parameter.
Looking at waitForClipboard again, I can't see an error:
    // First we wait for a known value different from the expected one.
    var preExpectedVal = initialVal;
    SpecialPowers.clipboardCopyString(preExpectedVal);
    wait(function(aData) { return aData  == preExpectedVal; },
         function() {
           // Call the original setup fn
           aSetupFn();
           wait(inputValidatorFn, aSuccessFn, aFailureFn, requestedFlavor); <<===
         }, aFailureFn, "text/unicode");
}

The first wait call passes in some initial value, a simplified comparison function and selects the "text/unicode" flavour. The subsequent inner call does request the correct flavour.

In any case, my test isn't working on Linux, maybe also not on Mac, most likely because the only Windows has the <html><body> stuff, as we know from other bugs and as can be seen here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/test/test_bug166235.html#74

In conclusion: I will cancel the try run and rerun it with a different test.
OK, I looked in more detail at this code. The waitForClipboard function is actually correct as is; it is just very confusing code. The 'text/unicode' reference is only for the initial dummy data check. For other checks it does indeed use the passed in flavor. So ignore that change I suggested.

Note that on Mac, I just get as the passed data:

<div id="korean-text">안안...</div>

You probably want to check specially for the div element with 'korean-text' as an id.
Test that should work on all platforms.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=458299fbc89e
Attachment #8750106 - Attachment is obsolete: true
Attachment #8750295 - Attachment is obsolete: true
Attachment #8750106 - Flags: review?(enndeakin)
Attachment #8750295 - Flags: review?(enndeakin)
Attachment #8750328 - Flags: review?(enndeakin)
Wires crossed here. I'll do another one.
OK, here we have a test that checks for "korean-text" (that proves that we got the HTML data) and the string itself.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0fc0798bbf4
Attachment #8750328 - Attachment is obsolete: true
Attachment #8750328 - Flags: review?(enndeakin)
Attachment #8750337 - Flags: review?(enndeakin)
(this time without the dump().)
Attachment #8750337 - Attachment is obsolete: true
Attachment #8750337 - Flags: review?(enndeakin)
Attachment #8750340 - Flags: review?(enndeakin)
Attachment #8750340 - Flags: review?(enndeakin) → review+
Thanks Neil!

Dear Sheriff, try run in comment #56.
Keywords: checkin-needed
Kent, this is a patch to fix unwanted spaces in copy/paste of CJK text.
It uses the same technique employed to fix these problems in e-mail. In fact, it builds on our solution from bug 1225864 (see comment #22). So our tweak has finally proven useful for FF as well ;-)

Now that we can add tracking flags to M-C bugs, do you still want to duplicate the bug like we did before?

I recommend inclusion in the next TB 45.x release, the patch has no risk since - as I said - it uses a well understood technique. Chiaki alerted me to the problem on dev-apps-thunderbird@lists.mozilla.org.
Flags: needinfo?(rkent)
We don't need to create separate TB bugs if we can set the needed flags in Core bugs. This should have a reasonable chance of uplifting into TB branches - but it should first make the proposed "release candidate" beta proposed for next week. Unfortunately there is no way for us to run his through TB aurora without uplifting to m-aurora.
Flags: needinfo?(rkent)
What will the copied source be after the patch? I mean, will the copied source code still be line-breaked on every 72 chars? And will it happen on all languages?
This patch affects the "text/html" flavour of the clipboard after copying. The only change that was made is that long CJK strings are now no longer broken since that lead to the insertion of unwanted spaces.

Long non-ASCII string with "European" characters were already not broken, so copying a long string of á was treated differently to a long string of 안.

Otherwise the formatting hasn't changed so there is no need for any concern.
Oh, I see you raised bug 673667. This is exactly the issue we fixed here.
So there will still be a line-break at, say, white spaces if the string exceeds 72 chars?

An additional question is what is the actually rule to determine "CJK string"? Will full-shape symbols, such as full-shape spaces or CJK punctuations (e.g. ,。、;:○『』 ...) be counted as "CJK string" and be non-broken?
backed out due to test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=27556616&repo=mozilla-inbound
Flags: needinfo?(mozilla)
Yes, CJK punctuation counts as CJK string.

Tomcat, sorry, will disable the text on Android.
Flags: needinfo?(mozilla)
I wonder whether it would be better to add an user preference to determine whether to break the copied HTML source, since that in some circumstances the user may not want the auto line break of HTML source at all.
Neil, which one should it be?

skip-if = toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'mulet' || buildapp == 'b2g' || toolkit == 'android'
skip-if = os == 'android'

Or

skip-if = (os != 'linux') && (os != 'mac') && (os != 'windows')

Could you pre-approve a patch with the line of your choice added ;-) ?
Flags: needinfo?(enndeakin)
... Or a preference to customize the length to break the string instead of using a magic number 72.
Danny: We're fixing the serialisation of CJK strings in this bug, that is, turning the browsers document object model (DOM) into text.

There is no option required. Besides: What would you like this option to be called?
erroneously-insert-spaces-into-copied-cjk-strings with default set to 'false'?

If you have concerns, please wait until this bug has landed, then try a Nightly compile, and if you have a problem, raise another bug.

This bug here is done, only landing the patch caused some internal test failures which we will address.
I agree that fixing this issue does not require adding options.

I was somehow talking another (related) issue, that the feature of automatically line break might be a non-necessary issue, and if not, a potential solution is to remove the automatically line break functionality.

If this functionality is still considered necessary for some reason, consider the fact that one may not want this functionality in some situation, we may add a preference such as "chars-to-break-html-source-code-on-copying" with 72 (or maybe -1) as default and -1 meaning never break.
That's an enhancement request for another bug.
Probably:

  skip-if = toolkit == 'android'
Flags: needinfo?(enndeakin)
OK, Android test disabled as Neil suggested in comment #75.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a35b2ac359e5
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.

Approval Request Comment
[Feature/regressing bug #]: Not a regression, was always wrong ;-)
[User impact if declined]: Unwanted extra spaces appear when CJK text is copied.
[Describe test coverage new/current, TreeHerder]: Automated test.
[Risks and why]: I don't consider it risky.
All we do is disable the line-breaker, a technique introduced in bug 1225864 and well tested in Thunderbird since TB 45.

Now talking about risk: Disabling the line breaker did uncover a crash in another code path, so bug 1274837 would need uplift, too.
[String/UUID change made/needed]: None.
Attachment #8750754 - Flags: approval-mozilla-aurora?
Jorg, we have this bug for 10 years, I am not sure we need to take an uplift for Firefox 48.
For Thunderbird, could you use only the relbranch?
We've already advanced that onto a branch for TB 45 ESR. Since so far we don't have release branches for our betas, it would be good to have it uplifted. Yes, broken for 10 years, so even better if it arrives six weeks earlier now ;-)
What we are going to do is to use release branches on mozilla-beta to land bugs such as this that we would like in our betas, for TB 47.0b1 and 48.0b1
Pushed to a release branch on mozilla-beta for Thunderbird 47.0b1
Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.

Too late now.
Attachment #8750754 - Flags: approval-mozilla-aurora?
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1
https://hg.mozilla.org/releases/mozilla-beta/rev/502af8ab3f06
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1 - take 2:
https://hg.mozilla.org/releases/mozilla-beta/rev/9a5abbdc1fb4
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: