Copying CJK text inserts unwanted spaces

RESOLVED FIXED in Firefox 49

Status

()

Core
Serializers
RESOLVED FIXED
11 years ago
a year ago

People

(Reporter: Seungbeom Kim, Assigned: Jorg K (GMT+2))

Tracking

Trunk
mozilla49
x86
Windows XP
Points:
---

Firefox Tracking Flags

(firefox49 fixed, thunderbird_esr38 wontfix, thunderbird_esr4546+ fixed)

Details

Attachments

(5 attachments, 9 obsolete attachments)

1.44 KB, text/html
Details
1.20 KB, text/html
Details
758 bytes, text/html
Details
6.92 KB, text/html
Details
4.14 KB, patch
Jorg K (GMT+2)
: review+
Details | Diff | Splinter Review
(Reporter)

Description

11 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

When I select and copy some text, some gratuitous spaces are also copied.

Reproducible: Always

Steps to Reproduce:
1. Visit the above URL
2. Copy the entire text (by Ctrl+A Ctrl+C, for example)
3. Open any text editor and paste the copied text

Actual Results:  
[The first words of each line]

수 숫단

소 녀는

그 러나

Expected Results:  
수숫단

소녀는

그러나
(Reporter)

Comment 1

11 years ago
The Korean characters got converted to numeric character references (&#nnnnn;) while posting, but still you get the idea.
Not quite sure what component this belongs in, but I'll guess DOM to Text for starters. Apparently it's a result of using align="justify", since if I remove that from the first paragraph, that one copies correctly while the remaining justified ones get an extra space.
Assignee: nobody → dom-to-text
Component: General → DOM to Text Conversion
Product: Firefox → Core
QA Contact: general
Version: unspecified → Trunk
Assignee: dom-to-text → nobody
QA Contact: dom-to-text
(Reporter)

Comment 3

6 years ago
Created attachment 523503 [details]
an example of the problem

substituting the broken link for the example
(Reporter)

Updated

6 years ago
(Reporter)

Comment 4

6 years ago
Can anybody confirm that this is still a problem in Firefox 4, or that this has been fixed? It still persists in Firefox 3.6.16. I don't know why it has stayed UNCONFIRMED so far. Did I have to do something more?

Comment 5

6 years ago
Is it the same bug as bug 673667 ?
(Reporter)

Comment 6

6 years ago
(In reply to Fanolian from comment #5)
> Is it the same bug as bug 673667 ?

I believe so. Thank you for reporting this.
Duplicate of this bug: 673667
Confirmed on latest Nightly 21 on Win 7. Seems that CJK characters are effected but pasting to pure-text editors have no problem.

(Pasting to MS Word 2013 with "Keep Text Only" option gives correct result as well)
Status: UNCONFIRMED → NEW
Ever confirmed: true
I grabbed the "text/html" clipboard target using this tool https://github.com/intuited/clipcli

The result is here http://people.mozilla.com/~kchen/333064.html

You can find that we added extra newlines in the paragraph. For Latin scripts this has visually no difference because HTML treat newlines as whitespaces. But for CJK scripts this will cause extra whitespace in between characters.

Comment 10

4 years ago
Can assign this bug to me?  I want to give it a try!
Assignee: nobody → arvin0731
Comment hidden (obsolete)
Comment hidden (obsolete)

Comment 13

3 years ago
Pase the copied rich text from Firefox to an HTML editor that doesn't change the source and then view the source (eg. any HTML element with contenteditable=true), we can see that there is a line break inserted between every 73 Chinese characters, which seems like a feature of Gecko. Hope the arbitary line break can be removed, or at least add an option to allow disabling it, since the space inserted contributes a significant semantic change for Chinese text.
Maybe something to look here:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsHTMLContentSerializer.cpp

or

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsPlainTextSerializer.cpp
I think the text get wrapped here:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLContentSerializer.cpp#175-204

And 72 is the default wrap column:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLContentSerializer.cpp#116-121

Comment 16

3 years ago
(In reply to Kan-Ru Chen [:kanru] from comment #15)
> I think the text get wrapped here:
> 
>  
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#175-204
> 
> And 72 is the default wrap column:
> 
>  
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#116-121

Thank you very much... Could we add a config setting to overwrite the behavior?
I can reproduce the test case in bug 673667 comment 0. There are vairous comments in this bug talking about line breaks in HTML source code being rendered as white space between CJK characters. That is NOT a bug and certainly not this bug.
Created attachment 8747064 [details]
Reduced test case from bug 673667 comment 0
mMaxColumn = 72 was there before we switch from CVS to Hg!

https://hg.mozilla.org/mozilla-central/annotate/dfff608ebad0/content/base/src/nsHTMLContentSerializer.cpp#l117
XMLSerializer and node.innerHTML/outerHTML are not affected.
Bug 524975 seems to be describing the same issue. I am not sure if we should dup.
See Also: → bug 524975
(Assignee)

Comment 22

a year ago
Created attachment 8748800 [details]
Here a simple web page showing the problem.

Looking at bug 524975 and looking at what that bug landed:
http://hg.mozilla.org/mozilla-central/rev/4ae32b827a7f#l1.24
we could possibly use the new serialiser flag introduced in bug 1225864:
OutputDisallowLineBreaking.

Updated

a year ago
See Also: → bug 606747

Comment 23

a year ago
Created attachment 8748858 [details]
a real world Japanese paragraph copy & paste shows the problem.

I am chiming in with my two cent worth.
With FF 45.x, I see unwanted spaces inserted into text when copy&paste operation is performed for Japanese text.
(I suspect this problem manifests for other Asian languages such as Chinese, Korean and Thai. I say "Thai" because I noticed the  Bug 606747 - add extra spaces when copy/paste thai language.)

Here I could create a very clear-cut test case where you can try the
copy&paste of the Japanese text and see where the unintended/unwanted spaces are inserted. (See attached). This is more or less from a real text and not as simple as Jorg's sample. 

The unintended/unwanted spaces are very embarrassing and problematic
in business/formal document and not good for business processing where web interface is often used these days!

The problem was first noticed in Thunderbird HTML message composition, but found to be DOM-related (I think).

See https://groups.google.com/forum/#!topic/mozilla.dev.apps.thunderbird/6juKnWvrm-Y

BTW, this problem happens only with PASTE *WITH* FORMATTING, and not with paste *without* formatting.

TIA
(Assignee)

Comment 24

a year ago
(In reply to ISHIKAWA, Chiaki from comment #23)
> In the pasted text, you will note some extra spaces inserted. Look carefully.
Well, my Korean text is simpler, you don't have to look carefully, it tells you where the spaces are inserted ;-)

Comment 25

a year ago
(In reply to Jorg K (GMT+2) from comment #24)
> (In reply to ISHIKAWA, Chiaki from comment #23)
> > In the pasted text, you will note some extra spaces inserted. Look carefully.
> Well, my Korean text is simpler, you don't have to look carefully, it tells
> you where the spaces are inserted ;-)

Yes, I noticed that :-) I just wanted to show that the space(s) do creep in and may not be so visible in real world document when one is in a rush, but on the receiving side, when one reads a document carefully, it *IS* noticeable and the reader may think the writer/sender is a careless writer (oh well)! 
[I am embarrassed that there is a typo: "embarassing". ]

At least, I made sure that my HTML page passed W3C's validator with flying colours!

Thank you again for looking into this issue.
(Assignee)

Comment 26

a year ago
Hmm, I played around with the encoder flags here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.cpp#102
removing OutputPreformatted (since it makes little sense together with OutputRaw, see:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsIDocumentEncoder.idl#76
(XXXbz How does this interact with OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
and also added OutputDisallowLineBreaking as suggested in comment #22. Nothing helped.

Needs more investigation, starting here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#199
(mWrapColumn = 72;)
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#1126
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXHTMLContentSerializer.cpp#76
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#119

Looks like the wrap after 72 characters is deeply engrained ;-(

Most likely we'll need to tweak one of these:
https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
	mozilla/dom/base/nsXMLContentSerializer.cpp
119 	mMaxColumn = 72;
122 	mMaxColumn = aWrapColumn;
1345 	mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
1476 	if (mDoWrap && mColPos + 1 >= mMaxColumn) {
1552 	} while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
1739 	if (mColPos >= mMaxColumn) { 
Note that since we have OutputRaw, mDoWrap is false:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLContentSerializer.cpp#114
My other question would be why do we have a "wrap at 72 char" logic at first place? I don't see any spec on HTML/XML serialization mentioning the magic number 72 and/or like "the serialized DOM *may* be wrapped at 72 char" or something.

I don't have access to CVS history to see why it was added either.
(In reply to Jorg K (GMT+2) from comment #26)
> Hmm, I played around with the encoder flags here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.
> cpp#102
> removing OutputPreformatted (since it makes little sense together with
> OutputRaw, see:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsIDocumentEncoder.idl#76
> (XXXbz How does this interact with
> OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
> and also added OutputDisallowLineBreaking as suggested in comment #22.
> Nothing helped.
> 
> Needs more investigation, starting here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#199
> (mWrapColumn = 72;)
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#1126
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXHTMLContentSerializer.cpp#76
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXMLContentSerializer.cpp#119
> 
> Looks like the wrap after 72 characters is deeply engrained ;-(
> 
> Most likely we'll need to tweak one of these:
> https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
> 	mozilla/dom/base/nsXMLContentSerializer.cpp
> 119 	mMaxColumn = 72;
> 122 	mMaxColumn = aWrapColumn;
> 1345 	mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
> 1476 	if (mDoWrap && mColPos + 1 >= mMaxColumn) {
> 1552 	} while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
> 1739 	if (mColPos >= mMaxColumn) { 
> Note that since we have OutputRaw, mDoWrap is false:
> https://dxr.mozilla.org/mozilla-central/source/dom/base/
> nsXMLContentSerializer.cpp#114

Thank you for document the code path in question. It's very valuable.

As opposed to keep trying these nsIDocumentEncoder flags, after reading the comment and the part of the code you referenced (and the IDL), I have come to the conclusion that we should simply dive into nsXMLContentSerializer, and identify the instruction that adds the new line, and somehow ensure we don't get into that condition when the OutputRaw flag is set. Since, IMHO, the behavior simply violates the description of "OutputRaw" as described in the IDL.

What do you think?
(Assignee)

Comment 29

a year ago
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27)
> My other question would be why do we have a "wrap at 72 char" logic at first
> place?
Remember Netscape? The browser/mail client/HTML editor? It's a hang over from the quirky old past. The bug is from 2005, that's about the year Netscape was split into Firefox and Thunderbird. Much of the serialisation code was written to cater for e-mail formatting.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
> What do you think?
Sure, I'll debug it in the next few days. Note that OutputDisallowLineBreaking may be important, since ASCII strings are not broken in the middle, whereas CJK (Chinese, Japanese, Korean) strings can always be broken. I found out the hard way when fixing the "Asian crisis" in Thunderbird (bug 1225864 and others).

(Can I request that you don't quote/reply to entire comments and only reply to a specific line/section. We don't need duplicated content.)
(Assignee)

Comment 30

a year ago
I was right: Disallowing line breaking for CJK text here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#116
fixes the problem as predicted. I just have to see how to pass it in since there seems to be more than one call to nsXMLContentSerializer::Init(). Can't be long now.
(In reply to Jorg K (GMT+2) from comment #30)
> Can't be long now.

Nice! I was thinking about working on this in my spare hours but I will leave you to it.

(In reply to Jorg K (GMT+2) from comment #29)
> (Can I request that you don't quote/reply to entire comments and only reply
> to a specific line/section. We don't need duplicated content.)

Sure.
(Assignee)

Comment 32

a year ago
Created attachment 8749166 [details] [diff] [review]
Possible solution (v1).

This fixes the problem. Tested on attachment 8748800 [details].

Now comes the usual: Try run and writing a test for it. Here the try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c766e41514f2
Assignee: arvin0731 → mozilla
Status: NEW → ASSIGNED
(Assignee)

Comment 33

a year ago
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
"Raw" already worked, but the path that needed fixing wasn't the path that used "raw".

Chiaki-san: I tried the patch in TB and it works great there. No more unwanted spaces pasted.
(Assignee)

Updated

a year ago
Summary: Copying text copies gratuitous spaces → Copying CJK text inserts unwanted spaces

Comment 34

a year ago
(In reply to Jorg K (GMT+2) from comment #33)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment
> #28)
> > Since, IMHO, the behavior simply violates the description of "OutputRaw"
> > as described in the IDL.
> "Raw" already worked, but the path that needed fixing wasn't the path that
> used "raw".
> 
> Chiaki-san: I tried the patch in TB and it works great there. No more
> unwanted spaces pasted.

Dear Jorg, 
Thank you, isn't this great? So once your patch lands in M-C, and C-C picks it up, the TB will see the original issue I saw in TB resolved. How nice.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27) 
> I don't have access to CVS history to see why it was added either.

BTW, is bonsai no longer available? There used to be a web server that allows us to look int pre-Mercurial CVS history.
(Assignee)

Comment 35

a year ago
Created attachment 8749425 [details] [diff] [review]
Possible solution (v1) with test.

Neil, would you be able to review this?
Ehsan is busy and doesn't want to do reviews and Masayuki will tell me that he doesn't have review powers in dom/base/ (*).

This is making use of the serialiser flag nsIDocumentEncoder::OutputDisallowLineBreaking created for exactly this purpose in bug 1225864 to avoid shredding through CJK strings since they can be broken anywhere for screen display. But they can't be broken if I want to turn DOM into text.

Also, would you like to see another try run with the new test? I have a try run without it in comment #32 and that came out green. The test works locally and it fails locally with the C++ part removed.

(*) On second thought, Masayuki reviewed bug 1225864 which was also in the serialisers in dom/base/ but in bug 233705 comment #53 he said:
  I don't have rights to review dom/base/nsPlainTextSerializer.*
So I don't know.
Attachment #8749166 - Attachment is obsolete: true
Attachment #8749425 - Flags: review?(enndeakin)
(Assignee)

Comment 36

a year ago
Created attachment 8749535 [details] [diff] [review]
Possible solution (v1) with test (v1a).

Changed the test, the <div> containing the text doesn't need to be editable.
Attachment #8749425 - Attachment is obsolete: true
Attachment #8749425 - Flags: review?(enndeakin)
Attachment #8749535 - Flags: review?(enndeakin)
Comment on attachment 8749535 [details] [diff] [review]
Possible solution (v1) with test (v1a).

>+  div.focus();

The div isn't focusable so this doesn't do anything.


>+  // Copy to clipboard.
>+  var webnav = SpecialPowers.wrap(window).QueryInterface(SpecialPowers.Ci.nsIInterfaceRequestor)
>+                     .getInterface(SpecialPowers.Ci.nsIWebNavigation)
>+  var docShell = webnav.QueryInterface(SpecialPowers.Ci.nsIDocShell);
>+  var documentViewer = docShell.contentViewer
>+                               .QueryInterface(SpecialPowers.Ci.nsIContentViewerEdit);
>+  documentViewer.copySelection();
>+  var clipboard = SpecialPowers.Services.clipboard;
>+
>+  var clipboardData = SpecialPowers.wrap(getClipboardData("text/html", clipboard));
>+  var value = clipboardData.value.QueryInterface(SpecialPowers.Ci.nsISupportsString).data;
>+

It would be much better to just synthesize a shortcut key event with synthesizeKey() and listen for the copy or paste event than doing this low-level stuff.

The above code might not work on e10s or linux either since it doesn't wait for the native clipboard. You can just use SimpleTest.waitForClipboard instead. I think you want a try run for this, yes.
Attachment #8749535 - Flags: review?(enndeakin) → review-
(Assignee)

Comment 38

a year ago
Created attachment 8749976 [details] [diff] [review]
WIP - not working

Neil, I'm having a hard time changing the test to using waitForClipboard.

This seems to be undocumented and I can't find a good example to copy from.

I copied from
https://dxr.mozilla.org/mozilla-aurora/source/editor/libeditor/tests/test_bug603556.html#27
Note that I don't need the paste, since the problem happens at copy time.

Like this, the test always succeeds, even without the actual code fix (see patch).

I have no idea what clipboard flavour it tests to decide on whether the copy succeeded or failed. Console says:
TEST-PASS | dom/base/test/test_bug333064.html | Clipboard has the given value

So perhaps it was comparing the text/plain flavour.

I've also seen that waitForClipboard can have more parameters(?):
https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/test_bug1170911.html#42
But I wouldn't know how to use them and appending "text/html" as fifth parameter gives:
Timed out while polling clipboard for pasted data.

Also, that must use a function different to the one here:
https://dxr.mozilla.org/mozilla-aurora/source/devtools/client/inspector/test/head.js#723
The first parameter can be a comparison function. But once again, a simple string gets passed in here and that seems to be the text/plain flavour.

<offtopic>
Knowing what the solution to this bug is took me 30 seconds, poking around to see where to put this solution, that is finding the one line in 7 million code lines that make up Mozilla, took me 1-2 hours. And trying to put a test together has taken a multiple of this time and is still not finished.

So you're help to finish off this bug would be much appreciated.
</off-topic>
Flags: needinfo?(enndeakin)
(In reply to Jorg K (PTO during summer, NI me) from comment #38)
> I've also seen that waitForClipboard can have more parameters(?):
> https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/
> test_bug1170911.html#42
> But I wouldn't know how to use them and appending "text/html" as fifth
> parameter gives:
> Timed out while polling clipboard for pasted data.
> 

I think it only handles text/unicode, but that should be available for this test, right?

> <offtopic>
> Knowing what the solution to this bug is took me 30 seconds, poking around
> to see where to put this solution, that is finding the one line in 7 million
> code lines that make up Mozilla, took me 1-2 hours. And trying to put a test
> together has taken a multiple of this time and is still not finished.
> 
> So you're help to finish off this bug would be much appreciated.
> </off-topic>

If you're having trouble just check if the test works on try as is and that's fine too. I'd rather avoid direct nsITransferable use if possible but I'd also rather developers not waste time just for tests.
Flags: needinfo?(enndeakin)
(Assignee)

Comment 40

a year ago
The waitForClipboard seems much simpler, but the clipboard data that is tested or passed to the test is correct even without the code change.

The problem is the "test/html" flavour. That runs trough a faulty serialiser and the CJK string gets chopped.

So is there any way to easily get to the "test/html" flavour in this setup?

Where is the definition of waitForClipboard that takes more than four parameters?
Flags: needinfo?(enndeakin)
(Assignee)

Comment 41

a year ago
Created attachment 8750106 [details] [diff] [review]
Possible solution (v1) with test (v2).

Here we have the best of both worlds ;-)

Using waitForClipboard and checking the "text/html" manually.

Fails without the code fix, works with the code fix. Added bonus: "text/plain" is also tested. What do you think? Try run coming up.
Attachment #8749535 - Attachment is obsolete: true
Attachment #8749976 - Attachment is obsolete: true
Attachment #8750106 - Flags: review?(enndeakin)
(Assignee)

Comment 42

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1feca83985b
(Assignee)

Comment 43

a year ago
BTW, at least on Linux this new test runs as part of M(1), and it's green ;-)
(Assignee)

Comment 44

a year ago
OK, M(1) and M-e10s(1) are green everywhere.
It looks like there is a bug on the last line of waitForClipboard. It passes 'text/unicode' instead of 'requestedFlavor'.

It would be ideal if we could get this to work. Then, you can just check the data in the first argument to waitForClipboard:

waitForClipboard(function (data) { return data == expectedData }, ...
Flags: needinfo?(enndeakin)
(Assignee)

Comment 46

a year ago
Created attachment 8750295 [details] [diff] [review]
Possible solution (v1) with test (v3).

This test works now. I've I take the C++ hunk out, I get this failure:
Timed out while polling clipboard for pasted data.

I don't get something along the lines of "unexpected value".

You pointed out that
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/SimpleTest.js#973
is questionable. Well, I've changed the
  }, aFailureFn, "text/unicode");
to
  }, aFailureFn, requestedFlavor);
and it didn't make a difference.

In any case, I dumped out the value passed in to the comparison function and it contains <html><body> ... (160 Korean characters) ... etc.

So the test works and the code works although, as mentioned above, if I remove the code change I get a strange test failure that will be hard to understand should it ever arise.

So what do you suggest?

Another try run here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f28f021bd21
Attachment #8750295 - Flags: review?(enndeakin)
(Assignee)

Comment 47

a year ago
Further to what I said:
I don't even need to pull out the C++ change to create a test failure.
If I change
  return value.includes("html") && value.includes("안".repeat(160));
to
  return value.includes("htmlhuhu") && value.includes("안".repeat(160));
I get a failure with:
Timed out while polling clipboard for pasted data.

Is that expected? I find it mildly confusing.
(Assignee)

Comment 48

a year ago
Hmm, fails on Linux on M(1) and M-e10s(1): Timed out while polling clipboard for pasted data.
Maybe the clipboard is different and doesn't have the "html"??
(And I was silly to run all Mochitests instead of just (1).)
The way that waitForClipboard works is that it puts some dummy data on the clipboard, calls the supplied initializer function to put what you want on the clipboard, then keeps polling the clipboard until validatorFn returns true.

Without the C++ change, it will fail and time out.

With the change, it should succeed but not necessarily right away.

Without the change I suggested in comment 45, the validatorFn should be passed the text/unicode data, which should succeed even without your changes, as the plain text doesn't have the line-breaking issue. Is this not the case?
(Assignee)

Comment 50

a year ago
(In reply to Neil Deakin from comment #49)
> The way that waitForClipboard works is that it puts some dummy data on the
> clipboard, calls the supplied initializer function to put what you want on
> the clipboard, then keeps polling the clipboard until validatorFn returns
> true.
> Without the C++ change, it will fail and time out.
OK, that explains the time out.

> With the [C++] change, it should succeed but not necessarily right away.
Yes, with the C++ hunk in place, it works on Windows, but apparently not on other platforms, see comment #49 above.

> Without the change I suggested in comment 45, the validatorFn should be
> passed the text/unicode data, which should succeed even without your
> changes, as the plain text doesn't have the line-breaking issue. Is this not
> the case?
Without the change you suggested in comment 45, I see <html><body> ... in the validator function.
The test as published in attachment 8750295 [details] [diff] [review] works locally on Windows (but apparently not on other platforms, see comment #49 above). So I get passed the HTML. Let me dump it out for you:
<html><body>
<!--StartFragment--><div id="korean-text"> ... 160 expected characters ... </div><!--EndFragment-->
</body>
</html>
(Assignee)

Comment 51

a year ago
Yes, the text/unicode flavour doesn't have the line breaking problem. That's what I got when I didn't ask for "text/html" as fifth parameter.
(Assignee)

Comment 52

a year ago
Looking at waitForClipboard again, I can't see an error:
    // First we wait for a known value different from the expected one.
    var preExpectedVal = initialVal;
    SpecialPowers.clipboardCopyString(preExpectedVal);
    wait(function(aData) { return aData  == preExpectedVal; },
         function() {
           // Call the original setup fn
           aSetupFn();
           wait(inputValidatorFn, aSuccessFn, aFailureFn, requestedFlavor); <<===
         }, aFailureFn, "text/unicode");
}

The first wait call passes in some initial value, a simplified comparison function and selects the "text/unicode" flavour. The subsequent inner call does request the correct flavour.

In any case, my test isn't working on Linux, maybe also not on Mac, most likely because the only Windows has the <html><body> stuff, as we know from other bugs and as can be seen here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/test/test_bug166235.html#74

In conclusion: I will cancel the try run and rerun it with a different test.
OK, I looked in more detail at this code. The waitForClipboard function is actually correct as is; it is just very confusing code. The 'text/unicode' reference is only for the initial dummy data check. For other checks it does indeed use the passed in flavor. So ignore that change I suggested.

Note that on Mac, I just get as the passed data:

<div id="korean-text">안안...</div>

You probably want to check specially for the div element with 'korean-text' as an id.
(Assignee)

Comment 54

a year ago
Created attachment 8750328 [details] [diff] [review]
Possible solution (v1) with test (v4).

Test that should work on all platforms.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=458299fbc89e
Attachment #8750106 - Attachment is obsolete: true
Attachment #8750295 - Attachment is obsolete: true
Attachment #8750106 - Flags: review?(enndeakin)
Attachment #8750295 - Flags: review?(enndeakin)
Attachment #8750328 - Flags: review?(enndeakin)
(Assignee)

Comment 55

a year ago
Wires crossed here. I'll do another one.
(Assignee)

Comment 56

a year ago
Created attachment 8750337 [details] [diff] [review]
Possible solution (v1) with test (v5).

OK, here we have a test that checks for "korean-text" (that proves that we got the HTML data) and the string itself.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0fc0798bbf4
Attachment #8750328 - Attachment is obsolete: true
Attachment #8750328 - Flags: review?(enndeakin)
Attachment #8750337 - Flags: review?(enndeakin)
(Assignee)

Comment 57

a year ago
Created attachment 8750340 [details] [diff] [review]
Possible solution (v1) with test (v5b).

(this time without the dump().)
Attachment #8750337 - Attachment is obsolete: true
Attachment #8750337 - Flags: review?(enndeakin)
Attachment #8750340 - Flags: review?(enndeakin)
Attachment #8750340 - Flags: review?(enndeakin) → review+
(Assignee)

Comment 58

a year ago
Thanks Neil!

Dear Sheriff, try run in comment #56.
Keywords: checkin-needed
(Assignee)

Comment 59

a year ago
Kent, this is a patch to fix unwanted spaces in copy/paste of CJK text.
It uses the same technique employed to fix these problems in e-mail. In fact, it builds on our solution from bug 1225864 (see comment #22). So our tweak has finally proven useful for FF as well ;-)

Now that we can add tracking flags to M-C bugs, do you still want to duplicate the bug like we did before?

I recommend inclusion in the next TB 45.x release, the patch has no risk since - as I said - it uses a well understood technique. Chiaki alerted me to the problem on dev-apps-thunderbird@lists.mozilla.org.
status-thunderbird_esr45: --- → affected
tracking-thunderbird_esr45: --- → ?
Flags: needinfo?(rkent)
We don't need to create separate TB bugs if we can set the needed flags in Core bugs. This should have a reasonable chance of uplifting into TB branches - but it should first make the proposed "release candidate" beta proposed for next week. Unfortunately there is no way for us to run his through TB aurora without uplifting to m-aurora.
Flags: needinfo?(rkent)

Comment 61

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ae676e82fac
Keywords: checkin-needed

Comment 62

a year ago
What will the copied source be after the patch? I mean, will the copied source code still be line-breaked on every 72 chars? And will it happen on all languages?
(Assignee)

Comment 63

a year ago
This patch affects the "text/html" flavour of the clipboard after copying. The only change that was made is that long CJK strings are now no longer broken since that lead to the insertion of unwanted spaces.

Long non-ASCII string with "European" characters were already not broken, so copying a long string of á was treated differently to a long string of 안.

Otherwise the formatting hasn't changed so there is no need for any concern.
(Assignee)

Comment 64

a year ago
Oh, I see you raised bug 673667. This is exactly the issue we fixed here.

Comment 65

a year ago
So there will still be a line-break at, say, white spaces if the string exceeds 72 chars?

An additional question is what is the actually rule to determine "CJK string"? Will full-shape symbols, such as full-shape spaces or CJK punctuations (e.g. ,。、;:○『』 ...) be counted as "CJK string" and be non-broken?
backed out due to test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=27556616&repo=mozilla-inbound
Flags: needinfo?(mozilla)
(Assignee)

Comment 67

a year ago
Yes, CJK punctuation counts as CJK string.

Tomcat, sorry, will disable the text on Android.
Flags: needinfo?(mozilla)

Comment 68

a year ago
I wonder whether it would be better to add an user preference to determine whether to break the copied HTML source, since that in some circumstances the user may not want the auto line break of HTML source at all.
(Assignee)

Comment 69

a year ago
Neil, which one should it be?

skip-if = toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'mulet' || buildapp == 'b2g' || toolkit == 'android'
skip-if = os == 'android'

Or

skip-if = (os != 'linux') && (os != 'mac') && (os != 'windows')

Could you pre-approve a patch with the line of your choice added ;-) ?
(Assignee)

Updated

a year ago
Flags: needinfo?(enndeakin)

Comment 70

a year ago
... Or a preference to customize the length to break the string instead of using a magic number 72.

Comment 71

a year ago
Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/620a79a39975
(Assignee)

Comment 72

a year ago
Danny: We're fixing the serialisation of CJK strings in this bug, that is, turning the browsers document object model (DOM) into text.

There is no option required. Besides: What would you like this option to be called?
erroneously-insert-spaces-into-copied-cjk-strings with default set to 'false'?

If you have concerns, please wait until this bug has landed, then try a Nightly compile, and if you have a problem, raise another bug.

This bug here is done, only landing the patch caused some internal test failures which we will address.

Comment 73

a year ago
I agree that fixing this issue does not require adding options.

I was somehow talking another (related) issue, that the feature of automatically line break might be a non-necessary issue, and if not, a potential solution is to remove the automatically line break functionality.

If this functionality is still considered necessary for some reason, consider the fact that one may not want this functionality in some situation, we may add a preference such as "chars-to-break-html-source-code-on-copying" with 72 (or maybe -1) as default and -1 meaning never break.
(Assignee)

Comment 74

a year ago
That's an enhancement request for another bug.
Probably:

  skip-if = toolkit == 'android'
Flags: needinfo?(enndeakin)
(Assignee)

Comment 76

a year ago
Created attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.
Attachment #8750340 - Attachment is obsolete: true
Attachment #8750754 - Flags: review+
(Assignee)

Comment 77

a year ago
OK, Android test disabled as Neil suggested in comment #75.
Keywords: checkin-needed

Comment 78

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/a35b2ac359e5
Keywords: checkin-needed

Comment 79

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a35b2ac359e5
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Checked in http://hg.mozilla.org/releases/mozilla-esr45/rev/25a069f8002f THUNDERBIRD_45_VERBRANCH
status-thunderbird_esr38: --- → wontfix
status-thunderbird_esr45: affected → fixed
tracking-thunderbird_esr45: ? → 46+
(Assignee)

Comment 81

a year ago
Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.

Approval Request Comment
[Feature/regressing bug #]: Not a regression, was always wrong ;-)
[User impact if declined]: Unwanted extra spaces appear when CJK text is copied.
[Describe test coverage new/current, TreeHerder]: Automated test.
[Risks and why]: I don't consider it risky.
All we do is disable the line-breaker, a technique introduced in bug 1225864 and well tested in Thunderbird since TB 45.

Now talking about risk: Disabling the line breaker did uncover a crash in another code path, so bug 1274837 would need uplift, too.
[String/UUID change made/needed]: None.
Attachment #8750754 - Flags: approval-mozilla-aurora?
Jorg, we have this bug for 10 years, I am not sure we need to take an uplift for Firefox 48.
For Thunderbird, could you use only the relbranch?
(Assignee)

Comment 83

a year ago
We've already advanced that onto a branch for TB 45 ESR. Since so far we don't have release branches for our betas, it would be good to have it uplifted. Yes, broken for 10 years, so even better if it arrives six weeks earlier now ;-)
What we are going to do is to use release branches on mozilla-beta to land bugs such as this that we would like in our betas, for TB 47.0b1 and 48.0b1
Pushed to a release branch on mozilla-beta for Thunderbird 47.0b1
(Assignee)

Comment 86

a year ago
Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.

Too late now.
Attachment #8750754 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 87

a year ago
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1
https://hg.mozilla.org/releases/mozilla-beta/rev/502af8ab3f06
(Assignee)

Comment 88

a year ago
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1 - take 2:
https://hg.mozilla.org/releases/mozilla-beta/rev/9a5abbdc1fb4
You need to log in before you can comment on or make changes to this bug.