333064 - Copying CJK text inserts unwanted spaces

Reporter

Description

•

18 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

When I select and copy some text, some gratuitous spaces are also copied.

Reproducible: Always

Steps to Reproduce:
1. Visit the above URL
2. Copy the entire text (by Ctrl+A Ctrl+C, for example)
3. Open any text editor and paste the copied text

Actual Results:  
[The first words of each line]

&#49688; &#49707;&#45800;

&#49548; &#45376;&#45716;

&#44536; &#47084;&#45208;

Expected Results:  
&#49688;&#49707;&#45800;

&#49548;&#45376;&#45716;

&#44536;&#47084;&#45208;

Seungbeom Kim

Reporter

Comment 1

•

18 years ago

The Korean characters got converted to numeric character references (&#nnnnn;) while posting, but still you get the idea.

Phil Ringnalda (:philor)

Comment 2

•

18 years ago

Not quite sure what component this belongs in, but I'll guess DOM to Text for starters. Apparently it's a result of using align="justify", since if I remove that from the first paragraph, that one copies correctly while the remaining justified ones get an extra space.

Assignee: nobody → dom-to-text

Component: General → DOM to Text Conversion

Product: Firefox → Core

QA Contact: general

Version: unspecified → Trunk

Phil Ringnalda (:philor)

Updated

•

15 years ago

Assignee: dom-to-text → nobody

QA Contact: dom-to-text

Seungbeom Kim

Reporter

Comment 3

•

13 years ago

Attached file an example of the problem — Details

substituting the broken link for the example

Seungbeom Kim

Reporter

Updated

•

13 years ago

URL: http://musiphil.bawi.org/test/space.html

Seungbeom Kim

Reporter

Comment 4

•

13 years ago

Can anybody confirm that this is still a problem in Firefox 4, or that this has been fixed? It still persists in Firefox 3.6.16. I don't know why it has stayed UNCONFIRMED so far. Did I have to do something more?

Fanolian

Comment 5

•

13 years ago

Is it the same bug as bug 673667 ?

Seungbeom Kim

Reporter

Comment 6

•

13 years ago

(In reply to Fanolian from comment #5)
> Is it the same bug as bug 673667 ?

I believe so. Thank you for reporting this.

Peter Pin-Guang Chen [:petercpg] (MozTW.org)

Comment 8

•

11 years ago

Confirmed on latest Nightly 21 on Win 7. Seems that CJK characters are effected but pasting to pure-text editors have no problem.

(Pasting to MS Word 2013 with "Keep Text Only" option gives correct result as well)

Status: UNCONFIRMED → NEW

Ever confirmed: true

Kan-Ru Chen [:kanru] (UTC+9)

Comment 9

•

11 years ago

I grabbed the "text/html" clipboard target using this tool https://github.com/intuited/clipcli

The result is here http://people.mozilla.com/~kchen/333064.html

You can find that we added extra newlines in the paragraph. For Latin scripts this has visually no difference because HTML treat newlines as whitespaces. But for CJK scripts this will cause extra whitespace in between characters.

ArvinH

Comment 10

•

11 years ago

Can assign this bug to me?  I want to give it a try!

Kan-Ru Chen [:kanru] (UTC+9)

Updated

•

11 years ago

Assignee: nobody → arvin0731

Comment hidden (obsolete)

I'd create another testcase for the bug,

http://jsfiddle.net/7kvGb/
verify on Firefox 22 + OS X 10.8.4

We put some line brake in <p> of an html, while it rendered as a continence paragraph at website, if we copy the rendered content and paste to other place,
the line brake in HTML will turn into a space, and cause un-wanted space character.

Comment hidden (obsolete)

Please ignore my last comment, I found that I'd mis-understand the bug.

Danny Lin

Comment 13

•

10 years ago

Pase the copied rich text from Firefox to an HTML editor that doesn't change the source and then view the source (eg. any HTML element with contenteditable=true), we can see that there is a line break inserted between every 73 Chinese characters, which seems like a feature of Gecko. Hope the arbitary line break can be removed, or at least add an option to allow disabling it, since the space inserted contributes a significant semantic change for Chinese text.

Kan-Ru Chen [:kanru] (UTC+9)

Comment 14

•

10 years ago

Maybe something to look here:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsHTMLContentSerializer.cpp

or

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsPlainTextSerializer.cpp

Kan-Ru Chen [:kanru] (UTC+9)

Comment 15

•

10 years ago

I think the text get wrapped here:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLContentSerializer.cpp#175-204

And 72 is the default wrap column:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLContentSerializer.cpp#116-121

Danny Lin

Comment 16

•

10 years ago

(In reply to Kan-Ru Chen [:kanru] from comment #15)
> I think the text get wrapped here:
> 
>  
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#175-204
> 
> And 72 is the default wrap column:
> 
>  
> http://dxr.mozilla.org/mozilla-central/source/content/base/src/
> nsXMLContentSerializer.cpp#116-121

Thank you very much... Could we add a config setting to overwrite the behavior?

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 17

•

8 years ago

I can reproduce the test case in bug 673667 comment 0. There are vairous comments in this bug talking about line breaks in HTML source code being rendered as white space between CJK characters. That is NOT a bug and certainly not this bug.

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 18

•

8 years ago

Attached file Reduced test case from bug 673667 comment 0 — Details

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 19

•

8 years ago

mMaxColumn = 72 was there before we switch from CVS to Hg!

https://hg.mozilla.org/mozilla-central/annotate/dfff608ebad0/content/base/src/nsHTMLContentSerializer.cpp#l117

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 20

•

8 years ago

XMLSerializer and node.innerHTML/outerHTML are not affected.

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 21

•

8 years ago

Bug 524975 seems to be describing the same issue. I am not sure if we should dup.

Comment 22

•

8 years ago

Attached file Here a simple web page showing the problem. — Details

Looking at bug 524975 and looking at what that bug landed:
http://hg.mozilla.org/mozilla-central/rev/4ae32b827a7f#l1.24
we could possibly use the new serialiser flag introduced in bug 1225864:
OutputDisallowLineBreaking.

ISHIKAWA, Chiaki

Updated

•

8 years ago

Comment 23

•

8 years ago

Attached file a real world Japanese paragraph copy & paste shows the problem. — Details

I am chiming in with my two cent worth.
With FF 45.x, I see unwanted spaces inserted into text when copy&paste operation is performed for Japanese text.
(I suspect this problem manifests for other Asian languages such as Chinese, Korean and Thai. I say "Thai" because I noticed the  Bug 606747 - add extra spaces when copy/paste thai language.)

Here I could create a very clear-cut test case where you can try the
copy&paste of the Japanese text and see where the unintended/unwanted spaces are inserted. (See attached). This is more or less from a real text and not as simple as Jorg's sample. 

The unintended/unwanted spaces are very embarrassing and problematic
in business/formal document and not good for business processing where web interface is often used these days!

The problem was first noticed in Thunderbird HTML message composition, but found to be DOM-related (I think).

See https://groups.google.com/forum/#!topic/mozilla.dev.apps.thunderbird/6juKnWvrm-Y

BTW, this problem happens only with PASTE *WITH* FORMATTING, and not with paste *without* formatting.

TIA

Jorg K (CEST = GMT+2)

Assignee

Comment 24

•

8 years ago

(In reply to ISHIKAWA, Chiaki from comment #23)
> In the pasted text, you will note some extra spaces inserted. Look carefully.
Well, my Korean text is simpler, you don't have to look carefully, it tells you where the spaces are inserted ;-)

ISHIKAWA, Chiaki

Comment 25

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #24)
> (In reply to ISHIKAWA, Chiaki from comment #23)
> > In the pasted text, you will note some extra spaces inserted. Look carefully.
> Well, my Korean text is simpler, you don't have to look carefully, it tells
> you where the spaces are inserted ;-)

Yes, I noticed that :-) I just wanted to show that the space(s) do creep in and may not be so visible in real world document when one is in a rush, but on the receiving side, when one reads a document carefully, it *IS* noticeable and the reader may think the writer/sender is a careless writer (oh well)! 
[I am embarrassed that there is a typo: "embarassing". ]

At least, I made sure that my HTML page passed W3C's validator with flying colours!

Thank you again for looking into this issue.

Jorg K (CEST = GMT+2)

Assignee

Comment 26

•

8 years ago

Hmm, I played around with the encoder flags here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.cpp#102
removing OutputPreformatted (since it makes little sense together with OutputRaw, see:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsIDocumentEncoder.idl#76
(XXXbz How does this interact with OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
and also added OutputDisallowLineBreaking as suggested in comment #22. Nothing helped.

Needs more investigation, starting here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#199
(mWrapColumn = 72;)
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsDocumentEncoder.cpp#1126
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXHTMLContentSerializer.cpp#76
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#119

Looks like the wrap after 72 characters is deeply engrained ;-(

Most likely we'll need to tweak one of these:
https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
	mozilla/dom/base/nsXMLContentSerializer.cpp
119 	mMaxColumn = 72;
122 	mMaxColumn = aWrapColumn;
1345 	mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
1476 	if (mDoWrap && mColPos + 1 >= mMaxColumn) {
1552 	} while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
1739 	if (mColPos >= mMaxColumn) { 
Note that since we have OutputRaw, mDoWrap is false:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLContentSerializer.cpp#114

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 27

•

8 years ago

My other question would be why do we have a "wrap at 72 char" logic at first place? I don't see any spec on HTML/XML serialization mentioning the magic number 72 and/or like "the serialized DOM *may* be wrapped at 72 char" or something.

I don't have access to CVS history to see why it was added either.

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 28

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #26)
> Hmm, I played around with the encoder flags here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsCopySupport.
> cpp#102
> removing OutputPreformatted (since it makes little sense together with
> OutputRaw, see:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsIDocumentEncoder.idl#76
> (XXXbz How does this interact with
> OutputFormatted/OutputRaw/OutputPreformatted/OutputFormatFlowed?)
> and also added OutputDisallowLineBreaking as suggested in comment #22.
> Nothing helped.
> 
> Needs more investigation, starting here:
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#199
> (mWrapColumn = 72;)
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsDocumentEncoder.cpp#1126
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXHTMLContentSerializer.cpp#76
> https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/
> nsXMLContentSerializer.cpp#119
> 
> Looks like the wrap after 72 characters is deeply engrained ;-(
> 
> Most likely we'll need to tweak one of these:
> https://dxr.mozilla.org/mozilla-central/search?q=mMaxColumn
> 	mozilla/dom/base/nsXMLContentSerializer.cpp
> 119 	mMaxColumn = 72;
> 122 	mMaxColumn = aWrapColumn;
> 1345 	mIndent.Length() >= uint32_t(mMaxColumn) - MIN_INDENTED_LINE_LENGTH) {
> 1476 	if (mDoWrap && mColPos + 1 >= mMaxColumn) {
> 1552 	} while ( (!mDoWrap || colPos + length < mMaxColumn) && aPos < aEnd);
> 1739 	if (mColPos >= mMaxColumn) { 
> Note that since we have OutputRaw, mDoWrap is false:
> https://dxr.mozilla.org/mozilla-central/source/dom/base/
> nsXMLContentSerializer.cpp#114

Thank you for document the code path in question. It's very valuable.

As opposed to keep trying these nsIDocumentEncoder flags, after reading the comment and the part of the code you referenced (and the IDL), I have come to the conclusion that we should simply dive into nsXMLContentSerializer, and identify the instruction that adds the new line, and somehow ensure we don't get into that condition when the OutputRaw flag is set. Since, IMHO, the behavior simply violates the description of "OutputRaw" as described in the IDL.

What do you think?

Jorg K (CEST = GMT+2)

Assignee

Comment 29

•

8 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27)
> My other question would be why do we have a "wrap at 72 char" logic at first
> place?
Remember Netscape? The browser/mail client/HTML editor? It's a hang over from the quirky old past. The bug is from 2005, that's about the year Netscape was split into Firefox and Thunderbird. Much of the serialisation code was written to cater for e-mail formatting.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
> What do you think?
Sure, I'll debug it in the next few days. Note that OutputDisallowLineBreaking may be important, since ASCII strings are not broken in the middle, whereas CJK (Chinese, Japanese, Korean) strings can always be broken. I found out the hard way when fixing the "Asian crisis" in Thunderbird (bug 1225864 and others).

(Can I request that you don't quote/reply to entire comments and only reply to a specific line/section. We don't need duplicated content.)

Jorg K (CEST = GMT+2)

Assignee

Comment 30

•

8 years ago

I was right: Disallowing line breaking for CJK text here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/nsXMLContentSerializer.cpp#116
fixes the problem as predicted. I just have to see how to pass it in since there seems to be more than one call to nsXMLContentSerializer::Init(). Can't be long now.

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 31

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #30)
> Can't be long now.

Nice! I was thinking about working on this in my spare hours but I will leave you to it.

(In reply to Jorg K (GMT+2) from comment #29)
> (Can I request that you don't quote/reply to entire comments and only reply
> to a specific line/section. We don't need duplicated content.)

Sure.

Jorg K (CEST = GMT+2)

Assignee

Comment 32

•

8 years ago

Attached patch Possible solution (v1). (obsolete) — Details — Splinter Review

This fixes the problem. Tested on attachment 8748800 [details].

Now comes the usual: Try run and writing a test for it. Here the try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c766e41514f2

Assignee: arvin0731 → mozilla

Status: NEW → ASSIGNED

Jorg K (CEST = GMT+2)

Assignee

Comment 33

•

8 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #28)
> Since, IMHO, the behavior simply violates the description of "OutputRaw"
> as described in the IDL.
"Raw" already worked, but the path that needed fixing wasn't the path that used "raw".

Chiaki-san: I tried the patch in TB and it works great there. No more unwanted spaces pasted.

Jorg K (CEST = GMT+2)

Assignee

Updated

•

8 years ago

Summary: Copying text copies gratuitous spaces → Copying CJK text inserts unwanted spaces

ISHIKAWA, Chiaki

Comment 34

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #33)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment
> #28)
> > Since, IMHO, the behavior simply violates the description of "OutputRaw"
> > as described in the IDL.
> "Raw" already worked, but the path that needed fixing wasn't the path that
> used "raw".
> 
> Chiaki-san: I tried the patch in TB and it works great there. No more
> unwanted spaces pasted.

Dear Jorg, 
Thank you, isn't this great? So once your patch lands in M-C, and C-C picks it up, the TB will see the original issue I saw in TB resolved. How nice.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #27) 
> I don't have access to CVS history to see why it was added either.

BTW, is bonsai no longer available? There used to be a web server that allows us to look int pre-Mercurial CVS history.

Jorg K (CEST = GMT+2)

Assignee

Comment 35

•

8 years ago

Attached patch Possible solution (v1) with test. (obsolete) — Details — Splinter Review

Neil, would you be able to review this?
Ehsan is busy and doesn't want to do reviews and Masayuki will tell me that he doesn't have review powers in dom/base/ (*).

This is making use of the serialiser flag nsIDocumentEncoder::OutputDisallowLineBreaking created for exactly this purpose in bug 1225864 to avoid shredding through CJK strings since they can be broken anywhere for screen display. But they can't be broken if I want to turn DOM into text.

Also, would you like to see another try run with the new test? I have a try run without it in comment #32 and that came out green. The test works locally and it fails locally with the C++ part removed.

(*) On second thought, Masayuki reviewed bug 1225864 which was also in the serialisers in dom/base/ but in bug 233705 comment #53 he said:
  I don't have rights to review dom/base/nsPlainTextSerializer.*
So I don't know.

Attachment #8749166 - Attachment is obsolete: true

Attachment #8749425 - Flags: review?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 36

•

8 years ago

Attached patch Possible solution (v1) with test (v1a). (obsolete) — Details — Splinter Review

Changed the test, the <div> containing the text doesn't need to be editable.

Attachment #8749425 - Attachment is obsolete: true

Attachment #8749425 - Flags: review?(enndeakin)

Attachment #8749535 - Flags: review?(enndeakin)

Neil Deakin

Comment 37

•

8 years ago

Comment on attachment 8749535 [details] [diff] [review]
Possible solution (v1) with test (v1a).

>+  div.focus();

The div isn't focusable so this doesn't do anything.


>+  // Copy to clipboard.
>+  var webnav = SpecialPowers.wrap(window).QueryInterface(SpecialPowers.Ci.nsIInterfaceRequestor)
>+                     .getInterface(SpecialPowers.Ci.nsIWebNavigation)
>+  var docShell = webnav.QueryInterface(SpecialPowers.Ci.nsIDocShell);
>+  var documentViewer = docShell.contentViewer
>+                               .QueryInterface(SpecialPowers.Ci.nsIContentViewerEdit);
>+  documentViewer.copySelection();
>+  var clipboard = SpecialPowers.Services.clipboard;
>+
>+  var clipboardData = SpecialPowers.wrap(getClipboardData("text/html", clipboard));
>+  var value = clipboardData.value.QueryInterface(SpecialPowers.Ci.nsISupportsString).data;
>+

It would be much better to just synthesize a shortcut key event with synthesizeKey() and listen for the copy or paste event than doing this low-level stuff.

The above code might not work on e10s or linux either since it doesn't wait for the native clipboard. You can just use SimpleTest.waitForClipboard instead. I think you want a try run for this, yes.

Attachment #8749535 - Flags: review?(enndeakin) → review-

Jorg K (CEST = GMT+2)

Assignee

Comment 38

•

8 years ago

Attached patch WIP - not working (obsolete) — Details — Splinter Review

Neil, I'm having a hard time changing the test to using waitForClipboard.

This seems to be undocumented and I can't find a good example to copy from.

I copied from
https://dxr.mozilla.org/mozilla-aurora/source/editor/libeditor/tests/test_bug603556.html#27
Note that I don't need the paste, since the problem happens at copy time.

Like this, the test always succeeds, even without the actual code fix (see patch).

I have no idea what clipboard flavour it tests to decide on whether the copy succeeded or failed. Console says:
TEST-PASS | dom/base/test/test_bug333064.html | Clipboard has the given value

So perhaps it was comparing the text/plain flavour.

I've also seen that waitForClipboard can have more parameters(?):
https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/test_bug1170911.html#42
But I wouldn't know how to use them and appending "text/html" as fifth parameter gives:
Timed out while polling clipboard for pasted data.

Also, that must use a function different to the one here:
https://dxr.mozilla.org/mozilla-aurora/source/devtools/client/inspector/test/head.js#723
The first parameter can be a comparison function. But once again, a simple string gets passed in here and that seems to be the text/plain flavour.

<offtopic>
Knowing what the solution to this bug is took me 30 seconds, poking around to see where to put this solution, that is finding the one line in 7 million code lines that make up Mozilla, took me 1-2 hours. And trying to put a test together has taken a multiple of this time and is still not finished.

So you're help to finish off this bug would be much appreciated.
</off-topic>

Flags: needinfo?(enndeakin)

Neil Deakin

Comment 39

•

8 years ago

(In reply to Jorg K (PTO during summer, NI me) from comment #38)
> I've also seen that waitForClipboard can have more parameters(?):
> https://dxr.mozilla.org/mozilla-aurora/source/dom/tests/mochitest/general/
> test_bug1170911.html#42
> But I wouldn't know how to use them and appending "text/html" as fifth
> parameter gives:
> Timed out while polling clipboard for pasted data.
> 

I think it only handles text/unicode, but that should be available for this test, right?

> <offtopic>
> Knowing what the solution to this bug is took me 30 seconds, poking around
> to see where to put this solution, that is finding the one line in 7 million
> code lines that make up Mozilla, took me 1-2 hours. And trying to put a test
> together has taken a multiple of this time and is still not finished.
> 
> So you're help to finish off this bug would be much appreciated.
> </off-topic>

If you're having trouble just check if the test works on try as is and that's fine too. I'd rather avoid direct nsITransferable use if possible but I'd also rather developers not waste time just for tests.

Flags: needinfo?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 40

•

8 years ago

The waitForClipboard seems much simpler, but the clipboard data that is tested or passed to the test is correct even without the code change.

The problem is the "test/html" flavour. That runs trough a faulty serialiser and the CJK string gets chopped.

So is there any way to easily get to the "test/html" flavour in this setup?

Where is the definition of waitForClipboard that takes more than four parameters?

Flags: needinfo?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 41

•

8 years ago

Attached patch Possible solution (v1) with test (v2). (obsolete) — Details — Splinter Review

Here we have the best of both worlds ;-)

Using waitForClipboard and checking the "text/html" manually.

Fails without the code fix, works with the code fix. Added bonus: "text/plain" is also tested. What do you think? Try run coming up.

Attachment #8749535 - Attachment is obsolete: true

Attachment #8749976 - Attachment is obsolete: true

Attachment #8750106 - Flags: review?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 42

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1feca83985b

Jorg K (CEST = GMT+2)

Assignee

Comment 43

•

8 years ago

BTW, at least on Linux this new test runs as part of M(1), and it's green ;-)

Jorg K (CEST = GMT+2)

Assignee

Comment 44

•

8 years ago

OK, M(1) and M-e10s(1) are green everywhere.

Neil Deakin

Comment 45

•

8 years ago

It looks like there is a bug on the last line of waitForClipboard. It passes 'text/unicode' instead of 'requestedFlavor'.

It would be ideal if we could get this to work. Then, you can just check the data in the first argument to waitForClipboard:

waitForClipboard(function (data) { return data == expectedData }, ...

Flags: needinfo?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 46

•

8 years ago

Attached patch Possible solution (v1) with test (v3). (obsolete) — Details — Splinter Review

This test works now. I've I take the C++ hunk out, I get this failure:
Timed out while polling clipboard for pasted data.

I don't get something along the lines of "unexpected value".

You pointed out that
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/SimpleTest.js#973
is questionable. Well, I've changed the
  }, aFailureFn, "text/unicode");
to
  }, aFailureFn, requestedFlavor);
and it didn't make a difference.

In any case, I dumped out the value passed in to the comparison function and it contains <html><body> ... (160 Korean characters) ... etc.

So the test works and the code works although, as mentioned above, if I remove the code change I get a strange test failure that will be hard to understand should it ever arise.

So what do you suggest?

Another try run here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f28f021bd21

Attachment #8750295 - Flags: review?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 47

•

8 years ago

Further to what I said:
I don't even need to pull out the C++ change to create a test failure.
If I change
  return value.includes("html") && value.includes("안".repeat(160));
to
  return value.includes("htmlhuhu") && value.includes("안".repeat(160));
I get a failure with:
Timed out while polling clipboard for pasted data.

Is that expected? I find it mildly confusing.

Jorg K (CEST = GMT+2)

Assignee

Comment 48

•

8 years ago

Hmm, fails on Linux on M(1) and M-e10s(1): Timed out while polling clipboard for pasted data.
Maybe the clipboard is different and doesn't have the "html"??
(And I was silly to run all Mochitests instead of just (1).)

Neil Deakin

Comment 49

•

8 years ago

The way that waitForClipboard works is that it puts some dummy data on the clipboard, calls the supplied initializer function to put what you want on the clipboard, then keeps polling the clipboard until validatorFn returns true.

Without the C++ change, it will fail and time out.

With the change, it should succeed but not necessarily right away.

Without the change I suggested in comment 45, the validatorFn should be passed the text/unicode data, which should succeed even without your changes, as the plain text doesn't have the line-breaking issue. Is this not the case?

Jorg K (CEST = GMT+2)

Assignee

Comment 50

•

8 years ago

(In reply to Neil Deakin from comment #49)
> The way that waitForClipboard works is that it puts some dummy data on the
> clipboard, calls the supplied initializer function to put what you want on
> the clipboard, then keeps polling the clipboard until validatorFn returns
> true.
> Without the C++ change, it will fail and time out.
OK, that explains the time out.

> With the [C++] change, it should succeed but not necessarily right away.
Yes, with the C++ hunk in place, it works on Windows, but apparently not on other platforms, see comment #49 above.

> Without the change I suggested in comment 45, the validatorFn should be
> passed the text/unicode data, which should succeed even without your
> changes, as the plain text doesn't have the line-breaking issue. Is this not
> the case?
Without the change you suggested in comment 45, I see <html><body> ... in the validator function.
The test as published in attachment 8750295 [details] [diff] [review] works locally on Windows (but apparently not on other platforms, see comment #49 above). So I get passed the HTML. Let me dump it out for you:
<html><body>
<!--StartFragment--><div id="korean-text"> ... 160 expected characters ... </div><!--EndFragment-->
</body>
</html>

Jorg K (CEST = GMT+2)

Assignee

Comment 51

•

8 years ago

Yes, the text/unicode flavour doesn't have the line breaking problem. That's what I got when I didn't ask for "text/html" as fifth parameter.

Jorg K (CEST = GMT+2)

Assignee

Comment 52

•

8 years ago

Looking at waitForClipboard again, I can't see an error:
    // First we wait for a known value different from the expected one.
    var preExpectedVal = initialVal;
    SpecialPowers.clipboardCopyString(preExpectedVal);
    wait(function(aData) { return aData  == preExpectedVal; },
         function() {
           // Call the original setup fn
           aSetupFn();
           wait(inputValidatorFn, aSuccessFn, aFailureFn, requestedFlavor); <<===
         }, aFailureFn, "text/unicode");
}

The first wait call passes in some initial value, a simplified comparison function and selects the "text/unicode" flavour. The subsequent inner call does request the correct flavour.

In any case, my test isn't working on Linux, maybe also not on Mac, most likely because the only Windows has the <html><body> stuff, as we know from other bugs and as can be seen here:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/base/test/test_bug166235.html#74

In conclusion: I will cancel the try run and rerun it with a different test.

Neil Deakin

Comment 53

•

8 years ago

OK, I looked in more detail at this code. The waitForClipboard function is actually correct as is; it is just very confusing code. The 'text/unicode' reference is only for the initial dummy data check. For other checks it does indeed use the passed in flavor. So ignore that change I suggested.

Note that on Mac, I just get as the passed data:

<div id="korean-text">안안...</div>

You probably want to check specially for the div element with 'korean-text' as an id.

Jorg K (CEST = GMT+2)

Assignee

Comment 54

•

8 years ago

Attached patch Possible solution (v1) with test (v4). (obsolete) — Details — Splinter Review

Test that should work on all platforms.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=458299fbc89e

Attachment #8750106 - Attachment is obsolete: true

Attachment #8750295 - Attachment is obsolete: true

Attachment #8750106 - Flags: review?(enndeakin)

Attachment #8750295 - Flags: review?(enndeakin)

Attachment #8750328 - Flags: review?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 55

•

8 years ago

Wires crossed here. I'll do another one.

Jorg K (CEST = GMT+2)

Assignee

Comment 56

•

8 years ago

Attached patch Possible solution (v1) with test (v5). (obsolete) — Details — Splinter Review

OK, here we have a test that checks for "korean-text" (that proves that we got the HTML data) and the string itself.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0fc0798bbf4

Attachment #8750328 - Attachment is obsolete: true

Attachment #8750328 - Flags: review?(enndeakin)

Attachment #8750337 - Flags: review?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 57

•

8 years ago

Attached patch Possible solution (v1) with test (v5b). (obsolete) — Details — Splinter Review

(this time without the dump().)

Attachment #8750337 - Attachment is obsolete: true

Attachment #8750337 - Flags: review?(enndeakin)

Attachment #8750340 - Flags: review?(enndeakin)

Neil Deakin

Updated

•

8 years ago

Attachment #8750340 - Flags: review?(enndeakin) → review+

Jorg K (CEST = GMT+2)

Assignee

Comment 58

•

8 years ago

Thanks Neil!

Dear Sheriff, try run in comment #56.

Keywords: checkin-needed

Jorg K (CEST = GMT+2)

Assignee

Comment 59

•

8 years ago

Kent, this is a patch to fix unwanted spaces in copy/paste of CJK text.
It uses the same technique employed to fix these problems in e-mail. In fact, it builds on our solution from bug 1225864 (see comment #22). So our tweak has finally proven useful for FF as well ;-)

Now that we can add tracking flags to M-C bugs, do you still want to duplicate the bug like we did before?

I recommend inclusion in the next TB 45.x release, the patch has no risk since - as I said - it uses a well understood technique. Chiaki alerted me to the problem on dev-apps-thunderbird@lists.mozilla.org.

status-thunderbird_esr45: --- → affected

tracking-thunderbird_esr45: --- → ?

Flags: needinfo?(rkent)

Kent James (:rkent)

Comment 60

•

8 years ago

We don't need to create separate TB bugs if we can set the needed flags in Core bugs. This should have a reasonable chance of uplifting into TB branches - but it should first make the proposed "release candidate" beta proposed for next week. Unfortunately there is no way for us to run his through TB aurora without uplifting to m-aurora.

Flags: needinfo?(rkent)

Pulsebot

Comment 61

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/3ae676e82fac

Keywords: checkin-needed

Danny Lin

Comment 62

•

8 years ago

What will the copied source be after the patch? I mean, will the copied source code still be line-breaked on every 72 chars? And will it happen on all languages?

Jorg K (CEST = GMT+2)

Assignee

Comment 63

•

8 years ago

This patch affects the "text/html" flavour of the clipboard after copying. The only change that was made is that long CJK strings are now no longer broken since that lead to the insertion of unwanted spaces.

Long non-ASCII string with "European" characters were already not broken, so copying a long string of á was treated differently to a long string of 안.

Otherwise the formatting hasn't changed so there is no need for any concern.

Jorg K (CEST = GMT+2)

Assignee

Comment 64

•

8 years ago

Oh, I see you raised bug 673667. This is exactly the issue we fixed here.

Danny Lin

Comment 65

•

8 years ago

So there will still be a line-break at, say, white spaces if the string exceeds 72 chars?

An additional question is what is the actually rule to determine "CJK string"? Will full-shape symbols, such as full-shape spaces or CJK punctuations (e.g. ，。、；：○『』 ...) be counted as "CJK string" and be non-broken?

Carsten Book [:Tomcat]

Comment 66

•

8 years ago

backed out due to test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=27556616&repo=mozilla-inbound

Flags: needinfo?(mozilla)

Jorg K (CEST = GMT+2)

Assignee

Comment 67

•

8 years ago

Yes, CJK punctuation counts as CJK string.

Tomcat, sorry, will disable the text on Android.

Flags: needinfo?(mozilla)

Danny Lin

Comment 68

•

8 years ago

I wonder whether it would be better to add an user preference to determine whether to break the copied HTML source, since that in some circumstances the user may not want the auto line break of HTML source at all.

Jorg K (CEST = GMT+2)

Assignee

Comment 69

•

8 years ago

Neil, which one should it be?

skip-if = toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'b2g' || toolkit == 'android'
skip-if = buildapp == 'mulet' || buildapp == 'b2g' || toolkit == 'android'
skip-if = os == 'android'

Or

skip-if = (os != 'linux') && (os != 'mac') && (os != 'windows')

Could you pre-approve a patch with the line of your choice added ;-) ?

Jorg K (CEST = GMT+2)

Assignee

Updated

•

8 years ago

Flags: needinfo?(enndeakin)

Danny Lin

Comment 70

•

8 years ago

... Or a preference to customize the length to break the string instead of using a magic number 72.

Pulsebot

Comment 71

•

8 years ago

Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/620a79a39975

Jorg K (CEST = GMT+2)

Assignee

Comment 72

•

8 years ago

Danny: We're fixing the serialisation of CJK strings in this bug, that is, turning the browsers document object model (DOM) into text.

There is no option required. Besides: What would you like this option to be called?
erroneously-insert-spaces-into-copied-cjk-strings with default set to 'false'?

If you have concerns, please wait until this bug has landed, then try a Nightly compile, and if you have a problem, raise another bug.

This bug here is done, only landing the patch caused some internal test failures which we will address.

Danny Lin

Comment 73

•

8 years ago

I agree that fixing this issue does not require adding options.

I was somehow talking another (related) issue, that the feature of automatically line break might be a non-necessary issue, and if not, a potential solution is to remove the automatically line break functionality.

If this functionality is still considered necessary for some reason, consider the fact that one may not want this functionality in some situation, we may add a preference such as "chars-to-break-html-source-code-on-copying" with 72 (or maybe -1) as default and -1 meaning never break.

Jorg K (CEST = GMT+2)

Assignee

Comment 74

•

8 years ago

That's an enhancement request for another bug.

Neil Deakin

Comment 75

•

8 years ago

Probably:

  skip-if = toolkit == 'android'

Flags: needinfo?(enndeakin)

Jorg K (CEST = GMT+2)

Assignee

Comment 76

•

8 years ago

Attached patch Possible solution (v1) with test (v5b) and Android test disabled. — Details — Splinter Review

Attachment #8750340 - Attachment is obsolete: true

Attachment #8750754 - Flags: review+

Jorg K (CEST = GMT+2)

Assignee

Comment 77

•

8 years ago

OK, Android test disabled as Neil suggested in comment #75.

Keywords: checkin-needed

Pulsebot

Comment 78

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a35b2ac359e5

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 79

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a35b2ac359e5

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox49: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla49

Kent James (:rkent)

Comment 80

•

8 years ago

Checked in http://hg.mozilla.org/releases/mozilla-esr45/rev/25a069f8002f THUNDERBIRD_45_VERBRANCH

status-thunderbird_esr38: --- → wontfix

status-thunderbird_esr45: affected → fixed

tracking-thunderbird_esr45: ? → 46+

Jorg K (CEST = GMT+2)

Assignee

Comment 81

•

8 years ago

Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.

Approval Request Comment
[Feature/regressing bug #]: Not a regression, was always wrong ;-)
[User impact if declined]: Unwanted extra spaces appear when CJK text is copied.
[Describe test coverage new/current, TreeHerder]: Automated test.
[Risks and why]: I don't consider it risky.
All we do is disable the line-breaker, a technique introduced in bug 1225864 and well tested in Thunderbird since TB 45.

Now talking about risk: Disabling the line breaker did uncover a crash in another code path, so bug 1274837 would need uplift, too.
[String/UUID change made/needed]: None.

Attachment #8750754 - Flags: approval-mozilla-aurora?

Sylvestre Ledru [:Sylvestre]

Comment 82

•

8 years ago

Jorg, we have this bug for 10 years, I am not sure we need to take an uplift for Firefox 48.
For Thunderbird, could you use only the relbranch?

Jorg K (CEST = GMT+2)

Assignee

Comment 83

•

8 years ago

We've already advanced that onto a branch for TB 45 ESR. Since so far we don't have release branches for our betas, it would be good to have it uplifted. Yes, broken for 10 years, so even better if it arrives six weeks earlier now ;-)

Kent James (:rkent)

Comment 84

•

8 years ago

What we are going to do is to use release branches on mozilla-beta to land bugs such as this that we would like in our betas, for TB 47.0b1 and 48.0b1

Kent James (:rkent)

Comment 85

•

8 years ago

Pushed to a release branch on mozilla-beta for Thunderbird 47.0b1

Jorg K (CEST = GMT+2)

Assignee

Comment 86

•

8 years ago

Comment on attachment 8750754 [details] [diff] [review]
Possible solution (v1) with test (v5b) and Android test disabled.

Too late now.

Attachment #8750754 - Flags: approval-mozilla-aurora?

Jorg K (CEST = GMT+2)

Assignee

Comment 87

•

8 years ago

Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1
https://hg.mozilla.org/releases/mozilla-beta/rev/502af8ab3f06

Jorg K (CEST = GMT+2)

Assignee

Comment 88

•

8 years ago

Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1 - take 2:
https://hg.mozilla.org/releases/mozilla-beta/rev/9a5abbdc1fb4

an example of the problem 13 years ago Seungbeom Kim 1.44 KB, text/html		Details
Reduced test case from bug 673667 comment 0 8 years ago Tim Guan-tin Chien [:timdream] (please needinfo) 1.20 KB, text/html		Details
Here a simple web page showing the problem. 8 years ago Jorg K (CEST = GMT+2) 758 bytes, text/html		Details
a real world Japanese paragraph copy & paste shows the problem. 8 years ago ISHIKAWA, Chiaki 6.92 KB, text/html		Details
Possible solution (v1). 8 years ago Jorg K (CEST = GMT+2) 1.13 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test. 8 years ago Jorg K (CEST = GMT+2) 5.19 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test (v1a). 8 years ago Jorg K (CEST = GMT+2) 5.14 KB, patch	enndeakin : review-	Details \| Diff \| Splinter Review
WIP - not working 8 years ago Jorg K (CEST = GMT+2) 4.07 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test (v2). 8 years ago Jorg K (CEST = GMT+2) 5.15 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test (v3). 8 years ago Jorg K (CEST = GMT+2) 4.10 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test (v4). 8 years ago Jorg K (CEST = GMT+2) 4.02 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test (v5). 8 years ago Jorg K (CEST = GMT+2) 4.13 KB, patch		Details \| Diff \| Splinter Review
Possible solution (v1) with test (v5b). 8 years ago Jorg K (CEST = GMT+2) 4.11 KB, patch	enndeakin : review+	Details \| Diff \| Splinter Review
Possible solution (v1) with test (v5b) and Android test disabled. 8 years ago Jorg K (CEST = GMT+2) 4.14 KB, patch	jorgk-bmo : review+	Details \| Diff \| Splinter Review