Open Bug 1273836 Opened 8 years ago Updated 4 months ago

Copy-Paste of simple HTML adds extra new lines that are not present in original content

Product/Component: Core :: Layout: Text and Fonts
Type: defect
Version: 46 Branch
Status: UNCONFIRMED
Webcompat Priority: P3
Reporter: rafee
Assignee: Unassigned
Whiteboard: [webcompat]
Attachments: 3 files

Attached image 695.png
User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20160502172042

Steps to reproduce:

We are using a WYSIWYG editor as the HTML editor/formatter in our web application. Content is copied from http://www.zerobelly.com/slides/5-teas-melt-fat#2 and pasted into the HTML editor/formatter. Then display the content of the div with an alert.


Actual results:

Extra (empty) new lines are added that are not part of the original content.

This does not happen in IE or Google Chrome.



Expected results:

The original content, with its styling, should have been copied into the editor/formatter.
How to reproduce the issue? Do you have a free online HTML editor to test?
Flags: needinfo?(rafee)
(In reply to Loic from comment #1)
> How to reproduce the issue? Do you have a free online HTML editor to test?

1)Login to www.dreamstream-qa.com 
Credentials as follows
 user name : rafee@10kinfo.com
 password  : 111111

2)Click on second tab My Photos

3)Click on Add Photo/Video

4)Browse to any picture on the machine; a text editor then appears.

5)In this text editor paste the content from http://www.zerobelly.com/slides/5-teas-melt-fat#2 which is the description of any image including title also.

Eg:
 BARBERRY TEA
Blocks fat cells from growing

The stem, fruit and root bark of the barberry shrub contains berberine–a powerful, naturally occurring, fat-frying chemical. A study conducted by Chinese researchers revealed that berberine can prevent weight gain and the development of insulin resistance in rats consuming a high-fat diet. Previous studies have also found that consuming the plant can boost energy expenditure and help decrease the number of receptors on the surface of fat cells, making them less apt to absorb incoming sources of flubber. Pair it with this definitive plan of 14 ways to turn on your get-lean genes to slim down fast.

6)Click on save button.

7) Browse http://dreamstream-qa.com/profile/photo/ggggggggggggg-20351/

8) The first photo will be the one that was uploaded just now.

9) The problem is with the description of the photo.

10) The same steps do not create any additional new lines in Google Chrome or IE.
I couldn't test the issue because the URL www.dreamstream-qa.com gives me a "The connection has timed out" error every time.

I created a simple test case with the exact content you mentioned (from http://www.zerobelly.com/slides/5-teas-melt-fat#2) inside a div, using the Sublime Text editor, and opened it in Firefox 46.0.1. I don't see any extra new lines. It renders as expected.
(In reply to Kanchan Kumari QA from comment #3)
> I couldn't test the issue because URL www.dreamstream-qa.com giving me "The
> connection has timed out" error every time.
> 
> I created simple test case with the exact content you mentioned (from
> http://www.zerobelly.com/slides/5-teas-melt-fat#2) inside div with Sublime
> text editor and opened in the Firefox 46.0.1. I don't see any extra new
> lines. It's getting rendered as expected.

Apologies.
This is a QA server, and it is usually down from 9:30 AM to 8:00 PM EDT for internal organizational reasons.
Use the following URL and credentials.
www.dreamstream.com
user name: rex@10kinfo.com

password : search

This is the production URL, and it will never be down.
(In reply to rafee from comment #2)
> (In reply to Loic from comment #1)
> > How to reproduce the issue? Do you have a free online HTML editor to test?
> 
> 1)Login to www.dreamstream-qa.com 
> Credentials as follows
>  user name : rafee@10kinfo.com
>  password  : 111111
> 
> 2)Click on second tab My Photos

I don't see this tab on the website.
Loic, the My Photos tab is under My Profile.

Hello Rafee, I am able to reproduce this issue on:
Version: 49.0a1
Build ID: 20160526082533
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:49.0) Gecko/20100101 Firefox/49.0

I do see extra space being added when rendering in Firefox. In Chrome there is no such issue.
I am marking this defect as "New" and setting the component so that a developer can look into it.
Component: Untriaged → Layout
Product: Firefox → Core
This bug is trivially reproducible in Treeherder as follows:

* Go to: https://treeherder.mozilla.org/logviewer.html#?job_id=70144325&repo=mozilla-inbound (this link will expire in 4 months, but you can just go to https://treeherder.mozilla.org/, click on any green symbol, and click the log button)
* Select 4 or so lines to copy
* Paste them in another text editor window
* You will see a bunch of newlines where there shouldn't be any.

Jet, is there any chance this could be fixed? This keeps on coming up with respect to developers using Treeherder, but I'd be surprised if this was the only web site which exhibited this problem.
Flags: needinfo?(rafee) → needinfo?(bugs)
Component: Layout → Editor
(In reply to William Lachance (:wlach) from comment #7)
> This bug is trivially reproducible in Treeherder

The Treeherder logs do have 2 linebreaks per line, one each for the paragraph start and end. They are styled with CSS to collapse the margins between lines, like so:

<!DOCTYPE html>
<style>
#log p {
	margin:0
}
</style>
<body>
<div id="log">
<p class="highlight"><a id="1">1</a><span>[taskcluster 2017-01-22 20:29:59.837Z] Task ID: AtQ9eLx4ROCAD2q5ro_TCg
</span></p>
<p class="highlight"><a id="2">2</a><span>[taskcluster 2017-01-22 20:29:59.837Z] Worker ID: i-071e4f640074045fe
</span></p>
</div>

Comment out margin:0 in the style block to see this. Blink and WebKit don't serialize the collapsed space characters, but Gecko does.

jfkthame: can you locate what the relevant specs say about this?
Flags: needinfo?(bugs) → needinfo?(jfkthame)
Component: Editor → Layout: Text
tl;dr - AFAIK, I don't think there are any specs for this; exactly how rich-content (HTML) copy/paste behaves is a UA implementation detail.

Longer version:

I'm not sure the treeherder example actually shows excessive line-breaks in the pasted content. AFAICS, what you're seeing is that each line is a separate <p> element, and the default style for <p> includes top/bottom margins. Result: there's what looks like a blank line between each log line.

The reason you don't see the same thing when copying the treeherder log lines with Chrome is that when it puts HTML content on the clipboard, it includes a bunch of style properties (as inline style attributes on each element), so as to preserve more of the styling of the source. This includes the "margin: 0px" property on the <p> elements, so the default top and bottom paragraph margins are suppressed.

I'll attach examples of the HTML that Firefox and Chrome each put on the clipboard when copying a few lines from such a log view. Viewing the examples as standalone HTML files makes it clear how different they are: the Chrome version includes the yellow highlighting of the log output, for example, whereas the Firefox version doesn't. OTOH, the Firefox version (because it includes a parent <code> element) appears in a monospaced font, while Chrome's HTML copy doesn't include parent elements, and does not include 'font' in the embedded styling, and so it appears in the browser's default (proportional) font.

Which is "better" is difficult to say, in general. I think there are valid use cases for attempting to preserve as much styling as possible -- kind of like Chrome does, although the fact that it doesn't preserve font-family seems like a shortcoming -- but there are also valid use cases for preserving the HTML element structure but *not* copying the styling of those elements from the source, so that they'll adopt the appropriate styling for such elements in the destination.
Comparing the Firefox and Chrome results in the attachments above, we can see that the major difference is not one of preserving or collapsing whitespace, but of whether or not CSS properties (such as 'margin') are embedded in the copied data, so that they travel with the pasted elements.
(In reply to Jonathan Kew (:jfkthame) from comment #12)
> Comparing the Firefox and Chrome results in the attachments above, we can
> see that the major difference is not one of preserving or collapsing
> whitespace, but of whether or not CSS properties (such as 'margin') are
> embedded in the copied data, so that they travel with the pasted elements.

When comparing the clipboard contents for Firefox and Chrome on my Mac (Finder > Edit > Show Clipboard) I do see serializing differences. I can also reproduce this by pasting from Treeherder to a plaintext editor (Sublime in my case.) Do the inline styles affect the plaintext conversion as well?
Flags: needinfo?(jfkthame)
The most obvious difference I see between Firefox and Chrome plain-text copy is that Chrome is including the line numbers that appear to the left of each "paragraph" in the treeherder log (as separate lines in the copied text):

    [taskcluster 2017-01-18 21:50:32.492Z] Task ID: GOmUNb9hQ2mCfBhh5dDVnA
    2
    [taskcluster 2017-01-18 21:50:32.492Z] Worker ID: i-0ff89882a56037929
    3
    [taskcluster 2017-01-18 21:50:32.492Z] Worker Group: us-east-1c
    4
    [taskcluster 2017-01-18 21:50:32.492Z] Worker Node Type: m4.4xlarge
    5

With Firefox, OTOH, the line numbers are not present:

    [taskcluster 2017-01-18 21:50:32.492Z] Task ID: GOmUNb9hQ2mCfBhh5dDVnA
    
    [taskcluster 2017-01-18 21:50:32.492Z] Worker ID: i-0ff89882a56037929
    
    [taskcluster 2017-01-18 21:50:32.492Z] Worker Group: us-east-1c
    
    [taskcluster 2017-01-18 21:50:32.492Z] Worker Node Type: m4.4xlarge

The reason the line numbers aren't included, I believe, is because they're styled with -moz-user-select:none and so they don't become part of the selection. But yes, there are blank lines in between each of the log lines.

My current guess -- without trying to find it in our code yet -- is that when we generate the plain-text representation, we deliberately put an extra newline after <p> elements to provide separation comparable to the default HTML styling of <p> (with top/bottom margins). If I change the <p> wrappers that treeherder puts around each log line into <div> wrappers instead, we no longer add the blank lines.

Minimal testcase:

    data:text/html,<p>a</p><p>b</p>

renders with a "blank line" (actually the default margin) between the two lines, and copy/pastes to plaintext as

    a

    b

in Firefox, with a double newline after the "a".

For comparison:

    data:text/html,<div>a</div><div>b</div>

renders with no extra space between the lines, and copy/pastes as expected:

    a
    b
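
The two testcases above can be imitated with a toy model. This is only a sketch of the guess described in this comment, not Gecko's actual serializer: the class name, the `BREAKS` table and the function `to_plain_text` are made up for illustration, and the only rule modelled is "a <p> is separated by a blank line, a <div> by a single newline", with all styling ignored.

```python
from html.parser import HTMLParser


class TagBasedSerializer(HTMLParser):
    """Toy model: separators depend only on the tag name, never on CSS."""

    # Hypothetical table: newlines required between an element and its neighbour.
    BREAKS = {"p": 2, "div": 1}

    def __init__(self):
        super().__init__()
        self.out = []
        self.pending = 0  # newlines owed before the next piece of text

    def _request_break(self, tag):
        # Keep the largest separator requested since the last text.
        self.pending = max(self.pending, self.BREAKS.get(tag, 0))

    def handle_starttag(self, tag, attrs):
        # attrs (including any style attribute) are deliberately ignored.
        self._request_break(tag)

    def handle_endtag(self, tag):
        self._request_break(tag)

    def handle_data(self, data):
        if not data.strip():
            return
        if self.out:
            self.out.append("\n" * self.pending)
        self.pending = 0
        self.out.append(data.strip())


def to_plain_text(html):
    serializer = TagBasedSerializer()
    serializer.feed(html)
    serializer.close()
    return "".join(serializer.out)


print(repr(to_plain_text("<p>a</p><p>b</p>")))
print(repr(to_plain_text("<div>a</div><div>b</div>")))
```

Because the model never looks at attributes, adding style="margin:0" to the <p> elements changes nothing in its output, which matches what is observed with the Treeherder log lines. Whitespace handling inside inline content is not modelled at all.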

Interestingly, Chrome's behavior is to try and make the plain-text representation follow the current styling of the <p> element. So:

    data:text/html,<p>a</p><p>b</p>

will copy/paste with a blank line after "a", whereas:

    data:text/html,<style>p{margin:0}</style><p>a</p><p>b</p>

will copy/paste without the blank line. Their rule seems to be that if the vertical margin between the elements is >= 0.5em, then they'll insert a blank line in the copied data. (But only ever a single blank line, even if the margin is huge.)
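
As a rough illustration of that apparent rule (inferred from observation here, not taken from Chrome's source), the choice of separator can be written as a threshold check. The names `THRESHOLD_EM`, `separator_between_blocks` and `join_blocks` are hypothetical, and the margin is assumed to be already resolved to em units.

```python
THRESHOLD_EM = 0.5  # observed cut-off, per the description above


def separator_between_blocks(margin_em):
    """Return the text placed between two adjacent block elements.

    A margin at or above the threshold yields exactly one blank line,
    however large the margin is; a smaller margin yields a plain newline.
    """
    return "\n\n" if margin_em >= THRESHOLD_EM else "\n"


def join_blocks(texts, margin_em):
    return separator_between_blocks(margin_em).join(texts)


print(repr(join_blocks(["a", "b"], 1.0)))   # margin of 1em, assumed default for <p>
print(repr(join_blocks(["a", "b"], 0.0)))   # p{margin:0}
print(repr(join_blocks(["a", "b"], 50.0)))  # huge margin, still one blank line
```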

Is there a spec for this? I'm not sure offhand... I'll poke around a bit more. There's the Clipboard APIs spec[1], but that doesn't give any guidance here, it just says things like "Implementations should create alternate text/html and text/plain clipboard formats when content in a web page is selected" with no details as to what those alternate formats should contain. The HTML spec doesn't seem to address it (or if it does, I failed to find it).

There have been discussions from time to time[2,3] regarding how HTML "should" be converted to plain-text, which is required for various purposes (e.g. innerText) in addition to copy/paste. But AFAIK it's not currently standardized.


[1] https://www.w3.org/TR/clipboard-apis/#the-copy-action
[2] https://discourse.wicg.io/t/css-plain-text-conversion/976
[3] https://github.com/whatwg/compat/issues/5#issuecomment-145406699
Flags: needinfo?(jfkthame)
FWIW, here's the code that ensures we have a blank line before the contents of a <p> element when serializing to plain-text:

https://dxr.mozilla.org/mozilla-central/rev/8ff550409e1d1f8b54f6f7f115545dbef857be0b/dom/base/nsPlainTextSerializer.cpp#581-582

This is done unconditionally, just based on the fact that it's a <p> tag; we don't consider its style.
More generally, Firefox serializes the underlying data in a consistent way, regardless of how it is currently styled for presentation, while Chrome tries to generate a serialization that reflects the rendered presentation (to some extent; obviously what can be represented in a plain-text form is very limited).

Another example of this distinction is that if text-transform is applied, copy/paste from Chrome will give the transformed data, while Firefox will use the original text. (This has been discussed in the CSS WG in the past, but I don't recall any definitive conclusion being reached.)

Another somewhat interesting example:

    data:text/html,<style>p{display:inline;}</style><p>one</p><p>two</p>

Here, Firefox will still copy to plain-text as two paragraphs, reflecting the underlying elements (despite their styling):

    one

    two

whereas Chrome will see that the <p> elements are no longer block frames at all, and copy as:

    onetwo

which is of course what the rendered presentation looks like (despite the underlying structure).

However, if we add margins to the elements so that they're visually separated:

    data:text/html,<style>p{display:inline;margin:1em;}</style><p>one</p><p>two</p>

Chrome will still copy the text as

    onetwo

without any separation (unlike its behavior with vertical margins on block elements, where it does introduce a blank line). So in this case it doesn't attempt to make the plain-text a serialization of the visual presentation.
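
The display:inline examples can be summarized in a small sketch that contrasts the two approaches. It is a simplification under stated assumptions: `Element`, `structure_based` and `rendering_based` are hypothetical names, block-level <p> elements are assumed to have their default margins, and margins on inline elements are ignored, as observed above.

```python
from dataclasses import dataclass


@dataclass
class Element:
    tag: str
    text: str
    display: str = "block"  # computed 'display' value assumed by this sketch


def structure_based(elements):
    """Firefox-like: a <p> is always a paragraph, whatever its styling."""
    parts = []
    for i, element in enumerate(elements):
        if i > 0:
            parts.append("\n\n" if element.tag == "p" else "\n")
        parts.append(element.text)
    return "".join(parts)


def rendering_based(elements):
    """Chrome-like: adjacent inline boxes are joined with no separator."""
    parts = []
    for i, element in enumerate(elements):
        if i > 0:
            previous = elements[i - 1]
            both_inline = element.display == "inline" and previous.display == "inline"
            parts.append("" if both_inline else "\n\n")
        parts.append(element.text)
    return "".join(parts)


inline_paragraphs = [Element("p", "one", "inline"), Element("p", "two", "inline")]
print(repr(structure_based(inline_paragraphs)))
print(repr(rendering_based(inline_paragraphs)))
```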

I don't think there's any simple answer to which behavior is "right": both have their uses, and both have their shortcomings.

In principle, a site should be able to use the HTML copy event to customize this behavior, though I haven't tried this to see how well it works in practice.
Recently ran into this - we have an internal support page that outputs code fragments for easy copy/paste into a terminal. FF is especially egregious with the extra lines it adds when copying HTML, especially when inside a <pre> or <code> block where you'd expect it to just copy the text and not have styling affect the copied text.

For example, with a bootstrap-styled page:
http://getbootstrap.com/docs/3.3/css/#code-block

In the Basic Block example, there's a code example `<p>Sample text here...</p>`

Triple-clicking on the word "Sample" will highlight the entire line of text, and when pasting will yield different results in different browsers:


1. FF 61.0.1 (64-bit)
```


<p>Sample text here...</p>


```
Text is surrounded with 2 blank lines.

2. Chrome 67.0.3396.99 (Official Build) (64-bit)
```
<p>Sample text here...</p>

```
Adds a trailing line break.

3. Microsoft Edge 40.15063.674.0
```
<p>Sample text here...</p>
```
Single line, no break.

In my opinion, the Edge behavior is the most "correct", as text copied from a <pre> or <code> block should just copy as-is. Chrome is passable with that single line break in there. But FF seems a bit bizarre with the four extra lines.
Coming from https://webcompat.com/issues/18492

Another example of a code construct that Chrome and Firefox interpret differently.
In https://golang.org/src/errors/errors.go?s=293:320#L1

<pre>…
<span id="L7" class="ln" data-content="     7">&nbsp;&nbsp;</span>
<span id="L8" class="ln" data-content="     8">&nbsp;&nbsp;</span><span class="comment">// New returns an error that formats as the given text.</span>
<span id="L9" class="ln" data-content="     9">&nbsp;&nbsp;</span><span class="selection">func New(text string) error</span> {
<span id="L10" class="ln" data-content="    10">&nbsp;&nbsp;</span>	return &amp;errorString{text}
<span id="L11" class="ln" data-content="    11">&nbsp;&nbsp;</span>}
<span id="L12" class="ln" data-content="    12">&nbsp;&nbsp;</span>
…
</pre>


Copy and paste
From Firefox into a text editor:

// New returns an error that formats as the given text.

func New(text string) error {

	return &errorString{text}

}

From Chrome into a text editor:

// New returns an error that formats as the given text.
func New(text string) error {
	return &errorString{text}
}
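
Until this is resolved, text pasted from Firefox can be cleaned up after the fact. The following is only a paste-side workaround sketch, not a fix for the serializer: the function name `drop_interleaved_blank_lines` is made up, and it assumes that the inserted blank lines sit on every second line, as in the Firefox output above. If any of those lines has content, the text is returned unchanged.

```python
def drop_interleaved_blank_lines(text):
    """Remove blank lines that alternate with content lines.

    Only acts when every second line (index 1, 3, 5, ...) is blank,
    so text with a different blank-line pattern is left alone.
    """
    lines = text.split("\n")
    inserted = lines[1::2]
    if inserted and all(not line.strip() for line in inserted):
        return "\n".join(lines[0::2])
    return text


pasted_from_firefox = (
    "// New returns an error that formats as the given text.\n"
    "\n"
    "func New(text string) error {\n"
    "\n"
    "\treturn &errorString{text}\n"
    "\n"
    "}"
)
print(drop_interleaved_blank_lines(pasted_from_firefox))
```

Note that this also removes blank lines that were intentional in the source if they happen to fall into the same alternating pattern.
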
Flags: webcompat?
Whiteboard: [webcompat]

I can also confirm the bug! I have had it for a while now, since one of the more recent Thunderbird updates. I'm also using an HTML signature; I'm not sure whether the bug is related to that.

I can only say that it once worked without trouble with the same signature/HTML.

I also noticed that the bug occurs when trying to select (focus) an image inserted in the mail body. The focus is removed immediately, so it's nearly impossible to resize the image. It's barely possible to get to 'edit properties' by double-clicking the image very quickly.

Every time I try to simply select the inserted image element, a new line is inserted above it in the text body, which is very annoying!

To my own surprise... re-trying to find a solution on my own I suddenly found one just a minute after posting here :-D

See:
https://support.mozilla.org/de/questions/1183434

The solution posted there by user n4mwd is confirmed to work!

Try adding this boolean preference in the config editor.

mail.compose.attach_http_images

…and make sure it is set to true.

... well, okay. For me, this only partly fixes the problem. I can now select images without trouble, but the annoying new-line issue is still there, as I noticed after some more testing.

Migrating Webcompat whiteboard priorities to project flags. See bug 1547409.

Webcompat Priority: --- → ?

See bug 1547409. Migrating whiteboard priority tags to program flags.

Webcompat Priority: ? → revisit
Flags: webcompat?

Extra blank lines (a random number of them) are also added in the Vk.com Article Editor. Steps to reproduce:

  1. Copy some text from any website (I copy from my own WordPress blog);

  2. Create a new article and paste the text.

I confirm seeing this using Firefox Nightly 78.0a1 (2020-05-25) (64-bit) on Linux. It started a few weeks ago; unfortunately I don't recall exactly when. At the time I took a wait-it-will-be-fixed approach. It seems to be a regression.

How to demonstrate. Go to any Wikipedia page:

  1. Copy 2 text paragraphs.
  2. Paste into a text editor such as vim on Linux, and the single line between the 2 paragraphs gets converted into 3 newlines.
  3. Pasting into a graphical editor such as 'kate' works fine.
  4. Doing the same using Google Chrome works fine in vim as well as in graphical editors.

As best I can tell, it's only Firefox that exhibits this behavior for me.

Hope that's helpful.

I can also confirm this problem exists on 78.0.2. It is very easy to reproduce for me and extremely annoying. As an example, every time I copy a web page into my note-taking application the newlines are all messed up, and if I want the text formatted correctly I have to fix them manually. I ended up opening the pages in Chrome when I want to copy some content, which is really not nice.

Please let me know if I can help somehow.

I can confirm this bug on Firefox 91. It's very annoying and happens on code blocks with line numbers, like this page for example: https://www.bigbinary.com/learn-rubyonrails-book/linting-and-formatting-code

This bug still bothers me in Firefox 96.0.1 (Arch Linux).

I am seeing this bug even on Firefox 97.0.2 (64-bit) on Windows 10 64-bit.

Here's a minimal HTML example with CSS (user-select: none;) that causes this issue: https://jsfiddle.net/kaushalmodi/uvk7gL9x/12/

  1. Open that URL
  2. Hit Run

Copy the top two lines and paste in any editor (Notepad++, Emacs, doesn't matter). You will see that a blank line is inserted between the 2 copied lines when pasted elsewhere.

You can find a GIF in action showing a GitHub issue here.


This issue does not occur on these browsers:

  1. Google Chrome
  2. Microsoft Edge

I was also able to reproduce on Firefox 98.0 (linux) with Kaushal's jsfiddle.

Webcompat Priority: revisit → P3
Severity: normal → S3

Similar issues are present when copying text from SharePoint Markdown code blocks. I'm using Firefox 102.12.0esr (64-bit) on Windows.
