Closed Bug 365805 Opened 13 years ago Closed 10 years ago

Copy/paste lists should not add #

Categories

(Core :: DOM: Serializers, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b1

People

(Reporter: vlad.alexander, Assigned: ehsan)


Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0

When copying lists from Firefox and pasting them into a text editor like Notepad, FF adds a # before each list item. This is incorrect behavior because only content should be pasted.

Reproducible: Always

Steps to Reproduce:
1. In FF, load a Web page that contains a list (ordered or unordered).
2. Select the list using a mouse and copy it.
3. Open Notepad and paste.

Actual Results:  
A # followed by a space is added before each list item.

Expected Results:  
Only the contents of each list item should be pasted.
Why do you say this behavior is incorrect?  (Btw, I think it uses "#" for ordered lists and "*" for unordered lists.)
Assignee: nobody → dom-to-text
Component: General → DOM to Text Conversion
Product: Firefox → Core
QA Contact: general
Version: unspecified → Trunk
These characters are not part of the content and are added strictly for visual effect. As a result, this behavior can cause problems when copying content. Let's say I use a list to present a markup example in an article that I am writing. The markup will look like this:

<ol>
	<li>&lt;html&gt;</li>
	<li>&lt;head&gt;</li>
	<li>&lt;title&gt;My title&lt;/title&gt;</li>
	<li>&lt;/head&gt;</li>
	<li>&lt;body&gt;</li>
	<li>&lt;/body&gt;</li>
	<li>&lt;/html&gt;</li>
</ol>

Via CSS I can format this list to render in the browser like this:

<html>
<head>
<title>My title</title>
</head>
<body>
</body>
</html>

When readers of the article copy the example into Notepad, they get:

# <html>
# <head>
# <title>My title</title>
# </head>
# <body>
# </body>
# </html>

So the # characters (or *) corrupt my example.

Here is a Web page that uses lists in this way:

http://xhtml.com/en/xhtml/reference/ol/
Although in your example this behavior is truly very annoying, lists are not to be used this way.
Excerpt from W3C's HTML 4.01 spec, paragraph 10.1:
"Lists may contain:
* Unordered information.
* Ordered information.
* Definitions."

Your example therefore simply misuses lists (for whatever reason) to render a code example. Let me make this clearer: THIS IS TRULY NOT A FF ERROR!

Hence I recommend marking this bug as INVALID.
>lists are not to be used this way.
Where does it say that?

>Lists may contain: ...Ordered information...
That is exactly what code/markup is - ordered information.

Lists are perfect for rendering code/markup examples for the following reasons:

1. They can be visually styled to improve readability. There can be a visual delimiter between list items, and/or each list item can be displayed in a different color, especially using the CSS 3 :nth-child() selector.

2. Lists are very easy to navigate using assistive technologies.

Firefox is the only browser that adds content to what is pasted. This is wrong behavior.
Let's step back and take a very simple example. Say a user visits a Web page with a very simple list:

<ul>
	<li>Apples</li>
	<li>Peaches</li>
	<li>Pears</li>
</ul>

Then they copy this list from a Web page Firefox is rendering and paste it into another application like Microsoft Word or a WYSIWYG editor. They get:

# Apples
# Peaches
# Pears

Then the user wants to convert this to a real list in Word or in a WYSIWYG editor. But they are unnecessarily burdened with having to delete # characters. Why?
Duplicate of this bug: 414206
I'm the author of the duplicate bug. When you copy text from a website, I think the most important thing is that you retrieve the copied text as accurately as possible. This is important because copied text is also used for further processing, not only for a first-look reading. Please take a look at the CSV example in the testcase of the duplicate bug (414206); it shows why not applying extra formatting is critical for copying text. In that case, extra formatting renders the CSV data completely useless. Improving readability is fine, but I would say it should not come at the cost of functional needs.
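To make the CSV concern concrete, here is a minimal C++ sketch. The helper names and the sample row are hypothetical, not taken from the testcase in bug 414206, and the comma split is deliberately naive (no quoting support). It shows that once a "# " marker is prepended to a copied row, the marker becomes part of the first field:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Naive CSV split on commas (no quoting support); for illustration only.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper: prepend the marker the old serializer emitted per list item.
std::string withListMarker(const std::string& item, const std::string& marker) {
    return marker + item;
}
```

With the marker present, the first field of a row such as "1,apple,0.5" reads "# 1" instead of "1", so any numeric parsing of that column fails.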

I've tested this in Internet Explorer 7; IE copies the text without applying special formatting (# or *). I think browsers should copy text from the same website in the same way. I haven't checked other browsers, but I think Firefox is the only one applying special formatting here. This makes Firefox unusable for working with certain web pages. I don't think it is reasonable to expect website owners to change their sites; they don't show the # or * bullets to their users, so why would they expect these to be copied?
This bug was reported about a year ago. Who will decide how this is to be resolved, or are the naysayers just silencing it into oblivion?

Vlad Alexander clearly demonstrated this error and countered both replies that this is correct behavior. As both people did not post any comment for at least half a year, I must assume they agree with his rebuttal. 

Can the status be changed to confirmed? This is a real problem and does deserve attention.
I agree with Jan. This behaviour is truly to be thought of as a bug vs. feature thing and thus should be discussed by a broader community.

Personally, I prefer this behaviour because it allows for quick copy/paste between Firefox and OpenOffice.org without losing syntactic information.
However, it can also be _very_ annoying in some situations (as mentioned above e.g. in code listings; see my previous comment what I think about putting code in lists). So to make life easier, let's simply make it configurable. Either as a GUI option or in about:config.
Flags: blocking1.8.1.13?
year-old unconfirmed enhancement requests don't "block" a security update. Please try to get this fixed on trunk. If the patch is small and well-tested then after it's been landed on trunk you can request approval for the branch on the patch.
Flags: blocking1.8.1.13? → blocking1.8.1.13-
I have the same problem with the # marks. A colleague and I have been trying to find another way to display code with line numbers that still allows copying and pasting the code without the line numbers.
Our first attempt, a 2-column, 1-row table, has the problem that the line numbers get out of sync with the code when it is formatted in italic, bold or underline style. The text height seems to differ: text using these styles appears to be 1 or 2 px taller, so the offset may only become visible after many lines of code (100+), depending on how often the styles are used.
You can see this by rendering some code with GeSHi (using set_header_type(GESHI_HEADER_PRE_TABLE)).
We looked at many other pages. Some use 3 tables (1 master [2 columns, 1 row], 1 for line numbers [1 column, x rows] and 1 for code [also 1 cell per line]). That is too much overhead, which stresses the web server with long code snippets.
It should be fast to generate and display :)

If anybody says that lists should not be used for formatting code, please post a working alternative :) So far we have found no way to do it without glitches.

To keep this "feature" usable, an option to ENABLE it should be added to preferences/about:config. That means the default should be (as in other browsers) without these characters.
You could also change the selection behaviour so that you can select the text with or without the bullets (like selecting in a 2-column table). This may be the most user-friendly version, but it is more work to implement. Of course, you could also fix the different-height issue :)
I know, this issue is somewhat older, but nearly the same age as the complaints from users of Syntax Highlighters like GeSHi.

So, for the warmup we take this bug report:
http://sourceforge.net/tracker/index.php?func=detail&aid=1651996&group_id=114997&atid=670231

What's the problem with it: people need highlighted code to show line numbers, as navigating source with hundreds of lines without proper line numbers is simply a PITA. OK, what possibilities are there:
- Showing line numbers using lists: by far the ONLY method that works WITH EVERY BROWSER, visually as well as textually, EXCEPT for Gecko!!!
- Using multiple tables: by far the most stupid method, as it breaks the logical structure of line numbers, which are meta-information and thus should be linked to the line they refer to (basically the opposite of what you do when using a table for this). This method is also very error-prone with regard to formatting.
- Do some other hackery that emulates the visual output to look right: hack even more to finally get something that isn't semantically anything near a code listing, put some CSS 9000.0 stuff into it and hope that some browser interprets it as expected. That's simply foo and nothing more than what #3 complains about as being misuse of standards.

So, you break way number 1, which is, in the sense of the standards, the way to go; you expect people to fail on way 2 because your beloved feature is omnipotent for everything; and you offer people non-existent way number 3, which allows visual emulation but doesn't solve the actual problem.

So, let's try way number 2:
http://sourceforge.net/tracker/index.php?func=detail&aid=2046534&group_id=114997&atid=670231

Isn't it nice? You implement it and, instead of a cross-browser compatible solution, you get a mess that is even bigger than the problem you started with. So let's combine way 2 with possibility 3 and hope it works:

http://milianw.de/files/geshi-alignment-merged.html

Wow, it works, but it is useless in practice with automated code generation, as this example is not real GeSHi output at all but a manually modified version of it, tweaked until it worked (according to tests in about 30 different browsers).

But that's yet another NO_GO, since that solution only works with external style sheets, breaking the half of the websites using GeSHi with inline styles, which have their reasons for doing so (even though their code is somewhat bigger). The reason is simple: you can't generate the .php *-rule of the example without messing up the code far beyond sanity.

So, how to solve this problem without too many disadvantages???

Well, what about this: allow the website (using one of those many non-standard CSS properties) to control what additional markup should be present in the text rendering of the site? That property could then take values like:

- none: No additional markup
- default: Let the browser (using about:config) decide on how to render
- auto: Some automatism, which could take additional markup, like code and pre tags, into account when deciding if additional markup should be added
- always: Always add special markup
- ul: Add markup for an unordered list
- ol: Add markup for an ordered list
- custom: use the text of the custom-text attribute to add for text rendering of this list.
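The proposed property was never specified anywhere; the sketch below is only one hypothetical way a plain-text serializer might map its values to a list-item prefix. The enum, the Prefs struct standing in for an about:config preference, and the "suppress inside code/pre" heuristic chosen for auto are all assumptions for illustration, as is the use of "# " for ordered and "* " for unordered lists (the markers mentioned earlier in this thread).

```cpp
#include <cassert>
#include <string>

// Hypothetical values of the proposed (never implemented) CSS property that
// would control list markers in the plain-text rendering of a page.
enum class TextListMarker { None, Default, Auto, Always, Ul, Ol, Custom };

enum class ListKind { Unordered, Ordered };

// Hypothetical stand-in for the about:config preference used by "default".
struct Prefs {
    bool addListMarkers = false;
};

// Returns the prefix to emit before a list item in the plain-text output.
std::string resolveMarker(TextListMarker value, ListKind kind, const Prefs& prefs,
                          bool insideCodeOrPre, const std::string& customText) {
    // The marker matching the list's own type.
    const std::string natural = (kind == ListKind::Ordered) ? "# " : "* ";
    switch (value) {
        case TextListMarker::None:    return "";
        case TextListMarker::Default: return prefs.addListMarkers ? natural : "";
        // One possible heuristic: no markers for lists inside code/pre markup.
        case TextListMarker::Auto:    return insideCodeOrPre ? "" : natural;
        case TextListMarker::Always:  return natural;
        case TextListMarker::Ul:      return "* ";
        case TextListMarker::Ol:      return "# ";
        case TextListMarker::Custom:  return customText;
    }
    return "";
}
```

Keeping the page-level value and the user preference separate means a site like a syntax highlighter could pin "none" while ordinary pages fall back to whatever the user configured.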

This way you even gain some advantage from this broken behaviour, because instead of generally enabling or disabling this function, you let the web author decide where this special rendering should be added. This not only solves this issue for the many users of syntax highlighters, but also allows more advanced things like creating documentation with code samples AND lists inside it, while still having simple copy&paste export work properly. Don't believe this would be used? The GeSHi documentation in text format has been copy&pasted from the HTML version for a long time; with this additional way of specifying things, even the code samples could be updated without hassle.

So for the person behind Comment #3: if you still believe this is misuse of lists, show me YOUR WAY to do this (automatable!!!) without abusing lists AND working in ALL browsers, AND without breaking usability for handicapped people who use screen readers!

I must second #5 here: [irony]Who, apart from MediaWiki users wanting to re-author their pages as wiki text, does this help? Not to mention how rarely that is actually done![/irony]

@Phillip Weißenbacher: What do you consider code listings to be, if not ordered information? Please give a detailed description, as well as a WORKING solution that works with all browsers.

In general you can say: an automatism that works for some cases and breaks others, and that can't be evaded by simple means even when better solutions than the automatism are available, is a bug. It's that simple.

Or in other words: if I go to my bank to get money from my account and the ATM always dispenses the biggest notes available, even though I'd like them differently (e.g. 3x20 Euro instead of 50+10 Euro), it destroys the usability I just gained by not going to the staff and asking for my money there.

So: Set this issue CONFIRMED already!

Regards,
BenBE.
@BenBE: Thank you for your elaborate argumentation.

Nevertheless, I feel a bit misunderstood: I did propose making this configurable in my second post, although I admit I wasn't clear about which way to go. My opinion is that if other browsers default to not including additional contextual information (e.g. number signs) when copying lists, Gecko shouldn't either.
Whether code is structural information and therefore appropriate to put into lists is still questionable: to me you're still misusing lists for their ability to display a number before a line of text, just as one would misuse tables to get an accurate layout 10 years ago. The perfect thing from my POV is to (once again) separate display and content. IMHO the best way to go would be to wrap code in <pre><code> tags and create a CSS switch to display line numbers. I know this is reverie, but it would be perfect from a designer's POV (just as BenBE proposed: power to the people!).

Lists are, nevertheless, a good workaround to achieve the same result with cross-browser support, with only Gecko standing in its way with a rather (in times of XML, we must admit) dull way of adding context-sensitive information.

Conclusion: Mark this as a bug, change the default behaviour to include nothing but line-break-delimited text, and eventually make it configurable through about:config to allow advanced users/third parties to resort to the old behaviour.

Regards,
Philipp
Flags: wanted1.9.1?
I agree with Philipp. Adding a CSS property is nice and all, but the solution is way more complicated than the problem (it would take way too long, while we need a fix NOW). It's a huge inconvenience that Gecko applies extra formatting while all other browsers do not. It's time to set Gecko's default mode to no extra formatting (thus ensuring compatibility among browsers). An extra option could be added to enable the extra formatting.

So, as there is some sort of consensus now, could this be marked confirmed and fixed ASAP (at least the default behaviour)? It's been almost two years since this was reported.
It's not as easy as it sounds :(
If you mean CSS like the following, then we'll have cross-browser problems (e.g. IE6 does not support these styles). "line" is a pseudo-HTML tag used to test the counter without changing anything in the layout. This code works in Firefox, but not in IE6.

body {
  counter-reset: linecounter;
}
line:before {
  content: counter(linecounter) " ";
  counter-increment: linecounter;
}

It also seems impossible to set the width of the counter text, so you get different indentation of the lines.

regards Frank
What I mean is that we should ignore the CSS solution, because it's not a realistic short-term solution. Instead, Gecko should by default conform to the behaviour of all other browsers by not applying extra markup when copying and pasting.

If preferred, extra formatting could be added as an option in about:config.

It should be very easy to change Gecko's default behaviour so that no special formatting is applied.
Assignee: dom-to-text → nobody
QA Contact: dom-to-text
So, the code responsible for this is in nsPlainTextSerializer.  It's easy to remove this behavior.  Jesse told me that this is something that annoys him, so I'm inclined to remove it.
Assignee: nobody → ehsan
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attached patch Patch (v1) (obsolete)
Attachment #452643 - Flags: review?(jst)
Attached patch Patch (v2)
A slightly better patch, this time with tests!
Attachment #452643 - Attachment is obsolete: true
Attachment #452722 - Flags: review?(jst)
Attachment #452643 - Flags: review?(jst)
Attachment #452722 - Flags: review?(jst) → review+
http://hg.mozilla.org/mozilla-central/rev/7adf796c7fe3
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a6
I think I’ve just come up with the perfect way to solve this (albeit too late, it seems): invent a new pseudo-class :copied which would be applied to selected elements when they are copied to the clipboard. This seems to be the best of both worlds, because it gives browser vendors (via the default style sheet), website authors, and end users (via user styles) control over this functionality.
Firefox would then include something like the following in its default style sheet:
ul > li:copied::before {
	content: "* ";
}
ol > li:copied::before {
    content: "# ";
}
(That’s because I happen to like, and find correct, the previous behaviour, which is why I weighed in on this bug in the first place.)

Web sites that use lists to display code listings (which I don’t find too incorrect, semantically) could then easily override this:
ol.code-listing > li:copied::before {
    content: "";
}

The :copied pseudo-class should of course be pushed into the standards bodies and until it is ratified be used as :-moz-copied.

I know this raises some concerns because it makes it easy for web site authors to disallow copying text from their website or to make copied content unusable but please note that this is already possible today with JavaScript (and is unfortunately used by some services like Tynt [see http://daringfireball.net/2010/05/tynt_copy_paste_jerks]).

Instead of the :copied pseudo-class, one could also imagine (ab-?)using the tty media type (or inventing a textonly media type), which Firefox would apply when copying a selection to the clipboard (at least to the unstyled-text clipboard). The drawbacks of this are, however, rather severe: many web site authors include some of their style sheets as screen even for styles which are decidedly not screen-only and should affect the copied text.

Sorry about the bugspam and about being so late but I really don’t want this feature gone (and had no idea removing it was under consideration) and am just brainstorming some possible solutions that both sides could live with.
Raphael, this is I think something you should suggest on the www-style mailing list.
@ehsan: your patch breaks mail, IMHO

If the plaintext serializer added "*" for lists, it was mainly for mail, to have a readable text/plain version of an HTML mail.

I don't know whether I should reopen this bug to find a better solution or create a new bug.

I think you should take care, during the copy paste, of some encoder flags (OutputFormatted ?), or to create a new one to suppress such formatting,
(In reply to comment #24)
> I think you should take care, during the copy paste, of some encoder flags
> (OutputFormatted ?), or to create a new one to suppress such formatting,

I'm not sure if I understand your suggestion here.
Instead of completely removing the code for eHTMLTag_ol and eHTMLTag_li, perhaps a better solution is to run it only if some nsIDocumentEncoder flags are set, although I don't remember exactly which flags are set for copy/paste and for mail output.

Something like this:

else if (type == eHTMLTag_ol) {
  EnsureVerticalSpace(mULCount + mOLStackIndex == 0 ? 1 : 0);
  if (mFlags & nsIDocumentEncoder::OutputFormatted) {
     // here the code you removed
  }
}
...
else if (type == eHTMLTag_li && (mFlags & nsIDocumentEncoder::OutputFormatted)) {
  // here the code you removed
}

etc.

We have to investigate if nsIDocumentEncoder::OutputFormatted is the right flag for mail output and if it is not used for the copy/paste. If it is used by both, perhaps we should create a specific flag which will be set by the mail output.
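A standalone C++ sketch of the same idea, outside the real nsPlainTextSerializer: list markers are emitted only when the caller passes an explicit opt-in flag, so a plain copy/paste (no flag set) gets only the item text. kOutputListMarkers is a hypothetical flag, not an actual nsIDocumentEncoder constant, and the list model is simplified to a single level without nesting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical encoder flags in a bitmask style; names and values are illustrative only.
enum EncoderFlags : std::uint32_t {
    kOutputNone = 0,
    kOutputListMarkers = 1u << 0,  // set only by callers that want markers (e.g. mail)
};

// Simplified single-level list model.
struct List {
    bool ordered;
    std::vector<std::string> items;
};

// Serialize one list to plain text, one item per line. Markers are opt-in:
// without kOutputListMarkers the output is just the item text.
std::string serializeList(const List& list, std::uint32_t flags) {
    std::string out;
    for (const std::string& item : list.items) {
        if (flags & kOutputListMarkers) {
            out += list.ordered ? "# " : "* ";
        }
        out += item;
        out += '\n';
    }
    return out;
}
```

Making the marker behaviour opt-in rather than opt-out keeps copy/paste as the default path, while a mail composer that wants a readable text/plain part can still request the markers.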
That sounds fine to me, as long as that flag is not set by default.  Please file a new bug about this.
Blocks: 593758
No longer blocks: 593758
Depends on: 593758
Blocks: 268069
> body {
>   counter-reset: linecounter;
> }
> line:before {
>   content: counter(linecounter) " ";
>   counter-increment: linecounter;
> }
> 
> it seems also not possible to set the width of the counter-text.so you get
> different indentation of the lines.
> 
> regards Frank

Actually, by giving the pseudo-element line:before display: inline-block, you can apply a specific width to it, so the indentation aligns just fine :)
Depends on: 662736
The "fix" for this non-bug is a regression.  It breaks the assumption of "what you select is what you get".  Copy&paste is supposed to get me what I marked, no matter where the text comes from.

Please revert.
Another call to revert. For content with numbered paragraphs, the numbers are a vital part of the text content. Not copying them with the rest of the text renders the copy incorrect.
Copying the numbers as shown would be reasonable.  The old behavior (adding the "#" character, for both numbered lists and invisible lists) was just weird.
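A minimal sketch of "copying the numbers as shown", assuming plain decimal numbering with a ". " separator and an optional start value (as with <ol start="...">). Real lists can use other list-style-type values, reversed numbering or CSS counters, none of which this hypothetical helper handles.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: serialize an ordered list with the decimal ordinals a
// default-styled <ol> would display, starting at `start`, instead of "#".
std::string serializeOrderedList(const std::vector<std::string>& items, int start = 1) {
    std::string out;
    int n = start;
    for (const std::string& item : items) {
        out += std::to_string(n++) + ". " + item + "\n";
    }
    return out;
}
```

For example, a two-item list would come out as "1. First" and "2. Second" on separate lines, which matches what the reader saw rather than a generic "#" prefix.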

Fixing it correctly might require a fix for bug 12460, so please complain in bug 662736, not here.
Blocks: 662736
No longer depends on: 662736