Last Comment Bug 731896 - text copied from html tables with ctrl+select not to be separated by tabs anymore.
: text copied from html tables with ctrl+select not to be separated by tabs any...
Status: RESOLVED FIXED
[qa+]
: regression
Product: Core
Classification: Components
Component: Serializers (show other bugs)
: 13 Branch
: x86 All
: -- normal with 1 vote (vote)
: mozilla14
Assigned To: Henri Sivonen (:hsivonen)
:
Mentors:
data:text/html;charset=utf-8,<body><t...
Depends on:
Blocks: 650784
  Show dependency treegraph
 
Reported: 2012-02-29 18:57 PST by Alice0775 White
Modified: 2012-05-16 06:10 PDT (History)
13 users (show)
hsivonen: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
verified


Attachments
First attempt at a fix (2.62 KB, patch)
2012-03-05 06:55 PST, Henri Sivonen (:hsivonen)
no flags Details | Diff | Review
WIP part 1 (7.93 KB, patch)
2012-03-19 08:20 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Review
WIP part 2 (8.45 KB, patch)
2012-03-19 08:21 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Review
Export plain text to clipboard directly from the DOM without intermediate HTML (15.07 KB, patch)
2012-03-22 03:40 PDT, Henri Sivonen (:hsivonen)
bugs: review-
Details | Diff | Review
Export plain text to clipboard directly from the DOM without intermediate HTML, emulate old leading space quirk (12.29 KB, patch)
2012-03-27 01:05 PDT, Henri Sivonen (:hsivonen)
bugs: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Review

Description Alice0775 White 2012-02-29 18:57:48 PST
Build Identifier:
http://hg.mozilla.org/mozilla-central/rev/30b4f99a137c
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120229 Firefox/13.0a1 ID:20120229031108

@Rodze reports a bug in http://forums.mozillazine.org/viewtopic.php?p=11783897#p11783897

text copied from html tables with ctrl+select not to be separated by tabs anymore.

Reproducible: Always

1. Start Firefox with new profile
2. Open URL
3. Hold Ctrl and select any cells of the table
4. Copy and paste it anywhere

Actual Results:
  everything is together

Expected Results:
  columns separated by tabs and rows by new lines.



Regression window(m-c)
Works:
http://hg.mozilla.org/mozilla-central/rev/08c0c1440171
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120228 Firefox/13.0a1 ID:20120228070559
Fails:
http://hg.mozilla.org/mozilla-central/rev/dde4e0089a18
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120228 Firefox/13.0a1 ID:20120228092158
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=08c0c1440171&tochange=dde4e0089a18

Regression window(m-i)
Works:
http://hg.mozilla.org/integration/mozilla-inbound/rev/fdd55a46661c
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120226 Firefox/13.0a1 ID:20120227033049
Fails:
http://hg.mozilla.org/integration/mozilla-inbound/rev/8ea9dc2f8570
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120227 Firefox/13.0a1 ID:20120227035949
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fdd55a46661c&tochange=8ea9dc2f8570

Suspected : Bug 650784
Comment 1 Henri Sivonen (:hsivonen) 2012-03-05 06:08:41 PST
So the problem is this:

The change landed in bug 650784 changed clipboard export to use a mechanism that parses an HTML string as if it was a full HTML document and then serializes its body.

This works for full documents, because everyone expects the plain text serialization to omit <title> text which is outside <body>.

This works for most fragments, because almost all the time the fragment context is a body-like context and the parser happily implies <html><head></head><body> which the serializer ignores.

The problem is that when a <tr> or <td> is seen without a <table> first (i.e. the implied fragment context isn't body-like), the HTML parser that runs in the full doc mode discards those tokens.

AFAICT, the ways forward are:
 1) Making plain text clipboard export happen directly by showing the DOM range to the plain text serializer without going through an intermediate HTML string.
or
 2) Implementing the kind of DWIM insertion mode in the HTML tree builder that chooses the first real insertion mode upon seeing the first start tag token.

#2 has already been proposed for other purposes but #1 seems like a more sensible fix considering that we are trying to generate plain text from a DOM.

bz, smaug: Opinions?
Comment 2 Henri Sivonen (:hsivonen) 2012-03-05 06:11:53 PST
Nevermind. Solution #1 seems rather easy and the right thing to do.
Comment 3 Henri Sivonen (:hsivonen) 2012-03-05 06:55:51 PST
Created attachment 602885 [details] [diff] [review]
First attempt at a fix
Comment 4 Henri Sivonen (:hsivonen) 2012-03-15 08:06:09 PDT
Ehsan, the test case for bug 551704 has expectations about <br> at the end of selection and whether the plain text clipboard flavor ends in a line break. How important is it to keep the exact behavior there? Especially considering that the test case expectations for trailing line break in the HTML clipboard flavor and plain text clipboard flavor are different.
Comment 5 Henri Sivonen (:hsivonen) 2012-03-16 01:52:11 PDT
Adding Ehsan to CC for real. Ehsan, see comment 4.
Comment 6 :Ehsan Akhgari (out sick) 2012-03-16 15:29:58 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #4)
> Ehsan, the test case for bug 551704 has expectations about <br> at the end
> of selection and whether the plain text clipboard flavor ends in a line
> break. How important is it to keep the exact behavior there? Especially
> considering that the test case expectations for trailing line break in the
> HTML clipboard flavor and plain text clipboard flavor are different.

It is important (see bug 551704 comment 12).  Why do you want to change this behavior?  Note that the expected HTML is compared against innerHTML and the expected text is compared against what gets copied on the clipboard.
Comment 7 Henri Sivonen (:hsivonen) 2012-03-19 05:44:49 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #6)
> It is important (see bug 551704 comment 12).

That comment doesn't really explain why.

> Why do you want to change this behavior?

Due to reasons mentioned in comment 1, I'm changing text plain clipboard export to do a direct DOM to plain text serialization instead of DOM to HTML to plain text. I hoisted the check for invisible BRs from the HTML serializer to nsDocumentEncoder. Still with this change, for reasons I don't know, I get different results for the trailing line break in plain text compared to what the test case for bug 551704. For some cases tested, the trailing line break gets zapped. For some cases it doesn't get zapped.

> Note that the expected HTML is compared against innerHTML and the
> expected text is compared against what gets copied on the clipboard.

Good point.
Comment 8 :Ehsan Akhgari (out sick) 2012-03-19 08:13:53 PDT
Can you please attach both test cases here?
Comment 9 Henri Sivonen (:hsivonen) 2012-03-19 08:20:35 PDT
Created attachment 607173 [details] [diff] [review]
WIP part 1
Comment 10 Henri Sivonen (:hsivonen) 2012-03-19 08:21:12 PDT
Created attachment 607174 [details] [diff] [review]
WIP part 2
Comment 11 Henri Sivonen (:hsivonen) 2012-03-19 08:22:26 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #8)
> Can you please attach both test cases here?

Both? The test case for bug 551704 is on m-c. These patches show the problem when applied.
Comment 12 :Ehsan Akhgari (out sick) 2012-03-19 08:48:09 PDT
OK I'm kind of lost here.  Here's what currently happens with the rationale for it:

If you have an invisible BR at the end of the childlist of a node, innerHTML on its parent returns the <br> at the end, since the serialization algorithm <http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm> does not let us strip it out.  However, for copy and paste, we should strive to paste what the user sees on the screen.  If the user pastes into a plaintext field in an app, and sees the trailing new line, that would be puzzling as the newline was invisible in the source, and it would look like we're adding a linebreak upon paste.

Does this make sense?
Comment 13 Henri Sivonen (:hsivonen) 2012-03-20 00:26:19 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #12)
> Does this make sense?

Yes, but any ideas why my patches don't end up removing the trailing break consistently?

Nominating as tracking, since this is a regression in the Firefox 13 cycle. We should figure out a fix while 13 is still in Aurora.
Comment 14 Alex Keybl [:akeybl] 2012-03-21 14:49:54 PDT
Is bug 650784 critical enough for 13's release that we'd consider taking a forward fix rather than backing it out now?
Comment 15 Henri Sivonen (:hsivonen) 2012-03-22 00:16:08 PDT
(In reply to Alex Keybl [:akeybl] from comment #14)
> Is bug 650784 critical enough for 13's release that we'd consider taking a
> forward fix rather than backing it out now?

It's not critical for 13 release, but it's annoying to back out, because it requires first backing out stuff from Firefox, then from Thunderbird, then some more stuff from Firefox.

Also, this follow-up is ready except for trailing line breaks in the plain text clipboard flavor in some special cases.
Comment 16 Henri Sivonen (:hsivonen) 2012-03-22 03:40:30 PDT
Created attachment 608284 [details] [diff] [review]
Export plain text to clipboard directly from the DOM without intermediate HTML

This should work... Let's see what the tryserver says.
Comment 17 :Ehsan Akhgari (out sick) 2012-03-22 16:19:41 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> (In reply to Ehsan Akhgari [:ehsan] from comment #12)
> > Does this make sense?
> 
> Yes, but any ideas why my patches don't end up removing the trailing break
> consistently?

Nothing jumps at me by just looking over the patch...  Let me know if you want me to debug it for you (I can do that, but I haven't done that so far to make sure that we're not duplicating each others' efforts.  :-)
Comment 18 Henri Sivonen (:hsivonen) 2012-03-23 00:31:53 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #17)
> (In reply to Henri Sivonen (:hsivonen) from comment #13)
> > (In reply to Ehsan Akhgari [:ehsan] from comment #12)
> > > Does this make sense?
> > 
> > Yes, but any ideas why my patches don't end up removing the trailing break
> > consistently?
> 
> Nothing jumps at me by just looking over the patch...  Let me know if you
> want me to debug it for you (I can do that, but I haven't done that so far
> to make sure that we're not duplicating each others' efforts.  :-)

Thanks, but no need anymore. The attached patch works.
Comment 19 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-03-26 10:12:44 PDT
So this regresses Bug 564737?
Comment 20 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-03-26 10:15:39 PDT
Comment on attachment 608284 [details] [diff] [review]
Export plain text to clipboard directly from the DOM without intermediate HTML

And that bug has lots of dups.
Comment 21 Henri Sivonen (:hsivonen) 2012-03-26 11:18:02 PDT
(In reply to Olli Pettay [:smaug] from comment #19)
> So this regresses Bug 564737?

How so? That's about mailnews format conversion. This patch decouples clipboard from HTML source to plain text conversion. Aren't the effects of that fix on the clipboard just collateral damage and not a feature?
Comment 22 Henri Sivonen (:hsivonen) 2012-03-27 01:05:14 PDT
Created attachment 609645 [details] [diff] [review]
Export plain text to clipboard directly from the DOM without intermediate HTML, emulate old leading space quirk

This patch emulates the quirk that the leading space is dropped when the selection starts with a space and isn't in a text input, textarea or pre. I filed bug 739537 about removing the quirk.
Comment 23 Henri Sivonen (:hsivonen) 2012-03-31 07:13:14 PDT
Thanks. Landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b39493b4caa7
Comment 24 Ed Morley [:emorley] 2012-03-31 19:15:31 PDT
https://hg.mozilla.org/mozilla-central/rev/b39493b4caa7
Comment 25 Henri Sivonen (:hsivonen) 2012-04-03 00:20:55 PDT
Comment on attachment 609645 [details] [diff] [review]
Export plain text to clipboard directly from the DOM without intermediate HTML, emulate old leading space quirk

[Approval Request Comment]
Regression caused by (bug #): Bug 650784

User impact if declined:

When copying a part of an HTML table, cell boundaries in Firefox 13 wouldn't be represented as tabs in the plain text clipboard flavor, which would have an adverse effect on pasting into spreadsheets that ingest the plain text clipboard flavor as opposed to ingesting the HTML flavor and on pasting into a text editor and running a find and replace for tabs to post-process the data.

Testing completed (on m-c, etc.):

Has baked on m-c for a couple of days without complaints reaching me.

Risk to taking this patch (and alternatives if risky):

This patch makes us export plain text to directly from the DOM when previously we first exported from DOM into HTML, then parsed the HTML and then generated plain text from the output of the HTML parser. A change like this could regress something that our unit tests don't cover. (Indeed, *this* bug happened in the first place, because we didn't have enough test coverage.) However, the plain text serializer is the same as the one previously used after the HTML parsing phase, so that mitigates the opportunity for differences.

The alternatives are shipping Firefox 13 with the regression or backing out changes that were made to HTML to plain text conversion. Backing those changes would be particularly annoying, because Thunderbird relies on the changes and then a further changes were made after the Thunderbird changes. Thus, it would be necessary to first back out stuff from mozilla-central, then to back out stuff from comm-central and then to back out even more stuff from mozilla-central.

I think taking this patch gives less opportunity for accidents than a multi-stage backout from two trees.

String changes made by this patch: None.
Comment 26 Lukas Blakk [:lsblakk] use ?needinfo 2012-04-03 11:52:58 PDT
Comment on attachment 609645 [details] [diff] [review]
Export plain text to clipboard directly from the DOM without intermediate HTML, emulate old leading space quirk

[Triage Comment]
agreed this is better than a) shipping regression b) trying to detangle the html conversion as a backout.  I trust you'll report back once it's baked on Aurora with confirmation of lack of regressions from this.
Comment 27 Henri Sivonen (:hsivonen) 2012-04-03 23:55:42 PDT
Thank you for the approval. Landed:
https://hg.mozilla.org/releases/mozilla-aurora/rev/b0fa8e8ffbc0

I've made a note for myself to report back on the Aurora baking situation on April 12th.
Comment 28 Henri Sivonen (:hsivonen) 2012-04-12 00:33:57 PDT
(In reply to Lukas Blakk [:lsblakk] from comment #26)
> I trust you'll report back once it's baked on
> Aurora with confirmation of lack of regressions from this.

No reports about regressions have reached me. Additionally, I searched for open and duplicate bugs (across all of b.m.o) filed since 2012-04-04 and having "copy", "paste" or "clipboard" in the summary and found no reports of regressions.
Comment 29 Virgil Dicu [:virgil] [QA] 2012-05-16 06:07:58 PDT
Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20100101 Firefox/13.0

Verified on Ubuntu 12.04, Windows 7, Mac Os 10.6 in Firefox 13 beta 3. 
Using the URL: rows are separated by tab and columns by line when copying in a text editor.
Comment 30 Virgil Dicu [:virgil] [QA] 2012-05-16 06:10:27 PDT
(In reply to Virgil Dicu [:virgil] [QA] from comment #29)
> Using the URL: rows are separated by tab and columns by line when copying in
> a text editor.

The other way around actually: columns-tab, rows-line

Note You need to log in before you can comment on or make changes to this bug.