Closed Bug 94176 Opened 23 years ago Closed 22 years ago

Text copied from table with multiple rows has no CR/LF

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: justincwatt, Assigned: mozeditor)

References

()

Details

(Keywords: regression, Whiteboard: [plaintext] fixinhand edt_x3)

Attachments

(1 file, 3 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1)
Gecko/20010607 Netscape6/6.1b1
BuildID:    2001080110

As described on the following page:
http://www.unc.edu/~jwatt/tablecopyproblem.html

...when text is copied out of a table with multiple rows and into a text editor
(or email), there is no CR/LF (carriage return/line feed) between each row,
therefore the table data is copied out of the table as one single line of text.

Reproducible: Always
Steps to Reproduce:
1. Copy text from table on http://www.unc.edu/~jwatt/tablecopyproblem.html
2. Paste in Notepad (or similar text editor) with auto word-wrap off


Actual Results:  Text from multiple table rows is pasted as one (infinitely)
lone line of text

Expected Results:  There should be a CRLF after each row in the paste text
sounds like a serializer issue in Navigator; reassign
Assignee: mjudge → anthonyd
Component: Selection → DOM to Text Conversion
QA Contact: tpreston → sujay
so the request would be that a cr/lf be inserted for every <tr>? SO what would
be the expected behavior for nested tables?
Severity: normal → enhancement
Priority: -- → P4
Whiteboard: [RFE][plaintext]
Target Milestone: --- → Future
This is a regression.  We used to append newlines at the end of a table row.  I
wonder if this is another regression from the noxif landing?  Cc'ing Ben and
Daniel since they worked on the original code and probably want to know about
the regression.

Beth: for nested tables, it won't look like the original.  Nobody ever had time
to implement really smart table serializing code, though we all thought it would
be nice to do some day.  But we did have it looking reasonable for simple
tables, at one point.
Keywords: regression
In nsPlaintextSerializer::DoCloseContainer:

  else if ((type == eHTMLTag_tr) ||
           (type == eHTMLTag_li) ||
           (type == eHTMLTag_dt)) {
    // Items that should always end a line, but get no more whitespace
    EnsureVerticalSpace(0);
  } 

So either EnsureVerticalSpace has broken so that it doesn't actually end the
line any more, or somehow we're not getting to that point (hard to see how that
could be, though, it's the first clause in DoCloseContainer after the check for
body or html).  Ben, Daniel, any idea whether there have been recent regressions
in EnsureVerticalSpace(0)?
Status: UNCONFIRMED → NEW
Ever confirmed: true
> Ben, [...] any idea whether there have been recent regressions
> in EnsureVerticalSpace(0)?

No.

Did you try to add a |printf| to that |if|-block?
I was wondering how it would look, for example if I have a table that is 1x2, 
each cell has text and the 2nd cell has a nested table in it, would that nested 
table be indented in regard to the content of the cell?

|-----|----------------|
|one  | two            |
|     | |------------| |
|     | |more | stuff| |
|     | |------------| |
|-----|----------------|

so would the paste look like this?
one
two
  more stuff

or would it look like this?
one
two more stuff
The algorithm that we had before the regression inserted a newline after every
tr and between tables (whether they were nested or not).  
I'm a little confused about your example since the border goes around "more" and
"stuff" but not around "two", and to get that I think we'd need three levels of
table nesting.

But this is somewhat academic -- the issue is that if we don't put any newlines
into tables when we convert them, we'll get really long lines and all tables
(with more than one row) will be guaranteed hard to read.  If we do put newlines
in (as we used to before the regression) then simple tables will be easy to
read, and complex ones will be no worse.  If we want complex tables to be easy
to read as well, then we need to make it a priority and put some manpower on it.
 (I'd estimate two weeks for something halfway decent, more to do proper
alignment on pages that use tables for formatting).
let me code it out for you:
<table>
<tr>
  <td>one</td>
  <td>two
      <table>
      <tr>
         <td>more</td>
         <td>stuff</td>
      </tr>
      </table>
   </td>
</tr>
</table>

as for the extra added work for complex tables, I would suggest adding a new bug
and push it out to future.
Another related problem... (related to Composer)

When I make a generic 2x2 table in Composer, the generated source looks like
this (pay attention to the <BR>'s:

<table cellpadding="2" cellspacing="2" border="1" width="100%">
  <tbody>
    <tr>
      <td valign="Top">one<br>
      </td>
      <td valign="Top">two<br>
      </td>
    </tr>
    <tr>
      <td valign="Top">three<br>
      </td>
      <td valign="Top">four<br>
      </td>
    </tr>
  </tbody>
</table>

The automatically generated <BR> tags after the data in each <TD></TD> makes the
text copied out of the table look like this:

 one
two
three
four 

Yuck! Does this deserve a Composer-related bug? I wonder if the text copied out
of the table above will look like the following when this bug (bug 94176) is fixed ?

 one
two

three
four 

Also, why is there an initial space before the "one" ?
IMO: the editor adding those br tags is a bug and we shouldn't do that, but
that's a separate issue, not part of this bug.  I'm not sure if Joe has a bug on
that or not.

Offhand, I would expect Beth's sample to output as:
one two
more stuff

because the nested <table> tag is a block element and would cause an
EnsureVerticalSpace when it opens.

And in fact, that's pretty close to what happens now -- the example doesn't
trigger this bug, because there's no <tr> between table rows; the only <tr>
which should show up is complicated by a <table> tag which triggers a newline
all by itself.  I see an extra space between one and two, which I expect is
covered by another bug (I know we have several bugs on extra spaces showing up)
but otherwise the output is what I predicted above.
The bug on being smarter about tables was filed long ago, and is bug 18012,
Support Tables in Plaintext Output.
--> brade
Assignee: anthonyd → brade
I think this is fixed; resolve so QA can test.
Status: NEW → RESOLVED
Closed: 23 years ago
OS: Windows 2000 → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: Future → mozilla0.9.3
As far as I can tell, this bug is still busted in Mozilla 0.9.4, build:
2001091303 and the most recent nightly build: 2001100403.

For example, when text is copied out of this simple html table:

<html>
<body>
<table>
<tr>
<td>1</td>
<td>2</td>
</tr><tr>
<td>3</td>
<td>4</td>
</tr>
</table>
</body>
</html>

it looks like this:
1 2 3 4

and it should look like this:
1 2
3 4

I believe this bug needs to be re-opened, unless I am missing something.
reopen based on recent testing (comments above)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
-->peterv
Assignee: brade → harishd
Status: REOPENED → NEW
Really reassigning.
Assignee: harishd → peterv
Target Milestone: mozilla0.9.3 → ---
Blocks: 103522
Severity: enhancement → normal
Priority: P4 → --
Whiteboard: [RFE][plaintext] → [plaintext]
Dmose was just complaining about this on #mozilla.  And I've been hitting it a
lot trying to copy discussions from chatzilla and paste them into plaintext
editors.  I'm wasting a lot of time re-formatting stuff that we screwed up at
paste time.

I looked into it a little to see if the fix was obvious.  It turns out this
regressed when the "copy encoding" stuff was added, because now <tr> and <td>
are no longer passed along to nsPlaintextSerializer, so the serializer doesn't
know to do the right thing with them.

Adding nsHTMLAtoms::table, nsHTMLAtoms::tr, nsHTMLAtoms::th, and nsHTMLAtoms::td
to the list of tags in nsHTMLCopyEncoder::IncludeInContext fixes table copy, but
has the side effect that now text copied from inside one table cell (e.g. from
the main part of a page which uses tables for formatting) appends a newline
after the end (because the tr/table tags are included and the serializer sees a
block node closing).

We need a way of "including in context" only when the selection *spans* one of
these nodes.  Joe wrote the copy encoder stuff: Joe, how do we do that?  Is that
what the "promoted range"/"promoted point" stuff is about?  (I'm not sure what
they do.)

I'll take this bug for now since it's clear that no one else plans to look at it
and it's making a lot of people's lives miserable.  But I need some guidance
from Joe on the code.
Assignee: peterv → akkana
Haven't had much luck syncing with Joe.  Joe, I think this is basically the same
issue as the extra line breaks in text copied from pre blocks -- in both cases,
we really need to figure out whether the selection crosses a block boundary as
opposed to the selection living wholly inside a block boundary.

I'm willing to put in some work on this since it's sometimes a major impediment
to getting work done, but I don't understand how the current code works.  Can
you offer some pointers?
Assignee: akkana → jfrancis
I'm kinda surprised that the current code doesn't do the right thing here.  I 
just made a table and removed the trailing <br>s in source mode, and 
expiremented.  I see the bug if I select all of the table content by drag-
selecting from *inside* the table.  But I get the desired behavior if I select-
all.  The weird thing is that I thought the html stream we put into the clipboard 
should be identical in those two cases, and only the special data flavor that 
indicated how "deep" in the structure the real selection endpoints were would be 
different.  Since that latter structure is only used in Composers paste code, and 
not by the serializer, I would think these two cases would get identical results 
in the serialier.  But they don't.

Akkana, I can't really help you unless I just investigate this bug myself.  It 
doesn't make sense to me.  The tr's should be getting through.  
Target Milestone: --- → mozilla0.9.9
--> myself, after discussing with Nisheeth.
Assignee: jfrancis → tmutreja
Attached patch Patch to fix it... (obsolete) — Splinter Review
Akkana, as you mentioned above, changes in IncludeInContext() are causing the
regression by adding extra new lines but there seems to be no other way to
catch these tags in Serializer. In the patch I'm trying to work around the
addition of these new lines.
Comment on attachment 69655 [details] [diff] [review]
Patch to fix it...

This does fix the problem of the lacking newlines after <tr>, and looks like a
solid fix to the problem, so I'm giving it r=akkana.  Ideally I hope jfrancis
can take a look at the changes to nsDocumentEncoder.cpp, since he's the expert
on what might break when those lists change (but my guess is that it's fine
when taken with the rest of this patch).

The !mAtFirstColumn added for <td> nodes unfortunately isn't doing its job.  I
made a page that looked like this:

<table>
<tr><td>one<td>two<td>three<td>four
<tr><td>a  <td>b  <td>c    <td>d
</table>

and the first line came out:
onetwothreefour
That's true before this patch, too, so it shouldn't block acceptance of the
patch, but it remains a problem to be fixed.  (I don't know why the patch
didn't fix it -- seems like it should.)

I also noticed another problem (unrelated, probably already covered by a bug
somewhere else):
copy the html source beppe gave, and paste into an external plaintext app (I
did this in order to create an html file for testing) -- you lose the initial
whitespace from most of the lines.  I'm not sure if that's a new problem or not
(but it wasn't caused by this patch -- happens even without the patch).
Attachment #69655 - Flags: review+
Can you please add a comment to the new member variable, describing its purpose?
There are so many variables of this kind already that I think it's needed.

[OT]
> you lose the initial whitespace from most of the lines.

I know that I once filed a bug about that in textareas, not sure about it's
status, though. I see the same behaviour you describe, but not in textareas anymore.
I second Ben's request for a comment describing the new member variable.

Is there a bug for the missing whitespace between table cells? 

The other missing whitespace is probably (misbehaving?) whitespace compression I
would guess.
Attached patch Incorporating above suggestions. (obsolete) — Splinter Review
I see the reason why mAtFirstColumn is not working. It's because we are not
manipulating it to suit our requirements. So I'm replacing it with a check for
"mCurrentLine.IsEmpty()". That solves our purpose.

Also included the description of new member variable in this patch.

Akkana, would you please explain me about which white space are you talking
here:
"copy the html source beppe gave, and paste into an external plaintext app (I
did this in order to create an html file for testing) -- you lose the initial
whitespace from most of the lines."
I tried with the testcase provided by beppe and could not understand this
properly.
Comment on attachment 69841 [details] [diff] [review]
Incorporating above suggestions.

Ah, so mLineBreakIsDue is kind of a lazy line break, too avoid extra ending
lines (which is a problem right now)...

Have you tested it in mail too (sending a table as plain text), where the
formatting flags are a little different.

As for mAtFirstColumn it's only used when doing preformatted output. In the
other case we don't wrap until later so we don't know which column text will
end up at. mCurrentLine is not always used either. For some settings of the
flags it's bypassed and the text is written directly to the output
string/stream.
Thanks Daniel,

Included a new member variable |mIsFirstTD| to nsPlainTextSerializer. It keeps
a track if we are handling first <TD> of a <TR> or <TABLE>. If <TD> is not the
first one, we skip adding a new line character to the current text. Using a
separate member variable makes it independent of any specific flag setting.
Re the plaintext paste problems: look at beppe's comment #8.  Copy that whole
comment.  Paste into a plaintext window.  It pastes like this:
----
let me code it out for you:
<table>
<tr>
<td>one</td>
<td>two
      <table>
<tr>
<td>more</td>
<td>stuff</td>
</tr>
</table>
</td>
</tr>
</table>

as for the extra added work for complex tables, I would suggest adding a new bug
and push it out to future.
----
Notice that all the initial whitespace is lost.

I think this is covered by ftang's bug that jfrancis just took, summary
something like "plaintext copy/paste is very bad".
Status: NEW → ASSIGNED
Whiteboard: [plaintext] → [plaintext][needs sr=]
Comment on attachment 70063 [details] [diff] [review]
Modified to incorporate Daniel's sugestions.

This is a no-go.  You can't make this change to IncludeInContext().  It will
break html paste: copying anything in a table would cause a whole table to be
pasted in Composer.
Attachment #70063 - Attachment is obsolete: true
Attachment #70063 - Flags: needs-work+
What is the expected behavior of HTML paste? Should we not get a table pasted on 
composer? I think we should. 
Copying a word that happens to be in a table should result in a table being pasted.

contributors to this discussion may be interested in reviewing/testing my patch
in bug 98572
That was SUPPOSED to read:

Copying a word that happens to be in a table should NOT result in a table being
pasted.

The changes to nsPlainTextSerializer looks ok though. I assume you have run the
tests (they are part of the tinderbox tests so failing one of them breaks
tinderbox), and sent mail with it, verifying that it looks ok.

If so, r=bratell@lysator.liu.se for the serializer. You can use that r= in
coming patches as long as nsPlainTextSerializer stays the same. I'm very
thankful for all your work. There are a lot of small things to fix, that I've
never gotten to do.
Akkana, I think you're comment #19 is correct.  I have to change the copy code to 
detect block crossings.  Unfortunately, before I can do that I have to import a 
bunch of ws code that I originally wrote for the editor.  The reason is you could 
be "trivially" crossing a block.  For example, you could drag select a bunch of 
text that lives between two lists.  But if you do your drag just right (just 
wrong?) you could actually have the selection starting at the very end of the 
prior list, and ending at the very beginning of the following list.  Then, if I 
detect block crossings at copy time, I will force two single item lists (with 
empty items) to be output: one before your copied text and one after.

This is hardly what the user wanted. 

I'm in the process of working on this issue this week.  Once it is fixed the 
serializer should see table cells and rows, etc, for selections that are inside 
of tables and span table structure.

Assigning to myself pending copy work.  If there remains serialzer work after I 
improve copy I will assign back at that point.
Assignee: tmutreja → jfrancis
Status: ASSIGNED → NEW
*** Bug 125658 has been marked as a duplicate of this bug. ***
for the moment putting in moz1.1.  hoping to bring it back but cant right now due 
to nervousness by the bug gods.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.1
removing myself from the cc list
*** Bug 134776 has been marked as a duplicate of this bug. ***
From bug 134776, this occurs in Chatzilla and is very troublesome to copy and
paste discussions of bugs, etc.  Hope this will increase priority/effort a little.
*** Bug 136830 has been marked as a duplicate of this bug. ***
*** Bug 103522 has been marked as a duplicate of this bug. ***
*** Bug 153858 has been marked as a duplicate of this bug. ***
The trunk is the wave of the future!
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Heh.  I looked into this some more and discovered what is really going on.  The 
problem is that the parser is dropping partial table structure on the floor.  

The docencoder is indeed producing the tr's and td's, and clipboard is receiving 
them.  When the clipboard code tries to convert this to plaintext, it calls the 
parser.  The parser just drops table elements on the floor if it doesnt already 
have a table on it's tag stack.  So we get just the contents of the td's.  

the parser needs to not drop the tr's/td's.  Perhaps there should be a seperate 
mode for the parser where it understands it is just parsing a fragment, instead 
of a document.  The clipboard code could then call the parser in this mode.

Over to parser folks.  CC'ing pinkerton for clipboard.
Component: DOM to Text Conversion → Parser
reassigning
Assignee: jfrancis → harishd
Status: ASSIGNED → NEW
QA Contact: sujay → moied
Boris was thinking about the fragment stuff at some point I believe, adding him
to Cc.
back to me cuz i have a fix.  the patch i am attaching depends on the patch in
bug 159842.
Assignee: harishd → jfrancis
Attachment #69655 - Attachment is obsolete: true
Attachment #69841 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Whiteboard: [plaintext][needs sr=] → [plaintext]
Target Milestone: mozilla1.1beta → M1
Whiteboard: [plaintext] → [plaintext] fixinhand
*** Bug 162488 has been marked as a duplicate of this bug. ***
*** Bug 141862 has been marked as a duplicate of this bug. ***
I don't think M1 is the correct milestone...
Target Milestone: M1 → ---
resetting milestone; M1 is correct milestone if you want jfrancis to fix it soon.
Target Milestone: --- → M1
Oh, sorry. But it would be good if everyone stuck to the same meanings of
milestone fields and other fields. Most(?) teams uses milestone to set the
milestone to aim for, and priority to fine tune the order in which to fix the
bugs in a milestone.

Also, we have people estimating glide paths for releases etc. based on number of
bugs in milestones. Using nonstandard milestones screws these estimates.
Comment on attachment 93106 [details] [diff] [review]
widget patch to tell parser we are parsing fragment

sr=kin@netscape.com
Attachment #93106 - Flags: superreview+
Depends on: 159842
fix landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago22 years ago
Resolution: --- → FIXED
Please verify this bug
Whiteboard: [plaintext] fixinhand → [plaintext] fixinhand edt_x3
using the trunk build from 2003040708 on win2K. I created a simple table, viewed
it in the browser, set my mail compose to be plaintext. And then copied & pasted
the data from the browser window to the mail compose window.

testing a simple table:
<table>
<tr>
  <td>one</td>
  <td>two</td>
</tr>

<tr>
   <td>more</td>
   <td>stuff</td>
</tr>
</table>

results in this output:
one   two
more  stuff

If I remove the content 'more' and do the same copy/paste - the ws is preserved.

Status: RESOLVED → VERIFIED
See also bug 236546, same bug when copying table cells/rows/columns with Ctrl.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: