Closed
Bug 94176
Opened 23 years ago
Closed 22 years ago
Text copied from table with multiple rows has no CR/LF
Categories
(Core :: DOM: HTML Parser, defect)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
M1
People
(Reporter: justincwatt, Assigned: mozeditor)
References
()
Details
(Keywords: regression, Whiteboard: [plaintext] fixinhand edt_x3)
Attachments
(1 file, 3 obsolete files)
1.20 KB,
patch
|
kinmoz
:
superreview+
|
Details | Diff | Splinter Review |
From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1) Gecko/20010607 Netscape6/6.1b1 BuildID: 2001080110 As described on the following page: http://www.unc.edu/~jwatt/tablecopyproblem.html ...when text is copied out of a table with multiple rows and into a text editor (or email), there is no CR/LF (carriage return/line feed) between each row, therefore the table data is copied out of the table as one single line of text. Reproducible: Always Steps to Reproduce: 1. Copy text from table on http://www.unc.edu/~jwatt/tablecopyproblem.html 2. Paste in Notepad (or similar text editor) with auto word-wrap off Actual Results: Text from multiple table rows is pasted as one (infinitely) lone line of text Expected Results: There should be a CRLF after each row in the paste text
Comment 1•23 years ago
|
||
sounds like a serializer issue in Navigator; reassign
Assignee: mjudge → anthonyd
Component: Selection → DOM to Text Conversion
QA Contact: tpreston → sujay
Comment 2•23 years ago
|
||
so the request would be that a cr/lf be inserted for every <tr>? SO what would be the expected behavior for nested tables?
Severity: normal → enhancement
Priority: -- → P4
Whiteboard: [RFE][plaintext]
Target Milestone: --- → Future
Comment 3•23 years ago
|
||
This is a regression. We used to append newlines at the end of a table row. I wonder if this is another regression from the noxif landing? Cc'ing Ben and Daniel since they worked on the original code and probably want to know about the regression. Beth: for nested tables, it won't look like the original. Nobody ever had time to implement really smart table serializing code, though we all thought it would be nice to do some day. But we did have it looking reasonable for simple tables, at one point.
Keywords: regression
Comment 4•23 years ago
|
||
In nsPlaintextSerializer::DoCloseContainer: else if ((type == eHTMLTag_tr) || (type == eHTMLTag_li) || (type == eHTMLTag_dt)) { // Items that should always end a line, but get no more whitespace EnsureVerticalSpace(0); } So either EnsureVerticalSpace has broken so that it doesn't actually end the line any more, or somehow we're not getting to that point (hard to see how that could be, though, it's the first clause in DoCloseContainer after the check for body or html). Ben, Daniel, any idea whether there have been recent regressions in EnsureVerticalSpace(0)?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 5•23 years ago
|
||
> Ben, [...] any idea whether there have been recent regressions
> in EnsureVerticalSpace(0)?
No.
Did you try to add a |printf| to that |if|-block?
Comment 6•23 years ago
|
||
I was wondering how it would look, for example if I have a table that is 1x2, each cell has text and the 2nd cell has a nested table in it, would that nested table be indented in regard to the content of the cell? |-----|----------------| |one | two | | | |------------| | | | |more | stuff| | | | |------------| | |-----|----------------| so would the paste look like this? one two more stuff or would it look like this? one two more stuff
Comment 7•23 years ago
|
||
The algorithm that we had before the regression inserted a newline after every tr and between tables (whether they were nested or not). I'm a little confused about your example since the border goes around "more" and "stuff" but not around "two", and to get that I think we'd need three levels of table nesting. But this is somewhat academic -- the issue is that if we don't put any newlines into tables when we convert them, we'll get really long lines and all tables (with more than one row) will be guaranteed hard to read. If we do put newlines in (as we used to before the regression) then simple tables will be easy to read, and complex ones will be no worse. If we want complex tables to be easy to read as well, then we need to make it a priority and put some manpower on it. (I'd estimate two weeks for something halfway decent, more to do proper alignment on pages that use tables for formatting).
Comment 8•23 years ago
|
||
let me code it out for you: <table> <tr> <td>one</td> <td>two <table> <tr> <td>more</td> <td>stuff</td> </tr> </table> </td> </tr> </table> as for the extra added work for complex tables, I would suggest adding a new bug and push it out to future.
Reporter | ||
Comment 9•23 years ago
|
||
Another related problem... (related to Composer) When I make a generic 2x2 table in Composer, the generated source looks like this (pay attention to the <BR>'s: <table cellpadding="2" cellspacing="2" border="1" width="100%"> <tbody> <tr> <td valign="Top">one<br> </td> <td valign="Top">two<br> </td> </tr> <tr> <td valign="Top">three<br> </td> <td valign="Top">four<br> </td> </tr> </tbody> </table> The automatically generated <BR> tags after the data in each <TD></TD> makes the text copied out of the table look like this: one two three four Yuck! Does this deserve a Composer-related bug? I wonder if the text copied out of the table above will look like the following when this bug (bug 94176) is fixed ? one two three four Also, why is there an initial space before the "one" ?
Comment 10•23 years ago
|
||
IMO: the editor adding those br tags is a bug and we shouldn't do that, but that's a separate issue, not part of this bug. I'm not sure if Joe has a bug on that or not. Offhand, I would expect Beth's sample to output as: one two more stuff because the nested <table> tag is a block element and would cause an EnsureVerticalSpace when it opens. And in fact, that's pretty close to what happens now -- the example doesn't trigger this bug, because there's no <tr> between table rows; the only <tr> which should show up is complicated by a <table> tag which triggers a newline all by itself. I see an extra space between one and two, which I expect is covered by another bug (I know we have several bugs on extra spaces showing up) but otherwise the output is what I predicted above.
Comment 11•23 years ago
|
||
The bug on being smarter about tables was filed long ago, and is bug 18012, Support Tables in Plaintext Output.
Comment 13•23 years ago
|
||
I think this is fixed; resolve so QA can test.
Status: NEW → RESOLVED
Closed: 23 years ago
OS: Windows 2000 → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: Future → mozilla0.9.3
Reporter | ||
Comment 14•23 years ago
|
||
As far as I can tell, this bug is still busted in Mozilla 0.9.4, build: 2001091303 and the most recent nightly build: 2001100403. For example, when text is copied out of this simple html table: <html> <body> <table> <tr> <td>1</td> <td>2</td> </tr><tr> <td>3</td> <td>4</td> </tr> </table> </body> </html> it looks like this: 1 2 3 4 and it should look like this: 1 2 3 4 I believe this bug needs to be re-opened, unless I am missing something.
Comment 15•23 years ago
|
||
reopen based on recent testing (comments above)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 17•23 years ago
|
||
Really reassigning.
Assignee: harishd → peterv
Target Milestone: mozilla0.9.3 → ---
Updated•23 years ago
|
Severity: enhancement → normal
Priority: P4 → --
Whiteboard: [RFE][plaintext] → [plaintext]
Comment 18•23 years ago
|
||
Dmose was just complaining about this on #mozilla. And I've been hitting it a lot trying to copy discussions from chatzilla and paste them into plaintext editors. I'm wasting a lot of time re-formatting stuff that we screwed up at paste time. I looked into it a little to see if the fix was obvious. It turns out this regressed when the "copy encoding" stuff was added, because now <tr> and <td> are no longer passed along to nsPlaintextSerializer, so the serializer doesn't know to do the right thing with them. Adding nsHTMLAtoms::table, nsHTMLAtoms::tr, nsHTMLAtoms::th, and nsHTMLAtoms::td to the list of tags in nsHTMLCopyEncoder::IncludeInContext fixes table copy, but has the side effect that now text copied from inside one table cell (e.g. from the main part of a page which uses tables for formatting) appends a newline after the end (because the tr/table tags are included and the serializer sees a block node closing). We need a way of "including in context" only when the selection *spans* one of these nodes. Joe wrote the copy encoder stuff: Joe, how do we do that? Is that what the "promoted range"/"promoted point" stuff is about? (I'm not sure what they do.) I'll take this bug for now since it's clear that no one else plans to look at it and it's making a lot of people's lives miserable. But I need some guidance from Joe on the code.
Assignee: peterv → akkana
Comment 19•23 years ago
|
||
Haven't had much luck syncing with Joe. Joe, I think this is basically the same issue as the extra line breaks in text copied from pre blocks -- in both cases, we really need to figure out whether the selection crosses a block boundary as opposed to the selection living wholly inside a block boundary. I'm willing to put in some work on this since it's sometimes a major impediment to getting work done, but I don't understand how the current code works. Can you offer some pointers?
Assignee: akkana → jfrancis
Assignee | ||
Comment 20•23 years ago
|
||
I'm kinda surprised that the current code doesn't do the right thing here. I just made a table and removed the trailing <br>s in source mode, and expiremented. I see the bug if I select all of the table content by drag- selecting from *inside* the table. But I get the desired behavior if I select- all. The weird thing is that I thought the html stream we put into the clipboard should be identical in those two cases, and only the special data flavor that indicated how "deep" in the structure the real selection endpoints were would be different. Since that latter structure is only used in Composers paste code, and not by the serializer, I would think these two cases would get identical results in the serialier. But they don't. Akkana, I can't really help you unless I just investigate this bug myself. It doesn't make sense to me. The tr's should be getting through.
Target Milestone: --- → mozilla0.9.9
Comment 22•23 years ago
|
||
Akkana, as you mentioned above, changes in IncludeInContext() are causing the regression by adding extra new lines but there seems to be no other way to catch these tags in Serializer. In the patch I'm trying to work around the addition of these new lines.
Comment 23•23 years ago
|
||
Comment on attachment 69655 [details] [diff] [review] Patch to fix it... This does fix the problem of the lacking newlines after <tr>, and looks like a solid fix to the problem, so I'm giving it r=akkana. Ideally I hope jfrancis can take a look at the changes to nsDocumentEncoder.cpp, since he's the expert on what might break when those lists change (but my guess is that it's fine when taken with the rest of this patch). The !mAtFirstColumn added for <td> nodes unfortunately isn't doing its job. I made a page that looked like this: <table> <tr><td>one<td>two<td>three<td>four <tr><td>a <td>b <td>c <td>d </table> and the first line came out: onetwothreefour That's true before this patch, too, so it shouldn't block acceptance of the patch, but it remains a problem to be fixed. (I don't know why the patch didn't fix it -- seems like it should.) I also noticed another problem (unrelated, probably already covered by a bug somewhere else): copy the html source beppe gave, and paste into an external plaintext app (I did this in order to create an html file for testing) -- you lose the initial whitespace from most of the lines. I'm not sure if that's a new problem or not (but it wasn't caused by this patch -- happens even without the patch).
Attachment #69655 -
Flags: review+
Comment 24•23 years ago
|
||
Can you please add a comment to the new member variable, describing its purpose?
There are so many variables of this kind already that I think it's needed.
[OT]
> you lose the initial whitespace from most of the lines.
I know that I once filed a bug about that in textareas, not sure about it's
status, though. I see the same behaviour you describe, but not in textareas anymore.
Comment 25•23 years ago
|
||
I second Ben's request for a comment describing the new member variable. Is there a bug for the missing whitespace between table cells? The other missing whitespace is probably (misbehaving?) whitespace compression I would guess.
Comment 26•23 years ago
|
||
I see the reason why mAtFirstColumn is not working. It's because we are not manipulating it to suit our requirements. So I'm replacing it with a check for "mCurrentLine.IsEmpty()". That solves our purpose. Also included the description of new member variable in this patch. Akkana, would you please explain me about which white space are you talking here: "copy the html source beppe gave, and paste into an external plaintext app (I did this in order to create an html file for testing) -- you lose the initial whitespace from most of the lines." I tried with the testcase provided by beppe and could not understand this properly.
Comment 27•23 years ago
|
||
Comment on attachment 69841 [details] [diff] [review] Incorporating above suggestions. Ah, so mLineBreakIsDue is kind of a lazy line break, too avoid extra ending lines (which is a problem right now)... Have you tested it in mail too (sending a table as plain text), where the formatting flags are a little different. As for mAtFirstColumn it's only used when doing preformatted output. In the other case we don't wrap until later so we don't know which column text will end up at. mCurrentLine is not always used either. For some settings of the flags it's bypassed and the text is written directly to the output string/stream.
Comment 28•23 years ago
|
||
Thanks Daniel, Included a new member variable |mIsFirstTD| to nsPlainTextSerializer. It keeps a track if we are handling first <TD> of a <TR> or <TABLE>. If <TD> is not the first one, we skip adding a new line character to the current text. Using a separate member variable makes it independent of any specific flag setting.
Comment 29•23 years ago
|
||
Re the plaintext paste problems: look at beppe's comment #8. Copy that whole comment. Paste into a plaintext window. It pastes like this: ---- let me code it out for you: <table> <tr> <td>one</td> <td>two <table> <tr> <td>more</td> <td>stuff</td> </tr> </table> </td> </tr> </table> as for the extra added work for complex tables, I would suggest adding a new bug and push it out to future. ---- Notice that all the initial whitespace is lost. I think this is covered by ftang's bug that jfrancis just took, summary something like "plaintext copy/paste is very bad".
Updated•23 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [plaintext] → [plaintext][needs sr=]
Assignee | ||
Comment 30•23 years ago
|
||
Comment on attachment 70063 [details] [diff] [review] Modified to incorporate Daniel's sugestions. This is a no-go. You can't make this change to IncludeInContext(). It will break html paste: copying anything in a table would cause a whole table to be pasted in Composer.
Attachment #70063 -
Attachment is obsolete: true
Attachment #70063 -
Flags: needs-work+
Comment 31•23 years ago
|
||
What is the expected behavior of HTML paste? Should we not get a table pasted on composer? I think we should.
Assignee | ||
Comment 32•23 years ago
|
||
Copying a word that happens to be in a table should result in a table being pasted. contributors to this discussion may be interested in reviewing/testing my patch in bug 98572
Assignee | ||
Comment 33•23 years ago
|
||
That was SUPPOSED to read: Copying a word that happens to be in a table should NOT result in a table being pasted.
Comment 34•23 years ago
|
||
The changes to nsPlainTextSerializer looks ok though. I assume you have run the tests (they are part of the tinderbox tests so failing one of them breaks tinderbox), and sent mail with it, verifying that it looks ok. If so, r=bratell@lysator.liu.se for the serializer. You can use that r= in coming patches as long as nsPlainTextSerializer stays the same. I'm very thankful for all your work. There are a lot of small things to fix, that I've never gotten to do.
Assignee | ||
Comment 35•23 years ago
|
||
Akkana, I think you're comment #19 is correct. I have to change the copy code to detect block crossings. Unfortunately, before I can do that I have to import a bunch of ws code that I originally wrote for the editor. The reason is you could be "trivially" crossing a block. For example, you could drag select a bunch of text that lives between two lists. But if you do your drag just right (just wrong?) you could actually have the selection starting at the very end of the prior list, and ending at the very beginning of the following list. Then, if I detect block crossings at copy time, I will force two single item lists (with empty items) to be output: one before your copied text and one after. This is hardly what the user wanted. I'm in the process of working on this issue this week. Once it is fixed the serializer should see table cells and rows, etc, for selections that are inside of tables and span table structure. Assigning to myself pending copy work. If there remains serialzer work after I improve copy I will assign back at that point.
Assignee: tmutreja → jfrancis
Status: ASSIGNED → NEW
Comment 36•23 years ago
|
||
*** Bug 125658 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 37•23 years ago
|
||
for the moment putting in moz1.1. hoping to bring it back but cant right now due to nervousness by the bug gods.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.1
Comment 38•22 years ago
|
||
removing myself from the cc list
Comment 39•22 years ago
|
||
*** Bug 134776 has been marked as a duplicate of this bug. ***
Comment 40•22 years ago
|
||
From bug 134776, this occurs in Chatzilla and is very troublesome to copy and paste discussions of bugs, etc. Hope this will increase priority/effort a little.
Comment 41•22 years ago
|
||
*** Bug 136830 has been marked as a duplicate of this bug. ***
Comment 42•22 years ago
|
||
*** Bug 103522 has been marked as a duplicate of this bug. ***
Comment 43•22 years ago
|
||
*** Bug 153858 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 44•22 years ago
|
||
The trunk is the wave of the future!
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Assignee | ||
Comment 45•22 years ago
|
||
Heh. I looked into this some more and discovered what is really going on. The problem is that the parser is dropping partial table structure on the floor. The docencoder is indeed producing the tr's and td's, and clipboard is receiving them. When the clipboard code tries to convert this to plaintext, it calls the parser. The parser just drops table elements on the floor if it doesnt already have a table on it's tag stack. So we get just the contents of the td's. the parser needs to not drop the tr's/td's. Perhaps there should be a seperate mode for the parser where it understands it is just parsing a fragment, instead of a document. The clipboard code could then call the parser in this mode. Over to parser folks. CC'ing pinkerton for clipboard.
Component: DOM to Text Conversion → Parser
Assignee | ||
Comment 46•22 years ago
|
||
reassigning
Assignee: jfrancis → harishd
Status: ASSIGNED → NEW
QA Contact: sujay → moied
Boris was thinking about the fragment stuff at some point I believe, adding him to Cc.
Assignee | ||
Comment 48•22 years ago
|
||
back to me cuz i have a fix. the patch i am attaching depends on the patch in bug 159842.
Assignee: harishd → jfrancis
Assignee | ||
Comment 49•22 years ago
|
||
Attachment #69655 -
Attachment is obsolete: true
Attachment #69841 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [plaintext][needs sr=] → [plaintext]
Target Milestone: mozilla1.1beta → M1
Assignee | ||
Updated•22 years ago
|
Whiteboard: [plaintext] → [plaintext] fixinhand
Comment 50•22 years ago
|
||
*** Bug 162488 has been marked as a duplicate of this bug. ***
Comment 51•22 years ago
|
||
*** Bug 141862 has been marked as a duplicate of this bug. ***
I don't think M1 is the correct milestone...
Target Milestone: M1 → ---
Comment 53•22 years ago
|
||
resetting milestone; M1 is correct milestone if you want jfrancis to fix it soon.
Target Milestone: --- → M1
Oh, sorry. But it would be good if everyone stuck to the same meanings of milestone fields and other fields. Most(?) teams uses milestone to set the milestone to aim for, and priority to fine tune the order in which to fix the bugs in a milestone. Also, we have people estimating glide paths for releases etc. based on number of bugs in milestones. Using nonstandard milestones screws these estimates.
Comment 55•22 years ago
|
||
Comment on attachment 93106 [details] [diff] [review] widget patch to tell parser we are parsing fragment sr=kin@netscape.com
Attachment #93106 -
Flags: superreview+
Assignee | ||
Comment 56•22 years ago
|
||
fix landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → FIXED
Comment 57•22 years ago
|
||
Please verify this bug
Whiteboard: [plaintext] fixinhand → [plaintext] fixinhand edt_x3
Comment 58•21 years ago
|
||
using the trunk build from 2003040708 on win2K. I created a simple table, viewed it in the browser, set my mail compose to be plaintext. And then copied & pasted the data from the browser window to the mail compose window. testing a simple table: <table> <tr> <td>one</td> <td>two</td> </tr> <tr> <td>more</td> <td>stuff</td> </tr> </table> results in this output: one two more stuff If I remove the content 'more' and do the same copy/paste - the ws is preserved.
Status: RESOLVED → VERIFIED
Comment 59•20 years ago
|
||
See also bug 236546, same bug when copying table cells/rows/columns with Ctrl.
You need to log in
before you can comment on or make changes to this bug.
Description
•