Closed Bug 436921 Opened 16 years ago Closed 16 years ago

Formatting of HTML tables in the clipboard (NOT to do with excel!)

Categories

(Core :: DOM: Serializers, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED DUPLICATE of bug 438374

People

(Reporter: jnqnfe, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

I'm currently building a web based system in PHP. In part of the system I need the ability to copy and paste a large amount of data from tables on a third party website.

I've concluded that the best way to do this is for a user to highlight the entire table and paste it into an HTML <textarea> form field. I can then run the pasted text through a PHP parsing algorithm that processes the formatting of the table as it was placed in the clipboard, in order to extract the data (which it will then store and later produce its own tables from).
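As an illustration of the kind of parsing described above, here is a minimal sketch, written in Python rather than the reporter's PHP. It assumes the pasted text has one table row per line with a tab between cells; that layout is an assumption for the simple case, not a documented guarantee. The function name `parse_pasted_table` and the sample data are hypothetical.

```python
def parse_pasted_table(text):
    """Split pasted clipboard text into rows of cell strings.

    Assumption: one table row per line, cells separated by tabs.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        rows.append([cell.strip() for cell in line.split("\t")])
    return rows


# Hypothetical pasted content from a simple two-column table.
pasted = "Name\tQty\nApples\t3\nPears\t5\n"
rows = parse_pasted_table(pasted)
```

A parser this simple only works while every row stays on a single line, so any extra line breaks inside a cell would split one row into several.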

The problem is that the formatting Firefox places in the clipboard for tables copied in this way is not good enough. For basic tables it is perfect; however, when <p> tags are introduced (which they often are on the website I need to copy from), the formatting is no longer possible to process!

I have done an extensive study into this in several browsers, which you can find here: http://lrebrown.x10hosting.com/table_clipboard_test/
Please improve the formatting.

I understand that you also need to make sure the formatting can be processed by applications like Excel, but I really don't think it's possible for anything to process the latter tables I looked at in my study!

Reproducible: Always

Steps to Reproduce:
1. Highlight and copy HTML table
2. Paste into a text editor
3. Try to convert formatting to a simple version from which data can be extracted by a script (PHP). Note, please read the study for more info!
Actual Results:  
Formatting that can't be processed!

Expected Results:  
Formatting that can be processed!

Please read the study I did on this problem: http://lrebrown.x10hosting.com/table_clipboard_test/
Forgot to mention, I tested both version 2.0.0.14 and version 3.0 RC 1 in the study!
Davide Ficano wrote
Table2Clipboard  -  https://addons.mozilla.org/firefox/1852
http://dafizilla.sourceforge.net

If you want to paste data into Microsoft Excel or OpenOffice Calc with the correct layout, simply use Table2Clipboard.
Pasting into plain text editors is also supported, as CSV (you can change the row and column separators in the options dialog).
"Make it parsable by tools" isn't one of the goals of our DOM-to-text conversion code, especially for tables that seem like they might be layout tables.  Use a bookmarklet or extension or something.
Component: General → DOM to Text Conversion
Product: Firefox → Core
QA Contact: general → dom-to-text
That said, if there are specific cases where our DOM-to-text conversion is suboptimal for both readability *and* parsability, please do file bugs for each one.  I agree with you that "table cell containing one P tag" could be one of those cases -- there's no point having that paste differently than without the P tag, since it looks the same in the browser.

data:text/html,<table><tr><td><p>A</p></td><td><p>B</p></td></tr></table>
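
To illustrate the point about that testcase, here is a minimal Python sketch of a table-to-text conversion in which a cell containing a single <p> produces the same text as a cell without one. It is not Gecko's serializer; the class `TableText`, the function `table_to_text`, and the tab/newline output layout are all assumptions made for the example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableText(HTMLParser):
    """Collect the text of each table cell, ignoring markup inside the cell."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "p" and self._cell is not None:
            # Treat a paragraph start as a word break, not a line break.
            self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse all whitespace so <p> wrappers leave no stray breaks.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_text(html):
    """Return one line per row, with cells separated by tabs."""
    parser = TableText()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


with_p = "<table><tr><td><p>A</p></td><td><p>B</p></td></tr></table>"
without_p = "<table><tr><td>A</td><td>B</td></tr></table>"
same_output = table_to_text(with_p) == table_to_text(without_p)
```

Collapsing whitespace inside each cell is one possible design choice: it keeps each row on one line, at the cost of flattening cells that really do contain several paragraphs.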
(In reply to comment #3)
> "Make it parsable by tools" isn't one of the goals of our DOM-to-text
> conversion code, especially for tables that seem like they might be layout
> tables.  Use a bookmarklet or extension or something.
> 
(In reply to comment #4)
> That said, if there are specific cases where our DOM-to-text conversion is
> suboptimal for both readability *and* parsability, please do file bugs for each
> one.  I agree with you that "table cell containing one P tag" could be one of
> those cases -- there's no point having that paste differently than without the
> P tag, since it looks the same in the browser.
> 
> data:text/html,<table><tr><td><p>A</p></td><td><p>B</p></td></tr></table>
> 

Fair enough. I don't expect to be able to parse tables used for layout, as that could be a nightmare. However, the tables I need to parse do often use <p> tags like in your example above.

If you could just fix the issue with the <p> tag as used above, that would be absolutely perfect! Do you want me to write a new, more specific bug report?
Yes, having a clean bug report would be good.  I think it's the same as bug 304455, though, so I'm just going to mark this bug as a dup.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
Same sort of issue, but the <p> tag apparently has a different effect from the one the <div> has in the report you linked to; it's a lot more complicated!

Since it's slightly different, more important (for parsing), and the report you linked to is really old, I filed a new and cleaner report: https://bugzilla.mozilla.org/show_bug.cgi?id=438374
Oh, I assumed <p> and <div> had the same effect.  Thanks for noticing.