Open
Bug 18012
Opened 25 years ago
Updated 16 days ago
Support tables in plaintext output
Categories
(Core :: DOM: Serializers, enhancement, P5)
Tracking
()
NEW
Future
People
(Reporter: BenB, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: helpwanted)
This is a "sub-bug" of bug #16800. I would like to see at least the most essential parts of tables (basic col and row formatting, colspan and rowspan, caption, maybe summary) supported. This won't be easy, because it means (yet another) rewrite of the line functions of nsHTMLToTXTSinkStream. I tried to look at lynx, but it doesn't support them either :-(.
Reporter | ||
Comment 1•25 years ago
Usual disclaimer: I don't know if I will really implement this. I'm just looking.
Comment 2•25 years ago
FWIW, this would be most useful for the simplest cases - 2-4 columns only, no nested tables, no fanciness at all. It is easy to forget the days before DTP and the web, but as little as 15 years ago the majority of tables that were not professionally produced or made by computer programs were either composed at a typewriter or at a word-processing station of some kind connected to a letter-quality printer, i.e. a typewriter with a data cable. I am sure that if you get code for this working at all there will be pressure to support more and more subtleties, but the ability to view simple tables is worth enough to draw a line around what is reasonably possible.
Reporter | ||
Updated•25 years ago
Status: NEW → ASSIGNED
Priority: P3 → P2
Reporter | ||
Updated•25 years ago
Target Milestone: M12 → M20
Reporter | ||
Updated•25 years ago
Assignee: mozilla → akkana
Status: ASSIGNED → NEW
Summary: Support tables in plaintext output → [HELP WANTED] Support tables in plaintext output
Whiteboard: [HELP WANTED]
Reporter | ||
Comment 3•25 years ago
I don't think I'll work on that anytime soon. HELP WANTED.
Comment 4•25 years ago
Removing dependency: formatting tables would be nice regardless of anything else. Cc'ing Daniel in case he has any interest in this.
Updated•25 years ago
Keywords: helpwanted
Updated•25 years ago
Status: NEW → ASSIGNED
Updated•25 years ago
Summary: [HELP WANTED] Support tables in plaintext output → Support tables in plaintext output
Whiteboard: [HELP WANTED]
Bulk move of all "Output" component bugs to new "DOM to Text Conversion" component. Output will be deleted as a component.
Comment 7•24 years ago
moving back to previous owner
Assignee: beppe → akkana
Target Milestone: M20 → Future
Comment 9•24 years ago
Re-accepting.
Reporter | ||
Comment 10•24 years ago
This bug is far from trivial. It means caching the content of table cells. I see two ways:
- Ignore all formatting inside table cells. This is against the HTML 4.0 spec, which allows both inline and block tags inside table cells. Assuming the table is more important than the formatting inside it, this would be an improvement, but of doubtful value. If we want to output commercial web pages, this would make the situation worse, as tables are often used for large-scale layout (see e.g. <http://www.mozilla.org> :-( ).
- Set up a new output sink for each cell. This would preserve all formatting inside cells (even nested cells :) ), and is IMO a logical solution, but it is more work (see below) and might have some performance problems (dunno for sure).
Implementation:
- On <td>/<th>, go into table cell mode.
- In table cell mode, record all input up to the </td>/</th> corresponding to the <td>/<th> above. This includes both leaves and tags! I have no idea how to do this; it would take some output sink magic. I think this is the hard part. Akk?
- Do the above for all cells until </table>.
- Compare the length of the concatenated leaves* for all cells (columns?), and calculate column widths.
- When a table cell is closed, create a new HTML->TXT sink and feed it the recorded data. The wrap column is set according to the calculated column width. Fill the lines with spaces up to the wrap column (we could add a new mode to the HTML->TXT sink for that). Record the output.
- Lay the output out in a table (line 1 of cell 1 + "|" + line 1 of cell 2, etc.).
*Strictly, we would have to compare the length of the TXT output, but we don't know the column width yet, so we would have to run the HTML->TXT conversion twice. The length of the concatenated leaves is a close approximation and should be enough.
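For illustration only, here is a minimal Python sketch of the last three steps in this comment: estimate column widths from the length of each cell's text, wrap every cell to its column width, pad with spaces, and join the lines of neighbouring cells with "|". It is not Mozilla code. The function name `layout_table`, the `max_width` parameter, and the proportional shrinking rule are assumptions made for the example; cells are taken to be plain strings already, so the recording of tags per cell is not modelled.

```python
import textwrap


def layout_table(rows, max_width=72, sep="|"):
    """Render rows (lists of cell strings) as padded text columns.

    Hypothetical helper: widths come from the longest cell text per
    column, which is the approximation suggested in the comment above.
    """
    ncols = max(len(row) for row in rows)
    # Pad short rows so that every row has the same number of cells.
    rows = [list(row) + [""] * (ncols - len(row)) for row in rows]

    # Natural width of a column = length of its longest cell text.
    natural = [max(len(row[c]) for row in rows) for c in range(ncols)]

    # Width left for cell text once the separators are accounted for.
    budget = max_width - len(sep) * (ncols - 1)
    total = sum(natural)
    if total > budget:
        # Assumed rule: shrink all columns proportionally, at least 1 wide.
        widths = [max(1, n * budget // total) for n in natural]
    else:
        widths = [max(1, n) for n in natural]

    lines = []
    for row in rows:
        # Wrap each cell to its column width; empty cells yield one blank line.
        wrapped = [textwrap.wrap(cell, width=w) or [""]
                   for cell, w in zip(row, widths)]
        height = max(len(col) for col in wrapped)
        for i in range(height):
            parts = [(col[i] if i < len(col) else "").ljust(w)
                     for col, w in zip(wrapped, widths)]
            lines.append(sep.join(parts))
    return lines
```

Calling `layout_table([["a", "one two three"]], max_width=9)` wraps the second cell over two lines and pads the first cell with a blank on the second line. With very narrow budgets the minimum width of 1 per column can push a line past `max_width`; a real implementation would need a policy for that case.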
Reporter | ||
Comment 11•24 years ago
w3m's table algorithm: <http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/STORY.html>. Good news: it seems we will switch to direct DOM->text output sometime (currently, we do DOM->XIF/HTML->text), which means we can navigate through the document (back and forth), which means we don't have to do the caching described above. Adding dependency on bug 51308.
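To illustrate why tree access removes the need for caching, here is a rough Python sketch that measures column widths by walking a parsed table directly, using the standard library's `xml.dom.minidom` as a stand-in for the real DOM. The helper names `cell_text`, `table_rows`, and `column_widths` are made up for the example, and the input is assumed to be well-formed XML-style markup; this is not the serializer's actual interface.

```python
from xml.dom import minidom


def cell_text(node):
    """Concatenate all text leaves below a node, collapsing whitespace."""
    parts = []

    def walk(n):
        if n.nodeType == n.TEXT_NODE:
            parts.append(n.data)
        for child in n.childNodes:
            walk(child)

    walk(node)
    return " ".join("".join(parts).split())


def table_rows(table):
    """Return the text of every td/th cell, one list per tr.

    Limitation of this sketch: getElementsByTagName also finds rows of
    nested tables, which are not handled separately here.
    """
    rows = []
    for tr in table.getElementsByTagName("tr"):
        rows.append([cell_text(c) for c in tr.childNodes
                     if c.nodeType == c.ELEMENT_NODE
                     and c.tagName in ("td", "th")])
    return rows


def column_widths(table):
    """First pass: the longest cell text per column."""
    rows = table_rows(table)
    ncols = max(len(row) for row in rows)
    return [max((len(row[c]) for row in rows if c < len(row)), default=0)
            for c in range(ncols)]
```

Because the tree stays available, a second pass can revisit the same cells to emit the wrapped text once the widths are known, instead of replaying a recorded token stream.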
Depends on: 51308
Comment 12•24 years ago
Anthonyd is taking over Output bugs, so he's the default owner for RFEs like this one.
Assignee: akkana → anthonyd
Status: ASSIGNED → NEW
Comment 14•23 years ago
This is a serializer bug; it needs to be reassigned to the module owner for DOM to Text Conversion.
Assignee: brade → anthonyd
Comment 15•22 years ago
reassigning to cmanske.
Assignee: anthonyd → cmanske
Severity: normal → enhancement
Reporter | ||
Comment 17•22 years ago
Table support should be optional. Many web pages add a lot of cruft in the left and right columns, and having it before/after the real text makes it *much* easier to remove it from the resulting document later than if it were next to the real content.
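As a small sketch of what "optional" could mean here, the Python function below switches between a side-by-side rendering and a linear one that emits each cell as its own block in source order. The name `serialize_cells` and the flag `tables_enabled` are invented for the example and do not correspond to any existing serializer flag.

```python
def serialize_cells(rows, tables_enabled=False):
    """Turn rows of cell strings into plain text.

    Hypothetical switch: with tables_enabled the cells of a row share
    one line; without it every non-empty cell becomes its own block, so
    side-column content ends up before or after the main text.
    """
    if tables_enabled:
        return "\n".join(" | ".join(row) for row in rows)
    return "\n\n".join(cell for row in rows for cell in row if cell)
```

For a layout table such as `[["nav links", "Article text", "ads"]]`, the linear mode yields three separate blocks, which are easy to delete by hand, while the table mode puts all three on one line.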
Comment 18•22 years ago
*** Bug 143151 has been marked as a duplicate of this bug. ***
Updated•15 years ago
QA Contact: sujay → dom-to-text
Comment 19•6 years ago
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Comment 20•2 years ago
The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.
Assignee: t_mutreja → nobody
Updated•2 years ago
Severity: normal → S3
Updated•16 days ago
Priority: P3 → P5