Support tables in plaintext output

NEW
Assigned to

Status


Core
Serializers
P2
enhancement
19 years ago
5 years ago

People

(Reporter: BenB, Assigned: Tanu Mutreja)

Tracking

(Blocks: 1 bug, {helpwanted})

Trunk
Future
helpwanted

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

19 years ago
This is a "sub-bug" of bug #16800.

I would like to see at least the most essential parts of tables (basic col and
row formatting, colspan and rowspan, caption, maybe summary) supported.

This won't be easy, because it means (yet another) rewrite of the line functions
of nsHTMLToTXTSinkStream.

I looked at how lynx handles this, but it doesn't support tables either :-(.
(Reporter)

Updated

19 years ago
Depends on: 17723
Target Milestone: M12
(Reporter)

Comment 1

19 years ago
Usual disclaimer: I don't know if I will really implement this. I'm just
looking.

Comment 2

19 years ago
FWIW, this would be most useful for the simplest cases - 2-4 columns only,
no nested tables, no fanciness at all. It is easy to forget the days before
DTP and the web, but as little as 15 years ago the majority of tables that
were not professionally produced or made by computer programs were either
composed at a typewriter or at a word-processing station of some kind connected
to a letter-quality printer, i.e. a typewriter with a data cable.

I am sure that if you get code for this working at all there will be pressure
to support more and more subtleties, but the ability to view simple tables
is worth enough to draw a line around what is reasonably possible.
(Reporter)

Updated

19 years ago
Status: NEW → ASSIGNED
Priority: P3 → P2
(Reporter)

Updated

19 years ago
Target Milestone: M12 → M20
(Reporter)

Updated

18 years ago
Assignee: mozilla → akkana
Status: ASSIGNED → NEW
Summary: Support tables in plaintext output → [HELP WANTED] Support tables in plaintext output
Whiteboard: [HELP WANTED]
(Reporter)

Comment 3

18 years ago
I don't think I'll work on this anytime soon. HELP WANTED.

Updated

18 years ago
No longer depends on: 17723

Comment 4

18 years ago
Removing dependency: formatting tables would be nice regardless of anything
else.

Cc'ing Daniel in case he has any interest in this.

Updated

18 years ago
Keywords: helpwanted

Updated

18 years ago
Status: NEW → ASSIGNED

Updated

18 years ago
Summary: [HELP WANTED] Support tables in plaintext output → Support tables in plaintext output
Whiteboard: [HELP WANTED]

Comment 5

18 years ago
Bulk move of all "Output" component bugs to new "DOM to Text Conversion" 
component.  Output will be deleted as a component.

Updated

18 years ago
Component: Output → DOM to Text Conversion

Comment 6

18 years ago
moving to future milestone
Assignee: akkana → beppe
Status: ASSIGNED → NEW

Comment 7

18 years ago
moving back to previous owner
Assignee: beppe → akkana
Target Milestone: M20 → Future

Comment 8

18 years ago
Re-accepting.
Status: NEW → ASSIGNED

Comment 9

18 years ago
Re-accepting.
(Reporter)

Comment 10

18 years ago
This bug is far from trivial. It means caching the content of table cells. I
see two ways:
- Ignore all formatting inside table cells.
This goes against the HTML 4.0 spec, which allows both inline and block tags
inside table cells. Assuming the table is more important than the formatting
inside it, this would be an improvement, but of doubtful value. If we want to
output commercial web pages, this approach would make things worse, as tables
are often used for large-scale layout (see e.g. <http://www.mozilla.org> :-( ).
- Set up a new output sink for each cell.
This would preserve all formatting inside cells (even nested cells :) ), and is
IMO a logical solution, but it is more work (see below) and might have some
performance problems (dunno for sure).
Implementation:
  - On <td>/<th>, go into table cell mode.
  - While in table cell mode, record all input up to the </td>/</th>
corresponding to the <td>/<th> above. This includes both leaves and tags! I
have no idea how to do this; it would take some output sink magic. I think this
is the hard part. Akk?
  - Do the above for all cells until </table>.
  - Compare the lengths of the concatenated leaves* for all cells (columns?),
and calculate the column widths.
  - When a table cell is closed, create a new HTML->TXT sink and feed it the
recorded data. The wrap column is set according to the calculated column width.
Fill the lines with spaces up to the wrap column (we could add a new mode to
the HTML->TXT sink for that). Record the output.
  - Lay the output out in a table (line 1 of cell 1 + "|" + line 1 of cell 2,
etc.).

*Strictly, we would have to compare the lengths of the TXT output, but we don't
know the column widths yet, so we would have to run HTML->TXT twice. The
length of the concatenated leaves is a close approximation, and should be
enough.
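
To make the last three steps concrete, here is a rough Python sketch (not
Mozilla code). It assumes each cell has already been reduced to a plain string,
so the per-cell sink is replaced by textwrap. The names column_widths,
layout_row and render_table, and the 72-column default, are made up for
illustration.

```python
import textwrap


def column_widths(rows, total_width=72):
    """Estimate column widths from the longest raw cell text per column.

    This is the approximation from the footnote: measure the unwrapped
    text instead of running the text conversion twice.
    """
    ncols = max(len(row) for row in rows)
    natural = [1] * ncols  # at least one character per column
    for row in rows:
        for c, text in enumerate(row):
            natural[c] = max(natural[c], len(text))
    available = total_width - (ncols - 1)  # one "|" between columns
    if sum(natural) <= available:
        return natural
    # Too wide: shrink every column proportionally, but never below 1.
    total = sum(natural)
    return [max(1, n * available // total) for n in natural]


def layout_row(cells, widths):
    """Wrap each cell to its column width, pad with spaces, join with "|"."""
    wrapped = [textwrap.wrap(text, width=w) or [""]
               for text, w in zip(cells, widths)]
    height = max(len(lines) for lines in wrapped)
    out = []
    for i in range(height):
        parts = []
        for lines, w in zip(wrapped, widths):
            line = lines[i] if i < len(lines) else ""
            parts.append(line.ljust(w))  # fill up to the wrap column
        out.append("|".join(parts).rstrip())
    return out


def render_table(rows, total_width=72):
    """Lay out all rows; short rows are padded with empty cells."""
    ncols = max(len(row) for row in rows)
    rows = [list(row) + [""] * (ncols - len(row)) for row in rows]
    widths = column_widths(rows, total_width)
    lines = []
    for row in rows:
        lines.extend(layout_row(row, widths))
    return lines


print("\n".join(render_table([["a", "bb"], ["ccc", "d"]])))
# a  |bb
# ccc|d
```

This ignores colspan, rowspan and nested tables, and it wraps plain strings
rather than recorded tags, so it only covers the layout half of the problem.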
(Reporter)

Comment 11

18 years ago
w3m's table algorithm:
<http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/STORY.html>.


Good news: It seems we will switch to direct DOM->text output at some point
(currently, we do DOM->XIF/HTML->text), which means we can navigate through the
document (back and forth), so we don't have to do the caching described above.
Adding dependency on bug 51308.
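
To illustrate why a navigable tree removes the need for caching, here is a
small Python sketch that walks a parsed table twice: once to measure the cells,
once to emit them. It uses Python's xml.etree.ElementTree as a stand-in for the
real DOM. The function names are hypothetical, and nested tables, colspan and
rowspan are not handled.

```python
import xml.etree.ElementTree as ET


def cell_text(cell):
    """Concatenate all text under a cell and collapse whitespace."""
    return " ".join("".join(cell.itertext()).split())


def table_to_text(table):
    """Two passes over the same tree: measure first, then render."""
    rows = [[c for c in tr if c.tag in ("td", "th")]
            for tr in table.iter("tr")]

    # Pass 1: find the widest cell in each column.
    widths = []
    for row in rows:
        for i, cell in enumerate(row):
            n = len(cell_text(cell))
            if i == len(widths):
                widths.append(n)
            else:
                widths[i] = max(widths[i], n)

    # Pass 2: revisit the same nodes and pad each cell to its column width.
    lines = []
    for row in rows:
        parts = [cell_text(c).ljust(widths[i]) for i, c in enumerate(row)]
        lines.append("|".join(parts).rstrip())
    return lines


table = ET.fromstring(
    "<table>"
    "<tr><th>Name</th><th>Note</th></tr>"
    "<tr><td>w3m</td><td>has <b>tables</b></td></tr>"
    "</table>"
)
print("\n".join(table_to_text(table)))
# Name|Note
# w3m |has tables
```

Nothing is recorded between the passes; the second pass simply reads the same
nodes again, which a streaming sink cannot do.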
Depends on: 51308

Comment 12

18 years ago
Anthonyd is taking over Output bugs, so he's the default owner for RFEs like
this one.
Assignee: akkana → anthonyd
Status: ASSIGNED → NEW

Updated

18 years ago
Status: NEW → ASSIGNED

Comment 13

17 years ago
--> brade
Assignee: anthonyd → brade
Status: ASSIGNED → NEW

Comment 14

17 years ago
This is a serializer bug; it needs to be reassigned to the module owner for DOM 
to Text Conversion.
Assignee: brade → anthonyd

Comment 15

16 years ago
reassigning to cmanske.
Assignee: anthonyd → cmanske
Severity: normal → enhancement

Comment 16

16 years ago
Over to serializer owner.
Assignee: cmanske → tmutreja
(Reporter)

Comment 17

16 years ago
Table support should be optional. Many webpages add a lot of cruft in the left
and right columns, and having it before/after the real text makes it *much*
easier to remove it from the resulting document later than if it were next to
the real content.
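
As a rough illustration of what "optional" could look like, here is a Python
sketch with a made-up flag, format_tables. With the flag off, cells come out
one after another in document order, so sidebar cruft lands before or after the
main text. With the flag on, cells are placed side by side. Neither the flag
nor the function name exists in the serializer; both are assumptions.

```python
def serialize_table(rows, format_tables=False):
    """Serialize rows of single-line cell strings to text lines."""
    if not format_tables:
        # Linear mode: one non-empty cell per line, in document order.
        return [cell for row in rows for cell in row if cell]

    # Grid mode: pad every cell to its column's widest entry.
    ncols = max(len(row) for row in rows)
    padded = [list(row) + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[c]) for row in padded) for c in range(ncols)]
    return ["|".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in padded]


page = [["Nav", "Article text", "Ads"]]
print(serialize_table(page))                      # ['Nav', 'Article text', 'Ads']
print(serialize_table(page, format_tables=True))  # ['Nav|Article text|Ads']
```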

Comment 18

16 years ago
*** Bug 143151 has been marked as a duplicate of this bug. ***
QA Contact: sujay → dom-to-text