Closed
Bug 650787
Opened 13 years ago
Closed 13 years ago
nsPlainTextSerializer: Avoid dependencies on the old HTML parser when converting HTML to plain text
Categories
(MailNews Core :: MIME, defect)
MailNews Core
MIME
Tracking
(Not tracked)
RESOLVED
FIXED
Thunderbird 13.0
People
(Reporter: hsivonen, Assigned: hsivonen)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 4 obsolete files)
11.11 KB,
patch
|
Bienvenu
:
review+
|
Details | Diff | Splinter Review |
It appears that comm-central has multiple places where the old HTML parser is used with nsPlainTextSerializer as a sink to convert HTML to plain text.
The old parser and the sink interfaces are going to go away, so these conversions in mailnews core should use the new parser and a plain text serializer that takes a DOM (sub)tree instead of being a parser sink. Code consolidation with bug 650784 would make sense.
Comment 1•13 years ago
|
||
I strongly recommend against a reimplementation. nsPlainTextSerializer is very involved and delicate code. It took a lot of fine-tuning to get it right in all cases: copy-paste, normal plaintext, and format=flowed. Each of these by itself has totally non-obvious edge cases that needed to be considered. Rewriting this is bound to cause a lot of regressions.
I suggest to simply move this implementation to the new APIs, whatever they are.
Summary: Avoid dependencies on the old HTML parser when converting HTML to plain text → nsPlainTextSerializer: Avoid dependencies on the old HTML parser when converting HTML to plain text
Assignee | ||
Comment 2•13 years ago
|
||
(In reply to comment #1)
> I suggest to simply move this implementation to the new APIs, whatever they
> are.
nsPlainTextSerializer seems to inherit from nsIContentSerializer already. Does it already actually work as an nsIContentSerializer impl? If it does, I suggest simply dropping support for the nsIHTMLContentSink and nsIHTMLToTextSink interfaces and using nsPlainTextSerializer only as an nsIContentSerializer implementation.
Comment 3•13 years ago
|
||
> Does it already actually work as an nsIContentSerializer impl?
I don't think anybody knows, and I think you're best suited to find out.
Comment 4•13 years ago
|
||
I am having some success reimplementing nsPlainTextSerializer by implementing it as a specialization of nsHTMLContentSerializer. Although it may seem counter intuitive nsHTMLContentSerializer has most of the code already written. I'm just modifying it to fit ie strip out the appending of the open and close tags. also the newer implementation is much more elegant and simplifies the coding significantly.
It would help me if someone could write some test cases involving lists and tables or just more through tests in general.
Also I am wondering whether I should Ignore canvas tags in the newer code?
Comment 5•13 years ago
|
||
Blocks: 563890
Assignee | ||
Comment 7•13 years ago
|
||
(In reply to william bradshaw from comment #5)
> Blocks: 563890
Why?
Comment 8•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #7)
> (In reply to william bradshaw from comment #5)
> > Blocks: 563890
>
> Why?.
Missed the differance of the :I" thought nsPlainTextSerializer was an nsHTMLContentSink it's not it's a nsIHTMLContentSink sorry about that my bad. I'll post my in progress patch latter tonight when I get back to my dorm.
Comment 9•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> (In reply to comment #1)
> > I suggest to simply move this implementation to the new APIs, whatever they
> > are.
>
> nsPlainTextSerializer seems to inherit from nsIContentSerializer already.
> Does it already actually work as an nsIContentSerializer impl? If it does, I
> suggest simply dropping support for the nsIHTMLContentSink and
> nsIHTMLToTextSink interfaces and using nsPlainTextSerializer only as an
> nsIContentSerializer implementation.
<http://www.tjhsst.edu/~jcranmer/c-ccov/content/base/src/nsPlainTextSerializer.cpp.gcov.html> seems to suggest that it is not tested as such (there may be stuff in mochitests/reftests/etc., but I doubt it). Given how poorly tested the code is, I am not sure I would trust just flipping the impl use and seeing if it works.
Comment 10•13 years ago
|
||
william bradshaw, thanks for looking into this.
> also the newer implementation is much more elegant
> and simplifies the coding significantly.
When changing the implementation *logic* (not just APIs), please be super careful. This class was fine-tuned for many use–cases, including Thunderbird plaintext mail send, format=flowed, clipboard paste from browser, again in two variants: rich and simple text, and no linebreaks. Also test pasting of URLs.
This is very prone to regressions in edge cases, so I'm very vary of re-implementations or any changes to the logic whatsoever in fact.
Comment 11•13 years ago
|
||
I think I've found that my success in implementing nsPlainTextSerializer is limited to marginal success and am finding the deeper I go the more things don't work as I thought they did. I'm now siding with the not reimplementing crowd. Although it seems the idea seems to work on a high level small details of the way functions are implemented would cause me modify virtually every function of nsHTMLSerializer to attain full behavioral recreation. Although at this time (for time saving purposes) just moving to new API's is looking rather appealing to me. I would however advocate Bug 650784 or changing implementation to more closely follow the model set up by the nsXMLSerializer class tree to provide a more consistent code base.
Comment 12•13 years ago
|
||
Assignee | ||
Updated•13 years ago
|
Attachment #562637 -
Attachment is patch: true
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Assignee | ||
Comment 13•13 years ago
|
||
This need parts 1 and 1.5 from bug 650784 on m-c.
Attachment #562637 -
Attachment is obsolete: true
Attachment #591759 -
Flags: review?(mbanner)
Assignee | ||
Comment 14•13 years ago
|
||
Comment on attachment 591759 [details] [diff] [review]
Use nsIScriptableHTMLParser for HTML to plain text conversion
Hmm. This should probably use do_GetService instead of do_CreateInstance.
Attachment #591759 -
Flags: review?(mbanner)
Assignee | ||
Comment 15•13 years ago
|
||
Attachment #591759 -
Attachment is obsolete: true
Assignee | ||
Comment 16•13 years ago
|
||
Attachment #600887 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #601225 -
Flags: review?(dbienvenu)
Comment 17•13 years ago
|
||
Comment on attachment 601225 [details] [diff] [review]
Use nsIParseUtils for plain text conversion, with .xpt additions to packaging
Review of attachment 601225 [details] [diff] [review]:
-----------------------------------------------------------------
::: suite/installer/package-manifest.in
@@ +201,5 @@
> @BINPATH@/components/filepicker.xpt
> #endif
> @BINPATH@/components/find.xpt
> @BINPATH@/components/gfx.xpt
> +@BINPATH@/components/html5.xpt
Ah, I just did it in bug 731668 :-|
Comment 18•13 years ago
|
||
Comment on attachment 601225 [details] [diff] [review]
Use nsIParseUtils for plain text conversion, with .xpt additions to packaging
>diff --git a/mail/installer/package-manifest.in b/mail/installer/package-manifest.in
>+@BINPATH@/components/html5.xpt
Due to bug 732070, I've landed this part of the patch with r=me. I'll let David look at the rest later.
Comment 19•13 years ago
|
||
Comment on attachment 601225 [details] [diff] [review]
Use nsIParseUtils for plain text conversion, with .xpt additions to packaging
The changeset was: http://hg.mozilla.org/comm-central/rev/8eca2a2319d8
(Sorry, I forgot to correct the username after extracting that)
Assignee | ||
Comment 20•13 years ago
|
||
Attachment #601225 -
Attachment is obsolete: true
Attachment #601225 -
Flags: review?(dbienvenu)
Attachment #603650 -
Flags: review?(dbienvenu)
Updated•13 years ago
|
Attachment #603650 -
Flags: review?(dbienvenu) → review+
Assignee | ||
Comment 21•13 years ago
|
||
Thanks for the r+. Landed:
http://hg.mozilla.org/comm-central/rev/766214fa44ab
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Target Milestone: --- → Thunderbird 13.0
You need to log in
before you can comment on or make changes to this bug.
Description
•