Closed Bug 650784 Opened 14 years ago Closed 13 years ago

Implement HTML to plain text conversion as a DOM walker without nsParser dependency

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla13

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

(Keywords: addon-compat, dev-doc-needed)

Attachments

(7 files, 9 obsolete files)

WIP 14 years ago Henri Sivonen (:hsivonen) 24.13 KB, patch		Details \| Diff \| Splinter Review
Part 1: Remove the dependency on nsParser 14 years ago Henri Sivonen (:hsivonen) 32.21 KB, patch		Details \| Diff \| Splinter Review
Part 2: Remove dependency on nsIParserNode and nsIParserService 14 years ago Henri Sivonen (:hsivonen) 45.62 KB, patch		Details \| Diff \| Splinter Review
Part 1: Remove the dependency on nsParser, v2 14 years ago Henri Sivonen (:hsivonen) 39.89 KB, patch		Details \| Diff \| Splinter Review
Part 2: Remove dependency on nsIParserNode and nsIParserService, v2 14 years ago Henri Sivonen (:hsivonen) 47.49 KB, patch		Details \| Diff \| Splinter Review
Part 1: Introduce the new API 14 years ago Henri Sivonen (:hsivonen) 26.95 KB, patch	smaug : review-	Details \| Diff \| Splinter Review
Part 2: Make nsPlainText serializer not be a content sink 14 years ago Henri Sivonen (:hsivonen) 12.36 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 3: Remove dependency on nsIParserNode and nsIParserService, v3 14 years ago Henri Sivonen (:hsivonen) 47.49 KB, patch	smaug : review-	Details \| Diff \| Splinter Review
Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core [Checked in: Comment 47] 14 years ago Henri Sivonen (:hsivonen) 20.40 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 0: Fold old nsParserUtils into nsContentUtils 13 years ago Henri Sivonen (:hsivonen) 46.33 KB, patch		Details \| Diff \| Splinter Review
Part 0: Fold old nsParserUtils into nsContentUtils, v2 [Checked in: Comment 47] 13 years ago Henri Sivonen (:hsivonen) 45.44 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 4: Scripted test case [Checked in: Comment 47] 13 years ago Henri Sivonen (:hsivonen) 4.11 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54] 13 years ago Henri Sivonen (:hsivonen) 47.43 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 1: Introduce the new API, v2 [Checked in: Comment 47] 13 years ago Henri Sivonen (:hsivonen) 25.23 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 0.5: Let the build system know that parser/html/ can have .idl files [Checked in: Comment 47] 13 years ago Henri Sivonen (:hsivonen) 3.79 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Part 2: Make nsPlainText serializer not be a content sink, unrotted [Checked in: Comment 54] 13 years ago Henri Sivonen (:hsivonen) 12.12 KB, patch	hsivonen : review+	Details \| Diff \| Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Description

•

14 years ago

It appears that currently clipboard-related conversions from HTML to plain text use http://mxr.mozilla.org/mozilla-central/source/widget/src/xpwidgets/nsHTMLFormatConverter.cpp#276 which uses the old parser and relies on nsPlainTextSerializer being an nsIHTMLContentSink. The old parser and the old content sink interfaces are going to go away. These conversions should use code that uses the new HTML parser to go from HTML source into a DOM if necessary and then serialization code that takes a DOM (sub)tree and outputs plain text.

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Blocks: 650787

Ben Bucksch (:BenB)

Comment 1

•

14 years ago

Please note the warnings at bug 650787 comment 1, 10 and 11.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

14 years ago

Blocks: 282244

Henri Sivonen (:hsivonen)

Assignee

Comment 2

•

14 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

This patch moves conversion to plain text to behind a static method in nsContentUtils instead of the XPCOM stuff we had before. What should I do about the TestHarness.h-based test can't call nsContentUtils?

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Attachment #590186 - Flags: feedback?(bugs)

Ben Bucksch (:BenB)

Comment 3

•

14 years ago

Comment on attachment 590186 [details] [diff] [review] WIP Henri, the converters are used in many places, also in Thunderbird and various in extensions. They must be exposed by XPCOM. Where similar functionality had not been properly exposed to JavaScript, it posed a huge problem for many parties, I've often heard people desperately needing it but having big problems to get to it from JS. If anything, it must be more accessible from XPCOM, not less. This is really important functionality that must be available to all components, like spell-checker, gloda, addon-manager wanting to display a HTML snipplet as plaintext, or anything else that needs to convert HTML to plaintext.

Attachment #590186 - Flags: feedback-

Ben Bucksch (:BenB)

Comment 4

•

14 years ago

Comment on attachment 590186 [details] [diff] [review] WIP As one of the authors of this component, I hope I don't overstep my authority by doing review- on this.

Attachment #590186 - Flags: review-

Henri Sivonen (:hsivonen)

Assignee

Comment 5

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #3) > Henri, the converters are used in many places, also in Thunderbird I'm planning to change Thunderbird call sites. > They must be exposed by XPCOM. XPCOM-exposed content sinks are going away and the old parser that can feed into them is going away. The old way of using this stuff isn't going to stay around. However, I'd be OK with wrapping the new nsContentUtils method in something that can be called through XPCOM. Would it work for you if nsScriptableUnescapeHTML (exposed via XPCOM) allowed the caller to pass flags and the column wrap?

Ben Bucksch (:BenB)

Comment 6

•

14 years ago

Comment on attachment 590186 [details] [diff] [review] WIP Yes, any way for JS callers to pass HTML through it and get plaintext using this class/implementation, and passing flags to it to control it, would be fine with me. Thanks! Revoking my objection.

Attachment #590186 - Flags: review-

Attachment #590186 - Flags: feedback-

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Keywords: addon-compat

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Attachment #590186 - Flags: feedback?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 7

•

14 years ago

Attached patch Part 1: Remove the dependency on nsParser (obsolete) — Details — Splinter Review

This patch exposes all functionality via XPCOM.

Attachment #590186 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 8

•

14 years ago

Attached patch Part 2: Remove dependency on nsIParserNode and nsIParserService (obsolete) — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Blocks: 720623

Henri Sivonen (:hsivonen)

Assignee

Comment 9

•

14 years ago

Attached patch Part 1: Remove the dependency on nsParser, v2 (obsolete) — Details — Splinter Review

Attachment #590721 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 10

•

14 years ago

Attached patch Part 2: Remove dependency on nsIParserNode and nsIParserService, v2 (obsolete) — Details — Splinter Review

Attachment #590722 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Depends on: 721313

Henri Sivonen (:hsivonen)

Assignee

Comment 11

•

14 years ago

Attached patch Part 1: Introduce the new API (obsolete) — Details — Splinter Review

Attachment #591430 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 12

•

14 years ago

Attached patch Part 2: Make nsPlainText serializer not be a content sink (obsolete) — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Comment 13

•

14 years ago

Attached patch Part 3: Remove dependency on nsIParserNode and nsIParserService, v3 (obsolete) — Details — Splinter Review

Some bits in the last part would logically belong in the middle part, but the last two parts will land together anyway.

Attachment #591432 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 14

•

14 years ago

Dev doc relevance: This change affects extension and XULRunner documentation. (Not Web platform documentation.) nsIHTMLToTextSink is removed. nsIScriptableHTMLParser is introduced. (Instantiated with NS_SCRIPTABLEHTMLPARSER_CONTRACTID / "@mozilla.org/scriptablehtmlparser;1".)

Keywords: dev-doc-needed

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Attachment #591719 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Attachment #591720 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

14 years ago

Attachment #591722 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

14 years ago

Should I move nsScriptableUnescape.h/cpp to some place considered part of Core? I left it where it was in order to avoid breaking history.

Henri Sivonen (:hsivonen)

Assignee

Comment 16

•

14 years ago

Attached patch Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core [Checked in: Comment 47] — Details — Splinter Review

Attachment #591755 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 17

•

13 years ago

You should use the new license.

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 18

•

13 years ago

Comment on attachment 591719 [details] [diff] [review] Part 1: Introduce the new API >+[scriptable, uuid(290f49bb-0619-4bda-8006-ab31bec7231a)] >+interface nsIScriptableHTMLParser : nsISupports Kind of strange name. Would nsIHTMLToTextConverter be ok? It could then have convert() method Also, the new interface should have some tests. (perhaps they are in some other patch for this bug)

Attachment #591719 - Flags: review?(bugs) → review-

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 19

•

13 years ago

Comment on attachment 591755 [details] [diff] [review] Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core [Checked in: Comment 47] This needs some minor changes if scriptableHTMLParser is renamed

Attachment #591755 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 20

•

13 years ago

Comment on attachment 591720 [details] [diff] [review] Part 2: Make nsPlainText serializer not be a content sink Horrible old code

Attachment #591720 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 21

•

13 years ago

Comment on attachment 591722 [details] [diff] [review] Part 3: Remove dependency on nsIParserNode and nsIParserService, v3 >+nsContentUtils::IsHTMLBlock(nsIAtom* aLocalName) >+{ >+ return (aLocalName == nsGkAtoms::address) || >+ (aLocalName == nsGkAtoms::article) || Might look nicer if (aLocalName == nsGkAtoms::address) || was in its own line >- rv = DoAddLeaf(nsnull, >- eHTMLTag_text, >- Substring(textstr, start, length-start)); >+ DoAddText(false, Substring(textstr, start, length-start)); space before and after - >+nsPlainTextSerializer::DoAddText(bool aIsLineBreak, const nsAString& aText) ... >+ if ((mTagStackIndex > 1 && >+ mTagStack[mTagStackIndex-2] == nsGkAtoms::select) || >+ (mTagStackIndex > 0 && >+ mTagStack[mTagStackIndex-1] == nsGkAtoms::select)) { >+ // Don't output the contents of SELECT elements; >+ // Might be nice, eventually, to output just the selected element. >+ // Read more in bug 31994. >+ return; >+ } >+ >+ if (mTagStackIndex > 0 && >+ (mTagStack[mTagStackIndex-1] == nsGkAtoms::script || >+ mTagStack[mTagStackIndex-1] == nsGkAtoms::style)) { >+ // Don't output the contents of <script> or <style> tags; >+ return; >+ } So this code is now in two places. Could you add some helper method for it.

Attachment #591722 - Flags: review?(bugs) → review-

Henri Sivonen (:hsivonen)

Assignee

Comment 22

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #18) > >+interface nsIScriptableHTMLParser : nsISupports > Kind of strange name. > Would nsIHTMLToTextConverter be ok? > It could then have convert() method This isn't really a plain text converter interface. It's an interface for exposing entry points to the HTML parser to scripts and to outside libxul. These methods are logically like static methods, but they are on an interface, because that's how XPCOM does stuff. Considering what's now in the pipeline, bug 650776 will add more stuff to this interface. Maybe putting "HTML" in the interface name isn't great idea, since folding the functionality of nsIScriptableUnescapeHTML would already introduce an entry point to the XML parser. But, OTOH, it doesn't matter that much if a primarily HTML parser interface has some XML entry points, too. Are you OK with introducing an XPCOM service with a bunch of methods to be called by code that can't call nsContentUtils directly? I'd rather not add a bunch of one-off interfaces like nsIHTMLToTextConverter now that we don't guarantee frozen vtables anyway. How would you name such a service? nsIParserService is taken and I thought I'd keep new stuff on its way in and legacy stuff on its way out separate. > Also, the new interface should have some tests. > (perhaps they are in some other patch for this bug) The patch changes content/base/test/TestPlainTextSerializer.cpp to call the new interface. The serializer functionality behind the interface hasn't expanded, so retargeting the existing test coverage to the new entry point seemed appropriate.

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 23

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #22) > Are you OK with introducing an XPCOM service with a bunch of methods to be > called by code that can't call nsContentUtils directly? That sounds good. We even used to have nsIContentUtils. The name was terrible though. I'd rather not add a > bunch of one-off interfaces like nsIHTMLToTextConverter now that we don't > guarantee frozen vtables anyway. How would you name such a service? > nsIParserService is taken and I thought I'd keep new stuff on its way in and > legacy stuff on its way out separate. nsIParserUtils? or nsIParsingUtils > The patch changes content/base/test/TestPlainTextSerializer.cpp to call the > new interface. The serializer functionality behind the interface hasn't > expanded, so retargeting the existing test coverage to the new entry point > seemed appropriate. Just test that the new things works also in JS (since you're making the interface scriptable.)

Henri Sivonen (:hsivonen)

Assignee

Comment 24

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #23) > nsIParserUtils? We already have nsParserUtils that's pretty much unrelated to parsing. OK if I rename that one to nsAttributeUtils to get it out of the way?

Assignee: hsivonen → nobody

Component: Serializers → Selection

QA Contact: dom-to-text → selection

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

13 years ago

Component: Selection → Serializers

QA Contact: selection → dom-to-text

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 25

•

13 years ago

What about merging the current nsParserUtils to nsContentUtils, and then use nIParserUtils for this new stuff.

Component: Serializers → Selection

QA Contact: dom-to-text → selection

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 26

•

13 years ago

Ahem, can we stop moving this bug into Core::Selection?

Component: Selection → Serializers

QA Contact: selection → dom-to-text

Boris Zbarsky [:bzbarsky]

Updated

•

13 years ago

Assignee: nobody → hsivonen

Ben Bucksch (:BenB)

Comment 27

•

13 years ago

Before you move components around: "Utils" is a pretty bad name anyway. I'd recommend something with "convert" or similar in the name.

Henri Sivonen (:hsivonen)

Assignee

Comment 28

•

13 years ago

(In reply to Ms2ger from comment #26) > Ahem, can we stop moving this bug into Core::Selection? That happens due to some interaction between lazy-loading tabs and form state restoration. The changes are accidental and involuntary. (In reply to Olli Pettay [:smaug] from comment #25) > What about merging the current nsParserUtils to nsContentUtils, and then > use nIParserUtils for this new stuff. Sold! (In reply to Ben Bucksch (:BenB) from comment #27) > Before you move components around: "Utils" is a pretty bad name anyway. I'd > recommend something with "convert" or similar in the name. Naming is a total bikeshed and I already spent time renaming stuff to nsIParserUtils. Also, I expect the interface to gain non-conversion methods in due course.

Ben Bucksch (:BenB)

Comment 29

•

13 years ago

> Naming is a total bikeshed Yes, true, but naming is also the reason why nobody used nsIFormatConverter (bug 720623): nobody found it, nobody knew it. For me, "utils" means internal, minor stuff that is not very interesting. I would *never* look there, if I was searching for a HTML->TXT converter. But whatever, we'll see.

Henri Sivonen (:hsivonen)

Assignee

Comment 30

•

13 years ago

Attached patch Part 0: Fold old nsParserUtils into nsContentUtils (obsolete) — Details — Splinter Review

While at it, I removed a duplicated instance of the PI pseudo attribute parsing code and named the methods in a way that's more appropriate in terms of xml-stylesheet terminology.

Attachment #598845 - Flags: review?(bugs)

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 31

•

13 years ago

I assume the change in nsHtml5TreeOpExecutor.h wasn't meant for this patch?

Henri Sivonen (:hsivonen)

Assignee

Updated

•

13 years ago

Attachment #598845 - Attachment is obsolete: true

Attachment #598845 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 32

•

13 years ago

Attached patch Part 0: Fold old nsParserUtils into nsContentUtils, v2 [Checked in: Comment 47] — Details — Splinter Review

> I assume the change in nsHtml5TreeOpExecutor.h wasn't meant for this patch? Correct. Thanks.

Attachment #599058 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 33

•

13 years ago

Attached patch Part 4: Scripted test case [Checked in: Comment 47] — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Comment 34

•

13 years ago

Attached patch Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54] — Details — Splinter Review

Attachment #591722 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 35

•

13 years ago

Attached patch Part 1: Introduce the new API, v2 [Checked in: Comment 47] — Details — Splinter Review

Attachment #591719 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Updated

•

13 years ago

Blocks: 729044

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 36

•

13 years ago

Comment on attachment 599058 [details] [diff] [review] Part 0: Fold old nsParserUtils into nsContentUtils, v2 [Checked in: Comment 47] >+nsContentUtils::IsJavaScriptLanguage(const nsString& aName, PRUint32 *aFlags) >+{ >+ JSVersion version = JSVERSION_UNKNOWN; >+ >+ if (aName.LowerCaseEqualsLiteral("javascript") || >+ aName.LowerCaseEqualsLiteral("livescript") || >+ aName.LowerCaseEqualsLiteral("mocha")) { >+ version = JSVERSION_DEFAULT; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.0")) { >+ version = JSVERSION_1_0; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.1")) { >+ version = JSVERSION_1_1; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.2")) { >+ version = JSVERSION_1_2; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.3")) { >+ version = JSVERSION_1_3; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.4")) { >+ version = JSVERSION_1_4; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.5")) { >+ version = JSVERSION_1_5; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.6")) { >+ version = JSVERSION_1_6; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.7")) { >+ version = JSVERSION_1_7; >+ } >+ else if (aName.LowerCaseEqualsLiteral("javascript1.8")) { >+ version = JSVERSION_1_8; >+ } >+ if (version == JSVERSION_UNKNOWN) >+ return false; I know, you're just moving code, but could you fix the coding style if (expr) { stmt; } and } else if (expr) {

Attachment #599058 - Flags: review?(bugs) → review+

Henri Sivonen (:hsivonen)

Assignee

Updated

•

13 years ago

Attachment #599067 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

13 years ago

Attachment #599068 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

13 years ago

Attachment #599070 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 37

•

13 years ago

Bah. Now that TBPL is back, it shows failures where there were none locally. :-(

Ben Bucksch (:BenB)

Comment 38

•

13 years ago

Comment on attachment 599068 [details] [diff] [review] Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54] Ok, these are far-reaching changes. PlainTextSerializer has * so many use cases (format=flowed, rich plaintext and copy&paste) * delicate edge cases (various block elements, vertical and horizontal white space etc.) - basically similar to the HTML Editor rules * was created a decade before we had a useable testsuite and * is highly prone to regressions This has to be reviewed by somebody who has been involved in its creation (akk, Daniel or me, but not even I feel comfortable reviewing this, far too many changes).

Attachment #599068 - Flags: review?(akkzilla)

Ben Bucksch (:BenB)

Updated

•

13 years ago

Attachment #599070 - Flags: review?(akkzilla)

Ben Bucksch (:BenB)

Comment 39

•

13 years ago

Thunderbird is highly dependent on this class. One practical way to test is to run a large number of both HTML documents and HTML emails through the old version and the new version and see whether the result matches. Only exception is bug 314213: We need an empty line after <blockquote type=site>, but it's currently not there. You'd be highly welcome to fix this :). Please mind nested quotes, too.

Henri Sivonen (:hsivonen)

Assignee

Comment 40

•

13 years ago

Attached patch Part 0.5: Let the build system know that parser/html/ can have .idl files [Checked in: Comment 47] — Details — Splinter Review

This patch fixes the tryserver issues mentioned in comment 37.

Attachment #599488 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 41

•

13 years ago

(In reply to Ben Bucksch (:BenB) from comment #38) > Comment on attachment 599068 [details] [diff] [review] > Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 > > Ok, these are far-reaching changes. I think you are exaggerating. Part 3 does a pretty straight-forward substitution of old HTML tag enum values with corresponding nsGkAtoms. The most complex-looking changes arise, because in the new version, text nodes don't have a tag id enum value. nsContentUtils::IsHTMLVoid is trivial and easy to verify to be correct. nsContentUtils::IsHTMLBlock is derived by reading what the old code did. It should be possible to verify its equivalence to the old code by looking at the old and new code carefully. I welcome you to look at the old and new code carefully to check that I read the old code correctly. (In reply to Ben Bucksch (:BenB) from comment #39) > Thunderbird is highly dependent on this class. Does Thunderbird accordingly have unit tests for this? > One practical way to test is to run a large number of both HTML documents > and HTML emails through the old version and the new version and see whether > the result matches. Since you seem to have an idea of what cases are interesting, it would be helpful if you could supply test cases (as HTML files--not in email wrappers).

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 42

•

13 years ago

Comment on attachment 599488 [details] [diff] [review] Part 0.5: Let the build system know that parser/html/ can have .idl files [Checked in: Comment 47] rs=me

Attachment #599488 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 43

•

13 years ago

Comment on attachment 599070 [details] [diff] [review] Part 1: Introduce the new API, v2 [Checked in: Comment 47] >+ /** >+ * Converts HTML source to plain text by parsing the source and using the >+ * plain text serializer on the resulting tree. >+ * >+ * @param aSourceBuffer the string to parse as an HTML document >+ * @param aResultBuffer the string where the plain text result appears; >+ * may be the same string as aSourceBuffer >+ * @param aFlags Flags from nsIDocumentEncoder. >+ * @param aWrapCol Number of columns after which to line wrap 0 prevents wrapping, right? If so, could you please document it.

Attachment #599070 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 44

•

13 years ago

Comment on attachment 599068 [details] [diff] [review] Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54] >+nsContentUtils::IsHTMLBlock(nsIAtom* aLocalName) >+{ >+ return >+ (aLocalName == nsGkAtoms::address) || >+ (aLocalName == nsGkAtoms::article) || >+ (aLocalName == nsGkAtoms::aside) || >+ (aLocalName == nsGkAtoms::blockquote) || >+ (aLocalName == nsGkAtoms::center) || >+ (aLocalName == nsGkAtoms::dir) || >+ (aLocalName == nsGkAtoms::div) || >+ (aLocalName == nsGkAtoms::dl) || // XXX why not dt and dd? >+ (aLocalName == nsGkAtoms::fieldset) || >+ (aLocalName == nsGkAtoms::figure) || // XXX shouldn't figcaption be on this list >+ (aLocalName == nsGkAtoms::footer) || >+ (aLocalName == nsGkAtoms::form) || >+ (aLocalName == nsGkAtoms::h1) || >+ (aLocalName == nsGkAtoms::h2) || >+ (aLocalName == nsGkAtoms::h3) || >+ (aLocalName == nsGkAtoms::h4) || >+ (aLocalName == nsGkAtoms::h5) || >+ (aLocalName == nsGkAtoms::h6) || >+ (aLocalName == nsGkAtoms::header) || >+ (aLocalName == nsGkAtoms::hgroup) || >+ (aLocalName == nsGkAtoms::hr) || >+ (aLocalName == nsGkAtoms::li) || >+ (aLocalName == nsGkAtoms::listing) || >+ (aLocalName == nsGkAtoms::menu) || >+ (aLocalName == nsGkAtoms::multicol) || // XXX get rid of this one? >+ (aLocalName == nsGkAtoms::nav) || >+ (aLocalName == nsGkAtoms::ol) || >+ (aLocalName == nsGkAtoms::p) || >+ (aLocalName == nsGkAtoms::pre) || >+ (aLocalName == nsGkAtoms::section) || >+ (aLocalName == nsGkAtoms::table) || >+ (aLocalName == nsGkAtoms::ul) || >+ (aLocalName == nsGkAtoms::xmp); >+} Please file followups for the XXX >- if(NS_SUCCEEDED(GetAttributeValue(aNode, nsGkAtoms::start, startAttr))){ >+ if(NS_SUCCEEDED(GetAttributeValue(nsGkAtoms::start, startAttr))){ space after 'if' and before '{' >- else if (IsBlockLevel(aTag) >- && type != eHTMLTag_script >- && type != eHTMLTag_doctypeDecl >- && type != eHTMLTag_markupDecl) { >+ else if (nsContentUtils::IsHTMLBlock(aTag) >+ && aTag != nsGkAtoms::script) { Hmm, where do we handle doctype serialization? >+ nsAutoString aTagAttr; Badly named variable. aFoo is for parameters. > nsPlainTextSerializer::IsInPre() > { > PRInt32 i = mTagStackIndex; > while(i > 0) { >- if(mTagStack[i-1] == eHTMLTag_pre) >+ if(mTagStack[i-1] == nsGkAtoms::pre) > return true; >- if(IsBlockLevel(mTagStack[i-1])) { >+ if(nsContentUtils::IsHTMLBlock(mTagStack[i-1])) { i - 1, and space after 'if' Tricky to review.

Attachment #599068 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi] (vacation-ish -> Aug 1)

Comment 45

•

13 years ago

Comment on attachment 599067 [details] [diff] [review] Part 4: Scripted test case [Checked in: Comment 47] Could you put the test to content/base/test/chrome

Attachment #599067 - Flags: review?(bugs) → review+

Henri Sivonen (:hsivonen)

Assignee

Comment 46

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #36) > Comment on attachment 599058 [details] [diff] [review] > Part 0: Fold old nsParserUtils into nsContentUtils, v2 ... > I know, you're just moving code, but could you fix the coding style > if (expr) { > stmt; > } > and > } else if (expr) { Fixed. (In reply to Olli Pettay [:smaug] from comment #43) > Comment on attachment 599070 [details] [diff] [review] > Part 1: Introduce the new API, v2 ... > 0 prevents wrapping, right? If so, could you please document it. Documented. (In reply to Olli Pettay [:smaug] from comment #45) > Comment on attachment 599067 [details] [diff] [review] > Part 4: Scripted test case > > Could you put the test to content/base/test/chrome Moved there. Thanks. Landed parts other than 2 and 3: https://hg.mozilla.org/integration/mozilla-inbound/rev/5ccebf6862a9 https://hg.mozilla.org/integration/mozilla-inbound/rev/d40ba365b995 https://hg.mozilla.org/integration/mozilla-inbound/rev/429263e58090 https://hg.mozilla.org/integration/mozilla-inbound/rev/a423ca3c3ce1 https://hg.mozilla.org/integration/mozilla-inbound/rev/8ea9dc2f8570 comm-central needs a patch so that it starts using the API introduced in part 1 before parts 2 and 3 can land without burning c-c.

Whiteboard: [please leave open after inbound merge; parts 2 and 3 need to land after comm-central changes]

Matt Brubeck (:mbrubeck)

Comment 47

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/5ccebf6862a9 https://hg.mozilla.org/mozilla-central/rev/d40ba365b995 https://hg.mozilla.org/mozilla-central/rev/429263e58090 https://hg.mozilla.org/mozilla-central/rev/a423ca3c3ce1 https://hg.mozilla.org/mozilla-central/rev/8ea9dc2f8570

Whiteboard: [please leave open after inbound merge; parts 2 and 3 need to land after comm-central changes]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #599058 - Attachment description: Part 0: Fold old nsParserUtils into nsContentUtils, v2 → Part 0: Fold old nsParserUtils into nsContentUtils, v2 [Checked in: Comment 47]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #599070 - Attachment description: Part 1: Introduce the new API, v2 → Part 1: Introduce the new API, v2 [Checked in: Comment 47]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #591755 - Attachment description: Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core → Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core [Checked in: Comment 47]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #599067 - Attachment description: Part 4: Scripted test case → Part 4: Scripted test case [Checked in: Comment 47]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #599488 - Attachment description: Part 0.5: Let the build system know that parser/html/ can have .idl files → Part 0.5: Let the build system know that parser/html/ can have .idl files [Checked in: Comment 47]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Blocks: 731668

Alice0775 White

Updated

•

13 years ago

Depends on: 731896

Serge Gautherie (:sgautherie)

Comment 48

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #46) > comm-central needs a patch so that it starts using the API introduced in > part 1 before parts 2 and 3 can land without burning c-c. Ftr, you filed that as bug 650787.

David :Bienvenu

Comment 49

•

13 years ago

Comment on attachment 599068 [details] [diff] [review] Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54] has this bit-rotted, or does it rely on some other patch to be applied first? It doesn't apply cleanly for me.

alta88

Comment 50

•

13 years ago

it seems the current landing has broken feeds use of nsIScriptableUnescapeHTML in Tb trunk. Bug 732070.

Ludovic Hirlimann [:Usul]

Updated

•

13 years ago

Blocks: 732070

Henri Sivonen (:hsivonen)

Assignee

Comment 51

•

13 years ago

(In reply to David :Bienvenu from comment #49) > Comment on attachment 599068 [details] [diff] [review] > Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 > > has this bit-rotted, No, AFAICT. > or does it rely on some other patch to be applied > first? It doesn't apply cleanly for me. It requires part 2 (attachment 591720 [details] [diff] [review]).

Henri Sivonen (:hsivonen)

Assignee

Comment 52

•

13 years ago

Attached patch Part 2: Make nsPlainText serializer not be a content sink, unrotted [Checked in: Comment 54] — Details — Splinter Review

Attachment #591720 - Attachment is obsolete: true

Attachment #603648 - Flags: review+

Daniel Holbert [:dholbert]

Updated

•

13 years ago

Depends on: 733935

Henri Sivonen (:hsivonen)

Assignee

Comment 53

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #44) > Comment on attachment 599068 [details] [diff] [review] > Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 > Please file followups for the XXX Bug 734059, bug 734060 and bug 734063. > >- if(NS_SUCCEEDED(GetAttributeValue(aNode, nsGkAtoms::start, startAttr))){ > >+ if(NS_SUCCEEDED(GetAttributeValue(nsGkAtoms::start, startAttr))){ > space after 'if' and before '{' Fixed throughout the file. > >- else if (IsBlockLevel(aTag) > >- && type != eHTMLTag_script > >- && type != eHTMLTag_doctypeDecl > >- && type != eHTMLTag_markupDecl) { > >+ else if (nsContentUtils::IsHTMLBlock(aTag) > >+ && aTag != nsGkAtoms::script) { > Hmm, where do we handle doctype serialization? It's a no-op in the .h file. > >+ nsAutoString aTagAttr; > Badly named variable. aFoo is for parameters. Fixed. > > nsPlainTextSerializer::IsInPre() > > { > > PRInt32 i = mTagStackIndex; > > while(i > 0) { > >- if(mTagStack[i-1] == eHTMLTag_pre) > >+ if(mTagStack[i-1] == nsGkAtoms::pre) > > return true; > >- if(IsBlockLevel(mTagStack[i-1])) { > >+ if(nsContentUtils::IsHTMLBlock(mTagStack[i-1])) { > i - 1, and space after 'if' Fixed. Thanks. https://hg.mozilla.org/integration/mozilla-inbound/rev/0617488cb3b9 https://hg.mozilla.org/integration/mozilla-inbound/rev/9c8f60098608 https://hg.mozilla.org/integration/mozilla-inbound/rev/a674b76d8d35

Marco Bonardo [:mak]

Comment 54

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/0617488cb3b9 https://hg.mozilla.org/mozilla-central/rev/9c8f60098608 https://hg.mozilla.org/mozilla-central/rev/a674b76d8d35

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla13

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #603648 - Attachment description: Part 2: Make nsPlainText serializer not be a content sink, unrotted → Part 2: Make nsPlainText serializer not be a content sink, unrotted [Checked in: Comment 54]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Attachment #599068 - Attachment description: Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 → Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54]

Serge Gautherie (:sgautherie)

Updated

•

13 years ago

Flags: in-testsuite+

Jorge Villalobos [:jorgev] (he/him)

Comment 55

•

13 years ago

I'm writing the Firefox 13 compat docs, so please correct me if I'm wrong: nsIScriptableUnescapeHTML can still be used, but it should be considered deprecated in favor of nsIParserUtils. Is this right?

Henri Sivonen (:hsivonen)

Assignee

Comment 56

•

13 years ago

(In reply to Jorge Villalobos [:jorgev] from comment #55) > I'm writing the Firefox 13 compat docs, so please correct me if I'm wrong: > nsIScriptableUnescapeHTML can still be used, but it should be considered > deprecated in favor of nsIParserUtils. Is this right? That's correct. Keeping nsIScriptableUnescapeHTML around is cheap, so there aren't any plans to remove is at present. However, I would encourage extension developers to use the new APIs in order to pass flags that are appropriate for their use cases. The flags implied by the old API may not be the most appropriate ones considering the use cases.

Ben Bucksch (:BenB)

Updated

•

11 years ago

Attachment #599070 - Flags: review?(akkzilla)

Ben Bucksch (:BenB)

Updated

•

11 years ago

Attachment #599068 - Flags: review?(akkzilla)

You need to log in before you can comment on or make changes to this bug.