The default bug view has changed. See this FAQ.

Implement HTML to plain text conversion as a DOM walker without nsParser dependency

RESOLVED FIXED in mozilla13

Status

()

Core
Serializers
RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

(Depends on: 1 bug, Blocks: 3 bugs, {addon-compat, dev-doc-needed})

Trunk
mozilla13
addon-compat, dev-doc-needed
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments, 9 obsolete attachments)

20.40 KB, patch
smaug
: review+
Details | Diff | Splinter Review
45.44 KB, patch
smaug
: review+
Details | Diff | Splinter Review
4.11 KB, patch
smaug
: review+
Details | Diff | Splinter Review
47.43 KB, patch
smaug
: review+
Details | Diff | Splinter Review
25.23 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.79 KB, patch
smaug
: review+
Details | Diff | Splinter Review
12.12 KB, patch
hsivonen
: review+
Details | Diff | Splinter Review
(Assignee)

Description

6 years ago
It appears that currently clipboard-related conversions from HTML to plain text use http://mxr.mozilla.org/mozilla-central/source/widget/src/xpwidgets/nsHTMLFormatConverter.cpp#276 which uses the old parser and relies on nsPlainTextSerializer being an nsIHTMLContentSink.

The old parser and the old content sink interfaces are going to go away. These conversions should use code that uses the new HTML parser to go from HTML source into a DOM if necessary and then serialization code that takes a DOM (sub)tree and outputs plain text.
(Assignee)

Updated

6 years ago
Blocks: 650787

Comment 1

6 years ago
Please note the warnings at bug 650787 comment 1, 10 and 11.

Updated

5 years ago
Blocks: 282244
(Assignee)

Comment 2

5 years ago
Created attachment 590186 [details] [diff] [review]
WIP

This patch moves conversion to plain text to behind a static method in nsContentUtils instead of the XPCOM stuff we had before. What should I do about the TestHarness.h-based test can't call nsContentUtils?
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #590186 - Flags: feedback?(bugs)

Comment 3

5 years ago
Comment on attachment 590186 [details] [diff] [review]
WIP

Henri, the converters are used in many places, also in Thunderbird and various in extensions. They must be exposed by XPCOM. Where similar functionality had not been properly exposed to JavaScript, it posed a huge problem for many parties, I've often heard people desperately needing it but having big problems to get to it from JS. If anything, it must be more accessible from XPCOM, not less.

This is really important functionality that must be available to all components, like spell-checker, gloda, addon-manager wanting to display a HTML snipplet as plaintext, or anything else that needs to convert HTML to plaintext.
Attachment #590186 - Flags: feedback-

Comment 4

5 years ago
Comment on attachment 590186 [details] [diff] [review]
WIP

As one of the authors of this component, I hope I don't overstep my authority by doing review- on this.
Attachment #590186 - Flags: review-
(Assignee)

Comment 5

5 years ago
(In reply to Ben Bucksch (:BenB) from comment #3)
> Henri, the converters are used in many places, also in Thunderbird 

I'm planning to change Thunderbird call sites.

> They must be exposed by XPCOM.

XPCOM-exposed content sinks are going away and the old parser that can feed into them is going away. The old way of using this stuff isn't going to stay around.

However, I'd be OK with wrapping the new nsContentUtils method in something that can be called through XPCOM.

Would it work for you if nsScriptableUnescapeHTML (exposed via XPCOM) allowed the caller to pass flags and the column wrap?

Comment 6

5 years ago
Comment on attachment 590186 [details] [diff] [review]
WIP

Yes, any way for JS callers to pass HTML through it and get plaintext using this class/implementation, and passing flags to it to control it, would be fine with me. Thanks!
Revoking my objection.
Attachment #590186 - Flags: review-
Attachment #590186 - Flags: feedback-
(Assignee)

Updated

5 years ago
Keywords: addon-compat
(Assignee)

Updated

5 years ago
Attachment #590186 - Flags: feedback?(bugs)
(Assignee)

Comment 7

5 years ago
Created attachment 590721 [details] [diff] [review]
Part 1: Remove the dependency on nsParser

This patch exposes all functionality via XPCOM.
Attachment #590186 - Attachment is obsolete: true
(Assignee)

Comment 8

5 years ago
Created attachment 590722 [details] [diff] [review]
Part 2: Remove dependency on nsIParserNode and nsIParserService
(Assignee)

Updated

5 years ago
Blocks: 720623
(Assignee)

Comment 9

5 years ago
Created attachment 591430 [details] [diff] [review]
Part 1: Remove the dependency on nsParser, v2
Attachment #590721 - Attachment is obsolete: true
(Assignee)

Comment 10

5 years ago
Created attachment 591432 [details] [diff] [review]
Part 2: Remove dependency on nsIParserNode and nsIParserService, v2
Attachment #590722 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Depends on: 721313
(Assignee)

Comment 11

5 years ago
Created attachment 591719 [details] [diff] [review]
Part 1: Introduce the new API
Attachment #591430 - Attachment is obsolete: true
(Assignee)

Comment 12

5 years ago
Created attachment 591720 [details] [diff] [review]
Part 2: Make nsPlainText serializer not be a content sink
(Assignee)

Comment 13

5 years ago
Created attachment 591722 [details] [diff] [review]
Part 3: Remove dependency on nsIParserNode and nsIParserService, v3

Some bits in the last part would logically belong in the middle part, but the last two parts will land together anyway.
Attachment #591432 - Attachment is obsolete: true
(Assignee)

Comment 14

5 years ago
Dev doc relevance:
This change affects extension and XULRunner documentation. (Not Web platform documentation.)

nsIHTMLToTextSink is removed. nsIScriptableHTMLParser is introduced. (Instantiated with NS_SCRIPTABLEHTMLPARSER_CONTRACTID / "@mozilla.org/scriptablehtmlparser;1".)
Keywords: dev-doc-needed
(Assignee)

Updated

5 years ago
Attachment #591719 - Flags: review?(bugs)
(Assignee)

Updated

5 years ago
Attachment #591720 - Flags: review?(bugs)
(Assignee)

Updated

5 years ago
Attachment #591722 - Flags: review?(bugs)
(Assignee)

Comment 15

5 years ago
Should I move nsScriptableUnescape.h/cpp to some place considered part of Core? I left it where it was in order to avoid breaking history.
(Assignee)

Comment 16

5 years ago
Created attachment 591755 [details] [diff] [review]
Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core
[Checked in: Comment 47]
Attachment #591755 - Flags: review?(bugs)
You should use the new license.
Comment on attachment 591719 [details] [diff] [review]
Part 1: Introduce the new API

>+[scriptable, uuid(290f49bb-0619-4bda-8006-ab31bec7231a)]
>+interface nsIScriptableHTMLParser : nsISupports
Kind of strange name.
Would nsIHTMLToTextConverter be ok?
It could then have convert() method

Also, the new interface should have some tests.
(perhaps they are in some other patch for this bug)
Attachment #591719 - Flags: review?(bugs) → review-
Comment on attachment 591755 [details] [diff] [review]
Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core
[Checked in: Comment 47]

This needs some minor changes if scriptableHTMLParser is renamed
Attachment #591755 - Flags: review?(bugs) → review+
Comment on attachment 591720 [details] [diff] [review]
Part 2: Make nsPlainText serializer not be a content sink

Horrible old code
Attachment #591720 - Flags: review?(bugs) → review+
Comment on attachment 591722 [details] [diff] [review]
Part 3: Remove dependency on nsIParserNode and nsIParserService, v3


>+nsContentUtils::IsHTMLBlock(nsIAtom* aLocalName)
>+{
>+  return (aLocalName == nsGkAtoms::address) ||
>+    (aLocalName == nsGkAtoms::article) ||
Might look nicer if (aLocalName == nsGkAtoms::address) ||
was in its own line


>-      rv = DoAddLeaf(nsnull,
>-                     eHTMLTag_text,
>-                     Substring(textstr, start, length-start));
>+      DoAddText(false, Substring(textstr, start, length-start));
space before and after -


>+nsPlainTextSerializer::DoAddText(bool aIsLineBreak, const nsAString& aText)
...
>+  if ((mTagStackIndex > 1 &&
>+       mTagStack[mTagStackIndex-2] == nsGkAtoms::select) ||
>+      (mTagStackIndex > 0 &&
>+        mTagStack[mTagStackIndex-1] == nsGkAtoms::select)) {
>+    // Don't output the contents of SELECT elements;
>+    // Might be nice, eventually, to output just the selected element.
>+    // Read more in bug 31994.
>+    return;
>+  }
>+
>+  if (mTagStackIndex > 0 &&
>+      (mTagStack[mTagStackIndex-1] == nsGkAtoms::script ||
>+       mTagStack[mTagStackIndex-1] == nsGkAtoms::style)) {
>+    // Don't output the contents of <script> or <style> tags;
>+    return;
>+  }
So this code is now in two places.
Could you add some helper method for it.
Attachment #591722 - Flags: review?(bugs) → review-
(Assignee)

Comment 22

5 years ago
(In reply to Olli Pettay [:smaug] from comment #18)
> >+interface nsIScriptableHTMLParser : nsISupports
> Kind of strange name.
> Would nsIHTMLToTextConverter be ok?
> It could then have convert() method

This isn't really a plain text converter interface. It's an interface for exposing entry points to the HTML parser to scripts and to outside libxul. These methods are logically like static methods, but they are on an interface, because that's how XPCOM does stuff.

Considering what's now in the pipeline, bug 650776 will add more stuff to this interface.

Maybe putting "HTML" in the interface name isn't great idea, since folding the functionality of nsIScriptableUnescapeHTML would already introduce an entry point to the XML parser. But, OTOH, it doesn't matter that much if a primarily HTML parser interface has some XML entry points, too.

Are you OK with introducing an XPCOM service with a bunch of methods to be called by code that can't call nsContentUtils directly? I'd rather not add a bunch of one-off interfaces like nsIHTMLToTextConverter now that we don't guarantee frozen vtables anyway. How would you name such a service? nsIParserService is taken and I thought I'd keep new stuff on its way in and legacy stuff on its way out separate.

> Also, the new interface should have some tests.
> (perhaps they are in some other patch for this bug)

The patch changes content/base/test/TestPlainTextSerializer.cpp to call the new interface. The serializer functionality behind the interface hasn't expanded, so retargeting the existing test coverage to the new entry point seemed appropriate.
(In reply to Henri Sivonen (:hsivonen) from comment #22)
> Are you OK with introducing an XPCOM service with a bunch of methods to be
> called by code that can't call nsContentUtils directly?
That sounds good. We even used to have nsIContentUtils.
The name was terrible though.

 I'd rather not add a
> bunch of one-off interfaces like nsIHTMLToTextConverter now that we don't
> guarantee frozen vtables anyway. How would you name such a service?
> nsIParserService is taken and I thought I'd keep new stuff on its way in and
> legacy stuff on its way out separate.
nsIParserUtils? or nsIParsingUtils


> The patch changes content/base/test/TestPlainTextSerializer.cpp to call the
> new interface. The serializer functionality behind the interface hasn't
> expanded, so retargeting the existing test coverage to the new entry point
> seemed appropriate.
Just test that the new things works also in JS (since you're making the interface scriptable.)
(Assignee)

Comment 24

5 years ago
(In reply to Olli Pettay [:smaug] from comment #23)
> nsIParserUtils?

We already have nsParserUtils that's pretty much unrelated to parsing. OK if I rename that one to nsAttributeUtils to get it out of the way?
Assignee: hsivonen → nobody
Component: Serializers → Selection
QA Contact: dom-to-text → selection

Updated

5 years ago
Component: Selection → Serializers
QA Contact: selection → dom-to-text
What about merging the current nsParserUtils to nsContentUtils, and then 
use nIParserUtils for this new stuff.
Component: Serializers → Selection
QA Contact: dom-to-text → selection
Ahem, can we stop moving this bug into Core::Selection?
Component: Selection → Serializers
QA Contact: selection → dom-to-text
Assignee: nobody → hsivonen

Comment 27

5 years ago
Before you move components around: "Utils" is a pretty bad name anyway. I'd recommend something with "convert" or similar in the name.
(Assignee)

Comment 28

5 years ago
(In reply to Ms2ger from comment #26)
> Ahem, can we stop moving this bug into Core::Selection?

That happens due to some interaction between lazy-loading tabs and form state restoration. The changes are accidental and involuntary.

(In reply to Olli Pettay [:smaug] from comment #25)
> What about merging the current nsParserUtils to nsContentUtils, and then 
> use nIParserUtils for this new stuff.

Sold!

(In reply to Ben Bucksch (:BenB) from comment #27)
> Before you move components around: "Utils" is a pretty bad name anyway. I'd
> recommend something with "convert" or similar in the name.

Naming is a total bikeshed and I already spent time renaming stuff to nsIParserUtils. Also, I expect the interface to gain non-conversion methods in due course.

Comment 29

5 years ago
> Naming is a total bikeshed

Yes, true, but naming is also the reason why nobody used nsIFormatConverter (bug 720623): nobody found it, nobody knew it. For me, "utils" means internal, minor stuff that is not very interesting. I would *never* look there, if I was searching for a HTML->TXT converter.

But whatever, we'll see.
(Assignee)

Comment 30

5 years ago
Created attachment 598845 [details] [diff] [review]
Part 0: Fold old nsParserUtils into nsContentUtils

While at it, I removed a duplicated instance of the PI pseudo attribute parsing code and named the methods in a way that's more appropriate in terms of xml-stylesheet terminology.
Attachment #598845 - Flags: review?(bugs)
I assume the change in nsHtml5TreeOpExecutor.h wasn't meant for this patch?
(Assignee)

Updated

5 years ago
Attachment #598845 - Attachment is obsolete: true
Attachment #598845 - Flags: review?(bugs)
(Assignee)

Comment 32

5 years ago
Created attachment 599058 [details] [diff] [review]
Part 0: Fold old nsParserUtils into nsContentUtils, v2
[Checked in: Comment 47]

> I assume the change in nsHtml5TreeOpExecutor.h wasn't meant for this patch?

Correct. Thanks.
Attachment #599058 - Flags: review?(bugs)
(Assignee)

Comment 33

5 years ago
Created attachment 599067 [details] [diff] [review]
Part 4: Scripted test case
[Checked in: Comment 47]
(Assignee)

Comment 34

5 years ago
Created attachment 599068 [details] [diff] [review]
Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
[Checked in: Comment 54]
Attachment #591722 - Attachment is obsolete: true
(Assignee)

Comment 35

5 years ago
Created attachment 599070 [details] [diff] [review]
Part 1: Introduce the new API, v2
[Checked in: Comment 47]
Attachment #591719 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Blocks: 729044
Comment on attachment 599058 [details] [diff] [review]
Part 0: Fold old nsParserUtils into nsContentUtils, v2
[Checked in: Comment 47]


>+nsContentUtils::IsJavaScriptLanguage(const nsString& aName, PRUint32 *aFlags)
>+{
>+  JSVersion version = JSVERSION_UNKNOWN;
>+
>+  if (aName.LowerCaseEqualsLiteral("javascript") ||
>+      aName.LowerCaseEqualsLiteral("livescript") ||
>+      aName.LowerCaseEqualsLiteral("mocha")) {
>+    version = JSVERSION_DEFAULT;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.0")) {
>+    version = JSVERSION_1_0;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.1")) {
>+    version = JSVERSION_1_1;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.2")) {
>+    version = JSVERSION_1_2;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.3")) {
>+    version = JSVERSION_1_3;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.4")) {
>+    version = JSVERSION_1_4;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.5")) {
>+    version = JSVERSION_1_5;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.6")) {
>+    version = JSVERSION_1_6;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.7")) {
>+    version = JSVERSION_1_7;
>+  }
>+  else if (aName.LowerCaseEqualsLiteral("javascript1.8")) {
>+    version = JSVERSION_1_8;
>+  }
>+  if (version == JSVERSION_UNKNOWN)
>+    return false;

I know, you're just moving code, but could you fix the coding style
if (expr) {
  stmt;
}
and
} else if (expr) {
Attachment #599058 - Flags: review?(bugs) → review+
(Assignee)

Updated

5 years ago
Attachment #599067 - Flags: review?(bugs)
(Assignee)

Updated

5 years ago
Attachment #599068 - Flags: review?(bugs)
(Assignee)

Updated

5 years ago
Attachment #599070 - Flags: review?(bugs)
(Assignee)

Comment 37

5 years ago
Bah. Now that TBPL is back, it shows failures where there were none locally. :-(

Comment 38

5 years ago
Comment on attachment 599068 [details] [diff] [review]
Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
[Checked in: Comment 54]

Ok, these are far-reaching changes. PlainTextSerializer has
* so many use cases (format=flowed, rich plaintext and copy&paste)
* delicate edge cases (various block elements, vertical and horizontal white space etc.) - basically similar to the HTML Editor rules
* was created a decade before we had a useable testsuite and
* is highly prone to regressions

This has to be reviewed by somebody who has been involved in its creation (akk, Daniel or me, but not even I feel comfortable reviewing this, far too many changes).
Attachment #599068 - Flags: review?(akkzilla)

Updated

5 years ago
Attachment #599070 - Flags: review?(akkzilla)

Comment 39

5 years ago
Thunderbird is highly dependent on this class.

One practical way to test is to run a large number of both HTML documents and HTML emails through the old version and the new version and see whether the result matches.

Only exception is bug 314213: We need an empty line after <blockquote type=site>, but it's currently not there. You'd be highly welcome to fix this :). Please mind nested quotes, too.
(Assignee)

Comment 40

5 years ago
Created attachment 599488 [details] [diff] [review]
Part 0.5: Let the build system know that parser/html/ can have .idl files
[Checked in: Comment 47]

This patch fixes the tryserver issues mentioned in comment 37.
Attachment #599488 - Flags: review?(bugs)
(Assignee)

Comment 41

5 years ago
(In reply to Ben Bucksch (:BenB) from comment #38)
> Comment on attachment 599068 [details] [diff] [review]
> Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
> 
> Ok, these are far-reaching changes.

I think you are exaggerating. Part 3 does a pretty straight-forward substitution of old HTML tag enum values with corresponding nsGkAtoms. The most complex-looking changes arise, because in the new version, text nodes don't have a tag id enum value.

nsContentUtils::IsHTMLVoid is trivial and easy to verify to be correct.

nsContentUtils::IsHTMLBlock is derived by reading what the old code did. It should be possible to verify its equivalence to the old code by looking at the old and new code carefully. I welcome you to look at the old and new code carefully to check that I read the old code correctly.

(In reply to Ben Bucksch (:BenB) from comment #39)
> Thunderbird is highly dependent on this class.

Does Thunderbird accordingly have unit tests for this?

> One practical way to test is to run a large number of both HTML documents
> and HTML emails through the old version and the new version and see whether
> the result matches.

Since you seem to have an idea of what cases are interesting, it would be helpful if you could supply test cases (as HTML files--not in email wrappers).
Comment on attachment 599488 [details] [diff] [review]
Part 0.5: Let the build system know that parser/html/ can have .idl files
[Checked in: Comment 47]

rs=me
Attachment #599488 - Flags: review?(bugs) → review+
Comment on attachment 599070 [details] [diff] [review]
Part 1: Introduce the new API, v2
[Checked in: Comment 47]


>+  /**
>+   * Converts HTML source to plain text by parsing the source and using the
>+   * plain text serializer on the resulting tree.
>+   *
>+   * @param aSourceBuffer the string to parse as an HTML document
>+   * @param aResultBuffer the string where the plain text result appears;
>+   *                      may be the same string as aSourceBuffer
>+   * @param aFlags Flags from nsIDocumentEncoder.
>+   * @param aWrapCol Number of columns after which to line wrap
0 prevents wrapping, right? If so, could you please document it.
Attachment #599070 - Flags: review?(bugs) → review+
Comment on attachment 599068 [details] [diff] [review]
Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
[Checked in: Comment 54]

>+nsContentUtils::IsHTMLBlock(nsIAtom* aLocalName)
>+{
>+  return
>+    (aLocalName == nsGkAtoms::address) ||
>+    (aLocalName == nsGkAtoms::article) ||
>+    (aLocalName == nsGkAtoms::aside) ||
>+    (aLocalName == nsGkAtoms::blockquote) ||
>+    (aLocalName == nsGkAtoms::center) ||
>+    (aLocalName == nsGkAtoms::dir) ||
>+    (aLocalName == nsGkAtoms::div) ||
>+    (aLocalName == nsGkAtoms::dl) || // XXX why not dt and dd?
>+    (aLocalName == nsGkAtoms::fieldset) ||
>+    (aLocalName == nsGkAtoms::figure) || // XXX shouldn't figcaption be on this list
>+    (aLocalName == nsGkAtoms::footer) ||
>+    (aLocalName == nsGkAtoms::form) ||
>+    (aLocalName == nsGkAtoms::h1) ||
>+    (aLocalName == nsGkAtoms::h2) ||
>+    (aLocalName == nsGkAtoms::h3) ||
>+    (aLocalName == nsGkAtoms::h4) ||
>+    (aLocalName == nsGkAtoms::h5) ||
>+    (aLocalName == nsGkAtoms::h6) ||
>+    (aLocalName == nsGkAtoms::header) ||
>+    (aLocalName == nsGkAtoms::hgroup) ||
>+    (aLocalName == nsGkAtoms::hr) ||
>+    (aLocalName == nsGkAtoms::li) ||
>+    (aLocalName == nsGkAtoms::listing) ||
>+    (aLocalName == nsGkAtoms::menu) ||
>+    (aLocalName == nsGkAtoms::multicol) || // XXX get rid of this one?
>+    (aLocalName == nsGkAtoms::nav) ||
>+    (aLocalName == nsGkAtoms::ol) ||
>+    (aLocalName == nsGkAtoms::p) ||
>+    (aLocalName == nsGkAtoms::pre) ||
>+    (aLocalName == nsGkAtoms::section) ||
>+    (aLocalName == nsGkAtoms::table) ||
>+    (aLocalName == nsGkAtoms::ul) ||
>+    (aLocalName == nsGkAtoms::xmp);
>+}
Please file followups for the XXX

>-        if(NS_SUCCEEDED(GetAttributeValue(aNode, nsGkAtoms::start, startAttr))){
>+        if(NS_SUCCEEDED(GetAttributeValue(nsGkAtoms::start, startAttr))){
space after 'if' and before '{'

>-  else if (IsBlockLevel(aTag)
>-           && type != eHTMLTag_script
>-           && type != eHTMLTag_doctypeDecl
>-           && type != eHTMLTag_markupDecl) {
>+  else if (nsContentUtils::IsHTMLBlock(aTag)
>+           && aTag != nsGkAtoms::script) {
Hmm, where do we handle doctype serialization?


>+    nsAutoString aTagAttr;
Badly named variable. aFoo is for parameters.


> nsPlainTextSerializer::IsInPre()
> {
>   PRInt32 i = mTagStackIndex;
>   while(i > 0) {
>-    if(mTagStack[i-1] == eHTMLTag_pre)
>+    if(mTagStack[i-1] == nsGkAtoms::pre)
>       return true;
>-    if(IsBlockLevel(mTagStack[i-1])) {
>+    if(nsContentUtils::IsHTMLBlock(mTagStack[i-1])) {
i - 1, and space after 'if'

Tricky to review.
Attachment #599068 - Flags: review?(bugs) → review+
Comment on attachment 599067 [details] [diff] [review]
Part 4: Scripted test case
[Checked in: Comment 47]

Could you put the test to content/base/test/chrome
Attachment #599067 - Flags: review?(bugs) → review+
(Assignee)

Comment 46

5 years ago
(In reply to Olli Pettay [:smaug] from comment #36)
> Comment on attachment 599058 [details] [diff] [review]
> Part 0: Fold old nsParserUtils into nsContentUtils, v2
...
> I know, you're just moving code, but could you fix the coding style
> if (expr) {
>   stmt;
> }
> and
> } else if (expr) {

Fixed.

(In reply to Olli Pettay [:smaug] from comment #43)
> Comment on attachment 599070 [details] [diff] [review]
> Part 1: Introduce the new API, v2
...
> 0 prevents wrapping, right? If so, could you please document it.

Documented.

(In reply to Olli Pettay [:smaug] from comment #45)
> Comment on attachment 599067 [details] [diff] [review]
> Part 4: Scripted test case
> 
> Could you put the test to content/base/test/chrome

Moved there.

Thanks.

Landed parts other than 2 and 3:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ccebf6862a9
https://hg.mozilla.org/integration/mozilla-inbound/rev/d40ba365b995
https://hg.mozilla.org/integration/mozilla-inbound/rev/429263e58090
https://hg.mozilla.org/integration/mozilla-inbound/rev/a423ca3c3ce1
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ea9dc2f8570

comm-central needs a patch so that it starts using the API introduced in part 1 before parts 2 and 3 can land without burning c-c.
Whiteboard: [please leave open after inbound merge; parts 2 and 3 need to land after comm-central changes]
https://hg.mozilla.org/mozilla-central/rev/5ccebf6862a9
https://hg.mozilla.org/mozilla-central/rev/d40ba365b995
https://hg.mozilla.org/mozilla-central/rev/429263e58090
https://hg.mozilla.org/mozilla-central/rev/a423ca3c3ce1
https://hg.mozilla.org/mozilla-central/rev/8ea9dc2f8570
Whiteboard: [please leave open after inbound merge; parts 2 and 3 need to land after comm-central changes]
Attachment #599058 - Attachment description: Part 0: Fold old nsParserUtils into nsContentUtils, v2 → Part 0: Fold old nsParserUtils into nsContentUtils, v2 [Checked in: Comment 47]
Attachment #599070 - Attachment description: Part 1: Introduce the new API, v2 → Part 1: Introduce the new API, v2 [Checked in: Comment 47]
Attachment #591755 - Attachment description: Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core → Part 1.5: Move nsScriptableUnescapeHTML from Toolkit to Core [Checked in: Comment 47]
Attachment #599067 - Attachment description: Part 4: Scripted test case → Part 4: Scripted test case [Checked in: Comment 47]
Attachment #599488 - Attachment description: Part 0.5: Let the build system know that parser/html/ can have .idl files → Part 0.5: Let the build system know that parser/html/ can have .idl files [Checked in: Comment 47]
Blocks: 731668

Updated

5 years ago
Depends on: 731896
(In reply to Henri Sivonen (:hsivonen) from comment #46)
> comm-central needs a patch so that it starts using the API introduced in
> part 1 before parts 2 and 3 can land without burning c-c.

Ftr, you filed that as bug 650787.

Comment 49

5 years ago
Comment on attachment 599068 [details] [diff] [review]
Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
[Checked in: Comment 54]

has this bit-rotted, or does it rely on some other patch to be applied first? It doesn't apply cleanly for me.

Comment 50

5 years ago
it seems the current landing has broken feeds use of nsIScriptableUnescapeHTML in Tb trunk.  Bug 732070.
Blocks: 732070
(Assignee)

Comment 51

5 years ago
(In reply to David :Bienvenu from comment #49)
> Comment on attachment 599068 [details] [diff] [review]
> Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
> 
> has this bit-rotted, 

No, AFAICT.

> or does it rely on some other patch to be applied
> first? It doesn't apply cleanly for me.

It requires part 2 (attachment 591720 [details] [diff] [review]).
(Assignee)

Comment 52

5 years ago
Created attachment 603648 [details] [diff] [review]
Part 2: Make nsPlainText serializer not be a content sink, unrotted
[Checked in: Comment 54]
Attachment #591720 - Attachment is obsolete: true
Attachment #603648 - Flags: review+
Depends on: 733935
(Assignee)

Comment 53

5 years ago
(In reply to Olli Pettay [:smaug] from comment #44)
> Comment on attachment 599068 [details] [diff] [review]
> Part 3: Remove dependency on nsIParserNode and nsIParserService, v4
> Please file followups for the XXX

Bug 734059, bug 734060 and bug 734063.

> >-        if(NS_SUCCEEDED(GetAttributeValue(aNode, nsGkAtoms::start, startAttr))){
> >+        if(NS_SUCCEEDED(GetAttributeValue(nsGkAtoms::start, startAttr))){
> space after 'if' and before '{'

Fixed throughout the file.

> >-  else if (IsBlockLevel(aTag)
> >-           && type != eHTMLTag_script
> >-           && type != eHTMLTag_doctypeDecl
> >-           && type != eHTMLTag_markupDecl) {
> >+  else if (nsContentUtils::IsHTMLBlock(aTag)
> >+           && aTag != nsGkAtoms::script) {
> Hmm, where do we handle doctype serialization?

It's a no-op in the .h file.

> >+    nsAutoString aTagAttr;
> Badly named variable. aFoo is for parameters.

Fixed.

> > nsPlainTextSerializer::IsInPre()
> > {
> >   PRInt32 i = mTagStackIndex;
> >   while(i > 0) {
> >-    if(mTagStack[i-1] == eHTMLTag_pre)
> >+    if(mTagStack[i-1] == nsGkAtoms::pre)
> >       return true;
> >-    if(IsBlockLevel(mTagStack[i-1])) {
> >+    if(nsContentUtils::IsHTMLBlock(mTagStack[i-1])) {
> i - 1, and space after 'if'

Fixed.

Thanks.

https://hg.mozilla.org/integration/mozilla-inbound/rev/0617488cb3b9
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c8f60098608
https://hg.mozilla.org/integration/mozilla-inbound/rev/a674b76d8d35
https://hg.mozilla.org/mozilla-central/rev/0617488cb3b9
https://hg.mozilla.org/mozilla-central/rev/9c8f60098608
https://hg.mozilla.org/mozilla-central/rev/a674b76d8d35
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Attachment #603648 - Attachment description: Part 2: Make nsPlainText serializer not be a content sink, unrotted → Part 2: Make nsPlainText serializer not be a content sink, unrotted [Checked in: Comment 54]
Attachment #599068 - Attachment description: Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 → Part 3: Remove dependency on nsIParserNode and nsIParserService, v4 [Checked in: Comment 54]
Flags: in-testsuite+
I'm writing the Firefox 13 compat docs, so please correct me if I'm wrong: nsIScriptableUnescapeHTML can still be used, but it should be considered deprecated in favor of nsIParserUtils. Is this right?
(Assignee)

Comment 56

5 years ago
(In reply to Jorge Villalobos [:jorgev] from comment #55)
> I'm writing the Firefox 13 compat docs, so please correct me if I'm wrong:
> nsIScriptableUnescapeHTML can still be used, but it should be considered
> deprecated in favor of nsIParserUtils. Is this right?

That's correct. Keeping nsIScriptableUnescapeHTML around is cheap, so there aren't any plans to remove is at present.  However, I would encourage extension developers to use the new APIs in order to pass flags that are appropriate for their use cases.  The flags implied by the old API may not be the most appropriate ones considering the use cases.

Updated

2 years ago
Attachment #599070 - Flags: review?(akkzilla)

Updated

2 years ago
Attachment #599068 - Flags: review?(akkzilla)
You need to log in before you can comment on or make changes to this bug.