Scriptable XPCOM wrapper for SAX content handler

RESOLVED FIXED

Status

()

Core
XML
--
enhancement
RESOLVED FIXED
12 years ago
11 years ago

People

(Reporter: Matthew Gertner, Assigned: Robert Sayre)

Tracking

(Depends on: 1 bug, {fixed1.8.1})

Trunk
fixed1.8.1
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 25 obsolete attachments)

962 bytes, text/plain
Details
798 bytes, patch
peterv
: review+
peterv
: superreview+
Details | Diff | Splinter Review
105.29 KB, patch
Details | Diff | Splinter Review
105.79 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Here's a proposal for a scriptable SAX content handler (requested by bsmedberg). I've followed the 80/20 principle so I didn't include methods for skipped entities and document locators. But I can add these if people think they're important.

Reproducible: Always
(Reporter)

Comment 1

12 years ago
Created attachment 202482 [details]
IDL for SAX content handler
(Reporter)

Comment 2

12 years ago
Created attachment 202485 [details]
Fixed prefix and tabs

I accidently used our internal prefix "ap" for the interface. I've changed this to "ns". Also, the tabs were a little messed up, so I changed them to spaces.
why ACString instead of AUTF8String? all those can contain all unicode characters, can't they?
(Reporter)

Comment 4

12 years ago
AFAIK, XPConnect does horrible things to UTF-8 strings if they are declared as ACString. The conversion is performed correctly if they are declared as AUTF8String. The two are identical in C++, of course.
exactly, so why are you using ACString here:
    void onStartElement(in ACString uri, in ACString localName, in nsIPropertyBag attributes);
(Reporter)

Comment 6

12 years ago
Okay, I see what you're getting at. The tag name should be AUTF8String in case someone is brave (foolish?) enough to use internationalized tag names. The URI, of course, may not contain non-ASCII characters.

But actually I should probably have used AString everywhere to be consistent with the existing DOM and XML Schema interfaces. The downside is that in reality most data sources seem to use UTF-8 (my subjective experience), so this could imply needless conversion in some cases. That's why we use UTF-8.

Opinions?
The data coming out of expat is UTF16, we should use AString.
Status: UNCONFIRMED → NEW
Ever confirmed: true
> The URI, of course, may not contain non-ASCII characters.

why do you say that? consider IDN... or more generally, IRIs.
(Reporter)

Comment 9

12 years ago
Created attachment 202521 [details]
Changed all string parameters to UTF-16
(Reporter)

Updated

12 years ago
Attachment #202482 - Attachment is obsolete: true
(Reporter)

Updated

12 years ago
Attachment #202485 - Attachment is obsolete: true
(Reporter)

Comment 10

12 years ago
(In reply to comment #8)
> > The URI, of course, may not contain non-ASCII characters.
> 
> why do you say that? consider IDN... or more generally, IRIs.

Well, the XML namespace spec references RFC2396 (i.e. standard, non-internationalized URIs). In any case, I changed all the parameters to AStrings, so the question is moot now.
(In reply to comment #10)
> Well, the XML namespace spec references RFC2396

http://www.w3.org/TR/xml-names11/#sec-namespaces
(Reporter)

Comment 12

12 years ago
(In reply to comment #11)
> http://www.w3.org/TR/xml-names11/#sec-namespaces

I was looking at the original namespace spec, which strangely doesn't reference the 1.1 spec (that I can see). Thanks for point that out.
(Assignee)

Comment 13

12 years ago
(In reply to comment #12)
> (In reply to comment #11)
> > http://www.w3.org/TR/xml-names11/#sec-namespaces
> 
> I was looking at the original namespace spec, which strangely doesn't reference
> the 1.1 spec (that I can see). Thanks for point that out.

I don't think Expat supports XML 1.1 or XML Namespaces 1.1. I suggest sticking with XML 1.0 and XML Namespaces 1.0 for the first cut.
I think we should design interfaces in a way that they don't need constant changes.
(Reporter)

Comment 15

12 years ago
Since the interface now uses AString, the issue of whether to support internationalized URIs is no longer relevant at the IDL level. It might be worth documenting in the interface that only the Namespace 1.0 spec is supported for now, and this doubtless applies to many other interfaces as well. In any case, I agree with Christian that the interface should be designed if possible so as to avoid future changes.
(Assignee)

Comment 16

12 years ago
Created attachment 206569 [details] [diff] [review]
Full SAX2 interface with headers and stub implementations

Comment 17

12 years ago
If this gets resolved, bug 15090 should probably get duped to this.
(Assignee)

Comment 18

12 years ago
Created attachment 209213 [details] [diff] [review]
partial implementation

just saving my work
Attachment #206569 - Attachment is obsolete: true
(Assignee)

Comment 19

12 years ago
SAX Conformance Tests from ERH:
http://www.cafeconleche.org/SAXTest/

(Assignee)

Comment 20

12 years ago
Created attachment 209293 [details] [diff] [review]
ContentHandler working fully

bsmedberg,

Could you do a sanity check on this? It's still in-progress, so you don't need to get real detailed.
Attachment #209213 - Attachment is obsolete: true
Attachment #209293 - Flags: review?(benjamin)
(Assignee)

Comment 21

12 years ago
Created attachment 209294 [details]
Sample JS usage
(Assignee)

Comment 22

12 years ago
Created attachment 209300 [details] [diff] [review]
updated patch

The previous patch had an unused IDL file in it
Attachment #209293 - Attachment is obsolete: true
Attachment #209300 - Flags: review?(benjamin)
Attachment #209293 - Flags: review?(benjamin)
Comment on attachment 209300 [details] [diff] [review]
updated patch

in sax/src/Makefile.in, you set EXPORT_LIBRARY and LIBXUL_LIBRARY... I'd like you to avoid both of those for the time being. Use MOZILLA_INTERNAL_API if necessary.

nsSAXAttributes::mAttrs  is messy... the fact that you have five(?) strings per attribute is not clear from the header. Why not use

struct SAXAttr
{
  nsCString uri;
  nsCString localName;
  nsCString qName;

  // what is this, does it need to be a string?
  nsCString type;

  nsCString value;
};

nsTArray<SAXAttr> mAttrs;

Why are you making nsSAXAttributes accessible via the module? Isn't that object created by the module and handed out as needed? It seems that the "addAttribute" method should only be used internally, and you should just create the object directly (instead of via createinstance), initialize it using "internal" methods, and hand it out as an immutable object.

What is  ns[I]SAXXMLFilter? Why do we need it if it's going to be unimplemented?

Why aren't we exposing comments or doctype decls? That is important for various kinds editing applications that need to preserve source.

In nsSAXXMLReader::HandleStartElement you create a nsSAXAttributes object which you only use conditionally. Why not enclose the entire method in the (mContentHandler && supports(startElement)) check?

What's wrong with the line/column numbers in nsSAXXMLReader::ReportError?

Please comment nsSAXXMLReader::CheckMethods with #if 0 instead of /**/

After nits picked, please set r?peterv@propagandism.org, who is module owner of XML parsing.
Attachment #209300 - Flags: review?(benjamin) → review-
(Assignee)

Comment 24

12 years ago
(In reply to comment #23) 
> the fact that you have five(?) strings per
> attribute is not clear from the header. Why not use
> 
> struct SAXAttr
> {

Will do.

>   // what is this, does it need to be a string?
>   nsCString type;

We could make this a CString, but every other string has to be AString. 'type' is a DTD feature that's not used much anymore:

'The attribute type is one of the strings "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION" (always in upper case). If the parser has not read a declaration for the attribute, or if the parser does not report attribute types, then it must return the value "CDATA" as stated in the XML 1.0 Recommendation (clause 3.3.3, "Attribute-Value Normalization"). For an enumerated attribute that is not a notation, the parser will report the type as "NMTOKEN".'

> nsTArray<SAXAttr> mAttrs;
> 
> Why are you making nsSAXAttributes accessible via the module? Isn't that 
> object created by the module and handed out as needed? It seems that the
> "addAttribute" method should only be used internally,

Disagree. It's common SAX pattern to chain handlers and create your own Attributes object and do some manipulation before calling the next handler in the chain.

> 
> What is  ns[I]SAXXMLFilter? Why do we need it if it's going to be
> unimplemented?

Well, it's part of the SAX API. It shouldn't take long to implement once the rest of the implementation is ready.
<http://www.saxproject.org/apidoc/org/xml/sax/XMLFilter.html>

> Why aren't we exposing comments or doctype decls? That is important for 
> various kinds editing applications that need to preserve source.

doctype stuff goes to the DTDHandler, but I need to make some small changes to ExpatSink to get the right events. Comments are not part of the SAX interface, but we can add them if they're really necessary. 
 
> In nsSAXXMLReader::HandleStartElement you create a nsSAXAttributes object 
> which you only use conditionally. Why not enclose the entire method in the
> (mContentHandler && supports(startElement)) check?

Will do if I am allowed to expose a few more Expat events in nsIExpatSink. If I'm not, I'll have to synthesize startPrefixMapping there.
 
> What's wrong with the line/column numbers in nsSAXXMLReader::ReportError?

Nothing anymore... I patched it. I'll remove the comment.
> We could make this a CString, but every other string has to be AString. 'type'
> is a DTD feature that's not used much anymore:
> 
> 'The attribute type is one of the strings "CDATA", "ID", "IDREF", "IDREFS",
> "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION" (always in upper
> case). If the parser has not read a declaration for the attribute, or if the
> parser does not report attribute types, then it must return the value "CDATA"

I don't think we should be using strings for this at all... attach constants to the interface and use an PRUint32 enumeration type.

> Disagree. It's common SAX pattern to chain handlers and create your own
> Attributes object and do some manipulation before calling the next handler in
> the chain.

Sure, but is this implementation useful for that purpose? Handler chains can implement the interface in other ways if needed.

> doctype stuff goes to the DTDHandler, but I need to make some small changes to
> ExpatSink to get the right events. Comments are not part of the SAX interface,
> but we can add them if they're really necessary. 

I don't think we should glue ourself to the official SAX APIs, but provide a useful implementation in that model... and comment preservation is very important to a whole class of editing applications.
(Assignee)

Comment 26

12 years ago
(In reply to comment #25)
>
> > Disagree. It's common SAX pattern to chain handlers and create your own
> > Attributes object
> 
> Sure, but is this implementation useful for that purpose?

I think so. In fact, I've never seen anyone use something other than their SAX library's AttributesImpl class.
 
> I don't think we should glue ourself to the official SAX APIs, but provide a
> useful implementation in that model... and comment preservation is very
> important to a whole class of editing applications.

Ah! <http://sax.sourceforge.net/apidoc/org/xml/sax/ext/LexicalHandler.html> 

I'll add a lexical handler interface for this.
(Assignee)

Updated

12 years ago
Depends on: 326477
(Assignee)

Updated

12 years ago
Blocks: 325080
(Assignee)

Updated

12 years ago
Assignee: xml → sayrer
(Assignee)

Updated

12 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 27

12 years ago
Created attachment 212014 [details] [diff] [review]
Updated SAX Interface and implementation

The diff contains the changes for bug 326477 as well. Anything in parser/htmlparser or parser/expat applies only to bug 326477, which this bug depends on.
Attachment #209294 - Attachment is obsolete: true
Attachment #209300 - Attachment is obsolete: true
Attachment #212014 - Flags: review?(peterv)
I'm wondering what all the functions on the nsISAXAttributes interface do.  Shouldn't they be documented?  If what they do is defined by a specification, the interface should link to that specification.

The same applies for most of the other interfaces this patch is adding.
Comment on attachment 212014 [details] [diff] [review]
Updated SAX Interface and implementation

I am reviewing this, but you should also fix what http://beaufour.dk/jst-review/?patch=212014 points out. Also would be nice to have patch without the patch for bug  326477.
(Assignee)

Comment 30

12 years ago
(In reply to comment #29)
> Also would be nice to
> have patch without the patch for bug  326477.
> 

OK, I'll prepare a new one today.
Comment on attachment 212014 [details] [diff] [review]
Updated SAX Interface and implementation

The idl files need way more documentation, both of what the methods/properties do and what the arguments are.

In the C++ implementation files you need to prefix the arguments with 'a' (aUri, ...) and use aResult instead of _retval.

Drop the comments of the type

>+/* long getIndexFromName (in AString uri, in AString localName); */

they don't add anything in terms of documentation.

Lose the empty constructors/destructors, you don't need them.

I'd like to see at least documentation why you deviate from the SAX API sometimes, for example in nsISAXErrorHandler:

>+  void error(in nsISAXLocator locator, in AString error);
>+  void fatalError(in nsISAXLocator locator, in AString error);
>+  void ignorableWarning(in nsISAXLocator locator, in AString error);

Why ignorableWarning? SAX spec uses warning.
Why do these not take a nsISAXParseException? I don't like the SAX API myself, but if we pretend to follow it somewhat we shouldn't deviate like this IMHO.

There's a bunch of methods in nsSAXAttributes that aren't used anywhere:

>+  NS_METHOD Clear();
>+  NS_METHOD RemoveAttribute(PRInt32 index);
>+  NS_METHOD SetAttribute(PRInt32 index, 
>+                         const nsAString &uri,
>+                         const nsAString &localName,
>+                         const nsAString &qName,
>+                         const nsAString &type,
>+                         const nsAString &value);
>+  NS_METHOD SetLocalName(PRInt32 index, const nsAString &localName);
>+  NS_METHOD SetQName(PRInt32 index, const nsAString &qName); 
>+  NS_METHOD SetType(PRInt32 index, const nsAString &type);
>+  NS_METHOD SetURI(PRInt32 index, const nsAString &uri);
>+  NS_METHOD SetValue(PRInt32 index, const nsAString &value);

Why do we have them?

Don't use NS_METHOD for non-interface methods (just nsresult should do).
(Assignee)

Comment 32

12 years ago
(In reply to comment #31)
> 
> I'd like to see at least documentation why you deviate from the SAX API

I also looked at the MSXML COM interface, and decided to match it where possible.

> 
> There's a bunch of methods in nsSAXAttributes that aren't used anywhere: 
> Why do we have them?

Because people will want to use the nsSAXAttributes implementation in their own ContentHandlers and SAXFilters.

(In reply to comment #32)
> (In reply to comment #31)
> > There's a bunch of methods in nsSAXAttributes that aren't used anywhere: 
> > Why do we have them?
> 
> Because people will want to use the nsSAXAttributes implementation in their own
> ContentHandlers and SAXFilters.

How would they do that, since they're not exposed on an interface?
(Assignee)

Comment 34

12 years ago
Created attachment 214024 [details] [diff] [review]
Simulacrum fixes
Attachment #212014 - Attachment is obsolete: true
Attachment #212014 - Flags: review?(peterv)
(In reply to comment #32)
> (In reply to comment #31)
> > 
> > I'd like to see at least documentation why you deviate from the SAX API
> 
> I also looked at the MSXML COM interface, and decided to match it where
> possible.

I'm curious, and unprejudiced either way: what are the advantages and disadvantages of sticking with SAX vs. matching MSXML?

/be
(Assignee)

Comment 36

12 years ago
>
> I'm curious, and unprejudiced either way: what are the advantages and
> disadvantages of sticking with SAX vs. matching MSXML?

Almost all of the differences between the two are around SAXException and InputSource. IMHO, those are Java-isms that don't map well to XPCOM, and many other SAX APIs let you ignore them (e.g. Python doesn't make you use InputSource, you can just pass a file-like).

Also, MSXML SAX is exposed in IE, but you can't write a JScript ContentHandler, AFAIK. Some folks have expressed interest in making this avaialable to web content, so I thought I would look for a match.
(In reply to comment #36)
> Almost all of the differences between the two are around SAXException and
> InputSource. IMHO, those are Java-isms that don't map well to XPCOM, and many
> other SAX APIs let you ignore them (e.g. Python doesn't make you use
> InputSource, you can just pass a file-like).
> 
> Also, MSXML SAX is exposed in IE, but you can't write a JScript ContentHandler,
> AFAIK. Some folks have expressed interest in making this avaialable to web
> content, so I thought I would look for a match.

Sounds good.

/be
(Assignee)

Comment 38

12 years ago
Created attachment 214038 [details] [diff] [review]
more simulacrum nits
Attachment #214024 - Attachment is obsolete: true
(Assignee)

Comment 39

12 years ago
Created attachment 214150 [details] [diff] [review]
now with copious documentation

All of the saxproject.org docs are public domain, so I've adapted them to XPCOM.
Attachment #214038 - Attachment is obsolete: true
(Assignee)

Comment 40

12 years ago
Created attachment 214155 [details] [diff] [review]
nits
Attachment #214150 - Attachment is obsolete: true
+ * <ol>
+ * <li>by attribute index;</li>
+ * <li>by Namespace-qualified name; or</li>
+ * <li>by qualified (prefixed) name.</li>
+ * </ol>

is there no way to avoid this html-centric stuff in comments? at least the <p> in the other paragraph can be removed.
(Assignee)

Comment 42

12 years ago
Created attachment 214275 [details] [diff] [review]
remove <p> tags

> is there no way to avoid this html-centric stuff in comments? at least the 
> <p> in the other paragraph can be removed.

And so it was done. No more <p> tags.
Attachment #214155 - Attachment is obsolete: true
(Assignee)

Comment 43

11 years ago
Created attachment 215012 [details] [diff] [review]
patch w/ mutable attributes interface
Attachment #214275 - Attachment is obsolete: true
Attachment #215012 - Flags: review?(peterv)
(Assignee)

Comment 44

11 years ago
Created attachment 215581 [details] [diff] [review]
whitespace, makefiles, dependencies reduced
Attachment #215012 - Attachment is obsolete: true
Attachment #215581 - Flags: review?(peterv)
Attachment #215012 - Flags: review?(peterv)
Comment on attachment 215581 [details] [diff] [review]
whitespace, makefiles, dependencies reduced

Make sure to prefix your arguments with a (aFoo) in C++.

Don't bother NS_ENSURE_ARG'ing [out] arguments. Can't happen from JS and C++ callers deserve to die if they pass in null.

I don't think we usually style our Doxygen comments with HTML.

Thanks for adding all the docs! They could still be improved a bit, they seem to talk about how SAX should be implemented, not about your implementation (eg it talks about what the SAX driver should do).

>Index: Makefile.in
>===================================================================
>RCS file: /cvsroot/mozilla/Makefile.in,v
>retrieving revision 1.337
>diff -u -8 -p -r1.337 Makefile.in
>--- Makefile.in	9 Mar 2006 05:47:25 -0000	1.337
>+++ Makefile.in	19 Mar 2006 15:57:53 -0000
>@@ -161,16 +161,17 @@ endif
> 
> tier_9_dirs	+= \
> 		uriloader \
> 		modules/libpref \
> 		modules/libimg \
> 		caps \
> 		parser/expat \
> 		parser/htmlparser \
>+		parser/sax \

I'm not convinced of this. I wonder if we should add a parser/xml instead. We could then move the other XML related parser files in there in the future.

>Index: parser/sax/Makefile.in
>===================================================================
>RCS file: parser/sax/Makefile.in
>diff -N parser/sax/Makefile.in
>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ parser/sax/Makefile.in	19 Mar 2006 15:57:54 -0000
>@@ -0,0 +1,47 @@
>+#
>+# ***** BEGIN LICENSE BLOCK *****

Nit: remove that first blank line.

>Index: parser/sax/public/Makefile.in
>===================================================================
>RCS file: parser/sax/public/Makefile.in
>diff -N parser/sax/public/Makefile.in
>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ parser/sax/public/Makefile.in	19 Mar 2006 15:57:58 -0000
>@@ -0,0 +1,61 @@
>+#

Nit: remove that first blank line.

>Index: parser/sax/public/nsISAXAttributes.idl
>===================================================================

>+  /**
>+   * Look up the index of an attribute by XML qualified (prefixed) name.
>+   * @param qName The qualified (prefixed) name.

Nit: s/(prefixed)//
Specify the return value.

>+  /**
>+   * Look up an attribute's XML qualified (prefixed) name by index.
>+   * @param index The attribute index (zero-based).
>+   * @return The XML qualified name, or the empty string if none is
>+   *         available, or null if the index is out of range.

How can the XML qualified name not be available?

>+  /**
>+   * Look up an attribute's type by XML qualified (prefixed) name.
>+   * @param qName The qualified (prefixed) name.
>+   * @return The attribute type as a string, or null if the attribute
>+   *         is not in the list.

Nit: s/(prefixed)//

>Index: parser/sax/public/nsISAXContentHandler.idl
>===================================================================

>+interface nsISAXContentHandler : nsISupports {

Nit: move the brace one line down (there's other files that have this).

>+   * <strong>There is an apparent contradiction between the
>+   * documentation for this method and the documentation for
>+   * ErrorHandler.fatalError().  Until this ambiguity is resolved in a
>+   * future major release, clients should make no assumptions about
>+   * whether endDocument() will or will not be invoked when the parser
>+   * has reported a fatalError() or thrown an exception.</strong>

Is that true in your implementation?

>+   * <li>the Namespace URI and local name are required when the
>+   * namespaces property is <var>true</var> (the default), and are
>+   * optional when the namespaces property is <var>false</var> (if one
>+   * is specified, both must be);</li>

What do you mean by optional, doesn't your implementation always pass them in?

>+   *
>+   * <li>the qualified name is required when the namespace-prefixes
>+   * property is <var>true</var>, and is optional when the
>+   * namespace-prefixes property is <var>false</var> (the
>+   * default).</li> </ul>

Same here.

>+   *
>+   * Note that the attribute list provided will contain only
>+   * attributes with explicit values (specified or defaulted):
>+   * #IMPLIED attributes will be omitted.

Your implementation doesn't seem to do that (feel free to file a followup bug if you plan to add it).

>+   * @param qName the qualified name (with prefix), or the
>+   *        empty string if qualified names are not available

How can they be not available?

>+   * @param atts the attributes attached to the element.  If
>+   *        there are no attributes, it shall be an empty
>+   *        Attributes object.  The value of this object after
>+   *        startElement returns is undefined

Nit: s/Attributes/SAXAttributes/

>+   * There should never be start/endPrefixMapping events for the
>+   * "xml" prefix, since it is predeclared and immutable.

The 'xml' prefix may be declared (to http://www.w3.org/XML/1998/namespace), the 'xmlns' prefix must not be declared (at least according to Namespaces in XML 1.1).

>Index: parser/sax/public/nsISAXEntityResolver.idl
>===================================================================

Drop this file since it's not used. File a bug for implementing this and attach it there.

>Index: parser/sax/public/nsISAXMutableAttributes.idl
>===================================================================

Would it make sense to provide a |copy| or |duplicate| method on nsISAXAttributes that returns a nsISAXMutableAttributes?

>Index: parser/sax/src/nsSAXAttributes.cpp
>===================================================================

>+nsSAXAttributes::GetIndexFromName(const nsAString &aURI,
>+                                  const nsAString &aLocalName,
>+                                  PRInt32 *aResult)
>+{
>+  PRInt32 index = -1;
>+  PRInt32 len = mAttrs.Length();
>+  SAXAttr att;

Shouldn't this be a reference (const SAXAttr &att)?

>+  for (PRInt32 i=0; i<len; i++) {

Spaces around operators.
++i
I prefer that you declare i outside of the for.

>+    att = mAttrs[i];
>+    if (att.localName.Equals(aLocalName) && att.uri.Equals(aURI)) {
>+      index = i;
>+      break;

Drop index, set *aResult and return here...

>+    }
>+  }
>+  *aResult = index;

... and set *aResult to -1 here.

>+nsSAXAttributes::GetIndexFromQName(const nsAString &aQName, PRInt32 *aResult)
>+{
>+  PRInt32 index = -1;
>+  PRInt32 len = mAttrs.Length();
>+  SAXAttr att;
>+  for (PRInt32 i=0; i<len; i++) {
>+    att = mAttrs[i];
>+    if (att.qName.Equals(aQName)) {
>+      index = i;
>+      break;
>+    }
>+  }
>+  *aResult = index;

See above.

>+nsSAXAttributes::GetLocalName(PRInt32 aIndex, nsAString &aResult)
>+{
>+  PRInt32 len = mAttrs.Length();
>+  if ((aIndex < 0) || (aIndex >= len)) {

Drop the inner braces.

>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);

No need to call Truncate.

>+  }else{

Spaces around else

>+    SAXAttr att = mAttrs[aIndex];

Reference.

>+nsSAXAttributes::GetQName(PRInt32 aIndex, nsAString &aResult)
>+{
>+  PRInt32 len = mAttrs.Length();
>+  if ((aIndex < 0) || (aIndex >= len)) {
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }else{
>+    SAXAttr att = mAttrs[aIndex];
>+    aResult.Assign(att.qName);
>+  }

See above.

>+nsSAXAttributes::GetType(PRInt32 aIndex, nsAString &aResult)
>+{
>+  PRInt32 len = mAttrs.Length();
>+  if ((aIndex < 0) || (aIndex >= len)) {
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }else{
>+    SAXAttr att = mAttrs[aIndex];
>+    aResult.Assign(att.type);

See above.

>+nsSAXAttributes::GetTypeFromName(const nsAString &aURI,
>+                                 const nsAString &aLocalName,
>+                                 nsAString &aResult)
>+{
>+  PRInt32 index = -1;
>+  GetIndexFromName(aURI, aLocalName, &index);
>+  if (index >= 0) {
>+    aResult.Assign(mAttrs[index].type);
>+  }else{

Spaces around else.

>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }

No need to call Truncate.

>+nsSAXAttributes::GetTypeFromQName(const nsAString &aQName, nsAString &aResult)
>+{
>+  PRInt32 index = -1;
>+  GetIndexFromQName(aQName, &index);
>+  if (index >= 0) {
>+    aResult.Assign(mAttrs[index].type);
>+  }else{
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }

See above.

>+NS_IMETHODIMP
>+nsSAXAttributes::GetURI(PRInt32 aIndex, nsAString &aResult)
>+{
>+  PRInt32 len = mAttrs.Length();
>+  if ((aIndex < 0) || (aIndex >= len)) {
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }else{
>+    SAXAttr att = mAttrs[aIndex];
>+    aResult.Assign(att.uri);
>+  }

See above.

>+nsSAXAttributes::GetValue(PRInt32 aIndex, nsAString &aResult)
>+{
>+  PRInt32 len = mAttrs.Length();
>+  if ((aIndex < 0) || (aIndex >= len)) {
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }else{
>+    SAXAttr att = mAttrs[aIndex];
>+    aResult.Assign(att.value);
>+  }

See above.

>+nsSAXAttributes::GetValueFromName(const nsAString &aURI,
>+                                  const nsAString &aLocalName,
>+                                  nsAString &aResult)
>+{
>+  PRInt32 index = -1;
>+  GetIndexFromName(aURI, aLocalName, &index);
>+  if (index >= 0) {
>+    aResult.Assign(mAttrs[index].value);
>+  }else{
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }

See above.

>+nsSAXAttributes::GetValueFromQName(const nsAString &aQName,
>+                                   nsAString &aResult)
>+{
>+  PRInt32 index = -1;
>+  GetIndexFromQName(aQName, &index);
>+  if (index >= 0) {
>+    aResult.Assign(mAttrs[index].value);
>+  }else{
>+    aResult.Truncate();
>+    aResult.SetIsVoid(PR_TRUE);
>+  }

See above.

>+nsresult

All methods declared in the IDL need to be NS_IMETHODIMP.

>+nsSAXAttributes::AddAttribute(const nsAString &aURI,
>+                              const nsAString &aLocalName,
>+                              const nsAString &aQName,
>+                              const nsAString &aType,
>+                              const nsAString &aValue)
>+{
>+  SAXAttr att;
>+  att.uri.Assign(aURI);
>+  att.localName.Assign(aLocalName);
>+  att.qName.Assign(aQName);
>+  att.type.Assign(aType);
>+  att.value.Assign(aValue);
>+
>+  if (!mAttrs.AppendElement(att))
>+    return NS_ERROR_OUT_OF_MEMORY;

Don't create a SAXAttr yourself, instead do:

  SAXAttr *att = mAttrs.AppendElement();
  if (!att) {
    return NS_ERROR_OUT_OF_MEMORY;
  }

  att->uri.Assign(aURI);
  att->localName.Assign(aLocalName);
  att->qName.Assign(aQName);
  att->type.Assign(aType);
  att->value.Assign(aValue);

>+nsresult
>+nsSAXAttributes::SetAttribute(PRInt32 aIndex,
>+                              const nsAString &aURI,
>+                              const nsAString &aLocalName,
>+                              const nsAString &aQName,
>+                              const nsAString &aType,
>+                              const nsAString &aValue)
>+{
>+  nsresult rv;
>+  rv = SetURI(aIndex, aURI);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+  rv = SetLocalName(aIndex, aLocalName);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+  rv = SetQName(aIndex, aQName);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+  rv = SetType(aIndex, aType);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+  rv = SetValue(aIndex, aValue);
>+  NS_ENSURE_SUCCESS(rv, rv);

  if (aIndex >= 0 && aIndex < mAttrs.Length()) {
    SAXAttr &att = mAttrs[aIndex];
    att.uri.Assign(aURI);
    att.localName.Assign(aLocalName);
    att.qName.Assign(aQName);
    att.type.Assign(aType);
    att.value.Assign(aValue);
  }

>+nsSAXAttributes::SetLocalName(PRInt32 aIndex, const nsAString &aLocalName)
>+{
>+  PRInt32 len = mAttrs.Length();
>+  if ((aIndex >= 0) && (aIndex < len)) {

Drop the inner braces and no need for len if you only use it once (also in the following methods).

>+nsSAXAttributes::SetQName(PRInt32 aIndex, const nsAString &aQName)

>+nsSAXAttributes::SetType(PRInt32 aIndex, const nsAString &aType)

>+nsSAXAttributes::SetURI(PRInt32 aIndex, const nsAString &aURI)

>+nsSAXAttributes::SetValue(PRInt32 aIndex, const nsAString &aValue)

>Index: parser/sax/src/nsSAXAttributes.h
>===================================================================

>+class nsSAXAttributes : public nsISAXMutableAttributes

>+private:
>+  ~nsSAXAttributes(){};

Nit: I prefer:

  ~nsSAXAttributes()
  {
  };

Do we really need these destructors? (Why is the implicit destructor not sufficient?)

>Index: parser/sax/src/nsSAXLocator.cpp
>===================================================================

>+nsSAXLocator::nsSAXLocator()
>+{
>+  mColumnNumber = 0;
>+  mLineNumber = 0;

Use an initialization list.

>+NS_IMETHODIMP nsSAXLocator::GetColumnNumber(PRInt32 *aColumnNumber)

New line after NS_IMETHODIMP (also in the following methods).

>+NS_IMETHODIMP nsSAXLocator::GetLineNumber(PRInt32 *aLineNumber)

>+NS_IMETHODIMP nsSAXLocator::GetPublicId(nsAString &aPublicId)

>+NS_IMETHODIMP nsSAXLocator::GetSystemId(nsAString &aSystemId)

>+NS_IMETHODIMP nsSAXLocator::SetColumnNumber(PRInt32 aColumnNumber)

>+NS_IMETHODIMP nsSAXLocator::SetLineNumber(PRInt32 aLineNumber)

>+NS_IMETHODIMP nsSAXLocator::SetSystemId(const nsAString &aSystemId)

>+NS_IMETHODIMP nsSAXLocator::SetPublicId(const nsAString &aPublicId)

>Index: parser/sax/src/nsSAXLocator.h
>===================================================================

>+class nsSAXLocator : public nsISAXLocator

>+  ~nsSAXLocator(){};

Nit:
  ~nsSAXLocator()
  {
  };

>Index: parser/sax/src/nsSAXXMLReader.cpp
>===================================================================

>+nsSAXXMLReader::nsSAXXMLReader(): mAsync(PR_TRUE)
>+{}

Nit:

{
}

>+nsSAXXMLReader::DidBuildModel()
>+{
>+  if (mContentHandler) {
>+    return mContentHandler->EndDocument();
>+  }
>+  return NS_OK;
>+  mParser = nsnull;

Need to set mParser to nsnull *before* returning.

>+nsSAXXMLReader::HandleCDataSection(const PRUnichar *aData,
>+                                   PRUint32 aLength)
>+{
>+  nsresult rv;
>+  if (mLexicalHandler) {
>+    rv = mLexicalHandler->StartCDATA();
>+    NS_ENSURE_SUCCESS(rv, rv);
>+  }
>+
>+  if (mContentHandler) {
>+    nsAutoString chars = nsAutoString(aData, aLength);
>+    rv = mContentHandler->Characters(chars);

Use Substring(aData, 0, aLength) instead of chars

>+nsSAXXMLReader::HandleCharacterData(const PRUnichar *aData,
>+                                    PRUint32 aLength)
>+{
>+  if (mContentHandler) {
>+    nsAutoString chars = nsAutoString(aData, aLength);
>+    return mContentHandler->Characters(chars);

Use Substring(aData, 0, aLength) instead of chars

>+nsSAXXMLReader::HandleStartNamespaceDecl(const PRUnichar *aPrefix,
>+                                         const PRUnichar *aUri)
>+{
>+  nsString prefix;
>+  if (aPrefix)
>+    prefix = nsDependentString(aPrefix);

  const nsString &prefix = aPrefix ? nsDependentString(aPrefix) :
                                     EmptyString();
or something similar.
And move it inside the if.
 
>+  
>+  if (mContentHandler) {
>+    return mContentHandler->StartPrefixMapping(prefix,
>+                                               nsDependentString(aUri));

This doesn't filter the 'xml' prefix (not sure if we want to, see my comment about the doc for startPrefixMapping).

>+nsSAXXMLReader::HandleEndNamespaceDecl(const PRUnichar *aPrefix)
>+{
>+  nsString prefix;
>+  if (aPrefix)
>+    prefix = nsDependentString(aPrefix);
>+
>+  if (mContentHandler) {
>+    return mContentHandler->EndPrefixMapping(prefix);
>+  }
>+  return NS_OK;

See above.

>+nsSAXXMLReader::HandleNotationDecl(const PRUnichar *aNotationName,
>+                                   const PRUnichar *aSystemId,
>+                                   const PRUnichar *aPublicId)
>+{
>+  if (mDTDHandler) {
>+    nsString sysId, pubId;
>+    if (aSystemId)
>+      sysId = nsDependentString(aSystemId);
>+    if (aSystemId)
>+      pubId = nsDependentString(aPublicId);

  const nsString &sysId = aPrefix ? nsDependentString(aSystemId) :
                                    EmptyString();

or something similar.
Same for pubid. And make the if for pubid test aPublicId.

>+nsSAXXMLReader::HandleUnparsedEntityDecl(const PRUnichar *aEntityName,
>+                                         const PRUnichar *aSystemId,
>+                                         const PRUnichar *aPublicId,
>+                                         const PRUnichar *aNotationName)
>+{
>+  if (mDTDHandler) {
>+    nsString sysId, pubId;
>+    if (aSystemId)
>+      sysId = nsDependentString(aSystemId);
>+    if (aSystemId)
>+      pubId = nsDependentString(aPublicId);

See above.

>+nsSAXXMLReader::ParseFromStream(nsIInputStream *stream,
>+                                const char *charset,
>+                                const char *contentType)
>+{
>+  NS_ENSURE_ARG(stream);
>+  NS_ENSURE_ARG(contentType);
>+
>+  // Put the nsCOMPtr out here so we hold a ref to the stream as needed
>+  nsresult rv;
>+  nsCOMPtr<nsIBufferedInputStream> bufferedStream;
>+  if (!NS_InputStreamIsBuffered(stream)) {
>+    bufferedStream = do_CreateInstance(NS_BUFFEREDINPUTSTREAM_CONTRACTID, &rv);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    rv = bufferedStream->Init(stream, 4096);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    stream = bufferedStream;

Just use NS_NewBufferedInputStream.

>+  } else {
>+    rv = NS_NewURI(getter_AddRefs(baseURI),
>+                   "about:blank" );

This can go on one line.

>+    if (NS_FAILED(rv)) return rv;

NS_ENSURE_SUCCESS(rv, rv);

>+    nsCOMPtr<nsIStreamListener> listener = do_QueryInterface(mParser);
>+    if (!listener)
>+      NS_ERROR("Couldn't QI to stream listener");
>+
>+    nsCOMPtr<nsIInputStreamChannel> inputChannel =
>+      do_QueryInterface(parserChannel);
>+    if (!inputChannel)
>+      NS_ERROR("Couldn't QI to nsIInputStreamChannel");

This will crash on the next line. Just do NS_ENSURE_SUCCESS(rv, rv) and pass rv into do_QueryInterface.

>+    rv = inputChannel->SetContentStream(stream);
>+    NS_ENSURE_SUCCESS(rv, rv);

You should have passed this into NS_NewInputStreamChannel.

>+
>+    rv = parser->Parse(baseURI,nsnull);

Space after comma.

>+    NS_ENSURE_SUCCESS(rv, rv);

You should QI the listener down here and return if the QI fails.

>+    rv = parserChannel->AsyncOpen(listener, nsnull);
>+    NS_ENSURE_SUCCESS(rv, rv);

>+nsSAXXMLReader::SplitExpatName(const PRUnichar *aExpatName,
>+                               nsAString& aURI,
>+                               nsAString& aLocalName,
>+                               nsAString& aQName)

>+  nsDependentString expatStr = nsDependentString(aExpatName);

Don't create two nsDependentStrings, use nsDependentString expatStr(aExpatName);

>+  PRInt32 break1, break2 = -1;
>+  break1 = expatStr.FindChar(PRUnichar(0xFFFF));
>+
>+  if (-1 == break1) {

Use kNotFound instead of -1 and put the constant on the right side.

>+    aLocalName = expatStr; // no namespace
>+    aURI.Truncate();
>+    aQName = expatStr;
>+  } else {
>+    aURI = Substring(expatStr, 0, break1);

StringHead(expatStr, break1)

>+    break2 = expatStr.FindChar(PRUnichar(0xFFFF), break1 + 1);
>+    if (-1 == break2) { // namespace, but no prefix

See above.

>+      aLocalName = Substring(expatStr, break1+1);

Spaces around +.

>+      aQName = aLocalName;
>+    } else { // namespace with prefix
>+      aLocalName = Substring(expatStr, break1 + 1, break2 - break1 - 1);
>+      aQName = Substring(expatStr, break2+1) +

Spaces around +.

>Index: parser/sax/src/nsSAXXMLReader.h
>===================================================================

>+#define NS_SAXXMLREADER_CID  \
>+{/* {ab1da296-6125-40ba-96d0-47a8282ae3db} */ \

Either drop the comment or move it above the define.

>+0xab1da296, 0x6125, 0x40ba, \
>+{ 0x96, 0xd0, 0x47, 0xa8, 0x28, 0x2a, 0xe3, 0xdb} }
>+
>+class nsSAXXMLReader : public nsISAXXMLReader,
>+                       public nsIExtendedExpatSink,
>+                       public nsIContentSink
>+{
>+public:
>+  NS_DECL_ISUPPORTS
>+  NS_DECL_NSIEXPATSINK
>+  NS_DECL_NSIEXTENDEDEXPATSINK
>+  NS_DECL_NSISAXXMLREADER
>+
>+  nsSAXXMLReader();
>+
>+  //nsIContentSink
>+  NS_IMETHOD WillBuildModel();
>+  NS_IMETHOD DidBuildModel();
>+  NS_IMETHOD SetParser(nsIParser* aParser);
>+  NS_IMETHOD WillInterrupt() { return NS_OK; }

I prefer the implementation on its own lines:

  NS_IMETHOD WillInterrupt()
  { 
    return NS_OK;
  }

>+protected:
>+  nsresult SplitExpatName(const PRUnichar *aExpatName,
>+                          nsAString& aURI,
>+                          nsAString& aLocalName,
>+                          nsAString& aQName);

This can be private, no? And be consistent: move the ampersand next to the start of the argument name.
Attachment #215581 - Flags: review?(peterv) → review-
(Assignee)

Comment 46

11 years ago
Created attachment 216267 [details] [diff] [review]
Fix addressing peterv's comments

(In reply to comment #45)
> (From update of attachment 215581 [details] [diff] [review] [edit])
  
> I don't think we usually style our Doxygen comments with HTML.

Removed.

> Thanks for adding all the docs! They could still be improved a bit, they seem
> to talk about how SAX should be implemented, not about your implementation (eg
> it talks about what the SAX driver should do).

Yah, I figured I would leave some of that text in, so the correct behavior would be clear (in case of bugs).


> >+		parser/sax \
> 
> I'm not convinced of this. I wonder if we should add a parser/xml instead. We
> could then move the other XML related parser files in there in the future.

Just let me know where to put it, and I'll move it.



> >+   * @return The XML qualified name, or the empty string if none is
> >+   *         available, or null if the index is out of range.
> 
> How can the XML qualified name not be available?

There are SAX feature flags that can turn this off.


 
> >+   * <strong>There is an apparent contradiction between the
> >+   * documentation for this method and the documentation for
> >+   * ErrorHandler.fatalError().  Until this ambiguity is resolved in a
> >+   * future major release, clients should make no assumptions about
> >+   * whether endDocument() will or will not be invoked when the parser
> >+   * has reported a fatalError() or thrown an exception.</strong>
> 
> Is that true in your implementation?

The problem is that the SAX spec is not clear. My implementation will always call endDocument, but it's not clear that's the correct behavior (the SAX people need to decide).

 
> >+   * <li>the Namespace URI and local name are required when the
> >+   * namespaces property is <var>true</var> (the default), and are
> >+   * optional when the namespaces property is <var>false</var> (if one
> >+   * is specified, both must be);</li>
> 
> What do you mean by optional, doesn't your implementation always pass them in?
There are SAX feature flags that can turn this off.

 
> >+   * Note that the attribute list provided will contain only
> >+   * attributes with explicit values (specified or defaulted):
> >+   * #IMPLIED attributes will be omitted.
> 
> Your implementation doesn't seem to do that (feel free to file a followup bug
> if you plan to add it).

Bug 331721


> >+   * @param qName the qualified name (with prefix), or the
> >+   *        empty string if qualified names are not available
> 
> How can they be not available?

There are SAX feature flags that can turn this off.

> >Index: parser/sax/public/nsISAXEntityResolver.idl
> >===================================================================
> 
> Drop this file since it's not used. File a bug for implementing this and attach
> it there.

Bug 331722

 
> >Index: parser/sax/public/nsISAXMutableAttributes.idl
> >===================================================================
> 
> Would it make sense to provide a |copy| or |duplicate| method on
> nsISAXAttributes that returns a nsISAXMutableAttributes?

We could do a setAttributes(in nsISAXAttributes atts) on nsISAXMutableAttributes?


> >+
> >+    rv = parser->Parse(baseURI, nsnull);
> >+    NS_ENSURE_SUCCESS(rv, rv);
> 
> You should QI the listener down here and return if the QI fails.
> 
> >+    rv = parserChannel->AsyncOpen(listener, nsnull);
> >+    NS_ENSURE_SUCCESS(rv, rv);

I'm not sure I understand what you want here. Please check this part.
Attachment #215581 - Attachment is obsolete: true
(Assignee)

Comment 47

11 years ago
Created attachment 216269 [details] [diff] [review]
nits
Attachment #216267 - Attachment is obsolete: true
Attachment #216269 - Flags: review?(peterv)
Comment on attachment 216269 [details] [diff] [review]
nits

Let's move all the files to parser/xml.
I don't feel strongly about copy/duplicate vs setAttributes, choose one ;-).

>Index: parser/sax/public/nsISAXAttributes.idl
>===================================================================

Since you switched nsISAXMutableAttributes to unsigned long's for the index, you might want to change nsISAXAttributes in the same way.

>Index: parser/sax/public/nsISAXXMLReader.idl
>===================================================================

>+   * Set the value of a feature flag. <strong>NOT CURRENTLY
>+   * IMPLEMENTED</strong>

There's a couple of these <strong> tags left in this file.

>Index: parser/sax/src/nsSAXAttributes.cpp
>===================================================================

>+nsSAXAttributes::GetIndexFromName(const nsAString &aURI,

>+  for (i=0; i < len; ++i) {

Space around =

>+nsSAXAttributes::GetIndexFromQName(const nsAString &aQName, PRInt32 *aResult)

>+  for (i=0; i < len; ++i) {

Ditto.

>+nsSAXAttributes::GetURI(PRInt32 aIndex, nsAString &aResult)

>+  if ((aIndex < 0) || (aIndex >= len)) {

Remove inner braces.

>+nsSAXAttributes::SetAttribute(PRUint32 aIndex,

>+  if (aIndex < mAttrs.Length()) {
>+    SAXAttr &att = mAttrs[aIndex];
>+    att.uri.Assign(aURI);
>+    att.localName.Assign(aLocalName);
>+    att.qName.Assign(aQName);
>+    att.type.Assign(aType);
>+    att.value.Assign(aValue);
>+    return NS_OK;
>+  }
>+  return NS_ERROR_FAILURE;

Return early instead:

  if (aIndex >= mAttrs.Length()) {
    return NS_ERROR_FAILURE;
  }

  ...

Some of the methods in this file return NS_ERROR_FAILURE on index out-of-bounds, and some don't. You might want to be consistent about that.

>Index: parser/sax/src/nsSAXLocator.cpp
>===================================================================

>+nsSAXLocator::nsSAXLocator(): mColumnNumber(0),mLineNumber(0)

Space after ,

>Index: parser/sax/src/nsSAXParserModule.cpp
>===================================================================

Hmm, I missed this previously, but we should put this in nsParserModule instead.

>Index: parser/sax/src/nsSAXXMLReader.cpp
>===================================================================

>+nsSAXXMLReader::nsSAXXMLReader(): mAsync(PR_TRUE)

Space before the :

>+nsSAXXMLReader::HandleStartElement(const PRUnichar *aName,

>+  if (mContentHandler) {
>+    nsresult rv;
>+    nsCOMPtr<nsSAXAttributes> atts =
>+      do_CreateInstance(NS_SAXATTRIBUTES_CONTRACTID, &rv);

I think you should create this directly with |new nsSAXAttributes()|

>+    NS_ENSURE_SUCCESS(rv, rv);
>+    
>+    nsAutoString uri, localName, qName;
>+    for (; *aAtts; aAtts += 2) {
>+      SplitExpatName(aAtts[0], uri, localName, qName);
>+      // XXX don't have attr type information
>+      // could support xmlns reporting, it's a standard SAX feature
>+      if (!uri.EqualsLiteral(XMLNS_URI)) {
>+        atts->AddAttribute(uri, localName, qName,
>+                           NS_LITERAL_STRING("CDATA"),
>+                           nsDependentString(aAtts[1]));
>+      }
>+    }
>+    // Deal with the element name
>+    SplitExpatName(aName, uri, localName, qName);
>+    return mContentHandler->StartElement(uri, localName, qName, atts);
>+  }
>+  return NS_OK;

Return early instead:

  if (!mContentHandler) {
    return NS_OK;
  }

  ...
Attachment #216269 - Flags: review?(peterv) → review+
(Assignee)

Comment 49

11 years ago
Created attachment 217770 [details] [diff] [review]
peterv's suggestions
Attachment #216269 - Attachment is obsolete: true
(Assignee)

Comment 50

11 years ago
Created attachment 217773 [details] [diff] [review]
nits
Attachment #217770 - Attachment is obsolete: true
Attachment #217773 - Flags: superreview?(bugmail)
One thing I would add...it would be nice the nsISAXXMLReader implenting object had an asyncParse function and was itself a nsIStreamListener... 

In the in-page feed processing code, we perform the following steps:

- we sniff all page content and determine that a feed is being loaded, so set the content type to application/vnd.mozilla.maybe-feed
- we set up a stream converter from application/vnd.mozilla.maybe-feed to application/vnd.mozilla.xul+xml... this converter receives the data for the document load asynchronously from the network and is itself a nsIStreamListener. 

It would be nice if we could create a nsISAXXMLReader in the converter's onStartRequest and forward data from onDataAvailable into it, rather than having to buffer it up ourselves and then parse it in a single lump at the end.

The following would be OK:

onStartRequest: function(request, cx, inStream, offset, count) {
  this._processor = 
   Cc["@mozilla.org/feed-processor;1"].createInstance(Ci.nsIFeedProcessor);
  this._processor.listener = this;
  this._processor.asyncParse(null);
  this._processor.QueryInterface(Ci.nsIStreamListener);
  this._processor.onStartRequest(request, cx);
},

onDataAvailable: function(request, context, inStream, offset, count) {
  this._processor.onDataAvailable(request, context, inStream, offset, count);
},

onStopRequest: function(request, context, status) {
  this._processor.onStopRequest(request, context, status);
},

handleResult: function(result) {
  var container = result.doc;
  ... // load XUL page
},

onStopRequest: function(request, context, status) {
  this._processor.onStopRequest(request, context, status);
}

And then set this._processor to null to break the cycle?
Er yes, of course. Thanks boris ;-)
Comment on attachment 217773 [details] [diff] [review]
nits

>Index: parser/xml/public/nsISAXAttributes.idl
...
>+  AString getType(in unsigned long index);

Like bsmedberg points out in comment 25, why not make the |type| argument a PRUInt32 and have constants in the idl? Same thing in nsIMutableSAXAttributes.


>Index: parser/xml/public/nsISAXLexicalHandler.idl

>+interface nsISAXLexicalHandler : nsISupports {

Is there a reason why nsISAXLexicalHandler doesn't inherit nsISAXContentHandler? Since if you implement the former you're always going to implement the latter (right?) the only downside I can see is if class B implements nsISAXContentHandler and class A wants to extend class B and add nsISAXLexicalHandler. Is that commonly the case? (and even if it was there are macros to implement the forwarding code needed).

>Index: parser/xml/public/nsISAXMutableAttributes.idl

>+interface nsISAXMutableAttributes : nsISAXAttributes {

Do we need to follow the org.xml.sax.helpers.AttributesImpl class here? Honestly, this interface is horrible :(

If we really do want to follow that interface then the remaining comments for this file is moot.

>+  /**
>+   * Add an attribute to the end of the list.
>+   *
>+   * For the sake of speed, this method does no checking
>+   * to see if the attribute is already in the list: that is
>+   * the responsibility of the application.
>+   *

This seems like a highly doubtful 'optimization'. Seems a lot better to let the single implementation worry then having every single user worry.

>+   * @param uri The Namespace URI, or the empty string if
>+   *        none is available or Namespace processing is not
>+   *        being performed.
>+   * @param localName The local name, or the empty string if
>+   *        Namespace processing is not being performed.
>+   * @param qName The qualified (prefixed) name, or the empty string
>+   *        if qualified names are not available.

Ugh, duplicate data. What to do if the local part of qName is different from localName? Having an argument with just the prefix would have been better.

>+  /**
>+   * Set the attributes list. This method will clear any attributes in
>+   * the list.

"This method will clear any attributes in the list before adding the attributes from the argument." seems better.

>+  /**
>+   * Set an attribute in the list.
>+   *
>+   * For the sake of speed, this method does no checking for name
>+   * conflicts or well-formedness: such checks are the responsibility
>+   * of the application.
>+   *
>+   * @param index The index of the attribute (zero-based).
>+   * @param uri The Namespace URI, or the empty string if
>+   *        none is available or Namespace processing is not
>+   *        being performed.
>+   * @param localName The local name, or the empty string if
>+   *        Namespace processing is not being performed.
>+   * @param qName The qualified name, or the empty string
>+   *        if qualified names are not available.
>+   * @param type The attribute type as a string.
>+   * @param value The attribute value.
>+   */
>+  void setAttribute(in unsigned long index,
>+                    in AString uri,
>+                    in AString localName,
>+                    in AString qName,
>+                    in AString type,
>+                    in AString value);

This method seems very redundant given that removeAttribute and addAttribute already exists. Would have been better to let addAttribute take care of both adding new and changing existing attributes.

>+  void setLocalName(in unsigned long index, in AString localName);
>+  void setQName(in unsigned long index, in AString qName);
>+  void setType(in unsigned long index, in AString type);
>+  void setURI(in unsigned long index, in AString uri);
>+  void setValue(in unsigned long index, in AString value);

These all seem fairly useless.


>Index: parser/xml/public/nsISAXXMLReader.idl

>+   * Set the value of a feature flag. NOT CURRENTLY IMPLEMENTED.
>+  void setFeature(in AString name, in boolean value);

>+   * Look up the value of a feature flag. NOT CURRENTLY IMPLEMENTED.
>+  boolean getFeature(in AString name);

>+   * Set the value of a property. NOT CURRENTLY IMPLEMENTED.
>+  void setProperty(in AString name, in nsISupports value);

>+   * Look up the value of a property. NOT CURRENTLY IMPLEMENTED.
>+  boolean getProperty(in AString name);

Are you planning on adding more then stubbs for these anytime soon? If not, please don't add them for now.


>+  void parseFromBuffer([const,array,size_is(bufLen)] in octet buf,
>+                       in PRUint32 bufLen, in string contentType);
>+  void parseFromStream(in nsIInputStream stream,
>+                       in string charset,
>+                       in string contentType);

It's sort of a pity to place these methods on the same interface as the other methods since it means that we can't make this class accessible by web-content. I guess we can just move these methods if we ever plan to do so.

>Index: parser/xml/src/Makefile.in
...
>\ No newline at end of file

Please make sure you end all files with a newline. Some platforms freak out otherwise.

>Index: parser/xml/src/nsSAXAttributes.cpp
...
>+nsSAXAttributes::GetLocalName(PRUint32 aIndex, nsAString &aResult)
>+{
>+  PRUint32 len = mAttrs.Length();
>+  if (aIndex < 0 || aIndex >= len) {

No need to test for |aIndex < 0| since aIndex is an unsigned. Same in other functions in this file.

>+    aResult.SetIsVoid(PR_TRUE);
>+  } else {
>+    const SAXAttr &att = mAttrs[aIndex];
>+    aResult.Assign(att.localName);

Simply |aResult = mAttrs[aIndex].localName;| is clearer. Same in other functions in this file.


>+nsSAXAttributes::AddAttribute(const nsAString &aURI,

>+  att->uri.Assign(aURI);
>+  att->localName.Assign(aLocalName);
>+  att->qName.Assign(aQName);
>+  att->type.Assign(aType);
>+  att->value.Assign(aValue);

Just use |=| instead, it's easier on the eyes.

>+nsSAXAttributes::SetAttributes(nsISAXAttributes *aAttributes)
>+    aAttributes->GetURI(i, att->uri);
>+    aAttributes->GetLocalName(i, att->localName);
>+    aAttributes->GetQName(i, att->qName);
>+    aAttributes->GetType(i, att->type);
>+    aAttributes->GetValue(i, att->value);

Might as well check the return value here too.

>+nsSAXAttributes::SetAttribute(PRUint32 aIndex,
>+                              const nsAString &aURI,
>+                              const nsAString &aLocalName,
>+                              const nsAString &aQName,
>+                              const nsAString &aType,
>+                              const nsAString &aValue)
>+{
>+  if (aIndex >= mAttrs.Length()) {
>+    return NS_ERROR_FAILURE;
>+  }

This should throw even for aIndex == mAttrs.Length()

>+  SAXAttr &att = mAttrs[aIndex];
>+  att.uri.Assign(aURI);
>+  att.localName.Assign(aLocalName);
>+  att.qName.Assign(aQName);
>+  att.type.Assign(aType);
>+  att.value.Assign(aValue);

What have you got against |=| anyway ;)

>+  return NS_OK;

It's nice to insert an empty newline before returns to make them a bit easier to spot.

>+nsSAXAttributes::SetLocalName(PRUint32 aIndex, const nsAString &aLocalName)
>+{
>+  if (aIndex >= mAttrs.Length()) {
>+    return NS_ERROR_FAILURE;
>+  }
>+  mAttrs[aIndex].localName.Assign(aLocalName);
>+  return NS_OK;
>+}

Same three comments as for SetAttribute applies here too. And in the functions below.


>Index: parser/xml/src/nsSAXLocator.h

>+class nsSAXLocator : public nsISAXLocator
>+protected:
>+  nsresult SetColumnNumber(PRInt32 aColumnNumber);
>+  nsresult SetLineNumber(PRInt32 aLineNumber);
>+  nsresult SetSystemId(const nsAString & aSystemId);
>+  nsresult SetPublicId(const nsAString & aPublicId);

No one will be able to call these methods, and indeed noone does. Do we really need this class? And if we do, why not set the members in the constructor?


>Index: parser/xml/src/nsSAXXMLReader.cpp

>+nsSAXXMLReader::HandleStartElement(const PRUnichar *aName,
>+                                   const PRUnichar **aAtts,
>+                                   PRUint32 aAttsCount,
>+                                   PRInt32 aIndex,
>+                                   PRUint32 aLineNumber)
>+{
>+  if (!mContentHandler) {
>+    return NS_OK;;
>+  }
>+
>+  nsCOMPtr<nsSAXAttributes> atts = new nsSAXAttributes();
>+  if (!atts)
>+    return NS_ERROR_FAILURE;

Should be NS_ERROR_OUT_OF_MEMORY

>+  nsAutoString uri, localName, qName;
>+  for (; *aAtts; aAtts += 2) {
>+    SplitExpatName(aAtts[0], uri, localName, qName);
>+    // XXX don't have attr type information
>+    // could support xmlns reporting, it's a standard SAX feature

What is the xmlns comment about?

>+    if (!uri.EqualsLiteral(XMLNS_URI)) {
>+      atts->AddAttribute(uri, localName, qName,
>+                         NS_LITERAL_STRING("CDATA"),

Use NS_NAMED_LITERAL_STRING for now to avoid creating a new stringobject in every iteration.


>+nsSAXXMLReader::ParseFromBuffer(const PRUint8 *aBuf,
>+                                PRUint32 aBufLen,
>+                                const char *aContentType)
>+{
>+  return NS_ERROR_NOT_IMPLEMENTED;
>+}

Please remove this function, it can always be added later.


>+nsSAXXMLReader::SplitExpatName(const PRUnichar *aExpatName,
>+                               nsAString& aURI,
>+                               nsAString& aLocalName,
>+                               nsAString& aQName)

Make these arguments |nsString&|, that'll save some binary size and cycles.
Do you want to set aURI.IsVoid() here to get |null| as default namespace rather them empty-string? I think you do.


With that fixed or explained, sr=sicking
Attachment #217773 - Flags: superreview?(bugmail) → superreview+
(Assignee)

Comment 55

11 years ago
(In reply to comment #54)
> (From update of attachment 217773 [details] [diff] [review] [edit])
> >Index: parser/xml/public/nsISAXAttributes.idl
> ...
> >+  AString getType(in unsigned long index);
> 
> Like bsmedberg points out in comment 25, why not make the |type| argument a
> PRUInt32 and have constants in the idl? Same thing in nsIMutableSAXAttributes.

Many people have said it might be a good idea to expose this to web content at some point in the future. If we're going to do that, we should follow SAX / MSXML wherever possible, even where it diverges from Mozilla style a bit.

> >Index: parser/xml/public/nsISAXLexicalHandler.idl
> 
> >+interface nsISAXLexicalHandler : nsISupports {
> 
> Is there a reason why nsISAXLexicalHandler doesn't inherit
> nsISAXContentHandler? Since if you implement the former you're always going to
> implement the latter (right?) 

I don't think that's necessarily the case. The SAX design allows for chaining and swapping handlers during a parse... I can't think of a problem with making nsISAXLexicalHandler inherit, but I don't see much of a benefit, either.

> >Index: parser/xml/public/nsISAXMutableAttributes.idl
> 
> >+interface nsISAXMutableAttributes : nsISAXAttributes {
> 
> Do we need to follow the org.xml.sax.helpers.AttributesImpl class here?

I think so, see above :/.

> 
> No one will be able to call these methods, and indeed noone does. 

fixed.

> Do we really need this class? And if we do, why not set the members in the 
> constructor?

From comment #51, you can see that the parse methods are going to change. When I get them right, I'll work on the Locator class. Also, strict SAX compliance require that we have these...

> >+    // could support xmlns reporting, it's a standard SAX feature
> 
> What is the xmlns comment about?

There's a SAX feature flag that causes xmlns attributes to be reported in the attributes list.
 


> 
> 
> >+nsSAXXMLReader::SplitExpatName(const PRUnichar *aExpatName,
> >+                               nsAString& aURI,
> >+                               nsAString& aLocalName,
> >+                               nsAString& aQName)
> 
> Make these arguments |nsString&|, that'll save some binary size and cycles.
> Do you want to set aURI.IsVoid() here to get |null| as default namespace rather
> them empty-string? I think you do.

No, SAX says it is the empty string.
(Assignee)

Comment 56

11 years ago
Created attachment 218550 [details] [diff] [review]
Address comments from beng and sicking
Attachment #217773 - Attachment is obsolete: true
(Assignee)

Comment 57

11 years ago
Created attachment 218551 [details] [diff] [review]
New async parse code.

bz, you don't need to review most of this. I just added async parse and listener code. Only to methods from nsSAXXMLReader::ParseFromStream on down to nsSAXXMLReader::InitParser need review.
Attachment #218550 - Attachment is obsolete: true
Attachment #218551 - Flags: review?(bzbarsky)
Would it be possible to either split out the async stuff into a separate patch (and bug?) or at least post an interdiff?
Fwiw, I think the right way to do it is to land the already-reviewed patch, then do the async parsing as a followup instead of mutating the patch until it's perfect...  That would make the review process much much simpler.  As things stand, I really do have to review the whole patch, because otherwise I can't tell how things fit in with each other and what changes had to be made to bolt on the async stuff.
(Assignee)

Comment 60

11 years ago
(In reply to comment #59)
> Fwiw, I think the right way to do it is to land the already-reviewed patch,
> then do the async parsing as a followup

OK, I'll do that.
(Assignee)

Updated

11 years ago
Attachment #218551 - Flags: review?(bzbarsky)
(Assignee)

Comment 61

11 years ago
Created attachment 218619 [details] [diff] [review]
patch w/ sicking's comments, no async
Attachment #218551 - Attachment is obsolete: true
Aside from the addition of new interfaces, what sort of changes to existing interfaces does this patch and the patch in the dependent bug 326477 introduce? I am asking wrt 1.8 branch-worthiness, since this bug is in the dependency tree for a Firefox2 feature. 
(Assignee)

Comment 63

11 years ago
(In reply to comment #62)
> Aside from the addition of new interfaces, what sort of changes to existing
> interfaces does this patch and the patch in the dependent bug 326477 introduce?
> I am asking wrt 1.8 branch-worthiness, since this bug is in the dependency tree
> for a Firefox2 feature. 

This patch and bug 326477 don't change any interfaces, they just add new ones. However, they do depend on bug 323908, which made a small change to the signature of nsIExpatSink::ReportError in order to support SAX.
Depends on: 323908
This patch seems to use constructs like:

 foo ? bar : baz 

where bar and baz are both nsAStrings (sometimes of different classes).  That's not portable to all our platforms last I checked (some compilers obligingly go red, others just silently compile it into runtime crashes).
(In reply to comment #63)
> This patch and bug 326477 don't change any interfaces, they just add new ones.
> However, they do depend on bug 323908, which made a small change to the
> signature of nsIExpatSink::ReportError in order to support SAX.

Do you know of any way to make your patch compatible with the 1.8 branch requirement of not changing interfaces?
(Assignee)

Comment 66

11 years ago
(In reply to comment #65)
>
> Do you know of any way to make your patch compatible with the 1.8 branch
> requirement of not changing interfaces?

ns*I*SAXXMLReader::ReportError is the only place it matters, and I
believe the patch can be conditionalized or slightly altered to use the old signature on the branch. I will check out the branch and try it.
(Assignee)

Comment 67

11 years ago
(In reply to comment #66)
>
> ns*I*SAXXMLReader::ReportError 

Er, I mean nsIExpatSink::ReportError and nsSAXXMLReader::ReportError.
(Assignee)

Comment 68

11 years ago
Created attachment 218663 [details]
backtrace

One of the build bustage fixes bz checked in last night caused a segfault in nsCharTraits/nsDependentString when I ran my tests. I have a patch that fixes it, but the crash seems like a bug to me.
(Assignee)

Comment 69

11 years ago
Created attachment 218664 [details] [diff] [review]
Fix HandleEndNamespaceDecl when there's aPrefix is null
(Assignee)

Updated

11 years ago
Attachment #218664 - Flags: review?(bzbarsky)

Comment 70

11 years ago
strings explicitly refuse to allow you to ask them to wrap null. you have to use SetVoid and friends. it's not a bug, just one of those things...
Comment on attachment 218664 [details] [diff] [review]
Fix HandleEndNamespaceDecl when there's aPrefix is null

That won't work since nullChar goes out of scope before you use it. Isn't the problem that you've got the |if(!aPrefix)| test inversed?
Attachment #218664 - Flags: review?(bzbarsky) → review-

Comment 72

11 years ago
Comment on attachment 218619 [details] [diff] [review]
patch w/ sicking's comments, no async

mozilla/parser/xml/src/nsSAXXMLReader.h 	1.1
mozilla/Makefile.in 	1.342
mozilla/parser/htmlparser/src/Makefile.in 	1.103
mozilla/parser/htmlparser/src/nsParserModule.cpp 	1.61
mozilla/parser/xml/public/Makefile.in 	1.1
mozilla/parser/xml/public/nsISAXAttributes.idl 	1.1
mozilla/parser/xml/public/nsISAXContentHandler.idl 	1.1
mozilla/parser/xml/public/nsISAXDTDHandler.idl 	1.1
mozilla/parser/xml/public/nsISAXErrorHandler.idl 	1.1
mozilla/parser/xml/public/nsISAXLexicalHandler.idl 	1.1
mozilla/parser/xml/public/nsISAXLocator.idl 	1.1
mozilla/parser/xml/public/nsISAXMutableAttributes.idl 	1.1
mozilla/parser/xml/public/nsISAXXMLFilter.idl 	1.1
mozilla/parser/xml/public/nsISAXXMLReader.idl 	1.1
mozilla/parser/Makefile.in 	1.2
mozilla/parser/xml/Makefile.in 	1.1
mozilla/parser/xml/src/Makefile.in 	1.1
mozilla/parser/xml/src/nsSAXAttributes.cpp 	1.1
mozilla/parser/xml/src/nsSAXAttributes.h 	1.1
mozilla/parser/xml/src/nsSAXLocator.cpp 	1.1
mozilla/parser/xml/src/nsSAXLocator.h 	1.1
mozilla/parser/xml/src/nsSAXXMLReader.cpp 	1.1
mozilla/parser/xml/src/nsSAXLocator.cpp 	1.2
mozilla/parser/xml/src/nsSAXLocator.h 	1.2
mozilla/parser/xml/src/nsSAXXMLReader.cpp 	1.2
mozilla/parser/xml/src/nsSAXXMLReader.cpp 	1.3
mozilla/parser/xml/src/nsSAXXMLReader.cpp 	1.4
Attachment #218619 - Attachment is obsolete: true
(Assignee)

Comment 73

11 years ago
Created attachment 218679 [details] [diff] [review]
fix endNamespaceDecl

(In reply to comment #71)
>That won't work since nullChar goes out of scope before you use it. Isn't the
>problem that you've got the |if(!aPrefix)| test inversed?

You're right of course. I managed to botch my line numbers and didn't see it.
Attachment #218663 - Attachment is obsolete: true
Attachment #218664 - Attachment is obsolete: true
(Assignee)

Comment 74

11 years ago
Created attachment 218680 [details] [diff] [review]
fix endNamespaceDecl
Attachment #218679 - Attachment is obsolete: true
Attachment #218680 - Flags: review?(bugmail)
Attachment #218680 - Flags: superreview+
Attachment #218680 - Flags: review?(bugmail)
Attachment #218680 - Flags: review+
Comment on attachment 218680 [details] [diff] [review]
fix endNamespaceDecl

I checked this in.  Sorry about the boolean.  Too much code rearrangement in a hurry.  :(
Depends on: 334304
(Assignee)

Updated

11 years ago
Depends on: 334494
Can we mark this fixed now? Seems the remaining work will happen in separate bugs.
*** Bug 15090 has been marked as a duplicate of this bug. ***
This still needs a 1.8 branch-safe patch. 
Created attachment 220578 [details] [diff] [review]
1.8 branch patch

Here's the branch patch. I'll check this in tomorrow, after landing the patch for bug 326477.
Attachment #220578 - Flags: approval-branch-1.8.1+
(Assignee)

Comment 80

11 years ago
Created attachment 220751 [details] [diff] [review]
patch including fix from follow-on bug 334304
Attachment #220751 - Flags: approval-branch-1.8.1?(peterv)
Comment on attachment 220751 [details] [diff] [review]
patch including fix from follow-on bug 334304

We're not going to mix all patches into one. This one needs to go in first, then ask for branch approval in bug 334304.
Attachment #220751 - Flags: approval-branch-1.8.1?(peterv) → approval-branch-1.8.1-
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.