Closed Bug 105137 Opened 23 years ago Closed 23 years ago

Eliminate Tokens and ParserNodes from XML parsing

Categories

(Core :: DOM: HTML Parser, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: dp, Assigned: harishd)

References

(Depends on 1 open bug)

Details

(Keywords: perf)

Attachments

(12 files, 1 obsolete file)

215.83 KB, patch
Details | Diff | Splinter Review
234.47 KB, patch
Details | Diff | Splinter Review
238.69 KB, patch
Details | Diff | Splinter Review
380.83 KB, patch
Details | Diff | Splinter Review
381.76 KB, patch
Details | Diff | Splinter Review
402.58 KB, patch
Details | Diff | Splinter Review
401.69 KB, patch
hjtoi-bugzilla
: review+
Details | Diff | Splinter Review
21.28 KB, patch
jag+mozilla
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
5.77 KB, patch
harishd
: review+
Details | Diff | Splinter Review
504 bytes, patch
harishd
: review+
Details | Diff | Splinter Review
5.23 KB, text/html
Details
5.23 KB, text/html
Details
Creation of Tokens and ParserNodes while parsing XML seems to be expensive
(about 3% of startup).

The plan is to eliminate them and hookup expat directly to the contentsinks.
Blocks: 104957
Keywords: perf
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.6
OS: Windows 2000 → All
Hardware: PC → All
Need time to address script/style loading. Pushing to 0.9.7.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Attached patch patch v1.0Splinter Review
Needs cleanup ( especially the sinks ) / regression testing.
Two initial comments on the XBL content sink portion.

"const nsAutoString key(aAtts[0]);"

The above does a copy, doesn't it?  It seems like you should be using
nsDependentString instead of nsAutoString.

I see this pattern a fair bit too:

name? &nsDependentString(name):0

The XBL APIs took strings because that's what they got back from the old content
sink.  They actually would rather have had PRUnichars* anyway, so rather than
doing this complicated conditional to pass in to the XBL APIs, you could just
patch the XBL APIs.  If you want, you can give me a bug on this, and I'll do it
after you land.
hyatt: If you noticed I've have used nsDependentString where ever possible. In
some places I've retained "const nsAutoString key(aAtts[0]);" because it
involves changing more APIs and was planning on leaving it to the module owner.
The same applies to name? &nsDependentString(name):0. Yes, I totally agree that
XBL APIs should be changed to accept PRUnichar*. I'll open up a bug on that
after I land my changes.
const nsAutoString is new.  It is something you introduced with your patch that
has added an extra copy.

The old way was this:

const nsAReadableString& key = aNode.GetKeyAt(i);

which didn't copy.
Comment on attachment 59936 [details] [diff] [review]
patch v1.1 [ minor changes to nsExpatDriver & nsXBLContentSink ]

There are a bunch of places early in the file where you're being
inconsistent about tabs vs. spaces.  If you look at the attachment
using Mozilla you'll see them easily.

Also search the patch for these:
>\ No newline at end of file
and fix them so you don't break HP.


>+nsresult
>+RDFContentSinkImpl::PushNameSpacesFrom(const PRUnichar** aAttributes)
>+{
>+  // Remember the current top of the stack as the namespace
>+  // scope. When popping namespaces, we'll remove stack elements
>+  // until we hit this.
>+  mNameSpaceScopes.AppendElement(mNameSpaceStack);
>+
>+  while (*aAttributes) {
>+    nsAutoString key(aAttributes[0]);

This is an example of a place where you're better off using
|const nsDependentString|.  It really shouldn't be hard to change APIs
from const nsString& to const nsAString& if needed.

>+nsresult
>+RDFContentSinkImpl::OpenProperty(const PRUnichar* aName, const PRUnichar** aAttributes)
>+{
>+    nsresult rv;
>+
>+    // an "object" non-terminal is either a "description", a "typed
>+    // node", or a "container", so this change the content sink's
>+    // state appropriately.
>+    const char* nameSpaceURI;
>+    nsCOMPtr<nsIAtom> tag;
>+    ParseTagString(aName, &nameSpaceURI, getter_AddRefs(tag));
>+
>+    nsCAutoString propertyStr(nameSpaceURI);
>+
>+    const PRUnichar *attrName;
>+    tag->GetUnicode(&attrName);
>+
>+    propertyStr += NS_ConvertUCS2toUTF8(attrName);

You're better off moving the declaration of propertyStr down to here
and writing it as:

nsCAutoString propertyStr(nsDependentString(nameSpaceURI) +
			  NS_ConvertUCS2toUTF8(attrName));

so that you don't need to allocate twice (if nameSpaceURI is long).
dbaron, hyatt: I just realized that my last patch is sliglty obsolete than what
I have in my tree. I've replaced nsAutoString with nsDependentString where ever
possible ( there may be a few places that still needs cleanup though ).
dbaron: I'm not sure how the tabs got introduced. 
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

Have you tested XSLT, MathML and SVG? For XSLT, just go to 
http://www.mozilla.org/projects/xslt/ and try the sample.
MathML you would need to build first and then try a sample,
check out http://www.mozilla.org/projects/mathml/

Also, have you submitted an enhancement bug request to the Expat
Sourceforge project with only the expat part of the change? Please
do (you might want to also point them here for better larger
picture). I believe they would be the experts in reviewing that
part of the change.

>Index: expat/xmlparse/xmlparse.c
>===================================================================
>-	if (startElementHandler)
>+  if (startElementHandler) {
> 	  startElementHandler(handlerArg, name.str, (const XML_Char **)atts);
>+    if (blocked) {
>+      return XML_ERROR_PARSER_BLOCKED;
>+    }
>+  }

Indentation looks weird (although it is bizarre throughout Expat)...

>Index: htmlparser/public/nsIExpatSink.idl
>===================================================================
>+/* ***** BEGIN LICENSE BLOCK *****
>+ * Version: NPL 1.1/GPL 2.0/LGPL 2.1

Do we really want NPL/GPL? Shouldn't this be MPL/GPL? Are there
any precedents like this? This block is in all the new files.

> !ifdef MOZ_DEBUG
>-	nsLoggingSink.cpp   \
>-	nsHTMLNullSink.cpp  \
>+	nsLoggingSink.cpp    \
>+	nsHTMLNullSink.cpp   \
> !endif
> 	$(NULL)
> 
>@@ -73,6 +74,7 @@
> 	.\$(OBJDIR)\nsDTDUtils.obj      \
> 	.\$(OBJDIR)\nsHTMLTokenizer.obj \
> 	.\$(OBJDIR)\nsExpatTokenizer.obj\
>+  .\$(OBJDIR)\nsExpatDriver.obj   \
> 	.\$(OBJDIR)\nsElementTable.obj  \
> 	.\$(OBJDIR)\CNavDTD.obj         \
> 	.\$(OBJDIR)\COtherDTD.obj       \

Shouldn't the logging and null sink be also included in the obj list
in debug cases?

>Index: htmlparser/src/nsExpatDriver.cpp
>===================================================================
>+/***************************** EXPAT CALL BACKS *******************************/
>+
>+#ifdef __cplusplus
>+extern "C" {
>+  void Driver_HandleDefault(void *aUserData, 
>+                            const XML_Char *aCData, 
>+                            int aLength) 
>+  {

Use of aCData parameter name is confusing here, just aData or
something would probably be better...

I will continue but decided to give you some feedback as I go 
(it is a pretty big patch...)
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: htmlparser/src/nsExpatDriver.cpp
>===================================================================

>+static const char* kDTDDirectory = "dtd/";
>+
>+#if defined(XP_PC)
>+#pragma warning( disable : 4275 )
>+#endif

What is that warning we are disabling, and why?

>+/***************************** EXPAT CALL BACKS *******************************/
>+
>+#ifdef __cplusplus
>+extern "C" {
>+#endif

Why did you put the function bodies inside #ifdef __cplusplus ? I think the
old code just put the headers outside and that seemed to go ok with
all compilers.

>+nsExpatDriver::nsExpatDriver()
>+  :mExpatParser(0), 
>+   mSink(0), 
>+   mBytesParsed(0),
>+   mInDoctype(0), 
>+   mInCData(0), 
>+   mInternalState(NS_OK)
>+{

Please reorder these; the members should be initialized in the same
order as they were declared.

Will continue...
Also please test loading XML as data. Try these XMLHttpRequest GET tests
http://www.mozilla.org/xmlextras/tests.html and then a part of James Clark's XML
parser test suite modified to run via the web using XMLHttpRequest at
http://green.nscp.aoltw.net/heikki/oasis/bug57673.html

(hmm, seems like it dies on the latter at least...)
Problem loading http://green.nscp.aoltw.net/heikki/oasis/bug57673.html is fixed in 
my tree.
 
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

Have you run the new code with Purify? I think it would be a 
good thing to do to catch potential off by one errors and
the like.

There are several places where I mention you could get rid of
some variables; I understand that itis easier to debug when
they are there, and if they are in code path that is only
used when we report XML errors I am fine leaving them as is.

>Index: htmlparser/src/nsExpatDriver.cpp
>===================================================================
>+    PRUint32 attrlen = XML_GetSpecifiedAttributeCount(mExpatParser) / sizeof(PRUnichar);
>+    PRUint32 index   = XML_GetIdAttributeIndex(mExpatParser);
>+    PRUint32 lineNo  = XML_GetCurrentLineNumber(mExpatParser);
>+    
>+    mInternalState = mSink->HandleStartElement(aValue, aAtts, attrlen, index, lineNo);

Since you only use the above variables once, there is no need to have
them...

>+    // XXX what do we do with CRLFs?
>+    mInternalState = mSink->HandleCharacterData(aValue, aLength);

What do you mean by that comment?

>+    const PRUnichar* tag = (const PRUnichar*)XML_GetMismatchedTag(mExpatParser);
>+    description.Append(NS_LITERAL_STRING(". Expected: "));
>+    description.Append(NS_LITERAL_STRING("</"));
>+    description.Append(tag);

You could also get rid of the tag variable since it is only used once.

>+    PRInt32 byteIndexRelativeToFile = 0;
>+    byteIndexRelativeToFile = XML_GetCurrentByteIndex(mExpatParser);
>+    GetLine(aBuffer, aLength, (byteIndexRelativeToFile - mBytesParsed), sourceLine);

Likewise you can get rid of the variable...

>+  PRInt32 lineNumber   = XML_GetCurrentLineNumber(mExpatParser);
>+  PRInt32 colNumber    = XML_GetCurrentColumnNumber(mExpatParser) + 1; // Adjust the column number so that it is one based rather than zero based.
>+  PRUnichar* sourceURL = (PRUnichar*)XML_GetBase(mExpatParser);
>+
>+  ReportError(mSink, description.get(), sourceLine.get(), sourceURL, lineNumber, colNumber);

And here as well, you can get rid of these variables.

>+nsresult 
>+nsExpatDriver::ParseBuffer(const char* aBuffer, PRUint32 aLength, PRBool aIsFinal)
>+{
>+  nsresult result = NS_OK;
>+  NS_ASSERTION((aBuffer && aLength) || (aBuffer == nsnull && aLength == 0), "?");

NS_ENSURE_TRUE so we also bail out in release builds?

>+         mBytesParsed += mPosition = (XML_GetCurrentByteIndex(mExpatParser) - mBytesParsed);

This looks a bit compact, maybe break it into two lines?

>+	  mBytesParsed += aLength; 
>+    mPosition = 0;

Weird indentation.

>+  while (start != end) {
>+    PRUint32 fragLength = PRUint32(start.size_forward());
>+    PRUint32 bufLength = fragLength * sizeof(PRUnichar);
>+    const PRUnichar* expatBuffer = start.get();
>+    
>+    mInternalState = ParseBuffer((const char *)expatBuffer, bufLength,aFlushTokens);

You could get rid of the bufLength variable.

>+    if (NS_FAILED(mInternalState)) {
>+      if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK) {
>+        aScanner.SetPosition(start.advance(mPosition / 2),PR_TRUE);
>+        aScanner.Mark();
>+      }
>+      return mInternalState;
>+    }
>+
>+    start.advance(fragLength);
>+  }
>+
>+  aScanner.SetPosition(end, PR_TRUE);
>+  
>+  if(NS_SUCCEEDED(mInternalState)) {
>+    return aScanner.Eof();
>+  }

I think you need to explain this part to me.


>+        aParserContext.SetMimeType(NS_ConvertASCIItoUCS2(kXMLTextContentType));

SetMimeType(NS_LITERAL_STRING("text/xml")) or something to avoid conversion?


>+NS_IMETHODIMP
>+nsExpatDriver::DidTokenize(PRBool aIsFinalChunk)
>+{
>+  return ParseBuffer(nsnull, 0, aIsFinalChunk);
>+} 

I guess you should explain this to me.

>+NS_IMETHODIMP  nsExpatDriver::Terminate(nsIParser* aParser)
>+{
>+  return mInternalState;
>+}

Does this mean it will no longer be possible to terminate the
parse? What about autoloading XLinks that will need to terminate
the parse from the XML content sink?

>+/*************************** Unused methods ***************************************/ 

Please return NS_OK in these methods instead of 0 if the signature
says nsresult.

>+NS_IMETHODIMP_(PRBool)
>+nsExpatDriver::CanContain(PRInt32 aParent,PRInt32 aChild) const
>+{
>+  return NS_ERROR_NOT_IMPLEMENTED;
>+}

Boolean return?

>+/******************************************************************************/
>\ No newline at end of file

???

>Index: htmlparser/src/nsExpatDriver.h
>===================================================================
>+#ifndef NS_EXPAT_DRIVER__
>+#define NS_EXPAT_DRIVER__

Double underscores are verboten by C++ ;)

>+  static nsresult OpenInputStream(const XML_Char* aURLStr, const XML_Char* aBaseURL, nsIInputStream** in, nsString* aAbsURL);
>+  static nsresult LoadStream(nsIInputStream* in, PRUnichar* &uniBuf, PRUint32 &retLen);
>+  nsresult HandleStartElement(const PRUnichar *aName, const PRUnichar **aAtts);
>+  nsresult HandleEndElement(const PRUnichar *aName);
>+  nsresult HandleCharacterData(PRUnichar *aCData, PRUint32 aLength);
>+  nsresult HandleComment(const PRUnichar *aName);
>+  nsresult HandleProcessingInstruction(const PRUnichar *aTarget, const PRUnichar *aData);
>+  nsresult HandleDefault(const PRUnichar *aData, PRUint32 aLength);
>+  nsresult HandleStartCdataSection();
>+  nsresult HandleEndCdataSection();
>+  nsresult HandleStartDoctypeDecl(const PRUnichar *aDoctypeName);
>+  nsresult HandleEndDoctypeDecl();
>+  nsresult ParseBuffer(const char* aBuffer, PRUint32 aLength, PRBool aIsFinal);
>+  nsresult HandleError(const char *aBuffer, PRUint32 aLength, PRBool aIsFinal);

Please check all these method implementations: do they always return
NS_OK? If so, make return type void.

>\ No newline at end of file

???

Will continue...
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: htmlparser/src/nsExpatTokenizer.h
>===================================================================

Why do we still have nsExpatTokenizer.*?

>Index: htmlparser/src/nsLoggingSink.cpp
>===================================================================
>-static NS_DEFINE_IID(kISupportsIID, NS_ISUPPORTS_IID);
>+static NS_DEFINE_IID(kIExpatSinkIID, NS_IEXPATSINK_IID);

>-  if (aIID.Equals(kISupportsIID)) {
>-    nsISupports* tmp = this;
>+  if (aIID.Equals(kIExpatSinkIID)) {
>+    nsIExpatSink* tmp = this;

Er... shouldn't an impl always QI to nsISupports? Also you wouldn't
necessarily need to define those statics, you could just use
the macro NS_GET_IID(<interface>).

>Index: htmlparser/src/nsParser.cpp
>===================================================================
>-            mUnusedInput.Truncate(0);
>+           mUnusedInput.Truncate(0);

Indentation?

Will continue from...

>Index: htmlparser/src/nsParser.h
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: htmlparser/src/nsParser.h
>===================================================================
>+    PRPackedBool        mDTDVerification;
>+    PRPackedBool        mParserEnabled;
>+    PRPackedBool        mObserversEnabled;
>     PRPackedBool        mPendingContinueEvent;
>     PRPackedBool        mCanInterrupt;

Minor point, but you could save 4 more bytes by inroducing one "flags"
variable and setting/checking bits on it. Not by any means necessary.

>Index: htmlparser/src/nsParserFactory.cpp
>===================================================================
>+    nsresult rv = NS_NewXMLLLoggingSink(&cs);

Is there one 'L' too many?

>+    if (NS_OK != rv) {

And shouldn't this been replaced with NS_FAILED?

Will continue with

>Index: htmlparser/src/nsParserModule.cpp
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: htmlparser/src/nsViewSourceHTML.cpp
>===================================================================
>-NS_IMETHODIMP CViewSourceHTML::WillResumeParse(void){
>+NS_IMETHODIMP CViewSourceHTML::WillResumeParse(nsIContentSink* aSink){
>   nsresult result = NS_OK;
>   if(mSink) {
>     result = mSink->WillResume();

Does the view source DTD need the mSink member anymore? Have you tested
that view source works still?

>Index: htmlparser/src/nsWellFormedDTD.cpp
>===================================================================
>+PRBool  
>+CWellFormedDTD::IsBlockElement(PRInt32 aTagID,PRInt32 aParentID) const 
>+{
>+  return PR_FALSE;
>+}
>+
>+PRBool  
>+CWellFormedDTD::IsInlineElement(PRInt32 aTagID,PRInt32 aParentID) const 
>+{
>+  return PR_FALSE;
> }

Are these supposed to be called? If not, you might want to add an
assertion if someone tried that.

>Index: htmlparser/tests/logparse/TestParser.cpp
>===================================================================
>-nsresult ParseData(char* anInputStream,char* anOutputStream) {
>+nsresult ParseData(char* anInputStream,char* anOutputStream, char* aChoice) {

Space after comma.

>Index: htmlparser/tests/logparse/TestParser.pl
>===================================================================
>-     system("$drive\mozilla/dist/WIN32_D.obj/bin/TestParser.exe $input $output[0].b");
>+     system("s:/mozilla/dist/WIN32_D.obj/bin/TestParser.exe $input $output[0].b $ARGV[2]");

>-     system("$drive\mozilla/dist/WIN32_D.obj/bin/TestParser.exe $input $output[0].v");
>+     system("s:/mozilla/dist/WIN32_D.obj/bin/TestParser.exe $input $output[0].v $ARGV[2]");

Did you just hardcode the drive and path, it doesn't feel nice...?

Will continue from

>Index: content/xbl/src/nsXBLContentSink.cpp
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

Please remember to remove the old code that you still seem to have around
for perf testing; pull a new tree for comparison or something.

>Index: content/xbl/src/nsXBLContentSink.cpp
>===================================================================
>+  nsresult rv = nsXMLContentSink::HandleStartElement(aName,aAtts,aAttsCount,aIndex,aLineNumber);

Space after comma. Please check all changed files...

>+    const nsAReadableString& key = nsDependentString(aAtts[0]);
>+
>+    SplitXMLName(key, getter_AddRefs(nameSpacePrefix),
>+                 getter_AddRefs(nameAtom));

There are several instancies of this kind of pattern where you could
probably get rid of the 'key' variable because it is only used once.
It is likely the compiler will optimize these away but you could 
do it as well.

>\ No newline at end of file

Add newline...

Will continue with

>Index: content/xml/document/src/nsXMLContentSink.cpp
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: content/xml/document/src/nsXMLContentSink.cpp
>===================================================================
>+  NS_INTERFACE_MAP_ENTRY(nsIExpatSink)
> 	NS_INTERFACE_MAP_ENTRY(nsIObserver)

Whitespace?

>+      domComment->AppendData(nsDependentString(aName));

>+      domCDATA->AppendData(nsDependentString(aData));

Please do SetData(), we just created the node.

>+  nsAutoString data(aData); // XXX replace this with nsDependentString. Change nsParserUtils APIs
>+  
>+  // ParseProcessingInstruction(text, target, data);
>+  result = NS_NewXMLProcessingInstruction(getter_AddRefs(node), target, data);

You can get rid of the data variable since it is used only once. Also you can
loose
the comment, right?

>+nsXMLContentSink::PushNameSpacesFrom(const PRUnichar** aAtts)
>+        if (*start == ':') {
>+          ++start;
>+
>+          prefixAtom =
>+            dont_AddRef(NS_NewAtom(nsDependentSubstring(start, end)));

Could you please test what happens if we have an element like this:

<foo xmlns:="bar"/>

I suspect the nsDependentSubstring will either hang, crash or give you
a value you didn't expect. Maybe we should add a check after ++start
that if (start != ebnd), then do prefixAtom?

>+// XXX Code copied from nsHTMLContentSink. It should be shared.
>+nsresult
>+nsXMLContentSink::AddAttributes(const PRUnichar** aAtts,

I guess you should either remove or modify that comment now...

>+  while (*aAtts) {
>+    // Get upper-cased key
>+    const nsAReadableString& key = nsDependentString(aAtts[0]);

And what is it with that upper-cased key? I don't think it is uppercased.
Same in XBL content sink. Remove those?

>+      nsresult rv = xmlcontent->MaybeTriggerAutoLink(mWebShell);
>+      if (rv == NS_XML_AUTOLINK_REPLACE ||
>+          rv == NS_XML_AUTOLINK_UNDEFINED) {
>+        // If we do not terminate the parse, we just keep generating link trigger
>+        // events. We want to parse only up to the first replace link, and stop.
>+        mParser->Terminate();

So does this now work in your tree?

>\ No newline at end of file

Add newline.

Will continue with

>Index: content/xul/document/src/nsXULContentSink.cpp
Removed nsIExpatTokenizer, CWellFormedDTD, nsExpatTokenizer.
Out of time --> 0.9.8
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Blocks: 110464
No longer blocks: tec-osx
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: content/xul/document/src/nsXULContentSink.cpp
>===================================================================

We are missing a PR_LOGGING sections in this file - were they added
after you attached this patch or what is the deal?

>-    static PRBool IsDataInBuffer(PRUnichar* aBuffer, PRInt32 aLength);
>+    /**** BEGIN NEW APIs ****/
>+    static PRBool IsDataInBuffer(PRUnichar* aBuffer, PRInt32 aLength);
>+    /**** END NEW APIs ****/

Hmm, IsDataBuffer() seems like an old function so move it outside of
the new API block?

>+    if (FindInReadable(NS_LITERAL_STRING("xul-overlay"), tmpStart = targetStart, targetEnd)) {

Uuh... that assignment in the parameter list looks dangerous, if you
really meant it I'd rather see a separate line for it!

>+      nsParserUtils::GetQuotedAttributeValue(data, NS_LITERAL_STRING("href"), href);

You have a perf bug 77791 to fix GetQuotedAttributeValue() (there
are several things broken in it, I think you should pull it to
0.9.8 or something...)

>+XULContentSinkImpl::PushNameSpacesFrom(const PRUnichar** aAttributes)

It looks like the impl of this method is different in XML and XUL content sink,
shouldn't they be the same? I think I would prefer the one in XML sink.

>+XULContentSinkImpl::ParseTag(const PRUnichar* aText, 

This method could be made faster by using substring I think, 
you could then avoid some string copies and Left & Cut operations.

>+#ifdef PR_LOGGING
>+        anodeC.AssignWithConversion(aNode.GetText());
>+        PR_LOG(gLog, PR_LOG_ALWAYS,
>+               ("xul: unable to create element '%s' at line %d",
>+                NS_STATIC_CAST(const char*, anodeC),
>+                aNode.GetSourceLineNumber()));*/
>+#endif

Check the code inside the #ifdefs that you normally won't compile,
seems like there are still some refs to parser nodes in there.

>+          nsAutoString  type(aAttributes[1]);
>+          SplitMimeType(type, mimeType, params);

Could you can get rid of type variable if it is only used once,
to avoid copy?

>+        nsAutoString  lang(aAttributes[1]);
>+        isJavaScript = nsParserUtils::IsJavaScriptLanguage(lang, &jsVersionString);

Could you can get rid of lang variable if it is only used once,
to avoid copy?

>+XULContentSinkImpl::AddAttributes(const PRUnichar** aAttributes, 
>+  PRInt32 count = aAttrLen;

Loose the count variable and use aAttrLen directly instead.

>+          /*nsCAutoString qnameC;
>+          qnameC.AssignWithConversion(qname);
>+          PR_LOG(gLog, PR_LOG_ALWAYS,
>+                 ("xul: unable to parse attribute '%s' at line %d",
>+                  (const char*) qnameC, aNode.GetSourceLineNumber()));*/

Why this commented out section?

>\ No newline at end of file

Add newline.

Will continue with (last file, phew!)

>Index: rdf/base/src/nsRDFContentSink.cpp
Comment on attachment 60239 [details] [diff] [review]
patch v1.2 [ slightly cleaned up. Could be cleaned up further ]

>Index: rdf/base/src/nsRDFContentSink.cpp
>===================================================================
> static NS_DEFINE_IID(kIContentSinkIID,         NS_ICONTENT_SINK_IID); // XXX grr...
>+static NS_DEFINE_IID(kIExpatSinkIID,           NS_IEXPATSINK_IID);
> static NS_DEFINE_IID(kIRDFDataSourceIID,       NS_IRDFDATASOURCE_IID);
> static NS_DEFINE_IID(kIRDFServiceIID,          NS_IRDFSERVICE_IID);
> static NS_DEFINE_IID(kISupportsIID,            NS_ISUPPORTS_IID);

You could get rid of these by using NS_GET_IID() macro.

>+       AddRef();

Hmm... it seems we are not logging addref/release in RDF content
sink. Maybe we should?

>+RDFContentSinkImpl::HandleCDataSection(const PRUnichar *aData, 
>+                                       PRUint32 aLength)
>+{
>+    return NS_OK;

Shouldn't we start supporting CData sections in 
RDF as well, this should be trivial to add?

>+RDFContentSinkImpl::PushNameSpacesFrom(const PRUnichar** aAttributes)

Once again check if we could have just one implementation of
PushNameSpacesFrom().

>+       // PR_LOG(gLog, PR_LOG_ALWAYS,
>+       //        ("rdfxml: expected RDF:RDF at line %d",
>+       //         aNode.GetSourceLineNumber()));

Loose/change these kinds of comments now that we don't have line
number information available?

>+        NS_NewRDFContainer(getter_AddRefs(c));
>+        c->Init(mDataSource, container);

Add out of memory check.

THE END.
>Shouldn't the logging and null sink be also included in the obj list
>in debug cases?
They are.

>+NS_IMETHODIMP  nsExpatDriver::Terminate(nsIParser* aParser)
>+{
>+  return mInternalState;
>+}

>Does this mean it will no longer be possible to terminate the
>parse? What about autoloading XLinks that will need to terminate
>the parse from the XML content sink?

Good point. I've made the necessary changes ( refer to patchv1.4/1.5 ).

>Index: htmlparser/src/nsLoggingSink.cpp
>===================================================================
>-static NS_DEFINE_IID(kISupportsIID, NS_ISUPPORTS_IID);
>+static NS_DEFINE_IID(kIExpatSinkIID, NS_IEXPATSINK_IID);

>-  if (aIID.Equals(kISupportsIID)) {
>-    nsISupports* tmp = this;
>+  if (aIID.Equals(kIExpatSinkIID)) {
>+    nsIExpatSink* tmp = this;

>Er... shouldn't an impl always QI to nsISupports? Also you wouldn't
>necessarily need to define those statics, you could just use
>the macro NS_GET_IID(<interface>).

Please ignore logging sink and TestParser.cpp/TestParser.pl. It shouldn't have
been a part of this patch.

>Does the view source DTD need the mSink member anymore? Have you tested
>that view source works still?

Yes, it still does require mSink :-(

>+    const nsAReadableString& key = nsDependentString(aAtts[0]);
>+
>+    SplitXMLName(key, getter_AddRefs(nameSpacePrefix),
>+                 getter_AddRefs(nameAtom));

>There are several instancies of this kind of pattern where you could
>probably get rid of the 'key' variable because it is only used once.
>It is likely the compiler will optimize these away but you could 
>do it as well.

No, 'key' is used in more than one place.


>+  nsAutoString data(aData); // XXX replace this with nsDependentString. Change
nsParserUtils APIs
>+  
>+  // ParseProcessingInstruction(text, target, data);
>+  result = NS_NewXMLProcessingInstruction(getter_AddRefs(node), target, data);

>You can get rid of the data variable since it is used only once. Also you can
>loose the comment, right?

No, can't do it. Since it would require changing nsParserUtils API and I'm not
planning on changing those APIs for now.

>+      nsresult rv = xmlcontent->MaybeTriggerAutoLink(mWebShell);
>+      if (rv == NS_XML_AUTOLINK_REPLACE ||
>+          rv == NS_XML_AUTOLINK_UNDEFINED) {
>+        // If we do not terminate the parse, we just keep generating link trigger
>+        // events. We want to parse only up to the first replace link, and stop.
>+        mParser->Terminate();

>So does this now work in your tree?
Yes it does.

>+XULContentSinkImpl::PushNameSpacesFrom(const PRUnichar** aAttributes)

>It looks like the impl of this method is different in XML and XUL content sink,
>shouldn't they be the same? I think I would prefer the one in XML sink.

Left as an excersise for the module owner. Note: I haven't introduced anything new.

>+XULContentSinkImpl::ParseTag(const PRUnichar* aText, 
>This method could be made faster by using substring I think, 
>you could then avoid some string copies and Left & Cut operations.

May be. Again this not as a result of my change.

- In xmlparse.c:

@@ -1378,6 +1409,9 @@
 	    return result;
 	  startElementHandler(handlerArg, tag->name.str, (const XML_Char **)atts);
 	  poolClear(&tempPool);
+    if (blocked) {
+      return XML_ERROR_PARSER_BLOCKED;
+    }

Fix the tabs to match the other code (the code uses tabs, yikes, so
you should too). There's lots of stuff like this in the xmlparse.c
changes.

- In CNavDTD.h, don't undo the |virtual foo| to |NS_IMETHOD_(foo)|
changes that I made a while back :-)

- In htmlparser/src/makefile.win, don't mess up the indentation.

- In nsExpatDriver.cpp, do we need the null aUserData checks in all
the Driver_ callbacks? Can that ever happen with a properly set up
parser? Maybe just leave the asserts in place and take out the if
checks?

Also, shouldn't all those callbacks be static,
i.e. PR_STATIC_CALLBACK(_type_)?

- In nsExpatDriver::HandleStartElement(), what's up with the
XML_GetSpecifiedAttributeCount(mExpatParser) / sizeof(PRUnichar)
stuff, why divide the count with size of PRUnichar? Doesn't seem to
make any sense to me. If this is correct, it needs a big comment
explaining what the deal is.

- In nsExpatDriver::HandleComment():

+    mDoctypeText.Append((PRUnichar*) aValue);

aValue is a const PRUnichar *, why the cast? Same thing in
nsExpatDriver::HandleDefault().

- In nsExpatDriver::HandleDefault(), what exactly is the newline code
supposed to do? Convert \n\r to \n\n and drop the rest of the
characters? Seems odd, and when exactly is ::HandleDefault() called? A
comments explaing that is needed.

- In nsExpatDriver::HandleStartDoctypeDecl(), aDoctypeName is
completely ignored, intentionally?

- In IsLoadableDTD(), could you make the code not use a in/out
parameter and use one in parameter, and one out parameter in stead?
Our inout param story is pretty weak, so don't use them unless
necessary, and nsCOMPtr<...> arguments are just scary.

Also:

+        nsSpecialSystemDirectory
dtdPath(nsSpecialSystemDirectory::OS_CurrentProcessDirectory);
+        nsString path; path.AssignWithConversion(kDTDDirectory);        
+        path.AppendWithConversion(fileName.get());
+        dtdPath += path;

Couldn't this be written as:

+        nsSpecialSystemDirectory
dtdPath(nsSpecialSystemDirectory::OS_CurrentProcessDirectory);
+        dtdPath += NS_ConvertASCIItoUCS2(nsDependentCString(kDTDDirectory) +
+                                         fileName);

to avoid unnecessary allocations and string copying? Same code appears
in nsExpatTokenizer.cpp.

- In nsExpatDriver::OpenInputStream():

+        char* absURL = nsnull;
+        uri->GetSpec(&absURL);
+        aAbsURL->AssignWithConversion(absURL);
+        nsCRT::free(absURL);

Make absURL a nsXPIDLCString, and the aAbsURL should really be a
nsAString& and not a nsString pointer. Change that and make the code
use CopyASCIItoUCS2() in stead of AssignWithConversion().

There's a few more methods that take nsString*/& arguments, change
those to nsAString& where possible.

- In nsExpatDriver::ParseBuffer():

+        mPosition = (XML_GetCurrentByteIndex(mExpatParser) - mBytesParsed);

This is what we talked about, right? mPosition is a character
positioin, and mBytesRead is the byte position. You need to divide by
sizeof(PRUnichar) here, or deal with it elsewhere. Also, make sure
these positions work correctly with unicode byteorder marks in the
beginning of the input stream.

Also, bad indentation towards the end of that method.

- In nsExpatDriver::GetLine(), could use some cleaning up, I don't see
the need for the endIndex and startIndex locals, use the pointers and
compare the pointers, if (start == end) is the special case, and do
aLine.Assign(start, end - start). Actually, there's no need for the
special case, you could do the assignment in that case too, it'll
assign an empty string into aLine, just what you want. Then you could
loose the aLine.Truncate() call too.

- Would it make sense to make one or more of mLastLine, mDoctypeText,
mCDataText nsAutoString's in stead of nsString's?

- In nsLoggingSink::QueryInterface():

-  if (aIID.Equals(kISupportsIID)) {
-    nsISupports* tmp = this;
+  if (aIID.Equals(kIExpatSinkIID)) {
+    nsIExpatSink* tmp = this;

Didn't you mean to add the kIExpatSinkIID and not replace the
nsISupports case? Also, don't define the IID's as static kIFooIID's,
use the NS_GET_IID(nsIFoo) macros.

- In nsParser.cpp:

@@ -1835,7 +1835,7 @@
             // as if it was read from the input stream. 
             // Adding UngetReadable() per vidur!!
             mParserContext->mScanner->UngetReadable(mUnusedInput);
-            mUnusedInput.Truncate(0);
+           mUnusedInput.Truncate(0);

?

- In nsParserFactory.cpp, use NS_DEFINE_CID() to define CID's, don't
use NS_DEFINE_IID() for those. Also, don't use NS_DEFINE_IID() to
define static IID's, use NS_GET_IID(interface).

- In nsXBLContentSink.cpp:

There's quite a few places where you do:

-    const nsAReadableString& key = aNode.GetKeyAt(i);
+    const nsAReadableString& key = nsDependentString(aAtts[0]);

I'm not sure that this is safe (it probably is in this case), this
will create a reference to a temporary whose lifetime might be limited
to this line of code. You should do this in stead:

-    const nsAReadableString& key = aNode.GetKeyAt(i);
+    const nsDependentString key(aAtts[0]);

There's *lots* of occurances of the above pattern, look for them all
and change them, no need for the reference to the temporary when you
can create a local nsDependentString and use that directly, that's
what you really want in this case anyway.

Then the code goes on and does:

     if (key.Equals(NS_LITERAL_STRING("event")))

This is sub-optimal, we already have the key as an atom, so compare
against a static atom, there might already be one in nsXBLAtoms, if
not, add one. I.e.:

     if (nameAtom == nsXBLAtoms::event)

If you don't want to do this here, file a new bug (and CC me and hyatt).

You might want to consider changing all the:

  while (*aAtts) {
    ...
    if (...) {
      aAtts += 2;
      continue;
    }
    ...
    aAtts += 2;
  }

loops to:

  for (; *aAtts; aAtts += 2) {
    ...
    if (...) {
      continue;
    }
    ...
  }

to eliminate the possibility of a infinite loop if someone forgets a
aAtts+=2 somewhere in one of those loops in the future.

- In nsXBLContentSink::AddAttributesToXULPrototype(), towards the end,
4 space indentation

- Remove nsIXMLContentSink.h from the repository.

- In nsXMLContentSink.cpp, GetDocTypeToken() really should return a
nsDependentSubstring() and the implementation should return a
substring of the string that's passed in in stead of cutting and
trimming the string that's passed in. This would be faster and
cleaner.

- ParseProcessingInstruction() could also need some new string code
love, but it's not obvious what the best thing to do there is.

- In nsXMLContentSink::LoadXSLStyleSheet(), get rid of the utf8
nsAutoString, use NS_NAMED_LITERAL_STRING() and make the methods we
pass the string to take const nsAString& if they don't already.

Also:

+  nsCOMPtr<nsIStreamListener> sl;
+  rv = parser->QueryInterface(NS_GET_IID(nsIStreamListener),
(void**)getter_AddRefs(sl));

use do_QueryInterface().

- In nsXMLContentSink::HandleCDataSection():

+      domCDATA->SetData(nsDependentString(aData));

You have the length of the string aData too, so pass it to
nsDependentString() so that it doesn't have to measure the length of
aData.

- Fix the indentation in content/xul/document/src/makefile.win

- In nsXULContentSink::ReportError() (and in
nsXMLContentSink::ReportError() too):

+  const PRUnichar* atts[] = {NS_LITERAL_STRING("xmlns").get(),
+                            
NS_LITERAL_STRING("http://www.w3.org/1999/xhtml").get(),
+                             nsnull};

Pointers to internals of temporary objects, use
NS_NAMED_LITERAL_STRING() outside the initializer of the atts array
and pass in those named literals to the array initializer. I'm pretty
sure this would result in a crash, or incorrect behavior on Linux
where NS_LITERAL_STRING's are nsAutoString's.

- In XULContentSinkImpl::OpenTag(), please fix the PR_LOGGING code you
commented out before checking in. Likewise in
XULContentSinkImpl::AddAttributes().

- In all the RDF changes, stick to 4 space indentation like the rest
of the code does.

- In RDFContentSinkImpl::OpenMember(), fix the PR_LOG code in stead of
commenting it out.

Fix all that and I'll sr...
> I'm not sure that this is safe (it probably is in this case), this
> will create a reference to a temporary whose lifetime might be limited
> to this line of code. You should do this in stead:

Binding a temporary to a reference ensures that the lifetime of the temporary
does not end before the reference goes out of scope.  (But it's true that
there's no point doing it when you don't need to, and it is dangerous when the
temporary is constructed from intermediate temporaries that are not bound to
references).
> In CNavDTD.h, don't undo the |virtual foo| to |NS_IMETHOD_(foo)|
> changes that I made a while back :-)

I use |virtual foo| for non interface methods. Correct me if that's not the
convention.

>Also, shouldn't all those callbacks be static,
>i.e. PR_STATIC_CALLBACK(_type_)?

May be it should. But it wasn't before and therefore I didn't make that change.

>In nsExpatDriver::HandleStartElement(), what's up with the
>XML_GetSpecifiedAttributeCount(mExpatParser) / sizeof(PRUnichar)

Donno what I was smoking :(. It should read 
XML_GetSpecifiedAttributeCount(mExpatParser) / 2; // since name-value pairs are
ordered in a linera fashion.

>In nsExpatDriver::HandleDefault(), what exactly is the newline code
>supposed to do? Convert \n\r to \n\n and drop the rest of the
>characters? Seems odd, and when exactly is ::HandleDefault() called?

Just accounts for newlines. Doesn't do any compression ( hoping expat parser
would do the compression for us. If not, there is code in the content sink ).

Note: The code is not new ( except for the check for '\r'). Differs slightly.
Before 
------
void Tokenizer_HandleDefault(void *userData, const XML_Char *s, int len) {
  XMLParserState* state = (XMLParserState*) userData;
  if (state->indoctype) {
    state->doctypeText.Append((PRUnichar*)s, len);
  }
  else {
    nsAutoString str((PRUnichar *)s, len);
    PRInt32 offset = -1;
    CToken* newLine = 0;
    
    while ((offset = str.FindChar('\n', PR_FALSE, offset + 1)) != -1) {
      newLine = state->tokenAllocator->CreateTokenOfType(eToken_newline,
eHTMLTag_unknown);
      nsExpatTokenizer::AddToken(newLine, NS_OK, state->tokenDeque,
state->tokenAllocator);
    }
  }
}

After
-----
nsresult 
nsExpatDriver::HandleDefault(const PRUnichar *aValue, 
                             const PRUint32 aLength) 
{
  NS_ASSERTION(mSink, "content sink not found!");

  if (mInDoctype) {
    mDoctypeText.Append(aValue, aLength);
  }
  else if (mSink) {
    static const PRUnichar newline[] = {'\n','\0'};
    for (PRUint32 i = 0; i < aLength && NS_SUCCEEDED(mInternalState); i++) {
      if (aValue[i] == '\n' || aValue[i] == '\r') {
        mInternalState = mSink->HandleCharacterData(newline, 1);
      }
    }
  }
  
  return NS_OK;
}

>In nsExpatDriver::HandleStartDoctypeDecl(), aDoctypeName is
>completely ignored, intentionally?

Have removed it. It's not necessary.

>nsExpatDrive, IsLoadableDTD()
>nsXMLContentSink.cpp, GetDocTypeToken()
>ParseProcessingInstruction()
>nsXMLContentSink::LoadXSLStyleSheet(),
Not as a result of new changes. Will do the clean up of the routine post landing.

















Forgot to mention:
Please ignore changes to nsLoggingSink and nsParserFactory. Those aren't real
changes.
Comment on attachment 63308 [details] [diff] [review]
patch v1.6 [ includes jst's suggestions ]

You could still change this to NS_LITERAL_STRING to avoid copy, no? 

+	
aParserContext.SetMimeType(NS_ConvertASCIItoUCS2(kXMLTextContentType));

There is also still this dangerous looking thing:

+    if (FindInReadable(NS_LITERAL_STRING("xul-overlay"), tmpStart =
targetStart, targetEnd)) {

I just checked the substantive comments I made to see if you fixed them. With
the two exceptions above it looks good, so provided you do something about them
r=heikki. I did not check the complete patch this time, nor did I check the
comments jst made.
Attachment #63308 - Flags: review+
Comment on attachment 63308 [details] [diff] [review]
patch v1.6 [ includes jst's suggestions ]

sr=jst
Attachment #63308 - Flags: superreview+
filed as bug 118907 for compile errors with Sun CC. Please fix. Thank you.

The startup time on facedown tinderbox dropped by 9.5%.
But Txul and Tp on comet, btek and sleestack went up, as did leaks.
Jag: does page load tests include xml documents? If it doesn't then my changes
should not affect page load.
page load tests do not include any xml documents. 

I'll note that there are two indirect ways that this (or whatever) 
could affect page load time: 

1) there is some interaction with the chrome during page load, so things that 
   affect chrome may affect page load time (but since all that chrome should
   be in the XUL cache by the time this test is run, I don't quite see how 
   parser changes would be in play).

2) Leaks. I've seen it before where just leaking will result in slower times, 
   particularly if the leak keeps happening during the execution of the test 
   (i.e., it gets progressively slower).
Comment on attachment 64107 [details] [diff] [review]
patch v1.7 [ to fix ports bustage. However, this breaks bugzilla pages. Needs work ] 

r=jag

Note that the IsEmpty() check isn't quite the same as a null check on a
pointer, IsEmpty() catches |!ptr && !*ptr|, but in this case that seems fine,
though I'd like hyatt to confirm that.
Attachment #64107 - Flags: review+
On irc harish mentioned that with the above patch backed out he's still seeing
bugzilla weirdness.
Ths OS/2 and AIX bustage is a little different.

It appears that:

NS_DECL_NSIEXPATSINK is not getting properly expanded.

I looked in LXR, and I don't see a #define for NS_DECL_NSIEXPATSINK.

Can someone tell me where this magic happens?
NS_DECL_NSIEXPATSINK is defined by the .h file generated by the IDL compiler.
My fix for the AIX and OS/2 bustage removes all the [const] modifiers from
nsIExpatSink.idl.  For an |in string|, adding [const] does nothing to the C++
signature.  For an |in long|, it adds |const| to the C++ signature, which
*should* not change the real signature at all (since the toplevel cv-qualifiers
are ignored for the signature since they're just local to the function), but AIX
and OS/2 don't think so.  So the patch removes the |const| from the PRInt32
parameters in the signatures which was used in some places and not others.
The problem with nebiros is that its "cache" is trying to build files that
aren't in the makefiles anymore.  (That might be helped by the cvs-removal of
nsExpatTokenizer.cpp and perhaps nsWellFormedDTD.cpp.)

So that means there are only two pieces of bustage on Ports.  One should be
fixed by attachment 64107 [details] [diff] [review], and the other should be fixed by attachment 64139 [details] [diff] [review].
Attachment #64139 - Flags: review+
Attached patch leak fixSplinter Review
Should this be in the destructor or in DidBuildModel?  I put it in the
destructor, which hopefully gets called soon enough.
DTOR is fine. That's where it was in nsExpatTokenizer.

nsExpatTokenizer::~nsExpatTokenizer(){
  if (mExpatParser) {    
    XML_ParserFree(mExpatParser);
    mExpatParser = nsnull;
  }
  
  if (mState)
    delete mState;
}
Attachment #64158 - Flags: review+
Comment on attachment 64107 [details] [diff] [review]
patch v1.7 [ to fix ports bustage. However, this breaks bugzilla pages. Needs work ] 

>Index: src/nsXBLContentSink.cpp
>+  NS_NewXBLPrototypeHandler(event, phase, action, command, keycode, charcode, modifiers, button, clickcount, getter_AddRefs(newHandler));

Watch the long line.

r=dbaron (and marking jag's review as an sr=, since this is a bustage fix
anyway)
Attachment #64107 - Flags: superreview+
I am crashing at startup in the new code (on Win2K, Modern skin, clobber-build)
due to a null pointer deference in:

nsXBLContentSink::HandleEndElement() 
{
[...]
      if (mState == eXBL_InImplementation && tagAtom ==
nsXBLAtoms::implementation)
	mState = eXBL_InBinding;
      else if (mState == eXBL_InBinding && tagAtom == nsXBLAtoms::binding) {
	mState = eXBL_InDocument;
CRASH>>>mBinding->Initialize();
	mBinding = nsnull; // Clear our current binding ref.
      }
[...]

Anybody is seeing that too? Or is there something that I am missing?
Yes. Refer to bug 119068. 

btw, its fixed.
update content/xbl/src/nsXBLContentSink.h
Comment on attachment 64206 [details]
stack trace after crashing with the new code

I pulled the fix that was checked in and it has resolved the problem.
Attachment #64206 - Attachment is obsolete: true
FYI: Attachement 64413 is as a result of fix in bug 119038 (
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=64414 ).
Ignore attachments 64411 & 64413. For some reason the build with expat changes
but without the recent patch ( in bug 119038 ) seems to produce the same result.
Ts on commet and sleestack dropped by 4% and 5% respectively ( after fixing bug
119038 ). However, I don't see a significant change in Ts/Txul :-(
Correction: In my last comment Ts, in Ts/Txul, should be replace with Tp.
Out of time. Mass move to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Marking this bug fixed. For Tp/Txul issue refer to bug 119066.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Making fixed, code checked into cvs (Rev 1.9) 

Status: RESOLVED → VERIFIED
Depends on: 330426
This bug may have caused bug 330426.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: