Closed Bug 123041 Opened 23 years ago Closed 12 years ago

Most of XML parsing time is taken by a slew of string routines

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE
Future

People

(Reporter: sfraser_bugs, Unassigned)

References

Details

(Keywords: perf)

Attachments

(3 files, 1 obsolete file)

Profiling startup with Chronoscope, I notice XML parsing taking quite a bit of 
time. Most of the time under XML_Parse() is taken by a slew of different string 
routines which nickel-and-dime us for time. I'll attach some data for the 
descendents of a single XML_Parse() call.
ack. Adding harish for parsing stuff.. I can't tell - is this a backtrace or
just a list of functions where we spend lots of time? it would be REALLY useful
to have 2-3 stack traces for the more common routines.
Some data highlights:
nsString::GetReadableFragment() takes 5.7% of the time (minus descendents), and 
nsASingleFragmentString::GetReadableFragment() 3.6%.

[Ignore ReadWriteProc, which is the routine that does the file IO: it was 
incompletely covered by this profile.]
This is just a flat list of the functions that take the most (exclusive) time 
during XML_Parse(). I'll have to hack on my profiling tool to print backtraces.
I think this was a .rdf file, so most of the calls are made via 
nsRDFContentSinkImpl::OpenObject() which often calls down into 
rdf_MakeAbsoluteURI. I'll attach some data from a XUL file parse.
Most of the XUL string calls come from functions called directly or indirectly 
from XULContentSinkImpl::AddAttributes().
I love the idea of fixing this, but I'm not sure when I'll have time for the
analysis. 0.9.9, P3.
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
The RDF-related calls are covered by bug 115342, although having this profile 
is handy.
Depends on: 115342
Attached patch minor string cleanup (obsolete) — Splinter Review
mostly this reduces the number of string copies that we do when looking up
entities.. not a huge cleanup, but stuff I've had in my tree for a few weeks.

can I get an r=/sr= from harishd/sfraser?
Comment on attachment 68033 [details] [diff] [review]
minor string cleanup

r=harishd
Attachment #68033 - Flags: review+
Comment on attachment 68033 [details] [diff] [review]
minor string cleanup

sr=sfraser
Attachment #68033 - Flags: superreview+
+  nsAString::const_iterator start;
+  PRUnichar theChar=*theStr.BeginReading(start);

A simpler way to write this is:

  PRUnichar theChar = theStr.First();



+    if(kQuote==*end) {
+      end.advance(-1);

The |end| iterator points right after the last character, not at the last
character in the string, so *end (if this is a zero terminated string) will give
you '\0', and if it's not zero terminated you may end up causing a memory access
fault. Just change it to:

  if (kQuote == theChar) {
    --end;

(and you could use |++start|, too).

Hmmm, looking at the original code it seems you'll need to change this part too:

  if(kSemicolon==theChar) {
    theLen--;
  }

So, something like:

CNamedEntity* nsDTDContext::GetEntity(const nsAString& aName) const {
  if (aName.Length() > 2) {
    nsAString::const_iterator start, end;
    aName.BeginReading(start);
    aName.EndReading(end);

    if (*start == kQuote)
      ++start;

    PRUnichar theChar = aName.Last();
    if (theChar == kSemicolon)
      --end;
    else if (theChar == kQuote)
      --end;

    // alternatively, you'd do something like:
    // --end;
    // if (*end != kSemicolon && *end != kQuote)
    //   ++end;

    PRInt32 theCount = mEntities.GetSize();
    const nsAString& entityName = Substring(start, end);
    for (PRInt32 theIndex = 0; theIndex < theCount; ++theIndex) {
      CNamedEntity *theResult = (CNamedEntity*)mEntities.ObjectAt(theIndex);
      if (theResult &&
          theResult->mName.Equals(entityName,
                                  nsCaseInsensitiveStringComparator())) {

          // a literal translation of the old code would require
          // taking Substring(theResult->mName, 0, entityName.Length())
          // but I suspect the old code was wrong since it would match
          // entity "a" against "abc", which I don't think was its intent.

        return theResult;
      }
    }
  }

  return 0;
}
Attachment #68033 - Flags: needs-work+
I discovered more exensive changes I could make, after taking jag's string
iterator fixes...

one more try for r=harishd or jag, sr=sfraser?
Attachment #68033 - Attachment is obsolete: true
Comment on attachment 68638 [details] [diff] [review]
more extensive changes

>Index: htmlparser/src/CNavDTD.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/htmlparser/src/CNavDTD.cpp,v
>retrieving revision 3.369
>diff -u -r3.369 CNavDTD.cpp
>--- htmlparser/src/CNavDTD.cpp	5 Feb 2002 02:02:16 -0000	3.369
>+++ htmlparser/src/CNavDTD.cpp	8 Feb 2002 23:14:51 -0000
>@@ -2128,8 +2128,9 @@
> 
>   nsresult  result=NS_OK;
> 
>-  nsAutoString theStr(aToken->GetStringValue());
>-  PRUnichar theChar=theStr.CharAt(0);
>+  const nsAString& theStr = aToken->GetStringValue();
>+  nsAString::const_iterator start;
>+  PRUnichar theChar=*theStr.BeginReading(start);

PRUnichar theChar = theStr.First();

>Index: htmlparser/src/nsDTDUtils.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/htmlparser/src/nsDTDUtils.cpp,v
>retrieving revision 3.123
>diff -u -r3.123 nsDTDUtils.cpp
>--- htmlparser/src/nsDTDUtils.cpp	5 Feb 2002 02:02:17 -0000	3.123
>+++ htmlparser/src/nsDTDUtils.cpp	8 Feb 2002 23:14:52 -0000
>@@ -857,9 +863,11 @@
>       theIncrValue=0;
>     }
>     else if(theKey.Equals(NS_LITERAL_STRING("format"), nsCaseInsensitiveStringComparator())){
>-      PRUnichar theChar=theValue.CharAt(0);
>+      nsAString::const_iterator start;
>+      
>+      PRUnichar theChar=*theValue.BeginReading(start);
>       if('"'==theChar)
>-        theChar=theValue.CharAt(1);
>+        theChar=*start++;

This dereferences first, then moves the iterator to the next position.
I guess what you want is *++start.

sr=jag with the above comments addressed.
Attachment #68638 - Flags: superreview+
Comment on attachment 68638 [details] [diff] [review]
more extensive changes

r=sfraser
Attachment #68638 - Flags: review+
ok, that patch has landed...thanks folks. Keeping open for more analysis, but
moving out to 1.0
Target Milestone: mozilla0.9.9 → mozilla1.0
Target Milestone: mozilla1.0 → mozilla1.1
so where are we with this? a new profile would be good.. if only simon wasn't on
vacation :)
woah, 1.1 came up quick. Throwing these bugs over to 1.2.. little or no work has
been done and there is no way that these bugs can be fixed in 1.1.
Target Milestone: mozilla1.1alpha → mozilla1.2alpha
pushing a few bugs out to 1.2beta - too much stuff on my 1.2alpha plate!
I could really use a new profile on this bug if anyone has quantify or this mac
profiler working at the moment..
Target Milestone: mozilla1.2alpha → mozilla1.2beta
mozilla 1.1alpha is more or less done, so I'm moving non-critical mozilla1.2beta
bugs out to the next milestone to make room for the mozilla1.1alpha bugs that
didn't make it.
Target Milestone: mozilla1.2beta → mozilla1.3alpha
Furturing bugs that keep getting knocked from milestone to milestone...if you
feel this is in error, please nominate this bug using the appropriate keyword.
Target Milestone: mozilla1.3alpha → Future
Assignee: alecf → nobody
Status: ASSIGNED → NEW
QA Contact: scc → xpcom
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: