Open Bug 197956 Opened 17 years ago Updated 4 months ago

Loading a 698KB XML file is very slow (pretty printed with XSLT)


(Core :: XSLT, defect)






(Reporter: code.5772156649, Unassigned)


(Depends on 2 open bugs, Blocks 3 open bugs)


(Keywords: perf, Whiteboard: [reflow-refactor]ongoing progress, work is done in dependent bugs)


(5 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3) Gecko/20030312
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3) Gecko/20030312

Loading a slightly large XML file (698KB in this case) is very slow. On my
laptop, here are times I got for load and display:
    * Mozilla: 5min10sec;
    * IE: 17sec;

Reproducible: Always

Steps to Reproduce:

Expected Results:  
Match IE's speed or be faster.

Please contact me and I'll email you my XML file.
Attached file Compressed XML file
Attachment #117559 - Attachment mime type: application/octet-stream → application/zip
OK, so if I turn off prettyprinting (by adding an HTML <a> to the XML) the
document loads in about 6 seconds.  With prettyprinting on, it takes forever (5
minutes?  I didn't time it).

The flat part of the profile is:

Total hit count: 11804
Count %Total  Function Name
6703   56.8     nsVoidArray::IndexOf(void *) const
184   1.6     SearchTable
182   1.5     nsCOMPtr_base::~nsCOMPtr_base(void)
127   1.1     nsGenericElement::QueryInterface(nsID &, void **)
111   0.9     nsXMLElement::QueryInterface(nsID &, void **)

(yes, that's 56.8% of time spent in nsVoidArray::IndexOf).

The callstack for IndexOf is:

nsVoidArray::IndexOf(void *) const called from
nsSmallVoidArray::IndexOf(void *) const called from
nsGenericContainerElement::IndexOf(nsIContent *, int &) const called from
nsGenericElement::GetNextSibling(nsIDOMNode **) and
nsGenericDOMDataNode::GetNextSibling(nsIDOMNode **) called from 
nsTextNode::GetNextSibling(nsIDOMNode **) and
nsXMLElement::GetNextSibling(nsIDOMNode **) both called from 
Node::getNextSibling(void) called mostly from  
txStepPattern::matches(Node *, txIMatchContext *)

The problem, of course, is that with array storage nextSibling() is O(N), and the
walk over all siblings in txStepPattern::matches is O(N^2) in the number of
children (and this XML file has a very shallow and wide DOM tree).
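The quadratic blowup described above can be sketched in a few lines. This is plain Python for illustration, not actual Mozilla code; the class and counter names are made up, and the counter stands in for the profiler's IndexOf hit count:

```python
# Sketch: a parent stores children in a flat array, so finding a node's
# next sibling requires an IndexOf-style linear scan. Walking all N
# siblings therefore does N scans of up to N entries: O(N^2) total work.

class Parent:
    def __init__(self, n):
        self.children = list(range(n))
        self.index_of_calls = 0   # stands in for profiler hits in IndexOf

    def index_of(self, child):    # O(N) linear scan per call
        self.index_of_calls += 1
        return self.children.index(child)

    def next_sibling(self, child):
        i = self.index_of(child)
        return self.children[i + 1] if i + 1 < len(self.children) else None

def walk_all_siblings(parent):
    node, visited = parent.children[0], 0
    while node is not None:
        visited += 1
        node = parent.next_sibling(node)   # O(N) each -> O(N^2) overall
    return visited

p = Parent(1000)
assert walk_all_siblings(p) == 1000
assert p.index_of_calls == 1000   # 1000 scans over a 1000-entry array
```

With a doubly linked sibling pointer (or a cached index), each next-sibling step would be O(1) and the whole walk O(N), which is the fix the dependent walker bug pursued.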
Ever confirmed: true
Keywords: perf
OS: Windows XP → All
Hardware: PC → All
My walker for the source file should make this faster.
Depends on: txwalker
Assignee: heikki → peterv
Component: XML → XSLT
QA Contact: ashishbhatt → keith
Summary: Loading a 698KB XML file is very slow → Loading a 698KB XML file is very slow (pretty printed with XSLT)
*** Bug 200875 has been marked as a duplicate of this bug. ***
*** Bug 206129 has been marked as a duplicate of this bug. ***
*** Bug 182710 has been marked as a duplicate of this bug. ***
Interesting too: displaying a 1.09MB file exported from Visio not only takes a
lot of time, it also eats memory:

Firebird 0.6                    300M RAM
Mozilla 1.4b+(Gecko/20030615)   177M RAM
IExplore 6                      126M RAM

Needless to say, IExplore displays it almost instantly (progressively)
while Mozilla locks up for quite a while. (Additionally, the stylesheet sucks;
I'll open another bug for that.)
The attachment in bug 208172 makes opening the testcase 6 (!!) times as fast.
*** Bug 213557 has been marked as a duplicate of this bug. ***
Loading an extremely simplistic 1.6MB XML file (i.e. lots of simple elements
with a maximum depth of only two elements) just took 1 hour and 23 minutes of
*CPU TIME* on a Pentium 4 at 2.4GHz. Mozilla also allocated 221MB of RAM to
manage 1.6MB of data. The file can be found at the URL below if you want to look
at it.

Based on my observations of Mozilla's behavior during loading, it appears that
the XML loader is repeatedly allocating memory in very tiny chunks. This
ridiculously slow performance is consistent with the performance I encountered
when writing my own XML library years ago and initially using the C++ STL. By
switching away from the STL and eliminating the tiny incremental allocations
through better suited memory management, the XML library I ultimately wrote
loads the file in a second or two. Hopefully, this info will prove helpful in
pinpointing the source of the problem for you guys. It's almost certainly the
memory management techniques being employed.
Rob: Are you sure that the time isn't spent performing XSLT on the document
rather than expat allocating memory? Executing the prettyprint XSLT stylesheet
is a few orders of magnitude slower than expat processing the document.

If you are sure that we're spending a lot of time on it, please file a separate
bug, as this bug is about time spent in prettyprint. And preferably contact the
expat people, since that is where the problem ultimately would be.
Jonas: I'm honestly not sure whether it's a prettyprint issue or an XML load
issue, and I leave for the airport in a few minutes, so I can't explore this
further until I get back. When I went to report the bug, I did a search and
encountered this one, so I assumed it was the same issue. In retrospect, I
didn't try to distinguish between an XML DOM issue and an XSLT issue. 

That being said, I very much DO believe it's a memory management issue from the
symptoms I witnessed. Watching Task Manager, Mozilla kept allocating memory in
very tiny chunks until the entire 220MB of RAM was consumed and the XML
displayed. This was the exact same behavior I encountered when writing my own
DOM a few years back, in which I was using the STL and appending characters to
an std::string as they were read in. The memory issue could reside within either
the core XML code or the XSLT code, but the evidence I saw definitely looks like
a memory utilization issue.

I'll try to look into this further when I get back from my trip...
I believe Expat's memory management is efficient, which you can witness by
loading the file with XMLHttpRequest, for example. You could also try turning
pretty print off, which should still be quite fast. I think the obvious thing to
investigate now is to look into the memory management inside Transformiix. Does
it use its own recycled memory pool (I seem to recall the HTML parser does
something like this)? If not, that might provide a big win.
Yes, transformiix recycles memory for many of the temporary objects being
created. The biggest exception is when using string classes, but I'm not sure if
there is a lot we can do about that.

The bulk of the memory consumed during prettyprinting is probably the wrappers
created for transformation (which will go away in bug 151002) and the
result-document. We might be able to reduce the size of the result-document, but
some layout guru would have to help out with that.

Anyhow, I don't think any analysis right now will be very productive since a lot
will change once the two bugs blocking this (bug 151002, bug 208172) are fixed.
We will probably see around a factor of 10 speedup from the two combined.
It's been almost a year now since this bug was created and it's still marked as
NEW.  This is a real problem.  I'm not so sure that this should wait until Bug
208172 gets fixed.  This could be a completely separate issue.  Loading a 400 K
XML file is taking 20-25 seconds on Mozilla and it loads in under 5 seconds in
IE.  Furthermore, since it loads progressively, the usability is also much better.
 Mozilla consumes over 35 MB to load the file and IE consumes around 30 MB. 
Both seem excessive.
So according to Milind's numbers Mozilla is 4-5 times slower than IE. That shows
a HUGE improvement from the situation where this bug was filed (Mozilla being
over 18 times slower than IE). So I'd say the people that are working on this
bug know what they are doing. Those of us not working on this bug should just
relax and wait for the good stuff... complaining that a bug is not yet fixed is
not helping.

Oh, and thanks for the great work! ;)
Two things:
1st, smaller testcases are likely to be relatively faster. Performance is 
measured in C*N^k, N being the size of the source. Size being very vague, as
some of the speed depends on the structure of the XML. Getting C down is nice,
getting k down is sexy. The walker bug 151002 improved on both, for example.
I'm afraid that we might still have one or two points in the code
where our k is bigger than that of IE, which is why smaller testcases don't behave
as badly as really huge ones.
2nd, Milind, we can't just take some magic powers and make the code obey your 
wishes. We need to analyse the code, testcases, stylesheets and extract specific
problems, design solutions (and agree on them :-() and code it. That's what 
dependent bugs are for. (Changed the whiteboard to make that more obvious.)

PS: Progressive output might come as part of bug 205778, which is going to 
improve perceived speed (yet another number, yay). Not 100% sure if this is
going to hit prettyprinting, and if so, whether immediately. Please don't make this bug
depend on bug 205778, as it does not improve the speed of the transformation.
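The C·N^k point above can be made concrete: from cost measurements at two input sizes you can estimate the exponent k, which tells you whether an optimization lowered the constant C (nice) or the exponent k (sexy). A small Python sketch with made-up numbers, purely for illustration:

```python
# Estimate the exponent k in cost ~= C * N^k from two measurements:
# k = log(t2/t1) / log(n2/n1).
import math

def estimate_k(n1, t1, n2, t2):
    return math.log(t2 / t1) / math.log(n2 / n1)

# A quadratic algorithm: doubling N quadruples the cost.
assert round(estimate_k(1000, 4.0, 2000, 16.0), 2) == 2.0

# Halving C (4.0 -> 2.0, 16.0 -> 8.0) leaves k at 2: C went down, k didn't.
assert round(estimate_k(1000, 2.0, 2000, 8.0), 2) == 2.0

# An algorithmic fix that makes the cost linear lowers k to 1.
assert round(estimate_k(1000, 4.0, 2000, 8.0), 2) == 1.0
```

This is also why small testcases can look deceptively fine: at small N a larger k barely shows, but it dominates on really huge files.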
Whiteboard: ongoing progress, work is done in dependent bugs
Axel, my wish is pretty straightforward, I'd think.  The same as the wish of the
person who opened this bug a year ago.  Make it run as fast or faster than IE.

As long as it's sitting waiting on an XPATH bug to get fixed, the code is not
being analysed, designed, coded.  My initial feeling was that perhaps it's the
stylesheet or schema validation in the XML file that is taking time.  But it's
not.  I removed the stylesheet and the schema reference from the XML file and
the performance does not change.  I don't know where XPATH comes into this.  But
I feel just displaying such a small XML file should not take this long.
> As long as it's sitting waiting on an XPATH bug to get fixed, the code is
> not being analysed, designed, coded.

Since pretty-printing uses XSLT I really don't know how we could make it faster
without making XPath faster. I think everyone working on transformiix has
invested a lot of time to make it faster and we have had good results. If that's
not enough for you, I'm sorry, but there are no more hours in my day. I'm giving up
this bug; maybe someone else has enough free time to fix it by yesterday.
Assignee: peterv → nobody
Target Milestone: --- → Future
Assignee: nobody → nobody
I see.  I was wondering what the connection with XPath was.

But yes.  It's not good enough for me or anyone else who has to load an XML file
in Mozilla.  Not by a long shot.  Until this bug is fixed, I'll have to load XML
files in IE.  

I'm not dumping on you Peter, but just as managing priorities while devoting
limited resources to development is a reality, so is the fact that pretty
printing of XML files can certainly not be considered a feature of Mozilla.
I have forked some ideas about the bloat into bug 231925.
Depends on: 231925
*** Bug 237181 has been marked as a duplicate of this bug. ***
*** Bug 229729 has been marked as a duplicate of this bug. ***
*** Bug 237798 has been marked as a duplicate of this bug. ***
*** Bug 255625 has been marked as a duplicate of this bug. ***
*** Bug 247340 has been marked as a duplicate of this bug. ***
*** Bug 295746 has been marked as a duplicate of this bug. ***
*** Bug 300788 has been marked as a duplicate of this bug. ***
*** Bug 299658 has been marked as a duplicate of this bug. ***
Seems like the most relevant bug to mention this. I just (unknowingly) tried to load a large (14 MB) XML file. Firefox completely locked up and had to be killed.

The file was an iTunes music library file, and is available:
Try using a trunk build, it should be a lot faster there.
I'd say the crash may be related. Adding to dependencies; correct me if I am wrong (I've never used them).
Blocks: 325456
Jeremy: Sorry, didn't read your comment thoroughly. This bug is about performance. If you see crashes or hangs please file a separate bug.
No longer blocks: 325456
I can't agree with the numbers. On an AMD 2700+, a 1.5 MB XML file (even gzipped) took a minute to parse. I guess that is fair, but one minute of no response is not good.

Firefox should:
1) Give you a warning, and an option to download the file instead of having it opened.
2) Parse the file in a low-priority thread, so the user is able to continue working in the other tabs.
3) Use a progress bar to show how long the parsing has come.
No longer depends on: 351740
Depends on: 351740
*** Bug 199810 has been marked as a duplicate of this bug. ***
Now that a lot of the dependencies have been fixed, it would be interesting to see what a new profile looks like.
This is just here for my reference; using this depends on the XML file being in a particular place, etc.
Attachment #241120 - Attachment is patch: false
Attachment #241120 - Attachment mime type: text/plain → application/zip
That most recent profile is in a build with the patches for bug 221335 and bug 355246 in it.  Total hit count: 346356.  I'm sampling at 8192 Hz, so this is 42 seconds or so.  That's a huge win.  That's about 50% smaller than the time at which I got tired and killed the app the last time I profiled this (see bug 351725).

More importantly, only 68017 of the hits are under nsXMLContentSink::MaybePrettyPrint.

Other significant contributors are (all numbers under, not in):
 110828 ViewportFrame::Reflow
  82427 nsCSSFrameConstructor::ContentInserted
  76749 nsCSSFrameConstructor::RestyleEvent::Run

These four numbers add up to 96% of the total hits.  The rest is likely parsing, waiting on the disk, etc, etc.

My first question is why we're posting a restyle event here at all.  It looks like we post it when the stylesheets finish loading.  I wonder whether we could hold off frame construction till after that's done in this case...

Given the use of tables here, the reflow stuff probably depends on the reflow branch.  Should certainly reprofile once that's landed.

The frame construction code is about 60% style resolution.  The only thing that will help there is having fewer nodes...

For MaybePrettyPrint itself, bug 351731 is still biting (of the 68017 MaybePrettyPrint hits, 5284 are under HTMLCaseSensitiveAtomTagToId).  Another 6000 or so are under the Observe() call on the binding, inserting nodes (mostly the recursive BindToTree call, I think).  Again, fewer nodes would help.  Atom array parsing when setting attributes is another 9000 hits or so.  The NS_NewAtom thing kinda ends up biting because we're in fact not passing in flat strings...

One of the XSLT folks should look at the rest of the XSLT stuff to see if anything obvious jumps out.
Whiteboard: ongoing progress, work is done in dependent bugs → [reflow-refactor]ongoing progress, work is done in dependent bugs
Wow, this looks way different. We should be a lot faster now.

I basically can't see anything major we can do on the XSLT side to make this faster. Seems like most of the time is spent in layout rendering the page, and in content just building the contentmodel.

I'm very confused by the numbers for nsGeneric(HTML)Element::BindToTree. How could this be:

82467 306    4628 nsGenericHTMLElement::BindToTree(nsIDocument*, ...
            28263 nsGenericElement::BindToTree(nsIDocument*, nsIContent*, ...
              334 nsGenericHTMLElement::ReparseStyleAttribute()

How can we spend more time in a subcall than in the function itself? And nsGenericElement::BindToTree shows the same thing, but in reverse. Are there really almost 30000 hits in BindToTree? If so, that's almost 50% of all the time spent in XSLT, which doesn't seem right.
Btw, I think the next major thing to do would be to fix bug 231925.
I think the biggest win on the XSLT engine side would probably be txStylesheet::findTemplate. It's probably not going to make a big difference here, a few percent or so, but it'd be a good idea in general. I filed bug 355322 on it.
Depends on: 355322
> I'm very confused by the numbers for nsGeneric(HTML)Element::BindToTree.

The numbers reported by jprof for recursive things like that are a little weird.  In brief, ignore them.  ;)  The numbers on the first line there (4628 hits under BindToTree, of which 306 were in the function itself) are correct, as are the other numbers under BindToTree.  Also, note that I would think at least half of the BindToTree time probably comes from the call the XBL makes to insert the new content into the DOM...

> We should be a lot faster now.

Yep.  ;)

> I basically can't see anything major we can do on the XSLT side

Short of preventing what is effectively a "flash of unstyled content" due to stylesheets loading after we called StartLayout() here, I agree.  And that's not strictly "XSLT side" anyway.

There may be minor XSLT things that could help a tad, but I think we should try to address the other issues first before they really become visible.

And yes, fixing bug 231925 should help a lot.  If we can go from 5 HTML nodes per tag to 4 that's a 20% speedup, generally speaking.  ;)  It'd help across the board, with style resolution, reflow, content model construction, etc.
> > I basically can't see anything major we can do on the XSLT side
> Short of preventing what is effectively a "flash of unstyled content" due to
> stylesheets loading after we called StartLayout() here, I agree.  And that's
> not strictly "XSLT side" anyway.

Yup, I would say that's the prettyprinter's job. We do the right thing if you do a transformation using <?xml-stylesheet?>.
Actually, that last profile is not quite right.  There's another 40 seconds or so spent on the reflow triggered by the sheet loads....  Probably a lot better on reflow branch, but still not likely to be completely gone.

So we really should work on holding StartLayout for the sheet loads.  Is there a bug filed on that?
Filed bug 380612 on holding off reflow for sheet loads.  The profile with the patch for that bug of the testcase in bug 291643 shows the following breakdown:

734093 total hits.  Of these:
   33326 hits parsing the original XML
  191185 hits under txMozillaXSLTProcessor::TransformToFragment
  215691 hits under frame construction for the prettyprinted document
  258716 hits under reflow of said frame tree

And some minor stuff.

The only place frame construction happens is from InitialReflow(), called from the end of the stylesheet load.  The only place the reflow happens is from the flush we do before firing the onload event.  So we're no longer duplicating various work.  Reflow may or may not get helped by the textframe landing (about 50% of the reflow time is in reflowing nsTextFrames).

A bit over 50% of the frame construction time is style resolution (with about 1/3 of that being probing for various pseudo-style-contexts).  But even something basic like constructing the nsTextFrame objects takes about 1.5-2 seconds over here (out of a total time of 60 seconds).  It's just a big file.  :(
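As a quick sanity check of the breakdown above, the four listed buckets account for roughly 95% of the 734093 total hits, with reflow the largest single share. A small Python check (the labels just mirror the list; nothing here is Mozilla code):

```python
# Verify the profile breakdown: the four buckets cover ~95% of all hits.
total = 734093
buckets = {
    "parsing the original XML": 33326,
    "txMozillaXSLTProcessor::TransformToFragment": 191185,
    "frame construction": 215691,
    "reflow": 258716,
}
covered = sum(buckets.values())
assert covered == 698918
assert round(100 * covered / total) == 95            # the rest is the "minor stuff"
assert round(100 * buckets["reflow"] / total) == 35  # reflow is the biggest bucket
```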
Attachment #264721 - Attachment is patch: false
Attachment #264721 - Attachment mime type: text/plain → application/zip
On the XSLT side I can't really see anything. Almost all the time is spent in the output handler creating nodes or inserting them in the DOM. Mostly the latter.

We do spend some time looking for mutation handlers which seems like a waste, we should be able to not notify which would avoid that check. But it's minor stuff, 14095 hits.
So a current profile shows us split about evenly between frame construction, reflow, and nsXMLPrettyPrinter::PrettyPrint.

Of this last, about 1/6 is the BindToTree for that big DOM.  The rest is txMozillaXSLTProcessor::TransformToFragment.

Not sure what we could do to improve that part, and we're sort of reaching diminishing returns there anyway...  In any case, the last profile I posted shows more or less the same thing, so anyone who wants to optimize the XSLT part can look at what that shows.
Yeah, i think we should mark this one fixed. We should be a ton faster now than we were when this bug was opened.
The speed is quite a bit better than it was 4 years ago, that's for sure, but Firefox still quits responding and uses way too many CPU cycles to open such a small file.

I downloaded the file (a 1.6MB XML file) to my desktop, closed Firefox, re-opened Firefox, dragged the XML file onto Firefox, and it took roughly 25 seconds to open the file; Firefox was NOT responsive during this time.

I have an Intel 975XBX2 motherboard + Intel Core 2 Quad with 4 cores running at 2.8GHz each, 2GB of DDR2-667 and 2x 320GB Maxtor SATA drives in RAID-0 configuration, so there should be no reason why it takes 25 seconds to open a 1.6MB file on this machine.
Was the test in comment 52 performed with a trunk build?
I also think it should be a prerequisite for this bug that Firefox shouldn't be unresponsive no matter the size of the file.
A progress bar would be nice.
I'm sorry Boris, but no, it was not. It was performed with the regular release.

(In reply to comment #53)
> Was the test in comment 52 performed with a trunk build?

Brian, I checked in several patches on trunk that significantly improve memory usage and performance on these sorts of testcases in the last several weeks.  Please try a trunk build, ok?  Testing this with branch is pretty much pointless now.

Trunk still won't be "fast", but it'll be faster.
Bug 231925 would win some 5%, and bug 355322 some 2-3%.

We might also save some additional 2-3% by setting aNotify to false when generating the DOM. I'm not sure what the consequences of that would be though.

Other than that, I can't really think of anything off the top of my head.
There are some addrefs/releases from nsCOMArrays in xslt... I wonder whether we can use nsTArray<nsIContent> in some of those places safely.
Don't think we can. Scripts can execute during the creation of the result page.
No longer depends on: 289150
Duplicate of this bug: 289150
I noticed I was voting for this bug and it's still marked NEW, so I'll post an update.  Firefox 3.1b3pre build 20090218033015 displays the Compressed XML File linked @ almost instantly.
Right, but if I copy it a few times so it becomes 2.7 MB, it takes a good 10 seconds on my machine, and in that time Firefox is absolutely unresponsive - even the tab-throbber stops.
A progress bar would be good as a fallback when even the best code isn't fast enough.
Thomas, mind compressing your test file and attaching it to this bug?
Oh, and as a note, some of the code involved just runs to completion, which means the progress bar just wouldn't update.  That includes the XSLT (25% of total time), frame construction (25% of total) and reflow (50% of total).  Reflow is going to become interruptible pretty soon here; that might help responsiveness a tad, at least in terms of making the browser respond to a "close this tab" faster.

The file is 1.53MB in size. Loading it from local disk takes about 6 seconds to completely load (Quad Core Q9450 @ 2.66GHz + 4GB DDR2). During that time, Firefox 3.1b3pre Build 20090218033015 with no other tabs open used 248MB of RAM and 100% of 1 CPU core.
QA Contact: keith → xslt
(In reply to comment #66)
> is 1.53MB in size.
> Loading it from local disk takes about 6 seconds to completely load.
I can reproduce that in 5.0. And while loading, Firefox is frozen.
This problem seems to affect Thunderbird as well.  (I am using Thunderbird 38.2.0 on openSUSE 13.2 for x86-64.)  Someone recently e-mailed me a 500 KB XML file; Thunderbird attempts to display the attachment in the message itself and takes 30 seconds to do so, during which time it is completely unresponsive.
Bug 736951 is possibly related or a duplicate.
Blocks: 736951
I'm experiencing a similar issue on Thunderbird 38.6.0 on both Linux and Windows, both 64 bit. This issue has been happening for a very long time now, probably close to 2 years. In my case I'm receiving an email from a web server that contains information about a PHP error. I think that in my case the problem is caused by a large serialized PHP object. I don't get this error often, which is why I've refrained from reporting this earlier. The longest line in the email is 70223 characters long. I see CPU usage go to 100% for as long as 2 minutes when Thunderbird opens the email. The UI is completely blocked during this time.

I'll attach a sample to this bug.
Please file a separate bug on that. This bug is about slowness when rendering XML files. The file that you attached is a Text file, not an XML file.

So not related to the problem in this bug.
I tried to find the most relevant bug possible instead of opening a new one. There's a common factor here, and that seems to be long lines. Granted, I'm not familiar with the Thunderbird code, but if I had to guess, I'd say there's a pretty good chance both these issues are caused by the same problem. Feel free to move it though. I've done my part.
Blocks: 1245453