Assignee: firefox → peterv
Component: General → XSLT
Product: Firefox → Core
QA Contact: general → keith
Version: unspecified → 1.7 Branch
So on trunk, the test url spins the CPU for a good long time... I profiled it; the results are: Total hit count: 287165 Count %Total Function Name 18700 6.5 _PR_x86_AtomicIncrement 18533 6.5 nsGenericElement::GetAttrNameAt(unsigned, int*, nsIAtom**, nsIAtom**) const 14208 4.9 nsAttrAndChildArray::GetSafeAttrNameAt(unsigned) const 13682 4.8 PR_AtomicIncrement 12244 4.3 _PR_x86_AtomicDecrement 9816 3.4 nsCOMPtr_base::~nsCOMPtr_base() 9754 3.4 __i686.get_pc_thunk.bx 9618 3.3 nsAttrAndChildArray::IndexOfChild(nsIContent*) const 8767 3.1 txXPathTreeWalker::moveToValidAttribute(unsigned) 8265 2.9 txNameTest::matches(txXPathNode const&, txIMatchContext*) 7673 2.7 AtomImpl::Release() 7508 2.6 AtomImpl::AddRef() 7179 2.5 nsCOMPtr_base::begin_assignment() 6343 2.2 nsAttrAndChildArray::MappedAttrCount() const 6072 2.1 txXPathNodeUtils::getLocalName(txXPathNode const&) 5363 1.9 PR_AtomicDecrement etc. More interestingly: Total hits: 287165 Hits under AtomImpl::AddRef: 31302 Hits under AtomImpl::Release: 23173 Almost all the addrefs are from nsGenericElement::GetAttrNameAt (and the releases are when the COMPtr we got the atom into goes away). In fact, total hits under GetAttrNameAt: 71493 These break down as: 18533 in nsGenericElement::GetAttrNameAt 27586 AtomImpl::AddRef() 12652 nsAttrAndChildArray::GetSafeAttrNameAt(unsigned) const 8939 PR_AtomicIncrement 1155 nsAttrAndChildArray::MappedAttrCount() const 1049 __i686.get_pc_thunk.bx 815 _init 764 __i686.get_pc_thunk.bx For a more top-down view, of the 287165 total hits we have 188281 under FilterExpr::evaluate and 98804 under PathExpr::evaluate. These both end up going through RelationalExpr::evaluate (286539 hits under here). Then I get lost in the maze of XSLT code. I'll attach a profile, but maybe we can start by deCOMtaminating getAttrNameAt? Maybe it can return a struct that holds two nsIAtom pointers? Or return a weak nsIAtom and out another weak one, with some clear documentation? Or even return an nsAttrName* (and null for bogus incoming index)?
Some of this is bug 208172. But then we may want to have a good nonCOM method to get the local name of an atom. I did an lxr search on call sites to GetAttrNameAt, and far from all use the prefix, so having a non-addref only-localname and namespace ID method might be worth it. One thing that we may optimize right away inside tx code would be to get a qname struct in txNameTest::matches. Because actually, we end up calling GetAttrNameAt twice, once for the namespace, once for the localname. So for each attr, we end up with 2(4) addrefs-release cycles instead of - none with a special entry point - 1(2) with a qname struct. The numbers in braces are for namespaced attrs, which we don't have in this case.
Status: UNCONFIRMED → NEW
Depends on: 208172
Ever confirmed: true
Would most of that particular problem be solved by txNamedAttributeStep?
Actually, the GetAttrNameAt problem is worsened by the fact that module txXPathTreeWalker::moveToNextAttribute will call GetAttrNameAt checking that the attribute isn't an xmlns one. That would be solved by txNamedAttributeStep, but an alternative solution is to move the xmlns checks to a more optimized place making it the responsibility of the caller of moveToNextAttribute to filter them out.
Not really. http://seanwhalen.home.comcast.net/xslt_test/RevealBombs.xsl has expressions like "//click[last()]/@h" "//square[@h=$hClick and @v=$vClick]" "//square[ (@isRevealed = 0 and not (@sqID = $alreadyRevealed/@sqID )) ]" so txNamedAttributeStep is just part of it. It may help a few parts, but the real buggers are inside the // part and the simple predicates that don't need the whole context.
In this nice little thing: <xsl:variable name = "revealing" select = "$field[ @isBomb != -1 and @isRevealed = 0 and ( (concat(@h -1 ,'/', @v ) = $zeros/@sqID) or (concat(@h +1 ,'/', @v ) = $zeros/@sqID) or (concat(@h -1 ,'/', @v -1) = $zeros/@sqID) or (concat(@h -1 ,'/', @v +1) = $zeros/@sqID) or (concat(@h ,'/', @v ) = $zeros/@sqID) or (concat(@h +1 ,'/', @v +1) = $zeros/@sqID) or (concat(@h +1 ,'/', @v -1) = $zeros/@sqID) or (concat(@h ,'/', @v +1) = $zeros/@sqID) or (concat(@h ,'/', @v -1) = $zeros/@sqID) )] " /> I bet we would be immensly helped by txNamedAttributeStep since we'd in all the attrsteps inside the concat would just end up calling getAttr rather then moveToNamedAttr etc. The @sqID call is trickier to optimize though since that could potentially return a nodeset with many nodes, in which case we actually do need to walk to the attribute node (note how comparisons with nodesets with multiple nodes work). What we could do is to figure out the case when the nodeset only contains one node. But that would be very tricky in this case and would have to be done at runtime. I.e. we'd have to check that the variable only contains one node and that that node does in fact have an sqID attribute. This seems like it in most cases would just be a waste of cycles compared to actually just getting the nodeset. Hmm.. one possibility is to have some function that returns either a nodeset if the result contains multiple nodes. Or returns a string if the result will only contain one node. Something like Expr::evaluateToMultiNodeSetOrString. Again, might be adding a lot of code for a small gain. A better approach is probably to make the @sqID evaluate as quickly as possible. One thing to do is to have an early-bail in moveToNamedAttr that calls hasAttr and bails if that returns false. Another thing would be to add nsIContent::IndexOfAttr(PRInt32, nsIAtom*), that would help enormously but is fairly xslt specific. We could also of course try to detect that there are several identical "$zeros/@sqID" calls and break that out and just evaluate once. This might be hard to do without spending way too much time comparing expressions though. And you could argue that in the most cases the author will probably do that if he suspects that the expression is slow to evaluate so we'd rarly actually find duplicated expressions worth bothering about.
This chunk is actually something that calls for a different optimisation, that is, evaluating $zeros/@sqID, @h and @v once in advance and use pseudo vars inside the predicate. I'd future thoughts like this. Let's focus on making the easy parts fast first.
(In reply to comment #3) > Some of this is bug 208172. But then we may want to have a good nonCOM method > to get the local name of an atom. I did an lxr search on call sites to > GetAttrNameAt, and far from all use the prefix, so having a non-addref > only-localname and namespace ID method might be worth it. How about we make GetAttrNameAt() return a struct by value and that struct would hold weak nsIAtom pointers. No refcounting needed at all then, and no separate API for the case where the caller wants one or both of the atoms.
That idea is as simple as it is brilliant :) We even have such a struct: nsAttrName. We could just change the signature to be: const nsAttrName* GetAttrNameAt(PRUint32 aIndex); That should be almost as fast as IndexOfAttr and will benefit all callers of GetAttrNameAt.
/me points to the last paragraph of comment 1.
Lol, sorry, i missed that. There's too much going on these days... So that along with bug 208172 will probably make a big enough difference here that we should just do that and get a new profile and reevaluate once that is done.
Hmm.. wanted to get one more idea in here just so I don't forget. We could change txXPathNode to instead of holding an index for the attribute hold an atom+nsid. This way a @foo step would simply be calling HasAttr and setting the atom+nsid. The downside with this is that it'll make the class bigger and cause more refcounting when we're standing on an attribute. And we'd have to have some other way of getting all the attributes for an element for @* type steps. Actually, the more I think about it the less I like it. If it turns out that @foo steps are still perfcritical after all else is done we should just add nsIContent::IndexOfAttr.
Depends on: 284950
I don't know if my commenting and voting for this bug would help, but I do hope it will. I too have been bitten by Firefox's taking some time to process the XSLT. http://blogs.ebusiness-apps.com/dave/?p=45 shows a benchmark comparison between Firefox and IE for Ajax related data transformations. And the following suggests on the neglected state of affairs of the XSL engine. "It is because, the internal XSL engine (transformMix) didn't got any improvement since Keith from Mitre gave it to the Mozilla project." - http://www.biglist.com/lists/xsl-list/archives/200603/msg00086.html And I have personally tried out Opera 9.0 Beta, which provides XSLT support, and it does perform significant better than Firefox. It is perhaps on par with IE, if not better; but I haven't gotten around to quantify that. I hope that something can be done before Firefox 2.0 gets out. Even a partial or a quick fix would be something to be grateful for.
Someone claiming that the xslt engine hasn't gotten any improvments since Keith left the project does not know what they are talking about at all. Such a claim is just dumb. Performance is steadily increasing by a lot. We still have the occational patch that doubles performance of certain common stylesheets (like the docbook one). So I hope your trying with a recent build when you complain about performance. Frankly, in most cases performance is good enough that the xslt is no longer a bottle neck. The benchmark you are pointing at doesn't seem to be pure xslt vs. pure xslt, but adds scripting into the mix, no? If you have problems with a stylesheet i strongly encourage you to file a new bug and attach your stylesheet. There is no lack of ideas of how to improve performance. There is only a lack of time and matter of prioritizing. The question is not what can we improve, but what is worth improving. That is much easier answered if we get good stylesheets to profile.
(In reply to comment #16) Sorry, my mistake. I didn't check when the transformation was completed and when the new document fragment was appended to the document. The transformation itself was fast, but appending the document fragment to the document was where the performance took a hit. At least for my application.
If you have testcases that show DOM performance issues, please file bugs, attach the testcases, and cc me.
So what are we timing here? The time to the first alert? Or something else?
(In reply to comment #19) > So what are we timing here? The time to the first alert? Or something else? > Technically the bug report is about the time between the 2nd and 3rd alert boxes. The 2.0 RC version of FF posted last week has a duration of about 8.5 seconds while IE 6.0.29 has a duration of less than 3 seconds.
We should be a lot faster now, but we still don't optimize expressions like '//foo' so we should soon become even faster.
> the time between the 2nd and 3rd alert boxes Ah, ok. I didn't realize the third box had this time in it, because its top was always cut off over here.... ;) So when I test Firefox 2.0RC2, I get times in the 20.7 second (plus/minus 0.2 seconds) range on my computer (P3-733). When I test today's trunk Firefox nightly (think Firefox 3 pre-alpha), I see times in the 5.5 second (plus/minus 0.3 seconds) range. So if things scale linearly to your machine, you should be seeing times comparable to IE on trunk. I'll attach a trunk profile of the time between the second and third alert.
Created attachment 242358 [details] Profile with current trunk Highlights: 34239 hits under txMozillaXSLTProcessor::TransformToFragment (that's about 4-some seconds). This seems to break down (digging down a bit; all hit counts are "under", not "in") as: 9023 RelationalExpr::compareResults 8711 txNamedAttributeStep::evaluate 6979 txNodeSet::addAndTransfer 2198 txAExprResult::Release 2384 txSingleNodeContext::recycler (the Release() and recycler() calls have are about 50% of the time "in", btw). compareResults spends about half its time under txXPathNodeUtils::appendNodeValue and the rest in things like nsSubsting::SetLength, etc. txNamedAttributeStep::evaluate spends almost a third of the time under txXPathTreeWalker::moveToNamedAttribute (lots of calls to GetAttrNameAt, looks like). txResultRecycler::getNodeSet is a good bit too. txNodeSet::addAndTransfer seems to be mostly under txNodeSet::add, which is back to txXPathNodeUtils::comparePosition. Lots of auto void array construction, IndexOf calls, etc.
Attachment #176237 - Attachment is obsolete: true
It definitely looks like optimizing '//foo' type expressions will help a lot. That should decrease the last three numbers in the previous comment since we'll do a lot less node shuffeling. One thing that is good news is that it seems like the recycler is working, because we spend next to no time in malloc. Other than that nothing immideately jumps out at me. Possibly switching the recycler and some other code to use nsTArray since it's more inlined and some of the array operations might be faster.
Depends on: 357345
You need to log in before you can comment on or make changes to this bug.