Closed Bug 40672 Opened 25 years ago Closed 23 years ago

Make transformiix faster

Tracking

()

Status:

VERIFIED FIXED

Milestone:

Future

People

(Reporter: Joerg.Brunsmann, Assigned: keith)

References

Details

(Keywords: perf)

Attachments

(4 files)

Profile 25 years ago Joerg.Brunsmann 13.80 KB, application/octet-stream		Details
Adds support for disabling duplicate checking, required by next patch 25 years ago Keith Visco 3.61 KB, patch		Details \| Diff \| Splinter Review
Patch to ProcessorState for speeding up document order sorting 25 years ago Keith Visco 1.86 KB, patch		Details \| Diff \| Splinter Review
Docbook RelaxNG Schema - example of large xml document. 23 years ago Peter Wilson 289.68 KB, text/xml		Details

Joerg.Brunsmann

Reporter

Description

•

25 years ago

I think transformiix and the combination of transformiix with mozilla is of great value. You've done a terrific job so far. But I must admit that transformiix performance is improveable. Even for mid-sized documents it gets rather slow. This poor performance is inaceptable. I wouldn't recommend to present a slow xsl processor to the mozilla web browser community. It would lessen the acceptance of xsl as whole, I guess. I've done some profiling with 'gprof'. Please see the result attached. Perhaps it's of any value for you.

Joerg.Brunsmann

Reporter

Comment 1

•

25 years ago

Attached file Profile — Details

Axel Hecht

Comment 2

•

25 years ago

Hi Joerg, profiling might be a good idea, and WOW, we just have performance bugs? Keith, you gotta be good. Anyway, the profiling data seems to be broken. I guess it's just to large for bugzilla. gprof /tmp/build/dist/bin/transformiix /tmp/profl gprof: unexpected EOF after reading 7056/111538 samples Is profiling useful yet? I know we just overhauled the Strings, other stuff pending? Where does transformiix range in matters of speed? Do we have alternatives to test against? Axel

Joerg.Brunsmann

Reporter

Comment 3

•

25 years ago

Hi Axel, I just checked; i can download and gunzip the data without problem. It's 14000 bytes long; shouldn't be too large. Profiling might be too early. Just wanted the get the message over that - from my viewpoint - it would be a mistake to give transformiix to the "masses" to early. I transformiix would have bugs in the implementation, I would suggest to follow the 'release often, release early' theme. But if the xlst-module suffers from performance problems, it's a complete different issue, which could handicap the wide adoption of xslt in client web browser. I know of two other C++ implementation, although I didn't test them: http://www.gingerall.com/ http://mdc-xsl.sourceforge.net/ I can say that Xalan/Xerces-liason (http://xml.apache.org/) are faster than transformiix. They are written in Java.

Blake Ross

Updated

•

25 years ago

Keywords: perf

Keith Visco

Assignee

Comment 4

•

25 years ago

First off...I'm not sure we want to discuss performance issues via Bugzilla. Secondly...there are a number of areas I can improve performance that I am aware of. I've already done a lot of this for XSL:P so I know where to begin. For the standalone version, I really need to use streamed output. It currently builds a complete result tree...and then prints it, which is not necessary as we've discussed a number of times on .xslt. For both standalone and the mozilla component, I know of a number of areas in my XPath implementation that I can improve on. Personally I think we need to concentrate on XSLT conformance, and robustness, first, as some of the performance improvements will probably make the code less enjoyable to read.

Blake Ross

Updated

•

25 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Keith Visco

Assignee

Comment 5

•

25 years ago

I'm changing the status of this *bug* to Later, since it's not really a bug, though important, but shouldn't be dealt with until after some additional forthcoming changes are in place, and integration with Mozilla is further along.

Status: NEW → RESOLVED

Closed: 25 years ago

Resolution: --- → LATER

Blake Ross

Comment 6

•

25 years ago

reopening and marking Future...

Status: RESOLVED → REOPENED

Resolution: LATER → ---

Target Milestone: --- → Future

kmike

Comment 7

•

25 years ago

i think you'll find this useful. here is XSLT benchmarking results (including transformiix) using XSLTMark - XSLT benchmarking application: http://www.datapower.com/XSLTMark/res_2000_11_20.html they were using M18 though, but this benchmark is downloadable.

Keith Visco

Assignee

Comment 8

•

25 years ago

Attaching a couple of patches to help speed up sorting by document order.

Status: REOPENED → ASSIGNED

Keith Visco

Assignee

Comment 9

•

25 years ago

Attached patch Adds support for disabling duplicate checking, required by next patch — Details — Splinter Review

Keith Visco

Assignee

Comment 10

•

25 years ago

Attached patch Patch to ProcessorState for speeding up document order sorting — Details — Splinter Review

Axel Hecht

Comment 11

•

25 years ago

Looked at the source, these look good. I'll try them out later. About document order, I recognized that LocationPath reverses the nodes for some axes. We should fix that. And if we did, do we need to explicitly sort by document order for anything but nodeset unions? Axel

Keith Visco

Assignee

Comment 12

•

25 years ago

Hi Axel, Yes, we're going to need to preserve the Axis ordering. A reverse axis like the ancestor axis should remain in reverse order. So we'll need to be careful about this, which I admit, I curently am not, because a call to sortByDocumentOrder will rearrange the NodeSet. --Keith

Axel Hecht

Comment 13

•

25 years ago

Could we add a property to a NodeSet, saying whether it's document order forward, reverse, or unsorted? So in case we have a reverse axis, we don't have to sort, but just to reverse the set. Axel

Keith Visco

Assignee

Comment 14

•

25 years ago

Yes, we could do that. It'll help as long as there is only one PathExpr in the Union. If we can guarantee all NodeSet's are in document order we can do a simple merge sort. And I made a mistake in my previous post. I think I am currently being careful about reverse-axis. sortByDocumentOrder is only called at the end, so it's ok. Late Night...time to go to bed! :-) --Keith

Axel Hecht

Comment 15

•

25 years ago

Hi Keith, I tested your patch, it gave me some 10% speed improvement. r=me, get this in the tree as soon as it opens ;-). Axel

Keith Visco

Assignee

Comment 16

•

25 years ago

Your idea about saving the state of a NodeSet's sort order should also improve our performance, before I implement that solution I'd like to get the above patches checked in first. My local tree is getting ugly. So I'll wait until the tree re-opens, I check in the above patches, before I try out the NodeSet sort order proposal.

Axel Hecht

Comment 17

•

24 years ago

the two patches are checked in, r=me (peterv). Axel

Axel Hecht

Updated

•

24 years ago

Depends on: 77830

Keith Visco

Assignee

Comment 18

•

24 years ago

I really need to get back into the code, it's been too long! In any case I was doing some thinking. It might be nice to tweak the XPath expressions to perform an early return under certain cases. For example: bar[1] is probably doing too much work, unless only 1 bar exists to begin with. It first selects all the bar nodes...and then applies the predicate. In this case the predicate is the position 1, which means just the first node is kept. This means we created a nodeset unecessarily with all bar children (potentially needing to resize the nodeset), just to select the first one. The best way to solve this is to change the way we evaluate expressions. Instead of selecting all bar children first and then apply the predicate, we can select and apply as we go. So we select the first bar and apply the predicate. if it matches we add it to the nodeset and increment a position counter. If it doesn't matches we simply increment the position counter. We could add a flag to the predicates...such as returnOnTrue, which would inidicate that if the predicate matches (like with the position() function) then we know we don't need to check any further and we are done. thoughts?

Axel Hecht

Comment 19

•

24 years ago

Funny you say that, had that in my mind tonight. plus a awkward counter example: foobartag[position()=last()-5], and I can do worse: foobartag[mod(position(),3)=mod(last(),2))] Are there other ways to make the result depend on the nodeset? Axel

Axel Hecht

Updated

•

24 years ago

No longer depends on: 77830

Axel Hecht

Updated

•

24 years ago

Depends on: 77889

Keith Visco

Assignee

Comment 20

•

24 years ago

yes, of course....if the expression contains any sub-expressions that need the entire nodeset, we can't return early...but that doesn't mean we should always produce the complete nodeset....just because we might need it. We simply need to label the expressions.... so: 1. incremental Any expression which can evaluate predicates incrementally. 2. incremental + return-early Any expression which can evaluate predicates incrementally and safely return once it finds the first match. 3. non-incremental Any expression which needs the entire nodeset to evaluate predicates. 4. non-incremental + return-early Any expression which needs the entire nodeset to evaluate predicates but can safely return once it finds the first match. :-)

xyzzy

Comment 21

•

24 years ago

Is this strictly a Sun/FreeBSD issue?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 22

•

24 years ago

hell no, transformiix is slow on all platforms :)

OS: FreeBSD → All

Hardware: Sun → All

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 23

•

24 years ago

I think a nice way to spead up xpath evaluation is to have special classes for common optimizable (sub)expressions. For example the following would be nice to have special classes for: @foo A step that consists of just an attibute axes and a nametest without wildcards foo | bar | baz | zap:* two or more expressions which all consists of a single step using the child axis foo/bar/baz a PathExpr consisting only of steps along the child axis. (Not sure if there is a big win here) The nice thing about the last is that if they are evaluated properly the result dosn't need to be sorted. We need to be able to handle these already-sorted nodesets in the xslt code though.

Axel Hecht

Updated

•

24 years ago

Priority: P3 → P5

David Barrett

Comment 24

•

24 years ago

I might sound out of place, but in support of Joerg Brunsmann's post in which he mentioned the Java Xerces and Xalan parsers, I'm wondering if you guys have considered distributing tomcat as a servlet engine for Mozilla, installing cocoon, and have cocoon do the XSLT for you? P.S. What programming language are you using?

Axel Hecht

Updated

•

23 years ago

Depends on: 113611

Peter Wilson

Comment 25

•

23 years ago

Attached file Docbook RelaxNG Schema - example of large xml document. — Details

The load of the docbook.rng.xml takes approximately 5 minutes on my 600mh Win200 ith 128M of memory. The Javascript app takes approximately 1min 10 seconds of which only about 3 seconds is file load time. I don't have a separate load time for the style sheet as I just loaded the file directly using the browser.

Axel Hecht

Comment 26

•

23 years ago

standalone branch speeds this up by a factor of 3, debug builds. Module trunk (not really recent) crashed on me after some 15 minutes and 120Mb. :-( (note that this is still far from out-of-memory conditions on my machine)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 27

•

23 years ago

we are faster these days. Lets close this and open bugs on specific issues when we find them

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 28

•

23 years ago

sorry, forgot to actually mark as fixed

Status: ASSIGNED → RESOLVED

Closed: 25 years ago → 23 years ago

Resolution: --- → FIXED

Axel Hecht

Comment 29

•

22 years ago

mass verifying

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.