Closed Bug 59783 Opened 24 years ago Closed 22 years ago

Implement dynamic XPath

Categories

(Core :: XSLT, enhancement, P5)

enhancement

Tracking

()

VERIFIED WONTFIX
Future

People

(Reporter: dr, Assigned: peterv)

References

Details

(Keywords: arch, Whiteboard: [Hixie-PF])

Attachments

(3 files)

XBL, dynamic XSLT, and maybe XLink, all need a dynamic implementation of XPath, in which re-evaluation of a given XPath expression with respect to a given document is efficient, given the result of the expression and the exact change to the document. For example, given a document: <foo> <bar/> <bar/> <bar/> </foo> and an XPath: count(/foo/bar) if the document changes to include a fourth bar, the XPath shouldn't re-evaluate entirely, but should instead use the prior result (3) and the change (addition of an extra bar) to calculate the new result (4) without having to traverse the document again. Other criteria for the XPath implementation include: - Ability to plug additional functions/axes/etc. into the language based on context (eg. XSLT, XPointer, XSL-FO, etc. add extensions to the XPath language). - Ability to use from anywhere, distinguishing between dynamic and one-pass use (eg. define both nsIXPath and nsIDynamicXPath). - Fit cleanly with DOM mutation events, and use some simple notification scheme for clients. Setting component to XSLT, since that's the umbrella under which this will be implemented.
Blocks: 18722
Severity: normal → enhancement
Status: NEW → ASSIGNED
Keywords: arch
Target Milestone: --- → Future
To get a feeling for XPath expression coming up in XSLT, I extraced most (?) of them from the docbook stylesheet by Norman Walsh. I fed them thru some perl-fu, to find duplicates, and sort them according to frequency. I will attach the listing in a minute, don't wonder about NCName,that's a [a-z][a-z0-9]*, part of perlfu. We could use that list to evaluate how important performance of specific expressions is. Of course this should be done by counting their use, not frequence, I may do that someday. Axel
Axel -- Kickass work! I cleaned up your stats a little bit. There were a lot of things that looked like: $NCName.NCName.NCName.NCName Which are really just variable references, where NCName[.NCName]* is just one NCName (for example, "$arg.choice.plain.open.str"). I've collected those cases into $VarRef. Also, instances of NCName:: I've turned into AxisName::. Also sorted things better (sort -g -r does the numeric fu you want ;). Very interesting stuff, though. The most common XPath by far (549 occurrences) is a plain NCName. This is a relief, since that is a trivial case to optimize for. Second most common (164) is a variable reference, which doesn't even *need* optimizing with respect to the unnecessary traversals of the source tree. Third most common (142, but I suspect it would be more common than variable references in less complex XSLTs) is a single step expression, NCName/NCName. Cases like [("/"|"//")NCName]* strike me as easy enough -- we can probably match the name of the element changed against the list of names in the XPath and do something smart there... Further common cases are ".", "@NCName", unions, etc. which would reasonably have been expected as common, and should follow similarly to the cases above... One thing we might want to exploit is pre-implementing some of what XPath 2.0 might have to offer. What I'm thinking about specifically is the desire to loosen restriction on step expressions, so that instead of having to write things like: a1/b1/c | a1/b2/c | a2/b1/c | a2/b2/c you can write: (a1|a2)/(b1|b2)/c The ability to express things like this, I'm thinking, might be exploitable for optimization. In the DocBook stylesheets, you see a lot of things that look like this (for example, "simplesectinfo/authorgroup|docinfo/authorgroup"), and if those can be logically collapsed in the parsing stage, that would save a decent amount of work during transformation. Axel: can you do some more of your magic? It would be great to get a spread of data for different flavors of XSLT stylesheets. Anybody else still in college who wants to do some thesis work? *grin*
Updated website to include XPath implementation plans: http://people.netscape.com/dr/xslt/xpath.html
Hi Dan, I read your new doc (sorry, mine is still pending, I know), and some comments: First of all, schedule, hey, cool ;-). Concerning the changes of grammar for XSLT and XPointer, can you give the modified rules? I wonder, because it didn't pop up to my eyes reading the XSLT spec. When it comes down to optimizations, can the plugged optimizations be so fine grained and efficient, that they win over the price of being able to plug? Do we have more than two kind of grammar rules in XPath? Lists and Switches? You talked about doing code generation and reuse, does this look like two sets of macros? I'll probably whack some profiling before I'll get down to docs. Don't know when though. Thinking about tonight, but doubting still. Axel
Here goes some profiling data, dr, you may rename those again ;-) This was done by running the docbook samples test.xml and book2.xml once, with the html stylesheets. Axel
Target Milestone: Future → mozilla1.1
Blocks: xlink
reposted xpath planning doc to include Capital Letters (it's a miracle!) and a little bit of clarifying. so i've been trying to hack together a parser and tokenizer using bison and re2c, per scc@mozilla.org's suggestion. re2c has been moderately well-behaved, but bison does not like generating either c++-friendly or mozilla-friendly code. another popular tool called antlr seemed a possibility, but it doesn't seem particularly mozilla-friendly either. i may either have to do some perl-fu to get the output from antlr to play nicely, or may have to hand-code the parser for now (at least the tokenizer will still be generated). still looking for help in this area... any ideas anybody?
XPath is fairly small. Though not very pretty, I coded the transformiix XPath lexer/tokenizer by hand. I can probably help...or maybe I can port the existing one to your XPath object model. I still need to spend some time reading your ideas. I apologize for not having had a chance to look at them yet. --Keith
Hi Dan, I just cross read a bit of antlr's documentation, and it looked to me as we would need to link the executable to a antlr library. How far did you test this? I didn't even bother downloading after reading this ;-). Could you attach the code you have? I'm curious as to why mozilla won't love the bison code. Probably lots of reasons, I just wonder if there are a few good ones :-). I haven't found any other parsers. But isn't our grammar rather limited? So how about doing the code gen directly in perl? Axel Axel
Yeah, I was similarly disappointed with antlr. Amazingly, gnu lex (flex) has the same shortcoming -- you have to compile with -lfl for it to work. No way. Once I can actually start working on this (i.e. when I manage to transfer to the XML/DOM team) I'm going to hack together a nice and simple tokenizer interface, and we can write the parser(s) however we want. If you are able to write a parser generator in perl, (are you serious?!) that would be very cool... Otherwise, I'm starting to lean towards hand-coding the parser, because it's simple enough, and because we need to get this done before any "real" work can happen.
I've done a bunch more work on this in my spare time. I have a sexy UML diagram living on my whiteboard, and I was coordinating a little bit with rayw, who was trying to design a standard API for XPath... I'd love to get a copy of Rational Rose, to get this stuff down in a decent form. Sooner or later (grumble grumble) I'll be able to do some real work on this...
If your interested in a standard API for XPath, I can make available the new one I will be using in XSL:P. Assaf Arkin and I took what I had initially in XSL:P and came up a much cleaner --Keith
I'd be very interested to review that. Could you post it as an attachment here? Thanks very much!
Can anyone give a sort of very general, newbie update on how this is progressing? I am interested in using it in my XULMaker program as an alternative to IDs.
Michael: I'm working on this only in my spare time right now, due to more pressing work obligations, but you might be able to make use of peterv's excellent work (an API is exposed from transformiix to use XPaths). I suggest you email him.
doesn't look like i'll be doing any xml work any time soon :( ... futuring and reassigning to peterv.
Assignee: dr → peterv
Status: ASSIGNED → NEW
Target Milestone: mozilla1.1 → Future
Priority: P3 → P5
Whiteboard: [Hixie-PF]
alias for faster searching
Alias: xpath
WONTFIX'ing. dr's document is gone, and we have done alot of optimisations to the xpath engine since this bug was filed. There is a XPath js interface by now, for example. We should evaluate if we want expressions that return nodesets to return iterators, though. So that results are polled, rather than fed. But that's a different bug. Which should be filed once somebody has a plan, with predicates and reverse axes and stuff. Kai, please don't set alias on bugs which you know nothing about. This is not THE XPath bug. It might be a good idea to leave aliases up to those responsible for the module, or the bug.
Alias: xpath
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
Verifying. See comments in 18722.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: