Closed
Bug 75102
Opened 23 years ago
Closed 23 years ago
New XPath parser and lexer
Categories
(Core :: XSLT, defect, P2)
Core
XSLT
Tracking
()
VERIFIED
FIXED
mozilla0.9.3
People
(Reporter: sicking, Assigned: peterv)
References
Details
(Whiteboard: [nsBranch+][PDT+])
Attachments
(19 files)
126.31 KB,
patch
|
Details | Diff | Splinter Review | |
2.04 KB,
text/plain
|
Details | |
1.19 KB,
text/plain
|
Details | |
5.73 KB,
text/plain
|
Details | |
3.08 KB,
text/plain
|
Details | |
137.08 KB,
patch
|
Details | Diff | Splinter Review | |
6.55 KB,
text/plain
|
Details | |
3.91 KB,
text/plain
|
Details | |
1.41 KB,
text/plain
|
Details | |
133.45 KB,
patch
|
Details | Diff | Splinter Review | |
10.99 KB,
text/plain
|
Details | |
152.95 KB,
patch
|
Details | Diff | Splinter Review | |
149.60 KB,
patch
|
Details | Diff | Splinter Review | |
10.98 KB,
text/plain
|
Details | |
1.41 KB,
text/plain
|
Details | |
153.55 KB,
patch
|
Details | Diff | Splinter Review | |
149.92 KB,
patch
|
Details | Diff | Splinter Review | |
26.40 KB,
application/zip
|
Details | |
153.52 KB,
patch
|
Details | Diff | Splinter Review |
This is a tracker bug for the new XPath parser (done by me) and the new XPath lexer (Pike is working on it) The parser changes are dependent on the lexer changes, either some small fixes I've made or Pikes rewrite. Pike will attach pathes
Reporter | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Comment 1•23 years ago
|
||
Comment 2•23 years ago
|
||
Comment 3•23 years ago
|
||
Comment 4•23 years ago
|
||
Comment 5•23 years ago
|
||
Comment 6•23 years ago
|
||
so, finally I got the lexer revamped. I attached code by sicking@bigfoot.com for the parser, and all changes necessary for removing Pattern. The lexer rewrite is pretty brutal, and just in case you wonder, there is no big difference in a diff -w, though I changed indention to 2. I didn't implement error reporting to XSLT yet, that's a different bug, imho. I moved most of the lexer to be iterator friendly, and the Token handling could need some additional love. I'd like to have that separate though. (Tokens into a list, instead of linked list, and Tokens just storing indexes (iterators) to the lex'd string instead of newly allocated substrings) Note: scc-string-fu for doing comparisons of substrings is supposed to appear somewhere. I changed ProcessorState to not add pure Named Templates to the match templs. hrm. More? probably, but it won't pop up in my mind. Axel
Keywords: review
Reporter | ||
Comment 7•23 years ago
|
||
This kicks ass! Comments below (some of them are nitpicks bug this is my only chance to get back at nickpicks from you and peterv ;-) ) * Do we really need ExprLexer::lookAhead? IMHO it could be removed. Same thing for countAllTokens and countRemainingTokens * Why not use TX_CHAR_RANGE macro in isDigit * IMHO it dosn't look that nice to use greaterthen and lessthen ops on token->type in nextIsOperatorToken... hmm... it's a long list.. Maybe if you just did the comparison for all ops (ie not AXIS_IDENTIFIER) and added a comment it would be more selfexplaining * Need to add |token->type == Token::COMMA| comparison in nextIsOperatorToken. Also, several small if's instead of one big * Endcomment in nextIsOperatorToken is wrong * defType is set to Token::CNAME both in the declaration and as the first thing in the while loop * |iter++ < size| should be |++iter < size| on a lot of places in ::parse same for |iter++ == size| on line 326 * check isLetter on first char after colon rather then isNCNameChar * You need to handle "Qnames" like mathml:* somehow * A default is needed on the last switch which does some errorreporting. Currently the code goes into an neverending loop on an unknown char * Coulnd't ExprLexer::TOKENS and ExprLexer::MULTIPLY be removed? * Would you mind moving the |ch=pattern.charAt(iter)| out from the if for the PERIOD code? * You don't really have to give all tokentypes a value (for example PARENT_NODE dosn't need to have the value ".."). The only tokentypes that need values are CNAME, AXIS_IDENTIFIER, VAR_REFERENCE, LITERAL and FUNCTION_NAME * If an error occurs before any token is added bad things happen * *I* would like to have whitespace between ) and { in |if (...){| Also since, between the two of us, we're rewriting both lexer and parser. Why don't we rename CNAME to QNAME, sorry about not doing so myself...
Reporter | ||
Comment 8•23 years ago
|
||
ugh s/Also, several small if's/Also, why not use several small if's/
Reporter | ||
Comment 9•23 years ago
|
||
Just noticed that my fix for bug 75307 has snuck into your patch as well as a fix for PathExpr::getDefaultPriority... Not that I don't want it in... :)
Comment 10•23 years ago
|
||
Comment 11•23 years ago
|
||
Comment 12•23 years ago
|
||
Comment 13•23 years ago
|
||
Comment 14•23 years ago
|
||
Incorporated most of sickings comments. Basically all but the one for nextIsOperator, instead I changed the order of the tokens to have just a single range (plus NULL_TOKEN), and commented that in ExprLexer.h. I also added NS_ASSERTIONS(0,message) in Expr.cpp, as we shouldn't have code paths calling these. Now I'm ready for more comments. Axel
Reporter | ||
Comment 15•23 years ago
|
||
Only picky stuff left to comment on... * ExprLexer::MULTIPLY could still be removed ;) * could prevToken be local in parse (or removed even?) and check last token in nextIsOperatorToken? If you did that I think we could remove NULL_TOKEN compleatly! * You still give all tokentypes a value which IMHO is unneccesary. You could even remove the |Token(UNICODE_CHAR uniChar, short type)| constructor * I'd like a stronger comment the tokentype list, something like "these have to be in an unbreaked list" (my english suck sometimes :( )
Comment 16•23 years ago
|
||
bullet 1: ugh. Will remove ExprLexer::MULTIPLY. bullet 2: Hrm. Not convinced, I need to change the previous token type for :: (axis identifier) and ( (making previous function or nodetype). I like the way that is done now, as it handles whitespace transparently. So for removing the previousToken, I had to get the previous Token at three places in the code. Could I convince you in keeping it? bullet 3: This is a step towards iterators, awkward as it may seem. I want to have iterators as members of tokens in the end, and even trivial tokens should know their position for error reporting. So I assigned a substring (in most cases, yeah), to have the needed iterators at the right places. bullet 4: I will do a better comment. Axel
Reporter | ||
Comment 17•23 years ago
|
||
Bullet 2: No biggie, it's ok with me as it is now... Bullet 3: Seems like an excellent plan!
Reporter | ||
Comment 18•23 years ago
|
||
Just realized another bug. Variable names are not allowed to have the form moz:*, they must be valid QNames. Not a big problem really, evaluation will fail since no variable could ever be named moz:* ...
Comment 19•23 years ago
|
||
nice catch, I fixed that in my tree. Code starting at 304 goes now as else if (pattern.charAt(iter)=='*') if (defType == Token::VAR_REFERENCE) { // Error, VariableReference must not be $foo:* errorPos = iter; errorCode = ERROR_UNRESOLVED_VAR_REFERENCE; if (firstItem) firstItem->token->type=Token::ERROR; else addToken(new Token('\0',Token::ERROR)); iter=size; // bail } else ++iter; /* eat wildcard */ Axel
Reporter | ||
Comment 20•23 years ago
|
||
adding some depenencies... these bugs should be fixed when this code is checked in
Comment 21•23 years ago
|
||
Adding a current patch. Synched with the tip, removed PathExpr::getDefaultPriority (is now in 84677), did some cosmetic things to lexer for peterv. Changed the way lexer handles the char tests, these are now in a separate file ExprLexerChars.cpp, and NCNameChar.h and Letter.h are gone. Expr.cpp (attachment 30666 [details]) and UnaryExpr.cpp (attachment 30268 [details]) are still current. ExprLexerChars.cpp coming up, too. Axel
Comment 22•23 years ago
|
||
Comment 23•23 years ago
|
||
Comment 24•23 years ago
|
||
I introduction to what the XPath lexer is supposed to do is posted on http://www.mozilla.org/projects/xslt/coding/HowToLexer.html Axel
Reporter | ||
Comment 25•23 years ago
|
||
reassigning to peterv for review
Assignee: sicking → peterv
Status: ASSIGNED → NEW
Assignee | ||
Comment 27•23 years ago
|
||
Assignee | ||
Comment 28•23 years ago
|
||
Assignee | ||
Comment 29•23 years ago
|
||
r=peterv. Let's try to get this into 0.9.2, it's a big improvement and fixes a lot of bugs.
Comment 30•23 years ago
|
||
Peter changed the diff to -u -4 (who hates macCVS, raise your hands), included the new files ExprLexerChars.cpp and UnaryExpr.cpp, but not Expr.cpp, into the patch, and this patch has TX_EXE instead of MOZ_XSL. The old patch should still work, for those not in favor of mac line endings. And finally I have realized that I used a NPL license plate for ExprLexerChars.cpp :-(, I sure wanted MPL. new one coming up. Expr.cpp needs TX_EXE, or 77830. Axel
Comment 31•23 years ago
|
||
Comment 32•23 years ago
|
||
Comment 33•23 years ago
|
||
adjusting keywords. the xpath docs are on http://www.mozilla.org/projects/xslt/coding/xpath-parser.html, I should move the lexer stuff to http://www.mozilla.org/projects/xslt/coding/xpath-lexer.html Axel
Severity: normal → major
Priority: -- → P2
Target Milestone: --- → mozilla0.9.2
Assignee | ||
Comment 34•23 years ago
|
||
Hmm, it gets a bit messy. Well, I'm attaching the patch super-reviewers should look at. I'll try to get this in for 0.9.2.
Assignee: sicking → peterv
Assignee | ||
Comment 35•23 years ago
|
||
Assignee | ||
Comment 36•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Whiteboard: Fix in hand. Have r=. Need sr=, a=.
Comment 38•23 years ago
|
||
Comment 39•23 years ago
|
||
I just realized that we miss a "what's this" in here. Basically, the lexer and the parser don't really make any sense as patch. We rigorously rewired stuff, it's really more a rewrite, following *some* of the lines of the old code. So if you don't like something, cvs-blame will be almost all ours anyway. Therefor the pack contains the files for the lexer and parser as whole files, not as patches. It contains the extra files needed, to, plus a stripped down patch. Just to give you a number, the patch is 50% now. Most of it is cleanup of Expr.h, which used to have the same comment for methods in all inheriting objects over and over. We removed most of the Pattern notation in xpath, this is XSLT special, and is not supposed to be in xpath. Plus, BasicNodeExpr can now hold a name, which is for processing-instruction(), (see xpath spec 2.3). I think that's it. Axel
Comment 40•23 years ago
|
||
Addressing jst's comments: fixup the TX_CHAR_RANGE and friend macros to take the character as argument. Don't assume to work on ch. Fixup whitespace in conditions, or at least make that consistent. I made function definitions consistent per file as well, moving to new style where it was inconsistent. diff -u -N coming up Axel
Comment 41•23 years ago
|
||
Comment 42•23 years ago
|
||
sr=jst
Reporter | ||
Comment 43•23 years ago
|
||
Checked in successfully (so far). Keeping open for peterv to try to get this on the branch
Whiteboard: Fix in hand. Have r=. Need sr=, a=.
Comment 44•23 years ago
|
||
adding vtrunk keyword to indicate the bug has been fixed on the trunk. Bug left open for tracking checkin to branch (nsbranch) when appropriate. Once bug has been fixed on the branch also, pls remove vtrunk keyword.
Keywords: vtrunk
nsBranch ok, limited to XSLT only.
Comment 46•23 years ago
|
||
This is the first introduction of our XSLT implementation into the market. These changes do not affect the rest of the Mozilla codebase, only the XSLT part. They make our XSLT implementation much more standards conformant by fixing close to 200 hundred bugs on the Apache XSLT test suite. Marking nsBranch+ for PDT consideration.
Whiteboard: [nsBranch+]
Comment 47•23 years ago
|
||
Steve granted this PDT+ status in today's PDT meeting.
Whiteboard: [nsBranch+] → [nsBranch+][nsPDT+]
Updated•23 years ago
|
Whiteboard: [nsBranch+][nsPDT+] → [nsBranch+][PDT+]
Assignee | ||
Comment 48•23 years ago
|
||
Went in on the branch (with a bit of bumps and bruises).
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•23 years ago
|
Comment 49•23 years ago
|
||
sicking@bigfoot.com (Jonas Sicking) or kvisco@ziplink.net - can you help verify this bug on the branch 092 builds? If not, can your direct me to a test suite (URL) which I can run to verify? Thanks.
Comment 50•23 years ago
|
||
Hi, tested this on the branch, marking verified. Axel
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•