Closed Bug 75102 Opened 25 years ago Closed 24 years ago

New XPath parser and lexer

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.3

People

(Reporter: sicking, Assigned: peterv)

References

Details

(Whiteboard: [nsBranch+][PDT+])

Attachments

(19 files)

parser, lexer, + stuff needed, needs following extra files 25 years ago Axel Hecht 126.31 KB, patch		Details \| Diff \| Splinter Review
unary minus impl by sicking 25 years ago Axel Hecht 2.04 KB, text/plain		Details
Expression base class by sicking 25 years ago Axel Hecht 1.19 KB, text/plain		Details
header of macros for isLetter test in lexer 25 years ago Axel Hecht 5.73 KB, text/plain		Details
macro header for addtional NCNameChars 25 years ago Axel Hecht 3.08 KB, text/plain		Details
updated with sickings comments, the diff, against current tip 25 years ago Axel Hecht 137.08 KB, patch		Details \| Diff \| Splinter Review
added license plate 25 years ago Axel Hecht 6.55 KB, text/plain		Details
NCNameChar.h, added license plate (previous is Letter.h) 25 years ago Axel Hecht 3.91 KB, text/plain		Details
Expr.cpp, added assertions for mozmodule, as we should never call these 25 years ago Axel Hecht 1.41 KB, text/plain		Details
new diff for parser/lexer/PatternExpr removal 24 years ago Axel Hecht 133.45 KB, patch		Details \| Diff \| Splinter Review
ExprLexerChars.cpp 24 years ago Axel Hecht 10.99 KB, text/plain		Details
Final reviewed patch 24 years ago Peter Van der Beken [:peterv] 152.95 KB, patch		Details \| Diff \| Splinter Review
Same but diff -u -w 24 years ago Peter Van der Beken [:peterv] 149.60 KB, patch		Details \| Diff \| Splinter Review
updated license plate on ExprLexerChars.cpp 24 years ago Axel Hecht 10.98 KB, text/plain		Details
TX_EXEfied Expr.cpp. If we land 77830, the #ifndef can go away 24 years ago Axel Hecht 1.41 KB, text/plain		Details
To be super-reviewed (diff -u -N) 24 years ago Peter Van der Beken [:peterv] 153.55 KB, patch		Details \| Diff \| Splinter Review
To be super-reviewed (diff -u -w -N) 24 years ago Peter Van der Beken [:peterv] 149.92 KB, patch		Details \| Diff \| Splinter Review
the superreview pack, patch, plus 7 files, zipped 24 years ago Axel Hecht 26.40 KB, application/zip		Details
XPathLexerParser20010629.patch, diff -u -N, addresses comments by jst 24 years ago Axel Hecht 153.52 KB, patch		Details \| Diff \| Splinter Review

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Description

•

25 years ago

This is a tracker bug for the new XPath parser (done by me) and the new XPath lexer (Pike is working on it) The parser changes are dependent on the lexer changes, either some small fixes I've made or Pikes rewrite. Pike will attach pathes

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

25 years ago

Status: NEW → ASSIGNED

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

25 years ago

Blocks: 72810

Axel Hecht

Comment 1

•

25 years ago

Attached patch parser, lexer, + stuff needed, needs following extra files — Details — Splinter Review

Axel Hecht

Comment 2

•

25 years ago

Attached file unary minus impl by sicking — Details

Axel Hecht

Comment 3

•

25 years ago

Attached file Expression base class by sicking — Details

Axel Hecht

Comment 4

•

25 years ago

Attached file header of macros for isLetter test in lexer — Details

Axel Hecht

Comment 5

•

25 years ago

Attached file macro header for addtional NCNameChars — Details

Axel Hecht

Comment 6

•

25 years ago

so, finally I got the lexer revamped. I attached code by sicking@bigfoot.com for the parser, and all changes necessary for removing Pattern. The lexer rewrite is pretty brutal, and just in case you wonder, there is no big difference in a diff -w, though I changed indention to 2. I didn't implement error reporting to XSLT yet, that's a different bug, imho. I moved most of the lexer to be iterator friendly, and the Token handling could need some additional love. I'd like to have that separate though. (Tokens into a list, instead of linked list, and Tokens just storing indexes (iterators) to the lex'd string instead of newly allocated substrings) Note: scc-string-fu for doing comparisons of substrings is supposed to appear somewhere. I changed ProcessorState to not add pure Named Templates to the match templs. hrm. More? probably, but it won't pop up in my mind. Axel

Keywords: review

Axel Hecht

Updated

•

25 years ago

Blocks: 70866

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 7

•

25 years ago

This kicks ass! Comments below (some of them are nitpicks bug this is my only chance to get back at nickpicks from you and peterv ;-) ) * Do we really need ExprLexer::lookAhead? IMHO it could be removed. Same thing for countAllTokens and countRemainingTokens * Why not use TX_CHAR_RANGE macro in isDigit * IMHO it dosn't look that nice to use greaterthen and lessthen ops on token->type in nextIsOperatorToken... hmm... it's a long list.. Maybe if you just did the comparison for all ops (ie not AXIS_IDENTIFIER) and added a comment it would be more selfexplaining * Need to add |token->type == Token::COMMA| comparison in nextIsOperatorToken. Also, several small if's instead of one big * Endcomment in nextIsOperatorToken is wrong * defType is set to Token::CNAME both in the declaration and as the first thing in the while loop * |iter++ < size| should be |++iter < size| on a lot of places in ::parse same for |iter++ == size| on line 326 * check isLetter on first char after colon rather then isNCNameChar * You need to handle "Qnames" like mathml:* somehow * A default is needed on the last switch which does some errorreporting. Currently the code goes into an neverending loop on an unknown char * Coulnd't ExprLexer::TOKENS and ExprLexer::MULTIPLY be removed? * Would you mind moving the |ch=pattern.charAt(iter)| out from the if for the PERIOD code? * You don't really have to give all tokentypes a value (for example PARENT_NODE dosn't need to have the value ".."). The only tokentypes that need values are CNAME, AXIS_IDENTIFIER, VAR_REFERENCE, LITERAL and FUNCTION_NAME * If an error occurs before any token is added bad things happen * *I* would like to have whitespace between ) and { in |if (...){| Also since, between the two of us, we're rewriting both lexer and parser. Why don't we rename CNAME to QNAME, sorry about not doing so myself...

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 8

•

25 years ago

ugh s/Also, several small if's/Also, why not use several small if's/

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 9

•

25 years ago

Just noticed that my fix for bug 75307 has snuck into your patch as well as a fix for PathExpr::getDefaultPriority... Not that I don't want it in... :)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

25 years ago

Blocks: 71062

Axel Hecht

Comment 10

•

25 years ago

Attached patch updated with sickings comments, the diff, against current tip — Details — Splinter Review

Axel Hecht

Comment 11

•

25 years ago

Attached file added license plate — Details

Axel Hecht

Comment 12

•

25 years ago

Attached file NCNameChar.h, added license plate (previous is Letter.h) — Details

Axel Hecht

Comment 13

•

25 years ago

Attached file Expr.cpp, added assertions for mozmodule, as we should never call these — Details

Axel Hecht

Comment 14

•

25 years ago

Incorporated most of sickings comments. Basically all but the one for nextIsOperator, instead I changed the order of the tokens to have just a single range (plus NULL_TOKEN), and commented that in ExprLexer.h. I also added NS_ASSERTIONS(0,message) in Expr.cpp, as we shouldn't have code paths calling these. Now I'm ready for more comments. Axel

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 15

•

25 years ago

Only picky stuff left to comment on... * ExprLexer::MULTIPLY could still be removed ;) * could prevToken be local in parse (or removed even?) and check last token in nextIsOperatorToken? If you did that I think we could remove NULL_TOKEN compleatly! * You still give all tokentypes a value which IMHO is unneccesary. You could even remove the |Token(UNICODE_CHAR uniChar, short type)| constructor * I'd like a stronger comment the tokentype list, something like "these have to be in an unbreaked list" (my english suck sometimes :( )

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

25 years ago

No longer blocks: 72810

Axel Hecht

Comment 16

•

24 years ago

bullet 1: ugh. Will remove ExprLexer::MULTIPLY. bullet 2: Hrm. Not convinced, I need to change the previous token type for :: (axis identifier) and ( (making previous function or nodetype). I like the way that is done now, as it handles whitespace transparently. So for removing the previousToken, I had to get the previous Token at three places in the code. Could I convince you in keeping it? bullet 3: This is a step towards iterators, awkward as it may seem. I want to have iterators as members of tokens in the end, and even trivial tokens should know their position for error reporting. So I assigned a substring (in most cases, yeah), to have the needed iterators at the right places. bullet 4: I will do a better comment. Axel

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 17

•

24 years ago

Bullet 2: No biggie, it's ok with me as it is now... Bullet 3: Seems like an excellent plan!

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 18

•

24 years ago

Just realized another bug. Variable names are not allowed to have the form moz:*, they must be valid QNames. Not a big problem really, evaluation will fail since no variable could ever be named moz:* ...

Axel Hecht

Comment 19

•

24 years ago

nice catch, I fixed that in my tree. Code starting at 304 goes now as else if (pattern.charAt(iter)=='*') if (defType == Token::VAR_REFERENCE) { // Error, VariableReference must not be $foo:* errorPos = iter; errorCode = ERROR_UNRESOLVED_VAR_REFERENCE; if (firstItem) firstItem->token->type=Token::ERROR; else addToken(new Token('\0',Token::ERROR)); iter=size; // bail } else ++iter; /* eat wildcard */ Axel

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 20

•

24 years ago

adding some depenencies... these bugs should be fixed when this code is checked in

Blocks: 45831, 67004, 67307, 69478

Axel Hecht

Comment 21

•

24 years ago

Adding a current patch. Synched with the tip, removed PathExpr::getDefaultPriority (is now in 84677), did some cosmetic things to lexer for peterv. Changed the way lexer handles the char tests, these are now in a separate file ExprLexerChars.cpp, and NCNameChar.h and Letter.h are gone. Expr.cpp (attachment 30666 [details]) and UnaryExpr.cpp (attachment 30268 [details]) are still current. ExprLexerChars.cpp coming up, too. Axel

Axel Hecht

Comment 22

•

24 years ago

Attached patch new diff for parser/lexer/PatternExpr removal — Details — Splinter Review

Axel Hecht

Comment 23

•

24 years ago

Attached file ExprLexerChars.cpp — Details

Axel Hecht

Comment 24

•

24 years ago

I introduction to what the XPath lexer is supposed to do is posted on http://www.mozilla.org/projects/xslt/coding/HowToLexer.html Axel

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 25

•

24 years ago

reassigning to peterv for review

Assignee: sicking → peterv

Status: ASSIGNED → NEW

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 26

•

24 years ago

taking back...

Assignee: peterv → sicking

Peter Van der Beken [:peterv]

Assignee

Comment 27

•

24 years ago

Attached patch Final reviewed patch — Details — Splinter Review

Peter Van der Beken [:peterv]

Assignee

Comment 28

•

24 years ago

Attached patch Same but diff -u -w — Details — Splinter Review

Peter Van der Beken [:peterv]

Assignee

Comment 29

•

24 years ago

r=peterv. Let's try to get this into 0.9.2, it's a big improvement and fixes a lot of bugs.

Axel Hecht

Comment 30

•

24 years ago

Peter changed the diff to -u -4 (who hates macCVS, raise your hands), included the new files ExprLexerChars.cpp and UnaryExpr.cpp, but not Expr.cpp, into the patch, and this patch has TX_EXE instead of MOZ_XSL. The old patch should still work, for those not in favor of mac line endings. And finally I have realized that I used a NPL license plate for ExprLexerChars.cpp :-(, I sure wanted MPL. new one coming up. Expr.cpp needs TX_EXE, or 77830. Axel

Axel Hecht

Comment 31

•

24 years ago

Attached file updated license plate on ExprLexerChars.cpp — Details

Axel Hecht

Comment 32

•

24 years ago

Attached file TX_EXEfied Expr.cpp. If we land 77830, the #ifndef can go away — Details

Axel Hecht

Comment 33

•

24 years ago

adjusting keywords. the xpath docs are on http://www.mozilla.org/projects/xslt/coding/xpath-parser.html, I should move the lexer stuff to http://www.mozilla.org/projects/xslt/coding/xpath-lexer.html Axel

Severity: normal → major

Priority: -- → P2

Target Milestone: --- → mozilla0.9.2

Peter Van der Beken [:peterv]

Assignee

Comment 34

•

24 years ago

Hmm, it gets a bit messy. Well, I'm attaching the patch super-reviewers should look at. I'll try to get this in for 0.9.2.

Assignee: sicking → peterv

Peter Van der Beken [:peterv]

Assignee

Comment 35

•

24 years ago

Attached patch To be super-reviewed (diff -u -N) — Details — Splinter Review

Peter Van der Beken [:peterv]

Assignee

Comment 36

•

24 years ago

Attached patch To be super-reviewed (diff -u -w -N) — Details — Splinter Review

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 37

•

24 years ago

r=sicking on the stuff by Pike

Status: NEW → ASSIGNED

Peter Van der Beken [:peterv]

Assignee

Updated

•

24 years ago

Whiteboard: Fix in hand. Have r=. Need sr=, a=.

Axel Hecht

Comment 38

•

24 years ago

Attached file the superreview pack, patch, plus 7 files, zipped — Details

Axel Hecht

Comment 39

•

24 years ago

I just realized that we miss a "what's this" in here. Basically, the lexer and the parser don't really make any sense as patch. We rigorously rewired stuff, it's really more a rewrite, following *some* of the lines of the old code. So if you don't like something, cvs-blame will be almost all ours anyway. Therefor the pack contains the files for the lexer and parser as whole files, not as patches. It contains the extra files needed, to, plus a stripped down patch. Just to give you a number, the patch is 50% now. Most of it is cleanup of Expr.h, which used to have the same comment for methods in all inheriting objects over and over. We removed most of the Pattern notation in xpath, this is XSLT special, and is not supposed to be in xpath. Plus, BasicNodeExpr can now hold a name, which is for processing-instruction(), (see xpath spec 2.3). I think that's it. Axel

Peter Van der Beken [:peterv]

Assignee

Updated

•

24 years ago

Keywords: nsBranch

Target Milestone: mozilla0.9.2 → mozilla0.9.3

Axel Hecht

Comment 40

•

24 years ago

Addressing jst's comments: fixup the TX_CHAR_RANGE and friend macros to take the character as argument. Don't assume to work on ch. Fixup whitespace in conditions, or at least make that consistent. I made function definitions consistent per file as well, moving to new style where it was inconsistent. diff -u -N coming up Axel

Axel Hecht

Comment 41

•

24 years ago

Attached patch XPathLexerParser20010629.patch, diff -u -N, addresses comments by jst — Details — Splinter Review

Johnny Stenback (:jst)

Comment 42

•

24 years ago

sr=jst

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 43

•

24 years ago

Checked in successfully (so far). Keeping open for peterv to try to get this on the branch

Whiteboard: Fix in hand. Have r=. Need sr=, a=.

lchiang

Comment 44

•

24 years ago

adding vtrunk keyword to indicate the bug has been fixed on the trunk. Bug left open for tracking checkin to branch (nsbranch) when appropriate. Once bug has been fixed on the branch also, pls remove vtrunk keyword.

Keywords: vtrunk

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 45

•

24 years ago

nsBranch ok, limited to XSLT only.

Nisheeth Ranjan

Comment 46

•

24 years ago

This is the first introduction of our XSLT implementation into the market. These changes do not affect the rest of the Mozilla codebase, only the XSLT part. They make our XSLT implementation much more standards conformant by fixing close to 200 hundred bugs on the Apache XSLT test suite. Marking nsBranch+ for PDT consideration.

Whiteboard: [nsBranch+]

Nisheeth Ranjan

Comment 47

•

24 years ago

Steve granted this PDT+ status in today's PDT meeting.

Whiteboard: [nsBranch+] → [nsBranch+][nsPDT+]

Nisheeth Ranjan

Updated

•

24 years ago

Whiteboard: [nsBranch+][nsPDT+] → [nsBranch+][PDT+]

Peter Van der Beken [:peterv]

Assignee

Comment 48

•

24 years ago

Went in on the branch (with a bit of bumps and bruises).

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Peter Van der Beken [:peterv]

Assignee

Updated

•

24 years ago

Keywords: nsBranch, review, vtrunk

lchiang

Comment 49

•

24 years ago

sicking@bigfoot.com (Jonas Sicking) or kvisco@ziplink.net - can you help verify this bug on the branch 092 builds? If not, can your direct me to a test suite (URL) which I can run to verify? Thanks.

Axel Hecht

Comment 50

•

24 years ago

Hi, tested this on the branch, marking verified. Axel

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.