New XPath parser and lexer

VERIFIED FIXED in mozilla0.9.3

Status

()

P2
major
VERIFIED FIXED
18 years ago
18 years ago

People

(Reporter: sicking, Assigned: peterv)

Tracking

Trunk
mozilla0.9.3
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [nsBranch+][PDT+])

Attachments

(19 attachments)

126.31 KB, patch
Details | Diff | Splinter Review
2.04 KB, text/plain
Details
1.19 KB, text/plain
Details
5.73 KB, text/plain
Details
3.08 KB, text/plain
Details
137.08 KB, patch
Details | Diff | Splinter Review
6.55 KB, text/plain
Details
3.91 KB, text/plain
Details
1.41 KB, text/plain
Details
133.45 KB, patch
Details | Diff | Splinter Review
10.99 KB, text/plain
Details
152.95 KB, patch
Details | Diff | Splinter Review
149.60 KB, patch
Details | Diff | Splinter Review
10.98 KB, text/plain
Details
1.41 KB, text/plain
Details
153.55 KB, patch
Details | Diff | Splinter Review
149.92 KB, patch
Details | Diff | Splinter Review
26.40 KB, application/zip
Details
153.52 KB, patch
Details | Diff | Splinter Review
This is a tracker bug for the new XPath parser (done by me) and the new XPath
lexer (Pike is working on it)

The parser changes are dependent on the lexer changes, either some small fixes
I've made or Pikes rewrite.

Pike will attach pathes
Status: NEW → ASSIGNED

Comment 1

18 years ago
Created attachment 30266 [details] [diff] [review]
parser, lexer, + stuff needed, needs following extra files

Comment 2

18 years ago
Created attachment 30268 [details]
unary minus impl by sicking

Comment 3

18 years ago
Created attachment 30269 [details]
Expression base class by sicking

Comment 4

18 years ago
Created attachment 30270 [details]
header of macros for isLetter test in lexer

Comment 5

18 years ago
Created attachment 30271 [details]
macro header for addtional NCNameChars

Comment 6

18 years ago
so, finally I got the lexer revamped. I attached code by sicking@bigfoot.com for
the parser, and all changes necessary for removing Pattern.
The lexer rewrite is pretty brutal, and just in case you wonder, there is no
big difference in a diff -w, though I changed indention to 2.
I didn't implement error reporting to XSLT yet, that's a different bug, imho.
I moved most of the lexer to be iterator friendly, and the Token handling
could need some additional love. I'd like to have that separate though.
(Tokens into a list, instead of linked list, and Tokens just storing indexes
(iterators) to the lex'd string instead of newly allocated substrings)

Note: scc-string-fu for doing comparisons of substrings is supposed to appear
somewhere.

I changed ProcessorState to not add pure Named Templates to the match templs.

hrm. More? probably, but it won't pop up in my mind.

Axel
Keywords: review

Updated

18 years ago
Blocks: 70866
This kicks ass! Comments below (some of them are nitpicks bug this is my only 
chance to get back at nickpicks from you and peterv ;-) )

* Do we really need ExprLexer::lookAhead? IMHO it could be removed. Same thing
  for countAllTokens and countRemainingTokens

* Why not use TX_CHAR_RANGE macro in isDigit

* IMHO it dosn't look that nice to use greaterthen and lessthen ops on
  token->type in nextIsOperatorToken... hmm... it's a long list.. Maybe if you
  just did the comparison for all ops (ie not AXIS_IDENTIFIER) and added a
  comment it would be more selfexplaining

* Need to add |token->type == Token::COMMA| comparison in nextIsOperatorToken.
  Also, several small if's instead of one big

* Endcomment in nextIsOperatorToken is wrong

* defType is set to Token::CNAME both in the declaration and as the first thing
  in the while loop

* |iter++ < size| should be |++iter < size| on a lot of places in ::parse
  same for |iter++ == size| on line 326

* check isLetter on first char after colon rather then isNCNameChar

* You need to handle "Qnames" like mathml:* somehow

* A default is needed on the last switch which does some errorreporting.
  Currently the code goes into an neverending loop on an unknown char

* Coulnd't ExprLexer::TOKENS and ExprLexer::MULTIPLY be removed?

* Would you mind moving the |ch=pattern.charAt(iter)| out from the if for the
  PERIOD code?

* You don't really have to give all tokentypes a value (for example
  PARENT_NODE dosn't need to have the value ".."). The only tokentypes that need
  values are CNAME, AXIS_IDENTIFIER, VAR_REFERENCE, LITERAL and FUNCTION_NAME

* If an error occurs before any token is added bad things happen

* *I* would like to have whitespace between ) and { in |if (...){|

Also since, between the two of us, we're rewriting both lexer and parser. Why 
don't we rename CNAME to QNAME, sorry about not doing so myself...
ugh
s/Also, several small if's/Also, why not use several small if's/
Just noticed that my fix for bug 75307 has snuck into your patch as well as a 
fix for PathExpr::getDefaultPriority... Not that I don't want it in... :)

Comment 10

18 years ago
Created attachment 30662 [details] [diff] [review]
updated with sickings comments, the diff, against current tip

Comment 11

18 years ago
Created attachment 30663 [details]
added license plate

Comment 12

18 years ago
Created attachment 30664 [details]
NCNameChar.h, added license plate (previous is Letter.h)

Comment 13

18 years ago
Created attachment 30666 [details]
Expr.cpp, added assertions for mozmodule, as we should never call these

Comment 14

18 years ago
Incorporated most of sickings comments.
Basically all but the one for nextIsOperator, instead I changed the order of
the tokens to have just a single range (plus NULL_TOKEN), and commented that
in ExprLexer.h.
I also added NS_ASSERTIONS(0,message) in Expr.cpp, as we shouldn't have code
paths calling these.

Now I'm ready for more comments.

Axel
Only picky stuff left to comment on...

* ExprLexer::MULTIPLY could still be removed ;)

* could prevToken be local in parse (or removed even?) and check last token in
  nextIsOperatorToken? If you did that I think we could remove NULL_TOKEN
  compleatly!

* You still give all tokentypes a value which IMHO is unneccesary. You could
  even remove the |Token(UNICODE_CHAR uniChar, short type)| constructor

* I'd like a stronger comment the tokentype list, something like "these have to
  be in an unbreaked list" (my english suck sometimes :( )

Comment 16

18 years ago
bullet 1: ugh. Will remove ExprLexer::MULTIPLY.
bullet 2: Hrm. Not convinced, I need to change the previous token type for ::
(axis identifier) and ( (making previous function or nodetype). I like the way
that is done now, as it handles whitespace transparently. So for removing the
previousToken, I had to get the previous Token at three places in the code.
Could I convince you in keeping it?
bullet 3: This is a step towards iterators, awkward as it may seem. I want to
have iterators as members of tokens in the end, and even trivial tokens should
know their position for error reporting. So I assigned a substring (in most 
cases, yeah), to have the needed iterators at the right places.
bullet 4: I will do a better comment.

Axel
Bullet 2: No biggie, it's ok with me as it is now...

Bullet 3: Seems like an excellent plan!
Just realized another bug. Variable names are not allowed to have the form 
moz:*, they must be valid QNames. Not a big problem really, evaluation will 
fail since no variable could ever be named moz:* ...

Comment 19

18 years ago
nice catch, I fixed that in my tree. Code starting at 304 goes now as

          else if (pattern.charAt(iter)=='*')
            if (defType == Token::VAR_REFERENCE) {
              // Error, VariableReference must not be $foo:*
              errorPos = iter;
              errorCode = ERROR_UNRESOLVED_VAR_REFERENCE;
              if (firstItem)
                firstItem->token->type=Token::ERROR;
              else
                addToken(new Token('\0',Token::ERROR));
              iter=size; // bail
            } else
              ++iter; /* eat wildcard */

Axel
adding some depenencies... these bugs should be fixed when this code is checked in
Blocks: 45831, 67004, 67307, 69478

Comment 21

18 years ago
Adding a current patch. Synched with the tip, removed 
PathExpr::getDefaultPriority (is now in 84677), did some cosmetic things to 
lexer for peterv.
Changed the way lexer handles the char tests, these are now in a separate file
ExprLexerChars.cpp, and NCNameChar.h and Letter.h are gone.
Expr.cpp (attachment 30666 [details]) and UnaryExpr.cpp (attachment 30268 [details]) are still
current.
ExprLexerChars.cpp coming up, too.

Axel

Comment 22

18 years ago
Created attachment 37652 [details] [diff] [review]
new diff for parser/lexer/PatternExpr removal

Comment 23

18 years ago
Created attachment 37653 [details]
ExprLexerChars.cpp

Comment 24

18 years ago
I introduction to what the XPath lexer is supposed to do is posted on
http://www.mozilla.org/projects/xslt/coding/HowToLexer.html

Axel
reassigning to peterv for review
Assignee: sicking → peterv
Status: ASSIGNED → NEW
taking back...
Assignee: peterv → sicking
(Assignee)

Comment 27

18 years ago
Created attachment 38503 [details] [diff] [review]
Final reviewed patch
(Assignee)

Comment 28

18 years ago
Created attachment 38508 [details] [diff] [review]
Same but diff -u -w
(Assignee)

Comment 29

18 years ago
r=peterv. Let's try to get this into 0.9.2, it's a big improvement and fixes a
lot of bugs.

Comment 30

18 years ago
Peter changed the diff to -u -4 (who hates macCVS, raise your hands),
included the new files ExprLexerChars.cpp and UnaryExpr.cpp, but not Expr.cpp,
into the patch, and this patch has TX_EXE instead of
MOZ_XSL. The old patch should still work, for those not in favor of mac line
endings.
And finally I have realized that I used a NPL license plate for 
ExprLexerChars.cpp :-(, I sure wanted MPL. new one coming up.
Expr.cpp needs TX_EXE, or 77830.

Axel

Comment 31

18 years ago
Created attachment 38590 [details]
updated license plate on ExprLexerChars.cpp

Comment 32

18 years ago
Created attachment 38591 [details]
TX_EXEfied Expr.cpp. If we land 77830, the #ifndef can go away

Comment 33

18 years ago
adjusting keywords.
the xpath docs are on
http://www.mozilla.org/projects/xslt/coding/xpath-parser.html, I should move
the lexer stuff to http://www.mozilla.org/projects/xslt/coding/xpath-lexer.html
Axel
Severity: normal → major
Priority: -- → P2
Target Milestone: --- → mozilla0.9.2
(Assignee)

Comment 34

18 years ago
Hmm, it gets a bit messy. Well, I'm attaching the patch super-reviewers should
look at. I'll try to get this in for 0.9.2.
Assignee: sicking → peterv
(Assignee)

Comment 35

18 years ago
Created attachment 38595 [details] [diff] [review]
To be super-reviewed (diff -u -N)
(Assignee)

Comment 36

18 years ago
Created attachment 38596 [details] [diff] [review]
To be super-reviewed (diff -u -w -N)
r=sicking on the stuff by Pike
Status: NEW → ASSIGNED
(Assignee)

Updated

18 years ago
Whiteboard: Fix in hand. Have r=. Need sr=, a=.

Comment 38

18 years ago
Created attachment 39504 [details]
the superreview pack, patch, plus 7 files, zipped

Comment 39

18 years ago
I just realized that we miss a "what's this" in here.

Basically, the lexer and the parser don't really make any sense as patch. We
rigorously rewired stuff, it's really more a rewrite, following *some* of the
lines of the old code. So if you don't like something, cvs-blame will be almost
all ours anyway. Therefor the pack contains the files for the lexer and parser
as whole files, not as patches. It contains the extra files needed, to, plus a
stripped down patch.
Just to give you a number, the patch is 50% now. Most of it is cleanup of
Expr.h, which used to have the same comment for methods in all inheriting
objects over and over. 
We removed most of the Pattern notation in xpath, this is XSLT special, and is
not supposed to be in xpath.
Plus, BasicNodeExpr can now hold a name, which is for processing-instruction(),
(see xpath spec 2.3).

I think that's it.

Axel
(Assignee)

Updated

18 years ago
Keywords: nsBranch
Target Milestone: mozilla0.9.2 → mozilla0.9.3

Comment 40

18 years ago
Addressing jst's comments: fixup the TX_CHAR_RANGE and friend macros to take
the character as argument. Don't assume to work on ch.
Fixup whitespace in conditions, or at least make that consistent. I made 
function definitions consistent per file as well, moving to new style where it
was inconsistent. diff -u -N coming up

Axel

Comment 41

18 years ago
Created attachment 40618 [details] [diff] [review]
XPathLexerParser20010629.patch, diff -u -N, addresses comments by jst
sr=jst
Checked in successfully (so far). Keeping open for peterv to try to get this on 
the branch
Whiteboard: Fix in hand. Have r=. Need sr=, a=.

Comment 44

18 years ago
adding vtrunk keyword to indicate the bug has been fixed on the trunk.  Bug left
open for tracking checkin to branch (nsbranch) when appropriate.

Once bug has been fixed on the branch also, pls remove vtrunk keyword.
Keywords: vtrunk
nsBranch ok, limited to XSLT only.

Comment 46

18 years ago
This is the first introduction of our XSLT implementation into the market. 
These  changes do not affect the rest of the Mozilla codebase, only the XSLT
part.  They make our XSLT implementation much more standards conformant by
fixing close to 200  hundred bugs on the Apache XSLT test suite.

Marking nsBranch+ for PDT consideration.
Whiteboard: [nsBranch+]

Comment 47

18 years ago
Steve granted this PDT+ status in today's PDT meeting.
Whiteboard: [nsBranch+] → [nsBranch+][nsPDT+]

Updated

18 years ago
Whiteboard: [nsBranch+][nsPDT+] → [nsBranch+][PDT+]
(Assignee)

Comment 48

18 years ago
Went in on the branch (with a bit of bumps and bruises).
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
(Assignee)

Updated

18 years ago
Keywords: nsBranch, review, vtrunk

Comment 49

18 years ago
sicking@bigfoot.com (Jonas Sicking) or kvisco@ziplink.net - can you help verify
this bug on the branch 092 builds?  If not, can your direct me to a test suite
(URL) which I can run to verify?  Thanks.

Comment 50

18 years ago
Hi, tested this on the branch, marking verified.

Axel
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.