Closed
Bug 75102
Opened 24 years ago
Closed 24 years ago
New XPath parser and lexer
Categories
(Core :: XSLT, defect, P2)
Core
XSLT
Tracking
()
VERIFIED
FIXED
mozilla0.9.3
People
(Reporter: sicking, Assigned: peterv)
References
Details
(Whiteboard: [nsBranch+][PDT+])
Attachments
(19 files)
126.31 KB,
patch
|
Details | Diff | Splinter Review | |
2.04 KB,
text/plain
|
Details | |
1.19 KB,
text/plain
|
Details | |
5.73 KB,
text/plain
|
Details | |
3.08 KB,
text/plain
|
Details | |
137.08 KB,
patch
|
Details | Diff | Splinter Review | |
6.55 KB,
text/plain
|
Details | |
3.91 KB,
text/plain
|
Details | |
1.41 KB,
text/plain
|
Details | |
133.45 KB,
patch
|
Details | Diff | Splinter Review | |
10.99 KB,
text/plain
|
Details | |
152.95 KB,
patch
|
Details | Diff | Splinter Review | |
149.60 KB,
patch
|
Details | Diff | Splinter Review | |
10.98 KB,
text/plain
|
Details | |
1.41 KB,
text/plain
|
Details | |
153.55 KB,
patch
|
Details | Diff | Splinter Review | |
149.92 KB,
patch
|
Details | Diff | Splinter Review | |
26.40 KB,
application/zip
|
Details | |
153.52 KB,
patch
|
Details | Diff | Splinter Review |
This is a tracker bug for the new XPath parser (done by me) and the new XPath
lexer (Pike is working on it)
The parser changes are dependent on the lexer changes, either some small fixes
I've made or Pikes rewrite.
Pike will attach pathes
Reporter | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Comment 1•24 years ago
|
||
Comment 2•24 years ago
|
||
Comment 3•24 years ago
|
||
Comment 4•24 years ago
|
||
Comment 5•24 years ago
|
||
Comment 6•24 years ago
|
||
so, finally I got the lexer revamped. I attached code by sicking@bigfoot.com for
the parser, and all changes necessary for removing Pattern.
The lexer rewrite is pretty brutal, and just in case you wonder, there is no
big difference in a diff -w, though I changed indention to 2.
I didn't implement error reporting to XSLT yet, that's a different bug, imho.
I moved most of the lexer to be iterator friendly, and the Token handling
could need some additional love. I'd like to have that separate though.
(Tokens into a list, instead of linked list, and Tokens just storing indexes
(iterators) to the lex'd string instead of newly allocated substrings)
Note: scc-string-fu for doing comparisons of substrings is supposed to appear
somewhere.
I changed ProcessorState to not add pure Named Templates to the match templs.
hrm. More? probably, but it won't pop up in my mind.
Axel
Keywords: review
Reporter | ||
Comment 7•24 years ago
|
||
This kicks ass! Comments below (some of them are nitpicks bug this is my only
chance to get back at nickpicks from you and peterv ;-) )
* Do we really need ExprLexer::lookAhead? IMHO it could be removed. Same thing
for countAllTokens and countRemainingTokens
* Why not use TX_CHAR_RANGE macro in isDigit
* IMHO it dosn't look that nice to use greaterthen and lessthen ops on
token->type in nextIsOperatorToken... hmm... it's a long list.. Maybe if you
just did the comparison for all ops (ie not AXIS_IDENTIFIER) and added a
comment it would be more selfexplaining
* Need to add |token->type == Token::COMMA| comparison in nextIsOperatorToken.
Also, several small if's instead of one big
* Endcomment in nextIsOperatorToken is wrong
* defType is set to Token::CNAME both in the declaration and as the first thing
in the while loop
* |iter++ < size| should be |++iter < size| on a lot of places in ::parse
same for |iter++ == size| on line 326
* check isLetter on first char after colon rather then isNCNameChar
* You need to handle "Qnames" like mathml:* somehow
* A default is needed on the last switch which does some errorreporting.
Currently the code goes into an neverending loop on an unknown char
* Coulnd't ExprLexer::TOKENS and ExprLexer::MULTIPLY be removed?
* Would you mind moving the |ch=pattern.charAt(iter)| out from the if for the
PERIOD code?
* You don't really have to give all tokentypes a value (for example
PARENT_NODE dosn't need to have the value ".."). The only tokentypes that need
values are CNAME, AXIS_IDENTIFIER, VAR_REFERENCE, LITERAL and FUNCTION_NAME
* If an error occurs before any token is added bad things happen
* *I* would like to have whitespace between ) and { in |if (...){|
Also since, between the two of us, we're rewriting both lexer and parser. Why
don't we rename CNAME to QNAME, sorry about not doing so myself...
Reporter | ||
Comment 8•24 years ago
|
||
ugh
s/Also, several small if's/Also, why not use several small if's/
Reporter | ||
Comment 9•24 years ago
|
||
Just noticed that my fix for bug 75307 has snuck into your patch as well as a
fix for PathExpr::getDefaultPriority... Not that I don't want it in... :)
Comment 10•24 years ago
|
||
Comment 11•24 years ago
|
||
Comment 12•24 years ago
|
||
Comment 13•24 years ago
|
||
Comment 14•24 years ago
|
||
Incorporated most of sickings comments.
Basically all but the one for nextIsOperator, instead I changed the order of
the tokens to have just a single range (plus NULL_TOKEN), and commented that
in ExprLexer.h.
I also added NS_ASSERTIONS(0,message) in Expr.cpp, as we shouldn't have code
paths calling these.
Now I'm ready for more comments.
Axel
Reporter | ||
Comment 15•24 years ago
|
||
Only picky stuff left to comment on...
* ExprLexer::MULTIPLY could still be removed ;)
* could prevToken be local in parse (or removed even?) and check last token in
nextIsOperatorToken? If you did that I think we could remove NULL_TOKEN
compleatly!
* You still give all tokentypes a value which IMHO is unneccesary. You could
even remove the |Token(UNICODE_CHAR uniChar, short type)| constructor
* I'd like a stronger comment the tokentype list, something like "these have to
be in an unbreaked list" (my english suck sometimes :( )
Comment 16•24 years ago
|
||
bullet 1: ugh. Will remove ExprLexer::MULTIPLY.
bullet 2: Hrm. Not convinced, I need to change the previous token type for ::
(axis identifier) and ( (making previous function or nodetype). I like the way
that is done now, as it handles whitespace transparently. So for removing the
previousToken, I had to get the previous Token at three places in the code.
Could I convince you in keeping it?
bullet 3: This is a step towards iterators, awkward as it may seem. I want to
have iterators as members of tokens in the end, and even trivial tokens should
know their position for error reporting. So I assigned a substring (in most
cases, yeah), to have the needed iterators at the right places.
bullet 4: I will do a better comment.
Axel
Reporter | ||
Comment 17•24 years ago
|
||
Bullet 2: No biggie, it's ok with me as it is now...
Bullet 3: Seems like an excellent plan!
Reporter | ||
Comment 18•24 years ago
|
||
Just realized another bug. Variable names are not allowed to have the form
moz:*, they must be valid QNames. Not a big problem really, evaluation will
fail since no variable could ever be named moz:* ...
Comment 19•24 years ago
|
||
nice catch, I fixed that in my tree. Code starting at 304 goes now as
else if (pattern.charAt(iter)=='*')
if (defType == Token::VAR_REFERENCE) {
// Error, VariableReference must not be $foo:*
errorPos = iter;
errorCode = ERROR_UNRESOLVED_VAR_REFERENCE;
if (firstItem)
firstItem->token->type=Token::ERROR;
else
addToken(new Token('\0',Token::ERROR));
iter=size; // bail
} else
++iter; /* eat wildcard */
Axel
Reporter | ||
Comment 20•24 years ago
|
||
adding some depenencies... these bugs should be fixed when this code is checked in
Comment 21•24 years ago
|
||
Adding a current patch. Synched with the tip, removed
PathExpr::getDefaultPriority (is now in 84677), did some cosmetic things to
lexer for peterv.
Changed the way lexer handles the char tests, these are now in a separate file
ExprLexerChars.cpp, and NCNameChar.h and Letter.h are gone.
Expr.cpp (attachment 30666 [details]) and UnaryExpr.cpp (attachment 30268 [details]) are still
current.
ExprLexerChars.cpp coming up, too.
Axel
Comment 22•24 years ago
|
||
Comment 23•24 years ago
|
||
Comment 24•24 years ago
|
||
I introduction to what the XPath lexer is supposed to do is posted on
http://www.mozilla.org/projects/xslt/coding/HowToLexer.html
Axel
Reporter | ||
Comment 25•24 years ago
|
||
reassigning to peterv for review
Assignee: sicking → peterv
Status: ASSIGNED → NEW
Assignee | ||
Comment 27•24 years ago
|
||
Assignee | ||
Comment 28•24 years ago
|
||
Assignee | ||
Comment 29•24 years ago
|
||
r=peterv. Let's try to get this into 0.9.2, it's a big improvement and fixes a
lot of bugs.
Comment 30•24 years ago
|
||
Peter changed the diff to -u -4 (who hates macCVS, raise your hands),
included the new files ExprLexerChars.cpp and UnaryExpr.cpp, but not Expr.cpp,
into the patch, and this patch has TX_EXE instead of
MOZ_XSL. The old patch should still work, for those not in favor of mac line
endings.
And finally I have realized that I used a NPL license plate for
ExprLexerChars.cpp :-(, I sure wanted MPL. new one coming up.
Expr.cpp needs TX_EXE, or 77830.
Axel
Comment 31•24 years ago
|
||
Comment 32•24 years ago
|
||
Comment 33•24 years ago
|
||
adjusting keywords.
the xpath docs are on
http://www.mozilla.org/projects/xslt/coding/xpath-parser.html, I should move
the lexer stuff to http://www.mozilla.org/projects/xslt/coding/xpath-lexer.html
Axel
Severity: normal → major
Priority: -- → P2
Target Milestone: --- → mozilla0.9.2
Assignee | ||
Comment 34•24 years ago
|
||
Hmm, it gets a bit messy. Well, I'm attaching the patch super-reviewers should
look at. I'll try to get this in for 0.9.2.
Assignee: sicking → peterv
Assignee | ||
Comment 35•24 years ago
|
||
Assignee | ||
Comment 36•24 years ago
|
||
Assignee | ||
Updated•24 years ago
|
Whiteboard: Fix in hand. Have r=. Need sr=, a=.
Comment 38•24 years ago
|
||
Comment 39•24 years ago
|
||
I just realized that we miss a "what's this" in here.
Basically, the lexer and the parser don't really make any sense as patch. We
rigorously rewired stuff, it's really more a rewrite, following *some* of the
lines of the old code. So if you don't like something, cvs-blame will be almost
all ours anyway. Therefor the pack contains the files for the lexer and parser
as whole files, not as patches. It contains the extra files needed, to, plus a
stripped down patch.
Just to give you a number, the patch is 50% now. Most of it is cleanup of
Expr.h, which used to have the same comment for methods in all inheriting
objects over and over.
We removed most of the Pattern notation in xpath, this is XSLT special, and is
not supposed to be in xpath.
Plus, BasicNodeExpr can now hold a name, which is for processing-instruction(),
(see xpath spec 2.3).
I think that's it.
Axel
Comment 40•24 years ago
|
||
Addressing jst's comments: fixup the TX_CHAR_RANGE and friend macros to take
the character as argument. Don't assume to work on ch.
Fixup whitespace in conditions, or at least make that consistent. I made
function definitions consistent per file as well, moving to new style where it
was inconsistent. diff -u -N coming up
Axel
Comment 41•24 years ago
|
||
Comment 42•24 years ago
|
||
sr=jst
Reporter | ||
Comment 43•24 years ago
|
||
Checked in successfully (so far). Keeping open for peterv to try to get this on
the branch
Whiteboard: Fix in hand. Have r=. Need sr=, a=.
Comment 44•24 years ago
|
||
adding vtrunk keyword to indicate the bug has been fixed on the trunk. Bug left
open for tracking checkin to branch (nsbranch) when appropriate.
Once bug has been fixed on the branch also, pls remove vtrunk keyword.
Keywords: vtrunk
nsBranch ok, limited to XSLT only.
Comment 46•24 years ago
|
||
This is the first introduction of our XSLT implementation into the market.
These changes do not affect the rest of the Mozilla codebase, only the XSLT
part. They make our XSLT implementation much more standards conformant by
fixing close to 200 hundred bugs on the Apache XSLT test suite.
Marking nsBranch+ for PDT consideration.
Whiteboard: [nsBranch+]
Comment 47•24 years ago
|
||
Steve granted this PDT+ status in today's PDT meeting.
Whiteboard: [nsBranch+] → [nsBranch+][nsPDT+]
Updated•24 years ago
|
Whiteboard: [nsBranch+][nsPDT+] → [nsBranch+][PDT+]
Assignee | ||
Comment 48•24 years ago
|
||
Went in on the branch (with a bit of bumps and bruises).
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•24 years ago
|
Comment 49•24 years ago
|
||
sicking@bigfoot.com (Jonas Sicking) or kvisco@ziplink.net - can you help verify
this bug on the branch 092 builds? If not, can your direct me to a test suite
(URL) which I can run to verify? Thanks.
Comment 50•24 years ago
|
||
Hi, tested this on the branch, marking verified.
Axel
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•