Closed Bug 508466 Opened 15 years ago Closed 14 years ago

Distinguish the descendant combinator from the end-of-list marker in nsCSSSelector objects

Categories

(Core :: CSS Parsing and Computation, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: zwol, Assigned: zwol)

References

Details

Attachments

(3 files, 8 obsolete files)

17.64 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
12.56 KB, patch
zwol
: review+
Details | Diff | Splinter Review
7.80 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
Attached patch patch (obsolete) — Splinter Review
From bug 494117:

    we use PRUnichar(0) as both the descendant combinator and the default
    combinator in right-most selectors.  Ideally we'd use different values for
    the two cases (a separate bug blocking this one?).

This patch does just that - ' ' is used as the descendant combinator, PRUnichar(0) continues to be the end-of-list marker.  There should be no visible external effect.

Reftests are green, try server is running.
Attachment #392645 - Flags: review?(bzbarsky)
Blocks: 494117
Comment on attachment 392645 [details] [diff] [review]
patch

>-      if ((NS_IS_GREEDY_OPERATOR(selector->mOperator)) &&
>-          (selector->mNext) &&
>-          (selector->mNext->mOperator != selector->mOperator) &&
>+      if (NS_IS_GREEDY_OPERATOR(selector->mOperator) &&
>+          selector->mNext &&
>+          selector->mNext->mOperator != selector->mOperator &&
>           !(selector->mOperator == '~' &&
>-            selector->mNext->mOperator == PRUnichar(0))) {
>-
>+            (selector->mNext->mOperator == ' ' ||
>+             selector->mNext->mOperator == '>'))) {

Why the addition of '>'?  Do you have a testcase?
(In reply to comment #1)
> >+            (selector->mNext->mOperator == ' ' ||
> >+             selector->mNext->mOperator == '>'))) {
> 
> Why the addition of '>'?  Do you have a testcase?

It seemed like an obvious omission, from the comment above this conditional:

      // to avoid greedy matching, we need to recur if this is a
      // descendant or general sibling combinator and the next
      // combinator is different, but we can make an exception for
      // sibling, then parent, since a sibling's parent is always the
      // same.

The observation that a sibling of the current node will always have the same parent as the node itself seems equally applicable to '>' and ' ' combinators.  I confess, though, I don't fully grok this algorithm.  I *think* the attached test case exercises this change thoroughly -- if you agree I'll add it to reftests, otherwise please suggest selector sequences I should try.
Incidentally, the try server run was completely successful, except for two xpcshell-tests failures on Windows, and a Tp crash on qm-pubuntu-try04, both of which appear to be unrelated to this change: see http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1249438888.1249447832.8386.gz and http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1249442408.27.1249445054.10339.gz respectively.

The built binaries are at https://build.mozilla.org/tryserver-builds/zweinberg@mozilla.com-desc-comb/ if anyone wants to poke at them.
Is a reftest okay, or would you prefer that I add to test_selectors.html?
test_selectors.html has advantages because it tests serialization and cloning "for free", so the more of our selector tests we have there, the better.
Attached patch patch with mochitests (obsolete) — Splinter Review
OK, here is a revised patch which adds all of the tests in attachment 392748 [details] to test_selectors.html.
Attachment #392645 - Attachment is obsolete: true
Attachment #392748 - Attachment is obsolete: true
Attachment #393649 - Flags: review?(dbaron)
Attachment #392645 - Flags: review?(bzbarsky)
Could you explain the idea behind the refactoring?
If you mean using ' ' for mOperator when the selector has a descendant, it's so the patch in bug 494117 won't need to add an mNextOfSomething field.  I'm working on updating that patch now.

If you mean the parser changes, those are just because I couldn't convince myself I had made the right changes to ParseSelectorGroup without rewriting it so the control flow made sense.
(In reply to comment #9)
> If you mean the parser changes, those are just because I couldn't convince
> myself I had made the right changes to ParseSelectorGroup without rewriting it
> so the control flow made sense.

Yeah, that's what I meant.

It looks like there are at least three separate changes here (refactoring ParseSelectorGroup, the greediness optimization in SelectorMatchesTree, and the one described in the bug summary).  In the future, things like that are easier to review if you do them as separate mq patches.
I'd be happy to split this one up if you want.
Don't; I've already started reviewing it.
Comment on attachment 393649 [details] [diff] [review]
patch with mochitests

I'm tempted to say HoistPseudoElement should be static, and the caller
should call mScanner.SetLowLevelError(NS_ERROR_OUT_OF_MEMORY) if it
returns false (which is always for out-of-memory).  However, given that
I plan to get rid of it shortly, I don't know if it's worth the bother
(see below).

The whole pulling-out-pseudo-elements business is really considerably
more avoidable than you make it, since we know the second we parse a
pseudo-class or pseudo-element which one it is, so we could get it right
right at the beginning.  Maybe worth fixing as a separate patch later.

A simpler further simplification (if you don't want the above) would be
to make the IsSinglePseudoClass check a special case of its own else,
i.e., share the same code, for the IsSinglePseudoClass && listsel->mNext
case as for the !IsSinglePseudoClass case.  Also better as a followup.

Actually, I just wrote the first of the above patches, so don't do
either of these.  It should be pretty easy to merge after this lands.
See
http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/3f582d21326c/remove-hoist-pseudo-element

I believe your patch reintroduces the bug that we accept the selector
"div*p" as equivalent to "div * p" instead of as a parsing error (or,
likewise for "div*" or "*div").  Please add tests for these cases and
re-fix the bug.  (There was a reason for the GetToken(PR_FALSE) followed
by the GetToken(PR_TRUE) when parsing combinators.)

>-#define NS_IS_GREEDY_OPERATOR(ch) (ch == PRUnichar(0) || ch == PRUnichar('~'))
>+#define NS_IS_GREEDY_OPERATOR(ch) (ch == PRUnichar(' ') || ch == PRUnichar('~'))

While you're here, please fix the macro to brace its own argument, i.e.,
 ((ch) == PRUnichar ...

>-      if ((NS_IS_GREEDY_OPERATOR(selector->mOperator)) &&
>-          (selector->mNext) &&
>-          (selector->mNext->mOperator != selector->mOperator) &&
>+      if (NS_IS_GREEDY_OPERATOR(selector->mOperator) &&
>+          selector->mNext &&
>+          selector->mNext->mOperator != selector->mOperator &&
>           !(selector->mOperator == '~' &&
>-            selector->mNext->mOperator == PRUnichar(0))) {
>-
>+            (selector->mNext->mOperator == ' ' ||
>+             selector->mNext->mOperator == '>'))) {

And somewhere around here (preferably somewhere it gets checked more
often rather than less), could you toss in an assertion, to check your
own new code, that if the selector has an mNext, its mNext's operator is
not PRUnichar(0)?

 
In nsCSSSelector::ToString:
>-        if (oper != PRUnichar(0)) {
>+        if (oper != PRUnichar(0) && oper != PRUnichar(' ')) {
I think you can assert that it's not PRUnichar(0), and only check
whether it's a space.

+    return NULL;

nsnull

>+void
>+nsCSSSelectorList::CalcWeight()
>+{
>+  mWeight = 0;
>+  for (nsCSSSelector* sel = mSelectors; sel; sel = sel->mNext)
>+    mWeight += sel->CalcWeight();
> }

Could you brace the single-line body of the for-loop?

Also, it might be slightly better to use a local variable for the
accumulating and then assign that into mWeight at the end.

+   * Calculate the weight of the complete list.

...and assign it to mWeight.

Also comment that this should be called only during parsing, and other
callers should just use mWeight.

> }
>-
> </script>

I actually like that blank line (but not the other one).
Attachment #393649 - Flags: review?(dbaron) → review-
Depends on: 511147
Attached patch above, merged to e9cc1a601ec6 (obsolete) — Splinter Review
This is the above patch, merged to e9cc1a601ec6; the difference is because parts of it landed in bug 511147.

I made no attempt to address review comments.

I'll attempt to merge against the next revision shortly.

(Were you planning to get this patch in?)
I got this by merging fe51cec5c007 on top of the previous patch and then diffing the result against the original fe51cec5c007 (bug 520848).  Not sure if I got it right, though, and not compiled/tested.
(In reply to comment #14)
> Created an attachment (id=405339) [details]
> above, merged to e9cc1a601ec6
> 
> This is the above patch, merged to e9cc1a601ec6; the difference is because
> parts of it landed in bug 511147.

Thanks!

> I made no attempt to address review comments.
> I'll attempt to merge against the next revision shortly.
> (Were you planning to get this patch in?)

I was just coming back to it today, since you landed the pseudoclass/pseudoelement patch.
This is a thorough revision addressing all the review comments.  This piece is just refactoring ParseSelector[Group], along the same lines as the previous one, but not exactly the same.  Responsibility for updating the nsCSSSelectorList object is moved to ParseSelector, which makes the control flow in ParseSelectorGroup much easier to follow and doesn't complicate ParseSelector much.  (The loop in ParseSelector has too many exits, but I wish to blame this on EOF not being a token, which I propose to fix in the near future.)

In this version, I left the GetToken() with PR_FALSE followed by GetToken() with PR_TRUE thing strictly alone.  I might mess with that later, but it seems like it ought to be its own bug.

Your other review comments belong to the other piece; however, I wish to draw attention to the new XXX comment in ParseSelector:

    // XXX shouldn't selectors with pseudo-elements contribute to the
    // weight, even if the pseudo-element itself doesn't?

AFAICT, in a selector construct like "a b::pseudo", we don't add anything to the weight for "b" -- both before and after this refactor -- that seems wrong to me.  But I'm leaving it alone.  (The older patch would have changed this, by its introduction of CalcWeight() on lists, which paid no attention to pseudo-elements.)
Attachment #393649 - Attachment is obsolete: true
Attachment #405339 - Attachment is obsolete: true
Attachment #405341 - Attachment is obsolete: true
Attachment #405548 - Flags: review?(dbaron)
This piece is the meat of the patch, changing nsCSSSelector to represent the descendant combinator by ' ' in mOperator, rather than '\0'.  I still change nsCSSSelectorList::AddSelector around a bit, but there is no longer a nsCSSSelectorList::CalcWeight.  Other review comments ...

> >-#define NS_IS_GREEDY_OPERATOR(ch) (ch == PRUnichar(0) || ch == PRUnichar('~'))
> >+#define NS_IS_GREEDY_OPERATOR(ch) (ch == PRUnichar(' ') || ch == PRUnichar('~'))
> 
> While you're here, please fix the macro to brace its own argument, i.e.,
>  ((ch) == PRUnichar ...

Done; are the casts to PRUnichar necessary?

> In nsCSSSelector::ToString:
> >-        if (oper != PRUnichar(0)) {
> >+        if (oper != PRUnichar(0) && oper != PRUnichar(' ')) {
> I think you can assert that it's not PRUnichar(0), and only check
> whether it's a space.

Done.

> +    return NULL;
> nsnull

Done, but why is this necessary?  (Personally I prefer to write the null pointer as '0', but I imagine that's not going to be popular even if it works.)
Attachment #405553 - Flags: review?(dbaron)
I pushed both patches to the try server together, btw.  (Yay, I can do that now!)
When somebody writes "a b::before" we actually represent it as "a b > ::before".
(In reply to comment #20)
> When somebody writes "a b::before" we actually represent it as "a b >
> ::before".

I do realize that.  But, in the present code, *neither* the "b" node nor the "::before" node contributes to the weight, which seems wrong.
Filed bug 524175 on the weight issue, which is quite real.
Comment on attachment 405548 [details] [diff] [review]
revised patch 1/2: refactor ParseSelector/ParseSelectorGroup

Sorry for taking so longe to get to this.

Could you make sure that this patch doesn't cause us to incorrectly 
accept:
  foo::first-letter bar, p
and in turn let it match p elements.  You should add a test that it
doesn't.  I think it's ok because of the eSelectorParsingStatus_Done
returned from ParsePseudoSelector.

Also, are you sure this doesn't break querySelector[All] with extra
whitespace at the end of the selector?  Could you make sure we have 
tests for that?

Please make sure you've merged correctly with the fix to bug 524175.

r=dbaron
Attachment #405548 - Flags: review?(dbaron) → review+
Comment on attachment 405553 [details] [diff] [review]
revised patch 2/2: use ' ' instead of '\0' for descendant combinator

>-      if (!next->IsPseudoElement()) {
>+      if (next->IsPseudoElement()) {
>+        NS_ASSERTION(oper == PRUnichar('>'),
>+                     "improperly chained pseudo element");
>+      } else {
>+        PRUnichar oper = s->mOperator;
>+        NS_ASSERTION(oper != PRUnichar(0),
>+                     "compound selector without combinator");
>+

This doesn't look like it compiles #ifdef DEBUG, since you need to move the
declaration of oper up before the if.

r=dbaron


Also, you should write one additional patch on top of these, to remove
the distinction between eSelectorParsingStatus_Empty and
eSelectorParsingStatus_Error (replace Empty with Error), and move the
error reporting for the cases currently returning empty into
ParseSelector (which the second patch makes easier).
Attachment #405553 - Flags: review?(dbaron) → review+
This needed a lot of merge work.  Also, I added tests for querySelector with trailing whitespace, and for pseudo-elements not at the end of the selector invalidating the entire ruleset.  The latter exposed a bug which was easiest fixed by having ParsePseudoSelector be pickier about what comes after a pseudo-element (it was not looking past whitespace).

This is different enough from the previous iteration that I'd like another pair of eyes on it before it goes in.
Attachment #405548 - Attachment is obsolete: true
Attachment #433195 - Flags: review?(dbaron)
(In reply to comment #25)
> This needed a lot of merge work.  Also, I added tests for querySelector with
> trailing whitespace, and for pseudo-elements not at the end of the selector
> invalidating the entire ruleset.  The latter exposed a bug which was easiest
> fixed by having ParsePseudoSelector be pickier about what comes after a
> pseudo-element (it was not looking past whitespace).

Could you explain the bug in a little more detail, and also explain why the selector parsing status wasn't sufficient to fix it?
(In reply to comment #26)
> (In reply to comment #25)
> > This needed a lot of merge work.  Also, I added tests for querySelector with
> > trailing whitespace, and for pseudo-elements not at the end of the selector
> > invalidating the entire ruleset.  The latter exposed a bug which was easiest
> > fixed by having ParsePseudoSelector be pickier about what comes after a
> > pseudo-element (it was not looking past whitespace).
> 
> Could you explain the bug in a little more detail, and also explain why the
> selector parsing status wasn't sufficient to fix it?

Consider 

  div::first-line > a, #other { ... }

The buggy version of ParsePseudoSelector would look for another construct following 'first-line' but would only do so with GetToken(PR_FALSE), i.e. not skipping whitespace.  It would see the whitespace, assume all was well, and return eSelectorParsingStatus_Done.  That would break us out of the loop in ParseSelector.  But ParseSelector *always* returns eSelectorParsingStatus_Done when it has finished successfully, so ParseSelectorGroup would proceed to look for a combinator, find the '>', and carry on.

The change makes ParsePseudoSelector look ahead *skipping* whitespace and issue an error unless it finds a comma or a {.  This is safe, because the descendant combinator (white space) is improper after a pseudo-element, too.  And we get better diagnostics this way than we would if I bodged ParseSelector into returning eSelectorParsingStatus_Continue when there could be a combinator following.
dbaron: you were correct about 'oper' in nsCSSSelector::ToString.  This revision fixes that, merges up with the previous patch, and fixes up some comments.
Attachment #405553 - Attachment is obsolete: true
Attachment #433233 - Flags: review+
Here's dbaron's requested patch to remove eSelectorParsingStatus_Empty.  It then becomes natural for ParseSelector to return a boolean, so I made that change.  It *almost* makes sense to eliminate the nsSelectorParsingStatus enumeration altogether, but it would mean reexamining the same token a bunch of times, which might be bad.
Attachment #433240 - Flags: review?(dbaron)
Minor corrections based on try server results -- one of the return statements in ParseSelector had the wrong value, and I missed some error cases.
Attachment #433240 - Attachment is obsolete: true
Attachment #433629 - Flags: review?(dbaron)
Attachment #433240 - Flags: review?(dbaron)
Comment on attachment 433195 [details] [diff] [review]
re-revised patch 1/2: refactor ParseSelector/ParseSelectorGroup

r=dbaron.

Please don't make me review this again; it's a pretty messy refactoring to follow, and three times is enough.
Comment on attachment 433629 [details] [diff] [review]
revised patch 3/2: eliminate eSelectorParsingStatus_Empty

r=dbaron
Attachment #433629 - Flags: review?(dbaron) → review+
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: