Closed Bug 1196896 Opened 9 years ago Closed 9 years ago

expose AppendImpliedEOFCharacters via CSSLexer

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: tromey, Assigned: tromey)

References

Details

Attachments

(1 file, 7 obsolete files)

For the devtools as-authored project (see bug 984880) we'd like access
to the implied EOF characters that are known by the CSS lexer.
In particular we'd like AppendImpliedEOFCharacters accessible via
CSSLexer.
Here's the initial patch.  I'm not sure if I really want this handling
of backslash, or something else.
Blocks: 1195398
Attachment #8650653 - Flags: review?(cam)
Comment on attachment 8650653 [details] [diff] [review]
add CSSLexer.impliedEOFCharacters

Review of attachment 8650653 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/CSSLexer.cpp
@@ +83,5 @@
> +                  | nsCSSScanner::eEOFCharacters_DropBackslash)) {
> +    eofChars &= ~(nsCSSScanner::eEOFCharacters_ReplacementChar
> +                  | nsCSSScanner::eEOFCharacters_DropBackslash);
> +    aResult.Append('\\');
> +  }

Why do you want to trailing backslashes differently from how the CSS scanner actually deals with them?  It's hard for me to tell without knowing how you will use this in devtools.
My suggestion, if you want something that returns strings rather than a bitfield like the actual nsCSSScanner::GetEOFCharacters() return value, would be to have two IDL attributes -- one that returns whether a trailing backslash should be removed, and one that returns the AppendImpliedEOFCharacters() string unmodified.  Then you have all the information required to make those fixups in devtools code.
Blocks: 1197967
(In reply to Cameron McCormack (:heycam) from comment #2)

> Why do you want to trailing backslashes differently from how the CSS scanner
> actually deals with them?  It's hard for me to tell without knowing how you
> will use this in devtools.

Yes, sorry, I should have explained this more.

The basic use for this in the devtools is to terminate values when rewriting
CSS text.  This helps avoid problems where unusual user input can have a bad
effect beyond just invalidating a given property (say by invalidating the rest
of the style sheet).

(In reply to Cameron McCormack (:heycam) from comment #3)
> My suggestion, if you want something that returns strings rather than a
> bitfield like the actual nsCSSScanner::GetEOFCharacters() return value,
> would be to have two IDL attributes -- one that returns whether a trailing
> backslash should be removed, and one that returns the
> AppendImpliedEOFCharacters() string unmodified.  Then you have all the
> information required to make those fixups in devtools code.

Sounds good to me, I will make this change.
Thanks.
Attachment #8650653 - Attachment is obsolete: true
Attachment #8650653 - Flags: review?(cam)
Attachment #8655541 - Flags: review?(cam)
Comment on attachment 8655541 [details] [diff] [review]
add CSSLexer.impliedEOFCharacters and CSSLexer.trailingBackslash

Review of attachment 8655541 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/CSSLexer.cpp
@@ +73,5 @@
>    return mScanner.GetColumnNumber();
>  }
>  
> +bool
> +CSSLexer::TrailingBackslash()

DroppedTrailingBackslash might be more descriptive?

@@ +77,5 @@
> +CSSLexer::TrailingBackslash()
> +{
> +  uint32_t eofChars = mScanner.GetEOFCharacters();
> +  return (eofChars & (nsCSSScanner::eEOFCharacters_ReplacementChar
> +                      | nsCSSScanner::eEOFCharacters_DropBackslash)) != 0;

This still loses a bit of information.  A trailing backslash can cause two behaviours: (a) it's in a string, so it's just dropped, and (b) it's outside of a string, so it's dropped, but a U+FFFD character is added in its place.  If you want to handle both of these, you could make TrailingBackslash just return whether eEOFCharacters_DropBackslash is in eofChars, and then allow the U+FFFD to be added to the string returned by GetImpliedEOFCharacters.
Attachment #8655541 - Flags: review?(cam)
(In reply to Cameron McCormack (:heycam) from comment #6)

> This still loses a bit of information.  A trailing backslash can cause two
> behaviours: (a) it's in a string, so it's just dropped, and (b) it's outside
> of a string, so it's dropped, but a U+FFFD character is added in its place. 
> If you want to handle both of these, you could make TrailingBackslash just
> return whether eEOFCharacters_DropBackslash is in eofChars, and then allow
> the U+FFFD to be added to the string returned by GetImpliedEOFCharacters.

The main use for this addition, on the devtools side, is to tidy up weird
user input before editing a style sheet.  In this code I think we definitely
don't want to insert U+FFFD, as that would be confusing.

That said, I do agree that it would be useful (if not immediately used) to
be able to mimic the real CSS parser using this API.  But it seems to me that
the data is recoverable -- if droppedTrailingBackslash is true, then one
can decide whether to add U+FFFD depending on whether or not a quote appears
in impliedEOFCharacters.

I'll add a comment to this effect to the webidl.

Actually looking at

https://dxr.mozilla.org/mozilla-central/source/layout/style/nsCSSScanner.cpp#601

... it seems that the U+FFFD is appended *and* the ReplacementChar bit is set.
I don't understand that.
Updated per review.
Attachment #8655541 - Attachment is obsolete: true
Comment on attachment 8656060 [details] [diff] [review]
add CSSLexer.impliedEOFCharacters and CSSLexer.trailingBackslash

I made the rename and added a comment about trailing backslash handling.

If you still want a change around the U+FFFD thing, then I think perhaps
a second boolean attribute would be good; since I think we don't want
U+FFFD in impliedEOFCharacters for devtools -- we'd just need separate
code to remove it.  Plus there is the oddity that the U+FFFD is already
added (which doesn't presently affect devtools since it is primarily using
impliedEOFCharacters to terminate user input, and in this scenario not
really using the "computed" text).
Attachment #8656060 - Flags: review?(cam)
The backslash-just-before-EOF behaviour needs to append a U+FFFD character to aOutput, since that's what the spec says the "escape" generates.  EOFCharacters indicates what special processing is required if we want to take the input and now make it not appear right before EOF -- in this case, appending an U+FFFD character does the trick.  For example:

  <style>p { --a:ab\</style>

will declare a variable whose value is a single ident token whose value is three characters long -- "a", "b" and U+FFFD.  If we need to paste that value somewhere in the middle of another value:

  <style>p { --b:var(--a)cd</style>

then putting a U+FFFD after the backslash will give the string (using JS string syntax here) "ab\\�" pasted together with the string "cd" to give "ab\\�/**/cd" which will end up being interpreted as two ident tokens (per the rules for pasting token strings together in css-variables).  If we didn't add the U+FFFD character, we'd have "ab\\/**/cd", which would tokenize as an ident "ab/", two symbols "*" and "/", and a second ident "cd".

(Note that for example appending a "0" instead of U+FFFD would also do the trick, as "\0" gets replaced by U+FFFD too.)
So I was wrong in comment 6 above; sorry.  A trailing backslash outside of a string is *not* dropped; we imply a U+FFFD character after the backlash.
(In reply to Tom Tromey :tromey from comment #7)
> The main use for this addition, on the devtools side, is to tidy up weird
> user input before editing a style sheet.  In this code I think we definitely
> don't want to insert U+FFFD, as that would be confusing.

I guess it depends where in the UI you're using this.  If it's in the right hand panel where you edit individual values, then it might or might not make sense to add the U+FFFD there.  Something needs to happen I suppose -- you can't leave the backslash at the end of the value, which in the style sheet would probably be followed by a semicolon.  I am sure no user would complain at dropping the backslash though. :-)

TBH if we have the DroppedTrailingBackslash method then it should reflect whether a trailing backslash was dropped, not whether "a trailing backslash was dropped, OR a trailing backslash was encountered, preserved, and a U+FFFD was implied too".

So how about this: have a single method named something like PerformEOFFixup, which takes the input string and then applies the EOFCharacters instructions to it.  You would need to handle DropBackslash yourself in this function before calling AppendEOFCharacters.  If you definitely want to avoid having user input in the style editor add a U+FFFD when you have a trailing backslash, then you can have an argument to the method which would convert eEOFCharacters_ReplacementChar into eEOFCharacters_DropBackslash, before you do your DropBackslash processing.

Would that work for you?  I apologise for taking a couple of weeks to get back to this.
Flags: needinfo?(ttromey)
(In reply to Cameron McCormack (:heycam) from comment #12)

> So how about this: have a single method named something like
> PerformEOFFixup, which takes the input string and then applies the
> EOFCharacters instructions to it.  You would need to handle DropBackslash
> yourself in this function before calling AppendEOFCharacters.  If you
> definitely want to avoid having user input in the style editor add a U+FFFD
> when you have a trailing backslash, then you can have an argument to the
> method which would convert eEOFCharacters_ReplacementChar into
> eEOFCharacters_DropBackslash, before you do your DropBackslash processing.
> 
> Would that work for you? 

Yeah, this will work for me.  It seems basically equivalent to the first approach,
at least if I understand properly.

All the current cases in my patch series work like this:

function getLexerEOFChars(lexer) {
  return (lexer.droppedTrailingBackslash ? "\\" : "") +
    lexer.impliedEOFCharacters;
}

... which is basically the same idea.

I may be overthinking the trailing backslash issue.  On the one hand, appending
U+FFFD would be confusing in the UI.  But on the other, I agree, just dropping
the backslash would be very unlikely to trouble anyone.
Flags: needinfo?(ttromey)
Rewritten per review.
Attachment #8656060 - Attachment is obsolete: true
Attachment #8656060 - Flags: review?(cam)
Attachment #8661871 - Flags: review?(cam)
Comment on attachment 8661871 [details] [diff] [review]
add CSSLexer.impliedEOFCharacters and CSSLexer.trailingBackslash

Review of attachment 8661871 [details] [diff] [review]:
-----------------------------------------------------------------

PerformEOFFixup needs to remove the trailing backslash, if eEOFCharacters_DropBackslash is present, not add one.  Otherwise re-parsing the string will give a different result.  Parsing a value (using JS string syntax) of '"ab\' should be normalized to '"ab"', not '"ab\\"'.  If devtools would like to have this different behaviour, in addition to the different behaviour of not adding a U+FFFD for a trailing backslash outside a string but instead adding another backslash, then that's what the aAppendBackslash argument should control.  Maybe call it aLessSurprisingTrailingBackslashHandling or something similar? :)

So, ultimately PerformEOFFixup needs to do the following, if those two argument-controlled differences are desired:

  result = aInput
  if aLessSurprisingTrailingBackslashHandling, then:
      if eofChars has eEOFCharacters_ReplacementChar and/or
                      eEOFCharacters_DropBackslash, then:
          append a backslash to result
          remove eEOFCharacters_ReplacementChar and
                 eEOFCharacters_DropBackslash from eofChars
  if eofChars has eEOFCharacters_DropBackslash, then:
      if result ends with "\\", then remove it
  AppendImpliedEOFCharacters(eofChars, result)

r=me if you change PerformEOFFixup to do that and update the comment and test accordingly.
Attachment #8661871 - Flags: review?(cam) → review+
Attached patch add CSSLexer.performEOFFixup (obsolete) — Splinter Review
Thanks for the review.  I finally understand what you were telling me;
I'm sorry it took so long.

I believe this patch addresses the review comments.
Attachment #8661871 - Attachment is obsolete: true
Attached patch add CSSLexer.performEOFFixup (obsolete) — Splinter Review
Rebased; required a tiny fix in the test to account for a let->var
change.
Attachment #8662523 - Attachment is obsolete: true
(In reply to Tom Tromey :tromey from comment #16)
> Thanks for the review.  I finally understand what you were telling me;
> I'm sorry it took so long.

Looks good!  I probably was less than clear explaining the issue. :-)

A couple of nits, if you want to address them before landing:

* You can use result.Truncate(result.Length() - 1) if you want to avoid the extra arg calling Cut.
* Can you mention in the comment in CSSLexer.webidl that the |preserveBackslash = false| behaviour is what the "correct" behaviour according to the CSS Syntax spec is, but that |preserveBackslash = true| might give more intuitive behaviour?
Attached patch add CSSLexer.performEOFFixup (obsolete) — Splinter Review
Attachment #8662525 - Attachment is obsolete: true
Attachment #8662945 - Flags: review+
Keywords: checkin-needed
this needs peer review for the change in /webidl/CSSLexer.webidl
Flags: needinfo?(ttromey)
Keywords: checkin-needed
Oops, I'm sorry about that.
Flags: needinfo?(ttromey)
Comment on attachment 8662945 [details] [diff] [review]
add CSSLexer.performEOFFixup

Needs dom peer review for the webidl change.
Attachment #8662945 - Flags: review+ → review?(bugs)
Comment on attachment 8662945 [details] [diff] [review]
add CSSLexer.performEOFFixup

bz is even better reviewer, given that he reviewed Bug 1152033.
Attachment #8662945 - Flags: review?(bugs) → review?(bzbarsky)
Comment on attachment 8662945 [details] [diff] [review]
add CSSLexer.performEOFFixup

>+  nsString& result(aResult.AsAString());

If you plan to do this anyway, just declare aResult as being of type |nsAString&| and the bindings will do the right thing for you (via DOMString's conversion operator to nsAString, but that's an implementation details).

r=me with that.
Attachment #8662945 - Flags: review?(bzbarsky) → review+
Updated per review.
Attachment #8662945 - Attachment is obsolete: true
Attachment #8664445 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6f33c532c707
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: