Closed Bug 1152033 — Opened 9 years ago, Closed 9 years ago

expose CSS tokenizer to javascript

Categories

(Core :: CSS Parsing and Computation, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: tromey, Assigned: tromey)

References

Details

(Keywords: dev-doc-needed)

Attachments

(1 file, 7 obsolete files)

For the "as-authored" feature (bug 984880) in devtools, we'd like to be able to reuse
the existing CSS tokenizer.  This would let us remove our somewhat buggy pure js
tokenizer.

I have a patch in progress, but fwiw what we need, precisely, is some way to provide
a string and get back an iterator over tokens.  The tokens should include any
useful information (the token type of course, but also the identifier text, etc.), and also the
offsets of the start and end of the token (so the precise input text can be retrieved).

Additionally, it would be convenient for another devtools use case to be able to
request that comments be returned as tokens.
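
For concreteness, here is a sketch of the sort of usage we're after
(names here are illustrative, not final):

  // Illustrative sketch -- none of these names are final.
  var input = "a { color: red; /* note */ }";
  var lexer = DOMUtils.getCSSLexer(input);
  var token;
  while ((token = lexer.nextToken()) !== null) {
    // Each token carries its type, any associated value (identifier
    // text, number, etc.), and start/end offsets into the input, so
    // the precise source text can be recovered:
    console.log(token.tokenType,
                input.substring(token.startOffset, token.endOffset));
  }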
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
Here's a first draft of the patch.  No tests yet.

I chose to reuse the webidl enum constants in nsCSSScanner, but didn't
do a wholesale search-and-replace.  Perhaps the latter would be
cleaner, but I wanted to keep the patch reasonably small.  Relatedly,
I did consider that we'd possibly want to keep the internal and
exposed token types separate, in case at some point they need to
differ.  However, this seemed unnecessary and anyway the current
approach doesn't lock us in.

I have a patch to change the devtools to use this code, but I haven't
converted css-autocompleter.js yet.

I had hoped to have the interface be more generator-like, but I didn't
see a way to do that in webidl.  In the devtools patch I replace
css-tokenizer.js with a function* wrapping this interface.
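
Roughly, that wrapper looks like this (a sketch; it assumes the
nextToken()-returns-null-at-end-of-input behavior from the current patch):

  var domUtils = Components.classes["@mozilla.org/inspector/dom-utils;1"]
                   .getService(Components.interfaces.inIDOMUtils);

  // Replace css-tokenizer.js with a generator wrapping the new lexer;
  // nextToken() returns null at end of input.
  function* cssTokenizer(string) {
    var lexer = domUtils.getCSSLexer(string);
    while (true) {
      var token = lexer.nextToken();
      if (!token) {
        break;
      }
      yield token;
    }
  }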

Finally, note the pre-existing FIXME comment in nsCSSScanner::Next
about bug 60290.  After this change, this comment is no longer
properly placed.  However, I don't know where else it should go.
Maybe just deleting it is best.
> I had hoped to have the interface be more generator-like, but I didn't
> see a way to do that in webidl.

You can do that by defining an iterable interface, kinda.  Still not quite the same as a generator.
(In reply to Not doing reviews right now from comment #2)
> > I had hoped to have the interface be more generator-like, but I didn't
> > see a way to do that in webidl.
> 
> You can do that by defining an iterable interface, kinda.  Still not quite
> the same as a generator.

Sorry, I wrote the wrong thing -- the lexer interface is xpcom, not webidl.
A couple notes to remind myself of things to do --

The eventual tests ought to test EOF conditions, so that if bug 880151 is implemented,
we can be sure that the js-exposed tokenizer does what we want.  My feeling is that it
should stick to lexing what is there, not provide any synthetic tokens.

This patch needs to call SetErrorReporter on the tokenizer.  These errors can simply
be dropped, I think.  The tests ought to cover these cases as well.
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
Now with tests.

Testing revealed some oddities in the CSS tokenizer.  It doesn't seem
possible to provoke a BAD_STRING token, and BAD_URL (contra both the
spec and comments in nsCSSScanner.cpp) is not returned if the URL is
unterminated at EOF.  I'll bother people on #css to get an answer
about this.

It was simpler in the end to have nsCSSScanner handle the case of
mReporter==nullptr rather than create a new instance of ErrorReporter
just to throw the result away.  (Plan B here would be to introduce a
new abstract base class.)

I'm mostly done converting existing callers of cssTokenizer to use the
new functionality.  css-autocompleter.js is down to just a few
remaining fixups -- it works strictly on a line+column basis (and,
worse, tries to lex line-by-line, which isn't always valid) and the
current API only returns string offsets.  I may just add line+column
info to nsICSSLexer, as the information is available, and this would
make the conversion simpler.
Attachment #8589655 - Attachment is obsolete: true
Blocks: 1153305
Blocks: 978557
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
I think this version is ready for review.  No answer on #css about
either the oddities I found or who ought to review it; will ask around
a bit more aggressively next week.

I ended up adding line+column info to the interface, on the theory
that the lexer keeps the information anyhow, so it is cheap to expose.
On the other hand, it isn't really that useful in most cases, so I
didn't add it to the token objects.
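
For example (a sketch; the accessor spelling follows the patch as it
stands here and may change in review):

  var domUtils = Components.classes["@mozilla.org/inspector/dom-utils;1"]
                   .getService(Components.interfaces.inIDOMUtils);

  var lexer = domUtils.getCSSLexer("a {\n  color: red;\n}");
  var token;
  while ((token = lexer.nextToken()) !== null) {
    // The lexer reports its current position (roughly, the position
    // just past the most recently returned token).
    console.log(token.tokenType,
                "line " + lexer.getLineNumber() +
                ", column " + lexer.getColumnNumber());
  }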
Attachment #8590488 - Attachment is obsolete: true
(In reply to Tom Tromey :tromey from comment #6)
> I think this version is ready for review.  No answer on #css about
> either the oddities I found or who ought to review it; will ask around
> a bit more aggressively next week.

I think #css is more for people with questions about using CSS; Gecko developers familiar with our CSS code tend to be in #layout.  Any of {me,dbaron,bz,dholbert} would be good reviewers I think.
I think heycam should review it. :-)
(In reply to Tom Tromey :tromey from comment #5)
> Testing revealed some oddities in the CSS tokenizer.  It doesn't seem
> possible to provoke a BAD_STRING token, and BAD_URL (contra both the
> spec and comments in nsCSSScanner.cpp) is not returned if the URL is
> unterminated at EOF.  I'll bother people on #css to get an answer
> about this.

In general, the spec has been changed so that all constructs unclosed at EOF are validly closed.  I don't know if it's been changed well, so the prose might be more up-to-date than some of the formalism.  Bugs present in http://dev.w3.org/csswg/css2/syndata.html#tokenization or http://dev.w3.org/csswg/css-syntax/ should be reported, though; if you're looking at an older spec it might be the wrong spec.  I can completely believe some of our comments are out-of-date.

You should be able to get a BAD_STRING token by including \0, LF, FF, or CR.

You should be able to get BAD_URL with one of three things:
 * url() containing a BAD_STRING
 * url() containing a string followed by something other than whitespace and then a )
 * url( followed by optional whitespace and then the legal characters for an unquoted URL (which, notably, do not include quotation marks, whitespace, parentheses, or ASCII control characters), followed by something other than optional whitespace and a )
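
Concretely, minimal inputs along those lines (a sketch; the token-type
names in the comments follow the patch's current uppercase spellings):

  var domUtils = Components.classes["@mozilla.org/inspector/dom-utils;1"]
                   .getService(Components.interfaces.inIDOMUtils);

  // A raw newline inside a string yields a BAD_STRING token.
  var badString = domUtils.getCSSLexer('"unterminated\n').nextToken();

  // A quotation mark is not a legal character in an unquoted URL,
  // so this yields a BAD_URL token.
  var badUrl = domUtils.getCSSLexer('url(foo"bar)').nextToken();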
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #9)

> Bugs present in
> http://dev.w3.org/csswg/css2/syndata.html#tokenization or
> http://dev.w3.org/csswg/css-syntax/ should be reported, though; if you're
> looking at an older spec it might be the wrong spec.  I can completely
> believe some of our comments are out-of-date.

Thanks.  I was indeed looking at the wrong spec.
I've updated my test case to follow.

While doing this I think I found one bug; brought it up on #layout
but will perhaps also file.
I filed bug 1153981 to document the lexer discrepancy.
It perhaps doesn't really matter, but it's handy at least for commenting
in the test case.
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
Updated to document possibly-unexpected lexer decisions and to link to
bug 1153981; and to fix test failures due to me changing the lexer to
use 0-based lines.
Attachment #8591064 - Attachment is obsolete: true
Attachment #8591820 - Flags: review?(cam)
I hope it's not too late to suggest this, but I wanted to ask whether it wouldn't make sense to model this API on the Parser API[1] exposed for JavaScript.

Sebastian

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API
OS: Linux → All
Hardware: x86_64 → All
I wouldn't be surprised if in the future the Houdini work in the CSS Working Group produces an API that exposes the CSS parser to script, at which point we would probably want to align with that.  (Although that work may well take the SpiderMonkey Parser API / ESTree spec as inspiration too.)

Which is to say, I don't have a strong opinion on how this bug's API looks; I'll leave that to Tom to decide, since he and the other devtools folks will be the ones using it.
(In reply to Sebastian Zartner [:sebo] from comment #13)
> I hope it's not too late yet to suggest this, though I wanted to ask whether
> it wouldn't make sense to orient this API by the Parser API[1] exposed for
> JavaScript.

Yeah - I'm aware of Houdini and should probably have mentioned it earlier.

Currently the reason to expose the CSS tokenizer is just for use by devtools.
For this purpose I think it's fine for the interface to mirror the Firefox
internals in any convenient way.

I agree that in the long run we should have a single way to expose the tokenizer,
and it should conform to the relevant standards.  However, we need something
now, and Houdini doesn't seem to be ready.

So my suggested plan would be to put this in (after reviews etc), and then, when
Houdini is ready, rework this interface and devtools to conform.  This won't be hard
and I'd be happy to do it.

Also see bug 1153981, which I filed to note differences between CSS Level 3 and
Firefox tokenization.  I think this (plus associated parser changes) would be
needed for the Houdini approach.

FWIW I also looked at modifying the CSS parser to expose an AST.  I have like 25% of
a patch for it.  It's pretty complicated, and I felt that a simpler approach might
be preferable for my current goals.  Also, in the devtools code it's often more convenient
to work with a token stream, as some bits of code deal just with a single declaration,
and so would need special entry points into the parser as well.  (I'm happy to discuss
this ad nauseam, but maybe somewhere else.)
Fwiw, I 100% agree with comment 15.
Agreed as well.

One comment, though:

(In reply to Tom Tromey :tromey from comment #15)
> FWIW I also looked at modifying the CSS parser to expose an AST.  I have
> like 25% of
> a patch for it.  It's pretty complicated, and I felt that a simpler approach
> might
> be preferable for my current goals.

If we were to do something like this, we'd probably want to first change the parser to look much more like css-syntax-3, i.e., by separating the abstract (what css-syntax-3 defines) and more concrete levels of parsing.
Blocks: 1154469
Blocks: 930680
Comment on attachment 8591820 [details] [diff] [review]
expose CSS lexer to js

Review of attachment 8591820 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.  Generally this looks good.  A couple of substantive comments/questions below.

::: dom/webidl/CSSToken.webidl
@@ +6,5 @@
> +
> +// The possible values for CSSToken.tokenType.
> +enum CSSTokenType {
> +  // Whitespace.
> +  "WHITESPACE",

I think the string values would be better as lowercase words.  That matches the style other Web IDL enums tend to use.  So "whitespace", "id", "htmlcomment", etc.

You will also need a DOM peer's review on this file.

::: layout/inspector/nsICSSLexer.idl
@@ +11,5 @@
> + * @see CSSToken for a description of the tokens.
> + * @see inIDOMUtils.getCSSLexer to create an instance of the lexer.
> + */
> +[scriptable, uuid(82d0fba1-fb53-4739-9ba0-e8b006253446)]
> +interface nsICSSLexer : nsISupports

Is it important to have an XPIDL interface for this?  (Is the intention for add-ons to be able to use it?)  Given that we are using Web IDL definitions for the meaty parts of the interface (the actual representation of the tokens), could we just make this a Web IDL interface instead?  That'd let us avoid out-params for get{Line,Column}Number, should let us write nsCSSLexer::NextToken without the ToJSValue call (as the dictionary value would be provided as an out-param argument, rather than a JS::MutableHandle), and generally just be a little neater.  WDYT?

::: layout/style/nsCSSLexer.cpp
@@ +39,5 @@
> +}
> +
> +/* [implicit_jscontext] jsval nextToken (); */
> +nsresult
> +nsCSSLexer::NextToken (JSContext* aCx, JS::MutableHandle<JS::Value> aToken)

No space before "(".

@@ +43,5 @@
> +nsCSSLexer::NextToken (JSContext* aCx, JS::MutableHandle<JS::Value> aToken)
> +{
> +  nsCSSToken token;
> +  if (!mScanner.Next(token, eCSSScannerExclude_none)) {
> +    aToken.set(JS::UndefinedValue());

aToken.setUndefined() should work too.

@@ +54,5 @@
> +  resultToken.mStartOffset = mScanner.GetTokenOffset();
> +  resultToken.mEndOffset = mScanner.GetTokenEndOffset();
> +
> +  switch (token.mType) {
> +  case eCSSToken_Whitespace:

Indent case statements one more level.

@@ +96,5 @@
> +  case eCSSToken_Containsmatch:
> +    break;
> +
> +  case eCSSToken_URange:
> +    resultToken.mText.Construct(token.mIdent);

CSSToken.webidl should mention that we set mText in this case.

::: layout/style/nsCSSLexer.h
@@ +11,5 @@
> +#include "nsICSSLexer.h"
> +
> +class nsCSSLexer : public nsICSSLexer
> +{
> + public:

"public:" and "private:" in (1-based) column 1 please.

@@ +22,5 @@
> + private:
> +
> +  virtual ~nsCSSLexer();
> +
> + private:

No need to duplicate "private:".

::: layout/style/nsCSSScanner.h
@@ +26,5 @@
>  enum nsCSSTokenType {
>    // White space of any kind.  No value fields are used.  Note that
>    // comments do *not* count as white space; comments separate tokens
>    // but are not themselves tokens.
> +  eCSSToken_Whitespace = static_cast<int>(mozilla::dom::CSSTokenType::WHITESPACE),     //

Hmm, it would be good not to have nsCSSScanner depend on things in dom/, and I wonder if we can make this a bit cleaner somehow.  What about not specifying the values for each of the nsCSSTokenType enums, but enforcing their equivalence to the CSSTokenType enum values using some static_asserts somewhere (in nsCSSLexer.cpp?) that verify that they are equal?

@@ +190,5 @@
> +  eCSSScannerExclude_none,
> +  // Include whitespace but exclude comments.
> +  eCSSScannerExclude_comments,
> +  // Exclude whitespace and comments.
> +  eCSSScannerExclude_whitespace_and_comments

I would suggest "eCSSScannerExclude_WhitespaceAndComments" to match the naming schemes other enums tend to use.

::: layout/style/test/xpcshell.ini
@@ +1,3 @@
> +[DEFAULT]
> +head = 
> +tail = 

Trailing white space.
Attachment #8591820 - Flags: review?(cam)
(In reply to Cameron McCormack (:heycam) from comment #18)

> I think the string values would be better as lowercase words.  That matches
> the style other Web IDL enums tend to use.  So "whitespace", "id",
> "htmlcomment", etc.
> 
> You will also need a DOM peer's review on this file.

I made the change; I'll r? a DOM peer next round.

> ::: layout/inspector/nsICSSLexer.idl
> @@ +11,5 @@
> > + * @see CSSToken for a description of the tokens.
> > + * @see inIDOMUtils.getCSSLexer to create an instance of the lexer.
> > + */
> > +[scriptable, uuid(82d0fba1-fb53-4739-9ba0-e8b006253446)]
> > +interface nsICSSLexer : nsISupports
> 
> Is it important to have an XPIDL interface for this?  (Is the intention for
> add-ons to be able to use it?)  Given that we are using Web IDL definitions
> for the meaty parts of the interface (the actual representation of the
> tokens), could we just make this a Web IDL interface instead?

As far as I know this isn't important.

I started conversion to WebIDL.  The first issue I have encountered
is that I don't know what to return from GetParentObject.  There doesn't
seem to be any convenient or logical result.

I'm hoping you have some advice here :)

> > +  case eCSSToken_URange:
> > +    resultToken.mText.Construct(token.mIdent);
> 
> CSSToken.webidl should mention that we set mText in this case.

I just removed this line instead, because I don't think it's important
for a token to carry the original source text.  It's easy enough to call
substring if it is needed.

I've got all the other style and naming issues fixed locally.
Flags: needinfo?(cam)
> I don't know what to return from GetParentObject

For which object, exactly?
(In reply to Not doing reviews right now from comment #20)
> For which object, exactly?

For the nsCSSLexer object created by inIDOMUtils.getCSSLexer().  I guess we could make getCSSLexer implicit_jscontext, grab the window from its global, and store that on the nsCSSLexer and return it from GetParentObject?
Flags: needinfo?(cam) → needinfo?(bzbarsky)
(In reply to Not doing reviews right now from comment #20)
> > I don't know what to return from GetParentObject
> 
> For which object, exactly?

The CSSLexer object (aka nsCSSLexer in the patch attached to this bug).
Maybe I can use the inDOMUtils object?
Why does it need a GetParentObject method at all?  Are you making it wrappercached?  If so, just don't do that.  You'll have to do your own wrapping if you want to return it from an xpconnect method (and have the xpconnect method return jsval), but that should be ok.  Use dom::WrapNewBindingNonWrapperCachedObject and pass the current global on the JSContext as scopeArg.

> grab the window from its global

That will fail once someone tries to use this API in a JSM, right?  Let's not need to go there.  ;)

If for some reason we really do want this to be wrappercached, we can use the nsIGlobalObject that's current at call time as the parent or something.  But we really shouldn't need wrappercaching here.
Flags: needinfo?(bzbarsky)
(In reply to Not doing reviews right now from comment #23)
> Why does it need a GetParentObject method at all?  Are you making it
> wrappercached?  If so, just don't do that.

I don't know, I was just trying to follow the only docs I found:
https://developer.mozilla.org/en-US/docs/Mozilla/WebIDL_bindings

Step 2 says to write GetParentObject.
But now I see a note at the end about not needing this in some situation.

I couldn't find any documentation on wrapper caches.  Is there some?
I would like to learn about that.
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
Here is a new version that now uses webidl and that, I believe,
addresses the various style issues.
Attachment #8591820 - Attachment is obsolete: true
Comment on attachment 8599918 [details] [diff] [review]
expose CSS lexer to js

On irc bz said he'd do the DOM peer review bit.
Attachment #8599918 - Flags: review?(cam)
Attachment #8599918 - Flags: review?(bzbarsky)
Comment on attachment 8599918 [details] [diff] [review]
expose CSS lexer to js

It's probably worth documenting explicitly that "text" may not be present in the dictionary.  Also, is it present for url tokens?  Is it present for "number" tokens?  The comment makes it sound like it is, but I expect that it's only present for "dimension" tokens, not "number" ones.  Looking at the code, it's also present for "function" tokens...  Worth documenting the exact behavior.

Similarly, document when the hasSign/isInteger properties are present.  And "number" is present for dimension tokens and percentage tokens, not just number ones, right?

>+  unsigned long getLineNumber();

Is there a reason this is not a "readonly attribute long lineNumber"?  Similar for getColumnNumber.

For nextToken(), please document that null return means EOF?

0-based is an odd choice for line numbers... everything else in the system uses 1-based, no?  I'm OK with it, assuming there's a good reason, but just noting it.

>+#include "CSSLexer.h"

mozilla/dom/CSSLexer.h

>+using namespace mozilla;

For this file, probably better to just wrap all the stuff in:

  namespace mozilla {
  namespace dom {
  ...
  } // namespace dom
  } // namespace mozilla

with the comments like that.  Also, please add similar comments in the header.

>+namespace mozilla
>+{

Curly should be on previous line.

Please document in nsCSSScanner that it does NOT take ownership of the passed-in string, which must therefore outlive the scanner.

r=me on the DOM bits with those nits.
Attachment #8599918 - Flags: review?(bzbarsky) → review+
(In reply to Not doing reviews right now from comment #27)

> Similarly, document when the hasSign/isInteger properties are present. 
[...]

I'll fix up the docs, etc.

> For nextToken(), please document that null return means EOF?
> 
> 0-based is an odd choice for line numbers... everything else in the system
> uses 1-base, no?  I'm OK with it, assuming there's a good reason, but just
> noting it.

I am not sure I would call it a good reason exactly...

After converting all the devtools code to using the new lexer, there is just
a single user of the line/column information, css-autocompleter.js.  It seemed
somewhat difficult (and low-value) to port to using offsets.  And, it already
used 0-based line numbers, because that is what the previous js tokenizer used.

So, we could change this API to be 1-based, but then the only user would have
to wrap it to be 0-based.

Ideally we'd rewrite css-autocompleter to use offsets and then we could simply
remove this part of the API.
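
For reference, the offset-to-line/column mapping css-autocompleter would
need is small (a hypothetical helper, not part of this patch):

  // Hypothetical helper, not part of the patch: derive a 0-based
  // line/column pair from a string offset.
  function offsetToLineColumn(text, offset) {
    var before = text.substring(0, offset);
    var lastNewline = before.lastIndexOf("\n");
    var line = (before.match(/\n/g) || []).length;
    var column = offset - (lastNewline + 1);
    return { line: line, column: column };
  }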
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
I think this version addresses all the review comments from bz.
Attachment #8599918 - Attachment is obsolete: true
Attachment #8599918 - Flags: review?(cam)
Comment on attachment 8600003 [details] [diff] [review]
expose CSS lexer to js

Not sure how to carry over one r+ but not another, so I'll just
note here that I intend to do so.
Attachment #8600003 - Flags: review?(cam)
Comment on attachment 8600003 [details] [diff] [review]
expose CSS lexer to js

Review of attachment 8600003 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  r=me with minor things below addressed.

::: layout/inspector/inDOMUtils.cpp
@@ +45,5 @@
>  #include "nsColor.h"
>  #include "nsStyleSet.h"
>  #include "nsStyleUtil.h"
>  #include "nsQueryObject.h"
> +#include "CSSLexer.h"

I think you missed bz's comment here to make this mozilla/dom/CSSLexer.h.  Also please keep the includes sorted, since we already are.

::: layout/style/CSSLexer.cpp
@@ +12,5 @@
> +namespace dom {
> +
> +// Ensure that constants are consistent.
> +
> +#define CHECK(X, Y) \

Make sure to #undef CHECK after using it, so that it doesn't leak into other files because of unified compilation.

@@ +136,5 @@
> +  }
> +}
> +
> +} // dom
> +} // mozilla

Should be "// namespace mozilla" to match the style used in other files.

::: layout/style/CSSLexer.h
@@ +7,5 @@
> +#define CSSLexer_h___
> +
> +#include "mozilla/UniquePtr.h"
> +#include "nsCSSScanner.h"
> +#include "mozilla/dom/CSSLexerBinding.h"

I think CSSLexerBinding.h isn't actually needed here in CSSLexer.h.

@@ +15,5 @@
> +
> +class CSSLexer : public NonRefcountedDOMObject
> +{
> +public:
> +

Nit: No blank line after this "public:" (and the "private:" below).

@@ +33,5 @@
> +  nsCSSScanner mScanner;
> +};
> +
> +} // dom
> +} // mozilla

Here too.
Attachment #8600003 - Flags: review?(cam) → review+
(In reply to Tom Tromey :tromey from comment #30)
> Not sure how to carry over one r+ but not another, so I'll just
> note here that I intend to do so.

I usually note this with a r=whoever in the attachment title of the updated patch so I don't forget.
(In reply to Cameron McCormack (:heycam) from comment #31)
> Comment on attachment 8600003 [details] [diff] [review]

> > +#include "CSSLexer.h"
> 
> I think you missed bz's comment here to make this mozilla/dom/CSSLexer.h. 

Sorry about that.  I got one spot but missed the other.

> Also please keep the includes sorted, since we already are.

The include lines in inDOMUtils.cpp don't seem sorted to me.
In the new patch I've moved the #include to a spot near some other
mozilla/dom includes.

If you want I can file another bug and sort these #includes.

> > +} // dom
> > +} // mozilla
> 
> Should be "// namespace mozilla" to match the style used in other files.

Yeah, I don't know how I misread bz's request so badly here as well :(
(In reply to Cameron McCormack (:heycam) from comment #31)

> I think CSSLexerBinding.h isn't actually needed here in CSSLexer.h.

It is needed to get the definition of CSSToken for:

  void NextToken(Nullable<CSSToken>& aResult);
Attached patch expose CSS lexer to js (obsolete) — Splinter Review
Updated according to review.
Carrying forward r+.
Attachment #8600003 - Attachment is obsolete: true
Attachment #8600886 - Flags: review+
> It is needed to get the definition of CSSToken for:

Note that you could probably forward-declare that if you really wanted to.  That said, including binding .h files in implementation headers is explicitly ok because other things in bindings could not be forward-declared until recently (e.g. enums).
(In reply to Not doing reviews right now from comment #37)
> > It is needed to get the definition of CSSToken for:
> 
> Note that you could probably forward-declare that if you really wanted to. 
> That said, including binding .h files in implementation headers is
> explicitly ok because other things in bindings could not be forward-declared
> until recently (e.g. enums).

From reading the code, I think Nullable<T> requires T to be complete,
specifically because AlignedStorage2<T> uses sizeof:

    char mBytes[sizeof(T)];

So I think a forward declaration won't work here.
Add missing "explicit", pointed out by the static analysis build.
I'm carrying over the r+ as this is trivial.
Attachment #8600989 - Flags: review+
Attachment #8600886 - Attachment is obsolete: true
> From reading the code, I think Nullable<T> requires T to be complete,

Ah, yes.  You could forward-declare Nullable, but again that's not really necessary.
(In reply to Tom Tromey :tromey from comment #33)
> > Also please keep the includes sorted, since we already are.
> 
> The include lines in inDOMUtils.cpp don't seem sorted to me.
> In the new patch I've moved the #include to a spot near some other
> mozilla/dom includes.

Sorry, was tricked by the context in the patch looking sorted, but the entire list of includes isn't.
That try run is gross at first glance, but I think it is due to
some infrastructure issue, as retriggering various parts of it
made it work ok.  So I am requesting checkin-needed.
Keywords: checkin-needed
For future reference, when making changes to an interface, please run ./mach update-uuids to bump the UUID. Otherwise, the build system won't know to re-build it.

In this case, it would be |./mach update-uuids inIDOMUtils|
Flags: in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/4e973ce874e2
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Just wanted to add my sincere congrats and deep thanks for fixing this RFE.
This will allow me to *deeply* revamp and simplify the stylesheet handler
of BlueGriffon and finally rely on the builtin CSS lexer. Just wow.
Best case ever for Houdini...

One major nit though: the lexer returns incorrect values when asked to
deal with numbers that have a fractional part.  Example:

  var domUtils = Components
                   .classes["@mozilla.org/inspector/dom-utils;1"]
                   .getService(Components.interfaces.inIDOMUtils);
  var lexer = domUtils.getCSSLexer("12.3");
  var token = lexer.nextToken();

And token.number is.... 12.30000019073486
12.3 is not exactly representable as a floating-point number, so that's not surprising.
And in particular, that value is what is actually being stored internally when the CSS in question is parsed.
(In reply to Boris Zbarsky [:bz] from comment #49)
> 12.3 is not exactly representable as a floating-point number, so that's not
> surprising.

But the goal here is to allow access to specified values through lexing
of a stylesheet's contents, and that's also a potential step on Houdini's
path. The token should return the precise value in some way. I agree that
since the startOffset and the endOffset are available, it's always possible
to get the string corresponding to the number; and parseFloat'ing it will
be more reliable than token.number, but the whole thing is a bit overkill
and a bit weird too, I must say.
What does "precise value" mean?  Unless the tokenizer is doing binary coded decimal or something it's not going to be able to precisely express 12.3.  And even if it tried, as soon as you do .number you get an imprecise value in JavaScript.  Try it: (12.3).toFixed(23).  It just has a bit less error because it never goes through a 32-bit float in that case, but the error is still there...
Oh, and if the tokenizer _is_ doing BCD, it's not likely to be the same tokenizer as actually gets used for stylesheets.

Maybe the right answer is to remove .number, since we just can't win with it.
Ah, I was sure we had had this discussion before.  See bug 1163047.
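
In the meantime, a consumer that needs the exact authored text can use
the token offsets (a minimal sketch building on the example from
comment 48):

  var domUtils = Components.classes["@mozilla.org/inspector/dom-utils;1"]
                   .getService(Components.interfaces.inIDOMUtils);

  var input = "12.3";
  var lexer = domUtils.getCSSLexer(input);
  var token = lexer.nextToken();
  // token.number is the lossy float; the offsets recover the
  // authored text exactly:
  var exact = input.substring(token.startOffset, token.endOffset);
  // exact === "12.3"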
Is the add-on-available CSS tokenizer documented somewhere on MDN?  I can't find anything.  Thanks.
I don't think there is any documentation on MDN.
However CSSLexer.webidl has comments describing the interface:
https://dxr.mozilla.org/mozilla-central/source/dom/webidl/CSSLexer.webidl
It's not documented yet. (That's why it still has the dev-doc-needed flag.)
It should be documented here:
https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/CSSLexer

Does somebody want to step up for it?  Otherwise I may take a look at it at some point.

Sebastian