Closed Bug 513132 Opened 15 years ago Closed 14 years ago

minimize css during build/packaging

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: taras.mozilla, Assigned: benedict)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [ts][needs patch][needs review css])

Attachments

(1 file, 6 obsolete files)

Our css parsing code will be quite happy if we make it work less. Removing comments, whitespace should be a a measurable win. Skipping comments is showing up in profiles here.
Whiteboard: ts
This would make CSS errors/warnings fail to report proper source file line numbers, which is kind of unfortunate...
(In reply to comment #1)
> This would make CSS errors/warnings fail to report proper source file line
> numbers, which is kind of unfortunate...

we can try to keep same lines, we'd still have a good speedup
Attached patch trims comments, whitespame (obsolete) — Splinter Review
does not preserve line numbers. I can work on that in the next patch.

also, if this results in a win, there are a few professional-grade CSS Minifiers that seem like they might be worth pursuing.
ben, can you port code from bug 463066 into jarmaker? It claims that it deals with "" and stuff properly, which yours doesn't.
It deals with them more safely, in the sense that if there's any quotes before a comment block, it refuses to remove that comment block.

So 

>"this is all /* not a */ comment" 

is correctly left untrimmed, but so is

> "this is " /*partially a */ "comment".


I wouldn't mind writing a trimmer that handles these a little better, but I guess at some point we might as well just use (or port) a fully-developed trimmer. For the purposes of testing, the attached patch does produce a working browser.

There's also a few special cases I wanted to ask about:
- Taras, you said that '/* comment */' needs to be replaced with '/**/', not with ''. Why is this?
- In bug 463066, '//@' and '-*-' are left alone. Why is this?
Incidentally, here is a really perverse test-case in which the code in bug 463066 trims too aggressively:

> #quoteslinecomment:before {
>     content: " \
> //before cotent";
> }

(Expected result is that "//before content" shows up before the content in the div with id quoteslinecomment, but the trimmer in 463066 will eat line 3)

So again, I think if we're going to get into special cases, there's some more work to be done in getting it right.
I couldn't find anything about //-style comments in the CSS spec. Should I trim those as well...?
Attachment #400167 - Attachment is obsolete: true
Attachment #400428 - Flags: review?(tglek)
(In reply to comment #8)

> I couldn't find anything about //-style comments in the CSS spec. Should I trim
> those as well...?

if you see them in our files, i don't see why not
Attachment #400428 - Attachment is obsolete: true
Attachment #400428 - Flags: review?(tglek)
Attachment #400546 - Flags: review?(tglek)
//-style comments don't show up in our css code, so far as I could tell. Also, the c++ scanner doesn't seem to consider them as comments.
By the way, there's a comment in the CSS Parser that says:

> // XXXldb We need to make the scanner not skip CSS comments!  (Or
> // should we?)

(http://mxr.mozilla.org/mozilla-central/source/layout/style/nsCSSParser.cpp#1183)

which seems like something I should ask about when he gets back.
This ends up saving 20K of css io and appears to be a 1-3% win on n810. Not sure if this is worth further work.

Can you update the patch in bug 510611 to apply on top of this?
Sorry, which patch are you referring to?

By the way, this patch preserves all newlines (as requested above), so there could be some additional savings if we strip those as well.
i was testing with first patch you r?ed me on in this bug and bug 512272 utf16 zerocopy stuff(sorry, got wrong bugno in prev comment)
And yes, lets try this patch with newline stripping too.
Attached patch trims newlines as well (obsolete) — Splinter Review
I'll update the patch in the other bug as well.
Attachment #400574 - Flags: review?(tglek)
Attachment #400574 - Flags: review?(ted.mielczarek)
Did some more tests. I looks like this patch speeds up fennec startup on n810 by 1-4%. The mmap zerocopy stuff(bug 512272) does not speed things up significantly, but it does make numbers slightly more stable.

I think it would be worthwhile to support css comment stripping. Once lower-hanging fruit like bug 511761 are fixed, we should do this too.
Ran some OS X tests on my Mac mini but the difference wasn't greater than the normal noise I see between runs.  Can't say whether it helps or hurts here.
Whiteboard: ts → [ts]
OS: Linux → All
Hardware: x86 → All
Do we want to keep working on this? FWIW, I can't think of any reason that this would ever make performance _worse_, and it definitely decreases CSS size. The only real downside is that it makes the jar'd CSS files less human-readable. (If we keep the newlines, though, it doesn't make them that much less readable.)
We're going to be pushing out onto ever more mobile platforms, and other resource-constrained devices, so projects like this are absolutely worth doing.
Comment on attachment 400574 [details] [diff] [review]
trims newlines as well

I'd rather have Pike review JarMaker changes.

Would be nice to see a unittest for this code, see the other examples in config/tests.
Attachment #400574 - Flags: review?(ted.mielczarek) → review?(l10n)
Comment on attachment 400574 [details] [diff] [review]
trims newlines as well

r-, on various things.

Firstly, I made a mistake in naming modules with CamelCase, they should be camelCase, to be in sync with what python does.

I guess that having trimCSS.py as a standalone script is good for testing? If so, it should read from stdin if no input filename is provided. In that case, it should write to stdout, too.

I'm not so fond of the file close stuff, but I didn't come up with something better.

You shouldn't have to use a temporary file, though, cStringIO should be good enough and less disk-prone.

I'd appreciate the death of [-4:=] == '.css' in favor of endswith() (my bug elsewhere).

The indention in the conditionals is a bit off, the whitespace after \ doesn't align with anything.

I'm not sure that all the functions should be there, or at least if they should be top-level functions.

firstMatchOf could be just
min(map(lambda m: m and m.start or None, [strMatch1, strMatch2, cmtMatch]))

trimCSS could be a generator, not sure if that makes the code prettier or not.

If I read the bug right, keepNewLines should be True in practice.

I'm not sure if we should make the CSS compression behaviour an unconditional behaviour, in particular for external users.

I'm not sure if folks that have set their jar format to symlink want this on.

Even for regular jars, it should probably only overwrite if newer, as there are no defines etc going into the preprocessing that force us to do that unconditionally.

And more comments, at least file-wide and for exposed methods, please. Right now, you have to reverse-engineer regexps to figure out what trimCSS does.
Attachment #400574 - Flags: review?(l10n) → review-
Assignee: nobody → bhsieh
Attached patch addresses some comments (obsolete) — Splinter Review
>Firstly, I made a mistake in naming modules with CamelCase, they should be
>camelCase, to be in sync with what python does.
done.

>I guess that having trimCSS.py as a standalone script is good for testing? If
>so, it should read from stdin if no input filename is provided. In that case,
>it should write to stdout, too.
done.

>I'm not so fond of the file close stuff, but I didn't come up with something
>better.
>
>You shouldn't have to use a temporary file, though, cStringIO should be good
>enough and less disk-prone.
Switched to cStringIO as suggested.

>I'd appreciate the death of [-4:=] == '.css' in favor of endswith() (my bug
>elsewhere).
Okay.

>The indention in the conditionals is a bit off, the whitespace after \ doesn't
>align with anything.
This should be fixed now, or at least I don't see any misaligned in my patch now.

>I'm not sure that all the functions should be there, or at least if they should
>be top-level functions.
Made one private, removed two (just inlined the functionality)

>firstMatchOf could be just
>min(map(lambda m: m and m.start or None, [strMatch1, strMatch2, cmtMatch]))
This isn't exactly what I needed, since I needed the index of the match and also didn't want to choose None unless there was no other option. I tried to stick with the spirit of the suggestion, but it came out a little ugly:

min([strMatch1, strMatch2, cmtMatch],
    key = lambda m: (m and (m.start() or -1)) \
                     or sys.maxint)

Not sure if the conciseness is worth the weirdness here.

>trimCSS could be a generator, not sure if that makes the code prettier or not.
I don't think it would, seems like you'd just end up wrapping it in a loop to write it out anyways.

>If I read the bug right, keepNewLines should be True in practice.
Not sure, but I changed it to be on (keeping newlines) now. Anyone else have an opinion? Throwing out newlines makes for a smaller file, of course...

>I'm not sure if we should make the CSS compression behaviour an unconditional
>behaviour, in particular for external users.
I wasn't sure what to do here. I can make the JarMaker.py take a flag to minify, and then modify our build files to pass in that flag. Is that what we want to do?

>I'm not sure if folks that have set their jar format to symlink want this on.
>
>Even for regular jars, it should probably only overwrite if newer, as there are
>no defines etc going into the preprocessing that force us to do that
>unconditionally.
Okay. 

>And more comments, at least file-wide and for exposed methods, please. Right
>now, you have to reverse-engineer regexps to figure out what trimCSS does.
Added some comments and reduced visibility of methods (only trimCSS should be exposed). 

----

Main question I have is how to address this:
>I'm not sure if we should make the CSS compression behaviour an unconditional
>behaviour, in particular for external users.

Will look into unit tests, also. I have some test inputs written, hopefully I can figure out the rest from examples.
Attachment #400546 - Attachment is obsolete: true
Attachment #400574 - Attachment is obsolete: true
Attachment #400546 - Flags: review?(tglek)
Attachment #400574 - Flags: review?(tglek)
Sorry, 

>This isn't exactly what I needed, since I needed the index of the match and
>also didn't want to choose None unless there was no other option.

isn't exactly right. I meant I needed the match object itself (the first of the three to occur in the string).
Attached patch adds unit test (obsolete) — Splinter Review
The only comment I haven't addressed should be:
>I'm not sure if we should make the CSS compression behaviour an unconditional
>behaviour, in particular for external users.

Again, I can add flags to JarMaker.py and modify the build scripts, but in some sense that's still "unconditional". Also, I'm not sure what an "external user" in this case is.

Could you clarify these points for me?
Attachment #402652 - Attachment is obsolete: true
Attachment #402696 - Attachment is obsolete: true
Attachment #403908 - Flags: review?(l10n)
Whiteboard: [ts] → [ts][has patch][needs review l10n]
So... Is there a reason we're inventing a poor-man's CSS parser here instead of using our own and then just serializing the result?

From a brief skim of the code, this looks like it will change the set of things matched by this CSS selector:

  hb/* a comment */ox

(and in particular, will change it from an invalid selector into a selector matching <hbox>).  I believe I'd mentioned this case to Taras before.  I can probably come up with more fun edge cases like that if you like...
Attachment #403908 - Flags: review?(l10n) → review-
Comment on attachment 403908 [details] [diff] [review]
adds a flag to JarMaker.py to specify minify

This patch needs technical updates for some recent landings to JarMaker.py and the Makefile. The tests should probably use the helper in mozunit.py, which outputs python unit tests in the way that our infra likes. See unit-JarMaker.py for a use case.

I think that the prime reviewer for this should be one or more of our CSS experts though, who are now on the bug.
Whiteboard: [ts][has patch][needs review l10n] → [ts][needs patch][needs review css]
(In reply to comment #28)
>   hb/* a comment */ox
> 
> (and in particular, will change it from an invalid selector into a selector
> matching <hbox>).

The selector rules hurt my head, but isn't "hb/**/ox" equivalent to "hb ox"?
The token stream for that is IDENT, IDENT (since comments are ignored but do separate tokens).  The token stream for "hb ox" is IDENT, S, IDENT.  The selector production is:

  simple_selector [ combinator selector | S+ [ combinator? selector ]? ]?

where combinator is:

  '+' S* | '>' S*

IDENT is a perfectly good |simple_selector|.  But then we match neither the following IDENT matches neither |combinator| nor |S+|, so |selector| only consumes the first IDENT.  The |ruleset| production requires commas between |selector|s and a '{' after them and an IDENT is neither, so this gives a parse error.
Keywords: perf
Blocks: 447581
bsmedberg suggests that if we want to speed up css startup stuff in the future, we probably want to store a token stream or similar (which we create at build time, and compile into libxul). Stripping out whitespace is probably all that can be done with an approach like this one, even removing comments will most likely require a tokenizer.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.