Closed
Bug 69426
Opened 24 years ago
Closed 8 years ago
bytecode compression for xml
Categories
(Core :: XML, defect, P2)
Tracking
RESOLVED
WONTFIX
Future
People
(Reporter: alecf, Assigned: alecf)
Details
(Keywords: perf)
I've been fiddling with an xml parser, and may have found a nice way to tokenize
xml into a faster-loading bytecode... this could make loading XUL much faster.
the basic idea is this: xml is a very verbose language that often has as much
overhead in syntax as it has raw data. It would be very easy to atomize tags and
attributes, and serialize well-formed xml into a more compact format on disk,
which could be further compressed by a true compression algorithm such as that
found in zip or gzip.
tags, attribute names, <, >, = take up about 35% of an average XUL file. Early
analysis based on existing xul suggests that we could cut the space used by tags
and attribute names by about 60%, an overall savings of about 20% of the file.
My theory is that this atomized file will actually compress even better than
existing xml.
This analysis does not include stripping of xml comments (like the license,
which takes up about 800 bytes per file) or whitespace.
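To make the idea concrete, here is a minimal sketch of what such an encoder could look like (the opcodes, layout, and names are invented for illustration, not actual Mozilla code): tag and attribute names are interned into a per-file string table once, and the element stream refers to them by a one-byte index.

// Hypothetical encoder sketch (invented opcodes and layout, not Mozilla code).
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum Op : uint8_t { OPEN_TAG = 1, ATTR = 2, CLOSE_TAG = 3 };

struct Encoder {
    std::vector<std::string> names;         // index -> interned name
    std::map<std::string, uint8_t> lookup;  // interned name -> index
    std::vector<uint8_t> out;               // the bytecode stream

    // Store a tag/attribute name once; every later use costs one byte.
    uint8_t Intern(const std::string& name) {
        auto it = lookup.find(name);
        if (it != lookup.end()) return it->second;
        uint8_t id = static_cast<uint8_t>(names.size());  // sketch: assumes <= 256 names
        names.push_back(name);
        lookup.emplace(name, id);
        return id;
    }

    void OpenTag(const std::string& name) {
        out.push_back(OPEN_TAG);
        out.push_back(Intern(name));
    }

    void Attr(const std::string& name, const std::string& value) {
        out.push_back(ATTR);
        out.push_back(Intern(name));
        out.push_back(static_cast<uint8_t>(value.size()));  // sketch: values < 256 bytes
        out.insert(out.end(), value.begin(), value.end());
    }

    void CloseTag() { out.push_back(CLOSE_TAG); }
};

For example, <button label="OK"/> is 20 characters of text but only 8 bytes of stream (opcode + name index, attribute opcode + name index + length-prefixed 2-byte value, close opcode), with the two names stored once in the table and amortized across the whole file. That is roughly the kind of markup saving estimated above.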
Assignee
Comment 1•24 years ago
reassigning to me (wanted to get the default XML owner cc'ed)
Assignee: heikki → alecf
Comment 2•24 years ago
What if we just used compression on the JAR files?
Comment 3•24 years ago
The .jar files are already compressed. I think the goal here is to save parser
time rather than disk space.
Comment 4•24 years ago
Gotcha. alecf, how much of an issue is parser time?
Assignee
Comment 5•24 years ago
it's a bit of both.
I think I can warp XML into a format that will compress better and parse
faster... this makes the jar files (already compressed) smaller.
further analysis with a larger sample (gotta love perl5's XML::Parser) seems to
indicate that I could actually compact the XML markup by about 70%
the average XUL file is
22% whitespace and comments
27% XML markup (tags, attribute names, and <,>, =, and ")
with the xml-to-bytecode compiler, we could eliminate the comments and 70% of
the 27% markup (0.70 × 27% ≈ 19% of the file, with the comments making up the
rest), a grand total of about 30% of the file, before compression.
this is kind of a blue-sky sort of thing, so marking mozilla 1.1 for now
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla1.1
Assignee
Comment 6•24 years ago
one way this will make parsing faster is by atomizing the strings. This will
greatly reduce the number of allocations done by the parser's tokenizer because
the tokenizing will be done at compile-time.
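For readers unfamiliar with atomization, a minimal sketch of the mechanism (an assumed design for illustration, not the actual parser code): each distinct name is allocated exactly once, and every later occurrence reuses that allocation.

// Minimal atom table sketch -- illustrative, not the real parser's design.
#include <cstdio>
#include <memory>
#include <string>
#include <unordered_map>

class AtomTable {
    std::unordered_map<std::string, std::shared_ptr<const std::string>> atoms_;
public:
    std::shared_ptr<const std::string> Atomize(const std::string& name) {
        auto& slot = atoms_[name];
        if (!slot) slot = std::make_shared<const std::string>(name);  // allocate on first sighting only
        return slot;
    }
};

int main() {
    AtomTable table;
    auto a = table.Atomize("box");
    auto b = table.Atomize("box");            // second lookup: no new allocation
    std::printf("%d\n", a.get() == b.get());  // prints 1: same underlying string
}

A XUL file with 500 <box> elements then pays for one "box" allocation instead of 500, and comparing two atomized names is a pointer compare. Once tokenizing happens at compile time, even the hash lookup leaves the load path, because the bytecode carries the atom index directly.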
Comment 7•24 years ago
Another blue-sky way to address this problem (parsing XUL takes time) is just to
compile the XUL all the way into the structs (nsXULPrototypeElement, etc.) that
you actually want to have around in memory. Then you have no parsing to do at
all: you just do a binary read and relocation fixup. (If I understand correctly,
this is how `.fasl' files work in many common lisp implementations.)
gagan has suggested that it would be possible to write a cache module (in the
New New Cache Architecture, of course) that could do this sort of thing.
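A rough illustration of that `.fasl'-style idea (a hypothetical layout, not nsXULPrototypeElement): if prototype nodes live in one flat array and refer to each other by index rather than by pointer, loading a cached document is a single binary read, and the relocation fixup becomes trivial or disappears.

// ".fasl"-style cache sketch (hypothetical layout, not nsXULPrototypeElement).
#include <cstdint>
#include <cstdio>
#include <vector>

// Nodes reference each other by array index, not by pointer, so the
// on-disk bytes are position-independent.
struct ProtoNode {
    uint32_t nameAtom;     // index into the document's atom table
    uint32_t firstChild;   // index into the node array, 0 = none
    uint32_t nextSibling;  // index into the node array, 0 = none
};

static bool WriteCache(const char* path, const std::vector<ProtoNode>& nodes) {
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    uint32_t count = static_cast<uint32_t>(nodes.size());
    std::fwrite(&count, sizeof count, 1, f);
    std::fwrite(nodes.data(), sizeof(ProtoNode), count, f);
    return std::fclose(f) == 0;
}

static bool ReadCache(const char* path, std::vector<ProtoNode>& nodes) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    uint32_t count = 0;
    if (std::fread(&count, sizeof count, 1, f) != 1) { std::fclose(f); return false; }
    nodes.resize(count);
    size_t got = std::fread(nodes.data(), sizeof(ProtoNode), count, f);
    std::fclose(f);
    return got == count;  // no pointer relocation needed: indices are already valid
}

A real cache file would of course also want a version stamp and endianness check before trusting the bytes; the point is only that index-based references sidestep pointer relocation entirely.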
Comment 8•24 years ago
alecf's idea sounds a lot like WML (the language used in WAP applications). They
replace tags with numbers (or something like that, been a while...) so that a
WAP server needs to transmit less and a WAP browser can have a simpler parser.
If you want to pursue this I'd advise you to have a look at some of the WAP/WML
docs... [By the way, the ViewPort SGML/HyTime engine could also store SGML in a
fast-loading binary format that was readable only by applications based on
ViewPort. The speed typically improved by a factor of 10.]
However, I find waterson's idea more appealing. If we could compile XUL into a
binary that we could load directly into memory, it would be even better.
Regardless of the approach I believe we should have the original XUL in normal
XML as it is now, and have some sort of cache for the compiled, fast-loading
versions. I wouldn't want to lose the benefits of XML here. Maybe this has been
implied all along, I just wanted to make it explicit.
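To show the reader side of that WML/WBXML-style scheme, here is a decoder sketch matching the hypothetical encoder in the description (the opcode values are invented for illustration; real WBXML defines its own token assignments): the tokenizer collapses into a switch over single-byte opcodes, with no per-token string scanning.

// Decoder sketch for the hypothetical bytecode above -- not real WBXML tokens.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

enum Op : uint8_t { OPEN_TAG = 1, ATTR = 2, CLOSE_TAG = 3 };

void Decode(const std::vector<uint8_t>& in,
            const std::vector<std::string>& names) {
    size_t i = 0;
    while (i < in.size()) {
        switch (in[i++]) {
            case OPEN_TAG:   // opcode byte + name-index byte
                std::printf("open  %s\n", names[in[i++]].c_str());
                break;
            case ATTR: {     // name-index byte + length-prefixed value
                const std::string& name = names[in[i++]];
                uint8_t len = in[i++];
                std::printf("attr  %s=\"%.*s\"\n", name.c_str(),
                            static_cast<int>(len),
                            reinterpret_cast<const char*>(in.data() + i));
                i += len;
                break;
            }
            case CLOSE_TAG:
                std::printf("close\n");
                break;
        }
    }
}

Note there is no character-level scanning, quoting, or entity expansion left at load time; under this scheme all of that would have been done once by the compiler.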
Assignee
Comment 9•24 years ago
waterson - that sounds pretty cool. as far as this goes, I have two goals here,
in this order:
1) reduce download size
2) speed up reading of XUL from disk
while the idea of reading in structs from disk does appeal to me, it looks like
WML is really what I should be looking at (thanks for the reference)
As far as caching vs. distributing raw data is concerned, since my biggest
concern is reducing disk/download footprint, I would rather not distribute xml
as it is today... 0.1% of consumers may want to unpack .jar files and muck with
the contents, but the other 99.9% want a fast browser.
My preference is for:
1) distributing some sort of XP compacted format, perhaps wml
2) permanently caching this compressed wml in the new new cache architecture as
structs, like waterson said, so that it's even faster.
Anyway, before I read these comments I fiddled with my perl program a bit more
and determined that I could compact the files to save about 36% overall,
pre-compression. Assuming 50% compression (my format is still very compressible)
we could get it down to about 32% of its original size (0.64 × 0.5 = 0.32).
Now to explore wml.
Comment 10•24 years ago
Yep, with alecf's investigations, it looks like going one step further with
waterson's pre-compilation will give maximum performance... no parsing and
related chores; just load, do some pointer arithmetic/initialization, and all
is ready to go at full speed!
Keeping the original files (as heikki reminded), and some other code paths that
can still handle things the old text way, is all that is needed to get set...
sounds really appealing, and may cause a great divide amongst those who keep
complaining about the overhead/speed of XUL :-)
Keywords: perf
Comment 11•24 years ago
Of course, as alecf noted, the original files need not be shipped -- except in
cvs and debug builds... as usual, developers get all the crap...
Comment 12•24 years ago
nav triage team:
This would be way cool, but not a beta stopper ;-) Marking nsbeta1-
Keywords: nsbeta1-
Assignee
Updated•23 years ago
Target Milestone: mozilla1.1alpha → Future
Updated•23 years ago
QA Contact: petersen → rakeshmishra
Comment 13•23 years ago
Hoping to wake up interest in this bug, because I have a real customer intranet
application that needs help. The problem is the app is designed around a sort of
homegrown 'GUI framework', implemented as an XML/XSL library. The pages use
xsl:import href= for the library files. In this way, even a simple page builds
up a 170K xml document. It takes a Very Long Time to parse through this
document. Actual 'execution time' is very small compared to parse time. The
problem is it is all re-parsed on each use; nothing much is optimized for
subsequent reuse (as opposed to the IE xml engine, which is much faster on
reload of these docs).
Okay, a problem here is the customer considers their app confidential and they
do not want it posted in the public domain, so I can't put the testcase here,
at least for now.
I am far from being an expert in XML stylesheet usage. Would also appreciate
any ideas about how to speed up the app itself on Mozilla. Suggesting to them
the obvious answer, to use more granular 'xsl library files' and import only
what's needed as needed... gets the same old answer... "IE does not have a
problem, it is mozilla that has the problem, not our application." ... I hate
when that happens.
Appreciate any comments or suggestions...
Updated•22 years ago
QA Contact: rakeshmishra → ashishbhatt
Comment 14•21 years ago
Given fastload, how much of an issue is this?
Updated•16 years ago
QA Contact: ashishbhatt → xml
Updated•8 years ago
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX