
bytecode compression for xml

RESOLVED WONTFIX

Status

()

Core
XML
P2
normal
RESOLVED WONTFIX
18 years ago
11 months ago

People

(Reporter: Alec Flett, Assigned: Alec Flett)

Tracking

({perf})

Trunk
Future
x86
Windows 2000
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

18 years ago
I've been fiddling with an xml parser, and may have found a nice way to tokenize
 xml into a faster-loading bytecode... this could make loading XUL much faster.

the basic idea is this: xml is a very verbose language that often has as much
overhead in syntax as it has raw data. It would be very easy to atomize tags and
attributes, and serialize well-formed xml into a more compact format on disk,
which could be further compressed by a true compression algorithm such as that
found in zip or gzip.

tags, attribute names, <, >, = take up about 35% of an average XUL file. Early
analysis based on existing xul suggests that we could cut the space used by tags
and attribute names by about 60%, an overall savings of about 20% of the file.
My theory is that this atomized file will actually compress even better than
existing xml.

This analysis does not include stripping of xml comments (like the license,
which takes up about 800 bytes per file) or whitespace.
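
A minimal sketch of the atomizing idea described above. This is not the prototype mentioned in this bug (that code is not shown here). Tag and attribute names are interned into a string table, and each occurrence is replaced by a small integer index. The opcode values, the `atomize` name, and the byte layout are all assumptions made for illustration. Tail text after child elements is ignored to keep the sketch short.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical opcodes; the bug does not specify a format.
OP_START, OP_ATTR, OP_TEXT, OP_END = 1, 2, 3, 4


def atomize(xml_text):
    """Serialize well-formed XML into (string_table, bytecode)."""
    atoms = {}          # name -> index in the string table
    code = bytearray()

    def atom(name):
        # Reuse the existing index, or assign the next free one.
        return atoms.setdefault(name, len(atoms))

    def put_string(s):
        data = s.encode("utf-8")
        code.extend(struct.pack("<H", len(data)))
        code.extend(data)

    def walk(elem):
        code.extend(struct.pack("<BH", OP_START, atom(elem.tag)))
        for name, value in elem.attrib.items():
            code.extend(struct.pack("<BH", OP_ATTR, atom(name)))
            put_string(value)
        text = (elem.text or "").strip()   # drop whitespace-only text
        if text:
            code.append(OP_TEXT)
            put_string(text)
        for child in elem:
            walk(child)
        code.append(OP_END)

    # ElementTree drops comments by default, so they never reach the output.
    walk(ET.fromstring(xml_text))
    table = sorted(atoms, key=atoms.get)
    return table, bytes(code)


sample = """<window title="Prefs">
  <!-- license text would go here -->
  <box orient="vertical"><button label="OK"/><button label="Cancel"/></box>
</window>"""

table, code = atomize(sample)
```

Each repeated name such as `button` or `label` costs two bytes per use after its first appearance in the table, which is where the savings on markup would come from. The output is still a plain byte stream, so it can be fed to zip or gzip afterwards.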
(Assignee)

Comment 1

18 years ago
reassigning to me (wanted to get the default XML owner cc'ed)
Assignee: heikki → alecf

Comment 2

18 years ago
What if we just used compression on the JAR files?

Comment 3

18 years ago
The .jar files are already compressed. I think the goal here is to save parser 
time rather than disk space.

Comment 4

18 years ago
Gotcha. alecf, how much of an issue is parser time?
(Assignee)

Comment 5

18 years ago
it's a bit of both.
I think I can warp XML into a format that will compress better, and will
parse faster... this makes the jar files (already compressed) smaller..

further analysis with a larger sample (gotta love perl5's XML::Parser) seems to
indicate that I could actually compact the XML markup by about 70%
the average XUL file is
22% whitespace and comments
27% XML markup (tags, attribute names, and <,>, =, and ")

with the xml-to-bytecode compiler, we could eliminate the comments and 70% of
the 27% markup, a grand total of 30% of the file, before compression.

this is kind of a blue-sky sort of thing, so marking mozilla 1.1 for now
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla1.1
(Assignee)

Comment 6

18 years ago
one way this will make parsing faster is by atomizing the strings. This will
greatly reduce the number of allocations done by the parser's tokenizer because
the tokenizing will be done at compile-time.

Comment 7

18 years ago
Another blue-sky way to address this problem (parsing XUL takes time) is just to 
compile the XUL all the way into the structs (nsXULPrototypeElement, etc.) that 
you actually want to have around in memory. Then you have no parsing to do at 
all: you just do a binary read and relocation fixup. (If I understand correctly, 
this is how `.fasl' files work in many common lisp implementations.)
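
A rough sketch of the "binary read and fixup" idea, under stated assumptions: the record layout below is hypothetical and is not the layout of nsXULPrototypeElement. Each node is a fixed-size record holding a tag index plus first-child and next-sibling indices. The indices stand in for the pointers a C loader would fix up by adding the base address of the buffer it just read.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical fixed-size record: tag atom, first-child index,
# next-sibling index. -1 means "none".
RECORD = struct.Struct("<hhh")


def compile_tree(root):
    """Flatten an element tree into (tag_table, packed_records)."""
    tags, records = [], []

    def emit(elem):
        if elem.tag not in tags:
            tags.append(elem.tag)
        index = len(records)
        records.append([tags.index(elem.tag), -1, -1])
        previous = -1
        for child in elem:
            child_index = emit(child)
            if previous == -1:
                records[index][1] = child_index      # first child
            else:
                records[previous][2] = child_index   # next sibling
            previous = child_index
        return index

    emit(root)
    blob = b"".join(RECORD.pack(*r) for r in records)
    return tags, blob


def load(tags, blob):
    """Rebuild (tag, first_child, next_sibling) tuples with no XML parsing."""
    return [(tags[t], c, s) for t, c, s in RECORD.iter_unpack(blob)]


root = ET.fromstring("<window><box><button/><button/></box><menu/></window>")
tags, blob = compile_tree(root)
nodes = load(tags, blob)
```

Loading here is a single pass over fixed-size records, with no tokenizing and no per-name string handling. Attributes and text are left out to keep the sketch short.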

gagan has suggested that it would be possible to write a cache module (in the 
New New Cache Architecture, of course) that could do this sort of thing.

Comment 8

18 years ago
alecf's idea sounds a lot like WML (the language used in WAP applications). They
replace tags with numbers (or something like that, been a while...) so that a
WAP server needs to transmit less and a WAP browser can have simpler parser. If
you want to pursue this I'd advise you to have a look at some of the WAP/WML
docs... [By the way, the ViewPort SGML/HyTime engine could also store SGML into
a fast loading binary format that was readable only by applications based on
ViewPort. The speed improved typically by a factor of 10.]

However, I find waterson's idea more appealing. If we could compile XUL into a
binary we could directly load into memory it would be even better.

Regardless of the approach I believe we should have the original XUL in normal
XML as it is now, and have some sort of cache for the compiled, fast-loading
versions. I wouldn't want to lose the benefits of XML here. Maybe this has been
implied all along, I just wanted to make it explicit.
(Assignee)

Comment 9

18 years ago
waterson - that sounds pretty cool. as far as this goes, I have two goals here,
in this order:
1) reduce download size
2) speed up reading of XUL from disk..

while the idea of reading in structs from disk does appeal to me, it looks like
WML is really what I should be looking at (thanks for the reference)

As far as caching vs. distributing raw data is concerned, since my biggest concern
is reducing disk/download footprint, I would rather not distribute xml as it is
today... 0.1% of consumers may want to unpack .jar files and muck with the
contents, but the other 99.9% want a fast browser..

My preference is for:
1) distributing some sort of XP compacted format, perhaps wml
2) permanently caching this compressed wml in the new new cache architecture as
structs, like waterson said, so that it's even faster.

Anyway, before I read these comments I fiddled with my perl program a bit more
and determined that I could compact the files to save about 36% overall,
pre-compression. assuming 50% compression (my format is still very compressible)
we could get it down to about 32% of its original size.
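
The arithmetic behind that estimate, spelled out as a small check. The function name is made up for illustration; the 36% and 50% figures are the ones quoted above.

```python
def remaining_fraction(compaction_savings, compression_ratio):
    """Fraction of the original size left after compacting, then compressing.

    compaction_savings: share of the file removed before compression (0.36).
    compression_ratio:  compressed size / uncompressed size (0.50).
    """
    return (1.0 - compaction_savings) * compression_ratio


# 64% of the file survives compaction, and half of that survives compression.
estimate = remaining_fraction(0.36, 0.50)
print(f"{estimate:.0%} of the original size")
```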

Now to explore wml.

Comment 10

18 years ago
Yep, with alecf's investigations, looks like just going a further step ahead
with waterson's pre-compilation will give maximum performance... no parsing and
related chores; just load, and do some pointer arithmetic/initialization, and
all is ready to go at full gear!

Keeping the original files (as heikki reminded), and some other code paths
to still be able to handle things the old text way, is all that is needed to
get set... sounds really appealing, and may cause a great divide amongst those
who keep complaining about the overhead/speed of XUL :-)
Keywords: perf

Comment 11

18 years ago
Of course, as alecf noted, the original files need not be shipped -- except in
cvs and debug builds... as usual, developers get all the crap...

Comment 12

17 years ago
nav triage team:

This would be way cool, but not a beta stopper ;-) Marking nsbeta1-
Keywords: nsbeta1-
(Assignee)

Updated

17 years ago
Target Milestone: mozilla1.1alpha → Future

Updated

16 years ago
QA Contact: petersen → rakeshmishra

Comment 13

16 years ago
Hoping to wake-up interest in this bug, because I have a real customer intranet
application that needs help. The problem is the app is designed around a sort of
homegrown 'GUI framework', implemented as XML/XSL library. The pages use
xsl:import href= for the library files. In this way, then, even a simple page
builds up a 170K xml document. It takes a Very Long Time to parse through this
document. Actual 'execution time' is very small compared to parse time. Problem
is it is all re-parsed each use, nothing much is optimized for subsequent reuse.
(as opposed to IE xml engine which is much faster on reload of these docs). 

Okay, a problem here is the customer considers their app confidential and they
do not want it posted in public domain, so I can't put the testcase here, at
least for now. 

I am far from being expert in XML stylesheet usage. Would also appreciate any
ideas about how to speed up the app itself on Mozilla. Suggesting to them the
obvious answer, use more granular 'xsl library files' and importing only what's
needed as needed... gets the same old answer... "IE does not have a problem, it
is mozilla that has the problem, not our application." ... I hate when that
happens. 

Appreciate any comments or suggestions........ 

Updated

15 years ago
QA Contact: rakeshmishra → ashishbhatt
Given fastload, how much of an issue is this?
QA Contact: ashshbhatt → xml

Updated

11 months ago
Status: ASSIGNED → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → WONTFIX