Closed Bug 691001 Opened 11 years ago Closed 11 years ago

E4X performance enhancements


(Core :: JavaScript Engine, defect)

(Reporter: andrew, Assigned: andrew)



(3 files, 3 obsolete files)

Simple analysis in profiling tools of "new XML()" being passed a ~500k XML document shows a few pieces of low-hanging fruit that drastically reduce CPU time and memory usage. I'll attach patches for the simplest fixes I can see, which make a big difference in real-world usage.
This patch addresses two simple issues:

1) XMLToXMLString() is a recursive function that calls GetBooleanXMLSetting as its first order of business to check if pretty-printing is enabled. When toString()-ing a large XML object, this winds up getting called N times where N is the number of E4X objects. This simply hoists the property check up a level so that it is not checked on every recursive call.

2) When GetNamespace() is initially called with a NULL set of ancestor namespaces, the function uses an empty namespace array. This has a side-effect of *never* producing a match for the default namespace assigned to every object (""). Thus, every call to GetNamespace() allocates a *new* Namespace object even though they all have the default namespace string (""). This simply creates a new Namespace object with "" values and inserts it into the empty array object in this case. The end result is that only one Namespace object is created for the entire traversal of the XML document. Previously N Namespace objects would be created for N E4X objects.
Attachment #563931 - Flags: review?(brendan)
This patch prevents the creation of a transient StringBuffer, and a JSFlatString created from it, on every AppendAttributeValue call when performing toString(). This avoids triple-copying all attribute text as it is written out, and the memory pressure that comes with doing so.
Attachment #563932 - Flags: review?(brendan)
Re-upped last patch, missed two NULL->false return value changes.
Attachment #563932 - Attachment is obsolete: true
Attachment #563932 - Flags: review?(brendan)
Attachment #563933 - Flags: review?(brendan)
Here are performance stats to give you an idea of how big a difference just these changes make. I applied these patches and also added this printf as the last thing in ToXMLString() prior to returning:

    printf("compartment %u\n", cx->compartment->gcBytes);

My test loads a ~500k XML document into an object and then calls toString() on it 10 times. When running this test, I am also timing the wall & CPU time.

Without patches:

compartment 6365184
compartment 5775360
compartment 8212480
compartment 4509696
compartment 6946816
compartment 9379840
compartment 5611520
compartment 8044544
compartment 4329472
compartment 6766592
wall 9416.54ms user 9210.00ms

With patches:

compartment 4112384
compartment 4001792
compartment 4186112
compartment 4370432
compartment 3948544
compartment 4132864
compartment 4317184
compartment 4501504
compartment 4079616
compartment 4263936
wall 4689.59ms user 4440.00ms
Comment on attachment 563933 [details] [diff] [review]
AppendAttributeValue escapes values without creating transient StringBuffer and JSFlatString

Looks great, ask #jsapi to land for you -- traveling and super-busy, sorry I can't do it.

Attachment #563933 - Flags: review?(brendan) → review+
Assignee: general → andrew
Re-up patch as export with header for checkin.
Attachment #563933 - Attachment is obsolete: true
Attachment #569259 - Flags: review+
Re-up patch as export with header for checkin.
Attachment #563931 - Attachment is obsolete: true
Attachment #563931 - Flags: review?(brendan)
Attachment #569260 - Flags: review+
Keywords: checkin-needed
Andrew, in the future could you please include the bug number in the checkin comments?

Flags: in-testsuite?
Keywords: checkin-needed
Target Milestone: --- → mozilla10
Your patch is a great optimization!

Not sure how much time EscapeAttributeValueBuffer takes, but is it worth optimizing it so that chunks of non-escaped chars are appended in one go? Appending an array of chars results in a single memcpy, but appending them one by one, with a length check on each, is much more costly.
My optimizations were guided by Quantify. After applying the patch, EscapeAttributeValueBuffer didn't show up as a hotspot. I suppose things can always be made faster, but I was just focusing on small wins that were grossly affecting the run time.