Closed Bug 24762 Opened 25 years ago Closed 24 years ago

optimize RDFGenericBuilder::Is[Container|Empty]()

Categories

(Core Graveyard :: RDF, defect, P2)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: waterson, Assigned: waterson)

References

Details

(Keywords: perf, Whiteboard: [PDT-])

Attachments

(2 files)

This accounts for approximately 13% of the time that it takes to open a large 
mail folder.
Status: NEW → ASSIGNED
Keywords: perf
Priority: P3 → P2
Target Milestone: M14
Blocks: 20192
Keywords: beta1
PDT agrees that "loading a large folder faster" is a goal.  The bug should try 
to say how slow we are relative to 4.x, and what our goal is.  It is always 
scary to see a bug that just says "optimize a specific function" when it is not 
clear that the function *can* be optimized, and not clear what the overall time 
goal is (and hence, even if we wanted to optimize, when should we say enough is 
enough?).

We do want faster performance... and bugs that say what they are after would get 
a PDT+ if we're outside spec, but we couldn't grok this from the bug statement.  
We wouldn't hold beta if this function was not optimized... but we would hold 
beta if our download times are too poor.  That's why we ended up going PDT- on 
*this* bug (don't read it as "performance is not important"!)

Hope that makes sense,

Jim
Whiteboard: [PDT-]
jar: this bug was filed as part of a fan-out of work that scottip, rjc, and i 
discovered while profiling "open" for a large (~2000 message) folder. if you 
*read the bug*, you'll see that 13% of the time is spent in these functions. go 
look at the mail performance stats. what else do you need to know?
add jar to cc list. jar: if you're gonna comment on bugs, add yourself to the cc 
list.
I just wanted to verify Chris's stats.  I just ran Quantify on a 5000 msg folder 
and IsContainer was 12.96% of the time to load a folder.  On a Windows target 
machine 1000 msgs takes 5.28 seconds.  Assuming it's proportional, which it may 
not be, it would take about 26 seconds for 5000 messages. So we'd save close to 
3 seconds if we could eliminate most of the time.
Thanks for the feedback.  I think the beta spec called for less than a factor of
two difference in most email operations from 4.7.  I tried to time the loading
of my 5500 message inbox on my P300 box.  (I wasn't sure if that was even the
time you'd be after... as perchance I'm supposed to blow away prior mail load...
but this seemed like a reasonable guess.)

On 5.0, it took about 14 seconds to get into the folder.
On 4.7 it took about 5 seconds to get into the folder.
I was running the 1/31 comercial build.

Hence this performance is well outside the range for the beta stopper.  I'm
pretty sure this will make PDT+ with that info... even though I wouldn't have
though this performance was that bad from my gut (it measured out as being
pretty different... but I never noticed this as a user :-/  ).

For other performance bugs... to make the issue clearer... it would help if you
would quote the current ratio to 4.7.   Thanks.

I removed the PDT- to get the bug reviewed again.

Whiteboard: [PDT-]
Keywords: beta1
Whiteboard: [NEED INFO]
nominating for beta1

4.7 Loading a 1000 msg pop folder = 1.93
5.0 = 7.03
Keywords: beta1
Putting on PDT+ radar for beta1.
Whiteboard: [NEED INFO] → [PDT+]
Whiteboard: [PDT+] → [PDT+] 2/26
Whiteboard: [PDT+] 2/26 → [PDT+] 3/3
Waterson indicated that this will not be completed by the original 3/3 cutoff
time we assigned, I am removing the PDT+ designation for reconsideration.
Whiteboard: [PDT+] 3/3
PDT- if no time to look at this
Whiteboard: [PDT-]
adding myself to CC. I actually rewrote this function a long time ago and put my
stuff on a branch, I'll see if I can recover the branch/remember what my
optimization was.
ok, on a positive note, someone optimized IsContainmentProperty by forcing the
containment attribute to be at the template root, so that's good.
It looks like IsContainer itself is doing what it should - expensive yes but
right now I'm not sure IsContainer itself can be sped up. However, there are two
places we call IsContainer. One is IsTemplateRuleMatch where we call it inside a
loop with parameters which are invariant during the loop.

I'm attaching a patch to optimize IsTemplateRuleMatch which was calling
IsContainer in a loop on the same element/rule over and over. the patch caches
the IsContainer result outside the loop.

hold the phone I looked at my changes on the branch and now I remember what my
optimizations were.  I'm rewriting them on the tip - I was able to speed up
IsContainmentProperty pretty signifigantly, which speeds up both IsContainer and
IsEmpty() and allows us to re-enable the parent-searching code that was
commented out from IsContainmentProperty(), without any loss in big-O time.

Almost done with my patch, I'll put it here in a bit.
ok, so what you see here is that I've pulled alot of operations out of loops -
essentially pre-calculating things like the containment attribute for a node,
outside of loops where they're needed...in the process I changed the IsEmpty()
and IsContainer() functions to take this pre-calculated value, so that they
didn't have to do the work themselves... the old code that was commented out is
now in GetContainmentAttribute, but now that code is called only where
appropriate, not inside loop iterations.

oh, this essentially makes IsContainmentProperty a constant time operation
INCLUDING support for containment= attributes on any node in the parent chain.

I think we can optimize this even further... looking into that now.
one way that would definately speed this up is if we added APIs to the
nsIRDFDataSource (!) called HasArcLabelOut(nsIRDFResource *aSource
nsIRDFResource *aProperty, PRBool *aResult) - essentially asking if the given
arc label would have been returned in the array of arcs...
Yes, the HasArcOut function would be useful.  When Chris and I were originally 
talking about this it looked like that would speed us up a lot since almost all 
of the time in this function is spent iterating over the arcsout enumerator when 
all we really want to know is if it has the containment property.  Obviously one 
optimization we could make would be to return the containment properties first 
in mailnews but I'm sure that might slow something else down when in fact the 
HasArcsOut function would be even better. Chris had sent around a bunch of 
suggestions for how this might work with pros and cons of each one but I don't 
remember what they were.
Yeah, we could add HasArcLabel[In|Out]. The only thing that turns my stomach is 
that the real semantics are "could possibly have this arc label, but maybe 
doesn't right now". Ugh.

Another idea would be to add some kind of container-manipulation interface that 
you could get to from an nsIRDFDataSource.
Target Milestone: M14 → M15
Target Milestone: M15 → M16
Add arena-based fixed-size allocator for nsInMemoryDataSource and 
nsCompositeDataSource to reduce time spent in ArcLabelsOut, etc. Quantify sez 
this shaves off about 1/3 of the time spent in this routine. Quantify also sez I 
can get another big boost by recycling nsAdapterEnumerator objects, so I'm 
poking with that now.
Allright. I'm going to mark this as fixed, even though there is more work that 
can be done. I'm going to tackle -that- work under that auspices of bug 35022, 
creating a better interface for dealing with RDF containers.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
marking verified per engineer's comments - profiler he is running shows 
improvement
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: