Reduce QI overhead for snow white objects

RESOLVED FIXED

Status

()

enhancement
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: mccr8, Assigned: mccr8)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Assignee

Description

2 years ago
Ehsan says the overhead of doing QIs on snow white objects shows up in speedometer. He said "the real issue is the QIs in question need to scan all of the table of interfaces in the child class, then find nothing, then go to the base class and find the answers there".

Comment 1

2 years ago
Example profile: https://perfht.ml/2uYsYI7
Assignee

Comment 2

2 years ago
My idea for doing this is to treat nsISupports like we do native classes. Though I recall there are some gotchas for why subclasses don't work with our approach for native classes, so I'll have to remember what that is...

Updated

2 years ago
Assignee

Updated

2 years ago
See Also: → 802442
Assignee

Comment 3

2 years ago
I've started looking at this. It relies on subclasses that implement their own participant also implementing their own addref and release. I think we can do dynamic checks for that. Hopefully that won't be too expensive for debug builds.

The problem I was thinking of in bug 802442 only applies if we get rid of canonicalize participant entirely. It'll be harder to get rid of for NoteXPCOMChild, because in that case we know nothing. Maybe we could have a class that all CC'd nsISupports classes inherit from, and put some methods on that, but that would likely be quite a bit of work.
Assignee: nobody → continuation

Comment 4

2 years ago
FWIW bug 1384821 reduces the amount of time that NoteXPCOMChild takes up in profiles of Speedometer, if we decide to take the patch there.
Assignee

Updated

2 years ago
See Also: → 1384821
Assignee

Comment 5

2 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> Another possibility if we're willing to break binary compatibility of
> nsISupports is add an explicit nsISupports.getCycleCollectionParticipant
> method and bypass QI altogether.

I've started looking into this. While it is a huge change, it may require less less of code changed, and it'll let us get rid of these weird CC QIs altogether.
Assignee

Updated

2 years ago
Depends on: 1385459
Assignee

Comment 6

2 years ago
Posted patch prototypeSplinter Review
Here's a prototype. It at least compiles and starts the browser. I haven't done any speed tests yet.

It adds a new method Participant() to nsISupports, then uses it instead of the QI to nsXPCOMCycleCollectionParticipant in the CC. In theory we should be able to remove that QI entirely, though some other places may use it.

We still need QI to do the canonicalization of nsISupports during CC traversal. I'm not sure of a way to avoid that, besides adding another method to nsISupports. It might be possible to avoid ambiguous upcasts by doing virtual inheritance of all nsIFoo interfaces, and then we could just static cast to nsISupports when passing pointers to the various CC interfaces, but that would be a large change, and I don't know what the performance impact would be, if any.
Assignee

Updated

2 years ago
Depends on: 1384821
Assignee

Comment 7

2 years ago
Ehsan, do these QIs still show up in profiles? There have been two patches (one by you and one by me) to reduce this overhead.
Flags: needinfo?(ehsan)

Comment 8

2 years ago
My patches haven't landed yet (assuming you're mentioning bug 1384821 and bug 1384798).  Here is a profile from current m-c tip: https://perfht.ml/2vQyOvU  Not sure if this shows any differences compared to comment 1 that you expected?
Flags: needinfo?(ehsan)
Assignee

Comment 9

2 years ago
Oh, sorry, for some reason I'd thought that bug 1384821 had already landed.

It looks like we're still spending just as much time in the table driven QI from the purple buffer, so I guess my patch had no real impact. Maybe calling QI twice in a row on the same object is just as fast as calling it once?

Comment 10

2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #9)
> Oh, sorry, for some reason I'd thought that bug 1384821 had already landed.
> 
> It looks like we're still spending just as much time in the table driven QI
> from the purple buffer, so I guess my patch had no real impact. Maybe
> calling QI twice in a row on the same object is just as fast as calling it
> once?

Yes it should be, because the second time it will be a well predicted branch so it should be almost free...

Comment 11

2 years ago
This is a profile from the latest trunk from the first ~150 subtests of Speedometer.  Things look better now: https://perfht.ml/2i43XpR

Not sure how much further we can improve things...
Assignee

Comment 12

2 years ago
That is a lot better! I guess bug 1384821 worked like magic. It looks like the residual overhead is all from scheduler label stuff. Maybe Bevis has some ideas of easy wins we could get here?
Flags: needinfo?(btseng)
I see only 1ms out of a 30 or so second profile.
Comment hidden (obsolete)

Comment 15

2 years ago
(In reply to Bill McCloskey (:billm) from comment #13)
> I see only 1ms out of a 30 or so second profile.

I think you're looking at the wrong content process perhaps?  Sorry I realized the generated link was misleading, this one is better: https://perfht.ml/2i2ZhQZ

I don't believe in magic.  :-) Reopening.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee

Updated

2 years ago
Flags: needinfo?(btseng)
Assignee

Updated

2 years ago
Depends on: 1390660
Assignee

Updated

2 years ago
Depends on: 1390692
Assignee

Comment 16

2 years ago
Ehsan, when you have a free moment (ha ha), I'd be interested to see if the table driven QI still shows up in profiles. I think I've made enough changes that nothing CC-related should show up for that.
Flags: needinfo?(ehsan)
I'll try to remember to profile this tomorrow.
Flags: needinfo?(bugs)
So I see 11ms QI over 15s period. That is 0.073%
In ehsan's profile there is 16ms over similar period of time. 

So there is something there, but hard to say whether anything to optimize in QI implementation, or should we try to make canSkip and content blasting more effective so that we wouldn't need to traverse so much?
Flags: needinfo?(bugs)

Comment 19

2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #16)
> Ehsan, when you have a free moment (ha ha), I'd be interested to see if the
> table driven QI still shows up in profiles. I think I've made enough changes
> that nothing CC-related should show up for that.

Here is a profile from a build from last Thursday before I went away which does have all of the recent optimizations: https://perfht.ml/2jrGPCd

Now all of the major offenders that I was seeing previously are gone, and the amount of time spent in NS_TableDrivenQI is also reduced drastically, and if you pay close attention to the remaining callers, I don't think any of them would matter as far as Speedometer is concerned really.  So for benchmark purposes I think this bug is all done I think.  :-)  Thanks!
Flags: needinfo?(ehsan)
Assignee

Comment 20

2 years ago
Thanks for checking.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.