Optimize WrapperCache::getWrapper

NEW
Unassigned

Status

()

Core
DOM
P2
normal
3 months ago
2 months ago

People

(Reporter: ting, Unassigned)

Tracking

(Blocks: 1 bug)

Trunk
Points:
---

Firefox Tracking Flags

(firefox57 affected)

Details

Profile shows following two places are taking time, I'd like to seek chances to improve them:

- Pointer chasing in EdgeNeedsSweepUnbarriered() from GetWrapperPreserveColor() [1]
- Slow ExposeObjectToActiveJS [2]

[1] http://searchfox.org/mozilla-central/rev/18c16ebf818abb86805ce08a6e537e4cd826f044/dom/base/nsWrapperCacheInlines.h#18
[2] http://searchfox.org/mozilla-central/rev/18c16ebf818abb86805ce08a6e537e4cd826f044/dom/base/nsWrapperCacheInlines.h#34
(In reply to Ting-Yu Chou [:ting] from comment #0)
> Profile shows following two places are taking time, I'd like to seek chances
> to improve them:
> 
> - Pointer chasing in EdgeNeedsSweepUnbarriered() from
> GetWrapperPreserveColor() [1]

Seems the result of IsInsideNursery() can be cached in nsWrapperCache::mFlags and updated when UpdateWrapper() gets called.

> - Slow ExposeObjectToActiveJS [2]

I've seen this a few times since I worked on bug 613498 (see bug 613498 comment 9), is it OK to return early if the object is marked as black?
Flags: needinfo?(jdemooij)

Comment 2

3 months ago
Hmm, do we have any spare bits in mFlags. But yes, we could have a flag telling that wrapper is in tenured heap (that is the common case).
(In reply to Ting-Yu Chou [:ting] from comment #1)
> > - Slow ExposeObjectToActiveJS [2]
> 
> I've seen this a few times since I worked on bug 613498 (see bug 613498
> comment 9), is it OK to return early if the object is marked as black?

Hm where? In ExposeGCThingToActiveJS? Note that we also have to check for incremental barriers there.
Flags: needinfo?(jdemooij) → needinfo?(janus926)
(In reply to Jan de Mooij [:jandem] from comment #3)
> Hm where? In ExposeGCThingToActiveJS? Note that we also have to check for
> incremental barriers there.

Yes, in ExposeGCThingToActiveJS. Then are there anything we can do to optimize for already "exposed" GC thing, like return early?
Flags: needinfo?(janus926) → needinfo?(jdemooij)
It seems that TenuredCell::readBarrier does two things: 1) IncrementReadBarrier if the zone is marking which goes to GCMarker::traverse to mark the thing black, 2) UnmarkGrayGCThingRecursively if the thing is marked as gray and UnmarkGrayTracer::onChild will mark them black. Doesn't this imply isMarkedBlack can be an early return condition for ExposeGCThingToActiveJS?
IsInsideNursery needs to access the last page of a Chunk (1MB) where the GC thing lives, which is cache unfriendly. Would it be faster by binary searching the starting addresses of chunks for |ptr &= ~js::gc::ChunkMask| (not sure how many chunks would there be normally)?
Forwarding to Jon for the questions in comment 5 and 6.

If you store isInsideNursery in nsWrapperCache::mFlags, as discussed in this bug, we could optimize ExposeGCThingToActiveJS by taking advantage of that, though.
Flags: needinfo?(jdemooij) → needinfo?(jcoppeard)

Comment 8

3 months ago
(In reply to Ting-Yu Chou [:ting] from comment #5)
> Doesn't
> this imply isMarkedBlack can be an early return condition for
> ExposeGCThingToActiveJS?

Yes, I think this would be correct.  I'm not sure how much improvement that would give you though - quite often this will be false even if there is nothing to do in this function.

> IsInsideNursery needs to access the last page of a Chunk (1MB) where the GC
> thing lives, which is cache unfriendly. Would it be faster by binary
> searching the starting addresses of chunks for |ptr &= ~js::gc::ChunkMask|
> (not sure how many chunks would there be normally)?

That doesn't sound like it would be faster.

Do you have a particular benchmark where this is showing up?
Flags: needinfo?(jcoppeard)
This is in the Speedometer tree to seems like it should be P2.
Priority: -- → P2
(In reply to Jon Coppeard (:jonco) from comment #8)
> Do you have a particular benchmark where this is showing up?

See bug 1380061 comment 1 and comment 9.
If we have nursery/tenured chunks ordered in memory like:

  |N0|N1|N2|N3|T0|T1|T2|T3|


then IsInsideNursery can be a simple comparison to N3's end or T0's start.
FWIW, VTune shows that never inline IsInsideNursery takes only 0.2% for running Speedometer2 on Linux.
(In reply to Ting-Yu Chou [:ting] from comment #11)
> If we have nursery/tenured chunks ordered in memory like:
> 
>   |N0|N1|N2|N3|T0|T1|T2|T3|

Unfortunately the chunks are not ordered like this, so this isn't possible.
You need to log in before you can comment on or make changes to this bug.