Closed Bug 1392928 Opened 7 years ago Closed 3 years ago

Consider keeping unused Ion LifoAllocs because freeing them is not cheap

Categories

(Core :: JavaScript Engine, defect, P3)

x86_64
Windows
defect

Tracking


RESOLVED INCOMPLETE
Performance Impact medium
Tracking Status
firefox57 --- affected

People

(Reporter: ting, Assigned: nbp)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

I saw LifoAlloc::freeAll on both VTune and WPA; the cost is mainly in VirtualFree and in memset for poisoning. I wonder: 1) can we keep unused BumpChunks on a linked list instead of freeing them directly, and 2) is poisoning necessary?

xul.dll!js::jit::FreeIonBuilder
  xul.dll!js::LifoAlloc::freeAll
    mozglue.dll!free_impl
      mozglue.dll!arena_run_dalloc
        mozglue.dll!arena_purge [1]
          mozglue.dll!pages_decommit
            KernelBase.dll!VirtualFree
      vcruntime140.dll!memset [2][3]

[1] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/memory/mozjemalloc/mozjemalloc.cpp#2967
[2] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/memory/mozjemalloc/mozjemalloc.cpp#3562
[3] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/memory/mozjemalloc/mozjemalloc.cpp#3636
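Roughly, idea 1) could look like the sketch below. This is only a minimal illustration with made-up types (this BumpChunk and ChunkFreeList are not the real js/src/ds/LifoAlloc.h classes): freed chunks go onto a singly linked free list and are handed back out on the next allocation, so the hot path skips both the jemalloc free (and its decommit) and the poisoning memset.

// Hypothetical sketch; not the real LifoAlloc/BumpChunk API.
#include <cstddef>
#include <cstdlib>
#include <new>

struct BumpChunk {
  BumpChunk* next;
  size_t size;
  // A real chunk would also carry the bump-pointer bookkeeping and payload.
};

class ChunkFreeList {
  BumpChunk* head_ = nullptr;

 public:
  // Instead of free(): keep the chunk around for reuse.
  void recycle(BumpChunk* chunk) {
    chunk->next = head_;
    head_ = chunk;
  }

  // Instead of malloc(): reuse a pooled chunk when one is available.
  // (A real version would also check that the pooled chunk is big enough.)
  BumpChunk* acquire(size_t size) {
    if (head_) {
      BumpChunk* chunk = head_;
      head_ = chunk->next;
      return chunk;  // no VirtualFree/decommit, no poisoning memset
    }
    void* raw = std::malloc(size);  // size would cover header + payload
    return raw ? new (raw) BumpChunk{nullptr, size} : nullptr;
  }

  ~ChunkFreeList() {
    while (BumpChunk* chunk = head_) {
      head_ = chunk->next;
      std::free(chunk);
    }
  }
};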
(In reply to Ting-Yu Chou [:ting] from comment #0)
> I saw LifoAlloc::freeAll on both VTune and WPA, the cost is mainly in
> VirtualFree and memset for poisoning. I wonder 1) can we have a unused
> BumpChunks linked list to keep them instead of freeing them directly, 2) is
> poisoning necessary?

(2) is moot: there won't be any poisoning if we don't free them.
Assignee: nobody → janus926
Attached patch experiment patch (obsolete) — Splinter Review
On my desktop, running https://mozilla.github.io/arewefastyet-speedometer/2.0/ with attachment 8904212 [details] [diff] [review] I got:

  83.88, 81.81, 89.16, 89.65, 87.98

m-c 1401e3eec44d got:

  86.78, 87.84, 87.20, 87.68, 82.13
With this experiment patch, I got the following numbers on Ubuntu for https://mozilla.github.io/arewefastyet-speedometer/2.0, which shows Speedometer 2.0-r221659:

  46.36, 45.55, 45.95, 45.45, 45.97

and without the patch:

  43.91, 43.73, 43.68, 44.59, 43.40

The experiment patch does not free any BumpChunk in the pool, but there should be a size limit like ChunkPool [1]. I am thinking of freeing chunks in BumpChunk::delete_ when the pool reaches its size limit and can't take any more chunks, rather than freeing them at GC time, so we don't free many of them at once.

What do you think?

[1] http://searchfox.org/mozilla-central/rev/67f38de2443e6b613d874fcf4d2cd1f2fc3d5e97/js/src/jsgc.cpp#726
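A sketch of that size-limited variant, again with made-up names (BoundedChunkPool, kMaxPooledChunks, and this deleteChunk are illustrative, not the real BumpChunk::delete_): destruction pools the chunk and only falls back to free() once the pool is full, so chunks are released one at a time instead of in a big batch during GC.

// Hypothetical sketch of the size-limited pool idea.
#include <cstddef>
#include <cstdlib>

struct BumpChunk {
  BumpChunk* next = nullptr;
};

struct BoundedChunkPool {
  static constexpr size_t kMaxPooledChunks = 16;  // illustrative limit
  BumpChunk* head = nullptr;
  size_t count = 0;
};

// What a pool-aware BumpChunk::delete_ could do.
inline void deleteChunk(BoundedChunkPool& pool, BumpChunk* chunk) {
  if (pool.count < BoundedChunkPool::kMaxPooledChunks) {
    chunk->next = pool.head;   // pool has room: keep the chunk for reuse
    pool.head = chunk;
    pool.count++;
    return;
  }
  std::free(chunk);  // pool is full: give this one back immediately
}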
Attachment #8904212 - Attachment is obsolete: true
Attachment #8905421 - Flags: feedback?(luke)
Whiteboard: [qf]
(In reply to Ting-Yu Chou [:ting] from comment #4)
> With this experiment patch, I got following numbers on Ubuntu for
> https://mozilla.github.io/arewefastyet-speedometer/2.0, which shows
> Speedometer 2.0-r221659:
>   46.36, 45.55, 45.95, 45.45, 45.97
> and without the patch:
>   43.91, 43.73, 43.68, 44.59, 43.40

The numbers on Windows with the patch are:

  43.87, 43.16, 43.59, 43.53, 43.41

m-c:

  41.99, 41.30, 41.53, 41.46, 41.98
Comment on attachment 8905421 [details] [diff] [review]
experiment patch v2

Thanks for the preliminary investigation here, but I don't think this is the right strategy to remove this overhead.

Normally, LifoAllocs *do* keep BumpChunks around during normal use, via mark()+release().  freeAll() is explicitly intended to release memory, with the goal of avoiding memory that stays permanently allocated in a long-lived LifoAlloc and never gets given back.  In particular, if you have a one-time spike in usage, you don't want that to hang around forever.

So if you're seeing freeAll() hot in profiles, the solutions that make sense to me are:
 - can we call freeAll() less aggressively (e.g., wait until a GC and then use the GC's background free thread to do the actual free() calls)?
 - can we create some freeExcessAllocation() that just frees down to a moderate allocation, so that in most cases we're not freeing anything but we still avoid permanently wasting memory because of a one-time spike in usage?

Jan, WDYT?
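To make the second option concrete, here is a minimal sketch under made-up assumptions (the real js::LifoAlloc has no freeExcessAllocation(), and its chunk list is more involved than this): keep up to `budget` bytes of chunks for reuse and free the rest, so a one-time spike is eventually given back while the common case never reaches free().

// Hypothetical sketch of a freeExcessAllocation()-style method.
#include <cstddef>
#include <cstdlib>

struct Chunk {
  Chunk* next;
  size_t size;
};

struct SimpleLifoAlloc {
  Chunk* chunks = nullptr;

  void freeExcessAllocation(size_t budget) {
    size_t retained = 0;
    Chunk** link = &chunks;
    while (Chunk* chunk = *link) {
      if (retained + chunk->size <= budget) {
        retained += chunk->size;
        link = &chunk->next;   // keep this chunk for reuse
      } else {
        *link = chunk->next;   // unlink and release the excess
        std::free(chunk);
      }
    }
  }
};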
Flags: needinfo?(jdemooij)
Attachment #8905421 - Flags: feedback?(luke) → feedback-
A good start to this would be to collect the total size of these buffers for all compiles across a browsing session, look for a good 80th or 90th percentile cutoff, and, if that's reasonable, use that as the "standard" size.
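For illustration only, computing that cutoff from collected per-compilation sizes could be as simple as the following (collectedSizes is a hypothetical name for whatever telemetry gathers the data):

#include <algorithm>
#include <cstddef>
#include <vector>

// Sort the observed sizes and pick the value at the requested percentile.
size_t percentileCutoff(std::vector<size_t> sizes, double percentile) {
  if (sizes.empty()) {
    return 0;
  }
  std::sort(sizes.begin(), sizes.end());
  size_t index = static_cast<size_t>(percentile * (sizes.size() - 1));
  return sizes[index];
}

// e.g. size_t standardSize = percentileCutoff(collectedSizes, 0.90);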
Whiteboard: [qf] → [qf:p2]
(In reply to Luke Wagner [:luke] from comment #6)
> freeAll() is explicitly intended to release memory
> with the goal of avoiding permanently keeping memory allocated in a
> long-lived LifoAllocs that never gets given back. 

I think the problem here is the Ion compilation LifoAlloc and not freeAll() on long-lived LifoAllocs.

What we could do is have a free-list of 1 or 2 Ion LifoAllocs and then reuse them (making sure we don't waste a lot of memory, of course). (Luke and I were actually discussing this a while ago.)
Flags: needinfo?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #8)
> What we could do is have a free-list of 1 or 2 Ion LifoAllocs and then reuse
> them (making sure we don't waste a lot memory, of course). (Luke and I were
> actually discussing this a while ago.)

Yeah, I still like this idea (just haven't gotten around to it yet).  (Except I was imagining a pool of up to numCpus LifoAllocs, culled during GC.)
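A sketch of what that pool could look like, with made-up names (LifoAllocPool, cullDuringGC) and a forward-declared LifoAlloc standing in for js::LifoAlloc: hold up to numCpus allocators for reuse between Ion compilations and hand them all back for freeing at GC time.

// Hypothetical sketch; not an existing SpiderMonkey class.
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

class LifoAlloc;  // stand-in for js::LifoAlloc

class LifoAllocPool {
  std::vector<LifoAlloc*> pool_;
  size_t maxEntries_;

 public:
  LifoAllocPool()
      : maxEntries_(std::max(1u, std::thread::hardware_concurrency())) {}

  // Reuse a pooled allocator for the next compilation, if one is available.
  LifoAlloc* take() {
    if (pool_.empty()) {
      return nullptr;  // caller allocates a fresh LifoAlloc
    }
    LifoAlloc* lifo = pool_.back();
    pool_.pop_back();
    return lifo;
  }

  // Return an allocator after a compilation; refuse it if the pool is full.
  bool put(LifoAlloc* lifo) {
    if (pool_.size() >= maxEntries_) {
      return false;  // caller frees it (ideally off-thread)
    }
    pool_.push_back(lifo);
    return true;
  }

  // Culled during GC: hand everything back so the GC's background free
  // thread can release it.
  std::vector<LifoAlloc*> cullDuringGC() {
    std::vector<LifoAlloc*> culled;
    culled.swap(pool_);
    return culled;
  }
};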
Summary: Consider to keep unused BumpChunks when LifoAlloc::freeAll because freeing it is not cheap → Consider to keep unused Ion LifoAllocs because freeing it is not cheap
Keeping Ion LifoAllocs means the BumpChunks in different LifoAllocs can't be shared, even when they're unused.
Priority: -- → P3
Assignee: janus926 → nobody
Keywords: perf
Whiteboard: [qf:p2] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]
Whiteboard: [qf:f61][qf:p1] → [qf:f64][qf:p1]
Jan: Do you know if this is still an issue, or if it was already fixed in another way (or separately)?  Luke thought there was a chance this could have been fixed already.
Flags: needinfo?(jdemooij)
(In reply to Steven DeTar [:sdetar] from comment #11)
> Jan: Do you know if this is still a issue or if it was already fixed in
> another way (or seperately)?  Luke thought there was maybe a chance this
> could have been fixed already.

The fact that we do not have a pool of LifoAlloc chunks is still an issue.

Also, the current patch is outdated, since I rewrote the BumpChunk structure to ensure that we have a single owner.

I think a simpler solution would be to transfer the unused compilation chunks to the next Ion compilation, and reclaim any unused chunks at the end of each compilation.
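As a rough illustration of that idea, with made-up types (CompilationArena and takeUnusedChunksFrom are not the real IonBuilder/LifoAlloc interfaces): when a compilation finishes, its never-used chunks move into the arena of the next compilation, and only the chunks that actually held compilation data are released.

// Hypothetical sketch of handing spare chunks to the next compilation.
#include <cstddef>
#include <cstdlib>

struct Chunk {
  Chunk* next = nullptr;
};

struct CompilationArena {
  Chunk* used = nullptr;    // chunks holding this compilation's data
  Chunk* unused = nullptr;  // chunks allocated but never written to

  void takeUnusedChunksFrom(CompilationArena& finished) {
    // Keep the finished compilation's spare chunks for our own allocations.
    while (Chunk* chunk = finished.unused) {
      finished.unused = chunk->next;
      chunk->next = unused;
      unused = chunk;
    }
    // Reclaim the chunks that actually held the finished compilation's data.
    while (Chunk* chunk = finished.used) {
      finished.used = chunk->next;
      std::free(chunk);
    }
  }
};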
Flags: needinfo?(jdemooij)
We now free the Ion LifoAllocs off-thread so that at least avoids the problem in comment 0. I agree it might be nice to reuse (a small number of) allocators or pages.
Whiteboard: [qf:f64][qf:p1] → [qf:p1:f64]
I think Bug 1489572 might be a better approach than reusing existing LifoAlloc pages, as it reduces the number of BumpChunk allocations.

This should remove the VirtualFree, but it would keep anything that iterates over the pages to poison them.
I've been doing some profiling work today, re-evaluating some old QF bugs.  As :nbp noted earlier in a meeting today, these _are_ showing up in profiles in a reasonably significant way.

I left my notes at work, so I'll post an update tomorrow with details of where I observed it.
When I was investigating LifoAlloc-related crashes, I noticed that TI, IonMonkey, and the parser were the main consumers. TI is probably the worst offender, as it will create a new LifoAlloc and move everything to the new one in order to "sweep" the unused data. Using Bug 1489572 will probably solve this issue.

Another aspect of TI's use of LifoAlloc is that it relies on resizable data structures, which can waste a lot of memory.
Flags: needinfo?(nicolas.b.pierron)
Assignee: nobody → nicolas.b.pierron
Whiteboard: [qf:p1:f64] → [qf:p2]
As expected, Bug 1489572 reduces the number of chunk allocations by a factor of 1000 for small block sizes such as TI's, and should correspondingly reduce the number of frees we have to do.

Once Bug 1489572 is unblocked, we should re-evaluate this issue to verify that the signature reported in comment 0 has disappeared as expected.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
> Once Bug 1489572 is unblocked, we should re-evaluate this issue to verify
> that the signature reported in comment 0 has disappeared as expected.

Needinfo bas to re-evaluate: as I recall, you used VTune recently and might be able to reproduce the analysis described in comment 0.

If this signature disappeared or is not interesting anymore, then we should close this issue as a duplicate of Bug 1489572.
Flags: needinfo?(bas)

I don't have current data on this.

Flags: needinfo?(bas)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Performance Impact: --- → P2
Whiteboard: [qf:p2]