Closed Bug 598650 Opened 14 years ago Closed 14 years ago

OperationCallbacks and GCs occurring during SunSpider tests

Categories

(Core :: JavaScript Engine, defect)

Platform: x86, macOS
Severity: normal

Tracking

RESOLVED FIXED
blocking2.0 --- beta8+

People

(Reporter: sayrer, Assigned: gwagner)

(Whiteboard: fixed-in-tracemonkey)

Attachments

(5 files, 1 obsolete file)

This is much better than it was previously, but it's still occurring. We can't afford the impact of the variance introduced by this behavior. I'll attach a patch with my logging enabled.
STR:

Set the pref browser.dom.window.dump.enabled to true.

Apply the printf patch.

Visit http://people.mozilla.com/~sayrer/2010/08/logged-sunspider/results.html and click "run again".
Attachment #477506 - Attachment is obsolete: true
Comment on attachment 477526 [details]
sunspider log with more logging

Facts gleaned from the log:

We are executing a LastDitchGC on startup.

The GC is never called from the public API during a test, only between them.

LastDitchGC is sometimes called during the tests.

TriggerGC is called from NewGCArena during the tests.

If OperationCallbacks are triggered without a GC occurring, they do not penalize our score.
Run with outliers caused by GC during tests:

http://bit.ly/bLho1V

Same run, but with outliers changed to match the other values:

http://bit.ly/d4EfCw

For those following along at home, here's what this GC behavior is costing our total score:

** TOTAL **:           *1.169x as slow*  273.4ms +/- 2.6%   319.6ms +/- 7.9%
Assignee: general → anygregor
The check at the top of NewGCArena is responsible for all of the GCs during tests. We are basically bumping up on that 37.5MB limit a bunch of times.
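As an illustrative sketch (not the actual SpiderMonkey code), the check described above amounts to an arena allocator that triggers a GC whenever the GC heap would grow past a fixed byte limit; repeatedly bumping into that 37.5MB threshold mid-test is what produces the pauses. All names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a NewGCArena-style check: trigger a GC whenever the heap
// would cross a fixed byte limit (the 37.5MB threshold in the comment).
struct Heap {
    size_t gcBytes = 0;     // bytes currently held by GC arenas
    size_t gcMaxBytes;      // the fixed limit
    int gcCount = 0;

    explicit Heap(size_t maxBytes) : gcMaxBytes(maxBytes) {}

    void newGCArena(size_t arenaSize) {
        if (gcBytes + arenaSize >= gcMaxBytes)
            runGC();        // the mid-benchmark pauses come from here
        gcBytes += arenaSize;
    }

    void runGC() {
        gcCount++;
        gcBytes = 0;        // pretend everything was collected
    }
};
```

With a limit this rigid, a benchmark whose live set hovers near the threshold forces a collection on nearly every arena refill, which is exactly the variance pattern reported above.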
blocking2.0: --- → ?
Gregor, we should fix this. Suggestions?
A few points:

The LastDitchGC happens only on the very first startup after an update (when the new update site appears). I don't know what "extra" work the browser is doing there. All subsequent startups shouldn't have a LastDitchGC.

The number of GCs we are running during the benchmarks should be reduced now after the compartmental GC tuning. We decided to wait for bhackett's patch for fine tuning. I will measure again.

We can change to a more aggressive working set strategy but this makes much more sense if we have a fixed upper limit like proposed in bug 592907.

I am working on bug 593729 right now. That will give us more insight on how we should tune our GC. 

So once 584917 and 593729 are landed we should measure again.
Blocks: 603050
Depends on: 603529
(In reply to comment #7)
 
> For those following along at home, here's what this GC behavior is costing our
> total score:
> 
> ** TOTAL **:           *1.169x as slow*  273.4ms +/- 2.6%   319.6ms +/- 7.9%

This is very similar to what I'm seeing (following along at home...nice).
To paraphrase something bz said in another bug... "Reporters don't run in shell".
Attached file current GC situation
Back to this...
That's the current situation for starting the browser and running the V8 benchmark suite. We call the GC very often from RefillTypedFreeList; we should avoid this.
And for the shell-harness:
call js_GC from js_DestroyContext
enter js_GC for cx: 0x100412b90,
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
call js_GC from RefillTypedFreeList
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
call js_GC from RefillTypedFreeList
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
{
    "v8-crypto": 229,
    "v8-deltablue": 311,
    "v8-earley-boyer": 898,
    "v8-raytrace": 478,
    "v8-regexp": 169,
    "v8-richards": 170,
    "v8-splay": 374
}
It seems that we call the API function between each benchmark. Nevertheless we also force a GC twice. We should get rid of it.
So the problem in the shell is not the GC tuning. We are running out of memory because of the heap size that is set for the shell. We either have to increase the heap size or we have to stick with the GC.
This brings us back to bug 592907 where I want to bring the shell settings closer to the browser settings.
(In reply to comment #15)
> So the problem in the shell is not the GC tuning. We are running out of memory
> because of the heap size that is set for the shell. We either have to increase
> the heap size or we have to stick with the GC.
> This brings us back to bug 592907 where I want to bring the shell settings
> closer to the browser settings.

I agree. We want the shell to be a good model for browser perf.
blocking2.0: ? → beta8+
Depends on: 606637
For running SS in the browser (starting the browser, running SS, and closing the browser) I get the following (note that GC calls might have more than one cause):
Total GC runs:                  22

JS_GC API calls:                10
LastDitchGC calls:              1
Call from AllocateArena:        0
Call from Chunk::allocateArena: 0
Call from OnTooMuchMalloc:      9
Calls because of shapeOverflow: 0
Calls from destroyContext:      3
Successful Maybe_GC calls:      0
Calls via operationCallback:    8

Increasing the gcMallocBytes to 128MB instead of 64MB leads to:
Total GC runs:                  21

JS_GC API calls:                18
LastDitchGC calls:              0
Call from AllocateArena:        0
Call from Chunk::allocateArena: 0
Call from OnTooMuchMalloc:      0
Calls because of shapeOverflow: 0
Calls from destroyContext:      3
Successful Maybe_GC calls:      0
Calls via operationCallback:    0

So we would move the GC trigger to the browser.
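The gcMallocBytes mechanism behind these numbers can be sketched as a countdown budget of malloc'd bytes that forces a GC ("OnTooMuchMalloc") when exhausted; doubling the budget from 64MB to 128MB is what removed those triggers in the second measurement. This is an illustrative sketch with hypothetical names, not the real SpiderMonkey code:

```cpp
#include <cassert>
#include <cstddef>

// Countdown budget of malloc'd bytes: every allocation shrinks the budget,
// and exhausting it forces a GC, after which the budget resets.
struct MallocAccounting {
    ptrdiff_t gcMallocBytes;        // remaining budget, counts down
    const ptrdiff_t budget;
    int tooMuchMallocGCs = 0;

    explicit MallocAccounting(ptrdiff_t b) : gcMallocBytes(b), budget(b) {}

    void noteMalloc(size_t nbytes) {
        gcMallocBytes -= static_cast<ptrdiff_t>(nbytes);
        if (gcMallocBytes <= 0) {
            tooMuchMallocGCs++;     // would run a full GC here
            gcMallocBytes = budget; // reset after the collection
        }
    }
};
```

Under a workload whose total malloc traffic fits inside the larger budget, the OnTooMuchMalloc trigger simply never fires, matching the 9-versus-0 counts above.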

Running SS 6 times in the browser shows:
tip:             after:
[ms]   +-[%]      [ms]   +-[%]
216.7  3.0       209.3   0.9
222    5.0       211.2   1.1
221.5  4.5       218.1   5.9
223.6  4.6       209.8   0.6
218.6  3.6       209.4   1.1
222.6  4.9       210.1   1.0

We reduce the overall runtime and the variance but the individual results are all over the place:
TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.059x as fast    222.6ms +/- 4.9%   210.1ms +/- 1.0%     significant

=============================================================================

  3d:                  ??                 30.0ms +/- 1.6%    30.5ms +/- 3.0%     not conclusive: might be *1.017x as slow*
    cube:              -                  11.0ms +/- 0.0%    11.0ms +/- 0.0% 
    morph:             ??                  4.6ms +/- 8.0%     4.7ms +/- 7.3%     not conclusive: might be *1.022x as slow*
    raytrace:          ??                 14.4ms +/- 2.6%    14.8ms +/- 7.5%     not conclusive: might be *1.028x as slow*

  access:              -                  30.4ms +/- 1.6%    30.3ms +/- 2.2% 
    binary-trees:      1.143x as fast      4.0ms +/- 0.0%     3.5ms +/- 10.8%     significant
    fannkuch:          -                  15.6ms +/- 2.4%    15.6ms +/- 3.2% 
    nbody:             ??                  4.7ms +/- 7.3%     5.0ms +/- 0.0%     not conclusive: might be *1.064x as slow*
    nsieve:            ??                  6.1ms +/- 3.7%     6.2ms +/- 4.9%     not conclusive: might be *1.016x as slow*

  bitops:              -                  13.6ms +/- 5.6%    12.7ms +/- 3.8% 
    3bit-bits-in-byte: -                   0.7ms +/- 49.3%     0.6ms +/- 61.5% 
    bits-in-byte:      -                   7.1ms +/- 5.7%     6.8ms +/- 4.4% 
    bitwise-and:       -                   1.3ms +/- 26.6%     1.0ms +/- 0.0% 
    nsieve-bits:       -                   4.5ms +/- 11.2%     4.3ms +/- 8.0% 

  controlflow:         -                   2.9ms +/- 14.0%     2.8ms +/- 10.8% 
    recursive:         -                   2.9ms +/- 14.0%     2.8ms +/- 10.8% 

  crypto:              -                  16.7ms +/- 4.1%    16.7ms +/- 5.7% 
    aes:               -                   9.4ms +/- 5.3%     9.0ms +/- 6.5% 
    md5:               ??                  4.4ms +/- 8.4%     4.5ms +/- 8.4%     not conclusive: might be *1.023x as slow*
    sha1:              ??                  2.9ms +/- 14.0%     3.2ms +/- 14.1%     not conclusive: might be *1.103x as slow*

  date:                -                  29.6ms +/- 3.6%    29.5ms +/- 3.5% 
    format-tofte:      ??                 17.0ms +/- 4.0%    17.3ms +/- 4.4%     not conclusive: might be *1.018x as slow*
    format-xparb:      -                  12.6ms +/- 4.0%    12.2ms +/- 3.7% 

  math:                -                  19.6ms +/- 2.5%    19.2ms +/- 2.4% 
    cordic:            -                   4.9ms +/- 8.3%     4.9ms +/- 4.6% 
    partial-sums:      -                   9.8ms +/- 3.1%     9.5ms +/- 4.0% 
    spectral-norm:     -                   4.9ms +/- 4.6%     4.8ms +/- 6.3% 

  regexp:              -                  13.8ms +/- 42.3%    11.5ms +/- 3.3% 
    dna:               -                  13.8ms +/- 42.3%    11.5ms +/- 3.3% 

  string:              -                  66.0ms +/- 15.3%    56.9ms +/- 1.9% 
    base64:            -                   3.4ms +/- 10.9%     3.3ms +/- 10.5% 
    fasta:             -                  11.2ms +/- 5.0%    10.8ms +/- 2.8% 
    tagcloud:          -                  21.1ms +/- 39.3%    15.4ms +/- 2.4% 
    unpack-code:       -                  21.5ms +/- 1.8%    21.4ms +/- 3.2% 
    validate-input:    -                   8.8ms +/- 66.3%     6.0ms +/- 0.0%
> So we would move the GC trigger to the browser.

Which I think tries to trigger at 'idle' times... not sure whether it's managing in this case.
Another possibility to reduce the gcMallocBytes pressure for SS is to modify the code for ropes. If we look at the allocation distribution only for new ropes in js_ConcatStrings we get:

mozilla@alien:~/gwagner/meter4$ uniq -c out2.txt
 104193 128
  55179 256
  18400 512
    221 1024
    177 2048
    177 4096
    176 8192
    176 16384
    176 32768
    110 65536
     44 131072
     44 262144
     44 524288
    308 1048576

This definitely causes GC runs right now.
Andreas said we could modify the counter when we actually flatten the string.

The numbers are coming from js_concatStrings:
        /* Neither child is a rope: need to make a new buffer. */
        size_t capacity;
        size_t allocSize = RopeAllocSize(length, &capacity);
        printf("%d\n", allocSize);
        buf = (JSRopeBufferInfo *) cx->malloc(allocSize);
We should allocate memory without accounting, and only account when we actually flatten and touch the memory we eagerly allocated at rope creation time.
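The deferred-accounting idea could look roughly like this: allocate the rope buffer without charging the GC malloc counter, and charge it only if and when the rope is actually flattened. This is a hypothetical sketch; the names do not match SpiderMonkey's actual code:

```cpp
#include <cassert>
#include <cstddef>

// Minimal model of a countdown GC malloc budget.
struct Context {
    ptrdiff_t gcMallocBytes;
    void updateMallocCounter(size_t n) {
        gcMallocBytes -= static_cast<ptrdiff_t>(n);
    }
};

struct RopeBuffer {
    size_t capacity = 0;
    bool accounted = false;
};

RopeBuffer* obtainRopeBuffer(Context* /*cx*/, size_t allocSize) {
    // Allocate without accounting: if the rope dies before flattening,
    // the counter is never charged and no GC pressure is created.
    RopeBuffer* buf = new RopeBuffer;
    buf->capacity = allocSize;
    return buf;
}

void flattenRope(Context* cx, RopeBuffer* buf) {
    if (!buf->accounted) {
        cx->updateMallocCounter(buf->capacity);  // charge on first touch
        buf->accounted = true;
    }
}
```

The point is that ropes which are collected before ever being flattened contribute nothing to gcMallocBytes, so the eager reservations in the distribution above stop pushing the counter toward a GC.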
Oops, my numbers are from the malloc in ObtainRopeBuffer, not from js_ConcatStrings.
The actual code in ObtainRopeBuffer is

        cx->free(sourceBuffer);
        buf = (JSRopeBufferInfo *) cx->malloc(allocSize);

We should only account for the difference of the 2 buffers and not for the whole new buffer.
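A sketch of that fix: when the old buffer is freed and a larger one is malloc'd, charge the GC malloc counter only for the growth (allocSize minus the old size), not for the whole new buffer. Names here are illustrative, not the real ObtainRopeBuffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Minimal model of a countdown GC malloc budget.
struct Context {
    ptrdiff_t gcMallocBytes;
    void updateMallocCounter(size_t n) {
        gcMallocBytes -= static_cast<ptrdiff_t>(n);
    }
};

void* growRopeBuffer(Context* cx, void* sourceBuffer, size_t oldSize,
                     size_t allocSize) {
    free(sourceBuffer);
    void* buf = malloc(allocSize);
    cx->updateMallocCounter(allocSize - oldSize);  // charge the growth only
    return buf;
}
```

Since each doubling step of a growing rope buffer replaces a buffer of half the size, charging the difference roughly halves the accounted bytes compared with charging each full allocation.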
Attached patch patchSplinter Review
Numbers for this patch are in comment 17.
Attachment #487390 - Flags: review?(gal)
Attachment #487390 - Flags: review?(gal) → review+
Whiteboard: fixed-in-tracemonkey
Depends on: 609543
Blocks: 613551
Shouldn't this be marked as resolved? (as well as the other merged TM bugs).
http://hg.mozilla.org/mozilla-central/rev/9e7777fd1d19
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
No longer depends on: 609543
Depends on: 609543