bugzilla.mozilla.org has resumed normal operation. Attachments prior to 2014 will be unavailable for a few days. This is tracked in Bug 1475801.
Please report any other irregularities here.

OperationCallbacks and GCs occurring during SunSpider tests

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: Robert Sayre, Assigned: gwagner)

Tracking

unspecified
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 beta8+)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

8 years ago
This is much better than it was previously, but it's still occurring. We can't afford the impact of the variance introduced by this behavior. I'll attach a patch with my logging enabled.
(Reporter)

Comment 1

8 years ago
Created attachment 477506 [details] [diff] [review]
add some printfs to log operation callbacks and js_GC calls
(Reporter)

Comment 2

8 years ago
Created attachment 477507 [details]
log of sunspider run, one window open, right after startup
(Reporter)

Comment 3

8 years ago
STR:

Set the pref browser.dom.window.dump.enabled to true.

Apply the printf patch.

Visit http://people.mozilla.com/~sayrer/2010/08/logged-sunspider/results.html and click "run again".
(Reporter)

Comment 4

8 years ago
Created attachment 477524 [details] [diff] [review]
another patch, with more logging
Attachment #477506 - Attachment is obsolete: true
(Reporter)

Comment 5

8 years ago
Created attachment 477526 [details]
sunspider log with more logging
(Reporter)

Comment 6

8 years ago
Comment on attachment 477526 [details]
sunspider log with more logging

Facts gleaned from the log:

We are executing a LastDitchGC on startup.

The GC is never called from the public API during a test, only between them.

LastDitchGC is sometimes called during the tests.

TriggerGC is called from NewGCArena during the tests.

If OperationCallbacks are triggered without a GC occuring, they do not penalize our score.
(Reporter)

Comment 7

8 years ago
Run with outliers caused by GC during tests:

http://bit.ly/bLho1V

Same run, but with outliers changed to match the other values:

http://bit.ly/d4EfCw

For those following along at home, here's what this GC behavior is costing our total score:

** TOTAL **:           *1.169x as slow*  273.4ms +/- 2.6%   319.6ms +/- 7.9%

Updated

8 years ago
Assignee: general → anygregor
(Reporter)

Comment 8

8 years ago
The check at the top of NewGCArena is responsible for all of the GCs during tests. We are basically bumping up on that 37.5MB limit a bunch of times.
blocking2.0: --- → ?

Comment 9

8 years ago
Gregor, we should fix this. Suggestions?
(Assignee)

Comment 10

8 years ago
A few points:

The LastDitchGC is only on the very first startup after an update (When the new update site appears). I don't know what "extra" work the browser is doing there.
All the following starups shouldn't have a LastDitchGC.

The number of GCs we are running during the benchmarks should be reduced now after the compartmental GC tuning. We decided to wait for bhacketts patch for fine tuning. I will measure again.

We can change to a more aggressive working set strategy but this makes much more sense if we have a fixed upper limit like proposed in bug 592907.

I am working on bug 593729 right now. That will give us more insight on how we should tune our GC. 

So once 584917 and 593729 are landed we should measure again.

Updated

8 years ago
Blocks: 603050
(Reporter)

Updated

8 years ago
Duplicate of this bug: 603528
Blocks: 578133
(Reporter)

Updated

8 years ago
Depends on: 603529

Comment 12

8 years ago
(In reply to comment #7)
 
> For those following along at home, here's what this GC behavior is costing our
> total score:
> 
> ** TOTAL **:           *1.169x as slow*  273.4ms +/- 2.6%   319.6ms +/- 7.9%

This is very similar to what I'm seeing (following along at home...nice).
To paraphrase something bz said in another bug... "Reporters don't run in shell".
(Assignee)

Comment 13

8 years ago
Created attachment 483252 [details]
current GC situation

Back to this...
That's the current situation for starting the browser and run the V8 benchmark suite. We call the GC very often from RefillTypedFreeList. We should avoid this.
(Assignee)

Comment 14

8 years ago
And for the shell-harness:
call js_GC from js_DestroyContext
enter js_GC for cx: 0x100412b90,
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
call js_GC from RefillTypedFreeList
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
call js_GC from RefillTypedFreeList
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
{
    "v8-crypto": 229,
    "v8-deltablue": 311,
    "v8-earley-boyer": 898,
    "v8-raytrace": 478,
    "v8-regexp": 169,
    "v8-richards": 170,
    "v8-splay": 374

It seems that we call the API function between each benchmark. Nevertheless we also force a GC twice. We should get rid of it.
(Assignee)

Comment 15

8 years ago
So the problem in the shell is not the GC tuning. We are running out of memory because of the heap size that is set for the shell. We either have to increase the heap size or we have to stick with the GC.
This brings us back to bug 592907 where I want to bring the shell settings closer to the browser settings.
(In reply to comment #15)
> So the problem in the shell is not the GC tuning. We are running out of memory
> because of the heap size that is set for the shell. We either have to increase
> the heap size or we have to stick with the GC.
> This brings us back to bug 592907 where I want to bring the shell settings
> closer to the browser settings.

I agree. We want the shell to be a good model for browser perf.
(Reporter)

Updated

8 years ago
blocking2.0: ? → beta8+
Depends on: 606637
(Assignee)

Comment 17

8 years ago
For running SS in the browser (starting the browser, running SS and closing the browser) I get following (note that GC calls might have more than one cause):
Total GC runs:                  22

JS_GC API calls:                10
LastDitchGC calls:              1
Call from AllocateArena:        0
Call from Chunk::allocateArena: 0
Call from OnTooMuchMalloc:      9
Calls because of shapeOverflow: 0
Calls from destroyContext:      3
Successful Maybe_GC calls:      0
Calls via operationCallback:    8

Increasing the gcMallocBytes to 128MB instead of 64MB leads to:
Total GC runs:                  21

JS_GC API calls:                18
LastDitchGC calls:              0
Call from AllocateArena:        0
Call from Chunk::allocateArena: 0
Call from OnTooMuchMalloc:      0
Calls because of shapeOverflow: 0
Calls from destroyContext:      3
Successful Maybe_GC calls:      0
Calls via operationCallback:    0

So we would move the GC trigger to the browser.

Running SS 6 times in the browser shows:
tip:             after:
[ms]   +-[%]      [ms]   +-[%]
216.7  3.0       209.3   0.9
222    5.0       211.2   1.1
221.5  4.5       218.1   5.9
223.6  4.6       209.8   0.6
218.6  3.6       209.4   1.1
222.6  4.9       210.1   1.0

We reduce the overall runtime and the variance but the individual results are all over the place:
TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.059x as fast    222.6ms +/- 4.9%   210.1ms +/- 1.0%     significant

=============================================================================

  3d:                  ??                 30.0ms +/- 1.6%    30.5ms +/- 3.0%     not conclusive: might be *1.017x as slow*
    cube:              -                  11.0ms +/- 0.0%    11.0ms +/- 0.0% 
    morph:             ??                  4.6ms +/- 8.0%     4.7ms +/- 7.3%     not conclusive: might be *1.022x as slow*
    raytrace:          ??                 14.4ms +/- 2.6%    14.8ms +/- 7.5%     not conclusive: might be *1.028x as slow*

  access:              -                  30.4ms +/- 1.6%    30.3ms +/- 2.2% 
    binary-trees:      1.143x as fast      4.0ms +/- 0.0%     3.5ms +/- 10.8%     significant
    fannkuch:          -                  15.6ms +/- 2.4%    15.6ms +/- 3.2% 
    nbody:             ??                  4.7ms +/- 7.3%     5.0ms +/- 0.0%     not conclusive: might be *1.064x as slow*
    nsieve:            ??                  6.1ms +/- 3.7%     6.2ms +/- 4.9%     not conclusive: might be *1.016x as slow*

  bitops:              -                  13.6ms +/- 5.6%    12.7ms +/- 3.8% 
    3bit-bits-in-byte: -                   0.7ms +/- 49.3%     0.6ms +/- 61.5% 
    bits-in-byte:      -                   7.1ms +/- 5.7%     6.8ms +/- 4.4% 
    bitwise-and:       -                   1.3ms +/- 26.6%     1.0ms +/- 0.0% 
    nsieve-bits:       -                   4.5ms +/- 11.2%     4.3ms +/- 8.0% 

  controlflow:         -                   2.9ms +/- 14.0%     2.8ms +/- 10.8% 
    recursive:         -                   2.9ms +/- 14.0%     2.8ms +/- 10.8% 

  crypto:              -                  16.7ms +/- 4.1%    16.7ms +/- 5.7% 
    aes:               -                   9.4ms +/- 5.3%     9.0ms +/- 6.5% 
    md5:               ??                  4.4ms +/- 8.4%     4.5ms +/- 8.4%     not conclusive: might be *1.023x as slow*
    sha1:              ??                  2.9ms +/- 14.0%     3.2ms +/- 14.1%     not conclusive: might be *1.103x as slow*

  date:                -                  29.6ms +/- 3.6%    29.5ms +/- 3.5% 
    format-tofte:      ??                 17.0ms +/- 4.0%    17.3ms +/- 4.4%     not conclusive: might be *1.018x as slow*
    format-xparb:      -                  12.6ms +/- 4.0%    12.2ms +/- 3.7% 

  math:                -                  19.6ms +/- 2.5%    19.2ms +/- 2.4% 
    cordic:            -                   4.9ms +/- 8.3%     4.9ms +/- 4.6% 
    partial-sums:      -                   9.8ms +/- 3.1%     9.5ms +/- 4.0% 
    spectral-norm:     -                   4.9ms +/- 4.6%     4.8ms +/- 6.3% 

  regexp:              -                  13.8ms +/- 42.3%    11.5ms +/- 3.3% 
    dna:               -                  13.8ms +/- 42.3%    11.5ms +/- 3.3% 

  string:              -                  66.0ms +/- 15.3%    56.9ms +/- 1.9% 
    base64:            -                   3.4ms +/- 10.9%     3.3ms +/- 10.5% 
    fasta:             -                  11.2ms +/- 5.0%    10.8ms +/- 2.8% 
    tagcloud:          -                  21.1ms +/- 39.3%    15.4ms +/- 2.4% 
    unpack-code:       -                  21.5ms +/- 1.8%    21.4ms +/- 3.2% 
    validate-input:    -                   8.8ms +/- 66.3%     6.0ms +/- 0.0%
> So we would move the GC trigger to the browser.

Which I think tries to trigger at 'idle' times... not sure whether it's managing in this case.
(Assignee)

Comment 19

8 years ago
Another possibility to reduce the gcMallocBytes pressure for SS is to modify the code for ropes. If we look at the allocation distribution only for new ropes in js_ConcatStrings we get:

mozilla@alien:~/gwagner/meter4$ uniq -c out2.txt
 104193 128
  55179 256
  18400 512
    221 1024
    177 2048
    177 4096
    176 8192
    176 16384
    176 32768
    110 65536
     44 131072
     44 262144
     44 524288
    308 1048576

This definitely causes GC runs right now.
Andreas said we could modify the counter when we actually flatten the string.

The numbers are coming from js_concatStrings:
        /* Neither child is a rope: need to make a new buffer. */
        size_t capacity;
        size_t allocSize = RopeAllocSize(length, &capacity);
        printf("%d\n", allocSize);
        buf = (JSRopeBufferInfo *) cx->malloc(allocSize);

Comment 20

8 years ago
We should allocate memory without accounting and then accout when and iff only we actually flatten and actually touch that memory we eagerly allocated at rope creation time.
(Assignee)

Comment 21

8 years ago
Ups my numbers are from the malloc in ObtainRopeBuffer and not from js_ConcatStrings.
(Assignee)

Comment 22

8 years ago
The actual code in ObtainRopeBuffer is

        cx->free(sourceBuffer);
        buf = (JSRopeBufferInfo *) cx->malloc(allocSize);

We should only account for the difference of the 2 buffers and not for the whole new buffer.
(Assignee)

Comment 23

8 years ago
Created attachment 487390 [details] [diff] [review]
patch

numbers for this patch are in #17.
Attachment #487390 - Flags: review?(gal)

Updated

8 years ago
Attachment #487390 - Flags: review?(gal) → review+

Updated

8 years ago
Whiteboard: fixed-in-tracemonkey
Depends on: 609543

Updated

8 years ago
Blocks: 613551

Comment 25

8 years ago
Shouldn't this be marked as resolved? (as well as the other merged TM bugs).
(Reporter)

Comment 26

8 years ago
http://hg.mozilla.org/mozilla-central/rev/9e7777fd1d19
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Updated

8 years ago
No longer depends on: 609543

Updated

8 years ago
Depends on: 609543
You need to log in before you can comment on or make changes to this bug.