Instrument Baseline generated code to collect branch profiles.

RESOLVED DUPLICATE of bug 1209515

Status

()

Core
JavaScript Engine
RESOLVED DUPLICATE of bug 1209515
5 years ago
2 years ago

People

(Reporter: nbp, Assigned: wuwei)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 13 obsolete attachments)

47.37 KB, patch
Details | Diff | Splinter Review
Branch Profiler collects the targets of conditional jumps by instrumenting SpiderMonkey engine. There are three potential instrumentation points in SpiderMonkey: the bytecode interpreter, IonMonkey and the baseline compiler. Instrumenting interpreter is the easiest way to get things work but the the data it collects is fragmented and less important, since "hot" JavaScript methods are not executed by interpreter. More importantly, Instrumenting interpreter will slow down scripts which are running only once. IonMonkey is another place that can insert profiling code and in fact it do has the PCCount interface for profiling, but this mechanism is activated in debug build only, since the overhead caused by profiling makes it unpractical.

Branch Profiler leverages the Baseline Compiler to generate instrumented machine codes. It replaces the output of baseline compiler when it generates code for JSOP_IFEQ, JSOP_IFNE, and JSOP_TABLESWITCH bytecodes.

Basic block counter and circular address buffer are two possible ways to store branch profiling data. The basic block counter solution keeps two counters for each branch, as well as the number of the targets the branch might jump. When a block jumps to another the correlate counter will be increased by 1. Then these frequencies will be transformed into probabilities and consumed by some transform passes. On the other hand, circular address buffer maintains a sequence of branch targets. Every time Branch Profiler is triggered it pushes the target address of the branch into buffer. The main benefit of this design is that not only frequencies of each branch can be calculated but also the relationships between jumps are able to be inferred, which may generate new possibilities for more sophisticated optimizations.

In this project we choose the circular buffer as our preliminary implementation and allocates one circular buffer for each IonScript. We will switch to basic block counter later if the overhead caused by the circular buffer is high, or if we fail to find any benchmark which can take advantage of a more-clever way of inferring branches ordering.

[*] https://github.com/lazyparser/gsoc2013/blob/master/proposal.md
(Assignee)

Comment 1

5 years ago
In baseline compiler, js bytecodes are translated into asm directly, without building a control-flow graph. If we want to use block counters as our data structure, we need to know the total number of MIR blocks the jsscript has, which is not available currently.
(In reply to Wei Wu [:wuwei] from comment #1)
> In baseline compiler, js bytecodes are translated into asm directly, without
> building a control-flow graph.

Except for loop-edges all jumps are generated before the jump-target.  One way would be to annotate the map the jsbytecode with the block counter (maintain a mapping structure).  We do not share the CFG, as we reconstruct it from the bytecode when we compile with IonMonkey.

The loop edges will be annotated with source notes, see how we identify loops in IonBuilder.cpp (callers of IonBuilder::pushLoop).
(Assignee)

Comment 3

5 years ago
Created attachment 771718 [details] [diff] [review]
branch_counter.patch

The patch instruments the jitcode generated by the Baseline Compiler. Currently only JSOP_IFEQ and and JSOP_IFNE are instruemented. I uploaded it for early review.

Here are the problems I did't solve yet:

1. How to directly access the address of counters. In this patch I use jsscript->baselineScript->branchCountersOffset_ + counterIndex to calculate the address of a counter, which is cumbersome and ugly. (I know how to use 'Label' family to 'patch' code, but I was not able to use them to get the address of counters.)

2. This patch keeps two counters for each branch. The other way is to allocate one counter for each block, then calcutate the probabilities of each branch target by solving equations. We might want to implement the 'one counter' solution, if the overhead of 'two counters' solution is high.

3. This patch might causes segmentation fault problem, and I have not found the reason yet.
(Assignee)

Comment 4

5 years ago
Created attachment 772094 [details] [diff] [review]
block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only)
Attachment #771718 - Attachment is obsolete: true
Comment on attachment 772094 [details] [diff] [review]
block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only)

Review of attachment 772094 [details] [diff] [review]:
-----------------------------------------------------------------

So far, I cannot eye spot the SEGV.

::: js/src/ion/BaselineCompiler.cpp
@@ +198,5 @@
> +    for (size_t i = 0; i < blockCounterLabels_.length(); i++) {
> +        CodeOffsetLabel label = blockCounterLabels_[i].label;
> +        label.fixup(&masm);
> +        size_t bcEntry = blockCounterLabels_[i].bcEntry;
> +        BlockCounterEntry *bcEntryAddr = &(baselineScript->blockCounterEntry(bcEntry));

nit: no need for parentheses after '&'.

@@ +814,5 @@
>      if (!knownBoolean && !emitToBoolean())
>              return false;
>  
> +    Label counterLabel;
> +    masm.branchTestBooleanTruthy(!branchIfTrue, R0, &counterLabel);

I am not a big fan of duplicating conditions.

::: js/src/ion/BaselineJIT.h
@@ +98,5 @@
> +{
> +    uint32_t counter;
> +    jsbytecode *pc;
> +    BlockCounterEntry(jsbytecode *pc)
> +    :pc(pc),counter(0)

nit: indent and add new lines.

::: js/src/ion/shared/BaselineCompiler-shared.h
@@ +106,5 @@
> +    BlockCounterEntry *allocateBlockCounterEntry(jsbytecode *pc) {
> +        if(!blockCounterEntries_.append(BlockCounterEntry(pc)))
> +            return NULL;
> +        BlockCounterEntry &bcEntry = blockCounterEntries_[blockCounterEntries_.length() - 1];
> +        return &bcEntry;

nit: return blockCounterEntries_.back();
Attachment #772094 - Flags: feedback+
(Assignee)

Comment 6

5 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> Comment on attachment 772094 [details] [diff] [review]
> block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only)

> @@ +814,5 @@
> >      if (!knownBoolean && !emitToBoolean())
> >              return false;
> >  
> > +    Label counterLabel;
> > +    masm.branchTestBooleanTruthy(!branchIfTrue, R0, &counterLabel);
> 
> I am not a big fan of duplicating conditions.
> 

If we want to avoid this kind of conditional jumps, the instrumentation code should be placed in front of each bytecode witch is the target of any jumps (a.k.a. basic block headers). In this case we need to analyze bytecodes and source notes to figure out this information before emitBody(). It might be reasonable to implement the analysis codes in BytecodeAnalysis class.
(Assignee)

Comment 7

5 years ago
There is a 'jumpTarget' boolean property existing in BytecodeInfo struct, so we might able to reuse it directly.
(Assignee)

Comment 8

5 years ago
Created attachment 773018 [details] [diff] [review]
Emit one block counter for each jump target

Updates:

1. Map block counters to the offset of bytecodes instead of absoulte memory address.
2. Emit one block counter for each bytecode which is the target of any jumps.
3. Add two IonSpewer output for debugging.

TODO && FIXME:

1. The SEGV bug remains in current patch despite that all jittests were passed (make check). Ask Nicolas [:nbp] and Kannan [:djvj] for help.
2. Avoid emitting counters for bytecodes which is target of jumps but not a header of basic block.
(Assignee)

Comment 9

5 years ago
Comment on attachment 772094 [details] [diff] [review]
block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only)

Review of attachment 772094 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/BaselineCompiler.cpp
@@ +817,5 @@
> +    Label counterLabel;
> +    masm.branchTestBooleanTruthy(!branchIfTrue, R0, &counterLabel);
> +    emitBlockCounter(pc + GET_JUMP_OFFSET(pc));
> +    masm.bind(&counterLabel);
> +

The next branch might jump to a wrong location, for
the value in register R0 was overwritten in emitBlockCounter(). It then caused JS_ASSERT() failed, which raised the SEGV.
(Assignee)

Comment 10

5 years ago
Created attachment 774034 [details] [diff] [review]
Dumping block counters when building MIRGraph.

This patch is for testing only. Just to make sure everything was correct.

Dumps can be viewed and downloaded at:
https://github.com/lazyparser/gsoc2013/tree/master/block_counter_dump/

You may want to use 'cat dumpFile | grep "block counter"' to filter out unrelated spews.

Here is an example:

$grep 'block counter' 3d-cube.js.counters.dump
...
[BaselineOp] [block counter] Emitting a block counter for op @ 1021: loophead (lineno: 312 column: 42)
[BaselineOp] [block counter] Emitting a block counter for op @ 1022: getgname (lineno: 312 column: 42)
[BaselineOp] [block counter] Emitting a block counter for op @ 1144: loopentry (lineno: 312 column: 18)
[BaselineOp] [block counter] Emitting a block counter for op @ 1170: getgname (lineno: 315 column: 2)
[BaselineOp] [block counter] Emitting a block counter for op @ 1276: loophead (lineno: 319 column: 36)
[BaselineOp] [block counter] Emitting a block counter for op @ 1277: callgname (lineno: 319 column: 36)
[BaselineOp] [block counter] Emitting a block counter for op @ 1306: loopentry (lineno: 319 column: 18)
[BaselineOp] [block counter] Emitting a block counter for op @ 1327: bindgname (lineno: 321 column: 2)
[BaselineOp] [block counter] Emitting a block counter for op @ 1438: loophead (lineno: 326 column: 0)
[BaselineOp] [block counter] Emitting a block counter for op @ 1439: getgname (lineno: 326 column: 4)
[BaselineOp] [block counter] Emitting a block counter for op @ 1497: loopentry (lineno: 325 column: 9)
[BaselineOp] [block counter] Emitting a block counter for op @ 1510: callgname (lineno: 328 column: 2)
[BaselineOp] [block counter] Emitting a block counter for op @ 1559: loophead (lineno: 335 column: 0)
[BaselineOp] [block counter] Emitting a block counter for op @ 1560: getgname (lineno: 335 column: 0)
[BaselineOp] [block counter] Emitting a block counter for op @ 1588: loophead (lineno: 337 column: 6)
[BaselineOp] [block counter] Emitting a block counter for op @ 1589: getlocal (lineno: 337 column: 6)
[BaselineOp] [block counter] Emitting a block counter for op @ 1614: loopentry (lineno: 336 column: 20)
[BaselineOp] [block counter] Emitting a block counter for op @ 1633: getlocal (lineno: 334 column: 32)
[BaselineOp] [block counter] Emitting a block counter for op @ 1643: loopentry (lineno: 334 column: 18)
[BaselineOp] [block counter] Emitting a block counter for op @ 1664: getlocal (lineno: 339 column: 0)
[BaselineOp] [block counter] Emitting a block counter for op @ 1682: string (lineno: 340 column: 4)
[BaselineOp] [block counter] Emitting a block counter for op @ 1718: stop (lineno: 340 column: 120)
[BaselineScripts] [block counter] 22 block counters were emitted for script 3d-cube.js:258 (0x1f468d0) (lineno: 340 column: 120)
...
[BaselineScripts] [block counter] Dump 22 block counters for 0x1f468d0
[BaselineScripts] [block counter] Op offset: 1021, count: 6
[BaselineScripts] [block counter] Op offset: 1022, count: 6
[BaselineScripts] [block counter] Op offset: 1144, count: 7
[BaselineScripts] [block counter] Op offset: 1170, count: 1
[BaselineScripts] [block counter] Op offset: 1276, count: 1014
[BaselineScripts] [block counter] Op offset: 1277, count: 1014
[BaselineScripts] [block counter] Op offset: 1306, count: 1016
[BaselineScripts] [block counter] Op offset: 1327, count: 1
[BaselineScripts] [block counter] Op offset: 1438, count: 9
[BaselineScripts] [block counter] Op offset: 1439, count: 9
[BaselineScripts] [block counter] Op offset: 1497, count: 10
[BaselineScripts] [block counter] Op offset: 1510, count: 1
[BaselineScripts] [block counter] Op offset: 1559, count: 9
[BaselineScripts] [block counter] Op offset: 1560, count: 9
[BaselineScripts] [block counter] Op offset: 1588, count: 36
[BaselineScripts] [block counter] Op offset: 1589, count: 36
[BaselineScripts] [block counter] Op offset: 1614, count: 45
[BaselineScripts] [block counter] Op offset: 1633, count: 9
[BaselineScripts] [block counter] Op offset: 1643, count: 10
[BaselineScripts] [block counter] Op offset: 1664, count: 1
[BaselineScripts] [block counter] Op offset: 1682, count: 0
[BaselineScripts] [block counter] Op offset: 1718, count: 1
...
(Assignee)

Comment 11

5 years ago
Here is a preliminary performance profiling result. The test platform has 8-cores (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz), running Ubuntu 12.04 amd64. 
The profiling data may not be reliable, and I am doing more profiling now.

Kraken on Firefox:

TEST                         COMPARISON            FROM                 TO               DETAILS

====================================================================================

** TOTAL **:                 -                 1508.5ms +/- 1.8%   1502.0ms +/- 1.8% 

====================================================================================

  ai:                        -                   96.2ms +/- 1.0%     95.3ms +/- 0.9% 
    astar:                   -                   96.2ms +/- 1.0%     95.3ms +/- 0.9% 

  audio:                     ??                 565.2ms +/- 4.6%    567.8ms +/- 4.1%     might be *1.005x as slow*
    beat-detection:          ??                 154.6ms +/- 17.0%    159.2ms +/- 14.4%     might be *1.030x as slow*
    dft:                     -                  192.0ms +/- 0.8%    191.0ms +/- 0.4% 
    fft:                     ??                  74.4ms +/- 0.9%     74.5ms +/- 2.4%     might be *1.001x as slow*
    oscillator:              -                  144.2ms +/- 2.9%    143.1ms +/- 2.8% 

  imaging:                   -                  321.9ms +/- 0.6%    319.7ms +/- 0.9% 
    gaussian-blur:           -                  105.8ms +/- 1.1%    105.6ms +/- 1.2% 
    darkroom:                1.010x as fast     113.0ms +/- 0.4%    111.9ms +/- 0.5%     significant
    desaturate:              -                  103.1ms +/- 2.0%    102.2ms +/- 2.2% 

  json:                      ??                  78.6ms +/- 6.4%     80.5ms +/- 7.6%     might be *1.024x as slow*
    parse-financial:         -                   34.4ms +/- 5.6%     33.5ms +/- 1.1% 
    stringify-tinderbox:     ??                  44.2ms +/- 12.1%     47.0ms +/- 12.9%     might be *1.063x as slow*

  stanford:                  -                  446.6ms +/- 1.9%    438.7ms +/- 1.4% 
    crypto-aes:              1.062x as fast     198.9ms +/- 1.0%    187.3ms +/- 1.0%     significant
    crypto-ccm:              -                   80.8ms +/- 7.9%     80.4ms +/- 7.7% 
    crypto-pbkdf2:           ??                 119.2ms +/- 6.9%    124.9ms +/- 5.8%     might be *1.048x as slow*
    crypto-sha256-iterative: -                   47.7ms +/- 5.7%     46.1ms +/- 4.6% 

SunSpider on Firefox:

TEST                   COMPARISON               FROM                 TO             DETAILS

===============================================================================

** TOTAL **:           -                 150.8ms +/- 1.5%    150.8ms +/- 0.7%  

===============================================================================

  3d:                  ??                 24.2ms +/- 2.7%     24.3ms +/- 3.4%      not conclusive: might be *1.004x as slow*
    cube:              ??                 10.1ms +/- 4.0%     10.2ms +/- 3.0%      not conclusive: might be *1.010x as slow*
    morph:             ??                  4.0ms +/- 0.0%      4.1ms +/- 5.5%      not conclusive: might be *1.025x as slow*
    raytrace:          -                  10.1ms +/- 4.0%     10.0ms +/- 4.8%  

  access:              -                  13.8ms +/- 3.3%     13.4ms +/- 5.2%  
    binary-trees:      ??                  1.9ms +/- 11.9%     2.1ms +/- 10.8%     not conclusive: might be *1.105x as slow*
    fannkuch:          -                   5.9ms +/- 6.9%      5.6ms +/- 6.6%  
    nbody:             -                   2.7ms +/- 12.8%     2.7ms +/- 12.8% 
    nsieve:            -                   3.3ms +/- 10.5%     3.0ms +/- 0.0%  

  bitops:              -                   8.8ms +/- 9.2%      8.8ms +/- 5.1%  
    3bit-bits-in-byte: -                   1.0ms +/- 0.0%      1.0ms +/- 0.0%  
    bits-in-byte:      -                   2.8ms +/- 16.1%     2.7ms +/- 12.8% 
    bitwise-and:       ??                  1.7ms +/- 20.3%     1.8ms +/- 16.7%     not conclusive: might be *1.059x as slow*
    nsieve-bits:       -                   3.3ms +/- 10.5%     3.3ms +/- 10.5% 

  controlflow:         -                   2.0ms +/- 0.0%      2.0ms +/- 0.0%  
    recursive:         -                   2.0ms +/- 0.0%      2.0ms +/- 0.0%  

  crypto:              -                  13.6ms +/- 2.7%     13.5ms +/- 4.5%  
    aes:               -                   6.9ms +/- 3.3%      6.7ms +/- 7.2%  
    md5:               ??                  3.7ms +/- 9.3%      3.8ms +/- 7.9%      not conclusive: might be *1.027x as slow*
    sha1:              -                   3.0ms +/- 0.0%      3.0ms +/- 0.0%  

  date:                ??                 18.5ms +/- 2.0%     18.9ms +/- 2.1%      not conclusive: might be *1.022x as slow*
    format-tofte:      ??                 10.4ms +/- 3.5%     10.5ms +/- 3.6%      not conclusive: might be *1.010x as slow*
    format-xparb:      ??                  8.1ms +/- 2.8%      8.4ms +/- 4.4%      not conclusive: might be *1.037x as slow*

  math:                -                  16.4ms +/- 3.0%     16.1ms +/- 4.4%  
    cordic:            -                   2.3ms +/- 15.0%     2.2ms +/- 13.7% 
    partial-sums:      -                  12.5ms +/- 3.0%     12.3ms +/- 2.8%  
    spectral-norm:     -                   1.6ms +/- 23.1%     1.6ms +/- 23.1% 

  regexp:              -                   9.2ms +/- 3.3%      9.2ms +/- 3.3%  
    dna:               -                   9.2ms +/- 3.3%      9.2ms +/- 3.3%  

  string:              ??                 44.3ms +/- 1.3%     44.6ms +/- 1.5%      not conclusive: might be *1.007x as slow*
    base64:            -                   5.2ms +/- 5.8%      5.2ms +/- 5.8%  
    fasta:             -                   5.5ms +/- 6.8%      5.5ms +/- 6.8%  
    tagcloud:          ??                 12.8ms +/- 2.4%     13.1ms +/- 1.7%      not conclusive: might be *1.023x as slow*
    unpack-code:       -                  16.0ms +/- 2.1%     16.0ms +/- 2.1%  
    validate-input:    -                   4.8ms +/- 6.3%      4.8ms +/- 6.3%  



SunSpider on JSShell (runs=5000):

TEST                   COMPARISON               FROM                 TO             DETAILS

===============================================================================

** TOTAL **:           *1.003x as slow*  156.5ms +/- 0.0%    156.9ms +/- 0.0%      significant

===============================================================================

  3d:                  *1.002x as slow*   28.3ms +/- 0.1%     28.3ms +/- 0.1%      significant
    cube:              1.001x as fast     11.9ms +/- 0.1%     11.9ms +/- 0.1%      significant
    morph:             *1.002x as slow*    3.8ms +/- 0.1%      3.8ms +/- 0.1%      significant
    raytrace:          *1.004x as slow*   12.6ms +/- 0.1%     12.7ms +/- 0.1%      significant

  access:              1.002x as fast     12.9ms +/- 0.1%     12.9ms +/- 0.1%      significant
    binary-trees:      1.002x as fast      1.9ms +/- 0.1%      1.9ms +/- 0.1%      significant
    fannkuch:          *1.002x as slow*    5.8ms +/- 0.1%      5.8ms +/- 0.1%      significant
    nbody:             *1.003x as slow*    2.6ms +/- 0.1%      2.6ms +/- 0.1%      significant
    nsieve:            1.017x as fast      2.6ms +/- 0.1%      2.5ms +/- 0.1%      significant

  bitops:              1.003x as fast      8.2ms +/- 0.1%      8.2ms +/- 0.1%      significant
    3bit-bits-in-byte: ??                  0.8ms +/- 0.1%      0.8ms +/- 0.2%      not conclusive: might be *1.002x as slow*
    bits-in-byte:      -                   2.8ms +/- 0.1%      2.8ms +/- 0.1%  
    bitwise-and:       -                   1.4ms +/- 0.1%      1.4ms +/- 0.1%  
    nsieve-bits:       1.007x as fast      3.2ms +/- 0.1%      3.2ms +/- 0.1%      significant

  controlflow:         -                   2.0ms +/- 0.1%      2.0ms +/- 0.1%  
    recursive:         -                   2.0ms +/- 0.1%      2.0ms +/- 0.1%  

  crypto:              *1.003x as slow*   15.2ms +/- 0.1%     15.3ms +/- 0.1%      significant
    aes:               *1.005x as slow*    9.0ms +/- 0.1%      9.1ms +/- 0.1%      significant
    md5:               -                   3.6ms +/- 0.1%      3.6ms +/- 0.1%  
    sha1:              *1.005x as slow*    2.6ms +/- 0.1%      2.6ms +/- 0.1%      significant

  date:                *1.003x as slow*   20.9ms +/- 0.1%     20.9ms +/- 0.1%      significant
    format-tofte:      *1.005x as slow*   10.1ms +/- 0.1%     10.2ms +/- 0.1%      significant
    format-xparb:      *1.002x as slow*   10.8ms +/- 0.1%     10.8ms +/- 0.1%      significant

  math:                -                  12.3ms +/- 0.1%     12.3ms +/- 0.1%  
    cordic:            *1.003x as slow*    2.2ms +/- 0.1%      2.3ms +/- 0.1%      significant
    partial-sums:      -                   8.5ms +/- 0.1%      8.5ms +/- 0.1%  
    spectral-norm:     *1.003x as slow*    1.6ms +/- 0.1%      1.6ms +/- 0.1%      significant

  regexp:              *1.002x as slow*   10.5ms +/- 0.0%     10.5ms +/- 0.0%      significant
    dna:               *1.002x as slow*   10.5ms +/- 0.0%     10.5ms +/- 0.0%      significant

  string:              *1.006x as slow*   46.2ms +/- 0.1%     46.5ms +/- 0.1%      significant
    base64:            *1.007x as slow*    4.8ms +/- 0.1%      4.8ms +/- 0.1%      significant
    fasta:             *1.003x as slow*    5.3ms +/- 0.1%      5.3ms +/- 0.1%      significant
    tagcloud:          *1.003x as slow*   13.1ms +/- 0.1%     13.1ms +/- 0.1%      significant
    unpack-code:       *1.008x as slow*   18.2ms +/- 0.1%     18.3ms +/- 0.1%      significant
    validate-input:    *1.009x as slow*    4.9ms +/- 0.1%      4.9ms +/- 0.1%      significant
(Assignee)

Comment 12

5 years ago
Created attachment 775146 [details] [diff] [review]
Attaching block profiles to MBasicBlocks, with testing code.

For early review.
Attachment #775146 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 775146 [details] [diff] [review]
Attaching block profiles to MBasicBlocks, with testing code.

Review of attachment 775146 [details] [diff] [review]:
-----------------------------------------------------------------

This sounds good modulo the few remarks which are following …

::: js/src/ion/Ion.cpp
@@ +959,5 @@
>  
>      if (mir->shouldCancel("Renumber Blocks"))
>          return false;
>  
> +    if (!AttachBlockProfiles(graph))

OptimizeMIR is used both by OdinMoney and by IonMonkey, you might want to only do it in IonMonkey as OdinMonkey is an ahead-of-time compiler, which does not even run into the baseline compiler.

::: js/src/ion/IonAnalysis.cpp
@@ +51,5 @@
> +        BaselineScript *blscript = jsscript->baselineScript();
> +        size_t numCounters = blscript->numBlockCounters();
> +
> +        IonSpew(IonSpew_Scripts, "[AttachBlockProfiles] jsscript(%p) blscript(%p) code(%p)",
> +                   jsscript, blscript, code);

You might want to have a new Spew channel for profile data.

@@ +53,5 @@
> +
> +        IonSpew(IonSpew_Scripts, "[AttachBlockProfiles] jsscript(%p) blscript(%p) code(%p)",
> +                   jsscript, blscript, code);
> +
> +        jsbytecode *pc = block->pc();

Not all blocks have a pc, only blocks which have a resume points.

@@ +57,5 @@
> +        jsbytecode *pc = block->pc();
> +        size_t i;
> +        for (i = 0; i < numCounters; i++) {
> +            BlockCounterEntry entry = blscript->blockCounterEntry(i);
> +            if (entry.pcOffset + code == pc) {

As long as iterating linearly for each block is not performance issue, this should be fine.  Otherwise, I guess the block counters can be sorted by pc.

Also you can break early, and handle this case after the loop as you already do with the error message in the spew.

@@ +66,5 @@
> +                break;
> +            }
> +        }
> +        if (i >= numCounters) {
> +            IonSpew(IonSpew_Scripts, "[AttachBlockProfiles] Oops no counter for block(%zu)", block->id());

This case might happen when we inline some code, where we might for example insert an instruction to switch to one basic block or the other in function of the JSFunction pointer.

::: js/src/ion/MIRGraph.h
@@ +496,5 @@
> +
> +    bool isWorthFiltered() {
> +        uint32_t scriptUseCount = info_.script()->getUseCount();
> +        // TODO: Decide whether this block is worth to be filtered out.
> +        if (isBlockUseCountAvailable() && getBlockUseCount() < scriptUseCount / 10)

This should be part of another bug, but ok.
Attachment #775146 - Flags: feedback?(nicolas.b.pierron) → feedback+
(Assignee)

Comment 14

5 years ago
Here is a preliminary performance profiling result. The test platform has 8-cores (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz), running Ubuntu 12.04 amd64. 

The IonMonkey was disabled (--no-ion).

sunspider, 100 times, baseline vs. instrumented
     
    TEST                   COMPARISON               FROM                 TO             DETAILS
     
    ===============================================================================
     
    ** TOTAL **:           -                 352.9ms +/- 0.6%    352.5ms +/- 0.2% 
     
    ===============================================================================
     
      3d:                  -                  48.8ms +/- 0.7%     48.5ms +/- 1.0% 
        cube:              ??                 19.0ms +/- 0.9%     19.0ms +/- 0.7%      not conclusive: might be *1.002x as slow*
        morph:             1.028x as fast     13.5ms +/- 0.9%     13.1ms +/- 1.4%      significant
        raytrace:          ??                 16.4ms +/- 0.8%     16.4ms +/- 1.1%      not conclusive: might be *1.002x as slow*
     
      access:              *1.014x as slow*   57.1ms +/- 0.8%     57.9ms +/- 0.2%      significant
        binary-trees:      1.025x as fast     10.2ms +/- 1.0%      9.9ms +/- 1.0%      significant
        fannkuch:          *1.035x as slow*   26.2ms +/- 0.8%     27.2ms +/- 0.2%      significant
        nbody:             1.040x as fast     12.6ms +/- 1.4%     12.2ms +/- 0.2%      significant
        nsieve:            *1.074x as slow*    8.0ms +/- 0.9%      8.6ms +/- 0.6%      significant
     
      bitops:              *1.036x as slow*   52.1ms +/- 0.6%     54.0ms +/- 0.4%      significant
        3bit-bits-in-byte: *1.186x as slow*    7.0ms +/- 0.8%      8.4ms +/- 1.8%      significant
        bits-in-byte:      *1.045x as slow*   13.3ms +/- 0.7%     13.9ms +/- 0.6%      significant
        bitwise-and:       ??                 17.5ms +/- 0.8%     17.6ms +/- 0.3%      not conclusive: might be *1.007x as slow*
        nsieve-bits:       1.010x as fast     14.4ms +/- 0.8%     14.2ms +/- 0.3%      significant
     
      controlflow:         1.032x as fast      7.2ms +/- 0.8%      7.0ms +/- 0.2%      significant
        recursive:         1.032x as fast      7.2ms +/- 0.8%      7.0ms +/- 0.2%      significant
     
      crypto:              1.023x as fast     33.4ms +/- 1.0%     32.7ms +/- 0.4%      significant
        aes:               1.021x as fast     18.1ms +/- 1.3%     17.7ms +/- 0.4%      significant
        md5:               1.016x as fast      8.2ms +/- 1.0%      8.1ms +/- 0.8%      significant
        sha1:              1.036x as fast      7.1ms +/- 1.2%      6.9ms +/- 0.5%      significant
     
      date:                -                  33.8ms +/- 1.0%     33.7ms +/- 0.6% 
        format-tofte:      -                  23.1ms +/- 1.1%     23.0ms +/- 0.6% 
        format-xparb:      -                  10.7ms +/- 1.2%     10.7ms +/- 1.5% 
     
      math:                1.016x as fast     40.0ms +/- 0.7%     39.4ms +/- 0.3%      significant
        cordic:            1.041x as fast     19.5ms +/- 1.0%     18.7ms +/- 0.4%      significant
        partial-sums:      *1.012x as slow*   13.0ms +/- 0.9%     13.1ms +/- 0.7%      significant
        spectral-norm:     -                   7.6ms +/- 1.1%      7.6ms +/- 0.2% 
     
      regexp:              1.020x as fast     10.8ms +/- 0.8%     10.6ms +/- 0.2%      significant
        dna:               1.020x as fast     10.8ms +/- 0.8%     10.6ms +/- 0.2%      significant
     
      string:              1.013x as fast     69.6ms +/- 0.8%     68.7ms +/- 0.5%      significant
        base64:            -                   8.0ms +/- 1.0%      7.9ms +/- 0.3% 
        fasta:             ??                 17.7ms +/- 1.0%     17.8ms +/- 0.8%      not conclusive: might be *1.002x as slow*
        tagcloud:          1.019x as fast     17.3ms +/- 1.2%     17.0ms +/- 1.0%      significant
        unpack-code:       1.023x as fast     17.5ms +/- 1.1%     17.1ms +/- 0.5%      significant
        validate-input:    -                   9.0ms +/- 1.1%      8.9ms +/- 0.7%
Drive by comment. Since this information will only be used by IonMonkey, please disable instrumenting baseline scripts that we know will not run in IM. We also do this for usecount, see "ion/BaselineCompiler.cpp:440".

Also discussed this with Nicolas and since instrumenting has some overhead, we might also want code that dynamically disables this profiling when IonBuilder is ran and the script is marked as uncompileable. Since baseline doesn't recompile, we currently don't do this for usecount.
Attachment #772094 - Attachment is obsolete: true
(Assignee)

Comment 16

5 years ago
(In reply to Hannes Verschore [:h4writer] from comment #15)
> Drive by comment. Since this information will only be used by IonMonkey,
> please disable instrumenting baseline scripts that we know will not run in
> IM. We also do this for usecount, see "ion/BaselineCompiler.cpp:440".

Agree. Baseline scripts should not be instrumented if neither ionCompileable_ nor  ionOSRCompileable_ is true.

> Also discussed this with Nicolas and since instrumenting has some overhead,
> we might also want code that dynamically disables this profiling when
> IonBuilder is ran and the script is marked as uncompileable. Since baseline
> doesn't recompile, we currently don't do this for usecount.

Currently I don't know how to disable the counters dynamically. It might be
possible to load and check a flag (boolean value) which indicates whether
the profiling is necessary or not, but it introduces more overhead,
unfortunately. Would you provide me some hints on it?
(In reply to Wei Wu [:wuwei UTC+8] from comment #16)
> > Also discussed this with Nicolas and since instrumenting has some overhead,
> > we might also want code that dynamically disables this profiling when
> > IonBuilder is ran and the script is marked as uncompileable. Since baseline
> > doesn't recompile, we currently don't do this for usecount.
> 
> Currently I don't know how to disable the counters dynamically. It might be
> possible to load and check a flag (boolean value) which indicates whether
> the profiling is necessary or not, but it introduces more overhead,
> unfortunately. Would you provide me some hints on it?

The idea was to replace the code in-place by an unconditional relative jump code which does the increment of the counters.

Discussing with sstangl, who made the instrumentation for doing a similar thing for the Incremental GC write barriers.  He suggests that we land the patch without bothering with the performance of the baseline compiler (as it is between 2% - 10% slowdown when only the baseline is enabled)
(Assignee)

Comment 18

5 years ago
Created attachment 776869 [details] [diff] [review]
Toggle Branch Profiling

1. Do not emit block counters if neither 'ionCompileable_' nor 'ionOSRCompileable_' is true.
2. Add a jsshell command line option for toggling instrumentation on/off.
Attachment #776869 - Flags: feedback?(nicolas.b.pierron)
(Assignee)

Comment 19

5 years ago
Created attachment 777098 [details] [diff] [review]
Dynamic toggle branch profiling

Learning from SPS profiler related implementations.
Attachment #777098 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 773018 [details] [diff] [review]
Emit one block counter for each jump target

Review of attachment 773018 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/BaselineCompiler.cpp
@@ +356,5 @@
> +    CodeOffsetLabel counterOffset = masm.movWithPatch(ImmWord(-1), addressReg);
> +    Address counterAddr(addressReg, BlockCounterEntry::offsetOfCounter());
> +    masm.load32(counterAddr, counterReg);
> +    masm.add32(Imm32(1), counterReg);
> +    masm.store32(counterReg, counterAddr);

This should just be:

masm.add32(Imm32(1), counterAddr);

The macroAssembler will make it into 1 instruction on x86 and x64.
(Assignee)

Comment 21

5 years ago
Created attachment 777501 [details] [diff] [review]
Simplify counter increment code

Eliminating explicit load/store instructions when incrementing a counter. (Thank you, Kannan :-) )
Attachment #777501 - Flags: feedback?(nicolas.b.pierron)
Attachment #777501 - Flags: feedback?(kvijayan)
(Assignee)

Updated

5 years ago
Attachment #773018 - Flags: feedback?(nicolas.b.pierron)
Attachment #773018 - Flags: feedback?(nicolas.b.pierron) → feedback+
Comment on attachment 776869 [details] [diff] [review]
Toggle Branch Profiling

Review of attachment 776869 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/BaselineCompiler.cpp
@@ +343,5 @@
>  
>  bool
>  BaselineCompiler::emitBlockCounter(jsbytecode *pc)
>  {
> +    if (!ionCompileable_ && !ionOSRCompileable_)

what is the meaning of ionOSRCompileable?  Can we still Ion compile without OSR?

::: js/src/ion/Ion.h
@@ +181,5 @@
>  
> +    // Whether baseline scripts are instrumented.
> +    //
> +    // Default: false
> +    bool baselineBranchProfiling;

I will suggest on by default if the perf are not bad.  If the preformances show a slow down, we can disable it in a followup patch.
Attachment #776869 - Flags: feedback?(nicolas.b.pierron) → feedback+
Comment on attachment 777098 [details] [diff] [review]
Dynamic toggle branch profiling

Review of attachment 777098 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good.  What is the overhead of the dummy comparisons for the first iterations (or after a bailout) in the baseline code?
Attachment #777098 - Flags: feedback?(nicolas.b.pierron) → feedback+
Comment on attachment 777501 [details] [diff] [review]
Simplify counter increment code

Review of attachment 777501 [details] [diff] [review]:
-----------------------------------------------------------------

Fold that in the previous patch, there is no need to make a separated patch for this small modification.
Attachment #777501 - Flags: feedback?(nicolas.b.pierron)
(Assignee)

Updated

5 years ago
Attachment #777501 - Flags: feedback?(kvijayan)
(Assignee)

Comment 25

5 years ago
Created attachment 780478 [details] [diff] [review]
Instrumentation and Propagation

All in one patch.
Attachment #773018 - Attachment is obsolete: true
Attachment #774034 - Attachment is obsolete: true
Attachment #775146 - Attachment is obsolete: true
Attachment #776869 - Attachment is obsolete: true
Attachment #777098 - Attachment is obsolete: true
Attachment #777501 - Attachment is obsolete: true
Attachment #780478 - Flags: review?(nicolas.b.pierron)
Comment on attachment 780478 [details] [diff] [review]
Instrumentation and Propagation

Review of attachment 780478 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good.  Please address the comments and submit another patch for review & checkin.

::: js/src/ion/BaselineCompiler.cpp
@@ +186,5 @@
>                                             ImmWord(uintptr_t(-1)));
>      }
>  
> +    IonSpew(IonSpew_BranchProfiles,
> +            "[BaselineCompiler] Statistics %d counters emitted for script %s:%d (%p)",

nit: Remove the brackets from the spew message.

@@ +675,5 @@
>              return Method_Error;
>  
> +        if (js_IonOptions.baselineBranchProfiling) {
> +            // Instrument all jump targets and the first opcode.
> +            if ( (pc == script->code || info->jumpTarget ) && !emitBlockCounter(pc))

nit: remove extra space between the 2 parentheses.

::: js/src/ion/BaselineJIT.h
@@ +157,5 @@
>      uint32_t pcMappingOffset_;
>      uint32_t pcMappingSize_;
>  
> +    uint32_t blockCounterOffset_;
> +    uint32_t blockCounters_;

nit: the naming scheme here is that the size-argument mention if it is a number of Entries or the Size in bytes.  Add "Entries" at the end of blockCounters.

@@ +286,5 @@
>      static size_t offsetOfFlags() {
>          return offsetof(BaselineScript, flags_);
>      }
> +
> +    static size_t offsetOfBlockCounterOffset(){

nit: remove this function, it is not used.

::: js/src/ion/Ion.cpp
@@ +976,5 @@
>          return false;
>  
> +    if (js_IonOptions.baselineBranchProfiling && !AttachBranchProfiles(graph))
> +        return false;
> +    IonSpewPass("Attach Branch Profiles");

You want any of these spew & checks only if the phase is enabled.

if (js_IonOptions.baselineBranchProfiling) {
  if (!AttachBranchProfiles(graph))
    …
  …
}

::: js/src/ion/IonAnalysis.cpp
@@ +50,5 @@
> +bool
> +ion::AttachBranchProfiles(MIRGraph &graph)
> +{
> +    // Skip AsmJs MIRGraphs.
> +    if (isAsmJSCompilation(graph))

nit: In the comment, mention why we are skipping, such as "Skip AsmJS optimization as OdinMonkey is an ahead of time compiler and we do not have any profiled information."

@@ +64,5 @@
> +
> +        jsbytecode *pc = block->pc();
> +
> +        // Not all blocks have a pc, only blocks which have a resume point.
> +        if (!pc)

Hum … the comment is right about the resume point, but the condition check the block->pc() which is not the same as the resume point one.  And this one must be set in all IonBuilder constructed block and after, except AsmJS.

So I will try to assert that the pc is not NULL here.

@@ +72,5 @@
> +        // might assign the pc of previous opcode (JSOP_NOP or JSOP_GOTO)
> +        // to corresponding MBasicBlock. These mismatch can be corrected
> +        // by adding the length of the previous opcode.
> +        int loopHeaderOffset = 0;
> +        JSOp op = JSOp(*pc);

move this inside the if brackets.

@@ +75,5 @@
> +        int loopHeaderOffset = 0;
> +        JSOp op = JSOp(*pc);
> +        if (block->isLoopHeader()) {
> +            if(op == JSOP_NOP || op == JSOP_GOTO)
> +                loopHeaderOffset = GetBytecodeLength(pc);

replace that by:
  pc += GetBytecodeLength(pc);

and remove the loopHeaderOffset variable.

@@ +77,5 @@
> +        if (block->isLoopHeader()) {
> +            if(op == JSOP_NOP || op == JSOP_GOTO)
> +                loopHeaderOffset = GetBytecodeLength(pc);
> +            else
> +                JS_ASSERT(op == JSOP_LOOPHEAD);

replace the else by the update of op before doing this assertion.

@@ +86,5 @@
> +        // We can switch to binary search if necessary.
> +        size_t numCounters = blscript->numBlockCounters();
> +        for (size_t i = 0; i < numCounters; i++) {
> +            BlockCounterEntry entry = blscript->blockCounterEntry(i);
> +            if (entry.pcOffset + code == pc + loopHeaderOffset) {

Compute the block pcOffset before entering the loop instead of comparing with the sum every time.

::: js/src/ion/MIRGraph.h
@@ +494,5 @@
> +    uint32_t getBlockUseCount() {
> +        return blockUseCount_;
> +    }
> +
> +    bool isWorthFiltered() {

This function is not used in this patch, remove it.

@@ +497,5 @@
> +
> +    bool isWorthFiltered() {
> +        uint32_t scriptUseCount = info_.script()->getUseCount();
> +        // TODO: Decide whether this block is worth to be filtered out.
> +        if (isBlockUseCountAvailable() && getBlockUseCount() < scriptUseCount / 10)

at first I would not try removing branches.

::: js/src/shell/js.cpp
@@ +5366,5 @@
>                                 "  lsra: Linear Scan register allocation (default)\n"
>                                 "  backtracking: Priority based backtracking register allocation\n"
>                                 "  stupid: Simple block local register allocation")
> +        || !op.addStringOption('\0', "branch-profiling", "on/off",
> +                               "Profile baseline generated codes (default: on, off to disable)")

make it "off" by default until we get any optimization working on top of it.
Attachment #780478 - Flags: review?(nicolas.b.pierron) → feedback+
Blocks: 896783
(Assignee)

Comment 27

5 years ago
Created attachment 783135 [details] [diff] [review]
branchProfiling.patch

Fixed mentioned nits + rebased.
Attachment #783135 - Flags: review?(nicolas.b.pierron)
(Assignee)

Updated

5 years ago
Attachment #780478 - Attachment is obsolete: true
Comment on attachment 783135 [details] [diff] [review]
branchProfiling.patch

Review of attachment 783135 [details] [diff] [review]:
-----------------------------------------------------------------

This sounds good to me.

Just a last modification, to avoid calling the annotating function when we can avoid it.

::: js/src/ion/BaselineCompiler.cpp
@@ +199,5 @@
> +    for (size_t i = 0; i < blockCounterLabels_.length(); i++) {
> +        CodeOffsetLabel label = blockCounterLabels_[i].label;
> +        label.fixup(&masm);
> +        size_t bcEntry = blockCounterLabels_[i].bcEntry;
> +        BlockCounterEntry *bcEntryAddr = &baselineScript->blockCounterEntry(bcEntry);

This sounds weird to me to have 2 vectors with identical indexes just to split between data used only during the code generation and data used after the code generation, but this is the current design of the Baseline IC so I will not block this patch on it.

I would have preferred to have a BaselineCompiler::BlockCounterEntry and a BaselineScript::BlockCounterEntry, and let the copyBlockCounterEntries do the conversion between the Compiler's structure to the Script's structure. (stripping the label)

::: js/src/ion/Ion.cpp
@@ +986,5 @@
>  
>      if (mir->shouldCancel("Renumber Blocks"))
>          return false;
>  
> +    if (js_IonOptions.baselineBranchProfiling) {

Add "&& mir->compilingAsmJS()" in the condition, to avoid doing the spew and the check for asm js compilations.

::: js/src/ion/IonAnalysis.cpp
@@ +55,5 @@
> +{
> +    // Skip AsmJS optimization as OdinMonkey is an ahead of time compiler
> +    // and we do not have any profiled information.
> +    if (isAsmJSCompilation(graph))
> +        return true;

Use an assert instead of a if.
Replace "Skip" by "Forbid".
Attachment #783135 - Flags: review?(nicolas.b.pierron)
Attachment #783135 - Flags: review?(kvijayan)
Attachment #783135 - Flags: review+
Comment on attachment 783135 [details] [diff] [review]
branchProfiling.patch

Review of attachment 783135 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/BaselineCompiler.cpp
@@ +215,5 @@
>      if (cx->runtime()->spsProfiler.enabled())
>          baselineScript->toggleSPS(true);
>  
> +    if (js_IonOptions.baselineBranchProfiling)
> +        baselineScript->toggleBlockCounters(true);

This should also be done in case of failures of Ion compilations, to prevent the profiling overhead.
(Assignee)

Updated

5 years ago
Blocks: 901221
(Assignee)

Comment 30

5 years ago
Created attachment 785770 [details] [diff] [review]
Disable profiling after ionmonkey failed.
(Assignee)

Updated

5 years ago
Blocks: 906418
(Assignee)

Comment 31

5 years ago
Created attachment 793852 [details] [diff] [review]
Update: Disable profiling after ionmonkey failed

Scripts in bug757428.js entered ForbidCompilation() more than once with --baseline-eager option, which crashed toggleBlockCounters(). To void this we test the state of block counters before we toggle.
Attachment #785770 - Attachment is obsolete: true
(Assignee)

Comment 32

5 years ago
Created attachment 798635 [details] [diff] [review]
Update: Normalize counters

Update AttachBranchProfiles(), trying to normalize counter values.
Attachment #793852 - Attachment is obsolete: true
Attachment #798635 - Flags: feedback?(nicolas.b.pierron)
(Assignee)

Updated

5 years ago
Attachment #798635 - Attachment description: branchProfiling.patch → Update: Normalize counters
Comment on attachment 798635 [details] [diff] [review]
Update: Normalize counters

Review of attachment 798635 [details] [diff] [review]:
-----------------------------------------------------------------

I cannot see anything related to the normalization, did you upload the right patch, or can you split the new stuff from the old one.
Comment on attachment 798635 [details] [diff] [review]
Update: Normalize counters

Review of attachment 798635 [details] [diff] [review]:
-----------------------------------------------------------------

Cancel the review (appears in my review list), see last answer.
Attachment #798635 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 783135 [details] [diff] [review]
branchProfiling.patch

Canceling review because I'm just not able to get to this ATM.. especially considering that it's already been reviewed.
Attachment #783135 - Flags: review?(kvijayan)
(Assignee)

Comment 36

5 years ago
Created attachment 808575 [details] [diff] [review]
BranchProfiling.patch

Fix several issues.
1. Fixed a bug related with loopheader: when a basic block is loop header, and its pc points to JSOP_GOTO, we should use the counter of JSOP_LOOPENTRY that the JSOP_GOTO jumps to.

2. A basic block might be divided into two blocks because of function inlining. The first block (bb1) inherits the pc of the original basic block, and hence has a counter. The second block (bb2) is created with a new pc which is not a jump target, so it doesn't have any counters available. In this case we find bb1 for bb2 by iterating the dominators of bb2, and assign bb1's profile to bb2.

3. 'FunctionDispatch' and 'TypeObjectDispatch' generates multiple basic blocks before and after inlined functions, and a return block as their post dominator. This return block has no counter available. In this case, it should use its dominator's counter.

4. Two blocks are generated for JSOP_TRY during MIRGraph building with different bytecode addresses (pc). The pc of the first block is the address of JSOP_TRY, which has a counter available. The pc of second block is the address of the bytecode right after the JSOP_TRY, which is not a jump target. In this case we copy the counter of JSOP_TRY to the second block.

5. Things get complicated when the lastIns of a basic block is 'TableSwitch': 1) There might be more than one split blocks, and 2) one pc might be shared between multiple successors, which makes the result incorrect. This issue haven't been fixed yet, and I will update the patch once I fix it.
Attachment #783135 - Attachment is obsolete: true
Attachment #798635 - Attachment is obsolete: true
Attachment #808575 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 808575 [details] [diff] [review]
BranchProfiling.patch

Review of attachment 808575 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Ion.h
@@ +200,5 @@
> +    // Whether baseline scripts are instrumented.
> +    //
> +    // Default: false
> +    bool baselineBranchProfiling;
> +   

nit: trailing whitespace.

::: js/src/jit/IonAnalysis.cpp
@@ +181,5 @@
> +            // we should not copy the counter from its dominator.
> +            for (; i < block->numPredecessors(); i++)
> +                if (resumePoint == block->getPredecessor(i)->callerResumePoint())
> +                    break;
> +            if (i == block->numPredecessors()) {

These Test should be moved to the previous condition.  It is weird to read something like:

  if (…) {
    … some predicate code …
    if (predicate is true) {
      … some specialized code …
    }
  }

it would be easier to read:

  … some predicate code …
  if (… && predicate is true) {
    … some specialized code …
    continue;
  }

The same remarks apply for the other conditions.

@@ +200,5 @@
> +                            block->id(), pcOffset);
> +                }
> +            }
> +        }
> +        // 'FunctionDispatch' and 'TypeObjectDispatch' generates multiple

nit: Add a new line before this comment.
Attachment #808575 - Flags: feedback?(nicolas.b.pierron) → feedback+
(In reply to Wei Wu [:wuwei UTC+8] from comment #36)
> Fix several issues.

This is really nice that if we can distinguish them precisely, and have this in the code.
Good work!

How do you spot such cases, do you add an assertion to spot missing blocks?
(Assignee)

Comment 39

5 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #38)
> (In reply to Wei Wu [:wuwei UTC+8] from comment #36)
> > Fix several issues.
> 
> This is really nice that if we can distinguish them precisely, and have this
> in the code.
> Good work!
> 
> How do you spot such cases, do you add an assertion to spot missing blocks?

I added some ionspews and assertions to catch these missing or wrong matched blocks, and analyzed them case by case. These scaffolds were removed before submitting.
(Assignee)

Comment 40

2 years ago
The function branch profiler was finished that time, but never landed since I failed to gain significant performance progress in subsequent optimization passes. I'd like to find out whether this profiler is still useful, and retrofit it if necessary.
Flags: needinfo?(lazyparser)
This work should no longer be necessary, as when --ion-pgo=on is given to the JS shell, the basic blocks are annotated with the number of time they got visited out-side IonMonkey. (see Bug 1209515)

I think we can mark this bug as a duplicate of Bug 1209515.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(lazyparser)
Resolution: --- → DUPLICATE
Duplicate of bug: 1209515
You need to log in before you can comment on or make changes to this bug.