Closed Bug 505123 Opened 15 years ago Closed 15 years ago

js3250.dll linking fails on MinGW on fastcall inline functions.

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jacek, Assigned: graydon)

Details

Attachments

(4 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko/2009070514 Gentoo Firefox/3.0.11
Build Identifier: 

It seems to be a bug in MinGW that wrongly handles fastcall inline functions. I get errors like this:

jsmath.o:jsmath.cpp:(.text+0x38e): undefined reference to `@_Z16math_ceil_kerneld@8'
jsmath.o:jsmath.cpp:(.text+0x1828): undefined reference to `@_Z16math_ceil_kerneld@8'

Specyfing calling conventions for inline functions doesn't make much sense. Even if they are not really inlined in generated code, compiler optimizes them much better than changing calling convention anyway.

Reproducible: Always
Attachment #389377 - Flags: review?(brendan)
Comment on attachment 389377 [details] [diff] [review]
Don't use FASTCALL calling convention in inline functions.

Jim for jsmath, graydon for nanojit.

/be
Attachment #389377 - Flags: review?(jim)
Attachment #389377 - Flags: review?(graydon)
Attachment #389377 - Flags: review?(brendan)
Comment on attachment 389377 [details] [diff] [review]
Don't use FASTCALL calling convention in inline functions.

Odd. I suppose I'm not generally opposed to the avmplus / LIR changes, though it seems like an odd workaround.
Attachment #389377 - Flags: review?(graydon) → review+
This is js part rebased version version. It fixes a new problem with dense_grow. Both dense_grow and math_atan2_kernel are used only as function pointers so inlining them doesn't make sense. math_ceil_kernel is inline so JS_FASTCALL doesn't make sense.
Attached patch Rebase nanojit part (obsolete) — Splinter Review
This is a rebased version. Please check it in.
Attachment #389377 - Attachment is obsolete: true
Attachment #390734 - Flags: review?(graydon)
Attachment #389377 - Flags: review?(jim)
Attachment #390733 - Flags: review?(jim)
I notice that the patches are based on TM codebase, should this be moved out of the "tamarin" codebase?
Flags: flashplayer-triage+
Sorry, meant to question if this issue should be moved out of the tamarin "Product" not "codebase"
Comment on attachment 390733 [details] [diff] [review]
New js part fix [committed]

Why does this remove the 'inline' instead of the 'FASTCALL'?  It would be kind of nice to have those jsmath.cpp functions inlined, I think.
(In reply to comment #8)
> (From update of attachment 390733 [details] [diff] [review])
> Why does this remove the 'inline' instead of the 'FASTCALL'?  It would be kind
> of nice to have those jsmath.cpp functions inlined, I think.

Oh, that's right --- because it's referenced from the expansion of the JS_DEFINE_TRCINFO_1 macro.
Comment on attachment 390733 [details] [diff] [review]
New js part fix [committed]

In the jsarray.cpp case, the function wasn't getting inlined even with the 'inline' keyword.  In the jsmath.cpp case, the patch had no measurable effect on performance on Linux; it seemed that the functions were getting inlined even without the 'inline' keyword.
Attachment #390733 - Flags: review?(jim) → review+
It's possible someone will just add back the 'inline' keywords in the future.  You could add a comment, or just hope that by the time that happens your toolchain has been fixed.
Keywords: checkin-needed
Thanks for the review.

(In reply to comment #11)
> It's possible someone will just add back the 'inline' keywords in the future. 
> You could add a comment,

I would have to add comment to every single static function to be really sure it won't happen again. Fortunately I had to fix very few function comparing to their number in js directory, so it looks like it's very rare case.

> or just hope that by the time that happens your toolchain has been fixed.

That would be great, but unfortunately it's not likely to happen soon. Bug 445294 describes a problem with newer gcc versions that points to gcc bug, which seems to be ignored by gcc developers...
Attachment #390734 - Attachment is obsolete: true
Attachment #392697 - Flags: review?(graydon)
Attachment #390734 - Flags: review?(graydon)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attachment #390733 - Attachment description: New js part fix → New js part fix [committed]
Keywords: checkin-needed
Attachment #392697 - Flags: review?(graydon) → review+
Patch #1 landed:
http://hg.mozilla.org/tracemonkey/rev/e11b014105bc

Thanks Jacek.
Whiteboard: fixed-in-tracemonkey
Assignee: nobody → graydon
Hmm.  So this led to a 0.9% SunSpider regression on m-c.
Which one, #1 or #2?
#2 is the only thing that landed on m-c so far.
That's stragne, I'd expect compilers to ignore FASTCALL for inline function. Perhaps they are not really inlined and compiler doesn't even change its calling convention (although it should optimize them anyway). Attached patch may help. It marks these function as inlined explicitly and increases the chance of truly inlining them.
Well, that sucks.  I wonder why this didn't show up when I benchmarked it.

Reverted:
http://hg.mozilla.org/mozilla-central/rev/2b27d847797a
Whiteboard: fixed-in-tracemonkey
Ops, I thought you meant the other patch. The attached patch should fix the regression. I was wrong in comment #4, my apoligize, that dense_grow is used only as function pointer. It is used only in function calls, so it should be inline (still, it's a static function so compilers could optimize it anyway).
Here's the effect of that patch on my machine, Ubuntu on a MacBook Pro.  I'm not sure what we're seeing here.  But we can't do a 1% slowdown.

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.009x as fast    1067.9ms +/- 0.7%   1057.9ms +/- 0.4%     significant

=============================================================================

  3d:                  -                  166.5ms +/- 1.9%    164.8ms +/- 1.5% 
    cube:              -                   45.3ms +/- 1.8%     45.0ms +/- 1.4% 
    morph:             -                   35.1ms +/- 1.8%     34.7ms +/- 1.5% 
    raytrace:          -                   86.1ms +/- 3.2%     85.1ms +/- 2.5% 

  access:              -                  151.1ms +/- 1.5%    149.8ms +/- 0.6% 
    binary-trees:      -                   40.5ms +/- 2.1%     40.0ms +/- 1.0% 
    fannkuch:          -                   67.1ms +/- 1.7%     66.7ms +/- 1.1% 
    nbody:             -                   31.3ms +/- 0.9%     31.0ms +/- 0.8% 
    nsieve:            -                   12.3ms +/- 2.2%     12.0ms +/- 1.8% 

  bitops:              -                   40.8ms +/- 2.0%     40.1ms +/- 1.0% 
    3bit-bits-in-byte: -                    1.7ms +/- 7.2%      1.6ms +/- 9.1% 
    bits-in-byte:      -                    8.5ms +/- 2.9%      8.4ms +/- 2.2% 
    bitwise-and:       -                    2.6ms +/- 8.1%      2.4ms +/- 7.2% 
    nsieve-bits:       -                   27.9ms +/- 2.0%     27.7ms +/- 1.0% 

  controlflow:         -                   38.0ms +/- 1.7%     37.6ms +/- 1.1% 
    recursive:         -                   38.0ms +/- 1.7%     37.6ms +/- 1.1% 

  crypto:              -                   63.9ms +/- 1.4%     63.3ms +/- 0.7% 
    aes:               -                   35.5ms +/- 1.8%     35.0ms +/- 0.8% 
    md5:               -                   18.0ms +/- 1.4%     18.0ms +/- 1.4% 
    sha1:              -                   10.3ms +/- 1.7%     10.3ms +/- 2.4% 

  date:                -                  144.0ms +/- 0.9%    142.6ms +/- 0.4% 
    format-tofte:      -                   68.6ms +/- 0.7%     68.1ms +/- 0.6% 
    format-xparb:      -                   75.4ms +/- 1.4%     74.5ms +/- 0.4% 

  math:                -                   43.5ms +/- 0.9%     43.5ms +/- 0.9% 
    cordic:            -                   13.9ms +/- 2.2%     13.8ms +/- 1.7% 
    partial-sums:      -                   22.2ms +/- 0.9%     22.2ms +/- 0.7% 
    spectral-norm:     ??                   7.3ms +/- 2.3%      7.5ms +/- 2.8%     not conclusive: might be *1.033x as slow*

  regexp:              -                   44.2ms +/- 0.8%     43.9ms +/- 0.5% 
    dna:               -                   44.2ms +/- 0.8%     43.9ms +/- 0.5% 

  string:              1.010x as fast     376.0ms +/- 0.7%    372.3ms +/- 0.5%     significant
    base64:            ??                  18.6ms +/- 1.2%     18.7ms +/- 1.2%     not conclusive: might be *1.010x as slow*
    fasta:             1.017x as fast      79.5ms +/- 1.1%     78.1ms +/- 0.7%     significant
    tagcloud:          1.011x as fast     117.8ms +/- 0.9%    116.5ms +/- 0.5%     significant
    unpack-code:       -                  126.6ms +/- 1.0%    125.6ms +/- 0.9% 
    validate-input:    -                   33.5ms +/- 0.9%     33.3ms +/- 0.6%
(In reply to comment #22)
> Here's the effect of that patch on my machine, Ubuntu on a MacBook Pro.  I'm
> not sure what we're seeing here.  But we can't do a 1% slowdown.

Just for clarity: the patch is a .9% speedup on my system.  But we can't accept a slowdown on any platform.
I did a Linux build of mozilla-central to test it. I couldn't get as stable results as you did (I usually got between +/- 2 and 3%, the best I could get was 1.1%). I've compiled on AMD Phenom, Gentoo 32-bit, gcc 4.3.2. I've three variants: (1) without patch #2, (2) with patch #2 and (3) with both patch #2 and patchfrom comment #21. The fastest build was (2), second (3) and current code (1) was the slowest.

Here is comparison between FROM (1) and TO (2):

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.03x as fast     1402.2ms +/- 1.1%   1364.8ms +/- 2.2%     significant

=============================================================================

  3d:                  -                  207.4ms +/- 11.3%    206.4ms +/- 15.0% 
    cube:              -                   63.0ms +/- 5.0%     61.2ms +/- 1.7% 
    morph:             ??                  49.8ms +/- 1.1%     50.0ms +/- 1.8%     not conclusive: might be *1.00x as slow*
    raytrace:          ??                  94.6ms +/- 28.2%     95.2ms +/- 32.7%     not conclusive: might be *1.01x as slow*

  access:              1.02x as fast      200.2ms +/- 1.3%    197.2ms +/- 1.0%     significant
    binary-trees:      1.02x as fast       59.0ms +/- 1.5%     57.8ms +/- 2.4%     significant
    fannkuch:          ??                  84.4ms +/- 1.3%     84.6ms +/- 0.8%     not conclusive: might be *1.00x as slow*
    nbody:             1.05x as fast       41.8ms +/- 3.3%     39.8ms +/- 1.4%     significant
    nsieve:            -                   15.0ms +/- 5.9%     15.0ms +/- 5.9% 

  bitops:              *1.01x as slow*     47.0ms +/- 0.0%     47.4ms +/- 2.3%     significant
    3bit-bits-in-byte: ??                   1.4ms +/- 48.6%      2.0ms +/- 44.0%     not conclusive: might be *1.43x as slow*
    bits-in-byte:      -                   10.8ms +/- 5.1%     10.2ms +/- 5.5% 
    bitwise-and:       *1.20x as slow*      2.0ms +/- 0.0%      2.4ms +/- 28.4%     significant
    nsieve-bits:       -                   32.8ms +/- 1.7%     32.8ms +/- 1.7% 

  controlflow:         ??                  56.0ms +/- 1.6%     57.0ms +/- 2.2%     not conclusive: might be *1.02x as slow*
    recursive:         ??                  56.0ms +/- 1.6%     57.0ms +/- 2.2%     not conclusive: might be *1.02x as slow*

  crypto:              ??                  68.0ms +/- 3.7%     68.2ms +/- 2.0%     not conclusive: might be *1.00x as slow*
    aes:               ??                  39.6ms +/- 6.1%     40.0ms +/- 2.2%     not conclusive: might be *1.01x as slow*
    md5:               ??                  17.4ms +/- 3.9%     17.6ms +/- 3.9%     not conclusive: might be *1.01x as slow*
    sha1:              1.04x as fast       11.0ms +/- 0.0%     10.6ms +/- 6.4%     significant

  date:                -                  199.0ms +/- 3.2%    196.4ms +/- 2.8% 
    format-tofte:      ??                 111.2ms +/- 2.3%    112.0ms +/- 3.9%     not conclusive: might be *1.01x as slow*
    format-xparb:      -                   87.8ms +/- 4.7%     84.4ms +/- 2.2% 

  math:                1.04x as fast       55.2ms +/- 3.7%     53.0ms +/- 0.0%     significant
    cordic:            -                   22.0ms +/- 8.0%     20.4ms +/- 5.5% 
    partial-sums:      1.01x as fast       23.0ms +/- 0.0%     22.8ms +/- 2.4%     significant
    spectral-norm:     -                   10.2ms +/- 5.5%      9.8ms +/- 5.7% 

  regexp:              1.44x as fast      103.4ms +/- 22.8%     71.8ms +/- 14.5%     significant
    dna:               1.44x as fast      103.4ms +/- 22.8%     71.8ms +/- 14.5%     significant

  string:              ??                 466.0ms +/- 1.5%    467.4ms +/- 4.1%     not conclusive: might be *1.00x as slow*
    base64:            *1.07x as slow*     21.4ms +/- 5.2%     22.8ms +/- 4.6%     significant
    fasta:             *1.02x as slow*    104.4ms +/- 1.4%    107.0ms +/- 3.9%     significant
    tagcloud:          -                  135.2ms +/- 3.0%    134.4ms +/- 4.7% 
    unpack-code:       *1.05x as slow*    149.2ms +/- 1.6%    156.2ms +/- 4.9%     significant
    validate-input:    1.19x as fast       55.8ms +/- 4.0%     47.0ms +/- 9.0%     significant

and between FROM (1) and TO (3)
TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           -                 1402.2ms +/- 1.1%   1384.0ms +/- 2.9% 

=============================================================================

  3d:                  -                  207.4ms +/- 11.3%    202.4ms +/- 1.3% 
    cube:              *1.06x as slow*     63.0ms +/- 5.0%     67.0ms +/- 2.6%     significant
    morph:             ??                  49.8ms +/- 1.1%     50.0ms +/- 1.8%     not conclusive: might be *1.00x as slow*
    raytrace:          -                   94.6ms +/- 28.2%     85.4ms +/- 2.2% 

  access:              ??                 200.2ms +/- 1.3%    200.6ms +/- 1.7%     not conclusive: might be *1.00x as slow*
    binary-trees:      ??                  59.0ms +/- 1.5%     59.4ms +/- 1.9%     not conclusive: might be *1.01x as slow*
    fannkuch:          ??                  84.4ms +/- 1.3%     85.4ms +/- 2.2%     not conclusive: might be *1.01x as slow*
    nbody:             -                   41.8ms +/- 3.3%     41.2ms +/- 3.9% 
    nsieve:            -                   15.0ms +/- 5.9%     14.6ms +/- 4.7% 

  bitops:              *1.03x as slow*     47.0ms +/- 0.0%     48.2ms +/- 2.2%     significant
    3bit-bits-in-byte: ??                   1.4ms +/- 48.6%      2.0ms +/- 44.0%     not conclusive: might be *1.43x as slow*
    bits-in-byte:      -                   10.8ms +/- 5.1%     10.8ms +/- 5.1% 
    bitwise-and:       *1.20x as slow*      2.0ms +/- 0.0%      2.4ms +/- 28.4%     significant
    nsieve-bits:       ??                  32.8ms +/- 1.7%     33.0ms +/- 0.0%     not conclusive: might be *1.01x as slow*

  controlflow:         ??                  56.0ms +/- 1.6%     56.4ms +/- 2.5%     not conclusive: might be *1.01x as slow*
    recursive:         ??                  56.0ms +/- 1.6%     56.4ms +/- 2.5%     not conclusive: might be *1.01x as slow*

  crypto:              *1.05x as slow*     68.0ms +/- 3.7%     71.4ms +/- 1.0%     significant
    aes:               ??                  39.6ms +/- 6.1%     42.4ms +/- 6.1%     not conclusive: might be *1.07x as slow*
    md5:               -                   17.4ms +/- 3.9%     17.4ms +/- 3.9% 
    sha1:              *1.05x as slow*     11.0ms +/- 0.0%     11.6ms +/- 20.9%     significant

  date:                ??                 199.0ms +/- 3.2%    203.0ms +/- 3.6%     not conclusive: might be *1.02x as slow*
    format-tofte:      ??                 111.2ms +/- 2.3%    113.0ms +/- 3.7%     not conclusive: might be *1.02x as slow*
    format-xparb:      ??                  87.8ms +/- 4.7%     90.0ms +/- 4.0%     not conclusive: might be *1.03x as slow*

  math:                -                   55.2ms +/- 3.7%     54.2ms +/- 4.4% 
    cordic:            -                   22.0ms +/- 8.0%     21.6ms +/- 9.6% 
    partial-sums:      1.02x as fast       23.0ms +/- 0.0%     22.6ms +/- 3.0%     significant
    spectral-norm:     -                   10.2ms +/- 5.5%     10.0ms +/- 0.0% 

  regexp:              -                  103.4ms +/- 22.8%     91.4ms +/- 23.1% 
    dna:               -                  103.4ms +/- 22.8%     91.4ms +/- 23.1% 

  string:              -                  466.0ms +/- 1.5%    456.4ms +/- 3.4% 
    base64:            ??                  21.4ms +/- 5.2%     22.0ms +/- 5.7%     not conclusive: might be *1.03x as slow*
    fasta:             1.03x as fast      104.4ms +/- 1.4%    101.8ms +/- 4.6%     significant
    tagcloud:          -                  135.2ms +/- 3.0%    130.6ms +/- 3.1% 
    unpack-code:       ??                 149.2ms +/- 1.6%    151.4ms +/- 4.3%     not conclusive: might be *1.01x as slow*
    validate-input:    1.10x as fast       55.8ms +/- 4.0%     50.6ms +/- 15.5%     significant


I'd read it as a sign that GCC doesn't like inline FASTCALL. Where can I find what platform regressed?
In general, the m.d.tree-management newsgroup.  This particular case was:

Regression: SunSpider increase 0.90% on Tiger Firefox

Previous results: 75.2 from build 20090806113252 of revision bae405b94b96 at 2009-08-06 11:32:00 on talos-rev2-tiger14 run # 0

New results: 75.88 from build 20090806114313 of revision f1a04b1c0b37 at 2009-08-06 11:43:00 on qm-pmac-fast03 run # 0

Suspected checkin range: from revision bae405b94b96 to revision f1a04b1c0b37
http://hg.mozilla.org/mozilla-central/rev/e11b014105bc
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: