Bug 745233 - test_finalizer.js crashes on Linux64 PGO builds [@ js::ctypes::CDataFinalizer::CallFinalizer]
Keywords: crash
Product: Core
Classification: Components
Component: js-ctypes
Version: 14 Branch
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: David Teller [:Yoric] (please use "needinfo")
: Jason Orendorff [:jorendorff]
Depends on:
Blocks: ctypes.finalize
Reported: 2012-04-13 10:07 PDT by Matt Brubeck (:mbrubeck)
Modified: 2012-05-02 09:33 PDT

disable test on Linux64 opt/pgo (826 bytes, patch)
2012-04-13 10:15 PDT, Matt Brubeck (:mbrubeck)

Description Matt Brubeck (:mbrubeck) 2012-04-13 10:07:20 PDT
test_finalizer.js was green when it landed in bug 720771 on Wednesday, but became perma-orange on Linux64 PGO builds about a day later, on Thursday, starting with a push to inbound that merged m-c and inbound, both of which had been green:

and also starting with this push to mozilla-central which happened 14 hours later and does not contain any of the same patches:

Since this happens only on PGO builds, it may be related to a bug in gcc's PGO.  Since it started on unrelated changesets on two different branches, perhaps it is triggered by a threshold of code size, or something similar that does not depend on the specific code that changed.

Log from one of the crashes:
Rev3 Fedora 12x64 mozilla-central pgo test xpcshell on 2012-04-13 03:50:17 PDT for push 10622eaff4fc

TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/ctypes/tests/unit/test_finalizer.js | [test_result_dispose : 322] 0 == 0

TEST-INFO | (xpcshell/head.js) | test 1 finished

TEST-INFO | (xpcshell/head.js) | exiting test

TEST-PASS | (xpcshell/head.js) | 3100 (+ 0) check(s) passed

TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
Downloading symbols from:
PROCESS-CRASH | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/ctypes/tests/unit/test_finalizer.js | application crashed (minidump found)
Crash dump filename: /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/ctypes/tests/unit/594a7af1-47c9-6492-61c53786-7b27a475.dmp
Operating system: Linux
                  0.0.0 Linux #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x7fa4fdafc990

Thread 0 (crashed)
 0  0x7fa4fdafc990
    rbx = 0x019b8150   r12 = 0x00000008   r13 = 0x00000001   r14 = 0x1a2d4170
    r15 = 0x00000001   rip = 0xfdafc990   rsp = 0x1a2d4168   rbp = 0x1a2d4170
    Found by: given as instruction pointer in context
 1!ffi_call [ffi64.c:10622eaff4fc : 485 + 0x24]
    rip = 0x0c0e6231   rsp = 0x1a2d4190
    Found by: stack scanning
 2 + 0x15445ff
    rip = 0x0bf5c600   rsp = 0x1a2d41c8
    Found by: stack scanning
 3!js::ctypes::CDataFinalizer::CallFinalizer [CTypes.cpp:10622eaff4fc : 6673 + 0x4]
    rip = 0x0c0d148c   rsp = 0x1a2d4280
    Found by: stack scanning
 4!js::ctypes::CDataFinalizer::Finalize [CTypes.cpp:10622eaff4fc : 6819 + 0x9]
    rbx = 0x019b8140   r12 = 0x00000040   r13 = 0xfdf5d0c0   rip = 0x0c0d14c0
    rsp = 0x1a2d42b0   rbp = 0xfdf5d040
    Found by: call frame info
 5!js::gc::FinalizeTypedArenas<JSObject> [jsobjinlines.h:10622eaff4fc : 256 + 0x25]
    rbx = 0xfdf5d080   r12 = 0x00000040   r13 = 0xfdf5d0c0   rip = 0x0bf6e03c
    rsp = 0x1a2d42c0   rbp = 0xfdf5d040
    Found by: call frame info
 6!js::gc::ArenaLists::finalizeObjects [jsgc.cpp:10622eaff4fc : 1499 + 0x2c]
    rbx = 0x1a2d4410   r12 = 0x1a2d4410   r13 = 0x01a02250   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x0bf6f8e5   rsp = 0x1a2d4390   rbp = 0x01aab010
    Found by: call frame info
 7!GCCycle [jsgc.cpp:10622eaff4fc : 3171 + 0xe]
    rbx = 0x01a02000   r12 = 0x1a2d4410   r13 = 0x01a02250   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x0bf6fcad   rsp = 0x1a2d43b0   rbp = 0x1a2d4400
    Found by: call frame info
 8!Collect [jsgc.cpp:10622eaff4fc : 3685 + 0x10]
    rbx = 0x01a02000   r12 = 0x01a028a0   r13 = 0x00000000   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x0bf70445   rsp = 0x1a2d44c0   rbp = 0x01a02250
    Found by: call frame info
 9  xpcshell!main [xpcshell.cpp:10622eaff4fc : 2017 + 0xc]
    rbx = 0x0952e6f0   r12 = 0x00000000   r13 = 0x01aa1f00   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x00407c8e   rsp = 0x1a2d4500   rbp = 0x00000000
    Found by: call frame info
10 + 0x1eb1c
    rbx = 0x00000000   r12 = 0x004044f0   r13 = 0x1a2d4850   r14 = 0x00000000
Comment 1 Matt Brubeck (:mbrubeck) 2012-04-13 10:15:44 PDT
Created attachment 614846 [details] [diff] [review]
disable test on Linux64 opt/pgo

This patch disables the test for now on Linux64 opt/pgo builds.  (There's no way to disable it for PGO only.)
Comment 2 Jason Orendorff [:jorendorff] 2012-04-13 10:33:34 PDT
Comment on attachment 614846 [details] [diff] [review]
disable test on Linux64 opt/pgo

We talked it over on IRC, and:

<bsmedberg> that doesn't sound like the kind of test-disablement we want
<mbrubeck> jorendorff, bsmedberg: Alternately we could try backing out all of bug 720771.
<jorendorff> mbrubeck: I was thinking that
<jorendorff> mbrubeck: certainly if the bug is reproducible, backing out the whole thing seems better to me
<mbrubeck> okay, I'll see if it backs out cleanly
<bsmedberg> I think that backing out is preferable if this needs to be solved immediately.

I agree. Poor Yoric.
Comment 3 Matt Brubeck (:mbrubeck) 2012-04-13 10:53:01 PDT
Backed out bug 720771:

Leaving this bug open since this issue still needs to be fixed before the backed-out patches can re-land.  You can close this bug if you'd rather just track the work in bug 720771.
Comment 4 David Teller [:Yoric] (please use "needinfo") 2012-04-13 14:40:36 PDT
Just for clarification: the problem is *not* in bug 720771 but in dependent bug 742384, which landed immediately after.
Comment 5 David Teller [:Yoric] (please use "needinfo") 2012-04-14 07:15:13 PDT
My bad, it may actually be in bug 720771. I am currently investigating the issue and I have the impression that there is a strange interaction between PGO and garbage-collection (see bug 745448).
Comment 6 David Teller [:Yoric] (please use "needinfo") 2012-04-18 08:42:13 PDT
Ok, issue identified:
- our gc is Boehm-style conservative, so _anything_ can cause a reference to be falsely identified as live;
- my test erroneously relied on all dead references being released.

I have posted fixes to the test suite, now waiting for jorendorff's review to land.
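
To illustrate the diagnosis above, here is a minimal sketch in plain JavaScript. It is not the js-ctypes API and not the fix posted for review. The collector `conservativeCollect`, the helper `runScenario`, and the two checks are hypothetical names invented for this example. The model assumes only that a conservative collector may keep an unreachable object alive because some stray value looks like a pointer to it.

```javascript
// Hypothetical model of a conservative collector (illustration only).
// An unreachable object is finalized unless it is falsely identified as live.
function conservativeCollect(heap, isFalselyRetained) {
  const survivors = [];
  let finalized = 0;
  for (const obj of heap) {
    if (obj.reachable || isFalselyRetained(obj)) {
      survivors.push(obj); // kept: really live, or mistaken for live
    } else {
      obj.finalizer();     // released: its finalizer runs once
      finalized++;
    }
  }
  return { survivors, finalized };
}

// Create 100 dead objects, collect them, and report what happened.
function runScenario(isFalselyRetained) {
  let finalizerCalls = 0;
  const heap = [];
  for (let i = 0; i < 100; i++) {
    heap.push({ id: i, reachable: false, finalizer: () => { finalizerCalls++; } });
  }
  const { survivors } = conservativeCollect(heap, isFalselyRetained);
  return { created: heap.length, finalizerCalls, survivors: survivors.length };
}

// Precise behaviour: nothing is falsely retained.
const precise = runScenario(() => false);
// Conservative behaviour: object 7 happens to be mistaken for live.
const conservative = runScenario((obj) => obj.id === 7);

// Fragile check: assumes every dead reference was released.
const fragileCheck = (r) => r.finalizerCalls === r.created;
// Robust check: no more finalizers ran than objects were created,
// and every object is either finalized or still held by the collector.
const robustCheck = (r) =>
  r.finalizerCalls <= r.created &&
  r.finalizerCalls + r.survivors === r.created;

console.log("fragile, precise:", fragileCheck(precise));
console.log("fragile, conservative:", fragileCheck(conservative));
console.log("robust, conservative:", robustCheck(conservative));
```

In this model the fragile check fails as soon as a single object is falsely retained, while the robust check holds in both scenarios. That is the sense in which a test should not rely on all dead references being released.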
