test_finalizer.js crashes on Linux64 PGO builds [@ js::ctypes::CDataFinalizer::CallFinalizer]

RESOLVED FIXED

Status

Product: Core
Component: js-ctypes
Status: RESOLVED FIXED
Reported: 5 years ago
Modified: 5 years ago

People

Reporter: mbrubeck
Assigned: Yoric

Tracking

Version: 14 Branch
Platform: x86_64 Linux
Keywords: crash
Firefox Tracking Flags: Not tracked

Attachments

(1 attachment)
(Reporter)

Description

5 years ago
test_finalizer.js was green when it landed in bug 720771 on Wednesday, but became perma-orange on Linux64 PGO builds about a day later, on Thursday. The failures started with this push to inbound, a merge of m-c and inbound, both of which were green beforehand:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=21106c79a43d

The failures also started with this push to mozilla-central, which happened 14 hours later and contains none of the same patches: https://tbpl.mozilla.org/?rev=10622eaff4fc

Since this happens only on PGO builds, it may be related to a bug in GCC's PGO. Because it started on unrelated changesets on two different branches, it may be triggered by a code-size threshold, or by something similar that does not depend on the specific code that changed.

Log from one of the crashes:

https://tbpl.mozilla.org/php/getParsedLog.php?id=10870167&tree=Firefox
Rev3 Fedora 12x64 mozilla-central pgo test xpcshell on 2012-04-13 03:50:17 PDT for push 10622eaff4fc

TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/ctypes/tests/unit/test_finalizer.js | [test_result_dispose : 322] 0 == 0

TEST-INFO | (xpcshell/head.js) | test 1 finished

TEST-INFO | (xpcshell/head.js) | exiting test

TEST-PASS | (xpcshell/head.js) | 3100 (+ 0) check(s) passed

TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
<<<<<<<
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-pgo/1334300401/firefox-14.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
PROCESS-CRASH | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/ctypes/tests/unit/test_finalizer.js | application crashed (minidump found)
Crash dump filename: /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/ctypes/tests/unit/594a7af1-47c9-6492-61c53786-7b27a475.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x7fa4fdafc990

Thread 0 (crashed)
 0  0x7fa4fdafc990
    rbx = 0x019b8150   r12 = 0x00000008   r13 = 0x00000001   r14 = 0x1a2d4170
    r15 = 0x00000001   rip = 0xfdafc990   rsp = 0x1a2d4168   rbp = 0x1a2d4170
    Found by: given as instruction pointer in context
 1  libxul.so!ffi_call [ffi64.c:10622eaff4fc : 485 + 0x24]
    rip = 0x0c0e6231   rsp = 0x1a2d4190
    Found by: stack scanning
 2  libxul.so + 0x15445ff
    rip = 0x0bf5c600   rsp = 0x1a2d41c8
    Found by: stack scanning
 3  libxul.so!js::ctypes::CDataFinalizer::CallFinalizer [CTypes.cpp:10622eaff4fc : 6673 + 0x4]
    rip = 0x0c0d148c   rsp = 0x1a2d4280
    Found by: stack scanning
 4  libxul.so!js::ctypes::CDataFinalizer::Finalize [CTypes.cpp:10622eaff4fc : 6819 + 0x9]
    rbx = 0x019b8140   r12 = 0x00000040   r13 = 0xfdf5d0c0   rip = 0x0c0d14c0
    rsp = 0x1a2d42b0   rbp = 0xfdf5d040
    Found by: call frame info
 5  libxul.so!js::gc::FinalizeTypedArenas<JSObject> [jsobjinlines.h:10622eaff4fc : 256 + 0x25]
    rbx = 0xfdf5d080   r12 = 0x00000040   r13 = 0xfdf5d0c0   rip = 0x0bf6e03c
    rsp = 0x1a2d42c0   rbp = 0xfdf5d040
    Found by: call frame info
 6  libxul.so!js::gc::ArenaLists::finalizeObjects [jsgc.cpp:10622eaff4fc : 1499 + 0x2c]
    rbx = 0x1a2d4410   r12 = 0x1a2d4410   r13 = 0x01a02250   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x0bf6f8e5   rsp = 0x1a2d4390   rbp = 0x01aab010
    Found by: call frame info
 7  libxul.so!GCCycle [jsgc.cpp:10622eaff4fc : 3171 + 0xe]
    rbx = 0x01a02000   r12 = 0x1a2d4410   r13 = 0x01a02250   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x0bf6fcad   rsp = 0x1a2d43b0   rbp = 0x1a2d4400
    Found by: call frame info
 8  libxul.so!Collect [jsgc.cpp:10622eaff4fc : 3685 + 0x10]
    rbx = 0x01a02000   r12 = 0x01a028a0   r13 = 0x00000000   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x0bf70445   rsp = 0x1a2d44c0   rbp = 0x01a02250
    Found by: call frame info
 9  xpcshell!main [xpcshell.cpp:10622eaff4fc : 2017 + 0xc]
    rbx = 0x0952e6f0   r12 = 0x00000000   r13 = 0x01aa1f00   r14 = 0x00000000
    r15 = 0x00000000   rip = 0x00407c8e   rsp = 0x1a2d4500   rbp = 0x00000000
    Found by: call frame info
10  libc-2.11.so + 0x1eb1c
    rbx = 0x00000000   r12 = 0x004044f0   r13 = 0x1a2d4850   r14 = 0x00000000
(Reporter)

Comment 1

5 years ago
Created attachment 614846 [details] [diff] [review]
disable test on Linux64 opt/pgo

This patch disables the test for now on Linux64 opt/pgo builds.  (There's no way to disable it for PGO only.)
Attachment #614846 - Flags: review?(jorendorff)
Comment on attachment 614846 [details] [diff] [review]
disable test on Linux64 opt/pgo

We talked it over on IRC, and:

<bsmedberg> that doesn't sound like the kind of test-disablement we want
<mbrubeck> jorendorff, bsmedberg: Alternately we could try backing out all of bug 720771.
<jorendorff> mbrubeck: I was thinking that
<jorendorff> mbrubeck: certainly if the bug is reproducible, backing out the whole thing seems better to me
<mbrubeck> okay, I'll see if it backs out cleanly
<bsmedberg> I think that backing out is preferable if this needs to be solved immediately.

I agree. Poor Yoric.
Attachment #614846 - Flags: review?(jorendorff)
(Reporter)

Comment 3

5 years ago
Backed out bug 720771:
https://hg.mozilla.org/mozilla-central/rev/e1f0bb28fbb4

Leaving this bug open since this issue still needs to be fixed before the backed-out patches can re-land.  You can close this bug if you'd rather just track the work in bug 720771.
Just for clarification: the problem is *not* in bug 720771 but in dependent bug 742384, which landed immediately afterwards.
Assignee: nobody → dteller
My bad, it may actually be in bug 720771. I am currently investigating the issue and I have the impression that there is a strange interaction between PGO and garbage-collection (see bug 745448).
OK, issue identified:
- our GC is Boehm-style conservative, so _anything_ can cause a dead reference to be falsely identified as live;
- my test erroneously assumed that every dead reference would be released.

I have posted fixes to the test suite, now waiting for jorendorff's review to land.
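
To illustrate the diagnosis above, here is a minimal sketch of why a "every dead reference was finalized" assertion is fragile under a conservative collector. It is not the actual fix posted for review and it does not use js-ctypes: `makeObjects`, `conservativeCollect`, `fragileCheck` and `robustCheck` are hypothetical names, and the collector is a plain-JavaScript stand-in in which a set of "falsely pinned" ids models stray words that happen to look like live pointers.

```javascript
// Build `count` fake objects; ids listed in `liveIds` are genuinely reachable.
function makeObjects(count, liveIds) {
  const objects = [];
  for (let id = 0; id < count; id++) {
    objects.push({ id, live: liveIds.has(id), finalizerCalls: 0 });
  }
  return objects;
}

// Hypothetical stand-in for a conservative GC pass: dead objects are
// finalized unless something happens to make them look live.
function conservativeCollect(objects, falselyPinnedIds) {
  for (const obj of objects) {
    if (obj.live) continue;                       // reachable: never finalized
    if (falselyPinnedIds.has(obj.id)) continue;   // dead, but retained by accident
    if (obj.finalizerCalls === 0) obj.finalizerCalls += 1;
  }
}

// Fragile assumption: every dead object has been finalized exactly once.
function fragileCheck(objects) {
  return objects.every(o => o.live || o.finalizerCalls === 1);
}

// Safer assumption: live objects are never finalized, and no finalizer
// runs more than once. Dead objects are allowed to linger.
function robustCheck(objects) {
  return objects.every(o =>
    o.live ? o.finalizerCalls === 0 : o.finalizerCalls <= 1);
}

const objects = makeObjects(6, new Set([0, 1]));
conservativeCollect(objects, new Set([4]));       // object 4 is falsely kept alive

console.log("fragile check passes:", fragileCheck(objects)); // false
console.log("robust check passes:", robustCheck(objects));   // true
```

In this model, a single falsely retained object is enough to break the first check, while the second check holds no matter which dead objects the collector happens to keep.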
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED