Closed
Bug 912168
Opened 11 years ago
Closed 11 years ago
startup precompilation cache broken on sparc64
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
FIXED
mozilla28
People
(Reporter: gaston, Assigned: gsvelto)
Attachments
(2 files)
Assigning to XPCOM, but feel free to reassign to the JS engine.
For the past few days, make package on sparc64 has been failing in the xpcshell cache precompilation step.
e45c455f085a was fine (see http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/533)
dc7b76fcf7e4 is broken (see http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/534)
Trying to analyze xpcshell.core in gdb only leads to gdb 6.3 exploding, because libxul.so is huge. gdb 7.6 can't read the core file.
I've started bisecting on my builder with rev fe6833808b5a, but if anyone has an idea of a possible candidate among the 280 or so changesets involved, I'm all ears.
Comment 1 (Reporter) • 11 years ago
jsapi-tests all pass with m-c from last night, so it might be an xpcom/xpcshell issue and not a JS issue?
Comment 2 (Reporter) • 11 years ago
fe6833808b5a is good, testing now with 79fd8b08b959...
Comment 3 (Reporter) • 11 years ago
79fd8b08b959 is bad, testing now with 81fb29b23c8a (only 60 changesets in the regression window!)
Comment 4 (Reporter) • 11 years ago
e45c455f085a:good
dc7b76fcf7e4:bad
fe6833808b5a:good
79fd8b08b959:bad
81fb29b23c8a:bad
6197cc8e1a3b:bad
testing cd5991a56874 now.. my regression window is hg log -r fe6833808b5a:6197cc8e1a3b so far (20 csets)
Comment 5 (Reporter) • 11 years ago
I've almost finished bisecting, and so far it seems http://hg.mozilla.org/mozilla-central/rev/416075f77249 is the changeset breaking sparc64. Given that this is a merge cset, I don't really know where to dig for more details on the csets involved, nor which of the bugs in fx-team should be marked as blocking this one.
e45c455f085a:good
fe6833808b5a:good
cd5991a56874:good
416075f77249
a7d0dd73fc25:bad
2643fd47538b:bad
6197cc8e1a3b:bad
81fb29b23c8a:bad
79fd8b08b959:bad
dc7b76fcf7e4:bad
Comment 6 (Reporter) • 11 years ago
confirmed, cd5991a56874 is good and 416075f77249 is bad. How can i bisect the csets in the merge ?
Comment 7 (Reporter) • 11 years ago
After a lengthy bisection (thanks, hg bisect --extend), I finally got further:
The first bad revision is:
changeset: 144521:ca06d27f049f
user: Brian Hackett <bhackett1024@gmail.com>
date: Tue Aug 27 11:48:55 2013 -0600
summary: Bug 908301 - Remove dedicated source compression thread, use JS worker threads instead, allow saving source when parsing off thread, r=benjamin.
Trying a build of m-c tip with that revision backed out to confirm the suspicion.
Depends on: 908301
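The bisection in the comments above amounts to a binary search over an ordered list of revisions. Here is a minimal C++ sketch of that idea, assuming a linear history ordered oldest to newest (real hg history is a DAG, which is why a merge needed hg bisect --extend). The function name firstBadRevision and the isBad callback are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the first revision for which isBad() is true.
// Assumptions of this sketch: revs is ordered oldest to newest,
// revs.front() is known good and revs.back() is known bad.
std::string firstBadRevision(
    const std::vector<std::string>& revs,
    const std::function<bool(const std::string&)>& isBad) {
    assert(revs.size() >= 2);
    std::size_t good = 0;               // newest revision known to be good
    std::size_t bad = revs.size() - 1;  // oldest revision known to be bad
    while (bad - good > 1) {
        // Test the midpoint and shrink the window toward the regression.
        std::size_t mid = good + (bad - good) / 2;
        if (isBad(revs[mid])) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    return revs[bad];
}
```

Fed the ten revisions listed in comment 5, with isBad standing in for "make package crashes", this search would stop at the merge changeset 416075f77249. It does not descend into the merged branch the way hg bisect --extend does.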
Comment 8 (Reporter) • 11 years ago
a build of m-c tip with ca06d27f049f backed out builds, packages and runs fine on sparc64. Given that it's an architecture without ION, might it be related to the changes made in that rev wrt #ifdefs ?
Updated (Reporter) • 11 years ago
Assignee: nobody → general
Component: XPCOM → JavaScript Engine
Comment 9 (Reporter) • 11 years ago
It seems JS_WORKER_THREADS is only defined if JS_ION is defined (js/src/vm/Runtime.h) so probably not the case on exotic archs. Is there a safe fallback in this case ?
Comment 10 (Reporter) • 11 years ago
From what I understand of the diff, the sourceCompressorThread was previously available #if JS_THREADSAFE (which afaict is enabled everywhere, including sparc64), and source compression is now done exclusively from a worker thread, which is only available if JS_ION is on. I don't fully grok this code, but I'm not sure exotic non-ION archs have been taken care of in that commit...
Comment 11 (Reporter) • 11 years ago
Other datapoint: http://hg.mozilla.org/mozilla-central/rev/43259182e1a0 builds, packages and runs fine on powerpc, which is also !JS_ION. This cset is more recent than ca06d27f049f
Comment 12 (Reporter) • 11 years ago
As a huge hack/workaround, force-disabling the helper threads helps:
--- a/js/src/jsapi.cpp Tue Sep 10 14:58:50 2013 +0900
+++ b/js/src/jsapi.cpp Tue Sep 10 20:53:27 2013 +0200
@@ -700,7 +700,7 @@
{
MOZ_ASSERT(jsInitState == Running,
"must call JS_Init prior to creating any JSRuntimes");
-
+ useHelperThreads = JS_NO_HELPER_THREADS;
This is gross, but with that everything is fine at runtime/package time. Now, who will figure out what is wrong with helper threads on sparc64...
Comment 13 (Reporter) • 11 years ago
Ping ? would be nice to get that properly fixed before next uplift, and i doubt i can fix that myself.
Comment 14 • 11 years ago
Flags: needinfo?(bhackett1024)
Comment 15 • 11 years ago
Per comment 11 this is likely a sparc specific issue. I think we should just use the workaround in comment 12 (which I suggested on IRC) #ifdef'ed for sparc.
Flags: needinfo?(bhackett1024)
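A sparc-only variant of the comment 12 workaround could look roughly like the sketch below. It assumes the compiler predefines __sparc__ on SPARC targets (GCC-style toolchains do), and it uses a local stand-in enum so the example compiles on its own. The DEMO_ names and demoEffectiveHelperThreads are hypothetical and not part of the JSAPI.

```cpp
#include <cassert>

// Local stand-in for the helper-thread setting mentioned in comment 12.
enum DemoUseHelperThreads { DEMO_NO_HELPER_THREADS, DEMO_USE_HELPER_THREADS };

// Assumption: __sparc__ is predefined by the compiler on SPARC targets.
#if defined(__sparc__)
constexpr bool kDemoIsSparc = true;
#else
constexpr bool kDemoIsSparc = false;
#endif

// Returns the setting that would actually be used: helper threads are
// forced off on SPARC, and the caller's request is honoured elsewhere.
inline DemoUseHelperThreads demoEffectiveHelperThreads(
    DemoUseHelperThreads requested) {
    assert(requested == DEMO_NO_HELPER_THREADS ||
           requested == DEMO_USE_HELPER_THREADS);
    return kDemoIsSparc ? DEMO_NO_HELPER_THREADS : requested;
}
```

This keeps other platforms untouched, but it only masks whatever is going wrong with helper threads on sparc64.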
Comment 16 (Reporter) • 11 years ago
(In reply to Brian Hackett (:bhackett) from comment #15)
> Per comment 11 this is likely a sparc specific issue. I think we should
> just use the workaround in comment 12 (which I suggested on IRC) #ifdef'ed
> for sparc.
That's... sweeping issues under the carpet, and I think we expect a better quality standard from our codebase :)
Comment 17 (Reporter) • 11 years ago
A build of m-c from cset d5fc994ca2ed packages and runs fine on powerpc. So it's not an 'all non-JS_ION platforms' issue.
Comment 18 (Reporter) • 11 years ago
Comment 19 (Reporter) • 11 years ago
Contrary to what I said in a previous comment, jsapi-tests fail badly (i.e. explode/fail/segfault) when the helper threads are used, and almost all pass when they are disabled (i.e. useHelperThreads = JS_NO_HELPER_THREADS, as bhackett suggested).
I'd like to find the smallest possible test case to try to gather an actual coredump and a backtrace.
Comment 20 (Reporter) • 11 years ago
gdb 6.3 can't load jsapi-tests.core, and 7.6 can load it but can't do anything with it. Exactly the same problem as with the xpcshell.core generated by make package.
[New process 30044]
warning: Couldn't recognize general-purpose registers in core file.
Core was generated by `jsapi-tests'.
Program terminated with signal 11, Segmentation fault.
warning: Couldn't recognize general-purpose registers in core file.
#0 <unavailable> in ?? ()
(gdb)
Comment 21 (Reporter) • 11 years ago
I can still run jsapi-tests directly from egdb. Apart from all the TEST-UNEXPECTED-FAIL results triggered by the helper threads, the one test that crashes jsapi-tests gives this backtrace:
testOriginPrincipals
Program received signal SIGSEGV, Segmentation fault.
0x0000003321220630 in JS_GetFunctionScript (cx=0x3525d7d400, fun=<optimized out>)
at /home/landry/m-c/js/src/vm/OldDebugAPI.cpp:525
525 MOZ_CRASH();
(gdb) bt
#0 0x0000003321220630 in JS_GetFunctionScript (cx=0x3525d7d400, fun=<optimized out>)
at /home/landry/m-c/js/src/vm/OldDebugAPI.cpp:525
#1 0x0000003321185f30 in testInner (originPrincipal=0x3322057a6c <prin1>,
principal=0x3322057a6c <prin1>, asciiChars=0x332163e2a8 "function f() {return 1}; f;",
this=0x33224d4250 <cls_testOriginPrincipals_instance>)
at /home/landry/m-c/js/src/jsapi-tests/testOriginPrincipals.cpp:90
#2 cls_testOriginPrincipals::testOuter (this=0x33224d4250 <cls_testOriginPrincipals_instance>,
asciiChars=0x332163e2a8 "function f() {return 1}; f;")
at /home/landry/m-c/js/src/jsapi-tests/testOriginPrincipals.cpp:79
#3 0x0000003321187718 in cls_testOriginPrincipals::run (
this=0x33224d4250 <cls_testOriginPrincipals_instance>, global=...)
at /home/landry/m-c/js/src/jsapi-tests/testOriginPrincipals.cpp:27
#4 0x00000033211c1c98 in main (argc=<optimized out>, argv=<optimized out>)
at /home/landry/m-c/js/src/jsapi-tests/tests.cpp:100
Comment 22 (Reporter) • 11 years ago
CC'ing billm at bhackett's suggestion. I can try any random idea, but I can't come up with a smaller testcase or a way to debug this myself.
Sorry, I don't know anything about this code.
Comment 24 (Reporter) • 11 years ago
CC'ing martin husemann, since he did lots of sparc64 fixing he might have an idea.. I'd rather avoid disabling the helper threads on sparc64 :(
Comment 25 • 11 years ago
I am not sure this is related, but I get an assertion failure and crash in debug builds. mozilla::TimeStamp::sFirstTimeStamp is instantiated in different DSOs, and the value set early in TimeStamp::Startup() is overwritten with 0 by the constructor that runs when libxul.so is loaded:
Watchpoint 2: sFirstTimeStamp
Old value =
{mValue = 161451583385268, static sFirstTimeStamp = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = <same as static member of an already seen type>}}, static sProcessCreation = <same as static member of an already seen type>}
New value =
{mValue = 0, static sFirstTimeStamp = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = <same as static member of an already seen type>}}, static sProcessCreation = <same as static member of an already seen type>}
0x000000004569f5bc in mozilla::TimeStamp::TimeStamp (
this=0x4ae9e8a0 <mozilla::TimeStamp::sFirstTimeStamp>)
at ../../../dist/include/mozilla/TimeStamp.h:213
213 MOZ_CONSTEXPR TimeStamp() : mValue(0) {}
(gdb) bt
#0 0x000000004569f5bc in mozilla::TimeStamp::TimeStamp (
this=0x4ae9e8a0 <mozilla::TimeStamp::sFirstTimeStamp>)
at ../../../dist/include/mozilla/TimeStamp.h:213
#1 0x00000000488d12c0 in __static_initialization_and_destruction_0 (
__initialize_p=1, __priority=65535)
at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/xpcom/ds/TimeStamp.cpp:16
#2 0x00000000488d12f8 in global constructors keyed to TimeStamp.cpp(void) ()
at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/xpcom/ds/TimeStamp.cpp:62
Assertion failure: !aOther.IsNull() (Cannot compute with aOther null value), at ../../../dist/include/mozilla/TimeStamp.h:314
Program received signal SIGSEGV, Segmentation fault.
0x00000000456f5e04 in mozilla::TimeStamp::operator> (this=0xffffffffffffb370,
aOther=...) at ../../../dist/include/mozilla/TimeStamp.h:314
314 MOZ_ASSERT(!aOther.IsNull(), "Cannot compute with aOther null value");
(gdb) bt
#0 0x00000000456f5e04 in mozilla::TimeStamp::operator> (
this=0xffffffffffffb370, aOther=...)
at ../../../dist/include/mozilla/TimeStamp.h:314
#1 0x00000000488d118c in mozilla::TimeStamp::ProcessCreation (
aIsInconsistent=@0xffffffffffffb44f: false)
at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/xpcom/ds/TimeStamp.cpp:41
#2 0x00000000475aed4c in mozilla::StartupTimelineRecordExternal (aEvent=1,
aWhen=161353755552945)
at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/toolkit/components/startup/StartupTimeline.cpp:37
#3 0x00000000475aedf0 in XRE_StartupTimelineRecord (aEvent=1,
aWhen=161353755552945)
at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/toolkit/components/startup/StartupTimeline.cpp:66
#4 0x000000000010435c in main (argc=1, argv=0xffffffffffffb838)
at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/browser/app/nsBrowserApp.cpp:605
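The sequence described in this comment can be simulated in a single translation unit: a value is stored early, then a default constructor runs later on the same storage and resets it to zero. The sketch below uses placement new to mimic the late static initializer. DemoTimeStamp, demoStartup and demoLateStaticInit are hypothetical names, and the real problem involved the same static appearing in more than one DSO, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Simplified stand-in for a timestamp whose default constructor zeroes it.
struct DemoTimeStamp {
    std::uint64_t mValue;
    DemoTimeStamp() : mValue(0) {}
    bool IsNull() const { return mValue == 0; }
};

// Early startup: construct the object and record a real value in it.
inline DemoTimeStamp* demoStartup(void* storage, std::uint64_t now) {
    assert(storage != nullptr);
    DemoTimeStamp* ts = new (storage) DemoTimeStamp();
    ts->mValue = now;
    return ts;
}

// Late static initializer: the default constructor runs again on the
// same storage and wipes the value recorded by demoStartup().
inline DemoTimeStamp* demoLateStaticInit(void* storage) {
    assert(storage != nullptr);
    return new (storage) DemoTimeStamp();
}
```

After demoLateStaticInit() runs, IsNull() is true again, which is the state that trips the "Cannot compute with aOther null value" assertion in the backtrace above.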
Comment 26 • 11 years ago
(In reply to Martin Husemann from comment #25)
Please ignore the strange path names: the source is an hg pull from yesterday, and the firefox-24.0esr.source in the path names comes from a few symlinks that make the local debug environment easier.
Comment 27 • 11 years ago
Landry, could you test whether the patch from #932329 helps with your problem? I am still not sure it is the same issue; I will rebuild now and do a test run.
Comment 28 (Reporter) • 11 years ago
I've put https://bug906754.bugzilla.mozilla.org/attachment.cgi?id=823533 on my buildslave, and will confirm tomorrow whether it fixes "my" issue or a different one.
Comment 29 (Reporter) • 11 years ago
As of now (and I shouldn't comment in this bug, really) the build is broken for me on sparc64 anyway because of a different issue (caused by #898274 two months ago? wtf?):
eg++ -o testIntTypesABI.o -c -fvisibility=hidden -DEXPORT_JS_API -DIMPL_MFBT -DMOZ_GLUE_IN_PROGRAM -DNO_NSPR_10_SUPPORT -I/home/buildslave/mozilla-central-sparc64/build/js/src -I.. -I/home/buildslave/mozilla-central-sparc64/build/js/src/jsapi-tests -I. -I../../../dist/include -I/data/obj/buildslave/m-c/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include ../js-confdefs.h -MD -MP -MF .deps/testIntTypesABI.o.pp -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Werror=conversion-null -Wsign-compare -Wno-invalid-offsetof -fno-rtti -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -pipe -DNDEBUG -DTRIMMED -g -O -fomit-frame-pointer /home/buildslave/mozilla-central-sparc64/build/js/src/jsapi-tests/testIntTypesABI.cpp
testJSEvaluateScript.o
In file included from /usr/include/machine/endian.h:7:0,
from /home/buildslave/mozilla-central-sparc64/build/js/src/jscpucfg.h:56,
from /home/buildslave/mozilla-central-sparc64/build/js/src/jsapi-tests/testIntTypesABI.cpp:12:
/usr/include/sys/endian.h:162:1: error: '__uint64_t' does not name a type
__uint64_t htobe64(__uint64_t);
^
/usr/include/sys/endian.h:163:1: error: '__uint32_t' does not name a type
__uint32_t htobe32(__uint32_t);
http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/595
Comment 30 (Reporter) • 11 years ago
It seems this issue has been fixed by bug #906754. My sparc64 builder is happy now (cf http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64, builds 599/600) with the header reordering in jscpucfg.h reverted (see #932991), and make package is fine there, so startup cache precompilation should be fixed too. Will confirm with actual runtime testing.
Comment 31 (Reporter) • 11 years ago
Runtime testing is positive, i.e. no crash during make package, and firefox from trunk starts fine on sparc64.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago