Closed Bug 912168 Opened 11 years ago Closed 11 years ago

startup precompilation cache broken on sparc64

Categories

(Core :: JavaScript Engine, defect)

Hardware: Sun
OS: OpenBSD
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla28

People

(Reporter: gaston, Assigned: gsvelto)


Attachments

(2 files)

Assigning to XPCOM, but feel free to reassign to the JS engine. For the past few days, make package on sparc64 has been failing during xpcshell startup-cache precompilation. e45c455f085a was fine (see http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/533); dc7b76fcf7e4 is broken (see http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/534). Trying to analyze xpcshell.core in gdb only makes gdb 6.3 explode, because libxul.so is huge, and gdb 7.6 can't read the core file at all. I've started bisecting on my builder with rev fe6833808b5a, but if anyone has an idea of a likely candidate among the 280 or so changesets involved, I'm all ears.
jsapi-tests all pass with m-c from last night, so it might be an XPCOM/xpcshell issue rather than a JS issue?
fe6833808b5a is good, testing now with 79fd8b08b959...
79fd8b08b959 is bad, testing now with 81fb29b23c8a (only 60 changesets in the regression window!)
e45c455f085a: good
dc7b76fcf7e4: bad
fe6833808b5a: good
79fd8b08b959: bad
81fb29b23c8a: bad
6197cc8e1a3b: bad
Testing cd5991a56874 now. So far my regression window is hg log -r fe6833808b5a:6197cc8e1a3b (20 csets).
I've almost finished bisecting, and so far it seems http://hg.mozilla.org/mozilla-central/rev/416075f77249 is the changeset breaking sparc64. Since this is a merge changeset, I don't really know where to dig for details about the changesets involved, nor which of the fx-team bugs should be marked as blocking this one.
e45c455f085a: good
fe6833808b5a: good
cd5991a56874: good
416075f77249: suspected first bad
a7d0dd73fc25: bad
2643fd47538b: bad
6197cc8e1a3b: bad
81fb29b23c8a: bad
79fd8b08b959: bad
dc7b76fcf7e4: bad
Confirmed: cd5991a56874 is good and 416075f77249 is bad. How can I bisect the changesets inside the merge?
After a lengthy bisection (thanks, hg bisect --extend), I finally got further:
The first bad revision is:
changeset: 144521:ca06d27f049f
user: Brian Hackett <bhackett1024@gmail.com>
date: Tue Aug 27 11:48:55 2013 -0600
summary: Bug 908301 - Remove dedicated source compression thread, use JS worker threads instead, allow saving source when parsing off thread, r=benjamin.
I'm trying a build of m-c tip with that revision backed out to confirm the suspicion.
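hg bisect is a binary search over the revision range, so n candidate changesets cost roughly log2(n) build-and-run cycles. The --extend option is what let the search continue past the merge changeset 416075f77249. Below is a minimal sketch of the underlying search. It assumes a linear history, and it uses a hypothetical isBad predicate as a stand-in for "build, package and run on sparc64". It is not Mercurial's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Return the index of the first bad revision in a linear history.
// Preconditions (assumed): revs[0] is known good, revs.back() is known bad,
// and badness is monotonic (every revision after a bad one is also bad).
// isBad is a hypothetical stand-in for "build, package and run".
std::size_t firstBad(const std::vector<std::string>& revs,
                     const std::function<bool(const std::string&)>& isBad) {
    assert(revs.size() >= 2);
    std::size_t good = 0;               // invariant: revs[good] is good
    std::size_t bad = revs.size() - 1;  // invariant: revs[bad] is bad
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;  // test the midpoint
        if (isBad(revs[mid]))
            bad = mid;   // the regression is at or before mid
        else
            good = mid;  // the regression is after mid
    }
    return bad;  // adjacent to a known-good revision
}
```

Merges break the "linear history" assumption. That is why a plain bisection stopped at 416075f77249 and needed --extend to look inside the merged branch.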
Depends on: 908301
A build of m-c tip with ca06d27f049f backed out builds, packages and runs fine on sparc64. Given that it's an architecture without Ion, might the problem be related to the #ifdef changes made in that revision?
Assignee: nobody → general
Component: XPCOM → JavaScript Engine
It seems JS_WORKER_THREADS is only defined if JS_ION is defined (js/src/vm/Runtime.h), which is probably not the case on exotic archs. Is there a safe fallback in that case?
From what I understand of the diff, the sourceCompressorThread used to be available #if JS_THREADSAFE (which, as far as I can tell, is enabled everywhere, including sparc64). Compression is now done exclusively on a worker thread, which is only available when JS_ION is on. I don't understand this code, but I'm not sure exotic non-Ion archs were taken care of in that commit...
Another data point: http://hg.mozilla.org/mozilla-central/rev/43259182e1a0 builds, packages and runs fine on powerpc, which is also !JS_ION. This changeset is more recent than ca06d27f049f.
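Here is a sketch of the before/after gating described above. The SKETCH_* macros are hypothetical stand-ins for JS_THREADSAFE, JS_ION and JS_WORKER_THREADS. Requiring both thread safety and Ion for worker threads is an assumption based on the reading of Runtime.h above. The main-thread path is an assumed "safe fallback", not what SpiderMonkey necessarily does.

```cpp
#include <cassert>

// Hypothetical stand-ins for configure-time defines; a real build gets
// these from the build system, not from the source file.
#define SKETCH_THREADSAFE 1
// SKETCH_ION is deliberately left undefined, as on a non-Ion arch like sparc64.

// Assumption: worker threads are compiled in only when the JIT is enabled.
#if defined(SKETCH_THREADSAFE) && defined(SKETCH_ION)
#  define SKETCH_WORKER_THREADS 1
#endif

enum class CompressionPath { DedicatedThread, WorkerThread, MainThread };

// Before the change: a dedicated compressor thread whenever threads are on.
CompressionPath pathBeforeChange() {
#ifdef SKETCH_THREADSAFE
    return CompressionPath::DedicatedThread;
#else
    return CompressionPath::MainThread;
#endif
}

// After the change: worker threads only; otherwise an assumed synchronous
// fallback on the calling thread.
CompressionPath pathAfterChange() {
#ifdef SKETCH_WORKER_THREADS
    return CompressionPath::WorkerThread;
#else
    return CompressionPath::MainThread;
#endif
}
```

On a build configured like sparc64 here, the path moves from a dedicated thread to the fallback. The powerpc data point that follows (also !JS_ION, yet working) suggests that this gating alone does not explain the crash.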
As a huge hack/workaround, force-disabling the helper threads helps:
--- a/js/src/jsapi.cpp	Tue Sep 10 14:58:50 2013 +0900
+++ b/js/src/jsapi.cpp	Tue Sep 10 20:53:27 2013 +0200
@@ -700,7 +700,7 @@
 {
     MOZ_ASSERT(jsInitState == Running,
                "must call JS_Init prior to creating any JSRuntimes");
-
+    useHelperThreads = JS_NO_HELPER_THREADS;
This is gross, but with it everything is fine at runtime and package time. Now, who will figure out what is wrong with helper threads on sparc64...
Ping? It would be nice to get this properly fixed before the next uplift, and I doubt I can fix it myself.
Brian, can you take a look? See comment 7 and comment 9 for details.
Flags: needinfo?(bhackett1024)
Per comment 11 this is likely a sparc specific issue. I think we should just use the workaround in comment 12 (which I suggested on IRC) #ifdef'ed for sparc.
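Here is roughly what the workaround from comment 12, forcing JS_NO_HELPER_THREADS but #ifdef'ed for SPARC, might look like. This is only a sketch. HelperThreadMode, SKETCH_NO_HELPER_THREADS, SKETCH_USE_HELPER_THREADS and effectiveHelperThreadMode are hypothetical names, not the SpiderMonkey API. __sparc__ and __sparc are the macros GCC predefines when targeting SPARC.

```cpp
#include <cassert>

// Hypothetical mirror of the helper-thread setting used in the patch.
enum HelperThreadMode { SKETCH_NO_HELPER_THREADS, SKETCH_USE_HELPER_THREADS };

// Return the mode a runtime would actually use for a requested mode.
HelperThreadMode effectiveHelperThreadMode(HelperThreadMode requested) {
#if defined(__sparc__) || defined(__sparc)
    // Workaround: always disable helper threads on SPARC builds.
    (void)requested;
    return SKETCH_NO_HELPER_THREADS;
#else
    // Every other architecture keeps the caller's choice.
    return requested;
#endif
}
```

As the reply below argues, this hides the sparc64 bug rather than fixing it.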
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #15)
> Per comment 11 this is likely a sparc specific issue. I think we should
> just use the workaround in comment 12 (which I suggested on IRC) #ifdef'ed
> for sparc.
That's... sweeping issues under the carpet, and I think we expect a higher quality standard from our codebase :)
A build of m-c from cset d5fc994ca2ed packages and runs fine on powerpc, so it's not an "all non-JS_ION platforms" issue.
Contrary to what I said in a previous comment, jsapi-tests fail badly (i.e. explode, fail or segfault) when the helper threads are used, and almost all pass when they are disabled (i.e. useHelperThreads = JS_NO_HELPER_THREADS, as bhackett suggested). I'd like to find the smallest possible test case so I can gather an actual core dump and a backtrace.
gdb 6.3 can't load jsapi-tests.core, and 7.6 can load it but can't do anything with it. This is exactly the same problem as with the xpcshell.core generated by make package:
[New process 30044]
warning: Couldn't recognize general-purpose registers in core file.
Core was generated by `jsapi-tests'.
Program terminated with signal 11, Segmentation fault.
warning: Couldn't recognize general-purpose registers in core file.
#0  <unavailable> in ?? ()
(gdb)
I can still run jsapi-tests directly from egdb. Apart from all the TEST-UNEXPECTED-FAIL results triggered by the helper threads, the test that crashes jsapi-tests gives this backtrace:
testOriginPrincipals
Program received signal SIGSEGV, Segmentation fault.
0x0000003321220630 in JS_GetFunctionScript (cx=0x3525d7d400, fun=<optimized out>) at /home/landry/m-c/js/src/vm/OldDebugAPI.cpp:525
525         MOZ_CRASH();
(gdb) bt
#0  0x0000003321220630 in JS_GetFunctionScript (cx=0x3525d7d400, fun=<optimized out>) at /home/landry/m-c/js/src/vm/OldDebugAPI.cpp:525
#1  0x0000003321185f30 in testInner (originPrincipal=0x3322057a6c <prin1>, principal=0x3322057a6c <prin1>, asciiChars=0x332163e2a8 "function f() {return 1}; f;", this=0x33224d4250 <cls_testOriginPrincipals_instance>) at /home/landry/m-c/js/src/jsapi-tests/testOriginPrincipals.cpp:90
#2  cls_testOriginPrincipals::testOuter (this=0x33224d4250 <cls_testOriginPrincipals_instance>, asciiChars=0x332163e2a8 "function f() {return 1}; f;") at /home/landry/m-c/js/src/jsapi-tests/testOriginPrincipals.cpp:79
#3  0x0000003321187718 in cls_testOriginPrincipals::run (this=0x33224d4250 <cls_testOriginPrincipals_instance>, global=...) at /home/landry/m-c/js/src/jsapi-tests/testOriginPrincipals.cpp:27
#4  0x00000033211c1c98 in main (argc=<optimized out>, argv=<optimized out>) at /home/landry/m-c/js/src/jsapi-tests/tests.cpp:100
CC'ing billm at bhackett's suggestion. I can try any random idea, but I can't come up with a smaller test case or a way to debug this on my own.
Sorry, I don't know anything about this code.
CC'ing Martin Husemann. He has done a lot of sparc64 fixing, so he might have an idea. I'd rather avoid disabling the helper threads on sparc64 :(
I am not sure this is related, but I get an assertion failure and a crash in debug builds. mozilla::TimeStamp::sFirstTimeStamp is instantiated in several DSOs, and the value set early in TimeStamp::Startup() is overwritten with 0 by the constructor that runs when libxul.so is loaded:
Watchpoint 2: sFirstTimeStamp

Old value = {mValue = 161451583385268, static sFirstTimeStamp = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = <same as static member of an already seen type>}}, static sProcessCreation = <same as static member of an already seen type>}
New value = {mValue = 0, static sFirstTimeStamp = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = {mValue = 0, static sFirstTimeStamp = <same as static member of an already seen type>, static sProcessCreation = <same as static member of an already seen type>}}, static sProcessCreation = <same as static member of an already seen type>}
0x000000004569f5bc in mozilla::TimeStamp::TimeStamp (this=0x4ae9e8a0 <mozilla::TimeStamp::sFirstTimeStamp>) at ../../../dist/include/mozilla/TimeStamp.h:213
213       MOZ_CONSTEXPR TimeStamp() : mValue(0) {}
(gdb) bt
#0  0x000000004569f5bc in mozilla::TimeStamp::TimeStamp (this=0x4ae9e8a0 <mozilla::TimeStamp::sFirstTimeStamp>) at ../../../dist/include/mozilla/TimeStamp.h:213
#1  0x00000000488d12c0 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/xpcom/ds/TimeStamp.cpp:16
#2  0x00000000488d12f8 in global constructors keyed to TimeStamp.cpp(void) () at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/xpcom/ds/TimeStamp.cpp:62

Assertion failure: !aOther.IsNull() (Cannot compute with aOther null value), at ../../../dist/include/mozilla/TimeStamp.h:314

Program received signal SIGSEGV, Segmentation fault.
0x00000000456f5e04 in mozilla::TimeStamp::operator> (this=0xffffffffffffb370, aOther=...) at ../../../dist/include/mozilla/TimeStamp.h:314
314       MOZ_ASSERT(!aOther.IsNull(), "Cannot compute with aOther null value");
(gdb) bt
#0  0x00000000456f5e04 in mozilla::TimeStamp::operator> (this=0xffffffffffffb370, aOther=...) at ../../../dist/include/mozilla/TimeStamp.h:314
#1  0x00000000488d118c in mozilla::TimeStamp::ProcessCreation (aIsInconsistent=@0xffffffffffffb44f: false) at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/xpcom/ds/TimeStamp.cpp:41
#2  0x00000000475aed4c in mozilla::StartupTimelineRecordExternal (aEvent=1, aWhen=161353755552945) at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/toolkit/components/startup/StartupTimeline.cpp:37
#3  0x00000000475aedf0 in XRE_StartupTimelineRecord (aEvent=1, aWhen=161353755552945) at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/toolkit/components/startup/StartupTimeline.cpp:66
#4  0x000000000010435c in main (argc=1, argv=0xffffffffffffb838) at /usr/pkgobj/www/firefox/work/firefox-24.0esr.source/browser/app/nsBrowserApp.cpp:605
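The trace above shows a global constructor resetting a static after it had already been written. One general way to avoid that ordering hazard is construct-on-first-use. A function-local static is initialized the first time control reaches it, so no later global constructor runs for it. Below is a minimal sketch of that technique. SketchStamp, firstStamp(), recordStartup() and isLater() are hypothetical and are not the mozilla::TimeStamp API. The sketch also does not by itself merge duplicate copies of a symbol across DSOs; that depends on symbol visibility and linking. It is not necessarily the approach taken by the eventual fix.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for a timestamp; 0 means "null", like the default
// TimeStamp constructor quoted in the trace.
struct SketchStamp {
    std::uint64_t value = 0;
    bool isNull() const { return value == 0; }
};

// Construct-on-first-use: initialized when first called, so there is no
// separate global constructor that could later reset it to 0.
// (Initialization is thread-safe since C++11.)
SketchStamp& firstStamp() {
    static SketchStamp stamp;
    return stamp;
}

// Record a startup time, keeping only the earliest recorded value.
void recordStartup(std::uint64_t now) {
    if (firstStamp().isNull())
        firstStamp().value = now;
}

// Mirrors the assertion in the trace: never compare against a null stamp.
bool isLater(const SketchStamp& a, const SketchStamp& b) {
    assert(!b.isNull() && "Cannot compute with aOther null value");
    return a.value > b.value;
}
```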
(In reply to Martin Husemann from comment #25)
Please ignore the strange path names. The source is an hg pull from yesterday; the firefox-24.0esr.source in the path comes from a few symlinks that make the local debug environment easier.
Landry, could you test whether the patch from bug 932329 helps with your problem? I am still not sure it is the same issue; I will rebuild now and do a test run.
I've put https://bug906754.bugzilla.mozilla.org/attachment.cgi?id=823533 on my buildslave and will confirm tomorrow whether it fixes "my" issue or a different one.
As of now (and I really shouldn't comment in this bug), the build is broken for me anyway on sparc64 because of a different issue (caused by bug 898274 two months ago? wtf?):
eg++ -o testIntTypesABI.o -c -fvisibility=hidden -DEXPORT_JS_API -DIMPL_MFBT -DMOZ_GLUE_IN_PROGRAM -DNO_NSPR_10_SUPPORT -I/home/buildslave/mozilla-central-sparc64/build/js/src -I.. -I/home/buildslave/mozilla-central-sparc64/build/js/src/jsapi-tests -I. -I../../../dist/include -I/data/obj/buildslave/m-c/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include ../js-confdefs.h -MD -MP -MF .deps/testIntTypesABI.o.pp -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Werror=conversion-null -Wsign-compare -Wno-invalid-offsetof -fno-rtti -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -pipe -DNDEBUG -DTRIMMED -g -O -fomit-frame-pointer /home/buildslave/mozilla-central-sparc64/build/js/src/jsapi-tests/testIntTypesABI.cpp
testJSEvaluateScript.o
In file included from /usr/include/machine/endian.h:7:0,
                 from /home/buildslave/mozilla-central-sparc64/build/js/src/jscpucfg.h:56,
                 from /home/buildslave/mozilla-central-sparc64/build/js/src/jsapi-tests/testIntTypesABI.cpp:12:
/usr/include/sys/endian.h:162:1: error: '__uint64_t' does not name a type
 __uint64_t htobe64(__uint64_t);
 ^
/usr/include/sys/endian.h:163:1: error: '__uint32_t' does not name a type
 __uint32_t htobe32(__uint32_t);
http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/595
It 'seems' this issue has been fixed by bug 906754. My sparc64 builder is happy now (cf. http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64, builds 599/600) with the header reordering in jscpucfg.h reverted (see bug 932991). make package succeeds there, so startup cache precompilation should be fixed too. I will confirm with actual runtime testing.
Runtime testing is positive: no crash during make package, and Firefox from trunk starts fine on sparc64.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee: general → gsvelto
Depends on: 906754
Target Milestone: --- → mozilla28
Blocks: 953211
