Closed Bug 350787 Opened 18 years ago Closed 18 years ago

1.8 branch startup crashes [@ js_Interpret], line 5506

Categories

(Core :: XPConnect, defect, P1)

1.8 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla1.8.1

People

(Reporter: dbaron, Assigned: brendan)

Details

(Keywords: crash, topcrash, verified1.8.1)

Crash Data

Attachments

(2 files, 3 obsolete files)

A bunch of users have been crashing on startup in 1.8 branch builds with stacks pointing to js_Interpret, line 5506. There seem to be two variations of this stack:

Incident ID: 22670580
Stack Signature js_Interpret() 44249002
Product ID Firefox2
Build ID 2006082804
Trigger Time 2006-08-29 19:03:37.0
Platform LinuxIntel
Operating System Linux 2.4.31
Module libmozjs.so + (00040b8d)
URL visited
User Comments
Since Last Crash 0 sec
Total Uptime 5 sec
Trigger Reason SIGSEGV: Segmentation Fault: (signal 11)
Source File, Line No. /builds/tinderbox/Fx-Mozilla1.8/Linux_2.4.21-27.0.4.EL_Depend/mozilla/js/src/jsinterp.c, line 5506
Stack Trace
  js_Interpret() [mozilla/js/src/jsinterp.c, line 5506]
  js_Invoke() [mozilla/js/src/jsinterp.c, line 1369]
  nsXPCWrappedJSClass::CallMethod() [mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp, line 1415]
  nsXPCWrappedJS::CallMethod() [mozilla/js/src/xpconnect/src/xpcwrappedjs.cpp, line 468]
  PrepareAndDispatch() [mozilla/xpcom/reflect/xptcall/src/md/unix/xptcstubs_gcc_x86_unix.cpp, line 100]
  nsObserverService::NotifyObservers() [mozilla/xpcom/ds/nsObserverService.cpp, line 848]
  nsXREDirProvider::DoStartup() [mozilla/toolkit/xre/nsXREDirProvider.cpp, line 591]
  XRE_main() [mozilla/toolkit/xre/nsAppRunner.cpp, line 2308]
  main() [mozilla/browser/app/nsBrowserApp.cpp, line 62]
  libc.so.6 + 0x1544b (0x40a5044b)

Incident ID: 22658782
Stack Signature js_Interpret 68a0a449
Product ID Firefox2
Build ID 2006082803
Trigger Time 2006-08-29 12:55:43.0
Platform Win32
Operating System Windows NT 5.1 build 2600
Module js3250.dll + (00028c47)
URL visited startup
User Comments
Since Last Crash 26567 sec
Total Uptime 33847 sec
Trigger Reason Access violation
Source File, Line No. c:/builds/tinderbox/Fx-Mozilla1.8/WINNT_5.2_Depend/mozilla/js/src/jsinterp.c, line 5506
Stack Trace
  js_Interpret [mozilla/js/src/jsinterp.c, line 5506]
  js_Invoke [mozilla/js/src/jsinterp.c, line 1369]
  nsXPCWrappedJSClass::CallMethod [mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp, line 1415]
  nsXPCWrappedJS::CallMethod [mozilla/js/src/xpconnect/src/xpcwrappedjs.cpp, line 468]
  SharedStub [mozilla/xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp, line 147]
  mozJSComponentLoader::GetFactory [mozilla/js/src/xpconnect/loader/mozJSComponentLoader.cpp, line 452]
  nsFactoryEntry::GetFactory [mozilla/xpcom/components/nsComponentManager.h, line 302]
  nsComponentManagerImpl::CreateInstanceByContractID [mozilla/xpcom/components/nsComponentManager.cpp, line 1978]
  nsComponentManagerImpl::GetServiceByContractID [mozilla/xpcom/components/nsComponentManager.cpp, line 2410]
  CallGetService [mozilla/xpcom/build/nsComponentManagerUtils.cpp, line 95]
  nsCommandLine::EnumerateHandlers [mozilla/toolkit/components/commandlines/src/nsCommandLine.cpp, line 568]
  nsCommandLine::Run [mozilla/toolkit/components/commandlines/src/nsCommandLine.cpp, line 592]
  main [mozilla/browser/app/nsBrowserApp.cpp, line 61]
  kernel32.dll + 0x16fd7 (0x7c816fd7)

Some users have been reporting the crash multiple times; it seems like it might be that a given user sees only one of the stacks, although I haven't looked too closely.

The line with the crash is within the JSOP_INITCATCHVAR case:

  ok = OBJ_GET_ATTRIBUTES(cx, obj, id, NULL, &attrs);
Flags: blocking1.8.1?
The first report of this crash was in the 2006081504 build. Here's a histogram of the first few days:

2006081504 1
2006081606 1
2006081703 2 (one of which was a little later in startup)
2006081803 3 (all 3 from same user)
2006081903 1
2006081904 1 (first Linux report)
2006081907 2 (first Mac reports)
Possible cause: patch for bug 346494, which was buggy. Should be fixed by patch for bug 350312, which went in today. We'll see... /be
No, not fixed by that patch. Three crashes reported in 20060831 build: TB22753760, TB22753217, and TB22752579.
We're trying to keep the blocker list as real as possible, so while we would like to see this fixed, we're not going to block on it until we have a firmer understanding of cause and impact.
Flags: blocking1.8.1? → blocking1.8.1-
I have found that this crashes for me every time on first launch of a newly installed branch build if the previous Firefox invocation was a recent trunk nightly and I was sharing profiles between branch and trunk. Extensions I have installed are:

ChatZilla 0.9.75
Console² 0.3.5
DOM Inspector 1.8.1b2
FireBug 0.4
FireFTP 0.94.3
Forecastfox Enhanced 0.8.5.2
IE View 1.3.0
JavaScript Options 1.2.4
Live HTTP Headers 0.12.1 [DISABLED]
MapIt! 0.7.1
Nightly Tester Tools 1.0.4
Sage 1.3.6
Talkback 2.0b2
Tinderstatus 0.2.1
Update Channel Selector 1.0.1
User Agent Switcher 0.6.8
Yahoo! Mail Notifier 0.9.9.2
It looks like we (at least right now) have different XUL fastload file versions on branch and trunk, and we've been bumping them by two to avoid collisions... Is there a JS component fastload version that we need to bump? If not, why not?
This suggests a fastload problem. Bill, if you remove both the XUL and XPC fastload files after quitting trunk and restarting 1.8 branch build, do you still crash? /be
(In reply to comment #6)
> It looks like we (at least right now) have different XUL fastload file versions
> on branch and trunk, and we've been bumping them by two to avoid collisions...
>
> Is there a JS component fastload version that we need to bump? If not, why
> not?

That's the bug. I didn't pay enough attention when component fastloading was added. JS bytecode versioning is not done on each script serialized and later deserialized, to reduce overhead. Instead, it's up to the versioning done by a higher layer, written once per fastload file. The same sort of thing needs to be done in mozJSComponentLoader.cpp, at least.

It would be better to have a single JS bytecode version number and check, but that would require more changes than I think we want at this point.

/be
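The "written once per fastload file" scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `sketch_*` functions, `SKETCH_BYTECODE_VERSION` and the 4-byte little-endian header layout are hypothetical and are not taken from mozJSComponentLoader.cpp or the fastload code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-file bytecode version; not the tree's value. */
#define SKETCH_BYTECODE_VERSION 0xb973c0deu

/* Size in bytes of the version header written once per file (assumed layout). */
#define SKETCH_HEADER_SIZE 4

/*
 * Write the version once at the start of the file image, little-endian.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t sketch_write_header(uint8_t *buf, size_t len, uint32_t version)
{
    assert(buf != NULL);
    if (len < SKETCH_HEADER_SIZE)
        return 0;
    for (size_t i = 0; i < SKETCH_HEADER_SIZE; i++)
        buf[i] = (uint8_t)(version >> (8 * i));
    return SKETCH_HEADER_SIZE;
}

/*
 * Return 1 if the file image starts with the expected version, else 0.
 * On 0 the caller would discard the cache and recompile from source
 * rather than deserialize any script from it.
 */
int sketch_header_matches(const uint8_t *buf, size_t len, uint32_t expected)
{
    uint32_t found = 0;

    assert(buf != NULL);
    if (len < SKETCH_HEADER_SIZE)
        return 0;
    for (size_t i = 0; i < SKETCH_HEADER_SIZE; i++)
        found |= (uint32_t)buf[i] << (8 * i);
    return found == expected;
}
```

The point of checking once per file is that individual serialized scripts carry no version of their own, so a reader that skips this check will happily interpret bytecode written by a different build.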
Assignee: general → dbradley
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
If removing both fastload files makes the crash go away, could you test that removing only one of them (in particular, XPC.mfl) also makes it go away? For what it's worth, I couldn't reproduce the crash jumping between Linux nightlies 2006-09-05-04-mozilla1.8 and 2006-09-05-05-trunk on a clean profile. So this could be related to the particular extensions in some way. However, if removing XPC.mfl makes it not crash, I think we understand it well enough that it's not necessary to figure out which extensions trigger it.
Flags: blocking1.8.1- → blocking1.8.1?
ccing the folks who did JS component fastload to start with
Attached patch fix (obsolete) — Splinter Review
I bit the bullet and separated JS bytecode version, which exposed redundant XUL fastload file version number reading and writing in nsXULPrototypeDocument.cpp. The 1.8 branch version of this patch will need a different JSXDR_BYTECODE_VERSION value, and the usual rule applies: any XDR'ed version number must increment to be greater than all values on live branches and trunk, whenever any branch or trunk changes. /be
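For the "magic minus subtractor" style of version number quoted later in this bug, the rule above amounts to picking a subtractor larger than any in use on a live branch or trunk. A minimal sketch, assuming hypothetical helper names and an illustrative list of in-use subtractors (not an audit of the real branches):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Base magic as quoted in this bug for XUL_FASTLOAD_FILE_VERSION. */
#define SKETCH_XUL_MAGIC 0xfeedbeefu

/*
 * Given the subtractors currently in use on every live branch and trunk,
 * return the smallest subtractor strictly greater than all of them.
 * With an empty list this returns 1.
 */
uint32_t sketch_next_subtractor(const uint32_t *in_use, size_t count)
{
    uint32_t max = 0;

    assert(in_use != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (in_use[i] > max)
            max = in_use[i];
    }
    return max + 1;
}

/* Build the on-disk version word from a subtractor. */
uint32_t sketch_version_from_subtractor(uint32_t n)
{
    return SKETCH_XUL_MAGIC - n;
}
```

For example, if one tree used subtractor 19 and another 20, the next change on either would use 21, so no two incompatible formats ever share a version word.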
Assignee: dbradley → brendan
Status: NEW → ASSIGNED
Attachment #236830 - Flags: superreview?(jst)
Attachment #236830 - Flags: review?(mrbkap)
Priority: -- → P1
Target Milestone: --- → mozilla1.8.1
Attached patch fixed fix — Splinter Review
Interdiff should show just a comment typo fix and missing #include. /be
Attachment #236830 - Attachment is obsolete: true
Attachment #236833 - Flags: superreview?(jst)
Attachment #236833 - Flags: review?(mrbkap)
Attachment #236830 - Flags: superreview?(jst)
Attachment #236830 - Flags: review?(mrbkap)
Attachment #236833 - Flags: review?(mrbkap) → review+
Is it worth trying to ensure that the new files cause the old reader to fail, perhaps even by bumping MFL_FILE_VERSION? (There's also the low probability event that the string at the start of the old file matches the version in the new file; I haven't checked what the current version number looks like as a string for whether that's even remotely possible...)
(In reply to comment #13)
> Is it worth trying to ensure that the new files cause the old reader to fail,
> perhaps even by bumping MFL_FILE_VERSION?

The old reader will probably (on 32-bit systems) fail with OOM for XPC.mfasl (overlarge byte array), or fail for want of a valid XUL fastload file version (the redundant one at the start of the first prototype doc, which redundancy this patch eliminates).

But yeah, safety first. Bumping MFL_FILE_VERSION is the only way. Reviewers: assume I'm changing that with this patch:

Index: xpcom/io/nsFastLoadFile.h
===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsFastLoadFile.h,v
retrieving revision 3.23
diff -p -u -8 -r3.23 nsFastLoadFile.h
--- nsFastLoadFile.h 15 Jun 2006 03:06:30 -0000 3.23
+++ nsFastLoadFile.h 5 Sep 2006 21:58:24 -0000
@@ -136,17 +136,18 @@ typedef PRUint32 NSFastLoadOID;
  * corrupted by FTP-as-ASCII and other likely errors, meaningful to clued-in
  * humans, and ending in ^Z to terminate erroneous text input on Windows.
  */
 #define MFL_FILE_MAGIC "XPCOM\nMozFASL\r\n\032"
 #define MFL_FILE_MAGIC_SIZE 16
 
 #define MFL_FILE_VERSION_0 0
 #define MFL_FILE_VERSION_1 1000
-#define MFL_FILE_VERSION 4 // fix to note singletons in object map
+#define MFL_FILE_VERSION 5 // rev'ed to defend against unversioned
+ // XPCOM JS component fastload files
 
 /**
  * Compute Fletcher's 16-bit checksum over aLength bytes starting at aBuffer,
  * with the initial accumulators seeded from *aChecksum, and final checksum
  * returned in *aChecksum. The return value is the number of unchecked bytes,
  * which may be non-zero if aBuffer is misaligned or aLength is odd. Callers
  * should copy any remaining bytes to the front of the next buffer.
  *

> (There's also the low probability
> event that the string at the start of the old file matches the version in the
> new file; I haven't checked what the current version number looks like as a
> string for whether that's even remotely possible...)
Not possible for ASCII. The XUL and JS versions are magic (0xdeadbeef - n) and (0xb973c0de + m) numbers. But the old files did not start their payloads with strings, and XDR'd strings are counted by a leading uint32 length in any event. /be
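The "not possible for ASCII" claim can be checked mechanically: a 32-bit word can only be mistaken for four characters of text if every one of its bytes is printable ASCII. The sketch below is illustrative only; the function names are hypothetical, and the magic bases are simply the ones quoted in this bug.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if every byte of v is printable ASCII (0x20..0x7e), else 0.
 * All four bytes are examined, so the result does not depend on byte order.
 */
int sketch_is_printable_ascii_word(uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(v >> (8 * i));
        if (b < 0x20 || b > 0x7e)
            return 0;
    }
    return 1;
}

/* Sanity check a build could run on a candidate version magic. */
void sketch_check_magic(uint32_t v)
{
    assert(!sketch_is_printable_ascii_word(v));
}
```

Both `0xfeedbeef - n` and `0xb973c0de + m` keep at least one byte above 0x7e for small n and m, so neither can collide with the start of a text string.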
Attachment #236833 - Flags: superreview?(jst) → superreview+
Comment on attachment 236833
fixed fix

sr=jst too, fwiw.
Fixed on trunk:

Checking in js/src/jsxdrapi.h;
/cvsroot/mozilla/js/src/jsxdrapi.h,v <-- jsxdrapi.h
new revision: 1.20; previous revision: 1.19
done
Checking in js/src/xpconnect/loader/mozJSComponentLoader.cpp;
/cvsroot/mozilla/js/src/xpconnect/loader/mozJSComponentLoader.cpp,v <-- mozJSComponentLoader.cpp
new revision: 1.126; previous revision: 1.125
done
Checking in content/xul/document/public/nsIXULPrototypeCache.h;
/cvsroot/mozilla/content/xul/document/public/nsIXULPrototypeCache.h,v <-- nsIXULPrototypeCache.h
new revision: 1.40; previous revision: 1.39
done
Checking in content/xul/document/src/nsXULPrototypeCache.cpp;
/cvsroot/mozilla/content/xul/document/src/nsXULPrototypeCache.cpp,v <-- nsXULPrototypeCache.cpp
new revision: 1.60; previous revision: 1.59
done
Checking in content/xul/document/src/nsXULPrototypeDocument.cpp;
/cvsroot/mozilla/content/xul/document/src/nsXULPrototypeDocument.cpp,v <-- nsXULPrototypeDocument.cpp
new revision: 1.80; previous revision: 1.79
done
Checking in xpcom/io/nsFastLoadFile.h;
/cvsroot/mozilla/xpcom/io/nsFastLoadFile.h,v <-- nsFastLoadFile.h
new revision: 3.24; previous revision: 3.23
done

/be
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Attached patch 1.8 branch version of fix (obsolete) — Splinter Review
This is a hazard for anyone upgrading from betas or nightlies, never mind switching between Minefield and Bone-cho. /be
Attachment #236879 - Flags: superreview+
Attachment #236879 - Flags: review+
Attachment #236879 - Flags: approval1.8.1?
Is the JS XDR fully compatible between branch and trunk right now?
Comment on attachment 236879
1.8 branch version of fix

Oops, old habits die hard.

/be
Attachment #236879 - Attachment is obsolete: true
Attachment #236879 - Flags: superreview+
Attachment #236879 - Flags: review+
Attachment #236879 - Flags: approval1.8.1?
Going to ask for one dbaron re-review just to make sure. /be
Attachment #236881 - Flags: superreview+
Attachment #236881 - Flags: review?(dbaron)
Attachment #236881 - Flags: approval1.8.1?
Comment on attachment 236881
1.8 version of patch with the right version number tweaked

> // Increase the subtractor when changing version, say when changing the
>-// (opaque to FastLoad code) format of JS script, function, regexp, etc.
>-// XDR serializations.
>-#define XUL_FASTLOAD_FILE_VERSION (0xfeedbeef - 19)
>+// (opaque to XPCOM FastLoad code) format of XUL-specific XDR serializations.
>+// See also JSXDR_BYTECODE_VERSION in jsxdrapi.h, which tracks incompatible JS
>+// bytecode version changes.
>+#define XUL_FASTLOAD_FILE_VERSION (0xfeedbeef - 20)

Stick with 21 here, as we discussed, and r=dbaron. Trunk differs from branch at least due to DOM_AGNOSTIC_BRANCH landing, and probably other things.
Attachment #236881 - Flags: review?(dbaron) → review+
Attachment #236881 - Attachment is obsolete: true
Attachment #236883 - Flags: superreview+
Attachment #236883 - Flags: review+
Attachment #236883 - Flags: approval1.8.1?
Attachment #236881 - Flags: approval1.8.1?
Attachment #236881 - Flags: superreview+
Attachment #236881 - Flags: review+
(In reply to comment #7)
> This suggests a fastload problem. Bill, if you remove both the XUL and XPC
> fastload files after quitting trunk and restarting 1.8 branch build, do you
> still crash?
>
> /be

Well, I would like to be able to do that, but for some reason I can no longer reproduce the failure. I think this may be because, as I just realized, besides running trunk and branch builds with that profile, I had also been testing the reflow branch with that profile. Until a few days ago the reflow branch had the 20060603 fastload code, which was missing the previous fix to fastload. I suspect that might have been contributing to what I was seeing.

So, we may have to just watch talkback stats. If I can get this to reliably fail again I will post a comment indicating whether removing the fastload files helps.
(In reply to comment #23)
> So, may have to just watch talkback stats. If I can get this to reliably fail
> again I will post a comment indicating if removing fastload files helps.

This bug could explain more than one topcrash. I'm thinking of the block_getProperty bug, or bugs.

/be
Comment on attachment 236883
1.8 branch patch to commit

a=schrep - approving topcrash fix so we can get better TB data asap for 18 branch to see if this fixes all the js_Interpret issues.
Attachment #236883 - Flags: approval1.8.1? → approval1.8.1+
Fixed on 1.8 branch. /be
Keywords: fixed1.8
verified1.8.1 based on branch nightly talkback data; the one js_Interpret crash since the checkin was much earlier in the function.
Status: RESOLVED → VERIFIED
Flags: blocking1.8.1?
Crash Signature: [@ js_Interpret]