Closed Bug 406800 Opened 17 years ago Closed 17 years ago

HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info

Categories

(Core :: XPConnect, defect, P3)

x86
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: mrz, Assigned: jag+mozilla)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(5 files, 1 obsolete file)

Running 3.0b2 nightly on OSX 10.5.1 and trying to access HP's Onboard Administrator (part of HP's BladeSystem) to manage the system. After authentication, Minefield hangs at "Loading enclosure..." and sometimes crashes. Worked in Fx2. My OA isn't publicly accessible and other than the two crash reports I've submitted I'm not sure what other information would be useful to debug this (error console shows errors in the CSS but crashes before I can grab anything out of it).
Note that without a testcase, we can't do anything about it. Do you have breakpad IDs from the crashes?
Don't have ids. Not at all possible for me to put the OA on an outside accessible network of course. It doesn't consistently crash so if there's any useful information I can get let me know. (I can give access to any mock folks of course)
In the error console:

Error: missing ; before statement
Source File: javascript:%2010.2.10.27
Line: 1, Column: 5
Source Code: 10.2.10.27

Error: uncaught exception: [Exception... "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIWebNavigation.loadURI]" nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)" location: "JS frame :: chrome://global/content/viewSource.js :: viewSource :: line 152" data: no]

Page eventually loads but is missing content.
Keywords: qawanted
3.0b2 doesn't crash but this still fails for me on OSX 10.5. This prohibits me from using 3.0 full time :(
I can take a look if you give me access to the page. I'm assuming you can't wget the page or something, and attach a testcase?
I can't get you access unless you're physically next to me. This gear sits on an internal network. What sort of info do you need from me?
(In reply to comment #6)
> What sort of info do you need from me?

Not sure what you mean - what we need to move forward is a stack trace or a testcase, ideally. If you can manage to save the page somehow and then remove any sensitive information while making sure it still crashes, you could attach that testcase here. Alternatively, when it crashes you should be able to click to see more details of the crash and get a stack trace from the Mac crash reporter app.
Works on this build:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-22-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061122 Minefield/3.0a1

Fails the next day:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-23-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061123 Minefield/3.0a1

Does that help narrow it down enough?
(In reply to comment #8)
> Does that help narrow it down enough?

In that range, the only things that pop out are bug 47903 and bug 354693. Bug 47903 is probably the most likely - I didn't see any exceptions in the console when you showed me the failure, but it could be that it was handled by the app's JS code. It's going to be a bit tricky trying to progress further without a testcase.

Can you try logging in again using Fx2 and, once you're at the main screen where everything is functional, use "Save As...", making sure you have "Complete" selected? Then see if opening that local file in Firefox 3 shows the bustage. I think when I tried it, it was earlier in the loading process, so I might not have saved the entire file correctly.
That page loads in either version of Minefield but that's also past the page where it fails.
While VPN'd to MTP I tried connecting to the site in question with the latest trunk and after entering the credentials (ask mrz), I get to a screen with a progress bar that stays at 0%. After a couple of minutes or so, Minefield crashes. http://crash-stats.mozilla.com/report/index/8c081938-ba4f-11dc-95d9-001a4bd43ef6?date=2008-01-03-22 I was able to reproduce the problem in XP and Mac OS X. I also tried confirming the regression range while using the JavaScript Debugger. On the latter build from the regression range, I get to a point where it can't step through an exception. I saved the state of my VM if you want to take a look.
I ran through the testcase with a Mac debug build on Leopard, and noticed a bunch of these:

firefox-bin(950,0xa050bf60) malloc: *** error for object 0x1f5f3320: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug

Attached are the stacks I get when I set a breakpoint in malloc_error_break. This looks like something nasty is happening when JS GC is called from cycle collection. This probably explains the seemingly random crashes/hangs I get when trying to reproduce. It might also explain why you got that regression range - the cycle collector was backed out in that range because of a perf regression.
This looks like a topcrash given the stacks. It's gotten worse over the last few days. http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0b3pre&range_value=2&signature=RtlpDeCommitFreeBlock (Also adding RtlpDeCommitFreeBlock since that seems to be pretty popular, but see below that.) See also bp-ecf96154-c01a-11dc-ad73-001a4bd43ef6.
Severity: normal → critical
Keywords: crash, topcrash
Summary: HP's OA crashes Minefield when loading blade enclosure info → HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
I ran this testcase through Purify, and this was one of the errors reported. Someone more familiar with the cycle collector or JS engine might be able to provide further insight based on these stacks.
I can try to get more complete stacks if that would be useful.
Clearly JS strings are somehow getting bogus u.chars.
Assignee: nobody → general
Component: General → JavaScript Engine
QA Contact: general → general
Not JS engine. Someone is giving ownership of a string to the JS engine, then welching on the deal. /be
Assignee: general → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
Can you get purify to tell you at what stacks that location in memory was previously allocated and freed?
I managed to figure out that the JSString being double-freed was owned by an XSLT node (via the XPCVariant seen in attachment 296769 [details]), and wasted a lot of time figuring out which JS was running and trying to figure out why the JS engine was trying to free the string as well, until shaver helpfully pointed out that the XPCVariant was the one that shouldn't be freeing its JSString (I should have known, based on comment 18).

jag then noticed that the destructor already checked JSVAL_IS_STRING(mJSVal) and only called Cleanup() if it was false. I breakpointed in the destructor to try and figure out how that could be, since the variant in this case most definitely was initialized with a string JSVal. Turns out the cycle collector unlink macro sets mJSVal to null (http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/js/src/xpconnect/src/xpcvariant.cpp&rev=1.29#101), which in turn causes JSVAL_IS_STRING(mJSVal) to be false in the destructor, which results in a call to Cleanup() that erroneously frees the variant's data.

jag had some ideas for a patch. We essentially need to ensure that the variant doesn't free its data from the destructor if its data is shared, without relying on mJSVal (because it may have been nulled out by that point).
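To make the sequence described above easier to follow, here is a minimal, self-contained model of it. This is only a sketch: ValTag, Buffer, VariantData, SketchVariant and the two helper functions are hypothetical stand-ins invented for illustration, not the real XPConnect or nsVariant types, and the "free" is just a counter so the example stays safe to run.

```cpp
#include <cassert>

// Hypothetical stand-in for the tag of a jsval (not the real JSAPI).
enum class ValTag { Null, String };

// Hypothetical buffer owned by the "JS engine"; we only count free attempts.
struct Buffer {
    int freeCount = 0;
};

// Hypothetical stand-in for the variant's data union.
struct VariantData {
    enum Type { EMPTY, STRING } type = EMPTY;
    Buffer* chars = nullptr;
};

// Plays the role of Cleanup(): releases string data and resets to EMPTY.
void Cleanup(VariantData& d) {
    if (d.type == VariantData::STRING && d.chars)
        d.chars->freeCount++;  // models freeing memory the variant does not own
    d.type = VariantData::EMPTY;
    d.chars = nullptr;
}

struct SketchVariant {
    ValTag jsval = ValTag::String;
    VariantData data;

    explicit SketchVariant(Buffer* shared) {
        data.type = VariantData::STRING;
        data.chars = shared;  // borrowed from the engine, not owned
    }

    // Models the unlink step from the thread: the value is nulled out.
    void Unlink() { jsval = ValTag::Null; }

    // Models the destructor check: skip Cleanup() only for string values.
    ~SketchVariant() {
        if (jsval != ValTag::String)
            Cleanup(data);
    }
};

// Number of times the variant tried to free the engine's buffer
// when it is destroyed without being unlinked first.
int FreesWithoutUnlink() {
    Buffer engineOwned;
    { SketchVariant v(&engineOwned); }
    return engineOwned.freeCount;
}

// Same, but with Unlink() called before destruction.
int FreesAfterUnlink() {
    Buffer engineOwned;
    {
        SketchVariant v(&engineOwned);
        v.Unlink();
    }
    return engineOwned.freeCount;
}
```

In this model FreesWithoutUnlink() reports no free attempts, while FreesAfterUnlink() reports one. In the real code the JS engine would later free the same string itself, which is where the double free would come from.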
We could just call nsVariant::SetToEmpty if JSVAL_IS_STRING(tmp->mJSVal) is true?
Ah, no, that calls Cleanup too. Maybe do both then, set type to EMPTY and clear the pointer?
Up to you. I prefer clearing the pointer as a sort of "and here's the second half" to go with manually pointing mData at the buffer we make the nsVariant depend on. On the other hand there's something to be said for making it clear that we're by-passing Cleanup ('coz none's needed) and we're just forcing the nsVariant into an EMPTY state. But if you're gonna do that there's no point really in clearing the pointer too. The state of the rest of the nsVariant fields is irrelevant once the type is set to EMPTY.
Comment on attachment 296812 [details] [diff] [review]
Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2

>Index: js/src/xpconnect/src/xpcvariant.cpp
>===================================================================
> if(!JSVAL_IS_STRING(tmp->mJSVal))
> nsVariant::Cleanup(&tmp->mData);
>+ else

Wrong indentation.

>+ tmp->mData.mType = nsIDataType::VTYPE_EMPTY;

Let's do this. I'm worried that just clearing the pointer will cause us to try to pass a null pointer in where we shouldn't do that (like into an nsDependentString).
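The approach in the diff quoted in this review (mark the data EMPTY during unlink instead of calling Cleanup() when the value is a string) can be modeled roughly as below. This is a sketch under assumptions: ValTag, Buffer, VariantData, PatchedVariant and FreesAfterPatchedUnlink are hypothetical names made up for illustration, the free is simulated with a counter, and it is not the patch that was actually checked in.

```cpp
#include <cassert>

// Hypothetical stand-in for the tag of a jsval (not the real JSAPI).
enum class ValTag { Null, String };

// Hypothetical buffer owned by the "JS engine"; we only count free attempts.
struct Buffer {
    int freeCount = 0;
};

// Hypothetical stand-in for the variant's data union.
struct VariantData {
    enum Type { EMPTY, STRING } type = EMPTY;
    Buffer* chars = nullptr;
};

// Plays the role of Cleanup(): releases string data and resets to EMPTY.
void Cleanup(VariantData& d) {
    if (d.type == VariantData::STRING && d.chars)
        d.chars->freeCount++;
    d.type = VariantData::EMPTY;
    d.chars = nullptr;
}

struct PatchedVariant {
    ValTag jsval = ValTag::String;
    VariantData data;

    explicit PatchedVariant(Buffer* shared) {
        data.type = VariantData::STRING;
        data.chars = shared;  // borrowed from the engine, not owned
    }

    // Models the unlink step with the quoted change applied.
    void Unlink() {
        if (jsval != ValTag::String)
            Cleanup(data);
        else
            data.type = VariantData::EMPTY;  // bypass Cleanup(); nothing to free
        jsval = ValTag::Null;
    }

    // Destructor check unchanged: after Unlink() it calls Cleanup(),
    // but the data is already EMPTY, so nothing is released.
    ~PatchedVariant() {
        if (jsval != ValTag::String)
            Cleanup(data);
    }
};

// Number of times the variant tried to free the engine's buffer
// when Unlink() runs before destruction.
int FreesAfterPatchedUnlink() {
    Buffer engineOwned;
    {
        PatchedVariant v(&engineOwned);
        v.Unlink();
    }
    return engineOwned.freeCount;
}
```

In this model the stale pointer is left in place after Unlink(), which is harmless here because Cleanup() only looks at the pointer when the type is still STRING.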
Attachment #296812 - Flags: superreview+
Attachment #296812 - Flags: review+
Attachment #296810 - Flags: superreview+
Attachment #296810 - Flags: review+
Blocks: 409208
On irc, after I explained that Cleanup() will set mType to VTYPE_EMPTY, we decided to go with the first patch instead.
Status: NEW → ASSIGNED
Assignee: nobody → jag
Status: ASSIGNED → NEW
Checking in xpcvariant.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/xpcvariant.cpp,v <-- xpcvariant.cpp
new revision: 1.30; previous revision: 1.29
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
I'm running Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3pre) Gecko/2008011504 Minefield/3.0b3pre and it still fails to load HP's Onboard Administrator page. It fails with similar symptoms (though it hasn't actually crashed on me). Should I re-open this?
I'd say file a new bug. This bug's summary says it's about a crash, and the crash is fixed.
I've noticed that if you wait long enough the page eventually does load. It seems to hang at the "Loading Enclosure Information" stage, though. Definitely a different bug, please do feel free to file it and CC me. Might also be worth contacting HP and letting them know of the problem? Their web app developers might be in the best position to figure out what's wrong...
I have bug #412550 open.
Crash Signature: [@js_FinalizeObject] [@ RtlpDeCommitFreeBlock]