Closed
Bug 406800
Opened 17 years ago
Closed 17 years ago
HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info
Categories
(Core :: XPConnect, defect, P3)
Tracking
()
RESOLVED
FIXED
People
(Reporter: mrz, Assigned: jag+mozilla)
References
Details
(Keywords: crash, topcrash)
Crash Data
Attachments
(5 files, 1 obsolete file)
3.93 KB, text/plain | Details
3.05 KB, text/plain | Details
4.28 KB, text/plain | Details
1.43 KB, patch | peterv: review+, peterv: superreview+ | Details | Diff | Splinter Review
1.21 KB, patch | peterv: review+, peterv: superreview+ | Details | Diff | Splinter Review
Running 3.0b2 nightly on OSX 10.5.1 and trying to access HP's Onboard Administrator (part of HP's BladeSystem) to manage the system. After authentication, Minefield hangs at "Loading enclosure..." and sometimes crashes.
Worked in Fx2.
My OA isn't publicly accessible and other than the two crash reports I've submitted I'm not sure what other information would be useful to debug this (error console shows errors in the CSS but crashes before I can grab anything out of it).
Comment 1•17 years ago
|
||
Note that without a testcase, we can't do anything about it.
Do you have breakpad IDs from the crashes?
Reporter | ||
Comment 2•17 years ago
|
||
Don't have ids. Not at all possible for me to put the OA on an outside accessible network of course. It doesn't consistently crash so if there's any useful information I can get let me know.
(I can give access to any mock folks of course)
Reporter | ||
Comment 3•17 years ago
|
||
In the error console:
Error: missing ; before statement
Source File: javascript:%2010.2.10.27
Line: 1, Column: 5
Source Code:
10.2.10.27
Error: uncaught exception: [Exception... "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIWebNavigation.loadURI]" nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)" location: "JS frame :: chrome://global/content/viewSource.js :: viewSource :: line 152" data: no]
Page eventually loads but is missing content.
Reporter | ||
Comment 4•17 years ago
|
||
3.0b2 doesn't crash but this still fails for me on OSX 10.5. This prohibits me from using 3.0 full time :(
Comment 5•17 years ago
|
||
I can take a look if you give me access to the page. I'm assuming you can't wget the page or something, and attach a testcase?
Reporter | ||
Comment 6•17 years ago
|
||
I can't get you access unless you're physically next to me. This gear sits on an internal network.
What sort of info do you need from me?
Comment 7•17 years ago
|
||
(In reply to comment #6)
> What sort of info do you need from me?
Not sure what you mean - what we need to move forward is a stack trace or a testcase, ideally. If you can manage to save the page somehow and then remove any sensitive information while making sure it still crashes, you could attach that testcase here. Alternatively, when it crashes you should be able to click to see more details of the crash and get a stack trace from the Mac crash reporter app.
Reporter | ||
Comment 8•17 years ago
|
||
Works on this build:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-22-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061122 Minefield/3.0a1
Fails the next day:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-23-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061123 Minefield/3.0a1
Does that help narrow it down enough?
Comment 9•17 years ago
|
||
(In reply to comment #8)
> Does that help narrow it down enough?
In that range, the only things that pop out are bug 47903 and bug 354693. Bug 47903 is the more likely of the two - I didn't see any exceptions in the console when you showed me the failure, but it could be that it was handled by the app's JS code. It's going to be a bit tricky trying to progress further without a testcase. Can you try logging in again using Fx2 and, once you're at the main screen where everything is functional, use "Save As..." with "Complete" selected? Then see if opening that local file in Firefox 3 shows the bustage. I think when I tried it, it was earlier in the loading process, so I might not have saved the entire file correctly.
Reporter | ||
Comment 10•17 years ago
|
||
That page loads in either version of Minefield but that's also past the page where it fails.
Comment 11•17 years ago
|
||
While VPN'd to MTP I tried connecting to the site in question with the latest trunk and after entering the credentials (ask mrz), I get to a screen with a progress bar that stays at 0%. After a couple of minutes or so, Minefield crashes. http://crash-stats.mozilla.com/report/index/8c081938-ba4f-11dc-95d9-001a4bd43ef6?date=2008-01-03-22
I was able to reproduce the problem in XP and Mac OS X.
I also tried confirming the regression range while using the JavaScript Debugger. On the later build from the regression range, I get to a point where it can't step through an exception. I saved the state of my VM if you want to take a look.
Comment 12•17 years ago
|
||
I ran through the testcase with a Mac debug build on Leopard, and noticed a bunch of these:
firefox-bin(950,0xa050bf60) malloc: *** error for object 0x1f5f3320: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
Attached are the stacks I get when I set a breakpoint in malloc_error_break. This looks like something nasty is happening when JS GC is called from cycle collection. This probably explains the seemingly random crashes/hangs I get when trying to reproduce. It might also explain why you got that regression range - the cycle collector was backed out in that range because of a perf regression.
Comment 13•17 years ago
|
||
And the bonsai URL for mrz's regression range, because I keep having to look it up again:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&branchtype=match&date=explicit&mindate=2006-11-22+02%3A00&maxdate=2006-11-23+05%3A00
Flags: blocking1.9?
Keywords: qawanted
Comment 14•17 years ago
|
||
This looks like a topcrash given the stacks. It's gotten worse over the last few days.
http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0b3pre&range_value=2&signature=RtlpDeCommitFreeBlock
(Also adding RtlpDeCommitFreeBlock since that seems to be pretty popular, but see below that.)
See also bp-ecf96154-c01a-11dc-ad73-001a4bd43ef6.
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
Comment 15•17 years ago
|
||
I ran this testcase through Purify, and this was one of the errors reported. Someone more familiar with the cycle collector or JS engine might be able to provide further insight based on these stacks.
Comment 16•17 years ago
|
||
I can try to get more complete stacks if that would be useful.
Comment 17•17 years ago
|
||
Clearly JS strings are somehow getting bogus u.chars.
Updated•17 years ago
|
Assignee: nobody → general
Component: General → JavaScript Engine
QA Contact: general → general
Comment 18•17 years ago
|
||
Not JS engine. Someone is giving ownership of a string to the JS engine, then welching on the deal.
/be
Assignee: general → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
Can you get purify to tell you at what stacks that location in memory was previously allocated and freed?
Comment 20•17 years ago
|
||
I managed to figure out that the JSString being double-freed was owned by an XSLT node (via the XPCVariant seen in attachment 296769 [details]), and wasted a lot of time figuring out which JS was running and trying to figure out why the JS engine was trying to free the string as well, until shaver helpfully pointed out that the XPCVariant was the one that shouldn't be freeing its JSString (I should have known, based on comment 18).
jag then noticed that the destructor already checked JSVAL_IS_STRING(mJSVal) and only called Cleanup() if it was false. I breakpointed in the destructor to try and figure out how that could be, since the variant in this case most definitely was initialized with a string JSVal. Turns out the cycle collector unlink macro sets mJSVal to null (http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/js/src/xpconnect/src/xpcvariant.cpp&rev=1.29#101), which in turn causes JSVAL_IS_STRING(mJSVal) to be false in the destructor, which results in a call to Cleanup() that erroneously frees the variant's data.
jag had some ideas for a patch. We essentially need to ensure that the variant doesn't free its data from the destructor if its data is shared, without relying on mJSVal (because it may have been nulled out by that point).
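To make the sequence in this comment concrete, here is a minimal sketch of it. Everything in it (ValKind, Heap, ModelVariant, FreesForStringVariant) is a hypothetical stand-in invented for illustration, not the real XPConnect or JS engine code. It only models the ordering problem: the unlink step nulls the jsval, so the destructor's "is it a string?" check no longer protects the shared buffer.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the real XPConnect types.
enum class ValKind { Null, String, Other };

// Counts frees of the shared string buffer so a double free is observable.
struct Heap {
    int frees = 0;
};

struct ModelVariant {
    ValKind jsval;        // stands in for mJSVal
    bool hasData = true;  // stands in for mData still pointing at the string chars
    Heap* heap;

    ModelVariant(ValKind kind, Heap* h) : jsval(kind), heap(h) {}

    // Stands in for the cycle collector unlink step, which nulls the jsval.
    void Unlink() { jsval = ValKind::Null; }

    // Stands in for Cleanup(): frees the buffer if one is still held.
    void Cleanup() {
        if (hasData) {
            ++heap->frees;
            hasData = false;
        }
    }

    // Teardown logic as described above: skip Cleanup() only for string jsvals.
    void Destroy() {
        if (jsval != ValKind::String) Cleanup();
    }
};

// Returns how many times the shared string buffer ends up being freed.
int FreesForStringVariant(bool unlinkedFirst) {
    Heap heap;
    ModelVariant v(ValKind::String, &heap);
    if (unlinkedFirst) v.Unlink();
    v.Destroy();   // variant teardown
    ++heap.frees;  // the JS GC later finalizes the string it owns
    return heap.frees;
}

// Usage sketch: without unlink the buffer is freed once, with unlink twice.
void CheckDoubleFreeModel() {
    assert(FreesForStringVariant(false) == 1);
    assert(FreesForStringVariant(true) == 2);
}
```

In this model the fix has to live in the unlink step or in state that survives it, because by the time Destroy() runs the original kind of the jsval is gone.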
Assignee | ||
Comment 21•17 years ago
|
||
Assignee | ||
Comment 22•17 years ago
|
||
Assignee | ||
Comment 23•17 years ago
|
||
Attachment #296811 -
Attachment is obsolete: true
Comment 24•17 years ago
|
||
We could just call nsVariant::SetToEmpty if JSVAL_IS_STRING(tmp->mJSVal) is true?
Comment 25•17 years ago
|
||
Ah, no, that calls Cleanup too. Maybe do both then, set type to EMPTY and clear the pointer?
Assignee | ||
Comment 26•17 years ago
|
||
Up to you. I prefer clearing the pointer as a sort of "and here's the second half" to go with manually pointing mData at the buffer we make the nsVariant depend on.
On the other hand there's something to be said for making it clear that we're by-passing Cleanup ('coz none's needed) and we're just forcing the nsVariant into an EMPTY state.
But if you're gonna do that there's no point really in clearing the pointer too. The state of the rest of the nsVariant fields is irrelevant once the type is set to EMPTY.
Comment 27•17 years ago
|
||
Comment on attachment 296812 [details] [diff] [review]
Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2
>Index: js/src/xpconnect/src/xpcvariant.cpp
>===================================================================
> if(!JSVAL_IS_STRING(tmp->mJSVal))
> nsVariant::Cleanup(&tmp->mData);
>+ else
Wrong indentation.
>+ tmp->mData.mType = nsIDataType::VTYPE_EMPTY;
Let's do this. I'm worried that just clearing the pointer will cause us to try to pass a null pointer in where we shouldn't do that (like into an nsDependentString).
Attachment #296812 -
Flags: superreview+
Attachment #296812 -
Flags: review+
Updated•17 years ago
|
Attachment #296810 -
Flags: superreview+
Attachment #296810 -
Flags: review+
Assignee | ||
Comment 28•17 years ago
|
||
On irc, after I explained that Cleanup() will set mType to VTYPE_EMPTY, we decided to go with the first patch instead.
Status: NEW → ASSIGNED
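The two approaches weighed in comments 24 to 28 can be compared in a small model. This is a sketch under assumptions: ModelData, VType and the helper functions are hypothetical names, the stand-in Cleanup() only mimics the behaviour stated in this thread (it releases what is held and leaves the type EMPTY), and reading the first patch as "clear the pointer, then let Cleanup() run" is an inference from the discussion, not something the thread shows.

```cpp
#include <cassert>

// Hypothetical model; names echo the discussion but this is not the real nsVariant code.
enum class VType { Empty, DependentChars };

struct ModelData {
    VType type = VType::DependentChars;
    const char* chars = nullptr;  // borrowed from the JS string, never owned here
    int freesAttempted = 0;       // how often the stand-in Cleanup tried to free chars
};

// Stand-in for Cleanup(): releases whatever is held, then marks the data EMPTY.
void Cleanup(ModelData& d) {
    if (d.type != VType::Empty && d.chars != nullptr) ++d.freesAttempted;
    d.chars = nullptr;
    d.type = VType::Empty;
}

// Option A (assumed shape of the first patch): drop the borrowed pointer, then Cleanup().
ModelData UnlinkByClearingPointer(ModelData d) {
    d.chars = nullptr;
    Cleanup(d);
    return d;
}

// Option B (the alternative patch): force the type to EMPTY and skip Cleanup().
ModelData UnlinkBySettingEmpty(ModelData d) {
    d.type = VType::Empty;
    return d;
}

// Usage sketch: both options end EMPTY without attempting to free the borrowed chars.
void CheckUnlinkOptions() {
    ModelData d;
    d.chars = "borrowed";
    ModelData a = UnlinkByClearingPointer(d);
    ModelData b = UnlinkBySettingEmpty(d);
    assert(a.type == VType::Empty && a.freesAttempted == 0);
    assert(b.type == VType::Empty && b.freesAttempted == 0);
}
```

In the model both options reach the same observable state, which matches the point made on irc: once Cleanup() is known to leave the type EMPTY, clearing the pointer first does not leave a non-empty variant holding a null pointer.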
Assignee | ||
Updated•17 years ago
|
Assignee: nobody → jag
Status: ASSIGNED → NEW
Assignee | ||
Comment 29•17 years ago
|
||
Checking in xpcvariant.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/xpcvariant.cpp,v <-- xpcvariant.cpp
new revision: 1.30; previous revision: 1.29
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Flags: in-testsuite?
Reporter | ||
Comment 35•17 years ago
|
||
I'm running :
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3pre) Gecko/2008011504 Minefield/3.0b3pre
and it still fails to load HP's Onboard Administrator page - fails with similar symptoms (though hasn't actually crashed on me).
Should I re-open this?
I'd say file a new bug. This bug's summary says it's about a crash, and the crash is fixed.
Comment 37•17 years ago
|
||
I've noticed that if you wait long enough the page eventually does load. It seems to hang at the "Loading Enclosure Information" stage, though. Definitely a different bug, please do feel free to file it and CC me. Might also be worth contacting HP and letting them know of the problem? Their web app developers might be in the best position to figure out what's wrong...
Reporter | ||
Comment 38•17 years ago
|
||
I have bug #412550 open.
Updated•14 years ago
|
Crash Signature: [@js_FinalizeObject]
[@ RtlpDeCommitFreeBlock]