HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info

RESOLVED FIXED

Status

()

Core
XPConnect
P3
critical
RESOLVED FIXED
11 years ago
7 years ago

People

(Reporter: mrz, Assigned: jag (Peter Annema))

Tracking

({crash, topcrash})

Trunk
x86
Mac OS X
crash, topcrash
Points:
---
Bug Flags:
blocking1.9 +
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

11 years ago
Running 3.0b2 nightly on OSX 10.5.1 and trying to access HP's Onboard Administrator (part of HP's BladeSystem) to manage the system.  After authentication, Minefield hangs at "Loading enclosure..." and sometimes crashes.

Worked in Fx2.  

My OA isn't publicly accessible and other than the two crash reports I've submitted I'm not sure what other information would be useful to debug this (error console shows errors in the CSS but crashes before I can grab anything out of it).
Note that without a testcase, we can't do anything about it.

Do you have breakpad IDs from the crashes?
(Reporter)

Comment 2

11 years ago
Don't have ids. Not at all possible for me to put the OA on an outside accessible network of course. It doesn't consistently crash so if there's any useful information I can get let me know. 

(I can give access to any mock folks of course)
(Reporter)

Comment 3

11 years ago
In the error console:

Error: missing ; before statement
Source File: javascript:%2010.2.10.27
Line: 1, Column: 5
Source Code:
 10.2.10.27

Error: uncaught exception: [Exception... "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIWebNavigation.loadURI]"  nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)"  location: "JS frame :: chrome://global/content/viewSource.js :: viewSource :: line 152"  data: no]


Page eventually loads but is missing content.  
(Reporter)

Updated

11 years ago
Keywords: qawanted
(Reporter)

Comment 4

11 years ago
3.0b2 doesn't crash but this still fails for me on OSX 10.5.  This prohibits me from using 3.0 full time :(
I can take a look if you give me access to the page. I'm assuming you can't wget the page or something, and attach a testcase?
(Reporter)

Comment 6

11 years ago
I can't get you access unless you're physically next to me. This gear sits on an internal network. 

What sort of info do you need from me?
(In reply to comment #6)
> What sort of info do you need from me?

Not sure what you mean - what we need to move forward is a stack trace or a testcase, ideally. If you can manage to save the page somehow and then remove any sensitive information while making sure it still crashes, you could attach that testcase here. Alternatively, when it crashes you should be able to click to see more details of the crash and get a stack trace from the Mac crash reporter app.
(Reporter)

Comment 8

11 years ago
Works on this build:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-22-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061122 Minefield/3.0a1

Fails the next day:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-23-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061123 Minefield/3.0a1


Does that help narrow it down enough?
(In reply to comment #8)
> Does that help narrow it down enough?

In that range, the only things that pop out are bug 47903 and bug 354693. Bug 47903 is probably most likely - I didn't see any exceptions in the console when you showed me the failure, but it could be that it was handled by the app's JS code. It's going to be a bit tricky trying to progress further without a testcase. Can you try logging in again using Fx2 and, once you're at the main screen where everything is functional, use "Save As...", making sure you have "Complete" selected? Then see if opening that local file in Firefox 3 shows the bustage. I think when I tried it, it was earlier in the loading process, so I might not have saved the entire file correctly.
(Reporter)

Comment 10

11 years ago
That page loads in either version of Minefield but that's also past the page where it fails.
While VPN'd to MTP I tried connecting to the site in question with the latest trunk and after entering the credentials (ask mrz), I get to a screen with a progress bar that stays at 0%. After a couple of minutes or so, Minefield crashes. http://crash-stats.mozilla.com/report/index/8c081938-ba4f-11dc-95d9-001a4bd43ef6?date=2008-01-03-22

I was able to reproduce the problem in XP and Mac OS X.

I also tried confirming the regression range while using the Javascript Debugger. On the latter build from the regression range, I get to a point where it can't step through an exception. I saved the state of my VM if you want to take a look.


Created attachment 296035 [details]
stack trace to malloc error

I ran through the testcase with a Mac debug build on Leopard, and noticed a bunch of these:
firefox-bin(950,0xa050bf60) malloc: *** error for object 0x1f5f3320: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug

Attached are the stacks I get when I set a breakpoint in malloc_error_break. This looks like something nasty is happening when JS GC is called from cycle collection. This probably explains the seemingly random crashes/hangs I get when trying to reproduce. It might also explain why you got that regression range - the cycle collector was backed out in that range because of a perf regression.
And the bonsai URL for mrz's regression range, because I keep having to look it up again:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&branchtype=match&date=explicit&mindate=2006-11-22+02%3A00&maxdate=2006-11-23+05%3A00
Flags: blocking1.9?
Keywords: qawanted
This looks like a topcrash given the stacks. It's gotten worse over the last few days.

http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0b3pre&range_value=2&signature=RtlpDeCommitFreeBlock

(Also adding RtlpDeCommitFreeBlock since that seems to be pretty popular, but see below that.)

See also bp-ecf96154-c01a-11dc-ad73-001a4bd43ef6.
Severity: normal → critical
Keywords: crash, topcrash
Summary: HP's OA crashes Minefield when loading blade enclosure info → HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info

Updated

11 years ago
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
Created attachment 296769 [details]
purify double-free report

I ran this testcase through Purify, and this was one of the errors reported. Someone more familiar with the cycle collector or JS engine might be able to provide further insight based on these stacks.
I can try to get more complete stacks if that would be useful.
Created attachment 296776 [details]
another purify error (free memory read)

Clearly JS strings are somehow getting bogus u.chars.
Assignee: nobody → general
Component: General → JavaScript Engine
QA Contact: general → general
Not JS engine. Someone is giving ownership of a string to the JS engine, then welching on the deal.

/be
Assignee: general → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
Can you get purify to tell you at what stacks that location in memory was previously allocated and freed?
I managed to figure out that the JSString being double-freed was owned by an XSLT node (via the XPCVariant seen in attachment 296769 [details]), and wasted a lot of time figuring out which JS was running and trying to figure out why the JS engine was trying to free the string as well, until shaver helpfully pointed out that the XPCVariant was the one that shouldn't be freeing its JSString (I should have known, based on comment 18).

jag then noticed that the destructor already checked JSVAL_IS_STRING(mJSVal) and only called Cleanup() if it was false. I breakpointed in the destructor to try and figure out how that could be, since the variant in this case most definitely was initialized with a string JSVal. Turns out the cycle collector unlink macro sets mJSVal to null (http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/js/src/xpconnect/src/xpcvariant.cpp&rev=1.29#101), which in turn causes JSVAL_IS_STRING(mJSVal) to be false in the destructor, which results in a call to Cleanup() that erroneously frees the variant's data.

jag had some ideas for a patch. We essentially need to ensure that the variant doesn't free its data from the destructor if its data is shared, without relying on mJSVal (because it may have been nulled out by that point).
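
For anyone following along, here is a minimal standalone model of the sequence described above. It is only a sketch: ValueKind, Buffer, ModelVariant, EngineFinalize and SimulateTeardown are hypothetical stand-ins and not the real XPConnect or nsVariant types. It just counts how often a shared buffer would be freed when unlink runs before the destructor.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real XPConnect types.
enum class ValueKind { Null, String };

struct Buffer {
    int frees = 0;  // counts how many times the buffer was "freed"
};

struct ModelVariant {
    ValueKind jsVal;  // plays the role of mJSVal
    Buffer*   data;   // plays the role of mData pointing at engine-owned chars

    // Plays the role of the cycle collector's unlink step.
    void Unlink() { jsVal = ValueKind::Null; }

    // Plays the role of the pre-fix destructor logic: skip Cleanup() only
    // when the value still looks like a string.
    void Destroy() {
        if (jsVal != ValueKind::String) {
            assert(data != nullptr);  // the model always has a buffer attached
            ++data->frees;            // Cleanup() frees data it assumes it owns
        }
    }
};

// The engine's finalizer frees its own string exactly once.
inline void EngineFinalize(Buffer& b) { ++b.frees; }

// Returns the total number of frees the shared buffer receives.
inline int SimulateTeardown(bool unlinkFirst) {
    Buffer shared;
    ModelVariant v{ValueKind::String, &shared};
    if (unlinkFirst)
        v.Unlink();
    v.Destroy();
    EngineFinalize(shared);
    return shared.frees;
}
```

In this model SimulateTeardown(false) yields one free, while SimulateTeardown(true) yields two, which is the double-free pattern Purify reported.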
(Assignee)

Comment 21

11 years ago
Created attachment 296810 [details] [diff] [review]
Explicitly clear pointer to shared buffer and always call Cleanup()
(Assignee)

Comment 22

11 years ago
Created attachment 296811 [details] [diff] [review]
Alternatively just explicitly set the type to EMPTY and skip Cleanup()
(Assignee)

Comment 23

11 years ago
Created attachment 296812 [details] [diff] [review]
Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2
Attachment #296811 - Attachment is obsolete: true
We could just call nsVariant::SetToEmpty if JSVAL_IS_STRING(tmp->mJSVal) is true?
Ah, no, that calls Cleanup too. Maybe do both then, set type to EMPTY and clear the pointer?
(Assignee)

Comment 26

11 years ago
Up to you. I prefer clearing the pointer as a sort of "and here's the second half" to go with manually pointing mData at the buffer we make the nsVariant depend on.

On the other hand there's something to be said for making it clear that we're by-passing Cleanup ('coz none's needed) and we're just forcing the nsVariant into an EMPTY state.

But if you're gonna do that there's no point really in clearing the pointer too. The state of the rest of the nsVariant fields is irrelevant once the type is set to EMPTY.
Comment on attachment 296812 [details] [diff] [review]
Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2

>Index: js/src/xpconnect/src/xpcvariant.cpp
>===================================================================

>     if(!JSVAL_IS_STRING(tmp->mJSVal))
>         nsVariant::Cleanup(&tmp->mData);
>+     else

Wrong indentation.

>+        tmp->mData.mType = nsIDataType::VTYPE_EMPTY;

Let's do this. I'm worried that just clearing the pointer will cause us to try to pass a null pointer in where we shouldn't do that (like into an nsDependentString).
Attachment #296812 - Flags: superreview+
Attachment #296812 - Flags: review+
Attachment #296810 - Flags: superreview+
Attachment #296810 - Flags: review+
(Assignee)

Comment 28

11 years ago
On irc, after I explained that Cleanup() will set mType to VTYPE_EMPTY, we decided to go with the first patch instead.
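
A rough model of the "clear the pointer, then always call Cleanup()" idea, for illustration only. JsKind, DataType, SharedChars, FixedVariant and FreesAfterTeardown are hypothetical names, and this is a sketch of the approach as described in this bug rather than the checked-in code.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real nsVariant types.
enum class JsKind { Null, String };
enum class DataType { Empty, SharedString };

struct SharedChars {
    int frees = 0;  // how many times the buffer has been released
};

struct FixedVariant {
    JsKind       jsVal;  // plays the role of mJSVal
    DataType     type;   // plays the role of mData.mType
    SharedChars* chars;  // plays the role of the pointer inside mData

    // Model of Cleanup(): release whatever is still attached, then go EMPTY.
    void Cleanup() {
        if (chars)
            ++chars->frees;
        chars = nullptr;
        type = DataType::Empty;
    }

    // Forget the borrowed buffer first, so Cleanup() is always safe to call.
    void ReleaseData() {
        if (jsVal == JsKind::String)
            chars = nullptr;  // borrowed from the engine: do not free it
        Cleanup();
    }

    // Model of the unlink step: release data before nulling out jsVal.
    void Unlink() {
        ReleaseData();
        jsVal = JsKind::Null;
    }

    // Model of the destructor: no longer depends on jsVal being intact.
    void Destroy() {
        ReleaseData();
        assert(type == DataType::Empty);
    }
};

// Returns the total number of frees the shared buffer receives.
inline int FreesAfterTeardown(bool unlinkFirst) {
    SharedChars shared;
    FixedVariant v{JsKind::String, DataType::SharedString, &shared};
    if (unlinkFirst)
        v.Unlink();
    v.Destroy();
    ++shared.frees;  // the engine's own finalizer releases the string once
    return shared.frees;
}
```

Because the pointer is dropped before jsVal is nulled, the buffer in this model is freed exactly once whether or not unlink runs first.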
Status: NEW → ASSIGNED
(Assignee)

Updated

11 years ago
Assignee: nobody → jag
Status: ASSIGNED → NEW
(Assignee)

Comment 29

11 years ago
Checking in xpcvariant.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/xpcvariant.cpp,v  <--  xpcvariant.cpp
new revision: 1.30; previous revision: 1.29
done
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
Duplicate of this bug: 407502
Duplicate of this bug: 403145
Duplicate of this bug: 409208
Duplicate of this bug: 409382

Updated

11 years ago
Duplicate of this bug: 403145

Updated

11 years ago
Flags: in-testsuite?
(Reporter)

Comment 35

11 years ago
I'm running :

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3pre) Gecko/2008011504 Minefield/3.0b3pre

and it still fails to load HP's Onboard Administrator page - fails with similar symptoms (though hasn't actually crashed on me).  

Should I re-open this?
I'd say file a new bug.  This bug's summary says it's about a crash, and the crash is fixed.
I've noticed that if you wait long enough the page eventually does load. It seems to hang at the "Loading Enclosure Information" stage, though. Definitely a different bug, please do feel free to file it and CC me. Might also be worth contacting HP and letting them know of the problem? Their web app developers might be in the best position to figure out what's wrong...
(Reporter)

Comment 38

11 years ago
I have bug #412550 open.  
Duplicate of this bug: 409785
Crash Signature: [@js_FinalizeObject] [@ RtlpDeCommitFreeBlock]