Closed Bug 406800 Opened 17 years ago Closed 17 years ago

HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info

Tracking

Status:

RESOLVED FIXED

People

(Reporter: mrz, Assigned: jag+mozilla)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(5 files, 1 obsolete file)

stack trace to malloc error (17 years ago, :Gavin Sharp [email: gavin@gavinsharp.com], 3.93 KB, text/plain)
purify double-free report (17 years ago, :Gavin Sharp [email: gavin@gavinsharp.com], 3.05 KB, text/plain)
another purify error (free memory read) (17 years ago, :Gavin Sharp [email: gavin@gavinsharp.com], 4.28 KB, text/plain)
Explicitly clear pointer to shared buffer and always call Cleanup() (17 years ago, jag (Peter Annema), 1.43 KB, patch; peterv: review+, peterv: superreview+)
Alternatively just explicitly set the type to EMPTY and skip Cleanup() (17 years ago, jag (Peter Annema), 1.21 KB, patch)
Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2 (17 years ago, jag (Peter Annema), 1.21 KB, patch; peterv: review+, peterv: superreview+)

matthew zeier [:mrz]

Reporter

Description

•

17 years ago

Running 3.0b2 nightly on OSX 10.5.1 and trying to access HP's Onboard Administrator (part of HP's BladeSystem) to manage the system. After authentication, Minefield hangs at "Loading enclosure..." and sometimes crashes. Worked in Fx2. My OA isn't publicly accessible and other than the two crash reports I've submitted I'm not sure what other information would be useful to debug this (error console shows errors in the CSS but crashes before I can grab anything out of it).

Reed Loden [:reed]

Comment 1

•

17 years ago

Note that without a testcase, we can't do anything about it. Do you have breakpad IDs from the crashes?

matthew zeier [:mrz]

Reporter

Comment 2

•

17 years ago

Don't have ids. Not at all possible for me to put the OA on an outside accessible network of course. It doesn't consistently crash so if there's any useful information I can get let me know. (I can give access to any mock folks of course)

matthew zeier [:mrz]

Reporter

Comment 3

•

17 years ago

In the error console:

Error: missing ; before statement
Source File: javascript:%2010.2.10.27
Line: 1, Column: 5
Source Code:
10.2.10.27

Error: uncaught exception: [Exception... "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIWebNavigation.loadURI]" nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)" location: "JS frame :: chrome://global/content/viewSource.js :: viewSource :: line 152" data: no]

Page eventually loads but is missing content.

matthew zeier [:mrz]

Reporter

Updated

•

17 years ago

Keywords: qawanted

matthew zeier [:mrz]

Reporter

Comment 4

•

17 years ago

3.0b2 doesn't crash but this still fails for me on OSX 10.5. This prohibits me from using 3.0 full time :(

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 5

•

17 years ago

I can take a look if you give me access to the page. I'm assuming you can't wget the page or something, and attach a testcase?

matthew zeier [:mrz]

Reporter

Comment 6

•

17 years ago

I can't get you access unless you're physically next to me. This gear sits on an internal network. What sort of info do you need from me?

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 7

•

17 years ago

(In reply to comment #6)
> What sort of info do you need from me?

Not sure what you mean - what we need to move forward is a stack trace or a testcase, ideally. If you can manage to save the page somehow and then remove any sensitive information while making sure it still crashes, you could attach that testcase here. Alternatively, when it crashes you should be able to click to see more details of the crash and get a stack trace from the Mac crash reporter app.

matthew zeier [:mrz]

Reporter

Comment 8

•

17 years ago

Works on this build:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-22-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061122 Minefield/3.0a1

Fails the next day:
ftp://ftp.mozilla.org/pub/firefox/nightly/2006/11/2006-11-23-04-trunk
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061123 Minefield/3.0a1

Does that help narrow it down enough?

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 9

•

17 years ago

(In reply to comment #8)
> Does that help narrow it down enough?

In that range, the only things that pop out are bug 47903 and bug 354693. Bug 47903 is probably most likely - I didn't see any exceptions in the console when you showed me the failure, but it could be that it was handled by the app's JS code.

It's going to be a bit tricky trying to progress further without a testcase. Can you try logging in again using Fx2 and, once you're at the main screen where everything is functional, use "Save As...", making sure you have "Complete" selected? Then see if opening that local file in Firefox 3 shows the bustage. I think when I tried it, it was earlier in the loading process, so I might not have saved the entire file correctly.

matthew zeier [:mrz]

Reporter

Comment 10

•

17 years ago

That page loads in either version of Minefield but that's also past the page where it fails.

juan becerra [:juanb]

Comment 11

•

17 years ago

While VPN'd to MTP I tried connecting to the site in question with the latest trunk and after entering the credentials (ask mrz), I get to a screen with a progress bar that stays at 0%. After a couple of minutes or so, Minefield crashes. http://crash-stats.mozilla.com/report/index/8c081938-ba4f-11dc-95d9-001a4bd43ef6?date=2008-01-03-22 I was able to reproduce the problem in XP and Mac OS X. I also tried confirming the regression range while using the Javascript Debugger. On the latter build from the regression range, I get to a point where it can't step through an exception. I saved the state of my VM if you want to take a look.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 12

•

17 years ago

Attached file stack trace to malloc error — Details

I ran through the testcase with a Mac debug build on Leopard, and noticed a bunch of these:

firefox-bin(950,0xa050bf60) malloc: *** error for object 0x1f5f3320: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug

Attached are the stacks I get when I set a breakpoint in malloc_error_break. This looks like something nasty is happening when JS GC is called from cycle collection. This probably explains the seemingly random crashes/hangs I get when trying to reproduce. It might also explain why you got that regression range - the cycle collector was backed out in that range because of a perf regression.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 13

•

17 years ago

And the bonsai URL for mrz's regression range, because I keep having to look it up again: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&branchtype=match&date=explicit&mindate=2006-11-22+02%3A00&maxdate=2006-11-23+05%3A00

Flags: blocking1.9?

Keywords: qawanted

Samuel Sidler (old account; do not CC)

Comment 14

•

17 years ago

This looks like a topcrash given the stacks. It's gotten worse over the last few days. http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0b3pre&range_value=2&signature=RtlpDeCommitFreeBlock (Also adding RtlpDeCommitFreeBlock since that seems to be pretty popular, but see below that.) See also bp-ecf96154-c01a-11dc-ad73-001a4bd43ef6.

Severity: normal → critical

Keywords: crash, topcrash

Summary: HP's OA crashes Minefield when loading blade enclosure info → HP's OA crash [@js_FinalizeObject][@ RtlpDeCommitFreeBlock] when loading blade enclosure info

Mike Schroepfer

Updated

•

17 years ago

Flags: blocking1.9? → blocking1.9+

Priority: -- → P3

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 15

•

17 years ago

Attached file purify double-free report — Details

I ran this testcase through Purify, and this was one of the errors reported. Someone more familiar with the cycle collector or JS engine might be able to provide further insight based on these stacks.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 16

•

17 years ago

I can try to get more complete stacks if that would be useful.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 17

•

17 years ago

Attached file another purify error (free memory read) — Details

Clearly JS strings are somehow getting bogus u.chars.

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

17 years ago

Assignee: nobody → general

Component: General → JavaScript Engine

QA Contact: general → general

Brendan Eich [:brendan]

Comment 18

•

17 years ago

Not JS engine. Someone is giving ownership of a string to the JS engine, then welching on the deal. /be

Assignee: general → nobody

Component: JavaScript Engine → XPConnect

QA Contact: general → xpconnect

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

17 years ago

Can you get purify to tell you at what stacks that location in memory was previously allocated and freed?

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 20

•

17 years ago

I managed to figure out that the JSString being double-freed was owned by an XSLT node (via the XPCVariant seen in attachment 296769), and wasted a lot of time figuring out which JS was running and trying to figure out why the JS engine was trying to free the string as well, until shaver helpfully pointed out that the XPCVariant was the one that shouldn't be freeing its JSString (I should have known, based on comment 18).

jag then noticed that the destructor already checked JSVAL_IS_STRING(mJSVal) and only called Cleanup() if it was false. I breakpointed in the destructor to try and figure out how that could be, since the variant in this case most definitely was initialized with a string JSVal. Turns out the cycle collector unlink macro sets mJSVal to null (http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/js/src/xpconnect/src/xpcvariant.cpp&rev=1.29#101), which in turn causes JSVAL_IS_STRING(mJSVal) to be false in the destructor, which results in a call to Cleanup() that erroneously frees the variant's data.

jag had some ideas for a patch. We essentially need to ensure that the variant doesn't free its data from the destructor if its data is shared, without relying on mJSVal (because it may have been nulled out by that point).
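
For readers following along, the failure mode described in this comment can be modelled roughly as below. This is only a sketch under stated assumptions, not the XPConnect source: `Buffer`, `ModelVariant`, `FreeBuffer` and `SimulateOldDestructor` are hypothetical names made up for illustration, and frees are counted rather than performed so the double free can be observed safely.

```cpp
#include <cassert>

// Hypothetical stand-in for the JS string's character buffer.
// Instead of really freeing memory, we count free requests.
struct Buffer {
    int freeCount = 0;
};

inline void FreeBuffer(Buffer* b) {
    if (b != nullptr) ++b->freeCount;
}

// Hypothetical stand-in for the two states of mJSVal that matter here.
enum class ValKind { Null, String };

// Hypothetical model of a variant whose data points at a buffer it does not own.
struct ModelVariant {
    ValKind jsVal;   // plays the role of mJSVal
    Buffer* data;    // plays the role of mData, shared with the JS string

    explicit ModelVariant(Buffer* shared)
        : jsVal(ValKind::String), data(shared) {}

    // Models the cycle collector unlink step nulling out the JS value.
    void Unlink() { jsVal = ValKind::Null; }

    // Models the pre-fix destructor logic: cleanup is skipped only while
    // the JS value still looks like a string.
    void Destroy() {
        if (jsVal != ValKind::String)
            FreeBuffer(data);   // wrong after Unlink(): the buffer is shared
        data = nullptr;
    }
};

// Returns how many times the shared buffer was asked to be freed.
inline int SimulateOldDestructor(bool unlinkFirst) {
    Buffer chars;
    ModelVariant v(&chars);
    if (unlinkFirst) v.Unlink();
    v.Destroy();
    FreeBuffer(&chars);         // the real owner (the JS engine) frees once
    return chars.freeCount;
}
```

In this model, destroying the variant without unlinking leaves a single free by the owner, while unlinking first makes the string check fail and produces a second free request on the same buffer.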

jag (Peter Annema)

Assignee

Comment 21

•

17 years ago

Attached patch Explicitly clear pointer to shared buffer and always call Cleanup() — Details — Splinter Review

jag (Peter Annema)

Assignee

Comment 22

•

17 years ago

Attached patch Alternatively just explicitly set the type to EMPTY and skip Cleanup() (obsolete) — Details — Splinter Review

jag (Peter Annema)

Assignee

Comment 23

•

17 years ago

Attached patch Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2 — Details — Splinter Review

Attachment #296811 - Attachment is obsolete: true

Peter Van der Beken [:peterv]

Comment 24

•

17 years ago

We could just call nsVariant::SetToEmpty if JSVAL_IS_STRING(tmp->mJSVal) is true?

Peter Van der Beken [:peterv]

Comment 25

•

17 years ago

Ah, no, that calls Cleanup too. Maybe do both then, set type to EMPTY and clear the pointer?

jag (Peter Annema)

Assignee

Comment 26

•

17 years ago

Up to you. I prefer clearing the pointer as a sort of "and here's the second half" to go with manually pointing mData at the buffer we make the nsVariant depend on. On the other hand there's something to be said for making it clear that we're by-passing Cleanup ('coz none's needed) and we're just forcing the nsVariant into an EMPTY state. But if you're gonna do that there's no point really in clearing the pointer too. The state of the rest of the nsVariant fields is irrelevant once the type is set to EMPTY.

Peter Van der Beken [:peterv]

Comment 27

•

17 years ago

Comment on attachment 296812
Alternatively just explicitly set the type to EMPTY and skip Cleanup() v2

>Index: js/src/xpconnect/src/xpcvariant.cpp
>===================================================================
> if(!JSVAL_IS_STRING(tmp->mJSVal))
> nsVariant::Cleanup(&tmp->mData);
>+ else

Wrong indentation.

>+ tmp->mData.mType = nsIDataType::VTYPE_EMPTY;

Let's do this. I'm worried that just clearing the pointer will cause us to try to pass a null pointer in where we shouldn't do that (like into an nsDependentString).

Attachment #296812 - Flags: superreview+

Attachment #296812 - Flags: review+

Peter Van der Beken [:peterv]

Updated

•

17 years ago

Attachment #296810 - Flags: superreview+

Attachment #296810 - Flags: review+

Boris Zbarsky [:bzbarsky]

Updated

•

17 years ago

Blocks: 409208

jag (Peter Annema)

Assignee

Comment 28

•

17 years ago

On irc, after I explained that Cleanup() will set mType to VTYPE_EMPTY, we decided to go with the first patch instead.
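
The two approaches discussed in comments 26 to 28 can be compared with a small model. This is a hedged sketch, not either of the attached patches: `Chars`, `ModelData`, `ModelCleanup`, `ReleaseClearingPointer` and `ReleaseSettingEmpty` are hypothetical names, the `shared` flag stands in for however the real code knows the buffer is borrowed, and the model assumes a cleanup routine that tolerates a null pointer and always leaves the type empty.

```cpp
#include <cassert>

// Hypothetical type tags; only the EMPTY state matters for this model.
enum class VType { Empty, String };

// Hypothetical buffer that counts free requests instead of freeing.
struct Chars {
    int freeCount = 0;
};

// Hypothetical model of the variant's data fields.
struct ModelData {
    VType type;
    Chars* chars;
};

// Models a cleanup routine: frees whatever the pointer refers to (if anything)
// and always leaves the data in the EMPTY state.
inline void ModelCleanup(ModelData& d) {
    if (d.chars != nullptr) ++d.chars->freeCount;
    d.chars = nullptr;
    d.type = VType::Empty;
}

// First approach: clear the pointer to the shared buffer, then always clean up.
// Cleanup finds nothing to free and still resets the type.
inline void ReleaseClearingPointer(ModelData& d, bool shared) {
    if (shared) d.chars = nullptr;
    ModelCleanup(d);
}

// Second approach: for shared data, force the type to EMPTY and skip cleanup.
// The stale pointer is left in place but is irrelevant once the type is EMPTY.
inline void ReleaseSettingEmpty(ModelData& d, bool shared) {
    if (shared)
        d.type = VType::Empty;
    else
        ModelCleanup(d);
}
```

Under these assumptions both variants leave the shared buffer untouched and the data in the EMPTY state; the first one simply reaches that state through the cleanup routine, which matches the reasoning given on irc for preferring it.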

Status: NEW → ASSIGNED

jag (Peter Annema)

Assignee

Updated

•

17 years ago

Assignee: nobody → jag

Status: ASSIGNED → NEW

jag (Peter Annema)

Assignee

Comment 29

•

17 years ago

Checking in xpcvariant.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/xpcvariant.cpp,v <-- xpcvariant.cpp
new revision: 1.30; previous revision: 1.29
done

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Jeff Walden [:Waldo]

Updated

•

17 years ago

Flags: in-testsuite?

matthew zeier [:mrz]

Reporter

Comment 35

•

17 years ago

I'm running: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3pre) Gecko/2008011504 Minefield/3.0b3pre and it still fails to load HP's Onboard Administrator page - fails with similar symptoms (though hasn't actually crashed on me). Should I re-open this?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 36

•

17 years ago

I'd say file a new bug. This bug's summary says it's about a crash, and the crash is fixed.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 37

•

17 years ago

I've noticed that if you wait long enough the page eventually does load. It seems to hang at the "Loading Enclosure Information" stage, though. Definitely a different bug, please do feel free to file it and CC me. Might also be worth contacting HP and letting them know of the problem? Their web app developers might be in the best position to figure out what's wrong...

matthew zeier [:mrz]

Reporter

Comment 38

•

17 years ago

I have bug #412550 open.

Nobody; OK to take it and work on it

Updated

•

14 years ago

Crash Signature: [@js_FinalizeObject] [@ RtlpDeCommitFreeBlock]
