Closed Bug 500103 Opened 11 years ago Closed 7 years ago

crash in nsXPConnect::Traverse

Categories

(Core :: XPConnect, defect, critical)

RESOLVED WORKSFORME
Tracking Status
firefox15 - ---
blocking2.0 --- -
status2.0 --- wanted
status1.9.1 --- wanted

People

(Reporter: samuel.sidler+old, Unassigned)

Details

(Keywords: crash, testcase, Whiteboard: [crashkill])

Attachments

(1 file, 1 obsolete file)

The current number 6 topcrash in Firefox 3.5 RC happens with the signature of nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&).

I noticed bug 481302 which claims to have fixed a crash with this signature, but this one is still active in Firefox 3.5 RCs.

I only see two distinct "stacks" for this signature.

#1 from bp-5175eaf8-38b2-4f9d-a54e-ed9202090620: 

Frame  Module   Signature                   Source
0      xul.dll  nsXPConnect::Traverse       js/src/xpconnect/src/nsXPConnect.cpp:748
1      xul.dll  nsXPConnect::ToParticipant
2               @0x12739fff

#2 from bp-fc8fc587-50de-47b1-ab08-6b99f2090621:

Frame  Module   Signature                             Source
0      xul.dll  nsXPConnect::Traverse                 js/src/xpconnect/src/nsXPConnect.cpp:748
1      xul.dll  nsXMLContentSink::HandleStartElement
2               @0x9612fff

I only see this crash on Windows, but the sample size is rather small still... Calling it Windows-only for now.

Lars, can you get me a list of URLs? Feel free to put it into a private bug for privacy reasons.
Assignee: general → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
Depends on: 500183
Bug 500183 has URLs for Firefox 3.5, 3.5pre and 3.5b99 (in that order)
I couldn't reproduce on a local winxp vm. I'm running mac os x now and will try win2k3 server later. I'm not sure I can test this effectively by loading single pages if it's related to cycle collection.
http://fr.justin.tv/fopra from the url list crashed with a SIGILL on the mac os x xserve but I can't reproduce it on my macbook.
http://espn.go.com/broadband/espn360/player?gameId=5323&sportCode=TN&league=Tennis crashed and http://en.wikipedia.org/wiki/Samoa asserted on windows server, but not my winxp vm. I'll get stacks when the windows server machine finishes its run.
I can't reproduce the crash (yet) on espn. 

Exit with code 3 on wikipedia is due to 

###!!! ASSERTION: Invalid offset: 'aOffset <= mSkipChars->mCharCount', file c:/work/mozilla/builds/1.9.1/mozilla/gfx/thebes/src/gfxSkipChars.cpp, line 92

###!!! ASSERTION: Text run does not map enough text for our reflow: 'gfxSkipCharsIterator(iter).ConvertOriginalToSkipped(offset + length) <= mTextRun->GetLength()', file c:/work/mozilla/builds/1.9.1/mozilla/layout/generic/nsTextFrameThebes.cpp, line 5910

###!!! ASSERTION: redo line on totally empty line with non-empty band...: 'aState.IsImpactedByFloat()', file c:/work/mozilla/builds/1.9.1/mozilla/layout/generic/nsBlockFrame.cpp, line 3524

Block(li)(69)@05659370: yikes! spinning on a line over 1000 times!

That's all folks. Nothing related to nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&) on winxp, winserver, mac os x
I have been able to reproduce a hang (taking 100% CPU) on:

http://en.wikipedia.org/wiki/Samoa

Using Windows XP SP3, firefox 3.5 (first official release).

No full screen. Take the right border of the window. Move it to the left (as far as possible) and then to the right again.

Repeat this many times (2 minutes).

No scrolling down is required.

I can't confirm it is the crash as in the crash reports, but it is certainly not right.

Lucas
It was not necessary to have any other tabs open.
I have more precise steps now. You have to resize the Samoa page to a certain width with pixel accuracy.

See attachment. Align the left line of the box containing the state information, just under the W of 'From Wikipedia'. It must be pixel accurate, but then it crashes 100% on my computer (Windows XP, SP3, FF3.5).

Lucas
Lucas, please open about:crashes and paste the crash ids here. Thanks!
Sorry, it is not a crash, it is a hang (100% CPU). I had to kill it with Windows Task Bar. No report was sent.

Maybe this is a different bug.
It looks like a Battlefield Heroes plugin also triggers this crash, according to the comments in the crash stats. Did anyone give that a try?
If you take other country pages in Wikipedia and resize the window to a width too small for normal browsing, Firefox struggles and becomes very slow. But I couldn't make it hang as it does on the Samoa page.
Flags: blocking1.9.1.1?
lucas: this bug is dedicated to a crash, if you have a hang, please use a new bug.
https://developer.mozilla.org/en/How_to_get_a_stacktrace_with_WinDbg
!analyze -v -hang
Ok, tried battlefieldheroes after a 360M download... If you are using a VM you'll need to enable 3D acceleration. Can't reproduce so far.
i will look into this and see if i can reproduce this!
blocking1.9.1: --- → .2+
Flags: wanted1.9.1.x+
Flags: blocking1.9.1.1?
Flags: blocking1.9.1.1-
blocking1.9.1: .2+ → needed
Group: core-security
Although the Samoa problem started with this bug, I assume it is a separate bug. I submitted bug 505328.

I suggest this bug continue to be about the crash and not the Samoa hang.
Flags: wanted1.9.1.x+
Attachment #386099 - Attachment is obsolete: true
blocking1.9.1: needed → ---
This has dropped to crash #36 in FF3.5.3, but probably more because the 3rd-party-related crashes have grown.
Whiteboard: [crashkill]
Not well correlated with startup.
456 total crashes for nsXPConnect::Traverse on 20091113-crashdata.csv
63 startup crashes within the first 3 minutes (about 14% of the total)

more prevalent in 3.5.5 than in the 3.6 betas

distribution of all versions where the nsXPConnect::Traverse crash was found on 20091113-crashdata.csv
 400 Firefox 3.5.5
  20 Firefox 3.5.4
  20 Firefox 3.5.3
   4 Firefox 3.5
   3 Firefox 3.6b2
   3 Firefox 3.6b1
   2 Firefox 3.5.1
   2 Firefox 3.1b3
   1 Firefox 3.5.2
   1 Firefox 3.0.8

domains of sites reflect mostly general web surfing so automated url testing might not help as much here.
  64 chrome://ietab
  45 \N//
  35 http://www.facebook.com
  23 http://apps.facebook.com
  16 http://www.orkut.com.br
  10 http://www.youtube.com
   7 http://www.metaboli.fr
   6 http://mail.live.com
   6 about:blank//
   4 https://mail.google.com
   4 http://nasza-klasa.pl
   4 http://jeuxvideo.orange.fr
   3 http://www.sat1.de
   3 http://www.orkut.com
   3 http://www.metaboli.co.uk
   3 http://www.google.com
   3 http://www.gamesflatrate.de
   3 http://mail.google.com
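
Tallies like the version and domain distributions above come down to counting one column of the crash data and sorting by frequency. A minimal sketch of that step, assuming the column values have already been extracted into strings (the layout of 20091113-crashdata.csv is not shown in this bug, so no parsing is attempted); TallyDescending is a hypothetical helper name:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts how often each value occurs and returns
// (count, value) rows sorted by descending count, ties broken by value.
std::vector<std::pair<int, std::string>>
TallyDescending(const std::vector<std::string>& values) {
    std::map<std::string, int> counts;
    for (const std::string& v : values) {
        ++counts[v];
    }

    std::vector<std::pair<int, std::string>> rows;
    rows.reserve(counts.size());
    for (const std::pair<const std::string, int>& entry : counts) {
        rows.emplace_back(entry.second, entry.first);
    }

    std::sort(rows.begin(), rows.end(),
              [](const std::pair<int, std::string>& a,
                 const std::pair<int, std::string>& b) {
                  if (a.first != b.first) return a.first > b.first;
                  return a.second < b.second;
              });
    return rows;
}
```

Feeding it one version string per crash report would give rows in the same "count, value" order as the lists above.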
(In reply to comment #19)
> Not well correlated with startup.

Yeah, that makes sense -- we defer the first cycle collection until a little while after startup.
per crashkill meeting i will look into this again to find a testcase
Never nominated, but marking blocking1.9.2- to explicitly mark [CrashKill] bugs as either blocking or not.  If we can get a patch before RC, we should really consider taking it.
blocking2.0: --- → ?
Flags: blocking1.9.2-
I looked at six minidumps for Firefox 3.6b3:

http://crash-stats.mozilla.com/report/index/0ce0bb36-86dc-45e2-8db4-8c7422091121

Crashing on line 736 (the early part of the function, trying to compute
dontTraverse).

obj (ESI) is 0x91A9A68
clazz (EBP) is 0x00000000
so we crash calling IS_WRAPPER_CLASS(clazz).

Note that js_GetGCThingTraceKind worked, so obj does seem to be in a GC arena.


http://crash-stats.mozilla.com/report/index/24ab83bd-6fc9-44b4-b98b-49e432091121

Same exact point as the previous crash.

obj (ESI) is 0x027E8CA2
clazz (EBP) is 0x8C800080


http://crash-stats.mozilla.com/report/index/e28300db-a3a0-405a-b800-dcf912091121

Here we got a drop farther; IS_WRAPPER_CLASS() seemed to be true.

Didn't go in further; it looked a bit confusing.


http://crash-stats.mozilla.com/report/index/ea4d0cff-55b9-410e-ae71-0f4632091121

Looks just like the first two.

obj (ESI) is 0x4489918
clazz (EBP) is 0x00000000


http://crash-stats.mozilla.com/report/index/eafb504f-a64b-4cc9-838d-9bb612091121

This looked like the third.

I think it's inside WrapperIsNotMainThreadOnly inlined.

wrapper->Native() (ECX) is 0x45B4AE0
wrapper->Native()'s vtable (EDX) is at 0x00000000
So we crash trying to QI.


http://crash-stats.mozilla.com/report/index/f4498894-9375-4c14-ae1d-6a8c02091121

This one is very different.  I won't bother looking further.
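
Several of the dumps above fault while testing clazz. A minimal sketch of that failure mode, assuming only that the class test reads a field through the pointer. FakeClass, kFakeWrapperFlag, and both functions are hypothetical stand-ins for illustration, not the real JSClass or IS_WRAPPER_CLASS:

```cpp
#include <cstdint>

// Hypothetical stand-in for a JS class descriptor; not the real JSClass.
struct FakeClass {
    std::uint32_t flags;
};

// Hypothetical flag bit marking a "wrapper" class.
constexpr std::uint32_t kFakeWrapperFlag = 1u << 0;

// Unchecked test: reads through the pointer unconditionally. Passing a
// null or garbage pointer is undefined behaviour and typically faults,
// which is the shape of the crashes described above.
inline bool IsWrapperClassUnchecked(const FakeClass* clazz) {
    return (clazz->flags & kFakeWrapperFlag) != 0;
}

// Defensive variant: treats a null class pointer as "not a wrapper".
inline bool IsWrapperClassChecked(const FakeClass* clazz) {
    return clazz != nullptr && (clazz->flags & kFakeWrapperFlag) != 0;
}
```

A null check like this would only cover the 0x00000000 cases. A garbage value such as 0x8C800080 would pass it and still fault, so this illustrates the symptom and is not a proposed fix.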
(In reply to comment #23)
> obj (ESI) is 0x91A9A68
> clazz (EBP) is 0x00000000
> so we crash calling IS_WRAPPER_CLASS(clazz).

Not sure what this means, it seems like that shouldn't happen. AFAICT we'll always set the classword when creating a JSObject, and don't unset it. Stale pointer that happens to point in GC heap being passed as a child during traversal?
Keywords: testcase-wanted
Whiteboard: [crashkill] → [crashkill][sg:watch]
Not blocking the release on this, but we would obviously take a safe fix.
blocking2.0: ? → -
status2.0: --- → wanted
Does this need to remain security-sensitive?
Group: core-security
Whiteboard: [crashkill][sg:watch] → [crashkill][needs steps or testcase to appear]
related to bug 551163?
Crash Signature: [@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)]
We still get lots of these but on recent versions the volume isn't high enough to keep the top crash key word. Sitting at #75 for FF 8.0. Removing the top crash key word.
Keywords: topcrash
Summary: top crash [@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)] → crash [@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)]
Crash Signature: [@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)] → [@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)] [@ nsXPConnect::Traverse]
OS: Windows XP → All
Hardware: x86 → All
Summary: crash [@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)] → crash in nsXPConnect::Traverse
Version: 1.9.1 Branch → Trunk
Attached file testcase
I'm seeing this crash occurring with this testcase after a while.
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #30)
> I'm seeing this crash occurring with this testcase after a while.
How did it take to crash?  What platform are you using?  I've had it sitting there for 10 or so minutes on OSX and it isn't crashing.
"How did it" should be "How long did it"
It should crash within 30 seconds or so.
Try the testcase locally or set Firefox to offline mode while the test is running, so the slow loading of bugzilla doesn't interfere.
Okay, it crashed for me if I opened and closed some blank tabs.
Your test case looks very similar to bug 752764 (video, document open and close), so let's move discussion of this over to there.
Depends on: 752764
Duplicate of this bug: 752324
Duplicate of this bug: 558757
Depends on: 551163
Depends on: 656494
Doesn't appear to be a recent regression, and not a top crasher. Not tracking for FF15.
Whiteboard: [crashkill][needs steps or testcase to appear] → [crashkill]
I guess this is fixed now that bug 752764 is fixed.
I don't see any crashes in the top 200 or so list like this on 16-19, probably because it morphed into something else.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME