Closed Bug 601457 Opened 9 years ago Closed 8 years ago

crash [@ JSRopeNodeIterator::init() ]


(Core :: JavaScript Engine, defect, critical)

Windows XP
Not set



Tracking Status
blocking2.0 --- -
status2.0 --- wanted


(Reporter: scoobidiver, Assigned: dmandelin)



(Keywords: crash, regression, topcrash)

Crash Data


(1 file)

Build : Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7pre) Gecko/20101002 Firefox/4.0b7pre

This is a new crash signature that first appeared in the b7pre/20100916 build.
It also crashed once in 4.0b6.
It is the #34 top crasher in b7pre builds over the last week.

Signature	JSRopeNodeIterator::init()
UUID	e87363fe-b80b-4f2c-a71a-b7ecf2101003
Time 	2010-10-03 02:40:02.568312
Uptime	475
Install Age	1405 seconds (23.4 minutes) since version was first installed.
Product	Firefox
Version	4.0b7pre
Build ID	20101002041357
Branch	2.0
OS	Windows NT
OS Version	5.1.2600 Service Pack 2
CPU	x86
CPU Info	AuthenticAMD family 15 model 44 stepping 2
Crash Address	0x720063
App Notes 	AdapterVendorID: 10de, AdapterDeviceID: 0322

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	JSRopeNodeIterator::init 	js/src/jsstr.h:637
1 	mozjs.dll 	js::gc::MarkAtomRange 	js/src/jsgcinlines.h:337
2 	mozjs.dll 	js_TraceScript 	js/src/jsscript.cpp:1370
3 	mozjs.dll 	fun_trace 	js/src/jsfun.cpp:2098
4 	mozjs.dll 	js_TraceObject 	js/src/jsobj.cpp:6175
5 	mozjs.dll 	js::gc::MarkChildren 	js/src/jsgcinlines.h:199
6 	mozjs.dll 	js::gc::MarkObject 	js/src/jsgcinlines.h:179
7 	mozjs.dll 	fun_trace 	js/src/jsfun.cpp:2085
8 	mozjs.dll 	js_TraceObject 	js/src/jsobj.cpp:6175

Because of the low level of crashes at the beginning, the regression range is large, but it could be :

More reports at:
blocking2.0: --- → ?
blocking2.0: ? → beta8+
Keywords: topcrash
Interesting that one of the reports I clicked on had this error:

Malformed security token Unknown token format
Error 401
Since this is targeted for Beta8, can we get it assigned to someone to look at?
(In reply to comment #2)
> Since this is targeted for Beta8, can we get it assigned to someone to look at?

sayrer, rob, can you take a look at this?
We saw 1 crash every few days going back to 4.0b4, but then more frequent daily volume starting at the end of Sept.

The regression range for this looks like it might be around the Sept 20 builds, or maybe just before, in the Sept 16 builds.

date     total  count build, count build, ...
20100916 1      1 4.0b42010081813
20100917 1      1 4.0b7pre2010091604
20100920 1      1 4.0b7pre2010092004
20100921 2      2 4.0b7pre2010092004
20100922 1      1 4.0b7pre2010092204
20100923 2      2 4.0b7pre2010092312
20100924 4      2 4.0b7pre2010092312, 2 4.0b7pre2010092204
20100926 3      2 4.0b7pre2010092504, 1 4.0b7pre2010092204
20100927 1      1 4.0b7pre2010092704
20100928 1      1 4.0b7pre2010092704
20100929 4      2 4.0b7pre2010092804, 1 4.0b7pre2010092904, 1 4.0b7pre2010092504
20100930 8      5 4.0b7pre2010093004, 1 4.0b7pre2010092904, 1 4.0b7pre2010092804, 1 4.0b7pre2010092304

current volume has built to around 18-34 crashes per day.
Preliminary analysis of the crash reports:

- This is a GC mark phase crash. The most common stack trace tails are:


It crashes on the first line of code in |init|, which tries to read the string header. So, we are crashing because the GC sees ids/atoms that are null pointers or invalid memory.

- In the latest nightly correlation report, 64% of these were associated with an addon called PriceGong and what appears to be one of its files, i0brstub.dll. PriceGong is some kind of adware that purports to be a comparison shopping tool.

So far, this looks like random memory corruption.
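
To make the "null pointers or invalid memory" distinction concrete, here is a rough sketch of the kind of cheap check one could run on a slot before the marker reads the string header. Everything in it is hypothetical: FakeStringHeader, SlotKind and classifySlot are illustration-only names, not SpiderMonkey code, and the header layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical classification of a slot the marker is about to dereference.
enum class SlotKind { Null, Misaligned, Plausible };

// Hypothetical stand-in for a string header; NOT the real JSString layout.
struct FakeStringHeader {
    std::size_t lengthAndFlags;
};

// Classify raw pointer bits without dereferencing them. "Plausible" only
// means the value passed these two cheap checks; it may still be dangling.
inline SlotKind classifySlot(std::uintptr_t bits) {
    if (bits == 0)
        return SlotKind::Null;        // would fault at address 0
    if (bits % alignof(FakeStringHeader) != 0)
        return SlotKind::Misaligned;  // cannot be a valid header address
    return SlotKind::Plausible;
}
```

With this sketch, a value like the crash address in the report above (0x720063) would be classified as Misaligned rather than Null, which fits the "invalid memory" case. A check like this cannot catch a well-aligned dangling pointer.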
Depends on: 608860
Historical analysis:

- The first report with this signature was in a 9/16 build. The next was in a 9/20 build, then 9/22, then it built up to 5-10 per day with a couple of single-day spikes. I'm not sure how to interpret this: was the crash really less common at first, or were there just fewer ADUs then?

- Because the frequency was about 1 per 4 days at the beginning, we have to look back several days from 9/16 to spot changesets that might have introduced the problem. Bug 593256 is one possibility: it does stuff with scoped and landed on 9/11. The next merge was on 9/16, but nothing pops out at me there. And if this is random memory corruption, it could be anything, not just JS and not just Firefox.
Recommended next steps:

- Check the correlation report regularly to see if any other add-ons show up. 

- Take a look at the patches for bug 593256 to see if they could have introduced this.
Assignee: general → dmandelin
I've been watching correlation reports, and so far I see only that PriceGong is associated with about half of these.

Historical analysis part 2:

There was a massive spike on 10/20. I'm not sure what caused it, but the volume was low (~1/buildday) before then and moderate (~10/buildday) afterward. There was a TM merge on 10/20. Unfortunately, I don't see anything suspicious there.
OK, I chatted about this for a while with Luke and we came up with a diagnostic idea to try after b7. The most common crash address is 0, and the most common stack trace goes through MarkId via the shape mark function. NULL is not a valid id for a shape, so we can instrument all the sites that modify Shape::id and crash if assigning NULL. If the diagnostic crash doesn't get hit, then it's random memory corruption and basically unsolvable. If it does get hit, as I expect it will, then we have more info and can move on from there.
Simple diagnostic patch, just to check for setting |Shape::id| to zero. This doesn't check for Shape::id being set to zero by a memcpy over a Shape struct; I don't know an easy way to look for that. I expect we are probably not doing it anyway, although we need to consider it happening to JSObjectMap and EmptyShape as well. But I figured I'd start with something easy, and look farther out later on if needed.
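
For readers without the attachment, the idea is roughly the following. This is a minimal sketch and not the actual patch: FakeShape and FakeId are hypothetical stand-ins for js::Shape and jsid, and routing every assignment through one helper is an assumption about how the sites could be instrumented.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for jsid; the real type is more involved.
using FakeId = std::uintptr_t;

// True when the id is the all-bits-zero value the diagnostic looks for.
inline bool isAllBitsZero(FakeId id) { return id == 0; }

// Hypothetical stand-in for js::Shape with a single checked write path.
class FakeShape {
  public:
    explicit FakeShape(FakeId id) : id_(checked(id)) {}

    void setId(FakeId id) { id_ = checked(id); }
    FakeId id() const { return id_; }

  private:
    // Every write to id_ goes through here. Abort in all build types so
    // the failure shows up in crash reports at the assignment site rather
    // than later during GC marking.
    static FakeId checked(FakeId id) {
        if (isAllBitsZero(id))
            std::abort();
        return id;
    }

    FakeId id_;
};
```

As noted above, a raw memcpy over the struct bypasses a check like this, so a clean result only rules out zero ids written through the instrumented sites.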
Attachment #488961 - Flags: review?(lw)
Attachment #488961 - Flags: review?(lw) → review+
Depends on: 610642
Depends on: 610910
The diagnostic has been in for a week. Analysis:

The diagnostic was designed to test the hypothesis "Instances of this crash via Shape::mark with a crash address of zero occur because a Shape is created with an id of all-bits-zero." If the hypothesis is correct, we would see those crashes stop (because we crash earlier in that case) and new crashes inside Shape::Shape.

Those crashes didn't stop, or even slow down. And we didn't see new crashes inside Shape::Shape. On the latter, it's possible that that function gets inlined, so we see a different signature. But in that case, we should have seen *some* new topcrash.

I also note that Shape::trace is a pretty common topcrash [1]. That one has nothing to do with Shape::id, but it is another kind of GC crash on shapes. This is evidence for a more general cause involving some kind of memory corruption.

I have a few ideas for how to move forward on this, but they are not particularly good. So I'm going to take out the diagnostics and move this bug back for possible reconsideration later.

blocking2.0: beta8+ → final+
Blocks: 613650
blocking2.0: final+ → -
status2.0: --- → wanted
It is the #8 top crasher in 4.0b8 over the last week.
> I have a few ideas for how to move forward on this, but they are not
> particularly good. 

Might be good to get these ideas in the bug.

This is probably the #2 or #3 topcrash when just looking at unfixed regressions from 3.6.x.
Crash Signature: [@ JSRopeNodeIterator::init() ]
These aren't appearing in a release other than 4.0bx in the past 4 weeks. Resolving as works for me.
Closed: 8 years ago
Resolution: --- → WORKSFORME