Closed Bug 601457 Opened 11 years ago Closed 10 years ago
crash [@ JSRopeNodeIterator::init() ]
Build: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7pre) Gecko/20101002 Firefox/4.0b7pre

This is a new crash signature that first appeared in the b7pre/20100916 build. It also crashed once in 4.0b6. It is the #34 top crasher in b7pre builds for the last week.

Signature JSRopeNodeIterator::init()
UUID e87363fe-b80b-4f2c-a71a-b7ecf2101003
Time 2010-10-03 02:40:02.568312
Uptime 475
Install Age 1405 seconds (23.4 minutes) since version was first installed.
Product Firefox
Version 4.0b7pre
Build ID 20101002041357
Branch 2.0
OS Windows NT
OS Version 5.1.2600 Service Pack 2
CPU x86
CPU Info AuthenticAMD family 15 model 44 stepping 2
Crash Reason EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 0x720063
App Notes AdapterVendorID: 10de, AdapterDeviceID: 0322

Frame Module Signature Source
0 mozjs.dll JSRopeNodeIterator::init js/src/jsstr.h:637
1 mozjs.dll js::gc::MarkAtomRange js/src/jsgcinlines.h:337
2 mozjs.dll js_TraceScript js/src/jsscript.cpp:1370
3 mozjs.dll fun_trace js/src/jsfun.cpp:2098
4 mozjs.dll js_TraceObject js/src/jsobj.cpp:6175
5 mozjs.dll js::gc::MarkChildren js/src/jsgcinlines.h:199
6 mozjs.dll js::gc::MarkObject js/src/jsgcinlines.h:179
7 mozjs.dll fun_trace js/src/jsfun.cpp:2085
8 mozjs.dll js_TraceObject js/src/jsobj.cpp:6175

Because of the low level of crashes at the beginning, the regression range is large, but it could be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0caec4ddff74&tochange=f38ef1080bfe

More reports at:
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=JSRopeNodeIterator%3A%3Ainit%28%29&version=Firefox%3A4.0b7pre
Interesting that one of the reports I clicked on had this error: Malformed security token e=AHbhhgq3gs8YL5bZKc%2FZsTuKmxfNtmi8wo3ucom.google.gadgets.auth.AuthTokenException: Unknown token format Error 401
Since this is targeted for Beta8, can we get it assigned to someone to look at?
(In reply to comment #2)
> Since this is targeted for Beta8, can we get it assigned to someone to look at?

sayrer, rob, can you take a look at this?
We saw 1 crash every few days going back to 4.0b4, but then more frequent daily volume starting at the end of Sept. The regression range for this looks like it might be around the Sept 20 builds, or maybe just before, in the Sept 16 builds.

JSRopeNodeIterator::init
date, total crashes -- count build, count build, ...
20100913
20100914
20100915
20100916 1 4.0b42010081813
20100917 1 4.0b7pre2010091604
20100918
20100919
20100920 1 4.0b7pre2010092004
20100921 2 4.0b7pre2010092004
20100922 1 4.0b7pre2010092204
20100923 2 4.0b7pre2010092312
20100924 4 2 4.0b7pre2010092312, 2 4.0b7pre2010092204,
20100925
20100926 3 2 4.0b7pre2010092504, 1 4.0b7pre2010092204,
20100927 1 4.0b7pre2010092704 1 ,
20100928 1 4.0b7pre2010092704 1 ,
20100929 4 2 4.0b7pre2010092804, 1 4.0b7pre2010092904, 1 4.0b7pre2010092504,
20100930 8 5 4.0b7pre2010093004, 1 4.0b7pre2010092904, 1 4.0b7pre2010092804, 1 4.0b7pre2010092304,

Current volume has built to around 18-34 crashes per day.
Preliminary analysis of the crash reports:

- This is a GC mark phase crash. The most common stack trace tails are:
  Shape::trace|gc::MarkId|JSRopeNodeIterator::init
  js_TraceScript|gc::MarkAtomRange|JSRopeNodeIterator::init
  It crashes on the first line of code in |init|, which tries to read the string header. So, we are crashing because the GC sees ids/atoms that are null pointers or invalid memory.
- In the latest nightly correlation report, 64% of these were associated with an addon called PriceGong and what appears to be one of its files, i0brstub.dll. PriceGong is some kind of adware that purports to be a comparison shopping tool.

So far, this looks like random memory corruption.
Historical analysis:

- The first report with this signature was in a 9/16 build. The next was in a 9/20 build, then 9/22, then it built up to 5-10 per day with a couple of single-day spikes. I'm not sure how to interpret this: was the crash really less common at first, or were there just fewer ADUs then?
- Because the frequency was about 1 per 4 days at the beginning, we have to look back several days from 9/16 to spot changesets that might have introduced the problem. Bug 593256 is one possibility: it does stuff with scoped and landed on 9/11. The next merge was on 9/16, but nothing pops out at me there.

And if this is random memory corruption, it could be anything, not just JS and not just Firefox.
Recommended next steps:

- Check the correlation report regularly to see if any other add-ons show up.
- Take a look at the patches for bug 593256 to see if they could have introduced this.
I've been watching correlation reports, and so far I see only that PriceGong is associated with about half of these.

Historical analysis part 2: There was a massive spike on 10/20. I'm not sure what caused it, but the volume was low (~1/buildday) before then and moderate (~10/buildday) after. There was a TM merge on 10/20. Unfortunately, I don't see anything suspicious there.
OK, I chatted about this for a while with Luke and we came up with a diagnostic idea to try after b7. The most common crash address is 0, and the most common stack trace goes through MarkId via the shape mark function. NULL is not a valid id for a shape, so we can instrument all the sites that modify Shape::id and crash if assigning NULL. If the diagnostic crash doesn't get hit, then it's random memory corruption and basically unsolvable. If it does get hit, as I expect it will, then we have more info and can move on from there.
Status: NEW → ASSIGNED
Simple diagnostic patch, just to check for setting |Shape::id| to zero. This doesn't check for setting Shape::id to zero by doing a memcpy on a Shape struct. I don't know an easy way to look for this. I would expect we are probably not doing it anyway, although we need to consider it happening to JSObjectMap and EmptyShape as well. But I figured I'd start with something easy, and look farther out later on if needed.
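For readers following along, here is a minimal sketch of the kind of check described above. It is not the attached patch: `IdBits`, `IsPlausibleShapeId`, `CheckedShapeId`, and `DiagShape` are hypothetical names made up for illustration, and the real Shape class in js/src looks different. The only thing it is meant to show is the idea of funnelling every write of the id through one checked helper that aborts on all-bits-zero.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the engine's id representation (assumption:
// a pointer-sized word for which all-bits-zero is never a valid shape id).
using IdBits = std::uintptr_t;

// The property the diagnostic enforces: a shape id must not be zero.
inline bool IsPlausibleShapeId(IdBits id) {
    return id != 0;
}

// Crash deliberately at the point of assignment, so the crash report
// points at whoever stored the bad id instead of at the GC that found it.
inline IdBits CheckedShapeId(IdBits id) {
    if (!IsPlausibleShapeId(id))
        std::abort();
    return id;
}

// Hypothetical, simplified shape: every site that writes the id goes
// through CheckedShapeId (constructor and setter here).
class DiagShape {
  public:
    explicit DiagShape(IdBits id) : id_(CheckedShapeId(id)) {}
    void setId(IdBits id) { id_ = CheckedShapeId(id); }
    IdBits id() const { return id_; }

  private:
    IdBits id_;
};
```

As noted above, a check like this only covers writes that go through the constructor or setter. A memcpy over the whole struct, or a stray write from unrelated code, would bypass it.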
Attachment #488961 - Flags: review?(lw)
Diagnostic landed to m-c: http://hg.mozilla.org/mozilla-central/rev/1ccf7b6e0eb7
The diagnostic has been in for a week. Analysis:

The diagnostic was designed to test the hypothesis "Instances of this crash via Shape::mark with a crash address of zero occur because a Shape is created with an id of all-bits-zero." If the hypothesis is correct, we would see those crashes stop (because we crash earlier in that case) and new crashes inside Shape::Shape.

Those crashes didn't stop, or even slow down. And we didn't see new crashes inside Shape::Shape. On the latter, it's possible that that function gets inlined, so we see a different signature. But in that case, we should have seen *some* new topcrash.

I also note that Shape::trace is a pretty common topcrash. That one has nothing to do with Shape::id, but it is another kind of GC crash on shapes. This is evidence for a more general cause involving some kind of memory corruption.

I have a few ideas for how to move forward on this, but they are not particularly good. So I'm going to take out the diagnostics and move this bug back for possible reconsideration later.

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A4.0b8pre&query_search=signature&query_type=startswith&query=js%3A%3AShape&date=11%2F15%2F2010%2012%3A07%3A36&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=js%3A%3AShape%3A%3Atrace%28JSTracer*%29
Backout of diagnostic: http://hg.mozilla.org/mozilla-central/rev/feb768bc0cb9
It is the #8 top crasher in 4.0b8 for the last week.
> I have a few ideas for how to move forward on this, but they are not
> particularly good.

Might be good to get these ideas in the bug. This is probably the #2 or #3 topcrash when just looking at unfixed regressions from 3.6.x.
Crash Signature: [@ JSRopeNodeIterator::init() ]
These aren't appearing in a release other than 4.0bx in the past 4 weeks. Resolving as works for me.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME