500105 - Crash @ GraphWalker

Reporter

Description

•

16 years ago

The current #7 topcrash occurs with a signature of GraphWalker::DoWalk(nsDeque&). This crash occurs across platforms (Mac and Windows so far). All crash signatures look like this one, taken from bp-a6c2a662-3402-487e-b4b7-a45442090623, sometimes ending on frame 0, sometimes with the GraphWalker::DoWalk line not repeated:

Frame  Module   Signature                              Source
0      xul.dll  GraphWalker::DoWalk(nsDeque&)          xpcom/base/nsCycleCollector.cpp:1186
1      xul.dll  GraphWalker::DoWalk(nsDeque&)          xpcom/base/nsCycleCollector.cpp:1182
2      xul.dll  GraphWalker::WalkFromRoots(GCGraph&)   xpcom/base/nsCycleCollector.cpp:1170
3      xul.dll  nsCycleCollector::BeginCollection()    xpcom/base/nsCycleCollector.cpp:2469

Lars: Can you grab some URLs for this issue from Socorro?

Flags: wanted1.9.1.x+

K Lars Lohn [:lars] [:klohn]

Comment 1

•

16 years ago

Bug 500189 has URLs for Firefox 3.5, 3.5pre and 3.5b99 (in that order)

Bob Clary [:bc] (inactive)

Comment 2

•

16 years ago

No crashes on Windows or Mac, but I did get hangs on several URLs:

http://xhamster.com/movies/157398/british_blonde_fucks_a_black_guy_part_3.html
http://www.tagged.com/messages.html?action=compose&rid=5393914694
http://www.tagged.com/messages.html?action=compose&rid=5421713351
http://www.tagged.com/messages.html#state=1f-1f4627374050
http://www.tagged.com/friends.html#tab=2
http://www.meebo.com/
http://www.gay.com/chat/grpChatPopout.do
http://www.bigbooster.com/other/extractor.html
http://www.86696.com/shenyewanzhuanwangyou/26860.html

Most of these were on a MacBook Pro, but several were also found on Windows.

Daniel Veditz [:dveditz]

Comment 3

•

16 years ago

Haven't people learned by now? Porn kills (your computer). Seriously -- ask yourself. If some porn costs a lot of money or a subscription fee, why is some of it free? Is it perhaps because they have an alternate revenue stream: installing malware?

Samuel Sidler (old account; do not CC)

Reporter

Comment 4

•

16 years ago

Note: This also happens on 1.9.0 (currently #12 overall).

Flags: wanted1.9.0.x+

Samuel Sidler (old account; do not CC)

Reporter

Comment 5

•

16 years ago

See also bug 437449, another cycle collector topcrash.

Samuel Sidler (old account; do not CC)

Reporter

Comment 6

•

16 years ago

Peterv: Can you take a look at the crash reporter stack above and see if there's any problem we can fix here?

Assignee: nobody → peterv

Samuel Sidler (old account; do not CC)

Reporter

Updated

•

16 years ago

status1.9.1: --- → wanted

Flags: wanted1.9.1.x+

Dirkjan Ochtman (:djc)

Comment 7

•

16 years ago

I got one as well, seemingly without interaction. (Internal URLs only, sorry.) http://crash-stats.mozilla.com/report/index/54237630-c199-4b5b-af6e-63f462090728?p=1

mobqueen184

Comment 8

•

16 years ago

Hmm, my laptop seems to have crashed on a specific link (http://profile.myspace.com/Modules/Applications/Pages/Canvas.aspx?appId=104283&appParams={%22show_user_id%22%3A%22152617042%22}). This is the crashing thread I got:

Frame  Module   Signature                          Source
0      xul.dll  GraphWalker::DoWalk                xpcom/base/nsCycleCollector.cpp:1186
1      xul.dll  GraphWalker::DoWalk                xpcom/base/nsCycleCollector.cpp:1182
2      xul.dll  GraphWalker::WalkFromRoots         xpcom/base/nsCycleCollector.cpp:1170
3      xul.dll  nsCycleCollector::BeginCollection  xpcom/base/nsCycleCollector.cpp:2469

mixxster

Comment 9

•

15 years ago

We have a Windows XP desktop that suffers from these crashes when running Firefox 3.5.2; we encounter the problem at least twice a day. The only way we have been able to avoid this bug is to downgrade to the Firefox 3.0.xx branch. The crash often occurs as soon as a link is clicked, but it is hard to reproduce and seems very unpredictable. http://crash-stats.mozilla.com/report/index/46d27c8b-1809-4f90-bb17-9eff12090824 We may also be seeing a related BSOD that crashes the entire system, which we never get while running Firefox 3.0.13 or earlier builds.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 10

•

15 years ago

Johnny/Jonas: we need to figure this out before we ship Firefox 3.6

Flags: blocking1.9.2+

Damon Sicore (:damons)

Updated

•

15 years ago

Priority: -- → P2

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

15 years ago

URL: http://crash-stats.mozilla.com/report... → http://crash-stats.mozilla.com/report...

Peter Van der Beken [:peterv]

Comment 11

•

15 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

I'm not completely sure this is what causes the crash, but it seems like it potentially could. The iterator always increments mPointer, even after it just jumped to the start of a new block. I think one potential crash is that if mLastChild is set to the first pointer of the block, we could end up reading uninitialized memory because the iterator wouldn't stop since it would skip over the first pointer when doing operator++ (we do |child = pi->mFirstChild, child_end = pi->mLastChild; child != child_end; ++child|). When writing we don't use an iterator, so we do write to the first pointer of the block.

Attachment #403974 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

•

15 years ago

Comment on attachment 403974 (v1):

I think the code looks correct to me as it is now; the idea here is that iterators never point to the first pointer in a block; instead they point to the null sentinel at the end of the previous block (or in the pool itself) and dereferencing an iterator pointing to the sentinel (see operator*) returns the first pointer in that next block. I think this simplified things in other ways, e.g., by allowing us to create a valid iterator for the position after the end of a block before we've created the next block. I certainly should have documented that better, though.
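For illustration, the sentinel convention described above can be sketched roughly as follows. This is a simplified sketch, not the code in nsCycleCollector.cpp: the names `Node`, `Slot`, `Block`, `EdgePoolSketch`, `mark`, `add` and `collect`, and the tiny block size, are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical names throughout; a simplified sketch of the idea only.
struct Node { int id; };

struct Block;
union Slot {
    Node*  node;  // a stored pointer, or nullptr when this slot is a sentinel
    Block* next;  // used only in the slot that directly follows a sentinel
};

struct Block {
    static constexpr std::size_t kSlots = 4;  // 2 payload slots + sentinel + link
    Slot slots[kSlots];
    Block() {
        for (Slot& s : slots) s.node = nullptr;
        slots[kSlots - 1].next = nullptr;     // link slot, filled in later
    }
    Slot*   begin()    { return &slots[0]; }
    Slot*   sentinel() { return &slots[kSlots - 2]; }  // always holds nullptr
    Block*& next()     { return slots[kSlots - 1].next; }
};

class EdgePoolSketch {
public:
    class Iterator {
    public:
        explicit Iterator(Slot* p) : p_(p) {}
        Node* operator*() const {
            // On a sentinel, the value is the first pointer of the next block.
            if (p_->node == nullptr) return (p_ + 1)->next->begin()->node;
            return p_->node;
        }
        Iterator& operator++() {
            // On a sentinel, hop to the next block, then step past its first slot.
            if (p_->node == nullptr) p_ = (p_ + 1)->next->begin();
            ++p_;
            return *this;
        }
        bool operator!=(const Iterator& o) const { return p_ != o.p_; }
    private:
        Slot* p_;
    };

    EdgePoolSketch() {
        root_[0].node = nullptr;  // sentinel living in the pool itself
        root_[1].next = nullptr;  // link to the first block
        current_ = blockEnd_ = &root_[0];
        nextLink_ = &root_[1].next;
    }
    EdgePoolSketch(const EdgePoolSketch&) = delete;
    EdgePoolSketch& operator=(const EdgePoolSketch&) = delete;

    // Valid even when the current block is full and no next block exists yet.
    Iterator mark() const { return Iterator(current_); }

    void add(Node* n) {
        assert(n != nullptr);  // nullptr is reserved for sentinels
        if (current_ == blockEnd_) {
            blocks_.push_back(std::make_unique<Block>());
            Block* b = blocks_.back().get();
            *nextLink_ = b;
            current_ = b->begin();
            blockEnd_ = b->sentinel();
            nextLink_ = &b->next();
        }
        (current_++)->node = n;  // writing does not go through an iterator
    }

private:
    Slot    root_[2];
    Slot*   current_;
    Slot*   blockEnd_;
    Block** nextLink_;
    std::vector<std::unique_ptr<Block>> blocks_;
};

// Walks [first, last) the same way the child loop quoted in comment 11 does.
inline std::vector<int> collect(EdgePoolSketch::Iterator first,
                                EdgePoolSketch::Iterator last) {
    std::vector<int> ids;
    for (; first != last; ++first) ids.push_back((*first)->id);
    return ids;
}
```

In this sketch, a mark taken when a block is exactly full sits on that block's sentinel, so a later walk starting there still reaches the first pointer of the block created afterwards.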

Attachment #403974 - Flags: review?(dbaron) → review-

Daniel Veditz [:dveditz]

Updated

•

15 years ago

Keywords: topcrash

Peter Van der Beken [:peterv]

Comment 13

•

15 years ago

Attached patch Add some debugging help (obsolete) — Details — Splinter Review

This adds a number of aborts when certain conditions fail (pointers outside of blocks, null pointers where we didn't expect it, ...). I think we should try to land this on trunk to get some more data out of crash reports. I'm also still looking into adding more stuff on the stack, so we can get more out of minidumps.

Attachment #403974 - Attachment is obsolete: true

Peter Van der Beken [:peterv]

Comment 14

•

15 years ago

Attached patch Add some debugging help (obsolete) — Details — Splinter Review

I'd like to land this on trunk (only), but only until we have a couple of crash reports and minidumps. The aborts will probably move the crash to a different spot, but that should give us slightly more data to go on.

Attachment #404970 - Attachment is obsolete: true

Attachment #406482 - Flags: review?(dbaron)

Damon Sicore (:damons)

Updated

•

15 years ago

Whiteboard: [crashkill]

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 15

•

15 years ago

How is this going to help? NS_ABORT_IF_FALSE is DEBUG-only. Don't you want runtime aborts?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 16

•

15 years ago

Other than that, this looks fine, though. Hopefully it won't be a performance hit. Sorry for the delay in getting to it...

Peter Van der Beken [:peterv]

Comment 17

•

15 years ago

Grmbl, I misread nsDebug.h, I'll switch to NS_RUNTIMEABORT. As for performance, I ran this through tryserver. Most of the numbers didn't really change, shutdown numbers changed a bit but some were down, so not sure how much I need to care about the ones that went up.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 18

•

15 years ago

OK, r=dbaron with NS_RUNTIMEABORT (you need to write your own if-statements with that).

Peter Van der Beken [:peterv]

Comment 19

•

15 years ago

Attached patch Add some debugging help — Details — Splinter Review

I actually went with a CC_RUNTIME_ABORT_IF_FALSE. I'll run this through tryserver again.
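As a rough illustration of the difference discussed in comments 15 to 19, a check that stays active in release builds could look something like the sketch below. The macro name `SKETCH_RUNTIME_ABORT_IF_FALSE`, the `abortHook` indirection and `derefChecked` are hypothetical and are not the actual CC_RUNTIME_ABORT_IF_FALSE definition from the patch; the hook exists only so the sketch can be exercised without killing the process.

```cpp
#include <cassert>  // assert() is the debug-only counterpart: it compiles away under NDEBUG
#include <cstdio>
#include <cstdlib>

// Hypothetical hook type; by default a failed check prints and aborts.
using AbortHook = void (*)(const char* msg, const char* file, int line);

inline void defaultAbort(const char* msg, const char* file, int line) {
    std::fprintf(stderr, "ABORT: %s, file %s, line %d\n", msg, file, line);
    std::abort();
}

inline AbortHook& abortHook() {
    static AbortHook hook = defaultAbort;
    return hook;
}

// Unlike a DEBUG-only assertion, this check is compiled into every build.
// It wraps the hand-written if-statement mentioned in comment 18.
#define SKETCH_RUNTIME_ABORT_IF_FALSE(cond, msg)        \
    do {                                                \
        if (!(cond)) {                                  \
            abortHook()((msg), __FILE__, __LINE__);     \
        }                                               \
    } while (0)

// Example use: refuse to follow a null pointer where none is expected.
inline int derefChecked(const int* p) {
    SKETCH_RUNTIME_ABORT_IF_FALSE(p != nullptr, "unexpected null pointer");
    return p ? *p : 0;  // p is null here only if a non-aborting hook was installed
}
```

Whether an abort of this kind actually produces a crash report depends on how the process is terminated, which is the concern raised later in comment 38.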

Attachment #406482 - Attachment is obsolete: true

Attachment #408038 - Flags: review?(dbaron)

Attachment #406482 - Flags: review?(dbaron)

Johnny Stenback (:jst)

Updated

•

15 years ago

Whiteboard: [crashkill] → [crashkill] ready to land debugging help code?

Peter Van der Beken [:peterv]

Comment 20

•

15 years ago

Debugging help landed (in two pieces): http://hg.mozilla.org/mozilla-central/rev/9bb5e2a5c1ac http://hg.mozilla.org/mozilla-central/rev/80831c195191 I think the second piece missed today's nightly.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

15 years ago

Attachment #408038 - Flags: review?(dbaron) → review+

Peter Van der Beken [:peterv]

Comment 21

•

15 years ago

Since landing this on trunk there have been no new reports submitted on this crash, so I don't have any data yet from the logging patches.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

15 years ago

Are you sure they wouldn't show up under a different signature?

Peter Van der Beken [:peterv]

Comment 23

•

15 years ago

I check for signatures GraphWalker::DoWalk(nsDeque&), EdgePool::CheckIterator(Iterator&) and NodePool::CheckPtrInfo(PtrInfo*). I think that should catch them.

Damon Sicore (:damons)

Comment 24

•

15 years ago

What's next here?

Peter Van der Beken [:peterv]

Comment 25

•

15 years ago

Comment on attachment 408038 (Add some debugging help):

On the beta the frequency of this crash is slightly higher (though not very high either). If we have a quick new beta I'd like to take this on the branch so it rides along, and we actually get some data.

Attachment #408038 - Flags: approval1.9.2?

Samuel Sidler (old account; do not CC)

Reporter

Comment 26

•

15 years ago

Peter: This bug is blocking1.9.2. You don't need approval to land that. :) (We're also planning to update current beta users to a new beta next week, iirc.)

Damon Sicore (:damons)

Updated

•

15 years ago

Whiteboard: [crashkill] ready to land debugging help code? → [crashkill][crashkill-fix] ready to land debugging help code?

Damon Sicore (:damons)

Updated

•

15 years ago

Whiteboard: [crashkill][crashkill-fix] ready to land debugging help code? → [crashkill][crashkill-debug] ready to land debugging help code?

Peter Van der Beken [:peterv]

Comment 27

•

15 years ago

Comment on attachment 408038 (Add some debugging help):

I asked for approval because this isn't really a fix. But anyway, landed on 1.9.2:

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/297f674eb90f
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/6b79d9973d7b

Let's hope we get some reports.

Attachment #408038 - Flags: approval1.9.2?

Jesse Ruderman

Updated

•

15 years ago

Whiteboard: [crashkill][crashkill-debug] ready to land debugging help code? → [crashkill][crashkill-debug][debugging code landed on trunk and 1.9.2]

Peter Van der Beken [:peterv]

Comment 28

•

15 years ago

No new crash reports on trunk or 3.6b2pre :-(.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 29

•

15 years ago

Might these be useful? bp-e05f2c77-cfc4-4962-8f61-ba0142091106 bp-817ea2a1-ffdd-4e0a-821d-e4cdd2091108

Peter Van der Beken [:peterv]

Comment 30

•

15 years ago

David, wouldn't those be for bug 437449? That one seemed related to thread-safety issues?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 31

•

15 years ago

No, the MarkRoots crash has an almost identical statistical profile (core count distribution, module correlations) to this one, and I've been presuming it's the same underlying problem as this one. It's definitely not a threadsafety problem.

Peter Van der Beken [:peterv]

Comment 32

•

15 years ago

Well, MarkRoots doesn't have any of the debugging code.

Peter Van der Beken [:peterv]

Comment 33

•

15 years ago

Two reports on 3.6b2: http://crash-stats.mozilla.com/report/index/bfbedc89-ca77-4783-a519-124532091111 http://crash-stats.mozilla.com/report/index/1e1f7869-c343-46c7-b5d5-163632091112 I have the minidumps, the debugging code doesn't seem to have helped much. Back to trying to figure things out from the assembly.

puffin

Comment 34

•

15 years ago

This bug causes frequent intermittent crashes on our Windows XP (SP3) box. No single action precipitates a crash; it appears to be completely random.

Peter Van der Beken [:peterv]

Comment 35

•

15 years ago

We seem to sometimes have a bogus pointer to the next block. At first I thought we might have a bogus mFirstChild/mLastChild, so we'd walk randomly in the blocks and mistake a null PtrInfo* for the sentinel. But one of the crashes seems to be in the debug code I added, when walking the blocks. We walk the blocks using the blocksize, so in that case we just seem to have a bogus pointer at the right spot (last item in the array). I've looked at the block code again, don't see how it could happen. Maybe something else is corrupting our blocks' memory.
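To make the kind of check being discussed concrete, here is a minimal sketch of testing whether a pointer falls inside any block of a chain of fixed-size blocks. The names `Item`, `Chunk` and `pointsIntoChain` are hypothetical and the layout is simplified; it is not the debugging code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::less gives a total order over pointers

struct Item { int value; };

// Hypothetical fixed-size block with a link to the next block.
struct Chunk {
    static constexpr std::size_t kSize = 8;
    Item   items[kSize];
    Chunk* next = nullptr;
};

// Returns true if p's address falls inside the item array of some chunk.
// Note that the walk itself trusts every `next` pointer: if one of those is
// corrupted, the check crashes while walking, before it can report anything.
inline bool pointsIntoChain(const Chunk* head, const Item* p) {
    std::less<const Item*> before;
    for (const Chunk* c = head; c != nullptr; c = c->next) {
        const Item* first = c->items;
        const Item* last  = c->items + Chunk::kSize;  // one past the end
        if (!before(p, first) && before(p, last)) return true;
    }
    return false;
}
```

A check like this can flag a pointer that lies outside every block, but it relies on the chain links being intact, which matches the observation that one crash happened inside the block walk itself.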

Peter Van der Beken [:peterv]

Comment 36

•

15 years ago

(In reply to comment #34)
> This bug causes frequent intermittent crashes on our Windows XP (SP3) box.

How frequent? Any chance we could get you to generate a full dump when it crashes (I think we can use DrWatson for that)?

Damon Sicore (:damons)

Comment 37

•

15 years ago

-'ing.

blocking2.0: --- → alpha1

Flags: blocking1.9.2+ → blocking1.9.2-

(not currently active) Ted Mielczarek

Comment 38

•

15 years ago

(In reply to comment #18)
> OK, r=dbaron with NS_RUNTIMEABORT (you need to write your own if-statements
> with that).

The NS_RUNTIMEABORT comments are scary, they sound to me like we wouldn't be able to trigger Breakpad on all platforms with that. Is that really what you want?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 39

•

15 years ago

I think we should back out the debugging code on m-c and 1.9.2.

Peter Van der Beken [:peterv]

Comment 40

•

15 years ago

I agree about 1.9.2 (http://hg.mozilla.org/releases/mozilla-1.9.2/rev/96a497f82546 pushed earlier this week), but I don't see why we want to remove it from m-c yet. I think we should make sure it brings up breakpad, and see if anything shows up on crash-stats then.

Peter Van der Beken [:peterv]

Comment 41

•

15 years ago

Attached patch Make debugging code bring up breakpad — Details — Splinter Review

I'd like to take this on trunk right now (unless we get a patch for bug 532490 in the meantime), and see if anything shows up in crash-stats.

Attachment #415830 - Flags: review?(dbaron)

Peter Van der Beken [:peterv]

Updated

•

15 years ago

Whiteboard: [crashkill][crashkill-debug][debugging code landed on trunk and 1.9.2] → [crashkill][crashkill-debug][debugging code landed on trunk]

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

15 years ago

Attachment #415830 - Flags: review?(dbaron) → review+

Johnny Stenback (:jst)

Comment 42

•

15 years ago

Not blocking the first alpha on this bug.

blocking2.0: alpha1 → beta1

Felix Miata

Comment 43

•

15 years ago

I got this reported yesterday using Mozilla/5.0 (X11; U; Linux i686; rv:1.9.1.9pre) Gecko/20100208 SeaMonkey/2.0.4pre, but when it happened again (the same crash, I presume) today, the crash reporter apparently ignored it. Today's current URL was http://us.imdb.com/media/rm1043761920/nm0209289 and I was attempting to return to http://us.imdb.com/name/nm0209289/mediaindex when the crash occurred. As I was counting the number of open tabs (after restore/restart; 23 counted, 2 left to count, total 25) so as to proceed with this comment, it crashed again, and again the reporter failed to come up. SM was started from Konsole, and this is that window's resulting output:

The program 'seamonkey-bin' received an X Window System error.
This probably reflects a bug in the program.
The error was 'RenderBadPicture (invalid Picture parameter)'.
(Details: serial 1998858 error_code 182 request_code 155 minor_code 5)
(Note to programmers: normally, X errors are reported asynchronously; that is, you will receive the error a while after causing it. To debug your program, run it with the --sync command line option to change this behavior. You can then get a meaningful backtrace from your debugger if you break on the gdk_x_error() function.)

Felix Miata

Comment 44

•

15 years ago

In 8 or so hours since comment 43 it crashed again, and again. Then the machine locked up and would not reboot into Linux. Main memory failed in a big way according to Memtest86+ 4.0.

chris hofmann

Updated

•

15 years ago

Blocks: 557161

Damon Sicore (:damons)

Comment 45

•

15 years ago

Moving this to beta2. Not seeing a lot of movement here, but yell if you think this should block the first beta.

blocking2.0: beta1+ → beta2+

chris hofmann

Comment 46

•

15 years ago

Is the debugging code talked about in comments 39 - 41 still on mozilla-central? That means it would be going out in the beta. We should figure out if that's a good idea even if we don't have a good understanding of the cause of the crash or the fix yet.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 47

•

15 years ago

Moving this to beta3, where it will block hard at least on ensuring that the debugging code has been removed - not sure where it lands as a blocker for the fix.

blocking2.0: beta2+ → beta3+

Mike Beltzner [:beltzner, not reading bugmail]

Comment 48

•

15 years ago

Has the debugging code been removed? Can we get an answer to comment 46, please?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 49

•

15 years ago

The debugging code is still present in mozilla-central (and thus still needs to be removed).

Mike Beltzner [:beltzner, not reading bugmail]

Comment 50

•

15 years ago

PeterV: can we get that debugging code removed by Monday, Aug 2 at 23:00 PT please, so we can bump this back off the blocking list as per comment 47?

Peter Van der Beken [:peterv]

Comment 51

•

15 years ago

I just backed this out.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 52

•

15 years ago

Peter's backout is: http://hg.mozilla.org/mozilla-central/rev/01877f113dab - thanks! Moving back to blocking2.0:? for retriage on the crash issue.

blocking2.0: beta3+ → ?

Benjamin Smedberg

Updated

•

14 years ago

Summary: top crash [@ GraphWalker::DoWalk(nsDeque&)] → top crash [@ GraphWalker::DoWalk(nsDeque&)][@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)]

chris hofmann

Comment 54

•

14 years ago

About 1500 crashes per day. Current volumes per release look like this:

checking --- GraphWalker::DoWalk.nsDeque.. 20101003-crashdata.csv
found in: 3.6.10 3.5.13 3.6.8 3.0.19 3.6.3 3.6.6 3.6 3.6.9 3.6.4 3.5b4 3.5.7 3.6b5 3.6.2 3.5.5 3.5.11 3.5.3 3.1b2 3.5.9 3.5.2 3.0b2 3.6b1 3.6.7 3.5.6 3.5.10 3.5 3.0b5 3.0.5 3.0.17 3.0.10 3.5.8 3.5.12 3.5.1 3.1b3 3.0.9 3.0.6 3.0.18 3.0.15 3.0.14 3.0

release   total-crashes  GraphWalker::DoWalk.nsDeque.. crashes  pct.
all       353258         1213                                   0.00343375
3.6.10    211184         822                                    0.00389234
3.5.13    17578          112                                    0.0063716
3.6.8     19390          65                                     0.00335224

checking --- GraphWalker.scanVisitor.::DoWalk.nsDeque.. 20101003-crashdata.csv
found in: 4.0b6 4.0b2 4.0b4 4.0b7pre 4.0b1 4.0b5 4.0b3 3.7a1

release   total-crashes  GraphWalker.scanVisitor.::DoWalk.nsDeque.. crashes  pct.
all       353258         127                                                 0.000359511
4.0b6     24891          87                                                  0.00349524
4.0b2     1209           12                                                  0.00992556
4.0b4     1853           10                                                  0.00539665
4.0b7pre  2328           5                                                   0.00214777
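Despite its label, the "pct." column above appears to be a plain fraction (crashes with the signature divided by total crashes) rather than a percentage: 1213 / 353258 is about 0.0034, or roughly 0.34%. A minimal sketch of that arithmetic, with the function names `signatureShare` and `signatureSharePercent` being assumptions for this example:

```cpp
#include <cassert>

// Fraction of all reported crashes that carry a given signature.
inline double signatureShare(long signatureCrashes, long totalCrashes) {
    assert(totalCrashes > 0);
    return static_cast<double>(signatureCrashes) /
           static_cast<double>(totalCrashes);
}

// The same share expressed as a percentage.
inline double signatureSharePercent(long signatureCrashes, long totalCrashes) {
    return 100.0 * signatureShare(signatureCrashes, totalCrashes);
}
```

For example, `signatureShare(822, 211184)` reproduces the 3.6.10 row to the precision shown in the data.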

Benjamin Smedberg

Comment 55

•

14 years ago

Not a serious regression, and without clues as how to reproduce probably not a blocker. I'd love to have more information, though. Correlation reports would be especially helpful.

blocking2.0: ? → -

Edgar Hatchel

Comment 57

•

14 years ago

So is this saying that the bug still exists from all the way back to 2009-06-23 18:57:38 PDT? That it still hasn't been fixed? If so is there an ETA of a fix?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 58

•

14 years ago

Nobody knows how to cause the crash to happen, and as a result no developer has been able to observe the crash happening and figure out why.

Edgar Hatchel

Comment 59

•

14 years ago

Thanks, at least I have an answer to the question.

Scoobidiver (away)

Comment 60

•

14 years ago

It is #9 top crasher in 4.0b8 for the last week.

Scoobidiver (away)

Updated

•

14 years ago

Keywords: crash

Scoobidiver (away)

Comment 61

•

14 years ago

Still #9 in 4.0b9.

GraphWalker<scanVisitor>::DoWalk(nsDeque&)|EXCEPTION_ACCESS_VIOLATION_READ (85 crashes)
18% (15/85) vs. 6% (805/14431) {AB2CE124-6272-4b12-94A9-7303C7397BD1} (Skype)
26% (22/85) vs. 14% (2016/14431) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
25% (21/85) vs. 16% (2373/14431) engine@conduit.com
13% (11/85) vs. 7% (1020/14431) {CAFEEFAC-0016-0000-0022-ABCDEFFEDCBA} (Java console)
92% (78/85) vs. 87% (12517/14431) testpilot@labs.mozilla.com (Mozilla Labs - Test Pilot, https://addons.mozilla.org/addon/13661)

Brian Carpenter [:geeknik]

Comment 62

•

14 years ago

Minefield just crashed on me while I was away from my PC, the crash report pointed me here. https://crash-stats.mozilla.com/report/index/bp-ea34b673-743a-44e2-ad7a-729f22110213

chris hofmann

Comment 63

•

14 years ago

Scoobidiver (away)

Comment 64

•

14 years ago

It starts showing up as #4 top crasher in 4.0 RC1. Some comments say: "I was on the Addons page, and had clicked to go to top rated personas when it crashed." "was downloading some stuff and got booted off the internet" "Just looking around on Amazon"

Robert Kaiser

Comment 65

•

14 years ago

#10 on 5.0b3 right now, FWIW.

Scoobidiver (away)

Comment 66

•

14 years ago

(In reply to comment #65)
> #10 on 5.0b3 right now, FWIW.

And #3 top crasher without hangs.

Nobody; OK to take it and work on it

Assignee

Updated

•

14 years ago

Crash Signature: [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)]

Scoobidiver (away)

Updated

•

13 years ago

Crash Signature: [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] → [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<ScanBlackVisitor>::DoWalk(nsDeque&) ]

Phil Ringnalda (:philor)

Comment 68

•

13 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=7562031&tree=Mozilla-Inbound - so there's one way to repro, run mochitest-other 50K times (or however many pushes we've actually triggered builds for), you'll hit it once.

Sheila Mooney

Comment 69

•

13 years ago

Still appears consistently in the top 20 crashes for releases. Can we investigate this further?

Phil Ringnalda (:philor)

Comment 70

•

13 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=8052266&tree=Mozilla-Inbound

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 71

•

13 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=9242840&tree=Mozilla-Beta&full=1#error0

Sheila Mooney

Comment 72

•

13 years ago

This has been a top crash for a long time. The stack that's consistently high is GraphWalker<scanVisitor>::DoWalk(nsDeque&). We have just over 3500 on 10.0 in the past week. It's not a startup crash. Is there anything we can do to investigate this further?

Andrew McCreight [:mccr8]

Comment 73

•

13 years ago

About half of them are null-derefs. Maybe we can add some release-mode assertions to push the crash to an earlier point where it would be more useful. I can take a look at that after I finish with the NoteXPCOMChild crashes.

Whiteboard: [crashkill][crashkill-debug][debugging code landed on trunk] → [crashkill][crashkill-debug]

Andrew McCreight [:mccr8]

Updated

•

13 years ago

Assignee: peterv → continuation

Sheila Mooney

Comment 74

•

13 years ago

mccr8, that would be awesome.

Andrew McCreight [:mccr8]

Updated

•

13 years ago

Depends on: 727604

Andrew McCreight [:mccr8]

Comment 75

•

13 years ago

WalkFromRoots is a similar signature that has shown up recently. Probably the same thing, just showing up differently in the crash reports due to different inlining.

Crash Signature: [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<ScanBlackVisitor>::DoWalk(nsDeque&) ] → [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<ScanBlackVisitor>::DoWalk(nsDeque&) ] [@ GraphWalker<scanVisitor>::WalkFromRoots(GCGraph&)]

Scoobidiver (away)

Updated

•

13 years ago

Crash Signature: [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<ScanBlackVisitor>::DoWalk(nsDeque&) ] [@ GraphWalker<scanVisitor>::WalkFromRoots(GCGraph&)] → GraphWalker<ScanBlackVisitor>::Walk] [@ GraphWalker<scanVisitor>::WalkFromRoots(GCGraph&)] [@ GraphWalker<scanVisitor>::WalkFromRoots] [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk…

Summary: top crash [@ GraphWalker::DoWalk(nsDeque&)][@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] → Crash @ GraphWalker

Robert Kaiser

Comment 76

•

13 years ago

Still a topcrash - mccr8, did you get somewhere with what you mentioned in comment #73?

Andrew McCreight [:mccr8]

Comment 77

•

13 years ago

I landed some assertions and un-inlining that are currently on Nightly and Aurora. No progress in figuring out what the problem is. I don't know if I should back out the changes or not. I don't think it will affect performance to any measurable extent, but I could check. It isn't that common on Nightly. If you add the two GraphWalker signatures up on Nightly, you get a ranking of around 65. On Aurora, around 45. On beta, it shows up at 16. In release 11 it is at 12. I'm not sure why there is such a large difference. I've noticed it before. It could be malware/junkware related, or perhaps our cycle collector optimizations, which make the CC touch fewer things in memory, just avoid touching bad things, so it isn't showing up here.

Sheila Mooney

Comment 78

•

13 years ago

Still in the top 20 for crashes on release, Fx12.

Sheila Mooney

Comment 79

•

13 years ago

This has gone way down in volume on all channels. Still a valid crash but removing the top crash keyword.

Keywords: topcrash

Andrew McCreight [:mccr8]

Comment 80

•

12 years ago

Currently this is around #90 on 16, #80 on 17. Either the move to a new compiler fixed a compiler bug, or with our CC optimizations we're touching bad memory less.

Andrew McCreight [:mccr8]

Updated

•

12 years ago

Version: 1.9.1 Branch → Trunk

Wayne Mery (:wsmwk)

Comment 81

•

12 years ago

top 50 crash for TB17

Whiteboard: [crashkill][crashkill-debug] → [crashkill][crashkill-debug][tbird crash]

Andrew McCreight [:mccr8]

Comment 82

•

11 years ago

Currently about #288 on Nightly.

Assignee: continuation → nobody

Jesper Hansen

Comment 83

•

10 years ago

Different crashes: https://crash-stats.mozilla.com/search/?product=Firefox&signature=DoWalk&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature These account for 1824 crashes in the last 7 days. Adding Thunderbird to the mix raises the number to 1860. GraphWalker<T>::DoWalk(nsDeque&) is currently placed at #104 for 38.0.5. Top-crashes, however, only counts 948 of these, meaning about half of them come from other versions of Firefox. Using the search numbers would place it in the top 50 of Firefox top-crashes.

BMO Automation

Updated

•

9 years ago

Crash Signature: GraphWalker<ScanBlackVisitor>::Walk] [@ GraphWalker<scanVisitor>::WalkFromRoots(GCGraph&)] [@ GraphWalker<scanVisitor>::WalkFromRoots] → GraphWalker<ScanBlackVisitor>::Walk] [@ GraphWalker<scanVisitor>::WalkFromRoots(GCGraph&)] [@ GraphWalker<scanVisitor>::WalkFromRoots] [@ GraphWalker::DoWalk] [@ GraphWalker<T>::DoWalk] [@ GraphWalker<T>::Walk] [@ GraphWalker<T>::WalkFromRoots]

alex_mayorga

Comment 85

•

9 years ago

Hello! I ended up here from https://support.mozilla.org/en-US/questions/1098307 There were 4317 crashes in the past month per https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=GraphWalker%3CT%3E%3A%3ADoWalk#tab-sigsummary Updating flags accordingly, FWIW. Thanks!

status-firefox42: --- → affected

status-firefox43: --- → affected

status-firefox44: --- → affected

status-firefox45: --- → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Flags: needinfo?(norikachi003)

Asif Youssuff

Updated

•

7 years ago

status-firefox56: --- → affected

status-firefox57: --- → affected

status-firefox58: --- → affected

status-firefox-esr52: --- → affected

User Dderss

Comment 92

•

7 years ago

The bug is back -- 9 (!) years later: https://crash-stats.mozilla.com/report/index/cbfa60bf-d335-4ace-9e09-939341171106 I was not doing anything, I was away from the browser. The page with YouTube's subscriptions just died on its own.

John

Comment 94

•

6 years ago

I was not doing anything; I was away from the browser. Maybe it was the ads script on the site: http://wiki.edu.vn/wiki http://wikideu.com/wiki/

Liz Henry (:lizzard) (relman/hg->git project)

Comment 95

•

6 years ago

This is now a fairly high volume crash on release 62, for example, for GraphWalker<T>::DoWalk there are 2400+ crashes in the last week: https://crash-stats.mozilla.com/signature/?signature=GraphWalker%3CT%3E%3A%3ADoWalk

status-firefox56: affected → wontfix

status-firefox57: affected → wontfix

status-firefox58: affected → wontfix

status-firefox62: --- → wontfix

status-firefox63: --- → affected

status-firefox64: --- → affected

status-firefox-esr52: affected → wontfix

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Updated

•

5 years ago

Restrict Comments: true

Wayne Mery (:wsmwk)

Comment 101

•

5 years ago

(In reply to Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal) from comment #95)

> This is now a fairly high volume crash on release 62, for example, for
> GraphWalker<T>::DoWalk there are 2400+ crashes in the last week: https://crash-stats.mozilla.com/signature/?signature=GraphWalker%3CT%3E%3A%3ADoWalk

Current rate is about 1,400 per week for Firefox.

TCW's Thunderbird crash bp-c8782125-241c-489a-81d9-7b91c0200712

BugBot [:suhaib / :marco/ :calixte]

Comment 102

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criteria:

Top 20 desktop browser crashes on release (startup)
Top 10 content process crashes on beta
Top 10 content process crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash, topcrash-startup

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Crash Signature: [@ GraphWalker::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<scanVisitor>::DoWalk] [@ GraphWalker<ScanBlackVisitor>::DoWalk(nsDeque&)] [@ GraphWalker<ScanBlackVisitor>::DoWalk] [@ GraphWalker<ScanBlackVisitor>::Walk… → [@ GraphWalker<scanVisitor>::DoWalk] [@ GraphWalker<ScanBlackVisitor>::DoWalk] [@ GraphWalker<ScanBlackVisitor>::Walk] [@ GraphWalker<scanVisitor>::WalkFromRoots] [@ GraphWalker::DoWalk] [@ GraphWalker<T>::DoWalk] [@ GraphWalker<T>::Walk] [@ GraphW…

Andrew McCreight [:mccr8]

Updated

•

2 years ago

blocking2.0: - → ---

status1.9.1: wanted → ---

status-firefox42: affected → ---

status-firefox43: affected → ---

status-firefox44: affected → ---

status-firefox45: affected → ---

status-firefox56: wontfix → ---

status-firefox57: wontfix → ---

status-firefox58: wontfix → ---

status-firefox62: wontfix → ---

status-firefox63: affected → ---

status-firefox64: affected → ---

status-firefox-esr52: wontfix → ---

BugBot [:suhaib / :marco/ :calixte]

Comment 103

•

2 years ago

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash-startup

BMO Automation

Updated

•

2 years ago

Severity: critical → S2

BugBot [:suhaib / :marco/ :calixte]

Comment 104

•

2 years ago

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

2 years ago

Crash Signature: GraphWalker<T>::WalkFromRoots] [@ EdgePool::Iterator::operator* ] → GraphWalker<T>::WalkFromRoots] [@ EdgePool::Iterator::operator*] [@ PtrInfo::WasTraversed]

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Component: XPCOM → Cycle Collector

Keywords: stalled

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Severity: S2 → S3

Priority: P2 → P5

Sylvestre Ledru [:Sylvestre]

Updated

•

1 year ago

status-firefox119: --- → affected

status-firefox120: --- → affected

status-firefox121: --- → affected

status-firefox-esr115: --- → affected

Sylvestre Ledru [:Sylvestre]

Updated

•

1 year ago

status-firefox119: affected → wontfix

status-firefox122: --- → affected

Gabriele Svelto [:gsvelto]

Comment 105

•

1 year ago

Removing all the signatures that don't have crashes on file anymore.

Crash Signature: [@ GraphWalker<scanVisitor>::DoWalk] [@ GraphWalker<ScanBlackVisitor>::DoWalk] [@ GraphWalker<ScanBlackVisitor>::Walk] [@ GraphWalker<scanVisitor>::WalkFromRoots] [@ GraphWalker::DoWalk] [@ GraphWalker<T>::DoWalk] [@ GraphWalker<T>::Walk] [@ GraphW… → [@ GraphWalker<T>::DoWalk] [@ EdgePool::Iterator::operator*] [@ PtrInfo::WasTraversed]

Gabriele Svelto [:gsvelto]

Comment 106

•

1 year ago

I've dug into the three remaining signatures and found that @ PtrInfo::WasTraversed and @ GraphWalker<T>::DoWalk are clearly caused by bad hardware. Many of the crashes under those signatures have been detected as having a bit-flip and come from older machines; nothing really new there. @ EdgePool::Iterator::operator*, on the other hand, looks different and could be caused by a real bug. A handful of crashes under that signature have been caused by bad memory (this one is a good example). However, crashes like that are a minority: the vast majority of crashes under that signature are dereferencing a NULL pointer, not an address that looks like the result of a bit-flip.
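To make "an address that looks like the result of a bit-flip" concrete, here is a minimal sketch of one possible check. It is an assumption for illustration only, not the detection that the crash tooling actually performs, and `DiffersByOneBit` is a hypothetical name: it asks whether an observed pointer value differs from an expected one in exactly one bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heuristic (not taken from any Mozilla tool): returns true when
// `observed` and `expected` differ in exactly one bit position.
inline bool DiffersByOneBit(std::uint64_t observed, std::uint64_t expected) {
  const std::uint64_t diff = observed ^ expected;
  // A non-zero value with a single set bit satisfies (diff & (diff - 1)) == 0.
  return diff != 0 && (diff & (diff - 1)) == 0;
}
```

A plain NULL dereference where NULL was also the value stored in memory gives a difference of zero, so a check like this would not flag it, which matches the distinction drawn above.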

The exact point of those crashes is here, and in particular the (mPointer + 1)->block expression is what's hitting the NULL pointer. Here's what's peculiar about this: *mPointer yields a NULL pointer, which causes the condition in this line to be satisfied, so we enter the block where we crash. However, as the comment in the block states, that condition means we've found a sentinel, and the following element in the array should be non-NULL. I've opened several minidumps, and in all of them I found that both *mPointer and *(mPointer + 1) yielded NULL pointers. That is, the array we're iterating over contains two adjacent NULL elements, even though the code suggests that this should never happen.

Unfortunately I don't know the cycle-collector well enough to be able to tell what's going on, but it doesn't seem accidental.
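The sentinel logic described in this comment can be sketched as a small model. This is a simplified illustration under stated assumptions, not the code in nsCycleCollector.cpp: `Node`, `Slot`, `Block`, `kSlots`, `Deref` and `CheckedDeref` are hypothetical names, and only the `(p + 1)->block` access mirrors the expression quoted above. Instead of crashing, the checked version reports the "two adjacent NULLs" state seen in the minidumps.

```cpp
#include <cassert>
#include <cstddef>

struct Node { int id; };
struct Block;

// Each slot holds either an edge (pointer to a node) or a link to the next block.
union Slot {
  Node* node;
  Block* block;
};

struct Block {
  static constexpr std::size_t kSlots = 4;  // tiny, for demonstration only
  Slot slots[kSlots];  // by convention here: last two slots are NULL sentinel, then link
};

enum class Deref { Edge, DoubleNull };

// Checked dereference: a NULL slot is treated as the sentinel, so the slot
// after it is expected to hold a non-NULL link to the next block.
inline Deref CheckedDeref(const Slot* p, Node** out) {
  if (p->node) {  // ordinary edge
    *out = p->node;
    return Deref::Edge;
  }
  const Block* next = (p + 1)->block;  // the access that crashes in the reports
  if (!next) {  // two adjacent NULLs: the state found in the minidumps
    *out = nullptr;
    return Deref::DoubleNull;
  }
  *out = next->slots[0].node;  // first edge of the next block
  return Deref::Edge;
}
```

In this model, a caller that gets `Deref::DoubleNull` has found exactly the condition that the unchecked dereference turns into a NULL-pointer crash.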

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 107

•

1 year ago

A few comments (as the original author of the iterator in 34606b4dd39f):

The EdgePool is basically storage for large blocks of edges in the graph that the cycle collector builds by calling its traversal methods. The PtrInfo (which are the nodes in the graph) store iterators for the first and last-plus-one outgoing edges, which are all stored adjacent to each other (logically, according to the iterators). But the edges are allocated in chunks, so the iterators sometimes need to jump from one chunk to the next.
They use the typical C++ iterator pattern where the "start" iterator points to the first item and the "end" iterator points to one past the last item.
I'm not sure why operator* needs that code to look at the next block at all; in hindsight it feels like it should only be in operator++. It seems like it should be invalid to dereference the one-past-the-end iterator. But if code actually depends on dereferencing the one-past-the-end iterator, then that could be the source of the problem. I think (though I didn't really reread the code carefully) that such code would crash if the very last edge allocated in the graph exactly aligned with the end of a block and we needed to dereference the iterator corresponding to that very last edge, since at that point I think the next block wouldn't be allocated at all.
(In theory, you could also reach a null-dereference crash as a result of memory corruption if the null sentinel itself were corrupted into non-null and then the traversal continued past the end of the array. However, such a crash seems likely to crash by null-dereference in only a minority of cases.)
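The chunked layout described in these comments can be sketched as follows. This is a toy model under stated assumptions, not the real EdgePool: `Chunk`, `EdgeIter`, `EdgePoolModel`, `kEdges`, `Mark`, `ChunkCount` and `Collect` are hypothetical names, and the chunk size is tiny so that the jump between chunks is easy to see. Each chunk ends with a NULL sentinel slot followed by a link slot, and a new chunk is only allocated when the next edge is added.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node { int id; };
struct Chunk;
union Slot { Node* node; Chunk* block; };

struct Chunk {
  static constexpr std::size_t kEdges = 3;  // tiny on purpose
  Slot slots[kEdges + 2];                   // edges, NULL sentinel, link to next chunk
  Chunk() {
    for (Slot& s : slots) s.node = nullptr;
    slots[kEdges + 1].block = nullptr;      // no next chunk yet
  }
  Slot* Sentinel() { return &slots[kEdges]; }
};

class EdgeIter {
 public:
  explicit EdgeIter(Slot* p) : p_(p) {}
  // Parked on a sentinel: the edge lives in the first slot of the next chunk.
  Node* operator*() const {
    return p_->node ? p_->node : (p_ + 1)->block->slots[0].node;
  }
  EdgeIter& operator++() {
    if (!p_->node) p_ = (p_ + 1)->block->slots;  // hop to the next chunk first
    ++p_;
    return *this;
  }
  bool operator!=(const EdgeIter& o) const { return p_ != o.p_; }
 private:
  Slot* p_;
};

class EdgePoolModel {
 public:
  EdgePoolModel() {
    chunks_.emplace_back(new Chunk());
    cur_ = chunks_.back()->slots;
    end_ = chunks_.back()->Sentinel();
  }
  void Add(Node* n) {
    assert(n);           // NULL is reserved for the sentinel
    if (cur_ == end_) {  // chunk full: allocate and link the next one lazily
      chunks_.emplace_back(new Chunk());
      (end_ + 1)->block = chunks_.back().get();
      cur_ = chunks_.back()->slots;
      end_ = chunks_.back()->Sentinel();
    }
    (cur_++)->node = n;
  }
  // Current position; used as a node's "first" or "one past last" iterator.
  EdgeIter Mark() const { return EdgeIter(cur_); }
  std::size_t ChunkCount() const { return chunks_.size(); }
 private:
  std::vector<std::unique_ptr<Chunk>> chunks_;
  Slot* cur_;
  Slot* end_;
};

// Collects the ids of the edges in the half-open range [first, last).
inline std::vector<int> Collect(EdgeIter first, EdgeIter last) {
  std::vector<int> ids;
  for (; first != last; ++first) ids.push_back((*first)->id);
  return ids;
}
```

In this model, if a node's edges exactly fill a chunk, its one-past-the-end iterator is parked on the sentinel, and the same position serves as the start iterator of the next node. Comparing that iterator is always safe, but dereferencing it is only safe once a later `Add` has allocated and linked the next chunk.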

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 108

•

1 year ago

On second thoughts, we probably need that dereferencing behavior because we sometimes use the end-of-block iterator as the start iterator of a new chunk. So never mind... I think.

Gabriele Svelto [:gsvelto]

Comment 109

•

1 year ago

I dug through a dozen crashes to check for the values that are immediately before the two NULLs, and I found two different patterns. One is this:

00 00 01 54 6d 38 1a 08 <--- looks like a valid pointer
00 00 01 54 8c ac 68 88 <--- looks like a valid pointer
00 00 01 54 73 52 a7 28 <--- looks like a valid pointer
00 00 01 54 8b 4f d4 28 <--- looks like a valid pointer
00 00 00 00 00 00 00 00 <--- mPointer points here
00 00 00 00 00 00 00 00
... all zeroes past this point

Or this:

00 00 02 18 71 2e e4 00 <--- looks like a valid pointer
00 00 00 00 00 00 00 02 <--- 0x2 constant?
00 00 02 18 7d 28 30 00 <--- again, what looks like a valid pointer
00 00 02 18 7d 28 30 28 <--- so does this
00 00 00 00 00 00 00 00 <--- mPointer points here
00 00 00 00 00 00 00 00
00 00 02 18 78 ba 88 00 <--- looks like a valid pointer again
00 00 00 00 00 00 00 00 <--- one more NULL

I found three crashes with each pattern, and I'm fairly confident that they come from different machines. The second pattern surprised me quite a bit: the pointer-0x2-pointer-pointer-null-null sequence doesn't look like something that would occur accidentally.

Gabriele Svelto [:gsvelto]

Comment 110

•

1 year ago

The @ EdgePool::Iterator::operator* signature is interesting and, save for the odd crash clearly caused by bad hardware, seems to be unrelated to the others. I'll split it out into a separate bug for further investigation.

Andrew McCreight [:mccr8]

Updated

•

1 year ago

URL: http://crash-stats.mozilla.com/report...

v1, 15 years ago, Peter Van der Beken [:peterv], 549 bytes, patch, dbaron: review-
Add some debugging help, 15 years ago, Peter Van der Beken [:peterv], 2.99 KB, patch
Add some debugging help, 15 years ago, Peter Van der Beken [:peterv], 4.02 KB, patch
Add some debugging help, 15 years ago, Peter Van der Beken [:peterv], 4.73 KB, patch, dbaron: review+
Make debugging code bring up breakpad, 15 years ago, Peter Van der Beken [:peterv], 986 bytes, patch, dbaron: review+