1232229 - spike in crashes in FinalizeTypedArenas<T> starting in 2015-12-11 nightly

There are no checkins from GC peers in that range, so this is likely some form of heap corruption. There's very little touching C++ and most of that isn't interacting with JS. It looks like Shu and Jan may be standing closest.

Flags: needinfo?(shu)

Flags: needinfo?(jdemooij)

Shu-yu Guo [:shu]

Comment 2

•

10 years ago

Hm, any tips on how to start debugging this short of backing out patches and seeing if the crashes go away? That signature is not helpful.

Terrence Cole [:terrence]

Comment 3

•

10 years ago

After discussion with Shu and Jan on IRC, this appears to be totally unactionable.

Flags: needinfo?(shu)

Flags: needinfo?(jdemooij)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 4

•

10 years ago

This spiked on aurora a few days later, starting with build 20151217004008. Pushlog for that range is: https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=1d43e2723082dadfbca37e0babe873e8c2ea45eb&tochange=1ba4f6d91e637348dc0cdc7e9d311c4cddbc8a5e which isn't useful.

status-firefox44: unaffected → affected

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 5

•

10 years ago

It's possible the source of memory corruption is bug 1232231. Also, the aurora bump matches when we merged. See bug 1232231 comment 2.

Depends on: 1232231

Patrick McManus [:mcmanus]

Comment 6

•

10 years ago

https://crash-stats.mozilla.com/report/index/68e3fa2e-ef6b-46db-b920-eeb7d2160106 I triggered this while on the ny times website.. I allowed a google federated login to see some comments on an article.. maybe that will help?

Sylvestre Ledru [:Sylvestre]

Comment 7

•

10 years ago

Topcrash, tracking for now.

tracking-firefox45: ? → +

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: --- → ?

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: ? → m9+

Naveed Ihsanullah [:naveed]

Updated

•

10 years ago

Assignee: nobody → terrence

Andrew McCreight [:mccr8]

Comment 8

•

10 years ago

Is there any way to look at what the top crashes on Nightly were in the period before 12-11? I don't see any other GC crashes in the top 30 crashes on Nightly, so maybe this is just a signature change due to inlining.

Terrence Cole [:terrence]

Comment 9

•

10 years ago

Attached patch more_instrumentation_and_fencing-v0.diff

As discussed in IRC, let's find out if the list head is corrupted and add more fencing around the BFS state to see if we can narrow down the problem.
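A minimal sketch of what "more fencing around the BFS state" could look like, assuming the background-finalize state is a flag published by the background thread and read by the main thread. `BFState`, `ArenaListModel`, and both functions are hypothetical stand-ins, not the code in the attached patch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the GC's background-finalize state flag.
enum class BFState : int { Idle, Running, Done };

struct ArenaListModel {
    int head = 0;  // stands in for the list head pointer
    std::atomic<BFState> state{BFState::Idle};
};

// Background thread: finish all list writes, then publish Done with release
// ordering so those writes are visible to whoever observes Done.
inline void backgroundFinalize(ArenaListModel& list) {
    list.state.store(BFState::Running, std::memory_order_relaxed);
    list.head = 42;  // the "sweeping" work on the list
    list.state.store(BFState::Done, std::memory_order_release);
}

// Main thread: the acquire load pairs with the release store above, so once
// Done is observed the write to head is guaranteed to be visible.
inline int waitAndReadHead(ArenaListModel& list) {
    while (list.state.load(std::memory_order_acquire) != BFState::Done) {
        std::this_thread::yield();
    }
    return list.head;
}
```

In a two-thread setup, `backgroundFinalize` would run on a helper thread while the main thread calls `waitAndReadHead`. With relaxed ordering on both sides, the read of `head` would not be ordered after the flag check.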

Attachment #8715386 - Flags: review?(emanuel.hoogeveen)

Terrence Cole [:terrence]

Updated

•

10 years ago

Keywords: leave-open

Emanuel Hoogeveen [:ehoogeveen]

Comment 10

•

10 years ago

Comment on attachment 8715386: more_instrumentation_and_fencing-v0.diff
Review of attachment 8715386:
Looks fine, hope it tells us something.

Attachment #8715386 - Flags: review?(emanuel.hoogeveen) → review+

Terrence Cole [:terrence]

Comment 11

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/8c41c8315eeafe568b8d1ce99fedbd79eee62586 Bug 1232229 - Add some instrumentation and more fencing to ArenaLists; r=ehoogeveen

Patrick McManus [:mcmanus]

Comment 12

•

10 years ago

fwiw - I triggered this on slate.com somehow today. Can't repro. https://crash-stats.mozilla.com/report/index/990247a6-8b9e-4b14-bd87-659602160203

Terrence Cole [:terrence]

Comment 13

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/ad652aac6b74eb862d79a52b9e516531b51c95df Backed 8c41c8315eea (bug 1232229) for breaking all the things on a CLOSED TREE.

Terrence Cole [:terrence]

Comment 14

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/64392b4fdaad6cefea48df5d9fd144006defda3c Bug 1232229 - Add some instrumentation and more fencing to ArenaLists; r=ehoogeveen

Emanuel Hoogeveen [:ehoogeveen]

Comment 15

•

10 years ago

For context on why Terrence relanded this unchanged: I didn't see any crashes locally or on treeherder, and upon doing a new build, Terrence didn't either (he suspects corruption related to his failing HDDs).

Carsten Book [:Tomcat]

Comment 16

•

10 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8c41c8315eea https://hg.mozilla.org/mozilla-central/rev/ad652aac6b74 https://hg.mozilla.org/mozilla-central/rev/64392b4fdaad

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: m9+ → +

Priority: -- → P1

Liz Henry (:lizzard) (relman/hg->git project)

Comment 17

•

10 years ago

Tracking for 46 and 47. This is the #2 topcrash for 46 aurora and #5 in 47 nightly.

status-firefox46: --- → affected

status-firefox47: --- → affected

tracking-firefox46: --- → +

tracking-firefox47: --- → +

Biru [:poiru]

Updated

•

10 years ago

Blocks: 1246180

Emanuel Hoogeveen [:ehoogeveen]

Comment 18

•

10 years ago

So far, nothing has shown up with sweepBackgroundThings or its callers as the signature (none of those have uintptr_t(-1) as the address):
https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=js%3A%3Agc%3A%3AGCRuntime%3A%3AsweepBackgroundThings#aggregations
https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=free_impl+|+js%3A%3Agc%3A%3AGCRuntime%3A%3AsweepBackgroundThings#aggregations
https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=js%3A%3AGCHelperState%3A%3AdoSweep#aggregations

So I think the head of the ArenaList is probably fine - and changing the memory ordering of the BackgroundFinalizeState either fixed the crash, moved it, or we've gotten very lucky, as there haven't been any crashes in FinalizeTypedArenas<T> except for 1 on Android:
https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=FinalizeTypedArenas%3CT%3E#aggregations

We're still getting a steady supply of FinalizeArenas crashes, however:
https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=FinalizeArenas#aggregations

... so I think it just moved. Perhaps we could do some strategic checking of ArenaLists on Nightly at points where the GC has touched it. Just dereferencing the whole list should be enough (assuming it doesn't get optimized out), which might not be *too* bad for performance.
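The "just dereference the whole list" idea could be sketched roughly as follows. `ArenaHeaderModel` and `touchWholeList` are hypothetical names; the volatile read is one assumed way to keep the compiler from optimizing the loads out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal arena header: only the link field matters here.
struct ArenaHeaderModel {
    ArenaHeaderModel* next = nullptr;
};

// Touch every node so that a corrupted link faults here, close to the code
// that last modified the list. Reading the link through a volatile-qualified
// lvalue keeps the compiler from dropping the loads as dead code.
inline std::size_t touchWholeList(ArenaHeaderModel* head) {
    std::size_t count = 0;
    for (ArenaHeaderModel* a = head; a != nullptr;) {
        ArenaHeaderModel* volatile* link = &a->next;
        a = *link;  // volatile load of a->next
        ++count;
    }
    return count;
}
```

The cost is one pass over the list per check, so it would presumably only be enabled on Nightly and only at a few strategic call sites.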

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 19

•

10 years ago

I think I just set this flag wrong in comment 4.

status-firefox44: affected → unaffected

Robert Kaiser

Updated

•

10 years ago

Crash Signature: [@ FinalizeTypedArenas<T>] [@ FinalizeArenas] → [@ FinalizeTypedArenas<T>] [@ FinalizeArenas] [@ je_free | FinalizeTypedArenas<T>]

Jan de Mooij [:jandem]

Assignee

Comment 20

•

10 years ago

Terrence, any news on this?

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Comment 21

•

10 years ago

Jan pushed a diagnostic patch in bug 1233944. Crash-stats is giving me "Internal Server Error" at the moment so I can't check to see if that is showing up the problem.

Flags: needinfo?(terrence)

Emanuel Hoogeveen [:ehoogeveen]

Comment 22

•

10 years ago

Checking https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=FinalizeArenas#aggregations I don't think the diagnostic patch shifted it.

If DXR is right (and I think it is), the only places that actually set the "next" field of an ArenaHeader are the following four:
https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.cpp#926
https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.cpp#2262
https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.cpp#2810
https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.h#461

The last one is probably too hot, but perhaps we could add an assertion on the first three (checking for uintptr_t(-1)). Does that sound reasonable?
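A rough sketch of the proposed check at an assignment site, assuming the suspect value really is uintptr_t(-1) as believed at this point in the thread. `ArenaHeaderModel` and `setNextChecked` are hypothetical, and the function returns false where real code would presumably use a release assertion.

```cpp
#include <cassert>
#include <cstdint>

struct ArenaHeaderModel {
    ArenaHeaderModel* next = nullptr;
};

// The value suspected at this point in the investigation (an assumption).
constexpr std::uintptr_t kSuspectValue = std::uintptr_t(-1);

inline bool looksBogus(const ArenaHeaderModel* p) {
    return reinterpret_cast<std::uintptr_t>(p) == kSuspectValue;
}

// Returns false instead of crashing so the sketch is testable; the pointer
// is only compared, never dereferenced.
inline bool setNextChecked(ArenaHeaderModel* self, ArenaHeaderModel* value) {
    if (looksBogus(value)) {
        return false;
    }
    self->next = value;
    return true;
}
```

Because the check is a single compare against a constant, it should be cheap even on a hot path, though that would need measuring.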

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Comment 23

•

10 years ago

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #22)
> Checking https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160205030204&version=47.0a1&signature=FinalizeArenas#aggregations I don't think the diagnostic patch shifted it.
>
> If DXR is right (and I think it is), the only places that actually set the "next" field of an ArenaHeader are the following four:
>
> https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.cpp#926
> https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.cpp#2262
> https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.cpp#2810
> https://dxr.mozilla.org/mozilla-central/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/js/src/jsgc.h#461
>
> The last one is probably too hot, but perhaps we could add an assertion on the first three (checking for uintptr_t(-1)). Does that sound reasonable?

Wow, thanks for doing that research! Unless the slowdown is truly *massive*, let's just check all of them, at least for as long as we need to solve the quality issue.

Flags: needinfo?(terrence)

Jan de Mooij [:jandem]

Assignee

Comment 24

•

10 years ago

I'm not convinced it's a bogus ArenaHeader; for instance, the OS X crashes look more like a nullptr crash inside JSObject::finalize. It'd be great if someone could load a minidump in Visual Studio so we could stare at the actual assembly :)

Emanuel Hoogeveen [:ehoogeveen]

Comment 25

•

10 years ago

I missed a few instances where we're setting it through a cursor - on the plus side, several of them also write to the same value, so we don't need to check those. The ones involving SortedArenaList are probably the most painful in terms of performance, and I'd be surprised if that was implicated (if any part of the sorting was broken I'd expect a *lot* of crashes), so I'll probably take those out. Going to run at least Octane to see if the effect on performance is noticeable.

(In reply to Jan de Mooij [:jandem] from comment #24)
> I'm not convinced it's a bogus ArenaHeader; for instance, the OS X crashes look more like a nullptr crash inside JSObject::finalize.

We might be looking at different crashes - these are almost all on Windows, and most of those have crash address 0xffffffffffffffff (I don't see any with just 0xffffffff, but I don't know if the fact that it's 64-bit means anything).

> It'd be great if someone could load a minidump in Visual Studio so we could stare at the actual assembly :)

Agreed on that count!

Robert Kaiser

Comment 26

•

10 years ago

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #25) > We might be looking at different crashes - these are almost all on Windows, > and most of those have crash address 0xffffffffffffffff (I don't see any > with just 0xffffffff, but I don't know if the fact that it's 64-bit means > anything). The fact that we show the address as 0xffffffffffffffff is a bug, or maybe even an artifact of the way x86_64 works (I don't remember exactly how that worked) - you need to check the registers in the JSON output within the "raw dump" tab to find out useful addresses. It looks to me like those 64bit crashes have "rax": "0x4b4b4b4b4b4b4b4b" while the 32bit ones have an address of 0x4b4b4b4b - which all sounds like JS_SWEPT_TENURED_PATTERN, see https://dxr.mozilla.org/mozilla-central/source/js/public/Utility.h#47
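To illustrate the check described here, a small sketch that tests whether a register value from the raw dump is entirely the 0x4b fill byte, for either pointer width. The helper name `isFilledWith` is hypothetical; the 0x4b byte is the one the comment attributes to JS_SWEPT_TENURED_PATTERN.

```cpp
#include <cassert>
#include <cstdint>

// Fill byte the comment above identifies as JS_SWEPT_TENURED_PATTERN.
constexpr std::uint8_t kSweptByte = 0x4b;

// True if the low `bytes` bytes of `value` are all equal to `fill`.
// `bytes` is the pointer width of the crashing build (4 or 8).
inline bool isFilledWith(std::uint64_t value, unsigned bytes, std::uint8_t fill) {
    assert(bytes <= 8);
    for (unsigned i = 0; i < bytes; ++i) {
        if (((value >> (8 * i)) & 0xff) != fill) {
            return false;
        }
    }
    return true;
}
```

With this, "rax": "0x4b4b4b4b4b4b4b4b" on 64-bit and an address of 0x4b4b4b4b on 32-bit both classify as swept memory, while the reported 0xffffffffffffffff does not.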

Emanuel Hoogeveen [:ehoogeveen]

Comment 27

•

10 years ago

Ack, good catch! In that case we're not even asserting on the right pattern right now!

Emanuel Hoogeveen [:ehoogeveen]

Comment 28

•

10 years ago

Attached patch Instrument setting ArenaHeader::next to catch misuse and fix existing instrumentation. (obsolete)

This *should* check all the places where the ArenaHeader::next field is set, including "indirectly", except where we dereference the value being set already. It may be a bit overzealous, but doesn't move the needle for me on Octane locally. This also fixes the existing assertion to check for the right pattern! Hopefully "uintptr_t(UINT64_C(0x4b4b4b4b4b4b4b))" is the right way to do this - IIUC it should truncate in the expected way on 32-bit.

Attachment #8719988 - Flags: review?(terrence)

Emanuel Hoogeveen [:ehoogeveen]

Updated

•

10 years ago

Attachment #8715386 - Flags: checkin+

Emanuel Hoogeveen [:ehoogeveen]

Comment 29

•

10 years ago

A try run just in case: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2bab7c4a985a

Emanuel Hoogeveen [:ehoogeveen]

Comment 30

•

10 years ago

Attached patch Instrument setting ArenaHeader::next to catch misuse and fix existing instrumentation (v2).

Ugh, did I really manage to mess up the pattern there? It's been a long day. This one should be good (also fixed a line that got too long).
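As a sketch of why the constant matters: the pattern posted in comment 28 has seven 0x4b bytes rather than eight. Modelling a 32-bit uintptr_t with uint32_t (an assumption made so both widths can be checked on any host), the short constant still truncates to the expected value on 32-bit but is a different value on 64-bit. The names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPattern64 = UINT64_C(0x4b4b4b4b4b4b4b4b);  // 8 bytes
constexpr std::uint64_t kShort = UINT64_C(0x4b4b4b4b4b4b4b);        // 7 bytes, as first posted

// Stand-in for uintptr_t(...) on a 32-bit build: keeps the low 32 bits.
constexpr std::uint32_t asPtr32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v);
}

// The full pattern truncates to the 32-bit poison value, as hoped.
static_assert(asPtr32(kPattern64) == 0x4b4b4b4bu, "32-bit truncation");
// The short constant happens to truncate to the same value on 32-bit...
static_assert(asPtr32(kShort) == 0x4b4b4b4bu, "short constant on 32-bit");
// ...but on 64-bit it is missing the top byte, so it never matches.
static_assert(kShort != kPattern64, "short constant differs on 64-bit");
static_assert(kShort == (kPattern64 >> 8), "top byte is zero");
```

So an assertion using the short constant would silently never fire on the 64-bit Windows builds where most of these crashes were reported.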

Attachment #8719988 - Attachment is obsolete: true

Attachment #8719988 - Flags: review?(terrence)

Attachment #8720000 - Flags: review?(terrence)

Emanuel Hoogeveen [:ehoogeveen]

Comment 31

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d75c753da55

Jan de Mooij [:jandem]

Assignee

Comment 32

•

10 years ago

(In reply to Jan de Mooij [:jandem] from comment #24)
> It'd be great if someone could load a minidump in Visual Studio so we could stare at the actual assembly :)

So, I just looked at the minidump of the following crash: https://crash-stats.mozilla.com/report/index/4b606ff1-e9aa-4df7-91d8-a369a2160213

FWIW, MSVC with PGO does a ton of inlining and speculative optimizations. For instance, when we call clasp->finalize() it actually checks for (and inlines) proxy_Finalize, SavedFrame::finalize, and XPC_WN_NoHelper_Finalize first. If all that fails, it does the pointer call.

We're crashing in Arena::finalize, because we have a bogus JSObject. Its group_ field is 0x4b4b4b4b. We crash when we try to load its clasp. (I've no idea why Breakpad thinks the crash address is 0xffffffffffffffff. My best guess is that it's looking at the SliceBudget value on the stack, that one *does* have this value AFAICS.)

// The !isMarked case.
// Load the object's group.
000007FEE11D8DDC mov rax,qword ptr [rdi]
// Load the group's Class.
000007FEE11D8DDF mov r15,qword ptr [rax] <---------------
// Load and null check clasp->finalize.
000007FEE11D8DE2 mov rax,qword ptr [r15+48h]
000007FEE11D8DE6 test rax,rax
000007FEE11D8DE9 jne FinalizeArenas+278h (07FEE11D8E58h)
// No finalizer. Jump if !clasp->isNative().
000007FEE11D8DEB test dword ptr [r15+8],40000h
000007FEE11D8DF3 jne FinalizeArenas+248h (07FEE11D8E28h)
// Jump if nobj->hasDynamicSlots().
000007FEE11D8DF5 mov rcx,qword ptr [rdi+10h]
000007FEE11D8DF9 test rcx,rcx
000007FEE11D8DFC jne FinalizeArenas+392h (07FEE11D8F72h)
// Jump if nobj->hasDynamicElements().
000007FEE11D8E02 mov rcx,qword ptr [rdi+18h]
000007FEE11D8E06 lea rax,[ReturnFloat32Reg (07FEE3F75B78h)]
000007FEE11D8E0D cmp rcx,rax
000007FEE11D8E10 jne FinalizeArenas+6CCh (07FEE11D92ACh)
// Jump if nobj->shape_->listp == &nobj->shape_.
000007FEE11D8E16 mov rcx,qword ptr [rdi+8]
000007FEE11D8E1A lea rax,[rdi+8]
000007FEE11D8E1E cmp qword ptr [rcx+20h],rax
000007FEE11D8E22 je FinalizeArenas+98Bh (07FEE11D956Bh)
// Check whether disablePoison has been initialized, set.
000007FEE11D8E28 test byte ptr [halfDomain+8h (07FEE3F6AF34h)],1
000007FEE11D8E2F je `js::irregexp::RegExpEmpty::GetInstance'::`2'::`dynamic atexit destructor for 'instance''+2DAA8h (07FEE198A108h)
000007FEE11D8E35 cmp byte ptr [disablePoison (07FEE3F6AF38h)],0
000007FEE11D8E3C jne FinalizeArenas+26Eh (07FEE11D8E4Eh)
// Poison with 0x4b.
000007FEE11D8E3E mov r8,r13
000007FEE11D8E41 mov edx,4Bh
000007FEE11D8E46 mov rcx,rdi
000007FEE11D8E49 call memset (07FEE19396A4h)
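The speculative optimization described in this comment (compare the function pointer against a few hot targets, call those directly, otherwise fall back to the indirect call) can be written out by hand roughly as below. All names are hypothetical stand-ins; this is a sketch of the shape of the generated code, not what MSVC actually emits.

```cpp
#include <cassert>

struct Obj {
    int finalized = 0;
};
using FinalizeOp = void (*)(Obj*);

// Hypothetical finalizers standing in for the hot targets found by the profile.
inline void proxyFinalize(Obj* o) { o->finalized = 1; }
inline void savedFrameFinalize(Obj* o) { o->finalized = 2; }
inline void rareFinalize(Obj* o) { o->finalized = 3; }

struct ClassModel {
    FinalizeOp finalize;
};

// Hand-written equivalent of speculative devirtualization: direct (and thus
// inlinable) calls for known targets, indirect call for everything else.
inline void callFinalize(const ClassModel* clasp, Obj* obj) {
    FinalizeOp op = clasp->finalize;  // the load that faults if clasp is garbage
    if (!op) {
        return;  // no finalizer
    }
    if (op == proxyFinalize) {
        proxyFinalize(obj);
    } else if (op == savedFrameFinalize) {
        savedFrameFinalize(obj);
    } else {
        op(obj);  // fallback pointer call
    }
}
```

This is why one crash signature can cover several logically distinct finalizers: they all end up inlined into the same caller.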

Jan de Mooij [:jandem]

Assignee

Comment 33

•

10 years ago

thingSize seems to be 168, that's 21 words on x64 and that matches OBJECT16_BACKGROUND. So this object is pretty big (we may have crashes with different object sizes, of course).

Jan de Mooij [:jandem]

Assignee

Comment 34

•

10 years ago

(In reply to Jan de Mooij [:jandem] from comment #33) > thingSize seems to be 168, that's 21 words on x64 and that matches > OBJECT16_BACKGROUND. Hm actually, OBJECT16 is 160 bytes, not 168. So either r8 does not hold the thingSize (maybe something clobbered it), or sizeof(JSObject_Slots16) is different on Windows. Will look into this a bit more. (Btw, it'd be nice to use some c++ magic to make thingSize a known constant for each alloc kind.)
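A sketch of the "c++ magic" idea of making thingSize a compile-time constant per alloc kind. The four-word object header is an assumption read off the field offsets (0, 8, 10h, 18h) in the disassembly in comment 32; `AllocKindModel`, `FixedSlots`, and `thingSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of alloc kinds; only the fixed slot count matters.
enum class AllocKindModel { Object0, Object4, Object16 };

constexpr std::size_t kWordSize = 8;     // x64
constexpr std::size_t kHeaderWords = 4;  // assumed: group, shape, slots, elements

template <AllocKindModel K> struct FixedSlots;
template <> struct FixedSlots<AllocKindModel::Object0> { static constexpr std::size_t value = 0; };
template <> struct FixedSlots<AllocKindModel::Object4> { static constexpr std::size_t value = 4; };
template <> struct FixedSlots<AllocKindModel::Object16> { static constexpr std::size_t value = 16; };

// thingSize becomes a constant the compiler can fold per alloc kind.
template <AllocKindModel K>
constexpr std::size_t thingSize() {
    return (kHeaderWords + FixedSlots<K>::value) * kWordSize;
}

static_assert(thingSize<AllocKindModel::Object16>() == 160, "OBJECT16 is 160 bytes");
static_assert(thingSize<AllocKindModel::Object0>() == 0x20, "smallest object is 0x20 bytes");
static_assert(168 / kWordSize == 21, "168 bytes would be 21 words, one more than OBJECT16");
```

Under these assumptions the arithmetic matches the thread: 168 bytes is 21 words, while a 16-slot object is only 20 words (160 bytes), which is what prompted the second look at r8.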

Jan de Mooij [:jandem]

Assignee

Comment 35

•

10 years ago

(In reply to Jan de Mooij [:jandem] from comment #34) > Hm actually, OBJECT16 is 160 bytes, not 168. So either r8 does not hold the > thingSize (maybe something clobbered it) Nevermind, r8 is also used for isMarked() check so it no longer holds the thingSize when we crash. Based on the stack (thingSize is in ebp-0x48), I think it's actually 0x20, which is very small: AllocKind::OBJECT0_BACKGROUND.

Emanuel Hoogeveen [:ehoogeveen]

Comment 36

•

10 years ago

(In reply to Jan de Mooij [:jandem] from comment #32) > (I've no idea why Breakpad thinks the crash address is 0xffffffffffffffff. This is something to do with x86_64 on Windows, as Robert Kaiser pointed out in comment #26. I knew I remembered that from somewhere, and indeed: bug 974420 has some details.

Jan de Mooij [:jandem]

Assignee

Comment 37

•

10 years ago

Some more data:

* The JSObject* we're finalizing has address 0x27415040. It's the arena's *second* object (it comes after the arena header and some other object). The ArenaCellIterImpl's limit is 0x27416000.
* Our ArenaCellIterImpl has an empty FreeSpan: (0, 0)

So, either:

(1) Since it's the second object, the arena must be completely full (well, except for the first object maybe). This seems a bit unlikely, but also highly suspicious because we're trying to finalize an object that has already been poisoned.
(2) The empty FreeSpan is bogus and we should have skipped this already-finalized object.

Assuming it's a bogus FreeSpan, the question is how that can happen. Maybe we can add some instrumentation.
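The address arithmetic and the role of the free span can be sketched as follows. The 0x1000 arena size is inferred from the limit 0x27416000, and the first-thing offset of 0x20 is an assumption that follows from "second object" plus the 0x20 thingSize from comment 35. `FreeSpanModel` is a simplified single-span stand-in, not the real FreeSpan.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uintptr_t kArenaSize = 0x1000;  // assumed: 0x27416000 - 0x27415000

constexpr std::uintptr_t arenaBase(std::uintptr_t addr) { return addr & ~(kArenaSize - 1); }
constexpr std::uintptr_t arenaOffset(std::uintptr_t addr) { return addr & (kArenaSize - 1); }

// Hypothetical free span: offsets [first, last] of cells that are already
// free. (0, 0) means "no free cells", matching the dump.
struct FreeSpanModel {
    std::uintptr_t first = 0;
    std::uintptr_t last = 0;
    bool empty() const { return first == 0 && last == 0; }
};

// Offsets of the cells an iterator would treat as live and try to finalize.
inline std::vector<std::uintptr_t> cellsToVisit(std::uintptr_t firstThing,
                                                std::uintptr_t thingSize,
                                                FreeSpanModel span) {
    std::vector<std::uintptr_t> out;
    for (std::uintptr_t off = firstThing; off + thingSize <= kArenaSize; off += thingSize) {
        if (!span.empty() && off >= span.first && off <= span.last) {
            continue;  // already free: skip, do not finalize again
        }
        out.push_back(off);
    }
    return out;
}
```

With an empty span every cell is visited, including offset 0x40; with a span covering 0x40 onwards only the first cell would be. That is case (2) above: a wrongly empty span makes the iterator finalize cells that were already poisoned.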

Terrence Cole [:terrence]

Comment 38

•

10 years ago

Comment on attachment 8720000: Instrument setting ArenaHeader::next to catch misuse and fix existing instrumentation (v2).
Review of attachment 8720000:
I'm not sure how likely this is to be at fault after Jan's analysis, but I think it's still worth eliminating as a possibility.

Attachment #8720000 - Flags: review?(terrence) → review+

Emanuel Hoogeveen [:ehoogeveen]

Comment 39

•

10 years ago

Okay, let's try it. Maybe we can do something similar for FreeSpan.

Keywords: checkin-needed

Biru [:poiru]

Updated

•

10 years ago

Blocks: e10s-crashes

Pulsebot

Comment 40

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e00a02282951

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 41

•

10 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e00a02282951

Brad Lassey [:blassey] (use needinfo?)

Comment 42

•

10 years ago

renom'ing. Top crash in beta experiment. Poiru, can you indicate where this is in the top crash list?

tracking-e10s: + → ?

Flags: needinfo?(birunthan)

Biru [:poiru]

Comment 43

•

10 years ago

(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #42) > renom'ing. Top crash in beta experiment. Poiru, can you indicate where this > is in the top crash list? This is #2 (3.51%) for content processes and #3 (3.35%) across the board. See bug 1249209 comment 2 for full list.

Flags: needinfo?(birunthan)

Emanuel Hoogeveen [:ehoogeveen]

Comment 44

•

10 years ago

FWIW, I got to thinking following Jan's analysis and I don't think it's all that likely that we're setting ArenaHeader::next to the poison pattern directly at any point. More likely, we're pointing to an Arena that gets swept and poisoned, so *its* next field contains the poison value. Unfortunately I don't really see how we could recover where we started pointing to the offending (now poisoned) Arena. The only thing I can think of would be to use a few bits on each ArenaHeader to indicate the assignment site for the last write to an ArenaHeader's next field. Then we'd have to store that data (e.g. in a temporary variable) so it's available in the minidump. But considering how few assignment sites there are, maybe we should just stare at all of them and see if a place jumps out where we could be inserting the same Arena into a linked list twice. Of course, that might involve looking at a lot of code :\ It's a pity none of us have ever gotten this crash, since that would open up a lot more options (I wouldn't mind running with some sort of duplicate insertion checking code locally, but I imagine it'd be pretty slow).
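Both ideas in this comment (recording the last assignment site, and checking for duplicate insertion) could be prototyped along these lines. Everything here is hypothetical: `WriteSite`, `TaggedHeader`, and the O(n) duplicate check, which would likely be too slow outside a local debugging build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical identifiers for the handful of sites that write `next`.
enum class WriteSite : std::uint8_t { None, Insert, Relocate, Sweep };

struct TaggedHeader {
    TaggedHeader* next = nullptr;
    WriteSite lastWriter = WriteSite::None;  // survives into a minidump
};

inline bool listContains(const TaggedHeader* head, const TaggedHeader* node) {
    for (const TaggedHeader* a = head; a != nullptr; a = a->next) {
        if (a == node) {
            return true;
        }
    }
    return false;
}

// Push to the front, recording which site did the write. Refuses to insert
// a node that is already on the list (the suspected double insertion).
inline bool pushFrontChecked(TaggedHeader*& head, TaggedHeader* node, WriteSite site) {
    if (listContains(head, node)) {
        return false;
    }
    node->next = head;
    node->lastWriter = site;
    head = node;
    return true;
}
```

A production variant would presumably pack the site id into spare bits rather than add a field, but the separate field keeps the sketch simple.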

Emanuel Hoogeveen [:ehoogeveen]

Comment 45

•

10 years ago

Jan pointed out to me that the assembly he analyzed above shows I'm probably barking up the wrong tree with the ArenaHeader::next thing. Also, I realized I actually got one of these crashes earlier today! It didn't register because it was a content crash (and I haven't been running with e10s for very long). So I might be able to help test this locally after all.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 46

•

10 years ago

(In reply to Andrew McCreight [:mccr8] from comment #8) > Is there any way to look at what the top crashes on Nightly were in the > period before 12-11? I don't see any other GC crashes in the top 30 crashes > on Nightly, so maybe this is just a signature change due to inlining. I just regenerated http://dbaron.org/mozilla/crashes-by-build with a 180 day window instead of 31 days, so it should be useful for those dates again. I don't see any obvious JS-related signatures that disappeared on the 11th. (I looked for signatures present on both the 10th and 7th, and with 4 or more crashes on the 10th.) But I skimmed relatively quickly...

Jonathan Howard

Comment 47

•

10 years ago

I stumbled upon this telemetry spike on 2015-12-04. I don't have the experience to relate it to anything, but I know this bug is from that period and is still outstanding without a great idea of how to resolve it. It seems worth posting even if it turns out not to be linked. https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=bucket-0&cumulative=0&end_date=2016-02-18&keys=!__none__!__none__&max_channel_version=nightly%252F47&measure=SHUTDOWN_OK&min_channel_version=nightly%252F44&processType=false&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-01-29&trim=1&use_submission_date=0

Jonathan Howard

Comment 48

•

10 years ago

Attached file linux-gdb-stacks.txt

Managed to temporarily have bug intermittently. Seems whatever advert has gone now. First inside debugger on central. See attached. Was wondering about bug 1230162 but seems not after adding a few outputs. Tried nightlies 7-10, all OK. Finally bisected twice, with e10s on. (The first didn't seem to start well; a build I expected to fail didn't.)

2016-02-22T15:48:10: INFO : Narrowed inbound regression window from [560d36a9, 39c8aabf] (3 revisions) to [560d36a9, a1caa7e7] (2 revisions) (~1 steps left)
2016-02-22T15:48:10: DEBUG : Starting merge handling...
2016-02-22T15:48:10: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=a1caa7e7949fb888023b335f22ea01b05dfb1bc8&full=1
2016-02-22T15:48:11: DEBUG : Found commit message: Backed out changeset bcb4ebf6ffac (bug 1198459) for bustage
---------------
2016-02-22T16:36:28: INFO : Narrowed inbound regression window from [286ad584, 151ce2b0] (4 revisions) to [6fce35b1, 151ce2b0] (2 revisions) (~1 steps left)
2016-02-22T16:36:28: DEBUG : Starting merge handling...
2016-02-22T16:36:28: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=151ce2b0e3f6b73505be35561f148678577dcbcb&full=1
2016-02-22T16:36:29: DEBUG : Found commit message: Bug 1225396 part 4 - Remove @@iterator workaround in Codegen.py. r=bz

This may be one to look into. It is intermittent, and there were four tests after the bad one that I could not crash, followed by not crashing on my central build, so I can't discount older changesets in the window.

Jan de Mooij [:jandem]

Assignee

Comment 49

•

10 years ago

(In reply to Jonathan Howard from comment #48) > Managed to temporarily have bug intermittently. Seems whatever advert has > gone now. Can you post exact STR, even though they may no longer work? This is super useful information, thank you!

Jan de Mooij [:jandem]

Assignee

Comment 50

•

10 years ago

OK I managed to catch this in rr today and with some serious IRC debugging help from the GC team we know what's going on. These bugs are impossible to fix without rr. EnsureParserCreatedClasses is allocating a nursery object for the background-Zone (that's the bug). Then a minor GC on the main thread races with refillFreeListOffMainThread called from the background thread. That messes up the background-Zone's free list and Finalize then thinks an Arena is free, while it's mostly garbage, so we crash. So this is a regression from bug 1225396, but extremely subtle, and we should add some asserts to catch similar bugs in the future. More tomorrow.
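The invariant that was violated can be stated as a small model: nursery allocation is only safe for zones that are not being used by a background thread, because minor GCs run on the main thread. This is a toy sketch with hypothetical names (`ZoneModel`, `chooseHeap`, `allocationIsSafe`), not the assertions that later landed.

```cpp
#include <cassert>

enum class Heap { Nursery, Tenured };

// Hypothetical zone model: the only property that matters here is whether a
// helper (background) thread is currently allocating into the zone.
struct ZoneModel {
    bool usedByHelperThread = false;
};

// Assumption drawn from the description above: the nursery is collected by
// minor GCs on the main thread, so only main-thread zones may allocate there.
inline bool nurseryAllowed(const ZoneModel& zone) {
    return !zone.usedByHelperThread;
}

inline Heap chooseHeap(const ZoneModel& zone, bool forceTenured) {
    if (forceTenured || !nurseryAllowed(zone)) {
        return Heap::Tenured;
    }
    return Heap::Nursery;
}

// The kind of check an assertion at the allocation site would make.
inline bool allocationIsSafe(const ZoneModel& zone, Heap heap) {
    return heap == Heap::Tenured || nurseryAllowed(zone);
}
```

In this model the bug corresponds to an allocation where `allocationIsSafe(backgroundZone, Heap::Nursery)` is false: a later minor GC on the main thread then touches state the background thread is concurrently refilling.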

Depends on: 1225396

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Whiteboard: [rr]

Jim Mathies [:jimm]

Updated

•

10 years ago

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Comment 51

•

10 years ago

not e10s specific, untracking.

tracking-e10s: ? → -

Flags: needinfo?(jmathies)

Boris Zbarsky [:bzbarsky]

Comment 52

•

10 years ago

Is it possible that this and bug

Blocks: 1233944

Sylvestre Ledru [:Sylvestre]

Comment 53

•

10 years ago

Congrats, we will be watching your progress and will take a patch in 45 to fix it! Bravo again!

Jon Coppeard (:jonco)

Comment 54

•

10 years ago

Attached patch bug1232229-add-nursery-asserts

Not a fix, but adds assertions to catch this problem. We get 28 jit-test failures with this patch applied.

Attachment #8722926 - Flags: review?(terrence)

Jan de Mooij [:jandem]

Assignee

Comment 55

•

10 years ago

Attached patch Fix

This patch replaces the NewObjectWithGivenProto call with global->createBlankPrototypeInheriting, to match most other places where we create prototype objects. createBlankPrototypeInheriting calls CreateBlankProto, where we call NewNativeObjectWithGivenProto with a SingletonObject argument.

Assignee: terrence → jdemooij

Status: NEW → ASSIGNED

Attachment #8723026 - Flags: review?(jcoppeard)

Jon Coppeard (:jonco)

Comment 56

•

10 years ago

Comment on attachment 8723026: Fix
Review of attachment 8723026:
Looks good!

Attachment #8723026 - Flags: review?(jcoppeard) → review+

Pulsebot

Comment 57

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/584e6e436e27

Robert Kaiser

Comment 58

•

10 years ago

If this is safe enough, we'd really like to get this into beta very soon, given that this is one of the topcrashes on 45 and stability on that train has not been good in general so far.

Sylvestre Ledru [:Sylvestre]

Comment 59

•

10 years ago

The gtb is tomorrow. Jandem, can we have an uplift request right now?

Flags: needinfo?(jdemooij)

Jan de Mooij [:jandem]

Assignee

Comment 60

•

10 years ago

Comment on attachment 8723026: Fix
Approval Request Comment
[Feature/regressing bug #]: Bug 1225396.
[User impact if declined]: Frequent crashes.
[Describe test coverage new/current, TreeHerder]: I don't have a test for this but it does fix the debug asserts added by another patch.
[Risks and why]: Low risk. The patch is oneliner-ish and we have decent tests for this code.
[String/UUID change made/needed]: None.

Flags: needinfo?(jdemooij)

Attachment #8723026 - Flags: approval-mozilla-beta?

Attachment #8723026 - Flags: approval-mozilla-aurora?

Jonathan Howard

Comment 61

•

10 years ago

Steps eventually was retrying (more precise was in attachment.) http://www.nytimes.com/2016/01/24/technology/larry-page-google-founder-is-still-innovator-in-chief.html?_r=0 taken from bug 1233944 I think I got lucky on Monday having it occur multiple times. Tried yesterday crashed once but required more effort. Today still took a while to crash once on central build. Added patch and gave that a fair amount of attempts, no crash. Did have one hang for couple seconds that I hadn't experienced any other time trying, maybe a sign the patch worked. (Back to normal ad-blocked build.)

Terrence Cole [:terrence]

Comment 62

•

10 years ago

Comment on attachment 8722926: bug1232229-add-nursery-asserts
Review of attachment 8722926:
Thanks for taking this!

Attachment #8722926 - Flags: review?(terrence) → review+

Liz Henry (:lizzard) (relman/hg->git project)

Comment 63

•

10 years ago

Comment on attachment 8723026: Fix
Fix for topcrash, please uplift to aurora and beta. This should make it into tomorrow's beta 10 build.

Attachment #8723026 - Flags: approval-mozilla-beta?

Attachment #8723026 - Flags: approval-mozilla-beta+

Attachment #8723026 - Flags: approval-mozilla-aurora?

Attachment #8723026 - Flags: approval-mozilla-aurora+

Terrence Cole [:terrence]

Comment 64

•

10 years ago

I guess we should back out the speculative patches now?

Flags: needinfo?(emanuel.hoogeveen)

Emanuel Hoogeveen [:ehoogeveen]

Comment 65

•

10 years ago

(In reply to Terrence Cole [:terrence] from comment #64) > I guess we should back out the speculative patches now? Yes please! I don't think they're going to catch anything, so they just clutter up the code.

Flags: needinfo?(emanuel.hoogeveen)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 66

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/1606a71166df

status-firefox46: affected → fixed

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 67

•

10 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/7190c35c9e14

status-firefox45: affected → fixed

Pulsebot

Comment 68

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/17bfd6a2a529

Carsten Book [:Tomcat]

Comment 69

•

10 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/584e6e436e27

Terrence Cole [:terrence]

Comment 70

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d2bb13d6c37e90963992e9da9428c3aedf2bfaa Backout e00a02282951 (bug 1232229) as we no longer need the diagnostics.

Jan de Mooij [:jandem]

Assignee

Updated

•

10 years ago

Keywords: leave-open

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 71

•

10 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/17bfd6a2a529

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

status-firefox47: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla47

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 72

•

10 years ago

Should the diagnostics be removed from aurora/beta, or are they harmless? (Or were they never there in the first place?)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 73

•

10 years ago

Also, marking as verified on nightly since the crash is no longer present in crash-stats for today's nightly.

Status: RESOLVED → VERIFIED

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 74

•

10 years ago

backout bugherder

Merged backout https://hg.mozilla.org/mozilla-central/rev/7d2bb13d6c37

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

10 years ago

Depends on: 1251922

Sylvestre Ledru [:Sylvestre]

Comment 75

•

10 years ago

The crash still occurs in 45 beta 10 but the overall rate is quite low (~50 crashes).

:Gijs (he/him)

Updated

•

10 years ago

Depends on: 1251589

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

9 years ago

Depends on: 1265667

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

No longer depends on: 1265667

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Version: unspecified → 45 Branch

Hardik Mehta

Comment 78

•

8 years ago

Hello Team, just for your information, I had multiple crashes in Firefox ESR 45.9.0 on my Kali Linux system. CrashID: https://crash-stats.mozilla.com/report/index/421984b5-13e4-4c7d-9576-4b6371170509

more_instrumentation_and_fencing-v0.diff (Terrence Cole [:terrence], 10 years ago; 2.13 KB, patch; ehoogeveen: review+, ehoogeveen: checkin+)
Instrument setting ArenaHeader::next to catch misuse and fix existing instrumentation. (Emanuel Hoogeveen [:ehoogeveen], 10 years ago; 4.94 KB, patch)
Instrument setting ArenaHeader::next to catch misuse and fix existing instrumentation (v2). (Emanuel Hoogeveen [:ehoogeveen], 10 years ago; 4.97 KB, patch; terrence: review+)
linux-gdb-stacks.txt (Jonathan Howard, 10 years ago; 14.79 KB, text/plain)
bug1232229-add-nursery-asserts (Jon Coppeard (:jonco), 10 years ago; 5.50 KB, patch; terrence: review+)
Fix (Jan de Mooij [:jandem], 10 years ago; 1.34 KB, patch; jonco: review+, lizzard: approval-mozilla-aurora+, lizzard: approval-mozilla-beta+)