Closed Bug 1646604 Opened 5 years ago Closed 5 years ago

Crash in [@ arena_t::~arena_t | ArenaCollection::DisposeArena] via SnowWhiteKiller

Categories

(Core :: DOM: Core & HTML, defect)

77 Branch
Unspecified
All
defect

Tracking

()

VERIFIED FIXED
86 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix
firefox86 --- fixed

People

(Reporter: wsmwk, Assigned: sefeng211)

References

(Regression)

Details

(Keywords: crash, regression, topcrash-thunderbird)

Crash Data

Attachments

(1 file, 1 obsolete file)

signature begins with version 77. All crashes are Thunderbird.
First crash is bp-85acf350-a2bf-45fa-8dbc-3afcc0200602.

Top 10 frames of crashing thread:

0 mozglue.dll arena_t::~arena_t memory/build/mozjemalloc.cpp:3565
1 mozglue.dll ArenaCollection::DisposeArena memory/build/mozjemalloc.cpp:1075
2 xul.dll nsIContent::Destroy dom/base/FragmentOrElement.cpp:149
3 xul.dll SnowWhiteKiller::Visit xpcom/base/nsCycleCollector.cpp:2457
4 xul.dll nsPurpleBuffer::VisitEntries<SnowWhiteKiller> xpcom/base/nsCycleCollector.cpp:957
5 xul.dll nsCycleCollector_doDeferredDeletionWithBudget xpcom/base/nsCycleCollector.cpp:3889
6 xul.dll AsyncFreeSnowWhite::Run js/xpconnect/src/XPCJSRuntime.cpp:147
7 xul.dll IdleRunnableWrapper::Run xpcom/threads/nsThreadUtils.cpp:326
8 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1200
9 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:481

Another is bp-63f36bf9-d636-4c3a-92d6-032240200617

Whiteboard: [rare]

How is it possible that all crashes are happening only on the beta channel? 100% of crashes on a single channel is quite odd.

bp-4e8d568e-a5f3-47e6-a02a-9fbe40201231 Windows

Mac 85 beta arena_t::~arena_t | arena_t::DallocSmall | arena_t::DallocSmall | ArenaCollection::DisposeArena bp-b732f931-58c7-4bdb-b393-f2b5e0201215
0 libmozglue.dylib arena_t::~arena_t() memory/build/mozjemalloc.cpp:3579
1 libmozglue.dylib arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) memory/build/mozjemalloc.cpp:3290
2 libmozglue.dylib arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) memory/build/mozjemalloc.cpp:3290
3 libmozglue.dylib ArenaCollection::DisposeArena(arena_t*) memory/build/mozjemalloc.cpp:1075
4 XUL nsIContent::Destroy() dom/base/FragmentOrElement.cpp:149
5 XUL SnowWhiteKiller::Visit(nsPurpleBuffer&, nsPurpleBufferEntry*) xpcom/base/nsCycleCollector.cpp:2457
6 XUL void nsPurpleBuffer::VisitEntries<SnowWhiteKiller>(SnowWhiteKiller&) xpcom/base/nsCycleCollector.cpp:957
7 libsystem_pthread.dylib _pthread_cond_updateval
8 XUL nsCycleCollector_doDeferredDeletionWithBudget(js::SliceBudget&) xpcom/base/nsCycleCollector.cpp:3889
9 XUL AsyncFreeSnowWhite::Run() js/xpconnect/src/XPCJSRuntime.cpp:147
10 XUL XUL@0x41c73f
11 XUL <name omitted> xpcom/threads/nsThreadUtils.cpp:344

Crash Signature: [@ arena_t::~arena_t | ArenaCollection::DisposeArena] → [@ arena_t::~arena_t | ArenaCollection::DisposeArena] [@ arena_t::~arena_t | arena_t::DallocSmall | arena_t::DallocSmall | ArenaCollection::DisposeArena ]
Flags: needinfo?(mkmelin+mozilla)
OS: Windows 10 → All
Summary: Crash in [@ arena_t::~arena_t | ArenaCollection::DisposeArena] → Crash in [@ arena_t::~arena_t | ArenaCollection::DisposeArena] via SnowWhiteKiller

The original bug might be different. Possibly this one is a rare crash from bug 1211292.

Flags: needinfo?(mkmelin+mozilla)

Let's go with that

Keywords: regression
Regressed by: 1211292
Crash Signature: [@ arena_t::~arena_t | ArenaCollection::DisposeArena] [@ arena_t::~arena_t | arena_t::DallocSmall | arena_t::DallocSmall | ArenaCollection::DisposeArena ] → [@ arena_t::~arena_t | ArenaCollection::DisposeArena] [@ arena_t::~arena_t | arena_t::DallocSmall | arena_t::DallocSmall | ArenaCollection::DisposeArena ] [@ arena_t::~arena_t | arena_t::DallocSmall | ArenaCollection::DisposeArena]
Flags: needinfo?(mkmelin+mozilla)
Whiteboard: [rare]

(In reply to Wayne Mery (:wsmwk) from comment #4)

now a topcrash on beta - #1

For some reason there has been a big spike in the last 24 hours, and it is not just a few people, according to https://crash-stats.mozilla.org/signature/?product=Thunderbird&signature=arena_t%3A%3A~arena_t%20%7C%20ArenaCollection%3A%3ADisposeArena&date=%3E%3D2021-01-03T16%3A57%3A00.000Z&date=%3C2021-01-04T16%3A57%3A00.000Z#summary

Spike appears to be related to gconversations addon 3.2.11 bp-849a9100-2238-4ad7-bc29-67d600210104

Flags: needinfo?(standard8)

The update I've just released for Conversations changes how the JavaScript files are packaged: we're now using webpack to bundle them all together. That should generally be standard JavaScript code, though some of it does run in chrome context, but the packaging shouldn't matter.

The crash stacks here, though, are cycle-collection related, and I doubt there's anything I could do about it without actual steps to reproduce. So you're probably better off moving this to Core / JavaScript: GC as a starting point.

Flags: needinfo?(standard8)
Component: General → JavaScript: GC
Product: Thunderbird → Core
Version: 77 → 77 Branch

jonco, can you give an assessment of this topcrash?

Flags: needinfo?(jcoppeard)

This is the cycle collector, not the GC.

Component: JavaScript: GC → XPCOM
Flags: needinfo?(jcoppeard) → needinfo?(continuation)

MOZ_CRASH Reason (Sanitized): MOZ_RELEASE_ASSERT(!mStats.allocated_small && !mStats.allocated_large) (Arena is not empty)

The large uptick is for TB beta builds with build ID 20201222142912, plus some (4%) with 20201217170743.
Looking at https://hg.mozilla.org/mozilla-central/log/tip/memory/build/mozjemalloc.cpp perhaps bug 1681003 could be the cause?

Flags: needinfo?(mkmelin+mozilla)

Unlikely, since that's a private arena. It seems something allocated in a DOMArena is outliving the arena itself...

(In reply to Jon Coppeard (:jonco) from comment #8)

This is the cycle collector, not the GC.

This is actually not the cycle collector either, though it is more the cycle collector than the GC. :) All cycle-collected objects are destroyed by the "SnowWhiteKiller", so this just means that we crashed while running the destructor for some cycle-collected object. It looks like these stacks all have nsIContent::Destroy() in them, so this is some issue with the arena allocation of DOM nodes.
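A simplified Python model of the deferred-deletion path in the stacks above may help explain why SnowWhiteKiller shows up: objects whose refcount drops to zero are queued and destroyed later in budgeted idle slices, so a fault inside any object's destructor surfaces underneath these frames. This is a sketch, not Gecko's implementation; `Node`, `free_snow_white`, and the list used as a buffer are hypothetical stand-ins for a DOM node, nsCycleCollector_doDeferredDeletionWithBudget, and nsPurpleBuffer.

```python
class Node:
    """Hypothetical stand-in for a refcounted, cycle-collected DOM node."""

    def __init__(self, name, destroyed_log):
        self.name = name
        self.refcnt = 1
        self.destroyed_log = destroyed_log

    def release(self, purple_buffer):
        self.refcnt -= 1
        if self.refcnt == 0:
            # "Snow-white": dead, but destruction is deferred to a later pass.
            purple_buffer.append(self)

    def destroy(self):
        # Stand-in for nsIContent::Destroy(). A fault raised here would
        # surface underneath the deferred-deletion frames.
        self.destroyed_log.append(self.name)


def free_snow_white(purple_buffer, budget):
    """Destroy at most `budget` queued objects; return how many were destroyed.

    Loosely mirrors SnowWhiteKiller visiting purple-buffer entries inside
    nsCycleCollector_doDeferredDeletionWithBudget, run from an idle task.
    """
    destroyed = 0
    while purple_buffer and destroyed < budget:
        purple_buffer.pop(0).destroy()
        destroyed += 1
    return destroyed
```

For example, releasing three nodes and then calling free_snow_white(buffer, budget=2) destroys the first two and leaves the third for the next idle slice.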

Component: XPCOM → DOM: Core & HTML
Flags: needinfo?(continuation)

Sean Feng worked on the arena allocator for DOM stuff in bug 1377999 and other bugs, so maybe he has some ideas of the next steps for investigating what is going wrong here.

Flags: needinfo?(sefeng)

(In reply to Andrew McCreight [:mccr8] from comment #11)
Ah, apologies. I saw something about the cycle collector in the stack and plumped for that :)

(In reply to Jon Coppeard (:jonco) from comment #13)

Ah, apologies. I saw something about the cycle collector in the stack and plumped for that :)

No worries. It is a very common point of confusion. It might be worth reworking the names of some of that stuff to make it clearer to people reading the stacks what is going on.

A quick update: this seems related to cross-docGroup node adoption. After freeing a node that has been adopted into a different docGroup, via the nsIContent::Destroy() call, DOMArena thinks it's safe to dispose of the arena because nobody owns it anymore, so the underlying arena should be empty. However, jemalloc tells us it's not empty.

I don't know how this can happen, though, and I am still investigating.

Severity: -- → S2

We use nsINode::Adopt to store the arenas in a hashtable to keep them
alive; however, this method is not guaranteed to be called, so
arenas may be disposed before all of their nodes are destroyed.
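The failure mode described above, where an arena is kept alive only through a keep-alive step that may not run, can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not the real DOMArena or mozjemalloc code: `Arena`, `ArenaHandle`, and `simulate` are hypothetical, and `Arena.dispose` only imitates the "Arena is not empty" release assertion. Two nodes are allocated in doc group A's arena and adopted elsewhere; if the keep-alive step runs for only one of them, the last reference drops while the other node still lives in the arena.

```python
class ArenaNotEmpty(Exception):
    """Stands in for the MOZ_RELEASE_ASSERT failure in arena_t::~arena_t."""


class Arena:
    """Toy allocator arena that only counts live allocations."""

    def __init__(self):
        self.allocated = 0
        self.disposed = False

    def alloc(self):
        self.allocated += 1

    def free(self):
        self.allocated -= 1

    def dispose(self):
        # Disposing an arena that still holds allocations is fatal.
        if self.allocated:
            raise ArenaNotEmpty("Arena is not empty")
        self.disposed = True


class ArenaHandle:
    """Hypothetical refcounted owner: disposes the arena on the last release."""

    def __init__(self, arena):
        self.arena = arena
        self.refcnt = 0

    def add_ref(self):
        self.refcnt += 1

    def release(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.arena.dispose()


def simulate(every_node_keeps_arena_alive):
    """Two nodes are created in doc group A's arena, then adopted elsewhere."""
    arena = Arena()
    handle = ArenaHandle(arena)
    handle.add_ref()                 # doc group A owns its arena
    arena.alloc()                    # node n1 allocated in A's arena
    arena.alloc()                    # node n2 allocated in A's arena
    handle.add_ref()                 # adopting n1 pins A's arena
    if every_node_keeps_arena_alive:
        handle.add_ref()             # adopting n2 pins it too (keep-alive step ran)
    handle.release()                 # doc group A goes away
    try:
        arena.free()                 # destroy n1 ...
        handle.release()             # ... and drop its pin
        arena.free()                 # destroy n2 ...
        if every_node_keeps_arena_alive:
            handle.release()         # ... and drop its pin
        return "ok"
    except ArenaNotEmpty as err:
        return str(err)
```

Here simulate(True) returns "ok", while simulate(False) returns "Arena is not empty": destroying n1 drops the last reference and disposes the arena while n2 is still allocated in it, matching the shape of the crash reason in the reports.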

Assignee: nobody → sefeng
Status: NEW → ASSIGNED

In a scenario where a node is adopted from docGroup A to B and
then from B to C, the cached arena is updated to B's arena
during the B to C adoption, which is not correct (we want to keep
A's arena alive because the node was created there).
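The A to B to C scenario can be sketched as follows. This is a hypothetical Python model of the invariant stated in the commit message, not the landed patch: it assumes one arena per doc group (named after the group), and `Node`, `adopt`, and the `keep_alive` table are illustrative names. The buggy variant overwrites the recorded arena on every hop; the fixed variant records it only on the first cross-group hop, which is the arena the node was created in.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    group: str  # doc group the node currently belongs to


def adopt(node, new_group, keep_alive, fixed):
    """Move `node` into `new_group` and record which arena must outlive it.

    Toy model (hypothetical names): `keep_alive` maps node name -> arena
    (doc group) that must stay alive while the node exists.
    """
    old_group = node.group
    if old_group == new_group:
        return  # same doc group: nothing to record
    if fixed:
        # Record only on the first cross-group hop, i.e. the creation arena.
        keep_alive.setdefault(node.name, old_group)
    else:
        # Buggy variant: each hop overwrites the entry with the previous group.
        keep_alive[node.name] = old_group
    node.group = new_group
```

With the buggy variant, adopting a node A to B and then B to C leaves {"n": "B"}, so A's arena, where the node's memory actually lives, is no longer pinned; the fixed variant keeps {"n": "A"}.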

Depends on D101042

Attachment #9195890 - Attachment is obsolete: true
Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/96bdbe3dd9f1 Fix cross docGroup node adoption may not correctly keep the arena alive r=smaug
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
Regressed by: 1377999
No longer regressed by: 1211292
Has Regression Range: --- → yes

Comment on attachment 9195889 [details]
Bug 1646604 - Fix cross docGroup node adoption may not correctly keep the arena alive r=smaug

Beta/Release Uplift Approval Request

  • User impact if declined: Users experience crashes.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Note that so far we've only found this bug to be reproducible in Thunderbird.
  1. Subscribe to the mozilla dev-platform mailing list https://lists.mozilla.org/listinfo/dev-platform
  2. Install the Thunderbird Conversations extension. Make sure you've clicked Apply changes after installing the extension.
  3. Keep opening emails from this mailing list randomly.
  4. If there are no crashes after opening about 20 emails, then we are good.
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): Medium, because I didn't land a crashtest for it, so it requires some manual testing. Also, the patch touches DOM node adoption, which has a fair amount of complexity.
  • String changes made/needed:
Flags: needinfo?(sefeng)
Attachment #9195889 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Given the risk called out in comment 20, the lack of crashes due to this bug in Firefox, and the short time before the 85 RC, I'd prefer to let this ride to 86.

Attachment #9195889 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Good news, the last Thunderbird daily to crash is buildid 20210107105528

Status: RESOLVED → VERIFIED
