Closed Bug 943409 Opened 11 years ago Closed 1 year ago

Cycle collection problem with IndexedDB

Categories

(Core :: Storage: IndexedDB, defect, P5)

x86_64
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 945813

People

(Reporter: wingo, Unassigned)

Details

Attachments

(8 files, 1 obsolete file)

Attached patch cursor-cycle (obsolete) — Splinter Review
The attached patch adds an xpcshell test that introduces a debug-mode abort, as the post-shutdown cycle collection takes two iterations instead of one.  Could it be that some indexedDb object is not dropping references to callbacks as it should?
cc'ing some people who know about indexeddb.
Try running this with XPCOM_CC_LOG_SHUTDOWN=1 and MOZ_CC_LOG_DIRECTORY=some directory, then upload the cc-edges* files to this bug, and I can take a look.  There should be three logs.  The basic idea is to see what is being deleted in the third shutdown CC, then see why it isn't deleted in the second shutdown CC.
Attached file cc-edges.16338.log
Attached file cc-edges.16338-1.log
Attached file cc-edges.16338-2.log
I recompiled with the collector loop like this, to allow the loop to go to completion:

void
nsCycleCollector::ShutdownCollect()
{
    uint32_t i;
    for (i = 0; i < DEFAULT_SHUTDOWN_COLLECTIONS; ++i) {
        if (!Collect(ShutdownCC, nullptr)) {
            break;
        }
    }
    NS_ASSERTION(i < NORMAL_SHUTDOWN_COLLECTIONS, "Extra shutdown CC");
}
Some lines that stand out:

-0x7f5c35be2700 [rc=2] nsDOMEventTargetHelper 
+0x7f5c35be2700 [rc=1] nsDOMEventTargetHelper

and

+0x7f5c35bb5b30 [rc=1] IDBFactory
+> 0x7f5c33e3b100 mOwningObject
+0x7f5c35b4f700 [rc=1] IDBIndex
+> 0x7f5c36734ce0 mObjectStore
Thanks for the logs!  XPCShell fatally asserts with NS_ASSERTION?  Weird, that's not really great.
Assignee: nobody → continuation
Yep, you can get the assertion by applying the cursor-cycle patch and then "mach xpcshell-test dom/indexedDB/test/unit/test_cursor_cycle.js" in a debug-mode build.
Looking at the logs, using my find_roots script ( https://github.com/amccreight/heapgraph ), there's an IDBObjectStore with a refcount of 3 that is holding things alive.  In the second CC log, its refcount has dropped to 2, which are the same references as in the first one, so there's some edge the CC isn't being told about that goes away after the first CC.

The next step is to set a breakpoint at the start of the shutdown CC, then once you hit that, set a breakpoint in the release method IDBObjectStore::Release and see the stacks where it is being called.  I'll try to figure out how to get that going, or you can do it if you want.
(In reply to Andy Wingo [:wingo] from comment #9)
> Yep, you can get the assertion by applying the cursor-cycle patch and then
> "mach xpcshell-test dom/indexedDB/test/unit/test_cursor_cycle.js" in a
> debug-mode build.

Ah, thanks, I'll try my hand at getting lldb to work...
Well, no luck right now with getting lldb to do something useful.  I'll try again later today.
(In reply to comment #12)
> Well, no luck right now with getting lldb to do something useful.  I'll try
> again later today.

Are you having problems with setting breakpoints in lldb?
I followed the instructions here:
  https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_lldb
and I got the "WARNING:  Unable to resolve breakpoint to any actual locations." message, but it seems like that's expected.  Then trying to actually do run, I got a message about a bash syntax error.  This was running xpcshell through mach with --debugger=lldb.
(In reply to comment #14)
> I followed the instructions here:
>   https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_lldb
> and I got the "WARNING:  Unable to resolve breakpoint to any actual locations."
> message, but it seems like that's expected.  Then trying to actually do run, I
> got a message about a bash syntax error.  This was running xpcshell through
> mach with --debugger=lldb.

Bug 942133 might be helpful to you...
Attached file gdb-log
Attached is a GDB log following instructions in comment 10.
Flags: needinfo?(continuation)
Attached file gc-edges.16338.log
Attached file gc-edges.16338-1.log
Attached file gc-edges.16338-2.log
Could you also attach the cc-edges*-1 file if you have it?  Thanks.
Flags: needinfo?(wingo)
(In reply to Andrew McCreight [:mccr8] from comment #20)
> Could you also attach the cc-edges*-1 file if you have it?  Thanks.

Those were attached earlier.
Flags: needinfo?(wingo)
Kyle looked in the GC logs and found the object holding alive the object store, which leads to this chain in the GC log:

0x7f5c33e7f840 G Function
[...]
> 0x7f5c33e65fc0 G fun_callscope

0x7f5c33e65fc0 G Call <no private>
[...]
> 0x7f5c33e3f640 G objectStore

What holds the Function alive?  There's nothing in the log that refers to it, and it isn't a root in the log.

Either the log is incomplete, or something dying in that first GC causes the Function to become unreachable.  The latter scenario would be something like, we have a JS object A that holds alive some C++ object B.  Object A is garbage, but not yet collected.  Object B gray-roots the Call.  We start the GC, trace from object B, keeping the Call alive.  Then we sweep and kill off object A, which kills off object B, leaving the Call as an orphan that doesn't get cleaned up until the next GC.

The problem here is that the DumpHeapComplete runs after the end of the GC, not during it, so it doesn't see precisely what the GC sees.  Probably running a DumpHeapComplete before the GC would help.
Flags: needinfo?(continuation)
I captured a GC heap dump immediately prior to the first CC shutdown GC.  The garbage Function is reachable from two things.  First, there's the gray root "mCallback", which is not very descriptive, but maybe somebody else knows what that is (eg there's an mCallback field somewhere that points to the Function).

0x109b80840 G Function [...]

The Function is also being held by another Call, being held alive by another garbage Function:

0x109b80900 G Call <no private> [...]
> 0x109b80840 G **UNKNOWN SLOT 1** [...]

0x109b80940 G Function [...]
> 0x109b80900 G fun_callscope

There are no references to this Function.  This feels like it may be a situation where a chain of functions, calls, and event-whatevers is taking k GC-CC cycles to tear down.
I'm guessing mCallback is from CallbackObject.
Attachment #8338537 - Attachment is obsolete: true
Assignee: continuation → wingo
Comment on attachment 8341593 [details] [diff] [review]
Add a test that IndexedDB has some cycle bug, and a workaround

mccr8 suggested on IRC yesterday that we move ahead with a workaround, adding a gc() call before the test finishes.  That breaks the cycle and allows bug 927782 to proceed, while keeping this bug open.
Attachment #8341593 - Flags: review?(continuation)
Comment on attachment 8341593 [details] [diff] [review]
Add a test that IndexedDB has some cycle bug, and a workaround

Review of attachment 8341593 [details] [diff] [review]:
-----------------------------------------------------------------

There's definitely still some kind of issue here, so I'd like there to be two bugs, one for this workaround that will get closed when the patch merges to central, and another one that remains open.  So, please file a new bug.  Whether you want to open issue to be the new one or the old one doesn't matter much, as long as the comment is accurate and the bugs refer to each other.

An indexedDB person should at least rubberstamp the test changes.
Attachment #8341593 - Flags: review?(khuey)
Attachment #8341593 - Flags: review?(continuation)
Attachment #8341593 - Flags: review+
Sadly there appear to be three other cases in the xpcshell suite -- two more in indexedDB and one in jsdownloads.  Not sure if they are related, besides the fact that they only show up when all generator locals are on scope chains.

I'll post a new patch and make a new bug for the workarounds, and probably file another bug for the jsdownloads failure.
Comment on attachment 8341593 [details] [diff] [review]
Add a test that IndexedDB has some cycle bug, and a workaround

Clearing review request here in favor of bug 945813.
Attachment #8341593 - Flags: review?(khuey)
No longer blocks: 927782
Priority: -- → P5
Assignee: wingo → nobody
Severity: normal → S3

Duping this to bug 945813 which seems to have superseded this one and it's not clear that there is anything remaining here that's directly actionable. (Ideally we fixed all cycle collection problems in other bugs! :)

Status: NEW → RESOLVED
Closed: 1 year ago
Duplicate of bug: 945813
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: