Open Bug 1829760 Opened 2 years ago Updated 5 months ago

Crash in nsGlobalWindowOuter::PrepareForProcessChange due to GetRemoteOuterWindowProxy failing

Categories

(Core :: DOM: Window and Location, defect)

defect

Tracking

()

People

(Reporter: bradwerth, Unassigned)

References

Details

(Keywords: stalled)

Crash Data

Attachments

(1 obsolete file)

Crashed just now: https://crash-stats.mozilla.org/report/index/319e0958-2c94-496f-8e26-ac7cf0230424

Looks like nsGlobalWindowOuter::PrepareForProcessChange will null dereference the BrowsingContext if the window associated with the proxy has been unlinked.

Perhaps this means that the proxy should addref the window to prevent it from being unlinked? I don't know. I can write a patch that will prevent this specific crash, but I don't know if this is merely covering up a larger problem.

Andrew, do you know if this is the correct solution?

Flags: needinfo?(continuation)

mWindowProxy is a weak reference that is supposed to get cleared when we unlink the outer window, so I don't think that should be happening.

The crash you linked doesn't look like a null deref of a BC.

The crash reason is MOZ_CRASH(PrepareForProcessChange GetRemoteOuterWindowProxy), which means that the call to GetRemoteOuterWindowProxy failed. That in turn looks like it means that RemoteObjectProxyBase::GetOrCreateProxyObject() failed to return a proxy object, which can fail for a variety of reasons, like failing to allocate a JS object.

Flags: needinfo?(continuation)
Crash Signature: [@ nsGlobalWindowOuter::PrepareForProcessChange ]

(In reply to Andrew McCreight [:mccr8] from comment #4)

The crash reason is MOZ_CRASH(PrepareForProcessChange GetRemoteOuterWindowProxy), which means that the call to GetRemoteOuterWindowProxy failed. That in turn looks like it means that RemoteObjectProxyBase::GetOrCreateProxyObject() failed to return a proxy object, which can fail for a variety of reasons, like failing to allocate a JS object.

Ah, you're right, I wasn't reading the crash report correctly. I'm not sure how to fixup GetOrCreateProxyObject. I'll take myself off this Bug and abandon the patch.

Attachment #9330064 - Attachment is obsolete: true
Assignee: bwerth → nobody

Yeah, this stuff is pretty nasty. I think the basic issue is that it is hard to know what to even do if there is a failure, because we're in the middle of some complex mutating operation we can't really roll back. So we just crash.

Severity: -- → S3

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on release

:edgar, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(echen)
Keywords: topcrash

(In reply to Andrew McCreight [:mccr8] from comment #4)

The crash reason is MOZ_CRASH(PrepareForProcessChange GetRemoteOuterWindowProxy), which means that the call to GetRemoteOuterWindowProxy failed. That in turn looks like it means that RemoteObjectProxyBase::GetOrCreateProxyObject() failed to return a proxy object, which can fail for a variety of reasons, like failing to allocate a JS object.

FWIW, the first 5 reports I clicked on all have JSOutOfMemory: Reported set.

IIUC the block that runs the nsAutoScriptBlocker::~nsAutoScriptBlocker() is here in void nsFrameLoaderOwner::ChangeRemotenessCommon. - which is a void function. The nsGlobalWindowOuter::PrepareForProcessChange function just returns if we find no outer window which is probably equally unexpected but then crashes apparently on OOM. If it is fine to just return in the first case, wouldn't it be fine to do so also in the OOM case (and maybe crash later due to it or not)? And if not, it would be nice to at least mark these crashes as OOM to make it easier to ignore them.

It does seem that the JSOutOfMemory: Reported state is potentially a temporary state since it can be recovered from during GC. Indeed in at least some of the crash reports with JSOutOfMemory: Reported in the crash annotations, there appears to be plenty of available memory.

If this is correct, then maybe it's possible for RemoteObjectProxyBase::GetOrCreateProxyObject or something further up the call stack to detect the failed allocation and then force a synchronous GC and try again.

increasing increasing the severity to S2 as it is a top-crash.

Severity: S3 → S2
Flags: needinfo?(echen)

Andrew, is there anything we can do here, based on new findings from comment 8 and comment 9? Thanks!

Flags: needinfo?(continuation)
Summary: nsGlobalWindowOuter::PrepareForProcessChange should survive being called on a proxy to an unlinked window → Crash in nsGlobalWindowOuter::PrepareForProcessChange due to GetRemoteOuterWindowProxy failing

Those are some interesting ideas, but I'm not sure if they will help.

The current code crashes instead of bailing out because we're in the middle of a multi-step process to replace one window proxy with another, where we've cleared out some references in an outer window and a browsing context, so we're not in a valid state. If we just keep going, we'll probably crash on a null deref later. With some refactoring, maybe we could attempt to create the new proxy object before we start clearing things out, but I'm not sure how weird it would be if we give up and leave that floating around.

As for GCing, I think we already do a "last ditch" GC whenever we fail to allocate a JS object, so I don't think an extra GC will help. (I'm not super familiar with that mechanism so I might be wrong.)

I could try pushing the crash into RemoteObjectProxyBase::GetOrCreateProxyObject() when !aTransplantTo, as the callers that pass in a non-null value all crash on failure. The failure could be happening in an unexpected place. I could also make the MOZ_ASSERT(!aTransplantTo) into a release assert to eliminate the chance there is some weirdness going on there. My observation of bug 1405521, where bz landed a tremendous amount of instrumentation to try to understand a crash in a similar-ish bit of DOM code makes me worry that this could be a bit of a wild goose chase, but moving the crash would be pretty simple so we might as well try it.

See Also: → 1405521

Peter, any ideas? I haven't looked at this code in a few years so maybe I'm overlooking something. Thanks.

Flags: needinfo?(peterv)
Depends on: 1851922

My new crash points have been hit 3 times now on Nightly.

bp-8d589a3d-c468-4e01-a7d8-209800230917 GOCPO failed at JS_SetImmutablePrototype
bp-3cd2051b-98d8-4fc5-ba41-a84da0230917 GOCPO failed at NewProxyObject
bp-ada12b07-3711-4bf3-8b73-bd6bd0230914 GOCPO failed at NewProxyObject

So, yeah it looks like it is failing after trying to allocate some JS stuff.

Crash Signature: [@ nsGlobalWindowOuter::PrepareForProcessChange ] → [@ nsGlobalWindowOuter::PrepareForProcessChange ] [@ mozilla::dom::RemoteObjectProxyBase::GetOrCreateProxyObject ]
Flags: needinfo?(continuation)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
Flags: needinfo?(peterv)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: