Closed
Bug 1334086
Opened 7 years ago
Closed 4 years ago
Crash in nsGlobalWindow::SetOpenerWindow
Categories
(Core :: DOM: Core & HTML, defect, P3)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr45 | --- | unaffected |
firefox51 | --- | unaffected |
firefox52 | --- | fixed |
firefox-esr52 | --- | fixed |
firefox53 | --- | wontfix |
firefox54 | --- | fixed |
firefox55 | --- | fixed |
People
(Reporter: philipp, Unassigned)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(2 files, 1 obsolete file)
1.22 KB,
patch
|
jcristau
:
approval-mozilla-beta+
lizzard
:
approval-mozilla-release-
|
Details | Diff | Splinter Review |
1.20 KB,
patch
|
ehsan.akhgari
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-342a68bf-ed97-46a3-a811-1be472170126. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 xul.dll nsGlobalWindow::SetOpenerWindow(nsPIDOMWindowOuter*, bool) dom/base/nsGlobalWindow.cpp:3235 1 xul.dll nsWindowWatcher::ReadyOpenedDocShellItem(nsIDocShellTreeItem*, nsPIDOMWindowOuter*, bool, bool, mozIDOMWindowProxy**) embedding/components/windowwatcher/nsWindowWatcher.cpp:2164 2 xul.dll nsWindowWatcher::OpenWindowInternal(mozIDOMWindowProxy*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, bool, bool, nsIDocShellLoadInfo*, mozIDOMWindowProxy**) embedding/components/windowwatcher/nsWindowWatcher.cpp:1077 3 xul.dll nsWindowWatcher::OpenWindow2(mozIDOMWindowProxy*, char const*, char const*, char const*, bool, bool, bool, nsISupports*, bool, bool, nsIDocShellLoadInfo*, mozIDOMWindowProxy**) embedding/components/windowwatcher/nsWindowWatcher.cpp:442 4 xul.dll nsGlobalWindow::OpenInternal(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, bool, bool, bool, bool, bool, nsIArray*, nsISupports*, nsIDocShellLoadInfo*, bool, nsPIDOMWindowOuter**) dom/base/nsGlobalWindow.cpp:12590 5 xul.dll nsGlobalWindow::OpenNoNavigate(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsPIDOMWindowOuter**) dom/base/nsGlobalWindow.cpp:8463 6 xul.dll nsDocShell::InternalLoad(nsIURI*, nsIURI*, bool, nsIURI*, unsigned int, nsIPrincipal*, nsIPrincipal*, unsigned int, nsAString_internal const&, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) docshell/base/nsDocShell.cpp:10068 7 xul.dll nsDocShell::OnLinkClickSync(nsIContent*, nsIURI*, char16_t const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, nsIDocShell**, nsIRequest**) docshell/base/nsDocShell.cpp:14043 8 xul.dll OnLinkClickEvent::Run() docshell/base/nsDocShell.cpp:13822 9 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1216 10 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:96 11 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:225 12 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156 13 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:262 14 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp:283 15 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp:4467 16 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp:4600 17 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:4691 this is a cross-platform crash that is regressing since firefox 52 with the MOZ_RELEASE_ASSERT(!contentOpener || !mTabGroup || mTabGroup == Cast(contentOpener)->mTabGroup) that was added in bug 1303196.
Comment 2•6 years ago
|
||
Ehsan, you've been looking at this type of crash recently, and may have a better idea of where this sort of problem would be coming from. Our best bet for 52 may be to remove the assertion and try to fix it in 54.
Flags: needinfo?(michael) → needinfo?(ehsan)
Comment 3•6 years ago
|
||
(In reply to Michael Layzell [:mystor] from comment #2) > Ehsan, you've been looking at this type of crash recently, and may have a > better idea of where this sort of problem would be coming from. I have? This is the first I'm seeing this type of crash as far as my memory serves. Are you confusing this with bug 1334281 (unrelated)?
Flags: needinfo?(ehsan) → needinfo?(michael)
Comment 4•6 years ago
|
||
(In reply to :Ehsan Akhgari from comment #3) > I have? This is the first I'm seeing this type of crash as far as my memory > serves. Are you confusing this with bug 1334281 (unrelated)? Yup, I think I am. sorry about that.
Flags: needinfo?(michael)
Comment 5•6 years ago
|
||
This seems like it's only occurring in Firefox 52 (at least I don't see any crashes on 53/54) - not sure why we wouldn't be hitting this crash in Firefox 53+. I fixed a lot of bugs like this one when I originally landed the OpenerWindow changes, but I don't really have any ideas about what in particular could be triggering this crash. The stack doesn't really tell me much.
Reporter | ||
Comment 6•6 years ago
|
||
there are crashes with 53&54 as well: http://bit.ly/2jx205H
Comment 7•6 years ago
|
||
(In reply to [:philipp] from comment #6) > there are crashes with 53&54 as well: http://bit.ly/2jx205H Ah, I'm just bad at using the crashstats API :)
Comment 8•6 years ago
|
||
Michael, could you suggest how to progress this bug? Thanks.
Flags: needinfo?(michael)
Comment 9•6 years ago
|
||
I looked into this crash a bit. There are a couple of interesting things to note: a) All but one of these crashes occurs in the chrome process, and b) IIRC we don't actually care about TabGroups being 100% accurate outside of the content process. This means that, at least as a temporary solution, we can simply disable this assertion in the parent process, as it shouldn't cause us any problems to simply have slightly incorrect TabGroups when in the chrome process. Ehsan, do you have thoughts on this as, at least a temporary solution. I spent a few hours about a week ago looking through a minidump from one of these crash reports and I wasn't able to figure out where the setting of the opener went wrong. The code paths for opening new windows with openers in the parent process are much more complex than the ones in the content process, so limiting ourselves to only having to ensure that the content process sets openers correctly would greatly simplify things. MozReview-Commit-ID: A9kIBs6lC5x
Attachment #8837200 -
Flags: review?(ehsan)
Updated•6 years ago
|
Flags: needinfo?(michael)
Comment 10•6 years ago
|
||
Can you provide a breakdown of how many of the existing reports are from the parent vs the child? Also, how are you distinguishing whether the crash is really coming from the parent vs the case where e10s is disabled? I'm OK to disable this check on beta to deal with the current bleeding but this is a wallpaper, and I would be more comfortable with it if we were on a path towards discovering why this happens...
Flags: needinfo?(michael)
Comment 11•6 years ago
|
||
(In reply to :Ehsan Akhgari from comment #10) > Can you provide a breakdown of how many of the existing reports are from the > parent vs the child? Also, how are you distinguishing whether the crash is > really coming from the parent vs the case where e10s is disabled? I'm OK to > disable this check on beta to deal with the current bleeding but this is a > wallpaper, and I would be more comfortable with it if we were on a path > towards discovering why this happens... I used the crash reporter to filter by which process was causing the crash ("Process type"). In the results of that check, I discovered that there was only 1 crash which occurred in a non-browser process (where browser as far as I can tell means chrome, as the options are "content", "browser", "gpu", and "plugin"). This is that crash: https://crash-stats.mozilla.com/report/index/d254b080-dfc9-498f-900f-ca95c2161013 The stack in that crash report looks mildly insane, as for some reason the call to Initialize in gfxGDIFont is calling into nsGlobalWindow::SetOpenerWindow, so I imagine that this crash is due to heap corruption of some form, rather than because of a problem with our opener logic. In addition, unlike the other crashes, this crash was not caused by the MOZ_RELEASE_ASSERT, but rather by a different memory access violation. The other crashes all occurred in the parent process. I also would prefer to be on a path toward discovering why this happens. I imagine that it occurs because there is some code path in the parent process where the opener window is not set correctly when a new window is created. I tried inspecting a minidump which you sent to me, and the codepath which it appears would have been used should have worked fine, so I don't know what the problem was. One advantage of this change is that the codepaths in the content process for correctly setting window.opener are much simpler than the codepaths in the parent process, meaning that we will have much less code to look through in the future if TabGroups end up being incorrect.
Flags: needinfo?(michael)
Comment 12•6 years ago
|
||
(In reply to Michael Layzell [:mystor] from comment #11) > (In reply to :Ehsan Akhgari from comment #10) > > Can you provide a breakdown of how many of the existing reports are from the > > parent vs the child? Also, how are you distinguishing whether the crash is > > really coming from the parent vs the case where e10s is disabled? I'm OK to > > disable this check on beta to deal with the current bleeding but this is a > > wallpaper, and I would be more comfortable with it if we were on a path > > towards discovering why this happens... > > I used the crash reporter to filter by which process was causing the crash > ("Process type"). In the results of that check, I discovered that there was > only 1 crash which occurred in a non-browser process (where browser as far > as I can tell means chrome, as the options are "content", "browser", "gpu", > and "plugin"). This is that crash: > https://crash-stats.mozilla.com/report/index/d254b080-dfc9-498f-900f- > ca95c2161013 Are you paying attention to the absense of the ProcessType field? AFAICT that field only gets added to the crash report in e10s. And most of these crashes indeed seem to be from non-e10s browsers (for example, comment 0.) > The stack in that crash report looks mildly insane, as for some reason the > call to Initialize in gfxGDIFont is calling into > nsGlobalWindow::SetOpenerWindow, so I imagine that this crash is due to heap > corruption of some form, rather than because of a problem with our opener > logic. In addition, unlike the other crashes, this crash was not caused by > the MOZ_RELEASE_ASSERT, but rather by a different memory access violation. Hmm, the stack trace there is surely suspect. I have no good theory what's going on here. > The other crashes all occurred in the parent process. I don't think that's correct, per above. > I also would prefer to be on a path toward discovering why this happens. I > imagine that it occurs because there is some code path in the parent process > where the opener window is not set correctly when a new window is created. I > tried inspecting a minidump which you sent to me, and the codepath which it > appears would have been used should have worked fine, so I don't know what > the problem was. > > One advantage of this change is that the codepaths in the content process > for correctly setting window.opener are much simpler than the codepaths in > the parent process, meaning that we will have much less code to look through > in the future if TabGroups end up being incorrect. Again, after looking at about 10 or so of these crashes, I'm now fairly certain that it happens in non-e10s builds rather than the parent process. So r- for the patch because of that. My suggestion for what to do next: try out the URLs that I sent you a while back with these crashes in a non-e10s browser to see if you can reproduce (assuming you can figure out how to create a new window from those sites). If that doesn't reveal anything, then I think the best and only remaining option would be to assert our invariants earlier and hope that one of such assertions would fire off earlier than the crash to shed more light on what's happening.
Updated•6 years ago
|
Attachment #8837200 -
Flags: review?(ehsan) → review-
Comment 13•6 years ago
|
||
Another thing to note is that this crash happens under https://hg.mozilla.org/releases/mozilla-beta/annotate/78ae21055d9f/docshell/base/nsDocShell.cpp#l9990. So you should look for code that does a window.open with a target window name but without an existing docshell registered for that target (it could be "_blank" or a real name I think if I'm reading the code correctly).
Updated•6 years ago
|
status-firefox-esr52:
--- → affected
Comment 14•6 years ago
|
||
Hey Michael, I don't suppose you've had a chance to come back to this? Fx52 goes to RC next Monday, so we're getting low on time.
Flags: needinfo?(michael)
Comment 15•6 years ago
|
||
We should remove the assertion on 52 beta.
Comment 16•6 years ago
|
||
Comment on attachment 8837200 [details] [diff] [review] Don't check opener's TabGroup when in parent process r? on this patch for Beta so we can disable the assertion there.
Flags: needinfo?(michael)
Attachment #8837200 -
Flags: review- → review?(ehsan)
Comment 17•6 years ago
|
||
Comment on attachment 8837200 [details] [diff] [review] Don't check opener's TabGroup when in parent process Review of attachment 8837200 [details] [diff] [review]: ----------------------------------------------------------------- The commit message also needs changing... ::: dom/base/nsGlobalWindow.cpp @@ +3424,5 @@ > + // when in a content process. > + if (XRE_IsContentProcess()) { > + nsPIDOMWindowOuter* contentOpener = GetSanitizedOpener(aOpener); > + MOZ_RELEASE_ASSERT(!contentOpener || !mTabGroup || > + mTabGroup == Cast(contentOpener)->mTabGroup); No need to assert in any process on beta. Please instead of doing this, just remove the MOZ_RELEASE_ASSERT completely. r=me with that for beta.
Attachment #8837200 -
Flags: review?(ehsan) → review+
Comment 18•6 years ago
|
||
MozReview-Commit-ID: 3cAsrJzTvXt
Updated•6 years ago
|
Attachment #8837200 -
Attachment is obsolete: true
Updated•6 years ago
|
Keywords: leave-open
Comment 19•6 years ago
|
||
Comment on attachment 8840498 [details] [diff] [review] Don't check opener's TabGroup in beta Approval Request Comment [Feature/Bug causing the regression]: bug 1303196 [User impact if declined]: Occasional unnecessary crashes [Is this code covered by automated tests?]: N/A [Has the fix been verified in Nightly?]: N/A [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: No - this removes an unused-in-beta RELEASE_ASSERT which was occasionally being hit. [Why is the change risky/not risky?]: The assertion which is being removed cannot lead to memory unsafety in beta. [String changes made/needed]: None
Attachment #8840498 -
Flags: approval-mozilla-beta?
Comment 20•6 years ago
|
||
Comment on attachment 8840498 [details] [diff] [review] Don't check opener's TabGroup in beta remove a release assert in beta52
Attachment #8840498 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 21•6 years ago
|
||
We may want to turn this into a diagnostic assert on trunk and aurora for now, so that we don't have to keep fixing this on beta when 53 goes to beta...
Comment 22•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/3ba4a9e4bf35
Comment 23•6 years ago
|
||
MozReview-Commit-ID: 13zUBotz7Tr
Attachment #8840550 -
Flags: review?(ehsan)
Comment 24•6 years ago
|
||
Setting qe-verify- based on Michael's assessment on manual testing needs (see Comment 19).
Flags: qe-verify-
Comment 25•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr52/rev/3ba4a9e4bf35
Updated•6 years ago
|
Attachment #8840550 -
Flags: review?(ehsan) → review+
Comment 26•6 years ago
|
||
Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/32dce6ccde94 Change the SetOpenerWindow tabgroup assertion to a DIAGNOSTIC_ASSERT, r=ehsan
Comment 27•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/32dce6ccde94
Comment 29•6 years ago
|
||
(In reply to Mark Banner (:standard8) from comment #28) > Should this still be open? Yes, we haven't had a chance to fix the actual underlying problem, we have only disabled the problematic code in beta/release.
Flags: needinfo?(michael)
Comment 30•6 years ago
|
||
Michael and I just discussed how this *could* be caused by addons but looking at recent reports it does appear to be happening to people who only have the default addons.
![]() |
||
Comment 31•6 years ago
|
||
(In reply to Michael Layzell [:mystor] from comment #29) > (In reply to Mark Banner (:standard8) from comment #28) > > Should this still be open? > > Yes, we haven't had a chance to fix the actual underlying problem, we have > only disabled the problematic code in beta/release. This does not appear to be the case: https://hg.mozilla.org/releases/mozilla-release/annotate/d345b657d381/dom/base/nsGlobalWindow.cpp#l3191 That revision is tagged as 53 release, and the crash reports for this signature are still occurring in 53 release. What went wrong?
Flags: needinfo?(michael)
![]() |
||
Comment 32•6 years ago
|
||
Peculiarly, it only seems to be happening on 53 and earlier; no reports for 54 and 55.
Comment 33•6 years ago
|
||
How odd. It looks like when the patch was landed it was only landed on beta and central, and not into aurura (which is presumably 53 release today). Applying the patch from comment 19 should make this go away.
Flags: needinfo?(michael)
Comment 34•6 years ago
|
||
Comment on attachment 8840498 [details] [diff] [review] Don't check opener's TabGroup in beta Approval Request Comment see comment 19 The fix for this problem seems like it was only landed on beta and central when it was landed, and it was not landed on aurora (which at the time would have been 53). Landing this patch on 53 would make this crash signature go away. It's probably not worth triggering a dot release, but might be good as a ride-along.
Attachment #8840498 -
Flags: approval-mozilla-release?
This landed on m-b on Feb. 23 and m-c on Feb 25. Beta was 52 and nightly was 54 at that point. Because the status-firefox54 flag wasn't ever set to fixed (as it should be automatically from bugherder), we didn't see this as a "missed uplift" candidate.
status-firefox55:
--- → fixed
This isn't very high volume on release, so I think wontfixing is reasonable for 53.
Attachment #8840498 -
Flags: approval-mozilla-release? → approval-mozilla-release-
Comment 37•5 years ago
|
||
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Comment 38•4 years ago
|
||
The method should be now nsGlobalWindowOuter::SetOpenerWindow, but I don't see crash reports for it.
I would have expected MOZ_DIAGNOSTIC_ASSERT to crash at least occasionally.
Fission work has changed the code here recently.
If there are still issues, please reopen.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Updated•4 years ago
|
Keywords: leave-open
Assignee | ||
Updated•4 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•