Closed Bug 1503787 Opened 2 years ago Closed 7 months ago

Crash in mozilla::dom::ClientSource::WindowExecutionReady

Categories

(Core :: Privacy: Anti-Tracking, defect, P2)

65 Branch
Unspecified
All
defect

Tracking

()

VERIFIED FIXED
mozilla76
Tracking Status
firefox-esr68 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- disabled
firefox72 --- disabled
firefox73 --- disabled
firefox74 --- disabled
firefox75 --- wontfix
firefox76 --- verified

People

(Reporter: baku, Assigned: baku)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, steps-wanted, Whiteboard: [privacy65][anti-tracking])

Crash Data

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1499995 +++

We still have some crash reports for mozilla::dom::ClientSource::WindowExecutionReady's assertions.
baku, are there any next steps we can take here?
Flags: needinfo?(amarchesini)
Assignee: nobody → amarchesini
User report of repeated crashes on feedly:
https://github.com/webcompat/web-bugs/issues/21346
it appears to be semi-repeatable

I did see quite a few feedly crashes
jesup, have you been able to reproduce this crash somehow? Because I've tried any possible feed with feedly, but nothing.
I also suspect it's geolocation problem: I could receive different adverts from somebody else from another country.
Flags: needinfo?(amarchesini) → needinfo?(rjesup)
I have not tried; I don't use feedly (and barely touch reddit).  I was just investigating the crash reports
Flags: needinfo?(rjesup)
diagnostic asserts don't fire in late beta or release, wontfixing for 64.
See Also: → 1508044
Andrea, Ehsan said you might be interested in bug 1508044 comment 12 where I was able (with the patch in that bug) reproduce 

Assertion failure: spec.LowerCaseEqualsLiteral("about:blank") || StringBeginsWith(spec, static_cast<const nsLiteralCString&>(nsLiteralCString("" "blob:"))) || nsContentUtils::StorageAllowedForWindow(aInnerWindow) == nsContentUtils::StorageAccess::eAllow>

on https://www.youtube.com/?gl=FR&hl=fr
See Also: 1508044
Priority: -- → P1
Version: Trunk → 65 Branch
Target Milestone: --- → Future

This has a Target Milestone and is set to P1 so let's leave it off the 66 affected radar.

Component: DOM: Content Processes → Privacy: Anti-Tracking
Whiteboard: [privacy65] → [privacy65][anti-tracking]
Priority: P1 → P2

(I was wrong; the crash widget thing here seems to not match Socorro) There are quite a few patches across all channels, many at the same address. baku, can you take another look here?

Flags: needinfo?(amarchesini)

Baku, any news on this bug?

No crashes since beta 10, so wontfix for 67.

Flags: needinfo?(amarchesini)

Only one crash since 68 beta 9, so wontfix for 68

Adding 70 as affected. This is visible in 69 and one comment says "Still 100% tab crash on innoreader.com in RSS article with YouTube video in it. "

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #13)

Adding 70 as affected. This is visible in 69 and one comment says "Still 100% tab crash on innoreader.com in RSS article with YouTube video in it. "

It was me, I have last nightly and it still 100% crash on inoreader.com in RSS article with YouTube video in it 🙂
Steps to reproduce:

  1. Open https://www.inoreader.com/ and sign in or create new account.
  2. Subscribe to feed with many video articles, for example https://feeds.feedburner.com/failblog.
  3. Open article with embedded video in this feed.
  4. Gah. Your tab just crashed.

I got this crash (bp-051f94bd-6c2c-47c6-b6fb-8af6e0190809) on https://inexpensive-daisy.glitch.me/. Youtube had this service worker from about:serviceworkers:

Origin: https://www.youtube.com

Scope: https://www.youtube.com/
Script Spec: https://www.youtube.com/sw.js
Current Worker URL: https://www.youtube.com/sw.js
Active Cache Name: {ea8bd166-b827-4207-a351-8078d59a6524}
Waiting Cache Name: {$name}
Push Endpoint: null

(In reply to :Ehsan Akhgari from comment #15)

I got this crash (bp-051f94bd-6c2c-47c6-b6fb-8af6e0190809) on https://inexpensive-daisy.glitch.me/.

Reproduced the crash on the same page again: bp-042cf8ba-1858-4fa7-b732-e23bc0190812

STR using the latest Nightly:

  1. Set browser.startup.page to 3 and dom.serviceWorkers.parent_intercept to false.
  2. Visit youtube.com, and about:serviceworkers. Ensure that the YouTube service worker is installed.
  3. Make sure you only have one tab open. In that tab, navigate to https://inexpensive-daisy.glitch.me/.
  4. Quit Nightly.
  5. Restart Firefox. At this point the session is restored, and the inexpensive-daisy page crashes while restoring with this signature (see comment 15 and 16).
Has STR: --- → yes

Ehsan is this is the correct component or should it be in DOM::Service Workers ? I'm seeing a bit of a spike in nightly 70 right now so I'm a little worried how this is going to look in beta 70. Can you suggest an owner for this bug?

Flags: needinfo?(ehsan)

(In reply to Liz Henry (:lizzard) from comment #18)

Ehsan is this is the correct component or should it be in DOM::Service Workers ? I'm seeing a bit of a spike in nightly 70 right now so I'm a little worried how this is going to look in beta 70. Can you suggest an owner for this bug?

It is in the right component I guess, really what matters is who looks at it. This is the intersection of service workers and anti-tracking.

I'm quite certain that this only impacts youtube embeds in ETP with level 2 blocking which isn't shipping yet (it's limited to nightly and early beta). This can be verified in the STR in comment 17 by setting the privacy.annotate_channels.strict_list.enabled pref to false.

Andrea is very busy on the secure proxy project, perhaps Andrew has some time to have a look now that we have a reliable STR here?

Flags: needinfo?(ehsan) → needinfo?(bugmail)

Ah, that explains why the crash volume dropped off after beta 7. So, this shouldn't affect 70 release (at least not much.)

I debugged the steps to reproduce in comment 17. Here is what happens.

When loading the embedded youtube iframes when restoring the session, we start to run nsGlobalWindowInner::ExecutionReady(). First we run nsGlobalWindowInner::EnsureClientSource() which calls SetController().

SetController() has similar diagnostic asserts to the ones that fail and crash here, but those do not run yet since mOwner is nullptr so both GetInnerWindow() and GetWorkerPrivate() return nullptr. Note that mOwner is only set in WindowExecutionReady/WorkerExecutionReady, which haven't been called yet (see the above paragraph!)

So the first conclusion here is that we should be crashing earlier, but by sheer accident, we don't...

But why is the assertion not holding in the first place? We must have decided to intercept the channel when we shouldn't have. Let's see why that is. If we look at the anti-tracking check that happens within nsDocShell::ShouldPrepareForIntercept() when it's called for the embedded youtube's document channel, it bails out from here, even though youtube.com is a third-party tracker. The reason why IsThirdPartyTrackingResource() returns false here is that ShouldInterceptURI() is called very early before the channel is classified...

This is very similar to bug 1516309, it's just that we saw the same problem there with HTTP redirects, but here we are seeing it with service worker interceptions.

For service worker interceptions, given how close we are to turning on parent interception by default (which calls ShouldIntercept() after classification has happened) I think it makes sense to just wait on bug 1588154 instead of working on a fix here which would become obsolete either in this release cycle or the next...

Depends on: 1588154
Whiteboard: [privacy65][anti-tracking] → [privacy65][anti-tracking][will be fixed by bug 1588154]
See Also: → 1516309

Marking as disabled for 71 as it depends on the privacy.annotate_channels.strict_list.enabled pref, disabled on the release channel.

I noticed that I am still seeing these crashes with a STR similar to comment 17, except that dom.serviceWorkers.parent_intercept is true and the crash happens once Firefox is restarted.

I wonder if the fact that this happens during a restart is somehow related to bug 1599379... But before we rule that out, I wanted to double check an assumption that I made in comment 21 and now I cannot remember why I assumed that. Andrew, do you mind helping me verify that this call to ShouldInterceptURI can only be made when dom.serviceWorkers.parent_intercept is false? Thanks!

See Also: → 1599379

EDIT: I mis-spoke somewhat here. There should not be an intercept controller because the client docshell won't create one. This means the real call-stack is:

Calls:

Original Comment

I think that call to ShouldInterceptURI will be made, but because content-process ServiceWorkerManagers have no registrations, the call-chain[1] that terminates in a call to ServiceWorkerManager::IsAvailable will return false at that point. Unfortunately, all the logic related to STS upgrades and maybe other side-effects happen.

I think we want a parent-intercept guard at the ShouldInterceptURI call-site. Probably we should be asserting that if parent-intercept is on, mPostRedirectChannelShouldIntercept is false, and we should not invoke that call.

1: Calls:

Flags: needinfo?(bugmail)

An update on comment 23: I seem to no longer be able to reproduce this using the STR we have with no combination of prefs that I can think of at all, so it's possible that there is indeed nothing that we need to do here any more?

There are still regular crashes in 73 in this signature. - 34 crashes/26 installations in the last 7 days. All but one have MOZ_DIAGNOSTIC_ASSERT(spec.LowerCaseEqualsLiteral("about:blank") || StringBeginsWith(spec, static_cast<const nsLiteralCString&>(nsLiteralCString("" "blob:"))) || StorageAllowedForWindow(aInnerWindow) == StorageAccess::eAllow) .

Has STR: yes → no
Keywords: steps-wanted
Assignee: amarchesini → nobody
Whiteboard: [privacy65][anti-tracking][will be fixed by bug 1588154] → [privacy65][anti-tracking]
Depends on: 1607518

Given comment 27, it sounds like this might have been at least partially mitigated for 73+ by sw-e10s, though there's still some crashes showing up in Beta builds with low-ish volume.

I manage to reproduce the issue on a build from 2019-08-12 using Windows 10 x64 and a computer that has Intel HD Graphics Family by following the steps from comment 17.
Using the Latest Nightly on the same machine I tried various combination of scenarios ( changing prefs, restarting the browser, opening a lot of sites that have youtube videos embedded, installing various add-ons that I found in the crash reports) and sites that I took both from the comments left on the crash-stats.mozilla.org for this signature and from a list that was provided from englehardt (Thank you, Steve), but I didn't manage to reproduce the crash.

However, I did find out two things:

Flags: needinfo?(ehsan)

Thanks for trying to reproduce this.

(In reply to Oana Botisan, Desktop Release QA from comment #29)

I manage to reproduce the issue on a build from 2019-08-12 using Windows 10 x64 and a computer that has Intel HD Graphics Family by following the steps from comment 17.

That matches the regression range I found in comment 27 (in that the crash STR stopped working around October 9).

Using the Latest Nightly on the same machine I tried various combination of scenarios ( changing prefs, restarting the browser, opening a lot of sites that have youtube videos embedded, installing various add-ons that I found in the crash reports) and sites that I took both from the comments left on the crash-stats.mozilla.org for this signature and from a list that was provided from englehardt (Thank you, Steve), but I didn't manage to reproduce the crash.

However, I did find out two things:

That bug is unrelated, but please file it separately in the service workers component and make it block bug 1588154. Thanks!

Flags: needinfo?(ehsan)

However, I did find out two things:

That bug is unrelated, but please file it separately in the service workers component and make it block bug 1588154. Thanks!
I logged bug 1610501 for this crash.

Is there anything else that we can help you with regarding this issue?

Flags: needinfo?(ehsan)

(In reply to Oana Botisan, Desktop Release QA from comment #32)

Is there anything else that we can help you with regarding this issue?

We still need steps to reproduce for the crash...

Flags: needinfo?(ehsan)

Lots of crashes on the 2/26 Linux Nightly. Lots of URLs are https://www.thecrag.com/ where one of the crashes has the comment "Site has started crashing with last few nightlies.". Just visiting the site doesn't reproduce the crash for me.

OS: Windows 10 → All
Depends on: 1619086

The crashtest layout/generic/crashtests/471360.html in fission crashtests (not run anywhere by default) hits this pretty reliably. (You have to disable a couple other crashtests that crash for other reasons first to get to this test).

Duplicate of this bug: 1620674

(In reply to Timothy Nikkel (:tnikkel) from comment #35)

The crashtest layout/generic/crashtests/471360.html in fission crashtests (not run anywhere by default) hits this pretty reliably. (You have to disable a couple other crashtests that crash for other reasons first to get to this test).

Crashtest dom/base/crashtests/607222.html also intermittently hits this crash when Fission is enabled. That is duplicate bug 1620674.

Can we turn the diagnostic assert into a MOZ_ASSERT so this stops crashing for users, if we're not going to actually fix the crash soon?
It's become fairly common on nightly and DevEdition now.

Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
Assignee: nobody → amarchesini
Status: NEW → ASSIGNED

I wrote the patch, but if there is a STR, I'm happy to work on it.

Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f1a26f022ec7
Make WindowExecutionReady assertions MOZ_ASSERT only, r=timhuang
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: Future → mozilla76
Flags: qe-verify+

Reproduced the crash using Firefox 70.0a1 (20190812095530) on Windows 10 and STR from comment 17. Here is the crash report.
The issue is verified fixed using Firefox 76.0a1 (20200327215207) and Firefox 76.0b7 (20200421231527) on Windows 10x64. No crashes encountered while following STR from comment 17 and visiting https://www.thecrag.com/.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.