Closed Bug 1499995 Opened 6 years ago Closed 6 years ago

Crash in mozilla::dom::ClientSource::WindowExecutionReady

Categories

(Core :: DOM: Content Processes, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox-esr60 --- unaffected
firefox63 + disabled
firefox64 + fixed
firefox65 --- fixed

People

(Reporter: josh.tumath+bugzilla, Assigned: baku)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [privacy65])

Crash Data

Attachments

(3 files, 1 obsolete file)

I am encountering this tab crash every time I visit an article on The Verge. Nothing happens on first load, but when I start scrolling down the page, the tab crashes as soon as I reach a certain point. e.g. https://www.theverge.com/circuitbreaker/2018/10/16/17981684/amazon-kindle-paperwhite-update-new-waterproof-thinner-lighter-oasis-voyage This bug was filed from the Socorro interface and is report bp-0acb61ee-c4b1-4aaf-b462-0c0100181018. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::dom::ClientSource::WindowExecutionReady dom/clients/manager/ClientSource.cpp:286 1 xul.dll nsGlobalWindowOuter::SetNewDocument dom/base/nsGlobalWindowOuter.cpp:1958 2 xul.dll nsresult nsDocumentViewer::InitInternal layout/base/nsDocumentViewer.cpp:1032 3 xul.dll nsDocumentViewer::Init layout/base/nsDocumentViewer.cpp:771 4 xul.dll nsresult nsDocShell::Embed docshell/base/nsDocShell.cpp:6767 5 xul.dll nsDSURIContentListener::DoContent docshell/base/nsDSURIContentListener.cpp:196 6 xul.dll bool nsDocumentOpenInfo::TryContentListener uriloader/base/nsURILoader.cpp:759 7 xul.dll nsDocumentOpenInfo::OnStartRequest uriloader/base/nsURILoader.cpp:306 8 xul.dll void mozilla::net::HttpChannelChild::DoOnStartRequest netwerk/protocol/http/HttpChannelChild.cpp:774 9 xul.dll void mozilla::net::HttpChannelChild::OnStartRequest netwerk/protocol/http/HttpChannelChild.cpp:641 =============================================================
Thanks for filing Josh - we had this crash in Bug 1495285 - I am waiting to hear back from the developer as to whether we reopen that bug or track the crashes in this bug.
I'm working on this crash.
Andrew, do you have any idea why this crash happens after the landing of TrackingDummyChannel? I'm not able to reproduce it.
Flags: needinfo?(bugmail)
Can confirm this crash. It happened on https://www.reddit.com/r/DotA2/ and https://www.reddit.com/r/Artifact/ for me.
Seems to have been fixed in the last Nightly update.
Nevermind, ignore my last comment. It's still there. Sorry!
Can reproduce crash on https://www.reddit.com/r/VintageApple/comments/9pvjtl/apples_education_mac/ Webrender on, MacOS 10.14, MBP A1708. as of writing this, am on latest nightly build 20181020102231
I think our best option is if those able to reproduce this crash could run with "MOZ_LOG=AntiTracking:5" and record the output. The pages at https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Gecko_Logging#Enabling_Logging and https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging provide related information. Note that the ",sync" option would likely want to be used if logging to a file because otherwise the crash might eat the log info. Other than the potential for a race related to what the dummy channel perceives and what the subsequent normal check in the parent is, my best guess for an edge-case would be something related to controller inheritance and about:blank iframes. Per https://w3c.github.io/ServiceWorker/#selection, about:blank and other local schemes[1] will inherit the controller. I don't have any real idea how that could actually trigger a problem directly, it's just the alternate code path for marking a client controlled... The inheritance[2] uses ServiceWorkerAllowedToControlWindow()[3] which is a different check that doesn't involve the dummy channel. 1: https://fetch.spec.whatwg.org/#local-scheme 2: https://searchfox.org/mozilla-central/rev/4fc7bfa931189e0718dd70b4c59885829f1d761e/docshell/base/nsDocShell.cpp#2832 3: https://searchfox.org/mozilla-central/rev/4fc7bfa931189e0718dd70b4c59885829f1d761e/docshell/base/nsDocShell.cpp#13754
Flags: needinfo?(bugmail)
Uh, and note that the logs could quite potentially include private information from other windows/tabs, so you would want to: - Only include near the end of the log for the crashing page's site. - Probably send the log directly to :baku at amarchesini@mozilla.com
Tracking for 63 in case we have both a working fix and a dot release in the 63 to 64 timeframe.
Assignee: nobody → amarchesini
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
See Also: → 1495285
The signature is ranked #1 in content top-crashes for Firefox.
Keywords: regression, topcrash
@Moses and @Enzo - I am not able to reproduce the crash on my Mac on those reddit sites - is there something particular you are doing on the page when the crash happens, or does it crash when you load the page? Are you able to try what Andrew suggests in Comment 8 and Comment 9? If so, please let us know. Thanks.
Flags: needinfo?(spidersouris)
Flags: needinfo?(mb8672)
It seems to crash when loading some content. Sometimes, it crashes on scroll down and other times, it crashes when loading. I've sent an email to :baku with the logs yesterday, hope I did everything fine.
Flags: needinfo?(spidersouris)
Flags: needinfo?(mb8672)
Yes, I confirm. I received the log. I'm trying to understand why the crash happens. Thanks, Enzo!
With today's nightly I crash 100% of time on https://reddit.com when Content Blocking is enabled. It happens about 2 seconds into load. I'm on Win 10.
If I remove/add the youtube.com serviceworker the crashes stop/start. Still having trouble reproducing in a clean profile.
It seems to come down to something in permissions.sqlite that I don't understand. Clearing Site Preferences in settings fixes the crash. Here is the select lines from permissions.sqlite related to youtube/reddit. I can email you the full database if it is really needed. moz_perms table > 167|https://www.youtube.com|storageAccessAPI|1|2|1542904681092|1540312681092 > 178|https://www.reddit.com|3rdPartyStorage^https://www.youtube.com|1|2|1542644862782|1540052862782 > 30|https://www.reddit.com|storageAccessAPI|1|2|1542905134669|1540313134669 STR: - permissions.sqlite contains these entries - visit youtube.com to register service worker - visit reddit.com (no login, 'new' UI) - scrolll down until a youtube embed occurs - Crash
Great! I can reproduce it!
Tracking for 64 in case baku finds a fix - maybe this will be good for beta uplift.
I'm working on a test
Attachment #9019582 - Flags: review?(ehsan)
Attachment #9019582 - Attachment description: iframe.patch → part 1 - nested iframes + SW
TrackingDummyChannel has a bad bug: it always sets the tracking flag ignoring the real annotation result. We were passing all the tests because we don't have tests for intercepted 3rd party contexts: all the mochitests run in nested iframes and we use cookieBehavior 0 for them. In the next patches I add some tests. The fix consists in exposing nsIHttpChannelInternal interface and return the topWindowURL (the only method used by the classifier).
Attachment #9019663 - Flags: review?(ehsan)
Attached patch part 3 - testsSplinter Review
Tests for 3rd party intercepted contexts. We don't want a 3rd party intercepted context to load a sub intercepted iframe: in nightly/beta, opening an nested iframe from a intercepted iframe triggers the crash. I also fixed browser_antitracking.js because it was using multiple tabs, which were handled by different processes. Sometimes we were not controlling the iframe because the second process didn't know about the service worker registration yet.
Attachment #9019667 - Flags: review?(ehsan)
Attachment #9019582 - Flags: review?(ehsan) → review+
Comment on attachment 9019663 [details] [diff] [review] part 2 - fix TrackingDummyChannel Review of attachment 9019663 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/TrackingDummyChannel.h @@ +41,5 @@ > * 1496997 we can remove this implementation. Bug 1498259 covers removing this > * hack in particular. > */ > class TrackingDummyChannel final : public nsIChannel > + , public nsIHttpChannelInternal Hmm, a class which implements nsIHttpChannelInternal but not nsIHttpChannel. I wonder what a Necko peer would think about this. Do you mind asking someone like mayhemer to have a brief look at the patch please? @@ +49,5 @@ > > NS_DECL_THREADSAFE_ISUPPORTS > NS_DECL_NSIREQUEST > NS_DECL_NSICHANNEL > + NS_DECL_NSIHTTPCHANNELINTERNAL Do you mind adding a comment here saying only GetTopWindowURI is implemented?
Attachment #9019663 - Flags: review?(ehsan) → review+
Attachment #9019667 - Flags: review?(ehsan) → review+
One thing I don't understand about your fixes is how they will affect 63. The tracking flags on the bug seem to suggest that the crash affects 63, is that right? TrackingDummyChannel however doesn't even exist on that branch. What needs to be done for the release channel, if anything?
Flags: needinfo?(amarchesini)
Comment on attachment 9019663 [details] [diff] [review] part 2 - fix TrackingDummyChannel Mayhemer, do you mind to take a look at this patch? I need to expose nsIHttpChannelInternal::GetTopWindowURL to make URL-Classifier able to annotate this channel.
Flags: needinfo?(amarchesini)
Attachment #9019663 - Flags: review?(honzab.moz)
> What needs to be done for the release channel, if anything? The crash happens just in 64 and 65 because only in these 2 releases we have cookieBehavior==4 by default. For 63, we could have crashes for users who set cookieBehavior 3 or 4.
(In reply to Andrea Marchesini [:baku] from comment #26) > > What needs to be done for the release channel, if anything? > > The crash happens just in 64 and 65 because only in these 2 releases we have > cookieBehavior==4 by default. > For 63, we could have crashes for users who set cookieBehavior 3 or 4. So are you planning to write a different set of patches for 63? Will that happen in the same bug?
Flags: needinfo?(amarchesini)
If we want to fix 63, we should uplift bug 1495285 and this bug.
Flags: needinfo?(amarchesini)
(In reply to Andrea Marchesini [:baku] from comment #28) > If we want to fix 63, we should uplift bug 1495285 and this bug. Aha, great, thanks! That's what I was looking after. :-)
Comment on attachment 9019663 [details] [diff] [review] part 2 - fix TrackingDummyChannel Review of attachment 9019663 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/TrackingDummyChannelChild.cpp @@ +93,5 @@ > > RefPtr<HttpBaseChannel> httpChannel = do_QueryObject(channel); > + if (aTrackingResource) { > + httpChannel->SetIsTrackingResource(mIsThirdParty); > + } ups.. I missed this? Could we do: if (aTrackingResource && mIsThirdParty) { httpChannel->SetIsTrackingResource(true); } ?
Attachment #9019663 - Flags: review?(honzab.moz) → review+
> Could we do: > > if (aTrackingResource && mIsThirdParty) { > httpChannel->SetIsTrackingResource(true); > } If it's not a 3rd party, we should call SetIsTrackingResource(false).
This bit is missing for some existing tests.
Attachment #9019740 - Flags: review?(ehsan)
Whiteboard: [privacy65]
Comment on attachment 9019740 [details] [diff] [review] part 4 - propagation of the return value of GetTopWindowURI Review of attachment 9019740 [details] [diff] [review]: ----------------------------------------------------------------- Wow, so even the return value matters here! OK.
Attachment #9019740 - Flags: review?(ehsan) → review+
Do you mind filing a bug to get rid of this whole code once the SW team finishes their work on parent interception?
Flags: needinfo?(amarchesini)
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a81dc13c9089 part 1 - LoadInfo must expose the correct top-level-storage-area-principal for sub documents, r=ehsan https://hg.mozilla.org/integration/mozilla-inbound/rev/ccd2e78000b8 part 2 - TrackingDummyChannel must expose nsIHttpChannelInternal, r=ehsan, r=mayhemer https://hg.mozilla.org/integration/mozilla-inbound/rev/2393a7d9f454 part 3 - Tests for nested iframes controlled by ServiceWorkers, r=ehsan
In reply to :Ehsan Akhgari from comment #34) > Do you mind filing a bug to get rid of this whole code once the SW team > finishes their work on parent interception? We already have it: https://bugzilla.mozilla.org/show_bug.cgi?id=1498259
Flags: needinfo?(amarchesini)
Comment on attachment 9019582 [details] [diff] [review] part 1 - nested iframes + SW [Beta/Release Uplift Approval Request] Feature/Bug causing the regression: Bug 1493211 User impact if declined: Nested iframes, controlled by a ServiceWorker, can trigger a crash Is this code covered by automated tests?: Yes Has the fix been verified in Nightly?: Yes Needs manual test from QE?: No If yes, steps to reproduce: List of other uplifts needed: Bug 1495285 Risk to taking this patch: Low Why is the change risky/not risky? (and alternatives if risky): The fix is about exposing nsIHttpChannelInternal in the TrackingDummyChannel. Nothing risky. String changes made/needed: none
Attachment #9019582 - Flags: approval-mozilla-beta?
We have some crashes in 63. They are extremely rare because the crash can happen only for users who chose to use cookieBehavior 3. Do we want to uplift this and bug 1495285 to release?
Flags: needinfo?(ryanvm)
Pascal owns the 63 release.
Flags: needinfo?(ryanvm) → needinfo?(pascalc)
(In reply to Andrea Marchesini [:baku] from comment #38) > We have some crashes in 63. They are extremely rare because the crash can > happen only for users who chose to use cookieBehavior 3. Do we want to > uplift this and bug 1495285 to release? According to Socorro we don't have any crash for this signature for 63.0 on the release channel since we shipped We only had 2 crashes on the beta channel during the whole beta cycle and all the other crashes where on the aurora channel (Dev Edition), about 25 crashes last week. Given that data, I don't think we need to uplift a patch to the release channel. Thanks.
Flags: needinfo?(pascalc)
Same setup no longer crashes for me on Build 20181025220044. \o/
We're still seeing crashes with this signature on the 20181025220044 nightly builds. Baku is going to file a follow-up bug for those crashes as they're coming from a different path than this bug.
Flags: in-testsuite+
Comment on attachment 9019740 [details] [diff] [review] part 4 - propagation of the return value of GetTopWindowURI Per baku on IRC, this was merged into part 2.
Attachment #9019740 - Attachment is obsolete: true
Comment on attachment 9019582 [details] [diff] [review] part 1 - nested iframes + SW Reduces crashes when using nested iframes controlled by a ServiceWorker. Approved for 64.0b5.
Attachment #9019582 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Baku, was there a follow-up bug filed for the remaining crashes? We're clearly in a better place than we used to be, but the crashes are still there.
Flags: needinfo?(amarchesini)
Blocks: 1503787
Flags: needinfo?(amarchesini)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: