Crash in mozilla::dom::ClientSource::WindowExecutionReady

RESOLVED FIXED in Firefox 64

Status

()

defect
--
critical
RESOLVED FIXED
8 months ago
8 months ago

People

(Reporter: josh.tumath+bugzilla, Assigned: baku)

Tracking

(Blocks 1 bug, {crash, regression, topcrash})

Trunk
mozilla65
Unspecified
Windows 10
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox63+ disabled, firefox64+ fixed, firefox65 fixed)

Details

(Whiteboard: [privacy65], crash signature)

Attachments

(3 attachments, 1 obsolete attachment)

Reporter

Description

8 months ago
I am encountering this tab crash every time I visit an article on The Verge. Nothing happens on first load, but when I start scrolling down the page, the tab crashes as soon as I reach a certain point.

e.g.
https://www.theverge.com/circuitbreaker/2018/10/16/17981684/amazon-kindle-paperwhite-update-new-waterproof-thinner-lighter-oasis-voyage

This bug was filed from the Socorro interface and is
report bp-0acb61ee-c4b1-4aaf-b462-0c0100181018.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::ClientSource::WindowExecutionReady dom/clients/manager/ClientSource.cpp:286
1 xul.dll nsGlobalWindowOuter::SetNewDocument dom/base/nsGlobalWindowOuter.cpp:1958
2 xul.dll nsresult nsDocumentViewer::InitInternal layout/base/nsDocumentViewer.cpp:1032
3 xul.dll nsDocumentViewer::Init layout/base/nsDocumentViewer.cpp:771
4 xul.dll nsresult nsDocShell::Embed docshell/base/nsDocShell.cpp:6767
5 xul.dll nsDSURIContentListener::DoContent docshell/base/nsDSURIContentListener.cpp:196
6 xul.dll bool nsDocumentOpenInfo::TryContentListener uriloader/base/nsURILoader.cpp:759
7 xul.dll nsDocumentOpenInfo::OnStartRequest uriloader/base/nsURILoader.cpp:306
8 xul.dll void mozilla::net::HttpChannelChild::DoOnStartRequest netwerk/protocol/http/HttpChannelChild.cpp:774
9 xul.dll void mozilla::net::HttpChannelChild::OnStartRequest netwerk/protocol/http/HttpChannelChild.cpp:641

=============================================================
Thanks for filing Josh - we had this crash in Bug 1495285 - I am waiting to hear back from the developer as to whether we reopen that bug or track the crashes in this bug.
Assignee

Comment 2

8 months ago
I'm working on this crash.
Assignee

Comment 3

8 months ago
Andrew, do you have any idea why this crash happens after the landing of TrackingDummyChannel?
I'm not able to reproduce it.
Flags: needinfo?(bugmail)

Comment 4

8 months ago
Can confirm this crash. It happened on https://www.reddit.com/r/DotA2/ and https://www.reddit.com/r/Artifact/ for me.

Comment 5

8 months ago
Seems to have been fixed in the last Nightly update.

Comment 6

8 months ago
Nevermind, ignore my last comment. It's still there. Sorry!

Comment 7

8 months ago
Can reproduce crash on https://www.reddit.com/r/VintageApple/comments/9pvjtl/apples_education_mac/  Webrender on, MacOS 10.14, MBP A1708. as of writing this, am on latest nightly build 20181020102231
I think our best option is if those able to reproduce this crash could run with "MOZ_LOG=AntiTracking:5" and record the output.  The pages at https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Gecko_Logging#Enabling_Logging and https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging provide related information.  Note that the ",sync" option would likely want to be used if logging to a file because otherwise the crash might eat the log info.

Other than the potential for a race related to what the dummy channel perceives and what the subsequent normal check in the parent is, my best guess for an edge-case would be something related to controller inheritance and about:blank iframes.  Per https://w3c.github.io/ServiceWorker/#selection, about:blank and other local schemes[1] will inherit the controller.  I don't have any real idea how that could actually trigger a problem directly, it's just the alternate code path for marking a client controlled...  The inheritance[2] uses ServiceWorkerAllowedToControlWindow()[3] which is a different check that doesn't involve the dummy channel.

1: https://fetch.spec.whatwg.org/#local-scheme
2: https://searchfox.org/mozilla-central/rev/4fc7bfa931189e0718dd70b4c59885829f1d761e/docshell/base/nsDocShell.cpp#2832
3: https://searchfox.org/mozilla-central/rev/4fc7bfa931189e0718dd70b4c59885829f1d761e/docshell/base/nsDocShell.cpp#13754
Flags: needinfo?(bugmail)
Uh, and note that the logs could quite potentially include private information from other windows/tabs, so you would want to:
- Only include near the end of the log for the crashing page's site.
- Probably send the log directly to :baku at amarchesini@mozilla.com
Tracking for 63 in case we have both a working fix and a dot release in the 63 to 64 timeframe.
Assignee: nobody → amarchesini
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
See Also: → 1495285
The signature is ranked #1 in content top-crashes for Firefox.
Keywords: regression, topcrash
@Moses and @Enzo - I am not able to reproduce the crash on my Mac on those reddit sites - is there something particular you are doing on the page when the crash happens, or does it crash when you load the page? Are you able to try what Andrew suggests in Comment 8 and Comment 9? If so, please let us know. Thanks.
Flags: needinfo?(spidersouris)
Flags: needinfo?(mb8672)

Comment 13

8 months ago
It seems to crash when loading some content. Sometimes, it crashes on scroll down and other times, it crashes when loading. I've sent an email to :baku with the logs yesterday, hope I did everything fine.
Flags: needinfo?(spidersouris)
Flags: needinfo?(mb8672)
Assignee

Comment 14

8 months ago
Yes, I confirm. I received the log. I'm trying to understand why the crash happens. Thanks, Enzo!
With today's nightly I crash 100% of time on https://reddit.com when Content Blocking is enabled. It happens about 2 seconds into load. I'm on Win 10.
If I remove/add the youtube.com serviceworker the crashes stop/start. Still having trouble reproducing in a clean profile.
It seems to come down to something in permissions.sqlite that I don't understand. Clearing Site Preferences in settings fixes the crash. Here is the select lines from permissions.sqlite related to youtube/reddit. I can email you the full database if it is really needed.

moz_perms table
> 167|https://www.youtube.com|storageAccessAPI|1|2|1542904681092|1540312681092
> 178|https://www.reddit.com|3rdPartyStorage^https://www.youtube.com|1|2|1542644862782|1540052862782
> 30|https://www.reddit.com|storageAccessAPI|1|2|1542905134669|1540313134669

STR:
  - permissions.sqlite contains these entries
  - visit youtube.com to register service worker
  - visit reddit.com (no login, 'new' UI)
  - scrolll down until a youtube embed occurs
  - Crash
Assignee

Comment 18

8 months ago
Great! I can reproduce it!
Tracking for 64 in case baku finds a fix - maybe this will be good for beta uplift.
Assignee

Comment 20

8 months ago
I'm working on a test
Attachment #9019582 - Flags: review?(ehsan)
Assignee

Updated

8 months ago
Attachment #9019582 - Attachment description: iframe.patch → part 1 - nested iframes + SW
Assignee

Comment 21

8 months ago
TrackingDummyChannel has a bad bug: it always sets the tracking flag ignoring the real annotation result.
We were passing all the tests because we don't have tests for intercepted 3rd party contexts: all the mochitests run in nested iframes and we use cookieBehavior 0 for them. In the next patches I add some tests.

The fix consists in exposing nsIHttpChannelInternal interface and return the topWindowURL (the only method used by the classifier).
Attachment #9019663 - Flags: review?(ehsan)
Assignee

Comment 22

8 months ago
Tests for 3rd party intercepted contexts. We don't want a 3rd party intercepted context to load a sub intercepted iframe: in nightly/beta, opening an nested iframe from a intercepted iframe triggers the crash.

I also fixed browser_antitracking.js because it was using multiple tabs, which were handled by different processes. Sometimes we were not controlling the iframe because the second process didn't know about the service worker registration yet.
Attachment #9019667 - Flags: review?(ehsan)

Updated

8 months ago
Attachment #9019582 - Flags: review?(ehsan) → review+

Comment 23

8 months ago
Comment on attachment 9019663 [details] [diff] [review]
part 2 - fix TrackingDummyChannel

Review of attachment 9019663 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/TrackingDummyChannel.h
@@ +41,5 @@
>   * 1496997 we can remove this implementation.  Bug 1498259 covers removing this
>   * hack in particular.
>   */
>  class TrackingDummyChannel final : public nsIChannel
> +                                 , public nsIHttpChannelInternal

Hmm, a class which implements nsIHttpChannelInternal but not nsIHttpChannel.  I wonder what a Necko peer would think about this.  Do you mind asking someone like mayhemer to have a brief look at the patch please?

@@ +49,5 @@
>  
>    NS_DECL_THREADSAFE_ISUPPORTS
>    NS_DECL_NSIREQUEST
>    NS_DECL_NSICHANNEL
> +  NS_DECL_NSIHTTPCHANNELINTERNAL

Do you mind adding a comment here saying only GetTopWindowURI is implemented?
Attachment #9019663 - Flags: review?(ehsan) → review+

Updated

8 months ago
Attachment #9019667 - Flags: review?(ehsan) → review+

Comment 24

8 months ago
One thing I don't understand about your fixes is how they will affect 63.  The tracking flags on the bug seem to suggest that the crash affects 63, is that right?  TrackingDummyChannel however doesn't even exist on that branch.

What needs to be done for the release channel, if anything?
Flags: needinfo?(amarchesini)
Assignee

Comment 25

8 months ago
Comment on attachment 9019663 [details] [diff] [review]
part 2 - fix TrackingDummyChannel

Mayhemer, do you mind to take a look at this patch?
I need to expose nsIHttpChannelInternal::GetTopWindowURL to make URL-Classifier able to annotate this channel.
Flags: needinfo?(amarchesini)
Attachment #9019663 - Flags: review?(honzab.moz)
Assignee

Comment 26

8 months ago
> What needs to be done for the release channel, if anything?

The crash happens just in 64 and 65 because only in these 2 releases we have cookieBehavior==4 by default.
For 63, we could have crashes for users who set cookieBehavior 3 or 4.

Comment 27

8 months ago
(In reply to Andrea Marchesini [:baku] from comment #26)
> > What needs to be done for the release channel, if anything?
> 
> The crash happens just in 64 and 65 because only in these 2 releases we have
> cookieBehavior==4 by default.
> For 63, we could have crashes for users who set cookieBehavior 3 or 4.

So are you planning to write a different set of patches for 63?  Will that happen in the same bug?
Flags: needinfo?(amarchesini)
Assignee

Comment 28

8 months ago
If we want to fix 63, we should uplift bug 1495285 and this bug.
Flags: needinfo?(amarchesini)

Comment 29

8 months ago
(In reply to Andrea Marchesini [:baku] from comment #28)
> If we want to fix 63, we should uplift bug 1495285 and this bug.

Aha, great, thanks!  That's what I was looking after.  :-)
Comment on attachment 9019663 [details] [diff] [review]
part 2 - fix TrackingDummyChannel

Review of attachment 9019663 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/TrackingDummyChannelChild.cpp
@@ +93,5 @@
>  
>    RefPtr<HttpBaseChannel> httpChannel = do_QueryObject(channel);
> +  if (aTrackingResource) {
> +    httpChannel->SetIsTrackingResource(mIsThirdParty);
> +  }

ups.. I missed this?

Could we do:

if (aTrackingResource && mIsThirdParty) {
  httpChannel->SetIsTrackingResource(true);
}

?
Attachment #9019663 - Flags: review?(honzab.moz) → review+
Assignee

Comment 31

8 months ago
> Could we do:
> 
> if (aTrackingResource && mIsThirdParty) {
>   httpChannel->SetIsTrackingResource(true);
> }

If it's not a 3rd party, we should call SetIsTrackingResource(false).
Assignee

Comment 32

8 months ago
This bit is missing for some existing tests.
Attachment #9019740 - Flags: review?(ehsan)
Whiteboard: [privacy65]

Comment 33

8 months ago
Comment on attachment 9019740 [details] [diff] [review]
part 4 - propagation of the return value of GetTopWindowURI

Review of attachment 9019740 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, so even the return value matters here!  OK.
Attachment #9019740 - Flags: review?(ehsan) → review+

Comment 34

8 months ago
Do you mind filing a bug to get rid of this whole code once the SW team finishes their work on parent interception?
Flags: needinfo?(amarchesini)

Comment 35

8 months ago
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a81dc13c9089
part 1 - LoadInfo must expose the correct top-level-storage-area-principal for sub documents, r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/ccd2e78000b8
part 2 - TrackingDummyChannel must expose nsIHttpChannelInternal, r=ehsan, r=mayhemer
https://hg.mozilla.org/integration/mozilla-inbound/rev/2393a7d9f454
part 3 - Tests for nested iframes controlled by ServiceWorkers, r=ehsan
Assignee

Comment 36

8 months ago
In reply to :Ehsan Akhgari from comment #34)
> Do you mind filing a bug to get rid of this whole code once the SW team
> finishes their work on parent interception?

We already have it: https://bugzilla.mozilla.org/show_bug.cgi?id=1498259
Flags: needinfo?(amarchesini)
Assignee

Comment 37

8 months ago
Comment on attachment 9019582 [details] [diff] [review]
part 1 - nested iframes + SW

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1493211

User impact if declined: Nested iframes, controlled by a ServiceWorker, can trigger a crash

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: Bug 1495285

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): The fix is about exposing nsIHttpChannelInternal in the TrackingDummyChannel. Nothing risky.

String changes made/needed: none
Attachment #9019582 - Flags: approval-mozilla-beta?
Assignee

Comment 38

8 months ago
We have some crashes in 63. They are extremely rare because the crash can happen only for users who chose to use cookieBehavior 3. Do we want to uplift this and bug 1495285 to release?
Flags: needinfo?(ryanvm)
Pascal owns the 63 release.
Flags: needinfo?(ryanvm) → needinfo?(pascalc)
(In reply to Andrea Marchesini [:baku] from comment #38)
> We have some crashes in 63. They are extremely rare because the crash can
> happen only for users who chose to use cookieBehavior 3. Do we want to
> uplift this and bug 1495285 to release?

According to Socorro we don't have any crash for this signature for 63.0 on the release channel since we shipped We only had 2 crashes on the beta channel during the whole beta cycle and all the other crashes where on the aurora channel (Dev Edition), about 25 crashes last week. Given that data, I don't think we need to uplift a patch to the release channel. Thanks.
Flags: needinfo?(pascalc)
Same setup no longer crashes for me on Build 20181025220044. \o/
We're still seeing crashes with this signature on the 20181025220044 nightly builds. Baku is going to file a follow-up bug for those crashes as they're coming from a different path than this bug.
Flags: in-testsuite+
Comment on attachment 9019740 [details] [diff] [review]
part 4 - propagation of the return value of GetTopWindowURI

Per baku on IRC, this was merged into part 2.
Attachment #9019740 - Attachment is obsolete: true
Comment on attachment 9019582 [details] [diff] [review]
part 1 - nested iframes + SW

Reduces crashes when using nested iframes controlled by a ServiceWorker. Approved for 64.0b5.
Attachment #9019582 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Baku, was there a follow-up bug filed for the remaining crashes? We're clearly in a better place than we used to be, but the crashes are still there.
Flags: needinfo?(amarchesini)
Assignee

Updated

8 months ago
Blocks: 1503787
Assignee

Comment 49

8 months ago
Done: bug 1503787.
Flags: needinfo?(amarchesini)
You need to log in before you can comment on or make changes to this bug.