Closed Bug 1487227 Opened 2 years ago Closed 2 years ago

Crash in mozilla::dom::ClientSource::WindowExecutionReady hitting MOZ_DIAGNOSTIC_ASSERT(spec.LowerCaseEqualsLiteral("about:blank") || StringBeginsWith(spec, static_cast<const nsLiteralCString&>(nsLiteralCString("" "blob:"))) || nsContentUtils::Storag...

Categories

(Core :: DOM: Service Workers, defect, P1)

Platform: Unspecified
OS: Windows 10
Type: defect

Tracking

Status: RESOLVED DUPLICATE of bug 1493211
Tracking Status: firefox63 --- fixed

People

(Reporter: dbaron, Assigned: asuth)

References

Details

(Keywords: crash, topcrash, Whiteboard: DWS_NEXT)

Crash Data

This bug was filed from the Socorro interface and is
report bp-70192428-acdc-471c-adf6-b4d1f0180829.
=============================================================

There are a bunch of crashes showing up in nightly that are hitting:
MOZ_DIAGNOSTIC_ASSERT(
    spec.LowerCaseEqualsLiteral("about:blank") ||
    StringBeginsWith(spec, static_cast<const nsLiteralCString&>(
        nsLiteralCString("" "blob:"))) ||
    nsContentUtils::StorageAllowedForWindow(aInnerWindow) ==
        nsContentUtils::StorageAccess::eAllow)

(It seems like one user filed 6 of them, and somebody else hit it once.)


Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::ClientSource::WindowExecutionReady dom/clients/manager/ClientSource.cpp:286
1 xul.dll nsGlobalWindowOuter::SetNewDocument dom/base/nsGlobalWindowOuter.cpp:1948
2 xul.dll nsresult nsDocumentViewer::InitInternal layout/base/nsDocumentViewer.cpp:1033
3 xul.dll nsDocumentViewer::Init layout/base/nsDocumentViewer.cpp:772
4 xul.dll nsresult nsDocShell::Embed docshell/base/nsDocShell.cpp:6674
5 xul.dll nsDSURIContentListener::DoContent docshell/base/nsDSURIContentListener.cpp:196
6 xul.dll bool nsDocumentOpenInfo::TryContentListener uriloader/base/nsURILoader.cpp:759
7 xul.dll nsDocumentOpenInfo::OnStartRequest uriloader/base/nsURILoader.cpp:306
8 xul.dll void mozilla::net::HttpChannelChild::DoOnStartRequest netwerk/protocol/http/HttpChannelChild.cpp:765
9 xul.dll void mozilla::net::HttpChannelChild::OnStartRequest netwerk/protocol/http/HttpChannelChild.cpp:639

=============================================================
Flags: needinfo?(bugmail)
Priority: -- → P1
Immediate crash stack observations:
- The Firefox 61 release failures seem like legitimate crashes that are not related to this assertion.
- In all of the nightly crashes I've looked at:
  - The controller is set, but we're actually receiving an OnStartRequest via IPC from the parent.
  - There rarely appears to be a ServiceWorker thread active in the child.  This suggests that some SW scope matched but that we reset interception prior to actually spinning up the ServiceWorker.  The most likely explanation for this is our no-fetch optimization.
    - In the no-fetch optimization we'll set the controller at https://searchfox.org/mozilla-central/rev/05d91d3e02a0780f44599371005591d7988e2809/dom/serviceworkers/ServiceWorkerManager.cpp#2200 and reset interception at https://searchfox.org/mozilla-central/rev/2fe43133dbc774dda668d4597174d73e3969181a/dom/serviceworkers/ServiceWorkerPrivate.cpp#1694 (see the sketch after this list).
    - This is notable because it's a very common occurrence when ServiceWorkers are registered for push notifications, and it doesn't involve complicated edge cases like the registration having disappeared.
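
To make the suspected sequence concrete, here's a standalone model of the no-fetch flow.  Every type and function name below is a simplified stand-in I made up for illustration, not the actual ServiceWorkerManager / ServiceWorkerPrivate code at the links above:

#include <iostream>
#include <optional>

struct ServiceWorker {
  bool handlesFetch = false;
};

struct Document {
  std::optional<ServiceWorker> controller;  // the "controlled" mark
};

struct Channel {
  bool intercepted = false;
};

void MaybeInterceptNavigation(Channel& aChannel, Document& aDoc,
                              const ServiceWorker& aMatching) {
  // 1. A registration's scope matched: mark the controller and
  //    intercept (cf. ServiceWorkerManager.cpp#2200 above).
  aDoc.controller = aMatching;
  aChannel.intercepted = true;

  // 2. No-fetch optimization: the active worker has no fetch handler,
  //    so reset interception without spinning up a worker thread
  //    (cf. ServiceWorkerPrivate.cpp#1694 above).
  if (!aMatching.handlesFetch) {
    aChannel.intercepted = false;
    // 3. Suspected bug at this point in the investigation: nothing
    //    clears the controller mark here, so WindowExecutionReady can
    //    later see a controller on a window whose storage check denies.
  }
}

int main() {
  Channel channel;
  Document doc;
  MaybeInterceptNavigation(channel, doc, ServiceWorker{});
  std::cout << "intercepted=" << channel.intercepted
            << " controller=" << doc.controller.has_value() << '\n';
  // Prints: intercepted=0 controller=1
}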


A notable point about the assertion is that it's about us having a controller while storage access is forbidden, which means we shouldn't have a controller.  This means the assertion doesn't care about other logic errors, like us having a controller marked when the document isn't actually being controlled.  (Although we may have an assertion about that somewhere?)  In practice, it appears that we could screw this up when the no-fetch optimization is involved and just continually bounce things off the no-fetch ServiceWorker.  Ugh, which could explain a performance complaint that reached me via the grapevine that seemed like it shouldn't happen because of our no-fetch optimization...

Anyway, the point is also that we do a StorageAllowedForChannel check when we get the navigation request, and that should avoid this problem... BUT... there is an asymmetry in the decision-making process between nsContentUtils::StorageAllowedForWindow, used by the assertion, and nsContentUtils::StorageAllowedForChannel(aChannel), used by ServiceWorkerInterceptController::ShouldPrepareForIntercept.

StorageAllowedForChannel passes aWindow=null where StorageAllowedForWindow does not, and this results in the following deviations:
- InternalStorageAllowedForPrincipal does an aWindow-dependent SandboxFlags SANDBOXED_ORIGIN check.  (It also does a privatebrowsing document check, but that's the legacy check that would only deviate for system principal windows marked as private browsing.  The OriginAttribute checks should already be good enough for our purposes.)
- StorageDisabledByAntiTracking gets invoked with the window and it...
  - farms out a decision to StorageDisabledByAntiTrackingInternal that will return disabled=true if !AntiTrackingCommon::IsFirstPartyStorageAccessGrantedFor(aWindow,...)
    - AntiTrackingCommon::IsFirstPartyStorageAccessGrantedFor does a ton of stuff.

So there does seem to be some appreciable deviation here.
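
To make the asymmetry concrete, here's a standalone model of the two entry points.  Everything below is a simplified stand-in (the real logic lives in nsContentUtils and is considerably more involved), but it captures why the channel path can say "allow" while the window path says "deny":

#include <cstdint>
#include <iostream>

enum class StorageAccess { eAllow, eDeny };

constexpr uint32_t SANDBOXED_ORIGIN = 1 << 0;  // mirrors the real flag's role

struct Window {
  uint32_t sandboxFlags = 0;
  bool disabledByAntiTracking = false;  // stand-in for the anti-tracking verdict
};

StorageAccess InternalStorageAllowedForPrincipal(const Window* aWindow) {
  if (aWindow) {
    // Only the window path reaches these two checks.
    if (aWindow->sandboxFlags & SANDBOXED_ORIGIN) {
      return StorageAccess::eDeny;
    }
    if (aWindow->disabledByAntiTracking) {
      return StorageAccess::eDeny;
    }
  }
  // ... common principal / OriginAttributes checks elided ...
  return StorageAccess::eAllow;
}

// Window path: what the MOZ_DIAGNOSTIC_ASSERT consults.
StorageAccess StorageAllowedForWindow(const Window& aWindow) {
  return InternalStorageAllowedForPrincipal(&aWindow);
}

// Channel path: what ShouldPrepareForIntercept consults; aWindow is null.
StorageAccess StorageAllowedForChannel() {
  return InternalStorageAllowedForPrincipal(nullptr);
}

int main() {
  Window w;
  w.disabledByAntiTracking = true;  // e.g. a tracking-protection hit
  // Channel path says allow, so we intercept and mark a controller...
  std::cout << (StorageAllowedForChannel() == StorageAccess::eAllow) << '\n';  // 1
  // ...but the window path says deny, so the assertion later fires.
  std::cout << (StorageAllowedForWindow(w) == StorageAccess::eAllow) << '\n';  // 0
}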

So the general theory here would be:
- In at least some situations we fail to clear the controller when we reset interception.  This might be specific to interception resetting prior to actually dispatching the fetch event.
- This means no-fetch optimized sites are impacted.
- Tracking protection is involved, which narrows the set of impacted sites further.  It depends on tracking protection being enabled (which is only enabled by default in private browsing, at least prior to our recent announcement, and ServiceWorkers are categorically denied in private browsing mode, so we'd never experience those scenarios there).  It's also quite possible that many users who enable tracking protection outside of private browsing set other preferences that disable ServiceWorkers before tracking protection even gets involved.


The obvious next steps for my investigation are:
- Ensure that our no-fetch optimization is properly clearing the controller.
- Consider landing a patch that gives us an additional bit of entropy: whether the channel-check and window-check for storage agree or not (see the sketch after this list).  This would explain a lot, but it still wouldn't explain why we have a controller marked when we no longer should.
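
A minimal sketch of what that extra bit could look like, assuming we simply split the existing assert in two so the crash signature's line number encodes which case fired.  The function and parameter names are hypothetical, not an actual patch; MOZ_DIAGNOSTIC_ASSERT is modeled with plain assert here:

#include <cassert>

#define MOZ_DIAGNOSTIC_ASSERT(expr) assert(expr)

enum class StorageAccess { eAllow, eDeny };

// Hypothetical split of the single existing assert into two, so crash
// reports tell us whether the window-check and channel-check agreed.
void CheckControllerStorageAgreement(StorageAccess aWindowResult,
                                     StorageAccess aChannelResult) {
  const bool windowAllows = aWindowResult == StorageAccess::eAllow;
  const bool channelAllows = aChannelResult == StorageAccess::eAllow;
  // Fires only when both checks denied storage: the controller should
  // never have been marked at all.
  MOZ_DIAGNOSTIC_ASSERT(windowAllows || channelAllows);
  // Fires when the checks disagree: the channel path allowed
  // interception but the window path forbids storage.
  MOZ_DIAGNOSTIC_ASSERT(windowAllows);
}

int main() {
  // The disagreement case: a diagnostic build would abort at the
  // second assert's line, giving us the extra bit of signal.
  CheckControllerStorageAgreement(StorageAccess::eDeny, StorageAccess::eAllow);
}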

Leaving the needinfo up on myself for the next steps.
QA Whiteboard: DWS_NEXT
Whiteboard: DWS_NEXT
P1: What's the status on this bug?
The debugger confirms that we're not clearing the controller when we reset interception, which is indeed problematic for navigations in general.  (This is from our dom/serviceworkers/test/test_nofetch_handler.html test, confirming that at WindowExecutionReady time we still have a controller marked after invoking ResetInterception, and we end up with the same IPC frames on the stack.)

I need to do a little more investigation / reproduction as it relates to the asymmetry in storage-allowed-checking.  It looks likely that the existing tests for third-party-cookie settings at https://searchfox.org/mozilla-central/source/dom/serviceworkers/test/test_third_party_iframes.html could be extended to cover whatever's going on.  I should have a test and fix on Wednesday.
Flags: needinfo?(bugmail)
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Adding top crash keyword since it is in the top 3 in nightly.
Keywords: topcrash
Duplicate of this bug: 1491389
If it helps, I encounter this bug consistently when trying to play embedded Twitter videos on this Web page with third-party cookies and 'slow loading' trackers disabled:
http://www.bbc.co.uk/programmes/articles/57f0LhFBpYpbzvTzXTjvklC/7-ways-strictly-week-one-made-a-splash-on-social
Yes, it does, thank you.  I also got a good STR in bug 1491389 but the stumbling block has been in adding automated test coverage.  It appears there may be an answer in bug 1493211 as to why I was not able to reproduce the crash.  Specifically, that tracking protection breaks Service Worker interception when nested iframes are involved, which is how the test itself is structured.  We should be able to resolve this shortly.
FWIW I suspect the root cause here may be the imbalance between the window and channel paths that I mentioned in bug 1493211 comment 13...
(In reply to :Ehsan Akhgari from comment #8)
> FWIW I suspect the root cause here may be the imbalance between the window
> and channel paths that I mentioned in bug 1493211 comment 13...

If my guess here is correct, this should fix the crash: https://hg.mozilla.org/integration/autoland/rev/b69245da928a
(In reply to :Ehsan Akhgari from comment #9)
> If my guess here is correct, this should fix the crash:
> https://hg.mozilla.org/integration/autoland/rev/b69245da928a

This crash did go away starting with the build that that commit landed in.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Depends on: 1493211
Resolution: --- → FIXED
Resolution: FIXED → DUPLICATE
Duplicate of bug: 1493211
For posterity, my mentions in comment 1 and comment 3 about us improperly resetting interception are incorrect.  ServiceWorkerContainer.controller should still be set to the matching ServiceWorker per step 12.6 of "Handle Fetch", even if we don't dispatch a fetch event or FetchEvent.respondWith is not invoked.  The higher-level section 2.4 (https://w3c.github.io/ServiceWorker/#selection) covers this well; selection is independent of the fetch event.
The crash stats on this seem to suggest this wasn't actually fixed? (or regressed?)

My friend seems to be running into this on linux: https://crash-stats.mozilla.com/report/index/be0e63ae-56bc-4ccc-ae24-e6f2a0181028#tab-details
(In reply to Alexis Beingessner [:Gankro] from comment #14)
> The crash stats on this seems to suggest this wasn't actually fixed? (or
> regressed?)
> 
> My friend seems to be running into this on linux:
> https://crash-stats.mozilla.com/report/index/be0e63ae-56bc-4ccc-ae24-
> e6f2a0181028#tab-details

This was just fixed on nightly on the 25th, and it appears from the crash report he is running a build from the 23rd. Please have him download a build after the 25th. Thanks.
(Note that the final fix for this crash landed in bug 1493211.)
(In reply to :Ehsan Akhgari from comment #16)
> (Note that the final fix for this crash landed in bug 1493211.)

Err, sorry, bug 1499995.

Also note https://bugzilla.mozilla.org/show_bug.cgi?id=1499995#c44; this may actually not be completely fixed yet...