Open Bug 1588241 Opened 2 months ago Updated 8 days ago

Intermittent toolkit/components/antitracking/test/browser/browser_partitionedMessaging.js | We have first-party set on storagePrincipal - "" == "example.net" -

Categories

(Core :: Privacy: Anti-Tracking, defect, P2)

defect

Tracking

()

Fission Milestone M4.1

People

(Reporter: intermittent-bug-filer, Assigned: jya)

References

(Regression)

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [stockwell disabled])

Attachments

(10 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=270940365&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/Dim-wOaiS8yVOqCD2B3vyQ/runs/0/artifacts/public/logs/live_backing.log


[task 2019-10-11T21:16:19.292Z] 21:16:19 INFO - TEST-PASS | toolkit/components/antitracking/test/browser/browser_partitionedMessaging.js | We don't have first-party set on nodePrincipal - "" == "" -
[task 2019-10-11T21:16:19.293Z] 21:16:19 INFO - Buffered messages finished
[task 2019-10-11T21:16:19.296Z] 21:16:19 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/antitracking/test/browser/browser_partitionedMessaging.js | We have first-party set on storagePrincipal - "" == "example.net" -
[task 2019-10-11T21:16:19.297Z] 21:16:19 INFO - Stack trace:
[task 2019-10-11T21:16:19.298Z] 21:16:19 INFO - resource://testing-common/content-task.js line 110 > eval:ifr.onload:13
[task 2019-10-11T21:16:19.299Z] 21:16:19 INFO - Sending code to the 3rd party content
[task 2019-10-11T21:16:19.300Z] 21:16:19 INFO - GECKO(5691) | ++DOCSHELL 0x7f29dd87c800 == 7 [pid = 5818] [id = {e693c255-1ac0-47a0-903a-69ff4b885b24}]
[task 2019-10-11T21:16:19.301Z] 21:16:19 INFO - GECKO(5691) | ++DOMWINDOW == 20 (0x7f29df202f20) [pid = 5818] [serial = 768] [outer = (nil)]
[task 2019-10-11T21:16:19.302Z] 21:16:19 INFO - GECKO(5691) | ++DOMWINDOW == 21 (0x7f29dfc58c00) [pid = 5818] [serial = 769] [outer = 0x7f29df202f20]
[task 2019-10-11T21:16:19.303Z] 21:16:19 INFO - GECKO(5691) | ++DOMWINDOW == 22 (0x7f29dfc5c400) [pid = 5818] [serial = 770] [outer = 0x7f29df202f20]
[task 2019-10-11T21:16:19.304Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 36 (0x7f102e922980) [pid = 5763] [serial = 198] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.306Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 35 (0x7f102e9225c0) [pid = 5763] [serial = 195] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.307Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 34 (0x7f102e923e20) [pid = 5763] [serial = 208] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.308Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 33 (0x7f102e922b60) [pid = 5763] [serial = 203] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.309Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 32 (0x7f102f07fd40) [pid = 5763] [serial = 227] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.310Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 31 (0x7f102f07f020) [pid = 5763] [serial = 217] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html?0.042698885081182025]
[task 2019-10-11T21:16:19.312Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 30 (0x7f102e9227a0) [pid = 5763] [serial = 192] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.313Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 29 (0x7f102e922d40) [pid = 5763] [serial = 200] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.314Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 28 (0x7f102fbca7a0) [pid = 5763] [serial = 187] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.315Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 27 (0x7f102f07f980) [pid = 5763] [serial = 221] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.316Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 26 (0x7f102f07f3e0) [pid = 5763] [serial = 214] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.319Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 25 (0x7f102fbca5c0) [pid = 5763] [serial = 184] [outer = (nil)] [url = http://example.net/browser/toolkit/components/antitracking/test/browser/page.html]
[task 2019-10-11T21:16:19.320Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 24 (0x7f102f07fb60) [pid = 5763] [serial = 230] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.321Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 23 (0x7f102e922f20) [pid = 5763] [serial = 206] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.322Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 22 (0x7f102e9223e0) [pid = 5763] [serial = 190] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.324Z] 21:16:19 INFO - GECKO(5691) | --DOMWINDOW == 21 (0x7f102f07f7a0) [pid = 5763] [serial = 224] [outer = (nil)] [url = http://not-tracking.example.com/browser/toolkit/components/antitracking/test/browser/localStorageEvents.html]
[task 2019-10-11T21:16:19.326Z] 21:16:19 INFO - Console message: [JavaScript Warning: "Request to access cookie or storage on “http://tracking.example.org/browser/toolkit/components/antitracking/test/browser/3rdPartyStorage.html” was blocked because it came from a tracker and content blocking is enabled." {file: "http://example.net/browser/toolkit/components/antitracking/test/browser/page.html" line: 27 column: 19 source: "http://tracking.example.org/browser/toolkit/components/antitracking/test/browser/3rdPartyStorage.html"}]
[task 2019-10-11T21:16:19.327Z] 21:16:19 INFO - Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://tracking.example.org/browser/toolkit/components/antitracking/test/browser/3rdPartyStorageWO.html" line: 0}]
[task 2019-10-11T21:16:19.329Z] 21:16:19 INFO - GECKO(5691) | ++DOCSHELL 0x7f29dd7e6000 == 8 [pid = 5818] [id = {2862571f-3ea3-46b7-aec5-7330853a8afa}]
[task 2019-10-11T21:16:19.331Z] 21:16:19 INFO - GECKO(5691) | ++DOMWINDOW == 23 (0x7f29df203100) [pid = 5818] [serial = 771] [outer = (nil)]
[task 2019-10-11T21:16:19.332Z] 21:16:19 INFO - GECKO(5691) | ++DOMWINDOW == 24 (0x7f29dfc60000) [pid = 5818] [serial = 772] [outer = 0x7f29df203100]

A pernosco trace for this is here: https://pernos.co/debug/byaKmg98a4Os1nUqeI0bhA/index.html#f{m[A9QH,B5f4_,t[AYI,CPY_,f{e[A9QH,B5fu_,s{af1ybykAA,bASE,oGaz+GQ,uGaqIuQ___

It looks like we depend on the channel being classified from within HttpChannelChild::OnStartRequest. Stack for that is this:

::HttpChannelChild::RecvOnStartRequest () at /HttpChannelChild.cpp:488
::ChannelEventQueue::RunOrEnqueue () at /ChannelEventQueue.h:210
::StartRequestEvent::Run (this=0x7f5c94eb5000) at /HttpChannelChild.cpp:426
::HttpChannelChild::OnStartRequest () at /HttpChannelChild.cpp:607
::HttpChannelChild::DoOnStartRequest () at /HttpChannelChild.cpp:682
nsDocumentOpenInfo::OnStartRequest () at /nsURILoader.cpp:292
nsDocumentOpenInfo::DispatchContent () at /nsURILoader.cpp:413
nsDocumentOpenInfo::TryContentListener () at /nsURILoader.cpp:741
nsDSURIContentListener::DoContent () at /nsDSURIContentListener.cpp:184
nsDocShell::CreateContentViewer () at /nsDocShell.cpp:7873
nsDocShell::NewContentViewerObj () at /nsDocShell.cpp:8128
nsContentDLF::CreateInstance ()
nsContentDLF::CreateDocument () at /nsContentDLF.cpp:329
nsHTMLDocument::StartDocumentLoad () at /nsHTMLDocument.cpp:487
::Document::StartDocumentLoad () at /Document.cpp:2960
nsHTMLDocument::Reset () at /nsHTMLDocument.cpp:178
::Document::Reset () at /Document.cpp:2310
nsScriptSecurityManager::GetChannelResultPrincipals () at /nsScriptSecurityManager.cpp:283
::StoragePrincipalHelper::Create () at /StoragePrincipalHelper.cpp:80
::(anonymous namespace)::ChooseOriginAttributes () at /StoragePrincipalHelper.cpp:21

On the parent side, we receive HttpChannelParent::NotifyClassificationFlags before HttpChannelParent::OnStartRequest, but the notification flags go over the background channel, and OnStartRequest stays on the main-thread.

Looks like this was originally added in bug 1015466, to ensure that the flags arrived before OnStopRequest (which is also on the bg channel). It looks like the classification process is async, so it does seem possible that it could be later.

I think the regression here is that NotifyClassificationFlags is now delivered to DocumentChannelParent, which then forwards it on to HttpChannelParent, immediately before OnStartRequest (in FinishReplacementChannelSetup). I think that means it was just an existing race, which we're now losing more often.

baku, any ideas what we should be doing here? Should we be trying to guarantee that classification is done before OnStartRequest (and then send it over the normal main-thread channel so that it arrives on the child before too)? Or should we not be reading classification status until OnStopRequest (and can we enforce that somehow?).

Flags: needinfo?(amarchesini)
Regressed by: 1583700

Hmm, it looks like nsHttpChannel::AsyncOpen will wait for the classification callback (but didn't when bug 1015466 was written?).

I guess that means we can depend on all classification messages arriving before OnStartRequest, so we should move the IPDL bits to the PHttpChannel class, so that guarantee extends to the content process.

Assignee: nobody → jyavenard
See Also: → 1590649
See Also: → 1570912
See Also: → 1590608
Duplicate of this bug: 1590649
Duplicate of this bug: 1590276
Duplicate of this bug: 1570912
Duplicate of this bug: 1590608
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f37cd6cfcf9
disable browser_partitionedMessaging.js on all patforms r=gbrown
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]

PrepentEvent was already moving the argument passed by reference. This makes it clear that the object will be moved.

Remove the need to create separate runnable classes. It's far more readable and remove the need to duplicate lots of code.

We unfortunately need to capture "this" in a ref counter to get around the static analyzer complaining about capturing this by value, even thouch the ChannelEventQueue guarantees that this will outlive the event.

Depends on D52260

We can guarantee that those callbacks will occur prior the call to OnStartRequest.
But running them all on the PHttpChannel we remove the potential race between the PHttpChannel which runs on the main thread and PHttpBackroundChannel which runs on the necko thread.

Depends on D52261

The information is now passed to the parent channel once replacement is completed.

Depends on D52262

This reverts bug 1590608 and bug 1590649.

Depends on D52263

See Also: → 1590696

And amend comment on other bugs failing for the same reasons, linking to bug that is the cause.

Fix style consistency while at it and a couple of constness issues.

Depends on D52503

Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/17e7f137067f
P1. Use move semantics with PrependEvent. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/8d2cf0fc6b9b
P2. Add ability to use lambdas for channel event handlers. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/a21cd1c41c6c
P3. Move classification methods to PHttpChannel. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/48362412c9d5
P4. Remove nsIClassifiedChannel implementation from DocumentChannelChild. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/c9a6203582ac
P5. Re-enable tests. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/449606f22baf
P6. Disable test under fission. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/125430c4316a
P7. Use NeckoTargetChannelFunctionEvent in DCC. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/6a4727c44078
P8. Use NeckoTargetChannelFunctionEvent with FTPChannelChild. r=mattwoodrow

We have a cycle in mUnknownDecoderEventQ as it now contains a strong reference to itself and unless UnknownDecoderInvolvedOnStartRequestCalled is called, it will never be cleared.

Flags: needinfo?(jyavenard)

(In reply to Jean-Yves Avenard [:jya] from comment #33)

We have a cycle in mUnknownDecoderEventQ as it now contains a strong reference to itself and unless UnknownDecoderInvolvedOnStartRequestCalled is called, it will never be cleared.

Flags: needinfo?(honzab.moz)

Jean-Yves - I will ask you one thing, please bisect at least which directory or even the particular test (or set of sub tests) cause the leak; I can't apply your patches locally on m-c with phab patch (even -a here), so those will need a rebase. It's hard to understand the code paths and the patch may be tricky. This involves diversion and it's always fragile when touched.

Flags: needinfo?(jyavenard)
See Also: → 1596295

(In reply to Honza Bambas (:mayhemer) from comment #35)

Jean-Yves - I will ask you one thing, please bisect at least which directory or even the particular test (or set of sub tests) cause the leak; I can't apply your patches locally on m-c with phab patch (even -a here), so those will need a rebase. It's hard to understand the code paths and the patch may be tricky. This involves diversion and it's always fragile when touched.

I've never managed to get moz-phab to apply patches unfortunately.

Patches where based on central already.

The best is to use one of my try push.

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=276135964&revision=fca8441736e9c041aec07d9b68df3b2ac79ba02c

In any case, removing the cycle due to HttpChannelChild::mUnknownDecoderEventQ will still cause leaks due to mEventQ not running some events / being cleared like here:
https://treeherder.mozilla.org/logviewer.html#?job_id=276141931&repo=try

leak at mozilla::net::HttpChannelChild::OnStopRequest, mozilla::net::ChannelEventQueue::FlushQueue, CompleteResume, mozilla::net::ChannelEventQueue::ResumeInterna

Those issues weren't used before because not running events or clearing the event queue wouldn't cause side effects.

Flags: needinfo?(jyavenard)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(amarchesini)

:ehsan ; the static analyser will always complain whenever you are capturing a pointer to a refcounted object by value.

There are legitimate cases on which you would want to do so; like the ChannelEventQueue ; it guarantees that the object in the queue will not outlive its owner.

Could we add a way to bypass this error? it makes for frustrating time sometimes, as we now need to deal with reference cycles that shouldn't exist in the first place.

Flags: needinfo?(ehsan)

(In reply to Jean-Yves Avenard [:jya] from comment #38)

:ehsan ; the static analyser will always complain whenever you are capturing a pointer to a refcounted object by value.

There are legitimate cases on which you would want to do so; like the ChannelEventQueue ; it guarantees that the object in the queue will not outlive its owner.

Could we add a way to bypass this error? it makes for frustrating time sometimes, as we now need to deal with reference cycles that shouldn't exist in the first place.

Do you mean something effectively like UnsafePtr in https://phabricator.services.mozilla.com/D52261 for https://hg.mozilla.org/integration/autoland/rev/8d2cf0fc6b9b#l1.35?

(How come the version of the patch that landed there was different than the one in Phabricator BTW?!)

If I understand what you want to achieve here correctly, then yes I think that should be easily possible. We could do so by adding an attribute hidden behind a macro that would allow you to write the code like this:

   template <typename T>
   NeckoTargetChannelFunctionEvent(T* aChild, std::function<void()>&& aCallback)
       : ChannelFunctionEvent(
            [MOZ_UNCHECKED child = aChild]() {
               MOZ_ASSERT(child);
               return child->GetNeckoTarget();
            },
            std::move(aCallback)) {}

Or something to that effect. That way usage of refcounted pointers with lambdas is safe by default and if you know the lifetime of your objects precisely and would like to opt out of the compiler-enforced safety checks you could do so and make it clear to the reader of the code that the specific lambda capture in question isn't being checked.

If this sounds good please file a bug and CC Andi, he should be able to help you out. :-)

Flags: needinfo?(ehsan)

This is blocking further fission work

Fission Milestone: --- → M4.1
Priority: P5 → P2

nsTArray::InsertElementsAt is infallible. So the test checking that nullptr wasn't returned would always be false.

Seeing that queuing an event is also an infallible operation, there's no point for prepentEvents to be.

(In reply to :ehsan akhgari from comment #39)

(In reply to Jean-Yves Avenard [:jya] from comment #38)

:ehsan ; the static analyser will always complain whenever you are capturing a pointer to a refcounted object by value.

There are legitimate cases on which you would want to do so; like the ChannelEventQueue ; it guarantees that the object in the queue will not outlive its owner.

Could we add a way to bypass this error? it makes for frustrating time sometimes, as we now need to deal with reference cycles that shouldn't exist in the first place.

Do you mean something effectively like UnsafePtr in https://phabricator.services.mozilla.com/D52261 for https://hg.mozilla.org/integration/autoland/rev/8d2cf0fc6b9b#l1.35?

yes

(How come the version of the patch that landed there was different than the one in Phabricator BTW?!)

Because I had hoped to resolve the cycle first rather than getting around the problem. That a cycle cause a leak is the apparent effect of that event taskqueue not always running a task. The problem has now appeared to be more complex than first thought and I have no particular interest in fixing it anymore as it's been a long problem (even if the effect were not noticed). So instead I've created bug 1596295.

If I understand what you want to achieve here correctly, then yes I think that should be easily possible. We could do so by adding an attribute hidden behind a macro that would allow you to write the code like this:

   template <typename T>
   NeckoTargetChannelFunctionEvent(T* aChild, std::function<void()>&& aCallback)
       : ChannelFunctionEvent(
            [MOZ_UNCHECKED child = aChild]() {
               MOZ_ASSERT(child);
               return child->GetNeckoTarget();
            },
            std::move(aCallback)) {}

yes, indeed that's precisely what I'm after; it makes it much easier to use automated capture with [=]

Or something to that effect. That way usage of refcounted pointers with lambdas is safe by default and if you know the lifetime of your objects precisely and would like to opt out of the compiler-enforced safety checks you could do so and make it clear to the reader of the code that the specific lambda capture in question isn't being checked.

If this sounds good please file a bug and CC Andi, he should be able to help you out. :-)

will do.

Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2a6684436b37
P1. Use move semantics with PrependEvent. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/a00f164555ff
P2. Add ability to use lambdas for channel event handlers. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/2dda521c0591
P3. Move classification methods to PHttpChannel. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/e9ca1ee307d6
P4. Remove nsIClassifiedChannel implementation from DocumentChannelChild. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/354253d41c6a
P5. Re-enable tests. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/c70d60f8293c
P6. Disable test under fission. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/693144fd8c54
P7. Use NeckoTargetChannelFunctionEvent in DCC. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/0ba862a839d1
P8. Use NeckoTargetChannelFunctionEvent with FTPChannelChild. r=mattwoodrow
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2909f92bb594
P9. Make ChannelEventQueue::PrependEvents infallible. r=mattwoodrow
You need to log in before you can comment on or make changes to this bug.