Assertion failure: !mTrackingScripts.Contains(aURL), at nsDocument.cpp:3035

REOPENED
Unassigned

Status

()

defect
P2
normal
REOPENED
2 years ago
3 months ago

People

(Reporter: cbook, Unassigned)

Tracking

(Blocks 1 bug, {assertion})

Firefox Tracking Flags

(Not tracked)

Details

()

Attachments

(2 attachments)

Reporter

Description

2 years ago
found via bughunter and reproduce with latest m-c tinderbox debug build on windows 7 

Steps to reproduce:
-> Load 	https://www.privesc.eu/
--> Assertion failure 

Assertion failure: !mTrackingScripts.Contains(aURL), at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/dom/base/nsDocument.cpp:3035
This was added in bug 1321868.
Blocks: 1321868
Flags: needinfo?(ehsan)

Comment 2

2 years ago
Does this happen reliably?  Any chance you could elaborate how you reproduce this?  I tried to reproduce unsuccessfully.
Flags: needinfo?(ehsan) → needinfo?(cbook)
Reporter

Comment 3

2 years ago
working on this and will report back
Flags: needinfo?(cbook)
(In reply to Carsten Book [:Tomcat] from comment #3)
> working on this and will report back

Please let us know when you've had a chance to reproduce. Thanks.
Flags: needinfo?(cbook)

Comment 5

2 years ago
fyi, I submitted this url to bughunter twice and could not reproduce. I also couldn't find this in production bughunter at all. Tomcat: was this on your local instance?
Reporter

Comment 6

2 years ago
yeah so far didn't crash anymore, closing as incomplete!
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(cbook)
Resolution: --- → INCOMPLETE

Comment 7

2 years ago
Nightly 57 from today, Fedora 26:
https://kobieta.onet.pl/brigitte-macron-w-cygaretkach-oraz-bluzce-z-odslonietymi-ramionami-podczas-oficjalnej/10gjvy3

The first time I loaded the url, it took a while to assert. The second time I had to reload several times. Bughunter has 57 urls where this has been reproduced in the last month.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Comment 8

2 years ago
Posted file Debug Log
This is from my local build which reproduced the assertion after several reloads.
Looks like we have a URL here, Ehsan.
Flags: needinfo?(ehsan)
Priority: -- → P2

Comment 10

2 years ago
I tried loading the URL in comment 7 tens of times without any chance to reproduce the crash on trunk.

Bob, you mentioned there are 57 URLs that trigger this assertion.  Any chance you can post them here please?  Thanks!
Flags: needinfo?(ehsan) → needinfo?(bob)

Comment 11

2 years ago
I would have to screen them first for content or pii. I assume that you are primarily looking for a reproducible test case. I retested the urls that produced this assertion going back to 7/19 and only had two instances where it reproduced. I am beginning to think this might be ad related which would account for its lack of reproducibility. 

One of the urls is the one in this bug. https://kobieta.onet.pl/brigitte-macron-w-cygaretkach-oraz-bluzce-z-odslonietymi-ramionami-podczas-oficjalnej/10gjvy3 reproduced on Windows 7 32bit. I manually reproduced it there a couple of times in quick succession but have been unable to do so again.

The other url https://career.aglasem.com/ssc-cpo-si-asi-admit-card/ was also reproduced on Windows 7 32bit. I have managed to reproduce it 3 times on Fedora 26 x86_64 with a fresh build from this afternoon. I reloaded the page if it did not reproduce on the first load.

If you really want the full list, just NI me again and I'll screen the urls.
Flags: needinfo?(bob)

Comment 12

2 years ago
(In reply to Bob Clary [:bc:] from comment #11)
> I would have to screen them first for content or pii. I assume that you are
> primarily looking for a reproducible test case. I retested the urls that
> produced this assertion going back to 7/19 and only had two instances where
> it reproduced. I am beginning to think this might be ad related which would
> account for its lack of reproducibility. 

Yes, I'm only interested in reproducible cases indeed.

And this is definitely going to depend on tracking scripts typically used by ads, so by definition we're gonna have a hard time reproducing it sadly...

> One of the urls is the one in this bug.
> https://kobieta.onet.pl/brigitte-macron-w-cygaretkach-oraz-bluzce-z-
> odslonietymi-ramionami-podczas-oficjalnej/10gjvy3 reproduced on Windows 7
> 32bit. I manually reproduced it there a couple of times in quick succession
> but have been unable to do so again.
> 
> The other url https://career.aglasem.com/ssc-cpo-si-asi-admit-card/ was also
> reproduced on Windows 7 32bit. I have managed to reproduce it 3 times on
> Fedora 26 x86_64 with a fresh build from this afternoon. I reloaded the page
> if it did not reproduce on the first load.

Well let me first try out those two URLs tomorrow and see if I hit the assertion with one of them, and if not I'll bother you again!  Thanks a lot for your help so far.

Comment 13

2 years ago
Posted file tracking-urls.txt
There where 59 total urls. I scrubbed two. I ran these on a debug build on Fedora using the marionette based runner and reproduced this assertion on

https://career.aglasem.com/ssc-cpo-si-asi-admit-card/ 

I repeated manually and also reproduced again. A saved version did not reproduce so get it while you can.

Comment 15

2 years ago
So what's happening here is that at the time the assertion occurs, the channel in the child process doesn't get the HttpChannelChild::ProcessNotifyTrackingResource() function called on it yet so we think the channels resource isn't a tracking resource.

I noticed that bug 1015466 and bug 1338493 have completely changed the way that these notifications are delivered in between the parent and child processes since I wrote this code.  We send this notification from the parent in <https://searchfox.org/mozilla-central/rev/d08b24e613cac8c9c5a4131452459241010701e0/netwerk/protocol/http/HttpChannelParent.cpp#1751> and expect to receive it in <https://searchfox.org/mozilla-central/rev/d08b24e613cac8c9c5a4131452459241010701e0/netwerk/protocol/http/HttpBackgroundChannelChild.cpp#404>.

SC, in the current setup, is there anything that guarantees that RecvNotifyTrackingResource() gets called before OnStopRequest()?  That seems to not be the case.  Can we enforce such a guarantee somehow?
Flags: needinfo?(schien)
After bug 1325054 is landed, RecvNotifyTrackingResource is always called before OnStartRequest. Therefore we can guarantee that RecvNotifyTrackingResource will get called before OnStopRequest.
Flags: needinfo?(schien)

Comment 18

2 years ago
(In reply to (PTO 9/21 - 9/24) Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #16)
> After bug 1325054 is landed, RecvNotifyTrackingResource is always called
> before OnStartRequest. Therefore we can guarantee that
> RecvNotifyTrackingResource will get called before OnStopRequest.

Doesn't bug 1370211 comment 3 kind of contradict what you said here?  :-)

I have tried to debug this for several days, and have caught this under rr a couple of times.  Frustratingly, each time I haven't been able to replay the parent process inside rr so haven't been able to check what is happening on the parent side, but I have checked all of the logic on the child side and from what I can tell, the *only* way this can be happening is if RecvNotifyTrackingResource gets called too late in some situations.  This was happening with the Google Analytics script URL for me, so I'm 100% sure that this is a race condition based on that, since that script is classified as a tracking resource and there is no reason for RecvNotifyTrackingResource to not get called in the child process if a classification on that URL is triggered.

Is there any chance you can have a look at this please?  I'm not familiar with the Necko parts involved here, they're all new to me.  :-)  Thanks!
Flags: needinfo?(schien)
Yes I'm already digging to this bug, just too lazy to assign it to myself. ;)
Assignee: nobody → schien
Flags: needinfo?(schien)
SC and I found that this should have nothing to do with RecvNotifyTrackingResource being called too late.

From the log, I saw that the http channel at child side was intercepted by a service worker. AsyncOpen returned at [1]. So, nsChannelClassifier in parent process was not involved to check this URI. That's why this channel is not marked as tracking resource.

I think we should somehow make nsChannelClassifier work in child process, so the intercepted channel can by marked as tracking resource.

Ehsan, Francois, what do you think?

[1] https://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/netwerk/protocol/http/HttpChannelChild.cpp#2406
Flags: needinfo?(francois)
Flags: needinfo?(ehsan)
Thanks @kershaw for figuring out the root cause! Unassigned myself based on comment #20, since this might not be fixed in Necko.
Assignee: schien → nobody

Comment 22

2 years ago
Oh interesting, thanks for the investigation.

I guess we can use asyncClassifyLocalWithTables() from the content process in order to classify these URLs.  However we have been trying to avoid making an IPC roundtrip to the parent process for channels that are intercepted by a service worker so far.  The result of this classification needs to arrive before OnStopRequest, but still it would be nice if we could avoid the IPC if we can.  Francois, can you think of a way to do that?

Also, should we change the code in nsChannelClassifier to do this in addition to what it currently does, or just add some custom code for the case of intercepted channels?
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #22)
> The result of this classification
> needs to arrive before OnStopRequest, but still it would be nice if we could
> avoid the IPC if we can.  Francois, can you think of a way to do that?

Are we talking about the classification in the case where TP is in "passive" mode (not blocking anything) or when it's turned ON? In the case where it's turned on, the result needs to arrive before we even connect to the server.

I'm not sure if we can easily avoid going to the main thread for the classification. Updates need to be able to block local list lookups while they swap the old list with the new one.
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #23)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #22)
> > The result of this classification
> > needs to arrive before OnStopRequest, but still it would be nice if we could
> > avoid the IPC if we can.  Francois, can you think of a way to do that?
> 
> Are we talking about the classification in the case where TP is in "passive"
> mode (not blocking anything) or when it's turned ON? In the case where it's
> turned on, the result needs to arrive before we even connect to the server.
> 

Yes, we are talking about the channel annotation. SetIsTrackingResource() is only called when TP is disabled.

> I'm not sure if we can easily avoid going to the main thread for the
> classification. Updates need to be able to block local list lookups while
> they swap the old list with the new one.

The thing in this bug is that there is a tracking script which is loaded multiple times on a site where there is a service worker registered. At first time that script is loaded from network, and the second time it is from the service worker.
When a http channel is intercepted by a service worker in child process, the child channel will not send a IPC to parent process to build the real http channel. Unfortunately, the channel classification is only happened in parent process now. So, the child channel will not be marked as a tracking resource.

I think maybe we can add a table in nsChannelClassifier to cache tracking urls. In this way, we can do the url classification for the intercepted channel in child process. In parent process, we can also avoid the redundant db check if there is a hit in the table.

Any thoughts?
Flags: needinfo?(francois)

Comment 25

2 years ago
Here is an alternative proposal:

Perhaps it would make sense to record in the DOM Cache whether a channel came from a tracking resource, and when synthesizing a response from a cache entry we can mark the channel as tracking if the cached content originally came from ma tracking channel?
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #25)
> Perhaps it would make sense to record in the DOM Cache whether a channel
> came from a tracking resource, and when synthesizing a response from a cache
> entry we can mark the channel as tracking if the cached content originally
> came from ma tracking channel?

If you want an invariant, then I don't think this is adequate.  Many sites with synthesize a new Response based on the fetch'd Response.  They do this to tweak the Response, etc.  This would lose your hidden flag.

Is this something that could wait until after we move interception to the parent?
Flags: needinfo?(francois)

Comment 27

2 years ago
(In reply to Ben Kelly [:bkelly] from comment #26)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #25)
> > Perhaps it would make sense to record in the DOM Cache whether a channel
> > came from a tracking resource, and when synthesizing a response from a cache
> > entry we can mark the channel as tracking if the cached content originally
> > came from ma tracking channel?
> 
> If you want an invariant, then I don't think this is adequate.  Many sites
> with synthesize a new Response based on the fetch'd Response.  They do this
> to tweak the Response, etc.  This would lose your hidden flag.

Yes, good point.  That solution would only work for the simplest case of the Response being cached untouched, so we'd still have this problem for other cases.

> Is this something that could wait until after we move interception to the
> parent?

Yes I think so.  As things are, the implication of this bug is that we may think some tracking scripts are non-tracking and end up not throttling them which is unfortunate but not disastrous.

Do you mind setting up the correct bug dependencies so that we remember to get back to this once the work to move the interception to the parent process is finished please?  Thanks!
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.