The default bug view has changed. See this FAQ.

Pass ancestors URLs information to webRequest.onBeforeRequest listeners

ASSIGNED
Assigned to
(NeedInfo from)

Status

()

Toolkit
WebExtensions: Request Handling
P3
normal
ASSIGNED
6 months ago
2 days ago

People

(Reporter: mao, Assigned: evilpie, NeedInfo)

Tracking

(Blocks: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: triaged)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

6 months ago
Both Adblock Plus and NoScript (and presumably similar extension) have use cases for knowing the whole ancestors chain, up to the top frame, for both document and sub resources loads.

This is fairly simple to obtain in a nsIContentPolicy (which we still use, but only for non-HTTP loads) by directly walking up the frames hierarchy; on the other hand, AFAIK, is currently impossible in HTTP observers living in the parent process, which is how onBeforeRequest is implemented in the HTTP case.

Therefore I believe we preliminarily need to augment nsILoadInfo by adding another array attribute "ancestors", similar to redirectChainIncludingInternalRedirects, but filling it with the loading element's ancestors principals, starting with parent frame if any: if this is the top frame, the array would be empty, otherwise the first element would be the same as nsILoadInfo.loadingPrincipal, the second element would be the parent document's principal and so on, up to the top frame.

Once nsILoadInfo.ancestors is in place, we can use it in onBeforeRequest to fill an analogous "ancestors" array attribute in the event's data object, with the URLs (as strings)  of the collected principals.

Kris, Bill, what do you think?
(Reporter)

Updated

6 months ago
Blocks: 1166496
OS: Unspecified → All
Hardware: Unspecified → All

Comment 1

6 months ago
Would that also be include the owner frames for loads from workers?
(Reporter)

Comment 2

6 months ago
(In reply to The 8472 from comment #1)
> Would that also be include the owner frames for loads from workers?

I suspect that would be significantly more difficult, unless we cache this information at worker creation time.
Asking someone surely more knowledgeable than me. Andrea?
Flags: needinfo?(amarchesini)

Updated

6 months ago
Whiteboard: triaged

Updated

6 months ago
Blocks: 1226547
(Reporter)

Updated

6 months ago
Blocks: 1214733
(Assignee)

Updated

4 months ago
Assignee: nobody → evilpies
(Assignee)

Updated

4 months ago
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(Assignee)

Comment 3

2 months ago
I find it difficult to come up with a good definition of what should actually be included in "frameAncestors".
Let's say we have some setup like this "top level document > iframe 1 > iframe 2 > image.png"
top level document => [] ancestors
iframe 1 => [] or [top level] ? "top level document" would already be the originUrl in this case
iframe 2 => [iframe 1, top level] or [top level], again "iframe1" would be the originUrl
image.png => [iframe 1, top level] or [iframe 2, iframe 1, top level]

Comment 4

2 months ago
these seem the most logical choices to me:

> top level document => [] ancestors
> iframe 1 => [top level]
> frame 2 => [iframe 1, top level]
> image.png => [iframe 2, iframe 1, top level]


But there is one further complication, the triggering principal. When an image is loaded by CSS, shouldn't we get the following?

image.png => [stylesheet, iframe2, iframe1, top level]
(Assignee)

Comment 5

2 months ago
Created attachment 8828906 [details] [diff] [review]
WIP frameAncestors

This sort of mixes two different ones, sorry. ContentPolicy vs LoadInfo is different here.

LoadInfo behaves like this
top level => []
iframe 1 => []
iframe 2 => [top level]
image.png => [iframe 1, iframe 2]

The content policy implementation is a bit different.

Comment 6

2 months ago
I think anything that requires post-processing by the addon to *add* information would end up being stateful. E.g. resolving tab or frame IDs to URIs would require keeping state about those relationships. URIs are ambiguous, frame IDs get reused if the frames get navigated (backwards navigation!). So it would be easy to miss some edge-cases.

Snapshotting as much information as possible during the correct time in the child process should yield the most accurate picture. Stripping information out of the array would be a lot easier than trying to fill in the blanks.
I just illustrated a way to get the tab during a request in bug 1330600 comment 9, could something along that line be used to work your way up through the ancestors?

Comment 8

2 months ago
(In reply to Shane Caraveo (:mixedpuppy) from comment #7)
> I just illustrated a way to get the tab during a request in bug 1330600
> comment 9, could something along that line be used to work your way up
> through the ancestors?

a) that uses async APIs, which means it's potentially subject to racy behavior
b) it only works for top level frames, i.e. the tab itself. there's no API to query information on frames
webNavigation.getAllFrames wouldn't work?  Arguably a weird place for that api, it would feel more correct if it were tabs.Tab.getAllFrames.  In fact it seems like that provides whats necessary for an ancestor chain.
and I'm really not sure about the racy behavior, all the apis are async.  if you're using blocking onBeforeRequest, the request is suspended so subrequests wouldn't happen.  What would be happening to cause racy conditions?

Comment 11

2 months ago
Oh, right, my mistake. That leaves the races.

> What would be happening to cause racy conditions?

Perhaps javascript in the child process navigating pages or frames? Especially back navigation, which won't trigger a webrequest. And with out of process addons other addons might be acting concurrently on other requests (suspending one request won't necessarily delay processing of others AIUI) and trigger additional actions. So anything could happen in the meantime.
(In reply to The 8472 from comment #11)
> Oh, right, my mistake. That leaves the races.
> 
> > What would be happening to cause racy conditions?
> 
> Perhaps javascript in the child process navigating pages or frames?

I don't see how using webrequest with a pre-filled ancestor chain would change that.

> Especially back navigation, which won't trigger a webrequest. 

Both the OP and the WIP attached to this bug only address webrequest, so I'm not clear how that makes a difference.

> And with out
> of process addons other addons might be acting concurrently on other
> requests (suspending one request won't necessarily delay processing of
> others AIUI) and trigger additional actions. So anything could happen in the
> meantime.

Suspending a request will prevent the resource being loaded from loading other resources.  ie. the html page in a frame being suspended, the scripts/etc. it loads would not be available at that time.

Even with the changes in the WIP, multiple addons may be processing the request anyway and there is no promised order to that, so I don't see how doing this low level vs. using getAllFrames makes any difference there.

I am playing devils advocate here, and am not really against this change if it adds real value, but I'm trying to understand what it introduces that cannot be done already.

Comment 13

2 months ago
> Suspending a request will prevent the resource being loaded from loading other resources.

Child resources. But what about the ancestors? If the ancestor changes in the meantime, e.g. by frame navigation then getAllFrames might return frame URLs that are inconsistent with the ancestor chain that actually triggered the request.
(Reporter)

Comment 14

2 months ago
My use case, and the reason why I filed this bug, depends on the information about ancestors of the document which triggered the load being an accurate snapshot, freezing the moment when the load has been attempted.
When I look at the request info asynchronously I don't care if something has changed in the meanwhile, provided that my request is suspended, because I need to decide about that snapshot info.
Example (and an actual requirement of the Tor Browser): I need to know whether all the parents of the document being loaded had an HTTPS URI *when the load started*.

Therefore, taking the ancestor chain snapshot in the child process at the moment the load is triggered is the right thing to do even if the processing happens asynchronously, later, on a suspended channel. Looking at the ancestors chain "live" at processing time (i.e. generally after the fact) using getAllFrames() is not OK at all, as suggested also in comment 13.
Ok, that is a much clearer requirement, makes sense.  From the looks of this, you only need the urls, not a frameId or anything.  Knowing which frame a request was made in is not necessary?  I haven't looked deeply at the WIP, but I wonder if the frameId would be easy to just add in case it was useful.
(Reporter)

Comment 16

2 months ago
(In reply to Shane Caraveo (:mixedpuppy) from comment #15)
> Ok, that is a much clearer requirement, makes sense.  From the looks of
> this, you only need the urls, not a frameId or anything.  Knowing which
> frame a request was made in is not necessary?  I haven't looked deeply at
> the WIP, but I wonder if the frameId would be easy to just add in case it
> was useful.

The frameIds are useful (required in some real use cases in NoScript / Tor Browser) to associate and preserve some state to the frame structure, e.g. in order to make permission decisions based on previous decisions for the ancestors (like "Cascade permissions" or "Restrict subframes if parent or top frame has been restricted") or other data from the requests which loaded the ancestors (e.g. HTTP headers like CSP).
(Assignee)

Comment 17

2 months ago
frameIDs are a good idea, I didn't even think about adding those. Would be unfortunate if we had to add yet another array to LoadInfo for this though.

Comment 18

2 months ago
If I wanted to gather additional information in the child process and add it to the requestinfo, would that be possible through webext experiments?
(In reply to Tom Schuster [:evilpie] from comment #17)
> frameIDs are a good idea, I didn't even think about adding those. Would be
> unfortunate if we had to add yet another array to LoadInfo for this though.

Why would it need to be another array?  Just a new interface and an array of that interface.  In fact, it could probably just match the objects from webNavigation.getAllFrames.

[{tabId, processId, frameId, parentFrameId, url},...]
(In reply to The 8472 from comment #18)
> If I wanted to gather additional information in the child process and add it
> to the requestinfo, would that be possible through webext experiments?

If you mean the details objects that webrequest listeners receive, I don't think so, but an interesting thought (at least in respect to webRequest).  If not, what is requestinfo?

Comment 21

2 months ago
Yes, I meant the details.

Comment 22

2 months ago
Would it be possible to include information about html5 sandboxing restriction then? I think for privacy addons it might make sense to differentiate between sandboxed and non-sandboxed loads since the former can't use any persistent storage.

Updated

2 months ago
webextensions: --- → ?
(Assignee)

Comment 23

2 months ago
For frameIDs, we would probably just pass the outerWindowID around and lookup the right frame id in the webextension code. This should be easily doable.
Tom, we'd like to target this for 55.  Let me know if you'd like me to take over.
Flags: needinfo?(evilpies)
(Assignee)

Comment 25

a month ago
No I am still working on this, we are easily going to make 55.
Flags: needinfo?(evilpies)
(Assignee)

Comment 26

10 days ago
Created attachment 8849070 [details] [diff] [review]
LoadInfo changes to include all ancestors principals and window IDs
(Assignee)

Comment 27

10 days ago
Created attachment 8849072 [details] [diff] [review]
Expose frameAncestors to webextensions

I changed frameAncestors to an array of objects, with the keys url and windowId, that should allow us to extend this in the future as required.

I still need to finish writing some tests, especially around the "WebRequest content policy" and make sure the behavior matches with the loadInfo implementation.
Attachment #8828906 - Attachment is obsolete: true
(Assignee)

Comment 28

10 days ago
Created attachment 8849082 [details] [diff] [review]
Expose frameAncestors to webextensions

This changes frameAncestors to behave like described in Comment 4. I find this should be the least surprising behavior for users.

Now the loadInfo based code should match the code for content policy.
Attachment #8849072 - Attachment is obsolete: true

Updated

2 days ago
webextensions: ? → +
You need to log in before you can comment on or make changes to this bug.