Pass ancestors URLs information to webRequest.onBeforeRequest listeners

ASSIGNED
Assigned to

Status

()

Toolkit
WebExtensions: Request Handling
P3
normal
ASSIGNED
11 months ago
a day ago

People

(Reporter: mao, Assigned: mixedpuppy)

Tracking

(Blocks: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: triaged)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments, 6 obsolete attachments)

(Reporter)

Description

11 months ago
Both Adblock Plus and NoScript (and presumably similar extension) have use cases for knowing the whole ancestors chain, up to the top frame, for both document and sub resources loads.

This is fairly simple to obtain in a nsIContentPolicy (which we still use, but only for non-HTTP loads) by directly walking up the frames hierarchy; on the other hand, AFAIK, is currently impossible in HTTP observers living in the parent process, which is how onBeforeRequest is implemented in the HTTP case.

Therefore I believe we preliminarily need to augment nsILoadInfo by adding another array attribute "ancestors", similar to redirectChainIncludingInternalRedirects, but filling it with the loading element's ancestors principals, starting with parent frame if any: if this is the top frame, the array would be empty, otherwise the first element would be the same as nsILoadInfo.loadingPrincipal, the second element would be the parent document's principal and so on, up to the top frame.

Once nsILoadInfo.ancestors is in place, we can use it in onBeforeRequest to fill an analogous "ancestors" array attribute in the event's data object, with the URLs (as strings)  of the collected principals.

Kris, Bill, what do you think?
(Reporter)

Updated

11 months ago
Blocks: 1166496
OS: Unspecified → All
Hardware: Unspecified → All

Comment 1

11 months ago
Would that also be include the owner frames for loads from workers?
(Reporter)

Comment 2

11 months ago
(In reply to The 8472 from comment #1)
> Would that also be include the owner frames for loads from workers?

I suspect that would be significantly more difficult, unless we cache this information at worker creation time.
Asking someone surely more knowledgeable than me. Andrea?
Flags: needinfo?(amarchesini)

Updated

11 months ago
Whiteboard: triaged

Updated

10 months ago
Blocks: 1226547
(Reporter)

Updated

10 months ago
Blocks: 1214733
Assignee: nobody → evilpies
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
I find it difficult to come up with a good definition of what should actually be included in "frameAncestors".
Let's say we have some setup like this "top level document > iframe 1 > iframe 2 > image.png"
top level document => [] ancestors
iframe 1 => [] or [top level] ? "top level document" would already be the originUrl in this case
iframe 2 => [iframe 1, top level] or [top level], again "iframe1" would be the originUrl
image.png => [iframe 1, top level] or [iframe 2, iframe 1, top level]

Comment 4

7 months ago
these seem the most logical choices to me:

> top level document => [] ancestors
> iframe 1 => [top level]
> frame 2 => [iframe 1, top level]
> image.png => [iframe 2, iframe 1, top level]


But there is one further complication, the triggering principal. When an image is loaded by CSS, shouldn't we get the following?

image.png => [stylesheet, iframe2, iframe1, top level]
Created attachment 8828906 [details] [diff] [review]
WIP frameAncestors

This sort of mixes two different ones, sorry. ContentPolicy vs LoadInfo is different here.

LoadInfo behaves like this
top level => []
iframe 1 => []
iframe 2 => [top level]
image.png => [iframe 1, iframe 2]

The content policy implementation is a bit different.

Comment 6

7 months ago
I think anything that requires post-processing by the addon to *add* information would end up being stateful. E.g. resolving tab or frame IDs to URIs would require keeping state about those relationships. URIs are ambiguous, frame IDs get reused if the frames get navigated (backwards navigation!). So it would be easy to miss some edge-cases.

Snapshotting as much information as possible during the correct time in the child process should yield the most accurate picture. Stripping information out of the array would be a lot easier than trying to fill in the blanks.
(Assignee)

Comment 7

7 months ago
I just illustrated a way to get the tab during a request in bug 1330600 comment 9, could something along that line be used to work your way up through the ancestors?

Comment 8

7 months ago
(In reply to Shane Caraveo (:mixedpuppy) from comment #7)
> I just illustrated a way to get the tab during a request in bug 1330600
> comment 9, could something along that line be used to work your way up
> through the ancestors?

a) that uses async APIs, which means it's potentially subject to racy behavior
b) it only works for top level frames, i.e. the tab itself. there's no API to query information on frames
(Assignee)

Comment 9

7 months ago
webNavigation.getAllFrames wouldn't work?  Arguably a weird place for that api, it would feel more correct if it were tabs.Tab.getAllFrames.  In fact it seems like that provides whats necessary for an ancestor chain.
(Assignee)

Comment 10

7 months ago
and I'm really not sure about the racy behavior, all the apis are async.  if you're using blocking onBeforeRequest, the request is suspended so subrequests wouldn't happen.  What would be happening to cause racy conditions?

Comment 11

7 months ago
Oh, right, my mistake. That leaves the races.

> What would be happening to cause racy conditions?

Perhaps javascript in the child process navigating pages or frames? Especially back navigation, which won't trigger a webrequest. And with out of process addons other addons might be acting concurrently on other requests (suspending one request won't necessarily delay processing of others AIUI) and trigger additional actions. So anything could happen in the meantime.
(Assignee)

Comment 12

7 months ago
(In reply to The 8472 from comment #11)
> Oh, right, my mistake. That leaves the races.
> 
> > What would be happening to cause racy conditions?
> 
> Perhaps javascript in the child process navigating pages or frames?

I don't see how using webrequest with a pre-filled ancestor chain would change that.

> Especially back navigation, which won't trigger a webrequest. 

Both the OP and the WIP attached to this bug only address webrequest, so I'm not clear how that makes a difference.

> And with out
> of process addons other addons might be acting concurrently on other
> requests (suspending one request won't necessarily delay processing of
> others AIUI) and trigger additional actions. So anything could happen in the
> meantime.

Suspending a request will prevent the resource being loaded from loading other resources.  ie. the html page in a frame being suspended, the scripts/etc. it loads would not be available at that time.

Even with the changes in the WIP, multiple addons may be processing the request anyway and there is no promised order to that, so I don't see how doing this low level vs. using getAllFrames makes any difference there.

I am playing devils advocate here, and am not really against this change if it adds real value, but I'm trying to understand what it introduces that cannot be done already.

Comment 13

7 months ago
> Suspending a request will prevent the resource being loaded from loading other resources.

Child resources. But what about the ancestors? If the ancestor changes in the meantime, e.g. by frame navigation then getAllFrames might return frame URLs that are inconsistent with the ancestor chain that actually triggered the request.
(Reporter)

Comment 14

7 months ago
My use case, and the reason why I filed this bug, depends on the information about ancestors of the document which triggered the load being an accurate snapshot, freezing the moment when the load has been attempted.
When I look at the request info asynchronously I don't care if something has changed in the meanwhile, provided that my request is suspended, because I need to decide about that snapshot info.
Example (and an actual requirement of the Tor Browser): I need to know whether all the parents of the document being loaded had an HTTPS URI *when the load started*.

Therefore, taking the ancestor chain snapshot in the child process at the moment the load is triggered is the right thing to do even if the processing happens asynchronously, later, on a suspended channel. Looking at the ancestors chain "live" at processing time (i.e. generally after the fact) using getAllFrames() is not OK at all, as suggested also in comment 13.
(Assignee)

Comment 15

7 months ago
Ok, that is a much clearer requirement, makes sense.  From the looks of this, you only need the urls, not a frameId or anything.  Knowing which frame a request was made in is not necessary?  I haven't looked deeply at the WIP, but I wonder if the frameId would be easy to just add in case it was useful.
(Reporter)

Comment 16

7 months ago
(In reply to Shane Caraveo (:mixedpuppy) from comment #15)
> Ok, that is a much clearer requirement, makes sense.  From the looks of
> this, you only need the urls, not a frameId or anything.  Knowing which
> frame a request was made in is not necessary?  I haven't looked deeply at
> the WIP, but I wonder if the frameId would be easy to just add in case it
> was useful.

The frameIds are useful (required in some real use cases in NoScript / Tor Browser) to associate and preserve some state to the frame structure, e.g. in order to make permission decisions based on previous decisions for the ancestors (like "Cascade permissions" or "Restrict subframes if parent or top frame has been restricted") or other data from the requests which loaded the ancestors (e.g. HTTP headers like CSP).
frameIDs are a good idea, I didn't even think about adding those. Would be unfortunate if we had to add yet another array to LoadInfo for this though.

Comment 18

7 months ago
If I wanted to gather additional information in the child process and add it to the requestinfo, would that be possible through webext experiments?
(Assignee)

Comment 19

7 months ago
(In reply to Tom Schuster [:evilpie] from comment #17)
> frameIDs are a good idea, I didn't even think about adding those. Would be
> unfortunate if we had to add yet another array to LoadInfo for this though.

Why would it need to be another array?  Just a new interface and an array of that interface.  In fact, it could probably just match the objects from webNavigation.getAllFrames.

[{tabId, processId, frameId, parentFrameId, url},...]
(Assignee)

Comment 20

7 months ago
(In reply to The 8472 from comment #18)
> If I wanted to gather additional information in the child process and add it
> to the requestinfo, would that be possible through webext experiments?

If you mean the details objects that webrequest listeners receive, I don't think so, but an interesting thought (at least in respect to webRequest).  If not, what is requestinfo?

Comment 21

7 months ago
Yes, I meant the details.

Comment 22

7 months ago
Would it be possible to include information about html5 sandboxing restriction then? I think for privacy addons it might make sense to differentiate between sandboxed and non-sandboxed loads since the former can't use any persistent storage.

Updated

7 months ago
webextensions: --- → ?
For frameIDs, we would probably just pass the outerWindowID around and lookup the right frame id in the webextension code. This should be easily doable.
(Assignee)

Comment 24

6 months ago
Tom, we'd like to target this for 55.  Let me know if you'd like me to take over.
Flags: needinfo?(evilpies)
No I am still working on this, we are easily going to make 55.
Flags: needinfo?(evilpies)
Created attachment 8849070 [details] [diff] [review]
LoadInfo changes to include all ancestors principals and window IDs
Created attachment 8849072 [details] [diff] [review]
Expose frameAncestors to webextensions

I changed frameAncestors to an array of objects, with the keys url and windowId, that should allow us to extend this in the future as required.

I still need to finish writing some tests, especially around the "WebRequest content policy" and make sure the behavior matches with the loadInfo implementation.
Attachment #8828906 - Attachment is obsolete: true
Created attachment 8849082 [details] [diff] [review]
Expose frameAncestors to webextensions

This changes frameAncestors to behave like described in Comment 4. I find this should be the least surprising behavior for users.

Now the loadInfo based code should match the code for content policy.
Attachment #8849072 - Attachment is obsolete: true

Updated

5 months ago
webextensions: ? → +
(Assignee)

Updated

2 months ago
Assignee: evilpies → mixedpuppy
(Assignee)

Comment 29

2 months ago
Giorgio, did you get a chance to look at the patches?
Flags: needinfo?(g.maone)
(Assignee)

Comment 30

2 months ago
Tom, do you have any notes or comments about the last state of the patches?
Flags: needinfo?(evilpies)
Sorry, I am on vacation and can't access my work for about two weeks. I had some updated patches and started working on some tests, but it's basically not that different from what I is in my last patch.
Flags: needinfo?(evilpies)
(Assignee)

Comment 32

2 months ago
Tom, are those in a state where they could be attached here?  I took the bug, but if you're planning to finish it in the next couple weeks you can take it back.
Flags: needinfo?(evilpies)

Comment 33

a month ago
As I mentioned earlier, having effective sandbox properties and origin attributes per ancestor frame would be useful too.
I can attach the patches next week, not sure what the state is anymore. The patches aren't actually that large though.
Flags: needinfo?(evilpies)
Created attachment 8887049 [details] [diff] [review]
LoadInfo changes to include all ancestors principals and window IDs

Not sure if we are okay with adding this additional information to all LoadInfos?
Attachment #8849070 - Attachment is obsolete: true
Attachment #8887049 - Flags: feedback?(mixedpuppy)
Created attachment 8887051 [details] [diff] [review]
Expose frameAncestors to webextensions

I just rebased those patches, but nothing substantial changed. The test I wrote doesn't pass anymore, I think because we always give the main_frame an id of 0 now. I am not totally convinced that the ContentPolicyManager matches the HttpObserverManager.
Attachment #8849082 - Attachment is obsolete: true
Attachment #8887051 - Flags: feedback?(mixedpuppy)
Attachment #8887051 - Attachment is patch: true
(Assignee)

Comment 37

22 days ago
Comment on attachment 8887049 [details] [diff] [review]
LoadInfo changes to include all ancestors principals and window IDs

This is a patch I cannot really evaluate, see if we can find someone who can.
Attachment #8887049 - Flags: feedback?(mixedpuppy)
Attachment #8887049 - Flags: feedback?(honzab.moz)
Attachment #8887049 - Flags: feedback?(bzbarsky)
Comment on attachment 8887049 [details] [diff] [review]
LoadInfo changes to include all ancestors principals and window IDs

Review of attachment 8887049 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a loadinfo peer, but personally would not be that much against this.
Attachment #8887049 - Flags: feedback?(honzab.moz)
Comment on attachment 8887049 [details] [diff] [review]
LoadInfo changes to include all ancestors principals and window IDs

This seems ok as a general idea.  Some caveats:

1)  If we ever decide to do out of process iframes, this would be somewhat of an impediment to doing so.  That said, if we end up implementing something like ancestorOrigins then we would have at least some form of origin information.  But window ids might end up meaningless in the oop iframe case, because they're per-process.

2)  It's not clear to me what the expected thing is here in sandboxing without allow-same-origin scenarios.  If adblockers expect to see the URL of the thing being sandboxed, you're not going to get that from the principal.

3)  The API documentation being added doesn't make much sense for non-document loads.  It needs to be fixed.

4)  property-getter APIs that return a new object each time are a bit of an antipattern.  Methods are probably better, if we don't want to cache the object.

I didn't check the details of how we're getting the things to put in the arrays; it's not worth spending time on until at least items 2 and 3 above are addressed.
Attachment #8887049 - Flags: feedback?(bzbarsky) → feedback+
(Assignee)

Comment 40

22 days ago
Comment on attachment 8887051 [details] [diff] [review]
Expose frameAncestors to webextensions

This all seems fine to me.
Attachment #8887051 - Flags: feedback?(mixedpuppy) → feedback+
(Assignee)

Comment 41

22 days ago
(In reply to Boris Zbarsky [:bz] from comment #39)
> Comment on attachment 8887049 [details] [diff] [review]

> 1)  If we ever decide to do out of process iframes, this would be somewhat
> of an impediment to doing so.  That said, if we end up implementing
> something like ancestorOrigins then we would have at least some form of
> origin information.  But window ids might end up meaningless in the oop
> iframe case, because they're per-process.

How much should we worry about this being an impediment?  If we went OOP for iframes in the next year (or perhaps even two) we should certainly figure out something better.
Flags: needinfo?(bzbarsky)
I really doubt we can do OOP iframes in the next year.  I can't speak to the next two.

One simple option in terms of the API surface is to ship over the process id, not just the window id, and then the recipient can figure things out as needed.
Flags: needinfo?(bzbarsky)
(Assignee)

Comment 43

10 days ago
(In reply to Limited Availabilty till Oct | Tom Schuster [:evilpie] from comment #36)
> Created attachment 8887051 [details] [diff] [review]
> Expose frameAncestors to webextensions
> 
> I just rebased those patches, but nothing substantial changed. The test I
> wrote doesn't pass anymore, I think because we always give the main_frame an
> id of 0 now. I am not totally convinced that the ContentPolicyManager
> matches the HttpObserverManager.

Can you attach your test?

If you've got cycles and want to help in any of this let me know.
Flags: needinfo?(evilpies)
(Assignee)

Comment 44

9 days ago
(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #39)

> 2)  It's not clear to me what the expected thing is here in sandboxing
> without allow-same-origin scenarios.  If adblockers expect to see the URL of
> the thing being sandboxed, you're not going to get that from the principal.

Currently, there is no way to see a request inside a sandboxed iframe with webrequests, though I intend to change that.  

With a little hackery in the webrequest module, this is what I see for a sandboxed request without allow-same-origin, and it provides what I would have expected.  

It doesn't provide the url for the iframe itself (this is an image loaded in the iframe), which may prove to be an issue, but I'm not overly concerned with it (unless there is a very simple fix).

Did you expect something else?

{
  "requestId": "12",
  "url": "http://mochi.test:8888/.../file_image_good.png#sandboxed",
  "originUrl": "moz-nullprincipal:{03fd5623-648a-7748-a5c0-0ab591bcd52d}",
  "documentUrl": "moz-nullprincipal:{03fd5623-648a-7748-a5c0-0ab591bcd52d}",
  "method": "GET",
  "tabId": 1,
  "type": "image",
  "timeStamp": 1502313009296,
  "frameId": 2147483664,
  "parentFrameId": 2147483662,
  "frameAncestors": [{
      "url": "http://mochi.test:8888/.../file_simple_xhr_frame2.html",
      "frameId": 2147483662
    }, {
      "url": "http://mochi.test:8888/.../file_simple_xhr_frame.html",
      "frameId": 2147483660
    }, {
      "url": "http://mochi.test:8888/.../file_simple_xhr.html?nocache=0.7630002042184172",
      "frameId": 0
    }
  ]
}
Flags: needinfo?(bzbarsky)
(Assignee)

Updated

9 days ago
See Also: → bug 1388889
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

9 days ago
Attachment #8887049 - Attachment is obsolete: true
(Assignee)

Updated

9 days ago
Attachment #8887051 - Attachment is obsolete: true
(Assignee)

Comment 47

9 days ago
Comment on attachment 8895588 [details]
Bug 1305237 LoadInfo changes to include all ancestors principals and window IDs

This should address points 3 & 4, see comment 44 for point 2.
Attachment #8895588 - Flags: feedback?(bzbarsky)
(Assignee)

Comment 48

9 days ago
Comment on attachment 8895589 [details]
Bug 1305237 Expose frameAncestors to webextensions

Can you look at the tests?
Attachment #8895589 - Flags: feedback?(evilpies)
Comment on attachment 8895589 [details]
Bug 1305237 Expose frameAncestors to webextensions

https://reviewboard.mozilla.org/r/166796/#review172186

Looks good! I added this tiny test: https://pastebin.mozilla.org/9029323
Flags: needinfo?(evilpies)
Attachment #8895589 - Flags: feedback?(evilpies) → feedback+
> It doesn't provide the url for the iframe itself (this is an image loaded in the iframe),
> which may prove to be an issue

Right, that is the question.

> unless there is a very simple fix

The only fix I can think of would be to go from a request context to a window or document and get the url...

> Did you expect something else?

No.  The question is what extensions using these APIs will expect.

In particular, using the principal for "documentURL" seems a bit odd to me.  Maybe as a followup we _should_ do the "go to a document/window" thing.
Flags: needinfo?(bzbarsky)

Comment 51

9 days ago
mozreview-review
Comment on attachment 8895588 [details]
Bug 1305237 LoadInfo changes to include all ancestors principals and window IDs

https://reviewboard.mozilla.org/r/166794/#review172466

::: netwerk/base/LoadInfo.cpp:112
(Diff revision 1)
> +      do {
> +        if (!parent || prevParent == parent) {
> +          break;
> +        }
> +
> +        nsCOMPtr<nsIScriptObjectPrincipal> scriptObjPrin = do_QueryInterface(parent);

This worries me some.

In particular, if we can ever end up in this code for a document which has a navigated-away-from ancestor, then the principals will be wrong.  Ideally we'd walk the parent/embedding document chain instead, because that's the right thing that actually shows the provenance of our document.  This keeps coming up; we should really add a nice accessor for it on nsIDocument or nsPIDOMWindowInner or something...

::: netwerk/base/LoadInfo.cpp:118
(Diff revision 1)
> +        if (!scriptObjPrin) {
> +          break;
> +        }
> +
> +        mAncestors.AppendElement(scriptObjPrin->GetPrincipal());
> +        mAncestorsOuterWindowIDs.AppendElement(parent->WindowID());

Again, not clear whether outer ids or inner ids are more useful here.  What are the expected use cases for the ids?

And again, we should think about what the API exposed to consumers is; in a world where the ids are non-unique because we have multiple processes...

::: netwerk/base/LoadInfo.cpp:349
(Diff revision 1)
>                     bool aInitialSecurityCheckDone,
>                     bool aIsThirdPartyContext,
>                     const OriginAttributes& aOriginAttributes,
>                     RedirectHistoryArray& aRedirectChainIncludingInternalRedirects,
>                     RedirectHistoryArray& aRedirectChain,
> +                   nsTArray<nsCOMPtr<nsIPrincipal>>& aAncestors,

This seems like a footgun: a caller can pass in an array and it will get nuked on them, and that's pretty unexpected.

This should probably take an rvalue reference.  At least then the caller will need to explicitly Move() if they want to pass stuff here and it will be clear that it's getting moved.

::: netwerk/base/LoadInfo.cpp:394
(Diff revision 1)
>  
>    mRedirectChainIncludingInternalRedirects.SwapElements(
>      aRedirectChainIncludingInternalRedirects);
>  
>    mRedirectChain.SwapElements(aRedirectChain);
> +  mAncestors.SwapElements(aAncestors);

And you wouldn't need this; the rvalue-reference ctor that nsTArray has does the right thing.  You'd just mAncestors(Move(aAncestors)) in the initializer list.
Attachment #8895588 - Flags: feedback?(bzbarsky) → feedback+
(Assignee)

Comment 52

7 days ago
silence.
Flags: needinfo?(g.maone)
Flags: needinfo?(amarchesini)
(Assignee)

Comment 53

7 days ago
mozreview-review-reply
Comment on attachment 8895588 [details]
Bug 1305237 LoadInfo changes to include all ancestors principals and window IDs

https://reviewboard.mozilla.org/r/166794/#review172466

> Again, not clear whether outer ids or inner ids are more useful here.  What are the expected use cases for the ids?
> 
> And again, we should think about what the API exposed to consumers is; in a world where the ids are non-unique because we have multiple processes...

We use outer ids in webrequest, and these may need to be matched there.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 56

a day ago
mozreview-review-reply
Comment on attachment 8895588 [details]
Bug 1305237 LoadInfo changes to include all ancestors principals and window IDs

https://reviewboard.mozilla.org/r/166794/#review172466

> This worries me some.
> 
> In particular, if we can ever end up in this code for a document which has a navigated-away-from ancestor, then the principals will be wrong.  Ideally we'd walk the parent/embedding document chain instead, because that's the right thing that actually shows the provenance of our document.  This keeps coming up; we should really add a nice accessor for it on nsIDocument or nsPIDOMWindowInner or something...

I've looked at this a bit and am not entirely certain what to do here, could use some further guidance.

> This seems like a footgun: a caller can pass in an array and it will get nuked on them, and that's pretty unexpected.
> 
> This should probably take an rvalue reference.  At least then the caller will need to explicitly Move() if they want to pass stuff here and it will be clear that it's getting moved.

I think I have these fixed.
You need to log in before you can comment on or make changes to this bug.