Closed Bug 1255270 (CVE-2016-2830) Opened 8 years ago Closed 8 years ago

Favicon request doesn't timeout, or close when related window is closed

Categories

(Firefox :: Tabbed Browser, defect)

45 Branch
defect
Not set
major

Tracking

()

VERIFIED FIXED
Firefox 49
Tracking Status
firefox46 --- wontfix
firefox47 + wontfix
firefox48 + verified
firefox49 + verified
firefox-esr45 48+ verified
firefox50 --- verified

People

(Reporter: toni.mozilla, Assigned: Gijs)

References

Details

(Keywords: privacy, sec-high, Whiteboard: [adv-main48+][adv-esr45.3+])

Attachments

(5 files, 4 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160303134406

Steps to reproduce:

The Firefox does background request to load favicon.ico from the current web page. Unexpectedly, the connection is not terminated when the website window is closed which gives an server owner an ability to send data regularly to the browser and figure out when the transmission fails indicating the browser close. 

It is also possible to measure pings using ACKs. Theoretically, could it be possible to recognize mobile device geological position from a small list of alternative positions based on ping flow? Remember that the pinging is possible as long as browser is on.


Actual results:

No timeout on favicon connections.


Expected results:

Timeout on favicon connections.
Severity: normal → major
See Also: → 1255267
Also Tor Browser is affected. 'New Identity' feature triggers browser restart which will not kill the connection.
Summary: Favicon connection does not timeout → Browser open time leakage
Component: Untriaged → Tabbed Browser
Flags: sec-bounty?
Group: firefox-core-security → core-security
Component: Tabbed Browser → ImageLib
Product: Firefox → Core
Keywords: privacy
ImageLib doesn't make network requests. Maybe this is a networking bug but I think Gijs was right to blame it on the front-end which makes the favicon request. Maybe it needs to be put into the document's LoadGroup, or goverened under some other window-closing-event-watching mechanism.


> It is also possible to measure pings using ACKs. Theoretically, could it be possible to
> recognize mobile device geological position from a small list of alternative positions
> based on ping flow? Remember that the pinging is possible as long as browser is on.

Could you expand on what you mean here? Is this an extension (practical use) of this same bug or are you describing a different issue?
Group: core-security → firefox-core-security
Component: ImageLib → Tabbed Browser
Flags: needinfo?(toni.huttunen)
Product: Core → Firefox
Summary: Browser open time leakage → Favicon request doesn't timeout, or close when related window is closed
(In reply to Daniel Veditz [:dveditz] from comment #2)
> ImageLib doesn't make network requests. Maybe this is a networking bug but I
> think Gijs was right to blame it on the front-end which makes the favicon
> request. Maybe it needs to be put into the document's LoadGroup,

This is hard, especially under e10s. We could conceivably load it in-content and then message a data URI (or maybe a blob URI if those work x-process, not sure) to the parent, but that doesn't sound like much fun for perf, either. Even then though, I don't think we can easily add it under the document's load group.

As it is, in this case the image load as happens in the XUL <image> element should get stopped when the image stops being displayed or its src attribute changes - one of which is pretty much guaranteed to happen when you load a different URL. That we don't disconnect might be either a XUL or an imagelib/networking issue - not sure who calls the shots for that load, but we can't currently do much about it beyond what already happens from the frontend. Seth might still be a good person to ask how this is 'supposed' to work, so I'm pinging him. :-)

> or
> goverened under some other window-closing-event-watching mechanism.

This is theoretically possible but seems much more error-prone.
Flags: needinfo?(seth)
(In reply to :Gijs Kruitbosch from comment #3)
> the image stops being displayed or its src attribute
> changes - one of which is pretty much guaranteed to happen when you load a
> different URL.

To be slightly more specific, AIUI there are two cases here:

1) you close the tab. The tab and all its content, including the favicon XUL <image>, get removed from the DOM, and (I should expect...) GC'd.
2) you navigate elsewhere. We remove the image attribute from the tab, or set it to some other URL, which through XBL inheritance ends up modifying the src attribute on the XUL <image>. I think this, too, should cancel/abort the load of the initial image that we load.
Blocks: 1255267
(In reply to Daniel Veditz [:dveditz] from comment #2)
> ImageLib doesn't make network requests. Maybe this is a networking bug but I
> think Gijs was right to blame it on the front-end which makes the favicon
> request. Maybe it needs to be put into the document's LoadGroup, or
> goverened under some other window-closing-event-watching mechanism.
> 
> 
> > It is also possible to measure pings using ACKs. Theoretically, could it be possible to
> > recognize mobile device geological position from a small list of alternative positions
> > based on ping flow? Remember that the pinging is possible as long as browser is on.
> 
> Could you expand on what you mean here? Is this an extension (practical use)
> of this same bug or are you describing a different issue?
You are right that it could be a separate issue. I made it here because this bug makes it more practical. So, I saw it as practical use of this bug on the mobile environment. After page visit your device could be theoretically tracked by monitoring timeline of gsm latency which changes based on obstacles and distances to the active mobile tower. If you see that it should be more than remark I can fill a new bug. In any case, I don't see any reasonable fix for it so I think the solution is WONTFIX.
Flags: needinfo?(toni.huttunen)
Is there any more progress to brief us on yet?
Jet, can you assign someone to take a look at this since it has been sitting for three weeks with no response to ni?
Flags: needinfo?(bugs)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(seth) → needinfo?(bugs)
Keywords: sec-high
This is definitely not an ImageLib bug, and I don't think there's any behavior specific to XUL images here, either. Have we verified that the connection is not being kept alive by the favicon service? Because that's what I'd expect. Here's the entry point for loading a favicon:

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsFaviconService.cpp#210

I'm not really familiar with the favicon-related code in the frontend. Gijs, does this diagnosis seem correct to you?

If so, we should probably add a timeout for loading favicons, even if we continue to receive data. That's something that network folks would know more about than I would.
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(bugs)
(In reply to Seth Fowler [:seth] [:s2h] from comment #8)
> This is definitely not an ImageLib bug, and I don't think there's any
> behavior specific to XUL images here, either. Have we verified that the
> connection is not being kept alive by the favicon service? Because that's
> what I'd expect. Here's the entry point for loading a favicon:
> 
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/places/
> nsFaviconService.cpp#210
> 
> I'm not really familiar with the favicon-related code in the frontend. Gijs,
> does this diagnosis seem correct to you?

Hm, it could be. I struggled to reproduce with the attached testcase a few days ago, but I probably did something wrong...

> If so, we should probably add a timeout for loading favicons, even if we
> continue to receive data. That's something that network folks would know
> more about than I would.

This is places code, so I'm forwarding this to Marco (sorry, Marco).
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mak77)
Hi Gijs, 
I would like to hear that you or someone else can reproduce. 

Here are steps that could help:

favicon.html:
- 1. set wait value for a bit less, for example use line: var wait = i*2; It's easier to test with higher request interval than what the default value is (1 min). 
- 2. set target address favicon("http://example.com:1337 to point to your server
redir.py:
- 1. start server: python redir.py 
- 2. ensure connection and redirection works and you can make request using your browser: http://example.com:1337/?wait=3&url=http://google.com

Logout from google service (so we can prove the request can be done at the later time). After that, open favicon.html with your browser and wait 10 sec delay until it redirects to google. Now, login to your google account and view your search history. You should see search requests like CSRF-hack<number> when the attack success. Before that, you may see a lot of traffic on the python service, like multiple requests are connecting and disconnecting. You should see csrf attacks when page is closen, socket errors when browser is closen, and when you kill your python service and restart it you see reconnections.
This is where the browser passes info to the favicon service:
http://mxr.mozilla.org/mozilla-central/source/browser/base/content/tabbrowser.xml#885
The favicons service loads the icon asynchronously and then stores it in the db:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/AsyncFaviconHelpers.cpp#538

Off-hand I don't know if there's any silver bullet in netwerk allowing to set a timeout on a channel, but in the worst case I suppose we may just check elapsed time in OnDataAvailable.
Do we agree the timeout solution would be enough to solve this? In the end it would disallow abusing a very long open connection, so it seems it may do.

Though, are we sure favicons are the only thing we load in background? Cause otherwise it may be better to evaluate a more general approach/solution.
Some of the things that came to my mind (downloads, RSS) are added by the user, so unlikely to be exploitable just by visiting a page... Anything else that comes to your mind?
Flags: needinfo?(mak77)
(In reply to Toni Huttunen from comment #10)
> Now, login to your google account and
> view your search history.

I don't have search history enabled, for the obvious security- and privacy-related reasons, so I cannot follow these steps.

I also don't understand why the testcase is so complex - the point of this particular bug is that the connection outlasts the tab, and as long as the python script accurately indicates whether the connection is alive that can be proved without the requests to google, no?

So I instead tried to use an empty HTML page that just used the relevant favicon, load it, then close the tab (but not the browser), and rely on the console output from redir.py - but that kept erroring out with raw python "client disconnected" errors.

The CSRF thing doesn't really make sense to me. If your steps involve logging out of google before ever opening the page with the favicon, how do the requests end up in a logged-in account's search history? That seems like a completely orthogonal CSRF issue with Google, that you could also reproduce if you made those requests via e.g. <img> elements on the original page, no? What am I missing?
Flags: needinfo?(toni.huttunen)
(In reply to Marco Bonardo [::mak] from comment #11)
> This is where the browser passes info to the favicon service:
> http://mxr.mozilla.org/mozilla-central/source/browser/base/content/
> tabbrowser.xml#885
> The favicons service loads the icon asynchronously and then stores it in the
> db:
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/
> AsyncFaviconHelpers.cpp#538
> 
> Off-hand I don't know if there's any silver bullet in netwerk allowing to
> set a timeout on a channel, but in the worst case I suppose we may just
> check elapsed time in OnDataAvailable.

We could start our own one-shot nsITimer and keep a reference to the channel and abort it when the timer expires? That seems simplest if netwerk doesn't have a more general way of doing this.

> Do we agree the timeout solution would be enough to solve this? In the end
> it would disallow abusing a very long open connection, so it seems it may do.

Well, either that or make the favicon service return something that we can use to kill off the request when the tab or window goes away, or when setIcon is called again. That would require keeping track of all of that per-browser window, though, which is annoying. Then there's bug 1166891 as well...

The more I think about it the more I feel like it would make more sense for us to fetch the favicon from docshell-land and use a message to pass a blob/data thing to the parent process. It would fix a lot of the issues we have with favicons as it is (and probably get us new, as-yet-unknown ones instead...). What do you think?

> Though, are we sure favicons are the only thing we load in background? Cause
> otherwise it may be better to evaluate a more general approach/solution.
> Some of the things that came to my mind (downloads, RSS) are added by the
> user, so unlikely to be exploitable just by visiting a page... Anything else
> that comes to your mind?

Not really... most other things would be in-document and get nuked as part of the document's load group going away, I'd think.

Though don't service workers generally allow pages to do requests after they have been shut down? Maybe we explicitly disallow off-domain http redirects for those? I don't know...
Flags: needinfo?(mak77)
The (new) requests end up to the logged-in account because you do the login later. The point is that you can attack to page which has no active session currently but later has.

Also, I'm not sure if the timeout is the only issue. Even if you add timeout to a single favicon request it seems like they are queued(?) and only dozen(?) of them are active at the same time. You can just fill the queue and make more requests (even they do timeout)
(In reply to :Gijs Kruitbosch from comment #13)
> We could start our own one-shot nsITimer and keep a reference to the channel
> and abort it when the timer expires? That seems simplest if netwerk doesn't
> have a more general way of doing this.

yes, that's possible.

> > Do we agree the timeout solution would be enough to solve this? In the end
> > it would disallow abusing a very long open connection, so it seems it may do.
> 
> Well, either that or make the favicon service return something that we can
> use to kill off the request when the tab or window goes away, or when
> setIcon is called again.

It's also feasible from the favicons service api I think. But yes, it requires some bookkeeping to keep around the cancelable object. May not be so bad though, if the API just returns an object with a cancel() method and we store it in a property of the parent browser.

> The more I think about it the more I feel like it would make more sense for
> us to fetch the favicon from docshell-land and use a message to pass a
> blob/data thing to the parent process. It would fix a lot of the issues we
> have with favicons as it is (and probably get us new, as-yet-unknown ones
> instead...). What do you think?

The favicon service already supports setting favicon data before a call to SetAndFetchFaviconForPage (through replaceFaviconData or replaceFaviconDataFromDataURL) and in such a case it will use the handed data instead of fetching it from the network.
The problem is that when we ask the favicon service to store an icon, it first go checking if we already have a non expired favicon for that page url, and, if so, we just ignore the request and keep using the icon we already have. Icons expire according to cache rules, IIRC after 7 days.
If we move the loading out of the service, we'll need an API to ask if it wants an updated icon or not. We could add an ignoreExpired argument to getFaviconURLForPage for that.
Another problem is that the content process will have to ask the parent process if it should load the icon, wait for the answer, load the icon, then pass it back to the parent that will store it... it starts being complex.

> Not really... most other things would be in-document and get nuked as part
> of the document's load group going away, I'd think.

but we can't pass through the loadGroup cause it's in the in-content process, while we are in the chrome process, right?
Flags: needinfo?(mak77)
(In reply to Toni Huttunen from comment #14)
> Also, I'm not sure if the timeout is the only issue. Even if you add timeout
> to a single favicon request it seems like they are queued(?) and only
> dozen(?) of them are active at the same time. You can just fill the queue
> and make more requests (even they do timeout)

How do you make more requests when the page has been closed / navigated away from? That's the attack vector you're describing here, right? When the page is not running, it shouldn't be possible to trigger new favicon requests... As long as the timeout relates to when they got added to the queue, instead of when they start actually requesting data from the network, a timeout would work.
Yeah. Timeout will work if it relates to time when it was added to the queue and not when the actual transmission begins.
Flags: needinfo?(toni.huttunen)
(In reply to Marco Bonardo [::mak] from comment #15)
> It's also feasible from the favicons service api I think. But yes, it
> requires some bookkeeping to keep around the cancelable object. May not be
> so bad though, if the API just returns an object with a cancel() method and
> we store it in a property of the parent browser.

Right, I'm just worried we'll mess up the bookkeeping some way. It sounds error-prone.

Thinking about this more... what we really want is for the favicon request to be associated with the (inner) window of the content going away. We can't do that directly because the load group lives in the content process and the favicon requests live in the parent.

However... we can keep track of inner content windows going away using the dom-window-destroyed observer notification. If we associate the setIcon and the favicon service requests with an inner window ID, and we send an event/message/notification/whatever to the parent process when that inner window ID goes away, we can kill off all the requests associated with the window in the favicon service (which I assume is running in the parent...).

Does that sound better, or am I missing something else? Should probably run this by smaug/bz or someone like that as well to see how evil it is to depend on dom-window-destroyed for this... but from a quick test it seems like that could work?
Flags: needinfo?(mak77)
(In reply to :Gijs Kruitbosch from comment #18)
> However... we can keep track of inner content windows going away using the
> dom-window-destroyed observer notification. If we associate the setIcon and
> the favicon service requests with an inner window ID, and we send an
> event/message/notification/whatever to the parent process when that inner
> window ID goes away, we can kill off all the requests associated with the
> window in the favicon service (which I assume is running in the parent...).
> 
> Does that sound better, or am I missing something else? Should probably run
> this by smaug/bz or someone like that as well to see how evil it is to
> depend on dom-window-destroyed for this... but from a quick test it seems
> like that could work?

I think it's the safest approach. And we could still have a more general timeout in the favicons service, as a second shield, since regardless we don't want a network fetch to live forever and add-ons may not do the right thing.
I filed bug 1265420 to add an optional cancelable to setAndFetchFaviconForPage API, that can be public since it's a simple API change that doesn't expose any danger. I'll start looking into that to ensure it's feasible.
Flags: needinfo?(mak77)
bug 1265420 landed on inbound, so this should now be actionable.
(In reply to Marco Bonardo [::mak] from comment #20)
> bug 1265420 landed on inbound, so this should now be actionable.

I'll look at this next week.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Attached patch Patch v1 (obsolete) — Splinter Review
Bigger than I thought it was going to be, but here we go. Specific sub-parts:

1) I created a JSM to do the management of this stuff;
2) I removed direct references to the favicon service from tabbrowser.xml to make that file even slightly smaller...
3) I evicted a bunch of dead refs to the favicon service in feedwriter. We use a <select> dropdown now so we never need icons there anymore, so let's just dump the code.
4) I wrote an automated test;
5) then I realized that the work from bug 1265420 missed out the case of the continuous redirect, so I had to add support in there that kills off the redirect if the request has been cancelled by the time the request got sent.
Attachment #8751377 - Flags: review?(mak77)
How do we feel about the sec-high and backports here (this is like a pre-emptive discussion related to eventual sec-approval)? I think this patch is much too big to backport to ESR. I don't know if we can do a much simpler thing, as it's going to depend on bug 1265420 either way, which was also not small. But I also have to say that I don't understand why this is rated sec-high...
Flags: needinfo?(dveditz)
Comment on attachment 8751377 [details] [diff] [review]
Patch v1

Review of attachment 8751377 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/tabbrowser.xml
@@ +857,5 @@
>            <![CDATA[
>              let browser = this.getBrowserForTab(aTab);
>              browser.mIconURL = aURI instanceof Ci.nsIURI ? aURI.spec : aURI;
>  
> +            if (aURI && window.FaviconManager) {

FWIW, not entirely sure we need this check as to whether the service / manager exists. Why do we check for mFaviconService here?

::: toolkit/components/places/FaviconManager.jsm
@@ +20,5 @@
> +
> +let gWindowToRequestInfoMap = new Map();
> +
> +// Should be const, but isn't because of tests.
> +let FAVICON_REQUEST_TIMEOUT = 60 * 1000;

Self-nit: this can be const after all based on the test I ended up writing.

::: toolkit/components/places/tests/browser/infini-favicon.sjs
@@ +1,1 @@
> +const DELAY_MS = 5000;

Self-nit: this is now dead.

::: toolkit/content/browser-content.js
@@ +1186,5 @@
>    sendAsyncMessage("MozApplicationManifest", info);
>  }, false);
>  
> +if ("mozIAsyncFavicons" in Ci) {
> +  Cu.import("resource://gre/modules/FaviconManager.jsm");

This should be what's going on in browser-child.js and that should be enough.
Attached patch Patch v1.1 (obsolete) — Splinter Review
Attachment #8751377 - Attachment is obsolete: true
Attachment #8751377 - Flags: review?(mak77)
Attachment #8751378 - Flags: review?(mak77)
Comment on attachment 8751378 [details] [diff] [review]
Patch v1.1

Review of attachment 8751378 [details] [diff] [review]:
-----------------------------------------------------------------

The approach is OK, but I have a slightly different vision of how I'd spread this code, that I think in the end would bring to a smaller footprint.
First, I'd move this to browser, it doesn't look in the right place under toolkit/places.
Then, I think some of the code is too abstract for what it's doing, sure in the future we may need to split functions to methods, switch on messages and so on, but for now all of that is pointless, and we should try to not code too much for a future that may never come.

::: browser/base/content/tabbrowser.xml
@@ +940,5 @@
>        <method name="isFailedIcon">
>          <parameter name="aURI"/>
>          <body>
>            <![CDATA[
> +            if (window.FaviconManager) {

yeah I don't know why we do this check. Maybe it's a leftover from XUL Fennec.

@@ +945,3 @@
>                if (!(aURI instanceof Ci.nsIURI))
>                  aURI = makeURI(aURI);
> +              return FaviconManager.isFailedFavicon(aURI);

This abstraction is not really needed, just use PlacesUtils.favicons.isFailedFavicon(aURI)

::: toolkit/components/places/FaviconHelpers.cpp
@@ +512,5 @@
>  )
>  {
> +  // If we've canceled and send NS_BINDING_ABORTED, onStartRequest and
> +  // friends will still be called on oldChannel, and onStartRequest will
> +  // call Cancel() on the request and return early.

doesn't sound very readable, I had to re-read it thrice before understanding what you meant. worth some rewording.
Basically in case of cancel we vetoed the redirect and handle cancel on the original channel.

::: toolkit/components/places/FaviconManager.jsm
@@ +1,4 @@
> +// -*- indent-tabs-mode: nil; js-indent-level: 2 -*-
> +// This Source Code Form is subject to the terms of the Mozilla Public
> +// License, v. 2.0. If a copy of the MPL was not distributed with this
> +// file, You can obtain one at http://mozilla.org/MPL/2.0/.

Considering what this is doing and the fact it's managing browsers, I'd rather put this either into browser/components/places/ or in browser/modules. Toolkit Places is more for the raw APIs to add, edit, remove stuff. This is more browser-related code, that could also only be useful to Firefox.
Also, considering the only API is exposes is load() the FaviconManager names sounds exaggerated. Maybe FaviconLoader or FaviconLoadManager.

I could also suggest another alternative, just expose a loadFavicon from PlacesUIUtils and hide the implementation in an internal FaviconLoadManager object in the PlacesUIUtils global object.

@@ +13,5 @@
> +Cu.import("resource://gre/modules/Services.jsm");
> +Cu.import("resource://gre/modules/Timer.jsm");
> +
> +XPCOMUtils.defineLazyServiceGetter(this, "faviconSvc",
> +  "@mozilla.org/browser/favicon-service;1", "nsIFaviconService");

I'd prefer if you could use PlacesUtils.favicons instead.

@@ +17,5 @@
> +  "@mozilla.org/browser/favicon-service;1", "nsIFaviconService");
> +XPCOMUtils.defineLazyModuleGetter(this, "PrivateBrowsingUtils",
> +  "resource://gre/modules/PrivateBrowsingUtils.jsm");
> +
> +let gWindowToRequestInfoMap = new Map();

is "Info" really needed here? the name looks long enough even without it ;)

@@ +31,5 @@
> +        // Only care about the toplevel window being unloaded, and only when it's not
> +        // about:blank being unloaded in favour of something else.
> +        if (ev.target.defaultView == ev.target.defaultView.top &&
> +            ev.target.documentURI != "about:blank") {
> +          this.onUnload(ev.target.defaultView);

For readability please fully name event argument and let doc = event.target; let win = doc.defaultView;

@@ +42,5 @@
> +   * Handle inner-window-destroyed notifications that happened in the child process.
> +   */
> +  receiveMessage(msg) {
> +    switch (msg.name) {
> +      case "FaviconManager::inner-window-destroyed":

This is over engineered, please just inline a function in addMessageListener and invoke onInnerDestroyed from there.

@@ +65,5 @@
> +
> +  /**
> +   * Actually cancel the request, and clear the timeout for cancelling it.
> +   */
> +  _cancelRequest({request, uri, innerWindowID, timerID}, reason) {

Please better separate the external API and the implementation details, it doesn't make sense to have fake internal methods in the main exposed object, that one should only expose a clean API (that so far looks like being load and isFailedFavicon). implementation details should not be exposed outside of the module.

@@ +83,5 @@
> +      for (let i = 0; i < windowRequestInfos.length; i++) {
> +        let requestInfo = windowRequestInfos[i];
> +        if (requestInfo.innerWindowID == innerID) {
> +          windowRequestInfos.splice(i, 1);
> +          i--;

it's a bit error prone to modify the array you are walking...
Maybe you could use .filter to build a new array, and set it back into windowRequestInfos?

@@ +108,5 @@
> +  },
> +
> +  _registerWindow(win) {
> +    gWindowToRequestInfoMap.set(win, []);
> +    win.addEventListener("unload", this, true);

off-hand, the whole handleEvent code seems over-engineered for just this, it could be directly inlined as a function here. Also _registerWindow has just one caller, so it could be inlined in load, as well.

@@ +156,5 @@
> +    let win = browser.ownerDocument.defaultView;
> +    let loadType = PrivateBrowsingUtils.isWindowPrivate(win)
> +      ? faviconSvc.FAVICON_LOAD_PRIVATE
> +      : faviconSvc.FAVICON_LOAD_NON_PRIVATE;
> +    let callback = this._getCallback(win, innerWindowID);

s/get/make/ it should be clear we are generating a callback, not just getting it from somewhere else.
Maybe _makeUnregisterCallback?

@@ +203,5 @@
> +
> +Services.obs.addObserver(FaviconManager, "inner-window-destroyed", false);
> +if (!gInContentProcess) {
> +  Services.ppmm.addMessageListener("FaviconManager::inner-window-destroyed", FaviconManager);
> +}

in the main process looks like we need to do these only after the first load() call, so these could be moved there, behind an _initialized check.
the content process part would be better directly inlined in browser-child, imo.

::: toolkit/components/places/tests/browser/browser_favicon_timeouts.js
@@ +1,1 @@
> +"use strict";

I'd also move the test files to browser

@@ +38,5 @@
> +      onDataAvailable() {},
> +      QueryInterface: XPCOMUtils.generateQI([Ci.nsIHttpActivityObserver, Ci.nsIStreamListener, Ci.nsIRequestObserver]),
> +    };
> +  });
> +  let distributor = Cc["@mozilla.org/network/http-activity-distributor;1"].getService(Ci.nsIHttpActivityDistributor);

Some of the test code could be re indented to be kind with our legacy 80 chars width limit

::: toolkit/content/browser-child.js
@@ +12,5 @@
>  Cu.import("resource://gre/modules/RemoteAddonsChild.jsm");
>  Cu.import("resource://gre/modules/Timer.jsm");
> +if ("mozIAsyncFavicons" in Ci) {
> +  Cu.import("resource://gre/modules/FaviconManager.jsm");
> +}

I'd rather directly put the code here (at the bottom of the file) to forward the message, we don't need the whole module, just a few rows.
A small comment explaining its need would also be nice
Attachment #8751378 - Flags: review?(mak77)
Flags: sec-bounty? → sec-bounty+
(In reply to Marco Bonardo [::mak] from comment #26)
> @@ +17,5 @@
> > +  "@mozilla.org/browser/favicon-service;1", "nsIFaviconService");
> > +XPCOMUtils.defineLazyModuleGetter(this, "PrivateBrowsingUtils",
> > +  "resource://gre/modules/PrivateBrowsingUtils.jsm");
> > +
> > +let gWindowToRequestInfoMap = new Map();
> 
> is "Info" really needed here? the name looks long enough even without it ;)

Request objects are a thing, and these aren't requests. Maybe "gLoadAttemptMap" instead, or something?

> @@ +42,5 @@
> > +   * Handle inner-window-destroyed notifications that happened in the child process.
> > +   */
> > +  receiveMessage(msg) {
> > +    switch (msg.name) {
> > +      case "FaviconManager::inner-window-destroyed":
> 
> This is over engineered, please just inline a function in addMessageListener
> and invoke onInnerDestroyed from there.

This is fine... in the past, I've done this and then other reviewers tell me that we should always use objects over functions. It would be nice if we settled on code style for something like that. :-\

> @@ +65,5 @@
> > +
> > +  /**
> > +   * Actually cancel the request, and clear the timeout for cancelling it.
> > +   */
> > +  _cancelRequest({request, uri, innerWindowID, timerID}, reason) {
> 
> Please better separate the external API and the implementation details, it
> doesn't make sense to have fake internal methods in the main exposed object,
> that one should only expose a clean API (that so far looks like being load
> and isFailedFavicon). implementation details should not be exposed outside
> of the module.

I can do this, but you're aware that due to how Cu.import works, implementation details are always available anyway, even if you "hide" them in a separate object?

> @@ +203,5 @@
> > +
> > +Services.obs.addObserver(FaviconManager, "inner-window-destroyed", false);
> > +if (!gInContentProcess) {
> > +  Services.ppmm.addMessageListener("FaviconManager::inner-window-destroyed", FaviconManager);
> > +}
> 
> in the main process looks like we need to do these only after the first
> load() call, so these could be moved there, behind an _initialized check.

We load the module lazily in the main process, so this won't actually make a difference - we load the module in order to call load, only at the point where we do actually call load().

The load call is in the parent process, and the child process doesn't know when this happens, and so it just needs to know about inner-window-destroyed all the time, AIUI. So I don't think it's worth separating this out.

> the content process part would be better directly inlined in browser-child,
> imo.

The content process part is adding an observer. If we did this in browser-child, we'd end up with 1 observer for every tab - that's an inefficient pattern I've been trying to get rid of (bug 1195386). I can look at putting it in the content process script we have in browser/ (we don't have one in toolkit, I think).
Attached patch Patch v1.2Splinter Review
Attachment #8751378 - Attachment is obsolete: true
Attachment #8753406 - Flags: review?(mak77)
(In reply to :Gijs Kruitbosch from comment #27)
> This is fine... in the past, I've done this and then other reviewers tell me
> that we should always use objects over functions. It would be nice if we
> settled on code style for something like that. :-\

I honestly don't see a reason for such requests. If the code starts growing, then it makes sense to move to an object, but over-complicating current code for the sake of "possible" future code grow is a no-go, imo. It depends on the gain, as usual.

> I can do this, but you're aware that due to how Cu.import works,
> implementation details are always available anyway, even if you "hide" them
> in a separate object?

It is mostly a matter of presenting a cleaner API to non-malicious consumers and making things less bug prone (make clearer what should be used and what not and that if you need access to an internal detail, maybe you are doing it wrong)
Comment on attachment 8753406 [details] [diff] [review]
Patch v1.2

Review of attachment 8753406 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/places/PlacesUIUtils.jsm
@@ +148,5 @@
> +   * away) but that will be a no-op in such cases.
> +   */
> +  _makeCompletionCallback(win, id) {
> +    return {
> +      onComplete(uri) {

you don't need to return an object, since nsIFaviconDataCallback has the function decorator. Just return a function.
Attachment #8753406 - Flags: review?(mak77) → review+
(In reply to Marco Bonardo [::mak] from comment #30)
> Comment on attachment 8753406 [details] [diff] [review]
> Patch v1.2
> 
> Review of attachment 8753406 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: browser/components/places/PlacesUIUtils.jsm
> @@ +148,5 @@
> > +   * away) but that will be a no-op in such cases.
> > +   */
> > +  _makeCompletionCallback(win, id) {
> > +    return {
> > +      onComplete(uri) {
> 
> you don't need to return an object, since nsIFaviconDataCallback has the
> function decorator. Just return a function.

I do. I'm assigning something onto the callback object, and I'm using it with |this| in the function itself. The "something" is the request object which is the result of a call that takes this callback object as an argument, so it doesn't exist yet when the callback is created, so I can't pass it then... so this seemed the easiest thing to do. Is that OK or do you have another suggestion?
Flags: needinfo?(mak77)
(In reply to :Gijs Kruitbosch from comment #31)
> I do. I'm assigning something onto the callback object, and I'm using it
> with |this| in the function itself. The "something" is the request object
> which is the result of a call that takes this callback object as an
> argument, so it doesn't exist yet when the callback is created, so I can't
> pass it then... so this seemed the easiest thing to do. Is that OK or do you
> have another suggestion?

oh sorry, I didn't connect the 2 things. It's fine then.
Flags: needinfo?(mak77)
Comment on attachment 8753406 [details] [diff] [review]
Patch v1.2

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
The patch is big and has some unrelated refactoring, but it shouldn't be rocket science to see what we're fixing.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Yes, there's a test, so that kind of makes it obvious. Without that, it's a little less obvious, but if we uplift this together with bug 1265420 then it should still be fairly clear.

Which older supported branches are affected by this flaw?
All.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Not yet. They will be a little bit tricky, because we'll also need to backport bug 1265420, which is also not small. On the other hand, I don't think the favicon helpers have changed toooo drastically over the last few versions.

How likely is this patch to cause regressions; how much testing does it need?
In theory it could break favicon loading in some circumstances. It should get some testing, but equally it shouldn't have a terrible impact, either. The sec bug itself has an automated test.
Attachment #8753406 - Flags: sec-approval?
(In reply to :Gijs Kruitbosch from comment #33)
> Comment on attachment 8753406 [details] [diff] [review]
> Patch v1.2
> 
> [Security approval request comment]
FWIW, see also comment #23...
We can skip backporting to ESR45 if this is too hairy.

That said, I don't want us checking in a test until we ship this publicly. So this has sec-approval+ for the main patch but without the test. If possible, we want this on Aurora (at least) and Beta.
Attachment #8753406 - Flags: sec-approval? → sec-approval+
Attached patch Patch for aurora (obsolete) — Splinter Review
NB: this patch file is an hg export result of two changesets (the fix here and the one from bug 1265420). I tested "hg import" - ing such a cset, and I believe it should work. Let me know if there are issues.

Approval Request Comment
[Feature/regressing bug #]: n/a
[User impact if declined]: sec-high issue with favicon loads lasting longer than their corresponding tabs.
[Describe test coverage new/current, TreeHerder]: limited coverage. Some of the existing code has coverage, the new things we're doing likely does not.
[Risks and why]: low enough that shipping this on aurora shouldn't be an issue. Beta... it's a bit late, but hopefully there shouldn't be major issues.
[String/UUID change made/needed]: no string changes.
Attachment #8755371 - Flags: review+
Attachment #8755371 - Flags: approval-mozilla-aurora?
Attached patch Patch for beta (obsolete) — Splinter Review
Approval Request Comment --> comment #36.
Attachment #8755372 - Flags: review+
Attachment #8755372 - Flags: approval-mozilla-beta?
(I have a tree ready to push to inbound/fx-team, but everything is closed. :-( )
Attachment #8755377 - Flags: review+
Relanded on fx-team: https://hg.mozilla.org/integration/fx-team/rev/3ff4077f9e4bf923322c95fb7226bc0e9090f898
Flags: needinfo?(gijskruitbosch+bugs)
Attached patch Patch for auroraSplinter Review
Attachment #8755371 - Attachment is obsolete: true
Attachment #8755371 - Flags: approval-mozilla-aurora?
Attachment #8755626 - Flags: review+
Attachment #8755626 - Flags: approval-mozilla-aurora?
Attachment #8755626 - Attachment description: Patch for aurora (combined with bug 1227116 → Patch for aurora
Attached patch Patch for betaSplinter Review
Attachment #8755372 - Attachment is obsolete: true
Attachment #8755372 - Flags: approval-mozilla-beta?
Attachment #8755627 - Flags: review+
Attachment #8755627 - Flags: approval-mozilla-beta?
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment on attachment 8755627 [details] [diff] [review]
Patch for beta

Sec-high, Beta47+
Attachment #8755627 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8755626 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Al Billings [:abillings] from comment #35)
> We can skip backporting to ESR45 if this is too hairy.

Gijs, did you (or anybody else) look into whether this is actually too hairy? For obvious reasons we'd like to have this fix on ESR 45 as well (and the sec-high rating would support this). If that is not happening we would probably bite the bullet, though, trying to backport that patch ourselves. But if we could avoid adding yet another patch to our pile that would be strictly preferable. :)
Flags: needinfo?(gijskruitbosch+bugs)
Attached patch Patch for ESR 45Splinter Review
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: sec-high
User impact if declined: ditto
Fix Landed on Version: 49, uplifted to 47 and 48
Risk to taking this patch (and alternatives if risky): relatively large patch, certainly for ESR. I would feel better about it if it had had slightly more baking time on Nightly and aurora/beta. I don't know how the release cadence for ESR works, exactly, and if there is schedule space to do this. There is some automated test coverage, and I did some casual manual testing of this patch against ESR, so it's unlikely to break catastrophically, but there we are.
String or UUID changes made by this patch: Changes interface. No UUID change in the patch because we stopped doing that on m-c at the time of 47. Unclear whether you'd want me to change the UUID or rewrite the patch to not include a breaking interface change (ie adding a second method rather than modifying an existing one; this would be an extensive change and would weaken the impact of the patch to the point where e.g. add-ons might compromise its security benefit, and it would break compatibility of add-ons that rely on the new return value of the method we're changing on 49, when installed on 45). Note also that while we changed an interface in this patchset, the change is only breaking for CPP consumers. JS consumers should not be affected unless they depended on the (formerly nonexistent) return value being, well, nonexistent, which seems extremely unlikely. There are no comm-central CPP consumers other than the ones from mozilla-central, which the patch fixes. Binary add-ons were deprecated well before 45. IMO, considering all of the above, rewriting the patch is not a good idea. I'm open to rev'ing the UUID, but I haven't done that in this patch - it would be a trivial change, though. Please let me know what you want.
Flags: needinfo?(gijskruitbosch+bugs)
Attachment #8756301 - Flags: review+
Attachment #8756301 - Flags: approval-mozilla-esr45?
Toni, could you also verify that the fix on Nightly works against your testcase?
Flags: needinfo?(toni.huttunen)
Group: firefox-core-security → core-security-release
Depends on: 1276636
Whiteboard: [adv-main47+]
Alias: CVE-2016-2830
We are going to build on ESR tonight or tomorrow morning; already a day or two later than we should have. We try to build in some time for QE to test it. I know it would be best to have this in ESR45 but am leaning against it.
After talking with Dan a bit more we are going to hold off on this till ESR 45.3.0. But we should take it early in that cycle and do some testing.
Flags: needinfo?(dveditz)
(In reply to :Gijs Kruitbosch (gone until June 6) from comment #50)
> Toni, could you also verify that the fix on Nightly works against your
> testcase?

I retested this bug today and it appears to be still vulnerable. At least, I don't see any difference in behaviour. Favicon requests are still spawning even 10 min after page close. I'm using Firefox Nightly 49.0a1 (2016-06-05) ("Nightly is up to date"). Am I testing right version?
Flags: needinfo?(toni.huttunen) → needinfo?(gijskruitbosch+bugs)
(In reply to Toni Huttunen from comment #54)
> (In reply to :Gijs Kruitbosch (gone until June 6) from comment #50)
> > Toni, could you also verify that the fix on Nightly works against your
> > testcase?
> 
> I retested this bug today and it appears to be still vulnerable. At least, I
> don't see any difference in behaviour. Favicon requests are still spawning
> even 10 min after page close.

Can you elaborate on how you tested this and how I can reproduce this despite the errors I'm seeing with the python script? (comment #12) ?

> I'm using Firefox Nightly 49.0a1 (2016-06-05)
> ("Nightly is up to date"). Am I testing right version?

Yes.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(toni.huttunen)
$ unzip delayCSRF.zip
$ nano favicon.html
    modify:
        var wait = i*5;  // var wait = i*60;
        favicon("http://YOUR_SERVER_ADDRESS:1337/wait="...
$ python redir.py

After that, open favicon.html file with your browser. You will see a lot of broken pipe errors but they are not fatal. If server closes/crashes for some reason, just relaunch it. Wait until browser has redirected from favicon.html to google.com (10s).


Follow server messages and you should see messages like:
  GET /wait=270&url=https://www.google.com/search?q=CSRF-hack0.22463369489286722?i=42&t=1465409287404
Flags: needinfo?(toni.huttunen) → needinfo?(gijskruitbosch+bugs)
(In reply to Toni Huttunen from comment #56)
> $ unzip delayCSRF.zip
> $ nano favicon.html
>     modify:
>         var wait = i*5;  // var wait = i*60;
>         favicon("http://YOUR_SERVER_ADDRESS:1337/wait="...
> $ python redir.py
> 
> After that, open favicon.html file with your browser. You will see a lot of
> broken pipe errors but they are not fatal. If server closes/crashes for some
> reason, just relaunch it. Wait until browser has redirected from
> favicon.html to google.com (10s).
> 
> 
> Follow server messages and you should see messages like:
>   GET
> /wait=270&url=https://www.google.com/search?q=CSRF-hack0.
> 22463369489286722?i=42&t=1465409287404

Thank you for bearing with me. This turned out to be quite... annoying... because the server does end up quitting or hanging a lot. I found two issues with the patch:

1) the unload handler gets triggered by content windows unloading because it's checking |win| rather than |eventWin|. So the toplevel browser window in our map (browser windows -> requests) gets removed, and we stop being able to cancel requests in there. I *think* this is specific to non-e10s, but I'm not sure. Even so, see (2)!

2) even with that addressed, and when I can see that we're successfully cancelling all 1000-odd requests that are triggered this way, I see some odd behaviour in the server logs. It's hard to be sure about what's happening because the python server becomes very sad about all the requests we cancel early, and quits (by itself). When I restart it, several requests still come in. Then the server hangs / produces no more logging. Ctrl-C doesn't quit the server successfully, I have to go pkill it. When I do that and re-start the server, *more requests* come in. So what I *think* is happening is that TCP connections and request data is queued, and not "unqueued" when we call cancel on them, and then we end up cancelling them immediately as they're being sent. That's a bit stupid, of course...

It *seems* I can wallpaper over (2) by cancelling the previous requests for a content window as soon as we queue a new one, because the requests get queued on the next turn of the event loop, and the loop in the browser code is tight, and so the next request is made before the next turn of the event loop. This is probably a good idea anyway, but I'm concerned about the behaviour there - it would still just be a wallpaper, and adding a setTimeout to the testcase (favicon.html) in how quickly we add new favicon requests would probably break that.

Digging deeper, I *think* it's because the code here:

http://searchfox.org/mozilla-central/source/toolkit/components/places/FaviconHelpers.cpp#418-444

creates the channel, but doesn't store it anywhere. When we call Cancel(), the request hasn't started yet, so onStartRequest hasn't fired yet ( http://searchfox.org/mozilla-central/source/toolkit/components/places/FaviconHelpers.cpp#461-465 ) and that stores the request, so we don't end up canceling it until onStartRequest fires. All the queueing (which has come up in earlier comments) is internal to the HTTP/network stack, so we can't control it directly - I'm fairly sure it's just to do with the number of simultaneous requests we allow to happen to a single domain. In any case, I'm going to see if we can just store the channel we create as soon as asyncopen succeeds and if that also makes this behaviour go away (in addition to the wallpaper).

Marco, any comments on the above? Separately: new bug or do you want to keep this in here?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mak77)
(In reply to :Gijs Kruitbosch from comment #57)
> I'm going to see if we can just store the channel we create as soon as
> asyncopen succeeds and if that also makes this behaviour go away (in
> addition to the wallpaper).

This does seem to work. I think we should just use a new bug for the followup fixes; repeating the same bug number would be shooting ourselves in the foot in terms of exposing what we're fixing and that it's not fixed in 47, in case we don't do a point release (not my call). I'll clone in a second.
Flags: needinfo?(mak77)
Comment on attachment 8756301 [details] [diff] [review]
Patch for ESR 45

No point putting this on ESR if it's not fixed properly.
Attachment #8756301 - Flags: approval-mozilla-esr45?
Blocks: 1279208
Hi Gijs, this caught my attention as a potential fix to uplift in esr45.3.0. Is a proper fix ready for uplift now? We still have ~2 weeks that can be used to uplift sec-high/sec-crits on the ESR branch. Please let me know. Thanks!
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Ritu Kothari (:ritu) from comment #60)
> Hi Gijs, this caught my attention as a potential fix to uplift in esr45.3.0.
> Is a proper fix ready for uplift now? We still have ~2 weeks that can be
> used to uplift sec-high/sec-crits on the ESR branch. Please let me know.
> Thanks!

There's a proper fix on bug 1279208 that I've rebased to esr45 already, but it really really needs QA sign-off / a check that it works before I'm going to request approval. If you can ping folks to make that happen that would be very helpful.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(rkothari)
bugmail reading in oldest-to-newest order isn't always a good idea. Just saw QA verified in bug 1279208.
Flags: needinfo?(rkothari)
Blocks: 1283067
Verified in bug 1279208 for esr as well.
Status: RESOLVED → VERIFIED
Toni, can you verify this as well?
Flags: needinfo?(toni.huttunen)
Whiteboard: [adv-main47+] → [adv-main48+][adv-esr45.3+]
(In reply to Al Billings [:abillings] from comment #65)
> Toni, can you verify this as well?

Toni verified in bug 1283067 that this is fixed in nightly, and qa verified beta, aurora and esr (after rebasing hiccups on esr and beta).
Thanks!
Flags: needinfo?(toni.huttunen)
Depends on: 1279650
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.