Closed Bug 1419790 Opened 7 years ago Closed 6 years ago

User's browsing history from other tabs/highlights is exposed in Google Analytics via twitter:image tag used by Highlights

Product/Component: Firefox :: Untriaged (defect)
Version: 57 Branch
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED INCOMPLETE
People: Reporter: olly.dell, Unassigned, NeedInfo
Whiteboard: [closeme-2018-03-19]
Attachments: 1 file

Attached image Firefox bug.png
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36

Steps to reproduce:

Add twitter:image metadata to a web page, using the URL of a webpage (not an image) as its content:
<meta name="twitter:image" content="valid_webpage_url">
Request the web page in Firefox.



Actual results:

Firefox requests valid_webpage_url to generate an image for 'Highlights'.
Google Analytics also collects URL details of other browsing windows. This was discovered in GA reports:
Behaviour > Site Content > All Pages > Page > Secondary dimension: Referral Path.
Attached is some data from our users, including a Referral Path URL that contains payment website parameters. Potentially, session data in URLs could be exposed and exploited. To collect the data, all you need to do is point a twitter:image URL at a webpage.
This can be seen live on our site:
https://www.starstable.com/de/article/4438

Viewing the HTML source, the url in the meta tag is incorrect (bug on our part):
<meta name="twitter:image" content="https://www.starstable.com//gfx/news/402/wildhorseheader.jpg">

The url https://www.starstable.com//gfx/news/402/wildhorseheader.jpg redirects to our frontpage, not the jpg image.

Requests to the URL as a page started on 14 Nov.
Among the Source and Referral Paths assigned to the URL, I have been able to reconstruct URLs from our users, something we should not be able to do. For example, this is a captured URL for a Polish bank that could be exploited:

https://www.centrum24.pl/przelew24/crypt.7z0MVzesbIQEfiP3nWTvxA/7z05c
I should clarify that I found these URLs via Google Analytics.
This is odd behavior, but I'm not sure it's an issue with Firefox - it sounds to me like an issue with websites including Google Analytics on sensitive pages. But it's very odd if you are seeing this leakage only for FF57. As far as I can tell, when we decide to show something on the about:newtab page with Activity Stream, we take a screenshot of it - that means making a request to the website, rendering the page, and then screenshotting it. Note that credentials (cookies) are not sent in these requests.

Now it looks like in bug 1393924 we added code to look for meta tags (instead of rendering the whole page, we would use the image instead). But I'm not sure why that would really change anything from a security perspective.

cc'ing some folks who might be able to help.
Paul, that's correct. As of 57 release we're collecting and storing in moz_places preview images for all sites that you visit based on a list of accepted meta tags, twitter:image included (note that if the website provides a better preview image, we store that one instead). I don't think this is an issue with Firefox either. I'm not sure there's a whole lot we can do here, because in theory, preview images themselves could even contain sensitive information, so it's really up to the website to make sure they're not exposing sensitive information in meta tags.
Just to be clear, from talking to Ursula: I don't think we fully understand what's going on here yet. But at least one thing that's going wrong is that the thumbnailer is loading the preview image URLs it has collected in actual docshells, and only afterwards figuring out whether each one was an image (and the way it's doing that is also probably not the best). I would suggest that we should force these loads to be inside <img> tags (still in the no-cookie container etc. etc.)
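
A minimal sketch of the "is this actually an image?" decision mentioned above, assuming it is made from the response's Content-Type header. The helper name looksLikeImageResponse is hypothetical and is not taken from the Firefox thumbnail code; loading inside an <img> element, as suggested, would likely make such a check unnecessary.

```javascript
// Hypothetical helper, not part of the Firefox thumbnailer.
// Decides from a Content-Type header value whether a response is an image.
function looksLikeImageResponse(contentType) {
  if (typeof contentType !== "string") {
    return false;
  }
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mimeType = contentType.split(";")[0].trim().toLowerCase();
  return mimeType.startsWith("image/");
}

console.log(looksLikeImageResponse("image/jpeg"));
console.log(looksLikeImageResponse("text/html; charset=utf-8"));
```

Under this assumption, a twitter:image URL that redirects to an HTML front page would be rejected before it is used as a preview.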

However, that alone doesn't really help explain the bug as reported. I don't know enough about GA as a thing to understand what's going on here without some more information.

(In reply to Oliver Dell from comment #1)
> Amongst the Source and Referral Paths assigned to the url, I have
> constructed urls from our users - something we should not be able to do. For
> example, This is a url that has been captured to a Polish bank and could be
> exploited:
> 
> https://www.centrum24.pl/przelew24/crypt.7z0MVzesbIQEfiP3nWTvxA/7z05c

This is very concerning. Oliver, do you know (is there documentation?) of how GA collects these referral URLs? As in, do they just correspond to document.referrer / the referer header? Or is it based on correlation in their own databases and the GA identifier? Or something else? And, to be clear, this is all coming from the URL in this specific twitter:image metadata thing that particular page defines?

My current hunch is that somehow, the thumbnailer's method of loading URLs is passing earlier thumbnails loaded as referrer information. However, this doesn't really compute without some more explanations, for several reasons:

- We should have seen issues earlier. We did thumbnailing before 57, just not for preview images, but that doesn't really affect how all the thumbnails (pages or preview images) load. But perhaps this was less surprising in terms of what shows up in GA...
- it normally requires hoop-jumping to do a load that passes a referrer, so I don't really follow how the referrers would be passed
- those urls seem like very strange things to have high enough in your history to warrant thumbnailing
- I would expect those banking urls to redirect to "hey, you're not logged in" when loaded in the limited, no-cookie container we load them in, and for those urls to show up - not the originals.
Flags: needinfo?(olly.dell)
(In reply to :Gijs from comment #5)
> (In reply to Oliver Dell from comment #1)
> > Amongst the Source and Referral Paths assigned to the url, I have
> > constructed urls from our users - something we should not be able to do. For
> > example, This is a url that has been captured to a Polish bank and could be
> > exploited:
> > 
> > https://www.centrum24.pl/przelew24/crypt.7z0MVzesbIQEfiP3nWTvxA/7z05c
> 
> This is very concerning. Oliver, do you know (is there documentation?) of
> how GA collects these referral URLs? As in, do they just correspond to
> document.referrer / the referer header? Or is it based on correlation in
> their own databases and the GA identifier? Or something else? And, to be
> clear, this is all coming from the URL in this specific twitter:image
> metadata thing that particular page defines?

One way to check this would be to check your own server logs for referrer data, assuming you have access to those.
Based on:

https://dxr.mozilla.org/mozilla-central/rev/4affa6e0a8c622e4c4152872ffc14b73103830ac/toolkit/components/thumbnails/content/backgroundPageThumbsContent.js#107-109

      this._webNav.loadURI(this._currentCapture.url,
                           Ci.nsIWebNavigation.LOAD_FLAGS_STOP_CONTENT,
                           null, null, null);

I don't think we should be passing referrer URIs... but I'm having trouble isolating one of these requests so I can actually check this.
Maybe I'm looking at the wrong thing? I did a few thumbnail captures before doing this:

Cu.import("resource://gre/modules/BackgroundPageThumbs.jsm");
BackgroundPageThumbs.capture("https://ed.agadak.net/refresh.html")

Where the page is just <meta http-equiv="refresh" content="1;https://ed.agadak.net/as/?refresh" />


The access logs show:
[23/Nov/2017:09:22:07 -0800] "GET /refresh.html HTTP/1.1" 200 553 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0" 
[23/Nov/2017:09:22:08 -0800] "GET /as/?refresh HTTP/1.1" 200 2735 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0" 

Neither request has referrer information?
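
A small sketch of how log entries like the two above could be checked for referrer data in bulk. It assumes each line ends with the usual quoted request, status, size, referrer, and user agent fields; parseAccessLogLine is a hypothetical helper name, and other log layouts would need a different pattern.

```javascript
// Hypothetical helper: extract the request and Referer field from an
// access log line shaped like the entries shown above.
function parseAccessLogLine(line) {
  const pattern = /"([A-Z]+) (\S+) [^"]*" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"/;
  const match = pattern.exec(line);
  if (!match) {
    return null;
  }
  const [, method, path, status, , referrer, userAgent] = match;
  return {
    method,
    path,
    status: Number(status),
    // A "-" in this field means the request carried no Referer header.
    referrer: referrer === "-" ? null : referrer,
    userAgent,
  };
}

const sample =
  '[23/Nov/2017:09:22:08 -0800] "GET /as/?refresh HTTP/1.1" 200 2735 "-" ' +
  '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0"';
console.log(parseAccessLogLine(sample));
```

Filtering the parsed entries for a non-null referrer would show whether any thumbnail request carried one.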

I suppose if javascript is running on the requested page and redirected page, there could be some way to pass information from one to the other to record the redirect path?
(In reply to Ed Lee :Mardak from comment #8)
> Maybe I'm looking at the wrong thing? I did a few thumbnail captures before
> doing this:
> 
> Cu.import("resource://gre/modules/BackgroundPageThumbs.jsm");
> BackgroundPageThumbs.capture("https://ed.agadak.net/refresh.html")
> 
> Where the page is just <meta http-equiv="refresh"
> content="1;https://ed.agadak.net/as/?refresh" />

> Neither request has referrer information?

I think my worry is whether capturing first page 1 and then page 2, using the existing queuing system, does anything wrong, because it looks like we reuse the docshell, and AFAICT don't clear cookies or anything...


> I suppose if javascript is running on the requested page and redirected
> page, there could be some way to pass information from one to the other to
> record the redirect path?

I guess this makes more sense, but it's worth checking we're sure there's no actual referer information, and, as it were, this is all google's fault.


... of course, if we ensured we cleared cookies, or loaded thumbnails with tracking protection / DNT, this might also be avoided...
We fixed the incorrect url in the twitter:image that was returning a web page. I have cleared all my Firefox history and requested a news article:
https://www.starstable.com/se/article/4435

This is now added to Firefox Highlights, but the thumbnail image shown is the old thumbnail render of the site home page - I would have assumed the thumbnail was generated by my browser, but I get the feeling this is requested from a cache on a Mozilla thumbnail server? Could that be true?
Flags: needinfo?(olly.dell)
Oliver, all thumbnails are generated within the browser and are temporarily saved on disk in your profile directory until they expire and get deleted from there. We don't put any of the thumbnails on a server.
OK! Thanks Ursula

I can see site traffic in real time via Google Analytics, and there are currently 60 active users on the site requesting 'pages' that are the twitter:image url. There is no way users would purposely use these urls. They are not indexed as landing pages on a search engine and there is no referring site. They are direct requests from a browser:

Page	Page Title	Page Views (Last 30 min)
/fr//gfx/news/402/wildhorseheader.jpg	Un jeu de chevaux en lign...aventures ! | Star Stable	72	24.91%
/pl//gfx/news/402/wildhorseheader.jpg	Pełna przygód gra o koniach online! | Star Stable	42	14.53%
/en//gfx/news/402/wildhorseheader.jpg	A horse game online full ...adventures! | Star Stable	26	9.00%
/de//gfx/news/402/wildhorseheader.jpg	Ein Online-Pferdespiel vo... Abenteuer! | Star Stable	19	6.57%
/de//gfx/news/dbsc/posrheadernov17.jpg	Ein Online-Pferdespiel vo... Abenteuer! | Star Stable	18	6.23%
/de//gfx/news/401/sealheader.jpg	Ein Online-Pferdespiel vo... Abenteuer! | Star Stable	14	4.84%
/hu//gfx/news/402/wildhorseheader.jpg	Egy kalandokkal teli lovas játék! | Star Stable	11	3.81%
/se//gfx/news/402/wildhorseheader.jpg	Ett hästspel online fullt...av äventyr! | Star Stable	9	3.11%
/en//gfx/news/dbsc/posrheadernov17.jpg	A horse game online full ...adventures! | Star Stable	7	2.42%
/pl//gfx/news/401/sealheader.jpg	Pełna przygód gra o koniach online! | Star Stable	7	2.42%
/pl//gfx/news/400/ambassadorheader.jpg	Pełna przygód gra o koniach online! | Star Stable	6	2.08%
/hu//gfx/news/dbsc/posrheadernov17.jpg	Egy kalandokkal teli lovas játék! | Star Stable	5	1.73%
/nl//gfx/news/402/wildhorseheader.jpg	Een online paardenspel bo... avonturen! | Star Stable	5	1.73%
/pl//gfx/news/400/ambassadorheader0.jpg	Pełna przygód gra o koniach online! | Star Stable	5	1.73%
/se//gfx/news/401/sealheader.jpg	Ett hästspel online fullt...av äventyr! | Star Stable	4	1.38%
/pl//gfx/news/dbsc/posrheadernov17.jpg	Pełna przygód gra o koniach online! | Star Stable	3	1.04%
/de//gfx/news/397/mistfallheader2.jpg	Ein Online-Pferdespiel vo... Abenteuer! | Star Stable	2	0.69%
/de//gfx/news/dbsc/scheadernov17.jpg	Ein Online-Pferdespiel vo... Abenteuer! | Star Stable	2	0.69%
/fr//gfx/news/400/ambassadorheader.jpg	Un jeu de chevaux en lign...aventures ! | Star Stable	2	0.69%
/fr//gfx/news/401/sealheader.jpg	Un jeu de chevaux en lign...aventures ! | Star Stable	2	0.69%
When is the thumbnail generated?
Is it whilst browsing a site, with the thumbnail processed in the background?
Is it when the tab is closed or focus is changed to a new tab?
Or is it when the application is restarted and the recent history is used to populate Highlights?

I can see that 27000 sessions have started with the //gfx/ urls as starting pages.
(In reply to Oliver Dell from comment #12)
> OK! Thanks Ursula
> 
> I can see site traffic in real time via Google Analytics, and there are
> currently 60 active users on the site requesting 'pages' that are the
> twitter:image url. There is no way users would purposely use these urls.
> They are not indexed as landing pages on a search engine and there is no
> referring site. They are direct requests from a browser:

Right, we're not disputing that the twitter:image is being requested. It's being requested by a background browser-ish thing inside Firefox that does a request in a separate container (so won't send the same cookies as when the user browses "normally") and takes a screenshot (using a separate container avoids e.g. displaying sensitive info on screenshots on the new tab page for logged in bank sites etc.). In fact, even if we fixed all the bugs here it would probably continue to be requested! If the site specifies (or specified at the time of last visit) a URL as the preview image, we'll try to use it as the preview image for that site on the new tab page, much like twitter would use that image if the page gets linked in a tweet. We'll update some code to avoid using twitter:image links that don't point to images.

The issue I'm much more concerned about is the referrer data that GA is collecting and associating with the request somehow. I'm confused that you're currently saying there isn't referrer data. There was before, right? Or what is the screenshot and/or comment #1 about?
Flags: needinfo?(olly.dell)
(In reply to Oliver Dell from comment #13)
> When is the thumbnail generated?
> Is it whilst browsing a site, and the thumbnail is processed in the
> background.
> Is it when the tab is closed or focus is changed to a new tab.
> or when the application is restarted and the recent history is used to
> populate highlights.

If the screenshot is seen in the Highlight section, then the thumbnail is only generated once when you open a new tab. It caches the thumbnail for a period of time. If the browser is closed and then opened again it will regenerate new Highlights which will then request a new thumbnail.
OK. Good news!
This may not be as serious as I first thought. I don't believe we are capturing any general traffic from a user's browser that is not related to using our site.

I've gone through all the referrers recorded in Google Analytics relating to the //gfx page requests. As usual there are sites like Google, Facebook, mail and banner ads. But there is nothing that would be considered unrelated to our site visitors.

I can confirm that the sensitive sites that are listed as referrers are the payment providers we use internationally for our users to buy our product.

Once the purchase has been made there is a link to 'return to the site', hence the referrer.

So the 'background browser-ish thing inside Firefox that does a request in a separate container' is using the twitter:image url rather than the page actually requested in the referring link, and this may be affecting the page location that gets reported.

Example of the beacon pixel data that is sent to Google when a user visits our site from Facebook:

t	pageview
dl	https://www.starstable.com/se/
dr	https://pl-pl.facebook.com/
tid	UA-20083095-1

I think that the 'background browser-ish' container is affecting the current location, and sending Google data something like:
t	pageview
dl	https://www.starstable.com/de//gfx/news/dbsc/posrheadernov17.jpg
dr	https://pl-pl.facebook.com/
tid	UA-20083095-1
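
A rough sketch of how a pageview hit like the two above could be assembled, assuming the analytics script simply copies the page's current location into dl and the document's referrer into dr. buildPageviewHit is a hypothetical name for illustration; this is not Google Analytics' actual code.

```javascript
// Hypothetical illustration of a pageview payload with the fields shown above:
// t (hit type), dl (document location), dr (document referrer), tid (tracking id).
function buildPageviewHit(trackingId, locationHref, referrer) {
  const params = new URLSearchParams();
  params.set("t", "pageview");
  params.set("dl", locationHref);
  // Only include dr when a referrer is actually present.
  if (referrer) {
    params.set("dr", referrer);
  }
  params.set("tid", trackingId);
  return params;
}

const hit = buildPageviewHit(
  "UA-20083095-1",
  "https://www.starstable.com/de//gfx/news/dbsc/posrheadernov17.jpg",
  "https://pl-pl.facebook.com/"
);
console.log(hit.toString());
```

Under this assumption, whichever URL ends up as the document location is reported in dl alongside the existing referrer, which would match the pairing seen in the reports.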
Flags: needinfo?(olly.dell)
Oliver, does this sound like we can close this as invalid then?
Flags: needinfo?(olly.dell)
Cleaning up pending needinfo? bugs. Will close as invalid if I don't hear back by the 19th.
Whiteboard: [close-me-2018-03-19]
Whiteboard: [close-me-2018-03-19] → [closeme-2018-03-19]
I filed bug 1443456 for the img vs. document confusion for twitter:image and friends.
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Group: firefox-core-security