Closed Bug 481503 Opened 15 years ago Closed 7 years ago

do DNS prefetch for awesomebar matches

Categories

(Firefox :: Bookmarks & History, defect, P3)

x86
All
defect


RESOLVED WORKSFORME

People

(Reporter: beltzner, Unassigned)

References

Details

(Keywords: perf, privacy, privacy-review-needed, Whiteboard: latency)

Attachments

(1 file)

Thinking of ways to speed up a user's browser experience: DNS lookups are always a barrier between "user selects navigation target" and "pageload". I'm actually noticing that DNS and the initial connection take a large amount of time.

I was wondering if we could do something clever, like prefetching DNS for the result set in the awesomebar, as a way of making the:

 - type in awesomebar
 - select entry
 - hit enter
 - page loads

sequence of events much quicker.

I imagine that we'll need to do it only once we've gotten some stability on the result set, but we can probably pick a close enough heuristic for that and start there.
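One possible "close enough" heuristic for result-set stability, sketched in Python purely for illustration: prefetch only after the top few results have come back unchanged for a couple of consecutive updates. The class name, `required_repeats`, and `top_n` are hypothetical, and their default values are arbitrary placeholders, not measured numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StabilityGate:
    """Hypothetical heuristic: fire a prefetch once the top `top_n` results
    have been identical for `required_repeats` consecutive updates."""
    required_repeats: int = 2
    top_n: int = 3
    _last: Tuple[str, ...] = ()
    _repeats: int = 0

    def update(self, results: List[str]) -> List[str]:
        """Feed the latest result hosts; return hosts to prefetch, or []."""
        top = tuple(results[: self.top_n])
        if top == self._last:
            self._repeats += 1
        else:
            # Result set changed: start counting again from this update.
            self._last = top
            self._repeats = 1
        # Fire exactly once, when the threshold is first reached.
        if self._repeats == self.required_repeats:
            return list(top)
        return []
```

The gate fires only once per stable result set, so repeated keystrokes that keep the same top results do not trigger repeated prefetches.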
Whiteboard: latency
Adding "perf" as I think this would be a perception-of-performance improvement.
Whiteboard: latency → latency, perf
Keywords: perf
Whiteboard: latency, perf → latency
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
There was a talk at Black Hat about information disclosure when using DNS prefetch. I'll try to dig up the slides, but I believe the way to mitigate it is to not do DNS prefetch on SSL.
Blocks: dns_perf
Whiteboard: latency → latency, privacy
Any other thoughts on privacy here, other than the following?

 - SSL
 - private browsing
 - pref driven
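A minimal sketch of how the three mitigations listed above might combine into a single gate. All field names are hypothetical stand-ins for real browser state, and "SSL" is read here as "don't prefetch while the current page is served over https", which is one interpretation of the Black Hat mitigation mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class BrowsingContext:
    # Hypothetical stand-ins for real browser state.
    pref_enabled: bool         # user-facing pref (no pref name is given in the bug)
    private_browsing: bool     # is this a private browsing window?
    current_page_is_ssl: bool  # is the current page loaded over https?


def may_prefetch(ctx: BrowsingContext) -> bool:
    """Allow awesomebar DNS prefetch only if every mitigation permits it."""
    return (
        ctx.pref_enabled
        and not ctx.private_browsing
        and not ctx.current_page_is_ssl
    )
```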
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
I think the main concern is with moving a mobile device (phone, laptop) between zones with different privacy/content concerns. For example, there might be some websites that you would visit at home that you wouldn't visit at work. If you bring your home laptop to work then you would end up making DNS queries against your work DNS server for domains that you would never visit at work.

I think the existence of a pref isn't enough because the person that does not use private browsing and that uses their work laptop for porn or their pornputer for work is probably not going to use (or even be aware of) such a pref.
Keywords: privacy
Whiteboard: latency, privacy → latency
Do prefetches for the awesomebar result set. This patch doesn't filter based on the stage of matching. (It does restrict itself to http:// URLs.)

We don't filter based on stage (as per comment 0) because:
* starting earlier increases the chance of being complete when the link is selected
* there is a lot of redundancy between stages and the dns cache should take care of that for us
* this just isn't a lot of data - even an aggressive set of results that manages to warp between a totally different set of hosts on every key press is just a few KB of transfer.
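The filtering described in this patch comment can be sketched as follows, assuming the result set arrives as a list of URL strings. The `already_resolved` set is a hypothetical stand-in for the DNS cache that absorbs the redundancy between stages; this is an illustration, not the patch's actual code.

```python
from typing import Iterable, List, Set
from urllib.parse import urlsplit


def hosts_to_prefetch(result_urls: Iterable[str],
                      already_resolved: Set[str]) -> List[str]:
    """Hosts of http:// results that are not already cached, in result
    order and without duplicates."""
    seen = set(already_resolved)
    hosts: List[str] = []
    for url in result_urls:
        parts = urlsplit(url)
        # Only plain http:// URLs are considered, as in the patch description.
        if parts.scheme != "http" or not parts.hostname:
            continue
        # Hosts resolved by an earlier stage (or duplicated here) are skipped.
        if parts.hostname not in seen:
            seen.add(parts.hostname)
            hosts.append(parts.hostname)
    return hosts
```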

Separately, we need to pursue telemetry approaches to confirm this.
Attachment #498869 - Flags: review?(jduell.mcbugs)
How do folks feel about the concerns in comment 6? I think there's a fairly convincing case that doing network traffic for things we throw up as suggestions to the user (but that they haven't actually selected) is a privacy issue. I'd mark this WONTFIX.
Comment on attachment 498869 [details] [diff] [review]
DNS for awesomebar results v1

If we do fix this, I don't think we'd want it at the generic toolkit level. It seems better suited to the URL bar autocomplete implementation specifically (nsPlacesAutoComplete.js).
Attachment #498869 - Flags: feedback-
I'm torn here.  This would in fact be a really nice change to make; do we have any data on how likely urls are to be in the "visit in location A but never location B" category but to not be opened in private browsing mode?
(In reply to comment #8)
> How do folks feels about the concerns in comment 6?  I think there's a fairly
> convincing case that doing network traffic for things we throw up as
> suggestions to the user (but that they haven't actually selected) is a privacy
> issue.  I'd mark this WONTFIX.

That's a very good point. This is essentially akin to exposing a subset of the user's history and bookmarks to the network, and unpredictably at that, since the awesomebar is (initially) unstable with its results.

It *would* be really nice to find some way to do this, though. I'm just not sure how. If we knew a) what network we're on and b) that we've visited this link before on this network, then privacy would be slightly less of a concern. But we don't have that data and I really doubt we want to start collecting it.

I'm not sure how valuable exposing this to the user as a pref would be, especially since we'd probably want to default it to off.
(In reply to comment #11)
> That's a very good point. This is essentially akin to exposing a subset of the
> user's history and bookmarks to the network, and unpredictably at that, since
> the awesomebar is (initially) unstable with its results.
We could do this only for things found in moz_input_history (which are things that the user has actually selected in the past) to make sure they're things the user actually selects here. We could also do it only for things with a frecency higher than X (to be determined).
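A sketch of this restriction as a query against a simplified, in-memory schema: only the columns used here are modeled, and the table is given its places.sqlite name, `moz_inputhistory`. `FRECENCY_THRESHOLD` stands in for the undetermined "X" (100 is a placeholder, not a proposed value), and matching on a prefix of the stored input is an assumption.

```python
import sqlite3
from typing import List

# Placeholder for "a frecency higher than X (to be determined)".
FRECENCY_THRESHOLD = 100


def prefetch_candidates(conn: sqlite3.Connection, typed: str,
                        threshold: int = FRECENCY_THRESHOLD) -> List[str]:
    """URLs the user previously selected after typing input that starts
    with `typed`, limited to pages above the frecency cut-off."""
    rows = conn.execute(
        """
        SELECT p.url
        FROM moz_places AS p
        WHERE p.frecency > :threshold
          AND EXISTS (
            SELECT 1 FROM moz_inputhistory AS i
            WHERE i.place_id = p.id
              AND substr(i.input, 1, length(:typed)) = :typed)
        ORDER BY p.frecency DESC
        """,
        {"typed": typed, "threshold": threshold},
    ).fetchall()
    return [url for (url,) in rows]


def make_demo_db() -> sqlite3.Connection:
    """Tiny stand-in with only the columns used above; the real
    places.sqlite schema has many more."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, frecency INTEGER);
        CREATE TABLE moz_inputhistory (place_id INTEGER, input TEXT, use_count REAL);
        INSERT INTO moz_places VALUES
          (1, 'http://a.example/', 500),
          (2, 'http://b.example/', 50),
          (3, 'http://c.example/', 900);
        INSERT INTO moz_inputhistory VALUES (1, 'a', 1), (2, 'b', 1), (3, 'ab', 1);
        """
    )
    return conn
```

In the demo data, typing "b" yields nothing: the user did select b.example before, but its frecency is below the placeholder cut-off.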

> I'm not sure how valuable exposing this to the user as a pref would be,
> especially since we'd probably want to default it to off.
How is this any different, leak-wise, from the prefetching a website can already trigger, which we have enabled by default?
We already have users complaining that the bookmarks dialog generates unwanted traffic due to microsummaries; we usually suggest disabling them.
But this is a worse privacy hit, since it involves history and a very commonly used piece of primary UI. Also, Sync brings history to all devices. A lot of users will feel it undermines their knowledge of where their data goes when they type something in the location bar (this is one of the most common criticisms of Chrome, for example).
I think we should find ways to mitigate the problem before taking the feature, or allow easy opt-out, maybe even via an allow/deny notification on first location bar use. After all, we ask the user whether they want to share geolocation and other kinds of data; I don't see why we could not ask whether they want to share history with DNS servers. It would be even better if this were per-network, so that changing networks asks me again.
We could timestamp history entries, and only use DNS prefetch for an entry if the network has not been changed since it was created (or last used).
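The "network epoch" idea can be sketched like this, using an epoch counter instead of a wall-clock timestamp. How a network is identified (`network_id`) is left open, since later comments note that reliably identifying the "same network" is unsolved; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NetworkEpochTracker:
    """Hypothetical sketch: only prefetch entries created or last used
    since the most recent network change."""
    epoch: int = 0
    current_network: Optional[str] = None
    # url -> epoch in which the entry was created or last used
    last_used_epoch: Dict[str, int] = field(default_factory=dict)

    def on_network_change(self, network_id: str) -> None:
        # Any change starts a new epoch, including a return to a
        # network that was seen earlier.
        if network_id != self.current_network:
            self.current_network = network_id
            self.epoch += 1

    def record_use(self, url: str) -> None:
        self.last_used_epoch[url] = self.epoch

    def may_prefetch(self, url: str) -> bool:
        return self.last_used_epoch.get(url) == self.epoch
```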

But bookmarks still create a problem--though I imagine it's infrequent, it's real:

"If you create new Bookmarks while using Private Browsing, they will not be removed when you stop Private Browsing."

So even if one is scrupulous about using Private browsing, if you've bookmarked not-suitable-for-work links and then happen to start typing something that matches one of them while you're at work, you'll wind up making a DNS request for it.  I'm sure someone would get bitten by this.   

This is a tough call--it's a pretty big win for most users, but a bad privacy behavior for a small number.
Barring further clever ideas, I'd be fine with a policy that we only do DNS prefetch on awesomebar entries that are from history (not bookmarks), and that pass the 'network epoch' test in my last comment.  There doesn't seem to be any problematic info leakage in that case. 

This would mean mobile users wouldn't get much benefit from this, but at least desktop users would.
OK, so how do you measure the "network epoch"?

(and what about entries that are both in the history and bookmarks?)
(In reply to comment #14)
> We could timestamp history entries, and only use DNS prefetch for an entry if
> the network has not been changed since it was created (or last used).

This means we'd have to store another large int for each entry, then retrieve and check it; that sounds painful.

> "If you create new Bookmarks while using Private Browsing, they will not be
> removed when you stop Private Browsing."

We sort of decided that bookmarks have fewer privacy implications than history, so I'd care less about this case.

> This is a tough call--it's a pretty big win for most users, but a bad privacy
> behavior for a small number.

Actually, the privacy hit is for everyone; the small number is probably those who will notice it and complain. It's like saying that we could share the user's geolocation coordinates with all websites because only a few technical users will notice we do that.

IMO the best solution would still be to ask the user when we detect a new network and store the choice for each network. Can we do this network detection?
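A minimal sketch of the ask-once-per-network approach, assuming some hypothetical `network_id` is available and that `ask_user` wraps whatever prompt the UI would show. Detecting the network is the open question and is not addressed here.

```python
from typing import Callable, Dict


class PerNetworkConsent:
    """Hypothetical: prompt once per newly seen network, remember the answer."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        self._ask_user = ask_user
        self._choices: Dict[str, bool] = {}

    def allowed(self, network_id: str) -> bool:
        # Only an unseen network triggers a prompt; later calls reuse the choice.
        if network_id not in self._choices:
            self._choices[network_id] = self._ask_user(network_id)
        return self._choices[network_id]
```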
Can we take this discussion to dev.apps.firefox guys?  It really doesn't belong in the bug.
>Can we take this discussion to dev.apps.firefox guys?  It really doesn't belong
>in the bug.

Everyone here seems to know exactly what they are talking about.  It seems like a small targeted group of experts should be able to effectively debate the aspects of this decision.  Requesting that a discussion go to the wider set of people at d.a.f kills it in two ways.  First the thread usually isn't created, and secondly if the thread is created, it is quickly inundated with the polarized opinions of random trolls.  I propose that if there is someone on the team who we need to weigh in on this decision, we just cc them here.
Our private browsing feature is not a good way to control this. Lots of people implement their private browsing without ever using Firefox's private browsing feature (e.g. using separate password-protected user accounts and user account switching). That's (one reason) why we implemented "Clear Recent History". And also, on mobile or on a non-shared computer, it is very unlikely the user would ever use private browsing or even "clear recent history."

It may be acceptable to implement this on a "recently used on the same network" basis, but I am not sure how we can reliably identify the "same network".
Comment on attachment 498869 [details] [diff] [review]
DNS for awesomebar results v1

Clearing review pending a decision on whether we want to implement this and, if so, whether it needs to be savvy about network changes (which would obviously mean changes to the patch). I'll leave that call to the more security-minded among us.
Attachment #498869 - Flags: review?(jduell.mcbugs)
Note that we'll also need the same privacy issue resolved to do HTTP prefetching from the awesomebar, which Chrome 17 is now doing:

  http://chrome.blogspot.com/2012/01/speed-and-security.html

So this would be good to figure out.
Actually, we now have a table with domains sorted by frecency in places.sqlite that we use for the autoFill feature. This may be a good data source for prefetching.
(In reply to Marco Bonardo [:mak] from comment #23)
> Actually, we now have a table with domains sorted by frecency in
> places.sqlite that we use for the autoFill feature. This may be a good data
> source for prefetching.

My experience, and some feedback I've seen, seem to indicate that the auto-filled domain quite often doesn't match the URL you're really looking for.
(In reply to Dão Gottwald [:dao] from comment #24)
> My experience and some feedback I've seen seems to indicated that the
> auto-filled domain quite often doesn't match the URL you're really looking
> for.

Well, that's the URL that you visited recently and often enough. Beyond that it's hard to do mind-reading; most users tend to visit the same group of pages, and we are a bit "special" in that regard, being more technical.
BTW, consider that autoFill only shows the first result; it's very likely the domain you are looking for is among the first 5 results, but you don't see them.
"visited recently and often enough" doesn't mean much here. You're looking for the best domain match, but the best domain match can have a low frecency if the user wasn't looking for a domain in the first place... and as it happens, a prime feature of the "awesome bar" is that you don't need to type domain names.
My point was that we could prefetch DNS for the most probable domains using this data, without actually doing it while you are searching in the awesomebar. It could be done in the background shortly after startup. You are likely going to visit those domains in the session; otherwise they wouldn't have top frecency.
Also, the frecency of a domain is calculated from the pages visited inside that domain (the algorithm can be improved over time, too), not from direct visits to its root path, so looking for a page is not much different from looking for its domain.
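A sketch of turning per-page frecency into a per-domain ranking for background prefetching. Summing page frecencies per host is an assumption made for illustration; the actual aggregation behind the autoFill domains table may differ, and `top_domains` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit


def top_domains(page_frecencies: Dict[str, int], n: int) -> List[str]:
    """Aggregate per-page frecency into per-host scores; return the n best hosts."""
    totals: Dict[str, int] = defaultdict(int)
    for url, frecency in page_frecencies.items():
        host = urlsplit(url).hostname
        if host:
            totals[host] += frecency
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:n]]
```

With this aggregation, a single very frecent page can lift its whole domain above a domain with several less frecent pages.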
(In reply to Marco Bonardo [:mak] from comment #27)
> my point was that we may prefetch dns for the most probable domains using
> this data, without actually doing this when you are searching in the
> awesomebar. Could be done in background shortly after startup. You are
> likely going to visit those in the session, otherwise they'd not have top
> frecency.

Would that be wise

> Also, frecency of the domains is calculated out of the pages visited inside
> that domain (the algorithm can be improved over time, too), not to direct
> visits to the main path of it, so looking for a page is not much different
> than looking for its domain.

It is much different as soon as you're not looking for a domain when typing in the location bar. "The Foo Page" on whatever.com can have a way higher frecency than foobar.com and all its sub pages.
(In reply to Dão Gottwald [:dao] from comment #28)
> (In reply to Marco Bonardo [:mak] from comment #27)
> > my point was that we may prefetch dns for the most probable domains using
> > this data, without actually doing this when you are searching in the
> > awesomebar. Could be done in background shortly after startup. You are
> > likely going to visit those in the session, otherwise they'd not have top
> > frecency.
> 
> Would that be wise

... from a privacy point-of-view?
(In reply to Dão Gottwald [:dao] from comment #28)
> It is much different as soon as you're not looking for a domain when typing
> in the location bar. "The Foo Page" on whatever.com can have a way higher
> frecency than foobar.com and all its sub pages.

Well, in that case whatever.com would have a higher frecency than foobar.com, and we could have prefetched it earlier. I'm not suggesting that we use the inline results to do prefetching, just that we now have new, interesting data that is more pertinent to domains.

(In reply to Dão Gottwald [:dao] from comment #29)
> > Would that be wise
> ... from a privacy point-of-view?

My understanding is that this is already an open issue (comment 22).
(In reply to Marco Bonardo [:mak] from comment #30)
> (In reply to Dão Gottwald [:dao] from comment #28)
> > It is much different as soon as you're not looking for a domain when typing
> > in the location bar. "The Foo Page" on whatever.com can have a way higher
> > frecency than foobar.com and all its sub pages.
> 
> well in that case whatever.com would have an higher frecency than foobar.com
> and we could have prefetched it earlier. I'm not suggesting that we use the
> inline results to do prefetch, just that we have new interesting that more
> pertinent to domains.

Yes, I was digressing. I wasn't talking about prefetching anymore since you clarified that your suggestion is to prefetch domains at startup rather than during autofill.

> (In reply to Dão Gottwald [:dao] from comment #29)
> > > Would that be wise
> > ... from a privacy point-of-view?
> 
> My understanding is that this is already an open issue (comment 22)

Ok. (I assumed you meant to propose a way to avoid or mitigate the issue.)
> I wasn't talking about prefetching anymore since you
> clarified that your suggestion is to prefetch domains at startup rather than
> during autofill.

Startup probably won't work because our DNS cache TTL is quite short, and the amount of activity that goes on tends to push entries out of hierarchical (i.e. OS/NAT) caches quite quickly, so you really need temporal locality.
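A toy TTL cache illustrating the temporal-locality point: an entry prefetched at startup is useless if its TTL has expired by the time the user actually navigates. The 60-second TTL used in the example and the injectable `clock` are illustrative choices, not Firefox's actual cache settings.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlDnsCache:
    """Toy DNS cache: an entry is usable only until its TTL elapses."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, host: str, address: str) -> None:
        # Record the address together with its absolute expiry time.
        self._entries[host] = (address, self._clock() + self._ttl)

    def get(self, host: str) -> Optional[str]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[host]  # expired: the prefetch work is lost
            return None
        return address
```

Prefetching just before likely use (while the user is typing) keeps the lookup inside the TTL window; prefetching at startup usually does not.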
Assignee: mcmanus → hurley
Priority: -- → P2
Assignee: hurley → nobody
Status: ASSIGNED → NEW
Depends on: 1355443, 1355451
Priority: P2 → P3
With the work in bug 1363772, we're close to the initial task here. I'm not sure whether we should go further, for privacy reasons. Regardless, a new proposal should start from where we are with speculative connections.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME