Closed
Bug 12274
(prefetch)
Opened 25 years ago
Closed 22 years ago
RFE: Browser should prefetch LINK tag documents
Categories
(Core :: Networking: Cache, enhancement, P5)
Core
Networking: Cache
Tracking
()
VERIFIED
FIXED
mozilla1.2alpha
People
(Reporter: sirilyan, Assigned: darin.moz)
References
()
Details
(Keywords: embed, topembed, Whiteboard: [parity-webtv])
Attachments
(3 files, 4 obsolete files)
40.24 KB,
patch
|
darin.moz
:
review+
rpotts
:
superreview+
|
Details | Diff | Splinter Review |
74.56 KB,
patch
|
Details | Diff | Splinter Review | |
70.78 KB,
patch
|
gagan
:
review+
rpotts
:
superreview+
jesup
:
approval+
|
Details | Diff | Splinter Review |
From a discussion in netscape.public.mozilla.wishlist, under the subject "browse in a non perishable cache and silent browse ahead": Matt Fletcher wrote: > > > Richard Shaw wrote: > [snip non-perishable cache idea] > > Also how about as an alternative to silent download, silent browse > > ahead. This would when the user is doing nothing would, use up the > > bandwidth by browsing ahead through the links on a page putting them > > in a temporary cache only adding them to a normal cache if they are > > actually visited. This temporary cache should be emptied on exit. > > A lot of ISPs and websites would blacklist Mozilla if it did this. For > instance, imdb.com had a section on how web 'accelerators' really slow > down the web (i.e. they waste bandwidth on pages at which one never > looks). > > A cvs-like system that checks frequently viewed pages for changes and > updates those that did might work well with an expanded cache (whether > your idea or another form). I believe this to be a more net friendly > and practical option. > > Fletch (The article Matt mentions can be found at http://www.imdb.com/irony.) This has actually come up a few times in npm.wishlist and I couldn't find a Bugzilla entry on it just now, so here it is. My admittedly imperfect memory gives this list of the things mentioned for "look ahead" cacheing: 1. Any look ahead solution should respect the robots.txt exclusion standard, at a bare minimum. 2. Ideally any look ahead will be history based, not link based. Preloading pages you have already visited and will probably visit again is good; preloading pages that you may or may not visit because they are linked from whatever's in the browser window is bad. 3. An implementation of the LINK tag relationship values "prev" and "next" might also cache the appropriate resources, regardless of whether they were seen or not, in the hopes that someone who is going through a series of documents will probably read them in series. There are some good reasons to close this bug early, though: 1. There are already many third-party products that provide these capabilities by acting as proxy servers, so this may be just another creeping feature for Mozilla. 2. Implementing this *badly* for Mozilla would be worse than doing nothing at all. Somewhat related is bug #11644 asking for more control over what types of resources get cached.
Reporter | ||
Updated•25 years ago
|
Assignee: gagan → nobody
Summary: [RFE] Browser should look ahead and cache [frequently visited] sites automatically → [HELP WANTED] Browser should look ahead and cache [frequently visited] sites automatically
Reporter | ||
Comment 1•25 years ago
|
||
Assigned to nobody@mozilla.org to flag as unclaimed feature request.
Comment 2•25 years ago
|
||
Note: WebTV precaches the document pointed to by <link rel="next"> elements, so if we do anything it should probably be that.
Bulk move of all Cache (to be deleted component) bugs to new Networking: Cache component.
Updated•25 years ago
|
Assignee: nobody → fur
Comment 4•25 years ago
|
||
->fur
Updated•25 years ago
|
Target Milestone: M20
Comment 5•25 years ago
|
||
We probably won't get to this for this release, but I'm going to leave it in the list in case someone wants to volunteer.
Comment 6•25 years ago
|
||
Assigning fur's cache bugs to Gordon. He can split them up with davidm.
Updated•25 years ago
|
Keywords: helpwanted
Updated•25 years ago
|
Summary: [HELP WANTED] Browser should look ahead and cache [frequently visited] sites automatically → Browser should look ahead and cache [frequently visited] sites automatically
Comment 7•24 years ago
|
||
spam, changing qa contact from paulmac to tever@netscape.com on networking/RDF bugs
QA Contact: paulmac → tever
marking rfe.
Summary: Browser should look ahead and cache [frequently visited] sites automatically → RFE: Browser should look ahead and cache [frequently visited] sites automatically
Moving to target milestone FUTURE. We'll take a look at it again after we ship N6.
Comment 10•23 years ago
|
||
Changing component and summary according to hixie's suggestion.
Component: Networking: Cache → Parser
Summary: RFE: Browser should look ahead and cache [frequently visited] sites automatically → RFE: Browser should prefetch LINK tag documents
Comment 11•23 years ago
|
||
Reassigning to component & QA owners.
Assignee: gordon → harishd
QA Contact: tever → bsharma
Comment 12•23 years ago
|
||
This isn't a parser bug. Giving to nobody.
Assignee: harishd → nobody
Component: Parser → Networking: Cache
Priority: P3 → P5
Comment 13•23 years ago
|
||
Who put in next, index and prev links if they dont want them to be used? Don't think this is bad in any way but wery good for the user. The browser shuld precash next, index and prev in that order. index and prev will be in the cach 90% of the time so they are no big deal to the webload. But precashing them vill be good 90% of the remaining 10% of the times. IMHO, this RFE is much more important than the link navigation GUI. Any webpage must define navigation in HTML anyway (most browsers dont implement link navigation) :-( Hovewer important, and I would *love* it for 1.0, it *is* a RFE and it is *not* critical ;-)
Comment 14•23 years ago
|
||
I also want this feature, although it's a minor problem. For example, http://www.cessna.com has this problem. On other browsers including NS4.78, prefetch works; ie hovering mouse cursor over the middle menu "Our Aircraft", "Owners.." etc you will see a different picture on the right. On mozilla, it's very slow(and actually it loads the picture each time via Internet), but NS4.78, it's very quick and it doesn't use the net at all. Apparently NS4.78 prefetches pictures.
Updated•22 years ago
|
Whiteboard: parity-webtv
Comment 15•22 years ago
|
||
I don't think it would make sense on Google: most of the time, I just look at the first page of results. (Google doesn't use link rel=next now, but they might add it if browsers get good keyboard shortcuts to access link rel=next.) On the other hand, this would make sense on most sites that use link rel=next, and rel=index certainly makes sense. Overall I think this rfe would be good to implement.
Comment 16•22 years ago
|
||
Prefetching is generally considered evil. To take an example close to heart, it woul approximately double the load on Bugzilla.
Whiteboard: parity-webtv → WONTFIX? parity-webtv
Comment 17•22 years ago
|
||
I don't think I've ever used Bugzilla's first/last/prev/next links, and I'm not sure Bugzilla is even using those links correctly.
Comment 18•22 years ago
|
||
I have used them, and they are used correctly.
Assignee | ||
Comment 20•22 years ago
|
||
we recently discussed prefetching with the caveat that the additional downloads would be low-priority, limited possibly to a single network connection, and preempted by new page loads. with our existing support for partial cache entries and byte range requests, we should be able to agressively drop partial prefetches without sacrificing all of the work done to prefetch. *** This bug has been marked as a duplicate of 159044 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Assignee | ||
Comment 21•22 years ago
|
||
oops, i closed the wrong bug :(
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee | ||
Comment 22•22 years ago
|
||
*** Bug 159044 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•22 years ago
|
Status: REOPENED → ASSIGNED
Target Milestone: Future → mozilla1.2alpha
Assignee | ||
Comment 23•22 years ago
|
||
I am pretty worried about comment #16... in our recent discussions, we really haven't addressed server load as a potential concern. sure, any prefetching done might improve browser performance, but the overall benefits really depend on the likelihood that a prefetched document will actually be viewed. perhaps this is reason enough to trigger off a different tag? that way, servers could opt in. maybe a HTTP header would even be more appropriate, so as to ensure that only server admins get to choose the appropriate load level due to prefetching. or, maybe that shouldn't be any of mozilla's concern?! hmm...
Comment 24•22 years ago
|
||
IMO, Bugzilla should hide the list links unless you click "Go through this list one bug at a time". The current list links are wrong more often than they're right, take up a lot of space (especially if you haven't disabled the site navigation bar), and break caching. Bugzilla's strangeness should not stop us from making Mozilla faster on sites where Next actually means something.
Assignee | ||
Comment 25•22 years ago
|
||
updated status whiteboard and keywords
Whiteboard: WONTFIX? parity-webtv → [parity-webtv]
Assignee | ||
Comment 26•22 years ago
|
||
this patch implements a very simple version of prefetching. during page load it collects all of the link rel=next hrefs. when the page finishes loading, the prefetch code loads the first collected URL in the background. when that completes, it loads the next URL. it does this until all URLs are loaded or until the user clicks on a link or otherwise starts loading a new page. i've added some code to kill off prefetched loads of content that either is already from the cache or could not be taken from the cache without server revalidation. this should hopefully avoid taxing servers that misuse link rel=next (e.g., bugzilla).
Assignee | ||
Comment 27•22 years ago
|
||
for those inside the netscape firewall, there's a link rel=next testcase at http://unagi.mcom.com/~darinf/test0.html
Assignee | ||
Comment 28•22 years ago
|
||
this patch just tidies things up a bit. i've moved the prefetch code into the uriloader module (i think that makes sense). this patch works for the 1.0 branch and trunk (makefile.win changes only apply to the branch, of course). i haven't done the mac project changes yet.
Attachment #96109 -
Attachment is obsolete: true
Updated•22 years ago
|
Attachment #96364 -
Flags: review+
Comment 29•22 years ago
|
||
Comment on attachment 96364 [details] [diff] [review] v0.1 patch r=dougt I spoke with darin about this patch and it is a great start. We/I would like to add a few things like per window precache queues, an attribute on nsICacheableChannel to determine if a channel can or should be precached, an d maybe some kind of UI that shows that there is precaching going on. Worse is better. Lets get this in.
Assignee | ||
Comment 30•22 years ago
|
||
one change i'd like to make is to not prefetch links containing a query string. those most likely correspond to dynamic content that will in most cases come back without cache headers forcing us to hit the server when loading the content for real. so, the value of prefetching such content is minimal. this nicely addresses the bugzilla problem BTW :-)
Assignee | ||
Comment 31•22 years ago
|
||
ok, minor change... just blocked URLs w/ query strings, and touched up some comments.
Attachment #96364 -
Attachment is obsolete: true
Assignee | ||
Comment 32•22 years ago
|
||
Comment on attachment 96936 [details] [diff] [review] v0.2 patch carrying forward r=dougt
Attachment #96936 -
Flags: review+
Assignee | ||
Comment 33•22 years ago
|
||
ok, this patch changes things a bit. instead of modifying the HTML content sink to invoke nsIPrefetchService::PrefetchURI, the prefetch service now hooks itself up as a tag observer (implementing nsIElementObserver). this way it'll be notified whenever the HTML parser encounters a <link> tag. i've also added code to make the prefetch service observe HTTP response headers. this would allow, for example, a proxy cache to dynamically introduce prefetch requests for content that is statistically very popular, for example.
Assignee | ||
Updated•22 years ago
|
Attachment #96936 -
Attachment is obsolete: true
Comment 34•22 years ago
|
||
Comment on attachment 97232 [details] [diff] [review] v1 patch r=dougt
Attachment #97232 -
Flags: review+
Assignee | ||
Comment 35•22 years ago
|
||
replaced the switch statement with an if-else to simplify code. thx dougt!
Attachment #97232 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Attachment #97242 -
Flags: review+
Comment 36•22 years ago
|
||
hey darin, this looks really good. the ownership model of the nsIURI within the nsPrefetchNode is kinda scary :-) You might want to add a comment or two explaining it ;-) also, could you leverage the loadgroup (associated with each document) to hold the prefetch requets... that way, you won't need to worry about cancelling them when a new document load is initiated... i don't think this is a big deal... just a thought. -- rick
Comment 37•22 years ago
|
||
Comment on attachment 97242 [details] [diff] [review] v1.1 patch - revised per comments from dougt over AIM sr=rpotts@netscape.com
Attachment #97242 -
Flags: superreview+
Assignee | ||
Comment 38•22 years ago
|
||
thx for the comments rick... 1- yeah, i'll add some comments on the URI ownership 2- i don't think using the loadgroup of the document would work. consider the case of two documents. one does prefetching, and in the other a user clicks on a link. now, loading the new page must contend with the prefetch traffic. whereas what i'd really like is to kill off all prefetch traffic when any other part of mozilla requests a page load.
Assignee | ||
Comment 39•22 years ago
|
||
patch landed on trunk... minus mac project changes. working on that now.
Comment 40•22 years ago
|
||
For the sake of people that have download limits on there accounts, I hope that a option has been put into place to turn this on and off as per a users requirements!!
Comment 41•22 years ago
|
||
I sure hope so too! I don't like the idea of URL prefetching. I have scarce bandwidth. Let me turn this off and I'll shut up. :-)
Comment 42•22 years ago
|
||
Chris, aaronl: Only links in the form of <link rel="next" href="..."> are prefetched. not <a href="...">. and there is a preference, disable this with: user_pref("network.prefetch-next", false);
Assignee | ||
Comment 43•22 years ago
|
||
the preference is only configurable from all.js for now. i probably should have made it dynamic (i.e., settable via prefs.js). the other thing to note is that prefetching will only occur when the browser has nothing else to do w/ the network connection, and furthermore any other browser requests to load anything will kill off any and all prefetch requests. we are also very selective in what we'll prefetch. that is, we only prefetch documents that can be reused.
Comment 44•22 years ago
|
||
what is now prefetched? Static files linked with <link rel="next"/> only, right?
Assignee | ||
Comment 45•22 years ago
|
||
any http:// URL that does not contain a ?query string will be prefetched. if the http headers indicate that the document would have to be fetched fresh each-and-everytime then we'll cancel the prefetch. in other words, yes, only static content will be prefetched.
Comment 46•22 years ago
|
||
Assuming it works as described above, I am impressed by the thought that has gone into this. Let's hope it makes pages appear to render nice and fast! Do we also support HTTP "Link" headers? They are supported for linking to CSS style sheets, so presumably it should work for this too, but they are not supported for site icons, so presumably we still don't have a single Link service yet, and this means it probably won't work for this either...
Assignee | ||
Comment 47•22 years ago
|
||
the prefetch service is a HTTP header observer, which means that it will pick up HTTP Link headers, but one downside is that HTTP only reports headers that are new. IOW, loading a page from the cache will not trigger HTTP header observers. that's one way in which Link: header differs from LINK tag. perhaps a unified Link service would be the best way to resolve these differences.
Comment 48•22 years ago
|
||
We have several consumers of link elements and headers, each done slightly differently (e.g. the stylesheet thing probes, rather than listening, and the site icon code only does <link> elements). One day someone will snap and unify all this, hopefully. :-) Great to hear than Link: headers were taken into account though! How about <meta http-equiv="link"> ?
Assignee | ||
Comment 49•22 years ago
|
||
nope... looks like the <meta HTTP-EQUIV="link" ...> would be missed :-( oh well... will work on a follow up patch! 1) make prefetching a user preference 2) add support for <meta HTTP-EQUIV="link" CONTENT="...">
Assignee | ||
Comment 50•22 years ago
|
||
checked in mac project changes.. so this should be in all builds of mozilla 1.2 alpha :-)
Assignee | ||
Comment 51•22 years ago
|
||
fixed-on-trunk see these bugs for the remaining issues: http://bugzilla.mozilla.org/show_bug.cgi?id=166647 http://bugzilla.mozilla.org/show_bug.cgi?id=166648
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
Comment 52•22 years ago
|
||
Would it be worth a special case to not stop the prefetching when the user clicks on a link to the page that is currently being prefetched? It seems that the probability of this happening is quite high on the rel=next case. I understand that the partial cache entries and byte range requests help with this somewhat, but the worst case is still tearing a tcp connection down, re-establishing it and sending a new request, right?
Assignee | ||
Comment 53•22 years ago
|
||
marko, if the next page references a large document that is being prefetched, and the user clicks a link to advance to the next page, then if we don't cancel the prefetched load, loading of the next page will appear to stop on the prefetched document. only when the prefetched document is entirely downloaded will the document appear to snap into place/view. this happens because the prefetch load and the new load are not at all tied together. instead, the second one gets blocked waiting for access to the cache. so, while it might be true that not canceling would give better page load times, the result is something that most likely won't appeal to many users. i think it's better to cancel the prefetch partway thru, so we can go ahead and display what we've already got ASAP.
Comment 54•22 years ago
|
||
Wouldn't it be cool if instead of blocking, it noticed that a fetch was already in progress for that resource, and simply hooked into it?
Assignee | ||
Comment 55•22 years ago
|
||
it would certainly be cool, but not at all easy to implement.
Comment 56•22 years ago
|
||
Concerning comment #38 (2): Why killing this, and not just suspending until any other activity ends? (example situation: webpage with search results. I open some of them in new windows/tabs to see if they contain what I want, but if not - I want to check the next page of results. Would be nice if it was there already.) Concerning comment #47: I wouldn't worry. If the document is already in cache, then there's a good chance that either the "next" document will be there as well, or it will not be needed, since if the user didn't follow it first time, they won't follow it now too. Worth consideration: Mozilla vs Law. "By clicking on this link you agree to terms of conidtions...". The server logs will say you followed this link even though you didn't. Luckily webmasters rather don't implement such points as <link... Also consider possiblity of maliciously formed pages to use this feature to exploit remote server vulnerablities. More on topic: http://www.phrack.org/show.php?p=57&a=10 Limiting number of links to follow to a fairly low number (8?) would prevent abusing this.
Comment 57•22 years ago
|
||
Bartosz wrote:
>Worth consideration: Mozilla vs Law. "By clicking on this link you agree to
>terms of conidtions...". The server logs will say you followed this link even
>though you didn't. Luckily webmasters rather don't implement such points as
><link...
As I was reading through this bug I was thinking exactly the same thing. TBH I
was considering whether 'next' might be considered a 'submit' link (eg. in a
wizard/druid of some kind), in which case preloading it would submit without the
user's express permission - as Bartosz said. But I'm sure there are worse ways
this could be exploited, much as I think its a good idea ;|
At some level this might be considered a bug in the web-app, but if a user
switches to a gecko browser and suddenly finds that their web-email inbox is
empty due to mozilla precaching the 'remove this email' link, there could be
problems...
Comment 58•22 years ago
|
||
Why would anyone ever say <link rel="next" href="./delete-mail"> ...and would they ever do it without a query string? I doubt it.
Assignee | ||
Comment 59•22 years ago
|
||
here's a site that uses <link rel=next> without any cache control headers. the pages appear to be served up using PHP without the use of query strings. as a result we prefetch each next page, but then kill off the load once we see that it doesn't have any cache headers. needless to say this is bad for a number of reasons. perhaps the best recourse is to evangelize such sites. hmm... http://www.gnome.org/start/2.0/releasenotes.html
Comment 60•22 years ago
|
||
I take it "Last-Modified" is one of those cache headers? VERIFIED FIXED. This bug makes reading the HTML4 spec so much nicer. Thanks Darin. You kick ass.
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 61•22 years ago
|
||
yes, Last-Modified is good enough, because it let's us take a guess at how long the document can remain stale ((date - lastmodified)/10), and when the document does expire, all we have to do is send a conditional If-Modified-Since request to the server (allowing the server to say 304 not modified).
Comment 62•22 years ago
|
||
Uhm... why do I see two downloads of the same document with ethereal? I'm browsing the HTML specs and I see two full GET's for each page...! I'm using build 2002091505 .
Comment 63•22 years ago
|
||
With a current trunk build on win2k, I can't reproduce this (although instead of using ethereal I was just checking breakpoints in the code and also checking the server logs for a prefetch testcase modelled on the w3 TR documents). I see a GET for the first document, then a prefetch of the second. On moving to the second document, I see a prefetch of the third document, etc. Note: if you click on the 'Next' link in those pages before the prefetch of that next document was complete then this would cancel that pending prefetch and a new (partial) GET request would be issued for the next document. But this is by design.
Assignee | ||
Comment 64•22 years ago
|
||
nick: you should also verify that ethereal isn't "lying to you" ... sometimes (especially under windows) it'll report a packet twice. check the sequence numbers to be certain you aren't seeing an ethereal bug. barring that there is always the possibility that you are loading pages that do not allow caching. after loading one of the pages, you could look at the page info for the page to see if the server specified an expiration time. once that expiration time is past, the prefetched document will have to be validated the next time it is loaded.
Assignee | ||
Comment 65•22 years ago
|
||
combines original patch + all subsequent fixes (bug 166647, bug 171102, and bug 173278).
Assignee | ||
Comment 66•22 years ago
|
||
there's quite a bit of whitespace-noise in the real branch patch. review this one instead.
Assignee | ||
Updated•22 years ago
|
Keywords: adt1.0.2,
mozilla1.0.2
Assignee | ||
Updated•22 years ago
|
Alias: prefetch
Comment 67•22 years ago
|
||
While I agree that PRTimeToSeconds should really live in NSPR, we should minimize having multiple definitions of the same function (I count 3 more within Necko alone-- FTP has 2 and HTTP has another one) Can we at least consolidate that into Necko's common util.h file (I forget the exact name-- it's been that long!) Other than this nitpick it looks great! r=gagan
Comment 68•22 years ago
|
||
Comment on attachment 102309 [details] [diff] [review] 1.0 branch patch w/ no whitespace changes r=gagan
Attachment #102309 -
Flags: review+
Assignee | ||
Comment 69•22 years ago
|
||
gagan: thanks for the review. i talked to wtc about PRTimeToSeconds (can't remember the bug no), and he decided that he didn't want to put that function in NSPR because (1) no way to represent dates before 1970 and (2) no way to represent dates after ~2130. i see his point, and i'm hoping to eventually clean things up so that we can have one instance of this function.
Comment 70•22 years ago
|
||
Comment on attachment 102309 [details] [diff] [review] 1.0 branch patch w/ no whitespace changes sr=rpotts@netscape.com
Attachment #102309 -
Flags: superreview+
Comment 71•22 years ago
|
||
Comment on attachment 102309 [details] [diff] [review] 1.0 branch patch w/ no whitespace changes a=rjesup@wgate.com for 1.0 branch with the proviso that the default be that the feature is disabled unless the pref is used to enable it, as per driver discussions and Darin's agreement.
Attachment #102309 -
Flags: approval+
Updated•22 years ago
|
Keywords: mozilla1.0.2 → mozilla1.0.2+
Assignee | ||
Comment 72•22 years ago
|
||
default disabled sounds perfectly reasonable to me.
Comment 73•22 years ago
|
||
Discussed in bBird team meeting. We will give adt approval for this as soon as test cases are attached to this bug. Please make sure Bindu is cc'ed on bug. We need the checkin to happen by 10/16 COB. I will watch this bug and immediately plus it when the test cases are attached. We are going to reserve the right to ship with this turned off if problems are discovered.
Assignee | ||
Comment 74•22 years ago
|
||
this URL contains links to some examples (sorry for the internal site): http://unagi/~darinf/prefetch/testcases.html
Comment 75•22 years ago
|
||
Plussing per email from darin indicated lyecies approval to ship pref'ed on, and attached test cases.
Comment 76•22 years ago
|
||
Plussing per email from darin indicated lyecies approval to ship pref'ed on, and attached test cases.
Assignee | ||
Comment 77•22 years ago
|
||
marking fixed1.0.2 belatedly. patch landed 10/16.
Keywords: mozilla1.0.2+ → fixed1.0.2
Comment 79•22 years ago
|
||
Verified on 2002-10-25-branch build on Win 2K. The test case works as expected.
Keywords: fixed1.0.2 → verified1.0.2
You need to log in
before you can comment on or make changes to this bug.
Description
•