Closed Bug 1009122 Opened 6 years ago Closed 5 years ago

Rewrite Seer Backend

Categories

(Core :: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox37 --- unaffected
firefox38 + fixed
relnote-firefox --- 38+

People

(Reporter: u408661, Assigned: u408661)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 10 obsolete files)

Just like the summary says, we need to rewrite the seer backend. Everything is going to be stored in the cache, but to avoid making things even worse, we're going to wait until the new cache backend is enabled by default on all products.
Nick, can you please outline as detailed as possible here how you are going to proceed?  Please expose as many details as you can.  Thanks.
Flags: needinfo?(hurley)
Here's the general plan for how to store things (I won't go into full detail of the workings of the seer/predictor, since those will remain unchanged).

There will be 2 extensions used as the second argument to AsyncOpenURI, one for origin-only seer cache entries, and one for full-uri seer cache entries. When the seer is asked to predict (or learn), it will open both entries that correspond to the URI that is the key. Once the entry is opened, the seer will do the work as necessary with the opened entry. Information kept for what pages to act on during startup will have a special predictor-schemed URI, to avoid conflicting with other things. Otherwise, they are exactly the same as any other predictor-created entry.

Each entry will have the following metadata items kept on it to facilitate the actual work:

[entries related directly to the page/origin]
* a timestamp of the last time this page was loaded (we can't use the cache entry's lastFetched or lastModified attributes, as we will access/modify the cache entry more often than we actually load the page)
* a number of times this page has been loaded (again, we can't use the cache entry's fetchCount attribute for the same reason as above)
* a count of the number of subresources we have stored with this entry

[optional entries related to redirections - only applicable to full-uri entries, not origin entries]
* a URI that this URI has redirected to in the past (the most recent redirection target)
* a timestamp for the last time we saw this redirection occur
* a count of how many times we've seen this redirection occur

[set of entries for what the seer calls a "subresource" - stuff linked to on a page (images, scripts, stylesheets). we will have each of these entries for each resource on a page]
* a mapping from URI to the index of everything else for this entry, so we can look up when we don't have the index explicitly available (which is the case when we're updating our database)
* the URI (keyed on index, so we can look up when we don't have the uri explicitly available, which is the case when we're performing a prediction)
* a timestamp for the last time we saw this resource loaded by this page
* a count of how many times we've seen this resource loaded by this page

The number of subresources will be limited (via a pref). I have not yet decided on a limit for this, but it will certainly be <= 100. The limit will probably be based on information from the HTTP Archive on how many resources the average page loads, plus a little bit of room for future growth. I also have not yet decided on a scheme for recycling indices (other than a general LRU kind of thing) and keeping track of which index to recycle next (thus evicting a lesser-used subresource). Most likely I will add one more metadata entry that keeps a list of used indices in order of use, and then I can just take the last index in the list as the one to evict.

You may be wondering why I chose to do this all in metadata, instead of storing some things in the actual data of the cache entry. It's pretty simple - I want to make this all available ASAP, and have as few round-trips to the main thread as possible (so using an async read or write on an input/output stream breaks that goal). Also, keeping everything in metadata keeps me from having to come up with a text format and parsing/creating it every time I need to do something. Memory usage may come up as a concern, but I'll need all of this in memory to operate on it anyway, so I'm fairly certain it's pretty much the same either way.

When the seer is asked to predict, it will open the appropriate cache entries, and when those are available, iterate through the subresources stored on the cache entries (based on index) and do it's calculations and take some action (or not) for each of those.

When it's asked to update its database, again the appropriate cache entries will be opened, and either the top-level data will be updated, or for a subresource, the index for that will be looked up, and then the appropriate metadata entries will be changed.

I have a large portion of this implemented already, though not ready for testing yet :) If there are major changes needed, that's fine, but I'm hoping there won't be.

Honza, does this give you enough of an idea?
Flags: needinfo?(hurley)
Nick, thanks for this list!  I was looking at it few times, but never started with the actual feedback.  Sorry for the delay!

(In reply to Nicholas Hurley [:hurley] from comment #2)
> Here's the general plan for how to store things (I won't go into full detail
> of the workings of the seer/predictor, since those will remain unchanged).
> 
> There will be 2 extensions used as the second argument to AsyncOpenURI, one
> for origin-only seer cache entries, and one for full-uri seer cache entries.

Why don't you use the cache entry for the 'full-uri' to store the stuff directly?  It would be perf-regressing to have like "http://example.com/index.html"|"" and then "http://example.com/index.html"|"seer:full-uri-data".  Is the reason to don't lose the metadata?  If so, I'd rather introduce a mechanism (if simple enough) to keep just metadata and get rid of the data during normal out-of-space eviction.  Also, your special entries will also go away when we hit the limit, so we may need a mechanism like this anyway.

Only drawback I can see that seer will raise the frecency value when opening the 'full-uri' entries.  But we can well introduce a flag (simple) to not affect frecency (we should have one anyway - e.g. for about:cache or any random consumer that may use it - bug 1000338).

> When the seer is asked to predict (or learn), it will open both entries that
> correspond to the URI that is the key. Once the entry is opened, the seer
> will do the work as necessary with the opened entry. Information kept for
> what pages to act on during startup will have a special predictor-schemed
> URI, to avoid conflicting with other things. 

OK, but you should say why - I don't see a conflict when using idext, how will you distinct http:/https:/whatever: ?

> Otherwise, they are exactly the
> same as any other predictor-created entry.
> 
> Each entry will have the following metadata items kept on it to facilitate
> the actual work:
> 
> [entries related directly to the page/origin]

"Page or origin" or "Page and origin"?  Is "page" standing here for 'full-uri'?

> * a timestamp of the last time this page was loaded (we can't use the cache
> entry's lastFetched or lastModified attributes, as we will access/modify the
> cache entry more often than we actually load the page)

FYI, bug 1000338.  Why not to use frecency and take advantage of the above proposed flag?

> * a number of times this page has been loaded (again, we can't use the cache
> entry's fetchCount attribute for the same reason as above)

Again, OPEN_SECRET may help, but I don't think this is implemented.  In general, please don't invent a wheel again, rather reuse what you can to save space.  Remember how much disk space and memory the last impl was eating.

> * a count of the number of subresources we have stored with this entry

So, I don't follow the 'origin' kind of entry.  That should keep this information too?

> 
> [optional entries related to redirections - only applicable to full-uri
> entries, not origin entries]
> * a URI that this URI has redirected to in the past (the most recent
> redirection target)

Could this be stored on the origin entry?  Like when we have redirs "http://example.com/a.htlm" -> "http://example.com/b.html" -> "http://example.com/c.html" the final channel can store the final "c.html" target on the "a.html" entry.

> * a timestamp for the last time we saw this redirection occur
> * a count of how many times we've seen this redirection occur

OPEN_SECRET

> 
> [set of entries for what the seer calls a "subresource" - stuff linked to on
> a page (images, scripts, stylesheets). we will have each of these entries
> for each resource on a page]

Why for Christ sake??

> * a mapping from URI to the index of everything else for this entry, so we
> can look up when we don't have the index explicitly available (which is the
> case when we're updating our database)

Can you please more explain this?  I don't much follow and it seems like pretty important to understand well.

> * the URI (keyed on index, so we can look up when we don't have the uri
> explicitly available, which is the case when we're performing a prediction)
> * a timestamp for the last time we saw this resource loaded by this page
> * a count of how many times we've seen this resource loaded by this page

I think this could well be stored on the entry, unless you need to leave the metadata.

> 
> The number of subresources will be limited (via a pref). I have not yet
> decided on a limit for this, but it will certainly be <= 100. The limit will
> probably be based on information from the HTTP Archive on how many resources
> the average page loads, plus a little bit of room for future growth. 

No prophecies please :)  I think 20 resources, and only those necessary for first-paint are OK.  I would not bother with images and any async scripts (or so).

BTW, you 6 months of work on the seer v1 - did it bring some useful data on what is good to store and what is good to ignore?  Because it appeared to me that you were storing just too much and it was actually slowing down in some cases.  We need to be smart here and also think of the disk overhead for the seer data.

> I also
> have not yet decided on a scheme for recycling indices (other than a general
> LRU kind of thing) and keeping track of which index to recycle next (thus
> evicting a lesser-used subresource). Most likely I will add one more
> metadata entry that keeps a list of used indices in order of use, and then I
> can just take the last index in the list as the one to evict.

(Maybe I'm stuck on the same thing here) Can you please more explain what indices you need here?  What is the index?  Why do you actually need it?

> 
> You may be wondering why I chose to do this all in metadata, instead of
> storing some things in the actual data of the cache entry. It's pretty
> simple - I want to make this all available ASAP, and have as few round-trips
> to the main thread as possible (so using an async read or write on an
> input/output stream breaks that goal). 

I fully understand, just please keep in mind that metadata are held in memory and we only have a limited space for it (250kb).  If you are going to keep a lot of your stuff in memory and try to have it all avail at once, you will probably throw away 'real' entries metadata away what will slow down load of warmed pages OTOH.  However, here I don't have enough data here (must fix bug 920606).

> Also, keeping everything in metadata
> keeps me from having to come up with a text format and parsing/creating it
> every time I need to do something. 

That's easy tho ;)

> Memory usage may come up as a concern,
> but I'll need all of this in memory to operate on it anyway, so I'm fairly
> certain it's pretty much the same either way.

Yeah, but with you approach you will hit limits that cause stuff to happen, just FYI.

> 
> When the seer is asked to predict, it will open the appropriate cache
> entries, 

The 'origin' and the 'full-uri'?  In this order?  Remember the OPEN_PRIORITY flag as well.

> and when those are available, iterate through the subresources
> stored on the cache entries (based on index) and do it's calculations and
> take some action (or not) for each of those.
> 
> When it's asked to update its database, 

What database?

> again the appropriate cache entries
> will be opened, and either the top-level data will be updated, or for a
> subresource, the index for that will be looked up, and then the appropriate
> metadata entries will be changed.
> 
> I have a large portion of this implemented already, though not ready for
> testing yet :) If there are major changes needed, that's fine, but I'm
> hoping there won't be.

I am afraid there will.. sorry :/

> 
> Honza, does this give you enough of an idea?
(In reply to Honza Bambas (:mayhemer) from comment #3)
> Nick, thanks for this list!  I was looking at it few times, but never
> started with the actual feedback.  Sorry for the delay!

No worries. I've had so much on my plate that I haven't totally committed to anything yet, anyway :)

> (In reply to Nicholas Hurley [:hurley] from comment #2)
> > Here's the general plan for how to store things (I won't go into full detail
> > of the workings of the seer/predictor, since those will remain unchanged).
> > 
> > There will be 2 extensions used as the second argument to AsyncOpenURI, one
> > for origin-only seer cache entries, and one for full-uri seer cache entries.
> 
> Why don't you use the cache entry for the 'full-uri' to store the stuff
> directly?  It would be perf-regressing to have like
> "http://example.com/index.html"|"" and then
> "http://example.com/index.html"|"seer:full-uri-data".  Is the reason to
> don't lose the metadata?  If so, I'd rather introduce a mechanism (if simple
> enough) to keep just metadata and get rid of the data during normal
> out-of-space eviction.  Also, your special entries will also go away when we
> hit the limit, so we may need a mechanism like this anyway.

It's to avoid losing the metadata, yes. I'm fine with entries being evicted when the size limit gets hit (no point in me trying to implement my own size-management on top of the cache's existing size management), but I don't want to lose data when, say, an entry is doomed because a revalidation failed. The other reason is that this makes it MUCH easier to clear out just the seer data from a "Clear Recent History" invocation or similar. Right now, seer is bunched in with "browsing history data" and not "cache data", though I can certainly see an argument for changing that up now. Iterating through the entries and looking for a particular ID extension seems easier than looking up a metadata entry to see if a cache entry contains seer data or not.

> Only drawback I can see that seer will raise the frecency value when opening
> the 'full-uri' entries.  But we can well introduce a flag (simple) to not
> affect frecency (we should have one anyway - e.g. for about:cache or any
> random consumer that may use it - bug 1000338).

This will be quite useful, whether we stick with separate entries or merge full-uri with the existing normal cache entry. I'll mark that work as blocking this bug, since it's a feature I would prefer to not work without (and greatly simplifies a lot of things, including not having as many metadata entries as you point out later).

> > When the seer is asked to predict (or learn), it will open both entries that
> > correspond to the URI that is the key. Once the entry is opened, the seer
> > will do the work as necessary with the opened entry. Information kept for
> > what pages to act on during startup will have a special predictor-schemed
> > URI, to avoid conflicting with other things. 
> 
> OK, but you should say why - I don't see a conflict when using idext, how
> will you distinct http:/https:/whatever: ?

How do I distinguish them for startup? (I hope it's obvious how it's distinguished for the non-startup case, since the keys are all the URIs!)

I didn't explain startup well enough. The startup entries are just "Here's a list of URIs that have been visited near browser startup", there isn't one entry per URI visited at startup (well, there is, but it's the same entry as if that URI wasn't visited during startup). For example, My startup URI entry might look something like this:

key: predictor://startup
loadCount: 30
lastLoad: <timestamp of this morning>
uris: gmail, google calendar, twitter, bugzilla

Then I use that list of URIs to open connections to those sites, since they're likely to be loaded again soon when the browser starts up.

> > Otherwise, they are exactly the
> > same as any other predictor-created entry.
> > 
> > Each entry will have the following metadata items kept on it to facilitate
> > the actual work:
> > 
> > [entries related directly to the page/origin]
> 
> "Page or origin" or "Page and origin"?  Is "page" standing here for
> 'full-uri'?

I've used "page" and "full-uri" interchangeably, sorry. The page/origin bit in brackets just means that both full-uri entries and origin-only entries have these entries. So, for a full-uri entry, these entries are related to the page itself. For an origin entry, these metadata entries are related to the origin overall.

> > * a timestamp of the last time this page was loaded (we can't use the cache
> > entry's lastFetched or lastModified attributes, as we will access/modify the
> > cache entry more often than we actually load the page)
> 
> FYI, bug 1000338.  Why not to use frecency and take advantage of the above
> proposed flag?

Sounds like a wonderful idea!

> > * a number of times this page has been loaded (again, we can't use the cache
> > entry's fetchCount attribute for the same reason as above)
> 
> Again, OPEN_SECRET may help, but I don't think this is implemented.  In
> general, please don't invent a wheel again, rather reuse what you can to
> save space.  Remember how much disk space and memory the last impl was
> eating.

Yes, it's definitely worth waiting for OPEN_SECRET. Nicely, most of the work can be done before that's in place, and then I just have to add a flag to the calls to AsyncOpen once the OPEN_SECRET work lands.

> > * a count of the number of subresources we have stored with this entry
> 
> So, I don't follow the 'origin' kind of entry.  That should keep this
> information too?

It does, but for an origin entry, the subresources are actually "origins from which we've loaded subresources for loads on this origin". The origin entries exist to help us make predictions when a page is loaded from an origin that the user has visited before, but not the particular URL that is being loaded. For example, I do a lot of wikipedia surfing. I don't hit the same entry over and over again very often, so the origin entries are designed to say "ok, for wikipedia.org, we often load entries from these origins" so we can get a bit of a win even on brand-new page loads. Of course there's no opportunity for prefetching on an origin-only entry, but that's ok.

> > 
> > [optional entries related to redirections - only applicable to full-uri
> > entries, not origin entries]
> > * a URI that this URI has redirected to in the past (the most recent
> > redirection target)
> 
> Could this be stored on the origin entry?  Like when we have redirs
> "http://example.com/a.htlm" -> "http://example.com/b.html" ->
> "http://example.com/c.html" the final channel can store the final "c.html"
> target on the "a.html" entry.

If I understand what you're asking correctly, that's what happens now, though I'm not sure why you're asking about the origin entry (perhaps we're just getting terminology confused). It doesn't make sense (IMHO) to talk about "origin a has redirected to origin b", since there could be plenty of pages on origin a that don't redirect at all, or that have redirects to some other origin (URL shortening comes to mind here).

In your example, on the a.html entry, we store "hey, this redirected to c.html". Then on the c.html entry, we store all the actual data associated with resources on c.html (so it can work both for a future redirect from a.html and a future direct navigation to c.html). Does that clarify?

> > * a timestamp for the last time we saw this redirection occur
> > * a count of how many times we've seen this redirection occur
> 
> OPEN_SECRET

Yes, please! :)

> > 
> > [set of entries for what the seer calls a "subresource" - stuff linked to on
> > a page (images, scripts, stylesheets). we will have each of these entries
> > for each resource on a page]
> 
> Why for Christ sake??

You mean other than "because it's the data the seer needs to do its work"? I can definitely see room for optimizing HOW the data is stored, but (with the possible exception of indices), this is all data that is needed for the seer to actually be useful, as our predictions are based on time since last load/hit (hit being defined as "last time a resource was loaded for a particular page") and load/hit count.

> > * a mapping from URI to the index of everything else for this entry, so we
> > can look up when we don't have the index explicitly available (which is the
> > case when we're updating our database)
> 
> Can you please more explain this?  I don't much follow and it seems like
> pretty important to understand well.

Absolutely. Let's assume you have a page with resources that looks like this:

http://example.com/
    http://some-cdn.com/a-css-reset-library.css
    http://example.com/style.css
    http://some-other-cdn.com/jquery.js
    http://example.com/logo.jpg

When the seer is asked to predict for example.com, right now (both in sqlite and the cache as I currently have it half-implemented), it needs to get the list of those resources, make calculations based on number of hits, last hit timestamp, etc. Sqlite made this easy - effectively select * from resources where loading_page = http://example.com/, and then iterate through the rows. The cache (the way I currently did this, though I'm starting to think of other ways to do this) makes this harder. So, I have to look up the number of "rows" (using sqlite terminology to show the mapping in my mind), then for each "row" get each of the relevant "columns". The natural way to do this was have most of the data stored with entries like this:
    uri-row-0 = http://some-cdn.com/a-css-reset-library.css
    hit-count-row-0 = 5
    last-hit-row-0 = <timestamp>
    uri-row-1 = http://example.com/style.css
    <etc, etc, etc>
That makes it really easy when predicting to do for (i = 0; i < num_rows; i++) { get each metadata entry for row i, do calculations, setup prediction if required }

However, when learning about something, we don't have the index (since that's seer-internal data), so we have to have some way to say "oh, you're trying to update information about http://example.com/style.css? That's at index 1!". Thus, there are entries like
    row-number-for-http://example.com/style.css = 1
So when we're told "Hey, http://example.com/ loaded http://example.com/style.css", we now have a way to look up the index for style.css on the example.com entry, and can update the metadata entries for that accordingly.

The other reason we need indices is so we can keep track easily of how many resources we are tracking for a page, and keep that list limited to a sane number (for some definition of sane!). We have the count, but we don't (yet!) have a way to figure out which one to evict once we've reached our limit. The answer to that is one more cache entry. For example:
    row-numbers-in-order-of-least-recently-to-most-recently-used = [3, 1, 19, 0]
Now when we hit row-count == max-row-count, it's easy to look up which index to evict (in this case, 3) and re-use (which would then make the entry look like [1, 19, 0, 3] where 3 is now the index for the new resource).

It's quite possible (probable, even!) that I'm making this way too complicated. I'm already thinking of a way to make this simpler (and reduce at least the number of metadata entries the seer will add). Perhaps storing a base64-encoded binary blob (to avoid issues with null bytes confusing the metadata parser) with all the resource URIs, hit counts, and timestamps that is serialized in LRU order. That reduces the number of metadata entries from 4*number_resources to 1, though base64-encoding is longer than the length of what it's encoding. Perhaps there's some method I'm not thinking of yet?

> > * the URI (keyed on index, so we can look up when we don't have the uri
> > explicitly available, which is the case when we're performing a prediction)
> > * a timestamp for the last time we saw this resource loaded by this page
> > * a count of how many times we've seen this resource loaded by this page
> 
> I think this could well be stored on the entry, unless you need to leave the
> metadata.

I'm not sure I follow here. Do you mean store this on the regular cache entry for the resource, not the page from which it was loaded? If so, I'm not a fan of that (though I could be convinced), as that will require a lot of round-trips to open the entry for the page, then read the list of resources from there, and open those entries, then do all the calculations to make predictions.

> > 
> > The number of subresources will be limited (via a pref). I have not yet
> > decided on a limit for this, but it will certainly be <= 100. The limit will
> > probably be based on information from the HTTP Archive on how many resources
> > the average page loads, plus a little bit of room for future growth. 
> 
> No prophecies please :)  I think 20 resources, and only those necessary for
> first-paint are OK.  I would not bother with images and any async scripts
> (or so).

That's why I said based on information from HTTP Archive. And FWIW, I wasn't far off... HTTP Archive says an average of 96 requests per pageload(!!!)

I'm fine with limiting to the more important resources, it definitely sounds like a good space optimization.

> BTW, you 6 months of work on the seer v1 - did it bring some useful data on
> what is good to store and what is good to ignore?  Because it appeared to me
> that you were storing just too much and it was actually slowing down in some
> cases.  We need to be smart here and also think of the disk overhead for the
> seer data.

Unfortunately, no, for a few reasons. One, the seer didn't spend a whole lot of time pref'd on outside my machine, so there's not a whole lot of good data to go on. And two, there was a bug with a decent amount of the telemetry causing it to not be accumulated at all, even on my machine :/

> > I also
> > have not yet decided on a scheme for recycling indices (other than a general
> > LRU kind of thing) and keeping track of which index to recycle next (thus
> > evicting a lesser-used subresource). Most likely I will add one more
> > metadata entry that keeps a list of used indices in order of use, and then I
> > can just take the last index in the list as the one to evict.
> 
> (Maybe I'm stuck on the same thing here) Can you please more explain what
> indices you need here?  What is the index?  Why do you actually need it?

Does the explanation a few sections above help?

> > 
> > You may be wondering why I chose to do this all in metadata, instead of
> > storing some things in the actual data of the cache entry. It's pretty
> > simple - I want to make this all available ASAP, and have as few round-trips
> > to the main thread as possible (so using an async read or write on an
> > input/output stream breaks that goal). 
> 
> I fully understand, just please keep in mind that metadata are held in
> memory and we only have a limited space for it (250kb).  If you are going to
> keep a lot of your stuff in memory and try to have it all avail at once, you
> will probably throw away 'real' entries metadata away what will slow down
> load of warmed pages OTOH.  However, here I don't have enough data here
> (must fix bug 920606).

This is good to know, and definitely something to keep an eye on (assuming we don't trash this all and go for seer v3 :P)

> > Also, keeping everything in metadata
> > keeps me from having to come up with a text format and parsing/creating it
> > every time I need to do something. 
> 
> That's easy tho ;)
> 
> > Memory usage may come up as a concern,
> > but I'll need all of this in memory to operate on it anyway, so I'm fairly
> > certain it's pretty much the same either way.
> 
> Yeah, but with you approach you will hit limits that cause stuff to happen,
> just FYI.

This is true. Which is why I'm running this by you early on, before I have something that I consider reviewable :)

> > 
> > When the seer is asked to predict, it will open the appropriate cache
> > entries, 
> 
> The 'origin' and the 'full-uri'?  In this order?  Remember the OPEN_PRIORITY
> flag as well.

That's the order in which it's currently written, yes. I'm totally open to changing things around, though. Also, I was avoiding OPEN_PRIORITY, since I didn't want to starve more legitimate cache lookups, but if you think it's OK, I'll go ahead and use it.

> > and when those are available, iterate through the subresources
> > stored on the cache entries (based on index) and do it's calculations and
> > take some action (or not) for each of those.
> > 
> > When it's asked to update its database, 
> 
> What database?

I'm just using "database" as a term for "the data stored in the cache that is related to the seer". No actual database used.

> > again the appropriate cache entries
> > will be opened, and either the top-level data will be updated, or for a
> > subresource, the index for that will be looked up, and then the appropriate
> > metadata entries will be changed.
> > 
> > I have a large portion of this implemented already, though not ready for
> > testing yet :) If there are major changes needed, that's fine, but I'm
> > hoping there won't be.
> 
> I am afraid there will.. sorry :/

Don't be. At least we'll (hopefully) get somewhere better than sqlite! :D
(In reply to Nicholas Hurley [:hurley] from comment #4)
> > Could this be stored on the origin entry?  Like when we have redirs
> > "http://example.com/a.htlm" -> "http://example.com/b.html" ->
> > "http://example.com/c.html" the final channel can store the final "c.html"
> > target on the "a.html" entry.
> 
> If I understand what you're asking correctly, that's what happens now,
> though I'm not sure why you're asking about the origin entry (perhaps we're
> just getting terminology confused). 

Yes!  It has to be 'redirect originating - aka the first in the chain - uri entry' not yours 'origin' entry.

> It doesn't make sense (IMHO) to talk
> about "origin a has redirected to origin b", since there could be plenty of
> pages on origin a that don't redirect at all, or that have redirects to some
> other origin (URL shortening comes to mind here).
> 
> In your example, on the a.html entry, we store "hey, this redirected to
> c.html". Then on the c.html entry, we store all the actual data associated
> with resources on c.html (so it can work both for a future redirect from
> a.html and a future direct navigation to c.html). Does that clarify?

When thinking of it, we only save loop over b.html here.  Also, a.html already keeps the information it's redirecting somewhere already - it has '30X, Location: b.html' response cached.

How is actually the redirect information you want to collect (regardless how we are eventually going to do that) later used?  We preconnect the target host?  We start preloading the target page?  Do we just skip the pages in the chain (for the above example the b.html)?  Would that be always save?  Because I don't believe it would be, multiple redirects are sometimes used to add/verify login cookies.  Do we have data how many redirects in the chain usually are?

I just want to understand how you want to benefit or speed up the redirect chaining with seer.  Also could be something we may want to do in a followup?  Splitting tasks is always good and this is relatively big!


> > > [set of entries for what the seer calls a "subresource" - stuff linked to on
> > > a page (images, scripts, stylesheets). we will have each of these entries
> > > for each resource on a page]
> > 
> > Why for Christ sake??
> 
> You mean other than "because it's the data the seer needs to do its work"? I
> can definitely see room for optimizing HOW the data is stored, but (with the
> possible exception of indices), this is all data that is needed for the seer
> to actually be useful, as our predictions are based on time since last
> load/hit (hit being defined as "last time a resource was loaded for a
> particular page") and load/hit count.

OK, maybe I here just need more details on how seer actually works.  From what you wrote it seems like you want to have mirror entry for EACH subresource on a page.  That sounds crazy.  But maybe you just want to store the list of subresources?

> 
> > > * a mapping from URI to the index of everything else for this entry, so we
> > > can look up when we don't have the index explicitly available (which is the
> > > case when we're updating our database)
> > 
> > Can you please more explain this?  I don't much follow and it seems like
> > pretty important to understand well.
> 
> Absolutely. Let's assume you have a page with resources that looks like this:
> 
> http://example.com/
>     http://some-cdn.com/a-css-reset-library.css
>     http://example.com/style.css
>     http://some-other-cdn.com/jquery.js
>     http://example.com/logo.jpg
> 
> When the seer is asked to predict for example.com, right now (both in sqlite
> and the cache as I currently have it half-implemented), it needs to get the
> list of those resources, make calculations based on number of hits, last hit
> timestamp, etc. Sqlite made this easy - effectively select * from resources
> where loading_page = http://example.com/, and then iterate through the rows.
> The cache (the way I currently did this, though I'm starting to think of
> other ways to do this) makes this harder. So, I have to look up the number
> of "rows" (using sqlite terminology to show the mapping in my mind), then
> for each "row" get each of the relevant "columns". The natural way to do
> this was have most of the data stored with entries like this:
>     uri-row-0 = http://some-cdn.com/a-css-reset-library.css
>     hit-count-row-0 = 5
>     last-hit-row-0 = <timestamp>
>     uri-row-1 = http://example.com/style.css
>     <etc, etc, etc>
> That makes it really easy when predicting to do for (i = 0; i < num_rows;
> i++) { get each metadata entry for row i, do calculations, setup prediction
> if required }

Maybe rather have:

"seer::http://some-cdn.com/a-css-reset-library.css" = "0,<timestamp>" (or frecency?)

and iterate (you can walk the metadata keys/values! - visitMetaData) and when there is "seer::" at the start, you get what you need.  no need for an index then.

Also note that "uri-row-" + "hit-count-row-" + "last-hit-row-" are relatively large strings and with having 100 items it's 3.5kB per entry.  Seems little, but since the time I wrote my first memory reporter, nothing is little :)  Each byte accumulates and you are then saving every byte you can.  This approach is IMO a bit wasting.  Have "seer:0" = "<uri>,0,<timestamp>" seems better.  Also, some compression of URIs would be good, err.. is needed :) .. have to think.

> 
> However, when learning about something, we don't have the index (since
> that's seer-internal data), so we have to have some way to say "oh, you're
> trying to update information about http://example.com/style.css? That's at
> index 1!". Thus, there are entries like
>     row-number-for-http://example.com/style.css = 1
> So when we're told "Hey, http://example.com/ loaded
> http://example.com/style.css", we now have a way to look up the index for
> style.css on the example.com entry, and can update the metadata entries for
> that accordingly.
> 
> The other reason we need indices is so we can keep track easily of how many
> resources we are tracking for a page, and keep that list limited to a sane
> number (for some definition of sane!). We have the count, but we don't
> (yet!) have a way to figure out which one to evict once we've reached our
> limit. The answer to that is one more cache entry. For example:
>     row-numbers-in-order-of-least-recently-to-most-recently-used = [3, 1,
> 19, 0]
> Now when we hit row-count == max-row-count, it's easy to look up which index
> to evict (in this case, 3) and re-use (which would then make the entry look
> like [1, 19, 0, 3] where 3 is now the index for the new resource).

Hmm.. have to think.  Let you know that when you add metadata, it's always added at the end.  And metadata, when being visited, always comes in the order as they've been added.  But this is very much implementation specific (not a good practice...)

> 
> It's quite possible (probable, even!) that I'm making this way too
> complicated. I'm already thinking of a way to make this simpler (and reduce
> at least the number of metadata entries the seer will add). Perhaps storing
> a base64-encoded binary blob (to avoid issues with null bytes confusing the
> metadata parser) with all the resource URIs, hit counts, and timestamps that
> is serialized in LRU order. That reduces the number of metadata entries from
> 4*number_resources to 1, though base64-encoding is longer than the length of
> what it's encoding. Perhaps there's some method I'm not thinking of yet?

Still have to think :)

> 
> > > * the URI (keyed on index, so we can look up when we don't have the uri
> > > explicitly available, which is the case when we're performing a prediction)
> > > * a timestamp for the last time we saw this resource loaded by this page
> > > * a count of how many times we've seen this resource loaded by this page
> > 
> > I think this could well be stored on the entry, unless you need to leave the
> > metadata.
> 
> I'm not sure I follow here. Do you mean store this on the regular cache
> entry for the resource, not the page from which it was loaded? If so, I'm
> not a fan of that (though I could be convinced), as that will require a lot
> of round-trips to open the entry for the page, then read the list of
> resources from there, and open those entries, then do all the calculations
> to make predictions.

OK, so when you have it on your own entry (with URL = URL of the real resource and idext = "seer") you save round trips?  The real resource entry has to be open as well.  We open entries serially, what will probably not change soon (maybe one day we will have multiple threads, but that is a big architectural change).

But let's for now go with special "seer" entries, it should not be a big problem for now hopefully, just let you know you will probably effectively slow down cold load times on slower medias.

> 
> > > 
> > > The number of subresources will be limited (via a pref). I have not yet
> > > decided on a limit for this, but it will certainly be <= 100. The limit will
> > > probably be based on information from the HTTP Archive on how many resources
> > > the average page loads, plus a little bit of room for future growth. 
> > 
> > No prophecies please :)  I think 20 resources, and only those necessary for
> > first-paint are OK.  I would not bother with images and any async scripts
> > (or so).
> 
> That's why I said based on information from HTTP Archive. And FWIW, I wasn't
> far off... HTTP Archive says an average of 96 requests per pageload(!!!)

Interesting data.

> 
> I'm fine with limiting to the more important resources, it definitely sounds
> like a good space optimization.

I'd personally start slowly :)  A bit here and a bit there.  That itself can do a lot of magic.

> 
> > BTW, you 6 months of work on the seer v1 - did it bring some useful data on
> > what is good to store and what is good to ignore?  Because it appeared to me
> > that you were storing just too much and it was actually slowing down in some
> > cases.  We need to be smart here and also think of the disk overhead for the
> > seer data.
> 
> Unfortunately, no, for a few reasons. One, the seer didn't spend a whole lot
> of time pref'd on outside my machine, so there's not a whole lot of good
> data to go on. And two, there was a bug with a decent amount of the
> telemetry causing it to not be accumulated at all, even on my machine :/
> 
> > > I also
> > > have not yet decided on a scheme for recycling indices (other than a general
> > > LRU kind of thing) and keeping track of which index to recycle next (thus
> > > evicting a lesser-used subresource). Most likely I will add one more
> > > metadata entry that keeps a list of used indices in order of use, and then I
> > > can just take the last index in the list as the one to evict.
> > 
> > (Maybe I'm stuck on the same thing here) Can you please more explain what
> > indices you need here?  What is the index?  Why do you actually need it?
> 
> Does the explanation a few sections above help?

Yep :)

> > > 
> > > When the seer is asked to predict, it will open the appropriate cache
> > > entries, 
> > 
> > The 'origin' and the 'full-uri'?  In this order?  Remember the OPEN_PRIORITY
> > flag as well.
> 
> That's the order in which it's currently written, yes. I'm totally open to
> changing things around, though. Also, I was avoiding OPEN_PRIORITY, since I
> didn't want to starve more legitimate cache lookups, but if you think it's
> OK, I'll go ahead and use it.

It depends.  when you schedule open for 100 images (no OPEN_PRIORITY set) and then you schedule open for two blocking ccs files (use OPEN_PRIORITY) then those two will skip opening and reading over all those 100 images.

Opening a page looks like this:

http://example.com/index.html, OPEN_PRIORITY
OnDataAvail then goes to the parser that hits:
http://example.com/logo.jpg, OPEN_*NORMALLY*
http://example.com/style.css, OPEN_PRIORITY
http://example.com/script.js, OPEN_PRIORITY
http://example.com/sidebarimage, OPEN_*NORMALLY*

so that style.css and script.js OnData/OnStop should happen sooner than for the two images.

When are you going to open your 'full-uri' entry?  What other entries are you going to open before/after user hits enter/releases mouse/release finger to go to a URL?  

It would only make sense to start (or do anything that can help speed up) load of the css and js resources (or any image that bug 975459 deals with) sooner than the parser hits them in the markup.  How are you going to achieve that?

Also note that doing too much may effectively slow stuff down overall.  I was observing this with paradoxically faster first-paint times when loading from a slow sd card than from an ssd disk!

This all needs to be smart.  We must do the right stuff at the right moment and not rush as a crowd of people when opening a new electronic store where you can buy a new plasma TV for 10 dollars :)
(In reply to Honza Bambas (:mayhemer) from comment #5)
> Yes!  It has to be 'redirect originating - aka the first in the chain - uri
> entry' not yours 'origin' entry.

OK, that makes more sense.

> When thinking of it, we only save loop over b.html here.  Also, a.html
> already keeps the information it's redirecting somewhere already - it has
> '30X, Location: b.html' response cached.
> 
> How is actually the redirect information you want to collect (regardless how
> we are eventually going to do that) later used?  We preconnect the target
> host?  We start preloading the target page?  Do we just skip the pages in
> the chain (for the above example the b.html)?  Would that be always save? 
> Because I don't believe it would be, multiple redirects are sometimes used
> to add/verify login cookies.  Do we have data how many redirects in the
> chain usually are?

As long as we don't prefetch redirects, then yes, it's always safe (though not, perhaps, as helpful as it could be). Right now, we just preconnect to the target host, skipping any intermediate redirects. So, if I click on a twitter link (who shortens ALL URLs in tweets through the t.co shortener) to bit.ly, which finally ends up on sfgate.com, I would store "t.co/whatever redirects to sfgate.com/awesome-article", with no knowledge at all of the bit.ly/stuff in the middle. (For clarity, the redirect chain is t.co/whatever -> bit.ly/stuff -> sfgate.com/awesome-article). There's definitely room for improvement here.

> I just want to understand how you want to benefit or speed up the redirect
> chaining with seer.  Also could be something we may want to do in a
> followup?  Splitting tasks is always good and this is relatively big!

I'm fine with re-doing redirect as a followup. It gives us more time to figure out the right way to store this information (especially for multiple-redirect cases, where we currently just store the original URI and the final target, after all redirects have been processed).

> OK, maybe I here just need more details on how seer actually works.  From
> what you wrote it seems like you want to have mirror entry for EACH
> subresource on a page.  That sounds crazy.  But maybe you just want to store
> the list of subresources?

Yes, just the list (along with the little bits of extra data about them to be able to make confidence calculations). One cache entry per page load, instead of 1 + N entries (where N = the number of resources we're keeping data for). I may be crazy, but I'm not TOTALLY crazy :) (Actually 2 entries if you take full-uri and origin into account, but there's obviously not a 1-1 correspondence between full-uri and origin entries, since many URIs share the same origin.)

> Maybe rather have:
> 
> "seer::http://some-cdn.com/a-css-reset-library.css" = "0,<timestamp>" (or
> frecency?)
> 
> and iterate (you can walk the metadata keys/values! - visitMetaData) and
> when there is "seer::" at the start, you get what you need.  no need for an
> index then.

OK, I think I like this idea. Let me try to explain what I'm thinking of, and how it might work. I'm sure there will be something wrong with my idea :)

We store things the way you're suggesting, it makes iterating over everything easy. We also keep one other entry, which is either (1) The list of seer:: metadata entries in LRU order, or (2) JUST the URI that is least recently used. Learning and eviction would work like this:

If we go with (1) above (keeping the full list of URIs in LRU order), then when we have to evict, we just take whichever end of that list is LRU, remove it from the cache entry, and add our new one to the entry and the list. When we learn about something, we have to update the ordering of the whole list and re-write that cache entry.

If we go with (2) above (keeping only the LRU URI), then when we have to evict, we take that URI, remove its metadata entry, add the new one, and then walk the metadata entries to find the new LRU URI and store it. When we learn about something, we only have to modify the metadata entry that keeps track of the LRU entry if the URI we're learning about is the one that is in the LRU position (we would have to walk over all the metadata entries and find the new LRU again).

Advantage of (1) - eviction and learning is pretty easy to implement, and doesn't take a whole lot of work. But this keeps a LOT of extra data (depending on what our resource limit is), an extra copy of each URI.

Advantage of (2) - We're not storing a whole lot of extra data (just an extra copy of one URI). However, eviction and learning takes a bit of extra work to do.

I have to think about this a bit more. I have an idea of how to combine your approach with my index approach and get the best of both worlds, but I'm not sure yet it's going to work :)

> OK, so when you have it on your own entry (with URL = URL of the real
> resource and idext = "seer") you save round trips?  The real resource entry
> has to be open as well.  We open entries serially, what will probably not
> change soon (maybe one day we will have multiple threads, but that is a big
> architectural change).

Sorry, no, I thought you were suggesting one of two things:

(1) Keep some of the seer data as the actual content (not metadata) of a cache entry (which would require opening the cache entry data stream, waiting for all its data, and THEN doing work), or

(2) Keep load count and timestamp for the page on the page's "real" cache entry, along with a list of resources. Then keep the hit count and timestamp for the resources on the resources' "real" cache entries, which has the disadvantage (main thread round trips) of opening one entry, reading some metadata from it, then opening a bunch of other entries, and not being able to do any seer work until those entries are opened.

Using a special "seer" entry for full-uri will not save any round-trips over just shoving the seer metadata on the "real" cache entry. (Of course, seer origin data should still probably be on a special seer entry, as that makes it easier to partition full-uri versus origin data).

With the addition of OPEN_PRIVATE (to avoid messing up the fetch count, etc), and the ability to keep metadata around (at least on disk) if we're not horribly space constrained, I think it makes the most sense to store full-uri data on the regular cache entries. That both saves space on disk, and helps reduce eviction of non-seer entries from the in-memory working set. It also can end up preloading at least some necessary cache entries.

> I'd personally start slowly :)  A bit here and a bit there.  That itself can
> do a lot of magic.

Yeah, let's plan on that. I'll poke at HTTP archive to see what a sane number of resources to keep is in order to cover all stylesheets and scripts (not sure they differentiate between sync and async scripts).

> It depends.  when you schedule open for 100 images (no OPEN_PRIORITY set)
> and then you schedule open for two blocking ccs files (use OPEN_PRIORITY)
> then those two will skip opening and reading over all those 100 images.
> 
> Opening a page looks like this:
> 
> http://example.com/index.html, OPEN_PRIORITY
> OnDataAvail then goes to the parser that hits:
> http://example.com/logo.jpg, OPEN_*NORMALLY*
> http://example.com/style.css, OPEN_PRIORITY
> http://example.com/script.js, OPEN_PRIORITY
> http://example.com/sidebarimage, OPEN_*NORMALLY*
> 
> so that style.css and script.js OnData/OnStop should happen sooner than for
> the two images.

Alright, that's what I thought happened. I think if we keep the full-uri data on "real" cache entries, then using OPEN_PRIORITY for those makes sense (since we'll be opening them soon anyway), and perhaps we keep using OPEN_NORMALLY for seer-specific origin entries (since those are less important than real entries that we know we need).

> When are you going to open your 'full-uri' entry?  What other entries are
> you going to open before/after user hits enter/releases mouse/release finger
> to go to a URL?  

For a page load, the docshell calls Predict just before it actually creates and fires up the channel, so we'll be opening the entry before any network activity actually takes place. If we end up opening other entries (say because of a redirect, in the current methodology), those will of course be opened at some point later in the page load process, as we wouldn't know we need to open them before we get the results back for that first entry.

> It would only make sense to start (or do anything that can help speed up)
> load of the css and js resources (or any image that bug 975459 deals with)
> sooner than the parser hits them in the markup.  How are you going to
> achieve that?

I think a combination of OPEN_PRIORITY (for full-uri entries) and storing resource metadata for just css and (sync) js is probably the right way to achieve this.

> Also note that doing too much may effectively slow stuff down overall.  I
> was observing this with paradoxically faster first-paint times when loading
> from a slow sd card than from an ssd disk!
> 
> This all needs to be smart.  We must do the right stuff at the right moment
> and not rush as a crowd of people when opening a new electronic store where
> you can buy a new plasma TV for 10 dollars :)

But... but... cramming into an electronic store to get a cheap plasma TV is The American Way(tm)! :D In all seriousness though, you are correct. And we seem to be converging to a good place.
Depends on: 1000338
Depends on: 1029782
See Also: → 1005958
Attached patch patch (obsolete) — Splinter Review
Suddenly, a wild patch appeared!

Honza, here's a patch that (unless I'm crazy - always a possibility) follows the design we discussed. It's... rather huge. Sorry. I'll also attach a copy of the new Predictor.cpp (the patch for that file is particularly ugly, since so much has been rewritten that it's probably best to treat it as a new file).
Attachment #8498961 - Flags: review?(honzab.moz)
Attached file Predictor.cpp (obsolete) —
The raw Predictor.cpp, as promised.
Attachment #8498963 - Flags: review?(honzab.moz)
Comment on attachment 8498963 [details]
Predictor.cpp

Will review the patch.
Attachment #8498963 - Flags: review?(honzab.moz)
I have a first comments list draft (for both patches).  I will test the patch and see how it works.

Notes for me:
- check the telemetry
- check the iteration frequency on both entries and meta
Comment on attachment 8498961 [details] [diff] [review]
patch

Review of attachment 8498961 [details] [diff] [review]:
-----------------------------------------------------------------

r-, comments will follow on the complete patch.

::: netwerk/base/src/Predictor.h
@@ +46,5 @@
> +  PredictorAction(bool fullUri, bool predict,
> +                  PredictorPredictReason predictReason,
> +                  PredictorLearnReason learnReason, nsIURI *targetURI,
> +                  nsIURI *sourceURI, nsINetworkPredictorVerifier *verifier);
> +  PredictorAction(bool fullUri, bool predict,

I'd a little bit more prefer to use enums instead of bool all the time.  (true, true) is not very descriptive in the code.  Other option is to inline a comment to calls on all places what those args are.

@@ +56,5 @@
> +private:
> +  virtual ~PredictorAction();
> +
> +  bool mFullUri;
> +  bool mPredict;

make these bit fields

@@ +58,5 @@
> +
> +  bool mFullUri;
> +  bool mPredict;
> +  PredictorPredictReason mPredictReason;
> +  PredictorLearnReason mLearnReason;

if possible, can this be put to a union?

@@ +63,5 @@
> +  nsCOMPtr<nsIURI> mTargetURI;
> +  nsCOMPtr<nsIURI> mSourceURI;
> +  nsCOMPtr<nsINetworkPredictorVerifier> mVerifier;
> +  TimeStamp mStartTime;
> +  int mStackCount;

don't like uint32_t ?

@@ +150,5 @@
> +  // Service startup utilities
> +  void CheckForAndDeleteOldDBFiles();
> +  void MaybeCleanupOldDBFiles();
> +
> +  // The guts of prediction

It's clear, but some comments on each method and definitely on arguments that may not be clear (like stackCount) would be nice too.
Attachment #8498961 - Flags: review?(honzab.moz) → review-
Comment on attachment 8501762 [details] [diff] [review]
Predictor.cpp as a new file in patch (for splinter review)

Review of attachment 8501762 [details] [diff] [review]:
-----------------------------------------------------------------

r-, please read the comments.

Not sure if you gave me some draft version for review but this apparently is not tested at all and has serious flaws.  After few minutes of testing I found out you are effectively wiping out cached top level pages (where you'd like to store your subresource lists IIUC) in a second after those have been loaded and cached.

::: netwerk/base/src/Predictor.cpp
@@ +49,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +static Predictor *gPredictor = nullptr;

please get rid of this static and rather make all consumers of it refer the predictor instance via a member, it's safer and I believe easily doable.

@@ +150,5 @@
> +
> +// Get the full origin (scheme, host, port) out of a URI (maybe should be part
> +// of nsIURI instead?)
> +static void
> +ExtractOrigin(nsIURI *uri, nsIURI **originUri, nsIIOService *ioService)

nsContentUtils::GetASCIIOrigin ?

@@ +191,5 @@
> +  bool isHTTP = false;
> +  uri->SchemeIs("http", &isHTTP);
> +  if (!isHTTP) {
> +    uri->SchemeIs("https", &isHTTP);
> +  }

maybe return false on failure of SchemeIs?

@@ +284,5 @@
> +PredictorAction::OnCacheEntryAvailable(nsICacheEntry *entry, bool isNew,
> +                                       nsIApplicationCache *appCache,
> +                                       nsresult result)
> +{
> +  MOZ_ASSERT(NS_IsMainThread(), "Got cache entry off main thread!");

why do you insist on this being called on the main thread?  you can freely let this happen on any thread.  let you know the OnAvail call is always on the thread you have called asyncOpenURI for.

@@ +329,5 @@
> +}
> +
> +// Are you ready for the fun part? Because here comes the fun part. The
> +// predictor, which will do awesome stuff as you browse to make your browsing
> +// experience faster.

I don't see much point of this comment.

@@ +539,5 @@
> +static const char metaDataPrefix[] = "predictor::";
> +nsresult
> +Predictor::OnMetaDataElement(const char *asciiKey, const char *asciiValue)
> +{
> +  if (strncmp(metaDataPrefix, asciiKey, strlen(metaDataPrefix))) {

please use StringBeginsWith(nsDependentCString(asciiKey), NS_LITERAL_CSTRING("predictor::"))

@@ +594,5 @@
> +  nsCOMPtr<nsICacheStorageService> cacheStorageService =
> +    do_GetService("@mozilla.org/netwerk/cache-storage-service;1", &rv);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  nsRefPtr<LoadContextInfo> lci = new LoadContextInfo(false, 0, false, false);

s/0/nsILoadContentInfo::NO_APP_ID/

@@ +617,5 @@
> +  rv = NS_GetSpecialDirectory(NS_APP_USER_PROFILE_50_DIR,
> +                              getter_AddRefs(mDBFile));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  rv = mDBFile->AppendNative(NS_LITERAL_CSTRING("netpredictions.sqlite"));
> +  NS_ENSURE_SUCCESS(rv, rv);

please comment this is just for the cleanup

@@ +661,5 @@
> +    MOZ_ASSERT(NS_IsMainThread(), "Shutting down io thread off main thread!");
> +    // We get here, that means the cleanup happened. Mark so we don't try in the
> +    // future.
> +    Preferences::SetBool(PREDICTOR_CLEANED_UP_PREF, true);
> +    return gPredictor->mIOThread->Shutdown();

how do you (in general) ensure gPredictor is non-null here?

@@ +863,5 @@
> +}
> +
> +// Predicting for a link is easy, and doesn't require the round-trip to the
> +// predictor thread and back to the main thread, since we don't have to hit the
> +// db for that.

a leftover comment?

@@ +903,5 @@
> +  uint32_t lastLoad;
> +  nsresult rv = entry->GetLastFetched(&lastLoad);
> +  NS_ENSURE_SUCCESS(rv, false);
> +
> +  int globalDegradation = CalculateGlobalDegradation(lastLoad);

some uint rather?

@@ +912,5 @@
> +
> +  nsCOMPtr<nsIURI> redirectURI;
> +  if (WouldRedirect(entry, loadCount, lastLoad, globalDegradation,
> +                    getter_AddRefs(redirectURI))) {
> +    mPreconnects.AppendElement(redirectURI);

how is it here with thread safety? is this happening only the main thread (or in general - a single thread?)  maybe just assert the thread in every function/method

@@ +920,5 @@
> +    nsAutoCString redirectUriString;
> +    redirectURI->GetAsciiSpec(redirectUriString);
> +    PREDICTOR_LOG(("Predict redirect uri=%s action=%p", redirectUriString.get(),
> +                   redirectAction.get()));
> +    mCacheDiskStorage->AsyncOpenURI(redirectURI, NS_LITERAL_CSTRING(""),

s/NS_LITERAL_CSTRING("")/EmptyCString()/

@@ +1038,5 @@
> +                                uint32_t loadCount, int globalDegradation)
> +{
> +  // Since the visitor gets called under a cache lock, all we do there is get
> +  // copies of the keys/values we care about, and then do the real work here
> +  entry->VisitMetaData(this);

maybe clear the members you grab the values to also BEFORE this call?

or, you could better use a helper class defined inside this very method to grab the data, it would be a bit clearer code then

up to you

assert thread here

@@ +1071,5 @@
> +    if (confidence >= mPreconnectMinConfidence) {
> +      mPreconnects.AppendElement(uri);
> +    } else if (confidence >= mPreresolveMinConfidence) {
> +      mPreresolves.AppendElement(uri);
> +    }

Please assert thread here.
Nit: indention.

@@ +1090,5 @@
> +    nsCOMPtr<nsIURI> uri = mPreconnects[i];
> +    Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PREDICTIONS> totalPredictions;
> +    ++totalPredictions;
> +    Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PRECONNECTS> totalPreconnects;
> +    ++totalPreconnects;

didn't you want to define this outside the for() block?

@@ +1105,5 @@
> +    nsCOMPtr<nsIURI> uri = mPreresolves[i];
> +    Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PREDICTIONS> totalPredictions;
> +    ++totalPredictions;
> +    Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PRERESOLVES> totalPreresolves;
> +    ++totalPreresolves;

as well here

@@ +1110,5 @@
> +    nsAutoCString hostname;
> +    uri->GetAsciiHost(hostname);
> +    nsCOMPtr<nsICancelable> tmpCancelable;
> +    mDnsService->AsyncResolve(hostname,
> +                              (nsIDNSService::RESOLVE_PRIORITY_MEDIUM |

hmm.. just thinking now, but is the MEDIUM priority the right one here? have you some data from the previous version of the seer on this?  could we use some confidence value to prioritize this?  if it's medium it may not happen sooner then the actual load would retrigger with HI prio.

@@ +1121,5 @@
> +    }
> +  }
> +
> +  mPreconnects.Clear();
> +  mPreresolves.Clear();

better to swap to local arrays after visit

@@ +1172,5 @@
> +  switch (reason) {
> +  case nsINetworkPredictor::LEARN_LOAD_TOPLEVEL:
> +    // This is the only case when we want to update the 'last used' metadata on
> +    // the cache entry we're getting. Right now, this is only used for tests.
> +    openFlags = nsICacheStorage::OPEN_NORMALLY;

just a note this will also create an entry if one doesn't exist, just make sure that's what you want

@@ +1202,5 @@
> +
> +  if (reason == nsINetworkPredictor::LEARN_STARTUP) {
> +    uriKey = mStartupURI;
> +    originKey = mStartupURI;
> +  }

I don't quite follow this block, you already have a case: for this reason above.

@@ +1247,5 @@
> +                         bool isNew, bool fullUri, nsIURI *targetURI,
> +                         nsIURI *sourceURI)
> +{
> +  char *junk;
> +  if (NS_FAILED(entry->GetMetaDataElement("predictor::seen", &junk))) {

use nsCString and getter_Copies for junk. we could also introduce a HasMetaDataElement method easily.

@@ +1254,5 @@
> +        reason == nsINetworkPredictor::LEARN_LOAD_SUBRESOURCE) {
> +      PREDICTOR_LOG(("Tried to learn a redirect or subresource for a toplevel "
> +                     "that didn't exist. What?"));
> +      if (!fullUri) {
> +        entry->AsyncDoom(nullptr);

I don't understand the reason of dooming something from your layers.  This is the place (just add a break point!) that deletes the cached top level page:

 	xul.dll!mozilla::net::Predictor::LearnInternal(0x00000001, 0x14617f40, false, false, 0x158830c0, 0x188f56c0) Line 1258	C++
 	xul.dll!mozilla::net::PredictorAction::OnCacheEntryAvailable(0x14617f40, false, 0x00000000, NS_OK) Line 325	C++
 	xul.dll!mozilla::net::CacheEntry::InvokeAvailableCallback({...}) Line 790	C++
 	xul.dll!mozilla::net::CacheEntry::InvokeCallback({...}) Line 732	C++
 	xul.dll!mozilla::net::CacheEntry::InvokeCallbacks(true) Line 607	C++
 	xul.dll!mozilla::net::CacheEntry::InvokeCallbacks() Line 564	C++
 	xul.dll!mozilla::net::CacheEntry::Open({...}, false, false, false) Line 310	C++
 	xul.dll!mozilla::net::CacheEntry::AsyncOpen(0x188bc290, 0x00000022) Line 282	C++
 	xul.dll!mozilla::net::CacheStorage::AsyncOpenURI(0x188f56c0, {...}, 0x00000022, 0x188bc290) Line 104	C++
>	xul.dll!mozilla::net::Predictor::Learn(0x158830c0, 0x188f56c0, 0x00000001, 0x15bd9124) Line 1240	C++
 	xul.dll!mozilla::net::PredictorLearn(0x158830c0, 0x188f56c0, 0x00000001, 0x00852000) Line 1699	C++
 	xul.dll!mozilla::css::Loader::LoadSheet(0x15f47ac0, eSheetNeedsParser) Line 1671	C++

@@ +1256,5 @@
> +                     "that didn't exist. What?"));
> +      if (!fullUri) {
> +        entry->AsyncDoom(nullptr);
> +      }
> +      return;

else after return (tho, switch preferred)

@@ +1337,5 @@
> +
> +// Called when a subresource has been hit from a top-level load. Uses the two
> +// helper functions above to update the database appropriately.
> +void
> +Predictor::LearnForSubresource(nsICacheEntry *entry, nsIURI *targetURI)

what are the arguments? - comments.  I presume |entry| is the entry of the top level page loading |targetURI|

@@ +1353,5 @@
> +  nsCString uri;
> +  targetURI->GetAsciiSpec(uri);
> +  key.Append(uri);
> +
> +  char *value;

again, getter_Copies

@@ +1354,5 @@
> +  targetURI->GetAsciiSpec(uri);
> +  key.Append(uri);
> +
> +  char *value;
> +  rv = entry->GetMetaDataElement(key.BeginReading(), &value);

hmm... so, you store whole urls in the list. but you use only the origin part from it anyway IIUC.

last time I asked about this (with the old seer version) you replied "I collect for the future!"  it might not have been explicitly said but I actually wanted you to optimize this by storing only what you really need since we don't have any plans right now for using all this data (whole urls).  counting number of loads for each origins a page references seems better to me.  also instantly gives a hint how many connections to open.

so, please provide a strong and reasonable rational for storing whole URLs or just store <origin>:<counter> instead of the whole URL.

@@ +1360,5 @@
> +  nsCOMPtr<nsIURI> junk;
> +  uint32_t hitCount, lastHit;
> +  bool isNewResource = (NS_FAILED(rv) ||
> +                        !ParseMetaDataEntry(key.BeginReading(), value,
> +                                            getter_AddRefs(junk), hitCount,

maybe pass null instead of junk and make ParseMetaDataEntry work with it?

@@ +1432,5 @@
> +                              uint32_t &hitCount, uint32_t &lastHit)
> +{
> +  PREDICTOR_LOG(("Predictor::ParseMetaDataEntry key=%s value=%s", key, value));
> +
> +  const char *comma = strchr(value, ',');

oh god... not again... strchr... parser desperately needed... (followup)

@@ +1539,5 @@
> +
> +NS_IMETHODIMP
> +PredictorReset::OnMetaDataElement(const char *asciiKey, const char *asciiValue)
> +{
> +  if (strncmp(metaDataPrefix, asciiKey, strlen(metaDataPrefix))) {

StringBeginsWith

@@ +1576,5 @@
> +    gPredictor->mCacheDiskStorage->AsyncOpenURI(
> +        uri, idEnhance,
> +        nsICacheStorage::OPEN_READONLY|nsICacheStorage::OPEN_SECRETLY,
> +        this);
> +  }

when seeing these iterations, I'm thinking of using a distinct cache storage for your data (something we don't have, would need to be implemented). followup sufficient! (as well as more thinking for me about it first)

@@ +1599,5 @@
> +    PREDICTOR_LOG(("COULD NOT GET OBSERVER SERVICE!"));
> +    return;
> +  }
> +
> +  obs->NotifyObservers(nullptr, "predictor-reset-complete", MOZ_UTF16(""));

why MOZ_UTF16("")?

@@ +1613,5 @@
> +    do_GetService("@mozilla.org/network/predictor;1",
> +                  &rv);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  NS_IF_ADDREF(*aPredictor = predictor);

predictor.forget(aPredictor)?

@@ +1618,5 @@
> +  return NS_OK;
> +}
> +
> +nsresult
> +PredictorPredict(nsIURI *targetURI, nsIURI *sourceURI,

I assume these are already referred around the code, anyway how about to make them ::mozilla::net::predictor::Predict() etc ? :)  no, it might be to much..
Attachment #8501762 - Flags: review?(honzab.moz) → review-
(In reply to Honza Bambas (:mayhemer) from comment #13)
> Comment on attachment 8501762 [details] [diff] [review]
> Predictor.cpp as a new file in patch (for splinter review)
> 
> Review of attachment 8501762 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r-, please read the comments.
> 
> Not sure if you gave me some draft version for review but this apparently is
> not tested at all and has serious flaws.  After few minutes of testing I
> found out you are effectively wiping out cached top level pages (where you'd
> like to store your subresource lists IIUC) in a second after those have been
> loaded and cached.

It actually is tested, passes everything on try. Please don't assume otherwise.

Other than the wiping out of cached top level stuff (which should only be for origin-only entries, which are entirely controlled by the seer), I'm not seeing any "serious" flaws in your comments. Things to be fixed, sure, but nothing earth-shattering. And actually, the issue with wiping out cached top level stuff I just noticed is because I once passed "false" where I should've passed "true" to the PredictorAction constructor. Simple fix :) (Also would've been caught when changing to using enums instead of bools.)

> @@ +150,5 @@
> > +
> > +// Get the full origin (scheme, host, port) out of a URI (maybe should be part
> > +// of nsIURI instead?)
> > +static void
> > +ExtractOrigin(nsIURI *uri, nsIURI **originUri, nsIIOService *ioService)
> 
> nsContentUtils::GetASCIIOrigin ?

Wish I'd known that existed when I first wrote ExtractOrigin on the first iteration of the seer :)

> @@ +284,5 @@
> > +PredictorAction::OnCacheEntryAvailable(nsICacheEntry *entry, bool isNew,
> > +                                       nsIApplicationCache *appCache,
> > +                                       nsresult result)
> > +{
> > +  MOZ_ASSERT(NS_IsMainThread(), "Got cache entry off main thread!");
> 
> why do you insist on this being called on the main thread?  you can freely
> let this happen on any thread.  let you know the OnAvail call is always on
> the thread you have called asyncOpenURI for.

Because the asserts cost nothing on release builds, they're good sanity checks on debug builds, and nsINetworkPredictor is main-thread only anyway so everything in this file is now main-thread only (with the one exception of the cleanup of the old sqlite file(s)). As you suggest later, I'll add asserts to every function, for clarity and extra safety.

> @@ +329,5 @@
> > +}
> > +
> > +// Are you ready for the fun part? Because here comes the fun part. The
> > +// predictor, which will do awesome stuff as you browse to make your browsing
> > +// experience faster.
> 
> I don't see much point of this comment.

It's just a little bit of whimsy I added when this was first written oh so long ago :)

> @@ +539,5 @@
> > +static const char metaDataPrefix[] = "predictor::";
> > +nsresult
> > +Predictor::OnMetaDataElement(const char *asciiKey, const char *asciiValue)
> > +{
> > +  if (strncmp(metaDataPrefix, asciiKey, strlen(metaDataPrefix))) {
> 
> please use StringBeginsWith(nsDependentCString(asciiKey),
> NS_LITERAL_CSTRING("predictor::"))

Ah, didn't know that existed... nice!

> @@ +661,5 @@
> > +    MOZ_ASSERT(NS_IsMainThread(), "Shutting down io thread off main thread!");
> > +    // We get here, that means the cleanup happened. Mark so we don't try in the
> > +    // future.
> > +    Preferences::SetBool(PREDICTOR_CLEANED_UP_PREF, true);
> > +    return gPredictor->mIOThread->Shutdown();
> 
> how do you (in general) ensure gPredictor is non-null here?

Good catch. It really shouldn't ever happen in practice, but perhaps it's better to pass a reference to the IO thread around, and just use that directly instead of depending on gPredictor being non-null.

> @@ +1110,5 @@
> > +    nsAutoCString hostname;
> > +    uri->GetAsciiHost(hostname);
> > +    nsCOMPtr<nsICancelable> tmpCancelable;
> > +    mDnsService->AsyncResolve(hostname,
> > +                              (nsIDNSService::RESOLVE_PRIORITY_MEDIUM |
> 
> hmm.. just thinking now, but is the MEDIUM priority the right one here? have
> you some data from the previous version of the seer on this?  could we use
> some confidence value to prioritize this?  if it's medium it may not happen
> sooner then the actual load would retrigger with HI prio.

So IIRC, using MEDIUM is what Patrick suggested the first time around (I think I originally had it at LOW). I just stuck with his suggestion. I think we want to save HI for non-speculative things.

> @@ +1172,5 @@
> > +  switch (reason) {
> > +  case nsINetworkPredictor::LEARN_LOAD_TOPLEVEL:
> > +    // This is the only case when we want to update the 'last used' metadata on
> > +    // the cache entry we're getting. Right now, this is only used for tests.
> > +    openFlags = nsICacheStorage::OPEN_NORMALLY;
> 
> just a note this will also create an entry if one doesn't exist, just make
> sure that's what you want

Yes, that's what I want. I can clarify in the comment.

> @@ +1202,5 @@
> > +
> > +  if (reason == nsINetworkPredictor::LEARN_STARTUP) {
> > +    uriKey = mStartupURI;
> > +    originKey = mStartupURI;
> > +  }
> 
> I don't quite follow this block, you already have a case: for this reason
> above.

This is because of my use of fallthrough in the switch statement that would result in this happening for both LEARN_STARTUP *and* LEARN_LOAD_TOPLEVEL, which is not what is needed here.

> @@ +1254,5 @@
> > +        reason == nsINetworkPredictor::LEARN_LOAD_SUBRESOURCE) {
> > +      PREDICTOR_LOG(("Tried to learn a redirect or subresource for a toplevel "
> > +                     "that didn't exist. What?"));
> > +      if (!fullUri) {
> > +        entry->AsyncDoom(nullptr);
> 
> I don't understand the reason of dooming something from your layers.  This
> is the place (just add a break point!) that deletes the cached top level
> page:

See above for the easy fix for this :)

> @@ +1354,5 @@
> > +  targetURI->GetAsciiSpec(uri);
> > +  key.Append(uri);
> > +
> > +  char *value;
> > +  rv = entry->GetMetaDataElement(key.BeginReading(), &value);
> 
> hmm... so, you store whole urls in the list. but you use only the origin
> part from it anyway IIUC.
> 
> last time I asked about this (with the old seer version) you replied "I
> collect for the future!"  it might not have been explicitly said but I
> actually wanted you to optimize this by storing only what you really need
> since we don't have any plans right now for using all this data (whole
> urls).  counting number of loads for each origins a page references seems
> better to me.  also instantly gives a hint how many connections to open.
> 
> so, please provide a strong and reasonable rational for storing whole URLs
> or just store <origin>:<counter> instead of the whole URL.

The rationale is that the need for whole URLs is coming soon. Jeremy (now contracting for us) is working on finishing up the prefetch stuff in parallel with my work here, which will require the full URLs. It's a toss-up on who will finish their stuff first, but it's unnecessary churn to strip down to origins only when URLs will (at latest) be required shortly after (and very likely within the same release, so it doesn't buy us anything)

> @@ +1432,5 @@
> > +                              uint32_t &hitCount, uint32_t &lastHit)
> > +{
> > +  PREDICTOR_LOG(("Predictor::ParseMetaDataEntry key=%s value=%s", key, value));
> > +
> > +  const char *comma = strchr(value, ',');
> 
> oh god... not again... strchr... parser desperately needed... (followup)

Will file.

> @@ +1576,5 @@
> > +    gPredictor->mCacheDiskStorage->AsyncOpenURI(
> > +        uri, idEnhance,
> > +        nsICacheStorage::OPEN_READONLY|nsICacheStorage::OPEN_SECRETLY,
> > +        this);
> > +  }
> 
> when seeing these iterations, I'm thinking of using a distinct cache storage
> for your data (something we don't have, would need to be implemented).
> followup sufficient! (as well as more thinking for me about it first)

Heh, I thought I had suggested something similar originally, using my own namespace (though not separate storage) for all entries instead of along with the regular cache entries :)

So the one thing to note about Reset (which is where you're commenting on here)... it's not used much. Only in tests, and when people "Clear Recent History" either through the menu or on shutdown because they have that box checked. Right now (because of the way the seer was written originally), the functionality for clearing recent history is lumped in with "Browsing & Download History" instead of "Cache" (it seemed the best area to merge with). It's quite possible we should have the seer data cleared when the cache is cleared instead of browsing & download history now. That would mean the only place Reset would be called would be in the tests (and actually, if the cache has a notification when its clearing out is finished, we could get rid of the Reset interface entirely.)

Thoughts?

> @@ +1618,5 @@
> > +  return NS_OK;
> > +}
> > +
> > +nsresult
> > +PredictorPredict(nsIURI *targetURI, nsIURI *sourceURI,
> 
> I assume these are already referred around the code, anyway how about to
> make them ::mozilla::net::predictor::Predict() etc ? :)  no, it might be to
> much..

Yeah... the names made more sense when it was called the seer. Jeremy and I discussed this when he did the rename, and we decided to just leave it as is, even though it is rather awkward :)


FYI, all the other bits of this I haven't responded to, I'm planning on changing as you suggest. A new patch will come when that's all done.

Also, since you're going on vacation soon... do you want someone else to take a look at future iterations of the patch, so we can try to get this in the tree (and let it bounce a few times, as features like this often do) sooner rather than later? Or would you rather block until you return?
(In reply to Nicholas Hurley [:hurley] from comment #14)
> (In reply to Honza Bambas (:mayhemer) from comment #13)
> > Comment on attachment 8501762 [details] [diff] [review]
> > Predictor.cpp as a new file in patch (for splinter review)
> > 
> > Review of attachment 8501762 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > r-, please read the comments.
> > 
> > Not sure if you gave me some draft version for review but this apparently is
> > not tested at all and has serious flaws.  After few minutes of testing I
> > found out you are effectively wiping out cached top level pages (where you'd
> > like to store your subresource lists IIUC) in a second after those have been
> > loaded and cached.
> 
> It actually is tested, passes everything on try. Please don't assume
> otherwise.
> 
> Other than the wiping out of cached top level stuff (which should only be
> for origin-only entries, which are entirely controlled by the seer), I'm not
> seeing any "serious" flaws in your comments. Things to be fixed, sure, but
> nothing earth-shattering. And actually, the issue with wiping out cached top
> level stuff I just noticed is because I once passed "false" where I
> should've passed "true" to the PredictorAction constructor. Simple fix :)
> (Also would've been caught when changing to using enums instead of bools.)


The thing is that this particular problem is so major that I kinda doubt you've put this to a manual testing.  I've discovered this very quickly and it actually completely stopped me from verifying this whole patch does something at all, since the point where you store the predict data simply is immediately erased.  It then puts this patch to a bad light for me, despite the fix (true -> false) is so simple.


> > hmm.. just thinking now, but is the MEDIUM priority the right one here? have
> > you some data from the previous version of the seer on this?  could we use
> > some confidence value to prioritize this?  if it's medium it may not happen
> > sooner then the actual load would retrigger with HI prio.
> 
> So IIRC, using MEDIUM is what Patrick suggested the first time around (I
> think I originally had it at LOW). I just stuck with his suggestion. I think
> we want to save HI for non-speculative things.

Hmm.. and that is something I could oppose, but telemetry will show (still have to take a look at it).

> 
> > @@ +1172,5 @@
> > > +  switch (reason) {
> > > +  case nsINetworkPredictor::LEARN_LOAD_TOPLEVEL:
> > > +    // This is the only case when we want to update the 'last used' metadata on
> > > +    // the cache entry we're getting. Right now, this is only used for tests.
> > > +    openFlags = nsICacheStorage::OPEN_NORMALLY;
> > 
> > just a note this will also create an entry if one doesn't exist, just make
> > sure that's what you want
> 
> Yes, that's what I want. I can clarify in the comment.

The thing is that when you create an entry, don't fill data nor metadata (or don't call metaDataReady() or setValid() or open the output stream), it won't store on disk or will be a state unusable for top levels.


If the entry you are creating here is your "predictor::" (not a content) entry then disregard the following comment.


It's possible it will be written to disk with only yours "seer" metadata and then for ongoing loads we will just show broken/none content.


> > @@ +1254,5 @@
> > > +        reason == nsINetworkPredictor::LEARN_LOAD_SUBRESOURCE) {
> > > +      PREDICTOR_LOG(("Tried to learn a redirect or subresource for a toplevel "
> > > +                     "that didn't exist. What?"));
> > > +      if (!fullUri) {
> > > +        entry->AsyncDoom(nullptr);
> > 
> > I don't understand the reason of dooming something from your layers.  This
> > is the place (just add a break point!) that deletes the cached top level
> > page:
> 
> See above for the easy fix for this :)

Still not sure I follow.  Can you explain what and why you are dooming and how you ensure in general you are not going to doom any regular content?

> 
> > @@ +1354,5 @@
> > > +  targetURI->GetAsciiSpec(uri);
> > > +  key.Append(uri);
> > > +
> > > +  char *value;
> > > +  rv = entry->GetMetaDataElement(key.BeginReading(), &value);
> > 
> > hmm... so, you store whole urls in the list. but you use only the origin
> > part from it anyway IIUC.
> > 
> > last time I asked about this (with the old seer version) you replied "I
> > collect for the future!"  it might not have been explicitly said but I
> > actually wanted you to optimize this by storing only what you really need
> > since we don't have any plans right now for using all this data (whole
> > urls).  counting number of loads for each origins a page references seems
> > better to me.  also instantly gives a hint how many connections to open.
> > 
> > so, please provide a strong and reasonable rational for storing whole URLs
> > or just store <origin>:<counter> instead of the whole URL.
> 
> The rationale is that the need for whole URLs is coming soon. Jeremy (now
> contracting for us) is working on finishing up the prefetch stuff in
> parallel with my work here, which will require the full URLs. It's a toss-up
> on who will finish their stuff first, but it's unnecessary churn to strip
> down to origins only when URLs will (at latest) be required shortly after
> (and very likely within the same release, so it doesn't buy us anything)

OK, could we limit the resources to just css/js?  Or better, to loads that are "blocking" i.e. must load before first paint?  ( attribute boolean loadAsBlocking; )

If you have some numbers that storing/prefetching images will have a positive performance impact as well then I can be persuaded.


> 
> > @@ +1576,5 @@
> > > +    gPredictor->mCacheDiskStorage->AsyncOpenURI(
> > > +        uri, idEnhance,
> > > +        nsICacheStorage::OPEN_READONLY|nsICacheStorage::OPEN_SECRETLY,
> > > +        this);
> > > +  }
> > 
> > when seeing these iterations, I'm thinking of using a distinct cache storage
> > for your data (something we don't have, would need to be implemented).
> > followup sufficient! (as well as more thinking for me about it first)
> 
> Heh, I thought I had suggested something similar originally, using my own
> namespace (though not separate storage) for all entries instead of along
> with the regular cache entries :)

I have to think of this more, there may be more users of storage areas separated not just by what nsILoadContextInfo provides.

> 
> So the one thing to note about Reset (which is where you're commenting on
> here)... it's not used much. Only in tests, and when people "Clear Recent
> History" either through the menu or on shutdown because they have that box
> checked. Right now (because of the way the seer was written originally), the
> functionality for clearing recent history is lumped in with "Browsing &
> Download History" instead of "Cache" (it seemed the best area to merge
> with). It's quite possible we should have the seer data cleared when the
> cache is cleared instead of browsing & download history now. That would mean
> the only place Reset would be called would be in the tests (and actually, if
> the cache has a notification when its clearing out is finished, we could get
> rid of the Reset interface entirely.)
> 
> Thoughts? 

I'd leave it at Browsing & Download History.  Don't worry that much about this.  It's up to the cache to optimize here, but if you have thoughts to do better you are welcome to do it.  But when it would be a big overkill.

> sooner rather than later? Or would you rather block until you return?

I'd like to take a look if possible.


One more note, I can see you are using only a single disk cache storage instance, bound to just [non-private,non-anonymous,app=0,not-in-browser] context.  You should consider respecting the context (probably except private, since the memory limit is small and your collected data would go away anyway)
(In reply to Honza Bambas (:mayhemer) from comment #15)
> The thing is that this particular problem is so major that I kinda doubt
> you've put this to a manual testing.  I've discovered this very quickly and
> it actually completely stopped me from verifying this whole patch does
> something at all, since the point where you store the predict data simply is
> immediately erased.  It then puts this patch to a bad light for me, despite
> the fix (true -> false) is so simple.

*sigh* I'm not going to spend time in bugzilla arguing whether or not I've done my job. Suffice to say, there was manual testing. Then once I got to try there was a failure that I fixed, and the fix caused this bug (at the time, I was both sick, and the fix was small enough that it seemed innocuous). You found a bug, that's part of what the review process is for. The bug will be fixed in the next iteration of the patch, let's move on.

> The thing is that when you create an entry, don't fill data nor metadata (or
> don't call metaDataReady() or setValid() or open the output stream), it
> won't store on disk or will be a state unusable for top levels.

Right. This code path is only EVER hit with non-"predictor::" entries in tests, literally. No point in gecko calls learn with LEARN_LOAD_TOPLEVEL for a non-"predictor::" cache entry (it doesn't have to). So we're guaranteed that this will never call metaDataReady on a real entry.

> Still not sure I follow.  Can you explain what and why you are dooming and
> how you ensure in general you are not going to doom any regular content?

Actually, thinking about it more, this dooming is cruft left over from a previous iteration of the patch (the same version that had manual testing, in fact) that is no longer necessary in this version. I'll just kill it.

> OK, could we limit the resources to just css/js?  Or better, to loads that
> are "blocking" i.e. must load before first paint?  ( attribute boolean
> loadAsBlocking; )
> 
> If you have some numbers that storing/prefetching images will have a
> positive performance impact as well then I can be persuaded.

I'm fine with doing this limiting. I'll poke at how to do it best.

> One more note, I can see you are using only a single disk cache storage
> instance, bound to just [non-private,non-anonymous,app=0,not-in-browser]
> context.  You should consider respecting the context (probably except
> private, since the memory limit is small and your collected data would go
> away anyway)

That makes sense. I explicitly don't do anything in PB mode anyway. Will do.
(In reply to Nicholas Hurley [:hurley] from comment #16)
> (In reply to Honza Bambas (:mayhemer) from comment #15)
> > The thing is that this particular problem is so major that I kinda doubt
> > you've put this to a manual testing.  I've discovered this very quickly and
> > it actually completely stopped me from verifying this whole patch does
> > something at all, since the point where you store the predict data simply is
> > immediately erased.  It then puts this patch to a bad light for me, despite
> > the fix (true -> false) is so simple.
> 
> *sigh* I'm not going to spend time in bugzilla arguing whether or not I've
> done my job. Suffice to say, there was manual testing. Then once I got to
> try there was a failure that I fixed, and the fix caused this bug (at the
> time, I was both sick, and the fix was small enough that it seemed
> innocuous). You found a bug, that's part of what the review process is for.
> The bug will be fixed in the next iteration of the patch, let's move on.

Aha!  This makes sense, this has happened to me as well in the past..  Do your automated tests catch this?  Maybe don't rush next time and just run the tests before r?, it's simple.

> 
> > The thing is that when you create an entry, don't fill data nor metadata (or
> > don't call metaDataReady() or setValid() or open the output stream), it
> > won't store on disk or will be a state unusable for top levels.
> 
> Right. This code path is only EVER hit with non-"predictor::" entries in
> tests, literally. No point in gecko calls learn with LEARN_LOAD_TOPLEVEL for
> a non-"predictor::" cache entry (it doesn't have to). So we're guaranteed
> that this will never call metaDataReady on a real entry.

The goal of my comment was to tell you that it's not a good idea to just create empty cache entries for common web content.  And actually even for "predictor::".  Since any entry that has zero data length and no metadata is considered empty and you'll never get the "lastFetched time" etc info from it anyway, you'll just get "not_found" from asyncOpenURI.

> 
> > Still not sure I follow.  Can you explain what and why you are dooming and
> > how you ensure in general you are not going to doom any regular content?
> 
> Actually, thinking about it more, this dooming is cruft left over from a
> previous iteration of the patch (the same version that had manual testing,
> in fact) that is no longer necessary in this version. I'll just kill it.

Cool.

> 
> > OK, could we limit the resources to just css/js?  Or better, to loads that
> > are "blocking" i.e. must load before first paint?  ( attribute boolean
> > loadAsBlocking; )
> > 
> > If you have some numbers that storing/prefetching images will have a
> > positive performance impact as well then I can be persuaded.
> 
> I'm fine with doing this limiting. I'll poke at how to do it best.

Good.

> 
> > One more note, I can see you are using only a single disk cache storage
> > instance, bound to just [non-private,non-anonymous,app=0,not-in-browser]
> > context.  You should consider respecting the context (probably except
> > private, since the memory limit is small and your collected data would go
> > away anyway)
> 
> That makes sense. I explicitly don't do anything in PB mode anyway. Will do.

Cool :)
(In reply to Honza Bambas (:mayhemer) from comment #17)
> Aha!  This makes sense, this has happened to me as well in the past..  Do
> your automated tests catch this?  Maybe don't rush next time and just run
> the tests before r?, it's simple.

Oh, the tests were run (like I said, totally green try run!) The automated tests don't catch this, though. However, given I'm going to remove the doom, it's rather irrelevant at this point.

> The goal of my comment was to tell you that it's not a good idea to just
> create empty cache entries for common web content.  And actually even for
> "predictor::".  Since any entry that has zero data length and no metadata is
> considered empty and you'll never get the "lastFetched time" etc info from
> it anyway, you'll just get "not_found" from asyncOpenURI.

Right, adding the "predictor::seen" metadata is how I worked around that :) I should probably not bother adding it to regular cache entires ("regular" being "any cache entries that don't have the "predictor-origin" id enhancement") to avoid the possibility of confusing things. At that point, I think we'll be at a safe place for both origin entries and non-origin entries.
(In reply to Honza Bambas (:mayhemer) from comment #15)
> One more note, I can see you are using only a single disk cache storage
> instance, bound to just [non-private,non-anonymous,app=0,not-in-browser]
> context.  You should consider respecting the context (probably except
> private, since the memory limit is small and your collected data would go
> away anyway)

Hrm... this might get a little tricky, when combined with Reset (or perhaps not, hence why I'm asking). If I need to iterate over every single cache entry (with any context!) as I do when doing the Reset, how do I do that? Is that achieved with the current [non-private,non-anonymous,app=0,not-in-browser] context? Or would I have to have some knowledge of all the contexts that have ever been used by the seer, and iterate over all the entries in all of those contexts separately? To (perhaps) make it simpler, is there some way to do:

for each entry in disk cache regardless of context:
    do stuff

or do I have to do something like:

for each context the seer has ever used:
    for each entry in this context:
        do stuff

?
(In reply to Nicholas Hurley [:hurley] from comment #19)
> (In reply to Honza Bambas (:mayhemer) from comment #15)
> > One more note, I can see you are using only a single disk cache storage
> > instance, bound to just [non-private,non-anonymous,app=0,not-in-browser]
> > context.  You should consider respecting the context (probably except
> > private, since the memory limit is small and your collected data would go
> > away anyway)
> 
> Hrm... this might get a little tricky, when combined with Reset (or perhaps
> not, hence why I'm asking). If I need to iterate over every single cache
> entry (with any context!) as I do when doing the Reset, how do I do that? Is
> that achieved with the current
> [non-private,non-anonymous,app=0,not-in-browser] context? Or would I have to
> have some knowledge of all the contexts that have ever been used by the
> seer, and iterate over all the entries in all of those contexts separately?
> To (perhaps) make it simpler, is there some way to do:
> 
> for each entry in disk cache regardless of context:
>     do stuff
> 
> or do I have to do something like:
> 
> for each context the seer has ever used:
>     for each entry in this context:
>         do stuff
> 
> ?

Right now the letter.  But that's of course totally crazy.  Just leave this now for a followup and use the 'default' disk cache storage.  I'll file a bug to make visiting and clearing this more convenient.
Attached patch patch (obsolete) — Splinter Review
Honza - new patch with pretty much all your comments addressed. The following are not included in this patch:

- a real parser for my metadata entries instead of strchr (will be in a followup)
- using a distinct cache storage for predictor stuff (will be in a followup pending your further thoughts on this)
- using real loadcontextinfo objects when opening cache entries (will be in a followup pending the ability to iterate everything in cache regardless of loadcontextinfo)
- limiting to just css and js - there are 2 things about this. (1) it's totally independent of code you would have to review (that change would be in imglib), and (2) I'm not sure this is the right course. I have to run the numbers through web page test with the new backend to see if it makes a difference (I suspect it will, since we don't just care about first paint for the predictor, but the size of that difference remains to be seen). Regardless, like I said, doing that limiting is totally independent of any of the code you've already seen, and is a 1-line change that would need r+ from an imglib peer, so I think we can proceed with the rest of the review without that.

I'll also post both an interdiff from the previous version and a patch containing predictor.cpp as a new file, just like last time, for ease of review..
Attachment #8498961 - Attachment is obsolete: true
Attachment #8498963 - Attachment is obsolete: true
Attachment #8501762 - Attachment is obsolete: true
Attachment #8511121 - Flags: review?(honzab.moz)
Attached patch interdiff (obsolete) — Splinter Review
Not marking r? on this, we'll just use the main patch to determine r+/-
Attached patch predictor.cpp.patch (obsolete) — Splinter Review
Not marking r? on this, we'll just use the main patch to determine r+/-
Attached patch patch (obsolete) — Splinter Review
Argh, there was one bit I forgot to add to support the impending prefetch functionality. It's not a major change, but it's good to have it in here for completeness sake. New interdiff and whole predictor.cpp patch coming, as well
Attachment #8511121 - Attachment is obsolete: true
Attachment #8511123 - Attachment is obsolete: true
Attachment #8511124 - Attachment is obsolete: true
Attachment #8511121 - Flags: review?(honzab.moz)
Attachment #8513870 - Flags: review?(honzab.moz)
Attached patch interdiff (obsolete) — Splinter Review
Attached patch predictor.cpp.patch (obsolete) — Splinter Review
Comment on attachment 8513875 [details] [diff] [review]
predictor.cpp.patch

Review of attachment 8513875 [details] [diff] [review]:
-----------------------------------------------------------------

Few comments:
- not sure all first review comments were addressed (but I might be wrong)
- when testing the patch I often see some faulty logs:

   - OnCacheEntryAvailable xxxxxxxx FAILED to get cache entry. Aborting.

For instance:
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: Learn originKey=http://www.janbambas.cz/ targetOrigin=http://www.janbambas.cz/ sourceOrigin=http://www.janbambas.cz/ reason=1 action=90a6ac0
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: CacheEntry::AsyncOpen [this=9042160, state=EMPTY, flags=34, callback=90a6ac0]
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: CacheEntry::RememberCallback [this=9042160, cb=90a6ac0, state=EMPTY]
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: CacheEntry::InvokeCallback [this=9042160, state=EMPTY, cb=90a6ac0]
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: CacheEntry::InvokeAvailableCallback [this=9042160, state=EMPTY, cb=90a6ac0, r/o=1, n/w=0]
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: OnCacheEntryAvailable 90a6ac0 called. entry=0 mFullUri=0 mPredict=0 mPredictReason=1 mLearnReason=1 mTargetURI=http://www.janbambas.cz/ mSourceURI=http://www.janbambas.cz/ mStackCount=0 isNew=0 result=-2142568387
2014-11-07 20:01:38.980000 UTC - [Main Thread] 0[e11140]: OnCacheEntryAvailable 90a6ac0 FAILED to get cache entry. Aborting.

And I've been to the page twice before that..  flags 34 = OPEN_RO + OPEN_SERCET, it was your predictor specific entry.



   - Predictor::ParseMetaDataEntry key=predictor::resource-count value=18
          could not find first comma

callstack:
 	xul.dll!mozilla::net::Predictor::ParseMetaDataEntry(0x0ca48498, 0x0c9d5d48, 0x0029f004, 0x0029effc, 0x1032e0ef, 0x0c826280) Line 1487	C++
 	xul.dll!mozilla::net::Predictor::CalculatePredictions(0x0c8fe1b0, 0x545d2d4b, 0x00000001, 0x00000000) Line 1073	C++
	xul.dll!mozilla::net::Predictor::PredictForPageload(0x0c8fe1b0, '\0', 0x00000000) Line 940	C++
 	xul.dll!mozilla::net::Predictor::PredictInternal(0x00000001, 0x0c8fe1b0, false, true, 0x0c6fc600, 0x00000000, '\0') Line 857	C++
 	xul.dll!mozilla::net::PredictorAction::OnCacheEntryAvailable(0x0c8fe1b0, false, 0x00000000, NS_OK) Line 285	C++
The entry is the resource entry (http://www.janbambas.cz/, eid='')

2014-11-07 20:37:03.293000 UTC - [Main Thread] 0[a11140]: Predict uri=http://www.janbambas.cz/ reason=1 action=cea4fc0
2014-11-07 20:37:03.293000 UTC - [Main Thread] 0[a11140]: CacheEntry::AsyncOpen [this=ce4f270, state=NOTLOADED, flags=34, callback=cea4fc0]
2014-11-07 20:37:03.293000 UTC - [Main Thread] 0[a11140]: CacheEntry::RememberCallback [this=ce4f270, cb=cea4fc0, state=NOTLOADED]
2014-11-07 20:37:03.440000 UTC - [Main Thread] 0[a11140]: CacheEntry::InvokeCallback [this=ce4f270, state=READY, cb=cea4fc0]
2014-11-07 20:37:03.440000 UTC - [Main Thread] 0[a11140]: CacheEntry::InvokeAvailableCallback [this=ce4f270, state=READY, cb=cea4fc0, r/o=1, n/w=0]
2014-11-07 20:37:03.440000 UTC - [Main Thread] 0[a11140]: OnCacheEntryAvailable cea4fc0 called. entry=c8fe1b0 mFullUri=1 mPredict=1 mPredictReason=1 mLearnReason=1 mTargetURI=http://www.janbambas.cz/ mSourceURI= mStackCount=0 isNew=0 result=0
2014-11-07 20:41:04.178000 UTC - [Main Thread] 0[a11140]:     could not find first comma 


- open the top page load entries as priority (OPEN_PRIORITY flag) or you may ruin performance
- not sure what priority to use for your origin entries..
- I would store only sub-resources that are set nsHttpChannel::mLoadAsBlocking (nsIHttpChannel.SetLoadAsBlocking(true) call)

There may be few more comments on the next iteration, it's just late for me today to continue this review now.

::: netwerk/base/src/Predictor.cpp
@@ +241,5 @@
> +  }
> +}
> +
> +PredictorAction::~PredictorAction()
> +{ }

{
}

@@ +267,5 @@
> +    mSourceURI->GetAsciiSpec(sourceURI);
> +  }
> +  PREDICTOR_LOG(("OnCacheEntryAvailable %p called. entry=%p mFullUri=%d mPredict=%d "
> +                 "mPredictReason=%d mLearnReason=%d mTargetURI=%s "
> +                 "mSourceURI=%s mStackCount=%d isNew=%d result=%d",

log result as 0x%08x

@@ +341,5 @@
> +{
> +  if (mInitialized)
> +    Shutdown();
> +
> +  RemoveObserver();

This won't work, since the obs service keeps a strong ref.  Maybe support weakref?

@@ +514,5 @@
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  if (!StringBeginsWith(nsDependentCString(asciiKey),
> +                        nsDependentCString(metaDataPrefix))) {

NS_LITERAL_CSTRING(metaDataPrefix) ?

@@ +523,5 @@
> +  nsCString key, value;
> +  key.AssignASCII(asciiKey);
> +  value.AssignASCII(asciiValue);
> +  mKeysToOperateOn.AppendElement(key);
> +  mValuesToOperateOn.AppendElement(value);

these members are still on the Predictor class?

@@ +543,5 @@
> +
> +#if defined(ANDROID) && !defined(MOZ_WIDGET_GONK)
> +  // This is an ugly hack to disable the predictor on android < 2.3, as it
> +  // doesn't play nicely with those android versions, at least on our infra.
> +  // Causes timeouts in reftests. See bug 881804 comment 86.

That comment doesn't say much.  Also, these failures could indicate a bug that only those platforms were able to hit.  I think this is something that should be investigated more deeply.

@@ +591,5 @@
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  mInitialized = true;
> +
> +  MaybeCleanupOldDBFiles();

This may happen too early.  Maybe use "browser-delayed-startup-finished" notification to start this with even some delay.

@@ +596,5 @@
> +
> +  return rv;
> +}
> +
> +class PredictorThreadShutdownRunner : public nsRunnable

all these helper classes must be in an anon namespace

@@ +629,5 @@
> +  ~PredictorOldCleanupRunner() { }
> +
> +  NS_IMETHOD Run() MOZ_OVERRIDE
> +  {
> +    MOZ_ASSERT(!NS_IsMainThread(), "Cleaning up old files on main thread!");

Maybe move this assertion inside CheckForAndDeleteOldDBFiles?

@@ +685,5 @@
> +  rv = dbFile->AppendNative(NS_LITERAL_CSTRING("netpredictions.sqlite"));
> +  RETURN_IF_FAILED(rv);
> +
> +  nsCOMPtr<nsIThread> ioThread;
> +  rv = NS_NewNamedThread("Net Predictor", getter_AddRefs(ioThread));

"Net Predictor Cleaner" ?

@@ +726,5 @@
> +
> +  rv = svc->Init();
> +  if (NS_FAILED(rv)) {
> +    PREDICTOR_LOG(("Failed to initialize predictor, predictor will be a noop"));
> +  }

There is NS_GENERIC_FACTORY_CONSTRUCTOR_INIT macro.  Please use it instead if possible.

@@ +922,5 @@
> +    redirectURI->GetAsciiSpec(redirectUriString);
> +    PREDICTOR_LOG(("Predict redirect uri=%s action=%p", redirectUriString.get(),
> +                   redirectAction.get()));
> +    mCacheDiskStorage->AsyncOpenURI(redirectURI, EmptyCString(),
> +                                    nsICacheStorage::OPEN_READONLY|nsICacheStorage::OPEN_SECRETLY,

space around |

@@ +924,5 @@
> +                   redirectAction.get()));
> +    mCacheDiskStorage->AsyncOpenURI(redirectURI, EmptyCString(),
> +                                    nsICacheStorage::OPEN_READONLY|nsICacheStorage::OPEN_SECRETLY,
> +                                    redirectAction);
> +    return RunPredictions(verifier);

rather as:

} else {
  CalculatePredictions(entry, lastLoad, loadCount, globalDegradation);
}

return RunPredictions(verifier);

?

On the other hand, what is the meaning of calling RunPredictions() at this point?

@@ +943,5 @@
> +
> +  int32_t globalDegradation = CalculateGlobalDegradation(mLastStartupTime);
> +  CalculatePredictions(entry, mLastStartupTime, mStartupCount,
> +                       globalDegradation);
> +  return RunPredictions(verifier);

looking at this code, CalculatePredictions may not be the best name since it schedules/setups predictions so that RunPredictions has a meaning after.

@@ +1053,5 @@
> +  // Since the visitor gets called under a cache lock, all we do there is get
> +  // copies of the keys/values we care about, and then do the real work here
> +  mKeysToOperateOn.Clear();
> +  mValuesToOperateOn.Clear();
> +  entry->VisitMetaData(this);

after this point, swap keys to local arrays

@@ +1101,5 @@
> +
> +  bool predicted = false;
> +  uint32_t len, i;
> +
> +  Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PREDICTIONS> totalPredictions;

isn't PREDICTOR_TOTAL_PREDICTIONS redundant?

@@ +1107,5 @@
> +  Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PRERESOLVES> totalPreresolves;
> +
> +  len = mPreconnects.Length();
> +  for (i = 0; i < len; ++i) {
> +    nsCOMPtr<nsIURI> uri = mPreconnects[i];

please swap() mPreconnects to a local array first.  That will avoid any even only potential reentrancy problems.

Please do this for all 'collective' members you have in the Predictor class.

@@ +1191,5 @@
> +  switch (reason) {
> +  case nsINetworkPredictor::LEARN_LOAD_TOPLEVEL:
> +    // This is the only case when we want to update the 'last used' metadata on
> +    // the cache entry we're getting. Right now, this is only used for tests.
> +    openFlags = nsICacheStorage::OPEN_NORMALLY;

sorry, this doesn't sound good to me, even this is just executed by tests.

@@ +1220,5 @@
> +  ++learnAttempts;
> +
> +  if (reason == nsINetworkPredictor::LEARN_STARTUP) {
> +    uriKey = mStartupURI;
> +    originKey = mStartupURI;

hmm.. it's really very very confusing, rather adjust the case code (duplicate) and get rid of this if(){} branch all together.  This is hard to track.

@@ +1249,5 @@
> +  uriKey->GetAsciiSpec(uriKeyStr);
> +  targetURI->GetAsciiSpec(targetUriStr);
> +  if (sourceURI) {
> +    sourceURI->GetAsciiSpec(sourceUriStr);
> +  }

all these are only for logging.  first, add some blank lines to separate it from the other executive code and second #define PR_LOGGING it.  Applies to more places.

@@ +1280,5 @@
> +    }
> +
> +    entry->SetMetaDataElement("predictor::seen", "1");
> +    // Need to ensure someone else can get to the entry if necessary
> +    entry->MetaDataReady();

this also doesn't sound right.  giving out an empty entry as an entry to read by others is simply a bad idea.  unless this is just the specific predictor entry and not a content entry.  then I could accept it.  you don't need to call this when you are opening as READONLY.

@@ +1385,5 @@
> +  nsCOMPtr<nsIURI> junk;
> +  uint32_t hitCount, lastHit, flags;
> +  bool isNewResource = (NS_FAILED(rv) ||
> +                        !ParseMetaDataEntry(key.BeginReading(), value.BeginReading(),
> +                                            getter_AddRefs(junk), hitCount,

please allow aURI of ParseMetaDataEntry be null (i.e. to not create an URI when you don't need it, it's wasting).
then you probably also don't need to pas |key|

@@ +1506,5 @@
> +  value = comma + 1;
> +  flags = static_cast<uint32_t>(atoi(value));
> +  PREDICTOR_LOG(("    flags -> %u", flags));
> +
> +  const char *uriStart = key + strlen(metaDataPrefix);

brave..

@@ +1580,5 @@
> +  }
> +
> +  --mEntriesToVisit;
> +  if (!mEntriesToVisit) {
> +    Complete();

hmm.. the OnCacheEntryAvailable callback can be called synchronously, so you can call this too early and also multiple times.

@@ +1592,5 @@
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  if (!StringBeginsWith(nsDependentCString(asciiKey),
> +                        nsDependentCString(metaDataPrefix))) {

LITERAL_CSTRING

@@ +1625,5 @@
> +  uri->GetAsciiSpec(key);
> +  if (idEnhance.EqualsLiteral(PREDICTOR_ORIGIN_EXTENSION)) {
> +    // This is an entry we own, so we can just doom it entirely
> +    mPredictor->mCacheDiskStorage->AsyncDoomURI(uri, idEnhance, nullptr);
> +  } else if (idEnhance.EqualsLiteral("")) {

IsEmpty(), and why it actually has to be empty?
Attachment #8513875 - Flags: feedback-
Comment on attachment 8513870 [details] [diff] [review]
patch

Review of attachment 8513870 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/Predictor.h
@@ +145,5 @@
> +  // information from the cache. Returns true if any predictions were queued up
> +  //   * reason - What kind of prediction this is/why this prediction is
> +  //              happening (pageload, startup)
> +  //   * entry - the cache entry with the information we need
> +  //   * isNew - whether or not the cache entry is brand new and empty

I'd call it isEntryNew

@@ +335,2 @@
>  
> +  static Predictor *sSelf;

please add good descriptive comments to all members or at least each group.

::: netwerk/test/unit/test_predictor.js
@@ +77,5 @@
>  
>    onPredictPreconnect: function verifier_onPredictPreconnect(uri) {
>      var origin = extract_origin(uri);
>      var index = this.expected_preconnects.indexOf(origin);
> +    if (index == -1 && !this.complete) {

what is this !this.complete check for?
Attachment #8513870 - Flags: review?(honzab.moz) → feedback-
New patch incoming, here's some explanation on some of the harder-to-understand changes/non-changes in the new version.


(In reply to Honza Bambas (:mayhemer) from comment #27)
> @@ +341,5 @@
> > +{
> > +  if (mInitialized)
> > +    Shutdown();
> > +
> > +  RemoveObserver();
> 
> This won't work, since the obs service keeps a strong ref.  Maybe support
> weakref?

Fixed (dtor was the wrong place to do this in the first place, it should've been done in shutdown).

> @@ +543,5 @@
> > +
> > +#if defined(ANDROID) && !defined(MOZ_WIDGET_GONK)
> > +  // This is an ugly hack to disable the predictor on android < 2.3, as it
> > +  // doesn't play nicely with those android versions, at least on our infra.
> > +  // Causes timeouts in reftests. See bug 881804 comment 86.
> 
> That comment doesn't say much.  Also, these failures could indicate a bug
> that only those platforms were able to hit.  I think this is something that
> should be investigated more deeply.

I'm (now) 99% certain these failures were indicators of the sqlite issues (this code is left over from that time). However, we have no way to verify, since there's no test coverage for android < 2.3 any more.

> @@ +596,5 @@
> > +
> > +  return rv;
> > +}
> > +
> > +class PredictorThreadShutdownRunner : public nsRunnable
> 
> all these helper classes must be in an anon namespace

Doesn't compile (classes in anon namespaces can't be friends of classes outside the anon namespace).

> @@ +685,5 @@
> > +  rv = dbFile->AppendNative(NS_LITERAL_CSTRING("netpredictions.sqlite"));
> > +  RETURN_IF_FAILED(rv);
> > +
> > +  nsCOMPtr<nsIThread> ioThread;
> > +  rv = NS_NewNamedThread("Net Predictor", getter_AddRefs(ioThread));
> 
> "Net Predictor Cleaner" ?

Too long :) (16-char limit on thread names, iirc). Making this name "better" isn't really worth spending thought-cycles on for a thread that will live only for a very short time.

> @@ +1249,5 @@
> > +  uriKey->GetAsciiSpec(uriKeyStr);
> > +  targetURI->GetAsciiSpec(targetUriStr);
> > +  if (sourceURI) {
> > +    sourceURI->GetAsciiSpec(sourceUriStr);
> > +  }
> 
> all these are only for logging.  first, add some blank lines to separate it
> from the other executive code and second #define PR_LOGGING it.  Applies to
> more places.

FORCE_PR_LOG is defined by default on all builds now (including release), so ifdef'ing everything is not really worthwhile.

> @@ +1280,5 @@
> > +    }
> > +
> > +    entry->SetMetaDataElement("predictor::seen", "1");
> > +    // Need to ensure someone else can get to the entry if necessary
> > +    entry->MetaDataReady();
> 
> this also doesn't sound right.  giving out an empty entry as an entry to
> read by others is simply a bad idea.  unless this is just the specific
> predictor entry and not a content entry.  then I could accept it.  you don't
> need to call this when you are opening as READONLY.

This is only for predictor-specific entries, so no problems.

> @@ +1506,5 @@
> > +  value = comma + 1;
> > +  flags = static_cast<uint32_t>(atoi(value));
> > +  PREDICTOR_LOG(("    flags -> %u", flags));
> > +
> > +  const char *uriStart = key + strlen(metaDataPrefix);
> 
> brave..

This is guaranteed to only be called when we know the key starts with metaDataPrefix, so it's safe.

> @@ +1592,5 @@
> > +{
> > +  MOZ_ASSERT(NS_IsMainThread());
> > +
> > +  if (!StringBeginsWith(nsDependentCString(asciiKey),
> > +                        nsDependentCString(metaDataPrefix))) {
> 
> LITERAL_CSTRING

Doesn't compile :) (I tried that a long time ago.)
Attached patch patch (obsolete) — Splinter Review
Attachment #8513870 - Attachment is obsolete: true
Attachment #8513872 - Attachment is obsolete: true
Attachment #8513875 - Attachment is obsolete: true
Attachment #8527831 - Flags: review?(honzab.moz)
(In reply to Nicholas Hurley [:hurley] from comment #29)
> > That comment doesn't say much.  Also, these failures could indicate a bug
> > that only those platforms were able to hit.  I think this is something that
> > should be investigated more deeply.
> 
> I'm (now) 99% certain these failures were indicators of the sqlite issues
> (this code is left over from that time). However, we have no way to verify,
> since there's no test coverage for android < 2.3 any more.

Hmm.. if I understand correctly, should this be removed?

> 
> > @@ +596,5 @@
> > > +
> > > +  return rv;
> > > +}
> > > +
> > > +class PredictorThreadShutdownRunner : public nsRunnable
> > 
> > all these helper classes must be in an anon namespace
> 
> Doesn't compile (classes in anon namespaces can't be friends of classes
> outside the anon namespace).

Then please use 'detail' subnamespace or nested the classes to Predictor.

> 
> > @@ +685,5 @@
> > > +  rv = dbFile->AppendNative(NS_LITERAL_CSTRING("netpredictions.sqlite"));
> > > +  RETURN_IF_FAILED(rv);
> > > +
> > > +  nsCOMPtr<nsIThread> ioThread;
> > > +  rv = NS_NewNamedThread("Net Predictor", getter_AddRefs(ioThread));
> > 
> > "Net Predictor Cleaner" ?
> 
> Too long :) (16-char limit on thread names, iirc). Making this name "better"
> isn't really worth spending thought-cycles on for a thread that will live
> only for a very short time.

And it is worth, when someone is inspecting logs/crashes then the spotted name should tell him exactly what's going on.  "NetPredictClean" is 15 chars.  You will fit.

> 
> > @@ +1506,5 @@
> > > +  value = comma + 1;
> > > +  flags = static_cast<uint32_t>(atoi(value));
> > > +  PREDICTOR_LOG(("    flags -> %u", flags));
> > > +
> > > +  const char *uriStart = key + strlen(metaDataPrefix);
> > 
> > brave..
> 
> This is guaranteed to only be called when we know the key starts with
> metaDataPrefix, so it's safe.

The Lexan class will nicely fix this ;)  See bug 1024056, there is already a patch for tasting.

> 
> > @@ +1592,5 @@
> > > +{
> > > +  MOZ_ASSERT(NS_IsMainThread());
> > > +
> > > +  if (!StringBeginsWith(nsDependentCString(asciiKey),
> > > +                        nsDependentCString(metaDataPrefix))) {
> > 
> > LITERAL_CSTRING
> 
> Doesn't compile :) (I tried that a long time ago.)

Don't have static const char for this and only #define the string.  I can also see you are doing |strlen(metaDataPrefix)| on one place.  Not really good.

Going to look at the actual patch.
Hmm.. just tried


static const char metaDataPrefix[] = "predictor::";
nsresult
Predictor::OnMetaDataElement(const char *asciiKey, const char *asciiValue)
{
  MOZ_ASSERT(NS_IsMainThread());

  if (!StringBeginsWith(nsDependentCString(asciiKey),
                        NS_LITERAL_CSTRING(metaDataPrefix))) {


and it builds.  On Win.
(In reply to Honza Bambas (:mayhemer) from comment #27)
> Comment on attachment 8513875 [details] [diff] [review]
> predictor.cpp.patch
> 
> Review of attachment 8513875 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Few comments:
> - not sure all first review comments were addressed (but I might be wrong)
> - when testing the patch I often see some faulty logs:

I still see 'could not find first comma' in logs.  You have not reacted to this concern before.  If that is expected, maybe the message should be changed to something less 'error sound' and we are done ?

> @@ +726,5 @@
> > +
> > +  rv = svc->Init();
> > +  if (NS_FAILED(rv)) {
> > +    PREDICTOR_LOG(("Failed to initialize predictor, predictor will be a noop"));
> > +  }
> 
> There is NS_GENERIC_FACTORY_CONSTRUCTOR_INIT macro.  Please use it instead
> if possible.

Left unaddressed.

> rather as:
> 
> } else {
>   CalculatePredictions(entry, lastLoad, loadCount, globalDegradation);
> }
> 
> return RunPredictions(verifier);
> 
> ?
> 
> On the other hand, what is the meaning of calling RunPredictions() at this
> point?

Unanswered.

> @@ +1101,5 @@
> > +
> > +  bool predicted = false;
> > +  uint32_t len, i;
> > +
> > +  Telemetry::AutoCounter<Telemetry::PREDICTOR_TOTAL_PREDICTIONS> totalPredictions;
> 
> isn't PREDICTOR_TOTAL_PREDICTIONS redundant?

Unanswered.

> @@ +1191,5 @@
> > +  switch (reason) {
> > +  case nsINetworkPredictor::LEARN_LOAD_TOPLEVEL:
> > +    // This is the only case when we want to update the 'last used' metadata on
> > +    // the cache entry we're getting. Right now, this is only used for tests.
> > +    openFlags = nsICacheStorage::OPEN_NORMALLY;
> 
> sorry, this doesn't sound good to me, even this is just executed by tests.

Unanswered.  Actually, if you don't need to create an entry for this (when it doesn't exist) then OPEN_READONLY will also update the last used metadata for you while we are left safe.

On the other hand, I want to avoid empty entries for content cache.  I will never r+ something that does this.  If you need it, then the architecture may be wrong.

> @@ +1385,5 @@
> > +  nsCOMPtr<nsIURI> junk;
> > +  uint32_t hitCount, lastHit, flags;
> > +  bool isNewResource = (NS_FAILED(rv) ||
> > +                        !ParseMetaDataEntry(key.BeginReading(), value.BeginReading(),
> > +                                            getter_AddRefs(junk), hitCount,
> 
> please allow aURI of ParseMetaDataEntry be null (i.e. to not create an URI
> when you don't need it, it's wasting).
> then you probably also don't need to pas |key|

Unaddressed.  Creating URIs is not cheap.

> @@ +1625,5 @@
> > +  uri->GetAsciiSpec(key);
> > +  if (idEnhance.EqualsLiteral(PREDICTOR_ORIGIN_EXTENSION)) {
> > +    // This is an entry we own, so we can just doom it entirely
> > +    mPredictor->mCacheDiskStorage->AsyncDoomURI(uri, idEnhance, nullptr);
> > +  } else if (idEnhance.EqualsLiteral("")) {
> 
> IsEmpty(), and why it actually has to be empty?

Unanswered.
(In reply to Honza Bambas (:mayhemer) from comment #31)
> Hmm.. if I understand correctly, should this be removed?

Quite likely, but that can be done in a followup - similar to a lot of things you've commented on, this is code that has already been around for a while, and not (directly) part of the backend rewrite (which is what this bug is about). I'm not opposed to changing this or other things, but not as part of this bug.

> And it is worth, when someone is inspecting logs/crashes then the spotted
> name should tell him exactly what's going on.  "NetPredictClean" is 15
> chars.  You will fit.

I still disagree, given the fact that this thread will exist once per profile (in other words, almost never), but since it seems to be a sticking point with you, I won't spend more cycles defending it.

> The Lexan class will nicely fix this ;)  See bug 1024056, there is already a
> patch for tasting.

Fine. When that has actually landed, file a follow-up. If it lands before this patch does, let's still file a follow-up to avoid unnecessary churn on this bug.

(In reply to Honza Bambas (:mayhemer) from comment #32)
> and it builds.  On Win.

Gotta love platform differences... my mac hates it with a passion :) I'll switch to #defines

(In reply to Honza Bambas (:mayhemer) from comment #33)
> I still see 'could not find first comma' in logs.  You have not reacted to
> this concern before.  If that is expected, maybe the message should be
> changed to something less 'error sound' and we are done ?

This is from the predictor::seen metadata entry on predictor-only cache entries. Definitely not fatal, but I'll filter that metadata entry out before getting to this check to avoid the scary log.

> > There is NS_GENERIC_FACTORY_CONSTRUCTOR_INIT macro.  Please use it instead
> > if possible.
> 
> Left unaddressed.

Again, existing code. We can file a follow-up to clean that up (though I have in my mind a vague memory of that not working for me the first time around with the seer for some reason... even more reason to do it as a follow-up).

> > On the other hand, what is the meaning of calling RunPredictions() at this
> > point?
> 
> Unanswered.

Once again, existing code. To answer why, though, it's to set up the speculative connection to the redirect target.

> > isn't PREDICTOR_TOTAL_PREDICTIONS redundant?
> 
> Unanswered.

Yes, but again, existing code. Plus it's a good sanity check to ensure we're accumulating telemetry properly for all the rest of the predictions we run (even more important as we add more prediction types in the future).

> Unanswered.  Actually, if you don't need to create an entry for this (when
> it doesn't exist) then OPEN_READONLY will also update the last used metadata
> for you while we are left safe.
> 
> On the other hand, I want to avoid empty entries for content cache.  I will
> never r+ something that does this.  If you need it, then the architecture
> may be wrong.

It's addressed in the new version of the patch (OPEN_NORMALLY will only happen for predictor-exclusive entries). I did, however, just notice that I failed to update the comment, which is confusing. Will do in the next iteration.

> > please allow aURI of ParseMetaDataEntry be null (i.e. to not create an URI
> > when you don't need it, it's wasting).
> > then you probably also don't need to pas |key|
> 
> Unaddressed.  Creating URIs is not cheap.

The key is now allowed to be null, and we won't create a URI when that's the case. The one place we create a URI without actually using it is to make sure the metadata entry parses correctly (a sanity check), which can't be done without ensuring it has a valid URI in the key.

> > IsEmpty(), and why it actually has to be empty?
> 
> Unanswered.

Changed to be IsEmpty, and it has to be empty because the predictor won't open any cache entries with an idEnhance other than "" or PREDICTOR_ORIGIN_EXTENSION, and it doesn't make sense to operate on cache entries that couldn't possibly have predictor data on them. I can add a comment to that effect.
> (In reply to Honza Bambas (:mayhemer) from comment #32)
> > and it builds.  On Win.
> 
> Gotta love platform differences... my mac hates it with a passion :) I'll
> switch to #defines

Argh... this was because I had changed *all* the nsDependentCStrings, not just the ones on the static const char[]s. *facepalm* That's an easier fix.
(In reply to Nicholas Hurley [:hurley] from comment #34)
> I still disagree, given the fact that this thread will exist once per
> profile (in other words, almost never), but since it seems to be a sticking
> point with you, I won't spend more cycles defending it.

It's just about changing a string to make it more descriptive :)

> The key is now allowed to be null, and we won't create a URI when that's the
> case. The one place we create a URI without actually using it is to make
> sure the metadata entry parses correctly (a sanity check), which can't be
> done without ensuring it has a valid URI in the key.

Aha, then please rename |junk| to |sanityCheck|.

> 
> > > IsEmpty(), and why it actually has to be empty?
> > 
> > Unanswered.
> 
> Changed to be IsEmpty, and it has to be empty because the predictor won't
> open any cache entries with an idEnhance other than "" or
> PREDICTOR_ORIGIN_EXTENSION, and it doesn't make sense to operate on cache
> entries that couldn't possibly have predictor data on them. I can add a
> comment to that effect.

Aha.  OK, I'm just not sure why you are not operating on entries that do have eid.  Those are also valid entries.  But that is disputable and probably not even worth filing a bug right now.  Let's see if there is a need in the future.

Thanks.
Comment on attachment 8527831 [details] [diff] [review]
patch

Review of attachment 8527831 [details] [diff] [review]:
-----------------------------------------------------------------

So, this looks good.  Few nits left.  Next time I'll focus on sequencing of concurrent open and read operations on cache entries to make sure we are not making any significant downside anywhere.

::: netwerk/base/src/Predictor.cpp
@@ +250,5 @@
> +PredictorAction::OnCacheEntryCheck(nsICacheEntry *entry,
> +                                   nsIApplicationCache *appCache,
> +                                   uint32_t *result)
> +{
> +  MOZ_ASSERT(NS_IsMainThread(), "Got cache entry check off main thread!");

please use the CHECK_MULTITHREADED flag and let this happen on any thread.  It's important since any load that demands check on the main (caller) thread actually pushes all other checks pending on the entry as well to the main thread.  When the open sequence is Predictor/nsHttpChannel quickly one by one, you may just block on the main thread queue.  Bad thing to do.

@@ +691,5 @@
> +  if (!mEnabled || mCleanedUp) {
> +    return;
> +  }
> +
> +  mCleanedUp = true;

should be set when actually done?  when the files could not be deleted from some reason you leave them on disk, forever

@@ +1272,5 @@
> +                 "action=%p", originKeyStr.get(), targetOriginStr.get(),
> +                 sourceOriginStr.get(), reason, originAction.get()));
> +  mCacheDiskStorage->AsyncOpenURI(originKey,
> +                                  NS_LITERAL_CSTRING(PREDICTOR_ORIGIN_EXTENSION),
> +                                  openFlags, originAction);

Performance nit: please first open the resource entry, only then your predictor specific one.  This may start a file opening (and block) sooner than the AsyncOpenURI call bellow, despite the priority flag.

We now can open only sequentially.

@@ +1288,5 @@
> +                 "action=%p", uriKeyStr.get(), targetUriStr.get(),
> +                 sourceUriStr.get(), reason, uriAction.get()));
> +  // For learning full URI things, we *always* open readonly and secretly, as we
> +  // rely on actual pageloads to update the entry's metadata for us.
> +  openFlags = nsICacheStorage::OPEN_READONLY | nsICacheStorage::OPEN_SECRETLY;

For LEARN_LOAD_TOPLEVEL you should open with OPEN_PRIORITY.  Actually, now it's you who first opens the entry.  The first open is the one telling the load back end to prioritize or not.
Attachment #8527831 - Flags: review?(honzab.moz) → feedback+
Everything you've said sounds good, just a clarifying question to make sure this is done right.

(In reply to Honza Bambas (:mayhemer) from comment #37)
> ::: netwerk/base/src/Predictor.cpp
> @@ +250,5 @@
> > +PredictorAction::OnCacheEntryCheck(nsICacheEntry *entry,
> > +                                   nsIApplicationCache *appCache,
> > +                                   uint32_t *result)
> > +{
> > +  MOZ_ASSERT(NS_IsMainThread(), "Got cache entry check off main thread!");
> 
> please use the CHECK_MULTITHREADED flag and let this happen on any thread. 
> It's important since any load that demands check on the main (caller) thread
> actually pushes all other checks pending on the entry as well to the main
> thread.  When the open sequence is Predictor/nsHttpChannel quickly one by
> one, you may just block on the main thread queue.  Bad thing to do.

Makes perfect sense. I just want to make sure that even with CHECK_MULTITHREADED, OnCacheEntryAvailable will still be called on the main thread (the limitation of needing to mess around with nsIURIs).

> @@ +691,5 @@
> > +  if (!mEnabled || mCleanedUp) {
> > +    return;
> > +  }
> > +
> > +  mCleanedUp = true;
> 
> should be set when actually done?  when the files could not be deleted from
> some reason you leave them on disk, forever

Setting this bool won't really matter, since we could just try again on the next startup. However, setting the pref (done in PredictorThreadShutdownRunner) is what will hit us here - even if the deletes fail, we'll set the pref to true, so we could end up with the files still on disk. I'll fix that to do the right thing if the deletes fail.
Attached patch patchSplinter Review
OK, this should be everything you asked for. It's possible I'll need to update this to fix build failures on try (the last time I found errors with anonymous namespaces wasn't until it hit try - my mac liked them just fine), but this should be pretty close to a patch that will pass try (previous versions have, so any changes required for this one should be pretty minor).
Attachment #8527831 - Attachment is obsolete: true
Attachment #8528709 - Flags: review?(honzab.moz)
Comment on attachment 8528709 [details] [diff] [review]
patch

Review of attachment 8528709 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks good.

r=honzab

When all of you new I'd been seriously ill and this was an important work (a goal) I don't follow why you haven't found someone else to do this final nits review.  You could land sooner w/o a deeper review from me (I won't do any deeper inspections of this code anyway till the new year) and see how it behaves whole two weeks before the Christmas easily.

To land this today - between xmas holidays - is not wise.  I'd rather wait when we all are back full time.  We've screwed up once, so why to haste?

::: netwerk/base/src/Predictor.cpp
@@ +538,5 @@
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  if (!StringBeginsWith(nsDependentCString(asciiKey),
> +                        NS_LITERAL_CSTRING(metaDataPrefix)) ||
> +      !strcmp(asciiKey, seenMetaData)) {

Nit: strcmp can also be converted as:

NS_LITERAL_CSTRING(seenMetaData).Equals(asciiKey)

I think...
Attachment #8528709 - Flags: review?(honzab.moz) → review+
So a combination of https://treeherder.mozilla.org/#/jobs?repo=try&revision=1adba174cf65 (everything in the patch attached here) plus https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1beb1549cf1 (fixing two minor static analysis failures that showed up in the new year once static analysis was enabled on infra, plus the nit honza had) show a happy try with this patch. Let's see what non-try dislikes :)

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/f925d9830745
https://hg.mozilla.org/mozilla-central/rev/f925d9830745
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
(In reply to Nicholas Hurley [:hurley] from comment #29)

> > > +#if defined(ANDROID) && !defined(MOZ_WIDGET_GONK)
> > > +  // This is an ugly hack to disable the predictor on android < 2.3, as it
> > > +  // doesn't play nicely with those android versions, at least on our infra.
> > > +  // Causes timeouts in reftests. See bug 881804 comment 86.
> > 
> > That comment doesn't say much.  Also, these failures could indicate a bug
> > that only those platforms were able to hit.  I think this is something that
> > should be investigated more deeply.
> 
> I'm (now) 99% certain these failures were indicators of the sqlite issues
> (this code is left over from that time). However, we have no way to verify,
> since there's no test coverage for android < 2.3 any more.

Skimming through this bug and saw this -- we only support Android 2.3 and higher, which is why there's no test coverage, so you're in the clear there.
Nicholas: I don't see any telemetry in this (pretty big!) commit for disk or memory usage increases for the cache. Am I missing something?
Flags: needinfo?(hurley)
(In reply to Richard Newman [:rnewman] from comment #43)
> Skimming through this bug and saw this -- we only support Android 2.3 and
> higher, which is why there's no test coverage, so you're in the clear there.

Excellent, thanks! I'll file a follow-up to remove that code and make sure everything stays good so we can get that ugly hack out of there :)
Flags: needinfo?(hurley)
(In reply to Richard Newman [:rnewman] from comment #44)
> Nicholas: I don't see any telemetry in this (pretty big!) commit for disk or
> memory usage increases for the cache. Am I missing something?

That's a good question, actually. Everything in here is still controlled by the limits in the cache, but I don't know if we have any telemetry to follow to see what changes we get. Honza, do you know if there's any telemetry we have that would show changes in disk and memory usage before and after this landed?
Flags: needinfo?(honzab.moz)
Depends on: 1122540
Looks like a feature that we'll want to note once it moves past Nightly.
relnote-firefox: --- → ?
On the subject of which: certainly for mobile, and perhaps for desktop, I'd like to put this pref change behind a Nightly-only flag until we have a strong idea of its footprint. Nicholas, any thoughts?
Release Note Request (optional, but appreciated)
[Why is this notable]: faster browsing
[Suggested wording]: Allow speculative connection warmup to improve page load times.
[Links (documentation, blog post, etc)]: (I suspect :hurley or :mayhemer already published a blog post or is planning to do so ;) ).

(This was previously implemented but blocked on performance. This bug reimplemented the backend on top of the new HTTP cache backend and also enabled the predictor on all platforms except B2G, judging by the patch.)
Depends on: 1122952
Was there any test performed for this that it works correctly now? As I remember in the past we have big performance problems like these in bug #966469 and bug #945779.
(In reply to Nicholas Hurley [:hurley] from comment #46)
> (In reply to Richard Newman [:rnewman] from comment #44)
> > Nicholas: I don't see any telemetry in this (pretty big!) commit for disk or
> > memory usage increases for the cache. Am I missing something?
> 
> That's a good question, actually. Everything in here is still controlled by
> the limits in the cache, but I don't know if we have any telemetry to follow
> to see what changes we get. Honza, do you know if there's any telemetry we
> have that would show changes in disk and memory usage before and after this
> landed?

We have tests (AWSY?) that may show any changes.  We also may want to watch the hit/miss ratio.  Ans also page load time, if this really is going to have some impact.  Honestly, the subresource lists specific entries may trigger more eviction -> less hit -> lower page load time.  But we'll see, this needs tuning that cannot be done w/o external data.
Flags: needinfo?(honzab.moz)
Duplicate of this bug: 966469
Duplicate of this bug: 973871
Duplicate of this bug: 1045884
Duplicate of this bug: 961175
Is this OK to take up to Aurora at this point?  Do we have enough information to confirm there will not be major performance regressions with a larger user base?
Flags: needinfo?(hurley)
Lukas, yes, I'm confident at this point that this won't be a regression-causing patch. Of course, I will continue to keep my eye on things as this code moves along the trains :)
Flags: needinfo?(hurley)
Noted as:

Changed: Page load times improved by speculative connection warmup
No longer depends on: 1271944
You need to log in before you can comment on or make changes to this bug.