Closed Bug 881804 Opened 11 years ago Closed 11 years ago

add interface for predictive actions

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: u408661, Assigned: u408661)

References

(Depends on 1 open bug)

Details

(Keywords: feature, relnote, Whiteboard: [qa-])

Attachments

(6 files, 19 obsolete files)

3.17 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.64 KB, patch
jst
: review+
Details | Diff | Splinter Review
1.89 KB, patch
ttaubert
: review+
mossop
: review+
Gavin
: feedback+
Details | Diff | Splinter Review
3.59 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
2.01 KB, patch
seth
: review+
Details | Diff | Splinter Review
113.45 KB, patch
mcmanus
: review+
u408661
: superreview+
Details | Diff | Splinter Review
Similar to nsISpeculativeConnect, but we should be smarter about what actions we take (DNS prefetch versus TCP preconnect) based on previous actual actions taken by the user. We should also hook it into more places.

Doing less privacy-preserving predictions (such as prefetching resources or prerendering entire pages) is explicitly NOT a goal of this bug, we're limiting to DNS prefetch and TCP preconnect (which includes SSL handshake).
Attached patch seer implementation patch (obsolete) — Splinter Review
This patch provides the implementation for the network seer as it stands right now. Notable TODOs include:

*Handling redirects (placeholders in... place for this)
*Integrating with "Clear Recent History..."
*Probably a whole lot of dumb things that I totally missed

Example integrations coming in yet another patch.
Attached patch example integrations (obsolete) — Splinter Review
OK, here's the example integrations for docshell (handles both hovering over a link and loading a page) and nsScriptLoader (which is used to learn about scripts loaded by a page, surprise, surprise).

Right now, docshell is the only one that deals with private browsing mode. Obviously, all the integrations will do this in the future, but I had something that (vaguely) worked, and I wanted to get it up for feedback ASAP. Note the well-placed TODO in nsScriptLoader.cpp to remind me of this :)
Attachment #761693 - Flags: feedback?(mcmanus)
Attachment #761693 - Flags: feedback?(jduell.mcbugs)
Attachment #761693 - Flags: feedback?(honzab.moz)
Attachment #761696 - Flags: feedback?(bzbarsky)
Boris, please feel free to pass the f? for docshell, etc on to someone more appropriate.

That goes for everyone, actually. If there's someone I missed, please add them to the f? chain.
Comment on attachment 761696 [details] [diff] [review]
example integrations

Seems fairly reasonable, modulo the naming ("NetworkSeer" sounds odd... maybe "RequestPredictor"?).
Attachment #761696 - Flags: feedback?(bzbarsky) → feedback+
That said, necko already knows when we're starting a new document load, so I'm not sure why we have to explicitly tell it that bit...
Comment on attachment 761693 [details] [diff] [review]
seer implementation patch

Review of attachment 761693 [details] [diff] [review]:
-----------------------------------------------------------------

overall, I think this is right.. awesome! I made a few suggestions about getting a couple other things in here in the first cut that are pretty doable in my mind (awesomebar.. maybe cache warming)

I tried to just provide feedback on the top level things.. the interfaces, the h files, etc.. for the cpp I just kind of did a drive by of a few things, not a real review.

I would love to have a plan for how to evaluate this patch. Can you suggest a metric we have (or more likely can build) that this patch should drive?

::: netwerk/base/public/nsINetworkSeer.idl
@@ +1,5 @@
> +/* vim: set ts=2 sts=2 et sw=2: */
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +

I like comments at the top of an IDL saying what the heck this interface is all about.

@@ +6,5 @@
> +#include "nsISupports.idl"
> +
> +interface nsIURI;
> +
> +typedef long SeerPredictReason;

probly unsigned for enums

@@ +10,5 @@
> +typedef long SeerPredictReason;
> +typedef long SeerLearnReason;
> +
> +[scriptable, uuid(884a39a0-a3ed-4855-826a-fabb73ae878d)]
> +interface nsINetworkSeer : nsISupports

this appears to be a main thread only interface? (nsIURI kind of makes it that way). Please document it.

@@ +13,5 @@
> +[scriptable, uuid(884a39a0-a3ed-4855-826a-fabb73ae878d)]
> +interface nsINetworkSeer : nsISupports
> +{
> +  /**
> +   * Prediction reasons

I think we should add in some awesomebar reasons even for the first cut (i.e. uri is top awesome bar result, uri is top 5 awesomebar result). I suspect that's a very strong indicator.

in a separate bug honza was talking about being able to warm cache entries based on relationships.. kicking that off here (cache loading based on speculation) is probably also something suitable for 1.0

while we don't need to do it yet, pre-fetching followed by pre-rendering is clearly on the roadmap for this.. as a thought experiment does this predict/learn pattern support everything needed for that?

@@ +36,5 @@
> +   * taking actions such as DNS prefetch and/or TCP preconnect based on
> +   * (1) the host name that we are given, and (2) the reason we are being
> +   * asked to take actions.
> +   *
> +   * @param target - The URI we are being asked to take actions based on. Note:

I'm not sure its necessary to limit, in the idl, what parts of the URI we are working with

@@ +39,5 @@
> +   *
> +   * @param target - The URI we are being asked to take actions based on. Note:
> +   *   only the scheme, host, and port are taken into account.
> +   * @param referer - The URI that is currently loaded. This is so we can avoid
> +   *   doing predictive actions for link hover on an HTTPS page (for example).

I think you might want to make this into some kind of callbacks that can be queried for nsILoadContext (for private browsing) and for nsImumble for the ssl question. I think that's a better approach than making the callers deal with it.

@@ +61,5 @@
> +   * LEARN_LOAD_REDIRECT - we are learning about the re-direct of a URI
> +   *
> +   * LEARN_STARTUP - we are learning about a page loaded during startup
> +   */
> +  const SeerLearnReason LEARN_LOAD_SUBRESOURCE = 1;

nit: the LEARN consts are 1 based, while the PREDICT consts are 0 based

::: netwerk/base/src/nsNetworkSeer.cpp
@@ +36,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +#define RETURN_IF_FAILED(_rv) \

I really don't like macros that return.. though naming them with RETURN is a definite improvement :).. its up to you if you really want it in your code.

@@ +45,5 @@
> +  } while (0)
> +
> +#define SEER_ENABLED_PREF "network.seer.enabled"
> +
> +#define PRECONNECT_MIN 90

c++ style const's instead of preprocessor defines

@@ +160,5 @@
> +  :mInitialized(false)
> +  ,mIOThread(nullptr)
> +  ,mSpeculativeService(nullptr)
> +  ,mDB(nullptr)
> +  ,mStatementGetPage(nullptr)

all of these comptrs self initialize to nullptr, so you can remove them from the list.

@@ +178,5 @@
> +  ,mObserver(nullptr)
> +  ,mListener(nullptr)
> +  ,mDnsService(nullptr)
> +{
> +  MOZ_ASSERT(gSeer == nullptr, "multiple nsNetworkSeer instances!");

use !gseer (try not to compare things to nullptr)

@@ +580,5 @@
> +      nsAutoCString hostname;
> +      uri->GetAsciiHost(hostname);
> +      nsCOMPtr<nsICancelable> tmpCancelable;
> +      nsNetworkSeer::gSeer->mDnsService->AsyncResolve(hostname,
> +          nsIDNSService::RESOLVE_PRIORITY_LOW | nsIDNSService::RESOLVE_SPECULATE,

you might actually want to bring the priority up here.. this is better information than the normal speculative screen scraping stuff.

::: netwerk/base/src/nsNetworkSeer.h
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#pragma once

iirc there is a reason not to do this with pragma in our code base. weirdly I can't recall it :(

@@ +12,5 @@
> +#include "nsINetworkSeer.h"
> +#include "nsISpeculativeConnect.h"
> +#include "nsThreadUtils.h"
> +#include "nsIObserver.h"
> +#include "nsIDNSService.h"

total nit - I'm hoping to alphabetize includes in all new code.. (same thing for c++) I've found it handy

@@ +80,5 @@
> +  nsCOMPtr<mozIStorageStatement> mStatementAddStartupPage;
> +  nsCOMPtr<mozIStorageStatement> mStatementUpdateStartupPage;
> +
> +  PRTime mStartupTime;
> +  PRTime mLastStartupTime;

we're trying to get rid of PRTime.. use mozilla::Timestamp instead.

@@ +83,5 @@
> +  PRTime mStartupTime;
> +  PRTime mLastStartupTime;
> +  int32_t mStartupCount;
> +
> +  nsNetworkSeerObserver *mObserver;

nsRefPtr

@@ +85,5 @@
> +  int32_t mStartupCount;
> +
> +  nsNetworkSeerObserver *mObserver;
> +
> +  nsNetworkSeerDNSListener *mListener;

nsRefPtr
Attachment #761693 - Flags: feedback?(mcmanus) → feedback+
(In reply to Patrick McManus [:mcmanus] from comment #6)
> @@ +80,5 @@
> > +  nsCOMPtr<mozIStorageStatement> mStatementAddStartupPage;
> > +  nsCOMPtr<mozIStorageStatement> mStatementUpdateStartupPage;
> > +
> > +  PRTime mStartupTime;
> > +  PRTime mLastStartupTime;
> 
> we're trying to get rid of PRTime.. use mozilla::Timestamp instead.

So I've been looking at this, and the thing is... I need some time stamp value (not duration) I can serialize to disk (sqlite in this case) to persist across restarts, which (unless I'm missing something), mozilla::Timestamp doesn't give me. Do we have something other than PRTime that fits the bill?
What would this timestamp value represent?  How would it deal with clock changes across restarts?
(In reply to Boris Zbarsky (:bz) from comment #11)
> How would it deal with clock
> changes across restarts?

Other than making sure precision isn't a mission critical property of whatever is being serialized, is there anything that can be done about the clock change while we're shutdown problem? I'm pretty sure cache entries suffer from the same thing as would a serialized prtime..
Right.  If you don't care too much about clock skew, then serialized PRTime is probably fine.
is it a bad idea to suggest Timestamp be serializable?
We could try, but its not clear what that would mean, for a monotonic clock with a floating 0 base...
I'm not particularly worried about clock skew for this feature; it's used to "decay" our certainty in a predictive action based on how old our data is. If we're a little too aggressive because the clock was skewed backwards, it's not the end of the world. Same if we're too timid because the clock was skewed forward.
nick, go ahead and use the nspr time.
Comment on attachment 761693 [details] [diff] [review]
seer implementation patch

Review of attachment 761693 [details] [diff] [review]:
-----------------------------------------------------------------

There is a relatively complex (at least appearing so) logic and ABSOLUTELY NO comments.  Please add them.  Explain all what you can.

::: netwerk/base/public/nsINetworkSeer.idl
@@ +10,5 @@
> +typedef long SeerPredictReason;
> +typedef long SeerLearnReason;
> +
> +[scriptable, uuid(884a39a0-a3ed-4855-826a-fabb73ae878d)]
> +interface nsINetworkSeer : nsISupports

Shouldn't these be rather two interfaces?  It seems like predict() will have very different consumers then learn().

::: netwerk/base/src/nsNetworkSeer.cpp
@@ +41,5 @@
> +  do { \
> +    if (NS_FAILED(_rv)) { \
> +      return; \
> +    } \
> +  } while (0)

Oh, I don't like these macros... hard to debug and hard to say what they do.

@@ +256,5 @@
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  rv = predDB->AppendNative(NS_LITERAL_CSTRING("seer.sqlite"));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  rv = ss->OpenDatabase(predDB, getter_AddRefs(mDB));

Oh god!! Another "Init with I/O" on the main thread.  NO!

Also, when the service is disabled, you shouldn't open the database at all.

@@ +462,5 @@
> +    rv = svc->QueryInterface(aIID, aResult);
> +  }
> +  NS_RELEASE(svc);
> +
> +  return rv;

Just a note that this may fail code that does 
do_GetService();
NS_ENSURE_SUCCESS();

But since this is just an optimization thing, we might turn it to just a no-op service when something goes wrong and only log a warning.

::: netwerk/base/src/nsNetworkSeer.h
@@ +77,5 @@
> +  nsCOMPtr<mozIStorageStatement> mStatementGetStartup;
> +  nsCOMPtr<mozIStorageStatement> mStatementGetOneStartupPage;
> +  nsCOMPtr<mozIStorageStatement> mStatementGetAllStartupPages;
> +  nsCOMPtr<mozIStorageStatement> mStatementAddStartupPage;
> +  nsCOMPtr<mozIStorageStatement> mStatementUpdateStartupPage;

Please use cache for statements.  See e.g DOMStorageDBThread.cpp as an example.
Attachment #761693 - Flags: feedback?(honzab.moz) → feedback-
I didn't check the patch so deeply, but how this behaves in private browsing mode?  it must not store any data persistently obviously and we may also consider ignoring the already collected data to protect from any possibilities of detection.
(In reply to Honza Bambas (:mayhemer) from comment #18)
> There is a relatively complex (at least appearing so) logic and ABSOLUTELY
> NO comments.  Please add them.  Explain all what you can.

Indeed. Commenting is the last thing I tend to do, as it makes it so I have to go back, identify the most confusing parts on my own, and then I'm more likely to have commented the parts that will confuse other people, as well :)

> ::: netwerk/base/public/nsINetworkSeer.idl
> @@ +10,5 @@
> Shouldn't these be rather two interfaces?  It seems like predict() will have
> very different consumers then learn().

They may have different consumers, but they're rather intimately connected. You can't predict() without having already learn()ed something, so I think it makes more sense to have them together.

> @@ +256,5 @@
> Oh god!! Another "Init with I/O" on the main thread.  NO!

I was under the impression that DB connections had to be opened on the main thread, is that not the case?

I could have used the fully-async connection, but that would have resulted in an extra round trip to the main thread per prediction or learn, which I thought sounded worse. It's quite possible that I'm wrong, though :)

> Also, when the service is disabled, you shouldn't open the database at all.

Good point.

> @@ +462,5 @@
> Just a note that this may fail code that does 
> do_GetService();
> NS_ENSURE_SUCCESS();
> 
> But since this is just an optimization thing, we might turn it to just a
> no-op service when something goes wrong and only log a warning.

Makes sense to me.

> ::: netwerk/base/src/nsNetworkSeer.h
> @@ +77,5 @@
> Please use cache for statements.  See e.g DOMStorageDBThread.cpp as an
> example.

Good tip, will look into it.

PB mode is handled just fine so far, but I think I'm going to change how it's handled based on Patrick's feedback above (it makes more sense).
(In reply to Nick Hurley [:hurley] from comment #20)
> (In reply to Honza Bambas (:mayhemer) from comment #18)
> > There is a relatively complex (at least appearing so) logic and ABSOLUTELY
> > NO comments.  Please add them.  Explain all what you can.
> 
> Indeed. Commenting is the last thing I tend to do, as it makes it so I have
> to go back, identify the most confusing parts on my own, and then I'm more
> likely to have commented the parts that will confuse other people, as well :)

Lessen for next time ;)  Please add the comments for the next patch version.

> 
> > ::: netwerk/base/public/nsINetworkSeer.idl
> > @@ +10,5 @@
> > Shouldn't these be rather two interfaces?  It seems like predict() will have
> > very different consumers then learn().
> 
> They may have different consumers, but they're rather intimately connected.
> You can't predict() without having already learn()ed something, so I think
> it makes more sense to have them together.

Agree.

> 
> > @@ +256,5 @@
> > Oh god!! Another "Init with I/O" on the main thread.  NO!
> 
> I was under the impression that DB connections had to be opened on the main
> thread, is that not the case?

No.  You can open on any thread.  Just have one connection per a thread (i.e. use a single connection object strictly on a single thread), best only one r/w and arbitrary number of r/o used on other threads.

> 
> I could have used the fully-async connection, but that would have resulted
> in an extra round trip to the main thread per prediction or learn, which I
> thought sounded worse. It's quite possible that I'm wrong, though :)

You don't need to use the async mozstorage api.  Just look at how the dom storage sqlite code works, it's quite tuned.
Attached patch Latest implementation patch (obsolete) — Splinter Review
Jason - as promised (though a couple days later than promised), here's an updated implementation patch for your feedback. This is pretty much what I'm going to r? when the time comes (modulo any feedback from you), but a couple of the other patches aren't ready for that yet, so no r? quite yet.
Attachment #761693 - Attachment is obsolete: true
Attachment #761693 - Flags: feedback?(jduell.mcbugs)
Attachment #781114 - Flags: feedback?(jduell.mcbugs)
Attached patch Latest integration patch (obsolete) — Splinter Review
This is just an update of the previous integration patch to work with the new interface in the latest implementation patch. More integrations still to come.
Attachment #761696 - Attachment is obsolete: true
Attached patch Firefox UI hook-up patch (obsolete) — Splinter Review
This is the patch that hooks the seer up to the firefox UI (specifically for "Clear Private Data" functionality).
(In reply to Nick Hurley [:hurley] from comment #22)
> Created attachment 781114 [details] [diff] [review]
> Latest implementation patch

I should note that I've addressed all of Patrick's and Honza's feedback on the previous patch in this version (modulo the addition of functionality Patrick mentioned that isn't part of the current first-run functionality on the goals wiki)
Attached patch Part 1 - seer implementation (obsolete) — Splinter Review
Here we go, the big r?. This is the first in the patch series, containing the IDL, implementation, and associated support bits for the seer. There are still a couple failures on try that I'm working out (latest run at https://tbpl.mozilla.org/?tree=Try&rev=c4bccbbe3de9), but I don't expect those to wildly change anything, so I'm going to do review and try fixes in parallel.

This includes predictive capabilities for startup, pageload, and link hover. There are also tests facilitated by nsINetworkSeerVerifier (the only purpose of that IDL is to support unit tests for this feature).
Attachment #781114 - Attachment is obsolete: true
Attachment #781114 - Flags: feedback?(jduell.mcbugs)
Attachment #790283 - Flags: superreview?(cbiesinger)
Attachment #790283 - Flags: review?(mcmanus)
Attached patch Part 2 - Docshell integration (obsolete) — Splinter Review
Herein lies the docshell integration, which is where we currently drive all our predictive actions from (with the exception of startup). This is the same patch that bz gave f+ to a while back, along with a whitespace fix I came across in the code I touched (one part was indented differently from the rest of the function).
Attachment #781117 - Attachment is obsolete: true
Attachment #790287 - Flags: review?(benjamin)
And here we have a way for the seer to learn about scripts that are loaded as part of a page. It's really quite simple, the interface being used is in part 1.
Attachment #790288 - Flags: review?(jst)
Attached patch 04_layout.patch (obsolete) — Splinter Review
Similar to part 3, some code so the seer can know about stylesheets and fonts being loaded.
Attachment #790291 - Flags: review?(dbaron)
Just like parts 3 and 4, except for images that we load (noticing a pattern here?)
Attachment #790293 - Flags: review?(joe)
Here we go, the final bits (for now). This hooks up the seer to the clear private data functionality in the Firefox UI. It's quite possible that we don't need another "bucket" for this, and can just group it in with the existing "History" bucket, but this was simple enough to do, and changing it around to be under "History" will be simple enough, too.
Attachment #781119 - Attachment is obsolete: true
Attachment #790296 - Flags: review?(gavin.sharp)
Why is this still called "seer"?  I thought "connect predictor" or so would be better.
Attachment #790293 - Flags: review?(joe) → review?(seth)
Comment on attachment 790296 [details] [diff] [review]
Part 6 - browser hookups for clear private data

"Predictive Action Data" is pretty jargon-y. This is "data" that's relatively short-lived and is not persisted to disk, right? I'm inclined to lump it in with "browsing history" as you suggest. Also, is there a way to clear it per-host, for ForgetAboutSite.jsm?
Attachment #790296 - Flags: review?(gavin.sharp) → feedback-
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #33)
> Comment on attachment 790296 [details] [diff] [review]
> Part 6 - browser hookups for clear private data
> 
> "Predictive Action Data" is pretty jargon-y. This is "data" that's
> relatively short-lived and is not persisted to disk, right? I'm inclined to
> lump it in with "browsing history" as you suggest. Also, is there a way to
> clear it per-host, for ForgetAboutSite.jsm?

The data for this is persisted to disk, but I think it probably still makes sense to lump it in with "browsing history" (no need to confuse users with my jargon-filled patch).

As for ForgetAboutSite.jsm, I haven't looked at it, but I think it's doable just based on the name (I didn't even know we had that feature!) I'll take a look and make sure.
Comment on attachment 790293 [details] [diff] [review]
Part 5 - image loading integration

Review of attachment 790293 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/imgLoader.cpp
@@ +1253,5 @@
>      hvc->AddProxy(proxy);
>  
> +    if (aLoadGroup) {
> +      nsCOMPtr<nsIInterfaceRequestor> callbacks;
> +      nsCOMPtr<nsILoadContext> loadContext;

Why not put everything related to loadContext into the smallest scope it can go in, which is the "if (seer)" scope? Same goes for the identical code in LoadImage.

@@ +1753,5 @@
> +          seer->Learn(aURI, aReferrerURI, nsINetworkSeer::LEARN_LOAD_SUBRESOURCE,
> +              loadContext);
> +        }
> +      }
> +    }

Speaking of identical code, we shouldn't be copy-pasting code around like this, especially not in functions which are already ridiculously bloated messes. It would be better to wrap this in a function which takes aLoadGroup, aURI, and aReferrerURI as arguments.

But actually the situation is even worse than that! I see that other patches up for review in this bug contain essentially the exact same code. Obviously this is a pattern that will be repeated in many places. Please create a new static method (or plain old function) that provides this functionality in a single place for all client code. Unfortunately, XPIDL won't let you define static methods. You could perhaps use a %{C++ block in nsINetworkSeer.idl, but I find the idea of introducing a new header file less ugly overall, so that's my recommendation.
Attachment #790293 - Flags: review?(seth) → review-
Alternatively, just modify the interface in the IDL to require less setup on the part of the caller, as you see fit. I confess that I tend to be a fan of hiding XPCOM-style code behind a C++ wrapper where we don't lose any value by doing so, though.
Comment on attachment 790288 [details] [diff] [review]
Part 3 - ScriptLoader integration

r=jst. The one comment I have is that if this is something we'll need to start doing on all kinds of loads throughout the DOM code then it might make sense to add a static getter for the seer so we can avoid going through the XPCOM service manager for each and every usage of this. Followup bug material if we feel that's worth it.
Attachment #790288 - Flags: review?(jst) → review+
Nick, just let you know about

https://hg.mozilla.org/projects/gum/file/ba5504232f24/netwerk/base/public/nsILoadContextInfo.idl and
https://hg.mozilla.org/projects/gum/file/ba5504232f24/netwerk/base/src/LoadContextInfo.h

which I'm introducing on GUM.  You may want to take them (adjust) and use here.  I planned on this class a long ago.  Will go with the new cache in, but can be taken earlier.
Comment on attachment 790283 [details] [diff] [review]
Part 1 - seer implementation

Note, I've focused on the interface changes and skimmed the rest. sr=biesi with some nits

+/**
+ * nsINetworkSeerVerifier - used for testing the network seer to ensure it
+ *                          does what we expect it to do.
+ */

fyi, those comments typically go directly before the [scriptable] line

+++ b/netwerk/base/src/Seer.cpp
+// the result (since we're just trying to warn the cache)

warn -> warm, I think, though I like your version :)

+      nsDependentCString("SELECT * FROM moz_startups;\n"),

NS_LITERAL_CSTRING, here and in other places

+  ioThread->Dispatch(new SeerDBShutdownRunner(), NS_DISPATCH_NORMAL);

This isn't a safe way to call an xpcom function, you need to keep a reference:
  nsCOMPtr<nsIRunnable> runner(new SeerDBShutdownRunner());
  ioThread->Dispatch(runner, NS_DISPATCH_NORMAL);

Otherwise, if dispatch addrefs and releases the pointer, it suddenly becomes invalid, which is bad.

That said, I see this is a pretty common pattern, so maybe we decided not to care...

+++ b/netwerk/base/src/Seer.h
+struct uriInfo {

Uppercase U
Attachment #790283 - Flags: superreview?(cbiesinger) → superreview+
Here's an update to the seer patch that addresses Seth's comments from comment #35, as well as the last failures I've seen so far on try (green as of https://tbpl.mozilla.org/?tree=Try&rev=a8e350619808). This does not (yet) address Biesi's sr+ comments, but I'm carrying forward his sr+ with the caveat that I need to address them in a later version of the patch (they're pretty much all mechanical, anyway).
Attachment #790283 - Attachment is obsolete: true
Attachment #790283 - Flags: review?(mcmanus)
Attachment #793054 - Flags: superreview+
Attachment #793054 - Flags: review?(mcmanus)
Attached patch Part 4 - Layout integration (v2) (obsolete) — Splinter Review
This is a minor update to the layout integration patch that addresses Seth's comments from comment #35 in conjunction with v2 of part 1.
Attachment #790291 - Attachment is obsolete: true
Attachment #790291 - Flags: review?(dbaron)
Attachment #793059 - Flags: review?(dbaron)
And here's an update to the image loading integration that addresses Seth's comments in conjunction with v2 of part 1.
Attachment #790293 - Attachment is obsolete: true
Attachment #793060 - Flags: review?(seth)
Just to make this explicit, still TODO as of the time I write this comment:

- Address Biesi's sr+ from comment #39
- Address Gavin's comments from comment #33
- Any more comments incoming from future reviews :)
Comment on attachment 793060 [details] [diff] [review]
Part 5 - image loading integration (v2)

Review of attachment 793060 [details] [diff] [review]:
-----------------------------------------------------------------

I love it! Thanks Nicholas!
Attachment #793060 - Flags: review?(seth) → review+
Attachment #790287 - Flags: review?(benjamin) → review?(bugs)
Comment on attachment 790287 [details] [diff] [review]
Part 2 - Docshell integration

So, I'd prefer the helper method and I think the originalURI <->url comparison
should be done in such helper.
(Just don't want to add more random not-docshell-specific logic to all ready
too complicated docshell)
Attachment #790287 - Flags: review?(bugs) → review-
Comment on attachment 793054 [details] [diff] [review]
Part 1 - seer implementation (v2)

Review of attachment 793054 [details] [diff] [review]:
-----------------------------------------------------------------

nick - I really like this!

I am concerned about database load and queues on bad hardware given that we are now inserting them (asynchronously) into every transaction. I think at most it was every page load previously (for history). can we figure out some kind of queue size metric and just drop things on the floor when it is exceeded?

What telemetry are we going to gather here, and more broadly - how do we assess if this is useful?

let's deal with # of connections somewhere.. speculativeconnect() is pretty conservative right now - but if we know a domain needs 6 then let's open 6!

::: netwerk/base/src/Seer.cpp
@@ +49,5 @@
> +const int PRECONNECT_MIN = 90;
> +const int PRERESOLVE_MIN = 60;
> +const int REDIRECT_LIKELY = 75;
> +
> +const long long ONE_DAY = 86400LL * 1000000LL;

nit: comment these are in usec

@@ +81,5 @@
> +NS_IMPL_ISUPPORTS1(SeerObserver, nsIObserver)
> +
> +nsresult
> +SeerObserver::Install()
> +{

moz_assert(ismainthread)

@@ +88,5 @@
> +    mozilla::services::GetObserverService();
> +  if (obs) {
> +    rv = obs->AddObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID,
> +        false);
> +    NS_ENSURE_SUCCESS(rv, rv);

so this is kinda weird.. if GetObserverService() fails we carry on.. but if addobserver() fails we return. I guess I would suggest generally carrying on and returning the best value you can figure out - but basically ignoring that in the caller.

@@ +166,5 @@
> +// Are you ready for the fun part? Because here comes the fun part. The seer,
> +// which will do awesome stuff as you browse to make your browsing experience
> +// faster.
> +
> +Seer *Seer::gSeer = nullptr;

should this just be a static scoped to this c++ file rather than a static member?

@@ +168,5 @@
> +// faster.
> +
> +Seer *Seer::gSeer = nullptr;
> +
> +PRLogModuleInfo *gSeerLog = nullptr;

static?

@@ +255,5 @@
> +  }
> +
> +  nsresult rv;
> +
> +  rv = mStorageService->OpenDatabase(mDBFile, getter_AddRefs(mDB));

I don't really have much experience with the sqlite interface. I have read through this but I think you should have honza give feedback? the sql specific functions here too.

@@ +389,5 @@
> +  { }
> +
> +  NS_IMETHODIMP Run()
> +  {
> +    Seer::gSeer->mStatements.FinalizeStatements();

how do we know that gseer hasn't been dtor'd and set to null back on the main thread? does shutdown runner need a reference to it?

@@ +413,5 @@
> +
> +    if (mDB) {
> +      ioThread->Dispatch(new SeerDBShutdownRunner(), NS_DISPATCH_NORMAL);
> +    }
> +    ioThread->Shutdown();

this is going to result in spinning the event loop on the main thread on top of the observer stack. maybe the shutdown() can be done from ShutdownRunner() or even from another event triggered back on the main thread from shutdown runner?

@@ +427,5 @@
> +  if (aOuter != nullptr) {
> +    return NS_ERROR_NO_AGGREGATION;
> +  }
> +
> +  Seer *svc = new Seer();

nsRefPtr<Seer> svc = new Seer();

@@ +428,5 @@
> +    return NS_ERROR_NO_AGGREGATION;
> +  }
> +
> +  Seer *svc = new Seer();
> +  if (svc == nullptr) {

infallible malloc means you don't need this check

@@ +432,5 @@
> +  if (svc == nullptr) {
> +    return NS_ERROR_OUT_OF_MEMORY;
> +  }
> +
> +  NS_ADDREF(svc);

not needed with nsrefptr

@@ +439,5 @@
> +  if (NS_FAILED(rv)) {
> +    PR_LOG(gSeerLog, 4, ("Failed to initialize seer, seer will be a noop"));
> +  }
> +  rv = svc->QueryInterface(aIID, aResult);
> +  NS_RELEASE(svc);

not needed with nsrefptr

@@ +463,5 @@
> +  }
> +}
> +
> +class VerifierReleaseEvent : public nsRunnable
> +{

you don't need this class.. just use nsMainThreadPtrHandle/Holder instead. Among other nice properties it ensures that you don't actually go to the main thread just to drop any old ref - just when running the dtor.

@@ +488,5 @@
> +// the seer thread.
> +class SeerPredictionEvent : public nsRunnable
> +{
> +public:
> +  SeerPredictionEvent(nsIURI *target, nsIURI *referer,

maybe targetURI? target is often the varname for a thread.

@@ +492,5 @@
> +  SeerPredictionEvent(nsIURI *target, nsIURI *referer,
> +      SeerPredictReason reason, nsINetworkSeerVerifier *verifier)
> +    :mReason(reason)
> +    ,mVerifier(verifier)
> +  {

moz_assert(mainthread)

@@ +494,5 @@
> +    :mReason(reason)
> +    ,mVerifier(verifier)
> +  {
> +    if (mVerifier) {
> +      NS_ADDREF(mVerifier);

you can make this go away and have mVerifier just be a normal nsCOMPtr thanks to nsMainThraedPtrHandle.. and then you're not in the awkward situation of having two classes each with unbalanced refs.

@@ +506,5 @@
> +      ExtractOrigin(referer, mReferer.origin);
> +    }
> +  }
> +
> +  NS_IMETHOD Run()

MOZ_OVERRIDE

@@ +507,5 @@
> +    }
> +  }
> +
> +  NS_IMETHOD Run()
> +  {

moz_assert(seerthread) (or at least !mainthread)

@@ +538,5 @@
> +// seer thread and back to the main thread, since we don't have to hit the db
> +// for that.
> +void
> +Seer::PredictForLink(nsIURI *target, nsIURI *referer, nsINetworkSeerVerifier *verifier)
> +{

assert(mainthread) [I love these in any code that deals with multiple threads. over use them - asserts are free.]

@@ +544,5 @@
> +    return;
> +  }
> +
> +  bool isSSL;
> +  if (NS_SUCCEEDED(referer->SchemeIs("https", &isSSL)) && isSSL) {

actually - I can see this both ways. Can we make it preffable? (we can start with the default matching the behavior you have done here)

also log if disabled because of that.

@@ +574,5 @@
> +  {
> +    mPreresolves.AppendElement(uri);
> +  }
> +
> +  bool HasWork()

nit: bool HasWork() const

@@ +579,5 @@
> +  {
> +    return !(mPreconnects.IsEmpty() && mPreresolves.IsEmpty());
> +  }
> +
> +  NS_IMETHOD Run()

MOZ_OVERRIDE

@@ +637,5 @@
> +// associated with a particular subresource.
> +static int
> +CalculateGlobalDegradation(PRTime now, PRTime lastLoad)
> +{
> +  PRTime delta = now - lastLoad;

i think these consts are fine (0, 5, 10, 25) - maybe a tad low for my tastes but cest la vie. I do think we should tie them to prefs though - for easier troubleshooting and experimentation.

@@ +849,5 @@
> +  NS_ENSURE_SUCCESS(rv, false);
> +
> +  bool hasRows;
> +  rv = stmt->ExecuteStep(&hasRows);
> +  NS_ENSURE_SUCCESS(rv, false);

maybe

if (NS_FAILED(rv) || !hasRows)
 return false;

@@ +1027,5 @@
> +
> +  nsRefPtr<SeerPredictionRunner> runner = new SeerPredictionRunner(verifier);
> +
> +  rv = stmt->ExecuteStep(&hasRows);
> +  RETURN_IF_FAILED(rv);

if !hasRows return and move the allocation of the runner down to after the check

@@ +1080,5 @@
> +    return true;
> +  }
> +
> +  bool isHTTP = false;
> +  if (NS_SUCCEEDED(uri->SchemeIs("http", &isHTTP)) && !isHTTP) {

you're relying on isHTTP not being changed for NS_FAILED in one case, but guarding against it in the other :) When things return FAIL they are not supposed to set the out arguments - so I think you can just do this more simply as
uri->SchemeIs("http", &isHTTP);
if (!isHTTP)
 uri->SchemeIs("https", &isHTTP);

@@ +1105,5 @@
> +    return NS_ERROR_NOT_AVAILABLE;
> +  }
> +
> +  if (loadContext && loadContext->UsePrivateBrowsing()) {
> +    // Don't want to do anything in PB mode

its certainly a low priority - but a followup bug that does PB session scoped seer actions is probably useful; just as it is for the cache.

@@ +1115,5 @@
> +    return NS_OK;
> +  }
> +
> +  // Ensure we've been given the appropriate arguments for the kind of
> +  // prediction we're being asked to do

nice

@@ +1151,5 @@
> +{
> +public:
> +  SeerLearnEvent(nsIURI *target, nsIURI *referer, SeerLearnReason reason)
> +    :mReason(reason)
> +  {

assert mainthread

@@ +1160,5 @@
> +      ExtractOrigin(referer, mReferer.origin);
> +    }
> +  }
> +
> +  NS_IMETHOD Run()

moz_override

@@ +1161,5 @@
> +    }
> +  }
> +
> +  NS_IMETHOD Run()
> +  {

assert seer thread

@@ +1182,5 @@
> +
> +    return NS_OK;
> +  }
> +private:
> +  uriInfo mTarget;

mTargetURI

@@ +1184,5 @@
> +  }
> +private:
> +  uriInfo mTarget;
> +  uriInfo mReferer;
> +  nsAutoCString mRefererSHP;

unused?

@@ +1641,5 @@
> +public:
> +  SeerResetEvent()
> +  { }
> +
> +  NS_IMETHOD Run()

moz_override

@@ +1642,5 @@
> +  SeerResetEvent()
> +  { }
> +
> +  NS_IMETHOD Run()
> +  {

moz_assert(seerthread)

::: netwerk/base/src/Seer.h
@@ +8,5 @@
> +
> +#include "nsINetworkSeer.h"
> +
> +#include "nsCOMPtr.h"
> +#include "nsIDNSService.h"

I think almost (all?) of these includes from DNSService down can be replaced with forward declarations.. you can do that even for nsCOMPtrs and nsRefPtrs.. and then the includes are needed seer.cpp of course.

@@ +115,5 @@
> +  PRTime mStartupTime;
> +  PRTime mLastStartupTime;
> +  int32_t mStartupCount;
> +
> +  nsRefPtr<SeerObserver> mObserver;

I'm not sure I see the point of having a separately allocated reference counted object that is always just strictly owned (with no other refs) by Seer. Can't seer just implement nsIObserver?

@@ +117,5 @@
> +  int32_t mStartupCount;
> +
> +  nsRefPtr<SeerObserver> mObserver;
> +
> +  nsRefPtr<SeerDNSListener> mListener;

same comment here as with mobserver

::: netwerk/base/src/moz.build
@@ +15,5 @@
>  
>  EXPORTS.mozilla.net += [
>      'Dashboard.h',
>      'DashboardTypes.h',
> +    'Seer.h',

why does this need to be exported? (don't non network things use the idl?)

::: netwerk/test/unit/test_seer.js
@@ +1,1 @@
> +var Cc = Components.classes;

I haven't read the tests yet.
Attachment #793054 - Flags: review?(mcmanus)
(In reply to Patrick McManus [:mcmanus] [pto until aug 30] from comment #46)
> Comment on attachment 793054 [details] [diff] [review]
> Part 1 - seer implementation (v2)
> 
> What telemetry are we going to gather here, and more broadly - how do we
> assess if this is useful?

I talked to Nick about this a couple of days ago. It seems like a good place to start would be to measure how many of the connections we opened speculatively have actually been used, and maybe check whether we get an improvement in DNS cache hit rate as well.
I think the cache hit rate we track right now is overall hit rate, including for speculatives - so we ought to also track non-speculative hit rate (which is what we would expect this to improve)

Ideally I'd like to see some time metric be improved.. that's the bottom line, right? Time to first byte of mumble?
(In reply to Patrick McManus [:mcmanus] [pto until aug 30] from comment #46)
> let's deal with # of connections somewhere.. speculativeconnect() is pretty
> conservative right now - but if we know a domain needs 6 then let's open 6!

Patrick - I'm working through your review comments, and have come down to this last one (everything else is taken care of, and try still appears to be green!) Do you have a recommendation on where to do this? Should I modify nsISpeculativeConnect (or rather, its HTTP implementation), or should I add a new code path for doing speculative connects from the seer? I'm leaning towards the latter (since it seems like the existing speculative semantics are useful), but would like your expert opinion on the matter.
Attachment #793059 - Flags: review?(dbaron) → review?(bzbarsky)
I'd like to understand the referrer semantics that SeerLearn() expects.

If I have a document at URI X that has <link rel="stylesheet" href="Y"> and then that stylesheet has "@import url(Z);", then for the load of Z do we want Y as the referrer or X?
Flags: needinfo?(hurley)
(In reply to Boris Zbarsky [:bz] from comment #50)
> I'd like to understand the referrer semantics that SeerLearn() expects.
> 
> If I have a document at URI X that has <link rel="stylesheet" href="Y"> and
> then that stylesheet has "@import url(Z);", then for the load of Z do we
> want Y as the referrer or X?

In this case, we want X as the referrer, because X is the URI that we will use to kick off predictions in pretty much every case except the (very rare) case when the user decides to put Y in the address bar (to look at the CSS in the browser window, for whatever weird reason). I don't particularly care to optimize for that odd case.
Flags: needinfo?(hurley)
Forgot to set needinfo? for Patrick in comment #49
Flags: needinfo?(mcmanus)
Here's a patch with all of Patrick's comments to now addressed, with the exception of opening more connections when needed (see comment #49 for more about that). Carrying forward sr+, but not requesting review until the connections issue is addressed.
Attachment #793054 - Attachment is obsolete: true
Attachment #796699 - Flags: superreview+
Smaug - here's a version of the patch updated to use helper functions, as requested, keeping the docshell logic as clean as it is now :)
Attachment #790287 - Attachment is obsolete: true
Attachment #796700 - Flags: review?(bugs)
jst - This is exactly the same patch, but with all the logic pushed down into helper functions that live inside netwerk/, just like I did for docshell, layout, and imagelib. I'm re-requesting review since the code has changed, but the semantics are 100% the same.
Attachment #790288 - Attachment is obsolete: true
Attachment #796701 - Flags: review?(jst)
Gavin - here's a new patch with the changes discussed above. Seer data has been lumped in with history for "clear private data". For "forget about site", I went ahead and just cleared all the seer data (as we do with the cache). Given that this is a speculative (with high confidence) optimization, I'm not too concerned about losing everything.
Attachment #790296 - Attachment is obsolete: true
Attachment #796702 - Flags: review?(gavin.sharp)
Comment on attachment 793059 [details] [diff] [review]
Part 4 - Layout integration (v2)

OK.  I think that the Seer interface should be clearly documented to say that what it wants is the document URI, not the HTTP referrer.  And maybe it should be named something different so people don't get the two confused...

In the CSS loader, you then just want to use the document URI, not the referrer URI, since the referrer in my example would be Y, not X.

Similar in the font-face loader: the "referrer" there is always the stylesheet URI.
Attachment #793059 - Flags: review?(bzbarsky) → review-
And similar for the image loads: those would use stylesheet URIs for image loads coming from style rules, with the patch in attachment 793060 [details] [diff] [review].
(In reply to Boris Zbarsky [:bz] from comment #57)
> Comment on attachment 793059 [details] [diff] [review]
> Part 4 - Layout integration (v2)
> OK.  I think that the Seer interface should be clearly documented to say
> that what it wants is the document URI, not the HTTP referrer.  And maybe it
> should be named something different so people don't get the two confused...

Seems sensible, will do.

> In the CSS loader, you then just want to use the document URI, not the
> referrer URI, since the referrer in my example would be Y, not X.
> 
> Similar in the font-face loader: the "referrer" there is always the
> stylesheet URI.

Done for CSS loader. For font-face, I'm not 100% sure my suspicion of how to get the document uri is correct, though. I *think* it would be ps->GetDocument()->GetDocumentURI() (where ps is an nsIPresShell), does that seem correct? Or would I be better off using GetOriginalURI() on the document?
Flags: needinfo?(bzbarsky)
That's a good question.  I'd use GetDocumentURI(), I think.  The difference only matters if pushState is used, but if it's being used properly then GetDocumentURI is the way to go.

And yes, ps->GetDocument()->GetDocumentURI() for the font-face loader.
Flags: needinfo?(bzbarsky)
Comment on attachment 796699 [details] [diff] [review]
Part 1 - seer implementation (v3)

Honza - Patrick and I specifically want you to take a look at the database-related stuff, since you know that better than us. I stole pretty much everything from the DOM storage code like you mentioned in comment #21, so hopefully I didn't get anything horribly wrong ;)

Of course, don't feel limited to looking at just the db-related code.
Attachment #796699 - Flags: review?(honzab.moz)
New patch for layout/, using the document URI instead of referrer (along with renamed variables to match accordingly). I've also changed the interface and added comments to make it clear that users should use document URI instead of a referrer, but I'm not going to upload that patch until I have a better, larger reason to update part 1 (since it's all just mechanical renames).
Attachment #793059 - Attachment is obsolete: true
Attachment #797003 - Flags: review?(bzbarsky)
Attachment #797003 - Attachment description: 0004-Bug-881804-part-4-Plumb-layout-into-predictive-network-actions.-r-bz.patch → Part 4 - Layout integration (v3)
Seth - this patch is 100% the same as previous, but changed to use document URI instead of straight up referrer, as bz suggested in comment #58. Re-requesting review just to make sure.
Attachment #793060 - Attachment is obsolete: true
Attachment #797004 - Flags: review?(seth)
Latest iterations of the patches are green on try: https://tbpl.mozilla.org/?tree=Try&rev=e0842b279d04
Comment on attachment 797003 [details] [diff] [review]
Part 4 - Layout integration (v3)

>+    nsCOMPtr<nsIURI> documentURI = mDocument->GetDocumentURI();

No need for the strong ref here; can just pass mDocument->GetDocumentURI().

That said, mDocument can be null, but you don't need to SeerLearn in that case anyway. So something like this:

  if (mDocument) {
    mozilla::net::SeerLearn(aLoadData->mURI, 
        mDocument->GetDocumentURI(),
        nsINetworkSeer::LEARN_LOAD_SUBRESOURCE, mDocument);
  }

and the same at the other callsite in this file.

>+  nsCOMPtr<nsIDocument> document = ps->GetDocument();

This can just be nsIDocument*.

r=me
Attachment #797003 - Flags: review?(bzbarsky) → review+
Comment on attachment 796700 [details] [diff] [review]
Part 2 - Docshell integration (v2)


>+    mozilla::net::SeerPredict(aURI, nullptr, nsINetworkSeer::PREDICT_LOAD,
>+            this, nullptr);
Align the parameters

>+  mozilla::net::SeerPredict(aURI, mCurrentURI, nsINetworkSeer::PREDICT_LINK,
>+          this, nullptr);
ditto
Attachment #796700 - Flags: review?(bugs) → review+
Comment on attachment 796699 [details] [diff] [review]
Part 1 - seer implementation (v3)

Review of attachment 796699 [details] [diff] [review]:
-----------------------------------------------------------------

r- for GetHost() and few database glitches.  There are few suggestions too, not needed to address for landing this.

::: netwerk/base/public/nsINetworkSeer.idl
@@ +55,5 @@
> +   *   on which the link appears.
> +   *   In the case of PREDICT_LOAD, targetURI should be the URI of the page that
> +   *   is being loaded and referrerURI should be null.
> +   *   In the case of PREDICT_STARTUP, both targetURI and referrerURI should be
> +   *   null.

What is the verifier?  Where from the loadContext has to come?

@@ +94,5 @@
> +   * @param reason - The reason we are learning this bit of knowledge.
> +   *   Reason can be any of the LEARN_* values.
> +   *   In the case of LEARN_LOAD_SUBRESOURCE, targetURI should be the URI of a
> +   *   subresource of a page, and referrerURI should be the top-level URI (which
> +   *   may have been redirected to).

"(which may have been redirected to)"

Does it belong here?

::: netwerk/base/src/Seer.cpp
@@ +117,5 @@
> +static Seer *gSeer = nullptr;
> +
> +#if defined(PR_LOGGING)
> +static PRLogModuleInfo *gSeerLog = nullptr;
> +#define SEER_LOG(x) PR_LOG(gSeerLog, 4, (x))

Can you please say what is the module name here?

@@ +149,5 @@
> +  ,mQueueSize(0)
> +  ,mQueueSizeLock("Seer.mQueueSizeLock")
> +{
> +#if defined(PR_LOGGING)
> +  gSeerLog = PR_NewLogModule("nsNetworkSeer");

probably just NetworkSeer ?

@@ +361,5 @@
> +      rv = branch->GetIntPref(SEER_REDIRECT_LIKELY_PREF,
> +          &mRedirectLikelyConfidence);
> +    } else if (!strcmp(SEER_MAX_QUEUE_SIZE_PREF, data.get())) {
> +      rv = branch->GetIntPref(SEER_MAX_QUEUE_SIZE_PREF, &mMaxQueueSize);
> +    }

Since you don't do any special actions when a pref changes, you may want to use mozilla::Preferences::AddXXXXVarCache.  It will save you a lot of code and complexity.

@@ +395,5 @@
> +
> +  rv = InstallObserver();
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  rv = NS_NewNamedThread("Seer Thread", getter_AddRefs(mIOThread));

Since there could be more seers in the future, "Network Seer" may be a better name, up to you.

@@ +424,5 @@
> +// all an optimization, anyway.
> +
> +nsresult
> +Seer::EnsureInitStorage()
> +{

Add a MOZ_ASSERT this happens on your I/O thread.

@@ +432,5 @@
> +
> +  nsresult rv;
> +
> +  rv = mStorageService->OpenDatabase(mDBFile, getter_AddRefs(mDB));
> +  NS_ENSURE_SUCCESS(rv, rv);

Please handle schema versions.  I think it's missing in the localStorage database but that is a mistake.

Maybe also add a retry when the file is corrupted, that you delete the file and open the database again.

@@ +448,5 @@
> +
> +  rv = mDB->ExecuteSimpleSQL(
> +      NS_LITERAL_CSTRING("CREATE TABLE IF NOT EXISTS moz_subhosts (\n"
> +                         "  id INTEGER PRIMARY KEY AUTOINCREMENT,\n"
> +                         "  hid INTEGER NOT NULL,\n"

I assume this is ref to moz_hosts.id ?  Some more proper name then just "hid" would be good but up to you.

Create an index for the child id column!  In all tables using foreign keys.

@@ +452,5 @@
> +                         "  hid INTEGER NOT NULL,\n"
> +                         "  origin TEXT NOT NULL,\n"
> +                         "  hits INTEGER DEFAULT 0,\n"
> +                         "  last_hit INTEGER DEFAULT 0,\n"
> +                         "  FOREIGN KEY(hid) REFERENCES moz_hosts(id)\n"

I think you "PRAGMA foreign_keys = ON;" to make this actually work.

@@ +465,5 @@
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  nsCOMPtr<mozIStorageStatement> stmt;
> +  rv = mDB->CreateStatement(
> +      NS_LITERAL_CSTRING("SELECT * FROM moz_startups;\n"),

rather name what you select.

@@ +493,5 @@
> +    rv = stmt->BindInt32Parameter(0, mStartupCount + 1);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +
> +    rv = stmt->BindInt64Parameter(1, mStartupTime);
> +    NS_ENSURE_SUCCESS(rv, rv);

This API is obsolete, please bind by name.

@@ +554,5 @@
> +                         "  hits INTEGER DEFAULT 0,\n"
> +                         "  last_hit INTEGER DEFAULT 0,\n"
> +                         "  FOREIGN KEY(pid) REFERENCES moz_pages(id)\n"
> +                         ");\n"));
> +  NS_ENSURE_SUCCESS(rv, rv);

Can you please add comments what is the purpose of each table?  I.e. what data it actually collects.

@@ +619,5 @@
> +
> +  mInitialized = false;
> +
> +  if (mAccumulators) {
> +    delete mAccumulators;

Please rather use AutoPtr.

@@ +630,5 @@
> +    if (mDB) {
> +      nsRefPtr<SeerDBShutdownRunner> runner =
> +        new SeerDBShutdownRunner(ioThread, this);
> +      ioThread->Dispatch(runner, NS_DISPATCH_NORMAL);
> +    }

And when you don't have the database, who shuts the thread down?

@@ +636,5 @@
> +}
> +
> +nsresult
> +Seer::Create(nsISupports *aOuter, const nsIID& aIID,
> +    void **aResult)

align

@@ +653,5 @@
> +  }
> +  rv = svc->QueryInterface(aIID, aResult);
> +
> +  return rv;
> +}

You can also use NS_GENERIC_FACTORY_CONSTRUCTOR_INIT in net module.

@@ +663,5 @@
> +{
> +  nsAutoCString scheme, hostPort;
> +  nsresult rv = uri->GetScheme(scheme);
> +  if (NS_SUCCEEDED(rv)) {
> +    rv = uri->GetHostPort(hostPort);

Oh... don't use GetHostPort.  Use GetAsciiHost and GetPort separately.  HostPort can give more then just one representation of a host name.

Or do you have any special reason to use HostPort?

@@ +680,5 @@
> +class SeerPredictionEvent : public nsRunnable
> +{
> +public:
> +  SeerPredictionEvent(nsIURI *targetURI, nsIURI *referrerURI,
> +      SeerPredictReason reason, nsINetworkSeerVerifier *verifier)

align

@@ +691,5 @@
> +    if (verifier) {
> +      mVerifier = new nsMainThreadPtrHolder<nsINetworkSeerVerifier>(verifier);
> +    }
> +    if (targetURI) {
> +      targetURI->GetSpec(mTargetURI.spec);

GetAsciiSpec

@@ +695,5 @@
> +      targetURI->GetSpec(mTargetURI.spec);
> +      ExtractOrigin(targetURI, mTargetURI.origin);
> +    }
> +    if (referrerURI) {
> +      referrerURI->GetSpec(mReferrerURI.spec);

as well

@@ +719,5 @@
> +      case nsINetworkSeer::PREDICT_STARTUP:
> +        gSeer->PredictForStartup(mVerifier);
> +        break;
> +      default:
> +        rv = NS_ERROR_UNEXPECTED;

rather MOZ_ASSERT(false) ?  this goes nowhere...

@@ +743,5 @@
> +// seer thread and back to the main thread, since we don't have to hit the db
> +// for that.
> +void
> +Seer::PredictForLink(nsIURI *targetURI, nsIURI *referrerURI,
> +    nsINetworkSeerVerifier *verifier)

align

@@ +824,5 @@
> +
> +      ++gSeer->mAccumulators->mTotalPredictions;
> +      ++gSeer->mAccumulators->mTotalPreresolves;
> +      nsAutoCString hostname;
> +      uri->GetAsciiHost(hostname);

I think with strict use of GetAsciiHost you may just use mPreresolves[i] here?

@@ +827,5 @@
> +      nsAutoCString hostname;
> +      uri->GetAsciiHost(hostname);
> +      nsCOMPtr<nsICancelable> tmpCancelable;
> +      gSeer->mDnsService->AsyncResolve(hostname,
> +          (nsIDNSService::RESOLVE_PRIORITY_MEDIUM |

Shouldn't this be high?  You though may know more then I here.

@@ +852,5 @@
> +// This calculates how much to degrade our confidence in our data based on
> +// the last time this top-level resource was loaded. This "global degradation"
> +// applies to *all* subresources we have associated with the top-level
> +// resource. This will be in addition to any reduction in confidence we have
> +// associated with a particular subresource.

What is "the top-level" here?  And honestly after reading this comment I still don't understand what the global degradation is :)

@@ +887,5 @@
> +// @param globalDegradation - the degradation for this top-level load as
> +//                            determined by CalculateGlobalDegradation
> +int
> +Seer::CalculateConfidence(int baseConfidence, PRTime lastHit,
> +    PRTime lastPossible, int globalDegradation)

:)

@@ +951,5 @@
> +// table (which is for a particular origin).
> +bool
> +Seer::LookupTopLevel(QueryType queryType, const nsACString &key,
> +                     TopLevelInfo &info)
> +{

assert !main thread please

@@ +955,5 @@
> +{
> +  nsCOMPtr<mozIStorageStatement> stmt;
> +  if (queryType == QUERY_PAGE) {
> +    stmt = mStatements.GetCachedStatement(
> +        NS_LITERAL_CSTRING("SELECT * FROM moz_pages WHERE uri = ?;"));

name what you select

@@ +990,5 @@
> +// Insert data about either a top-level page or a top-level origin into
> +// the database.
> +void
> +Seer::AddTopLevel(QueryType queryType, const nsACString &key, PRTime now)
> +{

assert !mt - this applies to all statement usage

@@ +1094,5 @@
> +
> +    // We use goto nextrow here instead of just failling, because we want
> +    // to do some sort of prediction if at all possible. Of course, it's
> +    // probably unlikely that subsequent rows will succeed if one fails, but
> +    // it's worth a shot.

Did this ever happen to you?  I mean a failure...

@@ +1113,5 @@
> +    }
> +
> +    baseConfidence = (hitCount * 100) / info.loadCount;
> +    confidence = CalculateConfidence(baseConfidence, lastHit, info.lastLoad,
> +        globalDegradation);

confidence = CalculateConfidence(baseConfidence, lastHit, info.lastLoad,
                                 globalDegradation);

@@ +1166,5 @@
> +
> +  int globalDegradation = CalculateGlobalDegradation(now, info.lastLoad);
> +  int baseConfidence = (hitCount * 100) / info.loadCount;
> +  int confidence = CalculateConfidence(baseConfidence, lastHit, info.lastLoad,
> +      globalDegradation);

This could well be in some shared function.  Up to you.

@@ +1219,5 @@
> +  if (havePage && WouldRedirect(pageInfo, now, newUri)) {
> +    nsRefPtr<SeerPredictionRunner> runner = new SeerPredictionRunner(verifier);
> +    runner->AddPreconnect(newUri.spec);
> +    NS_DispatchToMainThread(runner);
> +    PredictForPageload(newUri, verifier);

This looks dangerous.  Should have a counter limit (10?).

@@ +1287,5 @@
> +    }
> +
> +    baseConfidence = (hitCount * 100) / mStartupCount;
> +    confidence = CalculateConfidence(baseConfidence, lastHit,
> +        mLastStartupTime, 0);

no global confidence?

@@ +1313,5 @@
> +  bool isHTTP = false;
> +  uri->SchemeIs("http", &isHTTP);
> +  if (!isHTTP) {
> +    uri->SchemeIs("https", &isHTTP);
> +  }

Check error results.

@@ +1319,5 @@
> +  return isHTTP;
> +}
> +
> +nsresult
> +Seer::ReserveSpaceInQueue()

Add comments what this is actually doing and how it is used.

@@ +1328,5 @@
> +    SEER_LOG(("Not enqueuing event - queue too large"));
> +    return NS_ERROR_NOT_AVAILABLE;
> +  }
> +
> +  mQueueSize++;

I think you can use Automic instead of having a lock for this but up to you.

@@ +1347,5 @@
> +    SeerPredictReason reason, nsILoadContext *loadContext,
> +    nsINetworkSeerVerifier *verifier)
> +{
> +  MOZ_ASSERT(NS_IsMainThread(),
> +      "Seer interface methods must be called on the main thread");

Should we hard-fail the call when not on the main thread?

@@ +1504,5 @@
> +  nsCOMPtr<mozIStorageStatement> stmt;
> +  if (queryType == QUERY_PAGE) {
> +    stmt = mStatements.GetCachedStatement(
> +        NS_LITERAL_CSTRING("SELECT * FROM moz_subresources WHERE pid = ? AND "
> +          "uri = ?;"));

You need an index for (pid, uri)

@@ +1508,5 @@
> +          "uri = ?;"));
> +  } else {
> +    stmt = mStatements.GetCachedStatement(
> +        NS_LITERAL_CSTRING("SELECT * FROM moz_subhosts WHERE hid = ? AND "
> +          "origin = ?;"));

and here for (hid, origin)

btw, you then don't need the separate index for hid/pid only.  this one will be used to lookup by hid/pid.

@@ +1654,5 @@
> +  }
> +  // Can't add a subhost to a host we don't have in our db
> +}
> +
> +// This is called when a top-level loaded ended up redirecting to a different

"top-level load" ?

@@ +1708,5 @@
> +
> +  if (!hasRows) {
> +    // This is the first time we've seen this top-level redirect to this URI
> +    nsCOMPtr<mozIStorageStatement> addRedirect = mStatements.GetCachedStatement(
> +        NS_LITERAL_CSTRING("INSERT INTO moz_redirects "

BTW, are you aware of INSERT OR REPLACE INTO?  It would save you a lot of code and logic...

@@ +1903,5 @@
> +};
> +
> +// Helper that actually does the database wipe.
> +void
> +Seer::DoReset()

ResetInternal() ?

@@ +1914,5 @@
> +  mDB->ExecuteSimpleSQL(NS_LITERAL_CSTRING("DELETE FROM moz_pages"));
> +  mDB->ExecuteSimpleSQL(NS_LITERAL_CSTRING("DELETE FROM moz_startup_pages"));
> +  mDB->ExecuteSimpleSQL(NS_LITERAL_CSTRING("DELETE FROM moz_startups"));
> +  mDB->ExecuteSimpleSQL(NS_LITERAL_CSTRING("DELETE FROM moz_subhosts"));
> +  mDB->ExecuteSimpleSQL(NS_LITERAL_CSTRING("DELETE FROM moz_hosts"));

Hmm.. I'm curious why you actually need the foreign keys...

::: netwerk/test/unit/test_seer.js
@@ +43,5 @@
> +    }
> +  }
> +
> +  return -1;
> +}

haystack.indexOf(needle) ?

::: toolkit/components/telemetry/Histograms.json
@@ +1665,5 @@
> +      "high": "3000",
> +      "n_buckets": 10,
> +      "extended_statistics_ok": true,
> +      "description": "Amount of time spent doing the work for learn (ms)"
> +  },

I'd personally would also be interested in how long it takes from Predict() to actual pre* operation.  Since for any predict request you need to go to the database, and you have a single queue for learn and predict events w/o prioritizing predicts (which I think might be a bottle neck) it might take so long time, we will just waste CPU and do dns and speculative connects after the page already started to load.

Could we actually have a telemetry that would rather say that the predict action (it's actual execution) has happened before it would happen w/o prediction?  I.e. that predicted connections are used and are open before we would do it via our current open mechanisms?

Can be in a followup if found reasonable and doable.
Attachment #796699 - Flags: review?(honzab.moz) → review-
Comment on attachment 797004 [details] [diff] [review]
Part 5 - image loading integration (v3)

Review of attachment 797004 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, that makes sense to me.
Attachment #797004 - Flags: review?(seth) → review+
(In reply to Nicholas Hurley [:hurley] (PTO Aug 26, Aug 30-Sept 9) from comment #49)
> (In reply to Patrick McManus [:mcmanus] [pto until aug 30] from comment #46)
> > let's deal with # of connections somewhere.. speculativeconnect() is pretty
> > conservative right now - but if we know a domain needs 6 then let's open 6!
> 
> Patrick - I'm working through your review comments, and have come down to
> this last one (everything else is taken care of, and try still appears to be
> green!) Do you have a recommendation on where to do this? Should I modify
> nsISpeculativeConnect (or rather, its HTTP implementation), or should I add
> a new code path for doing speculative connects from the seer? I'm leaning
> towards the latter (since it seems like the existing speculative semantics
> are useful), but would like your expert opinion on the matter.

Oh I definitely think we want to keep all http connection management inside nsHttpConnectionMgr.cpp.. It could use a little work as it is - but spreading it out isn't going to help anything :)

I would suggest having nsHttpConnectionMgr::OnMsgSpeculativeConnect() call do_GetInterface on the callbacks trying to get some kind of new interface that provides the ability to bypass the idleconns.length() and RestrictConnections checks on 2539 (and gives a new max value). I would probably just define that as a new interface that I tucked into the nsISpeculativeConnect.idl file.
Flags: needinfo?(mcmanus)
Comment on attachment 796702 [details] [diff] [review]
Part 6 - browser hookups (v2)

Looks good to me, but over to Tim to take a closer look.
Attachment #796702 - Flags: review?(ttaubert)
Attachment #796702 - Flags: review?(gavin.sharp)
Attachment #796702 - Flags: feedback+
Attachment #796701 - Flags: review?(jst) → review+
Comment on attachment 796702 [details] [diff] [review]
Part 6 - browser hookups (v2)

Review of attachment 796702 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. Please note that I'm not a toolkit peer, though.
Attachment #796702 - Flags: review?(ttaubert) → review+
Comment on attachment 796702 [details] [diff] [review]
Part 6 - browser hookups (v2)

Dave - looking for a review for the toolkit/ portions of this patch. I didn't see forgetaboutsite listed on the toolkit sub-modules wiki, so please hand off to someone else as appropriate.
Attachment #796702 - Flags: review?(dtownsend+bugmail)
Patrick - new patch with your review comments to this point addressed. The one thing that got a bit tricky was overriding restrictConnections, et. al. to allow us to open more speculative connections. I ended up having to get the data somewhere I was certain was on the main thread because of js objects that may have been passed as callbacks to SpeculativeConnect, and then pass that data to OnMsgSpeculativeConnect manually (which runs on the socket thread), so please take an extra-special look at that to make sure I'm not entirely crazy with my approach there :)
Attachment #796699 - Attachment is obsolete: true
Attachment #804760 - Flags: superreview+
Attachment #804760 - Flags: review?(mcmanus)
Comment on attachment 804760 [details] [diff] [review]
Part 1 - seer implementation (v4)

Honza - this version of the patch also has all your comments addressed, with one exception. I opted to not use "INSERT OR REPLACE INTO" in conjunction with the foreign keys, because (according to SQLite docs), that results in a DELETE operation happening, which WILL cascade (which would mean we would lose subresource data from an INSERT OR REPLACE INTO on moz_hosts or moz_pages). I could conceivably have used INSERT OR REPLACE INTO on other tables, but avoid so to have the code consistent with itself.
Attachment #804760 - Flags: review?(honzab.moz)
Forgot link to latest (and gloriously green!) try run: https://tbpl.mozilla.org/?tree=Try&rev=d167cfc53568
Comment on attachment 796702 [details] [diff] [review]
Part 6 - browser hookups (v2)

Review of attachment 796702 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/forgetaboutsite/ForgetAboutSite.jsm
@@ +224,5 @@
> +    // domain, so just trash it all
> +    let (ns = Cc["@mozilla.org/network/seer;1"].
> +              getService(Ci.nsINetworkSeer)) {
> +      ns.reset();
> +    }

Please just use the same style as above, the let block is unnecessary here.
Attachment #796702 - Flags: review?(dtownsend+bugmail) → review+
Just a minor update to re-consolidate some code into the Seer that I had previously separated out into another class in an attempt to fix some try failures (which was, ultimately, not the issue). Nothing else has changed since the last version.
Attachment #804760 - Attachment is obsolete: true
Attachment #804760 - Flags: review?(mcmanus)
Attachment #804760 - Flags: review?(honzab.moz)
Attachment #806043 - Flags: superreview+
Attachment #806043 - Flags: review?(mcmanus)
Attachment #806043 - Flags: review?(honzab.moz)
Comment on attachment 806043 [details] [diff] [review]
Part 1 - seer implementation (v4.1)

Review of attachment 806043 [details] [diff] [review]:
-----------------------------------------------------------------

I can almost smell success here!

I only skimmed the sql.. honza I would appreciate it if you could read that - but don't feel the need to review more than that unless you want to.

nick, the only must-fix I have left is dealing with RestrictConnections() - see below.

::: netwerk/base/public/nsISpeculativeConnect.idl
@@ +29,5 @@
>                              in nsIInterfaceRequestor aCallbacks);
>  
>  };
>  
> +[builtinclass, uuid(2b6d6fb6-ab28-4f4c-af84-bfdbb7866d72)]

definitely need some documentation here

@@ +32,5 @@
>  
> +[builtinclass, uuid(2b6d6fb6-ab28-4f4c-af84-bfdbb7866d72)]
> +interface nsISpeculativeConnectionOverrider : nsISupports
> +{
> +    readonly attribute unsigned long parallelSpeculativeConnectLimit;

as long as its a builtinclass you can define these as infallible

::: netwerk/base/src/Seer.cpp
@@ +664,5 @@
> +  rv = svc->Init();
> +  if (NS_FAILED(rv)) {
> +    SEER_LOG(("Failed to initialize seer, seer will be a noop"));
> +  }
> +  rv = svc->QueryInterface(aIID, aResult);

should this be in an else branch?

@@ +672,5 @@
> +
> +// Get the full origin (scheme, host, port) out of a URI (maybe should be part
> +// of nsIURI instead?)
> +static void
> +ExtractOrigin(nsIURI *uri, nsAutoCString &s)

I'm thinking s should be Truncated() when you do an early return from an error

@@ +1233,5 @@
> +    LearnForStartup(uri);
> +  }
> +}
> +
> +const int MAX_PAGELOAD_DEPTH = 10;

good idea :)

::: netwerk/base/src/Seer.h
@@ +64,5 @@
> +  friend class SeerDBShutdownRunner;
> +
> +  nsresult EnsureInitStorage();
> +
> +  // This is a proxy for the information we need from an nsIURI

// because it is needed off main thread

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +321,5 @@
>          connInfo.forget();
>      return rv;
>  }
>  
> +class SpecConnectArgs : public nsISupports

SpeculativeConnectArgs please

can we nix nsISupports and not pretend this is com'ified? (yech)

you can just implement AddRef() and Release() and still use nsRefPtr

@@ +360,5 @@
>  
>      caps |= ci->GetAnonymous() ? NS_HTTP_LOAD_ANONYMOUS : 0;
> +    args->mTrans = new NullHttpTransaction(ci, wrappedCallbacks, caps);
> +
> +    nsCOMPtr<nsISpeculativeConnectionOverrider> o = do_GetInterface(callbacks);

one letter variable names sounds like something I would get dinged for in a review :)

@@ +2575,5 @@
> +    bool ignoreIdle = false;
> +
> +    if (args->mOverridesOK) {
> +        parallelSpeculativeConnectLimit = args->mParallelSpeculativeConnectLimit;
> +        restrictConnections = args->mRestrictConnections;

among other things, RestrictConnections() will prevent a new speculative connection to a SPDY host with a full active (muxxable) connection to it. this code basically overrides that and would create multiple TCP sockets to a known (and connected) spdy host - we don't want that. Maybe we could soften the cases where the spdy-sate of the host is unknown and the handshake is in progress - but do we want to be more drastic than that?
Attachment #806043 - Flags: review?(mcmanus)
Yet another patch, with updates based on your latest set of comments. The (intermittent then almost perma) failure on try I mentioned in person today is also fixed in this patch.
Attachment #806043 - Attachment is obsolete: true
Attachment #806043 - Flags: review?(honzab.moz)
Attachment #807027 - Flags: superreview+
Attachment #807027 - Flags: review?(mcmanus)
Nick, don't want a review from me any more?
(In reply to Honza Bambas (:mayhemer) from comment #80)
> Nick, don't want a review from me any more?

Honza, Patrick said yesterday that, since I hadn't made any SQL changes other than the ones you requested in your previous review, I should just have him finish up the reviews and leave you free to work on the cache instead. I obviously won't stop you from reviewing the newest patch, but that's why you don't have an r? on this version.
(In reply to Nicholas Hurley [:hurley] from comment #81)
> (In reply to Honza Bambas (:mayhemer) from comment #80)
> > Nick, don't want a review from me any more?
> 
> Honza, Patrick said yesterday that, since I hadn't made any SQL changes
> other than the ones you requested in your previous review, I should just
> have him finish up the reviews and leave you free to work on the cache
> instead. I obviously won't stop you from reviewing the newest patch, but
> that's why you don't have an r? on this version.

right! (your help is always welcome - don't need an r? for that :)
Attachment #807027 - Flags: review?(mcmanus) → review+
(In reply to Nicholas Hurley [:hurley] from comment #75)
> Forgot link to latest (and gloriously green!) try run:
> https://tbpl.mozilla.org/?tree=Try&rev=d167cfc53568

Not so very green - the Android R3 was neither bug 802756 (whatever it might be) nor bug 820756 (which is about shutdown, not "some random test, different each time"), but was instead the bustage that's going to get you backed out.
Well, boo (for both my failure to type properly and my misreading of the bug). Back to ye olde drawing boarde.
Finally, once I got home to a tree, backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f2b5dbc01325.

No idea what it means that it was only in reftest and jsreftest that "application ran for longer than allowed maximum time".
It looks like these patches also perf caused regressions in the Tp5 and Tp4 page load benchmarks.  (The performance improved again when the patches were backed out.)
Further inspection indicates that this *only* happens on android 2.2 (android 4.0 is reliably green). Not 100% sure what this means (other than that older android versions are crap), but it may at least provide the opportunity for a workaround to get this landed & enabled on most platforms. Running a test patch through try right now, and we'll see how it goes.
Depends on: 917682
Attached patch Android 2.2 workaround hack (obsolete) — Splinter Review
So here's a thing. As mentioned in my previous comment, the tests *only* time out on Android 2.2 (Armv6 and Armv7). This patch is a hack to disable the seer on android versions less than 4 (determined based on the API level available) until the specific problems with android 2.2 is determined. I don't even know if this is a route we want to go, but if we do, it will at least get us 100% of most platforms, and around 2/3 of android (based on http://developer.android.com/about/dashboards/).

I will note here that I think a large part of the problem with android 2.2 is related to the infrastructure or tests themselves. Since I've been digging in on this, I decided to look at how long green test runs (specifically Android 2.2 Opt Armv7 J3) took to complete successfully. My totally unscientific sample of 5 recent pushes on m-c and m-i show a range between 12 and 45 minutes (the timeout occurs after 60 minutes). Looking at my backed-out push to m-i, the green runs of the same test took between 18 and 35 minutes (so well within "normal" range), while (of course) the timed out tests were killed after 60 min.

I will also note that I can even effectively make the seer a no-op service (by never dispatching any actual work to the seer i/o thread, but doing all the setup for it) and the reftests will *still* time out. Obviously, the "no-op" seer is still doing a little work (validation of inputs mostly), but it's really not much, and the fact that even that little bit of overhead can break tests is disconcerting, to say the least.
Attachment #810195 - Flags: review?(mcmanus)
One final thing: the effect of the hack workaround is to ensure mInitialized == false, which makes the seer do even less work than just the setup for dispatching things to the i/o thread. Eleven runs of the same test (Android 2.2 Opt Armv7 J3) with the hack patch applied all completed in between 12 and 35 minutes, which is well within "normal" range for that test.
Comment on attachment 810195 [details] [diff] [review]
Android 2.2 workaround hack

Review of attachment 810195 [details] [diff] [review]:
-----------------------------------------------------------------

I'm totally fine with skipping earlier android version coverage.  As pointed out it looks like we're just tipping already-bloated testruns just past the timeout, so no indications of serious problem anyway.
Attachment #810195 - Flags: review?(mcmanus) → review+
Attachment #810195 - Flags: review?(mcmanus)
it doesn't make a lot of sense to me that we're adding enough work to extend the runtime of the test to fail. This is probably related the tp4 and tp5 regressions as well.

nick?

jason - I don't understand why you canceled my review.
for instance - dnsprefetch was able to minimize tp4/5 pain by building a queue of stuff to submit to the prefetcher, batching that all together, and doing it after the parse was complete. it got bounced a bunch of times before that was complete. maybe that applies here?

If the tp5 regressions are unavoidable I think we need to be able to characterize them (i.e. how bad they are) and at least have some measured anecdotal wins to compare them against using networks with actual latency.
Patrick - I'm already batching things up from a single call to Predict() into one event sent to the main thread (2 queues - one for preresolve, one for preconnect). Should I really be batching up more, and dispatching to the main thread less often?

I agree that it doesn't make sense that we're adding enough work to extend the runtime of the test (though as I pointed out, the test is already insanely variable in its runtime). This is especially confusing to me because it happens EVEN in cases where we don't "really" do any work!

Those cases are: Calls to Predict() and Learn() happen, and do all their initial work on the main thread up to the point where they would dispatch something to hit the db. At that point, I forced them to return NS_OK, and not dispatch anything. This still causes test timeouts.

I'm fine with not doing the hack workaround, that's why I asked in my r? if it was even an approach we wanted to take. Given the other data I've seen, though, I'm rapidly coming to a point where I'm at a loss for what else to try to fix the overall issue.
maybe start with the tp4/tp5 issues matt points out in comment 87?

I would feel more comfortable just calling this a scary test if it were the only issue - but they feel like they are making the same complaint.
Here's the try push that has timeouts I was talking about in comment 94: https://tbpl.mozilla.org/?tree=Try&rev=5c37bcf570fc (note how J2 has the dreaded "application ran for longer than allowed maximum time" failure)
And the diff against the existing patches in this bug that was pushed (disables dispatching anything to the seer IO thread): https://hg.mozilla.org/try/rev/f94348b930f7

I've stared and stared at the main-thread bits of Predict() and Learn() to see if there's anything in there that could be taking so long as to make the test take longer than usual, and either I'm missing something, or there isn't anything. And yet, the test still finds a way to timeout.
(In reply to Patrick McManus [:mcmanus] from comment #95)
> maybe start with the tp4/tp5 issues matt points out in comment 87?
> 
> I would feel more comfortable just calling this a scary test if it were the
> only issue - but they feel like they are making the same complaint.

Fair enough.
For reference, here's a graph that shows the Tp5 regression when this landed on 2013-09-20 and the improvement when it was backed out a day later:
http://graphs.mozilla.org/graph.html#tests=[[255,63,21]]&sel=1379614356683,1380219156683

That shows a regression of about 6% on Mac OS X 10.6.  Other platforms also regressed, by varying amounts.  Tp4 on Android 4.0 regressed by about 8%.
OK, for my reference (as much as anything else) https://tbpl.mozilla.org/?tree=Try&rev=cbd4698230ad is a try run, same code as the one mentioned in comment 96, but running talos as well, to see if we get the regression on tp4/tp5, or if that's a problem independent of the reftest failures.
So I had a random theory that some of the issues may be related to less-than-great servers running on the tests, and perhaps the timeouts are caused by the same stuff Steve has been working on to eliminate speculative connections to rfc1918 IP blocks. As such, I pushed an experimental run to try with my patches applied on top of his latest patch. There's a 99% chance this is worthless, as I'm tired and not feeling well, but I figured it's worth a shot. https://tbpl.mozilla.org/?tree=Try&rev=4d2dcd19a2bb
Attachment #810195 - Flags: review?(mcmanus)
Comment on attachment 810195 [details] [diff] [review]
Android 2.2 workaround hack

It sounds like you've already decided against landing this, but I thought it important to note that I think it's a bad idea.
Attachment #810195 - Flags: feedback-
My idea in comment 100 didn't pan out (jsreftests on android still time out, and at least Tp5 still regresses), however, my push from comment 99 (which does not involve dispatching any events to the seer thread, making the seer an effective no-op) seems to have a normal Tp5 time (this code is already known to not fix the jsreftest issue), so they seem to be two separate issues. Not that this knowledge will make anyone happier (it certainly doesn't make me happier), but it's progress of some sort.
OK, so here's what should be the final version of part 1. The only changes from the previous version are (1) The addition of code to disable the seer on versions of android < 2.3, and (2) turning off the seer xpcshell test on all versions of android. The reason we turn off the test on all versions (instead of just 2.2) is that the code from (1) doesn't work properly on our test infrastructure (it gets the wrong value) while it works perfectly on a real device. Fun!

As discussed in email, here's some perf numbers from WPT with the seer:
- On a modern internet connection like most of us in the US have, up to 10% better SpeedIndex for repeat views of a page
- On a 3G internet connection, up to 23% better SpeedIndex for repeat views

So (as I will also email to dev-tree-management), the tp5 regressions caused by these patches do not reflect the gains that will be seen on real networks. Win!
Attachment #807027 - Attachment is obsolete: true
Attachment #821311 - Flags: superreview+
Attachment #821311 - Flags: review?(mcmanus)
Attachment #810195 - Attachment is obsolete: true
Try run for the newest patch at https://tbpl.mozilla.org/?tree=Try&rev=5786b873e4c9 (I will keep an eye out, though I'm 99% certain I've done a full try run with this patch, and I just lost the link, hence the new run)
As requested on IRC - an explanation of what SpeedIndex actually is: https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index (the tl;dr is it's a measure of user-perceived responsiveness in loading a page, or how fast the page load *feels* versus how fast it actually might be).
Comment on attachment 821311 [details] [diff] [review]
Part 1 - seer implementation (v6)

Review of attachment 821311 [details] [diff] [review]:
-----------------------------------------------------------------

awesome. please document the pages used by the wpt tests.
Attachment #821311 - Flags: review?(mcmanus) → review+
(In reply to Nicholas Hurley [:hurley] from comment #103)
> OK, so here's what should be the final version of part 1. The only changes
> from the previous version are (1) The addition of code to disable the seer
> on versions of android < 2.3, and (2) turning off the seer xpcshell test on
> all versions of android.

Do we understand why the code doesn't work on Android <2.3?

> The reason we turn off the test on all versions
> (instead of just 2.2) is that the code from (1) doesn't work properly on our
> test infrastructure (it gets the wrong value) while it works perfectly on a
> real device. Fun!

That sounds worrisome. Can we investigate in a followup?

> So (as I will also email to dev-tree-management), the tp5 regressions caused
> by these patches do not reflect the gains that will be seen on real
> networks. Win!

Can you include dev.platform (or use it instead of d-t-m)? Seems like a better audience for this.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #107)
> Do we understand why the code doesn't work on Android <2.3?

We don't yet understand why it causes tests on Android < 2.3 to randomly timeout, no. Brad Lassey agreed in email that disabling this feature for Android < 2.3 (with the intent to investigate further after I give my head a rest from this for a bit) was an acceptable tradeoff to get this perf improvement to the majority of our users (versions < 2.3 account for only about 2% of all Android devices in the wild).

> That sounds worrisome. Can we investigate in a followup?

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=931190

> Can you include dev.platform (or use it instead of d-t-m)? Seems like a
> better audience for this.

Will do
At the risk of looking like an awful driveby asshole (please don't take it that way!): did anyone assess what impact this has on battery life and bytes transferred on Android?

Naively I would assume that speculatively opening network connections would be a bad thing, both from a data usage and a battery aspect (because live cell connectivity is a bad thing unless carefully integrated with other network access), but I am hopeful to be proved wrong… particularly given that I haven't thoroughly read the code.

If this impacts data usage, we *need* a user-facing way to turn it off (and, ideally, a component that uses TrafficStats to pay attention to this, track it in FHR, and turn it off automatically). It's easy to assume that faster for the Western world is a big win, but that's not necessarily true for our markets as a whole.

I'm not super keen on masking test failures on Android 2.2, but I'll defer to Gavin and Brad on that.
(In reply to Richard Newman [:rnewman] from comment #111)
> At the risk of looking like an awful driveby asshole (please don't take it
> that way!):

Why you little! :) (Just kidding!)

> did anyone assess what impact this has on battery life and bytes
> transferred on Android?

Officially? No. However, based on knowing how the feature works, I can make some relatively educated not-quite-guesses.

For battery life, I'm not particularly worried at all. On android, the only times this feature (in its current state) could cause the radio to be turned are situations in which the radio is already guaranteed to be turned on (when we're already loading a page), so no problem there.

Data use is a little fuzzier, but let me try to make you feel better about that, too. Since the whole point of this feature is to do things we were already going to do, but sooner (to make things go faster), we aren't going to be opening connections with no regard for whether or not we'll actually use them. There is the chance we'll get it wrong, but my hope is that the algorithm to decide what to do is conservative enough that, when we do get it wrong, we'll only get it slightly wrong (by firing off a DNS request) instead of seriously wrong (by warming up an https connection that involves doing a large (minimum of about 3k) TLS handshake). The "good" news is, TLS is used for a relatively low percentage of resources on the web, so in most cases, setting up a connection will only be the TCP handshake (which is pretty small). That's not a whole lot of data we're sending speculatively, it's likely a blip on all but the tiniest data plans (and possibly even those - I don't know what a non-unlimited data plan looks like any more).

There are a lot of things we can do in the future without turning off the feature to make this even less of an issue - the prediction algorithm could be made even more conservative (though not much), we could limit the number of preconnections (though we *really* don't want to), and there's probably plenty of room to make the prediction algorithm even smarter without necessarily making it more conservative.

I'll note that, as mentioned in comment 103, this feature is an even bigger win for mobile devices (23% better SpeedIndex) than it is for desktop (10% better SpeedIndex), so keeping this on for mobile is even more important than desktop from a performance perspective.
Thanks for the thorough writeup, Nick! Consider my worries addressed.

I took the liberty of flagging this for relnote/feature so that this is watched for the 27 release. CCing Roland and Erin to be aware of this from end-user support, sentiment, and overall release awareness -- big speedups are nice, and this way we can keep an eye open for fallout, too.
Keywords: feature, relnote
Thanks, guys (speedindex looks awesome, too). Are there specific use-cases you can suggest so we can test this out for Firefox for Android?  Adding fennec QA to Cc list for awareness.
Flags: needinfo?(hurley)
Erin, there isn't a whole lot about this that can be specifically tested, unless you want to start looking at packet traces. Certainly, basic things like "do pages load?" should work just like before, and in general, repeat views of a page should "feel" faster (this is what SpeedIndex measures), but that can be pretty subjective. For the power and data usage concerns... if QA wants to test that, then all they need to do is load some pages (with repeat views, to bring the seer into play) and see how things fare compared to older builds.
Flags: needinfo?(hurley)
nick has an awesome summary here.

I just want to underscore that this feature is particularly advantageous for mobile because of mobile's poor bandwidth/latency ratio. And it is really exciting to see speedindex be able to validate that!

in general - the best thing for mobile performance is to attack incidences of scaling-performance-by-rtt, because mobile rtt is so abysmal. Sometimes that means trading something you have an excess of for a more constrained restraint.

In this case that means trading a bit of bandwidth in order to pickup an optimization of something that is latency based - and the result is a big win. We often think of mobile networks as slow and therefore bandwidth constrained - that's certainly true some of the time, but at the early stages of a page load they are actually very poor at using the available bandwidth - and that's the exact moment of time this technique is targetted at.

In general costs of that bandwidth are something that need to be considered as part of the tradeoff, but in this particular case I think its a slam dunk because:

1] we expect a very high hit rate (much of the complexity of the feature is about trying to acheive that) - so we aren't as much using more bandwidth as shifting the bandwidth forward in time. However, we need to acknowledge that there will be some waste

2] wasted bandwidth here is comprised of just handhsakes and dns queries. They're both absolutely tiny - comprising well less than 0.1% of the bandwidth of even a mobile page that is being loaded.. so even a total miss would be lost in the noise. So its a no brainer imo

The more interesting question will be when it is viable to prefetch resources (not just the setup stages). I think we will want to go there - but its clearly a harder problem.
Is target mozilla27 correct?
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #117)
> Is target mozilla27 correct?

Yep, that's correct.
Blocks: 943475
Blocks: 946399
Is there any cap on size of the seer.sqlite file ?  Mines currently at 415meg and growing daily.  That seems awfully large, and folks that backup the entire Profile folder are going to be in for a big surprise I'm thinking.
Yeah, mine's at 116MB on desktop, 21MB on Android.

Desktop: 1,229 entries in moz_redirects, 18,227 in moz_pages, 1,404 in moz_hosts.

The earliest timestamp in moz_pages is from Oct 29, which is 40 days ago. That's also the earliest timestamp in moz_hosts.

I don't see anything in Seer.cpp that might clean this up.

This needs to be capped at a very low limit on Android, ideally introspecting device capabilities to decide on the default limit, pruning regularly, and responding to system low storage notifications.

I'll file a follow-up.
Blocks: 947745
Depends on: 948448
Depends on: 948205
No longer blocks: 947745
Depends on: 947745
Do we take enough measures to make sure that we disable the seer on private docshells?
Flags: needinfo?(hurley)
(Looking at attachment 796700 [details] [diff] [review] I guess the answer is no...)
Actually, we do. The docshell is passed as the load context, which is used in places such as https://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/Seer.cpp#1572
Flags: needinfo?(hurley)
(In reply to comment #123)
> Actually, we do. The docshell is passed as the load context, which is used in
> places such as
> https://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/Seer.cpp#1572

Great, that seems to address my concern.  Seer::Learn and Seer::Predict are the only API entry points from outside of netwerk/, right?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #124)
> Great, that seems to address my concern.  Seer::Learn and Seer::Predict are
> the only API entry points from outside of netwerk/, right?

The only ones that would cause data to be written, rather than erased (Seer::Reset), yes :)
(In reply to comment #125)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #124)
> > Great, that seems to address my concern.  Seer::Learn and Seer::Predict are
> > the only API entry points from outside of netwerk/, right?
> 
> The only ones that would cause data to be written, rather than erased
> (Seer::Reset), yes :)

What's different about erasure?  That could give away what you were doing in PB mode just as well as a regular write can...
Oh, Reset clears *everything*, so yeah it's safe.  Thanks!
Whiteboard: [qa-]
What kind of prediction failure rate can we expect ?

A concern would be that, if Firefox sends too many DNS requests by mistake, an attacker listening to DNS servers could figure out which page the user is visiting within a website.

That's why I disabled DNS prefetching. But if Seer is clever enough that it only sends DNS requests that would really occur anyway, I can keep it enabled.

So how many wrong guesses can we expect with the current and future algorithms ?

Thanks
Hopefully the failure rate is pretty low, though we don't currently have a way to track that (especially not for DNS requests). The seer is designed to be clever, though, only doing a DNS prefetch or a TCP preconnect when it's confident that it would be a useful thing to do. There is a slight risk of unnecessary prefetches or preconnects, but it's specifically designed to be a very low risk.
Ok, I'll keep an eye on how things evolve then :)

Uh... Is there a master bug or some place where discussion on Seer development occurs, similar to this bug ? I haven't completely grasped how you guys are organised and using Bugzilla.

Thanks again.

Since you're the "Product Champion" (love that name :D) I won't be double posting the concern on wrong DNS requests in this thread: https://groups.google.com/forum/#!topic/mozilla.dev.planning/aiV8k4XqvJs
But tell me if you prefer that I do.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: