Closed Bug 949104 Opened 6 years ago Closed 5 years ago

Don't write history visits during HTTP redirection

Categories

(Firefox for Android :: General, defect)

All
Android
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 36

People

(Reporter: rnewman, Assigned: mfinkle)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 4 obsolete files)

One is eliminating straight-up redirects: Comment 1. That we can do by observing automatic page redirects, hiding those interstitial pages. This requires a small schema change -- either hiding history entries altogether, or merging in Sync's notion of visits and moving us closer to Places' schema.

---

The solution Places uses is to mark some pages in history as hidden: 

1111   /**
1112    * Some pages in history are marked "hidden" and thus don't appear by default
1113    * in queries.  These include automatic framed visits and redirects.  Setting
1114    * this attribute will return all pages, even hidden ones.  Does nothing for
1115    * bookmark queries. Defaults to false.
1116    */
1117   attribute boolean includeHidden;

We definitely need that capability to handle JS redirection (though we'd need to be judicious about its application).

We can perhaps hackily handle network redirection in the same way, but really a redirect is a property of a visit, not a page. That means a schema change.
Depends on: 949105
No longer depends on: 949105
Another approach is to simply not record visits for redirects.

Note that this kills a:visited support, which might suck.
Summary: Allow history visits to be marked as hidden during redirection → Don't write history visits during redirection
(In reply to Richard Newman [:rnewman] from comment #1)
> Another approach is to simply not record visits for redirects.
> 
> Note that this kills a:visited support, which might suck.

Yes and no. If we classify the visit a redirect, then we shouldn't expect a:visited to work.
> We can perhaps hackily handle network redirection in the same way, but really a redirect is a property of a visit, 
> not a page. That means a schema change.

We already throw away network redirects. http://mxr.mozilla.org/mozilla-central/source/mobile/android/components/build/nsAndroidHistory.cpp#90
We miss a few things that desktop does though:
* Any "recently visited" page is not saved when visited again within a threshold. I think this is to avoid saving visits on refreshes

* SetURITitle ignores changes to non-top level URIs and obeys the canAddURI check. We obey the canAddURI check in GlobalHistory.update(uri, title) which is where the AndroidHistory::SetURITitle ends up. But we don't do the top-level check because we don't have the data available. Desktop holds a list of "embed URIs" and checks it in the History::SetURITitle call.

Even though we know about network redirects, we need a way to guess that a JS-style or client-side redirect occurred. We could assume that if the user did not type a URL or tap a link, then the page must have caused the load. If the page caused a load within a short threshold from the last load, maybe it was a client-side redirect.
Throwing some more notes in this bug for now:

* We implement CanAddVisit in GlobalHistory.java which is not optimal. nsAndroidHistory.cpp will call VisitURI and SetURITitle via JNI into GeckoAppShell, which starts a background thread and calls the GlobalHistory methods. Thos mthods then call the local CanAddVisit, which can return false. Why bother to make a JNI call and post a runnable to the background thread if we are only going to ignore the visit? Let's move CanAddVisit back into nsAndroidHistory. GlobalHistory.add and GlobalHistory.update are only called via the JNI stub in GeckoAppShell. We should be fine with moving CanAddVisit into nsAndroidHistory.

* The code to support a:visited runs through nsAndroidHistory, via GeckoAppShell, into GlobalHistory. Again, I'd like to move this entirely into nsAndroidHistory, much like desktop's History.cpp does. I think it would be more performant. 

Let's basically work to kill GlobalHistory.java and make a simple JNI API to check, add and update history, which is called from nsAndroidHistory.cpp
Comment 5 is definitely a new bug :)
(In reply to Mark Finkle (:mfinkle) from comment #5)

> Let's basically work to kill GlobalHistory.java and make a simple JNI API to
> check, add and update history, which is called from nsAndroidHistory.cpp

f+


(In reply to Mark Finkle (:mfinkle) from comment #6)
> Comment 5 is definitely a new bug :)

File it! :D
> (In reply to Mark Finkle (:mfinkle) from comment #6)
> > Comment 5 is definitely a new bug :)
> 
> File it! :D

bug 949541
This patch adds a delayed DB write for history visits. The nsDocShell code will send us flags for temporary and permanent redirects, but those flags are sent with the page being redirected to, not the page doing the redirection. Therefore, we need to delay writing the visits until we are sure the visit is not redirecting somewhere else.

I use a nsITimer to perform a delay. If we are told of a redirect, we cancel the timer and the previous visit is not saved.

I want to make the timeout delay become a preference, but also want some feedback. I use nsRefPtr for the timer and the cached visit, and I'm wondering if that's correct.
Assignee: nobody → mark.finkle
Attachment #8348138 - Flags: feedback?(rnewman)
Attachment #8348138 - Flags: feedback?(blassey.bugs)
Comment on attachment 8348138 [details] [diff] [review]
Don't write redirects to history v0.1

Review of attachment 8348138 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/components/build/nsAndroidHistory.cpp
@@ +116,5 @@
> +  nsAutoCString uri;
> +  nsresult rv = mPendingVisitURI->GetSpec(uri);
> +  if (NS_FAILED(rv)) return rv;
> +  NS_ConvertUTF8toUTF16 uriString(uri);
> +  GeckoAppShell::MarkURIVisited(uriString);

I think MarkURIVisited should still be called unconditionally, rather than being hidden in the timer callback.

@@ +156,5 @@
> +  if (aFlags & VisitFlags::REDIRECT_PERMANENT || aFlags & VisitFlags::REDIRECT_TEMPORARY) {
> +    __android_log_print(ANDROID_LOG_INFO, "DOCSHELL", "    don't save the pending visit (temporary or permanent)");
> +    // We want to ignore this, so cancel the pending visit write.
> +    mTimer->Cancel();
> +    mPendingVisitURI = 0;

I think we could benefit from something a little smarter here.

aLastVisitedURI is the last URI in the visit chain. That means it'll be correct even in the presence of multiple simultaneous connections, which mPendingVisitURI won't be without your "assume..." canceling below.

Consider doing this:

* Keep a collection of pending URIs.
* When VisitURI is called *without* redirection, you can immediately record a visit for aLastVisitedURI, and remove that URI from the pending set.
* When VisitURI is called *with* redirection, just remove that URI from the pending set.
* Regardless, track the new URI.
* Omit the timer altogether -- track timestamps in the pending set, and process the whole set in this method, and then once when we die. Or add a hook for when pages are closed.

This will:

* Work better in the presence of multiple connections.
* Perhaps handle the "click here if you're not redirected in ten seconds" case.
* Avoid the need to cancel the timer (or avoid the timer altogether).

If you take a look at nsAndroidHistory::Run and mPendingURIs, you can see that this pattern pretty much already exists, though I confess to not knowing when or how that gets called.

::: mobile/android/components/build/nsAndroidHistory.h
@@ +44,5 @@
>  
>    nsresult CanAddURI(nsIURI* aURI, bool* canAdd);
>  
>    /**
> +   * Redirection (temporary and permanent) flags are snet with the redirected

s/snet/sent
Attachment #8348138 - Flags: feedback?(rnewman) → feedback+
(In reply to Richard Newman [:rnewman] from comment #10)

> > +  if (NS_FAILED(rv)) return rv;
> > +  NS_ConvertUTF8toUTF16 uriString(uri);
> > +  GeckoAppShell::MarkURIVisited(uriString);
> 
> I think MarkURIVisited should still be called unconditionally, rather than
> being hidden in the timer callback.

Calling MarkURIVisited writes to the DB. Tell me more. I don't think you want the URI written unconditionally.

> @@ +156,5 @@
> > +  if (aFlags & VisitFlags::REDIRECT_PERMANENT || aFlags & VisitFlags::REDIRECT_TEMPORARY) {
> > +    __android_log_print(ANDROID_LOG_INFO, "DOCSHELL", "    don't save the pending visit (temporary or permanent)");
> > +    // We want to ignore this, so cancel the pending visit write.
> > +    mTimer->Cancel();
> > +    mPendingVisitURI = 0;
> 
> I think we could benefit from something a little smarter here.
> 
> aLastVisitedURI is the last URI in the visit chain. That means it'll be
> correct even in the presence of multiple simultaneous connections, which
> mPendingVisitURI won't be without your "assume..." canceling below.
> 
> Consider doing this:
> 
> * Keep a collection of pending URIs.
> * When VisitURI is called *without* redirection, you can immediately record
> a visit for aLastVisitedURI, and remove that URI from the pending set.
> * When VisitURI is called *with* redirection, just remove that URI from the
> pending set.
> * Regardless, track the new URI.
> * Omit the timer altogether -- track timestamps in the pending set, and
> process the whole set in this method, and then once when we die. Or add a
> hook for when pages are closed.

I agree we can't depend on a single mPendingVisitURI, so I'll look into this. Might not end up exactly as you suggest, but we'll see.
Comment on attachment 8348138 [details] [diff] [review]
Don't write redirects to history v0.1

Review of attachment 8348138 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on irc, I'd much rather annotate the DB when a redirect happens than use this timer

::: mobile/android/components/build/nsAndroidHistory.cpp
@@ +116,5 @@
> +  nsAutoCString uri;
> +  nsresult rv = mPendingVisitURI->GetSpec(uri);
> +  if (NS_FAILED(rv)) return rv;
> +  NS_ConvertUTF8toUTF16 uriString(uri);
> +  GeckoAppShell::MarkURIVisited(uriString);

+1
Attachment #8348138 - Flags: feedback?(blassey.bugs) → feedback-
Blocks: 947390
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #13)
> Comment on attachment 8348138 [details] [diff] [review]
> Don't write redirects to history v0.1
> 
> Review of attachment 8348138 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> As discussed on irc, I'd much rather annotate the DB when a redirect happens
> than use this timer

We should never write / delete to the DB. We should always try to avoid that. For example, navigating to http://twitter.com causes this flow:

http://twitter.com ->
https://twitter.com ->
http://mobile.twitter.com ->
https://mobile.twitter.com

Currently we touch the DB 4 times.
Using the patch, we only touch the DB once: writing https://mobile.twitter.com
Using a write/delete approach would would touch the DB 7 times.
(In reply to Richard Newman [:rnewman] from comment #10)

> * Keep a collection of pending URIs.
> * When VisitURI is called *without* redirection, you can immediately record
> a visit for aLastVisitedURI, and remove that URI from the pending set.
> * When VisitURI is called *with* redirection, just remove that URI from the
> pending set.
> * Regardless, track the new URI.
> * Omit the timer altogether -- track timestamps in the pending set, and
> process the whole set in this method, and then once when we die. Or add a
> hook for when pages are closed.

I am on-board with most of this. I still think we need a timer to process the pending URI set. We need triggers to process the pending URIs. One trigger is the VisitURI call itself. You suggest using "when we die" as another. This is not realistic since Firefox never cleanly "dies" on Android. You also suggest a "hook for when pages are closed", but even that will orphan the last open tabs. Not to mention add a complexity.

No, I think the timer is a suitable approach, and given how many other parts of the Mozilla platform use it for very similar purposes, I think we need a real reason for trying to avoid it.

> If you take a look at nsAndroidHistory::Run and mPendingURIs, you can see
> that this pattern pretty much already exists, though I confess to not
> knowing when or how that gets called.

This code is used to process the a:visited logic and is not that helpful for the redirection detection.
(In reply to Mark Finkle (:mfinkle) from comment #15)

> No, I think the timer is a suitable approach, and given how many other parts
> of the Mozilla platform use it for very similar purposes, I think we need a
> real reason for trying to avoid it.

Sure, consider me convinced. So let's use a timer to make sure we do the rest of the work a few seconds after the last time we added to the pending URIs collection, and cancel it every time we remove the last item.

> This code is used to process the a:visited logic and is not that helpful for
> the redirection detection.

Wasn't suggesting that for *use*; just pointing out that there's a little pattern of a collection of pending URIs.
Status: NEW → ASSIGNED
This patch addresses most of the feedback, except it still uses a timer:
* Keep a collection of pending URIs.
* When VisitURI is called *without* redirection and it's pending, we immediately record a visit for aLastVisitedURI, and remove that URI from the pending set.
* When VisitURI is called *with* redirection, just remove that URI from the pending set.
* Regardless, track the new URI.
* We use a timer to process pending visits and save them to the DB.

This works for the 30x redirects I have seen. It even works for some pages that use meta refresh *and* location.replace changes. But it won't work for straight up location.replace/assign changes. We'll need additional data for that.

In my testing the following redirecting sites only save the final visit:
* http://drive.google.com
* http://cnn.com
* http://nytimes.com
* http://mozilla.org

Twitter works completely on the first load. No redirections are saved. On subsequent loads, one of the redirects is saved. It must have something to do with caching rules. The redirect is no longer seen by Gecko as a redirect, so we save it.

Also, any t.co redirects are not blocked.
Attachment #8348138 - Attachment is obsolete: true
Duplicate of this bug: 1029324
Unbitrotted.

This patch adds support for a "pending visit" concept. We hold any incoming visit in a queue for at most 3 secs. If we discover the visit was a redirect, we remove it from the queue. Otherwise, we save the visit as soon as we know it's not a redirect, or after the 3 second timeout as a fail safe.

Besides keeping some redirects out of our history, it also helps improve performance because saving to the DB is not free.

For example, with the patch:
http://twitter.com skips two redirects
http://drive.google.com skips two redirects
Attachment #8349903 - Attachment is obsolete: true
This patch removes some logging cruft left over. I tested the list in Comment 17 with the same outcome:

In my testing the following redirecting sites only save the final visit:
* http://drive.google.com
* http://cnn.com
* http://nytimes.com
* http://mozilla.org

Twitter works completely on the first load. No redirections are saved. On subsequent loads, one of the redirects is saved. It must have something to do with caching rules. The redirect is no longer seen by Gecko as a redirect, so we save it.

Also, any t.co redirects are not blocked, but handled by bug 838332.

Try run looks good:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1e45766b86cc
Attachment #8524634 - Attachment is obsolete: true
Attachment #8524728 - Flags: review?(rnewman)
Comment on attachment 8524728 [details] [diff] [review]
android-history-no-redirects v0.4

Review of attachment 8524728 [details] [diff] [review]:
-----------------------------------------------------------------

One follow-up to file, and two spots where I think your array handling might need tweaks.

::: mobile/android/components/build/nsAndroidHistory.cpp
@@ +170,5 @@
> +  PendingVisitArray::index_type i;
> +  for (i = 0; i < mPendingVisitURIs.Length() && !equals; ++i) {
> +    aURI->Equals(mPendingVisitURIs.ElementAt(i), &equals);
> +    if (equals) {
> +      mPendingVisitURIs.RemoveElementAt(i);

Consider pending = ["A", "A", "B"];

i=0 is a match. We remove element 0. i is then preincremented to 1, and pending is now ["A", "B"].

We check "B" next.

I think you want to remove the ++i from the loop header, and increment it in an 'else' here.

@@ +173,5 @@
> +    if (equals) {
> +      mPendingVisitURIs.RemoveElementAt(i);
> +    }
> +  }
> +  return equals;

I don't think this is right.

Consider aURI = "A", pending = ["A", "B", "A", "C"].

We'll loop and set equals to true, false, true, false.

Then we'll return the last one -- false.

@@ +183,5 @@
> +  // Any pending visits left in the queue have exceeded our threshold for
> +  // redirects, so save them
> +  PendingVisitArray::index_type i;
> +  for (i = 0; i < mPendingVisitURIs.Length(); ++i) {
> +    SaveVisitURI(mPendingVisitURIs.ElementAt(i));

This will flush N database operations into the background thread.

I'd love a follow-up that makes SaveVisitURI/MarkURIVisited suck less.

@@ +200,5 @@
> +    mozilla::widget::android::GeckoAppShell::MarkURIVisited(NS_ConvertUTF8toUTF16(spec));
> +  }
> +
> +  // Add the URI to our cache so we can take a fast path later
> +  AppendToRecentlyVisitedURIs(aURI);

Perhaps do this first?

(And I wonder if this is worth doing for all the pending visit URIs in one go around line 185...)
Attachment #8524728 - Flags: review?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #21)
> Comment on attachment 8524728 [details] [diff] [review]
> android-history-no-redirects v0.4
> 
> Review of attachment 8524728 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> One follow-up to file, and two spots where I think your array handling might
> need tweaks.
> 
> ::: mobile/android/components/build/nsAndroidHistory.cpp
> @@ +170,5 @@
> > +  PendingVisitArray::index_type i;
> > +  for (i = 0; i < mPendingVisitURIs.Length() && !equals; ++i) {
> > +    aURI->Equals(mPendingVisitURIs.ElementAt(i), &equals);
> > +    if (equals) {
> > +      mPendingVisitURIs.RemoveElementAt(i);
> 
> Consider pending = ["A", "A", "B"];
> 
> i=0 is a match. We remove element 0. i is then preincremented to 1, and
> pending is now ["A", "B"].
> 
> We check "B" next.
> 
> I think you want to remove the ++i from the loop header, and increment it in
> an 'else' here.

There is an | !equals | check in the for loop condition, so we break out of the loop once we find a matching URI. I'm trying to decide if that's a good thing or not. It's been a while since i wrote that part.

> > +    if (equals) {
> > +      mPendingVisitURIs.RemoveElementAt(i);
> > +    }
> > +  }
> > +  return equals;
> 
> I don't think this is right.
> 
> Consider aURI = "A", pending = ["A", "B", "A", "C"].
> 
> We'll loop and set equals to true, false, true, false.
> 
> Then we'll return the last one -- false.

Not if we break out of the loop on the first match.

> > +  // Any pending visits left in the queue have exceeded our threshold for
> > +  // redirects, so save them
> > +  PendingVisitArray::index_type i;
> > +  for (i = 0; i < mPendingVisitURIs.Length(); ++i) {
> > +    SaveVisitURI(mPendingVisitURIs.ElementAt(i));
> 
> This will flush N database operations into the background thread.
> 
> I'd love a follow-up that makes SaveVisitURI/MarkURIVisited suck less.

Some way to batch them might be nice. We should never have too many in there though.

> > +  // Add the URI to our cache so we can take a fast path later
> > +  AppendToRecentlyVisitedURIs(aURI);
> 
> Perhaps do this first?

No problem with moving it

> (And I wonder if this is worth doing for all the pending visit URIs in one
> go around line 185...)

You'll have to tell me more about what you're thinking
(In reply to Mark Finkle (:mfinkle) from comment #22)

> > ::: mobile/android/components/build/nsAndroidHistory.cpp
> > @@ +170,5 @@
> > > +  PendingVisitArray::index_type i;
> > > +  for (i = 0; i < mPendingVisitURIs.Length() && !equals; ++i) {
> > > +    aURI->Equals(mPendingVisitURIs.ElementAt(i), &equals);
> > > +    if (equals) {
> > > +      mPendingVisitURIs.RemoveElementAt(i);
> > 
> > Consider pending = ["A", "A", "B"];
> > 
> > i=0 is a match. We remove element 0. i is then preincremented to 1, and
> > pending is now ["A", "B"].
> > 
> > We check "B" next.
> > 
> > I think you want to remove the ++i from the loop header, and increment it in
> > an 'else' here.
> 
> There is an | !equals | check in the for loop condition, so we break out of
> the loop once we find a matching URI. I'm trying to decide if that's a good
> thing or not. It's been a while since i wrote that part.

I did add a comment giving more context about the method
(In reply to Mark Finkle (:mfinkle) from comment #22)

> There is an | !equals | check in the for loop condition, so we break out of
> the loop once we find a matching URI. 

Ah, I missed that.


> I'm trying to decide if that's a good
> thing or not. It's been a while since i wrote that part.

It's a good thing if we never add a duplicate, treating the array as a set. If so, we should just early return from inside the loop and skip the whole `equals` thing. If not, we should account for duplicates and process the entire array. Right now we're just assuming that we'll never visit the same URL twice, which is a little optimistic, I think.


> > I'd love a follow-up that makes SaveVisitURI/MarkURIVisited suck less.
> 
> Some way to batch them might be nice. We should never have too many in there
> though.

Yeah, I just want to make sure that some kind of session restore/open multiple links in tabs or simply clicking once every two seconds for a while doesn't result in a big blast of five, ten, fifteen separate database writes, each of which takes 50-100msec, and will occur three seconds after the user's last action, right when they're starting to scroll.
 

> > (And I wonder if this is worth doing for all the pending visit URIs in one
> > go around line 185...)
> 
> You'll have to tell me more about what you're thinking

Both of these are basically arrays, and we're trying to do two things: add all of these to the cache (then resize down if it got too big) and add all of these to the DB. An 'addAll' primitive would make this simpler and more efficient in both cases.
* Added a comment to RemovePendingVisitURI
* Return directly from the if block in RemovePendingVisitURI to make it more clear
* Moved the AppendToRecentlyVisitedURIs call to the top of SaveVisitURI

VisitURI does try to ignore duplicate URIs by checking aURI and aLastVisitedURI. It also tries to flush the mPendingVisitURIs array quickly if we know aLastVisitedURI is not a redirect.
Attachment #8524728 - Attachment is obsolete: true
Attachment #8524855 - Flags: review?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #24)

> > > I'd love a follow-up that makes SaveVisitURI/MarkURIVisited suck less.
> > 
> > Some way to batch them might be nice. We should never have too many in there
> > though.
> 
> Yeah, I just want to make sure that some kind of session restore/open
> multiple links in tabs or simply clicking once every two seconds for a while
> doesn't result in a big blast of five, ten, fifteen separate database
> writes, each of which takes 50-100msec, and will occur three seconds after
> the user's last action, right when they're starting to scroll.

I wonder if this should be part of bug 949541 or something separate.
Maybe. Or maybe BrowserProvider should be buffering writes -- after all, it's the component that's least likely to get shut down early.
Comment on attachment 8524855 [details] [diff] [review]
android-history-no-redirects v0.5

Review of attachment 8524855 [details] [diff] [review]:
-----------------------------------------------------------------

Ship it!

::: mobile/android/components/build/nsAndroidHistory.cpp
@@ +173,5 @@
> +  for (i = 0; i < mPendingVisitURIs.Length(); ++i) {
> +    aURI->Equals(mPendingVisitURIs.ElementAt(i), &equals);
> +    if (equals) {
> +      mPendingVisitURIs.RemoveElementAt(i);
> +      return equals;

return true;

@@ +176,5 @@
> +      mPendingVisitURIs.RemoveElementAt(i);
> +      return equals;
> +    }
> +  }
> +  return equals;

return false;
Attachment #8524855 - Flags: review?(rnewman) → review+
I filed Bug 1101240 for batching history writes.
We still need to find a better solution for JS based redirection.
Summary: Don't write history visits during redirection → Don't write history visits during HTTP redirection
There is a nagging assertion happening:

I/Gecko   ( 2248): [2248] ###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file ../../dist/include/nsCOMPtr.h, line 402

I am trying to debug
Found it:

-NS_IMPL_ISUPPORTS(nsAndroidHistory, IHistory, nsIRunnable)
+NS_IMPL_ISUPPORTS(nsAndroidHistory, IHistory, nsIRunnable, nsITimerCallback)

All green, even Debug:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=814227d3a121

https://hg.mozilla.org/integration/fx-team/rev/0ed3e26cbc2b
https://hg.mozilla.org/mozilla-central/rev/0ed3e26cbc2b
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 36
You need to log in before you can comment on or make changes to this bug.