Closed Bug 421067 Opened 12 years ago Closed 12 years ago

"Try Again" after a DNS error sometimes loads previous page, not the one that failed to load

Categories

(Core :: DOM: Navigation, defect, P2)

x86
All
defect

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: myk, Assigned: michal)

References

Details

(Keywords: regression)

Attachments

(2 files, 2 obsolete files)

If I click the "Try Again" button in an error page reporting a DNS error ("Address Not Found"), and DNS resolution works the second time, the error page sometimes gets replaced by the page that had previously been loaded into that tab, not the page that prompted the DNS error.

Steps to Reproduce:

1. load a page in a tab;
2. turn off DNS resolution (f.e. disable networking);
3. try to load another page in that tab;
4. when the error page appears, reenable DNS resolution and click "Try Again".

Expected results: the second page you tried to load appears.

Actual results: the first page you loaded appears.

Unfortunately, I can't reliably reproduce this.
I've seen this too on Windows
OS: Linux → All
Duplicate of this bug: 421544
Flags: blocking-firefox3?
(At least in my case) When this issue happens, if you clicked back again, you stays on the same page you're already. For an outsider like me, it seems that Firefox haven't lost track on what page it is now, but it lost track on what page to show after a Try Again or Back button. (i.e. after a Try Again, it sets itself to the correct page, but showed the wrong one instead)
You can also reproduce this by attempting to visit a url with an entirely bogus hostname.

steps to reproduce:

1.  Navigate to http://xxx.yyy.zzz/ by entering it in the URL bar
2.  Click the Try again button and watch it fail a second time
3.  Click Try again a second time.
I can only reproduce this bug when there's actually some history in the tab ie:

1. spawn this in new tab: Bug 421067
2. click the URL in Comment #5

no matter how many times you hit try again it does not seem to reproduce the bug, but hit the back button and then click the URL again, this time the try again button will redirect you back to the bug after the 2nd try.
--> Core::Docshell
Component: General → Embedding: Docshell
Flags: blocking-firefox3?
Product: Firefox → Core
QA Contact: general → docshell
Carrying over blocking flag, but not sure if it should. My major concern here is exploitability, maybe?
Flags: blocking1.9?
It should (carry blocking flag) because if this bug is let through it'd turn everybody away from using Firefox, nobody would trust a browser that have such an obvious major bug in its "stable" version. I said major, cause back and forward (and the Page not found error page) is something that 99.99% people uses regularly. Bugs in RSS or History is something that people can tolerate, bugs in address box or back-and-forward or refresh or stop is a bug that hurts the reputation of the browser more than causing direct harm itself.
Blocking because this is ugly.

Lie Ryan: please save those comments for elsewhere. We hear that about every bug, saying that *this* bug is the one that needs to be fixed isn't going to help our blocking decisions one way or the other.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Assignee: nobody → michal
Actually I realized that that wasn't a very helpful nor informative nor useful post. But actually I have a good nose that can smell wbhether a bug is a must block and what bug should possibly be blocking bug. And I realized that this bug isn't one that is going to be controversial.
Status: NEW → ASSIGNED
I reported bug 424001 which could be related to this issue or based on the same core bug.
I see this only with the official nightly.  I am completely unable to reproduce this with any build I produce myself from the current source tree.  I would bet this is somehow related to profile guided optimization.
Blocks: 418865
Keywords: regression
No longer blocks: 418865
I don't think so. Running my debug build with the STR in bug 424001 is identical to an official nightly build. After the second try it goes back to the first history entry for this tab.
(In reply to comment #14)
> I don't think so. Running my debug build with the STR in bug 424001 is
> identical to an official nightly build. After the second try it goes back to
> the first history entry for this tab.
> 

It would seem you are right. Now 10 minutes later I can no longer reproduce this with today's official nightly using the exact same procedure that did reproduce it before.  So, I guess it is just not 100% reliably reproducible, which, of course, will make it difficult to narrow down a regression range. ;-(
I can reproduce it consistently:

1) type http://www.mozilla.org/ in URL bar
2) type http://www.mozilla.com/ in URL bar
3) go back in history
4) type http://www.mozilla.xxx/ in URL bar
5) press "Try Again"
6) press "Try Again" -> www.mozilla.org is loaded

Attached patch patch (obsolete) — Splinter Review
Bug is IMHO caused by patch from bug #302115. See http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvsroot&subdir=/mozilla/docshell/base&command=DIFF_FRAMESET&file=nsDocShell.cpp&rev2=1.771&rev1=1.770

Member mRequestedIndex in nsSHistory is changed only in LoadEntry() method (i.e. when working with history) but this method isn't called when new URL is typed in URL bar. In step 5 in comment #16 there is mIndex==1 but mRequestedIndex==0 (from step 3) in mSessionHistory.

I have no idea why patch for bug #302115 couldn't use GetIndex() instead of GetRequestedIndex(). In all my cases this code is called after nsSHistory::UpdateIndex() where mIndex is synchronized with mRequestedIndex. Anyway this patch keeps using mRequestedIndex in case that UpdateIndex() would be called after http://mxr.mozilla.org/firefox/source/docshell/base/nsDocShell.cpp#3223, but if it is -1 it uses mIndex. And mRequestedIndex is set to -1 in UpdateIndex().
Attachment #311593 - Flags: review?(benjamin)
Attachment #311593 - Flags: review?(benjamin) → review?(bzbarsky)
Comment on attachment 311593 [details] [diff] [review]
patch

This looks good, with the following provisos:

1)  You add documentation to the idl that says that requestedIndex can be -1 and what that means.
2)  You add some mochitests for this bug and bug 302115.

It's likely that this patch also fixes bug 424001.  If so, can you add a test for that too?

This patch might want to land in the beta, but I can understand if the tests don't land until after beta given the time-crunch.
Attachment #311593 - Flags: review?(bzbarsky) → review+
Attached patch patch v2Splinter Review
(In reply to comment #18)
> (From update of attachment 311593 [details] [diff] [review])
> This looks good, with the following provisos:
> 
> 1)  You add documentation to the idl that says that requestedIndex can be -1
> and what that means.

Added.

> 2)  You add some mochitests for this bug and bug 302115.

I'll do it, first I need to learn how to write them ;)

> It's likely that this patch also fixes bug 424001.  If so, can you add a test
> for that too?

I can confirm that it fixes 424001 too. I'll try to provide testcase for it too.

> This patch might want to land in the beta, but I can understand if the tests
> don't land until after beta given the time-crunch.

OK
Attachment #311593 - Attachment is obsolete: true
Keywords: checkin-needed
Michal, same applies for bug 407548 which should also be fixed by your patch. Any more possible cases?

Due to both bugs rely on the same core issue, I will mark them as dupes of this one.
Duplicate of this bug: 424001
Duplicate of this bug: 407548
(In reply to comment #19)
> Created an attachment (id=311620) [details]
> patch v2

No sr+ needed?
Attachment #311620 - Flags: superreview?(benjamin)
Comment on attachment 311620 [details] [diff] [review]
patch v2

bz said in comment #18 that this might want to go into the beta.
Attachment #311620 - Flags: approval1.9b5?
Attachment #311620 - Flags: superreview?(benjamin) → superreview+
Comment on attachment 311620 [details] [diff] [review]
patch v2

No need to rush this into the beta, I don't think. Let's wait for the reftests and get it in post-beta.
Attachment #311620 - Flags: approval1.9b5?
Attachment #311620 - Flags: approval1.9b5-
Attachment #311620 - Flags: approval1.9?
Attachment #311620 - Flags: approval1.9?
Duplicate of this bug: 425552
Waiting on the tests...
Flags: in-testsuite?
Keywords: checkin-needed
Target Milestone: --- → mozilla1.9
Attached patch mochitest (obsolete) — Splinter Review
The test works, but I don't like the function checkTestWindow(). I tried to listen to "load" event using:

  testwindow = window.open("...", "testwindow");
  testwindow.addEventListener("load", doNextStep);

But it doesn't work. Also the test takes really long time due to timeout when loading www.nonexistentdomain.zzz. Is there some URL in mochikit that will fail quickly?

If the test is acceptable I'll write test for #302115 in the same way. But I don't know how to write test for #424001.
Attachment #313095 - Flags: review?(bzbarsky)
Comment on attachment 313095 [details] [diff] [review]
mochitest

>Index: docshell/test/test_bug421067.html

>+  <script type="text/javascript" src="chrome://mochikit/content/MochiKit/MochiKit.js"></script>
>+  <script type="text/javascript" src="chrome://mochikit/content/tests/SimpleTest/SimpleTest.js"></script>
>+  <link rel="stylesheet" type="text/css" href="chrome://mochikit/content/tests/SimpleTest/test.css" />

For a non-chrome mochitest, those shouldn't be chrome URIs, should they?

>+function checkTestWindow(callback)

Is there a reason to be using window.open() instead of a subframe (such that you could listen to the load event on the subframe)?  Is the bug only happening with bfcache?

>+  case 2:
>+    is(testwindow.location,
>+      "http://test1.example.org/tests/docshell/test/bug421067-page.html",
>+      "Unexpected window location in step 2");
>+    testwindow.document.body.innerHTML = "";

How does that work?  Isn't that a cross-origin access?  I would expect this to throw...

>+  case 3:
...
>+    testwindow.history.back();
>+    postCheckTestWindow(doNextStep);

And here, I wouldn't expect doNextStep to ever get called if the window got bfcached, since the innerHTML set back in case 2 will mean that the innerHTML is empty.  Is the window in fact not being bfcached?

>+  case 4:
>+    testwindow.location = "http://www.nonexistentdomain.zzz/";
>+    testwindow.history.back();

Uh...  That doesn't make much sense to me.  Shouldn't you wait for the location load to error out before doing the history.back()?

>+  case 7:
...
>+    SimpleTest.finish();
>+    testwindow.close();

You probably want to close() before finish() if you do end up having to use a window.

Also, if you do end up using a window, I'd put "width=100,height=100" features on it.  That will help with some automated test running.

> Is there some URL in mochikit that will
> fail quickly?

For DNS, probably not.  I wonder whether returning no data from the server or something would be possible using an sjs to produce a different error page....
(In reply to comment #29)
> (From update of attachment 313095 [details] [diff] [review])
> >Index: docshell/test/test_bug421067.html
> 
> For a non-chrome mochitest, those shouldn't be chrome URIs, should they?

Yes, I was testing listening to onload event under chrome and forgot to revert it back.

> Is there a reason to be using window.open() instead of a subframe (such that
> you could listen to the load event on the subframe)?  Is the bug only happening
> with bfcache?

I just tested it and I can't reproduce the bug when using iframe.

> >+  case 2:
> >+    is(testwindow.location,
> >+      "http://test1.example.org/tests/docshell/test/bug421067-page.html",
> >+      "Unexpected window location in step 2");
> >+    testwindow.document.body.innerHTML = "";
> 
> How does that work?  Isn't that a cross-origin access?  I would expect this to
> throw...

By assigning "" to innerHTML I stopped loading the page if it was still in progress. I agree that it is a hack, but it didn't throw.

> >+  case 3:
> ...
> >+    testwindow.history.back();
> >+    postCheckTestWindow(doNextStep);
> 
> And here, I wouldn't expect doNextStep to ever get called if the window got
> bfcached, since the innerHTML set back in case 2 will mean that the innerHTML
> is empty.  Is the window in fact not being bfcached?

Don't know the details but doNextStep was called.

> >+  case 4:
> >+    testwindow.location = "http://www.nonexistentdomain.zzz/";
> >+    testwindow.history.back();
> 
> Uh...  That doesn't make much sense to me.  Shouldn't you wait for the location
> load to error out before doing the history.back()?

Hmm, history.back() shouldn't be there.

Since the bug cannot be reproduced in iframe and this seems to be really hack, I think that maybe it can be implemented as browser-chrome test. There shouldn't be problem to listen to onload event. Am I right?
> I just tested it and I can't reproduce the bug when using iframe.

In that case, this test probably needs some additional checks to make sure it's hitting the bfcache (that is, that it's in the situation of this bug).

> By assigning "" to innerHTML I stopped loading the page 

That shouldn't be the case.  I don't understand why this works, and that bothers me.

You might be able to implement this in browser chrome, but note that error pages don't fire load/show events (which is its own bug).
(In reply to comment #31)
> > I just tested it and I can't reproduce the bug when using iframe.
> 
> In that case, this test probably needs some additional checks to make sure it's
> hitting the bfcache (that is, that it's in the situation of this bug).

How can this be checked?

> > By assigning "" to innerHTML I stopped loading the page 
> 
> That shouldn't be the case.  I don't understand why this works, and that
> bothers me.
> 
> You might be able to implement this in browser chrome, but note that error
> pages don't fire load/show events (which is its own bug).

This problem I had when I was trying to reproduce it using iframe. I didn't find a way to get any usable notification, so I tested it manually. So how can I recognize that error page was displayed?
> How can this be checked?

One option is setting some properties on the window, then when you come back to it via history seeing whether they are still there.  There are existing mochitests that do this; grepping for "bfcache" in mochitests should find them...

> So how can I recognize that error page was displayed?

Probably need to poll and look for the about:neterror thing?  biesi might know more.
Attached patch new testSplinter Review
Unfortunately the test cannot reproduce the bug always. It seems that increasing number of reloads raises the probability of failing. So there are 3 reloads instead of 2.
Attachment #313095 - Attachment is obsolete: true
Attachment #314874 - Flags: review?(bzbarsky)
Attachment #313095 - Flags: review?(bzbarsky)
That seems really odd, given that by hand it's 100% reproducible...

In any case, it'll be at least a week before I can possibly look at this.
Bz can you recommend an alternate reviewer?
Maybe biesi?

In any case, this is just for the test.  I think the patch is good and we should take it, even as we work on getting the test to be reliable, etc.
Comment on attachment 311620 [details] [diff] [review]
patch v2

Requesting approval per bz's comment #37.
Attachment #311620 - Flags: approval1.9?
Attachment #311620 - Flags: approval1.9? → approval1.9+
mozilla/docshell/base/nsDocShell.cpp 	1.896
mozilla/docshell/shistory/public/nsISHistory.idl 	1.16
mozilla/docshell/shistory/src/nsSHistory.cpp 	1.87
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Comment on attachment 314874 [details] [diff] [review]
new test

OK.  So looking at the test, why use nsIWebProgressListener?  What's the magic "511" value?  You just want to listen for onload, right?

How come you're back to using a browser test to test core functionality?  This would ideally be tested in a regular mochitest without relying on the Firefox UI, no?
Attachment #314874 - Flags: review?(bzbarsky) → review-
(In reply to comment #40)
> (From update of attachment 314874 [details] [diff] [review])
> OK.  So looking at the test, why use nsIWebProgressListener?  What's the magic
> "511" value?  You just want to listen for onload, right?

511 - nsIWebProgress::NOTIFY_ALL
Are you sure that you can listen for onload of error page?

> How come you're back to using a browser test to test core functionality?  This
> would ideally be tested in a regular mochitest without relying on the Firefox
> UI, no?

It's a long time ago I wrote this test, so I don't remember details, but I know that there was some problem when I wrote it as mochitest.
> 511 - nsIWebProgress::NOTIFY_ALL

So use Components.interfaces.nsIWebProgress.NOTIFY_ALL?

> Are you sure that you can listen for onload of error page?

Gah.  That bug again.  You might not be able to, indeed.  I'm just still trying to figure out why this is not 100% reproducible with the test, while it's 100% reproducible manually... any ideas?
I've spent some more time on this and I think I know where the problem is. Following methods of nsDocShell are called in this order when this bug appears:

1) Reload()        mOSHE=correct, mLSHE=correct
2) DoURILoad()     aURI=http://test1.example.org/tests/docshell/test/bug421067-errorpage.html
3) Embed()         mOSHE=mLSHE, so both are correct
4) nsDocLoader::doStopDocumentLoad() here is called nsIWebProgressListener::onStateChange() with aStateFlags = STATE_STOP | STATE_IS_WINDOW
5) EndPageLoad()   mLSHE=null
6) LoadErrorPage() mLSHE=wrong due to this bug
7) DoURILoad()     aURI=about:neterror?e=contentEncodingError...
8) Embed()         mOSHE=mLSHE, so both are now wrong

Next Reload() will load the wrong page.

In the testcase in nsIWebProgressListener::onStateChange() is set a timer for the next step where Reload() is called again. This Reload() is sometimes called before step 8. So mOSHE stays correct and the bug doesn't appear.
OK.  So how come we need to set a timer?  I thought the whole point of using nsIWPL here was so that we'd detect the end of the error page load, right?
(In reply to comment #44)
> OK.  So how come we need to set a timer?  I thought the whole point of using
> nsIWPL here was so that we'd detect the end of the error page load, right?

IMHO both are needed. nsIWPL is there to be able to detect that error page load has finished (we can't write the testcase without that). Timer is there to be able to reproduce the bug (the testcase will succeed without that).
Wait.  By "error page load has finished" what do we mean?  Wouldn't Embed() happen before it's finished?
(In reply to comment #46)
> Wait.  By "error page load has finished" what do we mean?  Wouldn't Embed()
> happen before it's finished?

OK, you are right. From comment #43 it is obvious that nsIWPL is notified after loading of requested page failed but before loading of error page begins. And so the timer is needed to ensure that Reload() is called after the second Embed() happen.
So we send no nsIWPL notifications for the error page?
(In reply to comment #48)
> So we send no nsIWPL notifications for the error page?

Yes, exactly. There is notification when the request fail, but there is no notification when error page is loaded.
Gah.  biesi, is there really no way to detect when the error page is done loading?

Can we poll on some aspect of the error page js state or DOM or something?
You need to log in before you can comment on or make changes to this bug.