Closed Bug 522545 Opened 10 years ago Closed 10 years ago

some tabs not loaded, are stuck in zombie mode

Categories

(Firefox :: Session Restore, defect)

defect
Not set

Tracking

()

VERIFIED FIXED
Firefox 3.7a1
Tracking Status
status1.9.2 --- beta3-fixed
status1.9.1 --- unaffected

People

(Reporter: dietrich, Assigned: zpao)

References

Details

(Keywords: regression, verified1.9.2)

Attachments

(1 file, 5 obsolete files)

a couple of times when i've restored my session (usually around 80 - 120 tabs), a subset of the restored tabs are in an odd state:

- throbber still throbbing
- status bar says "done"
- page content is empty

notes:
- url is properly in urlbar (after activating tab)
- clicking reload does nothing at all
- shift+reload properly reloads and everything is fine then
- i disabled all my extensions and reduced number of tabs, still happened
Yeah, this has hit me a couple of times too. Just hit it again, and called zpao over to look. It's repeatedly reproducible with may sessionstore.js, so we minimized it down by closing everything but one tab and deleting history data.

({
    "windows": [{
        "tabs": [{
            "entries": [{
                "url": "divx.com",
                "scroll": "0,0"
            }],
            "index": 1,
            "attributes": {},
            "_formDataSaved": true,
            "_tabStillLoading": true
        }]
    }],
    "selectedWindow": 0,
    "_closedWindows": [],
    "session": {
        "state": "stopped",
        "lastUpdate": 1255642959537
    }
})


Paul noticed the the url is "divx.com" and not "http://divx.com". Sure enough, changing that is enough to fix the problem, and tabs will load successfully.

I bet the problem is that something somewhere is trying to stuff this directly into a nsIURI... At least through ioService.newURI(), "host.com" throws but "http://host.com" doesn't.
OS: Windows NT → All
Hardware: x86 → All
(The way I ended up in this state was that the browser crashed right after I typed "divx.com" into the URL bar.)
I'm going to guess this is a side effect of bug 497730. Since the url saved was divx.com, we know the browser hadn't actually loaded the url, so we saved the userTypedValue. Then when we try to | shEntry.setURI(ioService.newURI(aEntry.url, null, null)); | it doesn't work and it just stops loading for any tabs that we load after that one.
Blocks: 497730
Flags: blocking-firefox3.6?
And confirmed.
[Exception... "Component returned failure code: 0x804b000a
(NS_ERROR_MALFORMED_URI) [nsIIOService.newURI]"  nsresult: "0x804b000a
(NS_ERROR_MALFORMED_URI)"  location: "JS frame ::
file:///Users/pao/mozilla/mozilla-central/obj-ff-debug/dist/MinefieldDebug.app/Contents/MacOS/components/nsSessionStore.js
:: sss_deserializeHistoryEntry :: line 2078"  data: no]

So we either change how nsIOService::NewURI works or we do something different.
Since it seems to be confined to cases where we save the userTypedValue, I can
look for a scheme and if it's not there, add "http://". I'm not sure what we do
normally when a scheme is left off a typed url, so this solution probably isn't
quite right.
> I'm not sure what we do normally when a scheme is left off a typed url

Something along these lines (where aURI is the string given, which could be "divx.com"), with whitespace trimmed and all newlines (including ones in the middle) stripped out:

  nsCOMPtr<nsIURI> uri;
  NS_NewURI(getter_AddRefs(uri), aURI);
  if (uri) {
    aLoadFlags &= ~LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP;
  }
  PRUint32 fixupFlags = 0;
  if (aLoadFlags & LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP) {
    fixupFlags |= nsIURIFixup::FIXUP_FLAG_ALLOW_KEYWORD_LOOKUP;
  }
  sURIFixup->CreateFixupURI(aURI, fixupFlags, getter_AddRefs(uri));

Note that the NS_NewURI call is justthere so we can compute the right fixupFlags.  For the URL bar, this flag is set at least sometimes; you might want to check with a browser peer.

In any case, that only matters if we're going to do the keyword thing.  For basic stuff like converting "divx.com" to "http://divx.com" any call to the uri fixup service (nsIURIFixup) will work.
Attached patch Patch v0.1 (Code Only) (obsolete) — Splinter Review
Code only patch. Unlike bug 497730 this is actually testable, so I'll write that in the morning.

Boris, thanks for the pointer. I had already tracked my way into the world of docshell :) I decided to use FIXUP_FLAG_ALLOW_KEYWORD_LOOKUP to cover more cases. In browserland we do that, so it's only fair.
Assignee: nobody → paul
Status: NEW → ASSIGNED
Attachment #406646 - Flags: review?(zeniko)
Comment on attachment 406646 [details] [diff] [review]
Patch v0.1 (Code Only)

Shouldn't we rather just restore a blank tab and make sure the invalid URL shows up again in the URL bar? With this patch, a user could accidentally load a web page s/he never intended to by simply typing something into the address bar, then - without hitting Enter - working on other tabs and finally crash.

IMO we should instead add a new flag for when we use a user typed value _and_ the tab wasn't loading/busy. In that case load about:blank and restore the address bar's value through other means (and for backwards compatibility do the same when setURI throws).
(In reply to comment #7)
> (From update of attachment 406646 [details] [diff] [review])
> Shouldn't we rather just restore a blank tab and make sure the invalid URL
> shows up again in the URL bar? With this patch, a user could accidentally load
> a web page s/he never intended to by simply typing something into the address
> bar, then - without hitting Enter - working on other tabs and finally crash.

That's true...

> IMO we should instead add a new flag for when we use a user typed value _and_
> the tab wasn't loading/busy. In that case load about:blank and restore the
> address bar's value through other means (and for backwards compatibility do the
> same when setURI throws).

So I think that might be OK. It's not ideal in the case where they did type a proper URL and we crashed. Bug 497730 wouldn't show about:blank, but the user would have to go and press enter in all those tabs. For this bug it should only be 1 tab (in the typical case).

This could also happen for an entry in the middle of history (with some forward & back), at least as we do it now. Maybe the fix for bug 497730 should also check to see if there are any history entries and only take the user typed value if there are none.

I'll work on doing it this way & put the patch up.
If we just put a value in the url bar but don't load it, please make sure that the user typed value is updated in the process?  Otherwise if we crash again before the user has hit enter on that tab we lose.
Blocks: 521595
Duplicate of this bug: 521595
Flags: blocking-firefox3.6? → blocking-firefox3.6+
Attached patch Patch v0.1b (Code Only) (obsolete) — Splinter Review
Alternative approach.

The only concern I have here is that setting gURLBar.value like this could have unforeseen consequences (maybe flickering, maybe it being set to the wrong value). So I might need to make that case more specific since I think it only matters when there's a single tab in the window (for some reason). Writing the tests will help.
Attachment #408937 - Flags: review?(zeniko)
(In reply to comment #7)
> (and for backwards compatibility do the same when setURI throws).

Didn't handle that case. I probably should though. If this bug wasn't in the beta, I'd argue that it's not worth having the extra code around. I can wrap history.addEntry in a try/catch and assume that this is the only case that would cause it to throw. Not the safest assumption to make though. I'll at least check that it was a malformed uri exception.

(In reply to comment #9)
> If we just put a value in the url bar but don't load it, please make sure that
> the user typed value is updated in the process?  Otherwise if we crash again
> before the user has hit enter on that tab we lose.

We were already setting user typed value and it gets restored. We win!
Comment on attachment 408937 [details] [diff] [review]
Patch v0.1b (Code Only)

>+      // Instead we'll save what was typed in the location bar and restore that
>+      // without navigating to that location.

Reading through this bug and bug 497730 again, I've come to the conclusion that the fix for bug 497730 was wrong and should not be changed as this patch does.

What we originally wanted was to start loading tabs which were loading from a UI perspective only (i.e. where the xul:browser hasn't picked up the URI to navigate to yet). For this you've used the browser's userTypedValue which mostly has a different meaning and is used by xul:tabbrowser in a hacky way between a call to addTab and the load actually starting.

The correct solution would thus be to save the URI which the UI claims to be loading already in a different property which you can then fetch instead of userTypedValue (actually, that might still not be a proper URI, but at least you can start loading it again without privacy concerns). And to be exact, that new property would even have to be saved when we've already got some tab history (i.e. when the UI claims to be loading a new address due to a click or due to a new address having been entered in the address bar and a click on Go - but that address hasn't been added to the tab's session history yet).

Saving and restoring the actual userTypedValue would then be a different bug.
Attachment #408937 - Flags: review?(zeniko) → review-
Blocks: 523465
No longer blocks: 521595
So without going down into docshell (this gets down there pretty quickly), it doesn't look like there's a great way of doing this. I think by the time browser knows the actual URL being loaded, docshell has already loaded it into session history, meaning we don't even need this extra stuff. I need to look into it a bit more though. Somebody can correct me if I'm wrong - docshell scares me.

That said, it looks like xul:browser tries to at least say what sort of state it's in. I'm not sure if this is safe enough to depend on. c.f. http://mxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/browser.xml#565

I'm going to try this to use this in combination with userTypedValue and like you say, only load up URIs that were actually in some state of loading. A little testing shows that it should work both for the case with no history entries and tabs with some.
> I think by the time browser knows the actual URL being loaded, docshell has
> already loaded it into session history

For navigations originating via the docshell itself (link clicks, etc), yes.

For navigation originating in the UI (url bar, ctrl/cmd-click, etc) the UI clearly knows the URI it's trying to load before it ever gets close to being added to session history.
(In reply to comment #15)
> For navigation originating in the UI (url bar, ctrl/cmd-click, etc) the UI
> clearly knows the URI it's trying to load before it ever gets close to being
> added to session history.

What I meant was URIs like "divx.com" which of course the UI knows was typed. But the UI doesn't know if that URI is actually going to be the one that's visited. It gets added to userTypedValue as "divx.com" but the actual history entry is "http://www.divx.com/en/mac" after correction (and server redirection). The UI get's updated with the correct URI but (I think) by then it's in SH and sessionstore looks at SH first. Whatever is typed goes straight into browser.webnavigation (oh hey docshell) and at some point makes it back into the UI as the corrected uri. If it's already in SH as the correct URI before it's back in the UI, then it doesn't matter.
> but (I think) by then it's in SH

I believe this is correct.
Attached patch Patch v0.1c (WIP) (obsolete) — Splinter Review
The 3rd approach. A bunch of tests to make sure it restores the right state (1 is failing oddly), none to make sure it saves the right state, but that's hard to test (maybe could add 15 tabs, getCurrentState, and see that some have userTypedClear, kinda unreliable though). Still need a couple more tests to make sure the ones with userTypedClear set start loading (they do load when testing with my contrived sessionstore.js).

The only part I'm worried about at this point is the page that's currently active has form data, but if userTypedclear, I'm just navigating away, which will stop the form refilling. So if the user goes back, then the form state isn't restored (I think, I didn't actually test).

Simon, how does this approach work for you?
Attached patch Patch v0.1c (WIP) (for real) (obsolete) — Splinter Review
Forgot to qrefresh
Attachment #410627 - Attachment is obsolete: true
Comment on attachment 410629 [details] [diff] [review]
Patch v0.1c (WIP) (for real)

This looks better indeed. Let's go with this!

>+        // For sessions with an implicit userTypedValue but before bug 522545,
>+        // we want to keep going, but we should still throw in other cases.

I'm not sure if we really want to add code for an uncommon issue in prerelease code. Without this, people might lose their tabs just once more than with this - won't they? (And should we keep it, s/e/ex/g please.)

>+    // Handle userTypedValue here. Hopefully values won't get overridden after
>+    // load events

Will they?

>+        //XXXzpao This is probably going to interact poorly with
>+        //        restoreDocument_proxy... It works, but I haven't looked at
>+        //        form data & how that will restore

This should be fine, as we compare the current URL against the URL of the document for which we've saved form data. So, if the previous document never finishes loading, we just discard the form data, as it would have been discarded anyway, had Firefox crashed just a few seconds later.

>+        // If it's a userTypedValue but we're not loading the URI, then the
>+        // URL bar is not updated until you switch tabs back.
>+        aWindow.gURLBar.value = tabData.userTypedValue;

How does the address bar get updated when switching tabs? And couldn't we mimic a tab switch instead? (Else, AFAICT gURLBar isn't guaranteed to be non-null, as the address bar could've been removed by the user.)
(In reply to comment #20)
> (From update of attachment 410629 [details] [diff] [review])
> This looks better indeed. Let's go with this!

Great!

> >+        // For sessions with an implicit userTypedValue but before bug 522545,
> >+        // we want to keep going, but we should still throw in other cases.
> 
> I'm not sure if we really want to add code for an uncommon issue in prerelease
> code. Without this, people might lose their tabs just once more than with this
> - won't they? (And should we keep it, s/e/ex/g please.)

I agree. Only reason it's there is because I thought that's what you meant in comment #7:
> (and for backwards compatibility do the same when setURI throws).
I'm happy to take it out, and would rather. Less weirdly-specific-cases-not-in-releases to cover, the better.

> >+    // Handle userTypedValue here. Hopefully values won't get overridden after
> >+    // load events
> 
> Will they?

It's not looking like it, but I'll want to make sure.

> 
> >+        //XXXzpao This is probably going to interact poorly with
> >+        //        restoreDocument_proxy... It works, but I haven't looked at
> >+        //        form data & how that will restore
> 
> This should be fine, as we compare the current URL against the URL of the
> document for which we've saved form data. So, if the previous document never
> finishes loading, we just discard the form data, as it would have been
> discarded anyway, had Firefox crashed just a few seconds later.

The case I'm worried about is being on a page that has form data saved. It's done loading, you've carefully filled out the info to file that bug. Now you type in the URL bar & press enter & we crash. Under normal circumstances, the form data was likely saved and would be restored. If you then went to a different page and pressed back, the form data should still be there. But now that we're forcing that navigation before the form data is restored, pressing back won't bring you back to a form-filled page. (That's mostly speculation, like the XXX says, I haven't actually tested this theory)

> 
> >+        // If it's a userTypedValue but we're not loading the URI, then the
> >+        // URL bar is not updated until you switch tabs back.
> >+        aWindow.gURLBar.value = tabData.userTypedValue;
> 
> How does the address bar get updated when switching tabs? And couldn't we mimic
> a tab switch instead? (Else, AFAICT gURLBar isn't guaranteed to be non-null, as
> the address bar could've been removed by the user.)

I think you can hide the URL bar, but not actually remove it - that's just my gut feeling though. I'll look into it. Mimicking a tab switch could be better regardless and reduce potential code duplication.

I found another worry point. Though since this whole bug is an edge case, it's not too bad. That has to do with calling loadURI with allowThirdPartyFixup=true (right now I'm not setting it, so it's false, which isn't exactly right). After pressing enter in the URL bar, allowThirdPartyFixup=true. Opening a new tab from a link it's false. There's also the issue of refferers from links, but I'm not too worried about that.
(In reply to comment #20)
> (Else, AFAICT gURLBar isn't guaranteed to be non-null, as
> the address bar could've been removed by the user.)

You were right. gURLBar is null when the URL bar is removed via the customize menu.
(In reply to comment #21)
> I agree. Only reason it's there is because I thought that's what you meant in
> comment #7:
> > (and for backwards compatibility do the same when setURI throws).

Well, I now completely disagree with whoever wrote comment #7. ;-)

> The case I'm worried about is being on a page that has form data saved. It's
> done loading, you've carefully filled out the info to file that bug. Now you
> type in the URL bar & press enter & we crash.

Since we wait for about 10 seconds between two save operations, this case should be quite rare (i.e. either we crash before we've saved the updated userTypedClear or the new URL has already been added to session history).

The rest of the case you're concerned about would be a different bug (depending on bug 478107 for getting/setting the form state for a given document in session history).

> I found another worry point. Though since this whole bug is an edge case, it's
> not too bad. That has to do with calling loadURI with allowThirdPartyFixup=true
> (right now I'm not setting it, so it's false, which isn't exactly right). 

IIRC allowThirtPartyFixup is a no-op for proper URLs (which you'll expect when opening a new tab from a content link) and could thus be set to true without harm. Or would there be?
Attached patch Patch v0.2 (obsolete) — Splinter Review
* No longer does anything with the URL bar. It seems to be updating itself now (in my multiple manual tests and included tests which look at the value).
* Got rid of backwards-compatibility code (and test).
* Added more tests. One probably has potential for being a new orange (test_getBrowserState_lotsOfTabsOpening). I think this about covers it, but if there are more cases to look at, I'll add more tests.
Attachment #406646 - Attachment is obsolete: true
Attachment #408937 - Attachment is obsolete: true
Attachment #410629 - Attachment is obsolete: true
Attachment #412087 - Flags: review?(zeniko)
Comment on attachment 412087 [details] [diff] [review]
Patch v0.2

>+      // Done is an executeSoon because dispatching events is asynchronous
>+      executeSoon(function() {

You can ignore that. dispatchEvent is not async.
Comment on attachment 412087 [details] [diff] [review]
Patch v0.2

Looking good.

As for the tests, there are just a few too many blank lines (double and trailing) - and one too many mentions of the word "orange". Also, why not also have one test with an explicit userTypedClear != 0 in the state and a valid URL for userTypedValue?
Attachment #412087 - Flags: review?(zeniko) → review+
(In reply to comment #26)
> (From update of attachment 412087 [details] [diff] [review])
> Looking good.
> 
> As for the tests, there are just a few too many blank lines (double and
> trailing)

I'll clean those up.

> and one too many mentions of the word "orange".

There's only the one! I can take it out and if that test does cause orange, well I'll just comment and say "I told you so" :)

> Also, why not also have one test with an explicit userTypedClear != 0
> in the state and a valid URL for userTypedValue?

I'll add one for that.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Depends on: 528699
I have seen this bug with a 1.9.2 build from 11/08 today. Each time I have restarted the browser the content of tabs hasn't been loaded. Even opening new tabs and loading a web page stalled. Software updates weren't possible too with that profile. So after upgrading with another profile and restarting Firefox with the affected profile again all the symptoms are gone.

I will not mark it verified right now. Would be nice to get some more eyes on it.
(In reply to comment #31)
> I have seen this bug with a 1.9.2 build from 11/08 today. Each time I have
> restarted the browser the content of tabs hasn't been loaded. Even opening new
> tabs and loading a web page stalled. Software updates weren't possible too with
> that profile. So after upgrading with another profile and restarting Firefox
> with the affected profile again all the symptoms are gone.
> 
> I will not mark it verified right now. Would be nice to get some more eyes on
> it.

Please upload or email me your sessionstore.js file.
I have tried to reproduce with a time machine backup around that time but no luck. The same situation cannot be reproduced anymore. I know that Firefox crashed before and I had to restart the browser. No idea if this is somehow related.
(In reply to comment #31)
> I have seen this bug with a 1.9.2 build from 11/08 today.

11/08 or 11/18? 11/08 wouldn't have had the fix. If it was a session saved with a pre-fix build, it would still zombiefy until saved with a post-fix build.
(In reply to comment #34) 
> 11/08 or 11/18? 11/08 wouldn't have had the fix. If it was a session saved with
> a pre-fix build, it would still zombiefy until saved with a post-fix build.

Right. And that was my idea. I have used the session which has been used by the pre-fix build. So it should have been broken whether which pre-fix build I have used. But I wasn't able to reproduce it with the 11/08 build. So we would have to rely on the checked in test and I cannot supply another real world testcase. Sorry.
(In reply to comment #35)
> (In reply to comment #34) 
> > 11/08 or 11/18? 11/08 wouldn't have had the fix. If it was a session saved with
> > a pre-fix build, it would still zombiefy until saved with a post-fix build.
> 
> Right. And that was my idea. I have used the session which has been used by the
> pre-fix build. So it should have been broken whether which pre-fix build I have
> used. But I wasn't able to reproduce it with the 11/08 build. So we would have
> to rely on the checked in test and I cannot supply another real world testcase.
> Sorry.

So you could do it manually by repeating the steps that would have caused the problem in pre/post-fix builds.

1. Open a few tabs with real URLs
2. Open 1 (or more) with a malformed URL (eg. "divx.com"). Tab might need to be in the middle... not sure.
3. Restart same build.

In pre-fix builds you should see the zombification. In post-fix builds, there shouldn't be any zombies.
Verified fixed on trunk and 1.9.2 with builds on OS X and Windows:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a1pre) Gecko/20091119 Minefield/3.7a1pre ID:20091119031914

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2b4pre) Gecko/20091119 Namoroka/3.6b4pre ID:20091119033555

Shiretoko isn't affected by that bug which means that was a regression. Given by Paul it should be bug 497730.
Status: RESOLVED → VERIFIED
Depends on: 537061
Blocks: 558996
You need to log in before you can comment on or make changes to this bug.