page not loading on session restoration

RESOLVED FIXED in Firefox 53

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: geobert, Assigned: ckerschb)

Tracking

({regression})

54 Branch
mozilla55
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox52 unaffected, firefox-esr52 unaffected, firefox53+ fixed, firefox54+ fixed, firefox55 fixed)

Details

(Whiteboard: [workaround see comment #8])

Attachments

(3 attachments, 1 obsolete attachment)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0
Build ID: 20170128030204

Steps to reproduce:

on build 20170128030204 when Ffx startup, the restored tabs are blank
the build before was fine


Actual results:

blank tabs


Expected results:

session should be restored and pages loaded
still happening with build id 20170129030205
Component: Untriaged → Session Restore
Does it happen with every page to restore?
Does it happen with the 32bits build?
Yes and yes
regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=75ec51fdcd4e5a6a87f50476e69ed6aeaf30b432&tochange=eaf9152703d8e4277dfb40795be670514ce3c060
Bug 1307736 - Ensure tests performing a history load pass valid triggeringPrincipal. r=bz,mdeboer

I've encountered this issue on latest Nightly 54.0a1 after updating an older profile. I have "Show my windows and tabs from last time" option set. The tabs are no longer loaded after restarting, they remain blank.
Blocks: 1307736
Status: UNCONFIRMED → NEW
Has Regression Range: --- → yes
Ever confirmed: true
Flags: needinfo?(ckerschb)
Keywords: regression
(In reply to Petruta Rasa [QA] [:petruta] from comment #4)
> regression window:
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=75ec51fdcd4e5a6a87f50476e69ed6aeaf30b432&tochange=eaf9
> 152703d8e4277dfb40795be670514ce3c060
> Bug 1307736 - Ensure tests performing a history load pass valid
> triggeringPrincipal. r=bz,mdeboer
> 
> I've encountered this issue on latest Nightly 54.0a1 after updating an older
> profile. I have "Show my windows and tabs from last time" option set. The
> tabs are no longer loaded after restarting, they remain blank.

That's entirely possible. I'll have a look at that, but I am traveling this and next week. I have a look as soon as I can. (leaving ni? until then).
Um...  This is serious dataloss for people updating.  If you can't look into this ASAP, we should back out bug 1307736 until you do have time to look at this.
Flags: needinfo?(tanvi)
Severity: normal → critical
That said, I just tried updating from a 2017-01-27 nightly (to today's nightly), and the session restored correctly.  If I then set that profile (now in the 2017-02-06 nightly) to "restore my tabs from last time", quit, and start up, the tabs do seem to get restored properly.

So it's not as bad as I thought based on comment 6, in that it doesn't affect _all_ attempts to restore session.  What's special about the sessions being restored that _do_ tickle this problem.  Or put another way, what are the steps to reproduce reliably?
Flags: needinfo?(petruta.rasa)
Flags: needinfo?(geobert)
This is indeed either intermittent or difficult to reproduce.
I first saw it on a test machine and I tried without success to reproduce it there. After that, it reproduced on my main station, with the current Nightly profile (I can zip it and send it by email if it helps, although I refreshed it from about:support and it still shows the blank tabs). It was that particular profile that I used to obtain the regression range. New, clean profiles don't reproduce the issue.

It seems that Sea Monkey is also affected, but with the steps from https://bugzilla.mozilla.org/show_bug.cgi?id=1334780#c5 I couldn't reproduce reliably (with or without updates enabled).

According to https://bugzilla.mozilla.org/show_bug.cgi?id=1307736#c65 pinned tabs are also affected. I couldn't reproduce with those either.

The blank tabs can't be loaded with F5 or refresh button, a workaround is to press Enter while the location bar is selected.

Is there a way to update from 2017-01-25 Nightly to 2017-01-28 or 29?
Flags: needinfo?(petruta.rasa)
See Also: → 1334780
> Is there a way to update from 2017-01-25 Nightly to 2017-01-28 or 29?

Gijs, do you know who might know?  I can't even find module owners for the updater on https://wiki.mozilla.org/Modules :(
Flags: needinfo?(gijskruitbosch+bugs)
Maybe Robert....
Flags: needinfo?(robert.strong.bugs)
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #9)
> > Is there a way to update from 2017-01-25 Nightly to 2017-01-28 or 29?
> 
> Gijs, do you know who might know?  I can't even find module owners for the
> updater on https://wiki.mozilla.org/Modules :(

Rob or :mhowell are good for app update, I think.

(In reply to Petruta Rasa [QA] [:petruta] from comment #8)
> This is indeed either intermittent or difficult to reproduce.
> I first saw it on a test machine and I tried without success to reproduce it
> there. After that, it reproduced on my main station, with the current
> Nightly profile (I can zip it and send it by email if it helps, although I
> refreshed it from about:support and it still shows the blank tabs). It was
> that particular profile that I used to obtain the regression range. New,
> clean profiles don't reproduce the issue.

With this profile where you can reproduce, are there errors in the browser console when this happens? It sounds like the load is failing - there should be information where and why.

> It seems that Sea Monkey is also affected, but with the steps from
> https://bugzilla.mozilla.org/show_bug.cgi?id=1334780#c5 I couldn't reproduce
> reliably (with or without updates enabled).
> 
> According to https://bugzilla.mozilla.org/show_bug.cgi?id=1307736#c65 pinned
> tabs are also affected. I couldn't reproduce with those either.
> 
> The blank tabs can't be loaded with F5 or refresh button, a workaround is to
> press Enter while the location bar is selected.

This would be expected. The tabs have about:blank loaded and so that's what refresh will (re)load. When no load happens, we leave the URL bar value stored per tab, which is why the value is there. Hitting enter will then load that URL as it would for any other URL.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(petruta.rasa)
> (In reply to Petruta Rasa [QA] [:petruta] from comment #8)
> With this profile where you can reproduce, are there errors in the browser
> console when this happens? It sounds like the load is failing - there should
> be information where and why.

There are no errors in web or browser console (same as https://bugzilla.mozilla.org/show_bug.cgi?id=1334780#c0). The little Reload icon from tab is not shown at all when selecting an affected tab after restart. 
I just noticed that selecting "Reload Tab" from tab bar contextual menu reloads the page.
Flags: needinfo?(petruta.rasa)
There is no simple way since the update server doesn't have an advertisement from that build to the one you want to update to. Installing the build you want to update to on top of the existing install should be sufficient though since updating doesn't touch profiles. The main difference between the two is the way the application is launched after an update.

If you still want to try updating after installing on top of the existing install let me know and I'll try to put together an update xml that you can use to update between those builds.
Flags: needinfo?(robert.strong.bugs)
About what causes it: see also bug 1334780 comment #6 and #7
Whiteboard: [workaround see comment #8]
This bug also affects SeaMonkey (as bug 1334780), and also affects Linux builds. Is there some shared Product/Component where the one could be moved and the other duped? Toolkit/Places maybe?
OS: Unspecified → All
Hardware: Unspecified → All
Hmm.  So for Firefox, not Seamonkey, this would mean updating from a pre-52 build (i.e. something before bug 1297338 got fixed, or maybe even pre-50 if it needs to be before bug 1286472) to a current one?  Or something else?
Component: Session Restore → Document Navigation
Product: Firefox → Core
But note that depending on what the actual issue the answer may be "you need to fix this in the SeaMonkey sessionstore code".
Unfortunately I can't reproduce the issue. As mentioned in various comments it's rather intermittent and hard to reproduce. I briefly chatted with Kamil, he will take a look at it.

(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #6)
> Um...  This is serious dataloss for people updating.  If you can't look into
> this ASAP, we should back out bug 1307736 until you do have time to look at
> this.

I would rather not back out Bug 1307736 completely, since in most cases it works correctly. Until we can reproduce the issue reliably I propose a partial revert of functionality within Bug 1337622. Instead of returning an error within LoadHistoryEntry if there is no triggeringPrincipal, we could keep the assertion but we could also fall back to using the Systemprincipal. In other words, keep the assertion for debug builds but in release builds we go back to the old semantics.
Flags: needinfo?(ckerschb) → needinfo?(kjozwiak)
Depends on: 1337622
I am assigning that bug to me to make sure it gets fixed. Hopefully Kamil can help me find some reliable STRs.
Assignee: nobody → ckerschb
Status: NEW → ASSIGNED
Duplicate of this bug: 1336799
[Tracking Requested - why for this release]: dataloss, reproducible on 53.0a2

I am seeing the problem just now in Firefox53.0a2.
When the problem occurred, Browser console shows the following error.

NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsISerializationHelper.nsISerializationHelper] 1 Utils.jsm:135
Better steps to (In reply to Alice0775 White from comment #21)
> [Tracking Requested - why for this release]: dataloss, reproducible on 53.0a2
> 
> I am seeing the problem just now in Firefox53.0a2.

Alice, I don't suppose you've succeeded in finding reliable STR?

Alternatively, if you have an unmodified session store file that caused this issue in 53, that might help with reproducing (specifically, the values of the triggeringPrincipal_base64 fields)

> When the problem occurred, Browser console shows the following error.
> 
> NS_ERROR_FAILURE: Component returned failure code: 0x80004005
> (NS_ERROR_FAILURE) [nsISerializationHelper.nsISerializationHelper] 1
> Utils.jsm:135

This is inside deserializePrincipal, so nsISerializationHelper's deserializeObject has thrown. Most callers are inside try...catch blocks that would log their own message (e.g. the ones in SessionHistory.jsm) but the ones in browser-child.js and ContentRestore.jsm aren't. At a minimum, it seems like we should just pass a null triggering principal there if deserializing the principal fails (which could happen for a number of reasons).
Flags: needinfo?(alice0775)
(In reply to :Gijs from comment #22)
> Better steps to (In reply to Alice0775 White from comment #21)
> > [Tracking Requested - why for this release]: dataloss, reproducible on 53.0a2
> > 
> > I am seeing the problem just now in Firefox53.0a2.
> 
> Alice, I don't suppose you've succeeded in finding reliable STR?
 
I do not have STR.


> Alternatively, if you have an unmodified session store file that caused this
> issue in 53, that might help with reproducing (specifically, the values of
> the triggeringPrincipal_base64 fields)

The session was already overwritten...
Flags: needinfo?(alice0775)
(In reply to :Gijs from comment #22)
> This is inside deserializePrincipal, so nsISerializationHelper's
> deserializeObject has thrown. Most callers are inside try...catch blocks
> that would log their own message (e.g. the ones in SessionHistory.jsm) but
> the ones in browser-child.js and ContentRestore.jsm aren't. At a minimum, it
> seems like we should just pass a null triggering principal there if
> deserializing the principal fails (which could happen for a number of
> reasons).

I agree! This would mean that
1) we'd add a try...catch in Utils#deserializePrincipal, along with a debug() method as in SessionHistory.jsm - which may also log the principal_b64 data it failed to deserialize
2) return `null` upon failure and
3) clean out the try..catches from SessionHistory.jsm

Thanks for the info, Alice!
I reproduce the bug just now(53.0a2).
I attached a zipped sessionstore.

STR(using attached zip)
1. Create new profile. Extract zip file.
2. Start Aurora53.0a2
3. Menu > History > Restore Previous Session 

AR
Some tabs fail restore page
Duplicate of this bug: 1335816
Kamil, probably Bug 1335816 might also help finding reliable STRs.
Depends on: 1338009
(In reply to Mike de Boer [:mikedeboer] from comment #24)
> I agree! This would mean that
> 1) we'd add a try...catch in Utils#deserializePrincipal, along with a
> debug() method as in SessionHistory.jsm - which may also log the
> principal_b64 data it failed to deserialize
> 2) return `null` upon failure and
> 3) clean out the try..catches from SessionHistory.jsm

I filed Bug 1338009 to get that ready.
Flags: needinfo?(tanvi)
UA:"Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0 SeaMonkey/2.51a1" 
ID:20170213003001 en-US 
c-c:da2e8137b3dfe320d00c5b79f5f3c07e82225efe 
m-c:00d16f03506b7f9f754b01a0a458c05445ac6dba

This new SeaMonkey nightly does not suffer from the bug, even though no fix was landed on the SeaMonkey side. I assume that the SeaMonkey bug and this (Core) bug stem from a common cause and that they were fixed together, possibly in bug 1338009.
Tracking since this may be a cause of intermittent data loss on update. 
If we think that bug 1338009 really fixed the issue, what about uplifting that fix to 53 aurora?
Flags: needinfo?(ckerschb)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> Tracking since this may be a cause of intermittent data loss on update. 
> If we think that bug 1338009 really fixed the issue, what about uplifting
> that fix to 53 aurora?

Thanks Liz, that sounds like a good plan to me. Even though we are not 100% certain that Bug 1338009 fixed the issue, the changeset within bug 1338009 is very safe to be be uplifted.

I will continue to look into the problem described in this bug, but as a first measure we should get Bug 1338009 uplifted. I will file uplift requests for that.
Flags: needinfo?(ckerschb)
Sorry for the lack of feedback here.

Since bug 1338009 landed, I do not encounter the bug any more.
Note that bug 1337622 also landed, so it's likely that reproducing this in a debug build is not impossible.

But in a debug build I'm hitting the relevant assertion in today's nightly for this case:

1)  Start browser.
2)  Load data:text/html,<div style="overflow: scroll; width:130px; white-space: nowrap" onclick="this.style.textOverflow = 'ellipsis'">Please click this long text</div>
3)  Kill the browser (so the next startup will session restore).
4)  Start browser.

I get the fatal assert when we sessionrestore that data: URI.

This is rather annoying, actually... makes it impossible to debug the things I really care about.
Fwiw, I'm also seeing the "Assertion failure: triggeringPrincipal (need
a valid triggeringPrincipal to load from history)" in my debug builds
on Linux every now and then.  It's fairly reproducible:
1. use a non-e10s profile
2. load a testcase from the file system that causes a crash
3. repeat that a few times
4. now start the browser and click the Restore button a few times
Oh, and in 4, make sure you have fixed what caused the crash in 2 first! :-)
ckerschb, can we disable the assertion until the assertion failures from comment 34 and 35 are fixed, please? Now that you have steps to repro there's no need to disrupt other debug build users.

I can also provide you with another profile that triggers assertions in both nsDocShell::LoadHistoryEntry and nsSHEntry::SetTriggeringPrincipal if that helps.
Flags: needinfo?(ckerschb)
Blocks: 1345433
(In reply to Jonathan Watt [:jwatt] from comment #37)
> ckerschb, can we disable the assertion until the assertion failures from
> comment 34 and 35 are fixed, please? Now that you have steps to repro
> there's no need to disrupt other debug build users.

Yes, we can. Given that we fall back to using the systemPrincipal in case no triggeringPrincipal is provided, we can also 'temporarily' remove the assertion.

I created Bug 1345433 to bring back the assertion. Within Bug 1345433 we can also try the STRs from this bug and make sure everything works before bringing back the assertion.
Flags: needinfo?(kjozwiak)
Flags: needinfo?(geobert)
Flags: needinfo?(ckerschb)
Attachment #8844862 - Flags: review?(jwatt)
Comment on attachment 8844862 [details] [diff] [review]
bug_1334875_disable_history_assertion.patch

Review of attachment 8844862 [details] [diff] [review]:
-----------------------------------------------------------------

Restoring from one of my profiles also causes us to fail the assertion in nsSHEntry::SetTriggeringPrincipal, but maybe that's a separate issue?
Attachment #8844862 - Flags: review?(jwatt) → review+
(In reply to Jonathan Watt [:jwatt] from comment #39)
> Restoring from one of my profiles also causes us to fail the assertion in
> nsSHEntry::SetTriggeringPrincipal, but maybe that's a separate issue?

In fact we should disable that assertion as well because we otherwise return NS_ERROR_FAILURE. Given that, we should also uplift that patch.

We can than handle both cases within Bug 1345433.
Attachment #8844876 - Flags: review+
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #40)
> (In reply to Jonathan Watt [:jwatt] from comment #39)
> > Restoring from one of my profiles also causes us to fail the assertion in
> > nsSHEntry::SetTriggeringPrincipal, but maybe that's a separate issue?

If you have STRs though, please add them to Bug 1345433 - thanks.
Pushed by mozilla@christophkerschbaumer.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0678fa302154
Temporarily remove assertion that history loads pass a valid triggeringprincipal. r=jwatt
https://hg.mozilla.org/mozilla-central/rev/0678fa302154
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #40)
> (In reply to Jonathan Watt [:jwatt] from comment #39)
> > Restoring from one of my profiles also causes us to fail the assertion in
> > nsSHEntry::SetTriggeringPrincipal, but maybe that's a separate issue?
> 
> In fact we should disable that assertion as well because we otherwise return
> NS_ERROR_FAILURE. Given that, we should also uplift that patch.

You seem to have removed the |return NS_ERROR_FAILURE| in the patch you committed.
Flags: needinfo?(ckerschb)
(In reply to Jonathan Watt [:jwatt] from comment #44)
> (In reply to Christoph Kerschbaumer [:ckerschb] from comment #40)
> > (In reply to Jonathan Watt [:jwatt] from comment #39)
> > > Restoring from one of my profiles also causes us to fail the assertion in
> > > nsSHEntry::SetTriggeringPrincipal, but maybe that's a separate issue?
> > 
> > In fact we should disable that assertion as well because we otherwise return
> > NS_ERROR_FAILURE. Given that, we should also uplift that patch.
> 
> You seem to have removed the |return NS_ERROR_FAILURE| in the patch you
> committed.

That was on purpose. In fact on all branches (central, aurora, beta) we now fall back to using the SystemPrincipal in case *no* principal is provided [1]. Hence I not only removed the assertion, but also removed the |return NS_ERROR_FAILURE| in that case. I want to avoid any more surprises that Bug 1307736 might have introduced. With [1] we converted the old behavior. Hence I will also ask to uplift this bug and will basically go back to the start with adding the assertion within Bug 1345433.

[1] https://dxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#12609
Flags: needinfo?(ckerschb)
Attachment #8844862 - Attachment is obsolete: true
Comment on attachment 8844876 [details] [diff] [review]
bug_1334875_disable_history_assertion.patch

Approval Request Comment
It seems to make sense to have consistent behavior on all branches. Since we landed Bug 1337622 where we fall back to systemPrincipal if there is not triggeringPrincipal for history loads it makes sense to also remove the assertion on all of those branches as well so we have consistent behavior between assertions and actual error behavior.

[Feature/Bug causing the regression]:
Bug 1307736 - Assert history loads pass a valid triggeringPrincipal for docshell loads

[User impact if declined]:
Most likely none of the users would notice the difference, but it would be good to have consistent behavior between assertions and error cases.

[Is this code covered by automated tests?]:
No - but we have converted to old behavior within Bug 1337622. This bug only removes the corresponding assertions.

[Has the fix been verified in Nightly?]:
No, and I don't think there is anything to verify.

[Needs manual test from QE? If yes, steps to reproduce]:
No

[List of other uplifts needed for the feature/fix]:
Bug 1337622 fixed the actual problem.

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
Not risky, only removing assertions.

[String changes made/needed]:
No
Attachment #8844876 - Flags: approval-mozilla-beta?
Attachment #8844876 - Flags: approval-mozilla-aurora?
Comment on attachment 8844876 [details] [diff] [review]
bug_1334875_disable_history_assertion.patch

Removing assertions to make the code consistent across various trains, makes sense, Aurora54+, Beta53+
Attachment #8844876 - Flags: approval-mozilla-beta?
Attachment #8844876 - Flags: approval-mozilla-beta+
Attachment #8844876 - Flags: approval-mozilla-aurora?
Attachment #8844876 - Flags: approval-mozilla-aurora+
Setting qe-verify- based on Christoph's assessment on manual testing needs (see Comment 46).
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.