Closed Bug 1346634 Opened 6 years ago Closed 5 years ago
Session restore fails, additional about:home tab gets opened
STR: 1) Open site like twitter in fennec 54 aurora 2) Add the site to homescreen 3) Use the android process switching to kill fennec 54 aurora 4) Click on homescreen icon added in (2) 5) Fennec 54 aurora opens, but shows new tab screen instead of site 6) Open fennec tab switcher to see site is there, but in a background tab This seems to be a regressions since FF53.
Doesn't seem to be reproducible in 55. On Aurora, I can also reproduce it by simply having some tabs open, killing and then reopening Firefox, i.e. it's not necessary to use a link shortcut or other kind of external tab to launch Firefox. The extraneous tab is always about:home, even with a custom homepage set and it's appearing only when restoring the previous session. If there were no tabs to restore or the corresponding setting is deactivated, the extra tab doesn't appear. And finally, this extra tab seems to open only around the time Gecko has finished loading.
Our startup events are transmitted to Gecko in the wrong order. While Gecko is still busy loading, the Java UI is parsing the session file and already opening the tabs in the UI (which queues up a number of "Tab:Load" messages for Gecko) and then we send over the processed session file containing the new Tab IDs using "Session:Restore". So normally, when Gecko starts we first open all of those tabs in Gecko and then the session store restores history into each one. On Aurora (and Beta, these events are arriving the wrong way around, that is "Session:Restore" arrives before any of the "Tab:Load" messages. The session store therefore can't find any of the tabs it wants to restore history into, so it just assumes that the user already closed all of them and gives up. Because at that stage there are no tabs open in Gecko, a fail-safe mechanism kicks in and opens a default about:home tab instead. Then, eventually the "Tab:Load" messages arrive as well and are processed by BrowserApp. So in the end those tabs load, but without any of their previous history and state except for the current URL.
Assignee: nobody → nchen
Status: NEW → ASSIGNED
For Aurora and Beta, this patch makes us only notify push observers (and not all observers) from the XPCOM nsThread event queue, to avoid regressions such as session restore breaking. For Nightly, bug 1344892 will address these issue.
Attachment #8846928 - Flags: review?(snorp)
Session restoring is working on Nightly, though, possibly as a side effect of bug 1337467?
Attachment #8846928 - Flags: review?(snorp) → review+
Comment on attachment 8846928 [details] [diff] [review] Only notify push observers from xpcom queue (v1) Approval Request Comment [Feature/Bug causing the regression]: Bug 1337910 [User impact if declined]: Session restore doesn't work. [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: No [Why is the change risky/not risky?]: Patch rolls back a lot of changes from bug 1337910, so it's not any riskier. [String changes made/needed]: None
Summary: Session restore fails → Session restore fails, additional about:home tab gets opened
OK, so we uplifted patches from bug 1337910 to beta 53, to fix regressions from bug 1329793 (I think). There is also a request to uplift work from bug 1329793. I'm not feeling confident about how these changes will all work together in beta. We can either aim all of this at 55 or 54, and test there, or uplift everything now for beta 6 build this Thursday, and test in beta 6, with the understanding we may still need to back out work across several bugs. How did the original feature of sending tabs ship in 51 (bug 1289932) with this many problems? What are we not catching in testing that we should be?
Liz, thank you for your efforts. The TLDR; is that I really hope we can uplift everything into the next beta release. I've tried to provide an answer to your very-much-on-point question below. I think a quick answer to your question is that we haven't been doing detailed enough, "holistic" testing of this functionality which would uncover the less obvious issues. Evidently, testing that was performed hasn't been sufficient, and some of the more underlying problems wouldn't be possible to uncover by mere observation until you observe clients under particular conditions for over a longer periods of time - specifically, that's Bug 1329793. The "why" is more complicated. I think it's a manifestation of Conway's law. We have several separate teams working on separate moving pieces which are involved here, with very different priorities. The problems which we've been observing are either at the explicit organizational boundaries such as client/server issues (Bug 1329793), or at implicit boundaries of knowledge, such as side-effects which are hard to anticipate if one doesn't have a complete understanding of everything that's involved (Bug 1337910, this bug...). The latter is made worse when you consider that implementors are from different teams, possibly with non-overlapping areas of expertise, etc. The "why I think it will be better now" is related to the above. There is more alignment across teams now, and more (if not better) communication. This is in part driven by our desire to actually ensure that this feature is reliable, and in part a side-effect of other organizational changes. I also believe that we now have a good understanding of _why_ things were buggy. As to the particulars of uplifts. Bug 1337910 and Bug 1329793 are only related in a sense that they're affecting the same feature. Technically they're not related, one isn't a regression of another. Bug 1329793 needs to be uplifted to 53 if we want to ensure reliability of this feature, as it fixes a design flow in how clients communicate with Autopush servers. While a larger patch, these changes are well isolated to FxA and Sync components, so fwiw I don't believe you're taking on too much risk here. I've outlined my other concerns in the particular uplift request. I can't speak to how changes made in this bug and in Bug 1337910 will interplay with other things happening on beta, perhaps Jim can elaborate. A good actionable item that should come out of this is a more thorough QA plan which would have uncovered the various problems we've unearthed, as well as its periodic execution. A more vague actionable item is better planning of development of other cross-team features.
This bug is a regression introduced by bug 1337910, and the patch here replaces the patch from bug 1337910. Uplifting the patch here is necessary to address the regression on beta 53. As Grisha said, bug 1329793 is unrelated to this bug or bug 1337910, so I believe it's okay to uplift both this patch and the patch from bug 1329793.
Hmm, sorry I sounded a bit harsher than I intended to there. I agree about client/server syncing issues being difficult to implement and test. We can give it a try for beta 6 this week if you can work with QE for manual testing. It is getting a bit late in beta for that but we can give it a shot. Ioana, can you or your team take this on?
Actually since this is for Fennec it would be aimed at beta 7.
Comment on attachment 8846928 [details] [diff] [review] Only notify push observers from xpcom queue (v1) After more discussion with grisha, let's go ahead and uplift. Sounds like he has a test plan in place.
Marking as FIXED because this bug only applies to Aurora and Beta.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Verified as fixed on Beta 53.0b7. Devices: -Asus ZenPad 8.0 Z380KL (Android 6.0.1) -Huawei Honor 5X (Android 5.1.1)
Status: RESOLVED → VERIFIED
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.