Closed
Bug 676780
Opened 13 years ago
Closed 13 years ago
Fennec is unable to load webpages and close tabs
Categories
(Firefox for Android Graveyard :: General, defect, P1)
Tracking
(firefox6 fixed, firefox7 fixed, firefox8 fixed, firefox9 fixed, fennec+)
People
(Reporter: xti, Assigned: mfinkle)
References
Details
(Keywords: relnote)
Attachments
(2 files)
972 bytes, patch (mbrubeck: review+, mfinkle: approval-mozilla-beta+)
4.63 KB, patch (mbrubeck: review+, wesj: feedback+, asa: approval-mozilla-aurora+)
Build id: Mozilla/5.0 (Android;Linux armv7l;rv:6.0)Gecko/20110804 Firefox/6.0 Fennec/6.0
Device: HTC Desire Z
OS: Android 2.3.3

Steps to reproduce:
1. Open Fennec App
2. Go to Preferences > Language
3. Change the language pack a couple of times
4. Install the Quit Fennec add-on
5. Go to Preferences > Feedback Tools tab and enable the Error Console
6. Browse to www.google.com
7. Close the tab opened at step 6.

Expected result: After step 6, a new tab is opened and the page is loaded completely. After step 7, the tab is closed.

Actual result: Fennec is unable to load any webpage, except those with special protocols (like about:, file:, etc.). The X button on the left side of the tab thumbnails doesn't work at all.

Notes:
- Please see the following video: http://www.youtube.com/user/qaioana#p/u/27/nV4oyPLOFYY
- After step 7, if the Fennec process is killed and then the app is reopened, all installed add-ons are visibly disabled even if they are listed in the Add-ons Manager, and the Preferences/Beta tab is blank. Also, the Error Console tab is missing.
Reporter | ||
Updated•13 years ago
Whiteboard: [fennec 6.0b5]
Comment 1•13 years ago
Build: Mozilla/5.0 (Android; Linux armv7l; rv:6.0) Gecko/20110804 Firefox/6.0 Fennec/6.0
Device: Samsung Galaxy SII
OS: Android 2.3.3

I can reproduce this, so I am raising the severity. The steps to reproduce are convoluted, though. Cristian, can you simplify them?
Severity: normal → blocker
Priority: -- → P1
Comment 2•13 years ago
After force stopping the application and restarting it, all I get is a white screen; I am unable to interact with the browser whatsoever.
Assignee | ||
Comment 3•13 years ago
(In reply to Aaron Train [:aaronmt] from comment #2)
> Force stopping the application, restarting it all I get is a white screen
> unable to interact with the browser whatsoever

What happens after clearing the profile using Settings > Applications ... ?
Reporter | ||
Comment 4•13 years ago
I'm working on a simpler way to reproduce this issue. But even if the profile is cleared, the issue occurs again when those steps are repeated. So far, I have been able to reproduce it 100% of the time.
Comment 5•13 years ago
I can reproduce this with simpler STR:

New Profile
- Change to Deutsch, restart
- After first-run animation, tap URL bar, tap magnifying glass and tap Google
Updated•13 years ago
Severity: blocker → major
Comment 6•13 years ago
As mentioned on IRC: "quitting" Firefox Beta (via an installed add-on) restores functionality, and there is a possibility of a corrupt file.
Comment 7•13 years ago
Unable to reproduce on Aurora (7.0a2) (08/05)
Comment 8•13 years ago
I cannot reproduce this in 6.0b4 or 6.0b5 using clean installs with new profiles on Samsung Galaxy Tab 7" (Android 2.2), HTC T-Mobile G2 (2.2), or Motorola Xoom (3.2), using the steps from comment 5.
Updated•13 years ago
STR:
1. go to about:firefox
2. change language to French, restart
3. long tap on Assistance, open in a new tab (first selection)
Seems to only occur on 2.3 (used Flyer); does not seem to occur on 2.2 (thunderbolt), nor 3.x (thrive)
Comment 11•13 years ago
per irc in #mobile, this has been seen before in b3, b4. This is not a new regression in b5.
Reporter | ||
Comment 12•13 years ago
(In reply to Naoki Hirata :nhirata from comment #10)
> Seems to only occur on 2.3 (used Flyer); does not seem to occur on 2.2
> (thunderbolt), nor 3.x (thrive)

I was able to reproduce this issue on LG Optimus 2X - Android 2.2
Comment 13•13 years ago
Chris, do you have any clue what is going on here? Perhaps this is related to bug 669289?
Comment 14•13 years ago
We need to get to the bottom of this. If we are going to respin it needs to be today or tomorrow.
Comment 15•13 years ago
Could not reproduce in Firefox Beta on a Samsung Galaxy Tab running Gingerbread.
Assignee | ||
Comment 16•13 years ago
(In reply to Christian Legnitto [:LegNeato] from comment #14)
> We need to get to the bottom of this. If we are going to respin it needs to
> be today or tomorrow.

We are collecting more data about the bug, but given the current situation, this bug does not block any release. If new data on frequency and ease of reproduction comes to light, we can re-assess.
Comment 17•13 years ago
Bug 660185, which is the same issue as this one, referenced this error:

uncaught exception: [Exception... "Node was not found" code: "8" nsresult: "0x80530008 (NS_ERROR_DOM_NOT_FOUND_ERR)" location: "chrome://browser/content/browser.js Line: 2781"]
Comment 18•13 years ago
The symptoms described here indicate that the child process is 'stuck', that is, not responding properly. However, it is responding enough for the parent process to not know it is in a bad state (in which case the parent would have restarted it). We saw this in the past with threading problems in the audio code, for example.

Given the severity of this situation (can't close or use tabs, can't easily get to a working state), perhaps we should add code to check if the child is in this state, and kill it if it is (or do we already have that)? Something like sending a high-level message (same level as 'close tab', which fails in this situation) and making sure we get a valid response after some (long) time. Or perhaps there is some lower-level way to do this.

cjones, what do you think?
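Editorial illustration, not code from this bug: a minimal Python sketch of the "send a high-level message and expect a reply within some time" idea from comment 18. The class name, the `timeout_s` parameter, and the injectable `clock` are all hypothetical, and nothing here reflects how Gecko's IPC layer is built.

```python
import time


class KeepaliveWatchdog:
    """Tracks replies to periodic pings and flags a child that stops answering."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # hypothetical: how much silence is tolerated
        self.clock = clock          # injectable so the logic can be exercised in tests
        self.last_reply = clock()

    def on_reply(self):
        # Called whenever the child answers a high-level ping.
        self.last_reply = self.clock()

    def child_is_stuck(self):
        # True once the child has been silent for longer than the timeout.
        return self.clock() - self.last_reply > self.timeout_s
```

In such a scheme the parent would call `on_reply()` from its message handler and poll `child_is_stuck()` on a timer. The later comments in this bug discuss why doing this well (battery use, allowed blocking waits) is harder than the sketch suggests.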
Comment 19•13 years ago
I can't reproduce this on Nightly on a Galaxy S or Linux desktop.

(In reply to Aaron Train [:aaronmt] from comment #5)
> I can reproduce this with simpler STR
>
> New Profile
> - Change to deutsch, restart
> - After first-run animation, tap URL bar, tap magnifying glass and tap Google

Why is the first-run animation showing in this case? This is the second run (during the first run in this profile you change the language), or am I misunderstanding these STR?
Comment 20•13 years ago
(In reply to Alon Zakai (:azakai) from comment #19)
> I can't reproduce this on Nightly on a Galaxy S or linux desktop.
>
> (In reply to Aaron Train [:aaronmt] from comment #5)
> > I can reproduce this with simpler STR
> >
> > New Profile
> > - Change to deutsch, restart
> > - After first-run animation, tap URL bar, tap magnifying glass and tap Google
>
> Why is the first-run animation showing in this case? This is the second run
> (during first run in this profile you change the language) or am I
> misunderstanding these STR?

It lands on about:firstrun after the restart
Comment 21•13 years ago
Aurora (8.0a1)
HTC Evo

Following STR in comment #5 I get http://www.flickr.com/photos/ozten/6026742774/in/photostream

Cleared Data.

Following STR in comment #9, I could not reproduce.

Switched to Deutsch. Again same error as above.
Comment 22•13 years ago
(In reply to Austin King [:ozten] from comment #21)
> Aurora (8.0a1) HTC Evo
>
> Following STR in comment #5 I get
> http://www.flickr.com/photos/ozten/6026742774/in/photostream
>
> Cleared Data.
>
> Following STR in comment #9, I could not reproduce.
>
> Switched to Duetch. Again same error as above.

Unrelated: that is bug 674830.
(In reply to Alon Zakai (:azakai) from comment #18)
> The symptoms described here indicate that the child process is 'stuck', that
> is, not responding properly. However it is responding enough for the parent
> process to not know it is in a bad state (in which case the parent would
> have restarted it). We saw this in the past with threading problems in the
> audio code for example.

I don't know what state you're referring to here, so I don't know how we would detect or fix it. Do you have a link to one of the old bugs you're referring to?

If anyone is able to reproduce this, we should attach gdb and see what's going on.
Comment 24•13 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #23)
> (In reply to Alon Zakai (:azakai) from comment #18)
> > The symptoms described here indicate that the child process is 'stuck', that
> > is, not responding properly. However it is responding enough for the parent
> > process to not know it is in a bad state (in which case the parent would
> > have restarted it). We saw this in the past with threading problems in the
> > audio code for example.
>
> I don't know what state you're referring to here, so I don't know how we
> would detect or fix it. Have to a link to one of old bugs you're referring
> to?
>
> If anyone is able reproduce this, we should attach gdb and see what's going
> on.

There's a handful of folks in QA who said they can reproduce. What do you need from us? You're welcome to borrow any device if you're local.
Sorry, remote. I would like to know what the plugin-container process is doing. If someone can attach gdb to plugin-container, then get the output of |thread apply all backtrace| and attach it here, that would help quite a bit.
Comment 26•13 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #23)
> (In reply to Alon Zakai (:azakai) from comment #18)
> > The symptoms described here indicate that the child process is 'stuck', that
> > is, not responding properly. However it is responding enough for the parent
> > process to not know it is in a bad state (in which case the parent would
> > have restarted it). We saw this in the past with threading problems in the
> > audio code for example.
>
> I don't know what state you're referring to here, so I don't know how we
> would detect or fix it. Have to a link to one of old bugs you're referring
> to?
>

Sorry for not being clearer. I finally managed to find the bug I meant before, bug 634407. The main thread deadlocked there, leading to exactly the same symptoms as in this bug (pages don't load, can't close tabs, etc. - child is frozen but not crashed).

Do we have any 'keepalive' type checks in the IPC code? (I mean, that the parent decides the child must be restarted if it doesn't respond to some periodic message?)

> If anyone is able reproduce this, we should attach gdb and see what's going
> on.

I agree that that is the way to go for this bug, so just to clarify, my questions above are more regarding a general approach to prevent such problems in the future. If you agree that some type of 'keepalive' check that would catch cases like this might make sense, I'll file a separate bug for that.
A keepalive check is a good idea, but seems to me like it would be hard to implement well (wrt battery, allowed blocking waits, etc.). Firefox-on-desktop is vulnerable to these kinds of deadlock bugs too, but we haven't had the need to implement a main-thread keepalive or watchdog there yet (except for OOPP, but that's an easy problem). I wonder why. Maybe without content processes, these deadlocks are so bad as to be stop-the-world-and-fix, but with content processes things seem to /almost/ work well enough that perceived priority is lower.

Before trying something really hard (keepalive), maybe an easier solution would work well enough: if what we lack is feedback from users on these kinds of problems, maybe we can implement a "Kill content process" button, maybe just for unofficial builds. The button would kill -11 the content process, which would give us a crash report and unwedge whatever the user was trying to do. We could note in the crash report that it was generated by the "kill switch".

How hard would it be to add a kill switch to the UI?
Assignee | ||
Comment 28•13 years ago
This patch is for mozilla-beta.

Aaron was able to get a copy of his profile folder in the corrupted state. One thing I noticed in sessionstore.js was that the "selected" index was out-of-bounds. It was greater than the stored number of tabs. The session restore code wasn't protecting against that.

I made a Fennec 6.0b5 build on Linux desktop. I used Aaron's sessionstore.js file and was able to experience the busted state. Putting the "selected" check in the code kept the busted state from happening.

I do not know if this will fix the problem when running on an Android phone. I have not been able to reproduce it. Likewise, no one has been able to reproduce it using my test builds. However, it's a good start.
Assignee: nobody → mark.finkle
Attachment #552002 -
Flags: review?(mbrubeck)
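Editorial illustration, not the attached patch: a minimal Python sketch of the kind of bounds check described in comment 28. The dictionary layout (`tabs`, `selected`), the assumption that `selected` is 1-based, and the function name are hypothetical; the real check lives in Fennec's JavaScript session restore code.

```python
def clamp_selected(window):
    """Return a copy of `window` whose "selected" index points at a stored tab."""
    tabs = window.get("tabs", [])
    selected = window.get("selected", 1)
    # Assumption: "selected" is 1-based, so valid values are 1..len(tabs).
    # With no stored tabs there is nothing to select, so 0 is used as a marker.
    clamped = max(1, min(selected, len(tabs))) if tabs else 0
    return {**window, "selected": clamped}
```

For a stored window with two tabs and `"selected": 5`, the sketch returns a copy with `"selected": 2` and leaves the input untouched, so a well-formed file is unaffected.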
Comment 29•13 years ago
Comment on attachment 552002
patch 1

r=mbrubeck

Nominating for approval-mozilla-beta. This patch is mobile-only and extremely low-risk. For a non-corrupt sessionstore file, it will have no effect. For the specific corruption we were able to reproduce, it simply ignores the corrupt value.
Attachment #552002 -
Flags: review?(mbrubeck)
Attachment #552002 -
Flags: review+
Attachment #552002 -
Flags: approval-mozilla-beta?
Comment 30•13 years ago
Built this with the patch:
http://people.mozilla.org/~kbrosnan/tmp/676780/fennec-6.0.en-US.eabi-arm.apk

It is en-US only, but I think you can add language packs from the previous beta to test:
http://ftp.mozilla.org/pub/mozilla.org/mobile/releases/latest-beta/linux/
Reporter | ||
Comment 31•13 years ago
(In reply to Kevin Brosnan [:kbrosnan] from comment #30)
> Built this with the patch
> http://people.mozilla.org/~kbrosnan/tmp/676780/fennec-6.0.en-US.eabi-arm.apk
> it is en-US only but I think you can add language packs from the previous
> beta to test
> http://ftp.mozilla.org/pub/mozilla.org/mobile/releases/latest-beta/linux/

I've installed several language packs and performed the STR, and it works fine. It seems that the patch has fixed this issue.
Comment 32•13 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #27)
> How hard would it be to add a kill switch to the UI?

I think it would be bad UX to have one, actually. We should check if content processes are alive routinely, just like we do with plugin processes, and kill/restart them (and send a hang report pair) if they don't react to keep-alive pings from the main process within a certain timeout.
Updated•13 years ago
Keywords: checkin-needed
Updated•13 years ago
Keywords: checkin-needed
Assignee | ||
Comment 33•13 years ago
This patch is for mozilla-central. It adds the same "selected" clamp check. It also adds some better protection for failed session restores, making sure the browser is left in a "good" state. (borrowed the session restore addition from Wes)
Attachment #552069 -
Flags: review?(mbrubeck)
Attachment #552069 -
Flags: feedback?(wjohnston)
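Editorial illustration, not the attached patch: a minimal Python sketch of "leave the browser in a good state when restore fails", as comment 33 describes. The session layout (`windows`, `tabs`, `url`), the fallback page, and the `"ok"`/`"fail"` status strings are assumptions made for the example.

```python
import json

FALLBACK_TABS = ["about:home"]  # hypothetical "good" state: one usable tab


def restore_tabs(raw_session):
    """Return (status, urls); never leave the caller without a tab to show."""
    try:
        data = json.loads(raw_session)
        tabs = data["windows"][0]["tabs"]
        urls = [tab["url"] for tab in tabs]
    except (ValueError, KeyError, IndexError, TypeError):
        # Corrupt or unexpected data: report failure and fall back.
        return "fail", list(FALLBACK_TABS)
    if not urls:
        # A parsable file with no tabs is treated as a failed restore too.
        return "fail", list(FALLBACK_TABS)
    return "ok", urls
```

The design choice is that every failure path still yields something the UI can open, so a bad file degrades to a fresh start instead of an unusable browser.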
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #32)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #27)
> > How hard would it be to add a kill switch to the UI?
>
> I think it would be bad UX to have one, actually. We should check if content
> processes are alive routinely, just like we do with plugin processes

I think there's a misunderstanding: (i) we don't do that for plugin processes; (ii) that's hard to do right; (iii) the "kill switch" proposal is for nightlies, not release (not a UX concern).

But since the data we could gather with that wouldn't have helped identify this bug, no need to bother.
Comment 35•13 years ago
Comment on attachment 552069
patch for m-c

Review of attachment 552069:
-----------------------------------------------------------------

Looks fine to me (but I wrote it). "fail" is probably a bit general. We could send back a real error code if we want to be fancy.
Attachment #552069 -
Flags: feedback?(wjohnston) → feedback+
Assignee | ||
Comment 36•13 years ago
Comment on attachment 552002
patch 1

We are taking this and will spin a build for QA to test. Land on mozilla-beta.
Attachment #552002 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 37•13 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #34)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #32)
> > (In reply to Chris Jones [:cjones] [:warhammer] from comment #27)
> > > How hard would it be to add a kill switch to the UI?
> >
> > I think it would be bad UX to have one, actually. We should check if content
> > processes are alive routinely, just like we do with plugin processes
>
> I think there's a misunderstanding: (i) we don't do that for plugin
> processes; (ii) that's hard to do right; (iii) "kill switch" proposal is for
> nightlies, not release (not a UX concern).
>
> But since the data we could gather with that wouldn't have helped identify
> this bug, no need to bother.

We can add a kill switch for nightlies or as an addon, and that could help with manual investigation of future problems that have these symptoms, once we are aware of those specific problems.

But it wouldn't help prevent normal users from getting stuck with those symptoms, that is, with an unusable browser. And it wouldn't get us automatic reports from those users about that kind of problem. An automatic keepalive system could do both of those things, I think.
Assignee | ||
Comment 38•13 years ago
Simple beta patch: http://hg.mozilla.org/releases/mozilla-beta/rev/9f2b11f2bd1f

Not fixed on trunk yet.
Assignee | ||
Updated•13 years ago
tracking-fennec: ? → 6+
(In reply to Alon Zakai (:azakai) from comment #37)
> But it wouldn't help prevent normal users from getting stuck with those
> symptoms, that is, with an unusable browser. And it wouldn't get us
> automatic reports from those users about that kind of problem. An automatic
> keepalive system could do both of those things I think.

Yep. My only concern is that a keepalive implementation doesn't seem trivial at all, so we would want a commensurate benefit before attacking it. It wouldn't have caught this bug, for example.
Comment 40•13 years ago
Comment on attachment 552069
patch for m-c

This mobile-only change is also needed on Aurora for Firefox 7 to fix this bug and bug 669289. Both bugs render Firefox unusable and can cause data loss when the user needs to wipe their profile to continue.
Attachment #552069 -
Flags: review?(mbrubeck)
Attachment #552069 -
Flags: review+
Attachment #552069 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 41•13 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/b7d5fd20d40a
Comment 42•13 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #34)
> I think there's a misunderstanding: (i) we don't do that for plugin
> processes;

Erm, what else is it that kills plugins when they don't react and sends hang reports to us? Still, I feel like this belongs in a followup bug.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #42)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #34)
> > I think there's a misunderstanding: (i) we don't do that for plugin
> > processes;
>
> Erm, what else is it that kills plugins when they don't react and sends hang
> reports to us? Still, I feel like this belongs into a followup bug.

That's a very narrow and simple mechanism: when the browser makes a blocking request to the plugin (on behalf of web content), and the plugin process doesn't reply within X seconds, then we declare it hung and invoke all that machinery. If there were never any blocking requests browser-->plugin, we would never detect hangs within the current system. A keepalive mechanism is quite different.
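Editorial illustration, not Gecko code: a minimal Python sketch of the narrow mechanism described above, where a hang is only noticed because a blocking request fails to return in time. The function name, the status strings, and the use of a worker thread to stand in for the plugin process are all assumptions.

```python
import concurrent.futures
import time


def call_with_hang_detection(request, timeout_s):
    """Run a blocking `request`; report a hang if no reply arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(request)
    try:
        return "replied", future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # No reply within the limit: the caller would declare the peer hung.
        return "hung", None
    finally:
        # Do not wait for a stuck worker; the sketch only detects the hang.
        pool.shutdown(wait=False)
```

For example, `call_with_hang_detection(lambda: time.sleep(0.3), 0.05)` reports a hang, while a request that returns promptly is passed through. If no request is ever made, nothing is ever detected, which is the difference from a periodic keepalive.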
Comment 44•13 years ago
Filed bug 678073 to discuss the keepalive idea.
Comment 45•13 years ago
Merged: http://hg.mozilla.org/mozilla-central/rev/b7d5fd20d40a
No longer blocks: 678159
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [fennec 6.0b5]
Target Milestone: --- → Firefox 8
Version: Firefox 6 → Trunk
Updated•13 years ago
Attachment #552069 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 47•13 years ago
http://hg.mozilla.org/releases/mozilla-aurora/rev/a3d8fc803419
Severity: major → blocker
tracking-fennec: 6+ → ---
status-firefox8: --- → fixed
Target Milestone: Firefox 8 → Firefox 6
Version: Trunk → Firefox 6
Reporter | ||
Comment 48•13 years ago
I'm able to reproduce this issue on Firefox 6 RC on a clean profile. I will reopen this bug.

--
Build id: Mozilla/5.0 (Android;Linux armv7l;rv:6.0)Gecko/20110811 Firefox/6.0 Fennec/6.0
Device: HTC Desire Z
OS: Android 2.3.3

Build id: Mozilla/5.0 (Android;Linux armv7l;rv:6.0)Gecko/20110811 Firefox/6.0 Fennec/6.0
Device: LG Optimus 2X
OS: Android 2.2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 49•13 years ago
(In reply to Cristian Nicolae (:xti) from comment #48)
> I'm able to reproduce this issue on Firefox 6 RC on a clean profile. I will
> reopen this bug.
>
> --
> Build id : Mozilla/5.0 (Android;Linux armv7l;rv:6.0)Gecko/20110811
> Firefox/6.0 Fennec/6.0
> Device: HTC Desire Z
> OS: Android 2.3.3
>
> Build id : Mozilla/5.0 (Android;Linux armv7l;rv:6.0)Gecko/20110811
> Firefox/6.0 Fennec/6.0
> Device: LG Optimus 2X
> OS: Android 2.2

Can we reproduce this in-house? Can we confirm the RC was built with this fix?
Comment 50•13 years ago
Yes, the RC was built with this fix. We aren't just seeing bug 678159?
(In reply to Christian Legnitto [:LegNeato] from comment #50)
> Yes, the RC was built with this fix. We aren't just seeing bug 678159?

AFAIK, they are the same bug. There are two parts to the bug: a corruption in the session profile, and a hang in the process (or something similar) when the session profile is corrupt.

1) The fix in this bug addresses recovery after quitting, but it is still possible to get to the state where remote pages won't load. With this fix, when you quit the app, you don't have to clear the app's data to get Fennec to work again.

2) However, the part where the session hangs still exists, so you can still hit the state. The workaround at that point is to quit the app and restart it.
Comment 52•13 years ago
I can reproduce this 100% of the time doing the following on the Samsung Galaxy SII/2.3.3:

1. Clear profile of Firefox Beta 6 RC
2. Reboot device
3. Immediately following startup and back to home-screen, launch Firefox Beta
4. Change language, Russian, Restart
5. After first-run animation, tap awesome-bar, tap magnifying glass and tap Google

Steps in YouTube -> http://www.youtube.com/watch?v=hip4JFcrlYg
I was able to reproduce this issue without a change of languages but strictly with adding and disabling addons.

1. go to about:home
2. add cleary, clear mobile history, homeskin, Bigger Text, personas, Phony, quit Firefox for Mobile, URL Fixer, readability.
3. restart the app
4. once the app starts, quickly long tap on about:home and select open in new tab
5. close the new tab while it's loading

Can't close the tab.
Force quitting the app will restore the app with a backup of the session state. There is a slight data loss since the last time the session state was backed up; however, it is better than before, when the user was forced to clear data from the app.
Comment 55•13 years ago
I can reproduce this only when there is one tab with an about: page. I can't reproduce it when the first tab is an http:// URL, or when there is more than one tab and one of the tabs is an http:// URL.
Comment 56•13 years ago
Serious, but no RC blocker for me, given the circumstances it takes to reproduce the issue.
Updated•13 years ago
Reporter | ||
Updated•13 years ago
status-firefox9: --- → affected
Comment 58•13 years ago
Stumbled upon another STR to reproduce this while testing out bug 686901:

1. http://www.epsn.com (should load the mobile site)
2. Scroll to bottom, tap epsn.com (should load the desktop site)
3. On the top right, tap 'Sign In'

You should see a popup frame. When this happens, attempt to close the active tab or visit any new URL.
Updated•13 years ago
Status: REOPENED → NEW
tracking-fennec: --- → ?
Updated•13 years ago
Target Milestone: Firefox 6 → ---
Version: Firefox 6 → Firefox 9
Comment 59•13 years ago
In bug 686901, I mentioned a regression range within which the issue in comment 58 also falls.
Assignee | ||
Updated•13 years ago
tracking-fennec: ? → 9+
Assignee | ||
Updated•13 years ago
tracking-fennec: 9+ → +
Assignee | ||
Comment 60•13 years ago
There are no new patches for this bug, but automation thinks we should land the approved patches to aurora and beta, which we have already done. I am closing this bug. Open a new bug if this is still an issue and we can triage it in the new bug.
Status: NEW → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED