Bug 1766342 (Closed). Opened 2 years ago, closed 2 years ago.

blank and unresponsive screen since version 98 on Sony Xperia XZ1

Categories

(Core :: mozglue, defect, P1)

Unspecified
Android
defect

Tracking

()

VERIFIED FIXED
102 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- verified
firefox102 --- verified

People

(Reporter: agi, Assigned: gsvelto)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [geckoview:m102])

Attachments

(1 file)

From github: https://github.com/mozilla-mobile/fenix/issues/24226.

Steps to reproduce

After the APK is updated to version 98 or 99 beta, opening the app shows a blank screen (no decoration, no URL bar, no buttons, nothing to interact with). Rolling back to 97.3.0 makes Firefox work again.

Expected behaviour

Firefox should show menus and web content.

Actual behaviour

The app cannot be interacted with

Device name

Sony Xperia XZ1

Android version

Android 8

Firefox release type

Firefox

Firefox version

98.0.0

Device logs

Cannot pull logs from the application. I checked CatLog while opening the app but could not find anything obvious (no tracebacks, at least). I can look for something more specific if anyone knows a keyword to search for.

Additional information

No response

Change performed by the Move to Bugzilla add-on.

Some users report that the 98 upgrade causes the browser to be completely unresponsive on the Sony Xperia XZ1.

Severity: -- → S2
Priority: -- → P1
Summary: [Bug]: blank and unresponsive screen since version 98 → blank and unresponsive screen since version 98 on Sony Xperia XZ1

This bug is a regression between GV 97 and 98.

Can we ask the reporter to bisect the Fenix or GVE Nightly 98 builds using mozregression?

(In reply to Chris Peterson [:cpeterson] from comment #2)

This bug is a regression between GV 97 and 98.

Can we ask the reporter to bisect the Fenix or GVE Nightly 98 builds using mozregression?

Hi, I am not the original reporter, but I have the same issue. How can I run mozregression on an Android device?

AIUI, you want to use mozregression -n gve to bisect the GeckoView example app. It should try to use a connected Android device (you need to ensure that remote debugging is enabled so that it can be accessed via adb). https://mozilla.github.io/mozregression/ has some general information. Note that I haven't actually done this myself, so there may be more steps required to get things working.

Jamie: I'm going to speculatively guess this is somewhat likely to be a gfx issue, or at least you're in a good position to get a device and help find the regression range.

Flags: needinfo?(jnicol)

Thanks bmarne. I wrote some instructions on the github issue here. Please let me know if you need any help with that!

Flags: needinfo?(jnicol) → needinfo?(bmarne)

I've ordered a device. In the meantime, if we get a regression range soon we can decide whether to revert the commit, else we can move affected users to software webrender (assuming it's a driver/webrender bug)

From github, this seems to affect the XZ (Snapdragon 820, Adreno 530), as well as XZ1 and XZ1C (Snapdragon 835, Adreno 540). All users who have reported an Android version have reported 8.0. I'd assume all Sony devices with those chips on Android 8 are affected.

A user also gave this regression range: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=524ae1444004ec35dfcc348104060e91e1efce82&tochange=b34a32e1fc3e6fe4e0b7bcbb68cca4b797e9733d

I'm not sure what to think about that. Only bug 1752168 seems related to rendering, and it doesn't seem like it should cause this.

(In reply to Jamie Nicol [:jnicol] from comment #8)

A user also gave this regression range: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=524ae1444004ec35dfcc348104060e91e1efce82&tochange=b34a32e1fc3e6fe4e0b7bcbb68cca4b797e9733d

Since the bug happens at app startup, maybe bug 1751041 (Compute the process startup timestamp early during startup) is related? Maybe these devices have a clock bug that is causing GeckoView to hang?

Could be. Hopefully my device arrives tomorrow or the day after and I can get a precise range. For now, there's nothing to indicate to me we should remotely blocklist webrender. We could perhaps make some test builds with potential causes backed out, and ask users to test them. Other than that we may just have to sit tight until it arrives.

Two users on github have confirmed that bug 1751041 was the regressing bug.

Gabriele, any ideas why that could have caused this on these devices? Do you have any suggestions for a fix or even just some logging that we can ask the users to test an APK with? And can we safely revert this patch for 100 if we need to?

Flags: needinfo?(gsvelto)
Regressed by: 1751041

The issue is odd, but this is not the first report of it; see bug 1759541 too. The patch in bug 1751041 can be reverted safely, but doing so will cause one of the hazard tests to fail (see bug 1678152 comment 39). I'll try to dig into the change itself to see whether there's something wrong with it that can be addressed directly.

Flags: needinfo?(gsvelto)

I gave this and bug 1759541 some thought and I think I know what might be happening: moving the timestamp so early during startup means we're running it in a static initializer. That function is not trivial: it calls at least getenv(), strcmp() and clock_gettime(), all of which come from glibc (or bionic on Android). Both libraries might still be setting things up during static initialization, so those calls might not work properly yet. There's also the issue of sandboxing: I don't know how the content sandbox behaves at that point, and maybe I should have kept that in mind.
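A minimal C++ sketch of the hazard described above, assuming a POSIX-style libc. The names eager_sketch, ProcessCreationStamp, gProcessCreation and the HYPOTHETICAL_STARTUP_FLAG environment variable are invented for illustration; this is not the mozglue code from bug 1751041. It only shows the pattern: a namespace-scope object with a non-trivial constructor makes the compiler emit a static initializer, so getenv(), strcmp() and clock_gettime() run when the executable or shared library is loaded, before any of its regular code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <ctime>

// Illustrative sketch only; not the actual mozglue implementation.
namespace eager_sketch {

// CLOCK_MONOTONIC in nanoseconds (assumes a POSIX clock_gettime()).
inline long long MonotonicNowNs() {
  timespec ts{};
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical eagerly computed "process creation" stamp.
struct ProcessCreationStamp {
  long long ns = 0;
  bool fromEnv = false;

  ProcessCreationStamp() {
    // This constructor runs during static initialization, i.e. at load
    // time, possibly before the C library (glibc/bionic) or a sandbox
    // has finished its own setup.
    const char* v = std::getenv("HYPOTHETICAL_STARTUP_FLAG");
    fromEnv = v != nullptr && std::strcmp(v, "1") == 0;
    ns = MonotonicNowNs();
  }
};

// Namespace-scope object with a non-trivial constructor: the compiler
// emits a static initializer that calls getenv(), strcmp() and
// clock_gettime() as soon as this code is loaded.
ProcessCreationStamp gProcessCreation;

}  // namespace eager_sketch
```

On a healthy libc this works fine, which is why the pattern can look harmless; the risk is entirely in *when* it runs.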

Either way, what might be happening is that content processes are crashing or getting stuck on startup. I'll write a patch later that reverts the change without breaking the GC hazards test. What bugs me, though, is that this will paper over the problem: the next time we move timestamp generation too early we might hit this again, and that could happen purely by chance if some code that indirectly calls TimeStamp::ProcessCreation() gets moved around.
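A sketch of the lazy alternative, in the spirit of the patch titled "Compute the process creation timestamp lazily". The names lazy_sketch and ProcessCreationNs are hypothetical and the code is not Mozilla's. A function-local static is initialized on the first call (thread-safely since C++11), so no libc call happens during static initialization, unless something calls the accessor from a static initializer, which is exactly the fragility described above. This toy version simply records the time of the first call, which is later than the real process creation; how mozglue actually derives that value is not shown here.

```cpp
#include <cassert>
#include <ctime>

// Illustrative sketch only; not the actual mozglue implementation.
namespace lazy_sketch {

// CLOCK_MONOTONIC in nanoseconds (assumes a POSIX clock_gettime()).
inline long long MonotonicNowNs() {
  timespec ts{};
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical accessor. The function-local static is initialized on the
// first call rather than at load time; C++11 guarantees this happens
// exactly once even if several threads call it concurrently.
inline long long ProcessCreationNs() {
  static const long long sValue = MonotonicNowNs();
  return sValue;
}

}  // namespace lazy_sketch
```

Every later call returns the same cached value, so callers see a stable timestamp without paying for, or risking, work at load time.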

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

FWIW my Xperia XZ1 is supposedly being delivered on Apr 30th.

Hi all!
I've checked whether I can reproduce this issue on the Sony Xperia Z5 Premium (Android 7.1.1).
I installed Beta 97.0.0-beta.6 and browsed a little, then updated to Beta 98.0.0-beta.4. Everything works just fine: the tabs opened on Beta 97 were still displayed after the update, with no blank tabs or screens.
The behavior is the same going from RC 97.1.0 to RC 98.1.0.

Has Regression Range: --- → yes
Attachment #9274057 - Attachment description: WIP: Bug 1766342 - Compute the process creation timestamp lazily → Bug 1766342 - Compute the process creation timestamp lazily r=glandium
Blocks: 1759541

Moving this bug to the mozglue component because this is a mozglue bug, not a GeckoView bug.

Component: General → mozglue
Product: GeckoView → Core
Whiteboard: [geckoview:m102]

Set release status flags based on info from the regressing bug 1751041

Gabriele, after this fix lands in Nightly, can you please uplift it to Beta 101?

Flags: needinfo?(gsvelto)
Whiteboard: [geckoview:m102]

(In reply to Chris Peterson [:cpeterson] from comment #19)

Gabriele, after this fix lands in Nightly, can you please uplift it to Beta 101?

Sure! I'm leaving the NI? so I don't forget.

Mike, can you please take a look at this patch? We'd like to get this fixed ASAP.

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b5e8b70a1bf
Compute the process creation timestamp lazily r=glandium
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch

Comment on attachment 9274057 [details]
Bug 1766342 - Compute the process creation timestamp lazily r=glandium

Beta/Release Uplift Approval Request

  • User impact if declined: Fenix is unusable on certain devices
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Launch a nightly build of Fenix on a Sony Xperia XZ1 device and ensure it works instead of just showing a blank screen.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This change is effectively a revert of bug 1751041. Before my change in bug 1751041 this code had worked for years, so moving back to the old behavior can't hurt.
  • String changes made/needed: none
  • Is Android affected?: Yes
Flags: needinfo?(gsvelto)
Attachment #9274057 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9274057 [details]
Bug 1766342 - Compute the process creation timestamp lazily r=glandium

Approved for Fenix 101.0.0-beta.5.

Attachment #9274057 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

I tested the issue on a Sony Xperia Z5 (Android 7.0) by installing Firefox RC 97.3.0 and then updating to RC 98.3.0, but was unable to reproduce it. We also tested on a Sony Xperia (model number SGP511, Android 6.0.1) and were not able to reproduce the issue. Unfortunately, we do not have a device that is affected by this issue.

Flags: qe-verify+

Multiple people in the upstream issue have verified that Nightly and Beta are working as expected now.

Status: RESOLVED → VERIFIED
Flags: needinfo?(bmarne)