Open Bug 1641715 Opened 6 months ago Updated 2 days ago

GeckoView doesn't load for external apps (custom tabs/PWAs) in AC and Fenix

Categories

(GeckoView :: General, defect, P1)

77 Branch
Unspecified
All

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: tigeroakes, Assigned: aklotz)

Details

(Whiteboard: [geckoview:m79][fenix:p1][geckoview:m80][geckoview:m84][geckoview:m85])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0

Steps to reproduce:

For about a month in Fenix, opening a PWA or custom tab has sometimes shown a white screen instead of loading the webpage. This also occurs in Sample Browser in Android Components. If you remove the main app from memory by clearing recent apps, PWAs will not load.

I checked out lots of old Android Components versions from late April and early May to track down where the bug appeared. (See https://github.com/mozilla-mobile/fenix/issues/10689.) I narrowed it down to one line: changing the nightly GeckoView version from "77.0.20200428100141" to "77.0.20200429095105".

There are not many behaviour differences between PWAs and the main browser app, so we're not sure why this issue has sprung up.

This is a high priority bug and needs to be fixed before Fenix's feature freeze.

Actual results:

White screen loads after GV version change

Expected results:

Webpage should load, just like before the change.

Flags: needinfo?(etoop)

This would be a Fenix release blocker for PWAs.

When is Fenix's feature freeze?

We can get this scheduled for landing in 79. It sounds like you would want this uplifted to 78 if it is a release blocker?

Flags: needinfo?(tigeroakes)
Flags: needinfo?(jonalmeida942)
Flags: needinfo?(etoop)

If possible, yes. Otherwise we would have to consider disabling the feature, and we would need to get buy-in from product for that.

Flags: needinfo?(jonalmeida942)
Severity: -- → S3
Priority: -- → P1
Whiteboard: [geckoview:m79][fenix:p1]

Feature freeze is 6/5, but we'll take bugfixes through the end of June. If we can get this into 78 beta, then we can test it in the Fenix release channel.

Flags: needinfo?(tigeroakes)
Assignee: nobody → droeh
Whiteboard: [geckoview:m79][fenix:p1] → [geckoview:m79][fenix:p1][geckoview:m80]

I've just been bumped on this by :grisha. This is still a Fenix P1.

This appears to have resolved itself -- I updated my tree today and can no longer reproduce on local builds; checking nightly also looks good, and I asked Jon to try it out and he also can no longer reproduce it. (And, looking now, it appears the original reporter on the GitHub issue posted a few days back to say it's working.)

Status: UNCONFIRMED → RESOLVED
Closed: 4 months ago
Resolution: --- → INVALID

It looks like this has started happening again with fairly reliable STR:
https://github.com/mozilla-mobile/fenix/issues/15335

Happens on PWAs like pwa-directory.appspot.com or crypt.ee, STR:

  1. Install the PWA.
  2. Swipe to close Fenix.
  3. Launch the PWA. -> blank page.
    After this, I get the blank screen even with Fenix in the background.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
Whiteboard: [geckoview:m79][fenix:p1][geckoview:m80] → [geckoview:m79][fenix:p1][geckoview:m80][geckoview:m84]
Whiteboard: [geckoview:m79][fenix:p1][geckoview:m80][geckoview:m84] → [geckoview:m79][fenix:p1][geckoview:m80][geckoview:m84][geckoview:m85]

After upgrading to GV 84.0a1-20201114094625 we're seeing this a lot more now. I can reproduce every time using latest Fenix Nightly (201116 17:01).

Stepping back to GV 84.0a1-20201109095222 fixes this. So it looks like we have a recent regression?

Going through the logs, we see the pageStart event, but then nothing seems to happen and we never get a pageStop or any other event. The app is responsive and nothing is blocking the main thread.

STR (similar to before):

  • Launch Fenix

  • Install PWA

  • Close Fenix

  • Open PWA (fails intermittently, but consistently using 84.0a1-20201114094625)

Dylan, Agi, can you help us investigate? Any idea what could've regressed this? We've already looked at our commits in this range and didn't find anything.
Good: GV 84.0a1-20201109095222
Bad: GV 84.0a1-20201114094625
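
For anyone narrowing the range further, here is a rough sketch of how wide the good/bad window is. It assumes the trailing 14 digits of each GV version string are a YYYYMMDDhhmmss build timestamp; `build_time` is a hypothetical helper written for this comment, not part of any GeckoView tooling.

```python
from datetime import datetime


def build_time(version: str) -> datetime:
    """Parse the trailing 14 digits of a GV version string.

    Assumption: those digits are a YYYYMMDDhhmmss build timestamp.
    """
    return datetime.strptime(version[-14:], "%Y%m%d%H%M%S")


good = "84.0a1-20201109095222"
bad = "84.0a1-20201114094625"

# Time between the last known-good and the first known-bad build.
window = build_time(bad) - build_time(good)
print(f"regression window: {window}")  # regression window: 4 days, 23:54:03
```

That leaves roughly five days of nightlies between the two builds to check.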

Attaching debug logs as well.

Flags: needinfo?(droeh)
Flags: needinfo?(agi)

Looks like this is caused by Multi E10S (turning dom.ipc.processCount back to 1 fixes it). Aaron, any ideas?

Flags: needinfo?(droeh)
Flags: needinfo?(aklotz)
Flags: needinfo?(agi)

This is probably one of those things where some bug existed but was hard to reproduce, and then e10s-multi showed up and exacerbated it. I'll take a look...

Flags: needinfo?(aklotz)
Assignee: droeh → aklotz
Status: REOPENED → ASSIGNED

One thing I've noticed when reproducing this is that the PWA load causes Gecko to launch a second content process even though the first one was sitting there unused. We also ended up with both content processes set to FOREGROUND priority, which should not happen.

Something is definitely happening out of order here, but I haven't narrowed it down yet. Stepping through this case with strategically-placed breakpoints produced a working instance of the PWA.

Just to leave a couple of notes here for when I get back:

  • I have figured out why we sometimes see two content processes and other times one: There is a service worker that is being spun up. If the PWA's web content loads before the service worker, we get a second process. If the service worker loads before the web content, we only get one content process. This is because of the way ContentParent's e10s process allocator works: it tracks top-level content but not service workers. If tab0 is hosting a service worker, the allocator still considers that process to be "empty," because it's not holding any "tabs," i.e. top-level content. In that case, the allocator re-uses tab0. OTOH, if tab0 is already hosting a tab, then the allocator decides to spin up tab1 to host the service worker.
  • Is this service worker business the cause? Possibly (and the blank content would be consistent with some of the issues that desktop has seen with service workers during browser startup), but more investigation is necessary.
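
To make the ordering dependence in the first note concrete, here is a toy model of the behaviour described there. It is only a sketch of the rule "a process with no tabs counts as empty", not Gecko's actual ContentParent code; the `ContentProcess` and `Allocator` classes and the `run` helper are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentProcess:
    name: str
    tabs: int = 0             # top-level content: tracked by the allocator
    service_workers: int = 0  # hosted here, but invisible to the allocator


class Allocator:
    """Toy allocator: re-use a process with no tabs, otherwise launch one."""

    def __init__(self) -> None:
        self.processes: list[ContentProcess] = []

    def _pick(self) -> ContentProcess:
        for proc in self.processes:
            if proc.tabs == 0:  # "empty", even if it hosts a service worker
                return proc
        proc = ContentProcess(f"tab{len(self.processes)}")
        self.processes.append(proc)
        return proc

    def load_tab(self) -> ContentProcess:
        proc = self._pick()
        proc.tabs += 1
        return proc

    def load_service_worker(self) -> ContentProcess:
        proc = self._pick()
        proc.service_workers += 1
        return proc


def run(order: list[str]) -> int:
    """Apply loads in the given order and return the process count."""
    alloc = Allocator()
    for kind in order:
        if kind == "tab":
            alloc.load_tab()
        else:
            alloc.load_service_worker()
    return len(alloc.processes)


print(run(["tab", "sw"]))  # web content first: 2
print(run(["sw", "tab"]))  # service worker first: 1
```

In this model the only difference between the two runs is which load reaches the allocator first, which matches the one-versus-two process counts observed above.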