Open Bug 1903530 Opened 17 days ago Updated 12 days ago

Frequent long janks on Android on loading certain pages

Categories

(Core :: Storage: Quota Manager, defect, P3)

Tracking

Performance Impact high

People

(Reporter: bas.schouten, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

I can reliably reproduce this on Google when it is the first site being loaded in the session (i.e. after killing the app). The app appears to be stuck in some storage system.

Profile: https://share.firefox.dev/3zbzrSP

The Performance Impact Calculator has determined this bug's performance impact to be high. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Android
Impact on site: Causes noticeable jank
Page load impact: Severe
Websites affected: Major
[x] Able to reproduce locally

The jank is happening inside a Storage.setItem call and is most likely a long Quota Manager initialization, which bug 1671932 is supposed to address.
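For anyone who wants to check from page script how long a single write blocks, a minimal timing sketch could look like the following. `StorageLike`, `timeSetItem`, the injectable clock, and the 50 ms threshold are illustrative assumptions, not values taken from the profile; in a page you would pass `window.localStorage` as the store.

```typescript
// Minimal shape of the Web Storage API needed here; window.localStorage satisfies it.
interface StorageLike {
  setItem(key: string, value: string): void;
}

interface SetItemTiming {
  elapsedMs: number;
  janky: boolean;
}

// Times one synchronous setItem call. The 50 ms default threshold is an
// assumed cutoff for "noticeable jank", not a number from this bug.
function timeSetItem(
  store: StorageLike,
  key: string,
  value: string,
  now: () => number = Date.now,
  jankThresholdMs: number = 50
): SetItemTiming {
  const start = now();
  store.setItem(key, value); // blocks the calling thread until the write returns
  const elapsedMs = now() - start;
  return { elapsedMs, janky: elapsedMs >= jankThresholdMs };
}
```

This only measures the blocking time seen by the caller; it cannot tell whether the time was spent in quota manager initialization or elsewhere, which is what the profiles are for.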

Depends on: 1671932

(In reply to Jens Stutte [:jstutte] from comment #2)
> (In reply to Bas Schouten (:bas.schouten) from comment #0)
> > I can reliably reproduce this on Google when it is the first site being loaded in the session (i.e. after killing the app). The app appears to be stuck in some storage system.
> >
> > Profile: https://share.firefox.dev/3zbzrSP
>
> A profile taken with all threads would definitely confirm if comment 1 is right.

This profile is taken with all threads :-).

Flags: needinfo?(bas)

Sorry, stupid me. So no, this is something different: a long setItem that does a lot of mozilla_dump_image calls.

No longer depends on: 1671932

(In reply to Jens Stutte [:jstutte] from comment #4)
> Sorry, stupid me. So no, this is something different: a long setItem that does a lot of mozilla_dump_image calls.

I think personally that stack trace is nonsense. As far as I can tell mozilla_dump_image is a graphics call that isn't actually reached by Storage.setItem :-s. The bottom of the stack looks real though, and the Storage.setItem is a pseudostack so definitely correct.

(In reply to Bas Schouten (:bas.schouten) from comment #5)
> I think personally that stack trace is nonsense. As far as I can tell mozilla_dump_image is a graphics call that isn't actually reached by Storage.setItem :-s. The bottom of the stack looks real though, and the Storage.setItem is a pseudostack so definitely correct.

Yes, that looks very weird. But there seems to be activity on the QM IO thread already before that setItem starts, and that activity seems to return to the event loop level several times, so I would assume the initialization is already finished. But it is actually hard to tell with these garbled stacks.

Profile with fixed symbols here: https://share.firefox.dev/3KQ7ouJ

Looks like you were right about the cause: https://share.firefox.dev/3KQoxV2

Component: Storage: localStorage & sessionStorage → Storage: Quota Manager

(In reply to Bas Schouten (:bas.schouten) from comment #8)
> Looks like you were right about the cause: https://share.firefox.dev/3KQoxV2

Yeah, it has to do with initialization, but I am not sure I read it well: I see QM-related initialization, but distributed over several events. Is that what this looks like these days when QM initializes?

Depends on: 1671932
Severity: -- → S3
Priority: -- → P3

I noticed this is set to S3? Is there a way to resolve this without blowing away your profile? Would users normally be able to get out of this situation? Because otherwise this seems like a more severe issue.

"Delete browsing data" is the simplest mechanism available on Android, since the settings UI does not provide the more granular data-clearing mechanism available on desktop, which, most significantly, shows per-site data usage. In general, the slowdown will be correlated with the amount of data, so a way to see which sites use the most data would be most useful. That is, Fenix does provide a way to clear cookies and site data for any site from the lock icon, which is granular, but not useful here because there's no way to know which sites are the source of the problem.

One additional thing to note is that, as discussed with :mstange for his similar report in #DOM on Matrix the other day, performance problems will be exacerbated after:

  1. Fenix updates, because the change in Build ID requires running a full initialization sweep.
  2. Killing Firefox without letting it shut down cleanly, because the cache will not be flushed. I guess a very interesting question is whether Fenix ever does shut down cleanly. Like, if Android wants to reclaim memory, does XPCOM shutdown get a chance to run, or does Fenix 100% of the time perform an unclean shutdown? Or is Fenix regularly shutting down XPCOM proactively when it goes idle but before it's killed by the OS? Which is to say, it's definitely the case that if you explicitly kill the app while using it you are going to see worst-case init behavior. (But that is of course very normal user problem-solving behavior.)
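
As a toy model of the two conditions above (explicitly not Quota Manager's actual logic), the decision can be sketched like this. `StartupState`, its field names, and `fullSweepReason` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical snapshot of what storage knows at startup; all names are illustrative.
interface StartupState {
  cachedBuildId: string | null; // build ID recorded when the cache was last written
  currentBuildId: string; // build ID of the running app
  cacheFlushedOnShutdown: boolean; // false after a kill or crash
}

// Returns why a full initialization sweep would be needed in this toy model,
// or null if the cached data can be trusted.
function fullSweepReason(s: StartupState): string | null {
  if (s.cachedBuildId !== s.currentBuildId) return "build ID changed";
  if (!s.cacheFlushedOnShutdown) return "unclean shutdown";
  return null;
}
```

Under this model, explicitly killing the app always lands in the "unclean shutdown" branch, which matches the worst-case behavior described above.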

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:aiunusov, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to "?"?

For more information, please visit BugBot documentation.

Flags: needinfo?(aiunusov)

This bug now depends on bug 1671932, which already has its severity set to S2.
Not sure whether every bug that an S2 bug blocks needs to be S2 as well.

Severity: S3 → S2
Flags: needinfo?(aiunusov)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #11)
> "Delete browsing data" is the simplest mechanism available on Android, since the settings UI does not provide the more granular data-clearing mechanism available on desktop, which, most significantly, shows per-site data usage. In general, the slowdown will be correlated with the amount of data, so a way to see which sites use the most data would be most useful. That is, Fenix does provide a way to clear cookies and site data for any site from the lock icon, which is granular, but not useful here because there's no way to know which sites are the source of the problem.
>
> One additional thing to note is that, as discussed with :mstange for his similar report in #DOM on Matrix the other day, performance problems will be exacerbated after:
>
>   1. Fenix updates, because the change in Build ID requires running a full initialization sweep.
>   2. Killing Firefox without letting it shut down cleanly, because the cache will not be flushed. I guess a very interesting question is whether Fenix ever does shut down cleanly. Like, if Android wants to reclaim memory, does XPCOM shutdown get a chance to run, or does Fenix 100% of the time perform an unclean shutdown? Or is Fenix regularly shutting down XPCOM proactively when it goes idle but before it's killed by the OS? Which is to say, it's definitely the case that if you explicitly kill the app while using it you are going to see worst-case init behavior. (But that is of course very normal user problem-solving behavior.)

Before I do any of this, is there anything I can do to help diagnose -why- this is happening? A lot of users will probably just think 'man this browser can be so slow', rather than actually trying to figure out what the cause is or discovering a possible solution like this :-).

Flags: needinfo?(bugmail)

We know why this is happening and we are working hard on addressing the issue. The main meta bug for the effort is bug 1671932.
There are about 100 patches, most of them already accepted, and we are in the process of landing them.
The patches are attached to bug 1808294, bug 1866217, bug 1866402 and bug 1867997.

Flags: needinfo?(bugmail)

This is currently the last patch for the effort: https://phabricator.services.mozilla.com/D199082
Its commit message describes the changes from a high-level point of view.
