Closed Bug 1567054 Opened 4 months ago Closed 3 months ago

Slow navigation on Google Maps Street View

Categories

(Core :: Graphics, defect, P3)

68 Branch
Unspecified
Linux
defect

Tracking


RESOLVED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- fixed
firefox70 --- fixed

People

(Reporter: yoasif, Assigned: bobowen)

References

(Regression)

Details

(Keywords: nightly-community, regression)

Attachments

(4 files)

Steps to reproduce:

  1. Navigate to https://www.google.com/maps/@40.739262,-73.9827722,3a,75y,109.58h,75.2t/data=!3m6!1e1!3m4!1ssK1pDYFYJAma4MZjAq7xpg!2e0!7i16384!8i8192
  2. Go forward or backwards on the street

What happens:

Very slow navigation.

Expected result:

Faster navigation.

Oddly, if I use a fresh profile or refresh my profile, the problem goes away. I last refreshed my profile a few days ago because of this, but my browser crashed earlier today, and after I restarted it, I experienced this issue.

Disabling WebRender makes it work better.

16:50.53 INFO: No more inbound revisions, bisection finished.
16:50.53 INFO: Last good revision: 9909cd207cc21bf0de639e211fcfe950690f31a5
16:50.53 INFO: First bad revision: 2195b79ea888b1e49406406f528870ca9cab1525
16:50.53 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9909cd207cc21bf0de639e211fcfe950690f31a5&tochange=2195b79ea888b1e49406406f528870ca9cab1525

Has Regression Range: --- → yes
Has STR: --- → yes
Regressed by: canvas-ipc

I don't seem to be able to reproduce this myself.
Do you have any other prefs set in about:config?

Also, would you be able to find the crash you mention in about:crashes and post a link to it? It might be related. Thanks.

Flags: needinfo?(yoasif)
Attached file prefs.js

I also just tried to reproduce the issue with my existing prefs.js in a new profile - I wasn't able to reproduce.

I have attached it here - it is the same one in use in the profile where I can reproduce the issue, but something else is clearly afoot.

If you want, I can try to attach my entire profile (sanitized) if you think that would help.

(In reply to Asif Youssuff from comment #3)

Created attachment 9079071 [details]
prefs.js

I also just tried to reproduce the issue with my existing prefs.js in a new profile - I wasn't able to reproduce.

I have attached it here - it is the same one in use in the profile where I can reproduce the issue, but something else is clearly afoot.

If you want, I can try to attach my entire profile (sanitized) if you think that would help.

Thanks. Most of those crashes don't have useful stacks as far as I can see, but at least a couple that do look like they're related to Fission, so I guess the crash is unrelated.

I've also just realised that you're running on Linux, so I'll try and reproduce on a VM.
It's weird that you can't reproduce on a new profile with the same prefs.

Sorry, would you mind also posting your about:support for the new profile where you don't see the issue, please?

Flags: needinfo?(yoasif)
OS: Unspecified → Linux
Flags: needinfo?(yoasif)

Thanks for that; unfortunately I couldn't see anything obvious that could be causing the difference.
I've also been unable to reproduce the issue myself; at least, I don't see a discernible difference using Nightly builds from before and after bug 1464032 landed.

Assuming you can still reproduce, would it be possible for you to capture a performance profile (using the add-on at https://profiler.firefox.com/) for the same version of Nightly, once with the Firefox profile that exhibits the problem and once with the fresh profile that doesn't?

Flags: needinfo?(yoasif)

Bob, on a hunch I decided to disable my Google container add-on in the same browser profile where I originally saw the issue: https://addons.mozilla.org/en-US/firefox/addon/google-container/

Performance was immediately back to normal as soon as I opened Maps outside of the container.

Is this a known issue with containers? Can you reproduce if you open Maps using the container add-on? FWIW, I tried a random container (not from the add-on) and it was a lot better, so it may be something with this container specifically?

Flags: needinfo?(yoasif) → needinfo?(bobowencode)

Update -- tried the Google Container in a fresh profile. Didn't see issues.

I'll follow up with profiles.

Flags: needinfo?(bobowencode)

Bob, I did some more digging. I found that if I disabled all add-ons except for First Party Isolation https://addons.mozilla.org/en-US/firefox/addon/first-party-isolation/ and the aforementioned Google Container https://addons.mozilla.org/en-US/firefox/addon/google-container/ I was able to reproduce the issue.

I was also unable to capture any profiles using the Firefox profiler due to a storage issue.

I visited https://firefox-storage-test.glitch.me/ and it reported storage as good, but since reproducing required FPI and the container, I suspected that this was happening due to local storage.

If I clear the local storage for google.com, I am unable to reproduce the issue, even with the add-ons above installed.

So it is starting to look like an IndexedDB issue, and the add-ons above are probably a red herring, since disabling the container or FPI just switches to a different set of local storage.

What is the best way to diagnose this? I have a backup of the profile (before I cleared local storage) -- I can try to strip out personal data and share the profile (ensuring that it continues to reproduce) if you like -- but unfortunately, I don't see a way to capture a profile using the Firefox profiler with the broken local storage on google.com.

Open to any ideas, thanks!

Flags: needinfo?(bobowencode)

(In reply to Asif Youssuff from comment #10)
...

So it is starting to look like an IndexedDB issue, and the add-ons above are probably a red herring, since disabling the container or FPI just switches to a different set of local storage.

Thanks for your perseverance with this.
This makes me wonder why bug 1464032 came up as the regression for this; it's possible there are two issues here.
The patches in bug 1464032 did cause a performance regression (bug 1559437), which was fixed in buildid 20190619214046.
It's possible that while you were bisecting you hit this unrelated performance regression, which was in nightly for 12 days (bug 1464032 landed in build id 20190607220156).

So, to try to prove this theory, could you:

  • Without specifying a profile (so mozregression creates a clean one), test with the following to see whether you see the issue and whether it was fixed:
      ◦ mozregression --launch 20190607220156
      ◦ mozregression --launch 20190619214046
  • Take another copy of the profile that has the problem in current Nightly (by default mozregression will clone the profile given, but better to be safe) and run current Nightly, the build before bug 1464032 landed, and the build after bug 1559437 was fixed, so:
      ◦ mozregression --launch 20190724220024 --profile <path-to-profile-dir>
      ◦ mozregression --launch 20190607095118 --profile <path-to-profile-dir>
      ◦ mozregression --launch 20190619214046 --profile <path-to-profile-dir>

Build 20190724220024 (current Nightly) should hopefully still have the issue with this profile.
If 20190619214046 (after bug 1559437 fix) is OK then bisect with this as the good build and 20190724220024 as the bad one.
If 20190619214046 is bad and 20190607095118 is good, the only thing I can suggest is checking whether you can see a further regression between the build after bug 1464032 landed and the build before bug 1559437 was fixed, and then bisecting using those, so:
    ◦ mozregression --launch 20190607220156 --profile <path-to-profile-dir>
    ◦ mozregression --launch 20190618214619 --profile <path-to-profile-dir>

Hope that all makes sense.

What is the best way to diagnose this? I have a backup of the profile (before I cleared local storage) -- I can try to strip out personal data and share the profile (ensuring that it continues to reproduce) if you like -- but unfortunately, I don't see a way to capture a profile using the Firefox profiler with the broken local storage on google.com.

You could encrypt the profile and send it via a different means if you like and email me the password.
However, reproducing performance issues on different machines can be troublesome.

Flags: needinfo?(bobowencode)
Flags: needinfo?(yoasif)

Bob,

Thanks for the detailed instructions.

mozregression --launch 20190607220156
mozregression --launch 20190619214046

Are both good.

When launched with a profile, every build you mentioned except 20190607095118 is bad. 20190607095118 is not great but it is better than the rest of the builds, and isn't as annoying to use.

mozregression --good 20190607095118 --bad 20190618214619

with my profile shows:

12:55.21 INFO: No more inbound revisions, bisection finished.
12:55.21 INFO: Last good revision: 9909cd207cc21bf0de639e211fcfe950690f31a5
12:55.21 INFO: First bad revision: 2195b79ea888b1e49406406f528870ca9cab1525
12:55.21 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9909cd207cc21bf0de639e211fcfe950690f31a5&tochange=2195b79ea888b1e49406406f528870ca9cab1525

I went ahead and uploaded my profile to Firefox Send at: https://send.firefox.com/download/646c99adb27e361e/#ug5qOGqXJew3mwU6PUwWPQ

and sent you the password via your email.

Flags: needinfo?(yoasif)
Priority: -- → P3

(In reply to Asif Youssuff from comment #12)
...

mozregression --launch 20190607220156
mozregression --launch 20190619214046

Are both good.

That seems to disprove the theory about hitting the other regression.

...

I went ahead and uploaded my profile to Firefox Send at: https://send.firefox.com/download/646c99adb27e361e/#ug5qOGqXJew3mwU6PUwWPQ

Unfortunately, I can't determine a difference with that profile before or after the change landed.
I've tried running on hardware as well as a VM.
Although, the profiler is also broken for me with that profile, so there's clearly some sort of issue with it.
It's particularly confusing that you see a regression for those changesets, but only with this profile.

I'm running out of ideas, but maybe we can bisect down to the individual changeset.
I've pushed a build to try for the middle one of those changesets.
You should be able to run and test it with:
mozregression --launch 62b6af6b9b70 --repo try --profile <path-to-profile-with-issue>

If you can run it, I'll push builds for the changesets before or after, depending on whether you see the regression or not.

Flags: needinfo?(yoasif)

mozregression --launch 62b6af6b9b70 --repo try --profile <path-to-profile-with-issue>

This build doesn't load any web pages. I tried opening a new window and new tabs with different pages, and it never loads any pages.

Although, the profiler is also broken for me with that profile, so there's clearly some sort of issue with it.

Maybe it'd be better to try to figure out why my localstorage is messed up and how it got that way (or how to heal itself so it doesn't do this)? Just a thought.

Flags: needinfo?(yoasif)

(In reply to Asif Youssuff from comment #14)

mozregression --launch 62b6af6b9b70 --repo try --profile <path-to-profile-with-issue>

This build doesn't load any web pages. I tried opening a new window and new tabs with different pages, and it never loads any pages.

Oh, then I'm out of ideas. The build works for me in a VM and on hardware (both running Ubuntu).
jgilbert - any ideas how we could investigate this further?
One thing I'm still not sure about is whether Google Maps uses WebGL when available.
So I guess the possibilities are: it does, and my changes cause this particular profile to fall back to 2D canvas; or it doesn't, and there is a Canvas 2D regression from my changes.

Although, the profiler is also broken for me with that profile, so there's clearly some sort of issue with it.

Maybe it'd be better to try to figure out why my localstorage is messed up and how it got that way (or how to heal itself so it doesn't do this)? Just a thought.

Possibly, but I don't know enough about it to investigate or help.
mak - do you know how we could investigate the apparent local storage problem with this profile? Comment 10 has the most detail on this I think.

Flags: needinfo?(mak77)
Flags: needinfo?(jgilbert)

I suspect I'm not the right person for DOM Storage or IndexedDB, so I'm forwarding to Jan. Alternatively, I think Baku or Asuth may also be able to help.

Flags: needinfo?(mak77) → needinfo?(jvarga)

Since Firefox Send removes files after a while, I re-uploaded my profile to https://drive.google.com/open?id=1Aw2R3X9_Tmk2Wm2-9ZmpzpUBWVJpukEq

Hope this helps!

OK, I think I've finally managed to reproduce this.
I've installed Ubuntu eoan, and with your stripped-down profile I can see a performance problem after the canvas 2D remoting changes landed.
It doesn't appear to be down to (or just down to) any prefs, because if I copy them to a fresh profile, I don't see the issue.
It does go away (or at least is much better) when WebRender is disabled.
Disabling the Google container and opening in a different container doesn't seem to fix it for me.

I've bisected the problem down to this changeset:
https://hg.mozilla.org/mozilla-central/rev/4357d695b8d5

The profiler is still broken for me with this profile, so unfortunately I can't use that to try and find the problem.
So, it would still be useful to diagnose the storage issue, if there is one.

I'll look through that patch to see if I can spot anything.

I probably should have mentioned that I am running Ubuntu eoan as well. Thanks for digging in!

I did find it went away when I tried a different container (before, I'd used no container).
It also seems to be linked to the First Party Isolation addon, because when that is disabled it goes away as well.
This still isn't the whole story, because trying all of these things in a fresh profile still isn't enough to trigger the issue.
There must be something else as well that causes the Google Maps code to go down a different route.

Anyway, I think I've found the problem.
After looking at the patch it seemed the issue could only be in the CanvasRenderingContext2D or PersistentBufferProvider changes.
So, with some logging added, I could see that in the non-working case it was continually calling GetSurfaceSnapshot, but it was getting the snapshot from the front buffer, which was not what I was expecting. This is because mBufferProvider exists, but mTarget does not.

The similar changes I made to GetImageBuffer and GetImageDataArray didn't change the logic, because even though they both previously checked mTarget first, if mTarget is non-null (and not sErrorTarget) then mBufferProvider must also be non-null.
However, I copied this logic when I moved GetSurfaceSnapshot into the .cpp file; before, it always called EnsureTarget, whereas now it doesn't if mBufferProvider is non-null.
If I remove that check and always call EnsureTarget in GetSurfaceSnapshot, the problem goes away.
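
For illustration, here is a minimal standalone C++ sketch of the logic described above. These are not the real Gecko classes or the actual patch; the names mTarget, mBufferProvider and EnsureTarget are borrowed from this comment, and everything else is a stand-in.

```cpp
// A minimal, standalone model of the logic described above -- not the real
// Gecko classes and not the actual patch; mTarget, mBufferProvider and
// EnsureTarget are borrowed from the comment, everything else is a stand-in.
#include <iostream>
#include <memory>
#include <string>

struct DrawTarget {
  std::string name;
};

struct BufferProvider {
  // Borrowing from the front buffer is the slow path discussed above.
  std::string BorrowSnapshot() const { return "snapshot(front buffer)"; }
};

class Canvas2DModel {
 public:
  // Regressed logic: when a buffer provider exists but there is no target,
  // take the snapshot from the front buffer without calling EnsureTarget.
  std::string GetSurfaceSnapshot_Regressed() {
    if (!mTarget && mBufferProvider) {
      return mBufferProvider->BorrowSnapshot();
    }
    EnsureTarget();
    return "snapshot(" + mTarget->name + ")";
  }

  // Fixed logic: always ensure the (back-buffer) target first, as the patch does.
  std::string GetSurfaceSnapshot_Fixed() {
    EnsureTarget();
    return "snapshot(" + mTarget->name + ")";
  }

 private:
  void EnsureTarget() {
    if (!mTarget) {
      mTarget = std::make_shared<DrawTarget>(DrawTarget{"back buffer"});
    }
  }

  std::shared_ptr<DrawTarget> mTarget;  // null until EnsureTarget() runs
  std::shared_ptr<BufferProvider> mBufferProvider =
      std::make_shared<BufferProvider>();
};

int main() {
  Canvas2DModel canvas;
  std::cout << canvas.GetSurfaceSnapshot_Regressed() << "\n";  // front buffer
  std::cout << canvas.GetSurfaceSnapshot_Fixed() << "\n";      // back buffer
}
```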

Flags: needinfo?(jvarga)
Flags: needinfo?(jgilbert)

Hopefully, the build from the try push will work for you and fix the issue:
https://queue.taskcluster.net/v1/task/dOFzGYhCRCOGt-n81CBZkw/runs/0/artifacts/public/build/target.tar.bz2

Flags: needinfo?(yoasif)
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/024f04feae0b
Always call EnsureTarget in CanvasRenderingContext2D::GetSurfaceSnapshot. r=jrmuizel

Bob, the build you posted works a lot better than the latest nightly. I would consider it fixed.

Flags: needinfo?(yoasif)
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Assignee: nobody → bobowencode

Bob, do you think it is worth spawning a new bug for the storage issue that caused us to not be able to capture a profile?

Flags: needinfo?(bobowencode)

(In reply to Asif Youssuff from comment #27)

Bob, do you think it is worth spawning a new bug for the storage issue that caused us to not be able to capture a profile?

Thanks for testing.
I'm not sure that it was a storage issue; I think it was because of the First Party Isolation addon coupled with other things. Unfortunately, I still haven't managed to reproduce the issue starting from a clean profile.
The profile-capturing issue does appear to be just the First Party Isolation addon.
If I disable that just before I capture the profile, it works, so I'm going to look at the profiles to see if I can spot anything.

Flags: needinfo?(bobowencode)

This is a profile with the fix:
https://perfht.ml/2KpAyCB

This is a profile with the performance issue:
https://perfht.ml/2KnG4Wm

The problem seems to be that because we borrow the snapshot from the front buffer, we read-lock it, and during the unlock DetachAllSnapshots is called [1], which causes the snapshot to get copied.
Bizarrely, the profile seems to say all the time is being taken in _nss_group_lookup in libc (I guess during a memcpy or something).

When EnsureTarget is always called, a new back buffer is used and write-locked, and it will only get unlocked at the end of the frame.
Presumably other references to the snapshot have gone by then, or I would have thought we'd see the same issue.

I don't fully understand the rules around SourceSurfaces, but I wonder whether we need to call DetachAllSnapshots in the read-lock case at all. (I've tried removing it and it fixes the performance issue, but I'm not sure if it's valid.)
I also wonder whether we could avoid calling DetachAllSnapshots on Unlock and instead do it on the write lock, when we are actually going to change the DrawTarget.
At that point there would be more chance that any references to snapshots had been dropped, and so DetachAllSnapshots would be cheap.

[1] https://searchfox.org/mozilla-central/rev/9775cca0a10a9b5c5f4e15c8f7b3eff5bf91bbd0/gfx/layers/client/TextureClient.cpp#705
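
To make the locking behaviour concrete, here is a toy, self-contained model of the effect described above. It is an assumed simplification, not the real TextureClient/DetachAllSnapshots code: the point is just that releasing a read lock while a snapshot is still alive forces a copy every frame, whereas a write-locked back buffer that is only unlocked at the end of the frame does not.

```cpp
// A toy, self-contained model of the locking behaviour described above -- not
// the real TextureClient / DetachAllSnapshots code. It only illustrates the
// cost pattern: unlocking while a snapshot is still referenced forces a copy.
#include <iostream>
#include <memory>

struct Snapshot {
  bool detached = false;  // true once it owns its own copy of the pixels
};

class TextureModel {
 public:
  // Hand out a snapshot that aliases this texture's contents.
  std::shared_ptr<Snapshot> BorrowSnapshot() {
    mSnapshot = std::make_shared<Snapshot>();
    return mSnapshot;
  }

  // Stands in for Unlock() calling DetachAllSnapshots(): any snapshot that is
  // still referenced elsewhere must stop aliasing the texture, i.e. be copied.
  void Unlock() {
    if (mSnapshot && mSnapshot.use_count() > 1) {
      mSnapshot->detached = true;
      ++mCopies;
    }
    mSnapshot.reset();
  }

  int mCopies = 0;

 private:
  std::shared_ptr<Snapshot> mSnapshot;
};

int main() {
  // Front-buffer read-lock path: the caller still holds the snapshot when the
  // lock is released, so every frame pays for a detach/copy.
  TextureModel front;
  for (int frame = 0; frame < 3; ++frame) {
    auto snap = front.BorrowSnapshot();
    front.Unlock();  // snapshot still alive -> copy
  }
  std::cout << "front-buffer copies: " << front.mCopies << "\n";  // 3

  // Back-buffer write-lock path: the buffer is only unlocked at the end of the
  // frame, by which time the snapshot reference has already been dropped.
  TextureModel back;
  for (int frame = 0; frame < 3; ++frame) {
    { auto snap = back.BorrowSnapshot(); }  // dropped before unlock
    back.Unlock();  // nothing left to detach
  }
  std::cout << "back-buffer copies: " << back.mCopies << "\n";  // 0
}
```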

Flags: needinfo?(jmuizelaar)
Component: Graphics: WebRender → Graphics
Summary: Slow navigation (WebGL?) on Google Maps Street View → Slow navigation on Google Maps Street View

Comment on attachment 9084048 [details]
Bug 1567054: Always call EnsureTarget in CanvasRenderingContext2D::GetSurfaceSnapshot. r=jrmuizel!

Beta/Release Uplift Approval Request

  • User impact if declined: Performance issue will remain.
    It's unclear how many users this might affect.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This effectively reverts to previous behaviour for CanvasRenderingContext2D::GetSurfaceSnapshot.
  • String changes made/needed: None
Attachment #9084048 - Flags: approval-mozilla-beta?

Comment on attachment 9084048 [details]
Bug 1567054: Always call EnsureTarget in CanvasRenderingContext2D::GetSurfaceSnapshot. r=jrmuizel!

Fixes a regression in 69 causing degraded performance on Google Maps. Approved for 69.0b13.

Attachment #9084048 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(jmuizelaar)