Open Bug 1639280 Opened 4 years ago Updated 2 years ago

Unacceptable performance and memory consumption on Apart Posters VR experience.

Categories

(Core :: Graphics: CanvasWebGL, defect, P2)

ARM64
Android
defect

Tracking

()

Performance Impact low

People

(Reporter: rbarker, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: perf:responsiveness, Whiteboard: [fxr:p1][geckoview:p1])

Attachments

(11 files, 1 obsolete file)

STR:

  1. Visit https://show.apartposters.com/C4HMrDZ/showroom-3 in any GeckoView-based browser (Fenix, GVE, FxR)
  2. Enter the room

Actual:
Browser will either crash or hang. If the room is entered, performance is very poor.

Expected:
Browser is able to enter the room and performance is good.

Notes:
This same page works without issue in Chrome-based Android browsers. On a Pixel 2 the frame rate is a fixed 60Hz in Chrome, while on a GV-based browser, if you are able to enter the room at all, the frame rate is 30Hz. On a Pixel 1 the gap is even greater.

From examining the logs and viewing in a profiler, Gecko seems to consume an excessive amount of memory, which causes out-of-memory errors and web content, media process, and main process crashes. The Android Studio profiler shows the web content process consuming over 2GB of graphics memory before a crash occurs.

Whiteboard: [fxr:p1]
Component: General → Performance
Product: GeckoView → Core
Whiteboard: [fxr:p1] → [fxr:p1][geckoview:p1]
Whiteboard: [fxr:p1][geckoview:p1] → [fxr:p1][geckoview:p1][fenix:p1]
Whiteboard: [fxr:p1][geckoview:p1][fenix:p1] → [fxr:p1][geckoview:p1][fenix:p1][qf]

I looked into this a little bit on macOS. I consistently get a 1GB memory increase when loading the page, and a corresponding 1GB decrease when navigating away from it.
I've been navigating between https://show.apartposters.com/#/ and https://show.apartposters.com/C4HMrDZ/showroom-3 by clicking the back/forward arrows, when reproducing this.

Profile: https://share.firefox.dev/3dtPWKj

It's not entirely clear to me what type of memory is being allocated here. It's mysterious to about:memory, too: The "explicit" allocations only increase by 360MB (16MB -> 383MB), but the "resident" number increases by 1.15GB (51MB -> 1227MB). The other number that increases is shmem-mapped, from 0 to 876.36 MB. But then when I navigate away again, shmem-mapped stays high, only decreasing by 60MB to 818.22 MB. But resident decreases to 164MB.

In any case, the profile gives some insight into what's happening: First we decode JPEG data, then WebGL converts it for texture upload, and then the GL driver converts it again during texture upload. And at least one of those three pieces seems to go into shared memory somehow, and what the driver does goes into mapped memory that is shared with the GPU but still accounted for in our resident size.

So this bug falls squarely within WebGL land, with maybe some contribution from image decoding.

Component: Performance → Canvas: WebGL
Version: unspecified → Trunk
Depends on: 1642775

Looking at the network requests, there's only one JPEG image in the list: https://hubs-5-assets.apart-internal.com/hubs/assets/waternormals-4418dde3f6abc21dc32506acf5f5b093.jpg
It's a 1024x1024 image. To allocate 1GB of image data, you'd need a 16k x 16k image.
I wonder if we're decoding the same image over and over.
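
The size estimate above can be checked with a little arithmetic. This is a minimal sketch that assumes decoded pixels take 4 bytes each (for example RGBA); the helper name is hypothetical and is not Gecko code.

```python
MIB = 1024 * 1024


def decoded_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of a decoded image, assuming 4 bytes per pixel (an assumption)."""
    return width * height * bytes_per_pixel


small = decoded_bytes(1024, 1024)    # the 1024x1024 JPEG from the network list
large = decoded_bytes(16384, 16384)  # the "16k x 16k" image needed to reach 1GB

print(small // MIB)    # 4
print(large // MIB)    # 1024
print(large // small)  # 256
```

Under that assumption the single 1024x1024 image accounts for about 4 MiB, so it would have to be decoded and kept alive roughly 256 times to explain a 1GB increase on its own.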

Oh, image decoding is called from CreateImageBitmapFromBlob, so the JPEG image data presumably comes from some JS buffer, possibly extracted from the 40MB binary file at https://hubs-5-assets.apart-internal.com/files/a5ba573c-ce81-43a1-9060-253a095fccbf.bin .

I believe the page is using a glTF which contains the texture data.

Hey jgilbert, could you put a priority on this one?

Flags: needinfo?(jgilbert)
Whiteboard: [fxr:p1][geckoview:p1][fenix:p1][qf] → [fxr:p1][geckoview:p1][fenix:p1][qf:p3]
Whiteboard: [fxr:p1][geckoview:p1][fenix:p1][qf:p3] → [fxr:p1][geckoview:p1][fenix:p1][qf:p3:responsiveness]
Severity: -- → S2
Flags: needinfo?(jgilbert)
Priority: -- → P1

It would be great to narrow down where these seemingly-extra allocations are coming from. (untracked mallocs in gecko? driver shmems? gpu vram mirroring?)

Hey Sotaro - could you spend some time next week taking a look at this to help narrow down what might be happening?

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Jessie [:jbonisteel] pls NI from comment #13)

Hey Sotaro - could you spend some time next week taking a look at this to help narrow down what might be happening?

OK, I confirmed that the GeckoView example app and Firefox Preview were killed by lowmemorykiller while visiting https://show.apartposters.com/C4HMrDZ/showroom-3

Thanks Sotaro - is there any more information you are able to determine to understand why that is happening?

With attachment 9157610, I looked into resource usage around WebGLTextureUpload.cpp on a Linux PC. The page loaded many ImageBitmaps and uploaded two 4K videos (3840x1920). The videos were huge. They were continuously uploaded to GL textures even when they were not rendered, which seems to consume a lot of memory. I am going to look into this more tomorrow.
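
For a sense of scale, the cost of one uncompressed copy of each video frame can be estimated from the 3840 by 1920 frame size mentioned above. This is a rough sketch that assumes 4 bytes per pixel; the actual texture format is not stated in this bug, and the helper name is hypothetical.

```python
MIB = 1024 * 1024


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Bytes for one uncompressed frame (4 bytes per pixel is an assumption)."""
    return width * height * bytes_per_pixel


one_frame = frame_bytes(3840, 1920)
both_videos = 2 * one_frame  # one frame from each of the two videos

print(one_frame)          # 29491200
print(one_frame / MIB)    # 28.125
print(both_videos / MIB)  # 56.25
```

Every additional copy of a frame along the upload path adds the same amount again, so redundant copies of these frames add up quickly.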

Flags: needinfo?(sotaro.ikeda.g)

GeckoView does not handle OS memory pressure yet. See Bug 1454752. Supporting it might reduce memory usage.

Depends on: 1454752

When I tested the page on a Pixel 3a, I normally saw a crash while uploading ImageBitmaps to GL textures. 37 ImageBitmaps were uploaded while loading the page.

One difference from Chromium is AHardwareBuffer usage. Chromium has used it since Android Oreo. On Android, Gecko always uses a Shmem buffer for the texture buffer, so it needs an additional GL texture image for GL rendering. When AHardwareBuffer is used, the additional GL texture image is not needed, which could reduce memory usage. Bug 1562818 is for adding AHardwareBuffer support. But adding it is high risk, since it introduces an entirely new rendering and buffer allocation path.

This reproduces on Android 7 (N) devices, so I would guess AHardwareBuffer is not the issue?

(In reply to Randall Barker [:rbarker] from comment #21)

This reproduces on Android 7 (N) devices, so I would guess AHardwareBuffer is not the issue?

Because of Gecko's architecture, GeckoView uses more memory than Chromium. AHardwareBuffer could be one way to reduce memory usage, though it is not supported on Android 7.

Another problem is the way video frames are uploaded. Video frames use SurfaceTexture, which has a usage limitation. To work around the limitation, data access becomes very redundant (more memory and latency). This was added in Bug 1486659. The architecture is shown in the following diagram.

https://github.com/sotaroikeda/firefox-diagrams/blob/master/mobile/mobile_SurfaceAllocatorService_68.pdf

(In reply to Sotaro Ikeda [:sotaro] from comment #22)

(In reply to Randall Barker [:rbarker] from comment #21)

This reproduces on Android 7 (N) devices, so I would guess AHardwareBuffer is not the issue?

Because of Gecko's architecture, GeckoView uses more memory than Chromium. AHardwareBuffer could be one way to reduce memory usage, though it is not supported on Android 7.

Right, and yet Chrome is not only able to load this page on Android 7 without issue, it then gets more than double the frame rate of Gecko (when Gecko is actually able to load the page).

:jhlin, can you comment on comment 23?

Flags: needinfo?(jolin)

(In reply to Randall Barker [:rbarker] from comment #24)

Because of Gecko's architecture, GeckoView uses more memory than Chromium. AHardwareBuffer could be one way to reduce memory usage, though it is not supported on Android 7.

Right, and yet Chrome is not only able to load this page on Android 7 without issue, it then gets more than double the frame rate of Gecko (when Gecko is actually able to load the page).

On Android 7, single mode SurfaceTexture might be used to reduce memory usage. That is Bug 1413142, but it is not enabled yet because of several problems.

(In reply to Sotaro Ikeda [:sotaro] from comment #25)

:jhlin, can you comment on comment 23?

Yes, the output buffers from the decoder are sent to a SurfaceTexture allocated on the parent side, and in order to use it in the WebGL context in the child/content process, each drawImage() call makes a copy of the current video frame from the parent [1].

The scene has two 4K videos, and the log I added shows each takes 1-2ms to copy the texture on a Pixel 2:

06-26 01:20:50.917 14440 14473 D GeckoSurfaceTexture: sync() took 2ms
06-26 01:20:50.920 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
...
06-26 01:20:50.981 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
06-26 01:20:50.983 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
...
06-26 01:20:51.049 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
06-26 01:20:51.052 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
...
06-26 01:20:51.112 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
06-26 01:20:51.115 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
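
Log lines in this format can be summarized with a few lines of scripting. This is only a sketch: the regular expression is written against the lines shown above, the sample is the first four of those lines, and the function name is hypothetical.

```python
import re

# Matches the "sync() took Nms" messages in the excerpt above.
SYNC_RE = re.compile(r"GeckoSurfaceTexture: sync\(\) took (\d+)ms")


def sync_durations_ms(log_text: str) -> list[int]:
    """Return every sync() duration found in the log text, in milliseconds."""
    return [int(match.group(1)) for match in SYNC_RE.finditer(log_text)]


SAMPLE = """\
06-26 01:20:50.917 14440 14473 D GeckoSurfaceTexture: sync() took 2ms
06-26 01:20:50.920 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
06-26 01:20:50.981 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
06-26 01:20:50.983 14440 14473 D GeckoSurfaceTexture: sync() took 1ms
"""

durations = sync_durations_ms(SAMPLE)
print(len(durations), sum(durations), max(durations))  # 4 5 2
```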

And as Sotaro said, the additional memory usage is huge because of 4K contents. Unfortunately, I don't have a good solution to eliminate that and cannot find any document about how to use AHardwareBuffer in MediaCodec API.

I'm not familiar with Chrome code. It's possible that they don't do extra copy for WebGL but it's hard to tell if that is the reason why it displays the scene smoothly.

[1] https://searchfox.org/mozilla-central/source/gfx/gl/AndroidSurfaceTexture.cpp#55

Flags: needinfo?(jolin)

(In reply to John Lin [:jhlin][:jolin] from comment #27)

And as Sotaro said, the additional memory usage is huge because of 4K contents. Unfortunately, I don't have a good solution to eliminate that and cannot find any document about how to use AHardwareBuffer in MediaCodec API.

I'm not familiar with Chrome code. It's possible that they don't do extra copy for WebGL but it's hard to tell if that is the reason why it displays the scene smoothly.

We could get each video frame by using AImageReader. It is a wrapper of BufferItemConsumer.

Chromium uses it since Android P. Though, it seems possible to enable it since Android O.

ImageReaderGLOwner creates it.

https://phabricator.services.mozilla.com/D81479 roughly enabled AHardwareBuffer usage on Layer buffers and on WebGL SharedSurface. But an OOM crash happened while loading many ImageBitmaps on my Pixel 3a. From this, it seems necessary to reduce memory usage while loading the ImageBitmaps.

(In reply to Sotaro Ikeda [:sotaro] from comment #28)

We could get each video frame by using AImageReader. It is a wrapper of BufferItemConsumer.

Created Bug 1649110.

(In reply to Sotaro Ikeda [:sotaro] from comment #19)

When I tested the page on a Pixel 3a, I normally saw a crash while uploading ImageBitmaps to GL textures. 37 ImageBitmaps were uploaded while loading the page.

:jgilbert, do you have any ideas about how to reduce memory usage while loading many ImageBitmaps into GL textures for WebGL?

Flags: needinfo?(jgilbert)
See Also: → 1562818

(In reply to Sotaro Ikeda [:sotaro] from comment #28)

(In reply to John Lin [:jhlin][:jolin] from comment #27)

And as Sotaro said, the additional memory usage is huge because of 4K contents. Unfortunately, I don't have a good solution to eliminate that and cannot find any document about how to use AHardwareBuffer in MediaCodec API.

I'm not familiar with Chrome code. It's possible that they don't do extra copy for WebGL but it's hard to tell if that is the reason why it displays the scene smoothly.

We could get each video frame by using AImageReader. It is a wrapper of BufferItemConsumer.

Chromium uses it since Android P. Though, it seems possible to enable it since Android O.

ImageReaderGLOwner creates it.

Thanks a lot for the info! Maybe Chromium uses ImageReader only for Android P and later because Image::getHardwareBuffer() is available since that version.

The document does mention some use cases are not supported and MediaCodec is one of them. However, Android source code suggests that the HardwareBuffer is created using GraphicBuffer and should be compatible with MediaCodec.

(In reply to John Lin [:jhlin][:jolin] from comment #32)

The document does mention some use cases are not supported and MediaCodec is one of them.

:jhlin, can you provide a link to the document?

Flags: needinfo?(jolin)

(In reply to Sotaro Ikeda [:sotaro] from comment #33)

(In reply to John Lin [:jhlin][:jolin] from comment #32)

The document does mention some use cases are not supported and MediaCodec is one of them.

:jhlin, can you provide a link to the document?

Sorry for not pointing it out clearly. It's in the paragraph explaining the return value from [1]: ... null if this Image doesn't support this feature. (Unsupported use cases include Image instances obtained through MediaCodec, and...

[1] https://developer.android.com/reference/android/media/Image#getHardwareBuffer()

Flags: needinfo?(jolin)

(In reply to Sotaro Ikeda [:sotaro] from comment #31)

(In reply to Sotaro Ikeda [:sotaro] from comment #19)

When I tested the page on a Pixel 3a, I normally saw a crash while uploading ImageBitmaps to GL textures. 37 ImageBitmaps were uploaded while loading the page.

:jgilbert, do you have any ideas about how to reduce memory usage while loading many ImageBitmaps into GL textures for WebGL?

Maybe de-duplicating them would help? I bet we naively always make a copy.

We also don't seem to be freeing them aggressively enough. Maybe the GC doesn't realize how big they are, which is a common problem we've had in other areas: If the GC thinks the objects are small, it'll defer running a GC pass until there's likely to be more garbage. We should check that these large objects are known to be large by the GC/CC.
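
The de-duplication idea can be illustrated with a small content-addressed cache. This is a hypothetical sketch, not Gecko code: the class, the stand-in decode function, and the choice of SHA-256 as the key are all assumptions made for illustration.

```python
import hashlib
from typing import Callable


class DedupCache:
    """Decode each distinct encoded buffer once and share the result."""

    def __init__(self) -> None:
        self._by_digest: dict[bytes, bytearray] = {}
        self.decodes = 0

    def get(self, encoded: bytes, decode: Callable[[bytes], bytearray]) -> bytearray:
        key = hashlib.sha256(encoded).digest()
        if key not in self._by_digest:
            # Only the first request for this content pays for a decode.
            self._by_digest[key] = decode(encoded)
            self.decodes += 1
        return self._by_digest[key]


def fake_decode(data: bytes) -> bytearray:
    # Stand-in for a real decoder: pretend the output is 4x the input size.
    return bytearray(len(data) * 4)


cache = DedupCache()
first = cache.get(b"jpeg-bytes", fake_decode)
second = cache.get(b"jpeg-bytes", fake_decode)
print(first is second, cache.decodes)  # True 1
```

A cache like this keeps its entries alive until they are evicted, so a real version would also need an eviction policy. Otherwise it would trade duplicate copies for buffers that are never freed.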

Flags: needinfo?(jgilbert)

I tested what happens while loading the page with several configurations, listed below, on my Pixel 3a. Only [8] succeeded in loading the page. All other configurations failed to load the page: the application was killed by lowmemorykiller while loading ImageBitmaps. From this, "Use AHardwareBuffer for layer buffer" could reduce memory usage, and there were objects waiting to be cycle collected. Also, using AHardwareBuffer for the layer buffer with CompositorOGL uses less memory than WebRender.

-[1] Use WebRender + No AHardwareBuffer use
-[2] Use WebRender + No AHardwareBuffer use + Add calling nsJSContext::CycleCollectNow() in FromImageBitmap()
-[3] Use WebRender + Use AHardwareBuffer for WebGL SharedSurface
-[4] Use WebRender + Use AHardwareBuffer for WebGL SharedSurface in FromImageBitmap()
-[5] Use CompositorOGL + No AHardwareBuffer use
-[6] Use CompositorOGL + No AHardwareBuffer + Add calling nsJSContext::CycleCollectNow() in FromImageBitmap()
-[7] Use CompositorOGL + Use AHardwareBuffer for layer buffer
-[8] Use CompositorOGL + Use AHardwareBuffer for layer buffer + Add calling nsJSContext::CycleCollectNow() in FromImageBitmap()
-[9] Use CompositorOGL + Use AHardwareBuffer for WebGL SharedSurface
-[10] Use CompositorOGL + Use AHardwareBuffer for WebGL SharedSurface + Add calling nsJSContext::CycleCollectNow() in FromImageBitmap()
-[11] Use CompositorOGL + Use AHardwareBuffer for layer buffer + Use AHardwareBuffer for WebGL SharedSurface

:gw, is there a way to reduce memory usage with WebRender on Android?

Flags: needinfo?(gwatson)

There's nothing specific I'm aware of, but it's hard to say for sure without doing some detailed profiling of texture and render target allocations.

It's possible that this case might be causing WR to incorrectly allocate a heap of render targets that are retained in the pool - it might be worth logging what texture allocations the renderer thread is making in this test case.

Flags: needinfo?(gwatson)

Firefox Reality has WebRender disabled since it does not work yet. Most of the devices we support are running Android 7. So neither fixing WebRender nor using AHardwareBuffer (requires Android 8, API Level 26) will reduce memory usage in Firefox Reality.

Just noting down some other things we discussed today that we will try to get more clarity into what could help:

  • Try using DMD to see if that turns up anything
  • Add logging to capture total amount of Android buffers

The build failed with "ac_add_options --enable-dmd" on Android. Created Bug 1651079.

With attachment 9162060, in the majority of cases the RasterImages of the ImageBitmaps were still alive while the ImageBitmaps were being uploaded to WebGL textures. In this case, SurfaceCache held 777139116 bytes during the upload. But there was a case where the RasterImages were destroyed before the WebGL upload. In that case, SurfaceCache held 22700824 bytes during the upload, and uploading the ImageBitmaps to WebGL textures succeeded.
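
To put the two SurfaceCache figures above on a readable scale, the sketch below converts them to MiB and compares them. The variable names are made up for illustration; the byte counts are the ones reported in this comment.

```python
MIB = 1024 * 1024

held_with_raster_images = 777_139_116    # RasterImages still alive during upload
held_without_raster_images = 22_700_824  # RasterImages destroyed before upload

with_mib = round(held_with_raster_images / MIB, 1)
without_mib = round(held_without_raster_images / MIB, 1)
ratio = round(held_with_raster_images / held_without_raster_images, 1)

print(with_mib)     # 741.1
print(without_mib)  # 21.6
print(ratio)        # 34.2
```

That is a difference of roughly 720 MiB held by SurfaceCache at the moment of upload.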

From this, we want to destroy the RasterImages explicitly before the WebGL texture upload. I wonder if it might be related to ImageDecoderHelper::~ImageDecoderHelper(). It just posts the Image to the main thread.

:aosmond, is it possible to destroy RasterImage soon after its usage in CreateImageBitmapFromBlob::OnImageReady()? We want to ensure that the RasterImages are destroyed when the ImageBitmaps are used to reduce peak memory use.

Flags: needinfo?(aosmond)

The following is the call stack when inserting into SurfaceCache.


-> image::SurfaceCacheImpl::StartTracking()
-> image::SurfaceCacheImpl::Insert()
-> image::SurfaceCache::Insert()
-> image::DecoderFactory::CreateDecoder()
-> image::RasterImage::Decode()
-> image::RasterImage::LookupFrame()
-> image::RasterImage::GetFrameInternal()
-> image::RasterImage::GetFrameAtSize()
-> image::RasterImage::GetFrame(s)
-> dom::CreateImageBitmapFromBlob::OnImageReady()
-> ImageDecoderHelper::Run()
-> SchedulerGroup::Runnable::Run()
-> RunnableTask::Run()

See Also: → 1651079
Depends on: 1651587

(In reply to Sotaro Ikeda [:sotaro PTO July 13th-17th] from comment #50)

:aosmond, is it possible to destroy RasterImage soon after its usage in CreateImageBitmapFromBlob::OnImageReady()? We want to ensure that the RasterImages are destroyed when the ImageBitmaps are used to reduce peak memory use.

Created Bug 1651587 for the above.

With the patch from Bug 1651587, the GeckoView example app was not killed while loading ImageBitmaps, though the app was sometimes killed while loading videos.

Depends on: 1654169
Flags: needinfo?(aosmond)
Depends on: 1654459

I was surprised that SharedSurface_SurfaceTexture is not used by default in the content process. See bug 1654459.

Whiteboard: [fxr:p1][geckoview:p1][fenix:p1][qf:p3:responsiveness] → [fxr:p1][geckoview:p1][qf:p3:responsiveness]
Severity: S2 → S3
Priority: P1 → P2
Performance Impact: --- → P3
Whiteboard: [fxr:p1][geckoview:p1][qf:p3:responsiveness] → [fxr:p1][geckoview:p1]