Open Bug 1486454 Opened 7 years ago Updated 4 months ago

Uploading ImageBitmap to texture with gl.texImage2D() is slow

Categories

(Core :: Graphics: ImageLib, defect, P3)

63 Branch
defect

Tracking

()

Tracking Status
firefox61 --- affected
firefox62 --- affected
firefox63 --- affected

People

(Reporter: hogehoge, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: webcompat:platform-bug, Whiteboard: [gfx-noted])

User Story

webcompat:blocked-resources
user-impact-score:0

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36

Steps to reproduce:

I made an ImageBitmap performance test to check the texture upload speed with gl.texImage2D().

Test on Three.js: https://github.com/mrdoob/three.js/issues/11746#issuecomment-415461034

Actual results:

I confirmed that uploading an ImageBitmap to a texture with gl.texImage2D() isn't faster than uploading a regular Image on my Windows + Firefox Nightly setup. (I haven't tried other platforms, like Mac, yet.)

UA: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

Expected results:

In theory, uploading an ImageBitmap should be faster than uploading a regular Image because the ImageBitmap is decoded ahead of time. On Chrome, it's 3x faster.
More detail about the test. The links are

https://rawgit.com/takahirox/three.js/ImageBitmapTest/examples/webgl_texture_upload.html (Regular Image)
https://rawgit.com/takahirox/three.js/ImageBitmapTest/examples/webgl_texture_upload.html?imagebitmap (ImageBitmap)

The texture is changed (uploaded) every 5 seconds and you can see the blocking time on them. On my Windows machine I see

8192x4096 JPG 4.4MB
- Firefox Image: 500ms
- Firefox ImageBitmap: 500ms
- Chrome Image: 500ms
- Chrome ImageBitmap: 165ms

2048x2048 PNG 4.5MB
- Firefox Image: 40ms
- Firefox ImageBitmap: 60ms
- Chrome Image: 140ms
- Chrome ImageBitmap: 35ms
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID: 20180830111745

I managed to reproduce this issue on Windows 10 x64 with Firefox Nightly 63.0a1 (2018-08-30) (64-bit).
Status: UNCONFIRMED → NEW
Component: Untriaged → Canvas: 2D
Ever confirmed: true
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → All
Reviewing how Image and ImageBitmap work, the results are in line with what I would expect. The main practical difference in your example is that using the former causes a sync decode from createImageBitmap in the main-thread context, whereas the latter decodes on the image decoder threads and returns asynchronously. That is probably why it sometimes takes longer with the latter -- the context switching has some cost associated with it. That said, I'm not sure why Chrome does so much better, given that most of the work should be the decoding itself, which isn't fundamentally different.
Priority: -- → P3
Whiteboard: [gfx-noted]

:takahirox.

There are some items that need to be figured out. Before uploading an ImageBitmapData with WebGLContext::texImage2D(), we do the following:

  1. THREE.ImageBitmapLoader.Load -> Generate an ImageBlob.
  2. createImageBitmap -> get the ImageBitmapData.
  3. WebGLContext::texImage2D() -> Upload this ImageBitmapData to WebGL texture.

I saw that this profiling number includes all three steps. However, only WebGLContext::texImage2D is related to the WebGL context; the other two are more likely related to the image decoder. I would expect that we can figure out which part is our main bottleneck.
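To separate the three steps above, each stage could be timed on its own. The following is only a sketch under stated assumptions: `profileUpload` and the `deps` object are hypothetical names, not part of the test page or three.js, and the defaults assume a browser that provides fetch(), createImageBitmap() and performance.now(). The upload figure is the main-thread time spent inside texImage2D(); it does not include GPU work that completes later.

```javascript
// Hypothetical helper: times load, decode and upload separately.
// `deps` allows injecting fetchBlob / createBitmap / now (for example in tests);
// the defaults rely on browser globals.
async function profileUpload(gl, url, deps = {}) {
  const now = deps.now ?? (() => performance.now());
  const fetchBlob = deps.fetchBlob ?? (async (u) => (await fetch(u)).blob());
  const createBitmap = deps.createBitmap ?? ((blob) => createImageBitmap(blob));

  const t0 = now();
  const blob = await fetchBlob(url);        // step 1: network -> Blob
  const t1 = now();
  const bitmap = await createBitmap(blob);  // step 2: Blob -> ImageBitmap (decode)
  const t2 = now();
  // Step 3: ImageBitmap -> WebGL texture.
  // Assumes a texture is already bound to TEXTURE_2D.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);
  const t3 = now();

  return { loadMs: t1 - t0, decodeMs: t2 - t1, uploadMs: t3 - t2 };
}
```

If `uploadMs` dominates, the cost is in the WebGL path; if `decodeMs` dominates, it is in the image decoder.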

Flags: needinfo?(hogehoge)

ImageBitmapData and HTMLImageElement take different code paths. ImageBitmapData uses an async approach, and HTMLImageElement uses a sync one. In the general case, the async approach should be faster. I need to dig deeper to confirm whether they use the same image decoder.

ImageBitmapData

xul.dll!mozilla::dom::CreateImageBitmapFromBlob::Create(mozilla::dom::Promise * aPromise, nsIGlobalObject * aGlobal, mozilla::dom::Blob & aBlob, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect, nsIEventTarget * aMainThreadEventTarget) Line 1484 C++
xul.dll!mozilla::dom::AsyncCreateImageBitmapFromBlob(mozilla::dom::Promise * aPromise, nsIGlobalObject * aGlobal, mozilla::dom::Blob & aBlob, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect) Line 1267 C++
xul.dll!mozilla::dom::ImageBitmap::Create(nsIGlobalObject * aGlobal, const mozilla::dom::HTMLImageElementOrSVGImageElementOrHTMLCanvasElementOrHTMLVideoElementOrImageBitmapOrBlobOrCanvasRenderingContext2DOrImageData & aSrc, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect, mozilla::ErrorResult & aRv) Line 1335 C++
xul.dll!nsGlobalWindowInner::CreateImageBitmap(JSContext * aCx, const mozilla::dom::HTMLImageElementOrSVGImageElementOrHTMLCanvasElementOrHTMLVideoElementOrImageBitmapOrBlobOrCanvasRenderingContext2DOrImageData & aImage, mozilla::ErrorResult & aRv) Line 7097 C++

HTMLImageElement

xul.dll!imgLoader::LoadImage(nsIURI * aURI, nsIURI * aInitialDocumentURI, nsIURI * aReferrerURI, mozilla::net::ReferrerPolicy aReferrerPolicy, nsIPrincipal * aTriggeringPrincipal, unsigned __int64 aRequestContextID, nsILoadGroup * aLoadGroup, imgINotificationObserver * aObserver, nsINode * aContext, mozilla::dom::Document * aLoadingDocument, unsigned int aLoadFlags, nsISupports * aCacheKey, unsigned int aContentPolicyType, const nsTSubstring<char16_t> & initiatorType, bool aUseUrgentStartForChannel, imgRequestProxy * * _retval) Line 2060 C++
xul.dll!nsContentUtils::LoadImage(nsIURI * aURI, nsINode * aContext, mozilla::dom::Document * aLoadingDocument, nsIPrincipal * aLoadingPrincipal, unsigned __int64 aRequestContextID, nsIURI * aReferrer, mozilla::net::ReferrerPolicy aReferrerPolicy, imgINotificationObserver * aObserver, int aLoadFlags, const nsTSubstring<char16_t> & initiatorType, imgRequestProxy * * aRequest, unsigned int aContentPolicyType, bool aUseUrgentStartForChannel) Line 3439 C++
xul.dll!nsImageLoadingContent::LoadImage(nsIURI * aNewURI, bool aForce, bool aNotify, nsImageLoadingContent::ImageLoadType aImageLoadType, bool aLoadStart, mozilla::dom::Document * aDocument, unsigned int aLoadFlags, nsIPrincipal * aTriggeringPrincipal) Line 986 C++
xul.dll!nsImageLoadingContent::LoadImage(const nsTSubstring<char16_t> & aNewURI, bool aForce, bool aNotify, nsImageLoadingContent::ImageLoadType aImageLoadType, nsIPrincipal * aTriggeringPrincipal) Line 868 C++
xul.dll!mozilla::dom::HTMLImageElement::AfterMaybeChangeAttr(int aNamespaceID, nsAtom * aName, const nsAttrValueOrString & aValue, const nsAttrValue * aOldValue, nsIPrincipal * aMaybeScriptedPrincipal, bool aValueMaybeChanged, bool aNotify) Line 405 C++
xul.dll!mozilla::dom::HTMLImageElement::AfterSetAttr(int aNameSpaceID, nsAtom * aName, const nsAttrValue * aValue, const nsAttrValue * aOldValue, nsIPrincipal * aMaybeScriptedPrincipal, bool aNotify) Line 287 C++

I am trying to load the ImageBitmap synchronously in my patch. Performance improves a little, but not noticeably. This is probably because the patch is not fully synchronous: CreateImageBitmapFromBlob::DecodeAndCropBlob() still loads its MIME type and the bitmap data asynchronously.

Assignee: nobody → dmu
Assignee: dmu → nobody

Providing new profile from my Windows PC:


Firefox
Async
ImageBitmap
Texture uploading time [ms] 25.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 216.00 / 239.00 / 224.71 / 7
2048x2048 PNG 4.5MB 25.00 / 31.00 / 27.43 / 7

Firefox
Sync
ImageBitmap
Texture uploading time [ms] 214.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 213.00 / 238.00 / 217.89 / 9
2048x2048 PNG 4.5MB 24.00 / 32.00 / 26.13 / 8

Chrome
ImageBitmap
Texture uploading time [ms] 26.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 80.70 / 96.10 / 88.57 / 13
2048x2048 PNG 4.5MB 16.50 / 27.40 / 25.83 / 13

For the smaller ImageBitmap, the difference is not as obvious.
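For readers reproducing these tables, the "min / max / avg / count" columns can be gathered with a small accumulator. This is a sketch only: `UploadStats` is a hypothetical name and is not taken from the test page, which may compute its statistics differently.

```javascript
// Hypothetical accumulator for per-image upload timings in milliseconds.
class UploadStats {
  constructor() {
    this.min = Infinity;
    this.max = -Infinity;
    this.sum = 0;
    this.count = 0;
  }

  // Record one measured upload time.
  add(ms) {
    this.min = Math.min(this.min, ms);
    this.max = Math.max(this.max, ms);
    this.sum += ms;
    this.count += 1;
  }

  // Mean of all recorded samples (0 when nothing was recorded).
  get avg() {
    return this.count === 0 ? 0 : this.sum / this.count;
  }

  // Formats as "min / max / avg / count", matching the table layout.
  toString() {
    const cols = [this.min, this.max, this.avg].map((v) => v.toFixed(2));
    return `${cols.join(" / ")} / ${this.count}`;
  }
}
```

Comparing the min column rather than the average can help separate steady-state cost from occasional outliers, such as the 316.00 max seen in one of the later runs.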

> There are some items that need to be figured out. Before uploading an ImageBitmapData with WebGLContext::texImage2D(), we do the following:
>
>   1. THREE.ImageBitmapLoader.Load -> Generate an ImageBlob.
>   2. createImageBitmap -> get the ImageBitmapData.
>   3. WebGLContext::texImage2D() -> Upload this ImageBitmapData to WebGL texture.

In the example I count only the elapsed time of step 3. Search for "isTexture" in the code. The problem I want to fix here is the long main-thread blocking time, so I am looking only at step 3 (my understanding is that steps 1 and 2 are done asynchronously).

Flags: needinfo?(hogehoge)

(In reply to Takahiro Aoyagi (:takahirox) from comment #8)

> > There are some items that need to be figured out. Before uploading an ImageBitmapData with WebGLContext::texImage2D(), we do the following:
> >
> >   1. THREE.ImageBitmapLoader.Load -> Generate an ImageBlob.
> >   2. createImageBitmap -> get the ImageBitmapData.
> >   3. WebGLContext::texImage2D() -> Upload this ImageBitmapData to WebGL texture.
>
> In the example I count only the elapsed time of step 3. Search for "isTexture" in the code. The problem I want to fix here is the long main-thread blocking time, so I am looking only at step 3 (my understanding is that steps 1 and 2 are done asynchronously).

I think you are right. I will dig into what we do in WebGLContext::texImage2D() and then provide some data to help figure out the problem.

Although the format and internalFormat of texImage2D() [1] are RGB8, the SourceSurface from ImageBitmapData is decoded as BGRX8, so we have to do a conversion at [2] to turn the source image into RGB8. If we skip the conversion, the elapsed time decreases. Besides this, I don't see anything else unusual. Our implementation of texImage2D() is very straightforward: it just does verification and calls the ANGLE function.
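To illustrate what such a conversion costs, here is a minimal CPU sketch of a BGRX8 to RGB8 repack. It is not Gecko's code at [2]; `bgrxToRgb` is a hypothetical function, and it assumes tightly packed rows with no stride padding.

```javascript
// Illustrative only: repack 4-byte BGRX pixels into 3-byte RGB pixels.
// Assumes `src` is a Uint8Array holding width * height tightly packed pixels.
function bgrxToRgb(src, width, height) {
  const pixelCount = width * height;
  if (src.length < pixelCount * 4) {
    throw new RangeError("source buffer too small");
  }
  const dst = new Uint8Array(pixelCount * 3);
  for (let i = 0, s = 0, d = 0; i < pixelCount; i++, s += 4, d += 3) {
    dst[d] = src[s + 2];     // R comes from byte 2 of BGRX
    dst[d + 1] = src[s + 1]; // G stays in the middle
    dst[d + 2] = src[s];     // B comes from byte 0; byte 3 (X) is dropped
  }
  return dst;
}
```

For the 8192x4096 image this loop touches 33,554,432 pixels, reading 128 MiB and writing 96 MiB, which gives a sense of why skipping it shortens the upload.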

Firefox
"Early return the conversion."
ImageBitmap
Texture uploading time [ms] 19.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 123.00 / 166.00 / 147.13 / 8
2048x2048 PNG 4.5MB 19.00 / 26.00 / 20.63 / 8

Firefox
"no early return the conversion."
ImageBitmap
Texture uploading time [ms] 26.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 178.00 / 316.00 / 225.86 / 7
2048x2048 PNG 4.5MB 24.00 / 28.00 / 25.71 / 7

Chrome
ImageBitmap
Texture uploading time [ms] 17.80
image / min / max / avg / count
8192x4096 JPG 4.4MB 79.80 / 97.80 / 91.00 / 7
2048x2048 PNG 4.5MB 17.80 / 27.70 / 25.20 / 7

[1] https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/texImage2D
[2] https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/dom/canvas/TexUnpackBlob.cpp#313

Thanks for the investigation. Can we do the conversion in createImageBitmap()? My understanding of why ImageBitmap upload with texImage2D() should be fast is that the data is decoded asynchronously and is ready for upload beforehand, so we can skip any image data handling in texImage2D(). I think createImageBitmap() is the best place to convert unless there are any problems.

(In reply to Takahiro Aoyagi (:takahirox) from comment #11)

> Thanks for the investigation. Can we do the conversion in createImageBitmap()? My understanding of why ImageBitmap upload with texImage2D() should be fast is that the data is decoded asynchronously and is ready for upload beforehand, so we can skip any image data handling in texImage2D(). I think createImageBitmap() is the best place to convert unless there are any problems.

Agreed. We should figure out why ImageBitmapData from createImageBitmap() is BGRX8 and check what the format of Image is.

For Image, we also do the conversion (WebGL perf warning: texImage2D: Conversion requires pixel reformatting. (26->15)). I will add a profiler in Gecko to figure out where the bottleneck is.

On my Windows PC, I think Chrome has gotten slower recently. Firefox gets the same profiling result but is faster than Chrome now.

"Chrome"

Image
Texture uploading time [ms] 72.57
image / min / max / avg / count
8192x4096 JPG 4.4MB 433.48 / 436.37 / 434.56 / 7
2048x2048 PNG 4.5MB 71.53 / 72.57 / 72.03 / 7

ImageBitmap
Texture uploading time [ms] 16.55
image / min / max / avg / count
8192x4096 JPG 4.4MB 210.75 / 257.78 / 222.22 / 6
2048x2048 PNG 4.5MB 16.55 / 17.24 / 16.85 / 6

"Nightly"

Image
Texture uploading time [ms] 16.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 151.00 / 193.00 / 164.00 / 5
2048x2048 PNG 4.5MB 16.00 / 17.00 / 16.80 / 5

ImageBitmap
Texture uploading time [ms] 17.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 193.00 / 194.00 / 193.40 / 5
2048x2048 PNG 4.5MB 14.00 / 17.00 / 15.60 / 5

You might also try uploading as RGBA instead of RGB. RGB will often incur an unpack in the driver to RGBX anyways.

My GPU is an NVIDIA GTX 1080 Ti.

"Chrome"
[RGB]
ImageBitmap
Texture uploading time [ms] 254.64
image / min / max / avg / count
8192x4096 JPG 4.4MB 228.09 / 254.86 / 249.28 / 5
JS at rAF time spent 0.00 / 1.00 / 0.51 / 5

[RGBA]
ImageBitmap
Texture uploading time [ms] 123.39
image / min / max / avg / count
8192x4096 JPG 4.4MB 122.59 / 123.97 / 123.29 / 13
JS at rAF time spent 0.00 / 1.00 / 0.13 / 5

"Nightly"
[RGB]
ImageBitmap
Texture uploading time [ms] 179.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 179.00 / 230.00 / 211.22 / 9
JS at rAF time spent 0.00 / 1.00 / 0.40 / 9

WebGLContext::TexImage elapsed time 155.694404
->TexOrSubImage::TexUnpackSurface avg. elapsed time 141.756672
-->Conversion TexOrSubImage::TexUnpackSurface avg. elapsed time 41.727599
-->Driver TexOrSubImage::TexUnpackSurface avg. elapsed time 100.029205

[RGBA]
ImageBitmap
Texture uploading time [ms] 127.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 125.00 / 138.00 / 128.89 / 9
JS at rAF time spent 0.00 / 1.00 / 0.17 / 9

WebGLContext::TexImage elapsed time 80.424603
->TexOrSubImage::TexUnpackSurface avg. elapsed time 68.367779
-->Conversion TexOrSubImage::TexUnpackSurface avg. elapsed time 52.362530
-->Driver TexOrSubImage::TexUnpackSurface avg. elapsed time 16.005117

Texture uploading time includes "JS interpreter time" + "WebGLContext::TexImage()" + "CPU wait for GPU time".

So, according to these stats, when we use the RGB internalFormat, more time is spent in the driver's GPU function calls, and the CPU also has to wait for the GPU to unpack the pixels.

Cool, it's encouraging that this matches my prediction about RGBA8 vs RGB8. Please continue to prefer RGBA8 formats over RGB8.
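On the page side, preferring RGBA8 amounts to passing RGBA for both internalformat and format. A minimal sketch with raw WebGL calls follows; `uploadBitmapAsRGBA` is a hypothetical helper name, and texture parameters such as filtering are left out for brevity.

```javascript
// Hypothetical helper: upload an ImageBitmap (or other TexImageSource)
// as RGBA8 rather than RGB8.
function uploadBitmapAsRGBA(gl, bitmap) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // internalformat and format are both RGBA; type is UNSIGNED_BYTE.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);
  return texture;
}
```

In the Nightly breakdown above, the driver portion drops from about 100 ms with RGB to about 16 ms with RGBA for the 8192x4096 image.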

Severity: normal → S3

So, any luck on getting some progress on this issue? The lag is very annoying (my code is similar, works fine on Chrome and Safari, even on mobile).

User Story: (updated)
User Story: (updated)
Component: Graphics: Canvas2D → Graphics: CanvasWebGL

I've attempted a repro on a high-end machine (Ryzen 7950X3D + Radeon RX 9070 XT, with two 3840x2160@240Hz HDR monitors) using the links in comment #1. The numbers I see for Texture uploading time [ms] are as follows:

  • Firefox Nightly 139.0a1 (2025-04-15) (64-bit)
    • Regular image - the number seems accurate the first time; after that it drops to 4ms and 3ms respectively, but the observable jank looks at least as long as in the ImageBitmap case, so only the first-time numbers are shown here. Reloading the page still shows 4ms and 3ms (but again with jank). Closing the tab and reopening it after a while shows the full time again for the first upload, then back to a reported 4ms and 3ms with visible jank.
      • 8192x4096 JPG 142 first time
      • 2048x2048 PNG 37 first time
    • ImageBitmap - these seem accurate so I'm including the full timing report
      • 8192x4096 JPG 4.4MB 44.00 / 50.00 / 47.32 / 66
      • 2048x2048 PNG 4.5MB 5.00 / 7.00 / 6.58 / 65
  • Chrome Version 135.0.7049.96 (Official Build) (64-bit)
    • Regular image
      • 8192x4096 JPG 4.4MB 135.00 / 142.70 / 137.91 / 13
      • 2048x2048 PNG 4.5MB 38.90 / 41.20 / 40.12 / 12
    • ImageBitmap
      • 8192x4096 JPG 4.4MB 44.70 / 52.30 / 46.15 / 13
      • 2048x2048 PNG 4.5MB 4.60 / 5.00 / 4.82 / 12
  • Edge Version 135.0.3179.85 (Official build) (64-bit)
    • Regular image
      • 8192x4096 JPG 4.4MB 129.70 / 147.90 / 133.54 / 13
      • 2048x2048 PNG 4.5MB 38.30 / 41.20 / 40.01 / 12
    • ImageBitmap
      • 8192x4096 JPG 4.4MB 44.10 / 46.90 / 45.02 / 13
      • 2048x2048 PNG 4.5MB 4.80 / 5.20 / 4.96 / 12

Obviously at 240 Hz, anything over 4.1667ms guarantees jank; the 142ms observed in Firefox is highly visible jank.
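The frame-budget arithmetic behind that statement can be written out as a small sketch. Both function names are hypothetical, and the model is deliberately simple: it only counts how many whole refresh intervals fit into one blocking call.

```javascript
// Time available per frame at a given refresh rate, in milliseconds.
function frameBudgetMs(refreshHz) {
  return 1000 / refreshHz;
}

// Number of whole refresh intervals that fit inside one blocking call.
function refreshIntervalsSpanned(blockMs, refreshHz) {
  return Math.floor(blockMs / frameBudgetMs(refreshHz));
}
```

With the numbers above, frameBudgetMs(240) is about 4.1667 ms, and a 142 ms block spans 34 whole refresh intervals at 240 Hz.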

After looking into this a bit more, at first it seemed like the cost was in WebGL, but I think that is actually deceptive, and I am going to refer this over to Image Lib.

It seems like as far as WebGL goes, we are getting the same source formats and doing the same conversions regardless whether we go through an image or ImageBitmap, so I don't believe that to be a significant differentiator of overhead here.

In the image case, on the content process side, we decode a few times and then eventually hit SurfaceCache'd results, which means we no longer pay a substantial cost in the content process for repeated uploads to WebGL after the first couple of tries.

With the ImageBitmap case, on the content process side, we are repeatedly calling CreateImageBitmapFromBlob, creating a new ImageBitmap with a different Blob every time, so that we continue to pay the decode cost over and over in this particular demo. I am not entirely familiar with this part of Image Lib, but it seems really difficult to cache this, as it is using some type of stream blob whose identity seems to differ each time? Like, that might require going as far as trying to hash/cache based on stream contents, which seems a bit extreme unless we can prove this is really a common beneficial pattern?

This seems like it would be better for the demo in this case to keep the original ImageBitmap around, that way on subsequent uses, it would benefit from internal ImageBitmap caching and thus skip the decoding cost that it is paying every time?
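The suggestion of keeping the original ImageBitmap around could look roughly like this on the page side. It is a sketch, not the demo's code: `createBitmapCache` and `loadBitmap` are hypothetical names, and `loadBitmap` is assumed to return a Promise that resolves to an ImageBitmap.

```javascript
// Hypothetical cache: decode each URL once and reuse the same ImageBitmap.
// `loadBitmap(url)` must return a Promise resolving to an ImageBitmap.
function createBitmapCache(loadBitmap) {
  const cache = new Map();
  return function getBitmap(url) {
    let pending = cache.get(url);
    if (!pending) {
      pending = loadBitmap(url);
      // Forget failed loads so that a later call can retry.
      pending.catch(() => cache.delete(url));
      cache.set(url, pending);
    }
    return pending;
  };
}

// In a browser, loadBitmap could be written as (assumption, not from the demo):
//   const loadBitmap = async (url) =>
//     createImageBitmap(await (await fetch(url)).blob());
```

Storing the promise rather than the resolved bitmap means that concurrent requests for the same URL share a single decode.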

Timothy, Andrew?

Flags: needinfo?(tnikkel)
Flags: needinfo?(aosmond)
Component: Graphics: CanvasWebGL → Graphics: ImageLib

Okay, so it seems like, on further discussion with Ashley, that the caching in this demo is beside the point, and it kind of makes it hard to analyze what we're actually trying to test with the demo, which is the uncached behavior, the jank on the first load. Firefox caches, though the other browsers don't.

In Ashley's testing, we are reasonably at parity with the other browsers in terms of performance.

If you compare uncached first-run ImageBitmap results for Firefox to uncached first-run image results for Firefox, we are actually faster with ImageBitmap.

User Story: (updated)

When I looked at this I thought that ImageBitmap was similar to ImageData, but I see now that ImageBitmap is meant to be a texture reference and not a block of raw pixel data. So while the example does perform better in Firefox than in other browsers, the actual intent is to hand off the texture directly by reference within the GPU process and avoid any system memory copies, and that is what needs fixing.

User Story: (updated)
User Story: (updated)
User Story: (updated)