Uploading ImageBitmap to texture with gl.texImage2D() is slow
Categories
(Core :: Graphics: ImageLib, defect, P3)
Tracking
()
People
(Reporter: hogehoge, Unassigned, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: webcompat:platform-bug, Whiteboard: [gfx-noted])
User Story
webcompat:blocked-resources user-impact-score:0
Attachments
(1 file)
| Reporter | ||
Comment 1•7 years ago
Comment 3•7 years ago
Comment 4•7 years ago
:takahirox.
A few things need to be figured out. Before the ImageBitmapData reaches WebGLContext::texImage2D(), the demo does the following:
1. THREE.ImageBitmapLoader.Load -> generates an image Blob.
2. createImageBitmap -> produces the ImageBitmapData.
3. WebGLContext::texImage2D() -> uploads this ImageBitmapData to a WebGL texture.
I saw that the profiling number includes all three steps. However, only WebGLContext::texImage2D() is related to the WebGL context; the other two are more likely related to the image decoder. We should figure out which part is the main bottleneck.
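One way to separate the stages is to time each of them on its own. The following is a minimal sketch, not the code used in the demo: `timed`, `Clock`, and the stage labels are hypothetical names, and the clock is injected so the helper can be checked without a browser.

```typescript
// Hypothetical helper: records how long a single stage takes, in milliseconds.
// In a browser you would likely pass () => performance.now() as the clock.
type Clock = () => number;

async function timed<T>(
  label: string,
  work: () => Promise<T> | T,
  timings: Record<string, number>,
  now: Clock = Date.now
): Promise<T> {
  const start = now();
  const result = await work(); // works for both sync and async stages
  timings[label] = now() - start;
  return result;
}

// Browser-only usage sketch (assumes `url` and a WebGL context `gl` exist):
//   const timings: Record<string, number> = {};
//   const clock = () => performance.now();
//   const blob = await timed("fetch", () => fetch(url).then(r => r.blob()), timings, clock);
//   const bitmap = await timed("createImageBitmap", () => createImageBitmap(blob), timings, clock);
//   await timed("texImage2D", () =>
//     gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap), timings, clock);
```

Only the last label measures synchronous main-thread work in texImage2D(); the first two stages mostly measure waiting on asynchronous work.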
Comment 5•7 years ago
ImageBitmapData and HTMLImageElement go through different code paths: ImageBitmapData is loaded asynchronously, HTMLImageElement synchronously. In the general case the async approach should be faster. I need to dig deeper to confirm whether they use the same image decoder.
ImageBitmapData
xul.dll!mozilla::dom::CreateImageBitmapFromBlob::Create(mozilla::dom::Promise * aPromise, nsIGlobalObject * aGlobal, mozilla::dom::Blob & aBlob, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect, nsIEventTarget * aMainThreadEventTarget) Line 1484 C++
xul.dll!mozilla::dom::AsyncCreateImageBitmapFromBlob(mozilla::dom::Promise * aPromise, nsIGlobalObject * aGlobal, mozilla::dom::Blob & aBlob, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect) Line 1267 C++
xul.dll!mozilla::dom::ImageBitmap::Create(nsIGlobalObject * aGlobal, const mozilla::dom::HTMLImageElementOrSVGImageElementOrHTMLCanvasElementOrHTMLVideoElementOrImageBitmapOrBlobOrCanvasRenderingContext2DOrImageData & aSrc, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect, mozilla::ErrorResult & aRv) Line 1335 C++
xul.dll!nsGlobalWindowInner::CreateImageBitmap(JSContext * aCx, const mozilla::dom::HTMLImageElementOrSVGImageElementOrHTMLCanvasElementOrHTMLVideoElementOrImageBitmapOrBlobOrCanvasRenderingContext2DOrImageData & aImage, mozilla::ErrorResult & aRv) Line 7097 C++
HTMLImageElement
xul.dll!imgLoader::LoadImage(nsIURI * aURI, nsIURI * aInitialDocumentURI, nsIURI * aReferrerURI, mozilla::net::ReferrerPolicy aReferrerPolicy, nsIPrincipal * aTriggeringPrincipal, unsigned __int64 aRequestContextID, nsILoadGroup * aLoadGroup, imgINotificationObserver * aObserver, nsINode * aContext, mozilla::dom::Document * aLoadingDocument, unsigned int aLoadFlags, nsISupports * aCacheKey, unsigned int aContentPolicyType, const nsTSubstring<char16_t> & initiatorType, bool aUseUrgentStartForChannel, imgRequestProxy * * _retval) Line 2060 C++
xul.dll!nsContentUtils::LoadImage(nsIURI * aURI, nsINode * aContext, mozilla::dom::Document * aLoadingDocument, nsIPrincipal * aLoadingPrincipal, unsigned __int64 aRequestContextID, nsIURI * aReferrer, mozilla::net::ReferrerPolicy aReferrerPolicy, imgINotificationObserver * aObserver, int aLoadFlags, const nsTSubstring<char16_t> & initiatorType, imgRequestProxy * * aRequest, unsigned int aContentPolicyType, bool aUseUrgentStartForChannel) Line 3439 C++
xul.dll!nsImageLoadingContent::LoadImage(nsIURI * aNewURI, bool aForce, bool aNotify, nsImageLoadingContent::ImageLoadType aImageLoadType, bool aLoadStart, mozilla::dom::Document * aDocument, unsigned int aLoadFlags, nsIPrincipal * aTriggeringPrincipal) Line 986 C++
xul.dll!nsImageLoadingContent::LoadImage(const nsTSubstring<char16_t> & aNewURI, bool aForce, bool aNotify, nsImageLoadingContent::ImageLoadType aImageLoadType, nsIPrincipal * aTriggeringPrincipal) Line 868 C++
xul.dll!mozilla::dom::HTMLImageElement::AfterMaybeChangeAttr(int aNamespaceID, nsAtom * aName, const nsAttrValueOrString & aValue, const nsAttrValue * aOldValue, nsIPrincipal * aMaybeScriptedPrincipal, bool aValueMaybeChanged, bool aNotify) Line 405 C++
xul.dll!mozilla::dom::HTMLImageElement::AfterSetAttr(int aNameSpaceID, nsAtom * aName, const nsAttrValue * aValue, const nsAttrValue * aOldValue, nsIPrincipal * aMaybeScriptedPrincipal, bool aNotify) Line 287 C++
Comment 6•7 years ago
I am trying a synchronous way of loading the ImageBitmap in my patch. Performance improves slightly, but not noticeably. This is probably because the patch is not fully synchronous: CreateImageBitmapFromBlob::DecodeAndCropBlob() still loads the MIME type and the bitmap data asynchronously.
Comment 7•7 years ago
Here is a new profile from my Windows PC:
Firefox, async ImageBitmap
Texture uploading time [ms] 25.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 216.00 / 239.00 / 224.71 / 7
2048x2048 PNG 4.5MB 25.00 / 31.00 / 27.43 / 7
Firefox, sync ImageBitmap
Texture uploading time [ms] 214.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 213.00 / 238.00 / 217.89 / 9
2048x2048 PNG 4.5MB 24.00 / 32.00 / 26.13 / 8
Chrome, ImageBitmap
Texture uploading time [ms] 26.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 80.70 / 96.10 / 88.57 / 13
2048x2048 PNG 4.5MB 16.50 / 27.40 / 25.83 / 13
For the smaller image, the difference is not significant.
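For readers who want to reproduce rows in the "min / max / avg / count" layout used above, here is a small sketch of how such a summary could be computed from raw per-upload samples. The names `summarize` and `formatRow` are hypothetical and the demo's own bookkeeping may differ.

```typescript
// Summary of a series of upload timings, in milliseconds.
interface UploadStats {
  min: number;
  max: number;
  avg: number;
  count: number;
}

function summarize(samplesMs: number[]): UploadStats {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const s of samplesMs) {
    if (s < min) min = s;
    if (s > max) max = s;
    sum += s;
  }
  return { min, max, avg: sum / samplesMs.length, count: samplesMs.length };
}

// Formats one row in the same "label min / max / avg / count" layout.
function formatRow(label: string, s: UploadStats): string {
  return `${label} ${s.min.toFixed(2)} / ${s.max.toFixed(2)} / ${s.avg.toFixed(2)} / ${s.count}`;
}
```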
| Reporter | ||
Comment 8•7 years ago
> A few things need to be figured out. Before the ImageBitmapData reaches WebGLContext::texImage2D(), the demo does the following:
> 1. THREE.ImageBitmapLoader.Load -> generates an image Blob.
> 2. createImageBitmap -> produces the ImageBitmapData.
> 3. WebGLContext::texImage2D() -> uploads this ImageBitmapData to a WebGL texture.
In the example I only measure the elapsed time of step 3; search for "isTexture" in the code. The problem I want to fix here is the long main-thread blocking time, so I am looking only at step 3 (my understanding is that steps 1 and 2 run asynchronously).
Comment 9•7 years ago
(In reply to Takahiro Aoyagi (:takahirox) from comment #8)
> A few things need to be figured out. Before the ImageBitmapData reaches WebGLContext::texImage2D(), the demo does the following:
> 1. THREE.ImageBitmapLoader.Load -> generates an image Blob.
> 2. createImageBitmap -> produces the ImageBitmapData.
> 3. WebGLContext::texImage2D() -> uploads this ImageBitmapData to a WebGL texture.
> In the example I only measure the elapsed time of step 3; search for "isTexture" in the code. The problem I want to fix here is the long main-thread blocking time, so I am looking only at step 3 (my understanding is that steps 1 and 2 run asynchronously).
I think you are right. I will dig into what we do in WebGLContext::texImage2D() and then provide some data to help pin down the problem.
Comment 10•6 years ago
Although the format and internalFormat passed to texImage2D() [1] are RGB8, the SourceSurface from ImageBitmapData is decoded as BGRX8, so we have to convert it at [2] to turn the source image into RGB8. If we skip the conversion, the elapsed time decreases. Besides this, I don't see anything unusual. Our implementation of texImage2D() is very straightforward: it validates the arguments and calls the ANGLE function.
Firefox, "Early return the conversion.", ImageBitmap
Texture uploading time [ms] 19.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 123.00 / 166.00 / 147.13 / 8
2048x2048 PNG 4.5MB 19.00 / 26.00 / 20.63 / 8
Firefox, "no early return the conversion.", ImageBitmap
Texture uploading time [ms] 26.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 178.00 / 316.00 / 225.86 / 7
2048x2048 PNG 4.5MB 24.00 / 28.00 / 25.71 / 7
Chrome, ImageBitmap
Texture uploading time [ms] 17.80
image / min / max / avg / count
8192x4096 JPG 4.4MB 79.80 / 97.80 / 91.00 / 7
2048x2048 PNG 4.5MB 17.80 / 27.70 / 25.20 / 7
[1] https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/texImage2D
[2] https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/dom/canvas/TexUnpackBlob.cpp#313
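To illustrate why the BGRX8 to RGB8 step costs time on large images, here is a naive sketch of such a conversion. It is not Gecko's implementation at [2]; the function name `bgrxToRgb` is hypothetical, and the sketch assumes tightly packed rows (stride equal to width * 4) with bytes laid out as B, G, R, X in memory.

```typescript
// Naive per-pixel BGRX8 -> RGB8 conversion (illustrative only).
// Assumes no row padding: the source holds width * height * 4 bytes.
function bgrxToRgb(src: Uint8Array, width: number, height: number): Uint8Array {
  const pixelCount = width * height;
  if (src.length < pixelCount * 4) {
    throw new RangeError("source buffer is smaller than width * height * 4");
  }
  const dst = new Uint8Array(pixelCount * 3);
  for (let i = 0, s = 0, d = 0; i < pixelCount; i++, s += 4, d += 3) {
    dst[d] = src[s + 2];     // R comes from byte 2 of the BGRX pixel
    dst[d + 1] = src[s + 1]; // G stays in the middle
    dst[d + 2] = src[s];     // B comes from byte 0; the X byte is dropped
  }
  return dst;
}
```

For the 8192x4096 image in the demo this touches 33,554,432 pixels, reading 128 MiB and writing 96 MiB on the main thread, which is consistent with the conversion showing up in the timings.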
| Reporter | ||
Comment 11•6 years ago
Thanks for the investigation. Can we do the conversion in createImageBitmap()? My understanding is that ImageBitmap upload with texImage2D() should be fast because the data is decoded asynchronously and is ready for upload beforehand, so texImage2D() can skip any image data handling. I think createImageBitmap() is the best place to convert, unless there are any problems.
Comment 12•6 years ago
(In reply to Takahiro Aoyagi (:takahirox) from comment #11)
> Thanks for the investigation. Can we do the conversion in createImageBitmap()? My understanding is that ImageBitmap upload with texImage2D() should be fast because the data is decoded asynchronously and is ready for upload beforehand, so texImage2D() can skip any image data handling. I think createImageBitmap() is the best place to convert, unless there are any problems.
Agreed. We should figure out why the ImageBitmapData from createImageBitmap() is BGRX8, and check what format Image uses.
Comment 13•6 years ago
For Image, we also do the conversion (WebGL perf warning: texImage2D: Conversion requires pixel reformatting. (26->15)). I will add a profiler in Gecko to figure out where the bottleneck is.
Comment 14•6 years ago
On my Windows PC, Chrome seems to have gotten slower recently. Firefox gives the same profiling results as before but is now faster than Chrome.
"Chrome"
Image
Texture uploading time [ms] 72.57
image / min / max / avg / count
8192x4096 JPG 4.4MB 433.48 / 436.37 / 434.56 / 7
2048x2048 PNG 4.5MB 71.53 / 72.57 / 72.03 / 7
ImageBitmap
Texture uploading time [ms] 16.55
image / min / max / avg / count
8192x4096 JPG 4.4MB 210.75 / 257.78 / 222.22 / 6
2048x2048 PNG 4.5MB 16.55 / 17.24 / 16.85 / 6
"Nightly"
Image
Texture uploading time [ms] 16.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 151.00 / 193.00 / 164.00 / 5
2048x2048 PNG 4.5MB 16.00 / 17.00 / 16.80 / 5
ImageBitmap
Texture uploading time [ms] 17.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 193.00 / 194.00 / 193.40 / 5
2048x2048 PNG 4.5MB 14.00 / 17.00 / 15.60 / 5
Comment 15•6 years ago
You might also try uploading as RGBA instead of RGB. RGB will often incur an unpack in the driver to RGBX anyways.
Comment 16•6 years ago
My GPU is Nvidia GTX 1080 TI
"Chrome"
[RGB]
ImageBitmap
Texture uploading time [ms] 254.64
image / min / max / avg / count
8192x4096 JPG 4.4MB 228.09 / 254.86 / 249.28 / 5
JS at rAF time spent 0.00 / 1.00 / 0.51 / 5
[RGBA]
ImageBitmap
Texture uploading time [ms] 123.39
image / min / max / avg / count
8192x4096 JPG 4.4MB 122.59 / 123.97 / 123.29 / 13
JS at rAF time spent 0.00 / 1.00 / 0.13 / 5
"Nightly"
[RGB]
ImageBitmap
Texture uploading time [ms] 179.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 179.00 / 230.00 / 211.22 / 9
JS at rAF time spent 0.00 / 1.00 / 0.40 / 9
WebGLContext::TexImage elapsed time 155.694404
->TexOrSubImage::TexUnpackSurface avg. elapsed time 141.756672
-->Conversion TexOrSubImage::TexUnpackSurface avg. elapsed time 41.727599
-->Driver TexOrSubImage::TexUnpackSurface avg. elapsed time 100.029205
[RGBA]
ImageBitmap
Texture uploading time [ms] 127.00
image / min / max / avg / count
8192x4096 JPG 4.4MB 125.00 / 138.00 / 128.89 / 9
JS at rAF time spent 0.00 / 1.00 / 0.17 / 9
WebGLContext::TexImage elapsed time 80.424603
->TexOrSubImage::TexUnpackSurface avg. elapsed time 68.367779
-->Conversion TexOrSubImage::TexUnpackSurface avg. elapsed time 52.362530
-->Driver TexOrSubImage::TexUnpackSurface avg. elapsed time 16.005117
Texture uploading time includes "JS interpreter time" + "WebGLContext::TexImage()" + "CPU wait for GPU time".
So, according to these stats, when we use an RGB internalFormat, more time is spent in the GPU driver calls, and the CPU also has to wait for the GPU while it unpacks pixels.
Comment 17•6 years ago
Cool, it's encouraging that this matches my prediction about RGBA8 vs RGB8. Please continue to prefer RGBA8 formats over RGB8.
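As a sketch of what preferring RGBA over RGB looks like at the call site: the interface `TexImageTarget` and the function `uploadAsRgba` below are hypothetical names that describe only the small part of a WebGL context this example touches, so the snippet can be exercised without a browser. A real WebGLRenderingContext with an ImageBitmap source is expected to fit this shape, but that is an assumption of the sketch.

```typescript
// Minimal structural view of a WebGL context (hypothetical, for illustration).
interface TexImageTarget<Source> {
  readonly TEXTURE_2D: number;
  readonly RGBA: number;
  readonly UNSIGNED_BYTE: number;
  texImage2D(
    target: number,
    level: number,
    internalformat: number,
    format: number,
    type: number,
    source: Source
  ): void;
}

// Uploads `source` to the currently bound TEXTURE_2D using RGBA for both
// internalformat and format, instead of RGB. The caller is assumed to have
// created and bound the texture beforehand.
function uploadAsRgba<Source>(gl: TexImageTarget<Source>, source: Source): void {
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, source);
}
```

In three.js-based code like the demo, the equivalent change would be made through the texture's format setting rather than a direct call; that mapping is not shown here.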
Comment 18•1 year ago
So, any luck on getting some progress on this issue? The lag is very annoying (my code is similar, works fine on Chrome and Safari, even on mobile).
Comment 20•9 months ago
I attempted a repro on a high-end machine (Ryzen 7950X3D + Radeon RX 9070 XT, with two 3840x2160@240hz HDR monitors) using the links in comment #1. The numbers I see for Texture uploading time [ms] are as follows:
- Firefox Nightly 139.0a1 (2025-04-15) (64-bit)
  - Regular image - the number seems accurate the first time; after that it drops to 4ms and 3ms respectively, but the observable jank looks at least as long as in the ImageBitmap case, so I am only showing the first-time numbers here. Reloading the page still shows 4ms and 3ms (but again with jank). Closing the tab and reopening it after a bit seems to show the full time again for the first upload, then back to 4ms and 3ms reported but with visible jank.
    - 8192x4096 JPG 142 first time
    - 2048x2048 PNG 37 first time
  - ImageBitmap - these seem accurate so I'm including the full timing report
    - 8192x4096 JPG 4.4MB 44.00 / 50.00 / 47.32 / 66
    - 2048x2048 PNG 4.5MB 5.00 / 7.00 / 6.58 / 65
- Chrome Version 135.0.7049.96 (Official Build) (64-bit)
  - Regular image
    - 8192x4096 JPG 4.4MB 135.00 / 142.70 / 137.91 / 13
    - 2048x2048 PNG 4.5MB 38.90 / 41.20 / 40.12 / 12
  - ImageBitmap
    - 8192x4096 JPG 4.4MB 44.70 / 52.30 / 46.15 / 13
    - 2048x2048 PNG 4.5MB 4.60 / 5.00 / 4.82 / 12
- Edge Version 135.0.3179.85 (Official build) (64-bit)
  - Regular image
    - 8192x4096 JPG 4.4MB 129.70 / 147.90 / 133.54 / 13
    - 2048x2048 PNG 4.5MB 38.30 / 41.20 / 40.01 / 12
  - ImageBitmap
    - 8192x4096 JPG 4.4MB 44.10 / 46.90 / 45.02 / 13
    - 2048x2048 PNG 4.5MB 4.80 / 5.20 / 4.96 / 12
Obviously at 240hz, anything over 4.1667ms guarantees jank, 142ms observed in Firefox is highly visible jank.
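The frame-budget arithmetic behind that statement can be written out as a short sketch. The function names are hypothetical; the only inputs are the refresh rate and the length of a main-thread stall.

```typescript
// Time available per frame at a given refresh rate, in milliseconds.
function frameBudgetMs(refreshHz: number): number {
  return 1000 / refreshHz;
}

// Whole refresh intervals that elapse during a stall of `stallMs`, i.e. a
// lower bound on how many frames cannot be presented on time.
function missedFrames(stallMs: number, refreshHz: number): number {
  return Math.floor(stallMs / frameBudgetMs(refreshHz));
}
```

At 240 Hz the budget is 1000 / 240, about 4.1667 ms, so the 142 ms first upload spans 34 whole refresh intervals; at 60 Hz the same stall spans 8.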
Comment 21•9 months ago
After looking into this a bit more, at first it seemed like the cost was in WebGL, but I think that is actually deceptive, and I am going to refer this over to Image Lib.
It seems that, as far as WebGL goes, we are getting the same source formats and doing the same conversions regardless of whether we go through an image or an ImageBitmap, so I don't believe that to be a significant differentiator of overhead here.
In the image case, on the content process side, we decode a few times and then eventually hit SurfaceCache'd results, which means we no longer pay a substantial cost in the content process for repeated uploads to WebGL after the first couple of tries.
With the ImageBitmap case, on the content process side, we are repeatedly calling CreateImageBitmapFromBlob, creating a new ImageBitmap with a different Blob every time, so that we continue to pay the decode cost over and over in this particular demo. I am not entirely familiar with this part of Image Lib, but it seems really difficult to cache this, as it is using some type of stream blob whose identity seems to differ each time? Like, that might require going as far as trying to hash/cache based on stream contents, which seems a bit extreme unless we can prove this is really a common beneficial pattern?
This seems like it would be better for the demo in this case to keep the original ImageBitmap around, that way on subsequent uses, it would benefit from internal ImageBitmap caching and thus skip the decoding cost that it is paying every time?
Timothy, Andrew?
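The suggestion of keeping the original ImageBitmap around could look roughly like the following on the page side. This is a sketch under assumptions, not code from the demo: `BitmapCache` is a hypothetical class, and the loader is injected so the caching logic can be checked outside a browser.

```typescript
// Caches one in-flight or completed load per URL so that repeated requests
// reuse the same decoded result instead of decoding again.
class BitmapCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly load: (url: string) => Promise<T>;

  constructor(load: (url: string) => Promise<T>) {
    this.load = load;
  }

  get(url: string): Promise<T> {
    let entry = this.entries.get(url);
    if (entry === undefined) {
      // First request for this URL: start the load and remember the promise.
      // Note that a failed load stays cached in this simple sketch.
      entry = this.load(url);
      this.entries.set(url, entry);
    }
    return entry;
  }
}

// Browser-only usage sketch:
//   const cache = new BitmapCache<ImageBitmap>(url =>
//     fetch(url).then(r => r.blob()).then(b => createImageBitmap(b)));
//   const bitmap = await cache.get("texture.jpg");
```

A page that holds on to bitmaps this way would also need to decide when to call ImageBitmap.close() to release them.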
Comment 22•9 months ago
Okay, so it seems like, on further discussion with Ashley, that the caching in this demo is beside the point, and it kind of makes it hard to analyze what we're actually trying to test with the demo, which is the uncached behavior, the jank on the first load. Firefox caches, though the other browsers don't.
In Ashley's testing, we are reasonably at parity with the other browsers in terms of performance.
If you compare uncached first-run ImageBitmap results for Firefox to uncached first-run image results for Firefox, we are actually faster with ImageBitmap.
Comment 23•6 months ago
When I looked at this I thought that ImageBitmap was similar to ImageData, but I see now that ImageBitmap is meant to be a texture reference and not a block of raw pixel data. So while the example does perform better in Firefox than in other browsers, the actual intent is to hand off the texture directly by reference within the GPU process and avoid any system memory copies. That is what needs fixing.