Open Bug 1486454 Opened 6 years ago Updated 3 months ago

Uploading ImageBitmap to texture with gl.texImage2D() is slow

Product: Core
Component: Graphics: Canvas2D
Type: defect
Priority: P3
Version: 63 Branch

Tracking Status:
firefox61 --- affected
firefox62 --- affected
firefox63 --- affected

Reporter: hogehoge
Assignee: Unassigned
Blocks: 1 open bug
Whiteboard: [gfx-noted]
Attachments: 1 file

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36

Steps to reproduce:

I made an ImageBitmap performance test to check texture upload speed with gl.texImage2D().

Test on Three.js: https://github.com/mrdoob/three.js/issues/11746#issuecomment-415461034



Actual results:

I confirmed that uploading an ImageBitmap to a texture with gl.texImage2D() isn't faster than uploading a regular Image on my Windows + Firefox Nightly setup. (I haven't tried other platforms, such as Mac, yet.)

UA: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0


Expected results:

In theory, uploading an ImageBitmap should be faster than uploading a regular Image because the ImageBitmap is decoded ahead of time. On Chrome, it's 3x faster.
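
For reference, a minimal sketch of the ImageBitmap upload path being discussed, assuming a WebGL 1 context. The helper names `decodeToBitmap` and `uploadBitmap`, and the default of `gl.RGBA`, are illustrative assumptions and are not taken from the Three.js test page.

```javascript
// Hypothetical helper: fetch an image and decode it ahead of time.
// Intended for a browser, where fetch() and createImageBitmap() are globals.
async function decodeToBitmap(url) {
  const response = await fetch(url);
  const blob = await response.blob();
  // Decoding is asynchronous, so it should not block the main thread.
  return createImageBitmap(blob);
}

// Hypothetical helper: upload an already-decoded source into a new texture.
// `format` defaults to gl.RGBA as an assumption; the test page may differ.
function uploadBitmap(gl, bitmap, format = gl.RGBA) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // WebGL 1 signature: (target, level, internalformat, format, type, source).
  gl.texImage2D(gl.TEXTURE_2D, 0, format, format, gl.UNSIGNED_BYTE, bitmap);
  return texture;
}
```

In a page this would be used roughly as `const bitmap = await decodeToBitmap(url); uploadBitmap(gl, bitmap);`, so that only the second call runs synchronously on the main thread.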

More detail about the test. The links are:

https://rawgit.com/takahirox/three.js/ImageBitmapTest/examples/webgl_texture_upload.html (Regular Image)
https://rawgit.com/takahirox/three.js/ImageBitmapTest/examples/webgl_texture_upload.html?imagebitmap (ImageBitmap)

The texture is changed (re-uploaded) every 5 seconds, and the pages show the resulting blocking time.

On my Windows machine I see:

8192x4096 JPG 4.4MB
- Firefox Image: 500ms
- Firefox ImageBitmap: 500ms
- Chrome Image: 500ms
- Chrome ImageBitmap: 165ms

2048x2048 PNG 4.5MB
- Firefox Image: 40ms
- Firefox ImageBitmap: 60ms
- Chrome Image: 140ms
- Chrome ImageBitmap: 35ms

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID: 20180830111745

I managed to reproduce this issue on Windows 10 x64 with Firefox Nightly 63.0a1 (2018-08-30) (64-bit).
Status: UNCONFIRMED → NEW
Component: Untriaged → Canvas: 2D
Ever confirmed: true
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → All
Reviewing how Image and ImageBitmap work, the results are in line with what I would expect. The main practical difference in your example is that using the former will cause a sync decode from createImageBitmap on the main thread context, whereas the latter will decode on the image decoder threads and return asynchronously. That is probably why it sometimes takes longer with the latter: the context switching has some cost associated with it. That said, I'm not sure why Chrome does so much better, given that most of the work should be the decoding itself, which isn't fundamentally different.
Priority: -- → P3
Whiteboard: [gfx-noted]

:takahirox.

There are some items that need to be figured out. Before uploading ImageBitmap data with WebGLContext::texImage2D(), we do the following:

  1. THREE.ImageBitmapLoader.Load -> Generate an ImageBlob.
  2. createImageBitmap -> get the ImageBitmapData.
  3. WebGLContext::texImage2D() -> Upload this ImageBitmapData to WebGL texture.

It looks like this profiling number includes all three steps. However, only WebGLContext::texImage2D() is related to the WebGL context. The other two are more likely related to the image decoder. I would like us to figure out which part is the main bottleneck.

Flags: needinfo?(hogehoge)

ImageBitmapData and HTMLImageElement take different code paths. ImageBitmapData uses an async approach, and HTMLImageElement uses a sync one. In the general case, the async approach should be faster. I need to dig deeper to confirm whether they use the same image decoder.

ImageBitmapData

xul.dll!mozilla::dom::CreateImageBitmapFromBlob::Create(mozilla::dom::Promise * aPromise, nsIGlobalObject * aGlobal, mozilla::dom::Blob & aBlob, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect, nsIEventTarget * aMainThreadEventTarget) Line 1484 C++
xul.dll!mozilla::dom::AsyncCreateImageBitmapFromBlob(mozilla::dom::Promise * aPromise, nsIGlobalObject * aGlobal, mozilla::dom::Blob & aBlob, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect) Line 1267 C++
xul.dll!mozilla::dom::ImageBitmap::Create(nsIGlobalObject * aGlobal, const mozilla::dom::HTMLImageElementOrSVGImageElementOrHTMLCanvasElementOrHTMLVideoElementOrImageBitmapOrBlobOrCanvasRenderingContext2DOrImageData & aSrc, const mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > & aCropRect, mozilla::ErrorResult & aRv) Line 1335 C++
xul.dll!nsGlobalWindowInner::CreateImageBitmap(JSContext * aCx, const mozilla::dom::HTMLImageElementOrSVGImageElementOrHTMLCanvasElementOrHTMLVideoElementOrImageBitmapOrBlobOrCanvasRenderingContext2DOrImageData & aImage, mozilla::ErrorResult & aRv) Line 7097 C++

HTMLImageElement

xul.dll!imgLoader::LoadImage(nsIURI * aURI, nsIURI * aInitialDocumentURI, nsIURI * aReferrerURI, mozilla::net::ReferrerPolicy aReferrerPolicy, nsIPrincipal * aTriggeringPrincipal, unsigned __int64 aRequestContextID, nsILoadGroup * aLoadGroup, imgINotificationObserver * aObserver, nsINode * aContext, mozilla::dom::Document * aLoadingDocument, unsigned int aLoadFlags, nsISupports * aCacheKey, unsigned int aContentPolicyType, const nsTSubstring<char16_t> & initiatorType, bool aUseUrgentStartForChannel, imgRequestProxy * * _retval) Line 2060 C++
xul.dll!nsContentUtils::LoadImage(nsIURI * aURI, nsINode * aContext, mozilla::dom::Document * aLoadingDocument, nsIPrincipal * aLoadingPrincipal, unsigned __int64 aRequestContextID, nsIURI * aReferrer, mozilla::net::ReferrerPolicy aReferrerPolicy, imgINotificationObserver * aObserver, int aLoadFlags, const nsTSubstring<char16_t> & initiatorType, imgRequestProxy * * aRequest, unsigned int aContentPolicyType, bool aUseUrgentStartForChannel) Line 3439 C++
xul.dll!nsImageLoadingContent::LoadImage(nsIURI * aNewURI, bool aForce, bool aNotify, nsImageLoadingContent::ImageLoadType aImageLoadType, bool aLoadStart, mozilla::dom::Document * aDocument, unsigned int aLoadFlags, nsIPrincipal * aTriggeringPrincipal) Line 986 C++
xul.dll!nsImageLoadingContent::LoadImage(const nsTSubstring<char16_t> & aNewURI, bool aForce, bool aNotify, nsImageLoadingContent::ImageLoadType aImageLoadType, nsIPrincipal * aTriggeringPrincipal) Line 868 C++
xul.dll!mozilla::dom::HTMLImageElement::AfterMaybeChangeAttr(int aNamespaceID, nsAtom * aName, const nsAttrValueOrString & aValue, const nsAttrValue * aOldValue, nsIPrincipal * aMaybeScriptedPrincipal, bool aValueMaybeChanged, bool aNotify) Line 405 C++
xul.dll!mozilla::dom::HTMLImageElement::AfterSetAttr(int aNameSpaceID, nsAtom * aName, const nsAttrValue * aValue, const nsAttrValue * aOldValue, nsIPrincipal * aMaybeScriptedPrincipal, bool aNotify) Line 287 C++

I am trying a synchronous way of loading the ImageBitmap in my patch. Performance improves a little, but not noticeably. This is probably because the patch is not fully synchronous: CreateImageBitmapFromBlob::DecodeAndCropBlob() still tries to load its MIME type and the bitmap data asynchronously.

Assignee: nobody → dmu
Assignee: dmu → nobody

Providing a new profile from my Windows PC:


Firefox, Async ImageBitmap
Texture uploading time [ms] 25.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 216.00 / 239.00 / 224.71 / 7
2048x2048 PNG 4.5MB / 25.00 / 31.00 / 27.43 / 7

Firefox, Sync ImageBitmap
Texture uploading time [ms] 214.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 213.00 / 238.00 / 217.89 / 9
2048x2048 PNG 4.5MB / 24.00 / 32.00 / 26.13 / 8

Chrome, ImageBitmap
Texture uploading time [ms] 26.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 80.70 / 96.10 / 88.57 / 13
2048x2048 PNG 4.5MB / 16.50 / 27.40 / 25.83 / 13


For the smaller ImageBitmap, the difference is not significant.

> There are some items that need to be figured out. Before uploading ImageBitmap data with WebGLContext::texImage2D(), we do the following:
>
>   1. THREE.ImageBitmapLoader.Load -> Generate an ImageBlob.
>   2. createImageBitmap -> get the ImageBitmapData.
>   3. WebGLContext::texImage2D() -> Upload this ImageBitmapData to WebGL texture.

I measure the elapsed time of step 3 only in the example (search for "isTexture" in the code). The problem I want to fix here is the long main-thread blocking time, so I am looking only at step 3 (my understanding is that steps 1 and 2 are done asynchronously).
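
As an illustration of measuring only the blocking call, here is a minimal sketch. The helper names `timeBlocking` and `summarize` are hypothetical, and whether this matches how the test page measures is an assumption. The stand-in workload takes the place of a real gl.texImage2D() call so the sketch runs outside a browser.

```javascript
// Hypothetical helper: time one synchronous call, in milliseconds.
function timeBlocking(fn) {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

// Summarize samples in the "min / max / avg / count" form used in this thread.
function summarize(samples) {
  const min = Math.min(...samples);
  const max = Math.max(...samples);
  const avg = samples.reduce((sum, s) => sum + s, 0) / samples.length;
  return { min, max, avg, count: samples.length };
}

// Stand-in workload; in a page this would wrap the gl.texImage2D() call.
const samples = [];
for (let i = 0; i < 5; i++) {
  samples.push(timeBlocking(() => {
    let x = 0;
    for (let j = 0; j < 100000; j++) x += j;
    return x;
  }));
}
const stats = summarize(samples);
console.log(
  `${stats.min.toFixed(2)} / ${stats.max.toFixed(2)} / ` +
  `${stats.avg.toFixed(2)} / ${stats.count}`
);
```

This only captures the time during which JavaScript is blocked by the call. Any work the GPU or driver does after the call returns would not be included.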

Flags: needinfo?(hogehoge)

(In reply to Takahiro Aoyagi (:takahirox) from comment #8)

I think you are right. I will dig into what we do in WebGLContext::texImage2D() and then provide some data to help figure out the problem.

Although the format and internalFormat passed to texImage2D() [1] are RGB8, the SourceSurface from ImageBitmapData is decoded as BGRX8, so we have to do a conversion at [2] to turn the source image into RGB8. If we skip the conversion, the elapsed time decreases. Besides this, I don't see anything unusual. Our implementation of texImage2D() is very straightforward: it just does verification and calls the ANGLE function.

Firefox, ImageBitmap ("Early return the conversion.")
Texture uploading time [ms] 19.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 123.00 / 166.00 / 147.13 / 8
2048x2048 PNG 4.5MB / 19.00 / 26.00 / 20.63 / 8

Firefox, ImageBitmap ("no early return the conversion.")
Texture uploading time [ms] 26.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 178.00 / 316.00 / 225.86 / 7
2048x2048 PNG 4.5MB / 24.00 / 28.00 / 25.71 / 7

Chrome, ImageBitmap
Texture uploading time [ms] 17.80
image / min / max / avg / count
8192x4096 JPG 4.4MB / 79.80 / 97.80 / 91.00 / 7
2048x2048 PNG 4.5MB / 17.80 / 27.70 / 25.20 / 7

[1] https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/texImage2D
[2] https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/dom/canvas/TexUnpackBlob.cpp#313
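
To make the cost of the BGRX8 to RGB8 conversion described above more concrete, here is a minimal sketch of such a per-pixel repack. This is not Gecko's implementation (that is C++ in TexUnpackBlob.cpp). The function name `bgrxToRgb` is hypothetical, and the sketch assumes tightly packed rows with each source pixel stored as B, G, R, X bytes in memory.

```javascript
// Sketch only: convert tightly packed BGRX8 pixels to RGB8.
// Assumes the bytes of each source pixel are ordered B, G, R, X in memory.
function bgrxToRgb(src, width, height) {
  const pixelCount = width * height;
  const dst = new Uint8Array(pixelCount * 3);
  for (let i = 0, s = 0, d = 0; i < pixelCount; i++, s += 4, d += 3) {
    dst[d] = src[s + 2];     // R
    dst[d + 1] = src[s + 1]; // G
    dst[d + 2] = src[s];     // B (the X byte at s + 3 is dropped)
  }
  return dst;
}

// Two pixels: (R=10, G=20, B=30) and (R=40, G=50, B=60).
const bgrx = Uint8Array.from([30, 20, 10, 255, 60, 50, 40, 255]);
console.log(Array.from(bgrxToRgb(bgrx, 2, 1))); // logs 10, 20, 30, 40, 50, 60
```

For the 8192x4096 image, a loop of this shape visits 33,554,432 pixels, reading 128 MiB and writing 96 MiB, which may help explain why the conversion is visible mainly for the large JPG.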

Thanks for the investigation. Can we do the conversion in createImageBitmap()? My understanding is that ImageBitmap upload with texImage2D() should be fast because the data is asynchronously decoded and ready for uploading beforehand, so we can skip any image data handling in texImage2D(). I think createImageBitmap() is the best place to convert unless there are any problems.

(In reply to Takahiro Aoyagi (:takahirox) from comment #11)

Agree. We should figure out why ImageBitmapData from createImageBitmap() is BGRX8 and check what the format of Image is.

For Image, we also do the conversion (WebGL perf warning: texImage2D: Conversion requires pixel reformatting. (26->15)). I will add a profiler in Gecko to figure out where the bottleneck is.

On my Windows PC, Chrome seems to have become slower recently. Firefox gets the same profiling results but is now faster than Chrome.

Chrome, Image
Texture uploading time [ms] 72.57
image / min / max / avg / count
8192x4096 JPG 4.4MB / 433.48 / 436.37 / 434.56 / 7
2048x2048 PNG 4.5MB / 71.53 / 72.57 / 72.03 / 7

Chrome, ImageBitmap
Texture uploading time [ms] 16.55
image / min / max / avg / count
8192x4096 JPG 4.4MB / 210.75 / 257.78 / 222.22 / 6
2048x2048 PNG 4.5MB / 16.55 / 17.24 / 16.85 / 6

Nightly, Image
Texture uploading time [ms] 16.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 151.00 / 193.00 / 164.00 / 5
2048x2048 PNG 4.5MB / 16.00 / 17.00 / 16.80 / 5

Nightly, ImageBitmap
Texture uploading time [ms] 17.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 193.00 / 194.00 / 193.40 / 5
2048x2048 PNG 4.5MB / 14.00 / 17.00 / 15.60 / 5

You might also try uploading as RGBA instead of RGB. RGB will often incur an unpack in the driver to RGBX anyways.

My GPU is an NVIDIA GTX 1080 Ti.

Chrome [RGB], ImageBitmap
Texture uploading time [ms] 254.64
image / min / max / avg / count
8192x4096 JPG 4.4MB / 228.09 / 254.86 / 249.28 / 5
JS at rAF time spent / 0.00 / 1.00 / 0.51 / 5

Chrome [RGBA], ImageBitmap
Texture uploading time [ms] 123.39
image / min / max / avg / count
8192x4096 JPG 4.4MB / 122.59 / 123.97 / 123.29 / 13
JS at rAF time spent / 0.00 / 1.00 / 0.13 / 5

Nightly [RGB], ImageBitmap
Texture uploading time [ms] 179.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 179.00 / 230.00 / 211.22 / 9
JS at rAF time spent / 0.00 / 1.00 / 0.40 / 9

WebGLContext::TexImage elapsed time 155.694404
->TexOrSubImage::TexUnpackSurface avg. elapsed time 141.756672
-->Conversion TexOrSubImage::TexUnpackSurface avg. elapsed time 41.727599
-->Driver TexOrSubImage::TexUnpackSurface avg. elapsed time 100.029205

Nightly [RGBA], ImageBitmap
Texture uploading time [ms] 127.00
image / min / max / avg / count
8192x4096 JPG 4.4MB / 125.00 / 138.00 / 128.89 / 9
JS at rAF time spent / 0.00 / 1.00 / 0.17 / 9

WebGLContext::TexImage elapsed time 80.424603
->TexOrSubImage::TexUnpackSurface avg. elapsed time 68.367779
-->Conversion TexOrSubImage::TexUnpackSurface avg. elapsed time 52.362530
-->Driver TexOrSubImage::TexUnpackSurface avg. elapsed time 16.005117

Texture uploading time includes "JS interpreter time" + "WebGLContext::TexImage()" + "CPU wait for GPU time".

So, according to these stats, when we use an RGB internalFormat, more time is spent in the GPU driver calls, and the CPU also has to wait for the GPU to unpack the pixels.
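
As a rough check of that reading, the sketch below computes how much of the TexUnpackSurface time falls in the driver call, using the Nightly averages reported above for the 8192x4096 JPG. The object and function names are hypothetical. The values are copied from the profile as reported.

```javascript
// Averages reported above for the 8192x4096 JPG on Nightly.
const unpackTimings = {
  RGB: { conversion: 41.727599, driver: 100.029205 },
  RGBA: { conversion: 52.36253, driver: 16.005117 },
};

// Fraction of (conversion + driver) time that is spent in the driver call.
function driverShare({ conversion, driver }) {
  return driver / (conversion + driver);
}

for (const [format, timing] of Object.entries(unpackTimings)) {
  const percent = (driverShare(timing) * 100).toFixed(0);
  console.log(`${format}: driver share ${percent}%`);
}
```

With these numbers the driver call accounts for roughly 71% of the unpack time for RGB and roughly 23% for RGBA, while the conversion step itself is not cheaper for RGBA (about 52 versus about 42).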

Cool, it's encouraging that this matches my prediction about RGBA8 vs RGB8. Please continue to prefer RGBA8 formats over RGB8.

Severity: normal → S3

So, any luck on getting some progress on this issue? The lag is very annoying (my code is similar, works fine on Chrome and Safari, even on mobile).
