Open Bug 1729328 Opened 3 years ago Updated 3 years ago

Avoid copying images row by row in texture uploads

Status:

NEW

People

(Reporter: nical, Assigned: nical)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

WIP: Bug 1729328 - texture upload expeirment. 3 years ago Nicolas Silva [:nical] 48 bytes, text/x-phabricator-request

Nicolas Silva [:nical]

Assignee

Description

3 years ago

We see a lot of CPU time spent copying from the cache item into the staging buffer, and suspect that part of it comes from copying row by row instead of doing one large memcpy for the whole image.
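To make the distinction concrete, here is a minimal sketch of the two copy strategies. It is not the WebRender code: the function name `copy_image` and its parameters are hypothetical, and it assumes strides are expressed in bytes. A single large copy is only possible when both source and destination rows are tightly packed; otherwise the copy has to go row by row.

```rust
/// Copies `height` rows of `row_bytes` bytes each from `src` into `dst`.
/// `src_stride` and `dst_stride` are the distances in bytes between the
/// starts of consecutive rows. All names here are hypothetical.
fn copy_image(
    src: &[u8],
    src_stride: usize,
    dst: &mut [u8],
    dst_stride: usize,
    row_bytes: usize,
    height: usize,
) {
    assert!(row_bytes <= src_stride && row_bytes <= dst_stride);
    if height == 0 || row_bytes == 0 {
        return;
    }
    if src_stride == row_bytes && dst_stride == row_bytes {
        // Both sides are tightly packed: one large copy for the whole image.
        let len = row_bytes * height;
        dst[..len].copy_from_slice(&src[..len]);
        return;
    }
    // Rows are padded on at least one side: fall back to one copy per row.
    for y in 0..height {
        let s = y * src_stride;
        let d = y * dst_stride;
        dst[d..d + row_bytes].copy_from_slice(&src[s..s + row_bytes]);
    }
}
```

With a 4-byte-wide, 4-row image, `copy_image(&src, 4, &mut dst, 4, 4, 4)` takes the single-copy path, while a destination stride of 8 forces the per-row loop.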

Nicolas Silva [:nical]

Assignee

Comment 1

3 years ago

Attached file WIP: Bug 1729328 - texture upload expeirment.

Nicolas Silva [:nical]

Assignee

Comment 2

3 years ago

As far as I could measure, packing all items linearly and unpacking them in a shader only has a modest impact on the time spent copying into the staging buffer (6% improvement in the copy time at best). It can remove 1 ms in a bad frame, which isn't bad, but I was hoping for a better speedup.

Storing the images contiguously in the staging buffer opens the door to potentially rasterizing blobs directly into it (for the glTexSubImage code path we use on Windows). However, that means more risk of false sharing of cache lines, since the blob tiles are rasterized in parallel.
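As an illustration of linear packing, the sketch below stores images back to back and records the byte offset of each one. Everything in it is hypothetical (`LinearStaging`, `push`, the 64-byte `CACHE_LINE` constant). Rounding each image's start up to a cache-line boundary is one common way to reduce false sharing between parallel writers; the bug does not say this is what the patch does, and it only helps if the buffer's base address is itself cache-line aligned.

```rust
/// Assumed cache line size; 64 bytes is common but hardware-dependent.
const CACHE_LINE: usize = 64;

/// Rounds `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical linear staging buffer: images are stored contiguously,
/// each one tightly packed, instead of being placed in a 2D atlas.
struct LinearStaging {
    data: Vec<u8>,
}

impl LinearStaging {
    fn new() -> Self {
        LinearStaging { data: Vec::new() }
    }

    /// Appends tightly packed pixels and returns the byte offset at which
    /// they start. The offset is what a copy shader would need in order to
    /// find the image again.
    fn push(&mut self, pixels: &[u8]) -> usize {
        // Pad with zeros so the image starts on a cache-line multiple,
        // relative to the start of the buffer.
        let offset = align_up(self.data.len(), CACHE_LINE);
        self.data.resize(offset, 0);
        self.data.extend_from_slice(pixels);
        offset
    }
}
```

The padding trades a little staging memory for keeping two images from sharing a cache line, which matters only if several threads write into the buffer at once.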

Nicolas Silva [:nical]

Assignee

Comment 3

3 years ago

The worse data locality doesn't appear to hurt the copy shader GPU time (or it's compensated by the simplicity of the shader). In RenderDoc there is no observable time difference between the two.

Nicolas Silva [:nical]

Assignee

Comment 4

3 years ago

Using larger staging textures (1024x1024 instead of 512x512) doesn't affect the copy time, but the number of draw calls required to do the GPU copy goes from 40-ish to 15-ish when testing with the bottom of https://creativecluster.lu/ (it has a large animated blob).

Nicolas Silva [:nical]

Assignee

Comment 5

3 years ago

On Linux+Intel, using bigger blob tiles (512x512 instead of 256x256) and uploading them directly instead of going through the batched upload path makes a pretty large difference (on the creativecluster test case, total cache update time goes from avg 18.2, max 56.4 to avg 8.9, max 27.3).

Bigger blob tiles mean coarser invalidation granularity, but they also mean less per-tile overhead during rasterization, so it would be a tradeoff.
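The tradeoff can be put into numbers with a small sketch. The helpers below are hypothetical and simplified: they assume square tiles, a non-empty dirty rectangle, and that every tile touched by the dirty rectangle is re-rasterized in full (ignoring clipping at the image edges).

```rust
/// Number of `tile_size` x `tile_size` tiles needed to cover an image.
fn tile_count(width: u32, height: u32, tile_size: u32) -> u32 {
    let cols = (width + tile_size - 1) / tile_size;
    let rows = (height + tile_size - 1) / tile_size;
    cols * rows
}

/// Pixels re-rasterized when the dirty rect [x0, x1) x [y0, y1)
/// invalidates every tile it touches. Requires x1 > x0 and y1 > y0.
fn invalidated_pixels(x0: u32, y0: u32, x1: u32, y1: u32, tile_size: u32) -> u32 {
    let first_col = x0 / tile_size;
    let last_col = (x1 - 1) / tile_size;
    let first_row = y0 / tile_size;
    let last_row = (y1 - 1) / tile_size;
    let tiles = (last_col - first_col + 1) * (last_row - first_row + 1);
    tiles * tile_size * tile_size
}
```

For a 2048x1024 blob, `tile_count` gives 32 tiles at 256 but only 8 at 512, so the per-tile overhead is paid four times less often. In exchange, a small 10x10 dirty rectangle invalidates 65536 pixels with 256 tiles and 262144 pixels with 512 tiles.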

Nicolas Silva [:nical]

Assignee

Comment 6

3 years ago

I added a code path that uploads directly from the image buffer into the staging texture (skipping the staging CPU buffer) when the image is large enough that we are unlikely to fit another one in the staging CPU buffer. With that and the blob tile size set to 512, the memcpy time more or less goes away. However, the time we spend in glTexSubImage2D increases a lot on Windows+Intel (16ms to 22ms average on the creativecluster test case), which is odd since it should be doing exactly the same thing (except reading from a different source). The time is spent in WaitForSynchronizationObjectForCpu under UpdateSubResource.
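The choice between the two paths could look something like the sketch below. The exact condition used in the patch is not stated here, so the threshold is an assumption: an image that takes more than half of the staging CPU buffer leaves no room for a second image of similar size, and is uploaded directly. `UploadPath` and `choose_upload_path` are hypothetical names.

```rust
/// Which route an image takes to reach the staging texture.
#[derive(Debug, PartialEq)]
enum UploadPath {
    /// Copy into the staging CPU buffer first, then upload in a batch.
    Staged,
    /// Upload straight from the image buffer, skipping the CPU copy.
    Direct,
}

/// Assumed heuristic, not the patch's actual condition: go direct when the
/// image uses more than half of the staging CPU buffer's capacity.
fn choose_upload_path(image_bytes: usize, staging_capacity: usize) -> UploadPath {
    if image_bytes * 2 > staging_capacity {
        UploadPath::Direct
    } else {
        UploadPath::Staged
    }
}
```

With a 4 MiB staging buffer (1024x1024 at 4 bytes per pixel), a 512x512 tile (1 MiB) would still be staged under this assumed threshold, while a 1024x768 image (3 MiB) would be uploaded directly.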

Nicolas Silva [:nical]

Assignee

Updated

3 years ago

Categories

(Core :: Graphics: WebRender, enhancement, P3)

Security

(public)
