Driver stalls with PBO texture uploads on Adreno
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox73 | --- | fixed |
People
(Reporter: jnicol, Assigned: jnicol)
References
Details
Attachments
(1 file)
Here is a profile of zooming in and out of a text-heavy page: https://perfht.ml/2XxUGaD
20% of time is spent in TextureUploader::upload()
, most of which is in ioctls talking to the driver, and very little copying the actual data.
I believe that the driver is stalling when we attempt to write data to/reallocate the buffer, until the texture upload of its previous data has completed. Our calls to glBufferData()
should be convincing the driver to allocate a new buffer rather than reuse the previous one, but that doesn't seem to work on Adreno.
If I force the driver to create a new PBO rather than reusing the old one, by using glGenBuffers()
for each upload, then most of the ioctl time goes away: https://perfht.ml/35oLN5P
Assignee | ||
Comment 1•5 years ago
•
|
||
I'm not sure my description of what's happening in comment 0 is entirely correct. It certainly does help, but on really bad sites (such as the daily mail, in desktop mode, as mentioned in bug 1599248) the upload time is still massive.
While experimenting, I tried using a single TextureUploader
for all uploads to each texture, rather than a separate one per upload. This effectively means we call glBufferData once per frame, with a large size, rather than tens or even hundreds of times. The driver seems to handle this much better! Like, even if I disable the raster-space rounding while zooming, meaning we rerasterize and reupload glyphs every frame, it's still snappy!
Assignee | ||
Comment 2•5 years ago
|
||
Uploading texture data is showing up frequently in profiles on Adreno devices,
especially when zooming on a text-heavy page. Specifically, the time is spent in
glMapBufferRange and glBufferSubData, most likely when internally allocating the
buffer before transferring data in to it.
Currently, we are creating a new PBO, by calling glBufferData(), for each
individual upload region. This change makes it so that we calculate the required
size for all upload regions to a texture, then create single a PBO of the
required size. The entire buffer is then mapped only once, and each individual
upload chunk is written to it. This can require the driver to allocate a large
buffer, sometimes multiple megabytes in size. However, it handles this case much
better than allocating tens or even hundreds of smaller buffers.
An upload chunk may require more space in a PBO than the original CPU-side
buffer, so that the data is aligned correctly for performance or correctness
reasons. Therefore it is the caller of Device.upload_texture()'s responsibility
to call a new function, Device.required_upload_size(), to calculate the required
size beforehand.
Comment 4•5 years ago
|
||
bugherder |
Comment 5•5 years ago
|
||
Backed out on autoland as requested by jnicol: https://hg.mozilla.org/integration/autoland/rev/ee4d1ff71f58952162907aa41ef891f6eedea58b
Comment 6•5 years ago
|
||
Jamie, can we re-land it in a way that is run-time configurable? I.e. on Android use one big PBO, on other platforms just keep creating a new PBO per upload.
Assignee | ||
Comment 7•5 years ago
|
||
That is my plan for the short-term, yes
Updated•5 years ago
|
Comment 9•5 years ago
|
||
bugherder |
Description
•