1720169 - [Bug]: Weird rendering on android 5 (Adreno 4xx)

Reporter

Description

•

3 years ago

From github: https://github.com/mozilla-mobile/fenix/issues/20323.

Steps to reproduce

Open a page in fexin on android 5

Expected behaviour

All things should render smoothly

Actual behaviour

There are some weird render result such as square and triangles at the edge of round elements

Device name

Sony Xperia

Android version

Android 5

Firefox release type

Firefox

Firefox version

89.1.1

Device logs

No response

Additional information

No response

Change performed by the Move to Bugzilla add-on.

Kevin Brosnan [Ex-Mozilla]

Reporter

Updated

•

3 years ago

Comment 1

•

3 years ago

Pretty sure this is the same bug - looks the same, and same GPU and android version

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → DUPLICATE

Jamie Nicol [:jnicol]

Assignee

Comment 2

•

3 years ago

•

Edited

In bug 1714227 I fixed one of (at least) 2 issues causing the rendering artefacts. I'll re-open this bug to use it for the 2nd, which is can be seen in the first screenshot on the github issue linked to this bug. The blue and black diagonal lines across the screen shown instead of some glyphs.

It appears to be due to corrupted vertex data. I am unsure why, but the driver appears not to like our current method of orphaning (calling glBufferData) the instance data buffer prior to every draw call. This means we are repeatedly asking the driver to allocate new buffers of varying sizes, which for some reason leads to this corruption. I have noticed that by always using a fixed-size buffer the issue goes away. However, that would mean either always allocating very large buffers despite most draw calls requiring small amounts of data, or having to break batches in to much smaller calls so that the data can fit in the fixed size, neither of which are ideal.

Instead we should always allocate a large fixed size buffer, and where possible re-use the same buffer for as many draw calls as can fit in it. We have investigated doing this before for performance reasons, as many drivers struggle with the overheads of allocating and managing many small buffers (rather than actually transfering data in to said buffers). We thought of using a "base instance", or SSBOs, but due to lack of platform support we did not do this. However, what we missed at the time was that we can simply rebind the the buffer at a specified offset prior to each draw call, by using glVertexAttribPointer's offset parameter.

(As a follow up, we can even use the newer set of functions glVertexAttribFormat, glBindVertexBuffer, etc where supported, potentially saving the driver some work.)

The simple solution would be to always use the same VBO ID, and re-allocate the buffer when full by orphaning it. However this still results in glitches albeit different looking ones, presumably due to the same driver bug when recycling orphaned buffers. Instead we must handle the recycling ourselves by generating new VBO IDs when required, and only reusing an existing allocated buffer when it is no longer in use (eg by waiting 3 frames).

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Jamie Nicol [:jnicol]

Assignee

Updated

•

3 years ago

Blocks: wr-adreno4xx

Summary: [Bug]: Weird rendering on android 5 → [Bug]: Weird rendering on android 5 (Adreno 4xx)

Jamie Nicol [:jnicol]

Assignee

Comment 3

•

3 years ago

Attached file Bug 1720169 - Pack instance data for multiple draw calls in to large fixed-size VBOs. r?#gfx-reviewers — Details

Currently each VAO owns its own VBO for per-instance array data. Prior
to submitting each draw call that instance buffer is orphaned and a
new one allocated at exactly the right size to contain the batch's
data. This frequent reallocation appears to cause a rendering glitch
on some Adreno 4xx devices, likely due to a bug in the driver's
internal allocation/recycling code. It can also be expensive on some
devices, as the drivers struggle with the overheads of allocating many
small buffers much more than the cost of actually transferring the
data.

This patch makes it so that the VAOs no longer own an instance buffer
each. Instead, they share a pool of buffers. This pool always
allocates buffers large enough to hold the largest batch we
allow. update_vao_instances() now writes data to the pool's current
buffer until that buffer is full, and then either an old one will be
recycled or a new one allocated. After writing to the buffer it must
be bound to the VAO at the correct offset, and then the draw call is
submitted.

By manually managing the buffer lifecycles ourselves we avoid the
glitches on Adreno 4xx, and also improve performance on some devices.

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → jnicol

Pulsebot

Comment 4

•

3 years ago

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9c24c2012186
Pack instance data for multiple draw calls in to large fixed-size VBOs. r=gfx-reviewers,nical

Natalia Csoregi [:nataliaCs]

Comment 5

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/9c24c2012186

Status: REOPENED → RESOLVED

Closed: 3 years ago → 3 years ago

status-firefox92: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 92 Branch

Alexandru Michis [:malexandru]

Comment 6

•

3 years ago

Backed out changeset 9c24c2012186 (bug 1720169) for causing webrender crashes.
Backout link: https://hg.mozilla.org/integration/autoland/rev/a6fe7c532b7230ec8cb09bdfbfb4ca6fd6862463

Status: RESOLVED → REOPENED

status-firefox92: fixed → ---

Resolution: FIXED → ---

Target Milestone: 92 Branch → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 7

•

3 years ago

The change got backed out for crashes with [@ webrender::renderer::Renderer::draw_instanced_batch<T> ] (so far only reported for Windows 10), e.g. bp-3b93b929-ee4a-4a39-bd4d-af0b10210730. The backout will be part of the next merge to mozilla-central and the next Nightly.

Crash Signature: [@ webrender::renderer::Renderer::draw_instanced_batch<T> ]

Pulsebot

Comment 8

•

3 years ago

Backout by cbrindusan@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/b0f81186f7da
Backed out changeset 9c24c2012186 for causing webrender crashes.

Jamie Nicol [:jnicol]

Assignee

Comment 9

•

3 years ago

The stacktrace doesn't point to the exact location of the crash, but it's probably because I didn't add error handling to glMapBufferRange, so it sometimes returns null on failure and we attempt to write data to null.. Lots of the crash reports have graphicsCriticalErrors which indicate errors allocating textures, which makes this plausible.

We just need to avoid writing data to the ptr if the map fails. (The subsequent draw call will be garbage, but that would have been the case anyway before this patch if the corresponding glBufferData had failed.)

Pulsebot

Comment 10

•

3 years ago

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b97f1ccfbb1f
Pack instance data for multiple draw calls in to large fixed-size VBOs. r=gfx-reviewers,nical

Marian-Vasile Laza

Comment 11

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b97f1ccfbb1f

Status: REOPENED → RESOLVED

Closed: 3 years ago → 3 years ago

status-firefox92: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 92 Branch

Andra Esanu (needinfo me)

Updated

•

3 years ago

Regressions: 1724216

Jamie Nicol [:jnicol]

Assignee

Updated

•

3 years ago

Regressions: 1724678

Cristian Brindusan [:cbrindusan]

Comment 12

•

3 years ago

Backed out changeset b97f1ccfbb1f (Bug 1720169) as requested on irc by jnicol for causing performance regressions (Bug 1724678, Bug 1724216).
https://hg.mozilla.org/integration/autoland/rev/8301e0b76d27e3bab2d0026c15a50594e27f31b1

Status: RESOLVED → REOPENED

status-firefox92: fixed → ---

Resolution: FIXED → ---

Target Milestone: 92 Branch → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 13

•

3 years ago

uplift

Backed out from for causing performance regressions (bug 1724678, bug 1724216):
beta https://hg.mozilla.org/releases/mozilla-beta/rev/4da80584e859

Iulian Moraru

Comment 14

•

3 years ago

Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/8301e0b76d27

Jamie Nicol [:jnicol]

Assignee

Updated

•

3 years ago

Regressions: 1724728

BugBot [:suhaib / :marco/ :calixte]

Comment 15

•

3 years ago

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:jnicol, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(jnicol)

Nicolas Silva [:nical]

Updated

•

3 years ago

Flags: needinfo?(nical.bugzilla)

Jamie Nicol [:jnicol]

Assignee

Comment 16

•

2 years ago

The patch caused regressions and I haven't yet figured out why. We do want to fix this bug but it only affects a limited number of devices so hasn't been my highest priority.

Flags: needinfo?(jnicol)

Bob Hood [:bhood]

Updated

•

2 years ago

Severity: -- → S3