Closed Bug 727688 Opened 8 years ago Closed 8 years ago

OMTC: Tearing with first few tiles

Categories

(Core :: Graphics, defect)

ARM
Android
defect
Not set

Tracking

RESOLVED FIXED
mozilla14
Tracking Status
blocking-fennec1.0 --- beta+

People

(Reporter: pcwalton, Assigned: pcwalton)

There's some pretty severe tearing with the first few tiles on the screen when panning around. It results in the top of the screen looking like it moves faster or slower than the rest of the page.

Things I've found:

* There's still a little tearing even when not panning, if animations are occurring.
* Disabling the transform on async scrolling does not help.
* We often change the viewport on the Java side in the middle of texture upload, but I don't know why that would affect things.

I'm starting to wonder if we're out of sync with vsync, but that doesn't really explain why it looks like we're tearing on tile boundaries.
The tile boundaries are always the same, so this isn't an issue with tile boundaries. It is also not a race between composition and tile texture upload.
It is also not a race between Gecko and texture tile upload, because Gecko is blocked while tile texture upload is happening.
Looks like it's texture tile upload racing with something. Changing tile texture upload to start at tile 8, loop to the end, and then process tiles 0-7 made the broken tiles start in the middle of the screen.
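A minimal sketch of the reordering experiment described above, assuming tiles are addressed by a flat index. The function name `RotatedUploadOrder` and the tile count used in the note below are hypothetical and not taken from the actual OMTC code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the order in which tile indices are visited when upload starts at
// `start`, runs to the last tile, and then wraps around to tiles 0..start-1.
// Hypothetical helper: the real upload loop is not shown in this bug.
std::vector<std::size_t> RotatedUploadOrder(std::size_t tileCount,
                                            std::size_t start) {
  // The starting tile must exist (unless there are no tiles at all).
  assert(tileCount == 0 || start < tileCount);
  std::vector<std::size_t> order;
  order.reserve(tileCount);
  for (std::size_t i = 0; i < tileCount; ++i) {
    order.push_back((start + i) % tileCount);
  }
  return order;
}
```

With 12 tiles and `start = 8`, the visiting order is 8, 9, 10, 11, 0, 1, ..., 7. If the broken tiles follow the first entries of this order rather than the top of the screen, the fault tracks upload order and not draw order.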
Does the main thread stay blocked during upload?  Is this happening on a multicore phone?
Yes, the intention of the current OMTC implementation is that Gecko stays blocked during upload, and testing seems to indicate that it does. This is a multicore device, although it happens on single-core devices as well.

I tried adding hashing of the bits during texture upload to make sure Gecko wasn't scribbling in the buffer during upload, but it made the problem go away instead. Still, it's evidence that it isn't Gecko writing to the buffer.
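The hashing check could look roughly like the sketch below: hash the tile buffer, run the upload, hash again, and compare. The names `HashBytes` and `BufferStableDuringUpload` are hypothetical, the upload is passed in as a callback so the example stays self-contained, and FNV-1a is just one convenient hash (the bug does not say which one was used):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// 64-bit FNV-1a over a byte buffer.
std::uint64_t HashBytes(const std::uint8_t* data, std::size_t length) {
  assert(data != nullptr || length == 0);
  std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (std::size_t i = 0; i < length; ++i) {
    hash ^= data[i];
    hash *= 0x100000001b3ULL;  // FNV prime
  }
  return hash;
}

// Runs `upload` and reports whether `buffer` hashed to the same value before
// and after it. `upload` stands in for the real texture upload call.
bool BufferStableDuringUpload(const std::vector<std::uint8_t>& buffer,
                              const std::function<void()>& upload) {
  const std::uint64_t before = HashBytes(buffer.data(), buffer.size());
  upload();
  const std::uint64_t after = HashBytes(buffer.data(), buffer.size());
  return before == after;
}
```

As noted above, the extra CPU work of hashing changes the timing, so a check like this can hide the very race it is meant to detect.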
OK.  Unexpected buffer swapping sounds most plausible.  Does any of the Java code do that?  You could try tracing eglSwapBuffers().
Traced eglSwapBuffers(). Nothing out of the ordinary. Commented out Gecko's use of eglSwapBuffers() and nothing appears on the screen, as expected, indicating that nothing else seems to be calling it from behind our backs. (That doesn't rule out something swapping the buffers at a lower level than we do, though.)
The Android widget code (inside Android itself, not our widget/android) might be triggering a buffer swap in SurfaceFlinger at the end of some event.  Are we lying to Android for the sake of OMTC?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> The Android widget code (inside Android itself, not our widget/android)
> might be triggering a buffer swap in SurfaceFlinger at the end of some event.
> Are we lying to Android for the sake of OMTC?

No, we aren't lying to Android in any way I can see. We basically do things the same way GLSurfaceView.java does them. However, it's certainly a possibility that something is switching the native buffers. I can test this theory tomorrow though, by seeing whether the native buffers have changed.

Note that disabling the Java compositor integration reduces the tearing significantly, although it still exists.
Wait, I'm not sure that the symptoms reflect buffers getting swapped out from beneath us. The problem is that the wrong texture data is bound when we actually go to do the draw. The only way switching buffers could be messing us up is that we change buffers while drawing... but that doesn't mesh with the evidence, because changing the order of *upload* (not of drawing) changes the tearing pattern.

My initial thought was that somehow texture upload was getting interleaved with drawing, but logging doesn't show this, and calling glFinish() before upload doesn't fix it either.
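For reference, the interleaving check on the logs amounts to something like the sketch below. The `Event` enum and `HasInterleaving` are hypothetical names, and the log is assumed to be a flat sequence of begin/end markers recorded on the CPU side:

```cpp
#include <cassert>
#include <vector>

// Hypothetical log markers; the real logging format is not shown in this bug.
enum class Event { UploadBegin, UploadEnd, DrawBegin, DrawEnd };

// Returns true if a draw begins while an upload is still open, or an upload
// begins while a draw is still open.
bool HasInterleaving(const std::vector<Event>& log) {
  bool uploading = false;
  bool drawing = false;
  for (Event e : log) {
    switch (e) {
      case Event::UploadBegin:
        if (drawing) return true;
        uploading = true;
        break;
      case Event::UploadEnd:
        assert(uploading);  // an end marker needs a matching begin
        uploading = false;
        break;
      case Event::DrawBegin:
        if (uploading) return true;
        drawing = true;
        break;
      case Event::DrawEnd:
        assert(drawing);
        drawing = false;
        break;
    }
  }
  return false;
}
```

A check like this only sees the order in which calls were issued on the CPU. It says nothing about the order in which the GPU or driver actually executes them, which is consistent with clean logs and visible tearing.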

So I'm basically kinda stumped here.
Oh ok.  I forgot that we upload in a separate transaction from first draw after that.

One way to test that hypothesis is to allocate a new texture on each upload.  That might change the timing and make this go away incidentally.  Probably want to check some timestamps.  But if racy upload is the bug ... jeez.  Implications are not good.
Assignee: nobody → pwalton
This seems to only be an issue on SGX540. Currently planning to just disable tiling on that GPU.
Blacklisted tiling on that driver, so closing this.
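A blacklist of this kind is typically keyed on the renderer string reported by the driver. The sketch below is an illustration only, not the patch that landed: `ShouldEnableTiling` is a hypothetical name, and matching on the substring "SGX 540" is an assumption about how the driver identifies itself in `GL_RENDERER`:

```cpp
#include <cassert>
#include <string>

// Decides whether tiling should be enabled for a given GL_RENDERER string.
// Assumption: affected devices report a renderer containing "SGX 540".
bool ShouldEnableTiling(const std::string& glRenderer) {
  // The caller is expected to pass the string obtained from
  // glGetString(GL_RENDERER) on a current context, so it should not be empty.
  assert(!glRenderer.empty());
  return glRenderer.find("SGX 540") == std::string::npos;
}
```

Under that assumption, a renderer string such as "PowerVR SGX 540" disables tiling, while any renderer without the substring keeps it enabled.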
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
blocking-fennec1.0: --- → beta+
Depends on: 743314
Target Milestone: --- → mozilla14