Closed Bug 1075265 Opened 10 years ago Closed 21 days ago

Flame 2.1: blank screen and "mdp3_ppp_verify_scale: y req scale factor beyond capability" error

Categories

(Core :: Graphics: Layers, defect)

x86
macOS
defect

Tracking

()

RESOLVED INVALID

People

(Reporter: djf, Unassigned)

References

Details

I was able to work around bug 1074262 by adding a will-change:opacity to the CSS, but there is still something going on here that should be investigated. So I'm filing this bug as a follow-up. Just kind of guessing for the component.

See bug 1074262 for STR. You'll probably have to unapply the patch in that bug in order to reproduce. You can also watch the video attached to that bug.

This only seems to happen on:

  Flame
  v2.1 gecko (but gaia can be 2.1 or 2.2)
  319mb memory
  when trying to play or display a frame of a 720x480 video
  when there are opaque overlays without will-change:opacity
  video recorded in portrait mode
  video being played in portrait mode.

The symptom of the bug is that the screen goes all white (or whatever the background color of the app is) and we see these messages in the logcat:

I/cat     (  228): <3>[ 1883.115369] mdp3_ppp_verify_scale: y req scale factor beyond capability
I/cat     (  228): <3>[ 1883.121412] mdp3_ppp_blit: invalid image!

If the video is playing, then these messages are repeated for each frame of the video.
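
For anyone trying to reproduce, a small sketch of how the messages above could be picked out of a logcat dump. The helper names `mdp3_filter` and `mdp3_scale_errors` are made up for illustration, and the only assumption is that the kernel lines contain the same strings as the two lines quoted above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of adb or gaia).

# Keep only lines that mention the mdp3 PPP driver.
mdp3_filter() {
    grep 'mdp3_ppp_'
}

# Count how many times the scale check was rejected in a saved log.
mdp3_scale_errors() {
    grep -c 'mdp3_ppp_verify_scale: y req scale factor beyond capability'
}

# Possible usage with a device attached (assumes adb is on PATH):
#   adb logcat -d > log.txt
#   mdp3_filter < log.txt
#   mdp3_scale_errors < log.txt
```

If the count keeps growing while a video plays, that would match the once-per-frame repetition described above.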

It is not just the video that doesn't display. It is not just the video playback controls that don't display. Everything except the background color is just gone, so it looks to me like a complete failure of the layer tree or compositing pipeline or something.
Timothy: this is a followup to bug 1074262, and since you were needinfo'ed to look at that bug, I'm setting needinfo here as well.
Flags: needinfo?(tnikkel)
[Blocking Requested - why for this release]: Bug 1074272 was a 2.1+ bug. I found a gaia workaround, but there is obviously still a serious graphics bug lurking in the 2.1 version of gecko, so I suggest that we block on this.
blocking-b2g: --- → 2.1?
In bug 1074262, QA suggested that this bug was related to bug 1051636. I haven't looked closely at that bug, but am not convinced that it is relevant. As Milan noted, the fix for 1051636 had been uplifted to Aurora. Also, that bug may have been windows-specific. And this bug seems to be causing (or caused by) a failure in the mdp3 driver, which I gather is an Android thing.
(In reply to David Flanagan [:djf] from comment #3)
> In bug 1074262, QA suggested that this bug was related to bug 1051636. I
> haven't looked closely at that bug, but am not convinced that it is
> relevant. As Milan noted, the fix for 1051636 had been uplifted to Aurora.
> Also, that bug may have been windows-specific. And this bug seems to be
> causing (or caused by) a failure in the mdp3 driver, which I gather is an
> Android thing.

See bug 1074262, comment 32. Specifically, the fix for bug 1051636 is not on aurora for b2g. There is nothing Windows-specific about the fix for bug 1051636; it just happened to have been observed first on Windows. Since bug 1051636 fixed this on central, I'm guessing it will fix it on aurora as well. Can we get someone to verify that the patch from bug 1051636, when actually applied to aurora (and not ifdef'd out), does indeed fix the problem?
Flags: needinfo?(tnikkel) → needinfo?(dflanagan)
Timothy,

Bug 1051636 seemed to be about layer opacity. But the error messages associated with this bug seem to be about an incorrect scaling factor and are related to video size and available memory. So I would guess that they are unrelated bugs. But I know nothing about the area, and if the "scale factor beyond capability" error seems plausibly related to the issues in bug 1051636 to you, then I'm happy to treat this as a dupe of that bug.

As for the verification you've proposed, does https://bugzilla.mozilla.org/show_bug.cgi?id=1074262#c24 suffice?  Not sure if a regression window was a precise enough verification for you.
Flags: needinfo?(dflanagan) → needinfo?(tnikkel)
(In reply to David Flanagan [:djf] from comment #5)
> Timothy,
> 
> Bug 1051636 seemed to be about layer opacity. But the error messages
> associated with this bug seem to be about an incorrect scaling factor and
> are related to video size and available memory. So I would guess that they
> are unrelated bugs.  But I know nothing about the area, and if the "scale
> factor beyond capability" error seems plausibly related to the issues in bug
> 1051636 to you, then I'm happy to treat this as a dupe of that bug.

Is this bug about the error msgs or the user visible problems? Either way, we should try with and without the patch to see exactly what it fixes to know for sure, otherwise it's just speculation.

> As for the verification you've proposed, does
> https://bugzilla.mozilla.org/show_bug.cgi?id=1074262#c24 suffice?  Not sure
> if a regression window was a precise enough verification for you.

There are two bugs in that window, so it's possible the other one could be responsible.
Flags: needinfo?(tnikkel)
qawanted for branch checks to see if regression from 2.0 or before.
Keywords: qawanted
Dietrich - I'm not sure about doing a branch check here - this bug is a follow-up bug from another bug and is technically fixed but David wanted to do some follow-up investigation. 

To repro we would have to unapply a patch and then the bug would repro in the same areas as the prior bug - which is: 
2.0 - unaffected
2.1 affected  (also verified fixed now)
2.2 affected  (also verified fixed now)


If you feel I'm incorrect in this, please re-tag and we'll look into this unapply-a-patch process.
Keywords: qawanted
(In reply to David Flanagan [:djf] from comment #2)
> [Blocking Requested - why for this release]: Bug 1074272 was a 2.1+ bug. I
> found a gaia workaround, but there is obviously still a serious graphics bug
> lurking in the 2.1 version of gecko, so I suggest that we block on this.

Looks like the workaround exists in 2.1. I would think that is the best way to go given the risk/reward, and that we should explore the proper fix on 2.2. So I'm not sure why we would block here.
Flags: needinfo?(dflanagan)
Looks like I never got any buy-in on my assertion that this is a "serious graphics bug" and I guess I agree that it is too late to block on this. I still wish we could get someone to investigate.
blocking-b2g: 2.1? → ---
Flags: needinfo?(dflanagan)
Random thoughts...

Based on the source in https://android.googlesource.com/kernel/msm/+/android-msm-hammerhead-3.4-kk-r1/drivers/video/msm/mdss/mdp3_ppp.c David mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1074262#c7, the error message in comment 0 essentially means that the ratio of source and destination height is larger than 8 (in one "direction" s/d > 8 or d/s > 8.)

Given https://bugzilla.mozilla.org/show_bug.cgi?id=1074262#c18, which mentions that reducing the height of the recorded video (source, I presume) makes the problem go away, it stands to reason that the branch that fails is the one where source / destination > 8, which would suggest that the destination is < 80 and >= 44 if the source at 640 fails and at 352 succeeds.  Now, the video bar appears to be 64px high, so that does fit into that range, but none of this really makes sense :)
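
The arithmetic above can be checked with a small sketch. It assumes the limit really is a factor of 8 in either direction, as described in this comment (the value is not re-read from the driver source here), and `scale_ok` is a hypothetical helper, not a function in mdp3_ppp.c.

```shell
#!/bin/sh
# Hypothetical check mirroring the reasoning above: the blit is rejected when
# src/dst > 8 or dst/src > 8. The limit of 8 is taken from this comment and
# is an assumption, not a value read from the kernel source.
MAX_SCALE=8

scale_ok() {
    src=$1
    dst=$2
    # Integer-only comparison: src/dst > 8 is the same as src > 8*dst,
    # and likewise for dst/src.
    if [ "$src" -gt $((MAX_SCALE * dst)) ] || [ "$dst" -gt $((MAX_SCALE * src)) ]; then
        return 1
    fi
    return 0
}

# With the 64px destination height guessed above:
scale_ok 640 64 && echo "640 -> 64: ok" || echo "640 -> 64: beyond capability"
scale_ok 352 64 && echo "352 -> 64: ok" || echo "352 -> 64: beyond capability"
```

Under that assumption, a source height of 640 is rejected for any destination below 80, and a source height of 352 is accepted for any destination of 44 or more, which is the range given above.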

Benoit, thoughts?
Flags: needinfo?(bgirard)
I'm not familiar with this or with what triggers us to call into that code path. If you're seeing a difference with will-change then it's because the layer tree is different.

Setting will-change: opacity might just make you fall off HWC. Can you see if the trigger for the bug is whether we're hitting HWC or not? You can turn on the FPS counter; if the 3 counters disappear and the bug reproduces, then the problem has nothing to do with will-change and it just happens that it makes the page fall out of the HWC budget where the bug lies.

If that's the case then we shouldn't use HWC to render a layer tree that it cannot handle.
Flags: needinfo?(bgirard) → needinfo?(dflanagan)
This might be the same bug as bug 1085593. Not sure yet.
See Also: → 1085593
It's unlikely to be related to bug 1085593 at this point now that we have patches for them since they are about opacity and culling and this is about scaling.
I tried reproducing but I found the STR a bit ambiguous. Here's what I tried:
0) Used v2.1
1) I removed the will-change from video.css
2) Recorded a video in portrait mode. Slide down the rocket bar to get an opaque overlay.
3) Played back the video in portrait mode. Slide down the rocket bar to get an opaque overlay.

I was not able to reproduce the bug. Can you give me better STR?
This bug seems similar to Bug 1085593. Bug 1085593 is caused by a bug in the HwComposer HAL's implementation.
(In reply to Sotaro Ikeda [:sotaro] from comment #16)
> This bug seems similar to Bug 1085593. Bug 1085593 is caused by HwComposer
> hal's implementation  bug.

Right now bug 1085593 is tracking the problem of using blending with a complex visible region. This bug is tracking "y req scale factor beyond capability". I don't see this error when we hit the blending bug. I believe they are separate.
Benoit,

Sorry to take so long to get back to you.  I can still reproduce the bug where the playing video goes blank when I try to display the video controls over it. I no longer see an error in the logcat, however.  (I've updated my base image to v188 since I filed this bug, so I suppose that might cause some difference.)

STR:

1) Flash gaia and gecko onto your Flame from a 2.1 nightly build

2) Set the phone's memory to 319mb (this is necessary to force the camera to record in 720x480 mode. With more memory it will record a 1280x720 video and this bug will not reproduce in that case).
   adb reboot bootloader
   fastboot oem mem 319
   fastboot reboot

3) Check out v2.1 of the gaia tree, create a branch, revert the patch for bug 1074262, and push to the phone

  git checkout v2.1
  git pull upstream v2.1
  git checkout -b bug1075265
  git revert -m1 dc6d78742a0b327dce7be99245244f66db6f8279
  APP=video make install-gaia

4) Launch the camera app. Hold the phone in portrait mode and record a video.

5) Launch the video app. Hold the phone in portrait mode and play the video. It should briefly show playback controls, start playing, and automatically hide the controls.

6) Tap the screen: the controls appear, then the entire screen goes blank. This is the bug. Tap the screen again, the video comes back and the controls disappear.

It does not happen if you hold the phone in landscape mode, or if the video is a different size.
Flags: needinfo?(dflanagan) → needinfo?(bgirard)
I can duplicate this with a 2.2 nightly, if I push the modified video app from the comment above to the phone. So the underlying gecko issue still exists.
With the FPS counter on, when I play the video with no controls visible, there is no frame rate, so presumably we're on HWC. But when I tap the screen to show the controls, I fall off of HWC and a frame rate appears. Now, I don't get a completely blank screen but get flashing bars of static.

Note that the bug occurs even when the video is paused. Showing the controls makes the screen blank after a half second or so. Obviously there is no frame rate display in that case since nothing is changing.
Using the video app from master rather than the modified 2.1 app from above, I see the same thing: HWC when no controls are showing, and frame rate display (so non-HWC) when the controls are displayed.
I tried the above steps using master instead of 2.1, but I can't reproduce. I will try with 2.1 tomorrow.
Flags: needinfo?(bgirard)
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 21 days ago
Resolution: --- → INVALID