Closed Bug 1027100 Opened 10 years ago Closed 10 years ago

WebRTC LoadManager resolution reduction under load causing horizontal distortions

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jesup, Assigned: pkerr)

References

Details

(Whiteboard: p=5)

Attachments

(8 files, 3 obsolete files)

279.26 KB, image/png
Details
2.81 KB, patch
Details | Diff | Splinter Review
8.02 KB, patch
Details | Diff | Splinter Review
4.57 KB, patch
Details | Diff | Splinter Review
629 bytes, image/png
Details
4.13 KB, image/png
Details
198.22 KB, image/png
Details
3.03 KB, patch
Details | Diff | Splinter Review
See the attached image.  Whenever the load manager kicks in resolution reductions, it causes horizontal artifacts.  These seem tied to the data, and in the case of fake video (basically constant values for the entire image except the top couple of lines) the artifacts are very few, and appear in a regular pattern that implies a 'stride' issue perhaps.

Things tried an ruled out:
a) disabled ASM code in libyuv
b) disabled dedicated 3/4 scaler in libyuv
c) disabled 'box' filter in scaler (switch to no interpolation)

When examining the output of the decoder for a fake video instance (in pc_test.html):
a) I couldn't find anything the 'y' buffer that wasn't 0x80 other the the timecode at the start, b) the size output was 640x480(?), though I think I was getting FrameSizeChanged callbacks cutting the resolution. (I was getting them; i didn't verify for which side, but the fake video was showing artifacts)

Options are: libyuv (seems pretty unlikely now); chain between scale and encoder (unlikely), vp8 encoder when encoding 3/4*640x480 input (unlikely, possible), decoder (unlikely, possible per above), or chain between decoder and DeliverFrame()

Dumping frames in a readable state to disk at various points in the chain would help a lot.  An ugly uncompressed I420 file dumper that something (gimp?) can read would help a lot; there may be one somewhere in the tree esp. in the test code
Attached image loadmanager.png
I traced the frame size set calls through both the encode and decode paths, and the sizes appear to always be correct at each point in time through a stress cycle.

I ran several tests between Firefox and Chrome. If the Firefox side is stressed and goes into load adaptation the Chrome side will decode with artifacts. If the Chrome side of the call is stressed there are no decode artifacts produced by the Firefox side.

Working on dumping frames from the VideoConduit.
>If the Chrome side of the call is stressed there are no decode artifacts produced by the Firefox side.

Does Chrome do resolution adaption? (I don't think they have it enabled)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #4)
> >If the Chrome side of the call is stressed there are no decode artifacts produced by the Firefox side.
> 
> Does Chrome do resolution adaption? (I don't think they have it enabled)

It doesn't appear to change resolution. The test was run to complete the exercise.
Periodically dump received yuv frames to files using new PeriodicWriter class (look for FIXME(PRK)). Can handle multiple sample points.
Assignee: nobody → pkerr
Status: NEW → ASSIGNED
I am loading the files created by the PeriodicWriter into GIMP as: raw image data (file type), Planar RGB, manually set the image size based on the expected size: 640x480, 480x360, 320x240 as indicated by the file size. This works best for the fake video stream.

At my first test point, at the DeliverFrame() call in VideoConduit.cpp, the data is already corrupted with the pattern seen on the receive side once the frames are reduced from 640x480 to 480x360.
(In reply to Paul Kerr [:pkerr] from comment #7)
> I am loading the files created by the PeriodicWriter into GIMP as: raw image
> data (file type), Planar RGB, manually set the image size based on the
> expected size: 640x480, 480x360, 320x240 as indicated by the file size. This
> works best for the fake video stream.
> 
> At my first test point, at the DeliverFrame() call in VideoConduit.cpp, the
> data is already corrupted with the pattern seen on the receive side once the
> frames are reduced from 640x480 to 480x360.

DeliverFrame() should be the output of the decoder...  I'm 90% certain the error is on the encode side (maybe 95%).  Note that a new I420VideoFrame is created to hold the scaled image; this may have different strides than the non-scaled data had.  It's possible something is caching the stride somewhere (perhaps the u or v stride) - or that the stride or some other frame var going *into* the scaler is wrong, causing the scaler to mis-scale.  

DUmp the frame periodically at the output of the scaler; at the input to the encoder, at the output of the decoder and DeliverFrame, and rapidly we'll know where the corruption is
Whiteboard: p=3
Target Milestone: --- → mozilla33
Priority: -- → P1
I have trace both the image data and I420VideoFrame meta-data through to VideoSender::AddVideoFrame. Everything seems to be good to that point. I will continue to follow the encoding to output path to ferrit out where the distortion is occuring.
The first frame after a framesize reduction is always good, as far as I can tell. This is normally a key frame triggered by the framesize change. Distortion appears after this lead frame.
To further expand on my previous comment. Periodically forcing a key frame to be generated will cause that frame to be displayed without artifacts on the receiving side. But, the distortion is only cleared for that one frame. It does not get steadily worse after the forced key frame as is the case if the encoder and decoder are off in their image references. Key frames are clear, all frames at the starting resolution are clear (not just 640x480), and non-key frames of reduced resolution show the stripped pattern. This appears to be an issue with VP8.
derf: see comment 11 (and comment 10); preliminary info may point to a libvpx issue when the input size changes with a stream.

The corruption looks like some sort of stride issue.
Flags: needinfo?(tterribe)
Given that we are now encouraging people to use Nightly for Loop and that this is a fairly noticeable regression, I suggest we back out or at least disable the Load Management code until this is fixed.
Flags: needinfo?(rjesup)
Flags: needinfo?(adam)
An alternate temporary solution would be to disable resolution reduction as part of the load management scheme and leave in the rate reduction, which works without a problem. This would give us some feedback on the LoadManager with Loop.
(In reply to Paul Kerr [:pkerr] from comment #14)
> An alternate temporary solution would be to disable resolution reduction as
> part of the load management scheme and leave in the rate reduction, which
> works without a problem. This would give us some feedback on the LoadManager
> with Loop.

That seems reasonable.
(In reply to Eric Rescorla (:ekr) from comment #13)
> Given that we are now encouraging people to use Nightly for Loop and that
> this is a fairly noticeable regression, I suggest we back out or at least
> disable the Load Management code until this is fixed.

This isn't just a Loop problem, though, right? I've seen these relics with relatively high frequency in regular WebRTC calls.

In any case, I agree that we should back out the resolution reduction (or, better yet, put it behind a pref that is off by default) until we have this issue chased to ground.
Flags: needinfo?(adam)
(In reply to Adam Roach [:abr] from comment #16)
> (In reply to Eric Rescorla (:ekr) from comment #13)
> > Given that we are now encouraging people to use Nightly for Loop and that
> > this is a fairly noticeable regression, I suggest we back out or at least
> > disable the Load Management code until this is fixed.
> 
> This isn't just a Loop problem, though, right? I've seen these relics with
> relatively high frequency in regular WebRTC calls.
>

Agreed. I was just thinking we might be able to tolerate this kind of artifact
on Nightly longer if we weren't actively encouraging people to use a feature
that is only on Nightly.

> In any case, I agree that we should back out the resolution reduction (or,
> better yet, put it behind a pref that is off by default) until we have this
> issue chased to ground.
Whiteboard: p=3 → p=5
(In reply to Randell Jesup [:jesup] from comment #12)
> derf: see comment 11 (and comment 10); preliminary info may point to a
> libvpx issue when the input size changes with a stream.
> 
> The corruption looks like some sort of stride issue.

I'm not aware of any libvpx issues like this. We support resolution changes in the <video> tag, too, and have had people file bugs about correctly scaling them afterwards, so use of VP8's resolution-switching in the web is non-zero. See bug 626979, where there's a file encoded like this attached.
Flags: needinfo?(tterribe)
Work around by re-inializing the vp8 encoder on frame size changes. This removes the video artifacts but does cause a freeze/glitch at each frame size change.
Attached image input_31.raw.png
Input image right before vpx_codec_encode() -- clean
Flags: needinfo?(rjesup)
Attached image output_31.raw.png
output image right after decoding (in ReturnFrame()) - corrupt (view at 400% to make it clear).  Note that distortions do not move or change, though the chroma does as the source cycles.
Attached image output_301.raw.png
example with live video.  In live video, the distortions aren't quite static; they are tied to the content and content motion.  Input live video was clean as with fake video.
Updated patch. Uses minimal re-initialization by calling vpx_codec_enc_init() instead of vpx_codec_enc_config_set() in VP8EncoderImpl::UpdateCodecFrameSize when handling a frame size change. This results in minimal glitching on size change. This one change allows for an error free set of reduced frames.
derf: as best I can determine, unless there's a fubar in the reconfig, the problem appears to be in the libvpx encoder when we change the input resolution - especially as pkerr's patch that does a full restart of the encoder on resolution changes makes the problem go away.

To convert the dumped frames from the patch in attachment 8445905 [details] [diff] [review], use
convert -size 480x360 -interlace plane -depth 8 -sampling-factor 4:2:0 yuv:$i $i.png
for input frames, and 
convert -size 544x360 -interlace plane -depth 8 -sampling-factor 4:2:0 yuv:$i $i.png
for output frames (ImageMagik's convert tool)
Flags: needinfo?(tterribe)
Flags: needinfo?(giles)
Attachment #8445883 - Attachment is obsolete: true
Comment on attachment 8446032 [details] [diff] [review]
visual distortion work-around by re-initializing the vp8 encoder on frame size changes

Review of attachment 8446032 [details] [diff] [review]:
-----------------------------------------------------------------

If this is working well with no other patches, file the followup codec bug, put the number in, and I think we can land this with minor nit-fixes.

::: media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/vp8_impl.cc
@@ -136,5 @@
>    if (number_of_cores < 1) {
>      return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
>    }
>    feedback_mode_ = inst->codecSpecific.VP8.feedbackModeOn;
> -

don't remove blank lines unless there's a reason

@@ -157,5 @@
>    // TODO(andresp): assert(inst->extra_options) and cleanup.
>    Config default_options;
>    const Config& options =
>        inst->extra_options ? *inst->extra_options : default_options;
> -

ditto

@@ -331,5 @@
>    }
>    if (encoded_complete_callback_ == NULL) {
>      return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
>    }
> -

ditto

@@ +342,5 @@
>      int ret = UpdateCodecFrameSize(input_image);
>      if (ret < 0) {
>        return ret;
>      }
> +#if 1 //FIXME(PRK) - work around for bug 1027100

#ifndef LIBVPX_ENCODER_RECONFIG_WORKS or some such with a comment with the bug # (and probably it would be a new bug on the config issue, and bug 1027100 will cover the workaround)

@@ +413,5 @@
>    // Update encoder context for new frame size.
>    // Change of frame size will automatically trigger a key frame.
>    config_->g_w = codec_.width;
>    config_->g_h = codec_.height;
> +#if 1 //FIXME(PRK) - bug1027100 work around

ditto
Attachment #8446032 - Flags: review+
Blocks: 986513
Depends on: 1030324
Attachment #8446032 - Attachment is obsolete: true
Attachment #8446110 - Attachment is obsolete: true
https://tbpl.mozilla.org/?tree=Try&rev=28bf46c408b1
r+ by Jesup carried forward.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3e706953551

We'll need to open a new bug for libvpx
Flags: needinfo?(giles)
(In reply to Randell Jesup [:jesup] from comment #30)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/f3e706953551
> 
> We'll need to open a new bug for libvpx

bug 1030324
Blocks: 1030324
No longer depends on: 1030324
https://hg.mozilla.org/mozilla-central/rev/f3e706953551
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Depends on: 1198107
Flags: needinfo?(tterribe)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: