Closed Bug 1151309 Opened 9 years ago Closed 9 years ago

MJPEG stream seems to drop frames

Categories

(Core :: Graphics: ImageLib, defect)

37 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla40
Tracking Status
firefox37 --- wontfix
firefox38 + verified
firefox38.0.5 --- fixed
firefox39 --- verified
firefox40 --- verified

People

(Reporter: sansan9643, Assigned: seth)

References

Details

(Keywords: regression, Whiteboard: gfx-noted)

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0
Build ID: 20150328224110

Steps to reproduce:

Just embed any MJPEG stream into an <img> tag.



Actual results:

Since version 37 of Firefox, <img> embedding an MJPEG stream does not work anymore as expected. The "video" freezes and framerate is not consistent. I tested the same stream on Chrome and other software with no problems. Old Firefox v36 also works fine.



Expected results:

The video frames should be displayed smoothly. Something has changed from Firefox v36 to v37.
Please do not ask me to provide an URL for a MJPEG stream. I obviously cannot. Take one by yourself.
Severity: normal → critical
Component: Untriaged → General
Priority: -- → P1
Okay one public stream is the following, not difficult to find with google "public mjpeg streams". It is flickering all the time. http://www.mjpeg.net/webcams/11405 . Compare it to how Chrome shows it smoothly to have an idea. I consider this bug or regression very critical. Please confirm and fix it as soon as possible,
Severity: critical → normal
Component: General → ImageLib
Priority: P1 → --
Product: Firefox → Core
I took a look at the stream provided and I can confirm that that stream seems to drop frames in Firefox compared to Chrome.

I've updated the bug title to describe the problem a little more clearly. There doesn't seem to be anything specific to the <img> tag here - navigating to the stream directly gives the same result for me.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: <img> tag embedding an mjpeg stream does no longer display the frames as expected (v37) → MJPEG stream seems to drop frames
OS: Linux → All
Hardware: x86_64 → All
Note that our MJPEG code got totally rewritten in 37. (Ironically in part to fix issues like this!) This happened in bug 1112972. So this is probably a regression caused by that bug.
Blocks: 1112972
Okay, the reason why I put <img src="[...]"> in the bug description is that AFAIK it is the only portable way to make browsers display MJPEG streams in an HTML page.
Is "dropping frames" when the video freezes momentarily, or when we have a big flash and everything repaints?  I'm seeing the former on OS X and Windows 7 (I'm seeing it as far back as 33), but the later I'm only seeing on Windows 7.  Only tested those two.

Seth, can we catch the missing frames in code, and do we already have a telemetry problem for some numbers?
(In reply to Milan Sreckovic [:milan] from comment #6)
> Is "dropping frames" when the video freezes momentarily, or when we have a
> big flash and everything repaints?  I'm seeing the former on OS X and
> Windows 7 (I'm seeing it as far back as 33), but the later I'm only seeing
> on Windows 7.  Only tested those two.

I've only reproduced the momentary freezes, but I've only tested on OS X so far. Thanks for bringing up the flash issue; I wasn't previously aware of that.

> Seth, can we catch the missing frames in code

We do! Viewing this stream with a debug build produces a constant stream of "WARNING: Decoder not keeping up with multipart image" for me. Dropping frames is exactly what we do in that situation.

> and do we already have a telemetry problem for some numbers?

We don't currently have a probe for this, but we probably should, I agree. Unfortunately multipart/x-mixed-replace images are pretty hard to test in an automated fashion compared to other ImageLib features, so telemetry probes are very useful.

FWIW, I've been pushing for spec changes to make multipart images more testable; see e.g. here:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28096

But we're not there yet.
anonymous, I've put together a build which seems to solve the problem on my machine. Could you verify that it fixes the issue for you as well?

You can find the 64-bit Linux build here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.com-0468327a43b9/try-linux64/

And the 32-bit version here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.com-0468327a43b9/try-linux/
Flags: needinfo?(sansan9643)
Same for me, all my MJPEG live view cameras display a static imaage ,like its frozen :(
(In reply to swe.barb from comment #11)
> Same for me, all my MJPEG live view cameras display a static imaage ,like
> its frozen :(

It's hard to tell from your description whether this is the same problem or a different one. Are any of the cameras you're talking about public? If so, could you post some links to them?

It'd also be great if you could try the special build I mentioned above and see whether it solves your problem. I linked to the Linux versions in comment 10. Here's a link to the Windows version:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.com-0468327a43b9/try-win32/
Here are a few public that all worked before, now none work except the frozen static image only, and im on windows 8.1 64bit if it matters, clearly the new update in firefox broke the mjpeg support otherwise its weird all dont work incluing my own private,



http://roccabella.asuscomm.com:9091/view/viewer_index.shtml?id=27

http://131.155.61.44/view/viewer_index.shtml?id=273

http://137.229.179.199/view/viewer_index.shtml?id=428

http://176.57.195.83/view/viewer_index.shtml?id=523488

http://148.61.63.226/view/viewer_index.shtml
Should i try 	firefox-40.0a1.en-US.win32.tests.zip then or ? will that mess up my current version of firefox, i dont wanna install my addons and everything again and setup..
(In reply to swe.barb from comment #13)
> clearly
> the new update in firefox broke the mjpeg support otherwise its weird all
> dont work incluing my own private,

I agree, something is broken. I'm just trying to determine whether there is only one bug here or multiple bugs. Thanks for posting these links! That will definitely help.

(In reply to swe.barb from comment #14)
> Should i try 	firefox-40.0a1.en-US.win32.tests.zip then or ? will that mess
> up my current version of firefox, i dont wanna install my addons and
> everything again and setup..

To test, you'd need the installer, located here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.com-0468327a43b9/try-win32/firefox-40.0a1.en-US.win32.installer.exe

If you install it into a different directory then it won't overwrite your existing version of Firefox.

To be totally safe (because there are always risks when trying a development version of Firefox), you should back up your profile before you test, and restore it after you test. You can find instructions for doing that here:

https://support.mozilla.org/en-US/kb/profiles-where-firefox-stores-user-data?redirectlocale=en-US&redirectslug=Profiles

I realize this is complicated, so I wouldn't blame you for not wanting to go through the process. =)

I am hoping that you or anonymous or someone will be kind enough to verify that my special build fixes the problem for you, though, because this bug involves a performance problem, and it's hard to tell for sure whether it's fixed just by testing on my machine.
(In reply to Seth Fowler [:seth] from comment #10)
> anonymous, I've put together a build which seems to solve the problem on my
> machine. Could you verify that it fixes the issue for you as well?
> 
> You can find the 64-bit Linux build here:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.
> com-0468327a43b9/try-linux64/

I compared Firefox v33, v37, v40 (nightly build above) and chrome against an mjpeg video stream at 30fps on a LAN. Firefox v40 is now ok, as fast as the old v33 and chrome. Thanks.
Flags: needinfo?(sansan9643)
(In reply to anonymous from comment #16)
> I compared Firefox v33, v37, v40 (nightly build above) and chrome against an
> mjpeg video stream at 30fps on a LAN. Firefox v40 is now ok, as fast as the
> old v33 and chrome. Thanks.

Thanks so much for checking, anonymous!
Ok confirmed here too that the version u linked works so maybe this should help you fix this problem maybe i hope.
(In reply to swe.barb from comment #18)
> Ok confirmed here too that the version u linked works so maybe this should
> help you fix this problem maybe i hope.

Thanks for verifying the fix, swe.barb! That is indeed a big help.
I experimented with a couple of ways of solving this problem, but in the end I
think the best approach is just to block until the previous frame finishes
decoding before starting to process another. I also tried just dropping frames
on the floor if the decoder wasn't finished, and this also increases smoothness,
but we still drop more frames than I'd like.

So why do we outrun the decoder? On my machine, at least, the decoder almost
always *finishes* in time, but we haven't gotten the chance to run the event
that updates the part's ProgressTracker and sends notifications on the main
thread, which means that from the main thread we can't tell.

On my machine, then, this patch doesn't cause any increase in jank. That's true
even if we have only one decoding thread. (Indeed, we can only really utilize
one for the MJPEG stream itself, but I think that's desirable.) It just ensures
that we actually believe that the part we just decoded is in fact decoded and
display it to the user, instead of dropping the frame.

The result is drastically increased smoothness.
Attachment #8589999 - Flags: review?(tnikkel)
Assignee: nobody → seth
Status: NEW → ASSIGNED
So part 1 was about dropped frames and outrunning the decoder. This part
addresses the flashing issue.

The root cause of the flashing is that if a part is corrupt, we still display it
to the user. This patch makes us just drop it.

While investigating this, I realized that we can also propagate some errors out
from the parts via OnImageDataAvailable and OnImageDataComplete. I've prevented
that as well in this patch.
Attachment #8590004 - Flags: review?(tnikkel)
(In reply to Seth Fowler [:seth] from comment #21)
> Created attachment 8590004 [details] [diff] [review]
> (Part 2) - Hide errors in multipart image parts both visually and internally
> 
> So part 1 was about dropped frames and outrunning the decoder. This part
> addresses the flashing issue.
> 
> The root cause of the flashing is that if a part is corrupt, we still
> display it
> to the user. This patch makes us just drop it.

Hmm. This causes a test failure, as far as I can tell because test_bug733553.html insists that we should actually present rillybad.jpg and not just drop it as part 2 makes us do.

I'm not convinced that this is good UX.

Timothy, do you see any good reason to maintain this behavior? I don't think we can get rid of the flashing if we insist on passing this test as-is.
How did we avoid the flashing and display rillybad.jpg before?
Comment on attachment 8589999 [details] [diff] [review]
(Part 1) - Block until the previous multipart frame is decoded before processing another

If we really can't keep up with decoding isn't this going to flood the cpu in a bad way? Can we just check to see if the decode is complete except for notifying and only wait in that case?
Attachment #8589999 - Flags: review?(tnikkel) → review+
So these streams are sending bad frames? In what way are they bad?
So i asume the next update on firefox should have this problem solved or ?
(In reply to Timothy Nikkel (:tn) from comment #24)
> How did we avoid the flashing and display rillybad.jpg before?

We didn't avoid flashing before. =) The "flashing" I observe is the result of nsImageFrame displaying an image with errors in the usual way; it's just that when it happens in the middle of a stream, it looks like a brief flash.

(In reply to Timothy Nikkel (:tn) from comment #25)
> If we really can't keep up with decoding isn't this going to flood the cpu
> in a bad way? Can we just check to see if the decode is complete except for
> notifying and only wait in that case?

The previous code flooded the CPU more than this code does, because if the decoder truly fell behind we'd keep pulling data off the network and just end up with a huge queue of decode jobs. Now we at least apply backpressure and won't deliver any more data until we finish decoding the previous frame.

If we really never want to wait, we could just not pass FLAG_SYNC_DECODE to get the GetFrame call, but that'd put us back in the situation of decode jobs queuing up. I think the backpressure is useful.

(In reply to Timothy Nikkel (:tn) from comment #26)
> So these streams are sending bad frames? In what way are they bad?

They're just corrupt images. MJPEG streams do this all the time. If you let one run for a few minutes, sooner or later you get a corrupt frame.

I haven't looked at the details of their implementation, but I think Chrome is probably just dropping the corrupt frames (like my part 2 patch would do), because I never see flashing on Chrome.
(In reply to swe.barb from comment #27)
> So i asume the next update on firefox should have this problem solved or ?

It will definitely be fixed in an upcoming Firefox update, but I can't guarantee it'll be fixed in the next one. We're not yet at the point where we can figure that out.
(In reply to Seth Fowler [:seth] from comment #23)
> (In reply to Seth Fowler [:seth] from comment #21)
> > Created attachment 8590004 [details] [diff] [review]
> > (Part 2) - Hide errors in multipart image parts both visually and internally
> > 
> > So part 1 was about dropped frames and outrunning the decoder. This part
> > addresses the flashing issue.
> > 
> > The root cause of the flashing is that if a part is corrupt, we still
> > display it
> > to the user. This patch makes us just drop it.
> 
> Hmm. This causes a test failure, as far as I can tell because
> test_bug733553.html insists that we should actually present rillybad.jpg and
> not just drop it as part 2 makes us do.
> 
> I'm not convinced that this is good UX.
> 
> Timothy, do you see any good reason to maintain this behavior? I don't think
> we can get rid of the flashing if we insist on passing this test as-is.

Seems like a reasonable plan to me. So you'll modify the test to not expect an image but still have the sjs send the bad image to test that we skip it properly?
Attachment #8590004 - Flags: review?(tnikkel) → review+
(In reply to Timothy Nikkel (:tn) from comment #30)
> Seems like a reasonable plan to me. So you'll modify the test to not expect
> an image but still have the sjs send the bad image to test that we skip it
> properly?

Yeah, what I ended up doing is modifying the test so that the width we expect for the invalid images is the same as the width for the image before them in the stream. (Adding comments explaining why, of course.)

I made a minor tweak to the code in part 2 to make this easier to test accurately: even if we drop the frame, and keep the previous image, we still re-send notifications.
Here's the updated version of part 2.
Attachment #8590004 - Attachment is obsolete: true
Seth, are you planning to uplift it to 38? Thanks (it is an esr release)
Flags: needinfo?(seth)
Tracking as it is a regression.
https://hg.mozilla.org/mozilla-central/rev/6d0f91829740
https://hg.mozilla.org/mozilla-central/rev/414baffa50d2
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Keywords: regression
(In reply to Sylvestre Ledru [:sylvestre] from comment #36)
> Seth, are you planning to uplift it to 38? Thanks (it is an esr release)

We talked on IRC and decided to start with an Aurora uplift and QE verification, because this is a relatively risky patch, but it would definitely be nice to have it in the ESR.
Flags: needinfo?(seth)
Comment on attachment 8589999 [details] [diff] [review]
(Part 1) - Block until the previous multipart frame is decoded before processing another

Approval Request Comment
[Feature/regressing bug #]: Bug 1112972 (but actually this stuff was still somewhat broken even before that).
[User impact if declined]: Some MJPEG streams will either drop frames, flash, or freeze totally.
[Describe test coverage new/current, TreeHerder]: There's not an obvious way to write a test for this; it's really a performance problem. The patch is on m-c. I'll be requesting QE verification concurrently with the uplift request.
[Risks and why]: Medium risk. This patch changes how we do buffering for MJPEG streams, which means that both performance and memory usage during MJPEG playback will be different.
[String/UUID change made/needed]: None.
Attachment #8589999 - Flags: approval-mozilla-aurora?
Comment on attachment 8592015 [details] [diff] [review]
(Part 2) - Hide errors in multipart image parts both visually and internally

(Also requesting Aurora uplift for part 2.)
Attachment #8592015 - Flags: approval-mozilla-aurora?
Requesting QA verification for this. We need to verify that MJPEG webcam stream playback is both smooth and doesn't leak memory (and doesn't crash or anything, obviously), and we need to check this on both slow machines (especially slow dual-core machines) and fast machines. The streams linked in comment 2 and comment 13 are good testcases.
Keywords: qawanted
Flags: qe-verify+
Comment on attachment 8589999 [details] [diff] [review]
(Part 1) - Block until the previous multipart frame is decoded before processing another

thanks!
Attachment #8589999 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8592015 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Does the latest daily Nightly have these patches?
(In reply to Loic from comment #44)
> Does the latest daily Nightly have these patches?

Unfortunately it missed the latest Nightly. It should make the next one.
What about this uplift to beta 38?
any idea what version of firefox will have this fixed?
For sure, 39.
Seth said that it has risk for 38, so, I am not sure we will take it for 38.
However, if the impact is important, we might take it as 38 is an ESR.
There is a risk for 38?

Can you please quantify this.

MJPEG does not work correctly even though the code re write went through all the release procedures. Users currently have a system which does not work as well as previous releases.

If the changes are confined to the MJPEG code it would seem the only risk is users might have a system which works.

Apologies if this comment is not politically correct but one of the reasons I dumped IE and Chrome was because I need a reliable browser with well thought out upgrades and on this occasion I have been let down. 

A concern is if this can happen over a relatively simple re write how can I confidently use Firefox after a major security upgrade?
(In reply to Andy B from comment #51)
> If the changes are confined to the MJPEG code it would seem the only risk is
> users might have a system which works.

The risk is that users might have a system which is catastrophically *worse* in some cases. We want to be sure that we don't end up with crashes, serious memory leaks, or other similar bugs. If we uplift to beta, we have a greatly reduced amount of time in which to find that kind of issue before the change is released to the public and it's too late to fix it. That's why we're waiting on testing from QA before deciding whether to uplift to beta.

> A concern is if this can happen over a relatively simple re write how can I
> confidently use Firefox after a major security upgrade?

We have both automated and human testing to try to catch issues like this, but unfortunately problems still get through. The best we can do at that point is to fix the problem, identify the issues which led to it, and try to make sure they don't happen again.
Florin - We're going to want to know that this fix doesn't regress MJPEG in a major way if we're going to be able to take it on Beta. Can you please see if you have time to do some exploratory testing of this feature on Aurora?

Seth - Can you please provide some guidance to Florin on what his team should test?
Flags: needinfo?(seth)
Flags: needinfo?(florin.mezei)
Sure. I posted some general guidance in comment 42, but let me expand upon that a little bit:

To start with, we need testing on both slow machines (single-core or dual-core) and fast machines. This is really important, because this bug is fundamentally a performance issue.

We need to verify that MJPEG webcam stream playback:
- Is smooth.
- Doesn't cause a memory leak.
- Doesn't crash.
- Doesn't degrade after running for a few minutes.

The streams linked in comment 2 and comment 13 are great testcases. mjpeg.net has links to many more MJPEG webcams and might be useful for further testing.

So in practice what I'd do is use multiple computers of different speeds, visit a number of streams, and leave each running for a few minutes. While they're running, I'd make sure that they play back smoothly and that memory usage isn't increasing. I'd also try to visit other sites in different tabs and see what performance is like with an MJPEG stream running in a background tab.

If all that goes well and nobody encounters any crashes, then I'd feel pretty confident uplifting this to beta.
Flags: needinfo?(seth)
We've reproduced the original issue today using older Nightly and DevEdition builds on various Operating Systems and machines. We performed exploratory testing using various public streams on different machines, but did not encounter any problem with the streams. Testing details can be found in this etherpad: https://etherpad.mozilla.org/MJPEG.

I'm marking this as verified based on today's testing.
Status: RESOLVED → VERIFIED
Flags: needinfo?(florin.mezei)
Keywords: qawanted
Thanks so much Florin!

It looks like we can have pretty good confidence in these patches. I think we're ready to request beta uplift.
Comment on attachment 8589999 [details] [diff] [review]
(Part 1) - Block until the previous multipart frame is decoded before processing another

Approval Request Comment
[Feature/regressing bug #]: Bug 1112972 (but actually this stuff was still somewhat broken even before that).
[User impact if declined]: Some MJPEG streams will either drop frames, flash, or freeze totally. That's a really bad user experience, so I'd really like to get this on beta, especially since it's an ESR.
[Describe test coverage new/current, TreeHerder]: There's not an obvious way to write a test for this; it's really a performance problem. The patch is on m-c and Aurora for a week with no reported problems. QE verification has tested the patch on a variety of test cases and machines with different performance characteristics. At this point I think we have pretty good confidence that this is a solid patch.
[Risks and why]: When I requested uplift to Aurora, I described this patch as medium risk, because this patch changes how we do buffering for MJPEG streams, which means that both performance and memory usage during MJPEG playback will be different. Now that the patch has been on m-c and Aurora for a week with no reported issues, and it's been verified by QE, I'm feeling _much_ more confident in this patch. I think that given the additional testing we can characterize this patch as low risk.
[String/UUID change made/needed]: None.
Attachment #8589999 - Flags: approval-mozilla-beta?
Attachment #8592015 - Flags: approval-mozilla-beta?
Comment on attachment 8589999 [details] [diff] [review]
(Part 1) - Block until the previous multipart frame is decoded before processing another

[Triage Comment]

Let's do it then!
Attachment #8589999 - Flags: approval-mozilla-beta? → approval-mozilla-release+
Comment on attachment 8592015 [details] [diff] [review]
(Part 2) - Hide errors in multipart image parts both visually and internally

[Triage Comment]
Attachment #8592015 - Flags: approval-mozilla-beta? → approval-mozilla-release+
A quick check with Firefox 38 Beta 9 (BuildID=20150429135941) on Windows 7 x64 and Mac OS X 10.8.5 shows the same expected results as in comment 56. Will not re-verify in 38.0.5.
Flags: qe-verify+
A big thank you to you all for doing this. Beta 38 release 9 solves the problem on my machine and it also seems much faster than the full release of 37.

KR's Andy B
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: