Closed Bug 787899 Opened 8 years ago Closed 8 years ago
multipart/x-mixed-replace doesn't recover from bad frames
109.09 KB, image/jpeg
543.74 KB, image/png
81.18 KB, image/jpeg
23.85 KB, patch
|Details | Diff | Splinter Review|
6.53 KB, patch
|Details | Diff | Splinter Review|
10.88 KB, image/jpeg
12.87 KB, text/x-log
22.29 KB, patch
|Details | Diff | Splinter Review|
65.36 KB, image/jpeg
User Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2) Steps to reproduce: I opened a browser window to an IP camera displaying one or more Motion JPEG windows. Actual results: After a while (a few seconds to a few minutes), one or more of the mjpeg windows was replaced by a small gray screen. However, the connection was still open and streaming video data ok. The problem also happened if I started FireFox in Safe Mode. The problem does not happen in Firefox 14. I have attached an image where FireFox 15 has stopped three windows of a quad-window display, replacing them with smaller greyed-out areas. If you browse to http://10.26.7.88/ and login (test/test), click on "4 x View", it should be reproducible, though because of the slow datarate it may take a minute or more to show up. Expected results: The motion jpeg stream should continue displaying, as it does under FireFox14.
The server bandwidth doesn't look very stro,ng, each time I'm trying to connect to your server, I can't load the website. Possible to switch to a more reliable server?
Hi Loic, unfortunately this is the only accessible camera we have streaming on the Internet *sigh*, but I'll try to get a faster one attached ASAP. But if you can connect, going into the quad view makes the FF bug appear roughly 4x quicker. Alternatively, a whole load of other people happen to be connecting to it at the same time may well cause you to be blocked out - basically, it's only a tiny little security camera, so can only support a modest number of clients simultaneously. Be nice, everyone! ;-)
Ooops, here's the address I should have given you first time round:- http://126.96.36.199/ ... cut and paste failure, sorry!
Again, user/pass = test/test
Thanks for the news server, it's fast now. I'll test your STR.
I tried a few minutes. I observed this display glitch. Is this issue you're talking about? (see my attached screenshot)
Yes, that's the glitch - a motion jpeg stream stopping for no obvious reason in FF15 where it would continue without problem in FF14.
Ok, I tested again and I found a regression. STR for the devs who want to test: 1) Open http://188.8.131.52/ 2) Enter username (test) and password (test) 3) Check the box " 4 x VIEW" 4) Wait for a couple of minutes (~5 min) Result: Image stream disappears like in this screenshot https://bugzilla.mozilla.org/attachment.cgi?id=657863 Error: Image corrupt or truncated: http://184.108.40.206/mjpg/video.cgi?view=1&clientid=dgDja Source File: http://220.127.116.11/mjpg/video.cgi?view=1&clientid=dgDja Line: 0 Mozregression range: m-c good=2012-05-19 bad=2012-05-20 http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=642d1a36702f&tochange=0e2cc686b07b m-ci good=2012-05-19 bad=2012-05-20 http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=175bcd2c7912&tochange=443c0c79e54f Suspected bug: Adam Dane [:hobophobe] — Bug 733553 - Allow multipart image streams to cope with stream changes. r=joe
Adam, any interest in fixing this?
I looked at it a bit, but so far my results aren't conclusive. With Firefox 11 on x86_64/Linux, the streams would close randomly (though usually only two of the four would close). One can easily see if streams have been closed via the media info window: 1. Context-click on an image 2. Select "View image info" 3. Look at the stream entries (they are of the form http://18.104.22.168/mjpg/video.cgi?view=[N]&clientid=[ID] for the example in comment 8) 4. Look at the "Size" column (may need to add it to the table, by clicking the column chooser in the corner of the header of the table) 5. If the size is known, the stream is closed; open streams display "Unknown" for their size. I also saw the streams being closed by Chromium (v21), being replaced with the "broken image" icons. Can anyone confirm this with Chromium and/or earlier (<15) versions of Firefox? Assuming that the real issue here is semi-flaky streams, we might be able to be a bit more forgiving to data errors/trying to salvage after a data error.
Just so you know, I also tried FF15 on the widely-used Axis 2120 Network Camera accessible at http://webcam01.lugano.ch/ Again, this fails in the same way on FF15 but carries on working on FF14.0.1 just as you'd hope. Hence I don't think this is less a matter of semi-flaky streams than a difference in behaviour between FF14 and FF15.
So I'm clear - people have been able to reproduce this on Nightly?
Assignee: nobody → joe
Ah, yes, I was just able to reproduce it.
Version: 15 Branch → Trunk
We talked about this in channel meeting, not ride-along ready for the 15.0.1 chemspill so we'll track 16->
Sometimes these webcams send bad frames, or at least frames that the MIME type sniffer can't identify. imgRequest::OnStopRequest notices that the Decoder has an error, then marks marks the RasterImage as having an error and Cancel()s it. I presume that the rest of the fallout is because a Cancel()ed imgRequest doesn't send notifications, etc. Problem is: I have no idea how Adam's patch could have exposed this. And my attempts at making a simple testcase to make testing any attempted fixes easier led only to frustration this evening. Adam, any ideas on why we wouldn't see this problem before your patches?
Summary: FireFox15 displays mjpeg stream ok for a few secs BUT then outputs grey block (FF14 works ok). → multipart/x-mixed-replace doesn't recover from bad frames
A bit in |ImgRequest::OnStartRequest|  jumps out. The old version just assumed the MIME type was unchanged. The new version expects each part to have a discoverable MIME available when it calls |RasterImage::NewSourceData| (so it can cope with multiple types in the same stream). If the part's not getting a good MIME type, it should be failing in |RasterImage::InitDecoder| . 1. https://hg.mozilla.org/integration/mozilla-inbound/rev/ed488f577c84#l9.31 2. http://dxr.mozilla.org/mozilla-central/image/src/RasterImage.cpp.html?string=RasterImage.cpp#l2243
From the point of view of the camera producing the multipart motion JPEG stream, all it does is send down the following C string between JPEG frames: \r\n--myboundary\r\nContent-Type: image/jpeg\r\nConnection: Close\r\nCache-Control: max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0\r\nPragma: no-cache\r\n We're not interleaving multiple types of data, only a long sequence of JPEGs. Adam, is this header what you would be expecting to see?
If I browse directly to a multipart motion JPEG stream in FF14 and FF15, then every few minutes an annoying requester pops up. My inference is that the MIME type of an individual frame is getting lost or corrupted, and so FF asks the client what to do with the unknown frame.
I should mention some curious behaviour that's present in both FF14 & FF15 that seems to be very closely related. If you open a motion JPEG stream directly (i.e. not through an IMG container) by browsing directly to the multipart stream itself, e.g. http://22.214.171.124/mjpg/video.cgi?view=5 ...then every few minutes (about the same frequency as the phenomenon we're discussing here) an annoying requester pops up [See attachment 658829 [details] in the preceding comment] that is pretty much what you'd expect to see if an individual frame's MIME header was missing from within a motion JPEG stream. Note that the stream continues displaying & updating even though that individual frame apparently had a failed MIME header (for whatever reason). Hence I suspect that Adam's patch has exposed this bug to do with MIME header handling that was already present in FF14. Can someone confirm this behaviour & perhaps regression test it to find out when it arrived? As I say, it's certainly present in FF14.0.1 but I suspect it wasn't in FF13. Thanks!
Sorry, I meant to mention this yesterday. Occasionally the webcam sends the content-type as application/octet-stream, and this causes the bug because we trust the server for content-types of subsequent parts. I'm working on a patch and a test to fix this, but I've had some issues that I hope to overcome today.
Joe: just to be clear, our camera (the car park one) never specifies application/octet-stream - the only content-type header string it uses is the image/jpeg string I quoted in Comment #18 above. Is there some mechanism in the FF source that might (under some conditions) insert/override the content-type as application/octet-stream?
Right you are. However, sometimes it serves non-JPEG data with that MIME type! Specifically: (gdb) p inStr->mPipe->mReadCursor $21 = 0x15792dc00 "Content-Type: image/jpeg\r\nConnection: Close\r\nCache-Control: max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0\r\nPragma: no-cache\r\nContent-Length: 25760\r\n\r\n????" (gdb) p (inStr->mPipe->mReadCursor + 183) $26 = 0x15792dcb7 "\020AVI1" Looks like maybe sometimes it pushes through AVI files? Anyways, I fixed an unrelated bug (the "wrong MIME type served" bug I mentioned in comment 21) while trying to fix this. Still working on the "fix this" part, though!
So this patch doesn't fix *this* bug, but it fixes a related bug: we trust the server's content-type on non-first parts of multipart/x-mixed-replace, which we shouldn't do.
Attachment #659018 - Flags: review?(justin.lebar+bug)
This test rearchitects the tests from bug 733553 slightly, and also adds a "lie about the mime type" test.
Attachment #659019 - Flags: review?(justin.lebar+bug)
I'm having serious troubles reproducing this bug on anything but the URL in question, and since it can take 5+ minutes to reproduce, it's hard to get real debugging done. However: when someone sees that "Do you want to download this stream" box pop up, it would help me very much if you could save that file. Perhaps with it I'll be able to more easily fake the failure situation!
This is a multipart jpeg section not recognized by FF (and saved out via the annoying "Save File" pop-up requester). The start of the file contains the Content-Type headers, followed by a valid-looking JPEG file (IrfanView displays it fine, anyway). The easy way to examine it is to look at it in Irfanview and press F3 for the HEX view. And if you do, what you'll see is:- FF D8 [JPEG "SOI" marker] FF E0 [JPEG "APP0" block marker] 00 10 [JPEG block length = 16] 41 56 49 31 ["AVI1", a standard marker => "contents are Motion JPEG"] ...etc. So Joe, the "AVI1" you saw is what's expected for Motion JPEG. :-)
Comment on attachment 659018 [details] [diff] [review] always re-sniff mime types on multipart/x-mixed-replace > 4 files changed, 180 insertions(+), 151 deletions(-) and then diff -w to the rescue: > 4 files changed, 76 insertions(+), 47 deletions(-) Much better. :) > static NS_METHOD sniff_mimetype_callback(nsIInputStream* in, >- void* closure, >+ void* aClosure, I don't like this; if you're not using the |a| prefix universally, you don't get to use it to avoid a name collision. Call it |void* data|, if you like. > const char* fromRawSegment, > uint32_t toOffset, > uint32_t count, > uint32_t *writeCount)
Attachment #659018 - Flags: review?(justin.lebar+bug) → review+
Attachment #659019 - Flags: review?(justin.lebar+bug) → review+
What I see happening in netwerk/streamconv/converters/nsMultiMixedConv.cpp:~461 (nsMultiMixedConv::OnDataAvailable) when it hits the bad part: 1. (~578) new part begins 2. (~591) |!done|, so |mProcessingHeaders = true| so it will continue processing headers on the next data 3. (~537, ~548) next data, finishes processing headers (doesn't actually process any), calls |nsMultiMixedConv::SendStart| 4. (~787) |mContentType| is empty (gets emptied from |OnDataAvailable| (~600) when it's preparing for a new part, but would normally be refilled in |ParseHeaders| (~964)), so it defaults to |application/x-unknown-content-type| (via |SetContentType|, called by |SendStart|) 5. (~787) |mContentType| then set to |application/octet-stream|, by |nsUnknownDecoder::FireListenerNotifications| also calling |SetContentType| What's apparently happening: due to variations in the data lengths, a part comes through at an offset that makes |nsMultiMixedConv| think that no header information is available for that part. It sets the MIME to the |x-unknown-content-type|, which means the next data goes to the |nsUnknownDecoder|, which is why the user is prompted for download. That's why the downloaded data contains the whole part, headers and all.
Adam, that was *incredibly* helpful - especially your discovery of that MIME type. This patch fixes this bug, and includes tests for it (and other invalid images). Basically, we just stop bailing out if an image fails to initialize and we're multipart. That way, we keep trying to load. Try push: https://tbpl.mozilla.org/?tree=Try&rev=b144f1378c15
Attachment #661014 - Flags: review?(justin.lebar+bug)
attachment 661014 [details] [diff] [review] runs in to bug 791156 (that I just filed); to work around it, I've removed the last image in the list: + [100, "lime100x100.svg"] (RasterImage handles multiple OnStopRequest calls silently, it seems.) Anyways, new try push: https://tbpl.mozilla.org/?tree=Try&rev=ac6b00c4cf57
Nick: Can you try the build linked here: http://email@example.com/try-win32/ and verify whether it fixes the bug for you?
Try run for b144f1378c15 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=b144f1378c15 Results (out of 106 total builds): success: 89 warnings: 17 Builds (or logs if builds failed) available at: http://firstname.lastname@example.org
This is an individual frame taken from a multipart motion jpeg stream which brought a "Browse / Save File" requester. Looks like the same MIME bug as before? Note that this was a stream browsed directly to the cgi file, rather than one within an HTML IMG container.
Joe: I'd say this nightly build is a workable patch, but not yet a good fix. Watching motion jpeg streams inside HTML IMG containers, it is clear that a fair number of frames are still being lost in transit (which would imply that the root cause of the MIME header-losing bug hasn't yet been addressed), but these are then being resumed straight away. Basically, though I saw the ALT message flash up a few times on the Axis Lugano camera stream, the stream itself didn't halt upon that error: and for the fisheye cameras I'm testing here, I similarly saw individual frames that flashed grey but which immediately recovered. However, when I browsed directly to motion jpeg streams (i.e. directly to the video.cgi file rather than via an IMG container), the nightly build still brings up the Browse / Save File requester with roughly the same frequency as in FF14 and FF15. Hence it looks as though this side of the problem remains both unpatched and unfixed.
That is basically exactly what I hoped for. All I was fixing in this bug was the fact that the webcam connection disconnected when that error happens; I've filed bug 791258 for the underlying networking bug (which I'm not the best person to fix).
Comment on attachment 661014 [details] [diff] [review] don't bail if we fail to initialize a part Jeff, are you comfortable reviewing this change? I was OK with the last patch, but it's pretty tough for me to say whether this is safe.
Attachment #661014 - Flags: review?(justin.lebar+bug) → review?(jmuizelaar)
Comment on attachment 661014 [details] [diff] [review] don't bail if we fail to initialize a part Review of attachment 661014 [details] [diff] [review]: ----------------------------------------------------------------- I can't really say whether this is safe. Previously we didn't have a test case and now we do. So I feel somewhat ok with this.
Attachment #661014 - Flags: review?(jmuizelaar) → review+
Given where we are in the cycle, and the fact that FF15 was similarly affected (without a major uproar), we'll have to fix this first in FF17. Affected users can run Aurora/Beta until that release.
Alex - this is a bug that affects the whole security IP camera industry. Anyone looking at a streamed motion jpeg from a security camera cannot sensibly use FF15. Right now, my company tells its clients to turn FF auto update off and revert to FF14, which makes FF look really bad. (It's not comfortable asking clients to run Beta releases, would you be?) Basically, Joe's patch stops the problem from stopping streams early (even if FF's underlying network level problem is still there), so why not put it in FF16?
(In reply to Nick Pelling from comment #41) > Basically, Joe's patch stops the problem from stopping streams early (even > if FF's underlying network level problem is still there), so why not put it > in FF16? We're ~2 weeks away from release, and ~8 weeks away from the next. In order to get this into FF16, we would need to 1) Land on mozilla-central now 2) Cross our fingers that the landing on central will go well, and land these two fairly large patches on mozilla-aurora 3) Land on mozilla-beta before Tuesday's beta 5 go to build, without any real feedback about regressions from Nightly/Aurora 4) Ship beta 5 out next Friday, hope that we receive feedback before our final beta 6 (basically an RC) build, going to build on Monday This sadly is untenable and poor release practice. We only land critical issues affecting a large population at this point in the cycle.
> It's not comfortable asking clients to run Beta releases, would you be? Our Beta builds are pretty stable. I'd be perfectly comfortable asking clients to use Beta while we fix this regression; the chance of Beta performing /worse/ than release for your clients is pretty low...
https://hg.mozilla.org/mozilla-central/rev/dccfebdf756c https://hg.mozilla.org/mozilla-central/rev/31f6e5e2ca84 https://hg.mozilla.org/mozilla-central/rev/eea69a3e5c1b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
(In reply to Justin Lebar [:jlebar] (PTO 9/27 - 10/1) from comment #43) > > It's not comfortable asking clients to run Beta releases, would you be? > > Our Beta builds are pretty stable. I'd be perfectly comfortable asking > clients to use Beta while we fix this regression; the chance of Beta > performing /worse/ than release for your clients is pretty low... Can you please tell me which beta release my company should be directing clients to download that includes this patch? The obvious public beta releases I can see are all FF16 betas, which I presume don't include this patch (if it's only going into FF17). Thanks!
(In reply to Joe Drew (:JOEDREW! \o/) from comment #44) > https://hg.mozilla.org/integration/mozilla-inbound/rev/dccfebdf756c > https://hg.mozilla.org/integration/mozilla-inbound/rev/31f6e5e2ca84 > https://hg.mozilla.org/integration/mozilla-inbound/rev/eea69a3e5c1b Let's get this nominated/approved for Aurora 18 uplift. Thanks!
(In reply to Nick Pelling from comment #46) > Can you please tell me which beta release my company should be directing > clients to download that includes this patch? > > The obvious public beta releases I can see are all FF16 betas, which I > presume don't include this patch (if it's only going into FF17). Thanks! Once uplifted to Aurora, you'll be able to find it here: https://www.mozilla.org/en-US/firefox/aurora/ After 10/9 or so (where Aurora migrates to Beta), you'll be able to find it here: https://www.mozilla.org/en-US/firefox/beta/
Comment on attachment 661014 [details] [diff] [review] don't bail if we fail to initialize a part [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 733553 User impact if declined: Webcams stop loading Testing completed (on m-c, etc.): On mozilla-central as well as automated tests Risk to taking this patch (and alternatives if risky): Could further break webcams, though probably not. String or UUID changes made by this patch: None
Attachment #661014 - Flags: approval-mozilla-aurora?
Attachment #659019 - Flags: approval-mozilla-aurora?
Attachment #659018 - Flags: approval-mozilla-aurora?
Attachment #659018 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #659019 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #661014 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Backed out in https://hg.mozilla.org/releases/mozilla-aurora/rev/7a03659b943e - "ABORT: Can't be discardable or decode-on-draw for multipart" in mochitest-4 and reftest.
Reftests also failed for other reasons, joyously. Alex, I'd like to just let this ride the trains at this point; the merge to Aurora was complex and clearly faily, and I think fixing it will be riskier than it's worth.
Thanks Joe, untracking and marking wontfix for 17 in that case.
Looking at FF 18 beta 7, it looks as though the fix for the problem I first reported is fixed and about to ship in FF18, thanks everyone! All the same, from the way individual streams flicker and flash roughly once a minute, it looks to me as though the underlying bug is still in place. i.e. where (as I understand it) a given frame's mimetype is completely ok on entry to FF but then somehow gets corrupted inside FF before being used later on. Has anyone opened a ticket for this underlying bug?
(In reply to Nick Pelling from comment #55) > Has anyone opened a ticket for this underlying bug? Yep - that's bug 791258.
You need to log in before you can comment on or make changes to this bug.