Closed Bug 861595 Opened 11 years ago Closed 11 years ago

multipart motion jpeg stops working immediately

Categories

(Core :: Networking: HTTP, defect)

22 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla24
Tracking Status
firefox21 --- unaffected
firefox22 + verified
firefox23 + verified
firefox24 --- verified

People

(Reporter: alice0775, Assigned: joe)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Broken in Aurora22.0a2 and Nightly23.0a1.
Works ok in Firefox21.0b2

Steps to reproduce:
1. Open http://www.shioda-dcom.co.jp/shiocame/

Actual Results:
multipart motion jpeg stops working immediately

Expected Results:
multipart motion jpeg should display continually

Regression window:
Good(but Bug 857906):
http://hg.mozilla.org/integration/mozilla-inbound/rev/a703fa409b15
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130409 Firefox/23.0 ID:20130409142132
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/6ceb44926370
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130409 Firefox/23.0 ID:20130409142734
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a703fa409b15&tochange=6ceb44926370

Suspected: Bug 857906


This is a feature broken and a recent regression.
So, I reqested to track 22 and 23.
I think that Bug 857906 should be backed out in aurora22 at least.
Alice, I'm kind of on the fence about whether or not 857906 is the real problem. Backing it out of nightly does help (at the cost of reopening the bug that was in there to fix), but I think this might be exposing a problem with bug 857367 instead.. the patch really juts reverts our loadgroup behavior wrt images back to what it was with FF19 - so its not very suspicious other than a lot of img loader changes have been done in the interim that were not tested in that configuration. Also note that 857367 hasn't been applied to aurora yet (which means this test case has other problems on that branch too.. I believe it hung the browser for me.).

in any event - on nightly the url http://www.shioda-dcom.co.jp/shiocame/ reproducibly gives this ASSERT failure with a DEBUG build. Its possible this is the true root of the issue.. I hope :seth can weigh in.


#0  0x00007ffff7fe85f1 in mozalloc_abort (msg=0x7fffffffc050 "###!!! ABORT: Shouldn't have a decoder in NewSourceData: '!mDecoder', file ../../../scratch/image/src/RasterImage.cpp, line 1867") at ../../../scratch/memory/mozalloc/mozalloc_abort.cpp:30
#1  0x00007ffff35a698c in Abort (aMsg=0x7fffffffc050 "###!!! ABORT: Shouldn't have a decoder in NewSourceData: '!mDecoder', file ../../../scratch/image/src/RasterImage.cpp, line 1867") at ../../../scratch/xpcom/base/nsDebugImpl.cpp:430
#2  0x00007ffff35a6897 in NS_DebugBreak (aSeverity=3, aStr=0x7ffff4d37550 "Shouldn't have a decoder in NewSourceData", aExpr=0x7ffff4d3753f "!mDecoder", aFile=0x7ffff4d36a08 "../../../scratch/image/src/RasterImage.cpp", aLine=1867)
    at ../../../scratch/xpcom/base/nsDebugImpl.cpp:387
#3  0x00007ffff16f1807 in mozilla::image::RasterImage::OnNewSourceData (this=0x7fffca93b290) at ../../../scratch/image/src/RasterImage.cpp:1867
#4  0x00007ffff171f157 in imgRequest::OnStartRequest (this=0x7fffd3821e50, aRequest=0x7fffcc45bb40, ctxt=0x0) at ../../../scratch/image/src/imgRequest.cpp:537
#5  0x00007ffff1445513 in nsPartChannel::SendOnStartRequest (this=0x7fffcc45bb40, aContext=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at ../../../../scratch/netwerk/streamconv/converters/nsMultiMixedConv.cpp:73
#6  0x00007ffff14481d6 in nsMultiMixedConv::SendStart (this=0x7fffca87c110, aChannel=0x7fffca905858) at ../../../../scratch/netwerk/streamconv/converters/nsMultiMixedConv.cpp:851
#7  0x00007ffff14471c0 in nsMultiMixedConv::OnDataAvailable (this=0x7fffca87c110, request=0x7fffca905858, context=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0, inStr=0x7fffca8c8a10, sourceOffset=67514, count=8829) at ../../../../scratch/netwerk/streamconv/converters/nsMultiMixedConv.cpp:595
#8  0x00007ffff1715abb in ProxyListener::OnDataAvailable (this=0x7fffca8dd500, aRequest=0x7fffca905858, ctxt=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0, inStr=0x7fffca8c8a10, sourceOffset=67514, count=8829) at ../../../scratch/image/src/imgLoader.cpp:2134
#9  0x00007ffff13ffa84 in nsStreamListenerTee::OnDataAvailable (this=0x7fffcc4f6c40, request=0x7fffca905858, context=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0, input=0x7fffcc424238, offset=67514, count=8829) at ../../../../scratch/netwerk/base/src/nsStreamListenerTee.cpp:92
#10 0x00007ffff150e99e in mozilla::net::nsHttpChannel::OnDataAvailable (this=0x7fffca905800, request=0x7fffca8fdf00, ctxt=warning: (Internal error: pc 0x0 in read in psymtab, but not in symta
Flags: needinfo?(seth)
I'd find it surprising if bug 857367 specifically was the problem; that code mostly affects shutdown. I need to look at this in a debugger. Will report back shortly.
Flags: needinfo?(seth)
Patrick, the assert you describe is fixed by bug 857876, which just hit m-c.
I believe that bug 791258 is related to this, and is probably the real root problem or at least a symptom closer to the real root problem. Although the summary of that bug is about header issues, the comments suggest that data may generally be getting silently dropped from multipart streams. I think bug 857906 just makes the problem more noticeable by causing display of the mJPEG to stop when we hit the other issue.

Patrick, do you concur?
Flags: needinfo?(mcmanus)
Bug 858600 may also be related to this whole mess.
So rather than backing out bug 857367 we could uplift bug 857876 - have marked tracking on that bug and will track this one as well while we look into it further.
Assignee: nobody → joe
Alex/Lukas: what needs my attention here?
(In reply to Seth Fowler [:seth] from comment #5)
> Patrick, do you concur?

I'm looking at this now. I do see those stream errors but haven't made heads nor tails of them.

Its possible there is still some kind of loadgroup interaction going on. 857906 basically reverted that to the old style, but a bunch of image work went on in the meantime so we might have some kind of combo bug.
Flags: needinfo?(mcmanus)
tl;dr Due to bisecting and some local patching along the way, I think the root cause of this issue is 
https://hg.mozilla.org/mozilla-central/rev/2b96c5907a1b

That is part of 716140 (multithreaded image decoding).

The trail to get there is confusing. Bear with me.

My primary test case is http://www.ducksong.com/misc/zz.html which is an empty page with a link to a webcam provided in one of these bug reports.

Before 792438 (specifically 
https://hg.mozilla.org/mozilla-central/rev/6c63bb77ddef ) landed, the case works fine. That is my patch dealing with network resource loader changes.

The image loader creates a channel for a new image.. but instead of adding that channel to the loadgroup for the current tab, it creates a new load group (to deal with redirects according to the code.. I don't fully grok the reasoning I will admit). My patch added the new load group to the base tab load group. It did that so the resource loading code could identify some kind of grouping between different images/html on the same page for the purpose of scheduling. But that turned out to have a bug - bug 857906. By adding the new group to the old group that meant that the tab would never appear loaded when the new group still had data coming in (the icon would spin) and that's very relevant to things like endless multipart jpgs. Later on I would fix that bug by reverting the image loader to the behavior it always had and just carrying a weak reference to the tab load group in the new load group - this didn't have any side effects and reverts things to the way they always had been from the perspective of the image loader. But I'm getting ahead of myself.

So if you go to m-c and revert 6c63bb77ddef the test case continues to work fine (no spinning, no "JPEG decoding error", etc..) until the multithreaded-image-decoding work lands. specifically 2b96c5907a1b is the first cset where we start seeing JPEG decoding error pop up and some crashes occur in debug builds.

Its important to note that my patch for 857906 hasn't landed yet in this sequence of events. Eventually that patch lands and the crash goes away (separate events), but I don't think its the root issue.. and its definitely right from the perspective that it reverts things to the way things were before 716140.

To recap:

test case http://www.ducksong.com/misc/zz.html (sometimes you need to reload a couple of times to be sure its ok)

go to parent of 2b96c5907a1b (parent is 16cd97be2846) and revert 6c63bb77ddef .. test should pass

go to 16cd97be2846, again revert 6c63bb77ddef .. test now fails.. in debug mode you'll get crashes and various "JPEG decoding error" messages (again sometimes reload). Over the next few weeks the crashes come and go, and 857906 lands, but the JPEG decoding error messages remain the chief indicator of concern.

I don't think there is a lot more that I can do on my own...

but seth I'm happy to hangout on video or irc or whatever and help. What do you think should be next?

I'll mark a bunch of bugs to be dup of this one.
Blocks: 716140
Flags: needinfo?(seth)
Patrick, I'm curious whether logging the bytes that the decoder sees might help. My best guess right now is that when the decoder is done with the size decode and starts to do the full decode, maybe it somehow doesn't end up seeing some of the data. Maybe an off-by-1 error somewhere. Looking at the bytes it receives with and without the patch (maybe just printf them out in hex) could prove or disprove this easily.
Flags: needinfo?(seth)
I have this exact problem. Forced to use FFv19 to be able to view these streamed videos.  Thanks!
(In reply to Joe Drew (:JOEDREW! \o/) from comment #8)
> Alex/Lukas: what needs my attention here?

We suspected that you could help with the investigation, since bug 857876 was being implicated. It seems that's not the case, so I'll follow up in email.
Assignee: joe → nobody
Assignee: nobody → joe
Previously, we would try to flush bytes to a decoder even if we didn't have any bytes to flush. Unfortunately, we can even get into the situation that we have a higher mBytesDecoded than mSourceData.Length() (such as when we're not storing source data at all)!

I *think* this is the correct solution, though I'd be willing to listen to reason!

https://tbpl.mozilla.org/?tree=Try&rev=b8a7821da471
Attachment #752064 - Flags: review?(seth)
Attachment #752064 - Flags: review?(seth) → review+
that patch works for my testcase. huzzah!
Unfortunately we can't early-exit from these functions if mBytesDecoded == mSourceData.Length(), because we rely on the FinishedSomeDecoding() calls later in the function. I could move the condition lower in the function, except that it can't be lower because it's guarding against it for DecodeUntilSizeAvailable() too. So I'll just leave it as >.

I also added an assertion that would have caught this problem.
Attachment #752064 - Attachment is obsolete: true
Attachment #752511 - Flags: review?(seth)
Comment on attachment 752511 [details] [diff] [review]
don't try to decode if we don't have bytes to decode v2

Review of attachment 752511 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #752511 - Flags: review?(seth) → review+
Comment on attachment 752511 [details] [diff] [review]
don't try to decode if we don't have bytes to decode v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 716140 (at least in part)
User impact if declined: JPEG webcam stops refreshing
Testing completed (on m-c, etc.): Just pushed to m-c; tested by a couple of people.
Risk to taking this patch (and alternatives if risky): Not too risky. As written, this would mostly be stopping us from going down a very bad path.
String or IDL/UUID changes made by this patch: none
Attachment #752511 - Flags: approval-mozilla-release?
Attachment #752511 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/9554181f5240
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Comment on attachment 752511 [details] [diff] [review]
don't try to decode if we don't have bytes to decode v2

Firefox 21 is unaffected - assuming this was for aurora/beta approvals.
Attachment #752511 - Flags: approval-mozilla-release?
Attachment #752511 - Flags: approval-mozilla-beta?
Attachment #752511 - Flags: approval-mozilla-beta+
Attachment #752511 - Flags: approval-mozilla-aurora+
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
Build ID: 20130528181031

Verified as fixed on latest Firefox 22 beta 3 build.
http://www.shioda-dcom.co.jp/shiocame/ multipart motion jpeg displays continually on Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130603 Firefox/24.0 ID:20130603031140 CSet: 198e38876f7e

Marking VERIFIED based on this and comment 27.
Status: RESOLVED → VERIFIED
Hussa, my Cam works fine again!
I was surprised how many people from different countries
have committed to eliminate the bug from the FF 20th
How nice that there are so many who volunteered at the Mozilla
Project involved. I wish to all my deepest gratitude for
express the tenacity and the diligence with which it itself
have used my specific problem.

Here they are again in remembrance

http://loumergens.dyndns.org/img/main.cgi?next_file=main.htm

Meanwhile, I had an out of frustration for the Mobotix webcam
Gained outside, because I no longer even a good
End had believed.
This camera is also mounted so it is now also a
Look in the area throw out there, so if it will be shown times
like, if you please:

http://loumergens.dyndns.org:8081/cgi-bin/guestimage.html

Thank you from good old Germany.
orry for my bad English, but i traslated it with oogle Translater!
Thank you, thank you, thank you!!!
Works great, I very much appreciate your hard work.
Wayne
I agree with the comments above.  Thanks for the really quick fix on this!  I'm happy to be back to using FF for everything again.
Keywords: verifyme
Verified as fixed on Firefox 23 beta 9:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Build ID: 20130725195523
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0

Verified as fixed on Firefox 24 beta 1 (buildID: 20130806170643).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: