motion JPEG server push (multipart/x-mixed-replace image stream) hangs / freezes Firefox

VERIFIED FIXED in Firefox 24

Status

()

defect
--
critical
VERIFIED FIXED
6 years ago
6 years ago

People

(Reporter: bugzilla, Assigned: joe)

Tracking

(Blocks 1 bug, {hang, regression})

22 Branch
mozilla25
x86
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox22- wontfix, firefox23+ wontfix, firefox24+ verified, firefox25+ verified)

Details

()

Attachments

(2 attachments, 1 obsolete attachment)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130618035212

Steps to reproduce:

- Open http://www.videovalvonta.fi/control/faststream.jpg?stream=full&html

The above link (when it works, it can be a bit shaky) points to a network camera providing motion JPEG via server push.
 


Actual results:

Firefox freezes / hangs, "wheel of death" appears.
(it may not immediately happen, try a few time, use backward/forward navigation)


Expected results:

Firefox should display motion JPEG images.
Component: Untriaged → ImageLib
Product: Firefox → Core
Could be similar to bug #329681 but occurs with even much longer intervals.
Severity: normal → critical
Keywords: hang
Reproduced with:
* 22.0 Beta linux-x86_64 — bp-219998e5-681a-4079-ad7b-1ed452130626
2013-06-26-03-11-00-mozilla-central-firefox-25.0a1.en-US.linux-x86_64 — bp-094e060b-9c8d-433c-a5d8-8c4132130626
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac OS X → All
Hangs 22 for me, but works fine in nightly. Probably one of the mjpeg patches that went in fixed this, so we should find the fixed range too.
(In reply to Timothy Nikkel (:tn) from comment #3)
> Hangs 22 for me, but works fine in nightly. Probably one of the mjpeg
> patches that went in fixed this, so we should find the fixed range too.

I tested once each in nightly and 22, so if the problem is intermittent this could mean nothing.
WFM with 2005-08-01-05-trunk-firefox-1.0+.en-US.linux-i686
WFM with 2013-02-01-03-09-27-mozilla-central-firefox-21.0a1.de.linux-x86_64
WFM with 2013-03-04-03-09-33-mozilla-central-firefox-22.0a1.en-US.linux-x86_64
problems showing image: 2013-04-01-03-08-17-mozilla-central-firefox-22.0a1.en-US.linux-x86_64
WFM: 2013-04-08-03-09-28-mozilla-central-firefox-23.0a1.en-US.linux-x86_64
> WFM (tried twice, seems real WFM compared to the subsequent ones): 2013-04-09-03-08-55-mozilla-central-firefox-23.0a1.en-US.linux-x86_64
> Hang: 2013-04-10-06-59-39-mozilla-central-firefox-23.0a1.en-US.linux-x86_64 — bp-7db976a9-0e7d-4beb-aaf4-342a72130626 (no source references)
2013-04-11-03-09-25-mozilla-central-firefox-23.0a1.en-US.linux-x86_64 - bp-733bc5ca-eff1-40b0-a764-858702130626 (no source references)
Hang (almost immediate): 2013-04-22-10-58-38-mozilla-central-firefox-23.0a1.en-US.linux-x86_64 — bp-5a09d343-3736-4a3c-b5ba-806852130626 (no source references)
Hang (almost immediate): 2013-05-03-03-09-20-Hang: mozilla-central-firefox-23.0a1.en-US.linux-x86_64 — bp-86bdc92d-86fa-4e2b-abfd-d500f2130626 bp-b87415be-9f0e-44db-9320-678632130626 (no source references)


Note: sometimes the stream breaks, showing:
Corrupt JPEG data: premature end of data segment
Corrupt JPEG data: 4892 extraneous bytes before marker 0x0b
> Note: sometimes the stream breaks, showing:

(showing in the terminal)
For crashreport links with source references, see comment #2.
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0
FF 21 WFM
likely similar to 861595 in that bisecting won't get you to the real root cause due to some parallel development.
Where does Firefox hang? Can someone get a stack trace?
How can I help to fix this bug?
Assignee: nobody → joe
Blocks: 716140
The problem was that we were locking the image decode lock and then trying to do a size decode, which itself tries to take the image decode lock. Simply rearranging the code in SyncDecode to not do this fixes the problem.
Attachment #780574 - Flags: review?(seth)
Comment on attachment 780574 [details] [diff] [review]
don't try to size decode with the lock held

Review of attachment 780574 [details] [diff] [review]:
-----------------------------------------------------------------

Wow; I'm surprised this is the only case we know of that hits this. Glad this is fixed.
Attachment #780574 - Flags: review?(seth) → review+
Comment on attachment 780574 [details] [diff] [review]
don't try to size decode with the lock held

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 716140
User impact if declined: Firefox hangs in unpredictable situations, most prominently in this webcam
Testing completed (on m-c, etc.): Locally tested, on try, just pushed to inbound
Risk to taking this patch (and alternatives if risky): Not terribly risky.
String or IDL/UUID changes made by this patch: none
Attachment #780574 - Flags: approval-mozilla-beta?
Attachment #780574 - Flags: approval-mozilla-aurora?
Blocks: 898436
Attachment #780574 - Flags: approval-mozilla-beta?
Attachment #780574 - Flags: approval-mozilla-aurora?
I have the dumb. mInDecoder is actually meant to be a locked variable, because it can be (and is) modified on decoding threads. Therefore, you can only read it for the assert while holding the decoder lock.

I will file a followup bug to make sure all mInDecoder use is properly locked.
Attachment #780574 - Attachment is obsolete: true
Attachment #781845 - Flags: review+
(In reply to Joe Drew (:JOEDREW! \o/) from comment #19)
> I will file a followup bug to make sure all mInDecoder use is properly
> locked.

Bug 898569.
We're just a little over a week from releasing FF23 so definitely not respinning 22 for this regression. Joe, what kind of testing can QA do to verify this if we were willing to consider this for our final Beta which goes to build on Monday.  I'm inclined to skip this for FF23 since we're so late in the cycle but might be convinced if you can elaborate on risk/reward here.
https://hg.mozilla.org/mozilla-central/rev/ddbd70f50d73
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
(In reply to lsblakk@mozilla.com [:lsblakk] from comment #22)
> We're just a little over a week from releasing FF23 so definitely not
> respinning 22 for this regression. Joe, what kind of testing can QA do to
> verify this if we were willing to consider this for our final Beta which
> goes to build on Monday.  I'm inclined to skip this for FF23 since we're so
> late in the cycle but might be convinced if you can elaborate on risk/reward
> here.

This has very high reward IMO. It's pretty unpredictable when we can see this hang, and AFAIK we have no visibility into it. (I'm willing, and hoping, to be proven wrong on that!)

Like any code change, this is not zero risk, but I'm quite comfortable with it, especially since the world hasn't exploded since it got into mozilla-central.
Flags: needinfo?(joe)
As for QA: This is timing-related, so doing things like going back and forth on pages, especially JPEG webcams like the one on this page, and seeing whether we get other problems. We synchronously decode when going back and forward on pages, and also when we're scrolling images back into view.
And aw crap, I forgot to nom this for landing in beta (thought I already had) so it didn't make the last build. I wouldn't be comfortable with this being a ride-along, so I'll nom it only for aurora for now.
Comment on attachment 781845 [details] [diff] [review]
don't try to size decode with the lock held v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 716140
User impact if declined: Hangs (that we don't catch with any mechanism)
Testing completed (on m-c, etc.): On m-c for a few days
Risk to taking this patch (and alternatives if risky): Not very risky. Moving some code, nothing else.
String or IDL/UUID changes made by this patch: none
Attachment #781845 - Flags: approval-mozilla-aurora?
Comment on attachment 781845 [details] [diff] [review]
don't try to size decode with the lock held v2

Here's hoping this doesn't blow up terribly in release population, approving for aurora.
Attachment #781845 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Thanks for fixing this. Of course I hoped to see the fix in FF23 but obviously the fix did not make it into FF23.

When will FF24 be released approximately?
The week of September 16th is a good time to start looking for 24.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0

Verified as fixed on Windows 7 x32, Ubuntu 12.04 x32 and Mac OS X 10.8 with Firefox 24 beta 8 (buildID: 20130902131354) and latest Nightly (buildID: 20130902030220).
Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (X11; Linux i686; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:25.0) Gecko/20100101 Firefox/25.0

Also verified as fixed on latest Aurora 25.0a2 (buildID: 20130910004002).
Status: RESOLVED → VERIFIED
Keywords: verifyme
Great, Firefox 24 is out and viewing the webcam live stream does not hang FF any more.
You need to log in before you can comment on or make changes to this bug.