Closed Bug 857906 Opened 11 years ago Closed 11 years ago

Loading forever (progress indicator(tab icon) is continue spinning and status display is continue showing) on multipart motion jpeg site

Categories

(Core :: Networking: HTTP, defect)

20 Branch
x86_64
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla23
Tracking Status
firefox20 --- wontfix
firefox21 + wontfix
firefox22 + fixed
firefox23 + fixed
firefox-esr17 --- unaffected

People

(Reporter: alice0775, Assigned: mcmanus)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Build Identifier:
http://hg.mozilla.org/mozilla-central/rev/b5cb88ccd907
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130403 Firefox/23.0 ID:20130403090950


Steps to Reproduce:
1. Open multipart motion jpeg site
     Ex. http://www.shioda-dcom.co.jp/shiocame/
         http://www.oadm.cat/en/content/61/live-webcams.htm

Actual Results:
Loading forever (progress indicator(tab icon) is continue spinning and status display is continue showing)

Expected Results:
progress indicator should stop spinning and Status display disappears immediately.

Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/1942b4d64dc8
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20121204 Firefox/20.0 ID:20121204174201
Bad:
http://hg.mozilla.org/mozilla-central/rev/56f34c1cc509
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20121205 Firefox/20.0 ID:20121205072802
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1942b4d64dc8&tochange=56f34c1cc509


Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/17278474949a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20121204 Firefox/20.0 ID:20121204150558
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/7e8069286526
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20121204 Firefox/20.0 ID:20121204150900
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=17278474949a&tochange=7e8069286526

Regressed by bug 792438
In local build,
Last Good: 17278474949a
First Bad: 6c63bb77ddef

Regressed by: 
  6c63bb77ddef	Patrick McManus — bug 792438 - part 0. imagelib proxies should share load group root r=joedrew
When I try beta (or presumably released 20) 
http://www.oadm.cat/en/content/61/live-webcams.htm
I get the behavior described.. a spinning loader icon on the tab.

And when I back 6c63bb77ddef out of beta the problem is resolved. That's a bad fix tho - I'll see if I can find a better one.

However, nightly is a much bigger fail with this same page.. even with  6c63bb77ddef backed out. The browser deadlocks.

First, is the RasterImage::DecodePool::RequestDecode(RasterImage* aImg) Is_MainThread() assertion. I know that's being tracked elsewhere (I get it a lot in my sandbox builds) -  but this page hits it every single time as far as I can tell so maybe that is useful information.

If I get rid of that assert (which the other bug kind of gives mixed messages on) I get deadlocks. Multiple image decoder threads are blocked on getting the mutex in decodejob::run(), and the main thread is deadlocked trying to get it in decodedoneworker::run(). Its not clear to me who holds the lock.

joe, do you want to clone this bug or file a new one to track separately?
Flags: needinfo?(joe)
Blocks: 858605
(In reply to Patrick McManus [:mcmanus] from comment #2)

> 
> joe, do you want to clone this bug or file a new one to track separately?

I've filed the browser hang, which is applicable to 22 and 23 and isn't the bug alice files here, separately as 858605.
Flags: needinfo?(joe)
When I did this feature I needed to be able to create load group relationships so that all the stuff being conceptually loaded together could share some basic tracking information.

I did that by chaining together load groups and looking for the root group.

unfortunately the result of that is the image load group got added (not just referenced) to the base load group of the document.. and because the image channel never finishes (its a perpetual mutlipart motion image) the page loading count never drops to 0.

setting BACKGROUND on the load would fix that (as it would reduce the load group relationship to a reference instead of a recursive add), but it would also suppress some important observer events. so we won't fix it like that.

I bit the bullet and added an explicit parent/child relationship.. where a load group can carry a weak pointer to a parent load group without adding itself into the parent. That way the chain can still be traversed to find a root load group but without causing any side effects or increases in reference counts.

I did it with a new interface (ChildLoadGroup) because I think we need to backport the fix and touching the popular nsILoadGroup.idl probably isn't the best idea.. its largely an internal concept anyhow, so best not to pollute the base interface with it.
Assignee: nobody → mcmanus
Attached patch patch v1 (obsolete) — Splinter Review
based on beta (ff21), try likes this, the testcase reported in the problem statement is fixed, and I can confirm that the "logical root load group" code is still working.

We can't confirm the bit about the testcase in the problem report on 22 or 23 because that page is broken in those builds due to another unrelated bug.
Attachment #733978 - Flags: review?(jduell.mcbugs)
Comment on attachment 733978 [details] [diff] [review]
patch v1

Review of attachment 733978 [details] [diff] [review]:
-----------------------------------------------------------------

Looks ok, with a few questions that I want to raise (but don't need re-review on unless they result in significant code changes).

::: netwerk/base/public/nsILoadGroupChild.idl
@@ +5,5 @@
> +
> +#include "nsILoadGroup.idl"
> +
> +/**
> + * nsILoadGroupChild provides a heirarchy of load groups so that the

hierarchy

@@ +8,5 @@
> +/**
> + * nsILoadGroupChild provides a heirarchy of load groups so that the
> + * root load group can be used to conceptually tie a series of loading
> + * operations into a logical whole while still leaving them separate
> + * for the purposes of cancelation and status events.

cancellation :)

The bug here is that we don't want the status events from the imgLoader request.  But this patch also changes whether the imgLoader request gets cancelled if the parent loadGroup get cancelled, right? (i.e. imgLoad would no longer be cancelled.) Do we not want to also cancel the image loads for a page if we cancel the load?

@@ +15,5 @@
> +[scriptable, uuid(02efe8e2-fbbc-4718-a299-b8a09c60bf6b)]
> +interface nsILoadGroupChild : nsISupports
> +{
> +    /**
> +     * The parent of this load group. It is stored weak referenced.

nit: "stored as a weak reference"?

@@ +22,5 @@
> +
> +    /**
> +     * The nsILoadGroup associated with this nsILoadGroupChild
> +     */
> +    readonly attribute nsILoadGroup childLoadGroup;

So why not just make nsILoadGroupChild inherit from nsILoadGroup, and then childLoadGroup just becomes /this/?  Seems like it ought to work and be slightly simpler.  Then again we don't have nsIHttpChannelInternal inherit from nsIHttpChannel, so obviously we can do this either way.  Implementor's choice.

::: netwerk/base/src/nsLoadGroup.cpp
@@ +775,5 @@
> +    *aParentLoadGroup = nullptr;
> +    nsCOMPtr<nsILoadGroup> parent = do_QueryReferent(mParentLoadGroup);
> +    if (!parent)
> +        return NS_OK;
> +    *aParentLoadGroup = parent.forget().get();

Are you sure you want to forget here?  Most of the other weak accessors I see in the tree just seem to return w/o forget call.  See nsDocument::GetFullscreenRoot(), and nsDSURIContentListener::GetParentContentListener looks similar.

Seems like we'd want to give a strong ref to callers so if some other thread kills off last ref we're not hosed.  But my understanding of this stuff is (...wait for it...) weak :)

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +678,5 @@
> +        return;
> +
> +    nsCOMPtr<nsILoadGroup> rootLoadGroup;
> +    childLoadGroup->GetRootLoadGroup(getter_AddRefs(rootLoadGroup));
> +    if (!rootLoadGroup)

nit: the way you've written it (and I assume the way the IDL implicitly specifies: maybe make explicit if that's how we want it to work?), you can't get null back for GetRoot, you'll just get /this/ if there's no encompassing LG. So this is unreachable, and couldbe replaced with MOZ_ASSERT or nothing.
Attachment #733978 - Flags: review?(jduell.mcbugs) → review+
So I think this patch is ok on the fundamentals (I'll update the comments, spelling and whatnot), but I'd like to make sure you agree with this reasoning.. these aren't my strongest interfaces either. thanks for the feedback.


> 
> @@ +8,5 @@
> > +/**
> > + * nsILoadGroupChild provides a heirarchy of load groups so that the
> > + * root load group can be used to conceptually tie a series of loading
> > + * operations into a logical whole while still leaving them separate
> > + * for the purposes of cancelation and status events.
> 
> cancellation :)
> 
> The bug here is that we don't want the status events from the imgLoader
> request.  But this patch also changes whether the imgLoader request gets
> cancelled if the parent loadGroup get cancelled, right? (i.e. imgLoad would
> no longer be cancelled.) Do we not want to also cancel the image loads for a
> page if we cancel the load?
> 

I think there is confusion here. The bug being fixed was introduced by me with this patch https://hg.mozilla.org/integration/mozilla-inbound/rev/6c63bb77ddef

The image code always has been using a new load group for the channel that loads the image (and it gets observers off of that). The bad change I made set the load group of the new load group to be the one for the root document. I did that so that the channel code that was loading the image could walk the load group chain and find the one-load-group-to-bind-them-all in order to apply a resource scheduling policy to it. (block images on js/css in that case).

The side effect of doing that is that not only is the mLoadGroup member of the new load group set to the parent the new load group (what I wanted) it is actually also added to the parent's group! And now we have a new binding that keeps the parent load group active while the child is loading which we don't want.

I mentioned that BACKGROUND could prevent adding the new load group to the parent, but it would also suppress the observer events - and we don't want that.

So the new arrangement basically allows you to store who your parent is without adding yourself to the parent or suppressing observer events. I've changed the image loading code back to the way it was (don't set the load group of the new load group) and had it use the new mechanism.

> ::: netwerk/base/src/nsLoadGroup.cpp
> @@ +775,5 @@
> > +    *aParentLoadGroup = nullptr;
> > +    nsCOMPtr<nsILoadGroup> parent = do_QueryReferent(mParentLoadGroup);
> > +    if (!parent)
> > +        return NS_OK;
> > +    *aParentLoadGroup = parent.forget().get();
> 
> Are you sure you want to forget here?  Most of the other weak accessors I
> see in the tree just seem to return w/o forget call.  See
> nsDocument::GetFullscreenRoot(), and
> nsDSURIContentListener::GetParentContentListener looks similar.
> 
> Seems like we'd want to give a strong ref to callers so if some other thread
> kills off last ref we're not hosed.  But my understanding of this stuff is
> (...wait for it...) weak :)
> 

This is giving a strong ref to callers. parent is a strong ref (that's what queryReferent returns if the weakptr is still valid).. the forget() syntax is basically the same as addref, right?

> ::: netwerk/protocol/http/nsHttpChannel.cpp
> @@ +678,5 @@
> > +        return;
> > +
> > +    nsCOMPtr<nsILoadGroup> rootLoadGroup;
> > +    childLoadGroup->GetRootLoadGroup(getter_AddRefs(rootLoadGroup));
> > +    if (!rootLoadGroup)
> 
> nit: the way you've written it (and I assume the way the IDL implicitly
> specifies: maybe make explicit if that's how we want it to work?), you can't
> get null back for GetRoot, you'll just get /this/ if there's no encompassing
> LG. So this is unreachable, and couldbe replaced with MOZ_ASSERT or nothing.

right - you never get null. In this implementation any nsiloadgroupchild is also a nsiLoadGroup - so if you don't have a parent then you are the root.
Flags: needinfo?(jduell.mcbugs)
> ::: netwerk/base/src/nsLoadGroup.cpp
> @@ +775,5 @@
> > +    *aParentLoadGroup = nullptr;
> > +    nsCOMPtr<nsILoadGroup> parent = do_QueryReferent(mParentLoadGroup);
> > +    if (!parent)
> > +        return NS_OK;
> > +    *aParentLoadGroup = parent.forget().get();
> 
> Are you sure you want to forget here?  Most of the other weak accessors I
> see in the tree just seem to return w/o forget call.  See
> nsDocument::GetFullscreenRoot(), and
> nsDSURIContentListener::GetParentContentListener looks similar.


so getfulscreenroot() returns a weak ptr.. I think anyhow. its not documented either way, but that's what it seems to be doing.

getparentcontentlistener() is just like my new method. it is a xpcom getter_addrefs() style of interface and works just like what I have implemented here. we use slightly different syntax but in each case a reference is added for the caller. the .forget.get() idiom is used in a few places to implement the "back-end" of these GetFoo(getter_addrefs(smartptr)) interfaces. I like it more than the explicit ADDREF.
> the forget() syntax is basically the same as addref.

Yes you're right.  Sorry for the wild goose chase about weak refptrs.

> I'd like to make sure you agree with this reasoning

I get the part about how we don't want the imgLoader's LG to be part of the root doc's LG for purposes of notifying when the page load is complete.  The change to use the new IDL definitely fixes this bug.  I was just curious about whether this change affects page cancellation.  My understanding from looking at the code is that canceling a LG cancels all the requests in it (including other loadGroups), so adding the imgLoader LG to the parent LG as you did in the code that caused this bug should also have causes image loads to be cancelled when a root document's LG was cancelled.  And before that (and once this patch lands), we didn't/won't cancel the image loads.  Is that all well and good?  It would be nice to cancel image loads we don't need.  But since the whole point of the imgLoader is to share image loads across documents, perhaps it wouldn't be safe to cancel them w/o knowing there's no other doc that needs them.  If you don't know the answer, that's not the end of the world--we're just reverting to the way things were AFAICT.
Flags: needinfo?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #9)
>  If you don't know the answer, that's
> not the end of the world--we're just reverting to the way things were AFAICT.

I have a lot of theories, but I don't want to say something I don't fully understand. By observation I can say that we do see to do the cancels where I would expect it. For example, closing the tab kills the ever-loading-animated-jpg in this bug.

I can't put my finger quickly on where that cancel happens. I suppose it might just be a dropped reference to the channel, but I don't think so.

The image request proxy creates a nsirequest that gets added to the parent load group, so it is getting the notifications that way - even if the actual loading channel is in a separate group.
(In reply to Jason Duell (:jduell) from comment #6)

> So why not just make nsILoadGroupChild inherit from nsILoadGroup, and then
> childLoadGroup just becomes /this/? 


for the comment record, this is really because we need to backport this to fix the regression and I thought the QI to a new interface was less impactful than changing the object hierarchy for existing interfaces
Attached patch patch v2Splinter Review
Attachment #735144 - Flags: superreview?(bzbarsky)
Attachment #735144 - Flags: review+
Comment on attachment 735144 [details] [diff] [review]
patch v2

What happens if parentLoadGroup is set when childLoadGroup.loadGroup is already set, or vice versa?  The interface should document that.  And the implementations should do it.

And maybe worth making clear that you mean "weak reference" in the nsWeakPtr sense.

>+    *aParentLoadGroup = parent.forget().get();

  parent.forget(aParentLoadGroup);

please.

sr=me with those fixed
Attachment #735144 - Flags: superreview?(bzbarsky) → superreview+
Attachment #733978 - Attachment is obsolete: true
Comment on attachment 735144 [details] [diff] [review]
patch v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 792438
User impact if declined: progress indicator never stops for pages containing multipart images
Testing completed (on m-c, etc.): inbound
Risk to taking this patch (and alternatives if risky): pretty low. primary impact of this patch is to revert overloaded code in 792438 and replace it with single purpose code.
String or IDL/UUID changes made by this patch: there is a brand new IDL added, but no changes to existing interfaces are made.
Attachment #735144 - Flags: approval-mozilla-beta?
Attachment #735144 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/6b893ed61f92
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Patrick McManus [:mcmanus] from comment #15)
> Comment on attachment 735144 [details] [diff] [review]
> patch v2
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): 792438
> User impact if declined: progress indicator never stops for pages containing
> multipart images
> Testing completed (on m-c, etc.): inbound
> Risk to taking this patch (and alternatives if risky): pretty low. primary
> impact of this patch is to revert overloaded code in 792438 and replace it
> with single purpose code.
> String or IDL/UUID changes made by this patch: there is a brand new IDL
> added, but no changes to existing interfaces are made.

We are comfortable uplifting this until aurora & let it ride the trains from that point.This regression has been there since Fx20 and we have not heard any outside reports so far.This Bug does not seem to break anything as such and we can leave it wontfixed on Fx21.Also adding a new interface seems slightly risky to get uplifted on Beta.

Feel free to renominate if I've got anything wrong above or there is a strong reason for us to take this in Beta 3.
Attachment #735144 - Flags: approval-mozilla-beta?
Attachment #735144 - Flags: approval-mozilla-beta-
Attachment #735144 - Flags: approval-mozilla-aurora?
Attachment #735144 - Flags: approval-mozilla-aurora+
Depends on: 861595
Using the latest Nightly( http://hg.mozilla.org/mozilla-central/rev/50ab959f4bd1 ), i can still reproduce this on http://whitecity.dvrdns.org/cgi-bin/guestimage.html

STR : 

1. open the URL
2. The site may/may not keep on loading.
3. In case it loads, drag and drop the image to the current tab.( This will open the image in the current tab.)
4. Click on the "Back" button.

5. The original URL opens, and the page keeps on loading.

Caveat : I have a poor ISP.
The above STR only happens on the third attempt. So the  slightly modified STR is :

Create fresh profile.

1. open the URL
2. The site may/may not keep on loading.
3. In case it loads, drag and drop the image to the current tab.( This will open the image in the current tab.)
4. Click on the "Back" button.
5. Close Firefox.
6. Open firefox. 

Repeat these STR 3 times. On the third try does the "loading forever" happens. 

So i thought this might be related to the disk cache. To test, i disabled the disk cache (from options->preference->advanced->network) , and repeated the STR 3 times.
And I could not repro the bug this time.
Blocks: 864014
Depends on: 862816
(In reply to bhavana bajaj [:bajaj] from comment #17)
> (In reply to Patrick McManus [:mcmanus] from comment #15)
> > Comment on attachment 735144 [details] [diff] [review]
> > patch v2
> > 
> > [Approval Request Comment]
> > Bug caused by (feature/regressing bug #): 792438
> > User impact if declined: progress indicator never stops for pages containing
> > multipart images
> > Testing completed (on m-c, etc.): inbound
> > Risk to taking this patch (and alternatives if risky): pretty low. primary
> > impact of this patch is to revert overloaded code in 792438 and replace it
> > with single purpose code.
> > String or IDL/UUID changes made by this patch: there is a brand new IDL
> > added, but no changes to existing interfaces are made.
> 
> We are comfortable uplifting this until aurora & let it ride the trains from
> that point.This regression has been there since Fx20 and we have not heard
> any outside reports so far.This Bug does not seem to break anything as such
> and we can leave it wontfixed on Fx21.Also adding a new interface seems
> slightly risky to get uplifted on Beta.
> 
> Feel free to renominate if I've got anything wrong above or there is a
> strong reason for us to take this in Beta 3.


See bug 864014 , I think it is necessary to reconsider fixing in Firefox21beta.
I request to track 21.
Keywords: verifyme
I've reproduced this issue on Nightly 20.0a1 (Build ID: 20121209030817) when loading http://www.oadm.cat/en/content/61/live-webcams.htm URL.

Verified with FF 23 beta 7 (Build ID: 20130718163513) and latest Nightly (Build ID: 20130717030207) on: Mac OS X 10.7, Windows 7 x64, Ubuntu 13.04 x64;
Result: the page is properly displayed; no progress indicator spinning nor status display continuously showing; but when I hit refresh button, the URL is continuously loading. I consider that this issue is not completely fixed; 

Any thoughts?
QA Contact: alexandra.lucinet
(In reply to Alexandra Lucinet [QA] from comment #23)
> when I hit refresh button, the URL is continuously loading. 
> I consider that this issue is not completely fixed; 

Any thoughts on this Patrick?
Flags: needinfo?(mcmanus)
I tried the repro from comment 23 and this is wfm on nightly 27.0a1 (2013-10-22)
Flags: needinfo?(mcmanus)
Calling this verified fixed. Please open a follow-up bug if you have further issues.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: