Closed Bug 1042192 Opened 5 years ago Closed 5 years ago

GIFs get stuck mid-animation

Categories

(Core :: Networking: Cache, defect, major)

33 Branch
x86_64
Windows 8.1
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox32 --- unaffected
firefox33 + fixed
firefox34 + fixed

People

(Reporter: simon+mozilla, Assigned: michal)

References

Details

(Keywords: regression, reproducible)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 (Beta/Release)
Build ID: 20140722030201

Steps to reproduce:

1. go to http://wifflegif.com
2. at the bottom click next
3. close the tab
4. repeat 1. & 2.


Actual results:

Loading the next page the second time will produce stuck GIFs.
Throbber on the tab will spin forever and a 'Read domain.tld' for one of the stuck GIFs appears in the status bar overlay.

This has been happening for a couple of weeks now on the nightly channel.
It only seems to happen on cached GIFs, first load is always fine.
It only seems to happen on pages with multiple GIFs or lots of content, I couldn't repro with just one GIF.


Expected results:

Animated GIFs should play fine, no matter what.
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0

I can reproduce this behavior.
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0 (Beta/Release)
Build ID: 20140722030201

I can reproduce this on linux
I am unable to reproduce this on Windows 7 (34.0a1 (2014-07-22)) or Ubuntu (34.0a1 (2014-07-22), Mozilla/5.0 (X11; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0).
[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

Regression window(m-c)
Good:
https://hg.mozilla.org/mozilla-central/rev/aab3362f97e9
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 ID:20140612142126
Bad:
https://hg.mozilla.org/mozilla-central/rev/adcf3f05f813
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 ID:20140612172426
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aab3362f97e9&tochange=adcf3f05f813


Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8c2c61229a3
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 ID:20140612090433
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/085ecf37395e
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 ID:20140612090732
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e8c2c61229a3&tochange=085ecf37395e

Regressed by: Bug 1013587, Bug 1013638
Blocks: 1013587, 1013638
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: Cache
Ever confirmed: true
Keywords: regression
Product: Firefox → Core
Version: 34 Branch → 33 Branch
Severity: normal → major
Keywords: reproducible
In local build,
Last Good: e8c2c61229a3
First Bad: 8f03e0ef5809

Triggered by:
 	8f03e0ef5809	Michal Novotny — Bug 1013638 - CacheFileIn(Out)putStream::AsyncWait() doesn't respect eventTarget argument, r=honzab
No longer blocks: 1013587
Assignee: nobody → michal.novotny
I finally understand the problem and I'm not sure whether it is a bug in nsThreadPool code or whether we use event targets in a wrong way in the cache code. The problem is that nsThreadPool runs only events that were dispatched via its Dispatch() method. When such event calls some code that dispatches another event to a target obtained using NS_GetCurrentThread(), this event won't run until nsThreadPool::Run() finishes, i.e. the thread is idle for a specified time and then shut down.

In the cache code we use the current thread for the internal callbacks, e.g. CacheFileInputStream calls CacheFile::GetChunkLocked() and when the chunk doesn't exist yet it gets the result asynchronously via CacheFileInputStream::OnChunkAvailable() on the same thread, so when the input stream is used on a thread from a thread pool, the callback is called with a huge delay.

I want to make sure this is an expected behavior that shouldn't be fixed before I rewrite completely event targets usage in the new cache.
Flags: needinfo?(nfroyd)
(In reply to Michal Novotny (:michal) from comment #6)
> I finally understand the problem and I'm not sure whether it is a bug in
> nsThreadPool code or whether we use event targets in a wrong way in the
> cache code. The problem is that nsThreadPool runs only events that were
> dispatched via its Dispatch() method. When such event calls some code that
> dispatches another event to a target obtained using NS_GetCurrentThread(),
> this event won't run until nsThreadPool::Run() finishes, i.e. the thread is
> idle for a specified time and then shut down.
> 
> I want to make sure this is an expected behavior that shouldn't be fixed
> before I rewrite completely event targets usage in the new cache.

This is unintuitive and unfortunate, but it does look like the expected behavior.

I don't know whether changing it would be OK; several minutes of thought to how you'd do it and preserve current behavior seems to bring up any number of issues.
Flags: needinfo?(nfroyd)
Attached patch patch v1Splinter Review
The patch 
- moves all callbacks from CacheFileIOManager to Cache I/0 thread.
- fixes some badly written tests which started to fail with this change
- fixes a problem with unwritten data (see comment in CacheFile::OnChunkWritten)

https://tbpl.mozilla.org/?tree=Try&rev=32f21a286f90
Attachment #8470764 - Flags: review?(jduell.mcbugs)
Attachment #8470764 - Flags: review?(jduell.mcbugs) → review?(sworkman)
Comment on attachment 8470764 [details] [diff] [review]
patch v1

Review of attachment 8470764 [details] [diff] [review]:
-----------------------------------------------------------------

Code looks good; looking at the test scripts now...

::: netwerk/cache2/CacheFile.cpp
@@ +1613,5 @@
> +  if (!item->mTarget) {
> +    LOG(("CacheFile::QueueChunkListener() - Cannot get Cache I/O thread! Using "
> +         "main thread for callback."));
> +    item->mTarget = do_GetMainThread();
> +  }

Would it be useful to have IOTarget() default to the main thread? Do we want to guarantee that it returns something?

::: netwerk/cache2/CacheFileChunk.cpp
@@ +200,5 @@
>  
>    DoMemoryReport(MemorySize());
>  
>    rv = CacheFileIOManager::Read(aHandle, mIndex * kChunkSize, mRWBuf, aLen,
> +                                this);

Nit: put 'this' on the previous line?
Comment on attachment 8470764 [details] [diff] [review]
patch v1

Review of attachment 8470764 [details] [diff] [review]:
-----------------------------------------------------------------

OK looks good to me. Just check that query about IOTarget defaulting to the main thread. Otherwise, r=me.
Attachment #8470764 - Flags: review?(sworkman) → review+
(In reply to Steve Workman [:sworkman] from comment #9)
> Would it be useful to have IOTarget() default to the main thread? Do we want
> to guarantee that it returns something?

Probably not. There are places where we expect that IOTarget() returns IO thread and we assert or fail when nullptr is returned.


> ::: netwerk/cache2/CacheFileChunk.cpp
> >    rv = CacheFileIOManager::Read(aHandle, mIndex * kChunkSize, mRWBuf, aLen,
> > +                                this);
> 
> Nit: put 'this' on the previous line?

The line would exceed 80 characters.
Sounds a bit like bug 1034309.
See Also: → 1034309
https://hg.mozilla.org/mozilla-central/rev/6601c86da4c2
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Does this need uplift to Aurora33?
This needs to land in the same version as 1013638, so yes.
Comment on attachment 8470764 [details] [diff] [review]
patch v1

Approval Request Comment
[Feature/regressing bug #]: 1013638
[User impact if declined]: very slow loading of resources from the cache
[Describe test coverage new/current, TBPL]: tested manually
[Risks and why]: low 
[String/UUID change made/needed]: none
Attachment #8470764 - Flags: approval-mozilla-aurora?
Attachment #8470764 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This needs rebasing for Aurora uplift.
Flags: needinfo?(michal.novotny)
Attached patch patch for auroraSplinter Review
Flags: needinfo?(michal.novotny)
You need to log in before you can comment on or make changes to this bug.