Follow-up: favicon task chaining code blows up under massive load

RESOLVED FIXED in Firefox 27

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: rnewman, Assigned: ckitching)

Tracking

({intermittent-failure})

Trunk
Firefox 27
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox25 unaffected, firefox26 unaffected, firefox27 fixed, firefox-esr24 unaffected)

Details

(Whiteboard: [qa-])

Attachments

(1 attachment, 2 obsolete attachments)

Not unexpected, but when running reftests (and thus loading 2,500 pages as fast as possible) the favicon chaining code ends up with a really deep task stack, resulting in OOM or stack exhaustion when it finally gets to run.

This is indicative of several issues:

* The use of recursion here isn't a good one. Works fine in real life, but...
* There seems to be some competition between recording new tasks and being able to actually deliver.

E.g.,

https://tbpl.mozilla.org/php/getParsedLog.php?id=29022891&full=1&branch=fx-team
In this patch, the mChainee atomic reference is replaced by a linked list of references.

When a task starts and finds another task to load the same URL is underway, it will as before call that task's chainTasks method.
This method simply adds the caller the the linked list. The thread-safety is assured by the fact that the lock on loadsInFlight is always held when this method is called, as before.
Unlike before, the chaining task does not update loadsInFlight to point to itself - the oldest in-flight task remains in the hashmap to allow other tasks to add themselves to the list as required.

In onPostExecute, the task which has been chained from locks loadsInFlight, removes itself, and then calls processResult on all tasks in the mChainee list.
I believe it to be safe to do the latter part without holding the lock, as the removal of this task from the hashmap while holding the lock ensures any would-be chainees are either already chained or about to be rejected by the time flow reaches that point.

I've dumped a try run here:
https://tbpl.mozilla.org/?tree=Try&rev=4ee6fd2df7d9

To get the "Is it broken" ball rolling slightly ahead of the review.
Attachment #816288 - Flags: review?(rnewman)
Typo.
Attachment #816288 - Attachment is obsolete: true
Attachment #816288 - Flags: review?(rnewman)
Attachment #816289 - Flags: review?(rnewman)
Comment on attachment 816289 [details] [diff] [review]
V2 - Chain tasks without causing stack overflows.

Review of attachment 816289 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/favicons/LoadFaviconTask.java
@@ +61,5 @@
>      private final boolean mOnlyFromLocal;
>  
>      // Assuming square favicons, judging by width only is acceptable.
>      protected int mTargetWidth;
> +    private LinkedList<LoadFaviconTask> mChainees = new LinkedList<LoadFaviconTask>();

Lazy init?

@@ +319,4 @@
>          processResult(image);
> +
> +        // Share the result with any tasks that chained off this one.
> +        // We perform an emptyness check before we go to the trouble of constructing the iterator,

You repeat this comment below. Remove it here.

@@ +324,5 @@
> +
> +        synchronized (loadsInFlight) {
> +            // Prevent any other tasks from chaining on this one.
> +            loadsInFlight.remove(mFaviconUrl);
> +        }

Let's do this immediately after line 316, where we put the result in the cache. My reasoning: we want to try to grab this lock as soon as we can, to avoid us being cockblocked by a new task waiting to chain. Am I wrong?

@@ +328,5 @@
> +        }
> +
> +        // Since any update to mChainees is done while holding the loadsInFlight lock, once we reach
> +        // this point no further updates to that list can possibly take place (As far as other tasks
> +        // are concerned, there is no longer a task to chain from. The above block will have waited

Missing closing paren.

@@ +336,5 @@
> +        // This is nice - we do not want to take the lock unless we have to anyway, and chaining rarely
> +        // actually happens outside of the strange situations unit tests create.
> +
> +        // Share the result with all chained tasks.
> +        // We perform an emptyness check before we go to the trouble of constructing the iterator,

emptiness.

@@ +338,5 @@
> +
> +        // Share the result with all chained tasks.
> +        // We perform an emptyness check before we go to the trouble of constructing the iterator,
> +        // since the vast majority of the time the size will be zero.
> +        if (!mChainees.isEmpty()) {

If lazy-inited, this becomes a null check.

@@ +340,5 @@
> +        // We perform an emptyness check before we go to the trouble of constructing the iterator,
> +        // since the vast majority of the time the size will be zero.
> +        if (!mChainees.isEmpty()) {
> +            for (LoadFaviconTask t : mChainees) {
> +                t.processResult(image);

We don't actually use onPostExecute in the chaining case, so cancel the task.

@@ +365,5 @@
>  
>          synchronized(loadsInFlight) {
> +            // Only remove from the hashmap if the task there is the one that's being canceled.
> +            // Cancellation of a task that would have chained is not interesting to the hashmap.
> +            if (loadsInFlight.get(mFaviconUrl) == this) {

We also want to remove ourselves from the chain we're in, which should be an `else` clause here.

Something like:

  LoadFaviconTask primary = loadsInFlight.get(mFaviconUrl);
  if (primary == this) {
    loadsInFlight.remove(mFaviconUrl);
    return;
  }
  primary.mChainees.remove(this);

Unless, that is, we use the canceled mechanism as a way to indicate inactivity -- that way the async task executor won't run us.
(In reply to Richard Newman [:rnewman] from comment #3)
> ::: mobile/android/base/favicons/LoadFaviconTask.java
> @@ +61,5 @@
> >      private final boolean mOnlyFromLocal;
> >  
> >      // Assuming square favicons, judging by width only is acceptable.
> >      protected int mTargetWidth;
> > +    private LinkedList<LoadFaviconTask> mChainees = new LinkedList<LoadFaviconTask>();
> 
> Lazy init?

Duh!

> @@ +324,5 @@
> > +
> > +        synchronized (loadsInFlight) {
> > +            // Prevent any other tasks from chaining on this one.
> > +            loadsInFlight.remove(mFaviconUrl);
> > +        }
> 
> Let's do this immediately after line 316, where we put the result in the
> cache. My reasoning: we want to try to grab this lock as soon as we can, to
> avoid us being cockblocked by a new task waiting to chain. Am I wrong?

Other tasks can continue to add themselves to the chainee list until this block runs. The more that do so the better - less duplication of work.
Sure, we might have to wait for that to happen before we run here. That's a good thing.
Having loads and loads of things in the chain is, always, a good thing - we're doing a small amount of work instead of going to the network/database! Yay! It's only problematic if this implies linear stack usage and causes things to blow up, as they did before.

> @@ +328,5 @@
> > +        }
> > +
> > +        // Since any update to mChainees is done while holding the loadsInFlight lock, once we reach
> > +        // this point no further updates to that list can possibly take place (As far as other tasks
> > +        // are concerned, there is no longer a task to chain from. The above block will have waited
> 
> Missing closing paren.

That'll teach me for turning off my IDE's comment bracket matching inspection. :P

> @@ +336,5 @@
> > +        // This is nice - we do not want to take the lock unless we have to anyway, and chaining rarely
> > +        // actually happens outside of the strange situations unit tests create.
> > +
> > +        // Share the result with all chained tasks.
> > +        // We perform an emptyness check before we go to the trouble of constructing the iterator,
> 
> emptiness.

I'm not very good at nihilism.

> 
> @@ +338,5 @@
> > +
> > +        // Share the result with all chained tasks.
> > +        // We perform an emptyness check before we go to the trouble of constructing the iterator,
> > +        // since the vast majority of the time the size will be zero.
> > +        if (!mChainees.isEmpty()) {
> 
> If lazy-inited, this becomes a null check.

Aye. That it does.
Attachment #816289 - Attachment is obsolete: true
Attachment #816289 - Flags: review?(rnewman)
Attachment #816296 - Flags: review?(rnewman)
Attachment #816296 - Flags: review?(rnewman) → review+
https://hg.mozilla.org/integration/fx-team/rev/f42cbf375294
Whiteboard: [fixed in fx-team][qa-]
Target Milestone: --- → Firefox 27
Pushed a follow-up for a missing null check.

https://hg.mozilla.org/integration/fx-team/rev/80f6c958c74c
https://hg.mozilla.org/mozilla-central/rev/f42cbf375294
https://hg.mozilla.org/mozilla-central/rev/80f6c958c74c
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in fx-team][qa-] → [qa-]
Chris/Richard, can one of you please nominate this for uplift to whatever branches would be appropriate?
This is a follow-up to Bug 914296, which is only in 27. Thanks for checking, Ryan!
You need to log in before you can comment on or make changes to this bug.