Closed Bug 969417 Opened 6 years ago Closed 6 years ago

crash in java.util.ConcurrentModificationException: at java.util.LinkedList$LinkIterator.remove(LinkedList.java)

Categories

(Firefox for Android :: General, defect, critical)

29 Branch
All
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 30
Tracking Status
firefox29 --- fixed
firefox30 --- fixed

People

(Reporter: aaronmt, Assigned: ckitching)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 7 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-946b28ea-fc6c-4f81-bf32-4b3272140206.
=============================================================

java.util.ConcurrentModificationException
	at java.util.LinkedList$LinkIterator.remove(LinkedList.java:167)
	at java.util.LinkedList.removeOneOccurrence(LinkedList.java:837)
	at java.util.LinkedList.removeFirstOccurrenceImpl(LinkedList.java:830)
	at java.util.LinkedList.remove(LinkedList.java:665)
	at org.mozilla.gecko.favicons.cache.FaviconCache.recordRemoved(FaviconCache.java:490)
	at org.mozilla.gecko.favicons.Favicons.putFaviconsInMemCache(Favicons.java:309)
	at org.mozilla.gecko.favicons.Favicons.putFaviconsInMemCache(Favicons.java:313)
	at org.mozilla.gecko.favicons.LoadFaviconTask.pushToCacheAndGetResult(LoadFaviconTask.java:431)
	at org.mozilla.gecko.favicons.LoadFaviconTask.doInBackground$2d4c763b(LoadFaviconTask.java:354)
	at org.mozilla.gecko.favicons.LoadFaviconTask.doInBackground$42af7916(LoadFaviconTask.java:42)
	at org.mozilla.gecko.util.UiAsyncTask$BackgroundTaskRunnable.run(UiAsyncTask.java:48)
	at android.os.Handler.handleCallback(Handler.java:733)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:136)
	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:32)
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
I think this is as simple as not taking the reordering semaphore in some cases where we touch mOrder. Patch forthcoming.
Attachment #8372463 - Flags: review?(chriskitching)
I don't think you've got the right idea here. This is probably a testament to insufficient documentation about the reordering semaphore (So certainly add that to your patch if what follows turns out to be correct).

recordRemoved needs to hold the write lock when it is called. See the comment above the function. But why?

The intent of the reordering semaphore is to allow readers to update the most-recently-used list without needing to take the write lock (Which locks all other readers out, because MRSW). Clearly, an MRSW scheme in which every read implies a write is pretty unhelpful!
So - readers can modify the ordering list, but such modifications are guarded in a fine-grained way by the reordering semaphore -> A reader can never concurrently modify the reordering list with anothe reader.
It is not necessary for writer to take the reordering semaphore if it wants to update the ordering list because MRSW (Only one writer can ever coexist, and it will never coexist with a reader) -> A writer will never concurrently modify the ordering list with *anything*.

At least, that's the plan! Now to find where I screwed up the execution...

Consider:
http://hg.mozilla.org/mozilla-central/annotate/d05c721ea1b0/mobile/android/base/favicons/cache/FaviconCache.java#l257

This block allows recordRemoved to be called with *no* locks at all (In the event that an exception is thrown in the try - it'll finish the read txn and not acquire any more locks before carrying on.)
That looks like a possible culprit.

I'll continue reading - posting early to exhibit signs of life (and hopefully deliver useful information)
Comment on attachment 8372463 [details] [diff] [review]
Take reordering semaphore before modifying favicon cache order list.

Review of attachment 8372463 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Chris Kitching [:ckitching] from comment #3)
> Consider:
> http://hg.mozilla.org/mozilla-central/annotate/d05c721ea1b0/mobile/android/
> base/favicons/cache/FaviconCache.java#l257
> 
> This block allows recordRemoved to be called with *no* locks at all (In the
> event that an exception is thrown in the try - it'll finish the read txn and
> not acquire any more locks before carrying on.)
> That looks like a possible culprit.

My failure to read knows no bounds. There's a return statement literally the line above the one I linked. *facepalm*.

So *that's* not the fault.

This is very strange. This patch shouldn't be necessary, for reasons that you hopefully understand. It might work as a bandaid, but I'd like to find the underlying problem. In theory, this problem shouldn't happen, so there's gotta be a subtle implementation bug somewhere...

I'll go stare at it for a while.

::: mobile/android/base/favicons/cache/FaviconCache.java
@@ +486,5 @@
>          }
>  
>          int sizeRemoved = 0;
>  
> +        mReorderingSemaphore.acquireUninterruptibly();

Again, if you're holding the write lock (as per the precondition on line 476) then you've implicitly locked *everything* and so you need not do this.

@@ +647,5 @@
>          }
>  
>          startWrite();
>          try {
> +            mReorderingSemaphore.acquireUninterruptibly();

Not necessary - starting a write locks *everyone* else out (Readers and writers) (because MRSW), so writers don't need to take this semaphore. So this does nothing.

@@ +675,5 @@
>          // Note that we neither clear, nor track the size of, the permanent map.
>          try {
>              mCurrentSize.set(0);
>              mBackingMap.clear();
> +            mOrdering.clear();       // This should take the reordering semaphore.

No it shouldn't - same reason as above.
Attachment #8372463 - Flags: review?(chriskitching)
Slightly out of scope, but I spotted this - putFailed won't release the write lock in the case of an exception, which could lead to deadlock. That said, the only exception it might throw would be an NPE, which would only occur if there's some bug in the body of the function or its dependancies.
So. Yes. Probably pointless future-proofing. I'll stop hijacking your bug now.


It's not obvious why this crash is occuring. As far as I can tell, every modification to mOrdering either happens in a write transaction (So nothing else can be concurrently executing) or happens in a read transaction guarded by the reordering semaphore (So cannot run concurrently with any other modification of mOrdering, or with a write transaction).

So, either there's a bug in the implementation of the start/stopWrite functions (and friends - I name the lock upgrading function as a likely culprit, since that's the slightly nonstandard part of the implementation (and, indeed, why I didn't just use ReentrantReadWriteLock)), there's a bug in Android's implementation of Semaphore/LinkedList (Not likely), or I've missed something (Probably).

I realise this is a lot of words and not a lot of help. Since a given ConcurrentModificationException can generally be thrown from two places (subject to scheduling propeties and whatnot), it might be interesting to wait for another crash to appear showing us the other side of the conflict (If we remain unable to determine it by inspection).
> So, either there's a bug in the implementation of the start/stopWrite
> functions (and friends - I name the lock upgrading function as a likely
> culprit, since that's the slightly nonstandard part of the implementation
> (and, indeed, why I didn't just use ReentrantReadWriteLock)), there's a bug
> in Android's implementation of Semaphore/LinkedList (Not likely), or I've
> missed something (Probably).

I can only see two holes. One, based purely on speculation (no concrete error case in mind), is that 

    private void finishWrite() {
        mTurnSemaphore.release();
        mWriteLock.release();
    }

might, perhaps, maybe, ought to have its operations reversed.

The second is, indeed, upgrade.

There is a window between discarding the read lock and acquiring the write lock.

In that window we can get another reader:

A: (already is a reader)
B: acquires turn semaphore, discards it
A: acquires turn semaphore
A: decrements ongoing reads, gets 0
B: increments ongoing reads, gets 1

Scenario 1:
B: ... can now acquire the write lock and read
A: ... can release that write lock! Oops!

Scenario 2:
A: releases the write lock
B: acquires it
A: blocks on acquiring it, waiting for B to finish reading.


The consequence of this: `A` thinks that its upgrade from reader to writer is atomic, but in fact arbitrary operations can occur inside that window. Any assumptions it has made prior to that call are invalidated.

Could you check my working?
Flags: needinfo?(chriskitching)
(In reply to Richard Newman [:rnewman] from comment #6)
>     private void finishWrite() {
>         mTurnSemaphore.release();
>         mWriteLock.release();
>     }
> 
> might, perhaps, maybe, ought to have its operations reversed.

This one I did spot, and I'm pretty sure it doesn't matter.
So, the possible strangeness can occur if we context switch after releasing the turn semaphore but before we release the write semaphore, I assume.
If startWrite jumps in, it'll take the turn semaphore and then block until we come back and release the write semaphore for it.
If startRead jumps in, because we're still holding the write semaphore this read absolutely must be the first concurrently-executing read, so onGoingReads == 0, so it'll try to take the write semaphore, and again wait for finishWrite to come back and release it.
upgradeReadToWrite cannot possibly jump in at this point, because, since no reads can coexist with a write, and we're still holding the write semaphore, and we've established that startRead can't complete until we finish our finishWrite, there cannot possibly be a read underway to upgrade to a write, if you see what I mean.

The turn semaphore really only exists to ensure fairness - the system would work just fine if you deleted it, but you'd suffer from writer starvation if there are lots of readers. The turn semaphore ensures that jobs get run approximately in arrival-order. Without it, writes have to wait until there are no more reads before they get a turn (reads would never queue up behind a write).
For our use-case, I thought the fairness seemed like a sensible thing to have.

> The second is, indeed, upgrade.
> 
> There is a window between discarding the read lock and acquiring the write
> lock.
> 
> In that window we can get another reader:
> 
> A: (already is a reader)
> B: acquires turn semaphore, discards it
> A: acquires turn semaphore
> A: decrements ongoing reads, gets 0
> B: increments ongoing reads, gets 1
> 
> Scenario 1:
> B: ... can now acquire the write lock and read
> A: ... can release that write lock! Oops!
> 
> Scenario 2:
> A: releases the write lock
> B: acquires it
> A: blocks on acquiring it, waiting for B to finish reading.
> 
> 
> The consequence of this: `A` thinks that its upgrade from reader to writer
> is atomic, but in fact arbitrary operations can occur inside that window.
> Any assumptions it has made prior to that call are invalidated.

Ah yes, this has consistency problems.
You might read, decide you want to write something based on that, upgrade to a writer, and not notice that while you were upgrading another writer came along and changed the stuff you read, so your decision for what to read is now invalid.
That is certainly a flaw, but I don't think that can actually create a ConcurrentModificationException?
It's also not impossible that this is a flaw that, more through luck than judgement, doesn't affect our particular implementation.

The fix, in any case, seems not too complicated. Since this is obtuse 1960s concurrency control, this probably means I've got the wrong idea. (Taking a backwards step for a moment and asking the question "Why are we using a concurrency primitive invented by an angry Dutchman in the 60s?":
What we really want is an MRSW scheme with lock upgrading. This really doesn't *need* Semaphores, it needs locks. Java provides two almost-but-not-quite useful classes:
ReentrantReadWriteLock - Provides fair/non-fair MRSW with no support for lock upgrading. Retrofitting lock upgrading falls foul of a similar problem to the one I, ironically enough, implemented with Semapores. Grand.
ReentrantLock - A reentrant lock - it can be acquired repeatedly, so is exactly as annoying to use as a Semaphore for this purpose, but with higher overheads.
Plus, Semaphores are cool.)

Could not the upgradeReadToWrite function be rephrased as:

    private void upgradeReadToWrite() {
        mTurnSemaphore.acquireUninterruptibly();
        if (mOngoingReads.decrementAndGet() != 0) {
            mWriteLock.acquireUninterruptibly();
        }
    }

Remembering that the first reader takes the write lock to keep writers out and the last reader releases it to let them back in, if the read we are upgrading to write is the last concurrently-executing one, it already *has* the write lock, so needs only acquire the turn semaphore to ensure fairness and it's free to go.
Otherwise, it has to wait for the ongoing reads to conclude before taking the write semaphore from them. Having taken the turn semaphore, the fairness property ensures no other reads can enter while we're waiting.
That'd probably fix the problem you spotted? Or did I miss something obvious? I should probably have breakfast before thinking about semaphores.
Flags: needinfo?(chriskitching)
(In reply to Chris Kitching [:ckitching] from comment #7)

> Could not the upgradeReadToWrite function be rephrased as:
> 
>     private void upgradeReadToWrite() {
>         mTurnSemaphore.acquireUninterruptibly();
>         if (mOngoingReads.decrementAndGet() != 0) {
>             mWriteLock.acquireUninterruptibly();
>         }
>     }
> 
> Remembering that the first reader takes the write lock to keep writers out
> and the last reader releases it to let them back in, if the read we are
> upgrading to write is the last concurrently-executing one, it already *has*
> the write lock, so needs only acquire the turn semaphore to ensure fairness
> and it's free to go.

My parallel reasoning, laid out here in case it helps with clarity:

A reader can upgrade to being a writer when there are no other readers or writers.

The former is determined by the reader counter -- if we decrement it and it ends up as zero, we were the last, and we have the write lock (by virtue of any reads having been begun), and there are no other readers.

The latter is always true because readers and writers cannot simultaneously exist.

As such, no work needs to be done to lock out writers during an upgrade; that we were reading is sufficient.

But we do need to lock out readers during the upgrade. When a new reader arrives, it attempts to take the turn semaphore. The turn semaphore is held for the duration of a write, and is sufficient to lock out readers.

This leads to your phrasing: take the semaphore to lock out new readers, decrement the reader counter, and if there are still concurrent readers, wait for the last one to release the read lock.

I see two pitfalls here:

* Lock re-entrancy. If this is being called on the same thread that initially took the lock, and the lock acquisition succeeds as a result, confusing things will happen. This would happen if a read begins on thread A, another read begins on thread B, and thread A attempts to upgrade. This locking system needs to block here regardless of which thread initially acquired the lock.

* Queued writes. Imagine reader A, reader B, potential writer C. Writer C arrives, calls startWrite, and begins waiting for the turn semaphore. If there are no writers it'll get it immediately, and then wait for the write lock. Reader A attempts to upgrade, and blocks on the turn semaphore. 

Writer C has the turn semaphore and is waiting for the write semaphore.
Reader A is waiting for the turn semaphore.
Either Reader A or Reader B has the write semaphore.

Either Reader A or Reader B can release the write semaphore (depending on who finishes reading first), but only when the number of readers reaches zero. Reader A won't decrement that counter until it has the turn semaphore, so even if Reader B finishes reading, it'll never release the write semaphore.

Deadlock.

I haven't done the analysis to determine whether this deadlock exists in the current implementation (I have a hunch that it does).

The solution might lie in allowing readers to simultaneously discard their reader state and wait for the turn semaphore (two layers of locks!). Or it might lie in an analysis that declares that this thread arrangement (namely, more than one reader active while a writer is waiting) cannot exist -- but the existence of this bug would encourage me to apply a high bar to such an analysis.

Another approach might be to altering the relationship between the turn semaphore and counter operations. For example, what happens if we decrement without taking the turn semaphore, recording the value, and *then* acquire the turn semaphore? That would unblock a parallel reader, and the == 0 conditional in both cases should avoid double-release problems.

Thoughts?
Attached patch Part 1/3: Cleanup. (obsolete) — Splinter Review
Bear with me... All shall be explained shortlyish.
Attachment #8372607 - Attachment is obsolete: true
Attachment #8374225 - Flags: review?(rnewman)
This'll make sense shortly. Honest.
Attachment #8374227 - Flags: review?(rnewman)
Attachment #8374228 - Flags: review?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #8)

And this is why one should not try to be "clever" with concurrency. Whoops.

> I see two pitfalls here:
> 
> * Lock re-entrancy. If this is being called on the same thread that
> initially took the lock, and the lock acquisition succeeds as a result,
> confusing things will happen. This would happen if a read begins on thread
> A, another read begins on thread B, and thread A attempts to upgrade. This
> locking system needs to block here regardless of which thread initially
> acquired the lock.

Behold the reason why I used lock-flavoured semaphores, instead of one of Java's existing locking primitives.

A quick refresher - a semaphore is just a glorified atomic integer. When you "acquire" it you atomically check if its value > 0 and, if so, decrement it and continue.
If the value is zero, you join a queue and go to sleep.
When you "release" it, you atomically check if the queue is empty. If it is not, you wake the thread at the start of the queue. If it is, you increment the value. 
Java's names for the semaphore methods are sort of confusing - they make them feel lock-like, when really, they're not. At no point does anyone really "own" a semaphore. It doesn't track which thread took it (Indeed, there are no lock objects, just a counter).

tl;dr: It doesn't matter who calls wait. Semaphores are semaphores, not reentrant locks. You can make reentrant locks using semaphores (and, with some effort, semaphores with reentrant locks) - but I didn't.

In the sort of likely case that explanation hasn't made you happy, let's consider your concrete example.
A and B have both started reading. The values of the semaphores will be as follows:
mTurnSemaphore = 1
mWriteLock = 0

And the counter:
mOngoingReads = 2

We now call upradeReadToWrite on A. (Considering the new version of the function)
mTurnSemaphore.acquireUninterruptibly();  Returns immediately. mTurnSemaphore is now 0.
if (mOngoingReads.decrementAndGet() != 0) {  Well, that'll be 1, so we'll skip the block

}

Whoops! We just continued with a write when there's still ongoing reads. What we really want to do is wait until mOngoingReads is zero before continuing.
For reasons I'll explain shortly, this is stupid and pointless.

> Deadlock.

Yup! This new function of mine isn't well-written.


However, this yak smells funny.

After thinking about it some more, I believe I've figured out why none of the library implementations of MRSW I can find implement lock upgrading. It's because it's a *terrible* idea.

The simple approach to lock upgrading is to just do finishRead(); startWrite() - but this breaks consistency. Another writer can jump in between finishRead() and startWrite() and ruin what the upgrading transaction is trying to do.

So - you need an upgrade function that provides an atomic version of `finishRead(); startWrite()` - which is what we've spent the last few messages trying to come up with. This is something that can be done, but also something one shouldn't do.
Consider a case in which two readers start, then both attempt to upgrade to writers. One of them will necessarily complete its write before the other (because MRSW) - whoops. Even with an atomic upgrade function you've still got a write happening that affects the information the second transaction just read, so your consistency is broken.

So - the only way you can get what I was trying to do here to work in the way that I wanted it to work, in general, is to use a transaction system that supports rollback.

Let's not do that.


MRSW can't do lock upgrading in general. Took me a while to notice, but it just can't be done.
The implementation of the conventional MRSW methods provided (Everything except upgradeReadToWrite) is lifted straight out of a textbook, so should be sound. All we need to do is kill upgradeReadToWrite with fire.

Now, we do have some methods that want to do this "read stuff and then use what we read to decide what we want to write, if anything". I'm of the opinion that none of them can suffer badness from being rephrased to simply be "startRead(); doStuff(); stopRead(); startWrite(); doMoreStuff(); stopWrite();". While this *does* let stuff jump in and modify the structures between doing stuff and doing more stuff - this does not affect us here.
They are:

isFailedFavicon upgrades its read transaction to a write in the case that the entry has expired (to remove it from the failure cache).
A sneaky-write might remove the entry we want to remove - we don't care.
A sneaky-write might add more entries to the failed cache - we still don't care.
A sneaky-write might make this entry no longer be expired. That's annoying, we'll try to download a failed favicon again and write its failure to memory at that point. This situation is extremely unlikely (perhaps even impossible - it requires isFailedFavicon to race with the conclusion of another attempt to download that very same favicon, that failed, and did so slightly after the favicon ceased to have been considered a failure. Since failure prevents attempts to redownload until the failure cache entry is cleared, this cannot ever happen (Unless someone regresses something at some point)). It still doesn't matter.

getFaviconForDimensions upgrades its read transaction to a write in order to put a newly-calculated secondary (If there is one) into the cache. Sneaky-writes could:
Delete the container from the cache. In which case, calling addSecondary on it will succeed just fine, but be a pointless waste of effort. We don't care.
Cause the dominant colour to already be computed. So we compute it again. We don't care.
Cause an equivalent secondary to be computed in another thread and added before we then try to insert it again. We care - as well as wasting effort (we don't care. It's so rare) this can lead to two identical images coexisting in a container. There's a patch for that (Part 2). A slight hack has been used to prevent doubly-counting the size.

Finally, putFavicons should just be a write transaction all the way. It very slightly reduces concurrency (The reordering tweaks would happen in a write, not a read) - but it's a trivial performance drop and it eliminates a lot of faff.


Hopefully this is clear.

tl;dr: MRSW lock upgrading is impossible if you want consistency. Omission of lock upgrading is impossible if you want both high concurrency and consistency. The potential inconsistencies do not worry us at this time, so we can both have our cake and eat it.
Reasoning WFM. Review pending.
Attachment #8374225 - Flags: review?(rnewman) → review+
Comment on attachment 8374227 [details] [diff] [review]
Part 2/3: Make favicon cache containers resistent to multiple inserstions.

Review of attachment 8374227 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine, but you're not actually using setMostRecentlyUsed's return value...
Attachment #8374227 - Flags: review?(rnewman) → review+
Comment on attachment 8374228 [details] [diff] [review]
Part 3/3: Kill lock upgrading in FaviconCache.

Review of attachment 8374228 [details] [diff] [review]:
-----------------------------------------------------------------

Let's fold these patches down, and fix that one lock error.

::: mobile/android/base/favicons/cache/FaviconCache.java
@@ +242,5 @@
>          }  finally {
> +            finishRead();
> +            if (!isAborting && isExpired) {
> +                // No longer expired.
> +                startWrite();

If !isExpired || isAborting, we're going to return as soon as we're finished with this `finally` block. As such, I think it's equivalent -- and less confusing -- to simply call `finishRead` here, and unconditionally do the `startWrite` in the block below.

@@ +395,5 @@
>              return null;
>          } finally {
> +            finishRead();
> +            if (!isAborting && doingWrites) {
> +                startWrite();

Similarly.

@@ +573,2 @@
>  
> +            // Not using setMostRecentlyUsed, because the elements are known to be new. This can be done

Is that true?

@@ +583,5 @@
> +            } catch (Exception e) {
> +                Log.e(LOGTAG, "Favicon cache exception!", e);
> +                return;
> +            } finally {
> +                mReorderingSemaphore.release();

If an exception is raised, and caught on line 583, you'll have released the ordering semaphore but not called finishWrite() before returning.

You need an aborting flag in order to conditionally release it after the ordering semaphore, or you need to release it in the exception handler (and prove that it's fine to release write-then-reordering).
Attachment #8374228 - Flags: review?(rnewman) → review-
Assignee: rnewman → chriskitching
Blocks: 981684
(In reply to Richard Newman [:rnewman] from comment #14)
> This looks fine, but you're not actually using setMostRecentlyUsed's return
> value...

Ah yes, it seems my attempt to split the patches to make your reviewing task earlier backfired somewhat.


(In reply to Richard Newman [:rnewman] from comment #15)
> If !isExpired || isAborting, we're going to return as soon as we're finished
> with this `finally` block. As such, I think it's equivalent -- and less
> confusing -- to simply call `finishRead` here, and unconditionally do the
> `startWrite` in the block below.

Very much so. Not enough thought used when converting from the lock-upgrading logic. This lets us kill both of the flags, which is nice.
 
> @@ +573,2 @@
> >  
> > +            // Not using setMostRecentlyUsed, because the elements are known to be new. This can be done
> 
> Is that true?

Not as such, no.

Well, it's a slightly funky one. The reordering semaphore is used to guard the mOrdering list. The rule is you can take that semaphore and modify the most-recently-used list if you're currently in any sort of transaction (read or write).
This is very handy, because if you needed to be in the midst of a write to update the most recently used then every read would imply a write, which makes all this MRSW stuff kinda pointless.

So you *could* change this function to start a read, update the new entires in the ordering list, then start a write, and enter their actual records. The ensuing inconsistency would almost certainly be fine (unless the cache gets culled so aggressively that the records you haven't entered yet are deleted before you insert them, causing a crash).
But, I don't think it's worth it. The performance gain would probably be worth it if you were very often going to have many favicons being inserted. But you're not. It's just lots of complication for a tiny improvement some of the time.

So, I think the best plan is to update this confusing comment and refactor the setMostRecentlyUsed function to come in two flavours - from-write-transaction and from-read-transaction. The former need not pay the cost of taking the reordering semaphore as it is certain to be the only thread active (Provided all callers follow the rule of "Thou shalt not call this from outside a transaction" (An ifdef-like construct woul be extremely nice for making debug builds *enforce* this...)). The latter being the function as it currently stands.
That way, we get the benefit of not needing to repeatedly take and drop a semaphore when updating multiple entries, and the benefit of not needing to inline the function by hand (which was a kinda stupid thing to do, anyway).

This tidies up putFavicons considerably. And makes it faster. And less confusing. Yay.

> @@ +583,5 @@
> > +            } catch (Exception e) {
> > +                Log.e(LOGTAG, "Favicon cache exception!", e);
> > +                return;
> > +            } finally {
> > +                mReorderingSemaphore.release();
> 
> If an exception is raised, and caught on line 583, you'll have released the
> ordering semaphore but not called finishWrite() before returning.
> 
> You need an aborting flag in order to conditionally release it after the
> ordering semaphore, or you need to release it in the exception handler (and
> prove that it's fine to release write-then-reordering).

... Or I could do as described above and not be touching that semaphore here in the first place. :D

(Patches folded as requested).

I'm really kicking myself for not doing things this way from the start. This change has, as well as making the concurrency model actually correct, reduced the number of code lines in this file by 10.6%. Jackpot.
Attachment #8372463 - Attachment is obsolete: true
Attachment #8374225 - Attachment is obsolete: true
Attachment #8374227 - Attachment is obsolete: true
Attachment #8374228 - Attachment is obsolete: true
Attachment #8388956 - Flags: review?(rnewman)
A try run, to see if I broke anything obvious:
https://tbpl.mozilla.org/?tree=Try&rev=151fb936a836
Comment on attachment 8388956 [details] [diff] [review]
Reduce insanity in favicon cache concurrency.

Review of attachment 8388956 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/favicons/cache/FaviconCache.java
@@ +242,3 @@
>          try {
>              recordRemoved(mBackingMap.get(faviconURL));
>              mBackingMap.remove(faviconURL);

recordRemoved(mBackingMap.remove(faviconURL));

recordRemoved is null-safe.

---
public V remove(Object key)

Removes the mapping for the specified key from this map if present.
Returns:
    the previous value associated with key, or null if there was no mapping for key. 
---

@@ +261,5 @@
> +                recordRemoved(mBackingMap.get(faviconURL));
> +            }
> +
> +            FaviconsForURL container = new FaviconsForURL(0, true);
> +            mBackingMap.put(faviconURL, container);

Map.put() returns the previous value. So:

  try {
      final FaviconsForURL container = new FaviconsForURL(0, true);
      final FaviconsForURL previous = mBackingMap.put(faviconURL, container);
      if (previous != null) {
          recordRemoved(previous);
      }
  } finally {
      finishWrite();
  }

You don't strictly need the null check, because recordRemoved checks.

@@ +479,2 @@
>       */
> +    private boolean setMostRecentlyUsed(FaviconCacheElement element) {

WithinRead.

@@ +483,3 @@
>          mOrdering.offer(element);
>          mReorderingSemaphore.release();
> +        return contained;

I gently prefer try..finally here.

@@ +491,5 @@
> +     *
> +     * @param element The element that is to become the most recently used one.
> +     * @return true if this element already existed in the list, false otherwise. (Useful for preventing multiple-insertion.)
> +     */
> +    private boolean setMostRecentlyUsedUnsafe(FaviconCacheElement element) {

setMostRecentlyUsedWithinWrite

@@ +573,3 @@
>              }
> +
> +            mCurrentSize.addAndGet(sizeGained);

Add a small note that briefly double counting here (in the case of multiple writes for the same favicon URL) doesn't matter, because recordRemoved will fix it.
Attachment #8388956 - Flags: review?(rnewman) → review+
(In reply to Richard Newman [:rnewman] from comment #18)
> ::: mobile/android/base/favicons/cache/FaviconCache.java
> @@ +242,3 @@
> >          try {
> >              recordRemoved(mBackingMap.get(faviconURL));
> >              mBackingMap.remove(faviconURL);
> 
> recordRemoved(mBackingMap.remove(faviconURL));
> 
> recordRemoved is null-safe.
> 
> ---
> public V remove(Object key)
> 
> Removes the mapping for the specified key from this map if present.
> Returns:
>     the previous value associated with key, or null if there was no mapping
> for key. 
> ---

Ick. Can't believe I missed that one.

> Map.put() returns the previous value. So:
> 
>   try {
>       final FaviconsForURL container = new FaviconsForURL(0, true);
>       final FaviconsForURL previous = mBackingMap.put(faviconURL, container);
>       if (previous != null) {
>           recordRemoved(previous);
>       }
>   } finally {
>       finishWrite();
>   }
> 
> You don't strictly need the null check, because recordRemoved checks.

I find myself yearning for Lombok's @Nonnull annotation. Many subtle bugs very suddenly cease to be subtle.

> @@ +479,2 @@
> >       */
> > +    private boolean setMostRecentlyUsed(FaviconCacheElement element) {
> 
> WithinRead.
> 
> @@ +491,5 @@
> > +     *
> > +     * @param element The element that is to become the most recently used one.
> > +     * @return true if this element already existed in the list, false otherwise. (Useful for preventing multiple-insertion.)
> > +     */
> > +    private boolean setMostRecentlyUsedUnsafe(FaviconCacheElement element) {
> 
> setMostRecentlyUsedWithinWrite

This is what spending too much time reading the documentation of sun.misc.Unsafe does to one's ability to name things.

> @@ +573,3 @@
> >              }
> > +
> > +            mCurrentSize.addAndGet(sizeGained);
> 
> Add a small note that briefly double counting here (in the case of multiple
> writes for the same favicon URL) doesn't matter, because recordRemoved will
> fix it.

Further to your recent-ish email on the mailing list, shall also I tack on an additional patch to kill off all the mStupidNamingConvention rubbish in this file while we're here?
Patch incorporating final comments.
Attachment #8388956 - Attachment is obsolete: true
And if you want it, the bonus kill-off-all-the-silly-prefix-naming-from-favicon-code patch.
I guess if this is going to be done it should be done to whole files at a time, ideally while the blame is still mostly one person (And at this point I think the blame in these files is still largely me, so there's not much information getting clobbered by this)

Still. Feel free to discard - represents only about 5 minutes of work thanks to nice editors, and I needed something to do while eating.
(I'll land one or both of these once I know if I'm supposed to be landing both or just one of them - seems wasteful to use two runs of the unit tests when one of them potentially is just a refactoring patch)
Comment on attachment 8389623 [details] [diff] [review]
Eliminate unfavoured naming convention from favicon code.

Review of attachment 8389623 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with corrections. And commit message.

::: mobile/android/base/favicons/Favicons.java
@@ +331,1 @@
>                  return false;

{}

::: mobile/android/base/favicons/LoadFaviconTask.java
@@ +83,2 @@
>  
> +        pageUrl = pUrl;

this.pageURL = pageURL;

@@ +83,4 @@
>  
> +        pageUrl = pUrl;
> +        faviconUrl = fUrl;
> +        listener = aListener;

this.listener = listener;

@@ +366,5 @@
>              return image;
>          }
>  
>          try {
> +            loadedBitmaps = downloadFavicon(new URI(faviconUrl));

Might as well make these `faviconURL` while you're at it.

::: mobile/android/base/favicons/cache/FaviconCache.java
@@ +287,5 @@
>  
>          startRead();
>  
>          try {
> +            container = oermanentBackingMap.get(faviconURL);

per, not oer.

::: mobile/android/base/favicons/cache/FaviconCacheElement.java
@@ +23,5 @@
>      // has a record of the existence of a primary payload, even if it is no longer in the cache.
>      // This means that when a request comes in that will be best served using a primary that is in
>      // the database but no longer cached, we know that it exists and can go get it (Useful when ICO
>      // support is added).
> +    volatile boolean invalidated;

If this can be private, make it (and the other members) so.

::: mobile/android/base/favicons/decoders/ICODecoder.java
@@ +90,4 @@
>  
> +    public ICODecoder(byte[] buffer, int off, int length) {
> +        decodand = buffer;
> +        offset = off;

this.offset = offset;
Attachment #8389623 - Flags: review+
Comment on attachment 8389621 [details] [diff] [review]
Reduce insanity in favicon cache concurrency.

Review of attachment 8389621 [details] [diff] [review]:
-----------------------------------------------------------------

If this applies fairly cleanly to Aurora, I would take (and take care of landing) a patch for 29. Please nom if it applies.
Attachment #8389621 - Flags: review+
(In reply to Richard Newman [:rnewman] from comment #23)
> ::: mobile/android/base/favicons/Favicons.java
> @@ +331,1 @@
> >                  return false;
> 
> {}
> 
> ::: mobile/android/base/favicons/LoadFaviconTask.java
> @@ +83,2 @@
> >  
> > +        pageUrl = pUrl;
> 
> this.pageURL = pageURL;

Ah, you prefer them that way. Inferring your preferred code style from review trends...
Of course, auto-generated constructors would be nicer. :P

> ::: mobile/android/base/favicons/cache/FaviconCache.java
> @@ +287,5 @@
> >  
> >          startRead();
> >  
> >          try {
> > +            container = oermanentBackingMap.get(faviconURL);
> 
> per, not oer.

Oo-er.

> ::: mobile/android/base/favicons/cache/FaviconCacheElement.java
> @@ +23,5 @@
> >      // has a record of the existence of a primary payload, even if it is no longer in the cache.
> >      // This means that when a request comes in that will be best served using a primary that is in
> >      // the database but no longer cached, we know that it exists and can go get it (Useful when ICO
> >      // support is added).
> > +    volatile boolean invalidated;
> 
> If this can be private, make it (and the other members) so.

It can't - this is partly why there's a package for these things. Package level access allows for nice data hiding without resorting to getters (that aren't *really* data hiding so much as "Simon-says"...)

> ::: mobile/android/base/favicons/decoders/ICODecoder.java
> @@ +90,4 @@
> >  
> > +    public ICODecoder(byte[] buffer, int off, int length) {
> > +        decodand = buffer;
> > +        offset = off;
> 
> this.offset = offset;

For consistency, this-ifying the lot of them.

(In reply to Richard Newman [:rnewman] from comment #24)
> If this applies fairly cleanly to Aurora, I would take (and take care of
> landing) a patch for 29. Please nom if it applies.

Appears to apply to current m-a without any changes. Go nuts.
Should I proceed to land both on fx-team as usual, or is something different required when an uplift is taking place to avoid a future merge conflict?
Attachment #8389623 - Attachment is obsolete: true
(In reply to Chris Kitching [:ckitching] from comment #25)

> Appears to apply to current m-a without any changes. Go nuts.
> Should I proceed to land both on fx-team as usual, or is something different
> required when an uplift is taking place to avoid a future merge conflict?

Land both on fx-team, flag the uplift patch (part 1) for Aurora approval when everything goes green.
Target Milestone: --- → Firefox 30
Comment on attachment 8389654 [details] [diff] [review]
V2: Eliminate unfavoured naming convention from favicon code.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
  Favicons rework.

User impact if declined: 
  Crashes in favicon handling. Perhaps ANRs like Bug 981684, too.

Testing completed (on m-c, etc.): 
  Baked on m-c. Thorough review.

Risk to taking this patch (and alternatives if risky): 
  Potential odd behavior due to favicon cache inconsistency, but (a) this risk should be slim, because we thoroughly analyzed the root cause, and (b) this fix is way better than any cache inconsistency issue!

String or IDL/UUID changes made by this patch:
  None.
Attachment #8389654 - Flags: review+
Attachment #8389654 - Flags: approval-mozilla-aurora?
Oh, ckitching used "v2", not "part 2", so I flagged the wrong patch. Pretend I picked part 1!
Attachment #8389654 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 987867
You need to log in before you can comment on or make changes to this bug.