incrementAndGet concurrency risk in FaviconCache

RESOLVED FIXED in Firefox 32

Status

Type: defect
Priority: --
Severity: major
Status: RESOLVED FIXED
Opened: 5 years ago
Last modified: 3 years ago

People

(Reporter: rnewman, Assigned: araim)

Tracking

Version: Trunk
Target Milestone: Firefox 32
Hardware: All
OS: Android
Points: ---

Firefox Tracking Flags

(firefox29 affected, firefox30 affected, firefox31 affected, firefox32 affected)

Details

(Whiteboard: [mentor=rnewman][lang=java][good first bug])

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

5 years ago
N + 1 concurrent readers plus one more concurrent reader or writer can allow N readers to slip through and perform work without the write lock being acquired on their behalf.

This is only a risk if a writer is involved, so the practical failure is:


  R1                   |  R2                   |  W1
=======================================================================
Acquire turn           |                       |
                       | Wait for turn         |
Release turn           |                       |
                       | Acquire turn, release |
                       |                       | Acquire turn
incrementAndGet = 1    |                       |
                       | incrementAndGet = 2   |
                       |                       | Acquire write lock
WAITING ON WRITE LOCK  | FREE TO PERFORM READ  | FREE TO PERFORM WRITE


But this is generally very unlikely.
Reporter

Updated

5 years ago
Whiteboard: [mentor=rnewman][lang=java][good first bug]
Comment 1

5 years ago
This bug needs some love to be marked a good first bug: https://wiki.mozilla.org/Contribute/Coding/Mentoring#Good_First_Bugs. rnewman, can you fill in some of the details of the fix? (Or, if necessary, remove the tag, which would make me a sad panda.)
Flags: needinfo?(rnewman)
Reporter

Comment 2

5 years ago
This fragment:

https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/favicons/cache/FaviconCache.java#156

156         if (ongoingReads.incrementAndGet() == 1) {
157             // First one in. Wait for writers to finish and lock them out.
158             writeLock.acquireUninterruptibly();
159         }

isn't correct, and can result in the concurrency bug diagrammed in Comment 0.

The solution is to figure out a thread-safe alternative to this two-step process -- perhaps as simple as synchronizing -- that doesn't significantly regress performance and addresses that failure mode.
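One such thread-safe alternative — and the direction the eventual patch takes — is to replace the counter-plus-semaphore dance with `java.util.concurrent.locks.ReentrantReadWriteLock`, which handles the "first reader locks writers out, last reader lets them back in" bookkeeping internally. A minimal sketch (class and field names here are illustrative, not the actual FaviconCache code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: guard the cache maps with one ReentrantReadWriteLock
// instead of the hand-rolled incrementAndGet/writeLock two-step.
public class FaviconCacheSketch {
    private final Map<String, byte[]> backingMap = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public byte[] get(String url) {
        lock.readLock().lock();      // many readers may hold this concurrently
        try {
            return backingMap.get(url);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String url, byte[] data) {
        lock.writeLock().lock();     // exclusive: waits for all readers to drain
        try {
            backingMap.put(url, data);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        FaviconCacheSketch cache = new FaviconCacheSketch();
        cache.put("mozilla.org", new byte[] {1, 2, 3});
        System.out.println(cache.get("mozilla.org").length); // 3
        System.out.println(cache.get("example.com"));        // null
    }
}
```

Because acquiring the read lock is a single atomic operation, there is no window in which a reader has "counted itself in" but not yet excluded writers — which is exactly the gap the diagram in Comment 0 exploits.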

A contributor for this bug is expected to understand Java concurrency and common CS concurrency primitives. For that kind of contributor this is a good first bug.
Flags: needinfo?(rnewman)
Assignee

Comment 3

5 years ago
Posted patch Proposed patch for this bug (obsolete) — Splinter Review
Comment 4

5 years ago
Comment on attachment 8420666 [details] [diff] [review]
Proposed patch for this bug

Review of attachment 8420666 [details] [diff] [review]:
-----------------------------------------------------------------

Using ReentrantReadWriteLock is a generally good plan. The original reason this wasn't done was because I wanted to be able to implement lock upgrading. Since that turned out to be unsafe in general (facepalm), this is now a great idea. And it turned out way neater than expected!

The quirk to realise, which you appear to have noticed, is the need for readers to be able to write mOrdering without entering a write transaction and killing all the concurrency. That lock you have seems to do the trick nicely.
You've got a couple of typos, but otherwise I'm a big fan. Well done for fixing my spectacular screwup :D.

(That said, I'm always distrusting of concurrent code, and don't have time to study this particularly carefully, so there might be something obscure in there that rnewman will spot shortly...)

A couple of first-patch kinks:

I've never seen anyone include the bug title itself in the commit message. I think the convention for commit messages is Bug XXXXX - <Description of what the patch does>. Various UIs around the place linkify the "Bug XXXXXX", so obtaining the title isn't problematic. So you might want to tweak that. I'm sure rnewman will let you know.

If you're submitting a patch for review, you'll want to tag the reviewer with the "r" flag found on the attachment creation screen. If you then put in ':rnewman' it should autocomplete accordingly (or whoever you're flagging for review. Generally the :$SOMETHING is for autocompleting IRC handles to Bugzilla accounts).

::: mobile/android/base/favicons/cache/FaviconCache.java
@@ +128,5 @@
>  
>      // The maximum quantity, in bytes, of bitmap data which may be stored in the cache.
>      private final int maxSizeBytes;
>  
> +    // this object is used to guard modification to the ordering map. This allows for read transactions

s/this/This/

@@ +135,2 @@
>  
> +    // This Reader/Writer lock is to ensure synchronization of reads/writes on both pernament

s/pernament/permanent/

@@ +135,3 @@
>  
> +    // This Reader/Writer lock is to ensure synchronization of reads/writes on both pernament
> +    // and non-pernament backing maps. It's created to be fair

s/pernament/permanent/

Perhaps you were thwarted by keming?
Attachment #8420666 - Flags: review?(rnewman)
Reporter

Comment 5

5 years ago
Comment on attachment 8420666 [details] [diff] [review]
Proposed patch for this bug

Review of attachment 8420666 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. I agree with ckitching's comments, so let's get those addressed and get this on Try.

::: mobile/android/base/favicons/cache/FaviconCache.java
@@ +135,3 @@
>  
> +    // This Reader/Writer lock is to ensure synchronization of reads/writes on both pernament
> +    // and non-pernament backing maps. It's created to be fair

Nit: closing period.

@@ +135,4 @@
>  
> +    // This Reader/Writer lock is to ensure synchronization of reads/writes on both pernament
> +    // and non-pernament backing maps. It's created to be fair
> +    private final ReentrantReadWriteLock backingMapsLock = new ReentrantReadWriteLock(true);

Do we need fairness here? I suspect that non-fairness will actually produce better results for us, particularly in our usual two-thread world. Certainly it's worth commenting about why you made the decision.

@@ +450,5 @@
>       * @param element The element that is to become the most recently used one.
>       * @return true if this element already existed in the list, false otherwise. (Useful for preventing multiple-insertion.)
>       */
>      private boolean setMostRecentlyUsedWithinRead(FaviconCacheElement element) {
> +        synchronized(reorderingLock){

Nit:

  synchronized (reorderingLock) {
Attachment #8420666 - Flags: review?(rnewman) → feedback+
Comment 6

5 years ago
(In reply to Richard Newman [:rnewman] from comment #5)
> @@ +135,4 @@
> >  
> > +    // This Reader/Writer lock is to ensure synchronization of reads/writes on both pernament
> > +    // and non-pernament backing maps. It's created to be fair
> > +    private final ReentrantReadWriteLock backingMapsLock = new ReentrantReadWriteLock(true);
> 
> Do we need fairness here? I suspect that non-fairness will actually produce
> better results for us, particularly in our usual two-thread world. Certainly
> it's worth commenting about why you made the decision.

This is equivalent to the code he's replacing - I chose to have a fair MRSW implementation when first implementing it. But was that sensible?

The cost of fairness is trivial. Remember the turn semaphore from the old code? That's the bit that gives you fairness. Take it out and you get the unfair version. (The difference is that all the nitty-gritty semaphores (or their equivalents) are now in library code, making us much happier.)

In the two-threaded universe it hardly makes a difference, but in a hypothetical many-threaded universe it can be more interesting.

The fairness property ensures that threads get processed in the order they arrived at the lock. Without fairness, it's a free-for-all.
In particular, this tends to lead to writer starvation. A writer trying to start will block until there are no readers reading before it writes. If there's a steady stream of new readers, the poor writer never gets a turn. More writers might queue up. Sad times... maybe.

My reasoning for selecting fairness in the first place was that we are generally keen for favicon writes to go through quickly. While unfairness might give readers more priority (and reads are what we care about for UI performance), if your writers get starved all your readers are returning the null favicon because they're trying to read stuff that's queuing up to write...


It's perhaps been too long since I worked on this stuff for me to be able to advise sensibly either way at this point, but that's the long and short of it... do you want fairness?
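For concreteness, the choice being debated is literally one constructor flag on `ReentrantReadWriteLock` (a small illustration, not code from the patch):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// The fairness decision under discussion is ReentrantReadWriteLock's
// constructor flag.
public class FairnessFlag {
    public static void main(String[] args) {
        // Fair: waiting threads (readers and writers alike) are granted the
        // lock in roughly arrival order, so a queued writer blocks later
        // readers from entering.
        ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true);

        // Unfair (the default): readers may barge past a queued writer, which
        // is faster but can starve writers under continual read traffic.
        ReentrantReadWriteLock unfair = new ReentrantReadWriteLock();

        System.out.println(fair.isFair());   // true
        System.out.println(unfair.isFair()); // false
    }
}
```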

On a more blue-sky note - why the heck don't we have more than 2 threads yet?
Reporter

Comment 7

5 years ago
(In reply to Chris Kitching [:ckitching] from comment #6)

> My reasoning for selecting fairness in the first place was that we are
> generally keen for favicon writes to go through quickly. While unfairness
> might give readers more priority (and reads are what we care about for UI
> performance), if your writers get starved all your readers are returning the
> null favicon because they're trying to read stuff that's queuing up to
> write...

I suspect we're likely to grow more writing threads than reading threads. The UI code tends to stick to the UI-thread-plus-async-task-thread pattern.

I'm more concerned about us unnecessarily imposing costs in a world where contention is very low (we typically write-then-read, or do a whole bunch of reads), or where we don't get any value from a fair ordering of writes.

JCIP suggests that the fairness penalty of ReentrantLock is about *two orders of magnitude*. RRWL is better for mixed loads, but still, don't pay for fairness unless you need it.

I'm also not all that concerned by waiting-by-the-post-box starvation -- the immediate consumer can be delivered the value without writing it to the cache first. We don't do that now, but we could.

Consider also that Java's RWL seem pretty safe in unfair mode, at least from the standpoint of writer starvation:

https://jeffgrigg.wordpress.com/2009/03/13/the-pure-danger-of-unfair-read-write-locks/


> On a more blue-sky note - why the heck don't we have more than 2 threads yet?

Well, we have plenty of threads. We don't really have more than 2 threads *in UI code*, for a few reasons:

* It's hard to reason about. Rephrasing a problem as single-threaded tasks in one of two queues makes things a lot easier.
* Android likes the two-thread model.
* There aren't all that many devices out there that would offer a huge perf win for, say, having four favicon decoding threads.
Assignee: nobody → araim
Status: NEW → ASSIGNED
Comment 8

5 years ago
Wow. Okay. That makes sense.
I am unsure how the library implementation manages to make this so much more expensive... You could get fairness more cheaply than that with the RRWL and one semaphore...

Oh, Java.
Assignee

Comment 9

5 years ago
Thanks for all the comments - I'll update and generate a new patch. 

I think with our level of contention the fairness cost wouldn't be that bad (the two-orders-of-magnitude figure applies to many threads under heavy contention, since the cost comes from thread-scheduling subtleties). I'd suggest just documenting (in code) the decision to use the fair lock, and the fact that there is an opportunity to optimize if need be (but not until it's identified as a problem).

By the way - as I understand it, the semaphore as it was used before doesn't guarantee fairness for operations that happen after the semaphore is released (the JVM is free to reorder the threads after they release the semaphore), or am I missing something?
Comment 10

5 years ago
(In reply to Jakub Miara from comment #9)
> I think with our level of contention the fairness cost wouldn't be that bad
> (two orders of magnitude apply to 2+ threads and heavy contention since it's
> caused by thread scheduling subtleties). I'd suggest to just document (in
> code) the decision to use the fair lock and the fact that there is an
> opportunity to optimize if need be (but not until it's identified to be a
> problem).

I don't understand your logic. Didn't we conclude that our situation gains nothing from the fairness property?
If so, why bother paying anything for it?

Insignificant costs may be insignificant, but if they can be avoided - do so! They add up!

Optimising performance isn't *just* about fixing the big bottlenecks. Sometimes you can find a way to take out a few hundred little ones and do just as well. Not introducing them in the first place, particularly when you gain nothing from doing so, seems folly.

Or did 0300-me miss something?

> By the way - as I understand it, the semaphore as it was used before doesn't
> guarantee fairness for operations that happen after the semaphore is
> released (JVM is free to reorder the threads after they release the
> semaphore), or am I missing something?

You're right. The definition of fairness for the MRSW problem is weaker than that. Implementing that stronger fairness would be *extremely expensive*.

Fairness in the MRSW problem (and as implemented by ReentrantReadWriteLock) means that threads get processed in arrival order, instead of it being a free-for-all at the locks.
This means that if a writer is queueing, no other readers may start until that writer has had its turn. This prevents writer starvation.

By contrast, an unfair MRSW implementation will allow readers to enter so long as no write is active. A writer may be queuing, but who cares: read! Where many readers are present, that writer may be waiting forever.

So, the old semaphore usage had readers take and drop a turn semaphore on start, and had writers take the turn semaphore before they wait on the write lock and drop it after they finish their work.
This means readers cannot enter after a writer has started waiting. All new threads queue up on the turn semaphore until that writer has finished.
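That turn-semaphore scheme might be reconstructed roughly as follows. This is a hypothetical sketch, not the actual old FaviconCache code, and the first-reader bookkeeping is done under synchronization here rather than with the racy `incrementAndGet` this bug is about:

```java
import java.util.concurrent.Semaphore;

// Hypothetical reconstruction of the "turn semaphore" MRSW pattern described
// above. Class and method names are invented for illustration.
public class TurnSemaphoreSketch {
    private final Semaphore turn = new Semaphore(1, true); // fair: FIFO hand-off
    private final Semaphore writeLock = new Semaphore(1);
    private int readers = 0; // guarded by the synchronized blocks below

    public void startRead() throws InterruptedException {
        turn.acquire();   // readers take and immediately drop the turn...
        turn.release();   // ...so a queued writer blocks new readers from entering
        synchronized (this) {
            if (++readers == 1) {
                writeLock.acquire(); // first reader in locks writers out
            }
        }
    }

    public synchronized void endRead() {
        if (--readers == 0) {
            writeLock.release(); // last reader out lets writers back in
        }
    }

    public void startWrite() throws InterruptedException {
        turn.acquire();      // hold the turn: no new readers may start
        writeLock.acquire(); // wait for in-flight readers to drain
    }

    public void endWrite() {
        writeLock.release();
        turn.release();
    }

    public static void main(String[] args) throws InterruptedException {
        TurnSemaphoreSketch lock = new TurnSemaphoreSketch();
        lock.startRead();
        System.out.println("reading");
        lock.endRead();
        lock.startWrite();
        System.out.println("writing");
        lock.endWrite();
    }
}
```

Holding the turn for the writer's whole critical section is what yields arrival-order fairness; a fair `ReentrantReadWriteLock` buries this same machinery in library code.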


Our situation really doesn't care too much, since there are few readers. You need enough readers for there to be near-continual reader activity to get this starvation: a moment with no readers active is enough to get a writer going. We just don't have that situation. Fairness would be theoretically tidy if it were almost-free, but it's not, so we don't want it, particularly because the concurrency argument for having it doesn't apply here.


Upon re-reading Richard's comment at a less insane hour I noticed another thing...

(In reply to Richard Newman [:rnewman] from comment #7)
> I'm more concerned about us unnecessarily imposing costs in a world where
> contention is very low (we typically write-then-read, or do a whole bunch of
> reads), or where we don't get any value from a fair ordering of writes.

That's not what fairness means here. Write ordering is roughly arrival-order even in unfair mode.
The problem is that in a situation where many readers are continually active, a writer never gets to do anything because they are locked out by the readers.
That link you posted nicely demonstrates that this isn't a real problem until you have very, very many more readers than writers.

> JCIP suggests that the fairness penalty of ReentrantLock is about *two
> orders of magnitude*. RRWL is better for mixed loads, but still, don't pay
> for fairness unless you need it.
> 
> I'm also not all that concerned by waiting-by-the-post-box starvation -- the
> immediate consumer can be delivered the value without writing it to the
> cache first. We don't do that now, but we could.

I suspect you'd find that would scale less well than having a fair MRSW implementation (rolling your own if somehow the library one is awful).
However, this is not a problem we realistically have. Come back when we have hundreds of competing threads and we can have this argument. With benchmarks and things. And ridiculous many-core phones.

> > On a more blue-sky note - why the heck don't we have more than 2 threads yet?
> 
> Well, we have plenty of threads. We don't really have more than 2 threads
> *in UI code*, for a few reasons:
> 
> * It's hard to reason about. Rephrasing a problem as single-threaded tasks
> in one of two queues makes things a lot easier.
> * Android likes the two-thread model.
> * There aren't all that many devices out there that would offer a huge perf
> win for, say, having four favicon decoding threads.

It might, however, be useful to stop writing new code that relies on serialisation over the background thread so it can someday be replaced by a background thread *pool*. Core counts are only going to go up, and having more threads to dump background work onto would be a nice way to exploit that.
I feel some nice API could be constructed that accepts Runnables and stipulations about how they must be serialised and sticks them onto appropriate threads from this pool. This could allow UI developers to exploit this extra concurrency where available.

That said, it's probably going to be several more years before this approach would see any gains except on a few very swanky phones... but maybe it would be worth gently encouraging the concept earlier than that to reduce the work required when many-core phones are here. We don't want to be in a "panic and try to threadify everything in a month" scenario, because that *will* create a lot of really hilarious bugs. :P
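The "Runnables plus stipulations about how they must be serialised" idea could be sketched along these lines. This is a purely hypothetical API (the class name, `submit` signature, and serialisation-by-key design are all invented here, not anything in the tree):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a background thread *pool* with serialisation
// stipulations: tasks sharing a key run in submission order; tasks with
// different keys may run concurrently across the pool.
public class KeyedSerialExecutor {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final ConcurrentHashMap<String, CompletableFuture<Void>> tails =
            new ConcurrentHashMap<>();

    public void submit(String serialKey, Runnable task) {
        // Chain each task onto the previous tail for its key, so same-key
        // tasks are serialised while independent keys use the whole pool.
        tails.compute(serialKey, (k, tail) ->
                (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                        .thenRunAsync(task, pool));
    }

    public void awaitQuiescenceAndShutdown() {
        for (CompletableFuture<Void> tail : tails.values()) {
            tail.join(); // wait for every key's chain to finish
        }
        pool.shutdown();
    }

    public static void main(String[] args) {
        KeyedSerialExecutor executor = new KeyedSerialExecutor();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            final int n = i;
            executor.submit("favicons", () -> out.append(n));
        }
        executor.awaitQuiescenceAndShutdown();
        System.out.println(out); // 01234 - same-key tasks ran in order
    }
}
```

UI code written against an interface like this could keep today's two-thread behaviour (one key for everything) while leaving room to fan out on many-core devices later.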
Assignee

Comment 11

5 years ago
Patch corrected based on reviews.
Attachment #8420666 - Attachment is obsolete: true
Assignee

Comment 12

5 years ago
(In reply to Chris Kitching [:ckitching] from comment #10)
> (In reply to Jakub Miara from comment #9)
> > I think with our level of contention the fairness cost wouldn't be that bad
> > (two orders of magnitude apply to 2+ threads and heavy contention since it's
> > caused by thread scheduling subtleties). I'd suggest to just document (in
> > code) the decision to use the fair lock and the fact that there is an
> > opportunity to optimize if need be (but not until it's identified to be a
> > problem).
> 
> I don't understand your logic. Didn't we conclude that our situation gains
> nothing from the fairness property?
> If so, why bother paying anything for it?

We don't pay much. While it's true that little improvements add up, it may be that the penalty of unfairness (in rare cases with writer starvation) will lead (in future) to UI micro-freezes or other disasters. I'm not experienced enough with the product (I'm not experienced with the product at all ;)) to know whether playing it safe and paying the penalty of fairness every time to avoid such potential problems is the preferable option. Usually when I don't know, I tend to play safe - that was my reasoning. (It's my first Mozilla patch and I'm rather scared of breaking something :PPP)

> 
> Insignificant costs may be insignificant, but if they can be avoided - do
> so! They add up!
> 
> Optimising performance isn't *just* about fixing the big bottlenecks.
> Sometimes you can find a way to take out a few hundred little ones and do
> just as well. Not introducing them in the first place, particularly when you
> gain nothing from doing so, seems folly.
> 
> Or did 0300-me miss something?

As you can see in the revised patch, I updated the lock to be an unfair one. My understanding is that the general consensus (myself included) is that the risk of problems due to unfairness is not significant enough to justify paying the penalty of fairness every time.
Reporter

Updated

5 years ago
Flags: needinfo?(rnewman)
Reporter

Comment 14

5 years ago
Try is green, and this looks fine on my device, so:

https://hg.mozilla.org/integration/fx-team/rev/47d87ee86e69

Thanks for the fix, Jakub; great first bug.

If you're interested in working on more favicon-related bugs, there's a list here:

  https://bugzilla.mozilla.org/buglist.cgi?component=Favicon%20Handling&product=Firefox%20for%20Android&bug_status=__open__&list_id=10310611

Otherwise, let us know what you're interested in working on next (either here or in IRC), and we can help point you in the right direction!
Flags: needinfo?(rnewman)
https://hg.mozilla.org/mozilla-central/rev/47d87ee86e69
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 32