Closed Bug 1297111 Opened 9 years ago Closed 8 years ago

Invalid array access in nsExpirationTracker::RemoveObject()

Categories

(Core :: Graphics: ImageLib, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox48 --- wontfix
firefox49 + wontfix
firefox-esr45 - wontfix
firefox50 + wontfix
firefox51 + wontfix
firefox52 --- wontfix
firefox-esr52 54+ fixed
firefox53 + wontfix
firefox54 + fixed
firefox55 + fixed

People

(Reporter: mccr8, Assigned: bevis)

References

Details

(Keywords: crash, csectype-bounds, sec-high, Whiteboard: [fixed by bug 1350177][post-critsmash-triage][adv-main54+][adv-esr52.2+])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-75eca981-f8b4-4313-be4b-a05dd2160821. =============================================================
This crash has the annotation ElementAt(aIndex = 4294967295, aLength = 0), where 4294967295 is 2^32-1, which I think comes from here: uint32_t last = generation.Length() - 1; T* lastObj = generation[last];
Summary: Crash in InvalidArrayIndex_CRASH → Invalid array access in nsExpirationTracker::RemoveObject()
This is filed as a security bug because while this is a release assert in Nightly, it is a possible security issue on all other branches.
Keywords: csectype-bounds
Keywords: sec-high
Nathan, could you figure out if this is a bug in the expiration tracker or the blur cache?
Flags: needinfo?(nfroyd)
OK, so in RemoveObject, we have some BlurCacheData, with an associated nsExpirationState. That nsExpirationState knows its generation and has an index into the particular generation. So we access the array for the object's generation and find that--according to the assert message--that the generation is empty. But that seems nonsensical, because when the expiration tracker calls BlurCache::NotifyExpired: http://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsExpirationTracker.h#206 We have an object that we've found in some generation and that generation is clearly non-empty. That object gets passed through NotifyExpired and in BlurCache's case, down to RemoveObject. So the object should be found...but it's not. I suppose the generation we're touching in AgeOneGeneration is not necessarily the same one we accessed in RemoveObject. That would imply some sort of memory corruption of the nsExpirationState, or some programming error in BlurCache/nsExpirationState. BlurCache does call nsExpirationTracker::MarkUsed infallibly, and MarkUsed can fail, but I can't see any way that not handling the failure would lead to the problems seen in this crash. I can think of two ways to make this code more robust: 1) We should check that when we access mGenerations in RemoveObject(), we're doing an in-bounds access. mGenerations is currently a normal C array and it's entirely possible for an object with a corrupted nsExpirationState to get here? 2) The NS_ASSERTIONs in RemoveObject should probably also be early exits.
Flags: needinfo?(nfroyd)
Group: core-security → dom-core-security
This will miss the 49 release. We could wontfix it for 49 or try to aim a patch at a 49 dot release later if we need to.
I think WONTFIX is fine. We don't really even know what is happening here.
Crash Signature: [@ InvalidArrayIndex_CRASH] → [@ InvalidArrayIndex_CRASH | nsExpirationTracker<T>::RemoveObject]
Mark 51 as fix-optional as there are no actions at this moment and the volume of crash is low now but still happy to take the fix in 51 early beta.
I'm just going to mark this incomplete. We have no idea what is going on here, and soon the array bounds checks will be everywhere and hopefully save us.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
(In reply to Bevis Tseng[:bevistseng][:btseng] from bug 1345464 comment #11) > Given the facts that: > 1. nsExpirationTracker can only be constructed on the main thread. > (The internal ExpirationTrackerObserver can only be Init/Destroyed on the > main thread.) > 2. There are some potential races in SurfaceCache that creates tracker > on the main thread but calling AddObject/RemoveObject off the main thread > without a mutex lock [1]. :-( > hance, the assertions is added in construction time instead of the time when > CheckStartTimer() is called to ensure that the passed-in event target is > always on the main thread. (In reply to Nathan Froyd [:froydnj] from comment #12) > Ah, I bet this is the cause of bug 1297111. :(
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Component: XPCOM → ImageLib
Bug 1345464 comment 11 suggests a possible cause for these sorts of crashes: nsExpirationTracker<T> must be created on the main thread, but there's nothing that enforces that AddObject/RemoveObject also have to be called on the main thread. So if we have AddObject/RemoveObject being called on multiple threads, there's all sorts of places for things to go wrong. And then we can run into an interleaving of events like: T1: Initializes timer for aging generation. ... T2: Examines current generation, finds its size to be N. T1: Timer goes off. T1: Ages away generation T2 was examining, drastically reducing its size. T2: Goes to index into an array thinking its size is N, whereas it's really M <<< N. T2: Boom. Of course, since there's *zero* locking involved here, we might run into multiple timers running, which is even more fun. Bug 1345464 observed the problem with SurfaceCache, and out of ~10 crash reports looked at for the last week or so, only 1 of them involved SurfaceCache... Is it too much to guess that the other similar crash reports are also caused by races?
tnikkel, can you look at the imagelib crashes? bug 1345464 uncovered some thread races in how SurfaceCache works with expiration trackers -- constructing them on the main thread and then touching the cache off the main thread, which might start a timer for deletion, which would race with the destruction of the tracker on the main thread -- and the same races might be at work in other places.
Flags: needinfo?(tnikkel)
Crash reports show this one is still happening. No reports for 55 yet, but I am going to pre-emptively mark it as affected.
The SurfaceCache was designed to be used on any thread, so it could be pretty hard to only do the nsExpirationTracker calls on the main thread. We could dispatch a runnable to the main thread to do it, but then there could be arbitrary further mutations to the SurfaceCache which could change what calls we want to make on the expiration tracker. I talked to froydnj on irc, he said it might be easier to enhance nsExpirationTracker to allow interaction with it off the main thread.
Flags: needinfo?(tnikkel)
Here is some test done locally by introducing ReentrantMonitor into ExpirationTracker for your reference. It became worse because it is pretty easy to see potential deadlock from SurfaceCache: https://treeherder.mozilla.org/logviewer.html#?job_id=84982699&repo=try&lineNumber=15693 > INFO - GECKO(2925) | ###!!! ERROR: Potential deadlock detected: > INFO - GECKO(2925) | === Cyclical dependency starts at > INFO - GECKO(2925) | --- Mutex : StaticMutex calling context > INFO - GECKO(2925) | [stack trace unavailable] > INFO - GECKO(2925) | --- Next dependency: > INFO - GECKO(2925) | --- ReentrantMonitor : SurfaceTracker (currently acquired) > INFO - GECKO(2925) | calling context > INFO - GECKO(2925) | [stack trace unavailable] > INFO - GECKO(2925) | === Cycle completed at > INFO - GECKO(2925) | --- Mutex : StaticMutex calling context > INFO - GECKO(2925) | [stack trace unavailable] > INFO - GECKO(2925) | Deadlock may happen for some other execution > INFO - GECKO(2925) | [Child 2978] ###!!! ASSERTION: Potential deadlock detected: > INFO - GECKO(2925) | Cyclical dependency starts at > INFO - GECKO(2925) | Mutex : StaticMutex > INFO - GECKO(2925) | Next dependency: > INFO - GECKO(2925) | ReentrantMonitor : SurfaceTracker (currently acquired) > INFO - GECKO(2925) | Cycle completed at > INFO - GECKO(2925) | Mutex : StaticMutex > INFO - GECKO(2925) | Deadlock may happen for some other execution > INFO - GECKO(2925) | : 'Error', file /home/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 307
I can't tell from the log but I think that is because the SurfaceCache has it's own mutex, and so the SurfaceCache and the monitor from nsExpirationTracker that you added can be acquired in either order. During a call into the SurfaceCache it will hold its own mutex, and then call into the nsExpirationTracker to acquire the nsExpirationTracker monitor. And when the nsExpirationTracker calls NotifyExpired it will probably hold its own monitor, and then the NotifyExpired call on the SurfaceCache will acquire the SurfaceCache mutex. The idea that froydnj described was to use the mutex from the SurfaceCache in nsExpirationTracker by passing it to the constructor. So there is only ever one mutex in play.
Too late for firefox 52, mass-wontfix.
Bug 1350177 landed on inbound which should fix this.
Whiteboard: [fixed by bug 1350177]
Assignee: nobody → btseng
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Resolution: --- → FIXED
It would be good if that could get uplifted to at least 54. It may be too late for a big change like this for 53.
Group: dom-core-security → core-security-release
Target Milestone: --- → mozilla55
Wontfix for 53, from comment 18. We are only a week away from release.
(In reply to Andrew McCreight [:mccr8] from comment #18) > It would be good if that could get uplifted to at least 54. It may be too > late for a big change like this for 53. (I've now tagged Bevis for needinfo on getting an uplift request for 54, over in bug 1350177.)
We're going to want an uplift to ESR-52 also, or maybe a subset if there's a safe wallpaper patch. please attach the ESR-52 version to this bug rather than the public bug. An ESR patch in a public bug that's not a topcrasher will be suspicious.
There is some difference on the mutex used in SurfaceCache between aurora and esr52 that causes the conflict on SurfaceCache when merging the patch of bug 1350177 comment 43 to esr52. I have no confidence in this change. It will be helpful from someone familiar with SurfaceCache to merge this change to esr52.
Flags: needinfo?(btseng) → needinfo?(tnikkel)
This patch folds the patches of bug 1335045 and bug 1350177. Would you mind to review again for the changes in SurfaceCache.cpp before uplifting? Thanks!
Flags: needinfo?(tnikkel)
Attachment #8856815 - Flags: review?(tnikkel)
Comment on attachment 8856815 [details] [diff] [review] (ESR52) Patch: Bug 1297111 - Refactor a thread-safe ExpirationTracker for the use in SurfaceCache. There were no conflicts when I applied your aurora patch from bug 1350177 and the patch from bug 1335045. And the combined diff I got did not differ from this patch. So assuming your aurora patch is good (there weren't many changes to surface cache in aurora) this should be good.
Attachment #8856815 - Flags: review?(tnikkel) → review+
Comment on attachment 8856815 [details] [diff] [review] (ESR52) Patch: Bug 1297111 - Refactor a thread-safe ExpirationTracker for the use in SurfaceCache. [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is a sec-high bug. User impact if declined: Random crash when SurfaceCache uses nsExpirationTracker. Fix Landed on Version: firefox55 and aurora54+ Risk to taking this patch (and alternatives if risky): The risk is low because we provide a thread-safe implementation of nsExpirationTracker to fix this race condition problem by sharing the same mutex lock that has been acquired by SurfaceCache in original implementation to prevent potential risk of deadlock by introducing a new mutex lock. String or UUID changes made by this patch: No. Treeherder result for reference: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ccad3095963837055c6a649b7d87c35d32e8ae1a
Attachment #8856815 - Flags: approval-mozilla-esr52?
Hi Bevis, Since this bug is a sec-high and also affects 54, do you consider to uplift this for 54?
Flags: needinfo?(btseng)
(In reply to Gerry Chang [:gchang] from comment #26) > Hi Bevis, Since this bug is a sec-high and also affects 54, do you consider > to uplift this for 54? You have approved it in https://bugzilla.mozilla.org/show_bug.cgi?id=1350177#c43 :D
Flags: needinfo?(btseng)
Comment on attachment 8856815 [details] [diff] [review] (ESR52) Patch: Bug 1297111 - Refactor a thread-safe ExpirationTracker for the use in SurfaceCache. Sec-high, meets the ESR uplift bar, ESR52.2+
Attachment #8856815 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Whiteboard: [fixed by bug 1350177] → [fixed by bug 1350177][post-critsmash-triage]
Whiteboard: [fixed by bug 1350177][post-critsmash-triage] → [fixed by bug 1350177][post-critsmash-triage][adv-main54+][adv-esr52.2+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: