Closed
Bug 1297111
Opened 9 years ago
Closed 8 years ago
Invalid array access in nsExpirationTracker::RemoveObject()
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
People
(Reporter: mccr8, Assigned: bevis)
References
Details
(Keywords: crash, csectype-bounds, sec-high, Whiteboard: [fixed by bug 1350177][post-critsmash-triage][adv-main54+][adv-esr52.2+])
Crash Data
Attachments
(1 file)
|
35.35 KB,
patch
|
tnikkel
:
review+
ritu
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-75eca981-f8b4-4313-be4b-a05dd2160821.
=============================================================
| Reporter | ||
Comment 1•9 years ago
|
||
This crash has the annotation ElementAt(aIndex = 4294967295, aLength = 0), where 4294967295 is 2^32-1, which I think comes from here:
uint32_t last = generation.Length() - 1;
T* lastObj = generation[last];
Summary: Crash in InvalidArrayIndex_CRASH → Invalid array access in nsExpirationTracker::RemoveObject()
| Reporter | ||
Comment 2•9 years ago
|
||
This is filed as a security bug because while this is a release assert in Nightly, it is a possible security issue on all other branches.
Keywords: csectype-bounds
| Reporter | ||
Comment 3•9 years ago
|
||
Nathan, could you figure out if this is a bug in the expiration tracker or the blur cache?
Flags: needinfo?(nfroyd)
Comment 4•9 years ago
|
||
OK, so in RemoveObject, we have some BlurCacheData, with an associated nsExpirationState. That nsExpirationState knows its generation and has an index into the particular generation. So we access the array for the object's generation and find that--according to the assert message--that the generation is empty.
But that seems nonsensical, because when the expiration tracker calls BlurCache::NotifyExpired:
http://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsExpirationTracker.h#206
We have an object that we've found in some generation and that generation is clearly non-empty. That object gets passed through NotifyExpired and in BlurCache's case, down to RemoveObject. So the object should be found...but it's not.
I suppose the generation we're touching in AgeOneGeneration is not necessarily the same one we accessed in RemoveObject. That would imply some sort of memory corruption of the nsExpirationState, or some programming error in BlurCache/nsExpirationState. BlurCache does call nsExpirationTracker::MarkUsed infallibly, and MarkUsed can fail, but I can't see any way that not handling the failure would lead to the problems seen in this crash.
I can think of two ways to make this code more robust:
1) We should check that when we access mGenerations in RemoveObject(), we're doing an in-bounds access. mGenerations is currently a normal C array and it's entirely possible for an object with a corrupted nsExpirationState to get here?
2) The NS_ASSERTIONs in RemoveObject should probably also be early exits.
Flags: needinfo?(nfroyd)
Updated•9 years ago
|
Group: core-security → dom-core-security
status-firefox48:
--- → wontfix
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox-esr45:
--- → affected
tracking-firefox49:
--- → +
tracking-firefox50:
--- → +
tracking-firefox51:
--- → +
Comment 5•9 years ago
|
||
This will miss the 49 release. We could wontfix it for 49 or try to aim a patch at a 49 dot release later if we need to.
| Reporter | ||
Comment 6•9 years ago
|
||
I think WONTFIX is fine. We don't really even know what is happening here.
| Reporter | ||
Updated•9 years ago
|
Crash Signature: [@ InvalidArrayIndex_CRASH] → [@ InvalidArrayIndex_CRASH | nsExpirationTracker<T>::RemoveObject]
Comment 7•9 years ago
|
||
Mark 51 as fix-optional as there are no actions at this moment and the volume of crash is low now but still happy to take the fix in 51 early beta.
| Reporter | ||
Comment 8•9 years ago
|
||
I'm just going to mark this incomplete. We have no idea what is going on here, and soon the array bounds checks will be everywhere and hopefully save us.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
| Reporter | ||
Comment 9•8 years ago
|
||
(In reply to Bevis Tseng[:bevistseng][:btseng] from bug 1345464 comment #11)
> Given the facts that:
> 1. nsExpirationTracker can only be constructed on the main thread.
> (The internal ExpirationTrackerObserver can only be Init/Destroyed on the
> main thread.)
> 2. There are some potential races in SurfaceCache that creates tracker
> on the main thread but calling AddObject/RemoveObject off the main thread
> without a mutex lock [1]. :-(
> hance, the assertions is added in construction time instead of the time when
> CheckStartTimer() is called to ensure that the passed-in event target is
> always on the main thread.
(In reply to Nathan Froyd [:froydnj] from comment #12)
> Ah, I bet this is the cause of bug 1297111. :(
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
| Reporter | ||
Updated•8 years ago
|
Component: XPCOM → ImageLib
Comment 10•8 years ago
|
||
Bug 1345464 comment 11 suggests a possible cause for these sorts of crashes: nsExpirationTracker<T> must be created on the main thread, but there's nothing that enforces that AddObject/RemoveObject also have to be called on the main thread. So if we have AddObject/RemoveObject being called on multiple threads, there's all sorts of places for things to go wrong. And then we can run into an interleaving of events like:
T1: Initializes timer for aging generation.
...
T2: Examines current generation, finds its size to be N.
T1: Timer goes off.
T1: Ages away generation T2 was examining, drastically reducing its size.
T2: Goes to index into an array thinking its size is N, whereas it's really M <<< N.
T2: Boom.
Of course, since there's *zero* locking involved here, we might run into multiple timers running, which is even more fun.
Bug 1345464 observed the problem with SurfaceCache, and out of ~10 crash reports looked at for the last week or so, only 1 of them involved SurfaceCache... Is it too much to guess that the other similar crash reports are also caused by races?
Comment 11•8 years ago
|
||
tnikkel, can you look at the imagelib crashes? bug 1345464 uncovered some thread races in how SurfaceCache works with expiration trackers -- constructing them on the main thread and then touching the cache off the main thread, which might start a timer for deletion, which would race with the destruction of the tracker on the main thread -- and the same races might be at work in other places.
Flags: needinfo?(tnikkel)
Comment 12•8 years ago
|
||
Crash reports show this one is still happening. No reports for 55 yet, but I am going to pre-emptively mark it as affected.
Comment 13•8 years ago
|
||
The SurfaceCache was designed to be used on any thread, so it could be pretty hard to only do the nsExpirationTracker calls on the main thread. We could dispatch a runnable to the main thread to do it, but then there could be arbitrary further mutations to the SurfaceCache which could change what calls we want to make on the expiration tracker.
I talked to froydnj on irc, he said it might be easier to enhance nsExpirationTracker to allow interaction with it off the main thread.
Flags: needinfo?(tnikkel)
| Assignee | ||
Comment 14•8 years ago
|
||
Here is some test done locally by introducing ReentrantMonitor into ExpirationTracker for your reference.
It became worse because it is pretty easy to see potential deadlock from SurfaceCache:
https://treeherder.mozilla.org/logviewer.html#?job_id=84982699&repo=try&lineNumber=15693
> INFO - GECKO(2925) | ###!!! ERROR: Potential deadlock detected:
> INFO - GECKO(2925) | === Cyclical dependency starts at
> INFO - GECKO(2925) | --- Mutex : StaticMutex calling context
> INFO - GECKO(2925) | [stack trace unavailable]
> INFO - GECKO(2925) | --- Next dependency:
> INFO - GECKO(2925) | --- ReentrantMonitor : SurfaceTracker (currently acquired)
> INFO - GECKO(2925) | calling context
> INFO - GECKO(2925) | [stack trace unavailable]
> INFO - GECKO(2925) | === Cycle completed at
> INFO - GECKO(2925) | --- Mutex : StaticMutex calling context
> INFO - GECKO(2925) | [stack trace unavailable]
> INFO - GECKO(2925) | Deadlock may happen for some other execution
> INFO - GECKO(2925) | [Child 2978] ###!!! ASSERTION: Potential deadlock detected:
> INFO - GECKO(2925) | Cyclical dependency starts at
> INFO - GECKO(2925) | Mutex : StaticMutex
> INFO - GECKO(2925) | Next dependency:
> INFO - GECKO(2925) | ReentrantMonitor : SurfaceTracker (currently acquired)
> INFO - GECKO(2925) | Cycle completed at
> INFO - GECKO(2925) | Mutex : StaticMutex
> INFO - GECKO(2925) | Deadlock may happen for some other execution
> INFO - GECKO(2925) | : 'Error', file /home/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 307
Comment 15•8 years ago
|
||
I can't tell from the log but I think that is because the SurfaceCache has it's own mutex, and so the SurfaceCache and the monitor from nsExpirationTracker that you added can be acquired in either order. During a call into the SurfaceCache it will hold its own mutex, and then call into the nsExpirationTracker to acquire the nsExpirationTracker monitor. And when the nsExpirationTracker calls NotifyExpired it will probably hold its own monitor, and then the NotifyExpired call on the SurfaceCache will acquire the SurfaceCache mutex.
The idea that froydnj described was to use the mutex from the SurfaceCache in nsExpirationTracker by passing it to the constructor. So there is only ever one mutex in play.
Comment 16•8 years ago
|
||
Too late for firefox 52, mass-wontfix.
Comment 17•8 years ago
|
||
Bug 1350177 landed on inbound which should fix this.
| Reporter | ||
Updated•8 years ago
|
Whiteboard: [fixed by bug 1350177]
| Reporter | ||
Updated•8 years ago
|
Assignee: nobody → btseng
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
tracking-firefox54:
--- → ?
Resolution: --- → FIXED
| Reporter | ||
Comment 18•8 years ago
|
||
It would be good if that could get uplifted to at least 54. It may be too late for a big change like this for 53.
Updated•8 years ago
|
Group: dom-core-security → core-security-release
Updated•8 years ago
|
status-firefox-esr52:
--- → affected
tracking-firefox53:
--- → ?
tracking-firefox55:
--- → ?
tracking-firefox-esr45:
--- → ?
tracking-firefox-esr52:
--- → ?
Target Milestone: --- → mozilla55
Comment 20•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #18)
> It would be good if that could get uplifted to at least 54. It may be too
> late for a big change like this for 53.
(I've now tagged Bevis for needinfo on getting an uplift request for 54, over in bug 1350177.)
Comment 21•8 years ago
|
||
We're going to want an uplift to ESR-52 also, or maybe a subset if there's a safe wallpaper patch. please attach the ESR-52 version to this bug rather than the public bug. An ESR patch in a public bug that's not a topcrasher will be suspicious.
Depends on: 1350177
Flags: needinfo?(btseng)
| Assignee | ||
Comment 22•8 years ago
|
||
There is some difference on the mutex used in SurfaceCache between aurora and esr52 that causes the conflict on SurfaceCache when merging the patch of bug 1350177 comment 43 to esr52.
I have no confidence in this change.
It will be helpful from someone familiar with SurfaceCache to merge this change to esr52.
Flags: needinfo?(btseng) → needinfo?(tnikkel)
| Assignee | ||
Comment 23•8 years ago
|
||
This patch folds the patches of bug 1335045 and bug 1350177.
Would you mind to review again for the changes in SurfaceCache.cpp before uplifting?
Thanks!
Flags: needinfo?(tnikkel)
Attachment #8856815 -
Flags: review?(tnikkel)
Comment 24•8 years ago
|
||
Comment on attachment 8856815 [details] [diff] [review]
(ESR52) Patch: Bug 1297111 - Refactor a thread-safe ExpirationTracker for the use in SurfaceCache.
There were no conflicts when I applied your aurora patch from bug 1350177 and the patch from bug 1335045. And the combined diff I got did not differ from this patch. So assuming your aurora patch is good (there weren't many changes to surface cache in aurora) this should be good.
Attachment #8856815 -
Flags: review?(tnikkel) → review+
| Assignee | ||
Comment 25•8 years ago
|
||
Comment on attachment 8856815 [details] [diff] [review]
(ESR52) Patch: Bug 1297111 - Refactor a thread-safe ExpirationTracker for the use in SurfaceCache.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is a sec-high bug.
User impact if declined: Random crash when SurfaceCache uses nsExpirationTracker.
Fix Landed on Version: firefox55 and aurora54+
Risk to taking this patch (and alternatives if risky):
The risk is low because we provide a thread-safe implementation of nsExpirationTracker to fix this race condition problem by sharing the same mutex lock that has been acquired by SurfaceCache in original implementation to prevent potential risk of deadlock by introducing a new mutex lock.
String or UUID changes made by this patch: No.
Treeherder result for reference:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ccad3095963837055c6a649b7d87c35d32e8ae1a
Attachment #8856815 -
Flags: approval-mozilla-esr52?
Comment 26•8 years ago
|
||
Hi Bevis, Since this bug is a sec-high and also affects 54, do you consider to uplift this for 54?
Flags: needinfo?(btseng)
| Assignee | ||
Comment 27•8 years ago
|
||
(In reply to Gerry Chang [:gchang] from comment #26)
> Hi Bevis, Since this bug is a sec-high and also affects 54, do you consider
> to uplift this for 54?
You have approved it in https://bugzilla.mozilla.org/show_bug.cgi?id=1350177#c43 :D
Flags: needinfo?(btseng)
Updated•8 years ago
|
Comment on attachment 8856815 [details] [diff] [review]
(ESR52) Patch: Bug 1297111 - Refactor a thread-safe ExpirationTracker for the use in SurfaceCache.
Sec-high, meets the ESR uplift bar, ESR52.2+
Attachment #8856815 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 29•8 years ago
|
||
| uplift | ||
Updated•8 years ago
|
Whiteboard: [fixed by bug 1350177] → [fixed by bug 1350177][post-critsmash-triage]
Updated•8 years ago
|
Whiteboard: [fixed by bug 1350177][post-critsmash-triage] → [fixed by bug 1350177][post-critsmash-triage][adv-main54+][adv-esr52.2+]
Updated•8 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•