Closed Bug 1342567 Opened 7 years ago Closed 7 years ago

Crash in nsExpirationTracker<T>::RemoveObject

Categories

(Core :: Graphics: ImageLib, defect)

Version: 52 Branch
Platform: x86 Windows
Type: defect
Priority: Not set
Severity: critical

Tracking


RESOLVED FIXED
mozilla55
Tracking Status
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 54+ fixed
firefox53 --- wontfix
firefox54 + fixed
firefox55 + fixed

People

(Reporter: philipp, Assigned: tnikkel)

References

Details

(Keywords: crash, sec-high, testcase-wanted, Whiteboard: [post-critsmash-triage][adv-main54+][adv-esr52.2+])

Crash Data

Attachments

(8 files, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-8ce8fb39-5515-4de9-98d4-4ccfa2170224.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsExpirationTracker<imgCacheEntry, 3>::RemoveObject(imgCacheEntry*) 	obj-firefox/dist/include/nsExpirationTracker.h:147
1 	xul.dll 	nsExpirationTracker<imgCacheEntry, 3>::MarkUsed(imgCacheEntry*) 	obj-firefox/dist/include/nsExpirationTracker.h:170
2 	xul.dll 	nsContentUtils::LoadImage(nsIURI*, nsINode*, nsIDocument*, nsIPrincipal*, nsIURI*, mozilla::net::ReferrerPolicy, imgINotificationObserver*, int, nsAString_internal const&, imgRequestProxy**, unsigned int) 	dom/base/nsContentUtils.cpp:3297
3 	xul.dll 	nsImageBoxFrame::UpdateImage() 	layout/xul/nsImageBoxFrame.cpp:260
4 	xul.dll 	nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*) 	layout/style/nsRuleNode.cpp:2422
5 	xul.dll 	nsStyleContext::DoGetStyleList<1>() 	obj-firefox/layout/style/nsStyleStructList.h:51
6 	xul.dll 	nsImageBoxFrame::DidSetStyleContext(nsStyleContext*) 	layout/xul/nsImageBoxFrame.cpp:533
7 	xul.dll 	mozilla::dom::Element::FindAttrValueIn(int, nsIAtom*, nsIAtom* const* const*, nsCaseTreatment) 	dom/base/Element.cpp:2668
8 	xul.dll 	nsImageBoxFrame::UpdateLoadFlags() 	layout/xul/nsImageBoxFrame.cpp:307
9 	xul.dll 	nsImageBoxFrame::Init(nsIContent*, nsContainerFrame*, nsIFrame*) 	layout/xul/nsImageBoxFrame.cpp:206
10 	xul.dll 	nsCSSFrameConstructor::InitAndRestoreFrame(nsFrameConstructorState const&, nsIContent*, nsContainerFrame*, nsIFrame*, bool) 	layout/base/nsCSSFrameConstructor.cpp:4971

this signature has been around for a while - a portion of the reports look like use-after-free (UAF) crashes though, and if you focus on just those, they seem to have gotten more frequent in the 52 beta cycle:
https://crash-stats.mozilla.com/search/?signature=%3DnsExpirationTracker%3CT%3E%3A%3ARemoveObject&address=%3D0xffffffffe5e5e5e5&release_channel=beta&date=%3E%3D2016-04-01T00%3A00%3A00.000Z&_sort=-date&_facets=signature&_facets=version&_facets=user_comments&_facets=adapter_vendor_id&_facets=build_id&_facets=install_time&_facets=platform_pretty_version&_facets=useragent_locale&_facets=cpu_arch&_facets=process_type&_facets=proto_signature&_facets=addons&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=address#facet-version

around half of them have the mozilla_cc2@internetdownloadmanager.com addon present - not sure if that would be related
Half of all the crashes have that addon, or half of the ones that look like a UAF-type crash?
Component: Untriaged → ImageLib
Flags: needinfo?(madperson)
half (or by now closer to 40%) of the UAF-type reports show that the IDM extension was also present (you can see that in the long crash stats link in comment #0 when switching to the addons facet tab).

if you turn it around and specifically exclude the uaf crashes from a socorro search, IDM is installed in 8% of the remaining [@ nsExpirationTracker<T>::RemoveObject] crashes.
Flags: needinfo?(madperson)
Looks like we can get this kind of crash if we MarkUsed an entry that isn't in the nsExpirationTracker.
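For illustration, here is a simplified sketch of the tracker shape (names and layout are approximations, not the real nsExpirationTracker) showing why MarkUsed on an untracked or stale entry crashes: the tracker trusts the generation/index stored on the object itself.

```cpp
// Simplified sketch, for illustration only -- not the actual Gecko code.
#include <cstdint>
#include <vector>

constexpr uint32_t kNotTracked = UINT32_MAX;

struct ExpirationState {
  uint32_t mGeneration = kNotTracked;
  uint32_t mIndexInGeneration = 0;
};

struct Entry {
  ExpirationState mState;
};

class Tracker {
  std::vector<Entry*> mGenerations[3];

 public:
  void RemoveObject(Entry* e) {
    ExpirationState& s = e->mState;
    // Precondition: e is tracked. If e was never added, or its state is
    // stale (e.g. read from freed memory), mGeneration/mIndexInGeneration
    // are garbage and the swap-remove below reads and writes out of
    // bounds -- consistent with the 0xe5e5e5e5 crash addresses in the
    // reports (jemalloc's freed-memory poison pattern).
    std::vector<Entry*>& gen = mGenerations[s.mGeneration];
    Entry* last = gen.back();
    gen[s.mIndexInGeneration] = last;
    last->mState.mIndexInGeneration = s.mIndexInGeneration;
    gen.pop_back();
    s.mGeneration = kNotTracked;
  }

  void MarkUsed(Entry* e) {
    RemoveObject(e);  // crashes here if e is not actually in the tracker
    // ...then re-append e to the newest generation (omitted)...
  }
};
```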
Group: core-security → gfx-core-security
Keywords: sec-high
Timothy, Andrew, do you have any idea to investigate this?
Flags: needinfo?(tnikkel)
Flags: needinfo?(aosmond)
Without any detailed analysis, my gut says it is related to the problem identified in bug 1297111. nsExpirationTracker is not thread safe, but SurfaceCache can interact with it on many threads, and the extensions simply alter the probability of these thread collisions.
(In reply to Andrew Osmond [:aosmond] from comment #5)
> Without any detailed analysis, my gut says it is related to the problem
> identified in bug 1297111. nsExpirationTracker is not thread safe, but
> SurfaceCache can interact with it on many threads, and the extensions
> simply alter the probability of these thread collisions.

This is a different expiration tracker, not the surface cache one. I think the imgLoader cache tracker is main-thread only. I think the mechanism we use to track what is in the tracker is buggy.
Flags: needinfo?(tnikkel)
Too late for firefox 52, mass-wontfix.
(In reply to [:philipp] from comment #0)
> around half of them have the mozilla_cc2@internetdownloadmanager.com addon
> present - not sure if that would be related

Does anyone have a copy of this addon? I don't want to install this program and I don't have a Windows VM handy. Might be useful to know how it interacts with the image cache.
Attached patch: diagnostic patch
I've been through this code up and down, left and right, with a fine-tooth comb. I don't know how this can happen.

The only possibilities I can see are:

1) the entry isn't in the cache at all, so it's not in the cache tracker either (the cache tracker being a subset of the cache). You'd be right to expect that the entry is in the cache as we just retrieved it a few lines before:

https://dxr.mozilla.org/mozilla-central/rev/10ea10d9993c9701e5525928257a589dea2c05d8/image/imgLoader.cpp#2155

However, there is a ValidateEntry call which can do a lot of things. I went through ValidateEntry and everything it can call in detail to see if it could possibly result in entries being removed from the cache, and I couldn't find anything. It would be good to either rule this out (and confirm my analysis), or, if my analysis is wrong, get a strong lead on where to look.

2) the HasNoProxies flag on the cache entry is out of sync with the imgRequest (whether the imgRequest has proxies or not). Even if this is the case, it still wouldn't explain the crashes, because we use the HasNoProxies flag on the cache entry as the source of truth for whether to put the entry in the cache tracker or not. And as far as I can tell, if they got out of sync and then HasNoProxies was updated to be correct, we would handle that change correctly (adding or removing from the cache tracker).
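A minimal sketch of the invariant described in (2), with hypothetical names (the real imgLoader code is more involved): because the flag alone drives tracker membership, even a late correction would self-heal.

```cpp
// Hypothetical sketch: mCacheTracker is the nsExpirationTracker for cache
// entries, and membership is driven solely by the HasNoProxies flag.
void imgLoader::SetEntryHasNoProxies(imgCacheEntry* aEntry,
                                     bool aHasNoProxies) {
  if (aEntry->HasNoProxies() == aHasNoProxies) {
    return;  // no state change; tracker membership already matches
  }
  aEntry->SetHasNoProxies(aHasNoProxies);
  if (aHasNoProxies) {
    mCacheTracker.AddObject(aEntry);     // idle: eligible to expire
  } else {
    mCacheTracker.RemoveObject(aEntry);  // in use again: stop tracking
  }
}
```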
Attachment #8855962 - Flags: review?(aosmond)
Attachment #8855962 - Flags: review?(aosmond) → review+
Comment on attachment 8855962 [details] [diff] [review]
diagnostic patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

the patch points out where a problem happens, but that is already public on crash-stats. Landing this patch would call more attention to it, but an attacker would still need to figure out how to trigger the condition first.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

see previous answer

Which older supported branches are affected by this flaw?

not sure

If not all supported branches, which bug introduced the flaw?

not sure

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

don't need to backport, diagnostic patch only

How likely is this patch to cause regressions; how much testing does it need?

just adds a note to crash reports if we are likely about to crash
Attachment #8855962 - Flags: sec-approval?
Flags: needinfo?(aosmond)
If these crashes are mostly from a particular addon, maybe it is somehow interacting with the image cache off the main thread? We see that occasionally with the observer service, such as bug 1339083 and bug 1295600. You could try adding some release assert main thread checks to the entry points to this cache and see if that turns up anything.
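A minimal sketch of that suggestion, using a hypothetical entry point (MOZ_RELEASE_ASSERT and NS_IsMainThread are the standard Gecko pieces):

```cpp
#include "mozilla/Assertions.h"  // MOZ_RELEASE_ASSERT
#include "nsThreadUtils.h"       // NS_IsMainThread

// Hypothetical method shown for illustration; the idea is to put this
// check at the top of every way into the image cache.
void imgLoader::SomeCacheEntryPoint() {
  // Fatal even in release builds: an off-main-thread caller produces a
  // crash with a clear signature instead of a hard-to-diagnose UAF later.
  MOZ_RELEASE_ASSERT(NS_IsMainThread(),
                     "image cache touched off the main thread");
  // ... existing body ...
}
```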
Comment on attachment 8855962 [details] [diff] [review]
diagnostic patch

sec-approval=dveditz

Might want to add the "leave-open" keyword so the sheriffs don't resolve it when they merge to mozilla-central.
Attachment #8855962 - Flags: sec-approval? → sec-approval+
Keywords: leave-open
(In reply to Andrew McCreight [:mccr8] from comment #12)
> If these crashes are mostly from a particular addon, maybe it is somehow
> interacting with the image cache off the main thread? We see that
> occasionally with the observer service, such as bug 1339083 and bug 1295600.
> You could try adding some release assert main thread checks to the entry
> points to this cache and see if that turns up anything.

Interesting! I'll write a patch for that next.
Attachment #8858987 - Flags: review?(aosmond)
Comment on attachment 8858987 [details] [diff] [review]
mainthread asserts

Review of attachment 8858987 [details] [diff] [review]:
-----------------------------------------------------------------

What threads does imgLoader::mUncachedImagesMutex protect us from contention with? Most of the call trees into imgLoader::AddToUncachedImages and imgLoader::RemoveFromUncachedImages appear to be on the main thread (mostly called directly or indirectly from the methods we are adding the asserts to). Is it imgRequest::~imgRequest which can be called off main thread (given it has a threadsafe destructor)?
Attachment #8858987 - Flags: review?(aosmond) → review+
> (given it has a threadsafe destructor)?

*Has threadsafe *refcounting*
(In reply to Andrew Osmond [:aosmond] from comment #18)
> What threads does imgLoader::mUncachedImagesMutex protect us from contention
> with? Most of the call trees into imgLoader::AddToUncachedImages and
> imgLoader::RemoveFromUncachedImages appear to be on the main thread (mostly
> called directly or indirectly from the methods we are adding the asserts
> to). Is it imgRequest::~imgRequest which can be called off main thread
> (given it has a threadsafe destructor)?

Exactly. The comment for mUncachedImagesMutex spells it out:

https://dxr.mozilla.org/mozilla-central/rev/a374c35469935a874fefe64d3e07003fc5bc8884/image/imgLoader.h#454

I don't expect the uncached images to be a problem here because they are basically only tracked so we can report them in about:memory. Once an image goes into the uncached images it shouldn't be in any of the cache data structures, and it should be impossible for it to return to the cache at a later time.
We had our first crash with the annotation

https://crash-stats.mozilla.com/report/index/67d0af19-481a-41b2-9925-35ff90170418

the gfx critical note says "request->HasConsumers() false inCache false". So the cache entry is somehow getting removed. Either some illegal off-main-thread access to the img cache is happening, or ValidateEntry is somehow able to re-enter and modify the imgcache. I've got the patch (which I'll land shortly) to detect the off-main-thread case. I'll attach a patch to detect the other case.
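The quoted note suggests an annotation along these lines; a sketch with hypothetical locals (gfxCriticalNote is the real Gecko logging stream whose output lands in crash reports):

```cpp
#include "mozilla/gfx/Logging.h"  // gfxCriticalNote

// Hypothetical diagnostic: record the suspicious state in the crash report
// just before the call that would otherwise crash.
if (!entry->GetExpirationState()->IsTracked()) {
  gfxCriticalNote << "request->HasConsumers() "
                  << (request->HasConsumers() ? "true" : "false")
                  << " inCache " << (inCache ? "true" : "false");
}
```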
Attachment #8859388 - Flags: review?(aosmond)
Comment on attachment 8858987 [details] [diff] [review]
mainthread asserts

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

if the imgcache is indeed being used off main thread, and if the attacker sees the patch, looks at crash stats, finds a crash, sees the stack, and is then able to use the imgcache off main thread in the same way, it might not be too hard to make this happen on a build without this diagnostic patch. We need to get this data ourselves before we can do anything about this problem, though; there's not really any other way to get it.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

see previous answer

Which older supported branches are affected by this flaw?

not sure

If not all supported branches, which bug introduced the flaw?

not sure

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

don't need to backport, diagnostic patch only

How likely is this patch to cause regressions; how much testing does it need?

purposely adds a crash if we use the imgcache off main thread to see if that is happening; I'll back this out on the first crash we see
Attachment #8858987 - Flags: sec-approval?
Comment on attachment 8858987 [details] [diff] [review]
mainthread asserts

sec-approval+
Attachment #8858987 - Flags: sec-approval? → sec-approval+
Comment on attachment 8859388 [details] [diff] [review]
check for imgloader reentry

I'll remove the MOZ_RELEASE_ASSERT before landing; that was just for tryserver to verify that it was not happening.
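A sketch of what such a reentrancy check might look like, with hypothetical names: an RAII depth counter around the imgLoader entry points, with the try-only fatal assert commented out as described.

```cpp
#include "mozilla/Attributes.h"   // MOZ_RAII
#include "mozilla/gfx/Logging.h"  // gfxCriticalNote

// Hypothetical RAII guard placed at each imgLoader entry point.
class MOZ_RAII AutoImgLoaderEntry {
  uint32_t& mDepth;

 public:
  explicit AutoImgLoaderEntry(uint32_t& aDepth) : mDepth(aDepth) {
    ++mDepth;
    // Try-server only; removed before landing so crash reports don't hand
    // attackers the re-entrancy stacks:
    // MOZ_RELEASE_ASSERT(mDepth == 1, "imgLoader reentered");
    if (mDepth > 1) {
      gfxCriticalNote << "imgLoader reentered, depth " << mDepth;
    }
  }
  ~AutoImgLoaderEntry() { --mDepth; }
};
```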
Attachment #8859388 - Flags: review?(aosmond) → review+
Attachment #8859390 - Flags: review?(aosmond) → review+
Comment on attachment 8859388 [details] [diff] [review]
check for imgloader reentry

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

the patch points out that we think it might be possible for the imgcache to reenter itself. If the attacker figures out how, then they could maybe do something with that. I'm not landing the MOZ_RELEASE_ASSERTs, so neither we nor the attackers will get the stacks of the re-entrancy (at this point; depending on the results of this patch we might need the stacks later)

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

same as answer to the previous question

Which older supported branches are affected by this flaw?

not sure

If not all supported branches, which bug introduced the flaw?

not sure

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

don't need to backport, diagnostic patch only

How likely is this patch to cause regressions; how much testing does it need?

the patch only adds a crash report annotation in a bad situation
Attachment #8859388 - Flags: sec-approval?
Comment on attachment 8859390 [details] [diff] [review]
check if our entry was replaced or removed from the cache

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

this patch just adds a tiny bit more info to the diagnostic patch we already landed, no new information is exposed

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

not any more than what has already landed

Which older supported branches are affected by this flaw?

not sure

If not all supported branches, which bug introduced the flaw?

not sure

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

don't need to backport, diagnostic patch only

How likely is this patch to cause regressions; how much testing does it need?

just changes what we put in a crash report annotation, not risky, no testing needed
Attachment #8859390 - Flags: sec-approval?
https://hg.mozilla.org/mozilla-central/rev/600f9d60d76f
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Looks like this was closed accidentally despite the leave-open keyword.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla55 → ---
Attachment #8859388 - Flags: sec-approval? → sec-approval+
Attachment #8859390 - Flags: sec-approval? → sec-approval+
The addons list from one of the crashes was very vanilla, nothing suspect in there; I think only Adblock Plus beyond the default stuff we ship. So the addon lead might be a dead end.
We've had a few crashes in builds that have the main thread fatal asserts, but there are no occurrences of hitting those asserts. So it doesn't look like off-main-thread access.

Still waiting for crashes in a build that contains the last two patches.
No crashes on nightly in about two weeks, looks like we'll need some data from beta (which has multiple crashes a day) to make any progress here.

This rolls up all the diagnostic patches for beta except the main thread asserts because that doesn't seem to be the problem.
Attachment #8865261 - Attachment description: rollout patch of diagnostics for beta → rollup patch of diagnostics for beta
Comment on attachment 8865261 [details] [diff] [review]
rollup patch of diagnostics for beta

We have gotten zero crashes on nightly in about two weeks; to make any progress here we need more data. So I'm suggesting uplifting this to beta. There should be no downside (no extra crashes are added, only crash report notes).

We get enough crashes on beta that we only need one day on beta to get enough data. So can be landed for only one beta build.

Approval Request Comment
[Feature/Bug causing the regression]: diagnostic patch to get information from crash stats
[User impact if declined]: diagnostic patch to get information from crash stats
[Is this code covered by automated tests?]: n/a
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: only adds a note to crash reports
[String changes made/needed]: none
Attachment #8865261 - Flags: approval-mozilla-beta?
Attachment #8865261 - Attachment is obsolete: true
Attachment #8865261 - Flags: approval-mozilla-beta?
We finally got a crash with the latest diagnostic patches

https://crash-stats.mozilla.com/report/index/ea212243-7678-4f5f-9beb-5aff40170509
The crash report annotation shows that there is no entry in the cache at all for our key. And that we call RemoveFromCache (the version that passes an entry and not a key) while that entry is "in use". Still not sure how that can happen. So I'll land a patch that release asserts when we get the interesting RemoveFromCache call so we can get a stack. I'll request uplift to beta so we don't have to wait two weeks to get data from this.
Attachment #8866035 - Flags: review?(aosmond) → review+
Comment on attachment 8866035 [details] [diff] [review]
release assert if we remove an "in use" entry from the cache

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

no new information is being disclosed by this patch, just a release assert where we had a crash note before

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

no new information is being disclosed by this patch, just a release assert where we had a crash note before

Which older supported branches are affected by this flaw?

diagnostic patch only

If not all supported branches, which bug introduced the flaw?

diagnostic patch only

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

diagnostic patch only

How likely is this patch to cause regressions; how much testing does it need?

we might crash on a release assert a little more often than we would hit a UAF crash
Attachment #8866035 - Flags: sec-approval?
Comment on attachment 8866035 [details] [diff] [review]
release assert if we remove an "in use" entry from the cache

sec-approval+
Attachment #8866035 - Flags: sec-approval? → sec-approval+
Comment on attachment 8866037 [details] [diff] [review]
rollup patch for beta

Instead of waiting another two weeks to get another crash, let's uplift to beta.

We get enough crashes on beta that we only need one day on beta to get enough data. So can be landed for only one beta build and then backed out.

Approval Request Comment
[Feature/Bug causing the regression]: diagnostic patch to get information from crash stats
[User impact if declined]: diagnostic patch to get information from crash stats
[Is this code covered by automated tests?]: n/a
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: crashes slightly earlier where we are likely to crash anyways
[String changes made/needed]: none
Attachment #8866037 - Flags: approval-mozilla-beta?
Comment on attachment 8866037 [details] [diff] [review]
rollup patch for beta

This is a diagnostic patch to get information. Beta54+. Should be in 54 beta 8.
Attachment #8866037 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://crash-stats.mozilla.com/report/index/36e15fd1-9962-4cb3-a0ba-a8ab00170517

Looks like one of the content policy calls we make in ShouldLoadCachedImage

https://dxr.mozilla.org/mozilla-central/rev/b8e9b674033bcd1f3a4c59b9d0ee7619c1a17cc5/image/imgLoader.cpp#559

calls into JS, because content policies can be implemented in JS, it seems. That's already bad, but it looks like the JS does a sync XHR, which goes ahead and spins its own event loop, which can do anything; in this case it modifies the image cache entry we are looking at.
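In miniature, the hazard being described (hypothetical locals and parameters, not the exact imgLoader code): a view of cache state is held across a call that can run arbitrary JS.

```cpp
RefPtr<imgCacheEntry> entry;
mCache.Get(key, getter_AddRefs(entry));  // entry is in the cache here

// ShouldLoadCachedImage() consults nsIContentPolicy implementations, which
// can be JS (e.g. an ad blocker). A sync XHR in that JS spins a nested
// event loop that can evict or replace the cache entry for `key`.
if (!ShouldLoadCachedImage(request, loadingContext, triggeringPrincipal)) {
  return false;
}

// Hazard: `entry` may no longer be what mCache maps `key` to, and marking
// it used can touch a tracker slot it no longer owns.
mCacheTracker.MarkUsed(entry);
```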

I saw ehsan in the blame of this code so I'll pick on you. Is this expected? Is there anything we can do here? Is it even reasonable to make sure every place that calls a content policy ShouldLoad method can handle running a nested event loop there?
Flags: needinfo?(ehsan)
the rollup patch has issues like 

Hunk #7 FAILED at 2417
7 out of 7 hunks FAILED -- saving rejects to file image/imgLoader.cpp.rej
patching file image/imgLoader.h
Hunk #1 FAILED at 138
Hunk #2 FAILED at 172
2 out of 2 hunks FAILED -- saving rejects to file image/imgLoader.h.rej

could you take a look, thanks!
Flags: needinfo?(tnikkel)
The rollup patch has already been uplifted.
Flags: needinfo?(tnikkel)
(In reply to Timothy Nikkel (:tnikkel) from comment #48)
> I saw ehsan in the blame of this code so I'll pick on you. Is this expected?

Certainly (at least the part about content policies being implemented in JS).  This is how extensions like ad blockers work.  (And I have seen the IDM add-on use the content policy API in other bugs FWIW so no surprises there.)

> Is there anything we can do here? Is it even reasonable to make sure every
> place that calls a content policy ShouldLoad method can handle running a
> nested event loop there?

You need to protect your code against this.  :-)

The simplest way to do that would be to check nsIDocument::IsInSyncOperation() <https://searchfox.org/mozilla-central/rev/9a7fbdee1d54f99cd548af95b81231d80e5f9ad1/dom/base/nsIDocument.h#2541>.  That function would return true if a nested event loop such as this one for sync XHR is currently running somewhere underneath you on the stack.  If you need to defer some operation to the end of any such sync loops, you may want to modify the counter management in nsIDocument::SetIsInSyncOperation() (called from nsAutoSyncOperation::~nsAutoSyncOperation) to do something when the counter hits zero, which would be when the last one of these sync loops returns.
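A minimal sketch of that check, assuming a document (`doc`, a hypothetical local) is reachable at the call site:

```cpp
// Bail out (or defer the work) when a sync event loop such as sync XHR's
// is running underneath us on the stack.
if (doc && doc->IsInSyncOperation()) {
  // Any cache state captured before this point may be stale; re-fetch it,
  // or defer until nsAutoSyncOperation unwinds and the counter hits zero.
  return;
}
```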
Flags: needinfo?(ehsan)
The nested event loop doesn't matter here. Just running JS can do plenty of things to screw us up here.

So basically whenever we do a content policy check, we have to assume that the entire world can change out from under us, and not save any state across content policy calls. Surely this is a more widespread problem in code calling content policy methods and can't just be limited to imagelib? We need to audit all calls.

It's quite frustrating because I spent hours poring over the code that imgLoader::ValidateEntry calls to try to see if any of the calls could "escape" like this. If this is the expected behaviour then it needs to be very clear, so that this same mistake isn't repeated over and over again in our code.

Apparently, this comment

https://dxr.mozilla.org/mozilla-central/rev/41958333867b0f537271dbd4cb4ba9e8a67a85a8/dom/base/nsIContentPolicy.idl#64

is completely wrong then? We should remove it.
Flags: needinfo?(ehsan)
I just had a look at some of the places where this code is called; this seems really bad. It seems like it adds JS calls and nested event loops to tons of very deeply nested code that isn't expecting it.
We got the data we needed, we can back this out.

Approval Request Comment
[Feature/Bug causing the regression]: n/a, back out a diagnostic patch
[User impact if declined]: n/a, back out a diagnostic patch
[Is this code covered by automated tests?]:  n/a, back out a diagnostic patch
[Has the fix been verified in Nightly?]:  n/a, back out a diagnostic patch
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no, back out a diagnostic patch
[Why is the change risky/not risky?]: back out a diagnostic patch
[String changes made/needed]: none
Attachment #8868754 - Flags: approval-mozilla-beta?
Just for fun, here is a stack without a nested event loop:
https://crash-stats.mozilla.com/report/index/ab33de34-e187-4bdc-9ce2-5d5720170517
(In reply to Timothy Nikkel (:tnikkel) from comment #56)
> Backed out all diagnostic changes landed on central/inbound
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> 641fa04790771eff942517d200348a72501c09e7

This got merged to m-c as well.
https://hg.mozilla.org/mozilla-central/rev/641fa04790771eff942517d200348a72501c09e7
Comment on attachment 8868754 [details] [diff] [review]
backout patch for beta

Backout diagnostic patch. Beta54+. Should be in 54 beta 9.
Attachment #8868754 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Timothy Nikkel (:tnikkel) from comment #52)
> The nested event loop doesn't matter here. Just running js can do plenty of
> things to screw us up here.

Right.

> So basically whenever we do a content policy check we have to assume that
> the entire world can change out from under us and not save any state across
> content policy calls. Surely this is a more widespread problem in code
> calling content policy methods and can't just be limited to imagelib? We
> need to audit all calls.

Yeah, although I think most callers are already aware of this.  I'm surprised that imagelib assumed something else all these years.  :-(  And how that didn't blow up earlier.

> It's quite frustrating because I spent hours pouring over the code that
> imgLoader::ValidateEntry calls to try to see if any of the calls could
> "escape" like this. If this is the expected behaviour then it needs to be
> very clear so that this same mistake isn't repeated over and over again in
> our code.

Firstly, I'm sorry you had to waste a lot of time and energy on this.  :-(

But in general, please never assume that *any* XPCOM add-on that isn't builtinclass has any sane behavior, that is a very dangerous assumption and can indeed backfire in all sorts of ways.  In general, extensions are free to implement any and all such interfaces and make their implementations do absolutely anything.  After 57 when release builds won't load anything but web extension based add-ons, this problem will mostly fix itself...

> Apparently, this comment
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> 41958333867b0f537271dbd4cb4ba9e8a67a85a8/dom/base/nsIContentPolicy.idl#64
> 
> is completely wrong then? We should remove it.

Yes, that's definitely wrong.  That was done ~17 years ago by Boris: https://github.com/mozilla/gecko-dev/commit/6d2d10784c084733cdbcad5bf647d7d36c16e965.  OK to remove it?
Flags: needinfo?(ehsan) → needinfo?(bzbarsky)
That comment is true in the sense that if an implementor of ShouldLoad does any of those things really bad stuff will happen.  If an implementor of ShouldLoad does a sync XHR, really bad things can also happen, and not just for the imagelib callsite...

Like comment 52 says, we do content policy checks in all sorts of places that are deeply nested under various callstacks right now.  I don't think any of them are really hardened against either malice (what's the point?) or incompetence on the part of the content policy implementors.  Yes, this is all pretty terrible.  Spot-fixing things for stability purposes probably makes sense, of course.

Bug 893916 tracks one of the proposed mitigation strategies for this specifically for images, though it would not necessarily help the specific case discussed in this bug, even if we managed to do it.

When we drop support for XPCOM addons, that might help some.  I don't know whether anyone has done a really careful audit of the webextension content policy codepaths.  We should really do that.
Flags: needinfo?(bzbarsky)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #59)
> But in general, please never assume that *any* XPCOM add-on that isn't
> builtinclass has any sane behavior, that is a very dangerous assumption and
> can indeed backfire in all sorts of ways.  In general, extensions are free
> to implement any and all such interfaces and make their implementations do
> absolutely anything.  After 57 when release builds won't load anything but
> web extension based add-ons, this problem will mostly fix itself...

I didn't know that the code involved addons in any way. If I knew then of course I would assume the addon was doing the worst thing possible.
I think this should make things better; hopefully it doesn't mess up our accounting of entries in the cache somehow.
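The patch itself isn't quoted in this bug; a guess at the shape of such a spot-fix, with hypothetical names: after the content-policy call, confirm the entry is still the one the cache holds before touching the tracker.

```cpp
// Hypothetical re-validation after ShouldLoadCachedImage() may have run JS.
RefPtr<imgCacheEntry> current;
mCache.Get(key, getter_AddRefs(current));
if (current != entry) {
  // JS evicted or replaced the entry under us; treat this as a cache miss
  // rather than marking a stale entry used.
  return false;
}
```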
Attachment #8872110 - Flags: review?(aosmond)
Attachment #8872110 - Flags: review?(aosmond) → review+
Keywords: leave-open
Comment on attachment 8872110 [details] [diff] [review]
spotfix imgloader



[Security approval request comment]
How easily could an exploit be constructed based on the patch?

an attacker would need to have control of an extension installed into Firefox on the target's computer to exploit this. This patch doesn't reveal anything that the diagnostic patches we already landed didn't already reveal.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

probably not. I scoured the code to try to find this problem (using more info than is in the patch) and I didn't find it.

Which older supported branches are affected by this flaw?

all

If not all supported branches, which bug introduced the flaw?

not sure, likely around for a very long time

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

trivial

How likely is this patch to cause regressions; how much testing does it need?

since we haven't been able to reproduce this bug the only way to test this is to check crashstats to confirm that the crashes go away, and that we haven't made other crashes worse
Attachment #8872110 - Flags: sec-approval?
NI? release management to see if we have time to take this on Beta and ESR52 given the current schedule. This will determine if we take it on trunk now.
Flags: needinfo?(rkothari)
Flags: needinfo?(lhenry)
We will gtb b13 on Thursday so yes we can take it.
Flags: needinfo?(rkothari)
Comment on attachment 8872110 [details] [diff] [review]
spotfix imgloader

Please nominate patches for Beta and ESR52 approval, to be checked in after mozilla-central.
sec-approval+
Attachment #8872110 - Flags: sec-approval? → sec-approval+
Comment on attachment 8872110 [details] [diff] [review]
spotfix imgloader

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: it's marked sec-high, but that's probably too high since it requires an attacker to control an extension. It should fix crashes.
User impact if declined: some crashes
Fix Landed on Version: 55
Risk to taking this patch (and alternatives if risky): not risky
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/Bug causing the regression]: around for a long time
[User impact if declined]: some crashes
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: not yet, it just landed
[Needs manual test from QE? If yes, steps to reproduce]: no, we don't know how to reproduce
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: if we reach the condition changed in the patch we are likely crashing on the next line
[String changes made/needed]: none
Attachment #8872110 - Flags: approval-mozilla-esr52?
Attachment #8872110 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/39f174dc2f91
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Flags: needinfo?(lhenry)
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8872110 [details] [diff] [review]
spotfix imgloader

imgloader crash fix, sec-high, beta54+ and esr52.2+
Attachment #8872110 - Flags: approval-mozilla-esr52?
Attachment #8872110 - Flags: approval-mozilla-esr52+
Attachment #8872110 - Flags: approval-mozilla-beta?
Attachment #8872110 - Flags: approval-mozilla-beta+
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Group: gfx-core-security → core-security-release
Track 54+/55+ as sec-high.
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main54+][adv-esr52.2+]
Group: core-security-release