Bugzilla

Comment 1

•

7 years ago

class nsOfflineCacheEvictionFunction final : public mozIStorageFunction {

has an:

  NS_DECL_THREADSAFE_ISUPPORTS

But access to mItems is not synchronized. This code appears to be pretty ancient. Does anyone still understand the threading? If not, we can try adding a mutex here.

Comment 2

•

7 years ago

Trying to find an owner for this code. Jason? Honza?

Flags: needinfo?(jduell.mcbugs)

Flags: needinfo?(honzab.moz)

Assignee

Comment 3

•

7 years ago

Already too busy.. sorry.

Flags: needinfo?(honzab.moz)

Comment 4

•

7 years ago

The stacks I looked at all seemed to be in the nsOfflineCacheDevice / nsDiskCacheDeviceSQL stuff, so moving this out of XPCOM, also based on the analysis in the description.

Component: XPCOM → Networking: Cache

Patrick McManus [:mcmanus]

Comment 5

•

7 years ago

Patrick, can you find an owner for this code? (And perhaps, confirm/deny my suspicions in comment 1).

Flags: needinfo?(mcmanus)

Comment 6

•

7 years ago

offline cache - I'm afraid hozna's queue is going to be the only route.

Flags: needinfo?(mcmanus) → needinfo?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Comment 7

•

7 years ago

All called from nsOfflineCacheEvictionFunction::Apply() -> nsLocalFile::Remove(bool) -> nsLocalFile::MakeDirty() [1] when truncating the mShortWorkingPath member.

Interesting this doesn't crash sooner in the code path.  I would not believe this was reordering.

I'll try to look at this, but really not very soon.


[1] https://dxr.mozilla.org/mozilla-central/rev/f4f374622111022d41dd8d5eb9220624135c534a/xpcom/io/nsLocalFileWin.h#98

Assignee: nobody → honzab.moz

Flags: needinfo?(jduell.mcbugs)

Flags: needinfo?(honzab.moz)

Frederik Braun [:freddy]

Comment 8

•

7 years ago

This is a sec-high. When can work on this bug be scheduled?

BTW: Are there any plans to increase the bus factor for offline cache? :)
It seems bad (not only from a security perspective), if there's only one person who can work on this.

Flags: needinfo?(honzab.moz)

Comment 9

•

7 years ago

the goal is to remove appcache - iirc office 360 was the blocker.

Assignee

Updated

•

7 years ago

Flags: needinfo?(honzab.moz)

Assignee

Comment 10

•

7 years ago

No idea where from this heap corruption comes and unable to reproduce locally (even with some code instrumentation added to fill in the mShortWorkingPath member).

I really don't know how to move forward here.  The crash rate is tho very low, so I'm not sure I should spend too much time on this.

Throwing back to the triage bag.

Assignee: honzab.moz → nobody

Comment 11

•

7 years ago

I had a look through the crash reports and I see that the error is hit from multiple different threads (i.e. thread 0 and random other ones). So again, I think the problem might be as described in comment 1: simultaneous access to the same data, causing one thread to access data that has been invalidated/reallocated in another.

Compare:
https://crash-stats.mozilla.com/report/index/0b0b6324-b34b-48cd-a0e5-72c292170114
https://crash-stats.mozilla.com/report/index/a8797065-0b09-4cb4-9389-ece492170119

And see the different callstacks:
4 	xul.dll 	nsOfflineCacheEvictionFunction::Apply() 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:243
5 	xul.dll 	nsOfflineCacheDevice::EvictEntries(char const*) 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:1930
6 	xul.dll 	nsCacheService::EvictEntriesForClient(char const*, int)
7 	xul.dll 	nsCacheService::EvictEntriesInternal(int) 	netwerk/cache/nsCacheService.cpp:1542
8 	xul.dll 	nsOfflineCacheDevice::Evict(nsILoadContextInfo*) 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:2411
9 	xul.dll 	nsApplicationCacheService::Evict(nsILoadContextInfo*) 	netwerk/cache/nsApplicationCacheService.cpp:177
10 	xul.dll 	mozilla::net::AppCacheStorage::AsyncEvictStorage(nsICacheEntryDoomCallback*)

vs

 4 	xul.dll 	nsOfflineCacheEvictionFunction::Apply() 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:243
5 	xul.dll 	nsOfflineCacheDevice::EvictEntries(char const*) 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:1948
6 	xul.dll 	nsOfflineCacheDiscardCache::Run()

The data that is corrupted is not protected against simultaneous access. So again, is there any reason to believe we don't simply need a mutex here? (One reasonable answer would be: we changed things around from Firefox 49 to 53).

Flags: needinfo?(honzab.moz)

Comment 12

•

7 years ago

By the way, there are (unfortunately) other callstacks crashing in nsLocalFile::Remove() too, so this bug likely needs to be split up in several others, because there's more than one bug here :-/

Assignee

Comment 13

•

7 years ago

Ah!  I though this was also thread-local as the EvictionObserver was!  Yes, I've seen this being accessed on more than one thread, but I abandoned the thread-concurrency theory based on that bad presumption.

Then this should not be hard to fix.

Assignee: nobody → honzab.moz

Status: NEW → ASSIGNED

Flags: needinfo?(honzab.moz)

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 14

•

7 years ago

Attached patch v1 (thread local for mItems) (obsolete) — Details — Splinter Review

There are two classes involved here:

* nsOfflineCacheEvictionFunction, which is a singleton kept in mEvictionFunction during lifetime of the offline device
- only once registered in sqlite (same lifetime)
- this function (the class object) is the one that collects the deleted items in it's mItems member (replaced by the patch) that we then iterate to delete files on disk
- this function is used on more than one thread
- this function cannot be registered per thread (on demand) because two threads could try to register/deregister it concurrently, what would break sqlite or make it cry loud

* EvictionObserver, which is thread/stack local class
- that only forwards to to the eviction function and mainly forwards the Apply() function to nsOfflineCacheEvictionFunction, which iterates the mItems array and deletes the files (no inter-thread protection)

I removed the mItems member and turned to thread local instead, since we want to handle correctly failure of the sql delete query failure (by not deleting files that might have been collected by partially executed query).  If there were a single array protected by a mutex, the files would be intermixed and we could delete or not delete something we wanted or didn't want.


OTOH, why don't we delete the file directly from inside the function execution?  I'm not sure if the query itself is considered transactional or not.  So that if it fails on, say, row 3 of 5, then the 2 affected rows are reverted.  Then the collect/apply-on-success would make sense.  There is no comment why we collect and delete later, tho.


https://treeherder.mozilla.org/#/jobs?repo=try&revision=434d096e925c0f0f56f632d977e6848672ed0608

Attachment #8838510 - Flags: review?(valentin.gosu)

Comment 15

•

7 years ago

Comment on attachment 8838510 [details] [diff] [review]
v1 (thread local for mItems)

Review of attachment 8838510 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

Attachment #8838510 - Flags: review?(valentin.gosu) → review+

Assignee

Comment 16

•

7 years ago

Comment on attachment 8838510 [details] [diff] [review]
v1 (thread local for mItems)

[Security approval request comment]
How easily could an exploit be constructed based on the patch?  No idea but probably very hard...

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  It's easily identifiable by looking at the code changes, apparently a thread race issue.

Which older supported branches are affected by this flaw?  all

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  this code didn't change a long time, so it's only about merging (will do it soon) ; risk is low

How likely is this patch to cause regressions; how much testing does it need?  Hmmm.. this is not a heavily used  feature, so it might show no sooner than on release if there is some issue with it.  We have some automated tests that probably exercise this code.  We also have QA doing manual tests to run this code, so I think it should be covered.

Attachment #8838510 - Flags: sec-approval?

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 17

•

7 years ago

Attached patch v1.1 (thread local for mItems + few fixes) — Details — Splinter Review

Small nits fixed that slipped the review.

Attachment #8838648 - Flags: review+

Reporter

Comment 18

•

7 years ago

Honza: can you also ask for uplifts as appropriate?  thanks

Flags: needinfo?(honzab.moz)

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

status-firefox51: affected → wontfix

status-firefox-esr45: ? → affected

status-firefox-esr52: --- → affected

tracking-firefox52: --- → +

tracking-firefox53: --- → +

tracking-firefox54: --- → +

tracking-firefox-esr45: --- → 52+

Al Billings [:abillings - ex-MoCo]

Comment 19

•

7 years ago

Comment on attachment 8838510 [details] [diff] [review]
v1 (thread local for mItems)

sec-approval+ for trunk.
As has been mentioned, please nominate patches for Aurora, Beta, and ESR45 so we can get this everywhere.

Attachment #8838510 - Flags: sec-approval? → sec-approval+

Assignee

Updated

•

7 years ago

Attachment #8838510 - Attachment is obsolete: true

Flags: needinfo?(honzab.moz)

Assignee

Comment 20

•

7 years ago

Comment on attachment 8838648 [details] [diff] [review]
v1.1 (thread local for mItems + few fixes)

applies on m-a, m-b.

Approval Request Comment
[Feature/Bug causing the regression]: appcache
[User impact if declined]: crash caused by concurrent thread access on nsCOMArray of nsIFiles
[Is this code covered by automated tests?]: yes (or at least should be according my knowledge)
[Has the fix been verified in Nightly?]: no STR
[Needs manual test from QE? If yes, steps to reproduce]: -
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: hmm.. it's a bit larger patch, but I performed a lot of local manual testing and it seems to work well
[Why is the change risky/not risky?]: there is some (tho small) risk, since it's hard to test all possible code paths
[String changes made/needed]: none

Attachment #8838648 - Flags: approval-mozilla-beta?

Attachment #8838648 - Flags: approval-mozilla-aurora?

Assignee

Comment 21

•

7 years ago

the patch in its version 1 has been approved in comment 19.

Keywords: checkin-needed

Assignee

Comment 22

•

7 years ago

Attached patch v1.1 for esr45 (obsolete) — Details — Splinter Review

[Approval Request Comment]
see comment 20.

Attachment #8838848 - Flags: approval-mozilla-esr45?

Assignee

Comment 23

•

7 years ago

Comment on attachment 8838648 [details] [diff] [review]
v1.1 (thread local for mItems + few fixes)

Applies cleanly, please see comment 20.

Attachment #8838648 - Flags: approval-mozilla-esr52?

Comment 24

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/20b742119207873b92cf8f207fb5a9e71a7734ed

tracking-firefox-esr52: --- → ?

Keywords: checkin-needed

Julien Cristau [:jcristau]

Comment 25

•

7 years ago

Comment on attachment 8838648 [details] [diff] [review]
v1.1 (thread local for mItems + few fixes)

esr52 uplift will happen automagically after this lands on Beta. Speaking of uplift, this patch needed to be rebased around bug 1060419 for landing on trunk, but should apply cleanly as-is to Aurora.

Attachment #8838648 - Flags: approval-mozilla-esr52?

Comment 26

•

7 years ago

Comment on attachment 8838648 [details] [diff] [review]
v1.1 (thread local for mItems + few fixes)

thread safety fix for network cache, aurora53+, beta52+

Attachment #8838648 - Flags: approval-mozilla-beta?

Attachment #8838648 - Flags: approval-mozilla-beta+

Attachment #8838648 - Flags: approval-mozilla-aurora?

Attachment #8838648 - Flags: approval-mozilla-aurora+

Carsten Book [:Tomcat]

Comment 27

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/20b742119207

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox54: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Carsten Book [:Tomcat]

Comment 28

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/1c4be61b3b49

status-firefox53: affected → fixed

Carsten Book [:Tomcat]

Comment 29

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/14a2e9a60736

status-firefox52: affected → fixed

Comment 30

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/14a2e9a60736

status-firefox-esr52: affected → fixed

Gerry Chang [:gchang]

Comment 31

•

7 years ago

Comment on attachment 8838848 [details] [diff] [review]
v1.1 for esr45

Fix a sec-high. ESR45+.

Attachment #8838848 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: dom-core-security → core-security-release

Comment 32

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr45/rev/e06c297754f4

status-firefox-esr45: affected → fixed

Assignee

Comment 33

•

7 years ago

Attached patch v1.1.1 for esr45 (prevent repeated call to ThreadLocal.Init() to avoid assertion failure) — Details — Splinter Review

Attachment #8838848 - Attachment is obsolete: true