1367497 - Crash in ExpirationTrackerImpl<T>::RemoveObjectLocked

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is 
report bp-3dc83f35-1f51-40c9-89f0-182280170522.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	ExpirationTrackerImpl<nsIDocument::SelectorCacheKey, 4, detail::PlaceholderLock, detail::PlaceholderAutoLock>::RemoveObjectLocked(nsIDocument::SelectorCacheKey*, detail::PlaceholderAutoLock const&) 	xpcom/ds/nsExpirationTracker.h:162
1 	xul.dll 	nsIDocument::SelectorCache::NotifyExpired(nsIDocument::SelectorCacheKey*) 	dom/base/nsDocument.cpp:1244
2 	xul.dll 	nsExpirationTracker<mozilla::gfx::GradientCacheData, 4>::NotifyExpiredLocked(mozilla::gfx::GradientCacheData*, detail::PlaceholderAutoLock const&) 	obj-firefox/dist/include/nsExpirationTracker.h:450
3 	xul.dll 	ExpirationTrackerImpl<nsIDocument::SelectorCacheKey, 4, detail::PlaceholderLock, detail::PlaceholderAutoLock>::AgeOneGenerationLocked(detail::PlaceholderAutoLock const&) 	xpcom/ds/nsExpirationTracker.h:226
4 	xul.dll 	ExpirationTrackerImpl<nsIDocument::SelectorCacheKey, 4, detail::PlaceholderLock, detail::PlaceholderAutoLock>::HandleTimeout() 	xpcom/ds/nsExpirationTracker.h:374
5 	xul.dll 	nsTimerImpl::Fire(int) 	xpcom/threads/nsTimerImpl.cpp:488
6 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/TimerThread.cpp:287
7 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1264
8 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
9 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:301
10 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
11 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
12 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
13 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:269
14 	xul.dll 	XRE_RunAppShell() 	toolkit/xre/nsEmbedFunctions.cpp:869
15 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:269
16 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
17 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
18 	xul.dll 	XRE_InitChildProcess(int, char** const, XREChildData const*) 	toolkit/xre/nsEmbedFunctions.cpp:693
19 	xul.dll 	mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) 	toolkit/xre/Bootstrap.cpp:65
20 	firefox.exe 	content_process_main(mozilla::Bootstrap*, int, char** const) 	ipc/contentproc/plugin-container.cpp:64
21 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
22 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
23 	kernel32.dll 	BaseThreadInitThunk 	
24 	ntdll.dll 	__RtlUserThreadStart 	
25 	ntdll.dll 	_RtlUserThreadStart

this crash signature is newly showing up in firefox 54 and later and so far reports are coming in from windows and macos installations. the timing of the crashes appearing indicates that it might be related to bug 1350177.

the crashing address of a small number of reports is indicating that it may be a UAF situation, so as a precaution i'm marking the report as security sensitive

Nathan Froyd [:froydnj]

Comment 1

•

7 years ago

Bevis, can you take a look at this?

Flags: needinfo?(btseng)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 2

•

7 years ago

I'll look into this.

Assignee: nobody → btseng

Flags: needinfo?(btseng)

Daniel Veditz [:dveditz]

Comment 3

•

7 years ago

Some of the crashes appear to be use-after-frees. Others are near-null dereferences or other random addresses

Keywords: csectype-uaf, sec-high, testcase-wanted

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: core-security → dom-core-security

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 4

•

7 years ago

(In reply to [:philipp] from comment #0)
> this crash signature is newly showing up in firefox 54 and later and so far
> reports are coming in from windows and macos installations. the timing of
> the crashes appearing indicates that it might be related to bug 1350177.
After further review, I didn't see the reason why the fix in nsExpirationTracker in bug 1350177 could introduce these new crashes.
It shall be more robust instead because the similar issue on SurfaceCache in bug 1297111 has never been happened after including the fix of bug 1350177 with a mutex protected.

The difference of these sub-classes compared to SurfaceCache is that they are all supposed to be accessed on the main thread.
However, currently, there is no assertion check on the main thread in nsExpirationTracker yet.

In addition, there is a undefined behavior in release build if we use NS_ASSERTION() in ExpirationTrackerImpl::<T>::RemoveObjectLocked():
https://hg.mozilla.org/releases/mozilla-beta/annotate/54f218cec92f/xpcom/ds/nsExpirationTracker.h#l161
Hence, thing could go wrong if the caller removes an object which is not tracked or remove an object from the wrong tracker found in bug 1363036 comment 29.

Maybe we could add more MOZ_DIAGNOSTIC_ASSERT for thread-safety check and add more error handling to replace the use of NS_ASSERTION() in ::RemoveObjectLocked().

Does it make sense to you, Nathan?

Flags: needinfo?(nfroyd)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 5

•

7 years ago

Attached patch (WIP) Add MOZ_DIAGNOSTIC_ASSERT for correctness check. (obsolete) — Details — Splinter Review

I'd like to address the concern in comment 4.

Flags: needinfo?(nfroyd)

Attachment #8871667 - Flags: feedback?(nfroyd)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 6

•

7 years ago

Comment on attachment 8871667 [details] [diff] [review]
(WIP) Add MOZ_DIAGNOSTIC_ASSERT for correctness check.

From the call stack it's more likely that
1. from the caller, same object were added more than once unexpectedly.
2. when removing at the 1st time, its nsExpirationState::mGeneration was reset to nsExpirationState::NOT_TRACKED at 
http://searchfox.org/mozilla-central/rev/b318c7dca7392bd16c0b11929f55b1be133f0b31/xpcom/ds/nsExpirationTracker.h#191
3. when removing at the 2nd time, we will try to access nsTArray from mGenerations[nsExpirationState::NOT_TRACKED] which is invalid:
http://searchfox.org/mozilla-central/rev/b318c7dca7392bd16c0b11929f55b1be133f0b31/xpcom/ds/nsExpirationTracker.h#185

It's hard to identify the abuse of this scenario by the code review in its sub-class.
I'd like to update a formal patch to have better error handling on this and add some MOZ_DIAGNOSTIC_ASSERT to catch the possible scenario from user trial.

Attachment #8871667 - Flags: feedback?(nfroyd)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 7

•

7 years ago

Attached patch Part 1: Improve Error Handling in {Add|Remove}ObjectLocked(). — Details — Splinter Review

I can found obvious error that cause this crash after further review in the nsExpirationTracker and its subclasses.

I'd like to land this patch and leave this bug open for further investigation in nightly build:
1. Add error handling of {Add|Remove}ObjectLocked if nsExpirationState::IsTracked() is wrong in beta/release build or hit MOZ_DIAGNOSTIC_ASSERT in nightly build.
2. Add MOZ_DIAGNOSTIC_ASSERT for thread-safety check.

Hi Nathan,

May I have your review for these change?

Thanks!

Attachment #8871667 - Attachment is obsolete: true

Attachment #8872919 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 8

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

Review of attachment 8872919 [details] [diff] [review]:
-----------------------------------------------------------------

It's not clear to me what error you have found from your description, nor is it clear how these changes would address the error.  (Skimming the first couple crashes, my guess is that the first two diagnostic assertions would help identify places where we might be making mistakes?)  We could get more information about the crash, but we also might wind up crashing Nightly in not-actually-problematic cases.  Please explain further.

::: xpcom/ds/nsExpirationTracker.h
@@ +148,5 @@
>    {
>      nsExpirationState* state = aObj->GetExpirationState();
> +    if (NS_WARN_IF(state->IsTracked())) {
> +      MOZ_DIAGNOSTIC_ASSERT(false, "Tried to add an object that's already tracked");
> +      return NS_ERROR_UNEXPECTED;

I think this makes sense, though I'm not sure it should crash on Nightly.  Why not just return an error?

@@ +179,5 @@
>    {
>      nsExpirationState* state = aObj->GetExpirationState();
> +    if (NS_WARN_IF(!state->IsTracked())) {
> +      MOZ_DIAGNOSTIC_ASSERT(false, "Tried to remove an object that's not tracked");
> +      return;

Likewise here.

@@ +473,5 @@
>    }
>  
>    Lock& GetMutex() override
>    {
> +    MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread());

These assertions I am unsure about.  What if there is careful locking done by the user of nsExpirationTracker?  We'd rather they used the locked version, but if they are using things correctly, then these calls will just crash Nightly for no reason.

Attachment #8872919 - Flags: review?(nfroyd) → review-

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 9

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #8)
> Comment on attachment 8872919 [details] [diff] [review]
> (v1) Improve Error Handling in {Add|Remove}ObjectLocked().
> 
> Review of attachment 8872919 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It's not clear to me what error you have found from your description, nor is
> it clear how these changes would address the error.  (Skimming the first
> couple crashes, my guess is that the first two diagnostic assertions would
> help identify places where we might be making mistakes?)  We could get more
> information about the crash, but we also might wind up crashing Nightly in
> not-actually-problematic cases.  Please explain further.
> 
Sorry, I just realized a typo in comment 7. It's "I can't found obvious error" instead of "I can".... :\
Actually, I've done self-review on ExpirationTracker's implementation again, but
I didn't see any reason that the refactoring in bug 1350177 could introduce these new crash.
(I am not sure if you agree as well.)
It might be some change of the use in its subclass that introduces this error which I would like to narrow down instead.

The only 2 suspicious points I can figure out from nsExpirationTracker's implementation itself are:
1. The thread-safety problem we didn't enforce yet in the main-thread only class nsExpirationTracker compared to ExpirationTrackerImpl which always includes mutex and we never see any crash again from it sub-class: SurfaceCache after this mutex lock is introduced.
2. Add/Remove object with improper nsExpirationState::IsTracked().

The reason why I added MOZ_DIAGNOSTIC_ASSERT for these 2 cases is to see if my assumption is correct by getting proactive feedback from the crash report in nightly, then remove the DIAGNOSTIC_ASSERT immediately after that and file more bugs to break down this problem if possible. (I can't see any crash of these assertion after push the patch try server, so I'd like to get further information from nightly users if possible.)
Otherwise, we just suppress the potential problem if we only return the error here.

Does this make sense to you?

Flags: needinfo?(nfroyd)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 10

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

Review of attachment 8872919 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/ds/nsExpirationTracker.h
@@ +473,5 @@
>    }
>  
>    Lock& GetMutex() override
>    {
> +    MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread());

I assume that the nsExpirationTracker is a main-thread-access-only class.
So any access to its instance should be on the main thread, shouldn't it?
If we got crash from this, then we can file a bug on its sub-class according to the call stack to enforce it either using the locked version, or ensure the access is main-thread only.

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 11

•

7 years ago

If adding crash of these use cases in nightly is not recommended, I'd like to replace them with MOZ_ASSERT instead since I didn't see any crash from try server. Then, at least, we could ensure that there is no more improper use in debug build.

Nathan Froyd [:froydnj]

Updated

•

7 years ago

status-firefox54: affected → wontfix

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Updated

•

7 years ago

status-firefox54: wontfix → fix-optional

Nathan Froyd [:froydnj]

Comment 12

•

7 years ago

Sorry, I accidentally cleared the needinfo flag when setting status-firefox54!

With Add/RemoveObject, I see that we already have documentation that says these objects being added/removed must not be/must be tracked, respectively.  So maybe it really is reasonable that people should be using the API correctly and not expecting erroneous calls to be idempotent.

Your point in comment 10 that nsExpirationTracker should really be main-thread-only is well taken.

I guess we could try landing the patch in its current state to try and get more information.  Please don't land it this week, though, since we're in a soft code freeze.  Unfortunately, this bug is marked as sec-high, which means that it will take a while to land patches since we've moved to landing all sec-approvals in a batch.  Perhaps your sec-approval request can state that this is purely a diagnostic patch, rather than a fix, and we could get this landed sooner rather than later?

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 13

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

This patch was r-, but it's worthy to give it a trial per suggestion in comment 12.
I'd like to grant r+ before asking for landing this diagnostic patch after FF55 is frozen.

Attachment #8872919 - Flags: review- → review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 14

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

Review of attachment 8872919 [details] [diff] [review]:
-----------------------------------------------------------------

Please land this around the second week of the 56 cycle.

Attachment #8872919 - Flags: review?(nfroyd) → review+

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 15

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #14)
> Please land this around the second week of the 56 cycle.
Will do!

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 16

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

[Security approval request comment]
Q: How easily could an exploit be constructed based on the patch?
A: It's not easy because there is no known use case to explicitly remove an 
   untracked object according the implementation of each tracker subclass instance
   yet. What we are doing in this patch is to
   1.  Replace useless MOZ_ASSERT in BETA_OR_RELEASE build with explicit return.
   2.  MOZ_DIAGNOSTIC_ASSERT is added to figure out the abuse of this tracker 
       from its subclasses.

Q: Do comments in the patch, the check-in comment, or tests included in the patch 
   paint a bulls-eye on the security problem?
A: Not much, the comment only tells the reader that we added some error handling.

Q: Which older supported branches are affected by this flaw?
A: FF54.

Q: If not all supported branches, which bug introduced the flaw?
A: Bug 1350177.

Q: Do you have backports for the affected branches?
   If not, how different, hard to create, and risky will they be?
A: Yes, but this is a diagnostic patch instead. Root cause is still unknown yet.

Q: How likely is this patch to cause regressions; how much testing does it need?
A: Not likely to cause more regressiones.
   This is only a diagnostic patch with better error handling.

Attachment #8872919 - Flags: sec-approval?

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox54: fix-optional → wontfix

status-firefox56: --- → affected

tracking-firefox55: --- → ?

tracking-firefox56: --- → ?

Julien Cristau [:jcristau] (back April 22)

Comment 17

•

7 years ago

sec-high, tracking for 55/56

tracking-firefox55: ? → +

tracking-firefox56: ? → +

Nathan Froyd [:froydnj]

Comment 18

•

7 years ago

(In reply to Bevis Tseng [:bevis][:btseng] from comment #16)
> Q: Which older supported branches are affected by this flaw?
> A: FF54.
> 
> Q: If not all supported branches, which bug introduced the flaw?
> A: Bug 1350177.

I don't think this is an accurate summary of the situation.  We were trying to address a (different) security bug in bug 1350177, but the crashes for this bug indicate that we just moved the signature around.  So this should really be all supported branches, even if the signature for this particular bug doesn't show up on older branches.

Component: MFBT → XPCOM

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 19

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #18)
> I don't think this is an accurate summary of the situation.  We were trying
> to address a (different) security bug in bug 1350177, but the crashes for
> this bug indicate that we just moved the signature around.  So this should
> really be all supported branches, even if the signature for this particular
> bug doesn't show up on older branches.

Right, thanks for correcting this!

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 20

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

Just hit MOZ_ASSERT(NS_IsMainThread()) in gfxFontCache in Linux Stylo build:
https://treeherder.mozilla.org/logviewer.html#?job_id=108485121&repo=try&lineNumber=12077
Which implies race conditions in gfxFontCache and shall be replaced with ExpirationTrackerImpl with Lock protected instead.

Attachment #8872919 - Flags: sec-approval?

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 21

•

7 years ago

:jfkthame, can you help to check the problem we found in comment 20?

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Comment 22

•

7 years ago

:heycam, :bholley - what's the expectation w.r.t. gfxFont and thread usage in stylo? From comment 20, it looks like a gfxFont is being created and accessed (to get metrics) off-main-thread, which I wouldn't expect to be safe. Do we need to be making gfxFont and gfxFontCache thread-safe, or should stylo avoid doing this?

(Just making the gfxFontCache thread-safe to address the immediate issue from comment 20 probably isn't sufficient, as the gfxFont objects themselves aren't safe either.)

Flags: needinfo?(jfkthame)

Flags: needinfo?(cam)

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Comment 23

•

7 years ago

Forwarding the font-safety question to Manish.

Flags: needinfo?(bobbyholley) → needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Comment 24

•

7 years ago

Cameron worked on the final safety stuff for font metrics and understands this better.

Flags: needinfo?(manishearth)

Cameron McCormack (:heycam)

Comment 25

•

7 years ago

The expectation is that gfxFont objects can be created from style worker threads, but that this will only happen while the Servo font metrics mutex is locked.  In that stack trace, that's mutex is acquired in Gecko_GetFontMetrics.  So only one style worker thread can interact with the gfxFontCache / gfxFonts at one time.  And while the style worker threads are running, the main thread is blocked.  So I expect these operations to be safe.

Could we specialize the assertion for gfxFontCache so that it calls AssertIsMainThreadOrServoFontMetricsLocked() instead of MOZ_ASSERT(NS_IsMainThread())?

Flags: needinfo?(cam)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 26

•

7 years ago

What we want to improve in the patch is to ensure that nsExpirationTracker won't be abused in mutil-thread environment.
For the trackers that could be accessed by multiple threads, ExpirationTrackerImpl is correct option.

Hence, if gfxFontCache could be accessed by multiple threads, I'd suggest to inherit from ExpirationTrackerImpl.
If the lock is not required, you can reuse the PlaceholderLock and override ExpirationTrackerImpl::GetMutex() with AssertIsMainThreadOrServoFontMetricsLocked() instead.

(It not a good idea to put AssertIsMainThreadOrServoFontMetricsLocked() in nsExpirationTracker::GetMutex() because nsExpirationTracker is not only inherited by gfxFontCache.)

Cameron McCormack (:heycam)

Comment 27

•

7 years ago

(In reply to Bevis Tseng [:bevis][:btseng] from comment #26)
> Hence, if gfxFontCache could be accessed by multiple threads, I'd suggest to
> inherit from ExpirationTrackerImpl.
> If the lock is not required, you can reuse the PlaceholderLock and override
> ExpirationTrackerImpl::GetMutex() with
> AssertIsMainThreadOrServoFontMetricsLocked() instead.

Yes, we don't need to acquire the lock, just assert that it is already acquired (or we're on the main thread).  So that sounds good to me.  Please needinfo if you'd like me to write that patch, or I'm happy to review such a change.

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 28

•

7 years ago

(In reply to Cameron McCormack (:heycam) (away 23 Jun) from comment #27)
> Yes, we don't need to acquire the lock, just assert that it is already
> acquired (or we're on the main thread).  So that sounds good to me.  Please
> needinfo if you'd like me to write that patch, or I'm happy to review such a
> change.
We'll need your help to improve this, thanks!

Flags: needinfo?(cam)

Cameron McCormack (:heycam)

Comment 29

•

7 years ago

Bevis, if I inherit from ExpirationTrackerImpl, and I would like to keep gfxFontCache using the API that doesn't require the AutoLock arguments everywhere, I'll need to duplicate all the methods on nsExpirationTracker.  Do you think that's the best approach, or do you have another suggestion?

Flags: needinfo?(cam) → needinfo?(btseng)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 30

•

7 years ago

IMHO, making gfxFontCache as a Compositor of your new ExpirationTrackerImpl subclass could to meet your requirement, but yes, you'll have to duplicate the methods and hide the detail of the AutoLock inside gfxFontCache.

BTW, the refactoring of gfxFontCache seems not a security bug.
You can fire another bug that blocks this bug for the refactor of gfxFontCache.

Flags: needinfo?(btseng)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 31

•

7 years ago

NI as a reminder of refactoring gfxFontCache because this bug is marked as sec-high.

Flags: needinfo?(cam)

Cameron McCormack (:heycam)

Comment 32

•

7 years ago

Attached patch Part 2: Make gfxFontCache use an expiration tracker that can assert the Servo font metrics mutex is locked. — Details — Splinter Review

Sorry for the delay here.

Flags: needinfo?(cam)

Attachment #8896684 - Flags: review?(jfkthame)

Attachment #8896684 - Flags: review?(btseng)

Jonathan Kew [:jfkthame]

Updated

•

7 years ago

Attachment #8896684 - Flags: review?(jfkthame) → review+

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 33

•

7 years ago

Comment on attachment 8896684 [details] [diff] [review]
Part 2: Make gfxFontCache use an expiration tracker that can assert the Servo font metrics mutex is locked.

Review of attachment 8896684 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!

Attachment #8896684 - Flags: review?(btseng) → review+

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Updated

•

7 years ago

Attachment #8872919 - Attachment description: (v1) Improve Error Handling in {Add|Remove}ObjectLocked(). → Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Updated

•

7 years ago

Attachment #8896684 - Attachment description: Make gfxFontCache use an expiration tracker that can assert the Servo font metrics mutex is locked. → Part 2: Make gfxFontCache use an expiration tracker that can assert the Servo font metrics mutex is locked.

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 34

•

7 years ago

Comment on attachment 8872919 [details] [diff] [review]
Part 1: Improve Error Handling in {Add|Remove}ObjectLocked().

[Security approval request comment]
Q: How easily could an exploit be constructed based on the patch?
A: It's not easy because there is no known use case to explicitly remove an 
   untracked object according the implementation of each tracker subclass instance
   yet. What we are doing in this patch is to
   1.  Replace useless MOZ_ASSERT in BETA_OR_RELEASE build with explicit return.
   2.  MOZ_DIAGNOSTIC_ASSERT is added to figure out the abuse of this tracker 
       from its subclasses.

Q: Do comments in the patch, the check-in comment, or tests included in the 
   patch paint a bulls-eye on the security problem?
A: Not much, the comment only tells the reader that we improved some error handling.

Q: Which older supported branches are affected by this flaw?
A: All branches.

Q: If not all supported branches, which bug introduced the flaw?
A: N/A

Q: Do you have backports for the affected branches?
   If not, how different, hard to create, and risky will they be?
A: Yes, but this is a diagnostic patch instead. 
   We'd like to collect more information from nightly to identify the root cause.

Q: How likely is this patch to cause regressions; how much testing does it need?
A: Not likely to cause more regressions.
   This is only a diagnostic patch with better error handling.
   Testing in treeherder shall be good enough.

Attachment #8872919 - Flags: sec-approval?

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 35

•

7 years ago

Comment on attachment 8896684 [details] [diff] [review]
Part 2: Make gfxFontCache use an expiration tracker that can assert the Servo font metrics mutex is locked.

[Security approval request comment]
Q: How easily could an exploit be constructed based on the patch?
A: It's not easy to create a race condition among main thread and style worker
   threads according to comment 25.
   This patch is the dependency of patch part 1 to ensure that, in all instances 
   of ExpirationTrackerImpl class, proper assertions or error handling are
   included when data is corrupted/inconsistent.

Q: Do comments in the patch, the check-in comment, or tests included in the 
   patch paint a bulls-eye on the security problem?
A: Not much, the comments only tell the reader that more assertion is added.

Q: Which older supported branches are affected by this flaw?
A: All branches.

Q: If not all supported branches, which bug introduced the flaw?
A: N/A

Q: Do you have backports for the affected branches?
   If not, how different, hard to create, and risky will they be?
A: Yes, but this is a diagnostic patch instead.
   We'd like to collect more information from nightly to identify the root cause.

Q: How likely is this patch to cause regressions; how much testing does it need?
A: Not likely to cause more regressions by adding proper assertions.
   Testing in treeherder shall be good enough.

Attachment #8896684 - Flags: sec-approval?

Al Billings [:abillings - ex-MoCo]

Comment 36

•

7 years ago

Sec-approval+ for checkin in August 28 into trunk (three weeks into the current cycle). We'll want a Beta patch made and nominated to land after it lands on trunk as well.

Whiteboard: [checkin on 8/28]

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Attachment #8872919 - Flags: sec-approval? → sec-approval+

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Attachment #8896684 - Flags: sec-approval? → sec-approval+

Carsten Book [:Tomcat]

Comment 37

•

7 years ago

just for reference, seems this landed and got backedout like:
https://hg.mozilla.org/mozilla-central/rev/c29f8828b859
https://hg.mozilla.org/mozilla-central/rev/2ae749fba6a0

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 38

•

7 years ago

(In reply to Carsten Book [:Tomcat] from comment #37)
> just for reference, seems this landed and got backedout like:
> https://hg.mozilla.org/mozilla-central/rev/c29f8828b859
> https://hg.mozilla.org/mozilla-central/rev/2ae749fba6a0

Yes, I didn't notice "[checkin on 8/28]" on the whiteboard added in comment 36 earlier. That's why I backed them out later.

Ryan VanderMeulen [:RyanVM]

Comment 39

•

7 years ago

(In reply to Bevis Tseng [:bevis][:btseng] from comment #38)
> Yes, I didn't notice "[checkin on 8/28]" on the whiteboard added in comment
> 36 earlier. That's why I backed them out later.

There's really no point in doing that - by landing on inbound you've already let the cat out of the bag and defeated the purpose of the sec-approval. Might as well re-land at this point so we can get it uplifted around ASAP.

Flags: needinfo?(btseng)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 40

•

7 years ago

Pushed to inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a273c823e19e1048d78f9e6b1ddedf8cc034bd6
https://hg.mozilla.org/integration/mozilla-inbound/rev/26e12505e1aebb4fff765f4248d60dac73a71c48

Flags: needinfo?(btseng)

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Comment 41

•

7 years ago

Attached patch (beta) Improve Error Handling in {Add|Remove}ObjectLocked(). r=froydnj,jfkthame — Details — Splinter Review

Fold patches for beta approval.

Approval Request Comment
[Feature/Bug causing the regression]:N/A
[User impact if declined]:Potential risk of incorrect memory access.
[Is this code covered by automated tests?]:Yes
[Has the fix been verified in Nightly?]:Not yet but soon to be verified (patches are just landed to inbound)
[Needs manual test from QE? If yes, steps to reproduce]:No 
[List of other uplifts needed for the feature/fix]:No
[Is the change risky?]:The risk is low.
[Why is the change risky/not risky?]:We only add proper error handling for incorrect status check, nullptr check and thread-safety check.
[String changes made/needed]:No

Attachment #8897426 - Flags: review+

Attachment #8897426 - Flags: approval-mozilla-beta?

Ryan VanderMeulen [:RyanVM]

Comment 42

•

7 years ago

This was backed out while investigating new test failures on inbound and then re-landed after it was confirmed to not be at fault.

https://hg.mozilla.org/integration/mozilla-inbound/rev/66c06afe139182f9c3b59aa4b01757cfc1197d78
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4dc3119f39c965314893aaf078016abb3f0487d

FWIW, the reason this caused concern was that it appears to have caused a new warning to appear in the logs:
https://treeherder.mozilla.org/logviewer.html#?job_id=123275069&repo=mozilla-inbound&lineNumber=7411
WARNING: Fonts still alive while shutting down gfxFontCache: 'mFonts.Count() == 0', file /home/worker/workspace/build/src/gfx/thebes/gfxFont.cpp, line 217

Whiteboard: [checkin on 8/28]

Al Billings [:abillings - ex-MoCo]

Comment 43

•

7 years ago

(In reply to Bevis Tseng [:bevis][:btseng] from comment #38)
 https://hg.mozilla.org/mozilla-central/rev/2ae749fba6a0
> 
> Yes, I didn't notice "[checkin on 8/28]" on the whiteboard added in comment
> 36 earlier. That's why I backed them out later.

As Ryan says, there is no point in backing out once you've exposed a security fix.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 44

•

7 years ago

Merge of the re-landed patches:
https://hg.mozilla.org/mozilla-central/rev/66c06afe1391
https://hg.mozilla.org/mozilla-central/rev/b4dc3119f39c

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Flags: needinfo?(btseng)

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Bevis Tseng[:bevis][:btseng](Exited)

Assignee

Updated

•

7 years ago

Flags: needinfo?(btseng)

Gerry Chang [:gchang]

Comment 45

•

7 years ago

Comment on attachment 8897426 [details] [diff] [review]
(beta) Improve Error Handling in {Add|Remove}ObjectLocked(). r=froydnj,jfkthame

Fix a security issue. Beta56+.

Attachment #8897426 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 46

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/91a92e4f9136

status-firefox55: affected → wontfix

status-firefox56: affected → fixed

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: dom-core-security → core-security-release

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Whiteboard: [adv-main56+]

Matt Wobensmith [:mwobensmith][:matt:]

Updated

•

7 years ago

Flags: qe-verify-

Whiteboard: [adv-main56+] → [adv-main56+][post-critsmash-triage]

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

(WIP) Add MOZ_DIAGNOSTIC_ASSERT for correctness check. 7 years ago Bevis Tseng[:bevis][:btseng](Exited) 3.41 KB, patch		Details \| Diff \| Splinter Review
Part 1: Improve Error Handling in {Add\|Remove}ObjectLocked(). 7 years ago Bevis Tseng[:bevis][:btseng](Exited) 2.88 KB, patch	froydnj : review+ abillings : sec-approval+	Details \| Diff \| Splinter Review
Part 2: Make gfxFontCache use an expiration tracker that can assert the Servo font metrics mutex is locked. 7 years ago Cameron McCormack (:heycam) 6.39 KB, patch	bevis : review+ jfkthame : review+ abillings : sec-approval+	Details \| Diff \| Splinter Review
(beta) Improve Error Handling in {Add\|Remove}ObjectLocked(). r=froydnj,jfkthame 7 years ago Bevis Tseng[:bevis][:btseng](Exited) 9.25 KB, patch	bevis : review+ gchang : approval-mozilla-beta+	Details \| Diff \| Splinter Review