Closed Bug 1400435 Opened 2 years ago Closed 2 years ago

stylo: heap write hazard in Gecko_CSSValue_SetNumber

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: sfink, Assigned: bholley)

References

Details

Attachments

(5 files, 1 obsolete file)

This may be a false positive, but someone will need to look at it. To start with, is nsAutoRefCnt threadsafe?

Analyzing Gecko_CSSValue_SetAbsoluteLength ...
Error: Field write nsAutoRefCnt.mValue
Location: _ZN12nsAutoRefCntaSEm$uint64 nsAutoRefCnt::operator=(uint64) @ /builds/worker/checkouts/gecko/obj-analyzed/dist/include/nsISupportsImpl.h#314
Stack Trace:
_ZN7mozilla3css20FontFamilyListRefCnt7ReleaseEv$uint32 mozilla::css::FontFamilyListRefCnt::Release() @ /builds/worker/checkouts/gecko/obj-analyzed/dist/include/nsCSSValue.h#313
_ZN10nsCSSValue7DoResetEv$void nsCSSValue::DoReset() @ /builds/worker/checkouts/gecko/layout/style/nsCSSValue.cpp#457 ### SafeArguments: this
_ZN10nsCSSValue5ResetEv$void nsCSSValue::Reset() @ /builds/worker/checkouts/gecko/obj-analyzed/dist/include/nsCSSValue.h#884 ### SafeArguments: this
_ZN10nsCSSValue13SetFloatValueEf9nsCSSUnit$void nsCSSValue::SetFloatValue(float32, uint32) @ /builds/worker/checkouts/gecko/layout/style/nsCSSValue.cpp#490 ### SafeArguments: this
_ZN10nsCSSValue20SetIntegerCoordValueEi$void nsCSSValue::SetIntegerCoordValue(int32) @ /builds/worker/checkouts/gecko/layout/style/nsCSSValue.cpp#535 ### SafeArguments: this
Gecko_CSSValue_SetAbsoluteLength @ /builds/worker/checkouts/gecko/layout/style/ServoBindings.cpp#2183 ### SafeArguments: aCSSValue
Priority: -- → P2
Summary: stylo heap write hazard in Gecko_CSSValue_SetAbsoluteLength → stylo: heap write hazard in Gecko_CSSValue_SetAbsoluteLength
But there is no function called "Gecko_CSSValue_SetAbsoluteLength" anymore in the current codebase?
Flags: needinfo?(sphink)
Yeah this was removed recently in bug 1392161.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(sphink)
Resolution: --- → DUPLICATE
Duplicate of bug: 1392161
I think this just effectively changed its name:

[29.17s] #193 Analyzing Gecko_CSSValue_SetNumber ...
Error: Field write nsAutoRefCnt.mValue
Location: _ZN12nsAutoRefCntaSEm$uint64 nsAutoRefCnt::operator=(uint64) @ obj-analyzed/dist/include/nsISupportsImpl.h#314
Stack Trace:
_ZN7mozilla3css20FontFamilyListRefCnt7ReleaseEv$uint32 mozilla::css::FontFamilyListRefCnt::Release() @ obj-analyzed/dist/include/nsCSSValue.h#313
_ZN10nsCSSValue7DoResetEv$void nsCSSValue::DoReset() @ layout/style/nsCSSValue.cpp#457 ### SafeArguments: this
_ZN10nsCSSValue5ResetEv$void nsCSSValue::Reset() @ obj-analyzed/dist/include/nsCSSValue.h#884 ### SafeArguments: this
_ZN10nsCSSValue13SetFloatValueEf9nsCSSUnit$void nsCSSValue::SetFloatValue(float32, uint32) @ layout/style/nsCSSValue.cpp#490 ### SafeArguments: this
Gecko_CSSValue_SetNumber @ layout/style/ServoBindings.cpp#2151 ### SafeArguments: aCSSValue
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: stylo: heap write hazard in Gecko_CSSValue_SetAbsoluteLength → stylo: heap write hazard in Gecko_CSSValue_SetNumber
I think the fix here is simply a MOZ_ASSERT(mainthread) before the non-thread-safe nscssvalue members in the nsCSSValue reset() function. We are *usually* operating on already-null nsCSSValues, and even when not, these are values that usually contain simpler primitives.
Branches in DoReset contains "MOZ_ASSERT(NS_IsInCompositorThread() || !ServoStyleSet::IsInServoTraversal())". Shouldn't that be enough for static analysis?
IIRC the analysis only understands NS_IsMainThread()?
Flags: needinfo?(sphink)
(In reply to Manish Goregaokar [:manishearth] from comment #6)
> IIRC the analysis only understands NS_IsMainThread()?

It's supposed to also understand !IsInServoTraversal, and treat it equivalently to IsMainThread.
Ok, sounds like this is a bug in the analysis.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → WORKSFORME
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> (In reply to Manish Goregaokar [:manishearth] from comment #6)
> > IIRC the analysis only understands NS_IsMainThread()?
> 
> It's supposed to also understand !IsInServoTraversal, and treat it
> equivalently to IsMainThread.

The present of NS_IsInCompositorThread() may be the problem, I guess...

I'm not sure why !IsInServoTraversal should be treated equivalent to IsMainThread either.
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #9)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> > (In reply to Manish Goregaokar [:manishearth] from comment #6)
> > > IIRC the analysis only understands NS_IsMainThread()?
> > 
> > It's supposed to also understand !IsInServoTraversal, and treat it
> > equivalently to IsMainThread.
> 
> The present of NS_IsInCompositorThread() may be the problem, I guess...
> 
> I'm not sure why !IsInServoTraversal should be treated equivalent to
> IsMainThread either.

I guess we should change it to:

MOZ_ASSERT(!IsInServoTraveral() && (NS_IsMainthread() || NS_IsCompositorThread());

The first part should be enough for the analysis.

Does that sound right?
Status: RESOLVED → REOPENED
Flags: needinfo?(xidorn+moz)
Resolution: WORKSFORME → ---
The part of NS_IsInCompositorThread() was added in bug 1351026, and that bug titled "nsCSSValue::Reset asserts when OMTA runs concurrently with parallel traversal", so I suppose the new assertion is not going to work...

Probably the current one is really what we want, but I'm unsure how should we verify statically that it's safe here...
Flags: needinfo?(xidorn+moz)
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #9)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> > (In reply to Manish Goregaokar [:manishearth] from comment #6)
> > > IIRC the analysis only understands NS_IsMainThread()?
> > 
> > It's supposed to also understand !IsInServoTraversal, and treat it
> > equivalently to IsMainThread.
> 
> The present of NS_IsInCompositorThread() may be the problem, I guess...

That's correct. It treats !IsInServoTraversal as an indication that it's on the main thread and therefore you can write to whatever heap addresses you want. In this case, it's running code but it doesn't know whether !IsInServoTraversal is true or not (because it might be NS_IsInCompositorThread instead.)

...but if you want me to teach the analysis differently, you'll need to explain it to me first. :) Why is it ok to do unlocked heap writes on the compositor thread? I get that there's only one compositor thread, but don't you need to protect against races between the main thread and the compositor?

> 
> I'm not sure why !IsInServoTraversal should be treated equivalent to
> IsMainThread either.

The IsInServoTraversal predates my involvement, it's a bhackett thing. But looking at code, I'm pretty sure it's because there's a typical pattern if (IsInServoTraversal()) { do threadsafe stuff } else { do whatever you want }. For example:

static inline bool
IsSignificantChildMaybeThreadSafe(const nsIContent* aContent,
                                  bool aTextIsSignificant,
                                  bool aWhitespaceIsSignificant)
{
  if (ServoStyleSet::IsInServoTraversal()) {
    // See bug 1349100 for optimizing this
    return nsStyleUtil::ThreadSafeIsSignificantChild(aContent,
                                                     aTextIsSignificant,
                                                     aWhitespaceIsSignificant);
  } else {
    auto content = const_cast<nsIContent*>(aContent);
    return IsSignificantChild(content, aTextIsSignificant, aWhitespaceIsSignificant);
  }
}

At any rate, I would rather not close this bug, since I need to do *something* here. If this is indeed a false positive, then it need not block stylo.

Also, I should mention that this same hazard is now reported for Gecko_CSSValue_SetKeyword as well.
Sorry, was working off an old version of the bug and didn't see the last two comments before posting comment 12.
Probably we should just assert that we are in the main thread for all the Gecko_CSSValue_ entry... or alternatively we assert they are all operating on null value so that we shouldn't be releasing anything.
Note: I will probably end up pushing bug 1400442 with an annotation for all Gecko_CSSValue_Set* functions to ignore this ref count, leaving this bug open to track the issue. (Without an annotation, this bug is responsible for 11 hazards.)
Flags: needinfo?(sphink)
I think I know what to do here.
Assignee: nobody → bobbyholley
We'd then just teach the analysis about this function.

MozReview-Commit-ID: KFdgtxyOZ01
Attachment #8909565 - Flags: review?(xidorn+moz)
Comment on attachment 8909565 [details] [diff] [review]
Use a more precise check in the nsCSSValue destructor. v1

Hm, looks like PR thread name doesn't get inherited from OS thread name.
Attachment #8909565 - Flags: review?(xidorn+moz)
MozReview-Commit-ID: KFdgtxyOZ01
Attachment #8909576 - Flags: review?(xidorn+moz)
Attachment #8909565 - Attachment is obsolete: true
Comment on attachment 8909576 [details] [diff] [review]
Use a more precise check in the nsCSSValue destructor. v2

Review of attachment 8909576 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see how this is more helpful than the current check. This is actually even looser than the current.

IIUC, MOZ_ASSERT(mainthread) is safe because that excludes any other threads to involve. Anything looser than that shouldn't be considered safe from the perspective of static analysis.

So I'm now a bit confused about the scope of the static analysis. Is it only for analyzing FFI calls to the C++ side? Or it also intends to figure out other race conditions between threads inside C++? If the latter is also in scope, if we whitelist any check like this, could that become a footgun we would overlook in the future?

In any case, I believe there would still be some work need to be done in the analysis script, which you would need to convince Steve with.

::: layout/style/ServoStyleSet.cpp
@@ +39,5 @@
>  
>  ServoStyleSet* ServoStyleSet::sInServoTraversal = nullptr;
>  
> +bool
> +ServoStyleSet::IsCurrentThreadInServoTraversal()

You need wrapping this function with `#ifdef DEBUG` as well.
Attachment #8909576 - Flags: review?(xidorn+moz)
Comment on attachment 8909576 [details] [diff] [review]
Use a more precise check in the nsCSSValue destructor. v2

Review of attachment 8909576 [details] [diff] [review]:
-----------------------------------------------------------------

Discussed with bholley via IRC. I'm convinced that the new check can be more helpful because we can teach the static analyzer about the new IsCurrentThreadInServoTraversal() check. It makes sense as far as this analysis is specific to stylo parallel traversal.
Attachment #8909576 - Flags: review+
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #22)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=75288d8b6db9ddb09bf6162a1cc78babb8c9a11e
> 
> https://github.com/servo/servo/pull/18555

Turns out that thread_state::get() totally doesn't do the right thing, because it assumes that any thread it hasn't seen before is a rayon thread. Fixing that.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=24aba1f49e512740f417b02fbd3b54c1f4836eb6
I don't need this per se, but it seems like a footgun for the methods to
return incorrect information depending on the build type. I don't see anywhere
where the overhead would be at all significant.

MozReview-Commit-ID: G1qyUFhI0aB
Attachment #8909938 - Flags: review?(manishearth)
MozReview-Commit-ID: 2d7lvQx7jOf
Attachment #8909939 - Flags: review?(manishearth)
Attachment #8909938 - Flags: review?(manishearth) → review+
Attachment #8909939 - Flags: review?(manishearth) → review+
Attachment #8909940 - Flags: review?(manishearth) → review+
MozReview-Commit-ID: LdX4k3hDysU
Attachment #8910005 - Flags: review?(sphink)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #27)
> https://github.com/servo/servo/pull/18566

This landed as https://hg.mozilla.org/integration/autoland/rev/7e1111a6f052

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #29)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=1535b3f70ee924b69dffa3b66f553d1736f7e5e4

This is to test the hazard analysis fix in comment 28.
Comment on attachment 8910005 [details] [diff] [review]
Remove special case from analysis. v1

Review of attachment 8910005 [details] [diff] [review]:
-----------------------------------------------------------------

Yep, that's definitely the idea. Hopefully it'll work. r=me pending the try results.

::: js/src/devtools/rootAnalysis/analyzeHeapWrites.js
@@ -1094,5 @@
>  
> -    var armed = false;
> -    if (name.includes("Gecko_CSSValue_Set"))
> -        armed = true;
> -

Oops. That was debugging code. Thanks.
Attachment #8910005 - Flags: review?(sphink) → review+
Error: External function
Location: Servo_IsWorkerThread
Stack Trace:
_ZN7mozilla13ServoStyleSet31IsCurrentThreadInServoTraversalEv$uint8 mozilla::ServoStyleSet::IsCurrentThreadInServoTraversal() @ layout/style/ServoStyleSet.cpp#46
_ZN10nsCSSValue7DoResetEv$void nsCSSValue::DoReset() @ layout/style/nsCSSValue.cpp#462 ### SafeArguments: this
_ZN10nsCSSValue5ResetEv$void nsCSSValue::Reset() @ obj-analyzed/dist/include/nsCSSValue.h#927 ### SafeArguments: this
_ZN10nsCSSValue13SetFloatValueEf9nsCSSUnit$void nsCSSValue::SetFloatValue(float32, uint32) @ layout/style/nsCSSValue.cpp#493 ### SafeArguments: this
Gecko_CSSValue_SetNumber @ layout/style/ServoBindings.cpp#2158 ### SafeArguments: aCSSValue

I guess I need to whitelist Servo_IsWorkerThread?
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e5bd13138738
Use a more precise check in the nsCSSValue destructor. r=xidorn
https://hg.mozilla.org/integration/autoland/rev/94f70af939b5
Remove special case from analysis. r=sfink
https://hg.mozilla.org/mozilla-central/rev/e5bd13138738
https://hg.mozilla.org/mozilla-central/rev/94f70af939b5
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #32)
> I guess I need to whitelist Servo_IsWorkerThread?

Yes. It might actually be safe to whitelist anything beginning with Servo_, but I didn't want to do that until one of you told me it was ok. (The idea being that I don't *think* any of servo's code would do any racy heap writes directly? It's ok if they do them by calling through Gecko_ stuff, since those will be caught separately.)
You need to log in before you can comment on or make changes to this bug.