Closed Bug 1354966 Opened 3 years ago Closed 2 years ago

stylo: Add tracking for gfxPlatform/widget calls in the static analysis

Categories

(Core :: CSS Parsing and Computation, enhancement, P2)

enhancement

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: emilio, Assigned: sfink)

References

Details

Attachments

(1 obsolete file)

So we're going to have static analysis in bug 1294915.

That analysis is going to run in linux64. We would not analyze OSX/Windows platform code.

We should audit that code, somehow.
Priority: -- → P3
Priority: P3 → --
Priority: -- → P3
Blocks: 1391787
Chris, I think we should find an owner for this, and this should probably not be just P3 (see bug 1391787, for example).

I'm happy to do gtk, but not have any OSX or Windows machines (and I'd prefer to avoid it if possible).:
Flags: needinfo?(cpeterson)
Priority: P3 → P2
Given that we now have cross-compiling for mac build, probably it wouldn't be too hard to extend the analysis to that platform?

Windows is also doable if we support clang-cl I suppose, but probably it isn't an option for now. How should we do the analysis manually?
(In reply to Emilio Cobos Álvarez [:emilio] from comment #0)
> That analysis is going to run in linux64. We would not analyze OSX/Windows
> platform code.
> 
> We should audit that code, somehow.

Emilio, do we need additional support from the hazard static analysis scripts? Or do we just need to add platform-specific widget functions to the function lists in analyzeHeapWrites.js?

How much platform-specific widget code is called over servo/gecko FFI? Can we create one platform-independent facade C API (that calls out to each of the platform-specific APIs with C/C++ #ifdefs) so we only need to add static analysis for one set of FFI functions?

Also, can you please CC me on bug 1391787 so I see an example of the platform FFI issues? I don't have permission to view it.
I think in that bug I added a mutex to the only remaining widget code that was called... Still worth to see if it's really the only one.

Sure, will cc you.
Emilio, I'm picking up bugs. Feel free to pass this one to me.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> I think in that bug I added a mutex to the only remaining widget code that
> was called... Still worth to see if it's really the only one.

I just need someone to sanity-check ^, so feel free too, should take a few minutes, hopefully :)
Assignee: nobody → bwerth
Blocks: stylo-release
No longer blocks: stylo
Flags: needinfo?(cpeterson)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> I think in that bug I added a mutex to the only remaining widget code that
> was called... Still worth to see if it's really the only one.

The only other direct LookAndFeel call in ServoBindings is Gecko_nsFont_InitSystem, and that already has a mutex proteccting it. I'll see if I can rig up something to detect "deep" calls through ServoBindings calls.
(In reply to Brad Werth [:bradwerth] from comment #7)
> I'll see if I can rig up something to detect "deep" calls through ServoBindings calls.

This is hard to do quickly. I was able to annotate all of the Gecko_ callsites with some regular expression matching, but annotating all the gtk entry points does not appear to be automatable. I'm recommend we content ourselves with merely verifying that direct calls are mutex-protected.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
Comment on attachment 8903437 [details]
Bug 1354966 Part 1: [WIP] set a static flag for the duration of all Gecko_ calls, to be checked later within gtk calls for dangerous concurrent access.

https://reviewboard.mozilla.org/r/175184/#review180548

Note - an RAII class to restore the previous value is probably preferable, in the case where we end up with reentrancy.

Also, worth working with sfink on this stuff, since he maintains the static analysis.
Comment on attachment 8903437 [details]
Bug 1354966 Part 1: [WIP] set a static flag for the duration of all Gecko_ calls, to be checked later within gtk calls for dangerous concurrent access.

https://reviewboard.mozilla.org/r/175184/#review180548

This is also wrong, because nothing prevents another thread from unsetting the flag if they're called in parallel.
Agreed. The patch is just WIP and not marked for review. I only put it up to ease some future search-and-replacing because I don't know how to handle the problem I laid out in comment 8 -- how to check a static value (a threadsafe counter, presumably) from all gtk calls.
I added a comment to the patch laying out the issues raised in comment 11, and renaming the patch to make it clear it's WIP.
sfink has found 4 other callstacks that lead from "Gecko_" to "gtk_" calls. I'll create a patch that puts mutex protections around these, and then we can consider opening a new bug to make the check part of the static analysis package.

  #1290265 = Gecko_SetOwnerDocumentNeedsStyleFlush
  #218842 = void nsIPresShell::ObserveStyleFlushes()
  #218844 = void nsIPresShell::DoObserveStyleFlushes()
  #377901 = uint8 nsRefreshDriver::AddStyleFlushObserver(nsIPresShell*)
  #377904 = void nsRefreshDriver::EnsureTimerStarted(uint32)
  #570853 = mozilla::RefreshDriverTimer* nsRefreshDriver::ChooseTimer() const
  #570840 = nsRefreshDriver.cpp:void CreateVsyncRefreshTimer()
  #137305 = gfxPlatform* gfxPlatform::GetPlatform()
  #1361767 = void gfxPlatform::Init()
  #248070 = uint32 nsObserverService::NotifyObservers(nsISupports*, int8*, uint16*)
  #527347 = void nsObserverList::NotifyObservers(nsISupports*, int8*, uint16*)
  #126879 = uint32 nsDragService::Observe(nsISupports*, int8*, uint16*)
  #389540 = gtk_widget_destroy
 
  #1290268 = Gecko_CalcStyleDifference
  #1290269 = uint32 nsStyleContext::CalcStyleDifference(ServoComputedValues*, uint32*, uint32*)
  #1298930 = uint32 nsStyleContext::CalcStyleDifferenceInternal(FakeStyleContext*, uint32*, uint32*) [with StyleContextLike = FakeStyleContext; uint32_t = unsigned int]
  #221740 = nsStyleUserInterface* nsStyleContext::StyleUserInterface()
  #221741 = nsStyleUserInterface* nsStyleContext::DoGetStyleUserInterface() [with bool aComputeData = true]
  #226453 = void nsStyleUserInterface::FinishStyle(nsPresContext*)
  #219842 = uint8 nsStyleImageRequest::Resolve(nsPresContext*)
  #1300077 = void mozilla::css::ImageValue::Initialize(nsIDocument*)
  #1289862 = void mozilla::css::ImageLoader::LoadImage(nsIURI*, nsIPrincipal*, nsIURI*, mozilla::css::ImageValue*)
  #846818 = uint32 nsContentUtils::LoadImage(nsIURI*, nsINode*, nsIDocument*, nsIPrincipal*, nsIURI*, uint32, imgINotificationObserver*, int32, nsAString*, imgRequestProxy**, uint32)
  #491502 = uint32 imgLoader::LoadImage(nsIURI*, nsIURI*, nsIURI*, uint32, nsIPrincipal*, nsILoadGroup*, imgINotificationObserver*, nsINode*, nsIDocument*, uint32, nsISupports*, uint32, nsAString*, imgRequestProxy**)
  #468324 = void imgRequestProxy::AddToLoadGroup()
  #382478 = uint32 mozilla::net::nsLoadGroup::AddRequest(nsIRequest*, nsISupports*)
  #396852 = uint32 nsWebBrowserPersist::OnStartRequest(nsIRequest*, nsISupports*)
  #371236 = uint32 nsDocShell::OnStateChange(nsIWebProgress*, nsIRequest*, uint32, uint32)
  #1030459 = void nsWindow::SetCursor(uint32)
  #497134 = gtk_widget_get_type
 
  #1290287 = Gecko_UpdateAnimations
  #1290289 = uint8 nsTransitionManager::UpdateTransitions(mozilla::dom::Element*, uint8, mozilla::ServoComputedValuesWithParent*, mozilla::ServoComputedValuesWithParent*)
  #1300756 = uint8 nsTransitionManager::DoUpdateTransitions(nsStyleDisplay*, mozilla::dom::Element*, uint8, class mozilla::AnimationCollection<mozilla::dom::CSSTransition>**, mozilla::ServoComputedValuesWithParent, mozilla::ServoComputedValuesWithParent) [with StyleType = mozilla::ServoComputedValuesWithParent; nsTransitionManager::CSSTransitionCollection = mozilla::AnimationCollection<mozilla::dom::CSSTransition>]
  #1302215 = void nsTransitionManager::ConsiderInitiatingTransition(int32, mozilla::StyleTransition*, mozilla::dom::Element*, uint8, class mozilla::AnimationCollection<mozilla::dom::CSSTransition>**, mozilla::ServoComputedValuesWithParent, mozilla::ServoComputedValuesWithParent, uint8*, nsCSSPropertyIDSet*) [with StyleType = mozilla::ServoComputedValuesWithParent; nsTransitionManager::CSSTransitionCollection = mozilla::AnimationCollection<mozilla::dom::CSSTransition>]
  #1300701 = void mozilla::dom::CSSTransition::SetEffectFromStyle(mozilla::dom::AnimationEffectReadOnly*)
  #500098 = void mozilla::dom::Animation::SetEffectNoUpdate(mozilla::dom::AnimationEffectReadOnly*)
  #500112 = void mozilla::dom::KeyframeEffectReadOnly::SetAnimation(mozilla::dom::Animation*)
  #500211 = void mozilla::dom::KeyframeEffectReadOnly::NotifyAnimationTimingUpdated()
  #500817 = uint8 mozilla::dom::KeyframeEffectReadOnly::CanThrottle() const
  #501167 = uint8 mozilla::dom::KeyframeEffectReadOnly::CanThrottleTransformChanges(nsIFrame*) const
  #218690 = int32 mozilla::LookAndFeel::GetInt(uint32, int32)
  #218691 = uint32 mozilla::LookAndFeel::GetInt(uint32, int32*)
  #428471 = uint32 nsLookAndFeel::GetIntImpl(uint32, int32*)
  #497132 = gtk_widget_get_settings
 
  #1290539 = Gecko_GetFontMetrics
  #399628 = gfxFont* gfxFontGroup::GetFirstValidFont(uint32)
  #1285927 = void gfxUserFontEntry::Load()
  #1386553 = void gfxUserFontEntry::LoadNextSrc()
  #137305 = gfxPlatform* gfxPlatform::GetPlatform()
  #1361767 = void gfxPlatform::Init()
  #248070 = uint32 nsObserverService::NotifyObservers(nsISupports*, int8*, uint16*)
  #527347 = void nsObserverList::NotifyObservers(nsISupports*, int8*, uint16*)
  #126873 = uint32 FullscreenTransitionTask::Observer::Observe(nsISupports*, int8*, uint16*)
  #261487 = uint32 nsHideViewer::Run()
  #670394 = void nsFrameLoader::Hide()
  #576360 = uint32 nsWebBrowser::SetParentWidget(nsIWidget*)
  #415927 = void* nsWindow::GetNativeData(uint32)
  #1128160 = _GtkWidget* nsWindow::GetToplevelWidget()
  #497133 = gtk_widget_get_toplevel
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
(In reply to Brad Werth [:bradwerth] from comment #15)
> sfink has found 4 other callstacks that lead from "Gecko_" to "gtk_" calls.
> I'll create a patch that puts mutex protections around these, and then we
> can consider opening a new bug to make the check part of the static analysis
> package.
> 
>   #1290265 = Gecko_SetOwnerDocumentNeedsStyleFlush

Main thread only, doesn't need mutex.

>   #1290268 = Gecko_CalcStyleDifference

That particular code path should also be main-thread only.

>   #1290287 = Gecko_UpdateAnimations

Main thread only.

>   #1290539 = Gecko_GetFontMetrics

Should already have synchronization.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #16)
> Main thread only, doesn't need mutex.

Good to know. He's refinining his analysis tool to both find more cases and now I've also asked him to eliminate anything that's verifiable main-thread-only (if possible).
(In reply to Emilio Cobos Álvarez [:emilio] from comment #16)
> (In reply to Brad Werth [:bradwerth] from comment #15)
> > sfink has found 4 other callstacks that lead from "Gecko_" to "gtk_" calls.
> > I'll create a patch that puts mutex protections around these, and then we
> > can consider opening a new bug to make the check part of the static analysis
> > package.
> > 
> >   #1290265 = Gecko_SetOwnerDocumentNeedsStyleFlush

This function was removed in bug 1389385, which landed a month ago. Is he using an up-to-date checkout?
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #18)

> This function was removed in bug 1389385, which landed a month ago. Is he
> using an up-to-date checkout?

Yeah, I just noticed that. He's updated his callstack and we'll have a new analysis to review. Sorry for posting premature info.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #16)
> >   #1290539 = Gecko_GetFontMetrics
> 
> Should already have synchronization.

That contests a lock only with other Gecko_GetFontMetrics() and Gecko_nsFont_InitSystem() calls.
It doesn't contest a lock with nsLookAndFeel::NativeGetColor(), which can also call into GTK, for GtkStyleContexts.

That stack is for the first call to gfxPlatform::GetPlatform(), which must not occur on concurrent threads.  I don't know whether or not there exist other paths to provide the possibility of concurrent calls.

NativeGetColor() can be modified to retrieve cached values from member variables set in EnsureInit(), if that is helpful.
(In reply to Karl Tomlinson (:karlt) from comment #20)
> (In reply to Emilio Cobos Álvarez [:emilio] from comment #16)
> > >   #1290539 = Gecko_GetFontMetrics
> > 
> > Should already have synchronization.
> 
> That contests a lock only with other Gecko_GetFontMetrics() and
> Gecko_nsFont_InitSystem() calls.
> It doesn't contest a lock with nsLookAndFeel::NativeGetColor(), which can
> also call into GTK, for GtkStyleContexts.

Huh, great point.

> That stack is for the first call to gfxPlatform::GetPlatform(), which must
> not occur on concurrent threads.  I don't know whether or not there exist
> other paths to provide the possibility of concurrent calls.
> 
> NativeGetColor() can be modified to retrieve cached values from member
> variables set in EnsureInit(), if that is helpful.

That would be great I think.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #21)
> (In reply to Karl Tomlinson (:karlt) from comment #20)
> > (In reply to Emilio Cobos Álvarez [:emilio] from comment #16)
> > > >   #1290539 = Gecko_GetFontMetrics
> > > 
> > > Should already have synchronization.
> > 
> > That contests a lock only with other Gecko_GetFontMetrics() and
> > Gecko_nsFont_InitSystem() calls.
> > It doesn't contest a lock with nsLookAndFeel::NativeGetColor(), which can
> > also call into GTK, for GtkStyleContexts.
> 
> Huh, great point.
> 
> > That stack is for the first call to gfxPlatform::GetPlatform(), which must
> > not occur on concurrent threads.  I don't know whether or not there exist
> > other paths to provide the possibility of concurrent calls.
> > 
> > NativeGetColor() can be modified to retrieve cached values from member
> > variables set in EnsureInit(), if that is helpful.
> 
> That would be great I think.

I filed bug 1400965 about this issue.

As for the rest, we really want the static analysis to help us here. Steve is overhauling the analysis in bug 1400442, but won't get to the platform API stuff until next week. That's probably ok, since we can uplift the analysis (and fixes for any problems it uncovers).

Note that I think the analysis will only technically cover linux, but since we usually (but not always) call into isomorphic platform APIs at the same callsite, it should be an ok proxy for other platforms.
Assignee: bwerth → sphink
Summary: stylo: We should audit gfxPlatform/widget code stylo calls into. → stylo: Add tracking for gfxPlatform/widget calls in the static analysis
Attachment #8903437 - Attachment is obsolete: true
See Also: → 1401063
Requesting tracking on all outstanding p2 stylo bugs.
Ok, I'm an idiot. Unless someone can convince me otherwise, I assert that the analysis already checks all of these cases. If you were to make a call to gtk_* without being protected by a mainthread assertion or some other annotation, the analysis would already fail -- it doesn't allow calling random external functions. Unless you specifically whitelist them. And this is done for eg 

    // We manually lock here
    if (name == "Gecko_nsFont_InitSystem" ||
        name == "Gecko_GetFontMetrics" ||
        name == "Gecko_nsStyleFont_FixupMinFontSize" ||
        /ThreadSafeGetDefaultFontHelper/.test(name))
    {
        return true;
    }

(this didn't stop me from doing a bunch of preparatory work for what I thought was going to be a tricky implementation. Oh well; it should be useful for analyzing DOM iterators, at least.)
Doh, of course. I apologize for this bug being kind of a train wreck - that's definitely on me for never sitting down and paging in all the pieces at once. I'll try to summarize everything here once and for all:

This bug, as originally filed in comment 0, was focused on potential hazards due to the lack of static analysis coverage for non-linux platforms. That is a real concern, but I'm not sure there's much we can do about it until the sixgill analysis runs on other platforms (NI Steve: is that something that feasible / on the road map?). I think we get decent indirect protection from the fact that it's somewhat rare for a codepath to do a system API call on one platform but not another. It can certainly happen, but I don't think we have a realistic path to finding those instances without static analysis of the call graphs on each platform.

But then comment 1 changed the subject somewhat, at least implicitly, by suggesting that gtk needed an audit (and thus that the static analysis was insufficient), and referencing the issue in bug 1391787, which was due to races in gtk calls in LookAndFeel::GetColor. However, the reason we had that crash was simply because LookAndFeel::GetColor was (and still is) whitelisted in the analysis [1]. So what we really want there is to remove that whitelisting, and instead add Gecko_GetLookAndFeelSystemColor to the "protected by the lock" section, to reflect the lock added in bug 1391787. But to make matters more complicated, brad landed a patch in bug 1401063 that removes the native gtk calls, but not the corresponding calls on mac and windows, which puts us precisely in the dangerous mismatch scenario discussed in the previous paragraph. This all needs sorting out.

Then in comment 6, brad was asked to find any other call paths from FFI functions into gtk code (which, given that the analysis already checks for this, was not actually necessary). This is a pretty difficult thing to do manually, which he soon discovered and closed the bug. But then Steve found some main-thread-only gtk calls when working on the static analysis, because his main-thread-only detection was broken. So he passed those to Brad, who reopened the bug in comment 15. Emilio subsequently pointed out the main-thread-only issue in comment 16.

This mostly put us back where we started. However, the stacks in comment 15 caused Karl to point out that emilio's "Should already have synchronization" argument for Gecko_GetFontMetrics was faulty, since we have separate locks for font metrics and widget calls. So I went ahead an filed bug 1400965, and only later connected the dots on Karl's second paragraph, where he points out that Gecko_GetFontMetrics->gtk edge only occurs in a lazy gfxPlatform::Init call, which is again main-thread-only. So I resolved bug 1400965 as invalid, despite the fact that having multiple locks is actually not very smart. This needs fixing too.

In conclusion: there are a few things left to do here, but no fundamental changes to the analysis that I can see. I'll file followups on those pieces and resolve this bug. Sorry we flailed here for so long.

[1] http://searchfox.org/mozilla-central/rev/3dbb47302e114219c53e99ebaf50c5cb727358ab/js/src/devtools/rootAnalysis/analyzeHeapWrites.js#472
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Flags: needinfo?(sphink)
Resolution: --- → WORKSFORME
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #25)
> This bug, as originally filed in comment 0, was focused on potential hazards
> due to the lack of static analysis coverage for non-linux platforms. That is
> a real concern, but I'm not sure there's much we can do about it until the
> sixgill analysis runs on other platforms (NI Steve: is that something that
> feasible / on the road map?). I think we get decent indirect protection from

Running on OSX is feasible if gcc can compile the browser there. I don't know if it can? But the analysis is currently completely tied to gcc, so there's not much hope for Windows.

At least, in its current form. I suspect we'll want to port the analysis over to clang at some point, and then all three platforms would be reasonably straightforward to get working. That would be a pretty large project, though, so it would need a pretty strong motivator.

> This mostly put us back where we started. However, the stacks in comment 15
> caused Karl to point out that emilio's "Should already have synchronization"
> argument for Gecko_GetFontMetrics was faulty, since we have separate locks
> for font metrics and widget calls. So I went ahead an filed bug 1400965, and
> only later connected the dots on Karl's second paragraph, where he points
> out that Gecko_GetFontMetrics->gtk edge only occurs in a lazy
> gfxPlatform::Init call, which is again main-thread-only. So I resolved bug
> 1400965 as invalid, despite the fact that having multiple locks is actually
> not very smart. This needs fixing too.
> 
> In conclusion: there are a few things left to do here, but no fundamental
> changes to the analysis that I can see. I'll file followups on those pieces
> and resolve this bug. Sorry we flailed here for so long.

The one thing I could see adding to the analysis here is paying attention to the mutexes. Currently, it doesn't do anything with them at all, since the only functions that need it are whitelisted. It would be better if the analysis had general functionality for checking mutex guards, perhaps even comparing the mutexes used (under the assumption that they're often &sFoo). That would apply both to external function calls (eg gtk_*) as well as heap writes.
Flags: needinfo?(sphink)
See Also: → 1403699
See Also: → 1403690
(In reply to Steve Fink [:sfink] [:s:] from comment #26)
> The one thing I could see adding to the analysis here is paying attention to
> the mutexes. Currently, it doesn't do anything with them at all, since the
> only functions that need it are whitelisted. It would be better if the
> analysis had general functionality for checking mutex guards, perhaps even
> comparing the mutexes used (under the assumption that they're often &sFoo).
> That would apply both to external function calls (eg gtk_*) as well as heap
> writes.

Once bug 1403690 is fixed, there will only be two locks. Bug 1403699 is then about merging those down to one lock, so that the problem goes away. If this is something that the analysis could comprehend easily, that would certainly be preferable. But it's probably not worth spending a lot of time on it otherwise.
You need to log in before you can comment on or make changes to this bug.