Closed Bug 1350958 Opened 3 years ago Closed 2 years ago

Finish labeling ProxyReleaseEvent

Categories

(Core :: DOM: Core & HTML, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: billm, Assigned: baku)

References

Details

Attachments

(4 files, 2 obsolete files)

We have already labeled some ProxyReleaseEvent runnables by using NS_ReleaseOnMainThreadSystemGroup instead of NS_ReleaseOnMainThread. However, telemetry shows that there are still a lot of uses of NS_ReleaseOnMainThread (and there are many static occurrences in the code, too). We need to eliminate these.
Priority: -- → P2
Assignee: nobody → amarchesini
Somewhere I was able to take the eventTarget from the window. But elsewhere I used the SystemGroup.
Attachment #8883914 - Flags: review?(wmccloskey)
Probably we can use SystemGroup for all of them. They are not going to touch DOM.
I'm a little worried about how safe this is. Steve thinks that we should be able to get a callgraph out of the JS rooting analysis that will tell us all the functions that can be called from NS_ReleaseOnMainThreadSystemGroup. I think that would be interested to see for review purposes.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3d96b8e2f0360a148a0e796b9d240b45d8b82c29
Flags: needinfo?(sphink)
Ok. I'm not sure how useful this'll be. It seems to have found its way to the larger graph, which is often an indication of some stupid debug-only or crash-only path. (It says that a full 10% of all functions in Gecko are reachable from NS_ReleaseOnMainThreadSystemGroup, which btw is not a function but a family of templated functions.)

I will attempt to attach the output file, but it's pretty enormous. Structure of the file:

len(callers) = 1041300
(Cmd) 113815 total reachable functions found
All reachable functions:
(...one line per reachable function...)
Route from source to every reachable function:
(...blank line-separate paragraphs, each paragraph giving an example static call chain from a source function to each of the reachable functions...)

So I'd expect you to use it by skimming through the "All reachable functions" section, saying "wtf? we shouldn't be able to get here", then looking up that part of the "Route from..." section to see how it thinks it can get there.

And that will probably mean you'll want to exclude some paths or something. Let me know; there's already code for avoiding functions for other purposes (NS_DebugBreak is a common candidate.) I can add that in pretty easily.

Or if you want to run it yourself, code is at both

(mercurial) https://bitbucket.org/sfink/sfink-tools
(git) https://github.com/hotsphink/sfink-tools.git

in the file bin/traverse.py. Run it on callgraph.txt, which is in the tarball https://queue.taskcluster.net/v1/task/RA62oKaZRQaN1m_yBy5u2A/runs/0/artifacts/public/build/hazardIntermediates.tar.xz . Do "(Cmd) reachable NS_ReleaseOnMainThreadSystemGroup". But I'm happy to make fixes.

Now I'll see if bugzilla will let me attach the whole thing...
Flags: needinfo?(sphink)
Attachment #8884362 - Attachment mime type: text/plain → application/octet-stream
Thanks Steve. I now have a new query that should work better. I want to find all paths from detail::ProxyReleaseEvent<T>::Run to js::RunScript, ignore paths that go through nsXPCWrappedJSClass::CallQueryInterfaceOnJSObject. (Hopefully there won't be any.) I tried to do this myself, but I couldn't get the gAvoidFuncs thing to work.
Flags: needinfo?(sphink)
Sorry, I didn't get to looking at this until just now.

The script couldn't do that yet, for various reasons. And "all routes" is a tall order -- the graph often forks enough to make that infeasible. It might be ok in this case, but I've just added enough functionality to run your query; I haven't tried asking for all routes. This is just a representative sampling, which *is* implemented, though in a funky way. (Doing all routes would be much simpler to implement.)

Anyway, let me know if these are useful and I can beef it up to add more, or you can tell me more things to avoid. I'll update the repo with these changes. The exact query used is in the attachment description.
Flags: needinfo?(sphink)
> The script couldn't do that yet, for various reasons. And "all routes" is a
> tall order -- the graph often forks enough to make that infeasible.

Sorry, that statement bugged me. "Infeasible" as in there are too many outputs. If you have a path through the graph that forks into 3 callees then rejoins, then forks again, you'll end up with an exponential number of paths. (Exponential in the number of times this happens, but still.) This is especially likely when you have large loops in your callgraph, eg as happens with nested event loop invocations. If you ever reach any node in that loop, you'll have a bad time.

More likely in practice, if you have foo -> maybeBar -> bar as well as a direct foo -> bar, you'll get a quadratic number of outputs, and this pattern is not uncommon.

Again, I have no evidence that either would be a concern for this particular query.
Ugh, and looking at my output, it's bogus; the avoid still isn't working. Sorry, will update.
Attachment #8884362 - Attachment is obsolete: true
Attachment #8885545 - Attachment is obsolete: true
Actually, that query wasn't even doing what I thought it was. It's actually just finding a single path from any of the source functions (ProxyReleaseEvent<T>::Run) to the destination function, and previously it was ignoring the avoids. Here's the raw result, fixed up.
That output tends to favor the same path over and over again. Here are some alternative paths, somewhat manually constructed.
Comment on attachment 8883914 [details] [diff] [review]
releaseMainThread.patch

Review of attachment 8883914 [details] [diff] [review]:
-----------------------------------------------------------------

None of the paths Steve's analysis found look worrisome. I was initially concerned about the XHR path, but that one is not actually changed by your patch. The other paths go through SuspectAfterShutdown, which I think we don't have to worry about. It would be better to get more complete results, but I don't want to block this bug any longer.

This is probably no more dangerous than labeling the cycle collector with the system group, so let's just go for it. We have assertions that should catch any actual labeling violations.
Attachment #8883914 - Flags: review?(wmccloskey) → review+
Ok, if the XHR one is not a concern, then I can dig to the bottom of the pile. This is the minimal query I could construct that results in no results:

routes from ProxyReleaseEvent<T>::Run to js::RunScript avoiding nsXPCWrappedJSClass::CallQueryInterfaceOnJSObject and SuspectAfterShutdown and GetBindingImplementation and ReportToConsole and Init and GetSingleton and JS::ExposeObjectToActiveJS and XMLHttpRequestMainThread::Release and ConsumeBodyDoneObserver<Derived>::Release

that means there's a path through if you stop avoiding any one of the above, so to be confident there aren't any problematic paths you should have a reason why each of these is ok. (various *::Init methods are very common reasons why you might be able to run arbitrary code; first-time lazy initialization often finds a way to go as far as spinning the event loop, but in practice you often know that you won't be doing a first-time lazy initialization. And GetSingleton sounded similar to my ears.) The list in better format of critical cut points that you will have to go through in order to run JS:

  - nsXPCWrappedJSClass::CallQueryInterfaceOnJSObject
  - SuspectAfterShutdown
  - GetBindingImplementation
  - ReportToConsole
  - Init
  - GetSingleton
  - JS::ExposeObjectToActiveJS
  - XMLHttpRequestMainThread::Release
  - ConsumeBodyDoneObserver<Derived>::Release

I'll attach a final set of paths including some of the more questionable items in that list. As always, these are just random paths found during a graph traversal; just because a particular path is impossible doesn't mean there isn't a similar one that is far more plausible.

If anyone has ideas for how to make this sort of callgraph analysis more useful, I'd be happy to hear them. From this, it feels to me like identifying some minimal set of intermediate nodes that allow you to get from A to B would be useful, though in practice I always do it via guesses from staring at the method names. I guess I could codify some of those guesses.
None of the paths in comment 15 are feasible. I don't know whether that exclusion list is a reasonable one though.

I think the only way to use this tool is interactively. It would be much better if it printed only one path, but made it easier to add additional exclusions based on the result you see. The cycle of getting a path and then excluding should be really fast. If you can make that work, I think it would be pretty useful.

I still think this bug should land though. Actually proving the patch is safe this way would still take hours, and I don't the time.
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bfd1975229fd
Finish labeling ProxyReleaseEvent, r=billm
https://hg.mozilla.org/mozilla-central/rev/bfd1975229fd
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Duplicate of this bug: 1378974
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.