Closed Bug 1816165 Opened 4 months ago Closed 2 months ago

Remove hazards resulting from ~shared_ptr<T> when ~T does not GC


(Core :: JavaScript: GC, task, P2)




113 Branch
Tracking Status
firefox113 --- fixed


(Reporter: sfink, Assigned: sfink)


(Blocks 1 open bug)



(5 files)

I am getting bogus hazards from ~shared_ptr:

void mozilla::ClientWebGLContext::Run(uint32&, mozilla::RawBuffer<>&&, uint32&) const [with MethodType = void (mozilla::HostWebGLContext::*)(unsigned int, const mozilla::RawBuffer<>&, unsigned int) const; MethodType method = &mozilla::HostWebGLContext::BufferData; Args = {unsigned int&, mozilla::RawBuffer<unsigned char>, unsigned int&}]

std::shared_ptr<mozilla::webgl::NotLostData>::~shared_ptr() [[complete_dtor]]

std::shared_ptr<mozilla::webgl::NotLostData>::~shared_ptr() [[base_dtor]]

std::__shared_ptr<_Tp, _Lp>::~__shared_ptr() [with _Tp = mozilla::webgl::NotLostData; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]

std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]

std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]

void std::_Sp_counted_base<_Lp>::_M_release() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]


std::_Sp_counted_deleter<_IO_FILE*, void (*)(_IO_FILE*), std::allocator<void>, __gnu_cxx::_S_atomic>._M_dispose:0

void std::_Sp_counted_deleter<_Ptr, _Deleter, _Alloc, _Lp>::_M_dispose() [with _Ptr = _IO_FILE*; _Deleter = void (*)(_IO_FILE*); _Alloc = std::allocator<void>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]

IndirectCall: UNKNOWN



The problem is that the destructor goes through a function pointer that will end up calling T::~T(), but the analysis doesn't know that.

I am handling this with a mechanism that detects the call to std::shared_ptr<mozilla::webgl::NotLostData>::~shared_ptr() [[complete_dtor]] and sets a flag ATTR_REPLACED on that call edge to tell the analysis to ignore it, and instead synthesize a call to NotLostData::~NotLostData().

That part works. The first problem I ran into was that in my unit test for this, it turns out that the shared_ptr constructor was also calling the destructor. Which is for a reason I fixed long ago: in the presence of exceptions, that can actually happen. So the fix was to compile my test with -fno-exceptions. I'm surprised this is the first time I hit this.... oh. It isn't. In fact, I have an explicit test that runs with an without exceptions to test the cleanup path handling. Me is dumb.

The next problem: I'm still seeing a call to the shared_ptr destructor, one that does NOT go through an unknown function pointer, but instead calls a virtual function at a point where it no longer knows the type T. So the analysis assumes that it can call any instantiated override of std::_Sp_counted_base<__gnu_cxx::_S_atomic>._M_dispose(), including those than can GC. I need to either replace that initial edge, or propagate the actual type through the intervening calls. Here's the stack:

   std::shared_ptr<NoGCOnDestruction>::~shared_ptr() [[complete_dtor]]
   std::shared_ptr<NoGCOnDestruction>::~shared_ptr() [[base_dtor]]
   std::__shared_ptr<_Tp, _Lp>::~__shared_ptr() [with _Tp = NoGCOnDestruction; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]
   std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]
   std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]
   void std::_Sp_counted_base<_Lp>::_M_release() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]
   std::_Sp_counted_ptr_inplace<GCOnDestruction, std::allocator<GCOnDestruction>, __gnu_cxx::_S_atomic>._M_dispose:0
   void std::_Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp>::_M_dispose() [with _Tp = GCOnDestruction; _Alloc = std::allocator<GCOnDestruction>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]
   static void std::allocator_traits<std::allocator<_Tp1> >::destroy(std::allocator_traits<std::allocator<_Tp1> >::allocator_type&, _Up*) [with _Up = GCOnDestruction; _Tp = GCOnDestruction; std::allocator_traits<std::allocator<_Tp1> >::allocator_type = std::allocator<GCOnDestruction>]
   void __gnu_cxx::new_allocator<_Tp>::destroy(_Up*) [with _Up = GCOnDestruction; _Tp = GCOnDestruction]
   void GCOnDestruction::~GCOnDestruction() [[complete_dtor]]
   void GCOnDestruction::~GCOnDestruction() [[base_dtor]]
   void GC()
Severity: -- → N/A
Priority: -- → P2

It turned out to require some more extensive changes to do this properly. The main issue is that the specifics of the function body is looked at twice: first to gather the calls to generate the global callgraph, and then to find the actual hazards. Right now, those are totally separate, but we need to do the same thing in both cases: the original call should be skipped in the callgraph (or better, noted down but tagged as replaced), but it should also be skipped when looking for hazards.

There is a mechanism for propagating information from the callgraph scan to the hazard scan, the gcEdges table. But it hasn't been used for much, and is a little rickety (eg the hazard scan applies its effects to points rather than edges). Nothing here is particularly expensive, so it would be nicer to remove the complexity of that table and just have both scans call into the same code.

I ended up not using the main point of this generalization, which was to use the dominator traversal for a set of properties at a time. Maybe someday later. But other parts of it set things up for later changes. Note that the isSpecialEdge function will be renamed in a following patch (though it'll do roughly the same thing), which is good because the name sucks. Moving the pieces between the two patches seemed like it would take too long to get right; apologies.

Assignee: nobody → sphink

Mostly mechanical refactoring. This includes a big comment added for the gcEdges mechanism, which will then be removed entirely in a following patch.

Not entirely refactoring. This also adds some fairly unimportant information into the getCallees() output.

Now that both the callgraph generation and the hazard detection phases use the same source of callee information, the gcEdges mechanism to transmit data between them is no longer needed. Which is good, because the whole analysis is basically a map/reduce operation:

  • map: extract callee information from all functions
  • reduce: merge into a global callgraph
  • map: gather hazards from all functions
  • reduce: merge into a single output

The gcEdges output caused the first map step to produce two different outputs, which was a bit of a nuisance for the driver. (Now it can handle multiple outputs and no longer needs to. Oh well.)

Pushed by
[hazards] Generalize edgeIsNonReleasingDtor mechanism without changing its current behavior. r=jonco
[hazards] Reflect some edge-specific info in the callgraph, and rework how attrs are passed through the analysis code. r=jonco
[hazards] Merge getCallEdgeProperties() into getCallees(). r=jonco
[hazards] Use common getCallees() from analyzeRoots.js and remove the gcEdges mechanism. r=jonco
Pushed by
[hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() r=jonco

Backed out for causing Linux x64 debug hazard bustages.

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: Exception: Process executed with non-0 exit code 1: ['/builds/worker/checkouts/gecko/js/src/devtools/rootAnalysis/', '--js', '/builds/worker/workspace/obj-haz-shell/dist/bin/js']
Flags: needinfo?(sphink)
Pushed by
[hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() r=jonco
Pushed by
[hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() r=jonco
Flags: needinfo?(sphink)
You need to log in before you can comment on or make changes to this bug.