Closed Bug 1816165 Opened 3 years ago Closed 2 years ago

Remove hazards resulting from ~shared_ptr<T> when ~T does not GC

Tracking

()

Status:

RESOLVED FIXED

Milestone:

113 Branch

Tracking Flags:

Tracking

Status

firefox113

---

fixed

People

(Reporter: sfink, Assigned: sfink)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

Bug 1816165 - [hazards] Generalize edgeIsNonReleasingDtor mechanism without changing its current behavior. 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1816165 - [hazards] Reflect some edge-specific info in the callgraph, and rework how attrs are passed through the analysis code. 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1816165 - [hazards] Merge getCallEdgeProperties() into getCallees(). 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1816165 - [hazards] Use common getCallees() from analyzeRoots.js and remove the gcEdges mechanism. 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1816165 - [hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review

Steve Fink [:sfink] [:s:]

Assignee

Description

•

3 years ago

I am getting bogus hazards from ~shared_ptr:

void mozilla::ClientWebGLContext::Run(uint32&, mozilla::RawBuffer<>&&, uint32&) const [with MethodType = void (mozilla::HostWebGLContext::*)(unsigned int, const mozilla::RawBuffer<>&, unsigned int) const; MethodType method = &mozilla::HostWebGLContext::BufferData; Args = {unsigned int&, mozilla::RawBuffer<unsigned char>, unsigned int&}]

std::shared_ptr<mozilla::webgl::NotLostData>::~shared_ptr() [[complete_dtor]]

std::shared_ptr<mozilla::webgl::NotLostData>::~shared_ptr() [[base_dtor]]

std::__shared_ptr<_Tp, _Lp>::~__shared_ptr() [with _Tp = mozilla::webgl::NotLostData; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]

std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]

std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]

void std::_Sp_counted_base<_Lp>::_M_release() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]

std::_Sp_counted_base<__gnu_cxx::_S_atomic>._M_dispose:0

std::_Sp_counted_deleter<_IO_FILE*, void (*)(_IO_FILE*), std::allocator<void>, __gnu_cxx::_S_atomic>._M_dispose:0

void std::_Sp_counted_deleter<_Ptr, _Deleter, _Alloc, _Lp>::_M_dispose() [with _Ptr = _IO_FILE*; _Deleter = void (*)(_IO_FILE*); _Alloc = std::allocator<void>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]

IndirectCall: UNKNOWN

(any-function)

(GC)

The problem is that the destructor goes through a function pointer that will end up calling T::~T(), but the analysis doesn't know that.

Steve Fink [:sfink] [:s:]

Assignee

Comment 1

•

3 years ago

I am handling this with a mechanism that detects the call to std::shared_ptr<mozilla::webgl::NotLostData>::~shared_ptr() [[complete_dtor]] and sets a flag ATTR_REPLACED on that call edge to tell the analysis to ignore it, and instead synthesize a call to NotLostData::~NotLostData().

That part works. The first problem I ran into was that in my unit test for this, it turns out that the shared_ptr constructor was also calling the destructor. Which is for a reason I fixed long ago: in the presence of exceptions, that can actually happen. So the fix was to compile my test with -fno-exceptions. I'm surprised this is the first time I hit this.... oh. It isn't. In fact, I have an explicit test that runs with an without exceptions to test the cleanup path handling. Me is dumb.

The next problem: I'm still seeing a call to the shared_ptr destructor, one that does NOT go through an unknown function pointer, but instead calls a virtual function at a point where it no longer knows the type T. So the analysis assumes that it can call any instantiated override of std::_Sp_counted_base<__gnu_cxx::_S_atomic>._M_dispose(), including those than can GC. I need to either replace that initial edge, or propagate the actual type through the intervening calls. Here's the stack:

   std::shared_ptr<NoGCOnDestruction>::~shared_ptr() [[complete_dtor]]
   std::shared_ptr<NoGCOnDestruction>::~shared_ptr() [[base_dtor]]
   std::__shared_ptr<_Tp, _Lp>::~__shared_ptr() [with _Tp = NoGCOnDestruction; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]
   std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]
   std::__shared_count<_Lp>::~__shared_count() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic] [[base_dtor]]
   void std::_Sp_counted_base<_Lp>::_M_release() [with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]
   std::_Sp_counted_base<__gnu_cxx::_S_atomic>._M_dispose:0
   std::_Sp_counted_ptr_inplace<GCOnDestruction, std::allocator<GCOnDestruction>, __gnu_cxx::_S_atomic>._M_dispose:0
   void std::_Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp>::_M_dispose() [with _Tp = GCOnDestruction; _Alloc = std::allocator<GCOnDestruction>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]
   static void std::allocator_traits<std::allocator<_Tp1> >::destroy(std::allocator_traits<std::allocator<_Tp1> >::allocator_type&, _Up*) [with _Up = GCOnDestruction; _Tp = GCOnDestruction; std::allocator_traits<std::allocator<_Tp1> >::allocator_type = std::allocator<GCOnDestruction>]
   void __gnu_cxx::new_allocator<_Tp>::destroy(_Up*) [with _Up = GCOnDestruction; _Tp = GCOnDestruction]
   void GCOnDestruction::~GCOnDestruction() [[complete_dtor]]
   void GCOnDestruction::~GCOnDestruction() [[base_dtor]]
   void GC()
   (GC)

Steven DeTar [:sdetar]

Updated

•

2 years ago

Severity: -- → N/A

Priority: -- → P2

Steve Fink [:sfink] [:s:]

Assignee

Comment 2

•

2 years ago

It turned out to require some more extensive changes to do this properly. The main issue is that the specifics of the function body is looked at twice: first to gather the calls to generate the global callgraph, and then to find the actual hazards. Right now, those are totally separate, but we need to do the same thing in both cases: the original call should be skipped in the callgraph (or better, noted down but tagged as replaced), but it should also be skipped when looking for hazards.

There is a mechanism for propagating information from the callgraph scan to the hazard scan, the gcEdges table. But it hasn't been used for much, and is a little rickety (eg the hazard scan applies its effects to points rather than edges). Nothing here is particularly expensive, so it would be nicer to remove the complexity of that table and just have both scans call into the same code.

Steve Fink [:sfink] [:s:]

Assignee

Comment 3

•

2 years ago

Attached file Bug 1816165 - [hazards] Generalize edgeIsNonReleasingDtor mechanism without changing its current behavior. — Details

I ended up not using the main point of this generalization, which was to use the dominator traversal for a set of properties at a time. Maybe someday later. But other parts of it set things up for later changes. Note that the isSpecialEdge function will be renamed in a following patch (though it'll do roughly the same thing), which is good because the name sucks. Moving the pieces between the two patches seemed like it would take too long to get right; apologies.

Phabricator Automation

Updated

•

2 years ago

Assignee: nobody → sphink

Status: NEW → ASSIGNED

Steve Fink [:sfink] [:s:]

Assignee

Comment 4

•

2 years ago

Attached file Bug 1816165 - [hazards] Reflect some edge-specific info in the callgraph, and rework how attrs are passed through the analysis code. — Details

Mostly mechanical refactoring. This includes a big comment added for the gcEdges mechanism, which will then be removed entirely in a following patch.

Steve Fink [:sfink] [:s:]

Assignee

Comment 5

•

2 years ago

Attached file Bug 1816165 - [hazards] Merge getCallEdgeProperties() into getCallees(). — Details

Not entirely refactoring. This also adds some fairly unimportant information into the getCallees() output.

Steve Fink [:sfink] [:s:]

Assignee

Comment 6

•

2 years ago

Attached file Bug 1816165 - [hazards] Use common getCallees() from analyzeRoots.js and remove the gcEdges mechanism. — Details

Now that both the callgraph generation and the hazard detection phases use the same source of callee information, the gcEdges mechanism to transmit data between them is no longer needed. Which is good, because the whole analysis is basically a map/reduce operation:

map: extract callee information from all functions
reduce: merge into a global callgraph
map: gather hazards from all functions
reduce: merge into a single output

The gcEdges output caused the first map step to produce two different outputs, which was a bit of a nuisance for the driver. (Now it can handle multiple outputs and no longer needs to. Oh well.)

Steve Fink [:sfink] [:s:]

Assignee

Comment 7

•

2 years ago

Attached file Bug 1816165 - [hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() — Details

Pulsebot

Comment 8

•

2 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d58734c63589 [hazards] Generalize edgeIsNonReleasingDtor mechanism without changing its current behavior. r=jonco https://hg.mozilla.org/integration/autoland/rev/cb7d0aea075b [hazards] Reflect some edge-specific info in the callgraph, and rework how attrs are passed through the analysis code. r=jonco https://hg.mozilla.org/integration/autoland/rev/65461a0d2fda [hazards] Merge getCallEdgeProperties() into getCallees(). r=jonco https://hg.mozilla.org/integration/autoland/rev/5da1a824ea41 [hazards] Use common getCallees() from analyzeRoots.js and remove the gcEdges mechanism. r=jonco

Cristina Horotan [:chorotan]

Comment 9

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d58734c63589
https://hg.mozilla.org/mozilla-central/rev/cb7d0aea075b
https://hg.mozilla.org/mozilla-central/rev/65461a0d2fda
https://hg.mozilla.org/mozilla-central/rev/5da1a824ea41

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox113: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 113 Branch

Pulsebot

Comment 10

•

2 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f86e15496069 [hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() r=jonco

Serban Stanca [:SerbanS]

Comment 11

•

2 years ago

Backed out for causing Linux x64 debug hazard bustages.

Backout link
Push with failures
Failure Log
Failure line: Exception: Process executed with non-0 exit code 1: ['/builds/worker/checkouts/gecko/js/src/devtools/rootAnalysis/analyze.py', '--js', '/builds/worker/workspace/obj-haz-shell/dist/bin/js']

Flags: needinfo?(sphink)

Pulsebot

Comment 12

•

2 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/808a43ecc06c [hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() r=jonco

Cosmin Sabou [:CosminS]

Comment 13

•

2 years ago

Backed out for causing hazard failures on rootAnalysis/analyze.py

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=2751ae76325c35a84d5781d1f507210cdbd72159&selectedTaskRun=M9SJI8SYQGaNn2E9N2IxfQ.0

Failure log: https://treeherder.mozilla.org/logviewer?job_id=412108269&repo=autoland

Backout link: https://hg.mozilla.org/integration/autoland/rev/79b976bdc2330d5df519bdd325b0a988fb02878e

Norisz Fay [:noriszfay]

Comment 14

•

2 years ago

Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/79b976bdc233

Pulsebot

Comment 15

•

2 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a2533274bbae [hazards] Replace shared_ptr<T>::~shared_ptr() calls with T::~T() r=jonco

Narcis Beleuzu [:NarcisB]

Comment 16

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a2533274bbae

Steve Fink [:sfink] [:s:]

Assignee

Updated

•

2 years ago

Flags: needinfo?(sphink)

You need to log in before you can comment on or make changes to this bug.