Closed Bug 1546546 Opened 11 months ago Closed 11 months ago

WindowsDLLInterceptor and VMSharingPolicyUnique must be destroyed in the proper order

Categories

(Core :: mozglue, defect, P1)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: handyman, Assigned: handyman)

References

Details

Attachments

(3 files, 1 obsolete file)

WindowsDllInterceptor objects are usually (currently always) static and are therefore destroyed atexit. WindowsDllInterceptors hold a WindowsDllDetourPatcher, which in turn holds a VMSharingPolicyShared (usually anyway -- VMSharingPolicyShared is a common choice for the template parameter). VMSharingPolicyShared holds a (reference to a) static VMSharingPolicyUnique object -- so the VMSharingPolicyUnique is also destroyed atexit. In C/C++, static destructor order is undefined. The WindowsDllInterceptor's destructor Clears itself, which eventually results in it Clearing the VMSharingPolicyUnique by taking and clearing its items. But the VMSharingPolicyUnique may have already been destroyed, resulting in corruption.

The issue is actually worse than this. It sometimes happens that the first time the VMSharingPolicyUnique is accessed is when the WindowsDllInterceptor requests its Items while being destroyed atexit. When statics are first accessed, they are constructed. So we create the static VMSharingPolicyUnique during our atexit call. In order to destroy statics atexit, they register an atexit handler when they are constructed. You can probably see where this is going. I have witnessed that, when our new VMSharingPolicyUnique goes to register its destructor's atexit handler during atexit, our atexit (totally legitimately) immediately calls the handler, immediately destroying the newly created object.

...Then it gives us the pointer :) Weak.

I've got a patch that makes the VMSharingPolicyShared access the VMSharingPolicyUnique using a RefPtr. Each VMSharingPolicyShared has a reference, so the VMSharingPolicyUnique has to last at least as long as we need it, and will be destroyed when all VMSharingPolicyShared destructors have run and the static RefPtr<VMSharingPolicyUnique> has been destroyed atexit.

WindowsDllInterceptor and VMSharingPolicyUnique require a specific destructor dependency order. Both are static so such a guarantee is impossible.

This patch makes the VMSharingPolicyShared access the VMSharingPolicyUnique using a RefPtr. The VMSharingPolicyUnique is no longer static. Each VMSharingPolicyShared has a reference, so the VMSharingPolicyUnique has to last at least as long as we need it, and will be destroyed when both the VMSharingPolicyShared destructors have run and the static RefPtr<VMSharingPolicyUnique> has been destroyed by its atexit destructor.

So, that patch just kicks the issue along but doesn't actually solve it. With it, there is a destruction order dependency between WindowsDllinterceptor and the new RefPtr<VMSharingPolicyUnique>, which could be destroyed before it is referenced in the VMSharingPolicyShared constructor.

I'm at a loss here. Aaron, any ideas?

Flags: needinfo?(aklotz)

I have two ideas, one is conservative and one is kind of wacky but I think it would work:

Just modify VMSharingPolicyShared::ShouldUnhookUponDestruction to unconditionally return false; or

What if VMSharingPolicyUnique had a method that accepted a bool*, and saved that to an instance variable, and set it to false it its destructor (when present)?

Then if we added a static bool initialized to true to VMSharingPolicyShared and pass its pointer into that method. Then modify ShouldUnhookUponDestruction to only return true if that bool is not yet set to false. Since the bool doesn't have a destructor, it would still be safe to check.

As wacky as idea 2 is, I'm can't really think of a case where we really must unhook during shutdown, so I'm leaning more toward idea 1. What do you think?

Flags: needinfo?(aklotz)

I don't see unhooking at shutdown as crucial, but I feel there's a lot to this I'm not aware of and can't predict. I think the fear would be that someone else's static destruction incidentally used some hooked method after the hooks were supposed to be torn down. That sounds like a bad problem but failures would (probably) involve the hook method so the situation would be identifiable. Otherwise, I'd be concerned that people would be getting crazy mysterious shutdown failures and have no clue what the hell happened (see: me last week). Of course, there is no way to clean up in the DynamicCodeDisabled sandbox case so this is unfortunately always a possibility.

I guess the bool solution should work since it'll be allocated in DATA so we don't have to worry about it being freed. I think the benefit of erasing the problem in the last paragraph outweighs the C hijinks. If it leads to something bad, it'll be more obvious.

This got me thinking that we can make the bool solution a little less zany by turning it into a ref-count while still taking advantage of you observation about trivial destructors of statics. We could use a std::atomic<int> in place of the bool. Integral std::atomics also have trivial construction/destruction so we'd be safe but the pattern of ref-counting would be more familiar. (I assume it needs to be atomic -- I didn't see anything keeping that stuff from being created on 'any' thread.) With the ref count, the VMSharingPolicyUnique can't be deleted from underneath us so cleanup should always succeed (again, modulo DynamicCodeDisabled).

I like the idea of the static atomic int but I also like the simplicity of just making ShouldUnhookUponDestruction return false. It's your call.

Flags: needinfo?(aklotz)

I do like your suggestion of using an atomic. OTOH, keeping in mind some of the stuff that I need to change to fulfill bug 1532470, I think that very soon it is going to become a royal PITA to tear down a VMSharingPolicyShared regardless, so let's just make ShouldUnhookUponDestruction return false and be done with it.

Flags: needinfo?(aklotz)

This patch fixes a static destructor order dependency between WindowsDllInterceptor and VMSharingPolicyUnique by telling VMSharingPolicyShared not to access the VMSharingPolicyUnique at destruction. See the bug for details of the order dependency.

Attachment #9060335 - Attachment is obsolete: true
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3096a547bb84
Never unhook DLL-intercepted methods upon VMSharingPolicyShared destruction r=aklotz
Flags: needinfo?(davidp99)
Blocks: 1532470
Attachment #9060595 - Attachment description: Bug 1546546: Never unhook DLL-intercepted methods upon VMSharingPolicyShared destruction r=aklotz → Bug 1546546: Part 1 - Never unhook DLL-intercepted functions upon VMSharingPolicyShared destruction r=aklotz

QueryCredentialsAttributesA and FreeCredentialsHandle trigger an exception when null is passed for the CredHandle pointer. This exception was ignored (when not run in the debugger) but that is no longer the case with the changes in part 3. This patch passes a real CredHandle to them.

Depends on D28764

In part 1, we disabled the unhooking of DLL-intercepted functions at shutdown. The TestDllInterceptor relied on unhooking -- it worked by hooking functions with a "nonsense function" (nullptr) and then immediately unhooking it. That restored the original function behavior. Some hooked functions (e.g. NtWriteFile) are used by functions later in the program (e.g. printf) so the functions need to maintain their behavior.

This patch replaces the nonsense function with an identity function that also sets a global boolean as a side-effect. The function is written in machine code. x86-32, x86-64, and aarch64 variants are included.

Depends on D30243

Attachment #9060335 - Attachment is obsolete: false
Attachment #9060335 - Attachment is obsolete: true
Pushed by daparks@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e06739cc67cd
Part 1 - Never unhook DLL-intercepted functions upon VMSharingPolicyShared destruction r=aklotz
https://hg.mozilla.org/integration/autoland/rev/8f80e7afef7c
Part 2 - Pass a real CredHandle to relevant TestDllInterceptor functions r=aklotz
https://hg.mozilla.org/integration/autoland/rev/c04cde900d27
Part 3 - TestDllInterceptor must leave intercepted functions operable r=aklotz
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
You need to log in before you can comment on or make changes to this bug.