Closed Bug 1492508 Opened 6 years ago Closed 6 years ago

Update occasionally fails when running Windows ASan reporter Nightly build

Categories

(Release Engineering :: Release Requests, defect)

Unspecified
Windows
defect
Not set
normal

Tracking

(firefox64 fixed)

RESOLVED FIXED

People

(Reporter: tsmith, Assigned: away)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

From time to time the update process seems to get stuck in a loop. 

Expected behavior:
0) checking "Help > About nightly" shows out of date version
1) prompted to "restart to update..."
2) I click restart and the browser shuts down.
3) updater.exe runs (progress bar is shown)
4) browser relaunches
5) checking "Help > About nightly" shows up to date version

Actual behavior:
0) checking "Help > About nightly" shows out of date version
1) prompted to "restart to update..."
2) I click restart and the browser shuts down
3) browser relaunches
4) checking "Help > About nightly" shows the same out of date version

The updater does not seem to run (or if it does, maybe it crashes or closes immediately?). When this happens I need to manually download and reinstall the Windows ASan reporter Nightly build, after which it seems fine for a few more updates.

I have noticed this twice now, both times when the version I was running was out of date and the pending update was also out of date. After installing the pending update I force a check for a newer version by opening "Help > About nightly", wait for the download to complete, then click update. I tried deleting the pending update in the AppData/Local/Mozilla/updates/ dir and rerunning, and this made no difference.

Also I'd like to note the strange behavior that happens between steps 2 and 3. Normally a console window opens when the browser launches. At this point that window sometimes does not close before updater.exe runs.
I'm stuck on 20180805100054
Does this behavior continue if you reboot the machine?

Separately, does duplicating clang_rt.asan_dynamic-x86_64.dll from the install directory into the `uninstall\` subdirectory prevent the issue from coming back?
Flags: needinfo?(twsmith)
FYI, I dug deeply into why my ASAN Nightly wasn't updating with dmajor. Turned out it was erroring out moving uninstall/helper.exe to uninstall/helper.exe.moz_backup, because helper.exe was locked.  There was a copy of helper.exe running; killing that unblocked updates.
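The failing step described above, moving a file aside to a `.moz_backup` name, can be sketched roughly as follows. This is only an illustration using C++17 `std::filesystem`; `MoveToBackup` is a hypothetical name and this is not the updater's actual code. The point is that a failed rename (the comment above reports a locked helper.exe) surfaces as an error code that the caller has to act on.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not the real updater code): move `target` aside to
// `target` + ".moz_backup". Returns true on success; on failure, `ec` holds
// the OS error so the caller can log it or abort the update.
bool MoveToBackup(const fs::path& target, std::error_code& ec) {
  fs::path backup = target;
  backup += ".moz_backup";         // e.g. helper.exe -> helper.exe.moz_backup
  fs::rename(target, backup, ec);  // non-throwing overload; sets ec on failure
  return !ec;
}
```

A caller would check the return value and stop, or report the error in `ec`, rather than continue with a half-applied update.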
Matt, any ideas here?
Flags: needinfo?(mhowell)
Hmm. Well, for some background, helper.exe is the one executable that we don't actually compile; the NSIS compiler generates it from a fixed blob that was generated when *it* was built. That means helper.exe doesn't have any ASAN instrumentation or know anything about ASAN at all. But I don't know why that would mean that starting it from an ASAN updater would cause it to hang like that. I'll see if I can reproduce this.
Flags: needinfo?(mhowell)
So yes, I can reproduce it, but only once every several update attempts for reasons I don't currently understand.

It's not actually helper.exe that's hanging; regsvr32.exe is hanging when helper.exe invokes it to register the updated version of AccessibleHandler.dll. Now, regsvr32 has to load the DLL that it's registering and call its DllRegisterServer export, so it ends up loading clang_rt.asan_dynamic-x86_64.dll into regsvr32, which seems like it's causing something nasty to happen that I haven't nailed down yet.
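For readers unfamiliar with that sequence, here is a minimal sketch of the load, register, unload pattern just described. It is not regsvr32's source; `RegisterServerDll` and `RegisterResult` are hypothetical names, and the non-Windows branch exists only so the sketch compiles elsewhere. The Win32 calls used are `LoadLibraryW`, `GetProcAddress`, and `FreeLibrary`.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Outcome of one registration attempt in this sketch (hypothetical type).
enum class RegisterResult { Ok, LoadFailed, NoExport, RegisterFailed, Unsupported };

RegisterResult RegisterServerDll(const std::wstring& dllPath) {
#ifdef _WIN32
  // Loading the DLL also maps everything it imports; for an ASan-instrumented
  // DLL that includes the ASan runtime DLL.
  HMODULE module = LoadLibraryW(dllPath.c_str());
  if (!module) return RegisterResult::LoadFailed;

  using DllRegisterServerFn = HRESULT(WINAPI*)();
  auto registerFn = reinterpret_cast<DllRegisterServerFn>(
      GetProcAddress(module, "DllRegisterServer"));

  RegisterResult result = RegisterResult::NoExport;
  if (registerFn) {
    result = SUCCEEDED(registerFn()) ? RegisterResult::Ok
                                     : RegisterResult::RegisterFailed;
  }

  // Dropping the last reference lets the loader unload the DLL, along with
  // any dependency whose reference count also reaches zero.
  FreeLibrary(module);
  return result;
#else
  (void)dllPath;
  return RegisterResult::Unsupported;  // stand-in for non-Windows builds
#endif
}
```

The `FreeLibrary` call at the end is the step of interest here: anything the loaded DLL pulled in can be unloaded with it.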
Thanks for looking into this, Matt!

Worst case could probably de-ASan AccessibleHandler (and anything else we regsvr -- AccessibleMarshal?) as part of bug 1478096.
(In reply to David Major [:dmajor] from comment #7)
> Worst case could probably de-ASan AccessibleHandler (and anything else we
> regsvr -- AccessibleMarshal?) as part of bug 1478096.

My guess is that that's all we can really do. I think those would be the only two binaries involved.

I'm attaching the stack for the main thread of the hung regsvr32, for anyone who might know what to make of it. There are 530 frames in this stack; I'm surprised it hasn't overflowed.
(In reply to Matt Howell [:mhowell] from comment #8)
> Created attachment 9011576 [details]
> Regsvr32.exe stack trace

Interesting! The hex digits in the module name suggest that this module was unloaded: clang_rt_asan_dynamic_x86_64_7ffd758f0000. 

And at the very bottom of the stack is FreeLibrary... so I bet regsvr is unloading AccessibleHandler (and the ASan DLL) but ASan didn't clean up its hook of ntdll!memmove (nor its AddVectoredExceptionHandler registration) so any use of memmove will jump into bad memory, and then in response to that the exception dispatcher also jumps into bad memory, and then again and again...
(In reply to David Major [:dmajor] from comment #9)
> Interesting! The hex digits in the module name suggest that this module was
> unloaded: clang_rt_asan_dynamic_x86_64_7ffd758f0000. 

Oh, it was and I had windbg reload it manually to get the symbols. Sorry for failing to mention that.
From a cursory look, there is no RemoveVectoredExceptionHandler in the ASan codebase, so I have a feeling that the runtime wasn't designed to be unloaded, and that it would likely be nontrivial to change.

I wonder if, as a hack, we could make the DLL pin itself.
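One way a DLL can pin itself on Windows is `GetModuleHandleExW` with `GET_MODULE_HANDLE_EX_FLAG_PIN`, which keeps the module loaded until the process exits regardless of later `FreeLibrary` calls. The sketch below assumes that approach; `PinThisModule` is a hypothetical name, it is not necessarily what any actual patch does, and the non-Windows branch is only a stand-in so the code compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: ask the loader to keep the module that contains this
// function loaded for the lifetime of the process. Returns true if pinned.
bool PinThisModule() {
#ifdef _WIN32
  HMODULE self = nullptr;
  // FROM_ADDRESS: identify the module by an address inside it (this function)
  // instead of by name. PIN: the module stays loaded until process exit.
  return GetModuleHandleExW(
             GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                 GET_MODULE_HANDLE_EX_FLAG_PIN,
             reinterpret_cast<LPCWSTR>(&PinThisModule), &self) != 0;
#else
  return false;  // nothing to pin in this sketch off Windows
#endif
}
```

A runtime would call something like this once during its own initialization, so that hooks it installed never end up pointing into unmapped memory.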
I can't be completely sure because this was always intermittent, but previously it took me about 5 tries to hit the problem, and with that try build I haven't had it happen in 10 attempts, so I do think it's working.
The ASan runtime wasn't designed to be unloaded, so pin it in memory.
Assignee: nobody → dmajor
Comment on attachment 9012266 [details]
Merge upstream patch to pin the ASan DLL

Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9012266 - Flags: review+
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b793141b127
Merge upstream patch to pin the ASan DLL. r=ted
https://hg.mozilla.org/mozilla-central/rev/4b793141b127
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Flags: needinfo?(twsmith)
Component: Custom Release Requests → Release Requests