Crash in [@ mozilla::interceptor::ReadOnlyTargetBytes<T>::EnsureLimit]
Categories
(Core :: mozglue, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox72 | --- | unaffected |
firefox73 | + | wontfix |
firefox74 | + | disabled |
firefox75 | --- | disabled |
People
(Reporter: jseward, Assigned: toshi)
References
(Regression)
Details
(Keywords: crash, regression, topcrash, Whiteboard: zeropatch)
Crash Data
Attachments
(3 files, 1 obsolete file)
This bug is for crash report bp-1148a11f-339e-4d5c-a38f-0c56b0191214.
This crash appears in two different installations of the Windows nightly
20191212095326. There were 13 crashes for these 2 installations.
Top 10 frames of crashing thread:
0 firefox.exe void mozilla::interceptor::ReadOnlyTargetBytes<mozilla::interceptor::MMPolicyOutOfProcess>::EnsureLimit mozglue/misc/interceptor/TargetFunction.h:519
1 firefox.exe static class mozilla::interceptor::ReadOnlyTargetFunction<mozilla::interceptor::MMPolicyOutOfProcess> mozilla::interceptor::WindowsDllPatcherBase<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyOutOfProcess> >::ResolveRedirectedAddress mozglue/misc/interceptor/PatcherBase.h:30
2 firefox.exe bool mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyOutOfProcess> >::AddHook mozglue/misc/interceptor/PatcherDetour.h:352
3 firefox.exe bool mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyOutOfProcess> >::AddDetour mozglue/misc/nsWindowsDllInterceptor.h:438
4 firefox.exe bool mozilla::interceptor::FuncHookCrossProcess<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyOutOfProcess> >, long mozglue/misc/nsWindowsDllInterceptor.h:229
5 firefox.exe mozilla::InitializeDllBlocklistOOP browser/app/winlauncher/DllBlocklistInit.cpp:50
6 xul.dll mozilla::SandboxBroker::LaunchApp security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp:339
7 xul.dll class RefPtr<mozilla::MozPromise<void*, mozilla::ipc::LaunchError, 0> > mozilla::ipc::WindowsProcessLauncher::DoLaunch ipc/glue/GeckoChildProcessHost.cpp:1490
8 xul.dll class RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, 0> > mozilla::ipc::BaseProcessLauncher::PerformAsyncLaunch ipc/glue/GeckoChildProcessHost.cpp:957
9 xul.dll nsresult mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, 0>, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, 0> > xpcom/threads/MozPromise.h:1343
Comment 1•5 years ago
|
||
I seem to get this crash myself on Windows 10. Nightly only, not stable. Started a few weeks ago. Have no clue what causes it. Matching crash symbols. Does not happen on Linux install.
Assignee | ||
Updated•5 years ago
|
Comment 2•5 years ago
|
||
Let's remove the release asserts as a quick patch to start, then investigate the failures after.
Comment 3•5 years ago
|
||
I looked into a few of these crashes and it seems like the one consistent element is 0PatchLoaderX64.dll. I tried installing 0patch in a VM and even with no patches enabled Nightly crashes on startup.
Comment 5•5 years ago
|
||
I removed 0patch and the startup crash stopped.
Comment 7•5 years ago
|
||
0Patch support said the issue lays in something writing to xul.dll. He attached a screenshot in the above link. Turns out the issue is a Firefox problem triggered by 0Patch. Other programs theoretically can cause it, too.
Comment 8•5 years ago
|
||
Thanks for reporting it to 0patch, Michael. I haven't been able to access the request on zendesk, though, even after creating an account.
Comment 9•5 years ago
|
||
Comment 10•5 years ago
|
||
Comment 11•5 years ago
|
||
Above is attached some information the tech found. Should help patch the issue.
Comment 12•5 years ago
|
||
Hi there, Mitja Kolsek of 0patch here. I'll be happy to help reproduce this issue if needed. Also, if there's any suspicion that 0patch might be responsible for at least a part of this problem, I'd like to coordinate our actions. Other than that, kudos to the Firefox team for building and maintaining my favorite browser! ;)
Assignee | ||
Comment 14•5 years ago
|
||
I've got a consistent local repro with 0patch. Let me start debugging and root-causing.
Assignee | ||
Comment 15•5 years ago
|
||
Ok, the root cause is clear now. 0Patch injects 0PatchLoaderX64.dll into firefox by hooking ntdll!NtTestAlert
via kernel callback of process creation.
In the browser process, 0PatchLoaderX64.dll replaces the export table entry for LdrLoadDll
with a trampoline when it's loaded. After that, when the browser process spawns a tab process and tries to hook LdrLoadDll
, it actually tries to hook the browser process's trampoline, which does not exist in the child process. That's why the browser process hits this MOZ_RELEASE_ASSERT
.
0PatchLoaderX64.dll is loaded in the launcher process, but somehow it does not modify the export table.
The fix would be either 1) or 2), or both. Let me think how we can implement it.
- to get a real address of
LdrLoadDll
even though the export table is tampered - safely to fail to detour if a target address is not readable in a target process
Comment 16•5 years ago
|
||
Since they have a representative in this bug, maybe we can just ask them to not do that?
Comment 17•5 years ago
|
||
Thanks for your analysis, Toshihito!
0patch Agent indeed injects a DLL (0PatchLoaderX64.dll for 64bit processes) into every newly created process via kernel callback, and into all already-running processes upon 0patch Service startup. Both are implemented by the MadCodeHook library (http://www.madshi.net/madCodeHookDescription.htm, used by several products out there) which we have licensed and integrated into our product.
Once our DLL is loaded, it hooks a single Windows API functions, LdrLoadDll
, using MadCodeHook's hooking mechanism. We absolutely need to do this at this point because it allows us to detect when a process loads a new executable module that we should apply a micropatch to. We are in the early stages of development of a new micropatching engine which will neither use hooks nor inject a DLL at all but this is where we're at right now.
Obviously hooking is a problem when two actors want to hook the same function. MadCodeHook's hooking is making sure that multiple "hookers" (pun intended) can add and delete hooks in arbitrary order but only if they all use the same method of hooking. This is probably not the case here though.
That said, we unfortunately can't stop hooking LdrLoadDll
for now, but I'm curious about two things:
-
With 0patch installed, my Firefox Nightly crashes only sometimes, which I assume is a case of a race condition (if we hook before Firefox does => crash; if Firefox hooks first => no crash, but probably also 0patch doesn't work properly)
-
With 0patch removed, I still get occasional crashes (see my comment and screenshot of WinDbg above) - are these two different crashes? (Sorry, I'm not very good at reading Firefox crash reports and attaching WinDbg to firefox.exe always gives me the same access violation on a "mov dword ptr [0], ecx instruction", whether 0patch Agent is installed or not.)
Updated•5 years ago
|
Assignee | ||
Comment 18•5 years ago
|
||
I tried the approach 2). It worked, but I needed to change multiple places. Probably it's not a good approach..
Thank you for the information! So this injection library tries to hook ntdll!LdrLoadDll
first, and if it's already hooked by someone, it goes to the export table and replaces the entry. I think that's why EnsureLimit
hits the assert only in the browser.
(In reply to Mitja Kolsek from comment #17)
With 0patch installed, my Firefox Nightly crashes only sometimes, which I assume is a case of a race condition (if we hook before Firefox does => crash; if Firefox hooks first => no crash, but probably also 0patch doesn't work properly)
With 0patch removed, I still get occasional crashes (see my comment and screenshot of WinDbg above) - are these two different crashes? (Sorry, I'm not very good at reading Firefox crash reports and attaching WinDbg to firefox.exe always gives me the same access violation on a "mov dword ptr [0], ecx instruction", whether 0patch Agent is installed or not.)
I think you're hitting a different issue. The crash at EnsureLimit
100% happens and there is no timing issue. It's actually very strange to hit a crash at mov dword ptr [0], ecx
because that instruction always causes AV and it should not be in the code. Somebody tried to modified the code, but it was not finished..?
In the attached Firefox_Nightly_Crash.png, debugger did not load symbols. Can you configure your debugger to use Mozilla's symbol server? Then we will see actual functions names on debugger.
Comment 19•5 years ago
|
||
Okay, this is a WinDbg screenshot of the crash I'm getting everytime I launch Nightly with WinDbg attached on Windows 7 64bit (fully updated).
Updated•5 years ago
|
Assignee | ||
Comment 20•5 years ago
|
||
When WindowsDllInterceptor
detours a function in a remote process, it calculates
a target address via GetProcAddress
in the caller's process first, and detours
that address in the target process. If the caller's export table was modified, the
target address might be invalid in the target process.
With this patch, WindowsDllInterceptor
uses the target process's export table to
calculate a target function address.
Assignee | ||
Comment 21•5 years ago
|
||
(In reply to Mitja Kolsek from comment #19)
Created attachment 9117514 [details]
Firefox_Nightly_Crash-with_symbols.pngOkay, this is a WinDbg screenshot of the crash I'm getting everytime I launch Nightly with WinDbg attached on Windows 7 64bit (fully updated).
Thank you for providing a new screenshot. Looks like you're hitting this assert. I searched the crash reports, but I didn't find the same one (If there is data, the signature would be this).
We can conclude the occasional crash is a different issue from of this bug (crash at EnsureLimit
). We can discuss it in a new bug, but it would be difficult to investigate without a repro.
Updated•5 years ago
|
Comment 22•5 years ago
|
||
The priority flag is not set for this bug.
:glandium, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•5 years ago
|
Comment 23•5 years ago
|
||
Getting crash on startup after update of Nightly. Tried clean profile, safe mode, and downloading latest nightly as of today.
I checked the Windows process list on this Windows 10 VM where I have admin access and I do not see the word 0patch.
Admittedly Windows is not my primary OS so I could be looking in the wrong place.
Comment 24•5 years ago
|
||
[Tracking Requested - why for this release]:
can we paper over the issue with the approach from comment #2 in beta at least? in the first hours of 73.0b1 this signature is accounting for a third of all browser crashes - though admittedly it hasn't gotten much usage and this figure isn't reliable yet.
Updated•5 years ago
|
Comment 25•5 years ago
|
||
Changing the priority to p1 as the bug is tracked by a release manager for the current beta.
See What Do You Triage for more information
Updated•5 years ago
|
Updated•5 years ago
|
Comment hidden (me-too) |
Comment 27•5 years ago
|
||
(In reply to Myron from comment #26)
Just found and followed this bug report and the same problem exists in the beta 1 & 2 release of Firefox 73. I've had to revert back to Forefox 72 public release.
Forgot to mention I'm on Windows 10 Pro 1909.
Comment 28•5 years ago
|
||
.... also realised forgot to report same was happening on a second computer where 0patch has never been installed.
Comment 29•5 years ago
|
||
Yeah, I'm almost positive my VM here at work does not have 0patch unless it hides from me pretty well.
I had to go back to stable. It's a particularly sucky bug since it crashes on startup on clean profiles, so the only way to update is https://ftp.mozilla.org and blowing away the existing nightly folder. But ofc no point in updating until it is fixed.
Comment 30•5 years ago
|
||
FYI, we @0patch have issued a troubleshooting article for this compatibility issue: https://0patch.zendesk.com/hc/en-us/articles/360011227079-Mozilla-Firefox-72-can-crash-upon-launching-when-0patch-Agent-is-installed
Comment 31•5 years ago
|
||
Based on the comment above, I checked my registry and there is no 0patch entry under HKLM → Software but things are still crashing. I guess it could be some other agent that does the same thing.
Comment 32•5 years ago
|
||
(In reply to [:philipp] from comment #24)
[Tracking Requested - why for this release]:
can we paper over the issue with the approach from comment #2 in beta at least? in the first hours of 73.0b1 this signature is accounting for a third of all browser crashes - though admittedly it hasn't gotten much usage and this figure isn't reliable yet.
This is one of the top crashes showing up in the 73 betas at this point. Can we proceed with a wallpaper for now to stop the bleeding?
Assignee | ||
Comment 33•5 years ago
|
||
Aaron, sorry for pinging you again. Do you need more time to review my patch? If so, I'll prepare a different patch just to remove the release assert once I verify it stops the crash.
Comment 34•5 years ago
|
||
I'd like to note that Firefox 72.0.1 is affected as well.
Comment 35•5 years ago
•
|
||
(In reply to Mitja Kolsek from comment #34)
I'd like to note that Firefox 72.0.1 is affected as well.
we only see this particular crash signature in firefox 73 and upwards. can you take a look in about:crashes and provide a crash id generated from firefox 72, if it's affected as well? thank you
Assignee | ||
Comment 37•5 years ago
|
||
Thanks, Aaron.
This issue happens only when the launcher process is enabled. Disabling the launcher process via registry would be a temporary workaround (It's not automatically disabled like a launcher failure because of the crash).
Removing MOZ_RELEASE_ASSERT
did not mitigate the problem. With that, the crash is gone, but all tab processes fail to start.
Comment 38•5 years ago
•
|
||
[removed incorrect comment about stable - jumped to conclusions after firefox did not launch first couple of times I doubleclicked the icon]
Updated•5 years ago
|
Comment hidden (me-too) |
Comment hidden (me-too) |
Comment 41•5 years ago
|
||
Comment 42•5 years ago
|
||
bugherder |
Comment 43•5 years ago
•
|
||
Please nominate this for Beta approval when you get a chance.
EDIT: Though we may want to hold off on uplifting until bug 1608645 has been assessed.
Comment 44•5 years ago
|
||
I think this is ready for a Beta approval request now alongside bug 1608645.
Assignee | ||
Comment 45•5 years ago
|
||
Comment on attachment 9117893 [details]
Bug 1604008 - Use a target process's export table to cross-process detour. r=aklotz
Beta/Release Uplift Approval Request
- User impact if declined: Firefox fails to start if 0Patch is installed.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: Bug 1608645
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The fix is to implement a part of
GetProcAddress
on our own, which is fully covered by a unittest. This caused a regression Bug 1608645 when an antivirus software modifies memory in an unexpected way. Given that the fix for Bug 1608645 was landed and we don't have any other regression reports from Nighly in the last 10 days, we can uplift this fix with a low risk. - String changes made/needed: None
Assignee | ||
Comment 46•5 years ago
|
||
I added a request, but can we hold up until tomorrow? I want to wait one more night to make sure the fix for Bug 1608645 does not cause a new problem.
Comment 47•5 years ago
|
||
Comment on attachment 9117893 [details]
Bug 1604008 - Use a target process's export table to cross-process detour. r=aklotz
Topcrash fix. Approved for 73.0b9.
Comment 48•5 years ago
|
||
bugherder uplift |
Comment 49•5 years ago
|
||
Backed out from Beta along with bug 1608645 to avoid shipping bug 1610790 in 73.0b9.
https://hg.mozilla.org/releases/mozilla-beta/rev/6fa1f4dcb6b1
Updated•5 years ago
|
Comment 50•5 years ago
|
||
Comment on attachment 9117893 [details]
Bug 1604008 - Use a target process's export table to cross-process detour. r=aklotz
I'm not crazy about shipping this issue in 73, but I'm also worried about these patches going out in an RC release given the regression-prone nature of this area of code. Let's leave this on the radar for a possible 73 dot release ride-along in case this explodes post-release, but otherwise we'll let it ride Fx74 to release.
Updated•5 years ago
|
Comment 51•5 years ago
|
||
73rc1 has crashed in the same way as the betas with 0patch installed. When I implement 0patch's workaround to exclude module firefox.exe
FF 73rc1 starts.
Comment 52•5 years ago
•
|
||
Confirmed. "stable" firefox 73 is now crashing on all the machines at work. Unfortunately, I can't use the 0patch workaround because there's
no hint of 0Patch in the registry. Going to guess some similar tool is crashing everything.
What sucks about this, once it is fixed, is that AFAIK the only solution is to tell everyone to redownload and reinstall firefox. It's not possible to use firefox autoupdate to fix this.
Hm. We do have something called Cylance though - perhaps it is doing something similar to 0patch and GData
Comment 53•5 years ago
|
||
backout |
Comment 54•5 years ago
|
||
For those following along from home, comment 53 was because the patch in bug 1610790 obsoletes what landed here. We'll be closely monitoring how it does on Nightly and Beta and go from there when making a decision about what to do about 73 at this point. At this point, more me-too reports probably aren't going to help move anything along, however.
Comment 55•5 years ago
|
||
Setting the bug flags to hopefully encapsulate what was said in comment 54 (that all the work related for this set of related crashes is moving to bug 1610790, not that we've stopped trying to implement a fix for this on the Firefox side of things).
Updated•5 years ago
|
Updated•5 years ago
|
Comment 56•5 years ago
|
||
backout |
Updated•5 years ago
|
Description
•