Missing function return addresses when unwinding the stack on Windows x64 with injected third-party libraries
Categories: Toolkit :: Crash Reporting, defect
People: Reporter: yannis, Unassigned
Details
Bug 1794064 involves a third-party injected DLL called aswJsFlt64.dll. While analyzing the minidumps from the crash reports associated with that bug, I noticed that the stack contains more function return addresses than our crash infrastructure shows. With correct stack unwinding, there should have been additional return addresses pointing to functions in aswJsFlt64.dll, but only one was showing. It seems that when we reach a function that lives in that third-party injected DLL, we cannot continue unwinding the stack.
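One way to look for the return addresses that the unwinder skipped is to scan the raw stack memory from the minidump for 8-byte values that point into a loaded module. The Python sketch below assumes the stack bytes and a module list (name, base address, size) have already been extracted from the minidump; the Module type and the find_module/scan_stack names are hypothetical. This is only a heuristic: a value pointing into a module may be a stale return address or an ordinary data pointer, so each candidate still needs to be checked by hand.

```python
import struct
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    # Hypothetical view of one minidump module-list entry.
    name: str
    base: int
    size: int


def find_module(modules, address):
    """Return the module whose [base, base + size) range contains address, or None.

    `modules` must be sorted by base address.
    """
    bases = [m.base for m in modules]
    i = bisect_right(bases, address) - 1
    if i >= 0 and address < modules[i].base + modules[i].size:
        return modules[i]
    return None


def scan_stack(stack_bytes, modules):
    """List (stack offset, module name, offset in module) for every aligned
    8-byte little-endian stack slot whose value falls inside a module."""
    candidates = []
    for offset in range(0, len(stack_bytes) - 7, 8):
        (value,) = struct.unpack_from("<Q", stack_bytes, offset)
        module = find_module(modules, value)
        if module is not None:
            candidates.append((offset, module.name, value - module.base))
    return candidates
```

The offsets within each module can then be compared against the frames reported by crash-stats to see which ones the unwinder missed.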
I am not sure what the source of the problem is (but I think it is the second option):
- we do the stack unwinding on the user's machine, and the DLL is not found because it is listed as aswjsflt64.dll.t01 in the module list whereas it is aswjsflt64.dll on disk; or
- we do the stack unwinding on the server side, so we don't have the DLL at hand when we do stack unwinding.
Proper stack unwinding on Windows requires unwind data that is stored in the DLL itself in the data directories. Crash analysis itself is also severely limited if we don't have all the DLLs involved in the call stack at hand.
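For reference, on x64 that unwind data lives in the PE exception directory (data directory index 3, usually backed by the .pdata section): an array of RUNTIME_FUNCTION entries sorted by start address, each holding the begin RVA, end RVA and UNWIND_INFO RVA of one function. The minimal Python sketch below reads that array and finds the entry covering a given RVA. It assumes the module is available as an image laid out as in memory (so RVAs are plain offsets and no section mapping is needed), it only handles PE32+ images, and the function names are hypothetical. A function with no entry is treated by the Windows unwinder as a leaf function, whose return address is expected at [RSP].

```python
import struct
from bisect import bisect_right

IMAGE_DIRECTORY_ENTRY_EXCEPTION = 3  # index of the exception directory (.pdata)
PE32_PLUS_MAGIC = 0x20B              # optional header magic for 64-bit images


def read_runtime_functions(image):
    """Return (begin, end, unwind_info) RVA triples from a mapped x64 PE image.

    Assumes `image` holds the module as laid out in memory, so RVAs are offsets.
    """
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20  # skip the signature and the COFF file header
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic != PE32_PLUS_MAGIC:
        raise ValueError("not a PE32+ (64-bit) image")
    (num_dirs,) = struct.unpack_from("<I", image, optional_header + 108)
    if num_dirs <= IMAGE_DIRECTORY_ENTRY_EXCEPTION:
        return []
    dir_offset = optional_header + 112 + 8 * IMAGE_DIRECTORY_ENTRY_EXCEPTION
    rva, size = struct.unpack_from("<II", image, dir_offset)
    # Each RUNTIME_FUNCTION entry is three 32-bit RVAs (12 bytes).
    return [struct.unpack_from("<III", image, rva + i * 12) for i in range(size // 12)]


def lookup_runtime_function(functions, rva):
    """Binary-search the sorted entries for the one with begin <= rva < end."""
    i = bisect_right([f[0] for f in functions], rva) - 1
    if i >= 0 and functions[i][0] <= rva < functions[i][1]:
        return functions[i]
    return None  # no entry: treated as a leaf function
```

Without the DLL (or a symbol file derived from it), there is no such table to consult, which matches the behavior described above.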
I wonder if it may be necessary to collect some third-party DLLs under some circumstances in order to fix this problem. In that case, what do you think the circumstances should be and what could that look like?
Or, alternatively, maybe we could unwind the stack on client side - where all DLLs are available - and send that together with the crash report?
Thanks!
Comment 1 • 2 years ago
(In reply to Yannis Juglaret from comment #0)
I am not sure what the source of the problem is (but I think it is the second option):
- we do the stack unwinding on the user's machine, and the DLL is not found because it is listed as aswjsflt64.dll.t01 in the module list whereas it is aswjsflt64.dll on disk; or
- we do the stack unwinding on the server side, so we don't have the DLL at hand when we do stack unwinding.
Stack unwinding is done server-side using symbol files extracted from known DLLs; see this presentation for an overview of the complete flow. In the case of the Avast DLLs, we don't have corresponding symbol files and thus can't unwind properly through their frames.
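For context, these symbol files are Breakpad-format .sym files, whose first line is a MODULE record of the form MODULE <os> <arch> <debug id> <debug file>. The symbol store indexes them by debug file and debug ID, so when no file was ever extracted for a module (as with the Avast DLLs), that lookup key simply has no match. The Python sketch below parses the record and builds the relative store path; the <debug file>/<debug id>/<debug file without .pdb>.sym layout is our understanding of Mozilla's symbol store convention rather than something stated in this bug, and the helper names are hypothetical.

```python
from typing import NamedTuple


class ModuleRecord(NamedTuple):
    os: str
    arch: str
    debug_id: str
    debug_file: str


def parse_module_line(line):
    """Parse the first line of a Breakpad .sym file.

    The debug file name is the last field and may contain spaces,
    so only the first four separators are split on.
    """
    parts = line.rstrip("\r\n").split(" ", 4)
    if len(parts) != 5 or parts[0] != "MODULE":
        raise ValueError(f"not a MODULE record: {line!r}")
    return ModuleRecord(*parts[1:])


def symbol_store_path(record):
    """Relative path under which the .sym file is expected (assumed layout)."""
    stem = record.debug_file
    if stem.lower().endswith(".pdb"):
        stem = stem[:-4]  # example.pdb -> example.sym
    return f"{record.debug_file}/{record.debug_id}/{stem}.sym"
```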
Proper stack unwinding on Windows requires unwind data that is stored in the DLL itself in the data directories. Crash analysis itself is also severely limited if we don't have all the DLLs involved in the call stack at hand.
When we encounter unknown DLLs during crash processing, their names, code IDs and debug IDs are stored. Later on, a script that runs every 24 hours tries to scrape those DLLs from known symbol servers (the script is here). Currently we fetch DLLs from Microsoft, Intel, AMD and NVIDIA's symbol servers.
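The stored code ID is what makes the binary itself retrievable: Windows symbol servers index executables under <file name>/<code ID>/<file name>, where the code ID is the PE header's TimeDateStamp followed by its SizeOfImage. The Python sketch below builds such a URL, assuming the usual formatting of 8 uppercase hex digits for the timestamp and unpadded lowercase hex for the size; the helper names are hypothetical, and the timestamp and size used in the checks are made up. The Microsoft server URL is its public one.

```python
MICROSOFT_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def make_code_id(time_date_stamp, size_of_image):
    """Code ID: TimeDateStamp as 8 uppercase hex digits, then SizeOfImage in lowercase hex."""
    return f"{time_date_stamp:08X}{size_of_image:x}"


def symbol_server_url(server, code_file, code_id):
    """URL of a binary on a symbol server using the <file>/<code id>/<file> layout."""
    return f"{server.rstrip('/')}/{code_file}/{code_id}/{code_file}"
```

Because the key includes the code file name as recorded in the module list, a name such as aswjsflt64.dll.t01 would presumably not match a binary indexed as aswjsflt64.dll.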
I wonder if it may be necessary to collect some third-party DLLs under some circumstances in order to fix this problem. In that case, what do you think the circumstances should be and what could that look like?
Yes, the system described above isn't enough to get all the DLLs we care about, but we could manually scrape more. We have specific scrapers for Linux, for example, and I'm writing a new one to scrape DLLs from Windows graphics drivers that aren't available on symbol servers. I guess we could download various AV installers, unpack them, and then dump the symbols for the DLLs they contain... I don't know if it's worth the fuss, though.
Or, alternatively, maybe we could unwind the stack on client side - where all DLLs are available - and send that together with the crash report?
We do stack unwinding client-side too; however, we don't send the results to crash-stats, we send them in the crash ping. The tool we use for client-side stack unwinding supports reading native unwind information from DLLs (see the code here), but only on x86-64. We plan on rewriting it in Rust (see bug 1743983 and the project here).
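To give an idea of what reading native unwind information involves on x86-64: each RUNTIME_FUNCTION entry points to an UNWIND_INFO structure whose 4-byte header holds the version (low 3 bits) and flags (high 5 bits), the prolog size, the number of UNWIND_CODE slots, and the optional frame register (low 4 bits) and scaled frame offset (high 4 bits). The Python sketch below decodes only that header, following Microsoft's x64 exception handling documentation; the names are hypothetical, and a real unwinder must still replay the UNWIND_CODE array and follow chained entries.

```python
import struct
from typing import NamedTuple

UNW_FLAG_EHANDLER = 0x1   # has an exception handler
UNW_FLAG_UHANDLER = 0x2   # has a termination handler
UNW_FLAG_CHAININFO = 0x4  # chained to a previous RUNTIME_FUNCTION entry


class UnwindInfoHeader(NamedTuple):
    version: int
    flags: int
    size_of_prolog: int
    count_of_codes: int
    frame_register: int
    frame_offset: int


def parse_unwind_info_header(data, offset=0):
    """Decode the 4-byte UNWIND_INFO header located at `offset` in `data`."""
    b0, size_of_prolog, count_of_codes, b3 = struct.unpack_from("<4B", data, offset)
    return UnwindInfoHeader(
        version=b0 & 0x7,               # low 3 bits
        flags=b0 >> 3,                  # high 5 bits
        size_of_prolog=size_of_prolog,  # prolog length in bytes
        count_of_codes=count_of_codes,  # number of 2-byte UNWIND_CODE slots that follow
        frame_register=b3 & 0xF,        # 0 means no frame pointer register
        frame_offset=b3 >> 4,           # scaled: multiply by 16 to get bytes
    )
```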
So there are a lot of moving parts to this, and there is definitely room for improvement.
Comment 2 • 2 years ago
The severity field is not set for this bug.
:gsvelto, could you have a look please?
For more information, please visit auto_nag documentation.