Add "mac_crash_info" to the "details" page of crash reports, and make it searchable
Categories
(Socorro :: General, task, P2)
Tracking
(Not tracked)
People
(Reporter: smichaud, Assigned: willkg)
Attachments
(5 files, 1 obsolete file)
Patches have just landed in bug 1577886 to add support for __crash_info data to Breakpad. Another is about to land at https://github.com/mozilla-services/minidump-stackwalk/pull/29. I'm opening this follow-up bug to deal with two tasks:
- Make a summary of __crash_info data (if present) available on the "details" page of crash reports at https://crash-stats.mozilla.org/.
- Make this data searchable.
Once the GitHub pull request has landed, __crash_info data will be available on the "raw data" page of crash reports. But its format won't be particularly user-friendly, and of course it won't (yet) be searchable. Gabriele has pointed out that before we make either of these changes, we need to make sure this data doesn't contain user-sensitive information.
Apple's __crash_info data is completely undocumented, beyond a few references to it in the source code available at https://opensource.apple.com/. But it's only system modules that contain a __crash_info section. So the information in them can only be very low-level. For example I'd be surprised if it could contain URLs. I'd expect the concept of a URL to only be understandable by higher-level, Mozilla-specific code.
I have a lot of experience reverse-engineering macOS, so this is something I can try to check. I'll spend a few hours doing that and report back.
By the way, I only have the vaguest notion of "user-sensitive information". Is there a good definition of it somewhere, that I can rely on?
Reporter | ||
Comment 1•4 years ago
|
||
Here are my ideas of how the two tasks from comment #0 can be accomplished, once the problem of user-sensitive information is resolved.
I'd like the summary of __crash_info data on the "details" page to look something like this (from the output of minidump_stackwalk):
Application-specific information:
Module "/System/Library/Frameworks/Security.framework/Versions/A/Security":
message: "CryptKit fatal error: Raise test exception from _pthread_cond_wait(1)"
Module "/usr/lib/system/libsystem_c.dylib":
message: "abort() called"
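A minimal Python sketch of how such a summary could be rendered. It assumes the stackwalker output is a dict with a "records" list whose entries carry a "module" plus fields such as "message". The function name format_mac_crash_info and the two-space indentation are illustrative choices, not Socorro's actual code.

```python
def format_mac_crash_info(mac_crash_info):
    """Render mac_crash_info records as indented, human-readable text."""
    lines = ["Application-specific information:"]
    for record in mac_crash_info.get("records", []):
        # "module" names the system library whose __crash_info section was read.
        lines.append('  Module "{}":'.format(record.get("module", "<unknown>")))
        for key, value in record.items():
            if key == "module":
                continue
            # Quote string fields; print numeric fields (e.g. thread) bare.
            shown = '"{}"'.format(value) if isinstance(value, str) else str(value)
            lines.append("    {}: {}".format(key, shown))
    return "\n".join(lines)


# Example data copied from the summary above.
example = {
    "num_records": 1,
    "records": [
        {
            "message": "abort() called",
            "module": "/usr/lib/system/libsystem_c.dylib",
        }
    ],
}
print(format_mac_crash_info(example))
```

Records with no fields other than "module" would print only their module line, which seems acceptable for a first pass.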
I'd like the following fields (from mac_crash_info in the output of stackwalker) to be searchable:
num_records
message
signature_string
backtrace
message2
thread
dialog_mode
abort_cause
I'd also like the whole mac_crash_info field (all of its contents) to be searchable, like proto signature is.
Assignee | ||
Comment 2•4 years ago
|
||
For adding this data to the details page, I can take a pass at that and attach screenshots that we can iterate on.
Does the __crash_info data include argument data in the messages? For example, Java crash reports for exceptions that occur when manipulating strings include the string arguments in the message, and those can contain URLs being visited.
Examples of sensitive data that shouldn't be public:
- personally identifiable information: names, phone numbers, addresses, email addresses, SSNs, driver's license numbers, passport numbers, personal credentials, account numbers, passwords, URLs of visited sites
- sensitive data: exploitability information, anything Mozilla confidential, credentials
I don't know offhand if there's a list somewhere. Having one would be a good idea--I wrote up bug #1709688 to cover that.
For making the data searchable, generally, I don't make data in a crash report searchable unless it's useful to search. I'd need to know more about what questions users might be asking and how they'd be searching these fields.
For example, num_records doesn't seem interesting to search to me. What questions would an engineer have such that they're searching for crash reports that have some number of records?
Can you walk me through how you expect users to be looking at each field?
Also, I'm not sure how to take what's in the stackwalker output and convert it into that list of fields. I see num_records and for each record message and module. Where do the rest of them come from?
Assignee | ||
Updated•4 years ago
|
Reporter | ||
Comment 3•4 years ago
•
|
||
I don't have a good handle on all the information that can be included in __crash_info. But, like I said in comment #0, I'm confident it's all very low-level. For example I doubt it can ever contain a URL. This is helped by the fact that Gecko tends to do everything itself, and not rely on system code for any high-level stuff. I'll know more when I've finished my survey of the __crash_info sections in all the system modules that Firefox pulls in which have one. That's only about 50 modules. They shouldn't take too long to work through.
The only __crash_info section I'm already familiar with is the one in /System/Library/PrivateFrameworks/GPUSupport.framework/Libraries/libGPUSupportMercury.dylib. Code in its gpusGenerateCrashLog.cold.1() can write either of the following two messages to __crash_info's 'signature_string' field:
Graphics kernel error: 0x%08x\\n
Graphics hardware encountered an error and was reset: 0x%08x\\n
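As an illustration of how these two messages could be told apart and their error codes extracted, here is a small Python sketch. The regular expression is built from the two format strings above. The helper name parse_gpu_signature and the sample error code are hypothetical.

```python
import re

# Matches either of the two messages, followed by the hex code from "0x%08x".
GPU_ERROR_RE = re.compile(
    r"^(Graphics kernel error"
    r"|Graphics hardware encountered an error and was reset)"
    r": 0x([0-9a-fA-F]+)"
)


def parse_gpu_signature(signature_string):
    """Return (message kind, numeric error code), or None if no match."""
    match = GPU_ERROR_RE.match(signature_string)
    if match is None:
        return None
    return match.group(1), int(match.group(2), 16)


# The error code below is made up for the example.
print(parse_gpu_signature("Graphics kernel error: 0x0000000e\n"))
```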
Where do the rest of them come from?
One place to see all the fields is in the code I added to ConvertProcessStateToJSON(), here.
I agree that num_records won't be interesting to most people. But I'm quite interested in it. The reason is that I don't know the largest number of records that's practically possible, or even the number to expect in a "typical" crash report.
The other fields are self-explanatory, I think. Any of them might contain critically useful information. Because Apple hasn't documented __crash_info, I don't really know what to expect. Aside from the research I've promised to do into those 50 modules I mentioned above, I have no way of finding out except to look at crash reports as they come in. Unless these fields are searchable, doing that will be like looking for a needle in a haystack.
You should probably treat all the "string" fields (message, signature_string, backtrace and message2) as free-form strings. Searching on them should mean finding out whether or not they contain a given substring. The other fields (thread, dialog_mode and abort_cause) are numeric -- at least Apple's very sparse documentation seems to indicate that.
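These matching semantics can be sketched in Python as follows. The field names come from the list in comment #1. The function record_matches and its behavior for unknown fields are assumptions made for the example, not an existing Socorro API.

```python
STRING_FIELDS = ("message", "signature_string", "backtrace", "message2")
NUMERIC_FIELDS = ("thread", "dialog_mode", "abort_cause")


def record_matches(record, field, query):
    """Substring match for string fields, equality for numeric fields."""
    if field not in STRING_FIELDS and field not in NUMERIC_FIELDS:
        raise ValueError("unsupported field: {}".format(field))
    if field not in record:
        # A record that lacks the field never matches.
        return False
    if field in STRING_FIELDS:
        return str(query) in str(record[field])
    return int(record[field]) == int(query)


record = {"message": "abort() called", "thread": 3}
print(record_matches(record, "message", "abort"))  # substring search
print(record_matches(record, "thread", 3))         # numeric comparison
```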
Examples of sensitive data that shouldn't be public:
Thanks very much for these! Your list is very helpful.
Reporter | ||
Comment 4•4 years ago
|
||
(Following up comment #3)
Also, I think the entire mac_crash_info field should be searchable as a free-form string, like proto signature. You'd be trying to find out whether or not it contained a given substring.
I didn't mention the module field above, because I don't think it's important to be able to search on it individually. But I would like it to be considered part of the contents of mac_crash_info, for the purposes of searching on the whole field.
Assignee | ||
Comment 5•4 years ago
|
||
Grabbing this to work on in the next week or so.
Reporter | ||
Comment 6•4 years ago
•
|
||
(Following up comment #0)
Apple's __crash_info data is completely undocumented, beyond a few references to it in the source code available at https://opensource.apple.com/. But it's only system modules that contain a __crash_info section. So the information in them can only be very low-level. For example I'd be surprised if it could contain URLs. I'd expect the concept of a URL to only be understandable by higher-level, Mozilla-specific code.
I have a lot of experience reverse-engineering macOS, so this is something I can try to check. I'll spend a few hours doing that and report back.
This is going to take me longer than I expected, because it's a lot more complicated than I expected. I may end up recommending that we not make public the __crash_info data from certain modules. But doing that prematurely may end up making it impossible to do the research required to find out whether or not information from those modules' __crash_info sections is too sensitive.
I'll have at least a preliminary report available either later today or sometime tomorrow.
Edit: It's going to be sometime tomorrow.
Comment 7•4 years ago
|
||
Here are my thoughts about this: I don't think that making the individual fields searchable is valuable, but it would be useful to be able to search for crashes that have / don't have this particular bit, and it would be nice to be able to do free form searches in the field as if it were a simple string. My reasoning is the following: this is a little bit like last_error_value on Windows: it's not something that matters in and by itself but it's interesting to know if all crashes under a given signature have the same (or similar) errors recorded there.
Reporter | ||
Comment 8•4 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #7)
My reasoning is the following: this is a little bit like last_error_value on Windows: it's not something that matters in and by itself but it's interesting to know if all crashes under a given signature have the same (or similar) errors recorded there.
Actually, the information in __crash_info is much more precisely targeted than last_error_value. It's usually written, just before an abort, to specify the reason for that abort. See for example gpusGenerateCrashLog.cold.1() from comment #3.
it would be nice to be able to do free form searches in the field as if it were a simple string.
By "in the field" do you mean the whole mac_crash_info field? If so, then I don't really object to your suggestion. With __crash_info basically undocumented, it's hard to tell which field to look in for information. We can be much more certain that important information will be found somewhere in mac_crash_info than we can that it will be found in any particular field within mac_crash_info. On the other hand, it'd be good to find out, over time, if some fields in mac_crash_info tend to be used for specific purposes.
So yes. Let's make it possible to do free form searches in the entire mac_crash_info field, and find out which crashes have or don't have data in particular fields within mac_crash_info.
Reporter | ||
Comment 9•4 years ago
|
||
Also, of course, we should be able to search for crashes that have or don't have a mac_crash_info field at all.
Reporter | ||
Comment 10•4 years ago
|
||
Here's my preliminary report.
I quickly found that I wouldn't have time to report on all the modules pulled in by Firefox that have __crash_info sections. So I concentrated on seven of them whose names indicate they are more likely to log user-sensitive information. Of these, I found only one that actually does:
/usr/lib/libnetwork.dylib
It's used by Firefox's DNSResolver. And when it's affected by low-level errors, it can write external IP addresses to __crash_info. So we should probably prevent the public from seeing __crash_info data from this module.
At least for now, I think everything else should be reported without restrictions. I'll keep my eye on crash reports with __crash_info data, and so I assume will others. If I see user-sensitive information there, I'll open a new bug, mark it security-sensitive, and CC at least Gabriele and Will.
Reporter | ||
Comment 11•4 years ago
|
||
I created this list using output from my HookCase hook library from bug 1577886 comment #12. Beforehand I commented back in the code that traces Firefox's crash handling.
Reporter | ||
Comment 12•4 years ago
•
|
||
Here's the HookCase hook library I used to test writing __crash_info data in the libnetwork.dylib and AccountsDaemon modules (as a patch on https://github.com/steven-michaud/HookCase/blob/master/HookLibraryTemplate/hook.mm).
Reporter | ||
Comment 13•4 years ago
|
||
(Following up comment #10)
/usr/lib/libnetwork.dylib
It's used by Firefox's DNSResolver. And when it's affected by low-level errors, it can write external IP addresses to __crash_info. So we should probably prevent the public from seeing __crash_info data from this module.
I'm now much less confident that we need to prevent this module's __crash_info data from becoming public. I notice that the IP addresses in my logs never include those of pages that Firefox has visited. This is true even for sites I'm quite sure I've never visited before, or at least not for a very long time (so it's unlikely they're in some kind of cache).
All of the logged NWConcrete_nw_endpoint objects are created by calls to mozilla::net::GetAddrInfo() from here:
Maybe we should consult someone who knows this code, and who could tell us whether any of the host addresses that pass through here are user-sensitive.
Reporter | ||
Comment 14•4 years ago
|
||
Revised version of the HookCase hook library from comment #12.
Reporter | ||
Comment 15•4 years ago
|
||
(Following up comment #13)
I'm now much less confident that we need to prevent this module's __crash_info data from becoming public. I notice that the IP addresses in my logs never include those of pages that Firefox has visited. This is true even for sites I'm quite sure I've never visited before, or at least not for a very long time (so it's unlikely they're in some kind of cache).
I've now figured out what was happening: I had DNS over HTTPS turned on. When I turned it off, I started seeing IP addresses for the sites I was visiting. (I also saw a lot more logging.)
So yes, we probably do need to prevent __crash_info data from the following module from becoming public:
/usr/lib/libnetwork.dylib
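One possible way to withhold this module's data from public view is sketched below in Python. This is only an illustration under assumptions: it presumes the data is a dict with a "records" list keyed by "module", and the names RESTRICTED_MODULES, redact_restricted_records and the "redacted" marker are hypothetical. The thread does not say how Socorro would actually restrict the data.

```python
import copy

# Modules whose __crash_info records should not be shown publicly.
RESTRICTED_MODULES = frozenset({"/usr/lib/libnetwork.dylib"})


def redact_restricted_records(mac_crash_info):
    """Return a copy with restricted modules' record contents removed."""
    redacted = copy.deepcopy(mac_crash_info)  # leave the input untouched
    for record in redacted.get("records", []):
        module = record.get("module")
        if module in RESTRICTED_MODULES:
            # Keep only the module path and a marker that data was withheld.
            record.clear()
            record["module"] = module
            record["redacted"] = True
    return redacted


# Hypothetical input; 198.51.100.7 is a reserved documentation address.
sample = {
    "num_records": 2,
    "records": [
        {"message": "endpoint 198.51.100.7", "module": "/usr/lib/libnetwork.dylib"},
        {"message": "abort() called", "module": "/usr/lib/system/libsystem_c.dylib"},
    ],
}
public_view = redact_restricted_records(sample)
```

Keeping the module path visible preserves the fact that a record existed, which still allows counting such crashes without exposing the message text.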
Reporter | ||
Comment 16•4 years ago
•
|
||
Just to make things clear:
Crashes in /usr/lib/libnetwork.dylib of the kind that might cause user-sensitive information to be written to its __crash_info section are vanishingly rare. There have been none over the last six months, aside from the ones I myself triggered, using the HookCase hook library I've attached to this bug. (My crashes all have hook.dylib in the stack trace.)
So we probably don't need to work out how to prevent __crash_info data from /usr/lib/libnetwork.dylib from becoming public before we start allowing mac_crash_info to appear in crash reports.
I'm continuing to investigate what can show up in the __crash_info sections of system modules pulled in by Firefox. So far I haven't discovered any more user-sensitive information. I'll post another report later today.
Edit: It'll be sometime tomorrow.
Reporter | ||
Comment 17•4 years ago
|
||
(Following up comment #16)
I broadened my search a bit and found two crashes (besides my own) in /usr/lib/libnetwork.dylib over the last six months that might cause user-sensitive information to be written to its __crash_info section. I'd still say they're "vanishingly rare", though.
Assignee | ||
Comment 18•4 years ago
|
||
I spent a bunch of time thinking about this. I can add the mac_crash_info to the Details tab--that's pretty straightforward.
The mac_crash_info structure has an array of structures in it--I can't break that up into parts that I can index and make searchable. Further, I can't index structures.
I think what I'm going to do is serialize the data as a string and then index that string and make it searchable. It's not great, but I think it'll give you something you can use to answer questions like:
- what are all the crash reports that have a mac_crash_info?
- what are all the crash reports that have "CryptKit" in the mac_crash_info?
Then we can iterate on that in the future.
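The serialize-then-search idea can be sketched in Python like this. It is a sketch under assumptions: the exact serialization Socorro uses (key order, whitespace) isn't given in this bug, so json.dumps with sort_keys and indent=2 is just one plausible choice, and the helper names are hypothetical.

```python
import json


def serialize_mac_crash_info(mac_crash_info):
    """Flatten the nested structure into one deterministic string."""
    return json.dumps(mac_crash_info, indent=2, sort_keys=True)


def has_mac_crash_info(processed_crash):
    """Answers: does this crash report have a mac_crash_info at all?"""
    return bool(processed_crash.get("mac_crash_info"))


def mac_crash_info_contains(processed_crash, needle):
    """Answers: does the serialized mac_crash_info contain this substring?"""
    if not has_mac_crash_info(processed_crash):
        return False
    return needle in serialize_mac_crash_info(processed_crash["mac_crash_info"])


# Hypothetical processed crash, trimmed to the one field of interest.
crash = {
    "mac_crash_info": {
        "num_records": 1,
        "records": [
            {
                "message": "CryptKit fatal error: Raise test exception",
                "module": "/System/Library/Frameworks/Security.framework/Versions/A/Security",
            }
        ],
    }
}
print(mac_crash_info_contains(crash, "CryptKit"))
```

Because the whole structure becomes one string, a substring search also matches key names and module paths, not just message text.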
Assignee | ||
Comment 19•4 years ago
|
||
Assignee | ||
Comment 20•4 years ago
|
||
Reporter | ||
Comment 21•4 years ago
|
||
I just triggered another of my CKRaise crashes (all of whose minidumps contain __crash_info data):
bp-a3ae783c-13a2-4ccf-8197-b7a840210511
But there isn't any mac_crash_info information in either the "details" page or the "raw data" page.
Will: How long will it take for your changes to work their way into public-facing systems?
Reporter | ||
Updated•4 years ago
|
Assignee | ||
Comment 22•4 years ago
|
||
I merged a patch which automatically gets deployed to the staging site. You can see your crash here on the staging site:
https://crash-stats.allizom.org/report/index/a3ae783c-13a2-4ccf-8197-b7a840210511
This involved an index change, so the mac_crash_info field won't be searchable until Monday when the new index is created. Crash reports submitted after the new index is created will get indexed correctly and will be searchable.
In order for this to be available in our production environment, I need to do a prod deploy. I'll probably do that later this week. I hit issues with availability last week, so it's possible it may take me longer. I'll update the bug as things progress.
Reporter | ||
Comment 23•4 years ago
•
|
||
Thanks for the info.
https://crash-stats.allizom.org/report/index/a3ae783c-13a2-4ccf-8197-b7a840210511
This looks fine to me. I notice that you've chosen to display mac_crash_info in the "details" page exactly as it's displayed in the "raw data" page. It takes up a bit more room that way than the way minidump_stackwalk displays it. But it shows people exactly how to search on substrings of mac_crash_info. I assume that, once mac_crash_info becomes searchable, it will be possible to do a search like mac_crash_info contains '"num_records": 2'. Is that right?
{
  "num_records": 2,
  "records": [
    {
      "message": "CryptKit fatal error: Raise test exception from _pthread_cond_wait(1)",
      "module": "/System/Library/Frameworks/Security.framework/Versions/A/Security"
    },
    {
      "message": "abort() called",
      "module": "/usr/lib/system/libsystem_c.dylib"
    }
  ]
}
Assignee | ||
Comment 24•4 years ago
|
||
Yes. I had no specification for the structure, so I figured it's best to show it JSON encoded for now. Regarding searches, yes, I'm pretty sure that's right. I think we'll know more once stage creates a new index on Monday.
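For scripted searches, a query against the Super Search API might be built as in the following Python sketch. It assumes the public endpoint at https://crash-stats.mozilla.org/api/SuperSearch/, the "~" (contains) and "!__null__" (exists) operators, and the _facets and _results_number parameters. Whether a given substring matches also depends on how the field was serialized for indexing. The function name is hypothetical, and the sketch only builds a URL without sending a request.

```python
from urllib.parse import urlencode

SUPERSEARCH_API = "https://crash-stats.mozilla.org/api/SuperSearch/"


def build_mac_crash_info_query(substring=None, facet=False, results=10):
    """Build a Super Search URL filtering on the mac_crash_info field."""
    # Assumed operators: "~" means "contains", "!__null__" means "exists".
    value = "~" + substring if substring else "!__null__"
    params = [("mac_crash_info", value), ("_results_number", str(results))]
    if facet:
        # Ask for counts per distinct mac_crash_info value.
        params.append(("_facets", "mac_crash_info"))
    return SUPERSEARCH_API + "?" + urlencode(params)


print(build_mac_crash_info_query("CryptKit"))
print(build_mac_crash_info_query(facet=True))
```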
Reporter | ||
Comment 25•4 years ago
•
|
||
This report is about large, general-purpose system modules. After the first batch, these are the most likely to write user-sensitive information to their __crash_info sections. I worked through everything they might write there, and I didn't find anything that might be user-sensitive.
So it looks like Apple is quite careful about what it writes to __crash_info. I'll keep my eyes open. If I find problems, I'll open security-sensitive bugs about them. But I doubt I'll find anything. It seems like what I reported above about /usr/lib/libnetwork.dylib is very much the exception.
Unless something comes up, I don't plan on writing any more of these reports.
Assignee | ||
Comment 26•4 years ago
|
||
If it turns out to contain protected data, we can lock it down--we've got runbooks for that. I think you've done the due diligence and I feel comfortable with where things are. I really appreciate the work you've done on this!
Reporter | ||
Comment 27•4 years ago
|
||
You're most welcome! It'll be very good to have access to this new trove of Mac crash data.
Assignee | ||
Comment 28•4 years ago
•
|
||
I pushed the code to prod in bug #1711055. On Monday, a new index will get created and we should be able to search the mac_crash_info field. I'll keep this open and needinfo myself to verify that next week.
Reporter | ||
Comment 29•4 years ago
|
||
Crashes with mac_crash_info have started to appear at https://crash-stats.mozilla.org/, all (so far) with the signature gpusGenerateCrashLog.cold.1:
But I've noticed that the "aggregate on" function doesn't work on mac_crash_info, though the option is available.
Is this something that will be resolved by having a new index? Or will getting this functionality require extra work, and maybe a new bug?
Assignee | ||
Comment 30•4 years ago
|
||
The "aggregate on" won't work until we have a new index because no data for the mac_crash_info field is being indexed yet.
Reporter | ||
Comment 31•4 years ago
•
|
||
It seems that the index was created earlier today. I just tested searching on all crash reports that have mac_crash_info records, and the search worked. It found three identical crash reports, all created today. I also tested the "aggregate on" function, and that also worked. But since the test reports' mac_crash_info sections were identical, the test was pretty minimal. I'll try more sophisticated searches as more crash reports get added to the index.
Reporter | ||
Comment 32•4 years ago
•
|
||
I've already found one puzzle, though. Here's a search for all crash reports with signatures containing "gpusGenerateCrashLog", on macOS and the 90.0a1 branch, created since 2021-05-13 05:01PM UTC (when comment #28's push to prod happened). Oddly, the results don't contain any of today's crashes:
(All these crash reports happen to contain mac_crash_info.)
Assignee | ||
Comment 33•4 years ago
|
||
I see crashes from today:
Given that we can search and aggregate on this field, I'm going to mark it as FIXED. If there are additional needs, we'll probably need to do additional work and we can do that in a new bug.
Reporter | ||
Comment 34•4 years ago
|
||
(Following up comment #32)
But today's crashes do show up if I limit the search to "the last 24 hours" or "the last 3 days". So it looks like I've found a bug in the search page (possibly an old one), which is probably unrelated to the index change.
Assignee | ||
Comment 35•4 years ago
|
||
Hrm... That's puzzling. Seems like there's some funny business with date stamps. If I push the end date of the original query to 6:00pm (18:00), then three crash reports show up, but they're all before 4:00pm (16:00).
I don't think this search issue is related to the new index getting created. I think it's more likely there's some timezone conversion happening somewhere that shouldn't be. It should get a new bug.
Assignee | ||
Comment 36•4 years ago
|
||
I wrote up bug #1711550 to cover the date filter issue with super search.
Reporter | ||
Comment 37•4 years ago
•
|
||
(Following up comment #31)
The index now includes __pthread_kill | abort | gpusGenerateCrashLog.cold.1 crashes with more than one kind of "graphics kernel error". So I reran my test of the "aggregate on" function. It works fine:
Edit: To see the results you have to explicitly choose "aggregate on mac_crash_info".
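What "aggregate on mac_crash_info" computes can be approximated offline with a short Python sketch: group crash reports by their serialized mac_crash_info and count each distinct value. The sample data and error codes are made up, and the serialization is an assumption rather than Socorro's actual index format.

```python
import json
from collections import Counter


def aggregate_on_mac_crash_info(crashes):
    """Count crash reports per distinct serialized mac_crash_info value."""
    counts = Counter(
        json.dumps(crash["mac_crash_info"], sort_keys=True)
        for crash in crashes
        if crash.get("mac_crash_info")  # skip reports without the field
    )
    return counts.most_common()  # most frequent value first


def _gpu_crash(code):
    # Hypothetical helper that builds one sample crash report.
    text = "Graphics kernel error: 0x{:08x}\n".format(code)
    return {"mac_crash_info": {"num_records": 1,
                               "records": [{"signature_string": text}]}}


crashes = [_gpu_crash(0xE), _gpu_crash(0xE), _gpu_crash(0x3), {"signature": "other"}]
for value, count in aggregate_on_mac_crash_info(crashes):
    print(count, value)
```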
Reporter | ||
Comment 38•4 years ago
|
||
It just occurred to me to try "faceting" on mac_crash_info. It also works fine:
Reporter | ||
Comment 39•3 years ago
|
||
I've just opened bug 1713355 for a followup issue.
Reporter | ||
Comment 40•3 years ago
|
||
I've opened bug 1714190 for another followup issue.
Reporter | ||
Comment 41•3 years ago
•
|
||
I've opened bug 1715812 for another followup issue.
Edit: This turns out to be an Apple bug, and not a problem with Socorro.