Closed Bug 523350 Opened 15 years ago Closed 15 years ago

Warn users about malware if they crash with known-bad modules

Categories

(Toolkit :: Crash Reporting, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jimm, Assigned: jimm)

References

(Blocks 1 open bug)

Details

(Whiteboard: [crashkill])

To try to decrease the number of crashes related to malware infestation, we're going to start alerting users about suspect modules found in the address space of Mozilla apps. Initially this will be based on a small, static module list stored in crash reporter, including module name, version number, and if possible, a hash signature of the file (or any other information we can use to prevent false positives). When a listed module is detected in a crash report, we should throw up a dialog (content TBD by UX) informing the user of the issue.
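For illustration only, a minimal sketch of how such a static list might be matched against the modules from a crash. The type and function names (`SuspectModule`, `LoadedModule`, `find_suspects`) are hypothetical and not part of crash reporter, and the sketch assumes the name, version, and hash of each loaded module have already been obtained somehow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SuspectModule:
    """One entry in the (hypothetical) static list of known-bad modules."""
    name: str                      # file name, compared case-insensitively
    version: Optional[str] = None  # None means "any version"
    sha256: Optional[str] = None   # None means "skip the hash check"


@dataclass(frozen=True)
class LoadedModule:
    """A module seen in the crashed process (how it is obtained is out of scope)."""
    name: str
    version: str
    sha256: str


def find_suspects(loaded: List[LoadedModule],
                  suspects: List[SuspectModule]) -> List[LoadedModule]:
    """Return the loaded modules that match a list entry.

    A match requires the same file name, plus the same version and hash
    whenever the list entry specifies them, to reduce false positives.
    """
    hits = []
    for module in loaded:
        for entry in suspects:
            if module.name.lower() != entry.name.lower():
                continue
            if entry.version is not None and module.version != entry.version:
                continue
            if entry.sha256 is not None and module.sha256.lower() != entry.sha256.lower():
                continue
            hits.append(module)
            break  # one matching entry is enough for this module
    return hits
```

Entries that carry only a name match more loosely, so the version and hash fields are what keep a legitimate file with a common name from triggering the dialog.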
Can you be more specific about where this is planned to be implemented? We don't know anything about the content of the crash report on the client-side, generally. (Although clearly we can know what modules are loaded in our address space while our program is running.)

It's not clear to me what "stored in crash reporter" or "detected in a crash report" means here.
Ted, see bug 467167 for some malware module examples. This bug came out of discussion over that bug in the Crash Kill meeting yesterday.

(In reply to comment #1)
> Can you be more specific about where this is planned to be implemented?

If we have the module list (the list of modules that show up on crash stats) we could walk that list and look for a small set of common malware modules ("calc.dll" for example), and if found, put something up for the user.  

> We
> don't know anything about the content of the crash report on the client-side,
> generally.

It's in the report data though, can't we pull this information out on the client side somehow?

> (Although clearly we can know what modules are loaded in our address
> space while our program is running.)
> 
> It's not clear to me what "stored in crash reporter" or "detected in a crash
> report" means here.

This bug represents step one of multiple steps we *might* take to address malware infested machines. Longer term we would have the server side do the identifying when a report is submitted and report back to crash reporter whether or not to alert the user.
(In reply to comment #2)
> It's in the report data though, can't we pull this information out on the
> client side somehow?

"Sort of". The problem is that we don't currently process (and don't have the capacity to process) every single crash report. Currently we do server-side throttling, by storing and ignoring some large percentage of reports. If the user clicks the link to their crash report in about:crashes, we will give it priority and process it (since they clearly want to see it). Otherwise, their crash may never be processed.

In the near term, we have implemented client-side throttling (but not turned it on yet), where the server can tell the client that its report is not wanted, and the client will simply save the unprocessed data locally. The user can then re-submit the report via about:crashes if they're interested in seeing its contents (we send an extra field in the re-submitted report to indicate that it can't be throttled).

All that being said, if the crash report is processed, then yes, we have a JSON dump of the report available on the server that you can access to pull out things like the module list etc. but note that we simply don't process most reports.
Additionally, while the information is in fact present on the client side before the report is processed, it's stored in the minidump file. I suppose on Windows there are probably APIs for reading the minidump file, but otherwise we'd have to build the server-side Breakpad code and use that to get any info out of it.
(In reply to comment #4)
> Additionally, while the information is in fact present on the client side
> before the report is processed, it's stored in the minidump file. I suppose on
> Windows there are probably APIs for reading the minidump file, but otherwise
> we'd have to build the server-side Breakpad code and use that to get any info
> out of it.

Processing the minidump on the client initially was the idea. Server side would handle all the identification down the road, but to get things started we wanted to do this in the client to see how effective it is.
Ok, thanks for the clarification. One more question: do you plan to do this in the crash reporter client itself, or in Firefox when it's launched after a restart?

Doing things in the crash reporter client, while nicer in that we can present the data even if Firefox crashes on startup, sucks in that it's all platform-native code and you can't use any of the niceties we have available in the Mozilla platform.
It would be a lot easier to do this *before* we crash than after we crash. Also, it would probably be easier to store the list of bad modules in a text file than compile it directly into any of our binaries.

Why do you think it's easier to do this on the client side now, rather than just doing the server-side solution immediately? I bet we could implement a server-side solution for 3.6 without requiring any string changes (since the strings/warning website could be provided on the server).

Pulling the module information out of the minidump on the client is really tricky. We theoretically have the information in the exception handler, but we can do very little with it there (we're not even allowed to allocate memory). Once we end up in the crash reporter app, all we have is binary blobs and very little way to read or understand them.
(In reply to comment #6)
> Ok, thanks for the clarification. One more question: do you plan to do this in
> the crash reporter client itself, or in Firefox when it's launched after a
> restart?
> 
> Doing things in the crash reporter client, while nicer in that we can present
> the data even if Firefox crashes on startup, sucks in that it's all
> platform-native code and you can't use any of the niceties we have available in
> the Mozilla platform.

The crash reporter seemed to be more generic (not specific to Fx); however, doing it in Fx at some point would be all right as long as it didn't affect perf stats. Crash reporter also made better sense in that if down the road we do all this processing on the server side and send bad module data back in the response to a submitted report, crash reporter would be involved.

Also, one of the things we want to try to avoid is false positives; this may involve generating hashes of the actual suspect files. That's something more suited to a background process.
 
(In reply to comment #7)
> It would be a lot easier to do this *before* we crash than after we crash.
> Also, it would probably be easier to store the list of bad modules in a text
> file than compile it directly into any of our binaries.

Yes, the idea was to make the client list easy to update.

> 
> Why do you think it's easier to do this on the client side now, rather than
> just doing the server-side solution immediately? I bet we could implement a
> server-side solution for 3.6 without requiring any string changes (since the
> strings/warning website could be provided on the server).

The consensus in the meeting was that to get something up and running to test this idea out, a client side list would be simpler. The server side infrastructure for this (if we do it) would take upwards of six months to put together. That work will likely get spun off in tandem with this.

> 
> Pulling the module information out of the minidump on the client is really
> tricky. We theoretically have the information in the exception handler, but we
> can do very little with it there (we're not even allowed to allocate memory).
> Once we end up in the crash reporter app, all we have is binary blobs and very
> little way to read or understand them.

Tricky, but not impossible? If it's impossible or involves questionable methods, falling back on a server side solution and accepting the delay may be in order. 

(On the server side vs. client discussion I really need to yield to some of the other folks at the meeting, my preference was server side as well.)
> reporter also made better sense in that if down the road we do all this
> processing on the server side and send bad module data back in the response to
> a submitted report, crash reporter would be involved.

We would definitely *not* send back an immediate response. When a crash report is submitted, we queue it and process it later (or sometimes not at all). So we'd really need to check for a response from the server in a few minutes. It would be best to do this from Firefox itself, not from the crash reporter app.
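As a rough sketch of the "check back in a few minutes" idea, assuming the client has some way to ask whether a result is ready yet. The `fetch` callable is a stand-in for that request (no real server API is implied), and the attempt count and delay are arbitrary example values.

```python
import time
from typing import Any, Callable, Optional


def poll_for_result(fetch: Callable[[], Optional[Any]],
                    attempts: int = 5,
                    delay_seconds: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[Any]:
    """Call fetch() until it returns something other than None.

    fetch is hypothetical: it should return None while the report is still
    queued (or was throttled) and the processed result once it exists.
    Returns None if nothing arrives within the allowed attempts.
    """
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before asking again
    return None
```

Because a report may never be processed, the loop has to give up eventually rather than wait forever.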

> The consensus in the meeting was that to get something up and running to test
> this idea out, a client side list would be simpler. The server side
> infrastructure for this (if we do it) would take upwards of six months to put
> together. That work will likely get spun off in tandem to this.  

Huh. I think I could write an extension to do this in a couple of hours, and it wouldn't require any server changes at all. The client already knows the crash report IDs, and could just poll the JSON data from the crash reporter. The downside would be, of course, that we'd basically be asking the crash server to process all reports, instead of throttling.

> Tricky, but not impossible? If it's impossible or involves questionable
> methods, falling back on a server side solution and accepting the delay may be
> in order.

It would involve compiling and linking some set of the minidump processor code... we don't need the stackwalk logic, but at least the data to split apart a minidump and retrieve the module list.
Blocks: 506338
Summary: Implement basic malware module blacklist in crashreporter → Warn users about malware if they crash with known-bad modules
Some quick remarks I would like to make:
- If we are going to use local storage for module checking, we would have to protect the "signature files" from unauthorized edits.

To solve the server-side problem, the processing could take place after the crash, AND could be integrated directly into Firefox.
Example:
- Crash Reporter starts up and collects data. Saves a copy offline and uploads the data to the server.
- Next time you run Firefox, it could:
 1) Check for a recent crash (already happens in FF 3.5 afaik)
 2) Check crash modules locally stored by the Crash Reporter.
 3) Match crash modules with "signature" list.
 4) If there's a positive match, it could display a warning to the user from within the same page that apologizes for the Firefox crash.

The warning has to be written in a way that makes clear that Firefox doesn't detect viruses. So instead of "Firefox detected that a virus may have crashed your browser", we could use "The last Firefox crash 'wasn't typical'/'is suspicious' and might have been caused by a virus. Please scan your computer."
(In reply to comment #10)
> Some quick remarks I would like to make:
> - If we are going to use local storage for module checking, we would have to
> protect the "signature files" from unauthorized edits.

This isn't possible, and shouldn't be in scope.
(In reply to comment #11)
> (In reply to comment #10)
> > Some quick remarks I would like to make:
> > - If we are going to use local storage for module checking, we would have to
> > protect the "signature files" from unauthorized edits.
> 
> This isn't possible, and shouldn't be in scope.

Most likely viruses would develop a way to remove themselves from the local database, if we decide to store it that way.
That is a war we cannot win; it doesn't matter.
> Huh. I think I could write an extension to do this in a couple hours and
> wouldn't require any server changes at all. The client already knows the crash
> report IDs, and could just poll the JSON data from the crash reporter. The
> downside would be, of course, that we'd basically be asking the crash server to
> process all reports, instead of throttling.
> 

Is there any way the extension could tell the server to process all reports based on a client id? A simple extension as a starting point might be a great first step.
We don't have client IDs, but we know the crash GUIDs (we use that to show about:crashes). So we'd just use that same data to request the JSON format, collect the module list from that, and do something with it.
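A minimal sketch of the "collect the module list from the JSON" step. The JSON layout used here (a top-level `modules` array whose items have a `filename` field) is an assumption made for the example, not the documented format of the server's output, and the function name is hypothetical. `calc.dll` is the example name mentioned earlier in this bug.

```python
import json
from typing import Iterable, List

# Example list only; "calc.dll" is the sample name used earlier in this bug.
SUSPECT_NAMES = frozenset({"calc.dll"})


def suspect_modules_in_report(report_json: str,
                              suspects: Iterable[str] = SUSPECT_NAMES) -> List[str]:
    """Return file names from the report's module list that are on the suspect list.

    Assumes a hypothetical layout: {"modules": [{"filename": "..."}, ...]}.
    Names are compared case-insensitively, since Windows file names are.
    """
    wanted = {name.lower() for name in suspects}
    report = json.loads(report_json)
    found = []
    for module in report.get("modules", []):
        name = module.get("filename", "")
        if name.lower() in wanted:
            found.append(name)
    return found
```

Fetching the JSON for a given crash GUID is left out on purpose, since that part depends on the server.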
We could, instead of actually notifying people, "flag" modules on the online crash reports as part of the server processing. 

It would just be another piece of information that Mozilla's SUMO could use to help diagnose virus-related crashes.
In some ways, it would defeat the idea of directly warning about viruses, but right now we can't even tell if a crash is virus-related without googling unknown modules first.

But first things first, a concept extension would be an awesome start. Look at https://addons.mozilla.org/pt-PT/firefox/addon/11217 for a guideline, I use it and it is an awesome tool.
A few comments here:

The crashes that brought this up were crashes that happened on startup the first time that we tried to perform any network access (bug 467167). For these crashes we're very limited in what we can do in the Firefox process. We can't try to pull down any data processed on the server since network access crashes us.

This was one reason we discussed doing this detection in the crash reporter process, in the hope that malware/viruses don't hook that process.

The other thing is that we really want to try to get the information that the crash is likely caused by malware/viruses in the user's face. We apparently already do have this information in SUMO and direct users there, but only if the user clicks enough "tell me more"-type buttons. Given the number of crashes we're still seeing, that doesn't seem to work well enough currently.

So it'd be really great if we could show the user a "you likely crashed due to malware/viruses" message in the crash UI. I guess showing it on first startup after the crash is almost as good; however, we'd obviously need to make sure we do that before we crash again.
I'm worried that complaining after *any* crash, when malware is present, will cause malware authors to disable this feature (or breakpad, or even Firefox).  Complaining to the user only when the crash signature matches might be more effective in the long run.
I think it's critical to keep this dead simple to start with, to see if it's at all effective in helping our users eliminate the malware that affects the most of them. Down the road there's tons of stuff we can do to make this better, but I'd like to see something done here quickly. That won't involve much, but it should show us whether it's worth pushing forward to a more elaborate system in future versions.
I see there was some back-and-forth in bug 467167, and I'd like to make a proposal based on what was originally discussed there:
1) We ship a simple text file of "potentially problem causing DLLs" with Firefox.
2) After a crash, when the user restarts Firefox via the crash reporter, we set an environment variable or command line argument to indicate that we're restarting from a crash.
3) Very early in startup, if this flag is set, we inspect our process space for loaded modules matching this list. Because we only do this when restarting from a crash, this can be a slower path involving checking module signatures etc.
4) If there are any matches, we display a notice to the user, noting that these modules may be the cause of their crashes.

I think this would be a pretty simple solution to start with. If we wanted, we could expand it in the future to keep the list updated like we do with the blocklist, by downloading a newer copy from the internet. I'm sure if we ship this it will eventually get targeted by malware etc, but there just isn't a whole lot we can do about that. Even if this is only effective to warn people about buggy firewall or antivirus software that causes startup crashes it would help a lot of users.
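The four-step proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not how Firefox does it: the environment variable name `EXAMPLE_RESTARTED_AFTER_CRASH`, the one-name-per-line file format with `#` comments, and the function names are all hypothetical, and the list of loaded module paths is passed in rather than enumerated from the process.

```python
import ntpath
from typing import Iterable, List, Mapping, Set

# Hypothetical flag the crash reporter would set when it restarts the app.
CRASH_RESTART_FLAG = "EXAMPLE_RESTARTED_AFTER_CRASH"


def parse_module_list(text: str) -> Set[str]:
    """Parse the shipped text file: one DLL name per line, '#' starts a comment."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.add(line.lower())
    return names


def startup_check(environ: Mapping[str, str],
                  list_text: str,
                  loaded_module_paths: Iterable[str]) -> List[str]:
    """Return loaded module paths whose file name is on the list.

    Does nothing (returns []) unless the restart-after-crash flag is set,
    so a normal startup never pays for the check.
    """
    if environ.get(CRASH_RESTART_FLAG) != "1":
        return []
    bad_names = parse_module_list(list_text)
    # ntpath handles Windows-style paths regardless of the host platform.
    return [path for path in loaded_module_paths
            if ntpath.basename(path).lower() in bad_names]
```

If the returned list is non-empty, step 4 would show the notice. Slower checks such as module signatures could be added inside the flagged path without affecting normal startups.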
Whiteboard: [crashkill]
Ok, so who's going to do that?
@luser: My comment #10 basically describes that idea, and I have been thinking about the text file problem.

A possible solution would be implementing updates in a way that makes sure the locally-stored list is valid and up to date before even trying to match DLLs. This could be done with a hash instead of the entire file. The next time Firefox crashed, any edits made to the text file would then be reversed by the update process.
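A small sketch of that hash check, assuming the expected SHA-256 of the current list can be obtained from somewhere trusted and that `fetch_full_list` (a hypothetical callable) downloads a fresh copy. Neither of those exists in this bug; they are placeholders for the example.

```python
import hashlib
from typing import Callable


def list_is_current(local_bytes: bytes, expected_sha256: str) -> bool:
    """True if the local list file hashes to the expected SHA-256 hex digest."""
    return hashlib.sha256(local_bytes).hexdigest() == expected_sha256.lower()


def refresh_list(local_bytes: bytes,
                 expected_sha256: str,
                 fetch_full_list: Callable[[], bytes]) -> bytes:
    """Return list contents that match the expected hash.

    Only the short hash has to be compared in the common case; the full
    file is fetched (via the hypothetical fetch_full_list) only when the
    local copy is stale or has been edited.
    """
    if list_is_current(local_bytes, expected_sha256):
        return local_bytes
    fresh = fetch_full_list()
    if not list_is_current(fresh, expected_sha256):
        raise ValueError("downloaded list does not match the expected hash")
    return fresh
```

This only detects edits to the file; as noted earlier in the thread, malware with enough access could interfere with the check itself.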

These suggestions could be executed in the future, but right now we need to concentrate on the basics.
So I would start with the proposal in comment #20; it sets some basic goals.
(In reply to comment #20)
> I see there was some back-and-forth in bug 467167, and I'd like to make a
> proposal based on what was originally discussed there:
> 1) We ship a simple text file of "potentially problem causing DLLs" with
> Firefox.
> 2) After a crash, when the user restarts Firefox via the crash reporter, we set
> an environment variable or command line argument to indicate that we're
> restarting from a crash
> 3) Very early in startup, if this flag is set, we inspect our process space for
> loaded modules matching this list. Because we only do this when restarting from
> a crash, this can be a slower path involving checking module signatures etc.
> 4) If there are any matches, we display a notice to the user, noting that these
> modules may be the cause of their crashes.
> 
> I think this would be a pretty simple solution to start with.

I agree, but don't you think this would be simpler to do in the crashreporter once a crash happens than in Firefox once restarted after a crash?
Johnny is right, we could do it on crashreporter and make it pass that flag to Firefox.
> I agree, but don't you think this would be simpler to do in the crashreporter
> once a crash happens than in Firefox once restarted after a crash?

It's unlikely that the crashreporter will have the same set of modules loaded into its process as Firefox will. In fact I'm worried that the check-on-startup solution is likely to miss icky DLLs that hook into the process later (e.g. via window hooks, via plugins, or even via extensions which don't load at startup).
But the crash report has the crash info for the crash it's reporting, which is what contains the list of what was loaded at the time of the crash. That's the info we'd need to use, not what libraries are loaded in the crash reporter process.
But as previously discussed, getting that info out of the minidump is a pain.
taking.
Assignee: nobody → jmathies
Flags: blocking1.9.2?
removing the b1.9.2 request flag as this will require strings, which are now frozen.
Flags: blocking1.9.2?
See also bug 524904, "Add support for generic DLL blocklist".
i'd like to note that the crash report does *not* have the info you think it has.

it has the list of *currently* loaded modules.

a module is free (and if evil, eventually likely) to load into an address space, allocate memory, copy its code into the allocated memory, fixup addresses, and unload itself.

when we crash, that module will not be listed in "loaded modules". in a debugger, if you're lucky there *might* be hints of unloaded modules.

some other bug is working on just preventing libraries from successfully loading. i consider that a much more effective measure. we could also simply have *that* code build up a list of libraries that it didn't block, into a file that would be easily managed by crash reporter.

crash reporter could then send that list to some other service to see if there are any bad guys.

we can also arrange for crash reporter to retrieve a current blocklist for when it restarts and have it feed that directly to firefox.

for kicks, we could also sign [windows signing] crashreporter and arrange for it to refuse to allow anything to load in its process (afaik nothing has any business loading into it, although someone should figure out if MSAA violates this assumption).
timeless: while that's certainly feasible, as you know we've seen plenty of crash reports with malware modules clearly present in the module list. We can fix that simple case, and worry about the more complicated case later.
Not even sure we're doing this any more, given that we're instead attempting to prevent bad DLLs from even loading into our process. Not blocking.
Flags: blocking1.9.2-
I think we can go ahead and mark this as WONTFIX.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
sicking: what's the bug number for that?