Bugzilla

Updated

17 years ago

Depends on: 427820

ken kovash

Comment 1

17 years ago

Attached file proposed crash report

Chofmann and I have started creating a proposed data script/query and report format. The attached example uses most of the data provided by Benjamin on bug 427820. We want a way to look at all DLLs across all crashes, count those DLLs, provide a couple of additional metrics (these last two are left blank in the attached example), and then make all of this data and reporting both platform-specific and release-specific.
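A minimal Python sketch of the counting step described above, assuming each crash has already been reduced to an OS name, a release version and a list of module file names. The record layout and sample values are hypothetical, not Socorro's actual schema.

```python
from collections import Counter, defaultdict

# Hypothetical crash records; real data would have to be parsed into this shape.
crashes = [
    {"os": "Windows NT", "version": "3.0", "modules": ["xul.dll", "npswf32.dll", "kernel32.dll"]},
    {"os": "Windows NT", "version": "3.0", "modules": ["xul.dll", "kernel32.dll"]},
    {"os": "Mac OS X", "version": "3.0", "modules": ["XUL", "Flash Player"]},
]

def count_modules(crashes):
    """Count, per (os, version), how many crashes loaded each module."""
    counts = defaultdict(Counter)
    for crash in crashes:
        key = (crash["os"], crash["version"])
        # set() so a module listed twice in one report is counted once
        counts[key].update(set(crash["modules"]))
    return counts

for (os_name, version), counter in sorted(count_modules(crashes).items()):
    print(os_name, version)
    for module, n in counter.most_common():
        print(f"\t{module}\t{n}")
```

Grouping on the (platform, release) pair keeps the platform-specific and release-specific views in one pass; the extra metrics mentioned above would be additional columns per module.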

ken kovash

Comment 2

17 years ago

Ted, copying you since Benjamin is on leave. This bug is also related to bugs 427820 and 412605.

Comment 3

17 years ago

I watch Benjamin's address, so I get mail for bugs he's CCed on. I'm not sure what the intent of the report you've attached is; is this just the baseline data? Do you need anything more from us to proceed here?

Reporter

Comment 4

17 years ago

We need to start thinking about how we can turn ken's rough prototype into a script that can run and produce (nightly/weekly?) reports. Maybe this is more in morgamic's area. I see that he has this marked as 0.8 work. If anyone can give feedback on what ken proposes before work on this starts in 0.8, that would be great.

Reporter

Comment 5

17 years ago

The analysis part of this bug might share some of the same back-end processing needed for bug 439679, and we might want several report outputs that look at the data in a variety of ways.

Reporter

Updated

17 years ago

Blocks: 439679

Michael Morgan [:morgamic]

Reporter

Comment 6

17 years ago

I think this kind of analysis could have spotted bug 441649 ("Firefox 3.0 Crash Report [@ nsBaseWidget::RemoveChild(nsIWidget*) ] - metasearch addon") pretty easily: xshared.dll 1.0.0.42 might only appear in crashes with the stack signature nsBaseWidget::RemoveChild.

Blocks: 441648
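One way to express this kind of correlation, sketched in Python: for a given signature, compare how often each module appears in that signature's reports versus in all other reports. The input shape and sample data are hypothetical, and a simple difference of proportions stands in for a proper statistical test.

```python
from collections import Counter

# Hypothetical sample: (stack signature, set of module names) per crash report.
reports = [
    ("nsBaseWidget::RemoveChild(nsIWidget*)", {"xul.dll", "xshared.dll"}),
    ("nsBaseWidget::RemoveChild(nsIWidget*)", {"xul.dll", "xshared.dll", "npswf32.dll"}),
    ("js_Interpret", {"xul.dll", "npswf32.dll"}),
    ("js_Interpret", {"xul.dll"}),
]

def module_correlations(reports, signature):
    """Return (module, share_in_signature, share_elsewhere) tuples,
    sorted so modules over-represented in the signature come first."""
    inside, outside = Counter(), Counter()
    n_in = n_out = 0
    for sig, modules in reports:
        if sig == signature:
            inside.update(modules)
            n_in += 1
        else:
            outside.update(modules)
            n_out += 1
    rows = []
    for module, count in inside.items():
        share_in = count / n_in
        share_out = outside[module] / n_out if n_out else 0.0
        rows.append((module, share_in, share_out))
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows

for module, share_in, share_out in module_correlations(
        reports, "nsBaseWidget::RemoveChild(nsIWidget*)"):
    print(f"{module}\t{share_in:.0%} in signature\t{share_out:.0%} elsewhere")
```

Here xshared.dll comes out on top: present in every report with the signature and in none of the others, which is the pattern described for bug 441649.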

Martijn Wargers (dead)

Updated

17 years ago

Blocks: 441649
No longer blocks: 441648

Updated

17 years ago

Target Milestone: 0.8 → ---

Wayne Mery (:wsmwk)

Updated

17 years ago

Severity: normal → enhancement

OS: Mac OS X → All

Hardware: PC → All

Reporter

Comment 7

16 years ago

Another possible output of this analysis was just spotted in https://bugzilla.mozilla.org/show_bug.cgi?id=434403#c111:

> Comment #111 From Henrik Skupin 2008-11-14 16:37:14 PST
> I've taken a look at a couple of these reports and each one lists some bogous DLL files which have random names and cannot be found by searching on Google. Looks like that Firefox 3.0.x can be used to detect this trojan.

Let's find new .dll names that don't appear in previous reports or anti-virus databases and turn them over to others who can investigate the possibility that these are new spyware variants.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 8

16 years ago

It would be very cool to get alerts when a .dll name never seen before is detected in the module list of an incoming report.
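A minimal sketch of such an alert in Python, assuming module names are compared case-insensitively and that the set of previously seen names (hypothetical here) is kept somewhere persistent. It flags every never-seen name, so randomly named files would each trigger an alert.

```python
def new_modules(report_modules, seen):
    """Return names in this report that were never seen before, and remember them.

    Names are lowercased because Windows file names are case-insensitive.
    """
    fresh = sorted({name.lower() for name in report_modules} - seen)
    seen.update(fresh)
    return fresh

# Hypothetical history of module names from earlier reports
seen = {"xul.dll", "kernel32.dll", "npswf32.dll"}
incoming = [["xul.dll", "KERNEL32.dll"], ["xul.dll", "wmimachine2.dll"], ["wmimachine2.dll"]]
for modules in incoming:
    fresh = new_modules(modules, seen)
    if fresh:
        print("ALERT: never-seen modules:", ", ".join(fresh))
```

Only the second incoming report raises an alert; by the third, wmimachine2.dll is already in the seen set.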

Comment 9

16 years ago

Chris, that would be somewhat problematic for crash reports like the ones on bug 434403. Because the trojan chooses its file names randomly, you won't get a useful list of new DLL files; it will be mostly garbage.

Tony Mechelynck [:tonymec]

Comment 10

16 years ago

Could this also be used to detect *.so (Linux shared libraries) which don't jibe with the version they were used with? I remember random crashes caused by having installed a newer .tar.gz/.tar.bz2 version "on top of" an older version which included some .so library not present in the newer version. (The solution is to remove the installdir with all its contents before unpacking the archive for a newer version, but not everyone does that.)

Reporter

Comment 11

16 years ago

Re: comment 9: sure, it would not work in all cases, but it would work in others. For example, if we had this report running right now, we could search for a trojan DLL named “wmimachine2.dll” and find out how many people have been hit by the current zero-day attacks on Adobe Flash and PDF Reader, before Adobe has gotten the patch out or anti-virus packages have started detecting it. http://www.scmagazineuk.com/Finjan-detects-zero-day-attacks-due-to-Adobe-vulnerability/article/140564/ This is definitely going to be a needle-in-the-haystack kind of problem, so developing good filters is going to be part of it.
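The lookup described here could be as simple as the following Python sketch, assuming each report is available as a list of module file names (the sample data is hypothetical). It counts crash reports, not distinct users, so it only approximates how many people have been hit.

```python
def count_reports_with_module(reports, module_name):
    """Number of reports whose module list includes module_name (case-insensitive)."""
    target = module_name.lower()
    return sum(1 for modules in reports if any(m.lower() == target for m in modules))

# Hypothetical module lists from three crash reports
reports = [["xul.dll", "WmiMachine2.dll"], ["xul.dll"], ["wmimachine2.dll", "npswf32.dll"]]
print(count_reports_with_module(reports, "wmimachine2.dll"))  # 2
```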

Reporter

Comment 12

16 years ago

Bug 366973 is marked as fixed. Just printing that stuff out in a report somewhere would help with a bunch of related analysis.

Reporter

Comment 13

16 years ago

If we had a tab-separated text file containing signature \t uuid_url \t last_crash \t product \t version \t build \t branch \t os_name \t os_version \t comma,separated,list,of,the,module,list,in,alpha,order, we could start to analyze some of the module list data in spreadsheets and text-processing tools. I've heard the module list is in a format that isn't easy to work with, but is something like this possible?
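A Python sketch of writing one line of that file. The column order follows the list above; the dictionary keys and every sample value are hypothetical.

```python
# Column order follows the comment above; keys and sample values are hypothetical.
COLUMNS = ["signature", "uuid_url", "last_crash", "product", "version",
           "build", "branch", "os_name", "os_version"]

def clean(value):
    """Tabs or newlines inside a field would break the TSV layout."""
    return str(value).replace("\t", " ").replace("\n", " ")

def to_tsv_row(report):
    fields = [clean(report[column]) for column in COLUMNS]
    # Module names go last, in case-insensitive alphabetical order, joined by commas
    fields.append(",".join(sorted((clean(m) for m in report["modules"]), key=str.lower)))
    return "\t".join(fields)

report = {
    "signature": "nsBaseWidget::RemoveChild(nsIWidget*)",
    "uuid_url": "http://example.com/report/index/0000-hypothetical",
    "last_crash": 86400, "product": "Firefox", "version": "3.0",
    "build": "2008061004", "branch": "1.9", "os_name": "Windows NT",
    "os_version": "5.1.2600 Service Pack 3",
    "modules": ["xul.dll", "npswf32.dll", "KERNEL32.dll"],
}
print(to_tsv_row(report))
```

A module name that itself contains a comma would still be ambiguous in the last column; a different separator would avoid that.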

Comment 14

16 years ago

Socorro is not presently parsing the module data provided by Breakpad. Does anyone have any sort of institutional memory about why we stopped collecting module info? Assuming we restarted collecting the module data, the report mentioned in comment #13 would be reasonably easy to do. The alternative would be to parse the crash's JSON file on demand. I think Aravind mentioned that we are now storing JSON files for only a few days, so the window of opportunity for on-demand parsing is small. If we want historical data, or the load is high, or we need to compare crashes, then we probably need to look at parsing and saving the module data as crashes are processed.

Comment 15

16 years ago

We're collecting it, just not storing it in the database. IIRC this was a database size and maintenance issue: it's difficult to normalize the table, and a non-normalized table was very large.

Comment 16

16 years ago

Agreed: my 'collecting' should have been spelled 'saving'. On average, the module list per crash seems to be something over 100 modules long (based on an exhaustive analysis of three data points), and both the module list and the details within each module differ, even when the crashes have the same signature. It does appear that trying to normalize such data would not work, because there would be too many distinct module strings, and saving it in raw form would make storage even heavier. Saving only the module name (e.g. 'nss3.dll' or 'XUL') would be much more feasible, probably requiring only several hundred to a few thousand distinct module names. That would make normalizing the data simple and storing it reasonably easy, especially if we store the module list as a comma-separated list within the database. Would such coarse-grained data be useful?
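A Python sketch of the names-only normalization proposed here, with hypothetical input. In the database the mapping would be a lookup table rather than a dict; whether the per-crash list stores names or ids is a design choice, and ids are shown.

```python
def normalize(crash_module_lists):
    """Assign each distinct module name a small integer id.

    Returns (name_to_id, per_crash): the lookup table, plus one
    comma-separated id string per crash for a single text column.
    """
    name_to_id = {}
    per_crash = []
    for modules in crash_module_lists:
        # setdefault hands out the next id the first time a name appears
        ids = [name_to_id.setdefault(name, len(name_to_id) + 1) for name in modules]
        per_crash.append(",".join(map(str, ids)))
    return name_to_id, per_crash

names, rows = normalize([["nss3.dll", "xul.dll"], ["xul.dll", "npswf32.dll"]])
print(names)  # {'nss3.dll': 1, 'xul.dll': 2, 'npswf32.dll': 3}
print(rows)   # ['1,2', '2,3']
```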

timeless

Comment 17

16 years ago

IMO, not without the version and hash. If you retained those two bits... maybe. Is it possible to have a table with (key autoinc, name, version, hash) and then have the module list just reference entries by that key?
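A sketch of the suggested (key autoinc, name, version, hash) table using Python's built-in sqlite3, plus a join table through which each crash references modules by key. Table and column names are illustrative, not an actual Socorro schema; crash IDs and hashes in the example are hypothetical.

```python
import sqlite3

# Unknown versions or hashes are stored as '' because SQLite treats NULLs
# as distinct in UNIQUE constraints, which would allow duplicate rows.
SCHEMA = """
CREATE TABLE modules (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT NOT NULL,
    version TEXT NOT NULL DEFAULT '',
    hash    TEXT NOT NULL DEFAULT '',
    UNIQUE (name, version, hash)
);
CREATE TABLE crash_modules (
    crash_uuid TEXT NOT NULL,
    module_id  INTEGER NOT NULL REFERENCES modules(id),
    PRIMARY KEY (crash_uuid, module_id)
);
"""

def module_id(conn, name, version="", hash_=""):
    """Return the key for (name, version, hash), inserting the row if it is new."""
    conn.execute("INSERT OR IGNORE INTO modules (name, version, hash) VALUES (?, ?, ?)",
                 (name, version, hash_))
    (key,) = conn.execute(
        "SELECT id FROM modules WHERE name = ? AND version = ? AND hash = ?",
        (name, version, hash_)).fetchone()
    return key

def record_crash(conn, crash_uuid, modules):
    """modules: iterable of (name, version, hash) tuples from one report."""
    for name, version, hash_ in modules:
        conn.execute("INSERT OR IGNORE INTO crash_modules VALUES (?, ?)",
                     (crash_uuid, module_id(conn, name, version, hash_)))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_crash(conn, "crash-a", [("npswf32.dll", "10.0.22", "HASH1"), ("xul.dll", "1.9.0", "HASH2")])
record_crash(conn, "crash-b", [("npswf32.dll", "10.0.22", "HASH1"), ("odd.dll", "", "HASH3")])
rows = conn.execute("""
    SELECT m.name, m.version, COUNT(*) FROM crash_modules c
    JOIN modules m ON m.id = c.module_id
    GROUP BY m.id ORDER BY COUNT(*) DESC, m.name""").fetchall()
print(rows)
```

The shared npswf32.dll row is stored once and referenced by both crashes, which is the normalization the comment asks about.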

Reporter

Comment 18

16 years ago

Version info would be nice, but we can get started without it. We could also filter the list down considerably by removing any module that we have symbols for on the symbol server. That would be the first step I'll look at in post-processing the data, but if we could do it when the crash report is digested and stored in the database, that would be a plus for me. With product version number info, we know the versions of modules that we have symbols for.

Comment 19

16 years ago

I think it makes sense to filter out known product modules: there's no point in correlating crashes against versions of xul.dll or js3250.dll. It also makes sense to filter out known system modules which are always present, such as libc, libstdc++, system32, etc... it's possible but unlikely that we'll correlate crashes against particular versions of those, and removing them significantly reduces the dataset size. However, I don't think it makes sense to remove all modules we have symbols for: we have symbols for various extensions and plugins, and hopefully will have more in the future, and it's likely that we'll be able to correlate crashes against those. How exactly we enumerate "known product and system modules" should be considered: it would probably be better to keep a list in the DB instead of hardcoding it, so that as products and systems change we can keep up with potential name changes.

Comment 20

16 years ago

Also note that Socorro in general has no knowledge of symbol files at all. It treats Breakpad like a black box and simply accepts the minidump_stackwalk output with or without symbols, never knowing (or caring) whether symbols are present for any given module. In the script I implemented to fetch missing Win32 symbols for bug 419882, I simply had a blacklist of modules that I knew were part of Mozilla apps, and ignored them (http://hg.mozilla.org/users/tmielczarek_mozilla.com/fetch-win32-symbols/file/tip/blacklist.txt). It wouldn't be hard to expand that list to include common Windows/Mac/Linux system modules, so you could pare the list down to just third-party modules (plugins, drivers, malware, etc.).
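A rough Python sketch of that filtering step. The one-name-per-line format with '#' comments is an assumption (it is not taken from the linked blacklist.txt), and the entries shown are only examples. Per comment 19, the real list would better live in the database than in a hardcoded file.

```python
def load_blocklist(text):
    """One module name per line; blank lines and '#' comments are skipped (assumed format)."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name.lower())
    return names

def third_party_modules(modules, blocklist):
    """Keep only modules that are not known product or system modules."""
    return [m for m in modules if m.lower() not in blocklist]

# Example entries only; a real list would be maintained per product and OS.
blocklist = load_blocklist("""
# Mozilla product modules
xul.dll
js3250.dll
nss3.dll
# Windows system modules
kernel32.dll
ntdll.dll
""")
print(third_party_modules(["xul.dll", "NTDLL.DLL", "npswf32.dll", "wmimachine2.dll"], blocklist))
# ['npswf32.dll', 'wmimachine2.dll']
```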

Comment 21

16 years ago

Per comment #17: what timeless suggests is exactly what normalization does. The issue here is keeping the size of the 'module info' table reasonably small. Adding a version column probably multiplies the table size, long term, by about 10 (immediately, it has a small effect). I don't know what a 'hash' is here. If it is a hash of all the data except the name and version, then I think it would be too much: it would basically get us back to storing everything, since a good hash would be different for each different set of details. I think name and version is quite feasible. If 'hash' has fewer than 10 values for a given name and version, then I'll go out on a limb and say that would be feasible too. More than 1000 hash values per name/version: not feasible.

Per comment #18 and comment #19: a list of module names/versions not to track would be easy to handle in the database: a simple (LDAP-authorized) GUI to add/remove them, and simple SQL to access them from the processor. Items missing from that list add a little noise, with small conceptual cost for programmers (I think). Items improperly in that list would probably be noticed 'pretty soon' when a programmer tries to see details that aren't there. Maybe 'almost anyone' can remove items from the list, but you have to be specially authorized to put them in?

Comment 22

16 years ago

A hash is a unique identifier for a particular build of a DLL. Most crash reports will have modules in common, since (for example) system32.dll will have the same hash/version for everyone who's using Windows XP SP3, or Windows Vista SP1, etc., so I don't expect the modules table to grow multiplicatively large. Since some (many?) DLLs don't have useful versions, the hash is usually the better thing to key on; the version number is mainly useful for human-readable communication. If a DLL has a version number, it's unlikely to have more than one hash for that version. As for blacklist management, a few key technical people for each project (ted and myself, plus a few people each from SM and TB) would be sufficient to maintain it.

Reporter

Comment 23

16 years ago

Re: comment 21 about the number of possible/reasonable hashes... Here is a sample of the number of Flash versions detected on mozilla.com, cut off at 20. Include the various platform and debug versions and it's easy to see the number of hashes getting into the hundreds pretty quickly for Flash alone.

1. 10.0.22: 2,302,363 (60.7%)
2. 10.0.12: 585,302 (15.4%)
3. 9.0.124: 275,640 (7.3%)
4. 10.0.32: 225,881 (6.0%)
5. -1: 116,714 (3.1%)
6. 9.0.115: 82,708 (2.2%)
7. 9.0.159: 54,124 (1.4%)
8. 9.0.47: 29,641 (0.8%)
9. 9.0.151: 24,253 (0.6%)
10. 9.0.45: 23,791 (0.6%)
11. 9.0.28: 22,946 (0.6%)
12. 10.0.2: 19,153 (0.5%)
13. 8.0.22: 7,809 (0.2%)
14. 9.0.16: 6,972 (0.2%)
15. 7.0.19: 2,606 (0.1%)
16. 8.0.24: 2,171 (0.1%)
17. 10.0.b218: 1,054 (0.0%)
18. 10.0.15: 797 (0.0%)
19. 9.0.19: 746 (0.0%)
20. 10.0.26: 692 (0.0%)

Comment 24

16 years ago

Sure, but hundreds is not a big deal. I'd only be worried about database size if we ended up with 50k. If we exclude browser DLLs and known system DLLs I think we'll be well under that mark.

Comment 25

16 years ago

50,000 rows/table divided by 200 rows/item = 250 items/table, assuming "hundreds" is 200. My look at a few data points seems to indicate that 250 is the right order of magnitude for the number of modules, so this probably fits. Given this raw datum:

Module|libplds4.dylib|0.1.0.0|libplds4.dylib|414E08F7EE504E7FBFED13DA7F38DFE20|0x00f46000|0x00f50fff|0

I'm guessing the hash is 414E08F7EE504E7FBFED13DA7F38DFE20, correct? If so, then from my three data points we get about 380 distinct hashes out of 430 lines. Looking at two at a time, we see a little under 20% overlap for some pairs and effectively no overlap for others. I'm not statistician enough (and three points aren't data enough) to make any prediction from that...
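A Python sketch of the measurement described here: parse the raw Module lines, collect the distinct hashes per report, and compute overlap for each pair of reports. The field names assume the pipe-delimited layout of the raw datum above (Breakpad's minidump_stackwalk machine-readable output, with the debug identifier as the fifth field). The comment does not say how the ~20% overlap was measured, so the Jaccard index used here is only one plausible choice.

```python
from itertools import combinations

# Assumed layout: Module|filename|version|debug_file|debug_id|base|max|is_main
FIELDS = ["filename", "version", "debug_file", "debug_id",
          "base_address", "max_address", "is_main"]

def parse_module_line(line):
    parts = line.strip().split("|")
    if parts[0] != "Module" or len(parts) != len(FIELDS) + 1:
        raise ValueError(f"not a Module line: {line!r}")
    return dict(zip(FIELDS, parts[1:]))

def debug_ids(report_text):
    """Distinct debug identifiers (the 'hash') from one report's Module lines."""
    return {parse_module_line(line)["debug_id"]
            for line in report_text.splitlines() if line.startswith("Module|")}

def pairwise_overlap(report_texts):
    """Jaccard overlap (shared ids / all ids) for every pair of reports."""
    ids = [debug_ids(text) for text in report_texts]
    overlap = {}
    for i, j in combinations(range(len(ids)), 2):
        union = ids[i] | ids[j]
        overlap[(i, j)] = len(ids[i] & ids[j]) / len(union) if union else 0.0
    return overlap

line = "Module|libplds4.dylib|0.1.0.0|libplds4.dylib|414E08F7EE504E7FBFED13DA7F38DFE20|0x00f46000|0x00f50fff|0"
print(parse_module_line(line)["debug_id"])  # 414E08F7EE504E7FBFED13DA7F38DFE20
```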

Comment 26

16 years ago

I wouldn't use libplds4.dylib as an example, since it's part of Firefox. There's going to be more variation in the modules we ship, since we upload new builds every day, which guarantees a huge number of different modules. It'd be more interesting to look at chofmann's example of Flash: npswf32.dll on Windows, "Flash Player" on Mac, and libflashplayer.so on Linux.

Jesse Ruderman

Updated

16 years ago

Summary: find and report on bad .dll's in the process list that correlate with particular stack signatures → find and report on bad modules (e.g. DLLs) in the process list that correlate with particular stack signatures

Jesse Ruderman

Comment 27

16 years ago

Updated

16 years ago

Whiteboard: cloud

Reporter

Comment 28

16 years ago

I guess there are two ways the analysis might go with tools related to this bug. One way is to look at a big pool of reports to try to figure out what combination of .dll's and versions might be associated with a crash: we don't know what's happening, so let's see if there is a common pattern in which extra .dll might be running. The other way is more targeted: we have a hunch that a particular plugin is the cause of the crash, and we just want to confirm or deny whether it's entirely one version of the plugin or spread across versions. cww and I have a few tools for doing the latter now. My tool basically:

1) gets a list of crash reports for a particular signature
2) for each report, grabs the version of the .dll we are interested in, e.g.
   grep -i 'Module|NPSWF32.dll' temp | awk -F'|' '{printf "\t%s\t%s\n",$2,$3}'
3) cycles through the full list of reports and kicks out the version info, or a summary report with counts of all the different versions encountered in the crash reports.
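Steps 2 and 3 can also be sketched in Python, assuming each report's raw text is already in hand from step 1 and uses the same pipe-delimited Module lines the awk command reads ($2 = file name, $3 = version). Unlike the grep, this matches the file name field exactly rather than anywhere in the line. The sample lines are hypothetical.

```python
from collections import Counter

def module_versions(report_texts, module_name):
    """Count the versions of one module across reports, once per report."""
    target = module_name.lower()
    versions = Counter()
    for text in report_texts:
        for line in text.splitlines():
            parts = line.split("|")
            # parts[1] is the file name and parts[2] the version, like $2/$3 in awk
            if len(parts) > 2 and parts[0] == "Module" and parts[1].lower() == target:
                versions[parts[2] or "(no version)"] += 1
                break  # only the first matching Module line per report
    return versions

# Hypothetical module lines from three reports with the same signature
reports = [
    "Module|NPSWF32.dll|10.0.22.87|NPSWF32.pdb|AAAA|0x1|0x2|0",
    "Module|xul.dll|1.9.0.1|xul.pdb|BBBB|0x3|0x4|0\nModule|NPSWF32.dll|10.0.22.87|NPSWF32.pdb|AAAA|0x5|0x6|0",
    "Module|NPSWF32.dll|9.0.124.0|NPSWF32.pdb|CCCC|0x7|0x8|0",
]
for version, count in module_versions(reports, "npswf32.dll").most_common():
    print(f"{version}\t{count}")
```

A single dominant version points toward a version-specific plugin bug; a spread across versions suggests the cause lies elsewhere.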

Reporter

Comment 29

16 years ago

Sounds like cww has some rough scraping tools that also cover the "big pool" analysis part.

Damon Sicore (:damons)

Updated

16 years ago

Whiteboard: cloud → cloud[crashkill][crashkill-metrics]

Jesse Ruderman

Comment 30

16 years ago

Fixed by bug 521917?