Closed Bug 634343 Opened 14 years ago Closed 14 years ago

Run a mapreduce job to find crash reports for frankeninstalls

Categories: Socorro :: General, task
Type: task; Priority: Not set; Severity: critical
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: christian; Assigned: aphadke
Attachments: 7 files

In bug 633869 we noticed the Firefox dlls are mismatched. These are invalid installs and people are crashing on startup. We need to see how pervasive the problem is and whether similar mismatches have been seen before. We would like a map-reduce job to search through crashes and count how many crash reports come from installs like this (adding app/dll versions, I guess). I'll get more specifics shortly.
Probably the simplest reduction of this is to provide a list of modules that we ship, and find any crash reports where those modules do not have the exact same version numbers. (Modulo bug 634282.)
We ship some dlls with different version numbers for various reasons: the NSS dlls carry the NSS version, NSPR has its own version, as does SQLite. In bug 633869 we saw firefox.exe version 1.9.2.4038 (3.6.14 build 1) and brwsrcmp.dll 1.9.2.4055 (3.6.14 build 2), so maybe we can start with those two. Or restrict it to dlls whose versions start with "1.9.2." for 3.6 and "1.9.1." for 3.5 if we're checking those too (if we're doing this exercise, it's probably worth knowing about 3.5).
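A minimal sketch of the per-report check suggested above. It assumes a crash report's module list has already been extracted as (filename, version) pairs; that representation is hypothetical, not Socorro's actual schema. The check compares only firefox.exe and brwsrcmp.dll and ignores modules whose version is off the chosen branch, so NSS, NSPR and SQLite versions never count. It also lowercases filenames so FIREFOX.EXE and firefox.exe are treated as the same file.

```python
from typing import Iterable, Optional

# Modules compared, per the suggestion to start with these two.
WATCHED = ("firefox.exe", "brwsrcmp.dll")


def watched_versions(modules: Iterable[tuple[str, str]],
                     branch_prefix: str = "1.9.2.") -> dict[str, str]:
    """Map each watched module (lowercased name) to its version, keeping only
    versions on the given branch (e.g. "1.9.2." for 3.6)."""
    found: dict[str, str] = {}
    for name, version in modules:
        key = name.lower()  # FIREFOX.EXE and firefox.exe are the same file on Windows
        if key in WATCHED and version.startswith(branch_prefix):
            found[key] = version
    return found


def is_mismatched(modules: Iterable[tuple[str, str]],
                  branch_prefix: str = "1.9.2.") -> Optional[bool]:
    """True if the watched modules carry different versions, False if they agree,
    None if either one is missing from the report (inconclusive)."""
    found = watched_versions(modules, branch_prefix)
    if len(found) < len(WATCHED):
        return None
    return len(set(found.values())) > 1


# The bug 633869 combination; the nss3.dll entry is a hypothetical extra module that gets ignored.
report = [("firefox.exe", "1.9.2.4038"), ("brwsrcmp.dll", "1.9.2.4055"),
          ("nss3.dll", "3.12.8")]
print(is_mismatched(report))  # True
```

`None` is returned instead of `False` when a module is missing. This keeps inconclusive reports, such as startup crashes before component loading, separate from clean ones. If another branch ships different module names, `WATCHED` would need adjusting along with `branch_prefix`.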
Blocks: 633869
from bug 634351 comment 0:
bug 633869 seems to be caused by people having Firefox components from two separate builds. It would help us to know if this is a new problem or something that happens regularly. Is it possible to construct queries of this kind?
Not sure what the best evidence might be, but let's go with what we see in bug 633869: search for Firefox 3.6.x crashes where the module version of firefox.exe does not match brwsrcmp.dll. For example, in bug 633869 we're seeing firefox.exe version 1.9.2.4038 (3.6.14 build 1) and brwsrcmp.dll 1.9.2.4055 (3.6.14 build 2).
1) Of people with Firefox 3.6.14, what percentage of crashes show these specific versions, what percent show both 1.9.2.4038, and what percent show both 1.9.2.4055?
2) Of Firefox 3.6.x in general, what percentage of crashes have different module versions?
3) Can we get a count by Firefox version (or buildID) of how many crashes show mismatched modules? We're looking to see if this is an ongoing persistent issue or if it got worse at some point. Maybe percentage would be better than count, since the vast majority of 3.6 users will be using 3.6.13 and that will swamp any numerical results.
I'm assuming you'd run this query over a small time range for sanity's sake. A couple of days or a week would be plenty of data. Maybe even just one day if the query takes a long time to run.
Assignee: nobody → aphadke
data for 1. (comment #4)
date: 2011-02-13 to 2011-02-14
total firefox_windows_crash: 639942
format: DLL_1 && DLL_2 count
firefox.exe|1.9.2.4038 brwsrcmp.dll|1.9.2.4055 107
firefox.exe|1.9.2.4038 brwsrcmp.dll|1.9.2.4038 3360
firefox.exe|1.9.2.4055 brwsrcmp.dll|1.9.2.4055 4457
So about 57% upgraded to build2, with a 2.4% failure rate on the upgrades. I'll guess that hugely overstates the number of people with broken builds, because with a start-up crash people will try/crash multiple times before giving up -- call it a "2.4% unhappiness rate". Probably way more than we want to ship with.
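The arithmetic behind those two figures uses the counts from the one-day run above. The comment doesn't say which denominator the failure rate uses. Dividing the mixed count by the clean build2 count reproduces 2.4%; dividing by all 4564 installs that received build2 files gives about 2.3%.

```python
# Counts from the one-day run (2011-02-13 to 2011-02-14).
build1_both = 3360   # firefox.exe and brwsrcmp.dll both 1.9.2.4038
build2_both = 4457   # both 1.9.2.4055
mixed = 107          # firefox.exe 1.9.2.4038 + brwsrcmp.dll 1.9.2.4055

upgraded = build2_both + mixed                   # installs that got build2 files
upgrade_share = upgraded / (upgraded + build1_both)
failure_rate = mixed / build2_both               # denominator that matches "2.4%"

print(f"{upgrade_share:.1%} upgraded, {failure_rate:.1%} mixed")
# 57.6% upgraded, 2.4% mixed
```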
aphadke, I think this needs to be a map/reduce thing since it involves looking at a large sample of module lists. The only way I have of getting at that data now is to screen-scrape a collection of reports from crash-stats. It would be nice if we could run a number of jobs that run analysis over all the module lists of all crash reports. This would be one report; dbaron's module correlations would be another, malware analysis another, and something like bug 634097 (Compare beta 10 hardware acceleration usage to beta 11 hardware acceleration usage) still another example.
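Below is a minimal, single-process sketch of the map/reduce shape being asked for; it is not the actual Socorro/Hadoop job. It assumes each report is available as a dict with a `version` string and a `modules` list of (filename, version) pairs, both hypothetical. The map step emits one key per report, and the reduce step sums the counts. A missing module is reported as `null`, matching the job output elsewhere in this bug.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Key = tuple[str, str, str]  # (Firefox version, firefox.exe version, brwsrcmp.dll version)


def map_report(report: dict) -> Iterator[tuple[Key, int]]:
    """Map step: emit one (key, 1) pair per crash report."""
    mods = {name.lower(): ver for name, ver in report["modules"]}
    yield (report["version"],
           mods.get("firefox.exe", "null"),
           mods.get("brwsrcmp.dll", "null")), 1


def reduce_counts(pairs: Iterable[tuple[Key, int]]) -> dict[Key, int]:
    """Reduce step: sum the emitted counts per key."""
    totals: dict[Key, int] = defaultdict(int)
    for key, n in pairs:
        totals[key] += n
    return dict(totals)


def run_local(reports: Iterable[dict]) -> dict[Key, int]:
    """Single-process stand-in for the cluster: map every report, then reduce."""
    return reduce_counts(pair for r in reports for pair in map_report(r))
```

Other module-list analyses (correlations, malware, hardware acceleration) could reuse the same driver with a different map function.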
I ran a sample of 1000 reports from 3.6.14 build crashes on Feb 14. Here are rough counts of the different combinations of firefox.exe and brwsrcmp.dll that I came up with:
391 firefox.exe 1.9.2.4055 brwsrcmp.dll 1.9.2.4055
278 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4038
255
27 firefox.exe 1.9.2.4038 (brwsrcmp.dll missing?)
20 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055 mismatch
15 firefox.exe 1.9.2.4055 (brwsrcmp.dll missing?)
10 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038 mismatch
5 FIREFOX.EXE 1.9.2.4038 brwsrcmp.dll 1.9.2.4038 mismatch
1 firefox.exe 1.9.2.4021 brwsrcmp.dll 1.9.2.4021
Raw data with links to reports is attached.
chofmann - comment #5 is actually a MR job that goes through the entire dataset to get the version mismatches. I am currently running another MR job for item 2 in comment #4. Will update the ticket once it's done. I agree with your suggestion of doing analysis over all module lists of all crash reports. Can you file a separate bug and assign it to me, or let me know the bug-id so I can take a look at it?
(In reply to comment #8)
> 255
255 Firefox crashes have neither firefox.exe nor brwsrcmp.dll?
> 27 firefox.exe 1.9.2.4038 ( brwsrcmp.dll missing?)
> 15 firefox.exe 1.9.2.4055 ( brwsrcmp.dll missing?)
Probably startup crashes before component loading. Inconclusive.
> 10 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038 mismatch
Pretty sure .3989 is the stock 3.6.13 release, in which case it's an old problem? Might still jibe with bug 466778 making it tons worse though.
> 5 FIREFOX.EXE 1.9.2.4038 brwsrcmp.dll 1.9.2.4038 mismatch
Those match; don't know why the case changed on the filename.
> 1 firefox.exe 1.9.2.4021 brwsrcmp.dll 1.9.2.4021
Was that a nightly? .4021 is 17 days before our "build1" candidate.
So we already had frankenbuilds in the upgrade from 3.6.13 to 3.6.14 build1 bp-08e0fd1d-fc6e-4c04-8152-b0a222110214 -- the same mixed set of modules as the ones in bug 633869 (except for an earlier build). The signature is js_DestroyScriptsToGC -- the new topcrash that caused us to back out bug 599610. Were we misled, and all of those were frankenbuilds too?
(In reply to comment #10)
> > 10 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038 mismatch
>
> Pretty sure .3989 is the stock 3.6.13 release, in which case it's an old
> problem? Might still jibe with bug 466778 making it tons worse though.
Awesome: all but one of those 10 are the js_DestroyScriptsToGC crash (bug 631105) that prompted us to back out bug 599610 and do build2.
(In reply to comment #9)
>
> I agree on your suggestion of doing analysis over all module list of all crash
> reports. Can you file a separate bug and assign it to me or lemme know the
> bug-id so I can take a look at it.
Bug 634498
full day's crash analysis for feature 2 in comment 4
date: 20110211
Firefox:3.6.13 firefox.exe:null brwsrcmp.dll:null 1773803
Firefox:3.6.3 firefox.exe:null brwsrcmp.dll:null 49191
Firefox:3.6.8 firefox.exe:null brwsrcmp.dll:null 41788
Firefox:3.6.10 firefox.exe:null brwsrcmp.dll:null 39043
Firefox:3.6 firefox.exe:null brwsrcmp.dll:null 39009
Firefox:3.6.12 firefox.exe:null brwsrcmp.dll:null 38315
Firefox:3.6.6 firefox.exe:null brwsrcmp.dll:null 21417
Firefox:3.6.4 firefox.exe:null brwsrcmp.dll:null 12540
Firefox:3.6.2 firefox.exe:null brwsrcmp.dll:null 8028
Firefox:3.6.11 firefox.exe:null brwsrcmp.dll:null 7943
Firefox:3.6.9 firefox.exe:null brwsrcmp.dll:null 7529
Firefox:3.6.7 firefox.exe:null brwsrcmp.dll:null 4423
Firefox:3.6.14 firefox.exe:null brwsrcmp.dll:null 2268
Firefox:3.6.15pre firefox.exe:null brwsrcmp.dll:null 128
Firefox:3.6.14 firefox.exe:1.9.2.4038 brwsrcmp.dll:1.9.2.4055 90
Firefox:3.6b4 firefox.exe:null brwsrcmp.dll:null 73
Firefox:3.6b5 firefox.exe:null brwsrcmp.dll:null 68
Firefox:3.6b1 firefox.exe:null brwsrcmp.dll:null 49
Firefox:3.6.13 firefox.exe:1.9.2.3951 brwsrcmp.dll:1.9.2.3989 48
Firefox:3.6b2 firefox.exe:null brwsrcmp.dll:null 45
Firefox:3.6.13 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3989 29
Firefox:3.6b3 firefox.exe:null brwsrcmp.dll:null 25
Firefox:3.6.14pre firefox.exe:null brwsrcmp.dll:null 18
Firefox:3.6.3plugin1 firefox.exe:null brwsrcmp.dll:null 14
Firefox:3.6.13 firefox.exe:null brwsrcmp.dll:1.9.2.3989 13
Firefox:3.6.13 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3989 13
Firefox:3.6.14 firefox.exe:1.9.2.3615 brwsrcmp.dll:1.9.2.4038 11
Firefox:3.6.10 firefox.exe:null brwsrcmp.dll:1.9.2.3909 11
Firefox:3.6a1pre firefox.exe:null brwsrcmp.dll:null 5
Firefox:3.6a1 firefox.exe:null brwsrcmp.dll:null 5
Firefox:3.6.8 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3855 5
Firefox:3.6.13 firefox.exe:1.9.2.3855 brwsrcmp.dll:1.9.2.3989 5
Firefox:3.6.13 firefox.exe:1.9.2.3727 brwsrcmp.dll:1.9.2.3989 5
Firefox:3.6.12 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3951 5
Firefox:3.6 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3989 4
Firefox:3.6.6 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3828 4
Firefox:3.6.12 firefox.exe:1.9.2.3937 brwsrcmp.dll:1.9.2.3951 4
Firefox:3.6.10 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3909 4
Firefox:3.6.13 firefox.exe:1.9.2.3909 brwsrcmp.dll:1.9.2.3989 3
Firefox:3.6.6pre firefox.exe:null brwsrcmp.dll:null 2
Firefox:3.6.4 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3814 2
Firefox:3.6.13pre firefox.exe:null brwsrcmp.dll:null 2
Firefox:3.6.13 firefox.exe:1.9.2.3989 brwsrcmp.dll: 2
Firefox:3.6.12 firefox.exe:null brwsrcmp.dll:1.9.2.3951 2
Firefox:3.6.12 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3951 2
Firefox:3.6.10pre firefox.exe:null brwsrcmp.dll:null 2
Firefox:3.6.10 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3909 2
Firefox:3.6.8 firefox.exe:null brwsrcmp.dll:1.9.2.3855 1
Firefox:3.6.8 firefox.exe:1.9.2.3855 brwsrcmp.dll:1.9.2.3909 1
Firefox:3.6.80 firefox.exe:null brwsrcmp.dll:null 1
Firefox:3.6.3 firefox.exe:null brwsrcmp.dll:1.9.2.3743 1
Firefox:3.6.3 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3909 1
Firefox:3.6.3 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3743 1
Firefox:3.6.13 firefox.exe:null brwsrcmp.dll:1.9.0.3071 1
Firefox:3.6.13 firefox.exe:1.9.2.3989 brwsrcmp.dll:1.9.2.3667 1
Firefox:3.6.13 firefox.exe:1.9.2.3989 brwsrcmp.dll:1.9.1.3642 1
Firefox:3.6.13 firefox.exe:1.9.2.3814 brwsrcmp.dll:1.9.2.3989 1
Firefox:3.6.12pre firefox.exe:1.9.2.3926 brwsrcmp.dll:1.9.2.3933 1
Firefox:3.6.12 firefox.exe:1.9.2.3989 brwsrcmp.dll:1.9.2.3951 1
Firefox:3.6.12 firefox.exe:1.9.2.3951 brwsrcmp.dll:1.9.2.3989 1
Firefox:3.6.11pre firefox.exe:null brwsrcmp.dll:null 1
Firefox:3.6.10 firefox.exe:1.9.0.3831 brwsrcmp.dll:1.9.2.3909 1
total_firefox_crash 2193305
full day's crash analysis for feature 2 in comment 4
date: 20110211 (restricted to Firefox 4.0b11 only)
Firefox:4.0b11 firefox.exe:null brwsrcmp.dll:null 50501
Firefox:4.0b11pre firefox.exe:null brwsrcmp.dll:null 164
total_firefox_crash 50665
full day's crash analysis for feature 2 in comment 4
date: 20110215 (restricted to Firefox 4.0b11 only)
Firefox:4.0b11 firefox.exe:null browsercomps.dll:null 27232
Firefox:4.0b11pre firefox.exe:null browsercomps.dll:null 38
Firefox:4.0b11 firefox.exe:2.0.0.4038 browsercomps.dll:2.0.0.4051 3
Firefox:4.0b11 firefox.exe:2.0.0.4027 browsercomps.dll:2.0.0.4051 2
Firefox:4.0b11 firefox.exe:2.0.0.4051 browsercomps.dll: 1
total_firefox_crash 59540
date: 20110215 (restricted to Firefox 4.0b11 only)
Firefox:4.0b11 firefox.exe:2.0.0.4051 browsercomps.dll:2.0.0.4051 32771
Firefox:4.0b11 firefox.exe:2.0.0.4050 browsercomps.dll:2.0.0.4050 6
Firefox:4.0b11 firefox.exe:2.0.0.4038 browsercomps.dll:2.0.0.4051 3
Firefox:4.0b11 firefox.exe:2.0.0.4027 browsercomps.dll:2.0.0.4051 2
Firefox:4.0b11 firefox.exe:2.0.0.4051 browsercomps.dll: 1
total_firefox_crash 32783
So frankenbuilds still happen in FF4, but they're mostly gone. Not like 3.6 at all.
Are the firefox.exe:null crashes plugin-container.exe crashes? Maybe, but that wouldn't explain
Firefox:3.6.3 firefox.exe:null brwsrcmp.dll:null 49191
Firefox:3.6.2 firefox.exe:null brwsrcmp.dll:null 8028
Firefox:3.6 firefox.exe:null brwsrcmp.dll:null 39009
Stripping out the "firefox.exe:null" lines from comment 14 on the theory they were mostly plugin crashes (ignoring the evidence of comment 19, but in any case I don't know what to do with them) and then sorting by release, I get:
Firefox:3.6 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3989 4
Firefox:3.6.3 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3743 1
Firefox:3.6.3 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3909 1
Firefox:3.6.4 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3814 2
Firefox:3.6.6 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3828 4
Firefox:3.6.8 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3855 5
Firefox:3.6.8 firefox.exe:1.9.2.3855 brwsrcmp.dll:1.9.2.3909 1
Firefox:3.6.10 firefox.exe:1.9.0.3831 brwsrcmp.dll:1.9.2.3909 1
Firefox:3.6.10 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3909 2
Firefox:3.6.10 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3909 4
Firefox:3.6.12 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3951 5
Firefox:3.6.12 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3951 2
Firefox:3.6.12 firefox.exe:1.9.2.3937 brwsrcmp.dll:1.9.2.3951 4
Firefox:3.6.12 firefox.exe:1.9.2.3951 brwsrcmp.dll:1.9.2.3989 1
Firefox:3.6.12 firefox.exe:1.9.2.3989 brwsrcmp.dll:1.9.2.3951 1
Firefox:3.6.12pre firefox.exe:1.9.2.3926 brwsrcmp.dll:1.9.2.3933 1
Firefox:3.6.13 firefox.exe:1.9.2.3667 brwsrcmp.dll:1.9.2.3989 29
Firefox:3.6.13 firefox.exe:1.9.2.3727 brwsrcmp.dll:1.9.2.3989 5
Firefox:3.6.13 firefox.exe:1.9.2.3743 brwsrcmp.dll:1.9.2.3989 13
Firefox:3.6.13 firefox.exe:1.9.2.3814 brwsrcmp.dll:1.9.2.3989 1
Firefox:3.6.13 firefox.exe:1.9.2.3855 brwsrcmp.dll:1.9.2.3989 5
Firefox:3.6.13 firefox.exe:1.9.2.3909 brwsrcmp.dll:1.9.2.3989 3
Firefox:3.6.13 firefox.exe:1.9.2.3951 brwsrcmp.dll:1.9.2.3989 48
Firefox:3.6.13 firefox.exe:1.9.2.3989 brwsrcmp.dll: 2
Firefox:3.6.13 firefox.exe:1.9.2.3989 brwsrcmp.dll:1.9.1.3642 1
Firefox:3.6.13 firefox.exe:1.9.2.3989 brwsrcmp.dll:1.9.2.3667 1
Firefox:3.6.14 firefox.exe:1.9.2.3615 brwsrcmp.dll:1.9.2.4038 11
Firefox:3.6.14 firefox.exe:1.9.2.4038 brwsrcmp.dll:1.9.2.4055 90
3.6.10 - 3
3.6.12 - 14
3.6.13 - 108
3.6.14 - 101
I think 3.6.14 crashes are unthrottled now. If so, that 3.6.13 number is more like 1080 crashes from frankenbuilds. But the number of 3.6.13 users is more than 300 times the number of 3.6.14 beta users, not just 10 times. Appears to be a serious uptick in frankenbuilds.
But maybe not. That set of 11 3.6.14 crashes with a 3.6 beta(!!) firefox.exe and a 3.6.14 component is an odd combination. Does that happen to a lot of people, or is it one guy crashing 11 times before giving up? Probably the latter. Maybe we're not getting any more frankenbuilds than we always do, but the results in this case were a little more noticeable in a crash spike.
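The stripping and per-release summing described above can be sketched as follows. The sketch assumes job output rows look exactly like `Firefox:<release> firefox.exe:<ver> brwsrcmp.dll:<ver> <count>`, with an empty version (as in `brwsrcmp.dll: 2`) allowed. It folds `pre` builds into their release, which is how 3.6.12pre ends up in the 3.6.12 total of 14. Run over the sorted rows, it also picks up all three 3.6.10 rows (7 crashes). The scale-up helper assumes 3.6.13 reports were sampled at 10%, the rate implied by the 108 to 1080 estimate.

```python
import re
from collections import Counter

# One output row: "Firefox:<release> firefox.exe:<ver> brwsrcmp.dll:<ver> <count>".
# An empty version (e.g. "brwsrcmp.dll: 2") is allowed.
ROW = re.compile(r"Firefox:(\S+) firefox\.exe:(\S*) brwsrcmp\.dll:(\S*) (\d+)")


def mismatches_by_release(text: str) -> Counter:
    """Sum mismatched-row counts per release, skipping firefox.exe:null rows."""
    totals: Counter = Counter()
    for release, exe, dll, count in ROW.findall(text):
        if exe == "null":
            continue  # set aside on the plugin-crash theory
        if exe != dll:
            totals[release.removesuffix("pre")] += int(count)  # fold -pre builds in
    return totals


def estimate_unthrottled(observed: int, sample_rate: float) -> int:
    """Scale a throttled count back up (assumed sampling rate, e.g. 0.10)."""
    return round(observed / sample_rate)
```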
(In reply to comment #20)
>
> I think 3.6.14 crashes are unthrottled now.
Just confirming: yes, as per bug 632171.
> Firefox:3.6.14 firefox.exe:1.9.2.3615 brwsrcmp.dll:1.9.2.4038 11
> Firefox:3.6.14 firefox.exe:1.9.2.4038 brwsrcmp.dll:1.9.2.4055 90
I'm not seeing any firefox-1.9.2.3989/brwsrcmp-1.9.2.4038 in this dataset, but we did see them on earlier days in chofmann's sample (comment 12). Maybe it's a self-limiting problem as people give up, and not really new.
> 3.6.10 - 3
Missed a row; there were 7 frankenbuild crashes in 3.6.10 in the dataset.
1) We would like to run a similar job to the above, but we want to get a count of which groups of dlls are mismatched and their versions (to see if there are more than just the exe and brwsrcmp.dll). I'd like the report to look something like:
[count] [FF version] [firefox.exe vers] [mismatched dll#1 vers] [mismatched dll#2 vers]
For example:
580 Firefox:3.6.10 firefox.exe:1.9.2.999 dll#1:1.9.2.888 dll#2:1.9.2.888
2) I'd like a report to see if the level of frankenbuilds is the same over the 3.6.13 and the 3.6.14 beta periods. The beta period for 3.6.13 was 2010-12-01 through 2010-12-09.
3) I'd like a report to see if the level #2 shows on beta is the level to expect for release. I'd like the report to query 3.6.13 from 2010-12-09 to now. Bonus points for tracking it over time so we can graph what the crash curve looks like.
Please let me know if more information is needed for these. This is very high priority, as this data will help us determine whether we go out with what we have now for 3.6.14 or rebuild / go a different direction.
we should probably also look at early stages of 3.6.13 and other release deployments. The highest pct. of the problem would most likely occur when release upgrades happen, so looking at what's happening with 3.6.13 now isn't of as much value as looking at the week after December 9, when most of the updates were happening.
Yep, that's why in #3 I want it tracked over time. Do you think we need to do it for #2 as well?
sample one day report for feature 1) in comment #24: http://people.mozilla.com/~aphadke/top.100.txt legneato - thoughts?
(In reply to comment #26)
> Yep, that's why in #3 I want it tracked over time. Do you think we need to do
> it for #2 as well?
We collided and I didn't read your comment closely. Yeah, the plan in comment 24 sounds good. One suggestion is to output the data with date and ADUs to help correlate the frequency of mismatches per 100 users or some other similar metric:
date adu's count firefox_version dll_mismatches, ...
580 Fx:3.6.10 firefox.exe:1.9.2.999 dll#1:1.9.2.888 dll#2:1.9.2.888
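Here is a minimal sketch of one way to carry the suggested date/ADU columns and turn them into a per-user rate. The field names follow the proposed header. `rate()` and its per-100-ADU default are assumptions, and the ADU figures would have to be joined in from the separate ADU data source.

```python
from dataclasses import dataclass


@dataclass
class DailyRow:
    date: str             # e.g. "2010-12-10"
    adu: int              # active daily users for this version (separate ADU source)
    count: int            # mismatch crashes seen that day
    firefox_version: str  # e.g. "Fx:3.6.10"
    dll_mismatches: str   # e.g. "firefox.exe:... dll#1:..."

    def rate(self, per: int = 100) -> float:
        """Mismatch crashes per `per` ADU; 0.0 when ADU is unknown (zero)."""
        return 0.0 if self.adu == 0 else self.count * per / self.adu

    def to_tsv(self) -> str:
        """One tab-separated line: the proposed columns plus the computed rate."""
        return "\t".join([self.date, str(self.adu), str(self.count),
                          self.firefox_version, self.dll_mismatches,
                          f"{self.rate():.4f}"])
```

Normalizing by ADU lets days with very different user counts, such as release day versus a quiet week, be compared on one curve.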
(In reply to comment #27)
> sample one day report for feature 1) in comment #24:
> http://people.mozilla.com/~aphadke/top.100.txt
>
> legneato - thoughts?
Looks great! A couple of things:
* I would like to get the FF exe version in there so I can compare the mismatch without having to cross-reference with the main Firefox version
* We should probably filter out any dll version that isn't 1.9.* (is this what you asked me about via IRC?)
* In the TSVs it'd be nice if the dll names were prepended to their versions for easier sorting (xul.dll:1.9.2.999). Not a big deal, as we can do that in post-processing
Woo, that looks good! Were we going to add the firefox.exe version as its own column after the Firefox:x.y.z version? Also, would it be too much time / stress to run that query for the past year? Is that too much? If so, can we do the last 6 months? 3? Not sure what the sweet spot for time vs data is.
(17:25) < LegNeato> aphadke: Sorry, should have been clearer. Only want records where there is at least one mismatched dll with a version matching 1.9.*
(17:26) < LegNeato> (and only want the mismatched dlls and the firefox.exe in that case)
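Below is a minimal sketch of one output row under the rules settled above. A record is kept only if at least one dll with a 1.9.* version is mismatched. Only firefox.exe and the mismatched dlls are emitted, and each version is prefixed with its dll name for easier sorting. "Mismatched" is assumed here to mean "version differs from firefox.exe", which the thread doesn't define precisely. The function name and input shape are hypothetical.

```python
from typing import Optional


def report_line(count: int, product_version: str,
                modules: list[tuple[str, str]]) -> Optional[str]:
    """Build one TSV row: count, Firefox version, firefox.exe version, then each
    mismatched 1.9.* dll as name:version. Returns None if nothing qualifies."""
    mods = {name.lower(): ver for name, ver in modules}
    exe = mods.get("firefox.exe")
    if exe is None:
        return None
    mismatched = sorted(
        f"{name}:{ver}" for name, ver in mods.items()
        if name != "firefox.exe" and ver.startswith("1.9.") and ver != exe
    )
    if not mismatched:
        return None  # no 1.9.* dll disagrees with firefox.exe
    return "\t".join([str(count), f"Firefox:{product_version}",
                      f"firefox.exe:{exe}", *mismatched])
```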
Looks good, let's run it on 3 months of data.
I managed to bring down the hadoop cluster yday while running a single job for 3 months. The job has been modified since then to do 1 week at a time for 3 months, then combine and print the results. The job is running; results should be available in the next 2-3 hours.
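The week-at-a-time workaround can be sketched like this. `run_job` is a hypothetical callable standing in for one week's MR job and returning per-key counts. The windows are half-open, so no day is counted twice when the weekly results are combined.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Callable, Iterator


def weekly_windows(start: date, end: date) -> Iterator[tuple[date, date]]:
    """Yield half-open [lo, hi) windows of at most 7 days covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=7), end)
        yield cur, nxt
        cur = nxt


def run_in_chunks(start: date, end: date,
                  run_job: Callable[[date, date], Counter]) -> Counter:
    """Run one job per week and add up the per-key counts."""
    combined: Counter = Counter()
    for lo, hi in weekly_windows(start, end):
        combined.update(run_job(lo, hi))  # Counter.update adds counts
    return combined
```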
mismatch dll data for 8 weeks at http://people.mozilla.com/~aphadke/nov_dec_jan_dll_mismatch.txt Secondary cluster will be up and running soon, this will allow us to run MR jobs on a much wider time-range..
Ok, that's enough data for his, thanks! Can we get query #2 run? It has a lot less data / a more specific time range
that's enough data for *this* that is. I'm not sure having another month of data will tell us anything more.
Data for 2010-12-01 to 2010-12-09, firefox 3.6.13 and firefox 3.6.14 pre build (see comment #24, 2) http://people.mozilla.com/~aphadke/mismatchdll.20101201.20101209.txt
This takes the data from comment 37 and strips out the lines that don't have any Firefox .dlls in them. Makes it easier to focus on the various mismatched firefox groupings. Interesting that sometimes firefox.exe is newer than the dlls, not 100% older as I'd expect if the firefox process was locked. I think my favorite is the 3.7.a1pre build with a reasonable-sounding "1.9.3.3568" xpcom.dll and a Firefox 3.5 firefox.exe (1.9.1.3593).
Another way to slice the data in comment 37. Again stripping out lines that only have non-Firefox dlls, then combining the crash counts for each Firefox version with mismatched firefox dlls. The second column is the number of different dll version groupings for that version of Firefox. The second column slightly overcounts the number of groupings because I did not coalesce lines whose only difference is a non-Firefox .dll. You can see these in attachment 513356, which was the raw data for this one. It's not too big an effect.
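Here is a minimal sketch of this second slicing. It assumes rows are tab-separated as count, Firefox version, firefox.exe field, then one name:version field per mismatched dll (the layout requested earlier in this bug). Treating any 1.9.x version as a Firefox dll is an assumption made for the sketch.

```python
from collections import defaultdict


def is_firefox_module(field: str) -> bool:
    """Assumption: a name:version field is a Firefox module if its version is 1.9.x."""
    _, _, version = field.partition(":")
    return version.startswith("1.9.")


def summarize(lines: list[str]) -> dict[str, tuple[int, int]]:
    """{Firefox version: (total crashes, distinct dll groupings)} over rows that
    contain at least one mismatched Firefox dll."""
    totals: dict[str, int] = defaultdict(int)
    groupings: dict[str, set] = defaultdict(set)
    for line in lines:
        count, product, exe, *dlls = line.rstrip("\n").split("\t")
        if not any(is_firefox_module(d) for d in dlls):
            continue  # only non-Firefox dlls were mismatched
        totals[product] += int(count)
        # Rows differing only in a non-Firefox dll still count as separate
        # groupings, matching the slight overcount described above.
        groupings[product].add((exe, *dlls))
    return {p: (totals[p], len(groupings[p])) for p in totals}
```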
Considering that 3.5.x has only 10-15% of the users that 3.6.x does, the counts make 3.5 look incredibly infested with frankenbuilds. But remember that 3.5 doesn't have OOPP, while in 3.6.4+ plugin crashes won't have a firefox.exe and will be excluded from the data set. To make more sense of it we'd have to add ADU and crash-per-user columns.
Takes the comment 41 data and strips out the lines with no Firefox .dlls, similar to comment 41 / attachment 513356
Daniel - wrt comment #43, I assume we are looking for 3.6.13 and 3.6.14pre-build ADUs for 2010-12-01 through 2010-12-09? for crash-per-user, in addition to the above constraints, we want avg. # of crashes/user? btw, the ADUs reside at a completely different data source, so I'll have to do some manual data marshaling out here....
Daniel, bug 525390 should make frankenbuilds of Firefox 3.6 much less likely, which I am certain is a major factor in why there are fewer frankenbuilds with Firefox 3.6 than with Firefox 3.5.
(In reply to comment #43)
> Considering that 3.5.x has only 10-15% of the users that 3.6.x does the counts
> make 3.5 looks incredibly infested with frankenbuilds. But remember that 3.5
> doesn't have OOPP, while in 3.6.4+ plugin crashes won't have a firefox.exe and
> will be excluded from the data set.
Unless I am mistaken, the majority of frankenfox crashes are startup crashes well before OOPP comes into play.
> Takes the comment 41 data and strips out the lines with no Firefox .dlls,
Comment 40 data, I mean. It shows 39 mixed-dll crashes in 3.6.13 during its week of beta. The nov-jan data shows 37 mixed-dll crashes in 3.6.14 during the last six days of Jan, when it was available on the beta channel. Comfortingly similar, but that comfort could go out the window if Socorro throttling was set differently.
(In reply to comment #45)
> Daniel - wrt comment #43, I assume we are looking for [...]
> so I'll have to do some manual data marshaling out here....
My comment was not a request, just an opinion. If Christian thinks we need that additional data he will ask for it in a clear manner. Thanks for volunteering though!
(In reply to comment #47)
> Unless I am mistaken, the majority of frankenfox crashes are startup crashes
> well before OOPP comes into play.
That seemed to be the case in bug 633869 and bug 631105, but I don't think that's generally true. Some of these combinations are so old that they must be stable for these users. The user just happened to crash from some other cause and left traces of their frankenfox for us to find. What we're measuring is "people who crash who happen to have a frankenfox", but we didn't capture data on whether they were startup crashes or not.
We're making guesses about the likelihood of frankenfox creation because JS changes in 3.6.14 made this an unstable, unusable combination. From the data I'm starting to think we didn't do anything to make frankenfoxes more likely, but since the effects are worse (guaranteed startup crash) we're noticing it a lot more this time around. If frankenfoxes are really common then we're screwed. If they're rare enough we can plow ahead with the release and hope the affected people will figure out that they should download a fresh copy.
I suspect that at the very least some of the mismatched crashes are due to updating, ending up with mismatched dlls (which can be due to updating from a very old build), and then crashing on startup. Having Uptime included in the reports would tell us how many of these crashes are startup crashes, and I'd appreciate this data, though it can wait if it interferes with getting 3.6.14 out the door. After comparing the data for 3.5.x and 3.6.x I filed bug 635161, which should reduce mismatches even more.
I'm re-running one of my scans and here is some preliminary data about uptime:
count last_crash uptime
1 \N 0 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055
1 72470 6 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055
1 63262 19 firefox.exe 1.9.2.3855 brwsrcmp.dll 1.9.2.4038
1 2 0 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038
1 12 0 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055
1 11 0 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055
1 10 0 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055
They are all pretty close to startup, but it's also interesting that if the time since the last crash is long, it takes longer to hit the crash. If it looks like a retry, the crash is immediate.
(In reply to comment #51)
>...
> They are all pretty close to startup, but its also interesting that if the time
> since last crash is a long time, it takes longer to hit the crash. If it looks
> like a retry the crash is immediate.
The longer times are likely due to the work that is done, such as extension checks after a version change, and the shorter times are likely due to the same install trying to start again.
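Below is a minimal sketch of splitting the preliminary rows into apparent retries and first attempts. It assumes `last_crash` and `uptime` are both in seconds and reads `\N` as "no previous crash recorded". The 60-second retry window is an arbitrary, hypothetical cutoff.

```python
from typing import Optional

# Hypothetical cutoff: a crash within this many seconds of the previous one
# is treated as an immediate retry of the same broken install.
RETRY_WINDOW_S = 60


def split_by_retry(rows: list[tuple[Optional[int], int]]) -> tuple[list[int], list[int]]:
    """rows are (seconds since last crash or None, uptime in seconds).
    Returns (uptimes of apparent retries, uptimes of first attempts)."""
    retries: list[int] = []
    first: list[int] = []
    for last_crash, uptime in rows:
        if last_crash is not None and last_crash < RETRY_WINDOW_S:
            retries.append(uptime)
        else:
            first.append(uptime)
    return retries, first


# (last_crash, uptime) pairs from the preliminary scan; \N becomes None.
rows = [(None, 0), (72470, 6), (63262, 19), (2, 0), (12, 0), (11, 0), (10, 0)]
retries, first = split_by_retry(rows)
print(retries, first)  # [0, 0, 0, 0] [0, 6, 19]
```

With these rows, every apparent retry crashed at 0 seconds of uptime. The non-retries include the 6- and 19-second cases, consistent with the pattern described above.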
(In reply to comment #37)
> mismatch dll data for 8 weeks at
> http://people.mozilla.com/~aphadke/nov_dec_jan_dll_mismatch.txt
>
> Secondary cluster will be up and running soon, this will allow us to run MR
> jobs on a much wider time-range..
The following entry seems incorrect, since the executable and the dll's are all 1.9.1:
4 Firefox:3.6.13 firefox.exe:1.9.1.3951 xpcom.dll:1.9.1.3685 xul.dll:1.9.1.3685
I reproduced the js_Enumerate startup crash by taking a 3.6.14 build2 and then copying firefox.exe, xpcom.dll, and xul.dll from a build1. Also got a js_PurgeCachedNativeEnumerators crash bp-3627645d-56a6-47ad-bce0-2b79e2110222 bp-2ef01596-4480-4b86-af60-152262110222
Oops, first one should be bp-3627645d-56a5-47ad-bce0-2b79e2110222
Also reproduced the js_DestroyScriptsToGC crash with a frankenfox: 3.6.14-build1 plus firefox.exe, xpcom.dll, and xul.dll from 3.6.13:
bp-7ebddbde-100f-4994-914e-67b992110222
bp-5f011c57-ff8f-4d34-b020-02c2d2110222
bp-0c5ef72e-9bcd-4a11-a9f2-d82592110222
bp-e5375cbe-4a05-4143-b1d6-4122e2110222
bp-abe417f3-fe51-45ec-a9f7-245a62110222
Case closed: crash spikes bug 631105 and bug 633869 are caused by frankenfoxes.
Sweet!
do u guys need anything from my end or can we close this bug?
https://bugzilla.mozilla.org/show_bug.cgi?id=633869
I'm assuming it can be closed, since that one is fixed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Nope, I think we still want data from #3 in comment 24. I owe a proper description I think
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ok, we are getting data for #3 (in a Google spreadsheet), great! Now, we need to run it every day to see how 3.6.14 is tracking so we can turn off the updates if we need to.
* Does 4:00 PM PST every day sound ok? Or is early in the morning preferred?
* How resource intensive is the query on the cluster?
* How long does the query take? If we started it at 4:00 pm, would it be done by 5:00 pm so that we can take action?
Please note that we won't need to operate on anything other than 3.6.14 crashes, as the 3.6.13 data should be 100% the same. This is very important data that impacts millions of users, so it should be treated as a priority.
* Does 4:00 PM PST every day sound ok? Or is early in the morning preferred?
A: We have brought down the # of map tasks to 1. Given the CRITICAL nature of the bug, I think we should be fine running it @ 4pm, but I leave the final decision to dre and xstevens.
* How resource intensive is the query on the cluster?
A: Not too much, as we are only doing it for a day and for a specific build.
* How long does the query take? If we started it at 4:00 pm would it be done at 5:00 pm so that we can take action?
A: It takes roughly 10 minutes for the query to run to completion.
I'm fine with this. dre/x can comment on the timing issue.
# of frankenstein builds for fx 3.6.14: 3870 (20110301-20110302)
Out of those 3.6.14 frankenfox crashes, what were the versions of the firefox.exe, xpcom.dll, and xul.dll files?
The report only calculates the aggregate; if needed, I can modify the current process to output the mismatched DLLs, similar to: https://bug634343.bugzilla.mozilla.org/attachment.cgi?id=513361
It would be helpful since some number of those would be the earlier firefox.exe version which doesn't cause a crash and will hopefully be fixed by bug 635161.
# of frankenstein builds for fx 3.6.14 (removed crashes with non-Mozilla dlls): 2541 (20110301-20110302)
I didn't remove the non Mozilla dlls from the entries that also included Mozilla dlls.
Counts comparing firefox.exe version and the common dll version without the questionable crashes (e.g. tbb-firefox.exe, incorrect dll filename case, and AccessibleMarshal.dll, where only one file can be registered for all installations). There was only one crash where there were multiple dll versions. Out of the remaining count of 2538, only 13 (around 0.5%) had a newer version of firefox.exe, which should be improved by fixing bug 635161.
count firefox.exe-ver dll-ver1 dll-ver2
2503 1.9.2.3989 1.9.2.4066
12 1.9.2.4066 1.9.2.3989
6 1.9.2.3606 1.9.2.4066
3 1.9.2.3667 1.9.2.4066
3 1.9.2.3743 1.9.2.4066
2 1.9.2.3855 1.9.2.4055
1 1.9.2.3615 1.9.2.4055
1 1.9.2.3615 1.9.2.4066
1 1.9.2.3855 1.9.2.4066
1 1.9.2.3909 1.9.2.4066
1 1.9.2.4038 1.9.2.4055
1 1.9.2.4038 1.9.2.4066
1 1.9.2.4055 1.9.2.4066 1.9.2.3909
1 1.9.2.4055 1.9.2.4066
1 1.9.2.4066 1.9.2.3846
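Counting rows where firefox.exe is the newer file needs a numeric version comparison, because as plain strings "1.9.2.999" sorts after "1.9.2.4038". The minimal sketch below assumes each row is (count, firefox.exe version, list of dll versions). Requiring firefox.exe to be newer than every listed dll reproduces the 13 above. Using `any` instead would also count the one row with two dll versions and give 14.

```python
def vtuple(v: str) -> tuple[int, ...]:
    """'1.9.2.4066' -> (1, 9, 2, 4066), so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def exe_newer_count(rows: list[tuple[int, str, list[str]]]) -> int:
    """Sum counts of rows whose firefox.exe is newer than every dll version listed."""
    return sum(n for n, exe, dlls in rows
               if all(vtuple(exe) > vtuple(d) for d in dlls))


# String comparison gets this wrong; numeric tuples get it right.
print("1.9.2.999" > "1.9.2.4038", vtuple("1.9.2.999") > vtuple("1.9.2.4038"))  # True False
```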
Verified that using a 3.6.13 profile with a 3.6.14 build (updated from 3.6.13) with a firefox.exe with a version of 1.9.2.3989 and all other files up to date there was no crash. This covers the common case for builds with mismatched dlls.
aphadke, could you generate a report for mismatched dll's (version 2.0.0.x) for Firefox Beta 12? I'd like to get an idea if the changes to the updater on trunk have affected the number of frankenbuilds.
rstrong -
date: 03/07 to 03/08
firefox version: 4.0b12
dll: 2.0.0.x
report at: http://people.mozilla.com/~aphadke/frankenstein/firefox.4.0b12.20110307.20110308.sort.txt
let me know if you would like to run it for a different date range.
(In reply to comment #74)
> rstrong -
> date: 03/07 to 03/08
> firefox version: 4.0b12
> dll: 2.0.0.x
>
> report at:
> http://people.mozilla.com/~aphadke/frankenstein/firefox.4.0b12.20110307.20110308.sort.txt
>
> let me know if you would like to run it for a different date range.
Could I get the same report from February 23rd onward?
will be running this @ 7pm PST once the load on Socorro starts tapering....
aphadke, Thanks! Additional reports for beta 12 won't be needed in case you set up a job.
# of frankenstein builds for fx 4.0b12 and 4.0b12pre (removed crashes with non-Mozilla dlls): 285 20110222.20110308
Only 1 had a firefox.exe version greater than the dll version.
count Firefox-ver firefox.exe-ver dll-ver
243 Firefox:4.0b12 2.0.0.4051 2.0.0.4070
19 Firefox:4.0b12 2.0.0.4038 2.0.0.4070
6 Firefox:4.0b12 2.0.0.3960 2.0.0.4070
5 Firefox:4.0b12pre 2.0.0.4060 2.0.0.4068
3 Firefox:4.0b12 2.0.0.3882 2.0.0.4070
2 Firefox:4.0b12 2.0.0.4000 2.0.0.4070
1 Firefox:4.0b12 2.0.0.3869 2.0.0.4070
1 Firefox:4.0b12 2.0.0.4027 2.0.0.4070
1 Firefox:4.0b12pre 2.0.0.4028 2.0.0.4069
1 Firefox:4.0b12pre 2.0.0.4063 2.0.0.4060
1 Firefox:4.0b12pre 2.0.0.4066 2.0.0.4067
1 Firefox:4.0b12pre 2.0.0.4068 2.0.0.4069
1 Firefox:4.0b12pre 2.0.0.4069 2.0.0.4070
should we close this bug?
closing for now..
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 671348
Component: Socorro → General
Product: Webtools → Socorro