Closed Bug 635834 Opened 13 years ago Closed 13 years ago

Mismatched dll report

Categories

(Socorro :: General, task)

x86
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: robert.strong.bugs, Assigned: aphadke)

References

Details

I would like to get more details on mismatched DLLs if possible. In bug 634343 the results were limited to DLLs that use the same version as firefox.exe, which skips the nss, sqlite, mozcrt, and several other DLLs. To get a better understanding of the differences between 3.5.x, 3.6.x, and 4.x, I'd like to get the versions for firefox.exe and all loaded modules, along with build ID, crash time, uptime, and the time since last crash, for the last 3 months so I can evaluate the effect bug 525390 had and bug 635161 will have.

Around 3 months after bug 635161 lands for a release, which will likely happen after Firefox 4, I'd like to get the same data to evaluate the effect it had. That can be done in a new bug.
Assignee: nobody → aphadke
Whiteboard: ETA: 2/25 (please change if needed)
Whiteboard: ETA: 2/25 (please change if needed) → ETA: 3/15 (please change if needed)
This is very much a low priority, so no worries, and thanks.
robert - can u please check and comment on the sample dataset at http://people.mozilla.com/~aphadke/all_dlls.txt
The data looks good. I don't need the common reports grouped with a count, and dropping that should speed things up. Instead of all DLLs, could it contain just the following DLLs:
AccessibleMarshal.dll, browsercomps.dll, browserdirprovider.dll, brwsrcmp.dll, firefox.exe, freebl3.dll, D3DCompiler_42.dll, d3dx9_42.dll, libEGL.dll, libGLESv2.dll, mozalloc.dll, mozcpp19.dll, mozcrt19.dll, mozsqlite3.dll, nspr4.dll, nss3.dll, nssckbi.dll, nssdbm3.dll, nssutil3.dll, plc4.dll, plds4.dll, smime3.dll, softokn3.dll, ssl3.dll, xpcom.dll, xul.dll

btw: this is what I am seeing in the report out of a total of 56344 crashes on Windows trunk

dll filename           total  % of total crashes
browsercomps.dll        32        0.0050%
browserdirprovider.dll   0        0%
brwsrcmp.dll             0        0%
mozalloc.dll            22        0.0035%
xpcom.dll               23        0.0036%
xul.dll                 31        0.0049%

All of the mismatches had a firefox.exe version less than the mismatched dll version
Whiteboard: ETA: 3/15 (please change if needed)
robert - 
* For our current MR job, computing the counts barely slows anything down in terms of processing time, so I'll keep this for now unless u absolutely don't need it.

* Added the DLL list, the comparison will be case insensitive.

* Hopefully, firefox.exe version less than mismatched dll version is a good thing.

I am planning to create three cron jobs that run nightly @ 8pm for Fx 3.5.x, 3.6.x and 4.x for Windows platform.
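
A minimal sketch of the case-insensitive DLL filter described above, for illustration only. The names `WANTED_MODULES` and `filter_modules` are hypothetical and the input shape (a list of filename/version pairs) is an assumption; this is not the actual MR job code.

```python
# Allowlist copied from the DLL list requested earlier in this bug.
# Names are lowercased once so lookups can ignore case.
WANTED_MODULES = frozenset(name.lower() for name in (
    "AccessibleMarshal.dll", "browsercomps.dll", "browserdirprovider.dll",
    "brwsrcmp.dll", "firefox.exe", "freebl3.dll", "D3DCompiler_42.dll",
    "d3dx9_42.dll", "libEGL.dll", "libGLESv2.dll", "mozalloc.dll",
    "mozcpp19.dll", "mozcrt19.dll", "mozsqlite3.dll", "nspr4.dll",
    "nss3.dll", "nssckbi.dll", "nssdbm3.dll", "nssutil3.dll", "plc4.dll",
    "plds4.dll", "smime3.dll", "softokn3.dll", "ssl3.dll", "xpcom.dll",
    "xul.dll",
))


def filter_modules(modules):
    """Keep only (filename, version) pairs whose filename is allowlisted.

    `modules` is assumed to be an iterable of (filename, version) tuples;
    the filename comparison ignores case.
    """
    return [(name, version) for name, version in modules
            if name.lower() in WANTED_MODULES]
```

For example, `filter_modules([("XUL.DLL", "2.0.0.4000"), ("kernel32.dll", "6.1.7600.16385")])` keeps only the `XUL.DLL` entry.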
Whiteboard: ETA: 3/18
Sounds good and thanks!

Though having mismatched versions is never a *good thing* I believe that Bug 635161 (landed on mozilla-1.9.2 yesterday) will fix it.
Depends on: 630312
Whiteboard: ETA: 3/18 → depends on 630312
Any word on when the report(s) will be available?
robert - we won't be able to run long-range MR jobs on our cluster until bug 630312 is resolved. There isn't a specific ETA, but we hope to get it done by the 1st week of April. Here are some other options that might help:

1 - A 3-month job can be run for the date range: 1st Nov. to 1st Feb. 
2 - A nightly job can be run on production for (t - 1)th day at 19:00PST.

Once bug 630312 is resolved, the data will be replicated on our secondary cluster, which will allow us to run MR jobs without the risk of bringing down production HBase.

Thoughts?
Could option 1 and option 2 be implemented? Would it be possible to get the missing data between 1st Feb to the first daily report?
yes, i'll start running the job for option 1 and cronify option 2 by tuesday (3/22)
We can get the missing data between 1st Feb and the first daily report once replication is running. Note that we can always get the data for a given period if it's urgent.
Currently running job for Firefox 4.0.x, please verify output at:

http://people.mozilla.com/~aphadke/mismatched_dll_report/
Looked at a couple of them and they look great! Thanks again
This is working out really well

Total Crashes          : 1386283
Total Mismatch Crashes : 850
AccessibleMarshal.dll  : 17
browsercomps.dll       : 793
browserdirprovider.dll : 0
brwsrcmp.dll           : 0
firefox.exe            : 0
mozalloc.dll           : 452
xpcom.dll              : 440
xul.dll                : 498
btw: the last line in firefox.4.0.20110107.txt and the last couple of lines in firefox.4.0.20110121.txt are incorrectly formatted

firefox.4.0.20110107.txt
-------------	firefox.exe:2.0.0.4000	buildId: 20101214170338	crashTime: null	uptime: 2495	secondsSinceLastCrash: null	dllSet: browsercomps.dll:2.0.0.4000	firefox.exe:2.0.0.4000	freebl3.dll:3.12.9.0	mozalloc.dll:2.0.0.4000	mozcpp19.dll:8.0.0.0	mozcrt19.dll:8.0.0.0	mozsqlite3.dll:3.7.1.0	nspr4.dll:4.8.7.0	nss3.dll:3.12.9.0	nssckbi.dll:1.81.0.0	nssdbm3.dll:3.12.9.0	nssutil3.dll:3.12.9.0	plc4.dll:4.8.7.0	plds4.dll:4.8.7.0	smime3.dll:3.12.9.0	softokn3.dll:3.12.9.0	ssl3.dll:3.12.9.0	xpcom.dll:2.0.0.4000	xul.dll:2.0.0.4000		


firefox.4.0.20110121.txt
true	uptime: 5833	secondsSinceLastCrash: 5835	dllSet: browsercomps.dll:2.0.0.4027	firefox.exe:2.0.0.4027	freebl3.dll:3.12.9.0	mozalloc.dll:2.0.0.4027	mozcpp19.dll:8.0.0.0	mozcrt19.dll:8.0.0.0	nss3.dll:3.12.9.0	nssckbi.dll:1.81.0.0	nssdbm3.dll:3.12.9.0	softokn3.dll:3.12.9.0	ssl3.dll:3.12.9.0	xpcom.dll:2.0.0.4027	xul.dll:2.0.0.4027		
Content-Disposition: form-data; name="EMCheckCompatibility"

-------------=---------------0000002900004823
Cleaned up numbers from the report

Skipped 98 out of 1386180 crashes due to the crash having an invalid name for firefox.exe or firefox.exe not having version information.

                      |       Total       |  dll ver <   |   dll ver >    |
Crashes               | 1386180 (100%)    |     N/A      |      N/A       |
Mismatch Crashes      |     805 (0.0581%) | 15 (0.0011%) |  790 (0.0570%) |
browsercomps.dll      |     773 (0.0558%) | 13 (0.0009%) |  760 (0.0548%) |
mozalloc.dll          |     435 (0.0314%) |  9 (0.0006%) |  426 (0.0307%) |
xpcom.dll             |     420 (0.0303%) | 11 (0.0008%) |  409 (0.0295%) |
xul.dll               |     472 (0.0341%) | 10 (0.0007%) |  462 (0.0333%) |
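
A rough sketch of how the "dll ver <" and "dll ver >" columns above could be derived from one report line. It assumes the tab-separated `dllSet: name:version` layout seen in the sample lines quoted earlier, and it only checks the four DLLs listed in the table. The helper names are hypothetical and this is not the script actually used for the numbers above.

```python
# DLLs from the table above that are expected to share firefox.exe's version.
SAME_VERSION_DLLS = ("browsercomps.dll", "mozalloc.dll", "xpcom.dll", "xul.dll")


def parse_version(text):
    """Turn '2.0.0.4000' into (2, 0, 0, 4000) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_dll_set(line):
    """Return {lowercased filename: version string} from the dllSet part of a line."""
    _, _, tail = line.partition("dllSet: ")
    modules = {}
    for field in tail.split("\t"):
        name, sep, version = field.strip().rpartition(":")
        if sep and name:
            modules[name.lower()] = version
    return modules


def classify(line):
    """Map each mismatched DLL to '<' or '>' relative to firefox.exe.

    Returns None when firefox.exe is missing or has no usable version,
    mirroring the skipped crashes mentioned above.
    """
    modules = parse_dll_set(line)
    try:
        exe_version = parse_version(modules["firefox.exe"])
    except (KeyError, ValueError):
        return None
    result = {}
    for name in SAME_VERSION_DLLS:
        try:
            dll_version = parse_version(modules[name])
        except (KeyError, ValueError):
            continue  # DLL not loaded or version unreadable
        if dll_version < exe_version:
            result[name] = "<"
        elif dll_version > exe_version:
            result[name] = ">"
    return result
```

A crash counts as a mismatch crash when `classify` returns a non-empty dict. One crash can contribute to several per-DLL rows, which is why the per-DLL totals add up to more than the mismatch total.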
firefox 4, 3.6 and 3.5 data for Dec. 2011
The build ID seems to be missing; I'm not exactly sure if it's related to my code or to the actual crash data. Data for the past 90 days will be available by EOD 3/22.
Whiteboard: depends on 630312
I post-processed all of the current reports and everything is working out great... Thanks! The missing build ID won't be a problem.
data for firefox 4.0, 3.6 and 3.5; time-range: 2nd feb. to 21st mar. available at:
https://bugzilla.mozilla.org/show_bug.cgi?id=635834
No longer depends on: 630312
added: nightly cron job @ 19:30 PST on production HBase processing 1 day of data for Firefox 3.5, 3.6 and 4.0
Output: 
http://people.mozilla.com/~aphadke/mismatched_dll_report/

Format:
http://people.mozilla.com/~aphadke/mismatched_dll_report/firefox.4.0.20110201.txt

http://people.mozilla.com/~aphadke/mismatched_dll_report/firefox.$fx_major.$fx_minor.$yyyy$mm$dd.txt

Please let me know if anything is missing.
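
For anyone scripting downloads, the file naming pattern above can be sketched as a small helper. The function name `report_url` is hypothetical; the base URL and pattern are taken from the comment above.

```python
from datetime import date

BASE_URL = "http://people.mozilla.com/~aphadke/mismatched_dll_report/"


def report_url(fx_major, fx_minor, day):
    """Build the report URL for one Firefox version and one day of crash data."""
    return f"{BASE_URL}firefox.{fx_major}.{fx_minor}.{day:%Y%m%d}.txt"
```

With these assumptions, `report_url(4, 0, date(2011, 2, 1))` yields the `firefox.4.0.20110201.txt` URL shown above.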
Looks like there is missing data for 3.5 and 3.6... the gaps are between
firefox.3.5.20110102.txt - firefox.3.5.20110201.txt
and
firefox.3.6.20110102.txt - firefox.3.6.20110201.txt
fixed. there shouldn't be any missing data except for 2011-03-22 which will get populated today. 
Daily cron job for (t-1)th day is also up and running.
closing this for now, please reopen if needed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
robert - are u using http://people.mozilla.com/~aphadke/mismatched_dll_report/ to look at the reports? I got an email from ops that they want me to clean up diskspace on people. Is it okay if i delete reports dated april and before?
Yeah, if you want to use people for this please determine the number of reports you need going back, and then set a cron to delete any older than that on a regular basis (so we won't have to go through this again)

TIA
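
One possible shape for the cleanup cron suggested above, as a sketch only. It assumes the report files follow the firefox.$fx_major.$fx_minor.$yyyy$mm$dd.txt naming used in this bug; the function names and the retention window are hypothetical and would need to be agreed on first.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

# Matches names such as firefox.4.0.20110201.txt and captures the date part.
REPORT_NAME = re.compile(r"^firefox\.\d+\.\d+\.(\d{8})\.txt$")


def stale_reports(filenames, today, keep_days):
    """Return the report filenames dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in filenames:
        match = REPORT_NAME.match(name)
        if not match:
            continue  # leave anything that is not a report file alone
        try:
            report_day = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if report_day < cutoff:
            stale.append(name)
    return stale


def delete_stale(directory, today, keep_days):
    """Delete stale report files from directory (intended to run from cron)."""
    directory = Path(directory)
    names = [p.name for p in directory.iterdir() if p.is_file()]
    for name in stale_reports(names, today, keep_days):
        (directory / name).unlink()
```

Keeping the date logic in `stale_reports` separate from the deletion makes it easy to dry-run the selection before anything is removed.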
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I have the data so it is ok to delete it.  Since the next data set will be for specific versions you can also stop the reports. I'll file a bug for new reports when the client versions with the fix applied are coalesced.
closing...
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Anurag, I have copies of all of the reports through 20110716 if you would like to delete them.
thx robert, deleted :-)
Anurag, I have copies of all of the reports through 20110809 if you would like to delete them.
Anurag, I have copies of all of the reports through 20110917 if you would like to delete them.

P.S. I'm thinking of other ways I can get this data so you don't have to delete the files every couple of weeks. Thanks!
@rstrong: no worries on deleting the data, I have a cron setup that takes care of it :-)
Anurag, I've downloaded the files up to 20111004. Thanks!
thx robert - not sure y the script didn't run to completion last night, I'll investigate and let u know
If you want you can stop running the report for a while. I have enough data and will likely re-evaluate where we are in a couple of months.
Component: Socorro → General
Product: Webtools → Socorro