Looking at crash reports, a high proportion of crashes with these signatures seem to directly involve AVG. RtlpCoalesceFreeBlocks is apparently a sign of a corrupted heap; we should try to repro with AVG + valgrind/purify to see what the root cause is. (Possible allocator mismatch too?)
Blocking 1.9.2+ per CrashKill effort.
A user on support.mozilla.com confirmed that disabling AVG Safe Search 8.5 fixed the problem.
Bug 513329 covers this crash stack for reasons other than AVG.
Looks like this general crash signature is WinXP only. Of the 8566 matching crashes in the last week, just under half are "Windows NT 5.1.2600 Service Pack 2" and the rest are "Windows NT 5.1.2600 Service Pack 3". Only *8* reports are different (mostly slight variations of SP2/3).
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
The blocklist.xml shipped with the application (http://mxr.mozilla.org/mozilla-central/source/browser/app/blocklist.xml) does not yet include this update, so new users who crash on startup won't be protected.
This has been one of the most frequently reported crashes this week on support.
Uptime on the crashreports seems to be an easy way to identify when the crash is due to AVG vs. other problems (bug 513329)... Almost every crash with an uptime under 2 minutes is AVG related, and the longer uptimes are almost always not involving AVG. Roughly 80% of the AVG crashes are < 15 seconds.
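The uptime heuristic above could be sketched roughly like this. The 120-second cutoff comes straight from the observation above; the function name and the input shape (a plain list of uptimes in seconds) are made up for illustration and are not how crash-stats actually stores reports.

```python
# Rough triage by uptime, assuming each report is reduced to its uptime in seconds.
AVG_UPTIME_CUTOFF_SECONDS = 120  # "under 2 minutes", per the observation above


def split_by_uptime(uptimes, cutoff=AVG_UPTIME_CUTOFF_SECONDS):
    """Return (likely_avg, likely_other) lists of uptimes."""
    likely_avg = [u for u in uptimes if u < cutoff]
    likely_other = [u for u in uptimes if u >= cutoff]
    return likely_avg, likely_other


if __name__ == "__main__":
    sample = [3, 8, 14, 45, 119, 120, 900, 7200]  # made-up uptimes
    avg, other = split_by_uptime(sample)
    print(len(avg), "likely AVG;", len(other), "likely bug 513329 or other")
```

This is only a first-pass filter; the stack still has to be checked to confirm an AVG frame is actually present.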
All of the AVG crashes have an identical crashing thread:
0 ntdll.dll RtlpCoalesceFreeBlocks
1 msvcr80.dll malloc
2 msvcr80.dll operator new
3 avglngx.dll avglngx.dll@0x28404
4 avglngx.dll avglngx.dll@0x25dae
5 avglngx.dll avglngx.dll@0x25faf
6 avglngx.dll avglngx.dll@0x260a4
7 avglngx.dll avglngx.dll@0x263f2
Frame 8 seems to be garbage. It's randomly different in each report, but the other frames have identical addresses.
There's commonality with module versions. Many of the avg*.dll versions are the same, the only differences I saw were that AVGCFGX.DLL can be version 188.8.131.521 or 184.108.40.2064, and AVGLOGX.DLL can be version 220.127.116.117 or 18.104.22.1680.
A current install of AVG Free seems to work just fine. It gives me an AVGCFGX.DLL with version 22.214.171.1241 (which isn't in any of the crash reports, so the most recent AVG doesn't seem to be involved with this particular crash). The earliest mention on Google of versions .371 and .384 was in July 2009, so it appears the people crashing in Firefox are using an AVG version that's a month or two out of date. Did we see any support spikes in July for problems with AVG?
When installing AVG, an "AVG Safe Search" extension is added to Firefox automatically, and an "AVG Security Toolbar" extension is optionally added (the installer asks if you want it, it's checked by default). One crashreport has a list of extensions pasted in, with Toolbar v.2.506.026.001. My newly installed AVG gives version 2.507.024.001. However, only some of the AVG crashes include IGeared_tavgp_xputils35.dll in the module list. That's the component used by the toolbar, so this implies the toolbar alone isn't to blame.
The install.rdf version in the "AVG Safe Search" addon is just "8.5", so we can't blocklist this unless we're willing to blocklist all AVG installs. But if we can get them to start using real version numbers in their next update, we could add 8.5 to the blocklist once most users have updated to it. The extension includes an AVGSSFF.DLL component, which is version 126.96.36.1990 in all the crash reports but 188.8.131.521 in my new install. If we could blocklist based on component DLL versions, that might help...
But since these are startup crashes, it seems unlikely that users will get blocklist updates (or even Firefox app updates) before they crash, as zzxc noted. So that makes it rather difficult to do anything for users currently hitting this problem. :(
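To illustrate why a bare "8.5" version is a problem for range-based blocking, here is a simplified sketch. The dotted-integer comparison is a stand-in and not the comparator Firefox actually uses, and the "8.5.1" version string is purely hypothetical.

```python
def parse_version(text):
    """Turn a dotted version like '8.5' into a tuple of ints (simplified)."""
    return tuple(int(part) for part in text.split("."))


def is_blocked(version, min_version, max_version):
    """True if version falls inside the inclusive [min_version, max_version] range."""
    return parse_version(min_version) <= parse_version(version) <= parse_version(max_version)


if __name__ == "__main__":
    # Every current install reports "8.5", so a range that catches the
    # crashing build also catches the working one.
    print(is_blocked("8.5", "8.5", "8.5"))
    # A hypothetical bumped version in a later update would fall outside it.
    print(is_blocked("8.5.1", "8.5", "8.5"))
```

With identical version strings there is nothing for the range to distinguish, which is why the bumped version number has to ship first.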
We may be able to update the blocklist that comes included with Firefox so that we can ask users to download the latest version of Firefox with this version of AVG blocklisted.
Created attachment 403928 [details]
Crashes per day
Hmm, this is an interesting trend.
I did a manual search on crash-stats for 3.5.3 crashes matching this signature, in the last 1 day, 2 days, 3 days, etc; and then looked at the deltas to determine crashes per day. For the 2 weeks before Sep 26th, there were ~500 crashes/day, and then in the last few days this has suddenly spiked up to 1200/day.
Lars ran some queries on the DB to confirm this. Additionally, if you filter out crashes with an uptime > 120 seconds, the spike is even more dramatic -- about a 15x increase in crashes over the last few days. And maybe higher, as the data for the 28th may not be a full day.
I wonder what happened on the 26th/27th to cause this spike?
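The delta approach described above (search the last 1 day, 2 days, 3 days, then subtract) amounts to something like the following. The function name is invented, and the sample numbers are made up to resemble the trend rather than taken from crash-stats.

```python
def per_day_from_cumulative(cumulative):
    """cumulative[i] is the crash count for the last (i + 1) days.

    Returns the count for each single day, most recent day first.
    """
    per_day = []
    previous = 0
    for total in cumulative:
        per_day.append(total - previous)
        previous = total
    return per_day


if __name__ == "__main__":
    # Made-up "last N days" totals for N = 1..5.
    windows = [1200, 2400, 3550, 4050, 4550]
    print(per_day_from_cumulative(windows))
```

Note that the most recent window may cover only a partial day, so the first delta can understate the real rate.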
This just blew up. Counts are for any Firefox version. Date lags actual crash date by a day.
If _we_ didn't ship, and _they_ didn't ship given the old versions involved, could it be communication with their service (say, a signature update)? Or maybe a malware epidemic disabling their main windows service/executable, with their Firefox component unable to handle that gracefully?
Created attachment 404142 [details]
Crash data mined from DB
Got some more data from the DB via Lars...
* Looked for other crashes where "avg*.dll" was in one of the crashing stack's frames, ignoring crashes with a signature of RtlpCoalesceFreeBlocks. This turned up a nearly identical crash with RtlAllocateHeap as the signature. There are also fairly common crashes with a signature of avgssff.dll@0x9943 and avgtbapi.dll@0x7b26, but these look different than the issue in this bug. There is also a long-tail of crashes with a signature of avgcfgx.dll @ various addresses, which I didn't investigate.
Based on that, we're able to do more accurate DB searches for matching crashes... signature of 'RtlpCoalesceFreeBlocks' or 'RtlAllocateHeap', with a frame matching 'avg*.dll'.
* Massive spike in these crashes over the past few days. (Keep in mind that we only process 10% of reports, so actual crashes are roughly 10x these counts.)
1016 | 2009-09-25 00:00:00
1844 | 2009-09-26 00:00:00
6849 | 2009-09-27 00:00:00
24662 | 2009-09-28 00:00:00
86674 | 2009-09-29 00:00:00
39335 | 2009-09-30 00:00:00
Between the 9th and the 25th, reports were ~1000/day. Between August 13th and September 9th, reports were ~200/day. Between August 1st and 13th, there were almost no reports. There are some randomish peaks, but it basically looks like something started causing the problem on August 13th, it got mildly worse on September 9th, and exploded on the 25th.
* Looked at when crashing users had last upgraded Firefox. There's a peak of users crashing on the same day they upgraded their browser, but there are also large numbers of crashes for users having browsers up to 19 days old. So, it's unlikely that just upgrading the browser is a cause of the crashes.
* Looked at the total number of crashes since the 16th, broken down by browser version. The numbers are not normalized by actual installed base, but it looks like the crashes are widely spread across Firefox 3.0.x and 3.5.x versions. So it seems unlikely that we broke something in a specific update that caused these crashes.
* Looked at the uptime reported with crashes. Most crashes are within 15 seconds of launching Firefox, the overwhelming majority are within 1 minute, and there's a trickle of reports with longer uptimes.
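The matching rule used for these searches (signature of RtlpCoalesceFreeBlocks or RtlAllocateHeap, plus a frame matching avg*.dll) can be sketched as below, along with the 10x scaling for the 10% processing rate. The record layout (date, signature, list of frame modules) and the function names are assumptions for illustration; the real queries were run against the crash DB.

```python
from collections import Counter
from fnmatch import fnmatchcase

HEAP_SIGNATURES = {"RtlpCoalesceFreeBlocks", "RtlAllocateHeap"}
SAMPLING_FACTOR = 10  # only ~10% of reports are processed


def is_avg_heap_crash(signature, frame_modules):
    """True if the signature is a heap one and any frame module matches avg*.dll."""
    if signature not in HEAP_SIGNATURES:
        return False
    return any(fnmatchcase(m.lower(), "avg*.dll") for m in frame_modules)


def estimated_crashes_per_day(reports):
    """reports: iterable of (date, signature, frame_modules) tuples (assumed shape)."""
    counts = Counter(
        date for date, sig, mods in reports if is_avg_heap_crash(sig, mods)
    )
    # Scale processed counts up to an estimate of actual crashes.
    return {date: n * SAMPLING_FACTOR for date, n in counts.items()}


if __name__ == "__main__":
    sample = [
        ("2009-09-25", "RtlpCoalesceFreeBlocks", ["ntdll.dll", "msvcr80.dll", "avglngx.dll"]),
        ("2009-09-25", "RtlAllocateHeap", ["ntdll.dll", "AVGCFGX.DLL"]),
        ("2009-09-25", "RtlAllocateHeap", ["ntdll.dll", "xul.dll"]),
    ]
    print(estimated_crashes_per_day(sample))
```

Module names are lowercased before matching because the crash reports mix cases (avglngx.dll vs. AVGCFGX.DLL).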
Also, we've reached out to contacts at AVG to involve them in this investigation.
We are not aware of any changes or program updates that could have happened from the 25th. As this is a memory corruption problem, would it be possible to get a memory dump with page heap enabled so we could pinpoint the location of the first corruption?
We don't gather heap data in the dumps. The available data is basically what's in reports like this: bp-f3e7a8d1-c9b1-4e77-a8fc-4cbb62091002 (random example of this bug). We also record the current URL and stack, but that isn't publicly accessible (for privacy/security reasons). I can get that data if it's needed, though it doesn't seem useful here.
The RtlAllocateHeap signature took a crazy jump yesterday.
391 total crashes for RtlAllocateHeap on 20091010-crashdata.csv
393 total crashes for RtlAllocateHeap on 20091011-crashdata.csv
455 total crashes for RtlAllocateHeap on 20091012-crashdata.csv
439 total crashes for RtlAllocateHeap on 20091013-crashdata.csv
436 total crashes for RtlAllocateHeap on 20091014-crashdata.csv
430 total crashes for RtlAllocateHeap on 20091015-crashdata.csv
487 total crashes for RtlAllocateHeap on 20091016-crashdata.csv
433 total crashes for RtlAllocateHeap on 20091017-crashdata.csv
467 total crashes for RtlAllocateHeap on 20091018-crashdata.csv
1645 total crashes for RtlAllocateHeap on 20091019-crashdata.csv
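One rough way to put a number on the jump is to compare the last day against the median of the preceding days. The counts are the ones listed above; the function name and the choice of median as the baseline are just one reasonable option.

```python
from statistics import median

# RtlAllocateHeap totals for 20091010 through 20091019, from the list above.
daily_counts = [391, 393, 455, 439, 436, 430, 487, 433, 467, 1645]


def spike_ratio(counts):
    """Ratio of the last day's count to the median of the preceding days."""
    baseline = median(counts[:-1])
    return counts[-1] / baseline


if __name__ == "__main__":
    print(round(spike_ratio(daily_counts), 2))
```

For these numbers the baseline is 436, so the last day is a bit under 4x the usual volume.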
whatever is happening seems to render firefox, and maybe all browsers, useless with repeated crashes or the inability to even start up, according to some comments:
no browsers are working
Firefox 3.0.14 Windows NT 5.1.2600 Service Pack 3
After installing Google Chrome, I put it as default browser. I can still use firefox and IE then. After I reboot my computer, I can't access firefox or IE, only Chrome can be used
I only can use firefox through safe mode
Both dumps have a different signature (a crash inside avgcfgx) than the original report in comment #7.
Anyway, the AVG modules are outdated; the first update with higher build numbers (406) was released on June 27th. That means the users' installations have not been updated for some reason since then. The current build available is 423.
The only advice now is to try to update or reinstall AVG with the up-to-date installation package.
there are a couple of things that we might do to get more updates/reinstalls happening.
1) block the old versions if we can
2) get a support article on sumo that spotlights this as a needed step for people that are crashing.
:cww cc'ed to get number 2 going.
is blocking possible?
AVG says the updated version with the bumped version number is being released today/tomorrow, so we should be good to start blocklisting in a week or so.
Where are we on blocklisting?
It looks like the blocklisting (bug 527135) has greatly reduced the numbers of these crashes, though coming out of a US holiday it might be hard to tell. Randomly clicking on recent crash reports makes me think it's a smaller fraction of reasons for this signature, too.
Since this often seems to manifest as a startup crash, it would probably help to update the shipped copy of blocklist.xml, which should make it go away entirely for updated users.
I'll file a bug to do that, and after a bit more time run a DB query to get a more accurate view of crashes-per-day that are definitely AVG-related.
I took a crack at trying to sort out percentages of the two signatures in the bug title across the main releases that we are tracking. it looks like the RtlpCoalesceFreeBlocks crash in 3.6b4 is the highest volume problem that still remains. it made up about 1.2% of all 3.6b4 crashes on 2009-11-30
checking --- 20091130-crashdata.csv RtlAllocateHeap
total 233706 488 0.00208809
3.0.15 50334 102 0.00202646
3.5.5 122547 144 0.00117506
3.6b4 16576 14 0.000844595
3.6b3 2703 4 0.00147984
3.6b2 1193 0
3.6b1 2776 3 0.00108069
host-5-95:crashdata chofmann$ ./daily-flash-counts.sh RtlpCoalesceFreeBlocks 20091130*
checking --- 20091130-crashdata.csv RtlpCoalesceFreeBlocks
total 233706 2008 0.00859199
3.0.15 50334 290 0.00576151
3.5.5 122547 956 0.00780109
3.6b4 16576 202 0.0121863
3.6b3 2703 25 0.00924898
3.6b2 1193 5 0.00419111
3.6b1 2776 54 0.0194524
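For anyone reading the tables above: the columns appear to be version, total crashes for that version, crashes matching the signature, and the ratio of the two. The sketch below recomputes that last column from the RtlpCoalesceFreeBlocks rows; it is not the daily-flash-counts.sh script itself, just the same arithmetic under that reading of the columns.

```python
# (version, total crashes, crashes with the signature), from the 20091130 table above.
rows = [
    ("total", 233706, 2008),
    ("3.0.15", 50334, 290),
    ("3.5.5", 122547, 956),
    ("3.6b4", 16576, 202),
    ("3.6b3", 2703, 25),
    ("3.6b2", 1193, 5),
    ("3.6b1", 2776, 54),
]


def signature_share(total, matching):
    """Fraction of a version's crashes that carry the signature."""
    return matching / total if total else 0.0


if __name__ == "__main__":
    for version, total, matching in rows:
        print(version, total, matching, f"{signature_share(total, matching):.6g}")
```

The 3.6b4 row works out to roughly 0.0122, which is where the ~1.2% figure comes from.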
still a pretty high incoming rate per day
checking --- 20100110-crashdata.csv RtlAllocateHeap
all 215352 410 0.00190386
3.0.15 1958 4 0.0020429
3.0.16 4940 17 0.0034413
3.5.5 5270 8 0.00151803
3.5.6 16510 22 0.00133253
3.5.7 102593 137 0.00133537
3.6 10425 38 0.00364508
3.6b5 14884 26 0.00174684
3.6b4 1505 0
3.6b3 721 0
3.6b2 729 0
3.6b1 2102 0
checking --- 20100110-crashdata.csv RtlpCoalesceFreeBlocks
all 215352 1682 0.00781047
3.0.15 1958 5 0.00255363
3.0.16 4940 21 0.00425101
3.5.5 5270 25 0.00474383
3.5.6 16510 88 0.0053301
3.5.7 102593 701 0.00683282
3.6 10425 409 0.0392326
3.6b5 14884 232 0.0155872
3.6b4 1505 19 0.0126246
3.6b3 721 12 0.0166436
3.6b2 729 5 0.00685871
3.6b1 2102 39 0.0185538
happens pretty close to startup.
1246 total crashes for avgxpl.dll@0x23a74 on 20100220-crashdata.csv
990 start up crashes inside 30 seconds of startup
1087 start up crashes inside 3 minutes of startup
looks like it blew up a day after the release of 3.0.18 and 3.5.8 last week, but that might be coincidence since it also seems to be hitting 3.6 in high volume too. maybe an update to avg happened around the same time.
20100215-crashdata 1 avgxpl.dll@0x23a74
20100216-crashdata 0 avgxpl.dll@0x23a74
20100217-crashdata 0 avgxpl.dll@0x23a74
20100218-crashdata 3178 avgxpl.dll@0x23a74
20100219-crashdata 2416 avgxpl.dll@0x23a74
20100220-crashdata 1246 avgxpl.dll@0x23a74
checking --- 20100218-crashdata.csv avgxpl.dll@0x23a74
all 260226 3178 0.0122125
3.0.15 1147 29 0.0252833
3.0.16 484 18 0.0371901
3.0.17 10576 349 0.0329992
3.0.18 5315 196 0.0368768
3.5.5 3143 23 0.00731785
3.5.6 1777 40 0.0225098
3.5.7 77424 1230 0.0158865
3.5.8 37178 577 0.0155199
3.6 92031 487 0.0052917
er, that last comment belongs in a new avg bug: bug 547210
This is actually a duplicate of bug 519430, but was originally filed against the older AVG 8.5 code. All 8.5 users are being migrated to 9.0, which will solve this problem in the next update.
also topcrash in thunderbird
We're now tracking such bugs. This doesn't mean it's something we can fix, merely something we hope to be able to point vendors to so they can investigate. This is an automated message.
Based on comment 28, I'm closing this as WORKSFORME.