Bug 519340 - Crash with AVG Safe Search [@ RtlpCoalesceFreeBlocks] and [@ RtlAllocateHeap]
Status: RESOLVED WORKSFORME
Whiteboard: [crashkill][crashkill-thirdparty][cra...
Keywords: crash, user-doc-complete
Product: Plugins Graveyard
Classification: Graveyard
Component: AVG AV (show other bugs)
Version: unspecified
Platform: x86 Windows XP
Importance: P2 critical
Target Milestone: ---
Assigned To: Greg Mosher
Mentors:
Depends on: 527135 531712 720655
Blocks: 520484
Reported: 2009-09-28 16:44 PDT by Justin Dolske [:Dolske]
Modified: 2016-04-28 09:04 PDT (History)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Crashes per day (1.46 KB, text/plain)
2009-09-30 18:12 PDT, Justin Dolske [:Dolske]
Crash data mined from DB (33.67 KB, text/plain)
2009-10-01 15:36 PDT, Justin Dolske [:Dolske]

Description Justin Dolske [:Dolske] 2009-09-28 16:44:58 PDT
Looking at crash reports, a high proportion of crashes with these signatures seem to directly involve AVG. RtlpCoalesceFreeBlocks is apparently a sign of a corrupted heap, so we should try to repro with AVG + valgrind/purify to see what the root cause is here. (Possibly an allocator mismatch, too?)
Comment 1 Damon Sicore (:damons) 2009-09-29 16:06:29 PDT
Blocking 1.9.2+ per CrashKill effort.
Comment 2 Matthew Middleton (:zzxc) 2009-09-29 16:38:06 PDT
A user on support.mozilla.com confirmed that disabling AVG Safe Search 8.5 fixed the problem.

Bug 513329 covers this crash stack for reasons other than AVG.
Comment 3 Justin Dolske [:Dolske] 2009-09-29 17:39:49 PDT
Looks like this general crash signature is WinXP only. Of the 8566 matching crashes in the last week, just under half are "Windows NT 5.1.2600 Service Pack 2" and the rest are "Windows NT 5.1.2600 Service Pack 3". Only *8* reports are different (mostly slight variations of SP2/3).
Comment 4 Mike Beltzner [:beltzner, not reading bugmail] 2009-09-30 07:32:09 PDT
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
Comment 5 Matthew Middleton (:zzxc) 2009-09-30 11:27:56 PDT
The blocklist.xml shipped with the application (http://mxr.mozilla.org/mozilla-central/source/browser/app/blocklist.xml) does not yet include this update, so new users who crash on startup won't be protected.
Comment 6 Matthew Middleton (:zzxc) 2009-09-30 15:23:37 PDT
This has been one of the most frequently reported crashes this week on support.
Comment 7 Justin Dolske [:Dolske] 2009-09-30 17:35:38 PDT
Uptime on the crashreports seems to be an easy way to identify when the crash is due to AVG vs. other problems (bug 513329)... Almost every crash with an uptime under 2 minutes is AVG related, and the longer uptimes are almost always not involving AVG. Roughly 80% of the AVG crashes are < 15 seconds.

All of the AVG crashes have an identical crashing thread:

0   ntdll.dll    RtlpCoalesceFreeBlocks  	
1   msvcr80.dll  malloc
2   msvcr80.dll  operator new
3   avglngx.dll  avglngx.dll@0x28404 	
4   avglngx.dll  avglngx.dll@0x25dae 	
5   avglngx.dll  avglngx.dll@0x25faf 	
6   avglngx.dll  avglngx.dll@0x260a4 	
7   avglngx.dll  avglngx.dll@0x263f2 	
8                @0x3031322b

Frame 8 seems to be garbage. It's randomly different in each report, but the other frames have identical addresses.

There's commonality with module versions. Many of the avg*.dll versions are the same, the only differences I saw were that AVGCFGX.DLL can be version 8.5.0.371 or 8.5.0.384, and AVGLOGX.DLL can be version 8.5.0.317 or 8.5.0.380.

A current install of AVG Free seems to work just fine. It gives me an AVGCFGX.DLL with version 8.5.0.401 (which isn't in any of the crash reports, so the most recent AVG doesn't seem to be involved with this particular crash). The earliest mention on Google of versions .371 and .384 was in July 2009, so it appears the people crashing in Firefox are using an AVG version that's a month or two out of date. Did we see any support spikes in July for problems with AVG?

When installing AVG, an "AVG Safe Search" extension is added to Firefox automatically, and an "AVG Security Toolbar" extension is optionally added (the installer asks if you want it, it's checked by default). One crashreport has a list of extensions pasted in, with Toolbar v.2.506.026.001. My newly installed AVG gives version 2.507.024.001. However, only some of the AVG crashes include IGeared_tavgp_xputils35.dll in the module list. That's the component used by the toolbar, so this implies the toolbar alone isn't to blame.

The install.rdf version in the "AVG Safe Search" addon is just "8.5", so we can't blocklist this unless we're willing to blocklist all AVG installs. But if we can get them to start using real version numbers in their next update, we could add 8.5 to the blocklist once most users have updated to it. The extension includes an AVGSSFF.DLL component, which is version 8.5.0.310 in all the crash reports but 8.5.0.401 in my new install. If we could blocklist based on component DLL versions, that might help...

But since these are startup crashes, it seems unlikely that users will get blocklist updates (or even Firefox app updates) before they crash, as zzxc noted. So that makes it rather difficult to do anything for users currently hitting this problem. :(
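The uptime heuristic described in this comment can be sketched roughly as follows. This is only an illustration: the `CrashReport` class and its fields are hypothetical and do not reflect the real crash-stats schema, and the 120-second cutoff is simply the "under 2 minutes" figure quoted above.

```python
from dataclasses import dataclass

# Cutoff taken from the comment above: crashes with an uptime under
# 2 minutes were almost always AVG-related.
AVG_UPTIME_CUTOFF_SECONDS = 120


@dataclass
class CrashReport:
    """Hypothetical, simplified report; not the real crash-stats schema."""
    signature: str
    uptime_seconds: int


def likely_avg_related(report: CrashReport) -> bool:
    """Heuristic only: a short uptime suggests the AVG startup crash."""
    return report.uptime_seconds < AVG_UPTIME_CUTOFF_SECONDS


# Made-up sample reports, for illustration only.
reports = [
    CrashReport("RtlpCoalesceFreeBlocks", 9),
    CrashReport("RtlpCoalesceFreeBlocks", 45),
    CrashReport("RtlpCoalesceFreeBlocks", 5400),
]
labels = [likely_avg_related(r) for r in reports]
```

A heuristic like this only sorts reports into "probably AVG" and "probably bug 513329"; confirming a match would still require looking at the crashing thread's frames.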
Comment 8 [:Cww] 2009-09-30 18:05:55 PDT
We may be able to update the blocklist that comes included with Firefox so that we can ask users to download the latest version of Firefox with this version of AVG blocklisted.
Comment 9 Justin Dolske [:Dolske] 2009-09-30 18:12:26 PDT
Created attachment 403928
Crashes per day

Hmm, this is an interesting trend.

I did a manual search on crash-stats for 3.5.3 crashes matching this signature, in the last 1 day, 2 day, 3 days, etc; and then looked at the deltas to determine crashes per day. For the 2 weeks before Sep 26th, there were ~500 crashes/day, and then in the last few days this has suddenly spiked up to 1200/day.

Lars ran some queries on the DB to confirm this. Additionally, if you filter out crashes with an uptime > 120 seconds, the spike is even more dramatic -- about a 15x increase in crashes over the last few days. And maybe higher, as the data for the 28th may not be a full day.

I wonder what happened on the 26th/27th to cause this spike?
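The "look at the deltas" step above amounts to differencing cumulative window totals. A minimal sketch, assuming the search results are collected as totals for the last 1 day, 2 days, 3 days, and so on (the numbers below are made up and are not the real crash-stats figures):

```python
def crashes_per_day(window_totals):
    """Convert cumulative totals into per-day counts.

    window_totals[i] is the number of crashes seen in the last (i + 1) days,
    so the result lists the most recent day first.
    """
    per_day = []
    previous = 0
    for total in window_totals:
        per_day.append(total - previous)
        previous = total
    return per_day


# Made-up totals for "last 1 day", "last 2 days", ... "last 5 days".
window_totals = [1200, 2400, 3500, 4000, 4500]
daily = crashes_per_day(window_totals)
```

With these sample totals, `daily` comes out as `[1200, 1200, 1100, 500, 500]`, i.e. a recent jump over a lower baseline, which is the shape of trend described above.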
Comment 10 Bob Clary [:bc:] 2009-10-01 05:42:05 PDT
This just blew up. Counts are for any Firefox version. Date lags actual crash date by a day.

20090926:820
20090927:1076
20090928:2319
20090929:5811
20090930:17918
Comment 11 Daniel Veditz [:dveditz] 2009-10-01 11:06:00 PDT
_We_ didn't ship, and _they_ didn't ship given the old versions involved, so could it be communication with their service (say, a signature update)? Or maybe a malware epidemic disabling their main Windows service/executable, with their Firefox component unable to handle that gracefully?
Comment 12 Justin Dolske [:Dolske] 2009-10-01 15:36:34 PDT
Created attachment 404142
Crash data mined from DB

Got some more data from the DB via Lars...

Summary:

* Looked for other crashes where "avg*.dll" was in one of the crashing stack's frames, ignoring crashes with a signature of RtlpCoalesceFreeBlocks. This turned up a nearly identical crash with RtlAllocateHeap as the signature. There are also fairly common crashes with a signature of avgssff.dll@0x9943 and avgtbapi.dll@0x7b26, but these look different than the issue in this bug. There is also a long-tail of crashes with a signature of avgcfgx.dll @ various addresses, which I didn't investigate.

Based on that, we're able to do more accurate DB searches for matching crashes... signature of 'RtlpCoalesceFreeBlocks' or 'RtlAllocateHeap', with a frame matching 'avg*.dll'.

* Massive spike in these crashes over the past few days. (Keep in mind that we only process 10% of reports, so actual crashes are roughly 10x these counts.)

 count   date
  1016 | 2009-09-25 00:00:00
  1844 | 2009-09-26 00:00:00
  6849 | 2009-09-27 00:00:00
 24662 | 2009-09-28 00:00:00
 86674 | 2009-09-29 00:00:00
 39335 | 2009-09-30 00:00:00

Between the 9th and the 25th, reports were ~1000/day. Between August 13th and September 9th, reports were ~200/day. Between August 1st and 13th, there were almost no reports. There are some randomish peaks, but it basically looks like something started causing the problem on August 13th, it got mildly worse on September 9th, and exploded on the 25th.

* Looked at when crashing users had last upgraded Firefox. There's a peak of users crashing on the same day they upgraded their browser, but there are also large numbers of crashes for users having browsers up to 19 days old. So, it's unlikely that just upgrading the browser is a cause of the crashes.

* Looked at the total number of crashes since the 16th, broken down by browser version. The numbers are not normalized by actual installed base, but it looks like the crashes are widely spread across Firefox 3.0.x and 3.5.x versions. So it seems unlikely that we broke something in a specific update that caused these crashes.

* Looked at the uptime reported with crashes. Most crashes are within 15 seconds of launching Firefox, the overwhelming majority are within 1 minute, and there's a trickle of reports with longer uptimes.
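The refined search criteria from this summary (a signature of 'RtlpCoalesceFreeBlocks' or 'RtlAllocateHeap', plus a frame matching 'avg*.dll') could be expressed roughly like this. It is a sketch under assumptions: the function names and the idea of passing a plain list of module names are hypothetical, and the actual DB queries were not posted in this bug.

```python
from fnmatch import fnmatchcase

# The two signatures named in the summary above.
HEAP_SIGNATURES = {"RtlpCoalesceFreeBlocks", "RtlAllocateHeap"}

# Only about 10% of submitted reports are processed, per the summary above.
SAMPLING_FACTOR = 10


def is_avg_heap_crash(signature, frame_modules):
    """True if the signature is one of the heap signatures and at least one
    frame of the crashing stack comes from a module matching avg*.dll."""
    if signature not in HEAP_SIGNATURES:
        return False
    # Lowercase first so AVGCFGX.DLL and avgcfgx.dll are treated alike.
    return any(fnmatchcase(module.lower(), "avg*.dll") for module in frame_modules)


def estimated_actual_crashes(processed_count):
    """Scale a processed-report count up to a rough real-world estimate."""
    return processed_count * SAMPLING_FACTOR


# Module column of the crashing thread quoted in comment 7.
comment7_modules = ["ntdll.dll", "msvcr80.dll", "msvcr80.dll", "avglngx.dll"]
matches = is_avg_heap_crash("RtlpCoalesceFreeBlocks", comment7_modules)
peak_estimate = estimated_actual_crashes(86674)
```

Note that crashes such as avgssff.dll@0x9943 are deliberately excluded by the signature check, since the summary treats them as a different issue.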
Comment 13 Justin Dolske [:Dolske] 2009-10-01 15:37:44 PDT
Also, we've reached out to contacts at AVG to involve them in this investigation.
Comment 14 Petr Prazak 2009-10-02 08:53:55 PDT
We are not aware of any changes or program updates that could have happened since the 25th. As this is a memory corruption problem, would it be possible to get a memory dump with page heap enabled so we could pinpoint the location of the first corruption?
Comment 15 Justin Dolske [:Dolske] 2009-10-02 18:38:04 PDT
We don't gather heap data in the dumps. The available data is basically what's in reports like this: bp-f3e7a8d1-c9b1-4e77-a8fc-4cbb62091002 (random example of this bug). We also record the current URL and stack, but that isn't publicly accessible (for privacy/security reasons). I can get that data if it's needed, though it doesn't seem useful here.
Comment 16 chris hofmann 2009-10-20 21:51:52 PDT
The RtlAllocateHeap signature took a crazy jump yesterday.

391   total crashes for RtlAllocateHeap on 20091010-crashdata.csv
393   total crashes for RtlAllocateHeap on 20091011-crashdata.csv
455   total crashes for RtlAllocateHeap on 20091012-crashdata.csv
439   total crashes for RtlAllocateHeap on 20091013-crashdata.csv
436   total crashes for RtlAllocateHeap on 20091014-crashdata.csv
430   total crashes for RtlAllocateHeap on 20091015-crashdata.csv
487   total crashes for RtlAllocateHeap on 20091016-crashdata.csv
433   total crashes for RtlAllocateHeap on 20091017-crashdata.csv
467   total crashes for RtlAllocateHeap on 20091018-crashdata.csv
1645   total crashes for RtlAllocateHeap on 20091019-crashdata.csv


Whatever is happening seems to render Firefox, and maybe all browsers, useless, with repeated crashes or the inability to even start up, according to some comments:

RtlAllocateHeap
        http://crash-stats.mozilla.com/report/index/d008f733-2c7d-4b81-94ae-a96172091019
        no browsers are working        
        Firefox 3.0.14 Windows NT 5.1.2600 Service Pack 3

RtlAllocateHeap
        http://crash-stats.mozilla.com/report/index/f8fb2ab9-d307-4d2e-94e7-bfccc2091019
        After installing Google Chrome, I put it as default browser. I can still use firefox and IE then. After I reboot my computer, I can't access firefox or IE, only Chrome can be used. I only can use firefox through safe mode
Comment 17 Petr Prazak 2009-10-21 02:14:16 PDT
Both dumps have a different signature (a crash inside avgcfgx) than the original report in comment #7.
Anyway, the AVG modules are outdated; the first update with higher build numbers (406) was released on June 27th. That means the users' installations have not been updated since then for some reason. The current build available is 423.

The only advice now is to try to update or reinstall AVG with the up-to-date installation package.
Comment 18 chris hofmann 2009-10-21 08:27:31 PDT
There are a couple of things that we might do to get more updates/reinstalls happening.

1) block the old versions if we can

2) get a support article on sumo that spotlights this as needed for people that are crashing.

:cww cc'ed to get number 2 going.

Is blocking possible?
Comment 20 Justin Dolske [:Dolske] 2009-11-02 16:17:31 PST
AVG says the updated version with the bumped version number is being released today/tomorrow, so we should be good to start blocklisting in a week or so.
Comment 21 Samuel Sidler (old account; do not CC) 2009-11-16 16:12:25 PST
Where are we on blocklisting?
Comment 22 Damon Sicore (:damons) 2009-11-23 16:16:26 PST
1.9.2-.
Comment 23 Justin Dolske [:Dolske] 2009-11-29 17:45:15 PST
It looks like the blocklisting (bug 527135) has greatly reduced the numbers of these crashes, though coming out of a US holiday it might be hard to tell. Randomly clicking on recent crash reports makes me think it's a smaller fraction of reasons for this signature, too.

Since this often seems to manifest as a startup crash, it would probably help to update the shipped copy of blocklist.xml, which should make it go away entirely for updated users.

I'll file a bug to do that, and after a bit more time run a DB query to get a more accurate view of crashes-per-day that are definitely AVG-related.
Comment 24 chris hofmann 2009-12-01 12:12:49 PST
I took a crack at trying to sort out percentages of the two signatures in the bug title across the main releases that we are tracking. It looks like the RtlpCoalesceFreeBlocks crash in 3.6b4 is the highest-volume problem that still remains; it made up about 1.2% of all 3.6b4 crashes on 2009-11-30.


checking --- 20091130-crashdata.csv RtlAllocateHeap
release	total-crashes	RtlAllocateHeap-crashes	fraction
total	233706	488	0.00208809
3.0.15	50334	102	0.00202646
3.5.5	122547	144	0.00117506
3.6b4	16576	14	0.000844595
3.6b3	2703	4	0.00147984
3.6b2	1193		0
3.6b1	2776	3	0.00108069
host-5-95:crashdata chofmann$ ./daily-flash-counts.sh   RtlpCoalesceFreeBlocks 20091130*

checking --- 20091130-crashdata.csv RtlpCoalesceFreeBlocks
release	total-crashes	RtlpCoalesceFreeBlocks-crashes	fraction
total	233706	2008	0.00859199
3.0.15	50334	290	0.00576151
3.5.5	122547	956	0.00780109
3.6b4	16576	202	0.0121863
3.6b3	2703	25	0.00924898
3.6b2	1193	5	0.00419111
3.6b1	2776	54	0.0194524
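The last column in these tables is the signature's crash count divided by the release's total crash count (a fraction, so 0.0121863 is about 1.2%). A small sketch of that arithmetic, using three rows copied from the RtlpCoalesceFreeBlocks table above; the function name is made up for illustration and is not part of the script shown in the shell prompt:

```python
def crash_fraction(signature_crashes, total_crashes):
    """Share of a release's crashes that carry a given signature."""
    if total_crashes == 0:
        return 0.0
    return signature_crashes / total_crashes


# (total crashes, RtlpCoalesceFreeBlocks crashes) for 2009-11-30,
# copied from the table above.
rows = {
    "3.0.15": (50334, 290),
    "3.5.5": (122547, 956),
    "3.6b4": (16576, 202),
}
fractions = {
    release: crash_fraction(hits, total)
    for release, (total, hits) in rows.items()
}
percent_36b4 = round(fractions["3.6b4"] * 100, 1)
```

Because the counts are not normalized by installed base, these fractions compare signatures within a release rather than crash rates between releases.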
Comment 25 chris hofmann 2010-01-11 20:20:21 PST
Still a pretty high incoming rate per day.

checking --- 20100110-crashdata.csv RtlAllocateHeap
release total-crashes  RtlAllocateHeap-crashes  fraction
all     215352  410     0.00190386
3.0.15  1958    4       0.0020429
3.0.16  4940    17      0.0034413
3.5.5   5270    8       0.00151803
3.5.6   16510   22      0.00133253
3.5.7   102593  137     0.00133537
3.6     10425   38      0.00364508
3.6b5   14884   26      0.00174684
3.6b4   1505            0
3.6b3   721             0
3.6b2   729             0
3.6b1   2102            0


checking --- 20100110-crashdata.csv RtlpCoalesceFreeBlocks
release total-crashes  RtlpCoalesceFreeBlocks-crashes  fraction
all     215352  1682    0.00781047
3.0.15  1958    5       0.00255363
3.0.16  4940    21      0.00425101
3.5.5   5270    25      0.00474383
3.5.6   16510   88      0.0053301
3.5.7   102593  701     0.00683282
3.6     10425   409     0.0392326
3.6b5   14884   232     0.0155872
3.6b4   1505    19      0.0126246
3.6b3   721     12      0.0166436
3.6b2   729     5       0.00685871
3.6b1   2102    39      0.0185538
Comment 26 chris hofmann 2010-02-21 16:38:06 PST
Happens pretty close to startup.

1246 total crashes for avgxpl.dll@0x23a74 on 20100220-crashdata.csv
990 start up crashes inside 30 seconds of startup
1087 start up crashes inside 3 minutes of startup


Looks like it blew up a day after the release of 3.0.18 and 3.5.8 last week, but that might be coincidence since it also seems to be hitting 3.6 in high volume too. Maybe an update to AVG happened around the same time.

date               crashes signature
20100215-crashdata 1 avgxpl.dll@0x23a74
20100216-crashdata 0 avgxpl.dll@0x23a74
20100217-crashdata 0 avgxpl.dll@0x23a74
20100218-crashdata 3178 avgxpl.dll@0x23a74
20100219-crashdata 2416 avgxpl.dll@0x23a74
20100220-crashdata 1246 avgxpl.dll@0x23a74


checking --- 20100218-crashdata.csv avgxpl.dll@0x23a74
release total-crashes  avgxpl.dll@0x23a74-crashes  fraction
all     260226  3178    0.0122125
3.0.15  1147    29      0.0252833
3.0.16  484     18      0.0371901
3.0.17  10576   349     0.0329992
3.0.18  5315    196     0.0368768
3.5.5   3143    23      0.00731785
3.5.6   1777    40      0.0225098
3.5.7   77424   1230    0.0158865
3.5.8   37178   577     0.0155199
3.6     92031   487     0.0052917
Comment 27 chris hofmann 2010-02-21 16:39:46 PST
Er, that last comment belongs in a new AVG bug: bug 547210.
Comment 28 Greg Mosher 2010-05-12 13:07:08 PDT
This is actually a duplicate of bug 519430, but was originally filed against the older AVG 8.5 code. All 8.5 users are being migrated to 9.0 in the next update, which will solve this problem.
Comment 29 Wayne Mery (:wsmwk, NI for questions) 2011-01-13 11:15:28 PST
Also a topcrash in Thunderbird.
Comment 30 timeless 2011-03-28 14:47:42 PDT
We're now tracking such bugs. This doesn't mean it's something we can fix, merely something we hope to be able to point vendors to so they can investigate. This is an automated message.
Comment 31 Scoobidiver (away) 2012-03-29 00:13:13 PDT
Based on comment 28, I'm closing this as WORKSFORME.
