Closed Bug 519458 Opened 15 years ago Closed 8 years ago

Crash [@ FreeEEInfoChain(tagExtendedErrorInfo*)]

Categories

(Firefox :: General, defect, P2)

3.5 Branch
x86
Windows XP
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cbook, Unassigned)

Details

(Keywords: crash, Whiteboard: [crashkill][crashkill-thirdparty])

Crash Data

TopCrash from the Topcrash statistics 

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.3&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=FreeEEInfoChain%28tagExtendedErrorInfo*%29

From http://support.mozilla.com/tiki-view_forum_thread.php?locale=en-US&comments_parentId=450259&forumId=1, one comment mentions: "I disabled McAfee SiteAdvisor and Firefox never crashed anymore. There seemed to be a new update to SiteAdvisor as the interface changed. I guess the new SiteAdvisor update did not work well with Firefox 3.5.3."

investigating
Flags: blocking-firefox3.6?
Flags: blocking-firefox3.6? → blocking-firefox3.6+
Priority: -- → P2
I looked at a dozen or so reports from today; almost all had Cooliris as the top frame's module for the crashing thread. Not all, though. And dbaron's module correlation report shows that while Cooliris is the top one, others have suspicious correlations too (eg Google Toolbar).
(In reply to comment #1)
> I looked at a dozen or so reports from today; almost all had Cooliris as the
> top frame's module for the crashing thread. Not all, though. And dbaron's
> module correlation report shows that while Cooliris is the top one, others have
> suspicious correlations too (eg Google Toolbar).

installed also google toolbar and cooliris and running some tests
(In reply to comment #2)
> (In reply to comment #1)
> > I looked at a dozen or so reports from today; almost all had Cooliris as the
> > top frame's module for the crashing thread. Not all, though. And dbaron's
> > module correlation report shows that while Cooliris is the top one, others have
> > suspicious correlations too (eg Google Toolbar).
> 
> installed also google toolbar and cooliris and running some tests

not reproducible so far :(
If there are DLLs being loaded in these crashes that are coming from direct components-dir placement, please mark this as depending on "compdir-lockdown".

Have we contacted the Cooliris guys?  They might know something based on the stacks in there, and they've been responsive.
Kev: could you advise us as to a CoolIris contact to ping, here, or cc them to the bug with a reference to comment 1?
Whiteboard: [crashkill][crashkill-outreach]
My product mgmt contacts for CoolIris seem to be a little out of date. Pinging the marketing team for a dev/eng contact.
nick/fligtar?
or alastair has helped on some recent bugs.
Assignee: nobody → beltzner
We've got a contact now, and they've asked for more information, specifically when we started seeing these crashes, and whether full crash dumps can be made available.
For the latter, we only have minidumps, and we cannot make them available without significant legal arrangement; they are extremely sensitive from a privacy perspective.
Right now we do not have an internal repro.  If you know the URLs where this crash is happening, that would be useful in narrowing down a repro.

A minidump would be fine.  At least we could load symbols and see where in our code this is occurring.  

We completely understand the privacy issues; at this point I am open to suggestions on how to proceed.
fairly correlated with startup/session restore

133 total crashes for FreeEEInfoChain on 20091109-crashdata.csv
56 start up crashes inside 3 minutes

os breakdown
 101 FreeEEInfoChain(tagExtendedErrorInfo*) Windows NT 5.1.2600 Service Pack 3
  32 FreeEEInfoChain(tagExtendedErrorInfo*) Windows NT 5.1.2600 Service Pack 2

distribution of all versions where the FreeEEInfoChain crash was found on 20091109-crashdata.csv
  81 Firefox 3.5.5
  15 Firefox 3.5.3
  15 Firefox 3.0.15
  12 Firefox 3.5.4
   3 Firefox 3.5.2
   2 Firefox 3.5.1
   2 Firefox 3.0.14
   2 Firefox 3.0.10
   1 Firefox 3.0.1

I don't see any 3.6b1 crashes in this sample...
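A breakdown like the version distribution above can be tallied with a short script. This is only a sketch: the column positions (signature in field 1, version in field 8) are taken from the awk one-liner posted later in this bug, and the tab-separated layout, the `read_crashdata` helper, and the exact-match rule are assumptions rather than a documented schema.

```python
import csv
from collections import Counter

SIGNATURE = "FreeEEInfoChain(tagExtendedErrorInfo*)"


def version_breakdown(rows, signature=SIGNATURE, sig_col=0, version_col=7):
    """Count crashes per version for one signature.

    rows: iterable of already-split records (lists of strings).
    sig_col / version_col are zero-based and assumed, not documented.
    """
    counts = Counter()
    for row in rows:
        # Skip short or malformed records instead of failing on them.
        if len(row) > max(sig_col, version_col) and row[sig_col] == signature:
            counts[row[version_col]] += 1
    # Highest count first, like the listing in the comment.
    return counts.most_common()


def read_crashdata(path):
    """Yield tab-separated records from one daily dump (hypothetical layout)."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
```

Usage would look like `version_breakdown(read_crashdata("20091109-crashdata.csv"))`. Note that the awk command in this bug matches the signature as a regular expression, while this sketch requires an exact match.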

for some reason we don't appear to be getting many urls on this one


sanitized url listing for 2009 11 09

  94 //
  10 \N//
   3 http://www.juegosjuegos.com/busqueda/tiros_de_faltas.html
   3 about:sessionrestore//
   3 about:blank//
   2 http://www.google.ca/ig
   1 https://www.claas-online.com
   1 http://www.youtube.com
   1 http://www.searchsave.com/search/Health-Insurance
   1 http://www.nytimes.com
   1 http://www.miniclip.com/games/gutterball/en/
   1 http://www.hostave3.net [query string removed]
   1 http://www.google.si
   1 http://www.cooliris.com
   1 http://www.apple.com/supplierresponsibility/auditing-compliance.html
   1 http://www.aol.com
   1 http://validator.w3.org/checklink?check=Check&hide_type=all&summary=on&uri=http%3A%2F%2Fen-us.www.mozilla.com%2Fen-US%2Ffirefox%2F3.5.5%2Fwhatsnew%2F
   1 http://prodigy.msn.com
   1 http://m.www.yahoo.com [removed]
   1 http://home.myspace.com  [removed]
   1 http://get.adobe.com/flashplayer/thankyou/xpi/?installer=Flash_Player_10_for_Windows_-_Other_Browsers&d=McAfee_Security_Scan
   1 http://game3.pogo.com
   1 http://batikunik.com
this appears to be a low volume crash going back past july/aug

26   total crashes for FreeEEInfoChain on 20090716-crashdata.csv
11   total crashes for FreeEEInfoChain on 20090717-crashdata.csv
18   total crashes for FreeEEInfoChain on 20090718-crashdata.csv
14   total crashes for FreeEEInfoChain on 20090719-crashdata.csv
15   total crashes for FreeEEInfoChain on 20090720-crashdata.csv
11   total crashes for FreeEEInfoChain on 20090721-crashdata.csv
12   total crashes for FreeEEInfoChain on 20090722-crashdata.csv
13   total crashes for FreeEEInfoChain on 20090723-crashdata.csv
6   total crashes for FreeEEInfoChain on 20090724-crashdata.csv
10   total crashes for FreeEEInfoChain on 20090725-crashdata.csv
7   total crashes for FreeEEInfoChain on 20090726-crashdata.csv
10   total crashes for FreeEEInfoChain on 20090727-crashdata.csv
11   total crashes for FreeEEInfoChain on 20090728-crashdata.csv
5   total crashes for FreeEEInfoChain on 20090729-crashdata.csv
15   total crashes for FreeEEInfoChain on 20090730-crashdata.csv
2   total crashes for FreeEEInfoChain on 20090805-crashdata.csv
2   total crashes for FreeEEInfoChain on 20090806-crashdata.csv
3   total crashes for FreeEEInfoChain on 20090807-crashdata.csv
2   total crashes for FreeEEInfoChain on 20090808-crashdata.csv
1   total crashes for FreeEEInfoChain on 20090809-crashdata.csv
6   total crashes for FreeEEInfoChain on 20090810-crashdata.csv
11   total crashes for FreeEEInfoChain on 20090811-crashdata.csv
8   total crashes for FreeEEInfoChain on 20090812-crashdata.csv
14   total crashes for FreeEEInfoChain on 20090813-crashdata.csv

then spike occurred around 8/13 or 14.   

110   total crashes for FreeEEInfoChain on 20090814-crashdata.csv
153   total crashes for FreeEEInfoChain on 20090815-crashdata.csv
164   total crashes for FreeEEInfoChain on 20090816-crashdata.csv
168   total crashes for FreeEEInfoChain on 20090817-crashdata.csv
185   total crashes for FreeEEInfoChain on 20090818-crashdata.csv
181   total crashes for FreeEEInfoChain on 20090819-crashdata.csv
205   total crashes for FreeEEInfoChain on 20090820-crashdata.csv
214   total crashes for FreeEEInfoChain on 20090821-crashdata.csv
205   total crashes for FreeEEInfoChain on 20090822-crashdata.csv
177   total crashes for FreeEEInfoChain on 20090823-crashdata.csv
174   total crashes for FreeEEInfoChain on 20090824-crashdata.csv
198   total crashes for FreeEEInfoChain on 20090825-crashdata.csv

and it has stayed between 131 and 264 through yesterday's data.

since the beginning of this month the pattern looks like

131   total crashes for FreeEEInfoChain on 20091101-crashdata.csv
140   total crashes for FreeEEInfoChain on 20091102-crashdata.csv
155   total crashes for FreeEEInfoChain on 20091103-crashdata.csv
164   total crashes for FreeEEInfoChain on 20091104-crashdata.csv
203   total crashes for FreeEEInfoChain on 20091105-crashdata.csv
135   total crashes for FreeEEInfoChain on 20091106-crashdata.csv
170   total crashes for FreeEEInfoChain on 20091107-crashdata.csv
130   total crashes for FreeEEInfoChain on 20091108-crashdata.csv
133   total crashes for FreeEEInfoChain on 20091109-crashdata.csv
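One way to pin down the start of the spike from daily totals like these is to compare each day against the median of the few days before it. The sketch below uses the counts quoted above for 20090810 through 20090816; the three-day window and the 5x threshold are arbitrary assumptions chosen for illustration, not values anyone in this bug used.

```python
from statistics import median

# Daily totals copied from the listing above (20090810 to 20090816).
DAILY_TOTALS = [
    ("20090810", 6),
    ("20090811", 11),
    ("20090812", 8),
    ("20090813", 14),
    ("20090814", 110),
    ("20090815", 153),
    ("20090816", 164),
]


def first_spike(totals, window=3, factor=5.0):
    """Return the first date whose count exceeds `factor` times the median
    of the previous `window` days, or None if no day qualifies."""
    for i in range(window, len(totals)):
        baseline = median(count for _, count in totals[i - window:i])
        date, count = totals[i]
        if baseline > 0 and count > factor * baseline:
            return date
    return None


print(first_spike(DAILY_TOTALS))  # prints 20090814
```

With these inputs the first flagged day is 20090814, which matches the "8/13 or 14" estimate in the comment.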
Whiteboard: [crashkill][crashkill-outreach] → [crashkill][crashkill-thirdparty]
(In reply to comment #11)

> A minidump would be fine.  At least we could load symbols and see where in our
> code this is occurring.  

Another possible option would be to give us symbol information for the Cooliris DLL, so that the crash reports we generate will show useful symbols. Though if these stacks are all the same, it may be easier for a developer on your end to manually match up the hex addresses in each frame to the function it's in.
(In reply to comment #14)

We have already done that so we know the general code path Cooliris is taking into GDIPlus.  Most stacks are the same, though some (such as this one: http://crash-stats.mozilla.com/report/index/8d5f68b8-0005-410f-a3db-b59a72091110) are clearly not Cooliris related, but just some other Windows component dying in the same RPC code.

Basically we're calling DrawImage(), but it crashes deep inside some Windows OS components.  It isn't clear from just that stack what exactly is going wrong since we don't see values for arguments, and whatever is going wrong doesn't seem to be a simple NULL pointer type issue.  Also, the stacks in the reports are not complete, e.g. Cooliris shouldn't be at the bottom of the thread 0 stack, so some context is missing.  

I'll look into the symbol-sharing idea.
Chris, your data is showing ~130-160 crashes per day, but the crash logs at the URL below are showing 10-20 per day.  Which is accurate?  Want to make sure we best understand when it really spiked...

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.3&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=FreeEEInfoChain%28tagExtendedErrorInfo*%29
the query you posted has the query strings Firefox 3.5.3 and range of one week.

The number you see of 10-20 crashes per day for 3.5.3 corresponds to the decline in Firefox users on 3.5.3 as they have upgraded to Firefox 3.5.4 and 3.5.5.

here is a profile of 3.5.3 crashes over the last twenty days

 126 20091020-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
 109 20091021-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
 144 20091022-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
 105 20091023-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
 100 20091024-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
 100 20091025-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
 141 20091026-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3

firefox 3.5.4 released and users started moving to that release

  98 20091027-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  73 20091028-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  40 20091029-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  43 20091030-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  15 20091031-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
   9 20091101-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  11 20091102-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  12 20091103-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  18 20091104-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  29 20091105-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3

firefox 3.5.5 released and more users moved to that release

   7 20091106-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  18 20091107-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  14 20091108-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3
  15 20091109-crashdata.csv FreeEEInfoChain(tagExtendedErrorInfo*) 3.5.3


if you change the query to work for 3.5.5 you will see the higher volume

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.5&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=FreeEEInfoChain%28tagExtendedErrorInfo*%29

or you can change range unit to 1 day and see the current daily volume for the latest release.

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.5&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=days&do_query=1&signature=FreeEEInfoChain%28tagExtendedErrorInfo*%29

the numbers I posted are for all releases so they should be a better reflection if the bug is the same across all current firefox releases.
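The difference between the queries above comes down to the `version` and `range_unit` parameters. A minimal sketch of building such a query string follows; the parameter names are copied from the URLs quoted in this bug, while the `report_list_url` helper is hypothetical, and nothing here is a documented crash-stats API.

```python
from urllib.parse import urlencode

BASE = "http://crash-stats.mozilla.com/report/list"
SIGNATURE = "FreeEEInfoChain(tagExtendedErrorInfo*)"


def report_list_url(version, range_value=1, range_unit="weeks"):
    """Build a report-list URL for one Firefox version (hypothetical helper).

    Parameter names mirror the URLs pasted in this bug.
    """
    params = [
        ("product", "Firefox"),
        ("version", f"Firefox:{version}"),
        ("platform", "windows"),
        ("query_search", "signature"),
        ("query_type", "exact"),
        ("query", ""),
        ("date", ""),
        ("range_value", range_value),
        ("range_unit", range_unit),
        ("do_query", 1),
        ("signature", SIGNATURE),
    ]
    # urlencode percent-encodes ":" as %3A and the parentheses as %28/%29.
    return f"{BASE}?{urlencode(params)}"


print(report_list_url("3.5.5", range_unit="days"))
```

One difference from the pasted URLs: `urlencode` also encodes the `*` in the signature as `%2A`, which should decode to the same value on the server side.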
looking over user comments from the last couple of months there seem to be two general classes.

one class of comments talks about seeing the problem at the update of firefox or cooliris

http://crash-stats.mozilla.com/report/index/e9de3586-c473-4635-9669-d46ff2091110

Firefox crashed when updating Cooliris

http://crash-stats.mozilla.com/report/index/aacd27ff-b196-4de2-9e43-90b9d2091107
        Simple(?) update to v. 3.5.5.  CUrrently: 2 separate Firefox windows open, each with multiple tabs.


the other class of comments is mostly about repeated, frequent crashes and a generally high level of frustration.
We pinned this Cooliris crash down to a buggy gdiplus.dll choking on (possibly malformed) color profile data in specific images from third-party content providers.

It seems to be XP only (the reports seem to confirm this) and we're already testing a fix internally for this.
Hi Mozilla team,

An update from our side: as Mark noted, we've spent the last few days tracking down the crash and have a reproducible case and a fix we're testing.  What we found is that there's a buggy gdiplus.dll choking on (possibly malformed) color profile data in specific images from third-party content providers.  We have the option of pushing an auto-update to all of our users with the fix.  However, it feels aggressive to push an update for something that's affecting less than .1% of our users, and the update would need to percolate through our user base, which takes 2-3 weeks at minimum.  This would be less than ideal for both your users and ours because of the time it takes and because it makes almost everyone auto-update.

However, we have a strategy for fixing the issue in the short term.  Since there was a dramatic order-of-magnitude spike on 8/13, and neither you nor we pushed a new client at that time, we're looking back for new content that was introduced in Cooliris on or around that day that may be triggering the crash, since we've traced it down to an image property.  We began running tests two nights ago to remove certain content feeds and are aggressively pursuing this today and over the weekend to figure out what feeds are triggering the crash, and will remove them once identified.  We have traced it down to one EXIF property and are now in the process of scanning our whole DB of content to figure out the offending feeds.  From there we can take those down until we push a new client which will have the fix.

Thus, our current plan of attack is to solve this issue by tracking down the content triggering the crash, remove that content, and include a patch in our next client push that prevents this issue from happening going forward.  

Thanks and if you have any questions/concerns at all feel free to reach me 24/7 at josh@cooliris.com or on my cell at 310.710.1420.
Hi All,

An additional update: we ran a script to identify all content feeds in our system that have what we thought was the offending EXIF property (whitepoint), and it turns out that many feeds had images with this property.  This means that it is not whitepoint alone that is causing the crash, although it is clearly related.  However, we currently have only one reproducible case of the crash in-house.  Two cases would improve our chances of figuring out exactly what sort of malformed content causes the crash, but until then we are basically looking for a needle in a haystack, especially given the opaque nature of GDIplus.

Thus from this point our plan is to:
a) Continue to keep an eye out for a second reproducible case of the crash so we can identify what common properties exist between the two cases, and thus figure out what content is causing the crash.
b) If this is unsuccessful, the issue will definitely be fixed in the next client release which we are currently planning to release in December (the fix is already checked in).

In the meantime, if anyone on the Mozilla team has a reproducible case we can take a look at, that would be extremely helpful. As always, I'm reachable if we can help with anything :).

Josh
310.710.1420
josh@cooliris.com
This is pretty far down on our crash list; currently #194 on http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.5.5/7 for 0.08% of total crashes (and not even showing up on http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b4/7 )

Given that, does this need to be blocking-firefox3.6+?  If so, what do we plan to do about it from the Firefox side?
I agree that it doesn't need to block; an update on the release timing of the Cooliris extension would be helpful, but as long as we keep an eye on the crash stats for 3.6 I think we're good here.  As always, renom if you think we shouldn't ship until Cooliris has updated or something.
Flags: blocking-firefox3.6+
Summary: Crash [@FreeEEInfoChain(tagExtendedErrorInfo*)] → Crash [@ FreeEEInfoChain(tagExtendedErrorInfo*)]
yeah, volume on 3.6 betas is low.   here are all the 3.6 beta reports with URLs over the last month.  I'd also expect volume to be low since cooliris is flagged as not compatible with 3.6  https://addons.mozilla.org/en-US/firefox/addon/5579

awk -F'\t' '$8 ~ /3\.6b/ && $1 ~ /FreeEEInfoChain/ {print FILENAME, $8, $2}' 200911*
20091113-crashdata.csv 3.6b2 about:blank
20091113-crashdata.csv 3.6b2 about:blank
20091114-crashdata.csv 3.6b2 http://www.miniclip.com/games/age-of-speed-2/en/
20091114-crashdata.csv 3.6b2 http://www.google.co.in/  query about grad school
20091115-crashdata.csv 3.6b2 about:blank
20091115-crashdata.csv 3.6b2 about:blank
20091117-crashdata.csv 3.6b1 http://www.pog.com/games/Trech_2
20091120-crashdata.csv 3.6b3 about:blank
20091121-crashdata.csv 3.6b1 http://www.isketch.net/isketch.shtml
20091122-crashdata.csv 3.6b3 about:blank
20091124-crashdata.csv 3.6b3 http://mail.live.com/ sanitized
20091124-crashdata.csv 3.6b3 \N
20091124-crashdata.csv 3.6b1 
20091124-crashdata.csv 3.6b1 
20091125-crashdata.csv 3.6b3 http://mail.live.com/ sanitized
20091126-crashdata.csv 3.6b3 
20091126-crashdata.csv 3.6b3 http://search.yahoo.com search for free on-line games
20091126-crashdata.csv 3.6b2 about:blank
20091126-crashdata.csv 3.6b2 about:blank
20091126-crashdata.csv 3.6b4 about:blank
20091126-crashdata.csv 3.6b4 about:blank
20091126-crashdata.csv 3.6b4 about:blank
20091127-crashdata.csv 3.6b4 about:blank
20091127-crashdata.csv 3.6b4 about:blank
20091128-crashdata.csv 3.6b4 about:blank
20091128-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b2 about:blank
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 about:blank

It's also interesting that we made some fixes to URL reporting in 3.6b4, and in that release all the reports indicate about:blank, which suggests the crash may be triggered when a user opens a new tab and leaves the previous page that had focus.

also, Josh,  any updates on when a cooliris release will be ready for 3.6?
Crash Signature: [@ FreeEEInfoChain(tagExtendedErrorInfo*)]
Still happens but not a top crash anymore. Only 47 on 8.0 in the last 4 weeks. Removing the keyword.
Keywords: topcrash
I am definitely not the droid this bug is looking for.
Assignee: mbeltzner → nobody
Crash Signature: [@ FreeEEInfoChain(tagExtendedErrorInfo*)] → [@ FreeEEInfoChain(tagExtendedErrorInfo*)] [@ FreeEEInfoChain]